Restricted maximum likelihood estimation of genetic principal components and smoothed covariance matrices

Principal component analysis is a widely used 'dimension reduction' technique, albeit generally at a phenotypic level. It is shown that we can estimate genetic principal components directly through a simple reparameterisation of the usual linear, mixed model. This is applicable to any analysis fitting multiple, correlated genetic effects, whether effects for individual traits or sets of random regression coefficients to model trajectories. Depending on the magnitude of genetic correlation, a subset of the principal component generally suffices to capture the bulk of genetic variation. Corresponding estimates of genetic covariance matrices are more parsimonious, have reduced rank and are smoothed, with the number of parameters required to model the dispersion structure reduced from k(k + 1)/2 to m(2k - m + 1)/2 for k effects and m principal components. Estimation of these parameters, the largest eigenvalues and pertaining eigenvectors of the genetic covariance matrix, via restricted maximum likelihood using derivatives of the likelihood, is described. It is shown that reduced rank estimation can reduce computational requirements of multivariate analyses substantially. An application to the analysis of eight traits recorded via live ultrasound scanning of beef cattle is given.


INTRODUCTION
In most quantitative genetic analyses we are interested in multiple, correlated genetic effects per individual. Predominantly, these are additive genetic effects for different traits, but recently applications which model trajectories for traits measured repeatedly per individual through sets of correlated, (genetic) random regression (RR) coefficients, have found increasing use. Traditionally, the corresponding genetic covariance matrices have been considered unstructured, i.e. for k traits (or RR coefficients) there were k(k + 1)/2 distinct (co)variance components to be estimated. By and large, restrictions on estimates were imposed only to ensure that estimates were within the parameter space, i.e. that all variances and conditional variances were positive, that all correlation estimates were in the range of −1 to +1, and that all partial correlations were consistent with each other. In statistical terms, this was equivalent to the requirement that the estimated covariance matrix was positive semidefinite, i.e. that none of its eigenvalues was negative.
In contrast, other areas of statistics have long since assumed and estimated structured covariance matrices. On the one hand, these have been associated with longitudinal or spatial data, or simply 'repeated' records. Jennrich and Schluchter [21] and Wolfinger [57], for instance, discuss a number of common structures, including compound symmetry, autoregressive and factor-analytic structure, and random coefficient models (or see books by Diggle et al. [5] and Lindsey [32]). On the other hand, principal component (PC) analysis, and the closely related factor analysis, applied to records for several traits, have found widespread use, both as a dimension reduction technique and to identify common factors.
Analyses of quantitative genetic data have the added complexity, not shared by other fields, that we want to partition the phenotypic variance into its genetic and non-genetic components, using information on covariances between relatives to do so. For 'standard' multi-trait analyses, considering records for multiple characteristics usually measured only once per individual, little attention has been paid to estimation assuming any structural relationship. However, for traits with repeated records, measured potentially infinitely many times along some continuous scale such as time, also referred to as function-valued or functional traits, there has been substantial interest in modelling changing covariances along the associated trajectory, both at the genetic and environmental level. Following suggestions by Kirkpatrick et al. [29], covariances were described through covariance functions, based on orthogonal polynomials of the continuous covariable, which were generally estimated to reduced rank, i.e. involving polynomials of lower order than the number of points on the trajectory. It was shown that covariance functions could be estimated efficiently fitting mixed linear models which included sets of RR coefficients for all levels of random effects, and estimating the coefficients of the covariance functions as covariances among RR coefficients [39]. A recent review of applications in animal breeding is given by Schaeffer [48].
The parametric structures, widely used elsewhere [21,57], have been considered as well in genetic analyses of such data. Pletcher and Geyer [46] teamed parametric correlation functions involving low numbers of parameters with variance functions to accommodate heterogeneous variances, to model genetic and non-genetic covariances between records taken at different ages, dubbing the resulting models 'character process' models. Alternatively, 'hybrid' models have been described, which model genetic trajectories through sets of RR coefficients, but assume a parametric correlation structure, e.g. an auto-regressive or an ante-dependence structure, for the within subject covariances [8,40]. However, applications of the latter two types of model have been limited so far [1,2,18].
Eigenvalues and -vectors of covariance matrices per se have received relatively little attention. Hayes and Hill [12] used the so-called 'canonical decomposition', which involves simultaneous diagonalisation of two covariance matrices through an eigen-decomposition, to examine sampling properties of genetic selection indexes. Subsequently, Hayes and Hill [13] proposed a modification of eigenvalues on the canonical scale ('bending') to constrain non-positive definite estimates of genetic covariance matrices to the parameter space. In mixed model analyses, both genetic evaluation and variance component estimation, the transformation given by the associated matrix of eigenvectors has been used extensively to reduce computational requirements.
For models without additional random effects, with all traits measured on all animals and with equal design matrices, the canonical transformation reduces the multivariate analysis to a series of univariate analyses; see Jensen and Mao [23] for a review. Extensions to models with multiple random effects and missing data have been considered, in particular for genetic evaluation [6,7,31]. Whilst it was recognised that canonical indices, i.e. linear combinations of the original variables which are independent of each other and explain maximum amounts of genetic variation, have an interpretation in their own right (see Meyer [36] for an example), this has found little use. For instance, Wiggans and Goddard [55] developed six linear combinations of test day records on dairy cows for milk, fat and protein yields to represent most genetic variation in 60 measures based on a canonical decomposition of the genetic covariance matrix.
Popularity of RR analyses for function-valued traits has sparked interest in the eigendecomposition of covariance matrices of RR coefficients, especially at the genetic level. Eigenfunctions are the 'infinite-dimensional' equivalents to the eigenvectors, and can be estimated by evaluating the eigenvectors of coefficient matrices for the range of covariables fitted [29]. Genetic eigenfunctions then describe the expected deformation of the mean trajectory due to selection, i.e. the change in any point along the trajectory likely to occur when selecting on estimated breeding values for this eigenfunction; see Kirkpatrick et al. [29] and Kirkpatrick and Lofsvold [28] for further details. This implies that selection strategies to optimise changes along the complete curve are most easily determined considering the eigenfunctions directly [27,54]. Furthermore, we are not interested in estimates of the RR coefficients themselves, only in the resulting curve. Hence, as suggested by Kirkpatrick and Meyer [30], it is logical to estimate the values of eigenfunctions directly. This can be done by a simple reparameterisation of the model, as shown in this paper.
Estimating eigenvectors and -values of covariance matrices is equivalent to PC analysis [25]. PC analysis goes back to Hotelling [17] and Pearson [44] (both reprinted in Bryant and Atchley [3]). PC analysis is widely used as a dimension reduction technique, but so far has had only limited applications in quantitative genetic analyses. Eigenvectors define independent linear functions of the variables considered, the so-called principal components, which successively explain the maximum amount of variation, measured by the corresponding eigenvalues. This implies that for a given number of components considered, PCs minimise the mean square of deviations or, conversely, approximate the multivariate data most accurately. It follows that PCs with variances (eigenvalues) close to zero contribute virtually no information to the analysis, that is not already contained in the PCs with large eigenvalues. Hence, these components can be ignored, resulting in, by and large, equivalent analyses involving fewer variables, i.e. traits or RR coefficients, and often reduced sampling variation. Reducing the dimension of analyses by considering fewer variables can considerably decrease the associated computational requirements. This is applicable for both genetic evaluation and variance component estimation, and to all models involving multiple, correlated effects, i.e. 'standard' multivariate analyses as well as RR analyses. In terms of variance components, such analyses provide estimates of covariance matrices which are of correspondingly reduced rank and are smoothed. Kirkpatrick and Meyer [30] considered a derivative-free algorithm to estimate reduced rank genetic covariance matrices by restricted maximum likelihood (REML), using a PC parameterisation. James et al. [19] applied a PC model at the phenotypic level to sparse, function-valued data, using maximum likelihood estimation. This paper extends the approach of Kirkpatrick and Meyer [30] to algorithms using derivatives of the likelihood, in particular the so-called 'average information' (AI) algorithm, and gives an application for a multivariate analysis of 'live' carcass traits in beef cattle, recorded by ultrasound scanning.

Linear model
In the usual notation, let denote the mixed linear model, with y the vector of observations on N animals, β the vector of fixed effects, α the random vector of additive genetic effects for N A animals (including any parents without records), γ the random vector of animals' permanent environmental effects (including any non-additive genetic effects), ε the vector of residual errors, and X, Z and W the incidence matrices pertaining to β, α and γ, respectively. Assume y, α, γ and ε are ordered according to traits within animals, and let Σ A , Σ R and Σ E denote the k × k matrices of additive genetic, permanent environmental and temporary environmental covariances for the k traits. Temporary environmental effects are usually assumed independently distributed, i.e. Σ E = Diag{σ 2 i }. Assumed covariances are then with A the numerator relationship matrix between animals, I N an identity matrix of size N, Σ n j E the submatrix of Σ E for the jth animal with n j ≤ k observations, obtained by deleting any rows and columns pertaining to traits not recorded, and '⊗' and 'Σ + ' denoting the direct matrix product and sum, respectively. Covariances between random effects α, γ, and ε are taken to be zero. This gives Var(y) = ZGZ + WTW + R = V.

Implicit permanent environmental effects
In the simplest case, (1) represents a 'standard' multivariate model for k traits, with rows of Z and W containing a single non-zero element of unity only. If there are single records per trait, temporary environmental effects and permanent environmental effects due to the individual cannot be separated. This implies that we need to amalgamate Wγ and ε in (1) into a single effect ε = Wγ + ε with variance where Σ n j R is the submatrix of Σ R corresponding to Σ n j E , and phenotypic vari- Alternatively, we might choose to fit an equivalent model to (1) which accounts for permanent environmental covariances between repeated records for the same individual through the residual covariances, instead of explicitly fitting a random effect γ to model permanent environmental effects. This might be preferable if the number of equations in a mixed model analysis are of concern, if only some of the traits have repeated observations, or if we want to impose a parametric correlation structure on Σ R [40].

Extensions
Extensions of (1) and the algorithm described below to more complicated models including additional random effects, such as genetic or permanent environmental maternal effects, are straightforward. Note, that (1) encompasses RR models, where we describe the trajectories for individuals over the range of continuous covariable(s) through corresponding sets of regression coefficients. For instance, in a univariate RR model, we might fit k A genetic coefficients α i j for each animal and k R permanent environmental coefficients γ i j for each individual with records, with covariance matrices Σ A and Σ R , respectively. In contrast to the simple multivariate model, incidence matrices for RR models have elements corresponding to the vector of covariates, i.e. are considerably less sparse. For example, if we modelled the additive genetic trajectory for each animal through k A coefficients, regressing on Legendre polynomials of age at recording, each row of Z would have k A non-zero elements consisting of Legendre polynomials of degree 0 to k A − 1, evaluated for the age at which the corresponding record was taken. Computations, however, are analogous.

Reparameterisation
Consider a covariance matrix Σ, of dimension k × k. An eigenvalue decomposition of Σ gives with Λ the diagonal matrix of eigenvalues of Σ, λ i for i = 1, . . . , k, and E = (e 1 |e 2 | . . . |e k ) the pertaining matrix of eigenvectors e i . For a particular eigenvalue λ i , the corresponding e i is only determined to proportionality. Hence, standard procedures for eigenvalue decomposition usually report the e i standardised to a length of unity, so that E is orthonormal. In addition, eigenvalues and -vectors are usually given in descending order of magnitude of the λ i , and this is assumed to hold in the following. If Σ represented the covariance matrix of a vector of variables v, the linear function e 1 v with variance λ 1 would be the combination of the original variables which explained the maximum amount of variation. Similarly, with E orthogonal, E v with covariance matrix Λ gives k uncorrelated variables with the ith new variable explaining most variation, given counterparts 1 to i − 1. Hence, if we considered only the first m eigenvectors, we would obtain m linear combinations E m v capturing the maximum of the original variation (with E m the k × m submatrix of columns 1, . . . , m of E). This is the principle underlying the use of principal components as a 'dimension reduction' technique. Moreover, if λ m+1 , . . . , λ k are close to zero, the 'reproduced' matrix with Λ m the submatrix of Λ corresponding to E m , is a close approximation of Σ which has rank m and is smoothed. Assume Σ above stood for Σ A . We can then reparameterise (1) to with Var(α • ) = A ⊗ Λ m . A corresponding reparameterisation, based on the decomposition of Σ R , can be applied to γ. For m = k, (9) is an equivalent model to (1). Otherwise, i.e. for m < k, it has reduced dimension, considering the first m principal components only. In other words, estimating α • instead of α in a mixed model analysis directly yields estimates of breeding values for the linear combinations of the original variables given by the first m eigenvectors. This is of particular interest when (1) represents a RR model, so that the eigenvectors of Σ A describe deformations of the mean trajectory due to selection [29]. Misztal et al. [42] considered the parameterisation shown in (9) for RR analyses of test-day records on dairy cattle, but failed to recognise the interpretation of α • .
Alternatively, as suggested by Kirkpatrick and Meyer [30], we can reparameterise (1) so that the random effects fitted have variances of unity.
Q m is the k × m matrix of the first m ≤ k eigenvectors of the k × k covariance matrix Σ, scaled by the square root of the corresponding eigenvalues. Note that the columns of Q m do not have a norm of unity and, for m < k, Introducing subscripts A and R to distinguish between genetic and environmental effects and, for convenience, omitting subscript m in the following, we can then reparameterise (1) to (11) is equivalent to (1).

Variance component estimation
Kirkpatrick and Meyer [30] described how estimates of the parameters defining the dispersion structure for model (11) can be estimated by REML, using a derivative-free algorithm. They employed a step-wise procedure to maximise the likelihood with respect to only the elements of the ith eigenvectors in the ith step, to reduce the dimension of search and thus the number of likelihood evaluations required. This was teamed with a final maximisation with respect to all parameters. Using a graphical interpretation, i.e. considering changes in the directions and lengths of the eigenvectors through small rotations, allowed a maximisation with respect to the rotation angles, thus keeping the number of parameters to be estimated low.
In the following we describe REML algorithms to estimate the 'variance components' for model (11) utilising information on derivatives of the likelihood. The AI-REML algorithm, described by Johnson and Thompson [24] and Gilmour et al. [9] for univariate scenarios, utilises the average of observed and expected information in a Newton-Raphson type iterative estimation scheme. The average information is equivalent to second derivatives of the data part of the likelihood only, and is thus considerably easier to compute than either of the two former values. Multivariate implementations have been described, for instance, by Jensen et al. [22] and Meyer [38]. Whilst most implementations relied on computing the inverse of the coefficient matrix in the mixed model equations (MME) [9,22,24], the latter utilised a Cholesky factorisation of the coefficient matrix, and this approach is adopted in the following.

Parameters to be estimated
In contrast to other parameterisations, the 'variance components' for the random effects in (11) are incorporated into their design matrices, Z and W . Let θ of length p denote the vector of (variance) parameters to be estimated. On the original scale, there were k(k + 1)/2 covariance components for each covariance matrix Σ of size k × k. On first glance, it might appear that parameterisation (11) might have increased the number of parameters to be estimated, unless m was substantially smaller than k. Whilst each matrix Q has mk elements, it is a matrix of eigenvectors, albeit scaled, and we require eigenvectors e i for i = 1, . . . , m to be orthogonal. This implies that we have k parameters for the first eigenvector q 1 = √ λ 1 e 1 . For the second eigenvector, q 2 = √ λ 2 e 2 , however, there are only k -1 'free' elements, with the kth element determined by the restriction that q 1 q 2 = √ λ 1 λ 2 e 1 e 2 = 0. Analogously, for the ith column of Q, there are only k -i + 1 elements to be estimated, with the remaining i -1 elements fully determined by columns 1, . . . , i -1 of Q which are already estimated. Let elements 1 to k -i + 1 of q i be the elements to be estimated. Elements k -i + 2 to k are then obtained as solutions to the i -1 equations given by q r q i = 0 for r = 1, . . . , i -1.
with q i , r the rth element of q i (commas shown here for better separation of subscripts only). Hence, the number of parameters to be estimated for Q is for a reduced dimension parameterisation, the number of variables describing the covariance structure is reduced by (k−m)(k−m+1)/2. The reduction increases quadratically with the difference in dimension. In specific cases, the matrix to be inverted in (12) can become singular, which implies that the corresponding elements of q i are undetermined. The number of parameters to be estimated in total is then where m E is the number of distinct elements of Σ E . Parameters are the elements of Q A , q A i j with i = 1, m A and j = 1, k A − m A + 1, the elements of Q R , q R i j with i = 1, m R and j = 1, k R − m R + 1, and the temporary environmental variances, σ 2 1 , . . . , σ 2 m E . Note, that there are no constraints on the parameter space for the former. Variances σ 2 i have to be positive, but estimating log (σ 2 i ) instead of σ 2 i yields a parameterisation, for which maximising the likelihood is an unconstrained optimisation problem.
For a univariate RR analysis, m E might be as low as 1 when measurement error variances are assumed homogeneous. For standard multivariate analyses of k traits, usually m E = k and k R ≤ k − 1. If there are only single records per trait, there is no information to separate temporary and permanent environmental covariances, i.e. m E = 0 and m R = k. Alternatively, we might impose a structure of This facilitates estimation of temporary environmental variances even though there are no repeated records per animal, but m R must be chosen small enough, so that

Likelihood
The MME for (11) are then or Cû = r. Whilst the part of the coefficient matrix C arising from the covariances among random effects is sparser than in the corresponding MME for (1), it has to be borne in mind that Z and W can be considerably denser than matrices Z and W, although there is a 'trade-off' as the covariance matrices among random effects proportional to identity matrices generate fewer links between random effects levels. Hence computational savings in manipulating the MME arise mainly from the reduction in the number of equations when m A < k A or m R < k R .
The REML log likelihood (log L) can be written in terms of the MME [10]. On the original scale, (15) For the reparameterised model (11), Contributions to log L from the covariance matrices among random effects, Σ A and Σ R , are now contained in log |C| and y Py. For a given model, m A log |A| is a constant and is not required when maximising log L. However, log |A| is readily obtained as a by-product when setting up A −1 . With R blockdiagonal for animals, its (log) determinant is readily determined considering blocks for one animal at a time. If there are animals with the same combinations of traits (ages) recorded, the number of determinants which need to be evaluated is the number of combinations, and log |R| is equal to the sum of products of individual determinants and the respective number of animals [37].
As shown by Graser et al. [10], both y Py and log |C| can be evaluated simultaneously by factoring the so-called mixed model matrix (MMM) M, which is the coefficient matrix in the MME augmented by the vector of right hand sides and its transpose, and the weighted sum of squares of the data. A Cholesky factorisation requires the matrix to be factored to be positive definite. If multiple, fixed effects are fitted, X is generally not of full rank, resulting in a matrix M which is not positive definite. Hence it is assumed in the following, without changing notation, that, if applicable, X has been replaced by a full rank submatrix (or that the factorisation procedure has been modified to accommodate matrices of reduced rank).
If γ is not fitted, the corresponding rows and columns in (14) and (17) are omitted, and R is replaced by R .
Let L denote the Cholesky factor of M, i.e. M = LL , with diagonal elements l rr for r = 1, . . . , M. Terms required in (16) can then be calculated as [41] log |C| = M−1 r=1 log(l rr ) and y Py = l 2 MM .
Employing sparse matrix techniques together with an appropriate ordering of rows and columns of M, log L can be evaluated for large data sets and models of analyses.

Maximising the likelihood
One of the most widely used methods to optimise non-linear functions is the Newton-Raphson (NR) algorithm. Letθ t be an estimate of θ from the tth iterate. An update is then given bŷ where I(θ t ) and g(θ t ) are the matrix of second derivatives of log L and the vector of first derivatives of log L, respectively, both evaluated atθ t , and k t and τ t are parameters of the optimisation procedure. If second derivatives in (19) are replaced by their expected values, the algorithm is referred to as Fisher's Method of Scoring. Similarly, for the AI-REML algorithm, the average of observed and expected values is used instead of the second derivatives in (19). Calculation of derivatives of log L with respect to the parameters of the PC parameterisation is described in the Appendix.
For τ t = 0 and k t = 1, (19) reduces to the standard NR algorithm. Generally, however, it is used with some modifications to avoid 'overshooting' or to enforce an increase in log L. k t ≤ 1 is a step size scaling factor. Jennrich and Sampson [20] suggested a 'step halving' procedure, i.e., for τ t = 0 to try successive values of k t of 1, 0.5, 0.25. . . until a value ofθ t+1 was found which increased log L.
For τ t 0 and k t = 1, (19) gives Marquardt [35]'s modification, resulting in a search step which is intermediate between a NR step, and a step in the method of steepest descent (ascent). If required, a simple search for an optimal value of either τ t or k t can be carried out to maximize the increase in log L achieved. This is especially advisable when observed second derivatives rather than AI are used, as the ratio of computational efforts required for simple function evaluations and calculation of second derivatives is wide.
Other, more sophisticated, modifications of the NR algorithm, aimed at making it more robust and efficient enough to cope with starting values far from the optimum and nonconvex regions, are described by Nocedahl and Wright [43] (Chap. 6).

APPLICATION
The method described is illustrated using records for 'carcass traits' measured by live ultrasound scanning for a herd of Australian beef cattle. Traits recorded were eye muscle area (EMA; in cm 2 ), fat depth at the 12/13th rib (RIB; in mm) and at the P8 site (P8; in mm), and percentage intra-muscular fat (IMF; in % × 10). Treating records on heifers and bulls as different traits yielded eight traits in the analysis. Ages at recording from 300 to 700 days were considered. After basic edits, the data comprised 20 171 records on 2335 heifers and 3702 bulls, i.e. 5605 animals in total. Generally, all four measures for an animal were taken at the same time. However, IMF recording was introduced some time after the other traits. Hence, only 57% of bulls and 69% of heifers had all four traits recorded, with most of the remainder having records for P8, RIB and EMA. In addition, there was a small number of animals with other combinations of traits, due to missing observations or deletion of dubious records. Table I gives further details.
The model of analysis was a simple animal model, fitting contemporary groups (CG), birth type effects and a dam age class effect (heifers vs. cows) as cross-classified fixed effects, and age at recording and age of dam as linear and quadratic covariables. CG were defined as herd-management group-sex-date of recording subclasses, with further subdivisions so that each CG comprised ages at recording no more than 60 days apart. Animals' additive genetic effects were the only random effects considered. Extracting pedigree information for up to four generations backwards yielded 2439 ancestors to be included after 'pruning' of pedigrees, i.e. a total of 8044 animals in the analysis. Animals with records were progeny of 239 sires and 2284 dams.
REML estimates of covariance components among the eight traits were obtained fitting 1, 2, . . . , 8 principal components for genetic effects, using an AI-REML algorithm to locate the maximum of the likelihood. This was supplemented by simple, derivative-free search steps at convergence or when AI-REML failed to improve log L, although first derivatives different from zero suggested that a maximum had not been attained.
Residual covariances matrices were modelled as full rank matrices throughout. As traits were measured on two distinct subsets of animals, there were no environmental covariances between traits 1 to 4 and traits 5 to 8. Hence, four eigenvectors of length 4 were fitted separately for each of the two diagonal blocks of the residual covariance matrix, resulting in only 20 parameters for residuals to be estimated (16 zero covariances). REML forms of Akaike's (AIC) and Schwarz' Bayesian (BIC) information criterion [56] were calculated for each analysis. In addition, 28 corresponding bivariate analyses amongst all pairs of traits were carried out. Results from the latter were pooled using the 'iterative summing of expanded part matrices' procedure of Mäntysaari [34], as implemented by Henshall and Meyer [14]. Table II summarises characteristics of the analyses. In essence, fitting m < 8 PCs reduces the mixed model equations and computational demands per likelihood evaluation to those of a corresponding m-variate, full rank analysis. As requirements increase quadratically with the dimension of analysis (m), even a small reduction resulted in a substantial decrease in the number of operations needed for numerical factorisation of the MMM and its derivatives. With the design matrices for random effects for the PC parameterisation comprising the eigenvectors and thus being markedly denser than for standard multivariate analyses, more elements in the MMM arise from the data part. However, this is balanced by substantially fewer elements contributed by the inverses of covariance matrices of random effects. In our example, fitting eight principal components, there were 1 322 063 non-zero off-diagonal elements in the lower triangle of M from the data part and 164 568 from the inverse of the genetic covariance matrix. Corresponding figures for the usual parameterisation were 571 675 and 1 541 776. Whilst this resulted in fewer elements in M for the PC parameterisation, numbers of non-zero elements in its Cholesky factor and operation counts per factorisation were of comparable magnitude.
Likelihoods increased significantly with the number of PCs fitted up to a fit of 6. Similarly, AIC was lowest for 6 PCs, suggesting this was the best   Table III, were low for analyses fitting 4 and 5 PCs, respectively, there appeared to be some repartitioning of variation, resulting in estimates of λ 4 and λ 5 being higher for analyses fitting 6 or more PCs. Estimates of eigenvalues from pooled bivariate analyses agreed well with those from analyses fitting 6 or more PCs. As the sum of genetic eigenvalues shows, the total variance attributed to genetic effects increased with the number of PCs fitted, implying a biased partitioning, i.e. that residual covariances have been overestimated, for the lowest orders of fit. This is further illustrated in Figure 1, which shows that estimates of phenotypic variances for individual traits were essentially the same for all analyses, while estimates of genetic variances increased and estimates of the residual variances decreased with increasing number of PCs considered (up to 6). Figure 2 shows corresponding changes in estimates of selected genetic correlations. Results suggest that 6 PCs may be required to estimate covariances between the eight traits adequately, but, conversely, that the first three of these PCs suffice to explain 98.4% of genetic variation between animals.

DISCUSSION
Multi-dimensional analyses involving more than a few correlated genetic effects are inherently difficult. Typically, computational requirements are large. In addition, sampling variances and probabilities of parameter estimates outside the boundaries of the parameter space, increase with the dimension of analysis and magnitude of genetic correlations [15]. The model proposed is capable of alleviating both problems. Estimating only the most important principal components allows a more parsimonious parameterisation than standard multivariate analyses, while still capturing the bulk of variation and smoothing estimates of the corresponding covariance matrices. In a small simulation study for a RR model, Kirkpatrick and Meyer [30] showed that the first eigenvector and -value tended to be estimated accurately regardless of the number of principal components fitted, and that this dominated the sampling errors in the associated covariance function. Further work is required to evaluate the sampling behaviour in more practical situations.
In addition, computational demands can be decreased markedly, when fitting reduced numbers of PCs. This could prove especially advantageous for RR analyses, where regression on higher order polynomials has frequently resulted in 'end-of-range' problems. Lower order, segmented polynomials, i.e. splines, may provide a more flexible and less sensitive alternative. In particular, Rice and Wu [47] recommended the so-called 'B-splines' [4] as base functions, and Torres and Quaas [53] presented an application for test day records of dairy cows. A drawback of spline models is that, depending on the number of segments or 'knots' used, the number of regression coefficients and thus parameters to be estimated is often increased compared to polynomial models. However, when the first few PCs suffice to estimate covariance functions, reduced rank RR analyses using spline base functions do not need to be any more complex than current, full rank analyses using polynomials. James et al. [19] considered this scenario at a phenotypic level.
The model and algorithm presented are closely related to the factor-analytic (FA) model and AI-REML algorithms considered by Smith et al. [49] and Thompson et al. [51]. In particular, if specific variances are assumed zero, the FA model collapses into a reduced rank model [51]. However, both authors assumed the random effect with FA structure to be distributed proportionally to an identity matrix, and necessary constraints on the number of parameters to be estimated were imposed by setting the k(k − 1)/2 elements in the upper triangle of the matrix of factor loadings to zero [49]. In addition, both algorithms relied on calculation of C −1 , the inverse of the coefficient matrix in the MME.

Convergence behaviour
Initial experience with the algorithm described, using an AI-REML algorithm together with simple 'step halving', showed that a modification of the step size was required for most iterations, sometimes substantially so. This lead to slower convergence than the quadratic convergence expected and often observed; see Hofer [16] for a summary of literature results. For the example presented, estimating all eight principal components required more than twice as many iterates than the corresponding standard, multi-trait analysis. Hence, the PC parameterisation is most advantageous for a reduced rank fit, where any increase in iterations required is off-set by a smaller number of parameters to be estimated, and much lower computational requirements to evaluate log L and its derivatives.
Comparing various transformations of covariance matrices, Pinheiro and Bates [45] encountered slower convergence in obtaining maximum likelihood estimates for a parameterisation using eigenvalues and Givens rotation angles of the matrix than for the alternatives considered. The NR algorithm and its modifications are based on a quadratic expansion of the likelihood function about its maximum, i.e. they work best when the shape of the likelihood surface is unimodal and close to quadratic. Little is known about the shape of the likelihood surface for the principal component parameterisation suggested. James et al. [19] commented briefly on difficulties of maximising log L due to non-convexity of the optimisation problem, but gave no further details.

Variance matrix non-linear in vector of parameters
Results given by Hofer [16] by and large referred to models in which the covariance matrix of observations (V) was linear in the parameters to be estimated. In this case, the AI is an exact average of observed and expected values of second derivatives of log L [9]. Otherwise, contributions from nonzero second derivatives of V are omitted. No reports of convergence rates for V non-linear in θ were available.
For a full rank fit, derivatives of log L with respect to the elements of Q can be obtained by evaluating derivatives with respect to covariance components σ rs (elements of Σ) and then transforming these to the eigenvector scale. Using the chain rule of differentiation for parameters θ v = q rs and θ w = q tu and covariances φ x = σ r s and φ y = σ t u [33] The first component in the right hand side of (20) is the average information given by (A.13). The second part arises from the fact that some second derivatives of α xy with respect to the elements of the corresponding eigenvectors, q rs , are non-zero. As (20) clearly shows, this component is the more important the larger first derivatives are, i.e. the further we are away from the maximum of the likelihood. With σ xy = m r=1 q rx q ry , partial derivatives in (20) are simple to evaluate, non-zero values being ∂σ 2 s /∂q rs = 2q rs and ∂σ sy /∂q rs = q ry , and ∂ 2 σ 2 s /(∂q rs ) 2 = 2 and ∂ 2 σ 2 sy /∂q rs ∂q ry = 1, for r, s, y = 1, . . . , k.

Alternative parameterisation
In constructing our algorithm to maximise the (log) likelihood, we have adopted the conventional approach of taking derivatives of log L with respect to the parameters to be estimated. Alternatives exist and may be advantageous.
Kirkpatrick and Meyer [30] considered a search strategy to exploit the orthogonality of PCs: after first estimating the last (mth) PC conditional on those (1, . . . , m − 1) already estimated, they suggested a final optimisation step consisting of small rotations and perturbations of all PCs simultaneously. For m PCs, this involved maximisation of log L for m(m − 1)/2 rotation angles and m perturbation factors, i.e. m(m + 1)/2 parameters. Earlier, Juga and Thompson [26] teamed rotations and a canonical transformation in a derivative-free search scheme for bivariate analyses.
It is straightforward to rewrite the mixed model and MMM, in terms of fixed eigenvectors and variable rotation angles and perturbation factors as the parameters to be estimated. Derivatives of log L with respect to the new variables are readily evaluated in the same manner as described above for the elements of eigenvectors, and the likelihood can be maximised accordingly. Further research is planned to implement such algorithm and examine its efficacy.
For consistency, the algorithm has been presented assuming a reparameterisation to scaled eigenvectors for all covariance components to be estimated, other than temporary environmental variances. This is not mandatory, and in some instances we may want to combine the eigenvector parameterisation for genetic effects with a different form for other random effects. For covariance matrices fitted to full rank, it may be advantageous to compute the average information for the covariance components themselves first, and then transform these to derivatives with respect to the elements of the scaled eigenvectors (shown above), as this would capture the non-linear part of (20) ignored otherwise.
Indeed, for random effects where we are not interested in the eigen-structure per se, another parameterisation may be preferable. In particular, for the Cholesky decomposition, Σ = L Σ L Σ , a parameterisation to estimate the elements of the lower triangular matrix L Σ , taking log values of the diagonals to remove constraints on the parameter space, has been used [33,41,45] and been found to yield good convergence rates [11,45,52]. Pivoting on the largest diagonal, this parameterisation can be used to estimate reduced rank covariance matrices, by estimating the first m columns of L Σ only [39]. Again, more work is required to determine the best strategy and optimisation procedure to be used.

CONCLUSIONS
Estimation of genetic principal components directly is appealing. It has several advantages over the estimation of individual genetic effects and their covariance components, including reduction in the number of parameters to be considered, sampling variation and computational requirements. Apart from offering greater parsimony, it provides readily interpretable results characterising the patterns of genetic covariation in multiple dimension. It is expected to facilitate routine analyses considering more than a few traits simultaneously. from the National Science Foundation (NSF) and NER/A/S/2002/00857 from the Natural Environment Research Council(NERC) (MK).
As outlined by Meyer and Smith [41], the last two terms required in (A.1) can be evaluated simultaneously, using the 'automatic differentiation' of a Cholesky factor proposed by Smith [50] (see also Nocedahl and Wright [43], Chap. 7). This requires the corresponding derivatives of M to be specified. Let ∂M/∂θ t = (∂L/∂θ t )(∂L /∂θ t ) with ∂l rr /∂θ t the diagonal elements of the derivative of the Cholesky factor, ∂L/∂θ t . Derivatives of log |C| and y Py are then [41] ∂ log |C| ∂θ t = 2 with D rs a matrix of the same size as Q, k × m, with a single non-zero element of unity in position rs. Hence H rs is a symmetric matrix of the same size as Σ, k × k, with elements h ss = 2q rs and h st = h ts = q rt for t s = 1, . . . , k, and zero otherwise. Introducing subscripts to distinguish between matrices D rs pertaining to Q A and Q R , the non-zero derivatives of the design matrices are for r = 1, . . . , m A ; s = 1, . . . , k A − r + 1; for r = 1, . . . , m R ; s = 1, . . . , k R − r + 1.
As for log |R|, its derivatives can be evaluated for one animal or one combination of records at a time.
where n s j is the number of records for the jth animal which have temporary environmental variance σ 2 s .

A.1.2. Derivatives of M
With the design matrices Z and W incorporating the 'variance components' for the random effects fitted, derivatives of M for the reparameterised model (11) have a less simple structure than those for model (1) [41]. For elements of Q A and Q R , the derivatives are ∂M/∂ log(σ 2 s ), the corresponding matrix of derivatives for the temporary environmental variances, has the same form as M given in (17) but with each R −1 replaced by its derivative ∂R −1 /∂ log(σ 2 s ), for s = 1, . . . , m E .
For individual j with n j records, the submatrix of (A.9) is −σ −2 s D s j , where D s j is a matrix of size n j × n j , which has a diagonal element of unity for each record which has temporary environmental variance σ 2 s , and elements of zero otherwise.

A.1.3. Implicit permanent environmental effects
Again, if γ is not fitted, (A.7) and ∂M/∂ log(σ 2 s ) are reduced accordingly, omitting rows and columns pertaining to γ, and R is replaced by R . In that case, blocks of R are submatrices of Σ R + Σ E which is assumed to have structure Q R Q R + Diag{σ 2 i }. Hence, derivatives ∂R /∂ q R rs are non-zero, with the block for the jth animal equal to 10) where (H rs R ) n j is the submatrix of H rs R = ∂Σ R /∂q R rs , as defined above (see (A.3)), corresponding to the combination of traits recorded for individual j. Alternatively, we may have assumed Σ R + Σ E = Q R Q R with m R = k R . In that case, we do not estimate separate components σ 2 s , i.e. p = (m A (2k A − m A + 1) + k R (k R + 1))/2 and ∂M/∂σ 2 s do not exist. Similarly, derivatives of log |R | are (H rs R ) n j for θ t = q R rs . (A.12)

A.2. Average information
As shown by Gilmour et al. [9] and Johnson and Thompson [24], the 'average information' is proportional to the second derivatives of the data part of log L, y Py. i.e., b t has elements equal to the estimated residuals for records which are assumed to have error variance σ 2 s and zero otherwise.

A.2.2. Calculation of average information
As outlined above, y Py is readily calculated from y R −1 y by factoring the mixed model matrix M. As described in more detail in Meyer [38,40], this can be extended to determine products b t Pb u .
Define a matrix B with columns b r for r = 1, . . . , p. Replacing each vector y and y in M with B and B , respectively, then yields a matrix M B with M − 1 + p rows and columns. Factoring M B or, equivalently, 'absorbing' the first M − 1 rows and columns into the last p rows and columns of M B , then replaces B R −1 B with B PB, which, to proportionality, is the average information matrix. As the Cholesky factorisation of the first M − 1 rows and columns of M B has already been determined in calculating log L, this is computationally undemanding.

A.2.3. Implicit permanent environmental effects
For γ not fitted, derivatives of R with respect to q R rs are non-zero, and the equivalent to (A. 15

A.3.1. Derivatives of log |R|
The first term in (A.18), can again be evaluated considering submatrices for one animal at a time. As for the corresponding first derivatives (see (A.6-A.12) above), simplifications of (A.21) exist and are straightforward, but are not developed here.

A.3.2. Derivatives of M
Second derivatives of M have non-zero elements only in blocks for random effects. In the 4 × 4 grid representing M, diagonal blocks corresponding to α , i.e., in position (2,2), are ∂ 2 M/(∂q A rs ∂q A tu ) which expand to Similarly, in position (3,3), diagonal non-zero diagonal blocks In addition, there are non-zero off-diagonal blocks corresponding to α and γ . Blocks in position (3,2) are with obvious simplifications if R is a diagonal matrix of temporary environmental variances.