Estimating variances and covariances for multivariate animal models by restricted maximum likelihood

Summary — Restricted maximum likelihood estimates of variance and covariance components can be obtained by direct maximization of the associated likelihood using standard, derivative-free optimization procedures. In general, this requires a multi-dimensional search and numerous evaluations of the (log) likelihood function. Use of this approach for analyses under an animal model has been described for the univariate case. This model includes animals’ additive genetic merit as random effect and accounts for all relationships between animals. In addition, other random factors such as common environmental or maternal genetic effects can be fitted. This paper describes the extension to multivariate analyses, allowing for missing records. A numerical example is given and simplifications for specific models are discussed. variance component / restricted maximum likelihood / animal model / additional random effect / derivative-free approach / multivariate analysis

analyses under an animal model has been described for the univariate case. This model includes animals' additive genetic merit as random effect and accounts for all relationships between animals. In addition, other random factors such as common environmental or maternal genetic effects can be fitted. This paper describes the extension to multivariate analyses, allowing for missing records. A numerical example is given and simplifications for specific models are discussed. variance component / restricted maximum likelihood / animal model / additional random effect / derivative-free approach / multivariate analysis INTRODUCTION In the statistical analyses of animal breeding data, traits are often considered one at a time. Usually we are interested, however, not only in the mode of inheritance of a particular trait but also in its relationships with other traits and correlated responses when selecting on the trait analyzed. Multivariate analyses are required to obtain estimates of genetic and phenotypic correlations between traits. Moreover, while univariate analyses implicitly assume that all correlations are 0, joint analyses of correlated traits utilize information from all traits to obtain estimates for a specific trait and are thus likely to yield more accurate results. This is of particular relevance when data are not a random sample, ie if records for some traits are missing as the result of selection. For animal breeding data, this is often the case since, typically, data originate from selection experiments or are field records from livestock improvement schemes which select animals on the basis of performance. In that situation, univariate analyses are expected to be biased while multivariate analyses may account for selection.
Analysis of (Co) variance (AOV) type methods have been used widely to estimate genetic and phenotypic correlations. These require records for all traits for all individuals. If there are missing records, this implies that part of the information available is ignored. More importantly, if lack of records is the outcome of selection based on some criterion correlated to trait(s) under analysis, estimates are likely to be biased by selection. In contrast, maximum likelihood (ML) estimation procedures utilize all records available and, under certain conditions, account for selection.
Recently this has been considered more formally and from a Bayesian point of view ( eg Im et al, 1989). Even if these conditions are only partially fulfilled, ML estimates are often considerably less biased by selection than their AOV counterparts (Meyer and Thompson, 1984).
A modified ML procedure, so-called restricted maximum likelihood (REML), which accounts for the loss in degrees of freedom due to fixed effects in the model of analysis (Patterson and Thompson, 1971), has become the preferred method of analysis for animal breeding data, not least for its property of reducing selection bias. Multivariate REML algorithms suggested so far, however, in general require the direct inverse of a matrix of size equal to the total number of levels of random effects multiplied by the number of traits considered simultaneously, in each round of an iterative solution scheme. This represents not only a substantial computational requirement but imposes severe limitations on the model and dimension of analysis.
Simplifications have only been suggested for the 'equal design matrix' case, ie all traits recorded for all animals at the same (or at strictly corresponding) time(s), for models containing only one random factor (eg sires) apart from residual errors. To date, there are no practical applications of multivariate REML analyses for models including additional random factors. REML algorithms as employed in practice today, by and large rely on the use of information from first or even second derivatives of the likelihood function to locate its maximum. Recently, use of a derivative-free procedure, involving explicit evaluation of the likelihood and maximization by direct search has been advocated by Graser et al (1987). This was described for univariate analyses fitting animal's additive genetic merit as the only random effect, ie estimating 2 variance components, and required only a 1-dimensional search. Such models, which also incorporate all information on relationships between animals are usually referred to as animal models (AM).
The derivative-free approach provides a flexible and powerful alternative to REML algorithms used currently. Its application for AMs including additional random effects, for instance animals' maternal genetic effects or common environmental effects, for the univariate case has been described previously (Meyer, 1989). This paper presents an extension to multivariate analyses.

THE REML ALGORITHM
The model Let denote the multivariate linear model of analysis for q traits with: y the vector of N observations for all traits; b the vector of NF fixed effects (including any linear or higher order covariables); X the N x NF incidence or design matrix for fixed effects with column rank NF * ; u the vector of all NR random effects fitted; Z the N x NR incidence matrix for random effects; and e the vector of N random residual errors.
Assume that which gives Define E with elements e ij (i < j = 1,..., q) as the symmetric matrix of residual or error covariances between traits. Correspondingly, let T = {t ij } of size rq x rq denote the matrix of covariances between random effects where r denotes the number of random factors in the model (apart from residual errors). Assume there are r * < r(r&mdash; 1)/2 covariances between the r random factors. The total number of parameters to be estimated is then s = q(q + 1)(r + 1)/2 + q 2 r * .

The likelihood
As outlined previously (Meyer, 1989), the natural log of the likelihood function to be maximised is I assuming y has a multivariate normal distribution with mean Xb and variance V, where X * (of order N x NF * ) is a full column rank submatrix of X . Alternatively, where C is the coefficient matrix in the mixed model equations (MME) pertaining to (1) (or a full rank submatrix thereof), and P is a matrix, As described by Graser et al (1987), the last 2 terms required in (3), logICI and y'Py, can be evaluated simultaneously in a general way for all models of form (1).
This involves application of Gaussian Elimination with diagonal pivoting to the matrix which is the coefficient matrix in the MME, augmented by the right hand sides and a quadratic in the data vector (see Meyer (1989) for further details).

Calculation of log IRI
Let y be ordered according to traits within individuals, and assume error covariances between traits measured on different animals are zero. This results in R being block-diagonal for animals, with ND the number of animals which have records, and F + denoting the direct matrix sum (Searle, 1982).
For q traits, there are a total of GV = 2! -1 possible combinations of traits recorded. For q = 2, for example, W = 3 with combinations trait 1 only, trait 2 only and both traits. For animal i which has combination of traits w, R i is equal to E w , the submatrix of E obtained by deleting rows and columns pertaining to missing records. This gives where N,,, represents the number of animals having records for combination of traits w. Hence, evaluation of logIRI requires calculation of W log determinants of matrices E u , of size q x q or smaller.

Calculation of log[GI ]
As for the univariate case, logIGI depends on the random effects fitted and their covariance structure. Corresponding conditions to those discussed by Meyer (1989) apply to allow logIGI to be evaluated 'indirectly', ie without the need to set up and perform numerous Gaussian Elimination steps for a large matrix, to obtain its log determinant, similar to the procedure required to determine logIC1.
Consider the simplest case with animal's additive genetic merit, denoted by the vector a of length q x NA (with NA the total number of animals), as the only random effects in the model, ie r = 1 and s = q(q + 1). Assume effects are ordered according to traits within animals, and let a i be the subvector for the ith animal with covariance matrix T. Here, T has dimension q x q and its elements are the additive genetic covariances. Then, where A is the numerator relationship matrix between animals, and x denotes the direct matrix product (Searle, 1982). As for the univariate case, loglal does not depend on the parameters to be estimated and is not required in order to maximize log C.
Extend the model by allowing for a second random effect for each animal, m with subvectors m i (i = 1, ..., NA), which has the same correlation structure between animals as a. A typical example is a maternal genetic effect. Then r = 2 and T is of size 2q x 2q, with s = 3q(q + 1)/2 if a and m are assumed uncorrelated (r * = 0) and s = q(5q + 3)/2 otherwise (r * = 1). Assuming u is ordered according to effects within animals, ie u' = (a' mi ...a' m NA ), (7) and (8) hold, with T = var{(ai mi)}.
Often we want to include an additional random effect, uncorrelated to the other random factors, in the model of analysis. This could be a common environmental effect, such as a litter effect in the analysis of pig data, or the permanent effect due to an animal which is not additive genetic in a multivariate repeatability model. Let this effect, with NC levels per trait, be denoted by c. T can then be partitioned into diagonal blocks T A (for additive genetic effects) and Tc (for the additional uncorrelated effect). Correspondingly, where D, most commonly taken to be the identity matrix, describes the correlation structure amongst levels of c. Again, loglal and logIDI are constants and do not need to be evaluated. As above, (9) and (10) also apply, with an appropriate modification fo T A , if a second random effect, m, is fitted for each animal. Extensions to other models, for instance including several additional factors c, can be derived accordingly.
Maximizing the likelihood Different strategies to locate the maximum of the log likelihood function, or equivalently the minimum of -2log 1:, with respect to several parameters have been examined by Meyer (1989). The so-called Simplex procedure of Nelder and Mead (1965) proved to be robust and easy to use and was chosen for the current application, together with the associated convergence criterion of the variance of function values in the Simplex. As for univariate analyses, a step size of 20% was used throughout in setting up the initial Simplex. In particular, this procedure allow constraints on the parameter space to be imposed simply by assigning a very large value to -2 log £ for parameter vectors out of bounds. This is especially important for multivariate analyses, as estimated genetic covariance matrices have a high probability of being non-positive definite, increasingly so with the number of traits considered and the magnitude (absolute value) of genetic correlations (Hill and Thompson (1978)).
To illustrate the convergence behaviour of the maximization procedure, data were simulated, sampling from a multivariate normal distribution, consisting of records for 2 traits for each of 4000 animals, assumed to be offspring of 500 base animals, 100 sires mated to 4 dams each. Fitting an overall mean as the only effect and families (NC = 400) as an additional random effect, this gave M of size 9 803 with 80 903 non-zero off-diagonal elements. Using the population values for additive genetic variances (QA2! = 50, 32, 80 for i <_ j = 1, ... , q), variances due to family or litter effects (uc i j = 12, 10, 60) and error variances (o 'Etj = 40, 100, 260) as starting values, 1321 Simplex iterates involving a total of 2437 likelihood evaluations were carried out, at the end of which the variance of function values in the Simplex (-2log G) was reduced to 1.06 x 10-7 . The behaviour of the multi-dimensional search is illustrated in figures 1 and 2, showing changes in estimates of variance components and associated log likelihoods for successive Simplex iterates. For 'good' starting values, increases in log likelihood after 190 iterates, equivalent to 300 function evaluations and V(-2log C) = 1.9 X 10-4 , were only very small. Estimates of variances, though, changed until changes in log£ were of order 10-3 or less (see inset of figure 2). About 1 000 likelihood evaluations were required to reach that stage of convergence.

SPECIAL CASES
Traits measured on difl'e. ent animals Specialized multivariate REML algorithms using information from derivatives of the likelihood function have been suggested for models with one random factor for various special cases. Schaeffer et al (1978) considered the situation where traits were measured on different animals, so that residual covariances were zero. The algorithm presented here is adapted! for this case simply by reducing the vector of parameters to be estimated accordingly. For the analysis under an animal model, however, it has to be borne in mind that with a record for one trait only for each animal, information on genetic covariances is available only through relatives with records for the other trait(s). Hence estimates are likely to be subject to large sampling errors unless animals are highly related or data sets are large.

Equal design matrices for all traits
If the design or incidence matrices in the linear model are equal for all traits, (1) can be rewritten as assuming, in contrast to the above, that records are ordered according to animals within traits. Let n = N/q, nr =NR/q and n f =NF/q denote the number of records, random effects levels and fixed effects levels per trait, respectively. X o of order n x n f and Z o of order n x nr are then the design matrices for fixed and random effects for each trait, while Iq denotes an identity matrix of order q.
Since all animals have records for all traits, Consider now a decomposition of the residual covariance matrix into This gives with Q-T = (Q')-l . Using that where W and rcw stand in turn for X o and nf, and Z o and nr, it can be shown that R-1 can be factored from the coefficient matrix in the MME.
Transforming the data vector to the augmented MME, (4), can be replaced by with G * = (Q-' x I!r)G(Q-T x I nr ). Absorbing all rows and columns of M * into y * 'y * then directly yields the quadratic in the data vector required in (3), ie The log determinant of the coefficient matrix, logIC1, calculated when operating on M * rather than M, however, has to be adjusted for the fact that R-1 has been factored out. with nf * = NF* /q, C * = (Q' x 1,, f +,,,)C( Q x I nJ + nr ) and P * = (Q-T X I n )P(Q-1 x In). For ease of presentation, (14) has been written for the vector of random effects assumed to be ordered according to efFects within traits. For computational purposes, however, some re-ordering would be advisable in order to minimize 'fill-in' during the absorption steps. Ordering effects within animals and animals according to date of birth, for instance, would result in equations for the youngest animals to be eliminated first.
An obvious choice for Q would be the Cholesky decomposition of E. Using (14) rather than (4) then reduces computational requirements in setting up the augmented coefficient matrix. However, a large proportion of the off-diagonal elements thus 'saved' initially arise subsequently as fill-in during the absorption steps due to covariances between traits for random effects levels. Alternative forms of Q exist though which yield G * with more or larger diagonal submatrices, ie considerably less off-diagonal elements, and are thus computationally advantageous.

Canonical transformation
The use of a canonical transformation of the data to estimate variance components by REML for multivariate linear models with one random factor and equal design matrices for all traits, has been considered by a number of authors. Estimation procedures have been described for expectation-maximization (EM) type algorithms (eg Taylor et al, 1985;Smith and Graser, 1986) and Fisher's method of scoring (Meyer, 1985).
For q correlated traits, this transformation yields a set of q new traits, so-called canonical variables, which are both genetically and phenotypically uncorrelated. Hence a multivariate analysis can be carried out as a series of q corresponding univariate analyses which results in a substantial reduction of computational effort; see references given for further details.
Consider an AM without additional random effects, ie u = a, T = Var(a i ) and G = T x A, with equal design matrices for all traits. Let A i for i = 1, ... , q denote the eigenvalues of E-1 T and S the corresponding matrix of eigenvectors. Then and ie E = S-1 S-T , Q = S-' and G * = Diag Pi A}. Hence S, with elements Sij , describes the canonical transformation.
For y * = (S x I,,)y, then variance matrix of the transformed data vector, V * = Var(y * ), the coefficient matrix C * and projection matrix P * (based on (14)) are block-diagonal for traits (Meyer, 1985). Consequently, M * can be partitioned into q independent matrices M2 : where y* is the subvector of y * for trait i.
Clearly, each of these submatrices is equivalent to the augmented coefficient matrix in a univariate analysis of a trait with heritability A,/(A : + 1). Absorbing rows and columns 2 to k = n f + nr + 1 in (18) (skipping rows with zero pivots) into YT ' yi then yields a quadratic y( PTyi and determinant 10glCT where Pi and Ci are the submatrices of P * and C * , respectively, for the i-th trait. Quantities required in (3) are then obtained by summing over traits Alternatively, the log likelihood can be evaluated as the sum of likelihoods for univariate analyses on the canonical scale together with an adjustment for the transformation. with q Noting that Y1 = L 8ik Yk , it follows that k=l ie that an explicit transformation of the data vector is not required. Replace Y i' y* in (18) by the q x q matrix of sums of squares and crossproducts between traits on the original scale, Y = ly'y?), 1, and expand the first row and column correspondingly, ie replace X'y* by q columns X!y! , and Z'y* by columns Zoy!, for k = 1, ... , q.
Absorbing rows and columns q + 1 to q + n f + nr into the first q rows and columns then yields q(q + 1)/2 terms y!Pi y&dquo;l, and yi p i Y i can be calculated according to (22).
For univariate analyses, the error variance can be estimated directly from the residual sums of squares, ie the quadratic in the data vector at the end of the Gaussian Elimination steps (Graser et al, 1987 (1987) and Harville and Callanan (1990) parameterised the analysis to the error variance and its ratio(s) with the other component(s), and maximised the 'concentrated' log likelihood, log GC with the respect to the latter only. At the maximum of log LC, the estimate of the error variance is equal to its (unconditional) REML estimate and estimates of the other components can be obtained from it and the REML estimates of the variance ratios.
Analogously, we can parameterise the multivariate analysis to E and a function of E and T, F = f (E, T). Corresponding to the univariate case, an obvious choice is the generalized ratio of T and E. Log GC is then maximised with respect to the elements of F and, at the maximum, T is obtained from F and E via the inverse of f (E, T). This reduces the dimension of search by q to the q 2 elements of F. As T is, by definition, symmetric and is required to set up M, F should be chosen so that it is ensured that T is symmetric. A possible strategy is to force F to be symmetric, eg F = E-! i-2 TE-or F = L-1 TL-T with LL' = E, and to maximise with respect to the q(q + 1)/2 elements of the upper triangle of F only thus reducing the search by a further q(q &mdash; 1)/2. So far, the canonical decomposition has only been considered as a means to reduce the number of off-diagonal elements in M and thus the computational requirements in each likelihood evaluation. However, it can be regarded as an alternative parameterization in its own right which effectively reduces the dimension of search to the number of traits. Instead of estimating the q(q + 1) elements of T and E, we can estimate the q eigenvalues A i of E-1 T and q 2 elements si! of S. This is a one-to-one transformation so that maximizing log £ with respect to elements of Diag f A i and S is equivalent to maximizing with respect to elements of T and E.
Using this parameterization, maximization can be carried out as a nested 2-step procedure, analogous to the approach taken by Smith and Graser (1986) to solve a 2-dimensional problem performing 2 1-dimensional search steps in sequence. From (17), (21) and (22) it follows that (20) can be rewritten as The first 2 terms in (23) depend on the A i only. Hence for given values of Ai, the log likelihood can be evaluated for different matrices S. This allows an 'internal' search to be performed to determine S which maximizes log ,C for these A,. In general, this search involves q 2 parameters. Computationally it is comparatively undemanding though: once quantities loglc*l and y[Pgy m (i = 1, ... , q) have been determined, each likelihood evaluation for different values of S merely requires scalar manipulations or matrix operations proportional to the number of traits.
This 'internal' search is repeated for each 'external' search step, ie each likelihood evaluation required in maximizing log C with respect to the A i . Since the dimension of the external search is reduced to q, the number of likelihood evaluations, or equivalently the number of computationally demanding Gaussian Eliminations, required is lowered substantially. Overall this yielded to a considerable decrease in computational resources needed. After REML estimates A, have been determined with the pertaining matrix S, estimates on the original scale are obtained by reversing the canonical decomposition. This parameterization also has been considered by Juga and Thompson (1990) for 2 traits only, where additional knowledge about the structure of S could be exploited to reduce the dimensionality of the internal search.

NUMERICAL EXAMPLE
Data from a selection experiment in mice by Sharp et al (1984) are given in table I. Records are body weight at 6 wk of age and adjusted food intake from wk 4 to 6 (adjusted for weight at 4 wk by within-family, within-sex regression) for 3 generations of mice selected for increased appetite, ie adjusted food intake. In addition, pedigree information for parents of the first generation was available and utilized, yielding a total of 339 animals in the analysis, 284 with records and 55 which were parents only. These data were analyzed for a multivariate animal model with animals as the only random effect (model 1), and for an AM including litters as an additional, uncorrelated random effect (model 2). Fixed effects fitted were generations (3 levels), sex (2 levels) and litter size (7 levels) for each trait, where the first levels of the latter two were set to 0 to account to X not of full column rank.
Analyses for both models were carried out for data of a general structure (strategy I: M of form (4)), and accounting for the fact that design matrices for both traits were equal (strategy II: M of form (14)), utilizing the canonical decomposition of the error (E) and additive genetic covariance (T A ) matrices to obtain a transformation of the data which diagonalized the submatrix of G * pertaining to animals. In addition, for the analysis under model 1, strategy III used the 'concentrated' likelihood approach, estimating residual covariances directly, while strategy IV used the parameterization to characteristics of the canonical transformation with a nested 2-stage search procedure (M of form (18) for both).
Preliminary univariate analyses yielded estimates of variance components which were utilized as starting values for the multivariate analyses. Values used were U , 4 ii = 4.7, 4.0, 8.3 and {J Ei j = 2.5 3.0, 12.9 for model 1, and QAi! _ 4.9, 1.0, 6.0, {J Cij = 1.5, 1.0, 3.0, and (J&dquo; Eij = 1, 7, 1.0, 12.6 for model 2, for i <_ j = 1; 2, respectively. Characteristics of the augmented coefficient matrix M, components of the log likelihood for the starting values, time required per evaluation of the log likelihood, and estimates of variance components are summarized in table II. While iterates performed were identical for strategies I and II, the canonical transformation of the data reduced computing time required by more than 50%. For strategy III, a parameterization was chosen so that F was symmetric (F = L-1 TL-T with LL' = E as described above). Maximizing the likelihood with respect to 3 instead of 6 parameters decreased the number of iterates and likelihood evaluations needed considerably.
However, for this parameterization the search procedure appeared to be less successful in determining the maximum of log C accurately. Restarting the search from the 'converged' values eventually led to the same estimates as for strategies I and II, but this required several restarts and an excessive number of likelihood evaluations, indicating problems of numerical nature, so that, on the whole strategy III appeared less advantageous than II. Strategy IV performed markedly better. The internal search for S which maximized log £ for given values of A i increased the time per likelihood evaluation by * 5%, but reduced the number of Gaussian Elimination steps to a fraction of those required by the other strategies and reduced the total time for the analysis to < 3% of that required when ignoring the special structure of the data (strategy I).

CONCLUSIONS
The derivative-free approach is well suited to the multivariate estimation of variance components by REML. It provides a flexible algorithm which can be adapted to a range of models, differing in the random effects fitted and assumptions about covariances between them, of interest for the analysis of animal breeding data. As for multivariate REML algorithms described previously though, computational requirements are high. Not only is the time required for each likelihood evaluation increased considerably in comparison to the univariate case, but the number of parameters to be estimated'simultaneously rises markedly with the number of traits.
This causes a dramatic increase in the number of search steps needed to locate the maximum of the likelihood, with a correspondingly high number of likelihood evaluations required. Components of the likelihood comprise the determinants of the covariance matrices of the random effects fitted and of the residual errors, as well as the determinant of the coefficient matrix in mixed model equations and a quadratic function of the data vector. The first 2 can often be obtained indirectly, requiring only the determinants of matrices of variance components of size qr x qr, q x q or less. Calculation of the latter 2 terms, however, requires a matrix of size proportional to the number of traits and the number of levels of fixed and random effects in the model, to be set up and 'swept out'. This matrix is typically very large but also sparse so that sparse matrix techniques can be employed successfully to handle analyses involving several thousand effects (see Graser et al (1987) or Meyer (1989) for details). Computational demands to absorb all rows and columns of the augmented coefficient matrix into the first element are determined not only by the size of the matrix but, more importantly, by the number of non-zero off-diagonal elements. For multivariate analyses, this number is generally large due to contributions from covariances between traits. If design matrices are equal for all traits, a canonical transformation of the data can be utilized to eliminate a considerable proportion of these off-diagonal elements, thus reducing computational effort required for each likelihood evaluation markedly. Furthermore, if the model of analysis contains only one random factor, we can reparameterize our analysis from the covariance components to the eigenvalues and elements of the corresponding eigenvectors arising in the canonical decompositon. This allows maximization to be carried out as a nested 2-step procedure, reducing computational requirements dramatically.
Further research is required to investigate potential improvements of the algorithm presented. There may be alternative parameterizations which give quicker convergence, ie for which the maximum of the likelihood can be determined more easily. The Simplex method has been reported to perform well for few dimensions but to be progressively less successful as the dimension of search increases (Box, 1966). Use of a NAG library subroutine (E04CCF) which incorporates modifications of the Simplex procedure, as suggested by Parkinson and Hutchinson (1972), to improve its efficiency failed to reduce the number of likelihood evaluations required for the cases examined (models 1 and 2, 2 traits, general data structure). Other optimization procedures may require less iterates and thus reduce the number of likelihood evaluations necessary.
In particular, it should be noted that by expanding the quadratic in the data vector to the corresponding matrix of weighted sums of squares and crossproducts for multiple right hand sides, as described in the Canonical transformation section, each Gaussian Elimination step can yield several points on the likelihood surface. While the Simplex method used here is sequential, an alternative search strategy might exploit this additional information and thus reduce the heavy computational demands of a multivariate animal model analysis.