Estimating variances and covariances for bivariate animal models using scaling and transformation

Cet article traite de l'estimation des parametres genetiques dans un modele individuel a 2 variables. On montre que, dans beaucoup de situations, le temps de calcul peut etre diminue en standardisant les caracteres et en les rendant independants par une transformation, ce qui permet une maximisation sur un espace de parametres de moindre dimension. Un exemple numerique particulier montre que le temps de calcul est divise par 8. Les avantages des differents modeles transformes sont presentes


INTRODUCTION
There is often the need for estimation of genetic and environmental variances and covariances from animal breeding data. For example, to consider the responses from alternative selection schemes or to efficiently predict the genetic merit of animals. Usually in animal breeding schemes animals are selected on some criteria and so methods of analysis are needed that take account of selection. Maximum likelihood (ML) methods have been shown to take account of selection in univariate and multivariate settings (for example, Henderson et al, 1959; Thompson, 1973) if the records on which selection is based are included in the data. If this condition is only partially fulfilled, ML methods are less biased by selection than analysis of variance methods (Meyer and Thompson, 1984). A restricted or residual maximum likelihood (REML) procedure uses the likelihood of residuals and has the advantage that it takes account of the estimation of fixed effects when estimating variance components and corrects for degrees of freedom (Patterson and Thompson, 1971).
In general these methods are computationally expensive requiring the solution and inversion of equations of the order of number of animals x numher of traits, but there are simplifications when all the traits are measured on all auimals and the same fixed effect model is applied to all traits (Thompson, 1977: Meyer, 1985. In the past, estimation methods have used equations based on first and second differentials, but recently Graser et al (1987) and Meyer (1991) have shown how the likelihood can be calculated recursively in univariate and multivariate settings and advocated the direct maximization of the likelihood. Meyer (1991) also showed that the computational effort can be reduced if, given some of the parameters, it is relatively easy to maximize the likelihood for the rest of the parameters. The maximization then has 2 stages and the dimension of search is reduced. Meyer (1991) showed the advantage of these techniques for models with equal design matrices and 3 variance components. In the recent analysis of data from a pig nucleus herd (Crump, 1992) we required to estimate genetic correlations between male and female performance when the 2 sexes were reared in different environments and between growth and reproductive traits, and these models do not directly fit into Meyer's algorithm.
In this paper we show how Meyer's method can be extended to fit these and other models, by the introduction of scaling and transformation models. The models considered included those for bivariate traits when different fixed effect models are appropriate to each trait and to models with equal design matrices with more than 2 traits.

ESTIMATION
We will consider in turn estimation for 3 models.

Model 1
The first and simplest model is of the form with and var(e l ) = 1 0 ';1 and var(e 2 ) = I Q e 2 and e l and e 2 are uncorrelated, and the fixed effects #i and (3 2 have no elements in common, and the random effects u i , u 2 , e l and e 2 normally distributed. The vectors Yl and y 2 are of length n l and n 2 and matrices X l , X 2 , Z l and Z 2 are of size n l x t l , n 2 x t 2 , n i x m and n 2 x m. Our motivation was a case when a trait Yl was measured on males and y 2 was measured on females and there was interest in the genetic covariance between traits (0' A12 ) and there was no environmental covariance between the records. This model was analysed by Schaeffer et al (1978) using a method that involved calculation of the second differentials of the likelihood and inversion of a matrix of order 2m for eachiteration.
If the 2 residual variances are homogeneous then the univariate method used by Meyer (1989) can be used if the model is written in the form with If, however, the residual variances are not homogeneous the situation is slightly more complicated. If ue2, > U;2 the vector of residual can be written as and if the elements of the first vector have variance 0 '; 2 and the non-zero elements of the second vector variance 0 ';1 -0';2 then it is seen that an extra component could be introduced (if 0' ; 1 > 0 ';2) and this component estimated, but this will increase the dimension of the search by 1. We now develop a method that does not increase the dimension of search. It is useful to think of a composite matrix of y i and y 2 , of the form Y! _ [yi Y2] with so that the vector of observations is y = yi + y* 2 * We wish to maximize the log-likelihood of error contrasts (Patterson and Thompson, 1971). with í3 = (X'V-1 X)-1 X'V-l y and V is of the form R+ZGZ' = R+Z(AxT A )Z' with x denoting direct product and and The terms in [1] can be written in terms of H, 0 &dquo;; 1' Qe2 and Y c as follows: with !,, a matrix of effects for the 2 traits yi, y* and (Y c -X(3!)'H-1 (Y! -X#!) a 2 x 2 matrix of sums of squares and cross-products of residuals for these 2 traits. By using these relationships, and the formulae for log-likelihood of a model with variance matrix H developed by Graser et al (1987), it can be shown that logL can be written using where df i = n 2 -t i (i = 1, 2), C is the coefficient matrix of mixed-model equations with variance matrix I + ZG,,Z' and P = H-1 -H-1 X(X'H-1 X)-1 X'H-1 . The terms in C and P can be calculated in an analogous way to Graser et al (1987) by the formation of M, an augmented matrix of mixed-model coefficients and righthand sides of the form: with log ICI associated with the pivots involved in the Gaussian elimination of the terms associated with X and Z, and is the (2 x 2) residual matrix after elimination of the term associated with X and Z. The term log [G s[ can be written, using properties of direct products (Seaxle, 1982), as m log I T AS + 2 log IAI, with the last term independent of the variance parameters.
For given T AS , log L can be written as a function of terms of M, a el 2 and 0 '; 2 2 using Differentiating this log-likelihood with respect to 0 ';1 and Q e 2 , noting that G s and C are independent of 0 ';1 and 0 '; 2' and equating to zero gives with the ratio r = Qe i/ Qe2 satisfying the equation with df * = 1/df 2 -1 /df i and so and hence ufli and CT ;2 can be found from [4] and [5] given the 3 parameters in T AS -Substitution of the values for U2 e and CT ;2 in [3] gives log L for given T AS .
Hence the ML estimates could be found with maximization over the 3 parameters in T AS . Essentially the structure of the model has allowed the scaling of Yl and y 2 by < 7ei and CTe2 to be carried out independently of T AS .

Model 2
A natural extension of Model 1 is to allow a non-zero covariance matrix between the residuals e l and e z , B CTe12 with, for simplicity, the (n l x n 2 ) matrix so that the first n 2 animals are measured on Yl and y 2 and this will be denoted Model 2. There are obviously several ways of writing this model. We give below a form of this model that has 2 properties. Firstly, this form allows univariate algorithms to be used to calculate likelihoods. This is achieved by introducing uncorrelated effects, Ub , to help model covariance effects. Secondly, in order that scaled versions of Yl and y 2 have homogeneous contributions for e l and e 2 , as in model 1, but also for Ub , scaling factors, a and b for the contributions of Ub to y i and y 2 are introduced. This model has the general form: with It should be noted that the range of maximization of o, bi 2 is from minus infinity to infinity rather than 0 to infinity in order to allow negative environmental variances.
This only needs minor changes to the algorithm. The term log ! G ! + log ! C ! is normally found from, say, E log g i +log c 7 , where the terms g i and c i are positive. If ufli is negative then the term log [ G + log [ C is still positive definite and therefore there are an even number of negative terms in g i and c i . Therefore log G ! + log ! C ! I can be calculated as E log 1 9i1 +2: log I c i The equivalent residual variance structure has 2 equivalent formulations so the relationships between the parameters are: Any non-zero value for a and b can be used but if a =  (Thompson, 1977;Meyer, 1985;Taylor et al, 1985). Meyer (1991), for the case of additive and residual matrices, has recently emphasized a 2-stage maximization procedure, using S, a p x p transformation matrix, and 71, a p x p diagonal matrix of canonical heritabilities. For a given value of 71, the log likelihood can be written in a form analogous to !2!, with the use of p matrices of the form of M and with Y! an n x p matrix Y with the ith column of Y the ith variate y i . Given 71, the log-likelihood maximization in terms of S is computationally easier, and in fact when p = 2 there is an explicit estimate of S in terms of the residual matrices (Juga and Thompson, 1992). On a small numerical example Meyer (1991) has reduced computation to a half by such a technique and one would expect larger savings as p increased. For more than 2 sets of components, there is a natural extension of Meyer's method and the method used for Models 1 and 2. To illustrate the method suppose 3 symmetric (2 x 2) matrices E, T A and T B require estimation and the variance matrix is With 2 components a transformation to simultaneously diagonalize the variance matrices is available, but not generally for more than 2 components. However, there is a transformation S(= Q-1 ) such that SES' = I, ST B S' = T BS and ST A S' = T AS with T BS a diagonal matrix. When p = 2, the 3 x 3 = 9 parameters in E, T A and T B can therefore be written in terms of the 4 parameters in S, 2 in T BS and 3 in T AS . The calculation of the log likelihood is now based on a composite p x p 2 matrix Y c . This matrix has p 2 variates formed from the direct product of the p x p identity matrix and Y, the n x p matrix of observations with each column representing a trait. The log likelihood can be calculated using a formula similar to !2!, and it can be seen that Y c includes the variates used in Models 1 and 2 and expands the matrix Y used by Meyer (1991) when estimating 2 components.
The log likelihood in this case is similar to [2] of the form: with G s = A x T AS B s = B x T BS and !C! found from [7] with I(1/ Qb ) replaced by B S 1 . The (1 x p 2 ) vector s' is found by stacking the rows of S into a vector, ie s!i_1!P+! = Sij. The term Y'PY, = U can be found from the residual sum of squares and cross-product residual matrix for the p 2 variates in Y c after adjusting for all the effects.
Differentiation of [10] with respect to S shows that the estimates of S that maximize [8] for given values of T AS and T BS satisfy with si! _ ( S -') ij .
The appendix shows that S can be found as the solution of an eigenvalue problem if p = 2. Hence if p = 2 maximization can be reduced from considering 9 parameters to a search over the 5 dimensional space of T AS and T BS . Meyer (1991) illustrated her methods with data from a selection experiment of Sharp et al (1984) and fitted a 3-component model to bivariate data. The likelihood was maximised over a 9 dimensional space and required 722 iterations to reach convergence. Using the same starting values and convergence criteria the 5 dimensional strategy outlined in this section reached convergence in 89 evaluations.

DISCUSSION
It has been shown, for a variety of models, how scaling and transformation can reduce the considerable effort in finding maximum likelihood estimates, especially for multivariate models. Another advantage is that the transformation can suggest more parsimonious models. For example, one could consider a constrained model of the form:

SES' = I, ST BS S' = T B and ST AS S' = T A with T AS and T BS diagonal,
that is fitting a model with underlying uncorrelated traits that are transformed using Q = S-' to form the p measured traits. This model has the advantage of having fewer parameters (p(p -E-2)) and that the likelihood, for given T BS and T AS , can be calculated in about (1/p 2 ) of the time of the unconstrained model because the likelihood of each underlying trait can be calculated separately. For Meyer's example an underlying uncorrelated model converged in 50 iterations. The difference in 2 log L was 0.94 suggesting that an underlying independent model would adequately fit the data. Lin and Smith (1990)  where this procedure has very high efficiency. The strategy outlined gives a logical method for choosing the transformation S.
In the multivariate case S can be found, for given T AS and T BS , by derivativefree methods but the explicit solution for p = 2 has advantages. In fact for the numerical example above at the maximum likelihood estimates for T AS and T BS , 2 of the solutions of [8] for S correspond to local maxima and 2 to saddlepoints.
Explicit solutions for S if p > 2 were not obtained and the best computational strategy in terms of derivative-free methods or iterative use of [8] or solutions for p = 2 deserves further investigation.
The motivation has been to reduce the computation in derivative-free estimation procedure, but the idea of scaling and transformation carries over to other methods of estimation, for example, using first and/or second differentials of likelihoods.
Formulae for derivatives of the scaled parameters T AS , T BS , are easily derived if not easily calculated, for a given transformation matrix S. The arguments in this paper show how to calculate derivatives for any S. This allows for any T AS and T BS , S to be found using the derivatives calculated at this optimal S. The efficiency of this technique will depend on the structure of the data and the correlation of the parameters and deserves further investigation. If, after fitting this underlying model, there is interest in getting some information on covariances between the underlying traits but full p trait evaluation is impractical, then use of the Thompson and Hill (1990) procedure to estimate covariance parameters from analysis of sums of approximately independent traits is an attractive option.