Univariate and multivariate parameter estimates for milk production traits using an animal model. II. Efficiency of selection when using simplified covariance structures

The effect of simplifying covariance structures for milk, fat and protein yield in lactations 1-3 on accuracy of selection for lifetime yield was investigated using selection index theory. Previously estimated (co)variances were assumed to be the true population parameters. A modified repeatability model, ie assuming a genetic correlation of unity between performances across lactations but allowing for different heritabilities and variances in different lactations, was used to investigate the efficiency of selecting sire on their progeny performance. For most combinations of heritabilities in first and later lactations, selecting sires using the repeatability model was nearly 100% efficient in terms of accuracy of selection compared to a general multivariate model. The predicted response to selection using a modified repeatability model was approximately 10% too high. It was found that applying a transformation to make new traits in lactation 1 uncorrelated at

Résumé -Utilisation du modèle animal pour l'estimation des paramètres univariates et multivariates concernant les caractères de production laitière. II. Efficacité de INTRODUCTION In a previous paper (Visscher and Thompson, 1992), genetic and environmental parameters were presented for milk yield (M), fat yield (F) and protein yield (P) in lactations 1-3. If the breeding goal for dairy cattle breeding is some (linear) combination of these production traits in all lactations, an optimal way to combine all available information to predict breeding values is a multivariate (MV) BLUP analysis. For a national animal model (AM) breeding value prediction, however, a general MV BLUP analysis is computationally not yet feasible. In practice, therefore, simplified assumptions are made when predicting breeding values for large populations using an AM. In dairy cattle AM prediction, milk, fat and protein yield are usually evaluated separately using a repeatability model with some scaling for observations in later lactations to account for heterogeneity of variance across lactations (Wiggans et al, 1988a, b;Ducrocq et al, 1990;Jones and Goddard, 1990).
Selection index theory is used to investigate the loss in accuracy of selection when simplified covariance structures are used to predict breeding values. A second aim is to investigate how to reduce the dimensionality of the above MV prediction problem to a manageable size without a great loss in accuracy.

MATERIAL
As reported previously (Visscher and Thompson, 1992), the 9 x 9 genetic covariance matrix of M, F and P in lactations 1-3 was found to be not positive definite.
To create a (semi) positive definite matrix the single negative eigenvalue was set to 10-', and covariance matrices were recalculated. These matrices were then used for subsequent (index) calculations. Without loss of generality, phenotypic variances for M, F, and P in lactation one were set to 1.0. The parameters are summarised in table I. An alternative way to summarise the covariance matrices is to calculate eigenvalues and eigenvectors of the matrix P-1 G. Eigenvectors then represent linear combinations of the original traits which are independent of each other at the genetic and phenotypic level (Hayes and Hill, 1980;Meyer, 1985).
The corresponding eigenvalues of the newly created traits, called canonical variates (eg Anderson, 1958;ch 12), have been termed canonical heritabilities (Hayes and Hill, 1980;Meyer, 1985). Table II shows the eigenvalues and eigenvectors of matrix P -1 G, where P and G are the 9 x 9 phenotypic and genetic covariance matrices for milk, fat and protein yield in lactations 1-3 (M 1 F 1 P 1 M 2 F 2 P 2 M 3 F 3 P 3 ), calculated from parameters in table I. A number following M, F or P indicates the lactation number. The smallest eigenvalue from the original P-'G was -0.03, and the corresponding eigenvector was: Therefore, one canonical trait, {-0.04M 1 + 0.15F i .'.. + 0.09P 31 , was found to have a heritability of -0.03. The negative eigenvalue resulted mainly from the contrast of individual yield traits in lactations 1, 2 and 3 ((M 2 -M 3 ) + (F 1 +F Z -F 3 ) + (P 2 -P d ). After setting the only negative eigenvalue of the original matrix G to &dquo;zero&dquo; (10-6 ), the corresponding eigenvector for the newly formed matrix P-'G represented mainly the contrast between yield traits in lactations 2 and 3 (see last row of table II). This was expected, given the very high genetic correlations for yield traits in lactations 2 and 3 (see table I). ANALYSIS Selection index theory Let: u = q x 1 vector of breeding values of q traits; a = q x 1 vector of (marginal) economic values for q traits; H = u'a = aggregate breeding value; x = p x 1 vector of sources of information on an individual (for example phenotypic observations, daughter averages, predicted breeding values); b = p x 1 vector of index weights; I = b'x = index value used to predict H ; P = v(x); G = cov(x'u) and the symbol A added to a scalar or matrix indicates an estimate thereof.
For index calculations standard results were used (see, for example, Sales and Hill, 1976).
Equations for the responses are using b = P-'Ga for the optimal index, and b = P-1 Ga for an index using estimates of P and G : Where R, R and R * are the optimal, predicted and achieved response to selection in the aggregate breeding value (=H) respectively, expressed as a ratio of the selection intensity.
If new traits are created which are linear combinations of the observations, y = W'x, ie variables in vector y are a linear combination of the variables in vector x, then It was assumed that the marginal economic value for any of the production traits in later lactations was the product of the relative expression of that trait and the phenotypic standard deviation, thus reflecting survival to later lactations and the economic importance of a larger standard deviation (and mean) in later lactations. a z = e ZQi , where a i , s i and a j are relative economic value, relative expression and standard deviation for lactation i. Relative expression was assumed to follow a geometric series, Ei = (0.8)?!! , assuming a relative survival of 80% from one lactation to the next and setting the expression in lactation one to 1.0. Phenotypic standard deviations were assumed to be 1.0, 1.20 and 1.25 for lactations 1-3, and 1.25 for all subsequent lactations (from table I). If it is further assumed that the covariance of any observation with the breeding value in lactation 3 is equal to the covariance of that observation with breeding values in later lactations, ie the corresponding rows of matrix G are identical, then the economic values Single trait multiple lactations considerations Meyer (1983) investigated the potential gain in response to selection from including multiple lactation information on progeny of sires for sire evaluation. The accuracy 1 1 = phenotypic index, ie sources of information are phenotypic observations on individuals; 1 2 = sire index; sources of information are daughter averages of sires in different lactations; 1 3 = cow index; sources of information are the predicted breeding value (index) of the cow's sire and dam and the cow's own records.
The largest reduction in response to selection is expected when selection is across age classes, eg across cohorts with different amounts of information, since an improper weighting of later lactations then would have the largest impact. In the following examples, only 2 lactations and 2 cohorts were considered, but the results are thought to be similar for more lactations (given the very high genetic correlation between second and later lactation yields) and more age groups. The genetic means for the cohorts were assumed to be zero, hence the consequences of the error in predicting genetic trend were ignored. (A thorough study of long-term losses in response through incorrect estimation of genetic trend, thus creating an suboptimal ranking of young vs old animals, was outside the scope of this study.) For each index there were different amounts of information on the 2 cohorts, I 1 : Cohort 1: phenotypic observation in lactation 1 Cohort 2: phenotypic observations in lactations 1 and 2 I 2 : Cohort 1: first lactation daughter average based on n l daughters Cohort 2:n 1 first lactation daughter records and n 2 second lactation records (n 2 < n l ) I 3 : 1 Cohort 1: sire index based on n first lactation progeny, dam index based on sire index of dam and dam's first lactation record, cow's first lactation record Cohort 2: sire index based on n l + n 2 progeny records, dam index based on sire index of dam and dam's records in first and second lactation, cow's first and second lactation records. Parameters used for the example with 2 lactations and 2 cohorts were (from table I): a' = !15!; 7-g = 0.85; 7-p = 0.55; phenotypic variances were 1.0 and 1.45 and heritabilities were 0.40 and 0.30 for first and second lactations respectively; n l = 50; n 2 = 35. The &dquo;estimated&dquo; (assumed) parameters were: r 9 = 1.0 (repeatability model); the &dquo;true&dquo; phenotypic covariance matrix (from table I) was used and heritabilities for lactation 1(!2 ) and for lactation 2 (h2) were varied. A proportion of 10% of the total number of animals available was selected. The definition of this repeatability model differs from that usually used because heritabilities and phenotypic variances are not necessarily equal in different lactations.
Responses to selection were calculated using equations (1), (2) and (3). Given any set of parameters the optimal proportion of animals to be s from each cohort was determined using an algorithm from Ducrocq and Quaas (1988), assuming the parameters used were the true population parameters. Results are presented in table III.
For the parameter set chosen the loss in efficiency was small; a 0-5% reduction in genetic gain for a range of heritabilities for first and second lactation performance.
These results may be expected, since the &dquo;true&dquo; genetic correlation (= 0.85) between performance in lactations 1 and 2 was high and an observation for lactation performance is always conditional on the presence of a first lactation observation. The ratio of achieved to predicted response was less robust to changes in parameters.
Even when the correct heritabilities (0.40 and 0.30) were used the achieved response (accuracy) was approximately 10% below the maximum expected response. This may be seen as a very simple illustration that one should be cautious when using predicted values (whether from selection indices or BLUP) to estimate genetic trend when the parameters used in the prediction are subject to large sampling errors or when they are a priori incorrect (as in the case of a repeatability model when it is known that r 9 G 1). Since results were similar for the 3 indices used, subsequent calculations were only performed for the case of mass selection.

Multiple trait multiple lactation considerations
Suppose the breeding goal is a linear combination of 9 traits (M 1 , F 1 , P I , M 2 , F 2 , P 2 , M 3 , F 3 , P 3 ), which is thought to be a good indicator of lifetime economic production since third and later lactation performances are assumed to be highly correlated. Then, choosing a set of economic values and using parameters from table I, the relative accuracy of selection for different indices which use different amounts of information can be investigated. For 3 different sets of economic values these relative accuracies were calculated, and results are presented in table IV. The economic values for M 1 , F, and P I in the second breeding goal (H 2 ) are similar to first lactation economic weightings used in practical selection indices in Europe (eg Wilmink, 1988). H 3 reflects a more &dquo;progressive&dquo; breeding goal with selection only on protein production. Results from table IV show that approximately 20% accuracy is lost when only one observation is used to predict the aggregate breeding value. Results for H 2 and H 3 were similar since breeding values for these composite traits were highly correlated. If only accuracy is considered, using milk and fat yield in a selection index does not contribute substantially to increase response to selection for lifetime protein yield (H 3 ). For all 3 breeding goals, the expected gain in accuracy for sire selection when adding progeny means on multiple traits is expected to be smaller than for mass selection.
Although these calculations are an oversimplification of breeding value prediction and selection in practice, they are useful when comparing the accuracies from table IV with accuracies when simplified assumptions are made regarding the covariance structure of the observations (in next section).

Proportionality considerations
One possible way to reduce the dimensionality of a MV prediction problem is to investigate whether some traits may be approximately expressed as linear combinations of other traits, or if some linear combination of the traits explains most of the variation in the aggregate breeding value. To reduce computations (further) it would be of interest to find a minimum number of independent traits which would provide all the necessary information (see for example Lin and Smith, 1990). In particular, it would be convenient if one linear transformation could be found that reduced the prediction problem of 9 highly correlated traits (milk, fat and protein yield in lactations 1-3) to that of 3 independent new traits, since in that case 3 separate analyses would contain all the information and would (therefore) account for selection. One approach is to investigate if submatrices of the 9 x 9 covariance matrix are proportional to each other, since that would indicate that a suitable transformation may exist to reduce the dimensionality of the prediction problem. With observation on p traits in I lactations, this suggests testing if the covariance matrix of lp traits, V, may be written as V = K Q9 V h , where V h is a submatrix (or a transformation thereof) describing a (co)variance block of p traits within or between lactations, K is a I x I symmetric matrix of proportionality or scaling constants, and (9 is the direct product operator (Searle, 1966). For the case of M, F and P in lactations 1-3, the hypothesis is that the complete covariance matrix may be expressed as a proportionality (or scaling) matrix multiplied by a transformation matrix. In the appendix a general framework is presented to test for proportionality of covariance matrices using likelihood ratio tests and to estimate constants of proportionality, using multivariate normal theory applied to observed and expectations of moment matrices.
Unfortunately, the data from table I were found unsuitable for a likelihood ratio test. Obviously the additive genetic covariance matrix (A) and the environmental covariance matrix (E) from table I are not independent moment matrices; A and E are highly correlated and the determinant of A is zero. An alternative is to transform A and E into a between and within sire covariance matrix (B and W), assuming these matrices are from a balanced half-sib design based on s sires and n progeny per sire. However, there was insufficient information about the sampling variances of the estimated E and A matrices to determine the appropriate degrees of freedom. Furthermore, the exact distribution of the likelihood ratio test statistic based on empirically derived degrees of freedom and using animal model estimates may differe substantially from a Chi-square distribution. Therefore, significance testing for proportionality was not pursued.
Using parameter estimates from table I, however, some inference with respect to proportionality may be drawn. One (obvious) choice for the transformation matrix V h is a canonical transformation on M 1 , F 1 , and P l . This transformation was calculated (see Hayes and Hill, 1980 ;and lMeycr, 1985, for computations) and the matrix was used to transform the traits within second and third lactations. Let: the 3 x 3 genetic (Vg ll ) and phenotypic (V Pll ) covariance matrices of milk, fat and protein yield in lactation 1. Then the transformation matrix of milk, fat and protein yield in lactation one, Q l , was chosen such that: with elements of diagonal matrix D equal to eigenvalues of the matrix (V-l vgil).
Using Q 1 , the vector of observations, was transformed using: The eigenvectors for the 3 canonical variates in lactation 1 were (2.96 -0.72 -2.09], (-0.85 -1.85 2.61! and [-0.08 0.42 0.69] respectively, which form the rows of matrix Q 1 . The 9 x 9 correlation matrices and the heritabilities of the 9 new traits (y c ) are shown in table V. Off-diagonals in all 3 x 3 blocks were small, indicating that one transformation matrix created 3 nearly independent variates with for each new variate highly correlated &dquo;observations &dquo; in later lactations. Using the covariance matrix of milk, fat and protein yield in lactation 1 as V h , proportionality matrices for additive genetic and environmental effects were calculated from equation (A7]. This assumed the observed covariance matrices E and A were moment matrices, but degrees of freedom did not need to be specified. For A and E, the estimates of proportionality matrix K, K a and K e respectively, were: For example, the estimate of the average scaling factor for genetic variances for M, F and P on the transformed scale (transformation such that new variates are uncorrelated) in lactation 3 was 1.74, and the estimate of the average scaling factor for environmental covariances between the transformed variates in lactations 1 and 3 was 0.58. If proportionality is assumed, the 9 x 9 MV prediction problem may be reduced to 3 independent 3 x 3 multivariate predictions or to 3 independent evaluations with a repeatability model. Using the breeding goals defined previously the efficiency of the reduction in dimensionality was calculated for mass selection, conditional on the parameters in table I being the true population parameters. Thus the parameters from table V were used for index calculations with all off-diagonals of all 3 x 3 covariance blocks set to zero. In the case of a repeatability model on the canonical variates, genetic correlations between canonical variates across lactations were set to unity. Results of selection index calculations for phenotypic selection are presented in table VI. The relative accuracy when using the first 3 canonical variates is slightly lower than the corresponding accuracy using the original first 3 variates (M 1 , F 1 and P 1 ) from table III because the genetic covariance structure between the canonical traits in lactation one and transformed variates in later lactations was simplified (off-diagonals of 3 x 3 blocks in matrix G were set to zero). Clearly little accuracy is lost assuming proportionality of the covariance structure for milk, fat and protein yield across lactations. Simplification to a repeatability model on 3 canonical variates was approximately 97% as efficient compared to a multivariate analysis on 9 traits. When using the canonical variates there was no advantage of a MV analysis over an analysis with a repeatability model.

Analysing linear combinations of the observations
A final reduction in dimensionality is achieved by analysing a reduced set of traits which are linear combinations of the available observations. One suggestion is to create new traits for each lactation which are the sum of the phenotypic observations weighted by the corresponding economic values in the aggregate breeding value.
Using the notation from equation (4!, ! y = a'x, where variables in x are, for example, observations for M 1 , F, and P 1 . Relative accuracies were calculated using equation (4!, fitting first lactation yield traits, first and second lactation yield traits, and all yield traits in vector x, respectively. Results are presented in table VII. Another suggestion is to use linear combinations of the yield traits within a lactation as new traits and to perform an analysis on those new traits. For example, if y l = w ' x l , for x ' = (M1F1P1!, and !2 = W'X 2 , for x' = [M 2 F 2 P Z ), then in the selection index framework this would be fitting y = W'x as used for equation (4).
Using Xl and x 2 as above, and x3 = [M 3 F 3 P3!, 3 new traits were created using the economic values for each trait in the aggregate breeding value as elements for matrix W. Accuracies for fitting combinations of these new traits are presented in Finally, 3 separate selection indices could be calculated from using (M l M 2 M 3 ), (F l F 2 F 3 ), and (P l P 2 P 3 ) respectively, and these 3 indices could be combined into one predicted aggregate breeding value. This approach is analogous to the way selection indices for dairy cattle are usually calculated: breeding values are calculated for milk, fat and protein yield separately, and the 3 estimated breeding values are subsequently combined in a selection index ( eg Dommerholt and Wilmink, 1986;Wilmink, 1988). Separate selection indices for milk yield, fat yield and protein yield were calculated using either the &dquo;true&dquo; (multivariate) covariance structure or the modified repeatability model (r . across lactations set to unity), and the accuracy of using the 3 indices to predict the aggregate breeding value was calculated. For the construction of each of the 3 indices it was assumed that the traits in the index were identical to the traits in the aggregate breeding values. For example, the index using M I , M 2 and M 3 was calculated as Pn; G&dquo;,a!&dquo; where P m and GI&dquo; are the 3 x 3 phenotypic and genetic covariance matrices for milk yield in lactations 1-3, and a m is a vector of economic values for these traits. The relative accuracies for this approach are shown in table VII.
Comparing results from tables IV, VI and VII shows that little efficiency was lost when analysing linear combinations of the observations using economic values as weights. For the case of just using observations for M i , F 1 and P I this is not surprising, since these traits were so highly correlated and had similar heritabilities, hence their index values resembled the economic values. For breeding goals H 2 and H 3 approximately 8% accuracy was lost when using all observations weighted by their economic values as a single trait, and approximately 4% accuracy was lost when analysing 3 new traits, each trait being a linear combination of observations and economic values within a lactation (table VII).
The last 2 rows of table VII indicate what accuracy was lost by ignoring within lactation information between milk, fat and protein yield for each lactation separately. In effect only average correlations between the traits within lactations are taken into account when combining separate indices for milk yield, fat yield and protein yield. The loss in accuracy was small and very similar to losses when using a repeatability model on the canonical variates (see table VI).

DISCUSSION
Only one aspect of efficiency of selection, namely accuracy of predicting some aggregate breeding value assuming fixed effects were known, was considered in this study. Meyer (1983) found that for BLUP prediction of breeding values increase in accuracy from including later lactation observations was largely through an improved data structure. Results from including information from relatives in the calculations, and including comparisons between young and old animals should have more direct relevance to practical breeding programmes.
Use of a repeatability model for milk production traits across lactations seemed to have little effect on accuracy of selection, although the predicted gain/accuracy may be approximately 10% too high. More research is needed to investigate longterm losses in response to selection when incorrect models are used to predict breeding values.
More information on the sampling variance of the parameter estimates are needed for testing the proportionality hypothesis. Ideally, one MV REML analysis on the 9 traits should be carried out, with an algorithm that would produce (second) derivatives. Still, calculations would then involve a 90 x 90 (45 genetic and 45 environmental) sampling variance matrix which would probably be subject to large sampling errors itself.
As pointed out by Meyer (1985), the canonical variates from creating independent variates in lactation 1 may have a biological explanation. The eigenvectors show that canonical variate 1 corresponds approximately to percentage protein (and fat content to a lesser extent) and canonical variate 2 to the difference between fat and protein content. Canonical variate 3 seems just to be the sum of fat and protein yield. Heritabilities for canonical variates were consistent with heritabilities found for fat and protein content previously reported. Canonical variates from diagonalising the complete 9 x 9 P-1 G matrix have similar biological explanations to the canonical variates from lactation 1, but now including comparisons between lactations (see table II).
Testing for proportionality of covariance matrices Notation: M = Moment matrix; a symmetric positive definite (PD) matrix of order lp, with mean squares and mean cross-products based on df degrees of freedom I = number of lactations, p = number of traits per lactation V = E(M) ; unknown PD covariance matrix of lp traits K = symmetric matrix of proportionality constants of order 1 Q 9 = direct product operator (see eg Searle, 1966), tr = trace operator L = natural logarithm of likelihood.
Using standard multivariate theory (eg Anderson, 1958;ch   ACKNOWLEDGMENT PM Visscher acknowledges financial support from the Milk Marketing Boards in the UK.