Univariate and multivariate parameter estimates for milk production traits using an animal model. I. Description and results of REML analyses

Using production records in lactations 1-3 from 100 large Holstein-F'riesian pedigree herds, parameters for milk, fat and protein yield in lactations 1-3 were estimated with REML using an animal model. The number of records for each lactation was 38 811, 26 223 and 16 542 for lactation 1, 2 and 3 respectively. Heritabilities for the 3 yield traits were similar: approximately 0.36 in lactation 1 and 0.30 in lactations 2 and 3. Genetic correlations between yield traits in lactations 1 and 2, for example between milk production in first and second lactations, were approximately 0.86. Genetic correlations between yield traits in lactations 2 and 3 were near unity. Genetic correlations between yield traits within lactations ranged from 0.58, for milk and fat yield in lactation 3, to 0.91, for milk and protein yield in lactation 1. Genetic correlations between yield traits between lactations ranged from 0.55, for milk yield in lactation 1 and fat yield in lactation 2, to 0.85, for milk yield in lactation 2 and protein yield in lactation 3. Environmental correlations between traits within lactations were approximately 0.95, and approximately 0.40 across lactations. dairy cattle / animal model / maximum likelihood / multivariate analyse / multi-trait / multi-lactation Résumé-Utilisation du modèle animal pour l'estimation des paramètres univariates et multivariates concernant les caractères de production laitière. I. Description et résultats des analyses selon le maximum de vraisemblance restreint (REML). A partir des données obtenues pendant les lactations 1 à 3 dans 100 grands troupeaux Holstein-Freisian inscrits, les paramètres de production de lait, de matière grasse et de matière protéique pour les lactations 1 à d ont été estimés par maximum de vraisemblance restreint (REML) selon le modèle animal. Le nombre des données pour chaque lactation a été de 38 811, 26 223 et 16 542 pour les lactations 1, 2 et 3 respectivement. Les héritabilités des 3 critères de production ont été similaires: approximativement 0,36 en première lactation et 0,30 en seconde et troisième lactation. Les corrélations génétiques entre les caractères de production en lactations 1 et 2, par exemple la production laitière, ont été approximativement de 0,86. Les corrélations génétiques entre critères de production aux lactations 2 et 3 ont été pratiquement égaux à 1. Les corrélations génétiques entre les critères de production intralactation ont varié de 0,58, pour la production laitière et la production de matière grasse en lactation 3, à 0,91 pour la production laitière …


INTRODUCTION
Dairy cattle sire evaluation in many countries is carried out using best linear unbiased prediction (BLUP) (Interbull, 1988), while cows are usually evaluated separately using a selection index type approach (eg Hill and Swanson, 1983).
Recently there has been a shift towards a joint evaluation of cows and bulls, using a so-called animal model (AM). Some countries have implemented an AM national evaluation for single traits (Wiggans et al, 1988a, b;Ducrocq et al, 1990; Jones and Goddard, 1990), others are in the process of doing so. Assumptions about the covariance structure of observations analysed with a linear model are often simplified to make computations feasible. For example, the USA (Wiggans et al, 1988a), France (Ducrocq et al, 1990; Bonaiti and Boichard, 1990) and Australia (Jones and Goddard, 1990) use a modified repeatability model for which a genetic correlation of unity is assumed between performances across lactations and some (pre)scaling is applied to later lactation records to account for higher phenotypic variances of traits in later lactations. Later lactation records are given lower weightings by adjusting the error structure of the observations, and milk, fat, and protein yield are analysed separately using this modified repeatability model. The potential loss in efficiency of selection by making these assumptions depends on the true, unknown, covariance structure of the data, and on the breeding goal. By estimating relevant (co)variances and assuming a particular combination of traits to select for, the potential loss in efficiency of selection by using simplified covariance structures may be quantified.
For estimating (co)variance components it seems desirable to use the same model as is, or soon will be, used for the prediction of breeding values, ie an animal model. Few (co)variance estimates from AM analyses have been reported; Swalve and Van Vleck (1987) analysed milk yield in lactations 1-3, and Van Vleck and Dong (1988) performed a multivariate analysis on milk, fat and protein yield in the first lactation.
The aims of this study were: 1) to estimate parameters for milk (M), fat (F) and protein (P) yield in lactations 1, 2 and 3 (L l , L 2 , L 3 ); 2) to investigate the implications of the estimates for prediction of breeding values when simplified assumptions are made regarding covariances structures. This part of the study is reported separately .
Estimates of correlations between different traits in different lactations, for example between milk yield in lactation 1 (M 1 ) and fat yield in lactation 2 (F Z ) have not been reported before. In the notation used, the number following M, F or P refers to lactation number, and the combination above, M 1 and F 2 , may be written as M 1 F 2 . Similarly, a multivariate (MV) analysis on M 1 , F, and P I may be written as analysing M 1 F 1 P 1 . MATERIAL , First, second and third lactation production records for the period [1979][1980][1981][1982][1983][1984][1985][1986][1987] from 100 large pedigree herds were extracted from the Milk Marketing Board's production files. Herds were selected on the number of heifers present in 1987, ie data were extracted from those herds which had the largest number of first lactation cows in 1987. Later lactation records, ie second or third, were included only from cows for which the previous lactations were present. All cows were pedigree Holstein-Friesian (HF). Some summary statistics of the data are presented in table I. METHODS _ Residual maximum likelihood (REML; Patterson and Thompson, 1971) was used to estimate (co)variances, using programs based on software written by Meyer (1988Meyer ( , 1989. Fixed effects in the mixed linear model were herd-year-seasons (HYS) and month of calving. Seasons were defined as 4-month periods, corresponding to the definition used for the current UK sire evaluations. Proportion of HF ancestry in the cow, age at calving and lactation length were fitted as covariables. All animal effects, including those of proven sires. were treated as random; this may cause a (downward) bias in the estimates, since comparisons between proven sires contribute to the estimate of genetic variance. For these data, the possible bias in parameter estimates from treating all sire effects as random was investigated elsewhere .
The following analyses were carried out: 1) univariate analyses for each of M, F and P in lactations 1-3. If culling takes place on performance in previous lactations, the parameter estimates from univariate analyses on later lactations will be biased. Comparing variance components from these univariate analyses with components from models that (partly) take account of selection may give some indication about what kind of selection (if any) has acted on these data; 2) analyses using a repeatability model for each of M, F and P in lactations 1 and 2. For this model it was assumed that the genetic correlation of performance between lactations was unity and that variances were constant across lactations.
A permanent environmental effect was fitted as an additional random effect for these analyses. Comparing results from these analyses with results from bivariate analyses may show how the (co)variances are partitioned when a genetic correlation of unity between performances in lactations 1 and 2 is implicity assumed; 3) within lactation (for L 1 , L 2 and L 3 ) MV analyses for traits M, F and P. An algorithm proposed by Thompson and Hill (1990) was used to estimate (co)variances. Their algorithm was designed to reduce a multivariate estimation problem to a set of independent univariate estimations. Assuming equal design matrices for p traits, Thompson and Hill (1990) proposed performing q = p(p+1)/2 univariate analyses, where the q &dquo;traits&dquo; are obtained from linear transformations of the p traits, and suggested finding a transformation matrix (iteratively) that would stabilise the backtransformed p x p covariance matrices from one round to the next. Following Thompson and Hill's suggestion, the initial transformation matrix was chosen so that p = 3 traits and 3 sums of traits were analysed. Subsequently, after q = 6 univariate analyses, a canonical transformation was calculated and 3 canonical variates were formed. The next &dquo;round&dquo; consisted of performing univariate analyses on these 3 canonical variates and on 3 pairwise sums of the canonical variates. The whole procedure was stopped after 5 complete rounds of iteration, since correlations on the original scale changed very little from round 4 to 5. Thompson and Hill (1990) proposed their algorithm for the case of equal design matrices and more than 2 random effects in the linear model. For the analyses described above, only 2 random effects (animal and residual) were fitted, so that a &dquo;standard&dquo; canonical transformation (see eg Meyer, 1985) could have been applied. Both methods, however, should give similar estimates, since the described algorithm was found to be highly efficient (Thompson and Hill, 1990); 4) bivariate (BV) analyses on all pairwise combinations of traits in different lactations Unfortunately, analysing the data using a general MV model (for example with 3 traits in 3 lactations, ie for 9 traits) was computationally not feasible. Therefore, selection bias is likely to affect some of the parameter estimates. In particular, (co)variances estimated for lactation 2 and 3 will be biased if culling was based on performance in the first lactation. For all BV analyses the fixed effect structure was different for both traits.
For most analyses data sets were too large to be handled in one single likelihood evaluation. Data sets were therefore randomly subdivided into subsets of herd groups. The estimates from each sample of herds were assumed to be independent of other estimates. This assumption is not strictly true, since some sires had progeny in different subsets. The correlation between estimates from different samples depends on the number of sires represented in different samples and their contribution to the parameter estimates in each sample. For analyses 1) and 2) data were split into 5 subsets of 20 herds each, for analyses 3) into 5, 4 and 2 subsets (for L 1 , L 2 , and L 3 respectively), and for 4) into 10 groups of 10 herds. For the univariate analyses and the analyses using a repeatability model, the standard errors (SE) of the estimates were calculated by approximating the likelihood surface at the maximum likelihood estimates by a quadratic function in the parameters of interest and using the matrix of second differentials to calculate asymptotic variances of the estimates (see Visscher et al, 1991, for an application and discussion of this procedure). For the within lactation MV analyses and the BV analyses, the average (co)variance estimates are presented with the empirical standard error of the mean estimate. No weighting of estimates was applied because subsets were roughly of equal size and there was insufficient information about the sampling (co)variances of the variance components (a weighting according to the number of records in the analysis was tried and showed differences between weighted and unweighted means of the order of 1% of,tl,1e : mean).
It was not clear how to combine the different estimates efficiently into one overall (9 x 9) covariance matrix, since there was insufficient information about sampling variances and;.culling, bias. Estimates of variances and covariances of M, F and P in lactation 3, fpre X 8 J llple, were available from bivariate analyses L l L 3 and L 2 L 3 and from MV analyses within L 3 , all of which were probably subject to culling bias. The following method .was chosen to create 9 x 9 genetic, environmental and phenotypic covariance matrices which were consistent with each other: for L l the (co)variances from analyses 3) were,used,. The variances (diagonals) in L 2 and L 3 were taken from BV analyses L l L 2 and.L l L 3 u'sing the same trait in each lactation. For example, the variance estimate for P 3 wa! iised from analysis P l P 3 -Within lactation genetic and environmental covariances between M, F And P for lactations 2 and 3 were calculated using the variances as described above and the estimates of the within lactation genetic and environmental correlations. The phenotypic covariances were calculated as the sum of the genetic and environmental covariances thus created and phenotypic correlations were calculated from these. The same method was used to calculate covariances between different traits in different lactations, now using the genetic and environmental correlations estimated from BV analyses. This somewhat arbitrary way of combining different estimates was found to give fewest problems of negative definite covariance matrices. It was thus assumed that variances from BV analyses L l L 2 and L 1 L 3 , and genetic and environmental correlations between traits within lactations, were least biased by selection.
To summarise the calculation of the 9 x 9 covariance matrices: -All genetic, environmental and phenotypic (co)variances within lactation one were from multivariate analyses on M i F i P i ; -Environmental and genetic correlations between milk, fat and protein yield within lactations 2 and 3 were from multivariate analyses on M 2 F 2 P 2 and M 3 F 3 P 3 respectively; -Environmental, genetic and phenotypic variances for M 2 , F z , P z , M 3 , F 3 , and P 3 were calculated from bivariate analyses on M i M 2 , F 1 F 2 , PiP 2 , M 1 M 3 , F l F 3 and P 1 P 3 respectively; -Environmental and genetic correlations between traits between lactations were taken from bivariate analyses for each pairwise comparison; -All remaining phenotypic covariances and phenotypic correlations followed directly from combining the above calculated elements.
Parameters for fat and protein content were approximated using a first order Taylor series expansion. If x il y i and x jl y j are ratio traits in lactations i and j respectively, then an approximation of the covariance between those 2 traits is: with CV the coefficient of variation (= a //1 ) and r x ,y the correlation between traits x and y. Formula [1] was applied using estimates of the coefficients of variation and estimates of the (co)variances for the yield traits in lactations 1-3.

RESULTS
The main results of the different analyses are presented in tables II-X. Heritabilities for production traits for the first lactations (table II) were moderate to high.
Although the genetic parameter estimates from the univariate analysis for lactation 2 may be biased by selection, increase in the environmental variance for lactation 2 (which is unlikely to be greatly affected by culling) was striking; the ratio of environmental variances in lactation 2 to that in 1 was approximately 1.6. Part of the increase in variances for the second lactation may be a scale effect associated with a larger mean (see also tables I and XIII for means and coefficients of variation), since the (biased) genetic variance for lactation 2 is also larger than the first lactation genetic variance. Results for lactation 3 also showed an increase in environmental and phenotypic variance. The estimate of the genetic variance in lactation 3 was smaller than estimates for either lactation 1 or 2, most likely explained by ignoring the effect of culling from lactation 1 to ,2 and from lactation 2 to 3.
Results from analyses with a repeatability model are presented in table III. Heritabilities were slightly lower than those estimated from univariate analyses on first lactations only. The variance component estimates from the analyses using a repeatability model (table III) may be explained using the general bivariate model results from tables V to VII ; it seems that both the genetic and phenotypic variances from the repeatability model were roughly the average of the bivariate first and second lactation parameters, and the average environmental variance in lactation 1 and 2 was partitioned into a permanent environmental and residual variance. If selection were on first lactation performance, a repeatability model should account for this selection effect, conditional on a genetic correlation of unity between first and second lactation performance. Table IV shows the heritability and correlation estimates from the within lactation MV analyses. Heritabilities were similar to univariate (unitrait) estimates from table II, as expected, and again heritability estimates from L 2 and L 3 are expected to be biased downwards. Phenotypic correlations between yield traits were very similar for different lactations, and genetic correlations were slightly lower in L 2 in comparison with L 1 , but similar for L 2 and L 3 . Genetic and phenotypic correlations between milk and protein yield were very high, and environmental correlations for these traits calculated using the estimates from table IV were close to unity.
In tables V to VII the combined 9 x 9 covariance matrices are presented. The similarity between the various 3 x 3 lactation by lactation covariance blocks is striking. In a subsequent study the consequences of these results for prediction of breeding values are investigated further. From table V it seems that genetically L 2 and L 3 are essentially the same for the yield traits, with genetic correlations between performances in second and third lactations in excess of 0.97. Comparing pairs of covariances or correlations such as M 1 F 2 and F I M 2 shows that their values are similar, which indicates that the ratio of variances for traits in different lactations are similar for M, F, and P. Similar proportionalities seem to exist for environmental components (table VI). Environmental correlations between traits within lactations were similar for lactations 1-3. Phenotypic correlations between traits within lactations (table VII) are not necessarily the same as those from   table IV, because of the way this table was constructed. Little change, however, is observed. Phenotypic correlations for M l M 2 , F 1 F 2 and P 1 P 2 were slightly higher than repeatability estimates from table III. Again the proportionality of the various 3 x 3 covariance blocks is striking.
In table VIII, heritability estimates for the 9 &dquo;traits&dquo; are given which are expected to be least biased through selection, with coefficients of variation for genetic, environmental and phenotypic effects. As before, lactations 2 and 3 seem very similar. For all yield traits the additive genetic CV slightly decreased from L l to L 2 , and the environmental CV increased from L l to L 2 . Scale effects therefore act differently for genetic and environmental effects, and there seems to be no single scale transformation which would standardise both genetic and residual variances across lactations.
Many analyses that were carried out yielded different estimates for the same variance component. For example, an estimate for M 1 was available from a univariate analysis, from a MV analysis with F, and P 1 , and from 6 different BV analyses.
All those different estimates for the same component are shown in table IX. For each row the 2 identical values were from within lactation MV analyses, since, for example, M 1 , F, and P I were analysed multivariately but pairwise combinations M 1 F 1 , M 1 P 1 and F 1 P 1 were not analysed bivariately. Diagonals in table IX were from univariate analyses (see table II). As expected, the various estimates for first lactation variances are very similar, since these estimates are free from selection bias. Ignoring first lactation information to estimate variances in later lactations reduces the additive genetic variances by approximately 10%, most likely due to culling bias. It is not clear why the highest estimate for any trait in L 2 was from a combined analysis with the same trait in L 1 , ie M 1 M 2 gave the highest estimate for M 2 , and F l F 2 and P I P 2 showed the highest estimates for F 2 and P 2 respectively. Using prediction equations for selection biases from Meyer and Thompson (1984), no selection strategy for first lactation production traits was found that would produce these results.
A summary of the parameters calculated for fat and protein content (F% and P% respectively), from using equation (1), is presented in tables X and XI. Heritabilities for F% and P% were high and were fairly constant across lactations. Genetic correlations for F 2 %F 3 % and P 2 %P 3 % were substantially lower than the genetic correlations between yield traits in second and third lactations. Parameters for first lactation traits (M 1 , F 1 , P 1 , F l % and P 1 %) were similar to estimates from a 5 x 5 MV analysis on all traits in lactation one (results not presented). Genetic correlations between protein yield and protein percentage were negative in first and positive in later lactations, although small in all cases.

DISCUSSION
Univariate first lactation heritabilities were similar to the most recent UK estimates using a sire model (Meyer, 1987), but higher than estimates of Hill et al (1983) and Meyer (1983 and1984). Heritability estimates from pedigree populations are often higher than from non-pedigree populations (Meyer, 1987;Carabafio et al, 1990). In dairy cattle, heritability estimates from daughter-dam regression are usually higher than estimates from paternal half-sib comparisons (Maijala and Hanna, 1974;Van Vleck 1986), and since the AM-REML estimates are a combination of both, this may &dquo;explain&dquo; why the AM estimates are higher than previous estimates from sire models. Swalve and Van Vleck (1987) found AM-REML heritability estimates of approximately 0.33 for milk yield in the first 3 lactations, using a trivariate model and ignoring relationships between animals across herds. Information contributing to their heritability estimates was therefore mainly from daughter-dam comparisons. Van Vleck and Dong (1988) reported AM heritability estimates of 0.36, 0.35 and 0.33 for milk, fat and protein yield in first lactations. The increase of the phenotypic variance over time, additional to an increase associated with a higher mean production, is striking; a regression of the coefficients of variation (CV) of milk production in the UK on time, using literature estimates from , Meyer (1984Meyer ( , 1987 and Visscher et al (1991), shows a slight increase in the phenotypic CV from 1976-1987 and an increase in the genetic CV from 7 to 9%. The explanation for this observation is not clear, although perhaps better estimation procedures, in particular those accounting for selection on the data, may account for some increase in the estimate of the genetic variance in addition to a scale effect.
Genetic and phenotypic correlations between M 1 , F 1 and P 1 were slightly higher than the correlations found by Van Vleck and Dong (1988). Genetic correlations between M 1 , M 2 and M 3 were almost identical to the estimates of Swalve and Van Vleck (1987) and slighly lower than the sire model estimates of Meyer (1987).
A small negative genetic correlation between protein yield and protein content in lactation 1 was also reported by Swanson and Gnanasakthy (1991). Genetic correlations between protein percentage and yield traits indicate that response to selection for fat and protein yield can be achieved without a reduction in the level of protein percentage, which accords with the wishes of many European dairy breeders. The explanation for the substantially lower genetic correlation between content traits in lactation 2 and 3, ie for F 2 %F 3 % and P 2 %P 3 %, compared with near unity correlations for the yield traits is not clear. Applying equation [1] to F Z % and F 3 %, assuming all CVs are equal and genetic correlations for F 2 F 3 and M 2 M 3 are unity, gives: Therefore one explanation may be that the within lactation correlations, calculated from within lactation MV analyses were biased downwards relatively more than the between lactation between trait correlations which were calculated from BV analyses.
If culling of first lactation cows were on some linear combination of their milk, fat and protein production in the first lactation or on any &dquo;culling variate&dquo; correlated with the traits being analysed, this form of selection would only partially be accounted for when using a bivariate REML estimation (see Robertston, 1966, for a detailed theoretical framework of a culling process). Therefore the BV second lactation parameter estimates may be slightly biased. The 3 traits considered were highly correlated, however, and the ratio of bivariate to univariate variance components was similar for all traits, which suggests that the bias may be small. Thompson (1984) presented prediction equations of selection biases for a one-way sire classification, when culling is on a trait correlated with yield in the first lactation and maximum likelihood is used to estimate the parameters. Using their prediction formulas the selection bias was investigated for various combinations of genetic and environmental correlations between the culling variate and the traits in the BV analyses. Selection intensity was calculated from the relative number of cows that had second lactations. It was found that for a range of parameter values likely to correspond with the true population values for milk, fat and protein yield, small biases were predicted for the estimates of the genetic parameters, but substantial biases (up to 40% fo the true values) could occur for the environmental correlations between the 2 traits in the analyses. For example, if the culling variate was fat yield in lactation 1, the percentage biases in the estimate of the heritability for the trait in lactation 2 and for the genetic and environmental correlation would be 0, 0.4 and -4.4 respectively for 1!11M2, and 0.2, 0.3 and 4.4 for PiP 2 , using the BV parameter estimates as true population values. Although most of the information used in AM-REML is a combination of comparisons between (paternal) half-sibs and daughter-dam pairs, the effect of selection on a correlated trait is unlikely to be large for the range of parameters investigated.

Meyer and
The parameter estimates from the bivariate model clearly showed that production traits in the second lactation are not repeated observations of first lactation records. Still, most countries use a repeatability model in their national AM evaluation, albeit with a lower weighting given to second and later lactation records. The weighting of later lactations seems the only instrument within the present day national AM evaluations to approximate the more appropriate multivariate model, for which heritabilities are lower and variances are much higher in later lactations. Additional to the implicit assumption of a genetic correlation of unity between first and later lactation yields, an improper weighting of later lactations when using a repeatability model will reduce genetic progress. Some calculations thereof are given in a subsequent study.
As described previously, the method used to create 9 x 9 covariance matrices from various available estimates was somewhat arbitrary. Any combination of estimates is expected to give sampling problems, since the traits are so highly correlated. For example, using heritability estimates from table VIII with genetic and phenotypic correlations from table IV gives 3 within lactation environmental covariance matrices which all are negative definite. Using estimates of environmental correlations between M 1 , F, and P I from Maijala and Hanna (1974), Meyer (1985) and Van Vleck and Dong (1988), determinants of the environmental correlation matrix were found to be -0.003, 0.012 and 0.03 respectively, indicating that sampling problems may be expected with these traits. Still, when using the method described to calculate full 9 x 9 covariance matrices, sampling problems were not eliminated: the 9 x 9 genetic covariance matrix presented in table V is not positive definite. However, the only negative eigenvalue is this matrix was relatively close to zero (&mdash;0.04 after standardising all phenotypic variances to unity for M 1 , F, and P 1 ). Setting this eigenvalue to a small positive number (eg 10-6 ) and recalculating all matrices showed very little difference for all variance components.