Estimation of genetic parameters for test day records of dairy traits in the first three lactations

Application of test-day models for the genetic evaluation of dairy populations requires the solution of large mixed model equations. The size of the (co)variance matrices required with such models can be reduced through the use of its first eigenvectors. Here, the first two eigenvectors of (co)variance matrices estimated for dairy traits in first lactation were used as covariables to jointly estimate genetic parameters of the first three lactations. These eigenvectors appear to be similar across traits and have a biological interpretation, one being related to the level of production and the other to persistency. Furthermore, they explain more than 95% of the total genetic variation. Variances and heritabilities obtained with this model were consistent with previous studies. High correlations were found among production levels in different lactations. Persistency measures were less correlated. Genetic correlations between second and third lactations were close to one, indicating that these can be considered as the same trait. Genetic correlations within lactation were high except between extreme parts of the lactation. This study shows that the use of eigenvectors can reduce the rank of (co)variance matrices for the test-day model and can provide consistent genetic parameters.


INTRODUCTION
National genetic evaluation models for dairy traits (milk, fat and protein yield, fat and protein content and somatic cell score (SCS)) are nowadays more often based on test-day models (TDM), using test-day (TD) records instead of yields aggregated over 305 d of lactation. The advantages of these models over former lactation models are widely acknowledged [11,17,23,26]. However, the application of such models for routine genetic evaluation of large commercial dairy populations is still a challenging task. Different strategies have been proposed to deal with these difficulties: multiple trait reduced rank random regression TDM [4,13,31], random regression TDM with parametric functions of low order [1,10], etc.
In a previous study [6], we found, in agreement with other studies [29], that the first two eigenvectors of the genetic (co)variance matrix explained a large part of the genetic variation and seemed to have a biological interpretation. This was also observed for the permanent environmental effect. The shape of these vectors was also conserved across studies. Therefore, in order to reduce the computational cost of the TDM, we proposed using these eigenvectors as new covariables in a random regression model instead of more traditional covariables such as the Legendre polynomials. The objectives of this study were to estimate these eigenvectors for all dairy traits used in French national genetic evaluation (milk, fat and protein yield, fat and protein content and SCS) and to apply a random regression model using these eigenvectors as covariables to all these traits separately for the first three lactations.

Data
The data were selected from about 4 million lactation records (first, second and third lactations) of Holstein cows from the four French administrative regions of Brittany. Only lactations with at least four TD records were considered and the lactation stage of TD records had to be between 5 and 335 days.
The following edits were applied to the data: calving date was required to be in the period from August 1994 to July 2000, and age at calving had to be from 20 to 38, 34 to 50 and 46 to 62 months inclusive for the first, second and third calving, respectively. Records with fewer than 40 days open, from animals with unknown parents or with unknown calving age, were also discarded. Herds with an average of at least 72 TD records per year were kept. Finally a random selection on herd number was applied to create eight data sets, each with approximately 6000 cows with records, 12 500 animals in pedigree files and 97 000 TD records. In total, 50 368 cows with records (101 369 animals in pedigree files) and 779 081 TD records were selected. The traits analyzed were milk, fat and protein production, fat and protein content and SCS.

Method
Previous studies [6,29] showed that the first eigenvectors explained most of the variation for both genetic and permanent environmental effects. Furthermore, these vectors were found to be very similar across studies using different methodologies. Therefore, a two-step procedure was applied to estimate variance components.

Step 1: Estimation of eigenvectors in first lactation
In the first step, variance components were estimated for the first lactation alone with a 5th order Legendre polynomial. This was done on the eight samples simultaneously with a pooling method [6]. The genetic and permanent environmental (co)variances among all DIM (days in milk) were obtained from the estimated (co)variance matrices among regression coefficients and the values of the Legendre polynomials at each DIM. The eigenvectors of these estimated genetic or permanent environmental covariance matrices were then computed [6].

Step 2: Estimation of parameters for the first three lactations
In the second step, the first two eigenvectors of the (co)variance matrices obtained in the first step were used to estimate the variance components of the three lactations altogether considered as correlated traits.
The same model was applied to milk yield, fat yield, protein yield, fat content, protein content and somatic cell score separately. The fixed part of the lactation curve was modeled as in [6] with a herd by test-date effect and regression splines [30] with 6 knots (chosen at 5, 20, 50, 130, 230 and 335 days in milk) for the age by lactation (21 classes), month by lactation (36 classes), and dry period by lactation (11 classes) effects and with five knots for the days carried calf by lactation effect.
Random genetic effects were modeled as: where g il (t) is the genetic value of animal i in lactation l at days in milk t, ν 1 (t) and ν 2 (t) are the values at time t of the first two eigenvectors of the genetic covariance matrix obtained in the first step. The same eigenvectors were used for each lactation. a ilm is the random additive genetic regression coefficients for animal i in lactation l for the mth eigenvector. The permanent environment was modeled similarly while the herd-by-year random effect was modeled with a 4th order Legendre polynomial for each lactation.
The residual variance was expressed as a function of days in milk with regression splines of 12 knots and separately for the three lactations: where t is the time at which the record is observed in lactation l, n is the number of parameters describing the residual variance, c i is the ith parameter of the function and ψ i (t) is the value at time t of the ith covariable describing changes of the residual variance over time. These covariables described here are regression splines with 12 knots which allowed flexibility and did not impose any particular shape to the residual variance curves. The exponential function ensured that the residual variance always remained positive.

Estimation of eigenvectors in first lactation
The first two eigenvectors of the genetic variance matrix obtained for each trait are presented in Figure 1. They appear to be similar across traits and seem to have a biological interpretation. Indeed, the first eigenvector (for all traits) appears to represent the average lactation potential of an animal (roughly a constant term during lactation) while the second (again for all traits) is characterized by values of opposite sign at the beginning and end of lactation, representing a measure of persistency. In addition, these eigenvectors explain most of the genetic variation: more than 95% for all traits (Tab. I). For traits, such as fat content or SCS, even the first eigenvector alone explains close to 95% of the genetic variation in the first lactation. The genetic variation of these traits therefore seemed easy to model and genetic correlations within the first lactation would be close to one. For the permanent environmental variance, the first two eigenvectors explained less variation: from 83.0 to 90.8%.
One important issue was to determine if these eigenvectors can be used as covariates in step 2 and how many should be used. Therefore, the results obtained in step 1 on milk production in first lactation were compared to those obtained with step 2 on first lactation data only (results not presented here). When using the first two eigenvectors for both genetic and permanent environmental variance, estimated parameters ((co)variances and correlations) were consistent and close to the results from step 1. However, some differences were observed and it was clear that a small part of the variation was not taken into account with the reduced model. When adding the third eigenvector for the  permanent environment, the results appeared very close to those of step 1 and only slight differences in variances were observed. Despite the fact that with only two eigenvectors for both random effects, the estimated genetic parameters showed some minor differences with step 1 estimates, the reduced model was chosen for several reasons. First, the proportion of total variation explained by this model was considered large enough. More critically the decision was largely influenced by computational considerations.
Indeed, we considered that a model with more than four random coefficients per cow and per lactation would be too large to implement. We were therefore very restrictive with the number of eigenvectors used. In addition, the two first eigenvectors had a clear biological interpretation while the meaning of the third eigenvectors was less straightforward. So, it made less sense to use them.
The eigenvectors associated to small eigenvalues which were discarded have a larger impact on some parts of lactation. For instance, in the comparison mentioned above (unpublished data), differences between models were larger at the beginning and the end of lactation. However, as mentioned before, we considered that modeling the variation associated to the main vectors was sufficient. Moreover, using only two eigenvectors ensures that the model is relatively robust and that we do not model undesirable variations due to overparameterization. Indeed, we do not know which part of the (co)variance structure has a real biological root and which part is due to artefacts of the model. For instance, with high order Legendre polynomials, the variance can present border or wave effects which do not reflect any biological reality.
These eigenvectors were estimated only on first lactation data. For the second and third lactations, the eigenvectors were not estimated again because using fifth order Legendre polynomial on three lactations jointly was computationally too demanding. A reduction of the number of animals in the study would result in a reduction of the precision of the estimated parameters. We considered that using the same eigenvectors for all lactations offered a reasonable alternative. Indeed, de Roos (personal communication) estimated that the first two eigenvectors were similar across the first three lactations and in a preliminary study on milk yield in the first two lactations (unpublished data), the similarity of eigenvectors across lactations was confirmed.
The choice of these eigenvectors as covariates appears as an attractive alternative to other covariates such as Legendre polynomials or the Wilmink function for instance. Indeed, the model presents most of the advantages of TDM [11,17,23,26], is more flexible than a repeatability TDM [17,20] and has lower computational costs than most random regression TDM [3,9,10,19]. The use of a third order Legendre polynomial as in Auvray and Gengler [1] or of a Wilmink curve [10,23] are models using only three parameters by effect for each lactation curve and in consequence require relatively low computational cost. In comparison with such models, the present approach seems competitive even when eigenvectors are computed on first lactation records only. Indeed, in all these models, covariates are identical for the three lactations, the two main covariates are relatively similar (a constant and a linear regression). Legendre polynomials are general covariates (not especially adapted to lactation curves), Wilmink curves were developed for standard milk production curves (not for content traits for instance -and not for deviations) while in the present case, eigenvectors are estimated for each trait separately. Globally, the covariates used in this study, even if estimated on one lactation only, seem good alternatives to more traditional covariates.

Estimation of parameters in the first three lactations
In Figure 2, an example of the pattern of variance components across the lactation is presented for fat yield. Some patterns were consistent for all traits. Indeed, the environmental variance (the sum of residual and permanent environmental variances) always presented a large decrease at the beginning of the lactations, was minimum in the middle and finally increased slightly at the end. This was also observed in numerous studies [6,15,16,18,19]. Nevertheless, for content traits, the increase at the end was sharper while for SCS, the curve was still decreasing at the end of the lactation as observed by Rupp [22]. This difference was not surprising since yields decrease while contents increase at the end of lactation. The herd by year variance was small in comparison with other variances for all traits as in [7] or [5]. As expected from previous studies [2,8,20,28], variances increased with the lactation number. It was also observed that the part of the genetic variance explained by the second eigenvector (related to persistency) was always smaller in the first lactation. These observations could be expected since lactation curves of cows show higher levels of production and are steeper when the lactation number increases [24]. The difference in lactation curves is especially high between first and later lactations. On the contrary, the pattern of the genetic variance changed across traits. Genetic variance of milk yield was maximum in the mid part of the lactation [15,16,18,21,26] while genetic variance of traits related to fat or protein were more or less increasing throughout the lactations. Finally, for SCS, this variance was almost constant for a large part of the lactations as in Rupp [22]. Since most previous studies focused on milk yield in first lactation, it was difficult to compare our results with those of others. Nevertheless, they were in good agreement with those obtained by Mayeres [15] or de Roos (personal communication) who analyzed milk, fat and protein yield.
The resulting heritabilities are presented in Figure 3. As expected, SCS presented the lowest heritability and heritability of content traits were as high as 0.50 to 0.60. These heritabilities were in agreement with lactation models or test-day model studies. Indeed, minimal and maximal values for heritability of milk yield in all lactations were 0.16 and 0.45, respectively. For most  parts of lactation, this heritability laid between 0.25 and 0.40. These values were in agreement with studies from Pool et al. [19], Liu et al. [14], Guo et al. [8], Jakobsen et al. [9], and de Roos (personal communication). For fat and protein yield, heritabilities were lower than for milk yield as observed by Liu et al. [14], Jakobsen et al. [9] or de Roos (personal communication).
In agreement with these authors, in first lactation, the heritability for both traits laid between 0.10 and 0.27. For later lactations, heritabilities for fat yield were similar while for protein they were larger at the end of the lactations. This was also observed by de Roos (personal communication). With lactation models, Tong et al. [28], Teepker and Swalve [27] or Schutz et al. [25] found that the heritability of fat or protein content was close to 0.50. This was also found in this study with a TDM. For fat content, heritability was maximum in the middle of lactation, with values close to 0.60. Heritability of protein content increased from 0.10 to 0.50 in the first part of lactation and then remained relatively constant. Finally, heritability for SCS ranged from 0.10 to 0.20, increasing during lactation, in agreement with Rupp [22].
Because covariables used in this study have a biological interpretation, correlations of the (co)variance matrices can be analyzed and compared more easily to the results from previous studies based on lactational models. Genetic correlations among coefficients of eigenvectors for all traits are presented in Table II. The first eigenvector will be called "production level" and the second "persistency" (these terms are not appropriate for non-production traits but for simplicity, we will use the same terms for all traits). Production levels, which represented the most important part of the genetic variation, showed high correlations across all lactations while persistency presented more variation across lactations. Coefficients for first lactation showed the lowest correlations with coefficients of later lactations. Between first and second lactation, they ranged from 0.89 to 0.99 and from 0.56 to 0.90 for eigenvectors related to production and persistency, respectively. As expected, first and third lactations were even less correlated: from 0.82 to 0.98 and 0.48 to 0.83 for eigenvectors related to production and persistency, respectively. For all traits, correlations among coefficients for production level in second and third lactation were above 0.95. This was also observed for persistency with the exception of protein content for which the correlation was equal to 0.75. This indicates that the effects for second and third lactation are almost the same but expressed on different scales (variances). Their modeling can be even more simplified by considering them as equal in second and third lactation and including a parameter for the varying magnitude of the variance. Teepker and Swalve [27] estimated genetic correlations among the first three lactations for the same traits (without SCS) but with 305 d records while Reents et al. [20] estimated these correlations for yield traits and SCS with TD records. They also observed that genetic correlations between second and third lactation were higher than 0.95 for all traits. Their estimates of the genetic correlations between first and later lactations were comparable to our estimates for the eigenvector related to production level which is logical because this covariable contributed the most to the 305 d records. Kistemaker [12] computed correlations between persistency proofs for milk yield in different lactations with three different measures of persistency. These correlations were lower than ours but he also found the largest correlations between the second and third lactation (0.86). Genetic correlations between level of production and persistency within a given lactation are also presented in Table II. A positive coefficient for the second eigenvector results in a steeper decreasing slope: the lactation of the cow is less persistent. Positive correlations indicate that, as expected, lactations of cows with high levels of production are less persistent. By construction, the correlation between eigenvectors is expected to be 0. Deviations were, however, observed because information from later lactations bring changes to the (co)variance structure in the first lactation, because the variation formerly explained by the removed eigenvectors must be redistributed to the remaining covariates and because eigenvectors of the first lactation are not completely appropriate for later lactations. Therefore, correlations between eigenvectors deviated from 0 in order to fit the (co)variance structure as well as possible. Most correlations are close to 0 in first lactation, in agreement with expectations. In later lactations, for production traits, correlations increased, indicating an opposition between production and persistency. For content traits or somatic cell count, the patterns were different: close to 0 or negative, for fat content. For these traits, the first eigenvector can be linked to mean content and the second to the variation of content during lactation. Correlations between these effects were not necessarily expected to be negative. Indeed, the behavior of content traits during lactation is different than the behavior of production traits: high contents in the beginning of lactation are not expected to be associated with low contents at the end of lactation.
Correlations among coefficients of eigenvectors for permanent environmental effects are presented in Table III. These effects are less correlated than genetic effects. These correlations were all positive but ranging from as low as 0 up to 0.85. The model appeared quite flexible because correlations among permanent environmental effects were allowed to have a different pattern as genetic correlations. For instance, genetic correlations for production level for fat content or SCS were very high while for permanent environmental effects, they were high for fat content and low for SCS.
Two extreme examples of genetic correlations between daily records within the same lactation are given in Table IV. Genetic correlations within lactation were higher in first compared to later lactations for all traits. Accordingly, Lidauer et al. [13] observed that the genetic correlations between extreme DIM were intermediate in first lactation but negative, although weak, in second lactation. In first lactation, the first eigenvector, which gives a relatively constant weight throughout lactation, represented most of the genetic variance. Consequently, the genetic values were relatively constant during most of the lactation. In later lactations, the importance of the second eigenvector increased, creating more opposition between genetic values of extreme parts of lactation. Therefore, genetic correlations decreased. This was noticeable on variance patterns: traits or lactations where the second eigenvector had little importance and for which genetic correlations were relatively high between all parts of lactation presented a linear variance. On the contrary, when the second eigenvector explained more variation, the pattern of the variance was  Fig. 2 for instance where correlations are higher in first than in later lactations). Milk yield in second lactation and fat content in first lactation presented the lowest and the highest correlations within lactation, respectively. They are presented in Table IV. For fat content, genetic level was mostly constant throughout the first lactation, with correlations above 0.70 for extreme parts of lactation. On the contrary, in second lactation, genetic levels of extreme parts (before day 20 and after day 320) of lactation were nearly independent for milk yield. However, for a large part of the lactation, correlations remained high and only the end of the lactation was less correlated with the remainder. Correlations for other lactations and traits were intermediate between those two situations. Our results were consistent with previous studies for milk yield in first parity [3,16,18].

CONCLUSION
The first two eigenvectors of the genetic and permanent environmental matrices in first lactation were found to explain most of the variance and to be similar for all traits. They seemed to have biological interpretations, the first one representing the level of production (a relatively constant term during lactation) and the second would be related to persistency (opposing the beginning and the end of lactation).
The use of these eigenvectors as covariables in a random regression test day model limits the rank of the (co)variance matrices and resulted in consistent genetic parameters. Because the eigenvectors had a biological meaning, interpretation of genetic parameters was easier. Genetic correlations between random coefficients for second and third lactations were close to 1, indicating that these effects were very similar, but expressed with different variances. The rank of the genetic (co)variance matrix can be reduced even further by considering these effects as being the same.