Validation of an approximate approach to compute genetic correlations between longevity and linear traits

The estimation of genetic correlations between a nonlinear trait such as longevity and linear traits is computationally difficult on large datasets. A two-step approach was proposed and was checked via simulation. First, univariate analyses were performed to get genetic variance estimates and to compute pseudo-records and their associated weights. These pseudo-records were virtual performances free of all environmental effects that can be used in a BLUP animal model, leading to the same breeding values as in the (possibly nonlinear) initial analyses. By combining these pseudo-records in a multiple trait model and fixing the genetic and residual variances to their values computed during the first step, we obtained correlation estimates by AI-REML and approximate MT-BLUP predicted breeding values that blend direct and indirect information on longevity. Mean genetic correlations and reliabilities obtained on simulated data confirmed the suitability of this approach in a wide range of situations. When nonzero residual correlations exist between traits, a sire model gave nearly unbiased estimates of genetic correlations, while the animal model estimates were biased upwards. Finally, when an incorrect genetic trend was simulated to lead to biased pseudo-records, a joint analysis including a time effect could adequately correct for this bias.


INTRODUCTION
Functional traits refer to traits related to the ability to remain productive. Their importance increases significantly in situations where production is limited or constrained (quotas) [16]. In general, with the exception of some type traits, functional traits exhibit two problems: rather low heritabilities and insufficient information early in life. These lead to genetic evaluations with low reliabilities for young sires [14]. Fortunately, more heritable traits can be used as early predictors of these functional traits. For example, in dairy cattle, early predictors of, e.g., somatic cell count or functional longevity can be found in the long list of type traits recorded in each breed [2,6,18,21,22]. A technique to properly combine these pieces of information is needed.
The optimal estimation procedure to combine information from different linear traits is known to be the multiple trait BLUP evaluation [8,20]. MT-BLUP provides an improved accuracy of the evaluation on each trait through an increase of the amount of information, an improved data structure through better connectedness and a correction of biases due to selection on correlated traits. A multiple trait evaluation automatically accounts for the fact that traits are correlated and that the relative accuracy of the evaluation for each trait may greatly vary between the animals [14].
However an MT-BLUP evaluation on functional and production traits altogether, although conceptually possible, is not routinely feasible. Traits are often described by very different models. Some of these models are not linear; others involve repeated measures and/or more than one random effect or are analysed accounting for heterogeneous variances. Above all, the amount of data to manipulate in national evaluations is tremendous. Despite huge and fast improvements in computing power, computational considerations are still a limiting factor. Furthermore, a large set of dispersion parameters must be estimated accurately before being included in such an evaluation.
Ducrocq et al. [14] proposed a two-step approach for multiple trait evaluation of functional and production traits. First, univariate analyses are performed for each trait to get genetic variance estimates and to compute pseudo-records and their associated weights. Pseudo-records here can be regarded as a generalization of deviations or records corrected for environmental factors to more complex situations such as repeated records and nonlinear traits. Combining these pseudo-records in a multiple trait animal model while fixing the genetic and residual variances, one can get correlation estimates and approximate MT-BLUP breeding values that blend direct and indirect information.
Among functional traits, longevity was found to be the most important in most studies on relative economic weights in dairy cattle [5,16]. Routine genetic evaluations of bulls on the length of productive life of theirs daughters rely on the modelling of a hazard function, which describes the limiting probability for a cow alive just prior to time t of being culled at time t. This allows a conceptually natural analysis of records from animals that are still alive (censored records) together with already culled animals (uncensored records). Moreover, this non-linear model used to describe the hazard function can include time-dependent fixed effects (e.g., herd-year-season, stage of lactation), which permit to precisely account for changes in culling policies over time.
The aim of this paper was to check via simulation the two-step approach for multiple trait genetic evaluation of longevity and linear traits. After the analysis of a reference situation, a sensitivity analysis was performed to check the suitability of this approach in a wide range of situations.

General strategy
Observations of longevity (t) and of two linear traits (y 1 , y 2 ) with known genetic and residual correlations were simulated. For longevity traits, the current models of analysis used are so different (they have to deal with non linearity, censoring and non normal residuals) that an exact multiple trait approach is usually not feasible, at least on moderate or large size data sets. The proposed approach was aimed at summarising the data in such a way that the simplest linear animal model can be used for each trait. This first step requires the calculation of a pseudo-record y * i,m for each animal m and trait i corrected for all non genetic effects and an associated weight w i,m indicating the amount of information for that animal. These pseudo-records are obtained from a univariate (or a simpler multivariate) analysis, after estimation of the relevant dispersion parameters. Then, all pseudo-records are analysed together using a classical MT-BLUP framework assuming an animal model: where µ i is the overall mean for trait i, a i,m is the additive genetic value of animal m for trait i and e i,m is the residual. In order to account for the variable amount of information summarised in y * i,m , its residual variance is assumed to be heterogeneous: var(e i,m ) = σ 2 e,i /w i,m , where σ 2 e,i is the residual variance for trait i.
The derivation of pseudo-records and their weights is based on the following principle: when analysed using the simplistic univariate BLUP animal model (1), these records should lead to EBV equal or as close as possible to the EBV obtained with the complete model and in the case of nonlinear traits, with the adequate methodology. Then, the application of a MT-BLUP animal model based on equation (1) is straightforward. It provides the appropriate EBV for all traits and all animals. However, MT-BLUP requires an adequate knowledge of the correlations between traits. The variances are supposed to be the ones estimated for the simpler univariate analysis used to compute pseudo-records.

Simulation of the dataset
The records of 5000 animals m roughly resembling the length of productive life of dairy cows were simulated using the following Weibull log normal frailty model: where h(t m ) is the hazard function at time t m > 0, the Weibull parameters ρ and λ are strictly positive. A ρ value of 2.0 (increasing hazard) and a λ value such that the median time t med was 1000 days were used for the simulation. The two linear traits were simulated based on the model Two means µ 1 = 100 and µ 2 = 200 were also arbitrarily added without a lack of generality. For each trait i, two fixed effects β i = β i,1 β i,2 , with respectively 10 and 100 unbalanced levels, were generated from a uniform distribution and appropriate change of scale. The true breeding values of the 5000 animals a i,m , that were progeny of 50 unrelated sires, were obtained by adding half the breeding value of their sire s i,m to a value u i,m covering the dam contribution and Mendelian sampling, i.e. representing three quarters of the total genetic variance. The values were drawn from a MVN(0, G) distribution, where G is the genetic covariance matrix. In the reference situation, genetic variances for longevity and linear characters were respectively 0.20, 400 and 600. Genetic correlations between all pairs of traits were 0.4.
Residual values e i,m for the two linear traits were also generated from values drawn from a BV N(0, R) distribution, where R is the desired residual covariance matrix. Residual variances were chosen to lead to heritabilities of 0.25 for the first linear character and of 0.10 for the second. The distribution of the residual component for the longevity measure is proportional to an extreme value distribution. The residual correlation between the two linear traits was 0.4. In the reference situation, a residual correlation of 0 was assumed between longevity and the two linear traits.

Calculation of pseudo-records and their associated weight
Longevity was analysed using a Weibull frailty model [12,14]. First, the genetic variance was estimated using a sire model. Then, assuming that this estimated variance is the correct one, longevity pseudo-records and their weights were obtained using two procedures. The first one was a two-step procedure where the first step involved the estimation of fixed effects and sire EBV using a sire model, and the second step consisted in calculating the progeny's EBVâ i,m , considering fixed effects and sire EBV as known. This results in an approximation of the EBV solutionsâ i,m from a Weibull animal model (see [9] for details). The second procedure consisted in directly estimating the progeny's EBVâ i,m from a Weibull animal model. The resulting pseudo-record for longevity for animal m in both procedures was (for details, see [15]).
where δ m = 0/1 is the censoring code. The associated weight w i,m =Ĥ i,m is the cumulative riskĤ i,m of animal m from time 0 to culling or censoring time. The calculations were done using a modified version (version 5.0) of the Survival Kit [12]. For linear traits, instead of a univariate analysis for each trait, we decided to use a bivariate animal model, which is simple to implement here. First, genetic and residual dispersion parameters were estimated via an AI-REML procedure. The pseudo-record for trait i and animal m was simply the record corrected for fixed effects: The associated weight w i,m for the approximate MT-BLUP evaluation was the diagonal element of the least square part of the mixed model equations (MME) after absorption of the "contemporary group" fixed effect, that is the effect with the largest number of levels. This way, a lower weight is given to observations from small contemporary groups. It may seem inconsistent to absorb this fixed effect on one side and still correct for its estimate in (4). But as pointed out by a referee, the absorption matrix M is idempotent (MM = M). Therefore, the use of the absorption matrix M to weight y* leads to the correct genetic estimates: Using y or y* leads to the same solution vectorâ.

Joint analysis
Genetic and residual correlations were estimated via an AI-REML procedure applied to pseudo-records assuming that the variances are known and equal to the estimates obtained in the first step (the residual variance for longevity was fixed to 1, since this value plays the role of the residual variance in the variance ratio 1/σ 2 a to be multiplied to the inverse relationship matrix [9, equation (12)]). Data were analysed using either a sire or an animal model to compare the performance of both models. Several REML packages exist but none is really adapted to model equations such as (1) with heterogeneous residual variances. A simple trick avoids this limitation [14].
Multiplying both sides of the model equation (1) by v i,m , one gets: Now, the residual part ε i,m has homogeneous variance: The REML estimation of the dispersion parameters of model (5) considering y # i,m as the data and v i,m as a continuous covariate gives results identical to the analysis of model (1) [14]. A version of I. Misztal's AI-REML software was modified to impose the constraints that genetic and residual variances are fixed [7]. Indeed, this is equivalent to impose structured genetic and residual (co)variance matrices where the only unknown parameters are the correlations. The AI-REML equations were expressed as functions of the unknown parameters (correlations) and of the first derivatives of the (co)variances matrices with respect to these [7]. Finally, an MT-BLUP evaluation based on pseudo-records (and their weights) was performed using the estimated genetic and residual (co)variance matrices.

Reliabilities
Two different longevity EBV were obtained for each animal: one from direct evaluation (via the Weibull model) and the other taking into account indirect information from correlated traits (MT-BLUP approach). Average reliabilities for longevity of progeny and their sires were also computed in two ways. First, asymptotic mean reliability was obtained as the mean of the diagonal elements of the inverse of the Weibull Hessian matrix at convergence of the maximisation process. The second way was to compute the correlation between the simulated (true) BVs and the estimated ones (via a Weibull model or via the MT-BLUP approach). The average reliability is the square of this correlation.

Genetic parameters for longevity
Two different genetic variances for longevity (either 0.05 or 0.20 (reference)) were used, indicating low and high genetic variation. These variances gave heritabilities for longevity of 0.05 and 0.19 respectively [23].
Also different genetic correlations (0.20, 0.40 and 0.60) were simulated between all three traits (longevity and the two linear traits) as well as a situation where correlations of longevity with linear traits 1 and 2 were equal to 0.4 and -0.4, respectively, and the two linear traits were assumed uncorrelated.

Level of random censoring
To see how censoring affected the results, four levels of random censoring (no censoring and approximately 30, 60 and 90%) were applied to the simulated datasets. Compared with the reference situation (no censoring), random censoring was generated in the following way. All sires from the same dataset had on average the same percentage of censored records among their progeny. Censoring randomly occurred at time C m equal to 400, 800 or 1200 d, mimicking the end of a reproductive cycle. In datasets with 90% of censored records, all censoring times C m were in the 400 days group. In the case of 60% of censored records, 40% of the censoring times C m were in the 400 days group, 30% in 800 days and 20% in 1200. For a 30% of censored records, the respective values were 14%, 10% and 7%. Finally, the actual longevity measure available for analysis was set to min(C m , T m ), where T m was the failure time generated as in the reference situation.

Batches of progeny with different censoring rates
The reliability of the younger sires' proofs would increase with the multiple trait approach as a function of censoring percentage. In contrast with the previous section, sires were simulated with different percentages of censoring among their progeny. The actual longevity measure for the first 1000 animals (progeny of the 10 "young" bulls) was set to min(400, T m ), i.e., censored at 400 days if the actual failure time T m was higher than 400 days. Longevity for the next 1000 animals was set to min(800, T m ) days and the records of the following 1000 animals were set to min(1200, T m ) days. Finally, the last 2000 animals were not censored (progeny of old sires).

Non zero residual correlation between longevity and linear traits
So far we assumed a zero residual correlation between longevity and linear traits. Assuming that longevity data follows a Weibull distribution, equation (2) is equivalent to: where e follows an extreme value distribution [17] with Var(e) = π 2 /6. In order to generate a nonzero correlation, records for linear traits were simulated as: where the complete residual values e i were decomposed into two components: one (ω i ) correlated with longevity and the other (ε i ) correlated with the other linear trait. That is to say, for i = 1, 2: cov(e, e i ) = cov(e, ω i ) + cov(e, ε i ) = r longi σ e σ e i + 0 = r longi σ e σ e i cov(e 1 , e 2 ) = cov(ε 1 , ε 2 ) = r 12 σ e 1 σ e 2 .
To obtain ω i,m , first the failure time y m was simulated in (6). Then, the residual e m was obtained as e m = ρ log t m + ρ log λ + x m β + z m a. The component correlated with longevity ω i,m was generated as ω i,m = b longi e m , i.e., the regression of e i on e with b longi = r longi (σ e i /σ e ). Finally, the values ε i,m were generated from a bivariate normal distribution with the adequate covariance matrix. Note that the non zero residual correlation between longevity and the linear traits was obtained using equation (6) to model log t while the final model of analysis with pseudo-records is on a different scale with heterogeneous residual variance. Also, a positive correlation r longi between linear trait i and log t corresponds to a negative relationship between the linear trait and the hazard. This will have implications on the interpretation of the results.

Biases generated by incorrect univariate analyses
In this section, it is assumed that a genetic trend on all traits existed, i.e., progeny born in years 0, 1, . . . ,10 do not have the same average genetic level, and it was incorrectly estimated (biased). To generate such a situation, the simulation models for longevity and linear traits were modified to: where, for an animal born in year j = 0, 1, ..., 10 (11 years), an effect δ j equal to 5% of a genetic standard deviation per year was added for each trait.
To create an unbalanced but connected design, each sire was assumed to have half his progeny in year j and the other half in year j + 1 (where j = 0 to 10). So, sires had their progeny in different years.
The analysis of such a situation was done ignoring the year effect in the first step (univariate analysis) leading to potentially biased estimates of variances, pseudo-records and weights. However, to see how the joint analysis of the data can cope with such biases, a year effect was included in the second step, i.e., for the estimation of the genetic and residual correlations and for the MT-BLUP evaluation of the three traits together.

Reference situation
Means of the genetic and residual variances estimates over 200 replicates for linear traits were always in the confidence interval of the mean. In the case of longevity, the sire variance is slightly underestimated for the reference situation (0.0449 estimated vs. 0.05 simulated). However, this bias may be considered negligible [10].
The calculation of the longevity pseudo-records and their weights was obtained by two alternative procedures: an approximate two-step procedure based on a sire model and a direct procedure based on the exact animal model. The correlations between both procedures were 0.9985 for pseudo-records and 0.9910 for their weights. Therefore, the approximate two-step procedure gave results very similar to the more demanding exact one. Only results from the exact procedure are reported hereafter because the moderate size of our datasets allowed the use of the animal model.
Next, univariate BLUP analyses were performed based on pseudo-records for longevity. The resulting EBV were compared with the ones calculated from the appropriate Weibull animal model. The correlations between both were 0.9980 for sires and 0.9999 for progeny. Their standard deviations were also nearly identical. As desired, the use of the pseudo-records in the univariate BLUP evaluation based on an animal model led to sire and progeny EBV equivalent to the EBV obtained in the Weibull analysis.
The pseudo-records were useful to compute genetic and residual correlations under a multiple trait sire or an animal model. All mean genetic correlation estimates were similar to the simulated ones whatever the estimation Table I. Estimates of genetic and residual correlations between longevity (Long) and two linear traits (L1, L2) using the AI-REML approach under a sire and an animal model for two levels of animal genetic variation for longevity: low (0.05) and high (or reference) (0.20). models (Tab. I). However, it was necessary to impose constraints such that the genetic and residual variances were known and equal to the estimated ones. Otherwise convergence was rarely obtained, due to genetic variances quickly going to 0. The standard deviations of genetic correlations were between 0.139 and 0.160 for the sire model and were slightly smaller for the animal model (Tab. I). The average asymptotic standard errors provided by the AI-REML algorithm were lower than these standard deviations (between 0.100 and 0.113 for the sire model). This was expected because it was assumed that the variances were known without error. Nevertheless, these asymptotic standard errors provided a rough idea about the magnitude of the accuracy of the estimates.

Model of Analysis
Finally, average reliabilities for longevity computed as the squared correlations between true breeding values and EBV were 0.890 for sires and 0.538 for progeny when just direct information was used in a Weibull model (Tab. II). These reliabilities were nearly identical to the asymptotic ones computed from EBV standard errors. When the genetic correlation between longevity and linear characters was accounted for in the MT-BLUP approach, the gain in reliability was very limited for sires (0.002 in absolute terms) and slightly higher (0.017) for progeny. This small increase may be related to the initial level of reliability, which was already high for sires.

Effect of the genetic variance of longevity
Reducing the genetic variance for longevity from 0.20 (reference) to 0.05 did not greatly affect neither the average estimates of genetic variances nor the correlations (Tab. I), although the standard deviations of the latter increased to 0.182-0.209 (sire model). The gain in reliability for the MT-BLUP approach was higher than in the reference situation, 0.012 for sires and 0.031 for progeny (Tab. II).

Effect of the genetic correlation
Again, the characteristics of the estimates of genetic and residual variances and genetic correlations (results not shown) were not substantially modified by the level of genetic correlation in a range from 0.2 to 0.6. This was also true when genetic correlations of 0.4 between longevity and the first linear trait and -0.4 with the second were imposed. When indirect information was included, the gain in reliability increased with the amount of the genetic correlation simulated (Fig. 1). These gains were substantially greater when genetic correlations with different signs were simulated.

Effect of level of random censoring
The mean of genetic correlations estimated for datasets with different degrees of censoring were always close to the simulated values. The estimation procedure seems robust even with 90% of censoring, since the average values were only slightly underestimated. Nevertheless, the amount of censoring clearly affected the standard error of the estimates, which increased moderately until 60% of censoring (from 0.139-0.160 to 0.159-0.196 for the sire model), but the increase was substantial when censoring reached 90% (0.245-0.294).
Average reliabilities for longevity logically decreased when the level of censoring increased because the amount of direct information decreased. The gains in reliability with the inclusion of indirect information initially increased with censoring rate. In the case of 90% censoring, there was no gain in reliability for progeny. This is explained by a reduced accuracy of the genetic correlation estimates in this extreme situation as a consequence of the small number of informative records. This was checked by simulating a population of 25 000 progeny under the same conditions and with 90% censoring. Then, the standard deviation of the genetic correlation estimates was reduced to 0.08−0.11 (animal model) and the gain in reliability for progeny was about 0.06.

Progeny batches with different censoring rates
When sires were simulated with different percentages of censoring among their progeny, the average reliabilities for longevity decreased and their standard deviations increased when censoring rate increased (Fig. 2). The average reliability of young sires with 90% of their daughters censored was 0.36 and 0.60 for low and high genetic variation, respectively. When indirect information was taken into account (Fig. 3), there was no gain in reliability for old sires (up to 30% censored). However, the gain for young sires was important when genetic variation was high (0.04-0.05), and even more so when it was low (up to 0.10). This can be attributed to the fact that the information of older sires allowed a more accurate estimation of variances and genetic correlations compared with the previous situation (all sires with the same percentage of censoring).

Effect of non zero residual correlations
When non zero residual correlations between longevity and linear traits exist, estimates of genetic and residual variances were again similar to the simulated values (results not shown). In the joint analysis, the sire model gave again virtually unbiased estimates of genetic correlations. However, the estimates from the animal model were clearly biased (Tab. III). The direction of the bias depended on the sign of the residual correlation. The standard deviations were similar to those obtained with zero residual correlation. It should be noted that the estimated residual correlations on longevity trait were not comparable to the simulated ones (Tab. III). The latter ones corresponded to the modelling of log t with a residual variance of π 2 /6 [11]. They were not directly transposable to the pseudo-record scale which assumes a heterogeneous residual variance and corresponds to a modelling of the hazard, not of log t.
Average reliabilities for sires and progeny were similar to those obtained with zero residual correlation (results not shown).

Effect of incorrect univariate analysis
When the univariate analyses were incorrect (for example, because of the existence of a hidden bias in the estimated genetic trend and/or an incorrect modelling of fixed effects), estimated genetic variances (0.0485 for longevity) were slightly increased compared to the reference situation (0.0449) and residual variances were slightly underestimated. The pseudo-records were also biased. A year effect was included in the joint analysis of the data to try to capture this bias. For longevity, the slope of the regression of the year effects estimated with the MT-BLUP approach on year was 0.0221 (sire model) and 0.0223 (animal model) and was very close to the simulated value (0.0224). Similar results were obtained for the year effects of the two linear traits. With the inclusion of this year effect, and in spite of the use of biased genetic and residual variances, all genetic correlations were on average similar to the simulated ones for both estimation models (Tab. III). The standard deviations of these correlations were also similar to those of the reference situation, between 0.141 and 0.171. On the contrary, the average reliabilities for sires and progeny were slightly lower (0.01 to 0.02) than the reference ones.

DISCUSSION
Although conceptually possible, the exact joint analysis of longevity data with early predictors or other functional and production traits is not routinely feasible on large data sets. This is due to the need for very different models for these traits (e.g., accounting for nonlinearity, censoring and non normal residuals for longevity traits) and above all, to the fact that the amount of data to manipulate in national evaluations is tremendous. Despite huge and fast improvements in computing power, computational considerations are still a limiting factor. To avoid this, a less demanding two step approach was proposed by Ducrocq et al. [14] and was checked here via simulation. The results obtained under a sire and an animal model confirmed the suitability of the proposed approach in a wide range of situations.
The two-step approach starts with the estimation of dispersion parameters via univariate analyses (or simpler multivariate analyses of subsets of traits) and the evaluation of all recorded animals to get pseudo-records. These pseudo-records are performances free of all environmental effects that can be used in a BLUP animal model to get the same breeding values as in the Weibull animal model. The longevity pseudo-records and their associated weights can be obtained using an animal model if the dataset is small. However, if there are computational constraints to implement it (i.e., for large national applications) a two-step procedure to get approximate animal solutions based on a sire model is a less demanding alternative. The correlations between both procedures were very high for pseudo-records as well as for weights and, therefore, both of them could be used indistinctly.
Combining these pseudo-records into a multiple trait sire model and fixing genetic and residual variances to the previously estimated values, AI-REML estimates of genetic and residual correlations were virtually unbiased. This confirms the suitability of the multiple trait approach under a sire model for analysing this kind of data. A concern related to the necessary constraint of assuming that genetic and residual variances are known is that potentially biased genetic and residual variances might lead to biased estimates of correlations. In fact, it was found that the multiple trait approach under a sire model is quite robust: if the genetic trend is wrongly estimated in univariate analyses and estimates of variances, pseudo-records and weights are biased, the joint analysis of the data can correct and even estimate this bias by including a time (year) effect. Again, this leads to nearly unbiased estimates of genetic correlations.
The adequacy of the estimation of genetic correlations in the multiple trait animal model was first assessed under the assumption of a zero residual correlation between longevity and linear traits. This assumption is natural when different traits are recorded in different countries, i.e., on different animals, but is no longer satisfying when the traits are observed on the same animals. Then residual correlations can differ substantially from 0 for some pairs of traits [14]. In such a situation, it was found that the genetic and residual correlations should be estimated under a sire model. The correlation between individual pseudo-record residuals is clearly deviating from the true residual correlation. A sire model somewhat averages residuals over progeny of a same sire and is more robust for variance component estimation. Then, the estimated correlations can be used in an MT-BLUP animal model.
After the MT-BLUP evaluation, a gain in true reliability for longevity is observed with respect to the situation when just direct information is used in a Weibull model. This gain is greater when the reliability in the initial Weibull model is lower. However, this gain was very limited in uncensored datasets, at least with the moderate size of our dataset. The increase in reliability was more noticeable when progeny batches with different censoring rates were simulated. Then, the information from older sires allowed an accurate estimation of genetic variances and correlations and the multiple trait evaluation used this information to significantly improve the reliability for young sires (with 90% censored daughters).
The suitability of the two-step multiple trait approach was also assessed under situations where progeny of all sires had an extreme percentage of censoring. The estimation procedure for variances and correlations seems nearly unbiased but very imprecise in extreme cases, e.g., with 90% censoring for all progeny groups and small size datasets. In these extreme situations, the gain in reliability is negligible. But the increase in true reliability due to the joint analysis can be quite important, when the dataset is large enough (e.g., at least 25 000 records) to accurately estimate variances and correlations. These extreme levels of censoring usually exist only for a fraction of the bulls for dairy cattle length of productive life, but apply more generally to the whole population for example in piglet [4], beef calf [19] or laying hen [13] survival.
It was not our intention to compare this approach with other approximate strategies, which, for example, directly combine sets of estimated breeding values using selection index theory. Although computationally more demanding, the strategy proposed here has several attractive features: it accommodates nonlinear traits, it gets as close as possible to a true multiple trait BLUP which has well known theoretical characteristics and it offers a framework to the approximate estimation of genetic correlations between complex traits.

CONCLUSIONS
In conclusion, one can note that the two-step approach is an operational tool that can be implemented in many situations where a multiple trait approach is desirable but not applicable, either because of the huge size of the datasets analysed or the complexity and heterogeneity of the models to be implemented. Applications have been reported for total merit index constructions [14], joint analyses of longevity, discrete and linear traits [1], and joint cow and bull international evaluation [3].