Estimation of genetic parameters of egg production traits of laying hens by restricted maximum likelihood applied to a multiple-trait reduced animal model

Variance components for egg production traits (No of eggs produced between 19 and 26, 26 and 38 and 26 and 54 wk of age), egg characteristics (average egg weight at 2 different ages and egg density) and body weight of hens at 40 wk of age were estimated in two strains of a breeding company by univariate and multivariate Restricted Maximum Likelihood (REML) applied to a Reduced Animal Model (RAM). To allow tridiagonalization of the coefficient matrix when RAM is considered, the approach of Thompson and Meyer (1990) using an imaginary random effect with negative variance

The purposes of this study were: 1) to estimate genetic parameters of 7 correlated egg production traits by REML applied to a multiple-trait reduced animal model; and 2) to show the application of some state-of-the-art techniques which make estimation possible in strains of laying hens with large numbers of birds.

Data and traits description
Data, including records of 165, 748 and 47, 115 survivor laying hens for strains A and B respectively, were supplied by the &dquo;Institut de Selection Animale-ISA&dquo;. For both strains, these records represented 6 generations of hens.
Traits considered in this analysis were related to egg production (number of eggs produced between 19-26 wk of age (P l ), 26-38 wk (P Z ) and 26-54 wk (P 3 )) and egg characteristics (average egg weight at 2 different ages (EW I , EW 2 ) and egg density (ED)). Weight of hens at 40 wk of age (W 4o ) was also included in the analysis. P I can be considered as being a combination of sexual maturity and early egg production. ED was a measure of the shell strength determined by specific gravity. P i , P 2 and P 3 variables exhibited markedly skewed distributions. They were transformed into new variables, satisfying the classical hypotheses for describing traits with polygenic inheritance via a linear model with normal error, using a power transformation (Box and Cox, 1964). This transformation relies on a single parameter t and has the following form (Ibe and Hill, 1988): were y is the geometric mean of the original observations. The parameter t was empirically chosen in such a way that several normality criteria, such as the low residual sum of squares of the genetic model used to describe the data, the linearity of half-sib on individual regression, the coefficient of symmetry and the Kolmogorov-Smirnov test for normality of the residuals were satisfied as closely as possible and simultaneously, as proposed by Ibe and Hill (1988) and detailed in Besbes et al (1992).

Model of analysis
The model describing the records is the following multiple-trait animal model: where Y is the vector (n x t) of observations for the t egg production traits considered in the multiple-trait analysis (t = 7), b is the vector ( f x t) of fixed contemporary groups ( f was equal to 107 and 70, for strains A and B respectively), a is the vector (n x t) of random additive genetic effects associated with the animal's traits, e is the vector (n x t) of residuals, X and Z are known incidence matrices associated with b and a and ( * ) indicates direct (Kronecker) product.

It was assumed that
A is the relationship matrix between animals. G and R are unknown genetic and residual (co)variance matrices between the t traits considered.
Computing strategy for genetic parameter estimation The multiple-trait animal model [2] had one random effect and equal design matrices for all traits which were recorded for all animals (no missing records). Canonical decomposition was then applied to yield new uncorrelated variables without loss of any information contained in the original variables. This transformation was first suggested for animal breeding problems by Thompson (1976, cited by Jensen and Mao, 1988). The transformation matrix Q, is chosen such that (Quaas et al, 1984): where G c is a diagonal matrix and I nt is the identity matrix. After transformation to the canonical scale, model [2] becomes: The subscript c refers to the canonical scale. This transformation reduced the multivariate analysis to a series of univariate analyses and consequently, drastically decreased computational costs.
These computational costs were also lowered by reducing the number of equations. Since parents represented only 9% and 8% of the total number of animals, for strains A and B respectively, the reduced animal model (RAM) of (auaas and Pollak (1980) was used. With RAM, all the equations corresponding to animals which were nonparents were absorbed into the remaining equations. As a consequence, the size of the system was brought down to the number of parents.
Considering model [3] for the pth trait, with RAM, the vector Y c , was divided into parents (denoted by subscript p) and nonparents (denoted by subscript n) as follows (subscript 1L is dropped for clarity): with e:t c = e!!-!!nc('!nc being the Mendelian samplign on the canonical scale) and P! is a matrix of 0 and 1 relating nonparents to their parents. Random variables in [4] have the following (co)-variance structure: C7! is the pth diagonal element of G c . A n is a diagonal matrix whose elements are equal to 1/2 if both parents are known and 3/4 if one parent if known. It was assumed that there is equal parental information for all nonparents and that parents are not inbred. If this assumption is not satisfied, nonparents with unknown parents can have dummy parents (Thompson and Meyer, 1990).
Since REML is an iterative procedure, the major cost was the need, at each iteration, of direct inversion of a matrix of size equal to the number of parents. This burden was reduced by tridiagonalizing the coefficient matrix through a series of orthogonal transformations, as proposed by Smith and Graser (1986). This transformation, however, was not directly applicable because, as shown in !5!, RAM generates heterogeneous residual variances between parents and nonparents. To overcome this problem, Thompson and Meyer (1990) reparameterized [4] adding an imaginary effect, e, with negative variance; e j = iwa d , (with i 2 = -1), was chosen such that ep c -e j and e!c had the same variance, hence var(e j ) = -w2(T!Ip (with w 2 being an element of An). Hence, the mixed model equations corresponding to model [6] can be written as: where a is the pth trait's variance ratio cr!/o-!, with CT! = 1 +(.¡)2CT! an element of the new canonical residual matrix R e . For estimating the (co)variance components, equations for fixed effects are absorbed. The resulting system has equations of the form (Thompson and Meyer, 1990): Further simplification of [8] was achieved by eliminating A!,1. This involved the Cholesky decomposition of the numerator relationship matrix (Ap = LL') and preand post-multiplication of the left hand side by L' and L and multiplication of the right hand side by L' (Smith and Graser, 1986). The solutions for ap e and a d are of the form: As shown in !9!, this method led to the tridiagonalization of a complex matrix of size twice the number of parents. After the tridiagonal matrix T (for T such that T = PHP', where P is an orthogonal matrix) was found and using algebra similar to that of Smith and Graser (1986) system [9] becomes: Then, we iteratively calculate the canonical REML estimates of the (co)variance matrices, G c and R e , between all traits using an Expectation-Maximization (EM) type algorithm (Dempster et al, 1977;Harville, 1977).
For the (!7 + 1) round of iteration, estimators of the elements in G c are: and those in R c are: where N is the total number of observations, f is the number of levels of the fixed effect, p' is twice the number of parents and Y §'Y § is the quadratic form of the canonical observations after absorption of the fixed effects. It has the following expression: The single trait analysis corresponds to the special case where Q = I (Hence G and R are diagonal), and the EM-REML equations are those in [11] and !13!.
Sampling procedure Despite these cost-reducing techniques (canonical decomposition, use of RAM, tridiagonalization) the time-consuming tridiagonalization and above all the amount of computer memory required, prohibited the application of REML estimation to the whole population. Therefore, 15 samples, reflecting as well as possible the population's structure and selection, were drawn in each strain from the data file in the following manner: 1. Choose S sire (S = 5) at random among the youngest parents.
2. Include the sire and dam of each selected sire.
3. For each of the selected sires S i in (1) and (2), choose 3 sires S j at random among those whose offspring are contemporary to those of S i . 4. For each of the sire selected in (1) to (3), choose 3 females at random among its mates.
5. Repeat step (2) to (4) until the generation which is assumed to be the base population has been reached. 6. Include all offspring of all the selected matings in the sample.
As mentioned above, the system to solve is of size twice the number of parents.
This, depending on the computer capacity, limits the sample's size. The values 5 and 3 in steps 1,3 and 4 were chosen by trial and error in order to obtain a maximum of 800 parents, corresponding to sampling rates of m 5 and 17% respectively for strains A and B, which is as many as can be handled.
This sampling procedure has the following characteristics: only the pedigrees of males are complete.
Step 3 leads to the inclusion of hens contemporary to those hens whose records were used to select sires. Hence, selection on the male side is (approximately) accounted for. This is not the case on the female sire. However, the change from one generation to the next in the expected value E(a) = g of the genetic merit of hens, due to selection, is accounted for since the levels of the fixed effect (contemporary groups) are defined within generation. In other words, this expected value, considered as fixed, using the approach of Westell (1984) or Quaas (1988) is completely confounded with the contemporary group effect. Only the effect of selection on the female side on the genetic variance is not accounted for as usually indicated when using animal models.
As already mentioned, only the pedigrees of males are complete. Hence, neither Henderson's rules (1976) for computing the numerator relationship matrix A or its inverse A -I , nor Quaas's algorithm (1989) to eliminate A-1 from mixed model equations used in REML could be utilized. A and its Cholesky factor L(A = LL') were therefore calculated directly considering the pedigree available for the entire population (Tier, 1990).

RESULTS AND DISCUSSION
The average numbers of parents per sample were 710 and 600 for strains A and B respectively, which represented samples of 7000 and 6500 recorded hens partitioned into 107 and 70 contemporary groups. The tridiagonalization of the coefficient matrix of size twice the number of parents required respectively 113 and 60 min of CPU time on an IBM 3090-17T. This represented 98% of the total computational time needed for genetic parameter estimation for such samples.
As a consequence of a smaller number of sires per generation for strain B, the corresponding samples overlapped in a higher proportion. On average, 38% of parents were common to any 2 samples vs 17% for strain A. This overlap was neglected when computing means and standard deviations of the estimates obtained from the 15 samples.
As a consequence of working with imaginary terms, the algorithm proposed by Thompson and Meyer produced a tridiagonal matrix T with some negative eigenvalues. When by chance, during the EM iteration, the variance ration a was very close to one of those negative eigenvalues, the matrix of system [10] was singular. This led to an infinite trace, preventing the EM-REML from converging.
When this occurred, it was necessary to &dquo;jump&dquo; over the value a to avoid numerical problems.
In any case, after the tridiagonal matrix was found, the equations of the EM could be solved in linear time and any number of iterations was easily performed (Smith and Graser, 1986). However, the number of rounds needed to reach convergence varied from sample to sample (from 120 to 400 here).

Single trait genetic parameter estimates
Heritability values (table I) exhibited very small differences between strains A and B. The highest difference was observed for the number of eggs produced between 26 and 38 wk of age: 0.09 for strain A vs 0.13 for strain B. However, it should be noted that for similar heritabilities, strain B presented, in general, larger additive genetic and residual variances.
These estimates also showed that egg production traits (P l , P 2 and P 3 ) are much less heritable than egg characteristics or body weight. Heritabilities of egg production traits were lower than usually reported in the literature, especially those for P 2 and P 3 . Regarding heritability of P i , a combined trait of both sexual maturity and early egg production, it was roughly within the range of Liljedahl and Weyde (1980) estimates but closer to that of egg production. We should, however, be careful with such comparisons since most reported heritabilities and variance components were obtained on the original scale without performing any transformation but using methods assuming normality of the data distribution. Strictly speaking, such results should be interpreted with caution.
The purpose of the Box-Cox transformation, applied for P l , P 2 and P 3 is then to change the scale of measurements in order to make the analysis more valid.
Besbes et al (1992) showed that this transformation resulted in an increase of all heritabilities without drastically modifying the genetic and residual correlations between these traits. Heritability values of egg and hen weight, though rather small, remained within the range of estimates reported by King and Henderson (1954, cited by Kolstad, 1980, Kolstad (1980) and Sorensen et al (1980). The same trend was observed for the estimates of egg density (specific gravity).

Multiple-trait genetic parameter estimates
As shown in table II, heritabilities obtained by multivariate EM-REML were very close to those obtained by single trait analysis. This result was in agreement with that of Colleau et al (1989).
The comparison of genetic correlations of both strains revealed quite similar global trends concerning the sign of these correlations. But those of strain A were, in general, smaller in absolute values.
These correlations showed a rather small antagonism between the group of egg production traits (P I , P 2 and P 3 ) and egg density (ED) on one hand and that of average egg weights on the other hand. The largest antagonism was observed between egg weight (EW l ) and egg density (-0.17 and -0.23 respectively for strain A and B) but remained rather small. In the literature, there is a large variation in the reported scale of the genetic correlation between number of eggs produced and average egg weight. For Sorensen et al (1980), this correlation was -0.32 and -0.17 depending on the population. Liljedahl and Weyde (1980) reported an evolution of this correlation from slightly positive and non significant values in the base population to markedly negative ones in selected lines. These results were in contrast with those of Kolstad (1980) which showed some evidence for a decline in the genetic correlation from -0.18 in the base population to -0.11 in the selected lines. Nearly all these correlations and heritabilities were estimated by analyzing the resemblance between paternal half-sibs and separately for each generation and trait. Hence, they were biased by selection. This was theoretically not the case with our REML estimation. However, it must be said that our &dquo;base&dquo; population was already a selected one and, also, the sampling procedure could not take all the selection process into account.
From table II, it can be seen that correlations between egg number and body weight of hens varied considerably among strains (the correlation between P 3 and W40 was 0.25 for strain A and -0.12 for strain B) which was also found by Kolstad (1980). This author reported that an intermediate body weight is optimum for reproduction, of which egg production is a component, and hence one should expect these correlations to be positive or negative depending on whether the weight is below or above the optimum of the population. Therefore, it might be thought that strain A was closer to an optimum for body size than strain B. Great variations of these correlations were also observed among the 3 recording periods (they varied, in these case of strain A, from -0.09 for P l to 0.25 for P 3 while correlations between egg and body weight were positive and consistently moderate in all cases.
T ble II also shows that traits within each group of traits were positively correlated. The highest correlations were between P 2 and P 3 for the first group of traits, respectively 0.66 and 0.78 and EW I and EW z for the second group, 0.92 and 0.94. To some degree, ELI' 1 and EW z were measurements of the same characteristic.
T!e correlation between P 2 and P 3 is lower than the literature average values (0.79 and 0.85, respectively for selected and unselected White Leghorn breed) obtained by Fairfull and Gowe (1990). This is probably due to a low correlation between P 2 and the residual egg production (38-54 wk of age). Hence, P 2 and P 3 cannot be considered as measuring the same trait even if they overlap.
The positive correlations between P I and the other egg production traits (P z and P 3 ) indicate that this trait has to be considered as an egg production trait rather thanl a sexual maturity trait (sexual maturity is usually negatively correlated with the number of eggs). The nonsignificant correlation between these traits and shell quality was also reported by Kolstad (1980). However Fairfull and Gowe (1990) obtained, for selected White Leghorn, a negative correlation of m -0.2 between survivor egg production from housing to 40 or 55 wk of age and specific gravity.

CONCLUSION
The present study showed a rather small antagonism between egg production traits and egg weights and a positive correlation between traits within the same category.
Comparison of the heritabilities obtained with univariate and multivariate RE1V$L estimation showed no significant differences.
By employing a series of transformations in the variance components estimation process, especially the tridiagonalization of the system proposed by Thompson and ,Meyer (1990), the problem of slow convergence of the EM algorithm was largely alleviated since the computations necessary at each round of iteration were performed in linear time. This allowed us to study large samples with an acceptable computational time. However, this algorithm encountered numerical problems, depending on the matrix structure, that potential users should be aware of. ; Tue animal model used involved additive genetic effects only. The complexity of calculations precluded the use of a more sophisticated model including eg nonadditive, maternal and common environmental effects. Comparison of sire and dam components estimated elsewhere (Besbes, 1992) revealed very small differences between these 2, which support the choice of model !2).