Variance components for survival of piglets at farrowing using a reduced animal model

Farrowing survival is usually analysed as a trait of the sow, but this precludes estimation of any direct genetic effects associated with individual piglets. In order to estimate these effects, which are particularly important for sire lines, it is necessary to fit an animal model. However this can be computationally very demanding. We show how direct and maternal genetic effects can be estimated with a simpler analysis based on the reduced animal model and we illustrate the method using farrowing survival information on 118 193 piglets in 10 314 litters. We achieve a 30% reduction in computing time and a 70% reduction in memory use, with no important loss of accuracy. This use of the reduced animal model is not only of interest for pig breeding but also for poultry and fish breeding where large full-sib families are performance tested.


INTRODUCTION
The reduced animal (RA) model of Quaas and Pollack [6] was introduced to obtain predicted breeding values without the need to set up equations for all animals. This model distinguishes between animals with progeny records (parents) and those without progeny records (non-parents). The section of A −1 pertaining to non-parents is diagonal. Equations are obtained for parents only, then predicted progeny breeding values, equal to the parent average plus an estimated Mendelian effect, are obtained by back-solving. Predicting breeding values is a less computationally demanding task than estimating variance components, and since computing power has increased, this model has lost favour and nowadays is rarely used. Many animal breeders believe that the computational benefits of the RA model are outweighed by the difficulty of setting it up.
When pedigree information is available, variance components are usually estimated by fitting the animal model [8]. In some species, animals with records form large groups (litters) with common parentage. Here we revive the RA model and show how it can be used to estimate genetic and environmental variance components in this situation. An approximate version of the model can be fitted even when the only data recorded are litter mean values of piglet survival. We demonstrate that it is possible to fit an RA model with standard software for variance component or BLUP estimation.
As an example we applied the method to piglet mortality data. Our model treated survival as a trait of the piglet. The genetic influences on the trait consisted of the genetic effect of the piglet itself (the ability of the piglet to survive) and the maternal genetic effect (the ability of the sow to provide a suitable environment for survival of her offspring).
Often data are available only in the form of litter means or totals, such as the number of stillborn piglets and number of piglets born alive. This information is generally analysed using a model which treats survival as a trait of the sow and ignores the direct genetic effect of the piglet. Assuming an average litter size of 10 piglets, such a model is attractive, since the number of animals and number of equations to be solved is reduced to around 10% of the full analysis.
In addition, such a model has fewer genetic effects and is perceived to be more "robust". The drawbacks to this model are that the direct genetic effect has been ignored and breeding values for service sires are not generated.
The analysis described in the next section retains the simplicity of the conventional analysis but has none of its drawbacks.

Statistical methods
We assume observations on k litters of fullsib animals, with n i animals in the ith litter. The total number of animals with records is N = n i . With the RA model, we distinguish between animals with progeny records (parents) and those with no recorded progeny (non-parents). A parent with its own performance record appears twice in the dataset, initially as a member of its birth litter and later as the parent of a litter.
The breeding value for a non-parent is represented as the average breeding value of the sire and dam plus a Mendelian deviation specific to the animal, with variance 1 2 σ 2 a (1− f ), where f is the average inbreeding of the two parents. As an approximation, we ignore the effect of inbreeding on the Mendelian variance, which is taken to be 1 2 σ 2 a . Let p be the number of parents in the pedigree. This number is usually a small fraction of the total number of animals (parents plus non-parents). Let a and m (p × 1) be vectors of breeding values for direct and maternal genetic effects for parents. It is assumed that cov(a) = σ 2 a A, cov(m) = σ 2 m A, where A is the numerator relationship matrix for parents. Equation (1) below describes the RA model for direct genetic effects, with in addition maternal genetic, litter, and fixed effects: The data vector is y (N × 1), Xb are fixed effects, P is an N × p matrix such that P i j = 1 if the ith animal is parent j, zero otherwise, Q is an N × N diagonal matrix with Q ii = 1 when animal i is a non-parent, zero otherwise, S and D are N × p matrices identifying sires and dams, respectively, in the list of parents. The matrix L (N × k) relates records to litters, and the litter effects u have variance σ 2 u . The error vector e has variance σ 2 e for parents and σ 2 e + 1 2 σ 2 a for non-parents (ignoring inbreeding).
Equation (1) suggests one way to set up the RA model to deal with direct genetic effects. For parents, identify the animal but set sire and dam "missing" and for non-parents, set the animal identity missing but give the sire and dam. The matrix P + 1 2 QS + 1 2 QD is then the design matrix for animals "overlaid" by half times the design matrices for sires and dams.
The RA model can be analysed with a BLUP or variance component estimation programme of sufficient flexibility. For example, it must have the capacity to "overlay" design matrices, and treat missing values in design matrices in an appropriate way. More detail is given in the appendix.
It is interesting to explore to what extent the data can be reduced to litter means without loss of information. Inspection of equation (1) shows that simply averaging results by litters will not result in a model that can be dealt with easily. However, for the subset of results relating to non-parents, P = 0, Q = I, and for these records, reduction to litter means is straightforward: whereȳ is a k × 1 vector of litter means for non-parents, andX,S andD are design matrices relating fixed effects, sires and dams to litters, obtained by averaging rows of X, S and D. This involves no loss of information provided any fixed effects which vary within litters are simple factors, such as sex of piglet. For a litter containing n non-parents, the variance ofē is (σ 2 e + 1 2 σ 2 a )/n. The information in the data comprises (i) individual records for parents, (ii) litter means, and (iii) the within-litters variance, with (ii) and (iii) calculated from records for non-parents.
When there are no parents with their own performance records, analysis is based on (ii) and (iii). Overlaying of design matrices can be avoided by regrouping the genetic terms as 1 2 Sa and D( 1 2 a + m). Correlated effects for sire of litter ( 1 2 a) and dam of litter ( 1 2 a + m) can be fitted with variances 1 4 σ 2 a and 1 4 σ 2 a + σ 2 m , and covariance 1 4 σ 2 a . The analysis required is therefore an analysis of litter means, weighted by litter size, with correlated random effects for sire and dam of litter, and the covariance constrained to equal the sire variance. This allows estimation of the direct genetic variance σ 2 a (as 4 times the sire variance), and the maternal genetic variance σ 2 m (as the difference between the dam and sire variances). This analysis is possible even when litter means are the only data available. If there is a large number of litters, and sufficient variability in litter sizes, it will also be possible to estimate the litter variance σ 2 u separately from the residual variance σ 2 e + 1 2 σ 2 a , although it may be poorly estimated.
If data are available for individual animals, the within-litters mean square provides a separate estimate of the residual variance, and the litter variance σ 2 u can then be estimated much more accurately. The within-litters mean square is always available with 0/1 data, since it can be calculated from the litter means (proportions).
When there are parents with their own performance records, each is included in the weighted analysis, with weight one, separately from the rest of its birth litter. It is linked to its birth litter through its litter, sire and dam records, and its breeding value is linked to its own litters through the overlaying of animal, sire and dam design matrices described above and in the appendix. We refer to this as the exact RA analysis.
The exact analysis requires identification of parents, overlaying of design matrices, and the use of various devices described in the appendix. The simpler analysis appropriate when there are no parents with their own records can be applied to all progeny, ignoring the distinction between parents and nonparents. This analysis will often capture most of the information in the data, and might be considered as a useful approximation when the number of parents with records is a small fraction of the total number of animals, or if litter means, such as percentage survival per litter, are the only data recorded. We refer to this as the approximate RA analysis.

Example
The method is illustrated by estimating genetic and environmental components of variance associated with farrowing survival. Survival data were collected between 2000 and 2003 from two farms and four pig lines. Pedigree information was available on the parents of the litters.
A piglet born alive was scored as 1, a stillborn or mummified piglet as 0. The trait is therefore the probability of being born alive among all piglets (including stillborn and mummified piglets). The model included a direct genetic effect for each piglet, maternal genetic and litter effects. The fixed effects included were parity of litter, farm, line and farrowing date (as a separate smoothing spline for each farm-line combination).
Since genetic and environmental variances for this trait are small, we did not consider it necessary to make any special allowances for the binary nature of the trait. The analysis is therefore on the 0/1 scale. This also facilitated comparisons with previous work.
There were observations on 118 193 piglets in 10 314 litters of 6592 sows and 844 service sires (an average 11.5 animals per litter). Amongst the 7436 parents, 3565 had their own performance records for the litter in which they were reared, and there were 19 850 animals in the parents' pedigree.
The animal model was fitted in the usual way. In addition we fitted the exact and approximate versions of the RA model, using the same or equivalent starting values. Timings and memory use were compared for the three methods. All calculations were performed using ASReml [2].

RESULTS
All three analyses converged without problems in between 5-10 iterations. The results for the animal model and both forms of the RA model were virtually indistinguishable. Estimates and standard errors are given in Table I.
Average farrowing survival was 88.3%. The residual variance made up 93.6% of the total phenotypic variance. Of the remaining 6.4%, 3.9% was litter variance and the remainder was approximately equally divided between direct (1%) and maternal (1.5%) genetic components (Tab. I). Table II gives the number of equations, the set-up time, time per iteration, and memory requirement for each model. Both forms of the RA model led to

DISCUSSION
The full and reduced animal models are exactly equivalent, provided that full account is taken of the effect of inbreeding [4]. This requires a separate residual variance σ 2 e + 1 2 (1 − f )σ 2 a for each litter, where f is the average inbreeding coefficient for the two parents. The average parental inbreeding in our data wasf = 0.061, but our calculations effectively assume that f = 0 for all litters. Since the results for the animal model and the RA model were extremely close, there appears to be no need to allow for this small amount of inbreeding.
The main advantage of the RA model over the full animal model is the reduction in the number of equations that have to be set up and manipulated by the programme. There is some advantage in the reduction in the size of the A-inverse that must be calculated, but this does not result in dramatic improvements in speed or memory use, since most programs will exploit the sparseness of the large A-inverse associated with the full model. A secondary advantage of the RA model is its greater transparency, in that the sources of information for various components of variance are more obvious. This may be useful when there is difficulty in obtaining convergence or uncertainty about whether to include a particular term in the model.
The disadvantages of the RA model are associated with the difficulties of setting it up. However, this does not apply to the approximate form of the model, which will often be a satisfactory substitute for either the full animal or the exact RA models.
The problem of analysing data recorded as full-sib group means was considered by Simianer and Gjerde [7]. They proposed an analysis based on averaging the relationship matrix for all animals by blocks defined by full-sib groups, but did not consider the possibility that parents might have both progeny and their own performance records. The approach described here based on the RA model is simpler and easier to implement with standard software.
With the approximate version of this model, it is possible to estimate a direct and maternal genetic effect even when performances are recorded as litter means or percentages. This substantially reduces the cost of recording for traits such as farrowing survival, without important loss of accuracy, as shown by the present study. In addition, the direct genetic effect of farrowing survival in sire lines can be estimated from mean litter records, often the only data available. This is important for sire lines because in a crossbreeding scheme only the direct genetic effect will have influence on the survival of piglets in commercial production. The RA model is particularly useful when litter sizes are large and only a small fraction of animals are selected to create the next generation. Therefore, the RA model is not only of interest for pig breeding but even more important for poultry and fish breeding where large full sib families are performance tested. With a model which includes both direct and maternal genetic effects, the number of equations for the RA analysis is reduced, relative to the full animal model, by twice the total number of non-parents. For example, an analysis of poultry data might involve 10 000 full-sib families of average size 25. If there are on average 2 parents per litter, there is a saving of 460 000 equations, with concomitant reductions in run-time and memory requirement. Even greater reductions can be expected with typical fish data sets. Although the emphasis of this paper is on methodology, we now briefly discuss the results on variance components for farrowing survival.
Our results were in general agreement with previous research [3,5]. Grandinson et al. [3] found that the direct effect was negligible but estimated maternal genetic and litter variances similar to those of the present study. Knol et al. [5] analysed sire and dam lines on different farms and found a small direct genetic effect.
Some comparisons are given in Table III. The average farrowing survival rate was smaller than that found in the other studies. This may reflect a difference in the definition of the trait. Mummified piglets were included in our calculations, but it is not clear whether they were included in the other studies. All four analyses give the same litter variance, and there is approximate agreement on the size of maternal effects. Our estimate of the direct genetic component was larger than the estimate of Knol et al. [5].
Residual and total phenotypic variances were larger, perhaps reflecting the different average survival rates.
Our analysis confirms the previous finding (e.g., Knol et al. [5]) that there is a small but statistically significant direct genetic effect associated with the individual piglet.
Fitness traits in livestock are generally of low heritability and the accuracy of breeding values for these traits is often increased by including information from correlated traits of higher heritability [1,9]. Multivariate analysis is required to estimate the parameters needed for such an analysis but for t traits in a multivariate analysis we need to estimate 1 2 t(t+1) genetic and residual parameters. In addition, the number of equations for genetic effects increases linearly with the number of traits, making multivariate analysis very demanding of computing resources. The analysis described here yields the same parameters as a full animal model but does so with a smaller requirement in computing time and memory. The advantages of this analysis are likely to increase with the number of traits, and we suggest that the reduced animal model has a useful role in estimating variance components in both a univariate and a multivariate context.

Linking two occurrences of a parent
As an example, consider a litter of sire 1 and dam 2, comprising four animals, three of which are born alive (score = 1) and one is stillborn (score = 0). Two of the survivors (animals 3 and 4) subsequently become parents. For the full animal model, these four animals would appear in data and pedigree files as In this case the pedigree file involves only parents. For a parent in its birth litter, animal ID is specified, sire and dam are set missing. In all other cases, animal ID is set missing, and sire and dam of the litter are specified. The non-parent results can be reduced to litter means, in which case this litter contributes three rows to the data file:

Parameter constraints
Let θ 1 = σ 2 a , θ 2 = σ 2 e , θ 3 = σ 2 e + 1 2 σ 2 a . All three variance parameters appear in the model, and we must impose the constraint This problem does not arise when using the RA model for BLUP calculations, because in that case the variance ratio σ 2 a /σ 2 e is fixed. ASReml allows proportionality constraints but not the more general form of constraint (3). We found various solutions to this problem.
(a) Instead of having separate residual variances for parents and non-parents, we let σ 2 e be a residual variance for all records, and set up an extra variance component specifically for non-parents. There is a simple proportional relationship between the extra component ( 1 2 σ 2 a ) and the additive genetic variance (σ 2 a ). A disadvantage of this approach is that it does not reduce the number of effects estimated compared to the full animal model. Each parent has an effect, and each non-parent a Mendelian component. Nevertheless we found some reduction in run times (around 10-20%) with this approach, perhaps due to the smaller A-inverse used (parents only). An advantage of this approach is that BLUP for non-parents are easily reconstructed as the sum of the average parental BLUP and the Mendelian component. (b) Method (a) can be modified so that σ 2 e + 1 2 σ 2 a is the residual variance for all records, with an "extra" variance (− 1 2 σ 2 a ) specifically for parents. This is constrained to be −0.5 times the additive genetic variance, and the default positivity constraint is removed. This adds one extra equation for each parent, but the number of extra equations is generally much less than for method (a). (c) ASReml allows a user-supplied "OWN" variance structure. It is possible to define a residual variance for non-parents (σ 2 e + 1 2 σ 2 a ) in terms of two parameters, σ 2 e and σ 2 a , to be estimated separately. The first is constrained to be equal to the residual variance for parents, the second to the additive genetic variance. (d) A Lagrange multiplier can be used for constraints such as (3). Unfortunately this method is only feasible if programmed internally. Otherwise it requires several programme runs. Some form of ad hoc adjustment is required if variance components stray outside the parameter space at any stage.
Write θ for the vector (θ 1 , θ 2 , θ 3 ), H θ = 0 for the constraint, and denote the score vector and information matrix by S and B.
With an unconstrained analysis, the update step is θ → θ + δ, where δ = B −1 S. The constraint can be accommodated by starting with a value of θ satisfying the constraint and modifying the update step to δ = B −1 (S − Hλ) All computing times reported in this paper are for method (b).
To access this journal online: www.edpsciences.org