Breeding value prediction for production traits in layer chickens using pedigree or genomic relationships in a reduced animal model

Background Genomic selection involves breeding value estimation of selection candidates based on high-density SNP genotypes. To quantify the potential benefit of genomic selection, accuracies of estimated breeding values (EBV) obtained with different methods using pedigree or high-density SNP genotypes were evaluated and compared in a commercial layer chicken breeding line. Methods The following traits were analyzed: egg production, egg weight, egg color, shell strength, age at sexual maturity, body weight, albumen height, and yolk weight. Predictions appropriate for early or late selection were compared. A total of 2,708 birds were genotyped for 23,356 segregating SNP, including 1,563 females with records. Phenotypes on relatives without genotypes were incorporated in the analysis (in total 13,049 production records). The data were analyzed with a Reduced Animal Model using a relationship matrix based on pedigree data or on marker genotypes and with a Bayesian method using model averaging. Using a validation set that consisted of individuals from the generation following training, these methods were compared by correlating EBV with phenotypes corrected for fixed effects, selecting the top 30 individuals based on EBV and evaluating their mean phenotype, and by regressing phenotypes on EBV. Results Using high-density SNP genotypes increased accuracies of EBV up to two-fold for selection at an early age and by up to 88% for selection at a later age. Accuracy increases at an early age can be mostly attributed to improved estimates of parental EBV for shell quality and egg production, while for other egg quality traits it is mostly due to improved estimates of Mendelian sampling effects. A relatively small number of markers was sufficient to explain most of the genetic variation for egg weight and body weight.


Background
During the first decade of the 21st century, there has been a rapid development of genomic selection tools. Through the application of genomic selection [1], marker information from high-density SNP genotyping can increase prediction accuracies at a young age, shorten generation intervals and improve control of inbreeding [2], which should lead to higher genetic gain per year. Many simulation studies have shown the benefits of this technology, depending on heritability, number and distribution of effects of QTL, population structure, size of training data set used to estimate SNP effects, and other factors [3]. However, studies on real data are still scarce. If practical application of genomic selection is to be implemented in chicken breeding, as already done for dairy cattle [4], it must prove its advantage over traditional methods and be used in a way that maximizes the use of available information. The accuracy of EBV derived from large numbers of markers for within-breed selection is difficult to evaluate analytically and must be validated by correlating predictions to phenotype in the target population (usually the generation following training).
One of the challenges in genomic prediction of breeding values is that not all phenotyped individuals are genotyped. One approach to exploit all available information is to first estimate breeding values of genotyped individuals by pedigree-based methods using all data, including phenotypes on non-genotyped relatives, and then use deregressed estimates of those EBV for marker-based analyses [5,6]. This two-step approach may, however, result in suboptimal use of information. Another recently developed method uses a combined pedigree and genomic covariance matrix, which can incorporate both genotyped and non-genotyped animals [7,8]. However, these methods are computationally demanding and require careful scaling of the genomic relationship matrix to be consistent with the pedigreebased relationship matrix. The reduced animal model was proposed by Quaas and Pollak [9] to make breeding value prediction under the animal model less computationally demanding. It fits the full relationship matrix for parents and absorbs the equations for non-parents. Nowadays, the development of powerful computers makes the reduction of computing cost less relevant for pedigree-based analyses but the reduced model can also be used to exploit marker-based relationships. In breeding programs using marker information, individuals that have been used for breeding (i.e. parents) are more likely to be genotyped than unselected non-parents. Estimating breeding values for genotyped animals and absorbing non-genotyped progeny into their equations can make full use of all available data. With this approach, there is no need to construct the inverse of the combined pedigree and genomic covariance matrix of Legarra et al. [7].
The objectives of this study were to implement a reduced animal model to estimate breeding values using high-density SNP genotypes, to evaluate the accuracy of breeding values estimated using high-density SNP genotypes in the generation following training in a layer breeding line, and to compare the accuracy of alternative methods of breeding value estimation.

Data
Data on nine traits collected during the first 22 weeks of production were recorded on 13,049 birds from five consecutive generations in a single brown-egg layer line: egg production (ePD, percent hen average); age at sexual maturity (eSM, d); weight of the first three eggs laid by the hen (eE3, g) and shell color (eC3) collected from same eggs by Chroma Meter that measures lightness (L) and hue (as a function of a red-green (a) and a yellow-blue (b) scale). A second set of egg quality traits collected at 26-28 weeks (early, e) included average weight of eggs (eEW,g); egg color (eCO) eggs; shell quality measured as puncture score -a non-invasive deformation test averaged over points of the shell (ePS, Newton); albumen height (eAH, mm); and yolk weight (eYW, g). For birds selected on the basis of early (e) trait data, also late (l) production (42-46 weeks of age) traits were recorded: body weight (lBW, g); egg production (lPD, percent hen average); puncture score (lPS, Newton); egg weight (lEW, g); albumen height (lAH, mm); egg color (lCO, Lab); and yolk weight (lYW, g). Early and late egg quality measurements were averages of records on three to five eggs. In total 2,708 animals were genotyped for 23,356 segregating SNP (minor allele frequency >0.025; maximum proportion of missing genotypes <0.05; maximum mismatch rate between parent-offspring pairs <0.05; parentage probability >0.95), using a custom high-density Illumina SNP panel. Of the genotyped animals, 1,563 were females with individual phenotypes and 1,145 were males without phenotypes. The genotyped set included sires and dams used for breeding in generations 1 to 5 and some progeny from generation 5. Breeding values were estimated for two stages of selection. To represent selection at a very young age, when own performances and phenotypes on female sibs were not yet available, training used all phenotypic data excluding generation 5, and validation was performed on 290 genotyped female individuals from generation 5. To represent selection of males at a later age, when phenotypes on female sibs are available, phenotypes of 2,167 non-genotyped hens from generation 5 were added to the training data but validation individuals were unchanged. A basic description of these data is given in Table 1.

Statistical analysis
Because of the data structure, a reduced animal model was applied with all parents genotyped and many nongenotyped non-parent progeny with phenotypes. In this approach, a distinction is made between genotyped individuals, including all parents, for which the full relationship matrix is fitted, and non-genotyped non-parent individuals. The following model was applied, following White et al. [10]: where y is the (Nx1) vector of observations, b is the (25 × 1) vector of generation-hatch-line fixed effects, X is the (Nx25) incidence matrix for fixed effects, a is the (px1) vector of breeding values of genotyped individuals, with variance-covariance matrix G  a 2 , P is the (N × p) matrix with element ij = 1 if the ith observation is on genotyped individual j, zero otherwise, Q is an (N × N) diagonal matrix with element ii = 1 if observation i is on a non-genotyped individual, zero otherwise, S and D are (N × p) incidence matrices with elements in rows for non-genotyped individuals that correspond to the columns identifying sires and dams set to 1, and zero's elsewhere.
e is the (Nx1) vector of random errors which has variance  e 2 for observations on genotyped individuals and   e a 2 2 1 2 + for observations on non-genotyped individuals, ignoring the effect of parental inbreeding on Mendelian sampling variance in progeny. Population size and avoiding the mating of close relatives insured low inbreeding in this population. Furthermore, variance component estimates from a full animal model and the reduced animal model described above, using pedigree relationships, were very close. Thus, ignoring the effect of parental inbreeding on Mendelian sampling variance in progeny is expected to have a negligible impact on results.
Three models were used to predict breeding values of individuals in generation 5: 1) PBLUP -Reduced animal model using pedigree relationships.
2) GBLUP -Reduced animal model using markerbased relationships for genotyped birds, with covariance matrix derived by the method of VanRaden [11], using allele frequencies based on all genotyped animals.
3) Bayes-C-π -A genomic prediction method similar to Bayes-B of Meuwissen et al. [1], except for the estimation of the proportion of SNP with zero effects (π) and assuming a common variance for all fitted SNP, with a scaled inverse chi-square prior with ν a degrees of freedom and scale parameter S a 2 , as described by Habier et al. [12]. The prior for π was uniform (0,1). The chain length was 160,000 iterations, with the first 50,000 excluded as the burn in period. In this analysis, the average genotype (number of 'B' vs. 'A' alleles) of the genotyped parents was used to fit SNP genotype effects to the preadjusted mean performance of their non-genotyped progeny. To account for different residual variances for progeny means, residual variances were scaled where p is the number of phenotypes included in the mean [5].
All models included the fixed effect of hatch within generation, either fitting it in the model (for PBLUP and GBLUP) or pre-adjusting the data by subtracting solutions from a single trait animal model that included all observations and pedigree relationships (for Bayes-C-π). The PBLUP and GBLUP analyses were performed using Early (e) traits recorded at 26-28 weeks of life: egg production (ePD); age at sexual maturity (eSM); shell quality (ePS); weight of first 3 eggs (eE3); color of first 3 eggs (eC3); egg weight (eEW); albumen height (eAH); egg color (eCO); and yolk weight (eYW); late (l) traits recorded at 42-46 weeks: body weight (lBW); egg production (lPD); egg weight (lEW); albumen height (lAH); egg color (lCO); and yolk weight (lYW).
ASREML [13] and Bayes-C-π using GenSel [12]. The correlation between EBV with hatch-corrected phenotype (as described above) in the validation data sets divided by square root of heritability and regression of hatch-corrected phenotype on EBV were used as measures of accuracy and bias of EBV, respectively. Another comparison of methods was based on selecting the top 30 individuals from the 290 available for validation based on EBV for each trait and comparing the average hatch-corrected phenotype of the selected individuals. Marker based parental average (PA) EBV were also calculated for animals in the validation sets to evaluate the extent to which improvements in accuracy with use of markers resulted from more accurate estimates of Mendelian sampling terms versus more accurate EBV of the parents. This was possible in this population because parents of both sexes were genotyped. To check if combining marker-based estimates with PA increases accuracies of estimates, as suggested by VanRaden et al. [6] for dairy cattle, linear regression of pre-adjusted phenotypes on PA and genomic EBV was performed; if GEBV capture all pedigree information, then adding PA to the regression model is not expected to increase the ability to predict phenotype in validation animals.

Results and discussion
Estimates of heritability from single-trait pedigree-based animal models fitted to the whole data set are shown in Table 2. Estimates were low to moderate for production and shell quality and moderate to high for all other egg quality traits, as expected. Estimates of heritability for early traits were higher than for the corresponding late traits. Variance components for the late traits may be biased because only selected birds had the opportunity to obtain phenotypes for these traits.

Accuracy of marker-based EBV
Marker-based EBV had, in general, a higher predictive ability than estimates using pedigree relationships (Figures 1 and 2) for all traits and for early and late selection scenarios. The advantage of GBLUP over PBLUP is due to the fact that realized marker-based genetic similarity between animals deviate from pedigree-based relationship coefficients. In addition, markerbased EBV are not affected by pedigree errors, although they are affected by genotyping errors and errors in DNA sample identification. As shown in Figure 3, marker-based relationships varied substantially around pedigree relationships. The regression of marker-based on pedigree-based relationships was 0.88 for all individuals and 0.97 for validation individuals, demonstrating on average good agreement between both types of relationships. The correlation between the two relationship measures was 0.68 and 0.72 for all and validation individuals, respectively. The difference in accuracy between GBLUP and PBLUP was smaller for selection at a later age than at an early age, when data on sibs of selection candidates were available (Figures 1 and 2). This extra information increased the accuracy of all methods and particularly of PBLUP. Using marker-based relationships increased accuracies up to over two-fold for early selection and by up to 88% for late selection. Proportionally, the highest gain in accuracy was achieved for traits with the lowest heritability. Accuracies obtained with GBLUP were on average slightly larger than those with Bayes-C-π. Several simulation studies have shown that the accuracy of Bayesian methods is higher than that of GBLUP [1,14,15] but a simulation study reported by Daetwyler et al. [16] has shown that the relative performance of GBLUP depends to a large extent on the genetic architecture of the trait. Also, studies on real data in dairy cattle have shown that GBLUP can be equally accurate or even superior in prediction for traits for which no individual QTL explains a large proportion of the variation [17,18].
Correlations for selection at an early age between EBV obtained by PBLUP and GBLUP ranged from 0.48 to 0.70 across the traits; from 0.46 to 0.71 between EBV from PBLUP and Bayes-C-π; and from 0.79 to 0.97 between EBV from GBLUP and Bayes-C-π. This indicates that reranking of top individuals is very likely between pedigree-and marker-based methods but limited between GBLUP and Bayes-C-π. This was confirmed by the average performance of the top 30 individuals selected with different methods (Table 3), which was similar for marker-based methods but somewhat different for the group selected based on pedigree EBV. A similar tendency was observed for ranking at   late selection but correlations between EBV from different methods were higher for this scenario. The presence of bias in EBV was evaluated by regressing phenotypes of validation individuals on their EBV. On average, these regression coefficients tended to be lower than the expected value of 1: 0.9 for PBLUP, 0.8 for GBLUP and 0.86 for Bayes-C-π, which suggests that EBV overestimated differences in phenotypes of progeny. This bias may be due to selection not being properly accounted for by the single-trait analyses or due to the assumption of normality for genotypic values not being valid. However, the mean squared deviation of the regression coefficients from 1 was lowest for Bayes-C-π, 0.05, compared to 0.06 for PBLUP and 0.07 for GBLUP, suggesting that estimates from Bayes-C-π were least biased. Sib information tended to improve the performance of all methods in this regard for most traits.

Estimation of π, the proportion of markers with zero effects
The proportion of markers with zero effect (π) is estimated from the data in the Bayes-C-π method. Habier et al. [12] have shown that, if there is enough information in the data, (1- )k is a good estimate of the number of QTL affecting the trait when k unlinked SNP with normally distributed effects were simulated and genotypes used for training included genotypes at the QTL. In the case of more realistic simulations, where QTL genotypes were not included as markers but the effects were estimated based on k linked markers, the number of markers fitted was higher than the number of true QTL, but the tendency for lower estimates of π for scenarios with more QTL did hold [12].
The posterior means of π (Table 4) suggest that a high proportion of markers should be included in the model to explain a substantial part of the genetic variation for the majority of traits in our data; estimates of π ranged from 0.19 to 0.99, which suggests that between 111 and 19,541 markers explained variation for the analyzed traits ( Table 4). The large number of associated markers with relatively small effects explains the good performance of GBLUP, which assumes a polygenic determination of  traits. However, GBLUP also performed well for egg weight and body weight, which had very high estimates of π. The results suggest that a limited number of markers explain most of the genetic variation for body size in chickens. This can be due to these markers being linked to or in linkage disequilibrium with QTL and/or due to markers capturing pedigree relationships [19]. The accuracy of estimates of π depends on the information content of the data and on mixing in the Monte Carlo Markov Chain, which can be poor for Bayes-C-π. Two independent chains with a high (0.99) or a low (0.1) starting value for π were used to verify convergence of π. For some traits (eE3, eEW, lEW, eCO, lBW), both chains converged to the same value with a clearly peaked posterior distribution but for other traits 160,000 iterations were not sufficient for the two chains to reach the same posterior means, as the posterior distribution of π was relatively flat. This difference may reflect differences in genetic architecture of the traits. For traits with a high estimate of π (i.e. with few markers associated), convergence was obtained quickly and the standard deviation of the posterior distribution of π was small but for traits for which many markers were fitted in the model, the standard deviation of π was high, which suggests that models with different numbers of markers had similar likelihoods. Nevertheless, lack of convergence in π, i.e. different estimates depending on starting value, had almost no impact on the accuracy of EBV. There was also no substantial difference between early and late selection scenarios with regard to convergence of estimates of π. Only for ePD and lCO did the inclusion of additional information from sibs make the posterior means of π from different chains more similar to each other.

Information from parental average EBV
In dairy cattle, genomic predictions are often combined with pedigree information [4] before obtaining final genomic EBV. In our study, lEW was the only trait for which adding pedigree-based information significantly improved predictive ability. The increase in the R-square of the regression equation to predict hatch-corrected phenotypes from generation 5 when adding PBLUP to marker-based EBV was significant (p < 0.05) only for lEW for GBLUP and Bayes-C-π, for which the R-square increased from 0.174 to 0.189 and from 0.187 to 0.203, respectively. Increases in R-square were not significant (p > 0.05) for all other traits using both methods. This suggests that in this dataset, the markers capture most of the pedigree information, likely because all the parents were genotyped.
For most traits, the predictive ability of the markerbased EBV was not substantially lower for traits measured at a late age (Figures 1 and 2), although late traits were only recorded on selected individuals and estimated heritabilities for late traits were generally lower than for the corresponding traits measured at a younger age. This indicates that having records only on selected parents did not limit the ability to estimate marker effects.
In Figures 1 and 2, the difference between the accuracy of marker-versus pedigree-based parental average EBV (e.g. GBLUP-PA vs. PBLUP) reflects the gain in information from more accurate EBV of parents when using markers, while the difference between the accuracy of marker-based parental average EBV and markerbased individual EBV (e.g. GBLUP-PA vs. GBLUP) arises from markers providing information on Mendelian sampling terms. For ePS and ePD and eSM, the increase in accuracy at an early age could be attributed mostly to better estimates of parental EBV. For all other traits, increases in accuracy were primarily based on markers providing information on Mendelian sampling terms. For EBV for selection at a later age, the improvement originated mostly from Mendelian sampling terms, probably because the pedigree parental average EBV were much more accurate than at the earlier age.

Reduced animal model
The reduced animal model was used to incorporate genomic information into genetic evaluation using GBLUP. It was possible to use this model here because all the parents were genotyped, thus data from nongenotyped individuals could be included without loss of information. If some parents are not genotyped, the 1-step methods that combine pedigree-based and genomic relationships can be used to avoid loss of information [7,8]. An alternative to the 1-step method is the use of deregressed EBV [5,6] but this involves approximations and a potential loss of information.
In fact, the model used here represents a special case of the 1-step method of Legarra et al. [7], where all non-genotyped individuals in the data are non-parent progeny. In this case, the only pedigree-relationships that are used are those between genotyped parents and their non-genotyped progeny. Without inbreeding, the expectation of these relationships is equal to 0.5, both based on pedigree and based on genomic data, because progeny receive half of their alleles from each parent. Thus, in this special case, combining genomic and pedigree relationships does not require the rescaling that is typically required for the 1-step approach [20]. In addition to avoiding the need for rescaling, this special case allows equations for non-parents to be absorbed, as in the reduced animal model, which reduces computational demands, although the main computational task of inverting the dense genomic relationship matrix of genotyped individuals remains. By absorbing non-parents, computing time for the reduced animal model is proportional to n 3 , where n is the number of genotyped animals, while the number of animals with phenotypes has a negligible impact on computing time. Computing time for Bayes-C-π is proportional to the number of markers and to the number of records. The reduced animal model can also easily be extended to a multi-trait setting, following standard multiple-trait animal model procedures. Finally, applying a reduced animal model makes it possible to use weighted genomic relationship matrices that accommodate differential weights on SNP, depending on their effects, similar to the Bayesian model averaging methods [21]. Use of a weighted genomic relationship matrix in a multi-trait setting, however, requires further work.

Implementation of genomic selection in layer chickens
Increases in accuracy were evaluated when selection is at a very early age, prior to phenotypes being available on selection candidates or their sibs, and at a later age. Late age selection represents a scenario in which genomic information is used to increase accuracy of selection in existing layer breeding programs, particularly in the case of males, which are primarily evaluated based on sib information in current breeding programs. Early age selection represents a scenario in which the benefits of genomic selection are capitalized on by also reducing the generation interval from the traditional one year to half a year, as proposed by Dekkers et al. [22]. Using these results, breeding programs exploiting genomic information can be optimized, including scenarios where only male candidates are genotyped and where population sizes are reduced to capitalize on the effect of GEBV on rates of inbreeding. The use of low-density SNP panels needs to be evaluated [23] to reduce costs of genotyping, but this was beyond the scope of this research. In this study, the size of the training data was limited compared to what is available in dairy cattle and increasing its size is expected to further increase the accuracy of GEBV.

Conclusions
Reduced animal model approaches can be used to estimate breeding values from high-density SNP data when all parents have been genotyped. Marker-based methods improve the prediction of future performances compared to the classical pedigree-based approach, with most of the accuracy increase due to improved estimation of Mendelian sampling terms. The advantage of marker-based methods is greater for selection at a young age, before information on sibs of selection candidates is available. The accuracies of methods that assume equal variance for all SNP, such as GBLUP and of those that allow differential weighting and shrinkage of SNP effects are similar.