Genomic selection of purebred animals for crossbred performance in the presence of dominant gene action
- Jian Zeng^{1},
- Ali Toosi^{2},
- Rohan L Fernando^{1}Email author,
- Jack CM Dekkers^{1} and
- Dorian J Garrick^{1}
DOI: 10.1186/1297-9686-45-11
© Zeng et al.; licensee BioMed Central Ltd. 2013
Received: 8 November 2012
Accepted: 15 March 2013
Published: 26 April 2013
Abstract
Background
Genomic selection is an appealing method to select purebreds for crossbred performance. In the case of crossbred records, single nucleotide polymorphism (SNP) effects can be estimated using an additive model or a breed-specific allele model. In most studies, additive gene action is assumed. However, dominance is the likely genetic basis of heterosis. Advantages of incorporating dominance in genomic selection were investigated in a two-way crossbreeding program for a trait with different magnitudes of dominance. Training was carried out only once in the simulation.
Results
When the dominance variance and heterosis were large and overdominance was present, a dominance model including both additive and dominance SNP effects gave substantially greater cumulative response to selection than the additive model. Extra response was the result of an increase in heterosis but at a cost of reduced purebred performance. When the dominance variance and heterosis were realistic but with overdominance, the advantage of the dominance model decreased but was still significant. When overdominance was absent, the dominance model was slightly favored over the additive model, but the difference in response between the models increased as the number of quantitative trait loci increased. This reveals the importance of exploiting dominance even in the absence of overdominance. When there was no dominance, response to selection for the dominance model was as high as for the additive model, indicating robustness of the dominance model. The breed-specific allele model was inferior to the dominance model in all cases and to the additive model except when the dominance variance and heterosis were large and with overdominance. However, the advantage of the dominance model over the breed-specific allele model may decrease as differences in linkage disequilibrium between the breeds increase. Retraining is expected to reduce the advantage of the dominance model over the alternatives, because in general, the advantage becomes important only after five or six generations post-training.
Conclusion
Under dominance and without retraining, genomic selection based on the dominance model is superior to the additive model and the breed-specific allele model to maximize crossbred performance through purebred selection.
Background
Numerous studies have shown encouraging results of applying genomic selection (GS) in purebred populations [1–6]. However, except for dairy cattle, most animals used in livestock production systems are crossbreds, with advantages of heterosis and breed complementarity. For such systems, the breeding goal in purebreds should be to optimize the performance of crossbred descendents.
GS has advantages in selection for crossbred performance over conventional methods [7–9] as reported in [9]. In particular, training on crossbred data for GS accounts for genetic differences between purebred and crossbred animals and genotype by environment effects. Different GS models have been proposed and used to select purebreds for crossbred performance [8–11]. In most studies, additive gene action or perfect knowledge of gene substitution effects or both have been assumed. It has been argued that dominance is the likely genetic basis of heterosis [12, 13], therefore explicitly including dominance in the GS model may be beneficial for selection of purebreds for crossbred performance.
Allele substitution effects are a function of additive and dominance effects, and of allele frequencies [12]. In the case of crossbred records, Dekkers [8] proposed a model that fits breed-specific allele substitution effects (BSAM), where the substitution effects at the QTL (quantitative trait loci) for paternal and maternal alleles would be different if the parental breeds differ in allele frequencies at the QTL. It has been shown that, when additive and dominance effects of the QTL are known without error, BSAM can give greater response to selection than the usual additive model that fits a common substitution effect for each QTL [10, 11]. When the QTL genotype is not observed and marker genotypes are fitted in the model, linkage disequilibrium (LD) between SNPs and the QTL may not be consistent between the parental breeds. This difference in LD will also contribute to differences in allele substitution effects between breeds. When SNP effects must be estimated, the advantage of fitting BSAM over the additive GS model was not always observed under additive inheritance [9], which suggests that differences in LD between breeds may not be as important as the presence of dominant gene action in practice.
Dekkers [14] showed that, to maximize performance in the first generation of crossbreds, the allele substitution effect for an identified QTL must be derived based on allele frequencies in the selected mates. However, allele frequencies of selected mates cannot be observed prior to computation of the substitution effects that are needed for selection. Thus, Dekkers [14] proposed an iterative algorithm to compute substitution effects based on selected mates.
A model that explicitly includes dominance effects (the dominance model) provides estimates of both additive and dominance effects and therefore enables the computation of allele substitution effects using appropriate allele frequencies. Once estimates of SNP effects are obtained from training, they can be successively applied over generations with updated allele frequencies to develop prediction equations specific to that generation.
The BSAM model gives allele substitution effects for each parental breed that depend on allele frequencies among individuals from the other breed that were used to produce the training population. It is not straightforward to apply the iterative algorithm of Dekkers to BSAM. In addition, the substitution effects from BSAM cannot be used to recompute the appropriate substitution effects in the subsequent generations, as allele frequencies change due to selection. Furthermore, breed origin of SNP alleles must be known or inferred for the BSAM model [8, 10], but such knowledge is not needed for the dominance model.
The primary objective of this study was to assess the performance of the dominance model in comparison with the additive model or BSAM for GS on purebreds for crossbred performance. Substitution effects from the dominance model were computed based on allele frequencies of unselected mates. Ibáne~z-Escriche et al. [9] compared BSAM to the additive model under additive gene action alone and Kinghorn et al. [10, 11] made this comparison with dominant gene action when QTL effects were assumed known. Thus, a secondary objective of this study was to compare BSAM to the additive model under dominance when SNP effects must be estimated. Model performance was evaluated by computer simulation based on response to 20 generations of selection in a two-way crossbreeding program.
Methods
Simulations
Comparisons between the dominance, BSAM and additive models were made for four scenarios of gene action. To clearly detect an advantage of including dominance in the model, the dominance variance V_{ D } and heterosis H were chosen to be large in scenario 1, allowing for overdominance. In scenarios 2 and 3, V_{ D } and H were restricted to more realistic values with (scenario 2) or without (scenario 3) overdominance. In scenario 4, V_{ D } was reduced to zero to examine any disadvantage of using the dominance model when gene action is purely additive. Changes to V_{ D } and H were achieved by changing the size and proportion of beneficial dominance effects. Other parameters, including locus positions and LD between loci and allele frequencies were held constant between the four scenarios. A total of 16 random simulations were carried out for each scenario.
Genome and trait phenotypes
A genome was simulated with either one or ten chromosomes. Each chromosome was one Morgan and consisted of 100 randomly distributed QTL and 1000 SNP markers that were almost evenly spaced. All loci were biallelic with starting allele frequencies of 0.5 and a reversible mutation rate of 2.5×10^{-5}. A binomial map function was used to model recombination with interference on a chromosome [15]. A base population of 500 unrelated individuals was randomly mated for 1000 discrete generations to create LD between loci in the founders of different breeds.
The additive effect a of a QTL is defined as half the difference in genotypic value between alternate homozygotes and the dominance effect d as the deviation of the value of the heterozygote from the mean of the two homozygotes [12]. Bennewitz and Meuwissen [16] evaluated QTL mapping results from many studies in pigs for meat quality and carcass traits and concluded that an exponential distribution with rate parameter 5.81 is an adequate “generating mechanism” for the absolute values of the additive effects of QTL. That distribution was used here to generate the unsigned value of the additive effect for each QTL, and a positive or negative sign was assigned to each additive effect with equal probability. Although the relationship between additive and dominance effects of QTL has been studied [16–18], a consistent relationship has not been observed. Thus, in scenarios 1 and 2 with overdominance, we assumed, for simplicity, that the dominance effects were independent of additive effects. The absolute values of dominance effects were independently sampled from the same exponential distribution as was used for the additive effects. This was considered reasonable because the exponential distribution has a high probability for the occurrence of small effects, which is also plausible for dominance effects. In scenario 3 with no overdominance, the dominance effect of a QTL was sampled from a uniform distribution that ranged from zero to the absolute value of the QTL’s additive effect. In order for the trait to manifest positive heterosis, only 30% of the sampled dominance effects were negative in scenarios 1 and 2, and this fraction was further reduced to 20% in scenario 3 for the sake of no overdominance. The resulting distribution of dominance coefficients, defined as the ratio of the dominance effect over the absolute value of additive effect, was similar to what has been observed for real data [16].
The QTL effects were scaled (as described in AppendixAppendix A) such that the relative contributions of the additive and dominance effects to the genetic variability of the trait were 2:1, 4:1 and 1:0 in the scenarios where V_{ D } was set to be large, realistic or null, respectively. After scaling, 40-45% of QTL showed partial dominance and 30-35% overdominance when overdominance was present. Trait phenotypes were simulated by adding a standard normal residual effect to the genotypic value of each animal. The variance of the residual effects was chosen such that broad sense heritability ${h}_{\mathit{\text{bs}}}^{2}$ of the trait was 0.5 in the founders. As a result, narrow sense heritability ${h}_{\mathit{\text{ns}}}^{2}$ was 0.33 for the scenario with large V_{ D }, 0.4 with realistic V_{ D }, and 0.5 with V_{ D } = 0.
Breed formation
In generation -1, the genetic disparity between breeds was due to differences in LD and in allele frequencies. Averaged over simulations, the heterozygosity of each breed was about 0.3, and the mean difference in allele frequencies between breeds was about 0.3. In each simulation, while the same set of QTL characterized the trait, the contribution of QTL effects to the phenotypic variability differed between breeds due to disparities in allele frequencies. Between simulations, the observed values of variance components in a given breed varied due to genetic drift during the 54 generations of random mating after breed separation.
In livestock, dominance can explain up to about 10% of phenotypic variation [19] and heterosis from a breed cross can be up to 10% [20]. In scenario 1, where V_{ D } was large, V_{ D } was on average 16.7% in the pure breeds and H was on average 31%. Under more realistic settings, ${\overline{V}}_{D}=10$% and $\overline{H}=9.1$% in scenario 2, where overdominance was allowed, and ${\overline{V}}_{D}=4.5$% and $\overline{H}=8.4$% in scenario 3, where overdominance was not allowed. When the genome was extended from one to ten chromosomes, these values were kept about the same by reducing the proportion of beneficial dominance effects.
Crossbreeding program
A two-way crossbreeding program with 20 generations of selection was simulated, as illustrated in Figure 1. The goal was to improve crossbred performance through selection in both parental breeds, starting from 1000 animals of sire breed (A) and 1000 animals of dam breed (B) in generation 0. The selection criteria was the rank of the individual’s genomic estimated breeding values (GEBV). The SNP effects for the prediction of GEBV were estimated only once, using 1000 crossbred AB_{0} animals in generation 0. These estimates of SNP effects were then repeatedly applied to predict GEBV in the following 20 generations of selection. In generation 1 through 20, 600 animals were selected from 1000 candidates in each parental breed based on their GEBV, of which the top 100 were used as males and the other 500 as females. As described in detail later, with the dominance model, GEBV were based on the breed-specific allele substitution effects that were recomputed for each generation, using allele frequencies in the opposite breed in that generation. The selected animals were randomly mated within each breed to produce 1000 purebred replacement animals for the next generation. Meanwhile, the 100 selected males of breed A were also randomly mated to the 500 selected females of breed B to produce 1000 crossbred progeny. The phenotypic mean of crossbreds was computed in each generation of selection (AB_{1} – AB_{20}) to evaluate the cumulative response to selection.
Starting from the same set of purebred selection candidates in generation 0, the subsequent 20 generations of selection were repeated 100 times to increase power to detect differences between the GS models in cumulative response. A mixed linear model (see Appendix B) was used to test for differences between GS models by generation 20. Note that the 16 random simulations each with 100 repetitions account for differences in purebreds due to genetic drift. In sum, the collected data consisted of 1600 observations in each generation of selection.
where μ_{ i } is the phenotypic mean of crossbreds in generation i and σ_{0} is the phenotypic standard deviation of crossbreds in generation 0.
Statistical models
Additive model
and $\mathrm{E}\left({\sigma}_{\alpha}^{2}\right)$ was obtained following [21] such that $\mathrm{E}\left({\sigma}_{\alpha}^{2}\right)=\frac{{V}_{A}}{k(1-{\Pi}_{0})\mathrm{E}\left(2\mathit{\text{pq}}\right)}$, where k is the total number of SNPs, Π_{0} is the chosen probability that a SNP has no effect prior to the analysis, p = 1 - q is the allele frequency, and V_{ A } is the additive genetic variance for the trait that is explained by all SNPs. The residual e_{ i } had a normal prior with variance also following a scaled inverse Chi-square distribution: ${e}_{i}|{\sigma}_{e}^{2}\sim N(0,{\sigma}_{e}^{2})$ and ${\sigma}_{e}^{2}\sim {\nu}_{e}{S}_{e}^{2}{\chi}_{{\nu}_{e}}^{-2}$, where ν_{ e } = 4. The value of ${S}_{e}^{2}$ was obtained from (2), with $\mathrm{E}\left({\sigma}_{e}^{2}\right)={V}_{E}$, where V_{ E } is the residual variance that cannot be explained by the SNPs. True values were given to V_{ A } and V_{ E } in this study.
Dominance model
where η is our prior belief about μ_{ d } and ϕ is the “prior sample size”, which expresses the strength of the prior belief in terms of ${\sigma}_{d}^{2}$. Here, ϕ was set to 10, a small number relative to 100 QTL, to allow the data to “dominate” the posterior distribution of μ_{ d }. The value of η was chosen as described below.
The variance components ${\sigma}_{a}^{2}$ and ${\sigma}_{d}^{2}$ were assumed to have independent scaled inverse Chi-square distributions. As shown in (2), specification of the hyper parameters ${S}_{a}^{2}$ and ${S}_{d}^{2}$ requires knowing $\mathrm{E}\left({\sigma}_{a}^{2}\right)$ and $\mathrm{E}\left({\sigma}_{d}^{2}\right)$. The following describes how $\mathrm{E}\left({\sigma}_{a}^{2}\right)$ or $\mathrm{E}\left({\sigma}_{d}^{2}\right)$ were computed based on the known quantities V_{ A } and V_{ D }.
Values for ${S}_{a}^{2}$ and ${S}_{d}^{2}$ can now be calculated by substituting (9) and (6) in (2), respectively.
Breed-specific SNP allele model
where ${X}_{\mathit{\text{ij}}}^{A}$ and ${X}_{\mathit{\text{ij}}}^{B}$, with value (0, 1), are the breed-specific copy numbers of a given allele at SNP j of breed origin A or B that animal i received from its sire or dam, and ${\alpha}_{j}^{A}$ and ${\alpha}_{j}^{B}$ are the breed-specific substitution effects for the alleles of breed origin A and B. The other parameters are defined as in the additive and dominance models. In BSAM, the SNP allele effects have breed-specific variances $\frac{}{}{\sigma}_{{\alpha}^{A}}^{2}$ and $\frac{}{}{\sigma}_{{\alpha}^{B}}^{2}$, and breed-specific parameters $\frac{}{}{\Pi}_{{\alpha}^{A}}$ and $\frac{}{}{\Pi}_{{\alpha}^{B}}$. The same prior as used in the additive model is used for $\frac{}{}{\sigma}_{{\alpha}^{A}}^{2}$ and $\frac{}{}{\sigma}_{{\alpha}^{B}}^{2}$.
Inference for model parameters
Markov chain Monte Carlo (MCMC) sampling was used to draw inferences from the posterior distributions of parameters. Gibbs sampling was used to sample parameters from their full conditional distributions, which are derived for some key model parameters in Appendix C. Since the implementation of a Gibbs sampler in the additive model has been well described by [6], here we focus on the algorithm for the dominance model. The decision to include a SNP in the model was separately sampled for the additive and dominance effects in the dominance model. Similarly, in BSAM, the decision to include a SNP in the model was separately sampled for the sire and dam breed-specific allele substitution effects.
The analyses were implemented by modifying GenSel[22] to allow dominance and allele specific effects. The Markov chain used for inference consisted of 11 000 samples, with the first 1000 discarded as a burn-in. Longer chains did not improve prediction accuracy. Parameters were estimated from the mean of the resulting 10 000 posterior samples.
True and genomic estimated breeding values
where Z_{ ij } is the marker genotype and ${\widehat{\alpha}}_{j}^{r}$ is the estimated allele substitution effect for SNP j. The definition of ${\alpha}_{t}^{r}$ for a purebred animal with a breeding goal of maximizing the performance of the crossbred descendents is described next.
Similarly, the substitution effect for the dam is α^{ D } = a + (1 - 2p^{ S })d. As a result, the allele substitution effects for a purebred parent used for crossbreeding are breed-specific and defined in terms of the allele frequencies in the breed of the other parent.
where r’ is the breed of the other parent of the crossbreds. In BSAM, ${\alpha}_{j}^{r}$ is directly estimated for prediction of $\mathit{\text{GEB}}{V}_{i}^{r}$ in (12), while it is indirectly estimated from the dominance model by combining the estimates of a_{ j } and d_{ j } with the current value of ${p}_{j}^{{r}^{\prime}}$ from breed r’ in (13). The additive model does not estimate breed-specific substitution effects. Instead, it estimates a common α_{ j } for SNP j, which uses the allele frequency in the crossbreds used for training.
Results
Cumulative response to selection
Cumulative response to genomic selection at generation 20 by the BSAM and dominance models compared to the additive model
One chromosome | Ten chromosomes | |||
---|---|---|---|---|
Scenario | Dominance model | BSAM | Dominance model | BSAM |
1 | 22.4%* | 6.5%* | 10.1%* | 1.7% |
2 | 8.6%* | 0.3% | 2.1% | -3.1% |
3 | 0.2% | -1.7% | 1.0% | -4.4% |
4 | -0.1% | -5.9%* | -0.7% | -6.6%* |
Changes in breed averages and heterosis in response to selection
Response to selection in heterozygosity
Discussion
Including dominance in addition to additive effects in the model was advantageous for response to selection when dominant gene action was present. Even when all gene action was purely additive, using the dominance model did not show a negative effect. However, an advantage of BSAM was observed only when dominance variance and heterosis were large, which confirms our hypothesis that the superiority of BSAM over the additive model may be primarily due to dominance effects, rather than differences in LD between the parental breeds.
Comparison between the dominance model and the additive model
It has been shown [14] that for a two-way cross and ignoring selection, the allele substitution effects for QTL or markers in one parental breed depend on the allele frequencies in the other parental breed. Thus, in the computation of substitution effects, failure to use the appropriate allele frequencies may result in a loss of response to selection. Dekkers [23] showed that, with the availability of only purebred data, selection within a breed using QTL allele substitution effects that are based on the allele frequencies from the opposite breed gave substantially greater response to selection than using allele frequencies from the same breed.
In the additive model, a single substitution effect was estimated for each SNP, assuming it is the same for both parental breeds. Selection on GEBV derived using such allele substitution effects is expected to fix the favorable allele in both breeds (Figure 6a). Exceptions to this could be genetic drift or the marker and QTL being in LD with opposite phases in the two parental breeds (Figure 6b). When the two breeds have opposite LD phases, and a common nonzero substitution effect is estimated for a SNP in the additive model, the allele frequencies of associated QTL will move in opposite directions in the two breeds.
In the dominance model, breed-specific allele substitution effects were computed using estimated dominance and additive effects, together with the appropriate allele frequencies from the other parental breed. When overdominance is present, the allele substitution effect α = a + (1 - 2p)d may have opposite signs in the parental breeds, depending on allele frequencies p in the two breeds [12]. In this case, the two parental breeds are expected to be fixed for alternate alleles of over-dominant QTL, which increases the frequency of favorable heterozygotes in crossbred progeny. Note that under the additive model, fixation of the favorable allele in both breeds would result in lower heterozygosity in the crossbreds. This explains why the dominance model resulted in substantially greater heterosis than the additive model (Figure 4a). Recall that the purebred gain was lower with the dominance model than with the additive model because the unfavorable allele was moved towards fixation in one parental breed at some loci. However, the crossbred gain was more than compensated by the larger amount of heterosis in the crossbreds. The dominance model was mostly favored in crossbred gain in scenario 1, where the dominance variance was large (Figure 2a), because in this scenario the difference in allele substitution effects between breeds was large.
When dominance is incomplete, the breed-specific substitution effects for a locus will have the same sign in both breeds, as can been seen from (13). In the long term, and ignoring drift, the favorable allele will be fixed in both breeds with any GS model. Thus, differences in response to selection in scenario 3 were not significant between GS models (Figure 2c). However, Kinghorn et al. [11] observed that with many QTL, even incomplete dominance had a significant influence on the divergence of allele frequencies in the parental breeds because the effect of drift is not negligible with a large number of loci. This phenomenon was also confirmed in our study, in which the advantage of the dominance model over the additive model in cumulative response to selection increased from 0.2% to 1.0% in the presence of only incomplete dominance (Table 1, scenario 3), when the number of loci increased ten-fold. Although the signs of the substitution effects are the same in the parental breeds with incomplete dominance, their magnitude can be very different depending on the allele frequencies. In the additive model, where a common substitution effect is used, loci with negligible substitution effects will be under higher selection pressure than in the dominance model, where breed-specific substitution effects are used. When the number of loci is large, relative to the additive model, the dominance model will put little or no selection pressure on loci with small or negligible substitution effects, resulting in greater genetic progress. Thus, the ability to exploit dominance for loci without overdominance may still be very important.
The advantage of the dominance model over the additive model is attributable to the use of breed-specific allele substitution effects to calculate GEBV of purebreds for crossbred performance with the dominance model. Kinghorn et al. [10, 11] observed an advantage of using breed-specific allele substitution effects for crossbreeding (reciprocal recurrent genomic selection, RRGS) in an ideal situation, where additive and dominance effects at the QTL were known without error. In the presence of overdominance, RRGS had greater cumulative response to selection than the additive model, regardless of whether recalculation of substitution effects of QTL alleles was performed only in the first generation, every five generations or every generation [10]. In the absence of overdominance, the advantage of RRGS over the additive model was still observed with retraining each generation, especially for many QTL [11].
Comparison between BSAM and the additive model
In BSAM, for the first generation after training, the estimated breed-specific allele substitution effects of SNPs account for differences in LD and allele frequencies between breeds. However, with repeated selection over generations, the breed-specific allele substitution effects must be re-estimated in BSAM to accommodate changes in allele frequency and LD. Ibáne~z-Escriche et al. [9] showed that, when using crossbred data to predict GEBV of purebred descendants, the additive model had a greater accuracy of prediction than BSAM if the breeds were related and the training population size was small relative to the number of markers. However, pure additive inheritance was assumed in their study. Under dominant gene action, we expect BSAM to have an additional advantage over the additive model because, as can be seen from (13), apart from different LD, breed-specific allele substitution effects will be different when allele frequencies differ between breeds. However, in this study, BSAM had a lower response to selection than the additive model, except when the dominance effects were large and the number of SNPs was not greater than the number of observations. When the number of SNPs was ten times greater than the number of observations, the advantage of BSAM over the additive model was not significant.
Posterior means of the number of nonzero SNP effects fitted in the model
BSAM | Dominance model | ||||||
---|---|---|---|---|---|---|---|
Scenario | Additive model | α ^{ S } | α ^{ D } | Total | a | d | Total |
1 | 183.1 | 73.2 | 131.0 | 204.2 | 284.3 | 58.4 | 342.7 |
2 | 166.3 | 88.2 | 151.3 | 239.5 | 181.8 | 58.1 | 239.9 |
3 | 130.9 | 68.7 | 114.4 | 183.1 | 149.4 | 34.8 | 184.2 |
4 | 105.4 | 90.9 | 144.8 | 235.7 | 116.3 | 4.1 | 120.4 |
Another possible reason to explain the lower response of BSAM relative to the additive model was given by [9]. Consider a locus that is segregating in breed A but fixed in breed B. Because the additive model regresses phenotypes only on segregating alleles, the common substitution effect for this locus is actually the effect specific to breed A, just as in BSAM. However, BSAM includes an additional substitution effect for breed B, for which the allele is fixed. The substitution effect estimated from BSAM for the breed B allele will only add noise to the prediction of GEBV. Therefore, when several loci are nearly fixed in one of the parental breeds but segregating in the other, the additive model is expected to show an advantage over BSAM.
Comparison between BSAM and the dominance model
The number of effects in BSAM equals the number of breeds times the number of SNPs. When only two breeds are involved in the cross, the dominance model is expected to have the same number of parameters as BSAM if the number of SNPs included in the model is fixed. However, with separate probability of inclusion parameters Π_{ a } and Π_{ d } for additive and dominance effects, the number of dominance effects that are fitted in the model only depends on the actual number of dominance effects for the trait. As dominance variance was lowered from large to null, the posterior mean of the number of nonzero dominance effects in the model decreased from 58.4 to 4.1 (Table 2). Thus, in contrast to BSAM, model complexity does not seem to be an issue for the dominance model. Even when dominance was absent, the response in the dominance model could not be distinguished from that of the additive model (Figure 2d) because only a few nonzero dominance effects were fitted in the model. In this regard, the dominance model is more robust than BSAM.
Even when additive and dominance effects are consistent between breeds, allele substitution effects will be breed-specific if allele frequencies differ between breeds. In such a case, the estimates of breed-specific allele substitution effects in the dominance model are expected to be more accurate than those in BSAM for the following four reasons:
First, the estimates of additive and dominance effects from the dominance model are combined with the observed allele frequencies in the opposite parental breed to calculate the breed-specific allele substitution effects. In BSAM, however, breed-specific allele substitution effects are estimated directly. Thus, the allele frequencies used in BSAM are based on the frequencies in the training population of the alleles inherited from the opposite parental breed. Note that the alleles inherited by the training population are a random sample of those from the parental population, and therefore their frequencies will deviate from those of the parental population. Thus, the use of observed allele frequencies from the parental population to compute breed-specific allele substitution effects favors the dominance model over BSAM.
Second, in the dominance model, as selection progresses and allele frequencies change due to selection, the observed allele frequencies in each generation are combined with the estimates of additive and dominance effects obtained in training to compute the current values of the breed-specific allele substitution effects. However, with the additive model and with BSAM, the allele substitution effects estimated in training are repeatedly used to compute GEBV of selection candidates, ignoring changes in allele frequencies. As a result, drops in accuracy of GEBV in generations of selection were greater for the additive model and BSAM than for the dominance model (results not shown). Thus, use of the dominance model is expected to require less frequent retraining than use of BSAM or the additive model. This is appealing for traits that are difficult or expensive to measure.
Thus, deviates ϵ_{00} and ϵ_{01} cannot both be zero. Similarly, it can be shown that ϵ_{10} - ϵ_{00} = 2p^{ B }d and ϵ_{11} - ϵ_{00} = 2(p^{ A } + p^{ B } - 1)d. The nonzero differences between the deviates imply that at least some deviates are not equal to zero. More simply, $\mu +{\alpha}_{1}^{A}+{\alpha}_{2}^{B}$ in the BSAM for G_{01} may not be equal to $\mu +{\alpha}_{2}^{A}+{\alpha}_{1}^{B}$ for G_{10}, although G_{01} is equal to G_{10}. In other words, the genotypic value is not exactly modeled in BSAM.
Forth, implementation of BSAM requires knowing the breed origins of SNP alleles but this is not required for the dominance model. In this study, breed origins of SNP alleles were assumed to be known without error but this may not hold for real data, which would introduce errors to the estimation of breed-specific SNP effects in BSAM.
However, the advantages of the dominance model may not hold if additive and dominance effects of SNPs are not consistent between breeds. More precisely, although a and d at the QTL may be consistent between breeds, the SNP effects may differ between breeds due to differences in LD. When these differences of SNP effects between breeds are large, the dominance model may become inferior to BSAM, for which breed-specific allele substitution effects account for differences in LD in addition to differences in allele frequencies. The magnitude and phase of LD, especially long-range LD, has been found to be inconsistent between breeds in livestock [24–26]. However, whether those differences in LD are large enough to enable BSAM to outperform the dominance model needs further investigation with real data.
The benefit of using one model over another depended on the number and size of QTL and the density of SNPs relative to the size of the training population. When the genome was extended from one to ten chromosomes, there was a ten-fold increase in the number of SNPs but the SNP density remained the same. In addition, with the values of variance components held constant, each QTL explained a smaller proportion of the total genetic variance for the larger genome, which made it more difficult to capture the QTL effects through SNPs. This explains the observed reduction of differences in response between models. This suggests that the preference of the dominance model is expected to hold for a larger genome but the amount of benefit will decrease when the genetic architecture is more polygenic. Furthermore, a larger training population will be needed.
In this study, SNP effects were estimated only once and applied successively over 20 generations of selection. This is rarely done in practice and retraining is usually carried out after each generation of selection. With retraining, the advantage of the dominance model is expected to be much smaller or may even be ignored because in that case, the additive and BSAM models are also expected to give estimated allele substitution effects using allele frequencies from the current or recent generations. However, if the training population consists of individuals from multiple generations, the estimates of substitution effects in the additive model or BSAM will depend on allele frequencies across multiple generations, which may not be appropriate to predict crossbred performance for the purebred candidates.
Although the dominance model was studied here when purebreds were selected for crossbred performance, the advantages of this model over the additive model also apply to selection for purebred performance. Furthermore, the estimates of a and d from the dominance model can be used to predict GEBV in any population with SNP genotypes, provided the LD in the training and candidate populations are about the same. It has been found that the accuracy of GEBV in Holstein-Friesian cattle with training in Jerseys and vice versa was as low as -0.1 to 0.3 over traits [27], and the correlations of SNP allele frequencies between Holsteins, Jerseys, and Brown Swiss cattle were as low as 0.65 to 0.67 [28]. Thus, differences in allele frequencies between breeds, along with dominant gene action, can be attributed to the low accuracy of prediction across breeds in dairy cattle. In this situation, the dominance model is expected to give better results, especially with a high marker density, since then LD would be more consistent between breeds.
Besides dominance, other types of non-additive effects may also contribute to heterosis, such as epistasis, imprinting, etc. The dominance model can be further extended to account for imprinting if the heterozygotes are phased. Epistatic interactions between loci are partially included in BSAM as a component of allele substitution effects but will be misspecified in the dominance model, which will impair accuracy of selection. Thus, BSAM may be more robust in the presence of epistasis.
Conclusions
When dominance, particularly overdominance, is the key driver of heterosis, using a dominance model for GS is expected to result in greater cumulative response to selection of purebred animals for crossbred performance than either BSAM or the additive model. The extent of this additional response to selection depends on the size of dominance effects at the QTL and the power of detecting the dominance effects through SNP genotypes. Also, when there are many loci, it is important to exploit dominance, even in the absence of overdominance. When BayesC Π is used, the dominance model is robust because, even in the absence of dominance, it does not give a lower response than the additive model. BSAM is favored over the additive model only when dominance effects are large enough to overwhelm its shortcomings. Furthermore, implementation of BSAM requires that the breed origins of SNP alleles are known. Our results suggest that in the presence of dominant gene action, relative to BSAM and the additive model, GS with the dominance model is superior to maximize crossbred performance through purebred selection, especially when no retraining is carried out at each generation.
Appendix A
Scaling procedure for QTL additive and dominance effects
where a_{ j } is the QTL additive effect.
For scenarios allowing overdominance
This can be seen as a quadratic equation with variable t unknown. Thus, the scalar t for the additive effects can be obtained by solving this equation.
For scenarios without overdominance
Thus, the scaled additive and dominance effects are respectively, ${a}_{j}^{\ast}=\sqrt{c}{a}_{j}$ and ${d}_{j}^{\ast}=\sqrt{c}{d}_{j}$. With this approach, the additive genetic variance is the one desired but the dominance genetic variance is not under control. However, as desired, there would be no overdominance affecting the trait.
Appendix B
Hypothesis test for the GS model effect
where y_{ ij } is the response from GS model i in simulation j, m_{ i } is the fixed effect for GS model i = {1, 2, 3}, ${b}_{j}\sim N(0,{\sigma}_{b}^{2})$ is the random blocking effect for simulation set j = {1, …, 16}, and ${e}_{\mathit{\text{ij}}}\sim N(0,{\sigma}_{e}^{2})$ is the residual. A set of t-tests was used for testing the null hypothesis that 1) m_{1} = m_{2}, 2) m_{1} = m_{3}, or 3) m_{2} = m_{3}.
Appendix C
Full conditionals for some key model parameters
where θ_{ else } denotes other parameters besides δ_{j,β} and β_{ j }. The “likelihood” in (22) can be obtained by integrating β_{ j } out from the sampling distribution as described below.
where w is either v or u corresponding to δ_{j,d} or δ_{j,a}. Substituting (24-26) in (22) gives the marginal distribution of δ_{j,β} respecting to β_{ j }.
which is a normal $N({\widehat{d}}_{j}+\frac{\lambda}{{C}_{j,d}}{\mu}_{d},\frac{{\sigma}_{e}^{2}}{{C}_{j,d}})$ where $\lambda =\frac{{\sigma}_{e}^{2}}{{\sigma}_{d}^{2}}$.
Thus, due to the conjugacy, the full conditional for ${\sigma}_{d}^{2}$ is also a scaled inverse Chi-square $\sim {\stackrel{~}{S}}_{d}{\chi}_{{\stackrel{~}{\nu}}_{d}}^{-2}$ where ${\stackrel{~}{\nu}}_{d}=k+{\nu}_{d}$ and ${\stackrel{~}{S}}_{d}=\frac{{(\mathit{d}-1{\mu}_{d})}^{\prime}(\mathit{d}-1{\mu}_{d})+{\nu}_{d}{S}_{d}}{{\stackrel{~}{\nu}}_{d}}$ given the sampled value of μ_{ d }.
which is a normal $\sim N\left(\frac{{1}^{\prime}\mathit{d}+\mathrm{\varphi \eta}}{k+\varphi},\frac{{\sigma}_{d}^{2}}{k+\varphi}\right)$.
The full conditionals for the other parameters including additive effect for SNP j and its variance variable are as illustrated in [6].
Declarations
Acknowledgements
The research was funded by USDA-NRI Grant 2007-35205-17862.
Authors’ Affiliations
References
- Meuwissen TH, Hayes BJ, Goddard ME: Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001, 157: 1819-1829.PubMed CentralPubMed
- VanRaden PM, Van Tassell CP, Wiggans GR, Sonstegard TS, Schnabel RD, Taylor JF, Schenkel FS: Invited review: reliability of genomic predictions for North American Holstein bulls. J Dairy Sci. 2009, 92: 16-24. 10.3168/jds.2008-1514.View ArticlePubMed
- Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME: Invited review: Genomic selection in dairy cattle: progress and challenges. J Dairy Sci. 2009, 92: 433-443. 10.3168/jds.2008-1646.View ArticlePubMed
- Habier D, Fernando RL, Dekkers JCM: The impact of genetic relationship information on genome-assisted breeding values. Genetics. 2007, 177: 2389-2397.PubMed CentralPubMed
- Habier D, Tetens J, Seefried FR, Lichtner P, Thaller G: The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genet Sel Evol. 2010, 42: 5-10.1186/1297-9686-42-5.PubMed CentralView ArticlePubMed
- Habier D, Fernando RL, Kizilkaya K, Garrick DJ: Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics. 2011, 12: 186-10.1186/1471-2105-12-186.PubMed CentralView ArticlePubMed
- Daetwyler HD, Villanueva B, Bijma P, Woolliams JA: Inbreeding in genome-wide selection. J Anim Breed Genet. 2007, 124: 369-376. 10.1111/j.1439-0388.2007.00693.x.View ArticlePubMed
- Dekkers JCM: Marker-assisted selection for commercial crossbred performance. J Anim Sci. 2007, 85: 2104-2114. 10.2527/jas.2006-683.View ArticlePubMed
- Ibanez-Escriche N, Fernando RL, Toosi A, Dekkers JCM: Genomic selection of purebreds for crossbred performance. Genet Sel Evol. 2009, 41: 12-10.1186/1297-9686-41-12.PubMed CentralView ArticlePubMed
- Kinghorn BP, Hickey JM, van der Werf JHJ: Reciprocal recurrent genomic selection (RRGS) for total genetic merit in crossbred individuals. Proceedings of the 9th World Congress on Genetics Applied to Livestock Production: 1-6 August 2010; Leipzig. Paper 36. 2010,http://www.kongressband.de/wcgalp2010/assets/html/0036.htm,
- Kinghorn BP, Hickey JM, van der Werf JHJ: Long-range phasing and use of crossbred data in genomic selection. Proceedings of the 7th European Symposium on Poultry Genetics: 5-7 October 2011; Peebles Hydro, near Edinburgh. Paper 1. 2011, 7-9.http://www.roslin.ed.ac.uk/7espg/assets/7espg-edited-proceedings.pdf,
- Falconer DS, Mackay TFC: Introduction to Quantitative Genetics. 1996, Harlow: Longmans Green
- Charlesworth D, Willis JH: The genetics of inbreeding depression. Nat Rev Genet. 2009, 10: 783-796. 10.1038/nrg2664.View ArticlePubMed
- Dekkers JCM: Breeding values for identified quantitative trait loci under selection. Genet Sel Evol. 1999, 31: 421-436. 10.1186/1297-9686-31-5-421.PubMed CentralView Article
- Karlin S: Theoretical aspects of genetic map functions in recombination processes. Human Population Genetics: The Pittsburgh Symposium. Edited by: Chakravarti A. 1984, New York: Van Nostrand Reinhold, 209-228.
- Bennewitz J, Meuwissen THE: The distribution of QTL additive and dominance effects in porcine F2 crosses. J Anim Breed Genet. 2010, 127: 171-179. 10.1111/j.1439-0388.2009.00847.x.View ArticlePubMed
- Kacser H, Burns JA: The molecular basis of dominance. Genetics. 1981, 97: 639-666.PubMed CentralPubMed
- Caballero A, Keightley PD: A pleiotropic nonadditive model of variation in quantitative traits. Genetics. 1994, 138: 883-900.PubMed CentralPubMed
- Misztal I, Lawlor TJ, Gengler N: Relationships among estimates of inbreeding depression, dominance and additive variance for linear traits in Holsteins. Genet Sel Evol. 1997, 29: 319-326. 10.1186/1297-9686-29-3-319.PubMed CentralView Article
- Cundiff LV, Gregory KE: What is systematic crossbreeding?. Proceedings of the National Cattlemen’s Beef Association: 11 February 1999. 1999, Charlotte, [http://beefmagazine.com/mag/beef_systematic_crossbreeding]
- Fernando RL, Habier D, Stricker C, Dekkers JCM, Totir LR: Genomic selection. Acta Agric Scand A Anim Sci. 2008, 57: 182-195.
- Fernando RL, Garrick DJ: GenSel - User manual for a portfolio of genomic selection related analyses. [http://bigs.ansci.iastate.edu/bigsgui]
- Dekkers JCM, Chakraborty R: Optimizing purebred selection for crossbred performance using QTL with different degrees of dominance. Genet Sel Evol. 2004, 36: 297-324. 10.1186/1297-9686-36-3-297.PubMed CentralView ArticlePubMed
- de Roos APW, Hayes BJ, Goddard ME: Reliability of genomic predictions across multiple populations. Genetics. 2009, 183: 1545-1553. 10.1534/genetics.109.104935.PubMed CentralView ArticlePubMed
- de Roos APW, Hayes BJ, Spelman RJ, Goddard ME: Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle. Genetics. 2008, 179: 1503-1512. 10.1534/genetics.107.084301.PubMed CentralView ArticlePubMed
- Meadows JRS, Chan EKF, Kijas JW: Linkage disequilibrium compared between five populations of domestic sheep. BMC Genet. 2008, 9: 61-PubMed CentralView ArticlePubMed
- Harris B, Johnson D, Spelman R: Genomic selection in New Zealand and the implications for national genetic evaluation. Proceedings of the Interbull Meeting: 16-19 June 2008; Niagara Falls. Edited by: Sattler JD. 2008, Department of Animal Breeding and Genetics, 325-330.
- VanRaden P, Olson K, Wiggans G, Cole J, Tooker M: Genomic inbreeding and relationships among Holsteins, Jerseys, and Brown Swiss. J Dairy Sci. 2011, 94: 5673-5682. 10.3168/jds.2011-4500.View ArticlePubMed
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.