
Investigating the impact of preselection on subsequent single-step genomic BLUP evaluation of preselected animals

Abstract

Background

Preselection of candidates, hereafter referred to as preselection, is a common practice in breeding programs. Preselection can cause bias and accuracy loss in subsequent pedigree-based best linear unbiased prediction (PBLUP). However, the impact of preselection on subsequent single-step genomic BLUP (ssGBLUP) is not completely clear yet. Therefore, in this study, we investigated, across different heritabilities, the impact of intensity and type of preselection on subsequent ssGBLUP evaluation of preselected animals.

Methods

We simulated a nucleus of a breeding program, in which a recent population of 15 generations was produced with PBLUP-based selection. In generation 15 of this recent population, the parents of the next generation were preselected using several preselection scenarios. These scenarios were combinations of three intensities of preselection (no, high or very high preselection) and three types of preselection (genomic, parental average or random), across three heritabilities (0.5, 0.3 or 0.1). Following each preselection scenario, a subsequent evaluation was performed using ssGBLUP, excluding all the information from the preculled animals (i.e. the candidates that were not preselected), and these genetic evaluations were compared in terms of accuracy and bias for the preselected animals, and in terms of realized genetic gain.

Results

Type of preselection affected selection accuracy at both preselection and subsequent evaluation stages. While preselection accuracy decreased, accuracy in the subsequent ssGBLUP evaluation increased, from genomic to parent average to random preselection scenarios. Bias was always negligible. Genetic gain decreased from genomic to parent average to random preselection scenarios. Genetic gain also decreased with increasing intensity of preselection, but only by a maximum of 0.1 additive genetic standard deviation from no to very high genomic preselection scenarios.

Conclusions

Using ssGBLUP in subsequent evaluations prevents preselection bias, irrespective of intensity and type of preselection, and heritability. With genomic preselection (GPS), in addition to reducing the phenotyping effort considerably, the use of ssGBLUP in subsequent evaluations realizes only a slightly lower genetic gain than that realized without preselection. This is especially relevant for traits that are expensive to measure (e.g. feed intake of individual broiler chickens), and traits for which phenotypes can only be measured at advanced stages of life (e.g. litter size in pigs).

Background

Selection of the parents of the next generation usually takes place in two or more stages (e.g. [1,2,3]), and the term ‘preselection’ is used to refer to the early stages of selection (e.g. [3,4,5]). Preselection is a common practice in the nuclei of breeding programs, where only a few hundred to a few thousand replacement animals are required per generation. To have a large pool of animals to select from, many more young animals are produced than are required to produce the next generation. Preselection is done for different reasons for different traits. For traits that are difficult or expensive to measure (e.g. feed intake of individual broiler chickens), preselection is used to reduce phenotyping costs. For traits for which phenotypes can be measured only at advanced stages of life (e.g. litter size in pigs), preselection is used to reduce the cost of raising the animals until phenotyping. Traditionally, preselection has mostly been based on correlated trait(s) that can be measured easily and cheaply early in life (e.g. [1, 3, 6,7,8]). In the genomic era, preselection is often based on genomic estimated breeding values (GEBV) of young selection candidates, and in the literature this type of preselection is called genomic or genotypic preselection (GPS; e.g. [4, 5, 9]).

Before the introduction of genomic prediction [10], models for the genetic evaluation of animals were based on phenotypic and pedigree data. These models are generally easy to implement and run fast, but their limitation is that they provide low accuracies for animals without their own phenotype (e.g. [11, 12]). With the progress in DNA technology, large-scale genotyping of animals became affordable and genomic information can now be included in genetic evaluations of animals, e.g. by using multi-step genomic evaluation models, in which genomic and pedigree information are used in two separate steps [13]. Generally, multi-step genomic evaluation models estimate breeding values more accurately than pedigree-based models, but have the disadvantage of estimating breeding values for genotyped animals only (e.g. [11, 12]). Because the reference population (animals with genotypes and phenotypes) required for multi-step models is usually already selected, the breeding values obtained are biased (e.g. [11, 12]). In 2010, single-step genomic evaluation models were introduced as improvements over both pedigree-based and multi-step genomic models [14, 15]. Single-step models combine all available pedigree, genomic and phenotypic information and provide GEBV for all the animals regardless of whether the animals have phenotypes and/or genotypes. It has been shown that single-step models produce more accurate and less biased breeding values than pedigree-based and multi-step genomic models, even in the presence of selective genotyping and phenotyping (e.g. [16, 17]).

Preselection is known to result in a positive average Mendelian sampling (MS) term for the selected animals (e.g. [4, 18, 19]). Selection candidates that have a positive average MS term represent a violation of one of the assumptions of genetic evaluation models (i.e. that the expectation of the average MS of the observed offspring is zero). This has been reported to result in biased and less accurate estimated breeding values (EBV) in subsequent evaluations that are done using pedigree-based best linear unbiased prediction (PBLUP, e.g. [4, 18,19,20,21,22]). It is also known that when all the information on which preselection is based is included in the subsequent PBLUP evaluations, the impact of the violation of this assumption is usually alleviated, e.g. [20–22].

Single-step genomic BLUP (ssGBLUP) has been reported to handle GPS better than PBLUP. For example, Masuda et al. [5] reported lower genetic trends in milk, fat, and protein yields in genomically preselected US Holsteins when the subsequent evaluations were performed with PBLUP than with ssGBLUP. These authors used these differences in genetic trends between PBLUP and ssGBLUP as evidence of preselection bias in PBLUP evaluations following GPS. Although Aguilar et al. [14] hypothesised that ssGBLUP could completely prevent preselection bias, to date, no study in the literature has compared results obtained from the same data with and without preselection to investigate this hypothesis. The study of Masuda et al. [5] evaluated preselection bias in subsequent PBLUP and ssGBLUP evaluations, but did not include a scenario based on the complete data (without preselection) with which the other scenarios could be compared. Furthermore, the benefit of including the genotypes of the selection candidates discarded at the preselection stage (hereafter referred to as preculled animals) in subsequent ssGBLUP evaluations is still not clear. On the one hand, Shabalina et al. [23] concluded that including the genotypes of preculled animals in subsequent ssGBLUP evaluations improves accuracy in situations where (some of the) parents of the genotyped selection candidates are not genotyped. On the other hand, Koivula et al. [24] reported larger biases and losses in reliability in subsequent ssGBLUP evaluations when genotypes of preculled animals were included and most of the parents of the selection candidates were genotyped. Thus, our aim was to investigate the impact of preselection on subsequent evaluations of preselected animals, using ssGBLUP with all the information from the preculled animals excluded.

Methods

Data simulation

To achieve our aim, we simulated a nucleus of a breeding program using QMSim [25], with inputs from the international breeding companies that operate in the Netherlands. The QMSim parameter file, with all the details of the simulation, is in Additional file 1. For each animal in the breeding program, a genome of 30 chromosomes, each 100 cM long, was simulated. Sixty thousand single nucleotide polymorphisms (SNPs) and 3000 quantitative trait loci (QTL) were evenly distributed across the entire genome, and the QTL effects were randomly drawn from a gamma distribution with a shape parameter of 0.4. The simulation started with a historical population, to establish mutation-drift equilibrium and linkage disequilibrium between markers and QTL. The historical population had 3000 generations of random mating, starting with 2500 female and 2500 male animals (both sexes were equally represented throughout the simulation). The size of the historical population decreased linearly until it reached 50 animals at generation 2997, and then increased to 5000 animals again at generation 3000. The founder population, which comprised 100 males and 1000 females, was randomly selected from the 3000th historical generation. Then, starting from this founder population, 15 (recent) generations of artificial selection were simulated. In each generation, 100 males and 1000 females were selected and mated to produce the next generation of 16,000 animals. Within sex, all selected parents contributed equally to the next generation. Selection was based on EBV, and the mating design aimed at minimising inbreeding by using minimum co-ancestry matings as described in [26], which minimize the average relationship among all sires and dams, and therefore also among their offspring. There was no preselection during the production of these 15 generations; thus, information on all the animals (including the culled animals) was used to inform selection decisions.

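To make the QTL part of the simulation concrete, the sketch below draws gamma-distributed QTL effects (shape 0.4, as stated above) and sums them into true breeding values under an additive model. It is a minimal illustration in Python with NumPy, not QMSim's implementation: the random sign attached to each effect, the rescaling of effects to a target additive variance, the function names `simulate_qtl_effects` and `scale_effects`, and the genotypes themselves are all assumptions of this sketch.

```python
import numpy as np


def simulate_qtl_effects(n_qtl, shape=0.4, rng=None):
    """Draw QTL effect sizes from a gamma distribution and attach a random
    sign (the random sign is an assumption of this sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    magnitudes = rng.gamma(shape, 1.0, size=n_qtl)
    signs = rng.choice([-1.0, 1.0], size=n_qtl)
    return magnitudes * signs


def scale_effects(effects, qtl_genotypes, target_var=1.0):
    """Rescale effects so that the sample variance of TBV equals target_var."""
    tbv = qtl_genotypes @ effects
    return effects * np.sqrt(target_var / tbv.var())


rng = np.random.default_rng(2024)
n_animals, n_qtl = 500, 3000
# Hypothetical QTL genotypes: allele counts (0, 1, 2) at frequency 0.5.
qtl_genotypes = rng.binomial(2, 0.5, size=(n_animals, n_qtl)).astype(float)
effects = scale_effects(simulate_qtl_effects(n_qtl, rng=rng), qtl_genotypes)
tbv = qtl_genotypes @ effects  # true breeding values (additive model)
```

Rescaling the effects rather than the TBV keeps the additive model consistent: the TBV of any later animal can be computed from its QTL genotypes with the same effects.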
The breeding goal consisted of a single quantitative trait that was measured in both sexes. Simulations were carried out with heritabilities of 0.5, 0.3 and 0.1, to represent breeding goal traits with high, medium and low heritabilities, respectively. Pedigree of all animals (from generations 0 to 15), genotypes of all animals in generations 13 to 15 and phenotypes of all animals in generations 11 to 15 were used in this study.
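For a given heritability, the residual variance follows from \(h^{2} = \sigma_{a}^{2}/(\sigma_{a}^{2} + \sigma_{e}^{2})\), i.e. \(\sigma_{e}^{2} = \sigma_{a}^{2}(1 - h^{2})/h^{2}\). A minimal sketch of how phenotypes could be generated from TBV for the three heritabilities, assuming an additive genetic variance of 1, normally distributed residuals and a hypothetical helper `simulate_phenotypes` (QMSim's own procedure may differ):

```python
import numpy as np


def simulate_phenotypes(tbv, h2, var_a=1.0, rng=None):
    """Phenotype = TBV + residual, with the residual variance chosen so that
    var_a / (var_a + var_e) equals the target heritability h2."""
    rng = np.random.default_rng() if rng is None else rng
    var_e = var_a * (1.0 - h2) / h2
    residuals = rng.normal(0.0, np.sqrt(var_e), size=np.shape(tbv))
    return np.asarray(tbv) + residuals, var_e


rng = np.random.default_rng(7)
tbv = rng.normal(0.0, 1.0, size=100_000)  # hypothetical TBV with variance 1
for h2 in (0.5, 0.3, 0.1):
    y, var_e = simulate_phenotypes(tbv, h2, rng=rng)
    # Sample heritability Var(TBV) / Var(phenotype) should be close to h2.
    print(h2, round(var_e, 3), round(tbv.var() / y.var(), 3))
```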

Implementation of preselection

Preselection was implemented in generation 15 by performing several scenarios, which were combinations of three intensities of preselection and three types of preselection, across the three simulated heritabilities. An overview of these preselection scenarios is in Table 1. The three intensities of preselection investigated were no preselection (control), high preselection, and very high preselection. With no preselection, all the selection candidates (animals produced in generation 15) were kept until the subsequent genetic evaluation; thus, this scenario mimicked single-stage selection. With high preselection, 10% of the male and 15% of the female selection candidates were preselected. With very high preselection, 5% of the male and 12.5% of the female selection candidates were preselected. The choice of these intensities of preselection was based on information obtained from the international breeding companies operating in the Netherlands. The three types of preselection were GPS, parent average preselection (PAPS) and random preselection (RPS). Details of the information used in each type of preselection are in Table 2. Briefly, with GPS, the GEBV of the selection candidates were used, which were estimated by ssGBLUP with the phenotypes of the selection candidates excluded from the model. With PAPS, the average parental GEBV of the selection candidates were used, which were estimated by ssGBLUP with the genotypes and phenotypes of the selection candidates excluded from the model. As the name implies, RPS preselects the selection candidates at random; in this study, we used it to investigate the impact of reducing the number of selection candidates per se. The GEBV used for preselection in all GPS and PAPS scenarios were estimated with the ssGBLUP procedure of MiXBLUP [27].
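The within-sex truncation implied by these percentages can be sketched as follows. This is a hypothetical helper (`preselect`), not the procedure used in the study; it assumes 8000 male and 8000 female candidates (the 16,000 animals of generation 15 with both sexes equally represented) and random numbers in place of real GEBV. With high preselection it keeps 800 males and 1200 females (2000 animals), and with very high preselection 400 males and 1000 females (1400 animals), so that under very high preselection exactly the 1000 females needed as parents remain.

```python
import numpy as np


def preselect(criterion, is_male, frac_male, frac_female, rng=None):
    """Boolean mask of preselected candidates, using truncation within sex.

    criterion: candidate GEBV (GPS) or parent-average GEBV (PAPS);
    pass None for random preselection (RPS).
    """
    rng = np.random.default_rng() if rng is None else rng
    is_male = np.asarray(is_male, dtype=bool)
    n = is_male.size
    score = rng.random(n) if criterion is None else np.asarray(criterion, float)
    keep = np.zeros(n, dtype=bool)
    for sex_mask, frac in ((is_male, frac_male), (~is_male, frac_female)):
        idx = np.flatnonzero(sex_mask)
        n_keep = int(round(frac * idx.size))
        # Indices of the n_keep highest scores within this sex.
        best = idx[np.argsort(score[idx])[::-1][:n_keep]]
        keep[best] = True
    return keep


rng = np.random.default_rng(3)
is_male = np.repeat([True, False], 8000)  # 16,000 candidates, equal sex ratio
gebv = rng.normal(size=16_000)            # hypothetical candidate GEBV
high = preselect(gebv, is_male, 0.10, 0.15)
very_high = preselect(gebv, is_male, 0.05, 0.125)
random_high = preselect(None, is_male, 0.10, 0.15, rng=rng)
print(high.sum(), very_high.sum(), random_high.sum())  # 2000 1400 2000
```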

Table 1 Overview of the various preselection scenarios implemented
Table 2 Details of the information used in the different types of preselection

Subsequent genetic evaluation

Following each preselection scenario, we performed a subsequent genetic evaluation with ssGBLUP. The subsequent evaluations included the pedigree of all the animals from generation 0 to the preselected animals of generation 15, the genotypes of all the animals from generation 13 to the preselected animals of generation 15, and the phenotypes of all the animals from generation 11 to the preselected animals of generation 15. This means that no information from the preculled animals was used in the subsequent evaluations. These subsequent evaluations provided the breeding values that were used to make the final selection of the 100 males and 1000 females in generation 15 that became the parents of the next generation. MiXBLUP [27] was also used for these subsequent evaluations. Each step (simulation of the breeding program, implementation of preselection and subsequent genetic evaluations) was replicated 10 times.
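Which records enter the subsequent evaluation can be expressed as three masks over the animals. The sketch below is a hypothetical helper (`subsequent_evaluation_masks`) that assumes each animal is described only by its generation (0 to 15) and a preselection flag that matters only in generation 15; it simply encodes the rule that preculled animals contribute no pedigree, genotype or phenotype records.

```python
import numpy as np


def subsequent_evaluation_masks(generation, preselected):
    """Return (pedigree, genotyped, phenotyped) boolean masks.

    Animals of generations 0-14 are always kept; in generation 15 only the
    preselected animals are kept, so preculled animals contribute nothing.
    """
    generation = np.asarray(generation)
    kept = (generation < 15) | np.asarray(preselected, dtype=bool)
    pedigree = kept                        # pedigree: generations 0 to 15
    genotyped = kept & (generation >= 13)  # genotypes: generations 13 to 15
    phenotyped = kept & (generation >= 11) # phenotypes: generations 11 to 15
    return pedigree, genotyped, phenotyped


# Hypothetical animals: the last two belong to generation 15,
# and only the first of them was preselected.
generation = np.array([10, 12, 13, 15, 15])
preselected = np.array([False, False, False, True, False])
ped, geno, pheno = subsequent_evaluation_masks(generation, preselected)
```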

Implementation of single-step GBLUP

In order to make sure that any observed bias and loss in accuracy in our results were due to preselection, all other known possible sources of bias and loss in accuracy in ssGBLUP evaluations were accounted for. Thus, the inverse of our combined pedigree-genomic relationship matrix (\({\mathbf{H}}^{ - 1}\)) was as follows:

$${\mathbf{H}}^{ - 1} = {\mathbf{A}}^{ - 1} + \left[ {\begin{array}{*{20}c} 0 & 0 \\ 0 & {\left( {0.9{\mathbf{G}}_{{\mathbf{t}}} + 0.1{\mathbf{A}}_{22} } \right)^{ - 1} - {\mathbf{A}}_{22}^{ - 1} } \\ \end{array} } \right].$$

where \({\mathbf{A}}^{ - 1}\) is the inverse of pedigree relationship matrix, and \({\mathbf{A}}_{22}\) is the pedigree relationship matrix among genotyped animals. To avoid the bias that is caused by not considering inbreeding in the construction of \({\mathbf{A}}^{ - 1}\) and \({\mathbf{A}}_{22}\) [28], we considered inbreeding in both \({\mathbf{A}}^{ - 1}\) and \({\mathbf{A}}_{22}\), and the inbreeding coefficients were calculated using the algorithm of Meuwissen and Luo [29]. \({\mathbf{G}}_{{\mathbf{t}}}\) is the adjusted genomic relationship matrix that was obtained according to the FST method described by Powell et al. [30] and Vitezica et al. [12] and aimed at setting the average genomic inbreeding equal to the average pedigree inbreeding as follows:
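A dense-matrix sketch of the \({\mathbf{H}}^{ - 1}\) construction above, with weights 0.9 and 0.1 on \({\mathbf{G}}_{t}\) and \({\mathbf{A}}_{22}\), is shown below. It assumes small toy matrices and inverts \({\mathbf{A}}\) directly for brevity; a real evaluation would build \({\mathbf{A}}^{ - 1}\) with sparse rules that account for inbreeding, and MiXBLUP performs these steps internally. The helper name `h_inverse` and the toy pedigree and \({\mathbf{G}}_{t}\) values are hypothetical.

```python
import numpy as np


def h_inverse(a_inv, a22, g_t, genotyped_idx, w=0.9):
    """H^-1 = A^-1 + [[0, 0], [0, (w*G_t + (1 - w)*A22)^-1 - A22^-1]].

    genotyped_idx lists the rows/columns of A^-1 belonging to genotyped
    animals, in the same order as a22 and g_t. Dense inverses: toy sizes only.
    """
    g_w = w * g_t + (1.0 - w) * a22
    correction = np.linalg.inv(g_w) - np.linalg.inv(a22)
    h_inv = np.array(a_inv, dtype=float, copy=True)
    h_inv[np.ix_(genotyped_idx, genotyped_idx)] += correction
    return h_inv


# Toy pedigree: animals 0 and 1 are unrelated founders, animal 2 is their offspring.
a = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
a_inv = np.linalg.inv(a)
genotyped = [1, 2]                             # hypothetical: animals 1 and 2 genotyped
a22 = a[np.ix_(genotyped, genotyped)]
g_t = np.array([[1.02, 0.47], [0.47, 0.98]])  # hypothetical adjusted G
h_inv = h_inverse(a_inv, a22, g_t, genotyped)
```

A useful sanity check: if \({\mathbf{G}}_{t}\) equals \({\mathbf{A}}_{22}\), the correction block vanishes and \({\mathbf{H}}^{ - 1}\) reduces to \({\mathbf{A}}^{ - 1}\).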

$${\mathbf{G}}_{t} \, = \,\left( {1 - \overline{{f_{p} }} } \right)\,{\mathbf{G}}_{r} \, + \,2\overline{{f_{p} }} \,{\mathbf{J}}$$

where \(\overline{{f_{p} }}\) is the average pedigree inbreeding coefficient across genotyped animals, \({\mathbf{G}}_{{\mathbf{r}}}\) is the raw genomic relationship matrix computed following the first method of VanRaden [31], and \({\mathbf{J}}\) is a matrix of 1s. To obtain \({\mathbf{G}}_{{\mathbf{r}}}\), we calculated allele frequencies using all the available genotypic data, and set the minor allele frequency threshold at 0.005.
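The two steps above (the first method of VanRaden, then the adjustment towards the average pedigree inbreeding) can be sketched with NumPy as follows. Assumptions of this sketch: genotypes are coded 0/1/2, SNPs with a minor allele frequency below 0.005 are dropped (whether SNPs exactly at the threshold are kept is not stated in the text), the function names `vanraden_g` and `adjust_g` are hypothetical, and the genotypes and the value of \(\overline{{f_{p} }}\) are made up for illustration.

```python
import numpy as np


def vanraden_g(genotypes, maf_threshold=0.005):
    """First method of VanRaden: G = ZZ' / (2 * sum p(1 - p)), Z = M - 2p."""
    m = np.asarray(genotypes, dtype=float)  # animals x SNPs, coded 0/1/2
    p = m.mean(axis=0) / 2.0                # allele frequencies from all genotyped animals
    maf = np.minimum(p, 1.0 - p)
    keep = maf >= maf_threshold             # drop rare and monomorphic SNPs
    z = m[:, keep] - 2.0 * p[keep]
    return z @ z.T / (2.0 * np.sum(p[keep] * (1.0 - p[keep])))


def adjust_g(g_r, mean_pedigree_inbreeding):
    """G_t = (1 - fbar) * G_r + 2 * fbar * J, with J a matrix of ones."""
    f = mean_pedigree_inbreeding
    return (1.0 - f) * g_r + 2.0 * f * np.ones_like(g_r)


rng = np.random.default_rng(11)
freqs = rng.uniform(0.05, 0.95, size=1000)
snps = rng.binomial(2, freqs, size=(50, 1000))  # hypothetical genotypes
snps[:, 0] = 0                                  # a monomorphic SNP, removed by the MAF filter
g_r = vanraden_g(snps)
g_t = adjust_g(g_r, mean_pedigree_inbreeding=0.05)  # hypothetical fbar
```

Because the allele frequencies are estimated from the same genotypes, the elements of \({\mathbf{G}}_{r}\) average to zero, so the adjustment sets the average element of \({\mathbf{G}}_{t}\) to \(2\overline{{f_{p} }}\).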

The additive genetic and residual variances supplied to MiXBLUP (per heritability, per replicate) were estimated by fitting an animal model in ASReml [32]. To obtain these variances, we used the pedigree of all the animals in generations 0 to 14 and the phenotypes of all the animals in generations 11 to 14 (i.e. the available pedigree and phenotypic information at the time the selection candidates were born). The full MiXBLUP instruction file for the ssGBLUP analysis is included in Additional file 2.

Indicators of model performance across preselection scenarios

The following indicators of model performance were estimated for each preselection scenario and compared among the scenarios.

(Pre)selection accuracy

Accuracy was calculated as the correlation between (G)EBV and true breeding values (TBV). After running the preselection model, preselection accuracy was calculated based on all the selection candidates, whereas after running the subsequent genetic evaluation model, the subsequent selection accuracy was computed based only on the preselected animals.
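Accuracy as defined here is a plain Pearson correlation. The sketch below (hypothetical helper `accuracy`, simulated values) computes it once over all candidates and once over a preselected subset. For brevity it reuses a single hypothetical predictor for both calls, whereas in the study the subsequent accuracy was computed from the (G)EBV of the new, subsequent evaluation; the lower correlation in the subset here reflects only the restricted range of the preselected animals.

```python
import numpy as np


def accuracy(ebv, tbv):
    """Accuracy as the Pearson correlation between (G)EBV and TBV."""
    return float(np.corrcoef(ebv, tbv)[0, 1])


rng = np.random.default_rng(5)
tbv = rng.normal(size=16_000)
# Hypothetical predictor with an accuracy of about 0.7 in the full population.
gebv = 0.7 * tbv + rng.normal(scale=np.sqrt(1 - 0.7**2), size=16_000)
preselected = gebv >= np.quantile(gebv, 0.875)              # keep the top 12.5%
acc_all = accuracy(gebv, tbv)                               # all candidates
acc_presel = accuracy(gebv[preselected], tbv[preselected])  # preselected only
```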

Bias

Bias was measured in two ways. First, the absolute bias was calculated as the difference between mean TBV and mean (G)EBV of all the preselected animals, and expressed in additive genetic standard deviation (SD) units. If there is no absolute bias, the difference is 0. A negative difference means that, on average, (G)EBV overestimate TBV, and a positive difference means that, on average, (G)EBV underestimate TBV. To make TBV comparable to (G)EBV, we subtracted the mean TBV and the mean (G)EBV of the animals in generations 11 to 14 from the TBV and the (G)EBV of each of the preselected animals, respectively. Second, dispersion bias was measured as the regression coefficient of TBV on (G)EBV (\({\text{b}}_{{{\text{TBV}},{\text{(G)EBV}}}}\)) of all preselected animals. If there is no dispersion bias, \({\text{b}}_{{{\text{TBV}},{\text{(G)EBV}}}}\) is 1. A value of \({\text{b}}_{{{\text{TBV}},{\text{(G)EBV}}}}\) lower than 1 means that the variance of (G)EBV is inflated compared to the variance of TBV, and a value higher than 1 means that the variance of (G)EBV is deflated compared to the variance of TBV.
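Both bias measures are short computations. In this sketch, the helper names `absolute_bias` and `dispersion_bias` and the toy values are hypothetical; the reference means are assumed to be the mean TBV and mean (G)EBV of the animals in generations 11 to 14, as described above, and the regression slope is computed as cov(TBV, (G)EBV) / var((G)EBV).

```python
import numpy as np


def absolute_bias(tbv, ebv, ref_tbv_mean, ref_ebv_mean, sd_a):
    """Mean TBV minus mean (G)EBV of preselected animals, each centred on the
    mean of the reference animals (generations 11 to 14), in additive SD units.
    Negative values: (G)EBV overestimate TBV on average."""
    centred_tbv = np.mean(tbv) - ref_tbv_mean
    centred_ebv = np.mean(ebv) - ref_ebv_mean
    return (centred_tbv - centred_ebv) / sd_a


def dispersion_bias(tbv, ebv):
    """Regression coefficient of TBV on (G)EBV: 1 means no dispersion bias,
    < 1 means inflated (G)EBV variance, > 1 means deflated (G)EBV variance."""
    tbv = np.asarray(tbv, dtype=float)
    ebv = np.asarray(ebv, dtype=float)
    return float(np.cov(tbv, ebv)[0, 1] / np.var(ebv, ddof=1))


tbv = np.array([0.5, 1.5, 0.8, 1.2])  # hypothetical preselected animals
print(absolute_bias(tbv, tbv + 0.2, 0.0, 0.0, sd_a=2.0))  # approximately -0.1
print(dispersion_bias(tbv, 2.0 * tbv))                    # approximately 0.5
```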

Realised genetic gain (RGG)

The realised genetic gain (RGG) is the difference between the average TBV of the selected individuals in two successive generations, provided that each selected animal (per sex) contributes equally to the next generation. In this study, RGG is the difference between the average TBV of the 100 males and 1000 females that were subsequently selected in generation 15 and the average TBV of the 100 males and 1000 females selected in generation 14. For each generation, we computed averages within selected males and females separately, and then took the average of these two averages. Here, we assumed that, just as in the previous generations, all the subsequently selected animals of generation 15 would contribute equally (per sex) to the next generation. To give RGG a reference point, it was expressed in additive genetic SD units. In practice, RGG is estimated using (G)EBV, because TBV are not known, and any bias in (G)EBV could lead to bias in estimated RGG. Thus, we also calculated RGG based on (G)EBV. These two measures were named true realised genetic gain (TRGG) and estimated realised genetic gain (ERGG), respectively.
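The RGG definition translates directly into code: average the male and female means of the selected parents in each generation, take the difference between generations, and divide by the additive genetic SD. Passing TBV gives TRGG and passing (G)EBV gives ERGG. The helper names `mean_of_sex_means` and `realised_genetic_gain` and the toy inputs below are hypothetical.

```python
import numpy as np


def mean_of_sex_means(values, is_male):
    """Average of the male mean and the female mean (sexes weighted equally)."""
    values = np.asarray(values, dtype=float)
    is_male = np.asarray(is_male, dtype=bool)
    return 0.5 * (values[is_male].mean() + values[~is_male].mean())


def realised_genetic_gain(vals_15, male_15, vals_14, male_14, sd_a):
    """RGG in additive genetic SD units between the selected parents of
    generations 14 and 15. Use TBV for TRGG, (G)EBV for ERGG."""
    gain = mean_of_sex_means(vals_15, male_15) - mean_of_sex_means(vals_14, male_14)
    return gain / sd_a


# Hypothetical toy values, not results from the study.
tbv_14 = np.array([0.0, 0.0, 1.0, 1.0])
male_14 = np.array([True, True, False, False])
tbv_15 = np.array([2.0, 1.0, 3.0])
male_15 = np.array([True, False, False])
trgg = realised_genetic_gain(tbv_15, male_15, tbv_14, male_14, sd_a=0.5)
```

Weighting the two sex means equally, rather than pooling all selected animals, follows the "average of two averages" described above, so the 1000 females do not outweigh the 100 males.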

Results

Results of the genetic evaluations in which ssGBLUP was used in the subsequent evaluations are in Tables 3 and 4. The results in Table 3 are from the evaluations that were obtained with different intensities of GPS and different heritabilities. The results in Table 4 are from the evaluations that were obtained with different intensities and types of preselection, all with a heritability of 0.1.

Table 3 ssGBLUP performance, with different heritabilities and GPS intensities
Table 4 ssGBLUP performance, with different preselection types and intensities, all with a heritability of 0.1

(Pre)selection accuracy

Preselection accuracy

Within the same heritability and type of preselection, preselection accuracy was the same for the high and very high intensities of preselection (Tables 3 and 4). GPS provided a higher preselection accuracy (0.71) than PAPS (0.44), and as expected, RPS provided a preselection accuracy equal to zero (Table 4).

Subsequent selection accuracy

For a given heritability, subsequent selection accuracy was always highest without preselection. It decreased with preselection (ranging from 0.80 to 0.48 for the scenarios with a heritability of 0.1), but within the same type of preselection, it remained similar across high and very high intensities of preselection (Tables 3 and 4). For a given heritability, subsequent selection accuracy increased from GPS to PAPS, and from PAPS to RPS (Table 4).

Bias

Both absolute and dispersion bias were always numerically very small, and often not statistically significant. The highest observed absolute bias was 0.05 genetic SD units, and the highest deviation of the \({\text{b}}_{{{\text{TBV}},{\text{GEBV}}}}\) from 1 (indicator of dispersion bias) was 0.06. Thus, the impacts of intensity of preselection and type of preselection on bias are considered negligible across all heritabilities.

Realised genetic gain

With the same heritability and type of preselection, RGG (both TRGG and ERGG) always decreased with increasing intensity of preselection (Tables 3 and 4). With the same intensity of preselection, RGG decreased from GPS to PAPS and from PAPS to RPS (Table 4), and ranged from 0.39 to 1.38 genetic SD (TRGG) and 0.37 to 1.36 genetic SD (ERGG) for the scenarios with heritability of 0.1. Irrespective of intensity of preselection, type of preselection, and heritability, ERGG was never statistically different from its corresponding TRGG (Tables 3 and 4).

Discussion

In this study, we investigated, for different heritabilities, the impact of intensity and type of preselection on the subsequent evaluation of preselected animals in terms of selection accuracy, bias and genetic gain, using ssGBLUP with all the information from preculled animals excluded. We implemented only one stage of preselection and only one type of preselection at a time, to clearly identify the impact of each type and intensity of preselection. In reality, however, most breeding programs involve at least two stages of preselection, e.g. a first preselection of elite families using PAPS, followed by genotyping some members of these elite families to perform GPS. In addition, female selection candidates are not always genotyped. It is expected that, in the near future, genotyping costs will become low enough that breeding companies will decide to genotype all their selection candidates [33]. Moreover, based on our findings (i.e. that GPS hardly leads to any significant loss of genetic gain whereas PAPS does), breeding companies may become more inclined to genotype all their selection candidates so that they can perform GPS as the only type of preselection.

Bias

We observed negligible bias in our subsequent evaluations with ssGBLUP. Patry and Ducrocq [4] have shown that PBLUP following GPS underestimates the genetic trend and decreases the accuracy of EBV of young bulls and of their daughters. Therefore, we hypothesized that our observed lack of bias was due to using ssGBLUP in the subsequent evaluations. To test this hypothesis, we repeated the subsequent evaluations for our preselection scenarios with a heritability of 0.1, this time using PBLUP, with all the other parameters left unchanged. The results of the PBLUP evaluations are in Table 5. Subsequent evaluations with ssGBLUP (Table 4) resulted in higher accuracies, lower or similar biases, and higher realized genetic gains than the corresponding PBLUP evaluations (Table 5). Without preselection or with RPS, bias (in both absolute and dispersion forms) was absent with PBLUP, just as with ssGBLUP. This is expected, because no preselection bias arises without preselection or with an ineffective preselection such as RPS (as shown by the preselection accuracies in Tables 3, 4 and 5). However, with GPS and PAPS, where preselection was effective (as shown by the preselection accuracies in Tables 3, 4 and 5), bias was always statistically significant with PBLUP (absolute bias ranging from 0.20 to 0.50 additive genetic SD, and \({\text{b}}_{{{\text{TBV}},{\text{EBV}}}}\) ranging from 0.71 to 0.46), but not with ssGBLUP (absolute bias ranging from 0.03 to 0.04 additive genetic SD, and \({\text{b}}_{{{\text{TBV}},{\text{GEBV}}}}\) never statistically different from 1). This comparison confirms that, with preselection, the bias observed in subsequent genetic evaluations based on PBLUP is removed by using ssGBLUP.

Table 5 PBLUP performance, with different preselection types and intensities, all with heritability of 0.1

Subsequent selection accuracy

Subsequent selection accuracy decreased with preselection, which is in line with the findings reported by Patry and Ducrocq [4] for PBLUP evaluations following GPS in dairy cattle breeding schemes. Note that without preselection, the subsequent selection accuracy was calculated across many more animals (16,000) than the 2000 and 1400 animals used for the high and very high preselection scenarios, respectively. Even when the subsequent selection accuracy in the scenario without preselection was calculated using only these 2000 or 1400 preselected animals, it was still higher than in the scenarios with preselection [see Additional file 3]. The explanation for this result is that, at the subsequent genetic evaluation, each selection candidate had, on average, more full and half sibs without preselection than in the high and very high preselection scenarios, and the phenotypes of these additional full and half sibs increased the accuracy in the scenario without preselection. With different types of preselection, contrary to the trend observed for preselection accuracy, the subsequent selection accuracy increased from GPS to PAPS, and from PAPS to RPS, because the more accurate the preselection was, the lower the additive genetic variance left in the preselected animals [34, 35], which in turn reduced selection accuracy [35].
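The link between preselection accuracy and the additive variance remaining among preselected animals can be illustrated with the standard quantitative-genetics approximation for truncation selection on a normally distributed predictor: selecting the best proportion \(p\) on a predictor with accuracy \(r\) leaves a fraction \(1 - r^{2}k\) of the additive variance, with \(k = i(i - x)\), where \(x\) is the truncation point and \(i\) the selection intensity. The sketch below applies this textbook formula (not the method used in this study) to the male proportion under high preselection (0.10) and the preselection accuracies reported above for a heritability of 0.1 (GPS 0.71, PAPS 0.44, RPS 0); it ignores family structure and Mendelian sampling, and the helper name `remaining_variance_fraction` is hypothetical.

```python
from statistics import NormalDist


def remaining_variance_fraction(p, r):
    """Fraction of additive variance left after truncation selection of the
    best proportion p on a normal predictor with accuracy r."""
    nd = NormalDist()
    x = nd.inv_cdf(1.0 - p)  # truncation point on the standardized predictor
    i = nd.pdf(x) / p        # selection intensity
    k = i * (i - x)          # proportional variance reduction of the predictor
    return 1.0 - r * r * k


for label, r in (("GPS", 0.71), ("PAPS", 0.44), ("RPS", 0.0)):
    print(label, round(remaining_variance_fraction(0.10, r), 3))
```

Under these assumptions, the more accurate the preselection, the smaller the fraction of additive variance left, which is consistent with the ordering of subsequent selection accuracies described above.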

Realised genetic gain (RGG)

We observed a decrease in RGG (both TRGG and ERGG) as intensity of preselection increased. The reason is that, as intensity of preselection increased, more of the best animals (in terms of TBV) were lost during preselection, since preselection was never 100% accurate. Other studies have reported a similar trend, i.e. a reduction in genetic gain with an increasing intensity of preselection, and offered similar explanations (e.g. [2, 36,37,38]). With different types of preselection, we observed that RGG depended more on preselection accuracy than on subsequent selection accuracy, and therefore RGG followed a trend more similar to that of preselection accuracy than to that of subsequent selection accuracy (Table 4). The reason is that, among preselection types, variation in preselection accuracy was larger than variation in subsequent selection accuracy (Table 4), due to the different sources of information used in each preselection type (Table 2). In the subsequent genetic evaluations, irrespective of the type of preselection, the model used all three sources of information, i.e. the pedigree, genotypes and phenotypes of the preselected candidates. This explains why RGG was always higher with GPS than with PAPS, and why the lowest genetic gain was recorded with RPS. Schrooten et al. [2] also reported a larger impact of preselection accuracy than of subsequent selection accuracy on genetic gain in dairy cattle breeding schemes.

GPS and RGG

The decrease in RGG from no preselection to high and very high GPS scenarios was always small. Specifically, TRGG and ERGG decreased by 3.3 to 8.7% and by 2.8 to 5.9%, respectively, from no preselection to high GPS, depending on heritability (Table 3). With the very high intensity of preselection, the number of females required to produce the next generation in this study (1000 females) was already reached at the preselection stage; thus, there was no selection among females at the subsequent selection stage. TRGG and ERGG decreased by 5.2 to 10.1% and by 4.1 to 7.4%, respectively, from no preselection to very high GPS, depending on heritability (Table 3). These results show that, with ssGBLUP evaluations following GPS, it is possible to achieve a level of genetic gain close to that achieved without preselection. This is especially important for traits that are expensive to measure (e.g. feed intake of individual broiler chickens), and traits for which phenotypes can only be measured at advanced stages of life (e.g. litter size in pigs). For such traits, GPS saves the cost of phenotyping the preculled animals, and the cost of raising the preculled animals in the expensive nucleus environments of breeding programs.

Conclusions

Using ssGBLUP in subsequent genetic evaluations prevents preselection bias, irrespective of intensity and type of preselection, and heritability. With GPS, in addition to reducing the phenotyping effort considerably, the use of ssGBLUP in subsequent genetic evaluations realizes only a slightly lower genetic gain than that realized without preselection. This is especially relevant for traits that are expensive to measure (e.g. feed intake of individual broiler chickens), and traits for which phenotypes can only be measured at advanced stages of life (e.g. litter size in pigs).

Availability of data and materials

The code used to generate the data in this study is attached to this article as Additional file 1.

References

  1. Appel LJ, Strandberg E, Danell B, Lundeheim N. Adjusting for missing data due to culling before testing in genetic evaluations of swine. J Anim Sci. 1998;76:1794–802.

  2. Schrooten C, Bovenhuis H, van Arendonk JAM, Bijma P. Genetic progress in multistage dairy cattle breeding schemes using genetic markers. J Dairy Sci. 2005;88:1569–81.

  3. Janhunen M, Kause A, Vehviläinen H, Nousiainen A, Koskinen H. Correcting within-family pre-selection in genetic evaluation of growth: a simulation study on rainbow trout. Aquaculture. 2014;434:220–6.

  4. Patry C, Ducrocq V. Evidence of biases in genetic evaluations due to genomic preselection in dairy cattle. J Dairy Sci. 2011;94:1011–20.

  5. Masuda Y, VanRaden PM, Misztal I, Lawlor TJ. Differing genetic trend estimates from traditional and genomic evaluations of genotyped animals as evidence of preselection bias in US Holsteins. J Dairy Sci. 2018;101:5194–206.

  6. Meyer K, Thompson R. Bias in variance and covariance component estimators due to selection on a correlated trait. Zeitschrift für Tierzüchtung und Züchtungsbiologie. 1984;101:33–50.

  7. Jensen J, Mao IL. Estimation of genetic parameters using sampled data from populations undergoing selection. J Dairy Sci. 1991;74:3544–51.

  8. Árnason T, Albertsdóttir E, Fikse WF, Eriksson S, Sigurdsson Á. Estimation of genetic parameters and response to selection for a continuous trait subject to culling before testing. J Anim Breed Genet. 2012;129:50–9.

  9. Mäntysaari EA, Liu Z, VanRaden P. Interbull validation test for genomic evaluations. Interbull Bull. 2010;41:17–22.

  10. Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29.

  11. Wolc A, Kranis A, Arango J, Settar P, Fulton JE, O’Sullivan NP, et al. Implementation of genomic selection in the poultry industry. Anim Front. 2016;6:23–31.

  12. Vitezica ZG, Aguilar I, Misztal I, Legarra A. Bias in genomic predictions for populations under selection. Genet Res. 2011;93:357–66.

  13. VanRaden PM, Van Tassell CP, Wiggans GR, Sonstegard TS, Schnabel RD, Taylor JF, et al. Invited review: Reliability of genomic predictions for North American Holstein bulls. J Dairy Sci. 2009;92:16–24.

  14. Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J Dairy Sci. 2010;93:743–52.

    CAS  Article  Google Scholar 

  15.

    Christensen OF, Lund MS. Genomic prediction when some animals are not genotyped. Genet Sel Evol. 2010;42:2.

  16.

    Misztal I, Aggrey SE, Muir WM. Experiences with a single-step genome evaluation. Poult Sci. 2013;92:2530–4.

  17.

    Legarra A, Christensen OF, Aguilar I, Misztal I. Single step, a general approach for genomic selection. Livest Sci. 2014;166:54–65.

  18.

    Sullivan PG. Mendelian sampling variance tests with genomic preselection. Interbull Bull. 2018;54:1–4.

  19.

    Tyrisevä A-M, Mäntysaari EA, Jakobsen J, Aamand GP, Dürr J, Fikse WF, et al. Detection of evaluation bias caused by genomic preselection. J Dairy Sci. 2018;101:3155–63.

  20.

    Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975;31:423–47.

  21.

    Pollak EJ, van der Werf J, Quaas RL. Selection bias and multiple trait evaluation. J Dairy Sci. 1984;67:1590–5.

  22.

    Patry C, Ducrocq V. Accounting for genomic pre-selection in national BLUP evaluations in dairy cattle. Genet Sel Evol. 2011;43:30.

  23.

    Shabalina T, Pimentel ECG, Edel C, Plieschke L, Emmerling R, Götz K-U. Short communication: the role of genotypes from animals without phenotypes in single-step genomic evaluations. J Dairy Sci. 2017;100:8277–81.

  24.

    Koivula M, Strandén I, Aamand GP, Mäntysaari EA. Reducing bias in the dairy cattle single-step genomic evaluation by ignoring bulls without progeny. J Anim Breed Genet. 2018;00:1–9.

  25.

    Sargolzaei M, Schenkel FS. QMSim: a large-scale genome simulator for livestock. Bioinformatics. 2009;25:680–1.

  26.

    Sonesson AK, Meuwissen THE. Mating schemes for optimum contribution selection with constrained rates of inbreeding. Genet Sel Evol. 2000;32:231–48.

  27.

    ten Napel J, Vandenplas J, Lidauer M, Stranden I, Taskinen M, Mäntysaari E, et al. MiXBLUP: a user-friendly software for large genetic evaluation systems. 2017. https://mixblup.eu/download.html. Accessed 06 August 2019.

  28.

    Tsuruta S, Lourenco DAL, Misztal I, Lawlor TJ. Possible causes of inflation in genomic evaluations for dairy cattle. In Proceedings of the 11th World Congress on Genetics Applied to Livestock Production: 11-16 February 2018; Auckland. 2018. p. 1–6.

  29.

    Meuwissen THE, Luo Z. Computing inbreeding coefficients in large populations. Genet Sel Evol. 1992;24:305–13.

  30.

    Powell JE, Visscher PM, Goddard ME. Reconciling the analysis of IBD and IBS in complex trait studies. Nat Rev Genet. 2010;11:800–5.

  31.

    VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.

  32.

    Gilmour AR, Gogel BJ, Cullis BR, Thompson R. ASReml user guide release 3.0. Hemel Hempstead: VSN Int. Ltd. 2009. p. 275.

  33.

    Knol EF, Nielsen B, Knap PW. Genomic selection in commercial pig breeding. Anim Front. 2016;6:15–22.

  34.

    Bulmer MG. The effect of selection on genetic variability. Am Nat. 1971;105:201–11.

  35.

    Gomez-Raya L, Burnside EB. The effect of repeated cycles of selection on genetic variance, heritability, and response. Theor Appl Genet. 1990;79:568–74.

  36.

    Martinez V, Kause A, Mäntysaari E, Mäki-Tanila A. The use of alternative breeding schemes to enhance genetic improvement in rainbow trout: II. Two-stage selection. Aquaculture. 2006;254:195–202.

  37.

    Campo JL, de la Fuente MB. Efficiency of two-stage selection indices in Tribolium. J Hered. 1991;82:228–32.

  38.

    Xu S, Martin TG, Muir WM. Multistage selection for maximum economic return with an application to beef cattle breeding. J Anim Sci. 1995;73:669–710.

Acknowledgements

The authors thank Marco Bink, Katrijn Peters, Jeroen Visscher and Abe Huisman from Hendrix Genetics; Egbert Knol, Rob Bergsma and Egiel Hanenberg from Topigs Norsvin; John Henshall and Randy Borg from Cobb; and André M. Hidalgo, Chris Schrooten, and Gerben de Jong from CRV for their inputs towards the data simulation and implementation of preselection.

Funding

This study was financially supported by the Dutch Ministry of Economic Affairs (TKI Agri & Food project 16022) and the Breed4Food partners Cobb Europe, CRV, Hendrix Genetics and Topigs Norsvin. The use of the HPC cluster was made possible by CAT-AgroFood (Shared Research Facilities Wageningen UR).

Author information

Contributions

All authors participated in the conception and design of the study, and in the simulation and analysis of the dataset. IJ simulated and analysed the dataset and wrote the first draft of the manuscript, and the other authors revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ibrahim Jibrila.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1:

QMSim parameter file. The QMSim parameter file used to simulate the data used in this study.

Additional file 2:

MiXBLUP instruction file. The MiXBLUP instruction file used for data analysis in this study.

Additional file 3:

Accuracy of the no preselection (control) scenario in subsequent ssGBLUP evaluations, calculated across different animals. Accuracies of the no preselection (control) scenario in subsequent ssGBLUP evaluations, calculated across the different sets of selection candidates preselected by each preselection scenario.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Jibrila, I., ten Napel, J., Vandenplas, J. et al. Investigating the impact of preselection on subsequent single-step genomic BLUP evaluation of preselected animals. Genet Sel Evol 52, 42 (2020). https://doi.org/10.1186/s12711-020-00562-6
