Strategies for implementing genomic selection in family-based aquaculture breeding schemes: double haploid sib test populations
© Nirea et al.; licensee BioMed Central Ltd. 2012
Received: 25 April 2012
Accepted: 22 October 2012
Published: 30 October 2012
Simulation studies have shown that accuracy and genetic gain are increased in genomic selection schemes compared to traditional aquaculture sib-based schemes. In genomic selection, accuracy of selection can be maximized by increasing the precision of the estimation of SNP effects and by maximizing the relationships between test sibs and candidate sibs. Another means of increasing the accuracy of the estimation of SNP effects is to create individuals in the test population with extreme genotypes. The latter approach was studied here with creation of double haploids and use of non-random mating designs.
Six alternative breeding schemes were simulated in which the design of the test population was varied: test sibs inherited maternal (Mat), paternal (Pat) or a mixture of maternal and paternal (MatPat) double haploid genomes or test sibs were obtained by maximum coancestry mating (MaxC), minimum coancestry mating (MinC), or random (RAND) mating. Three thousand test sibs and 3000 candidate sibs were genotyped. The test sibs were recorded for a trait that could not be measured on the candidates and were used to estimate SNP effects. Selection was done by truncation on genome-wide estimated breeding values and 100 individuals were selected as parents each generation, equally divided between both sexes.
Results showed a 7 to 19% increase in selection accuracy and a 6 to 22% increase in genetic gain in the MatPat scheme compared to the RAND scheme. These increases were greater with lower heritabilities. Among all other scenarios, i.e. Mat, Pat, MaxC, and MinC, no substantial differences in selection accuracy and genetic gain were observed.
In conclusion, a test population designed with a mixture of paternal and maternal double haploids, i.e. the MatPat scheme, increases substantially the accuracy of selection and genetic gain. This will be particularly interesting for traits that cannot be recorded on the selection candidates and require the use of sib tests, such as disease resistance and meat quality.
In traditional aquaculture breeding schemes, selection for traits that cannot be measured on the selection candidates (e.g. disease resistance and fillet quality) is based on a performance test of sibs of the candidates, i.e. information on test sibs is used to calculate breeding values for the selection of parents. This is due to the fact that measuring meat quality traits requires killing of the fish and fish that have been challenge-tested for disease resistance cannot be used as breeding stock. However, with a sib test, only 50% of the total genetic variance of the candidates is exploited, perhaps less. Recently, with the advent of high-throughput genotyping of genetic markers, genomic selection has been taken up by animal breeders. With genomic selection, the total genetic value of the selection candidates is predicted based on the simultaneous estimation of single nucleotide polymorphism (SNP) effects using a set of individuals that have been genotyped and phenotyped. Compared to traditional genetic evaluation methodologies and marker-assisted selection, genomic selection can result in an increase in the accuracy of selection for an individual without a phenotype provided that the test set is sufficiently large and relevant to the selected population. Genomic selection is increasingly used in dairy cattle breeding[3–5] and in plant breeding[6, 7] but is not yet used in selective breeding in aquaculture.
Simulation studies have examined possible strategies to implement genomic selection in family-based aquaculture breeding schemes, generally following two stages: first, SNP marker effects are estimated in a test population consisting of sibs of the selection candidates and second, genome-wide breeding values of the genotyped selection candidates are estimated by summing up the estimated SNP marker effects. The benefits reported from these strategies are promising. In one study, in which sibs were performance-tested every generation, the accuracy of the estimated breeding values of selection candidates increased, which increased genetic gain because genetic gain is directly related to the accuracy of selection. In addition, genomic selection more than halved the rate of inbreeding compared to traditional BLUP selection using similar resources. A major contributor to these results is the accurate estimation of the within-family variance with genomic selection. Other studies have supported these findings.
Two factors are important for maximizing the accuracy of genomic breeding values: (1) increasing the precision of estimates of SNP marker effects in the test population; this can be achieved by increasing the number of animals in the test population and by increasing the number of SNP markers sufficiently to capture the genetic variance throughout the genome[10, 11]; (2) maximising the relationship between individuals in the test and candidate populations.
Another means of increasing the accuracy of estimates of SNP effects is to have the test population consist of extreme genotypes. This approach has been exploited with the use of double haploids for QTL mapping in fish. Double haploids are homozygous for all loci and thus achieve in a single generation more homozygosity than 10 generations of continuous full-sib mating. Double haploids are produced by chromosome manipulation techniques such as gynogenesis and androgenesis, which produce female and male double haploids, respectively. With both these techniques, the duplicated chromosomes can be combined either before (mitotic) or after recombination (meiotic). Although the availability of double haploids is a major advantage in fish breeding, a number of drawbacks have also been reported, including technological challenges, costs of implementation, and the low viability of the progeny due to inbreeding depression. An alternative approach to increasing homozygosity is to use non-random mating designs such as maximum coancestry mating.
Based on these considerations, it is hypothesized that designing a test population using double haploids or non-random mating can increase the accuracy of estimates of SNP effects in test sibs, which in turn will increase the accuracy of predicted breeding values when applied in genomic selection schemes. Given the reliance of many aquaculture schemes on sib testing, this hypothesis was tested by simulating a typical breeding scheme in fish.
Simulation of populations was carried out in two steps: (1) to create base populations (G0) with a set of genomic data and (2) to simulate breeding schemes derived from these base populations. Details are presented in the following section.
Simulation of the base population (generation G0)
We simulated a Fisher-Wright population with an effective population size of 1000 (500 males and 500 females) for 4000 generations to construct the base population G0. Four thousand generations has been shown to be sufficient to achieve mutation-drift equilibrium and stationary distributions of pair-wise linkage disequilibrium. Within each of these generations, 500 males and 500 females were produced by random selection and mating of a sire and dam, with replacement after each mating.
A diploid genome with 10 chromosomes of 1 Morgan (M) each was simulated. SNP mutations and recombinations were introduced every generation at a rate of 10-8 per base pair per meiosis, assuming 108 base pairs per M, which is close to the infinite sites mutation model. SNP were passed from parent to offspring following Mendelian inheritance and recombination followed the Haldane mapping function.
After 4000 generations, the G0 generation was created with Nm = 3000 and Nf = 3000 offspring obtained from the random mating of n s = 50 sires and n d = 50 dams with replacement. To obtain a reliable result, the base population was replicated 100 times. For each base population, six different breeding schemes were run. The average of these replicates was used for comparison of the breeding schemes. Quantitative SNP effects were simulated to attribute breeding values to each individual for the trait evaluated in the sib test, as described in the following section.
Simulation of generations for breeding schemes
Males and females from each G0 family were equally divided to create a test population and a candidate population, each with 3000 individuals. The phenotypes and genotypes of the test population were then evaluated using genomic evaluation techniques to estimate the SNP effects. Genomic estimated breeding values (EBV) for the candidates were computed based on their genotypes and these estimates. The best n s = 50 male and n d = 50 female candidates were selected to be sires and dams based on the EBV and randomly mated in pairs with n o = 120 offspring/pair to produce generation G1 with 6000 offspring. Generation G0 also acted as the base population for pedigree numerator relationships.
In G1, the procedure for G0 was repeated up to the point of mating. The offspring were divided into test and candidate populations. Genomic evaluation was carried out using the phenotypes of the test population to estimate SNP effects; the best n s = 50 male and n d = 50 female candidates were then selected. In the genomic evaluation of G1, the accumulated data from both G0 and G1 test populations were used to estimate SNP effects.
The selected males and females in G1 were then used to produce a candidate population and a test population in G2. The mating of the G2 individuals defined the different breeding schemes of the study. For all schemes, the 3000 individuals from the G2 candidate population were created by pair-wise random mating among the n s = 50 sires and n d = 50 dams. However, individuals in the test population were created either as diploids following random mating, or following minimum or maximum coancestry mating, or as double haploids as described in the following section. Finally, in G2, SNP effects were re-estimated using all the accumulated test data from G0, G1, and G2. Then, the genomic EBV of the candidates were calculated and comparisons among the different breeding schemes were made.
Deriving alternative test populations in G2
Two types of approaches were adopted to design the test population in G2: a diploid approach in which mating among G1 parents was managed, and a double haploid approach following the random mating among the G1 parents.
Three diploid approaches were simulated as follows:
Random mating (RAND)
Mating pairs were chosen from the n s and n d selected parents using random sampling without replacement to produce the G2 test population. Mating pairs were re-sampled from the same sets of parents to produce the G2 candidate population. This was done to allow fair comparisons with the assortative mating schemes described below.
Maximum coancestry (MaxC)
Using the selected males and females, mating pairs were chosen to maximize the average coancestry of the mates based on pedigree. First, a matrix of coancestries for all possible matings was constructed. Starting from an initial set of mating pairs, two pairs were chosen at random and their mates swapped. If this resulted in an increase in average coancestry, the swap was accepted, otherwise it was rejected. This was repeated until no further improvement was obtained.
Minimum coancestry (MinC)
Minimum coancestry mating was the same as maximum coancestry mating, except that mating pairs were designed to minimize average coancestry among the set of mates as in.
Three double haploid approaches were simulated. For all three approaches, the parents and mating pairs that created the G2 test and candidate populations were the same randomly selected group. In MatPat, 1500 progeny each were created by mitotic androgenesis from the G1 male parents, and by mitotic gynogenesis from the G1 female parents. Each parent produced 30 double haploid offspring. In Pat, all 3000 progeny came from the 50 male G1 parents by mitotic androgenesis, with 60 offspring per parent and no contribution from the female G1 parents. In Mat, G1 female parents contributed all the offspring by mitotic gynogenesis in a similar fashion as Pat.
Simulation of markers, quantitative trait loci and true breeding values
where z ijk is the number of copies of allele k (k = 0, 1 or 2) at the j th QTL locus of individual i and g jk is the sampled effect. Allelic effects were then scaled to set the total genetic variance (σ A 2) observed in G0 equal to 10.
Phenotypes with a heritability of 0.05, 0.1 or 0.4 were created by sampling environmental deviations from appropriate normal distributions and added these to the true breeding value. The m = 5000 SNP loci with the highest MAF, excluding those that had been selected to be QTL, were selected as markers and the test and candidate sibs were genotyped for these m markers.
Estimation of SNP effects
where p j is the frequency of allele “1” at the j th marker locus and xij 1* is the number of “1” alleles carried by individual i.
Standardization of SNP covariates x ij for the candidates was carried out using the same values of p j as used for the test animals, i.e. the estimates of frequencies were obtained using both candidate and test sets.
Outputs of the base populations obtained from the simulation of the founder ancestors were stored. Each replicate had its own base population. Averages of 100 replicates of each breeding scheme with different scenarios were compared in terms of inbreeding, genetic gain, accuracy of selection and variance reduction generated in G2. Inbreeding coefficients were computed using G0 as the base population.
Trend in genetic parameters
Genetic parameters in generation G2
Accuracy of selection
Accuracy of estimated breeding values for the candidate population in G2
h2 = 0.05
h2 = 0.1
h2 = 0.4
Double haploid scheme
Genetic gain generated in G2 in the candidate population
h2 = 0.05
h2 = 0.1
h2 = 0.4
Double haploid scheme
Level of inbreeding
Level of inbreeding generated in G2 in the candidate population
h2 = 0.05
h2 = 0.1
h2 = 0.4
Double haploid scheme
Fraction of genetic variance retained in G2 in the candidate
h2 = 0.05
h2 = 0.1
h2 = 0.4
Double haploid scheme
It has been reported that genomic selection in aquaculture breeding schemes can increase selection accuracy for candidates and increase genetic gain compared to traditional aquaculture sib testing schemes[8, 9, 19]. Our study shows that, when using genomic selection, creating double haploids as part of the process of sib testing can increase the selection accuracy of candidates and the genetic gain even more. These additional increases in accuracy and genetic gain were most dramatic with a low heritability (~22%) but were still substantial when heritability was 0.4 (~7%). However, this result was only obtained when both sexes were used to create double haploids for testing. When only one sex is double haploid, selection accuracy was reduced because only chromosome sets from the dam (sire) entered the test population and this was not offset by increasing the number of observations per chromosome set.
In this study, attempting to increase homozygosity above that obtained from random mating through maximum coancestry mating when breeding the test population had no detectable impact on genetic gain or inbreeding. This is because assortative mating was only done for one generation, which is unlikely to produce extreme genotypes. If breeding of the test population with the MaxC scheme had been continued for 10 generations or more, similar results to those obtained with the double haploids scheme would have been observed because it takes approximately 10 generations of continuous full sib mating to produce fully inbred lines. Clearly one generation of either MaxC or MinC is insufficient to deliver any benefit.
The increase in selection accuracy of the candidate population is due to the increase in accuracy of the estimates of SNP effects in the test population because the double haploids have more extreme genotypic values. It is important to note that the study design used here permits this inference because the non-random mating structure in the test population was not replicated in the candidate breeding population, which was always bred at random. Therefore, the increased predictive accuracy was due to the test design and not the results of differences in family structure amongst the candidates. For example, if MinC mating had been implemented in the candidate population, say for five generations, to improve the family structure, genetic effects would eventually have been better estimated and a substantial increase in selection accuracy might have occurred.
The benefit of the improved accuracy from generating extreme genotypes in the test population does come at a cost in robustness to the underlying genetic architecture. The design provides an estimate of a, i.e. half the difference between the genetic value of homozygotes. In this study, the allelic effects were simulated to be additive, so the estimates of an allelic substitution were not biased by the absence of heterozygotes. However, if the dominance deviation (d) is not equal to zero, the average effects obtained from homozygotes are biased by d(1-2p) where p is the minor allele frequency. This bias increases as p reduces. Most QTL are expected to have a low minor allele frequency, potentiating the bias. Presence of epistatic gene actions are expected to result in similar biases from estimations based on homozygotes only.
The advantage obtained with the MatPat scheme was achieved at a cost of reduced genetic variation in G2 compared to other schemes. However, the results show that this reduction in genetic variation was mainly generated by additional linkage disequilibrium created by the higher accuracy obtained with the MatPat scheme. First, inbreeding accounted for only a loss of 2% of the genetic variance and any difference in inbreeding between MatPat with the other schemes was not substantial. Under the infinitesimal model, the loss of genetic variance due to linkage disequilibrium would be ½ρ2 where ρ is the accuracy of selection. Thus with ρ = 0.6, predicted loss would be 18%. Using (1-½ρ2)(1-F) predicted losses in genetic variance observed in Table4 from the accuracies of selection presented in Table1 gives a very close approximation. Experience with the infinitesimal model has demonstrated that increasing selection intensity results in greater short and medium term genetic gain even though this also increases the linkage disequilibrium. However, this is not the case when selection is on a known or marked QTL along with estimates of unlinked polygenic effects, for which slowing fixation of the QTL has been shown to result in increased long-term gain. The existence of a trade-off between accuracy and long-term gain that is independent of inbreeding (genomic or pedigree) has not been reported to date.
There are two methodological approaches to produce double haploids, either mitotic or meiotic. In this study, mitotic gynogenesis and mitotic androgenesis were used. There are good reasons to expect that the extra genetic gains would have been somewhat less with meiotic gynogenesis and meiotic androgenesis because these technologies result in fish that are less homozygous than their mitotic counterparts, i.e. only a subset of their genotypes are homozygous.
Where A21 is the relationship between sib test individuals and one of the selection candidates, and P is a phenotypic covariance matrix for the sib test individuals, and V is the variance of relationships between the test sibs.
This shows the expected reliability (= squared accuracy) of genomic selection is approximately equal to the reliability of traditional EBV plus a term that increases with the variances of the deviations of genomic relationships from their expectations in the test population. The implications of this formula go further in defining conditions that maximize the expected reliability of genomic selection: (1) the training animals should be as little related as possible, which makes P-1 large; (2) the training animals should be as much related to the selection candidates as possible, which makes A21 large; and (3) deviations of the genomic relationships from the traditional relationships should be as large as possible, resulting in large V. Points (1) and (2) were also observed by. Double haploids have the same expected relationship with the candidates as diploid training animals (½), but the variance of their genomic relationship with the candidates is increased due to inbreeding. Thus, the increase in accuracy of genomic selection when using double haploids can be explained in two ways: (1) the more extreme regression factors in (Equation 2) allow SNP effects to be more accurately estimated, or (2) their more variable relationships with the candidates can be used by selection indices and BLUP to increase the accuracy of the EBV of the candidates.
Here, GS-BLUP was used for genetic evaluation. Similar outcomes in terms of increases in accuracy and genetic gain from double haploids and non-random mating are expected with other genomic evaluation methods that are currently used. For example, the BayesB method concentrates on certain important regions of the genome. Double haploid individuals are also double haploids for these regions of the genome. Therefore, the variance of genomic relationships between individuals in the test and candidate individuals also increases at these regions of the genome and thus delivers a more accurate estimation of SNP effects than the test population produced by random mating.
It is expected that the use of double haploid sib test populations increases the accuracy of genomic selection for any candidate population because use of double haploids achieves a more accurate estimation of the SNP effects in the test population by increasing the variance of the genomic relationships between the test and candidate population. This is expected to hold for any candidate populations. However, the selection candidates should not be completely different from the test population
Relevance of the study
This study shows the benefits of using double haploids as test sibs in aquaculture genomic selection breeding schemes. An increase of 7 to 19% in selection accuracy, leading to a 6 to 22% increase in genetic gain was obtained. This resulted from more accurate estimation of SNP effects and required a mixture of paternal and maternal double haploid test sibs in combination with genomic selection. Increases in accuracy and genetic gain from use of double haploids were greater with lower heritability levels. Therefore, the outputs of this study can be used to increase genetic gain for difficult traits such as disease resistance and meat quality, which cannot be recorded on selection candidates. In practical applications, eggs may be collected from the nucleus breeding parents and divided in two parts and 50% would be fertilized in a natural way, forming the candidate population, and the rest further divided into one half that is submitted to gynogenesis and the other half to androgenesis. This would result in a mixture of maternal and paternal double haploid genome fishes from each family in the test population.
However, there are practical problems to overcome for the implementation of double haploids in aquaculture breeding schemes, including biases from non-additive gene effects costs and other practical constraints. There may also be ethical and regulatory issues related to animal welfare associated with the use of chromosome manipulation techniques.
This study has shown that the use of double haploids that produce inbred test sibs for estimation of SNP effects significantly increased selection accuracy and genetic gain. This required a mixture of test sibs with maternal and paternal double haploid genomes. The approach yielded increases in selection accuracy of up to 19% and in genetic gain of up to 22%. The double haploid technique produces inbred fish in one generation, which increased the accuracy of the estimation of SNP effects. Another strategy, in which the test population was designed based on non-random mating such as maximum coancestry mating, hardly improved selection accuracy. Finally, this study demonstrated the benefit of using distinct designs for the testing versus the candidate population.
Genetic relationships and accuracy of selection
assuming that A21 and δ are independent.
This study was funded by grant 190442/S40 from the Research Council of Norway.
- Meuwissen THE, Hayes BJ, Goddard ME: Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001, 157: 1819-1829.PubMed CentralPubMed
- VanRaden PM, Van Tassell CP, Wiggans GR, Sonstegard TS, Schnabel RD, Taylor JF, Schenkel FS: Invited review: reliability of genomic predictions for North American Holstein bulls. J Dairy Sci. 2009, 92: 16-24. 10.3168/jds.2008-1514.View ArticlePubMed
- Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME: Invited review. Genomic selection in dairy cattle: progress and challenges. J Dairy Sci. 2009, 92: 433-443. 10.3168/jds.2008-1646.View ArticlePubMed
- Goddard ME, Hayes BJ, Meuwissen THE: Genomic selection in farm animal species - Lessons learnt and future perspectives. Proceedings of the 9th World Congress on Genetics Applied to livestock. 2010, Leipzig,http://www.kongressband.de/wcgalp2010/assets/pdf/0701.pdf,
- Habier D: More than a third of the WCGALP presentations on genomic selection. J Anim Breed Genet. 2010, 127: 336-337. 10.1111/j.1439-0388.2010.00897.x.View ArticlePubMed
- Heffner EL, Sorrells ME, Jannink JL: Genomic selection for crop improvement. Crop Sci. 2009, 49: 1-12. 10.2135/cropsci2008.08.0512.View Article
- Jannink JL, Lorenz AJ, Iwata H: Genomic selection in plant breeding: from theory to practice. Brief Funct Genomics. 2010, 9: 166-177. 10.1093/bfgp/elq001.View ArticlePubMed
- Sonesson AK, Meuwissen THE: Testing strategies for genomic selection in aquaculture breeding programs. Genet Sel Evol. 2009, 41: 37-10.1186/1297-9686-41-37.PubMed CentralView ArticlePubMed
- Nielsen HM, Sonesson AK, Yazdi H, Meuwissen THE: Comparison of accuracy of genome-wide and BLUP breeding value estimates in sib based aquaculture breeding schemes. Aquaculture. 2009, 289: 259-264. 10.1016/j.aquaculture.2009.01.027.View Article
- Daetwyler HD, Villanueva B, Woolliams JA: Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS One. 2008, 3: e3395-10.1371/journal.pone.0003395.PubMed CentralView ArticlePubMed
- Goddard M: Genomic selection: prediction of accuracy and maximisation of long term response. Genetica. 2009, 136: 245-257. 10.1007/s10709-008-9308-0.View ArticlePubMed
- Pszczola M, Strabel T, Mulder HA, Calus MPL: Reliability of direct genomic values for animals with different relationships within and to the reference population. J Dairy sci. 2012, 95: 389-400. 10.3168/jds.2011-4338.View ArticlePubMed
- Martinez VA, Hill WG, Knott SA: On the use of double haploids for detecting QTL in outbred populations. Heredity. 2002, 88: 423-431. 10.1038/sj.hdy.6800073.View ArticlePubMed
- Wright S: Systems of mating. II. The effects of inbreeding on the genetic composition of a population. Genetics. 1921, 6: 124-143.PubMed CentralPubMed
- Komen H, Thorgaard GH: Androgenesis, gynogenesis and the production of clones in fishes: a review. Aquaculture. 2007, 269: 150-173. 10.1016/j.aquaculture.2007.05.009.View Article
- Kimura M: The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics. 1969, 61: 893-903.PubMed CentralPubMed
- Haldane JBS: The combination of linkage values, and the calculation of distances between the loci of linked factors. Genetics. 1919, 8: 299-309.
- Sonesson AK, Meuwissen THE: Mating schemes for optimum contribution selection with constrained rates of inbreeding. Genet Sel Evol. 2000, 32: 231-248. 10.1186/1297-9686-32-3-231.PubMed CentralView ArticlePubMed
- Villanueva B, Fernández J, García-Cortés LA, Varona L, Daetwyler HD, Toro MA: Accuracy of genome-wide evaluation for disease resistance in aquaculture breeding programs. J Anim Sci. 2011, 89: 3433-3442. 10.2527/jas.2010-3814.View ArticlePubMed
- Nirea KG, Sonesson AK, Woolliams JA, Meuwissen TH: Effect of non-random mating on genomic and BLUP selection schemes. Genet Sel Evol. 2012, 44: 11-10.1186/1297-9686-44-11.PubMed CentralView ArticlePubMed
- Bulmer MG: The effect of selection on genetic variability. Amer Nat. 1971, 105: 201-211. 10.1086/282718.View Article
- Gibson JP: Short term gain at the expense of long-term response with selection on identified loci. Proceedings of the 5th world congress on genetics applied to livestock production. 1994, University of Guelph, Guelph, 201-204.
- Galton F: Regression towards mediocrity in hereditary stature. J Anthropol Inst. 1886, 15: 246-263.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.