Prediction of expected genetic variation within groups of offspring for innovative mating schemes
© Segelke et al.; licensee BioMed Central Ltd. 2014
Received: 27 September 2013
Accepted: 24 June 2014
Published: 2 July 2014
Experience from progeny-testing indicates that the mating of popular bull sires that have high estimated breeding values with excellent dams does not guarantee the production of offspring with superior breeding values. This is explained partly by differences in the standard deviation of gamete breeding values (SDGBV) between animals at the haplotype level. The SDGBV depends on the variance of the true effects of single nucleotide polymorphisms (SNPs) and the degree of heterozygosity. Haplotypes of 58 035 Holstein animals were used to predict and investigate expected SDGBV for fat yield, protein yield, somatic cell score and the direct genetic effect for stillbirth.
Differences in SDGBV between animals were detected, which means that the groups of offspring of parents with low SDGBV will be more homogeneous than those of parents with high SDGBV, although the expected mean breeding values of the progeny will be the same. SDGBV was negatively correlated with genomic and pedigree inbreeding coefficients and a small loss of SDGBV over time was observed. Sires that had relatively low mean gamete breeding values but high SDGBV had a higher probability of producing extremely positive offspring than sires that had a high mean gamete breeding value and low SDGBV.
An animal’s SDGBV can be estimated based on genomic information and used to design specific genomic mating plans. Estimated SDGBV are an additional tool for mating programs, which allows breeders to identify and match mating partners using specific haplotype information.
In recent years, dairy cattle breeding schemes have changed drastically with the availability of routine dense single nucleotide polymorphism (SNP) chips. Initially, research focused mainly on estimation of genomic breeding values[1–3] and, more recently, on imputation from low-density to denser marker sets[4–6]. Besides genomic breeding values, other information can be derived from dense marker data, for example for parentage verification. VanRaden et al. identified haplotypes with lethal genetic effects that may lead to embryonic death in the homozygous state. Moreover, genetic characteristics such as horn status can be predicted from routine SNP information.
In addition, genotyping large numbers of animals with dense SNP panels makes it possible to characterize genetic variation at the chromosome and haplotype levels[10, 11]. Consequently, SNP haplotype information can be used to estimate the expected variance of breeding values at the gamete level. Variation between gametes is generated by random sampling of parental haplotypes during meiosis if the dam and/or the sire are heterozygous.
Knowledge of the mean (MGBV) and standard deviation of gamete breeding values (SDGBV), assuming normally distributed estimated breeding values, allows the development of specific mating plans. For example, the probability that the breeding value of an offspring exceeds a certain threshold can be estimated. In addition, it is possible to predict the number of animals that must be tested to produce an offspring with an estimated breeding value above a given threshold. Cole and VanRaden discussed the possibility of selecting animals for which gamete breeding values vary little, in order to produce more homogeneous progeny and simplify herd management. Conversely, breeding companies may be more interested in heterogeneous progeny to increase the probability of extremely positive offspring. In line with this, experience with progeny-testing indicates that the use of popular sires with high estimated breeding values and many tested offspring does not guarantee that male offspring with superior breeding values are produced. In contrast, bulls for which fewer male offspring are tested sometimes produce more excellent offspring than popular bulls.
The objective of this study was to predict and investigate the expected SDGBV using genomic information and to demonstrate its usefulness to improve mating decisions.
A total of 58 035 Holstein animals genotyped with the Illumina BovineSNP50 BeadChip (Illumina Inc., San Diego, CA, USA) obtained from routine genomic evaluation for German Holsteins (February 2013) were chosen for the study. Of the 50 k SNPs on this chip, 43 586 autosomal SNPs that had a minor allele frequency greater than 1% were selected. The algorithm reported by Hayes was used to check whether genotype information agreed with the pedigree information. Only genotypes with a call rate greater than 98% were used. The software package Beagle (version 3.3) with default settings was used for imputation of missing marker genotypes and for phasing the genotypes. For this purpose, Beagle uses linkage disequilibrium at the population level. The order of the SNPs on the chromosomes was based on the UMD3.1 bovine genome assembly.
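The two quality thresholds described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the 0/1/2 genotype coding, the use of None for missing calls, the order in which the two filters are applied and all function names are assumptions.

```python
def call_rate(animal_row):
    """Share of markers with a called genotype for one animal."""
    return sum(g is not None for g in animal_row) / len(animal_row)


def minor_allele_frequency(snp_column):
    """MAF of one SNP; genotypes are coded 0/1/2 copies of one allele."""
    called = [g for g in snp_column if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1.0 - p)


def quality_filter(genotypes, min_maf=0.01, min_call_rate=0.98):
    """Return the retained animals (rows) and the retained SNP indices.

    Animals are filtered on call rate first, then SNPs on MAF within
    the retained animals (this order is an assumption).
    """
    kept_animals = [row for row in genotypes if call_rate(row) > min_call_rate]
    n_snps = len(genotypes[0])
    kept_snps = [
        k for k in range(n_snps)
        if minor_allele_frequency([row[k] for row in kept_animals]) > min_maf
    ]
    return kept_animals, kept_snps
```

With three animals of which one has a missing call, the third animal is dropped and only the SNPs that still segregate among the remaining animals are kept.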
Four traits (fat yield, protein yield, somatic cell score and the direct genetic effect for stillbirth) with different genetic architectures, heritabilities and genomic reliabilities were chosen. SNP effects were estimated with a BLUP model assuming trait-specific residual polygenic variance (for more details on the model see).
Pedigree and genomic relationships
The pedigree contained 58 035 genotyped animals (15 816 females and 42 219 males) and their 136 477 ancestors. All sires and dams of the genotyped animals were known. The animals were born between 1960 and 2013 and descended from 2768 different sires and 32 416 different dams. Genomic inbreeding coefficients were calculated by setting up the diagonal elements of the genomic relationship matrix, as suggested by VanRaden. Allele frequencies in the base population were estimated using the gene content method described by Gengler et al.
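As an illustration of how an inbreeding coefficient can be read from the diagonal of a genomic relationship matrix, the sketch below uses the widely used first method of VanRaden, in which the diagonal element is the sum of squared centred genotypes divided by 2Σp(1−p), and the inbreeding coefficient is that element minus 1. Whether exactly this variant was used here is an assumption, as are the 0/1/2 genotype coding and the function name.

```python
def genomic_inbreeding(genotype, base_freqs):
    """Genomic inbreeding coefficient F_G = G_ii - 1 for one animal.

    genotype   -- allele counts (0, 1 or 2) per SNP
    base_freqs -- base-population frequency of the counted allele per SNP
    """
    # Numerator: squared genotypes after centring by twice the allele frequency
    numerator = sum((g - 2.0 * p) ** 2 for g, p in zip(genotype, base_freqs))
    # Denominator: expected variance of the genotypes summed over SNPs
    denominator = 2.0 * sum(p * (1.0 - p) for p in base_freqs)
    return numerator / denominator - 1.0
```

For base frequencies of 0.5, an animal that is heterozygous at every SNP obtains a value of −1 and an animal that is homozygous at every SNP obtains a value of +1, which shows how the coefficient tracks homozygosity.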
[Figure: Flow of information]
Prediction of mean and standard deviation of gamete breeding values
MGBV and SDGBV were obtained by sampling different sets of transmitted haplotypes from the animals. In theory, with 29 autosomal chromosomes and ignoring the sex chromosome, there are 2^29 (about 537 million) possible combinations of sampled haplotypes if the length of a haplotype is defined as one autosome and recombination is ignored. Assuming that, on average, one recombination occurs per Morgan, there is a near unlimited number of possible combinations of haplotypes. Thus, to make the simulation computationally feasible and to reduce the number of haplotype combinations, the genome was divided into 1856 chromosome segments (C) according to positions in the genome where a high number of recombination events occurred. These recombination events were identified in a preliminary study (results not shown here) in which a whole genome map of the number of crossing-over events was derived by identifying phase switches between the haplotypes of the sires and the paternal haplotypes of their sons.
In the first step, the breeding value of each haplotype was calculated as

$$h_{ij} = \sum_{k=1}^{n} z_{k}\,\alpha_{k}$$

where hij is the ith haplotype, with j the indicator of maternal or paternal haplotype, z is the maternal or paternal allele of marker k, αk is half of the estimated effect of the kth SNP from routine genomic evaluation of German Holstein cattle, and n is the number of SNPs belonging to the ith haplotype. Imprinting, dominance and epistasis were not considered in the simulation. In the second step, using the program genvar.f90, 100 000 possible gametes were simulated by selecting either the maternal or paternal phase from an animal. At the beginning of the chromosome, the probability of selecting the maternal or paternal strand was equal to 50%. Location of cross-overs was implemented in the simulation based on a uniform distribution over the interval [0,C] (C being the number of chromosome segments). The mean recombination rate between the haplotype strands was set to 0.3, which is in line with the number of expected recombinations assuming one recombination per Morgan.
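The two steps can be sketched in a few lines of Python. This is a simplified illustration and not a port of genvar.f90: the 0/1 allele coding, the representation of segments as index ranges, and the per-boundary strand-switch probability `switch_prob` (used here in place of the crossover placement described above) are all assumptions, as are the function names.

```python
import random
import statistics


def segment_values(alleles, half_effects, segments):
    """Haplotype breeding value per segment for one strand.

    alleles      -- 0/1 allele codes along the strand (coding is an assumption)
    half_effects -- half of the estimated SNP effect for each marker
    segments     -- (start, end) marker index ranges, end exclusive
    """
    return [
        sum(alleles[k] * half_effects[k] for k in range(start, end))
        for start, end in segments
    ]


def simulate_gametes(h_mat, h_pat, chrom_of_segment,
                     n_gametes=100_000, switch_prob=0.01, seed=1):
    """Sample gamete breeding values from the segment values of both strands."""
    rng = random.Random(seed)
    gametes = []
    for _ in range(n_gametes):
        total, strand, previous_chrom = 0.0, 0, None
        for i, chrom in enumerate(chrom_of_segment):
            if chrom != previous_chrom:
                strand = rng.randrange(2)      # new chromosome: 50/50 start
            elif rng.random() < switch_prob:
                strand = 1 - strand            # crossover: switch strand
            total += h_pat[i] if strand else h_mat[i]
            previous_chrom = chrom
        gametes.append(total)
    return gametes


def mgbv_sdgbv(gametes):
    """Mean and standard deviation of the simulated gamete breeding values."""
    return statistics.fmean(gametes), statistics.stdev(gametes)
```

An animal whose two strands carry identical segment values yields an SDGBV of zero, whereas an animal that differs between strands at a single segment yields gametes that take one of two values. This makes the dependence of SDGBV on heterozygosity explicit.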
where N is the number of replicates of the simulation, H is the number of haplotypes, and hij is the ith paternal or maternal haplotype breeding value.
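To see why SDGBV depends on heterozygosity, it can help to write the gamete breeding value as a sum over segments and take its moments analytically. The derivation below is an illustrative sketch, not part of the original method: it writes h_{i1} and h_{i2} for the maternal and paternal values of segment i, d_i for half their difference, and r_{ii'} for the recombination rate between segments i and i' (these symbols are assumptions introduced here).

```latex
% Each segment transmits h_{i1} or h_{i2} with probability 1/2
\mathrm{MGBV} = \sum_{i=1}^{H} \frac{h_{i1} + h_{i2}}{2},
\qquad
d_i = \frac{h_{i1} - h_{i2}}{2}

% Variance of the gamete breeding value; the second term vanishes
% when segments segregate independently (r_{ii'} = 1/2)
\mathrm{SDGBV}^2 = \sum_{i=1}^{H} d_i^{2}
  + 2 \sum_{i < i'} \left(1 - 2 r_{ii'}\right) d_i \, d_{i'}
```

A segment for which both strands carry the same haplotype value has d_i = 0 and contributes nothing to the variance, so a more homozygous animal has a lower SDGBV while its MGBV is unaffected by the way the values are distributed over the two strands.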
Correlations between traits were analyzed for MGBV and SDGBV to investigate relationships between traits. To study whether selection, which should result in increased inbreeding and homozygosity per generation, had an antagonistic effect on MGBV and SDGBV, correlations of SDGBV and MGBV with the genomic (FG) and the pedigree (FP) inbreeding coefficients were computed for each trait. Furthermore, MGBV and SDGBV were tested for normality.
Results of the simulation were validated by reconstructing the paternally transmitted haplotype for each animal. The breeding value of this paternally transmitted haplotype, which in this case refers to the haploid chromosomes, was then estimated by summing its alleles weighted by half the estimated SNP effects. A sensitivity analysis was performed to determine the size of the progeny groups per sire needed for validation. The observed mean and standard deviation of the estimated breeding values of the offspring were compared with the mean and standard deviation obtained from the simulation, and correlations were computed.
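The comparison between simulated and observed variation can be sketched as follows. The data structures (dictionaries keyed by sire), the function names and the use of a Pearson correlation are assumptions made for illustration; the minimum progeny group size is the quantity varied in the sensitivity analysis.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def validate_sdgbv(simulated_sd, progeny_values, min_offspring):
    """Correlate simulated SDGBV with the observed SD within progeny groups.

    simulated_sd   -- {sire: simulated SDGBV}
    progeny_values -- {sire: [paternally transmitted haplotype breeding values]}
    min_offspring  -- minimum progeny group size (must be at least 2)
    Returns the number of sires used and the correlation.
    """
    sires = [s for s in simulated_sd
             if len(progeny_values.get(s, [])) >= min_offspring]
    expected = [simulated_sd[s] for s in sires]
    observed = [statistics.stdev(progeny_values[s]) for s in sires]
    return len(sires), pearson(expected, observed)
```

Raising `min_offspring` reduces the number of sires that enter the correlation but makes each observed standard deviation more reliable, which is the trade-off examined in the sensitivity analysis.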
The expected breeding value of an offspring from a given mating was calculated as

$$mBV = MGBV_{s} + MGBV_{d}$$

where mBV is the expected breeding value of an offspring based on the parental average estimated breeding values, MGBVs is the estimated mean gamete breeding value of the sire, and MGBVd is the estimated mean gamete breeding value of the dam.
Assuming no covariance between the gametes of sire and dam, the expected standard deviation of breeding values within the potential offspring of a mating was calculated as

$$sBV = \sqrt{SDGBV_{s}^{2} + SDGBV_{d}^{2}}$$

where sBV is the expected standard deviation of breeding values within the potential offspring of the same mating, SDGBVs is the standard deviation of gamete breeding values of the sire, and SDGBVd is the standard deviation of gamete breeding values of the dam. In addition, the probability of obtaining offspring with a breeding value above a given threshold was calculated assuming normally distributed breeding values, and the number of matings needed to produce at least one offspring with an estimated breeding value above a given threshold was calculated using a binomial distribution.
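These quantities are simple to compute once MGBV and SDGBV are known for both parents. The sketch below assumes zero covariance between sire and dam gametes, so that variances add, and it introduces a hypothetical `confidence` parameter: the number of matings is taken as the smallest n for which the binomial probability of at least one success, 1 − (1 − p)^n, reaches that level. The function name and the default of 0.95 are assumptions.

```python
import math
from statistics import NormalDist


def mating_summary(mgbv_s, sdgbv_s, mgbv_d, sdgbv_d, threshold, confidence=0.95):
    """Expected mean, SD, exceedance probability and matings needed.

    Requires at least one parent with SDGBV > 0.
    """
    m_bv = mgbv_s + mgbv_d                         # expected offspring mean
    s_bv = math.sqrt(sdgbv_s ** 2 + sdgbv_d ** 2)  # SD, no sire-dam covariance
    p = 1.0 - NormalDist(m_bv, s_bv).cdf(threshold)
    if p <= 0.0:
        n_matings = math.inf                       # threshold out of reach
    elif p >= 1.0:
        n_matings = 1
    else:
        # Smallest n with 1 - (1 - p)^n >= confidence
        n_matings = math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))
    return m_bv, s_bv, p, n_matings
```

For example, parents with MGBV of 1 each and SDGBV of 3 and 4 give an expected mean of 2 and a standard deviation of 5; half of the offspring are expected to exceed a threshold of 2, and five matings give at least a 95% chance of one such offspring.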
Mean and standard deviation of gamete breeding values
[Table: Correlations between MGBV among traits and with inbreeding coefficients]
[Table 2: Correlations between SDGBV among traits and with inbreeding coefficients]
Validation of simulated SDGBV
[Table: Correlations (r) between SDGBV and real progeny variation for different traits, by minimum number of offspring per sire. Columns: minimum number of offspring per sire, number of sires]
[Table 4: Results of mating two sires to a poor, an average and a superior female in the population for protein yield]
The objective of this study was to predict the expected genetic standard deviation within groups of offspring using real data. The results indicate that gamete breeding values vary between animals and these results can be used to make specific mating decisions.
MGBV and SDGBV for direct genetic effect for stillbirth were about half as high as for the three other traits (Figure 2 and Figure 4), which is related to differences in the reliabilities of the direct genomic breeding values (DGV) between these traits. The reliability of DGV for fat and protein yields is equal to 69% and for somatic cell score to 74%, but only 44% for the direct genetic effect for stillbirth. Accordingly, the SNP effects for the direct genetic effect for stillbirth are more regressed to the mean than for the other traits.
In comparison to the SNP-effect reference population, high MGBV for protein and fat yields can be explained by higher selection intensities and genetic gains than for somatic cell score and the direct genetic effect for stillbirth. Comparing the three different traits with similar reliabilities indicates that protein yield had the highest MGBV but the lowest SDGBV. This is explained by a higher selection intensity for protein yield, which is caused by a higher weight on this trait in the German Total Merit Index. However, up to now most genotyped animals are elite animals, which means that the genotyped animals are highly preselected. From this point of view, the high MGBV for protein and fat yields may not represent the mean breeding value of the German Holstein population. In contrast, MGBV for somatic cell score and for the direct genetic effect for stillbirth are closer to the mean value of the population since these traits are not as relevant for selection. Similarly, Cole and Null pointed out that most genotyped animals are elite animals, which have more chromosomes with a desirable DGV than chromosomes with an undesirable DGV.
Negative correlations between FG and SDGBV (Table 2) are in agreement with a previous report, which described a stronger correlation of the Mendelian sampling variance (similar to the square of SDGBV) with FG than with FP, a difference attributed to pedigree errors.
Validation of simulated gamete variation
Systematic genotyping of young Holstein Friesian candidates started in 2010. This implies that animals born before 2010 were selectively genotyped because of their importance for the breeding scheme and their contribution to the reference population. The within-family variance of older families could be affected by this selective genotyping. Genotyping more animals results in larger groups of offspring from randomly genotyped sires, which should result in improved future validations.
VanRaden et al. and Fritz et al. reported that some haplotypes are never present in the homozygous state, because embryos that are homozygous for these haplotypes are not viable. This fact and genetic defects like Brachyspina[23, 24], Bovine Leukocyte Adhesion Deficiency (BLAD) or Complex Vertebral Malformation (CVM) also influence the SDGBV. However, the effect on the variation depends on the allele frequency in the population; thus a loss of variation can be observed only when sperm and ovum carry the same genetic defect. This fact can explain the difference between simulated and observed gamete breeding values, because the simulation did not consider loss of variation due to genetic defects. Indeed, gamete breeding values rather than animal breeding values were simulated and a carrier of a genetic defect had no influence on SDGBV if the mating partner did not carry this defect.
Figure 2 shows that there are animals with a high mean and a low variability that are relevant for dairy farmers. In particular, animals with a high mean and a high standard deviation are interesting for AI companies because selecting these animals will increase the probability of producing animals with extremely positive breeding values in the future.
Haplotype information enables the estimation of selection limits. Summing the best breeding value for each haplotype gives the theoretically best animal. The gamete breeding values of these hypothetical animals should reach +30 σa (707 kg) for fat yield, +32 σa (539 kg) for protein yield, +35 σa for somatic cell score and +14.2 σa for the direct genetic effect for stillbirth. Cole and VanRaden showed that the selection limit for protein yield was 1138 kg. Although our results are estimated at the haplotype level and those of Cole and VanRaden at the animal level, they are consistent. A theoretical mating of the two best animals for protein yield in our dataset would produce offspring with a mean estimated breeding value of 4.82 σa and a standard deviation of 0.76 σa. The probability of producing an offspring with a breeding value higher than 8 σa, which is only one third of the selection limit, is 0.14%; this illustrates that animals from the current population are far from the selection limits.
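The selection limit at the gamete level amounts to picking, for every chromosome segment, the best haplotype breeding value present in the population and summing over segments. A minimal sketch follows; the input layout (one list of observed haplotype values per segment) and the function name are assumptions, and "best" is taken to be the maximum, so for a trait where low values are desirable the sign would need to be reversed.

```python
def gamete_selection_limit(values_per_segment):
    """Sum of the best observed haplotype breeding value of each segment.

    values_per_segment -- one list per chromosome segment holding the
                          haplotype breeding values found in the population
    """
    return sum(max(values) for values in values_per_segment)
```

For two segments with observed values (1.0, −2.0, 0.5) and (0.2, 0.3), the limit is 1.0 + 0.3 = 1.3.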
Figure 5 and Table 4 show that two different mating strategies can be designed based on knowledge about MGBV and SDGBV. On the one hand, AI companies are interested in finding extremely positive offspring and, from this point of view, mating bull 2 would be the best choice. On the other hand, farmers are more interested in homogeneous groups of offspring with low SDGBV, which means that mating bull 1 would be better for breeding in these herds. For computational reasons, no covariance between sire and dam was assumed to calculate the vBV. Thus, this method has to be improved because the German Holstein population has a small effective population size which increases the level of relationships and results in a non-zero covariance between sires and dams.
Finding the best combination of mating partners in mating programs that are based on genomic information requires time- and memory-intensive computing because of the large amount of data. A great benefit of the method described in this study is that MGBV and SDGBV need to be computed only once for each animal. After this step, it is computationally easy to find mating partners because mBV is the sum of the maternal and paternal MGBV and vBV, the variance of breeding values among potential offspring, is the sum of the squared maternal and paternal SDGBV. Calculating the probability that an animal reaches a defined threshold is simple using normal distribution functions. Based on this methodology, a software tool for breeding associations was developed, which includes MGBV and SDGBV for a portfolio of bulls of interest and for genotyped cows. Given this information, the association can specify which breeding value threshold the offspring of a given cow should exceed and the tool provides a list of bulls that are expected to reach this criterion.
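How such a bull list could be produced from precomputed MGBV and SDGBV is sketched below. This is a hypothetical illustration and not the software tool mentioned above: the function name, the dictionary layout, the `min_probability` cut-off and the example values (in genetic standard deviations) are all assumptions, and sire and dam gametes are assumed to be uncorrelated.

```python
import math
from statistics import NormalDist


def rank_bulls(bulls, cow, threshold, min_probability=0.0):
    """Rank bulls by the probability that an offspring exceeds a threshold.

    bulls -- {name: (MGBV, SDGBV)}
    cow   -- (MGBV, SDGBV) of the dam
    """
    cow_mean, cow_sd = cow
    ranked = []
    for name, (mean, sd) in bulls.items():
        # Offspring distribution: means add, variances add (no covariance)
        offspring = NormalDist(mean + cow_mean, math.sqrt(sd ** 2 + cow_sd ** 2))
        probability = 1.0 - offspring.cdf(threshold)
        if probability >= min_probability:
            ranked.append((name, probability))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical portfolio: bull_1 has the higher mean, bull_2 the higher SDGBV
bulls = {"bull_1": (2.0, 0.3), "bull_2": (1.5, 1.0)}
cow = (1.0, 0.5)
moderate = rank_bulls(bulls, cow, threshold=3.0)
extreme = rank_bulls(bulls, cow, threshold=5.0)
```

In this example the bull with the higher mean ranks first for the moderate threshold, whereas the bull with the higher SDGBV ranks first for the extreme threshold, so the ranking of bulls depends on the threshold chosen.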
Future aspects and applications
Decreasing genotyping costs make it possible to genotype whole commercial herds. Considering MGBV and SDGBV derived from haplotypes and SNP effect estimates is only one example of the use of additional genomic information in genomic mating programs. Ongoing research will develop new tools such as the estimation of dominance effects or more information about haplotypes with specific genomic effects. Software solutions need efficient, high-performance programs that can handle large amounts of data within a reasonable timeframe.
The expected SDGBV of a potential parent can be estimated from genomic information. The SDGBV differs between animals and tends to be normally distributed in the absence of QTL with a large effect on the trait. For SDGBV for fat yield, a deviation from a normal distribution that is caused by the DGAT1 mutation results in a higher SDGBV than expected. Furthermore, for all traits, SDGBV decreased slightly in recent years because of an increase in the level of inbreeding. A genomic mating program was developed to find optimal mating partners with respect to expected MGBV and SDGBV. This approach also allows the probability of finding an offspring with a breeding value exceeding a chosen threshold to be calculated.
German national organization FBF is thanked for financial support.
1. VanRaden PM, Van Tassell CP, Wiggans GR, Sonstegard TS, Schnabel RD, Taylor JF, Schenkel FS: Invited review: reliability of genomic predictions for North American Holstein bulls. J Dairy Sci. 2009, 92: 16-24.
2. Lund MS, de Roos APW, de Vries AG, Druet T, Ducrocq V, Fritz S, Guillaume F, Guldbrandtsen B, Liu Z, Reents R, Schrooten C, Seefried F, Su G: A common reference population from four European Holstein populations increases reliability of genomic predictions. Genet Sel Evol. 2011, 43: 43.
3. Liu Z, Seefried FR, Reinhardt F, Rensing S, Thaller G, Reents R: Impacts of both reference population size and inclusion of a residual polygenic effect on the accuracy of genomic prediction. Genet Sel Evol. 2011, 43: 19.
4. Wiggans GR, Cooper TA, VanRaden PM, Olson KM, Tooker ME: Use of the Illumina Bovine3K BeadChip in dairy genomic evaluation. J Dairy Sci. 2012, 95: 1552-1558.
5. Segelke D, Chen J, Liu Z, Reinhardt F, Thaller G, Reents R: Reliability of genomic prediction for German Holsteins using imputed genotypes from low-density chips. J Dairy Sci. 2012, 95: 5403-5411.
6. Erbe M, Hayes BJ, Matukumalli LK, Goswami S, Bowman PJ, Reich CM, Mason BA, Goddard ME: Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J Dairy Sci. 2012, 95: 4114-4129.
7. Heaton MP, Harhay GP, Bennett GL, Stone RT, Grosse WM, Casas E, Keele JW, Smith TPL, Chitko-McKown CG, Laegreid WW: Selection and use of SNP markers for animal identification and paternity analysis in US beef cattle. Mamm Genome. 2002, 13: 272-281.
8. VanRaden PM, Olson KM, Null DJ, Hutchison JL: Harmful recessive effects on fertility detected by absence of homozygous haplotypes. J Dairy Sci. 2011, 94: 6153-6161.
9. Segelke D, Täubert H, Reinhardt F, Thaller G: Chancen und Grenzen der Hornloszucht für die Rasse Deutsche Holstein. Züchtungskunde. 2013, 85: 4.
10. Cole JB, Null DJ: Visualization of the transmission of direct genomic values for paternal and maternal chromosomes for 15 traits in US Brown Swiss, Holstein, and Jersey cattle. J Dairy Sci. 2013, 96: 2713-2726.
11. Cole JB, VanRaden PM: Use of haplotypes to estimate Mendelian sampling effects and selection limits. J Anim Breed Genet. 2011, 128: 446-455.
12. Hayes BJ: Technical note: Efficient parentage assignment and pedigree reconstruction with dense single nucleotide polymorphism data. J Dairy Sci. 2011, 94: 2114-2117.
13. Browning SR, Browning BL: High-resolution detection of identity by descent in unrelated individuals. Am J Hum Genet. 2010, 86: 526-539.
14. Zimin AV, Delcher AL, Florea L, Kelley DR, Schatz MC, Puiu D, Hanrahan F, Pertea G, Van Tassell CP, Sonstegard TS, Marcais G, Roberts M, Subramanian P, Yorke JA, Salzberg SL: A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol. 2009, 10: R42.
15. VanRaden PM: Efficient methods to compute genomic predictions. J Dairy Sci. 2008, 91: 4414-4423.
16. Gengler N, Mayeres P, Szydlowski M: A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dual-purpose Belgian Blue cattle. Animal. 2007, 1: 21-28.
17. Estimation of Breeding Values for Milk Production Traits, Somatic Cell Score, Conformation, Productive Life and Reproduction Traits in German Dairy Cattle. http://www.vit.de/fileadmin/user_upload/vit-fuers-rind/zuchtwertschaetzung/milchrinder-zws-online/Zws_Bes_eng.pdf
18. Grisart B, Farnir F, Karim L, Cambisano N, Kim JJ, Kvasz A, Mni M, Simon P, Frere JM, Coppieters W, Georges M: Genetic and functional confirmation of the causality of the DGAT1 K232A quantitative trait nucleotide in affecting milk yield and composition. Proc Natl Acad Sci USA. 2004, 101: 2398-2403.
19. Thaller G, Krämer W, Winter A, Kaupe B, Erhardt G, Fries R: Effects of DGAT1 variants on milk production traits in German cattle breeds. J Anim Sci. 2003, 81: 1911-1918.
20. Kühn C, Bennewitz J, Reinsch N, Xu N, Thomsen H, Looft C, Brockmann GA, Schwerin M, Weimann C, Hiendleder S, Erhardt G, Medjugorac I, Forster M, Brenig B, Reinhardt F, Reents R, Russ I, Averdunk G, Blümel J, Kalm E: Quantitative trait loci mapping of functional traits in the German Holstein cattle population. J Dairy Sci. 2003, 86: 360-368.
21. Cole JB, VanRaden PM, O’Connell JR, Van Tassell CP, Sonstegard TS, Schnabel RD, Taylor JF, Wiggans GR: Distribution and location of genetic effects for dairy traits. J Dairy Sci. 2009, 92: 2931-2946.
22. Fritz S, Capitan A, Djari A, Rodriguez SC, Barbat A, Baur A, Grohs C, Weiss B, Boussaha M, Esquerré D, Klopp C, Rocha D, Boichard D: Detection of haplotypes associated with prenatal death in dairy cattle and identification of deleterious mutations in GART, SHBG and SLC37A2. PLoS ONE. 2013, 8: e65550.
23. Agerholm JS, Peperkamp K: Familial occurrence of Danish and Dutch cases of the bovine brachyspina syndrome. BMC Vet Res. 2007, 3: 8.
24. Charlier C, Agerholm JS, Coppieters W, Karlskov-Mortensen P, Li W, de Jong G, Fasquelle C, Karim L, Cirera S, Cambisano N, Ahariz N, Mullaart E, Georges M, Fredholm M: A deletion in the bovine FANCI gene compromises fertility by causing fetal death and brachyspina. PLoS ONE. 2012, 7: e43085.
25. Shuster DE, Kehrli ME, Ackermann MR, Gilbert RO: Identification and prevalence of a genetic defect that causes leukocyte adhesion deficiency in Holstein cattle. Proc Natl Acad Sci USA. 1992, 89: 9225-9229.
26. Agerholm JS, Bendixen C, Andersen O, Arnbjerg J: Complex vertebral malformation in Holstein calves. J Vet Diagn Invest. 2001, 13: 283-289.
27. Weigel KA, Hoffmann PC, Herring W, Lawlor TJ: Potential gains in lifetime net merit from genomic testing of cows, heifers, and calves on commercial dairy farms. J Dairy Sci. 2012, 95: 2215-2225.
28. Toro MA, Varona L: A note on mate allocation for dominance handling in genomic selection. Genet Sel Evol. 2010, 42: 33.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.