The effect of genomic information on optimal contribution selection in livestock breeding programs

Background Long-term benefits in animal breeding programs require that increases in genetic merit be balanced with the need to maintain diversity (lost due to inbreeding). This can be achieved by using optimal contribution selection. The availability of high-density DNA marker information enables the incorporation of genomic data into optimal contribution selection but this raises the question about how this information affects the balance between genetic merit and diversity. Methods The effect of using genomic information in optimal contribution selection was examined based on simulated and real data on dairy bulls. We compared the genetic merit of selected animals at various levels of co-ancestry restrictions when using estimated breeding values based on parent average, genomic or progeny test information. Furthermore, we estimated the proportion of variation in estimated breeding values that is due to within-family differences. Results Optimal selection on genomic estimated breeding values increased genetic gain. Genetic merit was further increased using genomic rather than pedigree-based measures of co-ancestry under an inbreeding restriction policy. Using genomic instead of pedigree relationships to restrict inbreeding had a significant effect only when the population consisted of many large full-sib families; with a half-sib family structure, no difference was observed. In real data from dairy bulls, optimal contribution selection based on genomic estimated breeding values allowed for additional improvements in genetic merit at low to moderate inbreeding levels. Genomic estimated breeding values were more accurate and showed more within-family variation than parent average breeding values; for genomic estimated breeding values, 30 to 40% of the variation was due to within-family differences. Finally, there was no difference between constraining inbreeding via pedigree or genomic relationships in the real data. Conclusions The use of genomic estimated breeding values increased genetic gain in optimal contribution selection. Genomic estimated breeding values were more accurate and showed more within-family variation, which led to higher genetic gains for the same restriction on inbreeding. Using genomic relationships to restrict inbreeding provided no additional gain, except in the case of very large full-sib families.


Introduction
Selection of livestock has led to increases in inbreeding over time, especially following the widespread use of artificial insemination and reproductive technologies. Inbreeding reduces the total number of heterozygotes in the population, hence reducing variation, which can cause a depression in fitness (inbreeding depression) and a decrease in selection response [1]. Selection on pedigree-based BLUP (best linear unbiased prediction) EBV (estimated breeding values) can lead to increases in inbreeding because the covariance between EBV of family members may be high, especially when animals are selected at a young age using EBV that are based on ancestral information.
Estimation of breeding values based on DNA marker information is now used in many livestock breeding programs [2]. Genomic estimated breeding values (GEBV) are usually equal to or more accurate than EBV based on parent average information (PA_EBV). Furthermore, it has been suggested that GEBV could also be used as a tool to reduce inbreeding because they may explain more Mendelian sampling variation than PA_EBV [3]. Thus, selection on GEBV is expected to increase genetic gain while maintaining the population's diversity. However, the question of what proportion of the Mendelian sampling variance is explained by GEBV and whether GEBV can be used to manage inbreeding remains unanswered.
Various methods have been developed to manage inbreeding in livestock breeding populations and all focus on managing inbreeding while increasing genetic gain [4][5][6]. To attempt to control inbreeding and maximize response to selection, Wray and Goddard [4] proposed a dynamic selection principle, which places a penalty on the co-ancestry of the selected animals. Meuwissen [5] extended this principle to introduce optimum contribution selection so that inbreeding could be limited to a specific level and the rate of gain maximized for that specific level of inbreeding. These optimal selection principles have been shown to maximize genetic gain at lower rates of inbreeding so that response can be maintained in the long term.
The addition of genomic information to optimal contribution selection can have several impacts on the genetic gain achieved in a breeding program. First, EBV based on genomic information are generally more accurate than PA_EBV [7] and allow for more accurate selection to occur at a younger age. Secondly, GEBV may explain more within family variation, which would allow for increases in genetic merit at the same level of inbreeding [3]. Finally, increases in genetic gain may be possible if genomic instead of pedigree relationships are used to restrict inbreeding [8]. Unlike the expected relationships derived from pedigree, genomic relationships can vary for a given type of relative [9], which may enable different selection decisions to be made. For example, it may be undesirable to select two sires from the same family; however, by using genomic relationships to constrain inbreeding it may be possible to include siblings that are less related to each other in the breeding program.
The aims of this study were to test the effect of genomic information on the balance between genetic gain and inbreeding under optimal selection and to observe how much GEBV vary within a family in real data.

Simulation study
Genotype simulations were conducted using the Markovian Coalescence Simulator (MaCS) [10] to simulate 2000 base haplotypes, with an effective population size (N e ) of 1000. As described in Clark et al. [11], 30 chromosomes each with base haplotypes of 100 cM (1.10 8 base pairs) were simulated with a mutation rate of 2.5.10 -8 per site. The total number of single nucleotide polymorphisms (SNPs) segregating on the genome was approximately 1 670 000. Sixty thousand SNP markers and 6000 QTL were randomly selected from the SNP sequence for genomic analysis. Therefore, each SNP had a 3% chance of being used as a marker and a 0.03% chance of being used as a QTL. As in Meuwissen et al. [12], the additive effect of each QTL was drawn from a gamma distribution with shape parameter 0.4 and scale parameter 1.66, and had a 50% chance of being positive or negative.
The base haplotypes were randomly allocated to 500 male and 500 female base animals of a simulated population structure. Two subsequent generations received haplotypes via Mendelian inheritance, allowing recombination to occur according to genetic distance, i.e. 1% recombination per cM. Each generation in the simulated population contained 1000 animals (50% males and 50% females). Twenty males were randomly selected in each generation and each male was randomly mated to 25 females. Each female had two offspring per generation (half-sib population). A second population was simulated with a full-sib family structure (full-sib population). This population included 25 males, each mated to two females and all females had 20 offspring each.
All animals in each simulation were allocated breeding values and phenotypes. True breeding values (TBV) for each animal were determined using: where β j is the additive effect of QTL (j) and Q kj is the QTL genotype at locus j, which was coded as 0, 1, or 2, as the number of copies of a given QTL allele that an individual (k) carries. Trait phenotypes were simulated by adding a random environmental effect drawn from a normal distribution with variance σ 2 e , which was chosen to result in a heritability (h 2 ) of 0.3.
Selection of animals in the third generation was investigated. Phenotypes and SNP genotypes of the two previous generations of animals were used as the reference population (2000 animals). The selection candidates had no phenotypic records and their breeding values were estimated by BLUP, using either pedigree (PA_EBV) or genomic information (GEBV).
As in Hayes et al. [13], we assumed the following model for genetic evaluation: where y is a vector of phenotypes, μ is the mean, 1 n is a vector of 1 s, Z is a design matrix allocating records to breeding values, g is a vector of breeding values for animals in the reference set and the test set, and e is a vector of normal deviates with variance σ 2 e . For genomic BLUP (gBLUP), V(g) = G σ 2 g where G is the genomic relationship matrix, and σ 2 g is the genetic variance for this model. Matrix G was formed using the method described by VanRaden [14]. Pedigree-based BLUP was also used to analyse the data. Pedigree-based BLUP uses the same model as gBLUP, except that V(g) = Aσ a 2 where A is the numerator relationship matrix based on pedigree and σ a 2 is the additive genetic variance. Variance components for both models were estimated with ASREML [15] and the model solutions yielded EBV. The accuracy of the EBV was estimated as the correlation between TBV and EBV (GEBV or PA_EBV) for the selection candidates. The intra-class correlation between EBV of animals from different family structures (half-or full-sib families) was also estimated based on the selection candidates.

Optimal contribution selection
Optimal contribution selection was undertaken to maximize the expected genetic value of future offspring while maintaining different levels of genetic diversity (i.e. effectively allowing for different rates of inbreeding). The average genetic merit of the selected individuals (m) was maximized using the dynamic selection rule of Wray and Goddard [7]; m = x 'b, where x is a vector that relates to how much each selection candidate contributes to the next generation and b is a vector of EBV, using either genomic (GEBV), pedigree (PA_EBV) or true (TBV) breeding values. Rates of inbreeding were restricted by penalizing the average co-ancestry of the selection candidates to; c = λ x 'A x, where A is the (n x n) relationship matrix among the selection candidates based on pedigree and λ is the penalty factor applied to restrict inbreeding. Note that the level of inbreeding equals x 'A x / 2. Table 1 shows the alternative levels of λ that were used to obtain a graph of possible solutions for maximum genetic gain for the various levels of co-ancestry. Solutions for m + c were obtained using an evolutionary algorithm [16] such that an optimal x was determined. To evaluate the impact of restricting inbreeding using genomic relationships rather than pedigree relationships, A was substituted by G. The overall increase in genetic merit was measured as the increase in TBV and inbreeding was assessed based on the average genomic co-ancestry among selected individuals such that results from the alternate selection strategies were compared on an equal basis.

Real data analysis
Estimated breeding values and 50 k genotypes were obtained for 267 bulls from the Australian Dairy Holstein Improvement Scheme (ADHIS) and 140 bulls from the Livestock Improvement Corporation (LIC) in NZ. These EBV were for the 2007 cohort of Holstein bulls that obtained progeny test proofs in 2011. Breeding values (GEBV, PA_EBV and PT_EBV) for milk, fat and protein yield were obtained from each organization. The average number of daughter records per sire was 90. Each dataset consisted primarily of a half-sib family structure but 30 bulls from the ADHIS dataset and 28 bulls from the LIC dataset shared one full sibling and the ADHIS dataset contained two families of four full siblings. The 50 k genotypes were used to form G to use in optimal contribution selection (as above). Note that PA_EBV were based on information before the bulls had progeny themselves and the GEBV were based on genomic information alone.
Optimal contribution selection was performed (as above) for each dairy dataset, with the rate of inbreeding restricted by penalizing the average co-ancestry of the selection candidates using pedigree information (A) or genomic information (G). Genetic merit was maximized using either GEBV or PA_EBV. The overall genetic gain was measured as the increase in PT_EBV since it was the most accurate estimate of TBV and inbreeding was assessed based on the average genomic co-ancestry.

Decomposition of variation in estimated breeding values
To explore the within-family variation in EBV, the effects of sire and dam were fitted as random effects in an analysis of PA_EBV, GEBV and PT_EBV using ASReml [15].
In the analysis of EBV, the effects of sire and dam remove variation between families and the unexplained variation is the within-family variation in EBV. For PA_EBV, the residual variation (e) that remains is expected to be zero.

Simulation study
Using GEBV as an estimate of genetic merit generally gave higher genetic gains than selection based on PA_EBV (Figure 1). In the simulation, average accuracies of GEBV and PA_EBV were 0.57 and 0.45, respectively. For the population that consisted of large half-sib families, there was no difference in genetic gain when using pedigree or genomic measures of relationship to constrain inbreeding ( Figure 2). However, more sires and dams were consistently selected when inbreeding was constrained using genomic relationships. For the population that consisted of large full-sib families and when selection was on GEBV, the use of genomic relationships to constrain inbreeding resulted in greater genetic gain than the use of pedigree relationships ( Figure 3). In contrast, when selection was on PA_EBV, no differences could be observed between constraining inbreeding based on pedigree or genomic information. With high and moderate constraints on co-ancestry, higher genetic gains were obtained when using genomic relationships than with pedigree-based relationships.
When pedigree relationships were used to constrain inbreeding, all animals in a full-sib family were considered equal with regard to co-ancestry. However, when genomic information was used, the breeding program could be optimized by selecting a high performing full-sib with lower co-ancestry with other selected individuals. With a low or no constraint on inbreeding, both measures of relationship resulted in the same genetic gain and each method selected the single best sire and dam for all matings. Table 2 shows the intra-class correlations between breeding values (PA_EBV or GEBV) for half-sibs and full-sibs in the two simulated populations. In the half-sib families the correlations between the EBV of family members were similar for GEBV and PA_EBV. In the full sib population, there was a lower correlation between the GEBV of family members than that of PA_EBV.

Real data
Genomic breeding values for bulls in the ADHIS data set were more accurate than PA_EBV, which led to higher gains in merit when using GEBV as a measure of merit in optimal contribution selection ( Figure 4). Reducing the number of sires by increasing the constraint on inbreeding had little impact when selection was based on PA_EBV. In the LIC dataset, optimal selection based on GEBV and PA_EBV gave variable results, with no clear difference in genetic gain between the two measures of merit, except at high inbreeding levels, for which PA_EBV often outperformed GEBV ( Figure 5). The use of genomic or pedigree-based relationships to restrict inbreeding resulted in similar genetic gains. However, the use of genomic relationships (compared to using pedigree relationships) increased the number of animals selected, although differences in merit (based on PT_BV) and average inbreeding were small or null for all traits measured, especially for the LIC dataset ( Figure 6). At low and moderate inbreeding levels, genetic gains were slightly higher for the ADHIS dataset ( Figure 7) when using genomic relationships to constrain inbreeding. Table 3 shows the proportion of variation that was attributed to the various sources for each EBV for the two dairy datasets. As expected, sire and dam (between-family) explained most of the variation in PA_EBV. In contrast, within-family differences explained 30 to 40% of the variation in GEBV, with the rest due to sire and dam. Approximately half of the variation in PT_BV was due to sire and dam and half was due to within-family differences.

Discussion
Genomic breeding values provided greater genetic gains than PA_EBV for the same rate of inbreeding. Using simulation, Nielsen et al. [17] previously reported optimal contribution selection on GEBV to increase gains by up to 80%, but in our study increases were much lower, i.e. 16%. Genetic gains with optimal contribution selection on GEBV are greatly affected by the structure of the selected population. In highly related full-sib families (Figure 3 and as simulated by Nielsen et al. [17] and Sonesson et al. [8]), there is some variation in GEBV within families (intra-class correlation of 0.85 (Table 2)), which can be used in optimal contribution selection, i.e. a full-sib with a higher GEBV can be selected. In contrast, PA_EBV have an intra-class correlation of 1, with all full-sibs receiving the same EBV and therefore no within-family variation can be exploited. In a population of large half-sib families, there is already some differentiation of animals based on PA_EBV (intra-class correlation of 0.55). However the intra-class correlation for GEBV is lower (0.5) than that for PA_EBV and may allow for extra gain when selecting animals based on GEBV under constrained inbreeding. Additional increases in merit due to selection on GEBV are because GEBV are more accurate than PA_EBV.
Avendano et al. [18] suggested that the success of optimal breeding schemes depends on how much Mendelian

Genomic co ancestry
Progeny test BV GEBV PA EBV Figure 5 Optimal contribution selection of LIC Holstein bulls at different levels of genomic co-ancestry of selected bulls, using three alternative estimates of genetic merit for protein yield. sampling effects (within family) contribute to variation in EBV. In our study, one third of the variation in GEBV was within-families, hence supporting the suggestions of Daetwyler et al. [3] that GEBV explain more Mendelian sampling effects than PA_EBV. However, PT_EBV explained more within-family variation than GEBV, with GEBV explaining 36 to 56% of the within-family variance in PT_EBV. Given this, the Mendelian sampling variance explained by the GEBV is comparable to that of a small progeny test. A large proportion of variation in GEBV was explained by the sire and dam effects (between family), demonstrating that information on close relatives contributes to GEBV [19][20][21]. Sonesson et al. [8] noted that in optimal contribution selection, the relationship structure used to constrain inbreeding must be the same as that used to estimate breeding values in order to maximize genetic gain. This is especially important if inbreeding is constrained to a specific level because pedigree and genomic measures of inbreeding are often on different scales [8]. However, an alternative approach to balancing inbreeding and genetic gain is to vary the inbreeding penalty (λ) and graph a range of outcomes for inbreeding (co-ancestry) and genetic gain. Such a graph provides alternative solutions to the breeder, who then can balance how much merit (s) he is prepared to sacrifice to maintain diversity. Using this approach, various methods and strategies can be compared on a fair basis, as long as the scale along both axes is consistent across alternatives. In other words, either pedigree or genomic measures of relationship can be used to compare outcomes on the co-ancestry axis, as long as the same measure is used across alternatives.
Similar to the study by Schierenbeck et al. [22], we found that in the real data example, there was again no difference in genetic gain due to constraints based on genomic relationships or pedigree relationships. This supports the results from the simulation study and shows that in the two dairy cattle datasets, which had clear large half-sib population structures, there was little or no advantage to using genomic relationship information to manage inbreeding. However, the results from our simulation study suggest that for a breeding population with a larger full-sib family structure, the advantage of using genomic relationships to manage inbreeding may be quite substantial, especially at low and moderate inbreeding levels.
To achieve low inbreeding levels in a population that consists of many full-sib families, it will be necessary to select one animal from each family. When selection aims at increasing genetic gain, more than one animal per family may be needed. The within-family variation in GEBV makes it possible to use some within-family  selection differential and the variation in relationship can allow for increases in genetic merit without compromising genetic diversity. This also occurs in populations that have a half-sib family structure but the additional variation in relationship that can be exploited is lower since variation in pedigree relationships and PA_EBV already exits. As in Pryce et al. [23], we observed that at low inbreeding levels the use of genomic measures to manage inbreeding also led to an increase in the number of sires, in both the simulated and dairy bull dataset, however, this had little to no impact on inbreeding levels in the population. The amount of variation in EBV among family members may be important to determine how much gain can be obtained from using genomic relationships to manage inbreeding. Furthermore, within families, a positive correlation between GEBV and the genomic relationship may exist with the best selection candidate, which reduces the prospects for genomic relationships to create additional selection opportunities without reducing genetic gain.

Conclusions
Genomic information can be used in livestock breeding programs to optimize genetic gain and inbreeding. Selection on GEBV increases genetic gain for a given rate of inbreeding because GEBV are more accurate and display more within-family variation than PA_EBV. Using genomic rather than pedigree-based relationships had limited impact on achieving additional genetic gain for a given rate of inbreeding in populations that consisted mainly of half-sib families (such as in dairy cattle), because the intra-class correlations among the EBV of relatives were already relatively low. However when the population consisted of large full-sib families, there was a clear benefit from using genomic relationships to control inbreeding, especially at moderate and low inbreeding constraints.