Genomic analysis of dominance effects on milk production and conformation traits in Fleckvieh cattle

Background Estimates of dominance variance in dairy cattle based on pedigree data vary considerably across traits and amount to up to 50% of the total genetic variance for conformation traits and up to 43% for milk production traits. Using bovine SNP (single nucleotide polymorphism) genotypes, dominance variance can be estimated both at the marker level and at the animal level using genomic dominance effect relationship matrices. Yield deviations of high-density genotyped Fleckvieh cows were used to assess cross-validation accuracy of genomic predictions with additive and dominance models. The potential use of dominance variance in planned matings was also investigated. Results Variance components of nine milk production and conformation traits were estimated with additive and dominance models using yield deviations of 1996 Fleckvieh cows and ranged from 3.3% to 50.5% of the total genetic variance. REML and Gibbs sampling estimates showed good concordance. Although standard errors of estimates of dominance variance were rather large, estimates of dominance variance for milk, fat and protein yields, somatic cell score and milkability were significantly different from 0. Cross-validation accuracy of predicted breeding values was higher with genomic models than with the pedigree model. Inclusion of dominance effects did not increase the accuracy of the predicted breeding and total genetic values. Additive and dominance SNP effects for milk yield and protein yield were estimated with a BLUP (best linear unbiased prediction) model and used to calculate expectations of breeding values and total genetic values for putative offspring. Selection on total genetic value instead of breeding value would result in a larger expected total genetic superiority in progeny, i.e. 14.8% for milk yield and 27.8% for protein yield and reduce the expected additive genetic gain only by 4.5% for milk yield and 2.6% for protein yield. Conclusions Estimated dominance variance was substantial for most of the analyzed traits. Due to small dominance effect relationships between cows, predictions of individual dominance deviations were very inaccurate and including dominance in the model did not improve prediction accuracy in the cross-validation study. Exploitation of dominance variance in assortative matings was promising and did not appear to severely compromise additive genetic gain.


Background
Dominance arises when the effects of alleles at a locus are not only additive, but interact so that the value of the heterozygous genotypes deviates from the mean of the values of the homozygous genotypes. With a and -a being the genotypic values of homozygous genotypes A 1 A 1 and A 2 A 2 , let d be the genotypic value of the heterozygous genotype A 1 A 2 [1]. If d = 0, there is no dominance action at the locus and the genotypic values at the locus are purely additive. The additive effects of genotypes at a locus are expressed as breeding values, which include part of the dominance effect because animals pass alleles, not genotypes, to their offspring. Breeding values are 2q[a + d(q-p)] for genotype A 1 A 1 , (q-p)[a + d(q-p)] for genotype A 1 A 2 and -2p[a + d(q-p)] for genotype A 2 A 2 , where p is the frequency of allele A 1 in the population and q the frequency of allele A 2 . The dominance deviation for a given genotype at the locus is the difference between genotypic value and breeding value, and is equal to -2q 2 d, 2pqd and -2p 2 d for genotypes A 1 A 1 , A 1 A 2 and A 2 A 2 , respectively [1].
Until recently, studies on dominance deviations were sparse because without genomic information, the availability of large datasets with sufficient proportions of individuals with non-null dominance effect relationships, such as full-sibs, is essential for accurate estimation of dominance variance. Estimates of dominance variance in dairy cattle that are based on pedigree data range from 7.3% to 49.8% of the total genetic variance for conformation traits [2,3] and from 3.4% to 42.9% for milk production traits [4][5][6].
At the individual animal level, dominance is hardly used in animal breeding [7], although it contains a relevant part of genetic variation. The reasons are the heavy computational demand of large-scale genetic evaluations for dominance, the relatively low accuracy of resulting estimates of dominance effects, and the complexity of planning and computing the outcome of planned matings [8].
With the availability of SNP genotypes, dominance at a marker locus can be readily determined, dominance effects of markers can be estimated [9,10] and computing the expected outcome of planned matings based on SNP genotypes is straightforward [9]. Furthermore, covariance matrices of genomic dominance effects among individuals can be calculated, similar to matrices of genomic additive relationships, which are widely used in genomic selection, such that dominance effects can be estimated in a GBLUP (genomic best linear unbiased prediction) model [11,12].
In this work, we explored the possibilities of including dominance effects in genomic evaluation and furthermore in planned matings in dairy cattle. We estimated variance components, including dominance variance, in a dataset of genotyped Bavarian Fleckvieh cows, analyzed the predictions of breeding and total genetic values using cross-validation, and predicted total genetic values of specific matings.

Estimation of variance components
First-lactating cows from 145 Bavarian dairy herds (all first-lactating cows of each herd were genotyped), born in 2008 and 2009, were genotyped with the Illumina BovineHD Genotyping BeadChip that includes 777 962 SNPs. SNPs with a call rate lower than 0.9, a minor allele frequency higher than 0.005 and a highly significant deviation (P < 10 −5 ) from the Hardy-Weinberg equilibrium, and SNPs that were not annotated (UMD3) on the autosomes or on the pseudo-autosomal region of the X-chromosome were excluded from the analysis. A total of 629 028 SNPs remained in the dataset after editing. Highdensity SNP genotypes and yield deviations (YD) for nine traits (milk yield, fat yield, protein yield, somatic cell score, milkability, stature, udder score, udder depth and feet and legs score) from 1996 Bavarian Fleckvieh cows were available to (a) estimate variance components, including dominance variance and (b) perform cross-validation in order to evaluate the predictive ability of a model with dominance effects in comparison to a purely additive model. Both studies were done within a GBLUP framework. YD were calculated based on test-day observations adjusted for nongenetic effects, but not for permanent environmental effects, for each lactation and interpolated by the method of best prediction [13,14]. A weighted mean was calculated across lactation YD of a cow in order to obtain one multilactation YD per cow. The effective number of own performances (EOP) [15] was provided as a weight for the multi-lactation YD. For conformation traits, a permanent environmental effect was not modeled because repeated measurements are not available for cows.
Additive genetic (σ 2 A ) and residual (σ 2 E ) variance components were estimated with models MA and MG.
where y is a vector of multi-lactation YD, μ is the overall mean, Z is a design matrix relating YD to breeding values, u is a vector of breeding values of cows, and e is a vector of residuals. Covariance matrices of additive ef- where A is the numerator relationship matrix and G is the genomic relationship matrix. The genomic relationship matrix G * was calculated based on the approach of VanRaden [16] using PREGSF90 [17]: where matrix W a has dimensions of the number of individuals (n) by the number of loci (m), with elements that are equal to 2-2p k and -2p k for opposite homozygous and 1-2p k for heterozygous genotypes, p k is the minor allele frequency of locus k, and q k =1-p k . Matrix G * was scaled so that the means of diagonals and off-diagonals are the same as in A [18,19] and then combined with A to G = 0.95 G * + 0.05 A in order to improve numerical stability. The variance matrix of residual effects was V e ð Þ ¼ Fσ 2 E for both models, where F is a diagonal matrix with reciprocals of the EOP as weights. Extending model MG with dominance effects leads to model MGD: where v is a vector of dominance deviations of cows. V(u) and V(e) are defined as in model MG. The covariance matrix of dominance effects is V v ð Þ ¼ Dσ 2 D ; where D is the genomic dominance relationship matrix and σ 2 D is the dominance variance. Matrix D * was calculated as: where W d has dimensions of the number of individuals (n) by the number of loci (m), with elements that are equal to −2q 2 k for genotype A 1 A 1 , 2p k q k for genotype A 1 A 2 , and −2p 2 k for genotype A 2 A 2 . Matrix D * was then combined with the identity matrix I as D = 0.95 D * + 0.05 I to improve numerical stability.
Estimation of variance components was performed with REMLF90 [20]. Goodness of fit of the respective models to the data was measured by the likelihood. The superiority of model MGD over model MG was tested by a likelihood ratio test, which was calculated as -2ln (likelihood for MG) + 2ln(likelihood for MGD). The likelihood ratio follows a mixture of χ 2 -distributions with 0 and 1 degree of freedom [21]. In addition, variance components of model MGD were estimated by Gibbs sampling using the GIBBS1F90 software [20] in order to compare them with REML results and to calculate standard errors of the estimates. A total of 200 000 iterations of the sampler were run, with the first 20 000 iterations discarded as burn-in samples and every 50 th sample included in the posterior analysis. Convergence to the final distribution was checked with the Geweke diagnostics [22] of the R package coda [23,24].
Additive and dominance variance components at the marker level ( σ 2 a and σ 2 d ) were also estimated with the GS3 software [25] in a Markov chain Monte Carlo algorithm, using a model at the marker level (referred to as the MGD-SNP model hereinafter), in contrast to the previous animal level models: where a and d are vectors of additive and dominant effects of the SNPs, and T and X are incidence matrices coded as {−1, 0, 1} and {0, 1, 0} for the three possible genotypes. The assumed variance-covariance structure was V a ð Þ ¼ Iσ 2 a and V d ð Þ ¼ Iσ 2 d . From the resulting estimates, additive and dominance variance components on the animal level were calculated as: [12]. A total of 300 000 iterations of Gibbs sampling were performed for each trait. The first 20 000 iterations were discarded as burn-in samples and from the remaining 280 000 every 50 th sample was considered for analysis of the posterior distribution.

Prediction of breeding values and total genetic valuescross-validation
Genotyped cows with YD for the respective traits were randomly divided in ten groups in order to perform cross-validation analysis. Typically, splitting at random implies that some validation animals have descendants in the training dataset, which means that the crossvalidation is based on descendants, a case of no interest in reality and which will inflate accuracies [26]. In our dataset, genotyped cows were from a single generation. Therefore, a predicted cow could not have daughters (but, e.g., half-or full-sibs) in the training datasethence limiting upward bias in the estimation caused by progeny of validation animals in the training data. In this setting, the cross-validation accuracy measures the accuracy to predict contemporary cows including half-and full-sibs of training cows. Each group served once as validation group and the calibration group consisted of the other nine groups. Breeding values and total genetic values for the validation group were predicted based on models MA, MG, and MGD with their respective variance components estimated with REMLF90. The correlation between predicted breeding values and YD in the validation group [r YD;û ð Þ] were also calculated. These measures were averaged over the ten validation groups.

Prediction of total genetic values of matings
Genotype probabilities and expectations of purely additive breeding values (u) and total genetic values (g), that include dominance deviations, were calculated for the offspring of all possible matings between 1996 cows and 50 bulls for milk yield and protein yield. The bulls were genotyped and selected for the respective trait on their conventional breeding value after progeny test (including the records of 1996 genotyped cows) from the German-Austrian genetic evaluation. SNP effects a and d were estimated in a BLUP model (BLUP-SNP; equal to model MGD-SNP but with variance components known) using GS3. Variance components σ 2 a and σ 2 d were fixed to values calculated from REMLF90 variance components σ 2 A and σ 2 D (model MGD): The total genetic value g ij of progeny from a mating between bull i and cow j was predicted as in Toro and Varona [9]: where Pr ijk () is the probability of the corresponding genotype at locus k. Analogously, the breeding value u ij of progeny from a mating between bull i and cow j was predicted as: whereα k ¼â k þd k q k − p k ð Þ . Matings can be selected onû to maximize additive genetic gain or onĝ to maximize total genetic superiority. The latter maximizes the productive performance of the offspring, which might be a farmer's interest. However,ĝ can be maximized only for the next generation because gain in the dominance part ofĝ cannot be accumulated in subsequent generations. In our example, additive gain is assured by pre-selection of bulls on their conventional breeding value. Selection onû leads to maximum additive gain, which can be accumulated in subsequent generations, and thus optimizes cumulative multi-generational genetic gain. A desirable objective might be to maximizeĝ of matings and at the same time to keep the expectedû of the offspring as high as possible.
In order to compare the results of these two possible selection strategies,ĝ andû of all possible matings between the 1996 cows and 50 bulls were calculated for milk and protein yields. For each cow, the top mating was selected with respect toĝ orû, with the restriction that a single bull was not mated to more than 200 cows. The expected additive genetic gains and total genetic superiorities with selection onû orĝ were calculated as the difference between the meanû orĝ of selected matings and the meanû orĝ of all possible matings. Figure 1 shows the histograms of off-diagonal elements of the additive and dominance genomic relationship matrices. Means of off-diagonals of G (before scaling) and D were equal to 0, which implies that the population was in Hardy-Weinberg equilibrium. The standard deviation of off-diagonals of G was equal to 0.036, which is five times larger than the standard deviation of offdiagonals of D, i.e. 0.007. The proportion of off-diagonals that were smaller than −0.05 or larger than 0.05 was 6.27% for G but only 0.02% for D. Therefore, matrix D was less informative than G.

Estimation of variance components
Estimated variance components for model MGD are in Table 1. Dominance variance (expressed as a percentage of total genetic variance) for milk production traits ranged from 28.1% for fat yield to 40.9% for protein yield. For somatic cell score and milkability, dominance variance was estimated at 39.0 and 50.5% of the genetic variance. Estimates of dominance variance for conformation traits were quite small, except for udder depth, ranging from 3.3% for stature to 15.3% for feet and legs score. For udder depth, dominance variance was estimated at 23.8% of the genetic variance. For comparison, additive variances estimated with models MA and MG are also in Table 1. With the exception of milkability, the estimates of additive variance from model MG were consistent with additive variance estimates from the dominance model. Estimates of additive variance obtained with the pedigree model MA differed to some extent from those obtained with the genomic models. Estimates of variance components obtained using Gibbs sampling with model MGD and with an equivalent MGD-SNP model are in Table 2 and were similar to REML estimates with model MGD. Geweke statistics [22] showed convergence for model MGD but for the MGD-SNP model, the Gibbs chains did not converge even after 300 000 iterations. However, the means of the Gibbs chains for the MGD-SNP model were similar to those for the MGD model. For stature, udder score and feet and legs score, the estimated dominance variance was clearly larger with both Gibbs sampling analyses than with REML estimation because of a skewed posterior distribution of the Gibbs samples. Estimates of the ratio between dominance and total genetic variance had standard errors around 0.10, which is fairly good for such a small dataset.
For all traits, model MG, which exploited genomic information, fitted the data better than model MA, which included pedigree information only. The superiority of model MGD, which included a dominance effect, compared to model MG was significant for milk yield, fat yield, protein yield, somatic cell score and milkability, based on the likelihood ratio test. Likelihood measures and statistics of the likelihood ratio test between models MG and MGD are in Table 3. The likelihood ratio test statistics were asymptotically χ 2 -distributed [27]. The χ 2distribution function can take only non-negative values because it is defined as a sum of squared values. For two traits (stature and udder score), the likelihood ratio test statistics were negative (but very close to 0), which was due to numerical rounding or not finding the mode of the likelihood exactly. Expected total genetic superiorities and additive genetic gains obtained with the selected matings are in Table 5, both in absolute numbers and relative to the standard deviations (SD) ofû andĝ . When matings were selected onĝ for milk yield, the expected total genetic superiority was estimated to be equal to 165.2 kg, which is equivalent to 1.01 SD ofĝ . The expected total genetic superiority was reduced to 143.8 kg (0.88 SD) when matings were selected onû. The expected additive genetic gain was less sensitive to the selection criterion applied since it was only slightly reduced when selection was done onĝ (137.   Table 2 Estimates of additive and dominance variance components from Gibbs sampling for models MGD and MGD-SNP

Discussion
This study analyzed the importance of dominance variation for several milk production and conformation traits in the Fleckvieh breed using the GBLUP methodology. Additive and dominance genomic relationship matrices were calculated similar to Su et al. [11], except that standard quantitative genetic approaches were used, with the dominance variance at locus k defined as (2p k q k d) 2 [1,12]. This resulted in the reported estimates of dominance variance to be compatible with pedigreebased estimates. Independence between u and v is the classical treatment [1] and it is convenient because it allows orthogonality of the estimates and thus an easy translation into variances and covariances of u and v. However, this independence is contradictory with the phenomena of inbreeding depression and hybrid vigor; presence of inbreeding depression indicates that dominance is directional, e.g. [28]. Wellmann and Bennewitz [10,29] reviewed biological information on milk yield and productive life in Holstein cattle to suggest a priori dependencies between a and d (which would result in dependencies between u and v) and Bayesian regression models that could accommodate those dependencies. The treatment of dependencies between breeding values and dominance deviations  The results are given as mean ± standard error.  Expected total genetic superiority (ΔG) and expected additive genetic gain (ΔU) for the alternative selection criteria total genetic value (ĝ) and breeding value (û) in absolute value (kg) and relative to the standard deviations (SD) of g andû of all possible matings; the maximum number of matings per bull was restricted to 200.
is rather complex and the computational requirements are large, thus, we did not consider this method although it should be a field of further research. Estimates of dominance variance varied from 3.3 to 50.5% of total genetic variance for the analyzed traits. Estimated dominance variance (as a percentage of total genetic variance) was greater for milk production traits than for conformation traits. These results agree with those of Misztal et al. [7], who found larger dominance variance for production than for conformation traits. Moreover, Misztal et al. [3] reported estimates of dominance variance in US Holstein cattle for 14 conformation traits that ranged from 7.3 (rump angle) to 22.3% (strength) of the total genetic variance. This is comparable to the estimates of dominance variance for the conformation traits analyzed in this study. In the literature, reported estimates of dominance variance for milk production traits of Holstein cattle vary considerably ranging from 1.4 to 42.9% of the total genetic variance [4][5][6][7], which are within the same range but smaller than those found in our study. Two reasons may explain the relatively large estimates of dominance variance for milk production traits obtained in our study compared to values reported in the literature: (1) Fleckvieh cattle are genetically more diverse than Holstein cattle, as reflected by the considerably larger effective population size of the Fleckvieh breed [30], which is expected to result in more heterozygosity and in QTL alleles with more intermediate frequencies; (2) all estimates of dominance variance available in the literature were obtained using relationship matrices based on pedigree data; the use of genomic information is expected to improve estimates of dominance effect relationships and reduce potential confounding with additive effects and residuals which is likely to result in different estimates.
Although moderate changes in estimates of additive variance were observed between pedigree and genomic models, estimates of additive variance were consistent for genomic additive and dominance models, except for milkability. Su et al. [11] reported a small difference in estimates of additive variance between additive and dominance models. However, the additive and dominance variances reported in Su et al. [11] result from an alternative partitioning of genetic variance and are thus not directly comparable to the classical partitioning of genetic variance [12]. In studies based on pedigree information, estimates of additive variance have been similar between additive and dominance models [5,6,31].
Both Gibbs sampling with model MGD and at the marker level with the MGD-SNP model resulted in estimated variance components that were comparable with REML estimates for most traits. The relative standard error (calculated as standard error divided by the estimate) of dominance variance was on average 2.7 times larger than the relative standard error of the estimated additive variance, which is expected based on the properties of G and D. However, in other studies the ratio between relative standard errors of dominance and additive variances was even larger, i.e. 4.1 in Misztal [32] and 4.5 in Su et al. [11]. In order to estimate dominance variance more accurately, more dominance-specific information is needed. This could be achieved, e.g., by increasing the number of full-sibs in the dataset. The present dataset contained 3% full-sibs.
Despite the large estimates of dominance variance for most analyzed traits (significantly larger than 0 for five traits), prediction accuracy of breeding values and total genetic values did not change when dominance effects were included in the model. Estimates of additive variance did not differ much between models MG and MGD, which means that additive variance is already captured quite accurately in the additive model. Thus, additive effects are relatively well predicted, whether the dominance effect is modeled or not. The accuracy of predictions of total genetic values in cross-validation was not higher with the dominance than with the additive model because the proportion of full-sibs and dominance effect relationship coefficients between the training and validation datasets were small. Thus, little information was transferred from the reference to the validation group in cross-validation for prediction of dominance effects. Su et al. [11], who analyzed non-additive effects for average daily gain with a dataset of 1911 purebred pigs, observed that the estimates of the additive variance with the additive and dominance models remained fairly constant and that gains in accuracies of predicted breeding values and predicted total genetic values reached only 0.004 and 0.011 with the dominance model. The proportion of fullsibs in the pig dataset was not reported in Su et al. [11] but is expected to be substantially larger than in our cow dataset, which might be the reason for the gain in accuracy of predicted total genetic values with inclusion of dominance in the model. Based on a simulation study, Varona et al. [33] observed that relevant changes in breeding values when switching from an additive to a dominance model were obtained only for animals that had fullsibs or full-sib progeny and little other information. A cow dataset with a larger proportion of full-sibs would contain more information in order to accurately estimate dominance effects but in practice such data is not available. Analysis of full-sib progeny from elite animals, which generally are available, would not be representative for the whole population.
The regression coefficient of YD on predicted breeding values was generally close to 1, with a few exceptions. With the dominance model, this regression coefficient was slightly closer to 1 for most traits but differences were small, which is similar to the data reported by Su et al. [11], i.e. 0.927 and 0.983 with the additive and dominance models, respectively. In our study, the regression coefficient of YD on predicted total genetic values for model MGD was slightly smaller than the regression on predicted breeding values, which agrees with Su et al. [11], but it remained close to the expectation, which means that predictions were unbiased. In general, bias can originate from preferential treatment, unrecognized pre-selection of validation animals, or inappropriate modeling of predictions (i.e. using incorrect variance components).
The results show that selection of matings onĝ instead ofû led to 14.8% (milk yield) and 27.8% (protein yield) greater expected total genetic superiorities and maximized expected productive performance of the offspring. Although the accuracy of estimates of total genetic values was not greater than that of estimates of breeding values, as indicated by the cross-validation results (Table 4), expected total genetic superiority was not impaired by this result because predicted genetic values are best linear unbiased predictions and therefore unbiased expectations [34]. Toro and Varona [9] reported that expected total genetic superiority with optimized mate allocation was 16% greater than with selection on the breeding value only, for a trait with additive and dominance variances amounting to 40 and 10% of the phenotypic variance. Expected additive genetic gain was reduced by only 4.5% for milk yield and by 2.6% for protein yield with selection of matings onĝ instead ofû . Thus, optimization ofĝ of the offspring appears to be feasible without a great loss inû. Our considerations of optimized matings are limited to the first generation offspring. Toro and Varona [9] found that response from assortative mating was only realized in the first generation without any additional response in subsequent generations. Thus, optimization of matings with respect to total genetic value has to be applied in each generation, otherwise the dominance-specific advantage is lost. Toro and Varona [9] pre-selected males and females on their estimated breeding values and then optimized the total genetic value of matings between these pre-selected animals. In our example, only bulls were pre-selected on their conventional breeding value and the optimal bull was determined for each cow based on the expected total genetic value of the offspring. However, the potential of assortative mating to exploit dominance variance optimally by combining mates that are expected to produce offspring with large total genetic values is limited even for these two traits with sizeable dominance variation. This can be caused either by cancellation effects across the genome (i.e., it is extremely unlikely to combine all positive dominance effects) or by a reduced accuracy of the dominance deviation of a mating because of uncertainty about the resulting marker genotypes.

Conclusions
Estimates of genomic variance due to dominance in Fleckvieh cattle ranged from 3 to 50% of the genetic variance and were within the range of published pedigreebased estimates for dairy cattle. The computational complexity and modeling were straightforward. Predictive ability of breeding and total genetic values by crossvalidation was not improved when dominance effects were included in the prediction model, probably because of the limited size of the dataset and the small proportion of full-sibs. There is potential to exploit dominance variance in planned matings in order to increase total genetic value of the offspring (i.e. future performance) without compromising additive genetic gain. Use of planned matings could also be a way to motivate farmers that are otherwise not interested in using genomic breeding values for breeding schemes.