A variance component estimation approach to infer associations between Mendelian polledness and quantitative production and female fertility traits in German Simmental cattle

Background Managing beneficial Mendelian characteristics in dairy cattle breeding programs implies that the correlated genetic effects are considered to avoid possible adverse effects in selection processes. The Mendelian trait polledness in cattle is traditionally associated with the belief that the polled locus has unfavorable effects on breeding goal traits. This may be due to the inferior breeding values of former polled bulls and cows in cattle breeds, such as German Simmental, or to pleiotropic or linkage effects of the polled locus. Methods We focused on a variance component estimation approach that uses a marker-based numerator relationship matrix reflecting gametic relationships at the polled locus to test for direct pleiotropic or linked quantitative trait loci (QTL) effects of the polled locus on relevant traits. We applied the approach to performance, health, and female fertility traits in German Simmental cattle. Results Our results showed no evidence for any pleiotropic QTL effects of the polled locus on test-day production traits milk yield and fat percentage, on the mastitis indicator ‘somatic cell score’, and on several female fertility traits, i.e. 56 days non return rate, days open and days to first service. We detected a significant and unfavorable QTL effect accounting for 6.6% of the genetic variance for protein percentage only. Conclusions Pleiotropy does not explain the lower breeding values and phenotypic inferiority of polled German Simmental sires and cows relative to the horned population in the breed. Thus, intensified selection in the polled population will contribute to increased selection response in breeding goal traits and genetic merit and will narrow the deficit in breeding values for production traits. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-021-00652-z.

horn-related phenotypes, including scurs and polledness, in which the polled locus has a presumed epistatic suppressive function [8].
The broad availability and use of the high-throughput genotyping technology in cattle have enhanced the discovery of new Mendelian genetic characteristics during the past decade [9]. Most of these relevant genetic characteristics are lethal recessive monogenic disorders (e.g. cholesterol deficiency (CDH) [10]) or detrimental recessive haplotypes with unfavorable effects on production and functional traits (e.g. [11]). However, there are a few examples of favorable or beneficial genetic characteristics such as the red factor in Holstein cattle [12] or polledness [3]. Managing the growing number of genetic characteristics is a major challenge in current dairy cattle breeding programs [13,14]. The management of beneficial genetic characteristics implies an increase in the frequency of the desired causative alleles by selection, ultimately until fixation, and the genomic regions that are linked to the causative alleles will also be fixed. Thus, it is important to examine the pleiotropic or linked effects of the desired alleles before aiming for their fixation.
The basis of genetic trait associations is either pleiotropy or linkage [15]. From the perspective of the Mendelian trait polledness, genetic associations with other traits may be caused by direct or linked effects of causative variants for polledness, in analogy to a quantitative trait locus (QTL). Differences in the means of a quantitative trait between genotype groups at a single marker locus is a classic example of a potential pleiotropic QTL effect [15,16]. Lower means for the breeding values of production traits in homo-and heterozygous polled animals, which are in part statistically significant, have been reported for various breeds [17][18][19][20][21]. In contrast, for reproduction traits, neutral or even positive effects have been estimated [17,18,20]. One explanation for breeding value inferiority in polled German Simmental is the history of introgression from Simmental beef populations, which possibly result in conserved chromosomal segments with detrimental effects that are distant from the polled locus [21]. Consequently, this breeding value inferiority could also be caused by genetic drift (i.e. a random increase of less favorable alleles), because all polled animals descend from only a few polled founder animals [21]. To date, simple comparisons of breeding values do not explain the genetic mechanisms that underlie the associations of polledness with other quantitative traits. In this study, our aim was to separate direct QTL effects of the polled locus from associated polygenic effects (i.e., genetic variation in genomic locations that are distant from the polled locus on BTA1 or other chromosomes) by using available genomic and pedigree information.
Variance component (VC) estimation methods that incorporate single or multiple marker genotype information to infer QTL effects at a given chromosomal segment or position appear to be suitable to test for correlated effects of the polled locus [22,23]. In this context, random QTL allele effects can be modelled using either marker-based gametic (hereafter called G v ) or numerator (hereafter called A v ) relationship matrices [24]. These matrices reflect expected identity-by-descent (IBD) relationships between individuals inferred from genotypes at given marker loci, e.g., the polled locus. In contrast to A v , the pedigree-based relationship matrix A reflects the expected IBD relationships across the entire genome that are inferred from known pedigree relationships and is traditionally used in VC models to model the polygenic additive genetic variance. The combination of A v and A in a univariate linear model for a trait of interest, in our case, traits that are potentially affected by pleiotropic effects of the polled locus, allows separation of additive QTL effects of the polled locus from the remaining additive polygenic effects. The outlined VC estimation approach to separate polygenic and single-locus QTL effects has been extended to multivariate models to increase the power of QTL detection [22,25]. When applied to the main question of our study, a bivariate approach including phenotypes for the polled trait will also enable the estimation of the genetic correlations based on A v , which would reflect the direction of the pleiotropic effects.
Our aim was to derive complete polled genotypes in a complex pedigree to be able to calculate the IBD probabilities and to test for pleiotropic or linked QTL effects of the Mendelian polled locus on production and reproduction traits in German Simmental cattle via univariate and bivariate VC estimation.

Phenotypes and genotypes for polledness
The polled status in German Simmental cattle is routinely recorded via a collaboration between farmers and the Bavarian milk recording organization [21]. Only polled animals are specifically registered in the database using the following labels: PP = homozygous polled; Pp = heterozygous polled; PS = heterozygous polled, scurred; P = polled, exact genotype unknown. Animals with a genotype test result are marked with an asterisk (*) that is added to the polled label. Animals without any label in the database are assumed to be horned. Sorting and selection of the polled animals from the German Simmental database was performed in two steps. First, we identified 89 farms for which between 25 and 75% of the animals were registered as polled at the end of 2015, in order to generate an almost balanced sampling design of polled and horned animals in target farms. As the number of newborn polled calves has only recently started to increase [21], we included a second filtering step that selected farms with at least one polled calf born per year during the 2007-2013 period. Thus, we created a final dataset including 24 farms with 2420 polled and horned cows for which phenotypes for production and fertility traits from multiple generations were available. Genotype frequencies prior to the inference of missing and falsely-registered polled genotypes in the final dataset are in Table 1. Based on the 2420 selected cows, the full pedigree included 13,256 animals traced back five generations.

Inference of polled genotypes in the full pedigree to calculate the A v matrix
Available polled labels for ancestors of the selected cows in the final dataset were taken from the literature [17,21,26,27], and extracted from public databases for Simmental and other breeds such as Holstein and Brown Swiss. The method to infer missing and falsely-registered polled genotypes in the full pedigree consisted of different procedures and tests, which were all performed using self-written R [28] functions and scripts (see Additional file 1). The main steps of the algorithm were: (i) iterative reconstruction of missing parent genotypes, (ii) correction of Mendelian inheritance errors, and (iii) derivation of polled genotypes based on progeny genotype statistics. For further detailed information (see Additional file 1). Table 1 gives an overview of the frequencies of genotype labels in the dataset including ancestors after the inference procedures.
A comparison of the unchanged initial genotypes as reported from the routine records and the genotypes after inference showed a significant amount of non-registered (e.g. progeny of gene-or progeny-tested homozygous polled sires or dams) and also falsely-registered (e.g. polled progeny from horned matings) animals (see Table 1) and (see Additional file 3). Hence, in a preliminary study that focused on variance component models for the trait polledness, the inference of polled genotypes greatly improved the model fit compared to the expectations for a Mendelian trait (see Additional file 3).

Calculation of A v
Inferred genotypes at the polled locus from all 13,256 animals in the pedigree were used to compute the probabilities of inheriting the paternal or maternal alleles from the sire and dam at the polled locus, starting from the founders. Identical-by-descent (IBD) probabilities between the alleles at the polled locus of any two founders were assumed to be zero. Genotypes at the polled locus were treated as a tri-allelic marker. Hence, polled alleles of founder animals from breeds that predominantly carry the "Celtic" polled allelic variant such as Simmental, Brown Swiss and most beef breeds are initially coded differently than those of founder animals from breeds that predominantly carry the "Friesian" polled allelic variant, such as Holstein and Jersey [3]. However, this differentiation is only effective for informative matings, thus for uninformative matings both polled allelic variants are treated as the same allele.
The G v matrix based on previously computed IBD probabilities at the polled locus was computed using the algorithm reported by van Arendonk et al. [24] with a self-written R function (see Additional file 2). G v as a gametic relationship matrix has an order twice the number of animals. Therefore, after completing the computation, G v was scaled down to the dimensions of a marker-based numerator relationship matrix A v using the matrix transformation A v = 1 2 KG v K ′ with K = I n ⊗ [1, 1] and n the number of animals. The inverse of A v was computed using the function solve() from the base package in R [28], and was then used in the VC estimations of QTL effects.

Cow traits
Test-day production traits included milk yield (MY), protein percentage (P%), fat percentage (F%) and the health indicator somatic cell score (SCS). In total, 58,262 testday records from lactations 1 to 7 recorded from 2005 to 2015 were included for parameter estimation. Female fertility traits were binary 56 days non-return-rate (NRR-56), days open (DO) and days to first service (DFS) for cows. NRR-56, DFS and DO were calculated from routine insemination data collected from 2004 to 2014. Only the first four lactations were considered for parameter estimation of female fertility traits. Descriptive statistics for all analyzed cow traits are in Table 2.

Statistical models
Variance components for all cow traits were estimated in single trait animal models using the software package DMU and the implemented AI-REML algorithm [29]. The basic linear model without QTL effects ( M Basic ) for production, fertility and health traits was defined as: where y is a vector of cow traits, β is a vector of fixed effects, u is a vector of random additive polygenic effects, p is a vector of random permanent environment effects, and e is a vector of random residual effects. X , Z a and Z p are the incidence matrices for fixed, additive-genetic and permanent environmental effects, respectively. The random effects a , p and e were assumed to be uncorrelated and to follow univariate normal distributions as follows: and e ∼ N m 0, Rσ 2 e , with σ 2 a , σ 2 p and σ 2 e being the polygenic variance, permanent environmental variance and residual variance, respectively. A is the standard additive genetic relationship matrix and I is and identity matrix and R is a known diagonal matrix.
Fixed effects considered in the models for the test-day traits were lactation, herd-test day and calving season. Days pregnant and calving age (linear regression) and days in milk (Legendre polynomials of order 3) were considered as covariates.
A threshold model using a logit link function was defined for the NRR-56 trait without changing the structure of fixed and random effects described above. Fixed effects in the models for the fertility traits were the combined effects of herd-year, type of insemination-year, and lactation-calving age.
The extended linear univariate QTL models ( M QTL ) were defined as: for production and fertility traits, respectively, with the same properties as described for M Basic but adding v , a vector of additive QTL effects with the distribution v ∼ N q 0, A v σ 2 v , and the corresponding incidence matrix Z v .
For the bivariate analyses, we defined the following models based on the basic and extended QTL model as described above. The basic linear models without QTL effects ( M Basic bivariate ) for the quantitative traits and the Mendelian trait polledness were defined as: where y 1 is a vector of phenotypes for the respective quantitative trait, and y 2 is a vector of phenotypes for the Mendelian trait polledness. Based on preliminary studies that focused on the identification of suitable VC models for the analysis of the Mendelian polled trait, we chose numerically-coded genotype labels, which represent the allele content at the polled locus as phenotypes for the trait polledness (i.e., 0 for horned animals, 1 for heterozygous polled animals and 2 for homozygous polled animals). All effect categories were identical to those in the univariate models. Due to the Mendelian inheritance of the polledness trait, environmental effects do not affect the phenotype. Thus, fixed effects were excluded from VC estimation for polledness in the bivariate models.
The extended linear QTL models ( M QTL bivariate ) were defined as: with the same properties as described for the basic models, and adding v , a vector of additive QTL effects with the distribution v ∼ N q 0, A v σ 2 v , and Z v , an incidence y 1 = Xβ + Z a a + Z p p + e, and y 2 = Z a a + e, and y 2 = Z a a + Z v v + e, matrix relating animals to phenotypes as in the univariate QTL models. Based on covariance estimates from the bivariate models, we estimated the genetic correlation based on QTL effects ( r g v ) (i.e., the "monogenic" genetic correlation between the polledness trait and the evaluated quantitative traits), the genetic correlation based on additive polygenic effects ( r g a ) and the phenotypic correlation ( r p ).

Test statistics
The hypothesis test for the presence of pleiotropic QTL effects at the polled locus was based on the asymptotic distribution of the likelihood ratio test (LRT) statistic: where L BASIC and L QTL were the maximized likelihoods under M Basic and M QTL, respectively. The asymptotic distribution of the likelihood ratio test statistic follows a χ 2 -distribution, with the number of degrees of freedom equal to the difference between the number of independent parameters of both models [20]. LRT tests for the presence of pleiotropic QTL-effects at the polled locus were only calculated for the univariate models after VC estimation.
Although the outlined VC approach appears to be straightforward for our research question, to our knowledge no previous studies have proved its applicability using either simulated or real data. As flexible stochastic simulation packages to simulate precise genomic trait architectures are available [30,31], we decided to validate our approach by simulation. Hence, we performed a small preliminary stochastic simulation study prior to the analyses of the real data. The results of the preliminary simulation study are in Additional file 4. Table 2 includes the descriptive statistics for all the evaluated traits separated by polled genotype groups. The means for the production traits clearly reflect the phenotypic inferiority of polled animals, e.g. horned cows (pp) produced 2.71 kg and 4.40 kg more milk on average than polled Pp and PP animals, respectively. In contrast, the means for SCS and fertility traits reflect an advantage of the polled compared to the horned cows. Standard deviations were in similar ranges across all traits and groups.

Results
Standard errors of the heritabilities for all production and fertility traits from both models M Basic and M QTL were quite small and in the range from 0.018 to 0.049 for the univariate and from 0.014 to 0.048 for the bivariate models (see Tables 3 and 4). Estimates of the permanent environmental variances, residual variances and Table 3 Variance components for test-day production traits and SCS in the real dataset Corresponding variance components for the polledness trait from the bivariate models are included in Additional file 5 MY: milk yield; F: fat, P: protein; SCS: somatic cell score σ 2 a = additive genetic variance based on A ; σ 2 v = additive genetic variance based on A v ; σ 2 PE = permanent environment variance; σ 2 e = residual variance; QTL − h 2 = QTL heritability calculated as σ 2 v /σ 2 a + σ 2 v + σ 2 PE + σ 2 e ; h 2 (SE) = overall heritability and standard error (in brackets) calculated as σ 2 v + σ 2 a /σ 2 a + σ 2 v + σ 2 PE + σ 2 e ; LRT p (λ) = p-and lambda values from likelihood ratio tests With regard to the M QTL applications, we detected a statistically significant QTL effect of the polled locus that contributed up to 6.6% of the genetic variance and 2.3% of the phenotypic variance for P%, respectively. However, the estimated QTL effect in the corresponding bivariate model was substantially smaller. The genetic correlation based on the estimated direct QTL effect was slightly negative (see Table 5).
The estimated direct QTL effects of the polled locus on production traits MY, F% and on SCS were consistently low and not significant. Comparison of the univariate with the bivariate QTL models indicates that the QTL effects from the bivariate models are smaller for the moderately heritable production traits. All the estimates of QTL effects for the fertility traits were non-significant and those from the corresponding bivariate models were considerably smaller, thus following the same trend as for the production traits.
The estimates for phenotypic and genetic correlations (Table 5) support the very small and non-significant estimated direct QTL effects of the polled locus. The genetic correlations estimated from the direct effect of the polled locus ( r g v ) reflect no antagonistic but rather neutral relationships with test-day production traits. This is also the case for P% for which we detected a small QTL effect in the univariate models. Genetic correlations based on direct effects of the polled locus with SCS and all the fertility traits were favorable from a breeding perspective. Interestingly, the genetic correlations estimated from the separated polygenic effects based on the pedigree ( r g a ) were close to zero for all traits, which further disprove any antagonistic relationships associated with the polled status.  Table 5 Phenotypic and genetic correlations between polledness, test-day production traits, SCS and fertility traits Standard errors for the genetic correlations were calculated based on heritability estimates and corresponding standard errors according to [15] r P : phenotypic correlation calculated based on variance estimates from the QTL models; r g v (SE): genetic correlation and standard errors (in brackets) calculated based on variance estimates for v ; r g a (SE): genetic correlation and standard errors (in brackets) calculated based on variance estimates for a *Since the full bivariate model including A and A v for the trait DFS did not fully converge, we present the results for a model excluding A in the model for polledness; r g a . cannot be estimated from this model. The results for the preliminary simulation study showed a general suitability of our approach for the detection of predefined QTL effects (see Additional file 4). However, the estimated QTL effects for the simulated trait with a low heritability were statistically non-significant, in contrast to the results for the simulated moderately heritable trait. Interestingly, the bivariate QTL models generally performed better than the univariate models and showed the highest accuracy to detect the simulated QTL effects. The estimates of the heritability of polledness obtained with the bivariate models for the numerically-coded allele content phenotype were close to 1, and thus close to the expected value of 1 for a monogenic Mendelian trait (see Additional file 5).

Discussion
The descriptive statistics in our dataset are in line with the reported inferiority of polled animals for production traits and the benefits for health and fertility traits [17,21]. The moderate estimates of the heritability obtained for the test-day production traits MY, F% and P% and the low estimate of the heritability for test-day SCS are in line with previously reported estimates in the German Simmental population [32,33]. The low estimates for heritability of fertility traits are also in line with those found in the literature for the German and Austrian Simmental populations [34,35].
Estimates of the variance component for the production traits, SCS and the female fertility traits revealed no significant QTL effects, apart from a moderately large contribution (i.e. ~ 5%) of the polled locus to the genetic variance of P%. In consequence, we refuted any concerns for possible unfavorable effects of the polled locus on production, udder health and female fertility traits in German Simmental cattle. At first sight, this contrasts with previous studies in the Simmental population based on the comparison of breeding values between polled and horned sires [17,21]. However, to our knowledge, our study is the first attempt to separate direct pleiotropic effects of the polled locus from other negative polygenic effects present in polled families.
Based on our results, there is no general antagonistic or detrimental effect of the polled alleles on productivity, udder health and female fertility. The formerly general and presently partial inferiority of polled animals is most likely due to genetic drift caused by a small number of polled Simmental founders or ancestors from beef type cattle with inferior genetic values in dairy traits. Strong associations due to pleiotropy or tight linkage would imply unidirectional effects across breeds that are unaffected by selection over time. In contrast, associations due to genetic drift have a random nature and can be altered by selection and recombination. Thus, the closing gap in production traits between polled and horned animals in Simmental [21] and Holstein [36] due to selection clearly points to genetic drift as the reason for the initial breeding value inferiority of polled animals. This is further supported by the striking analogy that polled animals in the Simmental and Holstein breeds both descend from very small inferior polled founder pools [21,36]. Recently published results in beef cattle breeds that are under intensive selection for polledness since the last 20 years provide additional evidence that there are no systematic detrimental effects on growth and carcass traits across breeds in this regard [37]. Therefore, the remaining breeding value inferiority of polled sires can be overcome by selective breeding, i.e., continued targeted mating of superior young polled sires with superior horned dams while eliminating polled selection candidates. Nonetheless, linkage based on long-range LD might also contribute partly to the remaining deficit in production traits of polled animals, even if no direct pleiotropic effects exist.
Our results also give further support to previously published simulation results on breeding strategies for polledness (e.g. [13,14,[38][39][40][41]), since all the studies implicitly ignored potential persisting pleiotropic effects. In summary, a moderate selection intensity for polledness of genomically-selected sire candidates appears to be the best compromise to narrow down the gap in production traits even more between polled and horned animals in the population while preserving populationwide maximal genetic gain [38,39]. In German Simmental, highly intensified selection, especially among cows, which for example could be based on an index incorporating polledness, involves the risk of losing genetic gain because of the current inferiority of polled animals [13,14,38]. Recent simulation studies that applied novel gene-editing methods showed the potential of this technology to potentially overcome such a loss of genetic gain [42,43], and polledness is one of the first traits for which the technology was successfully applied in cattle [44][45][46].
It should be noted that selective breeding for polledness in German Simmental is an ongoing dynamic process that has significantly changed and still changes the frequency of the polled allele in the population. Therefore, given that the estimates of QTL effect and variance depend on allele frequency, we recommend monitoring the effects of the polled locus in the future, when the proportion of polled animals has substantially increased. In addition, although non-significant, the estimated QTL effects for NRR-56, DFS and DO reflect significant proportions of the overall estimated genetic variance for each of these traits. This may be due to a lack of power to accurately estimate the effects of the polled locus on these traits, and a larger sample size should be used in future studies.
Nonetheless, we were able to show that a relationship matrix based on polled genotypes ( A v ) allowed us to estimate direct QTL effects of the polled trait. However, with the broad availability of imputed data on polled genotypes from commercially-used SNP chips [3,47], we are confident that larger datasets from polled animals will become broadly available to further validate our results. Certainly, in addition to this methodological aspect, broad genotyping and genomic selection will also practically improve selection of superior polled animals.
Given that the number of polled animals in the population is constantly growing and that young animals are much more frequently genotyped in the current genomic breeding schemes, it will therefore be much easier to analyze extensive datasets in German Simmental cattle and other breeds by using the methodological framework developed here. In this regard, explicitly modelling and dissecting more precisely the potentially diverging effects of the known polled variants such as P c and P f based on genotype data could help to answer the remaining questions concerning different family effects in association with the polled trait [18,48].

Conclusions
Our results reveal no direct pleiotropic or linked QTL effects of the polled locus on the studied production traits MY and F%, on the udder health indicator SCS and on the female fertility traits, NRR56, DO and DFS, in German Simmental cattle. Only one statistically significant direct QTL effect of the polled locus on P%, was detected. Thus, further selection on polledness is not expected to result in negative side effects on breeding goal traits. Based on our results, we conclude that any remaining inferiority of polled cows and bulls will be reduced by increasing the dissemination of the polled alleles and by intensive simultaneous selection on breeding goal traits and polled genotypes in the German Simmental population.