Skip to main content

Inbreeding depression across the genome of Dutch Holstein Friesian dairy cattle

Abstract

Background

Inbreeding depression refers to the decrease in mean performance due to inbreeding. Inbreeding depression is caused by an increase in homozygosity and reduced expression of (on average) favourable dominance effects. Dominance effects and allele frequencies differ across loci, and consequently inbreeding depression is expected to differ along the genome. In this study, we investigated differences in inbreeding depression across the genome of Dutch Holstein Friesian cattle, by estimating dominance effects and effects of regions of homozygosity (ROH).

Methods

Genotype (75 k) and phenotype data of 38,792 cows were used. For nine yield, fertility and udder health traits, GREML models were run to estimate genome-wide inbreeding depression and estimate additive, dominance and ROH variance components. For this purpose, we introduced a ROH-based relationship matrix. Additive, dominance and ROH effects per SNP were obtained through back-solving. In addition, a single SNP GWAS was performed to identify significant additive, dominance or ROH associations.

Results

Genome-wide inbreeding depression was observed for all yield, fertility and udder health traits. For example, a 1% increase in genome-wide homozygosity was associated with a decrease in 305-d milk yield of approximately 99 kg. For yield traits only, including dominance and ROH effects in the GREML model resulted in a better fit (P < 0.05) than a model with only additive effects. After correcting for the effect of genome-wide homozygosity, dominance and ROH variance explained less than 1% of the phenotypic variance for all traits. Furthermore, dominance and ROH effects were distributed evenly along the genome. The most notable region with a favourable dominance effect for yield traits was on chromosome 5, but overall few regions with large favourable dominance effects and significant dominance associations were detected. No significant ROH-associations were found.

Conclusions

Inbreeding depression was distributed quite equally along the genome and was well captured by genome-wide homozygosity. These findings suggest that, based on 75 k SNP data, there is little benefit of accounting for region-specific inbreeding depression in selection schemes.

Background

Inbreeding depression refers to the decrease in mean performance with increased levels of inbreeding [1]. Many important traits in dairy cattle show inbreeding depression [2,3,4,5]. For example, a 1% increase in pedigree-based inbreeding is associated with a decrease in 305-day milk yield of 20 to 38 kg and with an increase in calving interval of 0.2 to 0.7 days [6,7,8]. The reduction in mean performance is believed to be caused by the increase in homozygosity associated with inbreeding, reducing the expression of dominance effects [1, 9]. When dominance effects are on average favourable (i.e. when there is directional dominance in the favourable direction), their reduced expression results in a lower phenotypic performance. Not all genomic loci are expected to contribute equally to inbreeding depression. The expected contribution of a locus depends on both its dominance effect (higher with larger dominance effect) and its allele frequency (higher at intermediate allele frequencies) [1, 9]. Interactions between loci, i.e. epistasis, may play a role in explaining inbreeding depression as well. However, epistasis is difficult to prove and difficult to account for in statistical models. Therefore, epistasis is typically ignored. When epistasis is ignored, the change in mean phenotypic performance due to inbreeding equals \(-F{\sum }_{i}{2p}_{i}{q}_{i}{d}_{i}\), where \(F\) is the genome-wide inbreeding coefficient, \({d}_{i}\) is the statistical dominance effect at locus \(i\), and \({p}_{i}\) and \({q}_{i}\) are the allelic frequencies [1].

The increasing availability of single nucleotide polymorphism (SNP) data enables the study of inbreeding depression along the genome. SNPs are expected to capture effects of quantitative trait loci (QTL) in linkage disequilibrium (LD) with the SNPs. Traditionally, single SNP genome-wide association studies (GWAS) have been conducted to identify significant dominance (and additive) associations [10,11,12]. In such studies, one SNP is fitted at a time and typically a pedigree-based or genomic relationship matrix is included to account for population structure and prevent inflation of type I errors (e.g. [12]). With approaches that are more novel, all SNP effects can be estimated simultaneously. For example, additive genetic and dominance relationship matrices can be computed [13] and these matrices can be fitted in a mixed model using genomic best linear unbiased prediction (GBLUP) to estimate additive genetic and dominance effects, after which additive and dominance effects of single SNPs can be obtained through back-solving (e.g. [14]). Since variance components need to be estimated, genomic residual maximum likelihood (GREML) can be used, based on the same mixed model, yielding estimates of random effects and variance components simultaneously, whereas GBLUP assumes that variances are known [15]. Benefits of GREML SNP solutions, over those from a single SNP GWAS, are that all SNP effects are estimated simultaneously (i.e. accounting for other SNPs in LD) and that effects are regressed towards the mean depending on information in the data.

In addition to the estimation of dominance effects, there is an increasing interest in the use of regions of homozygosity (ROH) to quantify inbreeding and inbreeding depression [2, 4, 9, 16]. When compared to homozygosity of individual SNPs, ROH may better capture (region-specific differences in) inbreeding depression. This is due to two reasons. First, as a multi-locus measure of LD, ROH are expected to better capture the probability that QTL located in between the SNPs of the ROH are homozygous. When many subsequent SNPs are homozygous, i.e. are in ROH, it is very likely that the loci between those SNPs are also homozygous. Homozygosity of individual SNPs is expected to be less predictive of the homozygosity at QTL, because of the strong dependence on the LD between the QTL and the individual SNPs (e.g. [45]). Second, ROH capture more recent inbreeding, and recent inbreeding is expected to be more harmful than old inbreeding, although empirical results do not always support this hypothesis [2, 8]. In a simulation study, Keller et al. [17] found that, among the inbreeding measures that they investigated, ROH-based inbreeding performed best in capturing the homozygous inbreeding load. Martikainen et al. [4] estimated the effect of ROH-based inbreeding on fertility traits in Finnish Ayrshire cattle, first per chromosome and then within chromosomes using a sliding window approach. Pryce et al. [2] performed a single SNP GWAS to study the effect of ROH on yield traits and calving interval in Australian Holstein and Jersey cattle. In their approach, the ROH-status of a SNP was set to 1 when the SNP was in a ROH (irrespective of which ROH), and to 0 otherwise [2]. Ferenčaković et al. [16] performed a similar analysis for sperm quality traits in Austrian Fleckvieh bulls. Although these studies did report candidate regions associated with inbreeding depression, they did not consider how much of the total phenotypic variation was explained by ROH effects (in relation to additive and dominance effects).

The objective of this study was to estimate dominance and ROH effects across the genome for Dutch Holstein Friesian dairy cattle and to estimate the contribution of these effects to the phenotypic variance. For various yield, fertility and udder health traits, first we ran GREML models to estimate the effect of genome-wide homozygosity and estimate the amount of variance attributable to additive, dominance and ROH effects. Then, we obtained individual SNP effects through back-solving. We also performed a single SNP GWAS to estimate additive, dominance and ROH effects per SNP and compared GWAS estimates with those obtained from the GREML approach.

Methods

Data

In total, 38,792 first-parity cows (fraction Holstein Friesian > 87.5%, either red or black), which calved in the period 2012–2016 in 233 herds, were included. The same dataset was used as in Doekes et al. [8]. Genotype and phenotype data were provided by the Dutch-Flemish cattle improvement co-operative (CRV; Arnhem, the Netherlands). Cows were genotyped with the Illumina BovineSNP50 BeadChip (v1 and v2) or CRV custom-made 60 k Illumina panel (v1 and v2). Genotypes were imputed to approximately 76 k, following Druet et al. [18]. The 75,538 SNPs used by Doekes et al. [8] were remapped to the ARS-UCD1.2 assembly, using the NAGRP Data Repository [19] and the NCBI Genome Remapping Service [20]. The final dataset comprised 75,377 successfully remapped SNPs. The distribution of the allelic frequencies of these SNPs was approximately uniform [see Additional file 1: Figure S1].

Phenotypic data included yield, fertility and udder health traits. For yield, the 305-day milk yield (MY; in kg), 305-day fat yield (FY; in kg) and 305-day protein yield (PY; in kg) were included. For fertility, the calving interval (CI; in days), interval calving to first insemination (ICF; in days), interval first to last insemination (IFL; in days) and conception rate (CR; in %) were included. For udder health, the mean somatic cell scores for day 5 through to 150 (SCS150; in units) and day 151 through to 400 (SCS400; in units) were included. Somatic cell scores were calculated as 1000 + 100*[log2 of cells/mL]. Descriptive statistics for the different traits are reported by Doekes et al. [8].

Identification of ROH

Regions of homozygosity (ROH) were identified with the Plink 2.0 software [21]. The following criteria were used to define a ROH: (i) a minimum physical length of 1 Mb, (ii) a minimum of 15 SNPs, (iii) a minimum density of 1 SNP per 100 kb, (iv) a maximum of 1 heterozygous call within a ROH, and (v) a maximum gap of 500 kb between two consecutive SNPs. Since genotypes were imputed to 76 k, there were no missing genotypes. A scanning window of 15 SNPs was used, with a maximum of 1 heterozygote call per window. The Plink command was plink --cow --homozyg --homozyg-density 100 --homozyg-gap 500 --homozyg-het 1 --homozyg-kb 1000 --homozyg-snp 15 --homozyg-window-het 1 --homozyg-window-snp 15. The use of criteria such as a maximum gap of 500 kb will have resulted in some SNPs having a lower probability of being in a ROH (e.g. there were 66 gaps of > 500 kb), but will also have reduced the number of false positive ROH. The SNP density was relatively uniform along the genome (see Additional file 1: of Doekes et al. [22]).

Statistical models

Additive, dominance and ROH effects were estimated with two approaches: (i) a GREML model with back-solving, and (ii) a single SNP GWAS. For both approaches, the classical (“statistical”) parametrization was used for additive and dominance effects, which implies among others that additive effects were calculated as allele substitution effects (see Vitezica et al. [13]).

GREML with back-solving

GREML models were used to estimate all SNP effects simultaneously and to estimate variance components. For each trait, three models were run in mtg2 [23]: one with only additive effects (A), one with additive and dominance effects (AD), and one with additive, dominance and ROH effects (ADR). Model A was:

$${\mathbf{y}} = {\mathbf{Xb}} + {\mathbf{Qc}} + {\mathbf{u}} + {\mathbf{e}}, \quad \quad \quad {\text{Model A}}$$

where \(\mathbf{y}\) is a vector of phenotypes; \(\mathbf{X}\) is an incidence matrix that related the observations to fixed effects; \(\mathbf{b}\) is a vector of fixed effects that included herd of calving (233 levels), year of calving (5 levels), season of calving (4 levels, defined as the four quarters of a year), age at calving (as linear covariate) and genome-wide SNP homozygosity (as linear covariate, to account for genome-wide inbreeding depression), the latter being calculated as the proportion of homozygous SNPs; \(\mathbf{Q}\) is an incidence matrix that related observations to random herd-year-season effects; \(\mathbf{c}\) is a vector of random herd-year-season effects (4596 levels), which were assumed to be distributed as \(\mathbf{c}\sim N(\mathbf{0},\mathbf{I}{\sigma }_{HYS}^{2})\), with \({\sigma }_{HYS}^{2}\) being the herd-year-season variance and \(\mathbf{I}\) an identity matrix; \(\mathbf{u}\) is a vector of random polygenic additive effects (i.e. breeding values), which were assumed to be distributed as \(\mathbf{u}\sim N(\mathbf{0},\mathbf{G}{\sigma }_{A}^{2})\), with \(\mathbf{G}\) being the genomic relationship matrix and \({\sigma }_{A}^{2}\) the additive genetic variance; and \(\mathbf{e}\) is a vector of random residuals, which were assumed to be distributed as \(\mathbf{e}\sim N(\mathbf{0},\mathbf{I}{\sigma }_{E}^{2})\), with \(\mathbf{I}\) being an identity matrix and \({\sigma }_{E}^{2}\) the residual variance.

Model A was extended to model AD by adding a dominance term:

$${\mathbf{y}} = {\mathbf{Xb}} + {\mathbf{Qc}} + {\mathbf{u}} + {\mathbf{v}} + {\mathbf{e}},\quad \quad \quad {\text{Model AD}}$$

where \(\mathbf{v}\) is a vector of random polygenic dominance deviations, which were assumed to be distributed as \(\mathbf{v}\sim N(\mathbf{0},\mathbf{D}{\sigma }_{D}^{2})\), with \(\mathbf{D}\) being the dominance relationship matrix and \({\sigma }_{D}^{2}\) the dominance variance.

Model AD was further extended to model ADR by adding a ROH term:

$${\mathbf{y}} = {\mathbf{Xb}} + {\mathbf{Qc}} + {\mathbf{u}} + {\mathbf{v}} + {\mathbf{w}} + {\mathbf{e}},\quad \quad \quad {\text{Model ADR}}$$

where \(\mathbf{w}\) is a vector of random polygenic ROH deviations, which were assumed to be distributed as \(\mathbf{w}\sim N(\mathbf{0},\mathbf{R}{\sigma }_{ROH}^{2})\), with \(\mathbf{R}\) being a ROH-based relationship matrix and \({\sigma }_{ROH}^{2}\) the ROH variance.

The additive genomic relationship matrix (\(\mathbf{G}\)) was computed with calc_grm [24], according to VanRaden [25]:

$$\mathbf{G}=\frac{\mathbf{Z}{\mathbf{Z}}^{\mathbf{^{\prime}}}}{{\sum }_{i}2{p}_{i}{q}_{i}}$$

where \({p}_{i}\) is the allele frequency of allele A at SNP \(i\), \({q}_{i}\) the allele frequency of allele B at SNP \(i\) and \(\mathbf{Z}\) is the additive marker covariate matrix with elements of \(-2{p}_{i}\)\(1-2{p}_{i}\), and \(2-2{p}_{i}\) for genotypes BB, AB, and AA, respectively.

The dominance relationship matrix (\(\mathbf{D}\)) was computed with calc_grm [24], according to Vitezica et al. [13]:

$$\mathbf{D}=\frac{{\mathbf{H}\mathbf{H}}^{\mathbf{^{\prime}}}}{{\sum }_{i}{\left(2{p}_{i}{q}_{i}\right)}^{2}},$$

where \(\mathbf{H}\) is the dominance marker covariate matrix with elements of\(-2{{p}_{i}}^{2}\), \(2{p}_{i}{q}_{i}\), \(-2{{q}_{i}}^{2}\) for genotypes BB, AB, and AA, respectively.

The ROH-based relationship matrix (\(\mathbf{R}\)) was introduced in this study to quantify the effect of a SNP being in a ROH (irrespective of which ROH). \(\mathbf{R}\) was computed as a cross-product matrix:

$$\mathbf{R}=\frac{\mathbf{M}{\mathbf{M}}^{\mathbf{^{\prime}}}}{{\sum }_{i}{p}_{i}^{*}{q}_{i}^{*}},$$

where \({p}_{i}^{*}\) is the frequency of SNP \(i\) being in a ROH, \({q}_{i}^{*}\) is the frequency of SNP \(i\) not being in a ROH, and \(\mathbf{M}\) is the ROH marker covariate matrix with elements of \(1-{p}_{i}^{*}\) for being in a ROH and of \(0-{p}_{i}^{*}\) for not being in ROH. Hence, element \({R}_{jk}\) could be interpreted as the (centered and scaled) probability that animals \(j\) and \(k\) both have a ROH at a random SNP. Note that the ROH does not have to be the exact same ROH for both animals. For example, if animal \(j\) has a ROH ranging from the 10th to the 40th SNP, and animal \(k\) has a ROH ranging from the 20th to the 45th SNP, then both animals “share” a ROH from the 20th SNP to the 40th SNP. Even if animals “share” a ROH at the same location, the ROH does not have to be identical. Namely, animal \(j\) could have a ROH with allele 1, and animal \(k\) with allele 2. It is important to note that the \(\mathbf{R}\) matrix is different from the segment-based (or haplotype-based) relationship matrix that was defined by De Cara et al. [26], which is sometimes also referred to as a ROH-based relationship matrix. In our study, \(\mathbf{R}\) indicates whether animals are inbred at the same genomic positions, i.e. have ROH at the same genomic positions, whereas the segment-based relationship matrix indicates the probability that, if you mated two animals, the offspring would carry ROH. A more detailed explanation of how \(\mathbf{R}\) was calculated, including a numerical example, is provided in [Additional file 2: Tables S1 and S2].

Goodness-of-fit of Models A, AD and ADR were compared using maximum likelihood (ML) ratio tests. Test statistics were defined as two times the difference between the maximum log likelihood of a reduced model (e.g. Model A) and that of a full model (e.g. Model AD). Approximate P-values were calculated as \(0.5(1-\mathrm{Pr}\left({\chi }_{1}^{2}\le T\right))\), where \(T\) was the test statistic.

Variance components were obtained from the mtg2 output. Relative variance components and corresponding standard errors were calculated using the delta method in mtg2 [23]. For example, the relative dominance variance was calculated as \({\sigma }_{D}^{2}/{\sigma }_{P}^{2}\), where \({\sigma }_{P}^{2}\) is the phenotypic variance (which excluded \({\sigma }_{HYS}^{2}\)).

To estimate additive effects (\(\widehat{{\varvec{\upalpha}}}\)), dominance effects (\(\widehat{\mathbf{d}}\)) and ROH effects (\(\widehat{\mathbf{r}}\)) per SNP, the polygenic additive effects (\(\widehat{\mathbf{u}}\)), dominance deviations (\(\widehat{\mathbf{v}}\)) and ROH deviations (\(\widehat{\mathbf{w}}\)) were back-solved using the compute SNP-effects program of calc_grm [24], according to:

$$\widehat{{\varvec{\upalpha}}}=\frac{{\mathbf{Z}}^{\mathbf{^{\prime}}}{\mathbf{G}}^{-1}\widehat{\mathbf{u}}}{{\sum }_{i}2{p}_{i}{q}_{i}},$$
$$\widehat{\mathbf{d}}=\frac{{\mathbf{H}}^{\mathbf{^{\prime}}}{\mathbf{D}}^{-1}\widehat{\mathbf{v}}}{{\sum }_{i}{\left(2{p}_{i}{q}_{i}\right)}^{2}},$$
$$\widehat{\mathbf{r}}=\frac{{\mathbf{M}}^{\mathbf{^{\prime}}}{\mathbf{R}}^{-1}\widehat{\mathbf{w}}}{{\sum }_{i}{p}_{i}^{*}{q}_{i}^{*}},$$

where all parameters are defined as before. Note that for additive and dominance effects, \({p}_{i}\) and \({q}_{i}\) were allelic frequencies, whereas for ROH effects \({p}_{i}^{*}\) and \({q}_{i}^{*}\) were the frequencies of a SNP being in a ROH or not. The back-solving procedure was verified, by recalculating polygenic effects from the back-solved SNP effects.

Note that the dominance deviations (\(\widehat{\mathbf{v}}\)) and dominance SNP effects (\(\widehat{\mathbf{d}}\)) did not include directional dominance, because the mean dominance was already absorbed by the fixed regression on genome-wide homozygosity. The mean dominance effect across loci can be calculated as \(-b/{N}_{SNP}\), where \(b\) is the regression coefficient for genome-wide homozygosity and \({N}_{SNP}\) is the number of SNPs [27, 28]. In this study, we report the \({\sigma }_{D}^{2}\) and the dominance effects as obtained from GREML and back-solving output (thus, excluding mean dominance). However, we also computed the mean dominance effect (i.e. \(-b/{N}_{SNP}\)) and investigated the effect of correcting \({\sigma }_{D}^{2}\) for this mean dominance effect (see Discussion).

Single SNP GWAS

A single SNP GWAS was performed to estimate additive, dominance and ROH effects as fixed effects per SNP. For this purpose, GREML Model A was extended by adding a fixed additive, dominance and ROH effect at a specific SNP. For each SNP, the following model was run with Snappy [29] in Wombat [30]:

$$\mathbf{y}=\mathbf{j}\alpha +\mathbf{k}d+\mathbf{l}r+\mathbf{X}\mathbf{b}+\mathbf{Q}\mathbf{c}+\mathbf{u}+\mathbf{e},$$

where \(\mathbf{j}\) is a vector with allele counts (coded as 0, 1, and 2 for genotypes BB, AB, and AA); \(\alpha\) is the additive effect; \(\mathbf{k}\) is a vector with heterozygosity status (coded as 0, 1, and 0 for genotypes BB, AB, and AA); \(d\) is the dominance effect; \(\mathbf{l}\) is a vector with ROH status (coded as 1 when the SNP was in a ROH, or 0 otherwise); and \(r\) is the ROH-effect. The other parameters are defined as in GREML Model A.

Solutions and t-statistics were obtained from the output, and corresponding P-values were computed. Genomic inflation was assessed using QQplots and genomic inflation factors. The latter were computed as the ratio of the observed median χ2 statistic over the expected median of the χ2 distribution under the null hypothesis [31]. To account for multiple testing, P-values were adjusted with the p.adjust() function in R by applying the approach of Benjamini and Hochberg [32]. A genome-wide false discovery rate (FDR) of 10% was used as a threshold to declare associations as being significant.

Results

Homozygosity and ROH-coverage along the genome

Genome-wide SNP homozygosity of cows approximately followed a normal distribution with a mean of 64.4% and a standard deviation of 1.0% (Fig. 1a). In total, 3,910,969 ROH were identified. As expected, these ROH followed approximately an exponential distribution in terms of length (Fig. 1b), with short ROH being more abundant than long ROH. The frequency of a SNP being in a ROH was on average 11.5% [for distribution, see Additional file 1: Figure S2] and differed along the genome (Fig. 1c). Chromosomes 10, 16 and 20 had the highest ROH-frequency. The highest local peak was observed on chromosome 1, with a ROH-frequency of up to 63.3%. Sixty-two SNPs were never in a ROH. These SNPs were mostly located at the start or end of chromosomes.

Fig. 1
figure1

Summary statistics of SNP homozygosity and regions of homozygosity (ROH) across all cows: (a) Distribution of genome-wide SNP homozygosity; (b) Distribution of ROH length; (c) Frequency of each SNP being in a ROH by genomic position

The homozygosity status and ROH status partly overlapped. Of all SNPs across all individuals, 11.3% were both homozygous and in a ROH, 53.1% were homozygous but not in a ROH, 0.2% were heterozygous and in a ROH, and the remaining 35.4% were heterozygous and not in a ROH.

Genome-wide inbreeding depression from GREML models

Genome-wide homozygosity had an unfavourable effect on all evaluated traits and across all GREML models (Table 1). For example, a 1% increase in homozygosity in Model A was associated with a decrease in 305-d milk yield of 99.6 kg (SE = 5.2), an increase in calving interval of 1.1 days (SE = 0.4) and an increase in SCS400 of 2.3 units (SE = 0.7). These unfavourable effects of genome-wide homozygosity reflect the presence of (favourable) directional dominance. For example, the mean dominance effect of a SNP in Model A was 0.13 kg for milk yield, − 0.0015 days for calving interval and − 0.0030 units in SCS400 (Table 1). Estimated effects of genome-wide inbreeding depression were similar across Models A, AD and ADR.

Table 1 Effect of a 1% increase in genome-wide homozygosity (\(b\)), and mean dominance effect per SNP (\(-b/{N}_{SNP}\)), for three models and nine traits

Variance components and goodness of fit of GREML models

Additive genetic variance was observed for all traits (Table 2). In Model A, heritability estimates ranged from 2.36% (SE = 0.32%) for conception rate to 41.16% (SE = 0.81%) for milk yield. Heritability estimates were approximately identical across Models A, AD and ADR.

Table 2 Estimated variance components for three GREML models and nine traits, with standard errors in parentheses

In Model AD, 0.8 to 0.9% of the phenotypic variance for yield traits and less than 0.4% of the phenotypic variance for all other traits was attributable to dominance. When expressed as part of the total genetic variance, dominance variance explained on average 2.36% of the genetic variance in Models AD (with a range of 0.07 to 5.24% across traits). The small contribution of dominance was also reflected by the goodness-of-fit of the different models. When moving from Model A to AD, the maximum log likelihood increased significantly (P < 0.05) only for yield traits (Table 3). For these yield traits, the maximum log likelihood further increased (P < 0.05) when moving to Model ADR. In Model ADR, the relative ROH variance for yield traits was approximately 0.2% (Table 2), while the relative dominance variance was lower than that in Model AD (i.e. 0.5% instead of 0.8%).

Table 3 Comparison of goodness–of-fit of different GREML models for nine traits

The herd-year-season variance (data not shown) was similar across Models A, AD and ADR and was highest for yield traits (6.7% to 9.8% of total variance) and for the interval between calving and first insemination (5.8% of total variance). The latter trait is known to be strongly influenced by farmers’ decision.

Comparison of GREML and GWAS effects

Estimated additive, dominance and ROH effects from back-solving in GREML and from single SNP GWAS models were approximately normally distributed with a mean of zero (Fig. 2). The range and standard deviation of GWAS effects were substantially larger than those of GREML effects. For example, additive effects for milk yield estimated by GWAS ranged from − 1069 to 1020 kg with a standard deviation of 36.6 kg, whereas those estimated by GREML ranged from − 25.7 to 17.8 kg with a standard deviation of 1.1 kg. The difference between GWAS and GREML estimates was larger for dominance and ROH effects than for additive effects. For example, dominance effects for milk yield estimated by GWAS ranged from − 1038 to 1120 kg with a standard deviation of 34.1 kg, whereas those estimated by GREML ranged from − 0.25 to 0.25 kg with a standard deviation of 0.05 kg.

Fig. 2
figure2

Distributions of SNP effects for 305-day milk yield (kg), estimated by GREML and single SNP GWAS. The mean (\(\bar{x}\)) and standard deviation (\(sd\)) of the effects are shown. Note that distributions were truncated such that the first and last bar represent “smaller than” and “bigger than” classes (i.e. the range was larger than shown here). Also note that the dominance effects shown here do not include the mean dominance effect that was absorbed by the fixed regression on genome-wide homozygosity

There was a moderate correlation between SNP effects estimated by GREML and GWAS. For milk yield, for example, this correlation was 0.50 for additive effects, 0.40 for dominance effects and 0.79 for ROH effects. For additive and dominance effects, many SNPs had large (absolute) GWAS effects but a GREML effect close to zero (Fig. 3).

Fig. 3
figure3

Scatterplots comparing SNP effects for 305-day milk yield (kg) estimated by GREML and single SNP GWAS. The dashed line is a linear trendline. The regression equation corresponding to this line and the Pearson correlation coefficient (R) are shown. Note that the dominance effects shown here do not include the mean dominance effect that was absorbed by the fixed regression on genome-wide homozygosity

Additive effects across the genome

Estimated additive genetic effects followed a similar pattern as that reported for other GWAS. Manhattan plots of SNP effects for yield traits, obtained by back-solving from the GREML ADR Models, are in Fig. 4. As expected, SNPs with the largest additive effects for yield traits were located between 0 and 1 Mb on chromosome 14, surrounding the DGAT1 gene [33]. The effects in this region were antagonistic, i.e. alleles that were favorable for milk and protein yields were unfavorable for fat yield. Two other regions had major additive effects on all yield traits, one on chromosome 5 (approximately 88.2 to 88.5 Mb), surrounding the ABCC9 gene [33], and one on chromosome 6 (near 87 Mb), surrounding the GC gene [33]. For protein yield, there was an additional peak on chromosome 6 (approximately 85.4 to 85.7 Mb), which included the casein cluster, i.e. CSN1, CSN2 and CSN3 [33]. The abovementioned additive peaks also passed the 10% FDR threshold in the GWAS (Fig. 5). Genomic inflation factors of the GWAS for additive and dominance effects were all lower than 1.1, suggesting that there was no major inflation of P-values for these effects [see Additional file 3: Figure S3].

Fig. 4
figure4

Additive, dominance and ROH effects for yield traits, estimated by GREML (model ADR) with back-solving. MY: 305-day milk yield (kg); FY: 305-day fat yield (kg); PY: 305-day protein yield (kg). Effects were multiplied by 100 and divided by the genetic standard deviation (\({\sigma }_{g}\)) of the corresponding trait. Note that the dominance effects shown here do not include the mean dominance effect that was absorbed by the fixed regression on genome-wide homozygosity

Fig. 5
figure5

Statistical significance of additive, dominance and ROH effects for yield traits from single SNP GWAS. MY: 305-day milk yield; FY: 305-day fat yield; PY: 305-day protein yield. The horizontal red line is a threshold based on 10% false-discovery rate (absence of this line implies that all effects were below the threshold). The y-axis for MY additive effects was truncated at 40; in the peak on chromosome 14, there were 6 SNPs with a -log10(P-value) ranging from 40 to 94

The peaks of GREML additive effects were less pronounced for fertility and udder health traits than for yield traits [see Additional file 4: Figure S4]. In addition, for fertility and udder traits, fewer SNPs showed a significant additive association in the GWAS [see Additional file 5: Figure S5]. The most notable region with significant associations was a region on chromosome 19 for SCS400. This region consisted of two narrow subpeaks (one around 54.6 Mb and one around 55.3 Mb). For the interval between calving and first insemination (ICF), various significant additive associations were detected in the GWAS. The most notable region, which was also identified by the GREML approach, was on chromosome 28 (near 35.8 Mb).

Dominance and ROH effects along the genome

GREML-based dominance effects showed less pronounced peaks than additive effects (Fig. 4) [see also Additional file 4: Figure S4]. In the GWAS, there were also fewer significant dominance associations than significant additive associations (Fig. 5) [see also Additional file 5: Figure S5].

For yield traits, the most notable region with large favorable dominance effects in the GREML and with significant dominance associations in the GWAS was located on chromosome 5 (Figs. 4 and 5). This region was rather wide, with significant associations between 13 and 40 Mb and the largest effects between 24 and 28 Mb [see Additional file 6: Figures S6 and S7]. In addition to the region on chromosome 5, two other peaks passed the 10% FDR in the GWAS, one near DGAT1 for milk and fat yields and one on chromosome 23 (near 25.2 Mb) for milk yield.

For fertility and udder health traits, there were very few significant dominance associations in the GWAS (Additional file 5). The only significant SNPs were found for the interval between calving and first insemination (ICF) and these SNPs did not cluster in peaks.

ROH effects showed many narrow peaks for all traits, both with GREML and GWAS (Figs. 4 and 5) [see also Additional file 4: Figure S4 and Additional file 5: Figure S5]. However, none of the ROH effects in the GWAS passed the 10% FDR.

Discussion

The objective of this study was to obtain a better understanding of (differences in) inbreeding depression across the genome of Dutch Holstein Friesian dairy cattle. To fulfil this objective, first we estimated genome-wide inbreeding depression and estimated additive, dominance and ROH variance components with GREML models. Then, we investigated dominance and ROH effects along the genome for yield, fertility and udder health traits, using GREML (with back-solving) and a single SNP GWAS.

Genome-wide inbreeding depression

Genome-wide SNP homozygosity had an unfavourable effect on all traits, indicating the presence of directional dominance (Table 1). The estimated effects were comparable in size to those previously reported for similar traits in Holstein Friesian dairy cattle [2, 3, 8]. When comparing inbreeding depression estimates across studies, it is important to consider the variance of underlying inbreeding measures [34]. For example in this study, the effect of a 1% increase in SNP homozygosity on milk yield (of approximately − 99 kg) may seem larger than the effect of a 1% increase in ROH-based inbreeding that we previously reported (of approximately − 36 kg [8]), but this difference can be largely explained by the different scale of the inbreeding measures. In this study, SNP homozygosity had a mean of 64.4% and a standard deviation of 1.0%, whereas ROH-based inbreeding in our previous study had a mean of 12.3% and a standard deviation of 2.7% [8]. Therefore, a 1% increase in SNP homozygosity captures a larger effect at the population level than a 1% increase in ROH-based inbreeding. To illustrate this effect of scale, previously we compared the phenotypes of highly inbred cows and lowly inbred cows and showed that different inbreeding measures may result in similar inbreeding depression at the population level, in spite of the difference in estimated regression coefficients [8].

Dominance and ROH variance

Estimates of dominance and ROH variances were small and differed significantly from zero for yield traits only (Tables 2 and 3). Dominance and ROH variances explained less than 1% of the phenotypic variance, and less than 5% of the genetic variance. When we ran models excluding dominance, i.e. Models AR, the ROH variance also explained less than 1% of the phenotypic variance [see Additional file 7: Table S3]. The maximum log-likelihoods of Models AR were significantly higher than those of Models A for yield traits only [see Additional file 7: Table S4].

Many other studies have estimated dominance variance in Holstein Friesian dairy cattle. Here, we use “relative dominance variance” for the ratio of dominance variance over phenotypic variance. Note that in the literature the term “dominance heritability” is also sometimes used for the same ratio [14, 35, 36]. Estimates of relative dominance variance based on pedigree relationship matrices in Holstein cattle typically range from 1 to 5% [37,38,39,40], although a few studies suggest a larger contribution of dominance effects [41, 42]. Estimates based on genomic relationship matrices are similar to, or slightly higher, than pedigree-based estimates (Table 4).

Table 4 Estimates of relative dominance variance from various studies that used genomic relationship matrices

Our relative dominance variance estimates tend to be a bit lower than most estimates from the literature. One explanation is that we corrected for genome-wide inbreeding depression in our GREML models (as discussed in the next section). Another reason for low dominance variance may be the limited SNP density that we used, although most other studies used similar densities (Table 4). It is well known that the additive variance captured by a SNP depends on \({r}^{2}\) (with \(r\) being the correlation between the SNP and a QTL), while the dominance variance captured by the SNP depends on \({r}^{4}\) [46]. In other words, the detection of dominance effects relies more on high LD than detection of additive effects. Thus, the detection of dominance effects would benefit substantially from a higher SNP density. In addition to SNP density and the inclusion of a regression on genome-wide homozygosity, many other factors may explain differences in relative dominance variance across studies. These factors include differences in trait definition, differences in the way phenotypes are (pre)corrected for fixed effects, differences in how the dominance relationship matrix is calculated and population-specific differences [13, 47].

In this study, we introduced a ROH-based relationship matrix to estimate a ROH-based variance component. For yield traits, Model ADR showed a better fit than Model AD (Table 3), which suggests some benefit of including ROH effects. This benefit appeared to be due to an overall redistribution of variance components and could not be easily explained. In fact, the proportion of variance explained by dominance in Model AD was slightly higher than the combined proportion of variance explained by dominance and ROH effects in Model ADR (Table 2). The dominance variance from the AD model appeared to be split over dominance and ROH components in Model ADR, suggesting that dominance and ROH effects partly captured the same variation (which might be due to collinearity between the \(\mathbf{D}\) and \(\mathbf{R}\) matrices, see discussion on Orthogonality of \(\mathbf{G}\), \(\mathbf{D}\) and \(\mathbf{R}\)). However, differences in variance components across models were small and may have also been due to random sampling.

Accounting for directional dominance when estimating dominance variance with GREML

In our GREML models, we corrected for genome-wide inbreeding depression by including a fixed regression on genome-wide SNP homozygosity. This correction is important to ensure that the model assumptions of a mean dominance effect of zero (\(\mathrm{E}\left[\mathrm{v}\right]=0\)) and of no covariance between additive and dominance effects (\(\mathrm{cov}[\mathrm{u},\mathrm{v}]=0\)) hold, and to prevent the dominance variance from being inflated [10, 28, 47]. Indeed, when we removed the fixed regression on genome-wide homozygosity from Model AD, the relative dominance variance for yield traits increased to approximately 3% (as compared to 0.8%), which are values similar to those reported by Aliloo et al. [47]. In addition, the mean back-solved dominance effect was no longer zero, but slightly favourable. The mean back-solved dominance effect for milk yield, for example, was 0.05 kg when not accounting for genome-wide homozygosity (as compared to 0.0005 kg when accounting for genome-wide homozygosity). Note that this 0.05 kg is smaller than the 0.13 kg from the fixed regression on genome-wide homozygosity (Table 1), which may be explained by shrinkage on the mean dominance when it was part of the random effect.

When a fixed regression on genome-wide SNP homozygosity is included in an AD model, the mean dominance effect across all loci is absorbed by this regression [27, 28]. Consequently, the \({\sigma }_{D}^{2}\) of such models is expected to be underestimated. Namely, the \({\sigma }_{D}^{2}\) of such models captures only the deviations of dominance effects (\({d}_{i}\)) at individual loci from the mean dominance effect (\(\bar{d}\)) across all loci, \({\sigma }_{D}^{2}={\sum }_{i}{(2{p}_{i}{q}_{i}({d}_{i}-\bar{d}))^{2}},\) while the full dominance variance equals \({\sigma }_{{D}_{FULL}}^{2}={\sum }_{i}{\left(2{p}_{i}{q}_{i}{d}_{i}\right)^{2}}\). Thus, to obtain \({\sigma }_{{D}_{FULL}}^{2}\), a component related to the mean dominance effect across all loci should be added to the \({\sigma }_{D}^{2}\) from the GREML output. This additional component can be derived as:

$${\sigma }_{{D}_{FULL}}^{2}={\sum }_{i}{\left(2{p}_{i}{q}_{i}{d}_{i}\right)^{2}}$$
$$={\sum }_{i}{\left(2{p}_{i}{q}_{i}({d}_{i}-\bar{d})+2{p}_{i}{q}_{i}\bar{d} \right)^{2}}$$
$$={\sum }_{i}{(2{p}_{i}{q}_{i}({d}_{i}-\bar{d}))^{2}}+{\sum }_{i}{(2{p}_{i}{q}_{i}\bar{d})^{2}}$$
$$= {\sigma }_{D}^{2}+N\overline{{(2pq)}^{2}}{\bar{d}}^{2},$$

where \({\sigma }_{{D}_{FULL}}^{2}\) is the full dominance variance, \({\sigma }_{D}^{2}\) is the dominance variance obtained from the GREML output, \(N\) is the number of SNPs, \(\overline{{(2pq)}^{2}}\) is the mean squared expected heterozygosity, and \({\bar{d}}^{2}\) is the squared mean dominance effect, where \(\bar{d}\) can be obtained from the regression on genome-wide homozygosity. Note that, in the third line of the derivation above, a cross-product has disappeared because \(\sum ({d}_{i}-\bar{d})=0.\)

To quantify the difference between \({\sigma }_{D}^{2}\) and \({\sigma }_{{D}_{FULL}}^{2}\), we calculated \({\sigma }_{{D}_{FULL}}^{2}\) for Model AD, applying the reasoning above. The additional component, \(N\overline{{(2pq)}^{2}}{\bar{d}}^{2}\), was relatively small compared to \({\sigma }_{D}^{2}\). For milk yield, for example, \(N\overline{{(2pq)}^{2}}{\bar{d}}^{2}\) equalled 189 kg2, whereas \({\sigma }_{D}^{2}\) equalled 10,377 kg2. Consequently, the relative dominance variance increased only marginally when accounting for the additional \(N\overline{{(2pq)}^{2}}{\bar{d}}^{2}\) component (e.g. from 0.77 to 0.78% for milk yield).

In Model ADR, it was assumed that ROH effects were distributed as \({\sim N(\mathbf{0},\mathbf{R}\sigma }_{ROH}^{2}\)). This assumption may not hold, because of a potential average genome-wide ROH-effect being different from zero (similar to the genome-wide dominance effect). However, since genome-wide SNP homozygosity and genome-wide ROH coverage (the \({F}_{ROH}\)) are highly correlated [8], we expected that the inclusion of genome-wide homozygosity would largely correct for a genome-wide ROH effect. In Model ADR, the means of the back-solved ROH effects were approximately zero (e.g. − 0.0007 for milk yield), suggesting that the fixed effect for genome-wide homozygosity indeed removed the mean ROH effect.

Orthogonality of \(\mathbf{G}\), \(\mathbf{D}\) and \(\mathbf{R}\)

In the GREML models in this study, the \(\mathbf{G}\) and \(\mathbf{D}\) were parameterized according to Vitezica et al. [13], ensuring orthogonality between these matrices. \(\mathbf{R}\) was constructed using VanRaden’s method 1 [25], with ROH-status of SNPs as input instead of genotype status. Since the ROH-status of a SNP partly depends on its genotype status (e.g. a SNP typically must be homozygous to be in a ROH), some collinearity was expected between \(\mathbf{R}\) on the one hand, and \(\mathbf{G}\) and \(\mathbf{D}\) on the other hand. To check for collinearity, correlations between off-diagonals of the different relationship matrices were calculated. These correlations equalled 0.03 between \(\mathbf{G}\) and \(\mathbf{D}\), 0.18 between \(\mathbf{G}\) and \(\mathbf{R}\), and 0.44 between \(\mathbf{D}\) and \(\mathbf{R}\). Correlations between variance component estimates were also calculated, using the average information matrix from Models ADR, and equalled approximately 0 between \({\sigma }_{G}^{2}\) and \({\sigma }_{D}^{2}\), 0 between \({\sigma }_{G}^{2}\) and \({\sigma }_{ROH}^{2}\), and 0.41–0.48 between \({\sigma }_{D}^{2}\) and \({\sigma }_{ROH}^{2}\) [see Additional file 8: Table S5]. This suggests that there was moderate collinearity between \(\mathbf{R}\) and \(\mathbf{D}\) and that it was difficult to disentangle the dominance and ROH variance components.

Although the moderate collinearity between \(\mathbf{D}\) and \(\mathbf{R}\) may have complicated the partitioning of variance components, the ranking of estimated SNP effects did not seem to be affected. Correlations between back-solved dominance effects of Models AD and ADR were above 0.998 for all traits. Similarly, correlations between additive effects of Models A, AD and ADR were above 0.999 for all traits.

Our objective was to investigate dominance and ROH effects along the genome, and the ranking of these effects did not seem to be affected by collinearity. Therefore, we believe that the collinearity had no effect on the results and conclusions of this study and decided not to perform a precorrection for orthogonality in our study. If similar analyses were to be performed in the future, orthogonality of \(\mathbf{G}\), \(\mathbf{D}\) and \(\mathbf{R}\) could be ensured by a pre-correction of the indicator matrix underlying \(\mathbf{R}\) (i.e. the \(\mathbf{M}\) in this study), making it independent from the indicator matrices underlying \(\mathbf{G}\) (i.e. \(\mathbf{Z}\)) and \(\mathbf{D}\) (i.e. \(\mathbf{H}\)). This could be achieved by linear regression. For each SNP \(i\), the corresponding vector in \(\mathbf{M}\) (say \({\mathrm{M}}_{\mathrm{i}}\)) could be regressed on the corresponding vectors of \(\mathbf{Z}\) (i.e. \({\mathrm{Z}}_{\mathrm{i}}\)) and \(\mathbf{H}\) (i.e. \({\mathrm{H}}_{\mathrm{i}}\)). The residuals of this linear regression could then be used as the new indicator variable (say \({\mathrm{M}}_{\mathrm{i}}^{*}\)) to build a corrected \(\mathbf{R}\)-matrix. By virtue of least squares, the \({\mathrm{M}}_{\mathrm{i}}^{*}\) would be orthogonal to \({\mathrm{Z}}_{\mathrm{i}}\) and \({\mathrm{H}}_{\mathrm{i}}\).

SNP effects estimated by GREML and single SNP GWAS

In this study, we estimated SNP effects with two approaches: GREML with back-solving and a single SNP GWAS. It is important to note that these approaches are equivalent when they are used for association mapping and when the models are strictly identical. As shown mathematically by Bernald Rubio et al. [48], the test statistics of GREML and of single SNP GWAS are equivalent if models are identical. These test statistics can be calculated as the estimates of effects divided by their standard deviation [48]. In this study, we did not compare the test statistics of the GREML and GWAS models, because of two software-related reasons: (1) the \(\mathbf{D}\) and \(\mathbf{R}\) matrices were not fitted in the single SNP GWAS, introducing a difference between GREML and GWAS models, and (2) the standard deviations of GREML effects were not available.

Although we did not compare test statistics, we did compare the effects of SNPs. Estimated effects of SNPs were much larger for the single SNP GWAS than for GREML and correlations between the effects of both approaches were moderate (Figs. 2 and 3). This can be explained by the fact that a single SNP GWAS estimates the effects for one SNP at a time as fixed effect without shrinkage, whereas GREML estimates the effects of all SNPs together as random effect with shrinkage. The magnitude of shrinkage depends on the standard error of the estimate of the effect, which in turn depends on the amount of data and the variance of the associated factor. There is more shrinkage for a factor that explains a smaller proportion of the variance [49]. This explains why the effects of GREML are so much smaller than those of the single SNP GWAS (Figs. 2 and 3), especially for dominance effects (which explained little variance) and ROH effects (which explained even less variance). Shrinkage can also explain why there were various SNPs with a large absolute additive and dominance effect for GWAS, but with a close to zero additive and dominance effect for GREML (Fig. 3). These SNPs all had a low minor allele frequency (MAF). For additive effects, the degree of shrinkage at a SNP (given a fixed sample size) depends on \(2pq\) [49]. When we manually shrunk additive effects from the GWAS by multiplying them with \(2pq\), the outliers indeed disappeared and the correlation between additive effects of GWAS and GREML increased from 0.50 to 0.86 (Fig. 6).

Fig. 6
figure6

Scatterplot comparing additive effects for 305-day milk yield (kg) estimated by GREML and by GWAS with manual shrinkage. The GWAS effects were manually shrunk by multiplying them with \(2pq\). The dashed line is a linear trendline. The regression equation corresponding to this line and the Pearson correlation coefficient (R) are shown

Region-specific inbreeding depression

We found limited evidence for region-specific inbreeding depression based on 75 k SNP data for yield, fertility and udder health traits. For yield traits, we only found a few regions with large favourable dominance effects from GREML and significant dominance associations in the GWAS (Figs. 4 and 5). These regions were similar to those identified in previous studies. For example, in a recent large-scale GWAS with approximately 300 k US Holsteins and 60 k SNP data, Jiang et al. [11] also found the most significant dominance effects for yield traits on chromosome 5 between 24 and 28 Mb. The second most significant dominance peak for milk yield that they found was located on chromosome 23 (near 18 Mb). The latter peak was also observed in an earlier GWAS with 43 k Holsteins [43]. The peak that we observed on chromosome 23 (near 25 Mb) was not exactly at the same location, but close to the previously reported peak. We also found a significant dominance peak near the DGAT1 gene on chromosome 14 for milk and fat yields. Significant dominance effects of DGAT1 on milk and fat yields have been reported before [11, 50, 51]. For non-yield traits, we found very few significant dominance associations for fertility traits and no significant dominance associations for SCS, also similar to findings of Jiang et al. [11].

We detected no significant GWAS associations for ROH effects. Pryce et al. [2], in contrast, reported various candidate regions associated with inbreeding depression for yield traits and calving interval based on a single SNP GWAS for ROH status in US Holsteins. This may be explained by two differences in the approach used. First, we estimated the additive, dominance and ROH effects simultaneously, whereas Pryce et al. [2] had no dominance effect in the model. Second, Pryce et al. [2] used a threshold of -log10(P-value) of 3 to identify candidate regions and mentioned that the FDR was high. When we applied a threshold -log10(P-value) of 3 in the GWAS, we indeed found various significant ROH peaks for all traits (Fig. 5) and [see Additional file 5 Figure S5]. Some of these peaks were favourable, potentially indicating signatures of selection, whereas others were unfavourable, potentially indicating inbreeding depression. Also note that genomic inflation factors were approximately 1.3 for ROH effects [see Additional file 3 Figure S3], suggesting substantial inflation. Because the ROH effects did not pass the 10% FDR, we decided not to correct for this inflation.

In this and other studies in which a single SNP GWAS for ROH-status was performed [2, 16], the ROH-status of a SNP was set to 1 when the SNP was in a ROH, irrespective of which ROH it concerned. As a result, the estimated ROH-effect of a SNP was a pooled effect of many distinct ROH. This approach is in line with the reasoning behind inbreeding depression, namely that any homozygosity decreases performance (irrespective for which allele). However, it may be of interest to know which specific ROH is unfavourable. Fine-mapping of individual ROH effects is not straightforward due to the large number of distinct ROH (each having a low frequency). One possibility is to group ROH based on common core regions and then try to associate ROH-groups with the phenotype, as is sometimes done in humans (e.g. [52].). Alternatively, one could test each individual ROH, e.g. using the heuristic approach introduced by Howard et al. [53]. In the approach of Howard et al. [53], the mean phenotype of individuals with a specific ROH is compared to the mean phenotype of individuals without that ROH. A limitation of this approach is that it is computationally intensive, in spite of the various filtering criteria that can be used (such as minimum ROH-frequency). Consequently, the feasibility for datasets with many traits and large numbers of individuals is limited. Recently, this approach has been applied to smaller datasets (less than 10,000 individuals) in swine [53] and Canadian Holstein cattle [54], and the estimated effects reported by these studies are rather large. For example, the average effect of unfavourable ROH identified by Marras et al. [54] for 305-d milk yield in first parity cows, was -295 kg with a standard deviation of 105 kg. These effects are likely to be overestimated, because of statistical biases similar to those in a single SNP GWAS. In addition, there may be many false negatives due to the initial filtering steps and the use of significance thresholds to account for multiple testing. Despite these limitations, the identified unfavourable ROH-haplotypes and their effects could be used to build an inbreeding load matrix (ILM), which provides some information on the expected inbreeding load of the offspring of a particular mating [53]. This could be valuable in mating programs but is of less importance for selection schemes.

An important observation in this study was that dominance (and ROH) effects shrunk substantially when the fixed regression on homozygosity was included in GREML models. This was also observed by Aliloo et al. [47]. As a result, ROH and dominance variances were small (< 1% of phenotypic variance) and only significant for yield traits. This suggests that, after correcting for genome-wide inbreeding depression, fitting dominance and ROH effects (based on 75 k data) had little additional value. Overall, our findings suggest that, based on 75 k SNP data, the deleterious effect of homozygosity is quite evenly distributed across the genome and is well captured by genome-wide homozygosity in Holstein–Friesian dairy cattle. Based on these findings, there is little benefit of accounting for region-specific inbreeding depression in selection schemes.

Conclusions

Inbreeding depression was observed for yield, fertility and udder health traits in Dutch Holstein Friesian cattle. However, after correcting for genome-wide homozygosity dominance and ROH effects explained very little variance in GREML models. A few regions with relatively large favourable dominance effects and significant dominance associations (based on 10% FDR) were identified for yield traits, based on both GREML and single SNP GWAS. Overall, however, inbreeding depression appeared to be distributed quite equally along the genome and was well captured by genome-wide homozygosity.

Availability of data and materials

All information supporting the results is included in the text, figures, tables and Additional files. The dataset is not publicly available due to commercial restrictions.

References

  1. 1.

    Falconer DS, Mackay TFC. Introduction to quantitative genetics. 4th ed. Harlow: Longman Group Ltd; 1996.

    Google Scholar 

  2. 2.

    Pryce JE, Haile-Mariam M, Goddard ME, Hayes BJ. Identification of genomic regions associated with inbreeding depression in Holstein and Jersey dairy cattle. Genet Sel Evol. 2014;46:71.

    Article  CAS  Google Scholar 

  3. 3.

    Bjelland DW, Weigel KA, Vukasinovic N, Nkrumah JD. Evaluation of inbreeding depression in Holstein cattle using whole-genome SNP markers and alternative measures of genomic inbreeding. J Dairy Sci. 2013;96:4697–706.

    CAS  Article  Google Scholar 

  4. 4.

    Martikainen K, Sironen A, Uimari P. Estimation of intrachromosomal inbreeding depression on female fertility using runs of homozygosity in Finnish Ayrshire cattle. J Dairy Sci. 2018;101:11097–107.

    CAS  Article  Google Scholar 

  5. 5.

    Mc Parland S, Kearney F, Berry DP. Purging of inbreeding depression within the Irish Holstein-Friesian population. Genet Sel Evol. 2009;41:16.

    Article  Google Scholar 

  6. 6.

    Biffani S, Samoré A, Canavesi F. Inbreeding depression for production, reproduction and functional traits in Italian Holstein cattle. In Proceedings of the 7th World Congress on genetics applied to livestock production: 19–23 August 2002; Montpellier; 2002, pp. 183–186.

  7. 7.

    Mc Parland S, Kearney J, Rath M, Berry D. Inbreeding effects on milk production, calving performance, fertility, and conformation in Irish Holstein-Friesians. J Dairy Sci. 2007;90:4411–9.

    CAS  Article  Google Scholar 

  8. 8.

    Doekes HP, Veerkamp RF, Bijma P, de Jong G, Hiemstra SJ, Windig JJ. Inbreeding depression due to recent and ancient inbreeding in Dutch Holstein-Friesian dairy cattle. Genet Sel Evol. 2019;51:54.

    Article  CAS  Google Scholar 

  9. 9.

    Howard JT, Pryce JE, Baes C, Maltecca C. Invited review: Inbreeding in the genomics era: inbreeding, inbreeding depression, and management of genomic variability. J Dairy Sci. 2017;100:6009–24.

    CAS  Article  Google Scholar 

  10. 10.

    Bolormaa S, Pryce JE, Zhang Y, Reverter A, Barendse W, Hayes BJ, et al. Non-additive genetic variation in growth, carcass and fertility traits of beef cattle. Genet Sel Evol. 2015;47:26.

    Article  Google Scholar 

  11. 11.

    Jiang J, Ma L, Prakapenka D, VanRaden PM, Cole JB, Da Y. A large-scale genome-wide association study in US Holstein cattle. Front Genet. 2019;10:412.

    CAS  Article  Google Scholar 

  12. 12.

    MacLeod I, Hayes B, Savin K, Chamberlain A, McPartlan H, Goddard M. Power of a genome scan to detect and locate quantitative trait loci in cattle using dense single nucleotide polymorphisms. J Anim Breed Genet. 2010;127:133–42.

    CAS  Article  Google Scholar 

  13. 13.

    Vitezica ZG, Varona L, Legarra A. On the additive and dominant variance and covariance of individuals within the genomic selection scope. Genetics. 2013;195:1223–30.

    Article  Google Scholar 

  14. 14.

    Aliloo H, Pryce JE, González-Recio O, Cocks BG, Hayes BJ. Accounting for dominance to improve genomic evaluations of dairy cows for fertility and milk production traits. Genet Sel Evol. 2016;48:8.

    Article  CAS  Google Scholar 

  15. 15.

    Calus MPL, Goddard ME, Wientjes YCJ, Bowman PJ, Hayes BJ. Multibreed genomic prediction using multitrait genomic residual maximum likelihood and multitask Bayesian variable selection. J Dairy Sci. 2018;101:4279–94.

    CAS  Article  Google Scholar 

  16. 16.

    Ferenčaković M, Sölkner J, Kapš M, Curik I. Genome-wide mapping and estimation of inbreeding depression of semen quality traits in a cattle population. J Dairy Sci. 2017;100:4721–30.

    Article  CAS  Google Scholar 

  17. 17.

    Keller MC, Visscher PM, Goddard ME. Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data. Genetics. 2011;189:237–49.

    Article  Google Scholar 

  18. 18.

    Druet T, Schrooten C, de Roos APW. Imputation of genotypes from different single nucleotide polymorphism panels in dairy cattle. J Dairy Sci. 2010;93:5443–54.

    CAS  Article  Google Scholar 

  19. 19.

    ARS-UCD1.2 bovine genome assembly. https://www.animalgenome.org/repository/cattle/UMC_bovine_coordinates/. Accessed 15 Sept 2019.

  20. 20.

    NCBI. Genome Remapping Service. https://www.ncbi.nlm.nih.gov/genome/tools/remap/. Accessed 15 Sept 2019.

  21. 21.

    Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.

    CAS  Article  Google Scholar 

  22. 22.

    Doekes HP, Veerkamp RF, Bijma P, Hiemstra SJ, Windig JJ. Trends in genome-wide and region-specific genetic diversity in the Dutch-Flemish Holstein-Friesian breeding program from 1986 to 2015. Genet Sel Evol. 2018;50:15.

    Article  Google Scholar 

  23. 23.

    Lee SH, Van der Werf JH. MTG2: an efficient algorithm for multivariate linear mixed model analysis based on genomic information. Bioinformatics. 2016;32:1420–2.

    CAS  Article  Google Scholar 

  24. 24.

    Calus MPL, Vandenplas J. Calc_grm—a programme to compute pedigree, genomic, and combined relationship matrices. Wageningen: Wageningen University & Research Animal Breeding and Genomics; 2013.

    Google Scholar 

  25. 25.

    VanRaden PM. Efficient methods to compute genomic predictions. 2008;91:4414–23.

    CAS  Google Scholar 

  26. 26.

    de Cara MÁR, Villanueva B, Toro MÁ, Fernández J. Using genomic tools to maintain diversity and fitness in conservation programmes. Mol Ecol. 2013;22:6091–9.

    Article  Google Scholar 

  27. 27.

    Xiang T, Christensen OF, Vitezica ZG, Legarra A. Genomic evaluation by including dominance effects and inbreeding depression for purebred and crossbred performance with an application in pigs. Genet Sel Evol. 2016;48:92.

    Article  Google Scholar 

  28. 28.

    Varona L, Legarra A, Toro MA, Vitezica ZG. Non-additive effects in genomic selection. Front Genet. 2018;9:78.

    Article  Google Scholar 

  29. 29.

    Meyer K, Tier B. “SNP Snappy”: a strategy for fast genome-wide association studies fitting a full mixed model. Genetics. 2012;190:275–7.

    CAS  Article  Google Scholar 

  30. 30.

    Meyer K. WOMBAT—a tool for mixed model analyses in quantitative genetics by restricted maximum likelihood (REML). J Zhejiang Univ Sci B. 2007;8:815–21.

    Article  Google Scholar 

  31. 31.

    Price AL, Zaitlen NA, Reich D, Patterson N. New approaches to population stratification in genome-wide association studies. Nat Rev Genet. 2010;11:459–63.

    CAS  Article  Google Scholar 

  32. 32.

    Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol. 1995;57:289–300.

    Google Scholar 

  33. 33.

    EMBL-EBI. e!Ensemble: Cow (ARS-UCD1.2). www.ensembl.org/Bos_taurus/. Accessed 9 Mar 2020.

  34. 34.

    Kardos M, Nietlisbach P, Hedrick PW. How should we compare different genomic estimates of the strength of inbreeding depression? Proc Natl Acad Sci USA. 2018;115:E2492–3.

    CAS  Article  Google Scholar 

  35. 35.

    Sun C, VanRaden PM, Cole JB, O’Connell JR. Improvement of prediction ability for genomic selection of dairy cattle by including dominance effects. PLoS One. 2014;9:e103934.

    Article  CAS  Google Scholar 

  36. 36.

    Da Y, Wang C, Wang S, Hu G. Mixed model methods for genomic prediction and variance component estimation of additive and dominance effects using SNP markers. PLoS One. 2014;9:e87666.

    Article  CAS  Google Scholar 

  37. 37.

    Miglior F, Burnside EB, Kennedy BW. Production traits of Holstein cattle: estimation of nonadditive genetic variance components and inbreeding depression. J Dairy Sci. 1995;78:1174–80.

    CAS  Article  Google Scholar 

  38. 38.

    Miglior F, Burnside EB, Dekkers JC. Nonadditive genetic effects and inbreeding depression for somatic cell counts of Holstein cattle. J Dairy Sci. 1995;78:1168–73.

    CAS  Article  Google Scholar 

  39. 39.

    Kawahara T, Gotoh Y, Yamaguchi S, Suzuki M. Variance component estimates with dominance models for milk production in Holsteins of Japan using method R. Asian-Australas J Anim Sci. 2006;19:769–74.

    Article  Google Scholar 

  40. 40.

    Van Tassell CP, Misztal I, Varona L. Method R estimates of additive genetic, dominance genetic, and permanent environmental fraction of variance for yield and health traits of Holsteins. J Dairy Sci. 2000;83:1873–7.

    Article  Google Scholar 

  41. 41.

    Tempelman RJ, Burnside EB. Additive and nonadditive genetic variation for production traits in Canadian Holsteins. J Dairy Sci. 1990;73:2206–13.

    Article  Google Scholar 

  42. 42.

    Hoeschele I. Additive and nonadditive genetic variance in female fertility of Holsteins. J Dairy Sci. 1991;74:1743–52.

    CAS  Article  Google Scholar 

  43. 43.

    Jiang J, Shen B, O’Connell JR, VanRaden PM, Cole JB, Ma L. Dissection of additive, dominance, and imprinting effects for production and reproduction traits in Holstein cattle. BMC Genomics. 2017;18:425.

    Article  CAS  Google Scholar 

  44. 44.

    Alves K, Brito LF, Baes CF, Sargolzaei M, Robinson JAB, Schenkel FS. Estimation of additive and non-additive genetic effects for fertility and reproduction traits in North American Holstein cattle using genomic information. J Anim Breed Genet. 2020;137:316–30.

    CAS  Article  Google Scholar 

  45. 45.

    Mao X, Sahana G, Johansson AM, Liu A, Ismael A, Løvendahl P, et al. Genome-wide association mapping for dominance effects in female fertility using real and simulated data from Danish Holstein cattle. Sci Rep. 2020;10:2953.

    CAS  Article  Google Scholar 

  46. 46.

    Zhu Z, Bakshi A, Vinkhuyzen AA, Hemani G, Lee SH, Nolte IM, et al. Dominance genetic variation contributes little to the missing heritability for human complex traits. Am J Hum Genet. 2015;96:377–85.

    CAS  Article  Google Scholar 

  47. 47.

    Aliloo H, Pryce J, González-Recio O, Cocks B, Goddard M, Hayes B. Including nonadditive genetic effects in mating programs to maximize dairy farm profitability. J Dairy Sci. 2017;100:1203–22.

    CAS  Article  Google Scholar 

  48. 48.

    Bernal Rubio YL, Gualdron Duarte JL, Bates RO, Ernst CW, Nonneman D, Rohrer GA, et al. Meta-analysis of genome-wide association from genomic prediction models. Anim Genet. 2016;47:36–48.

    CAS  Article  Google Scholar 

  49. 49.

    Gianola D. Priors in whole-genome regression: the Bayesian alphabet returns. Genetics. 2013;194:573–96.

    CAS  Article  Google Scholar 

  50. 50.

    Bovenhuis H, Visker MHPW, van Valenberg HJF, Buitenhuis AJ, van Arendonk JAM. Effects of the DGAT1 polymorphism on test-day milk production traits throughout lactation. J Dairy Sci. 2015;98:6572–82.

    CAS  Article  Google Scholar 

  51. 51.

    Kuehn C, Edel C, Weikard R, Thaller G. Dominance and parent-of-origin effects of coding and non-coding alleles at the acylCoA-diacylglycerol-acyltransferase (DGAT1) gene on milk production traits in German Holstein cows. BMC Genet. 2007;8:62.

    Article  CAS  Google Scholar 

  52. 52.

    Yang TL, Guo Y, Zhang JG, Xu C, Tian Q, Deng HW. Genome-wide survey of runs of homozygosity identifies recessive loci for bone mineral density in Caucasian and Chinese populations. J Bone Miner Res. 2015;30:2119–26.

    CAS  Article  Google Scholar 

  53. 53.

    Howard JT, Tiezzi F, Huang Y, Gray KA, Maltecca C. A heuristic method to identify runs of homozygosity associated with reduced performance in livestock. J Anim Sci. 2017;95:4318–32.

    CAS  Article  Google Scholar 

  54. 54.

    Marras G, Howard J, Martin P, Fleming A, Alves K, B.; M, Schenkel F, Miglior F, Maltecca C, Baes CF. Identification of unfavourable homozygous haplotypes associated with with milk and fertility traits in Holsteins. In Proceedings of the 11th World Congress on Genetics Applied to Livestock Production: 11–16 February 2018; Auckland; 2018.

Download references

Acknowledgements

The authors gratefully acknowledge the Dutch-Flemish cattle improvement cooperative (CRV) for providing pedigree and genotype data. The authors would also like to thank Mario Calus (Animal Breeding and Genomics, Wageningen University & Research) and the anonymous editor and reviewers for their valuable input.

Funding

The research leading to these results has been conducted as part of the IMAGE project, which received funding from the European Union's Horizon 2020 Research and Innovation Program under the grant agreement no 677353. The study was co-funded by the Dutch Ministry of Agriculture, Nature and Food Quality (KB-34-013-002).

Author information

Affiliations

Authors

Contributions

HPD and PB conceived and designed the experiments; HPD performed the analyses and prepared the manuscript; HPD, PB, RFV, GDJ, YCJW and JJW participated in the interpretation of results and revision of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Harmen P. Doekes.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Figure S1.

Allele frequency distribution of SNPs, shown as number of SNPs per minor allele frequency (MAF) class of 1% (e.g. from 0 to 1%). Figure S2. ROH frequency distribution of SNPs, shown as number of SNPs per ROH frequency class of 1% (e.g. from 0 to 1%). The last class includes all SNPs with a ROH frequency above 35%.

Additional file 2: Table S1.

Construction of the R matrix for the ADR-model: SNP coding for ROH status and ROH-frequency per SNP and example of genotypes and ROH-status for 30 SNPs and two animals (j and k) in Table S1. Table S2. Construction of the R matrix for the ADR-model: example of calculating the ROH-based relationship between two animals (j and k) based on their ROH-status (\({x}_{ij}\) and \({x}_{ik}\)) for the 30 SNPs in Table S1. Applying Eq. (S1), the ROH-based relationship between j and k is 3.0471/3.4829 = 0.8749. Note that the frequency of each SNP being in a ROH (\({p}_{i}^{*}\)) was assumed to be known.

Additional file 3: Figure S3.

QQ-plots and genomic inflation factors for P-values corresponding to additive, dominance and ROH effects estimated by a single SNP GWAS for nine traits. MY: 305-day milk yield; FY: 305-day fat yield; PY: 305-day protein yield; CI: calving interval; ICF: interval calving to first insemination; IFL: interval first to last insemination; CR: conception rate; SCS150 somatic cell score day 5 to 150; SCS400: somatic cell score day 151 to 400.

Additional file 4: Figure S4.

Additive, dominance and ROH effects for fertility and udder health traits, estimated by GREML model ADR with back-solving. CI: calving interval (d); ICF: interval calving to first insemination (d); IFL: interval first to last insemination (d); CR: conception rate (%); SCS150 somatic cell score day 5 to 150 (units); SCS400: somatic cell score day 151 to 400 (units). Effects were multiplied by 100 and divided by the genetic standard deviation (\({\sigma }_{g}\)) of the corresponding trait. Note that dominance effects for CI and ROH effects for IFL, CR, and SCS400 are not shown, because the corresponding variances were fixed to zero in the GREML (Table 2 of main text).

Additional file 5: Figure S5.

Statistical significance of additive, dominance and ROH effects for fertility and udder health traits based on single SNP GWAS (continued on next page). The horizontal red line is a threshold based on 10% false-discovery rate (absence of this line indicates that all effects were below the threshold). CI: calving interval (d); ICF: interval calving to first insemination (d); IFL: interval first to last insemination (d); CR: conception rate (%); SCS150 somatic cell score day 5 to 150 (units); SCS400: somatic cell score day 151 to 400 (units).

Additional file 6: Figure S6.

Dominance effects for yield traits, estimated by GREML (model ADR) with back-solving, for a region on chromosome 5 from 10 to 45 Mb. MY: 305-day milk yield (kg); FY: 305-day fat yield (kg); PY: 305-day protein yield (kg). Effects were multiplied by 100 and divided by the genetic standard deviation (\({\sigma }_{g}\)) of the corresponding trait. Figure S7. Statistical significance of dominance effects for yield traits, estimated by single SNP GWAS, for a region on chromosome 5 from 10 to 45 Mb. MY: 305-day milk yield.

Additional file 7: Table S3.

Estimated variance components1 for additive + ROH model (AR) and nine traits2, with standard errors in parentheses. Table S4. Comparison of goodness of fit of different GREML models for nine traits.

Additional file 8: Table S5.

Correlations between variance components estimates1 from the ADR model for nine traits. Correlations were calculated from the average information matrix from mtg2 output.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Doekes, H.P., Bijma, P., Veerkamp, R.F. et al. Inbreeding depression across the genome of Dutch Holstein Friesian dairy cattle. Genet Sel Evol 52, 64 (2020). https://doi.org/10.1186/s12711-020-00583-1

Download citation