Estimation of (co)variances for genomic regions of flexible sizes: application to complex infectious udder diseases in dairy cattle

Sørensen, Lars P; Janss, Luc; Madsen, Per; Mark, Thomas; Lund, Mogens S

doi:10.1186/1297-9686-44-18

Published: 28 May 2012

Genetics Selection Evolution volume 44, Article number: 18 (2012)

Abstract

Background

Multi-trait genomic models in a Bayesian context can be used to estimate genomic (co)variances, either for a complete genome or for genomic regions (e.g. per chromosome) for the purpose of multi-trait genomic selection or to gain further insight into the genomic architecture of related traits such as mammary disease traits in dairy cattle.

Methods

Data on progeny means of six traits related to mastitis resistance in dairy cattle (general mastitis resistance and five pathogen-specific mastitis resistance traits) were analyzed using a bivariate Bayesian SNP-based genomic model with a common prior distribution for the marker allele substitution effects and estimation of the hyperparameters in this prior distribution from the progeny means data. From the Markov chain Monte Carlo samples of the allele substitution effects, genomic (co)variances were calculated on a whole-genome level, per chromosome, and in regions of 100 SNP on a chromosome.

Results

Genomic proportions of the total variance differed between traits. Genomic correlations were lower than pedigree-based genetic correlations and they were highest between general mastitis and pathogen-specific traits because of the part-whole relationship between these traits. The chromosome-wise genomic proportions of the total variance differed between traits, with some chromosomes explaining higher or lower values than expected in relation to chromosome size. Few chromosomes showed pleiotropic effects and only chromosome 19 had a clear effect on all traits, indicating the presence of QTL with a general effect on mastitis resistance. The region-wise patterns of genomic variances differed between traits. Peaks indicating QTL were identified but were not very distinctive because a common prior for the marker effects was used. There was a clear difference in the region-wise patterns of genomic correlation among combinations of traits, with distinctive peaks indicating the presence of pleiotropic QTL.

Conclusions

The results show that it is possible to estimate genome-wide and region-wise genomic (co)variances of mastitis resistance traits in dairy cattle using multivariate genomic models.

Background

Livestock provide a great source of data to investigate genome-wide effects on various phenotypic characteristics such as infectious diseases. There are several reasons for this, including: (1) vast amounts of phenotypic measures (milk yield in dairy cattle, litter size in pigs, daily gain in broilers etc.) are systematically recorded in modern livestock production and in Danish dairy cattle, for example, phenotypic information on a variety of traits, including clinical disease, is stored together with pedigrees in one central database; (2) important environmental factors, such as herd membership, affecting various phenotypes are recorded and animals within such groups receive rather homogeneous treatments; (3) low effective population sizes are frequent in livestock (e.g. compared with humans), which makes it easier to predict genetic merit and (4) recently, routine genotyping using dense SNP marker panels (e.g. >50 K) for thousands of animals has been initiated in several livestock species.

In the Nordic countries (Denmark, Finland, Norway, and Sweden), treatment of udder infections (mastitis) in dairy cattle is systematically recorded by veterinarians or farmers. However, estimates of heritability of mastitis incidence are low (i.e. 0.1 on the underlying continuous scale or 0.03 on the observable scale; [1] and [2], respectively). The disease can be caused by a large number of microbial pathogens [3], which differ in pathogenesis and reservoir. Several studies have shown that the mammary immune response differs between pathogens [4, 5], suggesting that it is regulated by different genes and that mastitis caused by different pathogens should be considered as different traits. This is supported by our previous study [1] in which pedigree-based analyses were conducted to estimate genetic correlations between mastitis caused by different pathogens. The genetic correlations between mastitis caused by five common mastitis pathogens, Staphylococcus aureus, Escherichia coli, coagulase-negative staphylococci (CNS), Streptococcus dysgalactiae, and Streptococcus uberis, ranged from 0.45 to 0.77, which implies that the mammary immune system, or the physical defense system, or both, act in a pathogen-specific manner. However, the existence of positive genetic correlations also implies the presence of pleiotropic effects or linked quantitative trait loci (QTL). Several studies have reported different heritability estimates for pathogen-specific mastitis traits [6–8], indicating that they may differ between traits, although some of these differences may also be due to differences in data structure and in the method used to estimate genetic parameters.

Genomic data are now used to infer either (1) whole-genome effects for the purpose of, e.g., estimation of breeding values to select superior breeding animals or for prediction of future phenotypes such as disease risks, or (2) effects of single genes or markers, to guide the development of human or veterinary drugs through improved knowledge on the biological basis of traits. Approach (1) typically involves ‘whole genome’ models that model all SNP simultaneously, whereas approach (2) involves Genome-Wide Association Studies (GWAS), in which, typically, each SNP is tested individually using univariate association tests. Here, we suggest a compromise between these two approaches by employing whole-genome models in which variances and covariances are partitioned by chromosome segments. We hypothesize that this approach will capture a large portion of the genetic variance, while also providing further biological understanding of the traits in question. Investigating the effects of chromosome segments of variable size (e.g. regions of neighboring SNP, haplotypes, gene-networks, chromosomes) and correlations among segment effects on different traits may provide interesting insights into the genetic and biological architecture of disease traits such as mastitis incidence.

Statistical methods for genomic analyses typically employ fixed prior parameters, which make them less suited to estimate genomic (co)variances. Models that use a genomic relationship matrix, e.g. [9], could be used to estimate (co)variances using REML (Restricted Maximum Likelihood) but studying (co)variances per chromosome or for several chromosome segments would be computationally prohibitive. For instance, a bivariate analysis in dairy cattle with 30 chromosomes would involve 30 genomic relationship matrices and the simultaneous estimation of 90 variance-covariance components. Using multivariate genomic selection methodology [10] for mastitis traits, it is possible to build a (co)variance matrix of allele substitution effects. In this study, we used a Bayesian SNP-based genomic model, which was extended to estimate hyperparameters of the prior distribution of allele substitution effects from the data. Thereby, the method makes it possible to estimate genomic (co)variances while remaining computationally feasible. Results can be used to reveal genomic regions associated with only one pathogen (pathogen-specific effects), associated with two or more pathogens (group-specific effects), or associated with all the pathogens (general effects). The estimated (co)variances between the allele substitution effects can also be used to compute various genetic parameters such as heritabilities and correlations; these can be computed region-wise (e.g. per chromosome) or genome-wide.

The objectives of this work were to (1) present a multivariate model for genome-wide and region-wise association studies, (2) perform simultaneous estimation of genomic effects (allele substitution effects) for mastitis resistance using more than one trait, and (3) estimate covariances between traits across the chromosomes and across regions of various sizes.

Methods

Phenotypic data

The data comprised records of mastitis treatments and pathogen information (results of bacteriological culturing of milk samples) from Danish Holstein cows that calved for the first time between January 1998 and January 2009 (collection period). The data were extracted from the Danish National Cattle Database. Mastitis is a difficult trait to analyze due to its low heritability and a potential bias in the treatment of cows; thus, data were edited as described in [11]. Briefly, data from cows that had calved after March 2008 (300 days before the end of the collection period) were removed from the dataset to reduce the bias due to censoring. In addition, the following inclusion criteria were applied: cows had to be between 19 and 36 months old at first calving, and herds had to have at least 30 first calvings in a given year of the collection period and to participate actively in disease recording [12]. Information on mastitis treatments was merged with pathogen data if the recorded date of a pathogen was three days before to four days after a case of mastitis was recorded on the same cow. Only the data from daughters of genotyped bulls were included in the present study and each bull was required to have at least five daughters calving during the collection period, resulting in a dataset of 200 149 daughters of 1 844 genotyped sires.

Trait definitions

General mastitis was defined as a binary trait for the period from 15 days before to 300 days after first calving, i.e. a phenotype of “1” was assigned if a cow was treated for mastitis during this period and “0” otherwise. Only the first observed mastitis treatment for each cow was included. The five most common pathogens in Danish dairy herds, i.e. Staph. aureus, CNS, E. coli, Strep. dysgalactiae, and Strep. uberis, were chosen to represent the pathogen-specific mastitis traits (also binary). The pathogen-specific traits were defined only for treatments with pathogen information. In contrast, the trait “general mastitis” contained all recorded (according to trait definition) treatments of mastitis, i.e. both treatments with and without pathogen information.

Estimation of progeny means (PM) adjusted for non-genetic effects

For computational reasons, it was necessary to summarize information per sire (meta-analysis) due to the small number of sires and the large number of offspring per sire, and availability of SNP information on the sires only. Thus, PM of the mastitis traits were estimated as daughter yield deviations, as described by [13]. However, in the present study, a sire model was used to estimate both PM and EBV; thus PM were defined as a trait corrected for all known environmental effects and averaged over records so that they consisted of additive genetic and residual effects. For the mastitis traits, a threshold-liability model [14] was applied to estimate PM. The threshold model assumes the presence of an underlying continuous random variable called liability, λ. The relationship between the observed binary variable, y, and the unobservable λ is

y_{i} = \begin{cases} 0 & \text{if } λ_{i} \leq τ \\ 1 & \text{if } λ_{i} > τ \end{cases}

where τ is a fixed threshold and y_i = 1 and 0 correspond to the presence or absence of mastitis for observation i, respectively. It was assumed that λ is normally distributed with a mean vector μ and covariance matrix $R = I σ_{e}^{2}$ . Because τ and $σ_{e}^{2}$ are undetermined, they were arbitrarily set equal to τ = 0 and $σ_{e}^{2}$ = 1 such that

λ | μ \sim N (μ, I)

The probability (π_i) that observation i is scored as “1” given the model parameter vector θ, is

\begin{aligned} π_{i} &= \Pr (y_{i} = 1 | θ) \\ &= \Pr (λ_{i} > 0 | θ) \\ &= 1 - \Pr (λ_{i} \leq 0 | θ) \\ &= Φ (μ_{i}) \end{aligned}

where Φ(.) is the standard normal cumulative distribution.
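
As a minimal sketch of the threshold model above, assuming τ = 0 and a residual variance of 1 as stated in the text, the probability of a mastitis record follows directly from the liability mean through the standard normal cumulative distribution. The function name `prob_mastitis` and the example liability means are hypothetical and serve only as illustration.

```python
import math

def prob_mastitis(mu):
    """P(y = 1 | mu) = Phi(mu), with the threshold fixed at 0 and unit residual variance."""
    return 0.5 * (1.0 + math.erf(mu / math.sqrt(2.0)))

# Hypothetical liability means: the more negative the mean, the rarer the disease.
for mu in (-2.0, -1.0, 0.0):
    print(mu, round(prob_mastitis(mu), 4))
```

A liability mean of zero places half of the distribution above the threshold, so lower incidences correspond to negative liability means.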

The following sire model was used to describe liability to mastitis:

λ_{ijklm} = \mathrm{YM}_{i} + \mathrm{AGE}_{j} + b_{1} t_{ijklm} + \mathrm{hys}_{k} + \mathrm{sire}_{l} + e_{ijklm}

where

λ_ijklm = liability to mastitis of daughter m of sire l calving in year-month class i at calving age class j and in herd-year-season class k;

YM_i = “fixed” effect of year-month of calving (123 classes);

AGE_j = “fixed” effect of calving age (17 classes);

hys_k = random effect of herd-year-season (season = year divided into quarters; 22 918 levels);

sire_l = transmitting ability of sire l (8 547 levels);

b₁ = “fixed” regression coefficient of λ on the length of the period at risk;

t_ijklm = period at risk for daughter m of sire l, defined as the number of days from 15 days before calving to the date of culling or to the end of the risk period; it was assumed that all cows with mastitis had a completed risk period;

e_ijklm = residual ~ N(0,1) and independent.

In matrix notation, the model for the mastitis traits can be expressed as:

λ = X_{b} b + X_{h} h + Z s + e

where λ is an n × 1 vector of the underlying liabilities of mastitis, n is the number of records for each trait, b is a vector of “fixed” effects as described previously, h is a vector of random herd-year-season effects, s is a vector of random sire effects, and e is a vector of random residual effects. X_b, X_h, and Z are the corresponding incidence matrices.

A full Bayesian approach using Markov chain Monte Carlo (MCMC) methods [15] via Gibbs sampling implemented in the DMU package [16] was used to fit the models and sample posterior PM. The PM were on the liability scale and were estimated from the model above as $\mathrm{PM}_{i} = \sum_{k} \mathrm{TD}_{k} / n$, where TD_k is the trait of daughter k on the liability scale and adjusted for all effects other than additive genetic effects and residuals and n is the number of daughters of bull i. Independent improper uniform priors were assigned to each element of b. Herd and sire effects were assigned uninformative normal prior distributions $h \sim N (0, I σ_{h}^{2})$ and $s \sim N (0, A σ_{s}^{2})$, respectively, where I is an identity matrix, A is the additive relationship matrix, and $σ_{h}^{2}$ and $σ_{s}^{2}$ are the herd and sire variances, respectively. Independent scaled inverse chi-square distributions were used for the unknown variance components ($σ_{h}^{2}$ and $σ_{s}^{2}$), with settings so that these prior distributions were flat. Inferences were based on 600 000 samples; the first 100 000 samples were discarded as burn-in, and every 10th sample was saved for post-Gibbs analyses.

Convergence of the Gibbs chains for each model parameter was ensured using a standardized time series method of batch means [17, 18].

Estimation of heritabilities of progeny means

Subsequently, genetic variances of the estimated PM were estimated using a standard linear animal model with pedigree information and REML. The PM were weighted based on the standard errors of prediction (SEP) of the posterior PM samples. From the estimated variances, heritabilities for each trait PM were computed for later comparisons with estimated genomic variances.

Weights for the association model

Standard errors of prediction of the posterior PM samples were calculated to construct weights for each trait included in the genomic model to adjust for heterogeneous variances of the sire records. The weights were computed as 1/SEP² and scaled to achieve an average weight of 1. The scale factor used in the present study was the average weight per trait of the 1 844 genotyped bulls. By scaling the weights to an average of 1, the computed residual variances will be directly comparable with the genomic (co)variances.
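
The weighting step can be sketched as follows, assuming only what the paragraph above states: weights are 1/SEP² and are divided by their average so that the mean weight is 1. The function name `scaled_weights` and the SEP values are hypothetical.

```python
def scaled_weights(sep):
    """Return weights 1/SEP^2 rescaled so that their average is 1."""
    raw = [1.0 / (s * s) for s in sep]
    mean_raw = sum(raw) / len(raw)  # scale factor: average raw weight for the trait
    return [w / mean_raw for w in raw]

# Hypothetical SEP values for four bulls of one trait (illustration only).
sep_example = [0.10, 0.20, 0.25, 0.40]
weights = scaled_weights(sep_example)
print(weights)
```

Bulls with a smaller SEP, typically those with more daughters, receive proportionally larger weights, while the rescaling leaves the ratios between weights unchanged.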

Marker data

The bulls selected for this study were genotyped using the Illumina Bovine SNP50 BeadChip (Illumina, San Diego, CA). The raw marker data were edited using the following criteria: (1) a locus was removed from the analyses if the minor allele frequency was less than 5%, if the proportion of animals genotyped for this locus was less than 95%, if the average GenCall score at the locus was less than 60%, and if the proportion of missing marker genotypes was larger than 10%; (2) an individual was deleted if the call rate (i.e. the overall call rate of a sample is equal to the number of SNP receiving an AA, AB, or BB genotype call divided by the total number of SNP on the chip) had a score below 0.85. After editing, 1 844 bulls had daughters with mastitis and pathogen data, and 37 862 SNP were available and used in the analyses.

Genomic model

Genomic parameters were estimated using a Bayesian model in which SNP effects, within a trait, were assumed to originate from the same normal distribution. This represents the gBLUP method [9] implemented with Bayesian methodology [19] and a random walk Metropolis-Hastings algorithm to obtain MCMC samples for variance components [20]. The difference between the method described in [9] and the present method is that the variances are also treated as unknown model parameters in the Bayesian model, so that variances and SNP effects are jointly estimated in a single model. This allows for estimation of individual SNP effects, which allows the model to be more easily scaled up to a multi-trait analysis. Weighted residuals were used in the model and latent variables were used to model the covariances between traits within each SNP and between residuals. The bivariate model specification was:

\begin{aligned} \mathrm{PM}_{1} &= \mathbf{1} μ_{1} + \sum_{i = 1}^{M} x_{i} b_{1 i} + ν_{1} W_{1}^{- 1 / 2} l + e_{1} \\ \mathrm{PM}_{2} &= \mathbf{1} μ_{2} + \sum_{i = 1}^{M} x_{i} b_{2 i} + ν_{2} W_{2}^{- 1 / 2} l + e_{2} \end{aligned}

(1)

where PM₁ and PM₂ are vectors with PM for the two traits on a common list of individuals, μ₁ and μ₂ are the PM means of each trait, x_i are vectors of coded genotypes of the individuals for SNP i = 1, …, M, b_ki is the random regression coefficient modeling the effect of SNP i on trait k, W (W₁ and W₂ for traits 1 and 2) is a diagonal matrix with 1/SEP² as diagonal elements, l is a vector of latent effects that models the correlated part of the residuals (note the use of the same vector l for both traits), ν₁ and ν₂ are scale factors for the effect of the latent vector l on each trait, which can be interpreted as the elements of the first eigenvector of the residual variance-covariance matrix (see below), and e₁ and e₂ are the uncorrelated parts of the model residuals.

The genotype coding in x_i was done as 2p-2, 2p-1, and 2p for homozygotes for the first allele, heterozygotes, and homozygotes for the second allele, respectively. This is similar to [21], except that p is the frequency of the first allele. Such coding standardizes the means of the genotype covariates to zero, assuming Hardy-Weinberg equilibrium of genotype frequencies, and the regression of the PM on such a genotype coding represents the allele substitution effect of substituting the first coded allele with the second coded allele. Covariances between the SNP effects were also modeled using a latent variable, but this was specified as a hierarchy in the Bayesian model. In this multi-trait model, the effects of a SNP on the two traits were correlated; therefore the variances of marker and residual effects were $\mathrm{var} (b_{1}, b_{2}) = \begin{bmatrix} σ_{b_{1}}^{2} & σ_{b_{1} b_{2}} \\ σ_{b_{1} b_{2}} & σ_{b_{2}}^{2} \end{bmatrix}$ and $\mathrm{var} (e_{1}, e_{2}) = \begin{bmatrix} σ_{e_{1}}^{2} & σ_{e_{1} e_{2}} \\ σ_{e_{1} e_{2}} & σ_{e_{2}}^{2} \end{bmatrix}$, respectively. Note that the elements of var(b₁, b₂) were assumed to be the same across the genome.
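
The zero-mean property of this genotype coding can be checked numerically. The sketch below assumes Hardy-Weinberg genotype frequencies and counts copies of the second allele; the function names `code_genotype` and `hwe_mean` are hypothetical.

```python
def code_genotype(n_second, p):
    """Covariate for a genotype carrying n_second (0, 1 or 2) copies of the second allele.

    p is the frequency of the first allele, so the three values are 2p-2, 2p-1 and 2p.
    """
    return 2.0 * p - 2.0 + n_second

def hwe_mean(p):
    """Expected covariate value under Hardy-Weinberg genotype frequencies (assumption)."""
    q = 1.0 - p  # frequency of the second allele
    freqs = {0: p * p, 1: 2.0 * p * q, 2: q * q}
    return sum(freq * code_genotype(n, p) for n, freq in freqs.items())

print(hwe_mean(0.3))
```

Because the expected number of copies of the second allele is 2(1 - p), subtracting 2 - 2p centers the covariate for any allele frequency.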

The distributional assumptions of the model parameters were:

\begin{aligned} μ_{1}, μ_{2}, ν_{1}, ν_{2} &\sim U (- \infty, \infty) \\ l &\sim N (0, I δ_{1}^{2}) \\ e_{1} &\sim N (0, W_{1}^{- 1} δ_{2}^{2}) \\ e_{2} &\sim N (0, W_{2}^{- 1} δ_{2}^{2}) \\ b_{1} &\sim N (u_{1} s, t_{2}^{2}) \\ b_{2} &\sim N (u_{2} s, t_{2}^{2}) \\ s &\sim N (0, I t_{1}^{2}) \\ δ_{1}^{2}, δ_{2}^{2}, t_{1}^{2}, t_{2}^{2} &\sim U (0, \infty) \\ u_{1}, u_{2} &\sim U (- \infty, \infty) \end{aligned}

with the constraints that |ν| = 1, where ν = (ν₁, ν₂), and |u| = 1, where u = (u₁, u₂), and where N() denotes a normal distribution with mean and variance parameters and U() denotes a uniform distribution on the given interval.

The modeled residual variance-covariance structure can be shown to be:

\begin{aligned} \mathrm{var} \begin{pmatrix} ν_{1} W_{1}^{- 1 / 2} l + e_{1} \\ ν_{2} W_{2}^{- 1 / 2} l + e_{2} \end{pmatrix} &= \begin{bmatrix} ν_{1}^{2} W_{1}^{- 1} δ_{1}^{2} + I δ_{2}^{2} & ν_{1} ν_{2} W_{1}^{- 1 / 2} W_{2}^{- 1 / 2} δ_{1}^{2} \\ ν_{2} ν_{1} W_{2}^{- 1 / 2} W_{1}^{- 1 / 2} δ_{1}^{2} & ν_{2}^{2} W_{2}^{- 1} δ_{1}^{2} + I δ_{2}^{2} \end{bmatrix} \\ &= \begin{bmatrix} ν_{1}^{2} δ_{1}^{2} + I δ_{2}^{2} & ν_{1} ν_{2} δ_{1}^{2} \\ ν_{2} ν_{1} δ_{1}^{2} & ν_{2}^{2} δ_{1}^{2} + I δ_{2}^{2} \end{bmatrix} \times \begin{bmatrix} W_{1}^{- 1} & W_{1}^{- 1 / 2} W_{2}^{- 1 / 2} \\ W_{2}^{- 1 / 2} W_{1}^{- 1 / 2} & W_{2}^{- 1} \end{bmatrix} \end{aligned}

where the first part corresponds to a special form of the spectral decomposition of the variance-covariance matrix R, such that it can be shown that ν = (ν₁, ν₂) is the first eigenvector of R, $δ_{2}^{2}$ is the second eigenvalue of R, and $δ_{1}^{2}$ estimates the difference between the first eigenvalue and the second eigenvalue. In the same way, the vectors of SNP effects, b, are correlated through the use of common latent vectors, s, and the variance-covariance structure for SNP effects can be shown to have a covariance of $u_{1} u_{2} t_{1}^{2}$ and variances $u_{1}^{2} t_{1}^{2} + t_{2}^{2}$ and $u_{2}^{2} t_{1}^{2} + t_{2}^{2}$ . Again, (u₁,u₂) can be interpreted as the first eigenvector of the variance-covariance matrix, $t_{2}^{2}$ as the second eigenvalue, and $t_{1}^{2}$ as the difference between the first eigenvalue and the second eigenvalue.
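
The eigenvalue interpretation of the SNP-effect (co)variance structure can be verified with a small numerical sketch. It builds the 2 × 2 matrix from u, $t_{1}^{2}$ and $t_{2}^{2}$ exactly as given in the text and uses the closed-form eigenvalues of a symmetric 2 × 2 matrix. The function names and the numerical values are hypothetical.

```python
import math

def snp_effect_covariance(u1, u2, t1_sq, t2_sq):
    """2x2 (co)variance matrix of (b1, b2): variances u_k^2 t1^2 + t2^2, covariance u1 u2 t1^2."""
    return [[u1 * u1 * t1_sq + t2_sq, u1 * u2 * t1_sq],
            [u1 * u2 * t1_sq, u2 * u2 * t1_sq + t2_sq]]

def eigenvalues_sym2(m):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    half_trace = 0.5 * (a + d)
    root = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return half_trace + root, half_trace - root

# Hypothetical values satisfying the constraint |u| = 1 (0.6^2 + 0.8^2 = 1).
cov = snp_effect_covariance(0.6, 0.8, t1_sq=2.0, t2_sq=0.5)
first, second = eigenvalues_sym2(cov)
print(first, second, first - second)
```

With |u| = 1, the second eigenvalue equals $t_{2}^{2}$ and the difference between the two eigenvalues equals $t_{1}^{2}$, in line with the interpretation given above.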

Implementation

The MCMC estimation for this model was straightforward for μ₁, μ₂, b_1i, b_2i, because these parameters have conditional normal distributions that are independent between traits and therefore can be updated in a “single trait manner”. Also ν₁ and ν₂ have conditional normal distributions and were updated as a regression on the vector l. They were scaled to unity norm after sampling to apply the constraint on |ν|. The same inverse scaling was applied to the latent vector l, because ν and l are multiplicative in the model. Applying the same scaling to both ν and l is arbitrary but forces the model to uniquely explain all the variance through l and makes parameters identifiable. The vector of latent residual effects, l, works across traits but its conditional distribution is also normal and derived by unifying the two trait equations into a single equation:

\begin{bmatrix} \tilde{y}_{1} \\ \tilde{y}_{2} \end{bmatrix} = \begin{bmatrix} W_{1}^{- 1 / 2} ν_{1} \\ W_{2}^{- 1 / 2} ν_{2} \end{bmatrix} l + \begin{bmatrix} e_{1} \\ e_{2} \end{bmatrix},

where $\tilde{y}_{k} = y_{k} - μ_{k} - \sum_{i = 1}^{M} x_{i} b_{ki}$.
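
For one individual, the conditional distribution of its latent residual effect can be sketched from the unified equation and the distributional assumptions listed earlier. This is a derivation under those stated assumptions, not a transcription of the authors' software, and the function name `latent_conditional` and all argument names are hypothetical.

```python
def latent_conditional(y1_tilde, y2_tilde, w1, w2, nu1, nu2, delta1_sq, delta2_sq):
    """Mean and variance of the latent effect of one individual given its adjusted records.

    y1_tilde, y2_tilde: PM minus trait mean and SNP effects for traits 1 and 2
    w1, w2: diagonal elements of W_1 and W_2 for this individual
    delta1_sq: prior variance of the latent effect; delta2_sq: variance scale of e_1, e_2
    """
    # Multiplying record k by sqrt(w_k) gives nu_k * l + noise with variance delta2_sq.
    z1 = (w1 ** 0.5) * y1_tilde
    z2 = (w2 ** 0.5) * y2_tilde
    # Normal prior times normal likelihood: the precisions add up.
    precision = (nu1 * nu1 + nu2 * nu2) / delta2_sq + 1.0 / delta1_sq
    mean = (nu1 * z1 + nu2 * z2) / delta2_sq / precision
    return mean, 1.0 / precision

# Hypothetical values for a single individual.
mean_l, var_l = latent_conditional(0.6, 0.8, 1.0, 1.0, 0.6, 0.8, 1.0, 1.0)
print(mean_l, var_l)
```

In a Gibbs sampler, a new value for this individual would then be drawn from a normal distribution with this mean and variance, for example with `random.gauss(mean_l, var_l ** 0.5)`.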

Posterior analyses

Model fit was assessed by visual inspection of model residuals plotted against the reliability of the estimated breeding values (EBV) for the six traits, and a t-test was used to test whether the slope of the regression line differed significantly from zero. Reliability of EBV was calculated as:

r^{2} = 1 - \frac{\mathrm{SEP}^{2}}{σ_{s}^{2}}

Posterior statistics for any function of the model parameters can be easily obtained when such a function is computed on the primary MCMC samples of the model parameters. This was applied to compute direct genomic breeding values (DGV) of individuals and genomic and residual (co)variances per chromosome and parameters derived thereof. For all these estimates, posterior means and posterior standard deviations were obtained. The genomic parameters were based on the constructed DGV of individuals, which automatically take into account the covariance generated between SNP due to linkage disequilibrium (LD). Markov chain Monte Carlo samples of individual genomic values for trait 1 ($g_{1}^{*}$) and trait 2 ($g_{2}^{*}$) were constructed from the MCMC samples of allele effects for the two traits ($b_{1}^{*}$, $b_{2}^{*}$) as $g_{1}^{*} = \sum x_{i} b_{1 i}^{*}$ and $g_{2}^{*} = \sum x_{i} b_{2 i}^{*}$. Using markers only in specified intervals (e.g. per chromosome or specified blocks of SNP within a chromosome), MCMC samples of individual DGV per interval, $g_{1 c}^{*}$ and $g_{2 c}^{*}$ for interval c, were constructed. From the MCMC samples of individual DGV, MCMC samples of genomic variances and covariances were subsequently constructed by computing $σ_{g 1}^{2 *} = \mathrm{var} (g_{1}^{*})$, $σ_{g 2}^{2 *} = \mathrm{var} (g_{2}^{*})$, and $σ_{g 12}^{*} = \mathrm{cov} (g_{1}^{*}, g_{2}^{*})$, which was done for the whole-genome DGV and for the interval-wise DGV. Furthermore, MCMC samples of genomic correlations, $r_{g} = \frac{σ_{g 12}^{*}}{σ_{g 1}^{*} σ_{g 2}^{*}}$, were computed, and finally MCMC samples of the genomic proportions of the total variance (GPV) were computed. From these constructed MCMC samples, posterior statistics such as the posterior means and posterior standard deviations were collected.
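
The post-processing described above can be sketched in a few lines. The example assumes that the (co)variances are computed across individuals within each MCMC sample, using the population form (divisor n), which the text does not specify. All function names and the toy data are hypothetical.

```python
from statistics import mean, pvariance

def covariance(a, b):
    """Population covariance of two equally long lists (divisor n is an assumption)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def genomic_values(genotypes, effects, snp_indices):
    """DGV per individual from the SNP in snp_indices, for one MCMC sample of effects."""
    return [sum(row[i] * effects[i] for i in snp_indices) for row in genotypes]

def genomic_parameters(genotypes, b1_samples, b2_samples, snp_indices):
    """Posterior means of genomic variances, covariance and correlation for one interval."""
    var1, var2, cov12, corr = [], [], [], []
    for b1, b2 in zip(b1_samples, b2_samples):
        g1 = genomic_values(genotypes, b1, snp_indices)
        g2 = genomic_values(genotypes, b2, snp_indices)
        v1, v2, c = pvariance(g1), pvariance(g2), covariance(g1, g2)
        var1.append(v1)
        var2.append(v2)
        cov12.append(c)
        corr.append(c / (v1 * v2) ** 0.5)
    return mean(var1), mean(var2), mean(cov12), mean(corr)

# Toy data: 3 individuals, 2 SNP, 2 saved MCMC samples of SNP effects per trait.
X = [[-1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
b1_samples = [[1.0, 0.0], [1.0, 0.0]]
b2_samples = [[2.0, 0.0], [2.0, 0.0]]
result = genomic_parameters(X, b1_samples, b2_samples, snp_indices=[0, 1])
print(result)
```

Restricting `snp_indices` to the SNP of one chromosome or one block gives the interval-wise parameters. Because the DGV are summed over SNP before the variance is taken, covariances between SNP in LD are included automatically.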

Inferences were based on 40 000 samples with a burn-in of 5 000 samples. Every 50th sample was saved and used for post-MCMC analysis. Convergence of the Markov chains was ensured by visual inspection of trace plots and plots of autocorrelations between lags for each model parameter.

Results

The number of daughters with data from individual bulls and the low heritabilities of the traits both affected the reliability of the EBV and the posterior standard deviations of the PM. For example, 80% of the bulls had between 5 and 50 daughters with phenotypic information. This resulted in average reliabilities of EBV of 0.30, 0.36, 0.32, 0.30, and 0.38 for mastitis caused by Staph. aureus, CNS, E. coli, Strep. dysgalactiae, and Strep. uberis, respectively. The heritability of general mastitis was higher than that of the pathogen-specific mastitis traits, resulting in a higher average reliability, i.e. 0.57. Accuracy of the PM was assessed by studying their posterior standard deviations (SD). Figure 1 shows that the SD for general mastitis was larger when the number of daughters was low. Similar results were observed for the pathogen-specific mastitis traits (not shown).

Model fit

Model fit was assessed by plotting model residuals (observed PM minus DGV) against different variables. In Figure 2, examples of plots of residuals against reliability of EBV are shown for general mastitis and mastitis caused by Staph. aureus. The slope of the regression line was significantly different from zero (t-test; p < 0.05) for all traits except Staph. aureus mastitis. For Staph. aureus mastitis, the estimation errors of the DGV clearly increased when reliabilities of the EBV reached values below 0.5. This trend was observed for all the pathogen-specific mastitis traits. For general mastitis, which had higher heritability and EBV reliabilities, estimation errors of the DGV increased when EBV reliabilities were below 0.7.

Whole-genome GPV

The average of the posterior means of GPV from the pair-wise analyses of the mastitis traits differed among traits (Table 1). Among the pathogen-specific mastitis traits, the largest value was found for CNS, followed by Strep. uberis, E. coli, Staph. aureus, and Strep. dysgalactiae. Analysis of a trait in different pair-wise trait combinations resulted in similar GPV for the trait. As expected, the GPV of general mastitis was higher than that of pathogen-specific mastitis traits, except when compared to mastitis caused by CNS. Table 1 shows the pedigree-based heritabilities of the trait PM for comparison. Pedigree-based heritabilities were all smaller than the GPV but had the same ranking across traits.

Table 1 Whole-genome genomic proportions of total variance (GPV) and pedigree-based estimates of heritability (h²) for progeny means of mastitis susceptibility to five pathogens and general mastitis, and standard deviations (SD) of the estimates

Whole-genome correlation

Genomic correlations (Table 2) among the investigated traits were moderate to high (0.22 to 0.72). Genomic correlations among the pathogen-specific traits (0.22 to 0.51) were lower than genomic correlations between general mastitis and the pathogen-specific mastitis traits (0.55 to 0.72) because of their part-whole relationship, i.e. pathogen-specific cases of mastitis are part of the general mastitis cases.

Table 2 Genomic correlations among the five pathogen-specific mastitis traits and general mastitis

Chromosome-wise GPV

All chromosomes explained significant amounts of genetic variance for each trait, except chromosome X (BTA30), which explained substantially less variance than would be expected based on its size. For example, the variance explained by chromosome X was 93% lower than expected for general mastitis. For the other chromosomes, the trend was that chromosome-wise GPV (Figure 3) increased with chromosome size (Bos Taurus 4.0; [22]), i.e. larger chromosomes tended to explain more genomic variance. This pattern was seen for all traits.

Some chromosomes deviated from the general trend and explained more variance than would be expected according to their relative size, i.e. they may contain relatively more QTL or QTL with larger effects on the trait and such chromosomes differed across traits; chromosomes with relatively large GPV were BTA6, 13, 14, 16, 19, and 26 for Staph. aureus mastitis; BTA1, 11, 14, 17, 19, and 20 for CNS; BTA6, 11, 13, 14, 16, 19, and 21 for E. coli; BTA3, 14, 17, 19, 20 and 25 for Strep. dysgalactiae; and BTA6, 13, 14, 18, 19, 25 and 27 for Strep. uberis. Additional chromosomes with less pronounced effects, but still above their expected value, were observed for most of the pathogen-specific traits. Chromosomes showing a large variance for general mastitis were BTA3, 5, 6, 14, and 19. Only BTA19 had higher GPV than expected according to size for all traits.

Chromosome-wise genomic covariances

For all chromosomes, covariances between traits were positive (results not shown), resulting in overall positive genomic correlations between the traits, given the default prior assumptions for the latent vector, l. For all trait pairs, there was a high proportion of chromosomes with larger covariances than expected, which may indicate the presence of pleiotropic QTL. In addition, in several cases a limited number of chromosomes accounted for a major part of the total covariance, e.g. BTA3, 6, 10, 11, 13, 14, 16, 19, 21 and 29 for the covariance between Staph. aureus and E. coli (Figure 4).

Chromosome-wise genomic correlations

In Figure 5, three examples are shown to illustrate the differences in chromosome-wise genomic correlations between pair-wise trait combinations with Staph. aureus. Differences between chromosomes were still noticeable but were far less pronounced compared with the chromosome-wise genomic covariances. Similar to the chromosome-wise GPV, some chromosomes had larger genomic correlations than expected based on the genome-wide genomic correlation. This could indicate the presence of pleiotropic QTL, e.g. BTA16 and 19 for the combination of Staph. aureus and Strep. uberis. In contrast to the genomic covariances, the genomic correlations did not depend on chromosome size but were similar across chromosomes.

Region-wise genomic variance and correlation

BTA19 was further investigated as this chromosome showed a clear effect on all traits. Profiles of genomic variances across this chromosome were created by computing posterior variances in half-overlapping blocks of 100 SNP for each trait. This means that one computation was done for blocks with SNP 1–100, 101–200 etc., and a second computation was done for blocks with SNP 1–50, 51–150 etc. The values of overlapping blocks were then averaged, giving a smoothed profile in steps of 50 SNP.
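
The half-overlapping block scheme can be sketched as follows. The callback `block_value(start, end)` is a hypothetical stand-in for the posterior genomic variance (or correlation) of the SNP in a block, with 0-based indices and an exclusive end. The function names are likewise hypothetical.

```python
def make_blocks(n_snp, block, first_len):
    """Consecutive (start, end) blocks; only the first block has length first_len."""
    blocks, start, length = [], 0, first_len
    while start < n_snp:
        end = min(start + length, n_snp)
        blocks.append((start, end))
        start, length = end, block
    return blocks

def smoothed_profile(block_value, n_snp, block=100):
    """One value per half-block: the average over the two offset passes of full blocks."""
    half = block // 2
    passes = [make_blocks(n_snp, block, block),  # SNP 1-100, 101-200, ...
              make_blocks(n_snp, block, half)]   # SNP 1-50, 51-150, ...
    profile = []
    for h_start in range(0, n_snp, half):
        values = [block_value(start, end)
                  for blocks in passes
                  for start, end in blocks
                  if start <= h_start < end]
        profile.append(sum(values) / len(values))
    return profile

# Toy callback that returns the block start, to show which blocks are averaged.
print(smoothed_profile(lambda start, end: float(start), 200))
```

Each segment of 50 SNP lies in exactly one block of each pass, so every profile value is the mean of two block-level estimates. Blocks truncated at the chromosome ends contain fewer SNP, which should be kept in mind when interpreting the ends of a profile.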

In general, the genomic variance on BTA19 (Figure 6) was spread across the entire chromosome, except at the chromosome ends, with regions of larger variance than average. The variance patterns differed between traits but a common peak was observed around 35 Mb for the pathogen-specific traits except E. coli. This peak was most pronounced for CNS and Strep. uberis. Also, a peak was observed around 10 Mb for E. coli and Strep. dysgalactiae. No clear peaks were observed for general mastitis, possibly because of the composition of this trait.

Establishing the profiles of genomic correlations for BTA19 as described above revealed the pleiotropic regions along the chromosome. Figure 7 shows such profiles for the genomic correlations between Staph. aureus and the other traits. The magnitude of the genomic correlations differed between trait combinations, with the largest genomic correlations obtained between Staph. aureus and general mastitis because of their part-whole relationship. The genomic correlation between pathogen-specific traits was highest for Staph. aureus and CNS, likely because both are staphylococci. The genomic correlation between Staph. aureus and Strep. dysgalactiae was also high, while that between Staph. aureus and Strep. uberis was the lowest, followed by the correlation between Staph. aureus and E. coli. The positions and widths of the pleiotropic regions differed between trait combinations. For example, a very narrow region around 35 Mb was observed for the genomic correlation between Staph. aureus and Strep. uberis, while it was much wider around 30 Mb for the genomic correlation between Staph. aureus and E. coli.