Skip to main content


Improved precision of QTL mapping using a nonlinear Bayesian method in a multi-breed population leads to greater accuracy of across-breed genomic predictions



Genomic selection is increasingly widely practised, particularly in dairy cattle. However, the accuracy of current predictions using GBLUP (genomic best linear unbiased prediction) decays rapidly across generations, and also as selection candidates become less related to the reference population. This is likely caused by the effects of causative mutations being dispersed across many SNPs (single nucleotide polymorphisms) that span large genomic intervals. In this paper, we hypothesise that the use of a nonlinear method (BayesR), combined with a multi-breed (Holstein/Jersey) reference population will map causative mutations with more precision than GBLUP and this, in turn, will increase the accuracy of genomic predictions for selection candidates that are less related to the reference animals.


BayesR improved the across-breed prediction accuracy for Australian Red dairy cattle for five milk yield and composition traits by an average of 7% over the GBLUP approach (Australian Red animals were not included in the reference population). Using the multi-breed reference population with BayesR improved accuracy of prediction in Australian Red cattle by 2 – 5% compared to using BayesR with a single breed reference population. Inclusion of 8478 Holstein and 3917 Jersey cows in the reference population improved accuracy of predictions for these breeds by 4 and 5%. However, predictions for Holstein and Jersey cattle were similar using within-breed and multi-breed reference populations. We propose that the improvement in across-breed prediction achieved by BayesR with the multi-breed reference population is due to more precise mapping of quantitative trait loci (QTL), which was demonstrated for several regions. New candidate genes with functional links to milk synthesis were identified using differential gene expression in the mammary gland.


QTL detection and genomic prediction are usually considered independently but persistence of genomic prediction accuracies across breeds requires accurate estimation of QTL effects. We show that accuracy of across-breed genomic predictions was higher with BayesR than with GBLUP and that BayesR mapped QTL more precisely. Further improvements of across-breed accuracy of genomic predictions and QTL mapping could be achieved by increasing the size of the reference population, including more breeds, and possibly by exploiting pleiotropic effects to improve mapping efficiency for QTL with small effects.


The accuracies of genomic predictions are often reported to decrease with increasing genetic distance (or meiosis) from the reference population. For example, Habier et al. [1] showed that, in the German Holstein breed, accuracies of genomic predictions of animals that were distantly-related to the reference population declined. Saatchi et al. [2] reported a decline in accuracy of genomic predictions that were derived from a US Hereford population when they were tested in Canadian, Uruguayan or Argentinean Hereford populations. These results suggest that the linkage disequilibrium (LD) between markers and quantitative trait loci (QTL) was different in the validation population compared to the reference or training population. This occurs because LD within a group of related animals may be lost due to recombination in less closely-related animals. Several authors also reported that the accuracy of genomic predictions was poor for a breed not included in the reference (i.e. across-breed genomic predictions) [3,4]. Across-breed prediction is particularly challenging because, in addition to the possible occurrence of inconsistent LD between markers and QTL [5,6], QTL may be breed-specific, which places an upper limit to the accuracy that can be reached in another breed.

This problem of poor prediction for animals not closely-related to the reference population is exacerbated when BLUP (best linear unbiased prediction) is used to derive prediction equations. BLUP (or the mathematical equivalent genomic BLUP, GBLUP) is widely used for genomic prediction because of its computational efficiency and because it performs almost as well as nonlinear methods for within-breed prediction [7,8]. GBLUP assumes that the effects of all markers are drawn from the same normal distribution, which implies that all markers are assumed to have very small effects. In spite of this unrealistic assumption, GBLUP can capture the effects of QTL, even if the effects are moderate to large, by using a linear combination of markers. Since LD can extend over long genomic distances, this linear combination can include markers that are a long distance away from the QTL. For example, long-range LD probably explains why predictions based on 50 K single nucleotide polymorphism (SNP) markers have similar accuracies as predictions based on higher density chips (800 K) for within-breed prediction of Holstein cattle [9,10]. Thus, closely-related animals inherit similar long chromosomal segments to those of the animals in the reference population and hence the same linear combination of markers will predict the effect of QTL. However, if recombination breaks up these long chromosomal segments, the predictive power of the linear combination of markers will decrease [1]. In contrast, non-linear methods, such as BayesB [11], allow the effects of some markers to be large, while many markers have zero (or near-zero) effect. This allows the prediction equation to be driven by markers that are close to the QTL and in strong LD with it. The LD between such markers and their associated QTL is broken down less quickly because the recombination distance between them is small. Although using non-linear alternatives (e.g. BayesA, BayesB, BayesR) is not always superior to GBLUP for within-breed prediction, nonlinear methods are expected to improve the persistency of the accuracy of genomic predictions over future generations [1,10].

Within a single breed, a marker may be in strong LD with a QTL in spite of being some distance away. Therefore, to find markers close to and in LD with QTL in all breeds, using a reference population that includes more than one breed can be advantageous. Combining animals from multiple breeds in a reference dataset will reduce the long-range LD that is present within a breed but may not necessarily increase the accuracy of predictions, particularly if predictions are evaluated in direct offspring of the reference population. Thus, it is not surprising that, in the literature, the reported benefits of using a multi-breed reference dataset are mixed. However, some improvements have been observed for breeds with small (within-breed) reference populations and, in general, results have been more promising for beef cattle than for dairy cattle [4,12-16]. In some cases, the failure of prediction equations to benefit from a multi-breed reference population could be due to the use of medium (50 K) density SNP chips, which are unlikely to have consistent across-breed LD [6].

In this paper, we show that using a large reference population from two breeds, combined with high-density SNP genotypes and a nonlinear method for the analysis (BayesR) increases the accuracy of genomic predictions in a breed that is not included in the reference population. To create a large reference population, we expanded the current Australian reference population of progeny-tested bulls by including cows e.g. [16,17]. Our dataset of bulls and cows is similar to that recently used by Raven et al. [18] for a genome-wide association study. Here, in contrast to [18], we fitted all SNPs simultaneously and extended the BayesR methodology from Erbe et al. [10] to include cow records and estimate fixed effects. The use of cows requires making changes in the evaluation procedure because cows and bulls have different degrees of uncertainty in their measurement, i.e. there is heterogeneous error variance. In addition, if nonlinear methods identify markers that are close to QTL, they should be able to map the QTL with greater precision than alternative methods such as GBLUP. We assessed the ability of BayesR with the multi-breed reference to fine-map QTL by mapping known loci such as DGAT1 [19] and identify new genes that affect dairy traits by combining the BayesR results with differential gene expression of the mammary gland compared to that of 17 other tissue types.




Illumina Bovine HD genotypes (777 K SNPs) were available for 1620 Holstein bulls and cows, 125 Jersey bulls, and 114 Australian Red bulls. After quality control, carried out as in Erbe et al. [10], and removal of non-polymorphic SNPs, 632 002 SNPs remained. A total of 10 311 Holstein, 4738 Jersey and 249 Australian Red bulls and cows were genotyped with the Illumina Bovine SNP array (54 K SNPs) and passed parentage verification. After quality control, 43 425 SNPs remained. All animals had genotypes imputed to the higher density SNP panel using Beagle 3 [20]. The Australian Red animals were used only to evaluate the prediction equations derived from reference populations of Holstein animals, Jersey animals or Holstein plus Jersey animals. Australian Red cattle have a large component of Scandinavian Red ancestry and are genetically distinct from Holstein and Jersey cattle (Figure 1). The average LD between markers for Australian Red cattle is lower than that of either Holstein or Jersey cattle (see Additional file 1: Figure S1).

Figure 1

Relationships between Holstein, Jersey and Australian Red dairy cattle. Shown are principal components 1 and 2 for the genomic relationship matrix [24] constructed from a random sample of Holstein (n =334) and Jersey (n =326) animals with the genotyped Australian Red (n =313) animals. Principle components were obtained using the eigen() function in R [50].


Phenotypes for the genotyped animals (trait deviations for cows and daughter trait deviations for bulls) were from the April 2013 genetic evaluations from the Australian Dairy Herd Improvement Scheme (ADHIS) for fat yield (FY), milk yield (MY), protein yield (PY), fertility (FERT), stature (STAT) and survival (SURV). Trait deviations were corrected for herd year season effects, permanent environmental effects, and heterosis. Milk composition traits, i.e. percentage of fat and protein in milk (F% and P%), were calculated using a linear approximation of the milk production and milk solid yield traits. For example, F% was calculated as:

$$ F\%=\frac{F{Y}_P}{M{Y}_P}\left[\frac{FY}{F{Y}_P}-\frac{MY}{M{Y}_P}\right], $$

where FY P is the (within-breed) average fat yield and MY P is the (within-breed) average milk yield. P% was calculated using the same methodology. Values for FY P (kg/lactation in Holstein = 284; Jersey = 522; Australian Red = 256), PY P (kg/lactation in Holstein = 243; Jersey = 193; Australian Red = 216) and MY P (L/lactation in Holstein = 7417; Jersey = 5273; Australian Red = 6254) were from the 2012 ADHIS annual report [21].

Reference and validation datasets

The Holstein and Jersey phenotypes were split into reference and validation datasets for each trait. The reference datasets consisted of six different combinations of up to 11 527 Holstein and 4687 Jersey animals. The six reference sets were: (1) Holstein bulls, (2) Jersey bulls, (3) Holstein and Jersey bulls, (4) Holstein bulls and cows, (5) Jersey bulls and cows or (6) all reference animals (Holstein and Jersey bulls and cows).

The four validation datasets consisted of a minimum of: (1) 251 Holstein bulls, (2) 81 Jersey bulls, (3) 247 Australian Red cows or (4) 114 Australian Red bulls. Validation animals for the Holstein and Jersey breeds were selected on the basis of birth year and cows that were progeny of bulls in the validation set were removed from the reference set. All bulls had more than 20 effective daughter records. The number of animals in the reference and validation populations for each breed and each trait are in Table 1.

Table 1 Number of phenotypic records for each trait in the reference and validation sets

Model fitted to the reference data

Genomic predictions for each trait were estimated for the validation datasets using only animals in the prescribed reference dataset. Two procedures were used to estimate marker effects, either GBLUP or BayesR. The model that was fitted to the reference dataset in both cases included fixed effects (overall mean, breed and sex nested within breed, when appropriate), SNP effects and polygenic effects. The model was:

$$ \mathbf{y}=\mathbf{X}\mathbf{b}+\mathbf{Z}\mathbf{a}+\mathbf{W}\mathbf{v}+\mathbf{e}, $$


y = vector of n trait or daughter deviations (phenotypes) for cows or bulls,

b = vector of p fixed effect solutions,

a = vector of q polygenic breeding values, distributed N(0, A σ 2 a ),

v = vector of m SNP effects,

e = vector of n residual errors, distributed N(0, E σ 2 e ),

X = design matrix allocating phenotypes to fixed effects (X = n by p matrix),

Z = design matrix allocating phenotypes to polygenic breeding values (Z = n by q matrix),

W = design matrix of SNP marker genotypes (W = n by m matrix),

A = numerator relationship matrix,

σ 2 a  = additive genetic variance not explained by the SNPs,

σ 2 e  = error variance.

Constructing the matrix of weights for errors (E)

The analysis aimed to account for the uncertainty in phenotypic records, particularly between bulls and cows but also for bulls with few or many daughters. This uncertainty affects the error variance associated with each record, that is e ~ N(0,E σ 2 e ), where E is a diagonal matrix constructed as diag(1/w i ), where w i is the weighting coefficient for each animal. Weights were scaled such that the error variance for animals with one observation of their own phenotype is σ 2 e . The calculation of the weighting coefficient differs between cows (which have their own records) and bulls (for which phenotypes are daughter deviations) and was done following Garrick et al. [22] i.e.:

$$ {w}_i(bulls)=\frac{d\left(1-{h}^2\right)}{4-{h}^2}, $$
$$ {w}_i(cows)=\frac{r\left(1-{h}^2\right)}{1+\left(r-1\right)t-r{h}^2}, $$

where h 2 is the heritability of a single record of the trait, d is the effective number of daughters, r is the number of records per cow and t is the repeatability of the trait. All variables (h 2, d, r and t) were obtained from ADHIS, and the heritabilities and repeatabilities for each trait are in Table 1.


The GBLUP method was implemented using restricted maximum likelihood in ASReml [23]. GBLUP assumes that all marker effects are drawn from the same distribution [i.e. \( \mathbf{v}\sim N\left(0,\mathbf{I}{\sigma}_v^2\right) \)] and a model equivalent to Equation (ii) was fitted in which Wv = Qg, where Q is a (n x n) design matrix allocating phenotypes to animals and \( \mathbf{g}\sim N\left(0,\mathbf{G}{\sigma}_g^2\right) \)]. G was calculated according to Yang et al. [24] and \( {\sigma}_g^2 \) is the genetic variance explained by all SNPs and was estimated from the data. Solutions for fixed effects (\( \widehat{\mathbf{b}} \)), polygenic breeding values (â) and genomic estimated breeding values (ĝ) for the model y = Xb + Za + Qg + e are the same as for the mixed model equations following [25]:

$$ \left[\begin{array}{ccc}\hfill \mathbf{X}\hbox{'}{\mathbf{R}}^{-1}\mathbf{X}\hfill & \hfill \mathbf{X}\hbox{'}{\mathbf{R}}^{-1}\mathbf{Z}\hfill & \hfill \mathbf{X}\hbox{'}{\mathbf{R}}^{-1}\mathbf{Q}\hfill \\ {}\hfill \mathbf{Z}\hbox{'}{\mathbf{R}}^{-1}\mathbf{X}\hfill & \hfill \mathbf{Z}\hbox{'}{\mathbf{R}}^{-1}\mathbf{Z}+{\mathbf{A}}^{-1}{\sigma}_a^{-2}\hfill & \hfill \mathbf{Z}\hbox{'}{\mathbf{R}}^{-1}\mathbf{Q}\hfill \\ {}\hfill \mathbf{Q}\hbox{'}{\mathbf{R}}^{-1}\mathbf{X}\hfill & \hfill \mathbf{Q}\hbox{'}{\mathbf{R}}^{-1}\mathbf{Z}\hfill & \hfill \mathbf{Q}\hbox{'}{\mathbf{R}}^{-1}\mathbf{Q}+{\mathbf{G}}^{-1}{\sigma}_g^{-2}\hfill \end{array}\right]\left[\begin{array}{c}\hfill \widehat{\mathbf{b}}\hfill \\ {}\hfill \widehat{\mathbf{a}}\hfill \\ {}\hfill \widehat{\mathbf{g}}\hfill \end{array}\right]=\left[\begin{array}{c}\hfill \mathbf{X}\hbox{'}{\mathbf{R}}^{-1}\mathbf{y}\hfill \\ {}\hfill \mathbf{Z}\hbox{'}{\mathbf{R}}^{-1}\mathbf{y}\hfill \\ {}\hfill \mathbf{Q}\hbox{'}{\mathbf{R}}^{-1}\mathbf{y}\hfill \end{array}\right], $$

where R −1 = E −1 σ −2 e , and all other terms are as described above. Following [25], the solutions are:

$$ \widehat{\mathbf{b}}={\left[{\mathbf{X}}^{\hbox{'}}{\mathbf{R}}^{-1}\mathbf{X}\right]}^{-1}\mathbf{X}\hbox{'}{\mathbf{R}}^{-1}\left(\mathbf{y}-\mathbf{Z}\widehat{\mathbf{a}}-\mathbf{Q}\widehat{\mathbf{g}}\right), $$
$$ \widehat{\mathbf{a}}={\left[{\mathbf{Z}}^{\hbox{'}}{\mathbf{R}}^{-1}\mathbf{Z}+{\mathbf{A}}^{-1}{\sigma}_a^{-2}\right]}^{-1}\mathbf{Z}\hbox{'}{\mathbf{R}}^{-1}\left(\mathbf{y}-\mathbf{X}\widehat{\mathbf{b}}-\mathbf{Q}\widehat{\mathbf{g}}\right), $$
$$ \widehat{\mathbf{g}}={\left[{\mathbf{Q}}^{\hbox{'}}{\mathbf{R}}^{-1}\mathbf{Q}+{\mathbf{G}}^{-1}{\sigma}_g^{-2}\right]}^{-1}\mathbf{Q}\hbox{'}{\mathbf{R}}^{-1}\left(\mathbf{y}-\mathbf{X}\widehat{\mathbf{b}}-\mathbf{Z}\widehat{\mathbf{a}}\right). $$

In this study, solutions for \( \hat{\mathbf{g}} \) were obtained with ASReml [23] and then back-solved to estimate SNP effects, i.e. to obtain solutions for \( \hat{\mathbf{v}} \), where back-solving was as described by Yang et al. [26].


This paper extends BayesR, following Meuwissen et al. [11], Meuwissen and Goddard [27] and Erbe et al. [10], with modifications to account for the heterogeneous error variance in the phenotypes and to estimate the fixed effects in the model. BayesR [10] assumes that the distribution of SNP effects is a mixture of normal distributions, with the kth component comprising a proportion prk of the mixture and having variance σ 2 k . Similar to the construction of the G matrix [24], SNP alleles in W were standardised prior to analysis in BayesR to have a unit variance (i.e. \( \left[{w}_{i,j}-2 freq\left({w}_j\right)\right]/\sqrt{2 freq\left({w}_j\right)\left(1- freq\left({w}_j\right)\right)} \), where w i,j is the genotype of SNP j for animal i, and freq(w j ) is the allele frequency of j). Note that in the following, the current estimates of the parameters in the Gibbs sampler used in the analysis (e.g. \( \tilde{\mathbf{b}} \)) are distinguished from the final estimates (e.g. \( \widehat{\mathbf{b}} \)) using superscript notation. The model is implemented using the following steps:

  1. (1)

    The error variance was sampled from a scaled inverse chi-squared distribution with mean equal to ẽE − 1 and n – 2 degrees of freedom where \( \tilde{\mathbf{e}}=\mathbf{y}-\mathbf{X}\tilde{\mathbf{b}}-\mathbf{Z}\tilde{\mathbf{a}}-\mathbf{W}\tilde{\mathbf{v}} \), and \( \tilde{\mathbf{b}} \), ã and are the current values of those terms in the model.

  2. (2)

    The fixed effects were sampled from a normal distribution with a mean given by [X ' R − 1 X]− 1 X ' R − 1 y*, following Equation (vi) where y* is the phenotype corrected for the current estimates of all other terms and fixed effects in the model, with variance [X ' R − 1 X]− 1.

  3. (3)

    The polygenic effect was sampled from a normal distribution, with mean for animal i equal to \( {\left[{{\mathbf{Z}}_i}^{\hbox{'}}{\mathbf{R}}_{ii}^{-1}{\mathbf{Z}}_{\mathrm{i}}+{\boldsymbol{A}}_{ii}^{-1}{\upsigma}_a^{-2}\right]}^{-1}{\mathbf{Z}}_i^{\hbox{'}} \), following Equation (vii), where Z i is the row corresponding to animal i in Z and A ii −1 and R ii −1 are the ith diagonal elements of A −1 and R −1, respectively. The variance of the sampling distribution for the polygenic effect for animal i was \( {\left[{{\mathbf{Z}}_i}^{\hbox{'}}{\mathbf{R}}_{ii}^{-1}{\mathbf{Z}}_i+{\mathbf{A}}_{ii}^{-1}{\upsigma}_a^{-2}\right]}^{-1} \). More details on the estimation of the polygenic effects are in the appendix of [28].

  4. (4)

    The polygenic variance was sampled from a scaled inverse chi-squared distribution with mean ãA − 1 ã and n – 2 degrees of freedom.

  5. (5)

    The effect of SNP j was sampled by first sampling a component of the mixture and then drawing j from that distribution. A residual model (i.e. \( {\mathbf{y}}_j^{*}={\mathbf{W}}_{.j}{v}_j+{\mathbf{e}}_{\boldsymbol{j}} \), where y j * is the phenotype corrected for all other effects, excluding the current marker j, W .j is column j from the genotype matrix W, v j is the allelic substitution effect of marker j, and e j is the error) was used to determine the full conditional posterior probability for each distribution k as:

    $$ L\left({v}_{j,k}\Big|{\sigma}_k^2\right)=-0.5\left[ \ln \left(1+{\mathbf{W}}_{.j}\hbox{'}{\mathbf{R}}^{-1}{\mathbf{W}}_{.j}{\sigma}_k^2\right)+{\mathbf{y}}_j^{*}\hbox{'}{\mathbf{R}}^{-1}{\mathbf{y}}_j^{*}-{\mathbf{y}}_j^{*}\hbox{'}{\mathbf{R}}^{-1}{\mathbf{W}}_{.j}{v}_{j,k}\right]+ \ln \left(p{r}_k\right), $$

    where pr k is the current estimate for the proportion of markers from distribution k, v j,k is an estimate of the effect for marker j when sampled from distribution k, and \( {v}_{j,k}={\left[{\mathbf{W}}_{.j}\hbox{'}{\mathbf{R}}^{-1}{\mathbf{W}}_{.j}+{\sigma}_k^{-2}\right]}^{-1}{\mathbf{W}}_{.j}\hbox{'}{\mathbf{R}}^{-1}{\mathbf{y}}^{*} \) (following Equation (vi)). The full conditional posterior probability that marker j is from distribution k was calculated as:

    $$ {\left\{{\displaystyle \sum_{l=1,4}} \exp \left[L\left({v}_{j,l}\Big|{\sigma}_l^2\right)-L\left({v}_{j,k}\Big|{\sigma}_k^2\right)\right]\right\}}^{-1}. $$

    More details for the derivation of the full conditional likelihood function are in Additional file 2 (see Additional file 2). Once the component of the mixture distribution was determined, allele effects were sampled from a normal distribution, using the residual model with a mean v j,k and variance \( {\left[{\mathbf{W}}_{.j}\hbox{'}{\mathbf{R}}^{-1}{\mathbf{W}}_{.j}+{\sigma}_k^2\right]}^{-1} \).

  6. (6)

    The prior pr k was updated by sampling from a Dirichlet distribution given by pr k  ~ Dir(α k  + β k ), where α k is the prior counts for markers from distribution k and β k is the current number of markers with effects sampled from distribution k. The prior assumed one marker from each distribution (i.e. α k =1).

BayesR was implemented with a multi-threaded C++ program to improve computing performance. Based on Erbe et al. [10], we defined four possible distributions for σ 2 v,k with variance equal to 0, 0.0001σ 2 a2, 0.001σ 2 a2, and 0.01σ 2 a2, where σ 2 a2 is the additive genetic variance explained by the pedigree, which was determined prior to the analysis by fitting y = Xb + Za + e (following Equation (ii)) in ASReml [23]. The Gibbs sampler used at least 30 000 iterations, with 20 000 iterations discarded as burn-in, and each analysis had five replicate Gibbs sampling chains. Final parameter estimates were the means of the sampled effects in the post burn-in iterations, which were obtained separately for each chain.

Assessment of the accuracy and bias of predictors

The model that was fitted to the reference datasets always included the estimate of marker effects (\( \widehat{\mathbf{v}} \)) and a polygenic (â) term. Thus, predictions for the Holstein and Jersey validation datasets considered only predictions based on genotype (\( {\widehat{\mathbf{y}}}_{\mathbf{v}}=\mathbf{W}\widehat{\mathbf{v}} \)) or prediction of the total genetic merit of the animal (\( \widehat{\mathbf{y}}=\widehat{\mathbf{a}}+\mathbf{W}\widehat{\mathbf{v}} \)). A proxy for the accuracy of prediction was assessed for these two quantities as the correlation between the prediction and the phenotype [i.e. \( r\left(\mathbf{y},\widehat{{\mathbf{y}}_{\mathbf{v}}}\right) \) or r(y, ŷ)] and the bias in the prediction was assessed as the regression coefficient of phenotype on the prediction [i.e. \( b\left(\mathbf{y},\widehat{{\mathbf{y}}_{\mathbf{v}}}\right) \) or b(y, ŷ)]. For BayesR, the accuracy was the average correlation across the five replicate chains.

Accuracies were calculated for many combinations of dataset (bulls or bulls and cows), for each method (GBLUP or BayesR), breed (single or both breeds), and with or without inclusion of the polygenic term in the prediction. Therefore, to summarize the effects of all these factors on accuracy, the accuracies (r) were analysed using the following linear model rm,n,o,p = datasetm + datasetm.methodn + datasetm.breedo + datasetm.polygenicp + em,n,o,p. We did not use this model to test significance of each factor because the accuracies were not independent. Rather, we used the model to estimate the effect of each factor and reported these estimates. Data on bias were analysed in the same way.

Derterminating the precision of QTL mapping

To map QTL, GEBV in sliding windows of 250 kb (i.e. ‘local’ GEBV) [29] were calculated for each animal from the multi-breed bull and cow reference dataset for the milk production traits (FY, MY, PY, F%, P%). Local GEBV were calculated as \( {\mathbf{W}}_{{\boldsymbol{j}}_1:{\boldsymbol{j}}_2}{\widehat{\mathbf{v}}}_{{\boldsymbol{j}}_1:{\boldsymbol{j}}_2} \), where \( {\mathbf{W}}_{{\boldsymbol{j}}_1:{\boldsymbol{j}}_2} \) and \( {\widehat{\mathbf{v}}}_{{\boldsymbol{j}}_1:{\boldsymbol{j}}_2} \) includes all SNP markers within a 250 kb region of the genome. Adjacent 250 kb windows were separated by 50 kb. The variance of the local GEBV was determined for each breed, trait and window. If the variance of a window was greater than 50 times that of an average window, the window was defined as containing a QTL. Windows that contained QTL were examined for possible candidate loci based on QTL reported in the literature and for genes that were over- or under-expressed in the mammary gland (P < 1 × 10−5) compared to 17 other tissue types [30]. To obtain the latter, RNA was extracted from 18 tissues (including mammary gland) in triplicate from a single lactating Holstein cow at one time point. RNA was sequenced on the Illumina HiSeq2000 platform using 100 base paired end reads. After quality control and filtering, approximately 4 × 107 to 1 × 108 reads per tissue were aligned to the Ensembl annotation of the UMD3.1 bovine genome assembly using Tophat2 [31]. A matrix of gene counts by tissue was constructed with HTSeq [32] and the bioconductor ‘edgeR’ package [33] was used to perform tissue-specific expression analysis where the intercept was the mean gene expression across all tissues.

QTL often affected more than one milk production trait. We summarised the pleiotropic pattern of the effects on milk production traits of windows that were identified to contain QTL as follows. The correlation between local GEBV [34] for all pairwise combinations of traits were calculated for windows for which the local GEBV variance for each trait was greater than 3 times that of an average window. Windows with mid-points separated by less than 0.5 Mbp and with similar patterns of effects were assumed to be part of the same QTL and combined into a single region. QTL were allocated to one of nine possible groups, first according to the QTL’s largest effect on a yield trait (FY, MY, PY) and then by the QTL’s pattern of pleiotropic effects on the remaining two yield traits (defined by either a negative (−) or positive (+) correlation, or with no (n) effect. For example, a 'FY-' QTL had its largest effect on FY and a negative correlation with (either one or both) of MY and PY. Similarly, a 'MYn' QTL had its largest effect on MY with no notable effect on either FY or PY. QTL regions affecting only P% were grouped with the MYn QTL as a change in milk composition was assumed to be a sensitive measure of increased milk volume (i.e. an increase in milk volume with no change in milk solids would result in a decreased P%. Hence P% was considered to be potentially more sensitive to changes in MY than to changes in milk volume than L of milk per lactation).


Variance components from GBLUP and BayesR

The proportions of phenotypic variance captured by polygenic effects, SNPs and residuals for each method were investigated to assess differences between the GBLUP and BayesR. The proportion of phenotypic variance captured by genetic terms (i.e. SNP + polygenic) in GBLUP and BayesR differed by less than 5% for most traits (Table 2). The notable exception was F% (Table 2), for which the BayesR estimate of the genetic variance (SNP + polygenic) was 20 to 30% smaller than the GBLUP estimate.

Table 2 Variance components from GBLUP and BayesR for the combined (bull and cow) reference sets

The total genetic variance accounted for less than 5% of the phenotypic variance for FERT and SURV; 20 to 60% of the phenotypic variance for FY, MY, PY and STAT; and more than 70% of the phenotypic variance for F% and P% (Table 2). The variance captured by SNPs was equal to about 70% of the genetic variance for production traits (FY, MY, PY, F%, P%) and about 90% for STAT and FERT. For SURV, the proportion of genetic variance captured by SNPs in Jersey cattle was low (less than 60%) compared to the estimate in either the Holstein or the multi-breed reference datasets (about 75%). BayesR and GBLUP resulted in very similar estimates of the variance captured by SNPs, relative to the total genetic variance, in Jersey cattle for most traits. However, SNPs in the Holstein and the multi-breed Holstein/Jersey reference datasets were estimated to capture 5 to 10% less of the total genetic variance with BayesR than with GBLUP.

The average number of SNPs in each distribution for BayesR indicates that most SNPs (>99%) had no effect on the traits (Table 3). More than 10 SNPs were estimated to come from the distribution with the largest variance (i.e. 0.01σ 2 a2 ) for P% and F% in the Jersey and the multi-breed Holstein/Jersey reference datasets, and for P%, STAT and FERT in the Holstein dataset. Between 5 and 10 SNPs were estimated from the distribution with the largest variance for FY, F% in Holstein cattle; for FY, MY, PY, STAT and FERT in Jersey cattle and for FY, MY, STAT and FERT in the multi-breed Holstein/Jersey reference dataset. SURV had the lowest number of SNPs estimated from the distribution with the largest variance for all traits for both the Holstein and Jersey datasets.

Table 3 Average number of SNPs estimated to be in each distribution by BayesR 1

Assessment of the accuracy and bias of predictions

Averaged across the five milk production traits, the accuracy of Holstein GBLUP predictions using bulls only in the reference dataset was equal to 0.61, compared with 0.52 if only pedigree information (no SNP effects) was used (see Additional file 3: Table S1). Increasing the size of the reference dataset by including cow records had the largest and most consistent effect on improving the accuracy of genomic predictions for milk production traits (FY, MY, PY, F%, P%) and STAT. The accuracy increased by an average of 5.4% in the Holstein and 4.2% in the Jersey breed when cows were added to the reference datasets for these traits. However, there was little or no benefit of adding cows to the reference dataset for FERT in the Holstein breed and for SURV in each breed. The effect of adding Jersey animals into the combined (bull and cow) Holstein reference dataset had little effect on the accuracies for milk production traits in the Holstein breed, but there was a small average increase in accuracy of 1% for milk production traits in the Jersey breed. Genomic predictions for all Jersey and Holstein validation populations are in Additional file 3 (see Additional file 3: Table S2 and Table S3). Table S1 (see Additional file 3: Table S1) summarises the effects of the reference dataset, the method of prediction and the addition of the polygenic term on the accuracy and bias of predictions.

Accuracies obtained with the BayesR method were generally equal to or higher than those with GBLUP (see Additional file 3: Table S1). The average increase in accuracy using BayesR for milk production traits was equal to 6 and 3% for the bull only and the combined (bull and cow) reference datasets for Holstein cattle and about 5% for Jersey cattle. The largest increases in accuracy when using BayesR were observed for F% for both Holstein and Jersey cattle, probably because of the large-effect loci that segregate for this trait [7,8]. This occurs in spite of the apparent underestimation of phenotypic variance by BayesR reported in Table 3.

Genomic predictions for FERT (r ≈ 0.50) and SURV (r ≈ 0.43) in Holstein cattle were little affected by the prediction method or reference dataset used (see Additional file 3: Table S1). In Jersey cattle, accuracies obtained for FERT when using SNP information and only bulls for training were lower than the accuracy of pedigree-based predictions obtained when using the combined (bull and cow) reference dataset. Accuracies of genomic predictions for SURV in Jersey were rarely higher than those based on pedigree data. It is likely that these results for FERT and SURV reflect the low heritability and low accuracy of the records for these traits.

Adding polygenic effects to the prediction model increased the accuracy for milk production traits by on average 1 (Holstein) and 3% (Jersey) when using the combined (bull and cow) reference datasets (see Additional file 3: Table S1). However, adding polygenic effects increased bias by on average 13 and 17% in Holstein and Jersey cattle. The effect of adding polygenic effects on prediction bias was similar for the bull only and combined (bull and cow) reference datasets, and a similar bias was also observed in the pedigree (only) predictions. It seems that the pedigree relationships cause the increase in bias observed when polygenic effects are added to the genomic predictions, and this increase in bias was independent of whether the bull only or combined reference datasets were used (note that national genomic evaluations in ADHIS regress parent averages by approximately 0.6 to account for this bias). When predictions were based on SNP effects only, the overestimation of GEBV was greater in Jersey (average slope =0.94 for milk production traits) than in Holstein cattle (average slope =1.02) but, in general, the slope of the regressions did not differ notably from 1.

Within- and multi-breed genomic predictions for Holstein and Jersey

Using the combined (bull and cow) reference datasets and excluding polygenic effects was found to give the ‘best’ (highest accuracy with least bias) genomic predictions for Holstein and Jersey validation animals. The observed accuracies and bias for these reference datasets when using only SNP effects for prediction are in Table 4 for milk production traits. These results show that BayesR resulted in an average increase in accuracy of 3 and 6% in the Holstein and Jersey single breed reference datasets, compared to GBLUP. There was little effect (±1%) of using the multi-breed reference dataset on prediction accuracies when using BayesR, and a small favourable effect (<2%) for GBLUP.

Table 4 Accuracy and bias of within- and multi-breed genomic predictions for milk production traits

Across-breed genomic predictions

Table 5 gives the accuracy and bias when prediction equations were tested in a breed not included in the reference population. Using predictions from the other breed resulted in a 40% reduction in prediction accuracy for the Holstein and Jersey breeds (Table 5), compared to prediction accuracies when the target breed was included in the reference dataset (Tables 4). The accuracy of prediction for the Australian Red breed was on average 3 and 9% greater when using a reference population that included both Jersey and Holstein animals compared to a reference population that included either Holstein or Jersey animals, respectively (Australian Red animals were never included in the reference population).

Table 5 Accuracy and bias of across-breed genomic predictions for milk production traits

Across-breed predictions showed an overall benefit of using BayesR compared to GBLUP (Table 5). Across various traits and breed combinations, BayesR outperformed GBLUP by on average 7% for all across-breed predictions in milk production traits. BayesR showed a very large (17%) advantage for F%, probably due to the segregation of SNPs with large effects [7,8] and a consistent advantage of 5 to 7% for FY, MY and PY. The combined effect of using BayesR and a multi-breed reference increased the accuracy of genomic predictions for the Australian Red animals by 8 and 17%, compared to the accuracies obtained from a single-breed reference dataset of Holstein or Jersey animals using GBLUP.

Precision of QTL mapping with BayesR and GBLUP

We hypothesized that BayesR results in more accurate across-breed genomic predictions because it locates QTL effects more precisely in the genome. We examined the QTL regions identified by BayesR and GBLUP for QTL previously reported in the literature and identified several regions that contain QTL for milk production traits (e.g. DGAT1 [19], ABCG2 [35], FASN [36], SCD [37], the casein complex, LALBA and PAEP (formally LGB) [38]; GHR [39] and AGPAT6 [40]). In most cases, except for DGAT1 and GHR, the gene in the QTL region that was most over- (or under-) expressed in the mammary gland (compared to the 17 other tissues analysed) matched the loci reported in the literature (Table 6). Although GHR is cited as a candidate gene for the region identified on bovine chromosome BTA20, CCL28 showed higher differential expression (P < 1 × 10−29) and it should be noted that this region is reported to contain other QTL [41].

Table 6 Regions with large variance in local GEBV from BayesR for milk production traits

To investigate the precision of GBLUP versus BayesR in mapping QTL, we specifically investigated the mapping of the PAEP gene. Figure 2 shows the absolute value of the SNP effect estimates in the region. With BayesR, SNPs were identified that have larger effects on milk production traits than most of the surrounding SNPs. In contrast, with GBLUP all the SNPs in the identified region had small effects, although there was possibly a small increase in SNP effect estimates for SNPs for which BayesR also found larger effects. In spite of these small effects, somewhat surprisingly, the local variance in GEBV using GBLUP did find peaks in the region of PAEP (Figure 3). This is probably due to the SNP estimates in the linear combination for local GEBV reinforcing each other in the area of the peak but almost cancelling each other out in other regions. However, a careful inspection showed that, although GBLUP often indicated a region with large GEBV variance near the QTL, the maximum variance was larger and more concentrated near the reported QTL for BayesR than for GBLUP. This is due to the heterogeneous variance assumptions in the BayesR method, which allow SNPs in high LD with the QTL to have larger effects.

Figure 2

SNP effects estimated by BayesR and GBLUP for FY, MY and PY near the PAEP gene. Shown is the (mean corrected) absolute value of SNP effect estimates from the bull and cow, multi-breed reference population. Traits are FY = fat yield, MY = milk yield and PY = protein yield. The position of PAEP on BTA11 is marked (*). Note the changed y-axis scale for each graph.

Figure 3

Local GEBV variance near the PAEP gene for FY, MY and PY using BayesR and GBLUP . Shown is the (mean corrected) GEBV variance in 250 kb windows for Holstein and Jersey reference animals from SNP effects estimated from the bull and cow, multi-breed reference population. Traits are FY = fat yield, MY = milk yield and PY = protein yield. The position of PAEP on BTA11 is marked (*). Note the changed y-axis scale for each graph.

PAEP is reported to have a large effect on PY and smaller effects on MY and FY, and encodes the primary whey protein of bovine milk [38]. Although GBLUP indicates a region of high GEBV variance that encompasses PAEP, BayesR captured this pattern of effects more accurately and estimated appropriately the SNPs with large effects near PAEP in the analysis of PY. The analysis of FY, MY and PY with GBLUP likely captured the effect of PAEP, but the effect seemed to be dispersed over a large region that covered possibly the entire 5 Mbp region shown in Figure 2.

A second example of QTL mapping is provided in Figures S2 and S3 (see Additional file 1: Figures S2 and S3) for AGPAT6. In agreement with the other reports [40], we observed AGPAT6 to have a large effect on FY with smaller effects on MY and PY. Similar to PAEP, the effect that was estimated for AGPAT6 by GBLUP was spread over a larger genomic region than the effect that was estimated by BayesR. Interestingly, the effects of AGPAT6 on PY estimated by both methods were very similar. It seems that the difference between BayesR and GBLUP declines as the effect size of a locus decreases. In addition, there may be two other QTL near AGPAT6 that affect MY and PY.

All QTL with large effects (defined by a local GEBV variance greater than 50 times that of an average window) identified in the BayesR analysis with the multi-breed bull and cow reference dataset and their pleiotropic effects are presented in Table 6. Several of these regions were also identified by GBLUP and by a previous study on this data using a genome-wide association approach with single-marker regression analysis [18]. We identified QTL from eight of the nine possible groupings for pleiotropic effects. That is, large-effect QTL for milk yield traits could have positive, negative or no (observable) correlation with other milk yield traits.


The accuracy of genomic predictions using GBLUP depends on the size of the reference population [42,43]. Thus, when a large reference population is available for a single breed of dairy cattle, such as Holstein, GBLUP captures most of the potential accuracy for genomic predictions and there seems little benefit in using nonlinear methods for prediction, such as BayesR, or high-density genomic markers [9]. However, when predictions need to be more robust and are used to predict genetic merit of distantly-related animals, such as animals in future generations or animals from different breeds, the benefits of genomic prediction using GBLUP with medium-density SNPs decreases compared to nonlinear methods. This was first pointed out by Habier et al. [1] who reported poor predictions with GBLUP over successive generations compared to BayesB. Here, we show the advantages of nonlinear genomic prediction methods with across-breed predictions. We showed that the accuracy of genomic predictions obtained using BayesR increased by 8 and 17% compared to GBLUP predictions from a single breed when they were estimated for Australian Red animals from a multi-breed Holstein/Jersey reference population. In regions that contain known mutations that affect milk production, we demonstrated that BayesR localises SNP effects to smaller genomic regions than GBLUP. Thus, robust genomic prediction of genetic merit and QTL mapping are related problems, which can both be accomplished by nonlinear methods such as BayesR.

Increasing the size of the reference population by including cows increased accuracy of genomic predictions by 4.2 to 5.4% for traits with moderate to high heritability. Adding cows had little or no effect on the bias of predictions. This is in contrast to the bias introduced when adding cows in French and US studies [44,45], possibly because, in our case, cows were sampled from commercial herds with little potential for bias from preferential treatment and animals were not selectively genotyped based on genetic merit. In our data, adding cows benefitted predictions despite their phenotypic records being less accurate than records on bulls. This is probably because the size of the reference population increased substantially by adding the cows, by almost three times in the Holstein data and five times in the Jersey data. A further benefit of using the Holstein cows was that they were more genetically diverse than the Holstein bull population (see Additional file 1: Figure S4). This diversity is useful to identify SNPs that track causative mutations, and thus contributes to improving the robustness of genomic predictions. Since the cows that were added to the reference population were animals from commercial farms, it is possible that some animals may have been recently admixed with another breed and present varying degrees of traditional ancestry with Australian dairy cattle, such as British Friesians.

The BayesR QTL mapping approach, combined with expression data from mammary gland tissue, was powerful for the identification of many previously reported QTL that are known to be involved in milk production. For the known QTL, the patterns of pleiotropic effects estimated by BayesR matched the reported effects for some mutations. This study suggests that QTL mapping using a nonlinear approach and considering multiple traits may improve the mapping precision. This will be most beneficial for QTL with large effects on one trait and smaller effects on another trait. For example, the large effect of AGPAT6 on FY could help to more precisely map the smaller effects of this locus on PY. We observed little difference between GBLUP and BayesR in the across-breed prediction for P%, presumably because it is controlled by many QTL with small effects. A strategy that uses multiple traits to assist the localisation of QTL may be useful to increase robustness and accuracy of across-breed predictions for traits such as P%.

Our analysis identified several interesting candidate loci for milk production traits that (1) were identified as QTL with both BayesR and GBLUP, (2) were highly over- (or under-) expressed in the mammary gland compared to the other 17 tissue types analysed and (3) have functions in milk synthesis that have been described independently. It may be interesting to further study these loci, which include: SLC37A1 that encodes a glucose-6-phosphate transporter involved in the homeostasis of blood glucose [46]; MUC1 that encodes a glycoprotein that is a component of the surface membrane of fat globules in milk [47] and is also assumed to contribute to epithelial cell defence against bacteria; and CSF2RB, which is involved in the JAK-STAT signalling pathway (the JAK-STAT pathway has a central role in prolactin signalling [48]). Another promising novel candidate gene is TPH1, which is involved in mammary gland development (GO:0067074) and serotonin synthesis [49].


The use of a nonlinear method (such as BayesR) and high-density SNP genotypes, combined with a multi-breed reference population that included cows and bulls, led to the highest accuracies of genomic prediction, especially for a breed that was not included in the reference population. The advantage of BayesR over GBLUP is due to its better use of SNPs that are close to the causal mutation. Thus, the accuracy of GEBV derived using BayesR should be greater than that of GEBV derived using GBLUP for a variety of target populations and across multiple generations. It seems that BayesR is a useful methodology to map genes responsible for variation in quantitative traits.


  1. 1.

    Habier D, Tetens J, Seefried F-R, Lichtner P, Thaller G: The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genet Sel Evol 2010, 42:5.

  2. 2.

    Saatchi M, Ward J, Garrick DJ: Accuracies of direct genomic breeding values in Hereford beef cattle using national or international training populations. J Anim Sci 2013, 91:1538–1551.

  3. 3.

    Kachman SD, Spangler ML, Bennett GL, Hanford KJ, Kuehn LA, Snelling WM, Thallman RM, Saatchi M, Garrick DJ, Schnabel RD, Taylor JF, Pollak EJ: Comparison of molecular breeding values based on within- and across-breed training in beef cattle. Genet Sel Evol 2013, 45:30.

  4. 4.

    Pryce JE, Gredler B, Bolormaa S, Bowman PJ, Egger-Danner C, Fuerst C, Emmerling R, Solkner J, Goddard ME, Hayes BJ: Short communication: genomic selection using a multi-breed, across-country reference population. J Dairy Sci 2011, 94:2625–2630.

  5. 5.

    The Bovine Genome Sequencing Analysis Consortium, Elsik CG, Tellam RL, Worley KC: The genome sequence of Taurine cattle: a window to ruminant biology and evolution. Science 2009, 324:522–528.

  6. 6.

    de Roos APW, Hayes BJ, Spelman RJ, Goddard ME: Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle. Genetics 2008, 179:1503–1512.

  7. 7.

    VanRaden PM, van Tassell CP, Wiggans GR, Sonstegaard TS, Schnabel RD, Taylor JF, Schenkel F: Invited review: reliability of genomic predictions for North American Holstein bulls. J Dairy Sci 2009, 92:16–24.

  8. 8.

    Hayes BJ, Pryce J, Chamberlain AJ, Bowman PJ, Goddard ME: Genetic architecture of complex traits and accuracy of genomic prediction: Coat colour, milk-fat percentage, and type in Holstein cattle as contrasting model traits. PLoS Genet 2010, 6:e1001139.

  9. 9.

    VanRaden PM, Null DJ, Sargolzaei M, Wiggans GR, Tooker ME, Cole JB, Sonstegard TS, Connor EE, Winters M, van Kaam JBCHM, Valentini A, Van Doormaal BJ, Faust MA, Doak GA: Genomic imputation and evaluation using high-density Holstein genotypes. J Dairy Sci 2013, 96:668–678.

  10. 10.

    Erbe M, Hayes BJ, Matukumalli LK, Goswami S, Bowman PJ, Reich CM, Mason BA, Goddard ME: Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J Dairy Sci 2012, 95:4114–4129.

  11. 11.

    Meuwissen THE, Hayes BJ, Goddard ME: Prediction of total genetic value using genome-wide dense marker maps. Genetics 2001, 157:1819–1829.

  12. 12.

    Olson KM, VanRaden PM, Tooker ME: Multibreed genomic evaluations using purebred Holsteins, Jerseys, and Brown Swiss. J Dairy Sci 2012, 95:5378–5383.

  13. 13.

    Karoui S, Carabano MJ, Diaz C, Legarra A: Joint genomic evaluation of French dairy cattle breeds using multiple-trait models. Genet Sel Evol 2012, 44:39.

  14. 14.

    Bolormaa S, Pryce JE, Kemper KE, Savin K, Hayes BJ, Barendse W, Zhang Y, Reich CM, Mason BA, Bunch RJ, Harrison BE, Reverter A, Herd RM, Tier B, Graser HU, Goddard ME: Accuracy of prediction of genomic breeding values for residual feed intake, carcass and meat quality traits in Bos taurus , Bos indicus and composite beef cattle. J Anim Sci 2013, 91:3088–3104.

  15. 15.

    Saatchi M, Garrick DJ: Accuracies of genomic predictions in US beef cattle. Proc Assoc Advmt Anim Breed Genet 2013, 20:207–210.

  16. 16.

    Schrooten C, Schopen GCB, Parker A, Medley A, Beatson P: Across-breed genomic evaluation based on bovine high density genotypes and phenotypes of bulls and cows. Proc Assoc Advmt Anim Breed Genet 2013, 20:138–141.

  17. 17.

    Wiggans GR, VanRaden PM, Cooper TA: The genomic evaluation system in the United States: past, present, future. J Dairy Sci 2011, 94:3202–3211.

  18. 18.

    Raven L-A, Cocks BG, Hayes BJ: Multibreed genome wide association can improve precision of mapping causative variants underlying milk production in dairy cattle. BMC Genomics 2014, 15:62.

  19. 19.

    Grisart B, Coppieters W, Farnir F, Karim L, Ford C, Berzi P, Cambisano N, Mni M, Reid S, Simon P, Spelman R, Georges M, Snell R: Positional candidate cloning of a QTL in dairy cattle: Identification of a missense mutation in the bovine DGAT1 gene with major effect on milk yield and composition. Genome Res 2002, 12:222–231.

  20. 20.

    Browning SR, Browning BL: Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 2007, 81:1084–1097.

  21. 21.

    Australian Dairy Herd Improvement Report 2012. []

  22. 22.

    Garrick DJ, Taylor JF, Fernando RL: Deregressing estimated breeding values and weighting information for genomic regression analyses. Genet Sel Evol 2009, 41:55.

  23. 23.

    Gilmour AR, Gogel BJ, Cullis BR, Thompson R: ASReml User Guide 2.0. Hemel Hempsted, UK: VSN International Ltd.; 2006.

  24. 24.

    Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, Goddard ME, Visscher PM: Common SNPs explain a large proportion of the heritability for human height. Nat Genet 2010, 42:565–569.

  25. 25.

    Mrode RA: Linear Models for the Prediction of Animal Breeding Values. 2nd edition. Wallingford: CABI Publishing; 2005.

  26. 26.

    Yang J, Lee SH, Goddard ME, Visscher PM: GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 2011, 88:76–82.

  27. 27.

    Meuwissen THE, Goddard ME: Mapping multiple QTL using linkage disequilibrium and linkage analysis information and multitrait data. Genet Sel Evol 2004, 36:261–279.

  28. 28.

    Meuwissen TH, Goddard ME: Prediction of identity by descent probabilities from marker-haplotypes. Genet Sel Evol 2001, 33:605–634.

  29. 29.

    Fan B, Onteru SK, Du ZQ, Garrick DJ, Stalder KJ, Rothschild MF: Genome-wide association study identifies loci for body composition and structural soundness traits in pigs. PLoS ONE 2011, 6:e14726.

  30. 30.

    Chamberlain AC, Vander Jagt CJ, Goddard ME, Hayes BJ: A Gene Expression Atlas from Bovine RNAseq Data. In Proceedings of the 10th World Congress of Genetics Applied to Livestock Production: 17–22 August; Vancouver. 2014.

  31. 31.

    Kim D, Pertea G, Trapnell C, Pimentel H, Kelley DR, Salzberg SL: TopGat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 2013, 14:R36.

  32. 32.

    HTSeq: Analysing High-Throughput Sequencing Data with Python. []

  33. 33.

    Robinson MD, McCarthy DJ, Smyth GK: EdgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010, 26:139–140.

  34. 34.

    Saatchi M, Garrick DJ, Tait RG, Mayes MS, Drewnoski M, Schoonmaker J, Diaz C, Beitz DC, Reecy JM: Genome-wide association and prediction of direct genomic breeding values for composition of fatty acids in Angus beef cattle. BMC Genomics 2013, 14:730.

  35. 35.

    Cohen-Zinder M, Seroussi E, Larkin DM, Loor JJ, Everts-van der Wind A, Lee J-H, Drackley JK, Band MR, Hernandez AG, Shani M, Lewin HA, Weller JI, Ron M: Identification of a missense mutation in the bovine ABCG2 gene with a major effect on the QTL on chromosome 6 affecting milk yield and composition in Holstein cattle. Genome Res 2005, 15:936–944.

  36. 36.

    Roy R, Ordovas L, Zaragoza P, Romero A, Moreno C, Altarriba J, Rodellar C: Association of polymorphisms in the bovine FASN gene with milk-fat content. Anim Genet 2006, 37:215–218.

  37. 37.

    Mele M, Conte G, Castiglioni B, Chessa S, Macciotta NPP, Serra A, Buccioni A, Pagnacco G, Secchiari P: Stearoyl-coenzyme A desaturase gene polymorphism and milk fatty acid composition in Italian Holsteins. J Dairy Sci 2007, 90:4458–4465.

  38. 38.

    Ng-Kwai-Hang KF: A Review of the Relationship between Milk Protein Polymorphism and Milk Composition/Milk Production. In Proceedings of the International Dairy Federation Seminar: 25–27 Febuary; Palmerston North, New Zealand. 1997:22–37.

  39. 39.

    Blott S, Kim JJ, Moisio S, Schmidt-Kuntzel A, Cornet A, Berzi P, Cambisano N, Ford C, Grisart B, Johnson D, Karim L, Simon P, Snell R, Spelman R, Wong J, Vilkki J, Georges M, Farnir F, Coppieters W: Molecular dissection of a quantitative trait locus: a phenylalanine-to-tyrosine substitution in the transmembrane domain of the bovine growth hormone receptor is associated with a major effect on milk yield and composition. Genetics 2003, 163:253–266.

  40. 40.

    Wang X, Wurmser C, Pausch H, Jung S, Reinhardt F, Tetens J, Thaller G, Fries R: Identification and dissection of four major QTL affecting milk fat content in the German Holstein-Friesian population. PLoS ONE 2012, 7:e40711.

  41. 41.

    Chamberlain AJ, Hayes BJ, Savin K, Bolormaa S, McPartlan HC, Bowman PJ, Van der Jagt C, MacEachern S, Goddard ME: Validation of single nucleotide polymorphisms associated with milk production traits in dairy cattle. J Dairy Sci 2012, 95:864–875.

  42. 42.

    Daetwyler HD, Pong-Wong R, Villanueva B, Woolliams JA: The impact of genetic architecture on genome-wide evaluation methods. Genetics 2010, 185:1021–1031.

  43. 43.

    Goddard M: Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 2009, 136:245–257.

  44. 44.

    Dassonneville R, Baur A, Fritz S, Boichard D, Ducrocq V: Inclusion of cow records in genomic evaluations and impact on bias due to preferential treatment. Genet Sel Evol 2012, 44:40.

  45. 45.

    Wiggans GR, Cooper TA, VanRaden PM, Cole JB: Technical note: adjustment of traditional cow evaluations to improve accuracy of genomic predictions. J Dairy Sci 2011, 94:6188–6193.

  46. 46.

    Pan C-J, Chen S-Y, Jun HS, Lin SR, Mansfield BC, Chou JY: SLC37A1 and SLC37A2 are phosphate-linked, glucose-6-phosphate antiporters. PLoS ONE 2011, 6:e23157.

  47. 47.

    Pallesen LT, Andersen MH, Nielsen RL, Berglund L, Petersen TE, Rasmussen LK, Rasmussen JT: Purification of MUC1 from bovine milk-fat globules and characterization of a corresponding full-length cDNA clone. J Dairy Sci 2001, 84:2591–2598.

  48. 48.

    Watson CJ, Burdon TG: Prolactin signal transduction mechanisms in the mammary gland: the role of the Jak/Stat pathway. Rev Reprod 1996, 1:1–5.

  49. 49.

    Hernandez LL, Stiening CM, Wheelock JB, Baumgard LH, Parkhurst AM, Collier RJ: Evaluation of serotonin as a feedback inhibitor of lactation in the bovine. J Dairy Sci 2008, 91:1834–1844.

  50. 50.

    R: A language and environment for statistical computing. []

Download references


Authors thank Gert Nieuwhof and Kon Konstantinov from the Australian Dairy Herd Improvement Scheme (ADHIS) and the Dairy Futures Co-operative Research Centre for the provision of data and resources to conduct this research. The Australian Red Breed Association, Holstein Australia, Jersey Australia and many dairy farmers are warmly thanked for sample collection. This research was supported under Australian Research Council’s Discovery Projects funding scheme (project DP1093502). The views expressed herein are those of the authors and are not necessarily those of the Australian Research Council.

Author information

Correspondence to Kathryn E Kemper.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

KEK performed the analysis and wrote the first draft of the paper; CM and BM collected and performed genotyping of cows and Australian Red animals; PJB extracted data and implemented the multi-threaded version of BayesR; CJV conducted the tissue-specific expression analysis; AJC collected tissues and performed RNA sequencing; BJH quality checked and performed imputation of genotypes and helped interpret the results; MEG supervised the analysis and interpretation of the results. All authors read and approved the manuscript.

Additional files

Additional file 1: Figure S1.

Linkage disequilibrium in Holstein, Jersey and Australian Red dairy cattle. Figure S1. Shows the average r2 (correlation) between SNP pairs and the average distance between adjacent SNPs (vertical grey line) on BTA1. Figure S2. SNP effects estimated by BayesR and GBLUP for FY, MY and PY near the gene AGPAT6. Figure S2. shows the (mean corrected) absolute SNP effects for Holstein and Jersey animals from the bull and cow, multi-breed reference population. Traits are FY = fat yield, MY = milk yield and PY = protein yield. The position of AGPAT6 on BTA27 is marked (*). Note the y-axis scale is changed for each graph. Figure S3. Local GEBV variance near the gene AGPAT6 for FY, MY and PY using BayesR and GBLUP. Figure S3 shows the (mean corrected) GEBV variance in 250 kb windows for Holstein and Jersey animals from the bull and cow, multi-breed reference population. Traits are FY = fat yield, MY = milk yield and PY = protein yield. The position of the gene AGPAT6 on BTA27 is marked (*). Note the y-axis scale is changed for each graph. Figure S4. Relationship between Holstein and Jersey cows and bulls in the reference and validation datasets. Figure S4. shows the principle components from the G matrix using the eigen() function in R [50]. The named subset (i.e. Holstein cows, top left) is highlighted in blue in each panel.

Additional file 2:

Derivation of the full conditional likelihood function. Additional file 2 provides the details for the derivation of the full conditional likelihood function.

Additional file 3:

Effect of reference dataset, prediction method and polygenic term on genomic predictions of milk traits, stature, fertility and survival. Table S1. Bold characters show the prediction accuracies (Acc.) and biases using only pedigree information or GBLUP with bull only or bull and cow reference datasets. Below the numbers in bold characters are the average effect on the prediction when changing of the method used to predict SNP effects (+BayesR), adding non-target breed animals to the reference (i.e. multi-breed reference, e.g. +Holstein) and adding the polygenic effect (+polygenic). These estimates are from the linear model fitted to the prediction results and effects are deviations from the GBLUP predictions. Table S2. Genomic prediction accuracy and bias for the Holstein validation dataset. Genomic predictions accuracy and bias for the Holstein validation population using different combinations of reference datasets (bulls or bulls and cows), method of prediction (GBLUP or BayesR), breed composition in the reference population (single or both breeds), and with or without inclusion of the polygenic term in the prediction. Table S3. Genomic prediction accuracy and bias for the Jersey validation dataset. Genomic predictions accuracy and bias for the Jersey validation population using different combinations of reference datasets (bulls or bulls and cows), method of prediction (GBLUP or BayesR), breed composition in the reference population (single or both breeds), and with or without inclusion of the polygenic term in the prediction. Table S4. Across-breed prediction accuracies for Australian Red animals. Prediction accuracies and bias for genomic predictions in Australian Red animals are provided separately for the bull and cow validation sets. Table S5. Details on the genes identified from the mammary expression data and listed in Table 6 . Tables S5 gives the Ensebl gene ID, full name, chromosome, gene start and end on the bovine genome (bp) and the description of the genes listed in Table 6 .

Rights and permissions

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark


  • Quantitative Trait Locus
  • Reference Population
  • Quantitative Trait Locus Mapping
  • Genomic Prediction
  • Reference Dataset