# Computational strategies for alternative single-step Bayesian regression models with large numbers of genotyped and non-genotyped animals

- Rohan L. Fernando
^{1}Email authorView ORCID ID profile, - Hao Cheng
^{1}, - Bruce L. Golden
^{2}and - Dorian J. Garrick
^{1, 3}

**Received: **8 June 2016

**Accepted: **22 November 2016

**Published: **8 December 2016

## Abstract

### Background

Two types of models have been used for single-step genomic prediction and genome-wide association studies that include phenotypes from both genotyped animals and their non-genotyped relatives. The two types are breeding value models (BVM) that fit breeding values explicitly and marker effects models (MEM) that express the breeding values in terms of the effects of observed or imputed genotypes. MEM can accommodate a wider class of analyses, including variable selection or mixture model analyses. The order of the equations that need to be solved and the inverses required in their construction vary widely, and thus the computational effort required depends upon the size of the pedigree, the number of genotyped animals and the number of loci.

### Theory

We present computational strategies to avoid storing large, dense blocks of the MME that involve imputed genotypes. Furthermore, we present a hybrid model that fits a MEM for animals with observed genotypes and a BVM for those without genotypes. The hybrid model is computationally attractive for pedigree files containing millions of animals with a large proportion of those being genotyped.

### Application

We demonstrate the practicality on both the original MEM and the hybrid model using real data with 6,179,960 animals in the pedigree with 4,934,101 phenotypes and 31,453 animals genotyped at 40,214 informative loci. To complete a single-trait analysis on a desk-top computer with four graphics cards required about 3 h using the hybrid model to obtain both preconditioned conjugate gradient solutions and 42,000 Markov chain Monte-Carlo (MCMC) samples of breeding values, which allowed making inferences from posterior means, variances and covariances. The MCMC sampling required one quarter of the effort when the hybrid model was used compared to the published MEM.

### Conclusions

We present a hybrid model that fits a MEM for animals with genotypes and a BVM for those without genotypes. Its practicality and considerable reduction in computing effort was demonstrated. This model can readily be extended to accommodate multiple traits, multiple breeds, maternal effects, and additional random effects such as polygenic residual effects.

## Keywords

## Background

Two types of equivalent mixed linear models are used for whole-genome analyses in livestock [1]. The first type, which we refer to as marker effects models (MEM), includes random effects (\(\varvec{\alpha }\)) of marker genotype covariates (\({\mathbf {M}}_{g}\)) in the model [2, 3]. The second type, which we refer to as breeding value models (BVM), includes the breeding values of the animals, \({\mathbf {u}}_{g}={\mathbf {M}}_{g}\varvec{\alpha }\), as a random effect that has a covariance computed from \({\mathbf {M}}_g\) [1, 2, 4–6] rather than from the pedigree.

It was shown that the BVM can be adapted for what is known as single-step genomic best linear unbiased prediction (SS-GBLUP) that combines information from animals with genotypes and from those without genotypes in a single BLUP analysis [7–9]. However, the SS-GBLUP analysis requires computing the inverse of \({\mathbf {G}}\), which is the matrix of genomic relationships of the animals with genotypes [8, 9]. When the number \(N_{g}\) of genotyped animals exceeds the number of markers, \({\mathbf {G}}\) is singular, but a full-rank matrix such as \({\mathbf {G}}^{*}=0.95{\mathbf {G}}+0.05{\mathbf {A}},\) with \({\mathbf {A}}\) being the pedigree-based relationship matrix might be used in its place. Single-step analyses based on the MEM do not require computing \({\mathbf {G}}\) or its inverse [10]. Furthermore, Bayesian regression analyses based on the MEM are not limited to assuming a normal prior for \(\varvec{\alpha }\), which is implicit in SS-GBLUP; Bayesian regression models can accommodate various priors including the *t* distribution as in BayesA [3, 11], the double exponential distribution as in Bayesian LASSO [12] or mixtures of the *t* distribution or the normal distribution [3, 11, 13] as in BayesB or BayesC. However, the MME that correspond to single-step MEM (SS-MEM) types of models contain dense blocks that correspond to the imputed genotypes of animals with missing genotypes [10], and those blocks can be large if many animals have missing genotypes.

Liu et al. [14] developed a single-step method based on the BVM with direct estimation of marker effects (SSME-GBLUP). An advantage of that method over SS-GBLUP is that it does not require computing \({\mathbf {G}}\) or its inverse. Also, their method can be used for Bayesian regression models [14]. However, the MME for SSME-GBLUP contains expressions that involve the inverse of the pedigree-based relationship matrix, \({\mathbf {A}}_{gg},\) for the animals with genotypes. This is a dense matrix, and therefore a computational strategy was proposed to avoid computing its inverse but it requires solving a dense system of equations of order \(N_{g}\) within each round of Jacobi or pre-conditioned conjugate gradient (PCG) iteration for solution of the MME or within each round of MCMC sampling for Bayesian inference with models such as BayesA or BayesB [3]. Equation (A1) in Legarra and Ducrocq [15] also present a set of similar MME with marker effects for genotyped animals and breeding values for non-genotyped animals. As with the MME in Liu et al. [14], the advantage of the MME of Legarra and Ducrocq [15] is that they do not require the computation of \({\mathbf {G}}\) or its inverse but require computing the inverse of \({\mathbf {A}}_{gg}\). Recently, in some livestock such as dairy cattle, \(N_{g}\) has increased towards a million or more, and thus, solving a dense system of equations of order \(N_{g}\) within each round of iteration will place a heavy burden on SSME-GBLUP in computing time and storage requirements.

The objective of this paper is to present computational strategies for whole-genome analyses based on the SS-MEM that avoid storing large, dense blocks of the MME that involve imputed genotypes. First, we will show this for the MME given in [10]. Second, we will present what we refer to as a hybrid type model (HM) that uses a MEM for the animals with marker genotypes and a BVM for animals without genotypes. The MME that correspond to this model also has dense blocks that correspond to animals with missing genotypes. However, in Bayesian regression analyses based on this hybrid model, storing the dense blocks can be avoided even more efficiently than was the case for the MME given in [10]. Finally, we will present the computer storage and time required for a real application.

## Theory

In most genomic analyses, the columns of the matrix \({\mathbf {M}}_{g}\) of marker covariates are centered to have zero expectations. This ensures that the vector of breeding values, \({\mathbf {u}}_{g}={\mathbf {M}}_{g}\varvec{\alpha }\), has a mean of \({\mathbf {0}}\). Centering \({\mathbf {M}}_{g}\) requires knowing the expected value of the marker covariates for founder animals. Often, these expected values are unknown, but can be incorporated into the model as a location parameter [10, 16]. However, to simplify our presentation without loss of generality we assume that \({\mathbf {M}}_{g}\) is a matrix of correctly centered marker covariates.

### Marker effects model for single-step Bayesian regression

*n*and those for the animals with genotypes with a subscript

*g*. Thus, \({\mathbf {y}}_{n}\) and \({\mathbf {y}}_{g}\) are the vectors of phenotypic values, \({\mathbf {X}}_{n}\) and \({\mathbf {X}}_{g}\) are the incidence matrices for the fixed effects, \(\varvec{\beta }\), \({\mathbf {Z}}_{n}\) and \({\mathbf {Z}}_{g}\) are incidence matrices that relate the breeding values of animals, \(\left[ \begin{array}{l} {\mathbf {M}}_{n}\varvec{\alpha }+\varvec{\epsilon }\\ {\mathbf {M}}_{g}\varvec{\alpha } \end{array}\right]\), to the phenotypic values, \({\mathbf {M}}_{g}\) is the matrix of centered marker covariates for animals with genotypes, \({\mathbf {M}}_{n}={\mathbf {A}}_{ng}{\mathbf {A}}_{gg}^{-1}{\mathbf {M}}_{g}\), is the matrix of imputed marker covariates for animals with missing genotypes, \(\varvec{\alpha }\) is the vector of random marker effects, \(\varvec{\epsilon }\) is the vector of imputation residuals with null means and covariance matrix proportional to the inverse of \({\mathbf {A}}^{nn}\), the sub-matrix corresponding to animals with missing genotypes in the inverse of the matrix \({\mathbf {A}}\) of pedigree-based additive relationships, and \({\mathbf {e}}\) is a vector of residuals. The matrix of imputed genotypes can be more efficiently computed by solving the sparse system of equations [10]:

Depending on the prior used for \(\varvec{\alpha }\), Model (1) can be used for a range of single-step Bayesian regression analyses, including single-step BLUP, BayesA, BayesB, BayesC or Bayesian LASSO [10]. Those models (1) and their corresponding analyses assume that the breeding values can be adequately explained by the marker covariates. If that assumption does not hold, a polygenic residual with a mean of zero and a covariance matrix that is proportional to **A **can be included as an additional effect in the model.

#### Computing strategies

First, we will discuss the calculations necessary to apply PCG to (3). Following this, we will discuss how to use (3) to obtain Markov chain Monte-Carlo (MCMC) samples of the location parameters of Model (1) from their full conditional distributions.

*Preconditioned conjugate gradient iteration*The PCG algorithm is widely used to iteratively solve the MME, e.g., [17, 18]. In each iteration of PCG, the left-hand-side of the MME (LHS-MME) is post-multiplied by a vector. However, the LHS-MME given in (3) contains two dense sub-matrices, \({\mathbf {Z}}'_{n}{\mathbf {Z}}_{n}{\mathbf {M}}_{n}\) and its transpose, that may be too large for storage in memory; the remaining sub-matrices in LHS-MME can be stored in memory either because they are not too large or because they are sparse. In each round of PCG, \({\mathbf {Z}}'_{n}{\mathbf {Z}}_{n}{\mathbf {M}}_{n}\) needs to be post-multiplied by a vector \({\mathbf {q}}\) that has the same order as \(\varvec{\alpha }\) and the transpose of this matrix by a vector \({\mathbf {s}}\) that has the same order as \(\varvec{\epsilon }\). The first of these products can be done without storing \({\mathbf {Z}}'_{n}{\mathbf {Z}}_{n}{\mathbf {M}}_{n}\) in memory as follows. Post-multiplying both sides of Eq. (2) by \({\mathbf {q}}\) gives:

We will now describe how these matrices and the right hand sides involving \({\mathbf {M}}_{n}\) can be computed in order to form the other elements of the MME without storing \({\mathbf {M}}_{n}\) in memory.

*i*of \({\mathbf {Z}}_{n}{\mathbf {M}}_{n}\). Then, \({\mathbf {M}}'_{n}{\mathbf {Z}}'_{n}{\mathbf {Z}}_{n}{\mathbf {M}}_{n}\) can be written as:

*i*of \({\mathbf {Z}}_{n}{\mathbf {M}}_{n}\) can be computed as:

*i*th position and 0s elsewhere, and the product \({\mathbf {e}}'_{i}{\mathbf {Z}}_{n}({\mathbf {A}}^{nn})^{-1}\) can be obtained by solving the sparse system:

*i*th column of \({\mathbf {X}}'_{n}\), which is the first term of

*i*th element of \({\mathbf {y}}_{n}.\)

Note that computing \({\mathbf {m}}'_{n_{i}}\) corresponding to row *i* of \({\mathbf {Z}}_{n}{\mathbf {M}}_{n}\) using Eq. (8) can be done independently of its computation for any other row, and thus, the computations in Eqs. (7), (10), and (11) can be easily parallelized.

There are a number of approaches to compute \({\mathbf {m}}'_{n_i}\) that can be used. One approach for solving Eq. (9) for *n* less than approximately ten million using a typical workstation computer is to obtain a sparse Cholesky factor of \({\mathbf {A}}^{nn}\) and directly solve for each \({\mathbf {m}}'_{n_i}\) using forward and backward substitution. Software libraries exist for obtaining a Cholesky factor of large sparse matrices [19] using multiple threads or general purpose graphics processing units (GPU). Once the factor is obtained, independent threads can be used to solve in parallel from a single memory copy of the factor. Note that the Cholesky factor of \({\mathbf {A}}^{nn}\) may be denser than \({\mathbf {A}}^{nn}\), depending upon the nature of the relationships between genotyped and non-genotyped animals. For example, if non-genotyped animals comprise only non-parents, then \({\mathbf {A}}^{nn}\) is diagonal and solution for \({\mathbf {m}}_{i}'\) is trivial. For larger linear systems with non-genotyped parents where the Cholesky factor is too large, indirect solution using high performance methods on GPU is a practical alternative. The PCG algorithm parallelizes well and performs efficiently on GPU.

*MCMC sampling*Gibbs sampling is a widely used MCMC method for inference with Bayesian regression models, e.g., [3, 20, 21]. One of the most time-consuming tasks in these analyses is single-site sampling of the location parameters from their full-conditional distributions. Let \(\varvec{\theta }=\left[ \begin{array}{c} \varvec{\beta }\\ \varvec{\alpha }\\ \varvec{\epsilon } \end{array}\right]\) denote the location parameters in Model (1). Then, following [22], the full-conditional distribution for \(\theta _{i}\) under BayesC with \(\pi =0\) is:

*i*th diagonal of the matrix \({\mathbf {C}}\) that denotes the LHS-MME given in (3), \(r_{i}\) is the right-hand-side element from (3) corresponding to \(\theta _{i}\), and \(\mathbf {c}'_{i}\) is row

*i*of \({\mathbf {C}}\). However, as mentioned previously, some sub-matrices of the LHS-MME given in (3) are dense and too large to be stored in memory. However, as explained below, the same strategy used to avoid storing these sub-matrices in PCG calculations can also be used here.

*i*th diagonal element from \({\mathbf {C}}\) is obtained from the

*j*th diagonal of \({\mathbf {B}}={\mathbf {M}}'{\mathbf {Z}}' {\mathbf {Z}}{\mathbf {M}}+\mathbf {I\dfrac{\sigma _{e}^{2}}{\sigma _{\alpha }^{2}}}\), which can be stored in memory. Similarly, \(r_{i}-{\mathbf {c}}'_{i}\varvec{\theta }+c_{ii}\theta _{i}\) is computed as:

*j*of the vector

*j*and \(b_{jj}\) is the

*j*th diagonal of \({\mathbf {B}}\). We have already seen how the large, dense matrix \({\mathbf {M}}_{n}'{\mathbf {Z}}'_{n}{\mathbf {Z}}_{n}\) can be multiplied by a vector such as \(\varvec{\epsilon }\) without storing this matrix in memory, and this same strategy can be used here to compute \({\mathbf {M}}'_{n}{\mathbf {Z}}'_{n}{\mathbf {Z}}{}_{n}\varvec{\epsilon }\). The full-conditional distribution for \(\alpha _{j}\), under BayesC with \(\pi =0\), becomes:

*j*th diagonal of \({\mathbf {B}}\) that now denotes \({\mathbf {B}}={\mathbf {Z}}_{n}'{\mathbf {Z}}_{n}+{\mathbf {A}}^{nn}\dfrac{\sigma _{e}^{2}}{\sigma _{g}^{2}}\), and

*j*of the vector that now denotes:

Samples of model effects such as \(\varvec{\beta }, \varvec{\alpha }\), and \(\varvec{\epsilon }\), or their linear functions that represent breeding values namely \({\mathbf {M}}_g\varvec{\alpha }\) for genotyped animals or \({\mathbf {M}}_n\varvec{\alpha } + \varvec{\epsilon }\) for non-genotyped animals can be accumulated as sums and sums of squares to obtain posterior means and prediction error variances. Alternatively, samples of fitted model effects can be written to a file for post-processing.

### Hybrid model for single-step Bayesian regression

The only difference between Eq. (15) and the MME given by Equation (A1) in Legarra and Ducrocq [15] is in \({\mathbf {Q}}\). Using the notation in this paper, the matrix expression \({\mathbf {M}}'_{n}{\mathbf {A}}^{nn}{\mathbf {M}}_{n}\) that is present in the \({\mathbf {Q}}\) is expressed as \({\mathbf {M}}'_{g}({\mathbf {A}}^{gg} - {\mathbf {A}}^{-1}_{gg}){\mathbf {M}}_{g}\) in that paper [15], which involves the inverse of \({\mathbf {A}}_{gg}\) that is difficult to compute. However, these two expressions are identical because \(({\mathbf {A}}^{gg} - {\mathbf {A}}_{gg}^{-1}) = {\mathbf {A}}^{gn} ({\mathbf {A}}^{nn})^{-1} {\mathbf {A}}^{ng}\) and \({\mathbf {M}}_n = -({\mathbf {A}}^{nn})^{-1} {\mathbf {A}}^{ng} {\mathbf {M}}_g\).

#### Computing strategies

*i*of \({\mathbf {M}}_{n}\) is obtained by solving the sparse system (2) for column

*i*and premultiply it by the sparse matrix \({\mathbf {A}}^{gn}\). This gives column

*i*of the product \({\mathbf {A}}^{gn}{\mathbf {M}}_{n}\), which has the same size as \({\mathbf {M}}_{g}\). The columns of \({\mathbf {A}}^{gn}{\mathbf {M}}_{n}\) can be computed one at a time or in parallel. Premultiplying \({\mathbf {A}}^{gn}{\mathbf {M}}_{n}\) by \(-{\mathbf {M}}'_{g}\) gives:

This needs to be done only once to set up the MME and has order equal to the number of marker genotypes which is often much less than the number of genotyped or non-genotyped animals.

These MME also contain two large, dense sub-matrices, namely \({\mathbf {A}}^{ng}{\mathbf {M}}_{g}\) and its transpose. As described previously, in the PCG iteration and in the Gibbs sampling, these matrices need to be post-multiplied by a vector. When the number of genotyped animals is sufficiently smaller than the number of non-genotyped animals, these matrix-by-vector products can be obtained more efficiently by storing in memory the sparse matrix \({\mathbf {A}}^{ng}\) and the dense but smaller matrix \({\mathbf {M}}_{g}\) rather than their product \({\mathbf {A}}^{ng}{\mathbf {M}}_{g}\). In each round of PCG iteration or Gibbs sampling, the matrix-by-vector product \({\mathbf {A}}^{ng}{\mathbf {M}}_{g}{\mathbf {q}}\), for example, is obtained by first multiplying the dense matrix \({\mathbf {M}}_{g}\) by the vector \({\mathbf {q}}\) and then premultiplying the result by the sparse matrix \({\mathbf {A}}^{ng}\). The corresponding calculation for the MEM required solving sparse systems of equations given by Eq. (4) in each round of PCG or Gibbs sampling, in addition to the two matrix-by-vector multiplications that are also required here.

In situations when the number of genotyped animals exceeds the number of non-genotyped animals, using SS-MEM that explicitly involves \({\mathbf {M}}_{n}\) in off-diagonal blocks may be competitive with SS-HM.

## Application of hybrid model

An example dataset from the American Simmental Association is used to demonstrate the computing effort to obtain PCG samples from (15) and the relative computing effort to obtain MCMC samples for the MME of (15) compared to (3). The vector of phenotypes comprised of 4,934,101 birth weight observations; there were 6,179,960 animals in the pedigree file; 31,453 animals in the pedigree file were genotyped and 23,290 of those had birth weight observations. After filtering marker covariates for low minor allele frequency, 40,214 marker effects were included in the model. There were 399,036 fixed effects, including the herd-year-season effects defined in the same manner as in the routine national evaluation. To keep this presentation that compares the computational effort involved in fitting (3) and (15) simple, our application was limited to a single trait ignoring maternal genetic and permanent environmental effects. Furthermore, we did not include a comparison with SS-GBLUP since that model cannot accommodate mixture priors for marker effects as used in this example.

The analyses were performed using a workstation built on an ASUS X99E WS motherboard, a Xeon E5-1650V3 3.5 Ghz processor overclocked to 4.2 Ghz, 128 GB of DDR4 ECC RAM at 2133 Mhz and four NVidia Titan X GPU, with 9TB of workspace in a RAID5 configuration comprising four SATA disks. The operating system was Ubuntu 14.04 LTS, and the BOLT software package (http://manual.thetasolutionsllc.com/IntroBolt) built with the CUDA Toolkit 7.5 was used.

The vector \({\mathbf {y}}\), and matrices \({\mathbf {X}},{\mathbf {Z}},{\mathbf {A}}^{nn},{\mathbf {A}}^{gn},{\mathbf {A}}^{ng},\) and \({\mathbf {M}}_{g}\) were built from data files using BOLT tools. Ordering the pedigree file, construction of \({\mathbf {A}}^{-1}\), including calculation of inbreeding, and its partitioning into blocks representing genotyped and non-genotyped animals took 3 min and required 1.0 Gb of disk storage and 302 Mb of memory. While \({\mathbf {A}}^{-1}\) was being formed, \({\mathbf {y}}\), \({\mathbf {X}}\) and \({\mathbf {Z}}\) were created in about 10 s and required 38, 78 and 83 Mb of disk storage. When stored in memory, they required 19.7, 59.2 and 59.2 Mb respectively. The matrix \({\mathbf {M}}_{g}\) required 4.2 Gb of disk and memory when stored in single precision.

The matrix product \({\mathbf {X}}'_{g}{\mathbf {Z}}_{g}{\mathbf {M}}_{g}\) and its transpose \({\mathbf {M}}'_{g}{\mathbf {Z}}'_{g}{\mathbf {X}}_{g}\) were not explicitly formed, instead computations involving those terms were done in parts as described previously in this paper. The sparse Cholesky decomposition of the 6,148,507 order \(\mathbf {A}^{nn}\) matrix took just under 4 min. The imputation of \({\mathbf {M}}_{n}\), using forward and backward substitution with the Cholesky factor, and its premultiplication by \({\mathbf {A}}^{gn}\) took just over 35 min using eight parallel processes. The creation of the matrix products \({\mathbf {M}}'_{g}{\mathbf {Z}}'_{g}{\mathbf {Z}}_{g}{\mathbf {M}}_{g}\) and \({\mathbf {M}}'_{n}{\mathbf {A}}^{nn}{\mathbf {M}}_{n}\) each took about 20 s using 2 GPU after obtaining the imputed values and required 6.2 Gb of disk storage and memory when stored in single precision.

For the analysis using Eq. (15) the PCG solution of the MME stored in double precision took just under 40 min, using a single GPU and diagonal preconditioning. Because the PCG was performed in double precision just under 18 Gb of memory was required to store all the sub-matrices comprising the left-hand side. The right-hand side required 53Mb of memory. Additional memory for work space of approximately 4 Gb was required for PCG. Convergence was determined by comparing solutions from every 200 rounds of iteration to solutions from 5000 rounds. By 1800 rounds the correlation and regression of solutions with those from 5000 rounds were very close to one (.99 each). The PCG residual value was near 1.1e\(-05\). The PCG solution does not give the posterior mean of the marker effects for a model with mixture priors, but was used to define starting values for MCMC sampling of all the effects in the MME, but using a mixture prior for marker effects.

Starting with the same PCG solution but different random number generator seed values, using 4 parallel chains each drawing 10,500 samples on its own GPU took 70 min to obtain a total of 42,000 Gibbs samples, using \(\pi =0.95\) and known variance ratios in Eqs. (3) and (15). Each of the parallel Gibbs Sampler jobs shared a single copy in shared memory of the left-hand-side matrices, reducing the memory requirements and reading from disk.

Experience has shown that 40,000 samples after burn-in is sufficient to obtain posterior means of breeding values that are stable for the hybrid model and MEM. However, we confirmed this by sampling four additional parallel chains each with length 250,000 samples after a 5000 sample burn. The purpose of these very long chains was to confirm that the 40,000 length post-burn-in chain was sufficient, thus supporting the timings provided here to achieve useful results. The correlation of the posterior means of the breeding values for genotyped animals and non-genotyped animals from the aggregated 40,000 length chain was .99 and 1.0, respectively, with the posterior means from the aggregated 1,000,000 length chain. However, a chain longer that 40,000 may be needed to accurately estimate PEV for animals with intermediate to low accuracies. Because only off-diagonal blocks of the left-hand side are used in the GPU computation for updating the right-hand-sides for each vector of single-site Gibbs samples for \(\varvec{\beta }\), \(\varvec{\alpha }\) or \({\mathbf {u}}_{n}\), and the Gibbs samples were obtained using single precision, the entire left-hand side without the diagonal blocks fit on the GPU. This strategy also allowed the GPU to asynchronously update the right-hand-side while the next set of effects was being sampled using the CPU.

The Gibbs sampler was performed using single precision for storage of the left- and right-hand sides, requiring approximately half as much memory as the 18 Gb required for the PCG which was performed in double precision. Additional memory for work space of approximately 2 Gb was required for the Gibbs sampler. The total time required to assemble the left- and right-hand sides, after the matrix components were formed, was just under 3 min. The total job time for all steps, starting with the raw data, to obtain posterior mean estimates of the MCMC samples of marker effects and MCMC samples of breeding values of the genotyped and non-genotyped animals and their prediction error variances (from the posterior variances of their MCMC samples), took approximately 3 h.

The memory required to store \({\mathbf {M}}_{g}\) is determined by the product of the number of animals genotyped and the number of marker covariates. A compressed dense format (CBRC) allows this matrix to be 32 times larger than with the double precision version used above, but increased the computing time for PCG in this example by 25%.

An additional Gibbs sampler run was made with the MME of (3) that used Eq. (4), which requires within each iteration, forward and backward solves using the factor of \({\mathbf {A}}^{nn}\). The time required to obtain one sample of all effects was 2.0 s. Using the MME of (15) required only 0.44 s for each sample of all effects. Accordingly, the hybrid model has considerable advantage over that of [10]. These two computing approaches should give the same estimates of breeding values as they represent equivalent models as explained in the theory section. The correlations between the MCMC-derived estimates of breeding values between the two approaches were 1.0 for non-genotyped animals and over 0.99 for genotyped animals.

Computational performance of Eq. (15) was compared ignoring the genotypes on approximately half the genotyped animals to demonstrate the effect of the proportion of genotyped animals on computing time. This reduced dataset left 15,694 animals with genotype information of which 11,683 had a birth weight observation. The total number of animals in the pedigree file and number of observations on birth weight were the same as before. After filtering the marker covariates for low minor allele frequency, 40,211 marker loci remained. The time necessary to complete the PCG solver was about 3 min less than the 27 min needed for the larger analysis, which had approximately double the number of genotyped animals. The reduction in time necessary to complete the PCG solver was primarily due to reductions in time used for matrix multiplications involving the smaller matrix \({\mathbf {M}}_{g}\). The time necessary to obtain the 42,000 Gibbs samples was reduced by about 20% to 1 h. Imputation required 24 instead of 35 min. Creating \({\mathbf {M}}'_{g}{\mathbf {Z}}'_{g}{\mathbf {Z}}_{g}{\mathbf {M}}_{g}\) and \({\mathbf {M}}'_{n}{\mathbf {A}}^{nn}{\mathbf {M}}_{n}\) required just under 20 s, the same as before. Thus, doubling the proportion of genotyped animals increased the total job time from about 2.5 to 3 h.

## Discussion

Here we have introduced a single-step HM that is equivalent to the MEM in [10] and in a special case equivalent to SS-GBLUP. The HM includes marker effects and breeding values, and the off-diagonal sub matrices comprise the term \({\mathbf {A}}^{ng}{\mathbf {M}}_{g}.\) Computations that involve this sub-matrix can be done efficiently in parts without having to solve equations of the form \({\mathbf {A}}^{nn}{\mathbf {x}}={{\mathbf {b}}}.\)

The off-diagonal sub-matrices in both these models are the same size, the lower off-diagonal matrix being of the order of the number of non-genotyped animals by the number of markers. In SS-HM, this sub-matrix has a more convenient structure for storage and computation than is generally the case for SS-MEM. We have demonstrated its practicality and its considerable reduction in computing effort. This model can be readily extended to accommodate multiple traits, multiple breeds, maternal effects, and additional random effects such as polygenic residual effects.

## Declarations

### Authors' contributions

DJG developed the computational strategies for the MEM and also proposed the HM. RLF, HC and DJG derived the MME for the HM. BLG wrote the software and performed all the data analyses. All authors read and approved the final manuscript.

### Acknowledgements

We are grateful to the reviewers of this manuscript for several helpful suggestions and for pointing out the relationship between the MME for the hybrid model (15) and the MME in [A1] of Legarra and Ducrocq [15]. The authors thank the American Simmental Association for sharing their data. This work was supported in part by the US Department of Agriculture, Agriculture and Food Research Initiative National Institute of Food and Agriculture Competitive Grant No. 2015-67015-22947.

### Competing interests

DJG and BLG are partners in Theta Solutions LLC that developed the BOLT software used in this paper. The authors declare that they have no competing interests.

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

## Authors’ Affiliations

## References

- Strandén I, Garrick DJ. Technical note: Derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit. J Dairy Sci. 2009;92:2971–5.View ArticlePubMedGoogle Scholar
- Fernando RL. Genetic evaluation and selection using genotypic, phenotypic and pedigree information. In: Proceedings of the 6th World Congress on Genetics Applied to Livestock Production: 11–16 January 1998. vol. 26. Armidale; 1998. pp. 329–36.Google Scholar
- Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29.PubMedPubMed CentralGoogle Scholar
- Nejati-Javaremi A, Smith C, Gibson JP. Effect of total allelic relationship on accuracy of evaluation and response to selection. J Anim Sci. 1997;75:1738–45.View ArticlePubMedGoogle Scholar
- Habier D, Fernando RL, Dekkers JCM. The impact of genetic relationship information on genome-assisted breeding values. Genetics. 2007;177:2389–97.PubMedPubMed CentralGoogle Scholar
- VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.View ArticlePubMedGoogle Scholar
- Legarra A, Aguilar I, Misztal I. A relationship matrix including full pedigree and genomic information. J Dairy Sci. 2009;92:4656–63.View ArticlePubMedGoogle Scholar
- Christensen OF, Lund MS. Genomic prediction when some animals are not genotyped. Genet Sel Evol. 2010;42:2.View ArticlePubMedPubMed CentralGoogle Scholar
- Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J Dairy Sci. 2010;93:743–52.View ArticlePubMedGoogle Scholar
- Fernando RL, Dekkers JCM, Garrick DJ. A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses. Genet Sel Evol. 2014;46:59.View ArticleGoogle Scholar
- Gianola D, de los Campos G, Hill WG, Manfredi E, Fernando R. Additive genetic variability and the Bayesian alphabet. Genetics. 2009;83:347–63.View ArticleGoogle Scholar
- de los Campos G, Naya H, Gianola D, Crossa J, Legarra A, Manfredi E, et al. Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics. 2009;182:375–85.View ArticlePubMedPubMed CentralGoogle Scholar
- Habier D, Fernando RL, Kizilkaya K, Garrick DJ. Extension of the Bayesian alphabet for genomic selection. In: Proceedings of the 9th world congress on genetics applied to livestock production: 1–6 August 2010. Leipzig 2010.Google Scholar
- Liu Z, Goddard ME, Reinhardt F, Reents R. A single-step genomic model with direct estimation of marker effects. J Dairy Sci. 2014;97:5833–50.View ArticlePubMedGoogle Scholar
- Legarra A, Ducrocq V. Computational strategies for national integration of phenotypic, genomic, and pedigree data in a single-step best linear unbiased prediction. J Dairy Sci. 2012;95:4629–45.View ArticlePubMedGoogle Scholar
- Vitezica ZG, Aguilar I, Misztal I, Legarra A. Bias in genomic predictions for populations under selection. Genet Res (Camb). 2011;93:357–66.View ArticleGoogle Scholar
- Strandén I, Lidauer M. Solving large mixed linear models using preconditioned conjugate gradient iteration. J Dairy Sci. 1999;82:2779–87.View ArticlePubMedGoogle Scholar
- Masuda Y, Misztal I, Tsuruta S, Legarra A, Aguilar I, Lourenco DAL, et al. Implementation of genomic recursions in single-step genomic best linear unbiased predictor for US Holsteins with a large number of genotyped animals. J Dairy Sci. 2016;99:1968–74.View ArticleGoogle Scholar
- Chen Y, Davis TA, Hager WW, Rajamanickam S. Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. ACM Trans Math Softw. 2008;35:22.View ArticleGoogle Scholar
- Habier D, Fernando RL, Kizilkaya K, Garrick DJ. Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics. 2011;12:186.View ArticlePubMedPubMed CentralGoogle Scholar
- Fernando R, Garrick D. Bayesian methods applied to GWAS. In: Gondro C, van der Werf J, Hayes B, editors. Genome-wide association studies and genomic prediction. New York: Humana Press; 2013.Google Scholar
- Sorensen DA, Gianola D. Likelihood, Bayesian, and MCMC methods in quantitative genetics. New York: Springer; 2002.View ArticleGoogle Scholar
- Cheng H, Qu L, Garrick DJ, Fernando RL. A fast and efficient Gibbs sampler for BayesB in whole-genome analyses. Genet Sel Evol. 2015;47:80.View ArticlePubMedPubMed CentralGoogle Scholar