# A fast and efficient Gibbs sampler for BayesB in whole-genome analyses

- Hao Cheng
^{1, 2}, - Long Qu
^{3}, - Dorian J. Garrick
^{1, 4}and - Rohan L. Fernando
^{1}Email author

**Received: **27 February 2015

**Accepted: **23 September 2015

**Published: **14 October 2015

## Abstract

### Background

In whole-genome analyses, the number *p* of marker covariates is often much larger than the number *n* of observations. Bayesian multiple regression models are widely used in genomic selection to address this problem of \(p\gg n.\) The primary difference between these models is the prior assumed for the effects of the covariates. Usually in the BayesB method, a Metropolis–Hastings (MH) algorithm is used to jointly sample the marker effect and the locus-specific variance, which may make BayesB computationally intensive. In this paper, we show how the Gibbs sampler without the MH algorithm can be used for the BayesB method.

### Results

We consider three different versions of the Gibbs sampler to sample the marker effect and locus-specific variance for each locus. Among the Gibbs samplers that were considered, the most efficient sampler is about 2.1 times as efficient as the MH algorithm proposed by Meuwissen et al. and 1.7 times as efficient as that proposed by Habier et al.

### Conclusions

The three Gibbs samplers presented here were twice as efficient as Metropolis–Hastings samplers and gave virtually the same results.

## Keywords

## Background

In whole-genome analyses, the number *p* of marker covariates is often much larger than the number *n* of observations. Bayesian multiple regression models are widely used in genomic selection to address this problem of \(p\gg n.\) The primary difference between these models is the prior assumed for the effects of the covariates. These priors and their effects on inference have been recently reviewed by Gianola [1]. In most Bayesian analyses of whole-genome data, inferences are based on Markov chains constructed to have a stationary distribution equal to the posterior distribution of the unknown parameters of interest [2]. This is often done by employing a Gibbs sampler where samples are drawn from the full-conditional distributions of the parameters [3].

It can be shown that in BayesA introduced by Meuwissen et al. [4], the prior for each marker effect follows a scaled *t* distribution [5]. However when the prior for the marker effect is specified as a *t* distribution, its full-conditional is not of a known form. Fortunately, this prior can also be specified as a normally distributed marker effect conditional on a locus-specific variance, which is given a scaled inverted chi-square distribution. When marginalized over the variance, this gives a *t* distribution for the marker effect [5]. Thus, the posterior for the marker effect would be identical under both these priors. The second form of the prior, however, is more convenient because it results in the full-conditional for the marker effect having a normal distribution.

BayesA is a special case of BayesB, also introduced by Meuwissen et al. [4], where the prior for each marker effect follows a mixture distribution with a point mass at zero with probability \(\pi\) and a univariate-*t* distribution with probability \(1-\pi\) [5]. When \(\pi =0\), BayesB becomes BayesA. When the marker effect is non-null, as in BayesA, the second form of the prior leads to the full-conditional of the marker effect being normal. Nevertheless, Meuwissen et al. [4] used a Metropolis–Hastings (MH) algorithm to jointly sample the marker effect and the locus-specific variance because they argued that “the Gibbs sampler will not move through the entire sampling space” for BayesB. In their MH algorithm, they use the prior distribution of the locus-specific variance as the proposal distribution. When \(\pi\) is high, the proposed values for the marker effect will be zero with high probability. Thus, for each locus, 100 cycles of MH algorithm were used in their paper, which makes BayesB computationally intensive. Habier et al. [6] used an alternative proposal, where the marker effect was zero with probability 0.5, that leads to a more efficient MH algorithm. For each locus, five cycles of MH were used to sample marker effects in this efficient MH method.

In this paper, we will show how Gibbs samplers without the MH algorithm can be used for the BayesB method. Recall that by introducing a locus-specific variance into BayesA, the full-conditional for the marker effects becomes normal. Similarly, in this paper we show that by introducing a variable \(\delta _{j}\) in BayesB, indicating whether the marker effect for a locus is zero or non-zero, the marker effect and locus-specific variance can be sampled using Gibbs. We consider three different versions of the Gibbs sampler to sample each marker effect, locus-specific variance and its indicator variable \(\delta _{j}\). The objectives of this paper are to introduce these samplers and study their performance.

## Methods

### Statistical methods

#### Gibbs samplers for BayesB

Here, we present three Gibbs samplers for BayesB. The first is a single-site Gibbs sampler, where all parameters are sampled from their full conditional distributions. The second is a joint Gibbs sampler, where \(\delta _{j},\beta _{j}\) are sampled from the joint full-conditional distribution \(f\left( \delta _{j},\beta _{j}|\varvec{y},\mu ,\varvec{\beta }_{-j},\varvec{\delta }_{-j},\varvec{\xi },\sigma _{e}^{2}\right)\), where \(\varvec{\xi }=\left[ \sigma _{1}^{2},\sigma _{2}^{2},\ldots ,\sigma _{k}^{2}\right]\), because \(\delta _{j}\) and \(\beta _{j}\) are highly dependent. Carlin and Chib [7] have shown that the prior used for parameters that are not in the model does not affect the Bayes factor. Thus, this prior, which they call a pseudo prior, can be chosen to improve mixing of the sampler. Following Carlin and Chib, the third sampler is a Gibbs sampler where a pseudo prior is used for \(\beta _{j}\) when \(\delta _{j}\) is zero. Godsill [8] has shown that the marginal posterior for a parameter in the model does not depend on the choice of pseudo priors. It has been suggested to choose the full conditional distribution for \(\beta _{j}\) when it is in the model as the pseudo prior [7, 8]. This choice is justified by showing that using this pseudo prior is equivalent to sampling \(\delta _{j}\) from \(f\left( \delta _{j}|\varvec{y},\mu ,\varvec{\beta }_{-j},\varvec{\delta }_{-j},\varvec{\xi },\sigma _{e}^{2}\right)\) [8]. However, in BayesB, use of the exact full conditional distribution as the pseudo prior will require MH to sample \(\sigma _{j}^{2}\) and \(\sigma _{e}^{2}\). Thus in this paper, a distribution close to the full conditional is used.

#### BayesB model with data augmentation model

#### Priors

#### Single-site Gibbs sampler

The full conditional distributions of \(\mu ,\sigma _{j}^{2}\) and \(\sigma _{e}^{2}\) are well-known [4, 9, 10]. Thus they are presented here without derivations. The full conditional of \(\mu\) is a normal distribution with mean \(\frac{1^{'}\mathbf {\mathbf {e_{\mu }}}}{n}\) and variance \(\frac{\sigma _{e}^{2}}{n}\), where \(\mathbf {e_{\mu }}=\mathbf {y}-\sum _{j=1}^{k}\mathbf {X}_{j}\beta _{j}\delta _{j}\) and *n* is the number of individuals. The full conditional distributions of \(\sigma _{j}^{2}\) and \(\sigma _{e}^{2}\) are both scaled inverted chi-square distributions; for \(\sigma _{j}^{2}\), the scale parameter is \(\frac{\left( \nu _{\beta }S_{\beta }^{2}+\beta _{j}^{2}\right) }{\nu _{\beta }+1}\) and the degrees of freedom parameter are \(\nu _{\beta }+1\); for \(\sigma _{e}^{2}\), the scale parameter is \(\frac{\nu _{e}S_{e}^{2}+\varvec{e^{'}e}}{\nu _{e}+n}\)and the degrees of freedom parameter are \(\nu _{e}+n\), where \(\mathbf {e}=\mathbf {y}-\mathbf {1}\mu -\sum _{j=1}^{k}\mathbf {X}_{j}\beta _{j}\delta _{j}\).

#### Joint Gibbs sampler

#### Gibbs sampler with pseudo priors

We show here that, given this pseudo prior, the full conditional for \(\delta _{j}\) is identical to the marginal full conditional distribution of \(\delta _{j}\), \(\Pr \left( \delta _{j}|\varvec{y},\mu ,\varvec{\beta }_{-j},\varvec{\delta }_{-j},\varvec{\xi },\sigma _{e}^{2}\right)\), which is used in the joint Gibbs sampler.

After replacing the pseudo prior in (10) with (11), it becomes \(\begin{aligned} \frac{1}{f\left( \varvec{w}\mid \delta _{j}=1,\sigma _{j}^{2},\sigma _{e}^{2}\right) }, \end{aligned}\) which is identical to \(h_{1}^{-1}\). Thus, the ratio \(\frac{g_{0}}{g_{1}}\) in (8) is identical to \(\frac{h_{0}Pr\left( \delta _{j}=1\right) }{h_{1}Pr\left( \delta _{j}=0\right) }\) in (6), which proves the full conditional probability (8) of \(\delta _{j}=1\), when the proposed prior is used, is identical to (6), the marginal full conditional probability of \(\delta _{j}=1\), which is used in the joint Gibbs sampler.

Use of the exact full conditional distribution as the pseudo prior in BayesB, however, will require MH to sample \(\sigma _{j}^{2}\) and \(\sigma _{e}^{2}\). Thus, a distribution close to the full conditional is used here. Here, we use a normal distribution with mean \(\frac{\mathbf{X}_{j}^{'}\varvec{w}}{\mathbf{X}_{j}^{'}\mathbf{X}_{j}+\widetilde{\lambda }}\)and variance \(\frac{\widetilde{\sigma _{e}^{2}}}{\mathbf{X}_{j}^{'}\mathbf{X}_{j}+\widetilde{\lambda }}\), where \(\widetilde{\lambda }=\frac{\widetilde{\sigma _{e}^{2}}}{\widetilde{\sigma _{j}^{2}}}\), and \(\widetilde{\sigma _{e}^{2}}\) and \(\widetilde{\sigma _{j}^{2}}\) are means of the prior distributions for the residual and the marker effect variances, respectively.

*X*, which are always centered, are orthogonal to the column vectors of ones so that \(\varvec{X_{j}^{'}}\mathbf {1}\mu =\mathbf {0}\). Thus, the full conditional of \(\mu\) is the same as that in the single-site Gibbs sampler. When \(\delta _{j}=1\), the full conditional distribution of \(\beta _{j}\) is identical to that in the single-site Gibbs sampler. When \(\delta _{j}=0\), (16) is the only part that includes \(\beta _{j}\). Thus the full conditional distribution of \(\beta _{j}\) is:

### Data

Real genotypic data and simulated phenotypic data were used here to compare BayesB using MH, efficient MH or the three different Gibbs samplers as described above. The genotypic data included 3961 individuals with 55,734 SNPs. The heritability of the simulated trait was 0.25. The training data contained 3206 individuals and the remaining individuals were used for testing. A chain of length of 50,000 was used to estimate parameters of interest. Prediction accuracies were calculated using different samplers. The effective sample sizes [11], which estimate the number of independent samples from a chain, were calculated for \(\sigma _{e}^{2}\) to compare convergence rates for different methods. Computing time for different methods with the same number of iterations were also compared.

## Results

Efficiency of alternative MCMC samplers for BayesB

Alternative MCMC samplers | |||||
---|---|---|---|---|---|

MH | Efficient MH | Single-site Gibbs | Joint Gibbs | Gibbs with pseudo prior | |

Computing time | 90,009 | 70,714 | 52,452 | 44,726 | 47,043 |

Effective sample size | 25,262 | 24,588 | 24,684 | 26,757 | 25,036 |

Effective samples/s | 0.280 | 0.347 | 0.471 | 0.598 | 0.532 |

## Discussion

In the joint Gibbs sampler, \(\delta _{j}\) and \(\beta _{j}\) are sampled jointly, which addresses the problem of dependence between \(\delta _{j}\) and \(\beta _{j}\). Thus, the joint sampler had the largest effective sample size. However, in the single-site Gibbs sampler, \(\delta _{j}\) and \(\beta _{j}\) are sampled from their full conditionals, and thus due to the dependence between \(\beta _{j}\) and \(\delta _{j}\), the single-site Gibbs sampler had the smallest effective sample size. These differences in effective sample size, however, were small.

In the Gibbs sampler with pseudo priors, \(\beta _{j}\) and \(\delta _{j}\) are also sampled from their full conditionals. Recall that we have shown that the posterior mean of the marker effects does not depend on the pseudo prior. Furthermore, Godsill [8] has shown that the marginal posterior for parameters in the model do not depend on the pseudo prior, which is the prior for \(\beta _{j}\) when \(\delta _{j}=0\). As suggested by Carlin and Chib [7], when the full conditional distribution of \(\beta _{j}\) when \(\delta _{j}=1\) is chosen to be the pseudo prior, we have shown that the samples of \(\beta _{j}\) and \(\delta _{j}\) are identically distributed to those from the joint Gibbs sampler. Thus, the Gibbs sampler with pseudo priors will have a similar effective sample size as the joint Gibbs sampler.

However, when the full conditional distribution for \(\beta _{j}\) when \(\delta _{j}=1\) is used as the pseudo prior in BayesB, the full conditional distributions of \(\sigma _{e}^{2}\) and \(\sigma _{j}^{2}\) are not of known forms because \(\sigma _{e}^{2}\) and \(\sigma _{j}^{2}\) are in the pseudo prior for the marker effect. In contrast to BayesB, in the model used by Godsill [8] to justify the use of full conditional distributions as the pseudo priors, for simplicity, hyper-parameters such as \(\sigma _{e}^{2}\) were omitted [8]. Here, we have replaced \(\sigma _{e}^{2}\) and \(\sigma _{j}^{2}\) in the pseudo prior with constants such that the full conditionals for \(\sigma _{e}^{2}\) and \(\sigma _{j}^{2}\) have scaled inverted chi-square distributions. This modification will give a pseudo prior whose distribution is close to that of the full conditional. In the Gibbs sampler with this pseudo prior, the effective sample size was smaller than in the joint Gibbs sampler but still larger than in the single-site Gibbs sampler.

## Conclusions

When a MH algorithm is used to jointly sample the marker effect and the locus-specific variance, the BayesB method is computationally intensive. After introducing a variable \(\delta _{j}\), indicating whether the marker effect for a locus is zero or non-zero, the marker effect and locus-specific variance can be sampled using Gibbs sampler without MH. Among the Gibbs samplers that were considered here, the joint Gibbs sampler is the most efficient. This sampler is about 2.1 times as efficient as the MH algorithm proposed by Meuwissen et al. [4] and 1.7 times as efficient as that proposed by Habier et al. [6].

## Declarations

### Authors' contributions

HC, LQ, DJG, RLF contributed to the development of the statistical methods. HC wrote the program code and conducted the analyses. The manuscript was prepared by RLF and HC. All authors read and approved the final manuscript.

### Acknowledgements

This work was supported by the US Department of Agriculture, Agriculture and Food Research Initiative National Institute of Food and Agriculture Competitive Grant No. 2012-67015-19420 and by National Institutes of Health Grant R01GM099992.

### Competing interests

The authors declare that they have no competing interests.

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

## Authors’ Affiliations

## References

- Gianola D. Priors in whole-genome regression: the bayesian alphabet returns. Genetics. 2013;157:1819–29.Google Scholar
- Norris JR. Markov chains. Cambridge series on statistical and probabilistic mathematics. New York: Cambridge University Press; 1997.Google Scholar
- Sorensen DA, Gianola D. Likelihood, Bayesian, and MCMC methods in quantitative genetics. New York: Springer; 2002.View ArticleGoogle Scholar
- Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29.PubMed CentralPubMedGoogle Scholar
- Gianola D, de los Campos G, Hill WG, Manfredi E, Fernando R. Additive genetic variability and the bayesian alphabet. Genetics. 2009;183:347–63.View ArticlePubMed CentralPubMedGoogle Scholar
- Habier D, Fernando RL, Kizilkaya K, Garrick DJ. Extension of the bayesian alphabet for genomic selection. BMC Bioinformatics. 2011;12:186.View ArticlePubMed CentralPubMedGoogle Scholar
- Carlin BP, Chib S. Bayesian model choice via Markov chain Monte Carlo methods. J R Stat Soc Series B Stat Methodol. 1995;57:473–84.Google Scholar
- Godsill SJ. On the relationship between Markov chain Monte Carlo methods for model uncertainty. J Comput Graph Stat. 2001;10:230–48.View ArticleGoogle Scholar
- Habier D, Fernando RL, Kizilkaya K, Garrick DJ. Extension of the Bayesian alphabet for genomic selection. In: Proceedings of the 9th world congress on genetics applied to livestock production, 1–6 August 2010, Leipzig, 2010. http://www.kongressband.de/wcgalp2010/assets/pdf/0468.pdf.
- Fernando RL, Garrick D. Bayesian methods applied to GWAS. Methods Mol Biol. 2013;1019:237–74.View ArticlePubMedGoogle Scholar
- Geyer CJ. Practical Markov chain Monte Carlo. Stat Sci. 1992;7:473–511.View ArticleGoogle Scholar