A fast and efficient Gibbs sampler for BayesB in whole-genome analyses

Background In whole-genome analyses, the number p of marker covariates is often much larger than the number n of observations. Bayesian multiple regression models are widely used in genomic selection to address this problem of p ≫ n. The primary difference between these models is the prior assumed for the effects of the covariates. Usually in the BayesB method, a Metropolis–Hastings (MH) algorithm is used to jointly sample the marker effect and the locus-specific variance, which may make BayesB computationally intensive. In this paper, we show how the Gibbs sampler without the MH algorithm can be used for the BayesB method. Results We consider three different versions of the Gibbs sampler to sample the marker effect and locus-specific variance for each locus. Among the Gibbs samplers that were considered, the most efficient sampler is about 2.1 times as efficient as the MH algorithm proposed by Meuwissen et al. and 1.7 times as efficient as that proposed by Habier et al. Conclusions The three Gibbs samplers presented here were about twice as efficient as the Metropolis–Hastings samplers and gave virtually the same results.


Background
In whole-genome analyses, the number p of marker covariates is often much larger than the number n of observations. Bayesian multiple regression models are widely used in genomic selection to address this problem of p ≫ n. The primary difference between these models is the prior assumed for the effects of the covariates. These priors and their effects on inference have been recently reviewed by Gianola [1]. In most Bayesian analyses of whole-genome data, inferences are based on Markov chains constructed to have a stationary distribution equal to the posterior distribution of the unknown parameters of interest [2]. This is often done by employing a Gibbs sampler where samples are drawn from the full-conditional distributions of the parameters [3].
It can be shown that in BayesA, introduced by Meuwissen et al. [4], the prior for each marker effect follows a scaled t distribution [5]. However, when the prior for the marker effect is specified as a t distribution, its full conditional is not of a known form. Fortunately, this prior can also be specified as a normally distributed marker effect conditional on a locus-specific variance, which is given a scaled inverted chi-square distribution. When marginalized over the variance, this gives a t distribution for the marker effect [5]. Thus, the posterior for the marker effect is identical under both these priors. The second form of the prior, however, is more convenient because it results in the full conditional for the marker effect having a normal distribution.
BayesA is a special case of BayesB, also introduced by Meuwissen et al. [4], where the prior for each marker effect follows a mixture distribution with a point mass at zero with probability π and a univariate-t distribution with probability 1 − π [5]. When π = 0, BayesB becomes BayesA. When the marker effect is non-null, as in BayesA, the second form of the prior leads to the full conditional of the marker effect being normal. Nevertheless, Meuwissen et al. [4] used a Metropolis–Hastings (MH) algorithm to jointly sample the marker effect and the locus-specific variance because they argued that "the Gibbs sampler will not move through the entire sampling space" for BayesB. In their MH algorithm, the prior distribution of the locus-specific variance is used as the proposal distribution. When π is high, the proposed values for the marker effect will be zero with high probability. Thus, for each locus, 100 cycles of the MH algorithm were used in their paper, which makes BayesB computationally intensive. Habier et al. [6] used an alternative proposal, where the marker effect was zero with probability 0.5, which leads to a more efficient MH algorithm. For each locus, five cycles of MH were used to sample marker effects in this efficient MH method.
In this paper, we show how Gibbs samplers without the MH algorithm can be used for the BayesB method. Recall that by introducing a locus-specific variance into BayesA, the full conditional for the marker effect becomes normal. Similarly, in this paper we show that by introducing a variable δ_j in BayesB, indicating whether the marker effect for locus j is zero or non-zero, the marker effect and locus-specific variance can be sampled using a Gibbs sampler. We consider three different versions of the Gibbs sampler to sample each marker effect, its locus-specific variance and its indicator variable δ_j. The objectives of this paper are to introduce these samplers and to study their performance.

Statistical methods
BayesB introduced by Meuwissen et al. [4] assumes that each locus-specific variance follows a mixture distribution. However, following Gianola [5], we prefer to specify the mixture at the level of the marker effect instead of the locus-specific variance. In this formulation, the prior for the marker effect is a mixture with a point mass at zero and a univariate normal distribution conditional on σ²_j:

α_j | σ²_j = 0 with probability π,
α_j | σ²_j ∼ N(0, σ²_j) with probability (1 − π),

where σ²_j follows a scaled inverted chi-square distribution and π is treated as known. Employing the concept of data augmentation, it is convenient to write the marker effect α_j as α_j = β_j δ_j, where we introduce a Bernoulli variable δ_j with probability of success 1 − π and a normally distributed variable β_j with null mean and variance σ²_j, which has a scaled inverted chi-square distribution. As shown below, Gibbs sampling can be used to draw samples for these unknowns.
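This augmentation can be illustrated with a short simulation. The sketch below (hyper-parameter values `pi`, `nu_beta` and `S2_beta`, and the function name, are illustrative assumptions, not values from the paper) draws marker effects α_j = β_j δ_j from the augmented prior; marginally, each draw is exactly zero with probability π and a draw from a scaled t distribution otherwise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative hyper-parameters (assumed values, not from the paper)
pi = 0.95        # prior probability that a marker effect is zero
nu_beta = 4.0    # degrees of freedom of the scaled inverted chi-square prior
S2_beta = 0.01   # scale parameter of the scaled inverted chi-square prior

def sample_marker_effect_prior(rng):
    """Draw one marker effect alpha_j = beta_j * delta_j from the augmented prior."""
    # sigma2_j ~ nu * S2 / chi2(nu), i.e. a scaled inverted chi-square draw
    sigma2_j = nu_beta * S2_beta / rng.chisquare(nu_beta)
    beta_j = rng.normal(0.0, np.sqrt(sigma2_j))  # beta_j | sigma2_j ~ N(0, sigma2_j)
    delta_j = rng.random() < (1.0 - pi)          # delta_j ~ Bernoulli(1 - pi)
    return beta_j * float(delta_j)

draws = np.array([sample_marker_effect_prior(rng) for _ in range(100_000)])
print(np.mean(draws == 0.0))  # fraction of exact zeros; should be close to pi
```

Marginalizing β_j over σ²_j recovers the point mass/scaled-t mixture stated above, which is why the three quantities can be sampled separately by Gibbs.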

Gibbs samplers for BayesB
Here, we present three Gibbs samplers for BayesB. The first is a single-site Gibbs sampler, where all parameters are sampled from their full conditional distributions. The second is a joint Gibbs sampler, where δ_j, β_j are sampled from the joint full conditional distribution f(δ_j, β_j | y, µ, β_{−j}, δ_{−j}, ξ, σ²_e), where ξ = (σ²_1, σ²_2, . . . , σ²_k), because δ_j and β_j are highly dependent. Carlin and Chib [7] have shown that the prior used for parameters that are not in the model does not affect the Bayes factor. Thus, this prior, which they call a pseudo prior, can be chosen to improve mixing of the sampler. Following Carlin and Chib, the third sampler is a Gibbs sampler where a pseudo prior is used for β_j when δ_j is zero. Godsill [8] has shown that the marginal posterior for a parameter in the model does not depend on the choice of pseudo priors. It has been suggested to choose the full conditional distribution for β_j when it is in the model as the pseudo prior [7,8]. This choice is justified by showing that using this pseudo prior is equivalent to sampling δ_j from its marginal full conditional distribution, Pr(δ_j | y, µ, β_{−j}, δ_{−j}, ξ, σ²_e). However, in BayesB, use of the exact full conditional distribution as the pseudo prior would require MH to sample σ²_j and σ²_e. Thus, in this paper, a distribution close to the full conditional is used.

BayesB model with data augmentation
y_i = µ + Σ_{j=1}^{k} X_ij β_j δ_j + e_i,   (1)

where y_i is the phenotype for individual i, µ is the overall mean, k is the number of SNPs, X_ij is the genotype covariate at locus j for animal i (coded as 0, 1, 2), β_j is the allele substitution effect for locus j, δ_j is an indicator variable and e_i is the random residual effect for individual i.

Priors
The prior for µ is a constant. The prior for e_i is a normal distribution with null mean and variance σ²_e, where σ²_e has a scaled inverted chi-square prior with scale parameter S²_e and degrees of freedom ν_e. The prior for β_j conditional on σ²_j is a normal distribution with null mean and variance σ²_j, where σ²_j has a scaled inverted chi-square prior with scale parameter S²_β and degrees of freedom ν_β. The prior for δ_j is Bernoulli with probability of success 1 − π, where π is treated as known.

Single-site Gibbs sampler
The full conditional distributions of µ, σ²_j and σ²_e are well-known [4,9,10]. Thus, they are presented here without derivations. The full conditional of µ is a normal distribution with mean (1′ e_µ)/n and variance σ²_e/n, where e_µ = y − Σ_{j=1}^{k} X_j β_j δ_j and n is the number of individuals. The full conditional distributions of σ²_j and σ²_e are both scaled inverted chi-square distributions; for σ²_j, the scale parameter is (ν_β S²_β + β²_j)/(ν_β + 1) and the degrees of freedom parameter is ν_β + 1; for σ²_e, the scale parameter is (ν_e S²_e + e′e)/(ν_e + n) and the degrees of freedom parameter is ν_e + n, where e = y − 1µ − Σ_{j=1}^{k} X_j β_j δ_j. Next, we derive the full conditional distributions of β_j and δ_j. These full conditional distributions are proportional to the joint distribution of all parameters and y, which can be written as

f(y | µ, β, δ, σ²_e) Π_{j=1}^{k} [ f(β_j | σ²_j) f(σ²_j) Pr(δ_j) ] f(σ²_e).   (2)

The full conditional distribution of β_j is now obtained by dropping factors that do not involve β_j, which gives f(β_j | ELSE), where ELSE stands for all the other parameters and y. When δ_j = 1, the sampling of β_j is identical to that in BayesA: the full conditional is a normal distribution with mean (X′_j w)/(X′_j X_j + σ²_e/σ²_j) and variance σ²_e/(X′_j X_j + σ²_e/σ²_j), where w = y − 1µ − Σ_{l≠j} X_l β_l δ_l; when δ_j = 0, β_j is sampled from its prior, N(0, σ²_j). This is different from the original implementation of BayesB introduced by Meuwissen et al. [4]. Similarly, the full conditional distribution of δ_j can be obtained from (2) by dropping all factors free of δ_j. Thus,

Pr(δ_j = 1 | ELSE) = (1 − π) f(w | δ_j = 1, β_j, σ²_e) / [ π f(w | δ_j = 0, σ²_e) + (1 − π) f(w | δ_j = 1, β_j, σ²_e) ],

where f(w | δ_j, β_j, σ²_e) is a multivariate normal density with mean X_j β_j δ_j and variance Iσ²_e.
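One single-site update of (β_j, δ_j) can be sketched in Python as follows. This is a minimal sketch under the full conditionals derived above, not the authors' implementation; the function name, argument layout, and the choice to draw δ_j before β_j (both are valid single-site orders) are our own assumptions.

```python
import numpy as np

def single_site_update(j, y, mu, X, beta, delta, sigma2_j, sigma2_e, pi, rng):
    """One single-site Gibbs update of (delta_j, beta_j), each drawn from its
    full conditional given the current values of all other parameters."""
    Xj = X[:, j]
    xpx = Xj @ Xj
    # w: residual with the contribution of locus j removed
    w = y - mu - X @ (beta * delta) + Xj * (beta[j] * delta[j])
    # delta_j | ELSE: compare the likelihood of w with and without X_j * beta_j
    logf1 = -0.5 * np.sum((w - Xj * beta[j]) ** 2) / sigma2_e  # delta_j = 1
    logf0 = -0.5 * np.sum(w ** 2) / sigma2_e                   # delta_j = 0
    p1 = 1.0 / (1.0 + pi / (1.0 - pi) * np.exp(logf0 - logf1))
    delta[j] = 1.0 if rng.random() < p1 else 0.0
    # beta_j | ELSE: BayesA-type normal when delta_j = 1, the prior otherwise
    if delta[j] == 1.0:
        lhs = xpx + sigma2_e / sigma2_j
        beta[j] = rng.normal((Xj @ w) / lhs, np.sqrt(sigma2_e / lhs))
    else:
        beta[j] = rng.normal(0.0, np.sqrt(sigma2_j))
    return beta, delta
```

A full sampler would loop this update over all k loci within each iteration, then update µ, σ²_j and σ²_e from their scaled inverted chi-square and normal full conditionals.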

Joint Gibbs sampler
The same priors as in the single-site Gibbs sampler are used here. The only difference is that δ_j, β_j are sampled from their joint full conditional distribution, which can be written as the product of the full conditional distribution of β_j given δ_j and the marginal full conditional distribution of δ_j:

f(δ_j, β_j | y, µ, β_{−j}, δ_{−j}, ξ, σ²_e) = f(β_j | δ_j, y, µ, β_{−j}, δ_{−j}, ξ, σ²_e) Pr(δ_j | y, µ, β_{−j}, δ_{−j}, ξ, σ²_e).   (3)

Thus, δ_j is first sampled from Pr(δ_j | y, µ, β_{−j}, δ_{−j}, ξ, σ²_e). Then, β_j is sampled from f(β_j | δ_j, y, µ, β_{−j}, δ_{−j}, ξ, σ²_e), which is identical to the sampling of β_j in BayesB with the single-site Gibbs sampler. The marginal full conditional for δ_j can be written as

Pr(δ_j = 1 | y, µ, β_{−j}, δ_{−j}, ξ, σ²_e) = (1 − π) f(w | δ_j = 1, σ²_j, σ²_e) / [ π f(w | δ_j = 0, σ²_j, σ²_e) + (1 − π) f(w | δ_j = 1, σ²_j, σ²_e) ],   (4)

where f(w | δ_j, σ²_j, σ²_e) is a multivariate normal distribution with mean E(e + X_j β_j δ_j | δ_j, σ²_j, σ²_e) and variance Var(X_j β_j δ_j + e | δ_j, σ²_j, σ²_e). When δ_j = 1, it becomes a multivariate normal with null mean and variance X_j X′_j σ²_j + Iσ²_e; when δ_j = 0, it becomes a multivariate normal with null mean and variance Iσ²_e. Thus, samples can be drawn using (4) written as

Pr(δ_j = 1 | y, µ, β_{−j}, δ_{−j}, ξ, σ²_e) = 1 / [ 1 + (π/(1 − π)) (h_0/h_1) ],   (5)

where h_i = f(w | δ_j = i, σ²_j, σ²_e). However, evaluating the multivariate normal distribution f(w | δ_j, σ²_j, σ²_e) is computationally intense. An efficient alternative is to use the univariate distribution of X′_j w, which contains all the information from w about β_j, instead of the distribution of w, which is multivariate. Thus, (5) can be written as

Pr(δ_j = 1 | y, µ, β_{−j}, δ_{−j}, ξ, σ²_e) = 1 / [ 1 + (π/(1 − π)) (m_0/m_1) ],   (6)

where m_1 is a univariate normal distribution with null mean and variance (X′_j X_j)² σ²_j + X′_j X_j σ²_e, m_0 is a univariate normal distribution with null mean and variance X′_j X_j σ²_e, and both densities are evaluated at X′_j w.
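The joint update based on the univariate densities m_0 and m_1 can be sketched as follows. This is a sketch under the derivation above, with β_j integrated out of the δ_j draw; the function name and argument layout are our own assumptions, not the authors' code.

```python
import numpy as np
from math import log, exp, sqrt

def joint_update(j, y, mu, X, beta, delta, sigma2_j, sigma2_e, pi, rng):
    """Joint Gibbs update: delta_j is drawn with beta_j integrated out, using
    the univariate statistic X_j' w; then beta_j is drawn given delta_j."""
    Xj = X[:, j]
    xpx = Xj @ Xj
    w = y - mu - X @ (beta * delta) + Xj * (beta[j] * delta[j])
    r = Xj @ w  # X_j' w carries all the information from w about beta_j
    # X_j' w ~ N(0, (X_j'X_j)^2 sigma2_j + X_j'X_j sigma2_e) when delta_j = 1,
    # and     N(0, X_j'X_j sigma2_e)                          when delta_j = 0
    v1 = xpx * xpx * sigma2_j + xpx * sigma2_e
    v0 = xpx * sigma2_e
    logm1 = -0.5 * (log(v1) + r * r / v1)
    logm0 = -0.5 * (log(v0) + r * r / v0)
    p1 = 1.0 / (1.0 + pi / (1.0 - pi) * exp(logm0 - logm1))
    delta[j] = 1.0 if rng.random() < p1 else 0.0
    # beta_j | delta_j: same draw as in the single-site sampler
    if delta[j] == 1.0:
        lhs = xpx + sigma2_e / sigma2_j
        beta[j] = rng.normal(r / lhs, sqrt(sigma2_e / lhs))
    else:
        beta[j] = rng.normal(0.0, sqrt(sigma2_j))
    return beta, delta
```

Because δ_j no longer conditions on the current β_j, the chain is free to switch δ_j even when β_j happens to sit far from zero, which is the source of the better mixing reported for this sampler.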

Gibbs sampler with pseudo priors
Here, following Carlin and Chib [7], a pseudo prior is used for β_j when δ_j is zero. They proposed to use the full conditional distribution of β_j when δ_j = 1 as the pseudo prior for β_j when δ_j = 0 [7,8], which results in the prior for β_j being f(β_j | σ²_j) = N(0, σ²_j) when δ_j = 1 and the pseudo prior f(β_j | δ_j = 0) when δ_j = 0. We show below that the posterior mean of the marker effect, α_j = β_j δ_j, does not depend on the pseudo prior. This posterior mean can be written as:

E(α_j | y) = [ ∫ β_j δ_j f(y | θ) f(θ) dθ ] / f(y),   (7)

where θ denotes all the parameters of the model. The numerator in (7) is free of the pseudo prior, f(β_j | δ_j = 0), because the integrand is null whenever δ_j = 0. Furthermore, it can be seen from the model equation (1) that the value of y is free of β_j when δ_j = 0. Thus, the marginal distribution of y, the denominator of (7), does not depend on the pseudo prior, which is the distribution of β_j when δ_j = 0. As both the numerator and denominator of (7) are free of the pseudo prior, it follows that the posterior mean of α_j does not depend on the pseudo prior for β_j. We show here that, given this pseudo prior, the full conditional for δ_j is identical to the marginal full conditional distribution of δ_j, Pr(δ_j | y, µ, β_{−j}, δ_{−j}, ξ, σ²_e), which is used in the joint Gibbs sampler.
The full conditional probability of δ_j = 1 can be written as:

Pr(δ_j = 1 | ELSE) = (1 − π) g_1 / [ π g_0 + (1 − π) g_1 ] = 1 / [ 1 + (π/(1 − π)) (g_0/g_1) ],   (8)

where g_i = f(w | δ_j = i, β_j, σ²_e) f(β_j | δ_j = i) with i = 0 or 1.
The ratio in the denominator of (8) is:

g_0/g_1 = f(w | δ_j = 0, β_j, σ²_e)   (9)
        × f(β_j | δ_j = 0) / [ f(w | δ_j = 1, β_j, σ²_e) f(β_j | δ_j = 1, σ²_j) ].   (10)

In the above equation, (9) is identical to h_0 in (5) used in the joint Gibbs sampler, because f(w | δ_j = 0, β_j, σ²_e) is also a multivariate normal with null mean and variance Iσ²_e. We show below that (10) is identical to h_1^{−1}. Our proposed prior for β_j when δ_j = 0 (the pseudo prior) can be written as:

f(β_j | δ_j = 0) = f(β_j | δ_j = 1, σ²_j) f(w | δ_j = 1, β_j, σ²_e) / f(w | δ_j = 1, σ²_j, σ²_e).   (11)

After replacing the pseudo prior in (10) with (11), it becomes 1/f(w | δ_j = 1, σ²_j, σ²_e), which is identical to h_1^{−1}.
Thus, the ratio g_0/g_1 in (8) is identical to h_0/h_1, which proves that the full conditional probability (8) of δ_j = 1, when the proposed prior is used, is identical to (6), the marginal full conditional probability of δ_j = 1, which is used in the joint Gibbs sampler. Use of the exact full conditional distribution as the pseudo prior in BayesB, however, would require MH to sample σ²_j and σ²_e. Thus, a distribution close to the full conditional is used here: a normal distribution with mean X′_j w / (X′_j X_j + σ̃²_e/σ̃²_j) and variance σ̃²_e / (X′_j X_j + σ̃²_e/σ̃²_j), where the constants σ̃²_e and σ̃²_j are the means of the prior distributions for the residual and the marker effect variances, respectively.
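The pseudo-prior update can be sketched as follows. This is a sketch of the scheme described above, not the authors' implementation: the constants `C_E` and `C_J` stand for the prior means of the residual and marker-effect variances (their values here are illustrative assumptions), and the function and helper names are our own.

```python
import numpy as np

# Constants replacing sigma2_e and sigma2_j inside the pseudo prior
# (assumed illustrative values; the paper uses the prior means)
C_E = 1.0    # mean of the prior on the residual variance
C_J = 0.05   # mean of the prior on the locus-specific variance

def norm_logpdf(x, m, v):
    """Log density of N(m, v) at x."""
    return -0.5 * (np.log(2.0 * np.pi * v) + (x - m) ** 2 / v)

def pseudo_prior_update(j, y, mu, X, beta, delta, sigma2_j, sigma2_e, pi, rng):
    """Single-site update of (delta_j, beta_j) where beta_j is drawn from a
    pseudo prior -- the delta_j = 1 full conditional with the variances fixed
    at C_E and C_J -- whenever delta_j = 0."""
    Xj = X[:, j]
    xpx = Xj @ Xj
    w = y - mu - X @ (beta * delta) + Xj * (beta[j] * delta[j])
    r = Xj @ w
    # Pseudo prior: approximate full conditional with constant variances
    lhs_c = xpx + C_E / C_J
    pseudo_m, pseudo_v = r / lhs_c, C_E / lhs_c
    # g_i = f(w | delta_j = i, beta_j) * f(beta_j | delta_j = i), as in (8)
    logg1 = -0.5 * np.sum((w - Xj * beta[j]) ** 2) / sigma2_e \
            + norm_logpdf(beta[j], 0.0, sigma2_j)
    logg0 = -0.5 * np.sum(w ** 2) / sigma2_e \
            + norm_logpdf(beta[j], pseudo_m, pseudo_v)
    p1 = 1.0 / (1.0 + pi / (1.0 - pi) * np.exp(logg0 - logg1))
    delta[j] = 1.0 if rng.random() < p1 else 0.0
    if delta[j] == 1.0:
        lhs = xpx + sigma2_e / sigma2_j
        beta[j] = rng.normal(r / lhs, np.sqrt(sigma2_e / lhs))
    else:
        beta[j] = rng.normal(pseudo_m, np.sqrt(pseudo_v))  # draw from pseudo prior
    return beta, delta
```

With the exact full conditional as pseudo prior, the δ_j step above reduces to the joint sampler's marginal draw; with the constant-variance approximation it only comes close, which matches the intermediate mixing reported for this sampler.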
Next, we show the derivation of the full conditionals, which are proportional to the joint distribution of all parameters and y. Here, the joint distribution of all parameters and y can be written as:

f(y | µ, β, δ, σ²_e) Π_{j=1}^{k} [ f(β_j | δ_j, σ²_j) Pr(δ_j) f(σ²_j) ] f(σ²_e),

where f(β_j | δ_j, σ²_j) is the prior N(0, σ²_j) when δ_j = 1 and the pseudo prior when δ_j = 0.

Results
The number of effective samples per second of computing time was obtained for BayesB using MH, efficient MH, or the three different Gibbs samplers. These three Gibbs samplers were almost twice as efficient as Metropolis–Hastings (Table 1). The prediction accuracies for the different samplers, which are calculated as the correlation between predicted and true genetic values, were virtually identical.

Discussion
In the joint Gibbs sampler, δ j and β j are sampled jointly, which addresses the problem of dependence between δ j and β j . Thus, the joint sampler had the largest effective sample size. However, in the single-site Gibbs sampler, δ j and β j are sampled from their full conditionals, and thus due to the dependence between β j and δ j , the single-site Gibbs sampler had the smallest effective sample size. These differences in effective sample size, however, were small.
In the Gibbs sampler with pseudo priors, β_j and δ_j are also sampled from their full conditionals. Recall that we have shown that the posterior mean of the marker effects does not depend on the pseudo prior. Furthermore, Godsill [8] has shown that the marginal posterior for parameters in the model does not depend on the pseudo prior, which is the prior for β_j when δ_j = 0. When, as suggested by Carlin and Chib [7], the full conditional distribution of β_j given δ_j = 1 is chosen as the pseudo prior, we have shown that the samples of β_j and δ_j are identically distributed to those from the joint Gibbs sampler. Thus, the Gibbs sampler with pseudo priors will have an effective sample size similar to that of the joint Gibbs sampler.
However, when the full conditional distribution for β_j given δ_j = 1 is used as the pseudo prior in BayesB, the full conditional distributions of σ²_e and σ²_j are not of known forms, because σ²_e and σ²_j appear in the pseudo prior for the marker effect. In contrast to BayesB, the model used by Godsill [8] to justify the use of full conditional distributions as pseudo priors omitted, for simplicity, hyper-parameters such as σ²_e. Here, we have replaced σ²_e and σ²_j in the pseudo prior with constants such that the full conditionals for σ²_e and σ²_j have scaled inverted chi-square distributions. This modification gives a pseudo prior whose distribution is close to that of the full conditional. In the Gibbs sampler with this pseudo prior, the effective sample size was smaller than in the joint Gibbs sampler but still larger than in the single-site Gibbs sampler.

Conclusions
When an MH algorithm is used to jointly sample the marker effect and the locus-specific variance, the BayesB method is computationally intensive. After introducing a variable δ_j, indicating whether the marker effect for a locus is zero or non-zero, the marker effect and locus-specific variance can be sampled using a Gibbs sampler without MH. Among the Gibbs samplers that were considered here, the joint Gibbs sampler is the most efficient. This sampler is about 2.1 times as efficient as the MH algorithm proposed by Meuwissen et al. [4] and 1.7 times as efficient as that proposed by Habier et al. [6].