Fast parallelized sampling of Bayesian regression models for whole-genome prediction

Background Bayesian regression models are widely used in genomic prediction, where the effects of all markers are estimated simultaneously by combining the information from the phenotypic data with priors for the marker effects and other parameters such as variance components or membership probabilities. Inferences from most Bayesian regression models are based on Markov chain Monte Carlo methods, where statistics are computed from a Markov chain constructed to have a stationary distribution that is equal to the posterior distribution of the unknown parameters. In practice, chains of tens of thousands steps are typically used in whole-genome Bayesian analyses, which is computationally intensive. Methods In this paper, we propose a fast parallelized algorithm for Bayesian regression models called independent intensive Bayesian regression models (BayesXII, “X” stands for Bayesian alphabet methods and “II” stands for “parallel”) and show how the sampling of each marker effect can be made independent of samples for other marker effects within each step of the chain. This is done by augmenting the marker covariate matrix by adding p (the number of markers) new rows such that columns of the augmented marker covariate matrix are orthogonal. Ideally, the computations at each step of the MCMC chain can be accelerated by k times, where k is the number of computer processors, up to p times, where p is the number of markers. Results We demonstrate the BayesXII algorithm using the prior for BayesC\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pi $$\end{document}π, a Bayesian variable selection regression method, which is applied to simulated data with 50,000 individuals and a medium-density marker panel (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document}∼ 50,000 markers). To reach about the same accuracy as the conventional samplers for BayesC\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pi $$\end{document}π required less than 30 min using the BayesXII algorithm on 24 nodes (computer used as a server) with 24 cores on each node. In this case, the BayesXII algorithm required one tenth of the computation time of conventional samplers for BayesC\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pi $$\end{document}π. Addressing the heavy computational burden associated with Bayesian methods by parallel computing will lead to greater use of these methods.


Background
Genome-wide single nucleotide polymorphism (SNP) data have been adopted for whole-genome analyses, including genomic prediction [1] and genome-wide association studies [2]. Bayesian regression models are widely used in genomic prediction, where the effects of all markers are estimated simultaneously by combining the information from the phenotypic data and priors for the marker effects and other parameters such as variance components or membership probabilities. Most of the widely-used Bayesian regression models differ only in the prior used for the marker effects. For example, the prior for each marker effect in BayesA [1] follows a scaled t-distribution, whereas in variable selection regression, the prior for each marker effect is a mixture distribution, such as BayesB [1], BayesC [3], BayesCπ [4] and BayesR [5,6].
In these Bayesian regression models, closed-form expressions for the marginal posterior distributions of parameters of interest, e.g., marker effects, are usually not available. Thus, inferences from most Bayesian methods are based on Markov chain Monte Carlo (MCMC) methods, where statistics are computed from a Markov chain that is constructed such that the stationary distribution of a random vector x is equal to the posterior. It has been shown that statistics computed from such a Markov chain converge to those from the stationary distribution as the chain length increases [7]. In Bayesian regression models for genomic prediction, the vector x has a length about equal to the number p of markers or a multiple of it if auxiliary variables such as locus-specific marker effect variances are introduced to the analysis as in BayesA or BayesB. A widely used method to construct such a Markov chain is Gibbs sampling. In Gibbs sampling, at step t, each component of the vector x t is sampled from the conditional distribution of that component given all the other components sampled up to that point [8]. In some Gibbs samplers proposed for Bayesian variable selection methods such as BayesB [1,9], BayesC [3] or BayesCπ [4], for example, within each step, each variable in the vector x is sampled conditionally on all the other variables. This includes, for each marker i, its effect, the effect variance and a Bernoulli variable indicating whether the effect is zero or non-zero, as well as all nonmarker effects and the residual variance. These are examples of a single-site Gibbs sampler where each variable is iteratively sampled conditional on the current values of all other variables.
In practice, chains of about tens of thousands steps are typically used in whole-genome Bayesian regression models [10,11], and within each step of the chain, Gibbs sampling requires iteratively sampling at least p unknowns. Medium-level marker panels are often used in whole-genome prediction projects in animal breeding, where p is about 50,000 or larger. Thus, Bayesian regression models are usually computationally intensive, and commonly used strategies to sample these p unknowns efficiently are demanding.
The Ergodic theorem of Markov chain theory states that statistics computed from an increasingly long chain converge to those from the stationary distribution [7], whereas no such theory demonstrates that convergence can be achieved from an increasing number of shorter chains. It has sometimes been suggested that parallel computing can be adopted in MCMC by using multiple processors to obtain several short chains in parallel and then pooling the statistics computed from these short chains. In this case, caution is needed to make sure these short chains have converged to the stationary distribution. Combining several short chains will reduce the Monte Carlo variance of the computed quantities, but this may not yield statistics from the stationary distribution. Thus, to rapidly construct a single long enough chain in parallel is key to addressing the problem of the long computing time in Bayesian regression models. This is difficult because a Markov chain is an iterative process, and it can not be broken into several independent processes.
Parallel computing has been used in whole-genome prediction with Bayesian regression models by parallelizing the computations, including vector additions and dot products, associated with each marker at each step of the chain. To parallelize the computations, vectors are split up and additions or products are done in parallel on multiple processors [12,13]. These strategies can be used for most methods involving matrix or vector calculations. Speedup from these approaches, however, is limited, because marker effects still need to be sampled iteratively.
Another appealing approach is to parallelize the Gibbs sampling for all markers within each step of the chain. In a single-site Gibbs sampler, however, sampling of a marker effect is from the full conditional distribution, which is the conditional distribution of this marker effect given the current values of all the other markers. Thus, parallelizing Gibbs sampling would not be feasible unless the full conditional distributions do not depend on the values of the variables being conditioned on, i.e., unless the full-conditionals are independent. In this paper, we show how the full conditional distributions of the marker effects can be made independent within each step of the chain. This is done by augmenting the marker covariate matrix by adding p new rows such that columns of the augmented marker covariate matrix are mutually orthogonal. The phenotypes corresponding to the augmented rows of the marker covariate matrix are considered missing.
The objective of this paper is to propose a fast parallelized algorithm for Bayesian regression models that we call BayesXII, where "X" stands for Bayesian alphabet methods and "II" stands for "parallel". In this paper, the prior for BayesCπ , a Bayesian variable selection regression method, was used to demonstrate the BayesXII algorithm. Use of this approach with other priors, such as those in BayesA, BayesB or Bayesian Lasso, should be straightforward. Simulated data for 50,000 individuals with genotypes from a medium-level marker panel ( ∼ 50,000 markers) were used to compare the computational efficiency of the BayesXII algorithm with the conventional sampler for BayesCπ.

Bayesian linear regression models
In Bayesian linear regression models, for simplicity and without loss of generality, we assume that individuals have a general mean as the only fixed effect. Phenotypes of n genotyped individuals are then modeled as where y is the vector of n phenotypes, µ is the overall mean, X is the n × p marker covariate matrix (coded as 0, 1, 2), a is a vector of p random additive marker effects and e is a vector of n random residuals. A flat prior is used for µ . The prior for the residual e is e|σ 2 e ∼ N (0, Iσ 2 e ) with σ 2 e | ν e , S 2 e ∼ ν e S 2 e χ −2 ν e . The columns of X are usually centered prior to further computation. In BayesCπ , a Bayesian variable selection method, the prior for the marker effect is a mixture of a point mass at zero and a univariate normal distribution with null mean and a common locus variance σ 2 a with σ 2 a | ν a , S 2 a ∼ ν a S 2 a χ −2 ν a [3,4].

Gibbs sampling for marker effects in Bayesian regression models
In Gibbs sampling, the full conditional distribution of a j , the marker effect for locus j, when a j is non-zero, can be written as where ELSE stands for all the other parameters and y , X j is the jth column of X , and â j is the solution to The derivation of the full conditional distributions of other parameters of interest in Bayesian regression models are shown in Appendix.

Parallelized Bayesian regression models (BayesXII)
In commonly-used Gibbs sampling, the sample for each marker effect, a j , can not be obtained simultaneously, i.e., in parallel, because samples for other marker effects, a j ′ � =j , appear in the term j ′ � =j X T j X j ′ a j ′ on the right-hand-side of (2), i.e., the full conditional distributions of the marker effects are not independent. One solution is to orthogonalize columns of the marker covariate matrix X such that the term j ′ � =j X T j X j ′ a j ′ in (2) becomes zero. A data augmentation approach [14] to obtain a design matrix with orthogonal columns is described below.

Orthogonal data augmentation (ODA)
Let W o = 1 X be the incidence matrix for the Bayesian regression analysis. Following Ghosh et al. [14], we show where W a is a square matrix of dimension p + 1 and D is a diagonal matrix. Thus, and W a can be obtained using Cholesky decomposition (or Eigen decomposition) of, D − W T o W o , the righthand-side of (3). Our choice of D is Id , where I is an identity matrix and d is set to be the largest eigenvalue of W T o W o . In practice, a small value, e.g., 0.001, is added to d to avoid computationally unstable solutions. A small numerical example of ODA can be found in Appendix.

Gibbs sampling for marker effects in the BayesXII algorithm
Employing ODA, Bayesian linear regression models can be written as: where y denotes a vector of unobserved phenotypes that are introduced into the model, e e ∼ N 0, Iσ 2 e and J , X are obtained using (3) with W a = J X , and W o = 1 X . In the BayesXII algorithm, the full conditional distribution of a j when a j is non-zero, under model (4), which was derived in Appendix, can be written as: where the mean and variance parameters are free of the values of the other marker effects a j ′ � =j . Thus, the full conditional distributions of the marker effects are independent, and thus, samples for each marker can be obtained simultaneously and therefore in parallel. Note that X T j y does not change, and only X T j y needs to be computed at each step of the MCMC chain, where the number of operations for this is always of order p regardless of the size of n. In the BayesXII algorithm, sampling marker effects at each MCMC step, however, requires sampling of the vector y of unobserved phenotypes. At each step of the MCMC chain, each element of the "missing" phenotypes y is sampled from independent univariate normal distributions as: Note that the means of these normal distributions can be computed in parallel as described in Appendix. Once the means are computed, each element in y can be sampled in parallel. The derivation of the full conditional distributions of other parameters of interest in the BayesXII algorithm and parallel implementation of the BayesXII algorithm using Message Passing Interface (MPI) [15] are shown in Appendix.

Data analysis
A dataset of 60,000 observations with a medium-density marker panel was simulated using software XSim [16] to compare the BayesXII algorithm with the conventional sampler for BayesCπ . Publicly available genotypes for a medium-density marker panel ( ∼ 42,000 markers after quality control) were obtained for 100 German Holstein cattle (https ://datas hare.is.ed.ac.uk/handl e/10283 /3040). Then, haplotypes of these 100 individuals were estimated from their genotypic data using the software WinHap [17]. Starting from a base population of these 100 individuals, random mating was simulated for 100 generations, and continued for one more generation to increase the population size to 60,000 individuals, which were then used in the analysis. A random sample of five percent of the total number of loci were selected as quantitative trait loci (QTL), and their effects were sampled from a univariate normal distribution with mean zero and variance one. The QTL effects were scaled such that the genetic variance from the last generation was 1.0. A trait with a heritability 0.3 was simulated by adding independent residuals to the genetic values. In our analysis, a randomly sampled subset of 50,000 individuals was used for training and the remaining 10,000 individuals were used for testing.
This dataset was used to compare the BayesXII algorithm with the conventional sampler for BayesCπ . For both BayesXII algorithm and conventional sampler for BayesCπ , five Markov chains of length 100,000 were generated from five different sets of starting values for marker effects sampled from a normal distribution with null mean (6) and variance calculated as is the genetic variance, p i is the allele frequency for locus i, and π is the probability that a marker has a null effect. Prediction accuracies, which are calculated as the correlation between estimated breeding values and adjusted phenotypic values, in the testing population were used to compare these two methods. The correlation between estimated breeding values of these two methods for the testing population was investigated to: (1) confirm that the BayesXII algorithm can provide about the same prediction accuracy as the conventional sampler for BayesCπ ; and (2) quantify the relative convergence of the BayesXII algorithm and conventional sampler for BayesCπ . Our BayesXII algorithm was implemented using the Message Passing Interface (MPI) [15], which is a message-passing standard for distributed memory programming. The speed of our parallel implementation of the BayesXII algorithm was tested on a server with up to 24 nodes in the same rack, and there were 24 cores on each node.
The authors state that all data necessary for confirming the conclusions presented in this article are represented fully within the article.

Results
The speed of the BayesXII algorithm is shown in Table 1. The total runtime for the BayesXII algorithm was sped up nearly linearly by the number of computer processors. Using 576 processors (24 nodes with 24 cores on each node), the BayesXII algorithm required about 47 min to obtain samples for a chain of length 100,000. However, the conventional sampler for BayesCπ running on one node and one core required about 7900 minutes for the same chain length, which was about 170 times slower than the BayesXII algorithm.
Prediction accuracies were obtained from five chains of length 100,000 for each method. The potential scale reduction factor (PSRF) was used to diagnose the convergence of the Markov chain [18,19]. Using conventional

Discussion
Whole-genome Bayesian multiple regression methods are usually computationally intensive, where a MCMC chain comprising tens of thousands of steps is typically used for inference. In this paper, a strategy to parallelize Gibbs sampling for each marker within each step of the MCMC chain was proposed. This parallelization is accomplished using an orthogonal data augmentation strategy, where the marker covariate matrix is augmented with p new rows such that its columns are orthogonal. Then, the full conditional distributions of marker effects become independent within each step of the chain, and thus, samples of marker effects within each step can be drawn in parallel. Ideally, the BayesXII algorithm can be accelerated by k times, where k is the number of computer processors, up to p times, where p is the number of markers. In this paper, the full conditional distributions that are needed for BayesCπ with orthogonal data augmentation (BayesXII) were derived and the speed of the BayesXII algorithm was evaluated. In the simulation data with 50,000 individuals and a medium-density marker panel ( ∼ 50,000 markers), the BayesXII algorithm reached about the same accuracy as the conventional sampler for BayesCπ in less than 30 minutes on 24 nodes with 24 cores on each node. In this case, the BayesXII algorithm required one tenth of the computation time as conventional sampler for BayesCπ.

Marker effects
The time and space complexity for sampling marker effects, the most time-consuming task, in the BayesXII algorithm and conventional sampler for BayesCπ , are shown in Table 2, where the big O notation [20] is used. Time complexity represents the number of elementary operations, such as multiplication and addition, performed by an algorithm. Space complexity represents the amount of storage required by an algorithm. In Bayesian regression model implementations such as BayesCπ , the most time consuming task is sampling the marker effects from their full conditional distributions. The time complexities for two different computational approaches, BayesCπ -I and BayesCπ-II, for conventional sampler for BayesCπ are O(npt) and O(p 2 t) , for which the details are described in Appendix. In the BayesXII algorithm, however, the marker effects can be sampled in parallel within each step, using (5), and the time complexity is O(p 2 t/k) . Ideally, the computations at each step of the MCMC chain can be accelerated by k times, where k is the number of computer processors, up to p times, where p is the number of markers. In our simulated data, the speed performance of the two computational approaches for conventional sampler for BayesCπ should be similar and only the first approach is shown in "Results" section.

Variance components
After marker effects are sampled at each step of the MCMC, the computational complexity for sampling residual variances and marker effects variances in the BayesXII algorithm at each step are O(p) and O(p), which are negligible compared with the complexity of sampling marker effects. In conventional samplers for BayesCπ , the computational complexity for sampling residual variances at each step is O(n) for BayesCπ -I and O(p 2 ) for BayesCπ-II.

Building prerequisite matrices
The time complexities for computations to build prerequisite matrices that are done only once for the BayesCπ-I, BayesCπ-II and BayesXII algorithms on a single-core are O(np), O(np 2 ) and O(np 2 + p 3 ) . The computing time for these "only-once" computations, however, is trivial compared to that for sampling marker effects. For example, in the BayesXII algorithm, the time complexity for computation to build the augmented matrix W a as in (3) on a single core is O(np 2 + p 3 ) . In (3), the two tasks are: (1) computation of X T X , where X is an n × p matrix; and (2) Cholesky decomposition of a positive definite matrix of size p. Parallel computing approaches for the first of these two tasks is given in Appendix. The computing time for the Cholesky decomposition in the second task is relatively short, requiring about 5 minutes for p = 50, 000 on a workstation with 4 cores and 64G memory.

More complicated scenarios Larger sample size
In the BayesXII algorithm, the marker covariate matrix is augmented by adding p new rows such that its columns are orthogonal. Compared with the conventional sampler for BayesCπ , the p "missing" phenotypes are sampled at each step of the MCMC chain in the BayesXII algorithm. This makes the convergence of the BayesXII algorithm slower than for the conventional sampler. Thus, the ratio of the number p of markers and the number n of individuals may affect the convergence of the BayesXII algorithm. We simulated new datasets with 1000 markers and different number of individuals. A random sample of five percent of those 1000 markers were selected as QTL. Figure 1 shows the effect of sample size on the convergence of the BayesXII algorithm, where the number of steps required for the BayesXII algorithm to obtain similar estimated breeding value as conventional sampler for BayesCπ was shown for different n with p = 1000 . It can be seen that fewer iterations are needed as more individuals (n) become available. This is happening in genomic prediction as more individuals are genotyped.

High-density marker panel
In the BayesXII algorithm, the marker covariate matrix is augmented by adding a matrix X to make all columns of the augmented marker covariate matrix orthogonal to each other. However, with high-density marker panels (e.g., whole-genome sequence data), a large amount of memory will be used to store X , and the Cholesky decomposition of what may be a huge square matrix of size p is required, which can be very slow or infeasible. Thus, we propose another approach below. Markers are partitioned into small groups, and a square matrix X i is generated to orthogonalize columns of the marker covariate matrix in group i such that marker effects for markers in the same group can be sampled in parallel at each step. For example, for a marker covariate matrix partitioned into m groups, X = X 1 X 2 · · · X m , the augmented marker covariate matrix can be written as: where and D i is a diagonal matrix for group i. Note that the way in which markers are partitioned into small groups may affect the convergence.

Conclusions
A fast parallelized algorithm called BayesXII is proposed in this paper. Ideally, the computations at each step of the MCMC chain can be accelerated by k times, where k is the number of computer processors, up to p times, where p is the number of markers. In a simulation analysis with 50,000 individuals and a medium-density marker panel, the BayesXII algorithm reached about the same prediction accuracy as the conventional samplers for BayesCπ within one tenth of the computation time for the conventional sampler. In conclusion, we believe that the BayesXII algorithm is a practical alternative for accelerating Gibbs sampling for some applications of Bayesian regression models, such as those encountered in genomic prediction. Addressing the heavy computational burden associated with Bayesian regression models by parallel computing will lead to greater use of these models. where ELSE stands for all the other unknowns and y , X j is the jth column of X , and â j is the solution to: It is worth noting that two approaches are available to compute the right-hand-side of (8). The first approach, BayesCπ-I, corresponding to (8) is briefly described below.

Initially:
• Compute X T j X j with j = 1, 2, 3, . . . , p, • Compute y corr = y − 1µ − X j a j , which is the vector y corrected for all fixed and random effects, using their current values, and • Store X in the memory.
2. Then, for each locus at each step of MCMC: • Before sampling the marker effect, the right-handside of (8) is updated as then • once a new value of a j , denoted by a * j , is sampled, y corr is updated as The second approach, BayesCπ-II, corresponding to (9) is briefly described below.

Initially:
• Compute X T X , X T 1 and X T y , then, • Store these matrices in memory.

Then for each locus at each step of MCMC:
• Before sampling the marker effect, the right-handside is computed as where X T X :,j is the jth column of X T X and X T X j,j is the jth diagonal element of X T X . Note that the number of operations to sample marker effects for each locus at each step of the MCMC is of order n in the first approach and of order p for the second approach. Thus, the computational complexities for sampling marker effects in conventional sampler for BayesCπ are O(npt) for the first approach and O(p 2 t) for the the second approach. The space complexities are O(np) for storing X in the first approach and O(p 2 ) for storing X T X in the second approach.

Single-site Gibbs sampler for the BayesXII algorithm Full conditional distribution of the marker effect
The full conditional distribution of a j in the BayesXII algorithm, which is shown below, can be obtained from (7) and (8) Thus, the full conditional distribution of a j can be written as: Note that the number of operations is of order p for sampling marker effects, thus the computational complexity is O(p 2 t) . The space complexity is O(p 2 ) for X. Detailed derivation of the full conditional distribution of the indicator variable δ j indicating if a j had a normal distribution ( δ j = 1 ) or if it is null ( δ j = 0 ) in the conventional sampler for BayesCπ is also in Fernando and Garrick [21]. The full conditional distribution of δ j in conventional sampler for BayesCπ is: where f 1 r j | σ 2 a , σ 2 e is a univariate normal distribution with and f 0 r j | σ 2 e is a univariate normal distribution with and (10) The full conditional distribution of δ j in the BayesXII algorithm, which is shown below, can be obtained from (11) by replacing y with y y , 1 with 1 J and X with X X .
Thus, (11) for the BayesXII algorithm can be simplified as: where f 1 r j | σ 2 a , σ 2 e is a univariate normal with and f 0 r j | σ 2 e is a univariate normal with and

Full conditional distributions of the unobserved phenotypes
The full conditional distribution of y can be written as:

Full conditional distributions of other unknowns
The derivation of the full conditional distributions of other parameters such as µ , σ 2 a , σ 2 e , π are straightforward. Thus they are presented as below without derivations.
where y corr = y y − 1 J µ − X X a , k is the number of markers in the model at each step. Note that y T corr y corr can be written as: f y | a, µ, σ 2 e , σ 2 a , y = f y | a, µ, σ 2 e ∝ N Jµ + Xa, Iσ 2 e .

Parallel implementation of the BayesXII algorithm
As shown above, in the BayesXII algorithm, all marker effects can be sampled simultaneously in parallel within each step of the chain. Given k available computer processes, markers can be split into k groups, and each computer process can be used to sample marker effects in the corresponding group. A graph to demonstrate such a parallel implementation for sampling marker effects at each step is shown in Fig. 2.

Parallel computing of Xa
In many modern programming languages, such as R, Python and Julia, libraries are available to take advantage of multiple processors and GPUs for parallel computing of many matrix or vector operations. The descriptions given below are only to illustrate the main principle underlying parallel computing, which involves distributing calculations across processors. Actual implementations may be different and will depend on the programming language, the library and the hardware used.
To sample the unobserved phenotypic values using (6), a matrix by vector product Xa is needed. Here we describe how parallel computing can be used to compute the product of a matrix X by a vector a .
Note that the same strategy can also be used to calculate X T y by partitioning X by rows.

Parallel computing of X T X
In (3), computation of X T X is needed. Here we describe how parallel computing can be used to compute X T X , where X is a n × p matrix.
Note that once X T X has been built, it can be updated as X T X + X T new X new when new observations X new of size n new × p are available, for which the computation complexity is O(p 2 n new ) . The computing time is trivial since n new is usually small. In addition to the benefit of reducing the computing time, this approach can also address the limitation that X may be too large to be stored on a single computing node ( n ≫ p ) by distributing the X (i) across several nodes.

A numerical example of ODA
A small numerical example of ODA is provided here to illustrate the orthogonal data augmentation (ODA). Assuming there are only three individuals and five markers, the centered 3 × 5 marker covariate matrix is:  The d is set to be the largest eigenvalue of W T o W o , which is about 4.55 in this example.
In (3), W T a W a can be calculated as: In this example, W a is obtained using Cholesky decomposition of W T a W a , and W a is J equals the first column of W a , and X is the rest of columns of W a since W a = J X .

Prediction accuracy of the BayesXII algorithm
Prediction accuracies of the BayesXII algorithm for BayesCπ were obtained from five chains of length 100,000. In Fig. 3, the blue solid line is the average of prediction accuracies from those five chains, and the black dashed line is the prediction accuracy obtained from the converged conventional sampler. In summary, the BayesXII algorithm could provide about the same prediction accuracy as the conventional sampler for BayesCπ when the chain was of length 50,500.