### Appendix

### Two computational approaches for the conventional sampler for BayesC\(\pi \)

A detailed derivation of the full conditional distribution of the marker effect for locus *j* in the conventional sampler for BayesC\(\pi \) is given in Fernando and Garrick [21]. As shown in [21], the full conditional distribution of \(a_{j}\) in BayesC\(\pi \), when \(a_{j}\) is non-zero, is

$$ \left( a_{j}\mid ELSE\right) \sim N\left( \hat{a}_{j},\frac{\sigma _{e}^{2}}{{\mathbf {X}}_{j}^{T}{\mathbf {X}}_{j}+\frac{\sigma _{e}^{2}}{\sigma _{a}^{2}}}\right) $$

(7)

where *ELSE* stands for all the other unknowns and \({\mathbf {y}}\), \({\mathbf {X}}_{j}\) is the *j*th column of \({\mathbf {X}}\), and \(\hat{a}_{j}\) is the solution to:

$$ \left( {\mathbf {X}}_{j}^{T}{\mathbf {X}}_{j}+\frac{\sigma _{e}^{2}}{\sigma _{a}^{2}}\right) \hat{a}_{j}={\mathbf {X}}_{j}^{T}\left( {\mathbf {y}}-{\mathbf {1}}\mu -\sum _{j^{\prime}\ne j}{\mathbf {X}}_{j^{\prime}}a_{j^{\prime}}\right) $$

(8)

$$= {\mathbf {X}}_{j}^{T}{\mathbf {y}}-{\mathbf {X}}_{j}^{T}{\mathbf {1}}\mu -\sum _{j^{\prime}\ne j}{\mathbf {X}}_{j}^{T}{\mathbf {X}}_{j^{\prime}}a_{j^{\prime}}. $$

(9)

It is worth noting that two approaches are available to compute the right-hand side of (8). The first approach, BayesC\(\pi \)-I, which corresponds to (8), is briefly described below.

- 1.
Initially:

Compute \({\mathbf {X}}_{j}^{T}{\mathbf {X}}_{j}\) with \(j=1,2,3,\ldots ,p,\)

Compute \({\mathbf {y}}_{corr}={\mathbf {y}}-{\mathbf {1}}\mu -\sum _{j}{\mathbf {X}}_{j}a_{j}\), which is the vector \({\mathbf {y}}\) corrected for all fixed and random effects, using their current values, and

Store \({\mathbf {X}}\) in memory.

- 2.
Then, for each locus at each step of MCMC:

$$ {\mathbf {X}}_{j}^{T}\left( {\mathbf {y}}-{\mathbf {1}}\mu -\sum _{j^{\prime}\ne j}{\mathbf {X}}_{j^{\prime}}a_{j^{\prime}}\right) ={\mathbf {X}}_{j}^{T}{\mathbf {y}}_{corr}+{\mathbf {X}}_{j}^{T}{\mathbf {X}}_{j}a_{j}, $$

Then, once a new value of \(a_j\), denoted by \(a_j^{*}\), is sampled, \({\mathbf {y}}_{corr}\) is updated as

$$ {\mathbf {y}}_{corr}={\mathbf {y}}_{corr}+{\mathbf {X}}_{j}\left( a_j-a_j^{*}\right) . $$
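A minimal sketch of this first approach in Python may help fix ideas. It samples every marker effect from its full conditional while maintaining \({\mathbf {y}}_{corr}\); for brevity it ignores the indicator variable \(\delta _{j}\) and the updates of the other unknowns, and the function and argument names are illustrative, not the authors' implementation:

```python
import numpy as np

def bayesc_pi_I_step(X, xtx, y_corr, a, var_e, var_a, rng):
    """One MCMC step of BayesCpi-I: O(n) operations per locus.

    X      : n x p marker covariate matrix (kept in memory)
    xtx    : precomputed X_j' X_j, j = 1..p
    y_corr : y corrected for all effects at their current values
    a      : current marker effects (updated in place)
    """
    lam = var_e / var_a
    for j in range(X.shape[1]):
        # right-hand side of (8), via X_j' y_corr + X_j' X_j a_j
        rhs = X[:, j] @ y_corr + xtx[j] * a[j]
        lhs = xtx[j] + lam
        a_new = rng.normal(rhs / lhs, np.sqrt(var_e / lhs))
        # keep y_corr consistent with the newly sampled effect
        y_corr += X[:, j] * (a[j] - a_new)
        a[j] = a_new
    return a, y_corr
```

The residual update at the end of each iteration is what keeps the per-locus cost at O(n): \({\mathbf {y}}_{corr}\) always equals \({\mathbf {y}}\) corrected for all effects at their current values.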

The second approach, BayesC\(\pi \)-II, which corresponds to (9), is briefly described below.

- 1.
Initially:

Compute \({\mathbf {X}}^{T}{\mathbf {X}}\), \({\mathbf {X}}^{T}{\mathbf {1}}\) and \({\mathbf {X}}^{T}{\mathbf {y}}\), then,

Store these matrices in memory.

- 2.
Then, for each locus at each step of MCMC:

$$ \left( {\mathbf {X}}^{T}{\mathbf {y}}\right) _{j}-\left( {\mathbf {X}}^{T}{\mathbf {1}}\right) _{j}\mu -\left( {\mathbf {X}}^{T}{\mathbf {X}}\right) _{:,j}{\mathbf {a}}+\left( {\mathbf {X}}^{T}{\mathbf {X}}\right) _{j,j}a_{j}, $$

where \(\left( {\mathbf {X}}^{T}{\mathbf {X}}\right) _{:,j}\) is the *j*th column of \({\mathbf {X}}^{T}{\mathbf {X}}\) and \(\left( {\mathbf {X}}^{T}{\mathbf {X}}\right) _{j,j}\) is the *j*th diagonal element of \({\mathbf {X}}^{T}{\mathbf {X}}\). Note that the number of operations to sample the marker effect for each locus at each step of the MCMC is of order *n* in the first approach and of order *p* in the second approach. Thus, the computational complexities for sampling marker effects in the conventional sampler for BayesC\(\pi \) are *O*(*npt*) for the first approach and \(O(p^2t)\) for the second approach. The space complexities are *O*(*np*) for storing \({\mathbf {X}}\) in the first approach and \(O(p^2)\) for storing \({\mathbf {X}}^{T}{\mathbf {X}}\) in the second approach.
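The second approach can be sketched analogously (again a simplified illustration with assumed names; only the stored cross-products \({\mathbf {X}}^{T}{\mathbf {X}}\), \({\mathbf {X}}^{T}{\mathbf {1}}\) and \({\mathbf {X}}^{T}{\mathbf {y}}\) are used, so \({\mathbf {X}}\) itself is not needed and each locus costs O(p)):

```python
import numpy as np

def bayesc_pi_II_step(XtX, Xt1, Xty, a, mu, var_e, var_a, rng):
    """One MCMC step of BayesCpi-II: O(p) operations per locus."""
    lam = var_e / var_a
    for j in range(len(a)):
        # right-hand side of (9), assembled from the stored cross-products
        rhs = Xty[j] - Xt1[j] * mu - XtX[:, j] @ a + XtX[j, j] * a[j]
        lhs = XtX[j, j] + lam
        a[j] = rng.normal(rhs / lhs, np.sqrt(var_e / lhs))
    return a
```

Because (8) and (9) are algebraically identical, running the two approaches with the same random draws produces the same chain; only the cost per locus and the memory footprint differ.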

### Single-site Gibbs sampler for the BayesXII algorithm

#### Full conditional distribution of the marker effect

The full conditional distribution of \(a_{j}\) in the BayesXII algorithm, which is shown below, can be obtained from (7) and (8) by replacing \(\mathbf {{y}}\) with \(\begin{bmatrix}{\mathbf {y}}\\ \widetilde{{\mathbf {y}}} \end{bmatrix}\), \(\mathbf {{1}}\) with \(\begin{bmatrix}{\mathbf {1}}\\ \widetilde{{\mathbf {J}}} \end{bmatrix}\) and \(\mathbf {{X}}\) with \(\begin{bmatrix}{\mathbf {X}}\\ \widetilde{{\mathbf {X}}} \end{bmatrix}\). Note that columns of the augmented covariate matrix \(\begin{bmatrix}{\mathbf {1}} &{} {\mathbf {X}}\\ \widetilde{{\mathbf {J}}} &{} \widetilde{{\mathbf {X}}} \end{bmatrix}\) are orthogonal. Thus, (7) for the BayesXII algorithm can be simplified as:

$$\begin{aligned} \left( \begin{bmatrix}{\mathbf {X}}_{j}^{T}&\widetilde{{\mathbf {X}}}_{j}^{T}\end{bmatrix}\begin{bmatrix}{\mathbf {X}}_{j}\\ \widetilde{{\mathbf {X}}}_{j} \end{bmatrix}+\frac{\sigma _{e}^{2}}{\sigma _{a}^{2}}\right) \hat{a}_{j}&= \begin{bmatrix}{\mathbf {X}}_{j}^{T}&\widetilde{{\mathbf {X}}}_{j}^{T}\end{bmatrix}\left( \begin{bmatrix}{\mathbf {y}}\\ \widetilde{{\mathbf {y}}} \end{bmatrix}-\begin{bmatrix}{\mathbf {1}}\\ \widetilde{{\mathbf {J}}} \end{bmatrix}\mu -\sum _{j^{\prime}\ne j}\begin{bmatrix}{\mathbf {X}}_{j^{\prime}}\\ \widetilde{{\mathbf {X}}}_{j^{\prime}} \end{bmatrix}a_{j^{\prime}}\right) \nonumber \\ &= \begin{bmatrix}{\mathbf {X}}_{j}^{T}&\widetilde{{\mathbf {X}}}_{j}^{T}\end{bmatrix}\begin{bmatrix}{\mathbf {y}}\\ \widetilde{{\mathbf {y}}} \end{bmatrix}-\begin{bmatrix}{\mathbf {X}}_{j}^{T}&\widetilde{{\mathbf {X}}}_{j}^{T}\end{bmatrix}\begin{bmatrix}{\mathbf {1}}\\ \widetilde{{\mathbf {J}}} \end{bmatrix}\mu -\sum _{j^{\prime}\ne j}\begin{bmatrix}{\mathbf {X}}_{j}^{T}&\widetilde{{\mathbf {X}}}_{j}^{T}\end{bmatrix}\begin{bmatrix}{\mathbf {X}}_{j^{\prime}}\\ \widetilde{{\mathbf {X}}}_{j^{\prime}} \end{bmatrix}a_{j^{\prime}}\nonumber \\ \left( d+\frac{\sigma _{e}^{2}}{\sigma _{a}^{2}}\right) \hat{a}_{j}&= {\mathbf {X}}_{j}^{T}{\mathbf {y}}+\widetilde{{\mathbf {X}}}_{j}^{T}\widetilde{{\mathbf {y}}}\nonumber \\ \hat{a}_{j}&= \frac{{\mathbf {X}}_{j}^{T}{\mathbf {y}}+\widetilde{{\mathbf {X}}}_{j}^{T}\widetilde{{\mathbf {y}}}}{d+\frac{\sigma _{e}^{2}}{\sigma _{a}^{2}}}. \end{aligned}$$

(10)

Thus, the full conditional distribution of \(a_{j}\) can be written as:

$$ \left( a_{j}\mid ELSE\right) \sim N\left( \frac{{\mathbf {X}}_{j}^{T}{\mathbf {y}}+\widetilde{{\mathbf {X}}}_{j}^{T}\widetilde{{\mathbf {y}}}}{d+\frac{\sigma _{e}^{2}}{\sigma _{a}^{2}}},\frac{\sigma _{e}^{2}}{d+\frac{\sigma _{e}^{2}}{\sigma _{a}^{2}}}\right) . $$

Note that the number of operations to sample each marker effect is of order *p*; thus, the computational complexity is \(O(p^{2}t)\). The space complexity is \(O(p^{2})\) for storing \(\widetilde{{\mathbf {X}}}\).
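Because the augmented columns are orthogonal, the full conditionals of all marker effects are mutually independent, so they can be sampled in one vectorized (or parallel) draw. A sketch with assumed names, again omitting the \(\delta _{j}\) indicators:

```python
import numpy as np

def sample_effects_bayesxii(Xty, Xtty, d, var_e, var_a, rng):
    """Draw all p marker effects at once from their independent
    full conditionals.

    Xty  : precomputed X' y (fixed across iterations)
    Xtty : X~' y~, recomputed whenever y~ is resampled
    d    : common squared norm of the augmented columns
    """
    lhs = d + var_e / var_a
    mean = (Xty + Xtty) / lhs
    return rng.normal(mean, np.sqrt(var_e / lhs))
```

Only \(\widetilde{{\mathbf {X}}}^{T}\widetilde{{\mathbf {y}}}\) changes between iterations; \({\mathbf {X}}^{T}{\mathbf {y}}\) is computed once before sampling begins.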

A detailed derivation of the full conditional distribution of the indicator variable \(\delta _{j}\), which indicates whether \(a_{j}\) has a normal distribution (\(\delta _{j}=1\)) or is null (\(\delta _{j}=0\)), in the conventional sampler for BayesC\(\pi \) is also given in Fernando and Garrick [21]. The full conditional distribution of \(\delta _{j}\) in the conventional sampler for BayesC\(\pi \) is:

$$\begin{aligned}&Pr\left( \delta _{j}=1\mid ELSE\right) \nonumber \\&\quad =\frac{f_{1}\left( r_{j}\mid \sigma _{a}^{2},\sigma _{e}^{2}\right) Pr\left( \delta _{j}=1\right) }{f_{0}\left( r_{j}\mid \sigma _{e}^{2}\right) Pr\left( \delta _{j}=0\right) +f_{1}\left( r_{j}\mid \sigma _{a}^{2},\sigma _{e}^{2}\right) Pr\left( \delta _{j}=1\right) }, \end{aligned}$$

(11)

where \(f_{1}\left( r_{j}\mid \sigma _{a}^{2},\sigma _{e}^{2}\right) \) is a univariate normal distribution with

$$ E\left( r_{j}\mid \sigma _{a}^{2},\sigma _{e}^{2}\right) =0,\quad Var\left( r_{j}\mid \sigma _{a}^{2},\sigma _{e}^{2}\right) =\left( {\mathbf {X}}_{j}^{T}{\mathbf {X}}_{j}\right) ^{2}\sigma _{a}^{2}+{\mathbf {X}}_{j}^{T}{\mathbf {X}}_{j}\sigma _{e}^{2}, $$

and \(f_{0}\left( r_{j}\mid \sigma _{e}^{2}\right) \) is a univariate normal distribution with

$$ E\left( r_{j}\mid \sigma _{e}^{2}\right) =0, Var\left( r_{j}\mid \sigma _{e}^{2}\right) ={\mathbf {X}}_{j}^{T}{\mathbf {X}}_{j}\sigma _{e}^{2}, $$

and

$$ r_{j}={\mathbf {X}}_{j}^{T}\left( {\mathbf {y}}-{\mathbf {1}}\mu -\sum _{j^{\prime}\ne j}{\mathbf {X}}_{j^{\prime}}a_{j^{\prime}}\right) .$$

The full conditional distribution of \(\delta _{j}\) in the BayesXII algorithm, which is shown below, can be obtained from (11) by replacing \(\mathbf {{y}}\) with \(\begin{bmatrix}{\mathbf {y}}\\ \widetilde{{\mathbf {y}}} \end{bmatrix}\), \(\mathbf {{1}}\) with \(\begin{bmatrix}{\mathbf {1}}\\ \widetilde{{\mathbf {J}}} \end{bmatrix}\) and \(\mathbf {{X}}\) with \(\begin{bmatrix}{\mathbf {X}}\\ \widetilde{{\mathbf {X}}} \end{bmatrix}\). Thus, (11) for the BayesXII algorithm can be simplified as:

$$\begin{aligned}&Pr\left( \delta _{j}=1\mid ELSE\right) \\&\quad =\frac{f_{1}\left( r_{j}\mid \sigma _{a}^{2},\sigma _{e}^{2}\right) Pr\left( \delta _{j}=1\right) }{f_{0}\left( r_{j}\mid \sigma _{e}^{2}\right) Pr\left( \delta _{j}=0\right) +f_{1}\left( r_{j}\mid \sigma _{a}^{2},\sigma _{e}^{2}\right) Pr\left( \delta _{j}=1\right) }, \end{aligned}$$

where \(f_{1}\left( r_{j}\mid \sigma _{a}^{2},\sigma _{e}^{2}\right) \) is a univariate normal with

$$\begin{aligned} E\left( r_{j}\mid \sigma _{a}^{2},\sigma _{e}^{2}\right) =0,&Var\left( r_{j}\mid \sigma _{a}^{2},\sigma _{e}^{2}\right) =d^{2}\sigma _{a}^{2}+d\sigma _{e}^{2}, \end{aligned}$$

and \(f_{0}\left( r_{j}\mid \sigma _{e}^{2}\right) \) is a univariate normal with

$$\begin{aligned} E\left( r_{j}\mid \sigma _{e}^{2}\right) =0,&Var\left( r_{j}\mid \sigma _{e}^{2}\right) =d\sigma _{e}^{2}, \end{aligned}$$

and

$$\begin{aligned} r_{j}={\mathbf {X}}_{j}^{T}{\mathbf {y}}+\widetilde{{\mathbf {X}}}_{j}^{T}\widetilde{{\mathbf {y}}}. \end{aligned}$$
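This probability can be evaluated directly from the two normal densities. A sketch, assuming (as is usual in BayesC\(\pi \)) that \(\pi \) denotes \(Pr(\delta _{j}=0)\), the prior probability of a null effect; the function name is illustrative:

```python
import numpy as np

def prob_delta_one(r_j, d, var_a, var_e, pi0):
    """Pr(delta_j = 1 | ELSE) for the BayesXII algorithm, with
    r_j = X_j' y + X~_j' y~ and pi0 = Pr(delta_j = 0)."""
    v1 = d * d * var_a + d * var_e   # Var(r_j) when delta_j = 1
    v0 = d * var_e                   # Var(r_j) when delta_j = 0
    # work on the log scale; the 1/sqrt(2*pi) factors cancel in the ratio
    log_f1 = -0.5 * (np.log(v1) + r_j ** 2 / v1)
    log_f0 = -0.5 * (np.log(v0) + r_j ** 2 / v0)
    num = np.exp(log_f1) * (1.0 - pi0)
    return num / (np.exp(log_f0) * pi0 + num)
```

Since \(f_{1}\) has the larger variance, the probability of \(\delta _{j}=1\) increases with \(|r_{j}|\), as expected.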

#### Full conditional distributions of the unobserved phenotypes

The full conditional distribution of \(\widetilde{{\mathbf {y}}}\) can be written as:

$$\begin{aligned} f\left( \widetilde{{\mathbf {y}}}\mid {\mathbf {a}},\mu ,\sigma _{e}^{2},\sigma _{a}^{2},{\mathbf {y}}\right)&=f\left( \widetilde{{\mathbf {y}}}\mid {\mathbf {a}},\mu ,\sigma _{e}^{2}\right) \\&\propto N\left( \widetilde{{\mathbf {J}}}\mu +\widetilde{{\mathbf {X}}}{\mathbf {a}},{\mathbf {I}}\sigma _{e}^{2}\right) . \end{aligned}$$

#### Full conditional distributions of other unknowns

The derivations of the full conditional distributions of the other unknowns, such as \(\mu \), \(\sigma _{a}^{2}\), \(\sigma _{e}^{2}\) and \(\pi \), are straightforward. Thus, they are presented below without derivation.

$$\begin{aligned} \left( \mu \mid ELSE\right)&\sim N\left( \frac{{\mathbf {1}}^{T}{\mathbf {y}}+\widetilde{{\mathbf {J}}}^{T}\widetilde{{\mathbf {y}}}}{d},\frac{\sigma _{e}^{2}}{d}\right) ,\\ \left( \sigma _{a}^{2}\mid ELSE\right)&\sim \left( {\mathbf {a}}^{T}{\mathbf {a}}+\nu _{a}S_{a}^{2}\right) \chi _{k+\nu _{a}}^{-2},\\ \left( \sigma _{e}^{2}\mid ELSE\right)&\sim \left( {\mathbf {y}}_{corr}^{T}{\mathbf {y}}_{corr}+\nu _{e}S_{e}^{2}\right) \chi _{n+p+\nu _{e}}^{-2},\\ \left( \pi \mid ELSE\right)&\sim Beta(p - k + 1,k + 1), \end{aligned}$$

where \({\mathbf {y}}_{corr}=\begin{bmatrix}{\mathbf {y}}\\ \widetilde{{\mathbf {y}}} \end{bmatrix}-\begin{bmatrix}{\mathbf {1}}\\ \widetilde{{\mathbf {J}}} \end{bmatrix}\mu -\begin{bmatrix}{\mathbf {X}}\\ \widetilde{{\mathbf {X}}} \end{bmatrix}{\mathbf {a}}\), and *k* is the number of markers in the model at each step. Note that \({\mathbf {y}}_{corr}^{T}{\mathbf {y}}_{corr}\) can be written as:

$$\begin{aligned}&\left( \begin{bmatrix}{\mathbf {y}}\\ \widetilde{{\mathbf {y}}} \end{bmatrix}-\begin{bmatrix}{\mathbf {1}}\\ \widetilde{{\mathbf {J}}} \end{bmatrix}\mu -\begin{bmatrix}{\mathbf {X}}\\ \widetilde{{\mathbf {X}}} \end{bmatrix}{\mathbf {a}}\right) ^{T}\left( \begin{bmatrix}{\mathbf {y}}\\ \widetilde{{\mathbf {y}}} \end{bmatrix}-\begin{bmatrix}{\mathbf {1}}\\ \widetilde{{\mathbf {J}}} \end{bmatrix}\mu -\begin{bmatrix}{\mathbf {X}}\\ \widetilde{{\mathbf {X}}} \end{bmatrix}{\mathbf {a}}\right) \\&\quad ={\mathbf {y}}^{T}{\mathbf {y}}+\widetilde{{\mathbf {y}}}^{T}\widetilde{{\mathbf {y}}}+d\mu ^{2}+d{\mathbf {a}}^{T}{\mathbf {a}}-2\mu \left( {\mathbf {1}}^{T}{\mathbf {y}}+\widetilde{{\mathbf {J}}}^{T}\widetilde{{\mathbf {y}}}\right) -2{\mathbf {a}}^{T}\left( {\mathbf {X}}^{T}{\mathbf {y}}+\widetilde{{\mathbf {X}}}^{T}\widetilde{{\mathbf {y}}}\right) . \end{aligned}$$
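This expansion holds because the augmented columns are orthogonal with common squared norm *d*, so the cross terms vanish. It can be checked numerically with any design having that property; here a scaled QR factor is used as a stand-in for the ODA construction (all names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, d = 4, 3, 6.0

# any augmented design whose columns are orthogonal with squared norm d
# suffices for checking the identity; sqrt(d) * Q is such a matrix
Q, _ = np.linalg.qr(rng.normal(size=(n + p, 1 + p)))
W = np.sqrt(d) * Q                    # W' W = d * I
w1, Xaug = W[:, 0], W[:, 1:]          # stand-ins for [1; J~] and [X; X~]

y_aug = rng.normal(size=n + p)        # stand-in for [y; y~]
mu, a = 0.3, rng.normal(size=p)

# left-hand side: the quadratic form computed directly
direct = np.sum((y_aug - w1 * mu - Xaug @ a) ** 2)
# right-hand side: the expanded expression
expanded = (y_aug @ y_aug + d * mu ** 2 + d * (a @ a)
            - 2 * mu * (w1 @ y_aug) - 2 * a @ (Xaug.T @ y_aug))
```

The expanded form is cheap to update within the sampler, because the inner products \({\mathbf {1}}^{T}{\mathbf {y}}+\widetilde{{\mathbf {J}}}^{T}\widetilde{{\mathbf {y}}}\) and \({\mathbf {X}}^{T}{\mathbf {y}}+\widetilde{{\mathbf {X}}}^{T}\widetilde{{\mathbf {y}}}\) are already needed to sample \(\mu \) and \({\mathbf {a}}\).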

### Parallel implementation of the BayesXII algorithm

As shown above, in the BayesXII algorithm, all marker effects can be sampled simultaneously in parallel within each step of the chain. Given *k* available computer processes, the markers can be split into *k* groups, and each process can be used to sample the marker effects in its group. A diagram illustrating such a parallel implementation for sampling marker effects at each step is shown in Fig. 2.

#### Parallel computing of \(\widetilde{{\mathbf {X}}}{\mathbf {a}}\)

In many modern programming languages, such as R, Python and Julia, libraries are available to take advantage of multiple processors and GPUs for parallel computing of many matrix or vector operations. The descriptions given below are only to illustrate the main principle underlying parallel computing, which involves distributing calculations across processors. Actual implementations may be different and will depend on the programming language, the library and the hardware used.

To sample the unobserved phenotypic values using (6), a matrix by vector product \(\widetilde{{\mathbf {X}}}{\mathbf {a}}\) is needed. Here we describe how parallel computing can be used to compute the product of a matrix \(\widetilde{{\mathbf {X}}}\) by a vector \({\mathbf {a}}\).

- 1
Partition \(\widetilde{{\mathbf {X}}}\) of size \(n\times p\) by columns into smaller submatrices \(\widetilde{{\mathbf {X}}}_{(1)},\widetilde{{\mathbf {X}}}_{(2)},\widetilde{{\mathbf {X}}}_{(3)},\ldots \) of size \(n\times p_{i}\), and partition \({\mathbf {a}}\) into smaller subvectors \({\mathbf {a}}_{(1)},{\mathbf {a}}_{(2)},{\mathbf {a}}_{(3)},\ldots \) of length \(p_{i}\), with \(\sum p_{i}=p\).

- 2
Compute \(\widetilde{{\mathbf {X}}}{\mathbf {a}}\) as \(\widetilde{{\mathbf {X}}}_{(1)}{\mathbf {a}}_{(1)}+\widetilde{{\mathbf {X}}}_{(2)}{\mathbf {a}}_{(2)}+\widetilde{{\mathbf {X}}}_{(3)}{\mathbf {a}}_{(3)}+\ldots \), where \(\widetilde{{\mathbf {X}}}_{(i)}{\mathbf {a}}_{(i)}\) for \(i=1,2,\ldots \) are computed on different processors and then summed to obtain \(\widetilde{{\mathbf {X}}}{\mathbf {a}}\).

Note that the same strategy can also be used to calculate \({\mathbf {X}}^{T}{\mathbf {y}}\) by partitioning \({\mathbf {X}}\) (and \({\mathbf {y}}\)) by rows.
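The two steps above can be sketched as follows. This is a thread-based illustration only (NumPy releases the GIL inside `@`, so threads can overlap); a real distributed implementation would use the facilities of the chosen library and hardware:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_matvec(X_tilde, a, n_blocks=4):
    """Compute X~ a by splitting the columns of X~ (and the entries
    of a) into blocks, evaluating each partial product concurrently,
    then summing the partial results."""
    blocks = np.array_split(np.arange(a.size), n_blocks)
    with ThreadPoolExecutor(max_workers=n_blocks) as ex:
        parts = list(ex.map(lambda idx: X_tilde[:, idx] @ a[idx], blocks))
    return sum(parts)
```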

#### Parallel computing of \({\mathbf {X}}^{T}{\mathbf {X}}\)

In (3), computation of \({\mathbf {X}}^{T}{\mathbf {X}}\) is needed. Here we describe how parallel computing can be used to compute \({\mathbf {X}}^{T}{\mathbf {X}}\), where \({\mathbf {X}}\) is an \(n\times p\) matrix.

- 1
Partition \({\mathbf {X}}\) of size \(n\times p\) by rows into smaller submatrices \({\mathbf {X}}_{(1)},{\mathbf {X}}_{(2)},{\mathbf {X}}_{(3)},\ldots \) of size \(n_{i}\times p\), with \(\sum n_{i}=n\).

- 2
Compute \({\mathbf {X}}^{T}{\mathbf {X}}={\mathbf {X}}_{(1)}^{T}{\mathbf {X}}_{(1)} + {\mathbf {X}}_{(2)}^{T}{\mathbf {X}}_{(2)}+\ldots \), where \({\mathbf {X}}_{(j)}^{T}{\mathbf {X}}_{(j)}\) for \(j=1,2,\ldots \) are computed on different processors and then summed to obtain \({\mathbf {X}}^{T}{\mathbf {X}}\).

Note that once \({\mathbf {X}}^{T}{\mathbf {X}}\) has been built, it can be updated as \({\mathbf {X}}^{T}{\mathbf {X}}+{\mathbf {X}}_{new}^{T}{\mathbf {X}}_{new}\) when new observations \({\mathbf {X}}_{new}\) of size \(n_{new}\times p\) become available, for which the computational complexity is \(O(p^2n_{new})\). The computing time is trivial because \(n_{new}\) is usually small. In addition to reducing computing time, this approach can also address the limitation that \({\mathbf {X}}\) may be too large to be stored on a single computing node (\(n\gg p\)), by distributing the \({\mathbf {X}}_{(i)}\) across several nodes.
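The row-block computation and the incremental update can be sketched as follows (illustrative names; in practice each block's partial Gram matrix would be formed on a different processor or node):

```python
import numpy as np

def blockwise_gram(X, n_blocks=3):
    """X' X accumulated from row blocks X_(1), X_(2), ...; each partial
    product X_(i)' X_(i) is independent of the others, so the blocks can
    be processed in parallel and their results summed."""
    return sum(Xb.T @ Xb for Xb in np.array_split(X, n_blocks, axis=0))

def update_gram(XtX, X_new):
    """Incorporate n_new new observations at O(p^2 * n_new) cost,
    without rebuilding X' X from scratch."""
    return XtX + X_new.T @ X_new
```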

### A numerical example of ODA

A small numerical example is provided here to illustrate orthogonal data augmentation (ODA). Assuming there are only three individuals and five markers, the centered \(3 \times 5\) marker covariate matrix is:

$$\begin{aligned} {\mathbf {X}}=\begin{bmatrix} -0.33 & -0.33 & 0.67 & -0.33 & -1.33\\ 0.67 & -0.33 & 0.67 & 0.67 & 0.67\\ -0.33 & 0.67 & -1.33 & -0.33 & 0.67 \end{bmatrix}. \end{aligned}$$

Then, \({\mathbf {W}}_{o}\) is

$$\begin{aligned} {\mathbf {W}}_{o}=\begin{bmatrix}{\mathbf {1}}&{\mathbf {X}}\end{bmatrix}= \begin{bmatrix} 1 & -0.33 & -0.33 & 0.67 & -0.33 & -1.33\\ 1 & 0.67 & -0.33 & 0.67 & 0.67 & 0.67\\ 1 & -0.33 & 0.67 & -1.33 & -0.33 & 0.67 \end{bmatrix}. \end{aligned}$$

The scalar *d* is set to the largest eigenvalue of \({\mathbf {W}}_{o}^{T}{\mathbf {W}}_{o}\), which is about 4.55 in this example.

In (3), \({\mathbf {W}}_{a}^{T}{\mathbf {W}}_{a}\) can be calculated as:

$$\begin{aligned} {\mathbf {W}}_{a}^{T}{\mathbf {W}}_{a}&={\mathbf {D}}-{\mathbf {W}}_{o}^{T}{\mathbf {W}}_{o} \\ & ={\mathbf {I}}d-{\mathbf {W}}_{o}^{T}{\mathbf {W}}_{o} \\ & =\begin{bmatrix} 1.55 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 0.0 & 3.88 & 0.33 & -0.67 & -0.67 & -0.67\\ 0.0 & 0.33 & 3.88 & 1.33 & 0.33 & -0.67\\ 0.0 & -0.67 & 1.33 & 1.88 & -0.67 & 1.33\\ 0.0 & -0.67 & 0.33 & -0.67 & 3.88 & -0.67\\ 0.0 & -0.67 & -0.67 & 1.33 & -0.67 & 1.88 \end{bmatrix}. \end{aligned}$$

In this example, \({\mathbf {W}}_{a}\) is obtained from the Cholesky decomposition of \({\mathbf {W}}_{a}^{T}{\mathbf {W}}_{a}\), which gives

$$\begin{aligned} {\mathbf {W}}_{a} = \begin{bmatrix} 1.24 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 0.0 & 1.97 & 0.17 & -0.34 & -0.34 & -0.34\\ 0.0 & 0.0 & 1.96 & 0.71 & 0.2 & -0.31\\ 0.0 & 0.0 & 0.0 & 1.13 & -0.82 & 1.28\\ 0.0 & 0.0 & 0.0 & 0.0 & 1.75 & 0.19\\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.05 \end{bmatrix}. \end{aligned}$$

\(\widetilde{{\mathbf {J}}}\) is the first column of \({\mathbf {W}}_{a}\), and \(\widetilde{{\mathbf {X}}}\) consists of the remaining columns of \({\mathbf {W}}_{a}\), since \({\mathbf {W}}_{a}=\begin{bmatrix}\widetilde{{\mathbf {J}}}&\widetilde{{\mathbf {X}}}\end{bmatrix}\).
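The example can be reproduced numerically. Two details below are our assumptions rather than part of the text: the two-decimal entries are written as exact thirds, and a tiny jitter is added before the Cholesky factorization, because \({\mathbf {D}}-{\mathbf {W}}_{o}^{T}{\mathbf {W}}_{o}\) is exactly singular when *d* equals the largest eigenvalue:

```python
import numpy as np

# centered marker covariates, with the rounded entries as exact thirds
X = np.array([[-1, -1,  2, -1, -4],
              [ 2, -1,  2,  2,  2],
              [-1,  2, -4, -1,  2]]) / 3.0
Wo = np.hstack([np.ones((3, 1)), X])

d = np.linalg.eigvalsh(Wo.T @ Wo)[-1]     # largest eigenvalue, about 4.55
M = d * np.eye(6) - Wo.T @ Wo             # = W_a' W_a, positive semidefinite

# M is singular (d is itself an eigenvalue of Wo' Wo), so a tiny jitter
# keeps the Cholesky factorization numerically safe
Wa = np.linalg.cholesky(M + 1e-9 * np.eye(6)).T   # upper triangular
J_tilde, X_tilde = Wa[:, 0], Wa[:, 1:]

# columns of the augmented matrix [Wo; Wa] are orthogonal, squared norm d
W_full = np.vstack([Wo, Wa])
```

The orthogonality of `W_full` is exactly the property that makes the full conditionals of the marker effects independent in the BayesXII algorithm.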

### Prediction accuracy of the BayesXII algorithm

Prediction accuracies of the BayesXII algorithm for BayesC\(\pi \) were obtained from five chains of length 100,000. In Fig. 3, the blue solid line is the average of the prediction accuracies from those five chains, and the black dashed line is the prediction accuracy obtained from the converged conventional sampler. In summary, the BayesXII algorithm provided about the same prediction accuracy as the conventional sampler for BayesC\(\pi \) once the chain reached a length of about 50,500.