On a multivariate implementation of the Gibbs sampler

Il est bien etabli que, lorsque les parametres d'un modele sont correles, l'estimation de ces parametres par echantillonnage de Gibbs converge lentement lorsque les composantes du modele sont traitees separement. Mais, si l'echantillonnage de Gibbs est conduit en fixant des valeurs pour les parametres correles et en echantillonnant dans les distributions conditionnelles respectives, la convergence est plus rapide et les variances de Monte-Carlo des caracteristiques des distributions sont diminuees pour une chaine de longueur donnee. Cet echantillonnage multidimensionnel et non plus scalaire requiert souvent l'inversion de matrices de grande taille. Cette note presente une methode d'echantillonnage en bloc de ce type qui evite le passage par ces inverses. L'algorithme s'applique dans le contexte d'un modele gaussien et est illustre numeriquement sur un petit echantillon de donnees simulees.


INTRODUCTION
The Gibbs sampler is a numerical technique that has received considerable attention in statistics and animal breeding. It is particularly useful in the solution of high dimensional integrations and has therefore been applied in likelihood and Bayesian inference problems in a wide variety of models.
The Gibbs sampler produces realizations from a joint posterior distribution by sampling repeatedly from the full conditional posterior distributions of the parameters of the model. In theory, drawing from the joint posterior density takes place only in the limit, as the number of drawings becomes infinite. The study of convergence to the appropriate distributions is still an active area of research, but it is well established that convergence can be slow when highly correlated scalar components are treated individually (Smith and Roberts, 1993). In such cases it is preferable to block the scalars and to perform sampling from multivariate conditional distributions. Liu et al (1994) show that a block sampling strategy can lead to considerably smaller Monte-Carlo variances of estimates of features of posterior distributions.
In animal breeding applications this block sampling strategy can be difficult to implement because repeated inversions and factorizations of very large matrices are required in order to perform the multivariate sampling. The purpose of this note is to describe a block sampling computer strategy which does not require knowledge of the inverse of these matrices. The results of applying the method are illustrated using a small simulated data set.

THE MODEL
Let y, a and b represent vectors of data (order n), of additive genetic values (order q) and of fixed effects (order p), respectively, and let X and Z be design matrices associating the data with the additive genetic values and fixed effects, respectively. We will assume that the data are conditionally normally distributed, that is: where Q e is the residual variance. Invoking an infinitesimal additive genetic model, the distribution of additive genetic values is also normal: where A is the known numerator relationship matrix and o,2 is the additive genetic variance. For illustration purposes we will assume that the vector of fixed effects b, and of the variance components Q a and Q e are all a priori independent and that they follow proper uniform distributions. Under the model, the joint posterior distribution of the parameters is: The scalar implementation of the Gibbs sampler consists of deriving from [3] the full conditional posterior distributions of the scalar parameters pertaining to the model (eg, Wang et al, 1994). where 9 = E(6!, af, y) satisfies the linear system [4], and C-l (j; = Var(0 )a£ , !e!y)! In [6], S a = a'A-1 a; v a = q -2; S e = (y-Xb-Za)'(y-Xb-Za); ! = n -2; and X ;;, 2 is an inverse chi square variate with Vi degrees of freedom. In order to sample the whole vector 0 simultaneously from [5] without involving the inverse of the coefficient matrix C, we draw from ideas in Garcia-Cortes et al (1992,1995).
Given starting values for Q a, Q e, apply the method of composition (eg, Tanner, 1993) to solve the following equation: To obtain samples from p(y!,o!), draw 0 * from p (e l(j!(j!) and y * from p(y!e*,!,er!). Then (y * , 0 * ) is a drawing from p (y, 0)a£, af) and (y * ) from p (YI!a! !e!' Further, the pair (y * , 0 * ) can also be viewed as a realized value from p ( 0 ) a£ , af , y * ) and from p( YI 0 * , Q a, Q e J' By analogy with the identity: we define the random variable: where the random variable 0 in [9] has density p(o 10,2, a U2, e y*). The expected value of 0 in [9] with respect to p(o 10, 2 , U2 , y * ) is equal to E ( 0 ) a£ , 0 , 2 , y) and the variance is C-l (j;, independent_of y * . In addition, the random variable 0 in [9] is normally distributed; therefore 0 in [9] and 0 in [8] have the same density p!0!!a,ae,Y!, which is the conditional posterior distribution of interest. Using [4] and !5!, and replacing the random variable 0 in [9] by its realized values 0 * , it follows that !* drawings from this conditional posterior distribution, which we denote A , can be constructed by solving the linear system: A wide variety of efficient algorithms are available which do not require C-1 to solve the mixed-model equations (10!. We note in passing that under the assumption of either proper or improper uniform priors for b, a simple manipulation of [10] shows that this expression is not a function of b * , where 0*! _ (a*!b*!). Therefore the drawing of 0 * from p (0 )a £ af) involves a * only, and b * can be set arbitrarily to zero. !* With 0 available, draw from: The samples Q2 are now used in the new round of iteration as input in (7!.

AN EXAMPLE
As an illustration, the block (multivariate) sampling strategy is compared with the traditional scalar implementation of the Gibbs sampler. A data file based on a univariate animal model with 250 individuals (all with one record, 25 individuals per generation, ten generations) and one fixed-effect factor with ten levels was simulated. For both strategies, a single chain of length 31 000 was run, and the first 1000 samples were discarded. Table I shows estimates of the Monte-Carlo variance of the mean of the marginal posterior distributions of the first level of the fixed-effect factor, of the additive genetic value of the last individual and of both variance components. The mean was estimated by summing the samples and dividing by the number of samples (30 000) and the Monte-Carlo variance was computed using the Markov chain estimator (Geyer, 1992). Also shown in table I are estimates of the chains effective length. Briefly, the effective chain length is associated with the amount of information available in a given chain. This parameter becomes smaller as the dependence between the samples of the chain increases. When the samples are in fact independent, the effective and actual chain lengths are equal. Details can be found in Sorensen et al (1995). The figures in table I show that for this model and data structure, there is a reduction in the Monte-Carlo variance by a factor of ten using the block sampling strategy in the case of the fixed effects and breeding values, and a twofold reduction in the case of the variance components. In the above example, the reduction of the Monte-Carlo variance using either the scalar or block sampling strategies was compared on the basis of the simple estimator of the mean of marginal posterior distributions (the raw average of the elements of the Gibbs chain). A more efficient estimator is based on the average of conditional densities Liu et al, 1994). Liu et al (1994) refer to this as the mixture estimator. For example, let X, Y and Z denote three parameters and assume that interest focuses on the estimate of the mean of the marginal posterior distribution of X.
The mixture estimator is given by where the summation over i is over the n elements of the Gibbs chain, and the summation over j is over the chosen values of X. Alternatively, when the integral has a closed-form solution, the mixture estimator can take the form The Monte-Carlo variances of the mixture estimator of the mean of the marginal posterior distributions of the first level of the fixed effect factor, of the additive genetic value of the last individual and of both variance components were respectively 5.11 x 10-3 , 7.62 x 10-3 , 2.59 and 1.03 for the scalar sampling strategy and 0.13 x 10-3 , 0.56 x 10-3 , 1.30 and 0.43 for the block sampling strategy. There is a small increase in efficiency relative to the 'raw means estimator' in the case of the location parameters but not in the case of the variance components. The mixture estimator is especially beneficial when the chain mixes quickly (Liu et al, 1994) and this is not the case in animal models.

CONCLUSIONS
We have presented an algorithm which allows us to implement the Gibbs sampler in a multivariate fashion. The development is in terms of the single trait Gaussian model, but extension to a multiple trait analysis with arbitrary pattern of missing data is straightforward, provided the procedure is used in conjunction with data augmentation. The benefit of block sampling relative to scalar sampling in terms of CPU time was not investigated, since the results will be dependent on the model and data structure. The important feature of the strategy is that it only involves the solution of a linear system. This means that either computer time or storage requirements can be optimized by choice of the appropriate method to solve the linear system. This is in contrast with the scalar Gibbs sampler which has computer requirements analogous to Gauss-Seidel iterative methods.