Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs

Summary - Gibbs sampling is a Monte-Carlo procedure for generating random samples from joint distributions through sampling from and updating conditional distributions. Inferences about unknown parameters are made by: 1) computing summary statistics directly from the samples; or 2) estimating the marginal density of an unknown, and then obtaining summary statistics from the density. All conditional distributions needed to implement Gibbs sampling in a univariate Gaussian mixed linear model are presented in scalar algebra, so no matrix inversion is needed in the computations. For location parameters, all conditional distributions are univariate normal, whereas those for variance components are scaled inverted chi-squares. The procedure was applied to solve a Gaussian animal model for litter size in the Gamito strain of Iberian pigs. Data were 1 213 records from 426 dams. The model had farrowing season (72 levels) and parity (4) as fixed effects; breeding values (597), permanent environmental effects (426) and residuals were random. In CASE I, variances were assumed known, with REML (restricted maximum likelihood) estimates used as true parameter values. Here, means and variances of the posterior distributions of all effects were obtained, by inversion, from the mixed model equations. These exact solutions were used to check the Monte-Carlo estimates given by Gibbs, using 120 000 samples. Linear regression slopes of true posterior means on Gibbs means were almost exactly 1 for fixed, additive genetic and permanent environmental effects. Regression slopes of true posterior variances on Gibbs variances were 1.00, 1.01 and 0.96, respectively. In CASE II, variances were treated as unknown, with a flat prior assigned to these. Posterior densities of selected location parameters, variance components, heritability and repeatability were estimated. Marginal posterior distributions of dispersion parameters were skewed, save that of the residual variance; the means, modes and medians of these distributions differed from the REML estimates, as expected from theory. The conclusions are: 1) the Gibbs sampler converged to the true posterior distributions, as suggested by CASE I; 2) it provides a richer description of uncertainty about genetic parameters than approximate analyses based on point estimates of the variance components.

INTRODUCTION

Prediction of merit or, equivalently, derivation of a criterion for selection is an important theme in animal breeding. Cochran (1951), under certain assumptions, showed that the selection criterion that maximizes the expected merit of the selected animals is the mean of the conditional distribution of merit given the data. The conditional mean is known as the best predictor, or BP (Henderson, 1973), because it minimizes the mean square error of prediction among all predictors. Computing BP requires knowing the joint distribution of predictands and data, a requirement that can seldom be met in practice. To simplify, attention may be restricted to linear predictors. Henderson (1963, 1973) and Henderson et al (1959) developed best linear unbiased prediction (BLUP), which removed the requirement of knowing the first moments of the distributions. BLUP is the linear function of the data that minimizes mean square error of prediction in the class of linear unbiased predictors. Bulmer (1980), Gianola and Goffinet (1982) and Goffinet (1983) showed that under multivariate normality, BLUP is the conditional mean of merit given a set of linearly independent error contrasts. This holds if the second moments of the joint distribution of the data and of the predictand are known. However, second moments are rarely known in practice, and must be estimated from the data at hand. If estimated dispersion parameters are used in lieu of the true values, the resulting predictors of merit are no longer BLUP.
In animal breeding, dispersion components are most often estimated using restricted maximum likelihood, or REML (Patterson and Thompson, 1971). Theoretical arguments (eg, Gianola et al, 1989; Im et al, 1989) and simulations (eg, Rothschild et al, 1979) suggest that likelihood-based methods can account for some forms of nonrandom selection, which makes the procedure appealing in animal breeding. Thus, 2-stage predictors are constructed by first estimating variance and covariance components, and then obtaining BLUE and BLUP of fixed and random effects, respectively, with parameter values replaced by likelihood-based estimates. Under random selection, this 2-stage procedure should converge in probability to BLUE and BLUP as the information in the sample about variance components increases; however, its frequentist properties under nonrandom selection are unknown. One deficiency of this BLUE and BLUP procedure is that errors of estimation of dispersion components are not taken into account when predicting breeding values.

Gianola et al (1990a, b, 1992) advocate the use of Bayesian methods in animal breeding. The associated probability theory dictates that inferences should be based on marginal posterior distributions of the parameters of interest, such that uncertainty about the remaining parameters is fully taken into account. The starting point is the joint posterior density of all unknowns. From the joint distribution, the marginal posterior distribution of a parameter, say the breeding value of an animal, is obtained by successively integrating out all nuisance parameters, these being the fixed effects, all the random effects other than the one of interest, and the variance and covariance components. This integration is difficult by analytical or numerical means, so approximations are usually sought (eg, Gianola et al, 1990a, b). The posterior distributions are so complex that an analytical approach is often impossible, so attention has concentrated on numerical procedures (eg, Cantet et al, 1992). Recent breakthroughs are related to Monte-Carlo Markov chain procedures for multidimensional integration and for sampling from joint distributions (Geman and Geman, 1984). One of these procedures, Gibbs sampling, has been studied extensively in statistics (Besag and Clifford, 1991; Gelfand and Carlin, 1991; Geyer and Thompson, 1992). Wang et al (1993) described the Gibbs sampler for a univariate mixed linear model in an animal breeding context. They used simulated data to construct marginal densities of variance components, variance ratios and intraclass correlations, and noted that the marginal distributions of fixed and random effects could also be obtained.
However, their implementation was in matrix form. Clearly, some matrix computations are not feasible in many animal breeding data sets because inversion of large matrices is needed repeatedly.
In this paper, we consider Bayesian marginal inferences about fixed and random effects, variance components and functions of variance components in a univariate Gaussian mixed linear model. In contrast to Wang et al (1993), marginal inferences are obtained through a scalar version of the Gibbs sampler, so inversion of matrices is not needed. Our implementation was applied to, and validated with, a data set on litter size of Iberian pigs.

Model
We consider a univariate mixed linear model with several independent random factors, as in Henderson (1984), Macedo and Gianola (1987) and Gianola et al (1990a, b):

y = X\beta + \sum_{i=1}^{c} Z_i u_i + e

where: y is the data vector of order n \times 1; X is a known incidence matrix of order n \times p; Z_i is a known matrix of order n \times q_i; \beta is a p \times 1 vector of uniquely defined 'fixed effects' (so that X has full column rank); u_i is a q_i \times 1 random vector; and e is an n \times 1 vector of random residuals.
The conditional distribution that generates the data is:

y \mid \beta, u_1, \ldots, u_c, \sigma_e^2 \sim N\!\left(X\beta + \sum_{i=1}^{c} Z_i u_i,\; I\sigma_e^2\right)

where I is an n \times n identity matrix and \sigma_e^2 is the variance of the random residuals.
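To fix ideas, this generative distribution can be mimicked in a few lines of code. The following is a minimal sketch only: the dimensions, incidence structures and parameter values are illustrative assumptions (a single random factor with G = I), not the data analyzed later in this paper.

```python
# Minimal sketch of data generated under the model; all values are assumed.
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 30, 2, 5                                  # records, fixed effects, random levels
X = np.column_stack([np.ones(n), rng.integers(0, 2, n)])  # incidence for beta
Z = np.eye(q)[rng.integers(0, q, n)]                # incidence relating records to levels
beta = np.array([10.0, 0.5])                        # 'fixed' effects
sigma2_u, sigma2_e = 0.2, 1.0                       # variance components (assumed)
u = rng.normal(0.0, np.sqrt(sigma2_u), q)           # u ~ N(0, I * sigma2_u); G = I here
e = rng.normal(0.0, np.sqrt(sigma2_e), n)           # residuals
y = X @ beta + Z @ u + e                            # one realization of the data vector
```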

Prior Distributions
An integral part of Bayesian analysis is the assignment of prior distributions to all unknowns in the model; here, these are \beta, the u_i (i = 1, 2, \ldots, c) and the c + 1 variance components (one for each of the random vectors, plus the error). Usually, a flat or uniform prior distribution is assigned to \beta, so as to represent lack of prior knowledge about this vector:

p(\beta) \propto \text{constant}

Further, it is assumed that:

u_i \mid \sigma_{u_i}^2 \sim N(0, G_i \sigma_{u_i}^2), \quad i = 1, 2, \ldots, c

where G_i is a known matrix and \sigma_{u_i}^2 is the variance of the prior distribution of u_i. In a genetic context, the G_i matrices can contain functions of known coefficients of coancestry. All u_i are assumed to be mutually independent a priori, as well as independent of \beta. Note that the priors for u_i correspond to the assumptions made about these random vectors in the classical linear model. Independent scaled inverted chi-square distributions are used as priors for the variance components, so that:

p(\sigma_e^2 \mid v_e, s_e^2) \propto (\sigma_e^2)^{-(v_e/2 + 1)} \exp\left(-\frac{v_e s_e^2}{2\sigma_e^2}\right)

and

p(\sigma_{u_i}^2 \mid v_{u_i}, s_{u_i}^2) \propto (\sigma_{u_i}^2)^{-(v_{u_i}/2 + 1)} \exp\left(-\frac{v_{u_i} s_{u_i}^2}{2\sigma_{u_i}^2}\right), \quad i = 1, 2, \ldots, c

Above, v_e (v_{u_i}) is a 'degree of belief' parameter, and s_e^2 (s_{u_i}^2) can be thought of as a prior value of the appropriate variance.

Joint posterior density

Let \theta' = (\beta', u_1', \ldots, u_c') be the vector of the N = p + \sum_{i=1}^{c} q_i location parameters, and let \theta_{-i} be \theta without its ith element \theta_i. Further, let

v' = (\sigma_{u_1}^2, \ldots, \sigma_{u_c}^2)

be the vector of variance components other than the residual, and let s and v be the sets of all prior variances and degrees of belief, respectively. As shown, for example, by Macedo and Gianola (1987) and Gianola et al (1990a, b), the joint posterior density is in the normal-gamma form:

p(\theta, v, \sigma_e^2 \mid y, s, v) \propto (\sigma_e^2)^{-(n + v_e)/2 - 1} \exp\left\{-\frac{(y - X\beta - \sum_i Z_i u_i)'(y - X\beta - \sum_i Z_i u_i) + v_e s_e^2}{2\sigma_e^2}\right\} \prod_{i=1}^{c} (\sigma_{u_i}^2)^{-(q_i + v_{u_i})/2 - 1} \exp\left\{-\frac{u_i' G_i^{-1} u_i + v_{u_i} s_{u_i}^2}{2\sigma_{u_i}^2}\right\}   [7]

Inferences about each of the unknowns (\theta, v, \sigma_e^2) are based on their respective marginal densities. Conceptually, each of the marginal densities is obtained by successive integration of the joint density [7] with respect to the parameters other than the one of interest. For example, the marginal density of \sigma_e^2 is:

p(\sigma_e^2 \mid y, s, v) = \int p(\theta, v, \sigma_e^2 \mid y, s, v) \, d\theta \, dv

It is difficult to carry out the needed integration analytically. Gibbs sampling is a Monte-Carlo procedure for overcoming such difficulties.

Fully conditional posterior densities (Gibbs sampler)

The fully conditional posterior densities of all unknowns are needed for implementing Gibbs sampling. Each of the full conditional densities can be obtained by regarding all other parameters in [7] as known. Let W = \{w_{ij}\}, i, j = 1, 2, \ldots, N, and b = \{b_i\}, i = 1, 2, \ldots, N, be the coefficient matrix and the right-hand side of the mixed model equations, respectively. As proved in the Appendix, the conditional posterior distribution of each of the location parameters in \theta is normal, with mean \hat{\theta}_i and variance \hat{v}_i:

\theta_i \mid y, \theta_{-i}, v, \sigma_e^2 \sim N(\hat{\theta}_i, \hat{v}_i), \qquad \hat{\theta}_i = \frac{b_i - \sum_{j \neq i} w_{ij}\theta_j}{w_{ii}}, \qquad \hat{v}_i = \frac{\sigma_e^2}{w_{ii}}   [8]

Hence, all computations needed to implement Gibbs sampling are scalar, without any required inversion of matrices. This is in contrast with the matrix version of the conditional posterior distributions for the location parameters given by Wang et al (1993). It should be noted that distributions [8] do not depend on s and v, because v is known in [8].

The conditional posterior density of \sigma_e^2 is in the scaled inverted chi-square form:

\sigma_e^2 \mid y, \theta, v, s, v \sim \tilde{v}_e \tilde{s}_e^2 \chi_{\tilde{v}_e}^{-2}   [9]

with parameters

\tilde{v}_e = n + v_e, \qquad \tilde{s}_e^2 = \frac{(y - X\beta - \sum_i Z_i u_i)'(y - X\beta - \sum_i Z_i u_i) + v_e s_e^2}{\tilde{v}_e}

Each conditional posterior density of \sigma_{u_i}^2 is also in the scaled inverted chi-square form:

\sigma_{u_i}^2 \mid y, \theta, v_{-i}, \sigma_e^2, s, v \sim \tilde{v}_{u_i} \tilde{s}_{u_i}^2 \chi_{\tilde{v}_{u_i}}^{-2}   [10]

with parameters \tilde{v}_{u_i} = q_i + v_{u_i} and \tilde{s}_{u_i}^2 = (u_i' G_i^{-1} u_i + v_{u_i} s_{u_i}^2)/\tilde{v}_{u_i}. The set of N + c + 1 conditional posterior distributions [8]-[10] is called the Gibbs sampler for our problem.
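To make the scalar computations concrete, here is a minimal sketch of one complete pass through the Gibbs sampler [8]-[10] for a toy version of the model, with a single random factor and G = I (so G^{-1} = I). All data, dimensions and hyper-parameter values are assumptions for illustration; this is not the implementation used in the study.

```python
# Toy scalar Gibbs scan; all values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)
n, p, q = 40, 2, 6                       # records, fixed effects, random levels
X = np.column_stack([np.ones(n), rng.integers(0, 2, n)]).astype(float)
Z = np.eye(q)[rng.integers(0, q, n)]     # incidence of the random factor
y = X @ np.array([10.0, 0.5]) + Z @ rng.normal(0.0, 0.5, q) + rng.normal(0.0, 1.0, n)

ve, se2, vu, su2 = 4.0, 1.0, 4.0, 0.25   # degrees of belief and prior scales
theta = np.zeros(p + q)                  # (beta', u')'
sigma2_e, sigma2_u = 1.0, 0.25           # starting values

def scan(theta, sigma2_e, sigma2_u):
    """One pass through the conditionals [8], [9] and [10]."""
    # Rebuild the mixed model equations W theta = b at the current variance ratio.
    T = np.column_stack([X, Z])
    W = T.T @ T
    W[p:, p:] += (sigma2_e / sigma2_u) * np.eye(q)   # G^{-1} = I in this toy case
    b = T.T @ y
    # [8]: theta_i | else ~ N((b_i - sum_{j != i} w_ij theta_j)/w_ii, sigma2_e/w_ii)
    for i in range(p + q):
        mean_i = (b[i] - W[i] @ theta + W[i, i] * theta[i]) / W[i, i]
        theta[i] = rng.normal(mean_i, np.sqrt(sigma2_e / W[i, i]))
    # [9]: residual variance from its scaled inverted chi-square conditional.
    e = y - T @ theta
    sigma2_e = (e @ e + ve * se2) / rng.chisquare(n + ve)
    # [10]: variance of u from its scaled inverted chi-square conditional.
    u = theta[p:]
    sigma2_u = (u @ u + vu * su2) / rng.chisquare(q + vu)
    return theta, sigma2_e, sigma2_u
```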

FULL CONDITIONAL POSTERIOR DENSITIES UNDER SPECIAL PRIORS
The Gibbs sampler with 'naive' priors for all variance components

The Gibbs sampler [8]-[10] given above is based on scaled inverted chi-squares used as priors for the variance components. These priors are proper and, therefore, informative about the variances. A possible way of representing prior ignorance about variances would be to set the degree of belief parameters of the prior distributions of all the variance components to zero, ie, v_e = v_{u_i} = 0 for all i. These priors have been used, inter alia, by Gianola et al (1990a, b). In this case, the conditional posterior distributions of the location parameters are as in [8]:

\theta_i \mid y, \theta_{-i}, v, \sigma_e^2 \sim N(\hat{\theta}_i, \hat{v}_i)   [11]

because the distributions do not depend on s and v. The conditional posterior distributions of the variance components, however, no longer depend on the hyper-parameters s and v. The conditional posterior distribution of the residual variance remains in the scaled inverted chi-square form:

\sigma_e^2 \mid y, \theta, v \sim \tilde{v}_e \tilde{s}_e^2 \chi_{\tilde{v}_e}^{-2}   [12]

but now with parameters

\tilde{v}_e = n, \qquad \tilde{s}_e^2 = \frac{(y - X\beta - \sum_i Z_i u_i)'(y - X\beta - \sum_i Z_i u_i)}{\tilde{v}_e}

Each conditional posterior density of \sigma_{u_i}^2 is again in the scaled inverted chi-square form:

\sigma_{u_i}^2 \mid y, \theta, v_{-i}, \sigma_e^2 \sim \tilde{v}_{u_i} \tilde{s}_{u_i}^2 \chi_{\tilde{v}_{u_i}}^{-2}   [13]

with parameters \tilde{v}_{u_i} = q_i and \tilde{s}_{u_i}^2 = u_i' G_i^{-1} u_i / \tilde{v}_{u_i}.
It has been noted recently (Raftery and Banfield, 1991) that under these priors the joint posterior density [7] is improper, because it does not integrate to 1. In the light of this, we do not recommend these 'naive' priors for variance component models.
The Gibbs sampler when all variance components are known

When variances are assumed known, the only conditional distributions needed are those for the location parameters, and these are as in [8] or [11].

INFERENCES ABOUT THE MARGINAL DISTRIBUTIONS THROUGH GIBBS SAMPLING
Gibbs sampling: an overview

The Gibbs sampler was first used in spatial statistics and was presented formally by Geman and Geman (1984) in an image restoration context. Applications to Bayesian inference were described by Gelfand and Smith (1990).
Since then, it has received extensive attention, as evidenced by recent discussion papers (Gelman and Rubin, 1992; Geyer, 1992; Besag and Green, 1993; Gilks et al, 1993; Smith and Roberts, 1993). Its power and usefulness as a general statistical tool for generating samples from complex distributions is unquestioned.
Our purpose is to generate random samples from the joint posterior distribution [7] through successively drawing samples from and updating the conditional distributions [8]-[10] of the Gibbs sampler. Formally, Gibbs sampling works as follows:
(i) set arbitrary initial values for \theta, v and \sigma_e^2;
(ii) sample each \theta_i (i = 1, 2, \ldots, N) from [8], updating \theta as sampling proceeds;
(iii) sample \sigma_e^2 from [9];
(iv) sample each \sigma_{u_i}^2 (i = 1, 2, \ldots, c) from [10];
(v) repeat (ii)-(iv) k (length of the chain) times.
As k \to \infty, this creates a Markov chain with an equilibrium distribution that has [7] as its density. We shall call this procedure a single long-chain algorithm.
In practice, there are at least 2 ways of running the Gibbs sampler: a single long chain and multiple short chains. The multiple short-chain algorithm repeats steps (i)-(v) m times and saves only the kth iterate of each chain as a sample. Based on theoretical arguments (Geyer, 1992) and on our experience, we used the single long-chain method in the present study.
Initial iterations are usually not stored as samples, on the grounds that the chain may not yet have reached the equilibrium distribution; this period is called 'warm-up'. After the warm-up, samples are stored every d iterations, where d is a small positive integer. Let the total number of samples saved be m, the sample size.
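Continuing the toy sampler sketched earlier, the single long-chain scheme with warm-up and thinning can be written as below. The warm-up of 5 000 iterations and d = 10 mirror the specification used later in this study, while the number of saved samples is kept small for illustration.

```python
# Driver for the toy scan() sketched above; settings are illustrative.
warmup, d, m = 5_000, 10, 12_000            # warm-up, thinning interval, samples
samples = np.empty((m, p + q + 2))          # each row: theta', sigma2_u, sigma2_e
state = (theta, sigma2_e, sigma2_u)
for it in range(warmup + d * m):
    state = scan(*state)
    if it >= warmup and (it - warmup) % d == d - 1:   # keep every dth iterate
        th, s2e, s2u = state
        samples[(it - warmup) // d] = np.concatenate([th, [s2u, s2e]])
```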
If the Gibbs sampler converges to the equilibrium distribution, the m samples are random drawings from the joint posterior distribution with density [7]. The ith sample

(\theta_i', v_i', \sigma_{e,i}^2), \quad i = 1, 2, \ldots, m   [14]

is then an N + c + 1 vector, and each of the elements of this vector is a drawing from the appropriate marginal distribution. Note that the m samples in [14] are identically but not independently distributed (eg, Geyer, 1992). Let x_i (i = 1, 2, \ldots, m) be the samples of one element x of [14], with marginal distribution P(x). Any feature

u = E[g(x)]   [15]

of P(x), such as its mean or variance, can be approximated by

\hat{u} = \frac{1}{m}\sum_{i=1}^{m} g(x_i)   [16]

where g(x) is the function defining the feature. As m \to \infty, \hat{u} converges almost surely to u (Geyer, 1992). Another way to compute features of P(x) is by first estimating the density p(x), and then obtaining summary statistics from the estimated density using 1-dimensional numerical procedures. If y_i (i = 1, 2, \ldots, m) are the samples of another component of [14], an estimator of p(x) is given by the average of the m conditional densities p(x \mid y_i):

\hat{p}(x) = \frac{1}{m}\sum_{i=1}^{m} p(x \mid y_i)   [17]

Note that this estimator does not use the samples x_i, i = 1, 2, \ldots, m; instead, it uses the samples of the variable y through the conditional density p(x \mid y). This procedure, though developed primarily for identically and independently distributed (iid) data, can also be used for dependent samples, as noted by Liu et al (1991) and Diebolt and Robert (1993).
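For a location parameter, whose full conditionals [8] are normal, estimator [17] averages the m saved conditional normal densities. A minimal sketch (names assumed):

```python
# Estimator [17] for a location parameter with normal conditionals [8].
import numpy as np

def density_by_averaging_conditionals(grid, cond_means, cond_vars):
    """p_hat(x) = (1/m) * sum_i N(x | cond_means[i], cond_vars[i])."""
    g = np.asarray(grid)[:, None]                     # evaluation points
    cond_means = np.asarray(cond_means)
    cond_vars = np.asarray(cond_vars)
    dens = np.exp(-0.5 * (g - cond_means) ** 2 / cond_vars)
    dens /= np.sqrt(2.0 * np.pi * cond_vars)
    return dens.mean(axis=1)                          # average over the m samples
```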
An alternative form of estimating p(x) is to use only the samples x_i (i = 1, 2, \ldots, m). For example, a kernel density estimator is defined (Silverman, 1986) as:

\hat{p}(z) = \frac{1}{mh}\sum_{i=1}^{m} K\!\left(\frac{z - x_i}{h}\right)   [18]

where \hat{p}(z) is the estimated density at point z, K(\cdot) is a 'kernel function', and h is a fixed constant called the window width; the latter determines the smoothness of the estimated curve. For example, if a normal density is chosen as the kernel function, then [18] becomes:

\hat{p}(z) = \frac{1}{mh\sqrt{2\pi}}\sum_{i=1}^{m} \exp\left\{-\frac{(z - x_i)^2}{2h^2}\right\}   [19]

Again, though the kernel density estimator was developed for iid data, the work of Yu (1991) indicates that the method is valid for dependent data as well.
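A corresponding sketch of the normal-kernel estimator [19]; h is the fixed window width discussed below:

```python
# Normal-kernel estimator [19] evaluated on a grid of points.
import numpy as np

def kernel_density(grid, x, h):
    u = (np.asarray(grid)[:, None] - np.asarray(x)) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(x) * h * np.sqrt(2.0 * np.pi))
```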
Once the density p(x) is estimated by either [17] or [19], summary statistics (eg, mean and variance) can be computed by a 1-dimensional numerical integration procedure, such as Simpson's rules. Probability statements about x can also be made, thus providing a full Bayesian solution to inferences about the distribution of x.
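A sketch of such post-Gibbs summaries computed from a density estimate on a grid. Simpson's rule is assumed to be available as scipy.integrate.simpson (the study itself used IMSL routines and a cubic spline), and the mode is located by grid search, as described later:

```python
# Post-Gibbs summaries from an estimated density; scipy is an assumption here.
import numpy as np
from scipy.integrate import simpson

def summarize(grid, dens):
    dens = dens / simpson(dens, x=grid)        # renormalize the estimate
    mean = simpson(grid * dens, x=grid)
    var = simpson((grid - mean) ** 2 * dens, x=grid)
    mode = grid[np.argmax(dens)]               # grid search for the mode
    cdf = np.cumsum(dens[:-1] * np.diff(grid)) # crude CDF, enough for the median
    median = float(np.interp(0.5, cdf, grid[1:]))
    return mean, var, mode, median
```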
Bayesian inference about functions of the original parameters

Suppose we want to make inferences about a function of the original parameters, say:

z = g(x, y)

The quantity

z_i = g(x_i, y_i), \quad i = 1, 2, \ldots, m

is then a random (dependent) sample of size m from a distribution with density p(z). Formulae [16], [18] and [19] applied to such samples can also be used to make inferences about z.
An alternative is to use standard techniques of transformation to go from either of the conditional densities p(x \mid y) or p(y \mid x) to p(z \mid y) or p(z \mid x). Let the transformation be from x \mid y to z \mid y, with Jacobian J; the conditional density of z \mid y is then

p(z \mid y) = p(x(z) \mid y)\,|J|

and estimator [17] can again be applied by averaging these conditional densities over the saved samples.

APPLICATION: LITTER SIZE IN IBERIAN PIGS

The procedure was applied to litter size records (1 213 litters from 426 dams) of the Gamito strain of Iberian pigs. The model was:

y = X\beta + Z_1 u + Z_2 c + e

where y is the vector of observations (number of pigs born alive per litter); X, Z_1 and Z_2 are known incidence matrices relating \beta, u and c, respectively, to y; \beta is a vector of fixed effects, including a mean, farrowing season (72 levels) and parity (4 levels); u is a random vector of additive genetic effects (597 levels); c is a random vector of permanent environmental effects (426 levels); and e is a random vector of residuals. Distributional assumptions were:

u \mid \sigma_u^2 \sim N(0, A\sigma_u^2), \qquad c \mid \sigma_c^2 \sim N(0, I\sigma_c^2), \qquad e \mid \sigma_e^2 \sim N(0, I\sigma_e^2)

where \sigma_u^2, \sigma_c^2 and \sigma_e^2 are variance components and A is Wright's numerator relationship matrix; the vectors u, c and e were assumed to be pairwise independent. After reparameterization, the rank (p) of X was 1 + 71 + 3 = 75; the rank of the mixed model equations was then N = 75 + 597 + 426 = 1 098.

Gibbs sampling
We ran 2 separate Gibbs samplers with this data set, and we refer to these analyses as CASES I and II. In CASE I, the 3 variance components were assumed known, with REML estimates (Meyer, 1988) used as true parameter values. In CASE II, the variance components were unknown, and flat priors were assigned to them. For each of the 2 cases, a single chain of length 1 205 000 was run. After discarding the first 5 000 iterations, samples were saved every 10 iterations (d = 10), so the total number of samples saved (m) was 120 000. This specification (mainly the chain length) was based on our own experience with this and other data sets; it may differ for other problems.
Due to computer storage limitations, not all Gibbs samples and conditional means and variances could be saved for all location parameters. Instead, for further analysis and illustration, we arbitrarily selected 4 location parameters, one from each of the 4 factors (farrowing season, parity, additive genetic effect and permanent environmental effect). For each of these 4 location parameters, the following quantities were stored: the Gibbs samples x_i from the appropriate marginal distribution, and the means \hat{\theta}_i and variances \hat{v}_i of the normal conditional distributions [8] (i = 1, 2, \ldots, m). With these quantities, estimator [17] for the density of a selected location parameter z is the average of m normal densities:

\hat{p}(z) = \frac{1}{m}\sum_{j=1}^{m} (2\pi\hat{v}_j)^{-1/2} \exp\left\{-\frac{(z - \hat{\theta}_j)^2}{2\hat{v}_j}\right\}

where \hat{\theta}_j and \hat{v}_j are the conditional mean and variance of the conditional posterior density of z at the jth saved iteration. For each of the variance components, the estimator was:

\hat{p}(\sigma^2) = \frac{1}{m}\sum_{j=1}^{m} \frac{(\tilde{v}\tilde{s}_j^2/2)^{\tilde{v}/2}}{\Gamma(\tilde{v}/2)} (\sigma^2)^{-(\tilde{v}/2 + 1)} \exp\left(-\frac{\tilde{v}\tilde{s}_j^2}{2\sigma^2}\right)

where \tilde{v} is the degree of belief, \tilde{s}_j^2 is the scale parameter of the conditional posterior distribution of the variance of interest, and \Gamma(\cdot) is the gamma function. The normal kernel estimator [19] was applied directly to the samples for both location and dispersion parameters.
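A sketch of the variance-component estimator just given: each saved conditional is a scaled inverted chi-square with fixed degrees of freedom and a per-iteration scale, and the m conditional densities are averaged (names assumed):

```python
# Average of m scaled inverted chi-square conditional densities for a variance.
import numpy as np
from math import gamma

def averaged_inv_chi2_density(grid, nu, scales):
    g = np.asarray(grid)[:, None]          # evaluation points for sigma^2
    a = nu * np.asarray(scales) / 2.0      # nu * s2_j / 2, one term per sample
    dens = (a ** (nu / 2.0) / gamma(nu / 2.0)) * g ** (-(nu / 2.0 + 1.0)) * np.exp(-a / g)
    return dens.mean(axis=1)
```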
To estimate the densities, we divided the 'effective domain' of each parameter into 100 equally spaced intervals; the effective domain contained at least 99.5% of the density mass. Running through the effective domain, a sequence of pairs (\hat{p}(z_i), z_i), i = 1, 2, \ldots, 101, was generated. Densities were displayed graphically by plotting the (\hat{p}(z_i), z_i) pairs. For the normal kernel density estimator [19], the window width was specified as h = (range of effective domain)/75. For the 4 selected location parameters, the mean, mode, median and variance of each of the marginal distributions were computed as summary features. The mean, median and variance were obtained with Simpson's integration rules by further dividing the effective domain into 1 000 equally spaced intervals and using a cubic spline technique (IMSL Inc, 1989); the mode was located through a grid search.
For location parameters other than the 4 selected ones, the posterior means and variances were computed directly from the Gibbs samples.
Density estimation for functions of variance components

As mentioned previously, inferences about any function of the original parameters can be made through transformation of random variables, without rerunning the Gibbs sampler, provided appropriate samples are saved. In this section, we summarize methods for density estimation of functions of variance components; see also Wang et al (1993).
Let the Gibbs samples for \sigma_u^2, \sigma_c^2 and \sigma_e^2 be, respectively, x_{ui}, x_{ci} and x_{ei} (i = 1, 2, \ldots, m). Also, let the scale parameters of the corresponding conditional densities be \tilde{s}_{ui}^2, \tilde{s}_{ci}^2 and \tilde{s}_{ei}^2. Consider first estimating the marginal posterior density of the total variance \sigma_p^2 = \sigma_u^2 + \sigma_c^2 + \sigma_e^2. Let the transformation be from the conditional posterior density of (\sigma_e^2 \mid y, \theta, v, s, v) to that of (\sigma_p^2 \mid y, \theta, v, s, v). The Jacobian of the transformation is 1. Thus, using [17], the estimator of the marginal posterior density of \sigma_p^2 is:

\hat{p}(\sigma_p^2 \mid y) = \frac{1}{m}\sum_{i=1}^{m} \frac{(\tilde{v}\tilde{s}_{ei}^2/2)^{\tilde{v}/2}}{\Gamma(\tilde{v}/2)} (\sigma_p^2 - x_{ui} - x_{ci})^{-(\tilde{v}/2 + 1)} \exp\left\{-\frac{\tilde{v}\tilde{s}_{ei}^2}{2(\sigma_p^2 - x_{ui} - x_{ci})}\right\}

where \tilde{v} = \tilde{v}_e. Further, z_{pi} = x_{ui} + x_{ci} + x_{ei} (i = 1, 2, \ldots, m) is a sample from the marginal posterior distribution of \sigma_p^2, so [16], [18] and [19] can be applied to these samples directly. Similarly, the estimator of the marginal posterior density of h^2 = \sigma_u^2/\sigma_p^2 is obtained by using the transformation \sigma_u^2 \to h^2 in the conditional posterior density of \sigma_u^2; the inverse transformation is \sigma_u^2 = h^2(x_{ci} + x_{ei})/(1 - h^2), with Jacobian (x_{ci} + x_{ei})/(1 - h^2)^2. One obtains:

\hat{p}(h^2 \mid y) = \frac{1}{m}\sum_{i=1}^{m} c_i \left[\frac{h^2(x_{ci} + x_{ei})}{1 - h^2}\right]^{-(\tilde{v}/2 + 1)} \exp\left\{-\frac{\tilde{v}\tilde{s}_{ui}^2(1 - h^2)}{2h^2(x_{ci} + x_{ei})}\right\} \frac{x_{ci} + x_{ei}}{(1 - h^2)^2}   [33]

where \tilde{v} = \tilde{v}_u and c_i = (\tilde{v}\tilde{s}_{ui}^2/2)^{\tilde{v}/2}/\Gamma(\tilde{v}/2). For repeatability, r = (\sigma_u^2 + \sigma_c^2)/\sigma_p^2, we used the transformation \sigma_e^2 \to r in the conditional posterior density of \sigma_e^2, with inverse \sigma_e^2 = (x_{ui} + x_{ci})(1 - r)/r and Jacobian (x_{ui} + x_{ci})/r^2, to obtain:

\hat{p}(r \mid y) = \frac{1}{m}\sum_{i=1}^{m} c_i \left[\frac{(x_{ui} + x_{ci})(1 - r)}{r}\right]^{-(\tilde{v}/2 + 1)} \exp\left\{-\frac{\tilde{v}\tilde{s}_{ei}^2\, r}{2(x_{ui} + x_{ci})(1 - r)}\right\} \frac{x_{ui} + x_{ci}}{r^2}

where c_i is as in [33] but with \tilde{v} = \tilde{v}_e and \tilde{s}_{ei}^2 in place of \tilde{s}_{ui}^2.
The estimator of the marginal posterior density of the variance ratio \gamma = \sigma_u^2/\sigma_e^2 is obtained with the transformation \sigma_u^2 \to \gamma in the conditional posterior density of \sigma_u^2, with inverse \sigma_u^2 = \gamma x_{ei} and Jacobian x_{ei}:

\hat{p}(\gamma \mid y) = \frac{1}{m}\sum_{i=1}^{m} \frac{(\tilde{v}\tilde{s}_{ui}^2/2)^{\tilde{v}/2}}{\Gamma(\tilde{v}/2)} (\gamma x_{ei})^{-(\tilde{v}/2 + 1)} \exp\left(-\frac{\tilde{v}\tilde{s}_{ui}^2}{2\gamma x_{ei}}\right) x_{ei}   [35]

where \tilde{v} = \tilde{v}_u. The variance ratio \delta = \sigma_c^2/\sigma_e^2 is estimated in the same manner as \gamma in [35], with the samples \tilde{s}_{ci}^2 substituted in place of \tilde{s}_{ui}^2 and \tilde{v} = \tilde{v}_c, using the transformation \sigma_c^2 \to \delta.
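Alternatively, as noted earlier, the transformed samples themselves can be used with [16], [18] and [19]. A sketch, assuming a samples array whose last three columns hold the draws of \sigma_u^2, \sigma_c^2 and \sigma_e^2 (an illustrative layout) and reusing the kernel_density sketch given earlier:

```python
# Transformed-sample approach; the column layout of `samples` is assumed.
import numpy as np

x_u, x_c, x_e = samples[:, -3], samples[:, -2], samples[:, -1]
sigma2_p = x_u + x_c + x_e                   # total variance draws
h2 = x_u / sigma2_p                          # heritability draws
r = (x_u + x_c) / sigma2_p                   # repeatability draws

grid = np.linspace(h2.min(), h2.max(), 101)  # 100 equal intervals ('effective domain')
h = (grid[-1] - grid[0]) / 75.0              # window width = range/75, as specified earlier
h2_density = kernel_density(grid, h2, h)     # kernel estimator sketched above
```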

RESULTS
When variance components are known (CASE I), the marginal posterior distributions of all location parameters are normal. The mean (mode or median) of the marginal distribution of a location parameter is given by the corresponding component of the solution vector of the mixed model equations, and the variance of the distribution is equal to the corresponding diagonal element of the inverted mixed model coefficient matrix, multiplied by the residual variance. These are mathematical facts, and do not relate in any way to the Gibbs sampler. We used this knowledge to assess the convergence of the Gibbs sampler, which gives Monte-Carlo estimates of the posterior means and variances. In CASE I, for the data at hand, the posterior distributions can be arrived at more efficiently by direct inversion or iterative methods than via the Gibbs sampler. Table I contains results of a comparison between the posterior means and variances estimated by Gibbs sampling (GIBBS) and the exact values found by solving the mixed model equations directly (TRUE). Several criteria were used to compare the 2 sets of results: absolute difference (bias) between TRUE and GIBBS; absolute relative bias (RB), defined as bias divided by TRUE; the slopes of the linear regression of TRUE on GIBBS, and vice versa; and the correlation between TRUE and GIBBS. Of course, GIBBS and TRUE were not exactly the same, because GIBBS is subject to Monte-Carlo sampling errors; as m (or k) goes to infinity, GIBBS is expected to converge to TRUE. We found excellent agreement between TRUE and GIBBS for all these criteria. The average absolute RB did not exceed 1%, except for the posterior means of additive genetic and permanent environmental effects. For these effects, the RB criterion can be misleading, because the true posterior means were very small in value, so that even small biases made the RB very large. The regression and correlation coefficients between GIBBS and TRUE were all close to 1.0 for both means and variances. All these results indicate that the Gibbs sampler converged in this application.
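The comparison criteria used in table I reduce to a few vector operations; a sketch with assumed names:

```python
# Criteria of table I for two vectors of posterior summaries; names assumed.
import numpy as np

def compare(true_vals, gibbs_vals):
    bias = np.abs(true_vals - gibbs_vals)
    rel_bias = bias / np.abs(true_vals)        # can mislead when TRUE is near 0
    slope_t_on_g = np.polyfit(gibbs_vals, true_vals, 1)[0]   # TRUE on GIBBS
    slope_g_on_t = np.polyfit(true_vals, gibbs_vals, 1)[0]   # GIBBS on TRUE
    corr = np.corrcoef(true_vals, gibbs_vals)[0, 1]
    return bias.mean(), rel_bias.mean(), slope_t_on_g, slope_g_on_t, corr
```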
The true and estimated posterior distributions of the 4 selected location parameters are depicted in figure 1. The true densities are simply normal density plots with means and variances from the TRUE analysis. The estimated densities were obtained with [17] and [19]. The 3 curves overlapped perfectly, indicating convergence of the Gibbs sampler to the true posterior distributions. Figure 2 depicts the marginal posterior densities of the same 4 selected location parameters for CASE II, ie with unknown variances. For the 2 fixed effects, the distributions were essentially symmetric, and similar to those found for CASE I (fig 1). This indicated, for this application, that replacing the unknown variances by REML estimates and then completing a Bayesian analysis of fixed effects as if the variances were known would give a good approximation to inferences about fixed effects in the absence of knowledge about variances. Note that the variances of the posterior distributions of the fixed effects shown in figures 1 and 2 are similar. In theory, one would expect the posterior variances to be somewhat larger in CASE II. However, it should be borne in mind that the Gibbs variances are Monte-Carlo estimates, and therefore subject to sampling error.
A noteworthy feature of figure 2 was that the densities of the additive genetic and permanent environmental effects were skewed, in contrast to the normal distributions found in CASE I. Further, these posterior densities were sharply peaked in the neighborhood of 0, the prior mean; this is consistent with the fact (discussed later) that in these data there was considerable density near 0 for the additive genetic and permanent environmental components of variance. For these 2 parameters, an analysis using REML estimates would tend to give misleading probability statements and posterior confidence intervals. In particular, in CASE I, the posterior mean (variance) of the selected permanent environmental effect was 0.106 (0.0566); in CASE II, the corresponding figures were 0.22 (0.140). The data contained little information about the permanent environmental effect of this animal; the amount of such information is proportional to the number of litters produced by the sow in question. It is precisely in these instances that 'errors' in variance component estimates are crucial. The posterior variance of the permanent environmental effect in CASE II was about two and a half times as large as in CASE I, illustrating the impact of errors in estimating variance components on inferences. The point is that variances, interval estimates and probability statements about location parameters based on normal approximations, with variance components assumed known in the mixed model equations, can be misleading when the data contain little information about the location parameters and variances. The more information one has about a location parameter, the less influential are the assumed values of the variance components.
The problem could be serious if the number of location parameters is large relative to the number of observations, eg, in an animal model, and if the data do not contain sufficient information about the variance parameters. An exact Bayesian analysis, such as the one conducted here for CASE II, corrects all these problems. Estimated densities for variance components and their functions are presented in figure 3. REML estimates (Meyer, 1988) are also included for comparison purposes. The striking feature was that all distributions, with the exception of those of the residual and phenotypic variances, were skewed. Hence, the mean, mode and median tended to differ from each other. Consider, for example, the posterior distribution of the additive genetic variance: the REML estimate was 0.206, identical to the estimated posterior mean, but quite different from the marginal mode (0.165). For \sigma_c^2, the REML estimate was closer to the marginal mode than to the mean or median. A naive 95% 'confidence interval' for \sigma_c^2, based on the mean and variance of the posterior distribution and asymptotic theory, would be (-0.052, 0.309); inference of this type is typical of likelihood-based analyses. In the light of the Bayesian analysis depicted in figure 3, the hypothesis that the permanent environmental variance is zero could not be rejected, although there is considerable posterior probability that \sigma_c^2 > 0.04. Further, whereas any reasonable Bayesian confidence interval lies within the permissible parameter space, an REML-based asymptotic interval does not in this case. For this data set, 95% asymptotic confidence intervals for \sigma_u^2 and \sigma_c^2 based on the REML analysis were (-0.971, 1.356) and (-0.267, 0.384), respectively.
The estimated densities using [17] were less smooth than those based on [19]. This was due to Monte-Carlo sampling errors; the curves can be smoothed by increasing the length of the chain.

DISCUSSION
A Gibbs sampling scheme was developed for a univariate Gaussian mixed linear model with correlated observations, such as those arising in quantitative genetics. With this implementation, a full Bayesian analysis of the location and dispersion parameters, or of their functions, in a real-life mixed linear model was possible. The Gibbs sampler made feasible the integration over all nuisance parameters, and gave a Monte-Carlo estimate of the marginal posterior distribution of the parameter of interest. In the classical sense, this is equivalent to taking into account errors of estimation of all other parameters in the model when inferences about a parameter of interest are made. This is precisely why REML was advanced over maximum likelihood: errors in estimating fixed effects are taken into account in REML. In this sense, a marginal Bayesian analysis with flat priors can be thought of as an analysis of a marginal likelihood, with additional richness brought by the probability calculus on which Bayesian inference is based. Bayesian analysis via Gibbs sampling provides the complete marginal posterior distribution of an unknown. Any feature of this distribution can be computed, including probability statements. Because Bayes theorem operates within the space in which the parameters are defined, all statistics fall in the permissible parameter space. This is not so for frequentist procedures such as REML: although the REML estimates themselves are defined within the permissible parameter space, interval estimates based on asymptotic theory can include values outside its boundaries, as illustrated previously.
Unfortunately, a richer analysis requires more intensive computations. For the problem studied in this paper, it took about 14.5 and 23 hours of CPU time for CASES I and II, respectively, on an HP9000/827 running HP-UX 8.02, with a Gibbs chain length of 1 205 000. This certainly limits the applicability of the procedure to large problems, at least at present. Our experience suggests that the procedure is feasible with as many as 10 000 location parameters; hence, analysis of data from designed experiments would be feasible. With the fast advances in computing technology, it is likely that much larger models could be handled efficiently in the near future. We do not advocate at this time Gibbs sampling as a computing method for routine genetic evaluation. However, it is appealing for scientific purposes, eg, when many simplifying assumptions must be relaxed, or when model flexibility is needed.
The chain length of 1 205 000 was deliberately long. Summary statistics of a marginal distribution can be computed from a much shorter chain in practice, with relatively high precision. This, of course, would reduce computing cost.
Theoretical results guarantee that an irreducible Markov chain converges to its equilibrium distribution (Kipnis and Varadhan, 1986; Tierney, 1991). However, this does not translate easily into practical guidelines for convergence checking. Our heuristic convergence checking procedure was to run chains under different specifications (starting values, chain length and number of samples saved) of the sampler. If they produced similar results, convergence was assumed. Our experience suggested that if one can obtain a smooth density curve by averaging conditionals, as in [17], the sampler has converged. In CASE II, a smooth curve using [17] was much harder to obtain for dispersion than for location parameters; with [17], very long chains were needed to obtain smooth estimated densities. In CASE I, convergence was certain, because the estimated densities were almost identical to those derived by analytical means. At any rate, checking for convergence is a difficult problem in most areas of numerical analysis, and Gibbs sampling is no exception. Here, there are additional complications stemming from Monte-Carlo errors and from convergence in probability to the true distributions. In the process of monitoring convergence, we observed that convergence was slower for \sigma_u^2 and \sigma_c^2 than for the other parameters.
We used 2 density estimators: averaging conditionals [17] and normal kernel density estimation [19]. Theoretically, [17] is expected to be more efficient than its counterpart because of its use of conditional information. However, we did not observe sizable differences in our analysis, as can be ascertained from figures 1-3. In fact, we found that similar density estimates could be obtained with [19] using far fewer samples than 120 000, which would favor [19] over [17] in this situation. The naive fixed window width used in [19] throughout performed well in all cases. The procedure can be divided into 2 stages: Gibbs sampling and post-Gibbs analysis. In our case, because the sample size was large (m = 120 000), the post-Gibbs analysis was onerous. In general, large sample sizes are needed because of high serial correlations between consecutive samples. The effective sample size, perhaps measured as m(1 - \rho)/(1 + \rho) (Tierney, 1991), where \rho is the lag-one serial correlation, can be much smaller than m. For example, if \rho = 0.9, the effective sample size would be 6 316, or 5.26% of 120 000. In our study, we monitored serial correlations between consecutive drawings for the variance components. The estimated lag-300 correlations for \sigma_u^2 and \sigma_c^2 were 0.6 and 0.3, respectively, while the lag-one correlation for \sigma_e^2 was almost 0. This is why the chains were so long, and so many samples were saved. One possible way to reduce dependence between samples would be to embed a Hastings or Metropolis updating step in the basic Gibbs sampling scheme, as used, for example, in pedigree analysis (Lin and Thompson, 1993). Further research is needed in this area for the types of models applied in animal breeding.
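A sketch of the effective-sample-size check just described, reproducing the worked example in the text (\rho = 0.9 and m = 120 000 give about 6 316):

```python
# Effective sample size m(1 - rho)/(1 + rho) from the lag-one correlation.
import numpy as np

def effective_sample_size(chain):
    x = np.asarray(chain, dtype=float)
    rho = np.corrcoef(x[:-1], x[1:])[0, 1]     # lag-one serial correlation
    return len(x) * (1.0 - rho) / (1.0 + rho)

print(120_000 * (1 - 0.9) / (1 + 0.9))         # about 6 316, ie 5.26% of 120 000
```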
We have demonstrated in this paper that the Gibbs sampling scheme can be used successfully to carry out an exact Bayesian analysis of all parameters in a general univariate mixed linear model. The method, however, could also be used in classical situations for problems where analytical integration is intractable.
Examples are Besag and Clifford (1989, 1991) on Monte-Carlo tests, and Geyer and Thompson (1992) and Gelfand and Carlin (1991) on Monte-Carlo maximum likelihood. Extensions to multivariate problems, eg, genetic maternal effects, are in progress. An application of Gibbs sampling to the analysis of a selection experiment is given by Wang et al (1994).