Marginal inferences about variance components in a mixed linear model using Gibbs sampling

Summary - Arguing from a Bayesian viewpoint, Gianola and Foulley (1990) derived a new method for estimation of variance components in a mixed linear model: variance estimation from integrated likelihoods (VEIL). Inference is based on the marginal posterior distribution of each of the variance components. Exact analysis requires numerical integration. In this paper, the Gibbs sampler, a numerical procedure for generating marginal distributions from conditional distributions, is employed to obtain marginal inferences about variance components in a general univariate mixed linear model. All needed conditional posterior distributions are derived. Examples based on simulated data sets containing varying amounts of information are presented for a one-way sire model. Estimates of the marginal densities of the variance components and of functions thereof are obtained, and the corresponding distributions are plotted. Numerical results with a balanced sire model suggest that convergence to the marginal posterior distributions is achieved with a Gibbs sequence length of 20, and that Gibbs sample sizes ranging from 300 - 3 000 may be needed to appropriately characterize the marginal distributions. variance components / linear models / Bayesian methods / marginalization / Gibbs sampler

bution of each of the variance components. Exact analysis requires numerical integration. In this paper, the Gibbs sampler, a numerical procedure for generating marginal distributions from conditional distributions, is employed to obtain marginal inferences about variance components in a general univariate mixed linear model. All needed conditional posterior distributions are derived. Examples based on simulated data sets containing varying amounts of information are presented for a one-way sire model. Estimates of the marginal densities of the variance components and of functions thereof are obtained, and the corresponding distributions are plotted. Numerical results with a balanced sire model suggest that convergence to the marginal posterior distributions is achieved with a Gibbs sequence length of 20, and that Gibbs sample sizes ranging from 300 -3 000 may be needed to appropriately characterize the marginal distributions. variance components / linear models / Bayesian methods / marginalization / Gibbs sampler R.ésumé -Inférences marginales sur des composantes de variance dans un modèle linéaire mixte à l'aide de l'échantillonnage de Gibbs. Partant d'un point de vue bayésien, Gianola et Foulley (1990) ont établi une nouvelle méthode d'estimation des composantes de variance dans un modèle linéaire mixte: estimation de variance par les vraisemblances intégrées (VEIL). L'inférence est basée sur la distribution marginale a posteriori de chacune des composantes de variance, ce qui oblige à des intégrations numériques pour arriver aux solutions exactes. Dans cet article, l'échantillonnage de Gibbs, qui est une procédure numérique pour générer des distributions marginales à partir de distributions conditionnelles, est employé pour obtenir des inférences marginales sur des composantes de variance dans un modèle linéaire mixte univarié général. Toutes les distributions conditionnelles a posteriori nécessaires sont établies. Des exempdes basés sur des données simulées contenant plus ou moins d'information sont présentés pour un modèle paternel à un facteur. Des estimées des densités marginales des composantes de variance et de fonctions de celles-ci sont obtenues, et les distributions correspondantes sont tracées. Les résultats numériques avec un modèle paternel équilibré suggèrent que la convergence vers les distributions marginales a posteriori est atteinte avec une séquence de Gibbs longue de 20 unités, et que des tailles de l'échantillon de Gibbs allant de 300 à 3 000 peuvent être nécessaires pour caractériser convenablement les distributions marginales. composante de variance / modèle linéaire / méthode bayésienne / marginalisation / échantillonnage de Gibbs INTRODUCTION Variance components and functions thereof are important in quantitative genetics and other areas of statistical inquiry. Henderson's method 3 (Henderson, 1953) for estimating variance components was widely used until the late 1970's. With rapid advances in computing technology, likelihood based methods gained favor in animal breeding. Especially favored has been restricted maximum likelihood under normality, known as REML (Thompson, 1962;Patterson and Thompson, 1971). This method accounts for the degrees of freedom used in estimating fixed effects, which full maximum likelihood (ML) does not do. ML estimates are obtained by maximizing the full likelihood, including its location variant part, while REML estimation is based on maximizing the &dquo;restricted&dquo; likelihood, ie, that part of the likelihood function independent of fixed effects. From a Bayesian viewpoint, REML estimates are the elements of the mode of the joint posterior density of all variance components when flat priors are employed for all parameters in the model (Harville, 1974). In REML, fixed effects are viewed as nuisance parameters and are integrated out from the posterior density of fixed effects and variance components, which is proportional to the full likelihood in this case.
First, REML estimates are the elements of the modal vector of the joint posterior distribution of the variance components. From a decision theoretic point of view, the optimum Bayes decision rule under quadratic loss in the posterior mean rather than the posterior mode. The mode of the marginal distribution of each variance component should provide a better approximation to the mean than a component of the joint mode. Second, if inferences about a single variance component are desired, the marginal distribution of this component should be used instead of the joint distribution of all components. Gianola and Foulley (1990) proposed a new method that attempts to satisfy these considerations from a Bayesian perspective. Given the prior distributions and the likelihood which generates the data, the joint posterior distribution of all parameters is constructed. The marginal distribution of an individual variance component is obtained by integrating out all other parameters contained in the model. Summary statistics, such as the mean, mode, median and variance can then be obtained from the marginal posterior distribution. Probability statements about a parameter can be made, and Bayesian confidence intervals can be constructed, thus providing a full Bayesian solution to the variance component estimation problem. In practice, however, this integration cannot be done analytically, and one must resort to numerical methods. Approximations to the marginal distributions were proposed (Gianola and Foulley, 1990), but the conditions required are often not met in data sets of small to moderate size. Hence, exact inference by numerical means is highly desirable.
Gibbs sampling (Geman and Geman, 1984) is a numerical integration method.
It is based on all possible conditional posterior distributions, ie, the posterior distribution of each parameter given the data and all other parameters in the model. The method generates random drawings from the marginal posterior distributions through iteratively sampling from the conditional posterior distributions. Gelfand and Smith (1990) studied properties of the Gibbs sampler, and revealed its potential in statistics as a general numerical integration tool. In a subsequent paper ), a number of applications of the Gibbs sampler were described, including a variance component problem for a one-way random effects model. The objective of this paper is to extend the Gibbs sampling scheme to variance component estimation in a more general univariate mixed linear model. We first specify the Gibbs sampler in this setting and then use a sire model to illustrate the method in detail, employing 7 simulated data sets that encompass a range of parameter values. We also provide estimates of the posterior densities of variance components and of functions thereof, such as intraclass correlations and variance ratios.

SETTING
Model Details of the model and definitions are found in Macedo and Gianola (1987), Gianola et al (1990a, b) and Gianola and Foulley (1990); only a summary is given here. Consider the univariate mixed linear model: where: y: data vector of order n x 1; X: known incidence matrix of order n x p; Z i : known matrix of order n x q i ; p: p x 1 vector of uniquely defined &dquo;fixed effects&dquo; (so that X has full column rank); u i : q i x 1 &dquo;random&dquo; vector; and e i : n x 1 vector of random residuals. The conditional distribution which generates the data is.
where R is an n x n known matrix, assumed to be an identity matrix here, and Q e 2 is the variance of the random residuals.

Prior distributions
Prior distributions are needed to complete the Bayesian specification of the model. Usually, a &dquo;flat&dquo; prior distribution is assigned to J3, so as to represent lack of prior knowledge about this vector, so: where G i is a known matrix and a' . is the variance of the prior distribution of u i . All u i 's are assumed to be mutually independent, a priori, as well as independent of p. Independent scaled inverted x 2 distributions are used as priors for variance components, so that: Above v e (v u; ) is a &dquo;degree of belief&dquo; parameter, and se (su a ) can be interpreted as a prior value of the appropriate variance. In this paper, as in Gelfand et al (1990) we assume the degree of belief parameters, v e and v!;, to be zero to obtain the &dquo;naive&dquo; ignorance improper priors: The joint prior density offi,u i (i = 1, 2, ... , c), U2i (i = 1, 2, ... , c) and Q e is the product of densities associated with (3-7J, realizing that there are c random effects, each with their respective variances. The joint posterior distribution resulting from priors [7] is improper mathematically, in the sense that it does not integrate to 1. The improperty is due to (6J, and it occurs at the tails. Numerical difficulties can arise when a variance component has a posterior distribution with appreciable density near O. In this study, priors [7] were employed following Gelfand et al (1990), and difficulties were not encountered. However, informative or non informative priors other than [7] should be used T in applications where it is postulated that at least one of the variance components is close to O.

JOINT AND FULL CONDITIONAL POSTERIOR DISTRIBUTIONS
Denote u' = ui, ... , u!) and v' = (Q!1, ... , Qu!). Let f = (u!...,u!_i,u!i,...,u!) and v' _ (a2 ' !2 !z . !2 ) b e u ' ' U-i = -U1&dquo;&dquo;,Ui-1,Ui+1&dquo;&dquo;'Uc c an y-i = aU1&dquo;..,aU'-1,aUH1&dquo;..,auc v.! e U and yf with the ith element deleted from the set. The joint posterior distribution of the unknowns (fi, u, y and ud) is proportional to the product of the likelihood function and the joint prior distribution. As shown by Macedo and Gianola (1987) and Gianola et al (1990a, b), the joint posterior density is in the normal-gamma form: The full conditional density of each of the unknowns is obtained by regarding all other parameters in [8] as known. We then have: The full conditional distribution of each u i (i = 1, 2, ... , c) is multivariate normal: The full conditional density of Q e is in the scaled inverted X 2 form: c c with parameters v e = n and sd = (y-Xfi-L Ziui)'(y-X!-! Ziui)/n. Each : =i : = i full conditional density of a!, also is in the scaled inverted X 2 form: with parameters v u; = q i and s 2 = u!G71 t . ui/qi.
The full conditional distributions [9-12] are essential for implementing the Gibbs sampling scheme.

Gibbs sampling
In many Bayesian problems, marginal distributions are often needed to make appropriate inferences. However, due to the complexity of joint posterior distributions obtaining a high degree of marginalization of the joint posterior density is difficult or impossible by analytical means. This is so for many practical problems, eg inferences about variance components. Numerical integration techniques must be used to obtain the exact marginal distributions, from which functions of interest can be computed and inferences made.
A numerical integration scheme known as Gibbs sampling (Geman and Geman, 1984;Gelfand and Smith, 1990;Gelfand et al, 1990;Casella and George, 1990) circumvents the analytical problem. The Gibbs sampler generates a random sample from a marginal distribution by successively sampling from the full conditional distributions of the random variables involved in the model. The full conditional distribution presented in the previous sections are summarized below: Although we are interested in the marginal distributions of Q e and a Ui 2 only, all full conditional distributions are needed to implement the sampler. The ordering placed above is arbitrary. Gibbs sampling proceeds as follows: (i) set arbitrary initial values for p, u, v, U2 (ii) generate ud from (13J, and update U2 ; e (iii) generate a2i u from (14J, and update O -u 2i (iv) generate u i from (15J, and update u i ; (v) generate f3 from (16J, and update 13; and (vi) repeat (ii-v) k times, using the updated values.
We call k the length of the Gibbs sequence, Askoo, the points from the kth iteration are sample points from the appropriate marginal distributions. The convergence of the samples from the above iteration scheme to drawings from the marginal distributions was established by Geman and Geman (1984) and restated by Gelfand and Smith (1990) and Tierney (1991). It should be noted that there are no approximations involved. Let the sample points be: ( 2) (k) ( 2 ) (k) (i = 1 , 2 , ... , c ), ( ui )( k ) (i -1, 2, ... , c) and (f3) lk > respectively, where superscript (k) denotes the kth iteration. Then: (vii) Repeat (i-vi) m times, to generate m Gibbs samples. At this point we have: Because our interest is in making inferences about o, and Q u., no attention will be paid hereafter to u i and P. However, it is clear that the marginal distributions of u i and P are also obtained as a byproduct of Gibbs sampling.

Density estimation
After samples from the marginal distributions are generated, one can estimate the densities using these samples and the full conditional densities. As noted by Casella and George (1990)  , the marginal density of a random variable x can be written as: An estimator of p(!) is: Thus, the estimator of the marginal density of Q e is: The estimated values of the density are thus obtained by fixing Q e (at a number of points over its space), and then evaluating [21] at each point. Similarly, the estimator of the marginal density of Q u i is: Additional discussion about mixture density estimators is found in Gelfand and Smith (1990).
Estimation of the density of a function of the variance components is accomplished by applying theory of transformations of random variables to the estimated densities, with minimal additional calculations. Examples of estimating the densities of variance ratios and of an intraclass correlation are given later.

CLASSIFICATION Model
We consider the one-way linear model: where (3 is a &dquo;fixed&dquo; effect common to all observations, u i could be, for example, sire effects, and e ij is a residual associated with the record on the jth progeny of sire i. It is assumed that: where NiD and NiiD stand for &dquo;normal, independently distributed&dquo; and &dquo;normal, independently and identically distributed&dquo; , respectively.

Conditional distributions
For this model, the parameters of the normal conditional distribution of ,6Iu, Qu, <7!, y in [9] are: Likewise, the parameters of the normal conditional distribution of ul,8, au 2 ,ae 2 y in [10] are: with a = a;/a!, and the covariance matrix is: with c ii = 1/(n;. + a). Because the covariance matrix is diagonal, each u i can be generated independently as: The conditional density of a; in !11! can be written as: Because e e XN , it follows that ud -Ns!x&dquo;i/, so [31] is the kernel of a multiple of an inverted X Z random variable.

Data sets and designs
Seven data sets (experiments) were simulated, so as to represent situations that differ in the amount of statistical information. Essentials of the experimental designs are in table 1. Number of sire families (q) varied from 10 to 10 000, while number of progeny per family (n) ranged from 5 to 20. The smallest experiment was I, with 10 sires and 5 progeny per sire; the largest one was VII, with a total of 100 000 records. Only balanced designs (n i = n, for i = 1, 2, ... , q) were reported here, as similar results were found with unbalanced layouts. Data were randomly generated using parameter values of 0 and 1 for ( 3 and a!, respectively. Parametric values for a; were from 1 to 99, thus yielding intraclass correlations (p) ranging from 0.01 to 0.5. From a genetic point of view, an intraclass correlation of 0.5 (Data set I) is not possible in a sire model, but it is reported here for completeness.
The Gibbs sampler [13][14][15][16] was run at varying lengths of the Gibbs sequence (k = 10 to 100) and Gibbs sample sizes (m = 300 to 3 000), to assess the effects of k and m on the estimated marginal distributions. FORTRAN subroutines of the IMSL (IMSL Inc, 1989) were used to generate normal and inverted X 2 random deviates. At the end of each run, the following quantities were retained: Sample points in [37] and (38), are needed for density estimation, as noted below.

Marginal density estimation
The estimators of the marginal densities of or2and <7! were: Using the theory of transformation of random variables, we obtained the marginal density of the variance ratio y = a!/a; by fixing a;, using [40] as: If inferences are to be made about the intraclass correlation p = ufl /(ufl + a;), the Jacobian of the transformation from , to p is J = !e/(1 -p) 2 , so from !40!, considering a; as fixed, the marginal density of the intraclass correlation is: Densities of any functions of the variance components can be estimated in a similar manner. Density [39] can also be used to make the transformations. Note that the Gibbs sampler [13][14][15][16] does not need to be run again to obtain the densities of the functions of variance components.
Plots were generated using densities [39][40][41][42] by selecting 50 to 100 equally spaced points in the &dquo;effective&dquo; range of the variables; the &dquo;effective&dquo; range of the variable covers at least 99.5% of the density mass. Summary statistics of the posterior distributions such as mean, mode, median and variance were calculated using the composite Simpson's rules of numerical integration by dividing the effective range of the variables into 100 to 200 equally spaced intervals. Where appropriate, summary statistics were calculated using the densities estimated at higher values of m or k.

RESULTS
The estimated marginal densities of variance components and of their functions (q and p) are depicted in figures 1 to 7, in which solid curves correspond to densities estimated at higher values of m or k and vice versa for the dotted ones. These figures correspond to the 7 designs given in table I. Figure 1 represents a situation where 50% of the total variance is &dquo;due to sires&dquo;, ie, a high intraclass correlation; as noted earlier, this is not possible genetically. All posterior distributions were unimodal, and convergence of the Gibbs sampling scheme to the appropriate marginals was achieved with k = 20 and k = 300, as it can be ascertained from direct inspection of the curves. Because of the limited information contained in data set I(q = 10, n = 5), posterior densities were not symmetric, so the mean, mode and median differ. The median was closer to the true values of the parameters than the mean and the mode; this was true for all 4 distributions considered.
For data sets II-IV, with heritabilities ranging from 20-80%, and number of sires from 50 to 1000, the posterior densities (figs 2-4) were nearly symmetric, so the 3 location statistics were very similar to each other. In figure 4, corresponding to a! = 1, Q e = 10 and to a data set with 1000 sires and a total of 20 000 records, the posterior coefficients of variation were approximately 1% for or and 9% for the remaining parameters. This illustrates the well known result that Q e is less difficult to estimate than or or functions thereof. The plots suggest convergence of the Gibbs sampler at values of k as low as 10-20.
Some difficulties were encountered with the Gibbs sampling schemes in designs V and VI (fig 5 and 6, respectively). These designs correspond to situations of low heritability and of mild information about parameters contained in the data.
While there was no problem in general with the posterior distribution of Q e, this was not so for the remaining 3 parameters. Typical problems were bi-modality or lack of smoothness in the left tail of the estimated densities. These were found to be related to insufficient Gibbs sample size, and were corrected by increasing the Gibbs sampling size (m). Compare, for example, the dotted (m = 300) with the solid (m = 3 000) curves in figure 6. A more awkward distribution requires more samples to be characterized accurately. Recall that each of the density estimators [40]-[42] was obtained by averaging m conditional densities. In the case of badly behaved marginal distributions, which are associated with low heritability and small data sets, it is possible to obtain &dquo;outlier conditional densities&dquo;. The outliers can be influential and distort the values of the estimated density unless m is large enough. It is clear from the figures that smooth posterior estimated densities for awkward distributions, eg, for heritability as low as 4%, can be obtained by increasing Gibbs sample size, albeit at computational expense. At this level of heritability, when the number of sire families was increased to 10 000, posterior distributions were nearly symmetric (fig 7), and there was little variability about the parametric values. Gianola and Foulley (1990) gave approximations to the marginal distribution of variance components, which should be adequate provided that the joint posterior density of the ratios between variance components is sharp or symmetric about its mode. As in any approximation, its goodness depends on the amount of information contained in the data, and on the number of parameters in the model. An important practical question is to what extent the approximations hold. From our experiments, the information in data sets I, II, V and VI is not sufficient to justify use of the approximation, even in a model with 2 variance components; this is so because the marginal distribution of the variance ratio was neither sharp nor symmetric.

DISCUSSION
However, the approximation would hold in the remaining data sets. A more detailed comparative study between the exact and the approximate method is needed to ascertain when the latter can be used confidently. In the meantime, the Gibbs sampling provides a way to estimate the marginal distributions without using complicated numerical integration procedures.
The present work is an extension of that of Gelfand et al (1990), in which they illustrated the Gibbs sampler with several normal data models, including variance component estimation in a balanced one-way random effects layout. In our paper, formulae were presented to implement Gibbs sampling in a more general univariate mixed linear model. With these extensions, a full Bayesian solution to the problem of inference about variance components, and functions thereof in such a mixed linear model is possible. Gibbs sampling turns an analytically intractable multidimensional integration problem into a feasible numerical one. Gibbs sampling is relatively easy to implement. Given the likelihood function and the priors, one can always obtain the joint posterior density of the unknowns under consideration. From this density, at least in the normal linear model, one can directly get the full conditional distribution of a particular variable by fixing the rest of the variables in the joint posterior distribution. The set of all full conditional densities gives the expressions needed for implementing the Gibbs sampler. The full conditional densities in this case are in families, such as normal and inverted x 2 , where generating random numbers is not exceedingly complicated. The limiting factor is the efficiency with which random numbers can be generated, because the method requires generating a large quantity of random numbers; m x k x r, where r is the number of parameters in the model; r = q + 3, in our case. It is difficult to specify the values of m and k, a priori. Gelfand et al (1990) described a method for assessing convergence under different values of m and k, and suggested using k = 10 &mdash; 20 and m = 100 for a variance component problem in a balanced oneway model. However, they increased m to 1000 when the variance ratio was under consideration. Our numerical results for the same model support their suggestion for the value of k, but indicate that Gibbs sample sizes of 2 000 to 3 000 may be needed for badly behaved marginal distributions (figs 6, 7). The most difficult situation encountered in our experiments was when intraclass correlation and sample size were both small. In general the appropriate values of m and k depend on the number of variables in the model, the shapes of the marginal distributions and the accuracy required to estimate densities. An appealing aspect of Gibbs sampling is its flexibility. For example, densities of functions of the original variables included in the posterior distribution can be estimated using standard theory of random variable transformation, with minimal additional calculations.
Gibbs sampling is iterative. In this respect, there are 2 issues of concern: convergence and uniqueness. However, Geman and Geman (1984) showed that under mild regularity conditions, the Gibbs sampler converges uniquely to the appropriate marginal distributions. Casella and George (1990) discuss numerical means to accelerate convergence. Another way to speed up convergence is to integrate out analytically some nuisance parameters from the joint posterior distribution before running the Gibbs sampler. In our case, inference is sought on variance components and on their functions. Here, u and P would be integrated out before running the Gibbs sampler. The necessary conditional distributions are given by Gianola and Foulley (1990).
The finite mixture density estimators of ae and Q u in [39] and [40] can be thought of as average of a finite number of inverted X 2 . This ensures that &dquo;point estimates&dquo; of variance components, eg, mean, mode and median will always be within the allowable parameter space. Likewise, interval estimates are also within the parameter space, in contrast to the asymptotic confidence intervals obtained from full or restricted maximum likelihood, which may include negative values.
Once the marginal densities are obtained, it is easy to calculate summary statistics from the posterior distributions. From a decision theoretical viewpoint, the optimum Bayes estimator under quadratic loss is the posterior mean; the posterior median is optimal if the loss functions is proportional to the absolute value of the error of estimation and the posterior mode is optimal if a step loss function is adopted.
Some caution is needed in variance component problems in genetics. For example, some genetic models dictate bounds for a particular variable. If one employs a &dquo;sire&dquo; model, the intraclass correlation (p) must lie inside the [0, 1/4J interval, because heritability is between 0 and 1. This implies that the variance ratio a!/a; is between 0 and 1/3, and that 0 ! a! :0;:; a; /3. Hence, use of truncated inverted X 2 densities in the Gibbs sampler would be more sensible in such a model. On the other hand, if one considers an &dquo;animal&dquo; model, the bounds for p and a u 2!U2 are from 0 to 1, and from 0 to oo respectively, and the variance components are unbounded. Since any &dquo;sire&dquo; model is expressible as an &dquo;animal&dquo; model, this would solve the problem mentioned above, though at some computational expense.
Gibbs sampling is computer intensive, but in some simple models, such as the sire model employed here, large data sets can be handled (eg, data set VII and fig 7). The feasibility of Gibbs sampling in large &dquo;animal&dquo; model is a subject for further research.