Heterogeneous variances in Gaussian linear mixed models

Summary - This paper reviews some problems encountered in estimating heterogeneous variances in Gaussian linear mixed models. The one-way and multiple classification cases are considered. EM-REML algorithms and Bayesian procedures are derived. A structural mixed linear model on log-variance components is also presented, which allows identification of meaningful sources of variation of heterogeneous residual and genetic components of variance and assessment of their magnitude and mode of action.


INTRODUCTION
Genetic evaluation procedures in animal breeding rely mainly on best linear unbiased prediction (BLUP) and restricted maximum likelihood (REML) estimation of parameters of Gaussian linear mixed models (Henderson, 1984). Although BLUP can accommodate heterogeneous variances (Gianola, 1986), most applications of mixed-model methodology postulate homogeneity of variance components across subclasses involved in the stratification of data. However, there is now a great deal of experimental evidence of heterogeneity of variances for important production traits of livestock (eg, milk yield and growth in cattle) both at the genetic and environmental levels (see, for example, the reviews of Garrick et al, 1989, andVisscher et al, 1991).
As shown by Hill (1984), ignoring heterogeneity of variance decreases the efficiency of genetic evaluation procedures and consequently response to selection, the importance of this phenomenon depending on assumptions made about sources and magnitude of heteroskedasticity (Garrick and Van Vleck, 1987;Visscher and Hill, 1992). Thus, making correct inferences about heteroskedastic variances is critical. To that end, appropriate estimation and testing procedures for heterogeneous variances are needed. The purpose of this paper is an attempt to describe such procedures and their principles. For pedagogical reasons, the presentation is divided into 2 parts according to whether heteroskedasticity is related to a single or to a multiple classification of factors.

Statistical model
The population is assumed to be stratified into several subpopulations (eg, herds, regions, etc) indexed by i = 1, 2, ... , I, representing a potential source of heterogeneity of variances. For the sake of simplicity, we first consider a one-way random model for variances such as where y i is the (n 2 x 1) data vector for subpopulation i, 13 is the (p x 1) vector of fixed effects with incidence matrix X i , u * is a (q x 1) vector of standardized random effects with incidence matrix Z i and e i is the (n i x 1) vector of residuals.
The usual assumptions of normality and independence are made for the distributions of the random variances u * and e i , ie u* ! N(0, A) (A positive definite matrix of coefficients of relationship) and e i N NID(O, er! 1!;) and Cov(e i , u*!) = 0 so that y 2 N N(X il 3, a u 2i Z'AZI i +0  (Patterson and Thompson, 1971;Harville, 1977) as the basic estimation procedure for heterogeneous variance components (Foulley et al, 1990).
A convenient algorithm to compute such REML estimates is the 'expectationmaximization' (EM) algorithm of Dempster et al (1977). The iterative scheme will be based on the general definition of EM (see pages 5 and 6 and formula 2.17 in Dempster et al, 1977) which can be explained as follows.
L tt' ( 1 1 1 1 )1 2 { 2} l 2 { 2} d 2 ( 2' Lettln g y = y 1 , Y 2, ... , Y i, ... , Y I , (y e 2 = 1 0 , 2 ei 1, U2 u = f o , 2 Ui and U2 = ( g2 &dquo; e 9 u 2'), the derivation of the EM algorithm for REML stems from a complete data set defined by the vector x = (y', 13 1 , U */ ) I and the corresponding likelihood function L( 6 2 ;x) = Inp(xlc¡ 2 ). In this presentation, the vector (3 is treated in a Bayesian manner as a vector of random effects with variance fixed at infinity (Dempster et al, 1977;Foulley, 1993). A frequentist interpretation of this algorithm based on error contrasts can be found in De Stefano (1994). A similar derivation was given for the homoskedastic case by Cantet (1990). As usual, the EM algorithm is an iterative one consisting of an 'expectation' (E) and of a 'maximization' (M) step. Given  Ju , is the coefficient of regression of any element of y i on the corresponding element of Zju * .
Let the system of mixed-model equations be written as and C = [ C , 3,3 C , 3 . J = g inverse of the coefficient matrix.  [8] can also be applied to the homoskedastic case by considering that there is just one subpopulation (I = 1). The resulting algorithm looks like a regression in contrast to the conventional EM whose formula (a![t+1] = E l t] (u'A-1 u)/q) where u is not standardized (u = cr!u*) is in terms of a variance. Not only do the formulae look quite different, but they also perform quite differently in terms of rounds to convergence. The conventional EM tends to do quite poorly if or » o, and (or) with little information, whereas the scaled EM is at its best in these situations. This can be demonstrated by examining a balanced paternal half-sib design (q families with progeny group size n each). This is convenient because in this case the EM algorithms can be written in terms of the between-and within-sire sums of squares and convergence performance checked for a variety of situations without simulating individual records. For this simple situation performance was fairly well predicted by the criterion R 2 = n/(n + a), where a = a2/0,2 . Figure 1 is a plot of rounds to convergence for the scaled and usual EM algorithms for an arbitrary set of values of n and a. As noted by Thompson and Meyer (1986), the usual EM performs very poorly at low R 2 , eg, n = 5 and h 2 = 4/(a + 1) = 0.25 or n = 33 and h 2 = 0.04, ie R 2 = 0.25, but very well at the other end of the spectrum: n = 285 and h 2 = 0.25 or n = 1881 and h 2 = 0.04, ie R 2 = 0.95. The performance of the scaled version is the exact opposite. Interestingly, both EM algorithms perform similarly for R 2 values typical of many animal breeding data sets (n = 30 and h 2 = 0.25, ie R 2 = 2/3).
Moreover, solutions given by the EM algorithm in [7] and [8] turn out to be within the parameter space in the homoskedastic case (see proof in the Appendix) but not necessarily in the heteroskedastic case as shown by a counter-example.

Bayesian approach
When there is little information per subpopulation (eg, herd or herdx management unit), REML estimation of Q e i and Q u z can be unreliable. This led Hill (1984) and Gianola (1986) to suggest estimates shrunken towards some common mean variance. In this respect, Gianola et al (1992) proposed a Bayesian procedure to estimate heterogeneous variance components. Their approach can be viewed as a natural extension of the EM-REML technique described previously. The parameters ol2 ei and o, U , 2 are assumed to be independently and identically distributed random variables with scaled inverted chi-square density functions, the parameters of which are s2,, q , e and su, r!! respectively. The parameters se and s! are location parameters of the prior distributions of variance components, and TIe and 7 7 ,, (degrees of belief) are quantities related to the squared coefficient of variation (cv) of true variances by q e = (2/cve ) + 4 and q u = (2/cufl) + 4 respectively: Moreover, let us assume as in Searle et al (1992, page 99), that the priors for residual and u-components are assumed independent so that p (,72i, U2i) = p(,71i)p(0,2i). The Q @ ( 0' 2 1 O' 2 [ t ]) function to maximize in order to produce the posterior mode of o-2 is now (Dempster et al, 1977, page [8] shows how prior information modifies data information (see also tables I and II). In particular when TJe (TJ u ) = 0 (absence of knowledge on prior variances), formulae [13b] and [14] are very similar to the EM-REML formulae. They would have been exactly the same if we had considered the posterior mode of log-variances instead of variances, !7e and !7.! replacing 17 , + 2 and !7! + 2 respectively in !11!, and, consequently also in the denominator of [13b] and !14!. In contrast, if !7e(!/u) ---> 00 (no variation among variances), estimates tend to the location parameters s!(s!).

Extension to several u-components
The EM-REML equations can easily be extended to the case of a linear mixed model including several independent u-components (uj; j = 1, 2, ... , J), ie In that case, it can be shown that formula [7] is replaced by the linear system The formula in [8] for the residual components of variance remains the same. This algorithm can be extended to non-independent u-factors. As in a sire, maternal grand sire model, it is assumed that correlated factors j and k are such that Var(u*) = Var(u*) = A, and Cov(uj, u!/) = p jk A with dim(u!) = m for all j. Let a 2 = (or2&dquo; or2&dquo; p') with p = vech(S2), S2 being a (m x m) correlation matrix with p jk as element jk. The Q#(êT2IêT 2[t] ) function to maximize can be written here as where The first term Q7 (u! ] 8&dquo;!°! ) = ErJ[lnp(yll3,u*,(J'!)] has the same form as with the case of independence except that the expectation must taken with respect to the distribution of (3, u * Iy, Õ ' 2 = Õ' 2

[t] . The second term Q!(plÕ'2[t]) = E l c ' ] ) [In p(u * 1 & 2 )]
can be expressed as The maximization of Q#(¡ 2 1(¡ 2[t] ) with respect to 6 2 can be carried out in 2 stages: i) maximization of Qr(¡ 2 1(¡2 [t] ) with respect to the vector !2 of variance components which can be solved as above; and ii) maximization of Q#(p 1&211,) with respect to the vector of correlation coefficients p which can be performed via a Newton-Raphson algorithm.

THE MULTIPLE-WAY CLASSIFICATION
The structural model on log-variances Let us assume as above that the a2s (u and e types) are a priori independently distributed as inverted chi-square random variables with parameters 5! (location) and ri z (degrees of belief), such that the density function can be written as: where r( x ) is the gamma function.
From !19), one can alternatively consider the density of the log-variance 1n Q2 , I or more interestingly that of v z = ln(a2/s2). In addition, it can be assumed that 7 1i = ! for all i, and that lns2 can be decomposed as a linear combination p'S of some vector 5 or explanatory variables (p' being a row vector of incidence), such that with For v i --> 0, the kernel of the distribution in [21] tends towards exp( -r¡v'f /4), thus leading to the following normal approximation where the variance a priori (!) of true variances is inversely proportional to q (! = 2/?!), ! also being interpretable as the square coefficient of variation of log-variances. This approximation turns out to be excellent for most situations encountered in practice (cv ! 0.50). Formulae [20] and [21] can be naturally extended to several independent classifications in v = (v!, v2, ... , vj, ... , v!)' such that with where K j = dim(v j ) and J1 = (t,', v')' is the vector of dispersion parameters and C' = (p!, q ') is the corresponding row vector of incidence.
This presentation allows us to mimick a mixed linear model structure with fixed 5 and random v effects on log-variances, similar to what is done on cell means (vt i = x!13 + z'u = t!0), and thus justifies the choice of the log as the link function (Leonard, 1975;Denis, 1983;Aitkin, 1987;Nair and Pregibon, 1988) to use for this generalized linear mixed-model approach.  Testing procedure The procedure described previously reduces to REML estimation of J1 when flat priors are assumed for v! and v,, or equivalently when the structural model for log-variances is a purely fixed model. This property allows derivation of testing procedures to identify significant sources of heteroskedasticity. This can be achieved via the likelihood ratio test (LRT) as proposed by Foulley et al (1990Foulley et al ( , 1992  has an asymptotic chi-square distribution with r degrees of freedom; r is the difference in the number of estimable parameters involved in A and Il o , eg, r = rank(K) when H o is defined by K'71 = 0, K' being a full row rank matrix.
Fortunately, for a Gaussian linear model, such as the one considered here y 2 N N(Xil3, a!iZ!AZi + U2i j&dquo; i ), L(À; y), can be easily expressed as

CONCLUSION
This paper is an attempt to synthesize the current state of research in the field of statistical analysis of heterogeneous variances arising in mixed-model methodology and in its application to animal breeding. For pedagogical reasons, the paper successively addressed the cases of one-way and multiple-way classification. In any case, the estimation procedures chosen were REML and its natural Bayesian extension, the posterior mode using inverted gamma prior distributions. For a single classification, simple formulae for computing REML and Bayes estimations were derived using the theory of EM. Emphasis was placed on this algorithm but other alternatives could be considered, eg, ECME (Liu and Rubin, 1994), AI-REML (Johnson and Thompson, 1994) and DF-REML (Meyer, 1989). For multiple classifications, the key idea underlying the structural approach is to render models as parsimonious as possible, which is especially critical with the large data sets used in genetic evaluation having a large number of subclasses. Consequently, it was shown how heterogeneity of log-residual and u-components of variance can be described with a mixed linear model structure. This makes the corresponding estimation procedures a natural extension of what has been done for decades by BLUP techniques on subclass means.
An important feature of this procedure is its ability to assess via likelihood ratio tests the effects of potential sources of heterogeneity considered either marginally or jointly. Although computationally demanding, these procedures have begun to be applied. San Cristobal et al (1993) applied these procedures to the analysis of muscle development scores at weaning of 8 575 progeny of 142 sires in the Maine Anjou cattle breed where heroskedasticity was found both for the sire and residual components.  and DeStefano (1994) also used the structural approach for sire and residual variances to assess sources of heterogeneity in withinherd variances of milk and fat records in Holstein. Herd size and within-herd means were associated with significant increases in residual variances as well as various management factors (eg, milking system). Approximations for the to estimation of within region-herd-year-parity phenotypic variances were also proposed for dairy cattle evaluation by Wiggans and Van Raden (1991) and  . These techniques also open new prospects of research in different fields of quantitative genetics, eg, analyses of genotype x environment interactions Robert et al, 1994), testing procedures for genetic parameters (Robert et al, 1995ab), crossbreeding experiments and QTL detection. However, research is still needed to improve the methodology and efficiency of algorithms. ACKNOWLEDGMENTS This paper draws from material presented at the 5th World Congress of Genetics Applied to Livestock Production, Guelph, August 7-12, 1994. Special thanks are expressed to AL DeStefano, V Ducrocq, RL Fernando, D Gianola, D H6bert, CR Henderson, S Im, C Robert, M Gaudy-San Cristobal and K Weigel for their valuable contribution to the discussion and to the publication of papers on this subject. The assistance of C Robert in the computations of the numerical example is also greatly acknowledged.