ECM approaches to heteroskedastic mixed models with constant variance ratios

Cet article presente des techniques d'estimation des parametres intervenant dans des modeles mixtes ayant des rapports de variance constants et des variances residuelles decrites par un modele lineaire de leurs logarithmes. Les parametres de dispersion sont estimes par le maximum de vraisemblance classique (ML) et restreint (REML). Les equations a resoudre pour obtenir ces estimations sont etablies a partir de l'algorithme d'esperance-maximisation conditionnelle (ECM) et d'une version simplifiee dite du gradient ECM. Des approches directe et indirecte sont proposees, cette derniere conduisant a un test d'hypothese sur le rapport de variances. La theorie est illustree par l'analyse numerique d'un petit exemple.


INTRODUCTION
Heteroskedasticity has recently generated much interest in quantitative genetics and animal breeding. To begin with, there is now a large amount of experimental evidence of heterogeneous variances for most important livestock production traits (Garrick et al, 1989;Visscher et al, 1991;Visscher and Hill, 1992). Second, major theoretical and applied work has been carried out for estimating and testing sources of heterogeneous variances arising in univariate mixed models (Foulley et al, 1990;Gianola et al, 1992;DeStefano, 1994;Foulley and Quaas, 1995). For many reasons (accuracy of estimation, ease of handling large data sets), a major objective in this area lies in making models as parsimonious as possible.
This can be accomplished in at least two ways: i) by modelling variances in the case of potentially numerous sources of heteroskedasticity, and ii) by assuming that some functions of those parameters (eg, intra-class correlation or heritability) are constant. The first aspect corresponds to the so-called structural approach in which the heterogeneity of the log components of variances is described via a linear model structure similar to that used for means 1990San Cristobal, 1993). Restrictions as in ii) were considered by Meuwissen et al (1996) and Robert et al (1995a,b). Meuwissen et al (1996) introduced a multiplicative mixed model to estimate breeding values and heteroskedasticity factors assuming heritability (h 2 ) constant across herd-years. Robert et al (1995a,b) developed estimation and testing procedures for homogeneity of heritability within and/or genetic correlations across environments. But Meuwissen's study postulates known h 2 and Robert's research applies to only a single classification of heteroskedasticity. The purpose of this paper is to propose a complete inference approach for parameters having both features i) and ii), ie, for continuous data described by mixed models with constant variance ratios and heteroskedasticity analyzed via a structural approach. For simplicity, the theory will be presented using a oneway random mixed model for data and afterwards it will be generalized to several u-components. Inference is based on likelihood procedures (REML and ML) and estimating equations derived from the expectation-maximization (EM) theory, more precisely the expectation/conditional maximization (ECM) algorithm recently introduced by Meng and Rubin (1993).

Statistical model
As usual, it is assumed that the population can be structured into strata (i = 1, 2, ....,1) corresponding to potential factors of heterogeneity. Let the one-way random model be written as: where y i is the (n 2 x 1) data vector for stratum i; j3 is a (p x 1) vector of unknown fixed effects with incidence matrix X i , and e i is the (n i x 1) vector of residuals. The contribution of random effects is expressed as in Foulley and Quaas (1995) as O&dquo;uiZiU' where u * is a (q x 1) vector of standardized deviations, Z i is the corresponding incidence matrix and a u , is the square root of the u-component of variance the value of which depends on stratum i. Classical assumptions are made for the distributions of u * and e i , ie, u * N N(0, A), e i N N(0, ae.In! ), and The notation in [1] is unusual as compared to that used in the statistical literature on mixed effects (eg, Laird et al, 1987). There are practical motivations for such an expression of the random part especially in animal breeding. For instance the between sire variance may vary according to the environment in which the progeny of the sires are raised. Note also that ( J Ui can be viewed as a regression coefficient of any element of y i on the corresponding element of Z i u * . Thus, in animal breeding, a u , acts as a scaling factor of a vector u * of standardized sire values on which, for instance, selection can be based. A structure is hypothesized on the residual variance so as to model the influence of factors causing heteroskedasticity. This is carried out along the lines presented in Foulley et al (1990) via a linear regression on log-variances: where 5 is an unknown (r x 1) real-valued vector of parameters and p' is the corresponding (1 x r) row incidence vector of qualitative or continuous covariates.
Furthermore, the assumption of a constant intra-class correlation (or heritability) implies setting

EM-REML estimation
Use is made here of the EM algorithm of Dempster et al (1977) to compute REML estimates of parameters involved in variance components (Patterson and Thompson, 1971;Searle et al, 1992). The basic procedure proposed by Foulley and Quaas (1995) is applied here after some adjustment of the M-step taking advantage of the ECM algorithm of Meng and Rubin (1993).
the ECM algorithm is based on a complete data set defined by x = (0', u * ', e')' and its log-likelihood L(y; x). The iterative process takes place as follows. The E-step is defined as usual, ie, at iteration [t], calculate the conditional expectation of L(y; x) given the data y and y = y!t! which, as shown in Foulley and Quaas (1995), reduces to where E!t] (.) is a condensed notation for a conditional expectation taken with respect to the distribution of x!y, y = -yf t l.
Since the parameters to be estimated are heterogeneous, the estimating equations are derived at the maximization stage from a slightly different version of the EM algorithm, the so-called ECM algorithm. As explained in detail in Meng and Rubin (1993), a CM stage replaces the M-step by a sequence of several conditional maximization steps. This is basically the same principle as that employed in a cyclic ascent maximization procedure (Zangwill, 1969). We suggest here the following procedure: -maximize Q over y to get 6 [tH] with T set at its last value T [t] , ie then, maximize Q over T to obtain T!'+'l with 5 in y of Q( 1 ' I 1'[ t ]) set to 5!!!, ie, Thus, the maximization step consists of two CM-steps within the same E-step in order to reduce the need to compute the conditional expectation of eie i , and its components more than once. The algebra of differentiation is given in Appendix A.
The iterative system for computing formulae 5 can be written as with the elements of the right-hand side being Note that for this algorithm to be a true ECM, one would have to iterate the NR algorithm in [7] within an inner cycle (index £) until convergence to the conditional maximizer y[ tH] = yl','] at each M-step [t]. In practice it may be advantageous to reduce the number of inner iterations, even up to only one, ie, by solving just once However, caution should be exercised when applying such a hybrid algorithm that no longer guarantees the monotonic convergence in likelihood values (Lange, 1995).
The formula to update T reduces to mimicking the form of a scaled regression coefficient pooled over strata.
The elements to compute at the E-step can be expressed as functions of the sums X'yi, Z'yi, the sums of squares yiyi within strata, and GLS-BLUP solutions of Henderson's mixed model equations and of their accuracy (Henderson, 1984) Quaas (1995). This means that the computations can be implemented with very little change in the code used previously. True or gradient EM could also have been applied (see Appendix A). The advantage of ECM will be more substantial for the next situations considered, and especially in the case of the indirect approach.
Extension to several u-components Formulae (7!, [8ab] and [9] can easily be generalized to a mixed model including several (k = 1, 2,..., K) independent u-components with Tk = a Uik /a ei constant over strata i. Letting y = (b', T ')' as previously but now with T = I Tk being a vector of ratios of standard deviations, the Q function to be maximized has the same form as in [4] with ei expressed from !13!. One can perform the CM-steps using either i) the sequence 6, 'r l , T 2 ....
I Tk , ---, T K , ie, each Tk one by one, the remaining ones being held constant, or ii) the sequence /5, and T as a whole with all the Tk s maximized jointly. In both cases, the algorithm for computing 5 is formally the same as in [7] with only a slight change in the definition of the elements of W bb , v b being unchanged If the conditional maximization of the T k s takes place one by one (case i), formula [9] still applies for each of them. Otherwise (case ii), one has to solve the following system:

An indirect approach
The original model with a constant T ratio specified in [1-3] can be viewed as a special case of a more general model with, as previously, fno, 2p §5, but also with a linear structure on log-ratios involving either the same (h i = p i ) or possibly different covariates. Letting y = (6', 71')' here, the sequence of the CM-steps are The algorithm for S is the same as in [7]. The algebra for A is shown in the Appendix, and leads to a system that can be written under a similar form as that of 6 1--J For practical reasons, one may also wish to limit the number of inner iterations (index £) even to only one in order to reduce the volume of computation but the application of this ECM gradient algorithm should be performed carefully. Further empirical simplifications for the elements of [22] can be proposed along the same lines as in Foulley et al (1990).
Again, these results can be extended to a model with several random independent factors (k = 1, 2,..., K) by setting Actually, if the CM-steps are performed for each vector 71 k separately, the same formulae as in [20], [21] and [22] apply: just replace Ti , Z i , u * by Ti , k , Zi k , uk and ML estimation It may be interesting in some instances to use ML rather than REML for estimating variance components (see Discussion). The ECM procedure developed in this paper can be easily adapted to obtain ML parameter estimates. 13 is now part of the parameter vector instead of being a vector of random effects with infinite variance included in missing data. The Q function to be maximized has the same formal expression as in [4] but here at the E-step, expectations have to be taken with respect to the distribution of u * given y, y = y!t!, and 13 = 13 [ ' I . Maximization with respect to 13 can be based on the equation <9Q/<9j3 = 0, ie One can proceed as previously, ie, run two CM-steps for the dispersion parameters based on the same E-step so as to obtain 6!t+ and T ] t+1 ] (or !ft+1]), and then perform an additional CM-step for computing ¡3 [t+l] based on !23!, ie l &dquo;&dquo; ' -!J Alternatively, it may be advantageous to perform the CM-step for j3 and the next E-step jointly by solving Henderson where (M §) ) k£ is the block pertaining to random factors k and in the inverse of the random part of the coefficient matrix.

Numerical example
The procedures presented in this paper are illustrated with a small data set obtained from simulation. Data were generated according to a cross-classified model having two (environmental) fixed factors (A = 2 levels; B = 3 levels) and one (genetic) random factor (S = 9 levels). The genetic contribution consists of sire and maternal grand sire effects, the latter being assumed to have half the value of the first one.
The model to generate the records was where p is a general mean, a i the effect of environmental factor A (i = 1, 2), b! the effect of environmental factor B (j = 1, 2, 3), s* the standardized contribution of male k as a sire, and 1/2se the standardized contribution of male as a maternal grand sire, and eZ!w&dquo;, the residual term.
Values chosen for the fixed effects were (using a full-rank parameterization): ¡. .t +al +b1 = 100; az-o n = 20; b2 -b , = -10; b3 -bl = -20. The vector s * = fs * kl } of sire effects is assumed to be N(0, A) with elements of the relationship matrix A shown at the bottom of table I.
There were 267 observations distributed among 18 different AB x sire x maternal grand sire subclasses. The data structure is displayed in table I as well as cell size (n), sum (£ y) and sum of squares (¿ y 2 ) in each suclass.
Tests of hypotheses about the location parameters {3, the residual dispersion parameters 5 and the ratios r ij were carried out via the likelihood ratio statistic as described in previous studies (Foulley et al, 1990San Cristobal et al, 1993;Meyer et al, 1993;Foulley and Quaas, 1995). Formulae by Quaas (1992) were used to compute maximized likelihood functions (Ln, aX ). Results can be arranged as an analysis of variance (or deviance) table: see  table II for hypothesis testing about {3, and table III for residual (b) and ratio (A) parameters. Note also that the test statistic for 13 relies on -2L n ,aX evaluated from the ML estimates of all parameters, whereas a maximized residual likelihood can be better employed for 5 and 7!.
Interaction effects on location parameters are constantly rejected under different assumptions for the other parameters. The hypothesis of residual variance homogeneity is strongly rejected as well as single factor descriptions of heterogenity. The assumption of a constant ratio T turns out to be a reasonable one. The test results eventually agree with the simulation model; they support the practical conclusion that the p + A + B model is the most appropriate to account for variation both in location and in log-residual variances, the ratio T being constant.
The estimation procedure for l5 and T (or J!) is illustrated in table IV for this model and an alternative one using both standard and residual maximum likelihood methods of estimation. ML and REML estimates of residual variances do not differ very much; on the contrary, the ML estimates of the ratio T turns out to be, as expected, lower than the REML ones, the values of the latter being close to the true value.

DISCUSSION AND CONCLUSION
The main purpose of this paper was to extend the general structural approach to heteroskedasticity in mixed models proposed by Foulley et al (1990Foulley et al ( , 1992 to the case of homogeneous ratios of u to e variance components. In a sire by environment interaction, this is equivalent to postulating homogeneous intra-class correlations or heritabilities. This seems to be a reasonable assumption in practice, or at least serves as a suitable compromise between the existence of heteroskedasticity and parsimony of models. Less restrictive assumptions might also be investigated (Quaas, 1995, pers comm). This paper also provides a generalization of LR tests of this assumption to unbalanced data and complex model structures: see the previous work of Visscher (1992) on a one-way random balanced design, and that by Robert et al (1995a,b) for heterogeneous variances due to a single classification. The EM algorithm turns out to be a convenient and powerful tool for solving variance component estimation problems. The ECM algorithm allows us to simplify the estimating equations, in particular the ECM gradient version. The advantage of this algorithm was especially clear here in the case of the indirect approach.
A few examples of this for the mixed model have been already mentioned (Meng and Rubin, 1993 example 1; Walker, 1996). It offers great flexibility in defining the sequence of the conditional maximization steps, all the alternatives of which have not been investigated here. In the case addressed in this paper, the basic statistics generated by the EM algorithm are strikingly natural (see Appendix B) thus giving a flavour of simplicity to the whole procedure.
It also makes it possible to switch from REML to ML or vice versa with very little change in implementing computations (Foulley et al, 1994). Some authors such as Leonard (1975), Denis (1983) and Anderson (1984) in statistics and Shaw (1987) Gianola et al, 1992;San Cristobal et al, 1993;Foulley and Quaas, 1995). In addition, as already pointed out by Denis et al (1996), a LR test about (3 requires 5 and T (or 71) being the same for the null hypothesis and its alternative; the same rule holds in hypothesis testing about 5 (or 71) by keeping the other parameters the same over the models to be compared. This is part of the general and difficult problem of joint modelling of means and variances, which is related to such issues as the Behrens-Fisher problem and multi-stage hypothesis testing, and which needs further consideration.
Another area that deserves caution and further development is that of estimability. Difficulties are expected when all the cells contributing to an element j of 5 or A (or to a linear combination of them) have a weight tending towards zero. This may arise due to i) purely overparameterization problems, or due to ii) parameter values becoming extreme (eg, ratios Ti tending to zero implying elements of 71 being infinite). This last phenomenon is similar to what happens in the analysis of binary and ordinal data with latent variable models (Misztal et al, 1989;Fahrmeir and Tutz, 1994). Such difficulties can be avoided by reparameterization (i), or by setting lower bounds to the diagonal elements of the system of equations to solve (ii).
Finally, asymptotic accuracy can also be worked out numerically within the EM framework via the so-called SEM algorithm (Meng and Rubin, 1991). Although it only requires the code of the complete data variance covariance matrix and of the EM or ECM outputs, the burden of calculations is then heavier, so that we suggest that it is restricted to the simplest models. Other computing alternatives should also be considered, eg, the average information-restricted maximum likelihood algorithm (AI-REML) of Gilmour et al (1995).