A Monte-Carlo algorithm for maximum likelihood estimation of variance components

Un nouvel algorithme pour resoudre le maximum de vraisemblance de composantes de variance est presente. Cet algorithme traite d'abord les effets aleatoires comme des effets fixes, puis exprime ces pseudo-effets fixes sous la forme de transformations lineaires d'un ensemble de variables normales centrees reduites. Celles-ci sont ensuite eliminees par integration a l'aide d'un processus numerique de Monte-Carlo. Un algorithme iteratif est employe pour estimer l'ecart type (et non la variance) des effets aleatoires. Cette methode est simple conceptuellement et facile a programmer parce que des inversions de la matrice de variance-covariance des donnees repetees a chaque iteration ne sont plus necessaires. La methode peut etre utile pour traiter de grands ensembles de donnees et des donnees qui ne sont pas distribuees normalement.


INTRODUCTION
Estimates of variances and covariances have been used extensively in animal breeding (Henderson, 1986). Recently, much attention has been paid to natural populations (Guo and Thompson, 1991). A thorough knowledge of genetic variances and covariances is useful in determining genetic variability of a population, interpreting the genetic mechanism of quantitative traits and estimating heritabilities and genetic correlation of quantitative traits. Those genetic parameters are necessary in planning breeding programs, constructing selection indices, estimating breeding values of candidate breeders and predicting selection responses (Henderson, 1986).
Various methods for estimation of variance components have been developed. A general review can be found in Henderson (1984) and Searle (1989). Among these, the maximum likelihood (ML) of Hartley and Rao (1967) and restricted maximum likelihood (REML) of Patterson and Thompson (1971) are the most popular methods. With the ML-related methods, large computer resources and CPU times are required due to repeatedly inverting the variance-covariance matrix of the data.
The derivative-free algorithm for REML (DF-REML) has been suggested by Graser et al (1987) where matrix inversion is not required, rather, Gaussian elimination is used (Smith and Graser, 1986). With the advent of efficient computer programs for ML and REML estimation of variance components such as the DF REML program of Meyer (1988), large data set (N > 100 000) can be handled by utilizing the sparse matrix technique (see Misztal, 1994;Kriese et al, 1994 for up-to-date REML programs).
Recently, Guo and Thompson (1991) developed a Monte-Carlo expectationmaximization (EM) method for variance component estimation which uses jointly the EM algorithm (Dempster et al, 1977) and the Gibbs sampler (Geman and Geman, 1984). Their method has avoided repeated inversion of the variancecovariance matrix.
An alternative algorithm for solving ML and REML solutions, similar to Guo and Thompson (1991), is reported in this paper. The new method first treats random effects as fixed and then integrates out these pseudo-fixed effects via Monte-Carlo simulation. In the case where data are normally distributed, there is an explicit form of the multiple integral, but the explicit form involves inverse of the variance-covariance matrix. Instead of using the explicit form of the integral, the integration is carried out via Monte-Carlo simulation. With this algorithm, the standard deviation, rather than the variance, of the random effects is estimated using an iterative algorithm similar to the EM algorithm (Dempster et al, 1977). This method does not require updating the variance-covariance matrix, thus may need less computer memory than other methods. As a result, it may potentially handle large dimensional data sets. For data that are not normally distributed, an explicit form of the multiple integral is not available, thus the Monte-Carlo method may be the only appropriate way to solve the problem. In this paper application of the Monte-Carlo method to normal data for ML (or REML) estimation is reported.
The result serves as a necessary step approaching the application the the new method to non-normal data.

THEORY AND METHODS
The mixed model We will use the simplest mixed model with one class of random effects to demonstrate the new algorithm. The model is shown below: where y n x 1 vector of observations or data, b p x 1 vector of fixed effects, u q x 1 vector of random effects with N(0, IQu), e n x 1 vector of residuals with N(0, I Q e ), X n x p known incidence matrix for the fixed effects with a rank r, Z n x q known incidence matrix for the random effects, usually with full column rank.
It is assumed that E(yle) = Xb and Var(ylO) = V = ZZ T U2 -! Iu d , where e = [bo, 2 a2] T denotes the unknown parameters and the superscript T stands for matrix transposition. In the genral framework of animal breeding, u may represent sire effects. If every progeny within each sire has a different dam from each other, the variance among sires, Q u, will account for a quarter of the additive genetic variance and ae will contain three-quarters of the additive genetic variance plus the variance solely due to environmental effects (deviation of phenotype from genotype).
The likelihood function of the mixed model is proportional to Note that this likelihood function involves inverting the variance-covariance matrix of the data (V-1 ).
Conditional on the random effects (u), equation [1] is a fixed-effect model so that E(y!eu) = Xb + Zu and Var(y!eu) = lor,2,. Furthermore, u can be obtained by linear transformation of a set of standard normal deviates, ie, u = so u , where s -N QX 1 (0, I). Thus, conditional on s, the fixed model is reformulated as With such a fixed model, the conditional likelihood function is proportional to Clearly, matrix inversion is not involved here. However, s is a vector of random variables which must be integrated out to obtain unconditional estimates of e = [b,0 u and u,2]. Note that ufl in 0 is now replaced by Q &dquo;.

The marginal likelihood function
This likelihood function has the form whose explicit form [5] is equivalent to [2]. The reason to reformulate [2] as [5] is that equation [5] can be approximated by Monte-Carlo simulation, as suggested by Fahrmeir and Tutz (1994). Because the distribution of s is completely known (standard normal), we can generate M sets of random normal deviates and denote the ith set by s i . Then the marginal likelihood function can be approximated by where f (y!0si) is given in [4] with s substituted by s i , and it is an exponential function not the logarithm. Hereafter, M is called the length of the Monte-Carlo simulation. We can see that as Moo,<y(y!8) &mdash;! f(ylO). Therefore, maximum likelihood estimators can be obtained by maximizing g(y!0) instead of f(ylo) provided that M is large. More importantly, when <7(y!0) is used, the maximum likelihood solution of the parameters can be easily solved via an iteratively reweighted least squares scheme.
The iterative algorithm When g(yIO) is used, the vector of parameters becomes 0 = [b T 0' u 0' e IIT namely <7u is estimated. As usual, the maximum likelihood solution of 0 is found by maximizing Let us first define the following partial derivatives: The maximum likelihood estimators (MLE S ) are obtained by setting Dropping the constant -1/2 Q e in the above equations and defining we have the following ML equations: Unfortunately, the weight p i is a function of the unknown parameters, therefore, an iterative scheme must be employed. We can see that the iterative scheme is essentially an iteratively reweighted least squares approach. The iteration takes the following steps: Step 1: set up initials for b, o u and Q e; Step 2: evaluate p i ; Step 3: solve for b, Q &dquo; and Q e using [7] and [8]; Step 4: update b, Q &dquo; and ae which completes one cycle of iteration; Step 5: repeat Steps 2-4 until convergence.
The MLE of u2 is simply the square of the MLE of a u due to the invariance property of the ML method (DeGroot, 1986). The iterative algorithm does not require inversion of matrix V; rather, it only requires storing the following quantities: These quantities do not involve the unknown parameters, and thus do not need to be updated; rather, they can be calculated after the random normal deviates are generated and before the iteration is invoked. In addition, if the starting value of u u is positive, the solution to u u remains positive at each round of iteration. If O'u starts at a negative value, it remains negative subsequently, but its square is still a valid ML estimate of Q u.
Note that p i is the posterior probability density whose denominator is constant across i, and thus can be dropped without altering the solutions. For computational conveneince, p i is redefined as p i = f(yl as j ) hereafter. Furthermore, with a large data set, p i will be very close to zero and may cause the method to fail due to numerical problems. This can be circumvented by multiplying p i by the exponential of a very large positive number whose magnitude is comparable to A candidate of such a number may be where or2 e (O) ) b( o ) and < 7u ( o ) are chosen such that they are close to the true parametric values, and s( o) is an arbitrary Monte-Carlo realization of vector s. The modified p i has the following form: It should be warned that the value of c stays the same throughout the iterations and should not be updated.

The REML
The Monte-Carlo algorithm also works for the REML estimation of variance components. As in the ML analysis described earlier, we first treat the random effects as pseudo-fixed effects, then express the model by a linear transformation of the original data, ie, where s is a q x 1 vector of standard random normal deviates generated via Monte-Carlo simulation, K is a known matrix with n rows and n -r columns. Matrix K is chosen such that K T X = 0 (Patterson and Thompson, 1971). One such a choice is given by Harville (1977) as any n -r independent columns of matrix I -X(X T X)X T . After the linear transformation, the model does not depend on the fixed effects, b. Thus, the vector of unknown parameters becomes 0 = [uuud]!. Conditional on s, the expectation and variance of K T y are E(KT ylÐ s) = K!Zso-u and Var(K T yIÐs) = KTK!e, respectively, Thus, the conditional likelihood function is proportional to The marginal likelihood function is similarly approximated via Monte-Carlo simulation and has the following form where f (KTy!9 Si ) is given by [11] with s replaced by s i . Setting partial derivatives of L = log!g(KTy!9)! with respect to 0 equal to zero, we have where is again the posterior probability density. Iteration is required because the weight p i is a function of the unknown parameters. Note that the quantities to be stored are {sTZTK(KTK)-lKTy!Mx1 and f sTZTK(KTK)-lKTZsi!Mx1 which do not involve the unknown parameters, hence updating is not necessary.

The animal model
Under an individual animal model, the genetic value of each animal is included. It has the form where u is an n x 1 vector of additive genetic values for all animals. There is no Z matrix here and u are correlated through the additive relationship matrix (matrix A), namely, Var(u) = Aufl and ufl is the additive genetic variance. Let Z = A1!2, the lower Choleskey decomposition of A. Now the fixed model version of [15] becomes This equation is exactly the same as [3] except that Z is an n x n lower triangular matrix and s is an n x 1 vector of standard normal deviates generated via Monte-Carlo simulation. Therefore, the Monte-Carlo algorithm works equally well with an animal model. Note that Var(Zso u ) = ZVar(s)Z T Qu = ZZ T Q u = A ufl because ZZ T = A and Var(s) = I. The likelihood function for an animal model can be formulated either as [6] for ML estimation or as [12] for REML estimation. With an animal model, the typical size of a data set could be very large, making factorization of A into Z very expensive. In addition, matrix A is dense and storing it may not be feasible. Here, we introduce a new algorithm that can avoid these difficulties.
Let u = aau, then y is equation [15] can be remodeled as where a is an n x 1 vector with N(0, A) distribution. The vector a can be simulated as follows: for founder j, a j is sampled from an N(0,1) distribution; for non-founder j, a j = (a m + a f )/2 + 'Yj , where m and f are the parents of j and -y j is sampled from an N(0, 1/2) distribution. We have now avoided the use of matrix factorization by directly generating vector a (instead of s) and subsequently replacing Zs by a.

AN ILLUSTRATION
To demonstrate the Monte-Carlo algorithm, data originally used by Cunningham and Henderson (1968) were reanalyzed. There were 18 observations classified into two treatments (fixed) and three blocks (random). No correlation between blocks was assumed. The data was described by model [1]. The estimates of ( 7 ; and or 2 were given by Patterson and Thompson (1971), which are ML: Q e = 2.3518 and a2 = 2.5051; REML: % d = 2.5185 and a2 = 3.9585. The length of the Monte-Carlo algorithm varied from 10 to 31623, which have a log io scale ranging from 1.0 to 4.5 incremented by 0.5. The experiment was replicated 20 times. Results of the ML estimates are plotted against the length in log lo scale as shown in figures 1 and 2. When log io (M) = 3.5, the MLE of (7! stabilized at the true maximum likelihood estimate (fig 1), while the estimate of or took only log lo (M) = 3.0 to stabilize (fig 2). From these two figures we also observed that when log lo (M) < 3.0 the Monte-Carlo estimation tends to be biased (downward).
The standard deviation among the 20 replicates are plotted against the length as shown in figure 3. Similar results were also observed for the REML estimates (see figs 4-6). Downward baseness may be a general property of the Monte-Carlo method for small M because biases in both ML and REML estimates are downward. Further investigation is necessary to quantify the expected bias for small M. In conclusion, the Monte-Carlo algorithm does converge to the true ML or REML estimates for both (7! and u 2 e. Estimate of ( 7 ; converges more quickly than that of (7!. For this particular data set, a length of M = 1000-5 000 seemed to be sufficient. M ! 5 000 may also serve as a guideline for other data sets (see Discussion).

DISCUSSION
The major advantages of the Monte-Carlo algorithm presented in this paper rely on its conceptual simplicity and easy programming. There are two strategies to implement the Monte-Carlo algorithm. One is to generate the random numbers before invoking the iteration. In ML analysis, this requires storing the following quantities: {s;Z T ZS í } MX1 , { S ;Z T Y } MX1 and {s!Z!X}Mxt-The number of storage units required is jk + 2) x M. In REML analysis, the quantities to be stored are {s;Z T K(K T K)-l K Y } MX1 and {s;ZTK(KTK)-lKTZsdMX1' where the number of storage units is 2M. Although REML needs less computer memory than ML, additional CPU time is required for evaluating K(K T K)-1 K T . Once these quantities are generated, however, ML takes more CPU time than REML for iteration because ML needs repeated evaluation of equation [7] which is not trivial for a large number of fixed effects. The second strategy is to generate the random numbers as needed using a reseed with the same number strategy in the pseudo-random number generator. This would have a substantial CPU load but does not require storing those quantities needed by the first strategy. In either case, generating the quantities such as sTZ T K(K T K)-l K T Zs i must resort to special techniques for large data sets (eg, Perez-Enciso et al, 1994). When applied to large data sets, the Monte-Calro algorithm is powerful for ML analysis, but inefficient for REML analysis because it involves matrix K(K T K)-1 K T which looks formidable. The K matrix may be avoided by absortion of fixed effects using Guassian elimination, but it is still expensive for more than one fixed effect. Further investigation is necessary to implement this algorithm to REML analysis.
Using the first strategy of Monte-Carlo simulation, matrix factorization and inversion (the latter is required in REML) are done only once, which makes the first strategy more desirable than the second. For a reasonable number of levels of fixed effects (eg, k = 10), the number of storage units required is solely determined by M. For example, if M = 5 000, the number of storage units will be (10 + 2) x 5 000 for ML and 2 x 5 000 for REML. Such a number of storage units may be easily handled by standard PCs. The question becomes whether M = 5 000 is sufficient or not. The example previously described shows that M = 5 000 is sufficient. This number may also serve as a general guideline because it does not depend on the size of the data set. Fahrmeir and Tutz (1994) state that the computing load of Monte-Carlo multiple integration increases linearly with the dimension of the integral.
Here, the computing load is not M but M x q (the total number of random normal deviates generated), where q is the dimension of u. We can see that M x q is already proportional to q. Thus, we will not expect M to increase with q. We have demonstrated the Monte-Carlo algorithm under the situation of one class of random effects. Extension to more classes of random effects is straightforward. Suppose that the mixed model has the form where W is a known matrix and v is another vector of random effects with v N N(O,lov2). Here we assume Cov(u, v T ) = 0. Conditional on two independent sets of random normal deviates, s and z, the fixed model equivalence of the above model is To obtain the marginal likelihood function, one simply generates M sets of s i and z i to approximate the multiple integral. A similar algorithm can be employed to simultaneously estimate b, Q &dquo;, Q ,, and ae 2 The Monte-Carlo algorithm presented in this paper should be considered as an alternative method for estimation of variance components. Similar to the Monte-Carlo method of Guo and Thompson (1991), the method proposed here is ineffective for estimating the likelihood under mixed inheritance. In addition, large pedigrees (say 1000 members) would be needed to demonstrate its effectiveness under polygenic inheritance. Furthermore, with a normal distribution, the new method perhaps does not offer too much advantage over the existing algorithms that use the sparse matrix technique. Therefore, by no means should the Monte-Carlo method replace those conventional algorithms. For data that are not normally distributed, however, the Monte-Carlo algorithms is perhaps the only convenient way to solve the problem. Many economically important traits of animal species are not normally distributed, such as binary disease resistant traits or ordinal categorical traits. When analyzing such traits, one usually assumes that there is a continuous latent variable (liability) controlling the phenotype. The link between the liability and the discrete phenotype may be described by Wright's (1934) physiological threshold model. To estimate the genetic variance of the liability and the thresholds, one can easily establish a fixed model equivalence of the likelihood function similar to equation !4!. The random effects are then integrated out to obtain an integrated likelihood function. Certainly, there is no explicit form for the multiple integral and Monte-Carlo method may be the only appropriate way for such a large dimensional multiple integration. Results from this research (normal data) serve as a necessary first step approching to application of the new method to non-normal data.
Finally, it is necessary to clarify a potential confusion between this method and the Gibbs samplers of Guo and Thompson (1991) and Wang et al (1993).
Each method involves maximizing integrated likelihoods and the integration is numerically approximated by drawing pseudo-random deviates from presumably known conditional distributions. However, the Monte-Carlo EM method of Guo and Thompson (1991) generates the joint posterior distribution of the random effects using the Gibbs sampling. This method generates the same estimates of variance components as the usual ML method. The Gibbs sampler of Wang et al (1993) generates the marginal posterior distribution of each variance component.
This method is a Bayesian approach in that it requires prior distributions of the unknown parameters and then integrates out all other parameters except the one of interest. Each parameter is then estimated by maximizing its own marginal likelihood function. The Gibbs sampler of Wang et al (1993) does not produce new estimators but the Bayesian estimates of the variance components. The Monte-Carlo algorithm presented in this paper first treats the random effects as fixed and then generates the marginal distribution of the data via Monte-Carlo sampling.
Similar to the derivative-free algorithm for REML (Graser et al, 1987), our method is only an alternative algorithm to obtain the ML and REML estimates. It does not generate new estimates of variance components, provided M is sufficiently large.