Marginal maximum likelihood estimation of variance components in Poisson mixed models using Laplacian integration

more commonly used expectation-maximization type algorithm for MML. Numerically, however, a different sequence of iterates is obtained, although the same variance component estimates should result. The Laplacian algorithm is precisely DFREML (derivative free REML) optimization when applied to normally distributed data, and could then be termed DFMML (derivative-free marginal maximum likelihood). Because DFMML is based on

Résumé -Estimation des composantes de variance par le maximum de vraisemblance marginale dans des modèles mixtes de Poisson à l'aide de la méthode d'intégration de Laplace. Un algorithme de calcul des estimées de composantes de variance par le maximum de vraisemblance marginale dans des modèles mixtes de Poisson est présenté. On utilise une approximation de Laplace pour éliminer par intégration les effets fixés et aléatoires de la densité conjointe a posteriori de tous les paramètres. Cette approximation se montre identique à celle à laquelle il est fait appel dans l'algorithme plus classique du type espérance-maximisation. Du point de vue numérique cependant, la séquence des valeurs obtenues par itération est différente, bien que les mêmes estimées de composantes doivent être obtenues. L'algorithme de Laplace est précisément l'optimisation de DFREML (maximum de vraisemblance restreinte sans dérivée) quand on l'applique à des données distribuées normalement, et pourrait dont être appelé DFMML (maximum de vraisemblance marginale sans dérivée). Parce que DFMML est basé sur une approximation de la vraisemblance marginale des composantes de la variance, il fournit un moyen de tester des hypothèses relatives à de telles composantes via des rapports de probabilités a posteriori ou des tests de rapport de vraisemblance. De plus, des valeurs asymptotiques a posteriori des composantes de variance peuvent être calculées au moyen de DFMML. Une procédure de Tierney-Kadane pour calculer la moyenne a posteriori d'une composante de variance est également présentée; elle requiert cependant 2 maximisations conjointes et, en conséquence, on ne doit pas s'attendre à ce qu'elle donne de bons résultats dans beaucoup de modèles linéaires et non linéaires. Un exemple de modèle de Poisson est donné, dans lequel on obtient les valeurs nulles habituellement trouvées quand on estime conjointement des composantes de variance avec des effets fixés et aléatoires; ainsi, la procédure de Tiernay-Kadane pour calculer la moyenne a posteriori échoue. En revanche, la méthode de Laplace réussit à localiser le mode de la distribution marginale des composantes de variance dans un modèle bayésien avec des a priori uniformes pour les effets fixés et les composantes de variance, ie l'estimée du maximum de vmisemblance marginale. composantes de variance / distribution de Poisson / modèle linéaires généralisé / maximum de vraisemblance marginale / intégration de Laplace INTRODUCTION Non-linear models for quantitative genetic analysis of categorically scored phenotypes have been developed in recent years (Gianola and Foulley, 1983;Harville and Mee, 1984). In these models, it is assumed that the observed polychotomies correspond to realizations of an underlying normal variate inside intervals of the real line that are delimited by fixed thresholds. The mathematical link between the underlying and the discrete scales is, thus, the probit function. Although threshold models have been used for analysis of different types of discrete data (eg, Meijering, 1985;Weller et al, 1988;Weller and Gianola, 1989;Manfredi et al, 1991) counted variates are probably better modelled using Poisson or related distributions such as the negative binomial distribution. Non-linear Poisson models for counted variates, eg litter size in swine and sheep, have been suggested by , and an application to prolificacy in the Iberian pig is given by P6rez-Enciso et al (1993).
The model of  requires knowledge of variance components, so these must be estimated somehow. Animal breeders have used restricted maximum likelihood (REML) to estimate genetic variances for a wide array of economically important traits. This is not entirely satisfactory for discrete characters because REML relies on the assumption of multivariate normality; the degree of robustness of this method to departures from normality has not been sufficiently studied. Further, unless there is a large amount of statistical information in the data about the variance parameters, the sampling performance of REML when applied to discrete traits may be unsatisfactory, as suggested by the simulation study of Tempelman and Gianola (1991 (1987) introduce the linear relationship where 9' = 91 u'l, and wi = [x', zi! is the ith row of the n x (p + q) incidence matrix W = [X, Z]. X and Z are known incidence matrices of dimensions n x p and n x q, respectively, that associate the location vectors PPX 1 and u9 x 1 to each observation. Under the Poisson model, the mean and variance of an observation, given 0, is equal to the Poisson parameter A i . Hence, the residual variance in this model is precisely Aj .
The vectors P and u are distinct in the following sense. Typically, the elements ofp pertain to levels of fixed effects such as herd, year and season, whereas those of the vector u pertain to &dquo;random&dquo; effects of the animals being recorded and of their known relatives. In a Bayesian context, a flat prior density is assigned to p and a multivariate normal prior distribution is assumed for u . If u is a vector of breeding values, Above, A is a matrix of additive relationships, and J fl is the additive genetic variance.
If the dispersion parameter J fl is unknown, it can be estimated from its marginal posterior distribution so as to provide a parametric empirical Bayes approach to joint estimation ofp and u. When the prior density assigned to U2 is flat, then the mode of the marginal posterior distribution of Jfl is identical to the maximum of the marginal likelihood of or2 . u The unknown parameters are thus!i, u, and ou . In animal breeding applications, often p + q > n. For example, in 'animal' models with a single observation per recorded individual, the dimension of u is often greater than the number of observations, that is, q > n. This leads to a highly parameterized model. When the elements of u are strongly intercorrelated, a potentially low degree of orthogonality can seriously slow down convergence of Monte Carlo Markov Chain methods, such as Gibbs sampling, as a means of estimating marginal densities, modes, or means (Smith, 1991). Under these conditions, approximating the marginal density of U2 u by Laplacian integration procedures may be attractive from a numerical point of view.

ESTIMATION OF THE VARIANCE COMPONENT FROM THE MODE OF ITS MARGINAL POSTERIOR DISTRIBUTION
We first assume that, conditionally on 8, the observations are independent, following a Poisson distribution as in [1]. Let id = [0 1 ' er 2 l , = [p', U ,,a2j' represent all parameters of interest. Assigning a flat prior to the variance component Q u and to p, such that the joint prior density of tl is proportional to that of u, we can write the log of the joint posterior density ofp, u, and or2 as where 7 r(u E&dquo;) is a multivariate normal density function. Further, Because !E&dquo;! = IAQu! = !A!(<r!)!, and A does not depend on the parameters, it follows that [5] is expressible as: The joint posterior density of the full parameter set can be written as: where p(0 ) !u, y) is the posterior density of 0, given that the variance components are known, and p(o,2 u y) is the marginal density of the variance parameter. Define: to be the mode of the joint posterior density of 0, given Q u, and Ignoring third and higher order terms, the asymptotic approximation is then made that which is also used by Foulley et al (1987) to obtain approximate MML. In order to compute [8a], these authors employ the Newton-Raphson algorithm, which can be shown to lead to the iteration: where [t] indicates iterate number, is a residual. Note that R-l v = {(Yi -Ài) / Ài} can be interpreted as a residual vector expressed in units of residual variance, or relative to the mean of the conditional distribution of the observations. It can also be shown that: Note that the solution to system (9J, which resembles Henderson's mixed model equations, and the negative Hessian [10) are both a function of or2 U.
Whe wish to find the mode of the marginal distribution of Jfl (or maximum of the marginal likelihood of the data) by recourse to Laplacian integration, as in Leonard (1982). Now: A second-order Taylor series expansion of the log joint posterior density about 0, at a fixed au gives: Employing [13] in [12] and letting p A (.) denote an approximate density, Using this in [11] and recalling that 9 ! 1 o,2, y is approximately normal, Taking logs of [15], and using [6], we note that apart from a constant, where A j = exp {wi9 a } is computed from the mode of the joint posterior density of p and u, given Q u. One can find the posterior marginal mode of Jfl by establishing a grid of points of {Qu, LA(Q! ! y)} and then interpolating with a second order polynomial as in Smith and Graser (1986).
It is interesting to note that if the data were normally distributed, the algorithm just described reduces to that suggested by Graser et al (1987), or DFREML (Meyer, 1989). Hence, Laplacian integration provides a generalization of DFREML to a class of non-linear models that could be termed DFMML.

VARIANCE COMPONENT ESTIMATION FROM THE POSTERIOR MEAN Theory
The posterior mean is an attractive point estimator; from a decision theory viewpoint, it can be shown to minimize expected posterior quadratic loss (Lee, 1989). The mean of the marginal posterior distribution of the variance component can be written as: where !2! is the space of the entire parameter vector. In this section we consider developments for computing posterior means presented by Tierney and Kadane (1986) and derived in detail by Cantet et al (1992); these are extensions of Laplacian procedures introduced by Leonard (1982).

Let:
The posterior mean of the variance component can then be represented as Note that the denominator assures that the joint posterior integrates to one when the integration constant in the joint density is ignored. Define: The negative joint Hessians above can be written as:  [19], the posterior mean is approximately This approximation has been deemed to be highly accurate. The errors of the approximations to the integrals in the denominator and the numerator in [19] are of order 0(n-1 ). This would also be the order of the error incurred when approximating the joint posterior by a normal distribution. However, the leading terms in the 2 errors are nearly identical and cancel when the ratio in [19] is taken (Tierney and Kadane, 1986), thereby leading to an error term that is proportional to 0(n-2 ).

Computational considerations
Consider the ratio of integrals in [19], and the maximize_rs, ! and 4 * in [20a] and [20b]. Computing the denominator entails evaluating L(!), which is equivalent to maximizing the joint log-posterior density in [6]. Likewise, computing the numerator would entail the same procedure, except that log(a!) is added to the log joint posterior. From [6], it follows that: Setting the first derivatives to zero leads to expressions: which would be used in conjunction with system [9] (evaluated at the 'current' values of Jfl ) to obtain the joint posterior mode. We obtained estimates of 0' u 2 equal to zero in several simulation tests of this algorithm, when applied to Poisson models. This implies that the joint density is maximum when J fl = 0. As noted by Lindley and Smith (1972), Harville (1977), Thompson (1980), and Gianola and Fernando (1986), joint maximization of a joint posterior density with respect to fixed and random effects and the variance components in a linear model, often leads to a sequence of iterates for the latter converging towards zero. Harville (1977) attributed the problem to 'severe dependencies' between u and Jfl (clearly, the conditional distribution of ulE,, depends on o, u 2). As noted by Gianola et al (1990), the problem also arises when searching for the mode of p(p, uly) or p(uly) where any 'dependency' would be eliminated by integration of Q u. In general, the problem does not occur when informative priors are employed for a!. H6schele et al (1987) also found that many of their variance component iterations were drifting towards zero when using a first order algorithm for maximizing the joint posterior density in threshold models.
It is instructive to contrast the log of the joint posterior density in [6], L(!), with the approximate marginal density of a 2, LA ( U2l y), in [16]. Apart from the constant terms, these 2 functions differ in that in [16], half the value of the log of the determinant of the negative Hessian matrix is subtracted. Ritter (1992) views this as an important 'width or variance adjustment' in the estimation of or from its marginal distribution; this supports the claim made by O'Hagan (1976) that marginal modes are better estimators than joint modes.
Because the Tierney and Kadane (1986) approximation to the posterior mean fails whenever <7! goes to zero in the joint maximization algorithm, alternative strategies must be sought. One possibility would be to evaluate the approximate marginal density of the variance component as in [16] and then compute the posterior mean by cubic spline fitting (deBoor, 1978) or by Gaussian quadrature involving 'strategic' evaluation points of o,2 U.

NUMERICAL EXAMPLE
Data on embryo yields within a nucleus scheme were simulated with a Poisson animal model according to procedures given in Tempelman and Gianola (1991). The underlying mean on the log scale was log(4). Two 'fixed' factors, one with 5 levels and the other with 20 levels were generated from a N(0, 0.10) distribution on the canonical log scale. Additive genetic effects were generated from a N(0, 0.05) distribution for a base population of 16 sires and 128 cows. Cows were superovulated and mated at random to outside sires also drawn at random from the population at large. The numbers of embryos produced per cow was a drawing from the Poisson distribution, with the value of its parameter depending on the fixed effects and the additive genetic value of the female in question. Sex ratios in the embryos was 50: 50, and sexes were assigned at random, using the binomial distribution. Male embryos were discarded, and the genetic value of female embryos was obtained as: where as is the breeding value of an outside sire, a D is the breeding value of the donor cow, and z o -NiiD(0,1). The female embryos were 'raised' (probability of survival to an embryo collection was 0.70), and mated at random to nucleus sires, to produce a new generation. Records on embryo yields obtained from these matings were simulated as before. Thus, information on embryo yields was available on foundation cows and their surviving female progeny. The simulation involved a 'natural selection' process because donor cows without embryos recovered left no progeny at all, whereas donor cows with higher embryo yields left more female progeny.
In the simulation, p = 24 (25 levels of fixed effects minus 1 dependency) and q = 242 (16 sires, 128 dams and 98 surviving progeny). The mode of the approximate marginal density of or was located employing [16]. An iterative quadratic fit led to o,2= 0.0347 as maximum, and the approximate log marginal density is depicted in figure 1, with a cubic spline fitted through the iterates. The EM-type algorithm  gave a modal value of or = 0.0343.
Convergence criteria of the 2 algorithms were not directly comparable, thereby contributing to some of the discrepancy between the 2 estimates.
As noted previously, the Tierney and Kadane (1986) approach gave Jfl = 0 as the value of the variance component that maximized the joint density. To illustrate, the log joint density [6] was evaluated at the 'current' value of or2 (and of the resulting solution to system (9)) during iteration and the plot is shown in figure 2.
Clearly, u2 = 0 would be the maximizer, giving a density value of plus infinity. The degeneracy of this log density highlights the importance of the Hessian adjustment in [16].

DISCUSSION
The Laplacian procedure for finding the mode of the marginal posterior distribution of a single variance component in a Poisson mixed model was found to be analogous to 'derivative-free' methods employed for computing REML estimates of variance components in a mixed linear model (Smith and Graser, 1986;Graser et al, 1987).
In fact, if Laplacian marginalization is applied to variance estimation with normally distributed data in a Bayesian model with flat priors for fixed effects and variance components, this would yield precisely derivative-free REML. This is because the Laplacian integration is then exact. Although a single variance component was considered in this paper, the algorithm generalizes in a straightforward manner to a Poisson model with several variances, and one obtains MML estimates of variance components. Because of the analogy noted above, we suggest DFMML (derivativefree marginal maximum likelihood) as a generic term for this algorithm, since the procedure extends beyond the class of mixed linear models.
The Laplacian technique used for finding the mode of the marginal posterior distribution of Q u is theoretically, although not numerically, equivalent to the EMtype algorithm suggested by . In order to obtain the mode of the marginal distribution of Q u these authors employ the relationship: verified in the numerical application discussed previously. Application of the Laplacian procedure to the threshold model of Gianola and Foulley (1983) and Harville and Mee (1984) is straightforward. One would simply replace the log-likelihood and the conditional Hessian given in this paper by the corresponding terms in,the threshold model. For multidimensional problems, ie, more than one variance parameter, procedures suggested for DFREML by Meyer (1989) such as the simplex or quasi-Newton algorithms could be used. Optimization procedures that incorporate information on the vector of first derivatives and on the function to be optimized would be expected to be most useful.
Approximate posterior standard errors for the variance components may be computed using a quadratic fit of the log posterior density near the mode as in Graser and Smith (1987). The computation of the log-posterior densities allows also to construct marginal likelihood ratio tests, or posterior odds ratios, for assessing the importance of different sources of genetic and environmental variation. Harville (1977) asserted that the posterior mode is an attractive estimator, being less sensitive than the mean to the tails of the posterior density. However, under a squared-error loss, the posterior mean is optimum. Unfortunately, the Laplacian procedure of Tierney and Kadane (1986) for computing the posterior mean would be expected to fail in many instances. Joint maximization should work well when there is a large amount of information on each random effect, eg sire models, but not in animal models. Hence, alternative numerical procedures should be sought for computing posterior means. Further enhancements to marginal estimation of parameters involving Laplacian integration are given by Kass and Steffey (1989) and Leonard et al (1989).