A marginal quasi-likelihood approach to the analysis of Poisson variables with generalized linear mixed models

Summary - This paper extends to Poisson variables the approach of Gilmour, Anderson and Rae (1985) for estimating fixed effects by maximum quasi-likelihood in the analysis of threshold discrete data with a generalized linear mixed model. discrete variable / Poisson distribution / generalized linear mixed model / quasi-likelihood Résumé - Une approche de quasi-vraisemblance pour l’analyse de variables de Poisson en modèle linéaire mixte généralisé. Cet article généralise à des variables de Poisson l’approche de Gilmour, Anderson et Rae (1985! destinée à l’estimation par maximum de quasi-vraisemblance des ef,!’ets fi,!és lors de l’analyse de variables discrètes à seuils sous un modèle mixte.


INTRODUCTION
As shown by Ducrocq (1990), there has been recently some interest in non linear statistical procedures of genetic evaluation. Examples of such modelling procedures involve: 1) the threshold liability model for categorical data (Gianola and Foulley, 1983; Harville and Mee, 1984) and for ranking data in competitions (Tavernier, 1991); 2) Cox's proportional hazard model for survival data (Ducrocq et al, 1988); and 3) a Poisson model for reproductive traits (Foulley et al', 1987). In FGI, estimation of fixed (II) and random (u) effects involved in the model is based on the mode of the joint posterior distribution of those parameters. As discussed by Foulley and Manfredi (1991), this procedure is likely to have some drawbacks regarding estimation of fixed effects due to the lack of integration of random effects. A popular alternative is the quasi-likelihood approach (Mc Cullagh and Nelder, 1989) for generalized linear models (GLM) which only requires the specification of the mean and variance of the distribution of data. This procedure has been used extensively by Gilmour et al b (1985) in genetic evaluation for threshold traits.
In particular, GAR derived a very appealing algorithm for computing estimates of fixed effects which resembles the so-called mixed model equations of Henderson (1984). The purpose of this note is to show how the GAR procedure can be extended to Poisson variables.

THEORY
The same model as in FGI is postulated. Let Y k be the random variable (with realized value y! = 0,1,2...) pertaining to the kth observation (k = 1,2,..., K). Given A k , the Y!s have independent Poisson distributions with parameter A k , ie: As the canonical link for the Poisson distribution is the logarithm (Me Cullagh and Nelder, 1989), A k is modelled as: where p and u are (p x 1 ) and (q x 1 ) vectors of fixed and random effects respectively, and x' and z' are the corresponding row incidence vectors, the parameterization in P being assumed to be of full rank.
Notice that [2] is an extension to mixed models of the structure of &dquo;linear predictors&dquo; originally restricted to fixed effects in the GLM theory: see eg Breslow and Clayton (1992) and Zeger et al (1988) for more detail about the so-called general linear mixed models (GLMM). Moreover, it will be assumed as in other studies (Hinde, 1982;Im, 1982;Foulley et al, 1987) that u has a multivariate normal distribution N(0, G) with mean zero and variance covariance matrix G, thus resulting in what is called the Poisson-lognormal distribution (Reid, 1981;Aitchinson and Ho, 1989). FGI showed that the mixed model structure in [2] can cope with most modelling situations arising in animal breeding such as, eg, sire, sire and maternal grand sire, and animal models on one hand, and direct and maternal effects on the other hand.
A simple example of that is the classical animal model In (a2! ) = x!. p + ai +pi, for the jth performance of the ith female (eg ovulation rate of an ewe) as a function of a From now on referred to as FGI (Foulley, Gianola and Im) ; b from now on referred to as GAR (Gilmour, Anderson and Rae).
the usual fixed effects (eg herd x year, parity), the additive genetic value a i and a permanent environmental component p i for female i.
According to the GLM theory, the quasi-likelihood estimating equations for 13 are obtained by differentiating the log quasi-likelihood function Q(j; y, G) (G being assumed known), with respect to (3, and equating the corresponding quasi-score function to zero, vix. As clearly shown by the expression in [3], the quasi-likelihood approach only requires the specification of the marginal mean vector 11 and of the variance covariance matrix V of the vector Y of observations. Given the moment generating function of the multivariate normal distribution ie E(exp (t'Y) = exp [t'1I + (t'Vt/2)j, it can be shown that: (Hinde, 1982;Zeger et al, 1988) and (Aitchinson and Ho, 1989), 8!l being the Kronecker delta, equal to 1 if k = l, and 0 otherwise. Generally models used in animal breeding yield, in the absence of inbreeding, homogeneous variances so that for any k, a 2 = zkGz! = a 2 , and In ( ILk ) = x!!-1-(a 2 /2). Moreover, letting L!Kxx) _ {exp(z!Gzl) -1! and M lKx x) = Diag 1,U kl , variances and covariances of observations defined in [9] can be expressed in matrix [t] being the round of iteration.
As in GAR, one may consider to approximate V. This can be accomplished here using a first order Taylor expansion of exp (z!Gz1) around G = 0, ie replace exp (z!Gz1) -1 1 in L, by z!Gzl' This approximation is likely to be realistic as long as the u-part of variation remains small enough in the total variation. Doing so, V in [10] becomes V = M + MZGZM, with Z!Kxql = ( Zl , z 2 , ... , z! , ... , Zk )' being the overal incidence matrix of u. Putting this formula into the inverse of W in !12!, one has This formula exhibits the classical form (R + ZGZ' in the usual notation) of a variance cori_ance matrix of data described by a linear mixed model; this allows us to solve for? in [11] using the mixed model equations of Henderson (Henderson, 1984), ie here with: or, alternatively, defining The similarity between [16] and the formula given by FGI should be noted.
Actually, here tlk = Eu (.),k) replaces A k , thus indicating the way random effects are integrated out in the GAR procedure. It should be kept in mind that the main advantage of [15] is to provide estimates ofp which can be computed in a similar way as with mixed model equations of Henderson (1984). These equations also imply as a by-product an estimate of u which, as pointed out by Knuiman and Laird (1990) about the GAR system of equations, &dquo;has no apparent justification&dquo;.

DISCUSSION
The procedure assumes G known. Arguing from the mixed model structure of equations in !15J, GAR have proposed an intuitive method for estimating G which mimics classical EM type-formulae for linear models. FGI advocated approximate marginal likelihood procedures based on the ingredients of their iterative system in # and u. Actually, applying such procedures would mean to use a third level of approximation; the first one was resorted to quasi-likelihood procedures and the second one to the use of !15J instead of !11J. Alternatively, pure maximum likelihood approaches based on the EM algorithm were also envisaged by Hinde (1982), viewing u as missing and using Gaussian quadratures to perform the numerical integration of the random effects. More details about methods for estimating variance components in such non-linear models can be found eg in Ducrocq (1990), Knuiman and Laird (1990), Smith (1990), Thompson (1990), Breslow and Clayton (1992); Solomon and Cox (1992) and Tempelman and Gianola (1993).
It must be kept in mind that the mixed model structure in [2] applies to a large variety of situations. In particular, it can be used to remove extra-Poisson variation when the fit due to identified explanatory variables remains poor. In such cases, some authors ( eg Hinde, 1982;Breslow, 1984) have suggested to improve the fit by introducing an extra variable into the random component part of [2] ie by modelling the Poisson mean as In (A k ) = xk13 + z!.u + e k . This procedure can be applied eg to a sire model so as to fit the fraction (3/4) of the genetic variance that is not explicitly accounted for in the model. Finally, our approach can also be used for partitioning the observed phenotypic variance (Q!) into its genetic (a9) and residual (or 2 ) components. Let us assume that the trait is determined by a purely additive genetic model on the transformed scale, ie, In (A) + a where, as in !2J, q is the location parameter, and a -N(0, O ra 2) is the genetic value normally distributed with mean zero and variance a a. 2 Following Falconer (1981), the genetic value (g) on the observed scale can be defined as the mean phenotypic value of individuals having the same genotype, ie, g = E(Yla). The additive genetic variance on the observed scale ( Q9* ) can be defined as 0!2 ! (E(8g/ 8 a)) 2 Q a (see Dempster and Lerner, 1950;p 222), or alternatively as 09* _ [Cov (g, a)J2 ja! (see Robertson, 1950; formula 1, p 234). Now, E(agloa) = J i, and Cov (g, a) = poa2. Both formulae give the same result, ie with a heritability coefficient in the narrow sense (h 2 = Q9*/a!) equal to Notice that, for ufl small enough, Q9 -! a9 * so that [20] and [22] tend to 0,.2 / (or + J-l-1), which can be viewed as the expression of heritability on the linear scale, as anticipated by FGI and expected from the expression of the system in [15]. CONCLUSION Although some other more sophisticated procedures ( eg Bayesian treatment with Gibbs sampling; Zeger and Karim, 1991) can be envisaged to make inference about GLMM parameters, it has been shown that methods based on the quasi-likelihood or related concepts are reasonably accurate for many practical situations (Breslow and Clayton, 1992).