A quasi-score approach to the analysis of ordered categorical data via a mixed heteroskedastic threshold model

- This article presents an extension of the methodology developed by Gilmour et al. [19], for ordered categorical data, taking into account the heterogeneity of residual variances of latent variables. Heterogeneity of residual variances is described via a structural linear model on log-variances. This method involves ’ two main steps: i) a ’marginalization’ with respect to the random effects leading to quasi-score estimators; ii) an approximation of the variance-covariance matrix of the observations which leads to an analogue of the Henderson mixed model equations for continuous Gaussian data. This methodology is illustrated by a numerical example of footshape in sheep. &copy; Inra/Elsevier, Paris generalized linear mixed models / quasi-score / heterogeneity of variances / threshold response model

geneity of residual variances of latent variables. Heterogeneity of residual variances is described via a structural linear model on log-variances. This method involves ' two main steps: i) a 'marginalization' with respect to the random effects leading to quasi-score estimators; ii) an approximation of the variance-covariance matrix of the observations which leads to an analogue of the Henderson mixed model equations for continuous Gaussian data. This methodology is illustrated by a numerical example of footshape in sheep. &copy; Inra/Elsevier, Paris generalized linear mixed models / quasi-score / heterogeneity of variances / threshold response model Résumé -Une approche de quasi-score pour l'analyse de variables qualitatives ordonnées par un modèle mixte à seuils hétéroscédastique. Cet article présente une extension de la méthodologie développée par Gilmour et al. !19! dans le cas de variables qualitatives ordonnées, prenant en compte l'hétérogénéité des variances résiduelles des variables latentes. L'hétérogénéité des variances résiduelles est décrite par un modèle linéaire structurel sur les logarithmes des variances. Cette méthode comprend deux étapes principales : i) une « marginalisation » par rapport aux effets aléatoires qui conduit, grâce aux équations de quasi-score, à l'estimation des paramètres ; ii) une approximation de la matrice de variance-covariance des observations qui aboutit à un système analogue aux équations du modèle mixte d'Henderson dans le cas de variables continues gaussiennnes. Cette méthodologie est illustrée par un exemple sur la forme des pieds chez le mouton. @ Inra/Elsevier, Paris modèles linéaires généralisés mixtes / quasi-score / variances hétérogènes / modèle à seuils

INTRODUCTION
The threshold model is one of the most popular models for analysing ordered categorical data especially in population [36,37] and quantitative [7] genetics as well as in animal breeding !16).
Recently Foulley and Gianola [8] extended the standard threshold model to a model allowing for heterogeneous variances of the Gaussian latent variables using a log-linear model for the residual variances. In the case of mixed models, they proposed to base inference about threshold cutoff points, location and dispersion parameters of the latent distribution on the mode of the a posteriori (MAP) distribution. This approach is basically a conditional one (given random effects) and is similar to penalized quasi-likelihood (l, 31), iterated re-weighted restricted maximum likelihood [5] and hierarchical likelihood of generalized linear mixed models [28] for one parameter exponential families. As discussed by Foulley and Manfredi [10] and Engel and Keen [6], these procedures are likely to have some drawbacks regarding the estimation of fixed effects due to the approximation in integrating out random effects. One simple way to overcome the difficulty of an exact integration of random effects is the quasi-score approach of Me Cullagh and Nelder [30] which only requires the mean and variance of the data distribution. In particular, an appealing version of the quasi-score approach for computing estimations of fixed effects was proposed by Gilmour et al. [18,19] using an approximation of the variance-covariance matrix. One of the main advantages of this method is that it mimics the mixed model equations of Henderson [23] making the estimation of fixed effects computationally easier and providing analogues of BLUP (best linear unbiased predictor) of random effects as by-products. Moreover, this quasi-score method, via linearization, was proven to be quite general [1,21,39]. Initially derived by Gilmour et al. [18] for binary data modelled with logit or probit links, it was applied to ordered categorical data by the same authors !19), to Poisson data with a log link by Foulley and 1m [9] and to a log link exponential model by Trottier [35). The purpose of this paper is to show how this procedure can also cope with heterogeneous residual variances in the case of ordered polytomics modelled via Gaussian latent variables. Section 2 entitled 'Theory' outlines the model, the quasi-score equations and their GAR [19] counterpart and by-products. Section 3 illustrates the theory using the numerical example of footshape in sheep presented by GAR !19).

Model
The model assumptions and notations are basically the same as in Foulley and Gianola [8]. First, it is assumed that the population can be stratified according to an index i (i = 1, 2, ... , I) such that the between subgroup variation corresponds to systematic influences of identified factors and the within group variation to random noise.
There are J response categories indexed by j such that y 2+ _ (y ij+ ) represents the vector of the counts of responses for subpopulation i in the n! different categories j. The vector y i+ can be expressed as the sum yi + _ L y 2r r=l of indicator vectors y ir = (Yiln ?2! ' &mdash; ; !r; &mdash; '; yi.Jr!! such that y 7 ,j r = 1 if response of observation r in subpopulation i is in category j and y ij , = 0 otherwise. In the threshold approach, the probability of a response in category j for an observation of population i, say !rij, is described by the distribution of continuous latent variables giro, The expression of these variables is discretized via threshold values (!l, !2, !j !,!-1), (!o = !oo and !j = +oo) such that: A mixed model structure is hypothesized on the latent variable: where r!Z = E(£ ir ) is decomposed as a linear function x,)/3 of explanatory variables (row vector xi') with unknown coefficients /3 E IR P ; !!zzzu* represents the contribution of random effects to the model with u * being a (q x 1) vector of scaled deviations, zi the corresponding row incidence vector and !!! the square root of the u-component of variance, which may vary between subpopulations.
Classical assumptions are made regarding the distribution of u * and e i = {e ir }, i.e. u * -M(0, Iq) or, in genetics, u * -M(0, A) where A represents the known relationship matrix, e i -N (0, Q e. Ini) and Cov( u, ei') = 0.
Homogeneity of the covariate structure is assumed within the subpopulation i, i.e. xi, = xi and z ir = z 7 ,. If not (e.g. when x i is a continuous covariate), smaller units will be considered, even at the limit elementary units (n 2 = 1). Moreover, as in Foulley and Gianola (8!, the ratio pi = u uj u ei is assumed to be constant (p) across populations which is equivalent to supposing homogeneous intra-class correlations (e.g. constant heritability or repeatability) across environments. Thus, with a i = z'Azi. In many applications, a i is a constant or even a i = 1, but this simplification is not mandatory throughout this paper. In fact, the theory is presented here with a single random factor but it can be easily extended to any number K of independent random vectors uk.
Similarly for the expectations, a structure is postulated for residual variances so as to account for the effects of factors causing heteroskedasticity. As in Foulley et al. [13,14], heterogeneity of residual variances is described by a structural linear model and a log link function, as follows: where p' is the (1 x r) row vector of covariates and 6 is the (r x 1) vector of real-valued dispersion parameters.

Estimation
The estimation procedure described here includes two steps. The first step consists in setting up the quasi-score equations based on the first two marginal moments according to the quasi-likelihood theory [30] and its extension to correlated observations [29]. The second step lies in replacing the variancecovariance matrix of observations by an approximation which is analogous to solving for fixed effects using the mixed model equations of Henderson (23].

(!uasi-score equations
Let e = (ç', 13', 6')' be the (J -1 + p + r) vector of parameters of interest, where !!! J_1) X 1) are the thresholds, !3!pX 1) the location parameters, and 6<r x i! the dispersion parameters. The quasi-score equations are: built from a lower triangular matrix of Is, the last row of which is removed.
Equations in (6) need to specify p and E which can be performed as follows. In the marginal model, 1-iij is the expectation of fJi j ( u * ) with respect to the distribution of u * . Remember that if X N N ( fJ ,u 2 ), the E(4 l (X)) = !(!(1 + ( 2 )-1 / 2 ) !2!. Here, the expectation of (7) reduces to: As shown in detail in the Appendix, the variance-covariance matrix E of the observations can be decomposed as the sum of two components: The first component E A is a (I(J -1) x I(J -1)) block diagonal matrix such that: In equation (11), E o , ii is a ((J -1) x (J -1)) matrix whose general term I is ( 170 , ii ) jk = fJij (1 -fJik ) for j, k = 1,..., (J -1), so that E = i(D1 (Eo,!2)/ni t=i is the variance-covariance matrix of observations for multinomial data (i.e. a purely fixed model).
The second component E B corresponds to the covariance terms for offdiagonal blocks, i.e.: For any pair of blocks (diagonal i = i' or off-diagonal i =1= i') its general term ( j k) can be expressed as: where tii! is the correlation coefficient between f j, and e2!r! and 4 l 2 (a, b; r) is the CDF of the standardized binormal distribution with arguments a, b, and correlation r.
The system in equation (6) (15) and (16) leads to an iterative generalized least square system: where W(1(!_1!x1(J-1!! _ !E 1! is a matrix of weights, and v = HTO + o-1 (,!p) is a working variable. Both are updated from round (t) to round (t + 1) of iteration using the current value B(t! of 0.

The GAR procedure
The size (I(J &mdash; 1)) of the E matrix to invert in W may be very large in some types of applications (e.g. genetic evaluation of field data). This precludes the use of the equation system (20) for computing 0 estimates. This was the basic reason why Gilmour et al. [18] proposed an alternative procedure based on a convenient approximation of E, whose principle was explained in detail in Foulley et al. !12!. Let Q(a, b; r) = 4l2 (a, b; r) -4l(a)lF(b) . Using Tallis's [34] result viz i9Q/Or = 4 >2 (a, b; r) ( 02 (-): standardized bivariate density with arguments a, b and correlation r), the first order Taylor expansion of S2(a, b; r) about r = 0 is S2(a, b; r) = r4>(a)4>(b) + o(r 2 ). Applying this to a = !y2!, b = !y2!! and r = tii&dquo; which occur in the general term of E B , iil (cf. equation (13)), leads to: This can also be written as: where <P i and z § are as previously defined, G = Ap 2 , M i = -1 < j-i> Wi ! (1<k> is a vector of k ones and the minus sign is used for the convenience of calculation). where EA is the same as defined in equations (10) and (11) with block diagonal terms of E B replaced by their approximations given in equation (23). Substituting E in W-1 = < p -l ¿,4>-l by its expression in equation (24) R-1 can be directly calculated due to the peculiar structure of E o which has a tridiagonal inverse (see Appendix). Detailed expressions for the elements of the coefficient matrix and the right hand side of (26) can be found in the Appendix. Moreover, arguing as Gilmour et al. [19] from the mixed model structure of equation (26), one can extract two by-products of this system: i) a BLUP-type prediction of the random effects represented by the u solution to equation (26).
ii) a EM-REML-type estimation of the variance component, say here p 2 via: where C uu is the portion of the inverse of the coefficient matrix in equation (26) corresponding to u.
The preceding theory is now illustrated with a small example. For pedagogical reasons, the data set used is the same as the one analysed by Gilmour et al. !19!. The data consisted of footshape scores recorded in three categories on 2 513 lambs observed over a 2 year period, out of five mating groups [17] later on referred to as 'breeds' for simplicity, and sired by 34 rams which are assumed to be unrelated.
The data set is listed in table I. As the year (Yi; i = l, 2) and breed (B j ; j = 1,2,3,4,5) factors are disconnected, parametrization is not standard. Following Searle's [32] 'cell means models', the parametrization adopted here is defined from the elementary estimable parameters, i.e. here the cell location (q zj ) and dispersion ( Vij ) parameters.
The same B transformation applies to the 6 ij as linear functions of the v, i ,j = lnQ2!. The interpretation of parameters is similar to previously, but with the geometric means replacing arithmetic means and ratios replacing differences as shown below: The general procedure presented here was applied to both standard (S-TM) and heteroskedastic (H-TM) threshold models with the fixed parametrization effects described above for the location and dispersion parameters, and random sire effects within year x breed subclasses. Data were not analysed in detail since the main purpose of this numerical illustration is to serve as a test example. Parameter estimates under both models are shown in table II. The intra-class correlation (sire variance) was estimated as 0.0622 and 0.0630 under the S-TM and H-TM, respectively. Differences between sire predictions under the two models are distinct but small, suggesting, as expected, a wider spread of predictions under the H-TM (+ 0.8 %).
The estimations of fixed effects for location parameters under the S-TM model are not directly comparable with those obtained by Gilmour et al. [19] owing to different parametrizations. The estimates and Wald's tests (table III) provide strong evidence for heterogeneity in residual variances.
It is worth noting that, in the H-TM model, year and breed contrasts within year 2 are not significant factors of variation of the mean but greatly influence the residual variance contrarily to what happened with the breed contrast within year 1. Thus one may apply in practice a more parsimonious H-TM model which has in that case as many parameters as the S-TM model (i.e. four fixed effects + one variance component) but fits the data set better ( X 2 Pearson statistics = 27.0 and 11.8 for 4 degrees of freedom for purely fixed models).

DISCUSSION
New perspectives are opened in the analysis of ordinal data by the use of heteroskedastic threshold models [38]. The justifications for considering this extension were discussed at length by Foulley and Gianola [8] and include alternatives such as the variable threshold concept and its relationship with H-TM. Here, heterogeneous variances were considered within the framework of the usual mixed linear model with heteroskedasticity described by structural models [14,15]. This problem can also be tackled under different model structures such as for instance the multilevel models of Golstein !20!.
In the H-TM context, the GAR quasi-likelihood procedure turns out to be a natural alternative to the MAP approach proposed by Foulley and Gianola !8!.
The main advantage of the MAP approach lies in both its conceptual and computational simplicity. Part of this simplicity results, however, from an inference based on the mode of the joint posterior distribution of 13 and u rather than from posterior expectations or marginal modes. In other words, due to its equivalence to Schall's approach !31!, it can be viewed as a procedure based on a linearization of a conditional model [35!. On the contrary, the GAR quasi-score method is an attempt to integrate out u in order to estimate the 0 and 6 fixed effects. There is, however, a trade-off for an easy integration, i.e. i) to replace the data distribution by its quasilikelihood counterpart, and ii) to approximate the variance-covariance matrix of data by a Taylor expansion about small intra-class correlations. Moreover, as pointed out by Knuiman and Laird !27!, u solutions to equation (26) have no clear justification.
An additional level of approximation resorts to estimating G from formula (27) which mimics classical EM-type formulae for linear models. Model approximations, when estimating G as originally proposed by Harville and Mee [22] and Foulley et al. !11!, could also be especially critical. For example, with binary data, such approximations may yield seriously biased estimators (4!. Fully EM marginal maximum likelihood or Bayesian posterior analysis based on MCMC (Monte Carlo Markov chains) methods [33] would be useful to improve the estimation of variance components and to get further in the implementation of heteroskedastic models [15]. An alternative procedure based on simulated moments was proposed by Jiang !26!; it provides consistent estimators for both fixed effects and random components. Although computationally attractive, this method can be quite inefficient when applied to small samples.
A comparison between MAP and GAR has been carried out via Monte Carlo simulation for an S-TM and for binary data by Hoeschele and Gianola [24].
Contrarily to expectations, MAP estimators of fixed effects were superior to GAR estimators in terms of bias and MSE, and u predictions were very close to each other. Preliminary simulation work carried out on a sire-maternal grand sire design assuming G known and relatively large differences in variances exactly indicate the same tendency as far as the comparison of these two methods is concerned: smaller MSE for 13 and  where L is a ((J -1) x J) matrix built from a lower triangular (J x J) matrix of Is, the last row of which is removed. y i can, however, be expressed as a combination of indicator variables: 1 n , Yi = &mdash; y!y:r with yi,! an indicator vector of dimension (J x 1), with 1 on n 2 r=1 I the jth row if the observation r of the population i is in the category j. Then, Now, Var(Yir) = F' ' (Yir.f'ir) -E(Y.<T)E<Y&dquo;T)!, with E(YorY/r) ! dl!!lE(Yo0)1, and E(yi r ) = TIi, with H; corresponding to the (J x 1) vector of (7r!)j=i! !./. Thus, Moreover, Cov(y!y!) = !(y,.y!) -E'(Yir)E(Yir!)'. E(y!.y!,) = II 2 ii> where H 2 ,, stands for a (J x J) matrix, for which the Uk) term represents the joint probability that an observation r of subpopulation i is in category j and that a different observation r' from the same subpopulation i is in category k, i.e. (IIz,!)! = PT(yijr = yikr ' = 1). Thus, Since, from equation (A1), Var(.ki) = LVar(y, 7 )L', and using the decomposition in equation (A3) with formulae (A4) and (A5), one has: 2 ,.)L' = L diagflli l } L' -L Hin! L', this expression reduces to LVar(y i ,)L' = E o , 2 i with where t ii stands for the correlation coefficient between Pi r and £i r' ; !*2 is the CDF of the normal bivariate distribution, such that < I > 2 ( Yl , Y2 ; p) = P((Y l s !1) f1 (Y 2 X yz)), Y 1 and Y 2 being identically distributed as JV(O, 1), with Corr(Y i , Y 2 ) = p. Inserting this in equation (32) leads to: Hence, the (i, i) blocks of the variance-covariance matrix of y can be written as: The expression of the covariance calculated before can be easily generalized to the case of two distinct populations i and i'. Then, it follows: Finally, the variance-covariance matrix E of y can be written as the sum of two components: where E A is a block diagonal matrix defined as: 1 with E!, = &mdash;(Bo,,: -E B , ii ), and E B = {Eg iin!i,i!=1,...,1)! where £0,it is ni given in equation (A6), and E B ,ii and E B , ii , are given in equations (A7) and (A9).