A Bayesian analysis of mixed survival models

Dans le cas des modeles a risques proportionnels, la fonction de risque d'un animal λ(t), c'est-a-dire sa probabilite de mourir ou d'etre reforme au temps t sachant qu'il est vivant juste avant t, a la forme λ(t) = λ 0 (t)e W'e ou λ 0 (t) est une fonction de risque «de base» et e W'e represente l'effet des covariables w sur le taux de reforme. Une distribution peut etre associee aux termes sq de θ, identifiant, par exemple, des effets genetiques et conduisant a des modeles de survie mixtes, aussi appeles modeles de fragilite. Pour l'estimation des parametres τ de la distribution des termes aleatoires, une analyse bayesienne est proposee. Les inferences statistiques sont faites a partir de la densite marginale a posteriori π(T) qui peut etre obtenue a partir de la distribution conjointe a posteriori par integration laplacienne, une technique liee aux approximations point-selles. La validite de cette technique est demontree ici a partir d'exemples simules, en comparant les resultats de l'approximation de π(τ) avec ceux obtenus apres integration algebrique. Cette derniere correspond a un calcul exact realisable uniquement dans des cas tres particuliers, alors que l'approximation point-selle peut etre appliquee dans des situations ou λ 0 (t) est completement arbitraire (modeles de Cox) ou parametrique (par exemple, de type Weibull), ou les termes aleatoires sont correles a travers une matrice de parente connue, ou avec des modeles plus generaux avec stratification et/ou covariables dependantes du temps. L'influence du taux de censure et de la structure des donnees est aussi illustree.

effect of covariates w on culling rate. A distribution can be attached to elements sq in 0, identifying, for example, genetic effects and leading to mixed survival models, also called 'frailty' models. To estimate the parameters T of the distribution of frailty terms, a Bayesian analysis is proposed. Inferences are drawn from the marginal posterior density x(T) which can be derived from the joint posterior density via Laplacian integration, a powerful technique related to saddlepoint approximations. The validity of this technique is shown here on simulated examples by comparing the resulting approximate x( T ) to the one obtained by algebraic integration. This exact calculation is feasible in very specific cases only, whereas the saddlepoint approximation can be applied to situations where Ao(t) is arbitrary (Cox models) or parametric (eg, Weibull), where the frailty terms are correlated through a known relationship matrix, or in more general models with stratification and/or time-dependent covariates. The influence of the censoring rate and the data structure is also illustrated. survival analysis / mixed model / variance component estimation / Bayesian analysis / proportional hazards model Résumé -Une analyse bayésienne des modèles de survie mixtes. Dans le cas des modèles à risques proportionnels, la fonction de risque d'un animal a(t), c'est-à-dire sa probabilité de mourir ou d'être réformé au temps t sachant qu'il est vivant juste avant t, a la forme A(t) = >' o (t)e W 'o où A o (t) est une fonction de risque « de basé» et eW'o représente l'e,f,fet des covariables w sur le taux de réforme. Une distribution peut être associée avx termes S q de 9, identifiant, par exemple, des effets génétiques et conduisant à des modèles * Correspondence and reprints: Station de g6n6tique quantitative et appliqu6e, Institut national de la recherche agronomique, 78352 Jouy-en-Josas cedex, France. de survie mixtes, aussi appelés modèles de fragilité. Pour l'estimation des paramètres T de la distribution des termes aléatoires, une analyse bayésienné est proposée. Les inférences statistiques sont faites à partir de la densité marginale a posteriori x( T ) qui peut être obtenue à partir de la distribution conjointe a posteriori par intégration laplacienne, une technique liée aux approximations point-selles. La validité de cette technique est démontrée ici à partir d'exemples simulés, en comparant les résultats de l'approximation de 7 r( T ) avec ceux obtenus après intégration algébrique. Cette dernière correspond à un calcul exact réalisable uniquement dans des cas très particuliers, alors que l'approximation point-selle peut être appliquée dans des situations où À o (t) est complètement arbitraire (modèles de Cox) ou paramétrique (par exemple, de type Weibull), où les termes aléatoires sont corrélés à travers une matrice de parenté connue, ou avec des modèles plus généraux avec INTRODUCTION Traits associated with longer productive life of livestock are receiving increasing attention in the animal breeding field: it is recognized that decreasing culling due to the involuntary causes (eg, related to disease, infertility, lameness, etc) by genetic or non-genetic means has a positive effect on economic performance, mainly through decreased replacement costs (van Arendonk, 1986;Strandberg, 1991, Strandberg, 1995, Strandberg and S61kner, 1996. Huge field data sets are usually available for comprehensive analyses of productive life, for example, as a by-product of the dairy recording schemes in dairy cattle. The obvious methodology of choice for such studies is survival analysis, in which proper techniques to deal with the unavoidable presence of censored data have been developed. However, statistical complexity and computational difficulties related to these methods have delayed the adoption of state-of-the-art methodology and different indirect approaches have been proposed (see Strandberg and S61kner (1996) for a review). Some largescale applications (Smith, 1983;Smith and Quaas, 1984;Ducrocq, 1987;Ducrocq et al, 1988a, b;Ruiz, 1991;Fournet, 1992;Egger-Danner, 1993;Ducrocq, 1994) as well as the availability of a software specifically written with animal breeding applications in mind (Ducrocq and S61kner, 1994) have demonstrated that the use of less appropriate approaches can be avoided.
The most popular class of survival models is the class of proportional hazards models (Cox, 1972;Kalbfleisch and Prentice, 1980;Lawless, 1982;Cox and Oakes, 1984). The hazard of an animal (or in the animal breeding context, its risk of being culled) at time t is described as the product of a baseline hazard function .!o(t), which is either left completely arbitrary (Cox model) or has a parametric form (eg, exponential, Weibull or gamma) and of a positive term which is an exponential function of a vector of covariates w' multiplied by a vector of regression parameters 0.
Proportional hazard models can be extended to include random (eg, genetic) effects, as in the regular mixed linear models that are used for genetic evaluations worldwide. Mixed survival models are classically referred to as 'frailty' models by statisticians. The 'frailty' term v is defined as an unobserved random quantity which affects multiplicatively the hazard of individuals or groups of animals. When a term v m is defined for each animal 'I!!I (!,,L(t,w) = v m . À (t, w)), the frailty component extracts part of the unobserved variation between individuals (Vaupel et al, 1979;Hougaard, 1986a,b;Follmann and Goldberg, 1988;Aalen, 1994) and therefore allows for a correction of the possible discrepancy between the true variance of the observations and the one specified by the model. Such an extra variation is referred to as 'overdispersion' (Louis, 1991;Tempelman and Gianola, 1994). When vq is defined for a group of individuals, eg, all daughters of a sire q, it describes the shared unobservable (genetic, in this case) characteristics which act on the hazard of each member of the group (Clayton and Cuzick, 1985;Anderson et al, 1992;Klein, 1992;Klein et al, 1992). In all cases, the simple transformation s = log v allows the inclusion of the frailty term in the linear term w'O.
Traditionally, a gamma (Clayton and Cuzick, 1985;Ducrocq, 1987;Klein, 1992) distribution has been attached to the frailty term v because of its flexibility and mathematical convenience. Other distributions have also been proposed, eg, a positive stable distribution or an inverse Gaussian distribution (Hougaard, 1986a,b;Klein et al, 1992). Unfortunately, in all cases, they do not have the theoretical appeal of the (multivariate) normal distribution commonly used in animal breeding when a infinitesimal polygenic model is assumed. However, it has been shown that the estimates obtained for the parameters of the gamma distribution of v were relatively large, at least in dairy cattle, which means that v had an approximate lognormal distribution, ie, s was approximately normally distributed (Ducrocq, 1987;Ducrocq et al, 1988b;Ducrocq, 1994). Therefore, it has been suggested to account for the genetic relationship between animals by assuming a multivariate normal distribution for s, the logarithm of the frailty term v (Ducrocq, 1987;Korsgaard, 1996).
Several approaches have been used to estimate the parameters of the frailty distributions. Klein (1992)   suggested the use of an EM algorithm (Dempster et al, 1977), with iterative estimation of v, 0 and the baseline cumulative hazard distribution for a Cox model, followed by the estimation of the frailty distribution given 0. When a Weibull model is combined with a gamma frailty term, Follmann and Goldberg (1988) showed that the frailty term can be algebraically integrated out from the likelihood function. The same property has been used in a Bayesian context (Ducrocq, 1987;Ducrocq et al, 1988b;Fournet, 1992;Ducrocq, 1994). Monte-Carlo techniques have also been suggested in order to obtain the marginal posterior distributions of the hyperparameters (Clayton, 1991;Dellaportas and Smith, 1993;Korsgaard, 1996) but their use on large data sets with complex models (eg, with time-dependent covariates) may be very tedious.
The objective of this paper is to present a general Bayesian approach to the analysis of mixed survival models, with (but without being restricted to) typical animal breeding situations in mind. The framework will be presented for a simple Weibull model with two types of priors for the frailty term (gamma or log-normal). Straightforward generalization to other models (with stratification and time-dependent covariates, Cox models) will follow. A particular strategy for estimation of the hyperparameters suitable for large applications, complex models and situations where a relationship matrix is used will be presented and its performance will be studied on simulated data.

METHODS
In the Weibull regression case, the baseline hazard function has the Weibull form A o (t) = A/9(A!!. For the time being, we will assume that all covariates are timeindependent and that only one baseline is defined (no stratification). The vector 0 includes fixed and random effects. For clarity, and unless specified otherwise, only one random effect in the model, eg, a sire effect s is considered here. Using the classical linear mixed-model notation: where 13 is the vector of fixed effects.
The hazard function A(t) for animal m is: and p log A can be incorporated in a grand mean (or any factor) in w! 0.
For simplicity, we will write from now on: using the same notation but keeping in mind that a component of w£ 0 (representin g an intercept) now includes p lo g A . m If the record comes from a daughter m of sire q, with observed failure at T m : Here, vq = e sq is the frailty term. The usual relationship f (t) = A(t)S(t) where S(t) = J 0 A(u) du can be used to show that [3] is a particular case of a log-linear model of the form (Kalbfleisch and Prentice, 1980): where u m follows an extreme value distribution (Kalbfleisch and Prentice, 1980; Lawless, 1982) whose variance is equal to !r2/6. Note that here um implicitly includes three-quarters of the additive genetic variance. With this presentation, a natural definition of the heritability of the survival trait on the logarithmic scale is: Formula [6] solves the problem of a proper definition of heritability for survival traits indicated in Ducrocq (1987) and Ducrocq et al (1988b).
Prior distributions Gamma frailty model Assume: vq N gamma ( T , 7) ie, sq -(generalized) log-gamma(-/ ,-y) The log-gamma distribution (Bartlett andKendall, 1946, according to Lawless (1982), p 21) corresponds to the distribution of logx when x follows a gamma distribution. Note however that the suffix 'log-' (eg, in 'log-normal') is often given to the distribution of x when log x has a known form (eg, normal). Again, the choice of this prior distribution is mainly related to its flexibility and mathematical convenience (see also Klein, 1992, andKlein et al, 1992). Then: Log-normal frailty model In quantitative genetics, due to the infinitesimal polygenic model usually assumed, it is more natural to consider the following prior distribution for the frailty term: and if sires are related: where A is the relationship matrix between sires, we have Hyperparameters In order to simultaneously consider the two previous cases, we will denote the dispersion parameter of the random effect distribution by T (with T = y or T = os 2 ) and we will assume a flat prior for T as well as for (3 and p:

Likelihood construction
Conditionally on 0 and p, the contribution to the likelihood of animal m which fails (8 m = 1) or is censored (6&dquo;, = 0) at time y m , is: where S(t) is the survivor function at time t. For the Weibull model, these two components are: Combining all these contributions (for m = 1, ... , N) which are conditionally independent, we obtain: where {unc} and {cens} represent the sets of indices m corresponding to uncensored and censored records, respectively. Joint posterior density Applying Bayes' theorem, we obtain: and taking the logarithm on both sides: Inference on 0 and p If we assume that T is known, the logarithm of the joint posterior density of Using the same notation as in Tempelman and Gianola (1993), let 8 r be the mode of this joint posterior density: At the mode, the gradient vector is null: For latter use, we also need to define the negative Hessian matrix: Joint inference on (3, p and T Consider here the particular case of the gamma frailty model, where the random effect s has a log-gamma distribution ( T = &dquo;y; this implies that the genetic relationship between sires is ignored). Then the marginal posterior density of 0, p and T is obtained by integrating out s from the joint posterior density p(e,p,T I Y) = P((), P, 7 1 y ): Grouping the contributions to the likelihood of all daughters of each sire q: where now func, q} and {cens, q} are the sets of indices m of the nq uncensored and the censored daughters of sire q, respectively. Writing e!'1° = e x ;&dquo;{3 e Sq for all daughters of sire q, one can factor out the terms which do not depend on sq, which leads to: with: and: Each of these products, for q = 1, ... N 9 , is of the form: The term under the integral can be recognized as the kernel of a log-gamma distribution with parameters (n 9 + &dquo; Y ) and (Qq + -!). Therefore, Hence, the integration of the random effects sq out of the joint posterior density can be done algebraically:  Ducrocq (1994) for the estimation of the sire variance of the length of productive life of dairy cows. Follmann and Goldberg (1988) referred to the distribution in [28] as a multivariate Burr distribution. Again, (3, p and q can be estimated as the mode of this posterior distribution: with associated negative Hessian matrix H. Inference on T Inferences on the dispersion parameter T should be based on its marginal posterior distribution, after integrating out the nuisance parameters 0 and p (Berger, 1985; Robert, 1992): or: J J Except in trivial cases, this integration cannot be performed algebraically. To obtain the marginal posterior distribution of the dispersion parameter T , one can either simulate random samples from it (Clayton, 1991;Dellaportas and Smith, 1993;Korsgaard, 1996), compute the integral numerically (Smith et al, 1985) or find an approximation. We will choose the third alternative, using a technique known as Laplacian integration (Tierney and Kardane, 1986;Achcar and Bolfarine, 1986;Tierney and Kardane, 1986;Tierney et al, 1989;Tempelman and Gianola, 1993; Goutis and Casella, 1996). For any given value T * of T , we want to approximate: Intuitively, ifp(0,p ! y,r) = p(6 T -* ) is unimodal, the value of the integral will heavily depend on the value of the density at its mode 6r* . Then, using the first terms of a Taylor series expansion of logp(6!*) around this mode and noticing that , 7 p ,. (% r * ) = 0, we have: The determinant part in the last equation is obtained by recognizing the kernel of a multivariate normal density of mean È>r* and variance H T * under the integral sign.
This results in an approximation of the marginal posterior density which is similar to what is described in the statistical literature as a saddlepoint approximation of this density (Daniels, 1954;Reid, 1988;Kolassa, 1994;Goutis and Casella, 1996).
Taking the logarithm on both sides, we get the following approximation: An obvious point estimate of T is T at the mode of this approximate marginal posterior density: However, the use of [34] is not limited to the computation of its mode. Other point estimates or other types of inferences (credible sets or hypothesis testing, etc (Berger, 1985; Robert, 1992)) can be derived from the knowledge of the full marginal posterior density. Repeated computations of (34!, and in particular of the negative Hessian matrix H, for many different values of T may quickly become too heavy, though. We propose to summarize the general characteristics of the distribution [34] through the computation of its first three moments by unidimensional numerical integration based on Gauss-Hermite quadrature. To obtain a more precise estimate of these moments after quadrature, the iterative strategy proposed by Smith et al (1985) is implemented. Using initial values of the mean and the variance of the distribution of log T (to force the integration domain to be (&mdash; 00 ,+ 00 )), the integration variable is standardized. New estimates are obtained by quadrature and the standardization is repeated. After a few iterations, this strategy ensures that the quadrature rules are applied in an appropriate region of the function to integrate. Details are given in the Appendix. The results can be used to obtain a second approximation of the marginal posterior density based on its first three moments. Using an expression known as the Gram-Charlier series expansion of a function f (!) of a variable x with moments p, 0 &dquo; 2 and !c, we have (McCullagh, 1987): where §(z) is the density of a normal distribution with mean !, and variance Q 2 2 and z = (x -p)lo,.

Cox model
The application of the saddlepoint approximation to obtain the marginal posterior density of the dispersion parameter of the random effect is not restricted to the Weibull regression model. It can be applied, at least in theory, to any joint posterior density. For example, in the case of a Cox mixed model, for which the baseline hazard function Ao (t) is assumed to be completely arbitrary, p(0,-* !, y, T * ) and the corresponding negative Hessian matrix H 7 * in [34] can be derived replacing the likelihood function in [16] by the partial likelihood function initially proposed by Cox (1972): where the T!2!'s are the distinct observed failure times and Risk(T!Z! ) is the set of individuals at risk at time T [i] , ie, alive just prior to 7!. Then, assuming that T is known, the estimate of 0 to be used in [34] is obtained from the joint posterior density as: Stratification. Time-dependent covariates Stratification and the use of time-dependent covariates are common approaches to accommodate situations for which the proportional hazards is not valid for all effects or throughout the whole time range. As for the Cox model, the main changes with respect to the situation described so far occur in the computation of the likelihood and its derivatives and do not interfere with the validity of the saddlepoint approximation. For example, if the covariates in b f mw.&dquo;,, are step-functions of time with changes at times cp,&dquo;,,,i, i = 0, ...I with W ,,, o = 0 and <!m,7 = Ym , then w m is piecewise constant on intervals (cp,,&dquo;,, i , cp&dquo;,,, i+1 and the expressions to use in [12] are: In the case of stratification, the hazard function A(y,,,) and the survivor function S( Ym ) include parameters p and p log A (the 'intercept' in w£0 in !1!) specific to the relevant stratum.

ILLUSTRATION
In order to illustrate the approach described above for the estimation of dispersion parameters of the random effects in frailty models, simulated data were generated based on a Weibull model with a random effect (that will be referred to as a sire effect) and mimicking the data structure that is often encountered in animal breeding situations. The objective was to assess the quality of the saddlepoint approximation by comparing the exact marginal posterior distribution of the variance parameter of the sire effect ( !28! obtained via algebraic integration) with its approximation (!34! after Laplacian integration). This comparison was done under the following conditions: a log-gamma distribution [8] was considered as a prior for the sire effect (which is a prerequisite for possible algebraic integration); only one fixed effect (13 = ft the grand mean ) was included; and it was assumed that in !28!, we have: Preliminary examination of [43] showed that in all cases studied, the density [43] was virtually identical to the approximate density p(-y y) after integrating out /t and p by Laplacian integration. In other words, what was actually compared here are two approximate densities obtained after Laplacian integration of / -t, p and Nq sire effects s9 in one case, of p and p (with algebraic integration of the sq's) in the other case.
The general behavior of the saddlepoint approximation of the marginal posterior density of the sire variance was also examined under a variety of situations (different types of censoring, of unbalanced structure, with a multivariate normal prior, with relationships between sires, using a Cox model, etc).

Simulation strategy
In all situations (unless specified otherwise), 5 000 records were generated using the following Weibull hazard function: where A jk q (t) represents the hazard at time t of the jth animal (j = 1, ... 5 000/Nq) under the influence of the kth level of a fixed effect, hereafter referred to as the 'herd' effect (k = 1,... K) and daughter of the qth sire (q = 1,... N Q ). Values p, _ -11 1 and p = 1.5 were used in all cases described here, corresponding to an average failure time of about 1800. For the comparison between Laplacian and algebraic integrations, it was assumed that K = 0, ie, !3! = 0 and the sire effects sq were generated from a log-gamma distribution with parameter 'Y = 50. This corresponds to a variance of sq equal to 1}i(1) C'Y) ! 0.02, where is the trigamma function evaluated at y. Using expression !6!, we get: which is in the typical range of heritability values encountered for this kind of trait.
When a normal distribution was assumed, a sire variance of 0 &dquo; = 0.02 was retained to generate the sire effects. When herd effects were used in model [44] (K > 0), these were arbitrarily generated from a uniform !-2, 2! distribution.
Two different censoring schemes were simulated. In censoring type A, all generated records greater than a given value C A were considered as censored at C A .
The value of C A was chosen by trial and error in order to obtain a given proportion of censored records. Censoring type B tried to mimic an overlapping generations scheme. The daughters of a first batch (10%) of sires had a censored record equal to C B when their simulated failure time was greater than C B . The daughters of the following batch (also 10%) of sires were considered as censored when their failure time was greater that 2C B , and so on. The censoring time for the last 10% was lOC B . Therefore, the daughters of the first group of sires were heavily censored ('young daughters of young sires') while the proportion of censored records for the last group was small ('daughters of old sires'). Again, C B was determined by trial and error.
Different unbalanced situations were also simulated. In scheme U1, the daughters of 100 sires (with 50 daughters each) were distributed over 505 herds, five with 500 animals and 500 with five daughters. In scheme U2, half of the animals (2 500) were assumed to be daughters of five sires with 500 daughters each while the other half were daughters of 500 sires with five daughters each. These animals were randomly distributed over 100 herds. Finally, in scheme U3, the daughters of the 50 'best' sires (with 50 daughters each) were raised in the 'best' 50 herds (where 'best' means lowest relative culling rate) while the daughters of the 'worst' 50 sires were raised in the 'worst' herds.
To study the impact of the existence of genetic relationships between individuals, data were generated according to a model slightly different from [44]. First, the effects sg, of ten grandsires ('sires of sires') were generated from a normal distribution with mean 0 and variance a 2/4 (with 0 & d q u o ; ; = 0.02). For each of them, ten sire effects sq were obtained by adding to sg, a normally distributed random effect with variance 3o!/4. Finally, 50 records of daughters of each of these sires were simulated according to the model: where r j represents the remaining additive genetic effect for the jth animal and was generated from a normal distribution with mean 0 and variance 3 Q9 , leading to records with a global additive genetic variance equal to Q a = 4cr!. These data were analyzed and the marginal posterior density of the sire variance component was obtained under three different genetic models: two sire models identical to [44] assuming no relationships between sires (case Sl) or including the relationship matrix between sires (case S2), and an 'animal' model (case An), describing the individual additive genetic effect a j of each animal j and including the complete relationship matrix between the 5 110 animals (5 000 with records + 100 sires + 10 grand-sires): All computations were done using the 'Survival Kit', a set of Fortran programs developed by Ducrocq and S61kner (1994). The 'Survival Kit' was specifically written to efficiently analyze the very large field data sets encountered by animal breeders and implements all the features described in this paper with Weibull and Cox models, possibly with strata, time-dependent covariates and random effects.
In particular, the maximization of the expressions [18] or [29] is based on a limited memory quasi-Newton method (Liu and Nocedal, 1989)

Results
Laplacian integration vs Algebraic integration Figure 1 represents the marginal posterior distribution obtained after integrating out the sire effects s 9 from the joint posterior distribution, either algebraically or using the Laplacian approximation. All records were uncensored. In the three samples presented here, the true value q = 50 is obviously included in any reasonable HPD credible set. When there were few sires with many daughters each, the two computed forms of the marginal posterior distribution were virtually indistinguishable. When little information was available for each sire effect (ten daughters each in the 500 sires case), the marginal posterior distributions were rather flat, with a long tail towards large values of q (ie, small sire variances). The agreement between Laplacian and algebraic integration was not as good, although the modes of the two distributions were close. With even less information per sire (five daughters or less per sire), neither of the two marginalization techniques worked in most of the cases: the mode of the distribution or its first moments could not be computed.
Effect of censoring Figure 2 presents again the result of the same two marginalization approaches, for 100 sires with 50 daughters each but under censoring schemes A and B, with in both cases a proportion of 50% censored records (C A = 1200 and C B = 270). Clearly, censoring had little effect on the quality of the approximation when the Laplacian integration was used. However, because the amount of information available to estimate a rather small sire variance was drastically reduced, it was not always possible to obtain a well-defined posterior density (see Breslow and Clayton (1993) for similar results in the context of generalized linear mixed models). For example, in figure 2, the posterior density in the case of censoring scheme A does not integrate to 1. The same phenomenon also occurred for some samples with censoring scheme B. Interestingly, when sire effects with a larger variance 7 = 10 were simulated, which corresponds to an heritability of 0.24, even extreme situations with more than 80% censored records (with C A = 520) led to well-defined, very peaked posterior densities.

Normally distributed random effects
Having shown the validity of the saddlepoint approximation of the marginal posterior density, other samples were generated with normally distributed sire effects and with 100 (fixed) 'herd' effects. Figure 3 displays the marginal posterior density for ten such samples, with 100 sires and no censoring. The obtained distributions were not as skewed as in the case of a log-gamma distribution. At least in the examples studied, the true value 0.02 was always in any HPD credible set. Note however that the variance of these densities were quite large (standard deviations between 0.0049 and 0.0079 for a true parameter value of 0.02).

Effect of unbalancedness
When unbalancedness was induced by simulating both very large and small herds (case Ul), the effect on the marginal posterior density appeared to be minimal (fig 4). When a large heterogeneity was created in the number of daughters per sire (case U2), the main consequence was a less precise estimation of the sire variance.
The most negative impact was observed when the animals were not randomly distributed across herds (case U3). It seems that a part of the favorable influence of the best sires on the survival of their daughters was attributed to the herd effects, resulting in a sire variance strongly biased downwards.

Including a relationship matrix
The two marginal posterior densities obtained under a sire model with or without inclusion of the true relationship matrix between sires were very similar ( fig 5).
As may have been expected, the inclusion of the relationship matrix slightly increased the variance of this posterior density, because it accounts for the fact that the records of related animals are more similar, hence globally less variable. In all the samples simulated, the animal model consistently led to a slight overestimation of the sire variance: the marginal posterior density in the case of the animal model was systematically to the right of those for the two sire models. This may be attributed, at least in part, to the fact that a much larger number of parameters have to be integrated out with an animal model than with a sire model. Such a problem has been pointed out for example by Mayer (1995) in the context of a threshold model. The Laplacian integration probably does not perform as well in such a case. Note, however, that this may be worsened by the fact that only a very simple pedigree structure was simulated here. In particular, no information at all was assumed to be available on the female side. The sire model used does not account for the overdispersion implicitly created by the effect r j , which represents three-quarters of the total additive genetic variance. An attempt to fit a model similar to [46] assuming a log-gamma prior distribution for r j and performing the algebraic integration of r j led to a marginal posterior density of the sire variance similar to that obtained with the two sire models and a very large estimate (q > 400 at the mode) for the gamma parameter, synonymous of a very small variance for the r j 's. This is likely the result of the lack of information available for the estimation for q that was already illustrated in figure 1.

Cox model vs Weibull model
When a parametric (Weibull) or semi-parametric (Cox)  Approximation of the marginal posterior density of T based on its first three moments The first three moments of the marginal posterior density of the parameter T were computed by numerical integration of [34] using a five-point Gauss-Hermite quadrature formula and after standardization of the function to integrate. New standardization factors were obtained and the procedure was repeated until the computed moments stabilized, which usually occurred after only three iterations. Figure 7 illustrates the fact that the knowledge of these moments leads to a reasonable approximation of the marginal posterior density of T

DISCUSSION AND CONCLUSION
Bayesian analysis offers a coherent framework for the otherwise unclear problem of variance components estimation in mixed nonlinear models (Ducrocq, 1990): all the elements for inferences on dispersion parameters are contained in the marginal posterior distribution of these parameters and the construction of the latter is based on general principles. Particular applications to animal breeding situations were proposed for categorical data H6 5 chele et al, 1987;Foulley et al, 1989) and for Poisson mixed models (Tempelman and Gianola, 1993). In this paper, a general approach for genetic evaluation and estimation of dispersion parameters for Weibull and Cox mixed models was described. Its main attractive features are its generality and its computational feasability, even for very large applications. As an example of the latter, the largest analysis that we have carried out involved the estimation of the mode and the first three moments of the marginal posterior distribution of the sire variance component for the length of productive life of 633 516 Holstein cows, daughters of 3 613 related sires. The Weibull mixed model used was quite complex and included time-dependent effects such as a herd- year-season effect (with 82 713 levels, assumed to be randomly distributed with a log-gamma distribution), a lactation number x stage of lactation effect, a herd size effect and a year-to-year variation in herd size effect as well as continuous linear and quadratic effects of covariates such age at first calving, milk, fat and protein yield. Popular extensions of proportional hazards models such as stratification or the use of time-dependent covariates complicate the actual likelihood computations but do not interfere with the marginalization procedures described here. The inclusion of genetic relationships between individuals is straightforward through the use of an appropriate prior distribution. Other prior distributions (including informative priors) or other parametric baseline hazard functions could have been incorporated. More complex genetic structures (eg, with maternal effects) can be fitted. When more than one random effect is considered in the model, the approximation described here leads to the joint marginal posterior of all the dispersion parameters for all random effects. Further marginalization can be performed numerically along the lines described in the Appendix for the calculation of the moments of the marginal posterior distribution but this may be considered too costly. In the case of a Weibull mixed model with two random effects, one of them having a log-gamma distribution, the possibility of integrating out the latter algebraically avoids this difficulty.
Laplacian integration can be applied to other situations too. For example, Tierney and Kadane (1986) and Tierney et al (1989) suggested the direct computation of the mean of the marginal posterior density using second-order approximation formulae. These formulae were derived applying Laplacian integration to both the numerator and the denominator of a ratio of integrals. However, this requires the maximization of the joint posterior density for the dispersion parameters, the fixed effects and the random effects. This approach failed when we attempted it as the maximization procedure led to dispersion parameters estimates corresponding to random effects with null variance. The same phenomenon had been described previously in similar situations (Tempelman and Gianola, 1993).
At least in theory, Laplacian integration could have been used to obtain the marginal posterior distribution of parameters other than the dispersion parameters. However, this may be considered far too demanding, because each application of the Laplace expansion requires the maximization of one particular function involving all parameters except the one of interest. This is in contrast with some Monte-Carlo methods, such as Gibbs sampling, where the marginal distributions for all parameters can be obtained simultaneously. However, in practical animal breeding situations, the separate consideration of all marginal densities is often not required, because estimated breeding values are point estimates mainly used to rank animals: when little information is available for the genetic evaluation, an accurate ranking of the candidates to selection is unrealistic. In the opposite case (precise estimation), the rankings based on, say, the mode or the mean of either the marginal or the joint posterior distribution are likely to be very similar. Marginal posterior densities of nonlinear functions of parameters can also be calculated (Wong and Li, 1992).
Marginalization based on Laplacian integration has been shown to give excellent results in standard situations. For many nonlinear applications, the quality of the saddlepoint approximation would have to rely on the comparison of the approximate marginal distribution of the dispersion parameters with the actual distribution obtained via Monte-Carlo simulations. The exceptional situation studied here where an exact algebraic integration of a log-gamma random effect is possible permits a more straightforward comparison. It was found that the designs for which the two marginal posterior distributions (exact and approximate) depart from each other correspond to situations where the quantity of information available for the estimation of genetic parameters is quite limited. This means, in particular, that the saddlepoint approximation is likely to be unsuccessful for the estimation of the parameters of a frailty term used to describe an extra variation (overdispersion).
However, one can still use algebraic integration of the random effects in the case of a gamma frailty component in a Weibull model.