Bayesian inference in the semiparametric log normal frailty model using Gibbs sampling

- In this paper, a full Bayesian analysis is carried out in a semiparametric log normal frailty model for survival data using Gibbs sampling. The full conditional posterior distributions describing the Gibbs sampler are either known distributions or shown to be log concave, so that adaptive rejection sampling can be used. Using data augmentation, marginal posterior distributions of breeding values of animals with and without records are obtained. As an example, disease data on future AI-bulls from the Danish performance testing programme were analysed. The trait considered was ’time from entering test until first time a respiratory disease occurred’. Bulls without a respiratory disease during the test and those tested without disease at date of analysing data had right censored records. The results showed that the hazard decreased with increasing age at entering test and with increasing degree of heterozygosity due to crossbreeding. Additive effects of gene importation had no influence. There was genetic variation in log frailty as well as variation due to herd of origin by period and year by season. &copy; Inra/Elsevier, Paris survival analysis / semiparametric log normal frailty model / Gibbs sampling / animal model / disease data on performance tested bulls


INTRODUCTION
When survival data, the time until a certain event happens, is analysed, very often the hazard function is modelled. The hazard function, A i (t), of an animal i, denotes the instantaneous probability of failing at time t, if risk exists. In Cox's proportional hazards model [5] it is assumed that A i (t) = A o (t) exp{x!,6}, where, in semiparametric models, A O (t) is any arbitrary baseline hazard function common to all animals. Covariates of animal i, x i , are supposed to act multiplicatively on the hazard function by exp{x!,6}, where ,Q is a vector of regression parameters. In fully parametric models the baseline hazard function is also parameterized. The proportional hazard model assumes that conditional on covariates, the event times are independent and attention is focused on the effects of the explanatory variables. The baseline hazard function is then regarded as a nuisance factor. Frailty models are mixed models for survival data. In frailty models it is assumed that there is an unobserved random variable, a frailty variable, which is assumed to act multiplicatively on the hazard function. Sometimes a frailty variable is introduced to make correct inference on regression parameters. In other situations the parameters of the frailty distribution are of major interest.
In shared frailty models, introduced by Vaupel et al. (32), groups of individuals (or several survival times on the same individual) share the same frailty variable. Frailties of two individuals have a correlation equal to 1 if they come from the same group and equal to 0 if they come from different groups. Mainly for reasons of mathematical convenience, the frailty variable is often assumed to follow a gamma distribution. In the animal breeding literature, this method has been used to fit sire models for survival data using fully parametric models (e.g. [8,10]).
Several papers deal with correlated gamma frailty models (e.g. [22, 26, 30, 31!). In these models individual frailties are linear combinations of independent gamma distributed random variables constructed to give the desired variance covariance matrix among frailties. From a mathematical point of view these models are convenient because the EM algorithm [7] can be used to estimate the parameters. Because of the infinitesimal model often assumed in quantitative genetics, frailties may be log normally distributed; thereby conditional random effects act multiplicatively on the baseline hazard as do covariates. It is not immediate to use the EM algorithm in log normally distributed frailty models as stated by several authors and shown in Korsgaard !21!.
In this paper we show how a full Bayesian analysis can be carried out in a semiparametric log normal frailty model using Gibbs sampling and adaptive rejection sampling. It is shown that by using data augmentation, marginal posterior distributions of breeding values of animals without records can be obtained. The work is very much inspired by the works of Kalbfleisch !19!, Clayton !4!, Gauderman and Thomas !11! and Dellaportas and Smith !6!. Kalbfleisch [19] presented a Bayesian analysis of the semiparametric regression model. Gibbs sampling was used by Clayton [4] for Bayesian inference in the semiparametric gamma frailty model and by Gauderman and Thomas [11] for inference in a related semiparametric log normal frailty model with emphasis on applications in genetic epidemiology. Finally Dellaportas and Smith [6] demonstrated that Gibbs sampling in conjunction with adaptive rejection sampling gives a straightforward computational procedure for Bayesian inferences in the Weibull proportional hazards model.
The semiparametric log normal frailty model is defined in section 2 of this paper. In this part we show how a full Bayesian analysis is carried out in the special case of the log normal frailty model, where the model of log frailty is a variance component model. The full conditional posterior distributions required for using Gibbs sampling are derived for a given set of prior distributions. In section 3, we analyse disease data on performance tested bulls as an example and section 4 contains a discussion. variable, equal to 1 if T i < C i , and 0 if C i < T i . In the semiparametric frailty model, it is assumed that, conditional on frailty Z i = z i , the hazard function, Ài(t), of Ti; i = 1, ... , n, is given by where A h (t) is the common baseline hazard function of animals that belong to the hth stratum, h = 1, ... , H, where H is the number of strata. x i (t) is a vector of possible time-dependent covariates of animal i and is the corresponding vector of regression parameters. Z i is the frailty variable of animal i. This is an unobserved random variable assumed to act multiplicatively on the hazard function. A large value, z i , of Z i increases the hazard of animal i throughout the whole time period.
Definition: let w = (wl, ... , w n )'; if w I E -N n (0, E) and the frailty variable Zi in equation (1) be given by Z i = exp f w 2 }, i.e. Z Z is log normally distributed; i = 1, ... , n. Then the model given by equation (1) is called a semiparametric log normal frailty model. This is the definition of a semiparametric log normal frailty model in broad generality. However, special attention is given to a subclass of models where the distribution of log frailty is given by a variance component model: or in scalar form, w i = Uj + a i + e i where j is the class of the random effect, u, that animal i belongs to; j E {1, ... , q}. a i is the random additive genetic value and e i the random value of environmental effect not already taken into account. It is assumed that ula ' -Nq(O, Iq U '), a[a § -N N (0, Aa!) and e!er! !!(0,In.cr!). Q u and Q a . Q! and Q a are known design matrices of dimension n x q and n x N, respectively, where N is the total number of animals defining the additive genetic relationship matrix, A, and n is the number of animals with records. Here, (u, a'), (a, or') and (e, U2 ) are assumed to be mutually independent. Generalizations will be discussed later. From equation (2), the hazard of T i is: assuming that the covariates are time independent and that there is no stratification. The vector of parameters and hyperparameters of the model Note that log frailty, w i , of animal i, is an unobserved quantity which is modelled. This is analogous to the threshold model (e.g. [28]), where an unobserved quantity, the liability, is modelled. In the threshold model, a categorical trait is considered, but heritability is defined for the liability of the trait. In the semiparametric log normal frailty model the trait is a survival time, but heritability is defined for log frailty of the trait. The semiparametric log normal frailty model is not a log linear model for the survival times T i , i = 1, ... , n. The only log linear models that are also proportional hazards models are the Weibull regression models (including exponential regression models), where the error term is e/p, with p being a parameter of the Weibull distribution and having the extreme value distribution !20!. Without restriction on the baseline hazard, the proportional hazard model postulates no direct relationship between covariates (and frailty) and time itself. This is unlike the threshold model, where the observed value is determined by a grouping on the underlying scale.

Prior distributions
In order to carry out a full Bayesian analysis, the prior distributions of all parameters and hyperparameters in the model must be specified. A priori, it is assumed (by definition of the log normal frailty model) that u, given the hyperparameter ( 7 u 2, follows a multivariate normal distribution: U I U2 u -Nq(O,I9Qu). Similarly, it is assumed that ala 2 -NN (0, AO,2 ) and e 10,2 _ N,,(o,l,,a2) A priori elements in /3 are assumed to be independent and each is assumed to follow an improper uniform distribution over the real numbers; i.e. p({3 b ) oc 1; b = 1,...,.B, where B is the dimension of !3. The hyperparameters a£, a § a and Q e are assumed to follow independent inverse gamma distributions; i.e. a! '&dquo; IG(¡.¿u, lIu), a! '&dquo; IG(¡ ' ¿ a , v a ) and or2 -IG(¡ ' ¿ e , v e ), where ¡ ' ¿ u , lIu , pa, v a , and,a,, v e , are values assigned according to prior belief. The convention used for inverse gamma distributions is given in the Appendix. The baseline hazard function >'0 (t) will be approximated by a step function on a set of intervals defined by the different ordered survival times, 0 < t ( 1 )  The usual convention that survival times tied to censoring times, precede the censoring times is adopted. Furthermore, as in Breslow [3], it is assumed that censoring occurring in the interval [t( m-1 ) t(m)) occurs at t(,,,-1 ); Under the assumption, where, conditional on u, a and e, censoring is independent (e.g. [1,2]), the partial conditional (censoring omitted) likelihood is given by (e.g. (15!). Under the assumptions given above, equation (5) (6) and p(qp) is the prior distribution of parameters and hyperparameters given by equation (4).

Marginal posterior distributions and Gibbs sampling
If cp is a parameter or a subset of parameters of interest from 1/i, the marginal posterior distribution of cp is obtained by integrating out the remaining parameters from the joint posterior distribution. If this can not be performed analytically for one or more parameters of interest, Gibbs sampling [12,14] can be used to obtain samples from the joint posterior distribution, and thereby also from any marginal posterior distribution of interest. Gibbs sampling is an iterative method for generation of samples from a multivariate distribution which has its roots in the Metropolis-Hastings algorithm [17, 24!. The Gibbs sampler produces realizations from a joint posterior distribution by sampling repeatedly from the full conditional posterior distributions of the parameters in the model. Geman and Geman [14] showed that, under mild conditions, and after a large number of iterations, samples obtained are from the joint posterior distribution.

Full conditional posterior distributions
In order to implement the Gibbs sampler, the full conditional posterior distributions of all the parameters in 1/i must be derived. The following notation is used: that 1/i B <p denotes 1/i except cp; e.g. if cp = {3, then 1/i V3 is (A 01' ... , A om , u, o'!, a, <r!, e, o, e 2) The full conditional posterior distribution of cp given data and all the remaining parameters, 1/iB<p' is proportional to the joint posterior distribution of 1/i given by equation (7). From equation (7) it then follows that the full conditional posterior distribution of u j , j = 1, ... , q up to proportionality is given by where Of = !! exp{ai+ei+x!,8}Aom and d(u j ) is the number of animals !n,a!.&dquo;,,! < y;, that failed from the jth class of u and S( Uj) is the set of animals belonging to the jth class of u. For i, an animal with records, the full conditional posterior distribution of a i is given by where Of = L exp{uj+et+x!}Aomand{!4''-'}aretheelementsofA !. Sampling from gamma, inverse gamma and normalely distributed random variables is straightforward. The full conditional posterior distribution of u!, of a i , for i, an animal with records, of e i and of regression parameters, given by equations (8), (9), (11) and (12), respectively, can all be shown [21] to be log concave, and therefore adaptive rejection sampling [16] can be used to sample from these distributions. Adaptive rejection sampling is useful in order to sample efficiently from densities of complicated algebraic form. It is a method for rejection sampling from any univariate log-concave probability density function, which need only be specified up to proportionality.

Data
As an example, disease data on future AI-bulls from the Danish performance testing programme for beef traits of dairy and dual purpose breeds were analysed. The trait considered was 'time from entering test until first time a respiratory disease occurred'. The bulls of the Danish Red breed were all performance tested in the 15-year period [1982][1983][1984][1985][1986][1987][1988][1989][1990][1991][1992][1993][1994][1995][1996] and entered the Aalestrup test station between 23 and 74 days of age. Bulls which did not experience a respiratory disease during the test period or which were still undergoing testing, on the date of data analysis have right censored records. For these animals, it is only known that the time at first occurrence of a respiratory disease, T i , will be greater than the time at censoring, C i , that is, either the time at the end of the test (336 days of age) or the time at the date of data analysis or the time at being culled before end of test (a very rare event). Data on animal i; i = 1, ... , n is (y; , 6 i ), where y i is the observed value of Y = min{T i , C i } and 6 i is a random indicator variable, equal to 1 if a respiratory disease occurred during test, and 0 otherwise. Data on all animals is (y, 6).

Model
It is assumed that the hazard function, A i (t), of T i , is given by where t is time (in days) from entering test. In (17), A o (t) is the baseline hazard function; x' = ( X i l , X i 2 , X i3, !i4) is a vector of covariates of animal i; x il ranges between 23 and 74 days of age in the data and is the animal's age at entering test; x iz ranges between 0.0 and 1.0 and x i3 ranges between 0.0 and 0.78125 and are proportions of genes from foreign populations (American Brown Swiss and Red Holstein cattle) and x i4 (which ranges between 0.0 and 1.0) is the degree of heterozygosity due to crossbreeding. x ii is included in order to take into account that bulls are entering test at different ages; Xi2 and x 23 in order to take additive effects of gene importation into account and x i4 in order to take account of heterosis due to dominance. { 3' _ (0 1 , ... , Q4) is the corresponding vector of regression parameters. Z i = exp{h! + s k + a i + e i is the log normally distributed frailty variable of animal i. h j is the effect of the jth herd of origin by period combination (one period is 5 years), j = 1, ... , J, where J is the number of herd of origin by period combination, and s k is the effect of entering test in the kth yearseason (one season is 1 month), k = 1,..., K, where K is the number of yearseasons. a i is an additive genetic effect of animal i and e i is an effect of environment not already taken into account; i = 1, ... , n, where n is the number of animals with records. In this example J is 540, K is 170 and n is 1 635. The relationship among the test bulls was traced back as far as possible, leading to a total of N = 5 083 animals defining the additive genetic relationship matrix.

Implementation of the Gibbs sampler and results
The Gibbs sampler was implemented with prior distributions according to the previous section. The prior distributions of the hyperparameters a 2, as, or 2 and or2 were given by inverse gamma distributions with parameters and That is, the prior means were of afl and Q a were 0. 3) sample afl from the inverse gamma distribution given by equation (13) with, a2 = Oh, q = J, u = h and (pu, 1 /u) = (p h , Vh ); 4) sample a i from the normal distribution given by equation (10) if i is an animal without records; if i is an animal with records, a i is sampled from equation (9) with h j + s,! substituted for u j in Of and using adaptive rejection sampling; 5) sample Q a from the inverse gamma distribution given by equation (14); 6) sample e i ; i = 1, ... , n from equation (11) with h j + s k substituted for Uj in Of using adaptive rejection sampling; 7) sample Q e from the inverse gamma distribution given by equation (15); 8) sample (3 b ; b = 1, 2, 3, from equation (12) with h j + Sk substituted for Uj using adaptive rejection sampling; 9) sample s k k = 1, ... , K from equation (8) with u j = s k and using adaptive rejection sampling; 10) sample u2 from the inverse gamma distribution given by (13) with a£ = 0';, q = K, u = s and ( M u, v u ) = (u s , v s ).
The marginal posterior density of a and the standardized parameters h 2 , c2, c2 and e 2 (of log frailty) are shown in figure 1. The densities were estimated by the methodology of Scott !27!. The mean of the marginal posterior density of h 2 is 0.14 where h 2 is heritability of log frailty. If herd of origin by period and year by season had been considered as fixed effects rather than random effects, then heritability of log frailty would have been higher.
The marginal posterior densities of {3 1 , /? 2 , Q3 and are shown in figure 2. Because the marginal posterior mean of (3 1 , the effect of age at entering test, is negative (-0.0061), the hazard is decreased by increasing age at entering test.
That is, for a bull entering test at 23 days of age, the conditional hazard is always exp{-0.0061 x 23}/exp{-0.0061 x 74} = 0.87/0.64 = 1.36 higher, than that of a bull entering test at 74 days of age; conditional on frailty and other covariates being the same for the two bulls. Similarly, because the marginal posterior mean of ( 3 4 , the effect of heterozygosity, is negative (-0.22), the hazard is decreased by increasing the degree of heterozygosity. The marginal posterior means of additive effects of gene immigration from American Brown Swiss and Red Holstein cattle are close to 0.0. The marginal posterior mean of h 2 , the heritability of log frailty, is 0.1406; of c!, the proportion of variation in log frailty due to herd of origin by period combination is 0.0913 and of c!, the proportion of variation in log frailty caused by a year by season effect is 0.2766.

DISCUSSION
This paper illustrates how a Bayesian analysis can be carried out in the semiparametric log normal frailty model using Gibbs sampling. In the version of the Gibbs sampler implemented here, samples were repeatedly taken from univariate full conditional posterior distributions. This is only one possible implementation. With highly correlated univariate components it could be preferable to sample from the joint conditional posterior distribution of these components. The advantage of this method is its greater speed of convergence to the joint posterior distribution [23). The methodology is quite general and could obviously be used in full parametric models and in models with stratification and/or time-dependent covariates. It is time consuming to sample thousands of observations from thousands of distributions; but, analysing the relatively small dataset in section 3 left us optimistic about possibilities for analysing larger datasets.
Another possibility is to perform a Bayesian analysis using Laplace integration. This was carried out by Ducrocq and Casella [9] with special emphasis on obtaining the marginal posterior distribution of parameters, T , of the distribution of frailty terms. Their prior distributions of frailty terms were either gamma or log normal. They point out that the computations may quickly become too heavy and propose to summarize the approximate marginal posterior of T through its first 3 moments using the Gram-Charlier series expansion of a function. The moments are found using numerical integration based on Gauss-Hermite quadrature followed by an iterative strategy to obtain precise estimates. When frailty terms are gamma distributed, these can be integrated out algebraically to obtain the marginal posterior distribution of the remaining parameters. From a likelihood point of view, gamma distributed frailties can be integrated out algebraically to obtain a likelihood based on observed data. This is so in correlated gamma frailty models as well (e.g. !22!) but not with log normally distributed frailty terms. If T is a vector of variance components of log frailty terms from a log normal frailty model such as equation (2), Laplace integration may be considered too costly (9!. These authors suggested letting some frailty terms be gamma distributed and others log normally distributed in order to integrate out algebraically some frailty terms. Implementation of the Gibbs sampler in a log normal frailty model avoids making the mathematically tractable but somewhat artificial choice of a gamma distribution of frailty terms.
In this paper some of the parameters are modelled with improper prior distributions. The joint posterior distribution is unavailable in closed form and, as pointed out by Hobert and Casella !18!, the mathematical intractability that necessitates use of the Gibbs sampler also makes demonstrating the propriety of the posterior distribution a difficult task. The fact that the full conditional posterior distributions defining the Gibbs sampler are all proper distributions does not guarantee that the joint posterior distribution will be proper. The question of which improper prior distributions will yield proper joint posterior distributions in hierarchical linear mixed models was addressed by Hobert and Casella (18!. The important question of propriety of the posterior distribution using improper prior distributions must be dealt with in a Bayesian analysis using a Laplace integration as well. One way to avoid improper posteriors is to use proper prior distributions. The analysis could have been carried out without augmenting [29] by additive genetic values of animals without records. The number of animals defining the additive genetic relationship matrix A was 5 083, but only 1635 had records. Let a be the vector of additive genetic values of animals with records, then taking the part A of A relating to animals with records, the analysis could have been carried out using the prior £[a § -N.!(0, AQa). The number of distributions needing sampling from would have been 5 083-1635 lower at the expense of obtaining marginal posterior distributions of breeding values on animals without records. However A-1 , is much denser than A-1 , and is more difficult to compute.
The model of log frailty, specified by equation (2), can easily be generalized to include more independent variance components and/or the assumption of independence can be relaxed; for example in equation (2), u could represent a maternal effect. Assuming that (u', a')'1 G rv N 2N (0, G Q 9 A), where G is the 2 x 2 matrix of additive genetic covariances, and that the prior distribution of G is inverse Wishart distributed, then the full conditional posterior distribution of G, required for Gibbs sampling, will also be inverse Wishart distributed. Furthermore, it should be possible to carry out a multivariate analysis of a survival trait, a quantitative trait and a categorical trait, by assuming that the log frailty of the survival trait, the quantitative trait and the liability of the categorical trait follow a multivariate normal distribution. It should also be possible to generalize to an arbitrary number of survival traits, quantitative traits and categorical traits.
By definition, the trait considered in the example: 'time until first occurrence of a respiratory disease during test' can occur at most once for each animal. These data are, however, only a subset of the data being collected on bulls during the test period. Each repeated occurrence of a respiratory disease is recorded, as well as other categories of diseases and several quantitative traits and other traits of interest. Oakes [25] gives a survey of frailty models for multiple event time data, and it would be interesting to extend the log normal frailty models to allow for multiple survival times for each animal as well as a multivariate analysis of censored survival time data and other traits of interest.