Genetic relationship of discrete-time survival with fertility and production in dairy cattle using bivariate models

Bivariate analyses of functional longevity in dairy cattle measured as survival to next lactation (SURV) with milk yield and fertility traits were carried out. A sequential threshold-linear censored model was implemented for the analyses of SURV. Records on 96 642 lactations from 41 170 cows were used to estimate genetic parameters, using animal models, for longevity, 305 d-standardized milk production (MY305), days open (DO) and number of inseminations to conception (INS) in the Spanish Holstein population; 31% and 30% of lactations were censored for DO and INS, respectively. Heritability estimates for SURV and MY305 were 0.11 and 0.27 respectively; while heritability estimates for fertility traits were lower (0.07 for DO and 0.03 for INS). Antagonist genetic correlations were estimated between SURV and fertility (-0.78 and -0.54 for DO and INS, respectively) or production (-0.53 for MY305), suggesting reduced functional longevity with impaired fertility and increased milk production. Longer days open seems to affect survival more than increased INS. Also, high productive cows were more problematic, less functional and more liable to being culled. The results suggest that the sequential threshold model is a method that might be considered at evaluating genetic relationship between discrete-time survival and other traits, due to its flexibility.


INTRODUCTION
Production, longevity and fertility are traits of major interest for the dairy industry. Emphasis placed on selection for milk production has led to deterioration of fertility [2,30,34] and reduction of herd life [19,31]. Less fertile cows have impaired longevity, and their average total lifetime production can be up to 10 000 kg lower than cows with adequate fertility as shown by 392 O. González-Recio and R. Alenda González-Recio et al. [15]. The dairy industry needs cows that are able to produce enough milk to ensure sustainability of farms, but also cows that live longer to reduce rearing costs. Reproductive performance is essential in both tasks because a proper female fertility allows for a subsequent lactation that ensures income from milk yield and may increase herd life. Heritability estimates for fertility are low, ranging between 0.01 and 0.07 [5,14,20], whereas heritabilities for survival or longevity are typically low to moderate ranging between 0.02 and 0.20 [3,35]. Total merit indices in all countries include production traits; however, female fertility is not always included in the total merit indices, and those including survival often analyze indirect longevity using correlated traits as body traits, fertility or udder health [21]. Yield, measured as kg of milk or protein, is the most important trait in many total merit indices, which makes joint selection with survival and fertility rather difficult due to their genetic antagonism with production. (Co)variance components must be estimated to place proper emphasis on these traits, because it is one of the factors needed to achieve an efficient selection. Genetic correlations between production and fertility in the Basque Holstein populations were previously estimated by González-Recio et al. [18], but genetic relationships between survival and production or fertility have not. The increased interest on modeling direct longevity has led to the development of improved methods for analyzing survival traits as longevity, allowing for random effects, censoring and time dependent covariates [11,12]. Survival can also be measured in discrete-time intervals (e.g. number of lactations in the lifetime) as described by Prentice and Gloeckeler [23] or Ducrocq [9]. Discrete-time survival analysis has been applied on longevity using simulated data [37], and on fertility with simulated [25] and field data [16]. Bivariate models using survival analysis still pose some difficulties because joint distribution cannot be described using common methods. Interesting approaches have recently been studied to implement bivariate analyses allowing for one survival trait [7,29], but they still pose some inconvenience for practical implementation. The method proposed by Tarrés et al. [29] is a two-step approach, and that by Damgaard and Korsgaard [7] is a computing demanding method. In this paper we propose a bivariate analysis for one discrete-time survival trait and one Gaussian trait. A sequential threshold model, described by Albert and Chib [1] was used to analyze the discrete-time survival trait. This method can describe physiological or decision processes that occur in a sequential order, which means that for observing a given stage, it is necessary to have passed through all previous stages. Furthermore, this method can incorporate time-dependent covariates and censoring, and it accounts for what occurred in the previous stages increasing 393 reliability of estimates. The sequential threshold model does not need a twostep approach to model both traits jointly, and is suitable to implement animal models.
The objective of this research was to implement the sequential threshold model in bivariate analyses for survival (using animal models) with milk production or fertility traits for estimation of genetic parameters.

Data
Data were provided by the regional Holstein Associations from the Basque and Navarra autonomous regions of Spain. Milk yield and reproductive data from 1995 through 2005 were used in the analyses. Records from cows born between 1995 and 2001 were included in the analyses. Cows with first calving before 18 months or after 40 months of age were excluded. At least 25 lactation records were required per herd, and data form herds with an average of less than 1.5 inseminations to conception were discarded. The edited data set contained 96 642 lactation records from 41 170 cows, and the pedigree file contained 85 815 animals.
Survival was observed for the first five lactations. If a cow survived more than five lactations, it was coded as 1 (survived to next lactation). The respective 305-d standardized milk yield (MY305), days open (DO) and number of inseminations to conception (INS) were matched with each lactation. Fertility records were considered as censored if no subsequent calving was reported, or if the outcome of a given insemination was unknown. The last known insemination was considered as the censoring point for INS. In addition, if pregnancy was not achieved after the fifth insemination, INS was considered as censored at that point. DO was calculated as days between one calving and the subsequent one, minus gestation period (assumed to be 282 d). If the subsequent calving was not registered, DO was computed as days between calving and the last insemination record, and assumed to be censored. An indicator variable tagged each cow and lactation as being either pregnant or censored. For convenience, censoring for DO and INS was assumed independent and noninformative; however, the probability of censoring is debatably higher in less fertile cows. Some cows were not inseminated during their last lactation because those cows were scheduled to be culled beforehand. Hence, DO and INS records within those lactations were considered missing values, and data augmentation was implemented as in Carriquiry et al. [4]. A summary of the data is given in Table I. Cows stayed an average of 2.06 lactations in herds. The average culling rate was 40% per lactation, ranging between 24% in first lactation and 77% in fifth lactation. Average MY305 was 9025 kg. Mean DO and INS for observed records were 130 d and 1.8 inseminations, respectively. Average censored time for DO was 194 d postpartum, whereas censored records for INS occurred at an average of 2.68. The data set contained 31% and 30% right censored records for DO and INS, respectively.

Bivariate models
A sequential threshold model described by Albert and Chib [1] was used to analyze the number of lactations per cow that occurs in a sequential order. This means that for an observation to be present at a given lactation, it must have survived through all previous lactations. Suppose that L lactations can be observed for a cow. The lactation l can only be observed after l − 1 lactations have been reached. A dichotomous variable, indicating if a subsequent lactation is reached (1) or if the cow is culled (0), can be modeled. The probability of surviving until lactation l, conditional on the event that the lth lactation has been reached, is given by where γ = (γ 1 , ..., γ L−1 ) are unordered cutpoints and X i θ i represents the explanatory effects of the covariates. This probability function is referred to as the discrete-time hazard function [23,32,33]. The sequential model can formulate the probability to survive in terms of latent variables (Chang et al., not published). Corresponding to the observation y i = L lactations, we define L latent variables {ω il }, and we observe: Thus, associated with the latent variable ω il in lactation l, we can implement Bayesian bivariate models to analyze genetic relationship for functional discrete-time survival (SURV), measured as the probability of surviving to next lactation, jointly with MY305, DO or INS (measured as linear traits), as follows: data on all individuals is y, where y i = (y i1, , y i2 ) is the observed records is the pair of records on individual i at lactation l, being ω il the unobserved latent variable indicating probability of being culled or survival to next lactation, and η i is the observed or augmented record for MY305, DO or INS in the current lactation. Note that time-dependent covariates for survival may be considered, changing at the beginning of each lactation, affecting ω il of animal i at lactation l. A linear model was adopted for DO and INS, allowing for censoring as described by González-Recio et al. [17]. A vector δ denoted censored (0) or observed (1) records and tagged each of the N observations for both fertility traits. Thus, if the observation is censored, η i corresponds to the augmented value of the trait, given the parameters. The censoring point C i for INS was the value of the last, presumably unsuccessful, insemination observed. In these cases, the date of the last insemination was considered as the censoring point for DO. Data augmentation [4,28] was used to generate latent data for censored observations. Missing values were augmented according to the procedure referred above, between −∞ and +∞.
The augmented posterior distribution follows the specification described next: let W = (ω, η) be the complete data vector of augmented or observed values for SURV and the second trait (MY305, DO or INS), such that the sampling model is: W γ, β, p, a, R ∼ N(γ + Xβ + Z p p + Z a a, R ⊗ I), is the residual covariance matrix between the two traits, and I is an identity matrix of order equal to the number of records in the data set. Residual variance for SURV was set to 1. Note that the cutpoint vector can be incorporated at the right hand side of the equation of the sequential threshold model.
The systematic effects (β) in the model above were the following: lactation number (five levels) for MY305, DO and INS; effects of milk production level (five levels: <5000, 5000 to 6100, 6100 to 7300, 7300 to 8600, and >8600) for SURV (as time-dependent covariate), DO and INS, calendar month of calving (12 levels) (as time-dependent covariate for SURV), herd (517 levels), and effects of year-season of calving (16 levels, two seasons per year: October-March and April-September) (as time-dependent covariate for SURV). Other effects in the model were the following: p = permanent environmental effect of the cow for MY305, DO and INS (41 672 levels) and a = additive genetic effect of animal (85 815 levels). Furthermore, X, Z p and Z a were the corresponding incidence matrices of systematic effects, permanent environmental effect and additive genetic effects, respectively. Discrete-time survival, fertility and production 397 The following prior distributions were assumed: where, σ 2 p is the cow permanent environmental variance of milk or fertility is the 2 × 2 (co)variance matrix of the additive genetic effect, and A is the additive relationship matrix among animals. Independent inverse Wishart prior distributions was assigned to matrix G, whereas σ 2 p was sampled using the Metropolis-Hasting algorithm with a bounded uniform (0, 1) prior. In this sequential threshold-linear model, the prior distribution of the residual variance of the Gaussian trait (σ 2 e2 ) was an inverted chi-square, whereas that of the residual covariance was a bounded uniform − σ 2 e2 , σ 2 e2 . Assuming that conditionally on model parameters θ = (γ, β, p, a, σ 2 p , G, R) censoring for DO and INS is non-informative and independent, the augmented posterior distribution is given by: Draws from the posterior distribution of heritability were formed as Posterior distributions of the parameters of interest were estimated using a Gibbs/Metropolis algorithm [13,26,27]. Visual inspections of the running means and trace plots were used to assess the number of iterations and the required length of burn-in. The analyses were based on a single chain of 100 000 iterations, with the first 10 000 samples discarded as burn-in.

RESULTS AND DISCUSSION
Heritability posterior means for SURV, MY305, DO and INS are shown in Table II, and the histograms of the heritability posterior distribution are shown in Figure 1. The MY305 heritability estimate was 0.27 ± 0.01, whereas SURV and fertility heritability estimates were lower. Heritability of MY305 typically ranges from 0.20 to 0.33 in the scientific literature [8,20,36]. SURV had a heritability estimate on the liability scale of 0.12 ± 0.01 which is higher than that estimated on the original scale by Chirinos et al. [6] in a Spanish Holstein population and lower than heritability reported by Ducrocq [10], both using survival analysis. The heritability estimate of DO was 0.08 ± 0.01, similar to that reported by Veerkamp et al. [34]. González-Recio et al. [17] estimated a heritability of 0.07 for this trait in the same population using a linear censored model. The heritability of INS had the lowest estimate (0.03 ± 0.01), indicating a higher environmental effect on this trait. Authors using the threshold model found higher heritability for INS on the liability scale [5,16] than that estimated by other authors on the observed scale [14,20,34].
Antagonist genetic correlations between SURV and the remaining traits were estimated (Tab. III). MY305 and INS had posterior means for genetic correlation with SURV of, approximately, −0.53. SURV had a genetic correlation with DO of −0.78. Standard deviations of the posterior distributions for genetic correlation estimates ranged between 0.04 and 0.07, and Monte Carlo errors were lower than 0.01. Antagonism between fertility and production has been previously estimated [2,15,20], reporting deteriorated fertility (both INS and DO) with higher yield. There have been few studies of the genetic associations between survival and fertility. Tsuruta et al. [31] reported negative genetic correlation (−0.36) between herd life and days open, and  Vollema et al. [35] reported antagonistic genetic correlation of −0.15 between longevity and non-return rate at day 56 postpartum, as a fertility trait. Figure 2 shows the histograms of the genetic correlation posterior distribution between SURV and the remaining traits. The posterior distributions involving INS were slightly right skewed, probably due to the categorical nature of this trait.
Summary statistics for the residual, additive and permanent environmental variances of Gaussian traits are shown in Table IV. The Monte Carlo error for those posterior distributions was within expected ranges. Residual correlation between the Gaussian traits and SURV was generally low, ranging from −0.22 for DO and 0.02 for MY305 (Tab. IV).
This study showed that SURV is genetically correlated with production and fertility. Cows with higher production had shorter lives in the herd and less fertile cows were culled earlier in life. These estimates suggest that survival is greatly affected by production and fertility. High yielding cows were less liable to get pregnant, needed more inseminations to conception and had longer DO, which usually lead to culling non-pregnant cows towards the end of lactation [24]. Short productive life increases rearing cost and is less profitable for the dairy industry, even with high milk yield as shown in González-Recio et al. [15]. Thus, proper fertility with an adequate production level is a challenge for the dairy industry.
A higher genetic correlation (in absolute values) of SURV with DO than with INS may be due to difficulty in the detection of heat in high yielding cows, which influence DO but not INS, and this could be a culling reason in some herds. Furthermore, dairymen may be more concerned in reducing DO, culling cows with larger calving-conception intervals, and giving less emphasis to the number of inseminations that were applied to cows. It is noteworthy that INS and DO are correlated but they are not the same trait, and missing heats or delaying first insemination affect DO but not INS. Augmented censored data The method presented in this study allows for including functional longevity in multivariate analyses to estimate genetic (co)variance among different traits. Therefore, farms and AI studs may maximize lifetime profitability in dairy cattle, enlarging herd life and reducing cost and labor on farms.
Genetic analyses between one discrete-time survival and one linear trait were carried out using bivariate censored linear -sequential threshold models. Genetic correlations of SURV with other traits of interest were estimated with field data and no two-step approach was needed. The method presented in this paper may handle large data sets, and an animal model may be implemented for evaluation of functional longevity considering time-dependent covariates, although the animal model increases computation time. However, sire models could be used as well, if the data sets are too large or unproper data structure exists [22]. Further studies considering this method are needed to improve knowledge on its behavior.