Comparison of methods for analysis of selective genotyping survival data

Survival traits and selective genotyping datasets are typically not normally distributed, thus common models used to identify QTL may not be statistically appropriate for their analysis. The objective of the present study was to compare models for identification of QTL associated with survival traits, in particular when combined with selective genotyping. Data were simulated to model the survival distribution of a population of chickens challenged with Marek disease virus. Cox proportional hazards (CPH), linear regression (LR), and Weibull models were compared for their appropriateness to analyze the data, ability to identify associations of marker alleles with survival, and estimation of effects when all individuals were genotyped (full genotyping) and when selective genotyping was used. Little difference in power was found between the CPH and the LR model for low censoring cases for both full and selective genotyping. The simulated data were not transformed to follow a Weibull distribution and, as a result, the Weibull model generally resulted in less power than the other two models and overestimated effects. Effect estimates from LR and CPH were unbiased when all individuals were genotyped, but overestimated when selective genotyping was used. Thus, LR is preferred for analyzing survival data when the amount of censoring is low because of ease of implementation and interpretation. Including phenotypic data of non-genotyped individuals in selective genotyping analysis increased power, but resulted in LR having an inflated false positive rate, and therefore the CPH model is preferred for this scenario, although transformation of the data may also make the Weibull model appropriate for this case. The results from the research presented herein are directly applicable to interval mapping analyses.


INTRODUCTION
Genetic association analyses are becoming a common approach in animal breeding to identify genes or genomic regions that affect quantitative traits (e.g., [17,29,33]). Most analyses utilize statistical models that assume normality of phenotypes [8,9,12,14,32]. Many phenotypic traits of interest in agriculture, however, do not follow a normal distribution (e.g. ordinal traits such as conformation scores, binary traits such as calving/not calving, or time to success/failure traits such as survival time).
Many agriculturally important traits follow a survival distribution (e.g. survival after infection, length of productive life, or days open). To analyze such traits, the Weibull and Cox proportional hazards models are commonly employed [1,2,4,6,7,11,16,23,25,27]. The Weibull model, a generalization of the exponential model, is parametric and is therefore appropriate for only specific distributions; the Cox model is a rank-based semi-parametric method and, therefore, should be appropriate for all distributions for which the hazards between groups are proportional [26]. Both models can appropriately handle datasets that include data from individuals without a recorded time of death, i.e., censored data.
Another distributional difficulty that is often encountered in marker/phenotype association analyses is created by the use of selective genotyping. Selective genotyping is a method to reduce the costs of an experiment by genotyping only individuals from the extremes of the phenotypic distribution [14,15]. Individuals from the phenotypic extremes provide most of the power for a marker-trait analysis, so power can be maximized for a limited number of genotypes by selective genotyping. This, however, also causes the distribution of phenotypes to be non-normal. When the phenotype of all individuals in the experiment is normally distributed, using linear regression to analyze selective genotyping data results in the effects being overestimated [14]. Darvasi and Soller [3] and Ronin et al. [24] derived an approximation to correct for this bias when the complete phenotypic data follows a normal distribution. Maximum likelihood can also be used to appropriately analyze such data [14,20], if the proper distribution can be identified. Henshall and Goddard [10] showed that logistic regression of genotype onto phenotype also resulted in unbiased estimates of effects; however, the original phenotype must be normally distributed for this method as well.
Moreno et al. [19] used simulation to compare Cox, Weibull, and Gaussian models for QTL analysis with a survival phenotype. They found that when the data are censored at a fixed date, the survival models improved power of QTL detection and the accuracy of estimates of QTL location and effect. They did not, however, evaluate the models for analysis of selective genotyping survival data, which was the objective of the current study.
McElroy et al. [18] used the Cox proportional hazards model and linear regression to identify markers associated with survival in selectively genotyped layer chickens infected with Marek disease virus. Using simulation under the null hypothesis of no QTL, they found that both linear regression and the Cox proportional hazards model resulted in valid false positive rates for tests of association. They did not, however, compare the power of these two models, nor did they evaluate the use of a Weibull model. Because the Weibull model is a parametric survival model, it should be more powerful that the Cox model if the data follow a Weibull distribution. Therefore, the objective of this study was to compare the validity of false positive rates and the power of employing the Weibull model, the Cox proportional hazards model, and the linear regression model to analyze QTL associations under full and selective genotyping of survival data.

Data simulation
Survival data were simulated for a backcross under the null hypothesis of no QTL, or the alternative hypothesis that a QTL affecting survival resides at the marker under analysis. To mimic real data, the simulated data were generated to represent actual data from the backcross population of Marek disease viruschallenged layer chickens described by McElroy et al. [18]. Survival times in the real experimental population over the recording time ranged from 33 to 140 d (Fig. 1). Survival times followed a right-skewed distribution with a mean of 65.5 d, a median of 59.0 d, and a standard deviation of 23.9 d. Twentyeight individuals (4.3%) survived to the end of the study and were considered censored in the analyses. A survival function (S (t) KM ) was estimated from the real data using the non-parametric Kaplan-Meier estimator [13]: where j is the rank of a particular day (t) among all chronologically ordered days in which death occurred, k is the rank of the day in which the survival function is being evaluated, n j is the number of animals still alive on day j, and d j is the number of animals dying on day j. This resulted in an estimate of S (t) KM for each time of death (t). Survival data for 700 backcross animals were simulated following methods described by Moreno et al. [19] by generating a survival probability, S 0 (t) i , for where S (t) i is a draw from a random uniform [0,1], β s is the simulated QTL allele substitution effect [5], and x i indicates the QTL allele received from the F1 parent, which was either zero or one and was randomly selected from a Bernoulli distribution with a 50% chance of getting either allele. The value of S 0 (t) i obtained for an animal was then cross-referenced with the estimated Kaplan-Meier function obtained from the real data to get the corresponding time of death. To simulate selective genotyping, only the 20% short (140 animals) and 20% long (140 animals) surviving individuals were considered for analysis and their genotype for the marker (QTL) was assumed known.
The simulation was performed for each of four allele substitution effect levels of the QTL: β s = 0, 0.1, 0.2, or 0.3. The largest effect was chosen to be 0.3 because all analyses had high power for this effect. It should be noted that a larger β s results in an increased risk of death and, therefore, corresponds to a more negative effect on days of survival. Two additional censoring scenarios (0 and 20%) were also considered, in addition to the 4.3% censoring that was present in the real data. For the no censoring scenario, the Kaplan-Meier function from the real data set was extended past day 140 by assuming the same risk of dying for each day past 140 as the average risk from day 101 and 139. For 20% censoring, day 77 was considered as the last day of the study.

Models of analysis
Simulated data were analyzed for associations of marker and phenotype using the Weibull, Cox proportional hazards, and linear regression models. The linear regression model (e.g., [28]) was: where T i is the survival time of animal i, in days; β LR is the increase in the mean survival time associated with inheriting allele 1 versus allele 0; p(Q) i is the probability of inheriting one versus the other QTL allele for animal i; µ i is the average survival time when p(Q) i = 0; and ε i is the residual for animal i.
The Cox model used, following Smith [25], was: where S (t; n) i is the probability that animal i survived at least until time t, where β PH is the allele substitution effect of the QTL on the natural log of the hazard ratio for the Cox model, and p(Q) i is as defined previously. The Weibull model [30] used was: where S (t) i is the probability that animal i survived at least until time t, θ is the shape parameter of the Weibull distribution, α is the scale parameter of the Weibull distribution, β W is the allele substitution effect of the QTL on the natural log of the hazard ratio for the Weibull model, and p(Q) i is as defined previously. For all models, p(Q) i = 0 or 1 for genotyped individuals, depending on which QTL allele they inherited. Selective genotyping data were analyzed in two ways: (i) including only genotypic and phenotypic data from genotyped individuals (SG) in the analysis, and (ii) also including phenotypic data from the ungenotyped individuals (SGI) in the analysis. For the SGI scenario, p(Q) i = 0.5 for ungenotyped individuals because, with unknown genotypes, these individuals were equally likely to have received either allele. These data were then analyzed with the three models of interest. For the 4.3 and 20% censoring scenarios, animals dying on day 140 and 77, respectively, were considered censored for the Weibull and Cox models, and to have died on day 140 and 77 for the linear regression model.
The proper method for including ungenotyped individuals in the survival models, in which the allele inherited is unknown, is to use the average of the hazards for the two possible alleles in the partial likelihood. However, standard statistical packages do not permit modification of the likelihood equations. To allow implementation of standard methods, p(Q) i = 0.5 was used as the explanatory variable for the survival models as an approximation of the average of the hazards for the two possible alleles in the partial likelihood. This method is discussed in detail in the appendix of McElroy et al. [18] for the Cox model. They found that, under the null hypothesis of no QTL effect, the estimated effects are slightly biased away from zero when the approximation is used, but the standard error of the estimated effects is also higher, resulting in an appropriate type I error rate.
False positive rates (assuming no QTL effect) and power of the analyses were calculated based on 1000 replicates for each allelic effect level for each of two significance levels (P < 0.05 and 0.01) for the Full (all individuals genotyped), SG, and SGI genotyping scenarios. A two-tailed binomial test was used to determine if false positive rates differed significantly from expected (i.e. the α level) under the null hypothesis and a two-tailed Fisher exact test was used to determine if the power of the alternate models differed significantly.
The Weibull model for all genotyping scenarios and the linear regression model for the SGI scenario resulted in inflated false positive rates. Although transforming the data to fit a Weibull distribution could have made the Weibull model appropriate for analysis in the Full genotyping scenario and may have improved the performance of the Weibull model in the SGI scenario, no transformations of the data were performed for the current study because transformation would not have addressed the SG scenario, which was the main emphasis of the study. Instead, for the Weibull analyses and the linear regression analyses in the SGI scenario, empirical thresholds with appropriate false positive levels were derived and used to compute power for each level of censoring. Empirical thresholds were obtained from 10 000 replicates of data simulated with no QTL effect.
Average allele substitution effect estimates and their variances across each set of 1000 replicates were also calculated. Estimated effects from linear regression describe the average difference of survival in days between the two QTL alleles, and estimates from the Cox and Weibull models describe the allele substitution effect on the natural log of the hazard ratio. To make the effect estimates from all three models directly comparable, the mean difference of survival in days between groups inheriting one versus the other QTL allele was estimated for the Cox and Weibull models for each replicate. For the Weibull model, estimates in terms of the hazard ratio can be directly converted into estimates in days by: where E(T) is the expected mean days of survival, Γ is the gamma function, and p(Q) i , α, β W , and θ are as described previously. The mean difference in days was calculated by finding the difference between E(T) for p(Q) i = 0 and 1. To obtain estimates in days from the Cox proportional hazards model, the difference in the means of the survival functions for p(Q) i = 0 and 1 was calculated across all times using the Cox proportional hazards estimates for β PH . For this estimation, censored individuals were assumed to have died on the last day of the study. Estimates from the three models were compared based on their correlations, magnitudes, and standard errors. The magnitudes of effect estimates from the three models were compared by using a t-test assuming unequal variances. All simulations and analyses were performed using the R [22] statistical package. Table I shows the false positive rates for the Weibull, Cox, and linear regression models for α = 0.05. False positive rates for the Cox model were not significantly (P > 0.05) inflated for any genotyping scenario. False positive rates for linear regression were not significantly (P > 0.05) inflated in the full and SG cases, but were significantly (P < 0.05) inflated for every censoring level when SGI was used. The Weibull model had a significantly (P < 0.05) higher number of false positives than expected for every censoring and genotyping scenario. Significance of differences from the expectations for the three models with α = 0.01 were similar to those when α = 0.05.

Power between models
Powers of the Cox, Weibull, and linear regression models for different effects and censoring for α = 0.05 are shown in Figure 2. Although not always Linear regression tended to have higher power than the Weibull model with the SGI scenario and 0 and 0.04% censoring. The results for the power between models presented in Figure 2 are for α = 0.05, but trends were similar for α = 0.01. 3 P-value levels determined empirically for these analyses. 4 Bars with different letter superscripts (a, b, or ab) within scenario, censoring, and effect are significantly different at P < 0.05. No letter indicates no significant differences (P > 0.05).

Power among genotyping scenarios
Although not always significant, definite trends in differences of power were apparent when comparing genotyping scenarios (Tab. II). For all three models, across QTL effects and censoring, the Full genotyping scenario had more power than SG and SGI, and SGI had more power than SG. Table II shows results for α = 0.05 but trends were similar for α = 0.01.

Effect estimates
For the Full genotyping scenario, estimates of allele substitution effects on the ln(hazard ratio) scale from the CPH model were never significantly (P > 0.2) different from the simulated values (0.1, 0.2, or 0.3) (data not shown). For the Cox model, the correlation of estimates on the ln(hazard ratio) scale with converted estimates in days (within censoring level) was less than -0.99 for all scenarios, so the Cox estimates in days were adequate to compare estimates in days from all models with the expected estimates that were in terms of the ln(hazard ratio) within censoring level. Effect estimates from the Cox model for the SG and SGI scenarios with non-zero simulated effects were significantly (P < 0.05) larger than expected (i.e. 0.1, 0.2, and 0.3), but effects were less overestimated for the SGI scenario.
Within a censoring and genotyping scenario, mean estimates from all three models were highly correlated (greater than 0.99). For a given model, correlations of mean effect estimates (in days) from different genotyping scenarios were also greater than 0.99. The rest of the results will only consider simulated effects >0. For the Full genotyping scenario, mean estimates from the Cox and linear regression models did not differ significantly (P > 0.05), except for the 0.2 censoring, effect = 0.3, case (Tab. III). For the Full scenario, mean estimates from the Weibull model were significantly (P < 0.05) greater than mean estimates from the Cox model. For the SG and SGI scenarios, mean estimates from the linear regression model were significantly (P < 0.05) larger than mean estimates from the Cox model for all cases. For the Full and SG scenarios, mean Weibull estimates were significantly (P < 0.05) larger than mean estimates from linear regression, with the exception of the SG, no-censoring scenario. For the SGI scenario with low censoring (0 and 0.04), mean estimates from linear regression were significantly (P < 0.05) larger than Weibull estimates. However, for the SGI high-censoring (0.2) scenario, mean estimates from the Weibull model were significantly (P < 0.05) larger than mean estimates from linear regression.
For all three models, standard errors of the means of the effect estimates were less for the Full scenario than for SG and SGI (Tab. III). For the Cox and Weibull models, standard errors were greater for the SG scenario than for the SGI scenario but they did not differ for the linear regression model.   3 Numbers with different letter superscripts (a, b, or c) within a row and genotyping scenario (Full, SG, SGI) are significantly different at P < 0.05. No letter indicates no significant differences (P < 0.05) within a row and genotyping scenario. 4 Standard errors were equal across effects within model and genotyping scenario.

False positive rates
Significance tests under the Cox model were robust to the distribution of phenotypes that was included in the analysis. The Weibull model had significantly (P < 0.05) inflated false positive rates for all scenarios, indicating that the data did not fit a Weibull distribution. This was confirmed by significance (P < 0.01) of a lack of fit test of the survival distribution from the real population to the Weibull distribution using the Cramer-von Mises W test [21]. Tests based on linear regression were robust to the lack of normality in the Full and SG scenarios, but had an inflated false positive rate for the SGI scenario. Further research is needed to understand why the addition of phenotypes of the ungenotyped individuals causes linear regression to have an inflated false positive rate using this particular dataset and to determine if this inflated false positive rate occurs with datasets having different distributions.
In their comparison of models for the analysis of a non-selectively genotyped survival dataset, Moreno et al. [19] first transformed their data to better fit a Weibull or Gaussian distribution. With the transformed data, Moreno et al. [19] found that the Weibull and the Cox models had similar performances. Transformation of our data would likely have resulted in similar results for our Full scenario but for the SG scenario, the transformed dataset would still be statistically inappropriate for the Weibull model, unless all phenotypic records were included in the analysis. The effects of transforming the data on the selective genotyping scenarios are unknown.

Power differences between models
Differences in power between models varied between genotyping scenarios and censoring levels. In the Full scenario, the Cox model had higher power than the linear regression and Weibull models. This is likely because the Cox model is more appropriate for this distribution. The linear regression model had less power than the Cox and Weibull models when censoring was 20%. This was expected because the censored individuals were considered as dying on the last day of the study in the linear regression model, and therefore the extreme long-surviving individuals (which are more likely to have the favorable genotype) were grouped with less extreme long-surviving individuals that would be expected to have a lower frequency of the favorable genotype. This situation reduces the mean phenotypic difference between individuals with the favorable allele and individuals with the unfavorable allele, and thereby reduces power.
In contrast to Moreno et al. [19], we found significant differences in power between the Cox and the Weibull models in the Full scenario. The discrepancy between the two studies is likely due to the fact that the data were transformed to fit a Weibull distribution in the Moreno et al. [19] study, whereas the data were not transformed in the present study.
For the linear regression analyses, Moreno et al. [19] either excluded censored individuals or included them as dying on the last day of the study, as in our study. They found that differences in power between the survival models and both linear regression analyses were small when censoring was low or random, but increased as censoring increased. When censoring was greater than 20% and at a fixed date, linear regression resulted in much lower power than the survival models when censored individuals were excluded. Inclusion of censored individuals as dying on the last day resulted in similar power as the survival models at 20% censoring, but in substantially lower power at 40% censoring. The current study included censored individuals in the linear regression analyses and only examined censoring up to 20%. The differences in power we observed between the linear regression model and the Cox model would likely have been much larger if the censored individuals were excluded from the linear regression analyses or at 40% censoring.

Power differences between genotyping scenarios
For all models and scenarios, power was somewhat (usually less than 10%) lower for the SG and SGI scenarios than for the Full scenario. This is expected because for a normally distributed phenotype, it has been shown that, for a fixed number of genotyped individuals, SG is more powerful than Full [3,14,15], but when genotyping is not constrained, Full is more powerful than SG. This is because the SG scenario excludes some of the information that is used in the Full scenario, as does the SGI scenario. For the simulated survival data, SGI had slightly (usually less than 5%) higher power than SG for all models and scenarios. To determine which individuals are the most extreme for selective genotyping, phenotypes from the entire population must usually be obtained, so phenotypes of ungenotyped individuals should routinely be available for inclusion in a selective genotyping analysis to increase power.

Effect estimation
For the Full scenario, estimates of the hazard ratio from the Cox model were not significantly different from expectations (P > 0.05; data not shown). This is likely because the survival data were appropriately distributed for the Cox model. The linear regression model also estimated effects accurately in the Full scenario when compared to estimates from the Cox model after conversion to days, indicating the robustness of linear regression to deviations from normality. Moreno et al. [19] also found little differences between estimates from the Cox and linear regression models when censored individuals were included in the linear regression analysis as dying on the last day of the study. The Weibull model overestimated the effects in the Full scenario by approximately 50%, likely due to lack of fit of the data to a Weibull distribution.
Even though in the full scenario the estimates of the hazard ratio from the Cox model were not significantly different from the expected, the estimates in days were smaller when censoring was 20% for all models (Tab. III). With censored data, the actual length of survival of the animals is unknown. The effect in days from linear regression is the difference in the means of the two allelic groups, and it is only known that censored individuals survived at least to the date at which they were censored. The difference between the two allelic groups must get smaller since the length of survival time is reduced for the long surviving allelic group as a result of increased censoring from shortening the length of the study.
For the SG scenario, it is well known that linear regression overestimates effects when the phenotypes are normally distributed, and methods have been devised to correct for these biases [3,20,24]. The upward bias of estimates from linear regression is due to a positive correlation between QTL effects and residuals when only extreme phenotypes are included in the analysis [3]. The Cox and Weibull models also overestimated effects in the SG scenario by approximately 100%, although the Cox model did so significantly (P < 0.05) less than the Weibull and linear regression models. For the SGI scenario, linear regression overestimated the true effects slightly more than for the SG scenario, and the Cox and Weibull models overestimated the true effects slightly less. Skewness of the distribution may cause estimated effects from linear regression to increase when including the ungenotyped individuals with a p(Q) i = 0.5, since the mean phenotype of these individuals will not fall on the regression line between the mean phenotypes of the two extremes. With a normally distributed phenotype, the mean of the ungenotyped individuals, when given the mean genotypic probability (0.5), is expected to fall on the regression line between the means of the two phenotypically extreme groups, and therefore should not change the slope of the regression line. Inclusion of non-genotyped individuals in a selective genotyping analysis has been examined by several authors [10,14,31] as a way to reduce biases in effect estimates. However, all of these studies assumed a normal distribution. Further investigation into the proper statistical models for including non-genotyped individuals in analysis of non-normal, selectively genotyped data is needed.

Conclusions
Within the scope of the observed data distribution, the Cox model performed better than the linear regression and Weibull models across the tested genotyping scenarios and censoring levels, without the need to derive empirical significance thresholds. This was likely because the distribution of the data met the assumptions of the Cox model better than that of the other two models. However, survival models are much more computationally demanding than linear regression and the effect estimates from survival models are much more difficult to interpret than the effect estimates from linear regression. This, combined with the fact that the linear regression and Cox models had similar power, suggests that with little censoring and Full or SG genotypic scenarios, linear regression can be the model of choice, since false positive rates for linear regression were valid for these two scenarios. The Weibull model had an inflated false positive rate for all scenarios and linear regression had an inflated false positive rate for the SGI scenario, so significance thresholds had to be determined empirically, which is also computationally demanding.
In cases where selective genotyping is employed, more power can be obtained by including phenotypes from non-genotyped individuals in the analysis. These phenotypes should typically be available since they need to be recorded to determine which individuals are extreme. The Cox model was the only model that did not require derivation of empirical thresholds for SGI, and, therefore, may be the best model to use in this situation. Further research is needed to determine if the Cox model would also be appropriate for normally distributed traits with the SGI scenario.
The current study was based on single marker analysis, rather than employing interval mapping but the genotypic data were entered into the models as probabilities of marker genotypes. The genotypic data from an interval mapping analysis are also used to compute probabilities of genotypes at a specific locus. The methods explored herein should therefore also apply to interval mapping analyses but require further validation of model comparisons because probabilities will be different from 0 and 1 for all individuals.