Joint tests for quantitative trait loci in experimental crosses

Selective genotyping is common because it can increase the expected correlation between QTL genotype and phenotype and thus increase the statistical power of linkage tests (i.e., regression-based tests). Linkage can also be tested by assessing whether the marginal genotypic distribution conforms to its expectation, a marginal-based test. We developed a class of joint tests that, by constraining intercepts in regression-based analyses, capitalize on the information available in both regression-based and marginal-based tests. We simulated data corresponding to the null hypothesis of no QTL effect and the alternative of some QTL effect at the locus for a backcross and an F2 intercross between inbred strains. Regression-based and marginal-based tests were compared to corresponding joint tests. We studied the effects of random sampling, selective sampling from a single tail of the phenotypic distribution, and selective sampling from both tails of the phenotypic distribution. Joint tests were nearly as powerful as all competing alternatives for random sampling and two-tailed selection under both backcross and F2 intercross situations. Joint tests were generally more powerful for one-tailed selection under both backcross and F2 intercross situations. However, joint tests cannot be recommended for one-tailed selective genotyping if segregation distortion is suspected.


INTRODUCTION
Selective genotyping is a common approach used to enhance the efficiency of quantitative trait loci (QTL) mapping studies [13,25]. It employs an extreme threshold (ET) design, in which only a subset of individuals with extreme phenotypic scores is genotyped and analyzed. In an ET2 design, individuals are sampled from both tails of the phenotypic distribution (i.e., cases with unusually high and low values of the phenotype). ET2 designs have been shown to decrease uncertainty about the underlying QTL genotypes, yield valid false positive rates, and increase the statistical power per genotyped individual [1,7,13] because the expected correlation between genotype and phenotype generally increases [3]. For example, Allison [2] showed that the ET2 design increased the power of his TDT Q5. However, there is a trade-off between (a) increasing the correlation through extreme sampling and (b) reducing the overall statistical power due to the reduction in sample size. The association between genotype and phenotype has been the focus of tests in QTL mapping, including studies of experimental crosses. We refer to tests that evaluate whether the distribution of the phenotype (Y) is dependent on some function of the genotype (G) as regression-based tests.
It is also common for genetics researchers to use an ET1 design and sample from only one tail of the phenotypic distribution. The ET1 design is similar in concept to "case-only" designs often used in human studies [15]. However, ET1 designs decrease the power of regression-based tests due to a restriction of range [16]. It is important to note that, when (and only when) the null hypothesis is false, extreme sampling can also affect the marginal distribution of genotypes. That is, under the null hypothesis of no linkage, the marginal distribution of genotypes has the same expected frequencies regardless of the phenotypic value. For example, in an experimental BB × BD backcross, all offspring would be either BB or BD and these two genotypes would be equally likely, assuming no segregation distortion. Under the null hypothesis of no linkage, Y is not related to the genotype (G). Likewise, G is not related to Y, and the probability of sampling a case with either a BB or a BD genotype should be equal regardless of Y, P(BB|Y) = P(BD|Y) = 1 / 2, assuming no segregation distortion. Recognizing this, one can construct tests of linkage when ET designs are used by testing for departures from the genotypic distribution that would be expected under the null hypothesis. We refer to such tests as marginal-based tests. Lander and Botstein [13] have provided considerable detail on increasing the power of QTL mapping by selective genotyping of progeny with extreme phenotypes in backcross designs. Similar discussions that include F2 intercross designs appear in [5] and [20]. Nevertheless, marginal-based tests have been underutilized in the development of QTL mapping procedures for experimental crosses.
In this paper, we develop methods that capitalize on the information available in both regression-based and marginal-based tests of linkage for experimental crosses. We show that these tests are rarely less powerful and are usually more powerful than regression-based or marginal-based tests alone. Moreover, the tests we have developed are easily implemented in standard software, should be robust to non-normality, can be applied to either backcross or F2 intercross designs, and allow for extreme sampling with either ET1 or ET2 sampling. In developing these tests, we assume that there is no segregation distortion. However, we note that the marginal-based and joint tests rely crucially on this assumption, especially in ET1 designs. Therefore, we examine the statistical properties of these tests when segregation distortion is present. We also discuss how the tests herein should be used if segregation distortion is suspected.

INDIVIDUAL TESTS OF LINKAGE
Before proceeding further, it will be useful to define the specific tests of linkage that we employed (see Tab. I). We considered two types of experimental crosses: a backcross and an F2 intercross. Let the two parental strains be denoted BB and DD. Assume that the backcross is one between the BB strain and the BB × DD F1. Then, at each locus, progeny in a backcross can have either the BB or the BD genotype. Scoring these by the number of D alleles, the corresponding genotypic values would be G = 0 and 1, respectively. For the F2 intercross, BB, BD, and DD genotypes would be scored G = 0, 1, and 2, respectively.

Regression-based tests
The first two regression-based tests involve ordinary least squares (OLS) regression in which the phenotype (Y) is regressed on the genotype (G). R 1 treats G as a continuous variable and provides a 1 degree-of-freedom (df) test of the null hypothesis that the slope (β 1) equals zero:
Y = β 0 + β 1 G + ε. (1)
R 2 treats G as a categorical variable and, to allow for departures from additivity, provides a 2 df test of the null hypothesis that both slopes (β 1 and β 2) equal zero:
Y = β 0 + β 1 A + β 2 D + ε, (2)
where A and D are linear and quadratic polynomial contrast variables, respectively. We note that R 2 cannot be applied to backcross designs because there are only two genotypes and thus 1 df. For F2 intercrosses, however, both R 1 and R 2 can be applied.
Although OLS regression procedures can be used to estimate linkage parameters with selective genotyping, the estimates are expected to be biased, and thus a maximum likelihood procedure for obtaining unbiased estimates has been suggested [13]. For F2 intercrosses, we define R 3 and R 4 as the maximum likelihood procedure of Xu and Vogl [25] applied to the linear models (1) and (2), respectively. Briefly, this technique is a simple modification of the EM algorithm for assessing linkage under selective genotyping using only the phenotypic values of genotyped individuals.
The fifth and sixth tests depend on whether the experiment involves a backcross or an F2 intercross. For backcross designs, R 5 is calculated by regressing the genotype (G) on the phenotype (Y) using logistic regression [11] and testing the null hypothesis that the slope (β 1) equals zero:
ln [P(BD)/P(BB)] = β 0 + β 1 Y. (3)
This method was proposed for binary variables and thus is not generally applicable to F2 intercross designs. In the case of an F2 intercross, we define R 6 as multinomial regression with three categories for the response variable, which requires estimating two slopes and two intercepts:
ln [P(BD)/P(BB)] = β 0 + β 1 Y and ln [P(DD)/P(BB)] = γ 0 + γ 1 Y. (4)
Thus, R 6 is a 2 df test of whether both slopes (β 1 and γ 1) are equal to zero.
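As an illustration of the simplest regression-based test (R 1), the following is a minimal sketch in plain Python of an OLS regression of Y on G with a 1 df F test of the slope. The function name `ols_slope_test` and the toy data are hypothetical and are not taken from the original analysis; the sketch returns the F statistic and its residual df rather than a P-value.

```python
def ols_slope_test(g, y):
    """Regress phenotype y on genotype score g by OLS.

    Returns (slope, F statistic, residual df) for the 1 df test of slope = 0.
    """
    n = len(g)
    g_bar = sum(g) / n
    y_bar = sum(y) / n
    sxx = sum((gi - g_bar) ** 2 for gi in g)          # sum of squares of G
    syy = sum((yi - y_bar) ** 2 for yi in y)          # total sum of squares of Y
    sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y))
    slope = sxy / sxx
    ss_reg = slope * sxy                              # regression sum of squares
    ss_res = syy - ss_reg                             # residual sum of squares
    df_res = n - 2                                    # intercept and slope estimated
    f_stat = ss_reg / (ss_res / df_res)
    return slope, f_stat, df_res


# Toy backcross data: two BB (G = 0) and two BD (G = 1) progeny.
slope, f_stat, df_res = ols_slope_test([0, 0, 1, 1], [1.0, 2.0, 3.0, 4.0])
```

The F statistic would be referred to an F distribution with 1 and `df_res` degrees of freedom.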
Marginal-based tests
We defined six marginal-based tests, three for each ET sampling design. For ET1 designs, M 7 is defined as a single-sample t-test of whether the mean of G is different from its null expectation (µ G). Specifically, µ G = 1 / 2 in a backcross and µ G = 1 in an F2 intercross. As alternatives, we utilize chi-square goodness-of-fit tests. For the backcross, we define M 8 as a 1 df χ 2 test of G versus expected frequencies of P(G = 0) = P(G = 1) = 1 / 2. For F2 crosses, we define M 9 as a 2 df χ 2 test of whether the sample frequencies for G depart from the null expectation of P(G = 0) = 1 / 4, P(G = 1) = 1 / 2, and P(G = 2) = 1 / 4.
We note that these marginal tests rely heavily on the assumption of random segregation in the ET1 design; however, for an ET2 design, this is not necessarily the case. In family-based studies, test statistics that incorporate information from both affected and unaffected siblings are used to control for segregation distortion [22]. Likewise, for QTL studies the use of information from both ends of the distribution will control for segregation distortion [2]. Under the null hypothesis of no linkage, the marginal distribution of genotypes has the same expected frequencies regardless of the phenotypic value. Therefore, the upper and lower tails will have the same expected values of G (the same genotype frequencies) under the null hypothesis regardless of whether or not there is segregation distortion.
There are standard statistical tests that can be applied as marginal-based tests for an ET2 design. For either an F2 or a backcross design, we define M 10 as an independent samples t-test to assess whether the mean of G is equal for the upper and lower tails. As alternatives, we utilize chi-square tests of independence. For a backcross design, we define M 11 as a 2 × 2 (e.g., BB vs. BD by Upper vs. Lower) chi-square test with df = 1. For an F2 intercross, we define M 12 as a 3 × 2 (e.g., BB vs. BD vs. DD by Upper vs. Lower) chi-square test with df = 2.
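The goodness-of-fit tests for ET1 designs (M 8 and M 9) can be sketched as follows. This is an illustrative implementation only: the function names `chi2_sf` and `genotype_gof_test` and the genotype counts are hypothetical, and the P-value uses the closed-form upper-tail probabilities of the χ 2 distribution that exist for df = 1 and df = 2.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch handles only df = 1 or df = 2")


def genotype_gof_test(counts, cross):
    """Chi-square goodness-of-fit test of genotype counts against the
    frequencies expected under no linkage and no segregation distortion.

    counts: [n_BB, n_BD] for a backcross, [n_BB, n_BD, n_DD] for an F2.
    """
    expected_p = {"backcross": [0.5, 0.5], "f2": [0.25, 0.5, 0.25]}[cross]
    if len(counts) != len(expected_p):
        raise ValueError("number of genotype classes does not match the cross")
    n = sum(counts)
    stat = sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, expected_p))
    df = len(expected_p) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts among selected progeny.
bc_stat, bc_df, bc_p = genotype_gof_test([60, 40], "backcross")
f2_stat, f2_df, f2_p = genotype_gof_test([25, 50, 25], "f2")
```

Because the expected frequencies are fixed in advance, any segregation distortion at the locus shifts the observed counts and contributes to the statistic, which is why these tests depend on the random segregation assumption.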

JOINT TESTS OF LINKAGE
In the context of human IBD-based QTL mapping in sib-pair studies, Forrest and Feingold [8] provide proof that under the null hypothesis of no linkage, regression-based tests and marginal-based tests are independent. Therefore, one way to construct composite tests that capitalize on the information from regression-based and marginal-based test statistics is simply to sum them up and treat them as χ 2 with df equal to the sum of the df of the two tests being combined. We introduced joint tests that do not require the asymptotic independence of the tests, which we found to be more powerful than composite tests in preliminary studies.
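The composite construction described above (summing two independent χ 2 statistics and their df) can be sketched as follows. The helper names `chi2_sf` and `composite_test` and the input statistics are hypothetical; the upper-tail probability for integer df is obtained from the standard recurrence for the regularized incomplete gamma function, started from the closed forms for df = 1 and df = 2.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for a positive integer df."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    z = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(z)), 0.5   # closed form for df = 1
    else:
        q, a = math.exp(-z), 1.0              # closed form for df = 2
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1),
    # applied until a reaches df / 2.
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def composite_test(stat_a, df_a, stat_b, df_b):
    """Sum two independent chi-square statistics and their df."""
    stat = stat_a + stat_b
    df = df_a + df_b
    return stat, df, chi2_sf(stat, df)


# Hypothetical regression-based (1 df) and marginal-based (2 df) statistics.
comp_stat, comp_df, comp_p = composite_test(4.0, 1, 6.0, 2)
```

The validity of the summed statistic rests on the independence of the two component tests under the null hypothesis, which is the property the joint tests are designed not to require.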
We modified the Henshall and Goddard [11] approach, which reverses the position of dependent and independent variables in a regression model (i.e., regressing genotype on phenotype). Our modification involves constraining the intercept to have a pre-specified value based on expectations from the marginal distribution of the genotype given the experimental cross. Large test statistics reflect deviations from the null hypothesis of no association between G and Y and deviations from the genotype frequencies expected under the null hypothesis of no linkage. Thus, these methods provide joint tests of the null hypotheses for the regression-based and marginal-based tests. Sham et al. [21] present a similar approach in the context of human linkage studies.
To employ OLS regression, prior to regressing the genotype on the phenotype, one first centers G on its null expectation, G * = G − µ G, where µ G is 1 / 2 in a BB × BD backcross, 1 1 / 2 in a DD × BD backcross, and 1 in an F2 intercross. One can then center Y, Y * = Y − Ȳ, regress G * on Y *, and force the regression through the origin:
G * = β 0 + β 1 Y * + ε, with β 0 ≡ 0.
This offers a single-df test that will be sensitive to departures from both the null expectation of G * = 0 and the null covariance between G * and the phenotype. We denote this OLS-based joint test as J 13.
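A minimal sketch of the OLS-based joint test (J 13) follows. Several details are assumptions of the sketch rather than statements from the text: the function name `joint_ols_test` is hypothetical, Ȳ is taken to be the mean of all phenotyped progeny (passed in as `y_mean`) rather than the mean of the genotyped subset, and the residual df is taken as n − 1 because only the slope is estimated.

```python
def joint_ols_test(g, y, mu_g, y_mean):
    """Regress centered genotype on centered phenotype through the origin.

    g      : genotype scores of the genotyped progeny
    y      : their phenotypes
    mu_g   : null expectation of G (e.g., 0.5 for a BB x BD backcross)
    y_mean : phenotypic mean used for centering (assumed here to be the
             mean over all phenotyped progeny, not only those genotyped)
    Returns (slope, F statistic, residual df).
    """
    g_star = [gi - mu_g for gi in g]
    y_star = [yi - y_mean for yi in y]
    syy = sum(v * v for v in y_star)
    sgy = sum(a * b for a, b in zip(g_star, y_star))
    sgg = sum(v * v for v in g_star)
    beta1 = sgy / syy                 # no-intercept least squares slope
    ss_model = beta1 * sgy            # uncorrected model sum of squares
    ss_res = sgg - ss_model           # uncorrected residual sum of squares
    df_res = len(g) - 1               # assumption: one estimated parameter
    f_stat = ss_model / (ss_res / df_res)
    return beta1, f_stat, df_res


# Toy ET1-style sample: all selected phenotypes lie above the overall mean of 10.
j13_beta, j13_f, j13_df = joint_ols_test(
    g=[1, 1, 0, 1], y=[11.0, 12.0, 11.0, 12.0], mu_g=0.5, y_mean=10.0
)
```

With one-tailed selection every Y * has the same sign, so an excess of one genotype among the selected progeny pulls the through-origin slope away from zero even when G and Y are uncorrelated within the sample; this is how the marginal information enters the test.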
Given the sample sizes typically used in QTL mapping, OLS should be robust to the non-normality of residuals that will occur when G * is used as the dependent variable. Nevertheless, logistic regression offers an alternative that models the categorical nature of the genotypes and avoids the normality assumption. In the case of a BB × BD backcross we can simply regress G on Y * as in model (3), except that we constrain β 0 ≡ 0. This is because under the null hypothesis, β 1 = 0, and thus, ln [P(BD)/P(BB)] = β 0. Also, under the null hypothesis, P(BD) = P(BB) = 1 / 2, which implies that β 0 = ln [P(BD)/P(BB)] = 0. Thus, we define J 14 as the 1 df test that β 1 = 0 while restricting β 0 to be 0.
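The logistic joint test (J 14) can be sketched as a no-intercept logistic regression fitted by Newton-Raphson. The function names are hypothetical, and the use of a likelihood ratio statistic (rather than a Wald or score statistic) is an assumption of this sketch. The sketch does not guard against complete separation, in which case the slope estimate diverges.

```python
import math


def _log_lik(beta1, g, y_star):
    """Log-likelihood of the logistic model with the intercept fixed at 0."""
    total = 0.0
    for gi, yi in zip(g, y_star):
        eta = beta1 * yi
        # log(1 + exp(eta)) computed in a numerically stable way
        softplus = eta + math.log1p(math.exp(-eta)) if eta > 0 else math.log1p(math.exp(eta))
        total += gi * eta - softplus
    return total


def joint_logistic_test(g, y_star, max_iter=50, tol=1e-10):
    """Fit ln[P(BD)/P(BB)] = beta1 * Y* and test beta1 = 0 with 1 df.

    g      : 0/1 genotype scores (BB = 0, BD = 1)
    y_star : centered phenotypes
    Returns (beta1 estimate, likelihood ratio statistic, P-value).
    """
    beta1 = 0.0
    for _ in range(max_iter):
        grad = 0.0
        info = 0.0
        for gi, yi in zip(g, y_star):
            p = 1.0 / (1.0 + math.exp(-beta1 * yi))
            grad += yi * (gi - p)            # score contribution
            info += yi * yi * p * (1.0 - p)  # Fisher information contribution
        step = grad / info
        beta1 += step
        if abs(step) < tol:
            break
    lr = 2.0 * (_log_lik(beta1, g, y_star) - _log_lik(0.0, g, y_star))
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))  # chi-square, 1 df
    return beta1, lr, p_value


# Toy data: BD is more common above the phenotypic mean than below it.
toy_g = [1, 1, 1, 0, 0, 0, 0, 1]
toy_y_star = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
j14_beta, j14_lr, j14_p = joint_logistic_test(toy_g, toy_y_star)
```

Under the restricted null model the fitted probability is 1/2 for every individual, so the null log-likelihood is simply n ln(1/2).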

SIMULATION STUDIES
To demonstrate the validity of our joint tests with respect to Type 1 error rates and to evaluate their power relative to the marginal-based and regression-based tests, we conducted a variety of simulations. Table I provides a summary of the tests compared in these simulations. To evaluate Type 1 error rates, simulations were conducted under the null hypothesis of no linkage. To evaluate statistical power (i.e., one minus the Type 2 error rate), the basic model used in the simulations was that of a quantitative trait with a single major QTL. For the non-null situations, the proportion of the total phenotypic variance explained by the QTL was fixed at h 2 = 3%, 5%, 8%, or 11% in two separate sets of simulations for backcross and F2 intercross designs. Additive and non-additive (dominant) models were simulated. The residual within-genotype distribution was normal with a mean of zero and unit variance.
Type 1 and Type 2 errors were evaluated at a significance level of α = 0.0001. For simulations under the null model, 100 000 simulated datasets were used for each situation to ensure reasonable precision for an alpha level as small as 0.0001. For simulations under the alternative hypothesis, 10 000 simulated datasets were used for each situation. A total sample size of N = 500 progeny was used in all the simulations.
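The choice of 100 000 null replicates can be motivated by the binomial standard error of an estimated rejection rate. The short sketch below (the function name `mc_rate_precision` is hypothetical) computes the expected number of rejections and the Monte Carlo standard error for a true rate equal to α.

```python
import math


def mc_rate_precision(alpha, n_sims):
    """Expected number of rejections and binomial standard error of the
    estimated rejection rate when the true rate equals alpha."""
    expected_rejections = alpha * n_sims
    std_error = math.sqrt(alpha * (1.0 - alpha) / n_sims)
    return expected_rejections, std_error


null_expected, null_se = mc_rate_precision(0.0001, 100_000)
```

With α = 0.0001 and 100 000 replicates, about 10 rejections are expected and the standard error of the estimated rate is roughly 0.00003, so empirical rates can be distinguished from the nominal level only to within a few multiples of that value.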
Three sampling schemes were considered: (1) random sampling, in which all 500 progeny were analyzed; (2) selection from both tails of the phenotypic distribution (ET2 design), in which the 500 progeny were ranked with respect to their phenotypic values and the top 125 and bottom 125 (50% in total) or the top 50 and bottom 50 (20% in total) progeny were selected for genotyping and analysis; and (3) selection from one tail of the phenotypic distribution (ET1 design), in which the 500 progeny were ranked with respect to their phenotypic values and the top 250 (50%) or 100 (20%) progeny were selected for genotyping and analysis.
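The three sampling schemes can be expressed as a small selection routine. This is a sketch only; the function name `select_for_genotyping` and the scheme labels are hypothetical, and ties in the phenotype are broken by original order.

```python
def select_for_genotyping(y, scheme, n_select=None):
    """Return the sorted indices of progeny chosen for genotyping.

    scheme   : "random" (all progeny), "ET1" (upper tail), or "ET2" (both tails)
    n_select : total number genotyped; split evenly across tails for ET2
    """
    order = sorted(range(len(y)), key=lambda i: y[i])  # ascending phenotype
    if scheme == "random":
        return sorted(order)
    if n_select is None or n_select <= 0:
        raise ValueError("n_select must be a positive integer for ET designs")
    if scheme == "ET1":
        return sorted(order[-n_select:])
    if scheme == "ET2":
        half = n_select // 2
        return sorted(order[:half] + order[-half:])
    raise ValueError("unknown scheme")


toy_y = [3.0, 0.5, 2.0, 9.0, 1.0, 7.0]
et1_idx = select_for_genotyping(toy_y, "ET1", 2)
et2_idx = select_for_genotyping(toy_y, "ET2", 4)
```

For the designs in the text, `n_select` would be 250 or 100 with 500 phenotyped progeny.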
Because segregation distortion is often seen in crosses between inbred lines of both plants and animals, two conditions of allelic segregation were imposed. One condition is random segregation (no segregation distortion) where the probability of the offspring receiving the D allele during meiosis is 0.5. The second condition simulates segregation distortion where the probability of the offspring receiving the D allele during meiosis is 0.7.
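A backcross data set under this simulation model might be generated as sketched below. The parameterization is an assumption of the sketch: with unit residual variance and P(G = 1) = p, an effect a per D allele gives h 2 = a²p(1 − p) / [a²p(1 − p) + 1], which is solved for a. How the effect size was set under segregation distortion is not stated in the text, and the function names are hypothetical.

```python
import math
import random


def additive_effect(h2, p_d):
    """Effect of the BD genotype that yields variance proportion h2 when the
    residual variance is 1 and P(G = 1) = p_d (assumed parameterization)."""
    return math.sqrt(h2 / ((1.0 - h2) * p_d * (1.0 - p_d)))


def simulate_backcross(n, h2, p_d=0.5, seed=None):
    """Simulate genotypes (0 = BB, 1 = BD) and phenotypes for n progeny.

    p_d = 0.5 corresponds to random segregation; p_d = 0.7 to the
    segregation distortion condition. h2 = 0 gives the null model.
    """
    rng = random.Random(seed)
    effect = additive_effect(h2, p_d) if h2 > 0 else 0.0
    g = [1 if rng.random() < p_d else 0 for _ in range(n)]
    y = [effect * gi + rng.gauss(0.0, 1.0) for gi in g]
    return g, y


sim_g, sim_y = simulate_backcross(n=500, h2=0.05, p_d=0.5, seed=1)
```

Setting `h2=0` with `p_d=0.7` produces null data with segregation distortion, the condition under which the marginal-based tests are expected to over-reject.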

Type 1 error rate
Tables II and III show the Type 1 error rates of all tests at α = 0.0001 for the backcross and F2 intercross designs, respectively. These values serve as an evaluation of the conformity of the test statistics to their asymptotic distribution for relatively small sample sizes. Lander and Botstein [13] suggest that linear regression cannot be used when only extreme progeny have been genotyped because genotypic effects will be grossly overestimated because of the biased selection; however, this does not imply that the Type 1 error rate will be inflated. Our results confirmed this. For all tests considered, the empirical Type 1 error rates are very close to the nominal alpha indicating excellent conformity to the asymptotic distribution of the test statistics, when there was no segregation distortion.
When segregation distortion (P = 0.7) was simulated, the Type 1 error rates for the regression-based tests were essentially unaffected. By contrast, the Type 1 error rates for the marginal-based tests were severely inflated when either random sampling or an ET1 design was employed (see Tabs. II and III). For the joint tests developed for a backcross design, the Type 1 error rates were inflated when there was segregation distortion (P = 0.7) and one-tailed (ET1) sampling (see Tab. II). Similarly, for the joint tests developed for an F2 design, the Type 1 error rates were inflated when there was segregation distortion (P = 0.7) and ET1 sampling (see Tab. III), but there was also some inflation in the false positive rate under random and ET2 sampling for the joint test involving multinomial regression with fixed intercepts (J 15). The results for selective sampling of N = 250 were very similar and, for brevity, are not displayed.

Statistical power
In some cases, the empirical power reached the maximum of unity; however, the tests demonstrated low to moderate statistical power in many other cases. We note that the power curves for the maximum likelihood regression tests (R 3 and R 4) were so similar to those of their OLS counterparts that, for graphic clarity, we did not plot their results.

Backcross designs
Figure 1 shows that when there was no segregation distortion the regression-based and the joint tests had virtually identical power, whereas the marginal-based test had virtually no statistical power. When segregation distortion was present, the joint tests showed a slight power advantage over the regression-based tests. Figure 2 shows that with an ET1 design the joint tests demonstrated a considerable power advantage over the marginal-based tests, while the regression-based tests had minimal power due to restriction of range. However, this power advantage dissipated with the reduction of the sample size from N = 250 to 100. When segregation distortion was present, only the regression-based tests demonstrated acceptable Type 1 error rates under ET1 sampling and therefore were the only valid tests under these circumstances. For ET2 designs, the joint tests had power curves very similar to those of both the regression-based and the marginal-based tests. For ET2 designs, the results also indicate that most procedures, especially the regression-based and joint tests, had similar power curves when segregation distortion was present (results not displayed).

F2 Intercross designs
For an F2 intercross with no segregation distortion and random sampling of N = 500 progeny, the regression-based and the joint tests had virtually identical power, whereas the marginal-based test had virtually no statistical power. The OLS single-df tests (R 1 and J 13) demonstrated more power when there was an additive model. By contrast, when there was a dominant mode of inheritance, the OLS two-df test (R 2) and the multinomial regression tests (R 6 and J 15) demonstrated more power. Among the tests that maintained valid Type 1 error rates under segregation distortion, the OLS joint test (J 13) demonstrated considerably more statistical power than the regression-based tests (results not displayed).
Figure 3 displays the power curves for each test for an F2 intercross with an additive mode of inheritance and no segregation distortion. As was the case for the backcross, under an ET1 design, the joint tests demonstrated a considerable power advantage over the marginal-based tests, while the regression-based tests had minimal power. However, this power advantage dissipated with the reduction of sample size from N = 250 to 100, especially for the multinomial joint test (J 15). For ET2 designs, the joint tests had power curves very similar to those of both the regression-based and the marginal-based tests.
... no power (results not displayed). This result can be attributed to the fact that with one-tailed sampling we chose the upper tail of the phenotypic distribution. Under a dominant mode of inheritance, the heterozygote (BD) and the homozygote (DD) are expected to have the same average phenotype, which is greater than the average phenotype of the other homozygote (BB). By selecting the upper tail of the phenotypic distribution one may end up comparing two genotypes with the same expected value.
For an F2 intercross with additive and dominant modes of inheritance and segregation distortion, J 13 had similar power curves to other procedures and had more power than the other procedures under an ET2 design and a dominant mode of inheritance. Again, this power advantage dissipated when the sample was reduced to N = 100 (results not displayed).

DISCUSSION
All tests held the Type 1 error rate reasonably near the nominal α under random segregation. However, the marginal-based and joint tests inflated the Type 1 error rate under one-tailed selection (ET1) combined with segregation distortion. By contrast, the marginal-based and joint tests had valid Type 1 error rates under two-tailed selection (ET2) even when there was segregation distortion. Although Mendelian inheritance is nearly universal, segregation distortion is considered to be a potent evolutionary force [19]. Yet, the prevalence and importance of segregation distortion are widely debated. Although some researchers contend that segregation distortion is a rare curiosity with little evolutionary importance, it is well known that it occurs more frequently among inbred strains of plants and animals [23]. For example, Xu et al. [26] observed 7% to 32% of markers in inbred rice strains to demonstrate segregation distortion. Also, Liu et al. [14] reported that 29.4% of the 238 loci mapped in inbred soybean strains showed segregation distortion. Regardless of the evolutionary importance of segregation distortion, we demonstrated it to be a statistical problem for the marginal and joint tests under ET1 designs. We also show that under random sampling and symmetric selective sampling from both tails of the phenotypic distribution (ET2) in an F2 intercross, these tests have roughly equivalent power compared to the corresponding alternative tests. In cases of selective sampling from one tail of the phenotypic distribution (ET1), the joint tests are generally more powerful than the corresponding alternatives, assuming no segregation distortion.
When segregation distortion was present, only the regression-based approaches yielded valid tests under one-tailed selective genotyping. Thus, if segregation distortion is suspected, the joint tests can only be validly employed with two-tailed selective genotyping. The joint tests showed a distinct power advantage over the regression-based tests with random sampling and ET2 designs for additive models with 50% (N = 250) sampling. Thus, for backcross and F2 intercross designs, joint tests are recommended for analyzing data, especially if there is an additive mode of inheritance; however, joint tests are not generally recommended for non-additive modes of inheritance. Furthermore, when segregation distortion is present and an ET1 design is used, the joint tests cannot be recommended. In summary, we developed joint tests that capitalize on the information available in both the marginal distribution of the genotype and the genotype/phenotype association and that are valid in a variety of situations; however, they should be used cautiously if segregation distortion is suspected. Therefore, we also recommend that if segregation distortion is likely, genetic researchers use ET2 designs when possible, in which case joint tests may have more statistical power. However, if an ET1 design is employed, the researcher should follow up significant results with more complete genotyping to investigate the possibility of segregation distortion.
Of course, it is important to concede that our results definitively apply only to the conditions that were simulated. Our simulations assumed that, within genotype, phenotypic errors were normally distributed and that the error variance was constant across genotypes. Under situations in which these conditions are not met, the relative power of the different tests may not be exactly as reported herein. However, it is noteworthy that the joint tests as constructed should be relatively robust. This is due to features of the backcross and F2 intercross designs and to the fact that the tests are based on OLS or logistic regression procedures.
The OLS-based tests assume error distributions that are Gaussian with a constant variance across the genotypes but have been shown to be relatively robust to many forms of non-normality in samples of modest size. Importantly, however, violations of normality and homogeneity of variance are more likely to reduce the statistical power of the OLS-based procedures for detecting linkage. The logistic and multinomial-based procedures are used to predict genotype membership as a function of the phenotype. In regression terminology, the phenotype is a fixed effect, and thus, there are no distributional assumptions for the phenotype. In this regard, the tests developed herein, particularly those based on logistic (or multinomial) regression, may have some advantages over other tests that assume normality and use maximum likelihood estimation. Fan and Wang [6] have demonstrated that unequal variances and unequal sample sizes do not drastically affect the error rates of logistic regression in the two-group problem (i.e., backcross). Barón [4] demonstrated that in the three-group problem (i.e., F2 intercross), multinomial regression models were preferable with non-normal data and were comparable to OLS-based procedures with normal data. It is important to note that violations of standard linear model assumptions (i.e., normality; homoscedasticity) are commonplace in data from many agricultural disciplines (e.g., livestock breeding). Thus, researchers in these fields have learned not to trust distribution-based P-values, and resampling-based tests (e.g., permutation; bootstrap) are generally applied. Future work should therefore examine the statistical properties of these joint tests when resampling-based methods are applied.
There are several other strengths of the methods developed that should be considered. First, the extension of these methods to multivariate testing is extremely simple because one simply needs to add more phenotypes on the predictor side of the logistic or multinomial regression equations. This allows one to consider designs in which researchers are mapping genes for a binary (disease) trait and some quantitative phenotypes are also measured on all organisms. In contrast, extension of the joint tests herein to multilocus models will be somewhat more challenging, though certainly not impossible. Such tests would require putting multiple variables on the dependent side of the equation. This might best be done through the multinomial regression approach by extending it to allow for modeling of multilocus genotype contingency tables.
However, using this type of joint test for multilocus models may require more biological assumptions (e.g., Hardy-Weinberg equilibrium; random segregation) or the use of haplotypes.
Similar to the Haley and Knott [9] method, the joint tests can be extended to test any locus within a marker interval in order to approximate interval mapping. First, the probabilities of genotypes can be calculated using the multipoint method of Jiang and Zeng [12]. Then these values can be used in place of marker genotypes. However, developing an exact interval mapping method (similar to conventional interval mapping in line crosses) will require additional effort based on the EM algorithm, which is beyond the scope of this paper. Yet, this presents an interesting topic for future study. There are several other future research directions for these joint tests. In the field of experimental crosses, when modeling a disease trait, they may be a useful alternative to threshold models [17]; however, studies have found threshold models to be less powerful for QTL detection than simpler linear models [18]. Future research comparing these approaches may be warranted.

EXAMPLE
For demonstration purposes, we took selected data from a project designed to locate genes on the mouse chromosomes linked with a quantitative trait, the percentage of lung fibrosis induced by bleomycin, a drug used in the treatment of cancer. Significant positions have been detected at marker D17mit16 on chromosome 17 and marker D11mit272 on chromosome 11 [10].
For brevity, we took N = 165 mice that had been genotyped on D11mit272 (see Tab. IV). Using tests R 1 and R 2, the results for all 165 mice show that there are significant mean differences in fibrosis among the three genotypes of D11mit272.
Suppose that the researchers employed the ET2 design and genotyped the 88 phenotypically extreme mice. These mice included the top 25% (i.e., mice with fibrosis ≥ 2.50% of the lung; n U = 44) and n L = 44 mice with zero fibrosis. The results of the OLS regression-based tests show significant mean differences in fibrosis among the three genotypes of D11mit272 [R 2: F(2,85) = 9.46, P = 0.00020, η 2 = 0.182]. The additive component of this ...
For the marginal-based tests, the results are shown descriptively in Table IV. The results of the marginal tests were statistically significant. The independent samples t-test comparing the mean genotype across the two extremes of the phenotypic distribution was significant [M 10: t(107) = 3.51, P = 0.00073, η 2 = 0.125]. The 2 × 3 contingency table analysis comparing the proportions of each genotype across the two extremes of the phenotypic distribution resulted in χ 2 = 11.18, df = 2, P = 0.00373, φ 2 = 0.127.
For the OLS-based joint test, J 13, the results were statistically significant [F(1,85) = 18.48, P = 0.00004]. The multinomial regression joint test (J 15) resulted in a likelihood ratio chi-square statistic of χ 2 = 16.67 [df = 2, P = 0.00024]. Since distribution-based P-values are often questioned, we also computed P-values via resampling. In this situation, with N = 88, the number of possible permutations was extremely large; therefore, we randomly permuted the data 20 000 times and took the test statistic's percentile rank in the distribution of permuted test statistics as a permutation-based P-value [24]. The permutation-based P-values were P = 0.00002 for the OLS-based J 13 and P = 0.00006 for the multinomial regression-based J 15.
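A random-permutation P-value of the kind described above can be sketched as follows. Several choices here are assumptions rather than details given in the text: the function name `permutation_p_value` is hypothetical, genotypes are shuffled relative to phenotypes, larger statistics are treated as more extreme, and the P-value uses the common (count + 1) / (permutations + 1) convention.

```python
import random


def permutation_p_value(stat_fn, g, y, n_perm=20000, seed=None):
    """Monte Carlo permutation P-value for a statistic stat_fn(g, y).

    Genotypes are shuffled relative to phenotypes (an assumption of this
    sketch); the observed statistic is ranked among the permuted ones.
    """
    rng = random.Random(seed)
    observed = stat_fn(g, y)
    g_perm = list(g)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(g_perm)
        if stat_fn(g_perm, y) >= observed:
            n_extreme += 1
    return (n_extreme + 1) / (n_perm + 1)


def abs_cross_product(g, y):
    """Toy statistic: absolute centered cross-product of g and y."""
    g_bar = sum(g) / len(g)
    y_bar = sum(y) / len(y)
    return abs(sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y)))


perm_p = permutation_p_value(
    abs_cross_product, [0, 0, 1, 1, 2, 2], [0.1, 0.4, 1.2, 0.9, 2.5, 2.2],
    n_perm=200, seed=7,
)
```

In practice `stat_fn` would compute the J 13 or J 15 statistic; with 20 000 permutations the smallest attainable P-value under this convention is 1/20 001.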