Interval mapping of quantitative trait loci with selective DNA pooling data

Selective DNA pooling is an efficient method to identify chromosomal regions that harbor quantitative trait loci (QTL) by comparing marker allele frequencies in pooled DNA from phenotypically extreme individuals. Currently used single marker analysis methods can detect linkage of markers to a QTL but do not provide separate estimates of QTL position and effect, nor do they utilize the joint information from multiple markers. In this study, two interval mapping methods for analysis of selective DNA pooling data were developed and evaluated. One was based on least squares regression (LS-pool) and the other on approximate maximum likelihood (ML-pool). Both methods simultaneously utilize information from multiple markers and multiple families and can be applied to different family structures (half-sib, F2 cross and backcross). The results from these two interval mapping methods were compared with results from single marker analysis by simulation. The results indicate that both LS-pool and ML-pool provided greater power to detect the QTL than single marker analysis and also provided separate estimates of QTL location and effect. With large family sizes, both LS-pool and ML-pool provided power and estimates of QTL location and effect similar to those from selective genotyping. With small family sizes, however, the LS-pool method resulted in severely biased estimates of QTL location for distal QTL, but this bias was reduced with the ML-pool.


INTRODUCTION
Detecting genes underlying quantitative variation (quantitative trait loci or QTL) with the aid of molecular genetic markers is an important research area in both animal and plant breeding. However, for QTL with small or moderate effect, much genotyping is required to achieve a desired power [9] and the genotyping cost can be prohibitive. Selective DNA pooling is an efficient method to detect linkage between markers and QTL by comparing marker allele frequencies in pooled DNA from phenotypically extreme individuals [8]. Marker allele frequencies can be estimated by quantifying PCR product in the pool [22] and linkage to a QTL can be detected by conducting a significance test at each marker. This approach has been used to detect QTL in dairy cattle [12,18,20,24], beef cattle [13,26] and chickens [18,19,28].
Analyses of selective DNA pooling data are typically based on single marker analyses [8], which cannot provide separate estimates of QTL location and QTL effect, nor can they utilize the joint information from multiple linked markers around a QTL. Interval mapping methods have been developed to get around these problems for individual genotyping data [16] but have not been developed for selective DNA pooling data.
Dekkers [10] showed that pool frequencies for flanking markers contain information to map a QTL within an interval. In his study, observed marker allele frequencies in the selected DNA pools were modeled as a linear function of QTL allele frequency in the same pool and recombination rates between markers, and location and allele frequency of the QTL could then be solved analytically based on observed frequencies at the two flanking markers. Simulation results showed that this method provided nearly unbiased estimates when power was high but was biased when power was low. In addition, estimates did not exist for some replicates and others provided estimates outside the parameter space. Also, this method is not suitable for pooled analysis of multiple families and only used data from flanking markers and not from markers outside the interval [10]. External markers can provide information to map QTL in the case of DNA pooling data because observed frequencies are subject to technical errors.
The objective of this study, therefore, was to develop an interval mapping method that overcomes the aforementioned problems. Two methods that allow simultaneous analysis of selective DNA pooling data from multiple markers and multiple families were developed. One was based on least squares regression (LS-pool) and the other on approximate maximum likelihood (ML-pool). Both methods were evaluated by simulation.

MATERIALS AND METHODS
Basic principles of detecting QTL using selective DNA pooling data were presented by Darvasi and Soller [8]. Figure 1 illustrates its application to a single half-sib family, with a sire that is heterozygous for a QTL (Qq) and a nearby marker (Mm). The sire is mated to multiple dams randomly chosen from a population in which the marker and QTL are in linkage equilibrium. In concept, progeny can be separated into two groups, depending on the QTL allele received from the sire. The dam's QTL alleles, polygenic effects and environmental factors contribute to variation within each group of progeny, resulting in normally distributed phenotypes for the quantitative trait within each group. For selective DNA pooling, progeny are ranked based on phenotype and the highest and lowest p% are selected. An equal amount of DNA is extracted from each selected individual and DNA from individuals in the same selected tail is pooled to form upper and lower pools. The frequency of marker alleles in each pool can be determined by densitometric PCR or other quantitative genotyping methods. Three alternative methods for analysis of the resulting data will be presented.
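The pooling design described above can be illustrated with a minimal simulation sketch. It is not the authors' simulation program: the function name, the residual standard deviation of 1 (which lumps dam alleles, polygenic effects and environment together), and the coupling of Q with M on the same sire haplotype are assumptions made only for illustration.

```python
import numpy as np

def simulate_pool_frequencies(n_progeny=2000, r=0.1, alpha=0.25,
                              p_select=0.10, seed=0):
    """Frequency of sire marker allele M in the upper and lower pools.

    Assumptions (illustrative): the sire is Qq and Mm with Q and M on the
    same haplotype, r is the marker-QTL recombination rate, alpha is the
    QTL substitution effect, and all other sources of variation are
    lumped into a standard normal residual.
    """
    rng = np.random.default_rng(seed)
    got_Q = rng.random(n_progeny) < 0.5        # paternal QTL allele is Q
    recombinant = rng.random(n_progeny) < r    # crossover between marker and QTL
    got_M = got_Q ^ recombinant                # paternal marker allele is M
    y = np.where(got_Q, alpha / 2, -alpha / 2) + rng.normal(0.0, 1.0, n_progeny)
    n_sel = int(round(p_select * n_progeny))   # number of progeny per tail
    order = np.argsort(y)
    lower, upper = order[:n_sel], order[-n_sel:]
    return float(got_M[upper].mean()), float(got_M[lower].mean())

f_upper, f_lower = simulate_pool_frequencies(n_progeny=20000)
d_value = f_upper - f_lower   # difference between pools, as used in D-value analyses
```

With a linked QTL, the frequency of M is expected to exceed 1/2 in the upper pool and fall below 1/2 in the lower pool; the size of the deviation shrinks as r approaches 0.5.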
In matrix notation, Model 1 is:

$$f_i - \tfrac{1}{2} = X_i \beta_i + se_i + te_i$$

where $f_i$ is a vector with observed marker allele frequencies for family $i$ and $\tfrac{1}{2}$ is a vector with elements 1/2. For the least squares analysis, sampling and technical errors are combined into a single residual vector: $e_i = se_i + te_i$. For a given putative position of the QTL, recombination rates $r_j$ are known and, thus, elements of matrix $X_i$ are known, and Model 1 can be fitted using ordinary least squares:

$$\hat{\beta}_i = (X_i' X_i)^{-1} X_i' (f_i - \tfrac{1}{2})$$

This model can be extended to multiple independent sire families by simply expanding the dimensions of the matrices in Model 1. Using a common QTL position, the multi-family model estimates separate QTL allele frequency deviations for each family, which allows for a different QTL substitution effect for each sire. Similar to least squares interval mapping with individual genotyping data [14], the model is fitted at each putative QTL position and ordinary least squares is used to estimate parameters $\beta_i = (p^U_{Q_i} - \tfrac{1}{2})$, assuming residuals are independently and identically distributed. A test statistic is calculated at each position and the position with the highest statistic is taken as the estimate of QTL position. If $V_{TE}$ is known, the statistic is based on $SS_{regression,i}$, the sum of squares of regression for family $i$; if $V_{TE}$ is not known, the statistic involves $SS_{error,i}$, the sum of squares of residuals for family $i$. Estimated QTL allele frequencies at the best position are then used to estimate QTL substitution effects for each sire $i$, $\alpha_i$, following Dekkers [10]. In some applications, D values (the difference in observed marker allele frequencies between the upper and lower pools) are used for QTL mapping [17].

To handle D values, an analogous model can be used, written either per marker or in matrix notation, in which $D_{M_{ij}}$ is the D value of the $j$th marker of the $i$th sire family, $D_{Q_i}$ is the expected D value for the QTL allele of the $i$th sire family, and $e_{D_{ij}}$ are residuals, including both sampling and technical errors, with variance equal to $SE^2_{D_{ij}}$. This variance can be derived as described in Lipkin et al. [17], accounting for variance of technical error, the overlap of sire marker alleles with those of its mates, different numbers of pools and replicates, and different numbers of daughters per pool. A weighted least squares [23] method can then be applied to allow for different values of $SE^2_{D_{ij}}$ for different sires. The test statistic, summed over families at a given putative QTL position, is then derived using $V_i$, a diagonal matrix with variances $SE^2_{D_{ij}}$ as elements.
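The least squares scan described in this section can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes that the expected deviation of a marker frequency from 1/2 equals $(1 - 2r_j)\beta_i$ in the upper pool and $-(1 - 2r_j)\beta_i$ in the lower pool (the symmetry assumption), uses the Haldane function for $r_j$, and uses the unscaled sum of regression sums of squares as the statistic. The function names `haldane_r` and `ls_pool_scan` are hypothetical.

```python
import numpy as np

def haldane_r(d_cM):
    """Recombination rate for a map distance in cM (Haldane mapping function)."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(d_cM) / 100.0))

def ls_pool_scan(f_upper, f_lower, marker_pos, grid):
    """Scan putative QTL positions with ordinary least squares.

    f_upper, f_lower: arrays of shape (families, markers) or (markers,)
    with observed frequencies of the sire allele in the two pools.
    Returns (best position, beta estimate per family, statistic), where the
    statistic is the regression sum of squares summed over families
    (an assumed, unscaled stand-in for the statistic in the text).
    """
    f_upper = np.atleast_2d(np.asarray(f_upper, dtype=float))
    f_lower = np.atleast_2d(np.asarray(f_lower, dtype=float))
    y = np.hstack([f_upper - 0.5, f_lower - 0.5])    # (families, 2 * markers)
    best = None
    for pos in grid:
        w = 1.0 - 2.0 * haldane_r(marker_pos - pos)  # assumed design elements
        x = np.concatenate([w, -w])                  # sign flips for the lower pool
        beta = y @ x / (x @ x)                       # OLS estimate per family
        stat = float(np.sum(beta ** 2) * (x @ x))    # summed regression SS
        if best is None or stat > best[2]:
            best = (float(pos), beta, stat)
    return best

# Noise-free example: six markers, QTL at 46 cM, beta = 0.1 for one family.
markers = np.array([0.0, 20.0, 40.0, 60.0, 80.0, 100.0])
w_true = 1.0 - 2.0 * haldane_r(markers - 46.0)
pos_hat, beta_hat, stat = ls_pool_scan(0.5 + 0.1 * w_true, 0.5 - 0.1 * w_true,
                                       markers, np.arange(0, 101))
```

For D values, the same scan would apply with one row per marker and weights equal to the inverse of $SE^2_{D_{ij}}$, which turns the ordinary least squares step into weighted least squares.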

Approximate maximum likelihood interval mapping method (ML-pool)
Sampling errors that contribute to observed frequencies at linked markers for a given family, i.e. elements of vector $se_i$ in Model 1, are correlated. These correlations are not accounted for by the LS-pool method, which reduces its efficiency. An approximate maximum likelihood method, ML-pool, was developed to overcome this problem.
In the ML-pool method, the distribution of $e_i = se_i + te_i$ is approximated by a multivariate normal distribution, given the multi-factorial nature of technical errors, the near-normality of the distribution of the binomial sampling errors with sufficiently large $n_i$ ($n_i > 30$), and the small probability that modeled frequencies fall outside the parameter space (0-1), since the expected allele frequency is near 0.5. With the expectation of the vector of marker allele frequencies for sire $i$ defined as in Model 1 ($X_i \beta_i$), the covariance matrix is defined as:

$$\Sigma_i = \begin{bmatrix} \Sigma^U_i & 0 \\ 0 & \Sigma^L_i \end{bmatrix}$$

where matrices $\Sigma^U_i$ and $\Sigma^L_i$ are the covariance matrices of residuals for marker allele frequencies within the upper and lower pools of family $i$. The off-diagonal blocks are zero because, by conditioning on the proportion selected for the upper and lower pool within a family, marker frequencies from the upper and lower pool are uncorrelated. Variances and covariances in $\Sigma^U_i$ depend on the order of the markers relative to the QTL: one expression applies if markers $j$ and $l$ bracket the QTL ($M_j$-Q-$M_l$) and another if the marker order is ($M_j$-$M_l$-Q), where $r_{jl}$ is the recombination rate between the markers (see Appendix online for detailed derivation).
Both $X_i \beta_i$ and $\Sigma_i$ are functions of $p^U_{Q_i}$ and $r$, the vector of recombination rates between markers and QTL, which is determined by QTL location. Consequently, for a given QTL location ($\pi_Q$) and given values of $p^U_{Q_i}$, the likelihood function for the vector of observed allele frequencies of $k$ markers for $m$ independent families, based on the approximation to multivariate normality, is the product over families of multivariate normal densities with mean $X_i \beta_i$ and covariance matrix $\Sigma_i$. Under the null hypothesis of no QTL, $p^U_{Q_i} = \tfrac{1}{2}$ for each family and the likelihood is a constant, $L_0(f - \tfrac{1}{2})$, that does not depend on QTL location. Under the alternative hypothesis, the likelihood function, $L_A(f - \tfrac{1}{2})$, can be maximized by a golden-section search algorithm [15] for the optimal $p^U_{Q_i}$ of each family at a given QTL position ($\pi_Q$), and a log likelihood ratio statistic (LR) comparing $L_A$ with $L_0$ can be calculated. Each putative QTL position along the chromosome is tested and the set of parameters ($\pi_Q$ and $p^U_{Q_1}, p^U_{Q_2}, \ldots, p^U_{Q_m}$) that provides the highest LR gives the estimates of QTL position and QTL allele frequencies, which are used to estimate QTL allele substitution effects for each sire, as for the LS-pool. With unknown technical error variance, $V_{TE}$ is included as an additional parameter to be optimized in the search routine.
For D values, the covariance matrix can be adapted by placing $SE^2_{D_{ij}}$ on the diagonal and, on the off-diagonals, the sum of the covariances for residuals of observed marker allele frequencies in the upper and lower pools. A similar likelihood ratio statistic (LR) can then be calculated.
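The one-dimensional maximization used by the ML-pool at a fixed QTL position can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' routine: the covariance function `cov_fn` is a hypothetical caller-supplied placeholder (the actual variances and covariances are derived in the Appendix), the likelihood is assumed to be unimodal in $p^U_{Q_i}$, and the statistic is taken in the usual form of twice the difference in log likelihoods.

```python
import math
import numpy as np

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0   # inverse golden ratio, about 0.618

def golden_section_max(func, lo, hi, tol=1e-7):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc > fd:                      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = func(c)
        else:                            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = func(d)
    return 0.5 * (a + b)

def mvn_loglik(resid, sigma):
    """Log density of a zero-mean multivariate normal evaluated at resid."""
    _, logdet = np.linalg.slogdet(sigma)
    quad = resid @ np.linalg.solve(sigma, resid)
    return -0.5 * (len(resid) * math.log(2.0 * math.pi) + logdet + quad)

def ml_pool_at_position(f_minus_half, x, cov_fn):
    """Return (p_hat, LR) for one family at one putative QTL position.

    f_minus_half: observed frequencies minus 1/2; x: design vector for the
    position; cov_fn(p): hypothetical function returning the covariance matrix.
    """
    def loglik(p):
        return mvn_loglik(f_minus_half - x * (p - 0.5), cov_fn(p))
    p_hat = golden_section_max(loglik, 0.0, 1.0)
    lr = 2.0 * (loglik(p_hat) - loglik(0.5))   # assumed form of the LR statistic
    return p_hat, lr

# Noise-free example with a placeholder diagonal covariance (no correlations).
x_demo = np.array([0.4, 0.6, 0.9, 0.8, 0.5, 0.3])
p_demo, lr_demo = ml_pool_at_position(0.1 * x_demo, x_demo,
                                      lambda p: 0.001 * np.eye(6))
```

Golden-section search needs only function evaluations, which suits a likelihood whose covariance matrix changes with the parameter being optimized. With the diagonal placeholder used here the result coincides with least squares; the gain of the ML-pool comes from supplying a covariance function that includes the correlations among linked markers.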

Simulation model and parameters
Ten half-sib families with 500 or 2000 progeny per family were simulated to validate the proposed methods. The simulated population structure was designed to mimic dairy cattle data used for a selective DNA pooling study by Lipkin et al. [17] and Mosig et al. [20]. For each individual, six fully informative markers were evenly spaced on a 100 cM chromosome (including markers at the ends). Dam alleles were assumed to be different from sire alleles and in population-wide linkage equilibrium with the QTL. Crossovers were generated according to the Haldane mapping function, which implies independence of recombination events in adjacent intervals on the chromosome. A single additive bi-allelic QTL with population frequency 0.5 was simulated at position 11 or 46 cM, with an allele substitution effect of 0.25 phenotypic standard deviations (the phenotypic standard deviation was set equal to 1). Heritability was 0.25 and phenotypic values of progeny were affected by the QTL along with polygenic effects and environmental factors, which were both normally distributed. In the simulation model, $y_{ij}$ is the phenotypic value of progeny $j$ of sire $i$, $\mu$ is the overall mean, $g_{QTL_{ij}}$ is the QTL effect based on the QTL alleles received from the sire and dam, $g_{sire_i}$ is the polygenic effect of sire $i$, $g_{dam_{ij}}$ is the polygenic effect of dam $j$ mated to sire $i$, $g_{M_{ij}}$ is the polygenic effect due to Mendelian sampling, and $\varepsilon_{ij}$ is the environmental effect for progeny $j$ of sire $i$. Progeny were ranked by phenotype within each half-sib family and the top and bottom 10% contributed to DNA pools. For each marker, the true paternal allele frequencies in pools were obtained by counting, and a normally distributed technical error with mean zero and variance of either zero (no technical error) or 0.0014 was added.
Then, to satisfy the condition that frequencies of the two alleles sum to one, simulated frequencies were divided by the sum of the simulated frequencies of the two paternal alleles. The resulting variance due to technical errors in the observed allele frequencies was either $V_{TE} = 0.0$ or $V_{TE} = 0.0007$. The latter was equal to the technical error variance estimated by Lipkin et al. [17]. Allele frequencies were observed for each half-sib family and for all markers.
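The halving of the technical error variance by this normalization can be checked numerically. To first order, a normalized frequency with true value $p$ and independent errors of variance $\sigma^2$ on each allele has variance $[(1-p)^2 + p^2]\sigma^2$, which is $\sigma^2/2$ at $p = 0.5$. The sketch below is an illustrative check of that step; the function name is an assumption.

```python
import numpy as np

def add_technical_error(p_true, var_raw, n_rep, seed=0):
    """Add independent normal errors to both paternal allele frequencies
    and rescale so that the two observed frequencies sum to one."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(var_raw)
    a1 = p_true + rng.normal(0.0, sd, n_rep)          # first paternal allele
    a2 = (1.0 - p_true) + rng.normal(0.0, sd, n_rep)  # second paternal allele
    return a1 / (a1 + a2)

observed = add_technical_error(p_true=0.5, var_raw=0.0014, n_rep=200_000)
v_te = observed.var()   # close to 0.0007, half of the raw error variance
```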
Single marker analysis, LS-pool and ML-pool were applied to the simulated selective DNA pooling data, with or without previous knowledge about technical error variance. Sire marker haplotypes were assumed known. For comparison, the simulated data were also analyzed by selective genotyping by applying regular least squares interval mapping [14] to individual marker genotype and phenotype data on individuals with high and low phenotypes. Estimates of QTL effects were adjusted based on selection intensity following Darvasi and Soller [8].
For each set of parameters and each mapping method, the criteria for comparison of methods were the following: (1) power to detect the QTL, (2) bias and variance of estimates of QTL location, and (3) bias and variance of estimates of QTL effects. The LS-pool, ML-pool and selective genotyping methods provide separate estimates of QTL location and QTL effect. For single marker analyses, the position of the most significant marker was used as the estimate of QTL position. For each set of parameters and each mapping method, 10 000 replicates were simulated under the null hypothesis of no QTL to determine 5% chromosome-wise significance thresholds of the test statistics, and 3000 replicates were simulated under the alternative hypothesis.
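The empirical threshold and power calculations described above can be sketched as follows, assuming that each replicate is summarized by the maximum of the test statistic over all positions on the chromosome. The function names are illustrative.

```python
import numpy as np

def empirical_threshold(null_max_stats, alpha=0.05):
    """Chromosome-wise threshold: the (1 - alpha) quantile of the maximum
    test statistic per replicate simulated under the null hypothesis."""
    return float(np.quantile(np.asarray(null_max_stats, dtype=float), 1.0 - alpha))

def empirical_power(alt_max_stats, threshold):
    """Proportion of replicates simulated under the alternative hypothesis
    whose maximum test statistic exceeds the threshold."""
    return float(np.mean(np.asarray(alt_max_stats, dtype=float) > threshold))
```

In the setting of this study, `null_max_stats` would hold 10 000 values and `alt_max_stats` 3000 values for each combination of parameters and method.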

Validation of the symmetry assumption
One important assumption in both LS-pool and ML-pool is that distributions of phenotypic values within the group of progeny receiving the "Q" or "q" allele from the sire are the same and symmetric. Under this assumption, frequency $p^U_{Q_i}$ is expected to be equal to $p^L_{q_i}$ and, therefore, only one parameter for QTL allele frequency needs to be estimated. This symmetry assumption will be invalid if the QTL is dominant or if the QTL allele frequency among dams is not 0.5. Under these situations, Qq progeny will not be equally distributed across the upper and lower pools and it may be more appropriate to fit two QTL allele frequency parameters in the model, one for each selected pool.
Model 1 is then extended to include separate QTL allele frequency parameters for the upper and lower pools. The symmetry assumption was evaluated and results from least squares models that fitted one (LS-pool-1) or two QTL frequencies (LS-pool-2), one for the upper and one for the lower pool, were compared for different combinations of QTL dominance and QTL allele frequencies among dams. Since the ML-pool is computationally more demanding and the difference between the LS-pool and ML-pool was not expected to be large, only LS-pool was investigated.

RESULTS
Table I shows power for the LS-pool, ML-pool and single marker methods of analysis of the simulated selective DNA pooling data and of selective genotyping analysis of the simulated individual genotyping data. All four methods resulted in high and similar power (about 97% or higher) for the large family size and moderate power (51 to 80%) with small family size (Tab. I). Power was the highest for selective genotyping, because it is not affected by technical errors associated with pooling and utilizes the distribution of phenotypes within the phenotypic tails. Power for selective genotyping was, however, only up to 6% greater than for the ML-pool. Among methods using selective DNA pooling data, for most situations, ML-pool provided the highest power, followed by LS-pool and single marker analysis. The power of the LS-pool was, however, strongly affected by true QTL position: it was close to or lower than power from single marker analysis for non-central QTL, and similar to or greater than power from the ML-pool for central QTL with known $V_{TE}$. For the latter case, power from the LS-pool was even greater than power from selective genotyping. These discrepancies resulted from the heterogeneous distribution of the test statistic for the LS-pool across the chromosome. Power was 10 to 14% greater for a central QTL than for a distal QTL for the LS-pool and 2 to 5% greater for single marker analysis, but only 1 to 2% greater for the ML-pool.
The presence of technical errors ($V_{TE}$ = 0.0007 versus 0) only slightly decreased power (by about 5% or less) for all methods and in all situations, except that single marker analysis with known $V_{TE}$ and a distal QTL had 7% greater power when no technical errors were present.
Table II shows means and standard errors (as a measure of mapping accuracy) of estimates of QTL location obtained from the four methods. The results were little affected by prior knowledge of technical error variance, so only results with known variance are shown. With a central QTL or with large family size, all four methods resulted in nearly unbiased estimates of QTL location (bias of about 4.5 cM or less), but with distal QTL and small family size, all four methods resulted in some bias toward the center of the chromosome. Biases were the smallest for selective genotyping (<5 cM) and the greatest for the LS-pool (9 to 11 cM). Estimates from the ML-pool had biases similar to those of single marker analysis (6 to 8 cM). The presence of technical errors only slightly increased biases (<2 cM) for all situations and with all four methods. Standard errors (SE) of estimates of QTL location were reasonable with large family size (<12 cM) but large (11 to 21 cM) with small family size for all four methods. Standard errors were up to 4.6 cM larger for distal than for central QTL, and the presence of technical errors increased SEs by 1 to 3.6 cM. Single marker analysis had location estimates with the largest SE. With large family size, selective genotyping had smaller SE of location estimates than other methods. With small family size, however, the LS-pool had the smallest SE, even smaller than selective genotyping, except for distal QTL in the presence of technical errors. This result is also caused by the heterogeneous distribution of the test statistic for the LS-pool, which results in a tendency toward higher test statistics around the center of the chromosome (Fig. 2) and, therefore, regression of position estimates towards the center.
Table II. The results are for known technical error variance ($V_{TE}$) but were almost the same with unknown $V_{TE}$. The results of selective genotyping were independent of $V_{TE}$ and are presented twice. The results were based on 3000 replicates. Other simulation parameters are as in Table I.

Estimates of QTL effects
Only interval mapping methods (LS-pool, ML-pool and selective genotyping methods) provide estimates of QTL effects. Single marker analysis does provide estimates of marker-associated effects but these were not evaluated. All methods gave unbiased or nearly unbiased estimates of QTL effects and similar SEs of estimates (results not shown). Means and accuracy of estimates of QTL effects with known or unknown technical errors were essentially the same for the LS-pool and ML-pool. Standard errors were small (0.06 to 0.07 phenotypic standard deviations) for large families (2000 progeny) but were doubled (0.13 to 0.14 standard deviations) for small families (500 progeny). The ratio of SEs of estimates of QTL effects thus equaled the square root of the inverse ratio of family sizes ($\sqrt{2000/500} = 2$), as expected for estimates from regular linear regression. True QTL location and the presence of technical error had little effect on estimates of QTL effects.

Comparison of methods based on significant replicates
Generally, only significant QTL mapping results are reported from actual experiments. Thus, it is also necessary to evaluate methods based on significant replicates only. Table III shows means and SEs of estimates of QTL location based on only significant replicates for the small family size (all methods had high power with large family size, so the results were almost unchanged with only significant replicates and are therefore omitted). The results with known and unknown $V_{TE}$ were similar and only estimates with known $V_{TE}$ are presented. Similar to results from all replicates (Tab. II), biases in estimates of QTL position for significant QTL were negligible with central QTL (Tab. III). When the QTL was distal, biases were reduced from 4.8 to 2.6 cM for selective genotyping and from 6-7 cM to 3-4 cM for single marker analysis and ML-pool, but only from 10 to 9 cM for the LS-pool. Therefore, biases of location estimates towards the center were nearly halved for selective genotyping, single marker analysis, and ML-pool when considering only significant replicates, but a large bias remained for the LS-pool with distal QTL. For the ML-pool, single marker analysis, and selective genotyping, SEs of estimates of QTL location were reduced by about 3 cM with central QTL and by 5-6 cM with distal QTL. For the LS-pool, however, standard errors were reduced only by 0-2 cM with central QTL and by about 3 cM with distal QTL. For all methods, the QTL effect was overestimated when selecting only significant results (mean estimates were 0.27 standard deviations while the true effect was 0.25 standard deviations) but the SE of estimates was almost unchanged (results not shown). Differences between the four methods in estimates of QTL location and effect were similar when considering only significant instead of all replicates.
Table IV shows the sum of true QTL allele frequencies over selected pools, power, and estimates of QTL location and of QTL substitution effects from LS-pool-1 (one parameter for QTL allele frequency) and LS-pool-2 (two parameters for QTL allele frequency, one for each pool), with no and complete dominance at the QTL and different QTL allele frequencies in the dam population. The results in Table IV indicate that the sum of the true QTL allele frequencies over both selected pools was very close to one, which suggests that the symmetry assumption was valid even if the QTL was dominant or the QTL frequency among dams deviated from 0.5. The LS-pool-1 method consistently had greater power to detect the QTL, and lower bias and standard errors of estimates of QTL location than the LS-pool-2, except with complete dominance and high frequency (0.9) of the dominant QTL allele in the dam population, for which both methods had very low power and poor estimates. The difference in power between LS-pool-1 and LS-pool-2 was about 20% when the QTL was co-dominant or when the frequency of the dominant QTL allele in the dam population was 0.5 or lower. Frequency of the QTL among dams had little effect on power and estimates of QTL location when the QTL was co-dominant but had a large impact with complete dominance. Low frequency of a dominant QTL allele in the dam population greatly increased power and precision of estimates of QTL location, while a high frequency decreased both power and precision of estimates of location. Estimates of QTL effect were similar for LS-pool-1 and LS-pool-2, were nearly unbiased, and had similar standard errors for all situations.
Table IV. Ten half-sib families with 500 progeny were used and the true QTL was at 11 cM. Results with unknown technical error and variance equal to 0.0 are presented as an example. Other simulation parameters were the same as Table III.
When the QTL was dominant and the dominant allele was rare in the dam population, power to detect the QTL was high, but when the QTL was dominant and the frequency of the dominant allele was greater than 0.5 in the dam population, it was almost impossible to detect a QTL of moderate effect (Tab. IV). A similar result was also found for single marker analysis [4]. Dominance and allele frequencies in the dam population affect the QTL allele substitution effect [11], which determines power to detect the QTL and, thereby, affects the bias and accuracy of estimates of QTL location and effect.

DISCUSSION
With rapidly improving techniques, the cost of genotyping large numbers of individuals is decreasing, which reduces the benefits of pooling. However, it remains important to pursue methods to efficiently collect QTL information, especially in the first step of a genome scan. Selective DNA pooling can be one of those methods. In addition to QTL mapping in pedigreed populations using linkage analysis, DNA pooling techniques have been applied to large scale association analyses in several recent studies [1-3, 6, 21].
In this paper, we present methodology that allows detection and interval mapping of QTL based on selective DNA pooling data in linkage analyses. The developed methods have clear advantages over the single marker methods that are currently employed for analysis of such data [8] and over the analytical method for analysis of flanking markers that was proposed by Dekkers [10]. These include (1) ability to obtain separate estimates of QTL position and effect; (2) estimates of location that are guaranteed to be within the parameter space, which was not possible with the analytical method of Dekkers [10]; (3) ability for simultaneous analysis of multiple markers and families; and (4) ability to account for missing or uninformative data for individual markers on individual sires. The impact of these advantages over current methods will be discussed further below, within the context of the simulation evaluations that were conducted. In addition, we demonstrated that the interval mapping analysis methods for selective DNA pooling data, in particular ML-pool, resulted in QTL mapping results (power, accuracy, and precision) that were not much worse than those obtained from selective genotyping analysis, which requires individual genotyping. Selective DNA pooling allows for substantial savings in genotyping costs, and analysis of the resulting data by the ML-pool resulted in only 3-6% lower power than selective genotyping, even with small family size and distal QTL (Tab. I). In addition, the ML-pool resulted in less than 2.2 cM greater bias toward the center than selective genotyping and less than 4 cM greater SE of estimates of location, as indicators of mapping accuracy (Tab. II). These results indicate that most QTL information from selective genotyping data is contained in marker allele frequencies in the phenotypic extremes and that ML-pool can efficiently retrieve this information, even if a certain level of error is present in estimates of marker allele frequencies.
Although the least squares regression method that was used here is not the most efficient method for analysis of selective genotyping data, it is computationally much less demanding and is expected to give results similar to those of maximum likelihood methods [16,27] for the balanced data sets that were analyzed here.
The interval mapping methods developed here for selective DNA pooling data utilize information from all markers on the chromosome to detect the presence of a QTL at a given position. With individual genotyping and fully informative markers, only flanking markers provide information to detect a QTL at a given position and external markers provide no additional information. This is not the case for selective DNA pooling data because of the technical errors that are associated with allele frequency estimates at each marker and, thus, simultaneous use of data on all markers results in some averaging of technical errors. In the present analyses and simulations, technical errors were assumed independent across markers. In practice, however, allele frequencies on linked markers are usually estimated from the same batch, by the same machine, and laboratory analyses are conducted by the same person. In addition, there will be variation in the amount of DNA that is present in the pool from each individual. All these factors cause correlations between technical errors at linked markers. Ignoring correlations among technical errors will result in some biases in estimates of QTL location, similar to the biases introduced from ignoring correlations among sampling errors when comparing the LS-pool to the ML-pool method.
Simulation results show that the magnitude of the variance of technical errors ($V_{TE}$) only had a small effect on QTL mapping results for all three pool analysis methods, including single marker analysis (Tabs. I and II). Baro et al. [4] and Darvasi and Soller [8] observed a larger effect of $V_{TE}$ for single marker analysis, but they evaluated a much wider range of $V_{TE}$ (from 0 to 0.1) than what has been obtained in practice [18]. Interval mapping methods that simultaneously use multiple markers should theoretically be more robust to technical errors than single marker analysis because technical errors will be averaged out by considering information from linked markers, but this trend was not very clear in the current study (Tabs. I and II). Utilizing prior knowledge of technical error variance did, however, result in the greatest increases in power for single marker analysis (up to 20%), followed by the LS-pool (up to 13%), with minimal increases (about 2% or less) for the ML-pool (Tab. I). The small increment for the ML-pool was probably due to more accurate estimates of $V_{TE}$ for the ML-pool than for the LS-pool when $V_{TE}$ is unknown.
When comparing LS-pool and ML-pool methods, both methods provided similar QTL mapping results for the large family size; but with small family size, the LS-pool resulted in lower power and severe biases in estimates of location when the QTL was distal (Tabs. I and II). The ML-pool method generally had equal or greater power to detect the QTL than the LS-pool method, except when the QTL was positioned at the center and technical error variance was known (Tab. I). The ML-pool also resulted in smaller biases but in lower accuracy of location estimates than the LS-pool (Tab. II). The differences between the ML-pool and the LS-pool stem from the fact that the ML-pool accounts for correlations in allele frequencies between linked markers and is, therefore, based on a more appropriate model than the LS-pool. The ML-pool method is, however, computationally more intensive, while the LS-pool can be readily applied with standard statistical software.
Because of the computational ease and flexibility of least-squares analyses, some methods were explored to correct the large biases in position estimates that were observed for the LS-pool with small family size and distal QTL. In addition, since all methods resulted in some bias in location estimates, methods that successfully correct biases for the LS-pool may also help to correct biases from other methods. There are two reasons for bias in location estimates from the LS-pool when the QTL is distal: (1) heterogeneous distribution of the test statistic across the chromosome and (2) non-central position of the QTL within the parameter space. The former is unique to the LS-pool (Fig. 2). A non-central position of the QTL is a source of bias that is common to all QTL mapping methods and is caused by the bounds that are imposed on deviations of location estimates from the true position by the boundaries of the chromosome. Therefore, in addition to the position of the QTL within the flanking marker interval, its position on the chromosome can have a large impact on estimates of the QTL position, including estimates from single marker analysis and selective genotyping with regular interval mapping (Tab. II). Biases introduced by non-centrality will be greater for methods with lower power, because deviations from the true position will be larger, and will, therefore, have a greater impact on methods for analysis of DNA pooling data. Based on the reasons for biases in estimates of QTL location in the LS-pool described above, different methods for correcting the bias were developed and evaluated. These included two approaches aimed at correcting biases due to heterogeneous distribution of the test statistic: use of flanking markers only, and standardization of the test statistic by correcting for the mean and variance of the test statistic under the null hypothesis (Fig. 2).
In addition, a parametric bootstrap method [7] was employed to develop a "correction" table that provides the average estimated location for each true QTL position. To obtain this table, phenotypic values for each individual were simulated, and the estimate of the QTL effect obtained from the original data by the LS-pool was used as the true QTL effect, since the effect estimates were found to be nearly unbiased in the LS-pool. Although all three methods reduced biases in estimates of location, they created several additional problems, including an overabundance of estimates at marker positions and a reduction in mapping accuracy. Further research is needed to effectively correct biases in estimates of QTL location.
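The construction of such a correction table can be sketched as follows. This is only an illustration under strong assumptions: `toy_simulate_estimate` is a hypothetical stand-in that replaces the full simulation of phenotypes and the LS-pool analysis by a normally distributed estimation error truncated at the chromosome ends, and the chromosome length and error standard deviation are arbitrary.

```python
import numpy as np

def correction_table(true_positions, simulate_estimate, n_rep, rng):
    """Average estimated location for each assumed true QTL position."""
    return np.array([
        np.mean([simulate_estimate(pos, rng) for _ in range(n_rep)])
        for pos in true_positions
    ])

def toy_simulate_estimate(true_pos, rng, length=100.0, sd=20.0):
    """Hypothetical stand-in for 'simulate data, then re-estimate location'.

    The estimate is the true position plus normal noise, forced to lie
    on the chromosome [0, length]. A real application would simulate
    phenotypes and pools and rerun the mapping method instead.
    """
    return float(np.clip(true_pos + rng.normal(0.0, sd), 0.0, length))

def corrected_position(estimate, true_positions, table):
    """Invert the table by linear interpolation.

    Assumes the table increases monotonically with the true position.
    """
    return float(np.interp(estimate, table, true_positions))
```

In this toy setting the table shows the bounding effect of the chromosome ends: average estimates for distal positions are pulled toward the center, while a central position is estimated without bias on average.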
With single marker analysis and selective genotyping using the regular interval mapping method, the QTL position relative to flanking markers has an impact on the mapping results (power, accuracy and precision). However, in the LS-pool and ML-pool, where all informative markers along the chromosome are used simultaneously, the true QTL position relative to the chromosome is more important, especially for the LS-pool, for which a heterogeneous distribution of the test statistic was observed under the null hypothesis.
Both the LS-pool and ML-pool methods were robust to potential deviations from the assumption that the frequency of the favorable QTL allele in the upper tail is expected to be equal to the frequency of the unfavorable QTL allele in the lower tail ($E(p^{U}_{Q_i}) = E(p^{L}_{q_i})$). Two factors that could violate this assumption were explored: dominance at the QTL and different QTL allele frequencies among dams. In both cases, however, it was redundant to include two frequency parameters in the model, which would reduce power and the accuracy and precision of estimates. Other factors that could result in $E(p^{U}_{Q_i})$ not being equal to $E(p^{L}_{q_i})$ are (1) selection of unequal proportions in the two tails, or (2) non-normality of the distribution of phenotypes. Both could be accommodated in the one-parameter model by including the expected relationship between $p^{U}_{Q_i}$ and $p^{L}_{q_i}$. With different selection proportions and normally distributed phenotypes, this relationship can be derived as a function of selection intensities corresponding to the proportions selected in the upper and lower tails, based on the effect of selection on allele frequencies [11], and the QTL effect, which allele frequencies and intensity, further reducing the number of parameters to estimate. Power to detect more than one QTL on a chromosome will, however, be limited for most designs, even more so than for individual genotyping data.
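As an illustration of how unequal selected proportions could enter the one-parameter model, the sketch below relates the two tail frequencies through the ratio of selection intensities. It is not the derivation used in this study. It assumes normally distributed phenotypes, alleles of a heterozygous parent (expected frequency 0.5 before selection) and the first-order approximation that the shift of an allele frequency away from 0.5 is proportional to the selection intensity. The function names are hypothetical.

```python
from statistics import NormalDist

def selection_intensity(proportion):
    """Selection intensity for truncation selection on a normal trait.

    i = phi(z) / p, where z is the truncation point that leaves a
    fraction p of the distribution in the selected tail.
    """
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - proportion)
    return nd.pdf(z) / proportion

def lower_tail_frequency(p_upper, prop_upper, prop_lower):
    """Expected frequency of q in the lower tail given that of Q in the upper tail.

    Assumption (first-order approximation): the deviation from 0.5 is
    proportional to the selection intensity applied in each tail.
    """
    ratio = selection_intensity(prop_lower) / selection_intensity(prop_upper)
    return 0.5 + ratio * (p_upper - 0.5)
```

With equal proportions in the two tails the ratio equals one and the original assumption of equal expected frequencies is recovered. Selecting a larger proportion in the lower tail gives a smaller expected deviation from 0.5 in that tail.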
Both the LS-pool and ML-pool require knowledge of the marker haplotypes of parents, which are usually not known in practice. Haplotypes can be identified based on progeny that are genotyped individually or based on cosegregant pools [25], but this incurs extra cost.
Another limitation of the selective DNA pooling interval mapping methods is that there is no easy way to obtain chromosome-wise significance thresholds that account for the multiple correlated tests conducted on the chromosome. One possibility is simulation, in which the phenotypic values and marker information of the progeny are simulated to mimic the real data. However, this depends on assumptions about the model and the phenotypic distribution. In addition, both the LS-pool and ML-pool assume that the multiple sire families are independent, which may not be true in practice.
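The simulation approach to chromosome-wise thresholds mentioned above can be sketched as follows, assuming that the test statistic has already been computed at every tested position for each data set simulated under the null hypothesis. The function name and the array layout are hypothetical.

```python
import numpy as np

def chromosome_wise_threshold(null_profiles, alpha=0.05):
    """Empirical chromosome-wise threshold from null simulations.

    null_profiles : 2-D array (replicates x positions) of test statistics
                    obtained from data simulated without a QTL
    alpha         : chromosome-wise type I error rate
    """
    # The largest statistic along the chromosome in each replicate
    # accounts for the multiple correlated tests.
    max_stats = np.asarray(null_profiles, dtype=float).max(axis=1)
    return float(np.quantile(max_stats, 1.0 - alpha))
```

Because the threshold is taken from the distribution of the maximum over positions, its validity depends on how well the simulated data mimic the real data, as noted above.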