Empirical evaluation of selective DNA pooling to map QTL in dairy cattle using a half-sib design by comparison to individual genotyping and interval mapping

This study represents the first attempt at an empirical evaluation of the DNA pooling methodology by comparing it to individual genotyping and interval mapping to detect QTL in a dairy half-sib design. The findings indicated that the use of peak heights from the pool electropherograms without correction for stutter (shadow) product and preferential amplification performed as well as corrected estimates of frequencies. However, errors were found to decrease the power of the experiment at every stage of the pooling and analysis. The main sources of errors include technical errors from DNA quantification, pool construction, inconsistent differential amplification, and from the prevalence of sire alleles in the dams. Additionally, interval mapping using individual genotyping gains information from phenotypic differences between individuals in the same pool and from neighbouring markers, which is lost in a DNA pooling design. These errors cause some differences between the markers detected as significant by pooling and those found significant by interval mapping based on individual selective genotyping. Therefore, it is recommended that pooled genotyping only be used as part of an initial screen with significant results to be confirmed by individual genotyping. Strategies for improving the efficiency of the DNA pooling design are also presented.


INTRODUCTION
Linkage analysis studies to detect genes of modest effect that are expected to underlie complex traits are large costly experiments. Selective genotyping, where only the extreme individuals are typed, can reduce costs [7,16]. By combining selective genotyping with DNA pooling, the costs can be lowered significantly. Such a design is appealing for dairy populations, where the availability of large half-sib populations can be exploited efficiently [18,19,21,25].
Prediction of linkage in such studies is based on the observation of a difference in sire marker allele frequencies between the extreme pools [8]. For microsatellites, which represent the most widely used class of markers in linkage studies, inferring allele frequencies from pools is complicated by the presence of stutter (shadow) product and preferential amplification. Thus, accurate frequencies can only be estimated from pools after correcting for these artefacts. Several methods have been presented in the literature for dealing with microsatellite stutters [1,18,23]. To correct for stuttering it is necessary to characterise the nature of stutter patterning for each microsatellite marker with individual genotypes before the correction can be applied to electropherograms of pooled DNA samples. However, when the objective is to determine the difference between the high and low pools it may not be necessary to determine individual allele frequencies from pools. This is because biases associated with stutter peaks and preferential amplification should apply equally to both high and low pools when analysis is performed within families, as was done here.
The main objective of this study was to understand how selective DNA pooling compares to individual selective genotyping. To this end all of the markers genotyped in the pools were individually genotyped in the sire families comprising the same individuals used to construct the pools. A simple linear model was used to analyse the sire marker allele heights in the pool data, which was obtained from Genotyper TM without correction for stutter effects. This was compared to the analysis of the true sire allele frequencies determined from individual genotypes in the same model. The established method of Lipkin et al. [18] was also evaluated to see how it compared to the linear model analysis that was used here. Finally, results from the linear model analysis were compared to those from QTL Express software used for interval mapping, to enable comparison with selective genotyping (same animals).

Pool preparation and genotyping
Samples used in this study were from six paternal half-sib dairy cow pedigrees. All animals were ranked according to the Australian selection index (ASI) which is a profit index calculated as (3.8 × Protein EBV) + (0.9 × Fat EBV) -(0.048 × Milk EBV) where EBV is the estimated breeding value for that trait [4,17]. Roughly equal numbers of daughters from the phenotypic extremes (top and bottom 10%) of the ASI were identified for each sire, and blood samples from a final total of 1244 cows were obtained. The mean ASI of the high and low cows was 69.5 and −1.2, respectively. DNA was prepared from whole blood using the QIAamp 96 Spin Blood Kit on a Bio Robot 3000 (Qiagen) and quantified as described in Mariasegaram et al. [20]. Within each sire and each extreme (high or low ASI), we constructed two pools (e.g. High Pool A and High Pool B) with individuals allocated evenly and randomly between the two. Equal amounts of DNA from all of the respective individuals were combined to form the pools. Thus, there were a total of 6 × 2 × 2 = 24 pools with DNA from 30 to 50 cows per pool.
Eighty-six fluorescently labelled microsatellites spaced 15-25 cM apart were chosen from the 29 autosomal bovine chromosomes to emulate a typical genome scan. These markers were amplified in the two high and low pools, respectively, per sire. PCR products were electrophoresed on the ABI 3700 DNA Analyser and the electropherograms viewed in Genotyper Version 3.5.

Statistical analysis using ratio of sire allele heights (RH) in a linear model
For each pool genotyped with a specific marker, the size in bp and height (H) of the two sire alleles were recorded in Genotyper TM . The ratio of sire alleles was calculated as the height of the smaller allele (H S ) to the height of the larger allele (H L ) to obtain H S /H L .
The following nested linear model was used to analyse the data: where Y = log 10 (H S /H L ), µ = overall mean, m i = effect of the i th marker, s i j = effect of the j th sire within the i th marker, p i jk = effect of the ASI tail (high or low) within the j th sire within the i th marker, e = error associated with the l th pool (l = 1 or 2) within the i jk tail. ANOVA was carried out in GenStat 6 th edition (VSN International Ltd) using the above model. A single F-test was used to accept or reject the null hypothesis that there were no overall difference between the high and low pools for any of the markers analysed: (H S /H L ) Highpool = (H S /H L ) Lowpool or equivalently all p i jk = 0. Within each sire-marker combination, a t-test was used to test the significance of the difference between high and low pools. The t-value for each sire-marker combination was calculated as: where, Y H is the mean of the log ratio of sire alleles for both replicate high pools for a given sire-marker combination; Y L is the mean of the log ratio of sire alleles for both replicate low pools for a given sire-marker combination; s.e.d. is the standard error of the difference obtained from GenStat ANOVA output.

Comparison of pools to individual genotyping
The power of the DNA pooling design was evaluated in relation to individual genotyping. This is important because technical errors inherent to the pooling methodology as well as the impact of sire allele prevalence in the dam population can affect the power of the pooling design to detect QTL [2,8]. The published method of Lipkin et al. [18] was also evaluated on the individual genotyping data to see how it compared to the simple linear model analysis that was used for the analysis of data from the pool genome scan. Finally, the correspondence in t-values between the single marker analysis and interval mapping was compared for these markers.

Calculation of TR -sire allele frequency ratio in pools based on individual genotyping
A frequency ratio (abbreviated TR) was calculated for each pool as the frequency of the smaller sire allele (F S ) to the frequency of the larger allele (F L ) to give F S /F L for individuals in a particular pool. The logarithm of this frequency ratio was substituted in place of the log ratio of heights (RH) in the linear model and the ANOVA repeated. This allowed us to compare how the ratio of heights (RH) calculated from pool electropherograms performed in relation to using the frequency ratio that was based on individual genotyping (TR). The agreement between the two sets of values provided an indication of the effect of technical errors on the pooling methodology.

Calculation of SR -sire allele frequency difference in pools based on individual genotyping data after excluding uninformative daughters with the same heterozygous genotype as their sires
If a sire was heterozygous at a marker for alleles, S and L, then all S and L alleles recorded in his daughters were included in the frequencies used to calculate TR. However, some of these alleles come from the dams of the cows genotyped, not from their sire. Therefore, in calculating SR, cows where the sire origin of the sire alleles could not be deduced because they were heterozygous for the same alleles as their sire were excluded from the estimation of F L and F S . A sire frequency ratio was not calculated for each pool as in previous instances. This was because when there were only a few alleles segregating in the population at a certain marker locus, the probability of both of the sire alleles being prevalent amongst the dam population was much higher. As a consequence, few individuals were informative, and this often resulted in bloated values of the sire allele frequency ratios for some of those sire-marker-pool combinations. A more robust estimate proved to be the difference between the frequency of the smaller sire allele (F S ) and the frequency of the larger sire allele (F L ) to give F S − F L , which was calculated for each sire-pool-marker combination. This value was analyzed using the linear model without logarithmic transformation. A weighted ANOVA was performed using the number of individuals in each pool after excluding the non-informative individuals. The standard error of the difference was calculated as follows: where, residual mean square was obtained from the ANOVA output, while n1 and n2 represent the total number of informative individual that constituted the high and low pools.

Significance testing by the method of Lipkin et al. [18] and using individual genotyping data
The linear model method that we describe used the replicate pools to derive an empirical estimate of the error variance and used this in the significance test. An alternative is to use sampling properties of the binomial distribution as well as the technical error contribution to estimate the standard error of the difference between pools, as described by Lipkin et al. [18]. The significance test for marker-QTL linkage in their method is based on rejecting the null hypothesis, D = 0, where D is the difference in sire allele frequency between the high and low pool for a certain marker [18].
is the standard error of D. In evaluating this method on the 86 marker data set, the technical error variance (from errors in DNA quantification, pool formation, frequency estimation, etc.) associated with determining frequencies from pools was assumed to be zero because we calculated frequencies from the individual genotype data.
The original design described by Lipkin et al. [18] involved the use of external pools formed from daughters with the highest or lowest EBV, and internal pools consisting of the remaining high or low daughters. Marker-QTL linkage was tested initially on the external pools. Significant results were followed up by genotyping the internal pools in addition to the external pools. The design used for this study differed slightly in that while two pools were prepared for each sire from the daughters at each of the phenotypic tails (resulting in two high and two low pools), all four pools were genotyped at the outset. As described above, the allocation of selected individuals to either of the high or low pools was done randomly. For ease of application of the Lipkin et al. [18] method, however, the external pools were arbitrarily set to be the combination of high pool 1 and low pool 1, while the internal pools paired high pool 2 with low pool 2 within each sire. Sire allele frequencies were determined from individual genotypes of the daughters used to construct the pools. The z-score measuring significance of the sire-marker-ASI effect was averaged across the internal and external pools.
The main aim of evaluating the allele frequency difference method of Lipkin et al. [18] was to compare it to the simple linear model that was developed for the analysis of our experimental data.

Comparison of t-values from pool genome scan to t-values from interval mapping using individual genotypes
The comparison of t-values between QTL Express output to those from the single marker analysis is not straightforward. While QTL Express provides an estimate of the significance at each marker in terms of a t-value, the sign, however, is dependent on how the programme defined the first allele of the first marker. The subsequent haplotype at each marker is then defined with respect to that first allele. Therefore, a negative t-value from QTL express is not directly comparable to a negative value from the single marker analysis. In the case of the latter method, the ratio was calculated as height of the smaller allele divided by the height of the larger allele. If this was changed to say, the height of the larger allele divided by the height of the smaller allele, then the sign of the t-value reverses as well. Comparisons were valid between the linear model analyses because of the consistency in how the ratios were calculated. To enable comparison between QTL Express output and that from single marker analysis, the sign of t-values for QTL Express data were changed to be the same as from the SR analysis. Obviously, this would be a best case scenario and the correlation should only be viewed as the highest achievable under such circumstances. For the purpose of this analysis, the programme was forced to calculate significance in the form of a t-value at each marker position along the chromosome within individual families. In each case the programme applied a one-QTL model to the data set using least squares regression analysis [9,15,24]. The values were compared to data from the single marker analysis in the linear model.

Evaluating the need to determine accurate allele frequencies from pools
To determine if correction for stutter offered any significant advantage over just using the ratio of heights from Genotyper TM , 46 markers from the previous 86 were selected and frequency estimates of the sire alleles in the pools were calculated after stutter band correction [18]. This was in addition to frequency estimates from individual genotyping and ratio of height measurements that were already present for the pools. A new ratio, ER, was calculated as the estimated frequency of the smaller allele (F ES )/estimated frequency of the larger allele (F EL ). The logarithm of ER ratio was analysed in the linear model in the same way as TR (based on individual genotyping) and RH (ratio of sire allele peak heights) for the equivalent sire-marker-ASI combinations. The strength of the correlation in t-values between ER and TR, compared to RH and TR for the equivalent sire-marker test of significance was used as an indication of the need for stutter correction. Additionally, the estimated frequencies of the sire alleles were also analysed by the method of Lipkin et al. [18] with standard errors recalculated for this marker set. Technical error variance was set at 0.000722 for this analysis [18].

Linear model based analysis based on the 86 marker data set
The overall test of significance, measured by the F-test, was significant (P < 0.005) for RH, TR and SR (Tab. I). The accuracy of the selective DNA    1 Actual sire allele frequency difference between pools (from individual genotyping data) after excluding daughters with the same genotype as their sire. 2 Sire allele frequency ratio based on individual genotyping. 3 Ratio of sire allele heights from pool electropherograms. 4 The allele frequency difference method of Lipkin et al. [18] evaluated on frequencies from individual genotyping data. 5 Significance was evaluated at the marker position.
pooling technique is measured by how well it predicts significant sire-marker combinations in relation to individual genotyping. The 86 markers resulted in 235 sire-marker tests of significance. The correlation between t-values for sire-marker combinations based on analysis of the ratio of peak heights (RH) and the overall sire allele frequency ratio (TR) determined from individual genotyping data was high, with r = 0.93 (Tab. II). Therefore, technical errors associated with pooling reduced the correlation by only 0.07, from 1.0 to 0.93. When the t-values from heights analysis (RH) was compared to the t-values for SR, where alleles from dams have been excluded, the correlation decreased to r = 0.89. By contrast, a comparison of TR to SR results in an r-value of 0.96. Therefore, the correlation between SR and RH is 0.04 = 0.93−0.89 below that between RH and TR which is exactly the loss expected from the 0.96 correlation between SR and TR. This loss is caused by alleles of the same size as the sire alleles but derived from the dams and thus adding to the total frequency of these alleles in TR and RH but not SR. The mean percentage of uninformative daughters across all heterozygous markers was 26%. When the dam error is combined with the technical errors, the total reduction in correlation is 0.11 from analysing the ratio of heights in the linear model (Tab. II).

Significance testing by the method of Lipkin et al. [18] based on the 86 marker data set and individual genotyping
The agreement between t-values for all sire-marker-ASI effects from the analysis of TR to the mean z-scores using the Lipkin et al. [18] method was good (correlation of 0.99, Tab. II). The observed correlation between the Z-values and SR is the same as that between TR and SR (r = 0.96, Tab. II). Thus, there is no difference between estimating the error variance using the replicate pools or using the properties of the binomial distribution and technical error contribution as employed by Lipkin et al. [18].

Comparing interval mapping and single marker analysis
Out of 235 sire-marker combinations, only 218 comparisons were possible, since some of the markers had not been completely genotyped in all of the families at the time of carrying out the interval mapping. The correlation between t-values from QTL Express and RH (ratio of heights) is 0.76 (Tab. II). This improves slightly between TR and QTL Express data to 0.81. The correlation is the highest when the comparison is between t-values from the analysis of SR in the linear model and QTL Express output (r = 0.83).
At a significance level of P < 0.05, the selective DNA pooling methodology correctly predicted 11 sire-marker combinations that were also significant by interval mapping (Tab. III). Seventeen sire-marker combinations were identified as instances of false-negatives, where significance was found with interval mapping but was not picked up by selective DNA pooling. Another 16 sire-marker combinations were false positives from the selective DNA pooling design. One hundred and seventy-four of the 218 sire-marker combinations were found not to be significant by both approaches. Increasing the significance threshold to 0.01 helped to reduce the proportion of false positives, but had only minimal effect on the percentage of false negatives with respect to interval mapping. A third scenario was tried where the significance threshold for the QTL Express data was made more stringent (P < 0.05) in relation to the RH output (P < 0.1). However, it made no difference in terms of reducing the false negatives, with proportions remaining the same as when the significance threshold for both RH and QTL Express data was set at 0.05.

Using estimated frequencies based on 46 marker data set
There were a total of 123 sire-marker-ASI tests for the 46 markers that were evaluated in this study by the shadow correction method of Lipkin et al. [18]. The correlation between ER and TR represented as r(ER, TR) = r(RH, TR) was 0.95 (Tab. IV). Hence, the use of estimated allele frequencies based on the stutter correction (ER) offered no advantage over the analysis of the simple ratio of electropherogram heights (RH). The correlation between RH and Table III. Efficiency of a selective DNA pooling screen to identify markers for interval mapping.  The Lipkin et al. [18] method was evaluated on the estimated frequencies for the 46 marker data set. ER is the ratio based on estimated frequencies.
TR is comparable to the correlation of 0.93 that was obtained earlier for the larger data set using 86 markers (Tab. IV). The slight improvement in this instance was likely because the best "behaving" markers were selected for estimating frequencies by the stutter band correction method. As found with the 86 marker data set, r(ER, Lipkin) = 0.99. This high correlation is because the same data based on estimated frequencies for the 46 markers was used in both cases. Thus, the linear model gives a comparable result to the Lipkin et al. [18] method of analysis. The QTL Express data for the 46 marker data set (Tab. IV), agree very well with the 86 marker set (Tab. II).

DISCUSSION
The selective DNA pooling strategy is beginning to be routinely used as a part of initial genome scans to identify chromosomal regions for subsequent follow up through individual genotyping [12,22]. While there have been some attempts at computer simulations to study the effects of technical errors and the prevalence of sire alleles in the dam population on the power of the selective DNA pooling design [2], there has been no report of an empirical evaluation of these errors and their impact on QTL detection. The individual genotyping data provided an excellent opportunity for such an empirical assessment of the sensitivity of this methodology. Below we discuss in turn the effect of several sources of error.

Technical error in pooling
The impact of technical errors on the sensitivity of DNA pooling methodology was found to be small (correlation of 0.93 between TR and RH, Tab. II).
We have tried to minimise the impact of DNA quantification errors by using quantitative competitive PCR to measure the DNA concentration [20]. Unfortunately, some of the extracted DNA was of very low concentration, so even a small error in the DNA concentration for these samples would result in too much or too little DNA being added to the pool. In addition, DNA concentration varied from sample to sample. This meant that the volume to be aliquoted had to be adjusted each time in order to pool equal amounts of the DNA from each cow. Variability in PCR and in the heights reported by the sequencer might have also contributed to decreasing the correlation between TR and RH, but these errors were small (unpublished results).

Use of "band height" instead of estimated allele frequencies
Another possible source of error is the non-correction for stutter in calculating the simple ratio of heights. The results from this study suggest that the correction for stutter does not offer any advantage over the use of the peak heights, where the correlation (r) between (ER and TR) is the same as r(RH, TR) = 0.95 in Table IV. This is not surprising if we assume that PCR amplification of pools from the same sire, under identical conditions, will produce similar stutter and differential amplification patterns. Therefore, comparing pools within families should effectively eliminate most of the problems associated with these artefacts [6]. Significant t-values are therefore, more likely to reflect actual differences in sire allele distribution between the pools [14]. Interestingly, r(RH, ER) equals 0.98 as opposed to the 1.0 that would be expected based on the earlier relationship through TR. Correction for stutter effects in pools by the correction procedure of Lipkin et al. [18] is likely to introduce its own errors in the frequency estimation. Therefore any slight loss in accuracy from using the raw heights without stutter correction will probably equal the loss of accuracy from frequency estimation. The implication from these findings is very significant since it means that the extra effort involved in setting up stutter profiles to estimate marker allele frequencies from pools is not required.

Contribution of dam alleles to inaccuracy in the analysis
The ability to calculate the true sire allele contribution to each pool (SR) was possible because of the availability of individual genotypes for the cow progeny. The t-values from the analysis of SR in the linear model were assumed to be the gold standard against which all other estimates were compared. The relatively higher correlation of 0.96 between TR and SR, compared to 0.89 between RH and SR, indicates that the dam contribution to the sire alleles in conjunction with the technical errors decreased the correlation by 0.11. These findings were consistent with the daughter design used for this study, where the sires are mated at random to the dams. Depending on the frequency of the sire alleles in the general population, the inheritance of these allele types from the dams could contribute substantially to the total error of this method with the use of half-sibling experimental designs.

Comparison of use of a simple linear model to the established method of Lipkin et al. [18]
Our method differs from that of Lipkin et al. [18] in two respects -use of band heights instead of estimated allele frequencies and use of a linear model with error variance estimated from replicate pools instead of an error variance based on the binomial distribution and technical errors associated with the pooling methodology. We investigated both these effects separately and together and found that neither greatly affected the results. Therefore either method can be used.

Concordance between t-values from single marker analysis and those from interval mapping
Interval mapping is a much more sophisticated approach that uses information from nearby markers as well as individual phenotypes. From Table II, even under the best case scenario, where technical errors are non-existent and it is possible to obtain true sire allele proportions for pools (SR), the correlation with interval mapping data is only 0.83. This would suggest that selective DNA pooling can never be 100% predictive for an equal number of samples in both designs. However, an important factor that must be considered in respect of interval mapping based on individual genotyping is the prevalence of microsatellite genotyping errors [13]. Such errors are almost impossible to eliminate completely and will impact the accuracy of any interval mapping study. Additionally, results from interval mapping analysis are heavily dependent on model assumptions with different models likely to yield different estimates [10].
The implicit assumption in performing this comparison was that the distribution of the test statistic (t-tests) under the null distribution of no linkage is the same for both methods. While the type I error rate for the two methods may differ slightly from the nominal significance level due to the approximations made by each method, the comparison between the methods has been made at both P < 0.05 and P < 0.01 and gives similar results. This evaluation clearly shows that markers found to be significant in an interval mapping analysis will not necessarily be detected from analysis of pooled genotyping and vice versa.

CONCLUSIONS
In order to maximise the power of selective DNA pooling compared to interval mapping, more individuals need to be included in the pools (minimising sampling errors) as also pointed out by Zou and Zhao [26]. This should compensate for technical errors inherent to the pooling methodology as well as loss of individual genotypic information from DNA pooling.
Barratt et al. [3] examined the effects of errors at each experimental stage of the pooling design and concluded that construction of numerous smaller pools from the same tail was preferred over fewer and larger pools to establish an acceptable trade-off between the need for accuracy and costs. More recently, Cohen et al. [5] described a fractionated pooling design to address problems specific to mapping QTL in animals. In this design, individuals are randomly grouped into multiple pools (> 2) at each tail within the different sire families.
A higher density of markers needs to be used to maximise the chance that there will be at least one marker in close proximity to the QTL. Pools need to be constructed as accurately as possible to minimise technical errors. When these precautions are combined with a multi-stage design [11], it is likely that selective DNA pooling will be much more successful in identifying markers worth pursuing by individual genotyping for interval mapping using a halfsibling experimental design.