Sire design power calculation for QTL mapping experiments

Estimates of sire design power for QTL mapping experiments obtained using three different methods of algebraic approximation were analysed by comparing them with the results of data simulations. Even when the binomial probability that any number of sires out of the total number of sires are jointly heterozygous at the marker and the QT loci was taken into consideration, the algebraic approximations overestimated powers. However, they could be used to rank designs differing in the number of sires if the total size of the experiment is given. The results were discussed, focusing on the assumptions made about the number of informative offspring, the balance between the two offspring sub-groups which receive the same marker allele from the sire and the distribution of the statistic. Given that a full algebraic approach would be computationally costly, data simulation can be considered a useful tool in estimating the power of QTL detection sire designs. © Inra/Elsevier, Paris


INTRODUCTION
The use of genetic markers to locate genes whose polymorphism partly explains the genetic variability of quantitative traits was proposed by Sax [3] and further detailed by Neimann-Sorensen and Robertson [2] and others. The principle is to identify, in the offspring of an individual, those which received one or other of the two chromosomal fragments surrounding the marker in question. If a quantitative trait locus (QTL) is located on this fragment, and if the parent is heterozygous at both the marker and the QTL, then a systematic difference is observed between the two sub-groups of progeny. With the development of molecular markers based on DNA variations, the application of these ideas has become feasible on a large scale, particularly in livestock populations, where large families are routinely recorded. The design of such experiments has been studied in detail by a number of authors, in particular Soller and Genizi [4] and Weller et al. [6]. In order to optimize these designs, it is necessary to estimate their power. Focusing on simple population structures, Soller and Genizi [4], as well as Weller et al. [6], approached this power estimation by considering fully balanced populations and using approximate distributions of the test statistic. In these early papers, markers were studied one by one, and the test statistics applied were simple ANOVA methods, modelling trait means as linear combinations of sire and marker within sire effects. In their approximation, these authors worked with the asymptotic $\chi^2$ or normal approximation of the F statistic, and considered simply the mean contrast, averaging over the different possibilities for the sire and offspring genotypes at the QT and marker loci. The power of such designs, as well as of more complex experiments involving two or three generations and mixing half- and full-sib families, was further studied by van der Beek et al. [5]. In their paper, these authors considered the mixture of sub-populations, as characterized by the number of heterozygous sires at the QTL, rather than the mean.
Alternatively, the estimate of the design power may be obtained by simulating heterogeneous populations and applying studied test statistics to the generated sets of data, without any approximation, but at the expense of more computing time. This approach was followed by Le Roy and Elsen [1] in a study addressing the relative value of ANOVA and maximum-likelihood methods for QTL detection.
The aim of this study is to evaluate the validity of approximate sire design power estimates by comparing three algebraic methods with data simulation.

Hypotheses
Powers were calculated for a single marker analysis. Multiallelic marker loci (with $n_a = 4$ alleles) were studied. Alleles $M_i$ were assumed to be distributed with frequencies in a geometric series ($f_1 = f$, $f_2 = a f$, $f_3 = a^2 f$, ..., with $f = 1/(1 + a + a^2 + \dots)$). In this situation, the parameter $a$ was obtained, given the mean heterozygosity of the marker ($E(h_m)$), by solving the equation $E(h_m) = 1 - \sum_i (a^{i-1})^2 / (\sum_i a^{i-1})^2$. This marker was supposed to be totally linked to the QTL. The design was organized with $n_p$ half-sib families comprising $n_o$ progeny per sire. $m_p$ was the expected number of sires for which a marker contrast can be computed, i.e. the expected number of heterozygous sires at the marker locus, and $l_p$ the expected number of heterozygous sires at both the marker and QT loci. $m_o$ was the expected effective family size, i.e. the mean number of offspring per sire for which the marker allele received from the sire is identified. This effective family size is linked to the allele frequencies by the relation: $m_o = \sum_{i<j} 2 f_i f_j (1 - 0.5(f_i + f_j)) / \sum_{i<j} 2 f_i f_j$. The type I error $\alpha$ (accepting a linked QTL when it does not exist) was fixed at 1 %.
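As an illustration of the marker hypotheses above, the following minimal Python sketch solves for the geometric ratio $a$ given a target marker heterozygosity and evaluates the ratio appearing in the effective family size relation. The function names, the bisection solver and its tolerance are assumptions made for this example and are not taken from the original study. The ratio returned by `informative_fraction` lies between 0 and 1; reading it as the expected informative fraction of a family, to be scaled by $n_o$, is likewise an interpretation rather than a statement from the text.

```python
def geometric_freqs(a, n_alleles=4):
    """Allele frequencies proportional to 1, a, a^2, ... (normalised to sum to 1)."""
    weights = [a ** i for i in range(n_alleles)]
    total = sum(weights)
    return [w / total for w in weights]


def heterozygosity(freqs):
    """Expected heterozygosity under random mating: 1 - sum of squared frequencies."""
    return 1.0 - sum(f * f for f in freqs)


def solve_ratio(target_het, n_alleles=4, tol=1e-12):
    """Find a in (0, 1) giving the target heterozygosity.

    Bisection is an assumed solver choice: heterozygosity rises from 0 (a = 0)
    to 1 - 1/n_alleles (a = 1), so the target must lie in that range.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if heterozygosity(geometric_freqs(mid, n_alleles)) < target_het:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def informative_fraction(freqs):
    """Ratio in the m_o relation: heterozygous sire genotypes M_i M_j are weighted
    by 2 f_i f_j, and an offspring is uninformative with probability 0.5 (f_i + f_j)."""
    num = den = 0.0
    n = len(freqs)
    for i in range(n):
        for j in range(i + 1, n):
            w = 2.0 * freqs[i] * freqs[j]
            num += w * (1.0 - 0.5 * (freqs[i] + freqs[j]))
            den += w
    return num / den


# Target heterozygosity of 0.5 with four alleles, as in the studied designs.
a_hat = solve_ratio(0.5)
freqs = geometric_freqs(a_hat)
print([round(f, 3) for f in freqs], round(informative_fraction(freqs), 3))
```

For a target heterozygosity of 0.5, the frequencies obtained this way should fall close to the four values quoted for the marker locus in this study; small differences in the third decimal can arise from rounding.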

Compared methods
The following three approximations were studied.
1) The approximation used by Weller et al. [6] (P1).
2) The approximation followed by van der Beek et al. [5] (P2), in which the variability in the number of heterozygous sires at the QTL is considered. The power was given by an expression in which $x_p$ is the number of heterozygous sires at the QTL and $\Pr(x_p/m_p)$ is the binomial probability that $x_p$ out of $m_p$ (the expected number of heterozygous sires at the marker locus) are also heterozygous at the QTL.
3) An approximation where variation at both the sire marker and the QT loci is considered (P3). The power was given by an expression in which $y_p$ is the number of heterozygous sires at the marker locus and $\Pr(y_p/n_p)$ is the binomial probability that $y_p$ out of $n_p$ sires are heterozygous at the marker locus.
4) In order to test the reliability of the three algebraic methods above, the design power was also estimated by simulating data and applying the standard F test. For each power calculation, 10 000 replicates were used under the null and the alternative hypotheses. The variance ratio for the classic hierarchical ANOVA was calculated from $z_{iM_1j}$ (resp. $z_{iM_2k}$), the quantitative performances of the jth (resp. kth) daughter of a heterozygous $M_1M_2$ sire $i$ which received marker allele $M_1$ (resp. $M_2$), and $n_{iM_1}$ (resp. $n_{iM_2}$), their number. The power was estimated by the ratio between the number of replicates under the alternative hypothesis whose statistic exceeds a certain threshold and the total number of replicates.
The threshold was the $(1 - \alpha)$ percentile of the 10 000 replicates under the null hypothesis. Thus, no assumptions about the distribution of the statistic were made. Table I reports the power estimates of sire designs with a half-sib family structure for a gene effect (GE) of 0.5 or 1 phenotypic standard deviation ($\sigma_p$), for various numbers of sires, for two total experiment sizes (tno equal to 500 or 1 000 daughters), for a constant polygenic heritability $h^2$ of 0.25 and assuming a recombination rate ($r$) of 0. Expected heterozygosities at both the marker and QT loci are assumed to be 0.5. Four alleles are segregating at the marker locus, with frequencies 0.664, 0.229, 0.079 and 0.028. Note that the total heritability (including the variation at the QTL) equals 0.375 if GE = 0.5 and 0.75 if GE = 1.0.
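The simulation-based estimate described above, with a threshold taken from the null replicates rather than from a theoretical distribution, can be sketched as follows. This is a deliberately simplified illustration and not the simulation program used in the study: it assumes unit phenotypic variance, a sire effect with variance $h^2/4$, a within-family residual variance of $1 - h^2/4$, a contrast of GE between the two offspring sub-groups of a doubly heterozygous sire, and that every daughter of a marker-heterozygous sire is informative. All function and parameter names are hypothetical.

```python
import random


def f_statistic(families):
    """Marker-within-sire F ratio.

    families: list of (group1, group2) phenotype lists, one pair per
    marker-heterozygous sire. Returns 0.0 when no contrast can be tested.
    """
    ss_marker = ss_resid = 0.0
    n_total = n_used = 0
    for g1, g2 in families:
        n1, n2 = len(g1), len(g2)
        if n1 == 0 or n2 == 0:
            continue  # no within-sire contrast available
        m1, m2 = sum(g1) / n1, sum(g2) / n2
        ss_marker += n1 * n2 / (n1 + n2) * (m1 - m2) ** 2
        ss_resid += sum((y - m1) ** 2 for y in g1) + sum((y - m2) ** 2 for y in g2)
        n_total += n1 + n2
        n_used += 1
    df_resid = n_total - 2 * n_used
    if n_used == 0 or df_resid <= 0 or ss_resid == 0.0:
        return 0.0
    return (ss_marker / n_used) / (ss_resid / df_resid)


def simulate_replicate(rng, n_sires, n_off, gene_effect,
                       het_marker=0.5, het_qtl=0.5, h2=0.25):
    """One simulated experiment under the simplifying assumptions stated above."""
    sd_sire = (h2 / 4.0) ** 0.5
    sd_resid = (1.0 - h2 / 4.0) ** 0.5
    families = []
    for _ in range(n_sires):
        if rng.random() >= het_marker:
            continue  # sire homozygous at the marker: excluded from the test
        contrast = gene_effect if rng.random() < het_qtl else 0.0
        sire_mean = rng.gauss(0.0, sd_sire)
        g1, g2 = [], []
        for _ in range(n_off):
            noise = rng.gauss(0.0, sd_resid)
            if rng.random() < 0.5:  # daughter received M1 (complete linkage, r = 0)
                g1.append(sire_mean + 0.5 * contrast + noise)
            else:                   # daughter received M2
                g2.append(sire_mean - 0.5 * contrast + noise)
        families.append((g1, g2))
    return f_statistic(families)


def empirical_power(n_sires, n_off, gene_effect, alpha=0.01, n_reps=10000, seed=1):
    """Power as the share of alternative replicates above the empirical null threshold."""
    rng = random.Random(seed)
    null = sorted(simulate_replicate(rng, n_sires, n_off, 0.0) for _ in range(n_reps))
    # Approximate (1 - alpha) empirical quantile of the null statistics.
    threshold = null[min(n_reps - 1, int((1.0 - alpha) * n_reps))]
    alt = [simulate_replicate(rng, n_sires, n_off, gene_effect) for _ in range(n_reps)]
    return sum(s > threshold for s in alt) / n_reps


if __name__ == "__main__":
    # Example: 10 sires with 50 daughters each, GE of one phenotypic standard deviation.
    print(empirical_power(10, 50, 1.0, n_reps=500))
```

Because the threshold is read directly from the sorted null statistics, the sketch shares the key property of the simulation approach: no assumption is made about the distribution of the statistic. The number of replicates is reduced in the example call only to keep the run short.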

RESULTS
Table I shows that when the gene effect is one half $\sigma_p$ and the total experiment size is 500 daughters, the three algebraic methods give similar results and, given that the power is low in this situation, these approximations only slightly overestimate the power compared with the simulated data. The results for the same GE but with a total experiment size of 1 000 daughters confirm that no significant differences exist between the algebraic methods, except when the number of sires is low, in which case P1 greatly overestimates the power. The overestimation by the algebraic methods relative to the simulations is larger here than with a total experiment size of 500 daughters. For a GE of 1.0 $\sigma_p$, when the total experiment size is 500 daughters, the algebraic results continue to overestimate power, except for P3 when the number of sires is equal to 2, in which case P1 gives a particularly high power compared with the other algebraic methods and the simulation. For a total experiment size of 1 000 daughters, P1 greatly overestimates power for any number of sires considered, while P2 and P3 give results closer to the simulated data.