The power of two experimental designs for detecting linkage between a marker locus and a locus affecting a quantitative character in a segregating population

Summary - The statistical power of 2 experimental designs (backcrossing and intercrossing) for detecting linkage between a marker gene and a quantitative trait locus (QTL) in families derived from a segregating population is investigated. Formulae are developed which relate power to the recombination frequency (r) between the genes, the genetical properties of the quantitative trait controlled by the QTL and the design parameters. The reliability of some simplifying assumptions was confirmed by computer simulations. Application of these formulae showed that, with a population size of 1 000, the power of the 2 designs was < 20% when r was 0.3 for all single-gene heritabilities considered; that a few large families are better than many small families; and that backcrossing is generally more efficient than intercrossing. The allele frequencies and dominance properties of the QTLs have important interactions in their effects on power.

statistical power / marker-QTL linkage / backcross / intercross

INTRODUCTION
With the rapid development of molecular techniques in the last decade, their application to the investigation of the genetical basis of quantitative characters has become a subject of considerable activity (Botstein et al, 1980; Beckmann and Soller, 1986; Lander and Botstein, 1989). The central idea of these new investigations was to use the newly-discovered molecular markers (for example, RFLPs) at defined map positions for tracing linked quantitative trait loci (QTLs).
Methodologically, this can be accomplished by detecting linkage between one or more genetic markers and one or more QTLs through various appropriate experimental designs (Mather, 1957, 1960; Thoday, 1961; Jayakar, 1970; Hill, 1975; Weller, 1986; Luo, 1989; Luo and Kearsey, 1989; Lander and Botstein, 1989). Hill (1975) demonstrated the use of analysis of variance for detecting linkage between a marker gene and a QTL by means of a nested backcrossing or intercrossing experiment and attempted to work out the power of these designs. However, because of the varying sizes of each of the nested groups, the numerator of the final test statistic used in the analysis of variance to detect the marker-QTL linkage cannot be expressed as a constant times a random $\chi^2$ variable. Hill was therefore unable to work out an analytical expression for the power of the experimental designs. Soller et al (1976, 1978) suggested excluding the offspring with heterozygous marker genotypes in the power analyses of the intercross design in order to increase the power of the designs. This also avoided the complexity caused by the unequal sample sizes among the different marker genotypes and allowed use of the normal procedure of hierarchical analysis of variance so as to set up an F-distributed test statistic. Obviously, this results in the loss of useful information and artificially inflates the expected variance between offspring marker classes. The present paper focuses on a statistical approach to working out the experimental power of the designs suggested by Jayakar (1970) and Hill (1975), relating the power directly to the genetic parameters of the marker gene and the QTL and to the relevant design parameters. This allows the factors affecting the power to be investigated comprehensively.

Basic assumptions and experimental design
The method involves analysing progeny from natural or controlled matings in a population. Consider 2 autosomal loci, one of which affects a quantitative character (QTL) while the other is a codominant marker. The 2 loci are linked with a recombination fraction of r (r' = 1 - r). Let the frequency of allele $Q_1$ at the QTL be denoted p (p = 1 - q). The phenotypic distributions of the 3 genotypes at the QTL, ie $Q_1Q_1$, $Q_1Q_2$ and $Q_2Q_2$, are assumed to be $N(\mu + a, \sigma^2)$, $N(\mu + d, \sigma^2)$ and $N(\mu - a, \sigma^2)$ respectively, where a and d represent the additive and dominance effects at the QTL (Falconer, 1989). With just one QTL, $\sigma^2$ will be the environmental variance alone, but with other unlinked QTLs it will also include the genetic variance at these loci. The phenotypes of the 3 marker genotypes, viz $M_1M_1$, $M_1M_2$ and $M_2M_2$, are distinguishable, ie the marker locus is codominant, and we assume that the QTL and the marker gene are in linkage equilibrium in the population. One can score the progeny of those families in which the parents are $M_1M_1 \times M_1M_2$ or $M_1M_2 \times M_1M_2$ (ie backcrossing or intercrossing) and record the quantitative phenotype and marker genotype. Consider, for example, an experiment consisting of s sibships, within each of which there are m marker classes (m = 2 and 3 for the backcrossing and intercrossing designs, respectively). Let $n_{ij}$ represent the number of sibs within the jth marker class within the ith sibship. The variation for the quantitative trait can then be partitioned into that between and that within sibships, while the variation within sibships can be further partitioned into variation between and within marker genotypes. For such unbalanced 2-way nested classification data, variance components have been worked out by Searle (1971, p 475-477).
If it is further assumed that each sibship has a constant size of n then the total experimental size is s x n and analysis of variance for both backcrossing and intercrossing designs is illustrated in table I, in which: following Searle (1961) and Snedecor and Cochran (1968, p 189-191).

Statistical model
In the analysis of variance described in table I, the linear model for the phenotypic record of the quantitative trait measured on the kth sib ($k = 1, 2, \ldots, n_{ij}$) with the jth marker genotype ($j = 1, 2, \ldots, m$) within the ith sibship ($i = 1, 2, \ldots, s$) can be written as:

$$y_{ijk} = \mu + \alpha_i + \beta_{ij} + e_{ijk}$$

where $\mu$ is the overall population mean while $\alpha_i$, $\beta_{ij}$ and $e_{ijk}$ are contributions from the sibship, from the marker genotype within sibship and from residual error, respectively. They are assumed to be independently and normally distributed with zero means and variances $\sigma_\alpha^2$, $\sigma_\beta^2$ and $\sigma_e^2$ respectively. The frequency distribution of the QTL genotypes and the expected means and variances of the progenies within the jth marker genotype and within all possible sibships were obtained by Hill (1975), and these were carefully rederived by Luo (1989). It was found that the expected variance between marker genotypes within sibships ($\sigma_M^2$) is:

and the expected variance within marker genotypes within sibships is:

for the intercross design; while the corresponding variances for the backcross design are:

It is easily seen from equations [3.1] and [4.1] that the expected variance between marker genotypes within sibships ($\sigma_{M(I)}^2$ or $\sigma_{M(B)}^2$) for either the intercross or the backcross design will be zero if the marker gene is not linked with the QTL, ie r = 0.5. The expected variance will also be zero if one of the alleles at the QTL is fixed, ie p = 0 or 1, but these situations are trivial. As pointed out by Jayakar (1970), under the null hypothesis $H_0$: r = 0.5, the following ratio of mean squares:

is distributed as a central F-variable with an expected value of 1. However, the ratio will be a noncentral F-variable when r is less than 0.5:

where F is a noncentral F-variable with the degrees of freedom described in table I and the noncentrality parameter:

whose definition is the same as that in Kendall et al (1983, p 37) and in Johnson and Kotz (1970, p 191).
By definition, the power function of the 2 designs for detecting the linkage can be written in the following general form:

where $F_{\nu_1,\nu_2;\delta}$ represents a noncentral F-variable with degrees of freedom $\nu_1$ and $\nu_2$ and noncentrality parameter $\delta$, while $F_{\alpha;\nu_1,\nu_2}$ stands for the upper $\alpha$ point of a central F-variable with degrees of freedom $\nu_1$ and $\nu_2$.
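The following is a minimal sketch of how such a power function can be evaluated numerically, assuming that the general form is the probability that a noncentral F-variable exceeds the upper $\alpha$ point of the corresponding central F-variable. It is not the Laguerre-series procedure used in the paper; it sums the standard Poisson mixture representation of the noncentral F distribution directly. The function names (`betainc`, `ncf_sf`, `f_crit`, `power`) are illustrative, and the degrees of freedom and noncentrality value in the usage note are placeholders rather than values taken from table I.

```python
from math import exp, lgamma, log, log1p


def _betacf(a, b, x, max_iter=2000, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def ncf_sf(f, df1, df2, nc):
    """Pr{F' > f} for a noncentral F with noncentrality nc (nc = 0 gives the central F)."""
    x = df1 * f / (df1 * f + df2)
    half = nc / 2.0
    if half == 0.0:
        return 1.0 - betainc(df1 / 2.0, df2 / 2.0, x)
    j_max = int(half + 10.0 * half ** 0.5 + 30)
    cdf = 0.0
    for j in range(j_max + 1):
        weight = exp(-half + j * log(half) - lgamma(j + 1.0))  # Poisson(j; nc/2)
        cdf += weight * betainc(df1 / 2.0 + j, df2 / 2.0, x)
    return max(0.0, 1.0 - cdf)


def f_crit(alpha, df1, df2):
    """Upper alpha point of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while ncf_sf(hi, df1, df2, 0.0) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ncf_sf(mid, df1, df2, 0.0) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power(alpha, df1, df2, nc):
    """Probability that the noncentral F exceeds the central critical value."""
    return ncf_sf(f_crit(alpha, df1, df2), df1, df2, nc)
```

As a usage example under stated assumptions, a balanced backcross with s = 10 sibships of n = 100 sibs and m = 2 marker classes would give degrees of freedom s(m - 1) = 10 and s(n - m) = 980, so `power(0.05, 10, 980, nc)` returns the power for a hypothetical noncentrality value `nc`; with `nc = 0` it returns the significance level itself.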

Power calculation
So far, the power for detecting the linkage by use of these designs has been shown to be a function of the recombination fraction (r) and the basic genetic parameters at the QTL, namely the allelic frequency p (q = 1 - p), the additive and dominance effects at the QTL (a and d) and the residual variance ($\sigma^2$), as well as the experimental design parameters s (ie the number of sibships) and n (ie the size of the sibships).
For a given broad heritability ($h^2$) and dominance ratio (f = d/a) at the QTL, the genetic variance associated with the QTL in an $F_2$ population is:

For convenience, let the phenotypic variance of the quantitative trait in the $F_2$ population be 100; the additive and dominance effects at the QTL (a and d) can then be solved as:

Once the design parameters (s and n) and the genetic parameters at the QTL (p, f and $h^2$) are given together with the recombination frequency between the marker and QTL (r), the value of the noncentral F-variable can be calculated by using equation [9]. For a given significance level $\alpha$ of the test, the power of detecting the linkage can thus be worked out through equation [11] directly by using the relevant statistical tables such as those of Tang (1938) or Tiku (1967). Although these tables provide the power of an F-test, they are restricted to a limited number of degrees of freedom and to a limited range of values of the noncentrality parameter. However, several procedures are available to approximate the power of the F-test (Patnaik, 1949; Laubscher, 1960; Tiku, 1965, 1967). Because of its higher accuracy, Tiku's 3-moment common approximation using Laguerre series was programmed in Mathematica (Wolfram, 1991) to evaluate the experimental power in the present paper.
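The step from heritability and dominance ratio to the gene effects can be sketched as follows. The sketch assumes the standard single-locus result that the genetic variance in an $F_2$ population (p = q = 0.5) is $a^2/2 + d^2/4$ (Falconer, 1989), and takes the residual variance to be the phenotypic variance minus this genetic variance; the function name `qtl_effects` and its arguments are illustrative.

```python
from math import sqrt


def qtl_effects(h2, f, phenotypic_variance=100.0):
    """Solve for a, d and the residual variance from h2 and f = d/a.

    Assumes the F2 genetic variance at the QTL is a^2/2 + d^2/4.
    """
    genetic_variance = h2 * phenotypic_variance
    # Substituting d = f * a gives genetic_variance = a^2 * (1/2 + f^2/4)
    a = sqrt(genetic_variance / (0.5 + 0.25 * f * f))
    d = f * a
    residual_variance = phenotypic_variance - genetic_variance
    return a, d, residual_variance
```

For instance, `qtl_effects(0.1, 0.0)` gives a purely additive QTL with a equal to the square root of 20 (about 4.47), d = 0 and a residual variance of 90.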

Power evaluation from simulations
Since approximations [6.2] and [7.2] were made in deriving the power function, the reliability of these approximations was checked by comparing the theoretical prediction of the power to the powers which were calculated from simulation experiments.
A Fortran-77 computer programme was designed for: i) simulating the inheritance of the marker-QTL linkage in the 2 nested experiments as described above for any combination of experimental design and genetic parameters (Luo, 1989); ii) computing the F-value from the analysis of variance of the simulated data, following the algorithm described by Searle (1971); and iii) calculating the frequency of significant F-values in replicated simulation trials, as in Carbonell et al (1992), which gives the empirical power.
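The 3 steps above can be sketched in Python for the backcrossing design only. This is an illustrative reimplementation under the assumptions stated in the section on basic assumptions (random QTL genotypes in the parents, linkage equilibrium, codominant marker), not the original Fortran-77 programme; the function names, the argument names and the critical value supplied by the caller are all hypothetical.

```python
import random


def simulate_backcross_f(s, n, p, r, a, d, mu, sigma, rng):
    """One replicate of an M1M1 x M1M2 experiment; returns the nested ANOVA F ratio."""
    value = {2: mu + a, 1: mu + d, 0: mu - a}  # keyed by the number of Q1 alleles
    ss_between = ss_within = 0.0
    df_between = df_within = 0
    for _ in range(s):
        # QTL alleles (True = Q1) are drawn independently of the marker alleles
        hom_parent = [rng.random() < p, rng.random() < p]               # M1M1 parent
        het_parent = {"M1": rng.random() < p, "M2": rng.random() < p}   # M1M2 haplotypes
        classes = {"M1": [], "M2": []}
        for _ in range(n):
            marker = rng.choice(("M1", "M2"))          # marker allele from the M1M2 parent
            other = "M2" if marker == "M1" else "M1"
            source = other if rng.random() < r else marker  # recombination with probability r
            n_q1 = int(het_parent[source]) + int(rng.choice(hom_parent))
            classes[marker].append(rng.gauss(value[n_q1], sigma))
        sibs = classes["M1"] + classes["M2"]
        sib_mean = sum(sibs) / n
        filled = [ys for ys in classes.values() if ys]
        for ys in filled:
            class_mean = sum(ys) / len(ys)
            ss_between += len(ys) * (class_mean - sib_mean) ** 2
            ss_within += sum((y - class_mean) ** 2 for y in ys)
        df_between += len(filled) - 1
        df_within += n - len(filled)
    # Very small sibships may leave no degrees of freedom; not handled in this sketch
    return (ss_between / df_between) / (ss_within / df_within)


def empirical_power(f_critical, reps, seed=1, **design):
    """Fraction of replicates whose F ratio exceeds the supplied critical value."""
    rng = random.Random(seed)
    hits = sum(simulate_backcross_f(rng=rng, **design) > f_critical for _ in range(reps))
    return hits / reps
```

A call such as `empirical_power(1.84, 500, s=10, n=100, p=0.5, r=0.1, a=4.47, d=0.0, mu=0.0, sigma=9.49)` would estimate the power for a 10 x 100 structure, where 1.84 is taken as a rough value of the upper 5% point of F with 10 and 980 degrees of freedom and the gene effects are illustrative.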

RESULTS
Although the power of the 2 designs can be easily investigated at any combination of experimental design and genetic parameters, only a total experimental size of 1 000 is considered here. The powers of the 2 designs were evaluated by both theoretical prediction and computer simulation for all possible combinations of 2 design structures (10 (sibships) x 100 (sibs) and 20 x 50), heritability $h^2$ = 0.01, 0.05 and 0.10, allelic frequency p = 0.25, 0.5 and 0.75, dominance ratio f = 0.0, 0.5 and 1.0, and recombination frequency between the marker gene and QTL r = 0.0, 0.1 and 0.3. The powers were evaluated at a significance level ($\alpha$) of 0.05. For simplicity, only part of the results are listed in table II, to demonstrate the agreement between powers evaluated from theoretical prediction and from simulation based on 500 replicates (in parentheses).
The powers of the 2 designs were also computed analytically for the experimental size of 1 000 but with realistically smaller sibships, and are tabulated in table III. It is of interest to compare the present power predictor with that of Soller and Genizi (1978). Table III in Soller and Genizi (1978) listed the numbers of sibships and the total experimental sizes required to achieve a power of 90% when the allelic frequency (p), dominance ratio (f) and contrast at the QTL were 0.5, 0.0 and 0.01 (equivalent to 1% heritability in the present study) respectively, and the recombination frequency between the marker and QTL was zero. The powers with these population structures and the same genetic parameters were evaluated by the present method. The differences between the evaluated powers and 90% are summarised in table IV.
The effects of the recombination frequency between the marker and QTL (r), the allelic frequency (p) and the dominance ratio (f) at the QTL on the power of both backcrossing and intercrossing designs are illustrated in figure 1 for a given heritability of 0.1.

DISCUSSION
Derivations in the present paper have shown that the power of the 2 kinds of designs for detecting linkage between a marker gene and a QTL can be expressed as a function of the design parameters and of parameters describing the genetic properties of the marker and QTL. The powers from theoretical evaluation agree very well with those from stochastic simulation over a wide range of situations (table II), suggesting that the theoretical analysis is reliable.
Recombination frequency between the marker and QTL had a pronounced effect on the power when $h^2$ > 0.05 (tables II, III). In this case, both designs lost 70% of their power with an increase of r from 0.1 to 0.3. Moreover, the linkage would be unlikely to be detected (power < 20%) if the QTL were linked to the marker with a recombination frequency > 0.3 when $h^2 \le 0.1$. It has been pointed out by Risch (1991) and Collins and Morton (1991) that power is dramatically reduced when the recombination frequency is > 0.3. Recently, Luo and Woolliams (1992) studied the effect of recombination frequency between marker and QTL on the accuracy of estimation of genetic parameters of a QTL with heritability of 0.1 and found that maximum likelihood estimates of these parameters are usually biased once the recombination frequency reaches 0.3.
The power of both designs increased with increasing dominance ratio at low allelic frequency (p = 0.25) (fig 1a), but decreased with increasing dominance ratio at high allelic frequency (p = 0.75) (fig 1c). However, there was little effect of dominance on the power of backcrossing at an allelic frequency of 0.5. At the same allelic frequency the power of intercrossing still increased with increasing dominance ratio (fig 1b).
There was no evidence of an effect of allelic frequency on the power of either design when the gene effect at the QTL was purely additive (f = 0.0) (fig 1d). However, the power decreased with increasing allelic frequency when the allele displayed dominance (fig 1e, f). Soller and Genizi (1978) published the first comprehensive theoretical study of the designs addressed in the present paper, but approached it by investigating the significance of the contrast in the quantitative trait between the means of the marker genotypes of interest. They concluded that the effects of gene frequency and dominance level would be important when the number of families was small, because the probability that the contrast in each of the families is zero is then so large that the power requirement will not be met for any size of family. They suggested that the probability of zero contrast would be 0.90 for backcrosses and 0.94 for intercrosses when a = d and 2pq = 0.3. Therefore, at least 22 and 34 families for the 2 designs respectively must be sampled in order that, on average, non-zero contrasts can be expected in 2 of these families. However, if the power of these designs is calculated in the way developed in the present paper, the loss in power due to the probability of sampling families with zero contrast can be avoided, since it is likely that those families with zero contrast will nevertheless contribute to the significance of the variance between marker types within families. In fact, in the case of $h^2$ = 0.01, f = 1.0, 2pq = 0.3 and an experimental size of 5 000 (10 x 500), a power of 0.76 (0.70) for the backcrossing design and 0.64 (0.62) for the intercrossing design was obtained from the theoretical prediction (simulation) in the present study.
A comparison between the present power predictor and that of Soller and Genizi (1978) is made in table IV. It can be seen that the powers of the backcrossing designs were slightly higher in the present paper than in Soller and Genizi (1978).
For the intercrossing designs with a small number of sibships and large sibship size, the present method also yielded higher power than that of Soller and Genizi (1978), in which the offspring class with the heterozygous genotype at the marker locus was excluded. However, with a large number of sibships and small sibship size, the power of the intercrossing designs was lower in the present paper than in Soller and Genizi (1978). Several researchers (Hill, 1975; Soller and Genizi, 1978) have found that, for a given total experimental size, a design with fewer sibships but larger sibship size is more powerful than one with more sibships but smaller sibship size. This was confirmed by the present study. Moreover, it was found that the decline in power due to smaller sibship size was more severe for the intercrossing design than for the backcrossing design (tables II, III). The effect of the population structure on the power parallels its effect on the degrees of freedom of the residual expected mean square (table I).
For most animal species, realistic full sibship size is very small, eg 5 to 20, but half-sibship size might be very large. Weller et al (1990) investigated daughter and granddaughter designs and their powers for detecting marker-QTL linkage in dairy cattle populations, in which one sire might have several hundred daughters or granddaughters if breeding was by AI. By organising experiments on such species into a half-sibship population structure, one might expect more power, since the residual expected mean square would have more degrees of freedom. Comparison of the power of the 2 designs revealed that backcrossing was generally more powerful than intercrossing. This agrees with the conclusion of Soller and Genizi (1978).
The present study has not directly provided the total experimental size required for a given power under a particular genetic and design situation. The theoretical calculation in the present paper can, however, easily be used in the procedure suggested by Fox (1956) to obtain the size of experiment needed for a given power in a specific situation.