Use of sib-pair linkage methods for the estimation of the genetic variance at a quantitative trait locus


Summary - Until recently, the sib-pair linkage method of Haseman and Elston could only be used to detect linkage between a quantitative trait locus (QTL) and a marker locus. It was not possible to estimate the amount of genetic variance contributed by the QTL or its recombination fraction with the marker locus. With the advent of dense marker maps for nearly every domestic species, every QTL should be located between 2 flanking markers. In this situation, the Haseman-Elston test can be modified to estimate the variance of a putative QTL as well as its recombination fractions with the 2 flanking markers. In the present paper, we derive 2 estimation methods for the QTL variance based on the squared differences in the performance of full sibs: in one, only the QTL variance is estimated, while in the other both the QTL variance and the 2 recombination fractions are estimated. A simulation study assessing the power and the estimation quality of the 2 methods is presented. The method that estimates only the QTL variance turns out to be more powerful than the other. Both methods give estimates of the QTL variance close to the true values. The recombination fractions, however, are on the whole underestimated.
sib-pair linkage / quantitative trait locus / genetic marker / genetic variance / recombination fraction

INTRODUCTION
Haseman and Elston (1972) developed the idea of detecting linkage between a genetic marker and a quantitative trait locus (QTL) by examining the squared difference of the performance of full-sibs. Other studies (eg, Blackwelder and Elston, 1982) showed that the method is robust against a variety of distributions of the trait examined and that it can also make use of multivariate data. Götz and Ollivier (1992) found that in animal populations, especially in pigs, the power of the method is at least comparable to that of methods based on the analysis of variance. However, the Haseman-Elston method in its original form could only detect linkage between a marker and a QTL; it could not distinguish whether the linkage was due to a QTL with large effect at a large distance or to a QTL with small effect closely linked to the marker.
Studies for the establishment of a complete linkage map of the pig genome are under way (Anderson et al, 1993; Rohrer et al, 1994). As a result, in the near future every QTL of economic importance will lie in the vicinity of 2 flanking markers. This article shows how sib-pair linkage tests can be applied to estimate the variance caused by a QTL located between 2 flanking markers. A simulation study is presented to examine the power and properties of the method.
Haseman and Elston's test (1972) is based on the idea that the difference in the performance of full-sibs becomes smaller if the sibs share a larger proportion of alleles identical by descent (ibd) at a QTL with large effect. Elston (1990) gives a general description of the method, which is only briefly outlined here for the simplified case of a QTL with no dominance. The basic variable of the Haseman-Elston test is the squared difference ($Y_j$) between 2 sibs (1 and 2) within a family j:
$$Y_j = (x_{1j} - x_{2j})^2$$
where $x_{1j}$ and $x_{2j}$ are the performances of the 2 sibs.

Theory
Given the proportion of genes ibd at the QTL ($\pi_{jt}$), Elston (1990) shows that the expectation of $Y_j$ is:
$$E(Y_j \mid \pi_{jt}) = (\sigma_e^2 + 2\sigma_q^2) - 2\sigma_q^2\,\pi_{jt}$$
where $\sigma_q^2$ is the additive genetic variance due to the QTL and $\sigma_e^2$ the variance of the difference of all other genetic and environmental components. Since the proportion of genes ibd at the QTL cannot be observed, the proportion of genes ibd at the linked marker locus ($\pi_{jm}$) must be used to estimate $\pi_{jt}$. The expectation of $Y_j$ given $\pi_{jm}$ is:
$$E(Y_j \mid \pi_{jm}) = \sigma_e^2 + 2(1 - 2\theta + 2\theta^2)\,\sigma_q^2 - 2(1 - 2\theta)^2\sigma_q^2\,\pi_{jm}$$
where $\theta$ is the recombination frequency between QTL and marker locus. This is a general linear regression equation and can be written as:
$$E(Y_j \mid \pi_{jm}) = \alpha + \beta\,\pi_{jm}$$
The expectation of the regression coefficient is:
$$E(b) = -2(1 - 2\theta)^2\sigma_q^2 \qquad [2]$$
where b is an estimator of $\beta$. This expectation is zero if either $\sigma_q^2$ is zero or $\theta$ is equal to 0.5. Blackwelder and Elston (1982) showed that the distribution of the estimated regression coefficients is asymptotically normal. Thus, a simple one-sided t-test can be applied to test whether the regression coefficient is significantly negative. However, it can also be seen from the expectation of b that a significantly negative estimate can result either from a large $\theta$ together with a large QTL effect or from a small QTL effect and tight linkage.
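As a rough illustration of the regression just described, the following Python sketch fits the least-squares line of squared sib-pair differences on marker ibd proportions and evaluates the expected slope $-2(1 - 2\theta)^2\sigma_q^2$. The function names and the toy inputs are assumptions made for illustration only; they are not taken from the original analysis.

```python
import math


def haseman_elston_regression(pi_marker, sq_diff):
    """Regress squared sib-pair differences on marker ibd proportions.

    Returns (intercept, slope, t_statistic). The t statistic is None when
    there are too few pairs or no residual variation.
    """
    n = len(pi_marker)
    mean_pi = sum(pi_marker) / n
    mean_y = sum(sq_diff) / n
    sxx = sum((p - mean_pi) ** 2 for p in pi_marker)
    sxy = sum((p - mean_pi) * (y - mean_y) for p, y in zip(pi_marker, sq_diff))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_pi
    rss = sum((y - intercept - slope * p) ** 2 for p, y in zip(pi_marker, sq_diff))
    if n <= 2 or rss == 0:
        return intercept, slope, None
    std_error = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / std_error


def expected_slope(sigma_q2, theta):
    """Expected regression coefficient for QTL variance sigma_q2 at recombination theta."""
    return -2.0 * (1.0 - 2.0 * theta) ** 2 * sigma_q2


if __name__ == "__main__":
    # Toy data (hypothetical): ibd proportions and squared differences of 6 sib-pairs.
    pi = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
    y = [210.0, 130.0, 50.0, 190.0, 110.0, 70.0]
    print(haseman_elston_regression(pi, y))
    # A large QTL far from the marker and a small QTL at the marker give the same slope.
    print(expected_slope(320.0, 0.25), expected_slope(80.0, 0.0))
```

The last two calls show the confounding noted in the text: with a single marker, the slope alone cannot separate the size of the QTL effect from its distance to the marker.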
To estimate $\theta$ and $\sigma_q^2$, we suppose that there are 2 markers flanking the QTL. This assumption seems valid where a complete marker map exists. The number of parameters to be estimated increases to 3: 2 recombination frequencies, designated $\theta_1$ and $\theta_2$, and the QTL variance $\sigma_q^2$. The total recombination frequency between the 2 markers ($\theta_t$) can be supposed to be known from a mapping experiment or can be estimated directly from the data.

Method I. Estimation using 2 separate tests of linkage
Two different approaches can be taken to estimate $\sigma_q^2$ in the case of 2 markers. The first arises when separate tests of linkage for the 2 markers both lead to significant results. If the marker loci are known to be linked, all 3 parameters can be estimated using the expectations of the 2 regression coefficients ($b_1$ and $b_2$):
$$E(b_1) = -2(1 - 2\theta_1)^2\sigma_q^2 \qquad [3]$$
$$E(b_2) = -2(1 - 2\theta_2)^2\sigma_q^2 \qquad [4]$$
As a side condition, the relationship between the 2 recombination frequencies and $\theta_t$ is needed. This relationship can be assumed to be known if assumptions about the mode of interference are made. Throughout the rest of the paper we will assume no interference between $\theta_1$ and $\theta_2$, so that:
$$\theta_t = \theta_1 + \theta_2 - 2\theta_1\theta_2 \qquad [5]$$
Since $\theta_2$ can be inferred from $\theta_1$ and $\theta_t$ via equation [5], solutions for the 3 unknowns can be found. However, because the 2 regression coefficients can theoretically take any value between plus and minus infinity, there is not always a solution in the range of real numbers.
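One possible way to carry out the Method I calculation is sketched below in Python. It assumes that the expectations of the 2 regression coefficients are $-2(1 - 2\theta_i)^2\sigma_q^2$ and that, with no interference, $(1 - 2\theta_t) = (1 - 2\theta_1)(1 - 2\theta_2)$. The function name and the rule of returning `None` when no admissible solution exists are illustrative choices, not the authors' implementation.

```python
import math


def method1_estimates(b1, b2, theta_t):
    """Solve for (sigma_q2, theta1, theta2) from 2 regression coefficients.

    Returns None when no real solution with 0 <= theta <= 0.5 exists,
    for example when either coefficient is not negative.
    """
    if b1 >= 0 or b2 >= 0 or not 0 <= theta_t < 0.5:
        return None
    c = 1.0 - 2.0 * theta_t                 # equals (1 - 2*theta1) * (1 - 2*theta2)
    sigma_q2 = math.sqrt(b1 * b2) / (2.0 * c)
    ratio = math.sqrt(b1 / b2)              # equals (1 - 2*theta1) / (1 - 2*theta2)
    a1 = math.sqrt(c * ratio)               # 1 - 2*theta1
    a2 = math.sqrt(c / ratio)               # 1 - 2*theta2
    if a1 > 1.0 or a2 > 1.0:
        return None                         # would imply a negative recombination fraction
    return sigma_q2, (1.0 - a1) / 2.0, (1.0 - a2) / 2.0


if __name__ == "__main__":
    # Expected coefficients for sigma_q2 = 80, theta1 = 0.02, theta2 = 0.14.
    b1 = -2.0 * (1.0 - 2.0 * 0.02) ** 2 * 80.0
    b2 = -2.0 * (1.0 - 2.0 * 0.14) ** 2 * 80.0
    theta_t = 0.02 + 0.14 - 2.0 * 0.02 * 0.14
    print(method1_estimates(b1, b2, theta_t))
    print(method1_estimates(5.0, -80.0, theta_t))   # positive coefficient: no solution
```

Feeding the function the exact expected coefficients recovers the parameters used to generate them, whereas a positive coefficient leaves the square roots undefined, which is the non-estimable case mentioned in the text.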

Method II. Estimation using the combined information of 2 markers
The second approach starts from the fact that, from equation [2], the estimator of the regression coefficient divided by $-2$ is already a biased estimator of $\sigma_q^2$:
$$E(-b/2) = (1 - 2\theta)^2\sigma_q^2$$
In the single marker case, the bias increases rapidly even for small recombination frequencies, rendering the estimator practically useless. If, however, the data are restricted to sib-pairs with the same proportion of genes ibd at both marker loci ($\pi_{jm1} = \pi_{jm2}$), then in the majority of cases the proportion of genes ibd at the QTL ($\pi_{jt}$) is equal to the proportion of genes ibd at the 2 marker loci. This will not be the case in 2 rare situations: i) a double recombination in which the 2 recombinations take place on either side of the QTL; and ii) 2 separate recombination events in 2 sibs that take place on different sides of the QTL. Consequently, the proportion of alleles ibd at both marker loci is a reliable estimator of $\pi_{jt}$. The price to be paid for this is that the proportion of usable sib-pairs is reduced by a factor that can be expressed as:
$$P(\pi_{jm1} = \pi_{jm2}) = \Psi_t^2 + \tfrac{1}{2}(1 - \Psi_t)^2$$
where $\Psi_t = (1 - 2\theta_t + 2\theta_t^2)$. In the case of a 20 cM marker map and informative matings, 55% of the sib-pairs would be selected, and if the markers were at distances of 4 cM that fraction would increase to 86%. The expectation of $Y_j$ given a certain proportion (x) of alleles ibd at the marker loci is:
$$E(Y_j \mid \pi_{jm1} = \pi_{jm2} = x) = \sigma_e^2 + 2\sigma_q^2\,[1 - E(\pi_{jt} \mid x)]$$
which can again be written as a linear function of $\pi_{jm1}$:
$$E(Y_j \mid \pi_{jm1} = \pi_{jm2}) = \sigma_e^2 + (1 + k)\,\sigma_q^2 - 2k\,\sigma_q^2\,\pi_{jm1}, \qquad k = \frac{\Psi_1 + \Psi_2 - 1}{\Psi_t}$$
where $\Psi_1 = (1 - 2\theta_1 + 2\theta_1^2)$ and $\Psi_2 = (1 - 2\theta_2 + 2\theta_2^2)$.
From this it follows that the expectation of the regression coefficient ($b_0$) is:
$$E(b_0) = -2k\,\sigma_q^2$$
Again, $\hat\sigma_q^2 = -b_0/2$ is a biased estimator of the QTL variance. Whether the bias is acceptable depends on the size of the biasing factor k in the range of realistic values for $\theta_t$. Figure 1 shows the value of k as a function of $\theta_1$ for 4 different values of $\theta_t$.
The maximum bias always occurs if $\theta_1 = \theta_2$. The maximum is not located at $\theta_1 = \theta_t/2$ because, with no interference, the 2 recombination rates do not act additively. For large values of $\theta_t$, k can take values down to 0.93, while for smaller values the bias is negligible. Figure 2 shows the range of possible values and the expectation of k depending on $\theta_t$. It can be seen that the expectation of k results in a bias of less than 5% over the whole range considered.
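The quantities discussed in this section can be explored numerically. The Python sketch below assumes fully informative matings and no interference, and uses the expressions $\Psi^2 + \frac{1}{2}(1 - \Psi)^2$ for the share of usable sib-pairs and $k = (\Psi_1 + \Psi_2 - 1)/\Psi_t$ for the biasing factor; these forms follow from the ibd argument above but are stated here as working assumptions, and all function names are hypothetical.

```python
def psi(theta):
    """Probability that the ibd status transmitted by one parent is the same at 2 loci."""
    return 1.0 - 2.0 * theta + 2.0 * theta ** 2


def total_theta(theta1, theta2):
    """Recombination fraction between the 2 markers, assuming no interference."""
    return theta1 + theta2 - 2.0 * theta1 * theta2


def usable_fraction(theta_t):
    """Share of sib-pairs with equal ibd proportions at both markers."""
    p = psi(theta_t)
    # Either neither parent's ibd status changes, or both change in opposite directions.
    return p ** 2 + 0.5 * (1.0 - p) ** 2


def bias_factor_k(theta1, theta2):
    """Factor k by which -b0/2 is expected to fall short of the QTL variance."""
    psi_t = psi(total_theta(theta1, theta2))
    return (psi(theta1) + psi(theta2) - 1.0) / psi_t


if __name__ == "__main__":
    for t1, t2 in [(0.02, 0.02), (0.02, 0.14), (0.08, 0.08)]:
        tt = total_theta(t1, t2)
        print(t1, t2, round(usable_fraction(tt), 3), round(bias_factor_k(t1, t2), 3))
```

Under these assumptions the QTL lying exactly on one marker gives k = 1 (no bias), and a QTL midway between markers about 0.15 apart gives k slightly below 0.94, in line with the range quoted in the text.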
The expectation of k for a given $\theta_t$ can be easily calculated. Since k is always between 0 and 1, a second estimator for the QTL variance can be derived by dividing the initial estimator by the expected value of k:
$$\hat\sigma_q^{2*} = \frac{\hat\sigma_q^2}{E(k)} = \frac{-b_0}{2E(k)}$$
where $E(k)$ is the expectation of k for the given $\theta_t$ (figure 2).

Simulation
A simulation study was conducted in order to examine the power of the 2 methods and the goodness of estimation. Data were simulated according to the following model:
$$x_{ij} = \mu + q_{ij} + bv_{sj} + bv_{dj} + m_{ij} + ce_j + e_{ij}$$
where:
$x_{ij}$ = phenotypic value of animal i in family j
$\mu$ = overall mean
$q_{ij}$ = effect of the QTL genotype of animal i
$bv_{sj}$ = sire's contribution to polygenic breeding value (without QTL genotype)
$bv_{dj}$ = dam's contribution to polygenic breeding value (without QTL genotype)
$m_{ij}$ = Mendelian sampling effect
$ce_j$ = effect of common litter environment
$e_{ij}$ = residual error
For the constant parameters in the simulation, the following values were used: the total phenotypic variance was set to 1000, the heritability of the trait was 0.3 (including the QTL effect) and the common environmental variance was 0.2 of the phenotypic variance. The population structure simulated was that of a typical pig-breeding situation with 25 sires, 10 dams per sire and 8 progeny per sire-dam pair. Thus, a total of 2 000 progeny were simulated in each replication. For a discussion of the effects of the mating structure and common environment, see Götz and Ollivier (1992). Götz and Ollivier (1992) found that the use of fully informative matings can increase the power of the Haseman-Elston test for a given number of genotypings.
Consequently, only these matings were used in the calculations. This has no consequence for the validity of the results, but it should be borne in mind that the number of genotyped individuals in practice would be slightly higher than 2 275 (2 000 progeny, 25 sires and 250 dams). Within any family, all possible differences between full-sibs were used for the calculation of the $Y_j$ values, as proposed by Blackwelder and Elston (1982). This resulted in 28 comparisons per family and 7 000 comparisons per round of simulation.
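The design figures quoted above can be checked with a few lines of Python. The variance partition in `variance_components` is an assumption added for illustration: it treats the common environmental value of 0.2 as a proportion of the phenotypic variance and splits the polygenic variance into one quarter each for the sire and dam contributions and one half for Mendelian sampling, as is usual for non-inbred parents. The names are hypothetical.

```python
from math import comb

SIRES, DAMS_PER_SIRE, PROGENY_PER_FAMILY = 25, 10, 8

families = SIRES * DAMS_PER_SIRE                    # full-sib families
progeny = families * PROGENY_PER_FAMILY             # progeny per replication
pairs_per_family = comb(PROGENY_PER_FAMILY, 2)      # all sib-pair comparisons in a family
total_pairs = families * pairs_per_family           # comparisons per replication
typed_minimum = progeny + SIRES + families          # progeny plus all sires and dams

PHENOTYPIC_VAR = 1000.0
H2 = 0.3    # heritability, including the QTL
C2 = 0.2    # common environment, assumed to be a proportion of phenotypic variance


def variance_components(qtl_var):
    """Split the phenotypic variance into the components of the simulation model."""
    genetic = H2 * PHENOTYPIC_VAR
    polygenic = genetic - qtl_var
    common_env = C2 * PHENOTYPIC_VAR
    return {
        "qtl": qtl_var,
        "sire_contribution": polygenic / 4.0,
        "dam_contribution": polygenic / 4.0,
        "mendelian_sampling": polygenic / 2.0,
        "common_environment": common_env,
        "residual": PHENOTYPIC_VAR - genetic - common_env,
    }


if __name__ == "__main__":
    print(progeny, pairs_per_family, total_pairs, typed_minimum)
    for qtl_var in (40.0, 80.0, 120.0):
        print(variance_components(qtl_var))
```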
Variable parameters in the simulation were:
(i) the distance between the 2 markers ($\theta_t$);
(ii) the position of the QTL between the 2 markers as expressed by $\theta_1$ and $\theta_2$;
(iii) the size of the QTL effect.
The distance between the 2 markers was varied approximately between 0.04 and 0.154, assuming no interference. The combinations of $\theta_1$ and $\theta_2$ that were simulated are given in table I. Two codominant alleles with equal frequencies were assumed at the QTL. For both marker loci 10 alleles with equal frequencies were simulated, and for the QTL effect genetic variances of 40, 80 and 120 were assumed. This resulted in a total of 30 different variants, each of them being simulated with 1 000 replications.

Analysis of simulation results
The power of the methods was defined as the percentage of replications where the null hypothesis was rejected at the 5% level. For Method II this approach is unambiguous, while this is not the case for Method I. For the first method there are 2 null hypotheses, of which 1 or both can be rejected at $\alpha = 0.05$. Since both tests rely on the same values for $Y_j$, they are not independent, so that the nominal type I errors for a global error of 5% can only be determined by simulation under the null hypothesis. However, these type I errors still depend on the 2 recombination frequencies, so that the true state of nature must be known for an exact determination. Therefore, it was decided that a replication was significant for Method I if both null hypotheses were rejected at the 5% level. For the interpretation of the results it should be borne in mind that this criterion puts Method I at a slight disadvantage.
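The 2 significance criteria can be written down compactly. In the Python sketch below, power is the share of replicates declared significant; for Method I both marker tests must reject, for Method II a single test is used. The per-replicate p-values are assumed to come from the one-sided t-tests and are passed in as plain lists; the function names are illustrative.

```python
def power_method_1(p_marker1, p_marker2, alpha=0.05):
    """Share of replicates in which both single-marker tests reject at level alpha."""
    hits = sum(1 for p1, p2 in zip(p_marker1, p_marker2) if p1 < alpha and p2 < alpha)
    return hits / len(p_marker1)


def power_method_2(p_values, alpha=0.05):
    """Share of replicates in which the single combined test rejects at level alpha."""
    return sum(1 for p in p_values if p < alpha) / len(p_values)


if __name__ == "__main__":
    # Hypothetical p-values from 4 replicates.
    print(power_method_1([0.01, 0.20, 0.03, 0.04], [0.02, 0.01, 0.20, 0.04]))
    print(power_method_2([0.01, 0.20, 0.03, 0.04]))
```

Requiring both tests to reject makes the joint criterion for Method I stricter than a single 5% test, which is the disadvantage referred to in the text.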
For the estimation with Method II, only sib-pairs with the same proportion of alleles ibd at both marker loci were selected from the same data that were used for the estimation with Method I. As was explained previously, this results in a reduction of the number of effective sib-pairs of between 15% (for $\theta_1/\theta_2$ = 0.02/0.02) and 42% (for $\theta_1/\theta_2$ = 0.02/0.14).
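A minimal Python sketch of the Method II selection and estimation step follows. It assumes each sib-pair is stored as a tuple `(pi_m1, pi_m2, y)` holding the ibd proportions at the 2 markers and the squared difference; this layout and the function names are hypothetical, and the estimate returned is the uncorrected $-b_0/2$.

```python
def select_concordant_pairs(pairs, tol=1e-9):
    """Keep sib-pairs whose ibd proportions are equal at both marker loci."""
    return [(pi1, y) for pi1, pi2, y in pairs if abs(pi1 - pi2) < tol]


def method2_estimate(pairs):
    """Estimate the QTL variance as -b0/2 from the selected sib-pairs."""
    selected = select_concordant_pairs(pairs)
    n = len(selected)
    mean_pi = sum(pi for pi, _ in selected) / n
    mean_y = sum(y for _, y in selected) / n
    sxx = sum((pi - mean_pi) ** 2 for pi, _ in selected)
    sxy = sum((pi - mean_pi) * (y - mean_y) for pi, y in selected)
    b0 = sxy / sxx                       # regression coefficient on the marker ibd proportion
    return -b0 / 2.0


if __name__ == "__main__":
    toy_pairs = [
        (0.0, 0.0, 200.0),
        (0.5, 0.5, 120.0),
        (1.0, 1.0, 40.0),
        (0.5, 1.0, 999.0),   # discordant pair, dropped by the selection
        (0.0, 0.5, 5.0),     # discordant pair, dropped by the selection
    ]
    print(len(select_concordant_pairs(toy_pairs)), method2_estimate(toy_pairs))
```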
To assess the goodness of the estimation, all replications of a given variant (significant and non-significant) must be averaged. As can be seen from equations [3] and [4], the first method requires the square root of the ratio of $b_1$ and $b_2$ for the estimation of $\sigma_q^2$. In practical applications this is not likely to cause problems, since significant regression coefficients always have a negative sign. In a simulation with a low value for the QTL effect, however, it does cause problems, because the estimated regression coefficients are normally distributed and a certain fraction of them can be expected to be positive. For these replicates a value for $\sigma_q^2$ cannot be estimated. Because positive regression coefficients are all non-significant, the replicates that drop out are those with the weakest apparent QTL effect, and the missing QTL variances cause an overestimation of this parameter.

Power of the 2 methods
Table II shows the power of the 2 methods of estimation for all simulated variants. For a QTL effect of 40, the power is low for both methods and all variants. However, it can be observed that Method II has higher power in all variants and that the decrease in power with increasing $\theta_t$ is less for the second method. For a QTL effect of 80, the superiority of the second method is evident. The superiority is more pronounced if the values of $\theta_1$ and $\theta_2$ are unequal, which is caused by an increasing proportion of replicates where only one of the 2 tests in Method I gives a significant result. If the QTL effect is 120, both methods have high power, with differences occurring only if the 2 recombination rates are of very different size.
Estimation of $\theta_1$, $\theta_2$ and $\sigma_q^2$ using Method I
The average estimated values for $\sigma_q^2$ are given in table III for the 2 methods. For low values of $\sigma_q^2$ an overestimation occurs, which is caused by the fact that a certain number of replications could not be calculated for the reasons mentioned above. This could be as much as 24% of the replicates. With $\sigma_q^2$ equal to 80 and 120, the percentage of replicates without result decreased to 10 and 3%, respectively. In accordance with these numbers, the overestimation is less with increasing QTL variance and, within a given QTL variance, with decreasing recombination fractions. For a QTL variance of 120 some slight underestimations occur with higher recombination fractions. In accordance with the fact that most of the dropouts occur if the 2 recombination fractions are of very different sizes, the worst estimates are obtained if the QTL is located close to 1 of the 2 flanking markers.
The estimated values for the recombination fractions equally suffer from the problem of replications without solution. In contrast to the estimation of QTL variance, this leads to an underestimation of recombination fractions for small values of $\sigma_q^2$. Table IV shows that for a QTL variance of 40 the estimators are heavily biased downwards. This improves with increasing values for $\sigma_q^2$. A remarkable decrease of the standard deviation of the estimates can also be observed. However, none of the estimates are very precise, mainly due to the low expected numbers of double recombinants within the 2 000 progeny.
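The upward bias caused by replicates without a solution can be reproduced in a deliberately simplified setting. The Python sketch below draws a single regression coefficient per replicate from a normal distribution centred on $-2\sigma_q^2$ (a QTL at the marker), discards positive coefficients, and compares the mean of $-b/2$ with and without the discarded replicates. The single-coefficient setting, the chosen standard deviation and the function name are assumptions for illustration; Method I itself combines 2 coefficients.

```python
import random


def mean_estimates(true_sigma_q2, sd_b, n_rep, seed=1):
    """Return (mean over all replicates, mean over estimable replicates, share dropped)."""
    rng = random.Random(seed)
    coefficients = [rng.gauss(-2.0 * true_sigma_q2, sd_b) for _ in range(n_rep)]
    all_estimates = [-b / 2.0 for b in coefficients]
    kept = [-b / 2.0 for b in coefficients if b < 0]   # positive b: no estimate possible
    dropped_share = 1.0 - len(kept) / n_rep
    return sum(all_estimates) / n_rep, sum(kept) / len(kept), dropped_share


if __name__ == "__main__":
    for sigma_q2 in (40.0, 80.0, 120.0):
        print(sigma_q2, mean_estimates(sigma_q2, sd_b=100.0, n_rep=1000))
```

With the sampling error held fixed, the share of discarded replicates and the resulting inflation of the mean both shrink as the true QTL variance grows, which mirrors the pattern described for table III.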

Estimation of $\sigma_q^2$ using Method II
The estimates for $\sigma_q^2$ using the second method are also presented in table III. From theory, Method II is expected to underestimate the true value of the parameter. This expectation is confirmed by the results, with a single exception.
Especially for a QTL variance of 40, the estimates are clearly superior to those of Method I. For larger values of $\theta_t$ the underestimation gets larger but stays within the range that can be explained by the decreasing value of k.
The results for the second estimator ($\hat\sigma_q^{2*}$) are also given in table III as Method IIc. On average, this correction reduces the underestimation of $\sigma_q^2$ from about 3% to less than 1%.
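The correction by $E(k)$ can be sketched numerically, although the exact definition of the expectation is not reproduced here. The Python code below assumes that the QTL position is uniformly distributed in map distance between the 2 markers, uses Haldane's mapping function (no interference), and takes $k = (\Psi_1 + \Psi_2 - 1)/\Psi_t$; all of these, and the function names, are assumptions made for illustration and may differ from the calculation behind figure 2.

```python
import math


def psi(theta):
    return 1.0 - 2.0 * theta + 2.0 * theta ** 2


def k_factor(theta1, theta2):
    """Biasing factor for a QTL at recombination fractions theta1, theta2 from the markers."""
    theta_t = theta1 + theta2 - 2.0 * theta1 * theta2
    return (psi(theta1) + psi(theta2) - 1.0) / psi(theta_t)


def expected_k(theta_t, n_grid=1000):
    """Average k over QTL positions spread evenly in map distance (assumed prior)."""
    d_total = -0.5 * math.log(1.0 - 2.0 * theta_t)      # Haldane distance in Morgans
    total = 0.0
    for i in range(n_grid):
        d1 = (i + 0.5) / n_grid * d_total               # midpoint of the i-th sub-interval
        theta1 = 0.5 * (1.0 - math.exp(-2.0 * d1))
        theta2 = 0.5 * (1.0 - math.exp(-2.0 * (d_total - d1)))
        total += k_factor(theta1, theta2)
    return total / n_grid


def corrected_estimate(b0, theta_t):
    """Second estimator: the initial estimate -b0/2 divided by E(k)."""
    return -b0 / (2.0 * expected_k(theta_t))


if __name__ == "__main__":
    for theta_t in (0.04, 0.08, 0.154):
        print(theta_t, round(expected_k(theta_t), 4))
    print(corrected_estimate(-150.0, 0.154))
```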

DISCUSSION
The present study has shown that with 2 flanking markers for a given QTL, the principle of sib-pair linkage methods can be applied to obtain estimates of the QTL variance and to locate the QTL in the interval. The simulation study made use of the results of Götz and Ollivier (1992), who showed that in animal breeding the preselection of fully informative matings is an appropriate way to improve the power of QTL detection for a given number of genotypings. If many markers are to be examined, the parents will only be informative for a fraction of the markers. Since the major costs in the given design arise from the typing of progeny, these should only be typed for the markers where their parents are informative. With Method II it should also be possible to use multiple markers as proposed by Haley et al (1994). However, without modification this only seems feasible if the linkage phases in the parents are known, recombinant sibs are excluded from the analysis and the markers are not so far apart that double recombinants become important.
The results on power for the detection of a given QTL suggest that a QTL contributing 8% of the phenotypic variance can be detected with a power between 55 and 75% for the given design. In comparison with the results of Götz and Ollivier (1992) it must be taken into account that in the present study the QTL effect was included in the total genetic variance. In a comparable situation the power of both methods presented here is less than in Götz and Ollivier (1992). The reasons are that the type I error for Method I is not comparable to the 5% level in the previous study and that Method II uses fewer sib-pairs. The power of Method II is superior to that of Method I in all of the simulated variants. The reason is evident, since Method I does not use the prior information that the 2 marker loci are linked with a known recombination fraction for the test of linkage. This information is only used in the estimation step, given that linkage of both loci to the QTL was detected. Future research should be directed towards a way of incorporating the information of linkage between the markers in the detection of linkage with a QTL.
One way to do this has recently been presented by Fulker and Cardon (1994). They used the information of 2 markers to estimate $\pi_t$ and regressed $Y_j$ on this estimated value. Since the estimation only works if $\theta_1$ and $\theta_2$ are known, they use an approach similar to interval mapping (Lander and Botstein, 1989) to plot the t-statistics against the putative QTL position. The authors also encountered problems in trying to determine the correct t-value for a certain type I error rate, even for the single interval case, which they solved by simulation under the null hypothesis.
However, this problem will occur in every realistic scenario, since usually many markers will be examined at a time. In this situation, none of the test statistics has a simple distribution and one would always have to resort to simulation studies in order to examine the distribution of the test statistic.
A comparison of the results of Fulker and Cardon (1994) at $h^2 = 0.125$ with our results (Method II) for a QTL variance of 120 shows little difference. This indicates that sibs with different percentages of alleles ibd at the 2 marker loci contribute little or nothing to the estimation of $\pi_t$.
In the estimation of the QTL variance, Method I is characterized by the overestimation of $\sigma_q^2$ if the true value is small. In practice, this is not likely to be a problem, since significant regression coefficients are always negative, but it makes it difficult to prove the unbiasedness of the estimator. However, for a QTL variance of 120 no overestimations occurred and underestimations were in all cases less than 3%. Method II is a priori a biased estimator. The results show that the bias is small if the QTL is located close to one of the markers and that the estimation of $\sigma_q^2$ can be improved by dividing the initial estimator by E(k). The maximum bias can also be quantified if the recombination fraction between the 2 markers is known. However, since Method I gives similar estimates and information about the location of the QTL, one could use Method II to detect simultaneous linkage of 2 markers with a QTL and then use Method I for the estimation. Unfortunately, Fulker and Cardon (1994) give little information about the quality of their estimator of the QTL variance. The only result they give indicates that their algorithm leads to an overestimation of $\sigma_q^2$.
The estimation of recombination frequencies between the QTL and the markers leads to unsatisfactory results. The majority of recombination fractions were underestimated, although for higher QTL effects the estimators came close to the true values. Non-estimable replicates certainly influenced these results as well. The same observation can be made from the results of Fulker and Cardon (1994), which show relatively flat curves in the vicinity of the true QTL location and a tendency to place the QTL in the middle of the interval. In comparison, our Method I tends to locate the QTL closer to the marker with the smaller recombination fraction.
Knott and Haley (1992) examined the application of maximum likelihood (ML) in outbreeding populations with a full-sib structure. The advantage of ML is the fact that it is possible to estimate the gene effects at the QTL as well as the gene frequencies. However, the computational effort is much higher for ML than for the approach in the present paper. The authors conclude that for the treatment of realistic population structures and the inclusion of fixed effects, numerical approximations are needed to render practical data tractable by ML.
A regression approach has also been proposed for crosses between inbred lines. This design is not tractable with our methods because of the complete linkage disequilibrium in an $F_2$ derived from inbred lines. However, those authors found that there 'seemed little advantage to be gained from resort to maximum likelihood methods for the analysis of these types of data'. The method is similar to that of Fulker and Cardon (1994), since both methods are based on the idea of interval mapping (Lander and Botstein, 1989).

CONCLUSIONS
The extension of Haseman and Elston's (1972) method of sib-pair linkage presented here allows for the estimation of QTL variance and recombination fractions if a relatively dense and informative marker map is available. Since the method uses only intra-family comparisons, it does not need to take fixed effects into account so long as they affect all sibs in a family in the same way.
However, the method has some limitations that should be mentioned here.
The first is that the results presented rely heavily on the availability of highly polymorphic markers. If one or both of the markers are not very polymorphic, the number of parents to be typed increases dramatically. For a discussion of this topic see Götz and Ollivier (1992). The second limitation is the dependency of sib-pair linkage tests on the magnitude of the residual variance. For traits with low heritability the power of the method is low, while high heritability and common environmental effects are favourable. In addition, large family sizes increase the power of the method (Götz and Ollivier, 1992).
In outbreeding populations the detection of segregating QTLs is generally more difficult than in crosses between inbred lines. However, the detected QTLs are known to be segregating, while with QTLs detected in crossing experiments a large fraction of favourable QTLs will already be fixed in the superior line. It is doubtful whether preselection of fully informative matings will still work if many markers are to be examined, since parents will be informative only for a fraction of markers. Nevertheless, it would be possible to type the progeny only for those markers where their parents are informative.

ACKNOWLEDGMENTS
This work was supported by a grant from the 'Deutsche Forschungsgemeinschaft', which is gratefully acknowledged.