Theoretical aspects of applying sib-pair linkage tests to livestock species


1 Institut für Tierzucht und Haustiergenetik der Universität Göttingen, Albrecht-Thaer Weg 1, D-3400 Göttingen, Germany; 2 Institut National de la Recherche Agronomique, Station de Génétique Quantitative et Appliquée, 78352 Jouy-en-Josas Cedex, France
(Received 26 March 1991; accepted 12 December 1991)

Summary - The Haseman-Elston (HE) sib-pair linkage test in its original form is computationally simple but suffers from low power. With the advent of highly polymorphic markers, the exclusive use of fully informative matings (ie matings where the number of genes identical by descent for any sib pair can be inferred without error) for the HE test becomes feasible. This article examines the influence of highly polymorphic marker systems (5 alleles), large family sizes (6 full-sibs) and hierarchical breeding structures (mating ratio of 25) on the power of the HE test by means of simulation studies. Simulations are performed under the assumption that the costs of marker genotyping are a limiting factor for marker-QTL linkage studies. Consequently, the total number of individuals (parents and offspring) typed is fixed at 5 000 in each of the situations compared. The results show that the power of the HE test is considerably increased when both highly polymorphic markers and large full-sib families are available. For example, for a locus explaining 8% of the phenotypic variance the power of the test increases from 14 to 74% if the locus has 5 alleles instead of 2 and sibship size is 6 instead of 2. Hierarchical breeding structures tend to further increase the power of the test, for the example given from 74 to 79%.

INTRODUCTION
The sib-pair linkage method of Haseman and Elston (1972) is a tool for the detection of linkage between markers and quantitative trait loci (QTLs). The major advantage of the Haseman-Elston method is its computational ease, allowing fast screening of a large number of marker loci and traits. Further, the method is robust to a large variety of continuous distributions of the quantitative trait (Blackwelder and Elston, 1982). However, in its original form the method suffers from low power (Robertson, 1973), except when the effect of the QTL is very high and linkage between marker and QTL is tight (Blackwelder and Elston, 1982).
In typical animal breeding situations, the possibilities for the estimation of effects are often more advantageous than in human populations. Usually one has larger families and complete pedigree information, and markers are available for parents and offspring. The advent of new, highly polymorphic markers such as minisatellites (Jeffreys et al, 1985) and microsatellites (Weber and May, 1988; Soller and Beckmann, 1990) increases the probability of informative matings and hence the probability of detecting a given linkage relationship.
The objective of this paper is to examine the power of the Haseman-Elston test in animal breeding situations under the assumption of the availability of highly polymorphic marker systems.

The Haseman-Elston test
The linkage test of Haseman and Elston (1972) is based on the idea that the greater the number of alleles a pair of full-sibs shares identical by descent (ibd) at a marker locus which is linked to a QTL, the smaller the difference in the values of the quantitative trait which is affected by the QTL. A generalized description of the method is given by Elston (1990). The number of genes ibd at the trait locus for 2 full-sibs can be 0, 1 or 2, but inference on this number depends on the parents' and sibs' genotypes at the marker locus. The proportion of genes ibd at the marker locus for sib pair $j$ ($\pi_{jm}$) is estimated from the parents' and offspring's genotypes, and the regression of the squared difference of the sibs' phenotypic values on the estimate $\hat{\pi}_{jm}$ is calculated. If linkage between the marker and a QTL exists, this regression is expected to be negative. Haseman and Elston (1972) assume random mating with respect to the marker locus, linkage equilibrium and no effect of the marker on the trait locus. The phenotypic value of the $i$th sib of the $j$th sib pair is assumed to be of the form:

$$x_{ij} = \mu + g_{ij} + e_{ij} \qquad [1]$$

where $\mu$ is the overall mean, $g_{ij}$ is the genotypic value at the trait locus and $e_{ij}$ is the residual effect including genetic effects due to all loci other than the linked QTL. It is assumed that $e_j = e_{1j} - e_{2j}$ is a random variable with zero mean and variance $\sigma_e^2$.
In a random mating population with a trait locus showing a given additive genetic variance ($\sigma_a^2$) and no dominance, the expectation of the squared difference of the 2 sibs' phenotypic values ($Y_j = (x_{1j} - x_{2j})^2$) given the proportion of genes ibd at the trait locus ($\pi_{jt}$) is shown by Elston (1990) to be:

$$E(Y_j \mid \pi_{jt}) = \sigma_e^2 + 2\sigma_a^2 - 2\sigma_a^2\,\pi_{jt} \qquad [2]$$

In practice $\pi_{jt}$ is not known but has to be estimated by the proportion of genes ibd at the marker locus ($\pi_{jm}$). Elston (1990) shows the expectation of $Y_j$ to be:

$$E(Y_j \mid \pi_{jm}) = \sigma_e^2 + 2(1 - 2\theta + 2\theta^2)\,\sigma_a^2 - 2(1 - 2\theta)^2\sigma_a^2\,\pi_{jm} \qquad [3]$$

if there are only 2 alleles at the trait locus, where $\theta$ is the recombination frequency between the marker locus and the QTL. This expectation can be written in the form:

$$E(Y_j \mid \pi_{jm}) = \alpha + \beta\,\pi_{jm} \qquad [4]$$

with $\alpha = \sigma_e^2 + 2(1 - 2\theta + 2\theta^2)\,\sigma_a^2$ and $\beta = -2(1 - 2\theta)^2\sigma_a^2$. (1989) showed that the estimator of $\beta$ is unbiased even if there is dominance at the trait locus, provided that information on the marker genotype of the parents is available.
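The regression described above can be sketched in a few lines. This is a minimal illustration and not the authors' program: the function name `haseman_elston` is hypothetical, and the critical value is taken from the normal approximation to the t distribution, which is an assumption that is only reasonable when the number of sib pairs is large.

```python
import math
from statistics import NormalDist


def haseman_elston(sq_diffs, pi_m, alpha=0.05):
    """Regress squared sib-pair differences on the marker ibd proportion.

    sq_diffs : squared phenotypic differences, one per sib pair
    pi_m     : proportion of genes ibd at the marker (0, 0.5 or 1 when known)
    Returns (slope, t_statistic, reject) for the one-sided test of slope < 0.
    """
    n = len(sq_diffs)
    mean_x = sum(pi_m) / n
    mean_y = sum(sq_diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in pi_m)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pi_m, sq_diffs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(pi_m, sq_diffs))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = slope / se
    # Assumption: normal approximation to the lower-tail t critical value
    critical = NormalDist().inv_cdf(alpha)
    return slope, t, t < critical


# Toy data lying close to the line Y = 3 - 2 * pi
pi_toy = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
y_toy = [3.1, 2.9, 2.1, 1.9, 1.1, 0.9]
slope, t, reject = haseman_elston(y_toy, pi_toy)
```

A significantly negative slope is taken as evidence of linkage; the slope itself estimates $-2(1-2\theta)^2\sigma_a^2$, so the QTL variance and the recombination frequency are not separately identifiable from this regression alone.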
Effect of marker polymorphism
Elston (1990) gives formulae for the estimation of $\pi_{jm}$ from the parents' and sibs' genotypes at the marker locus. Since this study deals mainly with highly polymorphic markers, it is assumed here that a large number of informative matings is available. In animal populations there are usually large full-sib families, which makes it worthwhile to type the parents first and, according to these results, only the offspring of fully informative matings. A fully informative mating in this sense is a mating where both parents are heterozygous and together carry at least 3 different alleles at the codominant marker locus, which corresponds to mating types VI and VII in table II of Haseman and Elston (1972). Thus, the number of genes ibd for a sib pair can be inferred without error.
The frequency of these matings in a given population depends on the number of alleles at the marker locus and their frequencies. In the general case of $n$ alleles and unequal gene frequencies ($p_i$), the expected proportion of fully informative matings under random mating (PFIM) can be written as:

$$\mathrm{PFIM} = \Big(1 - \sum_{i=1}^{n} p_i^2\Big)^2 - 4\sum_{i<j} p_i^2 p_j^2 \qquad [5]$$

that is, the probability that both parents are heterozygous minus the probability that they are heterozygous for the same pair of alleles. Taking into account the number of animals usually tested in mapping experiments, it is unlikely for a system to be declared highly polymorphic if one or two of the alleles show extreme frequencies. Therefore, the case of equal allelic frequencies ($p = 1/n$) is considered here and the proportion of informative matings is then given by:

$$\mathrm{PFIM} = \frac{(n-1)(n^2 - n - 2)}{n^3} \qquad [6]$$

Table I gives the expected PFIMs for various values of $n$. This proportion increases with increasing number of alleles. For loci with 9 alleles or more, fewer than 25% of the matings have to be rejected. For more than 15 alleles, the further increases in the proportion of fully informative matings are only small.
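The proportion of fully informative matings can be computed directly from the definition given above (both parents heterozygous, and not heterozygous for the same pair of alleles). The sketch below is an illustration under that definition and random mating; the function names `pfim` and `pfim_equal` are hypothetical and not taken from the original study.

```python
from itertools import combinations


def pfim(freqs):
    """Expected proportion of fully informative matings under random mating.

    freqs : allele frequencies at the codominant marker (should sum to 1).
    """
    # Probability that a random parent is heterozygous
    het = 1.0 - sum(p * p for p in freqs)
    # Probability that both parents are heterozygous for the same allele pair
    same_pair = sum((2.0 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return het * het - same_pair


def pfim_equal(n_alleles):
    """Closed form for n equally frequent alleles."""
    n = n_alleles
    return (n - 1) * (n * n - n - 2) / n ** 3


for n in (2, 5, 9, 15):
    print(n, round(pfim_equal(n), 3))
```

With 5 equally frequent alleles the closed form gives 0.576, the value used for the second variant in the text, and with 2 alleles it gives 0 because two heterozygous parents necessarily share the same genotype.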

Effect of breeding structure
Most animal populations have a very different structure from the human populations for which this test was originally designed. In general the breeding structure for poultry, pigs and fish is favorable for effective testing because large full-sib families are available. Sheep and goat populations are intermediate for the application of the test, whereas cattle populations show a very unfavorable structure. Blackwelder and Elston (1982) show that, under the null hypothesis of no linkage, the $s(s-1)/2$ sib pairs from a sibship of size $s$ can be treated as independent without affecting the type I error rate, so that treating sib pairs as independent should provide a test with good power and a correct nominal significance level. In addition to increased power, the study of large full-sib families requires typing of fewer parents, resulting in a reduction of overall costs. Here, comparisons will be made for a given overall cost of typing, assuming that there is no limitation of the number of individuals measured for the trait of interest. In that case, if a proportion PFIM can be selected among the families available, the number of sibships ($f$) of size $s$ which can be measured given a total number $N$ of typed animals is:

$$f = \frac{N}{s + 2/\mathrm{PFIM}} \qquad [7]$$

since each retained family requires typing its $s$ offspring and, on average, the $2/\mathrm{PFIM}$ parents that had to be screened to find it.
Table II gives the numbers of parents and offspring for the 3 variants which will be considered in the simulations. For the first variant ("standard"), for which all types of matings are used, 1 250 sib pairs from 1 250 families are generated, giving a total of 5 000 typed animals. Of these 5 000 animals, 2 500 have to be measured for the quantitative trait. The second variant has a PFIM of 0.576 (see table I for $n = 5$). From 1 587 typed couples, $f = 914$ show a fully informative mating type. These 914 couples have a total of 1 828 progeny, resulting in a total of 5 002 (2 × (1 587 + 914)) typed animals.
In the case of families with 6 sibs, 3 168 offspring from $f = 528$ families can be measured, an increase of about 73% in the number of offspring as compared to the second variant.
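The family numbers quoted for the three variants can be checked with a short calculation. The sketch assumes that every screened couple costs 2 assays and every retained family costs a further $s$ assays; the function name `families_for_budget` is hypothetical.

```python
def families_for_budget(n_typed, sibship_size, pfim):
    """Number of sibships that can be typed for a fixed number of assays.

    Each retained family costs sibship_size offspring assays plus, on
    average, 2 / pfim parent assays spent on screening couples.
    """
    cost_per_family = sibship_size + 2.0 / pfim
    return round(n_typed / cost_per_family)


N = 5000
standard = families_for_budget(N, 2, 1.0)    # all mating types are used
two_sibs = families_for_budget(N, 2, 0.576)  # 5 equally frequent alleles
six_sibs = families_for_budget(N, 6, 0.576)

couples_screened = round(two_sibs / 0.576)
total_typed = 2 * (couples_screened + two_sibs)
print(standard, two_sibs, six_sibs, couples_screened, total_typed)
```

Under these assumptions the calculation reproduces the 1 250, 914 and 528 families given in the text, as well as the 1 587 screened couples and the 5 002 typed animals of the second variant.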
Up to now, the families were assumed to be unrelated. However, in animal breeding populations one male is usually mated to several females. This gives rise to genetic covariances between families for the polygenic part of the genotype. Since the variable $Y_j$ is derived from the difference of the phenotypic values of 2 full-sibs, it is clear that there are no covariances between the $Y_j$ values of half-sib families that have one parent in common. Thus, the power of the Haseman-Elston test is not directly influenced by the genetic structure of animal populations.
However, if the cost of marker genotyping is a limiting factor and the families to be genotyped are selected in a 2-stage procedure, the number of measured offspring given a fixed number of assays can be considerably increased. We consider again the case of a population from which fully informative matings are selected for genotyping of the offspring with a fixed overall number of typed animals ($N$). In a first step only sires are typed and the heterozygotes are selected for genotyping of their mates. In a second step, heterozygous dams with at least one allele different from their mate's are selected and have their offspring genotyped. Then, the final number of families ($f$) measured for quantitative traits depends on: sibship size ($s$); mating ratio ($r$ = number of dams per sire); selection rate of sires ($m_1$); selection rate of dams given heterozygous sires ($m_2$).
The number of families is then:

$$f = \frac{N}{s + 1/m_2 + 1/(r\,m_1 m_2)} \qquad [8]$$

With $n$ equally frequent alleles, [6] can be written as PFIM = $m_1 m_2$ with $m_1 = (n-1)/n$ and $m_2 = (n-1)/n - 2/n^2$. Note that [8] does not reduce to [7] when $r = 1$, because of the 2-step selection implied in [8]. Table III gives an example of the values of $f$ for different mating ratios ($r$). For low values of $r$, the number of measured families increases rapidly with increasing mating ratio. The largest effect of this strategy can be observed for the case of a polymorphic marker in 2-sib families. Beyond a ratio of 10 the value of $f$ converges rapidly towards the limit for $r$ equal to infinity, which is appropriate if only one male is used.
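The effect of the mating ratio under the 2-stage procedure can be explored numerically. The sketch below derives the number of families from the assay count implied by the description (one assay per sire, one per dam mated to a selected sire, $s$ per selected family); this accounting and the function names are assumptions made for illustration, and the values are not taken from table III.

```python
def selection_rates(n_alleles):
    """Selection rates of sires (m1) and dams (m2) for equally frequent alleles."""
    n = n_alleles
    m1 = (n - 1) / n               # sire is heterozygous
    m2 = (n - 1) / n - 2 / n ** 2  # dam heterozygous and not of the sire's genotype
    return m1, m2


def families_two_stage(n_typed, sibship_size, mating_ratio, m1, m2):
    """Families measurable when sires are typed first, then mates, then offspring.

    Per retained family the expected assay cost is:
      sibship_size             offspring
      1 / m2                   dams typed per retained dam
      1 / (r * m1 * m2)        sires typed per retained family
    """
    cost = sibship_size + 1.0 / m2 + 1.0 / (mating_ratio * m1 * m2)
    return n_typed / cost


m1, m2 = selection_rates(5)
for r in (1, 2, 5, 10, 25, 1000):
    print(r, round(families_two_stage(5000, 6, r, m1, m2), 1))
```

Under this accounting the gain from increasing $r$ is largest at low mating ratios and flattens beyond about 10 dams per sire, in line with the behaviour described in the text.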

Simulation studies
Simulations were performed to examine the impact of 4 factors on the power of the Haseman-Elston test: i) fully informative matings; ii) family size of 6 full-sibs; iii) within-family environmental correlation ($c^2$); and iv) a typical pig breeding structure. Other factors varied were: 1) variance due to the linked QTL (relative to phenotypic variance) and 2) recombination frequency between the marker and the QTL. Table IV gives the range of variation for the different parameters. Each simulation was replicated 500 times. The sizes of the examined sibships were 2 and 6, respectively. For the larger sibship size the test was based on all possible sib pairs within the sibship, as proposed by Blackwelder and Elston (1982), resulting in 15 comparisons per sibship. The gene frequencies at the trait locus were $p = q = 0.5$ and additive gene action was assumed. At the marker locus the gene frequencies were 0.5 for the standard method (assuming 2 alleles) and 0.2 (corresponding to 5 alleles) for the case of fully informative matings. In the polyallelic case only the offspring of fully informative matings as defined above were considered as being typed and thus included in the analyses. All simulations were calculated for a total of 5 000 assays.
To check the significance level, simulations were performed under the null hypothesis (no effect of QTL or recombination frequency = 0.5) for all of the variants presented here. The empirical significance level was determined as the percentage of replications that, under the null hypothesis, exceeded the critical value of a 1-sided t-test with type I error of 5%. In none of the cases was this percentage significantly different from 5%, in a test based on the binomial distribution. The SAS univariate procedure (SAS, 1989) indicated no significant departure of the regression coefficients from normality.
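A simulation of this kind can be sketched as follows for the case of fully informative matings. This is a simplified illustration, not the program used in the study: all function names are hypothetical, the marker ibd status is read directly from the transmitted parental haplotypes (which is what full informativeness permits), no common environmental effect is included, and the critical value uses the normal approximation to the t distribution.

```python
import math
import random
from statistics import NormalDist


def simulate_replicate(n_families, qtl_var, theta, rng, sibs=2):
    """Return (squared differences, marker ibd proportions) for all sib pairs."""
    a = math.sqrt(2.0 * qtl_var)         # allele effect, p = q = 0.5, additive
    resid_sd = math.sqrt(1.0 - qtl_var)  # phenotypic variance scaled to 1
    y, pi = [], []
    for _ in range(n_families):
        # QTL allele (0/1) carried on each of the 2 marker haplotypes per parent
        parents = [[rng.random() < 0.5 for _ in range(2)] for _ in range(2)]
        offspring = []
        for _ in range(sibs):
            haps, value = [], 0.0
            for parent in parents:
                h = rng.randrange(2)                       # marker haplotype passed on
                q = h if rng.random() >= theta else 1 - h  # recombination with QTL
                haps.append(h)
                value += a * parent[q]
            offspring.append((haps, value + rng.gauss(0.0, resid_sd)))
        # All possible sib pairs within the sibship
        for i in range(sibs):
            for j in range(i + 1, sibs):
                (h1, x1), (h2, x2) = offspring[i], offspring[j]
                pi.append(((h1[0] == h2[0]) + (h1[1] == h2[1])) / 2.0)
                y.append((x1 - x2) ** 2)
    return y, pi


def he_rejects(y, pi, alpha=0.05):
    """One-sided test of a negative regression of y on pi."""
    n = len(y)
    mx, my = sum(pi) / n, sum(y) / n
    sxx = sum((x - mx) ** 2 for x in pi)
    slope = sum((x - mx) * (v - my) for x, v in zip(pi, y)) / sxx
    rss = sum((v - my - slope * (x - mx)) ** 2 for x, v in zip(pi, y))
    t = slope / math.sqrt(rss / (n - 2) / sxx)
    return t < NormalDist().inv_cdf(alpha)  # assumption: normal approximation


def power(n_families, qtl_var, theta, replicates=100, seed=1, sibs=2):
    """Fraction of replicates in which the test rejects the null hypothesis."""
    rng = random.Random(seed)
    hits = sum(
        he_rejects(*simulate_replicate(n_families, qtl_var, theta, rng, sibs))
        for _ in range(replicates)
    )
    return hits / replicates


print(power(914, 0.16, 0.0), power(528, 0.08, 0.0, sibs=6))
```

Setting `qtl_var=0` or `theta=0.5` gives the empirical type I error rate, which offers a quick check that the sketch behaves sensibly before any power estimate is read from it.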

RESULTS
In this section the impact of fully informative matings arising from highly polymorphic markers and of sibships of size 6 on the power of the Haseman-Elston test is examined. Columns 3 and 4 of table V show the effect of using only fully informative matings on the power of the Haseman-Elston test. The column for the standard version of the test shows the poor power of this method for the QTL effects considered here. When the QTL contributes 16% to the phenotypic variance, which is equivalent to 1.1 phenotypic standard deviations between the 2 homozygotes, the power is only 33%. The use of fully informative matings hardly increases power, except for higher QTL effects. However, even for the largest QTL effect the power of the test is below 50%. For lower QTL effects the power shows more or less erratic fluctuations with increasing recombination rate, while for a QTL effect of 16% the power is reduced by between 15 and 20 percentage points when the recombination rate increases from 0 to 0.1. The use of larger families leads to a major increase in the power of the Haseman-Elston test. QTL effects of 8% can be detected with more than 50% power if the recombination rate is 0 or 0.05 (column 5, table V).
The effect of a within-family environmental component on the power of the test is given in table VI. This component reduces the within-family variance and thus increases the power of the test. The average increase in power is 59% for the 2-sib and 40% for the 6-sib families. In the latter situation, a QTL effect of 4% can be detected with nearly 50% power if a common environmental component is present and there is no recombination between the marker and the QTL.
In the simulations of hierarchical breeding structures the numbers of families for $r = 25$ were used. The results are given in table VII. It can be seen that in this situation the test based on 2-sib families is still not competitive. In the case of 6-sib families one should be able to detect a QTL effect of 8% with power between 48 and 79%. For smaller QTL effects power is not sufficient unless there is additional "support" from common environmental effects. Soller and Genizi (1978), using the method of Jayakar (1970), presented calculations for half- and full-sib designs. The method of Soller and Genizi (1978) for a QTL contributing 4% of the phenotypic variance of the population has been compared to our simulation results, as summarized in table VIII. The basis of the comparisons is an equal number of preselected matings for both tests (fully informative for Haseman-Elston, intercross for Soller and Genizi). The test of Soller and Genizi (1978) always has less power than the Haseman-Elston test for the two heritabilities that have been tested.
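The equivalence quoted above between a QTL explaining 16% of the phenotypic variance and about 1.1 phenotypic standard deviations between the homozygotes follows from the simulation settings ($p = q = 0.5$, additive gene action). As a short worked step, writing $\pm a$ for the genotypic values of the 2 homozygotes (a symbol introduced here for illustration) and scaling the phenotypic variance to 1:

```latex
\sigma_a^2 = 2pq\,a^2 = \tfrac{1}{2}a^2
\quad\Rightarrow\quad
a = \sqrt{2\sigma_a^2} = \sqrt{2 \times 0.16} \approx 0.566,
\qquad
2a \approx 1.13 \ \text{phenotypic standard deviations.}
```

The same relation gives differences of about 0.8 and 0.57 standard deviations for QTL effects of 8% and 4%, respectively.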

DISCUSSION
The present study confirms the findings of Robertson (1973) that the Haseman-Elston sib-pair linkage method in its original form has very low power. This is especially true if the variance explained by the QTL is small as compared to the residual variance, since the variance of $Y_j$ is proportional to the fourth power of $\sigma_e$ (Robertson, 1973). As a consequence, measures to increase the power of the test should aim at a reduction of the residual variance. On the other hand, systematic environmental and sex effects that affect the difference between full-sibs would have to be eliminated. This, as well as the increase in power due to common environmental effects, leads to the recommendation that full-sib families should be reared together, as long as no competition effects occur.
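The dependence of the variance of $Y_j$ on the fourth power of the residual standard deviation can be seen in the simplest case. As a sketch, assume (an assumption made here for illustration) that the QTL effect is negligible and that the residual difference $e_j$ is normally distributed with variance $\sigma_e^2$, so that $Y_j = e_j^2$:

```latex
\operatorname{Var}(Y_j) = E(e_j^4) - \big[E(e_j^2)\big]^2
= 3\sigma_e^4 - \sigma_e^4 = 2\sigma_e^4 .
```

The signal, in contrast, enters the regression only through the slope $-2(1-2\theta)^2\sigma_a^2$, so halving the residual variance reduces the sampling variance of $Y_j$ by a factor of 4, which is why reducing residual variance is so effective.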
Preselection of fully informative matings leads to an increase of the power of the test, especially for higher QTL effects. Furthermore, the power estimates found by simulation agree well with the values calculated according to the asymptotic formulae given by Blackwelder and Elston (1982) and .
Thus, for a given mating structure and marker polymorphism, the necessary size of experiments for the detection of marker-QTL linkage can be calculated in advance for a given QTL effect. Power could also be increased by selective genotyping. Instead of selecting informative matings one would rather select sibships with at least one extreme individual for the trait investigated (see the 2-sib case recently analysed by Carey and Williamson (1991)). Blackwelder and Elston (1982) concluded that a significance level of 5%, as chosen in this study, would not be strong evidence for linkage in view of the low prior odds in favor of 2 genes being on the same chromosome. However, since this method is a preliminary instrument for the scanning of segregating populations, a relatively high type I error could be accepted in order to reduce the risk of rejecting possible QTLs. A crucial question is whether power is influenced by using non-independent comparisons in large families. Results of Hodge (1984) suggest that the information contained in a sibship of size $s$ approaches $2s - 3$ for $s > 4$. However, these findings could not be confirmed in simulation studies of Blackwelder and Elston (1982), who treated sibships of sizes 3 and 5 as in the present study.
Their results indicate that neither type I error nor power are affected by treating all the comparisons in a sibship as independent. The same was true for the present study.
Furthermore, family sizes would not be equal in field studies. This is advantageous with respect to power since the expected number of comparisons per sibship increases with the variance of sibship size. A random variation of family size ($s$) in pig populations may be approximated by a Poisson distribution with parameter $\bar{s} = E(s) = V(s)$. Thus, as $E[s(s-1)/2] = [\bar{s}(\bar{s}-1) + V(s)]/2 = \bar{s}^2/2$, the average number of comparisons per family is always greater than with constant family size and twice as much for $\bar{s} = 2$. For pig populations, an average family size of 6 is a rather pessimistic assumption; in practice it would rather be between 8 and 9 and one such family would on average be equivalent to 32-40 families of 2 sibs.
The extension to hierarchical breeding structures increases the power if the families to be typed are selected in a 2-stage procedure. Relationships between families do not induce covariances between the $Y_j$ and therefore need not be considered.
In a typical pig breeding situation the method can be used to detect QTLs of effects between 4 and 8% of the phenotypic variance with about 50% power for a total of 5 000 assays. This should include all economically interesting QTLs because in the supposed situation (linkage equilibrium) the phases must be determined for each sire in each generation if marker assisted selection is applied. Since the determination of linkage phases causes additional cost, only QTLs with large effects are of interest. Another question is whether the preselection of fully informative matings may give rise to false linkage. Preselection is made only with regard to the marker genotype. Therefore, false linkage should not occur as long as there is linkage equilibrium between the marker and the QTL. However, this cannot be proven until the QTL genotype can be determined directly. As suggested by an anonymous referee, inferences suggesting evidence in favor of linkage in the case of linkage disequilibrium should be correct. Furthermore, falsely positive evidence should not occur for unlinked loci which are in linkage disequilibrium when data for the parents are available. However, this has never been studied for sib-pair methods. The main reason for linkage disequilibrium in animal populations is hybridization between subpopulations that have been kept separate for several generations (Lande and Thompson, 1990). If this should be the case in the population considered, one would rather use the standard methodology of multiple regression which exploits the existing linkage disequilibria.
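The expected number of sib-pair comparisons under Poisson-distributed family size can be written out step by step. Here $\bar{s}$ denotes the Poisson parameter (mean and variance of family size), a notation chosen for this illustration:

```latex
E\!\left[\frac{s(s-1)}{2}\right]
= \frac{E(s^2) - E(s)}{2}
= \frac{V(s) + \bar{s}^2 - \bar{s}}{2}
= \frac{\bar{s} + \bar{s}^2 - \bar{s}}{2}
= \frac{\bar{s}^2}{2},
\qquad
\frac{\bar{s}^2/2}{\bar{s}(\bar{s}-1)/2} = \frac{\bar{s}}{\bar{s}-1}.
```

The ratio to a constant family size of $\bar{s}$ equals 2 for $\bar{s} = 2$, and for $\bar{s} = 8$ and $\bar{s} = 9$ the expected numbers of comparisons are 32 and 40.5, which corresponds to the 32-40 families of 2 sibs quoted in the text, since a 2-sib family contributes a single comparison.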
A comparison of the present results with those from other workers is difficult since only few investigations deal with the problem of detecting linkage within segregating populations. The comparison with the method of Soller and Genizi (1978) showed that the power of this method is inferior to the Haseman-Elston test. The reason is that the Soller and Genizi method favours half-sib families in the order of 1 500 animals and is thus better suited to dairy cattle populations. Weller et al (1990) introduced the granddaughter design which leads to a considerable increase of power for a given number of assays compared with the Soller and Genizi (1978) design. It also depends on highly polymorphic markers and its range of application is limited to dairy cattle populations because this method requires large numbers of sons per sire and granddaughters per son to be effective. Beckmann and Soller (1988) considered crosses between segregating populations.
Their method is based on $F_2$ crosses and thus requires additional testing of $F_1$ individuals, with at least 2 generations and a special experimental design. It also implies 2 parental populations close to fixation ($p > 0.8$) for alternative QTL alleles and a difference between the 2 homozygotes of at least 0.4 standard deviations. The authors suggest that this is more likely for traits like disease resistance than for the traditional performance traits.

CONCLUSIONS
With the advent of multiallelic markers, the Haseman-Elston sib-pair linkage method becomes more powerful for the detection of linkage between markers and QTLs. However, the marker should have at least 5 alleles at intermediate frequencies in order to reduce the number of parent animals to be typed. The method allows the detection of linkage within any segregating population. It can serve to indicate whether more sophisticated methods, which estimate recombination frequency and allele effects but require special mating plans, are appropriate. The method can also make use of multivariate data as shown by Amos et al (1990).
The power of the method increases considerably with increasing family size and with hierarchical breeding structures. Thus, it is especially useful for species with high reproductive rates such as pigs and poultry.  extended the test to any type of non-inbred relative pair. Thus, it could also be applied to dairy cattle where large half-sib families are available. However, on average twice as many half-sib pairs are needed as compared to full-sibs and at present there exists no possibility of combining different types of relatives in one analysis.