Mapping QTL in outbred populations using selected samples

Abstract - Detection of QTLs in an outbred population using a selected sample. A simulation was carried out to analyze the influence of family selection, and of selective genotyping within the selected families, on the quality of genetic parameter estimates in an outbred population with a half-sib structure. Marker genotypes were determined only for sires whose progeny fell in the upper or lower tail of the phenotypic distribution of the trait studied. The progeny of the selected sires were genotyped. Within the selected families, three sampling schemes were considered: (i) sampling from the tails of the distribution, (ii) random sampling and (iii) exhaustive sampling. The control data consisted of randomly sampled progeny of randomly sampled sires. An interval mapping procedure based on the random model approach was applied to the simulated data. The QTL position and the variance components were estimated by maximum likelihood. Compared with the control data, selective genotyping of sires increased the power of QTL detection but led to severely biased estimates of variance components, particularly when the extreme progeny of the selected sires were sampled. Including phenotypic data for all individuals, and not only for those typed for markers, improved the estimation of QTL parameters without loss of power of QTL detection. © Inra/Elsevier, Paris
QTL / family selection / selective genotyping / interval mapping
INTRODUCTION
Selective genotyping is a method of quantitative trait locus (QTL) mapping in which the analysis of linkage between marker loci and a QTL affecting the trait of interest is carried out by genotyping only individuals from the high and low phenotypic tails of the entire distribution of the trait values in the population [2]. Individuals that deviate most from the population mean are considered to be most informative for linkage, because their genotypes can be inferred from their phenotypes more clearly than can those of average animals [7]. For a given power, selective genotyping can considerably reduce the number of individuals genotyped at the expense of an increase in the number of individuals phenotyped. Thus, the benefits of selective genotyping depend on whether the information on the trait is readily available or whether additional expensive testing is required. In a livestock population that is part of a breeding program, performance records are easily accessible for a large number of animals. By genotyping only extreme animals, the cost of linkage analysis can be considerably reduced. An important aspect of using selected samples for QTL detection is to choose extreme sibs from parents with average phenotypic values, because such parents are more likely to be heterozygous for the QTL. If parents have similar extreme phenotypes (either high or low) they are probably homozygous for the QTL and, therefore, the linkage would be much more difficult to detect [12]. Sires with a large within-family deviation are considered to be most informative for linkage. If a QTL with a reasonably large effect segregates in the population, phenotypic deviation between the extreme offspring will be due to the presence of the alternative QTL alleles in either tail of the distribution.
Phenotypic differences among individuals that are due to a large polygenic or environmental deviation will be eliminated if the families that the individuals for genotyping are sampled from are large enough. Therefore, in livestock populations with usually large half-sib families, it would be useful to select sire families with most extreme offspring prior to genotyping to ensure sufficient within family genetic variability necessary for successful detection of a putative QTL segregating in the population. However, very little research on this topic has been carried out to date. Furthermore, most of the experiments considering selective genotyping have been designed assuming a biallelic QTL and expecting an increased frequency of alternative QTL alleles in either tail of the distribution. This assumption is correct for experiments involving inbred line crosses or backcrosses, when the QTL alleles can be directly inferred from the marker alleles. This assumption, however, does not hold for outbred populations. In an outbred population, inbred lines are not easily available. Linkage phases are usually unknown as well as the number of genes affecting the trait and the number of alleles at the putative QTL. The genetic architecture and the exact mode of inheritance at the QTL are unknown. As a consequence, the allelic effects of genes cannot be estimated. In such situations, a robust method for linkage analysis, which does not require specification of the genetic model, is preferable. Goldgar [5] defined a random model for linkage analysis that has been proved to be robust against different genetic models and efficient for linkage analysis in outbred populations. Under the random model, QTL effects are assumed to be normally distributed, which leads to the estimation of the variance associated with the QTL (i.e. with a chromosomal region) instead of estimating QTL allelic effects.
The random model approach to QTL mapping in half-sib families is based on phenotypic similarity (or covariance) between genetically related individuals. This covariance can be defined as a function of the proportion of genes identical-by-descent (IBD) that two individuals share at the loci affecting the trait.
The covariance between two relatives comprises the polygenic and the QTL component. The polygenic component consists of many genes with small effects. Thus, it is assumed that the average proportion of alleles IBD shared by two relatives equals the genetic relationship coefficient between them, i.e. 1/4 in half-sib families. On the other hand, the QTL component usually represents one major locus (QTL) with a large effect. Therefore, for the same kind of relationship, the proportion of alleles IBD shared by the relatives at the QTL differs from one pair of relatives to another. In half-sib families with one common parent the proportion of alleles IBD at the QTL ranges from 0 to 1/2. Because the QTL itself is unobservable, the proportion of alleles IBD at the QTL must be inferred from the available information on linked marker loci [6]. The greater the shared proportion of alleles IBD, the more similar are the phenotypes of the two relatives. With a larger deviation of the actual IBD proportion from the expected average value of 1/4, the power of separating the QTL from the polygenic component and the power of detecting a QTL become larger. Selective genotyping is expected to increase deviation of the IBD proportion from the average by changing the IBD proportion towards the maximum within the extreme groups, and towards zero between the extreme groups. Therefore, a QTL analysis under the random model should be more efficient if individuals for genotyping are sampled from the tails of the distribution.
The objectives of this paper have been defined as follows: 1) to examine efficiency of selection of sires, i.e. half-sib families prior to selective genotyping of the offspring; 2) to examine the impact of selective genotyping within selected families on power and estimation of QTL parameters using different sampling schemes; 3) to examine the efficiency of the random model approach for QTL mapping under selective genotyping, with information available on only genotyped individuals or on all phenotyped animals.

Data simulation and analyses
Genetic and phenotypic data were generated by Monte-Carlo simulation techniques. Mapping QTL was considered within a 20 cM long chromosomal segment flanked by two markers, both with four equally frequent alleles. For simplicity, a QTL was simulated in the middle of the segment, i.e. at 10 cM. Five codominant alleles with equal frequency were assumed at the QTL.
Parents were generated by random allocation of genotypes at each locus assuming Hardy-Weinberg equilibrium. Parental linkage phases were assumed unknown. Progeny were generated assuming no interference, so that a recombination event between the first marker and the QTL did not affect the occurrence of a recombination event between the QTL and the second marker. The recombination fraction was calculated by the Haldane map function.
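As an illustration of the map function mentioned above, the following minimal Python sketch converts the map distances of the simulated segment into recombination fractions with the Haldane function. The function and variable names are hypothetical and are not taken from the original simulation program.

```python
import math


def haldane_recombination(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


# Marker 1, QTL, marker 2: the QTL sits at 10 cM of the 20 cM segment.
r_m1_qtl = haldane_recombination(10.0)
r_qtl_m2 = haldane_recombination(10.0)
r_m1_m2 = haldane_recombination(20.0)

# Without interference, the recombination fraction between the flanking
# markers follows from the two sub-intervals: r12 = r1 + r2 - 2 * r1 * r2.
r_combined = r_m1_qtl + r_qtl_m2 - 2.0 * r_m1_qtl * r_qtl_m2
```

The agreement between `r_m1_m2` and `r_combined` reflects the no-interference assumption: recombination events in the two sub-intervals occur independently.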
Phenotypic data for progeny were simulated as follows:
$$y_{ij} = \mu + q_{ij} + s_i + d_{ij} + \phi_{ij} + e_{ij}$$
where $y_{ij}$ is the phenotypic value of individual j in half-sib family i; $\mu$ is the population mean; $q_{ij}$ is the effect of the QTL genotype of individual j in family i; $s_i$ is the sire's contribution to the polygenic value; $d_{ij}$ is the dam's contribution to the polygenic value; $\phi_{ij}$ is the effect of Mendelian sampling on the polygenic value; and $e_{ij}$ is the residual error. The phenotypic value of the trait was assumed to be normally distributed with mean equal to zero and variance equal to one. Heritability of the trait was assumed to be 0.25. The allelic effect of the QTL was defined so that the additive variance of the QTL accounted for 40, 20 and 4 % of the genetic variance, i.e. 10, 5 and 1 % of the total phenotypic variance, so that the true values of QTL heritability ($h_g^2$) and polygenic heritability ($h_a^2$) were 0.10, 0.05 and 0.01 and 0.15, 0.20 and 0.24, respectively.
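The simulation of phenotypes described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the original program: the five allelic effects are taken to be equally spaced (the text only states that they were scaled to give the required additive QTL variance), the polygenic variance is split in the usual way into 1/4 for the sire contribution, 1/4 for the dam contribution and 1/2 for Mendelian sampling, and all function names are hypothetical.

```python
import math
import random


def qtl_allele_effects(h2_qtl, n_alleles=5):
    """Equally spaced additive allelic effects (an assumption), scaled so that
    the additive QTL variance, 2 * Var(allelic effect), equals h2_qtl."""
    raw = [k - (n_alleles - 1) / 2.0 for k in range(n_alleles)]
    var_raw = sum(a * a for a in raw) / n_alleles  # raw effects have mean zero
    scale = math.sqrt(h2_qtl / (2.0 * var_raw))
    return [a * scale for a in raw]


def simulate_half_sib_family(n_offspring, h2_qtl, h2_poly, rng, mu=0.0):
    """Phenotypes of one half-sib family; total phenotypic variance is 1."""
    effects = qtl_allele_effects(h2_qtl)
    # Sire QTL genotype: two alleles drawn at random (equal frequencies).
    sire_alleles = (rng.choice(effects), rng.choice(effects))
    # Sire contribution to the polygenic value, shared by all offspring.
    sire_poly = rng.gauss(0.0, math.sqrt(h2_poly / 4.0))
    sd_dam = math.sqrt(h2_poly / 4.0)
    sd_mendelian = math.sqrt(h2_poly / 2.0)
    sd_residual = math.sqrt(1.0 - h2_qtl - h2_poly)
    phenotypes = []
    for _ in range(n_offspring):
        # One paternal allele from the sire, one maternal allele from the population.
        q = rng.choice(sire_alleles) + rng.choice(effects)
        y = (mu + q + sire_poly
             + rng.gauss(0.0, sd_dam)
             + rng.gauss(0.0, sd_mendelian)
             + rng.gauss(0.0, sd_residual))
        phenotypes.append(y)
    return phenotypes
```

For example, `simulate_half_sib_family(300, 0.10, 0.15, random.Random(1))` returns 300 phenotypes for one sire under the largest QTL effect considered in the study.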

Sampling schemes
A typical dairy cattle population with prevailing half-sib family structure was assumed. The base population under the breeding program consisted of 500 sires used by an artificial insemination (AI) organization and an infinite number of females. Each sire was bred with 300 randomly chosen unrelated dams to produce one phenotyped offspring per mating. The selection of individuals for genotyping proceeded in two steps. In the first step, the sire families assumed to be most informative for QTL mapping were selected. In the second step, offspring from the selected families were chosen for genotyping and QTL analysis.

Selection of families
Offspring of all sires were ranked according to their simulated phenotypes to choose the sires whose progeny would be genotyped. Only sires with offspring within the top and the bottom 10 % of the entire distribution were considered for selection. The selection decision was based on the assumption that these sires are most likely to be heterozygous for the QTL affecting the trait. The selection criterion for sires, c, was defined in terms of $n_1$, the number of progeny in the top 10 % of the distribution, and $n_2$, the number of progeny in the bottom 10 % of the distribution. If a sire has a large number of daughters in both the top and the bottom 10 % of the distribution, both $n_1$ and $n_2$ will be large, and c will have a small value, closer to zero as $n_1$ and $n_2$ increase. Therefore, sires were ranked according to the value of c, assigning a higher rank to sires with a smaller value of c. Sires were selected starting from the one with the smallest value of c, i.e. from the sire with the largest number of offspring equally distributed in the top and bottom 10 % of the entire distribution. Sampling continued until the number of sires needed for genotyping was reached.

Selection of individuals within selected families
Three different sampling schemes were applied to the progeny of the selected sires.
Scheme I: from each of the selected sires, the number of offspring needed for analysis was sampled starting from the tails of the distribution. Therefore, 50 % of the animals for genotyping had the lowest and 50 % the highest phenotypic values.
Scheme II: from each of the selected sires, the offspring needed for genotyping were randomly sampled from the entire family.
Scheme III: each sire from the base population was allowed to produce only the exact number of offspring needed for genotyping. Sires were selected according to the criterion c. No selection was applied to the offspring, i.e. all offspring of a selected sire were analyzed.
Note that not all of the offspring of the selected sires chosen for genotyping were necessarily within the top and bottom 10 % of the entire phenotypic distribution.
Control: in addition to the sampling schemes, control data were generated assuming no selection in either sires or offspring. These data were used as a comparison basis.
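The within-family sampling under schemes I and II can be sketched as follows. This is a hedged illustration rather than the original procedure: the function names are hypothetical, the tails are taken within each selected family, and the number of genotyped offspring per family is assumed to be even.

```python
import random


def sample_scheme_one(phenotypes, n_genotyped):
    """Scheme I: half of the genotyped offspring from each tail of the family.

    Returns the indices of the chosen offspring (n_genotyped assumed even)."""
    order = sorted(range(len(phenotypes)), key=lambda j: phenotypes[j])
    half = n_genotyped // 2
    return order[:half] + order[len(order) - half:]


def sample_scheme_two(phenotypes, n_genotyped, rng):
    """Scheme II: genotyped offspring drawn at random from the whole family."""
    return rng.sample(range(len(phenotypes)), n_genotyped)


# Small usage example with six offspring, four of which are genotyped.
family = [0.3, -1.2, 2.1, 0.0, -0.4, 1.5]
tails = sample_scheme_one(family, 4)
randomly_chosen = sample_scheme_two(family, 4, random.Random(0))
```

Scheme III needs no within-family sampling, because every offspring of a selected sire is analyzed.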
The number of genotyped offspring was held constant at 2 000. Number of families and number of offspring per family varied. For each sampling scheme, three different combinations were examined: 100 families of 20 offspring, 40 families of 50 offspring and 20 families of 100 offspring.
For scheme I, additional simulations were carried out assuming a base population consisting of 100 sires with 80 offspring each. Twenty sires were chosen for genotyping starting from the sire with the largest number of offspring equally distributed in the top and the bottom 10 % of the phenotypic distribution. The proportions of offspring chosen for genotyping were 0.10, 0.25, 0.50 and 1.00. One half of the total number of the genotyped individuals was taken from either tail of the phenotypic distribution. In the analysis, however, all data were considered: typed and untyped offspring from the selected sires as well as all (untyped) offspring from the unselected sires. Thus, the sample size was equal for all analyses: 100 families with 80 offspring each.

Statistical analyses
Simulated data were analyzed using the following model:
$$y_{ij} = \mu + g_{ij} + a_{ij} + e_{ij}$$
where $y_{ij}$ is the phenotypic trait value of the jth individual in the ith family, assumed ideally precorrected for environmental fixed effects, $\mu$ is the population mean, $g_{ij}$ is the additive genetic effect of the QTL with $g_{ij} \sim N(0, \sigma_g^2)$, $a_{ij}$ is the additive effect of the polygenic component with $a_{ij} \sim N(0, \sigma_a^2)$, and $e_{ij}$ is the random environmental variation with $e_{ij} \sim N(0, \sigma_e^2)$. Assuming linkage equilibrium, the variance of $y_{ij}$ is
$$\sigma^2 = \sigma_g^2 + \sigma_a^2 + \sigma_e^2$$
where $\sigma^2$ is the phenotypic variance, $\sigma_g^2$ is the variance associated with a QTL, $\sigma_a^2$ is the variance associated with genes other than the tested QTL (polygenic variance), and $\sigma_e^2$ is the environmental (residual) variance.
The expected value of the covariance between two non-inbred half-sibs within the family is
$$\mathrm{Cov}(y_{ij}, y_{ij'}) = \pi_q \sigma_g^2 + \tfrac{1}{4} \sigma_a^2$$
where $\pi_q$ is the proportion of alleles identical-by-descent (IBD) shared by the half-sibs j and j' at the putative QTL. The coefficient of the polygenic variance is 1/4 because, by expectation, two non-inbred half-sibs share 1/4 of their alleles IBD.
With k half-sibs in the ith family, the covariance matrix ($V_i$) among phenotypic values of the half-sibs ($y_{ij}$) is
$$V_i = \{v_{jj'}\}$$
with
$$v_{jj'} = \sigma^2 \quad \text{for } j = j'$$
and
$$v_{jj'} = \left(\pi_q h_g^2 + \tfrac{1}{4} h_a^2\right) \sigma^2 \quad \text{for } j \neq j'$$
where $h_g^2 = \sigma_g^2 / \sigma^2$ and $h_a^2 = \sigma_a^2 / \sigma^2$. $\pi_q$ is the proportion of alleles IBD shared by the individuals j and j' at the QTL. $\pi_q$ must be estimated using information on linked marker loci. Given the proportion of alleles IBD at two markers flanking the putative QTL, the proportion of alleles IBD at the QTL can be estimated using linear regression [3]:
$$\hat{\pi}_q = \alpha + \beta_1 \pi_1 + \beta_2 \pi_2$$
where $\pi_1$ and $\pi_2$ are the IBD values for the two flanking markers and $\alpha$, $\beta_1$ and $\beta_2$ are the regression coefficients. For simplicity, marker genotypes were assumed known in both parents. The proportion of alleles IBD at marker loci shared by two half-sibs within a family was estimated from the simulated marker genotypes of the offspring and their parents using the procedure described by Haseman and Elston [6] for the situation with known parental information, appropriately adjusted to fit the half-sib family structure [9]. For those samples in which only a part of the individuals were genotyped, but all phenotypes were included in the analysis, the same procedure was applied to calculate the proportion of IBD at marker loci shared by two typed half-sibs from a typed sire. The unknown proportions of IBD shared by two untyped half-sibs or by one typed and one untyped half-sib were replaced by their expected value of 0.25. Assuming a multivariate normal distribution of the data ($y_{ij}$), we have a joint density function of the observations within a half-sib family:
$$f(y_i) = (2\pi)^{-k/2} \, |V_i|^{-1/2} \exp\left[-\tfrac{1}{2} (y_i - \mathbf{1}\mu)' V_i^{-1} (y_i - \mathbf{1}\mu)\right]$$
where $y_i = [y_{i1} \; y_{i2} \; y_{i3} \ldots y_{ik}]'$ is a k x 1 vector of observed phenotypic values for k half-sibs within the ith family, and $\mathbf{1}$ is a k x 1 vector with all entries equal to one.
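The construction of the within-family covariance matrix, including the substitution of 0.25 for pairs that involve an untyped half-sib, can be sketched as follows. The sketch assumes diagonal elements equal to the phenotypic variance and off-diagonal elements equal to (pi_q * h2_qtl + 0.25 * h2_poly) times the phenotypic variance, in line with the half-sib covariance given above. The function name and the dictionary layout are hypothetical.

```python
EXPECTED_HALF_SIB_IBD = 0.25


def half_sib_covariance(n_offspring, typed_pair_ibd, h2_qtl, h2_poly, sigma2=1.0):
    """Covariance matrix among the phenotypes of one half-sib family.

    typed_pair_ibd maps (j, j2) with j < j2 to the estimated IBD proportion at
    the tested position for pairs of typed half-sibs. Any pair that is absent
    (at least one untyped animal) receives the expected value of 0.25."""
    v = [[0.0] * n_offspring for _ in range(n_offspring)]
    for j in range(n_offspring):
        v[j][j] = sigma2
        for j2 in range(j + 1, n_offspring):
            pi_q = typed_pair_ibd.get((j, j2), EXPECTED_HALF_SIB_IBD)
            cov = (pi_q * h2_qtl + 0.25 * h2_poly) * sigma2
            v[j][j2] = cov
            v[j2][j] = cov
    return v


# Three half-sibs; only offspring 0 and 1 are typed, sharing the sire allele.
v_example = half_sib_covariance(3, {(0, 1): 0.5}, h2_qtl=0.10, h2_poly=0.15)
```

In this example the typed pair has a covariance of 0.0875, whereas the two pairs involving the untyped animal fall back to 0.0625, which is the covariance expected for half-sibs under a total heritability of 0.25.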

The overall log likelihood for N independent half-sib families is
$$L = \sum_{i=1}^{N} \ln f(y_i)$$
The maximum likelihood interval mapping procedure was applied to the generated data. The likelihood function was maximized with respect to $h_g^2$, $h_a^2$ and $\sigma^2$ for each testing position along the chromosomal segment using a simplex algorithm described by Xu and Atchley [11]. The chromosome was screened from the left to the right end in steps of 2 cM. For each position, the likelihood ratio test (LR) was computed as minus twice the difference in log likelihood between the null hypothesis ($h_g^2 = 0$) and the alternative hypothesis ($h_g^2 \neq 0$). The testing position with the highest LR was accepted as the most likely position of the QTL. Similarly, the estimated variance components ($h_g^2$ and $h_a^2$) at the position with the highest likelihood ratio were accepted as maximum likelihood estimates for these parameters. For each sampling scheme and each parameter combination, the simulation and analysis were repeated 100 times.
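The evaluation of the family log likelihood and of the LR statistic can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: it only evaluates the multivariate normal log density for a given covariance matrix and does not reproduce the simplex maximization used in the study. The function names are hypothetical, and the Cholesky routine assumes a positive definite covariance matrix.

```python
import math


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def family_log_likelihood(y, v, mu):
    """Multivariate normal log density of one family's phenotypes y,
    with covariance matrix v and common mean mu."""
    k = len(y)
    lower = cholesky(v)
    # Forward substitution solves lower * z = (y - mu); z'z is the quadratic form.
    z = []
    for i in range(k):
        s = sum(lower[i][m] * z[m] for m in range(i))
        z.append((y[i] - mu - s) / lower[i][i])
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(k))
    quad = sum(value * value for value in z)
    return -0.5 * (k * math.log(2.0 * math.pi) + log_det + quad)


def likelihood_ratio(loglik_null, loglik_alt):
    """Minus twice the difference in log likelihood between H0 and H1."""
    return -2.0 * (loglik_null - loglik_alt)
```

The overall log likelihood is obtained by summing `family_log_likelihood` over families. At each testing position, the value maximized with the QTL heritability fixed at zero and the value maximized without that constraint would then be passed to `likelihood_ratio`.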
The power of QTL detection was obtained empirically by simulation. The empirical distribution of the LR test statistic under $H_0$ was generated by simulating and analyzing data in the same manner, but assuming no QTL in the entire segment. For each sampling scheme and each parameter combination, data simulation and estimation under $H_0$ were repeated 100 times. Each time the highest value of the LR was recorded. After 100 replicates, the obtained LR values were ordered, and the 95th value was chosen as an empirical 5 % significance threshold for this parameter combination. The power of QTL detection was then calculated as the percentage of replicates in which the maximum LR exceeded the corresponding threshold.
The parameter with most influence on power was family size. For the fixed number of genotyped progeny (2 000), considerably higher power was obtained with larger families and a smaller number of families than with smaller family size and a larger number of families. For all sampling schemes, regardless of the size of the QTL effect, the highest power was obtained with 20 families with 100 progeny each, almost twice as high as for the reverse combination with 100 families and 20 progeny each. This is explained by the increased number of half-sib pairs within a family. In general, for N families with n half-sibs each, the total number of half-sib pairs is $N n (n - 1) / 2$. As n increases while nN remains constant, the number of half-sib pairs also increases, and this results in an increased amount of information used in the analysis.
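The empirical threshold, the power calculation and the count of half-sib pairs described above can be sketched as follows. The function names are hypothetical, and the pair count uses the standard formula N * n * (n - 1) / 2 for N families of n half-sibs.

```python
def empirical_threshold(null_max_lr, percentile=95):
    """Value at the given percentile rank of the ordered maximum LR values
    obtained under H0 (the 95th of 100 ordered values by default)."""
    ordered = sorted(null_max_lr)
    rank = -(-percentile * len(ordered) // 100)  # ceiling division
    return ordered[rank - 1]


def detection_power(alt_max_lr, threshold):
    """Percentage of replicates whose maximum LR exceeds the threshold."""
    hits = sum(1 for lr in alt_max_lr if lr > threshold)
    return 100.0 * hits / len(alt_max_lr)


def half_sib_pairs(n_families, family_size):
    """Total number of within-family half-sib pairs."""
    return n_families * family_size * (family_size - 1) // 2


# The three designs with 2 000 genotyped offspring.
pairs_100_by_20 = half_sib_pairs(100, 20)   # 19 000 pairs
pairs_40_by_50 = half_sib_pairs(40, 50)     # 49 000 pairs
pairs_20_by_100 = half_sib_pairs(20, 100)   # 99 000 pairs
```

With the same 2 000 genotyped offspring, the design with 20 families of 100 contains roughly five times as many half-sib pairs as the design with 100 families of 20.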
The proportion of variance explained by the QTL was another factor that influenced power of QTL detection. Generally, higher power was obtained with a larger QTL. With a small QTL ($h_g^2$ = 0.05 and 0.01) power was very low and ranged between 0 and 14 %, depending on the sampling scheme and family size.
For scheme I, in which the most extreme offspring of the selected sires were sampled, the power of QTL detection could not be calculated. In obtaining the empirical threshold value for scheme I, the LR was zero for all positions in all 100 replicates, i.e. the likelihood failed to maximize through the entire chromosomal segment. Therefore, the advantage of using selected samples can be seen only from schemes II and III. A relatively large QTL ($h_g^2$ = 0.10) can be detected with higher power than in the situation when the sires are not selected. Also, a QTL with small effects ($h_g^2$ = 0.05) can be detected with higher power if the half-sib families are large enough. Only for a very small QTL ($h_g^2$ = 0.01) does the selection of sires seem not to be advantageous.
Mean estimates of QTL position with the corresponding among-replicate standard deviations are given in table II. Under scheme I, for some parameter combinations with $h_g^2$ = 0.05 and $h_g^2$ = 0.01, the position of the QTL was not estimable, because the likelihood failed to maximize through the entire segment. For other parameter combinations, the position of the QTL was poorly estimated and biased downwards with low QTL heritability and smaller family size. The estimates improved with increased QTL heritability and family size.
For scheme II the estimates for QTL position ranged between approximately 7 and 11 cM. Similar estimates were obtained for scheme III, except for the parameter combinations with a sample size of 100 families of 20 offspring and $h_g^2$ = 0.05 and 0.01. The estimates of the QTL position for the parameter combinations with a low QTL heritability tend to take values on the left-hand side of the chromosome, especially when low QTL heritability was accompanied by small family size. This downward bias was not expected, because the QTL was simulated centrally. The unexpected results might be due to the properties of the simplex algorithm used to maximize the likelihood function. With a low QTL heritability, the simplex algorithm was apparently unable to continue maximization of the likelihood function after reaching a local maximum.
The among replicate standard deviations of the estimates for the QTL position were large with low QTL heritability and smaller family size, because the individual estimates largely vary from one replicate to the other. The estimates were more accurate, i.e. had smaller among replicate standard deviations as the family size and the QTL heritability increased. Compared with the control, the estimates for QTL position with selected samples were biased with smaller family size and lower QTL heritability.
The estimates for QTL heritability ($h_g^2$), polygenic heritability ($h_a^2$), total heritability ($h_t^2$) and phenotypic variance ($\sigma^2$) are given in table III. The true values of QTL heritability were 0.10, 0.05 and 0.01 with the corresponding polygenic heritability of 0.15, 0.20 and 0.24, respectively. With scheme I, the estimated $\sigma^2$ ranged from 2.5 to 5.0. The $\sigma^2$ in the sample was, thus, drastically increased compared with the simulated value of 1.0 in the base population prior to selection. The increased $\sigma^2$ was due to sampling individuals from the tails of the distribution. The increase in $\sigma^2$, however, was not accompanied by an equivalent increase in the estimated genetic variance. Moreover, the two components of the genetic variance were not equally affected. In general, the estimates for $h_g^2$ were closer to the simulated values and only slightly biased. However, the estimates for $h_a^2$ and, therefore, the estimates for $h_t^2$, expressed as the sum of $h_g^2$ and $h_a^2$, were severely biased downwards. For parameter combinations in which the likelihood failed to maximize, the estimated values for $h_a^2$ were equal to zero in all replicates.
In scheme II, the estimated $\sigma^2$ was only slightly above the simulated value of 1.0. The estimates for $h_g^2$ were slightly biased downwards for simulated QTL heritabilities of 0.10 and 0.05, and slightly biased upwards for the simulated QTL heritability of 0.01. However, severe bias was observed for the estimates of $h_a^2$, and, consequently, the estimates of $h_t^2$ were biased downwards.
In scheme III, the estimated $\sigma^2$ was somewhat overestimated. The mean estimates ranged from 1.17 to 1.42 for the simulated value of 1.0. The estimates for $h_g^2$ were close to the simulated values except for the parameter combinations with a sample size of 100 families of 20 offspring. In this sampling scheme as well, severe bias in $h_a^2$ and $h_t^2$ was observed.
With the control data, considerably less biased estimates for $h_g^2$, $h_a^2$, $h_t^2$ and $\sigma^2$ were obtained for all parameter combinations.

Accounting for selection
The results presented show the advantage of selective genotyping over random samples in giving increased power to detect a QTL. On the other hand, the estimates of QTL position, and, especially, variance components, are grossly biased. This large downward bias is probably due to the method of analysis, which ignores selection. In all three schemes, the selection favors progeny of those sires with the largest number of offspring falling into the top and bottom 10 % of the entire distribution. Therefore, when the most extreme offspring of the selected sires are sampled (scheme I), or even when the offspring for genotyping are randomly sampled from the entire family (schemes II and III), the continuity of the normal distribution of data that existed before selection is broken. The assumption of normality required for maximum likelihood estimation is violated, which results in biased estimates or inability to maximize the likelihood function. It is known that standard likelihood methods cannot produce proper results if only selected offspring or offspring from selected sires are genotyped [7]. Thus, an analysis by maximum likelihood techniques must account for truncated selection. This involves maximizing likelihood separately for individuals in the top and in the bottom tail of the distribution [2].
For the selection and the sampling schemes presented in this study, however, the method described by Darvasi and Soller [2] cannot be applied, because the truncation point cannot be unambiguously determined. Some of the genotyped offspring of the selected sires may not have extreme phenotypes, because the truncation point is not distinct, especially in sampling schemes II and III, where the offspring are randomly sampled or the whole family is analyzed. To account properly for this form of selection, missing data methods should be used [8]. According to Lander and Botstein [7], the correct results will be obtained by maximum likelihood techniques if the phenotypes are recorded for all animals and genotypes for untyped animals are simply entered as missing. Therefore, a part of the analysis was repeated with inclusion of all data available on typed and untyped individuals. The proportion of alleles IBD at marker loci for untyped animals was replaced by its expected average value of 0.25, as described in the Methods of the paper. The results from the simulation for power, QTL position and heritabilities are given in tables IV-VI, respectively. As expected, power to detect a QTL is higher when more individuals are genotyped (table IV). Compared with the situation when only 10 % of the population with the most extreme phenotypes are genotyped, the power is nearly doubled when complete offspring information is available. However, an increase in the proportion of genotyped individuals above 25 % does not result in a corresponding increase in power, especially when the QTL accounts for a greater part of the genetic variance. With a smaller QTL effect, the selection of animals with extreme phenotypes is primarily based on polygenic and environmental effects, so that detection of the QTL definitely requires more genotyping.
Including all data in the analysis allowed for correct estimation of QTL position regardless of the proportion of untyped animals (table V). Mean estimates for QTL position range from 6 to 11 cM and are similar for all parameter combinations. This result was obtained even for the parameter combinations with a QTL heritability of 0.01. Clearly, the estimates are more accurate with larger proportions of genotyped animals, but this improvement in accuracy is not large enough to justify the costs of genotyping more individuals.
The estimates for QTL heritability ($h_g^2$), polygenic heritability ($h_a^2$), total heritability ($h_t^2$) and phenotypic variance ($\sigma^2$) are given in table VI. The estimates for $h_t^2$ are very close to the simulated value of 0.25 for all parameter combinations. The mean estimates for $h_g^2$ are, however, mostly biased upwards.
The bias decreases as the number of genotyped animals increases, and is relatively higher as the QTL heritability decreases. Consequently, the mean estimates of $h_a^2$ are biased downwards. Nevertheless, the sum $h_g^2 + h_a^2$ is conserved at approximately 0.25, which indicates a successful partitioning of overall genetic and residual variance.
Confounding between $h_g^2$ and $h_a^2$ is considered to be a general frailty of the sib-pair approach [4]. This problem has been addressed in several previous studies [1, 9]. Confounding between $h_g^2$ and $h_a^2$ can be regarded as independent of the experimental design used and, therefore, not primarily caused by selective genotyping. The power of separating $h_g^2$ and $h_a^2$, however, depends on the deviation of $\pi_q$ from the average, i.e. from 0.25 in the case of the half-sib design [10]. When the data contain a greater proportion of missing marker genotypes, the proportion of alleles IBD at marker loci shared by two half-sibs is replaced by 0.25 more often, and the estimated $\pi_q$ is, consequently, closer to 0.25. Thus, when fewer animals are genotyped, the separation of $h_g^2$ and $h_a^2$ becomes more difficult.
This can clearly be seen from the results presented in table VI.
Although this paper does not consider simulation studies for sampling schemes II and III with all data included, it is expected that similar results would have been obtained for both randomly sampled offspring and the entire families of the selected sires.

CONCLUSIONS
The results of the simulation study show that selective genotyping within selected families is advantageous compared with the conventional design based on random samples, because it results in increased power for a given number of individuals genotyped, or, in other words, reduces the number of individuals that need to be genotyped for a given power. This is due to the increased signal of the QTL under selection, because over 80 % of the information used in linkage analysis comes from the top and the bottom 20 % of the distribution [2]. From the practical aspect, the method of selection considered in this study is even more efficient than the standard selective genotyping, because selection of extreme individuals is mainly based on sires, whose information is readily available or at least easier to obtain. Because the selection of candidates for genotyping is based on the entire distribution of progeny phenotypic values, it is not necessary to raise and measure any extra individual only for the sake of QTL analysis. In some instances, sires chosen for genotyping can be used more extensively to assure more intensive selection of extreme individuals and an additional increase in power. This is, however, not indispensable, because even an analysis of randomly sampled progeny of a selected sire results in a higher power than in a design without any selection.
To enable proper estimation of QTL parameters (QTL position and variance) when using selected samples, it is necessary to account for selection. The most convenient approach is to include phenotypic data for all individuals and marker data for selected ones, whereas marker data for unselected individuals can simply be entered as missing. The $\pi$ values for genotyped individuals will then be calculated in the usual manner, whereas the $\pi$ values for all other individuals will be replaced by their expected average value of 1/4 for half-sibs. Such an analysis will give correct estimates for the QTL position and genetic variance. The separation of the QTL variance from the polygenic variance will be, however, affected by the proportion of untyped individuals. This is a known difficulty of the sib-pair approach. This problem might be solved if more sophisticated methods for QTL mapping were used. For practical applications the model of analysis described in this paper can be easily extended to include fixed effects or an additional random effect (e.g. a second QTL). The model can also be adjusted to handle general pedigrees and in this way take into account the relationships among animals.