Estimation of the average effects of specific alleles detected by the pseudo-testcross QTL mapping strategy

Summary - In a full-sib family derived from a cross between two heterozygous individuals, the analysis of associations between dominant molecular markers (RAPD, random amplified polymorphic DNA) and quantitative traits makes it possible to detect loci involved in the expression of the phenotypic variation of quantitative traits (QTLs). These QTLs are specific to each parent of the cross. We develop here a model for determining the general value of the QTL alleles within the population, with a view to their use in forest tree breeding programs. The proposed method extends the mapping of specific QTLs by the pseudo-testcross QTL mapping strategy, which is based on the selection of single-dose markers present in one parent and absent in the other. It specifically exploits the fact that one of the parents of the full-sib family is double null for the RAPD markers bracketing the QTL so that, by looking at its half-sib family, 'band present' allele frequencies of the two markers can be obtained at the population level. The half-sib family of the other parent, which is doubly heterozygous, is then used to estimate the average effect (ie, the additive effect) of the two QTL alleles.

QTL / breeding value / full-sib / half-sib / pseudo-testcross

INTRODUCTION
The use of molecular markers as a complementary tool for breeding is based on linkage disequilibria between molecular markers and QTLs (quantitative trait loci) involved in the control of quantitative traits. Marker-assisted selection (MAS) in crop plants has been investigated in essentially two directions: (i) for genotype construction (eg, Young and Tanksley, 1989), and (ii) for predicting the breeding value of an individual progenitor (Lande and Thompson, 1990).
From population genetic studies, it is known that wild allogamous species such as forest trees are often in linkage equilibrium. It is likely, therefore, that the alleles at QTLs and the alleles at marker loci are randomly associated in different individuals (Avery and Hill, 1979; Muona, 1982; Beckman and Soller, 1983; Beckman and Soller, 1986; Hasting, 1989; Lande and Thompson, 1990).
Consequently, it would be difficult to establish significant marker-QTL associations at the population level in large mating populations. This absence of linkage disequilibria has been raised as an argument against the feasibility of MAS in forest trees (Strauss et al, 1992). With the considerable decrease in cost and increased automation of the RAPD (random amplified polymorphic DNA) technique (Williams et al, 1990), however, it is now possible to construct a single-tree map for every individual in an elite breeding population: an extreme alternative for dealing with linkage equilibrium that was presented and discussed by Grattapaglia and Sederoff (1994). Recently, Grattapaglia et al (1995) extended their approach to QTL mapping within a full-sib family. QTLs are then defined in a narrow genetic background and could be used to identify the best performers for vegetative propagation. However, there is no evidence that such specific QTLs will be significant in broader genetic backgrounds. Experimental results in crop plants have demonstrated the inconsistency of QTL expression across populations (Tanksley and Hewitt, 1988; Graef et al, 1989; Beavis et al, 1991). Conversely, QTLs defined against a wide genetic background could be more useful in breeding than QTLs defined in a specific background.
The aim of this paper is to show that the general value of a 'specific' QTL detected in a full-sib family following the method of Grattapaglia et al (1995) can be easily evaluated, provided that both parents of the full-sib family are involved in maternal half-sib (open-pollinated or polycross) families. Such two-generation pedigrees are widely available in most forest tree-breeding programs that involve the simultaneous estimation of specific and general combining abilities of selected trees. We actually expect that some QTLs controlling economically important traits exist at the population level and can be detected in a broad genetic background. The proposed strategy assumes that the QTL mapping has already been performed in a single full-sib family and therefore concentrates on the estimation of the effects of the QTL alleles.

MODELS AND METHODS
Two-way pseudo-testcross mapping strategy using dominant RAPD markers
This strategy essentially exploits the high levels of heterozygosity of outbred individuals and the efficiency of the RAPD assay in uncovering large numbers of genetic markers in an informative configuration (Grattapaglia and Sederoff, 1994). Single-dose RAPD markers are screened in such a way that they are in a heterozygous state in one parent and absent in the other, or vice versa, and therefore segregate in a 1:1 ratio in the F1 progeny following a testcross configuration (Carlson et al, 1991; Cai et al, 1994; Echt et al, 1994; Weeden et al, 1994; Kubisiak et al, 1995). Two separate sets of linkage data are therefore obtained and a specific genetic map is constructed for each parent. A quantitative trait dissection analysis is carried out independently for both parents of the cross under the conventional backcross model and leads to the detection of individual 'specific' QTLs. In the QTL analysis, the null hypothesis tests an allelic substitution averaged over the alleles inherited from the other parent (Leonard-Schippers et al, 1994; Grattapaglia et al, 1995).

Genetic model
The model described in this paper is based on a full-sib (FS) progeny between two heterozygous parents $P_X$ and $P_Y$, and on half-sib (HS) progenies of these parents.
We suppose that the linkage analysis has already been carried out in the FS family and that a QTL has been detected in $P_X$. The same analysis could be applied to a QTL detected in $P_Y$; in that case, it would be performed with markers heterozygous in $P_Y$ and homozygous null in $P_X$. Let $Q$ denote a quantitative trait locus lying between linked dominant molecular marker loci $A$ and $B$. Let $r_1$ and $r_2$ denote the recombination frequencies between $A$ and $Q$ and between $B$ and $Q$, respectively. As the QTL detection has already been performed, the values of $r_1$ and $r_2$ are known. Besides the recombination rates, the haplotypes of $P_X$ are also known from the mapping experiment. In the pseudo-testcross configuration, the genotypes are $A_1A_2\,Q_1Q_2\,B_1B_2$ for $P_X$ and $A_2A_2\,Q_3Q_4\,B_2B_2$ for $P_Y$, where $A_1$ and $B_1$ are the alleles coding for the presence of particular RAPD fragments, $A_2$ and $B_2$ the alleles coding for the absence of the same fragments, $Q_1Q_2$ the QTL genotype of $P_X$ and $Q_3Q_4$ the QTL genotype of $P_Y$. Since the QTL alleles are unknown, $Q_3$ and $Q_4$ may be identical to $Q_1$ and $Q_2$. For $P_X$, the expected frequencies of the gametes $A_1Q_1B_1$, $A_2Q_2B_2$, $A_1Q_1B_2$, $A_2Q_2B_1$, $A_1Q_2B_2$, $A_2Q_1B_1$, $A_1Q_2B_1$ and $A_2Q_1B_2$ are $(1 - r_1)(1 - r_2)/2$, $(1 - r_1)(1 - r_2)/2$, $(1 - r_1)r_2/2$, $(1 - r_1)r_2/2$, $r_1(1 - r_2)/2$, $r_1(1 - r_2)/2$, $r_1r_2/2$ and $r_1r_2/2$, respectively. Let $Q_p$ denote any quantitative trait allele (QTA) in the population at the studied locus, and $f_{A_1}$, $f_{B_1}$ and $f_{Q_p}$ the allele frequencies of $A_1$, $B_1$ and $Q_p$ in the pollen pool. For a two-allele marker model, the frequencies of the alternative alleles $A_2$ and $B_2$ are $(1 - f_{A_1})$ and $(1 - f_{B_1})$, respectively. Assuming linkage equilibrium in the population, the expected frequencies of the gametes $A_1Q_pB_1$, $A_2Q_pB_2$, $A_1Q_pB_2$ and $A_2Q_pB_1$ in the pollen are $f_{A_1}f_{B_1}f_{Q_p}$, $(1 - f_{A_1})(1 - f_{B_1})f_{Q_p}$, $f_{A_1}(1 - f_{B_1})f_{Q_p}$ and $(1 - f_{A_1})f_{B_1}f_{Q_p}$, respectively.
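The gamete frequencies listed above can be checked numerically. The following is a minimal sketch, assuming the coupling phase $A_1Q_1B_1 / A_2Q_2B_2$ in $P_X$ and no crossover interference; the function names `maternal_gamete_freqs` and `pollen_gamete_freqs` are hypothetical and introduced only for illustration.

```python
from itertools import product


def maternal_gamete_freqs(r1, r2):
    """Expected gamete frequencies of P_X (haplotypes A1 Q1 B1 / A2 Q2 B2).

    Keys are (a, q, b) allele indices, e.g. (1, 2, 1) stands for A1 Q2 B1.
    Assumption: recombination in A-Q and Q-B is independent (no interference).
    """
    freqs = {}
    for a, q, b in product((1, 2), repeat=3):
        p_aq = (1 - r1) if a == q else r1  # same index = no recombination A-Q
        p_qb = (1 - r2) if q == b else r2  # same index = no recombination Q-B
        freqs[(a, q, b)] = 0.5 * p_aq * p_qb
    return freqs


def pollen_gamete_freqs(f_a1, f_b1, f_qp):
    """Expected pollen gamete frequencies for one QTA Qp under linkage equilibrium.

    Keys are (a, b) marker allele indices, e.g. (1, 2) stands for A1 Qp B2.
    """
    return {
        (1, 1): f_a1 * f_b1 * f_qp,
        (2, 2): (1 - f_a1) * (1 - f_b1) * f_qp,
        (1, 2): f_a1 * (1 - f_b1) * f_qp,
        (2, 1): (1 - f_a1) * f_b1 * f_qp,
    }
```

The eight maternal frequencies sum to 1, and the four pollen frequencies for a given $Q_p$ sum to $f_{Q_p}$, which is a quick consistency check on the expressions in the text.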
Let $\mu_{qp}$ denote the trait mean of QTL genotype $Q_qQ_p$ ($q = 1, 2$) and $\theta_r$ the expected value of the mean of marker phenotype $r$, weighted by the frequency of the different combinations of marker-QTL genotypes included in $r$. According to the Fisher model (Kempthorne, 1957), the trait mean of QTL genotype $Q_kQ_l$ is $\mu_{kl} = a_k + a_l + d_{kl}$, where $a_k = \sum_{p=1}^{n} f_{Q_p}\mu_{kp}$ and $a_l = \sum_{p=1}^{n} f_{Q_p}\mu_{lp}$ are the average effects of QTA $k$ and $l$, $d_{kl}$ denotes the dominance effect and $n$ is the number of QTA in the population. The expected values are derived using double-crossover gamete frequencies.
Coupling-linkage phase of the marker presence alleles of the mother tree is assumed, but the same analysis could be carried out for the repulsion phase.

Determination of the marker allelic frequencies $f_{A_1}$ and $f_{B_1}$ in the pollen pool
The HS progeny produced with $P_Y$ as the female parent was used to determine the allele frequencies of markers $A$ and $B$ in the population (pollen pool of the polycross). Since $P_Y$ was double null for the bracketing RAPD markers, the maximum likelihood estimates of $f_{A_1}$ and $f_{B_1}$ are simply the observed frequencies, ie, $\hat f_{A_1} = n_{A_1}/N$ and $\hat f_{B_1} = n_{B_1}/N$, with $n_{A_1}$ and $n_{B_1}$ the numbers of HS individuals carrying the 'RAPD band present' alleles for markers $A$ and $B$, respectively, and $N$ the total number of HS individuals scored.
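As an illustration of this estimate, the sketch below computes the observed band frequency from 0/1 band scores in the HS progeny of $P_Y$. The function name `band_allele_frequency` and the input format are assumptions made for this example, and the binomial standard error is an addition that is not part of the original text.

```python
def band_allele_frequency(band_scores):
    """Estimate a 'band present' allele frequency in the pollen pool.

    band_scores: iterable of 0/1 values, one per HS offspring of the
    double-null parent P_Y (1 = RAPD band present, 0 = band absent).
    Because P_Y contributes only null alleles, a band in an offspring can
    only come from the pollen, so the estimate is n_present / N.
    Returns (estimate, binomial standard error); the standard error is an
    illustrative addition, not taken from the paper.
    """
    scores = list(band_scores)
    if not scores:
        raise ValueError("at least one scored offspring is required")
    n_total = len(scores)
    f_hat = sum(scores) / n_total
    std_err = (f_hat * (1 - f_hat) / n_total) ** 0.5
    return f_hat, std_err
```

The same function would be applied once to the scores for marker $A$ and once to those for marker $B$.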
Evaluation of the average effect of $Q_1$ and $Q_2$
The HS progeny produced with $P_X$ as the female parent was used to determine the average effect of $Q_1$ and $Q_2$ mapped in $P_X$. $P_X$ is heterozygous for the two bracketing RAPD markers, which makes it possible to carry out genotype discrimination within the limits of the fact that [...], or $[A_2B_2]$), and $\phi(y_{ri})$ is the probability density of $y_{ri}$. Each marker phenotype class is a mixture of two populations of QTL genotypes including with certainty $Q_1$ or $Q_2$, respectively. Therefore, the probability density functions of the four marker phenotype classes can be written as a linear combination of the two component normal distributions weighted according to their proportions, eg, for the marker phenotype class $[A_1B_1]$: [...] where $\phi_{Q_1}(y_i)$ and $\phi_{Q_2}(y_i)$ are the probability density functions of genotypes $Q_pQ_1$ and $Q_pQ_2$ ($A$ and $B$ are defined in table II).
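To make the mixture structure concrete, the sketch below derives the mixing proportions for each marker phenotype class from the gamete frequencies and pollen allele frequencies defined earlier. It is not a transcription of table II: the weights are re-derived here under the stated assumptions (coupling phase in $P_X$, no interference, dominant markers, linkage equilibrium in the pollen pool), and all function names are hypothetical.

```python
import math
from itertools import product


def prob_q1_given_marker_phenotype(r1, r2, f_a1, f_b1):
    """Pr(maternal QTL allele is Q1 | marker phenotype) in the HS progeny of P_X.

    Keys are (band_A_present, band_B_present). A band is present when the
    maternal gamete carries the band allele or, failing that, when the
    pollen does (probability f_a1 or f_b1).
    """
    def p_band(maternal_allele, band_present, f_pollen):
        p_present = 1.0 if maternal_allele == 1 else f_pollen
        return p_present if band_present else 1.0 - p_present

    result = {}
    for band_a, band_b in product((True, False), repeat=2):
        joint = {1: 0.0, 2: 0.0}  # joint probability with maternal Q1 or Q2
        for a, q, b in product((1, 2), repeat=3):
            gamete = 0.5 * ((1 - r1) if a == q else r1) * ((1 - r2) if q == b else r2)
            joint[q] += gamete * p_band(a, band_a, f_a1) * p_band(b, band_b, f_b1)
        total = joint[1] + joint[2]
        result[(band_a, band_b)] = joint[1] / total if total > 0 else float("nan")
    return result


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def mixture_density(y, p_q1, mean_q1, mean_q2, sd):
    """Two-component normal mixture for one marker phenotype class."""
    return p_q1 * normal_pdf(y, mean_q1, sd) + (1 - p_q1) * normal_pdf(y, mean_q2, sd)
```

When the band alleles are rare in the pollen pool, the class $[A_1B_1]$ consists almost entirely of $Q_1$ carriers; as $f_{A_1}$ and $f_{B_1}$ increase, the classes become less informative about the maternal QTL allele.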
There is no analytical solution to this problem, but maximum likelihood estimates of the four parameters could be found using the EM algorithm suggested by Lander and Botstein (1989) or the quasi-Newton algorithm used by Knott and Haley (1992). Alternatively, approximations of the maximum likelihood estimators at the first order of probability (Rebai et al, 1995) could be determined for both QTA effects, $a_1$ and $a_2$, using the following model:

$$Y_{ri} = \mu + g_r\theta_r + \varepsilon_{ri} \qquad [1]$$

where $Y_{ri}$ is the phenotypic value of individual $i$ belonging to the marker phenotype $r$; $\mu$ is the general mean; $g_r$ is an indicator variable indexing the $r$th ($r = 1, \ldots, 4$) marker phenotype class, taking the value 1 for individual $i$ of the $r$th phenotypic class and 0 otherwise; $\theta_r = E(Y_i/MP_r) = E(Y_i/Q_1)\Pr(Q_1/MP_r) + E(Y_i/Q_2)\Pr(Q_2/MP_r)$, with $MP_r$ the marker phenotype $r$ ($[A_1B_1]$, $[A_1B_2]$, $[A_2B_1]$, $[A_2B_2]$); $E(Y_i/Q_1)$ and $E(Y_i/Q_2)$ are the additive effects of $Q_1$ and $Q_2$, respectively; $\Pr(Q_1/MP_r)$ and $\Pr(Q_2/MP_r)$ are the probabilities of genotypes $Q_1Q_p$ and $Q_2Q_p$, respectively, given the marker phenotype $r$ (see table II); and $\varepsilon_{ri}$ is a random normal variable associated with each record, with mean 0 and variance $\sigma^2$. This variance is assumed to be identical for each class $r$.

This linear model [1] could be written as

$$Y = \mu + \alpha a_1 + \beta a_2 + \varepsilon \qquad [2]$$

where $Y$ is the column vector of the phenotypic values, and $\alpha$ and $\beta$ are the column vectors of, respectively, the $a_1$ and $a_2$ probability coefficients in $\theta$. Because of the linear dependency between $\alpha$ and $\beta$ ($\alpha = 1 - \beta$), equation [2] can be written as

$$Y = \mu + Xa + \varepsilon$$

with the constraint $a_1 + a_2 = 0$ and assuming $a_1 = -a_2 = a$. The least squares estimators for $a$ and $\mathrm{var}(a)$ are $\hat a = (X'X)^{-1}X'Y$ and $\mathrm{var}(\hat a) = \sigma^2 (X'X)^{-1}$, with $X$ the incidence vector of the model ($X = 2\alpha - 1$). An expression of $X$ is shown in table II.
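Because $X$ is a single column, the least squares expressions above reduce to ratios of sums. The sketch below illustrates this, under the assumption that the phenotypes are supplied as deviations from the general mean so that the formula $\hat a = (X'X)^{-1}X'Y$ applies directly; the function name and the residual-based estimate of $\sigma^2$ are illustrative choices, not taken from the paper.

```python
def estimate_additive_effect(alpha, y_centered):
    """Least squares estimate of a (= a1 = -a2) and of its sampling variance.

    alpha: Pr(Q1 | marker phenotype) for each HS individual.
    y_centered: phenotypic values expressed as deviations from the general
        mean (an assumption of this sketch).
    With X = 2*alpha - 1 a single column, (X'X)^-1 X'Y = sum(x*y) / sum(x*x).
    sigma^2 is estimated here from the residuals with n - 1 degrees of freedom.
    """
    if len(alpha) != len(y_centered) or len(alpha) < 2:
        raise ValueError("need two equal-length inputs with at least 2 records")
    x = [2 * a - 1 for a in alpha]
    sxx = sum(v * v for v in x)
    if sxx == 0:
        raise ValueError("X'X is zero: marker classes carry no information")
    a_hat = sum(xi * yi for xi, yi in zip(x, y_centered)) / sxx
    residual_ss = sum((yi - a_hat * xi) ** 2 for xi, yi in zip(x, y_centered))
    sigma2_hat = residual_ss / (len(x) - 1)
    return a_hat, sigma2_hat / sxx
```

The variance expression shows why uninformative marker classes are costly: when $\alpha$ is close to 1/2 for most individuals, $X'X$ is small and $\mathrm{var}(\hat a)$ becomes large.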

DISCUSSION
The construction of single-tree maps was first investigated using the megagametophyte of conifers. The megagametophyte is a haploid tissue derived from the same megaspore that gives rise to the maternal gametes, and can be easily obtained from germinating seeds. This mapping strategy has been widely reported (Conkle, 1980; Tulsieram et al, 1992; Nelson et al, 1993, 1994; Binelli and Bucci, 1994; Plomion et al, 1995) and is being used to dissect quantitative traits in loblolly pine (O'Malley and McKeand, 1994). The half-sib analysis depends on the ability to uniquely identify the marker alleles derived from the common progenitor, and detects exclusively the effect of QTLs in a heterozygous state. The identification of the maternal genetic contribution can be easily obtained with the analysis of the megagametophyte because it carries exactly the same genetic constitution as the fertilized ovule that gives rise to the embryo scored for the quantitative trait (Grattapaglia et al, 1992). Assuming a random genetic contribution from the male parents in the pollen pool, each segregating QTL detected in the mother tree can be characterized as a difference in the average allele effect (Falconer, 1989). QTLs defined in this way will be of great interest, because they will be major components of breeding value. A major drawback of this half-sib QTL mapping strategy is, however, that it requires the development of specific QTL mapping populations and cannot be used to analyze QTLs in existing plantations. The megagametophyte is, in fact, a temporary tissue that can only be collected from germinants. Another limitation is that the megagametophyte is a special feature of conifers, and is not biologically available in most tree species. In order to apply marker information to tree improvement, Liu et al (1993) developed a general strategy for genetic mapping and QTL analysis, based on diploid HS progenies.
Such two-generation pedigrees are widely available in forest tree-breeding programs and could be used for immediate appli- [...] QTLs are also expressed in a wide range of genetic backgrounds. This could result in greater efficiency later on for use in marker-assisted selection. We demonstrated that this can be done by studying the two maternal HS progenies involving both parents of an FS family. As noted above, in the pseudo-testcross strategy, the QTL analysis tests the difference between the average trait value of $(Q_1Q_3 + Q_1Q_4)$ versus $(Q_2Q_3 + Q_2Q_4)$ in parent $P_X$; this is equivalent to the following contrast: $a_1 - a_2 + \tfrac{1}{2}(d_{13} + d_{14} - d_{23} - d_{24})$. With the method we developed in this paper, the QTL analysis led to the measurement of a purely additive contrast $(a_1 - a_2)$.
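The full-sib contrast quoted above follows directly from the Fisher decomposition $\mu_{kl} = a_k + a_l + d_{kl}$ used in the genetic model. A short worked derivation, assuming the four FS genotype classes are equally frequent (as expected under Mendelian segregation in $P_Y$), is:

```latex
\begin{align*}
\tfrac{1}{2}(\mu_{13} + \mu_{14}) &= a_1 + \tfrac{1}{2}(a_3 + a_4) + \tfrac{1}{2}(d_{13} + d_{14}) \\
\tfrac{1}{2}(\mu_{23} + \mu_{24}) &= a_2 + \tfrac{1}{2}(a_3 + a_4) + \tfrac{1}{2}(d_{23} + d_{24}) \\
\tfrac{1}{2}(\mu_{13} + \mu_{14}) - \tfrac{1}{2}(\mu_{23} + \mu_{24})
  &= a_1 - a_2 + \tfrac{1}{2}(d_{13} + d_{14} - d_{23} - d_{24})
\end{align*}
```

The additive effects of $Q_3$ and $Q_4$ cancel, but the dominance terms do not, which is why the FS contrast depends on the particular alleles carried by $P_Y$.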
The proposed approach relies on the previous identification of specific QTLs in a single FS progeny and therefore presents some limitations. As pointed out by Grattapaglia et al (1995), the only QTLs that can be detected in the FS are those that are heterozygous in the parents of the cross and where the differential effect between the alternative QTAs is relatively large. In principle, access to the average breeding values of most QTL alleles of the population could be achieved by analyzing progeny tests involving several half-sib or full-sib families. In domestic animals, Weller et al (1990) have proposed a 'granddaughter' design where the marker genotype is determined on the sons of heterozygous sires and the quantitative trait value measured on the daughters of the sons. Such a design involves three generations and is of particular interest because many half-sib families of relatively small sizes can be obtained from a single sire. Van der Beek et al (1995) have also demonstrated the value of three-generation pedigrees for mapping QTLs in outbred populations. By contrast, three-generation pedigrees are still rare in most forest tree-breeding programs, whereas two-generation pedigrees involving large HS families obtained by polymix cross or open pollination and connected FS families can be easily produced. QTL mapping strategies have also been developed for cases where several unrelated FS families with few individuals are available (Knott and Haley, 1992). When family sizes are large, it is possible to find marker-QTL associations within a single pedigree without the need to accumulate data on individual markers across pedigrees. However, when family sizes are small, an increase in QTL detection power can be obtained with a large number of FS families because the linkage phase can be accurately determined. It should be noted here that, in farm animals, marker-assisted selection models have often been developed for multiallelic codominant molecular markers.
Indeed, restriction fragment length polymorphisms (RFLP), simple sequence repeats (SSR) and minisatellite markers have been developed for linkage map construction (Hillel et al, 1990; Barendse et al, 1994; Bishop et al, 1994; Broad and Hill, 1994; Crawford et al, 1994; Ellegren et al, 1994; Rohrer et al, 1994; Archibald et al, 1995) and quantitative trait dissection analysis (Georges et al, 1995) for the major animal species. For such types of marker loci, most sires will be heterozygous at most markers and for most offspring it will be possible to uniquely identify which parental allele is transmitted. The development of such powerful marker techniques is still limited in forestry.
To our knowledge, only two species have been mapped with RFLPs: Pinus taeda (Devey et al, 1994) and Populus trichocarpa x P deltoides (Bradshaw et al, 1994). Conversely, dominant RAPD markers have been intensively used for the development of single tree maps and for QTL analysis using the two-way pseudo-testcross strategy. RAPDs provide a fast, efficient and reliable way to gather marker data.
Other advantages with this method are the requirement of only a small amount of DNA for PCR reactions, the rapidity with which polymorphisms can be screened and the potential automation of the technique. Since primers can be arbitrarily chosen, any individual can be mapped with the same set of primers. A limitation of the RAPD fragments, however, is their questionable locus specificity when assessments are made on different individuals. On the one hand, although the same set of primers can be used, this does not mean that the same fragments corresponding to the same loci will always be amplified. On the other hand, the same fragment size observed in two different genotypes does not necessarily correspond to sequence homology. Finally, their diallelic nature is the major limitation for the unequivocal determination of parental marker alleles transmitted to each offspring, unless one of the alleles is rare (Soller, 1978).
Another important point is that when large QTL effects are confirmed in HS progeny, it is likely that the favorable QTAs exist at a low frequency in the breeding population. Such rare and favorable alleles would be of great value to increase the average value of the breeding population through MAS. Conversely, QTAs detected in FS families may turn out to be unimportant at the population level. This could very well happen if other alleles at the same loci with larger average effects exist at relatively high frequencies in the pollen pool. Indeed, a QTL effect (eg, Q 1 as favorable allele and Q 2 as unfavorable allele) could be observed in a specific unfavorable background (Q 3 and Q 4 unfavorable QTL alleles in P Y ), whereas this effect would be smaller in a wide genetic background (pollen pool) where QTA Qp would be frequently favorable.
The proposed strategy assumes that a specific QTL has already been detected in an FS family and that only those segments containing QTLs of interest are tracked in the two HS progenies. Therefore, the additional genotyping costs required for the estimation of the average QTA-effects may be lower than in the mapping experiment. Indeed, the costs will mainly depend on the size of the HS family necessary for this estimation. In order to be certain that the estimated substitution effects will be representative for the whole population, the number of half-sib progeny will have to be large if the number of QTAs is also large. Evidence for multiple QTA has not yet been demonstrated in forest tree species, but it is likely to be true for outbred and highly heterozygous plants as reported by Van Eck et al (1994).
We have assumed that the QTL position is precisely known from the analysis conducted with the FS progeny. Alternatively, a more accurate location could be obtained by combining genotypic information at the flanking markers of both the FS and HS progenies. Methods developed by Luo and Kearsey (1989) or Knapp et al (1990) could be used to obtain such an estimate. While the precise location would be crucial for map-based cloning objectives, it should not be a major problem for marker-assisted selection. Indeed, the extreme markers bracketing a LOD 1.0 support interval could be used to ensure successful selection for the favorable QTL allele.
We investigated the combined FS and HS strategy developed here, and illustrated the feasibility of the integration of RAPD markers in the maritime pine and eucalyptus tree-breeding programs. Genetic gain and costs associated with the use of molecular markers in separate FS progenies were evaluated and compared to other strategies that aim to obtain similar selection efficiency. MAS has proved to have great potential as a complementary tool in the breeding of elite populations of forest trees.