The use of marker haplotypes in animal breeding schemes

Information on marker haplotypes was used to increase genetic gains in closed nucleus breeding schemes. Such schemes were simulated over ten discrete generations: first five generations of conventional (non-marker-assisted) selection, then five generations of marker-assisted selection (MAS). The transmission of alleles at the quantitative trait locus (QTL) was traced by marker haplotypes with probability 1 - r. Emphasis was on the extra genetic gains obtained during the first generations of MAS, since it was assumed that new QTL were continuously being detected. In the first generation of MAS, genetic gain was increased by 8.8 and 38%, depending on whether performance recording took place before selection (eg, selection for growth rate) or after selection (eg, fertility), assuming one marked QTL explaining 33% of the genetic variance and r = 0.1. The extra genetic gain decreased with the number of generations of MAS, because the variance of the QTL was increasingly exploited. The extra responses due to MAS increased more than proportionally to the variance of the QTL and increased as heritability decreased. When r increased from 0.05 to 0.2, the genetic gain from MAS decreased by only 7.7% (with recording after selection). MAS was about as efficient for traits expressed in one sex only as for traits expressed in both sexes. For a carcass trait, measured after slaughter, the extra response reached 64%. For a trait recorded after selection, the extra genetic gains increased markedly with the number of offspring per dam, because the markers then made within-family selection possible. It is concluded that the gains due to MAS can be large when there is continuous detection of new QTL and when recording takes place after selection.


INTRODUCTION
In recent years, genetic maps of DNA markers have become available for several species of livestock (Barendse et al, 1994; Bishop et al, 1994; Rohrer et al, 1994) and more marker maps are under construction (Haley et al, 1990). In the near future, it is expected that maps with approximate distances between adjacent markers of 10-20 cM will cover most of the genome (see Visscher and Haley, 1995, for a review). In regions where quantitative trait loci (QTL) are found, higher map densities may be achieved. Some experiments to map QTL on the marker map have been conducted (Anderson et al, 1994; Georges et al, 1994). More QTL mapping experiments will probably follow and the approximate position and effect of the largest QTL will be assessed. It will be difficult to distinguish whether an effect is due to one or several closely linked QTL, but the regions to which the QTL for the economically most important traits map can and will be located.
In previous studies, associations between single markers and QTL were based on daughter or granddaughter designs (Kashi et al, 1990; Weller et al, 1990; Meuwissen and Van Arendonk, 1992), and identified QTL had to be traced for two or more generations away from the sire in which they were identified before being used for selection. When the marker haplotypes that surround a QTL do not recombine, the QTL can be traced with certainty (neglecting double recombinants) and BLUP (best linear unbiased prediction) methods can estimate QTL effects from previous generations (Goddard, 1992), such that no daughter or granddaughter design is needed. Furthermore, in contrast to previous studies (eg, Gibson, 1994), emphasis will be on the selection response during early generations of selection, since this is economically most important, and new QTL will be continuously detected in ongoing marker-assisted selection (MAS) schemes. This paper will describe the use of marker haplotypes in animal breeding schemes and will identify situations where MAS is particularly useful.

MODEL
Genetic model
It will be assumed here that regions where QTL are present have been identified by QTL mapping experiments. In such a region the presence of one QTL with many alleles is assumed, since the actual number of alleles is unknown. Assuming many alleles minimizes the change in allele frequencies due to selection, which makes the extra response from MAS last longer. This is not important here, where emphasis is on early generation response rates, but with a finite number of alleles and possibly extreme allele frequencies, the extra response from MAS will be reduced during later generations of selection. Also, the assumption of many alleles reflects the, perhaps realistic, situation where the assumed QTL effect is actually due to a cluster of closely linked QTL: the effect of each cluster is then represented by an allele. A number of markers are scattered around the QTL, together forming a marker haplotype. In the absence of recombination within the haplotype, the inheritance of the haplotype is followed by DNA marker analysis. Double recombinations between two adjacent markers within the haplotype are neglected, which is reasonable even for haplotypes that cover a large distance as long as the distance between the two adjacent markers remains small. Hence, the inheritance of the QTL follows that of the marker haplotype.
When recombination occurs, it is assumed that the inheritance of the QTL is not traceable. Probability statements about the inheritance of the QTL could be made, but this is not attempted here since they require accurate estimates of the position of the QTL, which are generally not available (Haley and Knott, 1994). In its simplest form, the marker haplotype may be formed by two markers bracketing the QTL. When the markers are non-informative with respect to their inheritance, ie, from marker analysis it cannot be deduced whether the markers were inherited from the dam or from the sire, a situation similar to recombination occurs; the inheritance of the QTL effect could not be followed.
The QTL alleles of base generation animals were obtained by sampling from the distribution $N(0, \tfrac{1}{2} V_{QTL_i})$, where $V_{QTL_i}$ = variance due to the $i$th QTL. The factor $\tfrac{1}{2}$ is due to the fact that an animal has a paternal and a maternal QTL allele, each of which contributes half of the total variance due to the QTL. Effects of QTL of descendants of base generation animals were obtained by Mendelian sampling from their parental effects. The probability that the marker haplotype recombined (at least once), so that the Mendelian sampling of the QTL alleles could not be followed by the marker haplotypes, was $r$. The actual marker haplotypes were not simulated: only recombination or no recombination within a haplotype was simulated. This procedure was replicated for all marked QTL. A polygenic effect, $g_i$, was simulated to reflect the non-marked genes. In the base generation, polygenic effects were sampled from $N(0, V_a)$, where $V_a$ = additive variance of polygenic effects. In later generations, it was sampled from $N(\tfrac{1}{2} g_s + \tfrac{1}{2} g_d, \tfrac{1}{2} V_a)$, where $s$ and $d$ denote the sire and dam respectively. Phenotypic records, $y_i$, were obtained by adding an environmental effect to the sum of the polygenic and QTL effects. The environmental effect was sampled from $N(0, V_e)$.
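The sampling steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation program used for the paper: the function names, the single marked QTL, the random mating of parents and the variance values in the example run (in particular `v_e = 0.625`) are hypothetical choices made here for illustration.

```python
import numpy as np

def simulate_base(n, v_qtl, v_a, v_e, rng):
    """Base generation: two QTL alleles, a polygenic effect and a record per animal."""
    # Each allele comes from N(0, 1/2 V_QTL), so both alleles together carry V_QTL.
    alleles = rng.normal(0.0, np.sqrt(0.5 * v_qtl), size=(n, 2))
    g = rng.normal(0.0, np.sqrt(v_a), size=n)
    y = g + alleles.sum(axis=1) + rng.normal(0.0, np.sqrt(v_e), size=n)
    return alleles, g, y

def simulate_offspring(sire, dam, alleles, g, v_a, v_e, r, rng):
    """One offspring per (sire, dam) pair, obtained by Mendelian sampling."""
    n = len(sire)
    # Column 0 holds the paternal allele, column 1 the maternal allele.
    pick_s = rng.integers(0, 2, size=n)
    pick_d = rng.integers(0, 2, size=n)
    off_alleles = np.column_stack([alleles[sire, pick_s], alleles[dam, pick_d]])
    # traced is False where the haplotype recombined (probability r), ie,
    # where the origin of the transmitted allele cannot be followed.
    traced = rng.random((n, 2)) >= r
    off_g = 0.5 * g[sire] + 0.5 * g[dam] + rng.normal(0.0, np.sqrt(0.5 * v_a), size=n)
    off_y = off_g + off_alleles.sum(axis=1) + rng.normal(0.0, np.sqrt(v_e), size=n)
    return off_alleles, traced, off_g, off_y

# Illustrative run; all parameter values here are assumptions for the sketch.
rng = np.random.default_rng(0)
alleles, g, y = simulate_base(100, v_qtl=0.125, v_a=0.25, v_e=0.625, rng=rng)
sire = rng.integers(0, 50, size=200)    # hypothetical random choice of sires
dam = rng.integers(50, 100, size=200)   # hypothetical random choice of dams
off_alleles, traced, off_g, off_y = simulate_offspring(
    sire, dam, alleles, g, v_a=0.25, v_e=0.625, r=0.1, rng=rng)
```

Note that recombination only changes whether the transmitted allele is traceable; the allele itself is always a copy of one of the two parental alleles.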

Breeding value estimation
Estimation of breeding values with marker brackets or haplotypes follows Goddard (1992). Records were analyzed by the model:
$$y = Zu + \sum_i Z Q_i q_i + e$$
where $y$ = vector of records, $u$ = vector of polygenic effects, $Z$ = incidence matrix linking animals to records, $q_i$ = vector of allelic effects for the $i$th marked QTL, $Q_i$ = incidence matrix linking QTL alleles to animals (every animal has two QTL alleles, hence every row of $Q_i$ has two elements equal to 1 and the remaining elements are 0), and $e$ = vector of environmental effects.
As an example, consider two base generation animals, s and d, and one offspring, o. The alleles of the base generation animals are all considered as different base population alleles: $q_{sp}$, $q_{sm}$, $q_{dp}$ and $q_{dm}$, where p and m denote the paternal and maternal allele respectively. Now, suppose that the offspring o received the maternal marker haplotype of its sire s (actually, since s is a base animal, one of the haplotypes is arbitrarily denoted as maternal) and a recombined marker haplotype of its dam d. Hence, the paternal QTL allele of o is a copy of $q_{sm}$ and the maternal allele is either $q_{dp}$ or $q_{dm}$. For the maternal allele of o a new QTL allele, $q_{om}$, is postulated and included in the vector $q$, with a mean value of $\tfrac{1}{2} q_{dp} + \tfrac{1}{2} q_{dm}$ and a variance around this mean of
$$E\{\tfrac{1}{2}[\tfrac{1}{2}(q_{dp} - q_{dm})]^2 + \tfrac{1}{2}[\tfrac{1}{2}(q_{dp} - q_{dm})]^2\} = \tfrac{1}{4} V_{QTL}.$$
Hence the total variance of $q_{om}$ is
$$V(\tfrac{1}{2} q_{dp} + \tfrac{1}{2} q_{dm}) + \tfrac{1}{4} V_{QTL} = \tfrac{1}{4}(\tfrac{1}{2} V_{QTL} + \tfrac{1}{2} V_{QTL}) + \tfrac{1}{4} V_{QTL} = \tfrac{1}{2} V_{QTL}$$
(which equals that of the other QTL alleles, eg, $q_{sp}$), and $Cov(q_{dm}, q_{om}) = Cov(q_{dm}, \tfrac{1}{2} q_{dm}) = \tfrac{1}{4} V_{QTL}$. It follows that:
$$Var\begin{pmatrix} q_{sp} \\ q_{sm} \\ q_{dp} \\ q_{dm} \\ q_{om} \end{pmatrix} = \tfrac{1}{2} V_{QTL} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & \tfrac{1}{2} \\ 0 & 0 & 0 & 1 & \tfrac{1}{2} \\ 0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & 1 \end{pmatrix} = G \qquad [1]$$
Note that $G$ has the same structure as a numerator relationship matrix (Henderson, 1976), where $q_{dp}$ and $q_{dm}$ are the parents of $q_{om}$. Hence, a pedigree of QTL alleles is formed, and $G^{-1}$ is obtained from Henderson's rules. Also, $Var(u) = A V_a$, and $A^{-1}$ follows from Henderson (1976). Estimates for $u$ and $q_1$ are obtained by solving Henderson's (1984) mixed model equations (in the case of one QTL):
$$\begin{pmatrix} Z'Z + \lambda A^{-1} & Z'ZQ_1 \\ Q_1'Z'Z & Q_1'Z'ZQ_1 + V_e G^{-1} \end{pmatrix} \begin{pmatrix} \hat{u} \\ \hat{q}_1 \end{pmatrix} = \begin{pmatrix} Z'y \\ Q_1'Z'y \end{pmatrix} \qquad [2]$$
where $\lambda = V_e/V_a$ and the variance components, $V_{QTL_1}$ (needed for $G$), $V_a$, and $V_e$, were assumed to be known. Extension to more QTL is straightforward. $V_{QTL_1}$ requires knowledge about the size of the QTL effects and their allele frequencies, which may, at least approximately, be obtained from the QTL mapping experiment.
Otherwise, they could be obtained by an REML analysis (Fernando and Grossman, 1989;Goddard, 1992).
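The covariance structure of the five alleles in the example (base alleles of s and d, plus the untraced maternal allele of o) can be checked numerically. The sketch below builds the allele covariance matrix with a tabular recursion analogous to the one used for a numerator relationship matrix. The function name and the way the allele pedigree is encoded are assumptions made here for illustration; traced alleles do not appear as new entries because they are copies of an existing allele.

```python
import numpy as np

def allele_covariance(parents, v_qtl):
    """Covariance matrix of QTL alleles from an allele pedigree.

    parents[k] is None for a base allele, or a pair (i, j) of earlier
    alleles when allele k is an untraced copy of either i or j.
    """
    n = len(parents)
    g = np.zeros((n, n))
    for k, par in enumerate(parents):
        # Every allele has variance 1/2 V_QTL, so the scaled diagonal is 1.
        g[k, k] = 1.0
        if par is None:
            continue
        i, j = par
        # An untraced allele covaries with earlier alleles as the average
        # of its two candidate parental alleles.
        g[k, :k] = 0.5 * (g[i, :k] + g[j, :k])
        g[:k, k] = g[k, :k]
    return 0.5 * v_qtl * g

# Order: q_sp, q_sm, q_dp, q_dm, q_om, where q_om descends from q_dp or q_dm.
G = allele_covariance([None, None, None, None, (2, 3)], v_qtl=0.125)
G_inv = np.linalg.inv(G)  # direct inversion; adequate only for small examples
```

For realistic pedigrees the inverse would be set up directly from the allele pedigree rather than by inverting a dense matrix.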
In situations where marker information is not available, the equations:
$$[Z'Z + \beta A^{-1}]\,\hat{a} = Z'y \qquad [3]$$
are solved to obtain breeding value estimates $\hat{a}$, where $\beta = V_e/(V_a + \sum_i V_{QTL_i})$. Both $\hat{u} + \sum_i Q_i \hat{q}_i$ from equations [2] and $\hat{a}$ from equations [3] are estimates of the total breeding value $u + \sum_i Q_i q_i$, which includes the QTL and the polygenes.
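To make the two estimation methods concrete, the sketch below solves both sets of equations for the three-animal example (s, d and their offspring o), assuming the standard Henderson mixed model equations for the model $y = Zu + ZQq + e$ with $Var(u) = AV_a$, $Var(q) = G$ and $Var(e) = IV_e$. The records, the value of `v_e` and the function names are hypothetical, and the inverses are taken by direct inversion for brevity rather than by Henderson's rules.

```python
import numpy as np

v_qtl, v_a, v_e = 0.125, 0.25, 0.625   # v_e is an illustrative assumption
y = np.array([1.0, -0.5, 0.8])         # hypothetical records of s, d and o

Z = np.eye(3)                          # one record per animal
A = np.array([[1.0, 0.0, 0.5],         # s and d are unrelated base animals,
              [0.0, 1.0, 0.5],         # o is their offspring
              [0.5, 0.5, 1.0]])
# Allele order: q_sp, q_sm, q_dp, q_dm, q_om.
Q = np.array([[1, 1, 0, 0, 0],         # s carries q_sp and q_sm
              [0, 0, 1, 1, 0],         # d carries q_dp and q_dm
              [0, 1, 0, 0, 1]], dtype=float)  # o: traced q_sm and untraced q_om
G = 0.5 * v_qtl * np.array([[1, 0, 0.0, 0.0, 0.0],
                            [0, 1, 0.0, 0.0, 0.0],
                            [0, 0, 1.0, 0.0, 0.5],
                            [0, 0, 0.0, 1.0, 0.5],
                            [0, 0, 0.5, 0.5, 1.0]])

def solve_mas(y, Z, Q, A, G, v_a, v_e):
    """Total breeding values u_hat + Q q_hat from the marker-based equations."""
    lam = v_e / v_a
    W = Z @ Q
    lhs = np.block([[Z.T @ Z + lam * np.linalg.inv(A), Z.T @ W],
                    [W.T @ Z, W.T @ W + v_e * np.linalg.inv(G)]])
    rhs = np.concatenate([Z.T @ y, W.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    n = A.shape[0]
    return sol[:n] + Q @ sol[n:]

def solve_non_mas(y, Z, A, v_a, v_qtl_total, v_e):
    """Breeding values a_hat when the QTL variance is pooled with the polygenes."""
    beta = v_e / (v_a + v_qtl_total)
    return np.linalg.solve(Z.T @ Z + beta * np.linalg.inv(A), Z.T @ y)

ebv_mas = solve_mas(y, Z, Q, A, G, v_a, v_e)
ebv_non_mas = solve_non_mas(y, Z, A, v_a, v_qtl, v_e)
```

Both vectors estimate the same quantity, the total breeding value of each animal; they differ only in whether the marker information on the transmission of QTL alleles is used.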

Breeding schemes
The analysis of DNA markers for vast numbers of commercial animals was considered too expensive. Hence only nucleus animals were analyzed, although in some instances the effect of having marker information on commercial offspring of selected sires was assessed. Only closed nucleus breeding schemes were studied, because these are most common across species. In species with low female reproductive rates, alternative breeding schemes occur (mainly open nucleus schemes), but due to the availability of modern reproductive techniques these schemes tend more and more towards closed nucleus schemes (Nicholas and Smith, 1983; Meuwissen, 1991a).
Because marker information will be mainly available on nucleus animals, genetic markers will increase this tendency towards closed nucleus schemes. The parameters of the closed nucleus scheme are summarized in table I. Because a QTL mapping experiment precedes the selection on marker information, it was assumed that marker information was available on five generations of animals prior to the start of MAS, which is in generation 0. Also, in an ongoing MAS scheme where a new QTL is detected, marker information becomes available on previous generations of animals. Breeding schemes were simulated for five generations prior to MAS (generation 0) with selection on $\hat{a}$ from equations [3]. Marker information accumulated during these five generations. After these five initial generations of selection, five generations of MAS followed with selection for $\hat{u} + \sum_i Q_i \hat{q}_i$ (from equations [2]). Alternatively, selection on $\hat{a}$ from equations [3] continued for another five generations, which was denoted by non-MAS.

RESULTS
Records available before selection
Table II compares genetic gains after one, two, three and five generations of MAS to the analogous gains with non-MAS. When records were available before selection (eg, growth rate, feed efficiency), extra rates of gain due to MAS were moderate and varied from 8.8% in generation 1 to 2% over five generations. The decline in the extra response is because the variance of the QTL effect decreases as the beneficial QTL alleles increase in frequency. The latter occurs more rapidly with MAS and thus the genetic gains with MAS decrease more rapidly than those with non-MAS. Also, non-MAS puts more selection pressure on the polygenic effects u than MAS. Hence, the genetic gain in u with non-MAS exceeds that with MAS, which reduces the difference in total selection response. Therefore, non-MAS tends to catch up with MAS as the number of generations increases (see Gibson, 1994).
Eventually both MAS and non-MAS exploit all the variance in the QTL, the advantage of MAS being that it exploits the QTL variance faster. However, if a new QTL is found every ith generation, a stable extra genetic gain is achieved equal to that indicated in table II after i generations of MAS, ignoring the gain from continued use of marked QTL after generation i.

Records available after selection
When records become available after selection (eg, with selection for fertility or longevity), the extra response due to MAS is increased and ranged from 38 to 15% over one to five generations. In this situation, conventional selection is for the average EBV (Estimated breeding value) of the parents and within-family variation is not used by selection. MAS uses the within-family variance associated with the QTL, which results in the large increases in response rates.

Effects of heritability
Extra response rates due to MAS are larger with lower heritabilities (table II). With decreasing heritabilities the accuracy of selection decreases, but QTL effects are still fairly accurately estimated. This is because the tracing of copies of the QTL alleles by markers leads to the availability of multiple records on the QTL alleles. The accuracy of estimation with multiple records still decreases with decreasing heritability, but less so than with single records. Hence, the superiority of MAS increases with decreasing heritability.
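The effect of multiple records on a traced allele can be illustrated with the expression this paper uses for the squared accuracy of an estimated QTL effect, $r_{QTL}^2 = V_{QTL}/(V_{QTL} + \sigma_e^2/n)$, where $n$ is the number of copies of the allele traced by the markers. The sketch below is only a numerical illustration: the function names and the two error variances are hypothetical values chosen here, not figures from the simulations.

```python
import math

def qtl_accuracy_sq(v_qtl, sigma_e2, n):
    """Squared accuracy of a QTL effect estimated from n traced copies."""
    return v_qtl / (v_qtl + sigma_e2 / n)

def extra_index_variance(v_qtl, sigma_e2, n):
    """Extra variance explained at the QTL by MAS: V_QTL * r_QTL^2."""
    return v_qtl * qtl_accuracy_sq(v_qtl, sigma_e2, n)

def relative_loss(v_qtl, low_e2, high_e2, n):
    """Proportional drop in squared accuracy when the error variance rises."""
    before = qtl_accuracy_sq(v_qtl, low_e2, n)
    after = qtl_accuracy_sq(v_qtl, high_e2, n)
    return 1.0 - after / before

# Hypothetical error variances standing in for a higher and a lower heritability.
loss_single = relative_loss(0.125, 0.5, 0.9, n=1)
loss_multiple = relative_loss(0.125, 0.5, 0.9, n=10)
print(f"loss with 1 copy: {loss_single:.2f}, with 10 copies: {loss_multiple:.2f}")
```

With these assumed values the proportional loss in squared accuracy is smaller with ten traced copies than with one. The same expression also implies that `extra_index_variance` grows more than proportionally with the QTL variance, since doubling `v_qtl` more than doubles its value whenever the error variance is positive.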

Size of QTL effects
In situations with one marked QTL and recording before selection, the first generation extra genetic gain due to MAS is 1.3, 4.0, and 8.8% with $V_{QTL_1}$ values of 0.03125, 0.0625, and 0.125 respectively (table III). These figures are 6, 16 and 38%, respectively, when recording is after selection. Hence, the extra gain is more than proportional to $V_{QTL_1}$. The accuracy of selection increases from $\sigma_I/\sigma_g$ to $\sqrt{\sigma_I^2 + \sigma_{IQTL}^2}/\sigma_g$, where $\sigma_I^2$ = variance of estimated breeding values with non-MAS; $\sigma_{IQTL}^2$ = extra variance explained at the QTL by MAS; and $\sigma_g^2$ = total genetic variance. Since $\sqrt{\sigma_I^2 + \sigma_{IQTL}^2}/\sigma_g$ equals approximately $(1 + \tfrac{1}{2}\sigma_{IQTL}^2/\sigma_I^2)\,\sigma_I/\sigma_g$, the increase in accuracy of selection is approximately proportional to $\sigma_{IQTL}^2$. Further, $\sigma_{IQTL}^2 = V_{QTL} r_{QTL}^2 = V_{QTL}[V_{QTL}/(V_{QTL} + \sigma_e^2/n)]$, where $r_{QTL}$ = accuracy of estimation of the QTL effect, $\sigma_e^2$ = error variance (after accounting for estimation errors of all other effects in the model), and $n$ = the number of copies of a QTL that are traced by the markers. Hence, $\sigma_{IQTL}^2$ increases more than proportionally to $V_{QTL}$. In particular, where selection precedes the recording/expression of the trait, genetic gains increase more than the aforementioned proportion due to decreased intra-class correlations between EBVs of relatives (because markers explain within-family variance). This results in increased selection intensities (Hill, 1976; Meuwissen, 1991b).

Double recombinations were ignored in this study for several reasons: r may be high due to non-informative markers and not due to a large distance between the markers; the extreme markers of a haplotype may be far apart, but individual marker brackets may be small without knowing to which bracket the QTL maps, hence, double recombinations will be detected and treated as single recombinants; and the probability of double recombinations is small except for large r.
Table V also shows rates of gain in the case of a marker bracket with a QTL in the middle ($M_1$-$QTL_1$-$M_2$), a recombination rate between the markers of 0.4, and when accounting for double recombinants. A realistic mapping function is obtained from Kosambi (1944), which is used in table V. The distance between $M_1$ and $M_2$ is then 0.55 M and the probability of a double recombination between $M_1$ and $QTL_1$ and between $QTL_1$ and $M_2$ is 0.05. Genetic gains were substantially reduced by double recombinants: 13-5% in generations 1-5 with $V_{QTL_1}$ = 0.125. With r = 0.2, the probability of double recombinants is only 0.004, which does not yield a real reduction in genetic gain. Hence, a marker bracket of 55 cM is too large and genetic gains are substantially increased by having an additional marker within the bracket, even when this does not increase the precision of the estimate of the QTL site.
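The map distance and the double recombination probabilities quoted above can be reproduced from the Kosambi mapping function. The sketch below assumes, as in the text, a QTL exactly midway between the two markers; the function names are chosen here for illustration.

```python
import math

def kosambi_distance(r):
    """Map distance in Morgans for a recombination rate r (Kosambi, 1944)."""
    return 0.25 * math.log((1 + 2 * r) / (1 - 2 * r))

def kosambi_recombination(d):
    """Recombination rate for a map distance d in Morgans."""
    return 0.5 * math.tanh(2 * d)

def double_recombination(r_bracket):
    """Probability of a recombination on both sides of a QTL placed midway."""
    d = kosambi_distance(r_bracket)
    r_half = kosambi_recombination(d / 2)
    # The markers appear recombinant only after an odd number of crossovers:
    # r_bracket = 2 * r_half - 2 * P(double), hence the expression below.
    return r_half - r_bracket / 2

for r in (0.4, 0.2):
    print(r, round(kosambi_distance(r), 3), round(double_recombination(r), 4))
```

For a bracket with a recombination rate of 0.4, each half of the bracket has a recombination rate of 0.25, which is why one marker alone would still leave a quarter of the transmissions untraced.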

Effects of additional QTL
A simulation was also conducted, where the markers of the bracket were so far apart that the recombination rate between them and the QTL was 0.5. Hence, the markers yielded no information. In this case, genetic gain was 8% lower than with non-MAS (result not shown), because of the high frequency of double recombinations (25%) that resulted in erroneous tracing of QTL alleles.

Information from commercial offspring
In previous studies on the use of MAS, elite sires (or grandsires) were assumed to have progeny test information on many commercial offspring in order to obtain accurate estimates of effects of QTL alleles (Kashi et al, 1990; Meuwissen and Van Arendonk, 1992). First generation response rates increased by 44% due to MAS, when marker and performance information on 1000 commercial progeny was available (table VI). Without this progeny information, this figure was 38% (table II). Hence, when all available information on QTL alleles is used, as in equations [2], the availability of marker information on many commercial offspring yielded only moderately increased rates of genetic gains.

Sex-limited traits
When records are available after selection and only on females, eg, in the case of juvenile MOET (multiple ovulation and embryo transfer) schemes for dairy cattle (Nicholas and Smith, 1983), genetic gains were increased by 38 and 21% after one and five generations of MAS respectively (table VII). The former figure is similar to that with non-sex-limited traits. The latter is probably increased due to the less efficient MAS on sex-limited traits leading to less reduction of variance at the QTL.

Table VIII considers selection for a carcass trait, which was measured by slaughtering at random half of the animals of each full sib family. The slaughtered animals were not eligible for selection. Non-slaughtered animals were selected after the information on their slaughtered sibs was recorded. Conventional selection yielded much lower response rates than those in table II because of the reduced selection differential, ie, half of the selection candidates were slaughtered, and the limited information. MAS increased rates of gain by 24% when the same breeding structure was used. In addition, all animals could be selected on marker information before the slaughtering, and the non-selected animals could be slaughtered to provide information for the next generation of selection. This increased rates of gain by 64% in the first generation of selection.

Increased numbers of offspring per dam
In the previous situations, the number of offspring per dam was limited to four (= 200/50). In pigs and in cattle, when modern reproductive techniques are used, the number of offspring can be much larger. With ten offspring per dam and recording before selection, response rates increased by 13 and 2% after one and five generations of MAS respectively (table IX). When recording was after selection, these figures were 45 and 17% respectively. In comparison to table II, first generation response rates were increased due to the higher female reproductive rates, but in later generations the advantage diminished due to the faster reduction in variances.
When comparing tables II and IX, it may be noted that in the case of selection before recording, non-MAS genetic gain is higher with fewer offspring per dam. This is because estimated breeding values of all full sibs are equal and selection is between full sib families. With ten offspring per dam there are fewer full sib families (20 instead of 50), hence genetic gains are larger with four offspring per dam.

Limited marker information on previous generations
In the previous tables it was assumed that marker information was available on generations of animals prior to the MAS. This occurs in situations where a QTL detection experiment precedes the MAS, and/or in a continuous MAS scheme where a new QTL is found. In other situations there may be little marker information on the generations of animals preceding the MAS, eg, when the QTL detection experiment involved a small subset of the population and there are no DNA samples available on the other animals.
When there is only marker information on the animals in generation 0 and their parents and grandparents, the extra genetic gains due to MAS are only 2.8 and 6% in the first generations of selection, with recording before and after selection respectively (compare tables II and X), because information on the QTL effects has to accumulate during these generations.
The response in the first generation of MAS comes from the estimates of the paternal and maternal QTL alleles of the animals in generation -1, which are based on the EBV of their sires and dams respectively, and are traced by the markers to generation 0 animals. When selection is after recording, the records in generation 0 will also provide information on the effects of the paternal and maternal alleles of animals in generation -1. First generation extra response rates are therefore further reduced due to the absence of previous generations of marker information when selection is prior to recording compared to after recording.
After five generations of MAS, extra genetic gains are 2 and 9% respectively, in the absence of marker information on previous generations (table X). This compares to increases of 2 and 15%, respectively, in table II. Hence, the differences in genetic gain become smaller with generation number, but it also takes longer before $V_{QTL_1}$ is exploited and before MAS can capitalize on a new QTL.

DISCUSSION
Assumptions: the genetic model
The true genetic model, ie, number of QTL, distribution of QTL effects, and number of alleles per QTL, is unknown and thus impossible to simulate. Fortunately, the short term results, which were required here, are not too sensitive to the genetic model. This was tested by a simulation with only two additive alleles and an initial frequency of 0.25 for the positive allele. After the five initial generations of non-MAS, this yielded a frequency of approximately 0.5 and $V_{QTL}$ = 0.125. Rates of gain over one, three and five generations of MAS were 0.340, 0.869, and 1.255 respectively, which compare to the figures of 0.329, 0.890, and 1.328 in table II. Hence, with two alleles initial rates of gain from MAS are as high as with many alleles and equal $V_{QTL_1}$, but rates of gain decrease faster due to the larger reduction of the variance at the QTL. In the fifth generation of MAS, $V_{QTL}$ was 0.011 or 0.002 when many alleles or two alleles, respectively, were assumed. In conclusion, short term (up to three generations) predictions of rate of gain from the many alleles model are reasonably accurate, but the longer term rates of gain depend on the number of alleles, the distribution of allele effects, and the initial allele frequencies.
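As a rough consistency check on the two-allele figures, the standard expression for the additive variance at a biallelic locus can be applied. This assumes a purely additive locus in Hardy-Weinberg proportions, which a small selected nucleus meets only approximately; the symbols $p$ (frequency of the positive allele) and $a$ (allele substitution effect) are introduced here for illustration and do not appear in the original.

```latex
V_{QTL} = 2p(1-p)a^2
p = 0.5,\; V_{QTL} = 0.125 \;\Rightarrow\; a^2 = 0.125 / 0.5 = 0.25,\quad a = 0.5
p = 0.25 \;\Rightarrow\; V_{QTL} = 2(0.25)(0.75)(0.25) \approx 0.094
V_{QTL} = 0.002 \;\Rightarrow\; p(1-p) = 0.004 \;\Rightarrow\; p \approx 0.996
```

Under these assumptions the QTL variance would have been about 0.094 at the initial frequency of 0.25, and the value of 0.002 in the fifth generation of MAS would correspond to a positive allele that is close to fixation.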
For simplicity, r was defined as the probability that marker haplotypes recombined. More precisely, r is the probability that the Mendelian sampling of QTL alleles could not be traced by markers. This could be due to recombination within the marker haplotype but also due to markers being non-informative, or the haplotype not being known with certainty. In particular, when there is little recombination within marker haplotypes, the frequencies of certain haplotypes will be increased by selection, which decreases the information content of the marker haplotypes. However, additional markers could be typed within the marker haplotype as generations progressed to maintain the informativeness of the haplotype.
Within the first generation of marker haplotyping, the linkage phase between markers is unknown. This is no problem, because QTL effects are also not estimable from a single generation of marker data. In the next generation the linkage phase is known, since the inheritance of the markers can be traced. This is unless both parents and offspring are heterozygous for the same alleles at all marker loci, which is unlikely with reasonably informative markers; if it occurs, QTL effects will be assumed untraceable (as $q_{om}$ in [1]).

Assumptions: the model of analysis
Selection decreases the variance among QTL alleles, which leads to reduced Mendelian sampling variances. This was not accounted for by either breeding value estimation method, MAS or non-MAS. With MAS, the Mendelian sampling variance of $q_{om}$ from the alleles $q_{dp}$ and $q_{dm}$ is reduced because selection makes $q_{dp}$ and $q_{dm}$ more alike. This reduces the variance of $q_{om}$ in formula [1]. With non-MAS the Mendelian sampling variance is also reduced, because part of this variance is due to $V_{QTL}$. Ignoring these variance reductions probably did not affect genetic gains much since genetic gains are not very sensitive to errors in variance estimates (Sales and Hill, 1976). However, here the model of simulation and data analysis differed slightly as well.
It was assumed that the initial variance due to the QTL was known without error. In practice, this will not be the case. Consider an extreme case where the true $V_{QTL_1}$ = 0 but it is assumed to be 0.125, ie, a false QTL is assumed. With recording after selection and r = 0.1, this case yields first generation genetic gains of 0.158 and 0.183 with MAS and non-MAS respectively. Hence, a substantial reduction in genetic gain of 14% is incurred when a false QTL is assumed.
Rates of genetic gain
MAS will only yield permanently increased rates of gain when there is a continuous input of newly identified QTL. The extra response rates due to MAS decreased very rapidly with increasing number of generations of selection for the same QTL (eg, table II). The rate at which new QTL will be discovered is difficult to predict.
Beneficial QTL alleles, that are generated by mutations, may stay at a very low initial allele frequency for a long time and changes in allele frequency will be mainly due to drift (Falconer, 1989). As long as a QTL allele is at very low frequency, it contributes little to the genetic variance despite a possibly large QTL effect, and hence it is difficult to estimate how many of these QTL are present in a population.
Once the allele drifts towards an 'intermediate' (still rather low) frequency, (marker assisted) selection increases the frequency rapidly. Hence, the rate at which new QTL are detected depends on the rate at which mutations drift to 'intermediate' frequencies and the amount of effort going into the detection of new QTL.
The inclusion of new traits in the breeding objective may lead to new QTL becoming relevant. Also, QTL may code for enzymes at rate-limiting steps in metabolic pathways. When the flux through such a step is increased by MAS, another step will become rate limiting and, hence, another QTL will occur.
Marker information increased rates of gain here more than in other studies (eg, Ruane and Colleau, 1994; Zhang and Smith, 1993), because of: 1) the use of marker haplotypes that trace QTL alleles with considerable probability instead of using one marker that is linked to a QTL; 2) the inclusion of marker information from previous generations; 3) the consideration of early response rates instead of longer term response rates (assuming that detection of new QTL continues); 4) the emphasis on traits that are recorded after selection, such that the markers increase the accuracy of selection substantially and increase the intensity of selection; and 5) the assumption that the variance associated with marked QTL was known. With non-MAS, selection may be performed only after recording, whereas MAS may lead to schemes with selection before recording because of the increased selection accuracy of young animals. This reduction of generation intervals may further increase rates of gain due to MAS (Meuwissen and Van Arendonk, 1992). In conclusion, the extra rates of gain from MAS can be large, in particular when there is a continuous detection of new QTL alleles and when traits are measured after selection.