Potential gain from including major gene information in breeding value estimation

The aim of this study was to compare two indices applied to selection on a quantitative trait affected by a major gene. In the first case, the index does not take into account information on the genotype at the major locus (standard method), whereas the second index takes this information into account (modified method). Two types of schemes are considered: individual selection and progeny-test selection. Genetic progress and changes in allele frequencies are computed step by step, considering overlapping generations. All the effects studied on the superiority of the modified method over the standard method suggest numerous interactions. Nevertheless, including major gene information in the index appears advantageous in cases of low heritability, large major gene effect and low initial frequency of the favourable allele, especially when this allele is recessive. The selection rate has only little influence on the results. Finally, the advantage of the modified method is more visible and appears more rapidly in individual selection than in progeny-test selection. Nonetheless, outside the extreme conditions mentioned above, the advantage of the modified method over the standard method remains limited at best, and taking into account information on genotypes at the major locus in the selection index, without modifying the selection scheme, is surely not the best tool for exploiting this information for selection.


INTRODUCTION
Most of quantitative genetics theory and its application to animal breeding is based on the assumption that a trait is controlled by a very large number of independent genes, each with a small effect. Nevertheless, evidence of genes with a large effect on quantitative traits is increasingly being found in livestock: double muscling in pigs (Ollivier, 1980), cattle (Hanset and Michaux, 1985), Callipyge in sheep (Cockett et al, 1994), dwarfism in poultry (Mérat and Ricard, 1974), hyperovulation in sheep (Booroola gene: Piper and Bindon, 1982; Inverdale gene: Davis et al, 1991), high milk protein content in goats (Grosclaude et al, 1987), low technological yield for the cooking of ham in pigs (Le , high milk flow in goats (Ricordeau et al, 1990). In order to take greater advantage of this genetic variability for animal improvement, specific genetic evaluation methods and selection schemes should be applied (Smith, 1967; Soller, 1978; Smith and Webb, 1981; Smith, 1982; Hoeschele, 1990; Gibson, 1994). Alternatively, organisation of matings including genotypic information may be proposed for a more efficient fixation of recessive favourable alleles (eg, Caballero et al, 1991).
In this paper, genotypes at the major locus were assumed to be perfectly identified, an infrequent situation at the present time (eg, milk protein content in goats, halothane in pigs) but one which should become more frequent in the future thanks to progress in molecular genetics. The usefulness of including the major genotype information in breeding value estimation was evaluated by comparing it with the standard situation where this information is not considered. This comparison was performed in the framework of selection schemes for a trait measured on young animals of both sexes, eg, growth rate (scheme I), and for a trait measured on females only with a progeny test of sires, eg, milk production (scheme II). Various populations with different genetic contexts (heritability, major gene effect, initial allele frequencies) and organisation (selection pressure, number of generations selected) were studied.
Standard and modified situations were compared based on the genetic progress they were expected to produce. The selection schemes considered were highly simplified: only the main features of the situations studied were kept. This paper considers, as did Gibson (1994), a dynamic model in which the evolution of allele frequencies and genetic means is described step by step, using a model matching the proposition made by Hill (1974) and Elsen and Mocquot (1974). This is a generalization of the Smith (1982) model.

Description of the selection schemes
The generations were overlapping and in demographic equilibrium within an infinite population. The age structure of the population was constant for both sexes. A constant selection pressure of 80% was assumed for the dam-daughter path.
The three other paths (sire-son, sire-daughter, dam-son) were selected with the same selection pressure q. The situations studied, even if somewhat arbitrary, were expected to reflect an average situation for performance test and progeny test selection schemes.
Scheme I is a model of a selection plan organized, for instance, in a meat sheep or cattle breed, with the trait measured in both sexes at the same time, when animals are between 0 and 1 year old. The generation interval is about 2 years. Only one selection step was considered before the first reproduction for each of the two sexes. The proportions of available breeding animals per age class are given in table I.
Scheme II is a model of a selection plan organized in a dairy species. The generation interval is about 3 years in the present study. The trait was measured only in females. Males were selected after a progeny test on 40 daughters, whereas females were selected on their own performance after their first reproduction. In this scheme, a constant 30% of the daughters were assumed to be born from young males under progeny test. The result of the progeny test was available when the young males were 2 years old. The first reproduction of females was not used for replacement. The proportions available per age class are given in table I.

Genetic model
The principles of the model were those of Smith (1982). The whole population was divided into classes defined by the major genotype $i$ at a single major locus ($i$ = AA, AB or BB, A being the favourable allele), the age $j$ and the sex $k$. At a given time $t$, the components of the classes were their relative size $\alpha_{ijkt}$, their major locus genotypic mean value $C_i$ and their polygenic mean $\mu_{ijkt}$. Time 0 ($t = 0$) described the situation before the selection process began, when the whole population was considered homogeneous for allele frequencies and polygenic value. The first generation born after selection was applied appeared at $t = 1$. The $\alpha_{.jkt} = \sum_i \alpha_{ijkt}$ have been given above. They were constrained to $\sum_j \alpha_{.jkt} = 1$. The evolution of the population was described through the evolution of the components $\mu_{ijkt}$ and $\alpha_{ijkt}$, assuming the within class variances to be constant during the whole selection process.
The model included three types of relations as described below.

Ageing without selection
Between two successive classes of ages $j-1$ and $j$ at times $t$ and $t+1$ without selection, two equalities occurred.

Ageing with selection
When selection was carried out between the ages $j-1$ and $j$, the previous relations were modified to involve $\Delta_{i,j-1,k,t}$, the mean polygenic superiority of selected individuals in the class $(i, j-1, k)$ at time $t$, $q_{ijkt}$, the selection pressure for class $ijk$ at time $t$, and $q_{jk}$, the selection pressure, assumed to be constant, for the set of individuals of age $j$ and sex $k$. In practice, there is only one selection step for reproducers, so that only one age class was considered for ageing with selection: $j = 1$ for both sexes in scheme I; $j = 2$ for females and $j = 3$ for males in scheme II.

Replacement
The components of the newborn individuals depended on the components of their parents ($k = s$ for sire, $k = d$ for dam), with $T_{i_s i_d i}$ being the probability that an individual has genotype $i$ given its parents' genotypes $i_s$ and $i_d$.
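The replacement step can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes random mating between the selected sires and dams, pools the parents over age classes, assumes no linkage between the major locus and the polygenes, and takes the polygenic mean of a mating as the mid-parent value. The function names (`transmission`, `newborns`) are hypothetical.

```python
from itertools import product

GENOTYPES = ("AA", "AB", "BB")

def transmission(sire, dam):
    """Mendelian probabilities of each offspring genotype given both parents."""
    # Probability that a parent of each genotype transmits allele A.
    p_a = {"AA": 1.0, "AB": 0.5, "BB": 0.0}
    ps, pd = p_a[sire], p_a[dam]
    return {
        "AA": ps * pd,
        "AB": ps * (1 - pd) + (1 - ps) * pd,
        "BB": (1 - ps) * (1 - pd),
    }

def newborns(alpha_s, mu_s, alpha_d, mu_d):
    """Genotype frequencies and polygenic means of newborns.

    alpha_s, alpha_d: genotype frequencies among selected sires and dams.
    mu_s, mu_d: polygenic means of each parental genotype class.
    Random mating and mid-parent polygenic means are assumptions of this sketch.
    """
    alpha = dict.fromkeys(GENOTYPES, 0.0)
    weighted = dict.fromkeys(GENOTYPES, 0.0)
    for i_s, i_d in product(GENOTYPES, repeat=2):
        pair_freq = alpha_s[i_s] * alpha_d[i_d]
        mid_parent = 0.5 * (mu_s[i_s] + mu_d[i_d])
        for i, t in transmission(i_s, i_d).items():
            alpha[i] += pair_freq * t
            weighted[i] += pair_freq * t * mid_parent
    mu = {i: (weighted[i] / alpha[i] if alpha[i] > 0 else 0.0) for i in GENOTYPES}
    return alpha, mu
```

Because the polygenic mean of newborns is averaged separately within each offspring genotype, any difference in polygenic means between parental genotype classes is carried over to the next generation.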

Estimation of the selection pressures and selection differentials
Since the algebra used is similar for male and female selection and since the selection is performed in only one step, neither the index $k$ nor the index $j$ is specified. In order to simplify the algebra, the index $t$ is also suppressed.
A reproducer $r$ is characterized by its global genetic value $h_r$, which includes its polygenic value $g_r$ and its major locus genotypic value $G_r$. The parental value $H_r$ of a reproducer was defined as the expected progeny performance $X_p$, ie, half the breeding value defined by Falconer (1989). It was estimated by the selection index $I = \hat{H}_r$, corresponding to the expectation of $X_p$ conditional on various types of information according to the case: own performance $X_r$ (scheme I and females in scheme II) or offspring performances $X_o$ (males in scheme II), together with, in the modified method, the genotypic information at the major locus, $G_r$ and $G_o$. In the standard method, the selection is made on an index supposed to be an expectation of the parental value when ignoring the existence of the major locus: the index $I$ is defined as a simple regression on the own performance value $X_r$ (scheme I) or offspring performances $X_o$ (males in scheme II). The evolution of the genetic value of selected reproducers, applying either index, has to be calculated, as well as the changes of allele frequencies and of the polygenic mean of each genotype.
The joint probability density of the genetic value $h_r$ of the reproducer $r$ and of its index $I$ is $f(h_r, I)$. This density is a mixture of subdensities $\phi_{i_r}$ corresponding to genotypes $i_r$: $f(h_r, I) = \sum_{i_r} \alpha_{i_r} \phi_{i_r}(h_r, I)$, with $\alpha_{i_r}$ being the $i_r$ class frequency within the considered group of reproducers. The within genotype polygenic mean superiority of the selected individuals is obtained from these subdensities by truncation at $T$, the selection threshold (the $I$ value above which the candidates are selected), with $q$ being the selection pressure corresponding to $T$.
Application of these principles to the different cases studied is described in the Appendix.
In all cases studied, the threshold is found iteratively, as described by Ducrocq and Quaas (1988). However, contrary to the standard situation, the breeding value evaluation taking the major locus genotype into account was obtained after a two-level iterative process: since the parental value $H_r$ has been defined as the progeny mean, it depends on the genotypic structure of the selected mate (ms) population (the $\alpha_{i,ms}$ and $\mu_{i,ms}$), which itself depends on the $\alpha_{i,rs}$ and $\mu_{i,rs}$ of the selected reproducers (rs). Taking as a starting point the genotypic structure of the mate population before selection, the solution was obtained iteratively with a given selection pressure $q$. In order to simplify the algebra of the young male indexes $I$, it was assumed that the characteristics (mean polygenic values and major genotype frequencies) of the female population (when selecting males) could replace those of their future mates.
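The search for a common truncation threshold in a mixture of genotype classes can be sketched in Python as follows. This is a hedged illustration, not the procedure of Ducrocq and Quaas (1988): it uses plain bisection, and it assumes that the index is normally distributed within each genotype class with a common standard deviation `sd` and known class means. All function names are hypothetical.

```python
import math

def upper_tail(x):
    """P(Z > x) for a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def selected_fraction(T, freqs, means, sd):
    """Overall proportion of candidates whose index exceeds T."""
    return sum(a * upper_tail((T - m) / sd) for a, m in zip(freqs, means))

def find_threshold(q, freqs, means, sd, tol=1e-10):
    """Bisection on T so that the upper tail of the mixture equals q."""
    lo = min(means) - 10.0 * sd
    hi = max(means) + 10.0 * sd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if selected_fraction(mid, freqs, means, sd) > q:
            lo = mid  # too many candidates selected: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

def within_class_results(T, means, sd):
    """Per genotype class: fraction selected and mean index superiority
    of the selected candidates (truncated normal formula)."""
    results = []
    for m in means:
        z = (T - m) / sd
        q_i = upper_tail(z)
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        results.append((q_i, sd * density / q_i))
    return results
```

For example, `find_threshold(0.10, [0.01, 0.18, 0.81], [2.0, 0.0, 0.0], 1.0)` corresponds to a recessive allele A at frequency 0.1 with an effect of 2 standard deviations, selected on phenotype: the AB and BB classes share the same mean, so `within_class_results` returns identical selected fractions for them.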

Comparison criteria
The value of including the genotype information in the parental value estimation was measured by the extra genetic gain as compared with the standard method. Starting from an initial point where all within major genotype classes were assumed to have equal polygenic means ($\mu_{ijk0} = \mu\ \forall i, j, k$), the nonlinear change of the $\alpha_{ijkt}$ and $\mu_{ijkt}$ over time differed between the two parental value estimation methods.
The evolution of the 0-1-year-old females ($y_t = \sum_i \alpha_{i0dt}(\mu_{i0dt} + C_i)/\alpha_{.0dt}$) was used as a measure of genetic progress, but our primary criterion was $C(t_f)$, with $t_f$ being the number of years considered and $\delta y_t$ the difference between both methods for year $t$.
This criterion was preferred to the final deviation $\delta y_{t_f}$, which gives only a partial description of the differences between both methods. Preliminary analyses showed that comparisons between the methods were hardly influenced by the inclusion of a discounting factor in the $t$-summations, and the comparisons were finally limited to a nondiscounted criterion. The methods were also compared according to the evolution of the allele frequencies.
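The summation underlying such a criterion can be sketched in Python. The exact expression of the criterion, including any normalisation, is not recoverable from the text, so the function below only assumes a sum over years 1 to $t_f$ of the yearly difference between methods, with an optional discounting factor. The name `cumulative_gain` and the argument layout (lists indexed by year, index 0 being the initial situation) are hypothetical.

```python
def cumulative_gain(y_standard, y_modified, t_f, discount=1.0):
    """Sum over years 1..t_f of the (optionally discounted) difference
    between the mean genetic values of the modified and standard methods."""
    total = 0.0
    for t in range(1, t_f + 1):
        delta_y = y_modified[t] - y_standard[t]
        total += (discount ** t) * delta_y
    return total
```

With `discount=1.0` the sum is nondiscounted. Unlike the final deviation alone, this sum keeps track of years in which one method was temporarily ahead, even if both methods end at the same level.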

Cases studied
The selection methods were compared for various combinations of the following parameters.
Genetic parameters: the within major genotype heritability coefficient ($h^2$) was given values between 0.1 and 0.5, and the 'major gene effect', defined here as $\Delta C = C_{AA} - C_{BB}$, values between 1 and 3 within genotype phenotypic standard deviations. Allele A was dominant (AA = AB = $\Delta C$, BB = 0), additive (AA = 2AB = $\Delta C$, BB = 0) or recessive (AA = $\Delta C$, AB = BB = 0) over the allele B. Initial frequency $p$ for allele A was tested between 0.1 and 0.9.
The global heritability $H^2$ (computed with $frq(G_r)$, the frequency of genotype $G_r$), which includes both polygenes and major genes, depends on polygenic heritability, major gene effect (both constant) and allele frequencies (variable with time). Initial $H^2$ is between 0.11 and 0.81 (table II).
Population structure: the selection pressure $q$ was given values of 5, 10 and 20%.
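The dependence of the global heritability on the parameters above can be illustrated with a short Python sketch. The formula used is a reconstruction and should be read as an assumption: genotype frequencies follow Hardy-Weinberg proportions, the within genotype phenotypic variance is set to 1, and the variance of the major genotype values is added to both the polygenic variance and the phenotypic variance. The function names are hypothetical.

```python
def genotype_values(delta_c, mode):
    """Values of (AA, AB, BB) in within genotype phenotypic SD units."""
    if mode == "dominant":
        return (delta_c, delta_c, 0.0)
    if mode == "additive":
        return (delta_c, delta_c / 2.0, 0.0)
    if mode == "recessive":
        return (delta_c, 0.0, 0.0)
    raise ValueError(f"unknown mode: {mode}")

def global_heritability(h2, delta_c, p, mode):
    """Assumed form: (h2 + Var(G)) / (1 + Var(G)), Var(G) over genotypes."""
    # Hardy-Weinberg genotype frequencies (an assumption of this sketch).
    freqs = (p * p, 2.0 * p * (1.0 - p), (1.0 - p) ** 2)
    values = genotype_values(delta_c, mode)
    mean = sum(f * v for f, v in zip(freqs, values))
    var_major = sum(f * v * v for f, v in zip(freqs, values)) - mean ** 2
    return (h2 + var_major) / (1.0 + var_major)
```

Under these assumptions, `global_heritability(0.1, 1, 0.1, "recessive")` is about 0.11 and `global_heritability(0.5, 3, 0.5, "recessive")` is about 0.81, which is consistent with the range quoted for the initial global heritability.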

Evolution of mean genetic and polygenic values
The evolution of the mean genetic values of young females is illustrated in figure 1 for the cases of A dominant, additive and recessive, with $h^2 = 0.3$, $p = 0.1$, $q = 0.1$ and $\Delta C = 2$. In scheme I (fig 1a), when A is dominant or additive, the difference is nil at the beginning of the process. In the medium term, the modified method shows a higher increase of mean genetic value, essentially owing to the faster fixation of the favourable allele. In the long term, the standard method appears more efficient when comparing the final mean genetic value. When allele A is recessive, the modified method is slightly less efficient in the short term ($-0.02\sigma_p$), but from year 3, this method becomes and remains more efficient than the standard selection ($+0.08\sigma_p$).
The reduced efficiency of the modified method within the very first years is observed for the large major gene effects ($\Delta C = 2$ or $\Delta C = 3$), but not for $\Delta C = 1$. In scheme II (fig 1b), with the same parameters, the maximal difference between both methods is lower than that observed in scheme I. When A is dominant, mean genetic value is always higher when applying the modified method, with a difference that is nil at the beginning and vanishes in the long run ($+0.06\sigma_p$). For A additive, the modified method becomes less efficient than the standard method within the first 25 years of selection (year 17). In the long term (not shown), the modified method becomes less efficient for A recessive but not for A dominant. Lower mean genetic values are observed for the case of A recessive in the first five generations for the modified selection ($-0.05\sigma_p$).
The lower efficiency in the long term of marker assisted selection or combined selection when taking into account a major gene, when effects of alleles are additive, is now established (Gibson, 1994; Woolliams and Pong-Wong, 1995). The recessive case is not mentioned in these studies. The relative superiority of one method compared to the other is dependent on the rate of fixation of the favourable allele, but also on polygenic value evolution until fixation. An example is given in figure 2a and b in the case of A recessive and additive with $\Delta C = 2$, $h^2 = 0.3$, $p = 0.1$ and $q = 0.1$. The polygenic mean increases more rapidly when the standard indexes are applied. This phenomenon is observed for both selection schemes, with a stronger effect in the case of scheme II during the early years. In the case of individual selection and A recessive, this tendency changes after fixation of the favourable allele in the modified method (year 15), giving a faster increase of polygenic values in this modified method as compared to the standard one. When the favourable allele is fixed in the standard scheme, the evolution patterns become parallel.
In the case of scheme II, these phenomena do not appear during the first 25 years of selection.

Choice of period length $t_f$
Our criterion is a measure of the weighted surface between both mean genetic value curves, truncated at the final time $t_f$. The criterion $C(t_f)$ reaches its maximum value for intermediate $t_f$, as illustrated in figure 3 for $h^2 = 0.3$, $\Delta C = 2$, $p = 0.1$ and $q = 0.10$ for A recessive. In this situation, the maximum is achieved at year 12 in the case of scheme I, and at year 22 in the case of scheme II. For A dominant and additive, the maximum is lower and achieved earlier. Figure 3 indicates that including the major gene information in the selection criterion gives a slightly negative result in the very first few years, only in the case of a recessive favourable allele. This is probably due to the nonoptimality of our criterion when considering, in the evaluation of the breeding value $I$ of the future reproducer, the genotypic structure of the contemporary mate population before selection as fully representative of the whole genotypic structure of the dams. An optimal index should take into account the whole future mate population structure. In fact, this negative result in the very first few years appears when the initial frequency of allele A is lower than or equal to 0.1 (not shown) and when allele A is recessive. In this case, the modified method permits selection of AB genotypes instead of BB genotypes, even if their polygenic values are lower. The proportion of AB in mates is not high enough to increase the proportion of AA in the progeny greatly; thus the lower polygenic gain is probably not counterbalanced by the increase of AA genotypes.
In the following discussion, unless otherwise mentioned, the results are given for $t_f = 10$, a period length for which differences between both methods are maximal.

Major gene and polygenic effects
The influence of heritability and major gene effect parameters on genetic progress is described in figure 4a and b, considering an initial allele A frequency $p = 0.10$ and a selection pressure $q = 0.10$. The gain $C(t_f)$ decreases when the heritability increases: the greater the extent to which the genetic variation may be explained by the major gene, the more it becomes worthwhile to include the corresponding information in the breeding evaluation. This result was already observed by Smith (1967), who compared selection based on (1) individual performance, (2) known genetic loci and (3) a selection index of (1) and (2) on the basis of their short term responses. Marker assisted selection is also most useful when the heritability of the trait is low (Lande and Thompson, 1990; Ruane and Colleau, 1995; Meuwissen and Goddard, 1996), at least in the short term.
The effect of the deviation between AA and BB depends on the degree of dominance: the gain $C(t_f)$ is higher with increasing major gene effect when A is recessive, and lower in other situations. The main value of including the genotypic information in the parental value estimation is the possibility of selecting carriers which do not show their superiority when only their phenotypes are considered: this is the case when A is recessive or when A is codominant or dominant but with a small effect. This gain $C(t_f)$ may be quite important when the favourable allele A is recessive (up to 200% in scheme I) but decreases when its dominance over B increases. It becomes nearly nil for full dominance in scheme II. These results, which confirm the previous hypothesis, could be explained by the following arguments. When A is recessive and infrequent, the standard selection is inefficient at increasing the frequency of allele A, because AB and BB have the same value. Thus, they have nearly the same chance of being selected if the number of reproducers to be retained is higher than the number of AA in the candidates.
On the contrary, the modified selection distinguishes AB and BB candidates, and thus is more efficient at increasing the allele A frequency in the short term. That is not the case when allele A is dominant or when its frequency is high enough. This difference is also reduced in scheme II because AB and BB genotypes of the reproducers are more distinct with progeny testing.

Allele frequencies
An illustration of the influence of the initial allele A frequency on the difference in genetic progress between methods is given in figure 5 for scheme I, A recessive, considering a heritability $h^2 = 0.3$ and a major gene effect $\Delta C = 2$, reflecting the general findings.
In scheme II, the gain $C(t_f)$ is very low and the differences owing to the initial frequency $p$ are negligible. In scheme I, the gain reaches a maximum for small $p$ values, with the exception of the recessive allele A case where a maximum is obtained for intermediate values (0.10), while no gain is obtained with a very small initial $p$. This result is due to the curvilinearity of allele A frequency evolution with selection. This is illustrated in figure 6a and b where A is recessive and $q = 0.10$: for an intermediate period length $t_f$ of 10 years, the difference between the standard and modified methods is maximum when $p = 0.10$. This is not true for a longer period, when the allele A frequencies may differ between the two methods. These results clearly show that the value of putting genotypic information in the parental values is directly related to the acceleration of allele A fixation that it permits, which in turn depends on the starting point $p$.

Selection scheme and selection pressure
Figures 3 and 4b were drawn for schemes I and II. In all the cases studied, the maximum gain $C(t_f)$ was much higher for scheme I. There might be two reasons for this difference: (1) more complete information about the whole genetic value of reproducers was available from the progeny test than from the performance test, thus diminishing the value of including major gene data and (2) the longer time taken by scheme II to take into account the extra information (ie, to increase allele A frequency) on the major gene in parental value evaluation. A comparison based on a longer period length $t_f$ should give a higher $C(t_f)$ in scheme II and a lower one in scheme I (cf fig 3). The effect of selection pressure $q$ depends on the degree of dominance, scheme (fig 7) and initial allele frequencies (fig 5) but in general, it seems to have a very limited influence on the gain $C(t_f)$.

GENERAL DISCUSSION
We found that, in comparison with the traditional breeding value estimation, which assumes polygenic inheritance, the inclusion of information about the genotype at a major locus is valuable in limited circumstances, which could roughly be defined as those cases in which the standard methods are less effective at fixing the favourable allele (very low A initial frequency, recessivity of A) or when most of the gain comes from the major gene itself (low heritability, short term results). The value of including the major gene information in the selection indexes may be very high in the most favourable cases (200% increase of the genetic gain), but it is more often low or slightly negative.
The situation of additive QTL was studied by others with divergent results. Negative long term results were obtained by Gibson (1994) and Woolliams and Pong-Wong (1995). On the contrary, Zhang and Smith (1992) and Gimelfarb and Lande (1994) found positive extra rates of genetic response with marker-assisted selection based on the use of linkage disequilibrium, a situation much less favourable than ours owing to the progressive disappearance of marker-QTL associations. However, in all these studies, the superiority of the modified methods consistently diminishes when more generations are considered. The divergences between results may come from the characteristics of the genes studied, the number of generations simulated as well as the type of modelling used. The higher efficiency of methods accounting for major gene information over standard methods with lower global heritability and higher initial favourable gene frequency was already shown by Smith (1967) for an additive major gene and by many others for additive marker QTL (eg, Lande and Thompson, 1990; Gimelfarb and Lande, 1994; Edwards and Page, 1994; Colleau, 1995, 1996).
Contrary to others, our criterion for evaluating the efficiency of alternative methods did not consider only the mean genetic level or gene frequency at a given time but included the dynamics of the evolution due to selection. We emphasized that most of the selection schemes are able to fix favourable alleles and the differences between schemes are to be appreciated in the way they reach this state. This modelling is classical for the comparison of selection plans and needed for their economic evaluation.
Our model assumed an infinite number of loci and population size and considered only the evolution of major genotype frequencies and mean polygenic values with selection. Linkage disequilibrium between the major gene and the polygenes was automatically accounted for in the model, but not the Bulmer effect within major genotype. The corresponding reduction in polygenic variance should occur in both standard and modified selection schemes. Whether this reduction is higher in the modified or standard forms is far from obvious for three reasons. First, as compared to the standard situation, the modified method induced a weaker selection pressure on polygenes within favourable genotypes, and a stronger one within unfavourable genotypes. Second, the individuals of a given genotype may have progeny of different genotypes (BB giving for instance AB offspring, and even AA grand-offspring) with a corresponding redistribution of polygenic variation. Third, the evolutions of allele frequencies at a polygene on the one hand, and of linkage disequilibrium between polygene loci on the other, depend on their location relative to the major locus. Thus a full model should describe not only this possible variance reduction but also should deal with the linkage between the major locus and some of the minor loci controlling the trait. In a simulation of standard and modified selection schemes of type I describing the polygenic value as the sum of identified QTL (defined by their location on the genome and allele effect), Fournet et al (1995, 1997) did not find any significant difference in polygenic variance reduction between selection schemes. Finally, the Bulmer effect is most important at the beginning of a selection scheme, while modification of a selection scheme to account for the segregation of a major gene should occur in an already running scheme, minimizing this effect.
This study only dealt with a possible change in breeding value estimations without any modification of the selection plan. The information given by genotypes at a major locus may be used to change the organization of the selection scheme itself. The most effective change for type II schemes would probably be a preselection of young males based on their own genotype before (or instead of) their progeny test (Smith, 1967; Soller, 1978; Gomez-Raya and Gibson, 1993). This kind of preselection was studied when QTLs, indirectly detected through the use of marker information, were known (eg, Soller and Beckmann, 1983; Kashi et al, 1990; Meuwissen and Van Arendonk, 1992; Brascamp et al, 1993). A dynamic model similar to the model used in this paper could yield more information on the efficiency of these new plans.