An approximate theory of selection assuming a finite number of quantitative trait loci

Summary - An approximate theory of mid-term selection for a quantitative trait is developed for the case when a finite number of unlinked loci contribute to phenotypes. Assuming Gaussian distributions of phenotypic and genetic effects, the analysis shows that the dynamics of the response to selection is defined by a single additional parameter, the effective number $L_e$ of quantitative trait loci (QTL). This number is expected to be rather small (3-20) if QTLs have variable contributions to the genetic variance. As is confirmed by simulation, the change with time of the genetic variance and of the cumulative response to selection depends on this effective number of QTLs rather than on the total number of contributing loci. The model extends the analysis of Bulmer, and shows that an equilibrium structure arises after a few generations in which some amount of genetic variability is hidden by gametic disequilibria. The additive genetic variance $V_A$ and the genic variance $V_a$ remain linked by: $V_A = V_a - \kappa (1 - 1/L_e) h^2 V_A$, where $\kappa$ is the proportion of variance removed by selection, and $h^2$ the current heritability of the trait. From this property, a complete approximate theory of selection can be developed, and modifications of correlations between relatives can be proposed. However, the model generally overestimates the cumulative response to selection except in early generations, which defines the time scale for which the present theory is of potential practical value.

quantitative genetics / selection / genetic variance

Résumé - Approximate theory of selection for a trait due to a finite number of loci. An approximate theory of selection is developed for the case of a quantitative trait whose genetic variability is due to a finite number of genetically independent loci. The calculation is developed analytically, assuming that all statistical distributions can be approximated by normal distributions. The analysis shows that the overall behaviour of the genetic system depends essentially on an "effective number of loci", $L_e$, whose plausible values are probably small [...]

INTRODUCTION
Models of quantitative genetics are generally developed under the assumptions of the infinitesimal model, which states that a very large number of genetically unlinked loci contribute to the genetic variance of a trait. More precisely, it is assumed that all contributions of individual loci are of the same order of magnitude.
This hypothesis ensures that the distribution of breeding values is Gaussian, and validates the whole statistical apparatus that made possible the development of applied quantitative genetics and its practical achievements. The scope of the present paper is to develop an approximate theory that may cope with more general genetic situations, owing to the introduction of an additional parameter characterizing a quantitative trait. The cases considered in the following involve variable contributions to the quantitative trait of a finite number of genetically unlinked loci. The derivations rely on the hypothesis that all distributions can be approximated by a Gaussian, following the method illustrated by Lande (1976) and Chevalet (1988). This makes it possible to define an analytical theory of selection with a model that seems less unrealistic than the usual infinitesimal hypothesis involving very many unlinked quantitative loci.
Two main qualitative predictions are derived from the Gaussian model: (i) a single parameter, which can be called the effective number of quantitative trait loci (QTL), is a good summary of the distribution of the variable contributions of QTLs to the genetic variance of the trait; and (ii) under continued selection the amount of genetic variability that is hidden by negative correlations between contributions of different loci can be calculated as a function of selection intensity, the current additive genetic variance, and the effective number of QTLs. In addition to analytical derivations, simulations were performed in order to evaluate the qualitative and quantitative importance of departures from normality.

GENETIC MODEL
We consider a diploid monoecious population of N reproducing individuals per generation, with L loci. Let

$$g^{(m)} = (g_1^{(m)}, \ldots, g_L^{(m)})', \qquad g^{(f)} = (g_1^{(f)}, \ldots, g_L^{(f)})'$$

be the genotypes of a male and a female gamete, respectively. The numbers $g_\ell^{(m)}$ and $g_\ell^{(f)}$ are defined as absolute effects of the genes carried at the corresponding loci $\ell$. These effects vary among the individuals of the population, and their joint distribution is assumed to be multivariate normal. Assuming symmetry between male and female contributions, any value is written in the following way:

$$g_\ell = \bar{g}_\ell + \gamma_\ell$$

where $\bar{g}_\ell$ is the mean value over gametes and $\gamma_\ell$ a residual.
Matings are assumed to be random, so that the variance covariance matrix of gene effects in new zygotes takes the form

$$\mathrm{Cov}\begin{pmatrix} g^{(m)} \\ g^{(f)} \end{pmatrix} = \begin{pmatrix} G & 0 \\ 0 & G \end{pmatrix} \qquad [1]$$

where $G = \mathrm{Cov}(g^{(m)}, g^{(m)\prime}) = \mathrm{Cov}(g^{(f)}, g^{(f)\prime})$ is the variance covariance matrix between gene effects of a gamete drawn from the reproducing individuals in the preceding generation.
The value of phenotype P in a zygote with $(g^{(m)}, g^{(f)})$ genotypic value is assumed to depend linearly on gene effects:

$$P = B'(g^{(m)} + g^{(f)}) + e$$

where B is a (L x 1) vector and e is the environmental deviation. Note that considering several vectors B allows several traits to be considered simultaneously.
The genotypic distribution among the zygotes is given by equation [1], so that the first 2 moments of a trait P are:

$$E(P) = 2B'\bar{g}, \qquad \mathrm{Var}(P) = 2B'GB + V_E$$

where $2B'GB$ is the additive genetic variance $V_A$ of the trait, and $V_E$ is the variance of environmental effects on the trait. Similarly, the genetic covariance between P and a second trait Q characterized by a vector C, is: $\mathrm{Cov}(P, Q) = 2C'GB$.
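As an illustration of the quadratic forms above, the following minimal Python sketch evaluates $V_A = 2B'GB$, the phenotypic variance $2B'GB + V_E$ and the genetic covariance $2C'GB$ for a hypothetical 3-locus example. The matrix `G`, the vectors `B` and `C`, the value of `V_E` and all function names are illustrative assumptions, not values taken from the paper. The sketch also computes the genic variance (the part of $V_A$ due to the diagonal of G only), so that the contribution of between-locus covariances appears as the difference.

```python
def quad_form(u, G, v):
    """Return u' G v for two vectors and a square matrix given as lists."""
    n = len(u)
    return sum(u[i] * G[i][j] * v[j] for i in range(n) for j in range(n))


def additive_variance(B, G):
    """Additive genetic variance V_A = 2 B'GB (two gametes per zygote)."""
    return 2.0 * quad_form(B, G, B)


def genic_variance(B, G):
    """Genic variance: sum of single-locus variances (diagonal of G only)."""
    return 2.0 * sum(B[i] ** 2 * G[i][i] for i in range(len(B)))


def genetic_covariance(B, C, G):
    """Genetic covariance between traits with weight vectors B and C: 2 C'GB."""
    return 2.0 * quad_form(C, G, B)


# Hypothetical gametic covariance matrix with negative between-locus covariances
G = [[0.50, -0.10, 0.00],
     [-0.10, 0.30, -0.05],
     [0.00, -0.05, 0.20]]
B = [1.0, 1.0, 1.0]    # trait P: every locus has unit weight (assumption)
C = [1.0, 0.0, -1.0]   # second trait Q (assumption)
V_E = 1.0              # environmental variance (assumption)

V_A = additive_variance(B, G)
V_a = genic_variance(B, G)
print("V_A =", V_A, " Var(P) =", V_A + V_E)
print("genic variance =", V_a, " covariance part =", V_A - V_a)
print("Cov(P, Q) =", genetic_covariance(B, C, G))
```

With negative off-diagonal terms in `G`, the additive variance is smaller than the genic variance, which is the situation discussed later for populations under directional selection.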
Under the Gaussian approximation, the genetic modifications induced by selection on the phenotype are calculated from the regression equations, and depend only on the first 2 moments of the phenotypic changes. Thus, the exact selection rule is not important. For example, truncation selection and stabilizing selection with a Gaussian fitness function yield the same predictions provided they are characterized by the same changes in the mean and variance of phenotypes. The relevant parameters are defined as follows, where subscript s refers to values after selection:
- the selection intensity i, relating the change in mean phenotypic value to the phenotypic standard deviation: $i = (E_s(P) - E(P)) / \sqrt{\mathrm{Var}(P)}$;
- the relative change in the variance $\kappa$: $\kappa = (\mathrm{Var}(P) - \mathrm{Var}_s(P)) / \mathrm{Var}(P)$.

Assuming that selection occurs among a large population of zygotes, the values of covariances between gene effects in the selected individuals are: [...] where $\kappa_e$ is defined as: [...] Then, taking account of gametogenesis, and $r_{\ell j}$ being the recombination fraction between loci $\ell$ and j, recurrence relationships between 2 successive generations (t) and (t + 1) can be derived for the mean and the variance covariance matrix of gene effects (Lande, 1976; Chevalet, 1988). [...]

The simulated model shares the same general hypotheses as the analytical scheme (same initial value of heritability, same distribution of the contributions of loci to the genetic variance), but is a completely discrete genetic model.
At each locus, a finite number of alleles are assigned additive effects that sum up to the breeding value of a zygote, to which a Gaussian random variable is added to simulate the environmental effect. The additive effects of alleles are drawn in the initial generation from a Gaussian distribution, and adjusted to yield the specified heritability and distribution of contributions among loci. The population size is described by 2 numbers: the number of zygotes; and the number N of selected adults. Truncation selection on individual phenotypic values is performed, and adults are mated at random (with selfing occurring with a probability of 1/N).
The genetic make-up of the gametes produced by the parents is generated using a pseudo-random-number generator to simulate Mendelian segregations.
Programs allow for various initial distributions of allelic effects within and across loci, several selection rules (truncation selection is used here), and various linkage relationships between loci (recombination fractions are fixed at 1/2 in the present work). Outputs from the program include, at each generation, the mean values and standard deviations over replicated runs of the following criteria: mean breeding value; genetic and genic variances; effective numbers of loci (equations [17] and [18] below) and of alleles per locus; mean homozygosity; proportion of fixed loci; and (for models assuming independent loci) the T parameter defined in the following (equation [21]). One hundred runs were performed for each considered case. Programs were written in Fortran 77 and were run on a UNIX machine.
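For truncation selection, as used in the simulations, the two selection parameters i and $\kappa$ can be obtained from the selected proportion under the standard results for a truncated normal distribution. The following Python sketch is an illustration only: it assumes a Gaussian phenotype and a large number of candidates, and the function name and the argument `p` (selected fraction) are hypothetical, not taken from the author's Fortran programs.

```python
from statistics import NormalDist


def truncation_parameters(p):
    """Selection intensity i and variance reduction kappa when the top
    fraction p of a Gaussian phenotype distribution is selected.

    Standard truncated-normal results: i = pdf(z) / p and kappa = i * (i - z),
    where z is the truncation point in phenotypic standard deviations.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    std_normal = NormalDist()            # mean 0, standard deviation 1
    z = std_normal.inv_cdf(1.0 - p)      # truncation point
    i = std_normal.pdf(z) / p            # standardized selection differential
    kappa = i * (i - z)                  # proportion of variance removed
    return i, kappa


for p in (0.5, 0.2, 0.05):
    i, kappa = truncation_parameters(p)
    print(f"p = {p}: i = {i:.3f}, kappa = {kappa:.3f}")
```

Both parameters increase as the selected fraction decreases, so stronger truncation removes a larger share of the phenotypic variance.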

ANALYTICAL DERIVATIONS
The effective number of QTLs
With equal contributions of unlinked loci, equation [9] leads to only 2 equations describing the change with time of 2 macroscopic statistics, the additive genetic variance $V_A$, and the genic variance $V_a$ (ie the sum of the variances contributed by the loci). Removing time indices (the asterisk denoting the next generation), the equations are (Chevalet, 1988): [...] where $h^2$ is the current value of heritability, $h^2 = V_A / \mathrm{Var}(P)$. The genetic variance $V_A$ can be written as: $V_A = V_a + D$, D being the sum of the contributions to $V_A$ of the covariances between gene effects at different loci.
In the case of unlinked loci, equation [9] takes 2 forms, for diagonal terms ($r_{\ell\ell} = 0$) and for non-diagonal terms ($r_{\ell j} = 1/2$). [...]

Directional selection for a trait due to the additive effects of several loci develops negative correlations between the contributions of distinct loci. In the statistical setting of the infinitesimal model, in which loci are not individually considered, this effect has been proven by Bulmer (1971) by considering the regression of the genotypic value on phenotypes after selection. In a very large population, and assuming initial linkage equilibrium, he derived the following recursion (a special case of equations [11] and [12]):

$$V_A^{(t+1)} = \tfrac{1}{2}(1 - \kappa h_t^2) V_A^{(t)} + \tfrac{1}{2} V_a$$

He also showed that after a few generations, an equilibrium structure arises, in which the genic variance $V_a$ remains equal to the initial genetic variance $V_A^{(0)}$ and the genetic variance is fixed at a reduced value dependent on selection strength.
The limit values are such that

$$V_A - V_a = -\kappa h^2 V_A \qquad [19]$$

Equation [19] gives the total amount contributed at equilibrium by negative correlations (ie linkage disequilibria) to the genetic variance.
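Bulmer's result can be checked numerically. The sketch below iterates the classical infinitesimal-model recursion $V_A^{(t+1)} = \frac{1}{2}(1 - \kappa h_t^2) V_A^{(t)} + \frac{1}{2} V_a$, starting from linkage equilibrium ($V_A = V_a$), and verifies that the limit satisfies $V_A - V_a = -\kappa h^2 V_A$. It assumes a constant environmental variance; the function names and the numerical values are illustrative assumptions and are not taken from the paper.

```python
def bulmer_step(V_A, V_a, V_E, kappa):
    """One generation of Bulmer's recursion (infinitesimal model, unlinked loci)."""
    h2 = V_A / (V_A + V_E)                     # current heritability
    return 0.5 * (1.0 - kappa * h2) * V_A + 0.5 * V_a


def bulmer_trajectory(V_a, V_E, kappa, generations=50):
    """Genetic variance over time, starting from linkage equilibrium (V_A = V_a)."""
    V_A = V_a
    trajectory = [V_A]
    for _ in range(generations):
        V_A = bulmer_step(V_A, V_a, V_E, kappa)
        trajectory.append(V_A)
    return trajectory


# Illustrative values: kappa = 2/pi corresponds to selecting the best half
V_a, V_E, kappa = 1.0, 1.0, 0.6366
traj = bulmer_trajectory(V_a, V_E, kappa)
V_eq = traj[-1]
h2_eq = V_eq / (V_eq + V_E)
print("first generations:", [round(v, 4) for v in traj[:6]])
print("hidden variance V_A - V_a:", V_eq - V_a)
print("-kappa * h2 * V_A        :", -kappa * h2_eq * V_eq)
```

In this setting most of the reduction occurs in the first few generations, in line with the statement that the equilibrium structure arises quickly.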
In the first generation, this result can be shown directly by a genetic analysis, under the hypothesis of the infinitesimal model, starting from a model involving multiallelic distributions if the initial population is assumed to be in Hardy-Weinberg equilibrium at all loci, and in linkage equilibrium for all pairs of loci. A more general treatment of the problem is proposed by Turelli and Barton (1990), based on the calculation of all the moments of distributions. However, unless special hypotheses are stated, their approach does not provide explicit recurrence relationships after the first generation. In the present model, the genetic variance decreases to zero as soon as L is finite when selection is active ($\kappa$ is positive), and if N is finite selection accelerates the fixation process (Chevalet, 1988). [...]

The recursion in T is obtained as follows (discarding time indices as before): [...] The numerator can be written as: [...] In the denominator, $V_A^*$ is written as: [...] The recursion in $(V_a, T)$ can then be derived using function F (equation [22]) and assuming either that the phenotypic variance is constant ($\mathrm{Var}(P) = V_P$), or that the environmental variance ($V_E$) is constant. In the latter case $\mathrm{Var}(P)$ and $\mathrm{Var}^*(P)$ are written as $F(V_a, T) + V_E$ and $V_A^* + V_E$ using expressions [22] and [23].
In the case of constant phenotypic variance $V_P$, we obtain the system: [...] Written in this way, it can be seen that $V_a$ is a slowly varying expression, for N and $L_e$ not too small, while T reaches the neighborhood of a limit $\hat{T}$ in a few generations: [...] This yields equation [20] above. In fact, as is done in the Appendix, we can show analytically that T reaches the neighborhood of $\hat{T}$ within 4 to 5 generations; after this first step, the convergence to $\hat{T}$ may be rather slow and depends on the relative values of $\kappa$, $L_e$ and N (numerical calculations). The same occurs for both models of phenotypic variance (constant phenotypic or environmental variances), with the same limit $\hat{T}$ and the same kind of convergence.
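To give a feeling for the size of the hidden variance, the sketch below solves the relation stated in the Summary, $V_A = V_a - \kappa (1 - 1/L_e) h^2 V_A$, for $V_A$ at a given genic variance. It is a hedged illustration under explicit assumptions: the phenotypic variance $V_P$ is held constant (so $h^2 = V_A / V_P$), the population is taken to be large, and $V_a$ is treated as fixed on the short time scale considered. The function name and numerical values are hypothetical.

```python
import math


def quasi_equilibrium_genetic_variance(V_a, V_P, kappa, L_e):
    """Solve V_A = V_a - c * V_A**2 / V_P with c = kappa * (1 - 1/L_e).

    This is the relation of the Summary with h^2 = V_A / V_P (constant
    phenotypic variance, an assumption of this sketch). The positive root
    of the quadratic c/V_P * V_A**2 + V_A - V_a = 0 is returned.
    """
    c = kappa * (1.0 - 1.0 / L_e)
    if c == 0.0:                 # no selection, or a single effective locus
        return V_a
    return V_P * (math.sqrt(1.0 + 4.0 * c * V_a / V_P) - 1.0) / (2.0 * c)


V_a, V_P, kappa = 1.0, 2.0, 0.6366   # illustrative values
for L_e in (1, 3, 5, 10, 20, 1000):
    V_A = quasi_equilibrium_genetic_variance(V_a, V_P, kappa, L_e)
    print(f"L_e = {L_e:5d}: V_A = {V_A:.4f}, hidden = {V_a - V_A:.4f}")
```

Under these assumptions the hidden variance vanishes for a single effective locus and approaches Bulmer's infinitesimal value as $L_e$ grows.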

An approximate complete solution
The analysis of the model can be further developed, owing to the reduction to 2 equations, and even to a single equation. Indeed, since T reaches its limit in a few generations, replacing $T^{(t)}$ by $\hat{T}$ in equation [21] [...] Integration gives the (scaled) time $u_2 - u_1$ corresponding to a reduction in the genic variance, from $V_a^{(t_1)}$ to $V_a^{(t_2)}$. The result is easily obtained by changing the variable $V_a$ to $W = F(V_a, T)$ (W is used here instead of $V_A$ to avoid confusion between the true value of genetic variance and its approximation). The differential equation becomes: [...] and the solution is: [...] which gives $W_2$ as an implicit function of $W_1$ and of $(u_2 - u_1)$. Using this approximation, an equation for the cumulated response to selection can be derived.
The cumulative response from time $t_1$ to time $t_2$ is: [...] Changing $V_A$ to its approximation W, and replacing the sum by an integral, we can write: [...] where [...] This integral can be readily calculated by rewriting equation [33] as: [...] so that, taking account of equation [34], $I(u_1, u_2)$ is: [...] The set of equations [34]-[36] provides an explicit solution to the problem, which can be completed by a further equation giving, from equations [10], the variance of the response due to sampling. The numerical results obtained with this continuous approximation agree very well with those derived from the iterative use of equations [11] and [15], provided the comparison does not involve the first generations when initial conditions assume linkage equilibrium ($V_A = V_a$). In the latter case, the continuous approximation underestimates the initial response to selection. Although it is derived under the hypothesis that N and $L_e$ are rather large, the approximation is still correct for values of $L_e$ as small as 5.
A similar analysis can be carried out for the model assuming a constant environmental variance, rather than a constant phenotypic variance.

DISCUSSION
The preceding calculations show that the dynamics of the multilocus system considered can be described by introducing a single additional parameter (the effective number of QTLs), as compared to the standard statistical setting of quantitative genetics. The result holds as far as only macroscopic properties of the system are considered, and for a limited number of generations, because the nonlinear features of the system (equation [9]) do not allow the derivation of a uniform upper boundary for the deviations of individual locus contributions from their mean values. This parameter, $L_e$, also allows the structure of genetic variance to be predicted, according to the generalization of Bulmer's result to a finite population and to a finite number of QTLs (equation [20]).

The number of QTLs
Recent results of QTL detection, mainly in plants, suggest a rather small number of loci contributing a significant part to the genetic variance of quantitative traits. This does not mean that a few genes are involved in the make-up of the trait, but that only loci contributing a rather large genetic variance can be detected by segregation analysis. A simple way to describe the distribution of individual contributions of loci may be to consider them as pertaining to a geometric series of ratio x (x < 1). Considering the case of an initial generation, in which selection has not yet developed correlations between loci, the individual contribution of locus i can be written as:

$$v_i = v_1 x^{i-1}, \qquad i = 1, \ldots, L$$

so that the total genetic variance is

$$V_A = \sum_{i=1}^{L} v_i = v_1 \frac{1 - x^L}{1 - x}$$

From equation [17] this yields:

$$L_e = \frac{(1 + x)(1 - x^L)}{(1 - x)(1 + x^L)}$$

giving quite low values that hardly depend on the actual number L. For example, this equation gives $L_e$ about 3 for x = 0.5, $L_e$ about 10 for x = 0.8, and $L_e$ about 20 for x = 0.9. If an arithmetic series is assumed, the ratio $L_e/L$ remains larger.
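The orders of magnitude quoted for the geometric series can be reproduced with a few lines of Python. The sketch assumes that the effective number of equation [17] takes the usual form $L_e = (\sum_i v_i)^2 / \sum_i v_i^2$, where $v_i$ is the contribution of locus i to the genetic variance; this form is an assumption here, chosen because it reproduces the values given in the text, and the function names are hypothetical.

```python
def effective_number(contributions):
    """Effective number of loci, assumed here to be (sum v)^2 / sum(v^2)."""
    total = sum(contributions)
    return total * total / sum(v * v for v in contributions)


def geometric_contributions(x, L):
    """Contributions of L loci in geometric progression of ratio x (first term 1)."""
    return [x ** k for k in range(L)]


def geometric_closed_form(x, L):
    """Closed form of the effective number for a geometric series of ratio x."""
    return (1 + x) * (1 - x ** L) / ((1 - x) * (1 + x ** L))


for x in (0.5, 0.8, 0.9):
    for L in (20, 50, 200):
        L_e = effective_number(geometric_contributions(x, L))
        print(f"x = {x}, L = {L:3d}: L_e = {L_e:6.2f} "
              f"(closed form {geometric_closed_form(x, L):6.2f})")

# Equal contributions: the effective number equals the actual number of loci
print(effective_number([1.0] * 7))
```

For large L the closed form tends to $(1 + x)/(1 - x)$, ie 3, 9 and 19 for x = 0.5, 0.8 and 0.9, in line with the rounded values quoted above.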
Writing: [...] yields: [...] Simulation runs were performed with different actual numbers L of loci sharing the same value of $L_e$, according to equations [38] and [40] (uniform distribution, arithmetic, and 2 geometric distributions). They result in parallel evolutions of the genetic variance $V_A$ (fig 1), as well as of the genic variance $V_a$ and the cumulative response to selection (not shown).
The validity of this parameter as a predictor of the dynamics of the system depends on it being constant over generations. Numerical calculations do not indicate any significant departure of $L_e$ from its initial value. Simulation results show how $L_e$, as estimated each generation from the simulated data (from equation [17]), changes with time (fig 2). Unless the size of the population is quite small and selection intensity is high, it is seen that $L_e$ is slowly varying; at least, comparison with figure 1 indicates that this parameter changes with time more slowly than the genetic variance. The changes of $L_e$ with time occur after a first period during which it is almost constant, in the cases when the initial distribution of contributions of loci to the genetic variance is either uniform (leading to a decrease of $L_e$) or highly variable (geometric series with ratio x < 0.8), in which case $L_e$ increases. Conversely, slightly variable contributions (arithmetic series, or geometric series with x about 0.9-0.95) lead to quite stable $L_e$ values. Changes are more significant as selection intensity is greater. Moreover, it seems from simulations that this parameter may be very sensitive to population size, suggesting that a large effective number of QTLs cannot segregate simultaneously in a small population under strong selection. Thus, the approximate analytical derivations, as well as the simulations, indicate that $L_e$ is a significant parameter. Even if the absolute values obtained for these quantities do not generally fit theoretical predictions very well, the previous results indicate that $L_e$, as defined in equation [17], controls the sensitivity of the genetic system to the number of QTLs. Moreover, it is a parameter that can be estimated. Firstly, the present definition of $L_e$ is the same as that obtained with the classical design using a cross between inbred lines, and comparing the variance in $F_2$ to the differences of parental lines (Wright, 1968; Lande, 1981).
Secondly, methods based on segregation analysis of many linked genetic markers will provide data giving the distribution of the most important QTLs segregating in populations and their contributions to the genetic variance. Even if these results provide new tools for selecting individuals on a true genetic basis, knowledge of $L_e$ remains of interest for analysing performances in populations, as soon as selection has been practised.
Genetic structure under selection
Equation [20] shows that under continuous selection, a genetic structure arises that characterizes an internal equilibrium between selection and recombination. In fact, this relationship holds exactly in the equilibrium state considered by Lande (1976) in the framework of a model involving selection and mutations, when recombination fractions are set to 1/2. It can be also seen as a kind of quasi-linkage equilibrium, a situation encountered in more general models under weak selection (Barton and Turelli, 1991). In contrast with these exact but asymptotic results, the present result holds very early in the process of selection (fig 3 and Appendix), then holds during the whole process. Moreover it does not require weak selection. It is approximate but quite accurate, and probably more meaningful in the context of experimental or applied quantitative genetics than asymptotic results which may be more relevant to evolutionary problems. This relationship is also dependent on the effective number of QTLs. It is illustrated in figure 3, in which theoretical predictions are compared to the results of simulations. Firstly, it turns out that Gaussian predictions of genetic variances are satisfactory during several generations, and more so as population size is greater and selection intensity lower. It seems however that the theory underestimates the amount of 'hidden' variance in small populations, which is expressed by an estimated value of T larger than its theoretical value. Even with very few QTLs, approximations are good for large population sizes and medium selection intensity, the greatest departures between analytical predictions and simulations appearing for high selection intensities. Secondly, the prediction concerning the organization of genetic variability, as given by equation [20], is more robust than that of genetic variances themselves.
The agreement between theory and simulations is observed during more generations, although an increasing variability of the estimated T parameter is observed when significant departures between theoretical and observed variances arise (the estimated variance of T between replicates then shows a sharp increase). Such a structure of genetic variability under selection implies some changes in the partition of genetic variance among groups of related individuals. For example, within the framework of the simple population model considered until now, the partition of genetic variance into full-sib covariance ($C_{FS}$), and within-family variance ($V_w$, which is equal to half the genic variance for unlinked QTLs), can be written as follows, in the population of new unselected zygotes: [...]

CONCLUSION
In this paper, the genetic model is restricted to a set of unlinked loci. Although it may be stated that any trait is due to many loci spread along the chromosomes, considering a finite set of unlinked loci may not be completely unrealistic. The first experimental results on the localization of QTLs in tomato suggest that a few main regions are involved, and that these regions may in fact include several closely linked genes. The system made up of unlinked loci with many alleles (modelled by a Gaussian distribution of effects) may thus be a correct first-order approximation of a genetic system involving several clusters of genes on different chromosomes, provided no long-term prediction is required. Indeed, simulation runs involving either several unlinked loci with many alleles taken from a normal distribution, or several clusters of tightly linked loci with only 2 alleles, lead to very similar responses to directional selection (not shown).
However, this is no longer true in later generations, when many recombination events within clusters generate 'new' alleles, and lead to responses that are unpredictable in the framework of the present theory (Hospital, 1992; Hospital and Chevalet, 1993).
The use of the Gaussian approximation made it possible to investigate here the effects of the distribution of quantitative loci on the response to selection.
It is well known that Gaussian distributions are not robust under Mendelian segregation (Felsenstein, 1977), and that, even if a Gaussian distribution is a correct approximation at some time, selection promotes asymmetrical distributions needing higher moments to be included in the recurrence relationships (Turelli and Barton, 1990). When looking at the simulation results, we can see that equation [20] remains correct, but that the reduction of the genic variance $V_a$ under selection is underestimated. This suggests that the Gaussian expression of regressions of genotypes at individual loci on phenotypes is the main incorrect step of the approximation. In fact it is clear that (in a true genetic model) selection shifts allele frequencies and develops asymmetrical distributions which cannot be handled in the framework of Gaussian distributions. Developments of distributions in Gram-Charlier series could be investigated, but such developments would not allow the joint distributions at several loci to be made explicit.
The interesting result obtained here is that several qualitative features of the genetic system are correctly taken into account by the Gaussian approximation. Mainly, the analysis introduces a new macroscopic parameter (the effective number [...]

APPENDIX
Firstly, equation [26] is rewritten with the new variable: [...]
Then it is shown that the series $|U_t|$ is bounded by a positive series $Z_t$ converging rapidly to a small value.
The following notation is introduced for the current value of heritability: [...] The function $(Z \mapsto \varphi(Z))$ is positive and monotone increasing for (0 < Z < 1); its value for 0 is about a/2. There are 2 roots of the equation $(Z = \varphi(Z))$ in [0,1]; one is about a, and the other can be shown to be larger than 1/2, provided that $\kappa$ is not too large (for large N and $L_e$, the condition is: $\varphi(1/2) < 1/2$ if $\kappa$ < 24 -4). Moreover, the derivative of $\varphi$ at the smallest root is about 1/2, so that any series $(Z_t)$ with an initial value such that $0 < Z_0 < 1/2$ converges rapidly to this root. Then it follows from equation [49] that, if for some time v the inequality $|U_v| < 1/2$ holds, the series $(|U_{v+t}|)$ is bounded by the series $(Z_t)$ defined by: [...] and that the series $T^{(t)}$ reaches the neighborhood of $\hat{T}$ within 4-5 generations.