Prediction of the response to a selection for canalisation of a continuous trait in animal breeding

Prediction de la reponse a une selection canalisante d'un caractere continu en genetique animale. Le probleme de la selection canalisante est traite grâce a un modele heteroscedastique mettant en jeu une valeur genetique pour la moyenne et une valeur genetique pour le logarithme de la variance, toutes deux associees a une seule valeur phenotypique. Pour un objectif de selection visant a minimiser l'esperance des carres des differences entre le phenotype et l'optimum, pour un descendant d'un candidat a la selection, des index sont estimes et des expressions approchees de la regression parent-descendant sont calculees. La precision de ces expressions analytiques est mesuree a l'aide de simulations. Afin d'apprehender la capacite de ces populations a etre canalisees vers un optimum economique, des exemples sont donnes: le rapport entre matiere grasse et matiere proteique du lait de chevre, et le pH d'un muscle chez le porc.

Résumé -Prédiction de la réponse à une sélection canalisante d'un caractère continu en génétique animale. Le problème de la sélection canalisante est traité grâce à un modèle hétéroscédastique mettant en jeu une valeur génétique pour la moyenne et une valeur génétique pour le logarithme de la variance, toutes deux associées à une seule valeur phénotypique. Pour un objectif de sélection visant à minimiser l'espérance des carrés des différences entre le phénotype et l'optimum, pour un descendant d'un candidat à la sélection, des index sont estimés et des expressions approchées de la régression parent-descendant sont calculées. La précision de ces expressions analytiques est mesurée à l'aide de simulations. Afin d'appréhender la capacité de 1. INTRODUCTION Production homogeneity is an important factor of economic efficiency in animal breeding. For instance, optimal weights and ages at slaughtering exist for broilers, lambs and pigs, and the breeder's profit depends on his ability to send large homogeneous groups to the abattoir; optimal characteristics of meat such as its pH 24 h after slaughtering exist but depend on the type of transformation; ewes lambing twins have the maximum profitability while single litters are not sufficiently productive and triplets or larger litters are too difficult to raise; with extensive conditions where food is determined by climatic situations, genotypes able to maintain the level of production would be of interest. Hohenboken [22] listed different types of matings (inbreeding, outbreeding, top crossing and assortative matings) and selection (normalising, directional and canalising) which can lead to a reduction in trait variability.
Stabilisation of phenotypes towards a dominant expression has been known for a long time as a major determinant of species evolution, similarly to mutations and genetic drift (e.g. [4] for a review). Different hypotheses explaining these natural stabilising selection forces have been proposed (2, 3, 8, 15, 16, 19, 27, 38, 45-47, 49, 52!. A number of models assume that trait stabilisation is controlled by fitness genes (e.g. [9] for a review), which keeps the mean phenotype at a fixed 'optimal' level, without a necessary reduction of the trait variability. Alternative hypotheses were proposed for canalisation; for instance Rendel et al. [32,33] assumed that the development of a given organ is under the control of a set of genes, while a major gene controls the effects of these genes within bounds to keep the phenotype roughly constant. Whatever its origin stabilisation is to be related to the environment(s) in which it is observed, which makes it essential [48] to distinguish stabilisation of a trait in a precise environment (normalising selection) from the aptitude to maintain a constant phenotype in fluctuating environments (canalising selection).
Various artificial stabilising selection experiments have been carried out with laboratory animals: drosophila [17,23,29,30,34,40,41,44,48], tribolium [5,6,24,43] and mice [32]. Most often, selection was of a normalising type with a culling of extreme individuals, this selection being applied globally [5,29,30,41,43,44], within family [24] or between family [6,34]. Canalising selection was experimentally applied by Waddington [48] and by Sheiner and Lyman !40!, their rule being the selection of individuals less sensitive to breeding temperature and by Gibson and Bradley [17] who applied a culling of extremes in a population bred in unstable environment (fluctuating temperature). Some general conclusions from these experiments may be proposed: 1) very generally, stabilising selection is efficient, leading to a strong diminution of phenotypic variance; 2) heritability estimations during and at the end of the selection experiments often showed that the selected trait genetic variance decreased, this conclusion not being general; 3) in many cases it was possible to prove that the environmental variance, or the sensitivity of individuals to environmental fluctuation, was reduced by selection.
In this paper we investigate mathematical tools for the evaluation of the possibility and efficiency of organising canalising selection in animal populations. Existence of a genetic component in variance heterogeneity between groups is a prerequisite for such a selection goal to be feasible. Statistical modelling and estimation procedures have been developed to take account of variance heterogeneity (e.g. [10, 11, 35, 36!), in particular using a logarithmic link between variances and predictive parameters [12, 13, 39!. In the following, we extend such models by introducing a genetic value among these parameters, consider the possibility of estimating this new genetic value, then discuss the efficiency of selection based on this model. Although our objective is to apply such methodologies to continuous and discrete traits, we first concentrate here on continuous traits. Applications to artificial canalising selection towards an economic optimum in goat and pig breeding are given.
2. GENETIC MODEL 2.1. Building of a model Our approach was motivated by the extensive literature mentioned in the Introduction, and in particular the paper of Rendel and Sheldon [34] shows that artificial canalising selection does work, in the sense that the population mean reaches the optimum and, more importantly, the environmental variance is reduced. Some individuals are less susceptible to environment than others, this particularity being genetically controlled, since it responds to selection. Some genes are now known to control variability, e.g. the Apolipoprotein E locus [31] in humans, the Ubx locus in Drosophila [18], the dwarfism locus in chickens (Tixier-Boichard, pers. comm.), and some (aTLs with effects on variance are already suspected !1!. Like Wagner et al. [50] in their equation 7, the effect of polymorphism at a given locus on the environmental variance may be expressed by a genotypedependent multiplicative factor for this variance. The same hypotheses (in particular no interactions between genes) and reasoning as in the Fisher model allow the previous one-locus model to be extended to a polygenic or infinitesimal model, in which each individual has a genetic value governing a multiplicative factor for the environmental variance. Since the analysis needs the evaluation of phenotypic variances associated with genetic values, it must be based on experimental designs allowing for the repeated expression of the same or of closely related genetic values. Although not necessarily efficient, any population scheme might be considered, but we focus here on two simple situations, repeated measurements on a single individual, and evaluation of one individual from the performances of its offspring.

Animal model: basic model
A model linking a phenotype y j of a given animal (from repeated phenotypes y = (!1, ..., yj , ... , yn ) ) with two genetic values u and v is considered. According to the infinitesimal model of quantitative genetics, these genetic values u and v, possibly correlated, are assumed to be continuous normally distributed variables, and contribute to the mean and to the logarithm of the environmental variance. The simplest version of the model can be written as: where p is the population mean and the population log variance mean, while: and the Ej s are independent identically distributed N(0, 1) Gaussian variables, independent of u and v. Additive genetic variances are denoted by afl and a V ' 2 and r is the correlation coefficient between u and v. The distribution of the conditional random variable Ylu, v is Gaussian ./1!(! + u, exp(! + v)), but the unconditional distribution of Y is not. The unconditional mean and variance (the phenotypic variance or y 2 of the random variable Y are equal to Note that the v genetic value and its variance o, are dimensionless; exp( 77 ) has the same units as the phenotypic variance, and exp(w/2) is the average (genetic) scale factor of the environmental variance.

Animal model: extensions
More general formulations of the model are needed to cope with real situations. First, introducing permanent environmental effects (denoted by p and t) common to several performances of the same individual is necessary to take account of non-genetically controlled correlations, both on the mean value -as it is usual to deal with repeatability -and on the log variance of the within performance environmental effect. Thus, the jth performance of an individual is modelled as: where (u, v), (p, t) and follow independent Gaussian distributions: the bivariate normal (2), a similar bivariate distribution with components o, 2, at and correlation p, and A!(0,1), respectively. When q individuals are measured in several environments, a more general heteroscedastic model can be stated as: ,-/ where yg is the jth performance of a particular animal in a particular (animal x environment) combination i. This full model (6) is a generalisation of model (1) introducing environmental and genetic parameters to be estimated: location parameters ({3, u, p) and dispersion parameters (6, v, t) with incidence matrices (x i , z i , z i ) and (q i , z i , z i ), respectively. Vectors u, p, v and t have the same length q. !3 and 6 denote fixed effects, while u, v and p, t are random genetic and random permanent environmental effects attached to individuals, respectively. The vectors of genetic values u and v have then a joint normal distribution: where &copy; denotes the Kronecker product and A is the relationship matrix between the animals present in the analysis. Permanent environmental effects p and t are similarly distributed as: where I is the identity matrix, independently of (u, v).
This general way of setting up the model needs, however, some caution when applied to actual data, to assess which parameters are estimable, taking account of the structure of the experimental design. Specifically, analysing a possible genetic determinism of heteroscedasticity needs a sufficient number of repeated measures to be available for the same (or related) genotypes.

Sire model
In a progeny test scheme, the phenotypic values attached to an individual are the performances of its offspring. From the previous animal model, the performance y2! of the jth offspring of sire i can be written as follows, conditional on the genetic values u i and v i of the sire and assuming unrelated dams: It is assumed here that the terms aZ! and { 3 ij include the genetic effects in offspring not accounted for by the part transmitted by the sire. Permanent environmental effects in the offspring (the p and t variables of model 5 are possible. This can be rewritten as with E'( Etj ) = 0, Var(e!) = 1. The distribution of e! is only approximately normal N (0,1). Models (9) and (10) are not strictly equivalent, but, since the first two moments of y j are equal under both models, they are equivalent in the sense of Henderson [21] (see e.g. [37] for an application of this concept). For example, for large numbers of offspring per sire, the mean sire's performances and sample within sire variances have asymptotically the same structure of variances and covariances between relatives under both models.
The corresponding generalised approximate sire model is written as with the joint densities (7) for u and v, and (8) for p and t.
Methods needed to estimate parameters are outlined in Appendix A. In particular, they allow the genetic values of individuals to be estimated, as the conditional expectations of genetic values, given observed phenotypes y: h = E(u!y) and v = E(vly), if variance components are known. Estimation of variance components was similarly developed to make the method possible to apply.
In the following we first focus on developments of the basic model, which is simple enough to derive approximate analytical predictions of the response to selection and to compare several selection criteria. In a second step we check the validity of the theoretical approach by means of simulations and test the ability of the extended models and corresponding numerical procedures to tackle actual data and evaluate the potential for canalising selection.

Objective and criterion
One objective that summarises the breeding goal (progeny performances close to the optimum and with low variability around it) is the minimisation of the expected squared deviation of offspring performances from the optimum y o . This is the one we have chosen. For an individual characterised by a set y of performances (on itself and on its relatives), a selection criterion is defined as the expectation of the squared deviation E !(Yd -yo)2lyJ of offspring performance Y d , conditional on y, and selection will proceed by keeping individuals with minimal values of this index, such that: is lower than a threshold t(z) depending on the chosen selection intensity t.
In classical linear theory, it is equivalent to giving an individual a merit with respect to the selection objective, defined as the expectation of its offspring performance, or to consider its genetic value u, since the former is just equal to half the latter. Breeding animals are ranked according to their estimated genetic value.
In the present context, due to the non-linearity of the model, we define, for a candidate to selection with given genetic values u and v, its merit for canalising selection as the expected squared deviation of an offspring performance: Its conditional expectation E(M * ly) is equal to the index With complications due to the non-linear setting of our model, we derive in the following the mean and variance of an individual's phenotype distribution, conditional on the performances of a relative.

Conditional mean and variance
We need the distribution of a phenotype Y d of a progeny d, given perfor- where a is the relationship coefficient between animals F and d (a = 0.5 if d is the progeny of F).
The density f (yd!y) describing the distribution of Y d , conditional on y cannot be explicitly derived, but its moments are calculable or can be approximated. We have: This is first integrated over y d , owing to then with respect to u d and v d with and finally the distribution of u and v conditional on y is approximated as: General formulae can be derived that take into account all performances of the whole pedigree, not only performances of a single relative. The explicit forms of the extensions of equations (18) and (19) are given in Appendix B.
The combination of equations (18) and (19) gives the index I * (y) in equation (14), equal to the conditional expectation E(M*!y) of the genetic merit M * , as in Goffinet and Elsen !20!. can be considered as a genetic merit referring to a candidate's own value and not as in equation (13) to that of a future offspring. The mean fitness of the population is the proportion of selected individuals: where We obtain the distribution of genetic values among selected parents: Since genetic values are transmitted linearly to the offspring, the genetic responses to selection, R(w,u) and R(w,v), are the differences of expected genotypic values u and v, respectively, between candidates and selected individuals (assuming that selection occurs in a single sex, only half of this progress is transmitted to the next generation): where wg refers to equation (26). The  This leads to, if r = 0, where V stands for exp(? l + a!j2). If genotypes are correlated, an extra term 2rau av V (p,y o + 4 ra u av) is added to the numerator, and 4(l + n)r OUUV V 2 1 -tyo + ! 2 rauav ) is added to the denominator.
The response to selection can be written as: i.e. as the product of selection intensity (1 = ! ) , of a realized heritability, the B tf7 ratio b(w, II) defined in equation (34), and of the standard deviation Qn of the selection index.

Sire model
As for the individual model, the genetic merit for the sire model is defined as: and the fitness The expectation E(M) = E[I(Y)] is the same as given in equation (27). The response to selection in the trait II(Y) among male parents is and the selection differential is The regression coefficient b giving the response to canalising selection in a progeny test scheme is equal to the ratio of (36) to (37). Figure 1 plots the response given in equation (36) in units of selection intensity and phenotypic variance, from an equation similar to equation (35).

Extensions
The previous exact results, obtained using the fitness function (23) and analogous for the sire model, hold for weak selection, and their expressions as ratios of a covariance to a variance indicate that they can also be obtained from a linear approximation. This comment makes it possible to extend easily the approximate prediction of response in cases when different weights are given to the variance of performances and to their deviation from the optimum. Considering the animal model with repeated measurements (5), let us denote II 1 ( Y ) = (y -YO ) 2 , II 2 ( Y ) = Sy, the two components of II = (II l (y),II 2 ( Y ))&dquo; s = ( 81 , S2 )' a vector of selective values, a = (cr l , cr 2 )' a vector of weights. We are interested in the response for the trait a'II, when using the index s'll as selection criterion. The parent-offspring regression is equal to where G and P are 2 x 2 symmetric matrices of elements introducing the following notations h2 = or2 2 , c 2 -(or2 + 2 2 A = (y -y o )lay. From equation (38), parent-offspring regressions for the mean and for the variance can be written separately. With s i = 0 and a 1 = 0 for instance, b tends to as n tends to infinity and if at' = 0. This parent-offspring regression is lower than a half, and tends to 1/2 as afl tends to zero.
Note that the parent-offspring regression for y is which tends to 1/2 as n tends to infinity and if Q p = 0. j-i index, then the variance term P!2 = Var(II2 2 ) is proportional to When o, = 0, the response in IIZ is null and the selection differential is equal to 2/(n -1), taking into account n -1 degrees of freedom. For n = 2, it corresponds to the variance of the trait (Y -y) 2 (squared deviation from the mean), up to a multiplicative term. When afl = 0, the response in Y is null and the selection differential is equal to V/n, More generally, this extension shows that a selection index (weights s = (s l , s 2 )') can be adjusted to optimise the response in a given objective specified by weights a = (a l , a 2 )'.

Simulations
Simulations were used to check the accuracy of previous analytical expressions of response as proposed in equations (34)  Simulations were restricted to the case of the sire model with no genetic correlation (r = 0).

Selection scheme
The selection scheme was as follows.
1) Genetic values of sires and dams of the base population were randomly drawn from the joint distribution (7) with no relationships (A is the identity matrix), giving the sets {(ui, vi), i = 1, ... , ,S} for the sires, and (u j , v j ), j = 1, ... , D} for the dams.
2) Sires and dams were mated at random. 3) For each couple (i,j), the performance y2! of a daughter was generated according to: where Etj , aZ! and {3 ij were drawn from the Gaussian distributions N(0,1), jV(0,o'!/2) and N(0, a!/2), respectively. The terms a2! and {3 ij represent Mendelian sampling. 4) An index for each sire was computed and elite sires were selected. 5) The elite sires produced S sons with the same female cohort used in steps 1-2. Step 2 (with sons of step 5 and daughters of step 3) to step 5 were repeated until the 10th generation.  Figure 3 plots the evolution of phenotypic means and standard deviations over generations of canalising selection. Several aspects appear: -with a high heritability h!, the population mean tends in a linear manner towards the optimum in a very efficient way; -the convergence of the mean is slightly better if w is low; -the decrease in phenotypic variance has a linear tendency, although more fluctuating than the evolution of the mean; -this decrease is even more evident as Q v is higher and h2 is lower. This general balance was encountered throughout the simulation experiments: a particular aspect was maximally improved when the other aspects were not under selection pressure. Variances are best reduced when the population mean is at the optimum. The optimum is more rapidly reached when no genetic variability of the variances is present. Figure 4 compares the performances of the two indices I and T. The likelihood based index gives more efficient results for the trait mean p t , probably because heterogeneous variances were taken into account in the evaluation of the animal genetic values u, giving less biased estimates. On the contrary, the phenotypic variance QY is best reduced with the simplified index, presumably due to the lack of robustness of v estimation by maximum likelihood.
A full Bayesian estimation procedure with marginal posterior expectation of parameters might be more appropriate. It was nevertheless not performed because of the heaviness of the algorithm, since numerical integrations are then needed. The two indices give, however, equal values of the global criterion (p't -yo) 2 + 0 ,2 yl t at any time t.
The phenotypic variance and squared difference between mean and optimum are lowered more and more as selection intensity is increased, while the parentoffspring regression remains constant in the simulations as in the approximate theory (not shown). The introduction of a log linear model is an easy way to handle a multiplicative model on the variance. It is known that the distribution of InS 2 , the logarithm of the sample variance estimator, is approximately normal (e.g. !25!). Similarly, Bayesian considerations on prior/posterior densities show that the Gaussian distribution is a good approximation to a log inverted chi-square (see [13]). This led us to focus all analytical derivations on the first two moments of distributions, assimilating when needed any distribution to the Gaussian distribution sharing these same moments. Although this may be a crude approximation if it is used for prediction of genetic response over several generations, it allows first order solutions to be derived, and makes it possible to build statistical evaluation procedures.
The model allows estimation of the importance of genetic determinism in the heterogeneity of variances, and hence prediction of how the population may respond to selection against variability. For example, the proportion of the selection response due to the genetic variability in the v-component is given by the ratio where the Gs are given in equation (40). It is all the more important as the population mean is closer to the optimum, the u-genetic variance is lower, and the v-genetic variance is larger.
Estimation of genetic parameters (or u 2, r, av 2) may be somewhat imprecise, especially for u2 and r. Hence it may be worth considering the robustness of predictions with respect to badly known parameters. As far as a simple global criterion is used, the question can be dealt with easily, considering the expected responses as functions of parameter values. The situation would be more difficult to handle for selection schemes that would rely on the knowledge of parameter values, for example if a balance between selection for the mean or for the variance were adjusted each generation.

Data
The generalised version of the sire model (11), including fixed and random permanent environmental effects, was applied to actual data in goats (dairy production) and in pigs (pH of muscles after slaughtering).

Milk data
Protein and fat contents were measured on milk from 2 383 first lactation goats between 1992 and 1995. The goats were daughters of 54 artificial insemination sires, with 20 observations at least in the data set. The trait of interest is the ratio of fat to protein contents, with a desired optimum equal to 1.3. This objective would be complementary to yield traits such as milk yield or protein yield. The phenotypic mean and variance are equal to 1.1 and 0.0135, respectively, i.e. the population mean is 1.7 phenotypic standard deviations away from the optimum. Data are normally distributed. For computational ease, data were pre-corrected with the additive model including herd, season, lactation length and age, on a much larger data set including all lactations of all herds where the 2 383 kept daughters had been producing. The variance components were estimated, leading to a null correlation coefficient (r -0) and zero variability variance (Qv -0), and a heritability h! = 0.44 of the same order as those for the protein and fat.
A canalising selection experiment is expected to drive the population mean rapidly towards the optimum, but without change in environmental variance. For example, assuming selection of the best 10 % of sires, a reduction of 1.5 phenotypic standard deviations of the population quadratic deviation (!c-yo)2 2 would be expected in one generation. 5  With an optimum value y o = 5.7 not different from the overall mean p = 5.75, the estimated variance components should allow a high response to canalising selection to be obtained through a strong reduction of the genetically controlled part of environmental variance: assuming that selection sorts out the best 10 % of male parents, a reduction of about 12 % of the initial phenotypic variance in one generation. A null correlation would give a reduction of 11 % (figure 1).
It must be stressed that predictions derived from the above analysis of fat to protein ratio in goats and of pig pH muscle data are only indicative. For example, the effect of a wrongly estimated correlation value r remains to be assessed, even ifin the goat exampleno significant genetic component of variance was found for variances. Also, although precision of the previous early estimates was not evaluated, larger data sets are probably needed. A proper prediction of expected response to selection cannot be proposed until these analyses are carried out.
So far we do not have results from an actual selection experiment, based on our index selection rules, which would be necessary to completely validate the approach through the comparison of observed realised heritabilities with our predictions. It is one of the perspectives of the current work to organise such selection experiments.

Selection criteria .
We have considered a single global criterion that combines selection for the mean and selection against the variance of the trait.
Shnol and Kondrashov [42] considered the action of selection with fitness w(y) on a quantitative trait y. They concluded that truncation selection minimises the genetic load and the variance of the trait after selection. Linear selection (corresponding to our continuous fitness with low selection) gives minimal variance of the relative fitness and is less efficient than truncation selection. However, linear selection gave us the opportunity for robust analytical approximations of realised heritability. Calculations were impossible for truncation selection, even with the simpler index. Within the limits of the present comparisons with simulations, the fitness approximation proved useful, even in cases with strong departure from linearity, and with a rather strong selection intensity (proportion of selected individuals equal to 20 %).
More sophisticated selection criteria may be defined, allowing selection to be differentially directed towards changing the mean value of the trait or reducing the environmental variance. In fact using a global genetic merit to be maximised in the next generation is a way to distribute selection intensity between both parameters. It is possible that a higher multi-generation response could be obtained if selection were controlled each generation in view of the objective. For example, the index (y -YO ) 2 + S; can be generalised into 81(Y -YO ) 2 + S2'S'!, allowing a greater selection pressure either on the location near the optimum, or on the dispersion, as illustrated in the above theoretical section. The same remark is available for the index (.E(y!y) &mdash; y o ) z + Var(Yd!y). More generally, the selection criterion might be based on the economic worth of offspring. The criterion would then be defined as the expected economic value of offspring, a function depending on the distribution of expected phenotypes and on the economic value of phenotypic values. But of course other types of indices and mating systems are potentially interesting to consider, for instance a linear index when Q v is small, mate selection or group selection.
Managing the balance between location and scale could be interesting in a long-term selection process, provided some analytical approximation is available in order to include one-generation expected response in a dynamic programming approach. Evaluation of the approximation for mid-term objectives remains, however, to be considered. While the present paper focused on shortterm selection (one generation), such developments would require some analytical approximation of the response during several generations. At variance with the present work, changes in genetic variances and covariances should be taken into account. Further research is needed in this area, keeping in mind that the approach used, according to which most distributions are approximated by the Gaussian ones that share the same first and second moments, is known to be a rather poor approximation in genetic models as soon as multi-generation problems are considered. It may be, however, a useful approach for predictions over five to ten generations ( !7! ).
Another extension of this work concerns discrete characters. For example, a concrete demand of sheep breeders is obtaining exactly two lambs per lambing, with reduced variability around this economic optimum (SanCristobal-Gaudy et al., in prep.). More generally, the innovation of this work -the introduction of two groups of polygenes, possibly not independent, acting respectively on the trait mean and log variance -could be useful in other areas of applied quantitative genetics in which heterogeneities of variance arise. Also, while the two sources of genetic variability were studied within the framework of the infinitesimal model, extensions might include major genes which control either the mean or the variability of a trait. For example, using the present setting, a segregation analysis could be conducted to decide whether polygenes and/or major genes act on the log variance, as was carried out for the mean [26].