Bayesian analysis of genetic change due to selection using Gibbs sampling

Summary - A method of analysing response to selection using a Bayesian perspective is presented. The following measures of response to selection were analysed: 1) total response in terms of the difference in additive genetic means between last and first generations; 2) the slope (through the origin) of the regression of mean additive genetic value on generation; 3) the linear regression slope of mean additive genetic value on generation. Inferences are based on marginal posterior distributions of the above-defined measures of genetic response, and uncertainties about fixed effects and variance components are taken into account. The marginal posterior distributions were estimated using the Gibbs sampler. Two simulated data sets with heritability levels 0.2 and 0.5 having 5 cycles of selection were used to illustrate the method. Two analyses were carried out for each data set, with partial data (generations 0-2) and with the whole data. The Bayesian analysis differed from a traditional analysis based on best linear unbiased predictors (BLUP) with an animal model, when the amount of information in the data was small. Inferences about selection response were similar with both methods at high heritability values and using all the data for the analysis. The Bayesian approach correctly assessed the degree of uncertainty associated with insufficient information in the data. A Bayesian analysis using 2 different sets of prior distributions for the variance components showed that inferences differed only when the relative amount of information contributed by the data was small.

Inferences are based on marginal posterior distributions of the above-defined measures of genetic response, and uncertainties about fixed effects and variance components are taken into account. The marginal posterior distributions were estimated using the Gibbs sampler. Two simulated data sets with heritability levels 0.2 and 0.5 having 5 cycles of selection were used to illustrate the method. Two analyses were carried out for each data set, with partial data (generations 0-2) and with the whole data. The Bayesian analysis differed from a traditional analysis based on best linear unbiased predictors (BLUP) with an animal model, when the amount of information in the data was small. Inferences about selection response were similar with both methods at high heritability values and using all the data for the analysis. The Bayesian approach correctly assessed the degree of uncertainty associated with insufficient information in the data. A Bayesian analysis using 2 different sets of prior distributions for the variance components showed that inferences differed only when the relative amount of information contributed by the data was small. response to selection / Bayesian analysis / Gibbs sampling / animal model Résumé -Analyse bayésienne de la réponse génétique à la sélection à l'aide de l'échantillonnage de Gibbs. Cet article présente une méthode d'analyse des expériences de sélection dans une perspective bayésienne. Les mesures suivantes des réponses à la incertitudes sur les effets fixés et les composantes de variance. Les distributions marginales a posteriori ont été estimées à l'aide de l'échantillonnage de Gibbs. Deux ensembles de données simulées avec des héritabilités de 0,2 et 0,5 et 5 cycles de sélection ont été utilisés pour illustrer la méthode. Deux analyses ont été faites sur chaque ensemble de données, avec des données incomplètes (génération 0-!! et avec les données complètes. L'analyse bayésienne différait de l'analyse traditionnelle, basée sur le BL UP avec un modèle animal, quand la quantité d'information utilisée était réduite. Les inférences sur la réponse à lt sélection étaient similaires avec les 2 méthodes quand l'héritabilité était élevée et que toutes les données étaient prises en compte dans l'analyse. L'approche bayésienne évaluait correctement le degré d'incertitude lié à une information insuffisante dans les données. Une analyse bayésienne avec 2 distributions a priori des composantes de variance a montre que les inférences ne difj&dquo;éraient que si la part d'information fournie par les données était faible. réponse à la sélection / analyse bayésienne / échantillonnage de Gibbs / modèle animal INTRODUCTION Many selection programs in farm animals rely on best linear unbiased predictors (BLUP) using Henderson's (1973) mixed-model equations as a computing device, in order to predict breeding values and rank candidates for selection. With the increasing computing power available and with the development of efficient algorithms for writing the inverse of the additive relationship matrix (Henderson, 1976;(auaas, 1976), 'animal' models have been gradually replacing the originally used sire models. The appeal of 'animal' models is that, given the model, use is made of the information provided by all known additive genetic relationships among individuals. This is important to obtain more precise predictors and to account for the effects of certain forms of selection on prediction and estimation of genetic parameters.
A natural application of 'animal' models has been prediction of the genetic means of cohorts, for example, groups of individuals born in a given time interval such as a year or a generation. These predicted genetic means are typically computed as the average of the BLUP of the genetic values of the appropriate individuals. From these, genetic change can be expressed as, for example, the regression of the mean predicted additive genetic value on time or on appropriate cumulative selection differentials (Blair and Pollak, 1984). In common with selection index, it is assumed in BLUP that the variances of the random effects or ratios thereof are known, so the predictions of breeding values and genetic means depend on such ratios. This, in turn, causes a dependency of the estimators of genetic change derived from 'animal' models on the ratios of the variances of the random effects used as 'priors' for solving the mixed-model equations. This point was first noted by Thompson (1986), who showed in simple settings, that an estimator of realized heritability given by the ratio between the BLUP of total response and the total selection differential leads to estimates that are highly dependent on the value of heritability used as 'prior' in the BLUP analysis. In view of this, it is reasonable to expect that the statistical properties of the BLUP estimator of response will depend on the method with which the 'prior' heritability is estimated.
In the absence of selection, Kackar and Harville (1981) showed that when the estimators of variance in the mixed-model equations are obtained with even estimators that are translation invariant and functions of the data, unbiased predictors of the breeding value are obtained. No other properties are known and these would be difficult to derive because of the nonlinearity of the predictor. In selected populations, frequentist properties of predictors of breeding value based on estimated variances have not been derived analytically using classical statistical theory. Further, there are no results from classical theory indicating which estimator of heritability ought to be used, even though the restricted maximum likelihood (REML) estimator is an intuitively appealing candidate. This is so because the likelihood function is the same with or without selection, provided that certain conditions are met Fernando and Gianola, 1990).
However, frequentist properties of likelihood-based methods under selection have not been unambiguously characterized. For example, it is not known whether the maximum likelihood estimator is always consistent under selection. Some properties of BLUP-like estimators of response computed by replacing unknown variances by likelihood-type estimates were examined by Sorensen and Kennedy (1986) using computer simulation, and the methodology has been applied recently to analyse designed selection experiments (Meyer and Hill, 1991). Unfortunately, sampling distributions of estimators of response are difficult to derive analytically and one must resort to approximate results, whose validity is difficult to assess. In summary, the problem of exact inferences about genetic change when variances are unknown has not been solved via classical statistical methods.
However, this problem has a conceptually simple solution when framed in a Bayesian setting, as suggested by Sorensen and Johansson (1992), drawing from results in   and Fernando and Gianola (1990). The starting point is that if the history of the selection process is contained in the data employed in the analysis, then the posterior distribution has the same mathematical form with or without selection ). Inferences about breeding values (or functions thereof, such as selection response) are made using the marginal posterior distribution of the vector of breeding values or from the marginal posterior distribution of selection response. All other unknown parameters, such as 'fixed effects' and variance components or heritability, are viewed as nuisance parameters and must be integrated out of the joint posterior distribution .
The mean of the posterior distribution of additive genetic values can be viewed as a weighted average of BLUP predictions where the weighting function is the marginal posterior density of heritability.
Estimating selection response by giving all weight to an REML estimate of heritability has been given theoretical justification by . When the information in an experiment about heritability is large enough, the marginal posterior distribution of this parameter should be nearly symmetric; the modal value of the marginal posterior distribution of heritability is then a good approximation to its expected value. In this case, the posterior distribution of selection response can be approximated by replacing the unknown heritability by the mode of its marginal posterior distribution. However, this approximation may be poor if the experiment has little informational content on heritability. A full implementation of the Bayesian approach to inferences about selection response relies on having the means of carrying out the necessary integrations of joint posterior densities with respect to the nuisance parameters. A Monte-Carlo procedure to carry out these integrations numerically known as Gibbs sampling is now available (Geman and Geman, 1984;. The procedure has been implemented in an animal breeding context by Wang et al (1993a;1994a,b) in a study of variance component inferences using simulated sire models, and in analyses of litter size in pigs. Application of the Bayesian approach to the analysis of selection experiments yields the marginal posterior distribution of response to selection, from which inferences about it can be made, irrespective of whether variances are unknown. In this paper, we describe Bayesian inference about selection response using animal models where the marginalizations are achieved by means of Gibbs sampling.

Gibbs sampling
The Gibbs sampler is a technique for generating random vectors from a joint distribution by successively sampling from conditional distributions of all random variables involved in the model. A component of the above vector is a random sample from the appropriate marginal distribution. This numerical technique was introduced in the context of image processing by Geman and Geman (1984) and since then has received much attention in the recent statistical literature Gelfand et al, 1992;Casella and George, 1992). To illustrate the procedure, let us suppose there are 3 random variables, X, Y and Z, with joint density p(x, y, z); we are interested in obtaining the marginal distributions of X, Y and Z, with densities p(x), p(y) and p(z), respectively. In many cases, the necessary integrations are difficult or impossible to perform algebraically and the Gibbs sampler provides a means of sampling from such a marginal distribution. Our interest is typically in the marginal distributions and the Gibbs sampler proceeds as follows. Let  where y l is the realized value of Y obtained in the second sampling. This constitutes the first iteration of the Gibbs sampling. The process of sampling and updating is repeated k times, where k is known as the length of the Gibbs sequence. As k -> oo, the points of the kth iteration ( Xk , Yk , Zk ) constitute 1 sample point from p(x, y, z) when viewed jointly, or from p(x), p(y) and p(z) when viewed marginally. In order to obtain m samples,  suggest generating m independent 'Gibbs sequences' from m arbitrary starting values and using the final value at the kth iterate from each sequence as the sample values. This method of Gibbs sampling is known as a multiple-start sampling scheme, or multiple chains. Alternatively, a single long Gibbs sequence (initialized therefore once only) can be generated and every dth observation is extracted (eg, Geyer 1992) with the total number of samples saved being m. Animal breeding applications of the short-and long-chain methods are given by Wang et al (1993, 1994a,b).
Having obtained the m samples from the marginal distributions, possibly correlated, features of the marginal distribution of interest can be obtained appealing to the ergodic theorem (Geyer, 1992;Smith and Roberts, 1993): where x i (i = 1, ... , m) are the samples from the marginal distribution of x, u ' is any feature of the marginal distribution, eg, mean, variance or, in general, any feature of a function of x, and g(.) is an appropriate operator. For example, if u is the mean of the marginal distribution, g(!) is the identity operator and a consistent estimator of the variance of the marginal distribution is (Geyer, 1992). Note that S m is a Monte-Carlo estimate of u, and the error of this estimator can be made arbitrarily small by increasing m. Another way of obtaining features of a marginal distribution is first to estimate the density using [11, and then compute summary statistics (features) of that distribution from the estimated density.
There are at least 2 ways to estimate a density from the Gibbs samples. One is to use random samples (x i ) to estimate p(x). We consider the normal kernel estimator (eg, Silverman, 1986).
where p(x) is the estimated density at x, and h is a fixed constant (called the window width) given by the user. The window width determines the smoothness of the estimated curve.
Another way of estimating a density is based on averaging conditional densities . From the following standard result: and given knowledge of the full conditional distribution, an estimate of p(x) is obtained from: « where t/i,2:i;...,t/!,2:nt are the realized values of final Y, Z samples from each Gibbs sequence. Each pair y 2 , z i constitutes 1 sample, and there are m such pairs.
Notice that the x i are not used to estimate p(x). Estimation of the density of a function of the original variables is accomplished either by applying [2] directly to the samples of the function, or by applying the theory of transformation of random variables in an appropriate conditional density, and then using !3!. In this case, the Gibbs sampling scheme does not need to be rerun, provided the needed samples are saved.

Model
In the present paper we consider a univariate mixed model with 2 variance components for ease of exposition. Extensions to more general mixed models are given by Wang et al (1993b;1994) and by Jensen et al (1994). The model is: where y is the data vector of order n by 1; X is a known incidence matrix of order n by p; Z is a known incidence matrix of order n by q; b is a p by 1 vector of uniquely defined 'fixed effects' (so that X has full column rank); a is a q by 1 'random' vector representing individual additive genetic values of animals and e is a vector of random residuals of order n by 1. The conditional distribution that pertains to the realization of y is assumed to be: where H is a known n by n matrix, which here will be assumed to be the identity matrix, and aj (a scalar) is the unknown variance of the random residuals.
We assume a genetic model in which genes act additively within and between loci, and that there is effectively an infinite number of loci. Under this infinitesimal model, and assuming further initial Hardy-Weinberg and linkage equilibrium, the distribution of additive genetic values conditional on the additive genetic covariance is multivariate normal: In [6], A is the known q by q matrix of additive genetic relationships among animals, and Q a (a scalar) is the unknown additive genetic variance in the conceptual base population, before selection took place.
The vector b will be assumed to have a proper prior uniform distribution: where b m in and b max are, respectively, the minimum and maximum values which b can take, a priori. Further, a and b are assumed to be independent, a priori.
To complete the description of the model, the prior distributions of the variance components need to be specified. In order to study the effect of different priors on inferences about heritability and response to selection, 2 sets of prior distributions will be assumed. Firstly, u2 and aj will be assumed to follow independent proper prior uniform distributions of the form: where a 2maX and afl!!! are the maximum values which, according to prior knowledge, a! and !2 can take. In the rest of the paper, all densities which are functions of b, Q a, and ae will implicity take the value zero if the bounds in 7 and [8] are exceeded.
Secondly, Q a and <7! will be assumed to follow a priori scaled inverted chi-square distributions: where v i and Sf are parameters of the distribution. Note that a uniform prior can be obtained from [9] by setting v i = -2 and S2 = 0.
Posterior distribution of selection response Let a represent the vector of parameters associated with the model. Bayes theorem provides a means of deriving the posterior distributions of a conditional on the data: The first term in the numerator of the right-hand side of [10] is the conditional density of the data given the parameters, and the second is the prior joint distribution of the parameters in the model. The denominator in [10] is a normalizing constant (marginal distribution of the data) that does not depend on a. Applying !10!, and assuming the set of priors [9] for the variance components, the joint posterior distribution of the parameters is: The joint posterior under a model assuming the proper set of prior uniform distributions [8] for the variance components is simply obtained by setting v i = -2 and SZ = 0 (i = e, a) in !12!. To make the notation less burdensome, we drop from now onwards the conditioning on v and S.
Inferences about response to selection can be made working with a function of the marginal posterior distribution of a. The latter is obtained integrating [12] over the remaining parameters: where E = (o, a 2,a e 2) and the expectation is taken over the joint posterior distribution of the vector of 'fixed effects' and variance components. This density cannot be written in a closed form. In finite samples, the posterior distribution of a should be neither normal nor symmetric. Response to selection is defined as a linear function of the vector of additive genetic values: where K is an appropriately defined transformation matrix and R can be a vector (or a scalar) whose elements could be the mean additive genetic values of each generation, or contrasts between these means, or alternatively, regression coefficients representing linear and quadratic changes of genetic means with respect to some measure of time, such as generations. By virtue of the central limit theorem, the posterior distribution of R should be approximately normal, even if [13] is not.

Full conditional posterior distributions (the Gibbs sampler)
In order to implement the Gibbs sampling scheme, the full conditional posterior densities of all the parameters in the model must be obtained. These distributions are, in principle, obtained dividing [12] by the appropriate posterior density function or the equivalent, regarding all parameters in [12] other than the one of interest as known. For the fixed and random effects though, it is easier to proceed as follows.
Using results from Lindley and Smith (1972), the conditional distribution of 0 given all other parameters is multivariate normal: where and 0 satisfies: which are the mixed-model equations of Henderson (1973). and using standard multivariate normal theory, it can be shown for any such partition that: where 0 1 is given by: Expressions [19] and [20] can be computed in matrix or scalar form. Let b i be a scalar corresponding to the ith element in the vector of 'fixed effects', b_ i . be b without its ith element, x i be the ith column vector in X, and X_.L be that part of matrix X with x i excluded. Using [19] we find that the full conditional distribution of b i given all other parameters is normal. where b i satisfies: For the 'random effects', again using [19] and letting a-i be a without its ith element, we find that the full conditional distribution of the scalar a i given all the other parameters is also normal: where z i is the ith column of Z, c ii = (Var(ai!a_,,))-1 Q a is the element in the ith row and column of A-1 , and a i satisfies: In [24], c i ,_ i is the row of A-' corresponding to the ith individual with the ith element excluded.
The full conditional distribution of each of the variance components is readily obtained by dividing [12] by p(b, a, !-ily), where E-i is E with !2 excluded. Since this last distribution does not depend on a 2 , this becomes (Wang et al, 1994a): which has the form of an inverted chi-square distribution, with parameters: When the prior distribution for the variance components is assumed to be uniform, the full conditional distributions have the same form as in [25], except that, in [26] and subsequent formulae, v i = -2 and S2 = 0 (i = e, a). (vi) repeat (ii) to (v) k (length of chain) times.
As koo, this creates a Markov chain with an equilibrium distribution having [12] as density. If along the single path k, m samples are extracted at intervals of length d, the algorithm is called a single-long-chain algorithm. If, on the other hand, m independent chains are implemented, each of length k, and the kth iteration is saved as a sample, then this is known as the short-chain algorithm.
If the Gibbs sampler reached convergence, for the m samples (b, a, E) 1, ... , (b, a, E)&dquo;, we have: wheremeans distributed with marginal posterior density p(.ly). In any particular sample, we notice that the elements of the vectors b and a, b i and a i , say, are samples of the univariate marginal distributions p(bi!y) and p(a il y). In order to estimate marginal posterior densities of the variance components, in addition to the above quantities the following sums of squares must be stored for each of the m samples: For estimation of selection response, the quantities to be stored depend on the structure of K in (14!.

Density estimation
As indicated previously, a marginal density can be estimated, for example, using the normal density kernel estimator [2] with the m Gibbs samples from the relevant marginal distribution, or by averaging over conditional densities (equation [3]). Density estimation by the former method is straightforward, by applying (2!. Here, we outline density estimation using (3!. The formulae for variance components and functions thereof were given by Wang et al (1993, 1994b To estimate the marginal posterior density of heritability h 2 = a£ /(a£ +!e), the point of departure is the full conditional distribution of a£ . This distribution has !e as a conditioning variable, and therefore Q e is treated as a constant. Since the inverse transformation is a! = <T!/(1 -h 2 ), and the Jacobian of the transformation from Q a to h 2 is J = Q e/(1 -h 2 ) , then from [27] we obtain: The estimation of the marginal posterior density of each additive genetic value follows the same principles: where, for the jth sample: Equally spaced points in the effective range of a i are chosen, and for each point an estimate of its density is obtained by inserting, for each of the m Gibbs samples, the realized values for o,2, b, a-i , and k in [30] and [31]. These quantities, together with a i , need to be stored in order to obtain an estimate of the marginal posterior density of the ith additive genetic value. This process is repeated for each of the equally spaced points.
Estimation of the marginal posterior density of response depends on the way it is expressed. In general R in [14] can be a scalar (the genetic mean of a given generation, or response per time unit) or it can be a vector, whose elements could describe, for example, linear and higher order terms of changes of additive genetic values with time or unit of selection pressure applied. We first derive a general expression for estimation of the marginal posterior density of R and then look at some special cases to illustrate the procedure.
Assume that R contains s elements and we wish to estimate p(Riy) as: In order to implement (32!, the full conditional distribution on the right-hand side is needed. This distribution is obtained applying the theory of transformations to p(a il b, a-j, E, y), where a i is a vector of additive genetic values of order s, and a_ i is the vector of all additive genetic values with a i deleted, such that a = (a i , a_ i )'.
The s additive genetic values in a i must be chosen so that the matrix of the transformation from a i to R is non-singular. We can then write [14] as: so that we have expressed R in a part which is a function of a i and another one which is not. The matrix K i is non-singular of order s by s, and from [33], the inverse transformation is Since the Jacobian of the transformation from a i to R is det(K! 1 ), letting As a simple illustration, consider the estimation of the marginal posterior density of total response to selection, defined as the difference in average additive genetic values between the last generation ( f ) and the first one: where a fj and a 1j are additive genetic values of individuals in the final and first generations, and n and n 1 are the number of additive genetic values in the final and first generation, respectively. We will arbitrarily choose a f1 and carry out a linear transformation from p(afllb, a-11,!, y) to p(Rlb, a-11,!, y). We write [37] in the form of !33): so that in the notation of [33], we have: and the marginal posterior density is estimated as follows: EXAMPLES To illustrate, the methodology was used to analyse 2 simulated data sets. Genotypes were sampled using a Gaussian additive genetic model. The phenotypic variance was always 10, and base population heritability was 0.2 and 0.5 in data sets 1 and 2, respectively. A phenotypic record was obtained by summing a fixed effect (30 fixed effects in the complete data set, but these values are of no importance here), an additive genetic effect and a residual term. Both additive genetic and residual effects were assumed to follow normal distributions with null means and variances based on the 2 heritability levels. In the generation of Mendelian sampling effects, parental inbreeding was taken into account. Five cycles of single trait selection based on a BLUP animal model were practised. Each generation resulted from the mating of 8 males and 20 females, and each female produced 3 offspring with records and 2 additional female offspring without records which were also considered as female replacements. Males were selected across generations (the best 8 each cycle) and females, which bred only once, were selected from the highest scoring predicted breeding values available each generation. The total number of animals at the end of the program was 968, of which 720 had records. The total number of sires and dams in each data set was 42 and 241. Details of the simulation including a description of the generation of fixed effects and additive genetic values can be found in Sorensen (1988).
Two analyses were carried out for each data set: an analysis based on all records; and one using data from generations 0-2 only. The rationale for this is that it is not uncommon in the literature to find reports of selection experiments in progress. Hence, we can illustrate in this manner how inferences are modified as additional information is collected.
For each analysis, marginal posterior densities for the additive genetic variance a , residual variance ( Q e), phenotypic variance (aP) and heritability (h 2 ) were estimated. In addition, marginal posterior densities of 2 expressions for selection response were estimated. These were: 1) difference between final and initial average additive genetic values or total response (TR); and 2) linear regression through the origin of the linear regression of additive genetic values on generation. Unless otherwise stated, the analyses reported below were performed assuming that variance components follow a priori independent uniform distributions as shown in (8!. The upper bounds of the additive and environmental variances (a 2 2 were arbitrarily chosen to be 10 and 20, respectively. The Gibbs sampler was implemented using a single chain of total length 1 205 000, with a warm-up (initial iterations discarded) of length 5 000, and a subsampling interval between samples of d = 10. Thus, a total of m = 120 000 samples were saved. Calculations were programmed in Fortran in double precision. Random numbers were generated using IMSL subroutines (IMSL Inc, 1989).
Plots of the estimated marginal posterior densities of Q a, Q e, h 2 , TR and 6 were obtained with both the average of conditionals and with the normal kernel density estimator; those for a!, t50 and 6 1 were generated using the normal kernel density estimator only, for technical reasons. The window used for the normal kernel estimator was the range of the effective domain of the parameter, divided by 75. The effective domain covered 99.9% of the density mass. Each of the plots was generated by dividing the effective domain of that variable into 100 evenly spaced intervals. Summary statistics of distributions, such as the mean, median and variance were computed by Simpson's integration rules, by further dividing the effective domain into 1 000 evenly spaced intervals using cubic-spline techniques (IMSL Inc, 1989). Modes were located through grid search. For illustration and not as a means of a definite comparison between methods, a standard classical analysis was carried out. First, REML estimates of variance components were obtained.
These REML estimates were then used in lieu of the true values in Henderson's mixed-model equations to obtain estimates of additive genetic values. Finally, estimated genetic responses to selection were computed based on appropriate averages of the estimated additive genetic values. Results of this classical analysis are reported later under the heading ST in figures 5-8. The estimated marginal posterior densities of the variance components and of heritability are shown in figures 1 and 2 for the analysis based on partial (PART) and whole (WHOLE) data, respectively, when base population heritability was 0.2. Corresponding plots for base heritability 0.5 are given in figures 3 and 4.
For the data simulated with h 2 = 0.2, the posterior distributions reflected considerable uncertainty about the values of the parameters. As expected, the variances of the posterior distributions were smaller in WHOLE (fig 2) than in PART (fig 1). In particular, the PART analysis of U2 and h2 showed highly skewed distributions; the mean, the mode and the median differed from each other. The REML estimates in the PART analysis for the additive genetic variance and heritability were very close to zero. Asymptotic confidence intervals were not computed, but they would probably include a region outside the parameter space. The Bayesian analysis indicated with considerable probability that both parameters are non-zero. When WHOLE was considered (fig 2), the degree of uncertainty was reduced, but point estimates in the Bayesian analysis of genetic variance and of heritability were different from those in the REML estimates. For example, the posterior mean, mode and median of heritability were 0.13, 0.08 and 0.12, respectively, in comparison with the REML estimate of 0.07. The lack of symmetry of the posterior distribution of Q a and h2 clearly suggests that the simulated selection experiment does not have a high degree of resolution concerning inferences about genetic parameters. This point would not be as forcefully illustrated in a standard REML analysis, where, at best, one would get joint approximate asymptotic confidence intervals for the parameters of interest.
In the case of h 2 = 0.5 (fig 3 and 4), it should be noted that the distributions were less skewed than for h 2 = 0.2, although still skewed for PART (fig 3). Both the Bayesian and the REML analyses suggested a moderate to high heritability, but the fact that the posterior distribution of heritability for PART (fig 3) was very skewed indicates that more data should be collected for accurate inferences about heritability. The WHOLE analysis (fig 4) yielded nearly symmetric posterior distributions of heritability with smaller variance, centered at 0.55 (the REML estimate was 0.543). It should be noted that the residual variance was poorly estimated in PART (fig 3), but the Bayesian analysis clearly depicted the extent of uncertainty about this parameter. This also illustrates the fact that a variance component smaller in value relative to others in the model is harder to estimate, contrary to the common belief in animal breeding that residual variance is always well estimated.
The assessment of response to selection for the population simulated with h 2 = 0.2 is given in figures 5 and 6 for the PART and WHOLE data analyses, respectively. The true response in the simulated data at a given generation, computed as the average of true genetic values within the generaton in question, was TR = 0.643 units and TR = 1.783 units, for PART and WHOLE. The regressions of true response on generation were 0.321 and 0.343, for PART and WHOLE, respectively. The hypothesis of no response to selection cannot be rejected in the PART analysis ( fig 5). The classical approach gave an estimated response of zero. However, the Bayesian analysis documents the extent of uncertainty, and assigns considerable posterior probability to the proposition that selection may be effective; the posterior probability of response (eg, Ql > 0 or TR = t 2 -to > 0) was much larger than that of the complement. The strength of additional evidence brought up in the whole analysis (fig 6) supports the hypothesis that selection was effective. For example, the difference in genetic mean between generation 0 and 5 was centered around 1.852, though the variance of the posterior distribution was as large as 1.075. The classical analysis gave an estimated value of ST = 1.296 units, but no exact measure of variation can be associated to this estimate. Approximate results could be obtained, for example, using Monte-Carlo simulations, but this was not attempted in the present work.
In the simulation with h 2 = 0.5, the true response in the simulated data, computed as the average of true genetic values from the relevant generations, was TR = 3.258 units and TR = 7.148 units for the PART and WHOLE data sets, respectively. The associated regressions on generation were 1.629 and 1.436, respectively. The Bayesian and classical analyses indicated response to selection beyond reasonable doubt (fig 7 and 8). Note, however, that the posterior distribution of TR = t 2to in PART was skewed (fig 7). In figure 8, all the posterior distributions were nearly symmetric, and the classical and Bayesian analyses were essentially in agreement. In spite of this symmetry, considerable uncertainty remained about the true rate of response to selection. For example, values of t 5to ranging from 3 to 11 units have appreciable posterior density (fig 8). In the light of this, results from a similar experiment where realized genetic change is, say, 50% higher or lower than our posterior mean of about 6.7 units, could not be interpreted as yielding conflicting evidence.
In order to study the influence of different priors on inferences about heritability and response to selection, part of the above analyses using the same data, was repeated assuming that variance components followed, a priori, the inverse chisquare distributions (9!. The 'degree of belief' parameters v i and the a priori value of the variance components S 2 , were assessed by the method of moments from an independently simulated data set with base population heritability of 0.5, and consisting of 3 cycles of selection. An REML estimate of heritability from this data was 0.46 with an asymptotic variance of 2.94 x 10-2 . The method of moments as described in Wang et al (1994b), produced the following estimates: v a = 11; v e = 464; Sa = 4.50; Se = 5.19. A reanalysis of the PART and WHOLE replicate (heritability = 0.5) yielded the results shown in table I. The figures in the table clearly illustrate how increasing the amount of information in the data modifies the input contributed by the prior. Thus in the PART dataset, the uniform prior and the informative prior (which implies a prior value of heritability of 0.46) lead to clear differences in the expected value of the marginal posterior distribution of heritability and selection response. However, in the WHOLE data set, inferences using either prior are very similar. The figures also illustrate how the variance of marginal posterior distributions is reduced when an informative prior is used.
The examples indicate the power of the Bayesian analysis to reveal uncertainty in response to selection when the information contained in the data about the appropriate parameters is small (eg, PART analysis at h 2 = 0.2). Other plots, such as marginal posterior distribution of generation means, for example, are possible and could illustrate the evolution of the drift variance as the experiment progresses.

DISCUSSION
We have presented a way of analysing response to selection in experimental or nonexperimental settings, from a Bayesian viewpoint. The end-point of the analysis is the construction of a marginal posterior distribution of a measure of selection response which, in this study, was defined as a linear function of additive genetic values. The marginal distribution can be viewed as a weighted average of an infinite number of conditional distributions, where the weighting function is the marginal posterior distribution of variances and other parameters, which are not necessarily of interest. The approach relies heavily on the result ) that if the data on which selection was based are included in the analysis, the Bayesian approach accounts for selection automatically, in the sense that the joint posterior or any marginal posterior distribution is the same with or without selection. This is a particular case of what has been known as 'ignorable selection' (Little and Rubin, 1987;Im et al, 1989) and it implies that selection can be ignored if the information of any data-based selection is contained in the Bayesian model. Furthermore, inferences pertain to parameters, eg, heritability, of the base population. As shown with the simulated data, the marginal posterior distributions of variances and functions thereof are often not symmetric. This fact is taken into account when computing the marginal posterior distributions of measures of selection response. In this way, the uncertainty associated with all nuisance parameters in the model is taken into account when drawing inferences about response to selection. This is in marked contrast with the estimation of response that is obtained using BLUP with an animal model, which assumes the variance ratio known, and gives 100% weight to an estimate of this ratio.
The analysis of the simulated data used uniform distributions for the fixed effects and variance components. The choice of appropriate prior distributions is a contentious issue in Bayesian inference, especially when these priors are supposed to convey vague initial knowledge. Injudicious choice of noninformative prior distributions can lead to improper posterior distributions, and this may not be easy to recognize, especially when the analysis is based on numerical methods, as in the present work. The subject is still a matter of debate and an important concept is that of reference priors (Bernardo, 1979;Berger and Bernardo, 1992), which have the property that they contribute with minimum information to the posterior distribution while the information arising from the likelihood is maximized. In the single parameter case, reference priors often yield the Jeffreys priors (Jeffreys, 1961).
The uniform prior distributions we have chosen for the fixed effects and the variance components represent a state of little prior knowlege, but are clearly not invariant under transformations and must be viewed as simple ad iaoc approximations to the appropriate reference prior distributions. We have illustrated, however, that when the amount of information contained in the data is adequate inferences are affected little by the choice of priors. It is not generally a simple matter to decide when one is in a setting of adequate information, and it may be therefore revealing to carry out analyses with different priors to study how inferences are affected (Berger, 1985). If use of different priors leads to very different results, this indicates that the information in the likelihood is weak and more data ought to be collected in order to draw firmer conclusions.
The richness of the information that can be extracted using the Bayesian approach is at the cost of computational demands. The demand is in terms of computer time rather than in terms of programming complexity. However, an important limitation of iterative simulation methods is that drawing from the appropriate posterior density takes place only in the limit, as the number of draws becomes infinite. It is not easy to check for convergence to the correct distribution, and there is little developed theory indicating how long the Gibbs chain should be.
The rate of convergence can be exceedingly slow, especially when random variables are highly correlated, which is the case with animal models. Further, there is often a possibility that the Gibbs sequence may get 'trapped' near zero which is an absorbing state, in which case one must reinitialize the chain (Zeger and Karim, 1991). Gibbs sampling convergence is an area of active research where a number of partial answers based on pragmatic approaches have been suggested. These include checking for serial correlations, monitoring the behavior of several series of runs of the chains starting from a wide range of initial values and checking both withinand between-series variation (Gelman and Rubin, 1992). Another possibility, given a Gibbs sample of size m, is to vary the length of the sequence and to overlay plots of the estimated densities to see if these are distinguishable . Convergence can probably be accelerated if correlated random variables (such as additive genetic values in animal models) are blocked together, at the expense of having to sample from multivariate conditional distributions (Smith and Roberts, 1993). In general though, selection experiments are scarce and expensive, and the cost of the additional computation needed for carrying out the Bayesian analysis in often marginal relative to the whole cost.
The great appeal of this method is that it yields a Monte-Carlo estimate of a full marginal posterior distribution of a parameter of interest, from which probabilities that the parameter lies between specified values can be easily computed. This is particularly relevant in the case where the asymptotic normality of posterior distributions is difficult to justify, which can often be the case in selection experiments. The pictorial representation of marginal posterior distributions, and the associated possibility of making precise probability statements, provide for a very rich inferential framework. Errors incurred when estimating a posterior density by Monte-Carlo methods can be made arbitrarily small by increasing the chain length in the Gibbs sampling, at least in principle.
The Bayesian analysis can also be useful to study the design of selection experiments. The literature on experimental design relies heavily on assumptions such as absence of nuisance parameters and knowledge of base population parameters.
Studies about the use of animals models in selection experiments have concentrated on analysis rather than design issues. This is partly because in a frequentist setting it is difficult to compute the sampling variance of the estimator of response, especially when variances used as priors have been estimated from the data at hand. Using the Bayesian approach, a variety of designs could be studied and their efficiency compared by means of analyses of predictive distributions.