Bayesian estimation of dispersion parameters with a reduced animal model including polygenic and QTL effects

Estimation Bayesienne des parametres de dispersion dans un modele animal reduit comprenant un effet polygenique et l'effet d'un QTL. En genetique animale, les algorithmes de Monte-Carlo par chaines de Markov sont utilises de plus en plus souvent pour en inferer aux distributions marginales a posteriori des parametres du modele genetique. L'algorithme d'echantillonnage de Gibbs est utilise largement et demande la connaissance des densites conditionnelles, dans une forme standard. Dans cette etude, on decrit une met hode Bayesienne pour la cartographie statistique d'un locus a effet quantitatif (QTL), ou l'application d'un modele animal reduit conduit a des densites de parametres de dispersion qui n'ont pas de forme standard. On utilise l'algorithme de Metropolis-Hasting pour l'echantillonnage de ces densites non standard. La souplesse de l'algorithme de Metropolis-Hasting permet egalement de changer la parametrisation du modele genetique: au lieu des composantes de variances habituelles, on peut utiliser une composante de variance (residuelle) et deux rapports de composantes de variance: l'heritabilite et la proportion de la variance genetique due au QTL. Il est plus facile de specifier l'information a priori sur des proportions, en partie parce qu'elle ne depend pas de l'echelle. Trois fichiers de donnees simulees sont utilises pour etudier la performance du modele animal reduit, par rappor au modele animal strict, l'effet de parametrisation du modele genetique et la qualite du test de la presence d'un QTL a une position donnee.


INTRODUCTION
The wide availability of high-speed computing and the advent of methods based on Monte Carlo simulation, particularly those using Markov chain algorithms, have opened powerful pathways to tackle complicated tasks in (Bayesian) statistics [9,10]. Markov chain Monte Carlo (MCMC) methods provide means for obtaining marginal distributions from a complex non-standard joint density of all unknown parameters (which is not feasible analytically). There are a variety of techniques for implementation [9] of which Gibbs sampling [11] is most commonly used in animal breeding. The applications include univariate models, threshold models, multi-trait analysis, segregation analysis and QTL mapping [15,17,29,31,33].
Because Gibbs sampling requires direct sampling from full conditional distributions, data augmentation [22] is often used so that 'standard' sampling densities are obtained. Often, however, this is at the expense of a substantial increase in number of parameters to be sampled. For example, the full conditional density for a genetic variance component becomes standard (inverted gamma distribution) when a genetic effect is sampled for each animal in the pedigree, as in a (full) animal model (FAM). The dimensionality increases even more rapidly when the FAM is applied to the analysis of granddaughter designs [34] in QTL mapping experiments, i.e. marker genotypes on granddaughters are not known and need to be sampled as well. In addition, absence of marker data hampers accurate estimation of genetic effects within granddaughters, which form the majority in a granddaughter design. This might lead to very slow mixing properties of the dispersion parameters (see also Sorensen et al. !21!). The reduced animal model (RAM, Quaas and Pollak, [19]) is equivalent to the FAM, but can greatly reduce the dimensionality of a problem by eliminating effects of animals with no descendants. With a RAM, however, full conditional densities for dispersion parameters are not standard. Intuitively, RAM, used to eliminate genetic effects and concentrate information, is the antithesis of data augmentation, used to arrive at simple standard densities. For the Metropolis-Hastings (MH) algorithm [14,18!, however, a standard density is not required, in fact, the sampling density needs to be known only up to proportionality. Another alternative for the FAM is the application of a sire model which implies that only sires are evaluated based on progeny records. With a sire model, the genetic merit of the dam of progeny is not accounted for and only the phenotypic information on offspring is used. The RAM offers the opportunity to include maternal relationships, offspring with known marker genotypes and information on grandoffspring. As a result the RAM is better suited for the analysis of data with a complex pedigree structure. The flexibility of the MH algorithm also allows for a greater choice of the parameterization (variance components or ratios thereof) of the genetic model. If Gibbs sampling is to be employed, the parameterization is often dictated by mathematical tractability to obtain the simple sampling density. The MH algorithm readily admits much flexibility in modelling prior belief regarding dispersion parameters, which is an advantageous property in Bayesian analysis !16!.
In this paper, we present MCMC algorithms that allow Bayesian linkage analysis with a RAM. We study two alternative parameterizations of the genetic model and use a test statistic to postulate presence of a QTL at a fixed position relative to an informative marker bracket. Three sets of simulation data using a typical granddaughter design are used.

Genetic model
The additive genetic variance (o,2) underlying a quantitative trait is assumed to be due to two independent random effects, due to a putative QTL and residual independent polygenes. The QTL effects (v) are assumed to have a N(0, GO,2) prior distribution where G is the gametic relationship matrix [2,8], and ui is the variance due to a single allelic effect at the QTL. Matrix G depends upon one unknown parameter, the map position of the QTL relative to the (known) positions of bracketing (informative) markers. Here we consider the location of the QTL to be known. The polygenic effects (u) have a N(0, Au u 2) prior distribution, where A is the numerator relationship matrix. The genetic model underlying the phenotype of an animal is where b is the vector with fixed effects, vi and v? are the two (allelic) QTL effects for animal i, and e i ! N(0, lo,2). e (QTL effects within individual are assigned according to marker alleles, as proposed by Wang et al. [32]). The sum of the three genetic effects is the animal's breeding value (a). In addition to genetic effects, location parameters comprise fixed effects that are, a priori, assumed to follow the proper uniform distribution: f (b) -U[b n ,; n , bmax! ! where b m i n and b max are the minimum and maximum values for elements in b.

Reduced animal model (RAM)
The RAM is used to reduce the number of location parameters that need to be sampled. The RAM eliminates the need to sample genetic effects of animals with neither descendants nor marker genotypes, i.e. ungenotyped non-parents. The phenotypic information on these animals can easily be absorbed into their parents without loss of information. Absorption of non-parents that have marker genotypes becomes more complex when position of QTL is unknown; it is therefore better to include them explicitly in the analysis. In the remainder of the paper, it is assumed that marker genotypes on non-parents are not available. The genetic effects of nonparents can be expressed as linear functions of the parental genetic effects by the following equations [4], and where each row in P contains at most two non-zero elements (= 0.5), and each row in Q has at most four non-zero elements [32], the terms wnon -p arcn t s and §non-parents pertain to remaining genetic variance due to Mendelian segregation of alleles. In a granddaughter design, the P and Q for granddaughters, not having marker genotypes observed nor augmented, have similar structures, where Q 9 denotes the Kronecker product, and J is a unity matrix [20]. This equality does not hold if marker genotypes are augmented, since phenotypes contain information that can alter the marker genotype probabilities for ungenotyped nonparents [2].
The phenotypes for a quantitative trait can now be expressed as, for row vectors P i and Q i (possibly null), and where u) i reflects the amount of total additive genetic variance that is present in E . 2 Based on the pedigree, four categories of animals are distinguished in the R A M (table 1). The vectors P i and Q i contain partial regression coefficients. For parents, the only non-zero coefficients pertain to the individual's own genetic effects (ones); for non-parents, the individual's parents' genetic effects (halves). Note that P i and GZ i are null for a non-parent with unknown parents, and that nonparents' phenotypes in this category contribute to the estimation of fixed effects and phenotypic (residual) variance only.

Parameterization
Let B denote the set of location parameters (b, u and v) and dispersion parameters.
We consider the following two parameterizations for the dispersion parameters where and In the first, 0 v c , the parameters are the variance components (VC). This is the usual parameterization. A difficulty with this is that it is problematic for an animal breeder to elicit a reasonable prior of the genetic VC. Animal breeders, it seems to us, are much more likely to have, and be able to state, prior opinions about such things as heritabilities. Consequently, in O RT , parameter h 2 is the heritability of a trait, and parameter &dquo;( is the proportion of additive genetic variance due to the putative QTL. This parameterization allows more flexible modelling of prior knowledge because h 2 and -y do not depend on scale. Theobald et al. [23] used a variance ratio, a u/Ue 2 2, parameterization but noted that the animal breeder may prefer to think in terms of heritability. We prefer the part-whole ratios h 2 and y.
The components or2 and <7! can be expressed in terms of Q e, h 2 2 and and 2.4. Priors We now present the prior knowledge on dispersion parameters, priors for location parameters having been given earlier. In earlier studies, two different priors are often used to describe uncertainty on VC. The inverted gamma (IG) distribution, or its special case the inverted chi-square distribution, is common because it is often the conjugate prior for the VC if the FAM (or sire model) is applied. Hence, the full conditional distribution for VC will then be a posterior updating of a standard prior !9!. This simplifies Gibbs sampling. We will use the IG as the prior for 0 '; -though with a RAM it is not conjugate, where x = e, u, or v. The rhs of (10) constitutes the kernel of the distribution. The mean (p) of an IG(o:, ( 3) is ((a -1)(3) -1 , and the variance equals ((a-1)!-2)/!)'B Van Tassell et al. [29] suggest setting a = 2.000001 and /3 ! (!)-1 for an 'almost flat' prior with a mean corresponding to prior expectation (p,). The IG distributions for three different prior expectations are given in figure 1. When the prior expectation is close to zero (p, = 5.0), the distribution is more peaked and has less variance because mass accumulates near zero. When the prior expectation is relatively high (p, = 60), the probability of or2 being equal to zero is very small, which might be undesirable and/or unrealistic for ui. An alternative prior distribution for or2 is which is a proper prior for ufl with a uniform density over a pre-defined large, finite interval, for example from zero to 200 (figure 1). These prior distributions for VC are used mainly to represent prior uncertainty !21, 30, 31!.
Corresponding to (10) (11) there is an equivalent prior distribution for A(!y). However, because neither (10) nor (11) were chosen for any intrinsic 'rightness' we prefer a simpler alternative of using Beta distributions for the ratio parameters A and -y to represent prior knowledge, where x = h 2 or -y. When prior distribution parameters a x and / 3 x are both set equal to 1, the prior is a uniform density between 0 and 1 (figure 2), i.e. flat prior. Alternatively, a x and !3! can be specified to represent prior expectations for parameters of interest (figure 2). For example, one can centre the density for heritability of a yield trait in dairy cattle around the prior expectation (= 0.40), with a relatively flat (Beta (2.5, 3.75)) or peaked (Beta (30.0, 45.0)) distribution when prior certainty is moderate or strong, respectively. Furthermore, prior knowledge on !y, proportion of additive genetic variance due to a putative QTL, can be modelled to give relatively high probabilities of values close to zero, e.g. (Beta (0.9, 2.7)). Another option, suggested by a reviewer, would be to put vague priors on o x and /3 x as in Berger [1].

Joint posterior density
The joint posterior density of B is the product of likelihood and prior densities of elements in 0, described above. Let n i denote the number of observations on animals of category i (table 1), the total number of observations being given as N, and let q denote the number animals with offspring, i.e. parents. Then, 2q is the number of QTL effects (two allelic effects per animal). With O v c, Under B RT , dispersion parameters, and priors thereof, are different from 0 vc the joint posterior density is 2.6. Full conditional densities From the joint posterior densities (13) and (14), the full conditional density for each element in B can be derived by treating all other elements in 0 as constants and selecting the terms involving the parameter of interest. When this leads to the kernel of a standard density, e.g. Normal for location parameters or an IG distribution, e.g. variance components with FAM, Gibbs sampling is applied to draw samples for that element in 0. Otherwise, the full conditional density is non-standard and sampling needs to be done by other techniques. (All full conditional densities are given in the Appendix).

Sampling non-standard densities by Metropolis-Hastings algorithm
Sampling a non-standard density can be carried out a variety of ways, including various rejection sampling techniques [6, 7, 12, 13!, and Metropolis-Hastings sampling within Gibbs sampling !6!. We use the Metropolis-Hastings algorithm (MH). Let !r(x) denote the target density, the non-standard density of a particular element in 0, and let q(x, y) be the candidate generating density. Then, the probability of move from current value x to candidate value y for O i is, When y is not accepted, the value for 0 z remains equal to x, at least until the next update for 0z . Chib and Greenberg [6] described several candidate generating densities for MH. We use the random walk approach in which candidate y is drawn from a distribution centred around the current value x. To ensure that all sampled parameters are within the parameter space the sampling distribution, q(x, y), was U(B L , B u ) with where t is a positive constant determined empirically for each parameter to give acceptance rates between 25 and 50 % [6,24].  (table II). In each MCMC cycle, however, the remaining two were computed, using (6) and (7)

SIMULATION
In this study, granddaughter designs were generated by Monte Carlo simulation. The unrelated grandsire families each contained 40 sires that were half sibs. The number of families was 20 except in simulation III where designs with 50 families were simulated as well (table IIB. Polygenic and QTL effects for grandsires were sampled from N(0, Q u) and N(0, er!), respectively. The polygenic effect for sires was simulated as US = !(UGs) 2 + 4lz , where UGS is the grandsire's polygenic effect, and *!t, Mendelian sampling, is distributed independently as N(0, Var (4l j )) with Var(4lz) = 0.75 x Q u (no inbreeding). Each sire inherited one QTL at random from its (grand) sire. The maternally inherited QTL effect for a sire was drawn from N(0, er!). Each sire had 100 daughters with phenotypes observed, that were generated as where p is a 0/1 variable. In all simulations the phenotypic variance and the heritability of the trait were 100 and 0.40, respectively. The proportion of genetic variance due to the QTL (= 1' ) was by default 0.25, or 0.10 in simulation III (table 777). Two genetic markers bracketing the QTL position at lOcM (Haldane mapping function) were simulated with five alleles at each marker, with equal frequencies over alleles per marker. For grandsires, the marker genotypes were fully informative, i.e. heterozygous, and the linkage phase between marker alleles is assumed to be known a priori. The uncertainty on linkage phase in sires can be included in 0, but we did not. All possible linkage phases within sires were weighted by their probability of occurrence and one average relationship matrix between grandsires' and sires' QTL effects was used.

Simulation I: comparison RAM versus FAM
For each of the four MCMC algorithms that are given in table II, a single MCMC chain was run and 2 000 thinned samples were used for post-MCMC analysis (table III). In the case of 0 v c , prior distributions for Q e, Q u and a 2 were 'flat' IGs (figure 1) with expected means equal to 60, 30 and 5 (values used for simulation), respectively. In the case of B RT , the prior for a was again an IG and priors for h 2 2 and -y were Beta (2.5, 3.75) and Beta (0.9, 2.7), respectively. Figure 3 presents the mixing properties for parameter ui within the chains for the RAM-0 vc and RAM-0vc alternatives and points to slower mixing when using the FAM. This slow mixing is also indicated by high auto-correlation (! 1) among samples for parameters ui 2 and when the FAM was used (table IV). With the same thinning, the autocorrelation among samples in the RAM is x 0.70. The estimates for posterior mean and coefficient of variation, derived from samples of the four chains, are given in table V. These estimates are very similar over models (RAM and FAM) and parameterizations (0 v c and B RT ). The coefficients of variation for ui and 'Y are relatively large and indicate that a posteriori knowledge on these parameters remains small, while estimates for oe and h2 are accurate. The magnitude of the sampling correlation among parameters within MCMC cycles was very similar for both models and parameterizations. The samples for o, and ufl showed a moderately high negative correlation (-0.7), while the sampling correlation between h 2 and 7 was relatively low and positive (0.2). The correlation among samples for u £ and h 2 was very high but apparently did not adversely affect the auto-correlation of these parameters. Taking 100 ESS as a minimum [26] the MCMC chain was rather short for statistical inferences for -y in FAM-B RT . However, running a longer MCMC chain was not practical since the FAM-0 vc MCMC chain needed 68 593 min CPU (47 days) on a HP 9000-735 (125Hz) workstation. This was almost 100 times the 11 h that were needed to run the RAM with similar chain length.
The slow mixing of parameters for a FAM was likely to be due to the lack of marker data on granddaughters. Distinction between polygenic and QTL effects within these animals is hardly possible. Consequently, they provide little information regarding dispersion but because they are so numerous they dominate the distribution from which the next sample for the dispersion parameter is drawn. Heuristically, one first generates u and v with variances reflecting current a 2 . Subsequently one samples a new o 2 from a peaked distribution with a mean near the sample variance of the u and v. Not surprisingly one gets back a a 2 very similar to the previous one, as a result of which the chain is slowly mixing.
The data from simulation I were also used to examine the effect of priors on posterior inferences on the proportion of QTL when e RT was used. Four different priors for 7 were used, ranging from a 'flat' (but not a 'non-informative') uniform prior to a density peaked at zero. The latter reflects the prior expectation that the genetic variance due to the QTL is small or equal to zero. Figure 4 presents both prior and posterior densities. The uniform and the 'peaked-at-zero' prior resulted in the highest (0.20) and lowest posterior mean estimate (0.10), respectively. For this design, the information from the data is not overwhelming the prior knowledge. All priors studied, however, showed consistency for the posterior probability of y = 0, i.e. the data supported the presence of a QTL at the studied position of the chromosome.

Simulation II: parameterization of the genetic model
In simulation II, five replicates of data were used to study the effects of alternative parameterizations of the genetic model, for the RAM only. Genetic and population parameters were similar to those in simulation I (table III). Based on the results for ESS from the initial MCMC chains (table I!, the MCMC chains were run for 250 000 cycles and every 250th was sample used for analysis (m = 1000). Now, uniform priors for all dispersion parameters were used. The sampling correlations were averaged over the five replicates and are presented in table VI. These correlations are consistent with those from the initial MCMC chains (table IV); i.e. auto-correlations were highest among samples for 0 &dquo; (in Ovc) and q (in O RT ), i.e. around 0.68. These parameters also had lowest and similar ESS (! 230). These results indicate that sampling efficiency is similar for the two studied parameterizations (0vc and O RT ) of the genetic model and shorter chains may suffice. The posterior mean estimates, averaged over five replicates, for all dispersion parameters were in close agreement with the values used for simulation (not shown). 4 (20 or 50 grandsire families) were studied in combination with two different sizes of the QTL (explaining either 10 or 25 % of the genetic variance). Two different priors for were studied with the O RT parameterization. For each combination of design and -y, test runs preceding the 25 replicates were used to empirically determine values for t in the MH algorithm, in order to achieve the desired acceptance rates. From the marginal posterior density an odds ratio was computed and the presence of the QTL was accepted only if this ratio exceeded a critical value of 20. Using this test statistic we postulated the power of detecting the QTL for specific designs and using different priors (table V77). The small design (20 x 40) has low power of QTL detection, i.e. only 25 %, for a QTL that explains 10 % of the genetic variance. Power increased when either the QTL explained more genetic variance or when a large design (50 x 40) was used. For the large design with a relatively large QTL, power of detection is 100 %, for both priors considered. The use of the 'peaked-at-zero' prior reduced power in the two intermediate cases but increased power in the small design with the small QTL. Estimates for posterior mode, mean and HPD90 were averaged over the 25 replicates and these averages are presented in figure 5. When the 'peaked-at-zero' prior was used, point estimates were lower compared to using the uniform prior. This prior also led to shorter -and closer to zero -HPD90 region in all combinations of design and 7 but the impact was more noticeable for the small design.

CONCLUSIONS
We presented MCMC algorithms, using the Gibbs sampler and the MH algorithm, which facilitate Bayesian estimation of location and dispersion parameters with a RAM. The RAM proved to be superior to the FAM; RAM required much less computational time because of the greatly reduced number of location parameters and also better mixing of the dispersion parameters. Information on individual phenotypes led to accurate estimation of both residual variance and heritability, as was similar to Van Arendonk et al. !27!. On the contrary, daughter yield deviations [28] may result in poor estimation of polygenic and residual variances [25]. The use of B RT allows a better representation of prior belief about dispersion parameters while sampling efficiency was similar to the usual 0 vc parameterization.
Considering ratios of variance components rather than variance components themselves in sampling procedures has been previously proposed !23!. However, our ratios can be interpreted directly and have implicit boundaries (zero and one), where Theobald et al. [23] needed a specific restriction on their ratio. The representations of prior knowledge in the two parameterizations were not equivalent and differences in posterior estimates can be expected. However, the use of vague priors (absence of prior knowledge) in the two parameterizations lead to very similar results.
In this study, position of the QTL was assumed known. Extension of the MCMC algorithm to allow estimation of QTL position has been studied and implemented (3!. Currently, the method of Bink et al. [2] to sample genotypes for a single marker is being extended to multiple markers linked to a normally distributed QTL. Then, a robust MCMC method becomes available for linkage analysis in multiple generation pedigrees allowing incomplete information on both trait phenotypes and marker genotypes. The conditional densities for location parameters are the same with either set of dispersion parameters (0 v c or O RT ). When sampling genetic effects, the ratios a2 of VC needed can be computed from either parameterization, i.e. a u 1 = 2 = u or2 e ( h2 a2 1 h2 ae C ( 1y ) x I -h 2 and a -1 v 1 = -9 2 e = C 2 ! x 1 -h2 In this study we considered B 1 -h ) o! B!2 2 1 -n// only one fixed effect, an overall mean m, for which the conditional density becomes where ji r equals y k corrected for genetic effects, following the categorization in table L The conditional variance of this overall mean is a weighted average over categories.
Again, for phenotypes on animals in categories 1 to 3, the residual variance, 0 '2 e i 7 contains parts of the genetic variances. The conditional density for the polygenic effect of animal j can be given as where where y 2 is the ith phenotype for animal j, corrected for all effects, other than polygenic, y l . is the average of phenotypes on non-parent l, also corrected for all effects other than polygenic, op(j) represents the offspring of animal j, which are parents themselves, o n p(j) represents the offspring of animal j, which are nonparents. Furthermore, UM , k is the polygenic effect of the other (if known) parent (mate of animal j) of offspring k, n j is the number of phenotypes for animal j, y z is the ith phenotype for animal j, corrected for all effects other than QTL, W l . is the average of phenotypes on non-parent l, also corrected for all effects other than QTL, dq!'1 is the first element of the xth row of D7'QT for animal j, and corrects for the covariance at the QTL between parent and offspring. Similarly, dqd!'1 is the first element of the xth row of f!!D! 1Q! for animal j, and corrects for the covariance between parent and the mate belonging to a particular offspring of that parent j.

A1.2. Dispersion parameters
In the RAM, the residuals (e) have different variances over the categories of animals (table 1). Hence, conditional densities for VC in 0vc are not standard densities. For example, when deriving the full conditional density for o,,2, the term W i (a2+ 2a 2 ) is known in the likelihood part of the joint posterior density (13). It can thus be treated as a constant, but it does not drop out of the equation. With B RT , the conditional density of u2 is standard, but those for h 2 2 and -y are not. With 0vc , the conditional density of variance component x, for x = e, u or v, is where and With O RT , the conditional density for or,2, is an IG (r, s) distribution with where N is the total number of phenotypes