Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population

A quantitative trait depends on multiple quantitative trait loci (QTL) and on the interaction between two or more QTL, named epistasis. Several methods to detect multiple QTL in various types of design have been proposed, but most of these are based on the assumption that each QTL works independently and epistasis has not been explored sufficiently. The objective of the study was to propose an integrated method to detect multiple QTL with epistases using Bayesian inference via a Markov chain Monte Carlo (MCMC) algorithm. Since the mixed inheritance model is assumed and the deterministic algorithm to calculate the probabilities of QTL genotypes is incorporated in the method, this can be applied to an outbred population such as livestock. Additionally, we treated a pair of QTL as one variable in the Reversible jump Markov chain Monte Carlo (RJMCMC) algorithm so that two QTL were able to be simultaneously added into or deleted from a model. As a result, both of the QTL can be detected, not only in cases where either of the two QTL has main effects and they have epistatic effects between each other, but also in cases where neither of the two QTL has main effects but they have epistatic effects. The method will help ascertain the complicated structure of quantitative traits.


INTRODUCTION
It may be more realistic that interlocus interactions (epistasis) between two or more quantitative trait loci (QTL), as well as the main effects of QTL themselves, play an important role in expressing a quantitative trait, and the importance of exploring epistatic effects has been discussed recently [17,30].
Then, several statistical methods for the detection of epistatic QTL, using a maximum-likelihood [25] or least squares approach [6], were proposed. Since these are multiple-dimensional search approaches, however, many computations are involved. Additionally, these are based on the assumption that the QTL number is fixed, and it may not be appropriate to apply these methods to data where the QTL number is unknown. Then, several improved methods have been proposed to detect multiple epistatic QTL, incorporating a genetic algorithm [1], extending the composite interval mapping [12,30], and adopting a one-dimensional search [10]. At this stage, genome-wide levels of significance and the confidence interval of estimates have to be calculated by using, for example, the permutation test [2] and bootstrap method [24], in which millions of computations are required.
Alternatively, the Bayesian approach via Markov chain Monte Carlo (MCMC) algorithms such as the Gibbs sampler [4] and the Metropolis-Hastings (MH) algorithm [8,15], has been drawing attention as a new and promising approach and was initially used to map QTL by Thaller and Hoeschele [20,21]. Additionally, owing to the advent of the Reversible jump Markov chain Monte Carlo (RJMCMC) algorithm [5], the number of QTL itself has become estimable, and several methods using the approach were developed [9,18,19,22,26]. Yi and Xu [27] developed a revolutionary Bayesian method to map multiple epistatic QTL in a population derived from two inbred lines, and by simulation studies. They showed that the method detects QTL accurately when either of the two QTL has main effects and they have epistatic effects, but when neither of the two QTL has main effects but they have epistatic effects, the pair of QTL may not be able to be detected by the method. Additionally, their method is applicable only to an F 2 population derived from two inbred lines.
The objective of this article was to propose an integrated method using Bayesian inference via MCMC algorithms for the detection of multiple QTL with epistasis under the mixed inheritance model in an F 2 population derived from two divergent breeds, such as cattle and swine. In this method, by treating a pair of QTL as one variable in the RJMCMC algorithm, the proposed method will work well, not only in cases where either of the two QTL has main effects and they have epistatic effects, but also in cases where neither of the two QTL has main effects but they have epistatic effects. We briefly evaluated the propriety of the proposed method by simulation studies. By using this method, we can simultaneously estimate the number, locations and effects of QTL, whether epistasis exists among the QTL or not, and also the extent to which the polygenic effect affects a quantitative trait.

Mixed inheritance model
In this article, we assumed an F 2 population generated from two breeds such as swine, cattle and so on, which were the upward-and downwardselected breeds in which different marker alleles were not completely fixed while different alleles at QTL were fixed in each breed.
Here, y represents a vector for the phenotypic values of a quantitative trait. The vector y can be described as where, and γ j = a j1 d j1 a j2 d j2 aa j ad j da j dd j . In these equations, β, u (∼N(0, Aσ 2 u )) and e (∼N(0, Iσ 2 e )) are vectors of the overall mean and/or covariates, polygenic effect and environmental effects, respectively, where A is the numerator relationship matrix, and σ 2 u and σ 2 e are polygenic and environmental variances, respectively. X and Z are design matrices relating β and u to y, respectively. l and m represent the numbers of QTL and pairs of epistatic QTL, respectively. δ i is the vector of additive genetic and dominance effects of the ith QTL, and γ j is the vector of additive and dominance effects of the first and second QTL in the jth pair, and epistatic effects of the pair. Q i is the matrix of probabilities of the ith putative QTL genotypes, and W j is the matrix of probabilities of two epistatic QTL genotypes and coefficients of epistatic effects in the jth pair, in which p ik (QQ) and p ik (qq) are the conditional probabilities that the individual k has two alleles derived from breeds 1 and 2 for the ith QTL, respectively. p ik (Qq) is the conditional probability that the individual k is heterozygous. In this article, instead of sampling QTL genotypes as one of the parameters, these probabilities are calculated deterministically by the method proposed by Haley et al. [7], using only the marker genotypes. These effects are based on the model proposed by Cockerham [3], except that the QTL genotype probabilities are used instead of the QTL genotypes themselves. For the convenience of the readers, the symbols used to denote all the parameters in this article and their meanings are shown in Table I.

Statistical analyses
In this study, the Bayesian approach via the Gibbs sampler, the MH algorithm and the RJMCMC algorithm were used. The posterior distributions for all the parameters involved can be generated from these algorithms.

Updating all the parameters except the number of QTL
For the overall mean β, main and epistatic effect of QTL δ and γ, polygenic effects u, and polygenic variance σ 2 u , the Gibbs sampler was used. In this algorithm, the new value of a parameter is sampled from the distribution conditioned on the current values of all the other parameters and the phenotypic values.
A new value of β is sampled from the normal distribution where η and τ 2 are prior mean and variance for the overall mean, respectively. New values of δ and γ are sampled from the normal distribution respectively.
The prior for the genetic effects is assumed to be distributed as N (0,τ 2 ), and T represents a diagonal matrix whose diagonal elements are τ. Q i and W j are determined based on the genotypes of flanking markers.
New values of u are sampled from the normal distribution where q is the order of A.
For QTL locations θ and environmental variance σ 2 e , the MH algorithm has been used. At the initial step of the algorithm, an arbitrary value for a parameter k, noted by k [1] , is determined and after implementation of the same procedures for all the other parameters in turn, the subsequent value for k is sampled from a symmetric uniform distribution on the interval [k [1] − d, k [1] + d], where d is a predetermined tuning parameter, updated with a probability min (1, λ). λ represents the ratio of likelihood when newly proposed values are substituted for previous values. If new proposals are accepted, these unknown parameters are updated, but if not, the previous values remain unchanged. By repeating this procedure many times, posterior distributions for θ and σ 2 e were generated. Once θ was sampled, the conditional probabilities p ik (QQ), p ik (qq) and p ik (Qq) were also determined.

Updating the number of QTL
Updating the number of QTL and epistatic QTL pairs, noted by l and m, respectively, needs a change in the dimension of the linear model, so the RJMCMC algorithm was used. The prior distribution of the number of QTL is the Poisson distribution with a mean µ l+m , and with l max and m max being set as the upper limits of QTL number and QTL pair number, respectively. In this article, we treated an epistatic QTL pair in the same way as a QTL with only a main effect. Therefore, if a QTL has both a main effect and epistasis with other QTL, the QTL is treated as one of a QTL pair and its main effect belongs to the vector γ. The new proposals can be one of the five following with the probability p a , p ae ,p m ,p d , and p de : to add one QTL (noted by proposal 1), to add one pair of QTL (noted by 2), to leave the QTL number unchanged (noted by 3), to delete one QTL (noted by 4), and to delete one pair of QTL (noted by 5). In this study, p a , p ae , p m , p d , and p de are all equal, being onefifth. When both l and m are 0, p d is 0 and when l and m are l max and m max , respectively, p a is 0. When the new proposal is 1, the location and the effects of a new QTL are sampled and the likelihood is recalculated, and the proposal is accepted with the probability where ϕ = (l, m, θ, δ, γ, Q, W, σ 2 u , σ 2 e ), and the likelihood ratio λ is as follows; When the new proposal is 2, the proposal is accepted with the probability The updating step follows that of proposal 1, except that new locations and effects are sampled for two QTL. When the proposal is 3, only the other parameters in the model are updated. When a proposal is 4, one of the QTL in the model is excluded with equal probability. The proposal is accepted with probability and ϕ* denotes ϕ without δ j , i.e., the effect of the deleted QTL. For proposal 5, one of the pairs of QTL in the model is excluded with equal probability. The proposal is accepted with a probability min The complete sampling outline is as follows: (a) updating µ using the Gibbs sampler; (b) updating l or m using the RJMCMC algorithm; (c) updating θ using the MH algorithm; (d) updating δ and γ using the Gibbs sampler; (e) updating u using the Gibbs sampler; (f) updating σ 2 u using the Gibbs sampler; (g) updating σ 2 e using the MH algorithm.

Simulation study
As mentioned above, an F 2 population generated from upward-and downward-selected breeds was assumed. Ten sires and 400 dams were picked up from each breed, noted by breed 1 and 2, respectively. Each dam has one progeny when they are randomly mated, and that the 400 progeny, noted by F 1 , are male or female with a probability of 0.5. F 2 individuals were generated by intercrossing the F 1 individuals randomly. At that time, each dam has three progeny, therefore the number of individuals in the F 2 population is expected to be 600, but in most cases, the number of females in the F 1 generation is not Each dam has one progeny. The sex ratio is set to .5.
Each dam has three progeny. As the number of F females is not exactly half of the total number of individuals, then the number of F individuals also fluctuates slightly from 600.  Figure 1.
The number of alleles in each breed was set to three, and two of them were common in both breeds. Each allele frequency was as follows: In breed 1, the frequencies of alleles 1, 2, and 3 were 0.8, 0.1, and 0.1, respectively. In breed 2, the frequencies of alleles 1, 2, and 4 were 0.1, 0.8, and 0.1, respectively.
In this study, we implemented simulation studies for four cases. In case 1, no QTL were set and only a polygenic effect exists. In case 2, one QTL with only a main effect and one pair of QTL with both main and epistatic effects between each other were present. In case 3, one QTL with only a main effect and two pairs of QTL with an epistatic effect were set, and for one of the two pairs, neither QTL has the main effect. In case 4, which is the most complicated situation, the same number of QTL were located at the same position as in case 3. One QTL has epistatic effects with the other two, and none of the three QTL has the main effect. The other two QTL have the respective main effects. The overall mean of the phenotypic values was set to 100, and it was assumed that there were two chromosomes, each of which were 100 cM long and both had eleven marker loci, one every 10 cM. The polygenic and environmental variances were 25 and 75, respectively. These situations and values were common in all cases. The locations and effects of QTL are presented in Table II. Phenotypic values were available only for the F 2 population and the pedigree information was used for the three generations.
The chain was run 50 000 rounds totally, and in order to eliminate the effect of a serial correlation, the chain was thinned and saved one sample per 10 cycles, so 5000 samples were finally saved per replicate. In total, 25 replicates were carried out for the respective cases, where the means, medians and modes   N (100, 8 2 ), and for δ and γ, the prior was set to N (0, 8 2 ). The mean of the prior Poisson distribution of the number of QTL and QTL pair number µ l+m was 3, and in case 3, to check the influence of µ l+m on the estimates, two additional different prior means (µ l+m = 2 and 5) were used. For the updating of θ and σ 2 e via the MH algorithm, the tuning parameter d was set to 2.5. For σ 2 e , the range from which the values were sampled was from 0 to the phenotypic value. The upper limits of QTL and QTL pair numbers, noted by l max and m max , were set to 4 and 3, respectively. The calculations started with the null model (l [1] = m [1] = 0), but for all the other parameters, the initial values were decided arbitrarily. In case 3, the different initial values, l [1] = l max and m [1] = m max , were also used to investigate the influence of this factor. Table III represents the posterior distributions of the QTL number in each case. Table IV represents the modes of the posterior distributions of all the other parameters in these cases. Since both the mean and the median were hardly different from the mode, the former are not shown here.

RESULTS
In case 1, neither the QTL with the main effect nor pairs of epistatic QTL were detected, which is consistent with true situations. The result indicates that the method can avoid type II errors in the absence of QTL and QTL pairs. In case 2, QTL pairs with both strong main and epistatic effects, which are relatively easy to detect, were mapped accurately. In cases 3 and 4, it is shown that the probabilities that the same number of QTL and QTL pairs as the true ones existed were extremely high (from 0.832 to 0.902), i.e., that the estimated QTL number corresponded to the true one. Especially, it should be noted that the proposed method succeeded in detecting epistatic QTL pairs without main effects. To help the understanding of the readers, Figure 2 shows histograms of the posterior distributions for locations of epistatic QTL in one of the replicates of cases 3 and 4.
Additionally, even if as in case 4, QTL that have epistatic effects with two QTL at the same time, or maybe more, were also detectable. For the other parameters, though only the environmental variance component was considerably overestimated, it is clear that the overall mean, the positions and effects of QTL, and polygenic variance were also accurately estimated. For comparison, one of the simulated data set in case 3 was analyzed using conventional interval mapping. Figure 3 shows that the QTL with the main effect, such as Table IV. The posterior distributions of the overall mean, the locations and effects of QTL, polygenic and environmental variances. a See Table I [1] = m [1] = 0 and l [1] = l max and m [1] = m max , respectively. The thin line and the bold line indicate QTL number and QTL pair number, respectively. QTL1 and QTL2, were detectable under the single-QTL model, but epistatic QTL without main effects could not be detected.
In case 3, it is also shown that since µ l+m was larger, the posterior distributions of l and m tended to shift to slightly larger values but the influence was not so strong as to alter their modes. For all the other parameters, quite similar results were yielded regardless of prior means. This indicates that the method is robust, if not perfect, to prior means of QTL number.
We also checked the influence of starting values, including the mixing behavior of the RJMCMC algorithm. The results are shown in Figure 4. Whether starting with l [1] = m [1] = 0 or l [1] = l max and m [1] = m max , though models which were quite different from the true one were sampled in the burn-in period, along iterations of the sampler, l and m gradually got closer to the true values, and finally the correct model was reached. The appropriate mixing behavior of the RJMCMC algorithm was observed for l, whereas for m, the sampler did not frequently move to the other models. However, the extent was not so serious, and this is not surprising because ten parameters were involved for a pair of QTL, whereas for a single QTL only three were involved. This problem can be overcome by sufficient iterations of sampling and by lengthening the interval at which the samples are stored.

DISCUSSION
Generally, it may be more likely that a quantitative trait is controlled by multiple QTL with different modes of inheritance and effects of various sizes, which may have some epistatic effects on one another. To ascertain the complicated mechanism of quantitative traits, conventional methods considering only one QTL at a time, such as interval mapping [13] and composite interval mapping [11,29], have seemed to be inadequate and these may increase the risk of both type I and type II errors. Therefore, it is necessary to develop a more sophisticated method, in which the multiple QTL are simultaneously taken into account. Since Bayesian approaches via MCMC algorithms have been generally used as an alternative to maximum-likelihood and least squares methods, several methods to detect multiple QTL have been developed in an F 2 population derived from two inbred lines such as mice and Drosophila [18], in a crossbred population such as swine [19], in a granddaughter design such as dairy cattle [22], and in a population with a more complex pedigree [9].
These methods are all based on the assumption that each QTL contribute independently to a quantitative trait. The estimation of the number of QTL itself has been emphasized too much, while epistatic effects among them have hardly been explored. Yi and Xu [27] developed an epoch-making method to map multiple epistatic QTL using the Bayesian inference via MCMC algorithms.
However, the authors stated that their method may not be adequate in cases where neither of two QTL has the main effect but they have the epistatic effect. Traits on which neither of two epistatic QTL has the main effect have been discussed in several articles, and epistatic loci that do not have the main effect do not seem to be negligible components in quantitative traits. van Wezel [23] showed that mice homozygous for the STS allele at the D18Mit57 marker and having at least one BALB/c allele at D17Mit72, and mice homozygous for the STS allele at the D17Mit72 marker and homozygous for BALB/c allele at the D18Mit57 marker were more susceptible to colon tumors, and stated that the two loci had the strong reciprocal interaction, but neither had the significant main effect. Li et al. [14] reported that, on three grain yield components in rice, the majority of epistatic markers did not appear to have individual effects. Then, by treating a pair of epistatic QTL as one variable in the RJMCMC algorithm, we have made it possible for two QTL to be added to or deleted from the model simultaneously. As a result, not only pairs of QTL that have both the main and the epistatic effects but also those that have no main effects but have significant epistatic effects are also detectable.
Moreover, the method proposed by Yi and Xu is designed for an inbred population, which can be obtained easily in crops and experimental animals. Because of a long generation interval, the risk of inbreeding depression, and great cost for feeding and management, it is very difficult to produce an inbred line by repeating consanguine mating in livestock. This is the reason why the method cannot be directly applied to complicated data in livestock, therefore, additional careful considerations and appropriate algorithms are required. Then, as is often the case with livestock such as cattle and swine, the model used assumes that putative QTL are diallelic with alternative alleles fixed in two parental breeds but alleles at marker loci are not completely fixed. We have also incorporated the deterministic approach to calculate the probabilities of QTL genotypes [7], which has little effect on the accuracy of estimation and save us considerable computing time [16,27].
Additionally, by setting the mixed inheritance model in which a random genetic effect, along with a multiple QTL with a moderate or large effect, is included, the method is able to take into account QTL that are segregating within each breed and a large number of polygenes with minute and individually undetectable effects.
In this study, an integrated system for detecting multiple epistatic QTL in an outbred population under the mixed inheritance model has been established. Recently, to overcome the drawback of the previous method [27], Yi et al. [28] have developed a more elaborate and robust method which can add or delete QTL two by two, and unlike our method, two QTL added into the model at the same time are not necessarily deleted together. In addition to removing insignificant effects of QTL (whether main or epistatic) from the model and to retaining the appropriate mixing behavior, effect indicators, which take a value one if QTL have significant main effects and zero otherwise, were used as additional parameters in the method. However, their method is principally designed for populations with only two genotypes, e.g., a backcross population, double haploid lines and recombinant inbred lines, and of course does not take a polygenic effect into account. They also stated that for an F 2 population in which higher-order epistatic effects are involved, the number of parameters and computational load will largely increase. In our method, epistatic QTL without significant main effects can be detected in a relatively simple way in an F 2 population, without any additional variables.
Of course, by some brief modifications, our method can be extended for different types of design, such as the half-sib population. The method is also robust to prior distributions and starting values, and the RJMCMC algorithm mixes reasonably. The mixing behavior of the RJMCMC algorithm depends on the way in which the effects of QTL added newly to the model are generated [27]. Especially, in the case of a large number of unknown parameters, sampling values for new parameters from uniform distributions, without any regards for the current values of the other parameters, resulted in a seriously low acceptance rate, i.e., "sticky" (data not shown). By conditioning the probability density function for a parameter on the current values of the other parameters and the phenotypic values, the mixing behavior is improved and appropriate convergence to the posterior distributions can be attained. As mentioned above, the results obtained by simulation experiments were almost consistent with the true values, and it gave researchers evidence that the system proposed in this study will work well in real cases.