Fine mapping of multiple QTL using combined linkage and linkage disequilibrium mapping – A comparison of single QTL and multi QTL methods

Two previously described QTL mapping methods, which combine linkage analysis (LA) and linkage disequilibrium analysis (LD), were compared for their ability to detect and map multiple QTL. The methods were tested on five different simulated data sets in which the exact QTL positions were known. Every simulated data set contained two QTL, but the distances between these QTL were varied from 15 to 150 cM. The results show that the single QTL mapping method (LDLA) gave good results as long as the distance between the QTL was large (> 90 cM). When the distance between the QTL was reduced, the single QTL method had problems positioning the two QTL and tended to position only one QTL, i.e. a "ghost" QTL, in between the two real QTL positions. The multi QTL mapping method (MP-LDLA) gave good results for all evaluated distances between the QTL. For the large distances between the QTL (> 90 cM) the single QTL method more often positioned the QTL in the correct marker bracket, but considering the broader likelihood peaks of the single point method it could be argued that the multi QTL method was more precise. Since the distances were reduced the multi QTL method was clearly more accurate than the single QTL method. The two methods combine well, and together provide a good tool to position single or multiple QTL in practical situations, where the number of QTL and their positions are unknown.


INTRODUCTION
Several studies have suggested combining linkage and linkage disequilibrium mapping for finding QTL, e.g. [1,4,12,14]. Both linkage analysis and linkage disequilibrium analysis have limitations when applied separately, but the joint analysis including both yields precise mapping results. Linkage analysis is powerful in detecting QTL. However, linkage analysis only takes into consideration the recombinations that can be observed using pedigree data in the genotyped generations to position the QTL. This implies that, even if dense marker maps are used, the confidence intervals of the position estimates are quite large, because dense marker maps generally give too few recombinations in the genotyped generations to delineate small regions of interest. Thus, finding confidence intervals denser than about 10 cM is very unusual for linkage analysis. Linkage disequilibrium (LD) however, is based on identity by descent probabilities (IBD), allowing to use historical recombinations, and thus can distinguish between very dense marker maps and give very short confidence intervals. LD alone is, however, likely to result in false positives, because of spurious associations between the markers and the QTL.
Most QTL mapping methods are designed to detect and map a single QTL, i.e. single QTL mapping. In a situation where there are more than one QTL in the investigated region, the effects of the other QTL may bias the results from single QTL analysis. One problem is when two QTL reinforce each other, thus leading to fitting a "ghost" QTL in between the two real ones [5]. In other cases two QTL may cancel each other's effects, thus leading to not finding any QTL.
To avoid the problems concerning several QTL, methods designed for multi QTL have been suggested, e.g. [3,9,10]. Although it is expected that combined linkage and LD mapping is less likely to fit a "ghost" QTL, because its teststatistic drops faster when moving away from the QTL position, it is unknown how sensitive this mapping method is to the presence of multiple QTL.
In this paper two previously presented methods combining linkage and linkage disequilibrium mapping, one using single QTL mapping (LDLA) and one using multi QTL mapping (MP-LDLA), are compared for their ability to find multiple QTL in studying simulated data sets, where the exact position of the underlying QTL is known.

Statistical analysis
A method combining linkage analysis and linkage disequilibrium (LDLA) has been described earlier [12]. This method uses single QTL mapping through maximum likelihood. Meuwissen and Goddard [9] described a multi QTL method combining linkage and linkage disequilibrium mapping (MP-LDLA) using a Bayesian framework for finding QTL. Basically, both these methods use a matrix of IBD probabilities (G i ), calculated as described by Meuwissen and Goddard [8], but use different test statistics for locating the QTL and make different assumptions about the QTL. LDLA tests whether there is a QTL at the putative QTL position or no QTL on the chromosome. MP-LDLA tests whether there is a QTL at the putative position given all other QTL fitted in the model.

Models
The LDLA mapping method is described in detail by Meuwissen et al. [12]. Briefly, the method consists of the following three steps. The first step estimates the linkage phases for the genotyped animals. Linkage phases are estimated on the basis of marker information. The second step calculates the G i matrix, which consists of IBD probabilities (at the putative QTL position) between the haplotypes. This matrix of IBD probabilities (G i ) is calculated from the marker haplotypes as described by Meuwissen and Goddard [8]. Identity by descent (IBD) probabilities between animals in the pedigree are calculated through linkage analysis (LA), while IBD probabilities between founder animals are estimated by linkage disequilibrium (LD) analysis. This results in a complete G i matrix of IBD probabilities between all haplotypes at the evaluated QTL positions, and thus assumes that all animals are descendants of a common base population. Only the midpoint of every marker bracket is considered, since for dense marker brackets the positions inside the bracket will have similar likelihoods. The third step in the analysis calculates the likelihood of the data using restricted maximum likelihood (REML). The likelihood is calculated after fitting a QTL at the midpoint of every marker bracket. The likelihood for a model containing a QTL and background genes is compared to a model only containing the background genes. The difference between the models is a function of the difference of the Log Likelihoods of the two models. The data are modelled by µ is an overall mean, 1 is a vector of ones, u is a vector of random polygenic effects, h i is a vector of random haplotype effects, e is a vector of random sampling errors, Z 1 is an incidence matrix relating polygenic effects with records and Z 2 is an incidence matrix relating haplotypes with records. The correlation matrices of h i and u are G i and A, respectively, where G i is the matrix of IBD probabilities between haplotypes at position i. The variances of the random effects h i , u and e and the likelihood of the model are estimated using the ASREML package [2]. IBD probabilities (G i ) and likelihoods are evaluated for a QTL in the middle of each marker bracket.
The MP-LDLA method is described in detail by Meuwissen and Goddard [9]. This method also includes the matrix of IBD probabilities (G i ), but estimates variance components and haplotype effects using Gibbs sampling. The method estimates an indicator variable I i , which indicates whether there is a QTL at the i th position, where only the midpoints of the marker brackets are considered as QTL positions. The data is modelled by where µ, 1, Z 1 , u and e are as for the LDLA method, Σ i denotes summation over all QTL positions (bracket midpoints), h i is a vector of haplotype effects, X i is an incidence matrix relating haplotype effects with records and I i is an indicator variable where where G i is the matrix of IBD probabilities between haplotypes at position i. Variance components and haplotype effects are estimated using Gibbs sampling with 100 000 cycles. The prior probability of having a QTL on bracket i (I i = 1) is proportional to the length of the bracket [13]. The total prior probability of having a QTL in the simulated data was assumed to be 100%, i.e. a QTL was previously detected in this region. The following independence sampling implementation of the Metropolis-Hastings algorithm was used to sample σ 2 hi for position i from p(σ 2 hi |y * ), where y * is the data corrected for the overall mean, polygenic effect and all other fitted QTL [11]: with a probability of Min[p(y * |σ 2 h(new) )/p(y * |σ 2 hi ); 1], and go to step 1, where p(y * |σ 2 hi ) denotes the likelihood of the data given variance σ 2 hi . The prior probability of having a QTL in bracket i (I i = 1) is: [13], S = 0.002 (assuming a priori that about 100 QTL explain the total genetic variance), and π ι is proportional to the length of the bracket [13] and chosen such that Σ ι π ι = 1, i.e. the total prior probability of having a QTL in the simulated data was assumed to be 100%, i.e. a QTL was previously detected in this region. The above Metropolis-Hastings sampling leads to "births" and "deaths" of QTL since the prior distribution suggests the presence or absence of a QTL with probability π ι or (1 − π ι ), respectively, which is very similar to the method of Sillanpaa and Arjas [15]. Flat priors are assumed for σ 2 u and σ 2 e . After discarding the first 10 000 cycles of the Monte Carlo Markov chain as burn-in, the fraction of cycles with I i = 1 gives a posterior probability of a QTL at position i. If the posterior probability exceeds 0.5 a significant QTL is considered present in the bracket. This is because a probability over 0.5 means the probability of having a QTL in the bracket is larger than the probability of not having a QTL [13].

The simulated data-sets
The simulated data consisted of five different sets of a granddaughter design. The different simulations differed in the distance between the two QTL. Distances between the two QTL were 15, 30, 50, 90 and 150 cM. Table I gives a description of the different marker maps used for the analysis. Basically, the marker spacing close to the QTL were 1 cM between each marker, while for the regions further apart from the QTL the marker spacing was 10 cM (5 for the situation where the QTL are only 15 cM apart). There were in total 50 simulations for LDLA and 25 for MP-LDLA. The reason for the reduced number of simulations for MP-LDLA was that analysing by this method is very computer time demanding.
The simulations were done by the gene dropping method [6,7]. First a population was simulated for 100 generations with a fixed population size of 100, 50 sires and 50 dams. Random mating among this pool produced the next generation of 20 sires and 1000 dams. The next generation consisted of 1000 sires that were produced by mating the 20 sires to the 1000 dams in the previous generation. The final generation consisted of 100 000 daughters. This gave a granddaughter design with 20 grandsires, each grandsire with 50 sons and each son with a half-sib group of 100 daughters. In the last generation, one founder QTL allele was chosen at random from the group of surviving founder alleles that had a frequency between 0.1 and 0.9. This obtained a value of 0.5, while the others were given a value of 0. The effect of the QTL was additive and the difference between homozygotes was 1. The average frequency of the MAF (minor allele frequency) was 0.3 leading to a QTL variance of approximately 0.1. The polygenic variance was 0.1 and the error variance was 0.9, leading to a heritability of 0.18.

RESULTS
Two different situations were tested for each of the five different distances between the QTL. First, a sparse marker map was analysed, with marker distance of 10 cM (5 cM for the situation where the QTL were only 15 cM apart). It was tested whether both the LDLA and MP-LDLA method were able to detect both QTL using a sparse marker map. At least this should be the case when the QTL were far apart. Secondly, based on the sparse results, new investigations were performed with a dense marker map (1 cM). For the situations with 150, 90 and 50 cM between the QTL, this was done by two different analyses, one for each of the regions containing the QTL. For 30 cM between QTL, the LDLA analysis was performed for a region containing both the QTL, while the MP-LDLA analysis was done for two separate regions. This was due to the fact that analysing the entire chromosome was very computer demanding. For 15 cM between QTL both the LDLA and MP-LDLA analysis was done for the entire region including both QTL.
Tables II, III and IV show the precision of the two methods using 10 cM, 5 cM and 1 cM marker brackets. Figures 1, 2 and 3 show the mean results for Log Likelihoods and posterior probabilities based on all simulations (50 simulations for LDLA and 25 simulations for MP-LDLA). The deviation is given as the number of brackets separating the most likely position from the correct position. Also, the tables include all distinct peaks from the analysis. This implies that in some cases the number of QTL in the Tables differs from the number of simulations, because the analysis sometimes found additional QTL and sometimes did not find both QTL. When analysing for sparse (10 cM) brackets the LDLA method gave very convincing results as long as the distance between the two QTL were large (150 cM and 90 cM), considering that the most likely QTL positions were never more than one bracket apart from the correct QTL positions. Since the QTL were placed closer (50 cM) the precision was reduced, and LDLA in almost 50% of the simulations fitted only one QTL.

Continued.
For closely linked QTL (30 cM) the method only positioned one QTL in 48 of the 50 simulations. When the distance was 30 cM the method either found one of the correct QTL, or found one QTL between the two correct QTL positions, i.e. a "ghost" QTL. For the situation when the QTL was 15 cM apart an analysis with marker brackets of 5 cM was used, which kept the QTL in the bracket midpoint. The results from analyses with 5 cM marker brackets are shown in Table III and Figure 2. When the distance was 15 cM, most fitted QTL using the LDLA method were in the positions between the two real QTL, and so can be viewed as "ghosts". The MP-LDLA method however, generally showed good accuracy in all evaluated situations, placing the QTL in either the correct bracket or the bracket next to the correct bracket for more than 90% of the QTL positions. There was, however, a few simulations in which the MP-LDLA method did not find one of the QTL. Both for the situation when the QTL were 90 cM apart (3 missed QTL), the situation when the QTL were 50 cM apart (2 missed QTL) and the situation when the QTL were 30 cM apart (2 missed QTL).
For marker brackets of 1 cM, the results were different. For the larger distances between the QTL (150 or 90 cM), LDLA did position the two QTL better than MP-LDLA. The MP-LDLA method was more precise when the distance between QTL was 50, 30 cM or 15 cM. However when looking at the figures, it is clear that even though LDLA found the correct QTL position more often than MP-LDLA for the two large distances between the QTL, LDLA did produce quite broad likelihood peaks on average over all simulations. MP-LDLA placed the QTL in the brackets next to the correct position more often than did LDLA in these situations, but the mean posterior probabilities gave very distinct peaks around the correct positions. These differences were probably due to the different test statistics used by the two methods.
The situation when the QTL were 30 cM apart was special. For the LDLA method, the complete region between the QTL was analysed in one analysis, while for MP-LDLA this had to be done in two analyses, because the computer capacity was not large enough to handle the complete region in one analysis. As shown in Figure 3, this produced some "ghost" QTL at the ends of the regions evaluated in each run by MP-LDLA. To get a better comparison of the two methods and avoid the bias of the closely positioned QTL, an additional evaluation was performed with 5 cM brackets. These results are shown in Table III and Figure 2. When the 5 cM brackets were analysed, MP-LDLA did not produce any "ghost" QTL, and yielded more precise results than LDLA.
One possible disadvantage concerning both methods is the fact that both of them did position more than two QTL in many of the runs. For the LDLA method, this was often a smaller peak some distance away from the fitted QTL. For the MP-LDLA method, this was typically expressed by two close brackets showing the same posterior probability for a QTL. This was often two neighbouring brackets, and is probably due to similarities in haplotypes for these neighbouring brackets. However, this could also be interpreted as one inaccurately positioned QTL. In other cases, there were one or three brackets with very low posterior probabilities in between the two brackets showing the same high posterior probability for a QTL. The middle bracket in these cases was most often the correct QTL position. So, instead of finding the correct position, the method in these cases found two QTL positions, either the two brackets next to the real position, or the brackets two positions away from the real QTL.

DISCUSSION
Although these methods are designed for handling fine mapping of QTL, they are also both well suited for analysing broader chromosome parts, because they combine linkage analysis and linkage disequilibrium analysis. For sparser analysis, the IBD probabilities between the founder haplotypes will generally be small, and the analysis will approach that of a pure linkage analysis. For the two sparse situations (with QTL 150 cM and 90 cM apart) both methods were able to give good results even if there were two QTL in the analysed material. This is because when the QTL are as far apart as 150 and 90 cM, they are not expected to influence each other much.
The investigated methods were very computer time demanding. This was especially a problem in the current simulation study where many replicates were analysed. However, when analysing real data sets, the computer time is a lesser problem because one needs to analyse only a single data set. E.g. when the analyses of a replicate takes one day, this is too long for analysing 25 or 50 replicates, whereas it is quite acceptable when analysing a single practical data set.
For the LDLA method, we did expect biased estimates of the QTL positions when the distance between the QTL were reduced. This was because the method considers only one QTL, and the positioning will be influenced by the other QTL. The results for the LDLA method clearly showed bias due to the second QTL when the distance between the QTL were 50, 30 or 15 cM and the sparse marker map was used. The results included both "ghost" QTL and also showed large areas where the Log Likelihoods were very similar. Within these areas, the method also tended to find quite a few spurious QTL marked by small peaks in the Log Likelihoods. The additional likelihood peaks found further away from the correct QTL positions than five marker brackets were not included in the results, because these were generally lower likelihoods than the ones positioned around the real QTL positions, and also because the peaks were less distinct. Most often they resulted from a peak in likelihood for one single marker bracket in a trend that generally was decreasing. Thus, one deviating bracket was most often the reason for these additional peaks. The spurious QTL did, however, result from the problems LDLA had in positioning the QTL within this area that really included two QTL. The results for the 1 cM marker maps are generally better, with clear indications for more than one QTL. The method however, had problems positioning the QTL. Looking at Figure 3e) there were no clear peaks found anywhere in the analysed region, though there were indications for QTL around the two correct positions, where the Log Likelihoods were higher than over the closest brackets.
The MP-LDLA method generally yielded good estimates of QTL positions for all evaluated situations. The method was superior to LDLA for positioning QTL when the QTL were 50, 30 or 15 cM apart. The method also gave good estimates when the QTL were further apart (150 or 90 cM), but in these situations LDLA in more cases positioned the QTL in the correct bracket. Considering that LDLA gave wider likelihood peaks, it still could be argued that the precision was better with MP-LDLA. MP-LDLA in some runs had difficulties in positioning the QTL. The problems included both missed QTL and some cases where the method positioned spurious QTL. Generally, these problems were more frequent when using the sparser marker maps. For the dense situations, the method yielded more precise results. This might be expected, since the method is intended for fine mapping using dense marker maps.
These simulations indicate that the results of LDLA and MP-LDLA combine well and together provide a good tool to find one or several QTL affecting the trait in question. In practical situations, where one does not know how many QTL there are beforehand, the general recommendation is to use both methods. LDLA should be able to give good estimates as long as there is only one QTL present, but the fact that LDLA tends to find "ghost" QTL when there are close QTL makes it necessary to use the MP-LDLA method in addition. In this way one should be able to find and position the QTL, regardless how many there are.