 Research
 Open Access
Linear models for joint association and linkage QTL mapping
 Andrés Legarra^{1}Email author and
 Rohan L Fernando^{2, 3}
https://doi.org/10.1186/129796864143
© Legarra and Fernando; licensee BioMed Central Ltd. 2009
 Received: 22 January 2009
 Accepted: 29 September 2009
 Published: 29 September 2009
Abstract
Background
Populational linkage disequilibrium and withinfamily linkage are commonly used for QTL mapping and marker assisted selection. The combination of both results in more robust and accurate locations of the QTL, but models proposed so far have been either single marker, complex in practice or well fit to a particular family structure.
Results
We herein present linear model theory to come up with additive effects of the QTL alleles in any member of a general pedigree, conditional to observed markers and pedigree, accounting for possible linkage disequilibrium among QTLs and markers. The model is based on association analysis in the founders; further, the additive effect of the QTLs transmitted to the descendants is a weighted (by the probabilities of transmission) average of the substitution effects of founders' haplotypes. The model allows for noncomplete linkage disequilibrium QTLmarkers in the founders. Two submodels are presented: a simple and easy to implement HaleyKnott type regression for halfsib families, and a general mixed (variance component) model for general pedigrees. The model can use information from all markers. The performance of the regression method is compared by simulation with a more complex IBD method by Meuwissen and Goddard. Numerical examples are provided.
Conclusion
The linear model theory provides a useful framework for QTL mapping with dense marker maps. Results show similar accuracies but a bias of the IBD method towards the center of the region. Computations for the linear regression model are extremely simple, in contrast with IBD methods. Extensions of the model to genomic selection and multiQTL mapping are straightforward.
Keywords
 Linkage Analysis
 Genomic Selection
 General Pedigree
 Haplotype Class
 Small Chromosomal Segment
Background
Linkage analysis (LA) is a popular tool for QTL detection and localization. Its accuracy is limited by the number of meioses observed in the studied pedigree, which can represent several centiMorgan. Linkage disequilibrium (LD, also called gametic phase disequilibrium) is the nonrandom association among different loci, and is increasingly used in human and agricultural association studies for gene mapping. The joint use of LD and LA (also called LDLA) permits to map QTL more accurately than LA while retaining its robustness to spurious associations, and this technique has been applied in human [1], plant [2], and livestock [3] populations. This is achieved by explicitely modelling relatedness not accounted for in association analysis [2]. LDLA is also robust to nonadditive modes of inheritance [4]. In addition, the joint use of LD and LA makes it possible to test linkage alone or linkage disequilibrium separately [1]. A characteristic of plants and livestock is that often, close pedigree relationships exist and are recorded among the individuals genotyped for QTL detection (e.g., bulls or plant varieties), and including these relationships in the analyses can be worthwhile.
In livestock, several approaches have been proposed to take into account LD information within LA [3, 5, 6]. These methods model the process generating LD among the putative QTL and the surrounding markers; this process can quickly become unmanageable in the general case [7], and even difficult to approximate [8–10]. Extensions of LD models to include LA (that is, the cosegregation of markers and QTL due to physical linkage) are cumbersome for the general case [6] or restricted to certain pedigree structures like halfsibs families (C. Cierco, pers. comm.). The parameters of LD generating processes can be either estimated from the data, which is often difficult, or fixed a priori which is unsatisfactory. The existence or not of these events in the past history of a population is unknown. Therefore the validity of any assumptions is largely unknown.
An alternative is QTL mapping by simple association (regression in the case of quantitative traits) of phenotypes on marker alleles, which has been s.hown to be an effective method [11, 12], while retaining simplicity; this is widely used in human genetics [13]. On the other hand, QTL mapping in livestock by LA relies heavily on the use of half and fullsibs families and relatively simple ascertainment of phases and transmission probabilities (e.g. [14]). For this reason, HaleyKnott type regressions for simple designs [14] and variance component methods for more complex designs [15] are well adapted, computationally simpler and almost as good [16, 17] as full integrated likelihoods [18, 19]. Linear models are appealing for their ease of use and understanding and good performance.
In this work, we combine association analysis with probabilities of transmission using conditional expectations. Ultimately, we come up with linear models for joint association and linkage mapping, which are generalizations of LA mapping. Two particular cases will be detailed: a halfsib regression which applies in many livestock practical settings, and a general mixed model approach valid for any type of pedigree.
Methods
This section is organised as follows. In the subsection "Splitting QTL effects", we show how to come up with expectations for gametic QTL effects integrating association and linkage. The following two subsections "LDLA HaleyKnott type regression" and "Variance components mapping" explicitly present two linear models (HaleyKnott type regression for halfsib families and a general mixed model for a general pedigree) and the statistical tests that lead to QTL detection, location, and ascertainment of the hypothesis linkage, association, both or lack of both. Numerical examples and performance of the methods are illustrated by simulations in subsection "Illustrations", under two different scenarios.
Splitting QTL effects
In this section we will show how QTL effects can be split in a part conditional on LD in the founders and cosegregation, and another part which is unconditional on LD in the founders. This results in a flexible linear model setting. Throughout the paper, we will assume a polymorphic QTL with an unknown number of alleles nq: {q_{1} ⋯ q_{ nq }}, with effects α= (α_{1} ⋯ α_{ nq }); dominance is not considered. Let v denote the additive effects of all gametes carriers of QTLs in a population; this will be referred to as "gametic effects" (e.g. [15]).
In the following we consider haplotypes, which are phased markers, i.e., a set of 1, 2, or several ordered markers on the same chromosome. Haplotypes can be classified in classes. Classes can be formed by simple classification or by more sophisticated techniques such as cluster analysis [20, 21]. For the sake of discussion we will assume that haplotypes are composed of two markers with a putative QTL located at the middle, but our approach is general and conditional only on the existence of haplotype classes.
In all the following, we generally consider a single position in the genome. This position is situated on a specific chromosome number of the physical map or karyotype; for example, BTA14. In a diploid species, each individual has two copies of each chromosome: one from the paternal side and one from the maternal side. Identification of the origin of each chromosome copy is not always possible. In the following, when referring to any given chromosome pair containing a specific locus of the genome and to distinguish the two chromosome copies, we shall note them 1 and 2.
The haplotype (jth chromosome in ith individual, j = {1, 2}) can be assigned to a haplotype class k through a function δ( ) acting on a haplotype h. In its simplest form, δ( ) is a lookup table. So, for the case of two flanking SNPs, classes are 1 to 4, composed of haplotypes 00, 01, 10 and 11. The number of haplotype classes at the candidate position is nh.
We assume that linkage disequilibrium exists between haplotype classes and QTL alleles. Conditional on each haplotype class, population frequencies for a QTL state are denoted by matrix π= {π_{1,1}⋯ π_{nq, nh}}. That is, the probability of QTL state l conditional to haplotype class k is Pr(Q ≡ q_{ l }k) = π_{l, k}. Assuming linkage equilibrium, π_{l, 1}= ⋯ = π_{l, nh}= π_{ l }, the marginal population frequency of the lth allele of the QTL. In this situation, haplotype classes are not informative on QTL states. However, given disequibrium between the markers loci and the QTL locus, π_{ l }, ... will vary among the different haplotype classes.
Founders
Now, can be modelled as the sum of a conditional expectation plus a deviation: , where this deviation (assuming the true state of the QTL is q_{ l }) is as above. The deviation has a discrete distribution with possible states {(α_{1}  β_{ k }),⋯(α_{ nq } β_{ k })} with probabilities {π_{1, k},⋯ π_{nq, k}}, which are generally unknown.
Nonfounders
where the subscripts 1 and 2 refer to the two QTL alleles of the sire and the dam. In the expression above, four probabilities are needed because maternal and paternal origin can not always be stablished with certainty [26] and, for the same reason, labels 1 and 2 are used instead of "paternal" and "maternal" for each QTL allele in each individual. Elements in w_{i, j}sum to 1.
The , the deviation of with respect to its expectation has states with associated probabilities which are conditional on marker information as well.
The two building blocks in the previous section (modelling of expectations of gametic effects in founders by LD, and of non founders by conditioning on founders and LA) allow us to construct several linear models considering LD, LA, or both. In the next two sections, we will detail two linear models including LD and LA for cases commonly used in livestock genetics: a regression approach applied to idealized pedigree structures (halfsib families), and a more flexible variance component approach which can be used for general pedigree structures.
LDLA HaleyKnott type regression
where e is a vector of residuals. This model can be fitted by, for example, leastsquares. Tests for QTL detection and location using interval mapping can be done by likelihood ratio or Ftests, assuming homoscedasticity of variances. Variances are indeed not homogeneous, for example, if a QTL is fixed within a haplotype class but not in another. The non consideration of dam effects also inflates the residual variance. Note, in addition, that the model is generally not fullrank: effects are non estimable withinsire (but their contrasts are). The β coefficients will be estimable if they are not confounded with any gametic effect; that is, if no haplotype class is present in one sire only. However, this does not create any problem for QTL localization and detection.
An interesting property of the model is that it is a generalization of HaleyKnott regression [14, 19], which occurs if we assume linkage equilibrium among founder haplotypes. Note that spurious signals due to, for example, stratification, are unlikely in this model because there is a verification, through linkage (i.e. the PDQ's) that associated haplotypes are transmitted to the next generation and still have an effect. This breaks down spurious associations that would be observed at the founders' level.
This expression models appropriately the cosegregation of markers and those QTL in LD with them. We call this model "LD decay" because it models appropriately the decay of initial LD existant in the founders by tracing the effect of the different segments through the pedigree with the aid of flanking markers, i.e., by linkage. However, it would not detect a QTL in the case of LE.
Statistical testing
Many tests are possible using the statistical model in equation (13). Usually (for example in interval mapping), several possible QTL locations are tested simultaneously or sequentially. For a particular putative QTL location, the null hypothesis is the nonsegregation of alleles of the QTL having different effects. This implies that all haplotype substitution effects, as well as the v* deviations, have the same value. This amounts to a common overall mean for the data, with β= 0, = 0. There are three alternative hypothesis depending on the existence of complete linkage disequilibrium, only linkage, or both.
 1.
H _{0} (null hypothesis): No cosegregation markersQTL effects (i.e. no linkage) and no linkage disequilibrium among haplotypesQTL: β= 0, = 0.
 2.
H _{1}: Complete linkage disequilibrium at the founders: β≠ 0, = 0.
 3.
H _{2}: Linkage equilibrium at the founders but cosegregation markersQTL effects: β= 0, ≠ 0.
 4.
H _{3}: Incomplete linkage disequilibrium at the founders and residual cosegregation markersQTL effects: β≠ 0, ≠ 0
In addition, it is possible to test H_{3} against H_{1} and H_{2}.
Variance components mapping
Extension to a variance components or mixed model mapping framework [15, 27, 28] is possible [29, 30]. As before, let v be the gametic effects for all the QTL gametes in the population. We will show how the first and second moments of the joint distribution of v can be constructed, conditional on marker information and within haplotypic classes means and variances.
Each gametic effect is modelled as (i) a weighted average of the gametic effects of its ancestors (for nonfounder individuals) or of haplotypic effects (for founder individuals), plus (ii) independent random variables due to mendelian sampling [15], ϕ. The expression (15) potentially includes nonfounder gametic effects in the progeny of nonfounder animals, allowing for generality and multigenerational pedigrees.
Note that is partitioned into founders and nonfounders, and all subsequent partitioned matrices. In particular, W can be partitioned accordingly, so that rows tracing the origin of founder gametes from other gametes in the population are formed by 0's. Note that the setting is very similar to a genetic groups model [31]. Rules for computing the first and second moments of the distribution of the gametic effects v follow [29].
Conditional distribution of the gametic effects
Conditional mean for the gametic value
For founder alleles, conditionally on the haplotype , this is simply the mean of the corresponding haplotypic class, that is , as is 1 for k = δ( ) and 0 for anything else.
where (I  W)^{1} represents summation over all possible paths of transmission from ancestors to descendants, and represents the expected franction of founder gametes in the descendant gametes [31]. Matrix Q_{ f }is an incidence matrix relating founder gametes to founder haplotypic classes. Matrix Q can be recursively computed using equation (16). These expressions are similar to the QTL crossbred model [32, 33], save for groups for founders, which are based on haplotype classes instead of breeds.
Conditional variance of the gametic value
where the computations of and have been previously shown. Note that this expression reduces to the classical one [15] under linkage equilibrium.
Conditional covariances
where the covariances in the right hand side are also conditional on m and β. This formula is the same as for the case of linkage equilibrium in the founders [15, 26]. However, the variances differ due to the different haplotype origins, and the covariances will not be the same as those under linkage equilibrium.
Statistical model
where X and Z are incidence matrices and b is a vector of fixed effects. Residuals e are normally distributed e ~ MVN(0, R), where MVN stands for multivariate normal, and R = I .
Note that enter nontrivially into G. For the maximum likelihood techniques, derivativefree techniques might be used with equation (22). For the Bayesian approach, albeit the "data augmentation" of gametic effects in (23) or (24) partly simplifies computations, the full posterior conditionals of θ do not have closed forms; MetropolisHastings might be used. Other possible simplifications are:

Supress v* from the model in (24), i.e. y = Xb + ZQ β+ e. This implicitely assumes: (i) QTL alleles are fixed within haplotype class; and (ii) transmissions are known with certainty (i.e. PDQ's are either 0 or 1). Under these two conditions, Var(v*) = 0. This might happen for very dense marker maps where markers are fully informative on QTL state and transmissions. The result is a leastsquares estimator as follows:

Assume constant variances across classes and, further, that PDQ's are known with certainty. If this is the case, Var(v*) = and standard algorithms and software (e.g., REML) can be used.

If variances are not constant within class but each gametic effect can be asigned exactly to a class k (i.e. PDQ's are either 0 or 1), then its variance is . This is a mixed model with heterogeneity of variances. This assumption is similar to that by PérezEnciso and Varona [33].
 1.
H _{0} (null hypothesis): No segregation of QTL effects (i.e. no linkage) and no linkage disequilibrium haplotypeQTL: .
 2.
H _{1}: Complete linkage disequilibrium at the founders: .
 3.
H _{2}: Linkage equilibrium: .
 4.
H _{3}: Incomplete linkage disequilibrium at the founders: .
Illustrations
Numerical examples
Pedigree and markers for the numerical example
animal  dam  sire  Maternal haplotype  Paternal haplotype 

1  0  0  10  01 
2  0  0  11  00 
3  0  0  01  11 
4  1  2  10  00 
5  3  2  01  11 
6  3  2  01  01 
7  2  5  00  11 
8  2  5  00  01 
LDLA regression
Variance components mapping
In order to construct the mixed model equations we assume certain values for the class substitution effects β' = [0.9, 0.5, 0.5, 0.1] and for the withinclass variances = (0.09, 0.25, 0.25, 0.09) (in practice these values have to be estimated).
Expectation of gametic effects
Covariance matrix of gametic effects
Simulations
Scenarios
First, four simulations were carried out to check the behaviour of the different methods for fine mapping. We used the LDSO software for the simulations (F. Ytournel, pers. comm), a set of programs developed at INRA (T. Druet, F. Guillaume, pers. comm.) for phase determination and computation of PDQs, and userwritten programs for setting up and solving the linear models.
The first set of scenarios will be termed as "drift". Two subscenarios differing on the size of the region of interest (5 or 20 cM) were designed. A 5 (alternatively, 20) cM region with 21 SNP markers (i.e., 20 brackets), with a biallelic QTL at position 2.125 (alternatively, 8.5) cM (at the middle of the 9th bracket). The QTL was biallelic with an effect of 1 for the second allele. No foundational event was assumed (i.e., marker and QTL alleles were assigned at random in the ancestral population). SNP alleles were assigned at random in the founders. This population evolved during 100 generations with an effective size of 100. Therefore the only source of LD was drift. After these populational events, a daughter design was simulated, with 15 sires each with 20 daughters. Phenotypes were simulated according to the QTL effects and to a residual variance of 1; no polygenic effects were simulated. This is a scenario where IBD methods are likely to perform well. Although the design is fairly small for dairy cattle, it is not unlikely for swine or sheep, and our purpose was not to provide a large amount of information.
The second two scenarios ("admixture") are radically different and include strong admixture. Again, 5 and 20 cM region are considered, with same positions for the QTL. Initially, two breeds existed differing in their polygenic average by 1. A QTL is considered with equal frequency in each breed, with an effect of 1 for the second allele. SNP alleles were assigned at random in the founders. Both breeds were crossed and a mixed population of 50 individuals evolved during 20 generations. A daughter design as before was simulated. Phenotypes were simulated according to the QTL, the inherited polygenic part of each breed, and a residual variance of 1. This scenario might generate admixture by drift if one SNP locus is indicative of breed origin.
Methods
We compared the performances of five different methods: (1) LA: HaleyKnott linkage analysis [14], (2) LDLA: the regression LDLA method in this work (equation 13), (3) LD decay: LDLA regression by equation (14), that is, ignoring the v* terms, (4) twomarker: regression on twomarker haplotypes (i.e., association analysis), and (5) an IBD method [3, 34], which computes IBD among founders based on all markers (Lee, pers. comm.).
The simplest approach is to perform single marker association analysis, which has been shown to be as good as more complex methods in quite a variety of scenarios [35]. We nevertheless discarded this option because the simulation method places the QTL in the middle of a bracket. This automatically penalizes the singlemarker analysis. Further, by using always two markers we can compare all methods in the same grounds (except IBD).
For the twomarker and IBD method, phases were assumed to be known with certainty; this might have resulted in slightly optimistic results. Performance of the different methods was based on the average error (i.e., the bias, in cM) and the mean square error (MSE, in cM^{2}). All simulated populations had a minor allele frequency of 0.1 at least for the QTL. One hundred replicates were run.
Results
Performance of five finemapping methods in drift and a 5 cM segment.
Method  Bias  MSE  

LA  0.29  (0.15)  2.22  (0.22) 
LDLA  0.06  (0.08)  0.67  (0.09) 
LD decay  0.11  (0.08)  0.69  (0.10) 
Twomarker  0.13  (0.08)  0.66  (0.10) 
IBD  0.34  (0.08)  0.78  (0.15) 
Performance of five finemapping methods in drift and a 20 cM segment.
Method  Bias  MSE  

LA  0.51  (0.44)  19.61  (2.89) 
LDLA  0.18  (0.26)  7.06  (1.65) 
LD decay  0.12  (0.24)  5.68  (1.52) 
Twomarker  0.05  (0.24)  5.89  (1.41) 
IBD  1.20  (0.19)  5.14  (1.51) 
Performance of five finemapping methods in admixture and a 5 cM segment.
Method  Bias  MSE  

LA  0.42  (0.14)  2.06  (0.21) 
LDLA  0.31  (0.14)  2.15  (0.23) 
LD decay  0.31  (0.14)  2.02  (0.21) 
Twomarker  0.16  (0.13)  1.82  (0.18) 
IBD  0.23  (0.13)  1.69  (0.20) 
Performance of five finemapping methods in admixture and a 20 cM segment.
Method  Bias  MSE  

LA  0.58  (0.60)  36.60  (3.42) 
LDLA  0.97  (0.53)  28.43  (3.06) 
LD decay  0.04  (0.49)  23.76  (2.56) 
Twomarker  0.17  (0.51)  25.89  (2.50) 
IBD  1.94  (0.50)  28.78  (3.43) 
In the "drift" scenario, LA and IBD methods are biased for the 5 cM scenario, and the IBD method is biased for the 20 cM scenario. The ranking of the methods changes with the scenario, with LA being always the worst in accuracy, as expected. The reason of the inability of LA to map the QTL is simple: in small intervals, recombinations  which are needed for LA to map a QTLseldom occur. Thus, in the 5 cM scenario, the performance of LA is roughly equivalent to random mapping of the QTL. For the remaining methods, differences are indeed largely nonsignificant except for the bias.
Most of these results are due to the multimarker nature, in this order, of LA, IBD and LDLA, which might tend to point central regions since, in these, haplotypes are more informative and PDQ's are better calculated. This is alleviated in the LDLA method and LD decay method by the implicit twomarker association analysis.
As for the admixture scenario, Table 4 shows basically that QTL location cannot be accurately estimated. The reason is that the scenario is not informative enough due to the low number of historical recombinations and the noise added by admixture. Differences in performances (MSE) of the different methods are not statistically significant; however, LA, LDLA and LD decay do show some bias.
Discussion
Comparison to other models for LDLA
We have presented a method for joint association and linkage, which belongs to a more general class of joint linkage disequilibrium and linkage analysis. In fact, existing methods belong to one of two exclusive classes: those that model somehow the LD phenomena and those that do not.
Some models [5, 6] assumed a mutation followed by expansion of the mutant haplotype. The pertinency of this scenario in general agricultural populations (and indeed in complex diseases in humans) is arguable. The likelihood in Farnir et al. [5] was an approximation, based on the assumption of independence among contiguous markers; and the form of the likelihood was only appropriate for family designs. The more complex model in PérezEnciso [6] holds for any pedigree structure below the founders, but computations were difficult.
Models for association and linkage in human populations exist [1, 36]. These, although very similar to our approach, are difficult to apply to livestock since they are rigidly familystructured; in addition, the QTDT [1] uses unilocus information only for transmission events, whereas in our model it is possible  and recommended  to use multilocus information to compute the PDQ's, and it is possible (but perhaps not useful) to define haplotype classes spanning several loci. Conversely, the QTDT has no need of PDQ calculation or  possibly  map ordering of SNPs.
The most popular model for LDLA QTL detection in livestock has been reported by Meuwissen et al. [3] and has been fairly used [20, 37]. The method relies on the construction of a matrix of covariances among founders (the socalled IBD probabilities), say H, based on identity of state among markers; these IBD probabilities are derived following approximate coalescent models [8, 9, 38, 39]. IBD methods use the same parameter (the variance assigned to the QTL) for both covariance due to association and covariance due to linkage. Modelling linkage and association using different parameters (β and v) allows for a greater flexibility in our model. This can be explained as follows.
Relationship of the IBD method for LDLA to our approach
The covariance matrix for the entire vector of gametic effects can be computed, recursively, using equation (20) in our paper, starting with the covariance matrix in equation (29). If there is no LD, will be zero and (29) will reduce to the , which is the covariance matrix under LE. Also, (29) depends on two variance components that relate to the gametic variance due to LD and the remainder. In the IBD method [8, 9], Var(v_{ f }) is written as , where H is an IBD matrix with diagonals equal to 1 and offdiagonals given by . Thus, in the IBD method [8, 9] the partitioning of the gametic variance due to LD and the remainder is entirely dependent of the assumptions underlying the computation of Σ.
A practical problem using IBD methods (our experience is with the IBD methods [3]), is that often matrix H turns out to be negative definite; hence the likelihood of the phenotypes is undefined. The reason is that construction of H is not based on a joint distribution for all founder chromosomes, but it is computed for two haplotypes (or chromosomes) at a time, marginalizing over the rest. This leads to approximate marginal probabilities in H instead of a joint distribution. Thus, the estimated H matrix is at best an approximation. A way to deal with nonpositive definiteness is bending [40], or clustering (a data reduction technique) [20]. Both approaches might result in a loss of information, have unknown statistical properties and are subject to arbitrary tuning parameters.
At any rate, both modelling the LD phenomena and IBD based models rely in assumed population events. The robustness of these methods to, for example, admixtured breeds, is largely unknown and difficult to verify. Our model and those by Fernando et al. and Gilbert et al. [30, 41] do not model the process generating LD among QTL and markers, and therefore are more general. The only strong assumption that they made was that of a biallelic QTL, which is overcome in ours, at the price of a greater number of unknowns.
Originality
The originality of our approach is that (i) it is feasible and welltaylored for some agricultural populations, in particular livestock (because it relies on phase and transmission information easily ascertainable, and holds for any family structure) and corn (where indeed a similar idea  nested association  has been developed [42]), (ii) it is a linear model (with all the adequate machinery), while (iii) at the same time providing, based on expectations and covariances, a simple and coherent linearmodels framework for association and linkage and (iv) reduces to wellknown models on the hypothesis of LE or complete LD. Indeed, our models allows us to test the four relevant hypotheses (disequilibrium, linkage, both or none) and reduces to association or linkage under the respective hypothesis, which is not the case for other methods such as IBD models for example, which assume that LD exists.
Our method is computationally simple to use, provided that phases and PDQ's can be accurately calculated. If this is not the case, inference is possible, in principle, by integrating over all the joint distribution of phases and transmissions. After phase determination and computation of PDQs, all the machinery of the linear models can be applied. This makes it possible to include simultaneously other effects (environmental effects, polygenic effects) and the use of other tools such as permutation tests, bootstrapping and in particular the simultaneous fit of several QTLs [43]. The latter one is of particular interest for recent developments in genomewide genetic evaluation ("genomic selection") using LDLA. For example, the number of simultaneous effects fitted by Calus et al. [44] was ~ 600,000 for twomarker haplotypes in a genome composed of ~ 2300 markers. If a "LD decay model" is used (such as equation 25) the number of equations is linear in the number of loci, while retaining the use of LD and of some of the LA. Even with the full linkage and association model (equation 24), sparsity of the mixed model equations is guaranteed.
A practical problem with the method is how to define "classes" of haplotypes; for example, how many markers to include in the definition of the classes. Including more markers in the definition of the haplotype is straightforward, but probably at the price of greater complexity. The optimal number of markers seems scenario dependent [12, 35]. A practical rule of thumb is to define classes that are manageable  that is, not too many. For example, Druet et al. [45] considered haplotypes spanning either 3 or 10 markers, with a number of classes of 8 and 700, respectively. The latter were too many and had to be clustered. They observed that 3marker haplotypes provided narrower intervals than 10marker haplotypes, at the possible price of more falsepositive detections. With multiallelic markers the twoloci classes might be impractical. Two options might be (i) to consider the closest microsatellite, or (ii) to split the effect of a haplotype class in a sum of individual marker locus effects. In this option a descendant of haplotype, say, "13" with probability w would be in expectation w times the effect of allele 1 at the first locus, plus w times the effect of allele 3 at the second locus.
Performance of the method
Computations for any of the regression methods (LA, LDLA, LD decay and twomarker) were extremely fast. For the case of LDLA, computing one position took 0.02 seconds. For the IBD method, each position took about 40 seconds.
Results show no clear ranking of methods. Indeed, the fact that the IBD method is often biased deserves further attention for small chromosomal segments, albeit its good performance in the drift 20cM simulation shows the value of multimarker information in relatively sparse maps. The LD decay method is possibly the best across all scenarios, but the twomarker regression analysis is almost as good. Zhao et al. [35] have shown that the even simpler method of singlemarker regression performed slightly better than twomarker regression. Thus, future work should compare our methods (LDLA or LD decay) with singlemarker association.
The admixture simulation shows basically that the extra noise generated affected all methods for localization of QTLs; whether this holds for detection remains to be seen. Thus, more extensive simulations need to be undertaken to compare accuracy, power, and robustness to spurious associations of the different methods.
It seems, nevertheless, that our linear model (LDLA or LD decay) is at least as good in performance as the IBD method, while keeping simplicity. In fact, for small chromosomal segments, association between QTL and markers is very informative [12, 39]. As an aside, simulations should not place the QTL at the center of the segment as this hides bias of the methods and artificially decreases MSE.
It is expected that, for narrower and narrower marker intervals, most information will be captured by the LD term and less by the LA terms. At the limit, if the QTL is the marker, the variance for the gametic effect v (v*) will be null and all information will be contained in β. On the other hand, for very distant markers, variance of v will be high and β will tend to zero. Still, linkage will still be used in modelling the pedigree transmission of fully associated marker effects.
Conclusion
We have presented simple linear models for QTL detection and localization including populational linkage disequilibrium and withinfamily cosegregation. The methods uses all available information (i.e., multiple markers and pedigrees). The performance of these methods is satisfactory, as shown by simulations. These methods are computationally much simpler than other proposals. Extensions to multiple QTL mapping and genomic selection are straightforward. These methods should help researchers in QTL mapping and marker assisted selection, in particular in livestock species, where the required information is available, just like regression is more used than fulllikelihood methods [14], when possible.
Declarations
Acknowledgements
This work benefit from a visit of the first author to Iowa State University, financed by the International Relations Department, INRA, and by Newsham Choice Genetics. Support from EU research project SABRE is gratefully acknowledged (Legarra). Support from the National Research Initiative Competitive Grants Program of the U.S. Department of Agriculture, Award 20073520517862 is gratefully acknowledged (Fernando). This work has much benefited from discussions with JeanMichel Elsen, Hélène Gilbert, Brigitte Mangin and Magali San Cristobal. We are also grateful to JeanMichel Elsen for carefully reading the manuscript. Reviewer's suggestions and corrections are gratefully acknowledged.
Authors’ Affiliations
References
 Abecasis GR, Cardon LR, Cookson WOC: A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000, 66: 279292. 10.1086/302698.PubMed CentralView ArticlePubMedGoogle Scholar
 Yu J, Pressoir G, Briggs WH, Bi IV, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, Kresovich S, Buckler ES: A unified mixedmodel method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006, 38: 203208. 10.1038/ng1702.View ArticlePubMedGoogle Scholar
 Meuwissen THE, Karlsen A, Lien S, Olsaker I, Goddard ME: Fine mapping of a quantitative trait locus for twinning rate using combined linkage and linkage disequilibrium mapping. Genetics. 2002, 161: 373379.PubMed CentralPubMedGoogle Scholar
 Dupuis J, Siegmund DO, Yakir B: A unified framework for linkage and association analysis of quantitative traits. Proc Natl Acad Sci USA. 2007, 104: 2021020215. 10.1073/pnas.0707138105.PubMed CentralView ArticlePubMedGoogle Scholar
 Farnir F, Grisart B, Coppieters W, Riquet J, Berzi P, Cambisano N, Karim L, Mni M, Moisio S, Simon P, Wagenaar D, Vilkki J, Georges M: Simultaneous mining of linkage and linkage disequilibrium to fine map quantitative trait loci in outbred halfsib pedigrees: revisiting the location of a quantitative trait locus with major effect on milk production on bovine chromosome 14. Genetics. 2002, 161: 275287.PubMed CentralPubMedGoogle Scholar
 PérezEnciso M: Fine mapping of complex trait genes combining pedigree and linkage disequilibrium information: a Bayesian unified framework. Genetics. 2003, 163: 14971510.PubMed CentralPubMedGoogle Scholar
 Rannala B, Slatkin M: Methods for multipoint disease mapping using linkage disequilibrium. Genet Epidemiol. 2000, 19 (Suppl 1): S71S77. 10.1002/10982272(2000)19:1+<::AIDGEPI11>3.0.CO;2D.View ArticlePubMedGoogle Scholar
 Meuwissen THE, Goddard ME: Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics. 2000, 155: 421430.PubMed CentralPubMedGoogle Scholar
 Meuwissen THE, Goddard ME: Prediction of identity by descent probabilities from markerhaplotypes. Genet Sel Evol. 2001, 33: 605634. 10.1186/12979686336605.PubMed CentralView ArticlePubMedGoogle Scholar
 Boitard S, Abdallah J, de Rochambeau H, CiercoAyrolles C, Mangin B: Linkage disequilibrium interval mapping of quantitative trait loci. BMC Genomics. 2006, 7: 5410.1186/14712164754.PubMed CentralView ArticlePubMedGoogle Scholar
 Abdallah JM, Mangin B, Goffinet B, CiercoAyrolles C, PerezEnciso M: A comparison between methods for linkage disequilibrium fine mapping of quantitative trait loci. Genet Res. 2004, 83: 4147. 10.1017/S0016672303006554.View ArticlePubMedGoogle Scholar
 Grapes L, Dekkers JCM, Rothschild MF, Fernando RL: Comparing linkage disequilibriumbased methods for fine mapping quantitative trait loci. Genetics. 2004, 166: 15611570. 10.1534/genetics.166.3.1561.PubMed CentralView ArticlePubMedGoogle Scholar
 Kruglyak L: The road to genomewide association studies. Nat Rev Genet. 2008, 9: 314318. 10.1038/nrg2316.View ArticlePubMedGoogle Scholar
 Knott SA, Elsen JM, Haley CS: Methods for multiplemarker mapping of quantitative trait loci in halfsib populations. Theor Appl Genet. 1996, 93: 7180. 10.1007/BF00225729.View ArticlePubMedGoogle Scholar
 Fernando RL, Grossman M: Marker assisted prediction using best linear unbiased prediction. Genet Sel Evol. 1989, 21: 467477. 10.1186/12979686214467.PubMed CentralView ArticleGoogle Scholar
 Baret PV, Knott SA, Visscher PM: On the use of linear regression and maximum likelihood for QTL mapping in halfsib designs. Genet Res. 1998, 72: 149158. 10.1017/S0016672398003450.View ArticlePubMedGoogle Scholar
 Mangin B, Goffinet B, Le Roy P, Boichard D, Elsen JM: Alternative models for QTL detection in livestock. II. Likelihood approximations and sire marker genotype estimations. Genet Sel Evol. 1999, 31: 225237. 10.1186/12979686313225.PubMed CentralView ArticleGoogle Scholar
 Heath S: Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am J Hum Genet. 1997, 61: 748760. 10.1086/515506.PubMed CentralView ArticlePubMedGoogle Scholar
 Elsen JM, Mangin B, Goffinet B, Boichard D, Le Roy P: Alternative models for QTL detection in livestock. I. General introduction. Genet Sel Evol. 1999, 31: 213224. 10.1186/12979686313213.PubMed CentralView ArticleGoogle Scholar
 Blott S, Kim JJ, Moisio S, SchmidtKüntzel A, Cornet A, Berzi P, Cambisano N, Ford C, Grisart B, Johnson D, Karim L, Simon P, Snell R, Spelman R, Wong J, Vilkki J, Georges M, Farnir F, Coppieters W: Molecular dissection of a quantitative trait locus: a phenylalaninetotyrosine substitution in the transmembrane domain of the bovine growth hormone receptor is associated with a major effect on milk yield and composition. Genetics. 2003, 163: 253266.PubMed CentralPubMedGoogle Scholar
 Li J, Zhou Y, Elston RC: Haplotypebased quantitative trait mapping using a clustering algorithm. BMC Bioinformatics. 2006, 7: 25810.1186/147121057258.PubMed CentralView ArticlePubMedGoogle Scholar
 Thompson EA, Heath S: Estimation of conditional multilocus gene identity among relatives. Statistics in Molecular Biology and Genetics: Selected Proceedings of a 1997 Joint AMSIMSSIAM Summer Conference on Statistics in Molecular Biology, of IMS Lecture NoteMonograph Series. Edited by: SeillierMoiseiwitsch F. 1999, Institute of Mathematical Statistics (Hayward, CA), 33: 95113.View ArticleGoogle Scholar
 Qian D, Beckmann L: Minimumrecombinant haplotyping in pedigrees. Am J Hum Genet. 2002, 70: 14341445. 10.1086/340610.PubMed CentralView ArticlePubMedGoogle Scholar
 Windig JJ, Meuwissen THE: Rapid haplotype reconstruction in pedigrees with dense marker maps. J Anim Breed Genet. 2004, 121: 2639. 10.1046/j.14390388.2003.00439.x.View ArticleGoogle Scholar
 Abraham KJ, Totir LR, Fernando RL: Improved techniques for sampling complex pedigrees with the Gibbs sampler. Genet Sel Evol. 2007, 39: 2738. 10.1186/1297968639127.PubMed CentralView ArticlePubMedGoogle Scholar
 Wang T, Fernando RL, Vanderbeek S, Grossman M, Vanarendonk JAM: Covariance between relatives for a marked quantitative trait locus. Genet Sel Evol. 1995, 27: 251274. 10.1186/12979686273251.PubMed CentralView ArticleGoogle Scholar
 Grignola FE, Hoeschele I, Tier B: Mapping quantitative trait loci in outcross populations via residual maximum likelihood .1. Methodology. Genet Sel Evol. 1996, 28: 479490. 10.1186/12979686286479.PubMed CentralView ArticleGoogle Scholar
 Almasy L, Blangero J: Multipoint quantitativetrait linkage analysis in general pedigrees. Am J Hum Genet. 1998, 62: 11981211. 10.1086/301844.PubMed CentralView ArticlePubMedGoogle Scholar
 Fernando R, Totir L: Incorporating molecular information in breeding programmes: methodology. Poultry genetics, breeding and biotechnology. Edited by: Muir W, Aggrey S. 2003, Wallingford: CAB International, 537548.View ArticleGoogle Scholar
 Fernando RL: Incorporating molecular markers into genetic evaluation. Book of abstracts of the 55th Annual Meeting of the European Association for Animal Production, Bled, Slovenia. 2004, 10: 86Google Scholar
 Quaas RL: Additive genetic model with groups and relationships. J Dairy Sci. 1988, 71: 13381345.View ArticleGoogle Scholar
 Goddard ME: A mixed model for analyses of data on multiple geneticmarkers. Theor Appl Genet. 1992, 83: 878886. 10.1007/BF00226711.PubMedGoogle Scholar
 PérezEnciso M, Varona L: Quantitative trait loci mapping in F2. crosses between outbred lines. Genetics. 2000, 155: 391405.PubMed CentralPubMedGoogle Scholar
 Lee SH, Werf van der JHJ: An efficient variance component approach implementing an average information REML suitable for combined LD and linkage mapping with a general complex pedigree. Genet Sel Evol. 2006, 38: 2543. 10.1186/1297968638125.PubMed CentralView ArticlePubMedGoogle Scholar
 Zhao HH, Fernando RL, Dekkers JCM: Power and precision of alternate methods for linkage disequilibrium mapping of quantitative trait loci. Genetics. 2007, 175: 19751986. 10.1534/genetics.106.066480.PubMed CentralView ArticlePubMedGoogle Scholar
 Almasy L, Williams JT, Dyer TD, Blangero J: Quantitative trait locus detection using combined linkage/disequilibrium analysis. Genet Epidemiol. 1999, 17 (Suppl 1): S31S36.View ArticlePubMedGoogle Scholar
 Gautier M, Barcelona RR, Fritz S, Grohs C, Druet T, Boichard D, Eggen A, Meuwissen THE: Fine mapping and physical characterization of two linked quantitative trait loci affecting milk fat yield in dairy cattle on BTA26. Genetics. 2006, 172: 425436. 10.1534/genetics.105.046169.PubMed CentralView ArticlePubMedGoogle Scholar
 Hill WG, HernándezSánchez J: Prediction of multilocus identitybydescent. Genetics. 2007, 176: 23072315. 10.1534/genetics.107.074344.PubMed CentralView ArticlePubMedGoogle Scholar
 Meuwissen THE, Goddard ME: Multipoint identitybydescent prediction using dense markers to map quantitative trait loci and estimate effective population size. Genetics. 2007, 176: 25512560. 10.1534/genetics.107.070953.PubMed CentralView ArticlePubMedGoogle Scholar
 Hayes JF, Hill WG: Modification of estimates of parameters in the construction of genetic selection indices ('bending'). Biometrics. 1981, 37: 483493. 10.2307/2530561.View ArticleGoogle Scholar
 Gilbert H, Firat MZ, Totir LR, Dekkers JCM, Fernando RL: A new method to fine mappinf quantitative trait locus using linkage disequilibrium. Book of abstracts of the 55th Annual Meeting of the European Association for Animal Production, Bled, Slovenia. 2004, 10: 87Google Scholar
 Yu J, Holland JB, McMullen MD, Buckler ES: Genetic design and statistical power of nested association mapping in maize. Genetics. 2008, 178: 539551. 10.1534/genetics.107.074245.PubMed CentralView ArticlePubMedGoogle Scholar
 Verbyla AP, Cullis BR, Thompson R: The analysis of QTL by simultaneous use of the full linkage map. Theor Appl Genet. 2007, 116: 95111. 10.1007/s001220070650x.View ArticlePubMedGoogle Scholar
 Calus MPL, Meuwissen THE, de Roos APW, Veerkamp RF: Accuracy of genomic selection using different methods to define haplotypes. Genetics. 2008, 178: 553561. 10.1534/genetics.107.080838.PubMed CentralView ArticlePubMedGoogle Scholar
 Druet T, Fritz S, Boussaha M, BenJemaa S, Guillaume F, Derbala D, Zelenika D, Lechner D, Charon C, Boichard D, Gut IG, Eggen A, Gautier M: Fine mapping of quantitative trait loci affecting female fertility in dairy cattle on BTA03 using a dense singlenucleotide polymorphism map. Genetics. 2008, 178: 22272235. 10.1534/genetics.107.085035.PubMed CentralView ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Comments
View archived comments (1)