 Research
 Open Access
 Published:
Linear models for joint association and linkage QTL mapping
Genetics Selection Evolution volume 41, Article number: 43 (2009)
Abstract
Background
Populational linkage disequilibrium and withinfamily linkage are commonly used for QTL mapping and marker assisted selection. The combination of both results in more robust and accurate locations of the QTL, but models proposed so far have been either single marker, complex in practice or well fit to a particular family structure.
Results
We herein present linear model theory to come up with additive effects of the QTL alleles in any member of a general pedigree, conditional to observed markers and pedigree, accounting for possible linkage disequilibrium among QTLs and markers. The model is based on association analysis in the founders; further, the additive effect of the QTLs transmitted to the descendants is a weighted (by the probabilities of transmission) average of the substitution effects of founders' haplotypes. The model allows for noncomplete linkage disequilibrium QTLmarkers in the founders. Two submodels are presented: a simple and easy to implement HaleyKnott type regression for halfsib families, and a general mixed (variance component) model for general pedigrees. The model can use information from all markers. The performance of the regression method is compared by simulation with a more complex IBD method by Meuwissen and Goddard. Numerical examples are provided.
Conclusion
The linear model theory provides a useful framework for QTL mapping with dense marker maps. Results show similar accuracies but a bias of the IBD method towards the center of the region. Computations for the linear regression model are extremely simple, in contrast with IBD methods. Extensions of the model to genomic selection and multiQTL mapping are straightforward.
Background
Linkage analysis (LA) is a popular tool for QTL detection and localization. Its accuracy is limited by the number of meioses observed in the studied pedigree, which can represent several centiMorgan. Linkage disequilibrium (LD, also called gametic phase disequilibrium) is the nonrandom association among different loci, and is increasingly used in human and agricultural association studies for gene mapping. The joint use of LD and LA (also called LDLA) permits to map QTL more accurately than LA while retaining its robustness to spurious associations, and this technique has been applied in human [1], plant [2], and livestock [3] populations. This is achieved by explicitely modelling relatedness not accounted for in association analysis [2]. LDLA is also robust to nonadditive modes of inheritance [4]. In addition, the joint use of LD and LA makes it possible to test linkage alone or linkage disequilibrium separately [1]. A characteristic of plants and livestock is that often, close pedigree relationships exist and are recorded among the individuals genotyped for QTL detection (e.g., bulls or plant varieties), and including these relationships in the analyses can be worthwhile.
In livestock, several approaches have been proposed to take into account LD information within LA [3, 5, 6]. These methods model the process generating LD among the putative QTL and the surrounding markers; this process can quickly become unmanageable in the general case [7], and even difficult to approximate [8–10]. Extensions of LD models to include LA (that is, the cosegregation of markers and QTL due to physical linkage) are cumbersome for the general case [6] or restricted to certain pedigree structures like halfsibs families (C. Cierco, pers. comm.). The parameters of LD generating processes can be either estimated from the data, which is often difficult, or fixed a priori which is unsatisfactory. The existence or not of these events in the past history of a population is unknown. Therefore the validity of any assumptions is largely unknown.
An alternative is QTL mapping by simple association (regression in the case of quantitative traits) of phenotypes on marker alleles, which has been s.hown to be an effective method [11, 12], while retaining simplicity; this is widely used in human genetics [13]. On the other hand, QTL mapping in livestock by LA relies heavily on the use of half and fullsibs families and relatively simple ascertainment of phases and transmission probabilities (e.g. [14]). For this reason, HaleyKnott type regressions for simple designs [14] and variance component methods for more complex designs [15] are well adapted, computationally simpler and almost as good [16, 17] as full integrated likelihoods [18, 19]. Linear models are appealing for their ease of use and understanding and good performance.
In this work, we combine association analysis with probabilities of transmission using conditional expectations. Ultimately, we come up with linear models for joint association and linkage mapping, which are generalizations of LA mapping. Two particular cases will be detailed: a halfsib regression which applies in many livestock practical settings, and a general mixed model approach valid for any type of pedigree.
Methods
This section is organised as follows. In the subsection "Splitting QTL effects", we show how to come up with expectations for gametic QTL effects integrating association and linkage. The following two subsections "LDLA HaleyKnott type regression" and "Variance components mapping" explicitly present two linear models (HaleyKnott type regression for halfsib families and a general mixed model for a general pedigree) and the statistical tests that lead to QTL detection, location, and ascertainment of the hypothesis linkage, association, both or lack of both. Numerical examples and performance of the methods are illustrated by simulations in subsection "Illustrations", under two different scenarios.
Splitting QTL effects
In this section we will show how QTL effects can be split in a part conditional on LD in the founders and cosegregation, and another part which is unconditional on LD in the founders. This results in a flexible linear model setting. Throughout the paper, we will assume a polymorphic QTL with an unknown number of alleles nq: {q_{1} ⋯ q_{ nq }}, with effects α= (α_{1} ⋯ α_{ nq }); dominance is not considered. Let v denote the additive effects of all gametes carriers of QTLs in a population; this will be referred to as "gametic effects" (e.g. [15]).
In the following we consider haplotypes, which are phased markers, i.e., a set of 1, 2, or several ordered markers on the same chromosome. Haplotypes can be classified in classes. Classes can be formed by simple classification or by more sophisticated techniques such as cluster analysis [20, 21]. For the sake of discussion we will assume that haplotypes are composed of two markers with a putative QTL located at the middle, but our approach is general and conditional only on the existence of haplotype classes.
In all the following, we generally consider a single position in the genome. This position is situated on a specific chromosome number of the physical map or karyotype; for example, BTA14. In a diploid species, each individual has two copies of each chromosome: one from the paternal side and one from the maternal side. Identification of the origin of each chromosome copy is not always possible. In the following, when referring to any given chromosome pair containing a specific locus of the genome and to distinguish the two chromosome copies, we shall note them 1 and 2.
The haplotype (jth chromosome in ith individual, j = {1, 2}) can be assigned to a haplotype class k through a function δ( ) acting on a haplotype h. In its simplest form, δ( ) is a lookup table. So, for the case of two flanking SNPs, classes are 1 to 4, composed of haplotypes 00, 01, 10 and 11. The number of haplotype classes at the candidate position is nh.
We assume that linkage disequilibrium exists between haplotype classes and QTL alleles. Conditional on each haplotype class, population frequencies for a QTL state are denoted by matrix π= {π_{1,1}⋯ π_{nq, nh}}. That is, the probability of QTL state l conditional to haplotype class k is Pr(Q ≡ q_{ l }k) = π_{l, k}. Assuming linkage equilibrium, π_{l, 1}= ⋯ = π_{l, nh}= π_{ l }, the marginal population frequency of the lth allele of the QTL. In this situation, haplotype classes are not informative on QTL states. However, given disequibrium between the markers loci and the QTL locus, π_{ l }, ... will vary among the different haplotype classes.
Founders
The haplotype of a founder individual i on chromosome j is and belongs to a class k (δ() = k). The distribution of additive gametic effect conditional on k is determined by π:
and the expectation of conditional on the haplotype is:
Neither the α effects nor the π proportions are known in practice. Thus, we propose to substitute the summation ∑α_{ l }π_{l, k}by a term β_{ k }; that is, to substitute the weighted effects of QTL alleles for each haplotype class by the overall withinclass mean. This amounts to considering β_{ k }as the "substitution effect", at the population level, of the haplotype. This is precisely what is done in association analysis of quantitative traits. The set of different haplotype substitution effects is β= {β_{1},⋯β_{ nh }}. In this new formulation:
Now, can be modelled as the sum of a conditional expectation plus a deviation: , where this deviation (assuming the true state of the QTL is q_{ l }) is as above. The deviation has a discrete distribution with possible states {(α_{1}  β_{ k }),⋯(α_{ nq } β_{ k })} with probabilities {π_{1, k},⋯ π_{nq, k}}, which are generally unknown.
Nonfounders
For a nonfounder individual i, let be the probability that the QTL allele at chromosome j of individual i is inherited from the QTL allele at chromosome x of its father; and let probability that allele at chromosome j is inherited from the chromosome y of its mother. In the absence of marker information, these are 0.5. Assume that these probabilities have been computed, conditional on all marker information (m), using one of several methods [14, 22–25]. We will refer to these probabilities as PDQ's (probability of descent for a QTL allele) [26]; they can be put together in a row vector w_{i, j}(while each PDQ is a conditional probability, we do not explicitly include m in the notation for simplicity in the following expressions).
where the subscripts 1 and 2 refer to the two QTL alleles of the sire and the dam. In the expression above, four probabilities are needed because maternal and paternal origin can not always be stablished with certainty [26] and, for the same reason, labels 1 and 2 are used instead of "paternal" and "maternal" for each QTL allele in each individual. Elements in w_{i, j}sum to 1.
The conditional distribution of , the gametic effect, is a discrete set of QTL effects α, with probabilities dependent on, first, the QTL state of its parents; and second, on the probabilities of transmission of these parental QTLs towards i. That is:
In particular, if the parents of i are among the founders, then it follows that:
It follows that the expectation of conditional on marker information and the rest of parameters is then simply:
which, if the parents are founders, is:
because of the properties of expectations (i.e., we can factor out w_{i, j})^{.} That is, the expected value of a gametic effect is equal to the substitution effects of the parents' haplotypes, weighted by the corresponding transmission probabilities. This is a particular case of a general, recursive formula that also works if the parents of the individual are nonfounders themselves:
The , the deviation of with respect to its expectation has states with associated probabilities which are conditional on marker information as well.
The two building blocks in the previous section (modelling of expectations of gametic effects in founders by LD, and of non founders by conditioning on founders and LA) allow us to construct several linear models considering LD, LA, or both. In the next two sections, we will detail two linear models including LD and LA for cases commonly used in livestock genetics: a regression approach applied to idealized pedigree structures (halfsib families), and a more flexible variance component approach which can be used for general pedigree structures.
LDLA HaleyKnott type regression
Consider n sires with m marker information. Assume further that QTL states at the sires are independent, conditional on their haplotypes and the corresponding conditional probabilities π(i.e. we assume no other relationship among sires beyond haplotype similarities, which is usual in this type of regression [14]). Suppose each of the n sires is mated to several dams with one daughter per dam  a halfsib design. As before, let be the probability that the QTL allele at chromosome j of individual i is inherited from chromosome x of the sire; let be the probability that the QTL allele at chromosome j is inherited from chromosome y of the dam; these PDQ's, computed based on m, can be put together in a matrix W_{ i }.
The expectation of the phenotype y_{ i }of a given offspring i from sire s and dam d, conditional on its parents' gametic effects is:
Gametic effects can be split, as shown above. A part is conditional on linkage disequilibrium in the founders (E(v)), which in turn can be conditioned on haplotype substitution effects β. Another part is not conditional on linkage disequilibrium at the founders (v*). Then:
Note that, in the preceding expression, we assume that haplotypes in the sire and dam are known with certainty. Assuming paternal (p) and maternal (m) origins can be established with certainty, it is possible to further simplify the expression by condensing dams' information. First, it is possible to condition only on the deviations v* in the sire, because in this design v*'s for the dams are generally difficult to estimate and nonestimable in leastsquares regression. Second, we can assume that the proportions π in the founders are still accurate one generation later  that is, the decay of LD is slow, which holds for short distances (≈ 1% per generation in intervals of 1 cM). If this holds, it is possible to change the weighted substitution effect of the two haplotypes in the dam, and , to the substitution effect of the haplotype found in the maternally inherited chromosome of descendant i(). This strategy was followed by Farnir et al. [5]. Then:
where w_{s, i}is a row vector with the two PDQ's from chromosomes 1 and 2 in the sire towards the paternal chromosome in i. Extension to n sires is immediate:
where W_{ p }are the PDQ's from sires to paternal chromosome in the offspring; is the set of "residual" gametic effects in the sires; and Q_{ s }and Q_{ m }are incidence matrices relating, haplotypes in the sires, and maternal haplotypes in the offspring, to appropriate elements in β. Last, Z_{ p }and Z_{ m }are appropriate incidence matrices relating paternal and maternal gametes in the progeny to records. This conditional expectation immediately translates into a statistical model:
where e is a vector of residuals. This model can be fitted by, for example, leastsquares. Tests for QTL detection and location using interval mapping can be done by likelihood ratio or Ftests, assuming homoscedasticity of variances. Variances are indeed not homogeneous, for example, if a QTL is fixed within a haplotype class but not in another. The non consideration of dam effects also inflates the residual variance. Note, in addition, that the model is generally not fullrank: effects are non estimable withinsire (but their contrasts are). The β coefficients will be estimable if they are not confounded with any gametic effect; that is, if no haplotype class is present in one sire only. However, this does not create any problem for QTL localization and detection.
An interesting property of the model is that it is a generalization of HaleyKnott regression [14, 19], which occurs if we assume linkage equilibrium among founder haplotypes. Note that spurious signals due to, for example, stratification, are unlikely in this model because there is a verification, through linkage (i.e. the PDQ's) that associated haplotypes are transmitted to the next generation and still have an effect. This breaks down spurious associations that would be observed at the founders' level.
A simplified model, which does not include the v* effects is:
This expression models appropriately the cosegregation of markers and those QTL in LD with them. We call this model "LD decay" because it models appropriately the decay of initial LD existant in the founders by tracing the effect of the different segments through the pedigree with the aid of flanking markers, i.e., by linkage. However, it would not detect a QTL in the case of LE.
Statistical testing
Many tests are possible using the statistical model in equation (13). Usually (for example in interval mapping), several possible QTL locations are tested simultaneously or sequentially. For a particular putative QTL location, the null hypothesis is the nonsegregation of alleles of the QTL having different effects. This implies that all haplotype substitution effects, as well as the v* deviations, have the same value. This amounts to a common overall mean for the data, with β= 0, = 0. There are three alternative hypothesis depending on the existence of complete linkage disequilibrium, only linkage, or both.
The four hypothesis are:

1.
H _{0} (null hypothesis): No cosegregation markersQTL effects (i.e. no linkage) and no linkage disequilibrium among haplotypesQTL: β= 0, = 0.

2.
H _{1}: Complete linkage disequilibrium at the founders: β≠ 0, = 0.

3.
H _{2}: Linkage equilibrium at the founders but cosegregation markersQTL effects: β= 0, ≠ 0.

4.
H _{3}: Incomplete linkage disequilibrium at the founders and residual cosegregation markersQTL effects: β≠ 0, ≠ 0
In addition, it is possible to test H_{3} against H_{1} and H_{2}.
Variance components mapping
Extension to a variance components or mixed model mapping framework [15, 27, 28] is possible [29, 30]. As before, let v be the gametic effects for all the QTL gametes in the population. We will show how the first and second moments of the joint distribution of v can be constructed, conditional on marker information and within haplotypic classes means and variances.
Following previous notation, the following recursive equation for gametic effects holds:
Each gametic effect is modelled as (i) a weighted average of the gametic effects of its ancestors (for nonfounder individuals) or of haplotypic effects (for founder individuals), plus (ii) independent random variables due to mendelian sampling [15], ϕ. The expression (15) potentially includes nonfounder gametic effects in the progeny of nonfounder animals, allowing for generality and multigenerational pedigrees.
Note that is partitioned into founders and nonfounders, and all subsequent partitioned matrices. In particular, W can be partitioned accordingly, so that rows tracing the origin of founder gametes from other gametes in the population are formed by 0's. Note that the setting is very similar to a genetic groups model [31]. Rules for computing the first and second moments of the distribution of the gametic effects v follow [29].
Conditional distribution of the gametic effects
Conditional mean for the gametic value
The development is as in previous sections. Let be the probability that gamete came from haplotypic class k. In general, for the jth allele of the ith individual,
For founder alleles, conditionally on the haplotype , this is simply the mean of the corresponding haplotypic class, that is , as is 1 for k = δ() and 0 for anything else.
For nonfounders, a recursive equation holds:
and therefore:
where w_{ i }is a matrix of PDQ's as before, and s and d indicate the gametes in the father and mother. From expression (15) [31]. Thus, another representation in matrix algebra is:
where (I  W)^{1} represents summation over all possible paths of transmission from ancestors to descendants, and represents the expected franction of founder gametes in the descendant gametes [31]. Matrix Q_{ f }is an incidence matrix relating founder gametes to founder haplotypic classes. Matrix Q can be recursively computed using equation (16). These expressions are similar to the QTL crossbred model [32, 33], save for groups for founders, which are based on haplotype classes instead of breeds.
Conditional variance of the gametic value
Any gamete can in principle be traced to one or several founder populations (i.e., haplotypic classes). Had the gamete come from the haplotype class k, its conditional variance of the gametic effect would be just , where , the average gametic effect in class k. As the number of QTL alleles and their distribution are unknown, the different are parameters to be estimated in the model. However, the gamete can come from several origins, each with probability ; therefore, the distribution of the gametic effect is a mixture. Conditioning on all possible origins k = (1, ... nh),
which can be expanded [29] to:
where the computations of and have been previously shown. Note that this expression reduces to the classical one [15] under linkage equilibrium.
Conditional covariances
As modelled here, the conditional covariance of two gametic effects depends on the event that they are identical by descent in the observed pedigree. Let and be two gametes, with indexes arranged so that i can be a descendant of j but not the opposite. The QTL allele at the gamete is one of the four gametes of its parents, s and d. The conditional covariance between the gametic values and is then:
where the covariances in the right hand side are also conditional on m and β. This formula is the same as for the case of linkage equilibrium in the founders [15, 26]. However, the variances differ due to the different haplotype origins, and the covariances will not be the same as those under linkage equilibrium.
Statistical model
A linear model including gametic effects is:
where X and Z are incidence matrices and b is a vector of fixed effects. Residuals e are normally distributed e ~ MVN(0, R), where MVN stands for multivariate normal, and R = I.
Further, assume normality for v (this is an approximation). Then, , where Q and G (the covariance matrix of gametic effects) are computed as above in equations (19, 20). Under this assumption of normality, the distribution of y is:
where V = ZGZ '+ R, and the likelihood is:
Using this likelihood, Bayesian techniques or maximum likelihood techniques can be used to infer parameters of the model and location of the QTL. In particular, mixed model equations are:
Note that G^{1} can be easily constructed using partitioned matrix rules [26]. These equations might not be convenient because β is found on the right hand side. An alternative formulation uses
that is, using v* = v  Q β, which has zero expectation. The mixed model equations are then [31]:
Note that enter nontrivially into G. For the maximum likelihood techniques, derivativefree techniques might be used with equation (22). For the Bayesian approach, albeit the "data augmentation" of gametic effects in (23) or (24) partly simplifies computations, the full posterior conditionals of θ do not have closed forms; MetropolisHastings might be used. Other possible simplifications are:

Supress v* from the model in (24), i.e. y = Xb + ZQ β+ e. This implicitely assumes: (i) QTL alleles are fixed within haplotype class; and (ii) transmissions are known with certainty (i.e. PDQ's are either 0 or 1). Under these two conditions, Var(v*) = 0. This might happen for very dense marker maps where markers are fully informative on QTL state and transmissions. The result is a leastsquares estimator as follows:
(25) 
Assume constant variances across classes and, further, that PDQ's are known with certainty. If this is the case, Var(v*) = and standard algorithms and software (e.g., REML) can be used.

If variances are not constant within class but each gametic effect can be asigned exactly to a class k (i.e. PDQ's are either 0 or 1), then its variance is . This is a mixed model with heterogeneity of variances. This assumption is similar to that by PérezEnciso and Varona [33].
Again, the null hypothesis is the nonsegregation of QTL effects, that is, all haplotype substitution effects, as well as the v* deviations, have a null value; save that v* are now random effects. The four hypotheses are:

1.
H _{0} (null hypothesis): No segregation of QTL effects (i.e. no linkage) and no linkage disequilibrium haplotypeQTL: .

2.
H _{1}: Complete linkage disequilibrium at the founders: .

3.
H _{2}: Linkage equilibrium: .

4.
H _{3}: Incomplete linkage disequilibrium at the founders: .
Illustrations
Numerical examples
We will show how the terms in both linear models are set up. Consider the pedigree and markers in Table 1. We assumed a distance of 30 cM between markers and a QTL placed at the middle. Note that, assuming few recombinations, transmissions in the pedigree are simple to follow. From this information, it can be inferred that a recombination has occurred to form the sire gamete in 6.
LDLA regression
Consider sires 2 and 5 (assuming they are unrelated) and phenotypes of offspring (4 to 6 for sire 2 and 7 and 8 for sire 5). We need to set up the incidence matrix relating β to sires' haplotypes (Q_{ s }) and maternalinherited haplotypes (Q_{ m }). Let levels 1 to 4 in β represent haplotypes 00, 01, 10, 11. Then:
Assuming chromosome origins were established with certainty, probabilities of transmission are 0.98 for the nonrecombinant and 0.02 for the recombinants (actually, double recombinants) if markers were transmitted together, or 0.5 if they were not. The matrix of PDQ's W_{ p }is thus:
There are four (twice the number of sires) gametic sire effects . Last, Z_{ p }and Z_{ m }are 5 × 5 identity matrices for records of individuals 4 to 8. Note that animal 5 is in the analysis both as sire and as offspring. The final equations (13) are thus:
Variance components mapping
In order to construct the mixed model equations we assume certain values for the class substitution effects β' = [0.9, 0.5, 0.5, 0.1] and for the withinclass variances = (0.09, 0.25, 0.25, 0.09) (in practice these values have to be estimated).
Expectation of gametic effects
Setting up the matrix Q for the founders implies just setting the element corresponding to the jth haplotype of the ith founder and the δ () class to 1, and all other to zero. Gametic effects are ordered within each animal. Then the first six rows of Q are:
where the first two rows correspond to animal 1, the next two to animal 2, and so on. Let's take nonfounder animal 4. Its rows in Q are the product of the corresponding PDQ's times the rows in Q corresponding to their parents 2 (sire) and 1 (dam). That is:
The process is repeated for every individual. Individual 7 is descendant of two nonfounders (sire is 5 and dam is 4), but the same logic applies.
Matrix Q is then:
Covariance matrix of gametic effects
To compute the variance we apply (19). For founders, variances are for the first gamete in 1, for the second, for the first gamete in 2, and so on. For nonfounders, let consider for example gamete 2 in individual 4 and gamete 2 in individual 6. Note that the terms are contained in matrix Q above. If we apply the formula and ignore null terms (those = 0):
and
We can see that the higher uncertainty in the origin of results in a higher variance. As for the covariances, these were computed using the algorithm of Wang et al. [26]. The final covariance matrix G is:
Simulations
Scenarios
First, four simulations were carried out to check the behaviour of the different methods for fine mapping. We used the LDSO software for the simulations (F. Ytournel, pers. comm), a set of programs developed at INRA (T. Druet, F. Guillaume, pers. comm.) for phase determination and computation of PDQs, and userwritten programs for setting up and solving the linear models.
The first set of scenarios will be termed as "drift". Two subscenarios differing on the size of the region of interest (5 or 20 cM) were designed. A 5 (alternatively, 20) cM region with 21 SNP markers (i.e., 20 brackets), with a biallelic QTL at position 2.125 (alternatively, 8.5) cM (at the middle of the 9th bracket). The QTL was biallelic with an effect of 1 for the second allele. No foundational event was assumed (i.e., marker and QTL alleles were assigned at random in the ancestral population). SNP alleles were assigned at random in the founders. This population evolved during 100 generations with an effective size of 100. Therefore the only source of LD was drift. After these populational events, a daughter design was simulated, with 15 sires each with 20 daughters. Phenotypes were simulated according to the QTL effects and to a residual variance of 1; no polygenic effects were simulated. This is a scenario where IBD methods are likely to perform well. Although the design is fairly small for dairy cattle, it is not unlikely for swine or sheep, and our purpose was not to provide a large amount of information.
The second two scenarios ("admixture") are radically different and include strong admixture. Again, 5 and 20 cM region are considered, with same positions for the QTL. Initially, two breeds existed differing in their polygenic average by 1. A QTL is considered with equal frequency in each breed, with an effect of 1 for the second allele. SNP alleles were assigned at random in the founders. Both breeds were crossed and a mixed population of 50 individuals evolved during 20 generations. A daughter design as before was simulated. Phenotypes were simulated according to the QTL, the inherited polygenic part of each breed, and a residual variance of 1. This scenario might generate admixture by drift if one SNP locus is indicative of breed origin.
Methods
We compared the performances of five different methods: (1) LA: HaleyKnott linkage analysis [14], (2) LDLA: the regression LDLA method in this work (equation 13), (3) LD decay: LDLA regression by equation (14), that is, ignoring the v* terms, (4) twomarker: regression on twomarker haplotypes (i.e., association analysis), and (5) an IBD method [3, 34], which computes IBD among founders based on all markers (Lee, pers. comm.).
The simplest approach is to perform single marker association analysis, which has been shown to be as good as more complex methods in quite a variety of scenarios [35]. We nevertheless discarded this option because the simulation method places the QTL in the middle of a bracket. This automatically penalizes the singlemarker analysis. Further, by using always two markers we can compare all methods in the same grounds (except IBD).
For the twomarker and IBD method, phases were assumed to be known with certainty; this might have resulted in slightly optimistic results. Performance of the different methods was based on the average error (i.e., the bias, in cM) and the mean square error (MSE, in cM^{2}). All simulated populations had a minor allele frequency of 0.1 at least for the QTL. One hundred replicates were run.
Results
Tables 2 and 3 show the results of the simulations for the "drift" scenario and Tables 4 and 5 for the "admixture" scenario.
In the "drift" scenario, LA and IBD methods are biased for the 5 cM scenario, and the IBD method is biased for the 20 cM scenario. The ranking of the methods changes with the scenario, with LA being always the worst in accuracy, as expected. The reason of the inability of LA to map the QTL is simple: in small intervals, recombinations  which are needed for LA to map a QTLseldom occur. Thus, in the 5 cM scenario, the performance of LA is roughly equivalent to random mapping of the QTL. For the remaining methods, differences are indeed largely nonsignificant except for the bias.
Figure 1 shows a plot of estimated locations in the 100 simulations vs the QTL position in the "drift 5 cM" scenario. From the graph, it is clear that the IBD method tends towards the center of the haplotype, whereas the other methods are the less biased. The LDLA method is somewhere in the middle.
Figure 2 shows the interval mapping profile of the pvalue along the chromosome for four replicates of the "drift 5 cM" scenario. It can be seen that the signal of association (i.e. twomarker) is considerably smoothed by the LDLA and IBD methods; this might compromise detection power (not addressed here). It is also apparent that the IBD methods look like a weighted average of signals of close markers; this results in smoothing but also in uncertainty.
Most of these results are due to the multimarker nature, in this order, of LA, IBD and LDLA, which might tend to point central regions since, in these, haplotypes are more informative and PDQ's are better calculated. This is alleviated in the LDLA method and LD decay method by the implicit twomarker association analysis.
As for the admixture scenario, Table 4 shows basically that QTL location cannot be accurately estimated. The reason is that the scenario is not informative enough due to the low number of historical recombinations and the noise added by admixture. Differences in performances (MSE) of the different methods are not statistically significant; however, LA, LDLA and LD decay do show some bias.
The 20 cM admixture scenario (Table 5) shows much worse performance of the mapping methods than in the drift scenario; and this, for the same reasons as above: few historical recombinations and noise added by admixture. LA is the worst method in terms of MSE, whereas the LD decay method is the best. However, differences are not significant, and no clear conclusions can be drawn. The profiles in Figure 3 are indeed very chaotic, and they would be difficult to interpret in reallife experiments.
Discussion
Comparison to other models for LDLA
We have presented a method for joint association and linkage, which belongs to a more general class of joint linkage disequilibrium and linkage analysis. In fact, existing methods belong to one of two exclusive classes: those that model somehow the LD phenomena and those that do not.
Some models [5, 6] assumed a mutation followed by expansion of the mutant haplotype. The pertinency of this scenario in general agricultural populations (and indeed in complex diseases in humans) is arguable. The likelihood in Farnir et al. [5] was an approximation, based on the assumption of independence among contiguous markers; and the form of the likelihood was only appropriate for family designs. The more complex model in PérezEnciso [6] holds for any pedigree structure below the founders, but computations were difficult.
Models for association and linkage in human populations exist [1, 36]. These, although very similar to our approach, are difficult to apply to livestock since they are rigidly familystructured; in addition, the QTDT [1] uses unilocus information only for transmission events, whereas in our model it is possible  and recommended  to use multilocus information to compute the PDQ's, and it is possible (but perhaps not useful) to define haplotype classes spanning several loci. Conversely, the QTDT has no need of PDQ calculation or  possibly  map ordering of SNPs.
The most popular model for LDLA QTL detection in livestock has been reported by Meuwissen et al. [3] and has been fairly used [20, 37]. The method relies on the construction of a matrix of covariances among founders (the socalled IBD probabilities), say H, based on identity of state among markers; these IBD probabilities are derived following approximate coalescent models [8, 9, 38, 39]. IBD methods use the same parameter (the variance assigned to the QTL) for both covariance due to association and covariance due to linkage. Modelling linkage and association using different parameters (β and v) allows for a greater flexibility in our model. This can be explained as follows.
Relationship of the IBD method for LDLA to our approach
Suppose we have two marker loci flanking a QTL. Assume that LD is generated by some random process such as drift or mutation. Then, given this LD generation (LG) event, the expected value of the effect of gamete j for a founder individual i is denoted by
For SNP markers, there will be four possible values for the haplotypes. Let β denote the vector of the four β_{ k }variables. In our approach, β is treated as a fixed effect. However, over LG events, β can be thought of as random. Suppose the LG process is such that the expected value of β over LG events is
and the covariance matrix of β over LG events is
The matrix Σ will depend of the LG process, and in the IBD method of Meuwissen and Goddard [8, 9], it is the matrix of IBD probabilities at the QTL conditional on the observed marker data. Thus, when marginalized over the LG events, the mean and variance of β_{ k }do not depend on the marker haplotype. Similarly, the marginal (or unconditional) variance of does not depend on the marker haplotype, and it is denoted by It follows that the unconditional variance of
is
Now, using notation in our paper, the covariance matrix of gametic effects of the founders can be written as
The covariance matrix for the entire vector of gametic effects can be computed, recursively, using equation (20) in our paper, starting with the covariance matrix in equation (29). If there is no LD, will be zero and (29) will reduce to the , which is the covariance matrix under LE. Also, (29) depends on two variance components that relate to the gametic variance due to LD and the remainder. In the IBD method [8, 9], Var(v_{ f }) is written as , where H is an IBD matrix with diagonals equal to 1 and offdiagonals given by . Thus, in the IBD method [8, 9] the partitioning of the gametic variance due to LD and the remainder is entirely dependent of the assumptions underlying the computation of Σ.
A practical problem using IBD methods (our experience is with the IBD methods [3]), is that often matrix H turns out to be negative definite; hence the likelihood of the phenotypes is undefined. The reason is that construction of H is not based on a joint distribution for all founder chromosomes, but it is computed for two haplotypes (or chromosomes) at a time, marginalizing over the rest. This leads to approximate marginal probabilities in H instead of a joint distribution. Thus, the estimated H matrix is at best an approximation. A way to deal with nonpositive definiteness is bending [40], or clustering (a data reduction technique) [20]. Both approaches might result in a loss of information, have unknown statistical properties and are subject to arbitrary tuning parameters.
At any rate, both modelling the LD phenomena and IBD based models rely in assumed population events. The robustness of these methods to, for example, admixtured breeds, is largely unknown and difficult to verify. Our model and those by Fernando et al. and Gilbert et al. [30, 41] do not model the process generating LD among QTL and markers, and therefore are more general. The only strong assumption that they made was that of a biallelic QTL, which is overcome in ours, at the price of a greater number of unknowns.
Originality
The originality of our approach is that (i) it is feasible and welltaylored for some agricultural populations, in particular livestock (because it relies on phase and transmission information easily ascertainable, and holds for any family structure) and corn (where indeed a similar idea  nested association  has been developed [42]), (ii) it is a linear model (with all the adequate machinery), while (iii) at the same time providing, based on expectations and covariances, a simple and coherent linearmodels framework for association and linkage and (iv) reduces to wellknown models on the hypothesis of LE or complete LD. Indeed, our models allows us to test the four relevant hypotheses (disequilibrium, linkage, both or none) and reduces to association or linkage under the respective hypothesis, which is not the case for other methods such as IBD models for example, which assume that LD exists.
Our method is computationally simple to use, provided that phases and PDQ's can be accurately calculated. If this is not the case, inference is possible, in principle, by integrating over all the joint distribution of phases and transmissions. After phase determination and computation of PDQs, all the machinery of the linear models can be applied. This makes it possible to include simultaneously other effects (environmental effects, polygenic effects) and the use of other tools such as permutation tests, bootstrapping and in particular the simultaneous fit of several QTLs [43]. The latter one is of particular interest for recent developments in genomewide genetic evaluation ("genomic selection") using LDLA. For example, the number of simultaneous effects fitted by Calus et al. [44] was ~ 600,000 for twomarker haplotypes in a genome composed of ~ 2300 markers. If a "LD decay model" is used (such as equation 25) the number of equations is linear in the number of loci, while retaining the use of LD and of some of the LA. Even with the full linkage and association model (equation 24), sparsity of the mixed model equations is guaranteed.
A practical problem with the method is how to define "classes" of haplotypes; for example, how many markers to include in the definition of the classes. Including more markers in the definition of the haplotype is straightforward, but probably at the price of greater complexity. The optimal number of markers seems scenario dependent [12, 35]. A practical rule of thumb is to define classes that are manageable  that is, not too many. For example, Druet et al. [45] considered haplotypes spanning either 3 or 10 markers, with a number of classes of 8 and 700, respectively. The latter were too many and had to be clustered. They observed that 3marker haplotypes provided narrower intervals than 10marker haplotypes, at the possible price of more falsepositive detections. With multiallelic markers the twoloci classes might be impractical. Two options might be (i) to consider the closest microsatellite, or (ii) to split the effect of a haplotype class in a sum of individual marker locus effects. In this option a descendant of haplotype, say, "13" with probability w would be in expectation w times the effect of allele 1 at the first locus, plus w times the effect of allele 3 at the second locus.
Performance of the method
Computations for any of the regression methods (LA, LDLA, LD decay and twomarker) were extremely fast. For the case of LDLA, computing one position took 0.02 seconds. For the IBD method, each position took about 40 seconds.
Results show no clear ranking of methods. Indeed, the fact that the IBD method is often biased deserves further attention for small chromosomal segments, albeit its good performance in the drift 20cM simulation shows the value of multimarker information in relatively sparse maps. The LD decay method is possibly the best across all scenarios, but the twomarker regression analysis is almost as good. Zhao et al. [35] have shown that the even simpler method of singlemarker regression performed slightly better than twomarker regression. Thus, future work should compare our methods (LDLA or LD decay) with singlemarker association.
The admixture simulation shows basically that the extra noise generated affected all methods for localization of QTLs; whether this holds for detection remains to be seen. Thus, more extensive simulations need to be undertaken to compare accuracy, power, and robustness to spurious associations of the different methods.
It seems, nevertheless, that our linear model (LDLA or LD decay) is at least as good in performance as the IBD method, while keeping simplicity. In fact, for small chromosomal segments, association between QTL and markers is very informative [12, 39]. As an aside, simulations should not place the QTL at the center of the segment as this hides bias of the methods and artificially decreases MSE.
It is expected that, for narrower and narrower marker intervals, most information will be captured by the LD term and less by the LA terms. At the limit, if the QTL is the marker, the variance for the gametic effect v (v*) will be null and all information will be contained in β. On the other hand, for very distant markers, variance of v will be high and β will tend to zero. Still, linkage will still be used in modelling the pedigree transmission of fully associated marker effects.
Conclusion
We have presented simple linear models for QTL detection and localization including populational linkage disequilibrium and withinfamily cosegregation. The methods uses all available information (i.e., multiple markers and pedigrees). The performance of these methods is satisfactory, as shown by simulations. These methods are computationally much simpler than other proposals. Extensions to multiple QTL mapping and genomic selection are straightforward. These methods should help researchers in QTL mapping and marker assisted selection, in particular in livestock species, where the required information is available, just like regression is more used than fulllikelihood methods [14], when possible.
References
 1.
Abecasis GR, Cardon LR, Cookson WOC: A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000, 66: 279292. 10.1086/302698.
 2.
Yu J, Pressoir G, Briggs WH, Bi IV, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, Kresovich S, Buckler ES: A unified mixedmodel method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006, 38: 203208. 10.1038/ng1702.
 3.
Meuwissen THE, Karlsen A, Lien S, Olsaker I, Goddard ME: Fine mapping of a quantitative trait locus for twinning rate using combined linkage and linkage disequilibrium mapping. Genetics. 2002, 161: 373379.
 4.
Dupuis J, Siegmund DO, Yakir B: A unified framework for linkage and association analysis of quantitative traits. Proc Natl Acad Sci USA. 2007, 104: 2021020215. 10.1073/pnas.0707138105.
 5.
Farnir F, Grisart B, Coppieters W, Riquet J, Berzi P, Cambisano N, Karim L, Mni M, Moisio S, Simon P, Wagenaar D, Vilkki J, Georges M: Simultaneous mining of linkage and linkage disequilibrium to fine map quantitative trait loci in outbred halfsib pedigrees: revisiting the location of a quantitative trait locus with major effect on milk production on bovine chromosome 14. Genetics. 2002, 161: 275287.
 6.
PérezEnciso M: Fine mapping of complex trait genes combining pedigree and linkage disequilibrium information: a Bayesian unified framework. Genetics. 2003, 163: 14971510.
 7.
Rannala B, Slatkin M: Methods for multipoint disease mapping using linkage disequilibrium. Genet Epidemiol. 2000, 19 (Suppl 1): S71S77. 10.1002/10982272(2000)19:1+<::AIDGEPI11>3.0.CO;2D.
 8.
Meuwissen THE, Goddard ME: Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics. 2000, 155: 421430.
 9.
Meuwissen THE, Goddard ME: Prediction of identity by descent probabilities from markerhaplotypes. Genet Sel Evol. 2001, 33: 605634. 10.1186/12979686336605.
 10.
Boitard S, Abdallah J, de Rochambeau H, CiercoAyrolles C, Mangin B: Linkage disequilibrium interval mapping of quantitative trait loci. BMC Genomics. 2006, 7: 5410.1186/14712164754.
 11.
Abdallah JM, Mangin B, Goffinet B, CiercoAyrolles C, PerezEnciso M: A comparison between methods for linkage disequilibrium fine mapping of quantitative trait loci. Genet Res. 2004, 83: 4147. 10.1017/S0016672303006554.
 12.
Grapes L, Dekkers JCM, Rothschild MF, Fernando RL: Comparing linkage disequilibriumbased methods for fine mapping quantitative trait loci. Genetics. 2004, 166: 15611570. 10.1534/genetics.166.3.1561.
 13.
Kruglyak L: The road to genomewide association studies. Nat Rev Genet. 2008, 9: 314318. 10.1038/nrg2316.
 14.
Knott SA, Elsen JM, Haley CS: Methods for multiplemarker mapping of quantitative trait loci in halfsib populations. Theor Appl Genet. 1996, 93: 7180. 10.1007/BF00225729.
 15.
Fernando RL, Grossman M: Marker assisted prediction using best linear unbiased prediction. Genet Sel Evol. 1989, 21: 467477. 10.1186/12979686214467.
 16.
Baret PV, Knott SA, Visscher PM: On the use of linear regression and maximum likelihood for QTL mapping in halfsib designs. Genet Res. 1998, 72: 149158. 10.1017/S0016672398003450.
 17.
Mangin B, Goffinet B, Le Roy P, Boichard D, Elsen JM: Alternative models for QTL detection in livestock. II. Likelihood approximations and sire marker genotype estimations. Genet Sel Evol. 1999, 31: 225237. 10.1186/12979686313225.
 18.
Heath S: Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am J Hum Genet. 1997, 61: 748760. 10.1086/515506.
 19.
Elsen JM, Mangin B, Goffinet B, Boichard D, Le Roy P: Alternative models for QTL detection in livestock. I. General introduction. Genet Sel Evol. 1999, 31: 213224. 10.1186/12979686313213.
 20.
Blott S, Kim JJ, Moisio S, SchmidtKüntzel A, Cornet A, Berzi P, Cambisano N, Ford C, Grisart B, Johnson D, Karim L, Simon P, Snell R, Spelman R, Wong J, Vilkki J, Georges M, Farnir F, Coppieters W: Molecular dissection of a quantitative trait locus: a phenylalaninetotyrosine substitution in the transmembrane domain of the bovine growth hormone receptor is associated with a major effect on milk yield and composition. Genetics. 2003, 163: 253266.
 21.
Li J, Zhou Y, Elston RC: Haplotypebased quantitative trait mapping using a clustering algorithm. BMC Bioinformatics. 2006, 7: 25810.1186/147121057258.
 22.
Thompson EA, Heath S: Estimation of conditional multilocus gene identity among relatives. Statistics in Molecular Biology and Genetics: Selected Proceedings of a 1997 Joint AMSIMSSIAM Summer Conference on Statistics in Molecular Biology, of IMS Lecture NoteMonograph Series. Edited by: SeillierMoiseiwitsch F. 1999, Institute of Mathematical Statistics (Hayward, CA), 33: 95113.
 23.
Qian D, Beckmann L: Minimumrecombinant haplotyping in pedigrees. Am J Hum Genet. 2002, 70: 14341445. 10.1086/340610.
 24.
Windig JJ, Meuwissen THE: Rapid haplotype reconstruction in pedigrees with dense marker maps. J Anim Breed Genet. 2004, 121: 2639. 10.1046/j.14390388.2003.00439.x.
 25.
Abraham KJ, Totir LR, Fernando RL: Improved techniques for sampling complex pedigrees with the Gibbs sampler. Genet Sel Evol. 2007, 39: 2738. 10.1186/1297968639127.
 26.
Wang T, Fernando RL, Vanderbeek S, Grossman M, Vanarendonk JAM: Covariance between relatives for a marked quantitative trait locus. Genet Sel Evol. 1995, 27: 251274. 10.1186/12979686273251.
 27.
Grignola FE, Hoeschele I, Tier B: Mapping quantitative trait loci in outcross populations via residual maximum likelihood .1. Methodology. Genet Sel Evol. 1996, 28: 479490. 10.1186/12979686286479.
 28.
Almasy L, Blangero J: Multipoint quantitativetrait linkage analysis in general pedigrees. Am J Hum Genet. 1998, 62: 11981211. 10.1086/301844.
 29.
Fernando R, Totir L: Incorporating molecular information in breeding programmes: methodology. Poultry genetics, breeding and biotechnology. Edited by: Muir W, Aggrey S. 2003, Wallingford: CAB International, 537548.
 30.
Fernando RL: Incorporating molecular markers into genetic evaluation. Book of abstracts of the 55th Annual Meeting of the European Association for Animal Production, Bled, Slovenia. 2004, 10: 86
 31.
Quaas RL: Additive genetic model with groups and relationships. J Dairy Sci. 1988, 71: 13381345.
 32.
Goddard ME: A mixed model for analyses of data on multiple geneticmarkers. Theor Appl Genet. 1992, 83: 878886. 10.1007/BF00226711.
 33.
PérezEnciso M, Varona L: Quantitative trait loci mapping in F2. crosses between outbred lines. Genetics. 2000, 155: 391405.
 34.
Lee SH, Werf van der JHJ: An efficient variance component approach implementing an average information REML suitable for combined LD and linkage mapping with a general complex pedigree. Genet Sel Evol. 2006, 38: 2543. 10.1186/1297968638125.
 35.
Zhao HH, Fernando RL, Dekkers JCM: Power and precision of alternate methods for linkage disequilibrium mapping of quantitative trait loci. Genetics. 2007, 175: 19751986. 10.1534/genetics.106.066480.
 36.
Almasy L, Williams JT, Dyer TD, Blangero J: Quantitative trait locus detection using combined linkage/disequilibrium analysis. Genet Epidemiol. 1999, 17 (Suppl 1): S31S36.
 37.
Gautier M, Barcelona RR, Fritz S, Grohs C, Druet T, Boichard D, Eggen A, Meuwissen THE: Fine mapping and physical characterization of two linked quantitative trait loci affecting milk fat yield in dairy cattle on BTA26. Genetics. 2006, 172: 425436. 10.1534/genetics.105.046169.
 38.
Hill WG, HernándezSánchez J: Prediction of multilocus identitybydescent. Genetics. 2007, 176: 23072315. 10.1534/genetics.107.074344.
 39.
Meuwissen THE, Goddard ME: Multipoint identitybydescent prediction using dense markers to map quantitative trait loci and estimate effective population size. Genetics. 2007, 176: 25512560. 10.1534/genetics.107.070953.
 40.
Hayes JF, Hill WG: Modification of estimates of parameters in the construction of genetic selection indices ('bending'). Biometrics. 1981, 37: 483493. 10.2307/2530561.
 41.
Gilbert H, Firat MZ, Totir LR, Dekkers JCM, Fernando RL: A new method to fine mappinf quantitative trait locus using linkage disequilibrium. Book of abstracts of the 55th Annual Meeting of the European Association for Animal Production, Bled, Slovenia. 2004, 10: 87
 42.
Yu J, Holland JB, McMullen MD, Buckler ES: Genetic design and statistical power of nested association mapping in maize. Genetics. 2008, 178: 539551. 10.1534/genetics.107.074245.
 43.
Verbyla AP, Cullis BR, Thompson R: The analysis of QTL by simultaneous use of the full linkage map. Theor Appl Genet. 2007, 116: 95111. 10.1007/s001220070650x.
 44.
Calus MPL, Meuwissen THE, de Roos APW, Veerkamp RF: Accuracy of genomic selection using different methods to define haplotypes. Genetics. 2008, 178: 553561. 10.1534/genetics.107.080838.
 45.
Druet T, Fritz S, Boussaha M, BenJemaa S, Guillaume F, Derbala D, Zelenika D, Lechner D, Charon C, Boichard D, Gut IG, Eggen A, Gautier M: Fine mapping of quantitative trait loci affecting female fertility in dairy cattle on BTA03 using a dense singlenucleotide polymorphism map. Genetics. 2008, 178: 22272235. 10.1534/genetics.107.085035.
Acknowledgements
This work benefit from a visit of the first author to Iowa State University, financed by the International Relations Department, INRA, and by Newsham Choice Genetics. Support from EU research project SABRE is gratefully acknowledged (Legarra). Support from the National Research Initiative Competitive Grants Program of the U.S. Department of Agriculture, Award 20073520517862 is gratefully acknowledged (Fernando). This work has much benefited from discussions with JeanMichel Elsen, Hélène Gilbert, Brigitte Mangin and Magali San Cristobal. We are also grateful to JeanMichel Elsen for carefully reading the manuscript. Reviewer's suggestions and corrections are gratefully acknowledged.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
AL and RF derived the theory and wrote the text. AL performed the simulations and the numerical examples.
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Legarra, A., Fernando, R.L. Linear models for joint association and linkage QTL mapping. Genet Sel Evol 41, 43 (2009) doi:10.1186/129796864143
Received
Accepted
Published
DOI
Keywords
 Linkage Analysis
 Genomic Selection
 General Pedigree
 Haplotype Class
 Small Chromosomal Segment
Comments
View archived comments (1)