Mapping linked quantitative trait loci via residual maximum likelihood

Detection de genes lies a effets quantitatifs (QTL) grâce au maximum de vraisemblance residuelle. On presente une methode de maximum de vraisemblance residuelle pour estimer les positionss et les contributions a la variabilite genetique de deux QTLs lies. La methode fournit egalement des tests de l'existence d'un seul QTL lie a un groupe de marqueurs (par napport a zero) ou de deux QTLs (par rapport a un seul). Un algorithme deterministe sans calcul de derivees est utilise. La matrice de variance-covariance des effets alleliques a chaque QTL et son inverse est calculee conditionnellement a l'information incomplete sur les marqueurs multiples lies. Les covariances entre les effets aux differents QTLs et entre les effets aux QTLs et les effects polygeniques sont supposeesn nulles. Une etude de simulation a ete effectuee pour analyser les parametres estimes et les tests de rapports de vraisemblance. Le schema experimental a ete un schema petites-filles avec 2 000 fils, 20 peres des fils et 9 ancetres de ces peres. Des donnees ont ete simulees avec un modele de variation au QTL de type Gaussien ou biallelique. Les genotypes pour cinq ou neuf marqueurs egalement espaces ont ete generes pour tous les fils et leus ancetres. Deux QTLs lies expliquaient conjointement 50 % ou 25 % de la variance genetique additive et la distance entre les QTLs variait de 20 CM a 40 CM. La puissance de detection d'un second QTL a depasse 0,5, dans tous les cas pour la situation 50 %, et quand la distance entre QTLs etait superieure ou egale ) 30 CM pour la situation 25 %. Le test d'un QTL par rapport a deux QTLs correspond a la reunion de deux tests. On l'a trouve plutot conservatif. Les parametres ont ete estimes avec une grande precision excepte la distance entre deux QTLs proches qui a ete legerement surestimee.


INTRODUCTION
A variety of methods for the statistical mapping of quantitative trait loci (QTL) exist. While some methods analyze squared phenotypic differences of relative pairs (eg, Haseman and Elston, 1972;Gotz and Ollivier, 1994), most methods analyze the individual phenotypes of pedigree members. Main methods applied to livestock populations are maximum likelihood (ML) (eg, Weller, 1986;Lander and Botstein, 1989; Knott and Haley, 1992), least-squares (LS) as an approximation to ML (eg, Weller et al, 1990;Haley et al, 1994;Zeng, 1994), and a combination of ML and LS referred to as composite interval mapping (Zeng, 1994) or multiple QTL mapping (Jansen, 1993). These methods were developed mainly for line crossing and, hence, cannot fully account for the more complex data structures of outcross populations, such as data on several families with relationships across families, incomplete marker information, unknown number of QTL alleles in the population and varying amounts of data on different (aTLs or in different families. Recently, Thaller and Hoeschele (1996a, b)   implemented a Bayesian method for QTL mapping using single markers or all markers on a chromosome, respectively, via Markov chain Monte Carlo algorithms, and applied the analyses to simulated granddaughter designs identical to those in the present study. Hoeschele et al (1997) showed that the Bayesian analysis can accommodate either a biallelic or a normal-effects QTL model. While the Bayesian analysis was able to account for pedigree relationships both at the QTL and for the polygenic component, and gave good parameter estimates, it was very demanding in terms of computing time, in particular when fitting two (aTLs (Uimari and Hoeschele, 1997). Therefore, Grignola et al (1996a) developed a residual maximum likelihood method, using a deterministic, derivative-free algorithm, to map a single QTL. Hoeschele et al (1997) showed that this method can be considered as an approximation to the Bayesian analysis fitting a normal-effects QTL model. In the normaleffects QTL model postulated by the REML analysis, the vector of QTL allelic effects is random with a prior normal distribution. The REML analysis builds on earlier work by Fernando and Grossman (1989), Cantet and Smith (1991) and Goddard (1992) on best linear unbiased prediction of QTL allelic effects by extending it to the estimation of QTL, polygenic, and residual variance components and of QTL location, using incomplete information from multiple linked markers. Xu and Atchley (1995) performed interval mapping using maximum likelihood based on a mixed model with random QTL effects, but these authors fitted additive genotypic effects rather than allelic effects at the QTL, with variance-covariance matrix proportional to a matrix of proportions of alleles identical-by-descent, and assumed that this matrix was known. Their analysis was applied to unrelated fullsib pairs. In order to account for several QTLs on the same chromosome, Xu and Atchley (1995) used the idea behind composite interval mapping and fitted variances at the two markers flanking the marker bracket for a QTL. This approach, however, is not appropriate for multi-generational pedigrees, as effects associated with marker alleles erode across generations owing to recombination. It is also problematic for outbred populations, where incomplete marker information causes the flanking and next-to-flanking markers to differ among families.
In this paper, we extend the REML method of Grignola et al (1996a) to the fitting of multiple linked QTLs. While the extension is general for any number of linked QTLs, we apply the method to simulated granddaughter designs by fitting either one or two QTLs.

METHODOLOGY Mixed linear model
The model is identical to that of Grignola et al (1996a), except that it includes effects at several (t) QTLs, and it can be written as: where y is a vector of phenotypes, X is a design-covariate matrix, j3 is a vector of fixed effects, Z is an incidence matrix relating records to individuals, u is a vector of residual additive (polygenic) effects, T is an incidence matrix relating individuals to alleles, v i is a vector of QTL allelic effects at QTL i, e is a vector of residuals, A is the additive genetic relationship matrix, c 7 ' is the polygenic variance, Q, 0 ,2 V( i) is the variance-covariance matrix of the allelic effects at QTL i conditional on marker information, Qv!2! is the allelic variance at QTL i (or half of the additive variance at QTL i), R is a known diagonal matrix, and Q e is residual variance. Each matrix Q i depends on one unknown parameter, the map position of QTL i (d i ). Parameters related to the marker map (marker positions and allele frequencies) are assumed to be known. The model is parameterized in terms of the unknown parameter's heritability (h 2 = 0&dquo;!/0&dquo;2), with aj_ being phenotypic and U2 additive genetic variance, fraction of the additive genetic variance explained by the allelic effects at QTL i (v? = o,' v(i) /'a 2; i = l, ... , t), the residual variance 0,2, e and QTL map locations d l , ... , di, ... , dt.
A model equivalent to the animal model in [1] is (Grignola et al, 1996a): where W has at most two non-zero elements equal to 0.5 in each row in columns pertaining to the known parents of an individual, F i is a matrix with up to four non-zero elements per row pertaining to the QTL effects of an individual's parents (Wang et al, 1995;Grignola et al, 1996a), Ap and Qp( j ) are sub-matrices of A and Q, respectively, pertaining to all animals that are parents, and m and e i are Mendelian sampling terms for polygenic and QTL effects, respectively, with covariance matrices as specified in equation !2!. While Var(m) is diagonal, Var(e i ) can have some off-diagonal elements in inbred populations (Hoeschele, 1993;Wang et al, 1995 (Bulmer, 1985). A reduced animal model (RAM) can be obtained from model [2] by combining m, the e i (i = 1, ... , t) and e into the residual. Mixed model equations (MME) can be formed directly for the RAM, or by setting up the MME for model [2] and absorbing the equations in m and the e i (i = 1,..., t). The resulting MMEs for the RAM and for t = 2 (aTLs are: with the A matrices defined in equation !2!. Matrix D, which results from successive absorption of the Mendelian sampling terms for the polygenic component and the QTLs, can be shown to be always diagonal and very simple to compute, even when several (aTLs (t > 2) are fitted. Let 6v!i!!! represent the Mendelian sampling term pertaining to v effect k (k = 1, 2) of individual j at QTL i, and 6,,( j ) the Mendelian sampling term for the polygenic effect of j. Then, the element of D pertaining to individual j (d jj ) is computed as follows: where r jj is the jth diagonal element of R-1 .

REML analysis
The REML analysis was performed using interval mapping and a derivative-free algorithm to maximize the likelihood for any given set of QTL positions, as described by Grignola et al (1996a) for a single QTL model. The log residual likelihood for the animal model was obtained by adding correction terms to the residual likelihood formed directly from the RAM MME (Grignola et al, 1996a).

The RAM residual likelihood is:
where N is the number of phenotypic observations, NF the number of estimable fixed effects (rank of X), NR RA ,yI the number of random genetic effects of the parents ((1 + 2t) times the number of parents), CRAM is the coefficient matrix in the left-hand-side of [3], P = V-' -V-1 X(X'V-1 X)-1 X'V-1 , V = Var(Y)/(7!, and GRA,!,I is a block-diagonal with blocks Ap(7! and Qp(i)(7!(i) for i = 1, ... , t (see also Meyer, 1989).
The RAM residual likelihood is modified to obtain the residual likelihood for the animal model as follows (Grignola et al 1996a): where A is the block-diagonal with blocks A u and !v(i) (i = 1, ... , t) from !2!, Czz is the part of the MME for model [2] pertaining to m and e i (i = 1, ... , t ), and NR is total number of random genetic effects !(1 !-2t) times the number of animals] in the animal model. The analysis is conducted in the form of interval mapping as in Grignola et al (1996a), except that now a t-dimensional search on a grid of combinations of positions of the t CaTLs must be performed. More precisely, we performed cyclic maximization by optimizing the position of the first QTL while holding the position of the second QTL constant and subsequently fixing the position of the first QTL while optimizing the position of the second QTL, etc. A minimum distance was allowed between the QTLs, which was determined such that the (aTLs were always separated by two markers. Whittaker et al (1996) showed that for regression analysis and F2 or backcross designs, the two locations and effects of two (aTLs in adjacent marker intervals are not jointly estimable. With other methods and designs, locations and variances of two (aTLs in adjacent intervals should be either not estimable or poorly estimated. At each combination of d 1 and d 2 values, the residual likelihood is maximized with respect to the parameters h z , v2 (i = 1,..., t) and er!.
Matrices Qp( j ), F i and Ov!2! were calculated for each QTL as described in Grignola et al (1996a).

Hypothesis testing
The presence of at least one QTL on the chromosome harboring the marker linkage group can be tested by maximizing the likelihood under the one-QTL model and under a polygenic model with no QTL fitted (Grignola et al, 1996a). The distribution of the likelihood ratio statistic for these two models can be obtained via simulation or data permutation (Churchill and Doerge, 1994;Grignola et al, 1996a, b;. Here, we consider testing the one-QTL model against the two-QTL model. This test is performed by comparing the maximized residual likelihood under the two-QTL model with (i) the maximized residual likelihood under the one-QTL model, (ii) the residual likelihood maximized under the one-QTL model with QTL position fixed at the REML estimate of d 1 obtained under the two-QTL model, and (iii) the residual likelihood maximized under the one-QTL model with QTL position fixed at the REML estimate of d 2 obtained under the two-QTL model. The distribution of these likelihood ratio statistics is not known, and obtaining it via data permutation would be difficult computationally, as many permutations would need to be analyzed, and as the two-dimensional search took 1-2 h of run-time for the design described below. The likelihood ratios corresponding to (i) (LR d ), (ii) (LR dl ), and (iii) (LR d2 ) should have an asymptotic chi-square distribution within 1 and 3 degrees of freedom. When using LR dl and LR dz , both ratios have to be significant in order to reject the null hypothesis of one QTL. This test is an intersection-union test (Casella and Berger, 1990;Berger, 1996), where for the first likelihood ratio the hypotheses are: H o : &OElig;!(l) -I-0 and &OElig;!(2) = 0 versus H 1 : &OElig;!(l) -I-0 and &OElig;!(2) -I-0, and for the second likelihood ratio the hypotheses are: Ho: &OElig;!(1) = 0 and &OElig;!(2) -I-0 versus H 1 : &OElig;!(1) -I-0 and &OElig;!(2) -I-0. The intersectionunion test constructed in this way can be quite conservative, as its size may be much less than its specified value ce. For genome-wide testing, the significance level should also be adjusted for the number of independent tests performed (the number of chromosomes analyzed times the number of independent traits).

Design
The design simulated was a granddaughter design (GDD) as in the single QTL study of Grignola et al (1996b), where marker genotypes are available on sons and phenotypes on daughters of the sons. The structure resembled the real GDD of the US public gene mapping project for dairy cattle based on the dairy bull DNA repository (Da et al, 1994). The simulated GDD consisted of 2 000 sons, 20 sires, and nine ancestors of the sires (fig 1).. The phenotype simulated was daughter yield deviation (DYD) of sons (Van-Raden and Wiggans, 1991). DYD is an average of the phenotypes of the daughters adjusted for systematic environmental effects and genetic values of the daughters' dams. For details about the analysis of DYDs, see Grignola et al (1996b).
Marker and QTL genotypes were simulated according to Hardy-Weinberg frequencies and the map positions of all loci. All loci were in the same linkage group. Each marker locus had five alleles at equal frequencies. Several designs were considered which differed in the map positions of the two QTLs, in the number of markers, and in the proportion of the additive genetic variance explained by the two QTLs. These designs are defined in table II. Also simulated was a single QTL at 45 cM to test the two-QTL analysis with data generated under the single QTL model.
Polygenic and QTL effects were simulated according to the pedigree in figure 1. Data were analyzed by using the pedigree information on the sires. Note that in the simulation, no linkage disequilibrium (across families) was generated, ie, covariances between pairs of effects at different (aTLs or between QTL and polygenic effects were zero. Therefore, an additional design was simulated where linkage disequilibrium was generated by simulating DYDs also for sires, creating a larger number of sires and culling those sires with DYD lower than the 90th percentile of the DYD distribution. QTL positions for this design were 30 cM (interval 2) and 70 cM (interval 3) with five markers, and the QTL model was the normal-effects model (see below). Estimates of the simulated correlations (SE in parentheses), across 30 replicates, were -0.20 (0.05), -0.33 (0.04), and -0.32 (0.04), between pairs of v effects at QTL 1 and QTL 2, between pairs of v effects at QTL 1 and polygenic effects, and between pairs of v effects at QTL 2 and polygenic effects, respectively. The effects of one or several generations of phenotypic truncation selection on additive genetic variance in a finite locus model has been studied analytically by Hospital and Chevalet (1996). QTL models Two different QTL models were used to simulate data. Under both models, phenotypes were simulated as where n j was the number of daughters of son j, gi!k was the sum of the v effects in daughter k of son j at QTL i, u j was a normally distributed polygenic effect, e j was a normally distributed residual, polygenic variance ( 0 &dquo;) was equal to the difference between additive genetic variance (afl ) and the variance explained by the QTLs, and afl was environmental variance. Number of daughters per son was set to 50, corresponding to a reliability (Van Raden and Wiggans, 1991) near 0.8. Narrow sense heritability of individual phenotypes was h 2 = 0.3, and phenotypic SD was Q P = 100.
Note that the QTL contribution to the DYDs of sons was generated by sampling individual QTL allelic effects of daughters under each of the two genetic models described below. This sampling of QTL effects ensures that DYD of a heterozygous son, or of a son with substantial difference in the additive effects of the alleles at RESULTS The designs studied are described in table II and differ in the QTL positions, in the number of markers, and in the proportion of the additive genetic variance explained jointly by two linked QTLs. Overall, the QTL parameters were estimated quite accurately as in the single-CaTL analysis of Grignola et al (1996b), except that there was a tendency to overestimate the distance between the CaTLs with decreasing true distance.
Parameter estimates for all designs in table II and for the normal-effects QTL model used in the data simulations are presented in table III. There appeared to be a slight tendency to overestimate the QTL variance contributions (v 2 ), but, in most cases not significantly. The QTL map positions and the distance between the QTLs were estimated accurately when the true map distance between the (aTLs was 30 or 40 cM. When the true map distance was only 20 cM, there was a tendency to overestimate the QTL distance. This overestimation was significantly more pronounced when the number of markers was reduced from nine (every 10 cM, designs IIIA, B) to five (every 20 cM, designs IVA, B). To investigate whether the overestimation of the QTL distance was related to the search strategy requiring a minimum distance between the QTLs such that these were always separated by two markers (with the exception of designs IVA, B), the minimum distance was reduced to 10 and 2 cM. However, parameter estimates and likelihood ratios remained unchanged.
When the (aTLs accounted jointly for only 25% of the additive genetic variance as compared to 50%, there was little change in the precision of the estimates of the QTL variance contributions. Standard errors of the QTL positions were higher, and overestimation of the distance between (aTLs only 20 cM apart was slightly more pronounced. Parameter estimates for designs simulated under the biallelic QTL model are shown in table V. Except for the QTL model, these designs are identical to designs IA, B and IIIA, B in table II. Parameters were estimated with an accuracy not noticeably lower than for the normal-effects QTL model, an observation in agreement with the single-(aTL study of Grignola et al (1996b).
When analyzing the designs in table II with the single-(aTL model, the most likely QTL position (d in tables III and V) was always somewhere in between the QTL positions estimated under the two-QTL model. Averaged across replicates, the estimated QTL position was very near the mean of the true positions. This result was expected, as both (aTLs had equal variance contributions and on average equally informative flanking markers. Likelihood ratio statistics for all designs in table II and for the normal-effects QTL model are presented in table IV. The average values of the likelihood ratio statistics for testing between the single-and two-QTL models declined as expected with decreasing distance between the two QTLs (designs IIA, B versus designs IIIA, B), with decreasing number of markers (designs IA, B versus IIA, B, and designs IIIA, B versus designs IVA, B), and with decreasing joint variance contribution of the QTLs (A versus B). The average value of the likelihood ratio statistic LR d was always considerably lower than those of LR dl and LR d2 .
The power figures in table IV were calculated assuming that LR dl , LR d2 and LR d follow either a chi-square distribution with 1 df or 3 df and using an cx value of 0.05/29 = 0.0017. To allow for any interpretation of these power figures, we estimated the type-I error by simulating data with a single QTL explaining either 25, 12.5 or 6.25% of the additive genetic variance. For the type-I error estimation, a was set to 0.05, and the number of replicates was 200. Estimates of type-I errors are in table VI, for the two tests (LR d and LR d , and LR d2 ) and for thresholds from chi-square distributions with 1, 2 and 3 df. Type-I errors tended to increase slightly with size of the QTL variance, and were consistently lower for the test using LR d . Based on these results, the empirical type-I error was close to the pre-specified value of 0.05 for the LR dl and LR d2 tests when using the xi-threshold, while it was consistently too low for the LR d test.
With this background, the power of rejecting the single-(aTL model based on requiring both LR dl and LR d2 to exceed the significance threshold was as expected always higher than or equal to the power of the LR d statistic. For the test based on LR dI and LR d2 , power declined as expected with decreasing distance between QTLs and with decreasing true QTL variance contribution. For the joint QTL variance contribution of 50% and the test based on LR d , and LR d2 , power was equal to or higher than 0.5 always for the xi threshold and always except for design IVA for the X2 threshold. For the joint QTL variance contribution of 25%, a power of at least 0.5 was achieved only for the 30 cM distance between QTLs and the X2 and X3 thresholds. This finding must be interpreted by keeping in mind the choice of a = 0.0017 and the fact that the test, as contructed here, is rather conservative.
For the data simulated with a single QTL explaining either 25 or 12.5% of the additive genetic variance, the map positions estimated under the two-QTL model were near the true position and a ghost position to either side of the true position in most replicates. This behavior of the REML analysis seems to support the use of LR dl and LR d2 instead of LR d .
Likelihood ratio statistics for the biallelic QTL model and some of the designs in table II are presented in table V. Overall, likelihood ratios and power figures were similar to those for the normal-effects QTL model, with somewhat lower power for design IA but slightly higher power for other designs. These differences are most likely due to the limited number of replications (30).
The cyclic maximization strategy for the two-QTL model took about 20 min of serial wall-clock time on a 21-processor IBM SP2 system for a chromosome of 80 cM length, compared with 1.5 h for a two-dimensional search. Run-time for the single-QTL analysis was at most 8 min.
For the designs in table II, the two QTLs had equal variance contributions. When linkage disequilibrium was generated by phenotypic truncation selection of sires for the design with QTL positions of 30 and 70 cM and joint QTL genetic variance contribution of 50%, QTL parameters and their estimates were clearly affected. Heritability was estimated low (0.213 ! 0.020), and the vfl (k = 1, 2) were estimated high (0.184 ±0.014, 0.211 ±0.014), but QTL positions were estimated accurately (0.292 t 0.010, 0.716 ± 0.007). Power appeared to be somewhat reduced compared to the same design without selection, and was estimated at 0.86 and 0.73 for the X i 1 and X thresholds, respectively. Reduction in power was probably due to the high estimate of error variance (1737.9 t 55.3).

CONCLUSIONS
The REML analysis of Grignola et al (1996a, b), based on a mixed linear model with random and normally distributed QTL allelic effects and conditional on incomplete information from multiple linked markers, has been extended here to fit multiple linked QTLs. This extension is necessary to eliminate biases in the estimates of the QTL parameters position and variance, which occur when fitting a single QTL and other linked QTLs are present. For the present study, the analysis had been implemented for two QTLs on the same chromosome using a two-dimensional search. When fitting more than two linked QTLs or additional unlinked QTLs, a more efficient search strategy may be required, or alternative algorithms (eg, Meyer and Smith, 1996). In the meantime, a cyclic optimization approach was implemented, with one QTL position held constant while optimizing the other, and vice versa, which reduced the number of likelihood evaluations and hence CPU time considerably relative to a two-dimensional search. As likelihood maximizations at different position combinations are independent of each other, use of multiple processors, if available, would reduce run-times substantially.
For the one-QTL model, relationships between the REML analysis, the equivalent method of Xu and Atchley (1995), the method of Schork (1993), and the Bayesian analysis of  were discussed in Grignola et al (1996a). The method of Xu and Atchley (1995) fitting variances associated with the nextto-flanking markers to account for additional linked QTLs would not have worked well for the designs studied here. A first reason is the inclusion of ancestors of the sires in the analysis, as their method fits random effects associated with the marker alleles in founders which erode across generations due to recombination. Another reason is the small number of families differing in the flanking and next-to-flanking markers, resulting in too little information for estimation of variances associated with the next-to-flanking markers in the method of Xu and Atchley (1995). For similar reasons, composite interval mapping (Zeng, 1994) and multiple QTL mapping (Jansen, 1993), which are based on the inclusion of markers as cofactors, are not suitable for the analysis of multi-generational pedigrees. REML analysis under the two-QTL model yielded fairly accurate parameter estimates. Map distance between the QTLs was overestimated, with decreasing distance between the two QTLs and wider spacing of markers. As in the single QTL study, the REML analysis was robust to the number of alleles at the QTLs, as there was relatively little difference in parameter estimates and likelihood ratio statistics between designs generated with the normal-effects and biallelic QTL models, given the number of replicates performed. Previous linkage analyses (eg, Knott and Haley, 1992;Haley et al, 1994) lead to the conclusion that a minimum distance of 20 cM was required between linked QTLs for their separate detection. This result was confirmed in the present study. However, discrimination among different numbers of QTL (eg, one versus two) requires additional research, including an investigation of alternative approaches such as an adaptation of composite interval mapping to pedigree analysis. Gains in power from fitting additional unlinked QTL and selection of QTL to be included in the analysis are related topics warranting further research.
If there is linkage disequilibirum due to selection, QTL positions will still be estimated accurately, while variance estimates and power may be affected. Accounting for disequilibrium in the analysis should be investigated.