Mapping quantitative trait loci in outcross populations via residual maximum likelihood. II. A simulation study

La position et la contribution a la variance d'un locus de caractere quantitatif (QTL) ont ete estimees avec les variances polygeniques additives et residuelles par une methode de maximum de vraisemblance residuelle et un algorithme sans derivation. La matrice de variance-covariance des effets de QTL et son inverse ont ete calculees conditionnellement a l'information incomplete relative a des ensembles de marqueurs lies. Une simulation a ete realisee pour determiner la precision des parametres estimes et les tests de rapport de vraisemblance. Le plan d'experience etait un schema petite-fille, avec 2 000 fils, 20 peres et 9 ancetres de peres. Des schemas avec 1 000 et 600 fils ont egalement ete etudies. Les donnees ont ete simulees sous trois modeles genetiques differents pour le QTL, un modele biallelique, un modele a 10 alleles d'egale frequence, et un modele avec des effets alleliques distribues normalement et d'une maniere independante chez les individus de base. Le caractere analyse etait la production des petites-filles en ecart a la moyenne, ou leur moyenne ajustee pour les effets de milieu et les valeurs genetiques des conjointes des fils. Les genotypes pour cinq marqueurs situes sur un meme chromosome ont ete generes pour tous les fils et leurs ancetres, et les donnees ont ete analysees avec ou sans relation de parente entre les peres. Les parametres etaient estimes avec une bonne precision dans les trois modeles simules. La methode etait d'une robustesse satisfaisante relativement au nombre d'alleles dans les schemas etudies.

Summary -Position and variance contribution of a single QTL together with additive polygenic and residual variance components were estimated using a residual maximum likelihood method and a derivative-free algorithm. The variance-covariance matrix of QTL effects and its inverse were computed conditional on incomplete information from multiple-linked markers. Simulation was employed to investigate the accuracy of parameter estimates and likelihood ratio tests. The design was a granddaughter design with 2 000 sons, 20 sires of sons and 9 ancestors of sires. Designs with 1 000 and 600 sons were also investigated. Data were simulated under three different genetic models for the QTL, a biallelic model, a multiallelic model with ten alleles at equal frequencies, and a model with normally and independently distributed QTL allelic effects for base individuals. The trait analyzed was daughter yield deviation or the daughter average adjusted for environmental effects and merits of mates of the sons. Genotypes for five markers situated on the same chromosome were generated for all sons and their ancestors. Data were analyzed with and without relationships among sires. Parameters were estimated with good accuracy under all three simulation models. The REML method was fairly robust to the number of alleles at the QTL for the designs studied. quantitative trait loci / residual maximum likelihood / mapping / simulation / granddaughter design Résumé -La cartographie de locus de caractère quantitatif dans des populations en ségrégation à l'aide du maximum de vraisemblance résiduelle. II. Étude de simulation. La position et la contribution à la variance d'un locus de caractère quantitatif (QTL) ont été estimées avec les variances polygéniques additives et résiduelles par une méthode de maximum de vraisemblance résiduelle et un algorithme sans dérivation. La matrice de variance-covariance des effets de QTL et son inverse ont été calculées conditionnellement à l'information incomplète relative à des ensembles de marqueurs liés. Une simulation a été réalisée pour déterminer la précision des paramètres estimés et les tests de rapport de vraisemblance. Le plan d'expérience était un schéma petite-fille, avec 2 000 fils, 20 pères et 9 ancêtres de pères. Des schémas avec 1 000 et 600 fils ont également été étudiés. Les données ont été simulées sous trois modèles génétiques différents pour le QTL, un modèle biallélique, un modèle à 10 allèles d'égale fréquence, et un modèle avec des effets alléliques distribués normalement et d'une manière indépendante chez les individus de base. Le caractère analysé était la production des petites-filles en écart à la moyenne, ou leur moyenne ajustée pour les effets de milieu et les valeurs génétiques des conjointes des fils. Les génotypes pour cinq marqueurs situés sur un même chromosome ont été générés pour tous les fils et leurs ancêtres, et les données ont été analysées avec ou sans relation de parenté entre les pères. Les paramètres étaient estimés avec une bonne précision dans les trois modèles simulés. La méthode était d'une robustesse satisfaisante relativement au INTRODUCTION In a companion paper , a residual maximum likelihood (REML) method was derived for estimating position and variance contribution of a single QTL together with additive polygenic and residual variance components. The REML analysis was implemented with a derivative-free algorithm. The method overcomes the shortcomings of the traditional methods of linear regression (eg, Haley et al, 1994;Zeng, 1994) and maximum likelihood (ML) interval mapping (eg, Weller, 1986;Lander and Botstein, 1989;Knott and Haley, 1992). ML and regression methods cannot fully account for the more complex data structures of outcross populations, eg, data on several families with relationships across families, unknown linkage phases in parents, unknown number of QTL alleles in the population, and varying amounts of data information on different QTLs or in different families.
The REML method is based on a mixed linear model including random polygenic effects and random QTL effects. Polygenic effects represent the sum of the additive effects at all loci not linked to the markers. The QTL allelic effects are assumed to have a prior normal distribution with variance-covariance matrix conditional on information from multiple linked markers. Because the true nature of the QTLs is unknown, ie, the number of alleles at a QTL in the population studied is unknown, the robustness of the REML analysis to the number of alleles at the QTL must be evaluated.
One of the main experimental designs for QTL mapping in livestock is the half-sib design, used in cattle in the form of daughter or granddaughter designs (Weller, 1990). In this paper, we evaluate the accuracy of the REML analysis in QTL mapping using granddaughter designs. The simulated designs resemble actual designs for the US Holstein population. Data are simulated under several genetic models differing in the number of QTL alleles. The analysis is carried out with and without consideration of relationships among sires. Here, we only present simulation results. The analysis is described in detail in the companion paper . SIMULATION

Design
The most frequently used design for mapping QTL in dairy cattle is the granddaughter design (GDD), where marker genotypes are collected on sons and phenotypes on daughters of the sons. A GDD was simulated with a pedigree structure resembling the real GDD of the US public gene mapping project for dairy cattle based on the Dairy Bull DNA Repository (Da et al, 1994). The simulated GDD consisted of 2 000 sons, 20 sires, and 9 ancestors of the sires (fig 1), and is identical to the design used in the Bayesian linkage analyses of Thaller and Hoeschele (1996a,b) and Uimari et al (1996a). While this design had 100 sons per sire, designs with only 50 or 30 sons per sire were also simulated.
The phenotype simulated was daughter yield deviation (DYD) of sons (Van-Raden and Wiggans, 1991). DYD is an average of the phenotypes of the daughters adjusted for systematic environmental effects and genetic values of the daughter's dams. Total variance of DYD equals Var(DYD) = 0.25 0 &dquo;;/ R and can be factored into where R is reliability or squared accuracy of the son's estimated additive genetic effect or breeding value, and 0 & d q u o ; ; is the additive genetic variance. In this factorization, the first component is the variance among the sons' transmitting abilities or half of their additive genetic values, and the second term is 'residual variance' or the variance of the average dam and Mendelian genetic effect of the daughters and the average environmental effect. Variance of DYD can be rewritten as where w = (1 -R)/R and 0 & d q u o ; ; = 0.25 Q a. Therefore, when analyzing DYD with the weight w, DYD has an expected heritability of 0.5, because the expected value of the estimate of 0 &dquo;; is 0.25a2 a. Hence, there is the option of treating heritability as known in the REML analysis when phenotypic data are DYDs.
Marker and QTL genotypes were simulated according to Hardy-Weinberg frequencies and the map positions of all loci. One linkage group was considered which consisted of five marker loci and one QTL. Each marker locus had five alleles at equal frequencies, with the exception of one design where each marker had only three alleles at equal frequencies. The markers were spaced 20 cM apart and, for the results presented here, the QTL was located in interval 3 at 5 cM from the left marker (other QTL positions were simulated to verify that the analysis was working properly).
Polygenic and QTL effects were simulated according to the pedigree in figure 1. Data were analyzed (i) using full pedigree information and (ii) assuming that the 20 sires in figure 1 were unrelated, as is common practice. The QTL contribution to the DYDs of sons was generated by sampling individual QTL allelic effects of daughters under each of the genetic models as described below. This sampling of QTL effects assures that DYD of a heterozygous son or of a son with substantial difference in the additive effect of its two QTL alleles has larger variance among daughters due to the QTL than a homozygous son or a son with similar QTL allelic effects.

Genetic models
Three different genetic models were used to simulate data. Common to all models were the parameters narrow sense heritability of individual phenotypes h 2 = 0.3, phenotypic SDa p = 100, and the order of and recombination rates among all loci.
Under all three models, phenotypes were simulated as where n i was the number of daughters of son i, g was the sum of the v effects in daughter j of son i, u was a normally distributed polygenic effect, e was a normally distributed residual, polygenic variance (or2) was equal to the difference between additive genetic variance (oa 2 ) and the variance explained by the QTL (2 0 ,2) , and or,2, was environmental variance. Number of daughters per son was set to 50, corresponding to a reliability (VanRaden and Wiggans, 1991) of near 0.8.
The ratio of the QTL allelic variance (ov2) to the additive genetic variance ( 0 ,2) is denoted by v 2 below.

Model 1. Normal-effects model
For each individual with one or both parents unknown, one or both QTL effects, respectively, were drawn from N(O, a v 2 ) . For the pedigree in figure 1, there were 32 distinct base alleles, and the QTL was treated as a locus with 32 distinct alleles in passing on alleles to descendants. The parameter U V2 was set to 0.250&dquo;! or 0.06250&dquo;!, ie, the simulated QTL accounted for 50% (2v 2 = 0.5) or 12.5% (2v 2 = 0.125) of the total additive genetic variance, respectively. In an additional simulation, v 2 was set to zero to obtain the empirical distribution of the test statistic under the null hypothesis.
Model 2. Multiallelic model The QTL had ten alleles with equal frequencies. For the biallelic QTL with 2v 2 = 0.5 (see below), the difference among homozygotes was 2a = 20' a . For the multiallelic QTL, means of the ten homozygous genotypes ranged from -g a to Qa at equal intervals. Means of heterozygotes were calculated assuming additive gene action.
Given these means (!,), additive effects of alleles were determined as where pi = p 2 = ... = Pi o = p = 0.1. The variance at the QTL was which yielded a value of 2v 2 = 0.204.
Model 3. Biallelic model The QTL was biallelic with allele frequency of 0.5. The variance at the QTL was where for p = 0.5 and 2v 2 = 0.5 or 2 V 2 = 0.125, QTL substitution effect a was determined and used to compute the additive effects of the two QTL alleles as -pa and (1 -p)a.

RESULTS
The REML analysis for single QTL mapping using all markers on a chromosome is decribed in detail in the companion paper . Analyses were performed with and without considering relationships among sires and with the marker linkage phases of sires and ancestors known or unknown. For the designs considered here, the most probable (more than 90%) linkage phase of the sires was always the true phase, but phases probabilities calculated for the ancestors indicated more uncertainty about their true phases. When phases were treated as unknown, the analysis employed equation (11! of Grignola et al (1996) to calculate the inverse of the variance-covariance matrix of the QTL effects of ancestors and sires. Contributions from sons were calculated assuming that the most likely sires phases equalled the true phases. Analysis of a single data set took around 8 mins of computing time on an IBM SP2 system with RS6000 390 and 590 nodes. In a preliminary investigation, the REML analysis, using a derivative-free Simplex algorithm was started from very different initial values for the parameters to verify convergence to the same estimates. As an example, a particular data set simulated under the biallelic QTL model (2v 2 = 0.5) was analyzed using the two starting value sets [0.9, 0.45, 0. Parameter estimates and likelihood ratio statistics are shown in tables II-VI.
All results are based on 50 replicates. In table II, results for the normal-effects QTL model with 2v 2 = 0.5 are presented. Several different analyses, described in table II, were conducted. First, sires were simulated as unrelated and analyzed without relationships. Then, relationships among sires were simulated according to figure 1. In analyses of these data, sires were treated as unrelated, treated as related with known marker linkage phases (ie, the true phases were used for all sires and ancestors), treated as related with the most likely linkage phases used in place of the true phases, or treated as related with linkage phases considered as unknown (by using equation [11] in Grignola et al, 1996).
Accuracy of parameter estimates was slightly higher when sires were simulated and analyzed as being unrelated (analysis I), compared to the case where sires were simulated and analyzed as related (analyses III-V). When sires were simulated related and analyzed unrelated, the estimate of heritability was slightly lower and the likelihood ratio statistic was lower than the corresponding values obtained with sires treated as related (analyses II and IV). When sires were analyzed using relationships, treating the most likely marker linkage phases of sires and ancestors as the true phases produced almost identical parameter estimates and likelihood ratio statistics as considering linkage phases as unknown (analyses III-V). Only small differences between analyses with known, most likely, or unknown linkage phases are to be expected for a granddaughter design with marker information on 100 sons per sire.
A further analysis (analysis VI in table II) was conducted on the same GDD except that markers had three alleles at equal frequencies rather than the five alleles simulated for all other design variations. Parameter estimates were not noticeably affected by the decline in marker polymorphism, but the average likelihood ratio statistic was reduced.
In the last analysis of table II (VII), heritability was fixed at 0.5. Accuracy of estimates of the QTL parameters and of residual variance was improved but the value of the likelihood ratio statistic was almost unchanged. Table III contains results for data sets generated with 2v 2 = 0.125 and with relationships among sires, and analyzed first by ignoring relationships among sires and secondly by accounting for relationships with linkage phases treated as unknown (equation !11! in Grignola et al, 1996). Analyses I and II in table III were conducted for the GDD with 100 sons per sire. The estimate of heritability and the likelihood ratio statistic were again lower when relationships among sires were ignored. Expectedly, position of this smaller QTL was less accurately estimated, and the likelihood ratios were considerably lower than those in table II. In the analyses of the granddaughter designs with only 50 (analyses III and IV) or 30 (analyses V and VI) sons, heritability estimates were also lower when sires were treated as unrelated, and QTL variance contribution was overestimated. The differences between analyses with and without relationships in the likelihood ratio statistics were rather small for the designs with 30 and 50 sons, where the likelihood ratio statistics were near the threshold values of 5.99 (0.05 type-I error level) and 9.21 (0.01 type-I error level) when assuming a chi-square distribution with two degrees of freedom.
Figures 3 and 4 depict residual likelihood profiles for single replicates generated with 2v 2 = 0.5 and 2 V 2 = 0.125, respectively, and obtained from analyses with relationships among sires and with linkage phases of sires and ancestors treated as unknown. Both figures display the marker positions, the most likely QTL location, and a confidence interval (CI) for the QTL position calculated by the LOD dropoff method of Lander and Botstein (1989). The limits of this CI were found by determining the map position at either side of the most likely position, where the LOD score had fallen by one unit (this calculation required converting natural logarithms to base 10 logarithms). As pointed out earlier, information from all markers was utilized, leading to smoother profiles (Knott and Haley, 1992;Georges et al, 1995) than the original interval mapping method of Lander and Botstein (1989).
Results for the multiallelic model (2v 2 = 0.204) are presented in table IV. Again, data sets were analyzed by first treating sires as unrelated and then by utilizing relationships among sires with linkage phases of sires and ancestors treated as unknown. Parameters were estimated quite accurately, and likelihood ratios were significant and similar to those for the normal-effects QTL model with 2v! = 0.125.
Results for the biallelic models with half of the homozygote difference equal to one additive genetic SD (2v 2 = 0.5) or to one half of it (2v 2 = 0.125) are presented in tables V and VI, respectively. For both tables, sires were related in the simulation, and data sets were analyzed by ignoring relationships and by accounting for relationships with linkage phases unknown (equation !11! in Grignola et al, 1996). Ignoring relationships among sires again decreased the estimate of heritability but increased the estimate of v 2 . The least accurate estimate of QTL position was obtained under the biallelic model with 2v 2 = 0.125. Overall, and at least when relationships among sires were considered in the analyses, parameter estimates were not noticeably inferior to those obtained under the correponding normal-effects QTL models (table II), and likelihood ratios for the biallelic and normal-effects models were similar.
CONCLUSIONS REML analysis based on a mixed linear model with random QTL allelic effects, having a prior normal distribution, provided quite accurate estimates of QTL location and of the variance components, in particular of the QTL variance contribution. Data were generated under three different genetic models for the QTL, the biallelic, the multiallelic (ten alleles) and the normal-effects (two effects per founder drawn from independent and identical normal distributions) model.
While the normal-effects model is very similar to the analysis model, the biallelic model with gene frequency p and substitution effect a deviates the most from the model of analysis. In the biallelic model, 50% of the individuals are expected to be homozygous for a gene frequency of p = 0.5, and variance among daughters of a homozygous son is equal to while variance among daughters of heterozygous sons is higher by 0.25a 2 . Therefore, Thaller and Hoeschele (1996a,b) and Uimari et al (1996a) fitted two different residual variances of DYD when performing Bayesian analysis of linkage of a biallelic QT L . Despite the discrepancies between the biallelic model and the model of analysis, the REML analysis was quite robust to the number of alleles at the QTL, a result which is in agreement with findings of Xu and Atchley (1995); ie, polymorphism at the QTL did not strongly affect parameter estimates or hypothesis tests. This finding confirms the usefulness of the REML analysis as an alternative method of analysis which, although not nonparametric, requires fewer parametric assumptions (number of QTL alleles and their frequencies) than maximum likelihood and Bayesian analyses based on biallelic QTL models. Furthermore, REML is in general known to be quite robust to deviations from normality.
The REML analysis can be considered as an approximation to the Bayesian analysis of Hoeschele et al (1996) fitting a normal-effects QTL model. The Bayesian analysis has the advantage of also being able to fit a biallelic model, but for the designs considered here, it does not give more accurate parameter estimates than the REML analysis and requires several hours of computing time. Although the REML analysis fitting a single QTL requires only around 8 mi of CPU time, this requirement still prohibits the calculation of genome-wide significance thresholds for this method using data permutation (Churchill and Doerge, 1994), unless a number of less stringent significance levels are chosen to obtain threshold values and these are used to extrapolate to the desired significance threshold (Uimari et al, 1996b). Only the least-squares method allows direct computation of genomewide significance thresholds from a large number of permutations (eg, 10 000 to 100000).
The REML analysis has been implemented in the Fortran program SMREML, which is available from the authors upon request. The program is currently being extended to fit two linked QTLs per chromosome, rather than using the approach of Xu and Atchley (1995) fitting the variances associated with the next-to-flanking markers to account for additional, linked QTLs. Their approach is only approximate as effects associated with marker alleles identified within founders erode over generations. Furthermore it requires many additional parameters when the marker polymorphism is limited, causing the flanking and next-to-flanking markers to differ among families. In the near future, the program will be extended to other designs (eg, full-sibships within half sibships), where the current computation of the inverse of the variance-covariance matrix becomes approximate due to uncertain linkage phases in parents of final offspring, and other ways of computing this inverse exactly (eg, Van Arendonk et al, 1994) will be implemented.