Regression on markers with uncertain allele transmission for QTL mapping in half-sib designs

Recently, regression of phenotype on marker genotypes was described for quantitative trait loci (QTL) mapping in F2 populations and shown to be equivalent to regression interval mapping (RIM). In this study, regression on markers was extended to half-sib designs with uncertain marker allele transmission, and the properties of estimates of QTL parameters were examined analytically. In this method, offspring phenotypes are first regressed on the probability that a given allele was transmitted from the common parent at each flanking marker locus. The resulting regression coefficients can then be interpreted based on an assumed genetic model. In the presence of a single QTL in the marker interval, the expected values of the regression coefficients for the flanking markers were shown to contain all information about the position and effect of the QTL and to be independent of the probability of marker allele transmission. Simulation showed that regression of phenotype on marker allele transmission probabilities is equivalent to RIM under the same assumed genetic model. Regression on marker genotypes is computationally less demanding than QTL interval mapping because it eliminates the need to search for the best QTL position across marker intervals. This can form the basis for more efficient methods of analysis with more complex models, including threshold or logistic models for the analysis of categorical traits.

© Inra/Elsevier, Paris

genetic marker / QTL mapping / half-sib design

1. INTRODUCTION

Identification and mapping of genes affecting quantitative traits, so-called quantitative trait loci or QTL, based on genetic markers has gained much importance in animal and plant genetics in recent years. The main goal of identifying and mapping QTL is to accelerate genetic progress by using information on the identified QTL (e.g. [9]). Earlier studies used a single-marker approach to detect QTL linked to a marker (e.g. [11]).
Lander and Botstein [7] proposed a method to map QTL using two DNA markers that flank a genomic region (so-called interval mapping). Later studies (e.g. [5]) showed that the effect and position of a QTL are confounded in single marker methods and suggested the use of the interval mapping method of Lander and Botstein [7] to overcome this problem. Now, interval mapping of QTL is widely applied in livestock populations based on a variety of statistical methods.
Regression interval mapping (e.g. [3]; henceforth abbreviated to RIM) is based on a genetic model that assumes a QTL is located in the marker interval. In RIM, phenotypic observations for the quantitative trait are regressed on the probability that an offspring inherits a given QTL allele from a common parent in half-sib designs (e.g. [6, 8, 12]) or from a given parental line in backcross and F2 designs (e.g. [3]), conditional on a hypothetical position of the QTL in the marker interval. The analysis is repeated for a range of assumed QTL locations along the marker interval (grid search), and the estimates from the location that gives the minimum residual sum of squares (RSS) are taken as the best estimates. Wright and Mowers [14] proposed multiple regression on genetic markers to estimate QTL effects in F2 designs, henceforth referred to as marker regression mapping (MRM). In contrast to RIM, MRM requires no genetic model during the statistical analysis: phenotypic observations are regressed on variables that code which marker allele has been transmitted to the offspring, rather than on the probability that the offspring inherited a specific QTL allele given a QTL position. The resulting estimates of regression coefficients on marker alleles can then be interpreted based on an assumed genetic model. In F2 designs, Wright and Mowers [14] showed that the sum of the partial regression coefficients on flanking markers provides an unbiased estimate of the effect of an additive QTL in the marker interval when interference is complete and there are no QTL in adjoining marker intervals (isolated QTL). Without complete interference, however, some bias is introduced.
Whittaker et al. [13] showed that the information contained in the regression coefficients on flanking markers in F2 and backcross designs is in fact equivalent to that provided by the conventional regression interval mapping of Haley and Knott [3]; with no interference, estimates of QTL position and effect equivalent to those from RIM can be derived as non-linear functions of the regression coefficients on flanking markers. Whittaker et al. [13] considered two situations for multiple-marker, multiple-QTL models: first, isolated QTL, where a marker interval containing a single QTL is flanked by marker intervals devoid of QTL; and second, non-isolated QTL, where flanking marker intervals also contain QTL. They showed that, with no interference, the expected regression coefficients from a multi-marker multi-QTL model are equivalent to those from a two-marker single-QTL model for markers that flank an isolated QTL. Specifically, the partial regression coefficients for markers that flank an isolated QTL depend only on the effects of the QTL in that interval and not on the effects of other QTL, because those effects are accounted for by simultaneously fitting markers external to the interval. For non-isolated QTL, Whittaker et al. [13] showed that two additive QTL in adjoining intervals cannot be mapped uniquely, but that non-isolated QTL can be mapped if at least one QTL has non-additive effects. The main advantage of MRM for QTL mapping is that estimates are obtained from a single linear regression analysis on markers, with no need for a grid search as in RIM. Both Wright and Mowers [14] and Whittaker et al. [13] assumed that transmission of marker alleles from parent to offspring was known with certainty, which is often not the case in half-sib designs. Also, in F2 or backcross designs between outbred lines, transmission of marker alleles from the parental lines may not be known with certainty [4].
In such situations, only a probability statement can be made about marker allele transmission from parent to progeny. Progeny with incomplete marker information must be included in the statistical analysis to increase statistical power and to reduce the bias and standard errors of estimates [12]. The objective of this paper, therefore, was to extend the MRM method of Whittaker et al. [13] to QTL mapping in a half-sib family, with emphasis on uncertain marker allele transmission. Simulation was used to validate the method and to compare MRM with QTL mapping based on RIM.

The genetic and experimental model
Consider a sire that is heterozygous at two marker loci, 1 and 2, that flank a biallelic QTL. With sire genotype $M_{11}Q_1M_{21}/M_{12}Q_2M_{22}$, the QTL is located at recombination rates $r_1$ and $r_2$ from marker loci 1 and 2, respectively. Rates $r_1$ and $r_2$ are unknown. The recombination rate between marker loci 1 and 2, $\theta$, is assumed known. The Haldane mapping function [2] is assumed, so that there is no interference and $\theta = r_1 + r_2 - 2r_1r_2$.
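A small numerical sketch of these relationships may help. It assumes map distances in Morgans and uses the Haldane map function $r = \tfrac{1}{2}(1 - e^{-2d})$; the function names are illustrative only. Under no interference, $1 - 2\theta = (1 - 2r_1)(1 - 2r_2)$, which is the same statement as additivity of map distances.

```python
import math


def haldane_r(d_morgans: float) -> float:
    """Recombination rate for a map distance d (in Morgans) under Haldane's map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def flanking_theta(r1: float, r2: float) -> float:
    """Marker-marker recombination rate when the QTL lies between the markers (no interference)."""
    return r1 + r2 - 2.0 * r1 * r2


def r2_given_r1(r1: float, theta: float) -> float:
    """Solve theta = r1 + r2 - 2*r1*r2 for r2 (requires r1 != 0.5)."""
    return (theta - r1) / (1.0 - 2.0 * r1)


if __name__ == "__main__":
    # QTL positions used in the comparison with RIM: 5, 10 and 15 cM from the left marker.
    for cm in (5, 10, 15):
        print(cm, round(haldane_r(cm / 100.0), 5))
    # Validation example: r1 = 0.3 and theta = 0.4 give r2 = 0.25.
    print(round(r2_given_r1(0.3, 0.4), 5))
```

The printed rates agree with the values quoted in the Validation section (0.04758, 0.09063 and 0.12959), and `r2_given_r1(0.3, 0.4)` gives the $r_2 = 0.25$ used there.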
The sire is randomly mated to n dams, resulting in n offspring. The sire transmits one of four marker haplotypes $h_j$ to its offspring with frequency $f(h_j)$, where $f(h_j) = (1-\theta)/2$ for the non-recombinant haplotypes $M_{11}M_{21}$ and $M_{12}M_{22}$, and $f(h_j) = \theta/2$ for the recombinant haplotypes $M_{11}M_{22}$ and $M_{12}M_{21}$. Which marker haplotype the sire transmitted cannot always be determined with certainty; this depends on the marker haplotype the offspring received from its dam. The available marker information can, however, be used to compute probabilities of marker allele transmission from the sire to its offspring. The probability that a given paternal marker allele is present in the ith offspring, conditional on the marker information available for offspring i ($S_i$), is denoted $p(M_{1k} \mid S_i)$ for marker locus 1 and $p(M_{2l} \mid S_i)$ for marker locus 2. Here, subscripts k (k = 1, 2) and l (l = 1, 2) refer to the paternal marker alleles at marker loci 1 and 2, respectively. Besides the known recombination rate between the markers, $\theta$, the sources of information in $S_i$ could include marker genotypes for the flanking markers, and possibly other markers, on the offspring ($g_i$), its sire ($M_s$), its dam ($M_d$) and other relatives.
2.2. Expected phenotypic value of marker haplotypes

2.2.1. Known marker haplotype transmission

When marker allele transmission from the sire to an offspring can be determined unequivocally, the expected value of the offspring's phenotype, given that it received the jth sire marker haplotype, can be derived under an assumed genetic model of one QTL in the marker bracket, based on the probability that the paternal marker haplotype carries the $Q_1$ or $Q_2$ allele. The expected value of offspring phenotype given that the sire transmitted marker haplotype $h_j$ is given by equation (1), in which $E(y \mid h_j)$ is the expected value of offspring phenotype given paternal marker haplotype $h_j$, $w_j$ is the probability that the offspring received the $Q_1$ allele from the sire conditional on inheritance of paternal marker haplotype $h_j$, and $a$ is the allele substitution effect at the QTL [1]. The conditional probability $w_j$ can be derived as $w_j = f(Q_1, h_j)/f(h_j)$, where $f(Q_1, h_j)$ is the joint probability of paternal transmission of the $Q_1$ allele and marker haplotype $h_j$. Equations for $f(Q_1, h_j)$, $f(h_j)$ and $w_j$ are given in table I.
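The entries of table I follow from the sire genotype $M_{11}Q_1M_{21}/M_{12}Q_2M_{22}$ under no interference, and a short sketch can compute them. The haplotype labels and their order ($h_1 = M_{11}M_{21}$, $h_2 = M_{11}M_{22}$, $h_3 = M_{12}M_{21}$, $h_4 = M_{12}M_{22}$) are an assumption, chosen because they reproduce the $w_j$ values reported in the Validation section; the function name `table_one` is hypothetical.

```python
def table_one(r1: float, r2: float) -> dict:
    """Probabilities for sire M11-Q1-M21 / M12-Q2-M22, assuming no interference.

    Returns {haplotype: (f(Q1, h), f(h), w)} where w = f(Q1, h) / f(h) = P(Q1 | h).
    """
    theta = r1 + r2 - 2.0 * r1 * r2
    # Joint probability that the sire transmits Q1 together with each marker haplotype.
    joint_q1 = {
        "M11M21": 0.5 * (1 - r1) * (1 - r2),  # no crossover on either side of the QTL
        "M11M22": 0.5 * (1 - r1) * r2,        # crossover between the QTL and marker 2
        "M12M21": 0.5 * r1 * (1 - r2),        # crossover between marker 1 and the QTL
        "M12M22": 0.5 * r1 * r2,              # crossovers on both sides of the QTL
    }
    # Marginal haplotype frequencies: non-recombinants (1 - theta)/2, recombinants theta/2.
    freq = {
        "M11M21": 0.5 * (1 - theta),
        "M11M22": 0.5 * theta,
        "M12M21": 0.5 * theta,
        "M12M22": 0.5 * (1 - theta),
    }
    return {h: (joint_q1[h], freq[h], joint_q1[h] / freq[h]) for h in joint_q1}


if __name__ == "__main__":
    for h, (f_q1_h, f_h, w) in table_one(0.3, 0.25).items():
        print(h, round(w, 5))
```

With $r_1 = 0.3$ and $r_2 = 0.25$ this yields $w_1, \ldots, w_4$ = 0.875, 0.4375, 0.5625 and 0.125. Under no interference, $w_1 + w_4 = 1$ and $w_2 + w_3 = 1$ for any $r_1, r_2$, a property that matters when the expected regression coefficients are derived.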

Unknown marker haplotype transmission
If paternal marker haplotype transmission is not known with certainty, transmission probabilities can be computed for each paternal marker haplotype from the marker information available for offspring i ($S_i$). These probabilities, denoted $p(h_j \mid S_i)$, can then be used to derive the expected value of the ith offspring's phenotype, as shown below.
With no interference, $p(h_j \mid S_i)$ is the product of the conditional probabilities of paternal allele transmission at each marker locus:

$$p(h_j \mid S_i) = p(M_{1k} \mid S_i)\, p(M_{2l} \mid S_i) \qquad (2)$$

where k and l are determined by $h_j$.
The expected value of the phenotype of offspring i is then obtained as a weighted sum of the expected values for the four possible haplotypes, $E(y \mid h_j)$:

$$E(y_i \mid S_i) = \sum_{j=1}^{4} p(h_j \mid S_i)\, E(y \mid h_j)$$

Based on the rules of probability when conditioning on the same source of information $S_i$, it can be shown that […]. Note that the probabilities $p(M_{1k} \mid S_i)$ and $p(M_{2l} \mid S_i)$ each depend on information about the other locus ($M_{1k}$ and $M_{2l}$), which is included in $S_i$. Also note that when $p(M_{1k} \mid S_i)$ and $p(M_{2l} \mid S_i)$ are equal to 0 or 1, i.e. when sire marker allele transmission is known, $E(y_i \mid S_i) = E(y \mid h_j)$.

Expected values from regression on flanking markers
Using the expected values for phenotypes of offspring with known and unknown paternal marker haplotype transmission, as derived above, the expected values of coefficients of regression of phenotype on marker allele probabilities can be derived as shown below.
Let $p(M_{11} \mid S_i) = p_{1i}$ and $p(M_{21} \mid S_i) = p_{2i}$. The model for regressing phenotype on marker allele transmission probabilities is

$$y_i = \beta_0 + \beta_1 p_{1i} + \beta_2 p_{2i} + e_i \qquad (4)$$

where $y_i$ is the phenotype of offspring i, $\beta_0$ is the overall mean, $\beta_1$ is the regression coefficient on marker 1, $\beta_2$ is the regression coefficient on marker 2, and $e_i$ is the error term for the ith offspring. In matrix notation, the MRM model can be written as $\mathbf{y} = \mathbf{P}\boldsymbol\beta + \mathbf{e}$, where $\mathbf{y}$ is the $n \times 1$ vector of observations on the n offspring, $\mathbf{P}$ is an $n \times 3$ matrix whose ith row is $(1, p_{1i}, p_{2i})$, and $\boldsymbol\beta = (\beta_0, \beta_1, \beta_2)'$ is $3 \times 1$. When phenotypic observations are adjusted for the mean genetic values of the parents and for all other systematic environmental effects, the expectation of an observation $y_i$ with marker information $S_i$ equals $E(y_i \mid S_i)$, which can be calculated using equation (3). Based on equation (3), the expectation of the vector of adjusted observations can be written as a product of two matrices, $E(\mathbf{y}) = \mathbf{H}\mathbf{w}$, where $\mathbf{H}$ is the $n \times 4$ matrix of haplotype transmission probabilities and $\mathbf{w}$ is the $4 \times 1$ vector of haplotype coefficients $w_j$. Based on equation (2), the haplotype transmission probabilities $p(h_j \mid S_i)$ can be written in terms of $p_{1i}$ and $p_{2i}$. The expected values of the regression coefficients then follow from

$$E(\hat{\boldsymbol\beta}) = (\mathbf{P}'\mathbf{P})^{-1}\mathbf{P}'E(\mathbf{y}) = (\mathbf{P}'\mathbf{P})^{-1}\mathbf{P}'\mathbf{H}\mathbf{w} \qquad (7)$$

Derivations for $E(\hat{\boldsymbol\beta})$ in equation (7) are given in Appendix I.
After simplification, the elements of $E(\hat{\boldsymbol\beta})$ can be shown to be independent of the paternal marker allele transmission probabilities, as given in equation (8). Substituting the formulas from table I for $w_j$ in equation (8) gives the expected regression coefficients in equation (9). Equation (9) proves that $E(\hat{\boldsymbol\beta})$ depends only on the coefficients $w_j$ and is independent of the marker allele transmission probabilities $p(M_{11} \mid S_i)$ and $p(M_{21} \mid S_i)$. In other words, $E(\hat{\boldsymbol\beta})$ depends only on contrasts between sire marker alleles $M_{11}$ and $M_{12}$ at locus $M_1$ and between alleles $M_{21}$ and $M_{22}$ at locus $M_2$. The expectations of the marker regression coefficients are identical to those found by Whittaker et al. [13] for F2 designs, but are shown here to apply also to half-sib family designs and to uncertain marker haplotype transmission. An alternative proof is given in Appendix II.
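One way to see why the transmission probabilities drop out is sketched below. It assumes equation (2) in product form, the haplotype order $h_1 = M_{11}M_{21}$, $h_2 = M_{11}M_{22}$, $h_3 = M_{12}M_{21}$, $h_4 = M_{12}M_{22}$, and $E(y \mid h_j) = w_j a$ (phenotypes measured from offspring receiving $Q_2$; a different origin would change only the intercept). It is an illustration under these assumptions, not a reproduction of Appendix I or II. Because $w_1 + w_4 = 1$ and $w_2 + w_3 = 1$ under no interference, the $p_{1i}p_{2i}$ term vanishes, so $E(\mathbf{y}) = \mathbf{P}\boldsymbol\gamma$ exactly and the result holds for any $\mathbf{P}$ of full column rank.

```latex
\begin{aligned}
E(y_i \mid S_i) &= a\big[w_1 p_{1i}p_{2i} + w_2 p_{1i}(1-p_{2i}) + w_3(1-p_{1i})p_{2i} + w_4(1-p_{1i})(1-p_{2i})\big]\\
&= a\big[w_4 + (w_2-w_4)\,p_{1i} + (w_3-w_4)\,p_{2i} + (w_1-w_2-w_3+w_4)\,p_{1i}p_{2i}\big],\\[4pt]
w_1 + w_4 &= \frac{(1-r_1)(1-r_2) + r_1 r_2}{1-\theta} = 1, \qquad
w_2 + w_3 = \frac{(1-r_1)r_2 + r_1(1-r_2)}{\theta} = 1,\\[4pt]
E(\hat{\boldsymbol\beta}) &= (\mathbf{P}'\mathbf{P})^{-1}\mathbf{P}'\mathbf{P}\boldsymbol\gamma = \boldsymbol\gamma,
\qquad \boldsymbol\gamma = a\,(w_4,\; w_2-w_4,\; w_3-w_4)',\\[4pt]
E(\hat\beta_1) &= a(w_2-w_4) = \frac{a\,(1-2r_1)\,r_2(1-r_2)}{\theta(1-\theta)}, \qquad
E(\hat\beta_2) = a(w_3-w_4) = \frac{a\,r_1(1-r_1)(1-2r_2)}{\theta(1-\theta)}.
\end{aligned}
```

With $r_1 = 0.3$, $r_2 = 0.25$ ($\theta = 0.4$) and $a = 1$, the last line gives 0.3125 and 0.4375.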

QTL location and its effect
The estimates of the partial regression coefficients $\hat\beta_1$ and $\hat\beta_2$ (equation (9)) contain all the information needed to determine the position of a QTL that is flanked by markers $M_1$ and $M_2$. The absolute value of $E(\hat\beta_1)$ will be greater than that of $E(\hat\beta_2)$ if the QTL is located closer to marker $M_1$, and smaller if the QTL is located closer to marker $M_2$. If the QTL is located at the centre of the interval, $E(\hat\beta_1)$ and $E(\hat\beta_2)$ are expected to be equal. The relative size of $\hat\beta_1$ and $\hat\beta_2$ therefore determines the QTL position $r_1$. As shown by Whittaker et al. [13], estimates of QTL location and QTL effect can be obtained by writing $E(\hat\beta_1)$ and $E(\hat\beta_2)$ as a ratio and solving for $r_1$, knowing that $r_1 \in (0, 0.5)$.
Following Whittaker et al. [13], the estimate of QTL location, $\hat r_1$, is given by equation (10). Once the QTL location has been estimated, $\hat\beta_1$ and $\hat\beta_2$ can be equated to their expectations, replacing $r_1$ with $\hat r_1$, and solved for $a$. Following Whittaker et al. [13], $\hat a$ is obtained from equation (11). Note that a solution to equation (10) only exists if $\hat\beta_1$ and $\hat\beta_2$ have the same sign; if they have opposite signs, the solution for $\hat r_1$ is undefined under the model of a single QTL within the marker interval. Likewise, if $\hat\beta_1$ and $\hat\beta_2$ have the same sign, an estimate of $a$ can be obtained from equation (11) as $\sqrt{\hat a^2}$, whereas if they have opposite signs the solution for $\hat a$ is undefined. When a solution for $\hat r_1$ exists, the sign of $\hat a$ follows from the signs of $\hat\beta_1$ and $\hat\beta_2$: $\hat a$ is negative if both are negative and positive if both are positive.
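Equations (10) and (11) are referred to but not printed in this section. One closed form consistent with $E(\hat\beta_1) = a(1-2r_1)r_2(1-r_2)/[\theta(1-\theta)]$ and $E(\hat\beta_2) = a\,r_1(1-r_1)(1-2r_2)/[\theta(1-\theta)]$ can be sketched as follows, assuming no interference and writing $\delta = 1 - 2\theta = (1-2r_1)(1-2r_2)$. In expectation $\hat\beta_1 + \delta\hat\beta_2 = a(1-2r_1)$ and $\hat\beta_2 + \delta\hat\beta_1 = a(1-2r_2)$, which gives $\hat r_1 = \tfrac12\big[1 - \sqrt{\delta(\hat\beta_1 + \delta\hat\beta_2)/(\hat\beta_2 + \delta\hat\beta_1)}\big]$ and $\hat a^2 = (\hat\beta_1 + \delta\hat\beta_2)(\hat\beta_2 + \delta\hat\beta_1)/\delta$. These are derived here, not copied from Whittaker et al. [13], and should be read as one possible form of equations (10) and (11). The function name `mrm_estimates` is hypothetical, and its opposite-sign branch implements the marker-placement rule described for the comparison with RIM.

```python
import math


def mrm_estimates(b1: float, b2: float, theta: float):
    """Sketch: QTL location r1 and effect a from flanking-marker coefficients b1, b2.

    Assumes no interference (theta = r1 + r2 - 2*r1*r2) and 0 < theta < 0.5.
    Returns (r1_hat, a_hat, inside); inside is False when b1 and b2 do not share
    a sign and the QTL is instead placed at the marker with the larger |b|.
    """
    delta = 1.0 - 2.0 * theta      # equals (1 - 2*r1) * (1 - 2*r2)
    u = b1 + delta * b2            # expectation: a * (1 - 2*r1)
    v = b2 + delta * b1            # expectation: a * (1 - 2*r2)
    a_sq = u * v / delta           # estimate of a**2
    if b1 * b2 > 0:
        # Same sign: a single QTL inside the interval is supported.
        r1_hat = 0.5 * (1.0 - math.sqrt(delta * u / v))
        a_hat = math.copysign(math.sqrt(a_sq), b1)  # sign of a = common sign of b1, b2
        return r1_hat, a_hat, True
    # Opposite signs (or a zero coefficient): put the QTL on the marker with the
    # larger absolute coefficient and report sqrt(|a^2|) as the effect size.
    r1_hat = 0.0 if abs(b1) >= abs(b2) else theta
    return r1_hat, math.sqrt(abs(a_sq)), False


if __name__ == "__main__":
    print(mrm_estimates(0.3125, 0.4375, 0.4))
```

Applied to the coefficients from the Validation section ($\hat\beta_1 = 0.3125$, $\hat\beta_2 = 0.4375$, $\theta = 0.4$), the sketch recovers $\hat r_1 \approx 0.3$ and $\hat a \approx 1$ up to floating-point rounding.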

Validation
In the previous section, it was proven analytically that the expectations of the partial regression coefficients are invariant to transmission probabilities. In this section, the analytical proof is validated by simulation. A single sire family with 100 half-sib progeny was simulated. The recombination rate between the QTL and the left marker, $r_1$, was 0.3 and that between the flanking markers, $\theta$, was 0.4. Expectations of offspring phenotypes given the paternal marker haplotype, $E(y \mid h_j)$, were then calculated using equation (1). The $w_j$ values needed to compute $E(y \mid h_j)$ were obtained by substituting $r_1 = 0.3$, $r_2 = (\theta - r_1)/(1 - 2r_1) = 0.25$ and $\theta = 0.4$ into the formulas for $w_j$ in table I: $w_1 = 0.87500$, $w_2 = 0.43750$, $w_3 = 0.56250$ and $w_4 = 0.12500$. To ensure generality, each offspring was randomly assigned a probability that it received allele $M_{11}$ ($p(M_{11})$) and allele $M_{21}$ ($p(M_{21})$) from the sire, based on random draws from a uniform (0, 1) distribution. Based on these probabilities, expectations of offspring phenotypes $E(y_i)$ were simulated using equation (3). These values were then regressed on the sire marker allele probabilities using model (4). The resulting regression coefficients (from a single replicate) were $\hat\beta_1 = 0.3125$ and $\hat\beta_2 = 0.4375$, identical to the results obtained by substituting $r_1 = 0.3$, $r_2 = 0.25$ and $\theta = 0.4$ into the formulas for $E(\hat\beta_1)$ and $E(\hat\beta_2)$ in equation (9).

To compare MRM with RIM for QTL mapping, a single sire family with 500 offspring was simulated. The genome of the sire carried a pair of homologous chromosomes with two biallelic markers spaced 20 cM apart. A QTL was simulated at 5, 10 or 15 cM from the left marker, which corresponds to recombination rates with the left marker of 0.04758, 0.09063 and 0.12959, respectively. The sire was heterozygous at both marker loci and at the QTL, with genotype $M_{11}Q_1M_{21}/M_{12}Q_2M_{22}$.
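The single-replicate invariance check described at the start of this section can be sketched in plain Python. The sketch assumes equation (2) in product form, the haplotype order $M_{11}M_{21}, M_{11}M_{22}, M_{12}M_{21}, M_{12}M_{22}$ for $w_1, \ldots, w_4$, and $E(y \mid h_j) = w_j a$ with $a = 1$; the seed, the function names and the use of centred normal equations for the two-predictor least-squares fit are illustrative choices.

```python
import random


def expected_phenotype(p1: float, p2: float, w: list, a: float = 1.0) -> float:
    """Weighted sum of E(y | h_j) = w_j * a over the four sire haplotypes (equation (3))."""
    # Haplotype probabilities in product form (equation (2)).
    probs = [p1 * p2, p1 * (1 - p2), (1 - p1) * p2, (1 - p1) * (1 - p2)]
    return sum(pj * wj * a for pj, wj in zip(probs, w))


def ols_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via centred normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (t - my) for u, t in zip(x1, y))
    s2y = sum((v - m2) * (t - my) for v, t in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


if __name__ == "__main__":
    random.seed(2024)
    w = [0.875, 0.4375, 0.5625, 0.125]          # table I with r1 = 0.3, r2 = 0.25
    p1 = [random.random() for _ in range(100)]  # p(M11 | S_i) drawn from uniform(0, 1)
    p2 = [random.random() for _ in range(100)]  # p(M21 | S_i) drawn from uniform(0, 1)
    y = [expected_phenotype(u, v, w) for u, v in zip(p1, p2)]
    b0, b1, b2 = ols_two_predictors(p1, p2, y)
    print(round(b1, 6), round(b2, 6))
```

Because $E(y_i \mid S_i)$ is exactly linear in $p_{1i}$ and $p_{2i}$ under these assumptions, the fit returns 0.3125 and 0.4375 (and an intercept of $w_4 = 0.125$) for any seed, apart from floating-point rounding.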
Marker-QTL (MQTL) haplotypes produced by this sire were sampled according to their expected frequencies of transmission. Maternal marker haplotypes were sampled based on population frequencies for $M_{11}$ and $M_{21}$. The marker genotype of each offspring was generated by combining the paternal MQTL haplotype with the maternal marker haplotype.
Phenotypic values of offspring were generated using the model

$$y_i = u + q_i + e_i$$

where $y_i$ is the phenotypic observation on the ith offspring, $u$ is the sire's polygenic effect, $q_i$ is the effect of the paternal QTL allele ($Q_1$ or $Q_2$) inherited by offspring i, and $e_i$ is a random residual. Residuals were sampled from $N[0, \sigma_p^2 - (0.25\sigma_a^2 + 0.5\sigma_{QTL}^2)]$, where $\sigma_p^2$ is the phenotypic variance, $\sigma_a^2$ is the polygenic variance and $\sigma_{QTL}^2$ is the QTL variance in the dam population, which was based on equal frequencies of the two QTL alleles among dams. A total heritability of 0.25, including the QTL effect, was used. The QTL substitution effect, $a$, was $0.4\sigma_p$. A total of 1 000 data sets was simulated for each QTL position, and each data set was analysed by MRM and RIM. For MRM, estimates of QTL position and effect were obtained from equations (10) and (11), respectively. Whittaker et al. [13] suggested that regression coefficients with opposite signs could result when i) the data do not support the presence of a single QTL in the marker interval, ii) the data support the presence of two QTL with effects of opposite sign in the interval, or iii) the data suggest that a QTL is located outside the marker bracket. With regard to possibility iii), if the QTL is estimated to lie beyond marker 1, $\hat\beta_1$ will have a greater absolute value than $\hat\beta_2$; similarly, if the QTL is estimated to lie beyond marker 2, $\hat\beta_2$ is expected to have a greater absolute value than $\hat\beta_1$. When the data suggest that a QTL is outside the marker bracket, the MRM estimate of $r_1$ will be negative, greater than $\theta$, or undefined. In this situation, RIM would show the minimum RSS at one of the marker loci, because the RIM search is limited to the marker bracket.
Based on the above, and to allow comparison of results from MRM with those from RIM, when the MRM regression coefficients had opposite signs the QTL was positioned at the marker with the larger absolute regression coefficient: at $M_1$ if $|\hat\beta_1| \ge |\hat\beta_2|$ and at $M_2$ if $|\hat\beta_1| < |\hat\beta_2|$. The estimate of the QTL effect was then obtained as $\sqrt{|\hat a^2|}$ from equation (11). This approach was applied only to replicates in which the regression coefficients had opposite signs. Forcing the QTL to lie at one of the markers is analogous to RIM, for which the QTL is located at a marker when the estimate of location falls outside the marker bracket.

2.6.1.3. Test of significance for presence of a QTL
For MRM, a likelihood ratio (LR) test statistic was obtained, as for RIM, by computing

$$LR = n \ln\left(\frac{RSS_{red}}{RSS_{full}}\right)$$

where n is the total number of offspring in the half-sib family, $RSS_{red}$ is the residual sum of squares when fitting only an overall mean and $RSS_{full}$ is the residual sum of squares when the full model (equation (4)) is fitted.

For […] and $\hat\beta_2$ were as expected for a QTL located at the centre of the marker bracket (10 cM). For the other QTL locations (5 and 15 cM), the marker closer to the QTL had the larger regression coefficient.
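The MRM significance test for one family can be sketched as follows, assuming the usual normal-theory form of the statistic, $LR = n \ln(RSS_{red}/RSS_{full})$, with the full model being equation (4) fitted by ordinary least squares. The function name `mrm_lr_test` is hypothetical, and the data used to exercise it would be synthetic.

```python
import math


def mrm_lr_test(y, p1, p2):
    """LR = n * ln(RSS_red / RSS_full): y_i = b0 + b1*p1_i + b2*p2_i + e_i versus mean only."""
    n = len(y)
    m1, m2, my = sum(p1) / n, sum(p2) / n, sum(y) / n
    # Reduced model: overall mean only.
    rss_red = sum((yi - my) ** 2 for yi in y)
    # Full model: centred normal equations for the two marker allele probabilities.
    s11 = sum((u - m1) ** 2 for u in p1)
    s22 = sum((v - m2) ** 2 for v in p2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(p1, p2))
    s1y = sum((u - m1) * (yi - my) for u, yi in zip(p1, y))
    s2y = sum((v - m2) * (yi - my) for v, yi in zip(p2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    rss_full = sum((yi - b0 - b1 * u - b2 * v) ** 2 for yi, u, v in zip(y, p1, p2))
    return n * math.log(rss_red / rss_full), (b0, b1, b2)
```

The statistic is then compared with a significance threshold; as reported in this paper, empirical thresholds for MRM were close to chi-square values with two degrees of freedom.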
Empirical means and standard deviations of estimates of QTL position and effect from MRM and RIM are given in table III. Estimates of QTL location from MRM and RIM were not significantly different and had a correlation close to unity (0.999) in all situations. Both RIM and MRM gave unbiased estimates of $r_1$ when the QTL was located at the centre of the marker bracket, but estimates were significantly biased towards the centre of the bracket when the true QTL location was off centre (5 and 15 cM). This bias is expected: because estimates are forced to lie within the interval, there is more room for error on the side of the true location that faces the centre of the interval (e.g. to the right of a QTL at 5 cM), which pulls the mean estimate towards the centre. For MRM, 38, 33 and 38 % of replicates had marker regression coefficients with opposite signs when the QTL was located at 5, 10 and 15 cM, respectively. For RIM, the estimate of QTL position was at a marker for 40, 35 and 40 % of replicates for QTL positions of 5, 10 and 15 cM, respectively. This indicates that MRM and RIM locate the QTL within the marker bracket with similar frequency. Estimates of QTL effects did not differ significantly between RIM and MRM and had correlations of 0.969, 0.980 and 0.970 for QTL located at 5, 10 and 15 cM, respectively. Estimates of QTL effects were unbiased for both RIM and MRM.

Significance threshold values and power
Values of the LR test statistic were very similar for RIM and MRM under the alternative hypothesis, with correlations of 0.993, 0.997 and 0.996 for QTL located at 5, 10 and 15 cM, respectively. Two sets of empirical significance threshold values were determined for RIM and MRM for each simulated QTL location. The first (unrestricted) set was derived from 10 000 replicates under the null hypothesis, irrespective of whether MRM gave a solution for QTL position. The second (restricted) set was determined only from replicates for which estimates of QTL position and effect existed under MRM; the purpose of this restriction was to limit analyses to replicates in which the estimated QTL position was inside the marker interval. To obtain the restricted thresholds, 50 000 replicates were run, of which only 9 765, 9 750 and 9 803 had usable solutions for QTL located at 5, 10 and 15 cM, respectively. This is expected, because data sets under the null hypothesis are simulated with no QTL in the marker interval. Significance threshold values for RIM were obtained from the same replicates as used for MRM. The resulting threshold values are given in table IV. […] 1.000 when based on the restricted data sets. The empirical power to detect the QTL was also calculated from the two sets of significance threshold values and is given in table V. The power of RIM and MRM differed significantly with both the unrestricted and the restricted threshold values, except with the restricted thresholds when the QTL was at the centre of the marker bracket (10 cM).
When power was computed only from replicates for which estimates of QTL position existed with MRM (620, 670 and 620 of 1 000 replicates when the QTL was located at 5, 10 and 15 cM, respectively), the power of RIM and MRM was not significantly different for any QTL location.

DISCUSSION
In this study, the method of multiple regression of phenotype on marker genotypes for QTL mapping in F2 populations [13] was extended to a half-sib family design.
In contrast to Wright and Mowers [14] and Whittaker et al. [13], offspring with both complete and incomplete information on paternal marker allele transmission were included in the analysis. Including offspring with incomplete marker information in QTL mapping results in higher statistical power and lower standard errors and bias of estimates of QTL location and QTL effect [12].
It was shown that the expected regression coefficients, and hence the resulting estimates of QTL parameters, did not depend on transmission probabilities. The regression coefficients depended only on contrasts between marker haplotype class means under known marker haplotype transmission. Although this study focused on half-sib designs, uncertain marker allele transmission can also arise in F2 and backcross designs that involve outbred lines and in QTL mapping with markers of limited polymorphism.
Although MRM and RIM are essentially equivalent, the two methods resulted in different test statistics under the null and alternative hypotheses and, therefore, had different power to detect a QTL (table V). These differences were caused by the fact that MRM does not restrict the test for the QTL to the marker interval; rather, it tests for a QTL anywhere on the chromosome. Furthermore, MRM makes no assumptions about the genetic model during the analysis, and any effects present in the data, even those that do not conform to a genetic model of one QTL within the marker bracket, are picked up by the regression coefficients. RIM, on the other hand, assumes a genetic model (usually of one QTL within the marker bracket) and, in the present study, searched for the QTL only within the marker bracket; if the data indicated a QTL outside the marker bracket, the QTL was mapped to one of the markers. To compare RIM and MRM on an equivalent basis, MRM estimates of location outside the marker interval were forced to the nearest marker (table III). This illustrated that MRM and RIM are equivalent when the search is restricted to the region between the two flanking markers: RIM and MRM had similar LR test statistics under the null and alternative hypotheses (correlation of 1.000, table V) and identical power (not shown). An alternative way of comparing these methods would be to also search for the QTL outside the interval by fitting markers outside the marker bracket under study. In this case, MRM and RIM are expected to give identical results.
One advantage of RIM over MRM is that the LR test statistic (or RSS) is continuous across marker intervals and can be plotted along the chromosome; this likelihood profile can therefore be used to construct a 'confidence region' for QTL location.
Empirical thresholds for MRM were similar to standard Chi-square values with two degrees of freedom and were not affected by exclusion of replicates for which a solution for QTL position did not exist (table IV). Empirical threshold values for RIM were intermediate between Chi-square values with one and two degrees of freedom when computed from all replicates (unrestricted), but were close to Chi-square values with two degrees of freedom when computed from replicates for which a solution for QTL position existed with MRM (restricted). This raises the question of the number of degrees of freedom available for interval versus marker regression mapping methods in relation to the number of parameters that are estimated. For MRM, two parameters are estimated (the two regression coefficients); accordingly, significance thresholds were similar to Chi-square table values with two degrees of freedom. For RIM, two parameters are estimated (QTL position and QTL effect) if the QTL is mapped between the two markers, but only one parameter is estimated if the data suggest the QTL is outside the marker bracket; in the latter case, the QTL is mapped to one of the markers. To test for such a mixture distribution of the LR test statistic for RIM, 10 000 replicates were generated under the null hypothesis, and threshold values were determined separately for replicates in which the QTL was mapped outside versus inside the marker bracket (8 216 versus 1 784 replicates, respectively). When the QTL was mapped outside the marker bracket, the 1 and 5 % significance threshold values (based on 8 216 replicates) were 7.05 and 4.34, respectively, slightly higher than Chi-square table values with one degree of freedom (6.63 and 3.84).
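As a rough illustration of the mixture argument, suppose the RIM statistic under the null hypothesis behaved like $\chi^2_1$ in the fraction of replicates where the QTL is placed at a marker and like $\chi^2_2$ otherwise. The sketch below solves for the critical value of such a two-component mixture by bisection, using the closed-form upper tails $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$ and $P(\chi^2_2 > x) = e^{-x/2}$. This is an idealization, since the conditional thresholds reported here were close to, but not exactly equal to, the Chi-square table values; the function names are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 or 2 df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def mixture_threshold(alpha: float, w1: float, lo: float = 0.0, hi: float = 60.0) -> float:
    """Critical value c with w1*P(chi2_1 > c) + (1 - w1)*P(chi2_2 > c) = alpha.

    w1 is the assumed share of null replicates in which only one parameter is
    effectively fitted (QTL placed at a marker). The mixture tail probability
    decreases monotonically in c, so bisection on [lo, hi] converges.
    """
    def tail(c: float) -> float:
        return w1 * chi2_sf(c, 1) + (1.0 - w1) * chi2_sf(c, 2)

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail(mid) > alpha:
            lo = mid  # tail probability still above alpha: critical value is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    w_outside = 8216 / 10000  # share of null replicates mapped to a marker, from the text
    for alpha in (0.01, 0.05):
        print(alpha,
              round(mixture_threshold(alpha, 1.0), 2),        # pure chi-square, 1 df
              round(mixture_threshold(alpha, w_outside), 2),  # assumed mixture
              round(mixture_threshold(alpha, 0.0), 2))        # pure chi-square, 2 df
```

With `w1 = 1.0` and `w1 = 0.0` the function returns the familiar table values (6.63 and 3.84 for one degree of freedom, 9.21 and 5.99 for two); with the observed share of 0.8216, the mixture critical value falls between them, in line with the intermediate unrestricted RIM thresholds.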
When the QTL was mapped inside the marker bracket, 1 and 5 % significance threshold values (based on 1 784 replicates) were 8.58 and 5.93, respectively, which were slightly less than Chi-square table values with two degrees of freedom (9.21 and 5.99). Therefore, the differences in threshold values between RIM and MRM were due to differences in treatment of QTL fitted outside the marker bracket. As mentioned earlier, RIM and MRM may yield similar results when fitting more markers and searching for a QTL among marker brackets on the chromosome.
When regression is performed on multiple markers, MRM amounts to standard multiple regression, as described by Wright and Mowers [14]. With no interference, only marker brackets that contain a QTL are expected to give non-zero regression coefficients, and those devoid of QTL are expected to give zero regression coefficients. For multiple QTL located on the same chromosome, results from a two-marker single-QTL model are equivalent to those from a multi-marker multi-QTL model when the QTL are isolated, as shown by Whittaker et al. [13]. That is, if a second QTL exists on the same chromosome, its effect on the expected regression coefficients from the two-marker single-QTL model can be removed by fitting a conditional regression on a marker positioned outside the interval, between the interval and the second QTL.
The same procedure also applies to RIM [13]. When multiple QTL are located within the same marker interval, no unique and independent estimates of QTL parameters can be obtained with RIM or MRM [13], and possibly not with other statistical methods. In such cases, regression coefficients would simply relate to some weighted average of QTL effects and positions for both RIM and MRM. The MRM studied here was for a single sire family. There are difficulties associated with extending this method to QTL mapping in a multi-family half-sib design, as studied by, for example, Knott et al. [6] and Liu and Dekkers [8]. In a multi-family analysis with RIM, a nested regression is used with one unique estimate of QTL location but different QTL substitution effects for each sire. Although the MRM method can be extended to multiple families by nesting regression coefficients within family, each family will receive a separate estimate of QTL location and effect. This problem may be overcome by fitting markers as random effects and by expressing estimated variances at markers in terms of a genetic model of one QTL with multiple alleles.
The MRM method described in this study shows that information to map QTL is derived entirely from contrasts between marker-associated effects at flanking markers, regardless of uncertainty of marker allele transmission.
However, uncertainty of marker transmission increases the standard errors of the regression coefficients. This study has also provided further insight into the properties of the test statistic for RIM. Specifically, the results illustrate that the difference between empirical and table threshold values is not due to multiple testing within the marker interval but arises from a mixture of fitting one parameter (when the QTL is positioned outside the marker bracket) and two parameters.
The computational advantage of MRM over RIM may be of little importance for least-squares analyses, because the computational demands of RIM are already limited. The same principle of regression on markers can, however, also be applied to other types of models, for example threshold and other non-linear models, for which computing time is important.
In general, the marker regression method can be applied to QTL mapping studies for which RIM is considered the method of choice. Because of its simplicity, MRM can be used for initial screening of marker data to identify regions displaying QTL activity before adopting advanced statistical methods such as maximum likelihood, generalized linear mixed models, non-parametric or Bayesian methods. Once potential QTL regions are identified, we can either adopt advanced methods focused on those genomic regions or simply interpret the regression coefficients based on a genetic model.