Alternative models for QTL detection in livestock. II. Likelihood approximations and sire marker genotype estimations

In this paper, we compare four different methods of dealing with the unknown linkage phase of sire markers, which arises in the detection of quantitative trait loci (QTL) in a half-sib family structure when no information is available on grandparents. The methods are compared by considering a Gaussian approximation of the progeny likelihood instead of the mixture likelihood. In the first simulation study, the properties of the Gaussian model and of the mixture model were investigated, using the simplest method for sire gamete reconstruction. Both models led to comparable results as regards the test power, but the mean square error of sib QTL effect estimates was larger for the Gaussian likelihood than for the mixture likelihood, especially for maps with widely spaced markers. The second simulation study revealed that the simplest method for sire marker genotype estimation was as powerful as complicated methods and that the method including all the possible sire marker genotypes was never the most powerful.
© Inra/Elsevier, Paris
half-sib family / QTL detection / unknown linkage phase / Gaussian approximation / log-likelihood ratio test


INTRODUCTION
The present paper deals with the detection of one QTL in half-sib families when no information is available on grandparents.
A general form of the likelihood of detecting QTL in simple pedigree structures such as half-sib or full-sib families when marker information is available on progeny, parents and grandparents was presented by Elsen et al. [2].
This likelihood is a two-level mixture distribution with different possible sire marker genotypes given marker information, and different possible progeny QTL genotypes given sire marker genotype and offspring marker information. This paper describes simulations carried out to compare simplified likelihoods.
As an alternative to the mixture approach, we suggest simplifying the likelihood by considering only one sire marker genotype. Three solutions were explored: the first one, close to the Knott et al. proposal [7], is the likelihood of quantitative phenotypes conditional on the most probable sire marker genotype given marker information, while in the others, the sire marker genotype is treated as a fixed effect, estimating the likelihood of the quantitative trait observation conditionally or jointly with the sire marker genotype. These comparisons were performed on a simplified form of the likelihood with regard to the mixture of the progeny QTL genotypes. This simplified likelihood is the one used in interval mapping by linear regression [5,8], but instead of least squares tests as in the above papers, maximum log-likelihood ratio tests were used. The properties of this simplification are described in the first part of the paper, using the likelihood of the quantitative phenotypes conditional on the most probable sire marker genotype given marker information. Let $hs$, $\mu^{x1}$, $\mu^{x2}$ denote the vectors of sire marker genotypes $hs_i$ and of phenotypic means of trait distribution $\mu_i^{x1}$, $\mu_i^{x2}$. Let $\Lambda_0$ be the likelihood under the null hypothesis that no QTL is segregating in the pedigree, where $\mu_i$ is the phenotypic mean of sire $i$ offspring. Let $\mu$ be the vector of $\mu_i$.

Test statistics
The general form of the likelihood presented by Elsen et al. [2] leads to the maximum log-likelihood ratio test $T$. Full maximum likelihood for this type of likelihood requires a lot of computation because the number of possible sire marker genotypes $hs_i$, over which the first level of the mixture is summed, grows exponentially with the number of informative markers per sire. Table II presents, for $T$ and the other tests proposed in this paper, the CPU time needed for one simulation. Although our program could certainly be optimized, these results show that computing the $T$ test is possible for one data set but cannot reasonably be considered for simulations, which are generally needed to obtain significance thresholds.
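The exponential growth mentioned above can be illustrated with a small sketch. It assumes (this is not stated in the text) that the sire's own marker genotypes are observed, so that only the linkage phase is unknown; the function name `sire_phase_configurations` and the 0/1 allele coding are hypothetical.

```python
from itertools import product


def sire_phase_configurations(n_het_markers):
    """Enumerate candidate linkage phases for a sire heterozygous at n markers.

    Assumption: alleles are coded 0/1 and a phase is described by the alleles
    carried on the first homologue. Swapping the two homologues gives the same
    genotype, so the allele at the first marker is fixed to 0.
    """
    if n_het_markers == 0:
        return [()]
    return [(0,) + rest for rest in product((0, 1), repeat=n_het_markers - 1)]


# Number of phases to sum over for 1, 3 and 11 heterozygous markers.
for k in (1, 3, 11):
    print(k, len(sire_phase_configurations(k)))
```

Under this convention a sire heterozygous at k markers has 2^(k-1) candidate phases, which is why summing over all of them quickly becomes expensive on dense maps.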
A natural way of dealing with this difficulty is to work in two steps: in the first step a probable marker genotype for each sire is estimated and in the second step the part of the likelihood corresponding only to these probable marker genotypes is maximized.
A possible estimate for the sire marker genotypes, very close to the sire gamete reconstruction proposed by Knott et al. [7], may be based on the most probable genotype given the marker information, $\hat{hs}_i = \arg\max_{hs_i} p(hs_i \mid M_i)$. Let $\hat{hs}$ be the vector of estimated sire marker genotypes. For the second step, the likelihood is reduced to the part corresponding to $\hat{hs}$. In order to simplify the maximization step, the mixture of distributions in progeny can be approximated by a normal distribution with expectation equal to the expectation of the mixture. Then a linear model is obtained at each position $x$ along the chromosome. Let $\tilde{\Lambda}^{x,\hat{hs}}$ denote this simplified likelihood. A simulation study was carried out to compare the power of QTL detection, using the maximum log-likelihood ratio tests $T^1$ (mixture likelihood) and $T^2$ (Gaussian approximation).
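The difference between the mixture likelihood and its normal approximation can be sketched for one sire family as follows. This is a minimal illustration, not the authors' program: the function names are hypothetical, `ps[j]` stands for the probability that offspring j received the first sire QTL allele given the markers, and the two component means are written as mu + alpha/2 and mu - alpha/2 with a common residual standard deviation.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(ys, ps, mu, alpha, sd):
    """Log-likelihood with a two-component mixture for each offspring."""
    total = 0.0
    for y, p in zip(ys, ps):
        density = (p * normal_pdf(y, mu + alpha / 2.0, sd)
                   + (1.0 - p) * normal_pdf(y, mu - alpha / 2.0, sd))
        total += math.log(density)
    return total


def gaussian_loglik(ys, ps, mu, alpha, sd):
    """Log-likelihood with one normal density centred on the mixture expectation."""
    total = 0.0
    for y, p in zip(ys, ps):
        # Expectation of the mixture: p*(mu + alpha/2) + (1 - p)*(mu - alpha/2).
        mean = mu + (2.0 * p - 1.0) * alpha / 2.0
        total += math.log(normal_pdf(y, mean, sd))
    return total
```

When every `ps[j]` is 0 or 1 (the QTL sits on a fully informative marker), the two functions return the same value; they differ when the transmission probabilities are intermediate, which is the situation on widely spaced maps.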

Simulation results
Sire designs with 20 sire families of 50 or 20 descendants per sire were simulated. The linkage group comprised three or eleven equally spaced markers, each with two alleles segregating at equal frequency in the population. Polygenic heritability was fixed at 0.2 and residual variance at 1. The power studies were based on a QTL with two alleles at equal frequency, located either at 5 or 35 cM from one end of the linkage group, with additive effect equal either to 0.5 or to 1 and no dominance.

Threshold and power
The null distributions of the test statistics were estimated by simulating data sets with polygenic effects corresponding to the heritability value used in the simulation model. Significance thresholds for $T^1$ and $T^2$ are shown in table III. The largest difference between the test powers, shown in table IV, was observed for a 20 half-sib progeny design, an 11 marker map and a QTL located at 35 cM with an additive effect equal to 1. In this situation, a gain of about 10 % was obtained with the mixture likelihood as compared to the Gaussian likelihood. However, other cases did not show large differences and either the first or the second test may be the most powerful depending on the case studied.
In the back-cross design, these tests have been proven to be asymptotically equivalent when the QTL effect is small [9]. In order to limit computing time, only the Gaussian approximation will be considered in the second part of this paper and in its companion paper [4]. Methods and simulation results given with the Gaussian approximation may be extended to include a mixture of distributions.

Parameter estimates
Despite power results that were quite similar for both methods, it is worthwhile comparing parameter estimates for the QTL location and sib QTL effect.
Mean estimates of position and empirical standard deviations of the position estimate are shown in table V. Because the position estimate is constrained to lie on the chromosome, its bias was larger for a QTL located at the beginning of the chromosome than for a QTL located near the middle of the chromosome, but both methods gave similar bias. Standard deviations of the position estimates were slightly larger for the Gaussian likelihood than for the mixture likelihood for the more widely spaced marker map, but they were comparable for the other map studied.
Mean square errors of the within half-sib QTL substitution effect are shown in table VI. As the bias of $\hat{\alpha}_i$ is small (data not shown), the mean square error is closely related to the variance of $\hat{\alpha}_i$. Results for the Gaussian likelihood in the 11 equally spaced marker maps may be explained by considering the idealized case where the QTL position is known and located on a marker and for which all sires are heterozygous for this marker. The variance of $\hat{\alpha}_i$ depends only on the number of informative descendants per sire. For a marker with two alleles at equal frequency, the number of informative descendants is roughly $n_i/2$ and the variance of $\hat{\alpha}_i$ is then $8/n_i$ times the residual variance. For 50 (respectively 20) descendants per sire and a residual variance equal to 1, a 0.16 (respectively 0.4) mean square error is expected in the idealized case. The unknown QTL position, the distance between the QTL position and heterozygous markers for the sire, the unknown sire marker genotypes and the overestimation of the residual variance when the additive QTL effect is large [10] explain the increase in the mean square error.
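The factor 8 in the idealized case can be recovered with a short calculation, sketched here under the assumption that the roughly $n_i/2$ informative descendants split evenly between the two sire marker alleles, so that the effect estimate is a difference between two group means of about $n_i/4$ observations each ($\sigma_e^2$ denotes the residual variance):

```latex
\operatorname{var}(\hat{\alpha}_i)
  \approx \frac{\sigma_e^2}{n_i/4} + \frac{\sigma_e^2}{n_i/4}
  = \frac{8\,\sigma_e^2}{n_i},
\qquad
\frac{8}{50} = 0.16, \quad \frac{8}{20} = 0.4 \quad (\sigma_e^2 = 1).
```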
Results for the Gaussian likelihood in the three equally spaced marker maps may be explained by considering a second idealized case where the QTL is known to be located at the beginning of the chromosome. As only sires heterozygous for at least one marker are considered, three classes of sires ($c_1$, $c_2$, $c_3$) exist with different variances of $\hat{\alpha}_i$. $c_1$ contains sires that are heterozygous for the first marker, $c_2$ those that are homozygous for the first marker and heterozygous for the second one, and $c_3$ those that are heterozygous only for the last marker.
The proportions of sires in the three classes are about 4/7, 2/7 and 1/7. The variance of $\hat{\alpha}_i$ for sires in the class $c_i$ depends on $r_{c_i}$, the recombination rate between the first heterozygous marker in the class $c_i$ and the QTL located at the beginning of the chromosome. With 50 descendants per sire (respectively 20) and a residual variance equal to 1, a 1.7 (respectively 4.2) mean square error is expected. A more favourable location of the QTL (near the middle of the chromosome) decreases the mean square error.
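The expected mean square errors quoted here can be approximately reproduced with the following sketch. Several ingredients are assumptions that are not stated in the text: the per-class variance is taken as 8 σ² / (n (1 − 2r)²), the three markers are placed at 0, 50 and 100 cM, and recombination rates follow the Haldane map function. The function names are hypothetical.

```python
import math


def haldane_r(distance_cm):
    """Recombination rate for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def expected_mse(n_offspring, marker_positions_cm=(0.0, 50.0, 100.0),
                 residual_var=1.0):
    """Average the assumed per-class variance over the three sire classes.

    Class proportions 4/7, 2/7 and 1/7 are those given in the text; the QTL is
    at position 0, so the distance to the first heterozygous marker is simply
    that marker's position.
    """
    class_props = (4.0 / 7.0, 2.0 / 7.0, 1.0 / 7.0)
    mse = 0.0
    for prop, position in zip(class_props, marker_positions_cm):
        r = haldane_r(position)
        mse += prop * 8.0 * residual_var / (n_offspring * (1.0 - 2.0 * r) ** 2)
    return mse


print(expected_mse(50), expected_mse(20))
```

Under these assumptions the two values round to 1.7 and 4.2, and most of the total comes from the small class of sires that are heterozygous only at the most distant marker.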
The estimation of the within half-sib QTL substitution effect with the mixture likelihood does not only use the mean difference between informative descendants carrying allele A at a marker and those carrying allele B, but also takes advantage of information from higher moments of the mixture distribution. Even if this information becomes negligible when the number of descendants per sire is large, in a finite population and especially for a widely spaced marker map, it leads to a significant reduction of the mean square error.

OTHER METHODS TO DEAL WITH UNKNOWN SIRE MARKER GENOTYPES
Errors in sire gamete reconstruction can decrease the power of both methods.
Knott et al. [7] found that in their worst situation only 6 % of informative sires were incorrectly reconstructed, but they had studied large half-sib families with 100 descendants per sire. Table VII shows, for one male, the empirical probability of correct reconstruction based on $\hat{hs}_i$ over 1000 replications. We confirm a 6 % maximum error in large families but found up to 30 % errors in smaller families, which led us to study alternative methods.
The rationale of the following alternative methods is that their aim is not to improve the quality of sire gamete reconstructions but to increase the power of QTL detection. It is not necessary to work in two steps and the $hs_i$ marker genotypes can be treated as nuisance parameters.
Estimations of sire marker genotypes based on conditional likelihood of quantitative phenotypes
The first alternative method is to treat the $hs_i$ parameters as fixed parameters in the likelihood of quantitative phenotypes given the marker information, $\prod_i \Lambda_i^{x,hs_i}$. The full maximum is obtained after a search on a continuous space for the QTL location and effect and for the within-sire mean and variance parameters, and on a discrete space for the sire marker genotype parameters. This leads, with the Gaussian approximation of the mixture in progeny, to estimating the sire marker genotypes by the values of $hs_i$ that maximize this approximated likelihood. The maximum log-likelihood ratio test then gives $T^3$.
In practice, the three tests proposed should be slightly modified to take into account that the sire marker genotype space grows exponentially with the number of informative markers per sire. This sire marker genotype space could be limited to genotypes that satisfy $p(hs_i \mid M_i)$ greater than a given value, fixed in the simulation study to 0.01.
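The restriction of the sire marker genotype space can be sketched as a simple filter. The function name, the dictionary representation and the fallback to the single most probable genotype when nothing passes the cut-off are assumptions made for illustration; only the 0.01 cut-off comes from the text.

```python
def restrict_genotype_space(genotype_probs, threshold=0.01):
    """Keep the sire marker genotypes whose probability exceeds the threshold.

    genotype_probs maps a candidate genotype (any hashable label) to its
    probability given the marker information. If no candidate passes the
    threshold, the most probable one is kept (assumed fallback).
    """
    kept = {g: p for g, p in genotype_probs.items() if p > threshold}
    if not kept and genotype_probs:
        best = max(genotype_probs, key=genotype_probs.get)
        kept = {best: genotype_probs[best]}
    return kept


# Hypothetical probabilities for four candidate phases of one sire.
candidates = {"phase_a": 0.900, "phase_b": 0.092, "phase_c": 0.005, "phase_d": 0.003}
print(sorted(restrict_genotype_space(candidates)))
```

With these made-up values only `phase_a` and `phase_b` are retained, so the discrete search runs over two genotypes instead of four.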

Simulation results
Significance thresholds and powers for $T^2$, $T^3$, $T^4$ and $T^5$ are shown in tables VIII and IX. On the whole, the compared tests gave very similar power for all of the situations studied, suggesting that the simplest method can be used to avoid unnecessary computation. This similarity between tests may be attributed to the high percentage of correct sire gamete reconstruction. Only when markers were widely spaced and family size was limited did estimating sire marker genotypes on the weighted likelihood given the marker information lead to a slightly more powerful test.
The joint likelihood was first used by Georges et al. [3] to map QTL in dairy cattle by considering only sire-by-sire analyses. Then Jansen et al. [6] computed the conditional likelihood and considered pooled sire analysis. As mentioned by Georges et al. [3], the problem with this likelihood is that only $|\mu_i^{x1} - \mu_i^{x2}|$ can be estimated when there is no information on grandparents. Indeed, using the alternative parametrization $\mu_i^{x1} = \mu_i + \alpha_i^x/2$, $\mu_i^{x2} = \mu_i - \alpha_i^x/2$, it has been proved (in the Appendix) that the sign of $\alpha_i^x$ cannot be estimated. This is not important when the objective is QTL detection, but it shows the limit of this method if QTL effects are to be estimated simultaneously.
For all the methods studied, the empirical significance thresholds were obtained by simulations of sire and progeny marker genotypes and of the quantitative performances. In practice, a permutation test [1] or a Monte-Carlo simulation taking account of the correct marker structure should be used. This could lead to slightly different threshold values. However, because very high correlations between tests were observed, we expect that the conclusions concerning the different tests do not depend on the chosen significance threshold.
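A permutation threshold of the kind mentioned here could be obtained along the following lines. This is a generic sketch rather than the procedure of [1] or of the authors: phenotypes are shuffled within each sire family (an assumption), the test statistic is supplied by the caller, and all names are hypothetical.

```python
import math
import random


def permutation_threshold(families, statistic, n_perm=1000, level=0.05, seed=0):
    """Empirical (1 - level) quantile of a statistic under permutation.

    families is a list of (phenotypes, marker_info) pairs, one per sire;
    statistic is any callable taking such a list and returning a number,
    for example the maximum of a test statistic along the chromosome.
    """
    rng = random.Random(seed)
    null_stats = []
    for _ in range(n_perm):
        permuted = []
        for phenotypes, marker_info in families:
            shuffled = list(phenotypes)
            rng.shuffle(shuffled)  # breaks the marker-phenotype association
            permuted.append((shuffled, marker_info))
        null_stats.append(statistic(permuted))
    null_stats.sort()
    index = max(0, min(n_perm - 1, math.ceil((1.0 - level) * n_perm) - 1))
    return null_stats[index]


def toy_statistic(families):
    """Largest absolute difference between marker-allele group means."""
    best = 0.0
    for phenotypes, alleles in families:
        g0 = [y for y, a in zip(phenotypes, alleles) if a == 0]
        g1 = [y for y, a in zip(phenotypes, alleles) if a == 1]
        if g0 and g1:
            best = max(best, abs(sum(g0) / len(g0) - sum(g1) / len(g1)))
    return best
```

Because the marker data are kept as observed, the permuted data sets retain the real marker structure, which is the property the paragraph above asks for.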
Modelling progeny quantitative observations with mixture distributions is a more computationally demanding approach than methods using Gaussian distributions. Previous studies [5,10] have compared in a single family the estimates obtained using mixture or Gaussian models. They concluded that the estimation accuracy is similar in both models, except for the residual variance when a QTL of large effect is mapped in a widely spaced marker map. Our study on multiple families showed that the accuracy of within half-sib QTL substitution effect estimates decreased significantly for the Gaussian model compared to the mixture model, especially in a more widely spaced marker map and even if the QTL effect was not large, although the test power and the accuracy of the QTL position remained comparable.
Our comparison of alternative methods for handling the problem of unknown sire marker linkage phases showed clearly that a simple method of reconstructing the sire genotype is almost as powerful as more complex methods. This holds in particular for the method that takes into account all the possible sire marker genotypes, since the $T^5$ test was never the most powerful test.