Use of the EM algorithm to detect QTL affecting multiple-traits in an across half-sib family analysis

QTL detection experiments in livestock species commonly use the half-sib design. Each male is mated to a number of females, each female producing a limited number of progeny. Analysis consists of attempting to detect associations between phenotype and genotype measured on the progeny. When family sizes are limiting experimenters may wish to incorporate as much information as possible into a single analysis. However, combining information across sires is problematic because of incomplete linkage disequilibrium between the markers and the QTL in the population. This study describes formulæ for obtaining MLEs via the expectation maximization (EM) algorithm for use in a multiple-trait, multiple-family analysis. A model specifying a QTL with only two alleles, and a common within sire error variance is assumed. Compared to single-family analyses, power can be improved up to fourfold with multi-family analyses. The accuracy and precision of QTL location estimates are also substantially improved. With small family sizes, the multi-family, multi-trait analyses reduce substantially, but not totally remove, biases in QTL effect estimates. In situations where multiple QTL alleles are segregating the multi-family analysis will average out the effects of the different QTL alleles.


INTRODUCTION
The paternal half-sib design remains a popular design for the mapping of quantitative trait loci (QTL) in livestock. Families can be easily generated from existing out-crossed populations and the results of the mapping experiment are likely to be more applicable to commercial populations. Designs that use crosses between divergent lines or breeds offer greater chance of detecting QTL. However there is the risk that favourable alleles are already at a high frequency in the target population. The basis for using a half-sib design can be summarised in Figure 1a, which shows the same region of a chromosome pair in three sires. The region contains a QTL and a genetic marker and represents only a small section of the genome that is to be scanned. A genetic map of moderate resolution is assumed, for example, a 10-20 cM map. Polymorphic markers such as microsatellites span each chromosome of interest. Markers are screened for their informativeness, i.e., there is a high fraction of progeny for which the allele inherited from the sire can be deemed as having derived from one paternal grandparent as opposed to the other. Up to six or more alleles can segregate in a population for a typical microsatellite, hence the marker in Figure 1a is denoted M i with i = 1 . . . 6. A difference between the phenotypic means of the two offspring groups inheriting the alternative marker alleles indicates that the marker alleles are linked to QTL alleles, and one QTL allele has an effect on phenotype clearly distinquishable from the effect of the other. QTL genotypes are unknown and the exact number of alleles cannot be ascertained. Hence in Figure 1a we have the situation of different QTL alleles linked to different marker alleles in each family. This presents no problem if QTL analyses are completed separately within each half-sib family.
For a fixed experimental resource, it is often preferable to test more small half-sib families than fewer large half-sib families. One reason is that there is greater chance of detecting a rare allele. In the case of a validation study significance tests should not be as stringent as for the initial genome scan. Hence smaller sample sizes suffice, allowing many more families to be screened. However because within sire analysis of small half-sib families provides limited power to detect QTL, experimenters may desire to incorporate information from all sires. Many statistical techniques are available to perform a joint analysis of multiple half-sib families. Linear regression described by Knott et al. [12] uses a model that fits a separate QTL effect for each sire, and a common error variance. This model allows multiple QTL alleles to be segregating in the population. Knott et al. [12] compared linear regression with maximum likelihood. The likelihood function assumed a biallelic QTL and two possible linkage phases in the population. Numerical methods were used to obtain maximum likelihood estimates (MLEs) of the QTL parameters. They found the two approaches gave similar power and estimates for QTL location. While linear regression [11] and maximum likelihood [9] have been extended to the analysis of multiple traits, there has been no extension of these techniques to the analysis of both multiple traits and multiple half-sib families.
Recently maximum likelihood (ML) has been extended to the analysis of more complex models. For example, Jansen et al. [8] describe a ML approach with the potential to analyse complex pedigrees such as multiple half-sib families with genetic ties among families. Likelihoods are optimised using a Monte Carlo expectation-maximization algorithm. Farnir et al. [3] describe likelihood functions for analysis of multiple half-sib families assuming linkage disequilibrium between markers and QTL at the population level. Likelihoods are optimised using quasi-Newton techniques. It is unknown how these optimisation techniques are equipped to handle the addition of multiple-traits to the models.
In this paper we describe an exact expectation-maximisation (EM) algorithm suitable for the maximum likelihood analysis of multiple-traits and multiple-families. Further, the benefits of a multi-trait, multi-family analysis, relative to simpler types of analyses are illustrated using computer simulation.

MODELS AND ASSUMPTIONS
This algorithm was designed for use in initial genome scans and follow up validation studies. In these situations genetic maps of sparse to moderate resolution are the norm and often mothers are not genotyped. Hence we do not consider likelihood functions that model linkage disequilibrium between the markers and the QTL at the population level. Such likelihood functions are more applicable in fine mapping studies. The half-sib families are assumed independent. To provide a workable model a biallelic QTL and two possible linkage phases are assumed. Given a fixed map position, markers are used to provide prior probabilities that offspring inherit one of two alternative QTL alleles, labelled Q 1 and Q 2 . Under these assumptions it becomes simpler to consider meiosis switches or inheritance states of the marker loci, rather than work with the marker genotypes. The meiosis switch states that the allele transmitted to the offspring due to a meoisis in the sire is either the sire's maternal or paternal allele. Ascertaining the meiosis switches in offspring for all marker loci on the paternal chromosome can be achieved using an algorithm such as the Lander-Green algorithm [13]. In two generational pedigrees such as the half-sib design, the actual phase of the alleles in progeny chromosomes (grandpaternal or grandmaternal origin) remain unknown. However, paternal haplotypes with phase choice having the highest likelihood can still be computed using the Lander-Green algorithm, which is limited to small pedigrees such as independent half-sib families. The chromosomal region in Figure 1b is now represented using different nomenclature to that in Figure 1a. Meiosis switches on marker alleles are represented by 1 (2) for maternal (paternal) origin. The three sires represent three of four possible scenarios: the linkage phase between the QTL and the marker in sire 1 is of the first type, arbitrarily denoted phase 1; the linkage phase in sire 2 is the reverse of that in sire 1 (phase 2); sire 3 is homozygous for the Q 1 allele. A sire can also be homozygous for the Q 2 allele, but this is not shown.
For the development of the following algorithm we will assume two flanking markers rather than a single marker. Flanking markers allow location along a chromosome to be inferred. In half-sib designs it is almost certain that for every individual not every marker can be scored for the allele inherited from the sire. If the flanking marker scores are incomplete for a given individual the typical approach is to use "offsets". That is, the nearest informative marker in one or both directions is taken.

MAXIMUM LIKELIHOOD VIA THE EM ALGORITHM
The following linear model is used to test for a QTL affecting n t traits, and located on an interval of markers m and m + 1 . . .
where y i jk is the observation for the kth trait on the jth progeny of the ith sire, µ ik is the mean of the i half-sib family for trait k and b k is the magnitude of the effect of the QTL on trait k. The b k are parameterised using the restriction where b * kl is the additive effect of allele l (l = 1,2) on the kth trait. The random error terms (e i jk ) include environmental variance and a genetic component due to the different polygenic contributions. With a half-sib design there is generally insufficient data to estimate the effects of the QTL allele inherited from the dam, hence this effect is also contained in the random error term. Random error terms are multivariate, normally distributed with a mean that is zero and are uncorrelated between individuals if full-sibs are not included, and correlated between traits recorded on the same individual. A common error variancecovariance matrix, Σ, is assumed for all half-sib groups and is defined as The variables z 111,i j , z 112,i j , etc., are indicator variables taking the value zero or one, where: section. The probabilities that the variables z 111,i j , z 121,i j , z 112,i j and z 122,i j are zero or one depend on: the unknown prior probability h that the sire is heterozygous; the probability that the sire is one of two equally likely possible linkage phases; and the p i j denoting the specified prior probabilities that the progeny has inherited the Q 1 allele, conditional on the genotypes of markers m and m + 1 and the position being tested (1 − p i j is the prior probability the progeny has inherited the Q 2 allele). For example, the prior probability that z 111,i j = 1 is equal to .5hp i j . Denoting the number of sires by n s , the number of progeny within in each half-sib group by n i , the observations for the ith half-sib group by a n i × n t matrix Y i , all µ ik within the ith half-sib group by the vector µ i , and finally all b k by the vector b, the likelihood function is then given by where φ(y i j ; µ i + b, Σ), φ(y i j ; µ i − b, Σ) and φ(y i j ; µ i , Σ) represent the multivariate normal density functions of the vector variable y i j (the jth row of the matrix Y i ) with means µ i + b, µ i − b and µ i , respectively, and covariance matrix Σ.
The univariate representation of this likelihood function has been compared to more complicated likelihood functions by Goffinet et al. [5]. Other likelihood functions that were considered modeled different QTL substitution effects and/or different residual variances for each sire. Generally they found that in terms of power there were no appreciable differences between the alternative formulations.

General formulae for obtaining the MLEs
The maximum likelihood estimate (MLE) of the vector θ = (h, µ, b, Σ) of unknown parameters is obtained as an appropriate solution of the likelihood equation Rather than working with (5) directly, we shall apply the EM algorithm, which is a general method of finding MLEs from a given data set when the data is incomplete or missing: see, for example, McLachlan and Krishnan [14]. The incomplete data are declared to be the unobservable indicator variables, z 111,i j , z 112,i j , z 121,i j , and z 122,i j as defined above. The following indicator variables are also declared to be incomplete data: if the ith sire is heterozygous and has phase 1; if the ith sire is heterozygous and has phase 2.
Assuming these indicator variables to be incomplete data is equivalent to assuming QTL transmission probabilities, heterozygosity and phase of sires to be missing. In this framework, the complete-data log likelihood that can be formed on the basis of the observable data and these incomplete data is (the sire is homozygous) are mutually exclusive, therefore you have that the logarithm of the sum is the sum of the logarithms, when passing from the log likelihood of (4) to the complete-data log likelihood. This point illustrates an important difference between standard parametric models and finite mixture distributions. A parametric model is identifiable if distinct values of the parameters determine distinct members of the parametric family. Identifiability for mixture distributions is defined slightly different in that distinct values of the parameters determine distinct members of the mixture family, allowing permutations of the component labels, i.e. the indicator variables; see, for example, McLachlan and Peel [15].
The algorithm proceeds by completing the maximization step or M-step first. At the tth iteration the estimate of the vector θ of unknown parameters is updated by the global maximizer of Q(θ; θ (t−1) ), which is the conditional expectation of the complete-data log likelihood function given the observed data, using θ (t−1) for θ. The updated estimates of the unknown parameters b, µ i (i = 1,. . .,n t ) and Σ so obtained can be given in closed form. Concerning the updating of the estimates of b, we have that for all k and where In the first iteration the following starting values can be used: To test for the existence of a QTL it will be necessary use a model with no QTL fitted. If this model is completed prior to completing a model with a QTL fitted the resulting half-sib family means can be conveniently used for µ (0) ik . At the tth iteration the means of each half-sib family are found using Next a solution is found for the common variance-covariance matrix in all halfsib families. If an element in the kth row and lth column of Σ is represented as σ kl then where N = n s i=1 n i . The expectation or E-step requires taking the conditional expectation of the complete-data log likelihood log L c (θ) given the observed data, using the current fit for the vector of unknown parameters. As log L c (θ) is linear in the unobservable indicator variables, the E-step is simply effected by replacing them by their conditional expectations given the observed data. As these indicator variables are zero-one variables, their conditional expectations are the posterior probabilities that they are equal to one; that is at the tth iteration The first step in computing the above posterior probabilities is to compute the posterior probability the ith sire is heterozygous at the QTL, which is denoted τ (t) 1,i . To compute τ (t) 1,i we require the probability that the ith sire is heterozygous, 1,i and the probability that the ith sire is homozygous, f (t) 2,i , at the tth iteration.
That is, In the first iteration a convenient starting value for h (0) is 0.5. A value for τ (t) 1,i is then found by normalizing. That is, The posterior probability at the tth iteration that the ith sire is homozygous at the QTL is Once the τ (t) 1,i , for i = 1, . . . , n s , have been computed, the MLE of h at the tth iteration can be found at this point: The next step is to compute the posterior probabilities the ith sire is heterozygous, and either phase 1 or phase 2: Finally, the posterior probabilities that the jth progeny received either Q 1 or Q 2 from the ith sire, who in turn is heterozygous and of phase 1, are then Likewise, the posterior probabilities that the jth progeny received either Q 1 or Q 2 from the ith sire, who in turn is heterozygous and of phase 2, are then This completes the E-step and a new iteration begins. Convergence is reached when the value of the difference in the log likelihood (4) between successive iterations is below a set threshold.
The EM equations have been described elsewhere for the single-trait, singlefamily maximum likelihood analysis [16] and the multiple-trait, single familiy maximum likelihood analysis [9]. The EM equations for the single-trait, multiple-family maximum likelihood analysis are simply the univariate representations of the formulae outlined above.

COMPARING ANALYSIS METHODS USING SIMULATED DATA
Two QTL detection experiments, denoted experiment A and experiment B were simulated in order to compare the multiple-trait, multiple-family analysis method with simpler methods. Experiment A was run under ideal conditions, in order to eliminate almost all sources of bias, thus demonstrating that the EM algorithm yields unbiased estimates under such conditions. Experiment B was run under less favourable, but more realistic conditions.
In both experiments six half-sib families were simulated with the same number of offspring. In experiments A and B the values of n i used were 200 and 25, respectively, for i = 1, . . . , 6. A single chromosome with six equally spaced markers and a single QTL was considered. The recombination fraction between each marker was 0.2. In experiment A, the single QTL was positioned in the middle of the third interval, in the middle of the chromosome. In experiment B, the single QTL was located in the last interval, and closest to the last marker. Using the Haldane mapping function these positions translated to distances of 63.9 and 122 cM from the first marker, for experiments A and B, respectively. Paternal allele statuses (1 or 2) for all markers and the QTL were assigned to each progeny. The status of the allele at the first marker was randomly assigned and the status of the allele at the second marker was the same as for the allele of the first marker with probability 1 − r, with r = 0.2. The process was repeated for each successive marker. However r was equal to 0.1127 when sampling the allele at the QTL and the fourth marker in experiment A and was equal to 0.167 and 0.05 when sampling the allele at the QTL and sixth marker, respectively, in experiment B.
In experiment A all progeny were informative at each marker. That is, the paternal allele statuses (1 or 2) of the marker alleles were revealed to the analysis. In actual half-sib experiments the determination of paternal allele status is rarely possible for all progeny, for any one marker. Often, one or more sires are homozygous for a particular marker, or, if a sire is heterozygous, the dam is also segregating for the same two alleles. Thus in experiment B paternal allele statuses (1 and 2) for 75% of the progeny were revealed to the analysis. These progeny were selected at random. The paternal allele statuses of the remaining 25% of the progeny were assumed unknown in the analysis. For each marker a different subset of progeny was selected to have their paternal allele statuses assumed known. The nearest informative markers were used to infer prior probability of inheriting the Q 1 allele in cases when a marker was non-informative. Two within sire error terms (e i j1 and e i j2 ), for traits 1 and 2, were assigned to each progeny of each sire and were drawn at random from a bivariate normal distribution with zero means and variances and correlation equal to the values in Table I. Sires A, B, C, D and E were segregating for the QTL. Sire F was not segregating. Sire families segregating for the QTL were assigned a phase of 1 or 2 as shown in Table I. For example, sire A has a phase of 1 which implies z 11,A = 1 and z 12,A = 0, whereas sire B has a phase of 2 which implies z 11,B = 0 and z 12,B = 1. Sires B and E had opposite linkage phase to sires A, C and D. Paternal allele status at the QTL for the jth progeny of the ith sire implies that only one of four possible indicator variables (z 111,i j , z 112,i j , z 121,i j , z 122,i j ) is non-zero and trait phenotypes for all progeny can be constructed using equations (1) and (2). The values for b 1 and b 2 are shown in Table I. The values are the same for each sire except sire D, which implies a third QTL allele is segregating in this particular sire. The proportion of the within sire variance due to the QTL was 20% for both traits, except for sire D where it was 10%. One hundred replicate populations containing six half-sib families as described in Table I were generated and analysed with the following interval mapping analysis methods: -single-family, single trait analyses for each half-sib family; -single-family, multi-trait analyses for each half-sib family; -multi-family, single trait analyses for the following combinations of families: ABC (one family has opposite phase); ABD (three alleles are actually segregating); ABE (one family has a larger variance); ABF (one family is not segregating for the QTL); -multi-family, multi-trait analyses for the above combinations of families.
Chromosomes transmitted to progeny from sires were tested at 22 analysis positions, spaced approximately 6 cM apart. Standard hypotheses were used for testing the existence of a QTL at each tested position. For single-trait analyses the hypotheses tested are (there is no QTL at that position) For multiple-trait analyses the hypotheses tested are the same as that outlined in Jiang and Zeng [9], that is H A : at least one element of the vector b is not zero.
To reject the null hypothesis a likelihood ratio test (LRT) statistic −2 log sup Θ O L(θ) sup Θ L(θ) was calculated, where Θ O and Θ are the restricted and unrestricted parameter spaces, respectively. Because classical distribution theory of the LRT statistic does not hold for testing homogeneity against mixture alternatives, the threshold value to reject the null hypothesis cannot be chosen from a χ 2 distribution. Instead empirical threshold values were obtained by permutation testing. Permutation testing was carried out using a variation on the method of Churchill and Doerge [2]. Rather than shuffle phenotypes between progeny, the approach used in the present study changed the identity of a progeny's paternally inherited QTL allele to its alternative, for a random selection of the progeny. For example, if p i j is the probability that progeny j inherited sire i's Q 1 allele, and m i j is a permuted value of p i j , then in each permutation, m i j = p i j with 50% probability, and m i j = 1 − p i j with 50% probability. In previous testing this method produced similar significance levels to the method of Churchill and Doerge [2]. A chromosome wide significance threshold value was obtained for each analysed chromosome by storing the maximum LRT statistic across all the analysis points, for each of 1000 permutations. Once the chromosome is completed, the statistics can then be ranked across permutations to derive an appropriate chromosome wide significance threshold.
In every replicate the maximum LRT statistic on the chromosome was retained for each analysis method. The parameter estimates (position, effects of the QTL, the within-sire variance and the probability of sires being heterozygous) associated with the maximum LRT statistic was also retained. In the results section the means and standard errors of the parameter estimates computed over 100 replicates are presented. This is because means and standard errors should be reported for estimates computed over the entire parameter space. However, the power of each analysis method was defined as the percentage of replicates in which the maximum LRT statistic exceeded the chromosome wide significance threshold at the 5% level.

RESULTS
The results are presented in separate tables for each analysis method: singletrait, single-family (Tab. II); multi-trait, single-family (Tab. III); single-trait, multi-family (Tab. IV); and multi-trait, multi-family (Tab. V). With large halfsib family sizes (n i = 200) all analysis methods resulted in unbiased estimates for all parameters. The power to detect QTL is 100% in all methods. In the single-trait, single-family analysis method, the power is slightly less than 100% only for sires D and E. When analysing sire F, which does not segregate for the QTL, there were no significant results, which is not consistent with the expected false positive rate (5%) under the null hypothesis of no QTL. It is possible that 100 replicates are too few to assess empirically the false positive rate associated with our chromosome wide test. Hence a further 900 replicates were run for this particular sire, using the single-trait, single-family analysis method. Out of 1000 replicates the power to detect QTL was estimated as 5%. The mean position of the maximum LRT statistic when analysing sire F was 62 cM. This is expected when no QTL exists on a chromosome because peaks in the LRT statistic profile should be uniformly distributed across the length of the chromosome. The mean of a uniform distribution which is bounded by 0 and 127.7 is 63.9.
The results when n i = 200 verify the correctness of the maximum likelihood techniques outlined in the methods section. However, it is not realistic to expect family sizes of this magnitude in actual planned experiments. The comparison of analysis methods when n i = 25, and when markers are not completely informative, is of more practical value. Under these conditions the single-trait, single-family analyses have low power to detect QTL. In addition, parameter estimates are biased. Generally, the effect of QTL allele substitution is overestimated and the within sire variance is underestimated. The use of additional information, such as information on correlated traits, and/or by combining information from different sires, will increase power to detect QTL and decrease bias in estimation of QTL parameters. For example, the power of single-trait, single-family analyses to detect QTL ranges from 11 to 34%. The power of multi-trait, multi-family analyses ranges from 92 to 100%. In addition, the estimates of QTL parameters from the multi-trait, multi-family analyses appear Table II. Results of single-trait, single-family analyses: percentage of replicates in which chromosome wide significance level of 5% was achieved (Power), mean estimated position measured in cM (Position); mean estimated QTL allele effect on trait 1 (b 1 ); mean estimated within sire variance for trait 1 (σ 2 1 ); and for two values of half-sib family size (n i ). Standard error of means in parentheses. IC is information content.
Sire Power Positionb † to have only slight bias. Importantly, the precision of estimating QTL position has also been vastly improved. Using single-trait, single-family analyses the standard error of mean estimated position ranges from 33 to 49 cM. Using multi-trait, multi-family analyses the standard error ranges from 5 to 12 cM. The use of additional information on correlated traits alone helps substantially when progeny size is limiting. The power of single-family, multitrait analyses ranges from 22 to 72%. This represents a 100% improvement in power over single-family, single-trait analyses. Similar improvement over single-trait methods was found in the study of Gilbert and Le Roy [4].
The results show that when analysing a combination of sires, when in truth the sires have different QTL alleles segregating, or, there are different within sire error variances, the bi-allelic, common variance model will tend to average out the effects. For example in the single-trait, multi-family analysis of the Table III. Results of multi-trait, single-family analyses: percentage of replicates in which chromosome wide significance level of 5% was achieved (Power), mean estimated position measured in cM (Position); mean estimated QTL allele effect on traits 1 and 2 (b 1 ,b 2 ); mean estimated within sire variances for traits 1 and 2 (σ 2 1 ,σ 2 2 ); and for two values of half-sib family size (n i ). Standard error of means in parentheses. IC is information content.

Computing time
Table VI presents the computing times for the various types of analyses. Computing time appears to increase linearly with the number of families in the analysis. By far the largest effect on computing time is the number of half-sibs Table IV. Results of single-trait, multiple-family analyses: percentage of replicates in which chromosome wide significance level of 5% was achieved (Power); mean estimated position measured in cM (Position); mean estimated QTL allele effect on trait 1 (b 1 ); mean estimated within sire variance for trait 1 (σ 2 1 ); mean estimated heterozygosity parameter (h); and for two values of half-sib family size (n i ). Standard errors of means in parentheses. IC is information content.

Sire
Power Positionb † per sire. The longest analysis (a multi-trait, six-family analysis with large halfsib family sizes) took just under 3 h to complete. The same analysis with the smaller family size took only 17 min to complete.

DISCUSSION AND CONCLUSIONS
The analysis of small half-sib families is not uncommon in many Australian QTL studies. A series of DNA marker validation experiments have been performed in the Australian Beef Quality Cooperative Research Centre program to confirm the locations of QTL for beef tenderness, marbling and yield. These are confirmations of linkages initially detected in a large experimental pedigree, known as the CBX cattle [6]. The validations were performed on 45 sires of tropical and temperate origin. The progeny number per sire ranged from 28 to 78. In swine, a QTL mapping project has been directed at providing a resource for evaluating QTL, either discovered in a previous linkage experiment or reported in the literature (Moran, personal communication). The resource consisted of eight sire families comprised of 38 to 65 progeny. In all these Table V. Results of multi-trait, multiple-family analyses: percentage of replicates in which chromosome wide significance level of 5% was achieved (Power); mean estimated position measured in cM (Position); mean estimated QTL allele effect on traits 1 and 2 (b 1 ,b 2 ); mean estimated within sire variances for traits 1 and 2 (σ 2 1 ,σ 2 2 ); mean estimated heterozygosity parameter (h); and for two values of half-sib family size (n i ). Standard errors of means in parentheses. IC is information content.  (8)  studies the EM algorithm was used to estimate the effects of a bi-allelic QTL in an across half-sib family analysis. The use of this particular model in actual datasets has demonstrated several practical advantages over simpler methods of analysis. The increase in power, as demonstrated by the simulations, is apparent. Often QTL are detected in multi-family analyses, but not in singlefamily analyses. Conversely, there are significant single-family results that are not significant in multi-family analyses. There is a real chance that the significant single-family results may be false positives due to multiple-testing. The multi-family analysis is one way to partly reduce the number of tests performed on the data. The goal of these validation studies is to demonstrate a consistent QTL effect across a wide cross-section of the population. It is also important to ascertain the effects of the QTL on as many traits as possible. If the model fitted a separate QTL effect for each sire, it becomes increasingly difficult to interprete and summarise the results. The simulations have shown that the biallelic QTL model will average out the effects of the different alleles in cases where multiple alleles are segregating.
There have been numerous publications that address the EM algorithm in the context of QTL mapping [1,7,10]. While MLEs have been derived for a variety of genetic models and for a variety of population structures, no publication has yet dealt with the type of finite mixture problem discussed in this study. Other publications have used similar likelihood functions, but used quasi-Newton methods to derive MLEs [3,11]. We have compared the EM algorithm to the quasi-Newton routine (E04JAF) from the NAG library (Numerical Algorithms Group 1990) and found the NAG routine to have less accuracy and power. Warnings of local maxima found were often given and the routine was especially unstable when used in a multi-family, multi-trait analysis. The advantage of the EM algorithm, apart from its numerical stability and greater accuracy, is the facility to provide the posterior probabilities that the ith sire is heterozygous, and either phase 1 or phase 2, which are the τ 11,i and τ 12,i described earlier. These indicators will aid any subsequent marker assisted selection of progeny.
In conclusion the EM algorithm described in this study provides the experimenter with a stable and reliable method for combining information across sires and traits in QTL mapping. The strategy of combining information is more critical when faced with small family sizes.