Genetic evaluation for a quantitative trait controlled by polygenes and a major locus with genotypes not or only partly known

Summary - For a quantitative trait controlled by polygenes and a major locus with 2 alleles, equations for the maximum likelihood estimation of major locus genotype effects and polygenic breeding values, as well as major allele frequency and major locus genotype probabilities, were derived. Because the resulting expressions are computationally intractable for practical application, possible approximations were compared with 2 other procedures suggested in the literature using stochastic computer simulation. Although the frequency of the favourable allele was seriously underestimated when major locus genotypes were entirely unknown, the proposed method compares favourably with the 2 other procedures under certain conditions. None of the procedures compared can satisfactorily separate major genotypic effects from polygenic effects. However, the proposed method has some potential for improvement.

major locus / genetic evaluation / segregation analysis

INTRODUCTION

Statistical methods based on the infinitesimal model, the assumption of many unlinked loci all with small effects controlling quantitative traits, have been successfully applied in animal breeding. An increasing number of studies, however, have reported single loci having large effects on quantitative traits. Such loci are referred to as major loci. Examples are the prolactin (Cowan et al, 1990) and the weaver loci (Hoeschele and Meinert, 1990) in dairy cattle, and the halothane sensitivity locus (Eikelenboom et al, 1980) and a locus acting on "Napole" yield (Le Roy et al, 1990), a pork quality trait, in pigs. Only in the case of the halothane locus has the responsible gene been identified and procedures for its genotyping become available (MacLennan and Phillips, 1992).
There is no difficulty with genetic evaluation for traits controlled by a major locus and polygenes when major locus genotypes are known. A fixed major locus effect has to be added to the linear model and major locus effects and polygenic breeding values can be estimated by the usual mixed model equations (Kennedy et al, 1992). When genotypes are unknown, however, satisfactory statistical methods are still lacking. Selection decisions could possibly be based on animal models that include the major locus effects in the polygenic part of the model. In cases where the allele has some positive effect on 1 trait but negative effects on others, it would be desirable to have separate estimates of the major locus and polygenic effects available. The 2 estimates would then be combined according to the breeding objective. Because genotyping of all the animals of a population is likely to be too expensive if at all possible, statistical methods are required that estimate major locus genotype effects as well as polygenic effects and major locus genotype probabilities for each candidate.
Such a method was first proposed in human genetics by Elston and Stewart (1971). The unknown parameters of the model are estimated by maximizing the likelihood of the data. For models with both major locus and polygenic effects, exact calculations are very expensive and become infeasible for pedigrees with more than about 15 individuals. Several studies compared the power of different approximations of the likelihood function to detect a major locus in half-sib family structures in animal breeding data (Elsen and Le Roy, 1989; Knott et al, 1992a). Hoeschele (1988) developed an iterative procedure to estimate major locus genotype probabilities and effects as well as polygenic breeding values. The equations produced for the estimation of genotype probabilities were derived for simple population structures and were based on an approximation of the likelihood function. Kinghorn et al (1993) used the iterative algorithm of van Arendonk et al (1989) to estimate genotype probabilities and estimated genotype effects by regression on genotype probabilities. A method was proposed to correct for the bias inherent in such analyses.
The objectives of this study were: i) to derive exact maximum likelihood equations to estimate major locus genotype probabilities and effects for a quantitative trait with mixed major locus and polygenic inheritance without any restrictions on population structure; ii) to examine possible approximations; and iii) to compare these approximations with the methods of Hoeschele (1988) and Kinghorn et al (1993) by stochastic computer simulation.

Model
Consider a quantitative trait which is controlled by 1 autosomal major locus with 2 alleles, A and a, and many other unlinked loci with alleles of small effects. Mendelian segregation is assumed for all alleles at all loci. The allele with the major effect, A, has a frequency of $p$ in the base population, which is assumed to be unselected, not inbred and in Hardy-Weinberg and gametic equilibria. In the base population the 3 possible genotypes at the major locus (AA, Aa and aa), which will be denoted as 1, 2 and 3 throughout this paper, are therefore expected to occur in frequencies of $p^2$, $2p(1-p)$ and $(1-p)^2$, respectively. Because genotyping of animals might be impossible or too expensive, we assume for the moment that the genotypes at the major locus are not known. With 1 observation per animal the following mixed linear model can be formulated:

$$\mathbf{y} = \mathbf{Xb} + \mathbf{ZTg} + \mathbf{Za} + \mathbf{e} \qquad [1]$$

where
y = observation vector
b = vector of non-genetic fixed effects
g = vector of fixed major locus genotype effects, $[g_1\ g_2\ g_3]'$
a = vector of random polygenic breeding values
e = vector of random errors
X, Z = known incidence matrices
T = unknown incidence matrix indicating true major locus genotypes of all the animals in the population

The expectation and variance of the random variables are assumed to be:

$$E\begin{bmatrix}\mathbf{a}\\ \mathbf{e}\end{bmatrix} = \begin{bmatrix}\mathbf{0}\\ \mathbf{0}\end{bmatrix}, \qquad \mathrm{Var}\begin{bmatrix}\mathbf{a}\\ \mathbf{e}\end{bmatrix} = \begin{bmatrix}\mathbf{A}\sigma_a^2 & \mathbf{0}\\ \mathbf{0} & \mathbf{I}\sigma_e^2\end{bmatrix}$$

where A is the numerator relationship matrix. The linear model is mixed in both the statistical sense (Henderson, 1984), as it contains fixed and random effects, and the genetic sense (Morton and MacLean, 1974), as it contains a single locus and a polygenic effect. Strictly additive gene action of the polygenes is assumed but dominance is allowed for at the major locus. In order to keep the model simple, it is further assumed that the variance components $\sigma_a^2$ and $\sigma_e^2$ are known. This assumption implies that the genetic variance caused by polygenes is known but not the genetic variation caused by the segregating major allele, which is determined by the major genotype effects and frequencies. This critical assumption has to be kept in mind when discussing the simulation results.

Likelihood function
The likelihood for mixed model [1] was first discussed by Elston and Stewart (1971).
The likelihood can be written as:

$$L(\mathbf{y}) = \sum_{T} f(\mathbf{y}|\mathbf{T})\,\Pr(\mathbf{T}|p) \qquad [2]$$

where

$$f(\mathbf{y}|\mathbf{T}) = c_1 \exp\left\{-\frac{1}{2\sigma_e^2}(\mathbf{y}-\mathbf{Xb}-\mathbf{ZTg})'\mathbf{V}^{-1}(\mathbf{y}-\mathbf{Xb}-\mathbf{ZTg})\right\}$$

is a normal density with $\mathrm{Var}(\mathbf{y}) = \mathbf{V}\sigma_e^2$, and $\Pr(\mathbf{T}|p)$ is the probability of T given the allele frequency $p$ and the pedigree information. Because variance components are assumed to be known, $c_1 = (2\pi)^{-n_o/2}\,|\mathbf{V}\sigma_e^2|^{-1/2}$, with $n_o$ as the number of observations, is a constant. Following Elston and Stewart (1971), $\Pr(\mathbf{T}|p)$ can be computed as a product of probabilities:

$$\Pr(\mathbf{T}|p) = \prod_{i=1}^{N} \Pr(t_i|t_s,t_d)$$

where $N$ is the total number of animals in the population and $\Pr(t_i|t_s,t_d)$ is the probability of animal $i$ having the genotype indicated by $t_i$, the $i$th row of T, given the genotypes of its parents $s$ and $d$, and is assumed to be known. Elston and Stewart (1971) give $\Pr(t_i|t_s,t_d)$ for autosomal and sex-linked loci. When the parents are unknown, $\Pr(t_i|t_s,t_d)$ is replaced by the frequency of the genotype $t_i$ in the base population. Known major locus genotypes can be accommodated by setting $\Pr(t_i|t_s,t_d)$ to zero whenever $t_i$ conflicts with the known genotype of animal $i$. With the base population (animals with unknown parents) in Hardy-Weinberg equilibrium, $\Pr(\mathbf{T}|p)$ can be written as:

$$\Pr(\mathbf{T}|p) = p^{2n_1}\,[2p(1-p)]^{n_2}\,(1-p)^{2n_3} \prod_{i=n_b+1}^{N} \Pr(t_i|t_s,t_d)$$

where $n_1$, $n_2$ and $n_3$ are the number of base animals of genotype AA, Aa and aa, respectively, and $n_b = n_1 + n_2 + n_3$ is the total number of base animals.
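The product form of $\Pr(\mathbf{T}|p)$ can be illustrated with a minimal Python sketch for an autosomal locus. The function names, the pedigree encoding and the example trio are assumptions made for this illustration; they are not taken from Elston and Stewart (1971) or from the present study.

```python
from itertools import product

# Genotype codes follow the text: 1 = AA, 2 = Aa, 3 = aa.
# Probability that a parent of each genotype transmits allele A.
TRANSMIT_A = {1: 1.0, 2: 0.5, 3: 0.0}


def base_freq(g, p):
    """Hardy-Weinberg frequency of genotype g for allele frequency p."""
    return {1: p * p, 2: 2 * p * (1 - p), 3: (1 - p) ** 2}[g]


def offspring_prob(g, gs, gd):
    """Pr(t_i | t_s, t_d) under Mendelian segregation at an autosomal locus."""
    a_s, a_d = TRANSMIT_A[gs], TRANSMIT_A[gd]
    return {1: a_s * a_d,
            2: a_s * (1 - a_d) + (1 - a_s) * a_d,
            3: (1 - a_s) * (1 - a_d)}[g]


def pedigree_prob(genotypes, pedigree, p):
    """Pr(T | p) as a product over animals.

    pedigree[i] is a (sire, dam) pair of indices, or None for a base animal
    (hypothetical encoding chosen for this sketch).
    """
    prob = 1.0
    for i, parents in enumerate(pedigree):
        if parents is None:
            prob *= base_freq(genotypes[i], p)
        else:
            s, d = parents
            prob *= offspring_prob(genotypes[i], genotypes[s], genotypes[d])
    return prob


# Hypothetical trio: sire (0), dam (1) and one progeny (2).
pedigree = [None, None, (0, 1)]
prob_example = pedigree_prob((2, 2, 1), pedigree, 0.25)  # Aa x Aa -> AA
total = sum(pedigree_prob(t, pedigree, 0.25)
            for t in product((1, 2, 3), repeat=3))
print(prob_example, total)
```

Configurations that conflict with the pedigree, such as an aa progeny from two AA parents, receive probability zero, and the probabilities of all 27 configurations of the trio sum to 1.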
With 3 possible genotypes the sum in [2] is over $3^N$ elements. For 20 animals the sum is already over $3.5 \times 10^9$ possible incidence matrices T. Whenever T conflicts with the pedigree information, $\Pr(\mathbf{T}|p)$ is zero. Therefore, depending on the pedigree structure, a large number of the elements to sum are zero, but there remains a considerable number of non-zero elements.
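The growth of the summation in [2] can be checked directly. The short sketch below counts the candidate incidence matrices for a few pedigree sizes; the function name is illustrative only.

```python
def n_incidence_matrices(n_animals, n_genotypes=3):
    """Number of genotype configurations T in the sum of the likelihood."""
    return n_genotypes ** n_animals


for n in (5, 10, 15, 20):
    print(n, n_incidence_matrices(n))
# the last line printed is: 20 3486784401
```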
As pointed out by Elston and Stewart (1971), the 3 likelihoods conditional on an animal's genotype $t_i$ are proportional to the probabilities of animal $i$ having 1 of the 3 possible genotypes. The conditional likelihoods can be obtained by skipping animal $i$ in the summation over all possible incidence matrices T.

Maximum likelihood estimation
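In order to maximize $L(\mathbf{y})$, we need the first derivatives with respect to b, g and $p$. The probability of T given the data and the parameters of the model will be denoted $w_T$ and can be computed as

$$w_T = c_2 \exp\left\{-\frac{1}{2\sigma_e^2}(\mathbf{y}-\mathbf{Xb}-\mathbf{ZTg})'\mathbf{V}^{-1}(\mathbf{y}-\mathbf{Xb}-\mathbf{ZTg})\right\}\Pr(\mathbf{T}|p)$$

where $c_2$ is the product of $c_1$ and a scaling factor such that $\sum_T w_T = 1$. Note that without scaling this sum is equal to the likelihood $L(\mathbf{y})$. After setting the derivatives to zero and rearranging we get 2 equations, [3] for b and g and [4] for $p$. Solving for $p$ in the last equation leads to:

$$p = \frac{\sum_T w_T\,(2n_1 + n_2)}{2n_b}$$

This equation can be rewritten by replacing $2n_1 + n_2$ by $\mathbf{v}_b'\,\mathbf{T}\,[2\ 1\ 0]'$, with $\mathbf{v}_b'$ a row vector of length $N$ with ones for base animals and zeros for the other animals.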
Because $w_T$ depends on b, g and $p$, equations [3] and [4] have to be solved iteratively. Let $w_T^r$ be $w_T$ with solutions for b, g and $p$ after round $r$ replacing the true values and $\mathbf{Q}^r = \sum_T w_T^r \mathbf{T}$. Note that the $ik$th element of $\mathbf{Q}^r$ at convergence is an estimate of the probability that animal $i$ is of genotype $k$ given the data and the estimates for the fixed effects b, the major locus effects g and the allele frequency $p$. As mentioned above, the same estimate can be obtained by calculating likelihoods conditional on an animal's 3 genotypes. Using these definitions, equations [3] and [4] can be written as [5] and [6], the latter being:

$$p^{r+1} = \frac{\mathbf{v}_b'\,\mathbf{Q}^r\,[2\ 1\ 0]'}{2n_b} \qquad [6]$$

where $\mathbf{v}_b'$ is the row vector with ones for base animals and zeros otherwise and $n_b$ is the number of base animals. The solutions for $\mathbf{b}^r$, $\mathbf{g}^r$ and $p^r$ converge to maximum likelihood (ML) estimates.
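For a very small pedigree, the weights $w_T$, the genotype probabilities Q and the allele frequency update from base animals can be obtained by brute-force enumeration. The sketch below is only an illustration under a strong simplifying assumption that is not made in this study: polygenic effects are ignored, so that residuals are independent and the density factorises over animals. The pedigree encoding, the data and all function names are hypothetical.

```python
from itertools import product
from math import exp

TRANSMIT_A = {1: 1.0, 2: 0.5, 3: 0.0}  # Pr(parent transmits A); 1=AA, 2=Aa, 3=aa


def prior(t, pedigree, p):
    """Pr(T | p) for one genotype configuration t."""
    hw = {1: p * p, 2: 2 * p * (1 - p), 3: (1 - p) ** 2}
    prob = 1.0
    for i, parents in enumerate(pedigree):
        if parents is None:
            prob *= hw[t[i]]
        else:
            a_s, a_d = TRANSMIT_A[t[parents[0]]], TRANSMIT_A[t[parents[1]]]
            prob *= {1: a_s * a_d,
                     2: a_s * (1 - a_d) + (1 - a_s) * a_d,
                     3: (1 - a_s) * (1 - a_d)}[t[i]]
    return prob


def genotype_probs(y, pedigree, g, p, var_e):
    """Return Q (one row per animal) by summing w_T over all 3^N configurations.

    Assumption of this sketch: no polygenic covariance between animals.
    """
    n = len(y)
    q = [[0.0, 0.0, 0.0] for _ in range(n)]
    total = 0.0
    for t in product((1, 2, 3), repeat=n):
        ss = sum((y[i] - g[t[i] - 1]) ** 2 for i in range(n))
        w = exp(-0.5 * ss / var_e) * prior(t, pedigree, p)
        total += w                      # unscaled sum equals the likelihood
        for i in range(n):
            q[i][t[i] - 1] += w
    return [[x / total for x in row] for row in q]


def update_p(q, pedigree):
    """Allele frequency from base animals only: sum(2*q_i1 + q_i2) / (2*n_b)."""
    base = [i for i, par in enumerate(pedigree) if par is None]
    return sum(2 * q[i][0] + q[i][1] for i in base) / (2 * len(base))


# Hypothetical data: sire, dam and 2 full sibs, genotype effects 2, 1, 0.
pedigree = [None, None, (0, 1), (0, 1)]
y = [2.1, 0.2, 1.1, 0.9]
q = genotype_probs(y, pedigree, g=[2.0, 1.0, 0.0], p=0.25, var_e=0.5)
p_new = update_p(q, pedigree)
print(q[0], p_new)
```

In a full iteration the new value of $p$ would be fed back together with updated solutions for b and g until convergence.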
Local maxima in $L(\mathbf{y})$ could pose a problem and will be discussed later. Hoeschele (1988) estimated the allele frequency from the genotype probabilities of all animals with records whereas [6] considers only base animals, which is in agreement with Ott (1979). Because genotype probabilities of base animals take information from their descendants into account, all information on the allele frequency in the base population is properly used by [6].
Animal breeders are not only interested in estimating major locus effects g and allele frequency $p$ but also in predicting polygenic breeding values a. This is usually done by regressing phenotypic observations corrected for fixed effects, equation [7] (Henderson, 1984). The same solutions for b, g and a are obtained by iterating on equations [8] together with [6] instead of using [5], [6] and [7]. Note that $\sum_T w_T^r\,\mathbf{T}'\mathbf{Z}'\mathbf{Z}\mathbf{T} = \mathrm{diag}(\mathbf{v}_o'\mathbf{q}_k^r) = \mathbf{D}^r$, where $\mathbf{v}_o'$ is a row vector containing the diagonal elements of $\mathbf{Z}'\mathbf{Z}$ and $\mathbf{q}_k^r$ is the $k$th column of $\mathbf{Q}^r$. The difficulty with this approach is that it is not feasible to compute $\mathbf{Q}^r$ and $\sum_T w_T^r\,\mathbf{T}'\mathbf{Z}'\mathbf{Z}\mathbf{M}\mathbf{Z}'\mathbf{Z}\mathbf{T}$ for large populations.

Approximations
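Above $\mathbf{Q}^r$ was defined as:

$$\mathbf{Q}^r = \sum_T w_T^r\,\mathbf{T} = c_2 \sum_T \exp\left\{-\frac{1}{2\sigma_e^2}(\mathbf{y}-\mathbf{Xb}^r-\mathbf{ZTg}^r)'\mathbf{V}^{-1}(\mathbf{y}-\mathbf{Xb}^r-\mathbf{ZTg}^r)\right\}\Pr(\mathbf{T}|p^r)\,\mathbf{T}$$

There are 2 problems associated with the computation of $\mathbf{Q}^r$. Firstly, the summation is over all possible incidence matrices T and, secondly, a quadratic form involving $\mathbf{V}^{-1}$ has to be computed for each element in this sum. It can be shown that the following is an equivalent expression not involving $\mathbf{V}^{-1}$:

$$\mathbf{Q}^r = c_2 \sum_T \exp\left\{-\frac{1}{2\sigma_e^2}(\mathbf{y}-\mathbf{Xb}^r-\mathbf{ZTg}^r)'(\mathbf{y}-\mathbf{Xb}^r-\mathbf{ZTg}^r-\mathbf{Z}\hat{\mathbf{a}}_T^r)\right\}\Pr(\mathbf{T}|p^r)\,\mathbf{T}$$

where $\hat{\mathbf{a}}_T^r = \mathbf{M}\mathbf{Z}'(\mathbf{y}-\mathbf{Xb}^r-\mathbf{ZTg}^r)$. Because $\hat{\mathbf{a}}_T^r$ depends on T, we would have to compute $\hat{\mathbf{a}}_T^r$ for every possible T, which is not feasible. In order to simplify the computations, we could replace $\hat{\mathbf{a}}_T^r$ by $\hat{\mathbf{a}}^r$, which does not depend on T. Note that $\hat{\mathbf{a}}^r = \sum_T w_T^r\,\hat{\mathbf{a}}_T^r$. This approximation was also considered by Hoeschele (1988). The approximated $\mathbf{Q}^r$ is then:

$$\mathbf{Q}^r \approx c_2 \sum_T \exp\left\{-\frac{1}{2\sigma_e^2}(\mathbf{y}-\mathbf{Xb}^r-\mathbf{ZTg}^r)'(\mathbf{y}-\mathbf{Xb}^r-\mathbf{ZTg}^r-\mathbf{Z}\hat{\mathbf{a}}^r)\right\}\Pr(\mathbf{T}|p^r)\,\mathbf{T} \qquad [9]$$

Instead of using a single estimate of the polygenic breeding value for each animal irrespective of its genotype, we could use 3 values for each animal depending on its genotype but independent of the genotypes of all the other animals. A similar approximation was considered by Elsen and Le Roy (1989) and Knott et al (1992a, 1992b) for a sire model and was found to be superior to [9]. We considered approximation [10], in which $\hat{a}_{ik}^r$, the polygenic value of animal $i$ with genotype $k$, is calculated from $\mathbf{x}_i$ and $\mathbf{t}_{ik}$, the $i$th rows of X and ZT, $a^{ij}$, the $ij$th element of $\mathbf{A}^{-1}$, and $c_{ii}$, the diagonal element of the coefficient matrix in [8] pertaining to the $i$th animal equation.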
The summation over all possible incidence matrices T in [9] or [10] can be avoided by using algorithms developed to estimate genotype probabilities. Here, the iterative algorithm of van Arendonk et al (1989) was applied. This procedure will be briefly described in the next section.
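As with $\mathbf{Q}^r$, the difficulty with the expression $\sum_T w_T^r\,\mathbf{T}'\mathbf{Z}'\mathbf{Z}\mathbf{M}\mathbf{Z}'\mathbf{Z}\mathbf{T}$ is two-fold: the sum is over all possible T, and the computation of each element in that sum is expensive. Let $m_{ij}$ be the $ij$th element of $\mathbf{Z}'\mathbf{Z}\mathbf{M}\mathbf{Z}'\mathbf{Z}$, and $t_{ik}$ ($t_{jl}$) be the elements of T for animal $i$ ($j$) and genotype $k$ ($l$). Now, the $kl$th element of $\sum_T w_T^r\,\mathbf{T}'\mathbf{Z}'\mathbf{Z}\mathbf{M}\mathbf{Z}'\mathbf{Z}\mathbf{T}$ can be calculated as:

$$\sum_i \sum_j m_{ij} \sum_T w_T^r\, t_{ik}\, t_{jl}$$

Note that at convergence $\sum_T w_T^r\, t_{ik}\, t_{jl}$ is an estimate of the probability that animal $i$ is of genotype $k$ and animal $j$ of genotype $l$, given the data. For independent animals this quantity is equal to $q_{ik}^r q_{jl}^r$, the product of the corresponding elements in $\mathbf{Q}^r$ and, therefore, the contributions of $\sum_T w_T^r\,\mathbf{T}'\mathbf{Z}'\mathbf{Z}\mathbf{M}\mathbf{Z}'\mathbf{Z}\mathbf{T}$ and $\mathbf{Q}^{r\prime}\mathbf{Z}'\mathbf{Z}\mathbf{M}\mathbf{Z}'\mathbf{Z}\mathbf{Q}^r$ to $\mathbf{B}^r$ cancel out. For dependent animals the contributions to the $kl$th element of $\mathbf{B}^r$ do not cancel. If we neglect the dependencies between animals for the computation of $\sum_T w_T^r\, t_{ik}\, t_{jl}$, we get approximation [11], and [8] becomes identical to the mixed model equations given by Hoeschele (1988).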
Another way to approximate $\mathbf{B}^r$ is to assume that A = I, which gives approximation [12] and simplifies $\mathbf{B}^r$.

Estimation of genotype probabilities

Van Arendonk et al (1989) developed an iterative algorithm to estimate genotype probabilities for discrete phenotypes. Kinghorn et al (1993) applied this algorithm to continuous traits. The comparison of this algorithm with non-iterative methods revealed some errors in the formulae given in the original paper (LLG Janss and JAM van Arendonk, 1991; C Stricker, 1992; personal communications). We applied a corrected version of this algorithm.
For each animal, genotype probabilities from 3 different sources of information are computed using approximation [9] or [10]. One round of iteration involves 3 steps. First genotype probabilities are computed using information from parents and collateral relatives proceeding from the oldest to the youngest animal. In the second step, genotype probabilities are calculated using information from the progeny proceeding from the youngest to the oldest animal. Finally, genotype probabilities using information from each individual performance are calculated and the 3 sources of information combined. The iteration process is stopped when the solutions for genotype probabilities reach a given convergence criterion.
The algorithm works for simple pedigree structures such as those simulated in this study but does not allow for loops in the pedigree, also known as cycles (Lange and Elston, 1975). Loops in a pedigree occur through genetic paths (inbreeding loops), mating paths, or a combination of the 2 (marriage loops), eg, a sire mated to 2 genetically related dams. Both inbreeding and marriage loops are common in animal breeding data. A non-iterative algorithm for pedigrees without loops was recently proposed, which should be more efficient than the one used in this study (Fernando et al, 1993).

Method of Hoeschele (1988)

Hoeschele (1988) used a Bayesian approach to derive an iterative procedure to estimate genotype probabilities Q, allele frequency $p$ and major locus effects g for simple pedigree structures. The genotype probabilities were estimated by formulae that were developed for the specific pedigree structures considered using approximation [9]. In contrast to [6], Hoeschele (1988) estimated the allele frequency from the genotype probabilities of all animals with records:

$$p^{r+1} = \frac{\mathbf{v}_o'\,\mathbf{Q}^r\,[2\ 1\ 0]'}{2n_o} \qquad [13]$$

where $n_o$ is the number of animals with records and $\mathbf{v}_o'$ is a row vector with ones for animals with records and zeros otherwise. The equations that estimate the effects of model [1] are the same as [8] approximated with [11]. We applied this method in the simulation study using the iterative algorithm described above but with approximation [9] to estimate genotype probabilities instead of the formulae given by Hoeschele.

Method of Kinghorn et al (1993)
In least-squares analysis it is usually assumed that all independent variables are known without error. When independent variables are measured with some error, the least-squares estimates are biased (see, for example, Johnston, 1984, p 428). Kinghorn et al (1993) treated the unknown incidence matrix T as the unknown true independent variable and the genotype probabilities Q as an estimate for T associated with some errors. Using Q instead of T in the model leads to biased estimates $\hat{\mathbf{g}}^*$. Kinghorn et al (1993) derived a correction matrix W, such that $\hat{\mathbf{g}} = \mathbf{W}^{-1}\hat{\mathbf{g}}^*$. Given certain assumptions, they showed that $\mathbf{W} = \mathbf{V}_q^{-1}\mathbf{V}_t$, where $\mathbf{V}_t$ is a 3 x 3 covariance matrix of elements in the 3 columns of T and $\mathbf{V}_q$ is the corresponding covariance matrix of elements in the 3 columns of Q. Because (co)variances in $\mathbf{V}_q$ are generally smaller than (co)variances in $\mathbf{V}_t$, major locus effects are overestimated in absolute terms when using Q instead of T. The (co)variances in $\mathbf{V}_q$ were calculated from the actual solutions for estimates of genotype probabilities of all animals with records. Covariances in $\mathbf{V}_t$ were computed with [14] from $\bar{q}_{.k}$, the average genotype probability for genotype $k$ of all animals with records, which can be regarded as an estimate of the frequency of that genotype in the population. Genotype probabilities were estimated with the algorithm of van Arendonk et al (1989). This algorithm requires the allele frequency $p$ as an input parameter. Kinghorn et al (1993) kept the initial value for $p$ constant over all iterations, ie regarded the initial $p$ as the true value. But if $p$ was known, $\mathrm{Cov}(t_k,t_l)$ could also be derived from the expected frequencies of the 3 genotypes. In our implementation $\mathrm{Cov}(t_k,t_l)$ was computed with [14] and the allele frequency $p$ was estimated with [13], which is a natural deduction from [14]. The linear model can be written in matrix notation as:

$$\mathbf{y} = \mathbf{Xb}^* + \mathbf{ZQWg} + \mathbf{Za}^* + \mathbf{e}^*$$

Kinghorn et al (1993) assumed that $\mathrm{Var}(\mathbf{a}^*) = \mathrm{Var}(\mathbf{a}) = \mathbf{A}\sigma_a^2$ and $\mathrm{Var}(\mathbf{e}^*) = \mathrm{Var}(\mathbf{e}) = \mathbf{I}\sigma_e^2$.
The matrices Q and W are not known and have to be estimated from the data as described above. Therefore, the resulting system of mixed model equations has to be solved iteratively. Estimates for g should be unbiased but estimates for b and a are still biased. We attempted to correct for the bias in b by adding $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Z}\mathbf{Q}(\mathbf{W}-\mathbf{I})\hat{\mathbf{g}}^{r+1}$, the expected difference between $\mathbf{b}^{r+1}$ and $\mathbf{b}^{*r+1}$ under the assumptions $E(\mathbf{T}) = E(\mathbf{Q})$, $E(\mathbf{a}-\mathbf{a}^*) = \mathbf{0}$ and $E(\mathbf{e}-\mathbf{e}^*) = \mathbf{0}$, to the current solution $\hat{\mathbf{b}}^{*r+1}$.

Simulation
The methods of Hoeschele (1988) and Kinghorn et al (1993) were compared with the method developed in this study applying approximations [10] and [12] using stochastic computer simulation. Phenotypic observations were generated by using the following mixed model:

$$y_{ijk} = hys_i + g_j + a_{ijk} + e_{ijk}$$

where $hys_i$ is the fixed effect of herd x year x sex $i$, $g_j$ is the fixed effect of major locus genotype $j$, $a_{ijk}$ is the polygenic breeding value and $e_{ijk}$ is the random residual effect. The effects in the model were sampled as follows: $\{hys_i\} \sim N(0, \mathbf{I}\sigma_{hys}^2)$, $\{a_{ijk}\} \sim N(0, \mathbf{A}\sigma_a^2)$ and $\{e_{ijk}\} \sim N(0, \mathbf{I}\sigma_e^2)$. Major locus genotypes were simulated with 2 segregating alleles. Genotypes of base animals were generated by sampling 2 alleles from a uniform distribution between 0.0 and 1.0 with threshold $p$, the frequency of allele A. Genotypes of progeny were determined according to Mendelian segregation. The effect of genotype 3 was set to zero as there is a dependency between fixed herd x year x sex and major locus effects.
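The sampling of major locus genotypes described above can be sketched in a few lines of Python. This is an illustration only, not the simulation program used in the study; the function names and the use of Python's `random.Random` generator are assumptions.

```python
import random

# Genotype codes follow the text: 1 = AA, 2 = Aa, 3 = aa.
CODE_FROM_N_A = {2: 1, 1: 2, 0: 3}


def sample_base_genotype(p, rng):
    """Draw 2 alleles; each is A when a uniform(0, 1) draw falls below p."""
    n_a = sum(rng.random() < p for _ in range(2))
    return CODE_FROM_N_A[n_a]


def transmits_a(genotype, rng):
    """Return True when a parent of the given genotype passes on allele A."""
    if genotype == 1:
        return True
    if genotype == 3:
        return False
    return rng.random() < 0.5  # heterozygote: either allele with equal chance


def progeny_genotype(sire_genotype, dam_genotype, rng):
    """Mendelian segregation: one allele from each parent."""
    n_a = int(transmits_a(sire_genotype, rng)) + int(transmits_a(dam_genotype, rng))
    return CODE_FROM_N_A[n_a]


rng = random.Random(1)
base = [sample_base_genotype(0.25, rng) for _ in range(20000)]
freq_a = sum({1: 2, 2: 1, 3: 0}[g] for g in base) / (2 * len(base))
print(freq_a)  # realised frequency of allele A among the sampled base animals
```

With a large number of base animals the realised frequency of allele A is close to the threshold $p$; in a population with only 220 base animals it varies noticeably between replicates.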
Three different sets of parameters were used (table I). Only additive effects of the major locus were considered, although all of the methods compared allow for dominance. In the first set of parameters, 50% of the phenotypic variance (variance due to major locus + polygenic variance + residual variance) is due to genetic effects, 75% of the genetic variance is due to the major locus, and 25% is due to the polygenes. The frequency of allele A with major effect is 25% in the base population, which results in an allele substitution effect $\alpha$ of 1.0, ie genotype effects of 2.0 (AA), 1.0 (Aa) and 0 (aa). In parameter set 2, the allele frequency $p$ is 0.5, but the genotype effects and all the other parameters are the same as in set 1. Thus the variance due to the major locus is increased from 0.375 to 0.5, and the phenotypic variance changes from 1.0 to 1.125. In parameter set 3, the allele frequency $p$ is 0.25 and 50% of the phenotypic variance is due to genetic effects, as in parameter set 1, but the proportion of genetic variance due to the polygenes is increased from 25 to 40%, which results in an allele substitution effect $\alpha$ of 0.8.
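The variances quoted for parameter sets 1 and 2 follow from the standard expression $2p(1-p)\alpha^2$ for the variance caused by an additive biallelic locus in Hardy-Weinberg equilibrium. The sketch below reproduces this arithmetic; the function name is illustrative, and the polygenic and residual variances of 0.125 and 0.5 are those implied by the proportions given for set 1.

```python
def major_locus_variance(p, alpha):
    """Variance due to an additive biallelic locus in Hardy-Weinberg equilibrium."""
    return 2 * p * (1 - p) * alpha ** 2


polygenic_var, residual_var = 0.125, 0.5   # implied by parameter set 1

var_set1 = major_locus_variance(0.25, 1.0)
var_set2 = major_locus_variance(0.50, 1.0)
phen_set1 = var_set1 + polygenic_var + residual_var
phen_set2 = var_set2 + polygenic_var + residual_var

print(var_set1, phen_set1)   # 0.375 1.0
print(var_set2, phen_set2)   # 0.5 1.125
```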
Because the algorithm to estimate genotype probabilities used in this study does not allow for complex pedigrees, the structure of the simulated population is very simple. In each of 10 herds, 20 base dams each had a record in year 1. A group of 20 base sires each with their own record in a common herd x year (eg test station) was mated with these base dams. Each sire was randomly mated with 1 dam in each herd. Each mating produced 5 progeny in year 2. The sex of each progeny was determined by sampling from a uniform distribution between 0.0 and 1.0 with threshold 0.5. The population size was 1220, made up of 220 base animals and 1 000 progeny.
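The mating design can be written down as a small pedigree generator, which also confirms the population size. The sketch below is one possible reading of the design, in which the 20 sires are paired at random with the 20 dams of each herd; the function name and the pedigree encoding are assumptions made for illustration.

```python
import random


def build_pedigree(rng, n_herds=10, n_sires=20, progeny_per_mating=5):
    """Return a list with None for base animals and (sire, dam) for progeny."""
    pedigree = [None] * n_sires                 # base sires: ids 0 .. n_sires-1
    matings = []
    for _ in range(n_herds):
        first = len(pedigree)
        dams = list(range(first, first + n_sires))
        pedigree.extend([None] * n_sires)       # base dams of this herd
        rng.shuffle(dams)                       # random sire-dam pairing
        matings.extend(zip(range(n_sires), dams))
    for sire, dam in matings:
        pedigree.extend([(sire, dam)] * progeny_per_mating)
    return pedigree


pedigree = build_pedigree(random.Random(0))
n_base = sum(parents is None for parents in pedigree)
n_progeny = len(pedigree) - n_base
print(len(pedigree), n_base, n_progeny)  # 1220 220 1000
```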
In each of the alternatives, the same sequence of random numbers was used. Therefore, identical data sets were analysed with each of the 3 methods considered. Each alternative was replicated 25 times.
With each of the 3 methods, final solutions are obtained by repeatedly computing genotype probabilities and solving a system of equations to get new solutions for major genotype effects and polygenic breeding values. A stopping criterion of the form: was used for major genotype effects g and the allele frequency p.

RESULTS
When the genotypes of all animals with records are known, the estimates for major locus effects g are identical for all 3 methods considered (table II). Estimates for the allele frequency $p$, however, differed slightly. Using formula [13] (Hoeschele, 1988; Kinghorn et al, 1993) the standard deviations (SD) of estimated $p$ were larger than with [6]. The estimates for g and $p$ agree well with the true values.
Estimates of g across parameter sets are consistently slightly larger than the true values, which can be explained by sampling effects and the fact that for each of the 25 replicates, data for the 3 parameter sets were generated with the same set of random numbers. As expected from the heritabilities, the correlations between true and predicted breeding values were the same for parameter sets 1 and 2 and slightly higher for parameter set 3. The correlations between predicted breeding values and estimated major locus effects were close to zero, showing that the 2 effects were well separated in all cases.

Table III shows the simulation results for the 3 parameter sets using all 3 procedures when major locus genotypes were unknown. For parameter sets 1 and 2, estimates of major locus effects g were close to the true values or slightly underestimated with approximated maximum likelihood (AML), underestimated by about 20% with the method of Hoeschele (1988) and overestimated by 25 to 30% with the method of Kinghorn et al (1993). For parameter set 3, estimates of major locus effects g were zero for 2 replicates using AML and for 21 replicates using the method of Hoeschele (1988). Non-zero estimates of g were biased upwards by 14% with AML and by 47% with the method of Kinghorn et al (1993). Both AML and the method of Hoeschele (1988) showed a large variability of the non-zero estimates of major locus effects for parameter set 3. When the true allele frequency was 0.25 the allele frequency $p$ was substantially underestimated with AML, but estimated quite well with the other 2 methods. Correlations between true and predicted breeding values were similar for AML and the method of Hoeschele (1988), but zero for the method of Kinghorn et al (1993). For parameter sets 1 and 2, the correlations between true (Tg) and estimated ($\mathbf{Q}\hat{\mathbf{g}}$) major locus effects were similar for all 3 methods.
When major locus effects were smaller (parameter set 3) these correlations were largest with the method of Kinghorn et al (1993). Predicted breeding values were positively correlated to estimated major locus effects Qg with AML and to a larger extent with the method of Hoeschele (1988). Using the method of Kinghorn et al (1993) these correlations were strongly negative.
Because poor estimation of $p$ also affects all the other estimates, additional simulations were done with the allele frequency fixed at the true (expected) value. Results are reported in table IV for AML and the method of Hoeschele (1988) for parameter sets 1 and 3. All other results were close to those of table III and are therefore not shown. Major locus effects g were underestimated less with AML and the correlations were similar for both methods. For parameter set 3, the number of replicates with estimates of zero for major locus effects was again much larger with the method of Hoeschele (1988).

With genotypes of all the sires and 50% of the dams known (table V), the method of Hoeschele (1988) underestimated major locus effects considerably more than AML (9 to 31% versus 1 to 11%), whereas these effects were overestimated by 22 to 43% with the method of Kinghorn et al (1993). The accuracies of predicted breeding values were again similar for AML and the method of Hoeschele (1988) but much lower for the method of Kinghorn et al (1993). The accuracies of estimated genetic values at the major locus were similar for all 3 methods with a tendency of lower accuracies for the method of Kinghorn et al (1993). When all the sires but none of the dams were genotyped the results, which are not reported here, were intermediate between the 2 cases of no animals and all sires plus 50% of the dams genotyped.

So far, final solutions have been reported for iterations where starting values were equal to true (expected) values. Table VI shows the number of replicates that converged to the same solutions using different starting values. Low starting values were half the true values and high starting values were 1.5 times the true values of major locus effects g and allele frequency $p$. When major locus genotypes were not known, none to a few replicates converged to a single set of solutions with all 3 different starting values.
For the method of Hoeschele (1988) with parameter set 3, most of the replicates that converged to the same solutions converged to an estimate of zero for major locus effects g. For AML and the method of Hoeschele (1988), all replicates with 1 exception converged to 1 set of solutions when genotypes of all the sires (but none of the dams) were known. The largest number of replicates with all 3 solutions different was found with the method of Kinghorn et al (1993).

DISCUSSION
The method proposed here (AML) generally slightly underestimates major locus effects g and seriously underestimates allele frequency $p$ when the true frequency is 0.25. The underestimation of $p$ leads to increased estimates of g, although not to the extent that the variance explained by the major locus stays constant (tables III and IV). This variance is higher when the allele frequency is fixed at the true value. The allele frequency was still considerably underestimated for parameter set 1 when the population size was 10 times larger than considered here (results not shown). The allele frequency was estimated by [6], which was derived by maximizing the likelihood of the data, whereas the other 2 methods used [13]. Additional simulation runs with parameter sets 1 and 3 and approximations [9] and [11] together with [6] showed considerably lower estimates of $p$ and higher estimates of g than results for the same 2 approximations applied together with [13], the method of Hoeschele (1988) (results not shown). There seems to be a problem in applying [6] together with approximations [10] and [12] or, to a lesser extent, with [9] and [11]. Nevertheless [6] is the correct equation for the estimation of the allele frequency by maximum likelihood.

The method of Hoeschele (1988) consistently underestimated major locus effects g, which is in agreement with the simulation results of the same author. For smaller allele effects (parameter set 3), although still quite large, most of the estimates of g were zero, indicating that the genotype effects have to be large in order to be recognized. The same is true for AML, but to a lesser extent. There was a tendency for the accuracies of predicted polygenic breeding values ($\hat{\mathbf{a}}$) and estimated major locus effects ($\mathbf{Q}\hat{\mathbf{g}}$) to be slightly higher with AML than with the method of Hoeschele (1988). In an unselected population as simulated here the expected correlation between true polygenic and major locus effects is zero. The correlations between the 2 estimates were positive for both methods but in almost all cases they were lower with AML. This indicates that the 2 estimates are less confounded with AML. With selection a negative correlation between the true effects will build up (gametic disequilibrium), which will make separation of the 2 effects more difficult. For AML and the method of Hoeschele (1988), the mean correlations $r_{a,\hat{a}}$ were lower and $r_{\hat{a},Q\hat{g}}$ were higher when the allele frequency was 0.5 (parameter set 2) than when the same allele had a frequency of 0.25 (parameter set 1) (tables III and V). Although the proportion of variance explained by the major locus is higher with parameter set 2, it seems to be more difficult to separate polygenic and major locus effects with intermediate allele frequencies. This was also found by Knott et al (1992a) for similar approximations. For parameter sets 1 and 2, both methods showed a large reduction of 35 to 40% for $r_{a,\hat{a}}$ and 25 to 32% for $r_{Tg,Q\hat{g}}$ when genotypes were unknown rather than known (tables II and III).
With the method of Kinghorn et al (1993), estimates of the allele frequency $p$ were generally closer to the true values than with the other 2 procedures. However, major locus effects were overestimated and the correlations between true and predicted breeding values were close to zero, which is in agreement with their simulation results. The method attempts to correct for the bias inherent in major locus estimates by regression on the independent variable $\mathbf{Z}\mathbf{Q}^r$, an estimate from the data, which is associated with some error. The term $\mathbf{Z}\mathbf{Q}^r$ is postmultiplied by the correction matrix $\mathbf{W}^r$. $\mathbf{Z}\mathbf{Q}^r\mathbf{W}^r$ is then used the same way as a usual incidence matrix in the mixed model equations. Multiplication by $\mathbf{W}^r$ increases the variance of the independent variable to the variance expected for the unknown term ZT. Because $\mathbf{W}^r$ is calculated over all animals with records, the new variance is correct only on the average. For an animal with known genotype, the elements in $\mathbf{Q}^r$ are identical to the values in T and should therefore not be altered by $\mathbf{W}^r$. Sires had more progeny than dams, therefore their estimated genotype probabilities were closer to the true values and should have been multiplied by a matrix closer to an identity matrix in comparison to dams. In addition, breeding values estimated by [15] are still biased. These 2 problems are probably responsible for the overestimation of g and very poor prediction of polygenic breeding values. The performance of the method was, however, less affected by smaller allele effects (parameter set 3) than the other 2 procedures.
For all 3 procedures there was a problem of different solutions with different starting values when genotypes were unknown. For AML and the method of Hoeschele (1988) the cause could be the multimodality of the likelihood function.
It seems to be necessary to compute approximated likelihoods which then can be used to select the solutions with the highest likelihood. This could of course also be done with the method of Kinghorn et al (1993) but this method has no direct relationship with maximum likelihood.
In this study variance components were assumed to be known but in practice have to be estimated. Using incorrect values could lead to biased estimates of major genotype effects and frequencies. For example, using an underestimated genetic variance might result in an overestimation of the major genotype effects. If a major allele is known to be segregating, variance components free of major genotype effects would have to be estimated with model [1]. This could be very difficult because even when the true variance components were used, all 3 methods performed poorly when no animals were genotyped.
Clearly, none of the methods is satisfactory for a separate genetic evaluation for the major locus and the polygenes. In this study only large effects were considered.
AML and, especially, the method of Hoeschele (1988) were unable to detect smaller effects than used with parameter set 3. For example, the effects estimated for the prolactin locus in a Holstein sire family (Cowan et al, 1990) were much smaller than considered here. The method proposed has some potential for improvement. Future research should focus on the development of algorithms to estimate genotype probabilities without any restriction on pedigree structures. The estimation of joint genotype probabilities for any 2 pairs of animals together with sparse matrix techniques to compute the elements of M could avoid the need for some of the approximations made in this study.