The use of fixed effect models and mixed models to estimate single gene associated effects on polygenic traits

Single gene associated effects on polygenic traits may often be confounded with the effects of a non-random genetic relationship between individuals sharing a particular allele of the investigated gene. Two different statistical models are suggested to separate the single gene associated effects from the remaining additive genotype: a fixed effect model with ancestor variables and a mixed model with random effects of the additive genotypes of the individual animals (individual animal model). The use of the models is illustrated by an example from an experiment with the chicken major histocompatibility complex (MHC) gene region. single gene effects / fixed effect model / animal model / chicken / major histocom-patibility complex Résumé &mdash; L'utilisation des modèles à effets fixes et des modèles mixtes pour estimer des effets de gènes individuels sur des caractères polygéniques. Des effets de gènes individuels sur des caractères polygéniques sont souvent confondus avec des effets d'une relation génétique non aléatoire entre individus partageant un allèle étudié. Deux modèles statistiques différents sont proposés pour séparer les effets associés au gène unique du génotype additif restant: un modèle à effets fixes représentant les contributions des ancêtres et un modèle à effets aléatoires des génotypes additifs individuels (modèle individuel animal). L'emploi des modèles est illustré par une expérience impliquant la région génique du complexe d'histocompatibilité chez la poule. effet de gène individuel / modèle à effets fixes / modèle animal / poule / complexe majeur d'histocompatibilité


INTRODUCTION
The possibilities of detecting genetic polymorphism in domestic animals by analysing gene products or by direct DNA analysis are steadily improving. The utilization of this kind of information in selection programmes or by naked gene transfer techniques is dependent on an increased knowledge of single gene associated effects on polygenic traits. Such effects are often analysed by direct comparison of the average performance of individuals grouped by their genotype for the polymorphic gene. However, since relatives have an increased probability of sharing any particular allele, such groups are not always expected to be randomly related. The single gene associated effects may then be confounded with the effect of a systematic sampling of other unidentified genes affecting the investigated polygenic trait. The problem will be magnified in small, closed populations of animals with high reproductive rates. The obvious solution to this problem will be to restrict the analysis to comparisons of sibs or inbred lines segregating for the polymorphic gene. This, of course, also sets limits to the type of material that may be analysed and to the efficiency of the analysis. The need for statistical models that may separate single gene associated effects from the remaining genotype of any individual in a heterogeneous population is, therefore, obvious.
The present paper describes 2 models that may be used for this purpose.
Within certain limitations, they are applicable in most pedigreed populations. Individuals from several generations can be analysed together. It should be noted that the models are not designed to study the nature of the single gene associated effects. To distinguish between direct effects of the investigated gene and effects caused by linkage disequilibrium with other genes or to determine linkage distance, appropriate experiments should be carried out. However, if the investigated material can be divided into distinct subpopulations, applying the present models on each subpopulation separately will often result in variable estimates if the single gene associated effects are caused by linkage disequilibrium.

MODELS FOR ESTIMATION OF SINGLE GENE ASSOCIATED EFFECTS
If a random genetic relationship is assumed between individuals sharing the same genotype for the investigated gene, then the single gene associated effects can be analysed according to the following basic model of fixed effects.
where I i!!. is the polygenic trait performance of the kth individual in the ith non genetic fixed effect classification with the jth genotype for the investigated gene a is a constant S, is the effect of the ith non genetic fixed effect classification (herd, year, season, etc) G j is the effect of the jth genotype for the investigated gene C i ik is a random error.
Estimates of the G j parameters in the model may be obtained by least squares means analysis. The significances of the contrasts between the estimates may be tested according to standard general linear models procedures. The G j effects reflect the total effect associated with the two alleles constituting the genotype, both the independent effect associated with each allele and any interaction effects between them. Direct gene action may not be distinguished from linkage effects. If the total number of individuals recorded is n, the number of non-genetic fixed effect classifications is f, and the number of genotypes for the investigated gene is f2 , Model 0 may be written in matrix notation as follows: where Y is a (n x 1) vector of ll j k performance records X l is an (n x f ) incidence matrix for the constant, the S i effects and the G j effects ( f = 1 + f l + f 2 ) b l is a ( f x 1) vector including the constant, the S i effects and the G j effects e is an (n x 1) vector of random errors. As pointed out previously, a random genetic relationship within each G j group may normally not be assumed, and the G j estimates according to Model 0 may then be confounded.

Ancestor model
To avoid the confounding effects, the model should be extended to include independent parameters estimating the remaining additive genotype affecting the polygenic trait. Such independent parameters may be estimated when the investigated gene shows variation within family lines or family groups. If the genetic relationship between each of the individuals included in the analysis and each of the complete set of ancestors in a common base population is known, the basic principles of several general linear models estimating crossbreeding parameters (reviewed by Fimland, 1983) may be applied.
The present model is modified to deal with individual gene contributions rather than breed contributions, and the fixed effects of these contributions are regarded as correction terms rather than parameters to be estimated. The extended model may be written as follows: r where (3m is the fixed, additive effect of genes originating from the mth base population ancestor r is the number of base population ancestors B mij /,; is the expected proportion of the total genotype of the kth individual contributed by the mth base population ancestor, E B m ij k = 1.0 for each k, (m = 1, 2, ... , r). The G j effects may be estimated and tested according to the same standard procedures as under Model 0. Model 1 may be written in matrix notation as follows: where X 2 is a (n x r) relationship matrix of Bmijk values showing the expected genetic relationship between each of the recorded individuals and each of the base population ancestors. b 2 is a (r x 1) vector of (3 1 11 regression coefficients.

Individual animal model
The confounding effects of the remaining additive genotype for the polygenic trait may also be eliminated in a mixed model including the random effects of the individual additive genotypes of the recorded animals. The use of an individual animal model to estimate single gene associated effects was suggested by Kennedy and Schaeffer (1990). Basically, this model may be written as follows: where U k is the random, &dquo;single gene free&dquo; additive genetic effect on the polygenic trait in the kth individual.
In matrix notation, this model may be written as follows: where Z is an (n x n) incidence matrix for the individual additive genotypes for the polygenic trait u is an (n x 1) individual additive genotype effect vector of U k values. It has been shown by Henderson that the fixed effect vector (b l ) and the random effect vector (u) may be obtained by computing the best linear unbiased estimates (BLUE) for the fixed effects and the best linear unbiased predictors (BLUP) for the random effects. If all animals have single records, the radom error covariance is assumed to be 0 and the random error variance is equal for all individuals, the following mixed model equations may be applied (Henderson, 1973(Henderson, , 1977: where A is an (n x n) individual additive genetic relationship matrix h 2 is the heritability of the investigated trait when the single gene associated variation is not included in the additive genetic variance component (&dquo;single gene free&dquo; heritability).
The appropriate heritability may be obtained from variance components estimated from the equivalent model by restricted maximum likelihood (REML) and the derivative free approach described by Meyer (1988Meyer ( , 1989. To ensure maximum precision of the b 1 and u solutions, individuals without records in y that contribute to the genetic relationship between individuals with records in y should be included in A. The extended A should always include the base population individuals and their common ancestors during the last preceding generations. If the total number of individuals with and without records in the analysis is n', the dimension of the extended A will be (n' x n') and the corresponding dimensions of Z and u will be (n x n') and (n' x 1). BLUP solutions (u) will consequently be computed for all animals, including individuals without records in y.
To compute the least squares means of the fixed effects under Model 2 and the contrasts between them and to test the significances of the contrasts, a simplified approach may be applied. The complete mixed model equations may be condensed by absorbing the random variables (u) into the fixed effects design matrix (XiX l ). This condensed ( f x f ) matrix may then be applied to estimate and test the contrasts according to standard least squares procedures (see eg Searle, 1982).
The estimates of the contrasts between the G j effects will be directly comparable with the contrasts obtained under Model 0 and Model 1. However, to compute least squares mean estimates of the G j effects that can be directly compared with the estimates under Model 0 and Model 1, the average BLUP value of individuals with records in y must be included in the estimates. This will require the solution of the complete set of mixed model equations to obtain individual BLUP values.

Modifications of the models
The G j effects in the models may be decomposed according to the following general formula: where 81' is the average linear effect of the pth allele of the investigated gene v is the number of alleles of the investigated gene A I , is the frequency of the pth allele carried by the individual (Ap = 0, 1 or 2, I: Ap = 2 for p = 1,2, ... ,v) Î is the regression coefficient for the general effect of heterozygosity in the investigated locus H is the degree of heterozygosity in the investigated locus (normally H = 0 or 1) éq is the regression coefficient for the qth specific combining effect of two . different alleles of the investigated gene w is the number of different specific combinations of two different alleles C 9 is the incidence of the qth specific combination of two different alleles of the investigated gene (C, = 0 or 1, ! Cq = 0 or 1 for q = 1, 2, ... , w) If the G j effects in the models are substituted according to the formula above, the contrasts between the linear effects of the investigated alleles, the general effect of heterozygosity and the contrasts between the specific combining effects of the investigated alleles may be evaluated separately. If the specific combining effects are assumed to be negligible, the e 9 Cq elements may be excluded from the models.
The number of single gene associated estimates may then be reduced compared to the original models. This reduction may be important if a large number of unevenly distributed alleles are investigated simultaneously in a limited number of experimental animals. The parameters of this reduced model may be estimated with a higher accuracy, and the performance of any particular genotype may be predicted from the 8 1' and the -/ estimates, even if the genotype is missing in the experimental records.

PROPERTIES OF THE MODELS
The genetic relationship parameters required in both Models 1 and 2 may be generated from pedigree records. Several generations of related individuals may be analysed simultaneously. In Model 1, the pedigree of the investigated individuals must be traced back to a common ancestor base population. In many cases, the parents of the first experimental generation may be regarded as the base population. In Model 2, the complete genetic relationship matrix between all investigated individuals should be generated. In most cases, this will require pedigree records for several generations of ancestors prior to the first investigated generation. In addition to the genetic relationship parameters, the covariance between relatives is determined by the heritability of the investigated polygenic trait. In Model 1, the realized additive genetic effect of each ancestor genotype is utilized to obtain the (3 m estimates. Consequently, (3 m by definition estimates the &dquo;single gene free&dquo; additive genotype of the base population ancestors and it may be possible to obtain a kind of average &dquo;single gene free&dquo; heritability estimate based on the variance of the (3 m estimates if the phenotypic variance of the polygenic trait in the base population ancestor is known. In Model 2, the heritability is a required input parameter. The use of a a priori heritability estimates in an individual animal model may be justified. As shown in the example in the present paper, the fixed effects solutions may be affected if the difference between the assumed and the real heritability is too large. Since the required heritability input should be &dquo;single gene free&dquo;, reliable a priori estimates may not be available. Kennedy (1990) concluded that the heritability may then be estimated from the experimental records. This can be done by the RENIL approach referred to earlier.
The accuracy of the ( 3 m estimates according to Model 1 is dependent on the number of individuals originating from each of the base populations ancestors. In most species, the number of first generation offspring per ancestor dam may be quite limited. Furthermore, applying Model 1 to first generation offspring only may cause an additional problem because of limited segregation of the investigated gene within offspring sharing proportions of a common additive ancestor genotype. The required genetic composition of the experimental individuals may be achievied by multiple matings of the base population ancestors in different combinations, by recording offspring from generations later than the first one or by pooling several generations of offspring. If possible, the mating scheme should be designed to ensure genetic ties across genotypes for the investigated gene. Model 2 is less sensitive to this type of problems but a certain degree of genetic relationship across genotypes for the investigated gene is still required to eliminate the confounding effects.
In Model 1, the error variance is not expected to be constant across generations.
The direct offspring of the base population ancestors may be scored without error for the B&dquo;,,i!!. variables (0 or 0.5). In the successive generations, the B m ij k variables represent the expected ancestor gene contributions while the real contributions are influenced by the random sampling of alleles during gamete formation. This sampling error is accumulated as the number of generations increases. The precision of the estimates according to Model 1 may consequently be poor, if the number of generations between the ancestor base population and the investigated individuals is too large. The error variance of Model 2 is not influenced by such generation effects.
The average effect of selection for the dependent variable may be adjusted for by including generation effects as fixed effects in Model 0 and Model 1. However, the effect of selection on the ( 3 m estimates according to model 1 may vary from one estimate to another due to random differences in the realized selection intensities in the gene flow from different base population ancestors. This will violate the basic assumption that the (3n effects may be regarded as fixed effects across generations. A similar problem may arise as a result of genetic drift, if severe bottle-necks appear in the gene now from some of the base population ancestors to any of the offspring generations. Consequently, selection and genetic bottle-necks should be avoided when applying Model 1. This problem will be less important if Model 2 is applied. The additive genetic effects (V! ) are then regarded as random effects and the (V k ) values are predicted from the complete genetic variance-covariance matrix rather than from ancestry lines. Any genetic trend will then be corrected for by the BLUP values (U! ) and/or fixed generation effects, depending on the genetic ties between the recorded individuals.

GENETIC INTERPRETATIONS OF THE SINGLE GENE ASSOCIATED PARAMETERS
The parameters of interest in the models are estimated by the G j effects. The total effect associated with each of the genotypes for the investigated gene on the polygenic trait is computed. Since direct gene effects may not be distinguished from linkage effects, the term &dquo;gene region&dquo; will be applied in the following discussion, indicating that the polymorphic gene may function as a marker gene. The G j estimates will contain the additive effects of each of the two gene regions constituting the genotype, the general and specific dominance interactions between the two gene regions and the average epistatic effects between each of the two gene regions and the remaining genotype of each of the individuals within each G j group. The epistatic effects are true single gene associated effects, but they may be difficult to reproduce if the interacting genes are variable, unidentified and not randomly occurring in the G j groups. In addition, heterozygosity in the investigated locus may serve as a marker for general heterozygosity. The G j effects may then be confounded with general heterosis. This may be checked by including the individual coefficients of inbreeding as an independent variable in the model. The parameters of interest in the modified models are estimated by the 6,, !y and e Q effects. The average, linear effects associated with the different allelic gene regions are estimated by the 6p effects. The total linear contribution to any particular genotype may be calculated by adding together the values of the < * )p estimates for each of the two gene regions constituting the genotype. The 6p estimates will reflect the additive, single gene associated effects. The general effect of heterozygosity is estimated by the q effect. The estimate reflects the average deviation from the 6p determined genotype in heterozygous individuals and is influenced by the general dominance interaction between the different gene regions and by the general deviation from linearity caused by epistasis. In addition, any average effect of the investigated gene serving as a marlcer for general heterozygosity will be included. As pointed out earlier, this may be checked separately. The E q estimates are influenced by any specific combining effects in the different heterozygous combinations of the investigated gene, including specific dominance and epistatic interactions involving the two gene regions.

AN EXAMPLE OF THE MODELS IN USE
The ma,jor histocompatibility complex (MHC) in birds and mammals is a cluster of linked genes coding for major cell surface antigens and is known as the B complex in chiclcens. MHC associated effects have been shown on resistance to certain diseases and on immune responsiveness. The association between the MHC gene region and several productivity traits in laying hens was investigated in an experiment at the Agricultural University of Norway. The MHC genotypes of the experimental birds were determined by serological typing at the Institute of Experimental Immunology in Copenhagen according to Simonsen et al (1982).

MATERIALS AND METHODS
The experiment was started by mating individuals with heterozygous combinations of the B13, B19 and B21 gene regions (MHC haplotypes). The birds were taken from a randomly mated control population (L i ) and from a selection line for increased egg weight body to weight ratio (L z ). The selection experiment has been described by holstad (1980). The number of parents in the base population (r in Model 1) was 28 in L l and 80 in L z but since each dam was mated to only one sire, the number of ancestors in Model 1 may be reduced to the number of dams which was 21 in L l and 63 in L 2 . The mating procedure was repeated with heterozygous individuals from the first and the second generation of experimental birds to produce three non-overlapping generations contributing to the experiment (f l = 3 in all models).
No cross-mating between L l and L 2 was allowed. The design resulted in a mixture of individuals carrying all possible combinations of the 3 MHC haplotypes: G, = B13/813, G 2 = B19/BI9, G 3 = B21/B21, G 4 = B13/B19, G 5 = B13/B21 and G 6 = B19/B21 ( f z = 6 in all models) and varying fractions of ancestor gene contributions crosslinking the MHC genotypes. The total number of birds in the experiment (n in all models) was 321 for L l and 505 for L 2 -In model 2, the relationship matrix (A) was generated by including individuals without records in the experimental generations and all common ancestors in the last 3 generations prior to the experiment. The total number of birds in the extended A (n' in Model 2) was 636 in L l and 761 in L a . The full stored coefficient matrix in Model 2 was solved by Gauss-Seidel iteration. The solutions were considered converged when the average value of the product 1'A-l u was < 0.001. The program picks some generalized inverse of the coefficient matrix in the iteration (Smith, 1982).
One of the productivity traits recorded was the laying intensity during the period from the start of laying until 58 weeks eggs of age, measured as the number of eggs laid per 100 days. The association between laying intensity and MHC genotypes was analysed according to Model 1 and Model 2 for both 1 1 and L z .
The sensitivity of Model 2 to changes in the heritability parameter input was checked by applying 5 different values of the parameter to the investigated material (h 2 = 0.1, 0.3, 0.5, 0.7 and 0.9). The &dquo;MHC free&dquo; heritability was estimated from REML variance components in each of the 2 lines separately according to Model 2, as described earlier.

RESULTS AND DISCUSSION
The least squares means of laying intensity for the fixed effects of NIHC genotypes (G j ), according to Model 0 and Model 1 and according to Model 2 over the entire heritability scale are shown in figure 1 for L l and figure 2 for L 2 .
The G j effects according to Model 2 at h z = 0 are by definition equal to the G j effects according to Model 0. In order to compare the results from Model 2 with the other models, a &dquo;MHC free&dquo; heritability value must be chosen. The &dquo;MHC free&dquo; heritability estimates indicated in the figures were based on the REML variance components shown in table I. Several points may be made from the results shown in the figures.
-Considerable confounding effects were demonstrated between the MHC genotype and the remaining additive genotype, especially in L i . Introduction of the &dquo;1VIHC free&dquo; additive genotype in the model affected both the rank and the magnitude of the NIHC associated effects (G j ). The significance of the ranking of the G j effects according to the different models is shown in table II.
Applying Model 0 to the present material would lead to false conclusions according to table II.
-The agreement between the G j solutions according to Model 1 and Model 2 was quite good. A non-significant re-ranking was indicated for G 4 vs G 6 in L l and for G 3 vs G S in L 2 (table II). As pointed out earlier, Model 1 utilizes the realized additive genetic effect of each ancestor genotype to obtain the 1 3m estimates, while Model 2 assumes an average heritability for the entire material. Since MHC genotypes were not equally distributed over ancestor lines, this may cause some differences between the 2 models in the G j estimates. The standard errors of the contrasts between the G j effects in pairwise comparisons with G 6 are shown in table III.
The G j contrasts were estimated with a higher level of accuracy under Model 2 than under Model 1 (table III). This did not affect the significance of the ranking of the G j effects in the present material when the significance level was fixed at 0.01 (table II). However, due to the different precision of the two models, Model 2 may discriminate between G j effects which will not differ significantly under Model 1.
The sensitivity of the G j solutions according to Model 2 to changes in the heritability input parameter appeared to be moderate. In the present material, the stability of the G j estimates seemed to be quite acceptable if the heritability input was varied within an interval of 0.2, at least for heritability estimates > 0.1. Good a priori estimates are available for the total heritability of laying intensity (not &dquo;MHC free&dquo;) in a related material (Kolstad, 1980). The heritability was reported to be 0.37. As may be seen from figures 1 and 2, applying this heritability to the present material would not change the conclusions on the G j effects very much. The rank of the MHC genotypes (G j ) was different in the 2 lines. The lines originated from a common synthetic population formed in 1973 (Liljedhal et al, 1979) and had only been separated during 7 generations when the present experiment was started. The genetic relationship between the 2 lines was consequently quite close.
The MHC associated effects may still be different in the two lines if the effects were caused by distant linkage between the MHC gene region and genes affecting laying intensity. The observed effects would then be temporary effects because of linkage breakdown. The increased mangnitude and significance of the MHC associated effects in L l compared to L 2 is consistent with the linkage hypothesis, since the smaller number of base populations ancestors in L l (28 compared to 80) increases the probability of linkage disequilibrium.