Genetic variation of traits measured in several environments. I. Estimation and testing of homogeneous genetic and intra-class correlations between environments

This paper considers the problems of estimating between-environment family components of (co)variance and of testing homogeneity, either of genetic correlations between environments alone, or of both genetic and intra-class correlations between environments. The test procedures rely on the ratio of restricted likelihoods maximized under the reduced models (the different homogeneity hypotheses) and under the saturated model. An iterative expectation-maximization (EM) algorithm is proposed to compute restricted maximum likelihood (REML) estimates of the residual and family components of variance-covariance. The EM formulae apply to the multiple trait model for the saturated model and to univariate linear models for the reduced models. The EM formulae ensure that the estimated (co)variance components lie within the parameter space. The procedures presented in this paper are illustrated by the analysis of 5 vegetative and reproductive traits measured in an experiment on 20 full-sib families tested in 3 different environments in black medic (Medicago lupulina L).


INTRODUCTION
Hypothesis testing of genetic parameters is of great concern when analyzing genotype x environment interaction experiments. For instance, Visscher (1992) investigated the statistical power of balanced sire x environment designs for detecting heterogeneity of phenotypic variance and intra-class correlation between environments. He assumed that the between-family correlation (henceforth referred to as 'genetic correlation') between environments was equal to 1, so that heterogeneity of variance components was due only to scaling. This assumption was relaxed by Foulley et al (1994), who considered estimation and testing procedures for homogeneous components of (co)variance between environments. In some cases, it may also be of interest to test less restrictive hypotheses, eg, constant genetic correlations between environments, and constant genetic and intra-class correlations between environments. The objective of this paper is to address this issue and to show how heteroskedastic linear mixed models can be used for this purpose.

THEORY AND METHODS
The saturated model

Let us assume that records are generated from a cross-classified layout. As in Falconer (1952), we consider the expressions of the trait in different environments as genetically correlated traits, which results in the following 'genotype x environment' multiple trait linear model:

$$y_{ijk} = \mu_i + b_{ij} + e_{ijk} \qquad [1]$$

where $y_{ijk}$ is the performance of the kth individual ($k = 1, 2, \ldots, n$) of the jth family ($j = 1, 2, \ldots, s$) evaluated in the ith environment ($i = 1, 2, \ldots, p$); $\mu_i$ is the mean of the ith environment; $b_{ij}$ is the random effect of the jth family in the ith environment, assumed normally distributed such that $\mathrm{Var}(b_{ij}) = \sigma^2_{B_i}$, $\mathrm{Cov}(b_{ij}, b_{i'j}) = \sigma_{B_{ii'}}$ for $i \neq i'$ and $\mathrm{Cov}(b_{ij}, b_{i'j'}) = 0$ for $j \neq j'$ and any $i$ and $i'$; and $e_{ijk}$ is a residual effect pertaining to the kth individual in subclass ij, assumed normally and independently distributed with mean 0 and variance $\sigma^2_{e_i}$. Using vector notation, ie $\mathbf{y}_{jk} = \{y_{ijk}\}$, $\boldsymbol{\mu} = \{\mu_i\}$, $\mathbf{b}_j = \{b_{ij}\}$ and $\mathbf{e}_{jk} = \{e_{ijk}\}$ for $i = 1, 2, \ldots, p$, model [1] can alternatively be written as $\mathbf{y}_{jk} = \boldsymbol{\mu} + \mathbf{b}_j + \mathbf{e}_{jk}$, where $\mathbf{b}_j \sim N(\mathbf{0}, \boldsymbol{\Sigma}_B)$ and $\mathbf{e}_{jk} \sim N(\mathbf{0}, \boldsymbol{\Sigma}_W)$, with $\boldsymbol{\Sigma}_B = \{\sigma_{B_{ii'}}\}$ the $(p \times p)$ matrix of between-family components of variance and covariance between environments and $\boldsymbol{\Sigma}_W = \mathrm{diag}\{\sigma^2_{e_i}\}$ the $(p \times p)$ diagonal matrix of residual components of variance. The null hypothesis ($H_0$) considered here assumes homogeneous genetic correlation coefficients $\rho_{ii'} = \sigma_{B_{ii'}} / (\sigma_{B_i}\sigma_{B_{i'}})$ between environments ($\rho_{ii'} = \rho$, $\forall\, i, i'$ with $i \neq i'$), without making any assumption about the residual variances $\boldsymbol{\Sigma}_W = \mathrm{diag}\{\sigma^2_{e_i}\}$. So far, we have been unable to solve the problem of estimating the corresponding parameters by maximum likelihood (ML) under the multiple trait approach in [1], even for balanced cross-classified designs ). An alternative is to tackle this issue via the concept of equivalent models (Henderson, 1984).
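As a minimal sketch of the saturated model [1], the Python code below (assuming NumPy) simulates a balanced cross-classified layout and computes the genetic correlations $\rho_{ii'} = \sigma_{B_{ii'}}/(\sigma_{B_i}\sigma_{B_{i'}})$ from a given $\boldsymbol{\Sigma}_B$. The function names and all numerical values are hypothetical, chosen only for illustration. This is not the authors' code.

```python
import numpy as np

def genetic_correlations(sigma_b):
    """rho_ii' = sigma_B_ii' / (sigma_B_i * sigma_B_i') for a p x p matrix Sigma_B."""
    sigma_b = np.asarray(sigma_b, dtype=float)
    sd = np.sqrt(np.diag(sigma_b))
    return sigma_b / np.outer(sd, sd)

def simulate_saturated(mu, sigma_b, sigma2_e, s, n, rng):
    """Draw y[j, k, i] = mu_i + b_ij + e_ijk for s families and n individuals per
    family and environment (balanced layout, diagonal residual matrix Sigma_W)."""
    mu = np.asarray(mu, dtype=float)
    p = mu.size
    b = rng.multivariate_normal(np.zeros(p), sigma_b, size=s)       # b_j ~ N(0, Sigma_B), shape (s, p)
    e = rng.normal(size=(s, n, p)) * np.sqrt(np.asarray(sigma2_e))   # e_ijk ~ N(0, sigma2_e_i)
    return mu + b[:, None, :] + e                                    # shape (s, n, p)

# Hypothetical parameters for p = 3 environments
sigma_b = np.array([[4.0, 3.0, 2.0],
                    [3.0, 9.0, 4.5],
                    [2.0, 4.5, 16.0]])
y = simulate_saturated([10.0, 12.0, 8.0], sigma_b, [5.0, 6.0, 7.0], s=20, n=2,
                       rng=np.random.default_rng(0))
print(y.shape)
print(genetic_correlations(sigma_b).round(3))
```

With these values, the three off-diagonal correlations differ (0.5, 0.25 and 0.375), so this $\boldsymbol{\Sigma}_B$ would violate the homogeneity hypothesis $H_0$.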
Actually, a model equivalent to [1] under $H_0$ and restricted to $\rho > 0$ can be written as the following 2-way univariate mixed model with interaction:

$$y_{ijk} = \mu + h_i + \sigma_{S_i} s^*_j + \lambda \sigma_{S_i} hs^*_{ij} + e_{ijk} \qquad [2]$$

where $\mu$ is the mean; $h_i$ is the fixed effect of the ith environment; $\sigma_{S_i} s^*_j$ is the random contribution of family j, such that $s^*_j \sim \mathrm{NID}(0,1)$, and $\sigma^2_{S_i}$ is the family variance for records in the ith environment; $\lambda \sigma_{S_i} hs^*_{ij}$ is the random family x environment interaction effect, such that $hs^*_{ij} \sim \mathrm{NID}(0,1)$, and $\lambda^2\sigma^2_{S_i}$ is the interaction variance for records in the ith environment; and $e_{ijk}$ is the residual effect, assumed $\mathrm{NID}(0, \sigma^2_{e_i})$. The two models are equivalent given the following 3 one-to-one relationships:

$$\mu_i = \mu + h_i, \qquad \sigma^2_{B_i} = (1 + \lambda^2)\,\sigma^2_{S_i}, \qquad \rho = \frac{1}{1 + \lambda^2} \qquad [3]$$

$H_0$: constant genetic and intra-class correlations between environments

In this part, the null hypothesis ($H_0$) assumes homogeneous genetic and intra-class correlations between environments, ie $\rho_{ii'} = \sigma_{B_{ii'}}/(\sigma_{B_i}\sigma_{B_{i'}}) = \rho$ and $t_i = \sigma^2_{B_i}/(\sigma^2_{B_i} + \sigma^2_{e_i}) = t$, $\forall\, i, i'$ with $i \neq i'$. The variance-covariance structure of the residuals is still assumed to be diagonal and heteroskedastic ($\boldsymbol{\Sigma}_W = \mathrm{diag}\{\sigma^2_{e_i}\}$).
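The equivalence between [1] and [2] can be checked numerically. The sketch below (assuming NumPy; the function names and the values of $\sigma_{S_i}$ and $\lambda$ are hypothetical) builds the $\boldsymbol{\Sigma}_B$ implied by model [2]. It then verifies that every off-diagonal correlation equals $1/(1+\lambda^2)$ and that $\sigma^2_{B_i} = (1+\lambda^2)\sigma^2_{S_i}$.

```python
import numpy as np

def sigma_b_from_model2(sigma_s, lam):
    """Sigma_B implied by model [2]: the shared family term sigma_S_i * s*_j gives
    Cov(b_ij, b_i'j) = sigma_S_i * sigma_S_i'; the interaction lam * sigma_S_i * hs*_ij
    is independent across environments, so it adds lam^2 * sigma_S_i^2 to the diagonal only."""
    sigma_s = np.asarray(sigma_s, dtype=float)
    return np.outer(sigma_s, sigma_s) + np.diag(lam**2 * sigma_s**2)

def correlation(cov):
    """Convert a covariance matrix into a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

sigma_s, lam = np.array([1.0, 2.0, 3.0]), 0.5      # hypothetical values
sb = sigma_b_from_model2(sigma_s, lam)
rho = correlation(sb)
# Off-diagonal correlations should all equal 1 / (1 + lam**2)
print(rho[np.triu_indices(3, k=1)])
# Ratio sigma2_B_i / sigma2_S_i should equal 1 + lam**2 in every environment
print(np.diag(sb) / sigma_s**2)
```

Because the correlation does not depend on $\sigma_{S_i}$, heterogeneous family variances across environments are compatible with a single $\rho$.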
As for the hypothesis of constant genetic correlations only, a model equivalent to [1] under $H_0$ and restricted to $\rho > 0$ can be written as:

$$y_{ijk} = \mu + h_i + \tau\sigma_{e_i} s^*_j + \omega\sigma_{e_i} hs^*_{ij} + e_{ijk} \qquad [4]$$

where $\mu$ and $h_i$ are the mean and the fixed effect of the ith environment respectively; $\tau\sigma_{e_i} s^*_j$ is the random effect of family j, such that $s^*_j \sim \mathrm{NID}(0,1)$, and $\tau^2\sigma^2_{e_i}$ is the family variance in the ith environment; $\omega\sigma_{e_i} hs^*_{ij}$ is the random family x environment interaction effect, such that $hs^*_{ij} \sim \mathrm{NID}(0,1)$, and $\omega^2\sigma^2_{e_i}$ is the interaction variance in the ith environment; and $e_{ijk}$ is the residual effect, assumed $\mathrm{NID}(0, \sigma^2_{e_i})$. In the same way, the relationships between model [1] under $H_0$ (and for $\rho > 0$) and model [4] are:

$$\mu_i = \mu + h_i, \qquad \sigma^2_{B_i} = (\tau^2 + \omega^2)\,\sigma^2_{e_i}, \qquad \rho = \frac{\tau^2}{\tau^2 + \omega^2}, \qquad t = \frac{\tau^2 + \omega^2}{1 + \tau^2 + \omega^2} \qquad [5]$$

Notice that under the univariate model [4], the null hypothesis amounts to assuming that the ratios of the family variance and of the interaction variance to the residual variance ($\tau^2$ and $\omega^2$, respectively) are constant across environments.

Testing procedure

The theory of the likelihood ratio test (LRT) can be applied as previously proposed by Foulley et al (1990, 1992) and Shaw (1991). Because the parameters involved here are variance components, the LRT, which has desirable asymptotic properties, is applied using restricted maximum likelihood (REML) rather than ML estimators (Patterson and Thompson, 1971; Harville, 1974).

where $\mathbf{y}_i$ is an $(n_i \times 1)$ vector of observations in environment i; $\boldsymbol{\beta}$ is a $(p \times 1)$ vector of fixed effects with incidence matrix $\mathbf{X}_i$; $\mathbf{u}^*_1 = \{s^*_j\}$ and $\mathbf{u}^*_2 = \{hs^*_{ij}\}$ are 2 independent random normal components of the model (here, family and interaction effects respectively), with incidence matrices $\mathbf{Z}_{1i}$ and $\mathbf{Z}_{2i}$ for the standardized effects; $\sigma^2_{u_i}$ and $\sigma^2_{e_i}$ are the u-component and residual components of variance pertaining to stratum i; and $\mathbf{e}_i$ is the vector of residuals for stratum i, assumed $N(\mathbf{0}, \sigma^2_{e_i}\mathbf{I}_{n_i})$.
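The relationships in [5] can be verified in the same way as for model [2]. A short sketch, assuming NumPy and hypothetical values of $\tau$, $\omega$ and $\sigma^2_{e_i}$, builds $\boldsymbol{\Sigma}_B$ from model [4]. It then checks that both the genetic correlations and the intra-class correlations $t_i$ are constant across environments, even though the residual variances differ.

```python
import numpy as np

def sigma_b_from_model4(tau, omega, sigma2_e):
    """Sigma_B implied by model [4]: family term tau*sigma_e_i*s*_j (shared across
    environments) plus interaction term omega*sigma_e_i*hs*_ij (diagonal only)."""
    sigma_e = np.sqrt(np.asarray(sigma2_e, dtype=float))
    return tau**2 * np.outer(sigma_e, sigma_e) + np.diag(omega**2 * sigma_e**2)

def genetic_and_intraclass(sigma_b, sigma2_e):
    """Return the genetic correlation matrix and the intra-class correlations t_i."""
    vb = np.diag(sigma_b)
    rho = sigma_b / np.sqrt(np.outer(vb, vb))
    t = vb / (vb + np.asarray(sigma2_e, dtype=float))
    return rho, t

tau, omega = 1.0, 0.5                   # hypothetical values
sigma2_e = np.array([1.0, 4.0, 9.0])    # heterogeneous residual variances
rho, t = genetic_and_intraclass(sigma_b_from_model4(tau, omega, sigma2_e), sigma2_e)
# From [5]: rho = tau^2 / (tau^2 + omega^2) and t = (tau^2 + omega^2) / (1 + tau^2 + omega^2)
print(rho[np.triu_indices(3, k=1)], t)
```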
The expectation-maximization (EM) approach is a very efficient concept in ML estimation (Dempster et al, 1977), and this algorithm is frequently advocated for estimating variance components in linear models (Quaas, 1992). The generalized EM procedure for computing REML estimators of dispersion parameters, as described by Foulley and Quaas (1994) for one-way heteroskedastic mixed models, can be applied here. Let $\mathbf{u}^* = (\mathbf{u}^{*\prime}_1, \mathbf{u}^{*\prime}_2)'$, $\boldsymbol{\sigma}^2_u = \{\sigma^2_{u_i}\}$ and $\boldsymbol{\sigma}^2_e = \{\sigma^2_{e_i}\}$, and let $\boldsymbol{\gamma}_1 = (\boldsymbol{\sigma}^{2\prime}_u, \boldsymbol{\sigma}^{2\prime}_e, \lambda)'$ and $\boldsymbol{\gamma}_2 = (\tau, \omega, \boldsymbol{\sigma}^{2\prime}_e)'$ be the 2 sets of estimable parameters for models [7] and [8] respectively (hereafter denoted $\boldsymbol{\gamma} = \boldsymbol{\gamma}_1$ or $\boldsymbol{\gamma} = \boldsymbol{\gamma}_2$). The E step consists of computing the function $Q(\boldsymbol{\gamma} \mid \boldsymbol{\gamma}^{[t]}) = E_c^{[t]}[\ln p(\mathbf{y} \mid \boldsymbol{\beta}, \mathbf{u}^*, \boldsymbol{\gamma})]$, where the expectation is taken with respect to the distribution of $\boldsymbol{\beta}, \mathbf{u}^*$ given $\mathbf{y}$ and $\boldsymbol{\gamma} = \boldsymbol{\gamma}^{[t]}$, $\boldsymbol{\gamma}^{[t]}$ being the current estimate of $\boldsymbol{\gamma}$ at iteration $[t]$. The M step consists of selecting the next value $\boldsymbol{\gamma}^{[t+1]}$ of $\boldsymbol{\gamma}$ by maximizing $Q(\boldsymbol{\gamma} \mid \boldsymbol{\gamma}^{[t]})$ with respect to $\boldsymbol{\gamma}$. This EM-REML algorithm can also be derived using Bayesian arguments (Foulley et al, 1987; Foulley and Gianola, 1989) … (Zangwill, 1969) … (1994).
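To make the E and M steps concrete, the sketch below implements the classical EM-REML iteration for the simplest homoskedastic case: a single stratum and one random family factor, $\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{u} + \mathbf{e}$. It is written in terms of Henderson's mixed model equations. This is a swapped-in textbook algorithm, not the generalized EM with standardized effects of Foulley and Quaas (1994) used in this paper. The function name `em_reml_oneway`, the starting values and the stopping rule are assumptions. The E step computes $E[\mathbf{u}'\mathbf{u} \mid \mathbf{y}]$ and $E[\mathbf{e}'\mathbf{e} \mid \mathbf{y}]$ at the current parameter values, treating $\mu$ as having a flat prior (which is what makes this REML). The M step divides these quantities by $q$, the number of families, and by $N$, the number of records.

```python
import numpy as np

def em_reml_oneway(y, family, n_iter=5000, tol=1e-10):
    """Classical EM-REML for y = 1*mu + Z u + e, u ~ N(0, I s2u), e ~ N(0, I s2e)."""
    y = np.asarray(y, dtype=float)
    fam_ids, fam_idx = np.unique(family, return_inverse=True)
    N, q = y.size, fam_ids.size
    X = np.ones((N, 1))
    Z = np.zeros((N, q))
    Z[np.arange(N), fam_idx] = 1.0
    W = np.hstack([X, Z])
    WtW, Wty = W.T @ W, W.T @ y
    s2u = s2e = np.var(y) / 2.0                      # crude starting values
    for _ in range(n_iter):
        # E step: mixed model equations at the current variance ratio
        lhs = WtW.copy()
        lhs[1:, 1:] += np.eye(q) * (s2e / s2u)
        C = np.linalg.inv(lhs)                       # posterior Var(mu, u | y) = C * s2e
        theta = C @ Wty                              # (mu_hat, u_hat)
        u_hat = theta[1:]
        resid = y - W @ theta
        Euu = u_hat @ u_hat + s2e * np.trace(C[1:, 1:])     # E[u'u | y]
        Eee = resid @ resid + s2e * np.trace(C @ WtW)       # E[e'e | y]
        # M step: closed-form maximizers of the expected complete-data log-likelihood
        new_s2u, new_s2e = Euu / q, Eee / N
        converged = abs(new_s2u - s2u) + abs(new_s2e - s2e) < tol
        s2u, s2e = new_s2u, new_s2e
        if converged:
            break
    return s2u, s2e

# Hypothetical balanced data: 30 families, 10 records each
rng = np.random.default_rng(1)
fam = np.repeat(np.arange(30), 10)
y_sim = 5.0 + rng.normal(0.0, 2.0, 30)[fam] + rng.normal(0.0, 1.0, fam.size)
print(em_reml_oneway(y_sim, fam))
```

For balanced data, the converged values should reproduce the ANOVA estimators ($\sigma^2_e = \mathrm{MSW}$, $\sigma^2_u = (\mathrm{MSB} - \mathrm{MSW})/n$) whenever these are positive, which gives a simple check. Unlike ANOVA, the EM updates are ratios of nonnegative expected sums of squares and therefore cannot leave the parameter space.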

ILLUSTRATION
The procedures presented in this paper are illustrated with the analysis of an experiment carried out on 20 full-sib families of black medic (Medicago lupulina L) tested in 3 different environments (harvesting, control and competition treatments). The experimental design was described in detail by Hébert (1991). There were 2 replicates per environment, and the 20 genotypes were randomly allocated within each replicate. As an illustrative example, we consider 5 vegetative and reproductive traits out of the 36 traits that were recorded. Table I presents the estimates of genetic and residual parameters under the saturated model. Table II presents the estimates of (co)variance components under the reduced model (hypothesis of homogeneity of genetic correlations between environments) and the likelihood ratio test of this reduced model against the saturated model. Table III presents the corresponding results for the reduced model representing the hypothesis of homogeneity of both genetic and intra-class correlations between environments. Table III also presents the likelihood ratio test of this reduced model ($H_0$: homogeneity of genetic and intra-class correlations between environments) against the reduced model of table II ($H_1$: homogeneity of genetic correlations between environments only).
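The tests reported in tables II and III compare maximized REML log-likelihoods. Below is a minimal sketch that assumes the usual asymptotic chi-square reference distribution, with degrees of freedom equal to the difference in the number of dispersion parameters. The parameter counts are our own derivation from models [1], [2] and [4], not figures taken from the tables. For p = 3 environments they give 2 df for the test in table II, 2 df for the test of $H_0$ against $H_1$ in table III, and 4 df for $H_0$ of table III against the saturated model. The helper names are hypothetical. The chi-square tail probability is computed from the series for the regularized lower incomplete gamma function, which avoids a SciPy dependency. Both REML likelihoods must be computed with the same fixed effects. Here $\mu_i$ in [1] and $\mu + h_i$ in [2] and [4] span the same space.

```python
import math

def n_dispersion_params(p, model):
    """Number of (co)variance parameters for p environments (hypothetical helper).
    saturated: p(p+1)/2 in Sigma_B plus p residual variances;
    model2 (homogeneous rho): p family variances, lambda, p residual variances;
    model4 (homogeneous rho and t): tau, omega, p residual variances."""
    return {"saturated": p * (p + 1) // 2 + p,
            "model2": 2 * p + 1,
            "model4": p + 2}[model]

def chi2_sf(x, df):
    """P(chi2_df > x) = 1 - P(df/2, x/2), with P the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > 1e-15 * total:          # series: sum_k z^k / (a (a+1) ... (a+k))
        term *= z / (a + k)
        total += term
        k += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def reml_lrt(logl_reduced, logl_full, df):
    """LRT statistic -2 (logL_reduced - logL_full) and its asymptotic P-value."""
    stat = -2.0 * (logl_reduced - logl_full)
    return stat, chi2_sf(stat, df)

p = 3
df_table2 = n_dispersion_params(p, "saturated") - n_dispersion_params(p, "model2")
df_h0_h1 = n_dispersion_params(p, "model2") - n_dispersion_params(p, "model4")
print(df_table2, df_h0_h1)
# Hypothetical REML log-likelihood values, for illustration only
print(reml_lrt(-100.0, -97.0, df_table2))
```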
Convergence of the EM-REML procedure was assessed by the norm of the vector of changes in genetic parameters between successive iterations. A norm below $10^{-6}$ was reached after 150 iterations (with a single inner iteration), and the computing time was less than 10 CPU seconds per trait on an IBM 3090-17T computer.
The results in table II suggest that differences among genetic correlations are not statistically significant (except perhaps for trait [4], with a P-value of 0.07). P-values for the vegetative and reproductive yield traits, represented here by traits [1], [2] and [3], were very high, indicating no heterogeneity in genetic correlations between environments. The overall correlation under the reduced model (table II) appears much larger than a simple average of the 3 estimates under the saturated model. This is due to one pair of environments with a genetic correlation of 0.99, which also pushes the overall correlation to 0.99. In table III (tests 1 and 2), P-values likewise indicate no significant differences in ratios of variances between environments, ie homogeneity of genetic and intra-class variation between environments.
It can be concluded that, compared with the control environment, the harvesting and competition environments do not generate a meaningful level of stress for the expression of genetic and intra-class variation in any of the traits analyzed. These results may be due to the small sample size (only 40 records per environment). Since genetic correlations between environments were very high and close to one, it is of interest to test, for these traits, whether these correlations are equal to one. We have thus tested the model under the hypothesis of constant genetic correlations equal to one (according to the procedure described in ) against the reduced model (hypothesis of homogeneity of genetic correlations). P-values for all traits analyzed (except for trait [2], where the P-value was 0.1) were very high, indicating that these correlations did not differ significantly from one.

DISCUSSION AND CONCLUSION
This paper clearly illustrates the value of univariate heteroskedastic models (Foulley et al, 1990; Gianola et al, 1992; San Cristobal et al, 1993) for tackling problems of estimation and hypothesis testing of genetic parameters arising in genotype x environment data structures. It was shown that, under each null hypothesis (constant genetic correlations between environments, and constant genetic and intra-class correlations between environments), the multiple trait and univariate linear models have the same number of estimable parameters, and that there are one-to-one relationships between both models. However, strictly speaking, the univariate linear model under $H_0$ (for either hypothesis) is defined only for $\rho > 0$, because negative variances are by definition not possible. Caution must therefore be exercised in using the univariate linear model as an equivalent of the multiple trait linear model. The latter is obviously more flexible, as previously pointed out by Mallard et al (1983).
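The restriction to $\rho > 0$ follows directly from inverting the relationships [3]: $\lambda^2 = 1/\rho - 1$ and $\sigma^2_{S_i} = \rho\,\sigma^2_{B_i}$. A small sketch, assuming NumPy (the function name is hypothetical), shows that this inverse map exists only for $0 < \rho \le 1$.

```python
import numpy as np

def model2_from_multitrait(sigma2_b, rho):
    """Map (sigma2_B_i, rho) of model [1] under H0 to (sigma2_S_i, lambda) of model [2],
    by inverting sigma2_B_i = (1 + lambda^2) sigma2_S_i and rho = 1 / (1 + lambda^2)."""
    if not 0.0 < rho <= 1.0:
        # rho = 0 leaves lambda^2 undefined; rho < 0 or rho > 1 would need lambda^2 < 0,
        # ie a negative interaction variance
        raise ValueError("model [2] can only represent 0 < rho <= 1")
    lam2 = 1.0 / rho - 1.0
    sigma2_s = rho * np.asarray(sigma2_b, dtype=float)
    return sigma2_s, np.sqrt(lam2)

# Hypothetical multiple-trait parameters
sigma2_s, lam = model2_from_multitrait([2.0, 4.0], 0.5)
print(sigma2_s, lam)
try:
    model2_from_multitrait([2.0, 4.0], -0.2)
except ValueError as err:
    print("not representable:", err)
```

The multiple trait model [1] has no such restriction, which is one concrete sense in which it is the more flexible of the two.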
The EM algorithm seems a natural choice for estimating variance components in univariate linear models, but methods other than EM can also be used to solve this problem (ECME, Liu and Rubin, 1994; Newton-Raphson; quasi-Newton methods based on average information, Johnson and Thompson, 1994; derivative-free methods, Meyer, 1989). The EM-REML approach presented in this paper is quite flexible. It can accommodate any structure of fixed effects and nondiagonal variance-covariance matrices of $\mathbf{u}^*_1$ and $\mathbf{u}^*_2$, $\mathrm{Var}(\mathbf{u}^*_1) = \mathbf{A}_1$ and $\mathrm{Var}(\mathbf{u}^*_2) = \mathbf{A}_2$. For the particular model in [2] (Foulley and Henderson, 1989), for instance, $\mathrm{Var}(\mathbf{s}^*) = \mathbf{A}$ and $\mathrm{Var}(\mathbf{hs}^*) = \mathbf{I}_p \otimes \mathbf{A}$, where $\mathbf{s}^* = \{s^*_j\}$, $\mathbf{hs}^* = \{hs^*_{ij}\}$ and $\mathbf{A}$ is the additive genetic relationship matrix. The interaction effects in environment i then have covariance matrix $\lambda^2\sigma^2_{S_i}\mathbf{A}$.
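With a relationship matrix, the family effects of model [2] acquire a Kronecker structure. A sketch, assuming NumPy, of the covariance matrix of $\mathbf{b} = (\mathbf{b}_1', \ldots, \mathbf{b}_p')'$, with $\mathbf{b}_i = \{b_{ij}\}$ ordered by family within environment. It assumes $\mathrm{Var}(\mathbf{s}^*) = \mathbf{A}$ and $\mathrm{Var}(\mathbf{hs}^*) = \mathbf{I}_p \otimes \mathbf{A}$. The function name, the parameter values and the matrix $\mathbf{A}$ (two related families out of three) are hypothetical. With $\mathbf{A} = \mathbf{I}$, the result reduces to $\boldsymbol{\Sigma}_B \otimes \mathbf{I}$.

```python
import numpy as np

def family_covariance_with_A(sigma_s, lam, A):
    """Covariance of b = (b_1', ..., b_p')' under model [2] with Var(s*) = A and
    Var(hs*) = I_p kron A. Block (i, i') equals
    sigma_S_i * sigma_S_i' * A + [i == i'] * lam^2 * sigma_S_i^2 * A."""
    sigma_s = np.asarray(sigma_s, dtype=float)
    A = np.asarray(A, dtype=float)
    shared = np.kron(np.outer(sigma_s, sigma_s), A)          # common family term s*
    interaction = np.kron(np.diag(lam**2 * sigma_s**2), A)   # hs*, independent across environments
    return shared + interaction

# Hypothetical relationship matrix among 3 families (families 1 and 2 related)
A = np.array([[1.0, 0.25, 0.0],
              [0.25, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
V = family_covariance_with_A([1.0, 2.0], 0.5, A)   # p = 2 environments
print(V.shape)
```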
Clearly, the approaches presented in this paper apply to unbalanced data and to additional nuisance fixed effects cross-classified with family effects, using the formulae defined in [12abc] and [13abc]. These algorithms can also be used for the homoskedastic case simply by setting the number of strata (i) equal to 1 in the previous formulae. Several EM-REML algorithms are therefore currently available for calculating REML estimates of variance components under the standard homoskedastic linear model: (i) the classical EM algorithm based on sufficient statistics; (ii) the related EM or EM-type algorithms (Henderson, 1973; Harville, 1974; Callanan, 1985); and (iii) the generalized EM algorithms proposed by  for models parameterized either with variance components or as in this paper. However, additional work is needed to compare the performance of these different algorithms.
Finally, the null hypothesis of constant intra-class correlations without making any assumption on genetic correlations between environments remains to be considered. This problem requires a special treatment as far as the parameterization of the model is concerned and will be reported in a separate article.