Genetic variation of traits measured in several environments. II. Inference on between-environment homogeneity of intra-class correlations

This article describes an approach for estimating variance-covariance components between environments when intra-class correlations are homogeneous between environments, without making any assumption about the genetic correlations between pairs of environments. An iterative expectation-maximization (EM) algorithm, comparable to the one described by Foulley and Quaas (1994), is proposed for computing restricted maximum likelihood (REML) estimates of the residual and family components of variance and covariance. Three different parameterizations (cartesian, polar and spherical coordinates) are proposed for computing the EM-REML estimators under the reduced model (the intra-class correlations are all assumed equal to the same constant). The procedure is illustrated by the analysis of simulated data.


INTRODUCTION
Statistical procedures based on the theory of the generalized likelihood ratio, previously proposed by Shaw (1991) and Visscher (1992), have been applied to test the homogeneity of genetic and phenotypic parameters against Falconer's (1952) saturated model. In particular, Robert et al (1995) have described a procedure for estimating components of variance and covariance between environments and for testing 2 homogeneity hypotheses: (a) a constant genetic correlation between environments; and (b) constant genetic and intra-class correlations between environments.
The objective of this article is to present a procedure for dealing with homogeneous intra-class correlations among environments without making any assumption about the genetic correlations between environments. The method is based on restricted maximum likelihood (REML) estimators and on a generalized expectation-maximization (EM) algorithm of the kind initially proposed for heteroskedastic univariate linear models. Three parameterizations of the variance-covariance components are suggested for solving this problem. A simulated example is presented to illustrate the procedure.

THEORY
A model often used to deal with genotypic variation in different environments is the 2-way crossed genotype (random) x environment (fixed) linear model with interaction. In particular, this model has been proposed as an alternative to a multiple-trait approach when variance and covariance components are homogeneous and genetic correlations between environments are positive (Foulley and Henderson, 1989). It has also been employed by Visscher (1992) to study the power of likelihood ratio tests for heterogeneity of intra-class correlations between environments when the genetic correlations among them are assumed equal to unity. The aim of this paper is to go one step further in addressing the same problem with the same model but with a heterogeneous structure of variance-covariance components.

The full model
Let us assume that records are generated from a cross-classified layout. The model is defined as follows:

$$y_{ijk} = \mu + h_i + \sigma_{s_i} s_j^* + \sigma_{hs_i} hs_{ij}^* + e_{ijk} \qquad [1]$$

where $\mu$ is the mean; $h_i$ is the fixed effect of the $i$th environment; $\sigma_{s_i} s_j^*$ is the random family $j$ contribution such that $s_j^* \sim \mathrm{NID}(0, 1)$ and $\sigma^2_{s_i}$ is the family variance for records in the $i$th environment; $\sigma_{hs_i} hs_{ij}^*$ is the random family x environment interaction effect such that $hs_{ij}^* \sim \mathrm{NID}(0, 1)$ and $\sigma^2_{hs_i}$ is the interaction variance for records in the $i$th environment; $e_{ijk}$ is the residual effect, assumed $\mathrm{NID}(0, \sigma^2_{e_i})$. Note that this model has been extensively used in factor analysis of psychological data (Lawley and Maxwell, 1963).

Model [1] can be written more generally using matrix notation as:

$$y_i = X_i \beta + \sigma_{u_{1i}} Z_{1i} u_1^* + \sigma_{u_{2i}} Z_{2i} u_2^* + e_i \qquad [2]$$

where $y_i$ is an $(n_i \times 1)$ vector of observations in environment $i$; $\beta$ is a $(p \times 1)$ vector of fixed effects with incidence matrix $X_i$; $u_1^* = \{s_j^*\}$ and $u_2^* = \{hs_{ij}^*\}$ are 2 independent random normal components of the model with incidence matrices for standardized effects $Z_{1i}$ and $Z_{2i}$ respectively; $\sigma^2_{u_{1i}}$ and $\sigma^2_{u_{2i}}$ are the corresponding components of variance pertaining to stratum $i$; and $e_i$ is the vector of residuals for stratum $i$, assumed $N(0, \sigma^2_{e_i} I_{n_i})$.
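As a rough illustration of how records arise under model [1], the following Python sketch simulates a balanced layout in which each family has standardized effects scaled by environment-specific standard deviations. The function names `simulate_records` and `intraclass_correlation`, their arguments, and every numerical value are hypothetical and chosen only for illustration; none of them comes from the article.

```python
import random

def simulate_records(mu, h, sd_s, sd_hs, sd_e, n_fam, n_rep, seed=0):
    """Simulate y_ijk = mu + h[i] + sd_s[i]*s_j + sd_hs[i]*hs_ij + e_ijk.

    s_j and hs_ij are standardized NID(0, 1) effects; e_ijk has standard
    deviation sd_e[i]. Returns a list of (i, j, k, y) tuples.
    """
    rng = random.Random(seed)
    # One standardized family effect per family, shared by all environments.
    s = [rng.gauss(0.0, 1.0) for _ in range(n_fam)]
    records = []
    for i in range(len(h)):
        for j in range(n_fam):
            # Standardized interaction effect, specific to (environment, family).
            hs_ij = rng.gauss(0.0, 1.0)
            for k in range(n_rep):
                e = rng.gauss(0.0, sd_e[i])
                y = mu + h[i] + sd_s[i] * s[j] + sd_hs[i] * hs_ij + e
                records.append((i, j, k, y))
    return records

def intraclass_correlation(var_s, var_hs, var_e):
    """Intra-class correlation: family variance over total variance."""
    family = var_s + var_hs
    return family / (family + var_e)

# Illustrative values only (not the article's data).
records = simulate_records(mu=10.0, h=[0.0, 1.0, -1.0],
                           sd_s=[1.0, 1.5, 2.0], sd_hs=[0.5, 0.5, 0.5],
                           sd_e=[3.0, 4.0, 5.0], n_fam=20, n_rep=50)
```

Because the same standardized family effect enters every environment with a different scale, the family covariance between 2 environments is the product of the 2 family standard deviations, which is what ties this univariate model to a multiple-trait view.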

The reduced model
The null hypothesis ($H_0$) consists of assuming homogeneous intra-class correlations between environments (ie, $\forall i,\ t_i = (\sigma^2_{s_i} + \sigma^2_{hs_i}) / (\sigma^2_{s_i} + \sigma^2_{hs_i} + \sigma^2_{e_i}) = t$). The variance-covariance structure of the residual is assumed to be diagonal and heteroskedastic.
Under model [1], this hypothesis is tantamount to assuming a constant ratio of variances between environments: $\forall i,\ \sigma^2_{e_i} / (\sigma^2_{s_i} + \sigma^2_{hs_i}) = \delta^2$, where $\delta$ is a constant. Under this hypothesis, 3 different parameterizations (cartesian, polar and spherical coordinates) will be considered to solve this problem.

An EM-REML algorithm
A generalized expectation-maximization (EM) algorithm is applied to compute REML estimators. As in Robert et al (1995) for heteroskedastic mixed models, the function to be maximized is $Q(\gamma \mid \gamma^{[t]})$ [3], where $\gamma$ is the set of estimable parameters for each of the 3 models (under each parameterization considered). $E_c^{[t]}[\cdot]$ represents the conditional expectation taken with respect to the distribution of fixed and random effects given the data vector and $\gamma = \gamma^{[t]}$. $E_c^{[t]}[\cdot]$ can be expressed as a function of bilinear forms and a trace of parts of the inverse coefficient matrix of the mixed-model equations. So, for each parameterization, we differentiate function [3] with respect to each parameter in $\gamma$ and solve the resulting system $\partial Q(\gamma \mid \gamma^{[t]}) / \partial \gamma = 0$. After some algebra and using the method of 'cyclic ascent' (Zangwill, 1969), we obtain 3 algorithms, one per parameterization. In one of them, the update at iteration $t + 1$ is the solution of an equation involving a tangent term and the only positive root of a cubic equation.

The convergence of the EM-REML procedure is measured as the norm of the vector of changes in variance-covariance components between iterations. In our simulation and for the 3 parameterizations, convergence is assumed when the norm is less than $10^{-6}$. In practice, the number of inner iterations is reduced to only one in the method of 'cyclic ascent'. The algebraic solution of the quadratic, cubic or quartic equations, using the discriminant method, shows that each time only one root lies in the parameter space. In the simulated example, the polar parameterization converged the fastest.
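The equivalence between a common intra-class correlation and a constant variance ratio, stated for the reduced model, can be checked in a few lines. In the derivation below, $v_i = \sigma^2_{s_i} + \sigma^2_{hs_i}$ is shorthand introduced here for the family variance in environment $i$, and $\delta^2$ denotes the constant ratio; both symbols are notation assumed for this sketch.

```latex
\begin{align*}
t_i &= \frac{v_i}{v_i + \sigma^2_{e_i}}
     = \frac{1}{1 + \sigma^2_{e_i} / v_i}
\quad\Longrightarrow\quad
\frac{\sigma^2_{e_i}}{v_i} = \frac{1 - t_i}{t_i}, \\
t_i &= t \;\; \forall i
\quad\Longleftrightarrow\quad
\frac{\sigma^2_{e_i}}{v_i} = \delta^2 = \frac{1 - t}{t} \;\; \forall i,
\qquad\text{with}\qquad t = \frac{1}{1 + \delta^2}.
\end{align*}
```

For instance, a common intra-class correlation of $t = 0.2$ corresponds to $\delta^2 = 4$, ie, a residual variance 4 times the family variance in every environment.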

Testing procedure
Let $L(\gamma; y)$ be the log-restricted likelihood, $\Gamma$ be the complete parameter space and $\Gamma_0$ a subset of it pertaining to the null hypothesis $H_0$. $H_0$ is rejected at the level $\alpha$ if the statistic $\zeta(y) = 2 \max_{\Gamma} L(\gamma; y) - 2 \max_{\Gamma_0} L(\gamma; y)$ exceeds $\zeta_0$, where $\zeta_0$ corresponds to $\Pr[\chi^2_r > \zeta_0] = \alpha$ ($\chi^2_r$ is the chi-square distribution with $r$ degrees of freedom, $r$ being the difference between the numbers of parameters estimated under the full and the reduced models). Formulae to evaluate $-2 \max L(\gamma; y)$ can easily be made explicit in terms of $B$, the coefficient matrix of the mixed-model equations.
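The decision rule above can be sketched in Python once the 2 maximized values of $-2L$ are available. The helper names `chi2_sf` and `lr_test` are hypothetical, the chi-square tail probability is computed with a standard series for the regularized incomplete gamma function (a choice made here, not the article's), and the $-2L$ values and parameter counts in the usage line are made-up numbers for illustration only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability Pr[chi2_df > x].

    Uses the power series of the regularized lower incomplete gamma
    function P(df/2, x/2); adequate for moderate x, not for extreme tails.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    log_p = a * math.log(z) - z - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_p))

def lr_test(minus2l_full, minus2l_reduced, n_par_full, n_par_reduced):
    """Likelihood ratio statistic, degrees of freedom and P value.

    The statistic is the increase in -2L when moving from the full to the
    reduced model; df is the difference in numbers of estimated parameters.
    """
    stat = minus2l_reduced - minus2l_full
    df = n_par_full - n_par_reduced
    return stat, df, chi2_sf(stat, df)

# Made-up inputs: -2L of 100.0 (full) and 103.0 (reduced), 9 vs 7 parameters.
stat, df, p_value = lr_test(100.0, 103.0, 9, 7)
print(stat, df, round(p_value, 3))  # 3.0 2 0.223
```

The null hypothesis would be rejected at level $\alpha$ when the returned P value falls below $\alpha$, which is the same rule as comparing the statistic with the critical value $\zeta_0$.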

NUMERICAL EXAMPLE
This procedure is illustrated with a hypothetical data set corresponding to a balanced, crossed design with 3 environments, 20 families per environment and 50 replicates per family (p = 3, s = 20 and n = 50). The 20 families were randomized within each environment. Basic ANOVA statistics for the between-family and within-family sums of squares and cross-products are given in table I. Table II presents the estimates of genetic and residual parameters under the full and reduced (hypothesis of a constant intra-class correlation between environments) models respectively, and the likelihood ratio test of the reduced model against the full model. The P values in table II indicate that there are no significant differences between intra-class correlations.

DISCUSSION AND CONCLUSION
In this paper, estimation and testing of homogeneity of intra-class correlations among environments have been studied with heteroskedastic univariate linear models. Another possible approach to account for 'genotype x environment' effects would be to consider the multiple-trait linear approach, defined by Falconer (1952).
As described hereafter, these 2 approaches may or may not be equivalent. In this discussion, the conditions required to have equivalence between the multiple-trait and the univariate linear models will be established.
In Falconer's approach, expressions of the trait in different environments (i, i') are those of 2 genetically correlated traits, with a coefficient of genetic correlation defined for each pair (i, i'). By definition, $\sigma^2_{s_i}$ and $\sigma^2_{hs_i}$ are positive parameters, so a positivity constraint, relation [6], must be satisfied. It is worth noticing that the condition in [6] means that the partial genetic correlation between any pair (j, k) of environments, with environment i held fixed, is also positive.
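A sketch of where such conditions come from in the case of 3 environments is given below. It is an illustration under assumed notation, not a transcription of the article's equations: $g_{ii'}$ denotes the family (co)variance between environments $i$ and $i'$ in the multiple-trait model, $\rho_{ii'}$ the corresponding correlation, and $(i, j, k)$ are the 3 distinct environments. Matching these quantities with those implied by model [1] gives:

```latex
\begin{align*}
g_{ii} &= \sigma^2_{s_i} + \sigma^2_{hs_i}, &
g_{ii'} &= \sigma_{s_i} \sigma_{s_{i'}} \quad (i \neq i'), \\
\sigma^2_{s_i} &= \frac{g_{ij}\, g_{ik}}{g_{jk}}, &
\sigma^2_{hs_i} &= g_{ii} - \frac{g_{ij}\, g_{ik}}{g_{jk}}, \\
\sigma^2_{hs_i} > 0
  &\iff \rho_{jk} > \rho_{ij}\, \rho_{ik}
  &&\iff \rho_{jk \cdot i}
   = \frac{\rho_{jk} - \rho_{ij}\, \rho_{ik}}
          {\sqrt{(1 - \rho_{ij}^2)(1 - \rho_{ik}^2)}} > 0 .
\end{align*}
```

The second line solves the 3 covariance equations for the family variances, which requires all covariances to be positive; the third line divides the positivity condition on the interaction variance by $g_{ii}$ and rewrites it with correlations, which is the usual expression of a positive partial correlation between $j$ and $k$ given $i$.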
The problem of testing homogeneity of intra-class correlations between environments was finally solved under 3 different assumptions about the genetic correlations between environments: equal to one (Visscher, 1992); constant and positive (Robert et al, 1995); and just positive (this work).
For more than 3 traits, model [1] is no longer equivalent to the multiple-trait approach of Falconer. As a matter of fact, it generates fewer parameters than [4]: 2p vs p(p + 1)/2 for [1] and [4] respectively. This parsimony might be an interesting feature, because the difference in numbers of parameters increases with the number of traits considered (eg, 10 vs 15 parameters for 5 traits). Comparison of approaches on real genetic evaluation problems such as sire evaluation of dairy cattle in several countries would be of great interest.
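The parameter counts quoted above can be tabulated with a few lines of Python. The function names are hypothetical, and the counts cover only the family-level (co)variance parameters, which is the comparison made in the text.

```python
def n_params_model1(p):
    """Family-level parameters under model [1]: 2 per environment."""
    return 2 * p

def n_params_multitrait(p):
    """Family-level (co)variances in an unstructured p x p matrix."""
    return p * (p + 1) // 2

for p in range(3, 7):
    print(p, n_params_model1(p), n_params_multitrait(p))
# 3 6 6
# 4 8 10
# 5 10 15
# 6 12 21
```

The 2 counts coincide at p = 3, and the gap widens as p grows, since it equals p(p - 3)/2.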