Heterogeneity of variance for type traits in the Montbeliarde cattle breed

Heterogeneite des variances de caractere d'animaux de race Montbeliarde. Cet article presente et discute l'estimation des composantes de (co)variance (genetiques et residuelles) de caracteres de conformation mesures entre milieux en situation d'heteroscedasticite. Des tests d'homogeneite de cretains parametrees (correlations genetiques entre milieux constantes ou egales a 1, correlations genetiques egales a 1 et correlations intra-classes constantes, homogeneite des variances-covariances genetiques et residuelles) interesant les geneticiens sont egalement presentes. Ces hypotheses sont decrites par un modele pere, unidimensionnel heteroscedastique prenant en compqte les interactions genotype x milieu. Un algorithme iteratif d'esperance-maximisation (EM) est propose pour calculer les estimees du maximum de vraissemblance restreinte (REML) des composantes residuelles et genetiques de variance-covariance. Un test de rapport des vraisemblances restreintes est presente pour tester les differentes hypotheses considerees. Les procedures developpees sont utilisees pour l'analyse des notes de pointage de quelques caracteres de morphologie de 24 301 performances d'animaux de race Montbeliarde issus de 528 peres. Sur l'ensemble des variables analysees, differentes sources (stade de lactation, pointeurs, type de logement) d'heterogeneite des variances genetiques et residuelles ont ete mises en evidence mais en general l'heritabilite du caractere reste constante d'un miiieu a l'autre.

intra-class correlations between environments of type traits remained generally constant. heteroskedasticity / mixed model / genotype x environment interaction / EM algorithm / REML estimation Résumé -Hétérogénéité des variances de caractère d'animaux de race Montbéliarde.
Cet article présente et discute l'estimation des composantes de (co)variance (génétiques et résiduelles) de caractères de conformation mesurés entre milieu! en situation d'hétéroscédasticité. Des tests d'homogénéité de certains paramètres (corrélations génétiques entre milieux constantes ou égales à 1, corrélations génétiques égales à 1 et corrélations intra-classes constantes, homogénéité des variances-covariances génétiques et résiduelles) intéressant les généticiens sont également présentés. Ces hypothèses sont décrites par un modèle père, unidimensionnel hétéroscédastique prenant en compte les interactions génotype x milieu. Un algorithme itératif d'espérance-maximisation (EM) est proposé pour calculer les estimées du maximum de vraisemblance restreinte (REML) des composantes résiduelles et génétiques de variance-covariance. Un test de rapport des vraisemblances restreintes est présenté pour tester les différentes hypothèses considérées. Les procédures développées sont utilisées pour l'analyse des notes de pointage de quelques caractères de INTRODUCTION In many countries breeding values of dairy cattle are estimated using BLUP (best linear unbiased prediction, Henderson, 1973) methodology after estimating variance components via REML (restricted maximum likelihood, Patterson and Thompson, 1971). An important assumption in most models of genetic evaluation (in particular BLUP) is that variance components associated with random effects are constant throughout the support of the distribution of the records. However, the existence of heterogeneous variances for milk production and other traits of economic importance in cattle has been firmly established and well-documented (eg, for milk yield in dairy cattle : Everett et al, 1982;Hill et al, 1983;Van Vleck, 1987;Meinert et al, 1988;Visscher et al, 1991;Weigel, 1992; for growth performance in beef cattle: Garrick et al, 1989). But research on heterogeneous variance associated with conformation traits has been somewhat limited (Mansour et al, 1982;Smothers et al, 1988). Some studies (Smothers et al, 1988(Smothers et al, , 1993Sorensen et al, 1985) showed that sire and residual variances for final type score decreased as herd average increased but heritability remained constant. A number of possible causes for the heterogeneity of variance components has been suggested, including a positive relationship between herd means and variances, differences across geographical regions, changes over time and various herd management characteristics. This heterogeneity of variances can be due to many factors, eg, management factors (feedstuffs, type of housing), genotype x environment interactions, segregating major genes, preferential treatments (Visscher et al, 1991). If this phenomenon is not properly taken into account, differences in within-subclass variances can result in biased breeding value predictions, disproportionate numbers of animals selected from environments with different variances and reduced genetic progress (Hill, 1984;Gianola, 1986;Vinson, 1987;Winkelman and Schaeffer, 1988;Weigel, 1992; Meuwissen and Van der Werf, 1993). To overcome this problem, one possibility is to take heteroskedasticity into account in the statistical model. In particular, potential factors (regions, herds, years, etc) of variance heterogeneity can be identified and they can be tested as meaningful sources of variation of variances (Foulley et al, 1990San Cristobal et al, 1993). The objective of this paper is to present a statistical approach for identifying sources of variation (genetic and residual) of variances, find an appropriate model which takes into account this heteroskedasticity, and to illustrate such an approach in the analysis of conformation traits in the Montbeliarde cattle breed. A completely heteroskedastic univariate mixed model allowing for genotype x environment interaction is used to identify various management factors associated with differences in genetic and residual variance components. In particular, sire models with different, simpler assumptions on genetic parameters (constant genetic correlation and/or constant heritability) in heteroskedastic situations are described and tested using the restricted likelihood ratio statistic. The estimation of parameters for each model is based on the REML method using an EM algorithm. The objective here is not to analyze all type traits and all factors available in the data file but to illustrate the implementation of the methodology developed on a large data set. Only four type traits of the Montbeliarde breed are described and analyzed with three potential factors of heterogeneity. Finally, results of heterogeneity of variances detected on these four type traits are presented and discussed.

Data
Sires of the Montbeliarde cattle breed are routinely evaluated for several type traits measured on their progeny, using best linear unbiased prediction applied to an animal model (Interbull, 1996). Most cows are scored during their first lactation. Type traits are measured or scored on a linear scale from one to nine. For each animal, age at calving, stage of lactation at classification, year of classification, type of housing and main type of feedstuffs are available. The file analyzed included cows scored between September 1988 and August 1994 by technicians from AI cooperatives or from the 'Institut de 1'elevage'. The data analyzed included performance records on 24 301 progeny of 528 sires scored for 28 type traits. Each sire had at least 40 recorded daughters (414 sires) and each classifier had scored at least 15 cows.
Only four traits were analyzed and these were: one measured variable (height at sacrum) and three subjectively scored type traits. The latter consisted of two general appraisal scores of parts of the animal, one with high heritability (h 2 = 0.47, udder overall score) and one with low heritability (h 2 = 0.18, leg overall score) and rear udder height. The means and standard deviations for each trait analyzed are presented in table I. It is suspected that some of the factors described in table II may induce heterogeneous variances. For example, scores given by different classifiers are expected to have not only different means but also different variances. Therefore, a mixed linear model with the usual assumption of homogeneous variances may be inadequate.
The subjective nature of several traits (leg overall score or rear udder height) and the variability of scores caused by some factors (type of housing, stage of lactation) lead to suspect heterogeneity of scores and as a consequence heterogeneity of variances. In this paper, each variable is analyzed separately for each potential factor of variation with a mixed model. For computational reasons, a sire model with heterogeneous variances is preferred to the animal model used in routine genetic evaluations. In order to improve the quality of the genetic evaluation, the methodology developed in Foulley et al (1990) and Robert et al (1995a, b) is used to detect potential sources of heterogeneity of variance for conformation traits. The hypotheses of interest to be tested are the hypotheses of constant genetic correlations and/or constant heritability between levels of factors of heterogeneity.
Each factor of heterogeneity is studied separately, one at a time, assuming that it is the only possible source of heterogeneity. Variance components and sire transmitting abilities are estimated applying classical procedures, ie, REML and BLUP (Patterson and Thompson, 1971;Henderson, 1973) using a mixed model including the random sire effect and the set of fixed effects described in table II. The factors assumed to generate heterogeneity of variances and considered in the present analysis are stage of lactation (8 levels), classifier (21 levels) and type of housing (3 levels).
Because the factor 'classifier' has many levels (21) and the number of records for some classifiers is small, some classifiers are grouped into classes for reasons related to computational feasability. A preliminary analysis using a completely heteroskedastic sire model was performed assuming that the factor 'classifier' was the source of heterogeneity. On the basis of the estimated variance components, four homogeneous groups (classifiers with similar means and standard deviations were grouped) of classifiers were created. The problems related to grouping levels of a factor will be discussed later on.

Models
In each analysis and for each model, one variable (type trait) and one potential factor of variation are considered at a time. Following the notation of Robert et al (1995a), the population is assumed to be stratified into p subpopulations or strata (indexed by i = 1, 2, ... , p) representing each level of the source of variation. For each factor suspected to generate heterogeneity of variances, dispersion parameters of each type trait are estimated under the following five models.

Model a
Data are analyzed using a univariate heteroskedastic sire model with genotype x environment interaction (Robert et al, 1995b). In matrix notation, this model can be written as: where y i is the vector (n i x 1) of observations in subclass i of the factor of heterogeneity considered (i = 1, ...,p), (3 is the (p x 1) vector of fixed effects with associated incidence matrix X i , ui = (s)) and u2 = f hs!2!i} (j = 1, ..., s; s = 528) are two independent random normal components of the model with incidence matrices Z li and Z 2i , respectively, and with variance-covariance matrices equal to A and Ip 0 A, respectively. s* is the random effect of sire j such that s) -NID(0,1) and hs!2!! is the random sire x environment interaction such that h S(ij) N NID(0, 1). e i is the vector of residuals for stratum i assumed N(0, afl, I n , ) . 21ila u 22i and u 2 are the corresponding components of variance pertaining to stratum i. The sires are related via the numerator relationship matrix A (of rank s). For instance, different environments i represent different stages of lactation.
Fixed effects (in 13) can be continuous or discrete covariates but without loss of generality it is assumed here that they represent factors (discrete variables). The fixed effects included in the model are age at calving, stage of lactation, class of milk production of the herd (this effect characterizes the production level of the herd) and classifier. All these effects are considered within year of classification.

Model b
Model under the hypothesis of homogeneity of genetic correlations between environments (for all i and i' , p ;; , = !!1'!ul!! = p). This model Vl-g-.2-, 2 2 2 Uii 1 ! + a!2i V o 1 , 1 + a!2i' defined in Robert et al (1995a) can be written as: where the genetic correlation is p = !2 and A is a positive scalar. Under I + A this hypothesis, the interaction variance is proportional to the sire variance: !2 u2c = ,B 2 a 2 .

Model c
Model under the hypothesis that all genetic correlations are equal to one (p = 1). This hypothesis is tantamount to a heteroskedastic model without any genotype x environment interaction. This completely additive heteroskedastic model can be written as: REML estimation using an EM algorithm To compute REML estimates, a generalized expectation-maximization (EM) algorithm is applied . The principle of this method is described by Dempster et al (1977). Because the method is presented in detail in Foulley and Quaas (1995) and in Robert et al (1995a, b), only a brief summary is given here.
Denote u * = (u l , L 1 2 ) , 0' 2 = {a 2 } w2 {a2 } 0'2 = { a 2 } and y = (U2 &dquo; (r 2 6e)!. For instance, y = 2,i 1 , 10, 2 ,i 1, 10 , 2 is the vector of genetic and residual parameters for the general heteroskedastic model (a). The application of the EM algorithm is based on the definition of a vector of complete data x (where x includes the data vector and the vectors of fixed and random effects of the model, except the residual effect) and on the definition of the corresponding likelihood function L(y; x). The E step consists of computing the where EJ . ' !.! is a condensed notation for a conditional expectation taken with respect to the distribution of x!y, y = y!t!.
The estimation procedure of genetic and residual parameters consists in determining, at each iteration of the EM algorithm, all conditional expectations of expressions [7] to [15]. E! t' (.) can be expressed as the sum of a quadratic form and of a trace of parts of the inverse coefficient matrix of the mixed model equations (as described in Foulley and Quaas, 1995). A numerical procedure which does not require the computation of the inverse of the coefficient matrix to obtain all traces required is presented in the Appendix. This numerical technique allows a considerable reduction of computing costs when the data set analyzed is very large. Standard errors of parameters were not directly provided with standard EM and their computations were too intensive (the data set was too large).
To summarize, the estimation of the genetic and residual parameters amounts to two basic iterative steps. Using starting values of these genetic and residual parameters ((T;!!], U2 [0] and (T!jO]), the first step consists in estimating fixed and random effects with the BLUP mixed model equations. Then, given these conditionally best linear unbiased estimators and predictors (BLUE and BLUP), the second step consists in computing genetic and residual parameters. Both steps are repeated until convergence of the EM algorithm.
Note that the size of the system of mixed model equations [equal to the total number of levels of fixed effects considered + number of sires * (1+ number of levels of the factor of heterogeneity considered)] is very large. Its solution cannot be found by direct inversion of the whole coefficient matrix of mixed model equations (C). The use of specific numerical techniques (storage of nonzero elements only, use of the procedure described in the Appendix to compute traces of products and use of a sparse matrix package FSPAK: Perez-Enciso et al, 1994) and the analysis of the particular structure of parts of the matrix C (whose number of nonzero elements is very small) enables one to minimize storage requirements and computing times.
The computing procedure and the numerical techniques used are described in detail in Robert (1996).
The iterative algorithm (EM) is simple but converges slowly. Convergence of the EM algorithm can be accelerated (Laird et al, 1987) by implementing an acceleration method for iterative solutions of linear systems: where 1'! is the ith estimable parameter of y (sire, interaction or residual variance) at iteration t, 1' i ew is the new parameter at iteration t after acceleration and R is the acceleration coefficient. This acceleration step should be applied only when the evolution of solutions from one iteration to the next becomes stable. The optimal frequency of these acceleration steps is not given by Laird et al (1987). In our application, acceleration was performed when 0.80 < R < 0.94 with: where p is the number of estimable parameters (Robert, 1996).
Programs were written in Fortran 77 and run on an IBM Rise 6000/590. The convergence criterion used for the EM-REML procedure was the norm of the vector of changes in variance-covariance components between two successive iterations.
Let y2t! be the vector representing the set of estimable components of variance at iteration !t!, the stopping rule was: Hypothesis testing An adequate modelling of heteroskedasticity in variance components requires a procedure for hypothesis testing. As proposed by Foulley et al (1990,1992), Shaw (1991) and Visscher (1992) Under Ho, ( is asymptotically distributed according to a xr with r degrees of freedom equal to the difference between the number of parameters estimated under models M l and M o , respectively. In the normal case, explicit calculation of -2MaxL(y; y) is analytically feasible (Searle, 1979): where Const is a constant and ((3,u1 2 ,u2 i ) are mixed model solutions for ((3, u!, u!). C is the coefficient matrix of the mixed model equations.
The main burden in the computation of -2L is to determine the value of InIC1. ] . But using results developed in Quaas (1992) and in the Appendix, this computation can be simplified where the l ii s are the diagonal elements of the Cholesky factor L of matrix C.
The hypothesis of genetic correlations between environments equal to one is a special case of the hypothesis of homogeneity of genetic correlations. This hypothesis (for all i and i', p z ;, = 1) is especially interesting because it is equivalent to the assumption of no interaction term, ie, A = 0. Some problems arise here because the null hypothesis sets the true value of one parameter (A) on the boundary of its parameter space (A = 0). The basic theory in this field was developed by Self and Liang (1987) and applications to variance components testing in mixed models have been discussed in Stram and Lee (1994). Contrasting models (b) and (c), ie, testing H o (!!1. ! 0 for all i and A = 0) against H l (!!1. ! 0 for all I and A # 0) corresponds to a situation which can be handled by referring to case 3 in Stram and Lee (1994).
In this case, the asymptotic distribution of the likelihood ratio statistic under hypothesis H o does not have a chi-squared distribution anymore but is a mixture of chi-squared distributions [!X6 + 2xi! with equal weight between the measure of Dirac in 0 (Mass one at zero, Kaufmann, 1965) and a x2 with one degree of freedom (Gourieroux and Montfort, Chap XXI, 1989). This means that the common procedure based on rejecting H o when the variation in -2L exceeds the value of a x 2 distribution with one degree of freedom and such that p( X2 1 > s) = a (a being the significant level) is too conservative; or in other words, the threshold s is too high. What is usually done in practice is to reject H o for a value of the chi-square such that p ( X2 > s) = 2a (and no longer a) when A > 0.

Preliminary analyses
Type trait records were categorical either because they represent subjective scores drawn from a limited list of possible values (one to nine) or because they resulted from a measure with limited precision. In this paper, the analysis of such traits was performed using a methodology designed for normally distributed random variables. Therefore, before any analysis of heterogeneity of variances, it seemed essential to study the distributions of the variables considered. In a first analysis, a fixed model with homogeneous residual variances was used to analyze the distributions of the residuals. On all variables analyzed, skewness and kurtosis coefficients of residuals were not close to theoretical coefficients for a normal distribution. Some usual tests [Kolmogorov test, Geary's and Pearson's tests (Morice, 1972)] were used to analyze the normality of the distributions and most of them rejected the hypothesis of normality. To make the distribution of the residuals of type trait scores closer to normal, original scores were transformed using a normal score transformation (Bartlett, 1947). Statistical results (means, standard deviations, skewness and kurtosis coefficients) on residuals are presented in table III. Although limited, some improvement toward normality was observed for the skewness and kurtosis coefficients. Transformed variables were used in the following analysis.

Sources of variance heterogeneity
For some combinations of factors and variables (for instance, type of housing and udder overall score), it would be unexpected that heterogeneity of variances exists and is detected. If this were the case, biological interpretation would be potentially difficult. However, all variables presented in table I were analyzed separately in combination with each potential factor of heterogeneity: stage of lactation, classifier and type of housing. For each variable and each factor of heterogeneity considered, components of variance under each model proposed [models (a) to (e)] were estimated. Tables IV, VI and VIII present results of restricted likelihood ratio test for each variable analyzed and results are presented by factor of heterogeneity considered. Tables V, VII and IX present estimates of genetic and residual parameters of variables obtained under the simplest acceptable model, ie, the final model accepted after all testing procedures.

Stage of lactation
Results of stage of lactation are presented in tables IV and V. Except for leg overall score, the analysis of stage of lactation has concluded that heterogeneity of variances existed for all variables analyzed but with constant genetic correlation between levels of factor of variation for rear udder height (model b), with genetic correlations equal to one for height at sacrum (model c) and with constant heritability also (model d) for udder overall score. For leg overall score, the homoskedastic model was accepted with a P-value equal to 0.20: variance in leg scores of animals seems to be homogeneous whatever the level of stage of lactation considered.
Stage of lactation seemed to have a considerable effect on variances for udder overall score and rear udder height (table V). Estimates of variance components computed under model (d) and model (b) for these variables respectively, showed a large variability of genetic and residual variances. For these variables, we could note that residual variances were more important in early lactation (levels 1 to 3, equal to about 1.4 and 1.0 for udder overall score and rear udder height, respectively) and then regularly decreased as stage of lactation increased (about 1.1 and 0.7 for level 8). A biological interpretation could be put forward: at the beginning of lactation, and especially for animals in first lactation, udders were filled up with milk. Therefore, possible defects (bad udder support, teats far apart, bad direction of teats) could be viewed more precisely and were strongly penalized. Then, as stage of lactation increased, the udder became more flexible and less full; defects were less obvious. Scores given by classifiers were less variable. There may also be an effect of culling bias (the extreme cows could be culled and so the variation was reduced).
For udder overall score describing udder in general, the genetic variances varied in the same way as the residual variances since heritability was found to be constant (equal to 0.47). The genetic correlation between levels of stage of lactation was found to be equal to one. But for rear udder height, the genetic correlation was equal to 0.97 and the hypothesis of constant heritability was rejected. An important change between estimates of parameters under models (a, b, c, d) and estimates under the homoskedastic model was observed, which explained why this last model was clearly rejected.
For height at sacrum, heteroskedasticity was surprising. This trait was objectively measured by classifiers. Therefore it was expected that stage of lactation had no impact on variances. Genetic and residual estimates of variance did not vary in a clear manner and the interpretation was less obvious. Note that the hypothesis of constant heritability was rejected with a high P-value (equal to 0.01) but the genetic correlation was equal to one. Furthermore, the value of the likelihood ratio statistic of model (e) against model (d) was clearly smaller than for udder overall score for instance (6 = 12.19 against 78.97, respectively).

Classifiers
Results of this factor are presented in tables VI and VII. The analysis of the factor 'classifier' leads to clear conclusions. The model finally accepted was the model assuming genetic correlations equal to one and constant heritability (model d) for udder overall score and height at sacrum. For leg overall score and rear udder height, only the hypothesis of constant genetic correlations was accepted [model (b)]. For all variables analyzed, model (e) was clearly rejected with values of the likelihood ratio statistic always larger than 50.
When the trait definition was very subjective, as for rear udder height or leg overall score, the attitude of classifiers varied a lot. The same trait did not seem to be scored in a similar way by all classifiers. The analysis of rear udder height clearly showed this situation. Genetic variances varied from 0.03 for classifiers of group 2 up to 0.19 for others (group 3, table VII). The low genetic correlation between classifiers may result from imprecise estimates of variances of a group poorly represented. The scoring of groups of classifiers 2 and 3 seemed to be very different from the scoring of the other groups. In the same way, the heritability of rear udder height was not the same for all classifiers (from 0.18 to 0.72) and seemed to indicate a problem for groups 2 and 3 (these two groups had genetic variances and heritabilities very different from the other groups). This analysis revealed a real problem of consistency regarding the definition of traits among classifiers. This problem was already suspected by specialists. The analysis of height at sacrum seemed to confirm inconsistencies between groups of classifiers. The residual variances varied from 9.5 for group 4 to 14.4 for group 3. In the same way, genetic and residual variances for all traits were smallest for group 4. An explanation of this heteroskedasticity could be that classifiers could have taken a variable number of measures or spent a variable amount of time in measuring this height and generally measures were not exact.

Type of housing
Results are presented in tables VIII and IX. As intuitively expected, the factor 'type of housing' did not lead to heterogeneity of variances for udder overall score. In contrast, the high value of the test statistic for leg overall score, rear udder height and height at sacrum led to rejection of the hypothesis of homoskedasticity. Only the hypothesis of constant genetic correlation between levels of factor of variation was accepted for rear udder height. For height at sacrum and for leg overall score, a genetic correlation equal to one and constant intraclass correlation were accepted, respectively.
time were needed to estimate genetic and residual parameters within stage of lactation (eight levels) as a factor of heterogeneity. These procedures were computationally intensive and may not be easily implemented in national genetic evaluations using an animal model.

DISCUSSION AND CONCLUSION
Most studies on type traits consider that genetic and residual variances are homogeneous. This assumption has not been questioned. However, the results of this first analysis show that heterogeneity of genetic and residual variances exists on such traits. Furthermore, the few studies taking into account heteroskedasticity in mixed linear models assume constant heritability between environments (Koots et al, 1994;Visscher et al, 1991Visscher et al, , 1992Weigel et al, 1994). The original aspect of this paper was to present a way to test these hypotheses (homogeneity of variance components, constant heritability, constant genetic correlation) and to detect sources of variation on type traits. In that respect, our main concern focused on ways to find models as parsimonious as possible to reduce the number of parameters needed to account for heteroskedasticity. This paper was a large application of the approach developed by Foulley et al (1994) and Robert et al (1995 a, b). Using sequential testing procedures, a fitting model can be found for each variable and each factor of heterogeneity considered. However, the overall type-I error rate resulting from the application of such a procedure will be much higher than the significant level chosen at each step.
In this paper, only four variables and three potential factors of heterogeneity were presented, but other analyses with other traits (foot angle, udder balance, etc) and other factors (type of feedstuffs, age at calving, year of classification, class of milk production of the herd) were studied. Several heterogeneities of genetic and residual variances were clearly detected. Three main conclusions can be drawn from this analysis. For most variables and most factors of heterogeneity considered, the homoskedastic model was clearly rejected with large values of the likelihood ratio statistic. This result clearly revealed the inadequacy of the homoskedastic model to study conformation traits. The model under the hypothesis of genetic correlation equal to one and constant heritability seemed to always lead to a better fit than the homoskedastic model for conformation traits.
Some combinations of factor x variable (leg overall score x type of housing, udder overall score x stage of lactation, for instance) with obvious heterogeneity of variances were clearly highlighted, but the likelihood ratio tests led one to accept a very simple model: an additive heteroskedastic mixed linear model with constant heritability (the genetic variance was proportional to the residual variance in the same environment). This finding is important because this model does not involve any interaction between genetic effects (sire effect) and the factor of variation, since the hypothesis of genetic correlation equal to one is accepted in most of the cases considered. Without an interaction term, heteroskedasticity could be more easily accounted for in the current computing strategy used for routine genetic evaluations.
A pronounced effect of the factor 'classifier' was found with genetic and residual heterogeneity of variances and heterogeneity of heritability for variables subjectively scored (rear udder height, leg overall score) but also for variables measured (height at sacrum). A similar conclusion was drawn by  and San Cristobal et al (1993) with respect to the subjective appreciation of muscularity development of beef cattle in the Maine-Anjou breed. The results obtained by McGilliard and Lush (1956) showed that, on the same day, the scoring of different classifiers agreed more than did scoring from the same judge on different days, the correlation between classifiers on the same day being up to 0.74. They also found a significant interaction between classifiers and years for the same cows, which may mean that classifiers do not account for age in quite the same manner, or for the physical aspect of a particular cow at different times. In our analysis, heritability between classifiers varied considerably from 0.18 to 0.72 for instance, but standard errors were unknown. One of the main problems with type classification is the human inaccuracy involved in all subjective assessments. The main part of variation due to classifiers may be due to different attitudes towards different cows, herds or sires, or simply to human inconsistency.
Although computational effort is much larger to estimate genetic parameters with genotype x environment interactions in heteroskedastic mixed models, the model proposed here ui! _ !!l!s! + O&dquo;U2i(hs)ij offers great flexibility to define selection criteria. In particular, selection for a general ability of bulls can be based on predictions of the s! s.
For the factor 'classifier', a precise redefinition of some elementary traits should be considered because variances were generally more variable on elementary traits (eg, rear udder height) than for the other subjective traits (for instance, udder overall score or leg overall score). The approach used here can be a way to help classifiers to homogenize their scoring techniques when the variability detected between classifiers is very large. A posteriori, this analysis may allow an appreciation of the consistency of the classifiers' work and the quality of data collection. With respect to the factor 'classifiers', the grouping in four classes was only made for computational reasons. The likelihood ratio statistic can be used to test the data grouping, but the robustness of the grouping approach used here is questionable.
The analysis of 'height at sacrum' raises some problems. In this analysis, this trait was chosen as a test variable because it is measured and heterogeneity of variances with any factor of variation was not suspected. From our results, the data set must be reanalyzed to check whether these detected heterogeneities of variance are linked to another underlying factor. Heterogeneity may also be due to an inadequate model for means.
The approach presented here relies heavily on the assumption of normality of the data. This was obviously not the case here for the original data since type traits were recorded using scores varying from one to nine. A normal score transformation was used to improve the shape of those distributions. Although limited, the effect of this transformation to reduce skewness was real, which is crucial for the normality assumption (Daumas, 1982). Nevertheless, there are still pending problems about the way to handle such traits. Some of them might be also treated as ordinal discrete data via a threshold liability model (Foulley and Gianola, 1996).
The procedure described here allows one to detect and to precisely determine different sources of heterogeneity of type traits. Once factors of variation have been detected separately, a possible extension could be to use a structural approach combining these factors with a limited number of parameters. Such a method is used on logarithms of variances as it has been used for decades on means (Foulley et al, 1990Gianola et al, 1992;San Cristobal et al, 1993). The procedure is especially flexible owing to the possibility to incorporate prior information on some factors of heteroskedasticity when few data are available per level, such as herds or herd x years.-In the case of genetic correlations not equal to one (when genotype x environment interaction exists, as in models a or b), the modelling seems more tricky and computing costs are higher than in heteroskedastic models without interaction. This problem does not have an easy solution and requires future research.
The inferential procedure chosen here is based on maximum likelihood theory (under regularity conditions). An alternative approach would have been to consider a full bayesian analysis using, for example, MCMC methods (Markov chain Monte Carlo). For small sample inference, a solution may be provided by a posterior distribution estimated by Monte Carlo methods. However, this procedure was not used here since the data set was too large (24 301 progeny of 528 sires).
Although the methodology for heteroskedastic mixed models is appealing, methods of estimation of variance components must be adapted to be used in national systems of genetic evaluations. Concrete and simple proposals to take into account heterogeneity of variances in routine genetic evaluations should be made. Four possible approaches can be considered: -an approach where only one factor of heterogeneity is defined. This factor simply represents a combination of all factors of variation found to be significant with the present approach but number of levels of the resultant factor can be very large; -an approach where the most important factor of variation is considered in the heteroskedastic model and the other factors of variance heterogeneity are ignored; -an approach proposed by Weigel (1992)   where a small set of the data is analyzed to detect heterogeneity of variances and then correction factors for the variances are developed; -a structural approach as defined in Foulley et al (1990,1992) and San Cristobal et al (1993), especially when dealing with a constant heritability coefficient (Foulley, 1997), but attention must be paid to the feasibility of this method with large data sets.