Use of the score test as a goodness-of-fit measure of the covariance structure in genetic analysis of longitudinal data

Model selection is an essential issue in longitudinal data analysis since many different models have been proposed to fit the covariance structure. The likelihood criterion is commonly used and allows to compare the fit of alternative models. Its value does not reflect, however, the potential improvement that can still be reached in fitting the data unless a reference model with the actual covariance structure is available. The score test approach does not require the knowledge of a reference model, and the score statistic has a meaningful interpretation in itself as a goodness-of-fit measure. The aim of this paper was to show how the score statistic may be separated into the genetic and environmental parts, which is difficult with the likelihood criterion, and how it can be used to check parametric assumptions made on variance and correlation parameters. Selection of models for genetic analysis was applied to a dairy cattle example for milk production.


INTRODUCTION
The analysis of longitudinal data in genetic studies is attracting increasing attention. Examples in plant and animal breeding are growth curve analyses [12] or production curves for daily lactation yields for dairy cattle [10]. Evolutionary geneticists are also interested in characters that change with time such as fitness components: survival and reproductive output [15]. Several methodologies have already been proposed to analyse this kind of data. The most commonly used at present are random regression models [3], but other approaches focus more specifically on the modelling of the covariance structure: character process models [14], orthogonal polynomials [11], and structured antedependence models [13].
The comparison of different models is essential in order to choose the most appropriate one. In the likelihood based comparison, several parametric models, possibly non-nested, can be compared and the one with the highest likelihood value (or AIC [1], BIC criterion [16]) is chosen. It is not possible with these criteria, however, to know if additional improvement can still be reached in fitting the data, and if a more complicated parametric model is needed.
The score test [2] is based on the first and second derivatives of the likelihood evaluated under the null hypothesis, i.e. assuming that the parametric model to be tested is the correct one. The score statistic is interpretable in itself and does not require comparison with any other parametric model. In fact, if it is lower than the associated chi-square value, the tested parametric model adequately fits the data and there is no need for a more complex model.
In genetic analysis, the covariance structure is decomposed into a genetic and an environmental component. Until now, no methodology has been proposed to check the goodness-of-fit of both parts and it is in most cases difficult to separate the likelihood into these two components. Additionally, the parametric assumptions made on the covariance structure need to be checked in order to detect any discrepancies in the variance or correlation modelling.
The aim of this paper was to show how the score statistic can be decomposed into a goodness-of-fit measure for both the genetic and environmental parts, as well as for the variance and correlation components. It was applied to the genetic milk production analysis in dairy cattle.

Genetic analysis of repeated measurements
The observed trait X(t) is assumed to change continuously over time t and is assumed to be decomposed as: (1) where µ(t) is a nonrandom function, the genotypic function of X(t), and g(t) and e(t) are Gaussian random functions, which are independent of one another and have an expected value of zero at each time. They represent the time-dependent genetic and environmental deviations, and have covariance functions G(s, t) and E(s, t) between two times s and t, respectively.
The difficulty is to choose the best model for the genetic and environmental parts. Many different kinds of models have been proposed and they are usually compared using likelihood values penalised for the number of parameters: e.g., AIC [1], BIC [16]. Likelihood values do not, however, allow the separation of the genetic and environmental parts. In order to test the goodness-of-fit for the genetic covariance structure obtained with the model, it would be possible to use a likelihood ratio test, fixing the environmental covariance parameters at the MLEs. The likelihood can then be compared to the maximised likelihood, Log L max , that corresponds to unstructured matrices for both genetic and environmental components. In most practical cases, however, the reference value, Log L max , is not known because convergence for unstructured covariances, especially for the genetic part, is not obtainable. On the contrary, the score test does not require a reference value for the likelihood, since it only involves the score vector and information matrix under the null hypothesis, i.e. with covariance parameter estimates obtained with the model to be tested.

Score test with nuisance parameters
Suppose one is interested in testing the goodness-of-fit for the genetic covariance matrix G (i.e. the between-sire component in the case of a sire model). The environmental covariance parameters would be considered as nuisance parameters. Let g be the vector of genetic covariance terms (g = vech(G)). If J is the number of times of measurement, matrix G = (G(t i , t j )) 0≤i, j≤J , is of dimension (J × J) and vector g is ((J(J + 1)/2) × 1). Estimates of g can be obtained either from a completely unstructured matrix or from a longitudinal model such as random regression [3], character process [7,14] or structured antedependence models [13]. Similarly, E of dimension (J × J) is the environmental covariance matrix (or the within-sire component in a sire design), and e is the vector of environmental covariance terms (e = vech(E)), that are considered as nuisance parameters.
The aim is to test that the genetic covariance structure (g) of the data is actually equal to the covariance structure estimated with the parametric model (g 0 ), i.e. the null hypothesis is: H 0 : g = g 0 . Let (g, e) be the REML loglikelihood, that includes both the genetic (g) and environmental (e) parts. The score vectors are defined by S g = ∂ /∂g and S e = ∂ /∂e. The information matrix I, that corresponds to the second derivatives of the likelihood with respect to the covariance components, can be written as: where I gg are the second derivatives of the likelihood with respect to the genetic covariance components and similarly for the environmental part I ee . The inverse of the information matrix is given by: where e.g. I gg = (I gg − I ge I ee −1 I eg ) −1 . The score statistic for H 0 : g = g 0 is [2] W = Q(g 0 , e) I gg Q(g 0 , e) (2) where Q(g 0 , e) = S g (g 0 , e) − I ge I ee −1 S e (g 0 , e). (3) There is usually some simplification by taking e =ê whereê is the MLE at H 0 , in which case: S e (g 0 ,ê) = 0 and W = S g (g 0 ,ê) I gg S g (g 0 ,ê), with I gg evaluated at (g 0 ,ê). The score statistic is then compared to a chi-square with (J(J + 1)/2 − p) degrees of freedom, where p is the number of parameters in the covariance matrix of the tested model (p < J(J + 1)/2). The same calculations could be done symmetrically to test the goodnessof-fit of the environmental covariance matrix. Expression W = Q(g, e 0 ) I ee Q(g, e 0 ) does not require knowledge of the unstructured covariance matrix, and G can be the estimated covariance matrix of any model (g = vech(G)). This is an important advantage in practice over the likelihood ratio test, for example, since the unstructured genetic covariance matrix is often difficult to estimate due to convergence problems. For a chosen model, with covariance parameter estimates g 0 and e 0 , the adjusted score statistic presented above will provide goodness-of-fit measures for g 0 taking into account the uncertainty in the e 0 estimates, that corresponds to the actual fit of estimates g 0 to the data. This would not be possible with a likelihood ratio test that gives the goodness-of-fit measure of the genetic part g 0 , assuming the model for the environmental part is correct.

Score test to check heterogeneity of the residual variance
Several studies with test-day models for the lactation curve analysis, for instance, showed heterogeneity of the residual variance over time [18]. Jaffrézic et al. [8] proposed a link function approach to model the residual variance changes over time as a continuous function, for example a polynomial function of time, using a structural model as proposed by Foulley and Quaas [4]. The score statistic can be decomposed in order to check these parametric assumptions.
In an unstructured framework, the residual variance is included in the environmental covariance matrix. Derivatives of the likelihood with respect to the environmental variance parameters can be easily obtained from the previous score vector and information matrices. Let r be the vector of the environmental variance components. If J is the number of measurements over time, r is of dimension J × 1. Let S r e be the elements of the environmental score vector corresponding to the variance components and I r ee for the information matrix. Assuming everything else is correctly fitted, the score statistic for the environmental variance (including the residual variance) can be obtained as: This score statistic tests whether a more complex residual variance is necessary, given the chosen genetic and environmental covariance structures. As previously, an adjusted score statistic could also be considered for the environmental variance, considering the genetic covariance and the other environmental covariance parameters as nuisance parameters. Let m be the vector of all these "nuisance parameters". The score vector previously considered S = (S g , S e ) can be reordered as: S = (S m , S r ) , and similarly for the information matrix I: with inverse: The adjusted score statistic for the environmental variance (including the residual variance), used to test the null hypothesis H 0 : r = r 0 can be obtained as: This adjusted score statistic checks the actual goodness-of-fit of the environmental variance to the data whereas the previous score statistic (T r ) was a goodness-of-fit measure for the environmental variance assuming all the other covariance parameters to be perfectly fitted. Both score statistics can be useful in practice and answer two different questions. The first (unadjusted, T r ) can be used when a parametric model has already been chosen for the genetic and environmental parts, and parametric assumptions are to be checked on the residual variance only (which will be useful for model selection). The adjusted score statistic would be more useful once the complete model has been chosen to check the actual fit to the data.

Score test for variance and correlation parameters
As found in previous studies [7], the separate modelling of variance and correlation functions allows flexibility in the choice of covariance structure. This approach is used in the character process methodology [14] and to some extent, in structured antedependence models [13]. Both methodologies are based on parametric assumptions that considerably reduce the number of parameters compared to unstructured models. The score statistic can also be decomposed into the variance and correlation components in order to check the parametric assumptions made on both parts and detect if more complicated parametric functions should be used. The methodology will be presented here in the case of structured antedependence (SAD) models, but can easily be extended to character processes.
The idea of antedependence models, as originally proposed by Gabriel [5], is that an observation at time t can be explained by the previous ones. An antedependence structure of order r is defined by the fact that the ith observation (i > r) given the r preceding ones is independent of all other preceding observations [5]. Generalising this concept to genetic analysis, a second order SAD model for the genetic part can be written as: for j ≥ 2. Here, φ 1 and φ 2 are antedependence parameters, and g (t) is assumed to be normally distributed, with mean zero and variance σ 2 g (t), termed "innovation variances", that can change with time. In structured antedependence (SAD) models, Nunez-Anton and Zimmerman [13] propose using a parametric function for innovation variances σ 2 g (t) with, for example, a polynomial of time. This allows to considerably reduce the number of parameters compared to unstructured antedependence models (UAD) as originally proposed by Gabriel [5], where one parameter has to be estimated at each time. Antedependence parameters can also be assumed to change with time, which is particularly useful for unequally spaced data. The same model can be written for environmental effects e(t).
Using a Cholesky decomposition, the inverse of the covariance matrix for antedependence models can be written as: G −1 = L DL where L is a lower triangular matrix with 1's on the diagonal and the negatives of the antedependence parameters on the sub-diagonals and D is a diagonal matrix of the inverse of innovation variances. Score and information matrices for D and L parameters can be calculated as functions of the first and second derivatives of the likelihood with respect to the covariance matrix parameters. In fact, let d be the vector of the diagonal elements of matrix D, and g the vector of the covariance matrix components (g = vech(G)). If J is the number of times of measurement, d is of dimension (J × 1) and g of dimension ((J(J + 1)/2) × 1). The first derivative of the likelihood with respect to the variance components is given by: ∂ Log ∂d j = ∂g ∂d j ∂ Log ∂g (10) where ∂ Log /∂g is the score vector previously used; ∂g/∂d j is the vector of dimension ((J(J + 1)/2) × 1) with elements ∂G/∂D j , (11) and as G −1 = L DL, (12) where ∂D/∂D j has only one non-zero element on position (j, j) which is equal to 1. The score vector ∂ Log /∂d is of dimension (J × 1) with the jth element ∂ Log /∂d j . The information matrix can be obtained by: (13) for (j, k) = 1, . . . , J. As information matrix for D parameters is of dimension J × J and element (j, k) can simply be obtained by: where ∂ 2 Log /∂g 2 is the information matrix previously used. Score and information matrices for the L parameters can be similarly obtained, however derivatives of G −1 are more complex than for D. In fact, Since the second derivatives of ∂L/∂L ij ∂L i j are equal to zero, it follows: Score statistics for the D and L parameters have to be compared to chisquare values with degrees of freedom equal to the number of parameters in unstructured antedependence models [5] minus the number of parameters in the chosen structured model. For example, if there are J times of measurement, for a first order structured antedependence model, the score statistic for D will be compared to a chi-square with (J − p) degrees of freedom, and for L to a chi-square with ((J − 1) − q) degrees of freedom where p and q are the number of parameters in the structured innovation variances and antedependence coefficients, respectively. Similar calculations could be performed for character process models to check parametric assumptions of variances and correlations that are modelled separately, since the covariance matrix can be written: G = DCD where D is a diagonal matrix that contains the square root of the variance parameters, and C is the correlation matrix. This would allow modelling of the variance and correlation functions to be checked separately. The score statistic can be used in place of the Vonesh concordance coefficient [7,17], that requires specification of a reference model and does not take into account the uncertainty in the parameter estimations.

Milk production in dairy cattle
The score test methodology was applied to a data set for the genetic evaluation of first lactation milk production for dairy cattle. This data set has already been studied in previous analyses [9], but until now no methodology has been proposed to check the goodness-of-fit of covariance parameters obtained under the different models. Lactation curves were fitted to test day records for 9277 progeny of 464 Holstein-Friesian sires, assumed unrelated. Observations were made over two years (1993 and 1994). The lactation stage of the animals at the first test varied between 4 and 40 days, with successive tests at approximately 30 day intervals. All cows had 10 measurements. The fixed effects considered were the age at calving, the percentage of North American Holstein genes, and herd-test-month. An exponential curve of Wilmink [19] was fitted as a fixed regression model for the general curve of the population: (19) where t stands for days in milk and parameter λ was assumed to be known and equal to 0.068, chosen based on previous studies [18]. A sire model was used.

Results
All the parameter estimations were performed with ASREML [6]. First and second derivatives of the likelihood required in the score statistic calculations were by-products of the Average Information estimation procedure used in ASREML. Score statistics for the different models are given in Table I. Since 10 times of measurement were considered, the values have to be compared to chi-square with a number of degrees of freedom equal to 55 minus the number of parameters in the covariance structure of the tested model. The chi-square values (α = 0.05) for both genetic and environmental parts for the different models are also given in Table I. Score statistics for unstructured matrices (Model 1 in Tab. I) should be equal to zero since it corresponds to the saturated model (110 parameters). However, covariance matrices were constrained to be positive definite and the unstructured estimate for the genetic covariance matrix corresponded to a point on the boundary of the positive definite region of the parameter space where differentials were not zero. Consequently, the genetic score statistic for the unstructured matrix was equal to 31.1 instead of being zero, and no "reference model" was available in this study.
Judging by likelihood values, structured antedependence (SAD) models performed much better than random regression (RR), despite their smaller number of parameters in the covariance structure (Tab. I, Models 4 to 7). Using the score statistics, it can be seen that the environmental fit is much better for SAD compared even to quartic polynomials (score statistic equal to 525 for a third order SAD with 6 parameters compared to 978 for a quartic random regression with 15 parameters). However, score statistic values showed that the genetic part can be quite well fitted by a simple random regression model such as quadratic (S = 66). This could not be seen in the likelihood value that only represents the overall fit of the model.
Although the goodness-of-fit for the environmental covariance structure was much better for SAD models than random regression, there is still a large potential for improvement since the score statistic is still much larger than the chi-square value (509.3 > 64.0). In order to improve the fit, the assumption of a constant residual variance was relaxed.
Table II and Figure 1 show the difficulty of modelling the residual variance with a simple parametric function of time, and the unstructured model, with 10 residual variances is chosen here (Model 9). Table I shows that, in this model, the fit of the environmental covariance structure is considerably improved (score statistic equal to 122.6) compared to the antedependence models with constant variance (S = 525.4 for Model 4).
Allowing the residual variance to change with time also considerably improved the fit of the random regression model for the environmental covariance structure. For a quartic random regression, for instance, the score statistic was equal to 978.7 (Model 7), whereas it is equal to 90.6 when 10 classes of residual variance are considered (Model 8). It remains, however, still larger than the associated chi-square value (χ 2 E = 43.8).    In all the SAD models previously considered, quadratic functions were used for innovation variances, but antedependence parameters were assumed constant. This is quite a stringent assumption that we tried to relax. In Model 10, the genetic antedependence parameter was assumed linear, and the two first environmental antedependence parameters were considered quadratic. For this model involving 25 covariance parameters, fewer than a quartic-quartic random regression model, the score statistics are much smaller, especially for the environmental part, and are close to the chi-square values (S = 68.1 for the environmental covariance, compared to χ 2 E = 49.8). As described above, the score statistic can also be decomposed to check the parametric assumptions on innovation variances and antedependence parameters. For the genetic part of Model 10, the innovation variance was assumed quadratic, and the score statistic was equal to 9.5, which is smaller than the 5% point of a chi-square with 7 degrees of freedom, i.e. 14.1. The innovation variances were therefore adequately fitted as illustrated in Figure 2 where it is compared to an unstructured first order antedependence model (UAD(1)).   The antedependence coefficient was assumed to change linearly with time. The score statistic was equal to 20.9, and had to be compared to a chi-square with 7 degrees of freedom as well, i.e. 14.1. It seems that the antedependence parameter was still not exactly fitted. However, as shown in Figure 2, changes of the antedependence coefficient over time will be difficult to model with a parametric function without using a large number of parameters.
In this analysis, the best compromise between the number of parameters and the goodness-of-fit of the environmental covariance was obtained with a third order unstructured antedependence model (Model 3). In this case, the score statistic was equal to 20.8, which is smaller than the chi-square value χ 2 E = 31.4. The genetic part seemed much simpler to model, and a simple quadratic random regression seemed to be appropriate (S = 66.2 < χ 2 G = 66.3).

DISCUSSION
The aim of model selection in the analysis of longitudinal data in genetic studies is to find the most appropriate genetic and environmental covariance structures. This is traditionally done using likelihood based criteria such as AIC [1] or BIC [16] that allow for comparison among non-nested models.
It is, however, difficult in general to decompose the likelihood into the genetic and environmental components and therefore to check the fit of both parts. This can be possible, in some cases, using a likelihood ratio test, but requires the knowledge of a "reference model", i.e. of the correct genetic and environmental covariance matrices, which are, in most practical cases, unknown.
The score test overcomes this difficulty since the score statistic is calculated under the null hypothesis, i.e. using estimates of the model to be tested. Furthermore, the adjusted score statistic with nuisance parameters presented in the first part of the paper allows, for a chosen model, the goodness-of-fit of the genetic part to be tested, while taking into account the uncertainty in the environmental covariance estimates.
It was also shown that the score statistic is very flexible and can be decomposed to test the goodness-of-fit for each component of the model. It can, for example, be decomposed into variance and correlation components to check the parametric assumptions for the genetic or environmental part.
The proposed score statistics proved to be useful goodness-of-fit measures in practice. In the dairy cattle example, based upon the likelihood criterion it was found that structured antedependence models provided a much better fit than random regression. However, the large discrepancies in the environmental part found with the score test would not be detected only using the likelihood values. A very large improvement in the fit of the environmental covariance was obtained when relaxing the assumption of constant residual variance, and was clearly shown in the score statistic values. A much simpler model was chosen for the genetic part, and the score test showed that there is no need to consider more complex structures.
This paper therefore shows that score statistics can be very useful goodnessof-fit measures for genetic longitudinal data analyses, and it could be helpful to implement this criterion in software packages.