Modeling relationships between calving traits: a comparison between standard and recursive mixed models
- Evangelina López de Maturana^{1, 2}Email author,
- Gustavo de los Campos^{1},
- Xiao-Lin Wu^{3},
- Daniel Gianola^{1, 3, 4},
- Kent A Weigel^{3} and
- Guilherme JM Rosa^{3}
DOI: 10.1186/1297-9686-42-1
© de Maturana et al; licensee BioMed Central Ltd. 2010
Received: 20 May 2009
Accepted: 25 January 2010
Published: 25 January 2010
Abstract
Background
The use of structural equation models for the analysis of recursive and simultaneous relationships between phenotypes has become more popular recently. The aim of this paper is to illustrate how these models can be applied in animal breeding to achieve parameterizations of different levels of complexity and, more specifically, to model phenotypic recursion between three calving traits: gestation length (GL), calving difficulty (CD) and stillbirth (SB). All recursive models considered here postulate heterogeneous recursive relationships between GL and liabilities to CD and SB, and between liability to CD and liability to SB, depending on categories of GL phenotype.
Methods
Four models were compared in terms of goodness of fit and predictive ability: 1) standard mixed model (SMM), a model with unstructured (co)variance matrices; 2) recursive mixed model 1 (RMM1), assuming that residual correlations are due to the recursive relationships between phenotypes; 3) RMM2, assuming that correlations between residuals and contemporary groups are due to recursive relationships between phenotypes; and 4) RMM3, postulating that the correlations between genetic effects, contemporary groups and residuals are due to recursive relationships between phenotypes.
Results
For all the RMM considered, the estimates of the structural coefficients were similar. Results revealed a nonlinear relationship between GL and the liabilities both to CD and to SB, and a linear relationship between the liabilities to CD and SB.
Differences in terms of goodness of fit and predictive ability of the models considered were negligible, suggesting that RMM3 is plausible.
Conclusions
The applications examined in this study suggest the plausibility of a nonlinear recursive effect from GL onto CD and SB. Also, the fact that the most restrictive model RMM3, which assumes that the only cause of correlation is phenotypic recursion, performs as well as the others indicates that the phenotypic recursion may be an important cause of the observed patterns of genetic and environmental correlations.
Background
Structural equation models (SEM) are well established and widely used in the social sciences. In quantitative genetics, these models were first suggested by Sewall Wright [1] but were ignored for many years. Recently, Gianola and Sorensen [2] suggested a model in which recursive and simultaneous relationships between phenotypes are considered in the context of a multiple-trait Gaussian model. This stimulated application of SEM in animal breeding and genetics (e.g., de los Campos et al. [3, 4], Varona et al. [5], López de Maturana et al. [6], Wu et al. [7]). SEM can be used, for example, to explore potential relationships between variables of interest or to evaluate the plausibility of different hypotheses [8]. In addition, SEM facilitate comparisons between alternative nested path analysis models [9].
López de Maturana et al. [10] applied SEM to study relationships between three calving traits (gestation length (GL), calving difficulty (CD) and stillbirth (SB)). SEM were found useful for detecting heterogeneous correlations between residual, contemporary group, or genetic effects affecting GL and liabilities to CD and SB. However, a comparison between their model and nested models with different restrictions on relationships between variables has not been addressed yet.
The present work complements the study of López de Maturana et al. [10] by comparing, in terms of goodness of fit and predictive ability, a sequence of SEM with different restrictions on the (co)variance matrices among model parameters.
Methods
Data
The data consisted of a sample of primiparous US Holstein cows calving from 2000 to 2005 that were recorded as part of the National Association of Animal Breeders (Columbia, Mo) Calving Ease Program. After editing, the data set contained GL, CD and SB records from 90,393 cows, sired by 1,122 bulls, mated to 567 service sires, and distributed over 935 herd-calving year combinations, as described in López de Maturana et al. [10].
Statistical model
where ( ), and ( ) denote liabilities and thresholds for CD (SB), respectively. For identification purposes, the first thresholds for CD ( ) and SB ( ) were set to 0 and the second threshold for CD ( ) was set to 1. A multivariate normal model was assumed for .
where, λ_{CD←GL(k)}, λ_{SB←GL(k)}and λ_{SB←GL(k)}describe rates of change of the liabilities to CD and SB with respect to GL, and of the liability to SB with respect to the liability to CD, respectively. As noted before, recursive coefficients were allowed to vary across categories of GL, k = {1, if ≤ 267 d; 2, if 267 d < ≤ 273 d; 3, if 273 d < ≤ 279 d; 4, otherwise}, to account for non-linearity of the relationship between GL and the two calving traits. Model residuals, ε_{ i }, were assumed to be independent and identically distributed (IID) across animals, that is, , where R_{0} is a 3 × 3 residual (co)variance matrix, with its last diagonal entry (i.e., the residual variance of the liability to SB) restricted to 1 for identification purposes.
Prior distribution
where, θ_{ k }= (Λ_{ k }, b, h, s, mgs, G_{0}, H_{0}, R_{0}, τ); G_{0} and H_{0} are (co)variance matrices of genetic, herd and residual effects, respectively; λ_{ k }is a vector containing the non-null recursive effects; and τ is the vector with the thresholds.
(Co)variance components
Without imposing further restrictions, the model described in (2) considering the recursive relationship is under-identified. Identification can be attained by imposing restrictions on dispersion, location parameters or on the matrix of recursive effects. For computational convenience and due to the difficulty to assure identification through the location parameters, only restrictions on dispersion or recursive parameters were considered. A sequence of models was obtained by changing the prior specifications for p(λ), p(G_{0}), p(H_{0}), and p(R_{0})
Recursive mixed model 1 (RMM1)
This model assumes that the correlation between residuals in the reduced models, Λ^{-1}_{k}ε_{i}, is solely a consequence of the phenotypic recursion. R_{0} is assumed to be diagonal, i.e., p(R_{0})is the product of two independent scaled inverted Chi-square distributions (for GL and CD, because was set to 1 to ensure identification), and p(G_{0}) and p(H_{0}) are assumed to be distributed a priori as inverted Wishart distributions. The number of unknowns in the dispersion parameters and the matrix of recursive effects is 41: 6 in , and H_{0}, 9 in , 2 in R_{0}, and 3 in each Λ_{ k }.
Recursive mixed model 2 (RMM2)
This model results from adding to RMM1 the restriction that H_{0} is also diagonal. This restriction implies that the correlations between residuals and between contemporary groups in the reduced model are exclusively due to recursive relationships. Thus, the number of parameters entering in [5] in RMM2 (38) is smaller than those entering in [5] in model RMM1 (number of parameters equal to 41). RMM2 is obtained by assigning an inverted Wishart distribution to G_{0} and independent scaled-inverted Chi-square distributions to the unknown diagonal elements of H_{0} and R_{0}. Note that, as in RMM1, is set to 1 to ensure identification.
Recursive mixed model 3 (RMM3)
This model assumes that the only cause of correlations between any of the random effects in the reduced model is the phenotypic recursion. That is, , , , H_{0} and R_{0} are diagonal, and the priors for the unknown diagonal components are independent scaled-inverted Chi-square distributions. The number of unknowns in dispersion parameters and in the matrix of recursive effects is now 26.
Standard mixed model (SMM)
This model is defined by setting and Λ_{k} = I, and by treating G_{0}, H_{0}, and R_{0} as unstructured (co)variance matrices. As prior distributions, inverted Wishart distributions are assumed to G_{0} and H_{0} and a conditional inverted Wishart distribution to R_{0} (p(R_{0}| | = 1)) (see [13] for details). The sum of unknowns in the (co)variance matrices is 32 (6 in H_{0}, 5 in R_{0} and 21 in G_{0}); there are no recursive parameters in this model.
Implementation
With the a priori assumptions described above, the fully conditional distributions of all unknowns in all models have closed forms, and draws from the posterior distribution can be obtained via Gibbs sampling. The SirBayes software [7] was used to implement the models. The length of the chain and the burn-in period were assessed by visual examination of trace plots of posterior samples of selected parameters; additional diagnostic checks were employed. After a preliminary analysis, it was decided to run 5 independent chains, each consisting of 10,000 iterations. In each chain, the first 1,000 iterations were discarded as burn-in, and one of every 10 successive samples was retained. Thus, 4,500 samples were used to infer the posterior distributions of unknown parameters. Features of the marginal posterior distributions of interest, the convergence analysis, and estimates of Monte Carlo error, were obtained using the BOA software http://www.public-health.uiowa.edu/boa.
Model comparison
The performance of the SMM and the three RMM considered was investigated in terms of both goodness of fit and predictive ability, under the consideration that a model that fits current data very well may fail to provide accurate predictions of future (independent) observations [14].
The mean squared error of a calving trait phenotype, , and Pearson's correlation between fitted and observed data, , were evaluated at the posterior means of the unknowns ( ), to assess goodness of fit.
Predictive ability was assessed with MSE and Pearson's correlation, using a 3-fold cross-validation (CV) procedure. The full data set was randomly partitioned into three disjoint subsets, each with approximately one-third of the records. The CV procedure used two of the three subsets for model fitting and prediction (i.e., the training set), and predictive ability was evaluated in the remaining subset (i.e., the testing set). MSE and Pearson's correlation were computed as before, but in this case by concatenating results from the three cross-validation sets.
Above, Φ(·) is the cumulative distribution function of a standard normal variate; τ_{c} is the assumed (or estimated) value of the appropriate threshold for CD and SB, and is the posterior mean of the liability to CD or SB for individual i.
Results and Discussion
Small Monte Carlo errors (~10^{-2}-10^{-4}) were obtained for all the parameters that were estimated in each model; this suggests that convergence was achieved, and that a sufficient number of Gibbs samples was used.
Structural coefficients
Posterior mean (standard deviation) of structural coefficients for calving traits from the recursive mixed models
Structural coefficients | Model^{a} | Category of GL | |||
---|---|---|---|---|---|
261-267 d | 268-273 d | 274-279 d | 280-291 d | ||
λ_{CD←GL}(l. u.^{b}/1 d GL) | RMM1 | 0.005 (0.005) | 0.020** (0.003) | 0.032** (0.005) | 0.040** (0.003) |
RMM2 | 0.006 (0.005) | 0.020** (0.003) | 0.032** (0.005) | 0.040** (0.003) | |
RMM3 | 0.005 (0.005) | 0.021** (0.003) | 0.033** (0.005) | 0.041** (0.003) | |
Overall effect of GL on SB (l. u./1 d GL)^{c} | RMM1 | -0.044** (0.006) | -0.021** (0.004) | -0.008 (0.006) | 0.024** (0.003) |
RMM2 | -0.044** (0.0062) | -0.021** (0.0038) | -0.008 (0.0057) | 0.025** (0.0031) | |
RMM3 | -0.044** (0.006) | -0.021** (0.004) | -0.008 (0.006) | 0.025** (0.003) | |
λ_{SB←CD}(l. u./l. u. CD) | RMM1 | 0.339** (0.023) | 0.331** (0.011) | 0.330** (0.007) | 0.3311** (0.007) |
RMM2 | 0.327** (0.023) | 0.319** (0.010) | 0.317** (0.007) | 0.318** (0.007) | |
RMM3 | 0.330** (0.003) | 0.321** (0.011) | 0.319** (0.007) | 0.320** (0.007) |
Genetic parameters
Additional file 1, Table S1 shows the posterior means (standard deviations) of direct and maternal heritabilities of GL and liabilities to CD and SB for each model. Posterior distributions of direct and maternal heritabilities for the three calving traits were similar across categories of GL and between models (RMM1, RMM2 and RMM3) and were also similar to their counterparts from the SMM. The posterior mean of direct heritability of GL was higher than that for maternal heritability (0.39 vs. 0.08-0.07); corresponding estimates for CD (0.08-0.10 vs. 0.07-0.08) and SB (0.05-0.08 vs. 0.08-0.11) were smaller than those for direct heritability and similar between them. Heritability estimates were within the range of values reported in previous studies [15–17]; estimates for CD and SB were higher than those used in routine genetic evaluations of CD and SB in US Holsteins, except for the direct heritability of CD [18, 19].
Features of the posterior distributions of genetic correlations in the four categories of GL from the SMM and RMM models are shown in Additional file 1, Tables S2, S3, S4 and S5. In general, estimates of genetic correlations obtained from the SMM were within the ranges of values obtained for each category of GL from the RMM analyses. All of the recursive models evaluated in this study detected a heterogeneous correlation between direct and maternal effects of GL and between direct and maternal liabilities to CD and SB, as expected. Similar estimates were found in the analyses of RMM1 and RMM2. Regarding the correlation between direct effects of GL and CD, positive posterior means were obtained from both SMM and RMM by category of GL. For all categories of GL, RMM3 gave lower estimates than the other models, due to restrictions placed on G_{0}. Similarly, positive estimates (although slightly lower) were found between maternal effects of GL and CD. Slightly stronger correlations between direct effects of GL and SB were found using RMM3, compared with those using RMM1 or RMM2, for all categories of GL. Relatively high, positive, and similar estimates were obtained for the genetic correlation between direct effects for CD and SB in each of the four categories of GL, with lower estimates from RMM3. A similar pattern, although with slightly lower estimates, was found for the genetic correlation between the maternal effects of CD and SB.
Similar posterior means of the genetic correlation between direct and maternal effects for the same trait were found in SMM and RMM, and across categories of GL: moderately negative for GL and SB, and close to 0 for CD.
The 90% highest posterior density intervals for genetic correlations between direct and maternal effects for different traits obtained with RMM included 0 or had an almost null posterior mean, and were similar to their counterparts from the SMM. This suggests that effects of genes controlling direct effects for one calving trait are not associated with those controlling maternal effects for another calving trait, and vice versa.
The estimates of previously genetic correlations were within the range of values reported in the literature [15–17].
Additional file 1, Table S6 shows the posterior means of correlations between contemporary groups and between residuals. Almost null estimates of the correlation between contemporary groups of GL and CD were found in SMM and RMM for all categories of GL. Regarding GL and SB, small positive estimates were obtained from the analyses of SMM and RMM1. Results from RMM1 suggest that the correlation changes across categories of GL. Estimates from the other recursive models (RMM2 and RMM3) also suggested that the correlation changes across categories of GL, including a modification of sign: slightly negative in the first two categories of GL (-0.10 and -0.05, respectively), nil in the third, and slightly positive in the fourth (0.06). Posterior means of the correlation between herd-year effects of CD and SB were nil in the analyses of models SMM and RMM1; however, those from models RMM2 and RMM3 were moderate and positive (0.54). Differences in sign and magnitude between estimates were a consequence of the different assumptions regarding the covariances between herd-year effects in SMM and RMM1 versus those in RMM2 and RMM3.
The RMM detected heterogeneous correlations between residuals of GL and both CD and SB that were solely due to the recursive relationship between GL and liabilities to CD and SB residuals. Estimates from SMM were in the interval of values from RMM. Similarly, positive and moderate correlations between residuals of CD and SB were found in all RMM models (0.38-0.40), whereas the estimate from SMM was much lower (0.09).
Model comparison
Among the variety of model comparison methods, MSE and Pearson's correlation between observed and estimated/predicted phenotypes were chosen based on their ease of interpretation and weaker dependence on priors' choice. Mean squared error is a measurement related to the bias-variance trade-off of a model, either for fitting or predictive ability, whereas Pearson's correlation indicates the accuracy of estimations/predictions. The use of these criteria provides information on the model performance for each analyzed trait, but they lack an overall measure of the multivariate model performance. Bayes Factor or DIC could be alternative model selection criteria to provide such information. However, due to their disadvantages, which will be briefly described below, we have discarded them in favor of MSE and Pearson's correlation. Bayes Factor is based on marginal likelihood, and therefore provides a measure of model goodness of fit. This criterion indicates whether the data increased or decreased the odds of model i relative to model j [14]. However, it depends on prior input, and this dependence does not decrease as sample size increases, unlike parameter's estimation based on posterior distributions [20]. In addition, BF does not indicate which hypothesis is the most probable, but it shows which hypothesis would make the sample more probable, if the hypothesis is true and not otherwise. Regarding DIC, it makes a compromise between goodness of fit and model complexity, and in some contexts, it can agree with measures of predictive ability. However, this is not always the case. Additionally, DIC is based on an approximation that may not be appropriate in the class of non-linear models considered here.
Goodness of fit
Goodness of fit criteria for standard (SMM) and recursive (RMM) mixed models
Comparison criteria | Model^{a,b} | ||||
---|---|---|---|---|---|
SMM | RMM1 | RMM2 | RMM3 | ||
GL | |||||
Mean squared error | 18.717 | 18.717 | 18.716 | 18.715 | |
Pearson's correlation | 0.465 | 0.465 | 0.465 | 0.465 | |
CD | |||||
Mean squared error | 0.788 | 0.791 | 0.791 | 0.791 | |
Pearson's correlation | 0.487 | 0.485 | 0.486 | 0.486 | |
SB | |||||
Mean squared error | 0.108 | 0.109 | 0.109 | 0.109 | |
Pearson's correlation | 0.246 | 0.243 | 0.244 | 0.243 |
The differences observed between the posterior mean liabilities to SB from SMM and those from RMM (see Figure 1) did not occur when the goodness of fit of these models was evaluated in terms of MSE and Pearson's correlation between predicted and observed SB score.
Predictive ability
Predictive ability of standard (SMM) and recursive mixed models from the analyses of cross-validation subsets
Comparison criteria | Model^{a,b} | ||||
---|---|---|---|---|---|
SMM | RMM1 | RMM2 | RMM3 | ||
GL | |||||
Average mean squared error | 19.559 | 19.559 | 19.558 | 19.558 | |
Pearson's correlation | 0.424 | 0.424 | 0.424 | 0.424 | |
CD | |||||
Average mean squared error | 0.824 | 0.823 | 0.824 | 0.823 | |
Pearson's correlation | 0.448 | 0.450 | 0.449 | 0.450 | |
SB | |||||
Average mean squared error | 0.111 | 0.111 | 0.111 | 0.111 | |
Pearson's correlation | 0.150 | 0.172 | 0.170 | 0.170 |
The negligible differences in terms of goodness of fit and predictive ability between models might be explained by the small differences in estimated genetic correlations between SMM (off diagonals of , and ) and RMM (off diagonals of ). The larger differences observed in correlations between contemporary groups for GL and liability to SB and between liabilities to CD and SB, as well as their counterparts between residual effects from SMM and RMM, were not reflected in goodness of fit and predictive ability. Thus, a very restrictive model (RMM3, with 26 parameters) provided similar fit and predictive ability as less parsimonious models.
Conclusions
This paper illustrates how SEM can be used to achieve parameterizations with different levels of complexity that represent different genetic models. For example, recursive relationships can be used to generate models in which the genetic parameters are themselves subject to genetic variation.
The applications examined in this study suggest the plausibility of a recursive effect from GL onto CD and SB. Also, as reported in previous studies, this relationship is not linear. The fact that the most restrictive model (RMM3), which assumes that the only cause of correlation is phenotypic recursion, performs as well as the others indicates that the recursion may be an important cause of the observed genetic and environmental correlations.
Declarations
Acknowledgements
The authors would like to acknowledge the National Association of Animal Breeders (Columbia, MO) for providing data for the present study, as well as for providing partial financial support for Dr. Kent Weigel. Research was also supported by the Wisconsin Agriculture Experiment Station and by grant NSF-DMS-044371. We thank an anonymous referee for helpful comments.
Authors’ Affiliations
References
- Wright S: The method of path coefficients. The Annals of Mathematical Statistics. 1934, 5 (3): 161-215. 10.1214/aoms/1177732676.View Article
- Gianola D, Sorensen D: Quantitative genetic models for describing simultaneous and recursive relationships between phenotypes. Genetics. 2004, 167: 1407-1424. 10.1534/genetics.103.025734.PubMed CentralView ArticlePubMed
- de los Campos G, Gianola D, Boettcher P, Moroni P: A structural equation model for describing relationships between somatic cell score and milk yield in dairy goats. J Anim Sci. 2006, 84: 2934-2941. 10.2527/jas.2006-016.View ArticlePubMed
- de los Campos G, Gianola D, Heringstad B: A structural equation model for describing relationships between somatic cell score and milk yield in first-lactation dairy cows. J Dairy Sci. 2006, 89: 4445-4455.View ArticlePubMed
- Varona L, Sorensen D, Thompson R: Analysis of litter size and average litter weight in pigs using a recursive model. Genetics. 2007, 177: 1791-1799. 10.1534/genetics.107.077818.PubMed CentralView ArticlePubMed
- López de Maturana E, Legarra A, Varona L, Ugarte E: Analysis of fertility and dystocia in Holsteins using recursive models to handle censored and categorical data. J Dairy Sci. 2007, 90: 2012-2024. 10.3168/jds.2005-442.View ArticlePubMed
- Wu X-L, Heringstad B, Chang YM, de los Campos G, Gianola D: Inferring relationships between somatic cell score and milk yield using simultaneous and recursive models. J Dairy Sci. 2007, 90: 3508-3521. 10.3168/jds.2006-762.View ArticlePubMed
- Hershberger SL, Marcoulides GA, Parramore MM: Structural equation modeling. Applications in ecological and evolutionary biology. 2003, Cambrigde, UK: The press sindicate of the University of Cambridge
- Bollen KA: Structural equations with latent variables. NewYork. 1989
- López de Maturana E, Wu X-L, Gianola D, Weigel KA, Rosa GJM: Exploring biological relationships between calving traits in primiparous cattle with a Bayesian recursive model. Genetics. 2009, 181: 277-287. 10.1534/genetics.108.094888.View Article
- Willham RL: The role of maternal effects in animal breeding: III. Biometrical aspects of maternal effects in animal breeding. J Anim Sci. 1972, 35: 1288-1292.PubMed
- Kriese LA, Bertrand JK, Benyshek LL: Age adjustment factors, heritabilities and genetic correlations for scrotal circumference and related growth traits in Hereford and Brangus bulls. J Anim Sci. 1991, 69: 478-489.PubMed
- Korsgaard IR, Andersen AH, Sorensen D: A useful reparameterisation to obtain samples from conditional inverse Wishart distributions. Genet Sel Evol. 1999, 31: 177-181. 10.1186/1297-9686-31-2-177.PubMed CentralView Article
- Sorensen DA, Gianola D: Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. 2002, Springer-Verlag New York, Inc., 175 Fifth Avenue, New YorkView Article
- Heringstad B, Chang YM, Svendsen M, Gianola D: Genetic analysis of calving difficulty and stillbirth in Norwegian Red cows. J Dairy Sci. 2007, 90: 3500-3507. 10.3168/jds.2006-792.View ArticlePubMed
- Hansen M, Lund MS, Pedersen J, Christensen LG: Gestation length in Danish Holsteins has weak genetic associations with stillbirth, calving difficulty, and calf size. Livest Prod Sci. 2004, 91: 23-33. 10.1016/j.livprodsci.2004.06.007.View Article
- Steinbock L, Näsholm A, Berglund B, Johansson K, Philipsson J: Genetic effects on stillbirth and calving difficulty in Swedish Holsteins at first and second calving. J Dairy Sci. 2003, 86: 2228-2235.View ArticlePubMed
- Wiggans GR, Misztal I, Van Tassell CP: Calving ease (co)variance conponents for a sire-maternal grandsire threshold model. J Dairy Sci. 2003, 86: 1845-1848.View ArticlePubMed
- Cole JB, Wiggans GR, VanRaden PM, Miller RH: Stillbirth (co)variance components for a sire-maternal grandsire threshold model and development of a calving ability index for sire selection. J Dairy Sci. 2007, 90: 2489-2496. 10.3168/jds.2006-436.View ArticlePubMed
- Berger JO, Pericchi LR: The Intrinsic Bayes Factor for Model Selection and Prediction. J Am Stat Assoc. 1996, 91: 109-122. 10.2307/2291387.View Article
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.