Model for fitting longitudinal traits subject to threshold response applied to genetic evaluation for heat tolerance
- Juan Pablo Sánchez^{1},
- Romdhane Rekaya^{2} and
- Ignacy Misztal^{2}
https://doi.org/10.1186/1297-9686-41-10
© Sánchez et al; licensee BioMed Central Ltd. 2009
Received: 17 December 2008
Accepted: 14 January 2009
Published: 14 January 2009
Abstract
A semi-parametric non-linear longitudinal hierarchical model is presented. The model assumes that individual variation exists both in the degree of the linear change of performance (slope) beyond a particular threshold of the independent variable scale and in the magnitude of the threshold itself; these individual variations are attributed to genetic and environmental components. During implementation via a Bayesian MCMC approach, threshold levels were sampled using a Metropolis step because their fully conditional posterior distributions do not have a closed form. The model was tested by simulation following designs similar to previous studies on genetics of heat stress. Posterior means of parameters of interest, under all simulation scenarios, were close to their true values, with the latter always being included in the uncertainty regions, indicating an absence of bias. The proposed models provide flexible tools for studying genotype by environment interaction as well as for fitting other longitudinal traits subject to abrupt changes in performance at particular points on the independent variable scale.
Introduction
Reaction norm models have been proposed as an alternative for fitting Genotype by Environment interactions (GxE) in evolutionary biology and animal breeding [1]. In reaction norm models, the environment is often described by a continuous variable, and the phenotypes are partially explained by the regression of the genotypic values on the environmental values. When an environmental variable is observed on a continuous scale (e.g., temperature), a direct one-to-one relationship between the environmental scale and the environmental values is expected. Consequently, the reaction norm model can be fitted by regressing the genotypic values on the observed environmental scale [2, 3]. When the observed environmental scale is not continuous (e.g., herd classes), the genotypic values can be regressed on the effect of the categorical variable defining the different environments using, for example, least squares means of the class effects [4] or inferring the environmental values jointly with the remaining set of parameters in the model [5].
In animal breeding applications of reaction norm models, it was assumed that both the mean and the variances are either continuous, monotone functions of the environmental values [4, 6] or that they are such only when the environmental values exceed a certain threshold [2, 7, 3]. In past studies involving thresholds, the same threshold was assumed for all animals, and it was estimated based on the quality of the fit of the average performances as a function of environmental values.
The objective of this study was to present a Bayesian hierarchical model for fitting a longitudinal trait showing an abrupt linear change at some value of the independent variable. Simulations were inspired by reaction norm models, and the procedure postulates that the environmental variable has no effect until it exceeds a certain unknown value that is specific to each individual with data. Furthermore, the model allows for partitioning individual variability in the threshold into genetic and environmental components.
Methods
Model and Prior specification
A general description of hierarchical Bayesian modelling can be found in [8]. Here the first stage of the hierarchy describes the data generating process, or the conditional distribution of the observed phenotypes given the model parameters. The following model was assumed:
y_{ijk} = CG_{k} + μ_{j} + ϕ_{j} × max{0, THI_{ij} - τ_{0,j}} + ε_{ijk},
where y_{ijk} is the i^{th} observation measured on animal j in contemporary group k (CG_{k}), and THI_{ij} is the temperature and humidity index [2, 7] associated with the i^{th} observation of animal j. The random variables μ_{j}, ϕ_{j} and τ_{0,j} associated with animal j represent, respectively, the intercept (the individual value in the absence of heat stress), the slope (the change in performance per unit of THI above the individual threshold) and the individual threshold itself. In this study, the heat load function [7] was defined in a way similar to that used in previous studies on the genetics of instantaneous heat stress on daily milk production [2]. Finally, ε_{ijk} is a random homoskedastic error term associated with each particular observation.
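The first-stage expectation can be illustrated with a short sketch. The function name `expected_record` and the numerical values are illustrative assumptions (the values are loosely based on the simulation settings reported later: overall mean 90, threshold 19, slope -0.5); the error term is omitted.

```python
def expected_record(cg, mu, phi, tau, thi):
    """Expected phenotype under the first-stage model (error term omitted)."""
    # The heat load is zero until THI exceeds the animal's own threshold tau.
    heat_load = max(0.0, thi - tau)
    return cg + mu + phi * heat_load


# Illustrative values only: below the threshold there is no decline,
# above it performance drops by |phi| per THI unit.
below = expected_record(cg=90.0, mu=0.0, phi=-0.5, tau=19.0, thi=17.0)
above = expected_record(cg=90.0, mu=0.0, phi=-0.5, tau=19.0, thi=23.0)
print(below, above)  # 90.0 88.0
```

Two animals with the same slope but different thresholds would therefore differ only in the THI value at which the decline starts.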
where U indicates the uniform distribution and K is the number of levels of the contemporary group effect.
where ${\beta}^{\prime}=\left({{\beta}^{\prime}}_{\mathit{\mu}},{{\beta}^{\prime}}_{\mathit{\phi}},{{\beta}^{\prime}}_{{\mathit{\tau}}_{0}}\right)$, ${a}^{\prime}=\left({{a}^{\prime}}_{\mathit{\mu}},{{a}^{\prime}}_{\mathit{\phi}},{{a}^{\prime}}_{{\mathit{\tau}}_{0}}\right)$, and μ, ϕ and τ_{0} are vectors containing the scalar parameters of the individuals (μ_{j}, ϕ_{j} and τ_{0,j}).
Parameters of a given individual were considered to be conditionally independent and affected at their mean level by systematic (β_{ μ }, β_{ ϕ }and ${\beta}_{{\tau}_{0}}$) and genetic effects (a_{ μ }, a_{ ϕ }and ${a}_{{\tau}_{0}}$); the residual (co)variance matrix between underlying variables was R_{0}, which is equivalent to a (co)variance matrix between permanent environmental effects on the observed measures scale.
where G_{0} is the (co)variance matrix between the additive genetic effects for the underlying variables. The residual (co)variance matrix was assumed to follow a uniform distribution.
In the fourth and last hierarchical stage, a prior distribution was assigned to the genetic (co)variance matrix for the underlying variables. A uniform distribution was assumed as in the case of the residual (co)variance matrix.
Fully conditional posterior distributions
The fully conditional posterior distributions must be obtained in order to perform a Bayesian MCMC estimation procedure using the Gibbs sampler algorithm. After defining the joint posterior distribution as the product of the conditional likelihood and all the prior distributions [8], the terms involving the parameter of interest in the joint posterior distribution were retained. For the model described, all the fully conditional posterior distributions are exactly the same as those described for a hierarchical model assuming intercept and linear terms [10], except those involving the individual thresholds. For all the position parameters, both in the first and second hierarchical stages, the fully conditional posterior densities were proportional to normal distributions; the fully conditional distribution for the residual variance in the first stage was a scaled inverted chi-squared distribution, and those for the genetic and residual (co)variance matrices in the third and second stages, respectively, were inverted Wishart distributions.
The first term comes from the likelihood; J refers to the subset of records belonging to animal j. The second term comes from the prior (second hierarchical stage); note that the relationships between animal j and the other individuals in the population are taken into account through the given values of the additive genetic effects. In this second factor, the scalars r^{i, j} refer to the relevant elements of the inverse of R_{0}, which is the residual (co)variance matrix in the second hierarchical stage. This fully conditional posterior distribution does not have a known closed form; thus a Metropolis step [11] was used to sample from it.
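A random-walk Metropolis update for one animal's threshold might look like the sketch below. The target density is an assumption assembled from the verbal description above (a normal likelihood over the animal's records times a trivariate normal kernel using the inverse of R_{0}); the exact expression is not reproduced here. All names (`log_conditional`, `metropolis_update`, `mean`, `r_inv`) and the numerical values are hypothetical.

```python
import math
import random


def log_conditional(tau, mu, phi, records, sigma2_e, mean, r_inv):
    """Unnormalised log fully conditional density of one animal's threshold.

    records: (y, cg, thi) triples for the animal (the subset J in the text).
    mean:    assumed second-stage expectations of (mu, phi, tau), i.e.
             systematic plus additive genetic effects, treated as given.
    r_inv:   3x3 inverse of R0 as nested lists.
    """
    # Likelihood term: residual sum of squares of the animal's records.
    sse = 0.0
    for y, cg, thi in records:
        fitted = cg + mu + phi * max(0.0, thi - tau)
        sse += (y - fitted) ** 2
    # Prior term: quadratic form of the second-stage deviations.
    dev = (mu - mean[0], phi - mean[1], tau - mean[2])
    quad = sum(dev[a] * r_inv[a][b] * dev[b] for a in range(3) for b in range(3))
    return -0.5 * sse / sigma2_e - 0.5 * quad


def metropolis_update(tau, proposal_sd, rng, **target_args):
    """One Metropolis step with a normal proposal centred on the current value."""
    candidate = rng.gauss(tau, proposal_sd)
    log_ratio = (log_conditional(candidate, **target_args)
                 - log_conditional(tau, **target_args))
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return candidate, True
    return tau, False


# Toy run on 20 simulated records of a single animal (illustrative values).
rng = random.Random(41)
records = []
for _ in range(20):
    thi = rng.gauss(18.0, math.sqrt(10.0))
    y = 90.0 - 0.5 * max(0.0, thi - 19.0) + rng.gauss(0.0, math.sqrt(10.0))
    records.append((y, 90.0, thi))
target_args = dict(mu=0.0, phi=-0.5, records=records, sigma2_e=10.0,
                   mean=(0.0, -0.5, 19.0),
                   r_inv=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
tau, accepted = 19.0, 0
for _ in range(2000):
    tau, moved = metropolis_update(tau, proposal_sd=1.0, rng=rng, **target_args)
    accepted += moved
print("acceptance rate:", accepted / 2000)
```

Because the proposal is symmetric, the acceptance ratio involves only the target density, which is why the normalising constant of the conditional distribution is never needed.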
In this expression, ${\widehat{a}}_{\mu ,j}^{\left[r\right]},{\widehat{a}}_{\phi ,j}^{\left[r\right]}$ and ${\widehat{a}}_{{\tau}_{0},j}^{\left[r\right]}$ are the sampled values of the additive genetic effects of animal j during the r^{th} iteration; ${\tilde{e}}_{\mu ,i},{\tilde{e}}_{\phi ,i}$ and ${\tilde{e}}_{{\tau}_{0},i}$ are random deviates sampled from $MVN\left(0,{\widehat{R}}_{0}^{\left[r\right]}\right)$, where ${\widehat{R}}_{0}^{\left[r\right]}$ is the sampled value of the residual (co)variance matrix in the second hierarchical stage; ${\widehat{\overline{\tau}}}_{0}^{\left[r\right]}$ and ${\widehat{\overline{\phi}}}^{\left[r\right]}$ are the sampled values of the overall means for the threshold level and the slope. They were computed during the r^{th} iteration by applying the appropriate vectors of linear contrasts to the sampled vectors of systematic effects, ${\widehat{\beta}}_{{\tau}_{0}}^{\left[r\right]}$ and ${\widehat{\beta}}_{\phi}^{\left[r\right]}$. Finally, in the equation of the overall phenotypic variance, ${\widehat{\sigma}}_{\epsilon}^{2\left[r\right]}$ is the sampled value of the residual variance in the first hierarchical stage. We used the aggregated phenotypes (e.g., ${\widehat{\overline{\phi}}}^{\left[r\right]}+{\widehat{a}}_{\phi ,j}^{\left[r\right]}+{\tilde{e}}_{\phi ,j}$) instead of the sampled values ${\mu}_{j}^{\left[r\right]},{\phi}_{j}^{\left[r\right]}$, and ${\tau}_{0,j}^{\left[r\right]}$ to avoid the variation due to systematic effects in the second hierarchical stage.
where ${\widehat{\overline{\tau}}}_{0}^{\left[r\right]},{\widehat{\overline{\phi}}}^{\left[r\right]},{\widehat{a}}_{\mu ,i}^{\left[r\right]},{\widehat{a}}_{\phi ,i}^{\left[r\right]}$ and ${\widehat{a}}_{{\tau}_{0},i}^{\left[r\right]}$ have the same meaning as those previously described in the equation for ${\widehat{p}}_{ij}^{\left[r\right]}$. Note that non-zero expected values are considered in the equations for computing both phenotypic and genetic variances; the derived random variables, ${\widehat{u}}^{\left[r\right]}$ and ${\widehat{p}}^{\left[r\right]}$, are non-linear functions of random correlated variables, thus their expected values are non-zero [12]. Also note that the relationships between records were not considered when computing the phenotypic variance due to complexity.
Based on these computed variance components, relevant genetic parameters and other genetic quantities can be easily defined for different environments (THI values). For example, heritability or expected genetic response to a selection index could be defined for different environmental values [13].
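As an illustration of how such THI-specific parameters could be derived within one MCMC iteration, the sketch below computes a heritability at a given THI. The formulas are an assumed reading of the description above (genetic value built from the genetic effects only, aggregated phenotype built from genetic effects plus simulated environmental deviates, variances taken around their non-zero means); the function name `heritability_at_thi` and all inputs are hypothetical.

```python
import statistics


def heritability_at_thi(thi, a_mu, a_phi, a_tau, e_mu, e_phi, e_tau,
                        phi_bar, tau_bar, sigma2_e):
    """Heritability on the observed scale at one THI value, for one iteration.

    a_*: sampled additive genetic effects per animal (lists of equal length).
    e_*: simulated second-stage environmental deviates per animal.
    phi_bar, tau_bar: sampled overall means of slope and threshold.
    """
    genetic, phenotypic = [], []
    for j in range(len(a_mu)):
        # Genetic value: only the genetic effects enter the broken-stick function.
        genetic.append(
            a_mu[j] + (phi_bar + a_phi[j]) * max(0.0, thi - (tau_bar + a_tau[j])))
        # Aggregated phenotype: genetic plus environmental deviations.
        phenotypic.append(
            a_mu[j] + e_mu[j]
            + (phi_bar + a_phi[j] + e_phi[j])
            * max(0.0, thi - (tau_bar + a_tau[j] + e_tau[j])))
    # pvariance centres on the sample mean, so non-zero expectations are handled.
    var_g = statistics.pvariance(genetic)
    var_p = statistics.pvariance(phenotypic) + sigma2_e
    return var_g / var_p


# Toy inputs for two animals (not estimates from the study).
zeros = [0.0, 0.0]
h2_cool = heritability_at_thi(10.0, [-1.0, 1.0], [-0.5, 0.5], zeros,
                              zeros, zeros, zeros, -0.5, 19.0, 1.0)
h2_hot = heritability_at_thi(21.0, [-1.0, 1.0], [-0.5, 0.5], zeros,
                             zeros, zeros, zeros, -0.5, 19.0, 1.0)
print(h2_cool, h2_hot)
```

Repeating the calculation over a grid of THI values and over iterations would give the posterior distribution of heritability as a function of the environment.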
Data
Simulated data sets were used to investigate the performance of the Bayesian implementation of the model described above.
Different combinations of heritabilities and correlations for the underlying variables were investigated: low (0.1), medium (0.2) and high (0.5) heritabilities; and low (0.2, 0.3) and high (0.7, 0.9) correlations, in absolute value. In addition, two different data set designs were considered, approximately 20 (S20) and 10 (S10) records per animal. Thus, 12 different scenarios were investigated, and for each one ten replications were run.
For both data size scenarios the same genetic structure was considered but with different sizes. For S20, 40 males and 200 females were generated in the first generation, and in the second generation each sire was mated to five females, producing four full sibs from each mating. Thus, the entire population consisted of 1,040 animals. For S10, 80 males and 400 females were generated in the first generation, and in the second generation each sire was again mated to five females, producing four full sibs from each mating. In this case the entire population consisted of 2,080 animals. This genetic structure resembles populations of prolific species such as swine or rabbits.
For both data structures 21,500 records were generated according to the described model and assigned to the total number of animals in the population. For generating records only an overall mean (with a value of 90) was considered in the first hierarchical stage as the CG effect, and overall means for the threshold (19) and for the slope (-0.5) were the only considered systematic effects in the second hierarchical stage. THI values were generated by sampling from a Normal distribution with mean 18.0 and variance 10.0, resembling the distribution of THI values in a temperate climate.
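The population sizes and the approximate number of records per animal follow directly from the design; the sketch below checks the arithmetic and draws THI values as described. The function name `population_size` and the random seed are illustrative assumptions.

```python
import random


def population_size(n_sires, dams_per_sire=5, sibs_per_mating=4):
    """Founders plus offspring when every sire is mated to dams_per_sire dams."""
    n_dams = n_sires * dams_per_sire
    n_offspring = n_dams * sibs_per_mating
    return n_sires + n_dams + n_offspring


N_RECORDS = 21_500
for label, sires in (("S20", 40), ("S10", 80)):
    n_animals = population_size(sires)
    print(label, n_animals, round(N_RECORDS / n_animals, 1))
# S20 1040 20.7
# S10 2080 10.3

# THI values: normal with mean 18.0 and variance 10.0 (so sd = sqrt(10)).
rng = random.Random(2009)
thi_values = [rng.gauss(18.0, 10.0 ** 0.5) for _ in range(N_RECORDS)]
share_above = sum(t > 19.0 for t in thi_values) / N_RECORDS
print("share of records above the mean threshold of 19:", round(share_above, 2))
```

With a mean threshold of 19 and a mean THI of 18, well under half of the records carry information on the slope and threshold, which helps explain why these parameters are harder to estimate than the intercept.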
Gibbs Sampler implementation
For each replication, a Gibbs Sampler algorithm was run for 100,000 rounds, of which the first 10,000 were discarded as a burn-in period; afterwards, one in every ten rounds was retained. The threshold level was sampled via a Metropolis step using a proposal density that was normally distributed and centered on the previous value of the threshold. The variance of the proposal density was constant across animals. During the burn-in period, the variance of the proposal was tuned to give an average acceptance rate of around 0.5 under all the scenarios. In a post-Gibbs analysis, the convergence of the chains was assessed both by visual inspection of the trace plots for the most relevant parameters and through the Geweke test [14]; in addition, the effective sample size (ESS) was computed using the function effectiveSize() from the coda package in R [15].
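The chain settings imply 9,000 retained samples per replication, which is the upper bound for the ESS. The sketch below computes this count and a simple ESS estimate. Note that the estimator shown is an autocorrelation-sum approximation truncated at the first non-positive lag; it is a swapped-in illustration, not the spectral-density method used by coda's effectiveSize(). The function names are hypothetical.

```python
import random


def retained_samples(n_rounds=100_000, burn_in=10_000, thin=10):
    """Number of samples kept after burn-in and thinning."""
    return (n_rounds - burn_in) // thin


def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum stops at the first non-positive autocorrelation estimate.
    """
    n = len(chain)
    mean = sum(chain) / n
    centred = [x - mean for x in chain]
    var = sum(c * c for c in centred) / n
    if var == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centred[t] * centred[t + lag] for t in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


print(retained_samples())  # 9000

# A strongly autocorrelated toy chain (AR(1) with coefficient 0.9).
rng = random.Random(7)
chain, x = [], 0.0
for _ in range(2000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    chain.append(x)
print("ESS of toy chain:", round(effective_sample_size(chain)))
```

A low ESS relative to the number of retained samples signals slow mixing, so the corresponding posterior summaries carry more Monte Carlo error.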
Results
Table 1. Parameter estimates for 6 parameter scenarios when 20 records were considered per animal (averages over 10 replications)

Scenarios 1 to 3

| Parameter | 1: True | 1: PM^{a} | 1: PSD^{b} | 1: ESS^{c} | 2: True | 2: PM | 2: PSD | 2: ESS | 3: True | 3: PM | 3: PSD | 3: ESS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| μ_{T} | 19 | 18.96 | 0.15 | 352 | 19 | 19.10 | 0.16 | 416 | 19 | 19.08 | 0.16 | 399 |
| h^{2}_{I} | 0.5 | 0.52 | 0.06 | 2110 | 0.2 | 0.20 | 0.05 | 583 | 0.1 | 0.14 | 0.05 | 318 |
| h^{2}_{S} | 0.5 | 0.56 | 0.08 | 617 | 0.2 | 0.23 | 0.07 | 392 | 0.1 | 0.12 | 0.06 | 112 |
| h^{2}_{T} | 0.5 | 0.48 | 0.18 | 91 | 0.2 | 0.36 | 0.15 | 98 | 0.1 | 0.37 | 0.16 | 95 |
| ρ_{g, I-S} | 0.3 | 0.26 | 0.11 | 818 | 0.3 | 0.30 | 0.23 | 301 | 0.3 | 0.54 | 0.27 | 75 |
| ρ_{g, I-T} | -0.2 | -0.21 | 0.24 | 159 | -0.2 | -0.23 | 0.36 | 63 | -0.2 | -0.06 | 0.36 | 61 |
| ρ_{g, S-T} | -0.2 | -0.31 | 0.23 | 141 | -0.2 | -0.19 | 0.33 | 99 | -0.2 | 0.02 | 0.39 | 83 |
| ρ_{p, I-S} | 0.3 | 0.35 | 0.09 | 768 | 0.3 | 0.31 | 0.06 | 601 | 0.3 | 0.30 | 0.05 | 507 |
| ρ_{p, I-T} | -0.2 | -0.23 | 0.23 | 129 | -0.2 | -0.22 | 0.16 | 209 | -0.2 | -0.29 | 0.14 | 217 |
| ρ_{p, S-T} | -0.2 | -0.15 | 0.23 | 109 | -0.2 | -0.21 | 0.14 | 214 | -0.2 | -0.25 | 0.12 | 203 |
| σ^{2}_{e} | 10 | 9.97 | 0.10 | 8142 | 10 | 9.98 | 0.10 | 9000 | 10 | 9.99 | 0.10 | 9000 |

Scenarios 4 to 6

| Parameter | 4: True | 4: PM | 4: PSD | 4: ESS | 5: True | 5: PM | 5: PSD | 5: ESS | 6: True | 6: PM | 6: PSD | 6: ESS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| μ_{T} | 19 | 18.93 | 0.15 | 79 | 19 | 18.98 | 0.17 | 50 | 19 | 19.07 | 0.15 | 38 |
| h^{2}_{I} | 0.5 | 0.50 | 0.06 | 1411 | 0.2 | 0.20 | 0.05 | 489 | 0.1 | 0.11 | 0.04 | 177 |
| h^{2}_{S} | 0.5 | 0.52 | 0.07 | 433 | 0.2 | 0.22 | 0.06 | 297 | 0.1 | 0.15 | 0.06 | 110 |
| h^{2}_{T} | 0.5 | 0.47 | 0.11 | 61 | 0.2 | 0.33 | 0.12 | 48 | 0.1 | 0.31 | 0.10 | 55 |
| ρ_{g, I-S} | 0.7 | 0.68 | 0.07 | 330 | 0.7 | 0.68 | 0.16 | 103 | 0.7 | 0.67 | 0.21 | 76 |
| ρ_{g, I-T} | -0.7 | -0.68 | 0.12 | 51 | -0.7 | -0.56 | 0.24 | 29 | -0.7 | -0.44 | 0.31 | 33 |
| ρ_{g, S-T} | -0.9 | -0.88 | 0.06 | 48 | -0.9 | -0.72 | 0.15 | 61 | -0.9 | -0.72 | 0.18 | 54 |
| ρ_{p, I-S} | 0.7 | 0.74 | 0.05 | 245 | 0.7 | 0.69 | 0.04 | 218 | 0.7 | 0.72 | 0.03 | 212 |
| ρ_{p, I-T} | -0.7 | -0.64 | 0.13 | 48 | -0.7 | -0.72 | 0.09 | 39 | -0.7 | -0.79 | 0.08 | 30 |
| ρ_{p, S-T} | -0.9 | -0.87 | 0.07 | 65 | -0.9 | -0.92 | 0.05 | 57 | -0.9 | -0.92 | 0.04 | 59 |
| σ^{2}_{e} | 10 | 9.99 | 0.10 | 4018 | 10 | 9.97 | 0.10 | 5860 | 10 | 9.95 | 0.10 | 6458 |

^{a} PM: posterior mean; ^{b} PSD: posterior standard deviation; ^{c} ESS: effective sample size.
Table 2. Parameter estimates for 6 parameter scenarios when 10 records were considered per animal (averages over 10 replications)

Scenarios 1 to 3

| Parameter | 1: True | 1: PM^{a} | 1: PSD^{b} | 1: ESS^{c} | 2: True | 2: PM | 2: PSD | 2: ESS | 3: True | 3: PM | 3: PSD | 3: ESS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| μ_{T} | 19 | 19.04 | 0.20 | 54 | 19 | 19.08 | 0.15 | 181 | 19 | 19.15 | 0.15 | 174 |
| h^{2}_{I} | 0.5 | 0.51 | 0.04 | 1683 | 0.2 | 0.19 | 0.04 | 675 | 0.1 | 0.11 | 0.03 | 272 |
| h^{2}_{S} | 0.5 | 0.55 | 0.07 | 188 | 0.2 | 0.26 | 0.07 | 286 | 0.1 | 0.11 | 0.05 | 79 |
| h^{2}_{T} | 0.5 | 0.61 | 0.17 | 36 | 0.2 | 0.43 | 0.17 | 58 | 0.1 | 0.38 | 0.15 | 60 |
| ρ_{g, I-S} | 0.3 | 0.26 | 0.09 | 270 | 0.3 | 0.29 | 0.18 | 310 | 0.3 | 0.37 | 0.32 | 51 |
| ρ_{g, I-T} | -0.2 | -0.04 | 0.21 | 68 | -0.2 | -0.03 | 0.34 | 57 | -0.2 | -0.03 | 0.43 | 28 |
| ρ_{g, S-T} | -0.2 | -0.30 | 0.18 | 77 | -0.2 | -0.34 | 0.25 | 82 | -0.2 | -0.51 | 0.28 | 57 |
| ρ_{p, I-S} | 0.3 | 0.38 | 0.08 | 264 | 0.3 | 0.30 | 0.05 | 526 | 0.3 | 0.29 | 0.05 | 289 |
| ρ_{p, I-T} | -0.2 | -0.38 | 0.26 | 37 | -0.2 | -0.31 | 0.18 | 122 | -0.2 | -0.31 | 0.15 | 120 |
| ρ_{p, S-T} | -0.2 | 0.00 | 0.27 | 51 | -0.2 | -0.17 | 0.15 | 139 | -0.2 | -0.15 | 0.11 | 175 |
| σ^{2}_{e} | 10 | 10.07 | 0.11 | 3124 | 10 | 9.94 | 0.11 | 7308 | 10 | 9.99 | 0.11 | 6390 |

Scenarios 4 to 6

| Parameter | 4: True | 4: PM | 4: PSD | 4: ESS | 5: True | 5: PM | 5: PSD | 5: ESS | 6: True | 6: PM | 6: PSD | 6: ESS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| μ_{T} | 19 | 18.98 | 0.22 | 28 | 19 | 19.10 | 0.25 | 17 | 19 | 19.12 | 0.53 | 8 |
| h^{2}_{I} | 0.5 | 0.49 | 0.04 | 795 | 0.2 | 0.22 | 0.04 | 337 | 0.1 | 0.10 | 0.03 | 136 |
| h^{2}_{S} | 0.5 | 0.54 | 0.06 | 141 | 0.2 | 0.22 | 0.05 | 82 | 0.1 | 0.12 | 0.05 | 63 |
| h^{2}_{T} | 0.5 | 0.52 | 0.11 | 25 | 0.2 | 0.37 | 0.11 | 26 | 0.1 | 0.36 | 0.14 | 9 |
| ρ_{g, I-S} | 0.7 | 0.67 | 0.08 | 109 | 0.7 | 0.67 | 0.11 | 55 | 0.7 | 0.70 | 0.19 | 23 |
| ρ_{g, I-T} | -0.7 | -0.63 | 0.13 | 17 | -0.7 | -0.39 | 0.25 | 16 | -0.7 | -0.18 | 0.31 | 20 |
| ρ_{g, S-T} | -0.9 | -0.85 | 0.09 | 12 | -0.9 | -0.70 | 0.16 | 20 | -0.9 | -0.65 | 0.20 | 25 |
| ρ_{p, I-S} | 0.7 | 0.74 | 0.05 | 113 | 0.7 | 0.72 | 0.03 | 72 | 0.7 | 0.72 | 0.03 | 57 |
| ρ_{p, I-T} | -0.7 | -0.74 | 0.12 | 19 | -0.7 | -0.82 | 0.08 | 14 | -0.7 | -0.82 | 0.07 | 12 |
| ρ_{p, S-T} | -0.9 | -0.89 | 0.07 | 31 | -0.9 | -0.91 | 0.05 | 25 | -0.9 | -0.95 | 0.03 | 14 |
| σ^{2}_{e} | 10 | 10.02 | 0.11 | 1949 | 10 | 9.95 | 0.11 | 2623 | 10 | 10.01 | 0.11 | 1624 |

^{a} PM: posterior mean; ^{b} PSD: posterior standard deviation; ^{c} ESS: effective sample size.
As expected, inference efficiency, measured as the average marginal posterior standard deviation across the parameters in Tables 1 and 2 (except the residual variance), decreased as the correlations between the underlying variables decreased. In contrast, algorithm efficiency, measured as the average ESS across the parameters in Tables 1 and 2 (except the residual variance), decreased as the correlations increased. In both correlation scenarios, increasing heritability increased inference efficiency for the genetic correlations but reduced it for the heritabilities and environmental correlations. In general, the average algorithm efficiency increased with heritability, although some exceptions can be found, particularly under data structure S10.
Table 3. Pearson correlations between predicted and true breeding values in the 12 investigated scenarios (averages across replications)

| Records per animal | Variable | 1^{a} | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 20 | Intercept | 0.79 | 0.57 | 0.78 | 0.57 | 0.47 | 0.44 |
| 20 | Slope | 0.71 | 0.51 | 0.71 | 0.50 | 0.43 | 0.37 |
| 20 | Threshold | 0.35 | 0.25 | 0.65 | 0.37 | 0.17 | 0.26 |
| 10 | Intercept | 0.77 | 0.59 | 0.77 | 0.58 | 0.46 | 0.44 |
| 10 | Slope | 0.63 | 0.47 | 0.66 | 0.47 | 0.32 | 0.36 |
| 10 | Threshold | 0.24 | 0.16 | 0.57 | 0.34 | 0.12 | 0.17 |
Table 4. Pearson correlations between predicted and true underlying variables in the 12 investigated scenarios (averages across replications)

| Records per animal | Variable | 1^{a} | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 20 | Intercept | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 20 | Slope | 0.85 | 0.84 | 0.89 | 0.87 | 0.84 | 0.87 |
| 20 | Threshold | 0.42 | 0.42 | 0.81 | 0.80 | 0.41 | 0.80 |
| 10 | Intercept | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
| 10 | Slope | 0.75 | 0.74 | 0.82 | 0.82 | 0.73 | 0.82 |
| 10 | Threshold | 0.30 | 0.31 | 0.75 | 0.74 | 0.30 | 0.74 |
Discussion
The model presented in this study provides greater flexibility than traditional reaction norm models when the environmental variable is known, as it allows a semi-parametric form for the reaction norm function. The model is semi-parametric in the sense that the point at which the linear change is assumed to start is defined by the data themselves. The forms of the functions before and after this point are defined parametrically a priori, i.e., constant before the change point and linear afterwards. To increase flexibility, higher order polynomials or spline functions could be fitted within each of these two periods, with the advantage that within each period the functions would remain linear in the parameters. The presented inferential procedure showed no evidence of bias, since the uncertainty regions always covered the true values of the parameters.
Several alternative algorithms have been proposed for non-parametric or semi-parametric curve fitting. One of them is a Reversible Jump MCMC algorithm in which the optimal number of change points (parameters in the model) is estimated [16]. The model presented in this study is a simplified version of this semi-parametric procedure, as the number of parameters is fixed a priori. However, that study focuses on fitting averages along the trajectory of the independent variable, whereas in our case individual sources of variation are fitted along this trajectory. For this purpose, and from a computational point of view, the proposed hierarchical structure is particularly suitable, since the dimension of the problem becomes greater than when fitting changes in the mean. With this hierarchical structure, updating the mixed model equations in each round of the Gibbs Sampler can be avoided; only the right-hand side needs to be modified. In addition, this hierarchical structure, together with the Bayesian estimation procedure, allows for a more appropriate prior assumption that takes advantage of the family structure in the population. Other general procedures for finding change points in continuous functions are the so-called change point techniques. These approaches were previously used in animal breeding to find points of change when fitting heterogeneous residual variances in the analysis of test-day milk records [17]. They provide greater flexibility than the model presented here because they allow for non-linear functions within each of the defined regions. However, these techniques are more complex because of the non-linearity, and because the values of two successive functions at a change point need to be constrained explicitly to be identical. Our parametrization can be considered a truncated power representation of a linear spline [18], in which case the aforementioned constraints are implicitly satisfied [19].
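The implicit continuity constraint can be seen by writing the individual mean function of the first-stage model in truncated power form. The following is a sketch in the notation of the model equation, where the symbols f_j and (x)_+ are introduced here only for illustration:

```latex
\[
f_j(\mathrm{THI}) = \mu_j + \phi_j\,(\mathrm{THI} - \tau_{0,j})_{+},
\qquad (x)_{+} = \max\{0, x\},
\]
\[
\lim_{\mathrm{THI}\,\uparrow\,\tau_{0,j}} f_j(\mathrm{THI})
= \mu_j
= \lim_{\mathrm{THI}\,\downarrow\,\tau_{0,j}} f_j(\mathrm{THI}).
\]
```

Because the truncated term vanishes at the knot, the constant and linear segments meet at τ_{0,j} for any values of μ_j and ϕ_j, so no explicit equality constraint is required.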
Like other previously proposed reaction norm models [2, 7, 3], the described model could be used for studies and evaluations of genetic tolerance to heat. The model allows the identification not only of those individuals in the population that are less sensitive to temperature changes beyond a particular threshold, but also of those that become heat stressed at higher values of temperature or THI. This individual variation can be partitioned into environmental and genetic components, both for the threshold and for the intensity of the sensitivity to heat stress. This makes it possible to identify genetically superior individuals for a particular underlying variable of interest: intercept, slope, threshold, or some index involving these variables.
The load function used in this study is the same as that used for fitting the effect of instantaneous THI on milk production [2]. However, it is relatively straightforward to consider more complex functions, for example, those used for studying the cumulative effect of THI on carcass weight in pigs [7, 3].
In the described model, the covariate (THI) is assumed to be known; however, a traditional reaction norm model could be fitted by predicting an unobserved environmental covariate from the contemporary groups. This extension can be implemented either in two steps as in Kolmodin et al. [4] or more complexly as in Su et al. [5] by integrating out all the possible values of the contemporary group effects. In these models with unknown covariates, it could be equally reasonable to assume that no effect is observed on the phenotypic performance until some threshold in the environmental scale is reached, beyond which some kind of change in the performance could be expected.
The presented model was applied to study variability on the onset of heat stress tolerance on milk production in dairy cattle. In this study the population size was around 90,000 animals and over 300,000 test-day records were considered. For this data set 250,000 Gibbs iterations took approximately 5.0 CPU days.
Although the methodology presented has been illustrated by focusing on the genetics of heat stress tolerance, more applications could be considered. In particular those longitudinal traits showing a threshold response, i.e., those traits with an abrupt change in the response beyond some point on the explanatory variable scale could be fitted using the model presented.
Conclusion
A model for fitting traits in which the response to an environmental variable is subject to an abrupt linear change was presented. The described statistical procedure performed satisfactorily under the simulated scenarios in estimating the model parameters. As an application example, the model could be useful for identifying animals with higher adaptation to environmental changes, to heat in particular. These animals will be characterized by a smaller phenotypic decline in the performance as well as a later onset of environmental stress. In addition, the proposed methodology can attribute the individual variation on these two expressions of tolerance to environmental stress to genetic and systematic components, which would be useful for the detection of genetically superior breeding animals to be used in selection.
Declarations
Acknowledgements
The authors thank Dr Andrés Legarra, Dr Kelly Robbins and Prof Manuel Baselga for their useful comments and suggestions on the early versions of the manuscript. Suggestions from two referees are also greatly appreciated; their comments improved the study design and the manuscript. The study was partially carried out during a postdoctoral stay of Juan Pablo Sánchez in the Animal and Dairy Science Department of the University of Georgia, US.
References
- De Jong G: Phenotypic plasticity as a product of selection in a variable environment. Am Nat. 1995, 145: 493-512. 10.1086/285752.
- Ravagnolo O, Misztal I: Genetic component of heat stress in dairy cattle, parameter estimation. J Dairy Sci. 2000, 83: 2126-2130.
- Zumbach B, Misztal I, Tsuruta S, Sanchez JP, Azain M, Herring W, Holl J, Long T, Culbertson M: Genetic component of heat stress in finishing pigs, Parameter estimation. J Anim Sci. 2008, 86: 2076-2081. 10.2527/jas.2007-0282.
- Kolmodin R, Strandberg E, Madsen P, Jensen J, Jorjani H: Genotype by environmental interaction in nordic dairy cattle studied using reaction norms. Acta Agric Scand. 2002, 52: 11-24.
- Su G, Madsen P, Lund MS, Sorensen D, Korsgaard IR, Jensen J: Bayesian analysis of the linear reaction norm model with unknown covariates. J Anim Sci. 2006, 84: 1651-1657. 10.2527/jas.2005-517.
- Kolmodin R, Strandberg E, Jorjani H, Danell B: Selection in the presence of genotype by enviromental interaction: response in environmental sensitivity. Anim Sci. 2003, 76: 375-385.
- Zumbach B, Misztal I, Tsuruta S, Sanchez JP, Azain M, Herring W, Holl J, Long T, Culbertson M: Genetic component of heat stress in finishing pigs, development of a heat load function. J Anim Sci. 2008, 86: 2082-2088. 10.2527/jas.2007-0523.
- Sorensen D, Gianola D: Likelihood, Bayesian and MCMC Methods in Quantitative Genetics. 2002, New York, USA: Springer-Verlag
- Bulmer MG: The mathematical theory of quantitative genetics. 1980, Oxford: Clarendon Press
- Varona L, Moreno C, García Cortés LA, Altarriba J: Multiple trait genetic analysis of underlying biological variables of production functions. Livest Prod Sci. 1997, 47: 201-209. 10.1016/S0301-6226(96)01415-7.
- Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E: Equations of state calculations by fast computing machines. J Chem Phys. 1953, 21: 1087-1092. 10.1063/1.1699114.
- Lynch M, Walsh B: Genetics and analysis of quantitative traits. 1998, Sunderland, MA, USA: Sinauer Associates, 1st edition
- Kolmodin R, Bijma P: Response to mass selection when the genotype by environment interaction is modelled as a linear reaction norm. Genet Sel Evol. 2004, 36: 435-454. 10.1051/gse:2004010.
- Geweke J: Evaluating the accuracy of sampling-based approaches to calculating posterior moments. Bayesian Statistics 4. Edited by: Bernardo JM, Berger JO, Dawid AP, Smith AFM. 1992, Oxford: Oxford University Press
- Plummer M, Best N, Cowles K, Vines K: CODA: Output analysis and diagnostics for MCMC. R package version 0.13-2. 2008
- Denison DGT, Mallick BK, Smith AFM: Automatic Bayesian curve fitting. J R Statist Soc B. 1998, 60 (Part 2): 333-350. 10.1111/1467-9868.00128.
- Rekaya R, Carabaño MJ, Toro MA: Assessment of heterogeneity of residual variances using changepoint techniques. Genet Sel Evol. 2000, 32: 383-394. 10.1051/gse:2000125.
- Meyer K: Random Regression analyses using B-splines to model growth of Australian Angus cattle. Genet Sel Evol. 2005, 37: 437-500. 10.1051/gse:2005010.
- Gallant AR, Fuller WA: Fitting segmented polynomial regression models whose joint points have to be estimated. J Am Stat Assoc. 1973, 68: 144-147. 10.2307/2284158.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.