Genotype by environment interaction for tick resistance of Hereford and Braford beef cattle using reaction norm models

Background The cattle tick is a parasite that adversely affects livestock performance in tropical areas. Although countries such as Australia and Brazil have developed genetic evaluations for tick resistance, these evaluations have not considered genotype by environment (G*E) interactions. Genetic gains could be adversely affected, since breedstock comparisons are environmentally dependent on the presence of G*E interactions, particularly if residual variability is also heterogeneous across environments. The objective of this study was to infer upon the existence of G*E interactions for tick resistance of cattle based on various models with different assumptions of genetic and residual variability. Methods Data were collected by the Delta G Connection Improvement program and included 10,673 records of tick counts on 4363 animals. Twelve models, including three traditional animal models (AM) and nine different hierarchical Bayesian reaction norm models (HBRNM), were investigated. One-step models that jointly estimate environmental covariates and reaction norms and two-step models based on previously estimated environmental covariates were used to infer upon G*E interactions. Model choice was based on the deviance criterion information. Results The best-fitting model specified heterogeneous residual variances across 10 subclasses that were bounded by every decile of the contemporary group (CG) estimates of tick count effects. One-step models generally had the highest estimated genetic variances. Heritability estimates were normally higher for HBRNM than for AM. One-step models based on heterogeneous residual variances also usually led to higher heritability estimates. Estimates of repeatability varied along the environmental gradient (ranging from 0.18 to 0.45), which implies that the relative importance of additive and permanent environmental effects for tick resistance is influenced by the environment. Estimated genetic correlations decreased as the tick infestation level increased, with negative correlations between extreme environmental levels, i.e., between more favorable (low infestation) and harsh environments (high infestation). Conclusions HBRNM can be used to describe the presence of G*E interactions for tick resistance in Hereford and Braford beef cattle. The preferred model for the genetic evaluation of this population for tick counts in Brazilian climates was a one-step model that considered heteroscedastic residual variance. Reaction norm models are a powerful tool to identify and quantify G*E interactions and represent a promising alternative for genetic evaluation of tick resistance, since they are expected to lead to greater selection efficiency and genetic progress.


Background
The cattle tick is a parasite that adversely affects beef cattle production in tropical areas such as Brazil. Retail beef markets are imposing restrictions on meat, ensuring that it is free of chemical residues that are perceived as having negative impacts on environment, public health and human welfare. Therefore, to remain competitive in foreign beef markets, Brazil must aim at complying with these higher standards.
To ensure market competitiveness, one strategy might be to increase the contribution of the Bos taurus breeds to Brazilian herds because they are more advantageous in terms of productive traits [1], such as carcass yield, gain weight, meat quality and sexual precocity compared to Bos indicus breeds. However, Bos taurus breeds tend to have greater susceptibility to tick infestation than Bos indicus breeds [2,3]. Hence, selection of animals for tick resistance would be useful to reduce the need for chemical control while also increasing productivity.
Evidence for additive genetic variability of tick counts in cattle includes reported heritability estimates, which range from 0.05 to 0.42 [2,[4][5][6]. Genetic evaluations for tick counts are routinely performed in countries such as Australia and South Africa, which have a similar climate as Brazil and where the cattle tick is also present. Examples of breeds with such evaluations include breeds such as Bonsmara and Belmont Red [7], and Brahman and Hereford-Shorthorn) [8]. In Brazil, the Conexão Delta G (Delta G Connection) company has used a genetic improvement program based on selection for tick resistance in Hereford and Braford cattle since 2003 [9].
These and other research studies and genetic evaluations [7,9] have not considered genotype by environment interactions (G*E). Failing to consider G*E interactions in genetic evaluations can adversely affect breeding programs if relative genetic merit is affected by the environment [10][11][12][13]; specifically, animals that are identified as top breeders in one environment may not be ideal in other environments. This issue is further exacerbated if progeny are raised in environments that differ from that of their parents [13]. In addition, most current genetic evaluation systems assume homogeneous residual variances across environments, although evidence of residual heteroscedacity has been reported, which is defined as heterogeneity of residual variances across contemporary groups, for traits such as milk yield [14] and post-weaning gain [10,15].
Linear reaction norm models capture a simple form of G*E interactions. They are based on the use of covariance functions [16] that allow for the prediction of the relative genetic merit of animals as a function of gradual linear changes in an environmental covariate. Sometimes this environmental covariate is not known with certainty and must be estimated from the data; Su et al. [17] demonstrated how this inference uncertainty could be formally accounted for by using Bayesian methods. If G*E interactions are important for tick resistance, reaction norm models could be used to fine-tune genetic improvement for tick resistance in Brazilian beef cattle. Because G*E interactions contribute to heterogeneous genetic variability across environments, if heteroscedastic residual variability across environments is ignored, inferences on G*E interactions based on reaction norm models could be biased.
The objective of this study was to infer upon G*E interactions based on models with different assumptions regarding the nature of genetic and residual variation and with different approaches to account for uncertainty on environmental gradient.

Tick count data
Data used in this current study were obtained from a breeding program conducted by Conexão Delta G (Delta G Connection). Data included records of tick counts (TC) on Hereford and Braford beef cattle from eight herds from the Rio Grande do Sul state, Brazil. TC were obtained on each animal from 326 to 729 days of age using the method described by Wharton and Utech [18], for which all engorged female ticks larger than 4.5 mm were counted on the entire left side of the animal when average management group infestation, i.e., animals under the same feeding and sanitary management, exceeded 20 ticks per animal. Up to three such counts were obtained for each animal, with each count separated by a minimum of 30 days, as described in other studies [5,19,20]. A total of 241, 1934 and 2188 animals for which, respectively, one, two and three TC were recorded. The average age during the evaluation period was 524 ± 65 days, and the overall mean TC was equal to 35.0 with a standard deviation of 42.2 (ranging from 0 to 532).
The 4363 animals with records were born between 2008 and 2011 and originated from 604 sires and 3966 dams, with 10 generations of pedigree depth. A total of 11,967 animals remained after pedigree pruning (i.e., removing any terminal ancestors that occur only once in the pedigree file). Pedigree information was incomplete due to the use of multiple-sire matings; 36 % of the animals only had their dam known. For animals with TC, this increased to 65 %. Similar pedigree structures from this same population have been used in other studies [20], and they have not affected the results of genetic evaluation. A detailed breakdown of the pedigree structure is in Table 1.
Because TC were not normally distributed (Fig. 1), a log-transformation was used such that LTTC = log 10 (TC + 1.001), which was used as the response variable [1,20]. The constant 1.001 was included because some TC were equal to 0 [1,20]. Skewness and kurtosis tests were performed and ensured the normality of the residuals from the fitted models.
Contemporary groups (CG) were defined as groups of animals within the same herd, year of birth, season of birth (April-July; August-November and December-March), sex and from the same management group. From 11,316 observations, we selected 10,673 records pertaining to 146 CG with at least five animals and with each LTTC record being within 3.5 standard deviation (SD) from its specific CG. Connectedness among the CG was determined by each having more than 10 genetic links in the dataset, using the AMC software [21]. Estimates of CG effects on LTTC were assumed to be the environmental covariates for a linear reaction norm model because they are the most appropriate entities used to describe the environmental conditions for beef cattle production [10,22,23].

Statistical models
Twelve analyses based on different models and/or inferential methodologies and specifications on residual variability were conducted on the data. These analyses are described below as M 1 to M 12 and are summarized in Table 2.

Traditional animal model (AM)
Consider the following simple linear traditional animal model (M 1 ): Here, y ijk is the kth LTTC record of animal j from CG i, β is the vector of fixed effects that includes an overall intercept, linear regression coefficients for Nellore breed proportion, heterozygosity and recombination loss (predetermined by Cardoso et al. [9]), as well as linear and quadratic regression coefficients on age of the animal; x′ j is the known incidence row vector of covariates connecting β to y ijk ; w i is the random effect of CG i (i = 1, 2, . . . , 146 levels); a j is the random additive genetic effect of animal j; c j is the random permanent environmental effect of animal j; and e ijk is the residual error.
The following distributional assumptions were specified: (1)  , σ c 2 and σ e 2 represent variance components due to CG, additive genetics, permanent environment and residual terms, respectively. Here, A represents the numerator of the relationship matrix between the animals in the pedigree, and I is the identity matrix.

Hierarchical bayesian reaction norm models (HBRNM)
Two somewhat different approaches were used to estimate environmental sensitivities of animals. One approach was based on a commonly used two-step model [24,25], in which in the first step, the regular animal model (M 1 ) from Eq. (1) is used to estimate CG effects ŵ i . The second step consists of using these ŵ i estimates as if they were "known" environmental covariates in a linear reaction norm model. More specifically, posterior means of ŵ i obtained from M 1 were used as covariate values in the following reaction norm model (M 2 ).
Here, φ is an overall linear regression coefficient of y ijk on ŵ i ; a j is the additive genetic intercept of animal j pertaining to genetic merit for an average environment (ŵ i = 0); b j is the random additive genetic effect of the reaction norm slope of animal j on ŵ i ; c j is the non-genetic (e.g., permanent environmental effect) intercept of animal j, as defined for an average environment (ŵ i = 0); and d j is the random permanent environmental effect of the reaction norm slope of animal j on ŵ i . Note that y ijk , x′ j β and e ijk are defined as before.
Another two-step modeling strategy (M 3 ) that is very similar to M 2 is given by Eq. (3): In M 3 , contemporary group effects are refitted as random effects rather than being treated as known covariates, such that M 3 may be more flexible than M 2 for modeling CG effects. Nevertheless, ŵ i was again used as a "known" covariate in the random regression portion of the model.
Including ŵ i as if it is a "known" covariate in the second step of this approach is clearly a limitation that may understate statistical uncertainty and lead to biased predictions on animal genetic merit. These biases may be due to genetic trend, differences in environmental covariate values across CG, or both [10,17]. An appealing one-step approach that avoids these limitations of the two-step approach was proposed by Su et al. [17]. This approach is purely Bayesian in that the covariate (3) associated with the reaction norm is treated as unknown, which allows inferences for all unknowns together within a one-step linear reaction norm model (M 4 ): Model M 4 can be rewritten in matrix notation as below [17]: where y = {y ijk } is the nx1 vector of observations; β is the vector of fixed effects of order p; is the vector of random permanent environment slopes; and e is the nx1 vector of residuals. Matrices X, P, Z a and Z c are known incidence matrices, where the column address of matrices Z b and Z d has exactly one element equal to the environmental covariate (w i or an estimate of w i ) for that CG in the row address of the observation, with all other elements in that row equal to 0.

Prior distributional specifications
To infer environmental sensitivities using a hierarchical Bayesian model, three stages are required: the first stage defines the distribution of the observed data conditional on all other parameters [17]: For a homoscedastic residual specification such as for M 1 , M 2 , M 3 and M 4 , R = Iσ e 2 , where σ e 2 is the residual variance and I is the identity matrix. However, as previously noted, it might be important to model residual heteroscedasticity. We propose two alternative strategies for this. The first heteroscedastic residual specification (S 1 ) is defined by R = diag I n i σ 2 e i , a diagonal matrix with elements equal to σ 2 e i = σ 2 e × ηŵ i and I n i denoting an identity matrix of order n i , where n i is the number of records in the ith CG. Here, η is an unknown scaling parameter that characterizes the degree of heterogeneity of residual variance across environments, and ŵ i is the solution for the ith CG [26].
Based on S 1 , we tested two two-step approaches (M 6 and M 7 ) that used inferred values of ŵ i from M 1 as if they were known and a one-step reaction norm model (M 8 ), where w i is an unknown covariate that is jointly inferred with the reaction norm and η parameters. Model M 6 was a heteroscedastic residual extension of M 2 , whereas M 7 and M 8 were heteroscedastic residual extensions of M 3 and M 4 , respectively.
Another heteroscedastic residual specification (S 2 ) was based on residual variance subclasses determined by a decile-based classification of ŵ i , following Cardoso and Tempelman [10]. That is, CG were ordered into one of 10 categories based on decile delimiters of ŵ i obtained from M 1 , such that R = diag I n k σ 2 e γ k , where the order n k denotes the number of records delimited by deciles k − 1 and k, and was 1157, 1174, 1047, 765, 1188, 1192, 1208, 918, 1150 and 874, respectively, for k = 1, 2, . . . , and 10. This specification was used to extend the two-step models M 2 -M 10 and M 3 -M 11 and the one-step model M 4 -M 12 with this particular heteroscedastic residual specification.
The last two models considered (M 5 and M 9 ) were heteroscedastic residual animal models based on extending M 1 with S 1 and S 2 heteroscedastic residual specifications, and were used as control models to determine the consequences of failing to model G*E interactions versus failing to model residual heteroscedasticity [10].
The second stage of HBRNM is represented by the prior distributions of the location parameters, as follows: where p(β) ∝ 1, σ w 2 is the environmental effect variance; σ a 2 and σ b 2 are the additive genetic variances due to the reaction norm intercept and slope, respectively; σ c 2 and σ d 2 are permanent environment variances due to reaction norm intercept and slope, respectively; σ ab is the genetic covariance between reaction norm intercept and slope; and σ cd is the permanent environment covariance between reaction norm intercept and slope. Then, r ab = σ ab / σ 2 a ×σ 2 b and r cd = σ cd / σ 2 c ×σ 2 d are the corresponding genetic and permanent environment correlations between intercept and slope, respectively.
This assumption leads to a prior mean of α γ equal to 3 [E(α γ ) = 3] and a large prior variance (var(α γ ) = 300) [27]. Due to the absence of relevant previous knowledge, flat or highly dispersed prior densities were assumed for all parameters of all models, and hyperparameters for variance components priors were specified on the basis of REML estimates obtained by M 1 and M 2 (not shown).

Bayesian inference
Bayesian analyses were conducted to sample all parameters from their fully conditional posterior distributions. Gibbs sampling was generally used except for the w i 's and η in M 5 , M 6 , M 7 and M 8 and for α γ (S 2 ) in M 9 , M 10 , M 11 and M 12 . MCMC sampling of these parameters required a random walk Metropolis-Hastings step because their full conditional posterior distributions were unrecognizable (see Cardoso and Tempelman [10] for further details).
Monte Carlo Markov chain (MCMC) based inferences were implemented using the INTERGEN software [29] by saving every 10th cycle from a total of 1,000,000 cycles, after 100,000 cycles of burn-in. Global convergence was checked using Geweke's Z criterion [30] applied to the conditional distribution of the data, as proposed by Brooks and Roberts [31]. In addition, visual inspection of trace plots was conducted, and a minimum effective sample size of 100 for all unknown parameters was obtained.

Model comparison
The deviance information criterion (DIC) was used to compare model fit and model complexity [32]: where D(θ) = E θ|y [D(θ)] is the posterior expectation of Bayesian deviance; p D = D(θ) − D(θ) corresponds to the penalty for increasing model complexity, with θ being the vector of model parameters and D(θ) being the Bayesian deviance as a function of the posterior mean of the parameters. Smaller values of DIC thereby indicate better-fitting models, while taking a penalty for model complexity into consideration.

Variance components and genetic parameters
The additive genetic variance of TC for a specific environment i with effect w i was obtained as follows: The genetic covariance of TC between two environmental gradients based on covariate values w i and w i' was calculated as: so that the corresponding correlation between TC in two specific environments was calculated as described below: r a a j + b j w i , a j + b j w i ′ = cov a a j + b j w i , a j + b j w i ′ (σ 2 a + w 2 i σ 2 b + 2w i ′ σ ab )(σ 2 a + w 2 i ′ σ 2 b + 2w i ′ σ ab ) .

Estimated breeding values
An estimate of the breeding value of sire j for TC, specific to a given environment i was obtained by ĝ j |ŵ i = a j + b j w i [10]. On the one hand, estimates of b j close to 0 indicate that ĝ j is relatively constant across various environments (ŵ i ) such that sire j has an environmentally robust genetic merit. On the other hand, an environmentally sensitive genetic merit has a large estimate b j , meaning its relative performance should substantially change on the environmental gradient [33]. The sire breeding value estimates were compared based on the ranking of the animals obtained by AM and HBRNM for low, medium and high environmental levels. These values were defined by the 10, 50 and 90th percentiles for ŵ i . Potential differences in re-rankings of sires for selection based on these models were also determined by the Spearman correlation between estimated breeding values. Spearman correlations were obtained for all animals and for the top 10 % (60) of sires with 12 or more progeny between low, medium and high environmental levels under different fitted models.

Model comparison
Models M 1 , M 5 and M 9 , which were the only models that did not include G*E interactions with a linear reaction norm model, along with M 7 , and M 1 , had the highest or lowest DIC values. Comparison of DIC between models M 1 , M 5 and/or M 9 implies that considering heterogeneity of residual variance across environments is important for modeling LTTC. However, these DIC improvements from homoscedastic to heteroscedastic residual models were small compared to the improvements in DIC when going from regular animal to linear reaction norm models. This suggests that modeling G*E interactions is more important than modeling heterogeneous residual variances ( Table 2).
The two one-step reaction norm models (M 4 and M 12 ) had lower DIC values than the corresponding two-step reaction norm models, except for M 10 . Thus, treating all CG effects as uncertain when modeling G*E interactions based on reaction norms seems to be important. This observation is in agreement with the findings of Su et al. [17], who demonstrated by simulation that jointly estimating all unknown parameters is more reliable than using previously estimated environmental effects from a simple animal model as known covariates. DIC can be used to compare any type of model (not necessarily nested models) [10,13,22,23]. However, when fitting two-step models, the reported DIC values come from the second step because we could not account for the uncertainty about ŵ i estimates from the first step model M 1 . This limitation might have yielded downwards-biased p D and DIC values for two-step models, but even so, their fit was much poorer compared to their counterpart onestep models ( Table 2).
Model M 12 had the lowest DIC value ( Table 2). Recall that M 12 allows for residual variance groupings into decile-based subclasses, which agrees with the findings of Cardoso and Tempelman [10], who reported this same model as the best-fitting in the characterization of postweaning gain in Angus cattle.
Posterior means ŵ i of w i were similar for all models, regardless of whether G*E interactions were considered, as in M 3 , M 4 , M 7 , M 8 , M 11 and M 12 , or not, as in M 1 and M 9 (Fig. 2); Pearson correlations among these estimates between methods always exceeded 0.99, which means they were also not influenced by homoscedastic versus heteroscedastic residual modeling. These results do not agree with Cardoso and Tempelman [10] for post-weaning gain in Angus cattle, for which estimates ŵ i from the model with the decile-based heteroscedastic classification function (S 2 ) had substantially lower correlations with estimates from the heteroscedastic exponential function models (S 1 ), or even the conventional animal models. Furthermore, every model resulted in negative skewness on the ŵ i , ranging from −0.521 to −0.415.

Inferences on variance components and genetic parameters
The one-step model M 12 resulted in the highest (0.022 ± 0.04) estimate of the reference or intercept ) compared with all other models, except for M 8 (0.025 ± 0.03; Table 3). In addition, M 12 showed the highest estimate of the genetic variance for slope (σ b 2 ) compared to the two-step models, except for M 3 and M 8 , which had the same estimate (0.046 ± 0.022). Estimates of the variance components for reference Table 3 Posterior means and 95 % posterior probability intervals reported as (2.5, 97.5th) posterior percentiles of dispersion parameters estimated for tick counts of Hereford and Braford cattle by different models σ a 2 reaction norm intercept genetic variance; σ b 2 reaction norm slope genetic variance; σ c 2 reaction norm intercept permanent environment variance; σ d 2 reaction norm slope permanent environment variance; r ab genetic correlation between intercept and slope of the reaction norm; r cd permanent environment correlation between intercept and slope of the reaction norm; σ w 2 contemporary group effect (environmental) variance; σ e 2 residual variance  ) were similar among all models (ranging from 0.006 to 0.010). In agreement with σ b 2 , PE slopes (σ d 2 ) were also significant (ranging from 0.015 to 0.084). These results show that the one-step approach confirmed the presence of G*E interactions. Biegelmeyer [20], in a study on tick resistance in Hereford and Braford beef cattle reported similar estimates, i.e., 0.012 and 0.022 for σ a 2 and σ c 2 , respectively. Estimates of the correlation between intercept and slope for the additive genetic and permanent environment effects were characterized by a great deal of uncertainty, as shown by the widths of their respective 95 % posterior probability interval (PPI; Table 3). This large uncertainty differs from those of previous studies [10,25,34], which estimated large and positive correlations. These differences may in part be caused by the fact that the correlation estimates depend upon the scale used for ŵ i or because the biological nature of tick counts is different from that of production traits.
Residual variance estimates (σ e 2 ) were similar among models, ranging from 0.062 ± 0.001 to 0.074 ± 0.010, but they were slightly higher in traditional animal models M 1 (0.072 ± 0.001), M 5 (0.070 ± 0.001) and M 9 (0.074 ± 0.010), which confirms the importance of considering G*E interactions in genetic evaluations for Hereford and Braford beef cattle (Table 3). Cardoso and Tempelman [10] also reported that HBRNM resulted in lower estimates of σ e 2 than AM. However, despite the similarity of the residual variances across the various reaction norm models, Fig. 3 illustrates the need to consider residual heteroscedasticity. The first decile class was particularly deviant from the other classes. This unexpected, very large residual variance at the lowest extreme of the CG effects boundary may be due to data artifacts or a non-obvious biological condition associated with low tick infestation levels. Similar results were demonstrated by Cardoso and Tempelman [10], with residual variances being remarkably decreased at the extremes of the CG average performance. Figure 3 also explains the poor fit of models M 5 , M 6 , M 7 and M 8 , which modeled heteroscedastic residual variance as an exponential function (i.e., S 1 ). This function forced a gradual monotonic change in the residual variances over the CG classes, while M 9 , M 10 , M 11 and M 12 showed a more flexible pattern, perhaps reflecting the true residual variance behavior of the actual data.
Heritability estimates (ĥ 2 ) were generally higher for HBRNM and for M 5 and M 9 , compared to M 1 (ĥ 2 = 0.19 ± 0.04; Fig. 4a). Similar heritability estimates have been reported in the literature, using models such as M 1 and logarithmic transformations of the observed data [1,5]. With M 12 , average heritability estimates were higher, which also indirectly indicates the better fit of one-step versus two-step models that consider residual heteroscedasticity. Other studies in beef cattle also found higher average heritability estimates for weaning weight and 450-day weight, respectively, using HBRNM compared to AM [10,35]. Therefore, greater response to selection is expected when using reaction norm models that model heterogeneity of residual variances across CG. Considering that using data from animals with unknown sires could lead to lower heritability estimates, we found that our heritability estimates were similar to those previously reported in the literature [1,5,20,35].
Estimates of repeatability varied along the environmental gradient (ranging from 0.18 to 0.45) and were, in general, higher under high levels of tick infestation (Fig. 4b).
These results demonstrate the particular importance of modeling permanent environmental effects in harsh environments, where more resistant animals are more likely to maintain a consistent performance. Posterior means of the genetic correlations [see Eq. (19)] between breeding values along the environmental gradients for Hereford and Braford LTTC that were obtained by the best-fitting model M 12 demonstrated a large plateau above 0.80 (Fig. 5). Furthermore, estimated genetic correlations decreased as the tick infestation level increased, with negative correlations between extreme environmental levels, i.e., between more favorable (low infestation) and harsh environments (high infestation). Similar results that demonstrate differences in genetic correlations between breeding values along environmental levels, mainly between high challenge conditions and favorable environments, have been reported in the literature [10,13,25]. However, Ambrosini [36]  small differences for Nellore yearling weight, with genetic correlations between breeding values along the environmental gradient ranging from 0.78 to 1.00.

Inferences on genetic merit
A low genetic correlation between breeding values in extreme infestation environments (Fig. 5) could indicate that different animals would be selected when using the reaction norm model M 12 . However, Spearman rank correlations among genetic values obtained by different models were always higher than 0.85 (Table 4), which indicates that rankings of animals would be similar and, thus, substantial losses on selection precision might not be observed when using a traditional animal model.

Conclusions
Hierarchical Bayesian reaction norm models can be used to describe the presence of genotype by environment interactions for tick resistance in Hereford and Braford beef cattle. The model that best fitted tick counts in Brazilian climates was a one-step model that considered heteroscedastic residual variance based on ten discrete classes of deciles of average CG performance (M 12 ), and hence, this model should be considered as the preferred model for genetic evaluation of this population. However, other functions on residual variance and other classes of models can be evaluated as viable approaches. Reaction norm models are a powerful tool to identify and quantify genotype by environment interactions and present a promising alternative for genetic evaluation of tick resistance, since they are expected to lead to greater selection efficiency and genetic progress.