Phenotypic and genomic data
Data for this study were provided by Genus plc (Hendersonville, TN, USA). Animal Care and Use Committee approval was not necessary for this study because the data were obtained from an existing database. Data on litter size (total number of piglets born per litter) were from a pig pure line. The average litter size was equal to 12.7 ± 3.1 and 13,369 records were available for 3619 sows. Genotypes for all sows were generated using the Illumina PorcineSNP60 BeadChip (Illumina, San Diego, CA). After quality control using default parameters by preGSf90 [20] (HWE, minor allele frequency, SNP call rate and animal call rate), 38,779 autosomal SNPs remained and were used to build genomic relationship matrices.
Genomic evaluation models
Phenotypes were analyzed using a genomic best linear unbiased prediction (GBLUP) (mixed) model. Parity number and the combined effect farm-year-month of farrowing were included as fixed effects. The model also included a random permanent environmental effect for each sow. The linear model including additive, dominant and interaction terms can be written as:
$${\mathbf{y}} = {\mathbf{X\varvec{\upbeta }}} + {\mathbf{f}}b + {\mathbf{Zg}}_{A} + {\mathbf{Zg}}_{D} + {\mathbf{Z}}\mathop \sum \limits_{i = A,D} \mathop \sum \limits_{{\begin{array}{*{20}c} {j = A,D} \\ {i \ge j} \\ \end{array} }} {\mathbf{g}}_{ij} + {\mathbf{Zpe}} + {\mathbf{e}},$$
where \({\mathbf{y}}\) is the vector of phenotypic records, \({\varvec{\upbeta}}\) is the fixed effect vector, \({\mathbf{f}}b\) models the inbreeding depression, where \({\mathbf{f}}\) is the vector of genomic inbreeding coefficients based on the proportion of homozygous SNPs and \(b\) is the inbreeding depression parameter per unit of inbreeding, \({\mathbf{g}}_{A}\) is a vector of breeding values of the sows, \({\mathbf{g}}_{D}\) is a vector of dominance deviations, \({\mathbf{g}}_{ij}\) is a vector of epistatic genetic values, \({\mathbf{pe}}\) is the permanent environmental effect vector \(\left( {Var\left( {{\mathbf{pe}}} \right) = {\mathbf{I}}\sigma_{pe}^{2} } \right)\), \({\mathbf{e}}\) is a residual vector \(\left( {Var\left( {\mathbf{e}} \right) = {\mathbf{I}}\sigma_{e}^{2} } \right)\), and \({\mathbf{X}}\) and \({\mathbf{Z}}\) are design matrices relating records to fixed effects and genetic and permanent environmental effects, respectively. The epistatic genetic effects can be partitioned into additive-by-additive \(({\mathbf{g}}_{AA} )\), additive-by-dominance \(({\mathbf{g}}_{AD} )\), and dominance-by-dominance \(({\mathbf{g}}_{DD} )\) effects. Higher order epistasis interactions were not considered as they were deemed to be negligible [21] or too difficult to estimate. Covariance matrices of genetic effects were as follows:
$$Var\left( {{\mathbf{g}}_{A} } \right) = {\mathbf{G}}_{A} \sigma_{A}^{2} ,\quad Var\left( {{\mathbf{g}}_{D} } \right) = {\mathbf{G}}_{D} \sigma_{D}^{2} ,\quad Var\left( {{\mathbf{g}}_{AA} } \right) = {\mathbf{G}}_{AA} \sigma_{AA}^{2} ,\quad Var\left( {{\mathbf{g}}_{AD} } \right) = {\mathbf{G}}_{AD} \sigma_{AD}^{2} ,\quad Var\left( {{\mathbf{g}}_{DD} } \right) = {\mathbf{G}}_{DD} \sigma_{DD}^{2}$$
where the covariance matrices \({\mathbf{G}}_{A}\) and \({\mathbf{G}}_{D}\) for additive and dominance effects, which are involved in the construction of covariance matrices \({\mathbf{G}}_{AA}\), \({\mathbf{G}}_{AD}\) and \({\mathbf{G}}_{DD}\) for epistatic effects, were constructed assuming HWE. The additive relationship matrix was calculated as in VanRaden [22]: \({\mathbf{G}}_{A} = \frac{{{\mathbf{MM}}^{{\mathbf{\prime }}} }}{{2\sum p_{i} q_{i} }}\), where matrix \({\mathbf{M}}\) has elements that are equal to \(\left( {2 - 2p_{i} } \right), \left( {1 - 2p_{i} } \right), - 2p_{i}\) for genotypes \(A_{1} A_{1}\), \(A_{1} A_{2}\) and \(A_{2} A_{2}\), respectively, where \(p_{i}\) is the frequency of allele \(A_{1}\) at SNP \(i\). The dominance relationship matrix was computed as \({\mathbf{G}}_{D} = \frac{{{\mathbf{WW}}^{{\mathbf{\prime }}} }}{{4\mathop \sum \nolimits_{\varvec{i}} p_{i}^{2} q_{i}^{2} }}\) where \({\mathbf{W}}\) has elements equal to \(- 2q_{i}^{2} , 2p_{i} q_{i} , - 2p_{i}^{2}\) for genotypes \(A_{1} A_{1}\), \(A_{1} A_{2}\) and \(A_{2} A_{2}\), respectively [10]. Covariance matrices for epistatic effects were computed using the Hadamard products and traces as \({\mathbf{G}}_{AA} = \frac{{{\mathbf{G}}_{A} \odot {\mathbf{G}}_{A} }}{{tr\left( {{\mathbf{G}}_{A} \odot {\mathbf{G}}_{A} } \right)/n}}\), \({\mathbf{G}}_{AD} = \frac{{{\mathbf{G}}_{A} \odot {\mathbf{G}}_{D} }}{{tr\left( {{\mathbf{G}}_{A} \odot {\mathbf{G}}_{D} } \right)/n}}\), and \({\mathbf{G}}_{DD}=\frac{{{\mathbf{G}}_{D} \odot {\mathbf{G}}_{D} }}{{tr\left( {{\mathbf{G}}_{D} \odot {\mathbf{G}}_{D} } \right)/n}}\) [14].
In these analyses, we implicitly assume that allele substitution effects of quantitative trait loci (QTL) are distributed as \({\varvec{\upalpha}}{ \sim }N\left( {0,{\mathbf{I}}\sigma_{\alpha }^{2} } \right)\), dominance effects as \({\mathbf{d}}{ \sim }N\left( {0,{\mathbf{I}}\sigma_{d}^{2} } \right)\), additive-by-additive epistatic effects as \(\left( {{\varvec{\upalpha \upalpha }}} \right){ \sim }N\left( {0,{\mathbf{I}}\sigma_{{\left( {\alpha \alpha } \right)}}^{2} } \right)\), and similarly for additive-by-dominant \(\left( {{\mathbf{\alpha d}}} \right)\) and dominant-by-dominant \(\left( {{\mathbf{dd}}} \right)\) effects. Note that under the assumption of HWE, the GBLUP “breeding” model in Vitezica et al. [10] that accounts for additive and dominance effects is a particular case of the model to analyze epistasis in Vitezica et al. [14], which in turn is an extension of the NOIA (natural orthogonal interactions) QTL analysis model by Alvarez-Castro and Carlborg [23]. Note that a “statistical” or “breeding” model implies that the covariance between breeding values and dominance deviations is zero [24]. The NOIA model is orthogonal, which means that genetic effects are defined in such a way that the addition of other genetic effects (e.g. dominance) to a model does not change the definitions of genetic effects (e.g. additive) that were already included in the model [23]. For instance, the additive effect is always the regression of genotypic value on gene content; the dominance effect is a regression of the remainder on a measure of heterozygosity or identity at the genotype, and so on. The NOIA model of Alvarez-Castro and Carlborg [23] achieves orthogonality automatically due to its construction of the incidence matrices \({\mathbf{M}}\) and \({\mathbf{W}}\) and their Kronecker products; Vitezica et al. [14] proved that, for analyses at the individual level, these Kronecker products can be reformulated as Hadamard products.
Statistically, orthogonality means that inclusion of new terms in the model does not change estimates of already included effects in an infinitely large population. For instance, in practice, going from an additive to an additive plus dominant model should not change the estimates of additive variance components much. The advantage of using orthogonality in genetics and breeding is that it is the only way to carry out the estimation of breeding values (additive “statistical” effects) in an unambiguous manner i.e. such that they do not depend on other genetic terms that are fitted in the model.
Variance components were estimated for five nested models that added, in succession, additive effects (\(A\)), dominance effects (\(A + D\)), additive-by-additive genetic effects (\(A + D + AA\)), additive-by-dominance genetic effects (\(A + D + AA + AD\)), and dominance-by-dominance genetic effects (\(A + D + AA + AD + DD\)). Genetic variances (\(\sigma_{A}^{2}\), \(\sigma_{D}^{2}\), \(\sigma_{AA}^{2}\), \(\sigma_{AD}^{2}\) and \(\sigma_{DD}^{2}\)) were estimated by Bayesian methods using Gibbs sampling in the software gibbs2f90 [25], available at http://nce.ads.uga.edu/wiki/. In total, 200,000 iterations were run, discarding the first 10,000 and keeping every 100th sample. Convergence was checked by visual inspection of trace plots for the chains. Post-Gibbs analysis included estimation of the effective sample size and the deviance information criteria (DIC) as a goodness-of-fit criterion [26].
Predictive ability
GBLUP was used to obtain estimated breeding values (EBV) by fixing the variance components that were estimated. Predictive ability of total genetic values for lowly heritable traits is difficult to ascertain, thus for the comparison of models, we proceeded as follows. EBV \(\left( {{\hat{\mathbf{g}}}_{A}^{w} } \right)\) were obtained from the “whole” dataset (13,369 records from 3619 sows), which included litters from 2000 to 2014. EBV \(\left( {{\hat{\mathbf{g}}}_{A}^{p} } \right)\) were also computed for a “partial” dataset that included only litters prior to 2010 (10,002 records from 2440 sows). No sow farrowed both before and after 2010 (such sows were excluded from the data set). The three models: \(A\), \(A + D\), and \(A + D + AA + AD + DD\) were used to estimate \({\hat{\mathbf{g}}}_{A}^{w}\) and \({\hat{\mathbf{g}}}_{A}^{p}\). The models were compared using the following three statistics of cross-validation based on method R approaches [27, 28] to detect bias in subsequent genetic evaluations, and as more recently suggested by [29]:
$$\begin{aligned} b_{0} & = \left( {1^{{\prime }} {\hat{\mathbf{g}}}_{A}^{w} - 1^{{\prime }} {\hat{\mathbf{g}}}_{A}^{p} } \right) /n, \\ b_{1} & = \frac{{Cov \left( {{\hat{\mathbf{g}}}_{A}^{w} ,{\hat{\mathbf{g}}}_{A}^{p} } \right)}}{{Var\left( {{\hat{\mathbf{g}}}_{A}^{p} } \right)}} , \\ \rho & = \frac{{Cov \left( {{\hat{\mathbf{g}}}_{A}^{w} ,{\hat{\mathbf{g}}}_{A}^{p} } \right)}}{{\sqrt {Var\left( {{\hat{\mathbf{g}}}_{A}^{w} } \right)Var\left( {{\hat{\mathbf{g}}}_{A}^{p} } \right)} }}, \\ \end{aligned}$$
where \(b_{0} = 0\) is referred to as the bias and should be equal to 0 under unbiasedness; \(b_{1}\) is a measure of dispersion of EBV, obtained as the slope of the regression of EBV obtained with the whole data on EBV estimated with the partial data, and should be 1; and \(\rho\) measures the relative increase of accuracies of EBV from partial data to whole data and was computed as the correlation of partial data on whole (among two models with different values of \(\rho\), the model with the highest \(\rho\) was chosen). EBV of only “new” sows that farrowed after 2010 were included in the statistics. These sows have no own or progeny phenotypic information in the partial data and thus, they can be seen as selection candidates at birth.
For comparison purposes, we also investigated the predicted ability of phenotypes of “new” sows for the three models as the correlation \(cor\left( {y^{*} ,\hat{y}} \right)\) [30] where \(y^{*}\) is the corrected phenotypic observation obtained from the “whole” dataset \(\left( {y^{*} = y - X\hat{\beta } - f\hat{b}} \right)\), and \(\hat{y}\) is the predicted corrected observation from the “partial” dataset, which is equal to the sum of the estimated genetic values \(\left( {\hat{g}} \right)\) and the estimated permanent environmental effect \(\left( {\widehat{pe}} \right)\). The estimated genetic value included the estimated additive genetic effects (\(A\) model), the sum of estimated additive and dominant genetic effects (\(\hat{g} = \widehat{{g_{A} }} + \widehat{{g_{D} }})\) in the \(A + D\) model, and the sum of all genetic effects \((\hat{g} = \widehat{{g_{A} }} + \widehat{{g_{D} }} + \widehat{{g_{AA} }} + \widehat{{g_{AD} }} + \widehat{{g_{DD} }}\)), in the \(A + D + AA + AD + DD\) model. The regression coefficients of \(y^{*}\) on \(\hat{y}\) were also estimated.