New residual feed intake criterion for longitudinal data

Background Residual feed intake (RFI) is one measure of feed efficiency, which is usually obtained by multiple regression of feed intake (FI) on measures of production, body weight gain and tissue composition. If phenotypic regression is used, the resulting RFI is generally not genetically independent of production traits, whereas if RFI is computed using genetic regression coefficients, RFI and production traits are independent at the genetic level. The corresponding regression coefficients can be easily derived from the result of a multiple trait model that includes FI and production traits. However, this approach is difficult to apply in the case of multiple repeated measurements of FI and production traits. To overcome this difficulty, we used a structured antedependence approach to account for the longitudinality of the data with a phenotypic regression model or with different genetic and environmental regression coefficients [multi- structured antedependence model (SAD) regression model]. Results After demonstrating the properties of RFI obtained by the multi-SAD regression model, we applied the two models to FI and production traits that were recorded for 2435 French Large White pigs over a 10-week period. Heritability estimates were moderate with both models. With the multi-SAD regression model, heritability estimates were quite stable over time, ranging from 0.14 ± 0.04 to 0.16 ± 0.05, while heritability estimates showed a U-shaped profile with the phenotypic regression model (ranging from 0.19 ± 0.06 to 0.28 ± 0.06). Estimates of genetic correlations between RFI at different time points followed the same pattern for the two models but higher estimates were obtained with the phenotypic regression model. Estimates of breeding values that can be used for selection were obtained by eigen-decomposition of the genetic covariance matrix. Correlations between these estimated breeding values obtained with the two models ranged from 0.66 to 0.83. Conclusions The multi-SAD model is preferred for the genetic analysis of longitudinal RFI because, compared to the phenotypic regression model, it provides RFI that are genetically independent of production traits at all time points. Furthermore, it can be applied even when production records are missing at certain time points. Supplementary Information The online version contains supplementary material available at 10.1186/s12711-021-00641-2.


Background
According to the United Nations Population Fund, the global human population is expected to reach 9.1 billion by 2050 [1]. Thus, a 70% increase in food production will be necessary to fullfil the food security requirements of this population in a context of climate change [2]. Feed efficiency in livestock production, which is defined as the ability to transform input (feed) into output (such as body weight), is a key factor in meeting this need. Different criteria to measure feed efficiency have been proposed in the literature. Feed conversion ratio (FCR), which is defined as feed intake per unit of average daily gain, is well understood by farmers and thus widely used. However, due to problems that are inherent to selection on ratio measures [3], residual feed intake (RFI) may be a preferred measure of feed efficiency [4]. RFI corresponds to the difference between an animal's actual feed intake and that predicted on the basis of the animal's production and maintenance requirements [4].
In growing animals, RFI is usually obtained by a multiple regression of feed intake on measures of production, body weight gain, and tissue composition [5]. On the one hand, if phenotypic regression is used for this purpose, the resulting RFI is not genetically independent of production and body weight/composition traits. Thus, to avoid detrimental effects on production traits, selection based on RFI obtained by phenotypic regression must be performed using a selection index that combines RFI and production traits with appropriate weights (derived from the covariance matrix between the components of RFI) [6,7]. On the other hand, if genetic regression is used to derive RFI, it is then independent of the other traits at the genetic level (but not at the phenotypic level) [5]. The use of different regression coefficients for the genetic and environmental parts of the component traits (feed intake, production, body weight/composition traits), in a multiple trait approach, accounts for the multiple origins of the phenotypic correlations [8,9] and provides a RFI that is independent of the other traits at the genetic level.
Using automatic self-feeders [10][11][12][13][14], individual feed intake can be recorded whenever an animal accesses its feeder. Since repeated recordings of body weight and body composition are also possible, it is conceivable to compute repeated measurements of RFI at various ages [15]. Compared to the use of an average RFI over the growing period, the main advantage of such longitudinal RFI data is the ability to describe changes in the trait over time and to provide more accurate estimates of breeding values for genetic selection [16]. However, applying a multiple trait model to account for covariances between measurements at different time points becomes problematic when there are more than five time points. Several flexible mixed-model approaches that require estimation of fewer parameters than the multiple trait model have been proposed for the analysis of longitudinal data in the genetic context: random regression (RR), character process (CP), and structured antedependence (SAD) models [17][18][19]. To evaluate longitudinal RFI and propose appropriate selection criteria, extension of these longitudinal models to the multivariate case is necessary; indeed, a multiple trait analysis of longitudinal FI and production traits would allow the genetic and environmental covariances between FI and production traits to be estimated. Such an extension is not straightforward for the CP model [20] and may result in convergence issues for the RR model due to the large number of parameters to be estimated ( n per matrix of random effects for n traits for a polynomial of degree d i for trait i).
The objective of our study was to propose a multivariate SAD model to estimate the genetic and environmental covariances between FI and production traits. Based on these estimates, we propose the use of genetic regression coefficients for the genetic and environmental parts of the component traits of RFI to obtain genetic and environmental components of longitudinal RFI that will be independent of production traits at the genetic level. The proposed multivariate SAD regression model was applied to pig data and compared to a longitudinal model based on phenotypic regression at each time point.

SAD model for longitudinal RFI
To present the general form of the SAD model used in this study, RFI is considered as a function of four component traits, as usually applied in pigs [7,21]: feed intake (FI), average daily gain (ADG), metabolic body weight (MBW), and backfat thickness (BF). The extension to include different components or additional components is straightforward. For the sake of simplicity, these 'production and maintenance traits' are hereafter referred to as 'production traits' .
Let ADG ij , MBW ij , BF ij , and FI ij be the observed values for ADG, MBW, BF, and FI of animal i at time t j ( j = 1 to n ), and ADG j , MBW j , BF j , and FI j the corresponding vectors of observations across animals for time point j . Here, we propose to model longitudinal RFI in two ways: longitudinal RFI obtained by a phenotypic regression of FI on the production traits at each time point (phenotypic regression model), which is the most commonly used and easiest way to compute RFI [22], versus RFI obtained with a genetic and environmental regression on the production traits, which corresponds to a multi-SAD genetic regression to model longitudinal RFI (multi-SAD regression model). In both cases, we used the SAD approach to account for covariances between measurements at different time points [23], and to account for covariances between FI and production traits in the multi-SAD regression model. Like the RR model, the SAD approach attempts to model the form of the random effects function to provide the most appropriate and parsimonious covariance function for the different random effects in the model. To do so, the RR approach uses (orthogonal) polynomials of time. Then, the covariance matrix between time points is derived from the covariance matrix of the random coefficients. Extension to the multiple trait case is straightforward and is done by considering the covariance matrix of the random coefficients for each trait involved, i.e. the random coefficients for different traits are correlated. In the single trait case, the SAD approach models the random effects function by assuming that a random effect at time point t j can be explained by its previous values (i.e. at time point t k , k < j ). A random effect at time point t j is then obtained by a regression on its preceding value(s). The covariance matrix between time points can be derived from the value of the regression coefficients and the variance of the error term in the regression. Extension to the multiple-trait situation is obtained by assuming that, in addition to the within-trait relationship, a random effect of one trait can be a function of the same random effect of the other traits considered, i.e. by adding the values of that random effect for the other traits in the regression. The covariance matrix between time points and traits is then derived from the values of the regression coefficients and the variances of the error terms. To reduce the number of parameters to be estimated, in the SAD approach it is generally assumed that the regression coefficients (called antedependence and cross-antedependence parameters) and the variance of the error term (called innovation variance) are functions of time [17,24].

Phenotypic regression model
The equation to obtain the genetic and environmental components of RFI for animal i at time point t j is: where µ FI,ij represents the fixed effects at time point t j ; b ADGj , b MBWj , and b BFj are the phenotypic regression coefficients of FI on ADG, MBW, and BF for time point

BFj
, u RFI,ij is the additive genetic effect for RFI of animal i at time point t j , and e RFI,ij is the random residual. Genetic and residual effects of different time points are not independent. To account for covariances between time points, the SAD approach models a random effect at time point t j by a regression on the preceding observations, leading to: where θ uRFI,j and θ eRFI,j are the antedependence parameters at time point t j for the genetic and residual components of RFI, and ε uRFI,ij and ε eRFI,ij are the genetic and residual error terms at time point t j , which are assumed to be random normally distributed effects with a mean 0 and innovation variances Aσ 2 εuRFI,j and Iσ 2 εeRFI,j , respectively, where A is the numerator genetic relationship matrix and I an identity matrix of appropriate size. To reduce the number of parameters to be estimated, antedependence parameters and innovation variances are considered as functions of time: θ s,j = β s q=0 a sq t q j for a function of degree β s ( s = uRFI or eRFI ), and σ 2 εs,j = exp γ s q=0 d sq t q j for a function of degree γ s [25]. Hereafter, the notation SAD β s γ s is used to shorten the description of the SAD model. Then, the u RFI,ij = θ uRFI,j u RFI,i(j−1) + ε uRFI,ij e RFI,ij = θ eRFI,j e RFI,i(j−1) + ε eRFI,ij , variance-covariance matrices G and P of the genetic and environmental components of RFI can be obtained by calculating their inverse using a Cholesky decomposition [26]: P −1 = L ′ D −1 L , where D is a diagonal matrix with innovation variances for RFI as components, and L is a lower triangular matrix with 1s on the diagonal and the negatives of the antedependence parameters for RFI as off-diagonal entries (the same reasoning applies to the G matrix). The phenotypic value of RFI for animal i at time point t j in the phenotypic regression model is then computed as RFI ij = u RFI,ij + e RFI,ij . In this model, the genetic covariance matrix between RFI and longitudinal production traits is not estimated which prevents the computation of an optimum selection index of RFI and production traits.

Multi-SAD regression model
The multi-SAD regression model is derived from a multiple trait SAD model [24,27]. A detailed description of the general form of the multiple trait SAD model is given in David et al. [24]. The set of model equations is as follows: where µ ADG,ij , µ MBW ,ij , µ BF ,ij , and µ FI,ij represent the fixed effects at time point t j , u ADG,ij , u MBW ,ij , u BF ,ij , and u FI,ij are the additive genetic random effect functions, and e ADG,ij , e MBW ,ij , e BF ,ij , and e FI,ij are the random residual functions for ADG, MBW, BF, and FI, respectively. To account for covariances between traits and between time points within a trait, the SAD approach makes it possible to model the form of the random effects function with a regression on the preceding observations and on correlated traits. To model RFI, the following genetic random effects functions are used: where θ uADG,j , θ uMBW ,j , θ uBF ,j , and θ uFI,j are the antedependence parameters at time point t j for the genetic components of ADG, MBW, BF, and FI, respectively; b u,ADGj , b u,MBWj , and b u,BFj are the genetic regression coefficients (cross-antedependence parameters) of FI on ADG, MBW, and BF at time point uBF ,j and ε uADG,j , ε uMBW,j , ε uBF,j , and ε uFI,j are random normally distributed effects (error terms) with mean 0 and innovation variances Aσ 2 εuADG,j , Aσ 2 εuMBW ,j , Aσ 2 εuBF ,j , and Aσ 2 εuFI,j , respectively. The same approach is used to model the residual random function: where θ eADG,j , θ eMBW ,j , θ eBF ,j , and θ eFI,j are the antedependence parameters at time point t j for the residual components of ADG, MBW, BF, and FI, respectively; b eADGj , b eMBWj , and b eBFj are the environmental regression coefficients of FI on ADG, MBW, and BF at time point t j , and ε eADG,j , ε eMBW,j , ε eBF,j , and ε eFI,j are random normally distributed effects (error terms) with mean 0 and innovation variance Iσ 2 εeADG,j , Iσ 2 εeMBW ,j , Iσ 2 εeBF ,j , and Iσ 2 εeFI,j . As in the phenotypic regression model, antedependence parameters and innovation variances are assumed to be continuous functions of time to reduce the number of parameters to be estimated (i.e. θ s,j = β s q=0 a sq t q j for a function of degree β s ( s ∈ {eADG, uADG, eMBW , uMBW , eBF , uBF , eFI, uFI} ) and σ 2 s,j = exp for a function of degree γ s . Regression coefficients (crossantedependence parameters) are also assumed to be functions of time: b s,j = δ s q=0 c sq t q j for a function of degree δ s ( s ∈ {eADG, uADG, eMBW , uMBW , eBF , uBF } ). Given Eqs. (1) and (2), it is possible to obtain the genetic and residual components of RFI*, computed with the multi-SAD regression model, by adjusting FI for ADG, MBW and BF at the genetic and environmental levels. Then, we obtain RFI*, which is independent from ADG, MBW and BF at the genetic level as: The phenotypic value for RFI for animal i at time point t j in the multi-SAD model is RFI * ij = u * RFI,ij + e * RFI,ij . See Additional file 1 for a demonstrative example.
The variance-covariance matrices G * of the genetic component of RFI* can be obtained by calculating its inverse using the following Cholesky decomposition: where D * is a diagonal matrix with genetic innovation variances for FI as components, and L * is a lower triangular matrix with 1s on the diagonal and the negatives of the genetic antedependence parameters for FI as off-diagonal entries [cross antedependence is excluded, (see Additional file 1 for the demonstration)]. The environmental covariance matrix P * of RFI* and production traits is obtained using the following covariance function: P * = BP T B ′ , where P T is the environmental covariance matrix for FI and production traits, and B is a lower triangular matrix of regression coefficients: where b es/FI (s ∈ {ADG, MBW , BF } ) are lower triangular matrices with the negative of the genetic cross-antedependence parameters (genetic regression coefficients) on the diagonal and the negatives of the product of environmental antedependence parameters with genetic crossantedependence parameters as off-diagonal entries for n measurement times. By construction, RFI* at time point t j is then independent of ADG, MBW, and BF at all time points at the genetic level.

Model implementation
The phenotypic regression and multi-SAD regression models can be implemented using the ASReml software [28] and the OWN Fortran program developed by David, which are freely available on the Zenodo platform [29]. The degrees of the polynomials of time for each function can be selected by comparing nested models using likelihood ratio tests. In the multi-SAD regression model, to reduce the number of models to be compared, we selected the specification of the antedependence and innovation variances for each trait independently, and then the cross-antedependence specification. For the first part of the selection process, we started with the most parsimonious model (smallest degree for the polynomial functions of time), and then increasing the degree of the polynomial functions of time (alternating the antedependence parameter and the innovation variance) until there is no further significant improvement of the model. Next, an increase in the degree of the cross-antedependence parameters is tested. For this step, we increased the degree of all cross antedependence parameters together.
In addition, to reduce computing time and avoid convergence issues, antedependence and innovation variance parameters for ADG, MBW, and BF were considered as known (i.e. fixed to their values estimated in the first step of the selection process). Both models provide estimated breeding values (EBV) for each time point t j of observation (TEBV: Time EBV). No simple procedure exists to use 'raw' TEBV for selection, as it requires the n TEBV to be accounted for. Instead, animals should be selected based on the trajectory of their TEBV over time. This selection requires a limited number of descriptive parameters of these trajectories, which are usually obtained by eigendecomposition of the genetic covariance matrix [15,30]. It has been shown that the two first eigenvectors obtained with longitudinal models are usually sufficient to describe the trajectory of feed efficiency [15]. The corresponding summarised estimated breeding values (SBV1 and SBV2), which are the product of the vector of TEBV with the first or the second eigenvectors, generally correspond to the EBV for average feed efficiency and the EBV for the main slope of the trajectory over time, respectively [15].

Application to pig data
We used data from 2435 French Large White boars, castrated males, and gilts from nine generations of two lines that were divergently selected for RFI (called hereafter high RFI line and low RFI line). All pigs were raised after weaning on the Rouillé experimental farm (Vienne, France, https:// doi. org/ 10. 15454/1. 55724 15481 18584 7E12) [21]. The animals were selected based on their RFI over an 18-week test period, starting at ~ 10 weeks of age, which was obtained by a defined phenotypic regression of FI on ADG, MBW, and BF over the test period (i.e. one observation per animal). From each farrowing batch, 48 pigs from at least six litters were moved to a growingfinishing room at ~ 10 weeks of age, with four groups of 12 animals (boars separated from females and castrated males) placed in pens that were each equipped with a single-place electronic feeder ACEMA 64 (Pontivy, France; [31]). Over the 18-week test period, the animals were fed a pelleted diet of cereals and soybean meal with 10 MJ NE/kg and 160 g CP/kg, and a minimum of 0.80 g digestible Lys/MJ NE. The animals had free access to water. Feed intake was recorded each time a pig accessed the feeder. For 16 consecutive weeks of the test period (from 11 to 26 weeks of age), body weight (BW) was recorded weekly for the males, while females and castrated males were weighed at 11, 15, 19, and 23 weeks of age, and more frequently if the test lasted for more than 23 weeks. Backfat thickness was measured ultrasonically on males at around 35, 65, 90, and 95 kg body weight, as the average of six measurements that were recorded at three areas of the body: the neck, the back, and the kidney, on both sides of the spine. For the females and castrated males, BF was measured in the same way, but at 11, 15, 19, and 23 weeks of age.
For longitudinal analyses, phenotypes were computed at weekly intervals. Average daily gain of animal i at week j was calculated over a 4-week period following [32] as and MBW was calculated as ij . BF ij corresponded to the measurements of BF on animal i at week j . Outliers, as defined in Huynh-Tran et al. [15], were removed from the analysis. Data for the first week in the growing-finishing room (11 weeks of age), which was considered to be an adaptation period for the animals, and for weeks corresponding to 25 and 26 weeks of age, as the number of pigs weighed per week decreased at the end of the test period (animals already slaughtered), were also removed from the dataset. The final data set included production and FI records over a 10-week period from ~ 13 to ~ 22 weeks of age, which will be denoted as weeks 1 to 10, hereafter. The relationship matrix for use in the analyses was built from pedigree, which included 3986 animals.

Comparison of models
Given the poor properties for RFI obtained from the phenotypic regression model (genetically correlated with production traits), our aim was to compare RFI obtained with the phenotypic regression and RFI obtained with the multi-SAD regression models, in order to evaluate the advantages of applying the multi-SAD model. The core issue was to evaluate whether the EBV for RFI obtained with the multi-SAD model are sufficiently different from those obtained with the phenotypic regression model to consider applying such a complex model to the data. Fixed effects included in the two models were the same for all traits: week (10 levels), batch (66 levels), sex (3 levels), birth herd (2 levels), age at the start of the test (covariate), and pen (16 levels). To apply the phenotypic regression model, missing phenotypes for BW and BF were replaced by their predictions from linear interpolations then MBW and ADG for all weeks were computed using the predicted BW, as described in Huynh-Tran et al. [32]. To make the phenotypic and multi-SAD regression models comparable, these phenotypes were also used in the multi-SAD regression model, although this model can cope with missing values. Spearman rank correlations were used to compare the weekly EBV and the first and second SBV obtained with the two models.
In addition, the trajectories of individual EBV over weeks were classified into groups by a non-hierarchical k-mean analysis [33] for both models. Finally, we evaluated the effect of the divergent selection on RFI obtained by a phenotypic regression at the test level (i.e. one observation over the test period) on the profile of RFI from the multi-SAD regression model over weeks by comparing changes in SBV due to selection.

Results
The descriptive statistics of the traits per week are in Table 1. The phenotype of all traits increased with time. The coefficients of variation were in the same range for ADG, MBW and FI (ranging from 0.13 to 0.21) and lower for MBW (ranging from 0.07 to 0.10). The SAD models that were retained for the phenotypic regression model and the multi-SAD regression model are described in Table 2. The degrees of the functions of the cross-antedependence parameters (genetic and environmental regression coefficients of FI on ADG, MBW, and BF) were all equal to 1 in the multi-SAD regression model. The corresponding equations are in Additional file 2. The estimates of heritabilities of FI and production traits with the multi-SAD model are shown in Additional file 3: Figure S1. All heritabilities increased with time from 0.20 to 0.39 for FI, from 0.32 to 0.39 for ADG, from 0.21 to 0.60 for MBW, and from 0.27 to 0.48 for BF. Estimates of the genetic and phenotypic correlations between weeks for FI and production traits that were obtained with the multi-SAD model are shown in Additional file 4: Figure S2. Similar patterns were observed for FI and production traits, i.e. within trait, estimates of the genetic and phenotypic correlations decreased as the time between measurements increased, and estimates of the phenotypic correlations were slightly weaker than estimates of the genetic correlations. Within week, estimates of the genetic correlation between FI and ADG ranged from 0.60 and 0.75. They were lower between FI and MBW (ranging from 0.15 to 0.37), and between FI and BF (ranging from 0.10 to 0.28). Estimates of the phenotypic correlations followed the same patterns but with lower values.
Estimates of the regression coefficients obtained with the phenotypic regression and multi-SAD regression models are in Table 3. Estimates of the phenotypic regression coefficient for ADG were quite stable from week 1 to 5, ranging from 0.87 to 1.00, then increased to 1.32 in week 6, and finally decreased linearly to a value of 1.00 in week 10. Estimates of the phenotypic regression coefficient for MBW increased from week 1 to 3 (from 1.04 to 1.29), then tended to decrease until week  Heritability estimates for RFI over time obtained with the phenotypic regression and the multi-SAD regression models were moderately high (see Fig. 1). Those estimated with the multi-SAD regression model remained quite stable over time, ranging from 0.14 ± 0.04 to 0.16 ± 0.05, and those estimated with the phenotypic regression model were slightly higher and showed a U-shaped profile, with a minimum value (0.19 ± 0.06) in week 6 and a maximum value in week 10 (0.28 ± 0.06). Based on 95% confidence intervals, only the heritability estimates obtained for weeks 1 to 3 and week 10 differed significantly between the two models.
The genetic correlations between RFI at different time points estimated by the phenotypic and the multi-SAD regression models followed the same pattern over time between the two models ( Fig. 2), but those obtained with the phenotypic regression model were higher, i.e.
(1) estimates of genetic correlations between two successive time points increased over time and ranged from 0.46 to 0.97 with the phenotypic regression model, and from 0.32 to 0.77 with the multi-SAD regression model; and (2) genetic correlations tended to decrease as time between measurements increased, with estimates of 0.10 and 0.00 between the first and last week of estimation with the phenotypic regression and multi-SAD regression model, respectively. Figure 3 shows the estimates of the phenotypic and genetic correlations between RFI obtained with the phenotypic regression and multi-SAD regression models for the phenotyped animals by line and by week. Estimates of the phenotypic correlations were high and similar between the two lines, ranging from 0.87 to 0.98; they decreased from week 1 to 3 and then showed an inverted U-shaped profile, with a maximum value in week 7. The correlations between weekly EBV decreased over time, Table 3 Estimates of phenotypic, genetic, and environmental regression coefficients (± se) for each week and production trait with the phenotypic regression ( b ADG , b MBW , b BF ) and multi-SAD regression models ( b u,ADG , b u,MBW , b u,BF , b e,ADG , b e,MBW , b e,BF ) ADG average daily gain, MBW metabolic body weight, BF backfat thickness, FI feed intake week Heritability Fig. 1 Heritability estimates of RFI over a period of 10 weeks obtained with the phenotypic regression (green) and with the multi-SAD regression (black) models. Shaded area = 95% confidence interval ranging from 0.96 to 0.79 in the High RFI line and from 0.95 to 0.77 in the Low RFI line. The trajectories of individual EBV over weeks were classified into three groups by a non-hierarchical k-mean analysis. These three patterns (see Additional file 5: Figure S3) differed by their means and slopes, which varied in the same direction, i.e. a higher mean was associated with a higher slope, and reciprocally. A Cohen's kappa clustering agreement of 0.80 was found between the two models. The first two estimated breeding values SBV1 and SBV2 summarized the trajectory pattern into just two parameters: to some extent, SBV1 corresponded to the average RFI over the test period and SBV2 to the slope of the curve over time for each individual. Correlations between the first two SBV obtained with the two models are in Table 4. They were higher for SBV1 than for SBV2, similar in the two lines for SBV1 [0.83 (High RFI) and 0.82 (Low RFI)], and lower in the Low RFI line for SBV2 (0.76 versus 0.66). Figure 4 shows how SBV1 and SBV2 that were obtained with the multi-SAD model changed with selection. After seven generations of selection, animals of the Low RFI line had, on average, lower SBV1 and SBV2 than animals of the High RFI line, i.e. the trajectory of EBV over weeks for the Low line was characterised by a lower mean and a lower slope.

Discussion
We chose the SAD approach to account for the genetic and environmental covariances between longitudinal production traits and FI in order to compute longitudinal RFI independent of production traits at the genetic level using the multi-SAD regression model, and to account for covariances between successive measurements in the phenotypic regression and multi-SAD regression models. Thus, applying the multi-SAD approach to components of RFI, we were able to obtain estimates of the covariance between FI and production traits over time and EBV for RFI independent of production traits. These properties are expected to be useful for selection [9] and for the study of patterns of feed efficiency over time at the genetic and phenotypic levels [34]. A random regression (RR) approach [19] could also have been used to account for these different covariances, as analysis of the longitudinal feed conversion ratio by phenotypic regression with RR or SAD approaches have recently been shown to produce similar results [15]. The RR approach has also been used to model RFI when FI is recorded longitudinally but only single records are available for production traits [35]. However, applying the RR approach to multiple traits requires estimation of more parameters than with the SAD approach and can lead to convergence problems [27]. The SAD approach has also other advantages over the RR approach, including providing a better fit to the data in many situations [17,18,24,25], suffering less from the drawbacks reported for the RR model such as border effects [36], and from the challenge to properly estimate correlations that decrease over time [18]. In spite of these advantages compared to RR, we had to apply several constraints to the multi-SAD regression model to facilitate convergence: (1) we did not account for covariances between production traits (i.e. set to 0), and (2) we fixed the antedependence and innovation variance parameters for the production traits to their values estimated in the first steps of the analysis using single trait SAD models. It should also be noted that the units of measure for production traits and FI were chosen such that the variances of the traits were of similar magnitude to facilitate convergence, which is why ADG was expressed in 10× g/d, BF in 0.1× mm, etc. Initial analyses with fewer traits and time points showed that these constraints had no impact on the estimates of the genetic and environmental parameters for RFI.
It must be acknowledged that the multi-SAD regression model is a complex model that may have practical identifiability issues. Exploring the information matrix I ω helps to detect such problems for the unknown parameter vector ( ω ) [37]. The condition number (square root of the ratio of the first to the last eigenvalue) of the information matrix I ω of the SAD model used in our study was low, i.e. 26, which suggests that there was no practical identifiability issue for the data analyzed.
Using results from the multi-SAD model, we estimated RFI that is independent of production traits at the genetic level by regressing FI on production traits using genetic regression coefficients. Different RFI traits can be estimated from the results of the multi-SAD model. On the one hand, estimates of phenotypic regression coefficients corrected for fixed effects (i.e. extracted from the estimated phenotypic covariance matrix [9]) can be used to obtain a measure of RFI that is phenotypically but not genetically independent of production traits. On the other hand, estimates of the genetic regression coefficients can be used for the genetic part and estimates of the environmental regression coefficients for the environmental part to obtain a measure of RFI that is independent of production traits at both the genetic and phenotypic levels (see Additional file 1). However, the meaning and advantage of this measure of phenotypic RFI remain to be investigated. Here, we also derived RFI using a phenotypic regression model that does not account for the systematic effects that affect production traits. We investigated this model in spite of the known poor properties of the resulting EBV for RFI, i.e. the EBV are not independent of production traits, and the low accuracy of RFI parameter estimates [7,38], because phenotypic regression of FI on ADG, MBW, and BF over the test period (i.e. one observation per animal) was used in the divergent selection experiment. In addition, it is the easiest and most common measure of RFI used in pigs and other species (e.g. cattle, sheep, chicken and even fish [39][40][41][42]), although some authors recommend to compute RFI from the genetic covariance matrix, such as Mebratie et al. [8]. Our results obtained with the multi-SAD and the phenotypic regression models were different. Given the known theoretical benefit of the multi-SAD model over the phenotypic regression model, our comparison suggests that applying the multi-SAD model would be preferable in practice. However, this must be confirmed by a more in-depth analysis of the costs (computing time) and benefits (improving RFI by genetic selection without detrimental effects on production traits at different time points) of applying the multi-SAD model over the phenotypic regression model. This will require evaluation of direct and correlated responses to selection [9] in their different dimensions (over time and generations) under scenarios that differ in the model and selection criteria used (subset of the SBV of interest, BV at a given time point, etc.). Estimates of heritability of FI were moderate and tended to increase with age (see Additional file 3: Figure  S1), in agreement with previous studies [7,43,44]. Estimates of heritability of the production traits were similar to previous estimates over the test period (i.e. one observation per animal) in the same population [45] and were within the range of estimates reported for three French pig breeds [46], for an Irish commercial cross [9], and as .j where phenotypes of animal i at week j are RFI ij =û RFI,ij +ê RFI,ij and RFI * ij = û * RFI,ij +ê * RFI,ij for the phenotypic regression and multi-SAD regression models, respectively. b For week j, ρ û .j ,û * .j where û RFI,ij and û * RFI,ij correspond to the estimated breeding values for RFI for animal i at week j obtained with the phenotypic regression and multi-SAD regression models, respectively. Correlations calculated for 1229 phenotyped animals for the low RFI line and 1122 phenotyped animals for the high RFI line. Shaded area = 95% confidence interval  [47]. Longitudinal estimates for these traits are more rarely reported in pigs. For FI and RFI over 10 consecutive weeks of growth in three pig breeds, Shirali et al. [36] reported estimates of heritability ranging from 0.13 to 0.23 and from 0.02 to 0.20, respectively. As in our study, heritability estimates tended to increase with time for FI but their estimates followed a U-shaped curve for RFI and were lower than in our study. Using random regression approaches with different splines and Legendre polynomials, Cai et al. [48] reported estimates of heritability for longitudinal FI, BW, BF between 90 and 210 days of age on Yorkshire RFI selection lines at Iowa State University that, in some cases, increased at the extreme ages, which is typically due to the limits of the RR model listed earlier in the Discussion. Their estimates, were higher than our estimates for BW and BF and slightly lower for FI. Estimates of heritability for weekly RFI obtained with the phenotypic and multi-SAD regression models ( Fig. 1) were within the range of heritabilities, from 0.13 to 0.38 [6,7,22,46], that have been reported in the literature for average RFI in pigs over the test period. Heritability estimates obtained with the multi-SAD regression model tended to be lower (but only significantly lower in weeks 1, 2, 3, and 10) than those obtained with the phenotypic regression model. The lower estimates were the result of lower estimates of genetic variance, while estimates of environmental variance for RFI were similar for the two models. Concomitantly, in spite of a similar pattern, estimates of the genetic correlation between RFI at different time points obtained with the multi-SAD regression model were lower than those obtained with the phenotypic regression model. These differences (heritabilities and correlations) between these two models are probably the result of the genetic correlations with production traits that persist in the phenotypic regression model, whereas they were null in the multi-SAD regression model. Indeed, as demonstrated by Kennedy et al. [5], genetic independence between production traits and RFI obtained by phenotypic regression is achieved only when 1 − h 2 P cov(u FI , u P ) = h 2 P cov(e FI , e P ) , where P is the production trait. In our study, this condition was probably not met at all time points and, as expected, correlations of weekly EBV for RFI with weekly EBV for production traits tended to be weaker with the multi-SAD regression model than with the phenotypic regression model (see Additional file 6: Figure  S4). The non-zero genetic correlations with production traits obtained with the phenotypic regression model may also explain the decreasing correlations between weekly EBV for RFI obtained with the two models over time (model differences that accumulate over time).
Estimates of phenotypic correlations between weekly RFI obtained with the two models were weaker when phenotypic correlations of RFI with production traits were non-zero in the multi-SAD model (negative correlations with MBW at the beginning of the test period and slightly positive correlations with ADG at the end of the test period (see Additional file 4: Figure S2c), while RFI was phenotypically independent of the production traits with the phenotypic regression model. It should be noted that use of the phenotypic regression model at the test-period level (one single observation per animal) resulted in near genetic independence between RFI and production traits, with estimates of genetic correlations equal to − 0.05, 0.07 and 0.36 for RFI with ADG, BF, and MBW, respectively. In this case, the correlation between EBV obtained with the phenotypic regression and the multi-SAD regression model (adapted to a single observation for each trait) was high (0.96).
Estimates of the phenotypic regression coefficients obtained with the phenotypic regression model changed slightly over time (Table 3). For BF and MBW, the difference between the maximum and minimum phenotypic regression coefficients was not significantly different from 0 given the standard errors of the estimates, whereas for ADG, the regression coefficient estimates were significantly higher in weeks 6, 7, and 8 than in weeks 1, 2, and 3. However, when applying a linear regression model on these regression coefficients, none of the slopes were significantly different from zero (estimated slopes of 0.028, − 0.014, and − 0.004 for ADG, MWB, and BF, respectively, p_values > 0.30). This outcome is explained as follows: in the selection experiment, pigs were fed the same diet throughout the test period and the diet was formulated to cover the animals' energy and amino acid requirements, such that feed intake was essentially driven by the net energy content of the feed [49]. Thus, the phenotypic relationships that we observed for FI with ADG, MBW, and BF were driven mainly by the energy costs of these traits. In most models, the energy costs of maintenance (MBW) and of fat and protein deposition are considered to be constant for an animal during growth [50], although in some cases individual factors have been shown to affect maintenance requirements (visceral mass versus lean tissue, for instance) [51]. In the final multi-SAD regression model, the cross-antedependence parameter functions (regression coefficients) were of degree 1 because higher degree cross-antedependence functions led to convergence problems or to spurious variance estimates. Thus, in the SAD model, genetic and environmental regression coefficients were considered to evolve linearly over time. Interpretation of changes in estimates of regression coefficients over time is more difficult for genetic and environmental regression coefficients than for phenotypic regression coefficients, and no information on these aspects is available in the literature. In practice, different factors could influence the partitioning between genetic and environmental components of the phenotypic covariation of FI with production traits over time. For example, competition for access to the feeder may increase over time for pigs raised in groups, which means that the importance of the environmental factors may increase relative to the genetic determinism of how feed is used for growth. In addition, estimates of regression coefficients varied most from the beginning to the end of the period for MBW, which was the trait with the smallest genetic and environmental variance estimates (ranging from 5 to 66 and from 17 to 45, respectively), while they varied least for BF, which was the trait with the largest genetic and environmental variance estimates (ranging from 18 to 203 and from 48 to 218, respectively). Thus, these changes in coefficients might have limited impacts on covariances of the trait with FI and on the resulting RFI.
For the purpose of comparison, missing phenotypes for production traits were replaced by their linear interpolations for both models. As stated in the Methods section, the multi-SAD regression model can cope with a substantial number of missing values for the production traits, which is another advantage of this model. In the phenotypic regression model, if one of the production traits is missing at a given time point for an animal, RFI is also missing, which would be problematic if many phenotypes were missing for the different production traits. In our case, applying the phenotypic regression model to the original dataset (i.e. the set with numerous missing phenotypes for production traits) led to spurious variance estimates [estimates of heritability of RFI were very high, up to 0.78, (see Additional file 7 Figure S5)]. Results of the multi-SAD regression model applied to the original dataset (the same multi-SAD regression model) are provided in Additional file 7: Figure S5 and Additional file 8: Figure S6. The heritability estimates that were obtained with the multi-SAD regression model applied to the initial dataset did not differ from those obtained with the dataset using linear interpolation of missing phenotypes. Patterns of estimates of the genetic and environmental regression coefficients over time were also similar between the two datasets, as were the slope estimates, although we did note a tendency for steeper slopes when phenotypes were missing. The main differences between the two datasets were evident for the SBV. When phenotypes were missing, the correlations between SBV obtained with the phenotypic regression and the multi-SAD regression model were weaker than the correlations obtained when models were applied to data with no missing phenotypes: 0. This result indicates that, in practice, when applying the multi-SAD regression model to data with missing records, selection based on SBV will differ markedly from selection on estimates obtained with the phenotypic regression model, particularly regarding the profile of RFI changes over time (SBV2). Values of the first eigenvector with the multi-SAD model, which was used to compute SBV1, were all positive and increased with time, while the sign of the values of the second eigenvector changed at week 7. This means that selection for SBV1 would lead to selection in the same direction at all time points, while selection for SBV2 would have opposite effects for the beginning versus the end of the test period. In our study, animals were divergently selected based on RFI obtained by phenotypic regression over the test period (one record per animal). Changes in SBV1 and SBV2 obtained with the multi-SAD model showed that this selection had, as expected, an impact on the average level of RFI over the test period, but also on the dynamics of RFI during the test period. This result is confirmed by the distribution of the individuals from the two lines into three trajectory patterns: 90% of the animals that were classified into the cluster with the lowest mean and slope were from the Low RFI line, while 94% of the animals that were classified into the cluster with the highest mean and slope were from the High RFI line. A previous study in the same population on the impact of selection for RFI at the test period level on the trajectory of feed conversion ratio reported similar results [15]. In practice, selection for RFI based on SBV1 and SBV2 allows control of the average level of RFI, as well as its changes with age. Nonetheless, further research is needed to identify the desired pattern of the RFI trajectory to select for. Indeed, is it preferable to select animals with a constant RFI over time, or an RFI that increases or decreases with age? To answer this question, studies on the covariation of longitudinal RFI with other traits of interest, such as carcass composition and meat quality, are needed, as well as on the evaluation of the economic impact of RFI at different ages.

Conclusions
The multi-SAD regression model is preferred over the phenotypic regression model for analysis of longitudinal RFI because the multi-SAD regression model can be applied even when phenotypes for production traits are missing. In addition, RFI obtained with the multi-SAD regression model are genetically independent of