Using pooled data to estimate variance components and breeding values for traits affected by social interactions
Genetics Selection Evolution, volume 45, Article number: 27 (2013)
Abstract
Background
Through social interactions, individuals affect one another’s phenotype. In such cases, an individual’s phenotype is affected by the direct (genetic) effect of the individual itself and the indirect (genetic) effects of the group mates. Using data on individual phenotypes, direct and indirect genetic (co)variances can be estimated. Together, they compose the total genetic variance that determines a population’s potential to respond to selection. However, it can be difficult or expensive to obtain individual phenotypes. Phenotypes on traits such as egg production and feed intake are, therefore, often collected on group level. In this study, we investigated whether direct, indirect and total genetic variances, and breeding values can be estimated from pooled data (pooled by group). In addition, we determined the optimal group composition, i.e. the optimal number of families represented in a group to minimise the standard error of the estimates.
Methods
This study was performed in three steps. First, all research questions were answered by theoretical derivations. Second, a simulation study was conducted to investigate the estimation of variance components and optimal group composition. Third, individual and pooled survival records on 12 944 purebred laying hens were analysed to investigate the estimation of breeding values and response to selection.
Results
Through theoretical derivations and simulations, we showed that the total genetic variance can be estimated from pooled data, but the underlying direct and indirect genetic (co)variances cannot. Moreover, we showed that the most accurate estimates are obtained when group members belong to the same family. Additional theoretical derivations and data analyses on survival records showed that the total genetic variance and breeding values can be estimated from pooled data. Moreover, the correlation between the estimated total breeding values obtained from individual and pooled data was surprisingly close to one. This indicates that, for survival in purebred laying hens, loss in response to selection will be small when using pooled instead of individual data.
Conclusions
Using pooled data, the total genetic variance and breeding values can be estimated, but the underlying genetic components cannot. The most accurate estimates are obtained when group members belong to the same family.
Background
Group housing is common practice in most livestock farming systems. Previous studies have shown that group-housed animals can substantially affect one another’s phenotype through social interactions [1–9]. The heritable effect of an individual on its own phenotype is known as the direct genetic effect, while the heritable effect of an individual on the phenotype of a group mate is known as the social, associative or indirect genetic effect [10–14]. Both direct and indirect genetic effects determine a population’s potential to respond to selection, i.e. the total genetic variance [2, 10–14]. Selection experiments in laying hens and quail [1, 2, 9], and variance component estimates in laying hens, quail, beef cattle and pigs [3–9] have shown that indirect genetic effects can contribute substantially to the total genetic variation in agricultural populations.
Direct, indirect and total genetic variances can be estimated from individual data. However, it can be difficult or expensive to obtain individual phenotypes on certain traits, e.g. egg production and feed intake. Alternatively, data can be obtained on group level, resulting in pooled records. However, pooling data reduces the number of data points. Moreover, multiple animals influence each data point, increasing the complexity of the data. Although there is an obvious loss of power, previous studies have shown that pooled data can be used to estimate direct genetic variances for traits not affected by social interactions [15–17]. However, with social interactions, indirect genetic effects emerge and the complexity of the data increases further. It is unclear whether pooled data are still informative in these situations. Therefore, the main objective of this study was to determine whether pooled data can be used to estimate direct, indirect and total genetic variances, and breeding values for traits affected by social interactions. In addition, optimal group composition was determined, i.e. the optimal number of families represented in a group to minimise the standard error of the estimates.
Methods
This study was performed in three steps. First, all research questions were answered by theoretical derivations. Second, a simulation study was conducted to investigate the estimation of variance components and optimal group composition. Third, individual and pooled survival records on 12 944 purebred laying hens were analysed to investigate the estimation of breeding values and response to selection.
Table 1 lists the main symbols and their meaning.
Theory
Variance components and breeding value estimation
In this section, we examined whether direct, indirect and total genetic variances, and breeding values can be estimated from pooled data.
With social interactions, an individual phenotype consists of the direct genetic (A_{D}) and environmental (E_{D}) effects of the individual itself (i), and the indirect genetic (A_{I}) and environmental (E_{I}) effects of its group mates (j):
$$P_{i}=A_{\mathrm{D},i}+E_{\mathrm{D},i}+\sum_{j\ne i}^{n-1}\left(A_{\mathrm{I},j}+E_{\mathrm{I},j}\right) \qquad (1)$$
where n is the number of individuals per group [11]. From an animal breeding perspective, the total breeding value (A_{T}) is of interest because it determines total response to selection. An animal’s A_{T} consists of a direct and indirect component:
$$A_{\mathrm{T},i}=A_{\mathrm{D},i}+\left(n-1\right)A_{\mathrm{I},i} \qquad (2)$$
where A_{D} is expressed in the phenotype of the animal itself and A_{I} is expressed in the phenotype of each group mate.
A pooled record (P^{*}) consists of the individual phenotypes of all group members (k):
$$P^{*}=\sum_{k=1}^{n}P_{k} \qquad (3)$$
It follows from Equations (1) and (3) that, with social interactions, a pooled record consists of the A_{D} and E_{D} of each group member, as well as their A_{I} and E_{I} that are expressed n – 1 times:
$$P^{*}=\sum_{k=1}^{n}\left[A_{\mathrm{D},k}+E_{\mathrm{D},k}+\left(n-1\right)\left(A_{\mathrm{I},k}+E_{\mathrm{I},k}\right)\right] \qquad (4)$$
Because an animal’s A_{D} and A_{I} are expressed in the same pooled record, the direct Z-matrix that links pooled phenotypes to A_{D}’s and the indirect Z-matrix that links pooled phenotypes to A_{I}’s are completely confounded (Appendix A illustrates this with a hypothetical example; Table 8). Consequently, direct and indirect genetic (co)variances and breeding values cannot be estimated from pooled data.
It follows from Equations (2) and (4) that, with social interactions, a pooled record contains the total genetic effect of each group member:
$$P^{*}=\sum_{k=1}^{n}\left[A_{\mathrm{T},k}+E_{\mathrm{D},k}+\left(n-1\right)E_{\mathrm{I},k}\right] \qquad (5)$$
Equation (5) shows strong similarities with:
$$P^{*}=\sum_{k=1}^{n}\left[A_{\mathrm{D},k}+E_{\mathrm{D},k}\right] \qquad (6)$$
which shows the content of a pooled record when social interactions do not occur. Previous studies have shown that pooled data can be used to estimate direct genetic variances (${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{D}}}^{2}$) and direct breeding values for traits that are not affected by social interactions [15–17]. Similarly, pooled data can be used to estimate total genetic variances (${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$) and total breeding values for traits that are affected by social interactions.
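The step from individual phenotypes to a pooled record expressed in total breeding values can be checked numerically. The sketch below is only an illustration: the function name, the unit variances and the seed are assumptions, not part of the study. It builds one group, sums the individual phenotypes, and compares the result with the sum of A_{T} plus the environmental terms.

```python
import random


def pooled_record_two_ways(n, seed=0):
    """Return the pooled record of one group of n animals, computed
    (a) by summing individual phenotypes and
    (b) from total breeding values plus environmental terms."""
    rng = random.Random(seed)
    # Illustrative effects drawn with unit variance (an assumption).
    a_d = [rng.gauss(0, 1) for _ in range(n)]
    a_i = [rng.gauss(0, 1) for _ in range(n)]
    e_d = [rng.gauss(0, 1) for _ in range(n)]
    e_i = [rng.gauss(0, 1) for _ in range(n)]

    # Individual phenotype: own direct effects plus the indirect
    # effects of the n - 1 group mates.
    phenotypes = [
        a_d[i] + e_d[i] + sum(a_i[j] + e_i[j] for j in range(n) if j != i)
        for i in range(n)
    ]
    pooled_from_phenotypes = sum(phenotypes)

    # Total breeding value: A_T = A_D + (n - 1) * A_I.
    a_t = [a_d[k] + (n - 1) * a_i[k] for k in range(n)]
    pooled_from_totals = sum(
        a_t[k] + e_d[k] + (n - 1) * e_i[k] for k in range(n)
    )
    return pooled_from_phenotypes, pooled_from_totals
```

Both return values agree up to floating-point rounding for any group size, which is why a pooled record carries information on A_{T} but not on A_{D} and A_{I} separately.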
Optimal group composition
In this section, the standard error (s.e.) of ${\widehat{\mathrm{\sigma}}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$ is derived for three experimental designs that differ in group composition, i.e. group members belong to one, two or n families. The s.e. of an estimate of the genetic variance depends on the between-family variance $\left({\mathrm{\sigma}}_{\mathit{b}}^{2}\right)$ and within-family variance $\left({\mathrm{\sigma}}_{\mathit{w}}^{2}\right)$, the relatedness within a family (r), the number of families (N), and the number of records per family (m) [18]:
$$\mathrm{s.e.}\left({\widehat{\sigma}}_{A_{\mathrm{T}}}^{2}\right)=\frac{1}{r}\sqrt{\frac{2}{N-1}\left(\sigma_{b}^{4}+\frac{2\sigma_{b}^{2}\sigma_{w}^{2}}{m}+\frac{\sigma_{w}^{4}}{m\left(m-1\right)}\right)} \qquad (7)$$
Analysis of variance was used to derive ${\mathrm{\sigma}}_{\mathit{b}}^{2}$ and ${\mathrm{\sigma}}_{\mathit{w}}^{2}$ for each design (see Appendix B for derivation).
The s.e. of ${\widehat{\mathrm{\sigma}}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$ differs between experimental designs because the group composition changes the within-family variance and the number of records per family (Table 2). On the one hand, the within-family variance decreases when the number of families per group decreases, causing a strong decrease in s.e. On the other hand, the number of records per family decreases when the number of families per group decreases, causing a slight increase in s.e. Overall, to obtain the most accurate estimate of ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$, group members should belong to the same family. The only exception is when family size (o) equals group size (n). In this case, there is only one record per family and ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$ would not be estimable.
Ideally, group members should be full sibs rather than half sibs, since an increase in relatedness causes a decrease in the s.e. of ${\widehat{\mathrm{\sigma}}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$.
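The trade-off between within-family variance and records per family can be made concrete with a small calculation. The sketch below rests on several assumptions: it uses the usual large-sample s.e. of a between-family variance component from a balanced one-way analysis of variance, it writes the Appendix B variances for one and two families per group in a single general form with f families per group, and it extends that form to f = n. The function and argument names are hypothetical.

```python
import math


def predicted_se(var_a_total, p_term, r, n, n_families, family_size,
                 families_per_group):
    """Predicted s.e. of the estimated total genetic variance.

    p_term is sigma2_PD + 2(n-1)*sigma_PDI + (n-1)^2 * sigma2_PI, where
    P_D and P_I are taken to be the direct and indirect phenotypic
    effects (genetic plus environmental), an assumption of this sketch.
    """
    f = families_per_group
    sibs_per_group = n / f               # members of one family in a group
    m = family_size / sibs_per_group     # pooled records per family
    if m <= 1:
        raise ValueError("more than one pooled record per family is needed")
    var_b = r * var_a_total              # between-family variance
    # Variance of the rescaled record (f / n) * P*.
    var_z = (f ** 2 / n) * (p_term + (sibs_per_group - 1) * r * var_a_total)
    var_w = var_z - var_b                # within-family variance
    inner = (var_b ** 2 + 2 * var_b * var_w / m
             + var_w ** 2 / (m * (m - 1)))
    return math.sqrt(2 / (n_families - 1) * inner) / r
```

With the simulation settings of this study and zero covariances (total genetic variance 10, `p_term` 30, full sibs with r = 0.5, n = 4, 500 families of 12), the sketch gives roughly 0.92, 1.30 and 1.88 for one, two and four families per group, so the s.e. is about twice as large with four families as with one.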
Simulation
To validate the theoretical derivations, a simulation study was conducted in R v2.12.2 [19]. A base population of 500 sires and 500 dams was simulated. Each animal in the base population was assigned a direct and indirect breeding value, drawn from $\mathit{N}\left(\left[\phantom{\rule{0.25em}{0ex}}\begin{array}{c}\hfill 0\hfill \\ \hfill 0\hfill \end{array}\right],\phantom{\rule{0.5em}{0ex}}\left[\begin{array}{cc}\hfill {\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{D}}}^{2}\hfill & \hfill {\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{DI}}}\hfill \\ \hfill {\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{DI}}}\hfill & \hfill {\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{I}}}^{2}\hfill \end{array}\right]\right)$. The ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{D}}}^{2}$ and ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{I}}}^{2}$ were set to 1.00, and ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{DI}}}$ was set to −0.50, 0.00 or 0.50. Each sire was randomly mated to a single dam, resulting in 12 offspring per mating for a total of 6000 simulated offspring. 
For each offspring, direct and indirect breeding values were obtained as: ${\mathit{A}}_{\mathrm{D}}=\frac{1}{2}{\mathit{A}}_{{\mathrm{D}}_{\mathrm{S}}}+\frac{1}{2}\phantom{\rule{0.25em}{0ex}}{\mathit{A}}_{{\mathrm{D}}_{\mathrm{D}}}+\mathit{M}{\mathit{S}}_{\mathrm{D}}$ and ${\mathit{A}}_{\mathrm{I}}=\frac{1}{2}{\mathit{A}}_{{\mathrm{I}}_{\mathrm{S}}}+\frac{1}{2}\phantom{\rule{0.25em}{0ex}}{\mathit{A}}_{{\mathrm{I}}_{\mathrm{D}}}+\mathit{M}{\mathit{S}}_{\mathrm{I}}$, where the direct and indirect Mendelian sampling terms were drawn from $\mathit{N}\left(\left[\phantom{\rule{0.25em}{0ex}}\begin{array}{c}\hfill 0\hfill \\ \hfill 0\hfill \end{array}\right],\phantom{\rule{0.5em}{0ex}}\frac{1}{2}\left[\begin{array}{cc}\hfill {\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{D}}}^{2}\hfill & \hfill {\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{DI}}}\hfill \\ \hfill {\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{DI}}}\hfill & \hfill {\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{I}}}^{2}\hfill \end{array}\right]\right)$. Each offspring was also assigned a direct and indirect environmental value, drawn from $N\left(\left[\begin{array}{c}\hfill 0\hfill \\ \hfill 0\hfill \end{array}\right],\phantom{\rule{0.5em}{0ex}}\left[\begin{array}{cc}\hfill {\mathrm{\sigma}}_{{E}_{\mathrm{D}}}^{2}\hfill & \hfill {\mathrm{\sigma}}_{{E}_{\mathrm{DI}}}\hfill \\ \hfill {\mathrm{\sigma}}_{{E}_{\mathrm{DI}}}\hfill & \hfill {\mathrm{\sigma}}_{{E}_{\mathrm{I}}}^{2}\hfill \end{array}\right]\right)$. The ${\mathrm{\sigma}}_{{\mathit{E}}_{\mathrm{D}}}^{2}$ and ${\mathrm{\sigma}}_{{\mathit{E}}_{\mathrm{I}}}^{2}$ were set to 2.00, and ${\mathrm{\sigma}}_{{\mathit{E}}_{\mathrm{DI}}}$ was set to −1.00, 0.00 or 1.00. Animals were placed in groups of four. Depending on the scenario, group members belonged to one, two or four families. Individual phenotypes were obtained by summing the direct and indirect genetic and environmental components according to Equation (1). 
Pooled records were obtained by summing individual phenotypes according to Equation (3). Seven scenarios were simulated, which differed in ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{DI}}}$, ${\mathrm{\sigma}}_{{\mathit{E}}_{\mathrm{DI}}}$ or group composition (Table 3). For each scenario, 100 replicates were produced.
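The simulation was run in R; the Python sketch below only mirrors the steps described above for the design in which all group members are full sibs. It is not the authors' code, and the function names and the seed are assumptions. Parents and Mendelian sampling terms are drawn from bivariate normal distributions through a hand-written 2 x 2 Cholesky factor, so only the standard library is needed.

```python
import math
import random


def draw_pair(rng, var_1, var_2, cov):
    """Draw one pair from a zero-mean bivariate normal distribution."""
    l11 = math.sqrt(var_1)
    l21 = cov / l11
    l22 = math.sqrt(var_2 - l21 ** 2)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return l11 * z1, l21 * z1 + l22 * z2


def simulate_pooled_records(n_families=500, family_size=12, n=4,
                            var_ad=1.0, var_ai=1.0, cov_adi=0.0,
                            var_ed=2.0, var_ei=2.0, cov_edi=0.0, seed=1):
    """Return (family, pooled record) pairs for groups of n full sibs."""
    rng = random.Random(seed)
    pooled = []
    for fam in range(n_families):
        sire = draw_pair(rng, var_ad, var_ai, cov_adi)
        dam = draw_pair(rng, var_ad, var_ai, cov_adi)
        offspring = []
        for _ in range(family_size):
            # Mendelian sampling terms carry half the genetic (co)variance.
            ms = draw_pair(rng, 0.5 * var_ad, 0.5 * var_ai, 0.5 * cov_adi)
            a_d = 0.5 * sire[0] + 0.5 * dam[0] + ms[0]
            a_i = 0.5 * sire[1] + 0.5 * dam[1] + ms[1]
            e_d, e_i = draw_pair(rng, var_ed, var_ei, cov_edi)
            offspring.append((a_d, a_i, e_d, e_i))
        for start in range(0, family_size, n):
            group = offspring[start:start + n]
            # Pooled record: direct effects once, indirect effects n - 1 times.
            record = sum(a_d + e_d + (n - 1) * (a_i + e_i)
                         for a_d, a_i, e_d, e_i in group)
            pooled.append((fam, record))
    return pooled
```

With the default settings this produces 1500 pooled records, three per family. Designs with two or four families per group would need a different assignment of offspring to groups, which is left out here.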
Based on the previous section, expectations are that the use of a direct–indirect animal model for pooled data will fail to differentiate between direct and indirect genetic effects, while the use of a traditional animal model for pooled data will yield estimates of ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$. To validate these theoretical predictions, both models were run. First, the simulated pooled records were analysed with the following direct–indirect animal model in ASReml v3.0 [20]:
$$\mathbf{y}^{*}=\boldsymbol{\mu}^{*}+\mathbf{Z}_{\mathrm{D}}^{*}\mathbf{a}_{\mathrm{D}}+\mathbf{Z}_{\mathrm{I}}^{*}\mathbf{a}_{\mathrm{I}}+\mathbf{e}^{*} \qquad (8)$$
where y^{*} is a vector that contains pooled records (P^{*}); μ^{*} is a vector that contains the pooled mean; $\phantom{\rule{0.25em}{0ex}}{\mathbf{Z}}_{\mathrm{D}}^{*}$ is an incidence matrix linking the pooled records to A_{D}’s (each pooled record was linked to the A_{D}’s of the four group members); a_{D} is a vector that contains A_{D}’s; ${\mathbf{Z}}_{\mathrm{I}}^{*}$ is an incidence matrix linking the pooled records to A_{I}’s (each pooled record was linked to the A_{I}’s of the four group members); a_{I} is a vector that contains A_{I}’s; and e^{*} is a vector that contains residuals. Second, the simulated pooled records were analysed with the following traditional animal model in ASReml v3.0 [20]:
$$\mathbf{y}^{*}=\boldsymbol{\mu}^{*}+\mathbf{Z}^{*}\mathbf{a}+\mathbf{e}^{*} \qquad (9)$$
where y^{*}, μ^{*} and e^{*} are as explained above; Z^{*} is an incidence matrix linking the pooled records to A’s (each pooled record was linked to the A’s of the four group members); and a is a vector that contains A’s.
Based on the previous section, expectations are that the most accurate prediction of ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$ will be obtained when group members belong to the same family. To validate this theoretical prediction, the predicted s.e. of ${\widehat{\mathrm{\sigma}}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$ was compared to (i) the standard deviation (s.d.) of 100 estimates of ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$ (${\widehat{\mathrm{\sigma}}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$’s reported by ASReml) and (ii) the mean of 100 s.e.’s of ${\widehat{\mathrm{\sigma}}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$ (s.e.’s reported by ASReml) for three group compositions (scenarios 1, 6 and 7 of Table 3).
Data analyses
The dataset was part of the pre-existing database of Hendrix Genetics (The Netherlands) and contained routinely collected data for breeding value estimation. Animal Care and Use Committee approval was therefore not required.
To validate the theoretical derivations and to gain insight into response to selection, individual and pooled data on survival in purebred laying hens (Gallus gallus) were analysed. Survival in group-housed laying hens is a well-known example of a trait affected by social interactions, since a bird’s chance to survive depends on the feather pecking and cannibalistic behaviour of its group mates. Ellen et al. [5] used individual survival data on three purebred lines to estimate direct and indirect genetic (co)variances. Large and statistically significant indirect genetic effects were found in two out of three purebred lines. In the current study, we used data from the same two lines. Data were provided by the “Institut de Sélection Animale B.V.”, the layer breeding division of Hendrix Genetics. Data on 13 192 White Leghorn layers were provided, of which 6276 were of line W1 and 6916 were of line WB.
At the age of 17 weeks, the hens were placed in two laying houses. The laying houses consisted of four or five double rows, and each row consisted of three levels. Interaction with neighbours on the back of the cage was possible, but interaction with neighbours on the side was prevented. Four hens of the same purebred line were randomly assigned to each cage. Hens were not beak-trimmed. Further details on housing conditions and management are in Ellen et al. [5].
The individual phenotype was defined as the number of days from the start of the laying period until either death or the end of the experiment, with a maximum of 398 days. The individual phenotypes were summed per cage to obtain pooled records. If one individual phenotype was missing, the entire cage was omitted from the analysis. The final dataset contained records on 6092 W1 and 6852 WB hens.
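The pooling rule described above can be sketched as follows. The data layout (pairs of cage identifier and survival days, with `None` for a missing phenotype) and the function name are assumptions made for illustration. The sketch also treats a cage with fewer than n records as incomplete, which the text does not state explicitly.

```python
def pool_by_cage(records, n=4):
    """Sum survival days per cage and drop cages with a missing phenotype.

    records: iterable of (cage_id, survival_days or None).
    Returns a dict mapping cage_id to the pooled record.
    """
    by_cage = {}
    for cage_id, days in records:
        by_cage.setdefault(cage_id, []).append(days)

    pooled = {}
    for cage_id, values in by_cage.items():
        # Assumption: fewer than n records counts as a missing phenotype.
        if len(values) != n or any(v is None for v in values):
            continue  # omit the entire cage
        pooled[cage_id] = sum(values)
    return pooled
```

For example, a cage with survival times of 398, 120, 398 and 250 days gives a pooled record of 1166 days, whereas a cage with one missing value is dropped.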
To obtain the direct, indirect and total genetic parameters for survival time, the individual phenotypes were analysed with the following direct–indirect animal model in ASReml v3.0 [20]:
$$\mathbf{y}=\mathbf{Xb}+\mathbf{Z}_{\mathrm{D}}\mathbf{a}_{\mathrm{D}}+\mathbf{Z}_{\mathrm{I}}\mathbf{a}_{\mathrm{I}}+\mathbf{V}\,\mathbf{cage}+\mathbf{e} \qquad (10)$$
where y is a vector that contains individual phenotypes; X is an incidence matrix linking the individual phenotypes to fixed effects; b is a vector that contains fixed effects, which included an interaction term for each laying house by row by level combination, an effect for the content of the back cage (full/empty) and a covariate for the average number of survival days in the back cage; Z_{D} is an incidence matrix linking the individual phenotypes to A_{D}’s; a_{D} is a vector that contains A_{D}’s; Z_{I} is an incidence matrix linking the individual phenotypes to A_{I}’s; a_{I} is a vector that contains A_{I}’s; V is an incidence matrix linking the individual phenotypes to random cage effects; cage is a vector that contains random cage effects (to account for the non-genetic covariance among phenotypes of cage members [21]); and e is a vector that contains residuals. This model yields estimates of ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{D}}}^{2}$, ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{DI}}}$ and ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{I}}}^{2}$, from which ${\widehat{\mathrm{\sigma}}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$ can be calculated. Similarly, it yields estimates of A_{D}’s and A_{I}’s, from which ${\widehat{A}}_{\mathrm{T}}$’s can be calculated. To improve a trait, animals should be selected based on their ${\widehat{A}}_{\mathrm{T}}$, since ${\mathrm{\sigma}}_{{A}_{\mathrm{T}}}^{2}$ determines a population’s potential to respond to selection.
Alternatively, a traditional animal model can be used to analyse individual or pooled data. A traditional animal model on individual data only yields estimates of ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{D}}}^{2}$ and A_{D}’s. A traditional model on pooled data is expected to yield estimates of ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{T}}}^{2}$ and A_{T}’s, but not of ${\mathrm{\sigma}}_{{\mathit{A}}_{\mathrm{D}}}^{2}$ and A_{D}’s. To validate this theoretical prediction, these traditional models were also run. First, the individual phenotypes were analysed with the following traditional (direct) animal model in ASReml v3.0 [20]:
$$\mathbf{y}=\mathbf{Xb}+\mathbf{Z}_{\mathrm{D}}\mathbf{a}_{\mathrm{D}}+\mathbf{V}\,\mathbf{cage}+\mathbf{e} \qquad (11)$$
where y, X, b, Z_{D}, a_{D}, V, cage and e are as explained above. Second, the pooled records were analysed with the following traditional animal model in ASReml v3.0 [20]:
$$\mathbf{y}^{*}=\mathbf{X}^{*}\mathbf{b}^{*}+\mathbf{Z}^{*}\mathbf{a}+\mathbf{e}^{*} \qquad (12)$$
where y^{*} is a vector that contains pooled records (P^{*}); X^{*} is an incidence matrix linking the pooled records to fixed effects; b^{*} is a vector that contains fixed effects (the same fixed effects as mentioned above); Z^{*} is an incidence matrix linking the pooled records to A’s (each pooled record was linked to the A’s of the four group members); a is a vector that contains A’s; and e^{*} is a vector that contains residuals.
The estimated variance components and breeding values of all three models were compared. In addition, we calculated the loss in response to selection that would occur when applying a traditional model to individual or pooled data instead of a direct–indirect model to individual data. The direct–indirect model applied to individual data yielded estimates of ${\mathrm{\sigma}}_{{A}_{\mathrm{T}}}^{2}$ and A_{T}’s. Based on their ${\widehat{A}}_{\mathrm{T}}$, 250 animals were selected and the corresponding response to selection was calculated. Similarly, for the two traditional animal models, 250 animals were selected based on their ${\widehat{A}}_{\mathrm{D}}$ (obtained from individual data) and $\widehat{A}$ (obtained from pooled data). Once the top 250 animals were selected, their ${\widehat{A}}_{\mathrm{T}}$ (obtained from individual data) was used to calculate the total response to selection. Then, the loss in total response to selection was calculated.
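The comparison of selection strategies can be sketched as below. The text does not spell out how the response was computed, so this sketch assumes that response is the mean reference breeding value of the selected animals, expressed as a deviation from the mean of all candidates. The function names and the toy numbers in the usage note are hypothetical.

```python
def mean_of_selected(selection_criterion, reference_ebv, n_selected):
    """Mean reference EBV of the animals ranked highest on the criterion."""
    order = sorted(range(len(selection_criterion)),
                   key=lambda i: selection_criterion[i], reverse=True)
    top = order[:n_selected]
    return sum(reference_ebv[i] for i in top) / n_selected


def loss_in_response(selection_criterion, reference_ebv, n_selected):
    """Relative loss in response versus selecting on the reference EBV."""
    population_mean = sum(reference_ebv) / len(reference_ebv)
    best = (mean_of_selected(reference_ebv, reference_ebv, n_selected)
            - population_mean)
    achieved = (mean_of_selected(selection_criterion, reference_ebv,
                                 n_selected) - population_mean)
    return 1.0 - achieved / best
```

In a toy case with reference values `[5, 4, 3, 2, 1, 0]` and criterion `[5, 0, 4, 3, 2, 1]`, selecting two animals picks the first and third, so the loss is 0.25. In the study, the reference would be the estimated total breeding values from the direct–indirect model on individual data, and 250 animals would be selected.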
Results and discussion
Simulation
The direct–indirect animal model on pooled records failed to converge, confirming that direct and indirect (co)variances cannot be estimated from pooled data. The traditional animal model on pooled records yielded estimates of ${\mathrm{\sigma}}_{A}^{2}$ and ${\mathrm{\sigma}}_{{E}^{*}}^{2}$. These estimates did not differ significantly from the true ${\mathrm{\sigma}}_{{A}_{\mathrm{T}}}^{2}$ and ${\mathrm{\sigma}}_{{E}^{*}}^{2}$ (Table 4), where
$$\sigma_{A_{\mathrm{T}}}^{2}=\sigma_{A_{\mathrm{D}}}^{2}+2\left(n-1\right)\sigma_{A_{\mathrm{DI}}}+\left(n-1\right)^{2}\sigma_{A_{\mathrm{I}}}^{2} \qquad (13)$$
(derived by [14]) and
$$\sigma_{E^{*}}^{2}=n\left[\sigma_{E_{\mathrm{D}}}^{2}+2\left(n-1\right)\sigma_{E_{\mathrm{DI}}}+\left(n-1\right)^{2}\sigma_{E_{\mathrm{I}}}^{2}\right] \qquad (14)$$
(analogous to [17]).
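As a worked illustration, the true values implied by the simulation settings can be computed by hand. The block below assumes that the total genetic variance is the direct variance plus 2(n − 1) times the direct–indirect covariance plus (n − 1)² times the indirect variance, and that the pooled residual variance has the form quoted for Equation (14) in the text. It uses n = 4 and the simulated input values.

```latex
\begin{aligned}
\sigma_{A_T}^{2} &= \sigma_{A_D}^{2} + 2(n-1)\,\sigma_{A_{DI}} + (n-1)^{2}\sigma_{A_I}^{2}
                  = 1 + 6\,\sigma_{A_{DI}} + 9 \\
                 &= 7,\; 10,\; 13
                  \quad \text{for } \sigma_{A_{DI}} = -0.50,\; 0.00,\; 0.50, \\[4pt]
\sigma_{E^{*}}^{2} &= n\left[\sigma_{E_D}^{2} + 2(n-1)\,\sigma_{E_{DI}} + (n-1)^{2}\sigma_{E_I}^{2}\right]
                    = 4\left[2 + 6\,\sigma_{E_{DI}} + 18\right] \\
                   &= 56,\; 80,\; 104
                    \quad \text{for } \sigma_{E_{DI}} = -1.00,\; 0.00,\; 1.00 .
\end{aligned}
```

The covariance terms carry a large weight, so with groups of four a modest direct–indirect covariance shifts the total genetic variance by several units.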
Based on Equation (7), the s.e. of ${\widehat{\mathrm{\sigma}}}_{{A}_{\mathrm{T}}}^{2}$ was predicted for three scenarios that differed in group composition, i.e. group members belonged to one, two or four families. The theoretical s.e. of ${\widehat{\mathrm{\sigma}}}_{{A}_{\mathrm{T}}}^{2}$ was compared to (i) the s.d. of 100 estimates of ${\mathrm{\sigma}}_{{A}_{\mathrm{T}}}^{2}$ (${\widehat{\mathrm{\sigma}}}_{{A}_{\mathrm{T}}}^{2}$’s reported by ASReml) and (ii) the mean of 100 s.e.’s of ${\widehat{\mathrm{\sigma}}}_{{A}_{\mathrm{T}}}^{2}$ (s.e.’s reported by ASReml) (Table 5). The theoretical s.e. of ${\widehat{\mathrm{\sigma}}}_{{A}_{\mathrm{T}}}^{2}$ did not differ significantly from the values obtained by simulation. Moreover, as predicted, the most accurate estimate of ${\mathrm{\sigma}}_{{A}_{\mathrm{T}}}^{2}$ was obtained when group members belonged to the same family. In comparison, the s.e. of ${\widehat{\mathrm{\sigma}}}_{{A}_{\mathrm{T}}}^{2}$ was twice as large when group members belonged to different families. This indicates that group composition is crucial when aiming to obtain accurate estimates.
Data analyses
Table 6 shows the estimated variance components for individual survival data analysed with a direct–indirect animal model, and the estimated variance components for individual and pooled survival data analysed with a traditional animal model. The direct–indirect animal model on individual data yielded estimates of ${\mathrm{\sigma}}_{{A}_{\mathrm{D}}}^{2}$, ${\mathrm{\sigma}}_{{A}_{\mathrm{DI}}}$ and ${\mathrm{\sigma}}_{{A}_{\mathrm{I}}}^{2}$. Based on these components, ${\widehat{\mathrm{\sigma}}}_{{A}_{\mathrm{T}}}^{2}$ was calculated (according to Equation (13)). The traditional animal model on individual data yielded estimates of ${\mathrm{\sigma}}_{{A}_{\mathrm{D}}}^{2}$. The traditional animal model on pooled data yielded estimates of ${\mathrm{\sigma}}_{A}^{2}$ that closely resembled the estimates of ${\mathrm{\sigma}}_{{A}_{\mathrm{T}}}^{2}$ from individual data. The direct–indirect animal model on individual data also yielded estimates of ${\mathrm{\sigma}}_{\mathit{Cage}}^{2}$ and ${\mathrm{\sigma}}_{E}^{2}$. As derived by Bergsma et al. [21], ${\widehat{\mathrm{\sigma}}}_{\mathit{Cage}}^{2}$ is an estimate of $2{\mathrm{\sigma}}_{{E}_{\mathrm{DI}}}+\left(n-2\right){\mathrm{\sigma}}_{{E}_{\mathrm{I}}}^{2}$. As derived by Bijma [22], ${\widehat{\mathrm{\sigma}}}_{E}^{2}$ is an estimate of ${\mathrm{\sigma}}_{{E}_{\mathrm{D}}}^{2}-2{\mathrm{\sigma}}_{{E}_{\mathrm{DI}}}+{\mathrm{\sigma}}_{{E}_{\mathrm{I}}}^{2}$. As shown in Equation (14), ${\widehat{\mathrm{\sigma}}}_{{E}^{*}}^{2}$ is an estimate of $n\left[{\mathrm{\sigma}}_{{E}_{\mathrm{D}}}^{2}+2\left(n-1\right){\mathrm{\sigma}}_{{E}_{\mathrm{DI}}}+{\left(n-1\right)}^{2}{\mathrm{\sigma}}_{{E}_{\mathrm{I}}}^{2}\right]$.
Consequently, the ${\widehat{\mathrm{\sigma}}}_{\mathit{Cage}}^{2}$ and ${\widehat{\mathrm{\sigma}}}_{E}^{2}$ from the direct–indirect animal model on individual data should sum to the ${\widehat{\mathrm{\sigma}}}_{{E}^{*}}^{2}$ from the traditional animal model on pooled data. More precisely:
$${\widehat{\sigma}}_{E^{*}}^{2}=n^{2}\,{\widehat{\sigma}}_{\mathit{Cage}}^{2}+n\,{\widehat{\sigma}}_{E}^{2} \qquad (15)$$
The expected ${\widehat{\mathrm{\sigma}}}_{{E}^{*}}^{2}$, calculated based on the ${\widehat{\mathrm{\sigma}}}_{\mathit{Cage}}^{2}$ and ${\widehat{\mathrm{\sigma}}}_{E}^{2}$ from the direct–indirect animal model on individual data, and the ${\widehat{\mathrm{\sigma}}}_{{E}^{*}}^{2}$ from the traditional animal model on pooled data closely resembled each other.
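The link between the cage and residual variances of the individual model and the pooled residual variance can be verified numerically. The sketch below assumes that a pooled record sums n individual records that share one cage effect, so the cage variance enters n² times and the residual variance n times. The function names are hypothetical; the expressions for the cage and residual variances are those quoted above from [21] and [22].

```python
def pooled_error_variance_direct(var_ed, cov_edi, var_ei, n):
    """Pooled residual variance written in the underlying components."""
    return n * (var_ed + 2 * (n - 1) * cov_edi + (n - 1) ** 2 * var_ei)


def pooled_error_variance_from_individual_model(var_ed, cov_edi, var_ei, n):
    """Pooled residual variance rebuilt from cage and residual variances."""
    var_cage = 2 * cov_edi + (n - 2) * var_ei   # cage variance, as in [21]
    var_e = var_ed - 2 * cov_edi + var_ei       # residual variance, as in [22]
    # The shared cage effect is summed n times within a pooled record,
    # so its variance is multiplied by n squared.
    return n ** 2 * var_cage + n * var_e
```

Both functions return the same value for any input, for example 104 for environmental variances of 2.00, a covariance of 1.00 and n = 4.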
Table 6 does not show heritability estimates. Whereas the classical heritability (h^{2}) expresses ${\mathrm{\sigma}}_{{A}_{\mathrm{D}}}^{2}$ relative to the phenotypic variance (${\mathrm{\sigma}}_{P}^{2}$), T^{2} expresses ${\mathrm{\sigma}}_{{A}_{\mathrm{T}}}^{2}$ relative to ${\mathrm{\sigma}}_{P}^{2}$ [21]. Comparing values of T^{2} obtained from individual and pooled data would be misleading because they are not expected to be similar. Unlike for a trait that is not affected by social interactions, ${\mathrm{\sigma}}_{{P}^{*}}^{2}$ cannot simply be divided by the number of group members to obtain ${\mathrm{\sigma}}_{P}^{2}$. When group members are unrelated,
$$\sigma_{P}^{2}=\sigma_{A_{\mathrm{D}}}^{2}+\sigma_{E_{\mathrm{D}}}^{2}+\left(n-1\right)\left(\sigma_{A_{\mathrm{I}}}^{2}+\sigma_{E_{\mathrm{I}}}^{2}\right) \qquad (16)$$
and
$$\sigma_{P^{*}}^{2}=n\left[\sigma_{A_{\mathrm{T}}}^{2}+\sigma_{E_{\mathrm{D}}}^{2}+2\left(n-1\right)\sigma_{E_{\mathrm{DI}}}+\left(n-1\right)^{2}\sigma_{E_{\mathrm{I}}}^{2}\right] \qquad (17)$$
Because the phenotypic variance does not increase in proportion to the number of group members when records are pooled, values of T^{2} obtained from individual and pooled data cannot be compared meaningfully.
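A small worked example may help to show why the pooled phenotypic variance is not simply n times the individual phenotypic variance. It assumes unrelated group members, n = 4, the simulated variances (1.00 for both genetic variances, 2.00 for both environmental variances) and zero direct–indirect covariances. The individual variance follows from one animal's own effects plus those of its n − 1 group mates; the pooled variance follows from n total genetic effects plus the pooled residual.

```latex
\begin{aligned}
\sigma_{P}^{2}     &= \sigma_{A_D}^{2} + \sigma_{E_D}^{2} + (n-1)\left(\sigma_{A_I}^{2} + \sigma_{E_I}^{2}\right)
                    = 1 + 2 + 3\,(1 + 2) = 12, \\
\sigma_{P^{*}}^{2} &= n\,\sigma_{A_T}^{2} + \sigma_{E^{*}}^{2}
                    = 4 \times 10 + 80 = 120, \\
\sigma_{P^{*}}^{2} / n &= 30 \neq 12 = \sigma_{P}^{2}.
\end{aligned}
```

Without social interactions (both indirect variances equal to zero), the same calculation gives an individual variance of 3 and a pooled variance of 12, so dividing by n recovers the individual variance exactly.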
In conclusion, when group members are unrelated, a traditional animal model on individual data yields estimates of ${\mathrm{\sigma}}_{{A}_{\mathrm{D}}}^{2}$, while a traditional animal model on pooled data yields estimates of ${\mathrm{\sigma}}_{{A}_{\mathrm{T}}}^{2}$. Moreover, the estimated cage and error variances from a direct–indirect animal model on individual data combine, as in Equation (15), to the pooled error variance from a traditional animal model on pooled data. This result could explain the ‘inconsistencies’ found by Biscarini et al. [17], who assumed that a traditional animal model on individual and pooled data should yield the same genetic variance. Moreover, Biscarini et al. [17] expected to find a pooled error variance that is four times larger than the individual error variance. For body weight at the age of 19 and 27 weeks, these expectations were met. For body weight at the age of 43 and 51 weeks, however, the genetic variance estimated from pooled data was smaller than expected, while the pooled error variance was larger than expected. Biscarini et al. [17] mention the emergence of competition effects as a possible cause. We indeed expect that indirect genetic effects would be found if the individual data on body weight at the age of 43 and 51 weeks were reanalysed with a direct–indirect animal model. Using Equations (13) and (15), the variance components estimated from individual data would then be expected to resemble those estimated from pooled data.
The regression coefficients of ${\widehat{A}}_{\mathrm{D}}$’s obtained from individual data on the $\widehat{A}$’s obtained from pooled data strongly deviated from one (0.363 ± 0.006 for W1; 0.392 ± 0.010 for WB). The regression coefficients of ${\widehat{A}}_{\mathrm{T}}$’s obtained from individual data on the $\widehat{A}$’s obtained from pooled data were close to, and not significantly different from, one (1.004 ± 0.003 for W1; 1.001 ± 0.001 for WB). This indicates that the $\widehat{A}$’s obtained from pooled data are unbiased estimates of the ${\widehat{A}}_{\mathrm{T}}$’s obtained from individual data.
Table 7 shows Spearman correlation coefficients between ${\widehat{A}}_{\mathrm{D}}$’s and ${\widehat{A}}_{\mathrm{T}}$’s obtained from individual data and the $\widehat{A}$’s obtained from pooled data. The Spearman correlation coefficients between the ${\widehat{A}}_{\mathrm{T}}$’s obtained from individual data and the $\widehat{A}$’s obtained from pooled data were close to, but significantly different from, one. This indicates only a minor loss in the accuracy of ${\widehat{A}}_{\mathrm{T}}$’s when using pooled instead of individual data, which will be reflected in a minor loss in response to selection when using pooled instead of individual data.
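The two comparisons used here, a regression coefficient and a Spearman rank correlation between two sets of estimated breeding values, can be sketched with the standard library alone. The function names are hypothetical, ties receive their average rank, and the sketch does not compute the standard errors reported in the text.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def slope(y, x):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

A slope close to one of the individual-data estimates on the pooled-data estimates indicates that the two are on the same scale, while the rank correlation indicates how similarly the two sets would rank selection candidates.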
To gain more insight, we calculated the loss in response to selection that occurs when applying a traditional model to individual or pooled data instead of a direct–indirect model to individual data. When applying a traditional model to individual data, the loss in total response to selection was 46.9% for W1 (Figure 1A) and 54.9% for WB (Figure 1C). When applying a traditional model to pooled data, the loss in total response to selection was 3.3% for W1 (Figure 1B) and 0.3% for WB (Figure 1D). In conclusion, the loss in total response to selection will be large when using a traditional animal model on individual data, but will be small when using a traditional animal model on pooled data. However, this outcome may be specific to this dataset. Survival in purebred laying hens was recorded in cages with four unrelated birds. Both direct and indirect genetic effects strongly influenced the trait. Group size, group composition, and the relative impact of direct and indirect genetic effects might influence the loss in total response to selection. For example, for body weight at 19 and 27 weeks of age, indirect genetic effects are expected to be small. In that case, an animal’s A_{T} is mainly expressed in the phenotype of the animal itself. Consequently, we expect that more accurate estimated breeding values can be obtained when using individual instead of pooled data. Biscarini et al. [17] found a correlation of ~ 0.75 between the estimated breeding values based on individual and pooled data, resulting in a large loss in response to selection when using pooled instead of individual data. Thus, using pooled data does not always seem to be a proper alternative and requires further research.
Conclusions
Using pooled data, the total genetic variance and breeding values can be estimated, but the underlying direct and indirect genetic (co)variances and breeding values cannot. The most accurate estimates are obtained when group members belong to the same family. While quantifying the direct and indirect genetic effects is interesting from a biological perspective, obtaining the total genetic effect is most important from an animal breeding perspective. When it is too difficult or expensive to obtain individual data, pooled data can be used to improve traits.
Appendix A
This section demonstrates why direct and indirect (co)variances can be estimated from individual data, but cannot be estimated from pooled data.
Consider a situation where four base parents produce six offspring. Animals are kept in groups of two and individual phenotypes are recorded on all six offspring (Table 8).
When analysing individual data with a direct–indirect animal model, the Z-matrices would be:
Z_{D} and Z_{I} are not identical, indicating that the direct and indirect genetic effects are estimated based on different information sources, enabling the model to distinguish between these two effects.
When analysing pooled data with a direct–indirect animal model, the Z-matrices would be:
${\mathbf{Z}}_{\mathbf{D}}^{*}$ and ${\mathbf{Z}}_{\mathbf{I}}^{*}$ are identical, indicating that the direct and indirect genetic effects are estimated based on the same information source, causing complete confounding between direct and indirect genetic effects. The model will not be able to distinguish between these two effects.
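The confounding can be reproduced with a small hypothetical layout. The sketch below does not reproduce Table 8: it assumes six offspring numbered 0 to 5, kept in three groups of two, and it builds incidence matrices with columns for the offspring only (base parents are left out). The function name is an assumption.

```python
def incidence_matrices(groups, n_animals):
    """Build direct and indirect incidence matrices for individual
    records (one row per animal) and pooled records (one row per group)."""
    z_d = [[0] * n_animals for _ in range(n_animals)]
    z_i = [[0] * n_animals for _ in range(n_animals)]
    z_d_pooled = [[0] * n_animals for _ in groups]
    z_i_pooled = [[0] * n_animals for _ in groups]
    for g, members in enumerate(groups):
        for animal in members:
            z_d[animal][animal] = 1           # own direct effect
            z_d_pooled[g][animal] += 1        # enters the pooled record once
            for mate in members:
                if mate != animal:
                    z_i[animal][mate] = 1     # indirect effect of a group mate
                    z_i_pooled[g][mate] += 1  # enters via each mate's phenotype
    return z_d, z_i, z_d_pooled, z_i_pooled
```

For `groups = [(0, 1), (2, 3), (4, 5)]`, the individual-record matrices differ (the direct matrix is an identity matrix, the indirect matrix links each animal to its group mate), whereas the two pooled-record matrices are identical. With larger groups this construction gives a pooled indirect matrix equal to n − 1 times the pooled direct matrix, which is still fully confounded.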
Appendix B
Components of variance are determined by analysis of variance, where the full variance $\left({\mathrm{\sigma}}_{z}^{2}\right)$ is partitioned into a between-family component $\left({\mathrm{\sigma}}_{b}^{2}\right)$ and a within-family component $\left({\mathrm{\sigma}}_{w}^{2}\right)$. In this section, the derivations of ${\mathrm{\sigma}}_{z}^{2}$, ${\mathrm{\sigma}}_{b}^{2}$ and ${\mathrm{\sigma}}_{w}^{2}$ are presented for three group compositions.

(i)
When the group is composed of only one family, the $A_{\mathrm{T}}$ of a family is expressed $n$ times in the same pooled record. Therefore, the record of interest is $P^{*}/n$.
$$\begin{array}{l}\sigma_{z}^{2}=\frac{\sigma_{P^{*}}^{2}}{n^{2}}=\frac{n\left(\sigma_{P_{\mathrm{D}}}^{2}+2\left(n-1\right)\sigma_{P_{\mathrm{DI}}}+{\left(n-1\right)}^{2}\sigma_{P_{\mathrm{I}}}^{2}\right)}{n^{2}}\hfill \\ \phantom{\sigma_{z}^{2}=}+\frac{n\left(n-1\right)r\left(\sigma_{A_{\mathrm{D}}}^{2}+2\left(n-1\right)\sigma_{A_{\mathrm{DI}}}+{\left(n-1\right)}^{2}\sigma_{A_{\mathrm{I}}}^{2}\right)}{n^{2}}\hfill \\ \phantom{\sigma_{z}^{2}}=\frac{1}{n}\left(\sigma_{P_{\mathrm{D}}}^{2}+2\left(n-1\right)\sigma_{P_{\mathrm{DI}}}+{\left(n-1\right)}^{2}\sigma_{P_{\mathrm{I}}}^{2}+\left(n-1\right)r\,\sigma_{A_{\mathrm{T}}}^{2}\right)\hfill \\ \sigma_{b}^{2}=r\,\sigma_{A_{\mathrm{T}}}^{2}\hfill \\ \sigma_{w}^{2}=\frac{1}{n}\left(\sigma_{P_{\mathrm{D}}}^{2}+2\left(n-1\right)\sigma_{P_{\mathrm{DI}}}+{\left(n-1\right)}^{2}\sigma_{P_{\mathrm{I}}}^{2}+\left(n-1\right)r\,\sigma_{A_{\mathrm{T}}}^{2}\right)-r\,\sigma_{A_{\mathrm{T}}}^{2}\hfill \end{array}$$
(ii)
When the group is composed of two families, the $A_{\mathrm{T}}$ of a family is expressed $n/2$ times in the same pooled record. Therefore, the record of interest is $2P^{*}/n$.
$$\begin{array}{l}\sigma_{z}^{2}=\frac{4\sigma_{P^{*}}^{2}}{n^{2}}=\frac{4n\left(\sigma_{P_{\mathrm{D}}}^{2}+2\left(n-1\right)\sigma_{P_{\mathrm{DI}}}+{\left(n-1\right)}^{2}\sigma_{P_{\mathrm{I}}}^{2}\right)}{n^{2}}\hfill \\ \phantom{\sigma_{z}^{2}=}+\frac{4n\left(\frac{n}{2}-1\right)r\left(\sigma_{A_{\mathrm{D}}}^{2}+2\left(n-1\right)\sigma_{A_{\mathrm{DI}}}+{\left(n-1\right)}^{2}\sigma_{A_{\mathrm{I}}}^{2}\right)}{n^{2}}\hfill \\ \phantom{\sigma_{z}^{2}}=\frac{4}{n}\left(\sigma_{P_{\mathrm{D}}}^{2}+2\left(n-1\right)\sigma_{P_{\mathrm{DI}}}+{\left(n-1\right)}^{2}\sigma_{P_{\mathrm{I}}}^{2}+\left(\frac{n}{2}-1\right)r\,\sigma_{A_{\mathrm{T}}}^{2}\right)\hfill \\ \sigma_{b}^{2}=r\,\sigma_{A_{\mathrm{T}}}^{2}\hfill \\ \sigma_{w}^{2}=\frac{4}{n}\left(\sigma_{P_{\mathrm{D}}}^{2}+2\left(n-1\right)\sigma_{P_{\mathrm{DI}}}+{\left(n-1\right)}^{2}\sigma_{P_{\mathrm{I}}}^{2}+\left(\frac{n}{2}-1\right)r\,\sigma_{A_{\mathrm{T}}}^{2}\right)-r\,\sigma_{A_{\mathrm{T}}}^{2}\hfill \end{array}$$
(iii)
When the group composition is random, the $A_{\mathrm{T}}$ of a family is only expressed once per pooled record. Therefore, the record of interest is $P^{*}$.
$$\begin{array}{l}\sigma_{z}^{2}=\sigma_{P^{*}}^{2}=n\left(\sigma_{P_{\mathrm{D}}}^{2}+2\left(n-1\right)\sigma_{P_{\mathrm{DI}}}+{\left(n-1\right)}^{2}\sigma_{P_{\mathrm{I}}}^{2}\right)\hfill \\ \sigma_{b}^{2}=r\,\sigma_{A_{\mathrm{T}}}^{2}\hfill \\ \sigma_{w}^{2}=n\left(\sigma_{P_{\mathrm{D}}}^{2}+2\left(n-1\right)\sigma_{P_{\mathrm{DI}}}+{\left(n-1\right)}^{2}\sigma_{P_{\mathrm{I}}}^{2}\right)-r\,\sigma_{A_{\mathrm{T}}}^{2}\hfill \end{array}$$
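The three sets of expressions can be compared numerically with a short sketch. The function names and all parameter values below (group size, relatedness and the phenotypic and genetic (co)variances) are hypothetical and chosen only for illustration; they are not estimates from this study. The within-family component is obtained as the full variance minus the between-family component.

```python
def social_sum(var_d, cov_di, var_i, n):
    """Return var_D + 2(n-1)cov_DI + (n-1)^2 var_I for group size n."""
    return var_d + 2 * (n - 1) * cov_di + (n - 1) ** 2 * var_i


def variance_components(composition, n, r, phen, gen):
    """Return (var_z, var_b, var_w) for one group composition.

    phen and gen are (var_D, cov_DI, var_I) tuples holding the phenotypic
    and additive genetic (co)variances; r is the relatedness within family.
    """
    var_p_term = social_sum(*phen, n)   # bracketed phenotypic term
    var_at = social_sum(*gen, n)        # total genetic variance
    if composition == "one_family":
        var_z = (var_p_term + (n - 1) * r * var_at) / n
    elif composition == "two_families":
        var_z = 4 * (var_p_term + (n / 2 - 1) * r * var_at) / n
    elif composition == "random":
        var_z = n * var_p_term
    else:
        raise ValueError(f"unknown composition: {composition}")
    var_b = r * var_at
    return var_z, var_b, var_z - var_b


# Hypothetical inputs, for illustration only.
N, R = 4, 0.5
PHEN = (1.0, 0.0, 0.1)
GEN = (0.3, 0.0, 0.03)

results = {
    name: variance_components(name, N, R, PHEN, GEN)
    for name in ("one_family", "two_families", "random")
}
for name, (var_z, var_b, var_w) in results.items():
    print(f"{name}: var_z={var_z:.4f} var_b={var_b:.4f} var_w={var_w:.4f}")
```

Under these assumed inputs the between-family component is the same for all three compositions, while the within-family component is smallest when the group consists of one family and largest when the group composition is random.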
Acknowledgements
This research was financially supported by the Netherlands Organization for Scientific Research (NWO) and coordinated by the Dutch Technology Foundation (STW). The Institut de Sélection Animale B.V., a Hendrix Genetics Company, provided the data and was closely involved in this research. The authors would like to thank Ewa Sell-Kubiak, Naomi Duijvesteijn and Sophie Eaglen for their valuable input.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
KP, EDE and PB participated in the design of the study. KP conducted the study. KP, EDE and PB wrote the paper. PB was the principal supervisor of the study. All authors read and approved the manuscript.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Keywords
 Pool Data
 Individual Data
 Group Composition
 Group Mate
 Total Genetic Variance