Genetic heterogeneity of residual variance in broiler chickens

Aims were to estimate the extent of genetic heterogeneity in environmental variance. Data comprised 99 535 records of 35-day body weights from broiler chickens reared in a controlled environment. Residual variance within dam families was estimated using ASREML, after fitting fixed effects such as genetic groups and hatches, for each of 377 genetically contemporary sires with a large number of progeny (> 100 males or females each). Residual variance was computed separately for male and female offspring, and after correction for sampling, strong evidence for heterogeneity was found, the standard deviation between sires in within variance amounting to 15–18% of its mean. Reanalysis using log-transformed data gave similar results, and elimination of 2–3% of outlier data reduced the heterogeneity but it was still over 10%. The correlation between estimates for males and females was low, however. The correlation between sire effects on progeny mean and residual variance for body weight was small and negative (-0.1). Using a data set bigger than any yet presented and on a trait measurable in both sexes, this study has shown evidence for heterogeneity in the residual variance, which could not be explained by segregation of major genes unless very few determined the trait.


INTRODUCTION
In the simplest terms, phenotypic value is the sum of the genotypic value and a residual error contributed by random environmental or other non-genetic influences, such as developmental noise. A multivariate normal distribution with homogeneous variances and independent normally distributed effects is often assumed [3,15]. Thus, the assumption is made that the residual variance is the same for any given genotype [10,22]. As environmental variance cannot usually be estimated directly, but only from the variation within families, genetic and non-genetic causes are confounded. Under the infinitesimal model, the genetic variance within families is the same in each family, so assuming that the residual variance is homogeneous, is equivalent to assuming that the environmental variance and the genetic variance within families are homogeneous. There can, however, be substantial heterogeneity of phenotypic variances amongst environmental groupings such as milk yield in dairy herds [1,19]. In a few studies, the objective has been to quantify heterogeneity of variance and explore the extent to which heterogeneity of environmental or residual variance plays a role [4,5,12,13,21]. Foulley and Quaas [4] discuss the estimation of source and magnitude of effect of heterogeneous variances within linear mixed models. There is evidence to suggest that residual heterogeneity is under genetic control, based on analysis of data on fecundity of sheep by SanCristobal-Gaudy et al. [23], who used a log linear model to test the significance of additive genetic influence on the residual variance. Earlier and more direct evidence for genetically structured adaptation or environmental sensitivity is found in studies of genetic assimilation [27] canalisation [18] and environmental plasticity [26].
Variation in environmental sensitivity under genetic control might be explained by the assumption that genotypes at individual loci differ not only in their effect upon the mean but also on the variance of the trait [7,10,23]. Consequently, simple directional selection on the trait would change environmental variability [10]. Furthermore, in production systems the aim is frequently to minimise variation in order to achieve product consistency, which could be achieved if there is a genetic component to variation in environmental variation. More recent evidence for a genetic component includes litter size in pigs [24], body weight of snails [20] and bristle number in Drosophila [16]. Sorensen and Waagepetersen [24] use data on pig litter size to compare four Bayesian models varying in complexity in the residual variance structure in terms of goodness of fit, and retrospective performance for the prediction of response to selection. They conclude that there is very strong evidence for a genetically structured residual heterogeneity. Furthermore reproductive traits are usually not normally distributed and are measurable in only one sex.
A simpler approach has been proposed of using data on animals in the same location, and estimating the variance among large half sib families in the variance within families [8]. The additive genetic variation in the environmental variance can then be estimated on the assumption that the genetic variance within families is constant. This method is transparent, albeit less efficient than a full analysis via model fitting, and requiring some assumptions, particularly of homogeneity of genetic segregation variance. It is basically an extension of an earlier application to detect heterogeneity in phenotypic variance amongst dairy herds [1]. The data analysed in this report are highly suitable for such a study. They are on juvenile body weight in broiler chickens, a continuous trait that is recorded on all birds of both sexes and is approximately normally distributed. Sire family sizes used were large and records of many families were available. The data were from a commercial breeding programme with selection inter alia on body weight, with genetically contemporary birds reared together.

Data
The data provided by Aviagen Ltd. comprised 176 973 records of body weights at 35-days of age for a line of broiler chickens recorded over a three year period. Birds were pedigree breeding stock, with all birds in each weekly hatch reared together in a controlled environment. Potential parents were selected each week and allocated to mating groups, which commenced at three-weekly intervals. Sires and dams were allocated to the mating group that commenced closest to them becoming a set age. A mating group typically comprised about 8 sires, each mated to about 10 dams in a nested structure, for a standard number of hatch weeks. Progeny from different mating groups were reared together, with individuals randomly assigned to pens. Thus progeny born in a particular hatch from a particular mating group come from contemporaneous parents (important since selection was practised for body weight inducing a genetic trend) and shared the same environment.
Data were extracted on sire families which had records of body weight on at least 100 progeny of one sex and 50 of the other, a criterion met by 377 sire families. The resulting data set comprised a total of 99 535 records, 47 730 males and 51 805 females, spread over 50 mating groups spanning 160 hatch weeks (Tab. I).

Outline
Data on males and females were analysed separately within contemporaneous mating groups to render the model as parsimonious as possible and to avoid the need to adjust for differences in mean or variance due to sex and for trend effects over time. All progeny contributed by a mating group and hatching in a particular hatch week were treated equally, hence the environment within hatch week was assumed constant. In order to correct for sampling errors, observations within families were assumed to be normally distributed.
To allow for possible scale effects and log-normal distribution of body weight data, all data were reanalysed after log transformation. To allow for atypical observations, e.g. sick birds, data were reanalysed after excluding outliers.

Basic analysis
The within family residual variance for offspring of each sire was estimated using a linear mixed model in separate analyses for each sex to correct for known fixed environmental and random genetic effects. The model was where y ghi jk is the bodyweight of an individual bird, µ is the overall mean for the sex being analysed, m g is the mating group effect, H gh is the hatch week within mating group effect, S gi is the sire within mating group effect (variance σ 2 s ), D gi j is the dam within sire effect (variance σ 2 d ), and e ghi jk is the residual within dam family effect, allowing for hatch, with variance σ 2 gi assumed to be dependent on sire within mating group. The mean and hatch weeks were fitted as fixed effects and, because parents in a mating group were contemporaries, dam age was confounded with hatch week and thus was not fitted separately. The effects of sires and of dams nested within sires were assumed random. Analyses were undertaken by restricted maximum likelihood (REML) within each mating group using the ASREML package [6], such that residual variances could be estimated separately for each sire family.

Estimation of heterogeneity of within family residual variance
As with the REML analysis using model (1), subsequent calculations were conducted independently on data from each sex. Let s 2 gi denote the REML estimate of σ 2 gi , (the variance within sire family i of mating group g), and let x gi = ln(s 2 gi ). Following Ibanez et al. [12] based upon Foulley et al. [5] this transformation simplifies the analysis of heterogeneity of variance because it removes the relationship between the variance and its sampling variance. Assuming independent normality of observations within dam families within sires, the s 2 gi are proportional to chi-square variates, and the sampling variance of s 2 gi equals 2σ 4 gi /d gi , where there are d gi degrees of freedom (df) used to estimate the variance (i.e. the number of progeny less the number of dam families, ignoring the few df lost for fitting hatch effects across mating groups). If the variances within sires have constant value V W , then var(s 2 gi ) = 2V 2 W /d gi , and var(x gi ) = 2/d gi approximately.
If the variances within sires are homogeneous, the quantity is approximately χ 2 distributed with f g − 1 df, where f g is the number of sire families in mating group g and x g.
If the variances within sire (σ 2 gi ) are heterogeneous, with mean V W and variance V B , then we assume (2) we obtain the expectation of the sum of squares for group g approximately the total df within dams within sires in mating group g. A useful reference for illustrating heterogeneity is the corresponding mean square, and under the assumption of homogeneity, E(Z g ) = 1. This was used as a preliminary test for heterogeneity within individual mating groups. Summing over mating groups, X = g X g , and equating observation to expectation from (4) gives an estimate, CV(V W ) was estimated as √γ 2 . The average variance within sires, V W , and subsequently the variance between sires in within sire variance, V B , were estimated by pooling s 2 gi over mating groups, and V(V W ) was estimated asγ 2 V 2 W . The above procedure is much more straightforward than that suggested by Hill [8] which was based on analysis of s 2 gi to estimate V B rather than on ln(s 2 gi ) to estimate the CV(V W ) as here, and gives similar estimates.
For the data included in the analysis from the 50 mating groups, the number of degrees of freedom between sires ( f g − 1) ranged from 2 to 10, with 1, 1, 7, and 5 mating groups having 2, 3, 4 and 5 df, respectively, and the remaining 36 having 6 − 10 df. The total df between sires within mating groups was 327. The values ofd g (approx. within sire df) ranged from 90.2 to 151.7 for data on male progeny and from 49.9 (group 50) to 145.2 for females. In only 5 mating groups for males and 3 for females wasd g < 100. The means of thed g (weighted by f g − 1 as in the final analysis) were 127.8 for males and 116.9 for females (Tab. II).
Visual inspection of untransformed data on bodyweights showed no obvious signs of non-normality. However skewness and kurtosis coefficients calculated within each hatch group within mating group provided clear evidence of non-normality, with a substantial proportion of hatch groups showing negative skewness and positive kurtosis. On closer scrutiny much of the data was found to have outliers in the lower tail of the distributions.
Analyses were therefore repeated on a trimmed version of the data (Tab. II) after the exclusion of outliers, defined for each hatch group as observations falling outside the inner fences of the standard 'box and whisker' plot [25]. The box extends from the first to third quartile, and the inner fences are placed 1.5 box lengths from the edges of the box. For normally distributed data, the box length is 1.35 SD, and the fences are 2.7 SD either side of the mean. The resulting data sets were 1.9% and 2.4% smaller for males and females respectively, and the meand g was 112.9 and 121.9 for males and females respectively.

Additive genetic variance of residual variance and other components
The additive genetic variance in within family variance was estimated as V AV = 4V(V W ) because one-quarter of the additive variance is between sire families under the assumption that sires and dams are unrelated. The conventional additive genetic variance, V AM , was estimated as four times the variance of sire family effects for bodyweight, S gi , from (1), pooled over mating groups (direct from the REML analysis). The covariance (×4) of S gi and the variance within mating groups was calculated and similarly pooled over mating groups to obtain an estimate of cov AMV , the additive genetic covariance of mean and within family variance [10]. These estimates are not unbiased, but provide a reference point.

Description of data
Mean bodyweights and residual variances of progeny are given for each mating group in Figure 1 and overall in Table I (computed as the mean, weighted by df for sires in each mating group). There is a substantial increase in mean body weight, presumably genetic change due to selection as shown by previous analyses in nucleus broiler populations [17]. There is also a trend over time in the residual variances after fitting model 1, with those for males increasing more than for females. The change in mean is, however, greater than that in phenotypic standard deviation; hence the coefficient of residual variation (residual SD/mean) from fitted regression on mating group number fell over the three years, from about 6.8% to 6.4% in males and from about 6.2% to 5.4% in females.

Heterogeneity amongst sires in residual variance
To illustrate the heterogeneity of variance among sires in variance within sires, values of Z g (see (5)) for each mating group and sex are given in Figure 2. The quantity has expected value 1 in the absence of heterogeneity, which is shown as a reference line in the graphs. The preliminary test for heterogeneity (following Eq. (2)) showed that out of the 50 mating groups, there was  substantial difference, with only 10 and 13 mating groups significantly heterogeneous, although overall the heterogeneity remained very highly significant (P < 0.001 in both sexes).
Estimates of the coefficient of variation between sires in the within dam within sire variance, CV(V W ) for both untransformed and transformed data are given in Table II. As anticipated from Figure 2, these are substantial, 15% for both males and females on the untransformed data. The standard error of these estimates, obtained avoiding distributional assumptions by bootstrap sampling of mating groups and thus of X g values (Eq. (2)), is approximately 1.5%.
There was not a strong scale effect as indicated by a mean-variance relation (cf. Fig. 1). Consequently reanalysis of the data after log transformation gave a very similar pattern (not shown) to the heterogeneities for each mating group as shown for untransformed data in Figure 2, and rather higher overall estimates of CV(V W ), 18% for males and 19% for females. Trimming of the data to remove outliers led to a reduction in the levels of heterogeneity, and CV(V W ) fell to 10% in males and 11% in females (Tab. II).

Relation between sire progeny group mean and residual variance
In Figure 3, the uncorrected estimate of the deviation from its respective mating group mean of the residual variance for each sire, s 2 gi , is plotted against the sire effect on the mean (from the REML analysis based on model (1), also expressed as a deviation from mating group mean). Table III. Summary of estimates of parameters for untransformed and untrimmed data: population mean (mean), phenotypic variance (V P ), mean within family variance (V w ), variance between sires of within family variance V vw, proportion of variance between sires explained by within sire variation CV(V w ), additive genetic variances in mean performance (V AM ), and residual variance (V AV ), and their covariance (cov AMV ) and correlation (corr AMV ). Weights are in units of decagrams (1dg = 10g). Estimates are given in Table III for the additive genetic variance of sire effects upon the mean (V AM ), the equivalent for the residual variance (V AV ) obtained as 4V(V W ) = 4[CV(V W ) × V W ] 2 (from Tab. II, i.e. for untransformed, untrimmed data) and their covariance (cov AMV ). There is a small negative correlation between the sire effects on mean and variance for both sexes (Fig. 3, Tab. III).

Relation between estimates from male and female progeny
The association between sire effects of male and female progeny for mean body weights and for residual variances is shown in Figure 4. When calculated within mating groups and pooled, weighted by df for sire families, the correlations of effects between the sexes were 0.74 for means and 0.11 for residual variances. Outliers were evenly distributed across sires therefore trimming had a small effect on many sires rather than a large effect on a few. Consequently trimming of outliers increased the correlations marginally, to 0.75 and 0.18 for means and residual variances respectively.

Evidence for genetic heterogeneity of residual variance
Evidence of heterogeneity was based upon the assumption that the body weights were approximately normally distributed and consequently that the residual variances would be chi-squared distributed. The observed variance amongst sires in residual (within dam family) variance was significantly greater than expected. Furthermore, 37 and 39 out of 50 estimates for males and females were greater than expectation (illustrated in Fig. 2). The magnitude of the heterogeneity is most simply expressed as the coefficient of variation of the residual variances (after they were corrected for sampling). These CVs were approximately the same for males and females (15%, respectively). Estimates were slightly increased by log transformation of the data (Tab. II) indicating that they were not a consequence of a simple scale relationship of means and standard deviations. Inspection of Figure 2 does, however, show that particularly in males the magnitude of the heterogeneity tended to increase in later mating groups. This is redressed to some extent by removal of outliers within hatch within mating group although this fails to reduce CV(V W ) by more than 1/3. It is moot, however, as to what is the appropriate level of trimming of the data: the aim was to remove particularly aberrant birds, which for example had grown very poorly perhaps as a consequence of some disease challenge or other unidentified sources, and which caused substantial skewness in the data. But trimming is also likely to reduce estimates of real heterogeneity, because obviously the sire families in which individuals are most variable are most likely to have individuals excluded. Thus the estimated CV(V W ) of around 10% after trimming is probably biased downwards.
There were, however, some contra-indications of genetic variation in residual variance. The correlation between sires of estimates of residual variance from male and female progeny was very low, in contrast to that on the mean (Fig. 4). The limited numbers in each sire family would only partly account for this low correlation, although it might have been obscured during pooling with only some mating groups showing heterogeneity.

Possible non-genetic causes of heterogeneity
Other possible causes of the heterogeneity have to be eliminated before it can be attributed to a genetic effect upon the environmental variance. These might be a breakdown in statistical assumptions, confounding environmental heterogeneity, or confounding genetic heterogeneity.
Outliers in the original body weight data caused both skewness and kurtosis. However significant heterogeneity remained after removal of outliers, and after log transformation. Our original analysis can therefore be regarded as fairly robust. One reason for this is that family sizes were large, such that the precise distribution of the estimates of within sire variance does not become critical. Assuming normality, simulation [8] shows that the estimates are not substantially biased with large families such as these.
Environmental heterogeneity could arise if, for example, sires' offspring were reared together and separate from those of others. That was not the case within mating groups. Nor does it seem feasible that any covariance could arise during the mating or hatching process, as each sire had several female mates with progeny reared over several hatches. Covariances among members of dam families contribute only a fraction to the sire component when, as here there are several dams mated to each sire. Further, these data came from a breeder taking care to minimise environmental confounding of family members in order to maximise accuracy of selection. Differential effects of competition within or between progeny groups could lead to heterogeneity of variance, but that seems better regarded as a real factor that has been properly included rather than a source of bias.

Possible genetic causes of heterogeneity
The residual or within dam-family variance comprises the environmental variance and one-half of the additive and three-quarters or more of dominance and epistatic genetic variance components. A gene with large effects upon the mean segregating in the population could be a source of heterogeneity. Indeed heterogeneity of within family variance has been used as a test for the segregation of major genes such as those found in QTL studies (see review by Hill and Knott [9]), albeit not a powerful one. Although there is evidence of quantitative trait loci (QTL) segregating within families even after many generations of selection, genes of such large effect would rapidly be fixed, and have not been observed in segregating populations [2].
A recent review of QTL analysis in chicken [11] reports the majority of QTL for body weight are additive with effects between 0.3 and 1.0 phenotypic standard deviations. Such genome scans for QTL affecting growth suggest at least 6-8 mostly additive genes of large effect. De Koning et al. [2] do identify one QTL for body weight segregating in a different line of commercial broiler chickens that contributes approximately 24% of the within family variance. We have no information as to whether the QTL was additive, and also De Koning et al. point out that estimates of the proportions of variance removed by QTL are biased upwards.
In the Appendix, the possible contribution due to segregation of genes of large effects is considered. For intermediate gene frequencies, the coefficient of variation of within family variance relative to the heritability (CV(Vw)/h 2 ) is of order 1/(2 √ n) for n dominant loci and 1/(4 √ n) for n additive loci. The impact is therefore greater for dominant than additive loci, and is approximately inversely proportional to the square root of the number of loci. In the present example, h 2 ∼ 0.17 and CV(V W ) ∼ 0.15, so segregation of additive genes of large effect would be unlikely to contribute a large part of this heterogeneity. Indeed although an additive gene with an effect of one standard deviation or 21 dg could potentially contribute 12% of heterogeneity between sires, the additive genetic variance would be more than 4 times that observed. The observed CV(Vw)/h 2 of 0.88 suggests that there would need to be one or two dominant loci segregating at low frequencies controlling the trait.
Furthermore, such genetic segregation could not explain the low correlation of the heterogeneity between male and female progeny shown in Figure 4 which, apart from sex linked genes (unlikely, as the sire is homogametic) or sex limited genes (unlikely, as the correlation of mean weights in males and females is high) would be expected to contribute similarly to males and females.

Evidence for environmental sensitivity
Strong evidence for genetic variation in environmental variance also comes from direct analysis by Mackay and Lyman [16] of variance in bristle number among isogenic lines of Drosophila derived from natural populations, and in farmed species from more formal analyses than that presented here incorporating Bayesian models including genetically structured environmental variation for data on litter size of sheep [23], litter size of pigs [24] and body weight of snails [20]. Under the assumption that genetic segregation is unlikely to be the cause, the heterogeneity can be attributed to differences in response to environmental conditions. Both plasticity and canalisation can be explained by environmental sensitivity under genetic control, where variation or stability of expression are selected for directly when fitness is related to the ability to adapt or to control phenotypic expression respectively [26]. Where it is important to have constancy over environments, such as homeostasis of temperature control, genotypes with low variability will be fittest; whereas if environment is constantly changing, adaptation will bring fitness benefits for a range of phenotypic expression.
Genes that affect the mean and the variance also perhaps explain why the coefficient of variation in many traits lies within a narrow range across species and why there is often a mean variance relationship in quantitative traits. There is also evidence of greater environmental sensitivity in inbred compared to outbred populations [14]. This could be explained by homozygotes having a greater effect upon the variance by fixing additive alleles that increase variance.

Prediction of response to selection
A small negative correlation between sire effects on the mean and the variance was observed in this study, indicating little association between gene effects on mean and variance. In previous studies, a stronger negative correlation (−0.62) was found by Sorensen and Waagepetersen [24] and positive correlations of 0.19 and 0.8 by SanCristobal-Gaudy et al. [23] and Ros et al. [20], respectively, although partly explicable as a scale effect.
If there is genetic variation in residual variance, intense directional selection for a trait can lead to increases in both residual and phenotypic variance because individuals of more variable genotypes are more likely to be selected [7,10]. A negative covariance of genetic effects on mean and variance implies that the change in variance would be lessened or even reversed. Selection in this population used information on individuals, full and half sib families for multiple traits, and so it is not possible to make precise predictions of changes in variance. Simple calculations based on the model of Hill and Zhang [10] and assuming intense mass selection on body weight indicate that the negative correlation would be too small to prevent an increase in variance, but family selection puts less pressure on the performance of extreme animals and would further dilute the effect.
If genetic variation in environmental variance can be incorporated into current selection tools, this could be of particular interest in threshold traits such as fecundity [23] and also where homogeneity of product is desirable, or even serve to harness a further source of variation.

CONCLUSIONS
A sire effect amounting to up to 15% of residual variance is supported by evidence from recent studies [20,23,24]. These involved very different approaches yet the magnitude of the effects found were similar, with CV(V W ) ranging from 0.11 to 0.29 when scaled by residual variance. The present data set is bigger than any yet presented, the first to examine a trait in both sexes, and, furthermore, the method has the benefit of transparency in that the magnitude of heterogeneity can be viewed for subsets of the data. It is, however, difficult to explain the low correlation of sire effects for males and females on the variance if the heterogeneity is genetically structured. Previous studies have considered traits measured in one sex or hermaphrodites. In turn the proportion of variance explained is similar for both sexes. What is clear is that the assumption of homogeneous residual variance underpinning many statistical tests such as ANOVA is violated. Although after careful examination of the data the removal of outliers reduced the estimate of heterogeneity by one third this still left 10-11% unaccounted for.
Further work is needed to consider the consequences of heterogeneity in selection programmes. Considerations include the possibility of accounting for and even utilising the genetic component in heterogeneity of residual variance using available estimation tools such as BLUP.

APPENDIX HETEROGENEITY DUE TO GENETIC SEGREGATION VARIANCE
Consider a single locus with two alleles A 1 and A 2 , with frequencies p and q = 1 − p, respectively, and genotypic values A 1 A 1 a, A 1 A 2 d and A 2 A 2 a. Assuming random mating, the variances within full sib families for each mating type are: A 1 A 1 × A 1 A 1 and A 1 A 1 × A 2 A 2 and A 2 A 2 × A 2 A 2 : 0 A 1 A 1 ×A 1 A 2 : (a−d) 2 /4; A 1 A 2 ×A 2 A 2 : (a+d) 2 /4; A 1 A 2 ×A 1 A 2 : a 2 /2+d 2 /4.
The variance between sires in within dam family variance is var(VW) = f i j ν 2 i j − ( f i j ν i j ) 2 . As general formulae in terms of p, q, a and d are messy and uninformative, we give examples. Additive gene action (d = 0): variation within families from segregation from the dam does not depend on the sire's genotype. Then ν 11 = ν 22 = 1/4a 2 , and var(V W ) = 2pq(1 − 2pq)(a 4 /16), simplifying to var(V W ) = a 4 /64 at its maximum when p = 1/2. The heritability contributed by the locus is h 2 = 2pqa 2 /V P , where V P is the phenotypic variance. Assuming for simplicity that the trait is determined by n identical loci, then h 2 = 2npqa 2 /V P and unless the heritability is very high and there is a large environmental correlation of sibs, the mean variance within families approximately equals V P . Hence CV(V W ) ≈ Complete dominance (d = a): there is no variation among progeny of A 1 A 1 sires, so ν 11 = 0, and also ν 12 = q(1 + p/2)a 2 , ν 22 = 2pqa 2 , with mean ν . = pq 2 (3 + q)a 2 . Although var(V W ) = 2pqv 2 12 + q 2 ν 2 22 − ν 2 . has pq 3 as a factor the equation is messy; but for p = 1/2, var(V W ) = (17/256)a 4 ∼ a 4 /16. The additive genetic variance is 8pq 3 a 2 , giving V A = a 2 /2 for p = 1/2. Hence, making the same assumptions as for the additive case, CV(V W )/h 2 ∼ 1/(2 √ n). The segregation variance contributes more with dominance, as expected from the contrast in variances within families of A 1 A 1 vs. A 2 A 2 sires.