# Estimation of breeding values for uniformity of growth in Atlantic salmon (*Salmo salar*) using pedigree relationships or single-step genomic evaluation

- Panya Sae-Lim
^{1}Email author, - Antti Kause
^{2}, - Marie Lillehammer
^{1}and - Han A. Mulder
^{3}

**49**:33

**DOI: **10.1186/s12711-017-0308-3

© The Author(s) 2017

**Received: **29 September 2016

**Accepted: **28 February 2017

**Published: **7 March 2017

## Abstract

### Background

In farmed Atlantic salmon, heritability for uniformity of body weight is low, indicating that the accuracy of estimated breeding values (EBV) may be low. The use of genomic information could be one way to increase accuracy and, hence, obtain greater response to selection. Genomic information can be merged with pedigree information to construct a combined relationship matrix (\({\mathbf{H}}\) matrix) for a single-step genomic evaluation (ssGBLUP), allowing realized relationships of the genotyped animals to be exploited, in addition to numerator pedigree relationships (\({\mathbf{A}}\) matrix). We compared the predictive ability of EBV for uniformity of body weight in Atlantic salmon, when implementing either the \({\mathbf{A}}\) or \({\mathbf{H}}\) matrix in the genetic evaluation. We used double hierarchical generalized linear models (DHGLM) based either on a sire-dam (sire-dam DHGLM) or an animal model (animal DHGLM) for both body weight and its uniformity.

### Results

With the animal DHGLM, the use of \({\mathbf{H}}\) instead of \({\mathbf{A}}\) significantly increased the correlation between the predicted EBV and adjusted phenotypes, which is a measure of predictive ability, for both body weight and its uniformity (41.1 to 78.1%). When log-transformed body weights were used to account for a scale effect, the use of \({\mathbf{H}}\) instead of \({\mathbf{A}}\) produced a small and non-significant increase (1.3 to 13.9%) in predictive ability. The sire-dam DHGLM had lower predictive ability for uniformity compared to the animal DHGLM.

### Conclusions

Use of the combined numerator and genomic relationship matrix (\({\mathbf{H}}\)) significantly increased the predictive ability of EBV for uniformity when using the animal DHGLM for untransformed body weight. The increase was only minor when using log-transformed body weights, which may be due to the lower heritability of scaled uniformity, the lower genetic correlation of transformed body weight with its uniformity compared to the untransformed traits, and the small number of genotyped animals in the reference population. This study shows that ssGBLUP increases the accuracy of EBV for uniformity of body weight and is expected to increase response to selection in uniformity.

## Background

In aquaculture, selection to increase economically important traits such as growth is one of the main breeding goals. However, fish producers show interest to improve not only the mean but also the variance of traits [1]. Uniformity of growth is preferable because more uniform growth allows a more uniform product, harvest of a larger proportion of the population at market size, and reduction of size grading and multiple harvests [2–4]. More uniform growth may also reduce competitive interactions between animals, which contributes to reduce feed monopolization and dominant behaviour, and thus improve well-being of fish [5]. Uniformity is also important for traits that have an intermediate optimal trait value [6], such as fillet lipid%, body shape, and condition factor in the aquaculture industry. A fish whose growth is sensitive to non-measurable environmental factors, known as micro-environments, shows micro-environmental sensitivity, which results in high environmental variance and consequently contributes to increased phenotypic variation, leading to increased size variation within a group of fish. A number of empirical studies in terrestrial and aquatic species show that uniformity is partly determined by genetic factors [4, 7–16]. Thus, selective breeding can open up one avenue to improve uniformity of fish traits.

Atlantic salmon (*Salmo salar* L.) is a farmed fish that is of major economic importance. Heritability for uniformity of body weight has been estimated in Atlantic salmon [14], rainbow trout (*Oncorhynchus mykiss* Walbaum) [4, 8], and Nile tilapia (*Oreochromis niloticus*) [15, 16]. In general, heritability for uniformity (\(h_{v}^{2}\)) is low in livestock and aquaculture species (\(h_{v}^{2}\) < 0.05), indicating that the prediction accuracy of breeding values for uniformity may be low [17, 18]. However, the coefficient of genetic variation (\(GCV\)) of uniformity of body weight is high in fish species (median \(GCV\) = 34.0%: min = 17.4% and max = 64.0%), which indicates high potential for response to selection [4, 8, 14, 16, 19]. One way to increase response to selection for uniformity is to increase the accuracy of estimated breeding values (EBV) for uniformity [20].

In aquaculture, full- and half-sib family sizes are usually large and thus the accuracy of EBV based on full-sibs, half-sibs and own performance is high for body weight, but not for uniformity due to its low heritability [8]. One approach to increase the accuracy of EBV is to use genomic information [21]. With genomic selection, genomic estimated breeding values (GEBV) can be obtained for the selection candidates that are genotyped, even when they have no phenotype records. One reason why genomic selection results in higher accuracy of selection is the more accurate estimation of the Mendelian sampling genetic effects through realized additive genetic relationships among animals [22]. Consequently, individual squared residuals, which is the phenotype for uniformity in a double hierarchical generalized linear model (DHGLM), may also be more accurately estimated when using genomic information.

In many cases, combining numerator pedigree and genomic information in genomic evaluations is implemented in multiple steps, which may introduce bias and need some calculations to combine with pedigree-based EBV [23, 24]. Single step genomic best linear unbiased prediction (ssGBLUP) avoids this, and genomic and pedigree information are combined in one step [23], which may lead to less bias and is less prone to double counting of information compared to genomic evaluation methods that are performed in multiple steps. The ssGBLUP augments the numerator relationship (\({\mathbf{A}}\)) matrix by the genomic relationship (\({\mathbf{G}}\)) matrix in conventional genetic evaluation using BLUP [24]. This combined numerator and genomic relationship matrix is known as the \({\mathbf{H}}\) matrix [25]. In fish breeding, combining pedigree and genomic information allows exploiting the large full- and half-sib families and the more accurate relationships of the genotyped animals, and may yield a higher accuracy of selection for uniformity than the use of the \({\mathbf{A}}\) matrix.

To date, the use of ssGBLUP for uniformity has not been studied. Furthermore, according to a previous study, the sire-dam model, but not the animal model, implemented within the framework of DHGLM provided unbiased (co)variance component estimates [14]. However, an animal DHGLM is expected to perform better than a sire-dam DHGLM for genetic evaluation because the animal DHGLM uses full relationships between animals rather than only among sires and dams. This is particularly important for uniformity, which is quantified by the residuals of individuals, which in the animal model do not contain the Mendelian sampling term. Moreover, for genetic evaluation, the animal DHGLM uses all phenotypic information and, for most breeding programs, at least part of the selection candidates, e.g. females for sex-linked traits, have phenotypes available at the time of selection. Use of the animal DHGLM with ssGBLUP for uniformity has not been tested.

In this study, we implemented ssGBLUP for predicting GEBV for uniformity in Atlantic salmon. Specifically, our aim was to compare the predictive ability of EBV for uniformity of body weight when implementing either BLUP with the \({\mathbf{A}}\) matrix or ssGBLUP with the \({\mathbf{H}}\) matrix. The (co)variance components were estimated from the sire-dam DHGLM with either \({\mathbf{A}}\) or \({\mathbf{H}}\) and compared prior to genetic evaluation.

## Methods

### Data

*P*value cut-off of 10

^{−15}, and (3) their minor allele frequency (MAF) was lower than 0.01. After quality control, 921 of 31,013 SNPs were removed (2.9%) and, thus 30,092 SNPs remained to create the genomic relationship.

Population structure of Atlantic salmon

Population structure | |
---|---|

Sires, dams | 131, 234 |

Sires per dam, mean (range) | 1.0 (1.0) |

Dams per sire, mean (range) | 1.78 (1–3) |

Full-sib families | 234 |

Fish per full-sib family, mean (range) | 15.4 (4–54) |

Number of fish with records | 3595 |

Genotyped animals | |
---|---|

Full-sib families | 50 |

Fish per full-sib family, mean (range) | 28.3 (21–38) |

Number of fish with records | 1416 |

### Relationship matrix

### Statistical analysis

#### Analysis of residuals

Uniformity can be quantified by squared residuals from a BLUP mixed model equation [32]. The use of genomic information to construct realised relationships between animals, especially for full-sibs, is expected to increase the accuracy of residual estimates due to a greater accuracy of EBV for body weight. Therefore, we investigated the effect of ssGBLUP and traditional BLUP on individual residual estimation. Furthermore, we investigated sire-dam and animal models because residual estimates from a sire-dam model contain not only the unexplained environmental effects but also Mendelian sampling genetic effects. Residual estimates from an animal model do not contain the latter when EBV are estimated with an accuracy of 1. In total, residuals from four models were compared, i.e. the sire-dam or animal model with either \({\mathbf{A}}\) or \({\mathbf{H}}\).

*i*th individual, \(\mu\) is the overall mean, \(age\) is the fixed covariate effect due to different levels of age of the fish, calculated from the start feeding date until the date of measurement (day), \(\beta\) is the fixed linear regression coefficient on age, \(t\) is the \(l\)th fixed communal tank effect, \(yc\) is the \(m\)th fixed effect of year class of the parents, \(a_{i}\) is the random additive genetic effect, \(\varvec{a}\sim\,\left[ {0, {\mathbf{A}}\sigma_{a}^{2} } \right]\), where \({\mathbf{A}}\) is the numerator relationship matrix, or \(\varvec{a}\sim\,N\left[ {0, {\mathbf{H}}\sigma_{a}^{2} } \right]\), where \({\mathbf{H}}\) is the combined genomic and pedigree relationship matrix, \(N\) is the normal distribution, and \(\sigma_{a}^{2}\) is the additive genetic variance for body weight, \(c_{n}\) is the random common effect for full-sibs, \(\varvec{c}\sim{\text{N}}\left[ {0, {\mathbf{I}}\sigma_{c}^{2} } \right],\) where \({\mathbf{I}}\) is the identity matrix and \(\sigma_{c}^{2}\) is the common environmental variance of body weight, and \(e_{iklmn}\) is the random residual effect, \(\varvec{e}\sim\,N\left[ {0, {\mathbf{I}}\sigma_{e}^{2} } \right]\), where \(\sigma_{e}^{2}\) is the residual variance of body weight assumed to be homogeneous. For the sire-dam model, the term \(a_{i}\) in Eq. (1) was replaced by the random sire-dam (\(u_{i}\)) effect, \(\varvec{u}\sim\,N\left[ {0, {\mathbf{A}}\sigma_{u}^{2} } \right]\) or \(\varvec{u}\sim\,N\left[ {0,{\mathbf{H}}\sigma_{u}^{2} } \right]\). The same \({\mathbf{A}}\) and \({\mathbf{H}}\) matrices were used for the sire-dam and the animal models.

#### Estimation of genetic parameters for uniformity

To estimate genetic parameters for body weight and its uniformity, the sire-dam DHGLM was used [33, 34] because it is expected to provide unbiased (co)variance components for uniformity [14]. Body weight records were treated in two different ways. First, observed body weight was standardized to a mean of 0 and variance of 1, which facilitates convergence of the DHGLM. Second, we used either the natural log or the Box–Cox transformation to account for possible scale effects, because variances typically increase with increasing trait means [35, 36]. For the Box–Cox transformation, each observation was computed as \(\frac{{y_{i}^{\lambda } - 1}}{\lambda }\), where \(\lambda\) is the transformation parameter, which was estimated based on Eq. (1) without the random effects [37] by maximum likelihood using the MASS package in R software [38]. The estimate of \(\lambda\) was close to 0 (0.076), indicating that the Box–Cox transformation is very similar to log-transformation, which sets \(\lambda\) equal to 0. Therefore, the Box–Cox transformed body weight was not used further. The standardized body weight and natural logarithm body weight are abbreviated as stdWT and lnWT, respectively.

### Calculation of genetic parameters

In the sire-dam DHGLM, the estimated variance for sires was set equal to the estimated genetic variance for dams and equal to one quarter of the additive genetic variance. Hence, the additive genetic variance for body weight (\(\sigma_{a}^{2}\)) and its uniformity (\(\sigma_{{a_{v} ,exp}}^{2}\)) were equal to \(4\sigma_{u}^{2}\) and \(4\sigma_{{u_{v} ,exp}}^{2}\), respectively. Estimates for \(\sigma_{{u_{v} ,exp}}^{2}\) and \(\sigma_{{c_{v} ,exp}}^{2}\) for uniformity of body weight were on the exponential scale (\(exp\)) and were converted to an additive scale (\(\sigma_{{u_{v} }}^{2}\) and \(\sigma_{{c_{v} }}^{2}\)) using the extension of the equations of Mulder et al. [17], as derived by Sae-Lim et al. [8]. The additive genetic variance for uniformity of body weight on the additive scale was equal to \(4\sigma_{{u_{v} }}^{2}\). Phenotypic variance (\(\sigma_{P}^{2}\)) of body weight was equal to \(2\sigma_{u}^{2} + \sigma_{c}^{2} + \sigma_{e}^{2}\), where \(\sigma_{c}^{2}\) is the variance component for the effect common to full-sibs and \(\sigma_{e}^{2}\) is the residual variance of body weight. Heritability for body weight (\(h^{2}\)) was calculated as \(\sigma_{a}^{2} /\sigma_{P}^{2}\). Heritability for uniformity of body weight (\(h_{v}^{2}\)) on the additive scale was calculated as \(\frac{{\sigma_{{a_{v} }}^{2} }}{{2\sigma_{P}^{4} + 3\left( {\sigma_{{a_{v} }}^{2} + \sigma_{{c_{v} }}^{2} } \right)}}\) [8, 40]. Similarly, the common environmental effect was calculated as \(c^{2} = \sigma_{c}^{2} /\sigma_{P}^{2}\) for body weight and as \(c_{v}^{2} =\) \(\frac{{\sigma_{{c_{v} }}^{2} }}{{2\sigma_{P}^{4} + 3\left( {\sigma_{{a_{v} }}^{2} + \sigma_{{c_{v} }}^{2} } \right)}}\) for uniformity of body weight [8]. The genetic coefficient of variation for uniformity of body weight (\(GCV\)) was calculated as \(\sqrt {\sigma_{{a_{v} ,exp}}^{2} }\). Standard errors of \(h_{v}^{2}\) and \(GCV\) were approximated using the equations presented by Mulder et al. [41].

### Genetic evaluation and cross-validation

Two genetic evaluations, i.e., BLUP with \({\mathbf{A}}\) and ssGBLUP with \({\mathbf{H}}\), were performed in a 10-fold cross-validation using the genetic parameters estimated based on the sire-dam DHGLM and !BLUP option in ASReml. In total, four models were used in the 10-fold cross-validation, i.e. animal DHGLM with either \({\mathbf{A}}\) or \({\mathbf{H}}\) on stdWT and lnWT.

- 1.
Adjusted phenotypes for body weight (\(y_{i}^{*}\)) and its uniformity (\(\psi_{i}^{*}\)) were calculated as \(y_{i}^{*} = \hat{a}_{i} + \hat{c}_{i} + \hat{e}_{i}\) and \(\psi_{i}^{*} = \hat{a}_{{v_{i} }} + \hat{c}_{{v_{i} }} + \hat{e}_{{v_{i} }}\), using the solutions from the analysis with Eq. (2) on the full dataset.

- 2.
In a modified dataset, approximately 10% of observed phenotypes (\(y_{i}\)) of animals from each family were masked (=10% of the full dataset). All phenotypes had an equal chance to be masked, but the animals that were masked in the previous fold were not masked again in the next fold.

- 3.
The genetic analysis with Eq. (2) was run on the modified dataset using the \({\mathbf{A}}\) and \({\mathbf{H}}\) matrices and EBV for body weight and its uniformity were predicted for the masked animals.

- 4.For each fold, two measurements were computed:
- (a)
The predictive ability of EBV was calculated as the Pearson correlation of adjusted phenotypes (step 1) with the corresponding EBV (\(\hat{a}^{*}\)) (step 3) for the masked animals that were genotyped, i.e., cor(\(y_{i}^{*}\), \(\hat{a}_{i}^{*}\)) for body weight and cor(\(\psi_{i}^{*}\), \(\hat{a}_{{v_{i} }}^{*}\)) for uniformity. Kendall and Spearman correlations were also calculated for uniformity because \(\psi_{i}^{*}\) was exponentially rather than normally distributed.

- (a)
- (b)
To measure the degree of bias and accuracy of EBV or GEBV of the masked records, the mean square error prediction (MSEP) was calculated as \(\frac{{\mathop \sum \nolimits_{i}^{n} \left( {\hat{a}_{i}^{*} - y_{i}^{*} } \right)^{2} }}{n}\) for body weight and \(\frac{{\mathop \sum \nolimits_{i}^{n} \left( {\hat{a}_{{v_{i} }}^{*} - \psi_{i}^{*} } \right)^{2} }}{n}\) for uniformity of body weight, where \(n\) is the number of masked records in each fold. The MSEP was scaled by the variance of the adjusted phenotypes of the corresponding trait.

- 5.
Steps (1) to (4) were repeated for each of the 10 folds.

Finally, average Pearson, Kendall, and Spearman correlations, MSEP and their standard error (SE) over the 10 folds were calculated. A 95% confidence interval of the difference (\(d\)) in the predictive ability from different models with either \({\mathbf{A}}\) or \({\mathbf{H}}\) was constructed using \(d \pm 1.96 \times {\text{SE}}_{d}\), where the \({\text{SE}}_{d} = \sqrt {\frac{{SD_{animal\,\,DHGLM}^{2} + SD_{{sire{\text{-} }dam \,\,DHGLM}}^{2} }}{number\,\,of\,\, folds}}\). When 0 was not within the 95% confidence interval, the predictive abilities of two models were considered statistically different (*P* < 0.05).

## Results

### Residual estimates

For the sire-dam model, the use of \({\mathbf{H}}\) instead of \({\mathbf{A}}\) did not affect estimated residuals of genotyped animals since the regression coefficient of the estimated residuals using \({\mathbf{H}}\) on the estimated residuals using \({\mathbf{A}}\) and the Pearson correlation between the two were equal to 0.999, which was very similar to the regression coefficient of non-genotyped animals (0.998). The Pearson correlations between estimated residuals using \({\mathbf{H}}\) and \({\mathbf{A}}\) were the same as regression coefficients of estimated residuals using **A** on estimated residuals using **H** for genotyped animals (0.999) and non-genotyped animals (0.998).

In contrast, the use of \({\mathbf{H}}\) in the animal model affected residual estimates of genotyped animals since their distribution was much more scattered (Fig. 1). The slope of estimated residuals using **A** on estimated residuals using **H** was lower than 1 and slightly steeper for genotyped animals (regression coefficient = 0.7025) than for non-genotyped animals (regression coefficient = 0.6798). The Pearson correlations between estimated residuals using \({\mathbf{H}}\) and \({\mathbf{A}}\) were equal to 0.922 for genotyped animals and 0.966 for non-genotyped animals.

When using the sire-dam model, the difference in estimated residuals with \({\mathbf{H}}\) and \({\mathbf{A}}\) was small and ranged from −10.8 to 10.0. When using the animal model, this difference was larger and ranged from −95.3 to 104.5.

### Genetic parameters of body weight and its uniformity

Estimates of variance components and genetic parameters of body weight and its uniformity based on the sire-dam double hierarchical generalized linear model when using pedigree (\({\mathbf{A}}\)) or combined pedigree and genomic relationships (\({\mathbf{H}}\)) and standard or log-transformed phenotypes

Trait/parameter | Standardization | Logarithm | ||
---|---|---|---|---|

\({\mathbf{A}}\) | \({\mathbf{H}}\) | \({\mathbf{A}}\) | \({\mathbf{H}}\) | |

Body weight | ||||

\(\sigma_{P}^{2}\) | 0.843 | 0.856 | 0.131 | 0.132 |

\(\sigma_{a}^{2}\) | 0.216 | 0.243 | 0.043 | 0.046 |

\(\sigma_{c}^{2}\) | 0.095 | 0.091 | 0.013 | 0.014 |

| 0.266 | 0.296 | 0.325 | 0.346 |

| 0.117 | 0.111 | 0.103 | 0.103 |

Uniformity of body weight | ||||

\(\sigma_{{a_{v} ,exp}}^{2}\) | 0.2303 | 0.2732 | 0.0896 | 0.0885 |

\(\sigma_{{a_{v} }}^{2}\) | 0.0612 | 0.0677 | 0.0005 | 0.0005 |

\(\sigma_{{c_{v} }}^{2}\) | 0.0360 | 0.0306 | 0.0000 | 0.0001 |

\(GCV\) | 0.480 | 0.523 | 0.299 | 0.298 |

\(h_{v}^{2}\) | 0.036 | 0.038 | 0.015 | 0.014 |

\(c_{v}^{2}\) | 0.022 | 0.019 | 0.001 | 0.002 |

When using \({\mathbf{A}}\), the estimate of \(h_{v}^{2}\) was higher for uniformity of stdWT (0.036) than for uniformity of lnWT (0.015), while the use of \({\mathbf{H}}\) did not affect the magnitude of \(h_{v}^{2}\) for uniformity. Standard errors of \(h_{v}^{2}\) estimates were, however, high (Table 2).

Although the estimates of \(h_{v}^{2}\) were low, estimates of \(GCV\) were high for uniformity of stdWT (48.0% for \({\mathbf{A}}\) and 52.3% for \({\mathbf{H}}\)), which indicates substantial genetic potential for response to selection. After accounting for scale effects, estimates of \(GCV\) for uniformity of lnWT were reduced to 30% (for both \({\mathbf{A}}\) and \({\mathbf{H}}\)), which supports the existence of genetic variation for uniformity beyond the scale effects.

Estimates of \(c^{2}\) for stdWT and lnWT were moderate and similar for \({\mathbf{A}}\) (0.103 to 0.117) and \({\mathbf{H}}\) (0.103 to 0.111), which suggests that part of the phenotypic variation was explained by non-genetic effects that are common to full-sibs. Instead, the estimates of \(c_{v}^{2}\) for uniformity of stdWT and lnWT were very low and ranged from 0.001 to 0.022 for \({\mathbf{A}}\) and 0.002 to 0.019 for \({\mathbf{H}}\) (Table 2).

The estimate of the genetic correlation between stdWT and its uniformity was close to 1, using either \({\mathbf{A}}\) (0.952) or \({\mathbf{H}}\) (0.951), which shows the high dependency between mean and variance of body weight. However, the estimate of the genetic correlation between lnWT and its uniformity was reduced to −0.093 with \({\mathbf{A}}\) and to 0.024 with \({\mathbf{H}}\), which suggests that after accounting for the scale effects, the mean and variance became independent.

### Cross-validation

The average correlation of adjusted phenotypes with predicted breeding values for stdWT and its uniformity was significantly higher with \({\mathbf{H}}\) (stdWT = 0.443; uniformity = 0.217 to 0.317) than with \({\mathbf{A}}\) (stdWT = 0.372; uniformity = 0.128 to 0.192). However, after accounting for scale effects using log-transformation, the average Pearson, Kendall and Spearman correlations of adjusted phenotypes with predicted breeding values for lnWT and their uniformity were only slightly higher with \({\mathbf{H}}\) than with \({\mathbf{A}}\), and not significantly different from each other (*P* > 0.05).

Average Pearson, Kendall and Spearman correlations and mean square error prediction from a 10-fold cross-validation based on the sire-dam double hierarchical generalized linear model^{a} when using pedigree (\({\mathbf{A}}\)) or combined pedigree and genomic relationships (\({\mathbf{H}}\)) and standard or log-transformed phenotypes

Transformation | Relationship | Body weight | Uniformity of body weight | ||||
---|---|---|---|---|---|---|---|

Pearson | MSEP | Pearson | Kendall | Spearman | MSEP | ||

Standardized | \({\mathbf{A}}\) matrix | 0.372 | 0.724 | 0.192 | 0.128 | 0.178 | 0.625 |

\({\mathbf{H}}\) matrix | 0.443 | 0.682 | 0.271 | 0.217 | 0.317 | 0.608 | |

Logarithm | \({\mathbf{A}}\) matrix | 0.396 | 0.823 | 0.378 | 0.182 | 0.263 | 0.936 |

\({\mathbf{H}}\) matrix | 0.440 | 0.813 | 0.383 | 0.203 | 0.294 | 0.944 |

The predictive ability of EBV of uniformity was sensitive to the type of correlation used, i.e. Pearson, Kendall and Spearman (Table 3). Spearman correlations were 39.1 to 49.0% higher than Kendall correlations. Predictive abilities of EBV and GEBV for uniformity of lnWT differed more from each other based on Kendall and Spearman correlations, albeit not significant at *P* < 0.05, than based on Pearson correlations. However, the SE of Kendall correlations were approximately 50% lower than the SE of Pearson and Spearman correlations, suggesting that Kendall correlations provide a more reliable estimate of predictive ability than Pearson and Spearman correlations.

## Discussion

To the best of our knowledge, this is the first study that compares the use of the numerator relationship (\({\mathbf{A}}\)) and a combined genomics and numerator relationship (\({\mathbf{H}}\)) matrix for estimating genetic parameters and predicting breeding values for body weight and its uniformity. The use of the animal DHGLM with \({\mathbf{H}}\) significantly improved the predictive ability of GEBV for uniformity of body weight (stdWT) but not for scale-adjusted uniformity.

### Genetic parameters

The estimate of heritability for uniformity of stdWT from sire-dam DHGLM with \({\mathbf{A}}\) was low (\(h_{v}^{2}\) = 0.036) but higher than estimates of \(h_{v}^{2}\) obtained in previous studies on rainbow trout [4, 8] and Nile tilapia [15, 16] (\(\bar{h}_{v}^{2}\) = 0.016: min = 0.010: max = 0.024). However, after accounting for scale effects by logarithm transformations, the estimate of \(h_{v}^{2}\) decreased to 0.014 to 0.015, which is in line with the previous reports that also used transformations [4, 8, 14–16].

Estimates of \(h_{v}^{2}\) for stdWT and lnWT using the sire-dam DHGLM with \({\mathbf{H}}\) did not differ from those with \({\mathbf{A}}\), which is in line with estimates of \(h_{v}^{2}\) for uniformity of piglet birth weight obtained using either \({\mathbf{A}}\) or only the genomic relationship matrix (\({\mathbf{G}}\)) [42], while lower estimates were reported for environmental variance of somatic cell score in dairy cattle when using \({\mathbf{G}}\) compared to \({\mathbf{A}}\) [43]. The similarity of the estimates of \(h_{v}^{2}\) obtained by using \({\mathbf{A}}\) or \({\mathbf{H}}\) in this study can be explained by the very similar estimated residuals (proxy of uniformity) between non-genotyped and genotyped animals when using the sire-dam model with \({\mathbf{A}}\) and \({\mathbf{H}}\). The sire-dam model only exploits relationships between sires and dams and does not exploit the full potential of the genotype-based relationships between animals, and especially between full-sibs. In contrast, residuals of genotyped animals estimated by using the animal model were more differentiated when either \({\mathbf{A}}\) or \({\mathbf{H}}\) was used, and likely more accurate than estimates of residuals for non-genotyped animals. However, in a DHGLM analysis, the sire-dam model provides less biased (co)variance components than the animal model [14], likely because of the dependence between estimates of the breeding value and residual of an individual, which are obtained from the same phenotype of body weight. The use of genomic relationships combined with numerator relationships is expected to reduce the dependency between EBV and estimated residuals because the EBV are more accurate. Therefore, we performed the animal DHGLM with \({\mathbf{H}}\) but the model did not converge when the variance components were estimated, which may be due to (1) the dependency between EBV and estimated residuals for body weight remaining high, or (2) the difficulty to disentangle genetic effects from the common environmental effects for uniformity of body weight.

The standard errors of \(h_{v}^{2}\) estimates were high, which may be due to the large variation in family size (4 to 54). According to Hill and Mulder [30], large family sizes or repeated measurements are recommended for estimating the genetic heteroscedasticity of traits. The optimal full-sib family size is 39 with a \(GCV\) of 39% and an \(h^{2}\) of 0.36 [18]. The use of \({\mathbf{H}}\) did not affect the standard errors of the \(h_{v}^{2}\) estimates, which does not agree with previous studies, for example Veerkamp et al. [44] reported lower standard errors of \(h^{2}\) estimates for dry matter intake, milk yield, and body weight of heifers when using genomic relationships with an animal model. One possible explanation could be that the benefit of using the genomic relationship matrix may be limited when the sire-dam DHGLM is applied, since the variance of genomic relationships between sires (0.02) was very similar to the variance of numerator relationships between sires (0.01). In contrast, the variance in genomic relationships between animals (0.01) was much larger than the variance of numerator relationships between animals (0.004). Hence, the SE of \(h_{v}^{2}\) estimates may be lower when using an animal DHGLM with a genomic relationship matrix, compared to the numerator relationship matrix. Nevertheless, in our study, it was not possible to investigate this phenomenon since the log-likelihood did not converge for the animal DHGLM with \({\mathbf{H}}\).

The \(GCV\) of uniformity for stdWT was substantial (48.0%), which indicates high potential for response to selection. This result is in the upper range of previous findings in fish species (17.4 to 64.0%) [4, 8, 14–16, 19] and in terrestrial animals (10.0 to 58.0%) [6, 9–13, 42]. After accounting for scale effects by logarithm transformations, the \(GCV\) for uniformity was reduced but still substantial (29.5 to 29.9%), which was also reported in previous studies on Atlantic salmon [14], rainbow trout [8], rabbit, and pig [45]. Thus, scale effects affect estimates of genetic parameters for uniformity of body weight considerably, but there is genetic variation for uniformity beyond the scale effect.

Genomic information slightly increased the \(GCV\) of uniformity of stdWT (from 48.0 to 52.3%). In contrast, genomic information did not influence the \(GCV\) for uniformity of lnWT (29.8%). Since estimates of genetic parameters obtained with \({\mathbf{A}}\) and \({\mathbf{H}}\) were similar, the \(GCV\) for body weight remained similar, which is in agreement with the previous comparison between \({\mathbf{A}}\) (\(GCV\) = 11.0 to 12.0%) and \({\mathbf{G}}\) (\(GCV\) = 10.0 to 11.0%) for uniformity of birth weight of piglets using a dam model [42].

### Genetic and genomic predictions

In this study, we used the Pearson correlation of EBV and GEBV with adjusted phenotype as the measure of predictive ability. The use of \({\mathbf{H}}\) instead of \({\mathbf{A}}\) in the animal DHGLM significantly improved the ability to predict breeding values for stdWT (19%) and its uniformity (41.1 to 78.1%). Furthermore, the use of the animal DHGLM instead of the sire-dam DHGLM significantly increased the predictive ability of EBV and GEBV for uniformity (see Additional file 2: Table S1), as expected.

Our findings indicate that ssGBLUP with an animal DHGLM can increase the accuracy of EBV for uniformity substantially compared to pedigree-based BLUP. However, after accounting for scaling effects by using log transformations, the use of \({\mathbf{H}}\) compared to \({\mathbf{A}}\) only slightly improved the correlation (1.6 to 13.9%) and MSEP between GEBV and adjusted phenotypes, and the improvement was not significant. There are two main reasons why these results differed between uniformity of stdWT and lnWT. First, log-transformation substantially reduced the genetic correlation of stdWT with its uniformity. As a result, any increases in predictive ability of GEBV of lnWT when using \({\mathbf{H}}\) instead of \({\mathbf{A}}\) (15.7%) did not positively influence the predictive ability of GEBV of its uniformity. Second, the lower additive genetic variance and \(h_{v}^{2}\) for uniformity after accounting for scale effects reduces the accuracy of EBV for uniformity. Consequently, MSEP increased from 0.63 (stdWT) to 0.94 (lnWT) with \({\mathbf{A}}\) and from 0.61 (stdWT) to 0.94 (lnWT) with \({\mathbf{H}}\) in the animal DHGLM.

The accuracy of genomic selection is expected to increase when the number of genotyped animals in the reference population increases for any trait [46] but in particular for lowly heritable traits, such as uniformity as shown by Sell-Kubiak et al. [42] and somatic cell score by Mulder et al. [43]. In this study, uniformity of lnWT had an even lower \(h_{v}^{2}\) than uniformity of stdWT. The number of genotyped animals in the reference population used for cross-validation was on average equal to 1274, which may have limited the benefit of using genomic information in ssGBLUP. A future empirical study should investigate the effect of the number of genotyped animals in the reference population on the ability to predict breeding values for uniformity to validate our findings and conclusions.

### Pearson or rank correlations?

Squared residuals or adjusted phenotype for uniformity (\(\psi_{i}^{*} )\) are exponentially rather than normally distributed, which may not justify quantifying predictive ability using a Pearson correlation. Thus, we also calculated distribution-free rank correlations (Kendall and Spearman) and indeed found estimation of the predictive ability for uniformity to be sensitive to the type of correlation used.

Although not significantly different, Kendall and Spearman correlations explained differences in predictive ability of EBV and GEBV for uniformity of lnWT slightly better than the Pearson correlation. Hence, the conclusion that the benefit of using genomic relationship for computing EBV for uniformity of logarithm transformations is limited remained the same when using Kendall and Spearman correlations. Colwel and Gillett [47] showed that, in general, estimates of Kendall correlations are similar to estimates of Spearman correlations, but in some cases, the magnitude of Spearman correlations can be 50% greater than the magnitude of Kendall correlations [47]. This is in line with our findings since Spearman correlations were 42.2 to 49.0% and 39.1 to 46.1% greater than Kendall correlations for the sire-dam DHGLM (see Additional file 2: Table S1) and the animal DHGLM, respectively. Nevertheless, the SE of Kendall correlations were notably lower by approximately 50% than the SE of Pearson and Spearman correlations, which indicates that the Kendall correlation may be a more reliable estimate of predictive ability than the Pearson and Spearman correlations. Hence, it is recommended to use Kendall instead of Pearson correlations when studying predictive ability for uniformity.

### Selection for uniformity

For fish breeding, major goals are to increase mean body weight and reduce variability (more uniformity) of body weight. Nevertheless, definitions of uniformity of stdWT and lnWT are not the same. From a biological point of view, genetic variation for environmental canalization can be quantified after the scale effect is accounted for. However, from an animal breeding point of view, uniformity on the observed scale explains the actual range of fish sizes that are processed by aquaculture industries.

Selection for body weight and uniformity may be challenging because the genetic correlation between body weight and its variability is high and positive, and sometimes approaches 1. A general observation is that the genetic correlation between log-transformed body weight and its variability is zero or even negative, allowing selection to simultaneously increase transformed body weight and reduce variability. Therefore log-transformed body weight and its variability could be included in a selection index. This would require knowledge of the genetic correlation between variability of stdWT and variability of lnWT, which is not available. Hence, we used the sire-EBV and sire-GEBV, obtained from BLUP and ssGBLUP with the animal DHGLM, to calculate the Pearson correlations between EBV for the trait and its variability. Pearson correlations between EBV for variability of stdWT and EBV for variability of lnWT were positively moderate with BLUP (0.48) and ssGBLUP (0.68). The Pearson correlation between EBV for stdWT and EBV for its variability was close to 1 with either BLUP (0.96) or ssGBLUP (0.97), while the Pearson correlation between EBV of lnWT and its variability was −0.24 for BLUP and -0.005 for ssGBLUP. Not surprisingly, the Pearson correlation between EBV for lnWT and EBV for stdWT was highly positive (0.82). These Pearson correlations suggest that variability of lnWT should be included in the selection index because the GEBV for variability of lnWT are positively correlated with GEBV for variability of stdWT, which indicates that selection against variability of lnWT will indirectly reduce variability on the observed scale. Furthermore, GEBV for variability of lnWT are not correlated to GEBV for lnWT and thus selection against variability of lnWT will not indirectly reduce lnWT.

Selecting the 10% best sires on an index with \(b_{1}\) of 0.3, i.e., \({\text{I}} = 0.3 * {\text{GEBV}}_{\text{lnWT}} - 0.7 * {\text{GEBV}}_{{{\text{variability}}_{\text{lnWT}} }}\) provides almost no genetic gain in variability of stdWT (−0.001) but positive genetic gain in stdWT (3.62% of mean body weight in g). In contrast, a selection index, based on breeding goal traits:\({\text{I}} = 0.52 * {\text{GEBV}}_{\text{stdWT}} - 0.48 * {\text{GEBV}}_{{{\text{variability}}_{\text{stdWT}} }}\), provides zero genetic gains for both stdWT and its variability, showing no possibility to achieve genetic gain on body weight while maintaining stable phenotypic variability. Nevertheless, the genetic gain in stdWT was much greater (17.32% of mean body weight in g) when variability was not included in the selection index. Therefore, although it is possible to increase body weight while keeping variability constant, there is a trade-off in genetic gain for body weight when selecting for reduced variability.

## Conclusions

The use of the animal DHGLM instead of the sire-dam DHGLM substantially increased the predictive ability for breeding values of uniformity, because the animal DHGLM fully exploits the relationships between full- and half-sibs. When using the animal DHGLM, the use of a combined numerator and genomic relationship matrix significantly increased the predictive ability for breeding values of uniformity of body weight, but only a slight and non-significant increase was observed after accounting for the scale effects by using transformed body weights. The small increase in predictive ability with transformed body weights may be due to lower heritability for uniformity of transformed body weight, a lower genetic correlation between transformed body weights and their uniformities, and/or a small number of genotyped animals in the reference population. The use of a Kendall correlation provided the lowest SE of predictive ability for uniformity and provided a more accurate estimate of predictive ability for uniformity over Pearson and Spearman correlations. In conclusion, the use of ssGBLUP increases the accuracy of breeding values for uniformity of harvest weight, which is expected to increase response to selection in uniformity.

## Declarations

### Authors’ contributions

PSL analyzed the data. HAM, AK, and ML provided theoretical support for genomic DHGLM and cross-validation. HAM, AK, and ML contributed to the discussion of the results. PSL drafted the manuscript. HAM, AK, and ML improved the manuscript. All authors read and approved the final manuscript.

### Acknowledgements

This study is a part of the research project entitled STABLEFISH funded by Norwegian Research Council (NRC: 234144/E49). We would like to thank SalmoBreed for providing data for this study. Matthew Baranski genotyped the animals and generated genotype data file for this study. Arthur Gilmour is acknowledged for help in implementing genomic DHGLM in ASReml v4. PSL would like to thank Mario Calus for his guidance on Cal_grm computer software, Solomon Antwi Boison and Sergio Vela Avitúa for a fruitful discussion during the drafting of this manuscript.

### Competing interests

The authors declare that they have no competing interests.

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

## Authors’ Affiliations

## References

- Sae-Lim P, Komen H, Kause A, van Arendonk JAM, Barfoot AJ, Martin KE, et al. Defining desired genetic gains for rainbow trout breeding objective using analytic hierarchy process. J Anim Sci. 2012;90:1766–76.View ArticlePubMedGoogle Scholar
- Gilmour KM, DiBattista JD, Thomas JB. Physiological causes and consequences of social status in salmonid fish. Integr Comp Biol. 2005;45:263–73.View ArticlePubMedGoogle Scholar
- Janhunen M, Kause A, Järvisalo O. Costs of being extreme - Do body size deviations from population or sire means decrease vitality in rainbow trout? Aquaculture. 2012;370–371:123–9.View ArticleGoogle Scholar
- Janhunen M, Kause A, Vehviläinen H, Jarvisalo O. Genetics of microenvironmental sensitivity of body weight in rainbow trout (
*Oncorhynchus mykiss*) selected for improved growth. PLoS One. 2012;7:e38766.View ArticlePubMedPubMed CentralGoogle Scholar - Baras E, Jobling M. Dynamics of intracohort cannibalism in cultured fish. Aquacult Res. 2002;33:461–79.View ArticleGoogle Scholar
- Mulder HA, Bijma P, Hill WG. Selection for uniformity in livestock by exploiting genetic heterogeneity of residual variance. Genet Sel Evol. 2008;40:37–59.PubMedPubMed CentralGoogle Scholar
- Mulder H, Hill W, Vereijken A, Veerkamp R. Estimation of genetic variation in residual variance in female and male broiler chickens. Animal. 2009;3:1673–80.View ArticlePubMedGoogle Scholar
- Sae-Lim P, Kause A, Janhunen M, Vehviläinen H, Koskinen H, Gjerde B, et al. Genetic (co) variance of rainbow trout (
*Oncorhynchus mykiss*) body weight and its uniformity across production environments. Genet Sel Evol. 2015;47:46.View ArticlePubMedPubMed CentralGoogle Scholar - Ros M, Sorensen D, Waagepetersen R, Dupont-Nivet M, SanCristobal M, Bonnet JC, et al. Evidence for genetic control of adult weight plasticity in the snail
*Helix aspersa*. Genetics. 2004;168:2089–97.View ArticlePubMedPubMed CentralGoogle Scholar - Rowe S, White IM, Avendano S, Hill WG. Genetic heterogeneity of residual variance in broiler chickens. Genet Sel Evol. 2006;38:617–35.View ArticlePubMedPubMed CentralGoogle Scholar
- Wolc A, White IM, Avendano S, Hill WG. Genetic variability in residual variation of body weight and conformation scores in broiler chickens. Poult Sci. 2009;88:1156–61.PubMedGoogle Scholar
- Ibáñez-Escriche N, Moreno A, Nieto B, Piqueras P, Salgado C, Gutiérrez JP. Genetic parameters related to environmental variability of weight traits in a selection experiment for weight gain in mice; signs of correlated canalised response. Genet Sel Evol. 2008;40:279–93.PubMedPubMed CentralGoogle Scholar
- Ibáñez-Escriche N, Varona L, Sorensen D, Noguera JL. A study of heterogeneity of environmental variance for slaughter weight in pigs. Animal. 2008;2:19–26.View ArticlePubMedGoogle Scholar
- Sonesson A, Ødegård J, Ronnegard L. Genetic heterogeneity of within-family variance of body weight in Atlantic salmon (
*Salmo salar*). Genet Sel Evol. 2013;45:41.View ArticlePubMedPubMed CentralGoogle Scholar - Marjanovic J, Mulder H, Khaw H, Bijma P. Genetic parameters for uniformity of harvest weight in the gift strain of nile tilapia estimated using double hierarchical generalized linear models. In: Proceedings of the international symposium on genetics in aquaculture XII, 21–27 June 2015; Santiago de Compostela; 2015. http://isga2015.acuigen.es/isga-2015-Abstract-Book.pdf.
- Khaw HL, Ponzoni RW, Yee HY, bin Aziz MA, Mulder HA, Marjanovic J, et al. Genetic variance for uniformity of harvest weight in Nile tilapia (
*Oreochromis niloticus*). Aquaculture. 2016;451:113–20.View ArticleGoogle Scholar - Mulder HA, Bijma P, Hill WG. Prediction of breeding values and selection response with genetic heterogeneity of environmental variance. Genetics. 2007;175:1895–910.View ArticlePubMedPubMed CentralGoogle Scholar
- Sae-Lim P, Gjerde B, Nielsen HM, Mulder H, Kause A. A review of genotype-by-environment interaction and micro-environmental sensitivity in aquaculture species. Rev Aquacult. 2015;8:369–93.View ArticleGoogle Scholar
- Marjanovic J, Mulder HA, Khaw HL, Bijma P. Genetic parameters for uniformity of harvest weight and body size traits in the GIFT strain of Nile tilapia. Genet Sel Evol. 2016;48:41.View ArticlePubMedPubMed CentralGoogle Scholar
- Falconer DS, Mackay TFC. Introduction to quantitative genetics. 4th ed. London: Pearson; 1996.Google Scholar
- Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29.PubMedPubMed CentralGoogle Scholar
- Goddard M, Hayes B, Meuwissen THE. Genomic selection in farm animal species-lessons learnt and future perspectives. In: Proceedings of the 9th world congress on genetics applied to livestock production, 1–6 August 2010; Leipzig. 2010.Google Scholar
- Misztal I, Aggrey SE, Muir WM. Experiences with a single-step genome evaluation. Poult Sci. 2013;92:2530–4.View ArticlePubMedGoogle Scholar
- Misztal I, Legarra A, Aguilar I. Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information. J Dairy Sci. 2009;92:4648–55.View ArticlePubMedGoogle Scholar
- Legarra A, Aguilar I, Misztal I. A relationship matrix including full pedigree and genomic information. J Dairy Sci. 2009;92:4656–63.View ArticlePubMedGoogle Scholar
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.View ArticlePubMedPubMed CentralGoogle Scholar
- Gilmour AR, Gogel BJ, Cullis BR, Thompson R. ASReml User Guide Release 4.0. Hemel Hempstead: VSM International Ltd; 2012.Google Scholar
- VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.View ArticlePubMedGoogle Scholar
- Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score1. J Dairy Sci. 2010;93:743–52.View ArticlePubMedGoogle Scholar
- Christensen OF, Lund MS. Genomic prediction when some animals are not genotyped. Genet Sel Evol. 2010;42:2.View ArticlePubMedPubMed CentralGoogle Scholar
- Calus MPL, Vandenplas J. calc_grm—a program to compute pedigree, genomic, and combined relationship matrices. Animal Breeding and Genomics Centre: Wageningen; 2015.Google Scholar
- Hill WG, Mulder HA. Genetic analysis of environmental variation. Genet Res (Camb). 2010;92:381–95.View ArticleGoogle Scholar
- Rönnegård L, Felleki M, Fikse F, Mulder H, Strandberg E. Genetic heterogeneity of residual variance—estimation of variance components using double hierarchical generalized linear models. Genet Sel Evol. 2010;42:8.View ArticlePubMedPubMed CentralGoogle Scholar
- Felleki M, Lee D, Lee Y, Gilmour AR, Rönnegård L. Estimation of breeding values for mean and dispersion, their variance and correlation using double hierarchical generalized linear models. Genet Res (Camb). 2012;94:307–17.View ArticleGoogle Scholar
- Lande R. On comparing coefficients of variation. Syst Zool. 1977;26:214–7.View ArticleGoogle Scholar
- Box GE, Cox DR. An analysis of transformations. J R Stat Soc Ser B Stat Methodol. 1964;26:211–52.Google Scholar
- Sakia R. The Box–Cox transformation technique: a review. Statistician. 1992;41:169–78.View ArticleGoogle Scholar
- R Development Core Team. R: a language and environment for statistical computing. Vienna: The R Foundation for Statistical Computing; 2011.Google Scholar
- Hoaglin DC, Welsch RE. The hat matrix in regression and ANOVA. Am Stat. 1978;32:17–22.Google Scholar
- Felleki M, Lundeheim N. Genetic control of residual variance for teat number in pigs. Proc Assoc Advmt Anim Breed Genet. 2013;20:538–41.Google Scholar
- Mulder HA, Visscher J, Fablet J. Estimating the purebred–crossbred genetic correlation for uniformity of eggshell color in laying hens. Genet Sel Evol. 2016;48:39.View ArticlePubMedPubMed CentralGoogle Scholar
- Sell-Kubiak E, Wang S, Knol EF, Mulder HA. Genetic analysis of within-litter variation in piglets’ birth weight using genomic or pedigree relationship matrices. J Anim Sci. 2015;93:1471–80.View ArticlePubMedGoogle Scholar
- Mulder HA, Crump RE, Calus MPL, Veerkamp RF. Unraveling the genetic architecture of environmental variance of somatic cell score using high-density single nucleotide polymorphism and cow data from experimental farms. J Dairy Sci. 2013;96:7306–17.View ArticlePubMedGoogle Scholar
- Veerkamp RF, Mulder HA, Thompson R, Calus MPL. Genomic and pedigree-based genetic parameters for scarcely recorded traits when some animals are genotyped. J Dairy Sci. 2011;94:4189–97.View ArticlePubMedGoogle Scholar
- Yang Y, Christensen O, Sorensen D. Analysis of a genetically structured variance heterogeneity model using the Box–Cox transformation. Genet Res (Camb). 2011;93:33–46.View ArticleGoogle Scholar
- Daetwyler HD, Villanueva B, Woolliams JA. Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS One. 2008;3:e3395.View ArticlePubMedPubMed CentralGoogle Scholar
- Colwell DJ, Gillett JR. 66.49 Spearman versus Kendall. Math Gaz. 1982;66:307–9.View ArticleGoogle Scholar