Phenotypic records
Phenotype data were available for 2404 Atlantic salmon smolts from 118 families (i.e. progeny of 118 dams and 40 sires) from the breeding population of Salmones Chaicas, Xth Region, Chile. These individuals were experimentally challenged with C. rogercresseyi. The average number of fish per family was 22 and ranged from 9 to 24, and the average weight of fish was 274.9 g (SD = 90.6 g). The fish were tagged with passive integrated transponders, acclimated, and then distributed to three replicate tanks. The challenge test was carried out as described previously [14–16]. Briefly, infestation with the parasite was carried out by using 13 to 24 copepodids per fish and stopping the water flow for 6 h after infestation. The challenge lasted 6 days; then, fish were euthanized and fins from each fish were collected and fixed for processing and lice counting. The resistance trait was defined as the count of sessile lice per fish on all fins after the infestation period, since that is highly representative of the total lice count on the fish [14]. Experimental tank and final body weight were recorded for each fish.
Genotypes
Genotype data were available from a previous study [16]. Briefly, genomic DNA was extracted from fin clips using a commercial kit (DNeasy Blood and Tissue, Qiagen), quality controlled and quantified. All phenotyped fish were genotyped using a 50 K SNP Affymetrix® Axiom® myDesign™ Genotyping Array designed by Aquainnovo and the University of Chile. More details about the SNPs included in the 50 K array are in Correa et al. [30] and Yáñez et al. [31, 32].
The genotypes were quality controlled using the Affymetrix Genotyping Console and the SNPolisher R package following the Axiom® Genotyping Solution Data Analysis Guide [33]. Additional quality control steps were conducted by filtering out SNPs with a Hardy–Weinberg equilibrium test p value less than 0.00001, an SNP call rate lower than 0.95 and a minor allele frequency lower than 0.01.
Estimation of breeding values
The conventional pedigree-based approach, P-BLUP, was used as the control for genomic evaluations, and EBV for each individual were obtained using a linear mixed model implemented in BLUPF90 [34]. The model used was as follows:
$${\mathbf{y}} = {\mathbf{X}}{\boldsymbol{\upbeta}} + {\mathbf{Tg}} + {\mathbf{e}},$$
where y is a vector of phenotypes (lice count on fins), β is a vector of fixed effects (mean, tank and body weight effects), g is a vector of random additive polygenic genetic effects that follows a normal distribution ~\(N({\mathbf{0, A}}\sigma_{{\mathbf{g}}}^{2} )\), X and T are incidence matrices, A is the pedigree additive relationship matrix, and e is the residual [35].
The genomic EBV (GEBV) for each individual were estimated using three different GS models: G-BLUP, Bayesian Lasso and Bayes C. G-BLUP was as implemented in the BLUPF90 software [34]. G-BLUP is a modification of the BLUP method, where the numerator relationship matrix A is replaced by a genomic relationship matrix G, as described by VanRaden [36].
For Bayesian methods, marker and additive polygenic effects were estimated jointly using the following model implemented in the GS3 software [37]:
$${\mathbf{y}} = {\mathbf{X}}{\boldsymbol{\upbeta }} + {\mathbf{Z}}{\boldsymbol{\upalpha }} + {\mathbf{Tg}} + {\mathbf{e}},$$
where α is a vector of random marker allele substitution effects and Z is the corresponding incidence matrix. The Gibbs sampler was run using 100,000 iterations with a burn-in of 20,000 iterations. Priors were drawn from double—exponential and scaled—inverse χ
2 distributions, for Bayesian Lasso and Bayes C, respectively. Bayesian GEBV were estimated as the sum of the polygenic and marker effects.
Pedigree (P-BLUP) and genomic (G-BLUP) heritabilities were calculated using the AIREMLF90 software [34] as follows: \(h^{2} = \sigma_{{\mathbf{g}}}^{2} /(\sigma_{{\mathbf{g}}}^{2} + \sigma_{{\mathbf{e}}}^{2} )\), where \(\sigma_{{\mathbf{g}}}^{2}\) is the estimate of the additive genetic variance and \(\sigma_{{\mathbf{e}}}^{2}\) is the estimate of the residual variance.
Comparison of models
The different models were compared using a fivefold cross validation scheme. Briefly, all genotyped and phenotyped animals were randomly separated into five validations sets, which were predicted one at a time by masking their phenotypes and using the remaining animals as training set to estimate the marker effects. Thus, for each validation run, the dataset was split into a training set (80%) and a validation set (20%). To reduce the stochastic effects, this cross-validation analysis was replicated 10 times. Accuracy was used to assess the performance of each model and was estimated as:
$$\varvec{R}_{{\varvec{EBV},\varvec{BV}}} = \varvec{ }\frac{{\varvec{R}_{{\varvec{EBV},\varvec{ y}}} }}{\varvec{h}},$$
where \(\varvec{R}_{{\varvec{EBV},\varvec{ y}}}\) is the correlation between the EBV of a given model (predicted for the validation set using information from the training set) and the actual phenotype, while h is the square root of the pedigree-based estimate of heritability [26, 38]. To test prediction accuracies obtained by using various SNP densities, 0.5, 1, 5, 10 and 25 K SNPs were selected from the full set as follows. First, 500 SNPs that had a level of linkage disequilibrium (LD, measured as r2) less than 0.2 and were homogeneously distributed across the genome were selected among the SNPs that passed quality control. Then, SNPs with a homogeneous distribution across the genome were added to the first 500 SNPs to create the 1 K panel. The same procedure was reiterated to create the 5, 10 and 25 K panels. Accuracies were then calculated for each model and SNP density, compared to those obtained with the P-BLUP model, and the relative increase in accuracy was assessed.