Phenotypic data
Phenotypic data on Atlantic salmon were collected from a family experiment that was carried out at the fish laboratory of the Norwegian University of Life Sciences (NMBU) in Aas, Norway. Details on this experiment are in Dvergedal et al. [4]. Broodstock from AquaGen’s breeding population (22 males and 23 females) were used to generate 23 families. To ensure clearly contrasted family groups with respect to growth potential and feed efficiency, divergent parents were selected based on high and low estimated breeding values (EBV) for growth in seawater.
Prior to the start of feeding, multiple families were kept in separate compartments within the same tank, with five tanks required to house all the families. Based on parentage assignment, 100 family members were identified for each of the 23 families and reared together in a single tank from the start of feeding until the start of the feed conversion test. Prior to the 12-day test, families were allocated to tanks, 50 fish per tank and two tanks per family (except for nine tanks in which the number of fish varied from 42 to 54 due to mortality prior to the start of the experiment and a counting mistake), for a total of 2281 fish. Families were fed a fishmeal-based diet labeled with the stable isotopes 15N and 13C at inclusion levels of 2 and 1%, respectively, as described by Dvergedal et al. [4]. Feed conversion rate was recorded at the family level.
Phenotypic data were recorded individually for WG, RG, AMC, AMN, ALC, ALN, and AAC, as described by Dvergedal et al. [4], resulting in phenotypes for 2249 to 2280 fish per trait. Muscle, liver, and adipose samples from each individual were collected in cryotubes and snap-frozen in liquid nitrogen for stable isotope analysis. The sampling procedure and determination of atom % 15N and 13C in the samples are explained in detail in Dvergedal et al. [4]. The stable isotope analysis was carried out at the Institute for Energy Technology (Kjeller, Norway).
From the individual (\(i\)) phenotypes of AMC and AMN, individual isotope-based indicator traits for FCR and FER (IFCR and IFER, respectively), i.e. \({IFCR\_AMC}_{i}\), \({IFCR\_AMN}_{i}\), \({IFER\_AMC}_{i}\), and \({IFER\_AMN}_{i}\), were derived as follows (taking 15N as an example):
$${IFCR\_AMN}_{i}=\frac{{FW}_{i}*{APE}_{Ni} }{{FW}_{i}-{IW}_{i}} \,\,{\text {and}}\,\, {IFER\_AMN}_{i}=\frac{{FW}_{i}-{IW}_{i}}{{FW}_{i}* {APE}_{Ni}},$$
where \({IW}_{i}\) and \({FW}_{i}\) is the initial and final weight of fish \(i\), \({APE}_{Ni}={(AMN}_{i}-IA\%)\), with \(IA\)% equal to 0.370% for 15N and 1.087% for 13C. After a diet switch, the atom percentage in excess (APE) of a stable isotope in muscle tissue is expected to be proportional to the fraction of newly synthesized nutrients in the muscle, and the product of APE and final weight is expected to be proportional to the mass of new nutrients in the body tissue. Because IFCR is expected to be proportional to the amount of newly deposited body nutrients per g increase in body weight, fish that exchange a larger fraction of the body mass per unit of growth will be less feed-efficient. Exchange of body tissue is traceable with stable-isotope profiling and is related to the feed intake of the individual, the denominator of the ratio is weight gain, and the ratio between these two variables equals IFCR or the inverse equals IFER.
Genotypic data
When the fish reached 5 to 10 g, they were pit-tagged with a 2 \(\times\) 12 mm unique glass tag (RFID Solutions, Hafrsfjord, Norway), and a fin-clip was collected from 2300 fish for DNA-extraction and genotyping. Fin clips (20 mg) were incubated in lysis buffer and treated with proteinase K (20 µg/ml) at 56 ℃ overnight. The following day, DNA was isolated from the lysate at Biobank AS (Hamar, Norway) using the Sbeadex livestock kit (LGC Genomics) according to the manufacturer’s protocol (Thermo Fisher Scientific). DNA concentration was measured using a Nanodrop 8000 (Thermo Fisher Scientific). All fish were genotyped using AquaGen’s custom Axiom®SNP (single-nucleotide polymorphism) genotyping array from Thermo Fisher Scientific (former Affymetrix) (San Diego, CA, USA). This SNP-chip contains 56,177 SNPs that were originally identified based on Illumina HiSeq reads (10–15 × coverage) from 29 individuals from AquaGen’s breeding population. Genotyping was done at CIGENE (Aas, Norway). Genotypes were called from the raw data using the Axiom Power Tools software from Affymetrix. Individuals with a Dish-QC score lower than 0.82, and/or a call-rate lower than 0.97, and/or more than 10% missing genotypes were removed from further analyses. Also, SNPs with a minor allele frequency (MAF) lower than 1%, and with a missing call rate greater than 10% were removed. After filtering, 54,200 SNPs were included in the analysis.
Association analysis
Associations between each SNP and the phenotypes related to nitrogen and carbon metabolism (AMC, AMN, ALC, ALN, and AAC), growth (WG, RG), and indicator traits of feed efficiency (IFCR_AMC, IFCR_AMN, IFER_AMC, and IFER_AMN) were tested by using a linear mixed-model algorithm implemented in the GCTA software [26]. The leave-one-chromosome-out option (–mlm-loco) was used, in which the chromosome that contains the tested SNP was left out when building the genetic relationship matrix (GRM). The linear mixed model used can be written:
$$Y_{i} = a + bx + g_{i}^{ - } + \varepsilon _{i} ,$$
where \({Y}_{i}\) is the phenotypes of individual \(i\) for one of the evaluated traits, \(a\) is the intercept, \(b\) is the fixed regression for the SNP to be tested for association, \(x\) is the SNP genotype indicator variable, coded as 0, 1 or 2, \({g}_{i}^{-}\) is the random polygenic effect for individual \(i\), assumed to be distributed as ~ \(N({\bf {0}}, \mathbf{G}{\sigma }_{g}^{2}\)), where \(\mathbf{G}\) is the genomic relationship matrix, computed using the genotyped SNPs on all chromosomes except on the chromosome on which the tested SNP is located, \({\sigma }_{g}^{2}\) is the variance of the polygenic effect, and \({\varepsilon }_{i}\) is a random residual. In this algorithm, \({\sigma }_{g}^{2}\) is re-estimated each time a chromosome is left out from the calculation of the GRM. The level of significance was evaluated using a built-in likelihood-ratio test. The threshold value for 5% genome-wide significance was derived using the Bonferroni correction as 0.05/54,200 = 9.23 \(\times\) 10–7, corresponding to a −log10 p-value (p) of 6.03. The number of SNPs on each chromosome was used to calculate 5% chromosome-wide significance levels. The Bonferroni correction is known to be overly conservative especially when applied to correlated SNP data, i.e., to SNPs that are in linkage disequilibrium, which can produce an excess of false-negative results [27]. Manhattan plots were used to visualize the −log10 (p) of SNPs across the chromosomes (n = 29) (Figs. 1 and 2) and QQ-plots were used to visualize the distribution of observed versus expected genome-wide −log10 (p) (Figs. 3 and 4).
RNA extraction and sequencing
RNA was extracted from the liver of 184 fish, at ~ 10 months of age (four fish per tank), using the RNeasy Plus Universal Kit (Qiagen, Hilden, Germany) according to the manufacturer’s protocol. The RNA concentration was determined using a Nanodrop 8000 (Thermo Fisher Scientific, Waltham, USA), and RNA quality was determined using a 2100 Bioanalyzer with the RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, USA). All samples had an RNA integrity number greater than 8. Sequencing libraries were constructed using the TruSeq Stranded mRNA Library Prep Kit (Illumina, San Diego, USA) according to the manufacturer’s instructions. Sequencing was performed on an Illumina Hiseq 2500 at the Norwegian Sequencing Center (Oslo, Norway), using 100 bp single-end sequencing. All raw fastq-files have been deposited on ArrayExpress under accession number E-MTAB-8305. Reads were trimmed, aligned to the salmon genome (ICSASG_v2), and counted using the bcbio-nextgen pipeline (https://github.com/bcbio/bcbio-nextgen) and the NCBI salmon genome annotation (https://salmobase.org/Downloads/Salmo_salar-annotation.gff3).
RNAseq analysis
The EdgeR software [28] was used to identify genes that had an expression in the liver associated with the nitrogen and carbon metabolism liver traits (ALN and ALC), using the following linear regression model:
$$~Y_{{Fij}} = ~\mu _{F} + b_{1} x_{j} + b_{2} \left( {\frac{{\left( {FW_{i} ~ - ~IW_{i} } \right)}}{{FW_{i} }}} \right) + \varepsilon _{{Fij}} ,$$
where \({Y}_{Fij}\) is the expression of gene \(i\) for fish \(j\) in family \(F\), \({\mu }_{F}\) is the intercept for each family, \({b}_{1}\) is the fixed regression coefficient of the phenotype for the trait (\({x}_{j}\)= ALC or ALN of fish \(j\)), and \({b}_{2}\) is the fixed regression coefficient of the ratio of weight gain (final weight (\(FW\))—initial weight (\(IW\))) to \(FW\), to account for the effect of growth on gene expression, and \({\varepsilon }_{Fij}\) is a random residual. The trait phenotypes were scaled and centered (mean = 0 and SD = 1), such that the resulting regression slopes could be compared between traits. Genes with an expression that had significant regression coefficients on the trait at a false discovery rate (FDR) corrected p-value (q) < 0.05 were classified as trait-associated genes (TAG). TAG were analyzed for over-representation in KEGG pathways using the “kegga” function in the limma R-package [29]. Genes that encoded transcription factor proteins were identified by using the R-package SalMotifDB (https://salmobase.org/apps/SalMotifDB) [30], which interacts with a database of transcription factors for salmonids.