- Research article
- Open access
- Published:

# Genome-wide association study and accuracy of genomic prediction for teat number in Duroc pigs using genotyping-by-sequencing

*Genetics Selection Evolution*
**volume 49**, Article number: 35 (2017)

## Abstract

### Background

The number of teats in pigs is related to a sow’s ability to rear piglets to weaning age. Several studies have identified genes and genomic regions that affect teat number in swine but few common results were reported. The objective of this study was to identify genetic factors that affect teat number in pigs, evaluate the accuracy of genomic prediction, and evaluate the contribution of significant genes and genomic regions to genomic broad-sense heritability and prediction accuracy using 41,108 autosomal single nucleotide polymorphisms (SNPs) from genotyping-by-sequencing on 2936 Duroc boars.

### Results

Narrow-sense heritability and dominance heritability of teat number estimated by genomic restricted maximum likelihood were 0.365 ± 0.030 and 0.035 ± 0.019, respectively. The accuracy of genomic predictions, calculated as the average correlation between the genomic best linear unbiased prediction and phenotype in a tenfold validation study, was 0.437 ± 0.064 for the model with additive and dominance effects and 0.435 ± 0.064 for the model with additive effects only. Genome-wide association studies (GWAS) using three methods of analysis identified 85 significant SNP effects for teat number on chromosomes 1, 6, 7, 10, 11, 12 and 14. The region between 102.9 and 106.0 Mb on chromosome 7, which was reported in several studies, had the most significant SNP effects in or near the *PTGR2*, *FAM161B*, *LIN52*, *VRTN*, *FCF1*, *AREL1* and *LRRC74A* genes. This region accounted for 10.0% of the genomic additive heritability and 8.0% of the accuracy of prediction. The second most significant chromosome region not reported by previous GWAS was the region between 77.7 and 79.7 Mb on chromosome 11, where SNPs in the *FGF14* gene had the most significant effect and accounted for 5.1% of the genomic additive heritability and 5.2% of the accuracy of prediction. The 85 significant SNPs accounted for 28.5 to 28.8% of the genomic additive heritability and 35.8 to 36.8% of the accuracy of prediction.

### Conclusions

The three methods used for the GWAS identified 85 significant SNPs with additive effects on teat number, including SNPs in a previously reported chromosomal region and SNPs in novel chromosomal regions. Most significant SNPs with larger estimated effects also had larger contributions to the total genomic heritability and accuracy of prediction than other SNPs.

## Background

A sufficient number of teats is necessary for a sow to rear its piglets to weaning age. Many putative QTL (quantitative trait loci) for teat number have been reported on most of the porcine chromosomes, but most of these were detected using microsatellite markers and lacked specific gene targets [1]. Genome-wide association studies (GWAS) using single nucleotide polymorphisms (SNPs) and analyses of candidate genes have identified several specific gene targets that affect teat number in swine. A GWAS using 42,654 SNPs on 936 Large White pigs reported 39 QTL with 211 significant SNP effects on teat number [2]. Among those SNP effects, the region between 102.0 and 105.2 Mb on chromosome 7 had the most significant effects and the percentage of the genetic variance explained by SNPs in this region ranged from 0.04 to 2.51%. Within this region, the *VRTN* and *PROX2* genes were identified as the most convincing candidate genes. The chromosomal locations of the significant SNPs that were detected in this GWAS differed from all previously reported QTL for teat number that have been compiled in the animal QTL database [1]. Another GWAS, using 32,911 SNPs on 1550 Large White pigs, reported 21 QTL with additive effects on chromosomes 6, 7 and 12, one QTL with a dominant effect on chromosome 4, and identified *VRTN* as the most promising candidate gene for teat number [3]. A third GWAS using 41,647 SNPs on 1657 Large White pigs found 65 significant SNPs on chromosomes 1, 2, 7, 8, 12 and 14, including SNPs in the region 102.9 between 105.2 Mb on chromosome 7 [4]. A fourth GWAS using 39,778 SNPs identified the *VRTN* gene with pleiotropic and desirable effects on thoracic vertebral number, teat number and carcass (body) length across four pig populations, and showed that, of all SNPs on chromosome 7, a SNP within the *VRTN* gene had the most significant effect on teat number in Duroc pigs [5]. Among all significant SNPs that have been detected for teat number by GWAS, the significance of the *VRTN* gene on chromosome 7 achieved the widest consensus and has been identified as a strong candidate gene for teat number [2–5]. However, in the literature some discrepancies regarding the most significant location and many SNP effects in other genomic regions have been reported. A GWAS using the porcine 60 K SNP chip on a F2 population from a cross between Landrace and Korean pigs identified highly significant SNPs on chromosome 7 that were more than 40 Mb away from the *VRTN* gene [6], and in another GWAS using 36,588 SNPs and 1024 Duroc pigs, the most significant SNPs on chromosome 7 were found 2 to 3 Mb downstream of the *VRTN* region [7]. However, other than for the *VRTN* region, there is little consensus among the GWAS results on genomic regions that affect teat number [2–4, 6, 7]. Therefore, additional studies are needed to identify the genetic factors that affect swine teat number. Furthermore, it is unclear what the impact of the highly significant SNPs is on the accuracy of genomic prediction for teat number.

The objective of this study was to identify genetic factors that affect teat number in pigs, evaluate the accuracy of genomic prediction, and evaluate the contribution of significant genes and genomic regions to the heritability and accuracy of genomic prediction using 41,108 autosomal SNPs from genotyping-by-sequencing (GBS) on 2936 Duroc boars.

## Methods

### Animals, phenotyping, and genotyping-by-sequencing

Animal and phenotype data used for this study were provided by Guangdong Wen’s Foodstuff Group (Guangdong, China). The study population included 2936 Duroc boars born from September 2011 to September 2013 in 1456 litters from 79 sires with one to three piglets per litter, and all pigs were managed at a single nucleus farm. The left and right teats were counted separately within 48 h after birth and only normal teats were recorded. In this study, the phenotype used for ‘teat number’ was the total number of teats that was equal to the sum of the left and right normal teats. The ‘mean ± (standard deviation)’ of teat number was 10.72 ± 1.72. The phenotypic values followed a near bell-shaped distribution (Fig. 1), which was similar to the leptokurtic distribution of teat number that is observed for Landrace and Large White pigs [8], with most animals (1992 out of 2936) having 10 or 11 teats.

Genomic DNA was extracted from ear tissue of all 2936 Duroc boars and quantified using a Qubit 2.0 Fluorometer. DNA concentrations were normalized to 50 ng/ml in 96-well plates. A two-enzyme i.e. *EcoR*I and *Msp*I genotyping-by-sequencing (GBS) was used. A set of 96 forward barcoded adapters with an *EcoR*I overhang were designed by the GBS Barcode Generator (http://www.deenabio.com/services/gbs-adapters), and the reverse adapter with a *Msp*I overhang was designed according to [9]. DNA samples (150 ng each) were digested with *EcoR*I and *Msp*I, then ligated to the designed adapters. Following adapter ligation, samples were pooled in 96-plex and size-selected using two cycles of purification with Agencourt AMPure XP Beads (Beckman Coulter, Pasadena, CA). The purified libraries were amplified by PCR and then sequenced on Illumina NextSeq500 by a 90-bp single-end sequencing. SNP genotypes were called according to the pipeline implemented in Tassel 5.0 with default parameters [10], and Beagle 4.0 was used to impute missing SNP genotypes. A total of 90,051 SNPs were identified for the population used in this study. SNP filtering was based on the following criteria: only SNPs that had a minor allele frequency higher than 5%, for which the frequency of the least frequent homozygous genotype was at least 0.01, and passed the Hardy–Weinberg equilibrium test (p ≥ 10^{−6}) were retained. Among the autosomal SNPs, 41,108 SNPs satisfied these requirements and were used for analyses.

### GWAS analysis

Three single-SNP methods were used for the GWAS analysis: a *t* test of additive and dominance SNP effects using a generalized least squares (GLS) analysis that takes intraclass correlation of sibs into account and is implemented in the EPISNP2 program [11, 12], and the least squares (LS) analyses of additive effects by PLINK [13] and EPISNP1 [11] with population stratification correction using the first 50 dimensions from multidimensional scaling (MDS) as covariates. We report the PLINK and EPISNP1 results using the first 35 MDS dimensions for stratification correction because the genomic inflation factor [14] and the patterns of Manhattan plots of SNP significance stabilized when fitting the first 35 MDS dimensions (Fig. 2).

The statistical model for the EPISNP2 analysis was:

where \({\mathbf{y}}\) is the vector of phenotypic values, \({\mathbf{b}}\) is the vector of fixed year-month effects, \({\mathbf{X}}_{\text{b}}\) is the incidence matrix for \({\mathbf{b}}\), \({\mathbf{g}}\) is the vector of the effects of SNP genotypes, \({\mathbf{X}}\) is the incidence matrix of \({\mathbf{g}}\), \({\mathbf{f}}\) is the vector of random family effects with a common variance \(\upsigma_{\text{f}}^{2}\) for sibs in the same family, and \({\mathbf{Z}}\) is the incidence matrix of \({\mathbf{f}}\). The variance–covariance matrix of the family effects was assumed to be \({\mathbf{G}} = {\text{Var}}\left( {\mathbf{f}} \right) = {\mathbf{I}}\upsigma_{\text{f}}^{2}\), where \({\mathbf{I}}\) is an identity matrix, and the phenotypic variance–covariance matrix is \({\text{Var}}\left( {\mathbf{y}} \right) = {\mathbf{V}} = {\mathbf{ZGZ}}^{\prime } + {\mathbf{I}}\upsigma_{\text{e}}^{2}\) [12].

The statistical model for the PLINK analysis was:

where \({\mathbf{b}}_{1}\) is the vector of fixed effect(s) of the MDS dimension(s), \({\mathbf{X}}_{1}\) is the matrix of the MDS dimension(s) as calculated by PLINK from the SNP matrix of identity-by-state [13], α is the additive SNP effect, and \({\mathbf{x}}\) is a column vector of genotype codes for α created by PLINK.

The statistical model for EPISNP1 analysis was:

where the matrices have the same definitions as in the previous two models. Significance tests for additive and dominance SNP effects by EPISNP1 and EPISNP2 were implemented by *t* tests for the additive and dominance contrasts of the estimated SNP genotypic values [11, 12, 15].

### Genomic heritability and accuracy of genomic prediction

Genomic heritability and genomic prediction were estimated by using a mixed model with additive and dominance effects as described previously [16–18]. Briefly, the mixed model for heritability estimation and genomic prediction was:

with \({\text{Var}}\left( {\mathbf{y}} \right) = {\mathbf{V}} = {\mathbf{ZA}}_{\text{g}} {\mathbf{Z}}^{\prime }\upsigma_{\upalpha}^{2} + {\mathbf{ZD}}_{\text{g}} {\mathbf{Z}}^{\prime }\upsigma_{\updelta}^{2} + {\mathbf{I}}\upsigma_{\text{e}}^{2}\), where \({\mathbf{Z}}\) is an incidence matrix allocating phenotypic observations to each individual, \({\mathbf{a}}\) is the vector of genomic additive (breeding) values, \({\mathbf{d}}\) is the vector of genomic dominance values or dominance deviations, \({\mathbf{A}}_{\text{g}}\) is a genomic additive relationship matrix calculated from the SNPs, \({\mathbf{D}}_{\text{g}}\) is a genomic dominance relationship matrix calculated from the SNPs, \(\upsigma_{\upalpha}^{2}\) is the additive variance, \(\upsigma_{\updelta}^{2}\) is the dominance variance, and \(\upsigma_{\text{e}}^{2}\) is the residual variance. The \({\mathbf{A}}_{\text{g}}\) and \({\mathbf{D}}_{\text{g}}\) matrices were calculated using Definition II of genomic relationships implemented by the GVCBLUP package, and variance components of additive, dominance and random residual values were estimated by genomic restricted maximum likelihood estimation (GREML) using the GREML_CE program in the GVCBLUP package [18]. The genomic heritability was defined as: \({\text{h}}_{\upalpha}^{2} =\upsigma_{\upalpha}^{2} /\upsigma_{\text{y}}^{2}\), i.e. the narrow-sense heritability, \({\text{h}}_{\updelta}^{2} =\upsigma_{\updelta}^{2} /\upsigma_{\text{y}}^{2}\), i.e. the dominance heritability, and \({\text{h}}_{\text{t}}^{2} = {\text{h}}_{\upalpha}^{2} /{\text{h}}_{\updelta}^{2}\), i.e. the broad-sense heritability, where \(\upsigma_{\text{y}}^{2} =\upsigma_{\upalpha}^{2} +\upsigma_{\updelta}^{2} +\upsigma_{\text{e}}^{2}\) is the phenotypic variance. The genomic best linear unbiased prediction (GBLUP) of additive, dominance and genetic values of individuals in the training and validation samples were calculated at the last iteration of the GREML.

A tenfold validation study was conducted to evaluate the prediction accuracy. The 2936 Duroc boars were randomly divided into 10 validation datasets of 293 individuals except the 10th sample, which included 299 individuals. For each of the 10 validation analyses, phenotypic observations in the validation dataset were omitted in the GBLUP calculation. Three measures of prediction accuracy were calculated and compared: \(\widehat{\text{R}}_{{0{\text{jp}}}} = {\text{corr}}\left( {\widehat{\text{g}}_{{0{\text{j}}}} ,\;{\text{y}}_{0} } \right)\), which is the observed accuracy of predicting the phenotypic values in the validation population and is calculated as the correlation between the estimated genetic values (\(\widehat{\text{g}}_{{0{\text{j}}}}\)) and the phenotypic observations (\({\text{y}}_{0}\)) of validation individuals, averaged across all validation datasets; \({\text{R}}_{{0{\text{j}}}} = {\text{corr}}\left( {\widehat{\text{g}}_{{0{\text{j}}}} ,\;{\text{g}}_{{0{\text{j}}}} } \right)\), which is the expected accuracy of predicting the true genetic values (\({\text{g}}_{{0{\text{j}}}}\)) of individuals in the validation population and is calculated as the square root of the reliability estimate for each individual from the GVCBLUP package [18], where ‘\({\text{j}} =\upalpha\)’ indicates additive prediction, ‘\({\text{j}} =\updelta\)’ indicates dominance prediction, and ‘\({\text{j}} = {\text{t}}\)’ indicates prediction of total genetic value; and \({\text{R}}_{{0{\text{jp}}}} = {\text{R}}_{{0{\text{j}}}} \sqrt {{\text{h}}_{\text{j}}^{2} }\), which is the expected accuracy of predicting the phenotypic values, where \({\text{h}}_{\text{j}}^{2}\) is the genomic narrow-sense (\({\text{j}} =\upalpha\)), dominance (\({\text{j}} =\updelta\)), or broad-sense (\({\text{j}} = {\text{t}}\)) heritability. The accuracy of predicting phenotypic values was previously termed as ‘predictive ability’ [19] to distinguish it from ‘expected prediction accuracy’ of predicting genetic values (\({\text{R}}_{{0{\text{j}}}}\) in this study). The formula of the expected accuracy of predicting phenotypic values, \({\text{R}}_{{0{\text{jp}}}} = {\text{R}}_{{0{\text{j}}}} \sqrt {{\text{h}}_{\text{j}}^{2} }\), is a slightly different form of the relationship between ‘predictive ability’ and ‘prediction accuracy’ [19]. The mathematical difference between \({\text{R}}_{{0{\text{jp}}}}\) and \({\text{R}}_{{0{\text{j}}}}\) is in the denominators of these two measures: the denominator of \({\text{R}}_{{0{\text{jp}}}}\) is the phenotypic standard deviation, whereas the denominator of \({\text{R}}_{{0{\text{j}}}}\) is the genetic standard deviation, which is necessarily smaller than the phenotypic standard deviation in the presence of non-zero residual variance. Therefore, \({\text{R}}_{{0{\text{j}}}}\) is the upper limit of \({\text{R}}_{{0{\text{jp}}}}\) but this upper limit may not hold for the observed \({\text{accuracy}}\) of predicting phenotypic values (\(\widehat{\text{R}}_{{0{\text{jp}}}}\)) due to unknown variations in the data or genetic mechanisms that are not explained by the statistical model. The observed accuracy of predicting genetic values can only be defined when the true genetic values such as simulated genetic values are known [16] but they could not be defined in this study because the true genetic values were unknown.

### Contribution of significant SNPs and genomic regions to total heritability and prediction accuracy

The contributions of each SNP to additive, dominance and total heritability can be estimated [18]. However, as shown by the results of this study, the contribution of each SNP is affected by the number of SNPs in the mixed model: the larger is the number of SNPs, the smaller is the contribution of each SNP. To avoid this dependency in the mixed model for GREML and to estimate each SNP’s independent contribution to genomic heritability and prediction accuracy, we used the approach of ‘partial heritability’ and ‘partial accuracy’ based on differences in heritability and prediction accuracy between a full model and a reduced model. The full model fits all SNPs as random effects and the reduced model fits the target SNP or SNPs as fixed effects to remove their effects from the phenotypic values. The reduced model was as follows:

where \({\mathbf{s}}\) is a column vector of fixed SNP effects and \({\mathbf{X}}\) is the incidence matrix of \({\mathbf{s}}\). The phenotypic variance–covariance matrix was assumed to be the same as in the full model, i.e., \({\text{Var}}\left( {\mathbf{y}} \right) = {\mathbf{V}} = {\mathbf{ZA}}_{\text{g}} {\mathbf{Z}}^{\prime }\upsigma_{\upalpha}^{2} + {\mathbf{ZD}}_{\text{g}} {\mathbf{Z}}^{\prime }\upsigma_{\updelta}^{2} + {\mathbf{I}}\upsigma_{\text{e}}^{2}\), but variance components were estimated under the reduced model. Let \(\widehat{\text{h}}^{2}\) \((\widehat{\text{h}}_{\text{i}}^{2} )\) be the estimated heritability from the full (reduced) model, and \({\text{R}}_{0}\) \(\left( {{\text{R}}_{{0{\text{i}}}} } \right)\) a measure of prediction accuracy for the full (reduced) model. Then, the relative contribution of the \(i\)th SNP or the \(i\)th set of SNPs to the total heritability was calculated as \({\text{c}}_{\text{hi}}^{2} = 1 - \widehat{\text{h}}_{\text{i}}^{2} /\widehat{\text{h}}^{2}\), and the relative contribution of the \(i\)th SNP or the \(i\)th set of SNPs to the prediction accuracy as \({\text{c}}_{\text{ri}} = 1 - {\text{R}}_{{0{\text{i}}}} /{\text{R}}_{0}\).

## Results

### Genomic heritability and prediction accuracy

The estimate of genomic narrow-sense heritability (\(\widehat{\text{h}}_{\upalpha}^{2}\)) was 0.365 ± 0.030, of dominance heritability (\(\widehat{\text{h}}_{\updelta}^{2}\)) was 0.035 ± 0.019, of broad-sense heritability (\({\text{h}}_{\text{t}}^{2}\)) was 0.400 ± 0.034, and the estimate of narrow-sense heritability for the mixed model with additive effects only, was 0.368 ± 0.030, which is slightly higher than the corresponding estimate for the mixed model with additive and dominance effects (Table 1). The observed accuracy of predicting phenotypic values from the tenfold validation study was 0.437 ± 0.064 for the mixed model with additive and dominance SNP effects \((\widehat{\text{R}}_{{0{\text{tp}}}}\), Model 1A in Table 2), and was 0.435 ± 0.064 for the mixed model with additive effects only \((\widehat{\text{R}}_{{0\upalpha{\text{p}}}}\), Model 2A in Table 2), which is only 0.46% lower than that from the mixed model with additive and dominance effects. These slight differences in both heritability and accuracy of prediction between the additive model and the model with additive and dominance effects indicates that additive SNP effects were the primary genetic effects that affect teat number and that dominance SNP effects only had a negligible contribution to the prediction accuracy for teat number.

### GWAS results

The GWAS that was done with EPISNP2, which accounted for the sib intraclass correlation and was implemented by a GLS analysis [11, 12], identified 73 SNPs on chromosomes 1, 6, 7, 10, 11, 12 and 14 with additive effects but no SNP with dominance effects reached genome-wide significance with the Bonferroni multiple testing correction (p < 10^{−5.91}) (Fig. 3a, b; Table 3; Additional file 1: Table S1). LS analysis of PLINK [13] and EPISNP1 [11] with stratification correction using the first 35 dimensions of MDS as fixed covariates, identified 54 and 21 significant SNPs, respectively (Fig. 3c, d; Additional file 1: Table S1). Twelve SNPs detected by PLINK and two SNPs detected by EPISNP1 did not overlap with the SNPs detected by EPISNP2. Eighteen SNPs detected by EPISNP1 overlapped with those detected by EPISNP2 and PLINK. For this dataset, EPISNP1 was the most conservative for declaring significance. We report SNPs detected by EPISNP2 because they all had a substantial contribution to the broad-sense genomic heritability (see Additional file 1: Table S1). A graphical view of the GWAS results obtained by EPISNP2, PLINK and EPISNP1 for all autosomes is in Additional file 2: Figure S1.

To evaluate the impact of the significant SNPs on the phenotypic variance, we estimated the decreases in observed genomic narrow-sense heritability and prediction accuracy when the phenotypic values were adjusted for the estimated genotypic values of the significant SNPs. The results showed that the 85 significant SNPs identified by the three methods, i.e. EPISNP2, PLINK and EPISNP1, accounted for 28.5 to 28.8% of the genomic narrow-sense heritability (Table 1) and for 36.2 to 36.8% of the observed prediction accuracy (Model 1A and Model 2A in Table 2). These results show that many SNPs that were deemed insignificant by the GWAS analysis were relevant for genomic prediction of teat number. Each of the 85 SNPs had a relatively large contribution to the genomic heritability and prediction accuracy, with the contribution of each SNP to the observed genomic narrow-sense heritability ranging from 0.7 to 7.3% and relative contribution of each SNP to the observed prediction accuracy ranging from 0.5 to 5.6% (see Additional file 1: Table S1).

### Analysis of the region between 102.9 and 106.0 Mb on chromosome 7

A cluster of 14 SNPs within or near the *PTGR2*, *FAM161B*, *LIN52*, *VRTN*, *FCF1*, *AREL1* and *LRRC74A* genes in the region between 102.9 and 106.0 Mb on chromosome 7 had the most significant effects on teat number with genome-wide significance (Fig. 4a). Based on the GLS analysis of EPISNP2, the two SNPs upstream of *PTGR2* had the most significant additive effects, followed by the three SNPs within and upstream of *AREL1*, whereas the LS analysis of PLINK and EPISNP1 with stratification correction ranked the three *AREL1* SNPs as the most significant and the two SNPs upstream of *PTGR2* in the 6th and 7th positions (see Additional file 1: Table S1). The six most significant SNPs in the region between 102.9 and 103.8 Mb on chromosome 7 accounted for 7.4% of the genomic additive heritability and 7.0% of the observed prediction accuracy in the tenfold validation study (Table 3), and all the 14 SNPs in this region with genome-wide significance accounted for 10.0% of the genomic narrow-sense heritability and 8.0% of the observed prediction accuracy. Removal of the genotypic effects of the 14 SNPs by fitting these SNPs as fixed effects in the model for EPISNP2 removed all significant effects in the region between 102.9 and 103.8 Mb on chromosome 7 and also removed the significant effects of seven SNPs in the region between 116.1 and 117.4 Mb on chromosome 7 (Fig. 4b). SNPs within or near the *AREL1*, *PTGR2*, *FMA161B*, *LIN52* and *LRRC74A* genes also had the largest contributions to the genomic narrow-sense heritability and prediction accuracy (Fig. 4c). We did not detect SNPs within the *VRTN* gene but a SNP upstream of and nearest to *VRTN* (S7_103355294) was highly significant, ranking 6th based on EPISNP2 and 5th based on PLINK and EPISNP1. Linkage disequilibrium (LD) analysis using Haploview [20] showed that the two SNPs flanking *VRTN* were in strong LD (\({\text{D}}^{\prime } = \, 0.90\), Fig. 4d), implying that either of these two SNPs could also be in strong LD with *VRTN*. Therefore, assuming that *VRTN* is a causal gene, the significant effect of S7_103355294 could be a linked effect of *VRTN*. The LD analysis showed that the significant effects of the region between 116.1 and 117.4 Mb on chromosome 7 could also be due to LD with the region between 102.9 and 106.0 Mb since three of the seven SNPs in the former region were in low LD with five significant SNPs in the latter region (\({\text{D}}^{\prime } = \, 0.13\;{\text{to}}\;0.24\), Fig. 4d). This did not consider the possibility of multilocus LD between the two regions. The low LD was the only known reason that could explain the disappearance of the significant QTL effects of the region between 116.1 and 117.4 Mb when the 14 SNPs in the region between 102.9 and 106.0 Mb were fitted as fixed effects.

### Other chromosomes with significant SNPs

In addition to chromosome 7, SNPs on chromosomes 1, 6, 10, 11, 12 and 14 also had significant effects with genome-wide significance (Table 3; Additional file 1: Table S1). Among these chromosomes, the region between 77.3 and 79.7 Mb on chromosome 11 had the most significant additive effects. The top six SNPs within or near the *FGF14*, *BIVM*, *LOC10216759,* and *LOC102167785* genes accounted for 5.1% of the genomic narrow-sense heritability and 5.2% of the observed prediction accuracy (Table 3). The remaining regions on chromosomes 1, 7, 11 and 12 each with at least six SNPs accounted for 0.6 to 3.1% of the genomic narrow-sense heritability and accounted for 1.2 to 2.6% of the observed prediction accuracy (Table 3).

## Discussion

### Comparison with previous GWAS results

The region between 102.9 and 106.0 Mb on chromosome 7 that was identified in the current study was also reported in several previous GWAS but with varying lengths, e.g., between 102.1 and 105.2 Mb [2], 102.9 and 105.2 Mb [4], 103.0 and 103.6 Mb [3], the *VRTN* gene [5], the *VRTN*–*PROX2*–*FOS* region that is equivalent to the region between 103.4 and 104.3 Mb based on a microsatellite study [21], and the region between 106.7 and 106.9 Mb [7]. Five significant SNPs on chromosome 10 were located close to some previously reported significant regions on this chromosome [2], and a significant SNP at 51.6 Mb on chromosome 12 was close to the previously reported chromosome 12 region between 52.9 and 52.6 Mb [2, 4]. In the current study, the region between 77.7 and 79.7 Mb on chromosome 11 was the second most significant chromosome region, which, to our knowledge, has not been reported in previous GWAS. However, the Animal QTLdb [1] has one entry for pig teat number in the region between 79.2 and 85.7 Mb on chromosome 11, which partially overlaps the region between 77.7 and 79.7 Mb that we detected in this study, and this Animal QTLdb result was based on a QTL mapping study using 137 microsatellite markers on 573 F2 females and 530 F2 males from a Meishan × Large White cross [22].

### Heritabilities and factors that affect teat number

The genomic narrow-sense heritability estimates reported in the current study are the only ones available for teat number. Estimated narrow-sense heritabilities ranged from 0.346 to 0.350 (Table 1) and were within the range of recently published heritability estimates based on pedigree relationships, e.g., 0.39 in a study using 57,000 Yorkshire pigs [23], and 0.37 in a study using 1550 Landrace pigs [3]. The above genomic and pedigree-based heritability estimates indicate that swine teat number has a strong genetic component, and also that a large portion of the phenotypic variation is not explained by additive genetic effects. Only one GWAS reported dominance effects on chromosome 4 in Landrace pigs [3]. Our GWAS results differ from the reported dominance effects on chromosome 4 because we found that many other chromosomes had more significant dominance effects than chromosome 4, although none of the dominance effects that we observed reached the genome-wide significance threshold of p < 10^{−5.91} (Fig. 3b). The estimated dominance heritability was low (0.036) and inclusion of dominance effects in the prediction model only had negligible effects on the observed prediction accuracy. Removing dominance effects from the mixed model resulted only in a 0.5% reduction in the observed accuracy of predicting phenotypic values, although the reductions in the expected accuracy of predicting phenotypic values (7.6%) and in the expected accuracy of predicting genetic values (3.9%) were considerably larger for unknown reasons (Table 2). Previously, a maternal effect on teat number was reported [24] but has not been studied by GWAS or genomic prediction. Data error is a source of phenotypic variation, but teat number is easy to measure and any data errors that might have occurred would only be minor, given that our GWAS results on chromosome 7 agreed with those of other studies [2, 3] and that our genomic heritability estimates are consistent with those based on pedigree data [3, 23].

### SNP contributions to genomic heritability and prediction accuracy

We analyzed three methods (Methods I, II and III) of estimating the contribution of a set of SNPs to genomic heritability and compared Methods II and III for estimating the contributions of SNPs to both genomic heritability and prediction accuracy. Method I for estimating the contribution of SNPs to genomic heritability consisted of summing together the heritability estimates of the target set of SNPs because the narrow-sense, dominance and broad-sense heritabilities for each SNP can be estimated individually by the GVCBLUP package [18]. However, this method was inappropriate for estimating SNP contributions to the phenotypic variance because the heritability estimate for each SNP decreases as the number of SNPs in the mixed model increases and is approximately proportional to \(1/{\text{m}}\), where \({\text{m}}\) is the number of SNPs in the mixed model (see “Appendix” for an approximate proof).

The results in Table 4 show the dependency of the size of the heritability estimate on the number of SNPs fitted as random effects in the mixed model. When dropping every other SNP from the model (i.e. reducing from 41,108 to 20,554 SNPs), the average \({\text{h}}_{{\upalpha{\text{i}}}}^{2}\) (narrow-sense heritability of the \(i\)th SNP) of each SNP for the same 20,554 SNPs when all 41,108 SNPs were fitted in the model nearly doubled (\({\bar{\text{h}}}_{{\upalpha1}}^{2} /{\bar{\text{h}}}_{{\upalpha2}}^{2} = \, 1.960\)), while the total narrow-sense heritability was nearly unaffected (\(\widehat{\text{h}}_{{\upalpha1}}^{2} = \, 0.360\) and \(\widehat{\text{h}}_{{\upalpha3}}^{2} = \, 0.368\)), showing that the size of the heritability estimate for each SNP was approximately divided by 2 when the model had twice as many SNPs. Therefore, the heritabilities of the significant SNPs are not suitable for measuring their contributions to the phenotypic variance due to the dependency of the size of the heritability estimate for each SNP on the number of SNPs fitted as random effects in the mixed model.

Method II for estimating the contribution of a set of target SNPs to genomic heritability and prediction accuracy consisted in calculating the difference between the model with all SNPs and the model without the target SNPs. However, removing the target SNPs from the statistical model may not completely remove their effects because some of them could be explained by other SNPs in LD with the target SNPs. The results in Table 4 supported this expectation, i.e., the total narrow-sense heritability when halving the number of SNPs fitted was nearly unaffected. Using Method II, the 85 significant SNPs accounted for 4.5% of the total genomic heritability (Table 1) and 2.5% of the observed prediction accuracy (Model 3A, Table 2). Due to the partial effects of the removed SNPs that could have been explained by other SNPs, the contributions of SNPs to genomic heritability and observed prediction accuracy using Method II can be considered as the lower bound of the SNP contributions.

Method III for estimating SNP contribution to genomic heritability and to prediction accuracy calculated the difference between the model with all SNPs fitted as random effects and the model with the target SNPs fitted as fixed effects, an approach that we refer to as ‘partial heritability’ and ‘partial accuracy’. The example of the region between 102.9 and 106.3 Mb on chromosome 7 showed that fitting significant SNPs as fixed effects completely removed the significant effects of those SNPs and also removed the effects of the SNPs that are still fitted in the statistical model as random effects but are in LD with the SNPs fitted as fixed effects (Fig. 4b). Figure 5 is a graphical view of a specific chromosome region, showing that the contributions of SNPs to the total narrow-sense heritability from the two models with 41,108 SNPs and 20,554 SNPs estimated using Method III were nearly the same for the region between 102.9 and 106.0 Mb on chromosome 7, i.e., partial heritability estimates were nearly unaffected by the number of SNPs in the model. Using this method, the 85 significant SNPs accounted for 28.5 to 28.8% of the genomic narrow-sense heritability (Table 1) and for 36.2 to 36.8% of the observed prediction accuracy (Table 2). In general, contributions of SNPs to genomic heritability and to the observed prediction accuracy were consistent, i.e., most SNPs with higher contributions to heritability also had greater contributions to prediction accuracies. On average, the contributions of SNPs obtained with Method III were larger than those with Method II by 24.0% (4.6 to 28.6%) for the genomic additive heritability and by 33.3% (2.3 to 35.6%) for the observed prediction accuracy (Table 2). Such large differences could be due to two reasons: overestimation by Method III and underestimation by Method II. Overestimation by Method III is expected since some effects that do not come from the target SNPs that are fitted as fixed effects could also be removed, in addition to removing the effects of the target SNPs. e.g., fitting the 14 SNPs in the region between 102.9 and 106.0 Mb on chromosome 7 as fixed effects also removed the significant effects of SNPs in the region between 116.1 and 117.4 Mb on chromosome 7 (Fig. 4b). Therefore, the contributions of SNPs estimated by Method III could be considered as the upper bound of the true SNP contributions. However, underestimation of Method II is likely the main reason for the large differences between Methods II and III, because a large percentage of the effects of the removed SNPs could have been explained by other SNPs in the model. For the sample in Table 4, the effects of half of the 41,108 SNPs were almost completely explained by the remaining half of the 41,108 SNPs because estimates of genomic heritabilities from those two sets of SNPs were almost the same, as we discussed above. Based on this analysis, we report contributions of SNPs by Method III in the abstract but also show the results obtained with Method II in the main body of the article (Tables 1, 2).

### Observed and expected prediction accuracies

In the current study, we compared three measures of prediction accuracy: the observed prediction accuracy of predicting phenotypic values based on the correlation between predictions from GBLUP and phenotypic observations of the validation individuals (\(\widehat{\text{R}}_{{0{\text{jp}}}} ,\;{\text{j}} =\upalpha\;{\text{or}}\; {\text{t}}\)), the expected accuracy of predicting phenotypic values (\({\text{R}}_{{0{\text{jp}}}}\)), and the expected accuracy of predicting genetic values (\({\text{R}}_{{0{\text{j}}}}\)). For the models using all 41,108 SNPs fitted as random effects (Model 1A and Model 2A in Table 2), we found excellent consistency between the observed accuracies of predicting phenotypic values (\(\widehat{\text{R}}_{{0{\text{tp}}}} = \, 0.437\) for Model 1A and \(\widehat{\text{R}}_{{0\upalpha{\text{p}}}} = \, 0.435\) for Model 2A) and the expected accuracies of predicting phenotypic values (\({\text{R}}_{{0{\text{tp}}}} = \, 0.460\) for Model 1A and \({\text{R}}_{{0\upalpha{\text{p}}}} = \, 0.425\) for Model 2A). For the models using the 85 SNPs as fixed effects to remove the genetic values of those SNPs from the phenotypic values (Model 1B and Model 2B in Table 2), some differences between the observed accuracies of predicting phenotypic values (\(\widehat{\text{R}}_{{0{\text{tp}}}} = 0.279\) for Model 1B and \(\widehat{\text{R}}_{{0\upalpha{\text{p}}}} = 0.275\) for Model 2B) and between the expected accuracies of predicting phenotypic values (\({\text{R}}_{{0{\text{tp}}}} = \, 0.360\) for Model 1B and \({\text{R}}_{{0\upalpha{\text{p}}}} = \, 0.320\) for Model 2B) were larger (Table 2), but those differences were mostly within one standard deviation of the observed accuracies and should be considered as acceptable. As expected, both observed and expected accuracies of predicting phenotypic values (\(\widehat{\text{R}}_{{0{\text{jp}}}} \;{\text{and}}\;{\text{R}}_{{0{\text{jp}}}}\)) were lower than the expected accuracies of predicting genetic values (\({\text{R}}_{{0{\text{j}}}}\)).

## Conclusions

Swine teat number has a strong genetic component with narrow-sense heritability estimates of about 0.365. The GWAS results confirmed the previously reported region on chromosome 7 and identified several new regions associated with swine teat number; they also indicated that the additive effects are the primary genetic effects for teat number and indicated consistency between statistical significance of SNP effects and SNP contribution to the genomic heritability. Most SNPs with higher statistical significance also had greater contributions to the genomic broad-sense heritability and prediction accuracy. The 85 significant SNPs accounted for about 28% of the genomic heritability and 36% of the prediction accuracy.

## References

Hu ZL, Park CA, Wu XL, Reecy JM. Animal QTLdb: an improved database tool for livestock animal QTL/association data dissemination in the post-genome era. Nucleic Acids Res. 2013;41:D871–9.

Duijvesteijn N, Veltmaat JM, Knol EF, Harlizius B. High-resolution association mapping of number of teats in pigs reveals regions controlling vertebral development. BMC Genomics. 2014;15:542.

Lopes MS, Bastiaansen JW, Harlizius B, Knol EF, Bovenhuis H. A genome-wide association study reveals dominance effects on number of teats in pigs. PLoS One. 2014;9:e105867.

Verardo LL, Silva FF, Lopes MS, Madsen O, Bastiaansen JW, Knol EF, et al. Revealing new candidate genes for reproductive traits in pigs: combining Bayesian GWAS and functional pathways. Genet Sel Evol. 2016;48:9.

Yang J, Huang L, Yang M, Fan Y, Li L, Fang S, et al. Possible introgression of the

*VRTN*mutation increasing vertebral number, carcass length and teat number from Chinese pigs into European pigs. Sci Rep. 2016;6:19240.Lee JB, Jung EJ, Park HB, Jin S, Seo DW, Ko MS, et al. Genome-wide association analysis to identify SNP markers affecting teat numbers in an F2 intercross population between Landrace and Korean native pigs. Mol Biol Rep. 2014;41:7167–73.

Arakawa A, Okumura N, Taniguchi M, Hayashi T, Hirose K, Fukawa K, et al. Genome-wide association QTL mapping for teat number in a purebred population of Duroc pigs. Anim Genet. 2015;46:571–5.

Clayton GA, Powell JC, Hiley PG. Inheritance of teat number and teat inversion in pigs. Anim Prod. 1981;33:299–304.

Poland JA, Brown PJ, Sorrells ME, Jannink JL. Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS One. 2012;7:e32253.

Glaubitz JC, Casstevens TM, Lu F, Harriman J, Elshire RJ, Sun Q, et al. TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One. 2014;9:e90346.

Ma L, Runesha HB, Dvorkin D, Garbe J, Da Y. Parallel and serial computing tools for testing single-locus and epistatic SNP effects of quantitative traits in genome-wide association studies. BMC Bioinformatics. 2008;9:315.

Ma L, Wiggans GR, Wang S, Sonstegard TS, Yang J, Crooker BA, et al. Effect of sample stratification on dairy GWAS results. BMC Genomics. 2012;13:536.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.

Freedman ML, Reich D, Penney KL, McDonald GJ, Mignault AA, Patterson N, et al. Assessing the impact of population stratification on genetic association studies. Nat Genet. 2004;36:388–93.

Mao Y, London NR, Ma L, Dvorkin D, Da Y. Detection of SNP epistasis effects of quantitative traits using an extended Kempthorne model. Physiol Genomics. 2006;28:46–52.

Da Y, Wang C, Wang S, Hu G. Mixed model methods for genomic prediction and variance component estimation of additive and dominance effects using SNP markers. PLoS One. 2014;9:e87666.

Wang C, Da Y. Quantitative genetics model as the unifying model for defining genomic relationship and inbreeding coefficient. PLoS One. 2014;9:e114484.

Wang C, Prakapenka D, Wang S, Pulugurta S, Runesha HB, Da Y. GVCBLUP: a computer package for genomic prediction and variance component estimation of additive and dominance effects. BMC Bioinformatics. 2014;15:270.

Legarra A, Robert-Granié C, Manfredi E, Elsen JM. Performance of genomic selection in mice. Genetics. 2008;180:611–8.

Barrett JC, Fry B, Maller J, Daly M. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–5.

Ren DR, Ren J, Ruan GF, Guo YM, Wu LH, Yang GC, et al. Mapping and fine mapping of quantitative trait loci for the number of vertebrae in a White Duroc × Chinese Erhualian intercross resource population. Anim Genet. 2012;43:545–51.

Bidanel J, Rosendo A, Iannuccelli N, Riquet J, Gilbert H, Caritez J, Billon Y, Amigues Y, Prunier A, Milan D. Detection of quantitative trait loci for teat number and female reproductive traits in Meishan × Large White F2 pigs. Animal. 2008;2:813–20.

Lundeheim N, Chalkias H, Rydhmer L. Genetic analysis of teat number and litter traits in pigs. Acta Agric Scand A. 2013;63:121–5.

Pumfrey RA, Johnson RK, Cunningham PJ, Zimmerman DR. Inheritance of teat number and its relationship to maternal traits in swine. J Anim Sci. 1980;50:1057–60.

Da Y, Tan C, Parakapenka D. Joint SNP-haplotype analysis for genomic selection based on the invariance property of GBLUP and GREML to duplicate SNPs. J Anim Sci. 2016;94:161–2.

## Authors’ contributions

XH, YD, NL and CT designed the experiments. ZW, DL, RZ and XH performed data collection. CT, JR and ZH performed the experiments. CT and YD analyzed the data. DP developed some of the analysis software. YD, CT and XHu wrote the manuscript. All authors read and approved the final manuscript.

### Funding

This project was supported by the National Basic Research Program of China (2014CB138501), the 948 Program of the Ministry of Agriculture of China (2012-G1(4)), and the National High Technology Research and Development Program of China (2011AA100301).

### Competing interests

The authors declare that they have no competing interests.

### Availability of data and materials

The datasets supporting the results of this article are included within the article and its additional files. The phenotypic data and original SNP data are private property of Guangdong Wens Foodstuffs Group and are not for public distribution at the present time.

### Ethics statement

All animal work was conducted according to the guidelines for the care and use of experimental animals established by the Ministry of Science and Technology of the People’s Republic of China (Approval Number: 2006-398). Ear tissues were collected using the standard method approved by the Animal Welfare Committee of China Agricultural University (Permit Number: SKLAB-2014-04-02).

## Author information

### Authors and Affiliations

### Corresponding authors

## Additional files

### 12711_2017_311_MOESM1_ESM.xlsx

**Additional file 1: Table S1.** Significant SNP effects with Bonferroni significance (p < 10^{−5.91}) by three methods of GWAS analysis.

### 12711_2017_311_MOESM2_ESM.pdf

**Additional file 2: Figure S1.** Manhattan plots of additive SNP effects of all 18 autosomes by three methods of GWAS analysis. All p-values in the figures are in log(1/p) scale.

## Appendix: Approximate proof for the decrease in SNP heritability as the number of SNPs increases

### Appendix: Approximate proof for the decrease in SNP heritability as the number of SNPs increases

The heritability estimate for each SNP decreases as the number of SNPs in the mixed model increases and is approximately proportional to \(1/{\text{m}}\), where \({\text{m}}\) is the number of SNPs in the mixed model. An approximate mathematical proof for this result can be derived based on the invariance property of GBLUP and GREML to duplicating SNPs. Assuming a set of \({\text{m}}\) SNPs is duplicated \({\text{r}}\) times in the mixed model, GBLUP of genetic values (additive, dominance and genotypic values) of individuals and SNP genetic variance components, as well as the associated variance estimates by GREML are invariant to the duplication of SNPs, and GBLUP of SNP additive, dominance and genotypic effects differ from those without duplicate SNPs by the square root of \({\text{r}}\) [25]. In the example of additive SNP effects, \(\widehat{\upalpha}_{\text{ri}} = \widehat{\upalpha}_{\text{i}} /\sqrt {\text{r}}\), where \(\widehat{\upalpha}_{\text{ri}}\) is the additive GBLUP estimate of the \(i\)th SNP from the mixed model with \({\text{m}}\) SNPs repeated \({\text{r}}\) times, \(\widehat{\upalpha}_{\text{i}}\) is the additive GBLUP estimate of the \(i\)th SNP from the model with \({\text{m}}\) SNPs. The additive heritability for the \(i\)th SNP from the mixed model with \({\text{m}}\) SNPs (\({\text{h}}_{{\upalpha{\text{i}}}}^{2}\)) is: \({\text{h}}_{{\upalpha{\text{i}}}}^{2} = \left( {\widehat{\upalpha}_{\text{i}}^{2} /\sum\nolimits_{{{\text{i}} = 1}}^{\text{m}} {\widehat{\upalpha}_{\text{i}}^{2} } } \right){\text{h}}_{\upalpha}^{2}\) [18], where \(\widehat{\upalpha}_{\text{i}}\) is the additive GBLUP of the \(i\)th SNP and \({\text{h}}_{\upalpha}^{2}\) is the total additive heritability based on all SNPs. Since \({\text{h}}_{\upalpha}^{2}\) is unaffected by repeated SNPs, the additive heritability for the \(i\)th SNP from the mixed model with \({\text{m}}\) SNPs repeated \({\text{r}}\) times (\({\text{h}}_{{\upalpha{\text{ri}}}}^{2}\)) is:

i.e., the heritability for the \(i\)th SNP from the model with \({\text{r}}\) times of the \({\text{m}}\) SNPs is \(1/{\text{r}}\) of the SNP heritability with \({\text{m}}\) SNPs in the model without repeat. This theoretical result under the simple assumption of repeated SNPs was almost the same as the results obtained with the real data in Table 4, i.e., when the number of SNPs is reduced by half, the heritability of each SNP as an average of all SNP heritability estimates nearly doubled.

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

## About this article

### Cite this article

Tan, C., Wu, Z., Ren, J. *et al.* Genome-wide association study and accuracy of genomic prediction for teat number in Duroc pigs using genotyping-by-sequencing.
*Genet Sel Evol* **49**, 35 (2017). https://doi.org/10.1186/s12711-017-0311-8

Received:

Accepted:

Published:

DOI: https://doi.org/10.1186/s12711-017-0311-8