High-resolution analysis of selection sweeps identified between fine-wool Merino and coarse-wool Churra sheep breeds

Gutiérrez-Gil, Beatriz; Esteban-Blanco, Cristina; Wiener, Pamela; Chitneedi, Praveen Krishna; Suarez-Vega, Aroa; Arranz, Juan-Jose

doi:10.1186/s12711-017-0354-x

Research Article
Open access
Published: 07 November 2017

High-resolution analysis of selection sweeps identified between fine-wool Merino and coarse-wool Churra sheep breeds

Beatriz Gutiérrez-Gil ORCID: orcid.org/0000-0001-7990-5723¹,
Cristina Esteban-Blanco^1,2,
Pamela Wiener³,
Praveen Krishna Chitneedi¹,
Aroa Suarez-Vega¹ &
…
Juan-Jose Arranz¹

Genetics Selection Evolution volume 49, Article number: 81 (2017) Cite this article

7299 Accesses
18 Citations
4 Altmetric
Metrics details

Abstract

Background

With the aim of identifying selection signals in three Merino sheep lines that are highly specialized for fine wool production (Australian Industry Merino, Australian Merino and Australian Poll Merino) and considering that these lines have been subjected to selection not only for wool traits but also for growth and carcass traits and parasite resistance, we contrasted the OvineSNP50 BeadChip (50 K-chip) pooled genotypes of these Merino lines with the genotypes of a coarse-wool breed, phylogenetically related breed, Spanish Churra dairy sheep. Genome re-sequencing datasets of the two breeds were analyzed to further explore the genetic variation of the regions initially identified as putative selection signals.

Results

Based on the 50 K-chip genotypes, we used the overlapping selection signals (SS) identified by four selection sweep mapping analyses (that detect genetic differentiation, reduced heterozygosity and patterns of haplotype diversity) to define 18 convergence candidate regions (CCR), five associated with positive selection in Australian Merino and the remainder indicating positive selection in Churra. Subsequent analysis of whole-genome sequences from 15 Churra and 13 Merino samples identified 142,400 genetic variants (139,745 bi-allelic SNPs and 2655 indels) within the 18 defined CCR. Annotation of 1291 variants that were significantly associated with breed identity between Churra and Merino samples identified 257 intragenic variants that caused 296 functional annotation variants, 275 of which were located across 31 coding genes. Among these, four synonymous and four missense variants (NPR2_His847Arg, NCAPG_Ser585Phe, LCORL_Asp1214Glu and LCORL_Ile1441Leu) were included.

Conclusions

Here, we report the mapping and genetic variation of 18 selection signatures that were identified between Australian Merino and Spanish Churra sheep breeds, which were validated by an additional contrast between Spanish Merino and Churra genotypes. Analysis of whole-genome sequencing datasets allowed us to identify divergent variants that may be viewed as candidates involved in the phenotypic differences for wool, growth and meat production/quality traits between the breeds analyzed. The four missense variants located in the NPR2, NCAPG and LCORL genes may be related to selection sweep regions previously identified and various QTL reported in sheep in relation to growth traits and carcass composition.

Background

Approximately 5000 years ago, humans began to select sheep for desired characteristics (e.g., coat color, horns, meat, wool) which resulted in the development of different breeds [1]. Initially, sheep were reared mainly for meat; later, specialization for ‘secondary’ products, such as wool, emerged [2,3,4]. Sheep that have been selected for secondary products appear to have replaced the more primitive domestic populations. Selection for such phenotypes has left detectable signatures of selection within the genome of modern sheep. Due to the very strong selection intensity involved in animal breeding, these changes are expected to occur faster than those due to natural selection. Selection not only affects a favored mutation by rapidly increasing its frequency in the population, but it also produces a hitch-hiking effect of the frequency of neutral alleles at linked loci [5, 6]; these patterns in allele frequencies are known as selection signatures.

The signatures of selection in the genome, also known as selection sweeps, can be detected under the assumption that selection is locus-specific whereas other evolutionary forces such as random genetic drift, mutation and inbreeding, should be expressed genome-wide [7]. Hence, a variety of methods and statistics have been developed with the aim of identifying the selected loci at which allele frequencies have changed following a pattern that is consistent with positive selection. They can be based on between-population differentiation, reductions in local variability, deviations in the site frequency spectrum (SFS), and increases in linkage disequilibrium (LD) and extended haplotype structure [8,9,10]. Methods for detecting signatures of selection have historically been challenged by the confounding effects of demography; for example, recent population growth will result in an excess of rare variants compared to equilibrium expectation [11], and also recent and weak bottlenecks tend to mimic the effects of a selection sweep in several ways [12]. However, demographic events apply to the whole genome, whereas selective events affect different regions of the genome to various extents thanks to recombination [13]. This gives the possibility of distinguishing the two hypotheses by sampling several loci: a more or less common pattern is expected in the case of a bottleneck, while selective sweeps generate heterogeneity across loci [12]. Regions of low recombination may also produce an upward bias in the detection of signatures of selection, although apart from the bias issue, it should be noted that genuine selection sweeps in these regions will leave much stronger signals in regions of average recombination rate [14].

Another issue relates to the limitations of some selection mapping approaches; the standard approaches to detect signatures of selection consider “hard sweeps” where the new advantageous mutations spread rapidly to fixation, purging variation at linked sites as they spread. However, recent studies highlight the potential importance of ‘soft sweeps’, i.e., sweeps from standing variation, or sweeps in which multiple mutations start to sweep simultaneously at a single locus [15]. Soft sweeps, which are often related to adaptation, leave more subtle signatures in the genome (e.g. diversity is not necessarily reduced in the vicinity of the adaptive locus as with hard sweeps) and thus are more difficult to detect [16].

In past years, many genome screening studies based on high-density, genome-wide single nucleotide polymorphism (SNP) panels (i.e., SNP-chips) have been conducted with the goal of detecting signatures of selection in livestock species [17,18,19]. More recently, whole-genome re-sequencing has emerged as an economically feasible tool for assessing genomic variation within and among populations, and the large-scale information derived from the new sequencing technologies can be exploited to identify signatures of selection [20] or further explore previously detected signatures of selection.

The Sheep HapMap project, for which genotypes were generated from 3004 domestic sheep from 71 breeds using the Illumina OvineSNP50K BeadChip assay (50 K-chip), generated valuable information that can be used to perform analyses of signatures of selection in sheep [21]. Global analyses of genetic differentiation in the Sheep HapMap dataset identified several genomic regions that contained genes for coat pigmentation, skeletal morphology, body size, growth, and reproduction [21, 22]. Many of these regions were later confirmed by haplotype-based selection sweep mapping [23, 24]. Also based on this dataset, as well as additional information in some cases, signatures of selection have been reported in thin and fat tail sheep breeds [25] and in specialized European dairy sheep breeds [19]. Further selection mapping studies have identified signatures of selection related to resistance/susceptibility to gastrointestinal nematodes [26], adaptation to different ecoregions [27] or climate adaptation [28, 29]. Information from additional studies using the 50 K-chip to study the biodiversity of sheep breeds [30,31,32] can also help to extend our current knowledge on the ovine genomic regions that have been affected by human-driven selection.

In sheep, selection for wool traits has been extensively carried out for several centuries. Spanish Merino, which was developed since the late Middle Ages [33], appears to have originated during Roman times through the introduction of fine wool ewes from the Southern Italian region of Apulia into Spain and the later selection for white wool color through crosses with African rams imported by Arabs [34] at the beginning of the Middle Ages. Due to the value of their fiber, the Honourable Council of the Mesta strongly protected Merino flocks, and their exportation was strictly forbidden for several centuries. Removal of these restrictions in the eighteenth century led to the dispersal of Merino sheep to Eastern Europe, China, Australia and New Zealand [34]. The first Merino sheep were introduced from South Africa into Australia in 1797 [35]. In this country, intensive selective breeding has enhanced the already fine quality of the wool to produce Australian Merino wool, which based on its long, fine fibers, enables the production of lighter and softer wool fabrics. Hence by 1870, Australian Merino wool industry was the global leader in both the quantity and quality of its wool production. However, in addition to wool, Australian Merino also plays an important role in lamb meat production. Over the last two decades, the Australian sheep meat industry has delivered large increases in lamb production and profitability, with genetic improvement in growth, leanness and muscling contributing substantially to these gains [36, 37]. Australian Merino flocks have also been selected for disease resistance, in particular, by focusing on gastrointestinal nematode parasites, flystrike (cutaneous myiasis) and footrot [38]. Specifically, the three Merino lines considered in this paper have been subjected to selection pressure to reduce susceptibility to parasites [21].

The goal of our work was the identification of regions of selection sweeps related to traits for which Australian fine-wool Merino breeds have been selected. Considering the Iberian origin of Merino breeds and the estimated divergence time between sheep breeds derived from the analysis of the Sheep HapMap project dataset [21], we analyzed genome-wide SNP information from three Australian Merino breeds that are highly specialized for the production of fine wool (Australian Industry Merino, Australian Merino and Australian Poll Merino) and the related coarse-wool breed, Spanish Churra. This is an autochthonous double-purpose breed of the northwest region of Castilla y León in Spain. Traditional Churra flocks are managed based on an intermediate level of dairy specialization (the dairy breeding program was started in 1986 [39]) with a variable fraction of the farm income derived from the sale of suckling lamb meat. The milk is used to produce cheese of high quality value, which is covered by a protected geographical indication (PGI) [40].

With the aim of further exploring the genetic variation in genomic regions that show evidence of selection, whole-genome sequence data from Churra and Australian Merino samples were subsequently analyzed. By identifying the SNPs within the regions of interest that exhibited the most extreme divergence in allele frequencies between the Churra and Merino datasets, this study provides a detailed survey of the genetic variation that underlies the identified regions of selection sweeps.

Methods

Mapping of selection sweeps

Breed selection

In order to identify selection sweeps related to fine-wool production, three Merino lines described as “with extreme fine wool” from the International Sheep Genomics (ISGC) dataset were included in this study. According to the divergence times estimated for the breeds included within the Sheep HapMap project, based on LD and haplotype sharing, these three Merino lines show a recent divergence time (0 to 80 generations) (Figure S10 and Fig. 3 from Kijas et al. [21]). With the aim of providing an appropriate comparison to identify signatures of selection related to wool production, we selected the Spanish Churra sheep breed, which is a coarse-wool breed related to Merino, as shown by the population analysis reported by Fariello et al. [22] in which these two breeds are grouped together within the defined South West European group. According to a haplotype-sharing analysis, Churra sheep show a short and consistent divergence time with each of the three Merino lines (160 to 240 years, see Figure S10 and Fig. 3 from Ref [21]), which supports our study design in which the three Australian sheep breeds selected are considered together against Churra sheep. In addition to wool characteristics, Merino and Churra sheep differ in other traits (see Additional file 1: Table S1) (Fig. 1). Briefly, adult animals of the Australian Merino lines are larger than Churra sheep individuals, whereas weight at birth is similar in the two breeds. Both breeds show white wool color although Churra sheep show characteristic black patches around the eyes, ears and the ends of legs. Note that because the two breeds differ in various traits, the signatures of selection identified here may be related not only to wool traits but also to other phenotypes for which the selection pressure performed in the two breeds differs. Considering the possibility that the geographical isolation and distance between the two studied populations could be a confounding effect for the identified selection sweeps, we performed additional validation analyses by contrasting Churra and Spanish Merino breeds.

Genotypes, quality control and analysis of population structure

We included in this work an initial subset of SNP genotypes for the ovine 50 K-chip that were generated within the framework of the Sheep HapMap project [21], and which are available upon request (http://www.sheephapmap.org/termsofaccess.php). The extracted subset included 332 samples from the Australian Industry Merino (n = 88), Australian Merino (n = 50), Australian Poll Merino (n = 98) and Spanish Churra (n = 96) breeds. In addition, 184 DNA samples of Churra sires included in the selection nucleus of the National Association of Spanish Churra Breeders were also genotyped with the same SNP array. These samples were extracted from semen samples following a classical phenol–chloroform DNA extraction protocol [41], and genotyped by an external laboratory service. The raw genotypes for the 54,241 SNPs included in the genotyping platform were first analyzed with the GenomeStudio software (Illumina) (GenCall score for raw genotypes > 0.15) which was used to extract the genotypes in standard format for the Plink_v1.09 software [42].

The HapMap project samples had already been subjected to quality control (QC) filtering [21], resulting in 49,034 SNPs available for analysis. To join the two separate datasets, we first merged the new Churra dataset (n = 184; 54,241 SNPs) and the HapMap dataset (n = 332; 49,304 SNPs) based on the common SNPs. We then selected the SNPs that mapped, with positions based on sheep genome assembly Oar_v3.1 [43], on the ovine autosomes, resulting in 47,415 SNPs. This dataset was then subjected to the following filtering criteria: (1) individual call rate higher than 90% (two Churra individuals genotyped by our group were removed) and (2) marker call rate higher than 90% (28 SNPs removed due to missing genotype data). Hence, 514 individuals (Churra = 278 and Merino = 236) and 47,387 SNPs were available for further analyses.

To evaluate the genetic structure of the data and confirm the number of different genetic populations, the genotypes of the three Merino fine wool breeds and the genotypes of the Churra individuals were analysed by principal component analysis (PCA) of allele sharing (using smartpca, implemented in Eigensoft [44]), and ancestry estimation (Admixture software [45]). The results of these analyses identified two clearly distinct genetic populations, corresponding to the Merino group and Churra sheep, for details on these analyses and description of results (see Additional file 2 and Additional file 3: Figures S1, S2 and S3). Genotypes were pooled into a single Australian Merino dataset for the three Australian Merino populations.

In addition, as a further validation analysis, we compared the genotypes of 20 randomly chosen Churra samples from the Sheep HapMap dataset and the 20 Spanish Merino samples genotyped by Ciani et al. [34].

Identification of candidate regions under selection using individual analyses

Several analyses between the complete set of Spanish Churra and Australian Merino genotypes were performed to detect candidate regions that harbor signatures of selection. First, a genetic differentiation analysis was used to contrast the Australian Merino and Churra genotypes by calculating the unbiased estimate of Weir and Cockerham’s F _ST [8] for each SNP, as described by Akey et al. [46]. In a second analysis, regions of reduced heterozygosity in the two groups were identified by estimating the observed heterozygosity (ObsHtz) for each SNP. For these two analyses, F _ST and ObsHtz values estimated for each SNP were each averaged across a sliding window of nine SNPs (e.g., F _ST_9SNPW). The size of the sliding window was based on a previous analysis by Gutierrez-Gil et al. [19] for a test control region encompassing the myostatin (GDF-8) gene, which is known to have been under selection in the Texel breed. The identification of candidate signatures of selection in each of the individual analyses was based on window estimates at the extreme of the empirical distributions, as suggested by Akey et al. [46] and has been used in a number of subsequent studies [18, 19, 47,48,49]. Specifically, we considered that a position carried a signature of selection if it was in the top 0.5th percent of the distributions for genetic differentiation (F _ST) or the bottom 0.5th percent for observed heterozygosity. The distribution of the physical sizes of windows based on the 9-SNP fixed-size criteria (average window size = 411.71 kb; average distance between central SNPs of consecutive windows = 51.53 kb) was found to be fairly narrow (98.28% of the windows were 200 to 600 kb long; only 0.74% of the windows were longer than 1000 kb) and thus should provide reasonable estimates of local genomic diversity, in contrast to analyses based on a low-density chip [50].

As a complementary approach to map selection sweeps, we used the hapflk_v1.3 software (https://forge-dga.jouy.inra.fr/projects/hapflk), which implements the FLK [51] and hapFLK [23] tests. The FLK metric tests the neutrality of polymorphic markers by contrasting their allele frequencies in a set of populations against what is expected under a neutral evolution scenario. The hapFLK statistic extends the FLK test to account for the differences in haplotype frequencies between populations. This method has been shown to be robust with respect to bottlenecks and migration [23]. To run the hapflk analysis, the Reynolds’ distances between the Churra and Merino populations were converted to a kinship matrix with an R script provided by the hapFLK developers (available at https://forge-dga.jouy.inra.fr/projects/hapflk/documents). Subsequently, by assuming 20 haplotype clusters in the LD model (-K 20; number of haplotype clusters determined by running a fastPHASE cross-validation analysis), the hapFLK statistics were later computed and averaged across 30 EM runs to fit the LD model (–nfit = 30). The standardization of the statistics using the corresponding python script provided with the software allowed the estimation of the associated P values from a standard normal distribution. To correct for multiple testing, we considered the threshold of the nominal P value as < 0.001 to identify the significant haplotypes, following previous studies using hapFLK analysis on the Sheep HapMap dataset [22, 24].

In addition, we used the rehh software [52] to perform an additional analysis based on the cross-population extended haplotype homozygosity (XP-EHH) test defined by Sabeti et al. [53]. This statistic compares the EHH profiles for bi-allelic SNPs between two populations and is defined, for a given allele, as the log of the ratio of the integrals of the EHH profiles between the two populations. The comparison between populations normalizes the effects of large-scale variation in recombination rates on haplotype diversity and has a high statistical power to detect sweeps that are close to fixation [53]. Alleles were designated at each locus as either minor (“1”, “ancestral”) or major (“2”, “derived”), based on their allele frequency in the overall population. Positive and negative XP-EHH estimates indicated positive recent selection in Churra and Merino, respectively. Based on the P values supplied by rehh, and for consistency with the threshold previously used for hapFLK, we considered as significant those positions showing a P value less than 0.001.

For the four selection sweep mapping analyses, positions that showed evidence of selection (i.e. included in the top/bottom 0.5th percent of the corresponding distribution or showing a P value less than 0.001) and within 0.150 Mb of each other were considered to be the result of the same selection sweep and were labeled, depending on the analysis method, as F _ST-SS, Merino-ObsHtz-SS, Churra-ObsHtz-SS, hapfFLK-SS and XP-EHH-SS. This criterion to connect identified SNPs into discrete regions was established based on an exploratory analysis of the extent of LD and the haplotype block structure of the Churra and Merino populations. Based on the results of LD analysis performed with Haploview_v4.2 [54] (for details see Additional file 4 and Additional file 5: Figure S4), and following Tang et al. [55], we initially considered regions of 50 kb (based on the fact that among the identified haplotype blocks, the proportion of blocks of size 50 kb or more was 43.65% and 50.59% in Merino and Churra, respectively) and extended these regions by 50 kb in both directions [based on the fact that the estimation of half-length decay in LD in the two breeds was around 50 kb (see Additional file 4)].

The four selection sweep mapping analyses described above were subsequently performed on the 20 Spanish Churra and 20 Spanish Merino samples selected for the validation analysis, using the same criteria to identify positions showing evidence of selection and to group these positions into selection sweeps.

Identification of shared regions across methods

Considering the results obtained in the Australian Merino versus Churra analyses, those regions showing an overlap between at least one of the two methods based on haplotype analysis (hapFLK and XP-EHH) and at least one of the two other considered methods (F _ST and ObsHz), were labeled as convergence candidate regions (CCR). The coordinates of the identified CCR were compared with previously reported ovine selective sweeps and previously described sheep QTL in these regions based on the Animal QTLdatabase [56]. An initial assessment of possible functional candidate genes that mapped within the identified CCR was performed using Ensembl BioMart [57] to extract the annotated genes for the relevant genomic intervals. The list of extracted genes was later contrasted with the list of 1255 genes provided by Gutierrez-Gil et al. [17], which are candidates for selection in cattle (and other livestock species) due to their known association with physical features (horns, stature, body size and coat color) or production traits (milk production, mastitis, and meat production/quality traits). This list was extended to include 148 candidate genes for wool production/quality, such as those related to hair follicle cycling (reviewed by Stenn and Paus [58]), or identified as associated with wool production/quality by genome-wide association studies (GWAS) [59] or differential expression analysis [60] (see Additional file 6: Table S2).

The same overlapping criteria were applied to the results from the validation analyses and the resulting convergence candidate regions were labelled as CCR_{(Churra20-SpanishMerino20)}.

High-resolution analysis of selection sweep regions

Whole-genome sequencing (WGSeq) data

WGSeq data for 13 Australian Merino and 15 Churra samples were analysed in this study. Below, we summarize the detailed description and source of the analyzed datasets, which are in Additional file 7: Table S3. Briefly, 13 of the Churra samples were sequenced by our research group and ANCHE (National Association of Breeders of Spanish Churra sheep). These samples included males from the selection nucleus of ANCHE with the largest number of daughters in the general commercial population of Spanish Churra dairy sheep. For these samples, the bam files of the reads mapping to the 18 CCR identified in this work are available in the sequence read archive (SRA) repository [24] within the Bioproject PRJNA395499. In addition, we included in our study publicly available WGSeq datasets from two different projects of the SRA repository: (a) sequencing data for two Churra and three Australian Merino samples were obtained from the “Ovis aries diversity study” (PRJNA160933), coordinated by the International Sheep Genomics Consortium as an extension of the Sheep HapMap project; and (b) WGSeq data for 10 Australian Merino samples generated within the “Australian CRC for Sheep Industry Innovation whole-genome sequence collection” project (PRJNA325682) carried out by the Sheep Commonwealth Government’s Cooperative Research Centres (SheepCRC). All sequencing data were generated with paired-end Illumina technology (Illumina HiSeq 2000 and Hiseq 2500 sequencers).

Bioinformatics analysis

For the samples obtained from the SRA repository, the SRA-Toolkit [61] was used to convert the data to FASTQ format. Then, a common workflow was performed for all 28 WGSeq datasets. Following the criteria of Kijas et al. [62] for the identification of high-quality allelic variants within Run 1 of the Sheep genomes project (PRJEB14685 at the European Variant Archive, EVA), we performed the following five steps to identify allelic variants using GATK [63] and Samtools [64] software: (1) quality of the raw reads was assessed with the FastQC program [65]; (2) the low-quality reads were filtered with Trimmomatic [66] using filter options for paired end samples (-phred33, LEADING:5, TRAILING:5 SLIDINGWINDOW:4:20, MINLEN:36 ILLUMINACLIP: Trimmomatic-0.33/adapters/TruSeq 3-PE.fa:2:30:10); (3) alignment of samples against the reference genome OAR_v3.1 [43] with the Burrows-Wheeler aligner (BWA) [67] using the maximal exact matches (mem) mapping function; (4) data manipulation and preliminary statistical analysis using SAMtools [64, 68] (i.e. transformation of.sam files into.bam binary format and removal of non-mapped reads and the estimation of alignment statistics), the Picard program [69] (i.e. sorting reads, removal of duplicate reads and index building) and Genome Analysis ToolKit v3.3.0 (GATK) [63] (base quality score re-calibration and indel re-alignment); and (v) considering the reads that mapped to the 18 genomic intervals defined as CCR, a variant calling analysis of the 28 samples was done using two different algorithms: the Samtools mpileup [64, 68] analysis, using the default detection parameters, and the GATK HaplotypeCaller tool [63], using default parameters, as suggested in GATK Best Practices recommendations [70]. Using the snpSIFT software [71], filters were applied independently to each of the Samtools and GATK produced VCF files to remove lower quality variants (DP > 10, QUAL > 30, MQ > 30, QD > 5 and FS < 60). An intersect set for the 28 samples, containing those variants concordant between Samtools and GATK predictions, was extracted using BCFtools [68, 72] to produce the final VCF file.

Identification and study of divergent variability in the candidate regions

Among the variants localized in the targeted regions, we selected the SNPs that showed the most significant association with the breed identity between the Churra and Australian Merino samples. To select these SNPs, we first used the VCF-tools software [73] to filter only the SNPs that were detected in all variants and to convert the dataset into PLINK format [42]. Using the PLINK software, we first performed a quality control step on the raw genotypes by discarding SNPs and individuals with genotyping call rates lower than 90%, and SNPs with a minor allele frequency (MAF) lower than 0.01 (--mind 0.1 --geno 0.1 --maf 0.01). We then performed a Chi square association test (using the --assoc option) to identify the SNPs that showed the most significant associations with breed identity and therefore that had the most extreme divergent allele frequencies between the compared populations (e.g. SNPs with genotype “11” in Churra and “22” in Merino, or vice versa). For those SNPs with significant Bonferroni-corrected P values, (considering the number of independent tests as the total number of tested SNPs considered), we performed a functional annotation analysis to assess the possible biological impact of the considered mutations using the Ensembl Variant Effect Predictor (VEP) software [74] (based on the annotated genes of the Oar_v3.1 reference genome). For the non-synonymous variants, the results of the SIFT software analysis [75] regarding predicted effect on protein function were obtained from Ensembl. When the functional analysis assigned one of the divergent variants to a novel gene or pseudogene, we performed BLASTN searches (based on the ± 1.500 bp interval, centered at the SNP location) to identify orthologous genes in cattle (Bos taurus) and/or human (Homo sapiens) genomes. For one of the novel genes that harbored missense mutations, a BLASTN analysis was performed against the newly updated sheep reference genome (Oar_4.0) [76].

Results

Identification of candidate selection sweeps between Australian Merino and Churra sheep based on individual methods

Candidate regions identified by the genetic differentiation analysis

Of the F _ST values averaged in sliding windows of nine SNPs (Fig. 2a), the top 0.5% included 236 values, ranging from 0.119 to 0.325. Following the criteria previously described i.e. allowing a maximum gap of 0.150 Mb to define an F _ST-SS, 49 genomic regions, distributed over 17 autosomes, were considered as potential selection sweeps (F _ST-SS1 to F _ST-SS49; Fig. 2a and see Additional file 8: Table S4). The largest number of F _ST-SS was on chromosome 3 (OAR3, OAR for Ovis aries), where 12 signals were labelled as signatures of selection. The length of the labelled F _ST-SS regions varied from a single central tested SNP (including the averaged estimates of the corresponding 9-SNP window), for 17 regions, to one region involving 40 windows, spanning 1.699 Mb and including 40 SNPs (OAR2, F _ST-SS4).

Candidate regions identified based on reduced heterozygosity

Ninety-six Merino-ObsHtz-SS, distributed across 24 autosomes, were identified after grouping the bottom 0.5% values of the ObsHtz distribution (Fig. 2b; and see Additional file 9: Table S5). The largest candidate region identified by this analysis was located at the proximal end of OAR11 and spanned 1.120 Mb (Merino ObsHtz-SS58: 0.000012–1.120037 Mb). This region included information from 19 central tested SNPs while 45 of the Merino-ObsHtz-SS regions were defined by a single central tested SNP (including the averaged estimates of the corresponding 9-SNP window). When the same analysis was performed with the Churra genotypes, 72 genomic regions (over 23 autosomes) were identified based on the positions included in the bottom 0.5% of the ObsHtz distribution (Churra-ObsHtz-SS) (Fig. 2c; and see Additional file 10: Table S6). The largest of these regions, found on OAR2 (Churra-ObsHtz-SS5: 51.898–52.998 Mb), spanned 1.10 Mb and involved 28 9-SNP windows (i.e. 28 central tested SNPs) while 29 Churra-ObsHtz-SS were based on the averaged ObsHtz estimate assigned to single SNP position.

Candidate regions identified based on haplotype-based analyses

The hapFLK analysis identified seven significant regions (P value < 0.001), one located on OAR2 (hapFLK-SS-1) and the rest located on OAR3 (Fig. 3a; and see Additional file 11: Table S7). The longest selection sweep identified by this approach was hapFLK-SS-6, located on OAR3 (153.963–155.382 Mb). Two other candidate regions involved also an interval longer than 1 Mb: hapFLK-SS-1 (OAR2: 51.898–52.939 Mb) and hapFLK-SS-3 (OAR3: 151.088–152.393 Mb). The XP-EHH analysis identified 98 significant selection sweeps (P value < 0.001) (distributed over 12 autosomes) (Fig. 3b; and see Additional file 12: Table S8). Only six of them, located on OAR6, 11, 15 and 25, showed signatures of selection in the Merino group, whereas the remainder were identified in Churra. Seven of the regions detected by this analysis covered an interval longer than 1 Mb, with the longest selection sweep (XP-EHH-SS17) located on OAR3 (154.638–158.340 Mb). Of the significant regions identified by this analysis, 25 involved a single SNP position.

Convergence of results from the different analyses

Eighteen genomic regions were labelled as convergence candidate regions (CCR) based on the overlap of significant results based on at least one method of each of the two types of analyses performed (i.e. based on allele/genotype frequencies and on haplotype-based information). These regions were located on OAR2, 3, 6, 8, 10, 11, 15 and 25 (Table 1). The intervals included in these CCR ranged from 18.113 kb (CCR17 on OAR15) to 3.701264 Mb (CCR6 on OAR3). All the labeled convergence regions involved a significant result from the XP-EHH analysis, with a good concordance between the sign of the XP-EHH score and the regions showing a reduction in heterozygosity in Merino or Churra. Hence, five of the labeled CCR were related to positive selection in Merino and the other 13 were related to positive selection in Churra.

Table 1 Convergence regions identified in this study based on the overlapping of the results of the four mapping analyses performed to identify selection sweeps between Churra and Australian Merino breeds

Full size table

Fifty-two annotated genes were extracted from the five Merino-defined CCR regions (see Additional file 13: Table S9), whereas 83 genes were extracted from the Churra-CCR intervals (see Additional file 14: Table S10). By comparing these genes with our database of reference candidate genes, 15 unique genes of interest were identified based on their known association with traits targeted by selection, such as horns (RXFP2), stature (LCORL and NCAPG), hair follicle cycle and wool quality (IFNG, DVL2, and TP53), meat production/quality traits (TPM2, CACNG2, PVALB, ACADVL, SLC2A4, CHRNB1, and ATP1B2), or dairy traits (IFNG, ABCG2, TP53, SPP1, and DVL2) (see Additional file 14: Table S11). We found that the five Merino-related CCR and the 15 Churra-related CCR overlapped, respectively, with 103 and 84 previously reported genetic QTL or associations with phenotypic traits (see Additional file 16: Table S12). For each defined CCR, the correspondence with previously reported selection sweeps is indicated in Table 2.

Table 2 Correspondence of the 18 convergence candidate regions (CCR) identified as putative selection signals for Churra and Australian Merino sheep populations with previously reported signatures of selection

Full size table

Selection mapping validation results

The results from the individual analyses performed for the samples included in the validation analysis are in Additional file 17: Table S13 and graphically represented in Additional file 18: Figures S5 and S6. The genetic differentiation analysis identified 39 candidate selection sweeps, whereas the scans looking for regions of reduced observed heterozygosity identified 97 and 68 candidate selection sweeps for Churra and Spanish Merino, respectively. The hapflk and the XP-EHH analyses identified, respectively, six and 76 significant genomic regions as potential selection signatures. Based on the selection signals identified in these individual analyses, and applying the same overlapping requirements than in the core analyses, we defined 18 CCR (labelled as CCR101 to CCR118, as shown in Additional file 19: Table S14). The correspondence between these CCR and those identified in the core analyses are also indicated in Table 3. In summary, seven of the Spanish Merino-Churra CCR were directly related with six of the CCR identified between Australian Merino and Churra breeds (those highlighted in blue), although some others were close to a previously identified CCR (e.g. CCR109 and CCR110 could be considered also related to CCR13). The core CCR that were clearly validated by this secondary analysis were CCR1, CCR3, CCR4 and CCR13 for Churra and CCR12 and CCR28 for Australian Merino. These validated CCR were, in general, those showing the most extreme XP-EHH estimates in the core analyses (e.g. all of them had an absolute XP-EHH estimate higher than 4.80, with the exception of CCR18, located on OAR25, which had an estimate of − 4.233).

Table 3 Correspondence between the convergence candidate regions (CCR) identified in the core analyses between Australian Merino and Churra breeds (labeled as CCR1 to CCR18), with the CCR identified in the validation analyses performed by contrasting a small dataset of Spanish Merino and Churra sheep genotypes (labeled as CCR101 to CCR118)

Full size table

Identification and annotation of divergent allelic variants in the identified CCR based on the analysis of whole-genome sequencing data

Results of the variant calling analysis

The maximum length of the reads obtained through the sequencing process was 100 bp. The whole-genome sequence datasets showed an average number of raw reads per sample of 318,377,494 paired reads. We obtained an average of 296,228,613 reads per sample that passed the quality control process. Per sample, the number of reads aligned to the reference genome varied between 112,607,669 and 513,347,097, with an average of 293,341,790 and an average of 2% unmapped reads per sample. The number of duplicates per sample ranged from 4,818,140 to 43,064,209, with an average of 16,747,357. After alignment, focusing on the 18 target CCR intervals, the individual analyses with GATK and Samtools identified, initially, 194,413 and 196,128 genetic variants respectively (175,317 and 174,278, respectively, after applying the Snpshift filters). The intersection between the variants identified by the two different software programs showed 142,400 variants commonly identified for the 28 sheep genomes analyzed. All these variants, which included 139,745 bi-allelic SNPs and 2655 indels, were considered for further analyses.

Identification of the divergent SNPs in the CCR regions

All 28 samples and 139,244 variants, including SNPs and indels, passed the genotype QC filtering steps and were subsequently investigated by association analysis to compare Australian Merino and Churra breeds. Among these variants, 25,774 do not have an associated rs number. The results of this analysis for the tested SNPs are represented graphically in Fig. 3, where the X-axis shows the log (1/P value). Following a Bonferroni correction for the number of variants analyzed, 1291 variants (1282 SNPs and 9 indels) exceeded the 5% experiment-wise significance threshold (P < 0.05/139,244 = P value < 0.000000359; log (1/P value) = 6.44). The distribution of these divergent variants over the considered chromosomes was as follows: 216 variants on OAR2, 117 on OAR3, 593 on OAR6, 316 on OAR8, 17 on OAR10, 2 on OAR11, 30 on OAR25, and none on OAR15 (Fig. 4). Considering the level of difference in allele frequencies (D) between the two breeds analyzed, which are in Additional file 20: Table S15, differences higher than 0.7 were shown by 79 variants (78 SNPs and 1 Indel) with a high frequency in Churra. Among them, the most extreme value of divergent allele frequency (D = − 0.8) were one SNP on OAR3 (rs408539160) and seven on OAR10 located in the interval 29.380–29.499 Mb within the EEF1A1 gene (ENSOARG00000011616), with the exception of rs421531355, which is an intronic variant of the RXFP2 gene. For the variants with a high frequency in Merino and low in Churra, 943 showed D values higher than 0.7; the most extreme of these D values were found for variants located on OAR3, 6 and 8 (see Additional file 20: Table S15).

The results of the functional annotation analysis showed that 1291 divergent variants included 257 intragenic variants causing 296 annotation variants, distributed across 31 protein coding genes, two pseudogenes (one of them identified as orthologous of bovine TPI1), one rRNA (5S_rRNA) and two snRNA (see Additional file 21: Table S16). Note that all the significant variants identified in the studied region of OAR25 were located in intergenic regions. The genetic variations included in protein coding genes resulted in 275 functional annotation variants classified as four missense variants, four synonymous variants, 199 intronic variants, 38 upstream gene variants, 29 downstream gene variants, and one variant in a 3′ UTR region.

Focusing on the variants located in exons of the above mentioned genes (see Additional file 21: Table S16), all of which were SNPs, we found the following: (1) on OAR2, a synonymous variant in the PHF24 gene and a missense mutation in the NPR2 gene, (2) on OAR6, three synonymous variants, in PDK2, FAM184B and ENSOARG00000004249 (orthologous to human LCORL by BLASTN), and three missense variants, in NCAPG and the inferred LCORL gene.

A full characterization of the identified missense variants is in Table 4. This is based on the eVEP annotation for the NPR2 and NCPAG genes. For a proper assessment of the effects of the LCORL missense mutations, we aligned the interval including the two mutations identified in this gene against the most updated version of the sheep reference genome Oar_v4.0 [76] using a BLASTN search. Then, we identified the effect of the two LCORL missense mutations in the mRNA gene sequence (XM_015096407.1) and the corresponding protein sequence (XP_014951893.1; ligand-dependent nuclear receptor corepressor-like protein isoform X1). The prediction of the functional impact of the amino-acid changes of these mutations with the SIFT software [75] considered the mutations in the NPR2 and LCORL genes as “tolerated”, whereas the missense mutation located in the NCAPG gene, NCAPG_Ser585Phe, was classified as “deleterious” (score = 0.0) (Table 4).

Table 4 Characterization of the three missense mutations identified in this study

Full size table

Discussion

Based on the increasing affordable cost of next-generation sequencing technologies [77], the information derived from whole-genome resequencing offers increased detection power and a higher resolution to identify the genetic variants that underlie variation in traits of economic interest or linked to selection events in livestock populations [78,79,80]. In this work, we exploited the information from WGSeq datasets as a high-resolution step to investigate regions that were previously identified through the analysis of medium-density SNP panels in a representative sample of the population(s).

Identification of selection sweeps in Australian Merino and Churra sheep breeds

The putative selection sweep regions (referred herein as CCR) were determined by comparing 50 K-chip genotypes of three Australian Merino strains that are highly specialized for wool production with genotypes of the related, coarse-wool Spanish Churra dairy breed. In agreement with other authors [21, 22], our population structure analysis supported the use of these two groups of samples as appropriate for mapping selection sweeps because of their close phylogenic relationship but divergent phenotypic characteristics (see Additional file 1: Table S1). The contrasting features of the two breeds [e.g. white and fine wool, growth/carcass production, parasite resistance selection of Australian Merino; coarse wool, milk production/composition, dairy udder/body conformation, and characteristic black patches in specific body regions of Churra; (see Additional file 1: Table S1)] may help to identify the phenotypic targets of putative selection sweeps. Considering the possibility that the geographical isolation and distance between the two studied populations could be a confounding effect with respect of the signals evidenced we have, in addition, performed a validation analysis by contrasting an available dataset of Spanish Merino genotypes with the Churra sheep breed.

In addition, the high genetic diversity reported for the contrasting breeds [30, 81, 82] should be taken into account, which fits well with the known history of these breeds. In particular, the Australian Merinos have been reported as the most diverse sheep populations [81, 83, 84] since the foundation of this population involves contributions from different European, Asian and African breeds and, therefore, Australian Merino are a combination of strains of sheep rather than a single, ancient, homogenous breed. The Australian Merino lines considered in this study have some of the highest estimates for effective population size at 50 generations ago (average of Ne = 868; assuming four years per generation) [21]. The first historical references about Churra sheep date from the Middle Ages, approximately 800 years ago [85]. This breed shows a large influence from ovine populations brought in the Iberia Peninsula by the Celts [86]. Compared with other breeds, the estimated Ne at 50 generations ago for Churra is intermediate (Ne = 600) [21], and a steady decrease of this value until the start of the breeding program in 1986 suggests the absence of severe bottlenecks or other extreme demographic events [82, 87]. Selection sweep mapping methods that are based on haplotypes are known to be highly robust towards perturbations of the demographic model when compared with methods based on population subdivision or allele frequencies [9]. Hence, in our work, the labelling requirement of overlap between at least one haplotype-based method and the F _ST/ObsHtz methods may be seen as helping to limit the number of false positive results due to neutral (e.g. demographic) processes.

Comparing the results of the different analyses, we found discrepancies in the number of signals detected. For example, the hapfFLK analysis, which accounts for the relationship between populations and the LD pattern (haplotype diversity), only detected seven candidate signatures of selection, compared with 25 detected by the genetic differentiation approach, 96 and 77 regions of signatures of selection of reduced heterozygosity in Australian Merino and Churra, respectively, and 98 regions identified by the XP-EHH analysis. These differences can be explained by the fact that the different statistics used in selection sweep mapping do not capture the same patterns in the data, as previously pointed out by many authors [23, 88, 89]. The small number of significant regions detected with hapFLK compared with the other methods agrees with other studies that have analyzed the same datasets using hapFLK and other methods [30, 90]. As suggested by Manunza et al. [30], the multipoint linkage LD model implemented by hapFLK may explain the higher stringency of this method when compared with methods that do not consider haplotype structure (such as F _ST or ObsHtz). In contrast, XP-EHH analysis, which also makes use of haplotype information, detected a large number of selection sweeps (98) supporting several of the signatures of selection detected by the ObsHtz and F _ST analyses. Hence, the underlying model of this cross-population analysis for which the basic idea is to test if each site is homozygous in one population and polymorphic in the other population appears to fit efficiently for the Churra versus Australian Merino contrast undertaken in our study. Also the ObsHtz analysis identified a substantial number of candidate regions (96 Merino/77 Churra), which agrees with the fact that signatures of selection that are based on a reduction in genetic diversity persist for a longer period of time than signals based on haplotype structure and thus the former can detect older signatures of selection [88, 89]. The intermediate number of candidate regions detected based on population differentiation agrees with the intermediate position suggested for these methods by Sabeti et al. [88] regarding the time scale persistency for the different selection sweep mapping methods.

Similar discrepancies in the number of candidate selection sweeps identified by the different methods in the Spanish Merino and Churra analyses (39 with F _ST, 97 for ObsHtz-Churra, 68 for ObsHtz-SpanishMerino, 7 for hapflk and 76 with XP-EHH) (see Additional file 17: Table S13) prove that the differences observed in the number of signatures of selection in the Australian Merino versus Churra analyses were not due to the confounding effect of genetic drift and geographical isolation of the breeds analyzed. The fact that the six clearly confirmed CCR were those that had the most extreme XP-EHH estimates in the Australian Merino-Churra core analyses appears to suggest that the lack of confirmation of some other regions (e.g. CCR2 on OAR2, CCR16 on OAR11, CCR17 on OAR15) can be related to the lower power of detection of the validation analyses due to the limited number of samples analyzed (20 Spanish Merino and 20 Churra samples). Overall, we think that the validation strategy presented here supports the validity of the selection sweeps identified when contrasting Australian Merino and Churra sheep.

In addition, the combination of the four methods used in this work provides a comprehensive picture of the different types of selection sweeps present in the genomes of Churra and Australian Merino sheep. The requirement of overlap between regions identified by different methods to define a selection sweep increases the reliability of the 18 CCR reported here to result from genuine selection events. This is supported by the high level of positional correspondence of these regions with selection sweeps previously reported in sheep (Table 2).

Exploration of convergence candidate regions through WGSeq

In this study, we exploited WGSeq as a secondary step to provide a detailed study of the genetic variation within the regions previously identified as potential signatures of selection. As a technical issue and considering that paired-end sequencing is preferred over single-end sequencing, since it allows improved identification of duplicated reads and a better estimation of the fragment size distribution [91], it is worth clarifying that the workflow used in this study was based on Trimmomatic, which provides a flexible method to keep, in the analysis, the reads for which their paired read is filtered during the quality control filtering [66]. Also the reads for which their paired read was unmapped were included in the later variant calling analysis. To assess the impact that the use of singletons could have on the results, we repeated the variant calling analysis without considering singletons, and found a concordance level of 99.15%. This observation suggests that, for the variant calling analysis workflow applied in this study, which considers the common variants identified by GATK and Samtools, the use of singletons does not have a negative impact on the quality of the variant calling analysis; however, we can also consider than using a simpler workflow that eliminates these singletons may be an efficient strategy for future studies.

Merino-related convergence candidate regions

Among the genomic regions showing positive selection in Merino, those located on OAR6, CCR11 (36.641–36.914 Mb) and CCR12 (37.164–38.580 Mb) showed substantial overlap with previously reported selection sweeps (Table 2) and QTL in sheep (see Additional file 16: Table S12). While most of the coincidences with selection sweeps reported in these regions are related to studies on the SheepHapMap dataset (Table 2), the study of Liu et al. [27] on adaptation to different ecoregions (regions where sheep are exposed to different climate, environment and feeding conditions) also identified the region OAR6: 36.200–36.500 Mb as a putative signature of selection in Duolang sheep. Furthermore, a large number of QTL/associations with production traits have been mapped within these two CCR in a population of Scottish Blackface lambs [92] (see Additional file 16: Table S12). Several of those effects were associated with carcass bone percentage and fat carcass traits, with some suggestion of muscle density effects. CCR11 and CCR12 also include QTL for growth traits [93, 94] and birth weight [95]. Most of these studies suggest NCAPG and LCORL as strong candidate genes for these effects, due to the reported associations of these loci with human stature [96] and body size in mammals [97,98,99,100,101,102]. In addition to these sheep studies, selection sweeps have been identified around the NCAPG-LCORL locus in dogs and pigs [49, 103].

In many cases, the intervals flanking the previously reported selection sweeps or the QTL reported in this OAR6 genomic region involve both CCR11 and CCR12. Considering the inherent inaccuracy of gene mapping, even when based on medium-density SNP arrays, and the large number of effects identified in that region, our approach to group individual signatures of selection based on the extent of LD of the studied breeds appears to identify two independent selection sweeps, which may help to differentiate the causal mutations that underlie the various QTL effects reported in this genomic region.

CCR11 contains the following annotated genes: ABCG2, PKD2, SPP1, MEPE, and IBSP (see Additional file 21: Table S16). Our high-resolution analysis based on sequence data showed that the CCR11 SNPs showing a significant association with breed identity between Churra and Australian Merino were located in three genes included in that interval: PKD2, MEPE and IBSP. Three intronic and one synonymous divergent variant were identified in the PKD2 gene. In cattle, one SNP within this gene is significantly associated with hot carcass weight and intermuscular fat percentage [104]. The other divergent variants were located in non-coding regions of the MEPE and IBSP genes. MEPE is thought to play an inhibitory role in bone formation, and disruption of one of its alleles is known to increase bone mass in mouse [105]. No significant associations were identified with markers within ABCG2 and SPP1 which are functional candidate genes for milk production traits [106, 107].

CCR12 (OAR6: 37.164–38.580 Mb) includes four annotated genes, the major candidate genes NCAPG and LCORL, as well as FAM184B, which is highly expressed in skeletal muscle (http://www.proteinatlas.org/ENSG00000047662-FAM184B/tissue), and DCAF16. A considerable proportion of the intragenic variants (88/296) that show significant between-breed divergence were located within these four genes (see Additional file 21: Table S16), including two synonymous variants in LCORL and FM184B and three missense mutations in NCAPG and LCORL. Kühn et al. [108] suggested that the mechanism underlying the association between NCAPG and pre- and post-natal growth in several mammalian species may be related to the role of this gene in the modulation of growth and body tissue deposition by indirect effects on the nitric oxide (NO) pathway. In cattle, NCAPG is also suggested to be involved in early muscle development. [109]. An analysis in UniProt [110] shows the NCAPG_Ser585Phe amino acid substitution to affect the C-terminal, cysteine-rich domain of the protein, whereas a high level of across species conservation for this residue of the NCAPG protein (Fig. 5) was shown in a comparative analysis with Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/). Further functional studies are needed to assess how this mutation may affect the protein function and possible effects on phenotype(s). The other gene for which divergent missense mutations were identified in this region is LCORL, which encodes a transcription factor that binds specific DNA elements and appears to function in spermatogenesis; polymorphisms in this gene have also been associated with skeletal frame size and human height, as previously discussed.

The results of the sequence analysis in our study support the hypothesis that the NCAPG-LCORL locus may be responsible for direct effects on sheep production traits, possibly including growth and/or carcass traits, which differ between Churra and Australian Merino. Studies that compared carcass characteristics of Churra and Spanish Merino at the same slaughter age showed that Merino has significantly higher hot carcass weight and conformation score than Churra [111], and also higher carcass yield, total muscle and bone percentage [112]. Hence, the three missense mutations identified within the NCAPG and LCORL genes by our high-resolution study should be further studied to assess their potential role as causal mutations of the previously reported QTL effects within that genomic region (e.g. Matika et al. [92]). This is the first study that suggests specific mutations in this region that may influence production traits in sheep. As the single “deleterious” mutation identified in this region, the NCAPG_Ser585Phe protein variant should be considered as the top candidate to explain the CCR12 signature of selection or the potential selection sweep nucleotide (SSN). Analysis of this mutation in meat breeds where the mutation segregates will allow the verification of the true effect of this polymorphism on production traits. In order to obtain further information about the potential link between NCAPG and LCORL missense mutations and meat production traits, we extracted the genotypes for these three polymorphisms from the 453 samples analyzed in the “PRJEB14685” Project of the EVA repository (http://www.ebi.ac.uk/ena/data/view/PRJEB14685). As observed from these genotypes (see Additional file 22: Table S17), the Merino-related allele for the three mutations, is only present (in homozygous or heterozygous status) in “meat production” breeds.

Three other Merino-CCR regions, CCR16, CCR17 and CCR18, are located on OAR11, 15 and 25, respectively. CCR16 (OAR11) is the most interesting, from which six of the 42 annotated genes were highlighted by the survey performed against our database of candidate genes (see Additional file 15: Table S11). Among them, TP53 and DVL2, are both candidates for wool production due to their link to the hair follicle cycle [45]. However, the divergent intragenic variants identified in this region did not affect any of these wool-related candidate genes but included non-exonic variants within the DLG4 and ACADVL genes. The ACADVL gene encodes the enzyme that catalyzes the first step of the mitochondrial fatty acid beta-oxidation pathway, which suggests that there may be differences in the fatty acid composition between Churra and Merino lamb meat.

A large proportion (7/9) of the previously reported QTL located within CCR18 (OAR25: 7.356–7.821 Mb) are wool-related QTL (see Additional file 16: Table S12). The study of Allain et al. [113], based on microsatellite markers, suggested the presence in this region of a locus with a major effect on fleece characteristics. These results were supported by another study based on the 50 K-chip in which genome-wise significant SNP associations for seven wool-related traits were reported in a nearby interval (OAR25: 6.1–8.2 Mb). Interestingly, a 2-kb insertion was identified at this location, which is a potential causal mutation for the absence of long, coarse hair in the birthcoat of the Romane breed [114]. This is a trait that is moderately genetically correlated with wool quality traits, with the woollier lambs showing a lower coefficient of variation, fewer fibers thicker than 30 μm and better wool quality compared with the other hairier lambs [115]. Within CCR18, 30 SNPs showed a significant difference in allele frequencies between Churra and Merino populations, although none of these SNPs were intragenic. Among the genes found in this region (see Additional file 13: Table S9), the novel gene ENSOARG00000017989 shows correspondence with the bovine EIF2S2 gene, which encodes the beta subunit of EIF-2 that functions in the early steps of protein synthesis. EIF2S2 has been suggested as a novel candidate gene in relation to skin color in humans [116]. Because white wool color has been a major selection objective in Australian Merino, a possible link between the EIF2S2 gene and the CCR18 Merino effect of the signature of selection should be investigated further.

Churra-related convergence candidate regions

With regard to the 15 Churra-associated CCR, nine overlapped with previously reported selection sweeps in sheep (Table 2). The region that showed correspondence with the largest number of studies (most based on SheepHapMap data) is CCR15 (OAR10: 29.334–29.713 Mb). This region includes the RXFP2 gene, which is suggested to control the presence and size of horns in wild and domestic populations of sheep [21, 79, 117] and to be important for horn development in goats and cattle [118, 119]. RXFP2 is a receptor for the relaxin and insulin-like factor 3 proteins and its effects on horn size and status appear to depend on its biochemical interactions with testosterone effects [120, 121]. This region was identified by XP-EHH and ObsHz analyses and overlapped with a signal from the F _ST analysis, which suggests positive selection in Churra (Table 1). Taking into account that selection for the polled phenotype has occurred in both Churra and all Australian Merino lines (not only the Australian Poll Merino), the detection of CCR15 as a Churra-related selection sweep region suggests that selection for the absence of horns is more recent (and thus more detectable) in Churra than in Merino. In fact, the high selection pressure for polledness in Churra started with the breeding program in 1985 [39], and prior to that time horned rams were preferred by Churra breeders (F. de la Fuente, personal communication). At present, about 90% of the Churra males are polled. There are practically no females with horns, and when horns are present they are rudimentary. For the OAR10 selection sweep, our high-resolution analysis identified 13 significant divergent intronic variants in the gene and one variant in the third intron of the RXFP2 gene. A 1.8-kb insertion reported in the 3’ UTR region of the ovine RXFP2 gene, which includes two exons of the EEF1A1 gene, has been suggested to explain the polled phenotype in some sheep breeds [122]. However, this insertion does not completely segregate with the horn status in breeds with a variable horn status in both sexes or with a sex-dependent horn status [123], which is the horn status in Churra sheep. An additional exploratory association analysis considering only the Churra and Australian Poll Merino SNPs located on OAR10 (5329 polymorphic SNPs) only identified one significant SNP, which was annotated as an intronic variant within the EEF1A1 gene (at 29.380.801 bp; P value = 0.000006769). Additional research based on whole-genome sequencing data from breeds with a variable horn status may shed light on this complex phenotype.

Two of the other Churra-related CCR, CCR1 (OAR2: 51.659–53.837) and CCR4 (OAR3: 152.545–153.519), were the only CCR supported by the hapFLK analysis, which identified the smallest number of regions in our analyses. These two regions, which were also replicated when analyzing Spanish Merino and Churra genotypes, overlap with 11 and seven QTL, respectively, reported for a range of sheep production traits (see Additional file 16: Table S12). The CCR1 region on OAR2 was the only one detected by all four selection sweep mapping methods (Table 1) and overlaps with several previously reported signatures of selection (Table 2). The intragenic significant divergent variation identified in this region included a missense mutation in the NPR2 gene, rs160159505 (OAR2: 52,429,848 pb). This gene was previously suggested by other authors [21, 22] as a strong candidate gene for ovine selection sweep effects reported in this region due to its major role in the regulation of skeletal growth regulation [124]. In humans, mutations in this gene are related to impaired skeletal growth [124, 125]. Because rs160159505 was the mutation showing the highest significant association with breed identity within CCR1 (P value = 0.0000000001006) and although it was classified as “tolerated”, further studies should assess the possible relationship of this SNP with growth- or size-related traits in sheep. CCR4, the other region identified by hapFLK, showed correspondence with a signature of selection that was previously reported in a study on ovine signatures of selection related to dairy specialization (also including the Churra breed) [19]. The LALBA (OAR3:137.390–137.392) gene, which was suggested to harbor a quantitative trait nucleotide (QTN) for QTL effects related to milk composition traits in Churra sheep [126], maps just outside the boundaries of this region. The divergent intragenic variants annotated within this region were in non-coding regions of the genes HELB and IRAK3, which were not highlighted by our candidate gene survey. The other Churra-related CCR are not discussed in detail, but we would like to mention that CCR3 (OAR3: 151.088–152.393 Mb) overlaps with three QTL for resistance to gastrointestinal infection (see Additional file 16: Table S12) and that our association analyses identified, within this interval, a single significant intergenic SNP (rs421227322) located in the upstream region containing three immune-related genes, IL22, IL26 and IFNG. Also of possible relevance is the identification of three intronic variants within the well-defined region of CCR5 (OAR3:154.006–154.522 Mb), which show extreme allele frequencies between Churra and Merino within the genes MSRB3 and LEMD3. Indeed, intronic variation within these two genes has been directly associated with a pleiotropic QTL reported in cattle for birth weight, calving ease direct, marbling and ribeye muscle area [127].

Conclusions

We have identified 18 putative selection sweeps by contrasting the Australian Merino and Spanish Churra sheep breeds. The phenotypes affected by these genetic effects may involve any trait for which selection was implemented in only one of the two compared breeds. The criteria used to define the selection candidate regions based on multiple mapping methods, together with a validation approach based on a comparison between Spanish Merino and Churra genotypes, support the validity of our mapping results and confirm the value of using selection mapping to detect genomic regions that influence the phenotypic variation of complex traits in livestock species. Our subsequent high-resolution study performed in the target CCR reveals promising candidate mutations to explain some of the identified selection events, including variants in the RXFP2, NPR2, NCAPG and LCORL genes, related to the presence of horns and skeletal growth. Further studies are necessary to confirm the possible direct effect of some of the mutations highlighted in this work on the phenotypic variation of traits of interest in sheep.

References

Maijala K. Genetic aspects of domestication, common breeds and their origin. In: Piper L, Ruvinsky A, editors. The genetics of sheep. Oxford: CABI; 1997. p. 539–64.
Google Scholar
Larson G, Fuller DQ. The evolution of animal domestication. Annu Rev Ecol Evol Syst. 2014;45:115–36.
Article Google Scholar
Clutton-Brock J. Domesticated animals from early times. London: Heinemann and British Museum (Natural History); 1981.
Google Scholar
Fraser AF. Evolution of domesticated animals. London: Longman; 1985.
Google Scholar
Smith JM, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res. 2007;89:391–403.
Article PubMed Google Scholar
Kaplan NL, Hudson RR, Langley CH. The “hitchhiking effect” revisited. Genetics. 1989;123:887–99.
CAS PubMed PubMed Central Google Scholar
Wiener P, Wilkinson S. Deciphering the genetic basis of animal domestication. Proc Biol Sci. 2011;278:3161–70.
Article PubMed PubMed Central Google Scholar
Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution. 1984;38:1358–70.
CAS PubMed Google Scholar
Rubin CJ, Zody MC, Eriksson J, Meadows JRS, Sherwood E, Webster MT, et al. Whole-genome resequencing reveals loci under selection during chicken domestication. Nature. 2010;464:587–91.
Article CAS PubMed Google Scholar
Wiener P, Pong-Wong R. A regression-based approach to selection mapping. J Hered. 2011;102:294–305.
Article PubMed Google Scholar
Fu YX. Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics. 1997;147:915–25.
CAS PubMed PubMed Central Google Scholar
Galtier N, Depaulis F, Barton NH. Detecting bottlenecks and selective sweeps from DNA sequence polymorphism. Genetics. 2000;155:981–7.
CAS PubMed PubMed Central Google Scholar
Hudson RR, Kreitman M, Aguadé M. A test of neutral molecular evolution based on nucleotide data. Genetics. 1987;116:153–9.
CAS PubMed PubMed Central Google Scholar
Nielsen R, Hellmann I, Hubisz M, Bustamante C, Clark AG. Recent and ongoing selection in the human genome. Nat Rev Genet. 2007;8:857–68.
Article CAS PubMed PubMed Central Google Scholar
Pritchard JK, Pickrell JK, Coop G. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol. 2010;20:R208–15.
Article CAS PubMed PubMed Central Google Scholar
Wilson BA, Petrov DA, Messer PW. Soft selective sweeps in complex demographic scenarios. Genetics. 2014;198:669–84.
Article PubMed PubMed Central Google Scholar
Gutierrez-Gil B, Arranz JJ, Wiener P. An interpretive review of selective sweep studies in Bos taurus cattle populations: identification of unique and shared selection signals across breeds. Front Genet. 2015;6:167.
PubMed PubMed Central Google Scholar
Wilkinson S, Lu ZH, Megens HJ, Archibald AL, Haley C, Jackson IJ, et al. Signatures of diversifying selection in European pig breeds. PLoS Genet. 2013;9:e1003453.
Article CAS PubMed PubMed Central Google Scholar
Gutierrez-Gil B, Arranz JJ, Pong-Wong R, Garcia-Gamez E, Kijas J, Wiener P. Application of selection mapping to identify genomic regions associated with dairy production in sheep. PLoS One. 2014;9:e94623.
Article PubMed PubMed Central CAS Google Scholar
Moon S, Kim TH, Lee KT, Kwak W, Lee T, Lee SW, et al. A genome-wide scan for signatures of directional selection in domesticated pigs. BMC Genomics. 2015;16:130.
Article PubMed PubMed Central Google Scholar
Kijas JW, Lenstra JA, Hayes B, Boitard S, Porto Neto LR, San Cristobal M, et al. Genome-wide analysis of the World’s sheep breeds reveals high levels of historic mixture and strong recent selection. PLoS Biol. 2012;10:e1001258.
Article CAS PubMed PubMed Central Google Scholar
Fariello MI, Servin B, Tosser-Klopp G, Rupp R, Moreno C, International Sheep Genomics Consortium, et al. Selection signatures in worldwide sheep populations. PLoS One. 2014;9:e103813.
Article PubMed PubMed Central CAS Google Scholar
Fariello MI, Boitard S, Naya H, SanCristobal M, Servin B. Detecting signatures of selection through haplotype differentiation among hierarchically structured populations. Genetics. 2013;193:929–41.
Article PubMed PubMed Central Google Scholar
Kijas JW. Haplotype-based analysis of selective sweeps in sheep. Genome. 2014;57:433–7.
Article PubMed Google Scholar
Moradi MH, Nejati-Javaremi A, Moradi-Shahrbabak M, Dodds KG, McEwan JC. Genomic scan of selective sweeps in thin and fat tail sheep breeds for identifying of candidate regions associated with fat deposition. BMC Genet. 2012;13:10.
Article CAS PubMed PubMed Central Google Scholar
McRae KM, McEwan JC, Dodds KG, Gemmell NJ. Signatures of selection in sheep bred for resistance or susceptibility to gastrointestinal nematodes. BMC Genomics. 2014;15:637.
Article PubMed PubMed Central Google Scholar
Liu Z, Ji Z, Wang G, Chao T, Hou L, Wang J. Genome-wide analysis reveals signatures of selection for important traits in domestic sheep from different ecoregions. BMC Genomics. 2016;17:863.
Article PubMed PubMed Central Google Scholar
Lv FH, Agha S, Kantanen J, Colli L, Stucki S, Kijas JW, et al. Adaptations to climate-mediated selective pressures in sheep. Mol Biol Evol. 2014;31:3324–43.
Article CAS PubMed PubMed Central Google Scholar
Kim E-S, Elbeltagy AR, Aboul-Naga AM, Rischkowsky B, Sayre B, Mwacharo JM, et al. Multiple genomic signatures of selection in goats and sheep indigenous to a hot arid environment. Heredity (Edinb). 2016;116:255–64.
Article CAS Google Scholar
Manunza A, Cardoso TF, Noce A, Martínez A, Pons A, Bermejo LA, et al. Population structure of eleven Spanish ovine breeds and detection of selective sweeps with BayeScan and hapFLK. Sci Rep. 2016;6:27296.
Article CAS PubMed PubMed Central Google Scholar
Ciani E, Crepaldi P, Nicoloso L, Lasagna E, Sarti FM, Moioli B, et al. Genome-wide analysis of Italian sheep diversity reveals a strong geographic pattern and cryptic relationships between breeds. Anim Genet. 2014;45:256–66.
Article CAS PubMed Google Scholar
Beynon SE, Slavov GT, Farré M, Sunduimijid B, Waddams K, Davies B, et al. Population structure and history of the Welsh sheep breeds determined by whole genome genotyping. BMC Genet. 2015;16:65.
Article PubMed PubMed Central CAS Google Scholar
Diez-Tascon C, Littlejohn RP, Almeida PAR, Crawford AM. Genetic variation within the Merino sheep breed: analysis of closely related populations using microsatellites. Anim Genet. 2000;31:243–51.
Article CAS PubMed Google Scholar
Ciani E, Lasagna E, D’Andrea M, Alloggio I, Marroni F, Ceccobelli S, et al. Merino and Merino-derived sheep breeds: a genome-wide intercontinental study. Genet Sel Evol. 2015;47:64.
Article PubMed PubMed Central CAS Google Scholar
Lewis W, Balderstone S, Bowman J. Events that shaped Australia. London: New Holland Publishers; 2006.
Google Scholar
Fogarty NM, Safari E, Taylor PJ, Murray W. Genetic parameters for meat quality and carcass traits and their correlation with wool traits in Australian Merino sheep. Aust J Agric Res. 2003;54:715–22.
Article Google Scholar
Gardner GE, Williams A, Siddell J, Ball AJ, Mortimer S, Jacob RH, et al. Using Australian sheep breeding values to increase lean meat yield percentage. Anim Prod Sci. 2010;50:1098–106.
Article Google Scholar
Raadsma HW, Gray GD, Woolaston RR. Breeding for disease resistance in Merino sheep in Australia. Rev Sci Tech. 1998;17:315–28.
Article CAS PubMed Google Scholar
de la Fuente LF, Fernández G, San Primitivo F. Breeding programme for the Spanish Churra sheep breed. Cahier Options Méditerranéennes. 1995;11:165–72.
Google Scholar
Miguélez E, Zumalacárregui JM, Osorio MT, Figueira AC, Fonseca B, Mateo J. Quality traits of suckling-lamb meat covered by the protected geographical indication “Lechazo de Castilla y León” European quality label. Small Ruminant Res. 2008;77:65–70.
Article Google Scholar
Sambrook J, Russell DW. Purification of nucleic acids by extraction with phenol:chloroform. CSH Protoc. 2006;2006:pii: pdb.prot4455.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.
Article CAS PubMed PubMed Central Google Scholar
Sheep genome assembly v3.1. Available from: http://www.ensembl.org/Ovis_aries/Info/Index.
Patterson N, Price AL, Reich D. Population structure and Eigenanalysis. PLoS Genet. 2006;2:e190.
Article PubMed PubMed Central CAS Google Scholar
Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–64.
Article CAS PubMed PubMed Central Google Scholar
Akey JM, Zhang G, Zhang K, Jin L, Shriver MD. Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 2002;12:1805–14.
Article CAS PubMed PubMed Central Google Scholar
Akey JM, Ruhe AL, Akey DT, Wong AK, Connelly CF, Madeoy J, et al. Tracking footprints of artificial selection in the dog genome. Proc Natl Acad Sci USA. 2010;107:1160–5.
Article CAS PubMed PubMed Central Google Scholar
Vaysse A, Ratnakumar A, Derrien T, Axelsson E, Rosengren Pielberg G, Sigurdsson S, et al. Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping. PLoS Genet. 2011;7:e1002316.
Article CAS PubMed PubMed Central Google Scholar
Rubin CJ, Megens HJ, Martinez Barrio A, Maqbool K, Sayyab S, Schwochow D, et al. Strong signatures of selection in the domestic pig genome. Proc Natl Acad Sci USA. 2012;109:19529–36.
Article CAS PubMed PubMed Central Google Scholar
Stainton JJ, Haley CS, Charlesworth B, Kranis A, Watson K, Wiener P. Detecting signatures of selection in nine distinct lines of broiler chickens. Anim Genet. 2015;46:37–49.
Article CAS PubMed Google Scholar
Bonhomme M, Chevalet C, Servin B, Boitard S, Abdallah J, Blott S, et al. Detecting selection in population trees: the Lewontin and Krakauer test extended. Genetics. 2010;186:241–62.
Article PubMed PubMed Central Google Scholar
Gautier M, Vitalis R. rehh: an R package to detect footprints of selection in genome-wide SNP data from haplotype structure. Bioinformatics. 2012;28:1176–7.
Article CAS PubMed Google Scholar
Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, et al. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007;449:913–8.
Article CAS PubMed PubMed Central Google Scholar
Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–5.
Article CAS PubMed Google Scholar
Tang K, Thornton KR, Stoneking M. A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol. 2007;5:e171.
Article PubMed PubMed Central CAS Google Scholar
Hu ZL, Park CA, Wu XL, Reecy JM. Animal QTLdb: an improved database tool for livestock animal QTL/association data dissemination in the post-genome era. Nucleic Acids Res. 2013;41:D871–9.
Article CAS PubMed Google Scholar
Kinsella RJ, Kahari A, Haider S, Zamora J, Proctor G, Spudich G, et al. Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database (Oxford). 2011;2011:bar030.
Article CAS Google Scholar
Stenn KS, Paus R. Controls of hair follicle cycling. Physiol Rev. 2001;81:449–94.
CAS PubMed Google Scholar
Wang Z, Zhang H, Yang H, Wang S, Rong E, Pei W, et al. Genome-wide association study for wool production traits in a Chinese Merino sheep population. PLoS One. 2014;9:e107101.
Article PubMed PubMed Central CAS Google Scholar
Liu N, Li H, Liu K, Yu J, Cheng M, De W, et al. Differential expression of genes and proteins associated with wool follicle cycling. Mol Biol Rep. 2014;41:5343–9.
Article CAS PubMed Google Scholar
SRA Toolkit documentation. Available from: http://www.ncbi.nlm.nih.gov/Traces/sra/?view=software.
Kijas J, Brauning R, Clarke SM, McCulloch A, Cockett NE, Saunders G, et al. Launching SheepGenomesDB: 100 million variants from nearly 500 sheep genomes. J Anim Sci. 2016;94:S92–3.
Article Google Scholar
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–303.
Article CAS PubMed PubMed Central Google Scholar
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.
Article PubMed PubMed Central CAS Google Scholar
Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20.
Article CAS PubMed PubMed Central Google Scholar
Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–60.
Article CAS PubMed PubMed Central Google Scholar
Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–93.
Article CAS PubMed PubMed Central Google Scholar
Institute Broad. Picard tool, version 1.128. Available from: http://broadinstitute.github.io/picard/.
DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491–8.
Article CAS PubMed PubMed Central Google Scholar
Cingolani P, Patel VM, Coon M, Nguyen T, Land SJ, Ruden DM, et al. Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift. Front Genet. 2012;3:35.
Article PubMed PubMed Central Google Scholar
Narasimhan V, Danecek P, Scally A, Xue Y, Tyler-Smith C, Durbin R. BCFtools/RoH: a hidden Markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics. 2016;32:1749–51.
Article CAS PubMed PubMed Central Google Scholar
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–8.
Article CAS PubMed PubMed Central Google Scholar
McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010;26:2069–70.
Article CAS PubMed PubMed Central Google Scholar
Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4:1073–81.
Article CAS PubMed Google Scholar
Sheep reference genome Oar_4.0. Available from: https://www.ncbi.nlm.nih.gov/genome?term=ovisaries.
Bai Y, Sartor M, Cavalcoli J. Current status and future perspectives for sequencing livestock genomes. J Anim Sci Biotechnol. 2012;3:8.
Article CAS PubMed PubMed Central Google Scholar
Boitard S, Boussaha M, Capitan A, Rocha D, Servin B. Uncovering adaptation from sequence data: Lessons from genome resequencing of four cattle breeds. Genetics. 2016;203:433–50.
Article CAS PubMed PubMed Central Google Scholar
Kardos M, Luikart G, Bunch R, Dewey S, Edwards W, McWilliam S, et al. Whole-genome resequencing uncovers molecular signatures of natural and sexual selection in wild bighorn sheep. Mol Ecol. 2015;24:5616–32.
Article CAS PubMed Google Scholar
Herrero-Medrano JM, Megens HJ, Groenen MAM, Bosse M, Pérez-Enciso M, Crooijmans RPMA. Whole-genome sequence analysis reveals differences in population management and selection of European low-input pig breeds. BMC Genomics. 2014;15:601.
Article PubMed PubMed Central Google Scholar
Kijas JW, Porto-Neto L, Dominik S, Reverter A, Bunch R, McCulloch R, et al. Linkage disequilibrium over short physical distances measured in sheep using a high-density SNP chip. Anim Genet. 2014;45:754–7.
Article CAS PubMed Google Scholar
Chitneedi PK, Arranz JJ, Suárez-Vega A, García-Gámez E, Gutiérrez-Gil B. Estimations of linkage disequilibrium, effective population size and ROH-based inbreeding coefficients in Spanish Churra sheep using imputed high-density SNP genotypes. Anim Genet. 2017;48:436–46.
Article CAS PubMed Google Scholar
Meadows JRS, Chan EKF, Kijas JW. Linkage disequilibrium compared between five populations of domestic sheep. BMC Genet. 2008;9:61.
Article PubMed PubMed Central CAS Google Scholar
Al-Mamun HA, Clark SA, Kwan P, Gondro C. Genome-wide linkage disequilibrium and genetic diversity in five populations of Australian domestic sheep. Genet Sel Evol. 2015;47:90.
Article PubMed PubMed Central CAS Google Scholar
Sánchez Belda A, Sánchez Trujillano MC. Razas ovinas españolas. Publicaciones de Extensión Agraria, Ministerio de Agricultura, Pesca y Alimentación; 1986.
Arranz JJ, Bayón Y, San Primitivo F. Genetic relationships among Spanish sheep using microsatellites. Anim Genet. 1998;29:435–40.
Article CAS PubMed Google Scholar
García-Gámez E, Sahana G, Gutiérrez-Gil B, Arranz J-J. Linkage disequilibrium and inbreeding estimation in Spanish Churra sheep. BMC Genet. 2012;13:43.
Article PubMed PubMed Central CAS Google Scholar
Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, et al. Positive natural selection in the human lineage. Science. 2006;312:1614–20.
Article CAS PubMed Google Scholar
González-Rodríguez A, Munilla S, Mouresan EF, Cañas-Álvarez JJ, Díaz C, Piedrafita J, et al. On the performance of tests for the detection of signatures of selection: a case study with the Spanish autochthonous beef cattle populations. Genet Sel Evol. 2016;48:81.
Article PubMed PubMed Central Google Scholar
Gholami M, Reimer C, Erbe M, Preisinger R, Weigend A, Weigend S, et al. Genome scan for selection in structured layer chicken populations exploiting linkage disequilibrium information. PLoS One. 2015;10:e0130497.
Article PubMed PubMed Central CAS Google Scholar
Sims D, Sudbery I, Ilott NE, Heger A, Ponting CP. Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet. 2014;15:121–32.
Article CAS PubMed Google Scholar
Matika O, Riggio V, Anselme-Moizan M, Law AS, Pong-Wong R, Archibald AL, et al. Genome-wide association reveals QTL for growth, bone and in vivo carcass traits as assessed by computed tomography in Scottish Blackface lambs. Genet Sel Evol. 2016;48:11.
Article PubMed PubMed Central CAS Google Scholar
Raadsma HW, Thomson PC, Zenger KR, Cavanagh C, Lam MK, Jonas E. Mapping quantitative trait loci (QTL) in sheep. I. A new male framework linkage map and QTL for growth rate and body weight. Genet Sel Evol. 2009;41:34.
Article PubMed PubMed Central CAS Google Scholar
Cavanagh CR, Jonas E, Hobbs M, Thomson PC, Tammen I, Raadsma HW. Mapping quantitative trait loci (QTL) in sheep. III. QTL for carcass composition traits derived from CT scans and aligned with a meta-assembly for sheep and cattle carcass QTL. Genet Sel Evol. 2010;42:36.
Article PubMed PubMed Central Google Scholar
Al-Mamun HA, Kwan P, Clark SA, Ferdosi MH, Tellam R, Gondro C. Genome-wide association study of body weight in Australian Merino sheep reveals an orthologous region on OAR6 to human and bovine genomic regions affecting height and weight. Genet Sel Evol. 2015;47:66.
Article PubMed PubMed Central CAS Google Scholar
Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467:832–8.
Article CAS PubMed PubMed Central Google Scholar
Tetens J, Widmann P, Kühn C, Thaller G. A genome-wide association study indicates LCORL/NCAPG as a candidate locus for withers height in German Warmblood horses. Anim Genet. 2013;44:467–71.
Article CAS PubMed Google Scholar
Eberlein A, Takasuga A, Setoguchi K, Pfuhl R, Flisikowski K, Fries R, et al. Dissection of genetic factors modulating fetal growth in cattle indicates a substantial role of the non-SMC condensin I complex, subunit G (NCAPG) gene. Genetics. 2009;183:951–64.
Article CAS PubMed PubMed Central Google Scholar
Sahana G, Höglund JK, Guldbrandtsen B, Lund MS. Loci associated with adult stature also affect calf birth survival in cattle. BMC Genet. 2015;16:47.
Article PubMed PubMed Central Google Scholar
Setoguchi K, Watanabe T, Weikard R, Albrecht E, Kühn C, Kinoshita A. The SNP c.1326T > G in the non-SMC condensin I complex, subunit G (NCAPG) gene encoding a p.Ile442Met variant is associated with an increase in body frame size at puberty in cattle. Anim Genet. 2011;42:650–5.
Article CAS PubMed Google Scholar
Setoguchi K, Furuta M, Hirano T, Nagao T, Watanabe T, Sugimoto Y. Cross-breed comparisons identified a critical 591-kb region for bovine carcass weight QTL (CW-2) on chromosome 6 and the Ile-442-Met substitution in NCAPG as a positional candidate. BMC Genet. 2009;10:43.
Article PubMed PubMed Central CAS Google Scholar
Lindholm-Perry AK, Sexten AK, Kuehn LA, Smith TPL, King DA, Shackelford SD, et al. Association, effects and validation of polymorphisms within the NCAPG-LCORL locus located on BTA6 with feed intake, gain, meat and carcass traits in beef cattle. BMC Genet. 2011;12:103.
Article CAS PubMed PubMed Central Google Scholar
Vaysse A, Ratnakumar A, Derrien T, Axelsson E, Rosengren Pielberg G, Sigurdsson S, et al. Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping. PLoS Genet. 2011;7:e1002316.
Article CAS PubMed PubMed Central Google Scholar
Abo-Ismail MK, Vander Voort G, Squires JJ, Swanson KC, Mandell IB, Liao X, et al. Single nucleotide polymorphisms for feed efficiency and performance in crossbred beef cattle. BMC Genet. 2014;15:14.
Article PubMed PubMed Central CAS Google Scholar
Gowen LC, Petersen DN, Mansolf AL, Qi H, Stock JL, Tkalcevic GT, et al. Targeted disruption of the osteoblast/osteocyte factor 45 gene (OF45) results in increased bone formation and bone mass. J Biol Chem. 2003;278:1998–2007.
Article CAS PubMed Google Scholar
Wei J, Geale PF, Sheehy PA, Williamson P. The impact of ABCG2 on bovine mammary epithelial cell proliferation. Anim Biotechnol. 2012;23:221–4.
Article CAS PubMed Google Scholar
Sheehy PA, Riley LG, Raadsma HW, Williamson P, Wynn PC. A functional genomics approach to evaluate candidate genes located in a QTL interval for milk production traits on BTA6. Anim Genet. 2009;40:492–8.
Article CAS PubMed Google Scholar
Kühn C, Weikard R, Widmann P. Metabolomics: a pathway for improved understanding of genetic modulation of mammalian growth and tissue deposition. In: Proceedings of the 10th world congress on genetics applied to livestock production: 17–22 August 2014; Vancouver. 2014.
Liu Y, Duan X, Chen S, He H, Liu X, Liu Y, et al. NCAPG is differentially expressed during longissimus muscle development and is associated with growth traits in Chinese Qinchuan beef cattle. Genet Mol Biol. 2015;38:450–6.
Article PubMed PubMed Central Google Scholar
UniProt Consortium. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2008;36:D190–5.
Article CAS Google Scholar
Martínez-Cerezo S, Sañudo C, Panea B, Medel I, Delfa R, Sierra I, et al. Breed, slaughter weight and ageing time effects on physico-chemical characteristics of lamb meat. Meat Sci. 2005;69:325–33.
Article PubMed Google Scholar
Campo MM, Olleta J, Sañudo C. Características de la carne de cordero con especial atención al Ternasco de Aragón. Agencia Aragonesa de Seguridad Alimentaria. 2008.
Allain D, Miari S, Usai MG, Barillet F, Sechi T, Sechi S, et al. SNP mapping of QTL affecting wool traits in a sheep backcross Sarda-Lacaune resource population. In: Proceedings of the 64th annual meeting of the European Federation of Animal Science: 26–30 August 2013; Nantes. 2013.
Cano M, Allain D, Foulquié D, Moreno C, Mulsant P, François D, et al. Fine mapping of birthcoat type in the Romane breed sheep. In: Proceedings of the 64th Annual Meeting of the European Federation of Animal Science: 26–30 August 2013; Nantes. 2013.
Olivier W, Olivier J, Greyling A. Quantifying the relationship between birth coat score and wool traits in Merino sheep. In: Proceedings of the 10th world conference on animal production: 23–28 November 2008; Cape Town. 2008.
Liu F, Visser M, Duffy DL, Hysi PG, Jacobs LC, Lao O, et al. Genetics of skin color variation in Europeans: genome-wide association studies with functional follow-up. Hum Genet. 2015;134:823–35.
Article PubMed PubMed Central Google Scholar
Johnston SE, McEwan J, Pickering NK, Kijas JW, Beraldi D, Pilkington JG, et al. Genome-wide association mapping identifies the genetic basis of discrete and quantitative variation in sexual weaponry in a wild sheep population. Mol Ecol. 2011;20:2555–66.
Article PubMed Google Scholar
Allais-Bonnet A, Grohs C, Medugorac I, Krebs S, Djari A, Graf A, et al. Novel insights into the bovine polled phenotype and horn ontogenesis in Bovidae. PLoS One. 2013;8:e63512.
Article PubMed PubMed Central Google Scholar
Wiedemar N, Tetens J, Jagannathan V, Menoud A, Neuenschwander S, Bruggmann R, et al. Independent polled mutations leading to complex gene expression differences in cattle. PLoS One. 2014;9:e93435.
Article PubMed PubMed Central CAS Google Scholar
Yuan FP, Li X, Lin J, Schwabe C, Bullesbach EE, Rao CV, et al. The role of RXFP2 in mediating androgen-induced inguinoscrotal testis descent in LH receptor knockout mice. Reproduction. 2010;139:759–69.
Article CAS PubMed PubMed Central Google Scholar
Scott DJ, Rosengren KJ, Bathgate RAD. The different ligand-binding modes of relaxin family peptide receptors RXFP1 and RXFP2. Mol Endocrinol. 2012;26:1896–906.
Article CAS PubMed PubMed Central Google Scholar
Wiedemar N, Drögemüller C. A, 1.8-kb insertion in the 3′-UTR of RXFP2 is associated with polledness in sheep. Anim Genet. 2015;46:457–61.
Article CAS PubMed PubMed Central Google Scholar
Lühken G, Krebs S, Rothammer S, Küpper J, Mioč B, Russ I, et al. The 1.78-kb insertion in the 3′-untranslated region of RXFP2 does not segregate with horn status in sheep breeds with variable horn status. Genet Sel Evol. 2016;48:78.
Article PubMed PubMed Central Google Scholar
Bartels CF, Bükülmez H, Padayatti P, Rhee DK, van Ravenswaaij-Arts C, Pauli RM, et al. Mutations in the transmembrane natriuretic peptide receptor NPR-B impair skeletal growth and cause acromesomelic dysplasia, type Maroteaux. Am J Hum Genet. 2004;75:27–34.
Article CAS PubMed PubMed Central Google Scholar
Vasques GA, Amano N, Docko AJ, Funari MFA, Quedas EPS, Nishi MY, et al. Heterozygous mutations in natriuretic peptide receptor-B (NPR2) gene as a cause of short stature in patients initially classified as idiopathic short stature. J Clin Endocrinol Metab. 2013;98:E1636–44.
Article CAS PubMed Google Scholar
García-Gámez E, Gutiérrez-Gil B, Sahana G, Sánchez JP, Bayón Y, Arranz JJ. GWA analysis for milk production traits in dairy sheep and genetic support for a QTN influencing milk protein percentage in the LALBA gene. PLoS One. 2012;7:e47782.
Article PubMed PubMed Central CAS Google Scholar
Saatchi M, Schnabel RD, Taylor JF, Garrick DJ. Large-effect pleiotropic or closely linked QTL segregate within and across ten US cattle breeds. BMC Genomics. 2014;15:442.
Article PubMed PubMed Central Google Scholar

Download references

Authors’ contributions

JJA and BGG conceived the study and designed the analysis design; BGG, PKC, CEB and ASV performed selection sweep mapping and bioinformatic analyses; PW and BGG developed selection mapping analysis scripts; BGG drafted the manuscript; JJ, PW and CEB supervised and edited the manuscript. All authors read and approved the final manuscript.

Acknowledgements

This work was supported by the AGL2015-66035-R project funded by the Spanish Ministry of Economy and Competitiveness (MINECO) and co-funded by the European Regional Development Fund. We thank the National Association of Spanish Churra Breeders for the close collaboration with our research group and the support for generating sequencing data of Churra genomes. B Gutiérrez-Gil is funded through the Spanish “Ramón y Cajal” Program (RYC-2012-10230) from MINECO. The support and availability to the computing facilities of the Foundation of Supercomputing Center of Castile and León (FCSCL) (http://www.fcsc.es) are greatly acknowledged. The ovine SNP50 K-chip HapMap dataset used in this work was provided by the International Sheep Genomics Consortium (ISGC) and obtained from http://www.sheephapmap.org in agreement with the ISGC Terms of Access. We are also grateful to the ISGC for the whole-genome sequencing datasets belonging to the project PRJNA160933 available at the SRA (https://www.ncbi.nlm.nih.gov/sra) that were analyzed in this study. In addition, we are grateful to the Australian Cooperative Research Centre for Sheep Industry Innovation for generating the whole-genome sequencing datasets included in project PRJNA325682 of the SRA and also included in this study.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and Affiliations

Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, León, 24071, Spain
Beatriz Gutiérrez-Gil, Cristina Esteban-Blanco, Praveen Krishna Chitneedi, Aroa Suarez-Vega & Juan-Jose Arranz
Fundación Centro Supercomputación de Castilla y León, Campus de Vegazana, León, 24071, Spain
Cristina Esteban-Blanco
Roslin Institute and R(D)SVS, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
Pamela Wiener

Authors

Beatriz Gutiérrez-Gil
View author publications
You can also search for this author in PubMed Google Scholar
Cristina Esteban-Blanco
View author publications
You can also search for this author in PubMed Google Scholar
Pamela Wiener
View author publications
You can also search for this author in PubMed Google Scholar
Praveen Krishna Chitneedi
View author publications
You can also search for this author in PubMed Google Scholar
Aroa Suarez-Vega
View author publications
You can also search for this author in PubMed Google Scholar
Juan-Jose Arranz
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Beatriz Gutiérrez-Gil.

Additional files

Additional file 1. Phenotypic, production and reproductive traits of Spanish Churra and Australian Merino sheep breeds.

12711_2017_354_MOESM2_ESM.docx

Additional file 2. Description of the population structure analyses performed with the 50K-chip genotypes of the samples considered in this study.

12711_2017_354_MOESM3_ESM.docx

Additional file 3. Figure S1. Graphical representation of the principal component analysis (PCA) performed with Eigensoft for the final of 50K-Chip genotypes analysed in this study for 238 fine wool Merino [Australian Industry Merino (n = 88), Australian Merino (n = 50) and Australian Poll Merino (n = 98)] and 278 Spanish Churra individuals. Figure S2. Graphical representation of the results of the cross-validation approach performed with the Admixture software to determine the best K-value. Figure S3. Graphical representation of the proportion of membership of each of the analysed populations for K = 2 as obtained with the Admixture_v1.3 software.

12711_2017_354_MOESM4_ESM.docx

Additional file 4. Description of the extent of the linkage disequilibrium and block structure based on the analysis of the 50K-chip genotypes of the samples considered in this study.

12711_2017_354_MOESM5_ESM.docx

Additional file 5: Figure S4. Average linkage disequilibrium (LD) as a function of genomic distance between markers based on the Churra and Australian Merino 50K-Chip genotypes analyzed in this study. The LD values (y-axis), provided as D’ and r2, are plotted against inter-marker distance bins (x- axis). For each case, the total number of marker pairs were assigned according to their physical distance into 14 categories: < 10 Kb, 10-20 Kb, 20-40 Kb, 40-60 Kb, 60-100 Kb, 200-500 Kb, 0.5-1 Mb, 1-2 Mb, 2-5 Mb, 5-10 Mb, 10-20 Mb, 20-50 Mb or > 50 Mb.

Additional file 6: Table S2. List of candidate genes considered in relation to wool-related traits.

Additional file 7: Table S3. Whole-genome sequence datasets analyzed in this study.

12711_2017_354_MOESM8_ESM.xlsx

Additional file 8: Table S4. Signatures of selection identified by the genetic differentiation analysis performed between the Australian Merino (Australian Industry Merino, Australian Merino and Australian Poll Merino) and the Churra populations analysed in this study.

12711_2017_354_MOESM9_ESM.xlsx

Additional file 9: Table S5. Selection signals identified by the analysis of observed heterozygosity performed in the Australian Merino populations analysed in the present study.

12711_2017_354_MOESM10_ESM.xlsx

Additional file 10: Table S6. Selection signals identified by the analysis of observed heterozygosity performed in the Churra population analysed in the present study.

12711_2017_354_MOESM11_ESM.xlsx

Additional file 11: Table S7. Selection signals identified by the analysis performed with the hapFLK software (P-value < 0.001) for the Australian Merino and Churra samples analysed in the present study.

12711_2017_354_MOESM12_ESM.xlsx

Additional file 12: Table S8. Signatures of selection identified by the cross-population extended haplotype homozygosity (XP-EHH) analysis performed between the Merino (Australian Industry Merino, Australian Merino and Australian Poll Merino) and the Churra populations analysed in this study (P-value < 0.001).

12711_2017_354_MOESM13_ESM.xlsx

Additional file 13: Table S9. List of genes extracted from the five convergence candidate regions (CCR) identified in this study as positive selection genomic regions in the fine wool Australian Merino lines.

12711_2017_354_MOESM14_ESM.xlsx

Additional file 14: Table S10. List of genes extracted from the five convergence candidate regions (CCR) identified in this study as positive selection genomic regions in the Spanish Churra breed.

12711_2017_354_MOESM15_ESM.xlsx

Additional file 15: Table S11. List of positional candidate genes included in the identified convergence candidate regions (CCR) that were highlighted by our survey with a list of 1459 genes including candidate genes related to traits of interest in sheep.

12711_2017_354_MOESM16_ESM.xlsx

Additional file 16: Table S12. Correspondence of the convergence candidate regions (CCR) identified in our study with previously published genetic effects (QTL and associations) with phenotypic traits of interest in sheep according to the AnimalQTLdb (http://www.animalgenome.org/cgi-bin/QTLdb/OA/index).

12711_2017_354_MOESM17_ESM.xlsx

Additional file 17: Table S13. Results of the four selection sweep mapping analyses between Spanish Merino and Spanish Churra sheep as a validation procedure. The selection signals identified by genetic differentiation (F _ST), reduction of heterozygosity (ObsHtz), and haplotype-based methods (hapFLK andn XP-EHH) were labeled following the same criteria as for the core analyses between Australian Merino and Spanish Churra breeds.

12711_2017_354_MOESM18_ESM.docx

Additional file 18. Figure S5. Graphical representation of the genetic differentiation analysis (a), and the analysis of reduced heterozygosity (b, c) when analysing the validation dataset “Spanish Merino [34] vs Spanish Churra”. Figure S6. Graphical representation of the selection sweep mapping analyses performed with the two haplotype-based methods used in this work, performed with the hapFLK (a) and the rehh (XP-EHH analysis) (b) software, for the validation dataset considered in the present work (Spanish Merino [34] vs Spanish Churra sheep breeds).

12711_2017_354_MOESM19_ESM.xlsx

Additional file 19: Table S14. Convergence candidate regions (CCR) of selection sweeps identified in the validation survey by contrasting genotypes of the Spanish Merino and Spanish Churra sheep breeds. The CCR were defined based on the overlapping between significant signatures of selection (SS) identified by the different individual analysis methods implemented in this study.

12711_2017_354_MOESM20_ESM.xlsx

Additional file 20: Table S15. Allele frequencies for the genetic variants (SNPs and indels) identified from the analysis of 28 WGSeq datasets (15 Churras and 13 Australian Merino) within the 18 genomic regions identified as CCR in the present study.

12711_2017_354_MOESM21_ESM.xlsx

Additional file 21: Table S16. Characterization of the intragenic SNP variations identified, through the processing of the WGSeq datasets considered, within each of the 18 convergence candidate regions (CCR) identified according to the annotation performed with the eVEP software (ensembl Variant Effect Predictor; for further information about the column field names see http://www.ensembl.org/info/docs/tools/vep/vep_formats.html).

12711_2017_354_MOESM22_ESM.xlsx

Additional file 22: Table S17. Summary of the samples included in the “PRJEB14685” Project (high-quality variant calls from the Sheep genomes project - Run1) of the EVA repository (http://www.ebi.ac.uk/ena/data/view/PRJEB14685) carrying the Australian Merino-related allele for the three missense mutations identified in the present study within the candidate convergence region CCR12 (OAR6: 37164263-38580198 bp).

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Gutiérrez-Gil, B., Esteban-Blanco, C., Wiener, P. et al. High-resolution analysis of selection sweeps identified between fine-wool Merino and coarse-wool Churra sheep breeds. Genet Sel Evol 49, 81 (2017). https://doi.org/10.1186/s12711-017-0354-x

Download citation

Received: 19 April 2016
Accepted: 19 October 2017
Published: 07 November 2017
DOI: https://doi.org/10.1186/s12711-017-0354-x

High-resolution analysis of selection sweeps identified between fine-wool Merino and coarse-wool Churra sheep breeds

Abstract

Background

Results

Conclusions

Background

Methods

Mapping of selection sweeps

Breed selection

Genotypes, quality control and analysis of population structure

Identification of candidate regions under selection using individual analyses

Identification of shared regions across methods

High-resolution analysis of selection sweep regions

Whole-genome sequencing (WGSeq) data

Bioinformatics analysis

Identification and study of divergent variability in the candidate regions

Results

Identification of candidate selection sweeps between Australian Merino and Churra sheep based on individual methods

Candidate regions identified by the genetic differentiation analysis

Candidate regions identified based on reduced heterozygosity

Candidate regions identified based on haplotype-based analyses

Convergence of results from the different analyses

Selection mapping validation results

Identification and annotation of divergent allelic variants in the identified CCR based on the analysis of whole-genome sequencing data

Results of the variant calling analysis

Identification of the divergent SNPs in the CCR regions

Discussion

Identification of selection sweeps in Australian Merino and Churra sheep breeds

Exploration of convergence candidate regions through WGSeq

Merino-related convergence candidate regions

Churra-related convergence candidate regions

Conclusions

References

Authors’ contributions

Acknowledgements

Competing interests

Publisher’s Note

Author information

Authors and Affiliations

Corresponding author

Additional files

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Genetics Selection Evolution

Contact us