- Open Access
Assessing signatures of selection through variation in linkage disequilibrium between taurine and indicine cattle
Genetics Selection Evolutionvolume 46, Article number: 19 (2014)
Signatures of selection are regions in the genome that have been preferentially increased in frequency and fixed in a population because of their functional importance in specific processes. These regions can be detected because of their lower genetic variability and specific regional linkage disequilibrium (LD) patterns.
By comparing the differences in regional LD variation between dairy and beef cattle types, and between indicine and taurine subspecies, we aim at finding signatures of selection for production and adaptation in cattle breeds. The VarLD method was applied to compare the LD variation in the autosomal genome between breeds, including Angus and Brown Swiss, representing taurine breeds, and Nelore and Gir, representing indicine breeds. Genomic regions containing the top 0.01 and 0.1 percentile of signals were characterized using the UMD3.1 Bos taurus genome assembly to identify genes in those regions and compared with previously reported selection signatures and regions with copy number variation.
For all comparisons, the top 0.01 and 0.1 percentile included 26 and 165 signals and 17 and 125 genes, respectively, including TECRL, BT.23182 or FPPS, CAST, MYOM1, UVRAG and DNAJA1.
The VarLD method is a powerful tool to identify differences in linkage disequilibrium between cattle populations and putative signatures of selection with potential adaptive and productive importance.
When a part of the genome that confers enhanced fitness or productive ability is preferentially kept in a population by increasing the frequency of favorable alleles, neutral loci that surround this region and that are in linkage disequilibrium (LD) with it, are also retained, thus driving the frequency of particular haplotypes in the region towards fixation in a pattern that decays progressively with distance from the causative location [1–4]. Such a selective sweep can be detected by reduced haplotype diversity and a different LD pattern when compared to those of the surrounding background [2, 5]. Characterizing regions that are affected by selection may enable inferences on the functionality of a genomic region and possibly the effects of specific genes or gene combinations on specific traits [3–6].
Indicine (i.e., Bos primigenius indicus) cattle have been bred for adaptation to tropical and marginal production environments [7, 8], while taurine (i.e., Bos primigenius taurus) cattle have been intensively selected for production in temperate regions of the world [5, 7, 9]. Analyzing differences between these two sub-species of cattle and comparing breeds selected for different purposes (milk or beef) within these subspecies may yield insights into genomic regions that are impacted by these differences in adaptation and productivity traits associated with these two groups of cattle . The amount of LD that exists in genomic regions within a population is a key parameter to trace selective sweeps [2, 3] and differences in decay of LD between bovine populations have been reported [9–12].
Analyses based on the study of regional variation of LD within a population compared to their background LD level, and the contrast of the regions with the same analyses in other populations, allow the assessment of signals of differential selection, also called signatures of selection (SS), in different cattle breeds. In addition, a high coincidence between SS and copy number variants (CNV) has been reported for the human Hapmap populations , which suggests that selection mechanisms may possibly act through copy number differences . Indeed, a study of the effects of CNV on gene expression in Drosophila identified several potential outcomes of gene copy number variation, including the possibility that gene expression increases, decreases or remains stable as copy number fluctuates . Thus, it is of interest to compare SS obtained via analysis of LD variation with reported CNV for the bovine genome [16, 17].
A total of 108 Nelore (NEL), 29 Gir (GIR), 33 Angus (ANG), and 85 Brown Swiss (BSW) individuals were genotyped using the Illumina BovineHD BeadChip (HD chip) . The samples used were either derived from previous studies, approved by local ethical committees, or obtained from AI centers through their routine practice so no further ethical approval was required for the present analysis. Only autosomal SNPs were included in the analysis, resulting in approximately 735 000 SNPs. Quality control measures were calculated using the PLINK software [19, 20] separately for each breed; parameters and thresholds used were a SNP minor allele frequency of at least 5%, a genotype call rate of at least 90%, both per SNP and per animal, and a Hardy-Weinberg equilibrium z-test with p > 10-6. In addition, the population was pruned for close relationships using the identity-by-state (IBS) relationship matrix, or in other words the pairwise genomic kinship coefficient as proposed by Leutenegger et al. , estimated with the GenABEL R package ibs function  and removing one of the individuals from a pair with an IBS > 0.8 (this limit was defined experimentally by assessing IBS relationships of 20 half-sibs). Final SNP counts and numbers of individuals used in the analyses are in Table 1.
A total of six pair-wise comparisons between the four breeds were conducted. These comparisons included differences between the indicine and taurine (I/T) subspecies, differences between dairy and beef (D/B) breed types, and both subspecies and breed type differences (I/T, D/B). Specifically, the six comparisons were NEL/ANG (I/T), GIR/BSW (I/T), GIR/NEL (D/B), BSW/ANG (D/B), GIR/ANG (I/T, D/B), and BSW/NEL (I/T, D/B). Since the method applied here requires using common SNPs, i.e. SNPs that segregate in both breeds compared, for each comparison the coincident SNPs after quality control were extracted. The number of SNPs used in each analysis is in Table 2.
Principal component analysis
To have an overview of the population structure pertaining to the individuals and breeds included in the study, a Principal Component Analysis (PCA) was carried out using the IBS matrix generated with GenABEL , by converting the calculated genomic kinship coefficients to squared Euclidean distances that capture the differences between individuals via classical multidimensional scaling  in n-1 dimensional spaces (where n represents the number of samples) of n eigenvectors, by applying the cmdscale function from the R ‘stats’ package v.3.0.1 .
To provide an insight about the overall levels of LD in the different breeds, genome-wide pairwise r2 values of SNPs separated by a maximum distance of 100 kilobases (kb) (average SNP distance was 7.9 kb), were calculated and graphed using R  and PLINK [19, 20] software.
VarLD is a program for quantifying differences in genome-wide LD patterns between populations . The software quantifies for each window of 50 SNPs the signed r2 of all pairwise comparisons and a square matrix is built with the results, representing a correlation matrix between all SNPs . Equality between the elements of the two matrices is estimated by comparing the extent of departure between their respective ranked eigenvalues after eigen-decomposition of each matrix . A raw VarLD score is assigned for the window as the trace of the difference between the respective diagonal matrices with the sorted eigenvalues in descending order . The magnitude of this score gives a measure of the degree of dissimilarity between the correlation matrices and is used to quantify the extent of regional LD differences between the populations [25, 26]. Positive selection for genes in a genomic region from a specific population is likely to produce a different LD pattern in that region when compared to a non-selected population, which leads to the identification of the region [2, 3, 6, 26].
The methodology used to calculate VarLD scores is described in more detail in [13, 25, 26]. In short, permutation is used to obtain a Monte Carlo statistical significance and the scores are standardized (S i ’) to center the distribution of the scores around a mean of zero and a standard deviation of one, helping to avoid bias in the raw VarLD scores due to differences in the size of the windows in terms of base-pairs (bp) and the populations being compared having different background LD levels. The software uses sliding windows and we applied windows containing 50 SNPs and a step-size of one SNP, following Teo et al. . A window was flagged as a putative SS (SS region) if the associated score S i ’ was greater than or equal to the score at the 99.99th percentile and 99.9th percentile of all scores across the genome. The middle position of the first window in an identified SS was taken as the starting point of a signal, and the end position was the middle of the final window in the SS.
Assigning signals to a breed
To assess which breed showed a selective sweep in a particular region with extreme S i ’, we graphed LD heatmaps of the r2 between all SNPs from the identified SS region, using PLINK [19, 20] and R . Since the levels of r2 differed greatly between the two breeds on each comparison group it was relatively easy to determine the origin of a sweep by assigning it to the breed with the higher LD levels in the region.
To have an additional evaluation of the LD differences between the breeds included in the identified SS regions, we estimated the average r2 of SNPs in windows of 200 kb, with a step-size of 20 kb, discarding any windows that included less than 50 SNPs, and then graphed the results using R . Only the graphs corresponding to the regions explored in detail in this publication are presented, together with graphical representation of the VarLD scores in these candidate SS regions.
After the SS candidate regions were defined, we extracted details on these regions, including comparison group, chromosome, and bp position (middle position of the starting and ending windows included in the peak). Then, the SS regions were sorted by chromosome and bp position, and common signals across comparison groups were highlighted. To identify genes possibly associated with the SS regions, we compared the bp position of the regions to the position of the genes listed from the Ensembl Biomart Tool  for the UMD3.1 Bos taurus genome assembly [28, 29], and extracted a list of genes having a common position with the SS regions.
Confounding flagged regions with CNV regions and previously reported SS
Regions that were flagged by the above method were compared to the latest CNV reports by Bickhart et al.  and Hou et al.  on the bovine genome to detect common regions between VarLD SS and variations in copy number. The resulting signals were also compared to previously published SS using different methodologies and SNP densities. Information from the supplementary files of these publications was used for the comparisons.
The PCA results (Figure 1) show that the first Principal Component (PC) explaining 10.2% of the SNP variation clearly separates the taurine and indicine populations, while the second PC explaining 3.7% of the variance divides each subspecies separating the breeds correctly. The patterns of dispersion also indicate that the two indicine breeds are genetically closer to each other and have lower within breed variance as compared to the taurine breeds. The results of LD decay up to a distance of 100 kb for the four breeds are in Figure 2. The pattern of decay shows higher LD at short distances for the taurine than the indicine breeds, particularly for Angus, reaching an average r2 of 0.3 at a distance of almost 40 kb, while both indicine breeds showed a faster decay, reaching an average r2 of 0.3 at approximately 20 kb.
The genome-wide distribution of standardized VarLD scores for the six comparison sets is in Figure 3. Strong SS were confined to narrow regions of the genome. The most distinct peaks were observed for the ANG/BSW and the GIR/NEL comparisons, which show that the largest VarLD scores are found when comparing different production types within a subspecies. This result is confirmed by the differences in the percentile distributions between the six comparison sets (Table 3), which shows higher 0.1 and 0.01 percentile scores for these two comparisons (ANG/BSW and GIR/NEL).
For the top 0.01 percentile scores across all comparisons, 26 signals were found. Six SS were identified in more than one comparison and 17 genes were associated with these signals. For the top 0.1 percentile scores, 165 signals were detected, covering 10.76 Mb and representing 0.43% of the autosomal genome. Combining the SS shared across several comparison analyses, a total of 42 regions were identified with 125 genes related to these genomic positions (see Additional file 1).
For the I/T comparisons, detailed results for a signal that was found on BTA6 at 81.5-81.7 Mb and was shared across the NEL/ANG, NEL/BSW, GIR/ANG and GIR/BSW analyses are shown in Figure 4. This signal lies within the annotated boundaries of the TECRL (trans-2,3-enoyl-CoA reductase-like) gene (ENSBTAG00000024826) . For this region, the two taurine breeds showed sustained high levels of LD, indicating a selective sweep in both these breeds. In addition, a loss of CNV, a type of variation caused by loss of genetic material due to deletions, was observed in this region, between 81.46 and 81.58 Mb, encompassing 71 SNPs and 120 kb.
The NEL/ANG and GIR/ANG comparisons identified a SS on BTA3 between 14.9 and 15.5 Mb, with a peak between 15.37 and 15.39 Mb (Figure 5). When assessing the LD behavior of the three breeds involved in these comparisons, we found that the signal corresponded to a region with extended high LD in the ANG breed near the FPPS_BOVIN or BT.23182 (ENSBTAG00000003948)  gene.
For the D/B comparisons, the strongest signals were observed within subspecies, and primarily from taurine breed comparisons. For the taurine D/B comparison, one of the signals with the highest VarLD score was located on BTA24, between 37.79 and 37.84 Mb. This region (Figure 6) includes the annotated boundaries of the MYL9 (myosin, light chain 9, regulatory) and the MYL12B (myosin, light chain 12B, regulatory) genes , and a high LD level between the SNPs in this region indicates that the SS is associated with the ANG breed. For the indicine D/B comparison, a signal on BTA5 between 48.5 and 49.1 Mb overlapped with the LEMD3 (LEM domain containing 3) gene (see Figure 7), and further analysis assigned this sweep to the Gir breed.
Thirty-four signature signals from the top 0.1 percentile were found in regions that contained reported bovine CNV [16, 17], and genes located in these regions are presented in Table 4. These 34 regions cover 1.84 Mb or 0.07% of the autosomal genome, and many of the CNV positions coincided between the two reports  and  even though the authors have used different type of data as source of information for CNV discovery, sequence and SNPs, respectively. Information about several other candidate genes identified in this study through the VarLD methodology and that were previously identified in other cattle SS studies are presented in Additional file 2.
The VarLD method has the potential to capture recent strong selection because LD breaks down quickly over longer distances and, thus, high LD over an extended region is likely the result of recent selection. The human populations that have been analyzed [26, 30, 31] have very similar extents and patterns of LD and differ from each other only in limited regions. Cattle populations differ from human populations because they have experienced very strong recent selection caused by breed formation and use of advanced reproductive technologies. This makes the comparison of LD between cattle breeds worthwhile . Differences such as those observed here between indicine and taurine breeds in the rate of decay of LD with increasing distance have been previously reported but with lower marker densities [9, 10, 32, 33]. Our analysis clearly shows that the pattern of LD decay is faster in the indicine breeds compared to the taurine breeds. This supports the use of higher SNP densities in the indicine breeds, both for LD analysis and differences in LD patterns, in order to capture the nature of genomic events that affect narrow regions by having SNPs sufficiently close to the cause of an event to show significant LD. In this study, the regions with the highest VarLD scores that we identified were very narrow, with the largest signal covering 696.78 kb and the smallest signal involving single SNPs, which confirms the benefit of using a high-density SNP beadchip for this approach.
The effect of ascertainment bias in the choice of SNPs for different SNP chip platforms has been discussed in the literature , but the HD chip was constructed using a larger number of indicine breeds and individuals in the reference population, and in general seems to perform better on Bos indicus individuals, than the Illumina BovineSNP50 BeadChip . When the analysis was replicated using the 50Kchip SNPs, nine signals were found for the 0.1 percentile, covering 24.9 Mb of the autosomal genome, and ranging in size from 212.3 kb to 10.2 Mb (results not shown), with only four regions found in common with the analyses performed using the HD chip. This demonstrates a capacity for higher resolution analyses when using the HD chip.
Highest scoring SS for different comparisons
In the I/T comparisons, the strongest signal identified in all breed contrasts was created by unusually high LD in the taurine breeds. The TECRL gene encodes an enzyme that has an oxidoreductase activity on the CH-CH group of donors and other acceptors, and is directly involved in chemical reactions and pathways involving lipids . The SS region containing TECRL also overlaps with a region in which a particular type of CNV with loss of nucleotides is commonly observed, which suggests a possible role of copy number differences being causative in selection processes. Because the selection signature was found in the taurine breeds and is directly related to lipid production in the body, this is a suggestive signature of artificial selection for production purposes.
The FPPS_BOVIN gene, detected in a signal between ANG and both indicine breeds, is a gene involved in cholesterol (sterols) and steroid biosynthesis . Lipid synthesis is a very important physiological function for both milk  and beef [37, 38] production, and both taurine breeds have been selected intensively for these characteristics during the past decades [5, 7, 9].
The LEMD3 gene , detected as a selective sweep in the Gir breed, is a specific repressor of the transforming growth factor beta (TGF-beta) receptor, activin, and BMP signaling, and is involved in negative regulation of skeletal muscle cell differentiation, which might have been selected for in Gir, a breed developed for milk production . In humans, mutations leading to loss of function of this gene are associated with diseases causing sclerosing bone lesions and increased bone density, such as osteopoikilosis [39, 40]. This selection signature was reported by Ramey et al. , between 48.67 and 48.9 Mb on BTA5, using an approach based on sliding windows estimations of minor allele frequency (MAF).
In the taurine D/B comparison, two genes possibly related with variation in muscle accretion were identified i.e. MYL12A and MYL12B. MYL12A encodes a non-sarcomeric myosin complex component with calcium ion binding regulatory functions that are involved in signal transduction mechanisms, cytoskeleton formation, cell division and chromosome partitioning . MYL12B encodes a component of the Z-disc and the myosin II complex. Phosphorylation of MYL12B regulates the activity of non-muscle myosin II, resulting in higher MgATPase activity and the assembly of myosin II filaments. It is also involved in axon guidance processes, muscle contraction and regulation of muscle cell shape . When extending detection of signatures of selection to the 0.1 percentile of VarLD scores, a third gene in this region overlapped with the signal: MYOM1 (myomesin 1) , which encodes a 85 kDa protein that is a structural constituent of muscle. Together with its associated proteins, the MYOM1 protein interconnects the major structure of sarcomeres, the M bands and the Z discs, and is involved in muscle contraction . MYOM1 is one of the top 10 genes with preferential expression in muscle tissue  and has been associated with intramuscular fat content . In addition, the most significant physiological and system development functions associated with genes involved in meat tenderness include skeletal and muscular system development and tissue morphology, both of which have been related with muscle contraction in the pig .
The CAST (calpastatin isoform I) gene (ENSBTAG00000000874)  identified on BTA7 between 98.44 and 98.58 Mb in the NEL/ANG comparison, with the signal originating from ANG, has been intensively studied in different breeds and selected for to improve meat tenderness and other traits associated with beef quality [44–50]. The gene encodes an endogenous calpain (calcium-dependent cysteine protease) inhibitor that is involved in the proteolysis of amyloid precursor proteins. The calpain/calpastatin system is involved in numerous membrane fusion events, such as neural vesicle exocytosis and platelet and red-cell aggregation, and it is hypothesized that it affects the expression of genes encoding structural or regulatory proteins . Due to its capacity to prevent proteolysis , some polymorphisms in this gene have been shown to be associated with increased meat tenderness in beef cattle breeds .
The protocadherin beta gene cluster (PCDHB4, PCDHB6, PCDHB7, PCDHB13, among others)  was identified as having a selection signature in the taurine D/B comparison. This cluster encodes neural cadherin-like cell adhesion proteins that are integral plasma membrane proteins and most likely play critical roles in the establishment and function of specific cell-cell neural connections . In addition, these proteins are involved in nervous system development, synapse assembly, and synaptic transmission . As reported by MacGregor , protocadherin II contains a high-affinity cell surface binding site for Prion proteins and a number of protocadherin genes also function as tumor suppressor genes [52, 53]. Three protocadherin genes, protocadherin-psi1, PCDHB4 and PCDHB6 were previously reported as a selection signature using an Fst approach  and were found to overlap with CNV regions.
The UVRAG (UV radiation resistance associated) (ENSBTAG00000016355)  gene located on BTA15 between 56.2 and 56.3 Mb, was found to have a selection signature in the comparison between the BSW and ANG breeds. This gene is associated with DNA repair and positive regulation of autophagy . The human homologue of this gene  has been shown to complement the ultraviolet sensitivity of xeroderma pigmentosum group C cells  and encodes a protein with a C2 domain . This protein activates a Beclin1 complex that promotes autophagy and suppresses the proliferation and tumorigenicity of human colon cancer cells .
The DNAJA1 (DnaJ (Hsp40) homolog, subfamily B, member 1) (ENSBTAG00000045858) gene , located on BTA7 between 53.8 and 54 Mb, was identified in the comparison between the BSW and ANG breeds. It encodes a heat shock protein binding gene , which is a co-chaperone of the 70 kDa heat shock protein (Hsp70), and the DNAJA1/Hsp70 complex directly inhibits apoptosis . Because of its anti-apoptotic role, it has been considered as having an important role in meat tenderness in beef cattle. Association studies showed that this gene explained up to 63% of the phenotypic variability of tenderness in Charolais . The selection signature identified in the DNAJA1 gene could be a good indicator of selection for meat tenderness in the ANG breed during the last decades.
Comparison of VarLD with other methods to detect selection signatures
Several genes previously reported using other methods to detect SS were also identified in our study. The first example is the MSRB3 (methionine sulfoxide reductase B3) gene (ENSBTAG00000044017)  located on BTA5 between 48.56 and 48.74 Mb for which a SS was found in the NEL/GIR comparison between 48.65 and 49.35 Mb, which was also found by Ramey et al. in Brahman populations . This gene has been associated with the ‘long ear’ phenotype, which characterizes the Gir breed, and against which the Nelore breed has been strongly selected; this reveals a clear sign of differential selection between the indicine breeds . MSRB3 was first identified through a genome-wide association study as a candidate for a QTL involved in ear floppiness and morphology in dogs ; it is an indicator of strong artificial selection for a specific phenotype, and of the time at which the breed was formed.
A second example is the PCSK4 (proprotein convertase subtilisin/kexin type 4) gene (ENSBTAG00000002305)  on BTA7. The SS was identified in the NEL/GIR comparison at 45.5 Mb. This gene was also reported in SS studies by  and  in Jersey and Santa Gertrudis breeds. It is responsible for serine-type endopeptidase activity, which is involved in acrosome reaction, binding of sperm to the zona pellucida, sperm capacitation, and fertilization, which are all key functions of male fertility .
A third example is the TOX (thymocyte selection-associated high mobility group box) gene located on BTA14 between 26.6 and 26.9 Mb (ENSBTAG00000004954) . This gene encodes a possible bovine blood group antigen transcript. Blood group antigens have been shown to be under balancing selection in humans , and this gene was also reported to be under positive selection in the Normande and Montbéliarde French dairy breeds by Flori et al. .
Surprisingly, although haplotype-based and LD methods are expected to perform similarly , when the Rsb method for detecting selection signatures , which evaluates differences between breeds by estimating extended haplotype homozygosity (EHH) for each SNP location, was applied in our data, the results were quite different, and only two regions shared a signal. Another method, ΔDAF , which is based on the difference in the derived allele frequency between populations, was also tested and no common signals were identified. Considering the differences that the adopted LD method could have with other methods to identify SS, LD methods may detect regions that haplotype based-methods such as EHH , iHs , Voight’s iHs , and Rsb  might not detect, because genomic processes such as insertion/deletion (in/del) and other CNV produce LD patterns that may not be accounted for in the haplotype construction.
The fact that LD methods cannot deal with monomorphic SNPs, makes VarLD less sensitive for regions with completely fixed SNPs or with many fixed SNPs for one population. Such signatures might be detected using methods like smoothed Fst [9, 69] and MAF-based  approaches.
Comparison of VarLD results with CNV regions
Several of the identified selection signatures, especially for the indicine breeds, pointed to non-genic regions, including some CNV regions, such as (i) the signal on BTA6 between 66.75 and 66.78 Mb observed in the NEL/GIR comparison that coincides with a CNV on BTA6 between 66.75 and 66.76 Mb, and (ii) a CNV on BTA8 between 46.31 and 46.34 Mb that coincides with two overlapping SS observed in the GIR/BSW and GIR/ANG comparisons. These results suggest that different types of genomic variation, other than SNPs, may have a role in selection mechanisms. Given that CNV have been shown to influence gene expression through dosage-dependent interactions , it is possible that the identified VarLD regions correspond to selection for a specific gene copy number or for a certain duplicated paralog that is present in the duplication.
Across the whole genome, most CNV have evolved under neutral evolutionary pressures and their frequency and sequence context have been shaped by demographic events, mutation, and genetic drift [14, 15, 17]. However, CNV that are located in functional regions of overlapping genes, are mostly under purifying (negative) selection and only a few examples of positive selection on these CNV are known . Regions that differ in copy number between subspecies can be informative about ancient adaptations that may have led to species-specific phenotypes. Recent copy number changes can be an indicator that artificial selection may have led to genetic and phenotypic differences between breeds.
In previous studies using the VarLD method to analyze human data, a large fraction of the top signals corresponded to CNV for some of the populations compared . Comparing our signals from the top 0.1% VarLD scores to recently published reports on the detection of bovine CNV [16, 17], we found that 20.6% of our signals overlapped with reported CNV. Since these signals cover only 0.43% of the genome and the CNV discovery sets included 2.1 and 5.6% of the genome, respectively [16, 17], it is hypothesized that CNV are associated with differences in LD between populations and with selection processes [13–15].
VarLD is a powerful tool to identify differences in LD between cattle populations and possible signals of directional selection between them. The strongest signals differentiate LD patterns between breeds within subspecies and seem to point towards very recent selection. The narrow signatures of selection peaks that were detected in this analysis seem to indicate that both the methodology and the SNP density applied were appropriate to identify genes that underlie the identified selective sweeps.
Some of the genes found in the I/T comparisons indicate potential adaptive signatures, while the D/B comparisons point out genomic regions related to production of milk and beef. A high number of the genomic regions identified with the VarLD method were shown to be associated with physiological pathways of adaptation and production processes, and some of the genes present in these regions have also been reported to coincide with signatures of selection in other species.
The fact that 20.6% of the top VarLD signals overlap with recently reported CNV regions, which cover less than 7.7% of the genome, is a strong indicator of the role of CNV in selection within a breed type. In contrast, it is surprising that results from previous studies using the same breed comparisons and partially overlapping data sets, which applied haplotype-based methods to detect signatures of selection, had almost no overlap with the signals we detected using the VarLD method.
Hill WG, Robertson A: Linkage disequilibrium in finite populations. Theor Appl Genet. 1968, 38: 226-231. 10.1007/BF01245622.
Stephan W, Song YS, Langley CH: The hitchhiking effect on linkage disequilibrium between linked neutral loci. Genetics. 2006, 172: 2647-2663.
Biswas S, Akey JM: Genomic insights into positive selection. Trends Genet. 2006, 22: 437-446. 10.1016/j.tig.2006.06.005.
Sabeti PC, Schaffner SF, Fry V, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES: Positive natural selection in the human lineage. Science. 2006, 312: 1614-1620. 10.1126/science.1124309.
Hayes BJ, Chamberlain AJ, Maceachern S, Savin K, McPartlan H, MacLeod I, Sethuraman L, Goddard ME: A genome map of divergent artificial selection between Bos taurus dairy cattle and Bos taurus beef cattle. Anim Genet. 2009, 40: 176-184. 10.1111/j.1365-2052.2008.01815.x.
Nielsen R: Molecular signatures of natural selection. Annu Rev Genet. 2005, 39: 197-218. 10.1146/annurev.genet.39.073003.112420.
Chan EKF, Nagaraj SH, Reverter A: The evolution of tropical adaptation: comparing taurine and zebu cattle. Anim Genet. 2010, 41: 467-477. 10.1111/j.1365-2052.2010.02053.x.
Hoffmann I: Climate change and the characterization, breeding and conservation of animal genetic resources. Anim Genet. 2010, 41: 32-46.
The Bovine HapMap Consortium: Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds. Science. 2009, 324: 528-532.
McKay SD, Schnabel RD, Murdoch BM, Matukumalli LK, Aerts J, Coppieters W, Crews D, Neto ED, Gill CA, Gao C, Mannen H, Stothard P, Wang Z, Van Tassell CP, Williams JL, Taylor JF, Moore SS: Whole genome linkage disequilibrium maps in cattle. BMC Genet. 2007, 8: 74-
Matukumalli LK, Lawley CT, Schnabel RD, Taylor JF, Allan MF, Heaton MP, O’Connell J, Moore SS, Smith TPL, Sonstegard TS, Van Tassell CP: Development and characterization of a high density SNP genotyping assay for cattle. PLoS ONE. 2009, 4: e5350-10.1371/journal.pone.0005350.
Gautier M, Faraut T, Moazami-Goudarzi K, Navratil V, Foglio M, Grohs C, Boland A, Garnier JG, Boichard D, Lathrop GM, Gut IG, Eggen A: Genetic and haplotypic structure in 14 European and African cattle breeds. Genetics. 2007, 177: 1059-1070. 10.1534/genetics.107.075804.
Teo YY, Fry AE, Bhattacharya K, Small KS, Kwiatkowski DP, Clark TG: Genome-wide comparisons of variation in linkage disequilibrium. Genome Res. 2009, 19: 1849-1860. 10.1101/gr.092189.109.
Clop A, Vidal O, Amills M: Copy number variation in the genomes of domestic animals. Anim Genet. 2012, 43: 503-517. 10.1111/j.1365-2052.2012.02317.x.
Zhou J, Lemos B, Dopman EB, Hartl DL: Copy-number variation: the balance between gene dosage and expression in Drosophila melanogaster. Genome Biol Evol. 2011, 3: 1014-1024. 10.1093/gbe/evr023.
Bickhart DM, Hou Y, Schroeder SG, Alkan C, Cardone MF, Matukumalli LK, Song J, Schnabel RD, Ventura M, Taylor JF, Garcia JF, Van Tassell CP, Sonstegard TS, Eichler EE, Liu GE: Copy number variation of individual cattle genomes using next-generation sequencing. Genome Res. 2012, 22: 778-790. 10.1101/gr.133967.111.
Hou Y, Bickhart DM, Hvinden ML, Li C, Song J, Boichard DA, Fritz S, Eggen A, DeNise S, Wiggans GR, Sonstegard TS, Van Tassell CP, Liu GE: Fine mapping of copy number variations on two cattle genome assemblies using high density SNP array. BMC Genomics. 2012, 13: 376-10.1186/1471-2164-13-376.
Illumina Inc: Bovine HD genotyping BeadChip datasheet.http://www.illumina.com/documents/products/datasheets/datasheet_bovineHD.pdf,
PLINK (v1.07) Purcell S: PLINK: Whole genome association analysis toolset.http://pngu.mgh.harvard.edu/purcell/plink/,
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC: PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.
Leutenegger AL, Prum B, Génin E, Verny C, Lemainque A, Clerget-Darpoux F, Thompson EA: Estimation of the inbreeding coefficient through use of genomic data. Am J Hum Genet. 2003, 73: 516-523. 10.1086/378207.
Aulchenko YS, Ripke S, Isaacs A, van Duijn CM: GenABEL: an R library for genome-wide association analysis. Bioinformatics. 2007, 23: 1294-1296. 10.1093/bioinformatics/btm108.
Mardia KV: Some properties of classical multi-dimensional scaling. Commun Stat A-Theor. 1978, A7: 1233-1241.
The R Project for Statistical Computing: Free software environment for statistical computing and graphics.http://www.r-project.org/,
Ong RT, Teo YY: VarLD: a program for quantifying variation in linkage disequilibrium patterns between populations. Bioinformatics. 2010, 26: 1269-1270. 10.1093/bioinformatics/btq125.
Hee OT: PhD thesis. Population diversity as quantified by inter-population variation in patterns of linkage disequilibrium. 2012, National University of Singapore, Saw Swee Hock School of Public Health
Ensembl BioMart: Ensembl online genome data base BioMart Tool.http://www.ensembl.org/biomart/martview/,
UMD 3.1 assembly: NCBI assembly accession GCA_000003055.3.http://www.ensembl.org/Bos_taurus/Info/Annotation/#assembly,
Ensembl Cow (UMD3.1): Ensembl online genome data base.http://www.ensembl.org/Bos_taurus/Info/Index,
Wall JD, Pritchard JK: Haplotype blocks and linkage disequilibrium in the human genome. Nat Rev Genet. 2003, 4: 587-597. 10.1038/nrg1123.
Ardlie KG, Kruglyak L, Seielstad M: Patterns of linkage disequilibrium in the human genome. Nat Rev Genet. 2002, 3: 299-309. 10.1038/nrg777.
de Roos APW, Hayes BJ, Spelman RJ, Goddard ME: Linkage disequilibrium and persistence of phase in Holstein–Friesian, Jersey and Angus cattle. Genetics. 2008, 179: 1503-1512. 10.1534/genetics.107.084301.
Thévenon S, Dayo GK, Sylla S, Sidibe I, Berthier D, Legros H, Boichard D, Eggen A, Gautier M: The extent of linkage disequilibrium in a large cattle population of western Africa and its consequences for association studies. Anim Genet. 2007, 38: 277-286. 10.1111/j.1365-2052.2007.01601.x.
Lee YM, Han CM, Yi LI, Lee JJ, Kim LH, Kim JH, Kim DI, Lee SS, Park BL, Shin HD, Kim KS, Kim NS, Kim JJ: A whole genome association study to detect single nucleotide polymorphisms for carcass traits in Hanwoo populations. Asian Aust J Anim Sci. 2010, 23: 417-424.
EntrezGene: NCBI Resources EntrezGene.http://www.ncbi.nlm.nih.gov/,
Hawke JC, Taylor MW: Advanced Dairy Chemistry. Edited by: Fox PF. 1995, London: Chapman & Hall, 2: 37-77. Influence of nutritional factors on the yield, composition and physical properties of milk fat, 2, Lipids ,
Scollan N, Hocquette JF, Nuernberg K, Dannenberger D, Richardson I, Moloney A: Innovations in beef production systems that enhance the nutritional and health value of beef lipids and their relationship with meat quality. Meat Sci. 2006, 74: 17-33. 10.1016/j.meatsci.2006.05.002.
Garcia PT, Pensel NA, Sancho AM, Latimori NJ, Kloster AM, Amigone MA, Casal JJ: Beef lipids in relation to animal breed and nutrition in Argentina. Meat Sci. 2008, 79: 500-508. 10.1016/j.meatsci.2007.10.019.
Hellemans J, Preobrazhenska O, Willaert A, Debeer P, Verdonk PC, Costa T, Janssens K, Menten B, Van Roy N, Vermeulen SJ, Savarirayan R, Van Hul W, Vanhoenacker F, Huylebroeck D, De Paepe A, Naeyaert JM, Vandesompele J, Speleman F, Verschueren K, Coucke PJ, Mortier GR: Loss-of-function mutations in LEMD3 result in osteopoikilosis, Buschke-Ollendorff syndrome and melorheostosis. Nat Genet. 2004, 11: 1213-1218.
Ben-Asher E, Zelzer E, Lancet D: LEMD3: the gene responsible for bone density disorders (osteopoikilosis). Isr Med Assoc J. 2005, 7: 273-274.
Ramey HR, Decker JE, McKay SD, Rolf MM, Schnabel RD, Taylor JF: Detection of selective sweeps in cattle using genome-wide SNP data. BMC Genomics. 2013, 14: 382-10.1186/1471-2164-14-382.
Jakhesara SJ, Ahir VB, Padiya KB, Koringa PG, Rank DN, Joshi CG: Tissue-specific temporal exome capture revealed muscle-specific genes and SNPs in Indian Buffalo (Bubalus bubalis). Genomics Proteomics Bioinformatics. 2012, 10: 107-113. 10.1016/j.gpb.2012.05.005.
Hamill RM, McBryan J, McGee C, Mullen AM, Sweeney T, Talbot A, Cairns MT, Davey GC: Functional analysis of muscle gene expression profiles associated with tenderness and intramuscular fat content in pork. Meat Sci. 2012, 92: 440-450. 10.1016/j.meatsci.2012.05.007.
Casas E, White SN, Wheeler TL, Shackelford SD, Koohmaraie M, Riley DG, Chase CC, Johnson DD, Smith TPL: Effects of calpastatin and μ-calpain markers in beef cattle on tenderness traits. J Anim Sci. 2006, 84: 520-525.
Schenkel FS, Miller JR, Jiang Z, Mandell IB, Ye X, Li H, Wilton JW: Association of a single nucleotide polymorphism in the calpastatin gene with carcass and meat quality traits of beef cattle. J Anim Sci. 2006, 84: 291-299.
Barendse W, Harrison BE, Hawken RJ, Ferguson DM, Thompson JM, Thomas MB, Bunch RJ: Epistasis between calpain 1 and its inhibitor calpastatin within breeds of cattle. Genetics. 2007, 176: 2601-2610. 10.1534/genetics.107.074328.
Cafe LM, McIntyre BL, Robinson DL, Geesink GH, Barendse W, Pethick DW, Thompson JM, Greenwood PL: Production and processing studies on calpain-system gene markers for tenderness in Brahman cattle: 2. Objective meat quality. J Anim Sci. 2010, 88: 3059-3069. 10.2527/jas.2009-2679.
Bolormaa S, Porto Neto LR, Zhang YD, Bunch RJ, Harrison BE, Goddard ME, Barendse W: A genome-wide association study of meat and carcass traits in Australian cattle. J Anim Sci. 2011, 89: 2297-2309. 10.2527/jas.2010-3138.
Allais S, Journaux L, Levéziel H, Payet-Duprat N, Raynaud P, Hocquette JF, Lepetit J, Rousset S, Denoyelle C, Bernard-Capel C, Renand G: Effects of polymorphisms in the calpastatin and μ-calpain genes on meat tenderness in three French beef breeds. J Anim Sci. 2011, 89: 1-11. 10.2527/jas.2010-3063.
Curi RA, Chardulo LAL, Mason MC, Arrigoni MDB, Silveira AC, De Oliveira HN: Effect of single nucleotide polymorphisms of CAPN1 and CAST genes on meat traits in Nellore beef cattle (Bos indicus) and in their crosses with Bos taurus. Anim Genet. 2009, 40: 456-462. 10.1111/j.1365-2052.2009.01859.x.
MacGregor I: Prion protein and developments in its detection. Transfusion Med. 2001, 11: 3-14.
Yu J, Cheng YY, Tao Q, Cheung KF, Lam CN, Geng H, Tian LW, Wong YP, Tong JH, Ying JM, Jin H, To KF, Chan FK, Sung JJ: Methylation of protocadherin 10, a novel tumor suppressor, is associated with poor prognosis in patients with gastric cancer. Gastroenterology. 2009, 136: 640-651. 10.1053/j.gastro.2008.10.050.
Li Z, Xie J, Li W, Tang A, Li X, Jiang Z, Han Y, Ye J, Jing J, Gui Y, Cai Z: Identification and characterization of human PCDH10 gene promoter. Gene. 2011, 475: 49-50. 10.1016/j.gene.2011.01.001.
Qanbari S, Gianola D, Hayes B, Schenkel F, Miller S, Moore S, Thaller G, Simianer H: Application of site and haplotype-frequency based approaches for detecting selection signatures in cattle. BMC Genomics. 2011, 12: 318-10.1186/1471-2164-12-318.
UVRAG (ENSG00000198382)/HGNC ID: HGNC:12640 HGNC Hugo Gene Nomenclature Committee.http://www.genenames.org/data/hgnc_data.php?hgnc_id=12640,
Teitz T, Penner M, Eli D, Stark M, Bakhanashvili M, Naiman T, Canaani D: Isolation by polymerase chain reaction of a cDNA whose product partially complements the ultraviolet sensitivity of xeroderma pigmentosum group C cells. Gene. 1990, 87: 295-298. 10.1016/0378-1119(90)90316-J.
Zhao Z, Oh S, Li D, Ni D, Pirooz SD, Lee JH, Yang S, Lee JY, Ghozalli I, Costanzo V, Stark JM, Liang C: A dual role for UVRAG in maintaining chromosomal stability independent of autophagy. Dev Cell. 2012, 22: 1001-1016. 10.1016/j.devcel.2011.12.027.
Takahashi Y, Coppola D, Matsushita N, Cualing HD, Sun M, Sato Y, Liang C, Jung JU, Cheng JQ, Mulé JJ, Pledger WJ, Wang HG: Bif-1 interacts with Beclin 1 through UVRAG and regulates autophagy and tumorigenesis. Nat Cell Biol. 2007, 9: 1142-1151. 10.1038/ncb1634.
Gotoh T, Terada K, Oyadomari S, Mori M: Hsp70 DnaJ chaperone pair prevents nitric oxide and CHOP induced apoptosis by inhibiting translocation of Bax to mitochondria. Cell Death Differ. 2004, 11: 390-402. 10.1038/sj.cdd.4401369.
Bernard C, Cassar-Malek I, Le Cunff M, Dubroeucq H, Renand G, Hocquette JF: New indicators of beef sensory quality revealed by expression of specific genes. J Agric Food Chem. 2007, 55: 5229-5237. 10.1021/jf063372l.
Vaysse A, Ratnakumar A, Derrien T, Axelsson E, Rosengren Pielberg G, Sigurdsson S, Fall T, Seppälä EH, Hansen MST, Lawley CT, Karlsson EK, Bannasch D, Vilà C, Lohi H, Galibert F, Fredholm M, Häggström J, Hedhammar Á, André C, Lindblad-Toh K, Hitte C, Webster MT, The LUPA Consortium: Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping. PLoS Genet. 2011, 7: e1002316-10.1371/journal.pgen.1002316.
Fumagalli M, Cagliani R, Pozzoli U, Riva S, Comi GP, Menozzi G, Bresolin N, Sironi M: Widespread balancing selection and pathogen-driven selection at blood group antigen genes. Genome Res. 2009, 19: 199-212.
Flori L, Fritz S, Jaffrézic F, Boussaha M, Gut I, Heath S, Foulley JL, Gautier M: The genome response to artificial selection: a case study in dairy cattle. PLoS ONE. 2009, 4: e6595-10.1371/journal.pone.0006595.
Utsunomiya YT, Pérez O’Brien AM, Sonstegard TS, Van Tassell CP, Santana do Carmo A, Mészáros G, Sölkner J, Garcia JF: Detecting loci under recent positive selection in dairy and beef cattle by combining different genome-wide scan methods. PLoS ONE. 2013, 8: e64280-10.1371/journal.pone.0064280.
Tang K, Thornton KR, Stoneking M: A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol. 2007, 5: e171-10.1371/journal.pbio.0050171.
Grossman SR, Shylakhter I, Karlsson EK, Byrne EH, Morales S, Frieden G, Hostetter E, Angelino E, Garber M, Zuk O, Lander ES, Schaffner SF, Sabeti PC: A composite of multiple signals distinguishes causal variants in regions of positive selection. Science. 2010, 327: 883-886. 10.1126/science.1183863.
Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ, Ackerman HC, Campbell SJ, Altshuler D, Cooper R, Kwiatkowski D, Ward R, Lander ES: Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002, 419: 832-837. 10.1038/nature01140.
Voight BF, Kudaravalli S, Wen XQ, Pritchard JK: A map of recent positive selection in the human genome. PLoS Biol. 2006, 4: e72-10.1371/journal.pbio.0040072.
Barendse W, Harrison BE, Bunch RJ, Thomas MB, Turner LB: Genome wide signatures of positive selection: the comparison of independent samples and the identification of regions associated to traits. BMC Genomics. 2009, 10: 178-10.1186/1471-2164-10-178.
Gautier M, Flori L, Riebler A, Jaffrézic F, Laloe D, Gut I, Moazami-Goudarzi K, Foulley J: A whole genome Bayesian scan for adaptive genetic divergence in West African cattle. BMC Genomics. 2009, 10: 550-10.1186/1471-2164-10-550.
Pan D, Zhang S, Jiang J, Jiang L, Zhang Q, Liu J: Genome-wide detection of selective signature in Chinese Holstein. PLoS ONE. 2013, 8: e60440-10.1371/journal.pone.0060440.
Stella A, Ajmone-Marsan P, Lazzari B, Boettcher P: Identification of selection signatures in cattle breeds selected for dairy production. Genetics. 2010, 185: 1451-1461. 10.1534/genetics.110.116111.
Gautier M, Naves M: Footprints of selection in the ancestral admixture of a New World Creole cattle breed. Mol Ecol. 2011, 20: 3128-3143. 10.1111/j.1365-294X.2011.05163.x.
We gratefully acknowledge the Bundesministerium für Land- und Forstwirtschaft, Umwelt und Wasserwirtschaft (Austria) for data on 29 BSW animals; Guilherme Penteado Coelho Filho, Daniel Biluca, Adriana Santana do Carmo and DeltaGen (Brazil) for technical assistance in sample acquisition and genotyping of 80 NEL animals, and Hapmap Bovine genome project for the rest of the data used for this analysis. Thanks to Pier KRK Ito for his assistance with graphical outputs. This work was supported in part by Projects 1265-31000-104-00D from USDA-ARS, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) - process 560922/2010-8 and 483590/2010-0; and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) - process 2011/16643-2 and 2010/52030-2. Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture. The USDA is an equal opportunity provider and employer.
The authors declare that they have no competing interests.
JS and AMPO conceived and designed the study. AMPO performed data preparation, statistical analysis, and drafted the manuscript. YTU, GM and DMB performed data preparation and participated in the statistical analysis. GEL, CPV, TS, MVBDS, JFG, and JS helped in data acquisition, interpretation of results, and critical revision. FG and JS coordinated the collaborative efforts. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 1:Signals found for the top 0.01 and 0.1 percentile of VarLD scores. Table with the signals found on the top 0.1 percentile of the distribution of VarLD scores, organized by chromosome and bp position along the genome; the breed comparison where the signal came from, the chromosome number, the starting and ending bp position and information for the genes spanning the regions under the signals are given in the table; additionally, signals that contain the 0.01 percentile are highlighted in yellow, and in regions that cover several genes, the genes underlying the highest scoring window are highlighted in blue. (DOC 328 KB)
Additional file 2:Common signals with other Signatures of Selection studies. Common signals found between our analysis and previous signatures of selection regions reported in the literature; the breed comparison where the signal came from, the chromosome number, the base pair spanning position, the position of the common region, the reference to the authors and the gene name are given in the Table; several VarLD signals that coincide with the same signal from other studies are highlighted in yellow, while several signals from other authors that concur with one VarLD Signal are highlighted in blue [5, 7, 9, 41, 54, 63, 64, 69–73]. (DOC 147 KB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.