The extent of linkage disequilibrium in beef cattle breeds using high-density SNP genotypes

Background: The extent of linkage disequilibrium (LD) between molecular markers affects genome-wide association studies and the implementation of genomic selection. The availability of high-density single nucleotide polymorphism (SNP) genotyping platforms makes it possible to investigate LD at unprecedented resolution. In this work, we characterised LD decay in beef cattle breeds of taurine, indicine and composite origin and explored its variation across the autosomes and the X chromosome.
Findings: In each breed, LD decayed rapidly and r² was less than 0.2 for marker pairs separated by 50 kb. The LD decay curves clustered into three groups that distinguished the three main cattle types. At short distances between markers (< 10 kb), taurine breeds showed higher LD (r² = 0.45) than their indicine (r² = 0.25) and composite (r² = 0.32) counterparts. This higher LD in taurine breeds was attributed to a smaller effective population size and a stronger bottleneck during breed formation. When all SNPs on the X chromosome alone were used, the three cattle types could still be distinguished. However, for taurine breeds, LD on the X chromosome decayed much faster and reached a much lower background level than for indicine breeds and composite populations. When only SNPs that were polymorphic in all breeds were used, the X-chromosome results mirrored those for the autosomes.
Conclusions: The pattern of LD mirrored some aspects of the history of the breed populations and decayed sharply with increasing physical distance between markers. Because LD dropped below 0.2 at distances of 50 kb, we conclude that the HD chip can be used to detect association signals that remained hidden with lower-density genotyping platforms.


Background
Linkage disequilibrium (LD) between molecular markers reflects the correlation between the genotypes of two markers, or the degree of non-random association between their alleles. Previous studies that used single nucleotide polymorphisms (SNPs) to describe patterns of LD in cattle at the whole-genome level [1-6] have suggested that 30 000 to 300 000 SNPs are necessary to perform a genome-wide association study (GWAS), depending on the trait studied and the statistical power desired [1,2]. Today, high-density SNP platforms that assay more than 0.5 million loci provide the required marker density.
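The definition above treats LD as the correlation between the genotypes of two markers. As a minimal sketch (our illustration, not code from the study), the function below computes r² as the squared Pearson correlation of allele dosages coded 0, 1 and 2 across animals. As we understand the PLINK 1.x documentation, this genotype-based estimator is what `--r2` reports; it can differ slightly from the haplotype-based r² = D²/(p_A(1 - p_A) p_B(1 - p_B)) computed from phased data. The name `genotypic_r2` is hypothetical.

```python
from math import nan
from typing import Sequence


def genotypic_r2(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between allele dosages (0, 1, 2) at two SNPs.

    x[i] and y[i] are the dosages of animal i at the two markers.
    Returns nan when either SNP is monomorphic in the sample, because the
    correlation is undefined for a genotype vector with zero variance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length genotype vectors with >= 2 animals")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centred cross-products and sums of squares (the 1/n factors cancel in r).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan
    return sxy * sxy / (sxx * syy)
```

For instance, two SNPs whose dosages are perfectly negatively correlated still give r² = 1, because squaring discards the sign: r² measures the strength of the association, not which alleles travel together.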
The extent of LD has implications for both GWAS and the delivery of accurate genomic predictions. However, its importance is often neglected, even though LD is known to be a potential source of bias in these analyses. The collection and use of SNP genotype data in cattle have expanded rapidly in recent years, driven partly by decreasing genotyping costs and by efforts to improve cattle breeding through genomic selection. Despite this, few studies have documented the behaviour of LD using the expanded set of roughly 777 000 SNPs available on the BovineHD platform (Illumina Inc, San Diego). A significant advance of this denser chip is that it allows LD to be estimated accurately over short physical distances, because it contains many more marker pairs separated by 10 kb or less.
Here, we present LD decay curves for SNPs on the bovine autosomes and the X chromosome for three genetic groups of cattle breeds: Bos taurus (taurine), Bos indicus (indicine) and a composite beef cattle group. The results were compared with those from an independent sample of the same cattle types to confirm and potentially generalise the findings. This report is intended to serve as an updated description of the extent of LD in beef cattle.

Methods
All analyses used genotypes generated in previous work. Because no new animals were sampled, no animal ethics approval was sought for this study.
Animals used in this study (Table 1) were part of a large experimental Australian population [7] that includes the three main cattle types: Bos taurus breeds (Angus, Hereford, Limousin and Shorthorn), Bos indicus (Brahman) and composite cattle (Tropical Composite, Santa Gertrudis and Belmont Red). To confirm our findings, genotype data for one breed of each cattle type (Angus, Brahman and Santa Gertrudis) were sourced from the Bovine HapMap consortium [3].
All animals were genotyped using the BovineHD SNP chip (Illumina, San Diego; http://www.illumina.com/documents/products/datasheets/datasheet_bovineHD.pdf), which includes 777 962 markers. Quality control and imputation of missing data in the Australian sample followed the pipeline described by Bolormaa et al. [8]. Briefly, stringent filters were applied to each SNP (call rate, duplicated map position, extreme departure from Hardy-Weinberg equilibrium), resulting in 729 068 informative SNPs. Missing genotypes were imputed within each breed type using 30 iterations of the BEAGLE software [9]. Genotypes for the same set of SNPs were extracted from the Bovine HapMap dataset [10], but missing genotypes were not imputed. LD between each pair of SNPs was measured as r², which is less susceptible to bias due to differences in allele frequency [4]. Both r² and within-breed genetic diversity (heterozygosity and proportion of polymorphic SNPs) were calculated using PLINK v1.07 [11]. For the X chromosome, two scenarios were explored: one including all markers and one including only markers with a minor allele frequency (MAF) greater than 0.1 in all breeds.
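The Methods state that pairwise r² was computed with PLINK but do not say how those pairwise values were summarised into LD decay curves. One common approach, assumed here, is to group same-chromosome marker pairs into physical-distance bins and average r² within each bin. The sketch below does this; the 10 kb bin width, the 500 kb cutoff and the name `ld_decay_curve` are illustrative assumptions, not values or code from the study. The input tuples mirror the BP_A, BP_B and R2 columns of a PLINK LD report (file parsing is omitted). PLINK only writes pairs whose r² exceeds `--ld-window-r2`, which defaults to 0.2, so that threshold would need to be set to 0 to avoid biasing the curve upwards.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ld_decay_curve(
    pairs: Iterable[Tuple[int, int, float]],
    bin_kb: float = 10.0,
    max_kb: float = 500.0,
) -> List[Tuple[float, float, float, int]]:
    """Mean r² of same-chromosome SNP pairs grouped into physical-distance bins.

    pairs: (bp_a, bp_b, r2) tuples, positions in base pairs. Pairs on
        different chromosomes must be excluded before calling this function.
    Returns (bin_start_kb, bin_end_kb, mean_r2, n_pairs) for every non-empty
    bin below max_kb, ordered by distance.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for bp_a, bp_b, r2 in pairs:
        dist_kb = abs(bp_b - bp_a) / 1000.0
        if dist_kb >= max_kb:
            continue  # beyond the range covered by the curve
        k = int(dist_kb // bin_kb)  # index of the distance bin
        sums[k] += r2
        counts[k] += 1
    return [
        (k * bin_kb, (k + 1) * bin_kb, sums[k] / counts[k], counts[k])
        for k in sorted(counts)
    ]
```

Plotting `mean_r2` against the bin midpoint for each breed gives a decay curve of the kind compared between cattle types; the choice of bin width trades resolution at short distances against noise from bins containing few pairs.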

Results and discussion
A high proportion of polymorphic markers was observed across all breeds, with the taurine breeds showing a slightly lower proportion (Pn ≈ 0.86) than their indicine and composite counterparts (Pn ≈ 1.00 for both) (Table 1). Heterozygosity (He) ranged from 0.25 (Brahman from the HapMap dataset and Shorthorn) to 0.35 (Tropical Composite). In general, the composite breeds showed higher He (0.34) than the taurine (0.28) and indicine (0.26) breeds because they originated from a mixture of both types of cattle. The pattern of LD differed between breeds, and the resulting decay curves could be grouped according to breed type (Figure 1 and Additional file 1: Figure S1). At short marker distances, autosomal LD was lowest in the indicine breeds, intermediate in the composite breeds and highest in the taurine breeds. This agrees with previous studies [2,3], although the magnitude of the differences varies among studies. At a marker distance of 10 kb, the average observed LD (r²) was 0.25 for Brahman and 0.46 for Angus (Table 1). The Brahman value is close to that reported for a comparable indicine breed, Nelore (0.27) [12], whereas the Angus value is higher than that previously reported for Angus (0.35) [13]. The difference was less evident for markers separated by larger physical distances (> 70 kb), where LD quickly approached background levels and r² was approximately 0.10 in both studies, as well as in dairy breeds [6]. The average LD between unlinked markers (SNPs on different chromosomes) was at or below the background level across all breeds (see Additional file 2: Table S1) and was negatively correlated with sample size (Pearson correlation, r = −0.75). Indicine cattle also had lower LD than most of the other breeds at large marker distances, which suggests that they originated from a larger ancestral population.
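The negative correlation between background LD and sample size reported above is consistent with sampling noise alone: for two truly independent markers, the squared sample correlation across n animals is biased upwards, with an expectation of about 1/(n - 1). The short simulation below illustrates this point; it is our illustration rather than an analysis from the study, and the allele frequency (0.3), the number of simulated pairs, the random seed and the function names are arbitrary assumptions.

```python
import random
from typing import List


def _genotypic_r2(x: List[int], y: List[int]) -> float:
    """Squared Pearson correlation of 0/1/2 dosages; nan if either SNP is monomorphic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


def mean_unlinked_r2(n_animals: int, n_pairs: int = 1000,
                     allele_freq: float = 0.3, seed: int = 1) -> float:
    """Mean r² between pairs of independently simulated SNPs (true LD is zero).

    Dosages are drawn under Hardy-Weinberg equilibrium: each animal carries two
    independent alleles, each being the counted allele with probability allele_freq.
    """
    rng = random.Random(seed)

    def draw_snp() -> List[int]:
        return [int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
                for _ in range(n_animals)]

    values = []
    for _ in range(n_pairs):
        r2 = _genotypic_r2(draw_snp(), draw_snp())
        if r2 == r2:  # skip nan (a simulated SNP happened to be monomorphic)
            values.append(r2)
    return sum(values) / len(values)


# Compare the simulated background r² with the 1/(n - 1) approximation.
for n in (20, 200):
    print(n, round(mean_unlinked_r2(n), 4), round(1 / (n - 1), 4))
```

With 20 animals the simulated background r² should sit near 0.05, and with 200 animals near 0.005, so breeds with fewer genotyped animals would be expected to show higher background LD even with no true association between markers.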
Analysis of LD across the bovine X chromosome (BTAX) revealed a different pattern from that observed for the autosomes (Figure 2). The LD decay curves were still grouped by cattle type, but the ranking of the types differed from that on the autosomes. Over very short distances between markers on BTAX (< 5 kb), the indicine breeds still had the lowest average LD (r² ≈ 0.5) and the taurine breeds the highest (r² > 0.6). However, contrary to the pattern observed for the autosomes, LD on BTAX decayed fastest in the taurine breeds, such that for marker pairs separated by 50 kb, their average LD was lower than that in either the composite or the indicine populations (Figure 2A). The same LD patterns were observed when only males were evaluated (see Additional file 1: Figure S2). However, when only SNPs that were polymorphic in all breeds (MAF > 0.1) were used, LD decay on BTAX became much more homogeneous across breeds and did not differ much from that obtained for the autosomes (Figure 2B). Given the bottlenecks that cattle populations have experienced since domestication and, more recently, during breed formation, and given the frequent intensive use of artificial insemination, extensive LD on BTAX would be expected. This expectation agrees with the LD decay observed for the indicine and composite breeds when all SNPs were used, but not with that observed for the taurine breeds, nor with that observed for all breeds when only polymorphic SNPs were used. We speculate that using all markers inflated the LD observed for the indicine and composite breeds (or biased the LD for the taurine breeds downwards). Conversely, restricting the analysis to SNPs polymorphic in all breeds was probably too stringent to capture the expected difference in LD on BTAX arising from its unique mode of inheritance.
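The second BTAX scenario keeps only SNPs with MAF > 0.1 in every breed. A minimal sketch of that filter follows, assuming genotypes are held as a mapping from breed to SNP identifier to a list of 0/1/2 dosages; the data layout and function names are hypothetical. Two simplifications are worth flagging: the sketch treats every animal as diploid, which ignores male hemizygosity on BTAX (a more careful version would count one allele per male), and it assumes no missing calls, whereas the HapMap genotypes were not imputed.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: breed -> SNP id -> allele dosages (0, 1, 2) per animal.
Genotypes = Dict[str, Dict[str, Sequence[int]]]


def minor_allele_frequency(dosages: Sequence[int]) -> float:
    """MAF from 0/1/2 allele dosages, treating every animal as diploid."""
    p = sum(dosages) / (2 * len(dosages))
    return min(p, 1 - p)


def snps_polymorphic_in_all_breeds(genotypes: Genotypes,
                                   threshold: float = 0.1) -> List[str]:
    """SNP ids typed in every breed whose MAF exceeds `threshold` in every breed."""
    if not genotypes:
        return []
    # Only SNPs present in every breed can pass an "all breeds" filter.
    shared = set.intersection(*(set(snps) for snps in genotypes.values()))
    return sorted(
        snp for snp in shared
        if all(minor_allele_frequency(snps[snp]) > threshold
               for snps in genotypes.values())
    )
```

Because a SNP must pass the threshold in the least variable breed, this filter removes most markers that are nearly fixed in any single breed, which is one way such a restriction can make LD decay look more homogeneous across breeds.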
To assess whether the results obtained here were a specific feature of the Australian population, we repeated the analyses with an independent sample of Angus, Santa Gertrudis and Brahman animals from the Bovine HapMap dataset [3,10]. The results for these populations were highly concordant with the LD observed in the Australian populations, for both the autosomes and BTAX (Figures 1 and 2).