The aim of this study was to identify and quantify genomic regions associated with IBH in Shetland pony mares and Icelandic horses in the Netherlands. Breed-specific GWAS were performed and overlapping associated genomic regions (within 15 Mb or less) were identified in both breeds.
Population stratification analysis
Data were gathered according to a matched case–control design to limit unwanted spurious associations due to population stratification, which might be caused by confounding of ‘the trait of interest’ with pedigree and other relevant (e.g. environmental) effects e.g.
. Population stratification due to pedigree was minimized by including paternal half-sib pairs. The multidimensional scaling plots based on breed-specific genomic kinship showed a high degree of overlap between cases and controls (Figure
1). Also, the Bayes method accounts for population stratification due to pedigree by fitting all SNP simultaneously e.g.
. In Shetland pony mares, confounding of IBH with relevant effects such as region and withers height category was negligible, since the analysis revealed no significant association between IBH and these effects. In Icelandic horses, importation from Iceland had a significant effect on IBH (p = 0.002). Our results showed (Figure
2) that differences in genetic background between imported Icelandic horses and Icelandic horses born in Europe were limited, which agrees with Broström et al.
 and Andersson et al.
. However, Culicoides spp. is absent in Iceland and consequently IBH is not observed e.g.
. Increased environmental pressure after export and lack of exposure to Culicoides spp. before export are suggested to result in increased incidence and more severe cases after export e.g.
[32, 34]. The insect bite hypersensitivity statuses of imported Icelandic horses and Icelandic horses born in Europe may not represent the exact same phenotype. The final Icelandic horse data, therefore, only included horses born in Europe.
Single SNP and multi-locus models
Schurink et al.
 published genomic regions associated with IBH in 188 Shetland pony mares using 50 k SNP genotypes. In our study, several similar associated genomic regions within 1 Mb distance were identified in Shetland pony mares on chromosomes 3, 11, 20 and 27. However, Schurink et al.
 used logistic regression fitting single SNP effects, while our Bayes-C method fitted all SNP simultaneously. Mucha et al.
 concluded that estimated variances of identified QTL were not overestimated when all SNP were fitted simultaneously, since the variance explained will be distributed across all SNP in high LD with the QTL and therefore cannot exceed the total variance (in contrast to single SNP analysis). Indeed, Sahana et al.
 compared various association mapping methods and showed that a Bayesian variable selection model that fitted all SNP simultaneously performed best overall. The Bayesian variable selection model using the posterior probability of a QTL in 1 cM overlapping regions to identify associated genomic regions had the highest power to map small QTL (i.e. explaining 2% of genetic variance) and most precise estimates of QTL location. However, a mixed model analysis fitting random additive genetic effects and testing single SNP performed almost as well, although it was computationally more demanding and multiple testing correction was needed. Like in Sahana et al.
, analysis of the Shetland pony mare data using logistic regression with single SNP effects, as in Schurink et al.
, was computationally much more demanding than the Bayesian variable selection method used here and ignored dependencies between SNP. Although several similar associated genomic regions were identified using these two methods, Bayesian variable selection model using posterior probabilities of genomic regions is preferred as it is computationally less demanding, it does not require correction for multiple testing and it accounts for population stratification due to pedigree by fitting all SNP simultaneously.
Non-overlapping window approach
The window approach takes LD between SNP into account and is therefore a better criterion for QTL identification than posterior probabilities of single SNP
[23, 36]. However, optimal choice of the size of a window is not clear, as a specific window may contain more than one QTL or a QTL may be spread over more than one window
. For example, after merging windows at 75 and 76 Mb on chromosome 17 in Shetland pony mares and performing another GWAS, the percentage of variance explained by this 2 Mb genomic region was 0.426, which roughly equals the sum of genetic variance explained by the two separate 1 Mb windows (Table
4). Because these 1 Mb windows were consecutive, the percentage of variance explained by the 2 Mb windows might be considered as total QTL variance (if indeed the two consecutive 1 Mb windows represent the same QTL), whereas the percentage of variance explained by each 1 Mb might each represent a proportion of QTL variance. However, the true QTL position might not be contained in the window with strongest association. Precision of QTL mapping depends on several factors, such as the method of analysis, marker density, sample size and variance explained by the QTL
. In a simulated data set of binary phenotypes and SNP genotypes by Mucha et al.
, the mean distance of estimates from true QTL positions ranged from 0.30 to 0.77 Mb, depending on the method of analysis used. However, the SNP density simulated by Mucha et al.
 was higher than in our study. Because LD can differ between genomic regions e.g.
[38, 39], LD within a genomic region could be used to determine the optimal size of a window in a given region, although further research is needed to determine the relationship between LD structure and optimal window size.
Genome-wide association study
Associated genomic regions identified in both breeds (Figure
3) suggest interesting candidate genomic regions to follow-up on. A simultaneous GWAS of both breeds is expected to increase power to detect associations, as more data would be included. However, GWAS across breeds will be less likely to detect SNP that are in LD with QTL in only one breed and will be more likely to detect SNP in LD with QTL across both breeds, provided LD phase is conserved across breeds e.g.
[40, 41]. To meet these requirements, SNP and QTL need to be physically close or, ideally, represent the actual mutation (which is unlikely). De Roos et al.
 concluded that roughly 50 000 SNP are required to have sufficient LD (i.e. ≥ 0.20) for genomic selection within a dairy cattle breed but that 300 000 SNP are required to find SNP that are in LD with the QTL across breeds. Persistency of LD phase extended less than 10 kb between bovine breeds that diverged hundreds of generations ago
. The consistency of LD phase between Shetland ponies and Icelandic horses was not investigated. Shetland ponies and Icelandic horses did cluster together in the phylogenetic analysis of van de Goor et al.
, which used equine short tandem repeat loci. However, divergence of the breeds occurred many generations ago, thus LD from the ancestral population is expected to have been broken down
. Also, the current equine SNP density results in insufficient LD (roughly 0.3
) to expect to find SNP that are in LD with QTL across breeds.
Research on IBH using the candidate gene approach or GWAS in horses has been limited. Using a candidate gene approach, Andersson et al.
 concluded that SPINK5 (serine peptidase inhibitor, Kazal type 5) on ECA14 was not associated with IBH in Swedish-born Icelandic horses. In our study, no genomic regions associated with IBH were found on ECA14. Hořin et al.
 investigated polymorphisms in various immune response related genes to identify associations with R. equi and L. intracellularis that cause respectively lung and gastrointestinal infections in horses. Several polymorphisms were significantly associated with these infections, including microsatellite locus HMS01 on ECA15. Marti et al.
 concluded that locus HMS01 is associated with IBH. In our study, genomic regions associated with IBH were identified on ECA15 but only in Icelandic horses. Various IL1 (interleukin 1) related genes are located in or around these regions.
We anticipated a common genetic background of IBH across breeds, although breed-specific genetic influences on IBH cannot be excluded. However, SNP densities within genomic regions could differ between Shetland pony mares and Icelandic horses due to breed-specific edits based on MAF and call-rate. Also, LD between SNP and QTL might be present in one breed but absent in the other e.g.
, thereby impeding validation of QTL across breeds. Associated genomic regions identified in both Shetland pony mares and Icelandic horses were considered most interesting to follow-up on and were found on ECA3, 7, 11, 20 and 23 (Figure
5). However, positional candidate genes adjacent to associated genomic regions were identified only for the genomic region on ECA20. No other candidate gene with known function in immunology or allergy was identified in or adjacent to across-breed associated genomic regions. The equine lymphocyte antigen (ELA) class II region is located on ECA20 (spanning 32 and 33 Mb) between the associated genomic regions identified in the Shetland pony mares and Icelandic horses (Tables
3). ELA, or equine major histocompatibility complex, evokes an immune response by recognizing many foreign molecules
. Both serological
[11, 12] and genomic research
 have identified an association between ELA class II antigens and IBH. Andersson et al.
 concluded that the same allele at an ELA locus is associated with IBH in two distinct horse breeds and homozygosity across the ELA region increased IBH sensitivity. An association with IBH on ECA20 was also found by Schurink et al.
, although the identified region was approximately 8 Mb away from the ELA class II region. However, coverage within the region was poor for the Illumina® EquineSNP50 Genotyping BeadChip (Illumina Inc.) used by Schurink et al.
, but improved in the current equine HD chip. Associated genomic regions on ECA20 that were identified in the Shetland pony mares and Icelandic horses were within 2 Mb from the ELA class II region, which is reasonably close to confirm the impact of ELA class II region on IBH.
Conclusions and implications
The genome-wide association study performed here identified several genomic regions associated with IBH in both Shetland pony mares and Icelandic horses. On ECA20, associated genomic regions were identified in both breeds that were within 2 Mb from the equine lymphocyte antigen class II region containing candidate genes. Knowledge on genes associated with IBH will contribute to our understanding of its biology, enabling more efficient therapy, prevention and selection in order to decrease IBH prevalence. Sequencing candidate genes within the equine lymphocyte antigen class II region might identify the functional mutation. Selection on functional mutations, i.e. direct markers, is more effective than indirect markers (i.e. LD and linkage equilibrium markers)
. However, genetic gain for marker-assisted selection using only a small number of significant markers to trace a limited number of QTL (although often with larger effects) is likely to be small because a large number of QTL are expected to explain genetic variation in complex traits e.g.
. In genomic selection, dense genome-wide markers are used to estimate genomic breeding values based on marker effects across the entire genome. Marker density is assumed to be sufficient so that each QTL is in LD with at least one marker or with a set of markers. Therefore, genomic selection could potentially capture the total genetic variance for a complex trait e.g.
. Possibilities for genomic selection on IBH in horse populations or even across horse populations and corresponding implications must be investigated before implementation is considered.