Identifying genomic regions that underlie the response to Eimeria maxima in broilers allows us to understand the associated molecular mechanisms and provides us with candidate genes and genomic regions that can be used in breeding for improved resistance to coccidiosis. The host response to Eimeria is a complex trait controlled by a wide range of biological processes, which are in turn controlled by many genes with a small effect and a small number of genes with a moderate or large effect. Several QTL regions that are significantly associated with traits measured after the response to Eimeria challenge were detected in F2 crosses obtained from experimental lines with different degrees of susceptibility to coccidiosis [16, 17, 36]. In addition, similar approaches have been used for the identification of genomic regions associated with innate and adaptive immunity in laying hens [37] as well as survival rate and aimed at improving general robustness, especially in laying hens [38]. In addition, information regarding the genomic regions that are strongly associated with desirable traits can be incorporated in commercial poultry breeding programs. Regarding broilers, haplotypes that are associated with desirable traits identified in the final product of the four-crossway breeding scheme can be traced back in pedigreed populations for which selection would be performed within the pure lines. This approach may potentially improve general innate immunity as well as resistance to specific pathogens such as Eimeria species.
However, a more detailed understanding of the genetic mechanisms that control the response to Eimeria infection, as in the case of all other complex traits, requires a sufficiently large sample size, dense SNP coverage that can exploit the linkage disequilibrium and informative phenotypes [39]. Therefore, we performed the large-scale challenge study on 1936 commercial Cobb500 animals, which were genotyped using the 580K Affymetrix® Axiom® high-density genotyping array, providing considerable statistical power for GWAS for traits measured on all 1936 animals. Taking advantage of the size and structure of the challenge experiment [19], we conducted a GWAS to obtain more information on the underlying genetic determinism of the response to E. maxima in broilers. In addition, a post-GWAS functional analysis was performed to further understand the biology of the response to E. maxima in broilers.
GWAS
Previous QTL studies have reported that several genomic regions are associated with BWG in response to coccidiosis [15, 20]. Comparing our results with these previous studies, only one highly significant SNP that overlaps with a QTL on GGA3 (between 263 and 282cM; 98.1 and 107.0 Mb), detected by Pinard-van der Laan et al. [16] and Bacciu et al. [17], was observed. However, these different results were not completely unexpected considering that the previously conducted challenge study was performed with animals at a different age and that originated from very different experimental lines.
The MGAT4C gene is close to the significantly associated genomic region on GGA1. This gene encodes a glycosyltransferase that is involved in the transfer of N-acetylglucosamine (GlcNAc) to the core mannose residues of N-linked glycans, also known as N-linked glycosylation. N-linked glycosylation has been shown to be essential to HIV-1 pathogenesis [40]. Furthermore, there is a wide range of well-described disorders that affect primarily N-glycan assembly, with several including gastrointestinal disorders [41]. In addition, a study on humans showed a strong association between the apolipoprotein B level and SNP variants in the MGAT4C gene [42].
The KCNK3 gene is the closest gene to the GGA3 genomic region, which was significantly associated with BWG. The KCNK3 gene encodes a member of the potassium channel superfamily, which has been associated with pulmonary hypertension in humans [43]. Broilers are known to suffer from cardiovascular disorders [44, 45], which may be related to the KCNK3 gene. Furthermore, this gene may be associated with the general robustness of broilers and their ability to cope with stress induced by the E. maxima challenge.
The THBS1 gene is located upstream of the significantly associated region GGA5 [between 28.95 and 29.11 Mb, (See Additional file 6: Table S6)]. In addition, we also analyzed putative GENSCAN Gene Predictions that are supported by a few spliced EST (See Additional file 12: Figure S5); however, no similarity was observed with known proteins, which indicates that THBS1 was a better candidate. Human THBS1 is involved in the regulation of angiogenesis and tumorigenesis in healthy tissues and cell adhesion [46, 47]. Furthermore, in the study by Heams et al. [18], THBS1 was among the top 10 of 1473 significantly differentially expressed genes in the caecum between Fayoumi (resistant) and Leghorn (susceptible) animals infected with E. tenella. Moreover, a porcine transcriptome study showed that THBS1 is strongly repressed in the in vitro stimulation of porcine peripheral-blood mononuclear cells (PBMC) with tetradecanoyl phorbol acetate (TPA)/ionomycin [48]. Finally, a recent study showed that human THBS1 plays an important role in the innate immune response during respiratory bacterial infection [49], which may be of interest regarding Eimeria infection. Based on these observations, THBS1 is a good candidate gene for further functional studies.
Regarding PC, our GWAS results show little overlap with previous QTL mapping studies [16]. We observed the strongest signal with the SNP in the MAN2C1 gene (See Additional file 6: Table S6) in association with PC measured in the range from 465 to 510 nm. The associated SNP is described as a non-synonymous polymorphism (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=314637018). The MAN2C1 gene encodes the α-mannosidase class 2C enzyme, which is one of the key enzymes involved in N-glycan degradation [50]. Recent studies have indicated that MAN2C1 expression is crucial for maintaining efficient protein N-glycosylation [51] as well as cell–cell adhesion [52]. Glycans are important molecules in numerous essential biological processes, including cell adhesion, molecular trafficking and clearance, receptor activation, signal transduction, and endocytosis [53]. In contrast, changes in the PC reflect the status of intestinal absorption, changes in the production of protein carriers and their antioxidant effect in response to Eimeria infection [54] and reflect several of the processes that MAN2C1 may impact. Furthermore, several significant glycol-related pathways were significantly enriched in the biological pathway analysis for PC measurements (Fig. 5).
Regarding the percentage of β-globulins, we identified FHOD3 and LARGE as potential candidate genes (See Additional file 6: Table S6). FHOD3 encodes a protein that is a member of a formin subfamily and is involved in the regulation of cell actin dynamics [55]. However, how FHOD3 may be involved in the regulation of plasma β-globulin levels is difficult to discern because the current knowledge regarding FHOD3 is scarce and restricted to human and mouse functional studies [56, 57]. The LARGE gene encodes an enzyme glycosyltransferase that is involved in alpha-dystroglycan glycosylation and is capable of synthesizing glycoprotein and glycosphingolipid sugar chains [58]. The exact function of LARGE is not fully known; however, mutations in the human LARGE gene have been described to cause congenital muscular dystrophy type 1D (MDC1D) [59].
Taking all measured traits into account, we identified 22 highly associated SNPs; however, GWAS was performed on two sets of traits: the first set of traits was measured on all 2024 animals, and the second set was measured on the subset of 176 animals, for which the statistical power to detect potential candidate genes differed due to the different sample sizes. Regarding BWG and PC that were measured on all animals, two functionally well-supported candidate genes (THBS1 and MAN2C1) were detected. However, we did not identify any candidate regions for BT and HEMA, potentially because these traits have not been as affected in challenged animals compared with PC and BWG [19]. Furthermore, an infection caused by E. maxima is often characterized by intestinal malabsorption and not by severe bloody diarrhoea, which is the case with E. tenella. This finding further indicates that HEMA is not very informative when measuring the response to E. maxima infection [19]. Similarly, BT is difficult to interpret with respect to Eimeria infection because this trait may be influenced by other factors as discussed by Hamzic et al. 2015 [19]. However, we have not identified many candidate genes or genomic regions in the GWAS performed by using traits measured in the 176 animals, which may be primarily due to the sample size, which was 10 times smaller in comparison to the traits measured on all animals, and to the complexity of the genetic parameters that control these traits. In addition, this absence of identifiable candidate genes may be partially due to the precision of measured traits and rather stringent significance thresholds used in GWAS. Therefore, we performed biological pathway analysis, which enabled an increase in the statistical power to detect significant association. An increase in the statistical power is possible because we decreased the number of statistical tests performed by compressing individual SNPs into biological pathways [60].
The final aim is to transfer the acquired knowledge to the poultry breeding industry. The identified candidate genes and genomic regions can be used in breeding for improved resistance to coccidiosis. The primary obstacle in achieving this goal is relating the knowledge regarding the candidate genes identified in the final product (Cobb500) with the grand-parent pure lines where the actual selection is performed. This task becomes rather complex, considering that the poultry populations and the crossing design are not in the public domain, and due to the proprietary nature of information regarding the grandparent lines. Future studies will identify the best approach to trace the identified regions to the grandparent lines.
Biological pathway analysis
The biological pathway studies based on the GWAS results can potentially extend the knowledge obtained from GWAS studies by identifying the cumulative effect of gene sets [61]. Furthermore, understanding the biological pathways increases the power to detect statistically significant associations because fewer statistical tests are performed as a consequence of assigning individual SNPs to the respective biological pathways. Therefore, the number of tests is decreased from over 400,000 (number of SNPs) to approximately 160 (number of pathways) using a priori biological information. For this purpose, we assigned genes, containing SNPs used in GWAS, to the biological KEGG pathways. Therefore, biological KEGG pathways are considered as a set of genes that have been used for further analysis. However, we have to be aware that the biological pathway analysis should still be primarily viewed as an exploratory technique because the current statistical methodologies used for gene set/pathway analysis need further development [61]. Moreover, the available biological pathway databases necessary for this kind of analysis are not completely annotated and do not contain all the genes present in the chicken genome. In this study, we used a statistical modeling methodology that assesses the cumulative effect of sets of SNPs on the measured traits as presented by Jensen et al. [31] and Buitenhuis et al. [62]. For this purpose, we succeeded in assigning 52,204 SNPs from GWAS to 4342 annotated genes in the chicken genome.
For PC, the top four most frequently affected pathways across all wavelengths include phenylalanine, tyrosine and tryptophan biosynthesis, endocytosis, purine metabolism, and glycosphingolipid biosynthesis—lacto and neolacto series (Fig. 5).
Phenylalanine, tyrosine and tryptophan biosynthesis is the most commonly affected pathway when only PC is considered (Fig. 5) and (See Additional file 11: Figure S4). Phenylalanine, tyrosine and tryptophan have important roles in the regulation of the immune response [63]. Phenylalanine is indirectly involved in the regulation of nitric oxide (NO) synthesis [64], and NO is known to have multiple roles related to the immune response such as signaling properties, regulating cytokine production and killing pathogens [65]. Tyrosine is used as a precursor for the production of dopamine, catecholamines and melanin. Dopamine is a neurotransmitter known to be involved in the regulation of immune response, and melanin has antioxidant properties [63]. In addition, interferon gamma (IFN-γ) suppresses the growth of Toxoplasma gondii through the intracellular depletion of tryptophan [66]. Deprivation of tryptophan produces a deleterious effect on Toxoplasma gondii replication. Toxoplasma gondii belongs to the same order (Eucoccidiorida) of intracellular single-cell parasites as E. maxima. Furthermore, Laurent et al. [67] reported a strong increase in IFN-γ mRNA expression in chickens infected with Eimeria spp. Based on this finding and the results from the biological pathway analysis, tryptophan depletion may also be involved in the innate immune response during E. maxima infection.
The second most commonly affected pathway when considering PC is the purine metabolism pathway (Fig. 5) and (See Additional file 11: Figure S4). This pathway regulates nucleotide metabolism and is important for successful cell division. In addition, we observed that the purine metabolism pathway is significantly associated with erythrocyte number, mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC) and mean cellular volume (MCV) (Fig. 6). This association may be explained by an increased demand for cell division of blood cell progenitors and the regeneration of the intestinal epithelium due to the effects of the infection because the E. maxima infection has been characterized by severe lesions of the intestinal lining in broilers [19].
Endocytosis has been shown to play an important role in both innate and adaptive immune responses [68], which may explain why endocytosis may be among the most commonly affected pathways when PC measurements are considered (Fig. 5). Finally, glycosphingolipid biosynthesis—lacto and neolacto series, like other glycol-related pathways (Fig. 5), is involved in the production of glycoconjugate receptors, which are used by microbes to enter the host cell and are of critical importance in the early stage of the innate immune response [69, 70]. Moreover, similar paths of the host cell invasion have been previously described in the cases of Eimeria and Toxoplasma [71, 72].
We also summarized the frequency of the most common significant biological pathways excluding PC (Fig. 6). In this case, glycosaminoglycan biosynthesis involving heparan sulfate/heparin, the pentose phosphate pathway and the peroxisome are the most frequent significantly enriched biological pathways. The pentose phosphate pathway is a metabolic pathway that regulates the production of nicotinamide adenine dinucleotide phosphate (NADPH) and pentose, which are essential for the synthesis of nucleic and ribonucleic acids, respectively. The pentose phosphate pathway seems to play a role in plasma protein component and heterophil and lymphocyte production, which may be explained as a response to the increased production of these components during the primary response to the challenge (Fig. 6) [73]. The peroxisome pathway controls the metabolism of reactive oxygen components, which are known to be toxic to bacteria and several parasites and play a significant role in resilience and immunity to infectious diseases [73]. Moreover, the peroxisome pathway was significantly associated with blood components such as number of erythrocytes and percentage of heterophils and lymphocytes (Fig. 6).
These findings demonstrate that the response to Eimeria infection is characterized by a strong effect on essential metabolic pathways as well as innate immune response-related pathways. Among the essential metabolic pathways, the most frequently affected are phenylalanine, tyrosine and tryptophan biosynthesis, purine metabolism and the pentose phosphate pathway. In addition, the most frequent innate immune response-related pathways are glycol-related pathways, the peroxisome pathway and the endocytosis pathway.
Network-based analysis
The network-based analysis was performed as a complementary approach to the biological pathway analysis to build gene networks associated with responses to the E. maxima challenge using bibliography-based proven relationships that are available through the Ingenuity Pathway repository. The network-based analysis was performed independently from the biological pathway analysis and was based on a list of genes enriched for SNPs that are associated (p < 10−4) with the traits measured during the E. maxima challenge (See Additional file 4: Table S4). The network-based analysis approach, implemented in the IPA tool, assumes that genes used as an input interact with each other, and these interactions are reconstructed based on the relationships shown in the literature [34]. The network-based analysis aims at exploring the cumulative effect of sets of genes that individually explain a moderate part of the variation for a measured trait and that cannot be identified during GWAS when the strict significance threshold was applied.
Figure 7 illustrates the network formed by several of the molecules grouped around LATS1 and LATS2 with multiple direct connections with other molecules enriched with significantly associated SNPs. LATS1 and LATS2 are known to be involved in the regulation of intestinal epithelium renewal [74], which may be explained by intensified tissue repair upon E. maxima challenge. In addition, phenylalanine, tyrosine and tryptophan biosynthesis, purine metabolism and the pentose phosphate pathway are the most frequent significant pathways associated with all measured traits in the KEGG biological pathway analysis. All three of these KEGG pathways are associated with increased DNA replication, cell metabolism and protein degradation, which are essential during the tissue repair process.
The second network has the MYH6 as a key molecule connected with several direct significant relationships to the other genes that are enriched for significantly associated SNPs (Fig. 8). The MYH6 gene encodes the alpha heavy chain subunit of cardiac myosin. In mice, inactivation of the specific mutant MYH6 transcript suppresses hypertrophic cardiomyopathy [75]. In addition, we also identified KCKN3, which is associated with pulmonary hypertension, as one of the candidate genes because heart failure and ascites have been well documented in broiler chickens [44, 45]. The primary reason for these problems can be attributed to an intensive selection in poultry breeding during the last 60 years [76]. Therefore, we can potentially indicate which animals are able to maintain a normal function of the cardiovascular system and have an advantage in the face of Eimeria infection.
Based on the post-GWAS functional analysis, the broiler response to E. maxima is centered on tissue repair and recovery, general robustness and maintenance of tissue integrity, restoring intestinal homeostasis after the challenge. The described processes, which may bring a comparative advantage in the broiler’s ability to cope with the challenge, can be described as resilience to acute Eimeria infection. In contrast to the previously conducted studies [16, 17], which reported associations with genes involved in the immune response, we primarily observed associations with genes, biological pathways and gene networks that are involved in tissue repair and recovery and tissue integrity maintenance. However, previous studies [16, 17] were conducted on experimental layer populations challenged with E. tenella at 28 days of age. In regard to this, broilers are also able to establish complete immunity 16 days after being challenged with E. maxima [77]. Therefore, one would assume that this challenge study would identify genetic variants associated with processes related to immune responses, which did not occur in this study and may be due to the effect of the infection doses (50,000 oocysts), which were optimized to produce severe clinical signs as reported by Hamzic et al. [19]. We argue that the more resilient animals are able to maintain their biological homeostasis and manage the consequences of the infection, which, in the context of this challenge, exceeded the importance of building an adequate immune response.