Relationship between overall genetic diversity and genetic distance to the wild populations
A strong inverse relationship was found between the genetic diversity (\({\mathrm{H}}_{\mathrm{o}}\)) within populations and their genetic distances to the wild populations (Gallus gallus), as shown in Fig. 1. This relationship was similar even when using only neutral markers (intergenic SNPs, see Fig. 3). Across these chicken populations, 88.6% (Table 1) of the total variation in \({\mathrm{H}}_{\mathrm{o}}\) was explained by the genetic distance to the wild populations, which is slightly higher than the percentage obtained in several human studies based on geographic distances. Geographic distances of humans out of Africa explained 76.3% of the microsatellite heterozygosity and 78.4% of the variation in fixation index (\({F}_{ST}\)) in [5] and explained 85% of the microsatellite heterozygosity in [14]. In [28], geographic distances of humans out of Africa had a correlation of − 0.91 with SNP haplotype heterozygosity and of − 0.87 with microsatellite heterozygosity. Studies in humans have also shown that there is a high correlation (e.g. from 0.77 to 0.89 [5]) between genetic distance (using different genetic distance measures) and geographic distance. However, correlations reported for domesticated cattle were not as high, e.g. 0.62 in [29] and 0.75 and 0.54 in [15] for ancient and modern cattle samples, respectively. It has been suggested that the weaker relationship between geographic and genetic distance in modern domesticated cattle is, among other reasons, due to the human manipulation of genetic diversity, as is the case for many domesticated livestock [15].
Since we had different sample sizes, with some populations having less than 15 sampled individuals, we checked whether this affected the estimates. We estimated \({\mathrm{H}}_{\mathrm{o}}\) when only populations of 15 or more individuals were considered and found that number of individuals did not affect the estimates. We also sampled 1000 SNPs in 100 replicates to validate that the relationship between \({\mathrm{H}}_{\mathrm{o}}\) and genetic distance was not due to chance. The percentages of variation explained in the 100 replicates ranged from 86.1 to 88.9% with a mean of 87.9%. Additional file 2: Figure S1 shows the regression plots of the 100 replicates with their 95% confidence intervals. We also used \({F}_{\mathrm{ST}}\) as an alternative measure of differentiation, and found a Mantel correlation coefficient of 0.97 between pairwise \({F}_{\mathrm{ST}}\) values and the corresponding Reynolds’ distances. Reynolds’ genetic distances of populations to the wild populations (G. gallus) and the \({F}_{\mathrm{ST}}\) values were highly correlated, with a Pearson’s correlation coefficient of 0.99 and their relationship is shown in Additional file 2: Figure S2 with an R2 value of 0.99. When using \({F}_{\mathrm{ST}}\), the genetic differentiation of the breeds from the wild populations (G. gallus) explained 88.5% of the variation in \({\mathrm{H}}_{\mathrm{o}}\) (see Additional file 2: Figure S2).
Since heterozygosity and genetic distance are not entirely independent of each other, in the next steps, we investigated the possibility of a structural link behind the observed relationship. First, we permuted the SNPs to ensure that the decreasing heterozygosity was not an artefact of the Reynolds’ distance and found that there was nearly no relationship between \({\mathrm{H}}_{\mathrm{o}}\) and genetic distance based on permutated SNPs (R2 = 0.01). Second, we estimated the regression of \({\mathrm{H}}_{\mathrm{o}}\) on genetic distance when starting from each of the 160 populations (this time not from the two wild populations). Then, we investigated whether the R2 value of the linear regression was directly associated with the \({\mathrm{H}}_{\mathrm{o}}\) within the populations, i.e. whether populations with higher \({\mathrm{H}}_{\mathrm{o}}\) automatically resulted in higher R2 values and vice versa. We found that although there was some association between the populations’ \({\mathrm{H}}_{\mathrm{o}}\) and R2 values (Pearson’s correlation of 0.35), it was not very strong. R2 values for the regression of \({\mathrm{H}}_{\mathrm{o}}\) on the genetic distances for the populations from Asia (Asian fancy and local populations) ranged from 0.71 to 0.91 and those on the genetic distances for the other populations of non-Asian origin ranged from 0.57 to 0.90. Of the 160 populations, 29 yielded higher R2 values than the two wild populations, of which 25 of them are of Asian origin and only four (Albanian Crowers (ALxx), Hungarian Yellow (YH), Schweizer Huhn (SCw) and African Kuroiler (KUR)) are of non-Asian origin, based on our classification (3 European and 1 African). However, the four were all clustered with Asian populations [7]. Furthermore, KUR is believed to have originated from South Asia (India) and arrived recently in Africa. More information on these populations can be found in [7]. Generally, higher explanatory power of the populations’ \({\mathrm{H}}_{\mathrm{o}}\) lies with the Asian populations. A human-based study that used geographic distance as the explanatory factor of heterozygosity reported that, although some locations outside of Africa had reasonably high R2 values (e.g. 0.74), none of them were higher than the regression based on geographic distance from Ethiopia (the possible founder origin) with an R2 = 0.76. However, other locations in Africa had R2 values higher than Ethiopia (e.g. R2 as high as 0.87) [5]. Therefore, our results do not contradict those found in the literature. Nonetheless, one of the shortfalls with the Gallus gallus subspecies that we included in our study is that the samples were not from naturally existing populations in the wild. Instead, they were sampled from small populations that have been isolated for years, therefore they may have a reduced genetic diversity, which reduces their explanatory power of the genetic diversity (\({\mathrm{H}}_{\mathrm{o}}\)) within the domesticated chickens.
Since we studied populations with different histories, including local and fancy populations from different management and breeding backgrounds, we also investigated whether the groups of populations from different backgrounds resulted in different patterns of genetic diversity. Figure 2 shows that the association between \({\mathrm{H}}_{\mathrm{o}}\) and the genetic distances of populations to the wild populations (G. gallus) was weaker within the Asia_local group than within the rest of the other groups, see Additional file 1: Table S2 for R2 values and regression coefficients. The South_America category was not included in this analysis since it consisted of only four populations. Genetic distances of the Asian local populations to the wild populations were generally shorter than those of the other groups, in particular the European breeds and Asian fancy breeds. Not only is intercrossing common among Asian local breeds, which promotes high genetic diversity, but there is also a high probability of gene flow between the wild populations (RJF) and local chickens in Asia, as reported in some studies e.g. [30]. This gene flow limits variations in genetic diversity due to genetic drift. However, the Asian fancy populations have been separated from the wild populations for quite some time, the same applies for the European and African breeds, thus the probability of exchange of genetic material with the wild populations is very low for these populations [31]. Consequently, the fancy breeds (both Asian and European), the European local, and the African categories all show a high association of the \({\mathrm{H}}_{\mathrm{o}}\) within each of these populations with their genetic distances to the wild populations. Figure 2 clearly shows that the groups of fancy breeds comply with the concept of genetic isolation by genetic distance; however, in these breeds the rate of change in \({\mathrm{H}}_{\mathrm{o}}\) due to genetic distance to the wild G. gallus is more rapid compared to their respective local populations (see Additional file 1: Table S2). In addition to the historic separation from the wild populations, the effects of drift are also strong within the fancy categories as a result of other factors, such as small effective population size and consequent inbreeding.
Based on our results, we can conclude that the variance in \({\mathrm{H}}_{\mathrm{o}}\) within the domesticated chicken populations can be explained well by genetic distance to the wild populations (G gallus). Although our results may not directly prove this, because of the lack of geographic sampling coordinates, given the whole dataset (Fig. 1), it is clear that geographic distance alone may not predict well the observed genetic variations in the chickens for two reasons, as described in the following.
-
(i)
Breeds of the same geographic origin are scattered across the genetic diversity spectrum, in particular, the European (green symbols) and Asian (red symbols) type breeds, as shown in Fig. 1 and highlighted in [7]. The European chickens sampled from the German fancy breeders (denoted with prefix DE_) have a much reduced genetic diversity and their genetic distances to the wild populations (G.gallus) are much larger than those of their respective local breeds. However, when considering the sampling areas, the genetic diversity may correlate to the geographic distances to the G. gallus wild chicken within the Asian breed categories. Presumably, many of the fancy breeds originated from a small number of breeding birds that were imported from Asia to Europe, after which they were subjected to strong phenotypic selection, with small effective population sizes, population bottlenecks, and intensional inbreeding to keep the desired traits. These practices are likely responsible for most of the differences in the genetic diversity of the fancy Asian and European type breeds vs. their respective local types.
-
(ii)
The concept of isolation by distance assumes that individuals from nearby locations are likely related because mating between them is possible. This is often the case for traditional breeding systems but not for breeding and management practices for fancy breeds. In fancy breeds, gene flow between small stocks may occur based on personal contacts or personal relationships between breeders, but is not related to geographic distance forming a subpopulation structure within the breed. Actually, such gene flow between fancy breeds is also very limited. Furthermore, if geographic distance was a better predictor for the loss of genetic diversity and increased differentiation of domesticated breeds to the wild populations, then the African and South American breeds would be expected to have a much reduced genetic diversity due to their geographic distances and to show large genetic distances to the wild populations and to the rest of the Asian populations. However, we found that both these expectations were not fulfilled and that some of the African populations were clustered with the wild type populations [7].
Therefore, the observed differences in genetic diversity between domesticated breeds may not be predicted by geographic expansion only, but rather by a combination with other aspects or subsequent events, such as effective population size, type of breeding practices, and possibly subsequent series of founder events following the geographic expansion, as previously suggested [5, 6]. Such events that took place after the geographic expansion have definitely contributed to the differences in allele frequencies between populations and thus to the genetic distances between the domestic chickens and the wild populations. In addition, equilibrium between genetic drift, migration, and mutation was probably not reached in all the studied populations, which would be compatible with the theory of genetic isolation by distance [5, 8, 9]. The theoretical expansion models are also based on ‘natural’ expansion through migration, while chickens and other livestock were actively transported by humans (e.g. with ships) to distant places.
Comparisons of the patterns of heterozygosity between the overall genome and different functional SNP classes
We compared the patterns of the relationship between \({\mathrm{H}}_{\mathrm{o}}\) and genetic distances of populations to the wild Gallus gallus obtained by using the whole set of SNPs to those obtained by using different SNP classes, as shown in Fig. 3 and Table 1. The rate of change in \({\mathrm{H}}_{\mathrm{o}}\) associated with the genetic distance to the wild populations is represented by the slope in column 4 of Table 1. Among the SNP classes, the class of non-synonymous SNPs showed a relevant deviation from the overall pattern, i.e. the \({\mathrm{H}}_{\mathrm{o}}\) across breeds was lower for this class of SNPs than for the whole genome and had the most deviating slope (− 0.645 compared to − 0.706 for all SNPs). To investigate if this different pattern of the non-synonymous SNP class was not due to the smaller number of SNPs, we resampled the same number of SNPs as in this class (1082 SNPs) from the overall set (156 K SNPs) 100 times. We estimated the \({\mathrm{H}}_{\mathrm{o}}\) for each sample and compared these with the Ho for the set of non-sysnonymous SNPs in Additional file 2: Figure S3, which shows that the original difference was not due to sample size.
The intergenic and intronic classes had the highest proportion of SNPs among all SNP classes (Table 1). In order to validate that the similarities of these two classes to the overall genome (whole set of SNPs) are not an artifact of the number of SNPs, we sampled 1000 SNPs 100 times from the intergenic and intronic classes (separately). Then, we estimated \({\mathrm{H}}_{\mathrm{o}}\) and compared the results to the whole set of SNPs and found that the observed similarities are not due to larger numbers of SNPs (see Additional file 2: Figures S4 and S5). All SNP classes showed a reduction in \({\mathrm{H}}_{\mathrm{o}}\) across populations as genetic distance to the wild type chickens increased, with R2 values ranging from 83.0 to 89.3%. The results indicated that 89.3% and 88.0% of the variation in \({\mathrm{H}}_{\mathrm{o}}\) across populations was explained by their genetic distance to the wild G. gallus for the synonymous and non-synonymous SNPs, respectively, while this percentage was lowest (83.0%) for the class of 5′UTR SNPs. However, it is important to note that the 5′UTR class included only 118 SNPs, which could explain the differences. To test this, we sampled 118 random SNPs in 100 replicates from the overall set and estimated the relationship, as we have done for the non-synonymous SNPs. The R2 for the 100 replicates ranged from 76.0 to 86.6%, with a mean of 82.8%, which suggests that the result for the 5′UTR class is most likely an artifact due to the small number of SNPs.
Figure 4 shows the mean \({\mathrm{H}}_{\mathrm{o}}\) for the different SNP classes. Generally, the \({\mathrm{H}}_{\mathrm{o}}\) was lower in the genic than in the non-genic SNP class. Within the genic class, a lower \({\mathrm{H}}_{\mathrm{o}}\) was observed in exonic than in intronic SNPs. Consistent with the results in Fig. 3, the non-synonymous SNPs presented the lowest \({\mathrm{H}}_{\mathrm{o}}\) among all SNP classes. This was expected since non-synonymous changes can present favourable or disadvantagous consequences. The theoretical assumption is that selection acts rapidly towards fixation of favourable alleles and purging of non-favourable alleles, thus leading to more homozygosity in these protein altering variants. The classes of exonic and 5′UTR SNPs followed the non-synonymous class with the lowest mean \({\mathrm{H}}_{\mathrm{o}}\). UTR variants play a role in the regulation of gene expression and translation. For example, 3′UTR variants can interfere with microRNA to facilitate the translation of critical disease genes (e.g. cancer genes in humans) [32, 33]. It is also claimed that positive selection for the adaptation of humans in different habitats has been achieved with high differentiation in the 5′UTR gene variants [34]. Such examples highlight the importance of UTR variants as possible targets for selection.
Patterns of observed heterozygosity in genes
We investigated the patterns of \({\mathrm{H}}_{\mathrm{o}}\) in the 6303 chicken genes to which at least 10 SNPs were mapped and compared them to the overall \({\mathrm{H}}_{\mathrm{o}}\) pattern (see Additional file 3: Table S3), in order to determine whether the decrease in \({\mathrm{H}}_{\mathrm{o}}\) is faster or slower in certain genes. Coefficients of determination (R2) from the linear regression of the genetic distance from the wild ancestor on \({\mathrm{H}}_{\mathrm{o}}\) for each gene ranged from 0.04 to 0.73, with a mean R2 of 0.47, and the regression coefficients (slopes) ranged from − 0.11 to − 1.19. However, the R2 values were correlated with the number of SNPs genotyped within the gene, with a correlation of 0.57. The regression coefficients were independent of the number of SNPs within genes, with a correlation of 0.03. The correlation between the regression coefficients and R2 values was − 0.52.
We evaluated the regression coefficients of the relationship between \({\mathrm{H}}_{\mathrm{o}}\) and genetic distance for the genes with the 5% highest and lowest regression coefficients, i.e. 32 genes at each end. Regression coefficients ranged from − 0.11 to − 0.34 for the 5% of genes with the lowest regression coefficients and from − 0.99 to − 1.19 for the 5% of genes with the highest regression coefficients (see Additional file 1: Table S4 and Additional files 4 and 5 for the top and lowest 5% ranges, respectively). The genes in the top 5% showed rapid changes in \({\mathrm{H}}_{\mathrm{o}}\) with genetic distance of the breeds to the G. gallus wild chicken and those in the lowest 5% showed very slow changes in \({\mathrm{H}}_{\mathrm{o}}\) with genetic distance. We used the DAVID annotation platform [27] to identify the function of each gene in the lowest and top ranges (see Additional file 1: Table S4). In addition, using the ClueGo package [25], functional annotations of the genes were obtained for the combination of molecular function, biological, and immune system processes, as well as KEGG pathways. Functions of the individual genes in the top 5% range included transmembrane transport, protein transport and protein metabolic processes, and lipid metabolic processes among other functions. However, none of these top 5% genes formed any functional cluster. Most of the genes in the lowest 5% range had consistently lower Ho across the breeds, regardless of the genetic distance to the G. gallus wild chicken (see Additional file 5), and they were mainly related to critical functions for normal functioning of the individuals. Common functions among the individual genes in the lower 5% range included brain morphogenesis and development, axon development, positive regulation of cell proliferation, positive regulation of reactive oxygen species metabolic process, regulation of cell death, cell and structure morphogenesis, salivary gland morphogenesis, and lung morphogenesis, among other functions. However, functional classification revealed only three functional clusters, namely: brain development (EGFR, PAFAH1B1, PTPRS and RTN4), regulation of axon extension (DPYSL2, PTPRS and RTN4), and morphogenesis of salivary gland (EGFR, ESRP2 and FGFR1).
The consistent lower genetic diversity (\({\mathrm{H}}_{\mathrm{o}}\)) for genes with the lowest 5% regression coefficients and the limited or lack of relationship of \({\mathrm{H}}_{\mathrm{o}}\) with genetic distance to G. gallus can be the result of several factors, as described in the following.
-
(i)
On the one hand, some genes may be under evolutionary constraints such that the genetic make-up of these genes may be critical for normal development or functioning of the animal and changes within the genes may have detrimental effects. For example, the GRB2 gene, which had both the lowest slope, i.e. closest to zero (− 0.112) and the lowest R2 value (0.036), is highly conserved and under very strong evolutionary constraints in both chickens and in humans [35].
-
(ii)
On the other hand, the genetic diversity of some genes is probably reduced from the founders i.e. selection and fixation of the preferred variants took place prior to domestication, such that there is either no or less possibility for further reductions in genetic diversity. For example, Qanbari et al. [36] reported putative selective sweeps for the genes EGFR and STK17A across chicken populations, with reduced nucleotide diversity across populations but without significant genetic differentiation between the populations. Previously, a study of these two genes, along with NDUFA9 and NTF3, in different Mexican chickens suggested an influence of natural (adaptive) selection pressure rather than artificial selection [37].
-
(iii)
If an adaptive selection event did not occur prior to domestication, a third explanation is that purifying selection may have continued post-domestication and removed the non-favorable alleles across populations, which led to rapid fixation of the other allele.