- Research Article
- Open Access
Detection of copy number variations in brown and white layers based on genotyping panels with different densities
Genetics Selection Evolutionvolume 50, Article number: 54 (2018)
Copy number variations (CNV) are an important source of genetic variation that has gained increasing attention over the last couple of years. In this study, we performed CNV detection and functional analysis for 18,719 individuals from four pure lines and one commercial cross of layer chickens. Samples were genotyped on four single nucleotide polymorphism (SNP) genotyping platforms, i.e. the Illumina 42K, Affymetrix 600K, and two different customized Affymetrix 50K chips. CNV recovered from the Affymetrix chips were identified by using the Axiom® CNV Summary Tools and PennCNV software and those from the Illumina chip were identified by using the cnvPartition in the Genome Studio software.
The mean number of CNV per individual varied from 0.50 to 4.87 according to line or cross and size of the SNP genotyping set. The length of the detected CNV across all datasets ranged from 1.2 kb to 3.2 Mb. The number of duplications exceeded the number of deletions for most lines. Between the lines, there were considerable differences in the number of detected CNV and their distribution. Most of the detected CNV had a low frequency, but 19 CNV were identified with a frequency higher than 5% in birds that were genotyped on the 600K panel, with the most common CNV being detected in 734 birds from three lines.
Commonly used SNP genotyping platforms can be used to detect segregating CNV in chicken layer lines. The sample sizes for this study enabled a detailed characterization of the CNV landscape within commercially relevant lines. The size of the SNP panel used affected detection efficiency, with more CNV detected per individual on the higher density 600K panel. In spite of the high level of inter-individual diversity and a large number of CNV observed within individuals, we were able to detect 19 frequent CNV, of which, 57.9% overlapped with annotated genes and 89% overlapped with known quantitative trait loci.
Copy number variations (CNV) refer to large-scale insertions, duplications, or deletions of DNA sequence segments compared to a reference assembly. CNV can range in size from 50 to millions of base pairs, but 1 kb is generally assumed to be the lower limit [1, 2]. Most genome-wide mapping studies of CNV have been conducted in humans, where CNV account for a significant proportion of genome variation and are associated with susceptibility to disease [2,3,4,5]. According to Zarrei et al. , 4.8 to 9.5% of the human genome consists of CNV, while other studies in human and mouse found that CNV explained 18 to 30% of the genetic variation in gene expression [7, 8].
A better understanding of CNV in domesticated animal genomes will contribute to greater genetic improvement of production traits and animal health . Several species of farm animals have been scanned for CNV, including cattle [10,11,12], sheep [13, 14], goats , and pigs [16,17,18] and numerous studies have examined CNV in the chicken genome [19,20,21,22,23,24,25,26,27,28,29,30,31,32]. Currently, the known CNV in chicken encompass approximately 8.3% of its genome, or 9.6% of the ordered genome assembly . A number of other avian species have also been scanned for CNV, including duck  and turkey . Skinner et al.  have analyzed CNV in 16 species of birds and found that the number of CNV per Mb was similar in birds and mammals but that their size was smaller in birds than in mammals. In addition, overlapping between CNV and genes in chicken seems to be at the higher end of the range observed in mammals , which suggests that CNV may have functional effects in chickens.
According to studies on the human genome, formation of CNV can be connected to differences in recombination rate across the genome [35, 36]. Based on this hypothesis, recombination hot spots should have a higher prevalence of CNV than other parts of the genome. Indeed, based on an analysis of the genomes of chicken and zebra finch, Völker et al.  found a significant association between presence of structural variations such as chromosomal rearrangements and recombination rate. These data suggest a major role of recombination-based processes in the evolution of avian genomes.
Copy number variations can be detected by a number of methods, including array comparative genome hybridization (aCGH), sequencing, and single nucleotide polymorphism (SNP) arrays . Although SNP arrays are designed primarily for SNP genotyping, detection of CNV is possible because of the abnormal hybridization that occurs when a SNP is located within a CNV region. In general, the use of a 60K SNP  chip for this purpose has resulted in low frequencies of detected CNV [22, 29]. The use of sequence data is much more efficient and yields the largest number of CNV detected [24, 26, 27], but using a 600K SNP array can increase the sensitivity of CNV detection significantly . Four CNV detection studies based on the high-density Affymetrix 600K SNP array have been reported in chicken [28, 30,31,32] and have shown that, in general, CNV detection with this panel is more efficient than with lower density SNP chips.
Our study aimed at (1) detecting CNV and refining the genome-wide copy number profiles for layer chickens; (2) comparing CNV detection across different SNP genotyping panels, in order to evaluate the utility of these panels for CNV detection; (3) characterizing in detail the differences in CNV detection rates between individuals and lines; and (4) assessing the frequency of detected CNV and their possible functional impact. To achieve this, genes and quantitative trait loci (QTL) that overlap with the detected CNV were identified. Gene enrichment analysis was performed to identify overrepresented biological processes and pathways.
Samples and DNA extraction
The total number of individuals used in this study was 18,719, which included birds from four pure lines, two white shell (W) and two brown shell (B) lines, and from one commercial hybrid of white shell layer chickens, all provided by Hy-Line International (Table 1). DNA was isolated from blood collected from the wing vein of each bird. For the pure lines, blood was collected in EDTA-coated anticoagulant tubes and genomic DNA was extracted following lysis of cells and subsequent digestion with proteinase K. For the commercial hybrids, blood was collected on FTA Elute cards (GE Healthcare) and DNA was extracted following the manufacturer’s recommendations.
SNP arrays and genotyping
SNP genotypes were obtained from several platforms over multiple years and for multiple purposes (Table 1).
The SNP panels used included the publicly available 600K Affymetrix chip , a 42K proprietary Illumina iSelecta BeadChip , and two custom 50K Affymetrix chips, which were designed separately for white and brown lines by HyLine International. The choice of SNPs for inclusion in each panel was based on their uniform distribution across the genome. The 42K Illumina panel was optimized to capture the genetic variance that is associated with economically important traits, and thus contained more SNPs in close proximity of genes than the other panels. For both 50K panels, only SNPs with high-quality clusters according to the Axiom™ Analysis Suite were included. This could have led to the elimination of SNPs that overlapped CNV in the birds used for panel design since those SNPs may not form three discrete clusters when plotting allele-A intensity versus allele-B intensity in Axiom™ Analysis Suite.
Detection of CNV
The Axiom™ Analysis Suite  was used to call genotypes for the 50K and 600K Affymetrix panels. A minimum default quality control of 0.82 and a minimum call rate of 97% were used. The Axiom® CNV Summary Tools software  was used to extract log R Ratio (LRR) and B allele frequency (BAF) values for PennCNV 1.0.3 . Genotype and CNV calling were performed separately for each 96-well genotyping plate because of the large differences in signal intensities between plates. Data from the 42K Illumina panel were processed in Genome Studio 2011.1 using the Genotyping module v 1.9 and the cnvPartition CNV Analysis Plugin v3.2.0 .
PennCNV, an integrated hidden Markov model (HMM), was used for all Affymetrix panels. This algorithm incorporates multiple sources of information, including the signal intensity data of LRR and BAF values at each SNP, the distance between neighboring SNPs, and the population frequency of the B allele (PFB). Individual-based CNV calling was performed using the—test option with default parameters for the HMM model. The hhall.hmm file was used with the—test option for all panels. To adjust for genomic waves, the—gcmodel option with the chicken GC content file (GC content of 1-Mb genomic regions surrounding each SNP) was used. The PFB files were compiled separately for each panel from a large set of individuals, using the compile pfb script included in the PennCNV software. For filtering, standard deviations (SD) for LRR ≤ 0.35, BAF drift < 0.01, and waviness factor ≤ 0.04 were used. The waviness factor accounts for the dispersion in signal intensity across the genome. Only CNV that consisted of at least three (for the 50K and 42K panels) or at least five (for the 600K panel) consecutive SNPs were included in the analysis. Individuals with more than 30 called CNV were excluded as unreliable (58 on the 600K, 47 on the 50K brown, 7 on the 50K white and 33 on the 42K panels). This number (30) was chosen as approximation of the mean number of CNV per individual plus 3 standard deviations across all panels. CNV were identified on autosomes (1 to 28) only because PennCNV calls for the sex chromosomes were unreliable and difficult to interpret.
Determination of CNV regions
The identified CNV were merged and/or intersected with the BedTools software , which combines CNV that overlap in one or multiple interval files into a single CNV region. The BedTools intersect tool was used to select only the region of a CNV that is common between individuals, i.e., if a CNV was identified between 1 and 3 kb in individual 1 and between 2 and 4 kb in individual 2, only the region between 2 and 3 kb was retained in the sets and was referred to as a common CNV region (CNVR). The subsequent sets of CNVR used in this study were obtained as follows:
For the list of all detected CNVR, a Bedtools merge was performed across all individuals and lines for all CNV that overlapped by at least 1 bp. This set is referred to as the merged CNVR.
Bedtools merge was performed for variants that were present in at least two individuals within a line. This set is referred to as the common CNVR.
Bedtools intersect was used for all CNV for a given panel and line combination, which were then merged across panels and lines. CNVR that were identified in at least two individuals within a line, were selected for further analysis. This set is referred to as common intersected CNVR.
CNVR that were detected in only one individual are referred to as singletons. CNVR for which both deletions and duplications were observed are referred to as complex CNVR.
Annotation of CNVR and gene ontology analysis
Genes that overlapped with common intersected CNVR were identified with the Ensembl BioMart webtool based on the Galgal4 assembly and the Ensembl Genes 85 database . Analysis of overrepresented GO terms and pathways was performed using PANTHER Classification System version 11 . Known quantitative trait loci (QTL) that overlapped with the detected CNVR were identified based on the Animal QTL database  release 33. In order to perform a comparison with previous studies, autosomal coordinates of the CNVR were migrated from galGal3 to galGal4 using the UCSC liftOver tool . Common CNV were checked visually using available sequence data, which consisted of representative pools of 10 individuals per line. Details on the sequencing of these pools are in Kranis et al. .
Detection of CNV
The proportion of samples that passed quality control ranged from 89.4% in line B1 genotyped on the 50K panel up to 99.6% in line W1 genotyped on the 600K panel. For the latter, only one individual was excluded because of poor quality. The mean number of CNV per individual ranged from 0.50 on the 50K panel for line W2 to 4.87 on the 600K panel for line B1 (Table 2). The commercial hybrid cross and line B1 had the largest average number of identified CNV per individual, whereas line W2 had the smallest average number of CNV per individual.
The length of the CNV ranged from 1.2 kb to 3.2 Mb (Table 2). CNV shorter than 1 kb and that included less than 3, 3 or 5 SNPs for the 42K, 50K and 600K panels, respectively, were excluded from analysis. The mean length of detected CNV was greater for the 50K panels, which most likely resulted from the low detectability of shorter variants due to the greater distance between SNPs compared to the 600K panel. The length of each CNV was calculated as the distance from the first to the last SNP included in the CNV region, which may, therefore, slightly underestimate the true length.
Compared to the high-density 600K panel, the 50K SNP panels resulted in a smaller average number of detected CNV per individual (Table 2) and a smaller number of CNVR with a frequency higher than 1% within a line (Table 3). The highest average frequency of CNV detected from the lower density panels was observed for line B2 and the 42K SNP Illumina panel, for which few CNVR had a frequency higher than 1% and one CNV had a frequency higher than 10% (13.9%). The largest number of CNVR with a frequency higher than 1% was observed for the 600K panel (Table 3). A list of the 19 CNVR with a frequency higher than 5% for the 600K panel is in Additional file 1: Table S1. The most common CNVR was detected in 734 individuals across three lines, on chromosome 5 between 19.60 and 19.72 Mb.
The number of deletions and duplications detected differed between lines (Table 3). The number of duplications exceeded the number of deletions for line W2 and the commercial hybrid, while for line B2 more deletions were detected. For lines W1 and B1, the ratio of duplications to deletions differed between panels. In some regions, complex CNVR were observed, but the number of complex CNVR identified was significantly larger for the lower density panels than for the 600K panel (Table 3), which probably resulted from the poorer variant separation with the lower density panels. The proportion of each chromosome that was covered with deletions, duplications, or complex variants for common CNVR is shown in Fig. 1. The fraction covered with complex CNVR was greater for the microchromosomes (6% on average) than for the macrochromosomes (on average 3% of the total sequence). Although the number of duplications exceeded the number of deletions for most lines, both types of variants covered a similar fraction of the chromosomes.
CNVR and overlapping genes
The distribution of CNVR differed between chromosomes, with microchromosomes having a higher density of CNVR than macrochromosomes (Table 4). The log10 of chromosome size was inversely correlated with the fraction of chromosome covered by CNVR, with a Pearson correlation of − 0.80. Chromosome 16 is the shortest chromosome in the chicken genome with a large fraction covered with CNV. However, results for chromosome 16 should be treated with caution, since the reference sequence for this chromosome is of poor quality, probably because it carries the major histocompatibility complex, which has a high level of variability, multiple gene families, and a high GC content.
In total, 2687 CNVR were identified after merging CNV across all samples and lines (Additional file 2: Table S2). The total length of these CNV was equal to 493.3 Mb, which corresponds to 53.7% of the analyzed genome sequence. Of the merged CNVR, 73.4% overlapped with genes, which accounted for 45.9% of the total CNVR sequence.
The number of common CNVR, which resulted from merging CNV that were found to be shared by at least two individuals within a line, was equal to 1264, with a total length of 375.8 Mb. Of these CNVR, 82.0% overlapped with genes, which encompassed 46.2% of the total CNVR sequence. More than 97.2% of the merged CNVR and 96.8% of the common CNVR overlapped with known QTL. Of the 1264 CNVR, 447 CNVR that cover 252.8 Mb, were detected by more than one SNP panel.
Intersecting CNVR across all lines and panels resulted in 4131 CNVR. Since the CNV that were observed once require further confirmation, CNVR, which were identified in at least two individuals within a line after intersecting within panels, were selected for further analysis (N = 2139). The total length of these common intersected CNVR was equal to 117.3 Mb, which corresponds to 12.7% of the genome. In total, 29.8% of these CNVR overlapped with 3510 Ensembl gene ID, for which 2322 gene names were available, including 94 miRNAs and 29 LOC genes (Additional file 3: Table S3). Of the 3510 Ensembl gene ID, 2994 genes mapped to Panther biological categories. GO enrichment analysis of these genes revealed significant terms involved in antigen processing and presentation, and cellular defense response, which may represent biological processes that are influenced by CNV (Table 5).
Within-line CNV characterization
Each line was characterized by its own CNV profile. Since only the high-density 600K panel was used for more than two lines, the comparison of CNV profiles between lines was based on this panel only. The number of CNVR that were common between lines is in Table 6. The largest overlap was observed between line W2 and the commercial hybrid line, which may be related to the fact that these lines had the largest number of individuals and CNV detected. As expected, line B1 had a relatively small number of common CNV with the white lines and the hybrid cross, which can be explained by the relatively large genetic distance between white and brown egg shell lines. This difference was most pronounced with line W1, for which the number of genotyped individuals was smallest.
To determine whether CNV are associated with specific biological processes, we identified the genes within CNVR that were detected in at least two individuals within a line. In total, 682 genes overlapped with 465 merged CNVR for the 600K panel and 602 of these were mapped by Panther with 257 CNVR not classified in any GO term. Two terms were significant after Bonferroni correction, phagocytosis (p-value = 0.0250) and cellular defense response (p-value = 0.0127), with enrichments of 5.71 and 3.25, respectively.
Then, we performed the Panther GO overrepresentation test for genes that were identified within each line separately. No significant GO terms were identified for line W1, probably because of the small number of genotyped individuals and the small number of detected CNVR. For all the other lines, we detected several significant GO terms, but these were mostly connected to genes that overlapped with a single CNVR. For the hybrid cross, the most significant GO terms were: antigen processing and presentation, phagocytosis, B cell immunity, and cellular defense response. The antigen processing and presentation term was connected to CNVR that were identified on chromosome 16, which consisted of 13 CNV that covered almost the entire chromosome. The B cell immunity and cellular defense response terms, which were significant for lines B1, W2, and the hybrid cross, were connected to a region on chromosome 27, where a single copy deletion was observed between 0.19 and 0.33 Mb (Table 2).
Confirmation of CNVR with frequencies higher than 5%
Information about the confirmation of CNVR based on sequence data of pooled DNA is provided in Additional file 1: Table S1. Due to lack of individual sequence data, only CNV with a relatively high frequency could be detected based on sequence information. In addition, because of the small number of individuals in each pool of sequenced data, even relatively frequent CNV may be indistinguishable from noise. Examples of CNV that were confirmed by sequence data are in Fig. 2 and in Additional file 4: Figures S1 to S8. The sequence data enabled confirmation of selected variants but did not provide a means for identifying false positives because sequenced individuals represented only a limited number of individuals of the genotyped lines and no individual had both sequence and SNP genotype data.
In this study, we used 17,706 individuals from four pure lines and one commercial multi-line cross to detect CNV using genotypes provided by four SNP panels with different densities. In total, 19,525 CNV were detected, which resulted in 2687 CNVR after merging across individuals, lines, and panels. This result shows that CNV detection is possible by using commercially available SNP genotyping platforms. In addition, 19 high frequency CNVR were detected using the 600K panel, of which 57.9% overlapped with annotated genes (Additional file 1). We hypothesize that the CNV, which segregate within the lines at relatively high frequencies, may have an impact on the traits that are under selection in these lines.
CNV detection and comparison of results between SNP panels
Similar to other studies on the detection of CNV, duplications were more abundant than deletions [22, 30], although there were some differences between lines. For line B2, the number of losses was almost equal to the number of gains, which may be specific of this line or of the 42K panel, which was initially developed to exclude SNPs that did not perform well (thus some SNPs within CNV may have been eliminated). For line B1, the 600K panel, the number of losses was almost six times larger than the number of gains when the 600K panel was used, whereas interestingly, when the 50K panel was used, we obtained the opposite result, although most of the gains were due to singletons. These results may be due to the large difference in the number of line B1 individuals genotyped for these two panels, the low detectability with the 50K panel, and the large number of singletons.
Based on the literature, generally most of the detected CNV have low population frequencies, although the use of relatively small numbers of individuals can result in sampling bias. According to Jia et al. , among the 315 CNV that they detected in an analysis of 746 chickens with the 60K SNP array, only four had a frequency higher than 5% and none had a frequency higher than 10%. In addition, more recent studies have reported that most of the detected CNVR are singletons (occurring only in one individual), i.e. 76% in Han et al. , 69% in Yi et al. , and 75% in Strillacci et al. . In our study, we detected several common CNVR with a frequency higher than 5%, although most of these were detected only with the 600K panel (Table 2). The lower frequency of the CNVR detected when using the 50K and 42K panels confirm the advantage of higher SNP densities for CNV detection as previously reported [28, 30, 31].
Among all the detected CNVR, 46% were observed in a single individual across all lines. This observation, combined with the large number of individuals used in this study, confirms previous observations that a large fraction of CNV are singletons. However, such a large number of singletons could also result from the stringent quality control criteria that were applied in this study, including plate-by-plate detection, which could result in some CNV being overlooked.
The number of common CNV (present in at least two individuals within a line) that were detected within and across lines is shown in Fig. 3. The number of CNVR that were shared between the four pure lines and between SNP panels was rather small, which could be due to the relatively small number of CNV per individual that was obtained with the 50K panels and to the large number of singletons. Among the lower density panels, the largest number of CNV per individual was found with the 42K Illumina panel, which may be related to differences in genotyping technology or line specificities.
The stringent quality control criteria that were applied when selecting SNPs for the 50K panel may have excluded SNPs in CNV regions. This hypothesis is supported by the larger number of CNV per individual, which were detected for the white layer line W1 when using the 50K panel that was developed specifically for brown layers compared to the 50K panel that was developed specifically for white layers (Table 1).
To summarize, although all panels enable the detection of CNV, it is possible that a proportion of the CNV that could be detected by more accurate data such as sequence data are missed when using SNP panels, especially lower density panels. In addition, a number of characteristics should be taken into account when calling CNV with SNP panels. First, distance between SNPs on the panel and their coverage have a clear effect on the length of the detected CNV. Second, it is necessary to have genotypes for a relatively large number of individuals to detect CNVR that are segregating within populations and to estimate their frequency. The pre-selection strategy for the SNPs placed on the panel also needs to be taken into account, since the SNPs that are located within CNV are more likely to be excluded as non-performing. Finally, the panel used can influence the ratio of detected deletions to duplications. In light of these results, our recommendation is that CNV detection using SNP genotypes can be used on a larger scale for commercial populations with large sample sizes, but keeping in mind the limitations.
Chromosome coverage and gene content
The number of CNV detected varied between chromosomes, with the microchromosomes being characterized by a higher density of CNV. In general, microchromosomes are known to have a higher gene content, which directly contradicts the observation that the majority of CNV are in gene-poor regions and gene deserts . Our results suggest the opposite, i.e. that microchromosomes are more CNV-rich than macrochromosomes and thus more frequently associated with genes, which is consistent with the findings of Skinner et al. . One of the most interesting cases is chromosome 16, which was covered at 98% by CNV, these being present in 53 individuals across all lines. This confirms a number of previous studies [28, 30]. The reason for this high density of CNV on chromosome 16 could be that it carries the major histocompatibility complex and has a high recombination rate, but the poor reference genome sequence for this chromosome could also be a cause. Details on the recombination rate and CNV located on chromosome 16 are in Fulton et al. .
Rao et al.  reported that only 38% of the 383 CNVR that they identified in chickens overlapped with genes. We observed a similarly small percentage (30%) for the 2139 intersected CNVR, which is probably related to the relatively short length of the intersected variants that fall within intergenic regions. These results support the hypothesis that a majority of CNVR is associated with genes and may have functional effects. In contrast, in a study on 16 bird species, Skinner et al.  determined that 70% of the detected CNVR overlapped with genes. We obtained a similar result for merged CNV, of which 73.4% overlapped with genes, and this percentage was even higher for the 1264 common CNVR (82.0%). These results support the hypothesis that the majority of CNVR is associated with genes and may have functional effects. In addition, GO analysis showed that genes that overlapped with CNV were enriched with a number of biological functions, in particular related to immune response. This is consistent with the results of Jia et al.  who suggested that this type of polymorphism might be prevalent in immune-related genes.
Comparison of CNVR detected in our work with previous studies
Additional file 5: Table S4 includes the list of the CNVR that were detected in this study and that overlap with previously detected CNVR. Of the 2687 CNVR that we detected, 70% overlapped with previously detected CNV, but these only comprise 28% of the total sequence length for all CNVR detected in this study. Of the 1264 common CNVR, 169 were novel and covered 2.4% of their total sequence (375.8 Mb). The total sequence overlap of common CNVR with previously known CNVR was equal to 32.6%, which can be related to the large length of merged CNVR. For both all and common CNVR, the sequence coverage with previously detected CNVR was around 30%. This observation, combined with the large number of singletons, leads to the conclusion that the occurrence of CNV is specific for each individual and inter-individual differences are more pronounced than between-line differences.
Our results show that the use of the high-density 600K panel greatly improves the detection of CNV compared to that of low-density panels. Four studies have already used this 600K panel to detect CNV in various breeds or lines of chickens, and these are summarized in Table 7 [28, 30,31,32]. On average, the number of CNV per individual was larger in those studies than in ours, probably because of the higher level of genetic variability in indigenous breeds than in highly selected commercial lines, such as those that we investigated. This was confirmed by Yi et al. , who found that the average number of CNV detected in commercial breeds was equal to 3.3 versus 5.1 for Chinese indigenous breeds. The populations used for CNV detection by Gorla et al.  and Strillaci et al.  were also characterized by higher genetic variability. We detected a larger number of CNV per individual in the brown line B1 (4.87), which is close to what was reported for some non-commercial breeds [28, 30, 31]. In contrast, the smallest number of CNV per individual was detected in line W2 for the 600K panel, which has a relatively high level of inbreeding (results not shown).
Overall, we found a larger total number of CNV and a higher proportion of the genome covered by these CNV than previous studies in chickens [28, 30,31,32]. These differences are likely due to the much larger number of individuals analyzed in this study (Table 7), which allowed a better characterization of within-line CNV variability.
Of the 19 CNVR that were identified (based on the 600K panel) with a frequency of at least 5% within one line, 11 overlapped with at least one gene and 17 overlapped with a previously detected QTL (Additional file 6: Table S5). Among the QTL that overlapped with these common CNVR, 26 were involved in body weight and 13 in growth. The largest numbers of overlapping QTL were found for CNVR on chromosomes 2, 3, 4 and 5. The deletion on chromosome 8 overlapped with the largest number of genes (14) and with one QTL, for body weight. The GO terms that were enriched for the 600K CNV were mostly related to immune-response genes. This observation, along with the large number of singleton CNV identified, suggests a large inter-individual variability among the genes involved in immune response.
The high-frequency CNVR located at 179 Mb on chromosome 1 overlapped with a number of QTL for body weight and Marek’s disease related traits, and with the ALKBH8 gene. Wang et al.  had already reported this CNVR on chromosome 1 between 184,874,498 and 184,879,098 bp in build 3 of the chicken genome and predicted 45 candidate transcription factor binding sites for this region by WWW PROMOTER SCAN. This suggests that amplification of this upstream locus might affect expression of the ALKBH8 gene, which codes for tRNA methyltransferase and is involved in tRNA modifications and regulation of gene expression.
The second interesting high-frequency CNVR is a single copy deletion on chromosome 6 (12.47–12.54 Mb). This deletion does not overlap with a gene but it is located in close proximity to a number of genes, downstream to ZMIZ1 and upstream to RPS24 and POLR3A, which are all involved in immune response. This region also overlaps with 10 QTL, including one for antibody response to sheep red blood cells (SRBC).
The high-frequency CNVR deletion on chromosome 23 (between 2.34 and 2.35 Mb) overlaps with the gene RHCE (Rh blood group CcEe antigens). Previously detected QTL located within this region are involved in body weight and shank length. A duplication on chromosome 2 between 129.10 and 129.17 Mb overlaps with two genes, BAALC and FZD6, which are both connected to immune response. A CNV that overlaps with the FZD6 gene was previously reported by Yan et al.  and was associated with Marek’s disease resistance. This CNV also overlaps with two QTL in the Animal QTL database that are related to Marek’s disease and with cloacal bacterial burden following challenge with Salmonella.
Our results support previous findings that a large proportion of all detected CNVR are singletons, but we were able to detect several common CNVR, which may have important functional impacts. In addition, the large number of CNV that overlap with genes suggests that chicken CNV can impact agricultural or disease-related traits. In this context, the detection of structural variants such as CNV in chicken should be performed on a wider scale. The use of SNP genotypes on a large number of individuals enabled a better characterization of the CNV, both within and between lines. The list of CNVR presented here provides an additional resource for further studies in chicken. We observed pronounced differences between SNP panels and a clear advantage for the dense 600K SNP panel, both regarding the total number of CNV detected and their population frequencies. Although the use of SNP panels does not allow all the CNV that are present in an individual to be detected, these results show that they are a valuable source of CNV information by allowing the screening of large numbers of individuals at relatively low cost.
Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, et al. Origins and functional impact of copy number variation in the human genome. Nature. 2010;464:704–12.
Stankiewicz P, Lupski JR. Structural variation in the human genome and its role in disease. Annu Rev Med. 2010;61:437–55.
McCarroll SA, Altshuler DM. Copy-number variation and association studies of human disease. Nat Genet. 2007;39:S37–42.
Zhang F, Gu W, Hurles ME, Lupski JR. Copy number variation in human health, disease, and evolution. Annu Rev Genomics Hum Genet. 2009;10:451–81.
Sudmant PH, Mallick S, Nelson BJ, Hormozdiari F, Krumm N, Huddleston J, et al. Global diversity, population stratification, and selection of human copy-number variation. Science. 2015;349:aab3761.
Zarrei M, MacDonald JR, Merico D, Scherer SW. A copy number variation map of the human genome. Nat Rev Genet. 2015;16:172–83.
Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007;315:848–53.
Orozco LD, Cokus SJ, Ghazalpour A, Ingram-Drake L, Wang S, van Nas A, et al. Copy number variation influences gene expression and metabolic traits in mice. Hum Mol Genet. 2009;18:4118–29.
Clop A, Vidal O, Amills M. Copy number variation in the genomes of domestic animals. Anim Genet. 2012;43:503–17.
Bae JS, Cheong HS, Kim LH, NamGung S, Park TJ, Chun JY, et al. Identification of copy number variations and common deletion polymorphisms in cattle. BMC Genomics. 2010;11:232.
Fadista J, Thomsen B, Holm LE, Bendixen C. Copy number variation in the bovine genome. BMC Genomics. 2010;11:284.
Hou J, Bickhart DM, Hvinden ML, Li C, Song J, Boichard DA, et al. Fine mapping of copy number variations on two cattle genome assemblies using high-density SNP array. BMC Genomics. 2012;13:376.
Fontanesi L, Beretti F, Martelli PL, Colombo M, Dall’Olio S, Occidente M, et al. A first comparative map of copy number variations in the sheep genome. Genomics. 2011;97:158–65.
Liu J, Zhang L, Xu L, Ren H, Lu J, Zhang X, et al. Analysis of copy number variations in the sheep genome using 50K SNP BeadChip array. BMC Genomics. 2013;14:229.
Fontanesi L, Martelli PL, Beretti F, Riggio V, Dall’Olio S, Colombo M, et al. An initial comparative map of copy number variations in the goat (Capra hircus) genome. BMC Genomics. 2010;11:639.
Fadista J, Nygaard M, Holm LE, Thomsen B, Bendixen C. A snapshot of CNVs in the pig genome. PLoS One. 2008;3:e3916.
Chen C, Qiao R, Wei R, Guo Y, Ai H, Ma J, Ren J, et al. A comprehensive survey of copy number variation in 18 diverse pig populations and identification of candidate copy number variable genes associated with complex traits. BMC Genomics. 2012;13:733.
Wang J, Jiang J, Wang H, Kang H, Zhang Q, Liu JF. Enhancing genome-wide copy number variation identification by high density array CGH using diverse resources of pig breeds. PLoS One. 2014;9:e87571.
Griffin DK, Robertson LB, Tempest HG, Vignal A, Fillon V, Crooijmans RP, et al. Whole genome comparative studies between chicken and turkey and their implications for avian genome evolution. BMC Genomics. 2008;9:168.
Skinner BM, Al Mutery A, Smith D, Völker M, Hojjat N, Raja S, et al. Global patterns of apparent copy number variation in birds revealed by cross-species comparative genomic hybridization. Chromosome Res. 2014;22:59–70.
Völker M, Backström N, Skinner BM, Langley EJ, Bunzey SK, Ellegren H, et al. Copy number variation, chromosome rearrangement, and their association with recombination during avian evolution. Genome Res. 2010;20:503–11.
Jia X, Chen S, Zhou Z, Li D, Liu W, Yang N. Copy number variations identified in the chicken using a 60K SNP BeadChip. Anim Genet. 2013;44:276–84.
Crooijmans R, Fife MS, Fitzgerald TW, Strickland T, Cheng HH, Kaiser P, et al. Large scale variation in DNA copy number in chicken breeds. BMC Genomics. 2013;14:398.
Fan WL, Ng CS, Chen CF, Lu MY, Chen YH, Liu CJ, et al. Genome-wide patterns of genetic variation in two domestic chickens. Genome Biol Evol. 2013;5:1376–92.
Han R, Yang P, Tian Y, Wang D, Zhang Z, Wang L, et al. Identification and functional characterization of copy number variations in diverse chicken breeds. BMC Genomics. 2014;15:934.
Yi G, Qu L, Liu J, Yan Y, Xu G, Yang N. Genome-wide patterns of copy number variation in the diversified chicken genomes using next-generation sequencing. BMC Genomics. 2014;15:962.
Yan Y, Yang N, Cheng HH, Song J, Qu L. Genome-wide identification of copy number variations between two chicken lines that differ in genetic resistance to Marek’s disease. Genomics. 2015;16:843.
Yi G, Qu L, Chen S, Xu G, Yang N. Genome-wide copy number profiling using high-density SNP array in chickens. Anim Genet. 2015;46:148–57.
Rao YS, Li J, Zhang R, Lin XR, Xu JG, Xie L, et al. Copy number variation identification and analysis of the chicken genome using a 60K SNP BeadChip. Poult Sci. 2016;95:1750–6.
Gorla E, Cozzi MC, Román-Ponce SI, Ruiz López FJ, Vega-Murillo VE, Cerolini S, et al. Genomic variability in Mexican chicken population using copy number variants. BMC Genet. 2017;18:61.
Strillacci MG, Cozzi MC, Gorla E, Mosca F, Schiavini F, Roman-Ponce SI, et al. Genomic and genetic variability of six chicken populations using single nucleotide polymorphism and copy number variants as markers. Animal. 2017;11:737–45.
Xu L, He Y, Ding Y, Sun G, Carrillo JA, Li Y, et al. Characterization of copy number variation’s potential role in Marek’s disease. Int J Mol Sci. 2017;18:E1020.
Wang X, Byers S. Copy number variation in chickens: a review and future prospects. Microarrays (Basel). 2014;3:24–38.
Skinner BM, Robertson LB, Tempest HG, Langley EJ, Ioannou D, Fowler KE, et al. Comparative genomics in chicken and Peking duck using FISH mapping and microarray analysis. BMC Genomics. 2009;10:357.
Cooper GM, Nickerson DA, Eichler EE. Mutational and selective effects on copy-number variants in the human genome. Nat Genet. 2007;39:S22–9.
Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008;453:56–64.
Groenen MA, Megens HJ, Zare Y, Warren WC, Hillier LW, Crooijmans RP, et al. The development and characterization of a 60K SNP chip for chicken. BMC Genomics. 2011;12:274.
Kranis A, Gheyas AA, Boschiero C, Turner F, Yu L, Smith S, et al. Development of a high density 600K SNP genotyping array for chicken. BMC Genomics. 2013;14:59.
Avendano S, Watson K, Kranis A. Genomics in poultry breeding: from Utopia to deliverables. In: Proceedings of the 9th world congress on genetics applied to livestock production. Leipzig; 2010. pp. 1–6.
Affymetrix. http://www.affymetrix.com. Accessed 15 May 2016.
Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SFA, et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection i40n whole-genome SNP genotyping data. Genome Res. 2007;17:1665–74.
Illumina. http://www.illumina.com. Accessed 20 May 2016.
Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.
Aken BL, Achuthan P, Akanni W, Amode MR, Bernsdorff F, Bhai J, et al. Ensembl 2017. Nucleic Acids Res. 2017;2017(45):D635–42.
Mi H, Huang X, Muruganujan A, Tang H, Mills C, Kang D, et al. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Res. 2017;45:D183–9.
Hu Z, Park CA, Reecy JM. Developmental progress and current status of the animal QTLdb. Nucleic Acids Res. 2016;44:D827–33.
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002;12:996–1006.
Fulton JE, McCarron AM, Lund AR, Pinegar KN, Wolc A, Chazara O, et al. A high-density SNP panel reveals extensive diversity, frequent recombination and multiple recombination hotspots within the chicken major histocompatibility complex B region between BG2 and CD1A1. Genet Sel Evol. 2016;48:1.
Wang Y, Gu X, Feng C, Song C, Hu X, Li N. A genome-wide survey of copy number variation regions in various chicken breeds by array comparative genomic hybridization method. Anim Genet. 2012;43:282–9.
WDC conceived the study, performed the analysis and wrote the draft. AW contributed to the analysis, methods and discussion. JF collected the data and contributed to the discussion. JCMD contributed to the methods and discussion. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Availability of data and materials
The data that support the findings of this study are available from Hy-Line International but restrictions apply to the availability of these data, which were used under license for the current study, and thus are not publicly available. However, data are available from the authors upon reasonable request and with permission of Hy-Line International.
The data and the blood samples were collected on experimental farms that complied with the UEP (United Egg Producers) certified program for animal well-being.
Consent for publication
The study was supported by Warsaw University of Life Sciences travel fund and funding from Hy-Line International.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.