Evolution of the polymorphism at molecular markers in QTL and non-QTL regions in selected chicken lines (Open Access publication)

We investigated the joint evolution of neutral and selected genomic regions in three chicken lines selected for immune response and in one control line. We compared the evolution of polymorphism of 21 supposedly neutral microsatellite markers versus 30 microsatellite markers located in seven quantitative trait loci (QTL) regions. Divergence of lines was observed by factor analysis. Five supposedly neutral markers and 12 markers in theQTL regions showed Fst values greater than 0.15. However, the non-significant difference (P > 0.05) between matrices of genetic distances based on genotypes at supposedly neutral markers on the one hand, and at markers in QTL regions, on the other hand, showed that none of the markers in the QTL regions were influenced by selection. A supposedly neutral marker and a marker located in the QTL region on chromosome 14 showed temporal variations in allele frequencies that could not be explained by drift only. Finally, to confirm thatmarkers located inQTL regions on chromosomes 1, 7 and 14were under the influence of selection, simulations were performed using haplotype dropping along the existing pedigree. In the zone located on chromosome 14, the simulation results confirmed that selection had an effect on the evolution of polymorphism of markers within the zone.


INTRODUCTION
There is currently a large interest in characterising variation patterns in order to identify regions of the genome that are under selection. For that purpose, scans using microsatellites distributed over a genome [32,35] or concentrated around candidate genes under artificial or natural selection [2,28,43] are commonly performed to investigate signatures of selection. These studies highlight and compare among natural populations, differences in patterns of heterozygosity or linkage disequilibrium, but they only give a picture of variability at a certain time, with predictions of the evolution of polymorphism estimated mainly through simulations. Well-known pedigree experimental selected lines can be used to explore the evolution of polymorphism over several generations, leading to the introduction of a time component that helps to distinguish the influence of selection from the influence of drift.
Here, we investigate the joint evolution of neutral and selected genomic regions, using observations on microsatellite markers in a number of selected chicken lines. For this purpose, we compared the evolution of marker allele frequencies observed in supposedly neutral versus selected regions of the genome. Selected regions were chosen based on quantitative trait loci (QTL) detected in previous studies. An important aim was to determine which methods are suitable for identifying signatures of selection, and to compare those methods using a real dataset.

Selection design
We used four experimental chicken lines bred since 1994 in the INRA experimental unit ''Unité expérimentale de Génétique factorielle avicole'' (Nouzilly, France) and derived from an unselected base population of White Leghorn chickens [31] for which 42 founder animals of two lines (9 sires of a commercial line and 33 dams of an experimental line) were randomly mated (generation G2). The F2 population has become the base population, also named generation 0 (G0). Animals from G0 were randomly chosen to create the four lines, thus the parents of one line cannot be parents of another line.
Three of these lines were selected for high values according to three different criteria of immune response: antibody response three weeks after vaccination against the Newcastle disease virus (line 1, trait ND3), cell-mediated immune response at nine weeks of age (line 2, trait PHA) and phagocytic activity at 12 weeks of age (line 3, trait CC). The three lines have undergone mass selection with a restriction on the contribution of the different families (sizes of the different half-sib families were approximately balanced). The fourth line was the control line, in which the parents were chosen at random.
Within each line and at each generation (one generation per year), 15 males and 30 females out of about 100 candidates of each sex were chosen as parents 640 for the next generation. Mating was at random, except that full-and half-sib mating was avoided. This selection programme was conducted for 11 discrete generations (G1 to G11). All animals of the four lines were measured for the three traits. The pedigree was completely known.
Estimated heritabilities were 0.33, 0.12 and 0.24 for the traits ND3, PHA and CC, respectively, using pedigree and phenotypic data up to generation 9 [22]. For other detailed results on this experiment, including genetic gains, various criteria of genetic variability and evolution of the polymorphism at a single candidate gene, namely the Major Histocompatibility Complex (MHC) gene, see [21,22].

Genotyping
In order to compare the evolution of polymorphism of supposedly neutral areas and selected areas, we decided to compare the evolution of microsatellites from the Aviandiv panel (European project on the analysis of diversity in the chicken) and the evolution of microsatellites located within QTL regions, previously detected in independent studies on other lines.

Sampling of animals to be genotyped
Due to financial constraints, it was not possible to genotype animals in each generation. From G2, 37 founders out of 42 were genotyped because blood samples from five founders were either missing or improper for DNA extraction. To reconstruct the five missing genotypes, and to determine the phase of haplotypes in QTL regions, 55 animals from generation G1 were genotyped. Fifty animals of each line from G11 randomly chosen within half-sib families were genotyped.

Markers
The supposedly neutral markers are a set of di-nucleotide microsatellite markers used in a project on the biodiversity of chickens funded by the European Commission, namely known as the Aviandiv project [15]. These are distributed as uniformly as possible throughout the chicken genome. The position of the markers is given in Appendix 1 (published in electronic form only).
QTL regions affecting the immune response were primo-detected in two other experimental lines bred on the experimental unit of the Animal breeding and Genomics Group at the Wageningen University and Research Center (The Netherlands) [36][37][38]. The first population was an F2 originating from a cross of divergently selected lines for high and low antibody response to sheep red blood cells [42]. The second population was an F2 originating from a cross between two commercial lines [3]. Among the different regions detected, we chose six genome-wide significant QTL regions for different antibody titre traits.

Signature of selection in chicken 641
The presence of these QTL was not checked in our experimental lines due to financial constraints, which limited the number of genotyped animals. The MHC region (chromosome 16 -zone 7) was added to the analysis, since the MHC gene is a good candidate gene for immune response [22].
The distance between markers was defined according to estimations of allele frequency changes of markers under selection in mouse lines [18] and estimation of the extent of linkage disequilibrium in domestic sheep [23], since such estimations have not been conducted in chicken. The position of the markers is given in Table I. Genetic distances of existing markers were those defined by the consensus map [12] and genetic distances of the new markers were estimated from the consensus map and their position on the chicken genome sequence.
The genetic position of the three markers within zone 7 (MHC region) was found to be the same (~0 cM) on the consensus map: in order to run simulations, positions were arbitrarily set to 0.00, 0.05 and 0.10 cM in the strict case of this study.
Fluorescently labelled microsatellite markers were analysed on an ABI 3100 DNA sequencer (Applied Biosystems, Foster City, CA, USA) and genotypes were determined using GeneScan Analysis 3.7 and Genotyper Analysis 3.7 software (Applied Biosystems, Foster City, CA, USA). The GEMMA database was used to manage the informativity tests [16]. A recent analysis (Bed'Homunpublished results) of the markers located in the MHC region (zone 7) revealed the presence of a null allele for MCW370. The null allele was named AAA and genotypes were rebuilt according to specific associations of marker alleles within the zone. Appendices 2 and 3 summarise the observed allele frequencies in G2 and G11 (Appendices 2 and 3 are available in electronic form only).

Factor analysis
In order to get an overview of the distinction among generations and among lines, we performed a multiple-dimension principal component analysis (PCA) on all individuals, from generations G2, G1 and G11. First, PCA was based on genotypes at all markers. Second, in order to assess the influence of the different types of markers, PCAwas based, on genotypes at the supposedly neutral markers, on the one hand and on genotypes at markers in QTL regions, on the other hand.

Genetic variability criteria
In order to quantify genetic differences between the lines, we calculated standard descriptors of the genetic variability for each locus in G2 and in G11 within each line: observed heterozygosity H 0 and unbiased expected genetic diversity H exp [29]. Departures from Hardy-Weinberg equilibrium were estimated by calculating Wright's F is and F st according to Weir and Cokerham [45]. The null hypothesis (F is = 0) was tested by bootstrapping over alleles within samples. Population differentiation was tested by permuting genotypes among samples, assuming absence of Hardy-Weinberg equilibrium within samples. Pairwise linkage disequilibrium was estimated by testing the significance of association between genotypes at pairs of loci within QTL regions and across supposedly neutral loci; this analysis was performed in G2 and G11 within each line. P-values were obtained by randomisation of the genotypes at each pair of loci. In order to take into account the fact that multiple loci were examined, a Bonferroni correction was applied within each line. Calculations dealing with heterozygosity and linkage disequilibrium were performed using the F-STAT program [11].
In order to quantify the genetic divergence over time of our lines deriving from the founder population, we estimated the genetic distances. We assumed that mutations at the microsatellite markers could be neglected. It has been reported that divergence occurred on a short-term period and inbreeding increased within each line [21]. Thus, the Reynolds distance [34] is preferred because under the assumption of pure genetic drift, it is the least biased genetic distance for closely related breeds and exhibits the smallest standard error [20]. Since our different markers are polymorphic loci with balanced or unbalanced allele frequencies in the founder population, we used weighted estimates of Reynolds distance,D Ã R [20]. The standard error of the weighted Reynolds distance, rðD Ã R Þ, is equal to: where k 0, j is the number of alleles at the jth locus in the founder generation, n 0 and n t are, respectively, the number of alleles in the founder generation and in generation G11 and " F is the average inbreeding coefficient [20]. Here, weighted estimates of Reynolds distance and standard errors were computed between the G2 population and lines in G11, and across lines in G11, using the POPULATIONS programme [19]. In order to assess the influence of the different types of markers, genetic distances were estimated using genotypes at supposedly neutral markers, on the one hand and genotypes at all markers in QTL regions, on the other hand.

Temporal changes in allele frequencies
In order to detect markers for which the evolution of polymorphism departs from evolution under pure drift, we estimated temporal changes in allele frequencies for each locus. An estimate of the standardised temporal variance in allele frequency, f [47], was computed for each locus within each line over the 13 generations; the f c estimator of f, proposed by Nei and Tajima [30] was used: where k is the number of segregating alleles, x 0,i is the frequency of allele i in G2 and x t,i is the frequency of this allele in G11. The observed value of f c was compared to the distribution of f c obtained from simulations of populations under drift, with the same initial allele frequencies and the same inbreeding effective size [10]. P-values were computed for each locus. Because multiple loci were examined, expected false discovery rates, also known as Q-values, were calculated within each line using the QVALUE package [39]. The false discovery rate is the expected proportion of false positives among the tests found significant. A false positive is the term used to describe rejection of the null hypothesis (i.e., calling the test significant) when it is really true. We fixed the false discovery rate at a pre-determined level of a = 5% beforehand, in order to guarantee that the number of false positives would represent 5% or less of the number of significant tests. The estimate of the variance effective size (Ne V ) of each selected line was directly deduced from the value of " f c , using the equation of Waples [44]: where S 0 and S t are, respectively, the sample sizes in the founder generation (G2) and in generation G11, t is the number of generations and " f c , is the mean of f c across the different loci, weighted by the number of alleles [40]. This value was compared to the value of effective size calculated from the pedigree, Ne I ¼ 1=2ÁF.

Simulations
In order to detect markers undergoing selection, we simulated the evolution of polymorphism of the different markers along the existing pedigree. Simulations (1000 iterations) using haplotype dropping along the pedigree were performed. From the simulation iterations, a 95% confidence interval (CI) was drawn for the allele frequencies of each marker.

Signature of selection in chicken
Initialisation: A haplotype consisted in the different markers located within a defined zone. Haplotypes in the selected zones and genotypes at the supposedly neutral markers were known for the 43 individuals of generation G2. We drew different assumptions about QTL location in one of the selected zones and in that case, the favourable allele Q in generation G2 was either defined as linked to a marker allele within the zone or settled according to a given initial frequency.
Transmission: The approximate mutation rate in our dataset was calculated based on the number of new alleles in G11 (and confirmed with simulations), which yielded a mutation rate of 10 À7 . Therefore, a stepwise mutation model was used with a 10 À7 mutation rate. Recombination within the haplotype followed the Haldane model. Haplotypes and genotypes were dropped along the existing pedigree conditional on the observed phenotypes.
First, we tested the assumption of pure drift: transmission of haplotypes and genotypes followed Mendelian transmission rules. Second, we assumed the presence of QTL related to one of the three traits in one of the QTL regions and tested the assumption of both selection and drift: transmission of genotypes and haplotypes in zones without QTL followed Mendelian transmission rules whereas transmission of the haplotype in the zone with the QTL was conditional to the transmission of the QTL. Transmission of the QTL was conditional on the phenotype of the offspring and on the QTL genotypes of the parents. In that case, we used the Bayes theorem: where p(G i /z) is the probability that offspring inherit QTL genotype G i given its phenotypic value z. The so-called prior probabilities of the three QTL genotypes, p(G 1 ) = p(QQ), p(G 2 ) = p(Qq) and p(G 3 ) = p(qq), were calculated according to the genotypes of the parents. Probabilities of the phenotype given the QTL genotype, also called penetrance, were given by p(z/G i ) = u(z, l i , r 2 ), where l i is the phenotypic mean for the genotype i at the QTL and r is the phenotypic standard deviation (estimated in the base population, i.e., in generation G0). The distribution of the phenotype was assumed to follow a normal distribution. We set the QTL values for the trait to +a, (k*a) and Àa for genotypes QQ, Qq and qq, respectively, k being the degree of dominance, using the same scale as Falconer and Mackay [8].

Factor analysis
A two-dimensional analysis of all individuals based on genotypes of all markers discriminated individuals from G11 (Fig. 1a). The three selected lines were distinct and well distributed, although the control line overlapped with individuals from generations G2 and G1 in the middle of Figure 1a. The first two principal components explained in total 10% of the variance.
We obtained the same picture when using only the genotypes of markers in the QTL zones but not when using the genotypes of supposedly neutral markers (Figs. 1b and 1c): for supposedly neutral markers, individuals from G11 gathered at the centre and individuals from line 3 and from the control line overlapped.
A three-dimensional analysis of all individuals based on genotypes of all markers showed that individuals from generations G2 and G1 were in a different plane than individuals from G11 (results not shown): the third axis seems to represent time divergence between generations G2 and G11.

Genetic variability and genetic distances
F is values of six markers (one supposed to be neutral and five in QTL zones) in G2 were significantly different from zero, all markers showing an excess of heterozygosity. Excess of heterozygosity at the markers was observed for female founders originating from an experimental line with very few reproducers: in that case, allele frequencies are different for sires and for dams [33]; the more heterozygosity is in excess, the smaller is the number of reproducers. This excess was not observed anymore in G11. However, two markers showed a significant heterozygote deficiency: SEQALL427 (zone 3) in line 1 and ADL327 (zone 5) in lines 1 and 2. The supposedly neutral marker ADL278 showed a significantly negative F is value in G11 in line 2, whereas this marker did not show any departure from Hardy-Weinberg equilibrium in G2. The results of deviations from Hardy-Weinberg equilibrium as estimated by F is values are presented in Table II.
F st values ranged from 0.035 to 0.409. According to the Wright criterion, the important diversification (F st > 0.15) among lines in G11 was due to five supposedly neutral markers and 12 markers located in QTL zones. Estimated F st values (and standard deviation) of those markers are presented in Table III.

Signature of selection in chicken
No linkage disequilibrium between pairs of neutral loci was found, neither in generation G2 nor in G11. On the contrary, significant linkage disequilibrium occurred between pairs of loci within each QTL zone. Linkage disequilibrium was also tested between the selected markers across the zones: linkage was only observed between markers within a given zone (detailed results not shown). Furthermore, in our simulations, these results will allow us to consider the supposedly neutral markers as independent whereas markers located in a QTL zone will consist in a haplotype. Table IV gives the matrices of weighted Reynolds distances between the G2 population and the four lines in G11, estimated either with genotypes of the supposedly neutral markers (upper matrix) or with genotypes of markers located in all QTL zones (lower matrix). Genetic distances between G2 and any of the four lines in G11 tend to be larger using genotypes of markers located in all QTL zones than using genotypes of supposedly neutral markers. However, the Mantel test did not show a significant difference between the two matrices, whether individuals from the control line were taken into account or not (P > 0.05).

Simulations
The 95% CI were very large under the assumption of pure drift. The observed allele frequencies of six markers (in zones 1, 2 and 3) fell outside the 95% CI. The observed allele frequencies and 95% CI of those markers are given in Table V. There is no multiple testing in the results of simulations, but considering the total number of alleles per zone, we may approximate the expected number of false positives. The expected number of false positives is four for zones 1 and 2 and three for zone 3. The number of observed allele frequencies that fall outside the 95% CI is larger than the expected false positives for zone 2. Consequently, and according to previous results about genetic variability, we shall focus on zone 2 in greater detail.
QTL in zone 2 was primo-detected for antibody titre to Keyhole Limpet Hemocyanin (KLH) and Mycobacterium butyricum, which are complex antigens. Such complex antigens bind to Th1 or Th2 cytokines and lead to a combination of cellular-and humoral-mediated pathways [9,17]. Trait PHA corresponds to the cell-mediated immune response. To understand the evolution of markers located in this zone, different assumptions were drawn about the presence of a QTL affecting trait PHA (i.e., the selected trait in line 2). First, we compared the observed allele frequencies in G11 in the four lines. Second, we confronted the genotypes of individuals at each marker with the lowest and the highest PHA phenotypes. This gave us indications on any particular association between the marker alleles and the QTL alleles. Then, we tested different localisations of the QTL within zone 2, different degrees of dominance between the QTL alleles and different effects of the QTL on trait PHA. However, the observed allele frequencies of SEQALL455 never fitted the 95% CI drawn under the different assumptions about a bi-allelic QTL simulated within zone 2.
Further investigation of genotyping results led us to question not only the real polymorphism of two markers, namely SEQALL453 in zone 2 and ADL327 in zone 5: for both of them, a pseudo-null allele seems to exist (with a size of 209 bases for SEQALL453 and 107 bases for ADL327) and was not detectable 652 V. Loywyck et al. according to the other allele in the genotype. These assumptions may offset the effects of selection on these markers. Table VI shows the estimations of the effective size for each line, based on the rate of inbreeding using pedigree information (Ne I ) or based on variations of allele frequencies (Ne V ) either from supposedly neutral markers or from markers in all QTL zones.

Effective population size
The values obtained via the temporal variation approach (Ne V ) were always higher than the values derived from the rate of inbreeding (Ne I ). Moreover, Ne V values estimated either from supposedly neutral markers or from markers in QTL zones were significantly different everywhere except in the control line. The value from neutral markers was significantly lower than the one from markers in QTL zones in lines 1 and 2, and the opposite was observed in line 3. It has to be noted that, in the three selected lines, estimations of the effective size based on temporal allele frequencies at the MHC locus [22] were equivalent to Ne V using genotype information from markers in all QTL zones, but estimated values were larger in the control line i.e., 76 for the control line and 51, 65 and 41 for lines 1, 2 and 3, respectively.

Combining different methods for the detection of signatures of selection
Factor analysis gives a good overview of the divergence of lines and constitutes an interesting starting point in detecting signatures of selection. The nonsignificant difference between matrices of genetic distances, according to the type of markers considered, let us suppose that not all markers in the QTL zones are influenced by selection. The evolution of polymorphism of loci over time ( f c ) and fixation indices allowed us to focus on a smaller set of markers that may be influenced by selection.
Finally, to confirm which marker was actually under the influence of selection, simulations were performed since they could take the selection scheme into account (the pedigree was completely known).

Improving the detection of signature of selection
The extent of selective sweep and the distortion in allele frequency spectrum depend on the strength of selection and time since selection occurred e.g. [1,4] but also on original marker variability and marker density. In our experiment, the strength of selection was attenuated since we tried to balance the representation of the half-sib families. The low marker density in our dataset was partly due to the limited number of microsatellites known in the chicken genome and the limited number of polymorphic markers in our experimental lines. In chicken, dropping simulations along the pedigree would probably be more efficient using high-density genotypes. For instance, simulation results on bovine chromosomes [13] suggest that the signature of selection can be detected up to 1 Mb (assuming 1 Mb~1 cM) from a QTL. However, this effect may extend further since Pollinger et al. [32] showed a 40 Mb-selective sweep around a gene with a large phenotypic effect in dog (i.e., the TYRP1 gene known to be responsible for black coat colour).
To improve detection of the signature of selection in our experimental lines still using our microsatellite markers, an earlier generation should be genotyped. Indeed, the number of crossing-overs increases with time and any particular association between a marker and a potential QTL could be broken along the successive generations. This association could probably still be detected in earlier generations. This approach was confirmed by Wiener et al. [46] when comparing the effect of selection on GDF-8 (myostatin gene associated with double-muscling) in double-muscled breeds, using microsatellite loci at various distances from GDF-8. Their study showed that selection on GDF-8 had left a stronger mark in the breed in which the double-muscling mutation had been present for the shortest time.

Difficulties in detecting signature of selection on immune response traits
The results dealing with zone 2 (located on chromosome 14) agreed that selection had an effect on the evolution of polymorphism of markers within the zone. However, modelling selective sweep was not easy and the underlying Signature of selection in chicken model seems to be complex. A QTL may be involved in the evolution of polymorphism within this zone but not only, since the observed allele frequencies never exactly fitted the simulated CI. A polygenic background could be added or the presence of several QTL with low effects could be assumed with epistatic interactions within a zone, for instance. Crossbreeding (F1, F2 and backcrosses) created from generation G11 has been analysed for the three immune traits and the analysis showed a significant recombination loss for ND3, which highlights the important epistatic interactions for this trait [24]. Pleiotropic effects of QTL on the three traits could also be considered, since the pairwise genetic correlations were shown to be non-significant [26,31] but were still not null and the three traits represent different aspects of the complex mechanism of immune response.
Recent improvements in chicken genome mapping [27,41] have shown a certain number of discordances that led us to question the genetic position but also the order of microsatellites located within zone 2. Such discordances do not disturb findings from statistical analyses but could disturb results from simulations.
QTL were primo-detected for primary antibody response to specific antigens such as Sheep Red Blood Cells (SRBC), M. butyricum and KLH, and for Lipopolysaccharide (LPS) natural antibodies. However, as in mammals, immune responses in avian species are specialised in the elimination of antigens: responses to antigens are Th1-or Th2-mediated [7]. Th1 responses require the interference of type 1 T helper cells that directs immune response toward a cell-mediated response (cellular pathway). Th2 responses require type 2 T helper cells that favour the development of humoral response (humoral pathway). KLH and SRBC antigens represent Th2 responses whereas M. butyricum represents Th1 response.
In our experimental dataset, line 1 was selected for antigens against ND3, inducing a Th1 response [6] whereas traits selected in lines 2 and 3 deal with innate immune response. Markers from zone 2, primo-detected for antigens to KLH and M. butyricum and falling outside the 95% CI under assumption of pure drift in line 2, show that responses are rarely exclusively Th1-or Th2-mediated and even if immune responses to antigens follow the same pathway, there is additional complexity in the control of different antigens. The detected QTL were linked to immune response to specific antigens and could not match with our selected traits. This was confirmed by a recent experiment where antibody response to KLH, M. butyricum and LPS was tested in our experimental lines in generation G12 [25]: no difference was observed among lines for KLH and LPS antibodies, but line 1 selected for ND3 showed a significantly higher specific response to M. butyricum. Finally, this led us to retain the hypothesis that QTL may have not segregated in our experimental lines.

Effective population size
The effective size estimated from the rate of inbreeding (Ne I ) was slightly smaller than the effective size estimated from the variance of allele frequencies over time (Ne V ) of supposedly neutral markers. This agrees with Crow and Kimura [5] who pointed out that Ne I is usually smaller than Ne V when a small number of parents generate a large number of offspring, with both estimations assuming neutrality of the markers. However, a surprising result was that estimation of effective size based on allele frequency variation from G2 to G11 of markers located in QTL zones was larger than estimation from supposedly neutral markers for lines 1 and 2. This may be explained by selection acting like a backmoving force that draws allele frequencies in the same direction, whatever the selected line; in that case, fluctuations for allele frequencies are lower than for neutral loci e.g., [14]. Another explanation may be that samples are taken from extreme generations and a calculation based on temporal variation in allele frequencies does not take into account fluctuations that occur over generations: samples from intermediate generations would have given more information.
It seems that allele frequency variations at the supposedly selected markers are weaker than those of the whole genome, as for the MHC locus, which is involved in different stages of the immune response [22]. Could this indicate that variations of markers that influence ND3 or PHA traits are maintained by balancing selection, like variations at the MHC locus, and that detection of signatures of selection when it deals with immunity traits is rather difficult? In addition, since experimental animals are vaccinated against other diseases, do these vaccinations have an impact on our trait measures? This may explain why the observed allele frequencies of SEQALL454 in zone 2 fall out of the CI even in the control line.

ONLINE MATERIAL
The supplementary file (Appendices 1-3) supplied by the authors is available at: http://www.gse-journal.org. Appendix 1. Position of the supposedly neutral markers from the Aviandiv panel. Appendix 2. Observed allele frequencies for the markers located in the QTL zones. Appendix 3. Observed allele frequencies for the supposedly neutral markers (excluding those located in a QTL zone).