Linkage disequilibrium in French natural populations of Drosophila melanogaster

Summary – Seventeen French natural populations of Drosophila melanogaster were analyzed to detect linkage disequilibrium between pairs of 6 polymorphic allozyme loci. The estimates of linkage disequilibrium were made from zygotic frequencies using both Burrows's and Hill's methods. No difference between these 2 methods was found. The amount of significant linkage disequilibrium detected was small and similar to that found in other natural populations of D. melanogaster. Out of the 15 combinations examined, only 2 pairs, Adh-α-Gpdh and Est-C-Est-6, showed a consistent significant linkage disequilibrium in the populations studied. However, for the first pair, the result was probably due to an association between the loci and the inversion In(2L)t of the second chromosome. For the Est-C-Est-6 pair, the disequilibrium detected might result from an interaction between the 2 genes themselves. These results again show how difficult it is to detect linkage disequilibrium due to epistasis between allozyme genes in natural populations.


Introduction
Population studies of genetic variation are classically discussed in terms of single-locus variability measures, such as heterozygosities and changes in gene frequencies. However, there is much interest in knowing the genetic structure of populations at the multilocus level. The application of electrophoretic techniques to the analysis of genetic variation (Harris, 1966; Hubby and Lewontin, 1966) provides much information at the multilocus level, because a large number of genetic markers can be studied simultaneously in a single individual. As a result, many investigations of allozyme polymorphism have involved the estimation of linkage disequilibrium in natural and experimental populations of a variety of organisms (see Hedrick et al., 1978, for a review).
Various authors (e.g., Lewontin, 1974) have suggested that information about linkage disequilibrium among allozymes might be useful to explain the adaptive value of biochemical polymorphism. Unfortunately, the results obtained by the authors studying linkage disequilibrium at electrophoretically variable loci in natural populations of Drosophila melanogaster (Voelker et al., 1977; Langley et al., 1978; Inoue et al., 1984; Yamazaki et al., 1984) are reconcilable with several models of population genetics. Consequently, even in the absence of inversions, it is difficult to determine whether these results are due to epistatic natural selection or to random genetic drift. However, we think that it is important to determine the nature and magnitude of linkage disequilibrium in natural populations, because such investigations may help in the study of interactions between genes and in developing new hypotheses about the mechanisms involved in the maintenance of allozyme polymorphism. In this paper we report a study of linkage disequilibrium among 6 polymorphic allozyme loci in 17 natural populations of D. melanogaster collected from different regions of France.

Collections
Wild Drosophila melanogaster adults were collected and brought to the laboratory for electrophoresis. All collections were made during the annual demographic burst of the species (between August and October).

Estimation of linkage disequilibrium
In this study almost all the data were analyzed as a 2-allele system. When more than 2 alleles exist at a locus, they were grouped into 2 classes: the most frequent allele forms the first class, and all the other alleles form the second.
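The grouping into 2 classes can be illustrated with a minimal Python sketch. The function name `to_two_classes`, the tuple representation of genotypes, and the allele labels in the usage note are assumptions made for illustration; they are not taken from the study.

```python
from collections import Counter


def to_two_classes(genotypes):
    """Recode one locus into 2 allelic classes.

    genotypes: list of (allele, allele) tuples, one tuple per individual.
    Returns (recoded, common), where each allele is replaced by 1 if it is
    the most frequent allele in the sample and by 0 otherwise.
    Assumption: ties between equally frequent alleles are broken by order
    of first appearance (the behaviour of Counter.most_common).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    common = counts.most_common(1)[0][0]
    recoded = [tuple(int(allele == common) for allele in genotype)
               for genotype in genotypes]
    return recoded, common
```

For example, with hypothetical alleles F, S and R, `to_two_classes([("F", "F"), ("F", "S"), ("S", "R")])` keeps F as the first class and pools S and R into the second.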
Let us consider loci A and B, each having 2 alleles: A and a (frequency of A: $p$) and B and b (frequency of B: $q$). Four gametes are possible: AB, Ab, aB, and ab. If the gametic frequencies are, respectively, $f_{11}$, $f_{12}$, $f_{21}$, and $f_{22}$, the linkage disequilibrium $D$ is given by:
$$D = f_{11} f_{22} - f_{12} f_{21}$$
In order to make the values of the parameter $D$ less sensitive to changes in gene frequency, several other measures of gametic disequilibrium are useful in various contexts. The correlation coefficient $R = D / \sqrt{pq(1-p)(1-q)}$ was used by Hill and Robertson (1968) and by Franklin and Lewontin (1970). However, in a sample of individuals taken from a population, the degree of linkage disequilibrium cannot be estimated directly from the genotypic frequencies when the coupling and repulsion heterozygotes cannot be distinguished. In this case, estimation of linkage disequilibrium can be done in several ways. Hill (1974) provides a maximum-likelihood method where the population is assumed to be random mating and in Hardy-Weinberg equilibrium at each locus. In the case of 2 codominant alleles per locus, the frequency of one gamete (for example AB) estimated by the maximum-likelihood method ($f_{11}$) is given by a cubic equation:
$$f_{11} = \frac{1}{2N}\left[2N_{11} + N_{12} + N_{21} + \frac{N_{22}\, f_{11}(1-p-q+f_{11})}{f_{11}(1-p-q+f_{11}) + (p-f_{11})(q-f_{11})}\right] \qquad (1)$$
with $N_{11}$, $N_{12}$, $N_{21}$, $N_{22}$, and $N$ corresponding, respectively, to the observed numbers of AABB, AABb, AaBB, AaBb, and total individuals in the sample.
In Eq. (1) the only unknown is $f_{11}$. Hill suggests that an initial value:
$$f_{11} = \frac{4N_{11} + 2N_{12} + 2N_{21} + N_{22}}{2N} - pq$$
can be substituted into the right-hand side of (1), and the resulting expression regarded as an improved estimate and itself substituted into the right-hand side of (1). The iterative process is continued until stability is reached, and $D$ is obtained as $D = f_{11} - pq$. A test for $D = 0$ is given by $K = N D^2 / [pq(1-p)(1-q)]$, with $K$ following the chi-square distribution with one degree of freedom.
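The iterative procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the right-hand side of Eq. (1) is taken to be the usual form of Hill's (1974) estimator, the allele frequencies p and q are supplied by the caller, and the names `hill_ml` and `chi2_1df_pvalue`, the clamping of the initial value, and the convergence tolerance are choices made here rather than details given in the text.

```python
import math


def chi2_1df_pvalue(k):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(k / 2.0))


def hill_ml(n11, n12, n21, n22, n, p, q, tol=1e-10, max_iter=1000):
    """Fixed-point iteration for f11; returns (f11, D, K, p_value).

    n11, n12, n21, n22: numbers of AABB, AABb, AaBB and AaBb individuals.
    n: total sample size; p, q: frequencies of alleles A and B.
    Both loci must be polymorphic (0 < p < 1 and 0 < q < 1).
    """
    # Initial value suggested by Hill.
    f11 = (4 * n11 + 2 * n12 + 2 * n21 + n22) / (2 * n) - p * q
    # Assumption: keep the starting point inside the admissible range.
    f11 = min(max(f11, max(0.0, p + q - 1.0)), min(p, q))
    for _ in range(max_iter):
        coupling = f11 * (1 - p - q + f11)   # f11 * f22
        repulsion = (p - f11) * (q - f11)    # f12 * f21
        total = coupling + repulsion
        share = coupling / total if total > 0 else 0.0
        new = (2 * n11 + n12 + n21 + n22 * share) / (2 * n)
        converged = abs(new - f11) < tol
        f11 = new
        if converged:
            break
    d = f11 - p * q
    k = n * d ** 2 / (p * q * (1 - p) * (1 - q))
    return f11, d, k, chi2_1df_pvalue(k)
```

When no double heterozygotes are present (n22 = 0), the iteration stops at once at f11 = (2 n11 + n12 + n21) / 2N, which is a convenient check of the implementation.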
A second approach, suggested by Burrows (see Cockerham and Weir, 1977, and Langley et al., 1978), simply estimates the overall covariance of non-allelic genes in individuals. This method does not require distinguishing between the 2 types of double heterozygotes or knowing the mating system. Burrows's parameter is estimated by:
$$\Delta = \frac{4N_{11} + 2N_{12} + 2N_{21} + N_{22}}{2N} - 2pq$$
The test for $\Delta = 0$ uses a statistic that follows the chi-square distribution with one degree of freedom (Cockerham and Weir, 1977). The correlation coefficient based on Burrows's estimation is: $R = \Delta / [2\sqrt{pq(1-p)(1-q)}]$.
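A minimal Python sketch of Burrows's estimation follows. It assumes the composite form Δ = (4N11 + 2N12 + 2N21 + N22)/2N − 2pq, without any small-sample correction factor, and the correlation R = Δ / [2√(pq(1−p)(1−q))]; the function name `burrows` is hypothetical.

```python
import math


def burrows(n11, n12, n21, n22, n, p, q):
    """Return (delta, r_b) from genotype counts and allele frequencies.

    n11, n12, n21, n22: numbers of AABB, AABb, AaBB and AaBb individuals.
    n: total sample size; p, q: frequencies of alleles A and B.
    Assumption: no N/(N-1) bias correction is applied.
    """
    delta = (4 * n11 + 2 * n12 + 2 * n21 + n22) / (2 * n) - 2 * p * q
    r_b = delta / (2 * math.sqrt(p * q * (1 - p) * (1 - q)))
    return delta, r_b
```

Under this form, a sample without double heterozygotes gives Δ equal to twice the D obtained from Hill's equation, so the two correlation coefficients coincide in that case.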
In any population, not all loci are necessarily in Hardy-Weinberg equilibrium. Therefore, we used not only Hill's method, which assumes that the loci conform to the Hardy-Weinberg law, but also Burrows's estimation. Moreover, it was of interest to compare the results obtained by both methods, because this has been done in only a few cases.

Results
Table I gives, for each population, the number of flies analyzed per locus and the frequencies of the most common allele at each locus. With regard to the distribution of allelic frequencies, the populations collected in 1983 were analyzed in another paper (Charles-Palabost et al., 1985), and those of 1984 will be analyzed later. Concerning the goodness of fit to Hardy-Weinberg equilibrium, the $\chi^2$ test is not appropriate in some cases, since the expected numbers of genotypes are too small. Therefore, each α value given in Table I is the probability that the genotypic frequency distribution of a random sample is farther from the expected Hardy-Weinberg model than the corresponding observed distribution. These values were obtained by means of Monte-Carlo simulations, using the observed allelic frequencies as the real frequencies, under the null hypothesis that the populations are in Hardy-Weinberg equilibrium. This test is consequently frequency independent. We observe that 21 α values out of 101 are significant and that, among these 21 significant values, 10 are due to the presence of a rare genotype in the samples. This means that, in general, the observed frequencies of heterozygotes per locus in each population are in good agreement with those expected under the Hardy-Weinberg law. A significant excess of heterozygotes was found only at the α-Gpdh locus of the Sèvres population. Table II shows the frequencies of the observed heterozygotes for each locus and population. As is classically observed, the amount of variation differs greatly from one locus to another.
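The Monte-Carlo test of Hardy-Weinberg proportions described above can be sketched in Python for a 2-allele locus. The text does not state how the distance between a sample and the Hardy-Weinberg expectation was measured; the chi-square-type distance used here, the counting of ties as "farther", and the names `hw_distance` and `hw_monte_carlo` are assumptions made for illustration.

```python
import random


def hw_distance(counts, p):
    """Chi-square-type distance between genotype counts and HW expectations.

    counts: (number of AA, number of Aa, number of aa); p: frequency of A.
    """
    n = sum(counts)
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)


def hw_monte_carlo(counts, n_sim=10000, seed=0):
    """Proportion of simulated samples at least as far from HW as the data.

    The observed allele frequency is treated as the true frequency, and
    samples of the same size are drawn under Hardy-Weinberg proportions.
    """
    rng = random.Random(seed)
    n = sum(counts)
    p = (2 * counts[0] + counts[1]) / (2 * n)
    d_obs = hw_distance(counts, p)
    weights = (p * p, 2 * p * (1 - p), (1 - p) ** 2)
    farther = 0
    for _ in range(n_sim):
        draws = rng.choices((0, 1, 2), weights=weights, k=n)
        sim = (draws.count(0), draws.count(1), draws.count(2))
        if hw_distance(sim, p) >= d_obs:
            farther += 1
    return farther / n_sim
```

A sample in exact Hardy-Weinberg proportions, such as (25, 50, 25), has distance 0 and therefore returns 1.0, whereas a sample with no heterozygotes, such as (50, 0, 50), returns a value close to 0.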
The average heterozygosity over the 6 loci analyzed ranges from 0.092 in the Nevez population to 0.250 in the Ivry-sur-Seine and Sèvres populations. Except for Nevez, the mean heterozygosities obtained are similar to those estimated previously in other French natural populations of D. melanogaster (Girard and Palabost, 1976). The values of linkage disequilibrium estimated by Burrows's ($\Delta$ and $R_b$) and Hill's ($D$ and $R_h$) methods are given in Table III for the unlinked loci (located on different chromosomes) and in Table IV for the linked loci (located on the same chromosome). The use of the $\chi^2$ distribution to determine the significance level of a linkage disequilibrium implies that, in a sample of 100 individuals, the frequencies of the most common alleles at each of the 2 loci must be smaller than 0.85 (Montchamp-Moreau, 1985). Thus, the significance levels in Tables III and IV correspond to the probability that the linkage disequilibrium estimated from a random sample is greater than the linkage disequilibrium estimated from the sample analyzed. These probabilities were obtained using Monte-Carlo simulations, under the null hypothesis of a disequilibrium equal to 0. This test is independent of the distribution, but assumes that the observed allelic frequencies are the real frequencies in the populations. We can note that the values of $D$ and $\Delta$ are very similar for unlinked as well as for linked loci. By contrast, the correlation coefficients $R_h$ (Hill's estimation) and $R_b$ (Burrows's estimation) are different and, in most cases, $R_b$ is smaller in absolute value than $R_h$ (161 cases out of 216 values). When $R_b = R_h$ (in 55 cases), no double heterozygotes are present in the samples and $\Delta = 2D$; this result is particularly evident for unlinked loci. With Hill's method, 23 out of the 216 comparisons made between pairs of loci are significant, i.e., 10.6%.
The percentages obtained for the unlinked and linked loci are, respectively, 10.5% (13/124) and 10.9% (10/92). With Burrows's method, these values are 15.3% (33/216) for all the loci, and 11.3% (14/124) and 20.6% (19/92), respectively, for unlinked and linked loci. In the present study, out of the 15 combinations between allozyme loci, only the pair Est-C-Est-6 shows a significant linkage disequilibrium in most of the populations: 4 $D$ values out of 18 populations sampled (22%) and 8 $\Delta$ values (44%) are significant (Table IV). Using the combined data of all the populations, a significant deviation was obtained in only 2 cases: for the Est-C-Est-6 pair and also for Adh-α-Gpdh. With Hill's estimation, the values for the Adh-α-Gpdh and Est-C-Est-6 pairs are, respectively: $D = -0.0116$ (P < 0.01), $R_h = -0.0991$, and $D = -0.0097$ (P < 0.01), $R_h = -0.0943$. The corresponding values with Burrows's estimation are: $\Delta = -0.0129$ (P < 0.01), $R_b = -0.0548$, and $\Delta = -0.0132$ (P < 0.01), $R_b = -0.0643$.
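The Monte-Carlo significance test used for Tables III and IV can be sketched in Python as follows. Several points are assumptions made here, since the text does not specify them: Burrows's composite estimator is used as the test statistic, the comparison is made on absolute values with ties counted as exceedances, allele frequencies are re-estimated in each simulated sample, and the names `burrows_delta` and `ld_monte_carlo` are hypothetical. Each individual is represented by its number of copies (0, 1 or 2) of the most common allele at each locus.

```python
import random


def burrows_delta(a_counts, b_counts):
    """Composite disequilibrium from per-individual allele counts.

    a_counts[i], b_counts[i]: copies (0, 1, 2) of alleles A and B carried
    by individual i. The sum of the products a*b over individuals equals
    4*N11 + 2*N12 + 2*N21 + N22 in the genotype-count notation.
    """
    n = len(a_counts)
    p = sum(a_counts) / (2 * n)
    q = sum(b_counts) / (2 * n)
    composite = sum(a * b for a, b in zip(a_counts, b_counts))
    return composite / (2 * n) - 2 * p * q


def ld_monte_carlo(a_counts, b_counts, n_sim=5000, seed=0):
    """Probability, under zero disequilibrium, of a |delta| >= the observed one.

    The observed allele frequencies are treated as the true frequencies and
    the two loci are simulated independently under random mating.
    """
    rng = random.Random(seed)
    n = len(a_counts)
    p = sum(a_counts) / (2 * n)
    q = sum(b_counts) / (2 * n)
    observed = abs(burrows_delta(a_counts, b_counts))
    hits = 0
    for _ in range(n_sim):
        sim_a = [(rng.random() < p) + (rng.random() < p) for _ in range(n)]
        sim_b = [(rng.random() < q) + (rng.random() < q) for _ in range(n)]
        if abs(burrows_delta(sim_a, sim_b)) >= observed:
            hits += 1
    return hits / n_sim
```

For a sample in which the two loci are completely associated, for example 50 AABB and 50 aabb individuals, the returned probability is close to 0; for a sample whose composite disequilibrium is exactly 0 it is 1.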

Discussion
The results of the present study are not essentially different from those obtained by other investigators in natural populations of D. melanogaster. The amount of linkage disequilibrium detected in the French populations surveyed is small, but nevertheless higher than the amount reported in other natural populations of D. melanogaster, in which around 5-9% of the analyzed pairs of loci show a significant linkage disequilibrium (see, for example, Mukai et al., 1971, 1974; Mukai and Voelker, 1977; Yamaguchi et al., 1980; Yamazaki et al., 1984). However, in the studies mentioned above, linkage disequilibrium was detected by extracting whole chromosomes with the marked inversion technique. Our results are therefore more strictly comparable to the data reported by Langley et al. (1978), who calculated Burrows's estimate $R_b$ from genotypic data obtained in natural populations of D. melanogaster. However, they also reported a small proportion of significant linkage disequilibrium (5.1% for linked loci and 6.7% for unlinked loci).
Among the 15 combinations between the 6 enzymatic loci studied, the data provide clear evidence of a significant linkage disequilibrium for only 2 pairs of linked loci: Adh-α-Gpdh and Est-C-Est-6. The same result was obtained by Triantaphyllidis et al. (1981) for the Adh-α-Gpdh pair in Greek populations. This may suggest consistent epistatic interactions between these pairs of genes (Lewontin, 1974). However, another explanation is possible in the case of Adh-α-Gpdh: the linkage disequilibrium detected in our populations might be due to an association between these 2 loci and the inversion In(2L)t in the same chromosome arm. Indeed, In(2L)t is located on the left arm of chromosome 2 and contains the α-Gpdh locus, while the Adh locus lies outside the inversion, very near to its breakpoint (Lindsley and Grell, 1968). Unfortunately, the frequencies of inversions were not analyzed in our populations. However, data from natural populations collected in the Northern hemisphere show a significant negative gametic disequilibrium between these 2 loci only when all the chromosomes (chromosomes with the standard sequence and chromosomes with In(2L)t) are considered. This disequilibrium remains negative but is not significantly different from 0 when the In(2L)t chromosomes are removed from the analysis (Mukai et al., 1971; Langley et al., 1974, 1978; Alahiotis et al., 1976; Voelker et al., 1977; Yamaguchi et al., 1980; Yamazaki et al., 1984). Consequently, in our opinion, the linkage disequilibrium observed here between Adh and α-Gpdh is probably due to the association of these loci with In(2L)t, despite the well-known interactions between these 2 enzymes (Geer et al., 1983, 1985). The result obtained for the Est-C-Est-6 pair appears more interesting since, in this case, Est-C and Est-6 are located on different arms of chromosome 3.
Few cases of significant linkage disequilibrium between esterase loci have been previously reported in natural populations of D. melanogaster (see, for example, Johnson and Schaffer, 1973; Langley et al., 1978; Laurie-Ahlberg and Weir, 1979), but such a result is known in other organisms, such as a salamander (Plethodon cinereus; Webster, 1974) and barley (Hordeum spontaneum; Kahler and Allard, 1970). In D. melanogaster, this linkage disequilibrium could be explained by interactions between the 2 loci themselves or by interactions between these loci and inversions located on the same or different arms of chromosome 3. This last hypothesis was tested by several authors (Kojima et al., 1970; Mukai et al., 1974; Langley and Ito, 1977; Yamazaki et al., 1984). In most cases, when inversions (3R)P and (3L)P were analyzed simultaneously with the esterase loci, no evidence of linkage disequilibrium was found between these 2 inversions, or between them and the esterase loci. The physiological function of esterases remains unknown (Dickinson and Sullivan, 1975; Danford and Beardmore, 1980), but the esterase loci may code for a class of closely related proteins, probably with related functions. Therefore, the significant gametic disequilibrium observed between Est-C and Est-6 might be examined in terms of interactions between metabolically related genes, as suggested by Zouros and Krimbas (1973) and then by Zouros and Johnson (1976) for 2 other enzymes. However, in our populations, it is not possible to rule out entirely the influence of inversions on the linkage disequilibrium found between the Est-C and Est-6 loci. Thus, for a better and more extensive evaluation of this result, it is necessary to know the population size (since genetic drift could give rise to a substantial disequilibrium; Montchamp-Moreau and Katz, 1986) and to verify whether this linkage disequilibrium is maintained over time.