Amylase polymorphism in Drosophila melanogaster: haplotype frequencies in tropical African and American populations

The frequencies of phenotypic haplotypes at the Amylase loci of D melanogaster were determined in 10 samples from 7 different tropical origins, including the African mainland, Indian Ocean islands and the French West Indies. Altogether, 2 110 haplotypes were scored and 10 different electrophoretic alleles were identified. Allelic frequencies were calculated with the assumption that 2 functional loci occur on each second chromosome. The data of 3 temperate populations from Texas, Japan and France (1 238 haplotypes) were also included for comparisons. Genetic diversity, measured either at the allelic or haplotypic levels, was extremely variable between populations, with expected heterozygosities ranging from 2 to almost 90%. The most diverse populations are found on the African mainland while temperate populations are characterized by the predominance of the Amy-1 allele; a very low diversity was also found in the Mascarene islands. Genetic distances were similarly close between populations from temperate regions, Guadeloupe islands and Mascarene islands, in spite of large geographic distances. On the other hand, African mainland populations, despite their high diversity and geographic proximity, could be very distantly related at the genetic level. With 10 different alleles, 55 different phenotypic haplotypes (ie not discriminating between the proximal and distal loci) may be produced, and 34 were identified. Among the 21 missing haplotypes, 20 had very low expectancy under the assumption of free recombination (total expected number 5.9). Only one (Amy 3-5) had a higher expectancy (8.9). Therefore, most of the possible haplotypes have been produced during the course of evolution in spite of the tight linkage between the 2 loci, and 3 possible mechanisms are discussed. All these observations seem better explained by stochastic processes than by selective pressures.
Ancestral populations on the African mainland have accumulated a large number of alleles and haplotypes, but their genetic differentiation suggests restricted gene flow. In other parts of the world, the low diversity could be explained by demographic bottlenecks related to recent colonizations.
amylase / polymorphism / tropical populations / Drosophila melanogaster

Summary - Amylase polymorphism in Drosophila melanogaster: haplotype frequencies in tropical populations of Africa and America. The phenotypic frequencies of haplotypes at the Amylase locus were determined in 10 tropical populations of D melanogaster from West Africa, the Indian Ocean islands and the French West Indies. In total, 2 110 haplotypes were analysed and 10 electromorphs identified. Allelic frequencies were calculated under the hypothesis that 2 functional loci exist on each chromosome 2. The results obtained for 3 populations from temperate regions, Texas, Japan and France (1 238 phenotypic haplotypes), were included for comparison.

INTRODUCTION
In spite of its domestic and cosmopolitan status, Drosophila melanogaster now appears to be a species geographically highly differentiated; numerous genetic traits help to distinguish its allopatric populations (for reviews see Lemeunier et al, natural populations and the between population variation was estimated with the fixation index, F_ST (Wright, 1951). Values ranged from 0.025 to 0.585 with an average of 0.091 ± 0.130, an indication of a large overall amount of local differentiation. This variability may be accounted for by climatic adaptations, since many loci exhibit latitudinal trends, by genetic drift related to the colonization history of the species (David and Capy, 1988) and also by a possible restricted dispersal capacity.
Amylase polymorphism was generally not considered in such studies, since the structural duplication of the locus (Bahn, 1967) prevents an easy estimate of allelic frequencies. It is, however, known that amylase loci exhibit high levels of polymorphism and large interpopulational variations (Hickey, 1979; Singh et al, 1982; Dainou et al, 1987). Recent investigations at a molecular level have shown that the 2 structural loci are expressed as an inverted duplication and are separated by only some 5 kb (Levy et al, 1985; Boer and Hickey, 1986; Doane et al, 1987): such a structure should result in a very low recombination rate between the 2 loci.
Also, the amylase duplication is likely to exist in all individuals (Gemmill et al, 1986; Langley et al, 1988) and it is probable that the 2 copies on each chromosome are functional (Hawley et al, 1990).
Estimating the allelic frequencies at a duplicated locus raises problems similar to those encountered in studying autotetraploid species. For example, if only 2 isoamylases are expressed in a fly (producing a 1-2 phenotype), the number of copies of allele 1 may range from 1 to 3. In favorable cases, variations in staining intensity of the electrophoretic bands make it possible to infer the number of copies of each allele. In the case of amylase in D melanogaster, variations in band intensity were observed, but it was not possible to relate them clearly to the number of copies. Similar observations were also made in previous studies (Hickey, 1979; Singh et al, 1982) and were related to complex regulation of the structural loci (Hoorn and Scharloo, 1978; Hickey, 1981; Yamazaki and Matsuo, 1983; Klarenberg, 1986; Matsuo and Yamazaki, 1986; Doane et al, 1987).
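The copy-number ambiguity described above can be illustrated with a short enumeration. This is only a sketch: it assumes 4 functional gene copies per fly (2 loci on each of 2 chromosomes), no null alleles, and that every allele present produces a visible band; the function name copy_numbers is hypothetical.

```python
from itertools import combinations_with_replacement

def copy_numbers(phenotype, allele):
    """Possible copy numbers of `allele` among the 4 gene copies of a fly
    whose zymogram shows exactly the bands listed in `phenotype`.
    Assumption: 4 functional copies, no null alleles."""
    bands = set(phenotype)
    counts = set()
    # Enumerate unordered sets of 4 gene copies drawn from the observed bands.
    for genes in combinations_with_replacement(sorted(bands), 4):
        if set(genes) == bands:  # every observed band must be carried at least once
            counts.add(genes.count(allele))
    return sorted(counts)

print(copy_numbers(("1", "2"), "1"))  # [1, 2, 3]
```

For a 1-2 phenotype the enumeration returns 1, 2 or 3 copies of allele 1, which is why phenotype frequencies alone do not give allelic frequencies.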
Using an amylase-null strain, we studied the phenotypic haplotypes found in natural populations, ie the gametic associations of electrophoretic alleles. Single wild males were crossed to Amy-null females, so that the isoamylases expressed by the F1 progeny are those of each male chromosome (see Methods). For this we chose tropical populations, and especially African ones, because of their higher level of electrophoretic diversity. From these results allelic frequencies could be determined, and the genetic polymorphism could be studied in the usual way under the assumption of a general duplication. Moreover, each chromosome association may be considered as a fairly stable genetic structure also suitable for measuring the intrapopulational polymorphism and the level of heterozygosity. Finally, the diversity of haplotypes provided some insight into the recombination processes occurring in natural populations, even if, in this kind of study, it was not possible to identify the alleles carried by the proximal and the distal loci.
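As a minimal sketch of how allelic frequencies follow from haplotype counts under the duplication assumption, each haplotype can be treated as contributing 2 gene copies, a single-band haplotype being counted as 2 copies of the same allele. The counts below are invented for illustration and are not taken from the tables; the function name allele_frequencies is hypothetical.

```python
from collections import Counter

def allele_frequencies(haplotype_counts):
    """Allelic frequencies from phenotypic haplotype counts.
    Keys are (a, b) pairs; (a, a) is a homohaplotype counted as 2 copies of a.
    Assumption: 2 functional loci on every chromosome."""
    copies = Counter()
    for (a, b), n in haplotype_counts.items():
        copies[a] += n  # one gene copy per haplotype for the first band
        copies[b] += n  # and one for the second (same allele if a == b)
    total = sum(copies.values())
    return {allele: c / total for allele, c in copies.items()}

# Hypothetical sample of 100 haplotypes (200 gene copies).
sample = {("1", "1"): 60, ("1", "2"): 20, ("2", "2"): 10, ("3", "6"): 10}
print(allele_frequencies(sample))
```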

MATERIALS AND METHODS
The populations studied had 7 different origins, 6 in the Afrotropical region (mainland and Indian Ocean islands) and one in Tropical America (Petit Bourg, Guadeloupe island, West Indies). African mainland populations originated from the Congo (Brazzaville, and Dimonika, about 400 km east of Brazzaville, in the Mayombe coastal mountains) and from Benin (Cotonou). Three island populations were also sampled, from Reunion (Cilaos), Mauritius (Port Louis) and Seychelles (Victoria on Mahé Island).
In 2 cases, repeated collections were made in the same locality and will be considered here as independent samples, so as to check the genetic stability of local populations: 3 samples came from Brazzaville and 2 from Guadeloupe island.
For some comparisons, haplotype frequencies in 3 temperate populations were also considered. They are Brownsville (Texas) and Katsunuma (Japan) from Langley et al (1974) and Villeurbanne (France) (unpublished data). copies of the same allele. Altogether 2 110 haplotypes were sampled. Inspection of these tables shows a clear similarity among samples taken in the same locality, but large variations between geographically distant populations. These general patterns may be analysed at 3 levels: alleles, haplotypes and genotypes.
Allelic frequencies are usually used for calculating the mean heterozygosity of each population. The expected heterozygosities (Ha), calculated according to Nei and Roychoudhury (1974), are given in table II. Because of the close proximity of the 2 structural loci, each haplotype appears as a stable genetic structure whose frequency can also be used for calculating haplotypic heterozygosities (Hh), given in table I. As expected, Hh is always higher than Ha, but the 2 values are extremely variable among populations and strongly correlated, as shown in figure 1. The graph reveals a geographic pattern: the most diverse populations are found on the African mainland (Congo and Benin) while populations with a very low diversity are found on the Indian Ocean islands (Reunion and Mauritius), and in temperate regions.
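The two heterozygosity measures can be sketched as follows. The small-sample correction m/(m - 1) is the usual unbiased form of expected heterozygosity; whether exactly this form was applied to the tables is an assumption, as are the invented counts and the function name expected_heterozygosity.

```python
def expected_heterozygosity(counts):
    """Expected heterozygosity from a dict of category -> count.
    Uses H = m/(m - 1) * (1 - sum p_i^2), with m the number of sampled copies.
    Assumption: this correction matches the one used in the paper."""
    m = sum(counts.values())
    sum_sq = sum((c / m) ** 2 for c in counts.values())
    return m / (m - 1) * (1.0 - sum_sq)

# Hypothetical sample: 100 haplotypes, hence 200 gene copies.
haplotype_counts = {"1-1": 60, "1-2": 20, "2-2": 10, "3-6": 10}
allele_counts = {"1": 140, "2": 40, "3": 10, "6": 10}

Hh = expected_heterozygosity(haplotype_counts)  # haplotypic level
Ha = expected_heterozygosity(allele_counts)     # allelic level
print(round(Ha, 3), round(Hh, 3))
```

In this invented sample Hh exceeds Ha, in line with the pattern reported for the real data.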
Allele and haplotype frequencies can also be used for estimating the amount of differentiation between samples and populations. Two estimators have been used for such an analysis: the genetic distance according to Nei (1972) and the absolute distance according to Gregorius (1984), which is similar to the percent similarity index widely used by ecologists (see for example Schoener, 1974). Both methods provide concordant information, and the values obtained with Nei's formula are given in table III. The genetic distance matrices from the 2 sets of data (allelic and haplotypic frequencies) were transformed into phenograms by using several algorithms which have been developed for this purpose: the UPGMA method (Sneath and Sokal, 1973) and the minimum-sum-of-squares method of Fitch and Margoliash (1967) from the PHYLIP computer package (Felsenstein, 1984). In the dendrograms, the 3 temperate populations were included. All of the methods yielded similar topologies although with variable branch lengths. Two examples of dendrograms are given in figure 2.
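The two distance estimators named above can be sketched for a single pair of populations. The standard genetic distance is taken here as D = -ln(J_xy / sqrt(J_x J_y)) and the absolute distance as half the sum of absolute frequency differences; the frequency vectors are invented, and the function names are hypothetical.

```python
import math

def nei_distance(x, y):
    """Nei's (1972) standard genetic distance between two frequency dicts."""
    alleles = set(x) | set(y)
    jxy = sum(x.get(a, 0.0) * y.get(a, 0.0) for a in alleles)
    jx = sum(v * v for v in x.values())
    jy = sum(v * v for v in y.values())
    if jxy == 0.0:
        return math.inf  # no shared category: the distance is unbounded
    return -math.log(jxy / math.sqrt(jx * jy))

def absolute_distance(x, y):
    """Absolute distance: half the sum of absolute frequency differences."""
    alleles = set(x) | set(y)
    return 0.5 * sum(abs(x.get(a, 0.0) - y.get(a, 0.0)) for a in alleles)

# Hypothetical allelic frequencies for two populations.
pop_a = {"1": 0.9, "2": 0.1}
pop_b = {"1": 0.3, "2": 0.2, "6": 0.5}
print(nei_distance(pop_a, pop_b), absolute_distance(pop_a, pop_b))
```

The same functions apply unchanged to haplotype frequencies, since both only require two frequency vectors over a common set of categories.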
We see that samples taken in the same locality (Guadeloupe or Brazzaville) are always branched together. The figure clearly shows that a fairly homogeneous cluster includes the 3 temperate populations, the Mascarene and the Guadeloupe populations, in spite of the huge geographic distance between some of them: this group is characterized by a low genetic diversity and a strong predominance of the Amy-1 allele. By contrast, African mainland populations are much more diverse and they do not branch in a single cluster. For example, Dimonika and Brazzaville are clearly separate, although only 400 km distant. The Cotonou population is the most different from all other samples, although it is < 2 000 km from Brazzaville. Finally, the Seychellian (Mahé) population seems intermediate. Sometimes it branches with the low diversity group, but it could also branch with the African group of populations as shown in figure 2.
The divergence between populations may also be appreciated by considering the geographic distribution of each allele. Allele 5.4 (not described in Dainou et al, 1987) was only found in 1 copy in Reunion and, so far, seems restricted to this population. All other alleles have been found in at least 4 distinct populations. For example, allele 1.4, which seems very rare throughout (5 copies among 4 220), has been found in Congo, Benin and Guadeloupe. On the other hand, the neotropical Guadeloupe population is apparently lacking alleles 5 and 6, which are abundant and widespread in the Afrotropical region. Allele 3.4, which is widespread throughout the African mainland, is absent from Guadeloupe and from the Indian Ocean islands. Finally, allele 1, which is considered to be the most frequent allele in the species (Doane, 1969; Hickey, 1979; Singh et al, 1982), is found abundantly everywhere, but its frequency varies from 0.241 to 0.957. Still higher frequencies may be found in some temperate populations, for example 0.991 in Villeurbanne.
A general way to appreciate the level of genetic differentiation between geographic populations is to calculate the fixation indices F_ST (Wright, 1951; Singh and Rhomberg, 1987). Values were calculated for each allele separately and are given in table II. Assuming that geographic differentiation was due to stochastic processes, F_ST indices estimated either from different loci (Lewontin and Krakauer, 1973) or from different alleles at the same locus (Weir and Cockerham, 1984) should be similar.
For the common widespread amylase alleles (1, 2, 3, 4, 5 and 6) the F_ST indices apparently fall into 3 separate classes: values ranging between 0.07 and 0.085 for alleles 2, 3 and 4; values of 0.15 for alleles 5 and 6; finally a value of 0.30 for allele 1. This suggests that the various alleles do not provide exactly the same information, and that in some cases some selection effects could occur. A more precise conclusion does not seem possible because of the complexity of statistical comparisons and of the small number of populations considered in the present work.
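A per-allele fixation index can be sketched with Wright's basic definition, F_ST = var(p) / (p_mean (1 - p_mean)), computed across populations. This is an unweighted version without sampling correction, which may differ from the estimator actually used for table II; the frequencies are invented and the function name fst_for_allele is hypothetical.

```python
def fst_for_allele(freqs):
    """Wright's F_ST for one allele from its frequency in each population.
    Assumptions: equal weight per population, no sampling correction."""
    k = len(freqs)
    p_bar = sum(freqs) / k
    var_p = sum((p - p_bar) ** 2 for p in freqs) / k  # variance among populations
    return var_p / (p_bar * (1.0 - p_bar))

# Hypothetical frequencies of one allele in two populations.
print(fst_for_allele([0.2, 0.8]))  # close to 0.36
```

Under purely stochastic differentiation the values obtained for different alleles are expected to be similar, which is the comparison made in the text.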
Since each haplotype may be considered as a fairly stable genetic structure, we may ask the question: were the genotypes under a Hardy-Weinberg equilibrium? For each sample of wild collected males (all of them, except the Mauritius population) the expected genotypic frequencies were calculated under the panmixia hypothesis and compared to the observed frequencies by χ² analysis. For example, for the 3 Brazzaville samples, χ² values were 2.48, 4.68 and 4.91 for 5, 10 and 7 degrees of freedom respectively. Other populations yielded similar results so that the panmictic hypothesis, at the haplotype level at least, appears to be acceptable. In other words, there were apparently no selective effects on the genotypes between zygote formation and adult stage. However, if samples are not very large, selection may be too weak to produce significant departures from Hardy-Weinberg. In the present case one can reasonably say that the sizes of the natural populations appear sufficiently large, as is usual in the tropics, to prevent this as well as any heterozygote deficit due to inbreeding.
An important point arising from the present study is the diversity of haplotypes. D melanogaster exhibits a remarkable diversity of its electrophoretic alleles at the Amylase loci since 13 different alleles have so far been identified (Dainou et al, 1987 and present study). Moreover, similar and apparently identical alleles are found in different, geographically distant populations, and a likely explanation is that alleles designated by the same number originated from a single mutation (or recombination), ie that they are identical by descent. For example, allele 2, which is widespread all over the world, would produce the same protein everywhere. Such an assumption is implicit in all papers so far published on amylase polymorphism of D melanogaster.
We find (table I) that allele 2 may be associated with alleles 1, 3, 3.4, 4, 5 and 6, which are the most frequent in the species and all present in African populations. Moreover we also assume that the homohaplotype 2 which is quite common, carries 2 copies of the same allele. In other words allele 2, originating at one of the structural loci, either proximal or distal, has been able to invade the adjacent locus. The possible mechanisms for such recombinations will be considered in the discussion. Here we shall only compare the observations with what could be expected under a null hypothesis ie a free recombination between alleles, as if they were not strongly linked. The problem of the proximal or distal position of each allele will however not be considered.
With a total of 10 different electrophoretic alleles, 45 different heterohaplotypes may be found, and 10 homohaplotypes. Only 34 different haplotypes were actually observed in this study (table I). However, among the possible haplotypes, most have a very low expectation. The 21 missing haplotypes, with their total expected numbers, are given in table IV. We see that the absence of only one, Amy 3-5, is significant (χ² test) since 8.9 copies were expected. All the 20 other haplotypes have an expected frequency around or less than unity, ie a very low figure with respect to the total number involved in the present study. At this point we may conclude that during the evolutionary history of the species, most of the possible haplotypes have been produced by recombination. This implies that the rate of production of new electrophoretic alleles, by mutations, is much less than the rate of recombination between the 2 loci, which is itself very low. Observing most of the theoretically possible haplotypes does not mean that their frequencies corresponded to a panmictic equilibrium. Under the free recombination hypothesis the expected haplotype frequencies were calculated by using the allelic frequencies shown in table II. The distributions of the expected and observed numbers in each sample were compared by χ² analysis. In all populations harboring a high genetic diversity (heterozygosity > 0.4), very significant values were found. For example, in the 3 samples from Brazzaville, the homogeneity χ² values were generally not significant, but this is a consequence of the low number of possible haplotypes and of the limited number of degrees of freedom.
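The count of possible haplotypes and their expected numbers under free recombination can be sketched as below. With proximal and distal positions not distinguished, a homohaplotype i-i is expected at frequency p_i^2 and a heterohaplotype i-j at 2 p_i p_j. The allele labels, the equal frequencies and the function name expected_haplotypes are invented for illustration.

```python
from itertools import combinations_with_replacement

def expected_haplotypes(allele_freqs, n_haplotypes):
    """Expected number of each phenotypic haplotype under free recombination.
    Positions (proximal or distal) are not distinguished."""
    expected = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        p, q = allele_freqs[a], allele_freqs[b]
        prob = p * p if a == b else 2 * p * q
        expected[(a, b)] = prob * n_haplotypes
    return expected

# Ten hypothetical alleles at equal frequency, 2 110 haplotypes scored.
freqs = {f"a{i}": 0.1 for i in range(10)}
exp = expected_haplotypes(freqs, 2110)
n_homo = sum(1 for a, b in exp if a == b)
print(len(exp), n_homo, len(exp) - n_homo)  # 55 10 45
```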
It seemed interesting, for each haplotype in each sample, to consider the direction of disequilibrium, ie the amount of excess or deficit as compared to the expected number. Since we could not identify the position (proximal or distal) of each allele, we did not use the index D for describing the linkage disequilibrium. A more neutral description has been preferred, the ratio of the numbers of observed to expected haplotypes. This analysis was done only for the most diverse populations and those for which a high number of haplotypes was scored; the 3 samples of Brazzaville were pooled, and also the 2 samples from Guadeloupe. The 2 other populations considered were Dimonika and Cotonou, and the results are given in table V. For each case for which a sufficient number of haplotypes was either observed or expected, the difference was analysed statistically with a χ² test, assuming 1 df for the threshold values.
As a first approximation, the ratio r helps to identify haplotypes in significant excess, in significant shortage or in approximate equilibrium. Among the first category, we can mention 2 homohaplotypes, 1-1 and 2-2 (the non-significant excess of Amy 1-1 in Guadeloupe is presumably due to the very high frequency of this haplotype). Several heterohaplotypes are also everywhere in excess, such as 3-3.4, 3.4-6, 4-6 and 5-6. Among the haplotypes which are always in too low frequency, we may mention 1-2, 1-3.4, 1-4, 1-5 and 3-6. In several cases, on the other hand, discordant observations are found in different populations. For example, haplotypes 1-3 and 1-4, which are in great shortage in Brazzaville (124.6 expected but 57 observed), are close to equilibrium in Dimonika, 400 km away (23.5 expected against 25 observed). The differences are much more pronounced if we compare more distant populations, such as Brazzaville and Cotonou. For example, haplotype 1-6, clearly deficient in Brazzaville (6 found against 45.7 expected), is close to equilibrium in Cotonou (37 found against 38.8 expected). Still more surprising, haplotype 6-6, which is in excess in Brazzaville, is completely absent in Cotonou.
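The observed to expected comparison can be sketched with the counts quoted above for haplotype 1-6. The statistic used here, (O - E)^2 / E compared with the 5% threshold for 1 df (3.841), is an assumption about how the single-haplotype test was carried out; the function name disequilibrium is hypothetical.

```python
CHI2_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 df

def disequilibrium(observed, expected):
    """Return (ratio r, chi-square contribution, significant at 5%?).
    Assumption: the test uses (O - E)^2 / E with a 1 df threshold."""
    r = observed / expected
    chi2 = (observed - expected) ** 2 / expected
    return r, chi2, chi2 > CHI2_1DF_5PCT

# Haplotype 1-6, counts as quoted in the text.
print(disequilibrium(6, 45.7))   # Brazzaville: strong deficit
print(disequilibrium(37, 38.8))  # Cotonou: close to equilibrium
```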

DISCUSSION
The analysis of haplotype frequencies of the Amylase loci is a powerful means for discriminating natural populations of D melanogaster. According to geographic origin, the gametic diversity, measured by the allelic or haplotypic heterozygosity, may vary from < 0.02 (Villeurbanne) up to 0.89 (Dimonika). A low diversity is always correlated with the prevalence of allele 1 and, in this respect, we confirm previous investigations either based on phenotypic analyses (Hickey, 1979; Singh et al, 1982) or on haplotypes (Langley et al, 1974). There is however clear evidence that, on the Afrotropical mainland, the frequency of Amy-1 is usually < 50% and may decrease to 25%, such as in Dimonika or Cotonou. The frequency of this allele is thus the most variable among populations, with a fixation index F_ST = 0.31.
Up to now, 13 different electrophoretic alleles have been identified in natural populations (Dainou et al, 1987 and present study). Their world distribution, based on phenotypic analyses of more numerous populations, will be described elsewhere.
Nine different alleles are regularly present in tropical African populations and this observation, correlated with the high genetic diversity, is a strong argument for assuming that these populations have evolved for a long time in the same area. We have good evidence that the species originated in West Africa (David and Capy, 1988; Lachaise et al, 1988).
All dendrograms based on amylase data separate the populations into 2 groups. The first one comprises the temperate populations, those from Mascarene islands and those from West Indies. According to a former classification (David and Capy, 1988), they correspond to recent and new populations, with respect to the date of colonization. The genetic proximity does not correlate with geography (eg Brownsville and Katsunuma) and more likely reflects the domestic status of D melanogaster. A second group of populations is characterized by a higher diversity and a lesser prevalence of Amy-1. The position of the Seychellian population (Mahé) remains more uncertain. However, because haplotype data are more accurate than allele data, it seems to be related to the African group of populations, thus invalidating a previous suggestion of a Mediterranean origin for this population (David and Capy, 1982). A recent introduction from Southern Africa now appears a more likely hypothesis (unpublished observations).
A most intriguing observation is the large amount of differentiation between African mainland populations, in spite of the geographic continuity and the apparent lack of ecological barriers. Much greater variations occur between Dimonika and Brazzaville (400 km) than between Europe and West Indies or between Texas and Japan, and there is a clearcut separation of the Cotonou population. For the moment, among enzyme loci so far investigated (David, 1982; Singh et al, 1982; Singh and Rhomberg, 1987; and unpublished data) the case of Amylase loci is unique. For other loci, variations between different African countries are limited and generally correspond to long range trends, for example latitudinal clines.
The differentiation of amylase polymorphism over a medium geographic range in tropical Africa may be a consequence of strong local divergent selective pressures or of genetic drift accompanied by a very restricted gene flow (Slatkin, 1987). Several investigations have shown some selection occurring at the Amylase loci under laboratory conditions (De Jong and Scharloo, 1976; Hickey, 1979; Scharloo and De Jong, 1980; Haj Hamad and Hickey, 1982; Yamazaki and Matsuo, 1983; Matsuo and Yamazaki, 1986). Following the selectionist interpretation, we could suggest that Amy-1 has a strong selective advantage in most countries, either temperate or tropical, when D melanogaster tends to have a domestic ecology. Nevertheless it has been shown that the other Amy variants generally exhibit higher enzymatic activity than Amy-1 (Doane et al, 1987; Benkel and Hickey, 1986; Langley et al, 1988). Moreover, in the Afrotropical region, where Amy-1 is much less frequent, it would remain difficult to explain why other alleles, such as 4, 5 and haplotype 3-5 is noteworthy, since it was never found despite an overall expectation of ≈ 9 copies. Assuming that all alleles sharing the same electrophoretic mobility are identical by descent, a general recombination scheme, as utilized for example in table V, is possible only if a given allele can move from a proximal to a distal position, and reciprocally. Several mechanisms may be considered for a general production of heterohaplotypes. The most widely accepted process is unequal crossing over leading to a transient triplication (Ohta, 1980). In the present case, this mechanism seems very unlikely for two main reasons. First, a haplotype expressing 3 different alleles has never been observed. Second, and more importantly, an unequal crossing over implies a tandem duplication, while it is now known that the 2 amylase loci are transcribed divergently (Boer and Hickey, 1986; Gemmill et al, 1986; Benkel et al, 1987).
Another possible process is gene conversion (Dover, 1980). In favor of this possibility, we may point out that several homohaplotypes, such as 1-1 or 2-2, are generally found in significant excess (table V). But this is not a general rule. For example, haplotype 6-6, found in significant excess in Brazzaville, was completely absent from Cotonou, while 24 copies would be expected. Although gene conversion may be considered as a general phenomenon in genome evolution, its frequency in natural populations is not known. Moreover, a loss of function of 1 locus in a heterohaplotype will produce an apparent homohaplotype. This case certainly exists in natural populations, but with an unknown frequency. The third process which may change the position of the 2 loci is chromosome rearrangement. Because of sequence homologies, intrachromatid pairing is likely to occur with a significant frequency; any exchange outside the duplication will move the alleles from the proximal to the distal position, and vice versa. The Amy-null chromosome, which was found in a natural population (Haj Ahmad and , presumably arose from such a rearrangement at the level of the coding sequences (Gemmill et al, 1986). Langley et al (1988) also reported an apparent inversion of the interlocus region in one of the 85 chromosomes they analysed.
In conclusion, the polymorphism of amylase loci in D melanogaster raises several important questions concerning the evolutionary origin of the various alleles, the mechanisms of recombination, the spread of alleles and haplotypes in natural populations and the neutral vs adaptive significance of the genetic diversity. Further investigations combining molecular and ecological studies should help to solve some of these problems.