Genome size variation and evolution in North American cyprinid fishes

Genome sizes (nuclear DNA contents) were documented spectrophotometrically for 29 species of North American cyprinid fishes. The data were then merged with comparable, previously published genome size data from an additional 20 North American cyprinid species. The distributions of DNA values within populations of the 49 cyprinid species were essentially continuous and normal. The proportion of DNA that is apparently free to vary quantitatively within cyprinid populations appears to be between 4 and 5% of the genome. The distribution of DNA values among cyprinid species was more or less continuous, with considerable overlap among species with intermediate DNA values. Analysis of the average genome size difference (distance) between individuals drawn from successive levels of evolutionary divergence indicated that: (i) the majority of genome size divergence in North American cyprinids has occurred above the level of individuals within populations of species, and (ii) the degree of genome size divergence between species in the extremely speciose cyprinid genus Notropis is greater than that between species in other, less speciose cyprinid genera. The hypothesis that genome size change might be concentrated in speciation episodes was tested by comparing the means and variances of genome size difference (distance) between species in the cyprinid genus Notropis (a species-rich phylad) and the centrarchid (sunfish) genus Lepomis (a species-poor phylad). The ratios of mean distances and variances in the Notropis versus Lepomis comparisons were greater than unity, suggesting that changes in genome size in cyprinids may be correlated with speciation episodes. Whether or not genome size change in cyprinids occurs at speciation sensu stricto is problematic. The data suggest that separate facets or levels of the cyprinid genome may follow independent evolutionary paths.
genome size (DNA content) / cyprinid fish / natural selection / speciation

Summary - Genome size variation and evolution in North American cyprinids - The genome size (estimated as nuclear DNA content) of 29 North American cyprinid species was measured spectrophotometrically; the results were then combined with comparable, previously published data from 20 other cyprinid species from the same geographic area, and the analyses were carried out on the combined data set. Within populations, nuclear DNA content follows a continuous, normal distribution and varies over a range that represents 4 to 5% of the genome. Across species, nuclear DNA content shows a nearly continuous distribution, with considerable overlap between …


INTRODUCTION
It has been known for several years that sizeable differences in genome size or DNA content often occur, even between closely related species (Mirsky and Ris, 1951; Bachmann et al, 1972; Sparrow et al, 1972). Kauffman (1971) initially hypothesized that the extensive genome size variation was related directly to organismal and/or genetic complexity. It is now clear, however, that no significant correlations exist between genome size and organismal (or genetic) complexity or phylogenetic advancement (Cavalier-Smith, 1985a; Price, 1988a). This has been termed the C-value paradox and represents a general biological problem among eukaryotes which to date remains unresolved (Price, 1988a,b,c).
Efforts towards explaining or understanding the C-value paradox have been focused primarily on the search for significant correlations between genome size and a variety of biological, biophysical or genetic parameters. What has emerged from these studies are several hypotheses which relate genome size in an inverse way to rates of organismal growth, metabolism or differentiation, and which invoke selection as the primary force responsible for the observed variation in genome size (Bennett, 1971, 1972; Cavalier-Smith, 1978, 1985a; Szarski, 1983; Sessions and Larson, 1987; Price, 1988a). These hypotheses are confounded for several reasons. First, much of the data which document relationships between genome size and cell cycle patterns or certain life history parameters are from unicellular eukaryotes (eg, Cavalier-Smith, 1980; Shuter et al, 1983). The problem lies in the extrapolation to multicellular eukaryotes where it is often difficult to obtain direct, unbiased or standardized estimates of organismal growth and/or developmental rates. A second reason is that most, if not all, of the evidence is correlative and does not necessarily demonstrate cause and effect. A third reason is that nearly all of the genome size data are from distinct species or higher level taxa. Studies of genome size variation at lower hierarchical levels are few, and differences in genome size within species generally have been regarded as insignificant or unimportant (Bennett and Smith, 1976). Several recent studies, however, have shown that intraspecific variation in genome size may be substantial, and in some cases may approximate the average genome size differences observed between species (Price et al, 1981, 1986; Sherwood and Patton, 1982; Gold and Price, 1985; Gold and Amemiya, 1987; Johnson et al, 1987; Ragland and Gold, 1989). A final reason is that little attention has been paid to the mechanisms by which DNA might be gained or lost from a genome.
The observations that species within cohesive groupings (eg, genera) often differ substantially in genome size and that interspecies genome sizes are frequently discontinuously distributed have led to the suggestion that genome size evolution may occur in a "quantized" fashion; ie, by a succession of large-scale changes (Narayan, 1982; Cavalier-Smith, 1985b). Subsumed within this problem is the question of whether genome size changes might be occurring disproportionately during speciation episodes. Several authors (Hinegardner, 1976; Morescalchi, 1977; Cavalier-Smith, 1978) have suggested that genome size change might be associated with speciation, although a direct correlation between genome size change and speciation has not been tested critically.
In the following, data on intra- and interspecific genome size variation among 49 species of North American cyprinid fishes are presented. The genome size data from 29 of the species are given for the first time. The subjects of primary interest in the paper are: (i) the pattern and magnitude of genome size variation within populations and among species, and (ii) the question of whether genome size changes are concentrated in speciation episodes.

MATERIAL AND METHODS
The collection localities of samples representing the 29 North American cyprinid species whose genome sizes are reported here are given in the Appendix, Table A1. All fish were collected by seine from natural populations. Fish sampled from Texas (TX) and Louisiana (LA) were returned live to our laboratory in College Station for processing; fish sampled from Oklahoma (OK) and Alabama (AL) were processed in facilities at the Oklahoma University Biological Station on Lake Texoma and at Samford University in Birmingham, AL, respectively. Except for Notropis lepidus, the samples of each species comprised 5 individuals taken from the same locality. The N. lepidus sample comprised 10 individuals from the same locality. Collection localities for the 20 other North American cyprinid species included in the data analyses in this paper may be found in Gold and Amemiya (1987). In that study, the samples of each species comprised 10 individuals taken from the same locality.
Genome sizes were measured via scanning microdensitometry of Feulgen-stained erythrocyte nuclei using chicken blood as an internal control. The latter was obtained from a highly inbred, pathogen-free strain available from the Texas A & M College of Veterinary Medicine. Full details of slide preparation, staining and microdensitometry may be found in Gold and Price (1985) and Gold and Amemiya (1987). Fifteen erythrocyte nuclei were measured from each of 2 slides per fish (=30 nuclei/individual) and standardized as a percent of the mean absorbancy of 10 chicken erythrocyte nuclei on the same slide. Standardized absorbancy values of fish nuclei were coded (for convenience) by multiplying the percent chicken standard (for each fish nucleus) by 20. Statistical analyses of the data were carried out using either SAS (1982) or our own programs on the Texas A & M mainframe computer.
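The standardization and coding steps described above can be sketched as follows. This is a minimal illustration, not the authors' program. The function names and absorbance readings are hypothetical. Two assumptions are made: the "percent chicken standard" is treated as a proportion of the chicken mean, and the chicken standard is taken to contain 2.5 pg of DNA. Under these assumptions one coded unit equals 0.125 pg, which agrees with the conversions reported in the Results (eg, 0.388 coded units, approximately 0.048 pg).

```python
from statistics import mean

CODING_FACTOR = 20   # from the text: standardized values were multiplied by 20
CHICKEN_PG = 2.5     # ASSUMPTION: DNA content (pg) of the chicken standard


def coded_values(fish_absorbances, chicken_absorbances):
    """Express each fish nucleus relative to the mean chicken nucleus
    measured on the same slide, then apply the coding factor."""
    standard = mean(chicken_absorbances)
    return [CODING_FACTOR * a / standard for a in fish_absorbances]


def coded_to_pg(coded):
    """Convert a coded value (or coded distance) to picograms of DNA."""
    return coded * CHICKEN_PG / CODING_FACTOR


# Hypothetical absorbance readings (arbitrary units) from one slide.
chicken = [50.0, 51.0, 49.0, 50.0]
fish = [45.0, 46.0, 44.0]

codes = coded_values(fish, chicken)
print([round(c, 2) for c in codes])        # coded values per nucleus
print(round(coded_to_pg(mean(codes)), 3))  # mean DNA content in pg
```

Keeping the conversion in a separate function makes the assumed value of the standard easy to change without touching the coded data.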
Means, standard errors and ranges for the 29 species were taken from the distribution of DNA values of individuals within each species. Distribution normality indices (g1 and g2) were taken from the distribution of measurements (nuclei) within each species. Descriptive statistics of genome size variation within and among the 20 cyprinid species not reported here may be found in Gold and Amemiya (1987). The methodologies used to determine genome sizes of individuals in all 49 species were identical. The current classification of the 49 species is shown in the Appendix (Lee et al, 1980).
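The g1 (skewness) and g2 (kurtosis) indices mentioned above can be computed as in the sketch below. It assumes the k-statistic definitions found in standard biometry texts (g1 = k3 / k2^1.5, g2 = k4 / k2^2). The function name and the data are hypothetical, and the significance tests against the standard errors of g1 and g2 are omitted. A negative g2 indicates a platykurtic (flat) distribution and a positive g1 indicates skew towards higher values.

```python
def g_statistics(values):
    """Return (g1, g2), the sample skewness and kurtosis indices,
    computed from k-statistics. ASSUMPTION: textbook definitions."""
    n = len(values)
    if n < 4:
        raise ValueError("at least 4 observations are required")
    m = sum(values) / n
    # Sums of powers of deviations from the mean.
    s2 = sum((x - m) ** 2 for x in values)
    s3 = sum((x - m) ** 3 for x in values)
    s4 = sum((x - m) ** 4 for x in values)
    if s2 == 0:
        raise ValueError("all observations are identical")
    k2 = s2 / (n - 1)
    k3 = n * s3 / ((n - 1) * (n - 2))
    k4 = (n * (n + 1) * s4 - 3 * (n - 1) * s2 ** 2) / (
        (n - 1) * (n - 2) * (n - 3)
    )
    return k3 / k2 ** 1.5, k4 / k2 ** 2


# Hypothetical coded measurements: symmetric and flat.
g1, g2 = g_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
print(g1, g2)
```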

RESULTS
Descriptive statistics (means ± standard errors, ranges and the g1 and g2 indices of distribution normality) for the 29 species are given in Table I. Genome sizes ranged from 2.06 pg of DNA in Notropis callistius to 3.26 pg of DNA in Phenacobius catostomus, a difference of approximately 58%. The ranges of genome sizes within each of the 29 species varied in percent from 1.15 in Notropis bellus to 8.74 in Dionda episcopa, and averaged 4.11. Five of the 29 sampling distributions of measurements (nuclei) within each species were significantly non-normal. Of the 5, 3 were significantly platykurtic or flat, and 2 were significantly skewed towards higher DNA values.
Patterns and magnitude of genome size variation within populations of species
The coded absorbancy data from the 49 cyprinid species examined to date were organized into a number of different sampling distributions and each was tested for distribution normality using the g1 and g2 indices. The distributions tested included: (i) all measurements (nuclei) within each population (species) or sample (49 sampling distributions; N = 300 for populations where 10 individuals were examined and N = 150 for populations where 5 individuals were sampled); and (ii) a rankit distribution (Sokal and Rohlf, 1969) reflecting the distribution of DNA values of individuals within populations summed over all 49 populations. The latter was generated following eqn [1] in Gold and Amemiya (1987) in order to remove scaling effects due to individuals being drawn from different species. The results of the distribution normality tests are summarized in Table II. The majority of the distributions of measurements (nuclei) within populations were normal, although the incidence of non-normal distributions was higher than expected by chance at α = 0.05. The rankit distribution reflecting the distribution of DNA values of individuals within populations was significantly platykurtic, although the deviation appears slight (Fig 1).
Separate single classification analyses of variance (ANOVA) were used to test for significant heterogeneity of DNA values of individuals within each of the 49 populations (species) using the distribution of measurements (nuclei) of that species. All F-tests were significant at α = 0.05. A synopsis of the results of Duncan's multiple range test on each population is shown in Table III. The results demonstrate that significant differences in genome size occur among individuals within cyprinid populations and that, on average, approximately half of the individuals from any given population differ in DNA content.
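The single classification ANOVA described above partitions the variation among nuclei into a component among individuals and a component within individuals. The sketch below computes only the F ratio and its degrees of freedom. The function name and the measurements are hypothetical, and the comparison with a critical F value (and Duncan's multiple range test) is left out. With 30 nuclei from each of 5 fish, the degrees of freedom would be 4 and 145.

```python
def one_way_anova_f(groups):
    """groups: one list of nuclear measurements per individual.
    Returns (F, df_among, df_within) for a single classification ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Sum of squares among individuals (weighted by nuclei per individual).
    ss_among = sum(
        len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means)
    )
    # Sum of squares among nuclei within individuals.
    ss_within = sum(
        (x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g
    )
    df_among = k - 1
    df_within = n_total - k
    f_ratio = (ss_among / df_among) / (ss_within / df_within)
    return f_ratio, df_among, df_within


# Hypothetical coded values: 3 individuals, 3 nuclei each.
sample = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_ratio, df_among, df_within = one_way_anova_f(sample)
print(f_ratio, df_among, df_within)
```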
The magnitude of genome size variation within cyprinid populations was estimated as the average of the percent maximum variation between individuals within populations. These values ranged from 1.15% in Notropis bellus (Table I) …

Two approaches were used to examine the magnitude of genome size variation among the 49 species. The first was to carry out a nested analysis of variance … (Sneath and Sokal, 1973). The average genome size differences (distances) between individuals drawn from successive levels of evolutionary divergence are shown in Table V. Estimates of average genome size distances between species in subgenera of Notropis and between species in Notropis and in other genera were obtained from subsets of GSD_min values extracted from the 48 x 49 GSD_min distance matrix. The average genome size distance between species in subgenera of Notropis, for example, involved first computing the average genome size distance value for each subgenus based on all pairwise comparisons between species in that subgenus, and then averaging these values over all subgenera. The same method was used to estimate the average genome size distance between species in genera other than Notropis. The estimate for species in Notropis is simply the average of all pairwise comparisons among 29 of the 31 nominal Notropis species examined. Neither N atrocaudalis nor N stramineus was included in the latter estimate since the phylogenetic affinities of these 2 species may lie outside of Notropis (Mayden, 1989). For similar reasons, N rubellus and N baileyi were not included in the genome size distance estimate for the Notropis subgenus Hydrophlox (Mayden and Matson, 1988). The genus Pimephales was included in the genome size distance estimate for species within the genus Notropis since Pimephales is now believed to be closely related phylogenetically to certain lineages within Notropis (Cavender and Coburn, 1986).
The estimate for species in the family is the average of all pairwise comparisons among all 49 species examined.
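The two-step averaging used for the subgeneric and generic estimates can be sketched as below. The distance used here is simply the absolute difference between species means. That is a stand-in chosen for illustration and is not the GSD_min measure, whose definition is not reproduced in this section. The function names, group labels and values are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def mean_pairwise_distance(species_means):
    """Average distance over all pairwise comparisons between species.
    ASSUMPTION: distance = absolute difference of species means."""
    return mean(abs(a - b) for a, b in combinations(species_means, 2))


def mean_within_group_distance(groups):
    """First average within each group (eg, each subgenus), then average
    those values over groups. Groups with fewer than 2 species are skipped
    because they provide no pairwise comparison."""
    per_group = [
        mean_pairwise_distance(values)
        for values in groups.values()
        if len(values) >= 2
    ]
    return mean(per_group)


# Hypothetical coded species means for two hypothetical subgenera.
subgenera = {
    "subgenus_A": [20.0, 22.0, 23.0],
    "subgenus_B": [18.0, 19.0],
}
result = mean_within_group_distance(subgenera)
print(result)
```

Averaging per group first gives each subgenus equal weight, so a subgenus with many species does not dominate the estimate.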
As shown in Table V, individuals drawn at random from a population of the same cyprinid species will differ, on average, by 0.388 genome size distance units (approximately 0.048 pg of DNA), whereas any 2 individuals drawn at random from 2 different North American cyprinid species will differ, on average, by 2.322 genome size distance units (approximately 0.290 pg of DNA). This represents a 6-fold difference and strongly suggests that the majority of genome size divergence in North American cyprinids has occurred above the level of individuals within populations of species. Particularly noteworthy are the observations that (i) the degree of genome size divergence between species in the genus Notropis is approximately 5 times that between species in other cyprinid genera, and (ii) much of the divergence in Notropis has apparently occurred at the subgeneric rather than generic level. The most actively evolving Notropis subgenera in terms of genome size appear to be Cyprinella and Notropis, where the average genome size distance between species was estimated as 2.152 and 2.340 units, respectively. Since these are the 2 largest Notropis subgenera in terms of number of species, and since Notropis itself contains considerably more species than Campostoma, Nocomis or Phenacobius, the tentative implication of these data is that there may be a positive relationship between the number of species within a group or subgroup and divergence in genome size.

Genome size change and speciation
The findings that the majority of genome size variation in North American cyprinids appears to occur at the species level or above, and that a relationship may exist between the number of species within cyprinid groups or subgroups and divergence in genome size, suggest that genome size changes in cyprinids may be concentrated in speciation episodes. Avise and Ayala (1975, 1976) and Avise (1978) developed models which contrast expected means and variances of genetic differences or distances among extant members of rapidly versus slowly speciating lineages or phylads, and which may be used to assess whether genetic differentiation is correlated with speciation. Briefly, if genetic differentiation is essentially a function of time (gradual evolution), the ratio of mean genetic distances between species-rich versus species-poor phylads should be approximately 1, and the ratio of variances should be less than 1. Alternatively, if genetic differentiation is proportional to the number of speciation episodes (punctuated evolution), the ratio of distances should be greater than 1, and the ratio of variances should be much greater than 1. There are several assumptions inherent in using the models, the most important of which is that the species-rich and species-poor lineages under comparison be of approximately equal evolutionary age (Avise and Ayala, 1975; Avise, 1978).
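The comparison prescribed by these models reduces to two ratios, which the sketch below computes for a species-rich and a species-poor group. It is an illustration under stated assumptions rather than the procedure used for Table VI: the distance is the absolute difference between species values (not GSD_min), the variance is the sample variance of the pairwise distances, and the function names and data are hypothetical. No numerical cut-off for "approximately 1" or "much greater than 1" is imposed, since none is given in the text.

```python
from itertools import combinations
from statistics import mean, variance


def distance_summary(species_values):
    """Mean and sample variance of all pairwise distances in one phylad.
    ASSUMPTION: distance = absolute difference of species values."""
    distances = [abs(a - b) for a, b in combinations(species_values, 2)]
    return mean(distances), variance(distances)


def phylad_ratios(species_rich, species_poor):
    """Ratios (rich / poor) of mean distance and of variance of distance."""
    d_rich, v_rich = distance_summary(species_rich)
    d_poor, v_poor = distance_summary(species_poor)
    return d_rich / d_poor, v_rich / v_poor


# Hypothetical coded genome sizes for two hypothetical phylads.
rich = [16.0, 18.0, 21.0, 25.0]
poor = [20.0, 20.5, 21.0]

ratio_d, ratio_v = phylad_ratios(rich, poor)
print(round(ratio_d, 2), round(ratio_v, 2))
```

Under the models, ratios like those from this made-up example (both well above 1) would point towards differentiation proportional to the number of speciation episodes, provided the two phylads are of similar evolutionary age.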
In Table VI, the mean (d) and variance (s²) of average genome size differences (distances) among 32 Notropis species (including the 3 species of Pimephales) are compared with comparable values from 8 species of the centrarchid (sunfish) genus Lepomis. The distance and variance values were generated as before (ie, extracted from the 48 x 49 GSD_min cyprinid data matrix, and from a similar Lepomis data matrix described in Ragland and Gold, 1989). For reasons noted previously, the 3 species of Pimephales were included in the estimates for Notropis, whereas N atrocaudalis and N stramineus were not. For similar reasons (GV Lauder, personal communication), Lepomis gulosus was not included in the calculations of d and s² values for the genus Lepomis. As shown in Table VI, the ratio of mean distances is greater than 1, and the ratio of variances is very much greater than 1. According to the models, these results indicate that changes in genome size in cyprinids are correlated with speciation episodes. In Table VII, the ratios of mean distances and variances based on genome size are compared with the ratios observed for the same Notropis versus Lepomis comparison using data from protein electrophoresis and morphological measurements. Taken at face value, the latter ratios suggest that differentiation in structural genes and morphology occurs primarily as a function of elapsed time.

DISCUSSION
The normality (or near normality) of genome size distributions within populations of cyprinids strongly suggests that DNA quantity changes at this level are small, involve both gains and losses of DNA, and are cumulative and independent in effect. This hypothesis is based on the assumption that the variation follows the premises of the normal probability density function (Sokal and Rohlf, 1969). An identical pattern of variation also occurs among populations of 9 species of the North American centrarchid genus Lepomis (Ragland and Gold, 1989). Of importance is that no instance of a quantum or "quantized" (Cavalier-Smith, 1985b) difference in genome size among individuals has been found in the nearly 60 populations of cyprinids or centrarchids thus far studied. Comparable data from other organisms on genome size variation among several individuals within populations are few, and are limited primarily to the extensive research by Price and colleagues on the plant Microseris douglasii (Price et al, 1981, 1986). In M douglasii, genome size variation is also continuous with no evident, large-scale differences in genome size occurring among individuals within populations. There was an apparent tendency towards platykurtosis in a few of the cyprinid population genome size distributions, including the rankit distribution, which reflects the normalized variation of DNA values of individuals. Most of the deviations from normality, however, were slight and, in the case of the rankit values, the distribution only became platykurtic upon the addition of the 28 populations (species) reported in this paper, where sample sizes were restricted to only 5 individuals per population. This suggests that the observed platykurtosis may be a function of non-random sampling, since typically most individuals were collected in only 1 or 2 seine-hauls and could represent close relatives (eg, full-sibs) rather than individuals drawn at random from the population.
The proportion of DNA that is apparently free to vary quantitatively within cyprinid populations appears to be between 4 and 5% of the genome, as estimated from the average maximum genome size variation among all 49 populations surveyed. This quantity is approximately the same as that theoretically needed for the cyprinid structural gene component if one assumes the latter contains 50 000 coding nuclear genes per genome and there are 1500 coding DNA base pairs per gene. It seems unlikely, however, that coding structural genes would be regularly gained or lost from a genome without eventually resulting in a phenotypic disturbance or developmental irregularity. This suggests that up to 90% of the cyprinid genome is maintained quantitatively even though no specific functions are known for this DNA. As noted previously (Gold and Price, 1985), both the normality of distributions within cyprinid populations and the apparent constraints on the quantity of DNA which can vary strongly imply the action of stabilizing or normalizing selection operating through the truncation of deleterious extremes (Stebbins, 1966; Mettler and Gregg, 1969). However, while natural selection may be influencing genome size variation within cyprinid populations, there is no evidence at present to indicate that selection favours a particular cyprinid species DNA value relative to some organismal parameter (Gold and Amemiya, 1987).

Two suggestions to account for interspecies genome size differences are the selfish DNA hypothesis (Doolittle and Sapienza, 1980; Orgel and Crick, 1980) and the hypothesis that genome size changes might occur primarily during speciation episodes (Hinegardner, 1976; Morescalchi, 1977; Cavalier-Smith, 1978). The basis for the former is that most eukaryotic genomes contain DNA sequences that can increase in copy number through differential replication.
Presumably, these sequences are phenotypically inconsequential, at least to the point where the energy expended in replicating such DNA begins to infringe on the energy needs of the organism (Doolittle and Sapienza, 1980). In a very general way, the cyprinid genome size data are not inconsistent with the selfish DNA hypothesis in that: (i) there is significant variation in genome size within cyprinid populations which presumably is phenotypically inconsequential; (ii) species DNA values appear to be more or less randomly distributed within the variation which occurs; and (iii) individuals at the high end of the genome size distribution appear to be removed by negative selection. Alternatively, one might predict that if selfish DNAs contribute significantly to genome size variation, the underlying distributions of DNA values should not be normal. Species or populations where selfish DNAs are proliferating should show distributions skewed towards higher values; whereas, species or populations where selfish DNAs have accumulated to the point of impairing energy needs should show distributions skewed towards lower values. The genome size distributions in most cyprinid populations, however, are normal, and there appears to be no general tendency towards skewness in either direction.
The comparison of the means and variances of genome size distance between the cyprinid genus Notropis (species-rich phylad) and the centrarchid genus Lepomis (species-poor phylad) suggests that considerable genome size change may occur during or be associated with cyprinid speciation episodes. Such a hypothesis is not contradicted by the findings that: (i) genome size variation within cyprinid populations is generally less than that among cyprinid species; (ii) cyprinid species genome sizes appear to be continuously and more or less randomly distributed within the variation which occurs; and (iii) there are no apparent associations between species genome sizes and various life-history characteristics (Gold and Price, 1985; Gold and Amemiya, 1987; this paper). A point to note, however, is that the evidence is essentially correlative and it would be difficult to determine experimentally whether the correlation was one of cause and effect or one of association. Moreover, intraspecific variation in genome size in both cyprinids and centrarchids can often be as great as the differences among species (Gold and Amemiya, 1987; Ragland and Gold, 1989; this paper). This raises some doubt as to the strength or validity of the apparent correlation between genome size differentiation and speciation since, as noted by Ragland and Gold (1989), the generally lower intraspecific variation observed could stem from the homogenizing effects of gene flow within species. On the other hand, the finding that ratios of mean genome size distance and variance in the Notropis versus Lepomis comparison differ markedly from those reported for structural genes and morphology suggests that different levels of the genome may follow independent evolutionary paths.
The simplest explanation for the difference in distance and variance ratios is that genome size evolution is dependent, in part, on speciation episodes, whereas structural gene and morphological evolution are dependent primarily on elapsed time. This explanation is unquestionably oversimplified and is based on the assumptions that: (i) the models of Avise and Ayala (1975, 1976) and Avise (1978) are appropriate and sufficiently robust, and (ii) Notropis and Lepomis are appropriate taxa for comparison. Neither assumption is without caveats (Avise, 1977; Mayden, 1986), nor have the models been tested or used in any other organismal group outside of cyprinid and centrarchid fishes. Moreover, exactly how or why the difference might occur is somewhat problematic given the difficulty in studying speciation in statu nascendi, as well as the wide variety of speciation modes (White, 1978; Templeton, 1980) theoretically possible for any given speciation event. At this point, the conservative thesis is that genome size evolution may be decoupled from other levels of genome organization, and that genome size may, in fact, evolve in a "quantized" fashion as suggested by Cavalier-Smith (1985b).