Computer-assisted selection of restriction enzymes for rrs genes PCR-RFLP discrimination of rhizobial species

Restriction fragment length polymorphism analysis of amplified DNA (PCR-RFLP) offers many advantages for the characterisation of microbial diversity: it is relatively fast and inexpensive, applicable to all bacterial taxa, reproducible, suited to the simultaneous analysis of large numbers of strains, and of taxonomic interest. To optimise its use, we sought to determine how many enzymes, and which ones, are the most appropriate, by carrying out a systematic computer analysis of the ability of 25 different restriction enzymes to generate by PCR-RFLP, singly or in combination, digestion patterns allowing the distinction of taxonomically close species. The 16S subunit of ribosomal RNA (rrs, for ribosomal RNA small subunit) was chosen to simulate these PCR-RFLPs: it is the most frequently studied region and generally allows good discrimination at the species level; in addition, many rrs sequences are available in databases. The study focused on rhizobia, which represent a good model of different genera interleaved with other taxonomically very close genera, and whose importance in agriculture creates a need for efficient tools to characterise large numbers of new isolates. The digestion of 24 rrs sequences corresponding to the type strains of different species was simulated by computer. The digestion patterns obtained for the different sequences were compared pairwise to determine the number of distinct patterns generated by each enzyme individually, or by the systematic combinations of 2, 3, 4, 5, 6 or 7 enzymes. Different combinations of 3, 4 or 5 enzymes allow the species to be distinguished. The frequency at which the different enzymes are found in the discriminating combinations does not correspond entirely to their individual discriminating power, but rather to the complementarity they show when analysed together. Combining the patterns obtained with six enzymes, chosen from the seven most frequent in the discriminating combinations, makes it possible to distinguish on average more than 99.5 % of the sequences studied, whatever the similarity between these sequences. The use of at least six of these seven enzymes is suggested to ensure good distinction of known and unknown species of rhizobia.

A recurrent question concerning PCR-RFLP is how to determine which and how many restriction enzymes must be used for the analysis. The most appropriate enzymes have frequently been selected by empirical analysis of experimental results (and often by taking into account commercial availability and price). This kind of approach is appropriate to distinguish between a small number of known species. However, as the number of examined species increases, it rapidly becomes a tedious and difficult task. In contrast to other PCR-based DNA-typing methods such as RAPD (randomly amplified polymorphic DNA; Williams et al., 1990), AFLP (amplified fragment length polymorphism; Zabeau and Vos, 1993), use of primers derived from repetitive extragenic palindromic (REP) and enterobacterial repetitive intergenic consensus (ERIC) sequences (Versalovic et al., 1991) or transfer ribonucleic acid (tRNA) gene fragments (McClelland et al., 1992), PCR-RFLP offers the possibility, often underused, of checking experimental results against DNA sequencing data. Moyer et al. (1996) proposed using this advantage to estimate the efficacy of ten commonly used tetrameric restriction enzymes for studies of microbial diversity in nature, by computer-simulated restriction enzyme digestion of known sequences and subsequent analysis of the size-frequency distribution of the resulting fragments. In this study we developed a different computer-assisted approach to estimate the ability of 25 different commercially available tetrameric restriction enzymes to generate, singly or in combination, discriminating restriction patterns allowing the distinction of closely related species. The final aim was to propose a screening approach for the selection of the enzymes, and to predict the number of enzymes required to ensure differentiation of known and unknown species.
We focused on the 16S rRNA region (SSU rRNA, for small subunit of ribosomal RNA or rrs) because it is the most frequently used bacterial gene for PCR-RFLP, and because it allows, in general, good discrimination at the species level. Furthermore, the large number of rrs sequences available in databases makes it possible to simulate restriction patterns for a wide range of microorganisms.
Rhizobia were chosen as target microorganisms because they represent a good model of different genera interleaved with various other closely related genera, such as Agrobacterium and Phyllobacterium (Jarvis et al., 1988; Willems and Collins, 1993; Yanagi and Yamasato, 1993; Young and Haukka, 1996), and because their importance in agricultural and environmental studies creates a need for powerful tools for efficient screening of large numbers of new isolates.
Sequences were aligned using the Clustal program (Higgins and Sharp, 1988).
Uncertainties were resolved by comparing with the sequences of the other species, and, when available, with sequences of strains of the same species. PCR amplification with primers FGPS5-255 (5'-ATGGAAAGCTTGATCCTGGCT-3') and FGPS1509'-153 (5'-AAGGAGGGGATCCAGCCGCA-3') (Normand, 1995) was simulated by ending all the rrs sequences with primer sequences.
These two primers were selected because they experimentally allow successful amplification of almost full length rrs genes of all rhizobial strains tested (data not shown).

Restriction patterns simulation
For all the selected sequences, the size of fragments obtained after digestion with each of the following 25 tetrameric restriction endonucleases was determined: AccII, AciI, AluI, BfaI, BsaJI, BslI, Cac8I, DdeI, DsaV, Fnu4HI, HaeIII, HhaI, HinfI, HpaII, MaeII, MaeIII, MboI, MseI, MwoI, NlaIII, NlaIV, RsaI, Sau96I, TaqI, Tsp509I (recognition sites and isoschizomers are given in table I). The patterns obtained for the different species were compared pairwise to determine the distinct restriction patterns generated by each enzyme. To simulate experimental conditions, and because of the resolution threshold of standard 3 % w/v agarose gel electrophoresis, only fragments larger than 90 bp were taken into account in the analysis. Furthermore, fragments differing in size by less than 5 % were considered similar.
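The simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used in the study: the function names are hypothetical, the recognition sites and cut positions are the standard ones for three of the enzymes (all three palindromic, so a single-strand search suffices), the toy sequence is invented, and the 5 % tolerance is assumed to be measured relative to the larger of the two fragments being compared.

```python
import re

# IUPAC degenerate bases translated to regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]"}

# Recognition site and top-strand cut offset for three palindromic enzymes
# (HinfI G^ANTC, RsaI GT^AC, HpaII C^CGG). Non-palindromic sites would also
# need a search on the reverse complement, which is not done here.
ENZYMES = {"HinfI": ("GANTC", 1), "RsaI": ("GTAC", 2), "HpaII": ("CCGG", 1)}

def digest(seq, enzyme):
    """Fragment lengths after complete digestion of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    # A lookahead lets overlapping occurrences of the site be found.
    regex = "(?=" + "".join(IUPAC[base] for base in site) + ")"
    cuts = [m.start() + offset for m in re.finditer(regex, seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

def measurable(fragments, min_bp=90):
    """Keep only fragments larger than the gel threshold, longest first."""
    return sorted((f for f in fragments if f > min_bp), reverse=True)

def same_pattern(p, q, tol=0.05):
    """Assumption: two patterns are similar when they have the same number
    of measurable fragments and each ranked pair differs by less than tol,
    relative to the larger fragment of the pair."""
    return len(p) == len(q) and all(
        abs(a - b) < tol * max(a, b) for a, b in zip(p, q))

# Invented toy sequence with two RsaI sites.
toy = "A" * 100 + "GTAC" + "A" * 200 + "GTAC" + "A" * 50
print(digest(toy, "RsaI"))              # [102, 204, 52]
print(measurable(digest(toy, "RsaI")))  # [204, 102]
```

In this toy case the 52 bp fragment is dropped by the 90 bp threshold, so only two bands would be scored for the pattern.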

Enzyme combinations analysis
The cumulative patterns generated by multiple combinations of different sets of enzymes were determined for each of the species. For each combination, the cumulative patterns of the different species were compared pairwise in order to evaluate the number of distinct cumulative patterns. The number of discriminating combinations (i.e. combinations that differentiate all the species) was scored for each set of combinations; the frequency of occurrence of each enzyme in the discriminating combinations was calculated and expressed as the ratio of occurrence of the enzyme to the total number of discriminating combinations for a given set. Because of the great number of combinations to be tested, we developed a simple analysis using HyperTalk language on Macintosh computers.
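The scoring procedure described above can be illustrated with a minimal Python sketch. It is not the HyperTalk program used in the study. It assumes that the pattern produced by each enzyme on each species has already been reduced to a label (such as "a" or "b"), and the enzyme names E1 to E3, the species names and the function names are all hypothetical.

```python
from itertools import combinations

def count_distinct(patterns, enzymes, species):
    """Number of distinct cumulative patterns for one set of enzymes.
    A cumulative pattern is the tuple of single-enzyme pattern labels."""
    return len({tuple(patterns[e][s] for e in enzymes) for s in species})

def discriminating_combinations(patterns, n):
    """All combinations of n enzymes that differentiate every species."""
    species = sorted(next(iter(patterns.values())))
    return [combo for combo in combinations(sorted(patterns), n)
            if count_distinct(patterns, combo, species) == len(species)]

def discriminatory_index(patterns, n):
    """Occurrences of each enzyme in the discriminating combinations of
    size n, divided by the number of discriminating combinations."""
    hits = discriminating_combinations(patterns, n)
    if not hits:
        return {}
    return {e: sum(e in combo for combo in hits) / len(hits)
            for e in sorted(patterns)}

# Hypothetical pattern labels: each enzyme alone yields two patterns.
TOY = {
    "E1": {"sp1": "a", "sp2": "a", "sp3": "b", "sp4": "b"},
    "E2": {"sp1": "a", "sp2": "b", "sp3": "a", "sp4": "b"},
    "E3": {"sp1": "a", "sp2": "a", "sp3": "a", "sp4": "b"},
}
print(discriminating_combinations(TOY, 2))  # [('E1', 'E2')]
print(discriminatory_index(TOY, 2))  # {'E1': 1.0, 'E2': 1.0, 'E3': 0.0}
```

In this invented data set the three enzymes are equally informative when used singly, yet only the pair E1 and E2 separates all four species, so E3 receives an index of zero.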

RESULTS
All 25 enzymes tested were theoretically able to cut each of the selected rrs sequences at least once. They produced between two (HinfI on A. tumefaciens and R. etli; MnlI on B. elkanii and B. japonicum) and 16 restriction fragments (BsaJI on M. dimorpha). The average number of measurable fragments (> 90 bp) for the rhizobia sensu stricto ranged from 2.5 to 7.6 per enzyme, with an overall average of 5.2 (table I). These theoretical patterns were confirmed in vitro for some of the enzymes and species (data not shown). Only one pattern (obtained by digestion with RsaI of the amplified rrs gene of S. saheli type strain ORS 609) was not as expected and yielded three bands instead of the four predicted by computer simulation; comparison of aligned sequences showed an error in the published sequence that introduced a false additional restriction site. It was corrected for further analysis.
From three to 12 distinct restriction patterns were detected with the 25 enzymes for the 17 rrs sequences of the rhizobia sensu stricto, and from four to 17 for the global set of 24 sequences (table I). The number of patterns was not related to the number of fragments. For example, a mean number of measurable fragments of 3.6 was sufficient to obtain 12 distinct patterns with BslI, whereas only three patterns were found for a mean number of fragments of 5.7 with NlaIII. None of the enzymes alone was able to distinguish between all the sequences of the set. Decreasing the limit of detection of the fragments (i.e. from 90 to 80 bp) did not significantly modify this analysis (data not shown). The different patterns obtained for each enzyme and each of the 24 sequences are listed in table II.
The discriminating power of multiple combinations of the 25 enzymes was estimated for the rhizobia sensu stricto and for the global set of sequences. The total number of possible combinations of n enzymes chosen from a total of N available enzymes is equal to N!/[(N - n)! n!]. Therefore, for 25 available enzymes the numbers of combinations of two, three, four and five enzymes are 300, 2 300, 12 650 and 53 130, respectively. The average number of distinct patterns per combination for the 17 sequences of rhizobia alone was 11.2, 12.9, 13.9 and 14.7 for the combinations of two, three, four and five enzymes, respectively, and 16.2, 19.0, 20.5 and 21.6 for the global set of 24 sequences. The efficiency of the different enzyme combinations in distinguishing between the sequences (expressed as the average percent of sequences detected) is presented in figure 1.
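The combination counts quoted above follow directly from the binomial formula and can be checked with Python's standard library. The snippet is only a verification aid; the variable names are arbitrary.

```python
from math import comb

N = 25  # number of enzymes screened

# Number of combinations of n enzymes chosen among N: N! / ((N - n)! n!)
counts = {n: comb(N, n) for n in (2, 3, 4, 5)}
print(counts)                # {2: 300, 3: 2300, 12650 if False else 3: 2300}.__class__ and None or counts

# Total number of enzyme sets of size two to five to be compared.
print(sum(counts.values()))  # 68380
```

With more than 68 000 enzyme sets to evaluate, each requiring pairwise comparison of the cumulative patterns of all the species, an automated analysis is clearly needed.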
Only one combination of two enzymes (BslI and Fnu4HI) distinguished between all the sequences of rhizobia. The same combination was also sufficient to distinguish between all the sequences of the global set. The number of discriminating combinations of three, four and five enzymes was nearly the same for rhizobia alone and for the global set of sequences (table III). The discriminatory index (i.e. the frequency at which an enzyme is found in a discriminating combination) was estimated for each enzyme and for each set of combinations. The result for the combinations of five enzymes is shown in table I. The discriminatory indices were practically identical for the two sets of sequences, and ranged from 0.11 to 0.59. As for single enzyme patterns, the discriminatory index of enzymes was not related to the number of fragments. On the other hand, the enzymes producing the best discriminatory indices in multiple combinations were also, as a whole, the ones showing the larger number of distinct patterns when used singly (table I). For the ten enzymes that were the best when used in combination to distinguish between sequences, the mean number of individual distinct patterns was 10, whereas for the ten least discriminatory enzymes the mean was only 6.5. However, the discriminatory index was not entirely correlated with the number of distinct patterns: for example, Fnu4HI, the best enzyme in combination, allowed the detection of only nine distinct patterns among rhizobia species when used singly, whereas HpaII had a discriminatory index almost three times lower, in spite of its ability to distinguish 12 patterns alone.
The seven enzymes that gave the best results (Fnu4HI, BslI, Cac8I, AciI, HinfI, Sau96I and HpaII) were selected for further analysis. Their ability to distinguish at the species level was studied by comparing the patterns generated with these enzymes when used either singly or in combination.
All the possible combinations of two, three, four, five, six and seven enzymes were tested on the two previously studied sets of sequences (rhizobia sensu stricto and the global set of 24 sequences): the percentage of differing sequences detected ranged from 60 when one enzyme was used to 100 when seven enzymes were used in combination (figure 1).
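The averages plotted against the number of enzymes can be computed along the following lines. The text does not give the exact definition of the percentage of differing sequences, so this Python sketch assumes it is the number of distinct cumulative patterns divided by the number of sequences, averaged over every combination of a given size. The pattern labels, enzyme names E1 to E3 and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def percent_distinct(patterns, enzymes):
    """Assumed definition: distinct cumulative patterns as a percentage
    of the number of sequences, for one set of enzymes."""
    species = list(next(iter(patterns.values())))
    distinct = {tuple(patterns[e][s] for e in enzymes) for s in species}
    return 100.0 * len(distinct) / len(species)

def average_percent_by_size(patterns):
    """Mean percentage over all combinations of 1, 2, ... enzymes."""
    names = sorted(patterns)
    return {n: mean(percent_distinct(patterns, combo)
                    for combo in combinations(names, n))
            for n in range(1, len(names) + 1)}

# Hypothetical pattern labels for three enzymes and four sequences.
TOY_PATTERNS = {
    "E1": {"sp1": "a", "sp2": "a", "sp3": "b", "sp4": "b"},
    "E2": {"sp1": "a", "sp2": "b", "sp3": "a", "sp4": "b"},
    "E3": {"sp1": "a", "sp2": "a", "sp3": "a", "sp4": "b"},
}
result = average_percent_by_size(TOY_PATTERNS)
print(result[1], result[3])  # 50.0 100.0
```

Applied to the seven selected enzymes, the same loop would cover all 127 non-empty subsets.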
The different combinations of the seven selected enzymes were also applied independently to the sequences corresponding to the three genera of fast-growing rhizobia (Rhizobium, Mesorhizobium and Sinorhizobium). The results for the Mesorhizobium set were very similar to those obtained with the global sets of sequences (figure 1). The average percentages of differing sequences calculated for the Rhizobium species were clearly higher, from 80 % with one enzyme up to 98.9 % for the combination of only three enzymes. In contrast, more enzymes were necessary to distinguish between the Sinorhizobium sequences.

DISCUSSION
Rhizobia classification is constantly and rapidly changing. Since the separation of the fast- and slow-growing rhizobial species into two different genera, Rhizobium and Bradyrhizobium (Jordan, 1982), three new genera have been recognised among root- and stem-nodulating bacteria of legumes: Azorhizobium (Dreyfus et al., 1988), Sinorhizobium (de Lajudie et al., 1994) and Mesorhizobium (Lindström et al., 1995). Seventeen different species are currently recognised for the five genera (Young and Haukka, 1996), but this number increases continuously as new species are discovered and described using the classic tools of DNA/DNA hybridisation and rrs sequencing. Furthermore, various bacteria have rrs sequences that are very similar to those of fast-growing rhizobia. Agrobacterium and Phyllobacterium are two genera of bacteria of the Rhizobiaceae, generally inducing cortical hypertrophies (galls or nodules) on plants (Jordan, 1994). Blastobacter aggregatus (Hirsch and Müller, 1985) and Mycoplana dimorpha (Gray and Thornton, 1928) are two species isolated from lake water and soil, respectively. Because these bacteria (Agrobacterium especially) can be isolated from the same habitats as rhizobia, it is important to be able to differentiate them from rhizobia.
Our computer-assisted simulation confirmed that PCR-RFLP of the rrs gene is an appropriate tool to distinguish between species of rhizobia and related bacteria, as previously demonstrated by Laguerre et al. (1994a), who also showed the validity of this method in estimating the genetic relationship between species. However, as a consequence of the increasing number of recognised species, discrimination between all the named species of rhizobia is becoming increasingly difficult to ensure with a single set of a few enzymes. For example, the minimum of four enzymes (CfoI, HinfI, MspI and NdeII) found by Laguerre et al. (1994a) to be sufficient to discriminate between all the species recognised at that time can no longer separate S. meliloti from the newly described species S. medicae (Rome et al., 1996). To overcome this problem, one may be tempted to screen more enzymes in order to find a new discriminating combination. When looking at 25 different tetrameric enzymes, various combinations of three, four or five enzymes can be selected to allow complete discrimination of all the species studied (table III). It is even possible to find two enzymes (BslI and Fnu4HI) that can, in combination, distinguish between all the sequences. Therefore, for a more accurate choice it is important to sort the enzymes that are the most effective in combination. Looking at the frequency at which the enzymes are found in a discriminating combination allows them to be sorted according to their real efficacy in combination. The classification we obtained with the discriminatory indices was not directly related to the number of distinct patterns generated by each enzyme. This may be due to a complementary discriminatory power of the best enzymes when the results of each of them are considered together, and to a possible superfluous effect for the less effective enzymes. Moyer et al. (1996) also considered the possibility of a 'synergistic' effect between enzymes to explain the better efficiency of one of the restriction enzyme groups they studied.
The average percentage of sequences that can be distinguished by combination of the different enzymes enables prediction of the number of enzymes required to ensure the detection of a large part of the existing species. The average percentage of sequences detected by the seven selected enzymes used singly differs considerably according to the set of sequences examined (figure 1). This can be explained by the level of similarity of the sequences within each set. The five sequences of Sinorhizobium studied are very close, and so are more difficult to distinguish.
On the other hand, the percentage of Rhizobium sequences detected is high, and can be explained by the fact that species which are not closely related are still gathered in this genus: Rhizobium galegae is not a typical member of the genus Rhizobium and its rrs sequence is close to the sequence of Agrobacterium species (Young and Haukka, 1996); even more important is the presence of an insertion of 72 nucleotides in the rrs of Rhizobium tropici IIA (Willems and Collins, 1993), which results in the fact that restriction patterns for this sequence are always different from the others. Also interesting is that, whatever the differences for one enzyme, the average percentage of sequences detected always exceeds 99.5 % when combinations of six of the seven selected enzymes are used (figure 1). This suggests that six selected enzymes will be sufficient to ensure good separation between rhizobia species.
Although computer simulation of enzymatic digestion of rrs may underestimate technical problems such as the partial digestion of PCR products with some enzymes, or the existence in the same bacterium of different rrs copies that are not necessarily identical, theoretical testing of the discriminatory power of the different restriction enzymes appears to be a good way of optimising PCR-RFLP analysis. It allows the choice of the best enzymes to discriminate, in combination, between the different species of rhizobia. It shows that these best enzymes are not necessarily those that are the most discriminating when used singly, but that there is a complementary discriminative ability when the results of digestion with these enzymes are analysed in combination. It also indicates that the use of six enzymes chosen from the most discriminating ones ensures the distinction of isolates at the species level. This approach could be used on other bacterial taxa.