A first generation bovine BAC-based physical map

A first generation clone-based physical map for the bovine genome was constructed by combining fluorescent double digestion fingerprinting and sequence tagged site (STS) marker screening. The BAC clones were selected from an Inra BAC library (105 984 clones) and a part of the CHORI-240 BAC library (26 500 clones). The contigs were anchored using the screening information for a total of 1303 markers (451 microsatellites, 471 genes, 127 EST, and 254 BAC ends). The final map, which consists of 6615 contigs assembled from 100 923 clones, will be a valuable tool for genomic research in ruminants, including targeted marker production, positional cloning or targeted sequencing of regions of specific interest.


INTRODUCTION
The detection of loci affecting economically important traits represents a major objective in livestock genomics. It should ultimately lead to more efficient breeding schemes (marker-assisted selection or MAS) and improve the accuracy and intensity of selection programs [18,22]. In this perspective, genetic maps constructed in various livestock species [3,12,26,29,46] are sufficient to detect regions containing genes and QTL. The identification and [...] In this paper, we report the assembly of a first generation clone-based physical map for the bovine genome, combining fluorescent double digestion fingerprinting and STS marker screening. This map represents the first report of genome-wide physical mapping using the fluorescent double digestion fingerprinting method.

BAC DNA preparation
BAC clones were grown in 2 mL 96-well blocks for 18 h 30 min at 37 °C with shaking, using 900 µL 2YT containing 12.5 µg·mL⁻¹ chloramphenicol. Bacteria were pelleted by centrifugation at 3200 × g for 8 min. The blocks were inverted to discard the supernatant and stored at −20 °C for one week until DNA preparation.
DNA preparation was performed using a modified alkaline lysis procedure. Cell pellets were resuspended by addition of 100 µL of TGE + 30 µg·mL⁻¹ RNAse and vigorous vortexing. Lysis was achieved by a 3 min incubation step after addition of 100 µL of NaOH 0.2 M + SDS 1%. One hundred microliters of ice-cold potassium acetate (1.32 M, pH 4.8) were then added to each well, followed by 100 µL of lithium chloride (8 M). After 20 min at −20 °C, the lysis products were purified using Millipore filter plates (Multiscreen MANLY). The samples were then precipitated with 250 µL of isopropanol and centrifuged at 3200 × g for 30 min. DNA pellets were washed with 250 µL of 70% ethanol per well and centrifuged for 10 min. The blocks were then inverted on paper towel to drain excess ethanol from the pellet and placed in a vacuum hybridization oven for 5 min to dry the DNA. Resuspension was achieved by addition of 10 µL of 10 mM Tris-HCl (pH 8.0) + 10 mg·mL⁻¹ RNAse and vortexing after a 30 min incubation step at 37 °C.
The DNA concentration of 24 samples was estimated by fluorimetry for each plate. The plates were then calibrated at a mean concentration of 75 ng·µL⁻¹ by addition of 10 mM Tris-HCl.

Fingerprinting
Restriction enzyme digestions and dye labeling were carried out simultaneously in a 10 µL reaction volume using 300−400 ng of BAC DNA, 2 units HindIII (Promega), 3 units HaeIII (Promega), 0.5 units Sequenase II (Amersham Pharmacia Biotech) and 3 pmoles ddA (R110, R6G or TAMRA from NEN) in restriction buffer C (Promega). The reactions were carried out for 90 min at 37 °C. In order to normalize signal intensities, R6G-labeled samples were diluted twofold by adding 10 µL of sterile water.

Sample pooling and reaction cleanup
Multiscreen filter plates (Millipore, MAHVN45), filled with 360 µL of preswollen Sephadex G50 (G50 superfine 80 g·L⁻¹, Amersham Pharmacia Biotech), were used to achieve dye removal and sample desalting. Samples (10 µL each) from three plates labeled with different dyes were then pooled on the center of G50 mini-columns using a Hydra 96 dispenser. After a five-minute incubation step, the samples were recovered by spinning down for 5 min at 910 × g and stored at −20 °C until injection.

Sample injection and capillary electrophoresis
Seven microliters of the purified fingerprinting reaction were transferred to the injection plate and three microliters of a size-standard mix (2.8 µL formamide + 0.2 µL ET 900 ROX standard) were added. The samples were denatured in a heater block for 3 min at 90 °C prior to injection in a MegaBace 1000 automated 96 capillary DNA sequencer. Injection parameters and run conditions were respectively 3 kV for 90 s and 10 kV for 100 min using dye set 2 filters.

Data analysis and cleanup
The runs were analyzed with the Genetic Profiler software developed to perform the genotyping analysis on the MegaBace. After spectral matrix correction and peak identification, this software creates a "pks" file for each capillary that contains the scan value, the peak height and the peak width for all peaks detected in each channel. Since the Genetic Profiler software takes all peaks into account, it was necessary to eliminate background and artifactual peaks before exporting data to an FPC formatted file. Therefore, we developed a Visual Basic program (available upon request to the author) taking a "pks" file as input and applying three filters. The first one removes all peaks with a signal height lower than 200 units or outside the range of the analysis (55 to 750 bp); after an iterative estimation of the mean and standard deviation of the most likely distribution of peak height, a second filter eliminates outlying bands, the height of which deviates by more than 2.5 standard deviations from the mean; the third filter removes peaks in the size range (+/-1 bp) of known artifacts (76 and 129 bp additional bands generated by FAM, 80 bp by TAMRA and 102 and 160 bp by R6G). This program rewrites "pks" files, so that modifications can be visualized by the Genetic Profiler. Good results were obtained, since about 70% of the samples were perfectly cleaned, except for data with low signals or when multiple clones (cross contaminations) were present in the same well. Therefore, to provide reliable data, all samples were manually checked and edited if necessary before exporting to FPC. As described in [13], we submitted fragment sizes as ×10 in order to deal with some limitations of FPC: we could thus set the TOL value at 5 to take into account the fact that the standard deviation of fragment size measurements was less than one base.
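The three filters can be sketched as follows in Python. This is a minimal illustration and not the Visual Basic program itself: the `Peak` record, the `filter_peaks` function and the simple trim-and-re-estimate loop used for the second filter are assumptions, since the text does not specify how the iterative estimation was carried out. Only the thresholds (200 units, 55 to 750 bp, 2.5 standard deviations, ±1 bp around the listed artifact sizes) come from the description above.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Peak:
    """Hypothetical peak record: fragment size in bp and signal height."""
    size_bp: float
    height: float


MIN_HEIGHT = 200.0            # filter 1: minimum signal height
SIZE_RANGE = (55.0, 750.0)    # filter 1: analysis range in bp
MAX_SD = 2.5                  # filter 2: allowed deviation from the mean height
ARTIFACT_WINDOW = 1.0         # filter 3: +/- 1 bp around known artifacts
ARTIFACTS = {                 # filter 3: artifact sizes listed in the text
    "FAM": (76.0, 129.0),
    "TAMRA": (80.0,),
    "R6G": (102.0, 160.0),
}


def filter_peaks(peaks, dye, max_iter=10):
    """Apply the three filters in order and return the surviving peaks."""
    # Filter 1: drop weak peaks and peaks outside the analysis range.
    kept = [p for p in peaks
            if p.height >= MIN_HEIGHT
            and SIZE_RANGE[0] <= p.size_bp <= SIZE_RANGE[1]]

    # Filter 2 (assumed scheme): re-estimate mean and SD of peak height,
    # trimming outliers until no peak lies beyond MAX_SD deviations.
    for _ in range(max_iter):
        if len(kept) < 3:
            break
        heights = [p.height for p in kept]
        m, s = mean(heights), stdev(heights)
        inside = [p for p in kept if abs(p.height - m) <= MAX_SD * s]
        if len(inside) == len(kept):
            break
        kept = inside

    # Filter 3: drop peaks within +/- 1 bp of a known artifact for this dye.
    known = ARTIFACTS.get(dye, ())
    return [p for p in kept
            if not any(abs(p.size_bp - a) <= ARTIFACT_WINDOW for a in known)]
```

Because the second filter re-estimates the distribution after each trimming pass, a single very strong peak does not inflate the standard deviation used to judge the remaining bands.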

BAC library screening
A PCR-based screening was performed on the Inra bovine BAC library as previously described [14]. Briefly, primer pairs for markers were obtained from the BOVMAP database (http://locus.jouy.inra.fr). PCR reactions were performed on PTC-100 thermocyclers (MJ Research) in a 15 µL reaction volume with 1× standard buffer supplemented with 125 µM dNTP, 1.5 mM MgCl₂, 0.5 µM of each primer and 0.035 U·µL⁻¹ Taq polymerase (Promega). The samples were preheated for 5 min at 94 °C, subjected to 35 cycles (94 °C for 20 s, 50−60 °C for 30 s and 72 °C for 30 s), and a final extension step of 5 min at 72 °C.

BAC end sequencing
The Nucleobond AX100 kit (Macherey-Nagel) was used to prepare BAC DNA suitable for end sequencing according to the manufacturer's recommendations. Sequencing reactions were performed as previously described [39].

Contig assembly
The map was constructed with FPC starting from an initial stringent build and using an incremental process, which consisted of joining contigs together based on end-end comparisons. This time-consuming step generally requires manually fusing and reanalyzing contigs at higher cutoff values. In order to automate this work and to trace weak joins introduced by manual editing, we developed a new strategy using virtual markers. In fact, FPC allows a less stringent cutoff for clones sharing one or more markers and the "Incremental Build Contig" (IBC) option takes into consideration new marker data to automatically merge and reanalyze contigs. We thus transcribed end-end comparison results in terms of marker data. For example, end-end comparisons showed that clone bI0053B12 matched clone bI0133E09 at a 2 × 10⁻¹¹ value. We thus defined a Virtual End (VE) Marker (called V I0053B12 2E-11) hitting these two clones. Marker data were incorporated in FPC through *.ace files using the replace marker command. The same strategy was used to incorporate singletons at higher cutoff values.
In practice, the first map was obtained at a 10⁻¹³ cutoff value using the pure Sulston method, with the Tolerance parameter set to 5 (0.5 base in fact). Contigs with more than 5 Q (questionable clones that FPC failed to place properly in the map) were automatically reanalyzed at a lower cutoff value using the DQER (from 10⁻¹³ to 10⁻²⁵) from FPC software. Three large contigs with more than 10 Q remained unchanged after reanalysis. These contigs encompassed 1059 clones, which displayed more than 75% shared bands with an unusually high number of related clones. These clones were discarded as they may represent clones with large repeated or duplicated sequences. Moreover, all contigs encompassing more than 15 clones were manually edited and split when they appeared doubtful. Singletons were then incorporated up to a 10⁻¹⁰ cutoff value using virtual markers (called S clone name CutOff in FPC).
The next building step was an end-end comparison at a 10⁻¹² cutoff value. We retained only reciprocal and unique fusions as valid. Moreover, contigs split by the last DQER step and manual editing could not be merged. Then, we automatically fused contigs based on end-end comparisons using Virtual End (VE) Markers (called V Clone Name CutOff in FPC) and the IBC option (see above). Fusions that produced contigs with more than 5 Q were rejected by deleting the corresponding virtual marker and reanalyzing with IBC. Remaining VE markers are denoted validated VE markers in Table II. All steps required to create or delete VE markers were done automatically using an Access Database, managing files containing the detailed end-end results (by printing in a file the FPC standard output), the summarized end-end results (by saving the FPC results window), the contig results (by saving the FPC By Contigs window) and the IBC results (by saving the FPC results window). Four additional steps were performed as described, making it possible to merge ends sequentially from 10⁻¹¹ to 10⁻⁰⁸. In a last step, we merged contigs based on our framework markers and checked contigs with more than 50 clones to split manually those which seemed to be doubtful.
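The selection of reciprocal and unique fusions can be illustrated with a short Python sketch. It rests on one interpretation of the criterion, namely that a fusion is kept only when each of the two contig ends matches the other and nothing else. The function name, the input layout (a dictionary from a contig end to the set of ends it matches) and the end labels such as "ctg1:R" are hypothetical and do not reflect the FPC output format or the Access database used here.

```python
def reciprocal_unique_fusions(matches):
    """Return the sorted list of end pairs that match each other exclusively.

    `matches` maps each contig end (hypothetical label such as "ctg1:R")
    to the set of contig ends it matched in the end-end comparison.
    """
    fusions = set()
    for end, partners in matches.items():
        # Unique: this end must match exactly one other end.
        if len(partners) != 1:
            continue
        (partner,) = partners
        if partner == end:
            continue
        # Reciprocal and unique on the other side as well.
        if matches.get(partner) == {end}:
            fusions.add(tuple(sorted((end, partner))))
    return sorted(fusions)


# Illustrative input: only ctg1/ctg2 satisfy both conditions.
example = {
    "ctg1:R": {"ctg2:L"},
    "ctg2:L": {"ctg1:R"},
    "ctg3:R": {"ctg4:L", "ctg5:L"},   # not unique
    "ctg4:L": {"ctg3:R"},
    "ctg5:L": {"ctg3:R"},
    "ctg6:R": {"ctg7:L"},             # not reciprocal
    "ctg7:L": set(),
}
accepted = reciprocal_unique_fusions(example)
```

Each accepted pair would then be turned into one virtual marker hitting the two end clones, so that a later rejection only requires deleting that marker.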

Fingerprinting of BAC clones
Two libraries were used: the Inra BAC library [14] with 105 984 clones and part of the CHORI-240 (www.chori.org/bacpac) BAC library (26 500 clones). About 9% of the clones were lost during the fingerprinting process, due to poor DNA preparation or low DNA yield (7%), injection failure or damaged capillaries (2%). Cross contamination was identified in about 2% of the wells.
Empty or small clones showing a low number of bands were rejected at the data cleanup step, in order to retain only clones with more than 7 bands for Inra clones and 11 bands for CHORI-240 clones. Moreover, since a clone with a high number of bands may correspond to two different clones in the same well, Inra clones with more than 63 bands and CHORI-240 clones with more than 75 bands were marked as cancelled in FPC. Finally, 102 725 clones were transferred to FPC, 40 of which were fingerprinted twice and 1802 were cancelled. Table I summarizes all these results and Figure 1 shows the distribution of band numbers after data cleanup.
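The band-number rules above amount to a simple three-way classification, sketched here in Python. The thresholds are those given in the text; the function name and the status labels are hypothetical.

```python
# (minimum, maximum) band counts per library, taken from the text:
# clones must have MORE than the first value to be retained, and clones
# with MORE than the second value are cancelled.
BAND_LIMITS = {
    "Inra": (7, 63),
    "CHORI-240": (11, 75),
}


def classify_clone(library, n_bands):
    """Return 'rejected', 'cancelled' or 'kept' for a fingerprinted clone."""
    low, high = BAND_LIMITS[library]
    if n_bands <= low:
        return "rejected"    # empty or small clone, dropped at cleanup
    if n_bands > high:
        return "cancelled"   # possibly two clones in the same well
    return "kept"
```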

Contig assembly
Preliminary maps built using different cutoff values show that cutoff values ranging from 10⁻¹² to 10⁻⁰⁸ produced contigs consistent with either the expected results based on statistical analysis [27] or our contigs built by chromosome walking (unpublished data). We used two criteria to evaluate the quality of the construct: the number of clones poorly incorporated (Q clones) and the number of contigs exhibiting at least one Q clone (Q contigs). As shown in Figure 2 [...] 70 000 clones. Moreover, Figure 2 shows clearly that map buildings are more stable in terms of Q clones and Q contig number for cutoff values in the 10⁻¹² to 10⁻⁰⁹ range.
Moreover, we analyzed more precisely the differences between 10⁻¹⁰ and 10⁻¹² cutoff value maps after multiple DQER steps. We observed that in most cases, contigs built at a cutoff value of 10⁻¹⁰ represent end fusions of 2 or more contigs built at 10⁻¹². Thus, contig assembly seemed to be reliable in the 10⁻¹² to 10⁻⁰⁹ cutoff range. However, to provide reliable data, we used an incremental process to build our final map, as described in the Materials and methods, starting from a stringent build obtained at a 10⁻¹³ cutoff. We stopped this incremental process at 10⁻⁰⁸, since higher cutoff values generated too many non reciprocal and non unique end fusions (Fig. 3). Table II shows the results obtained at each fusion step, the map encompassing 6890 contigs obtained using 4604 virtual markers and 4157 fusions.

Map validation and anchoring by PCR screening
A total of 1390 markers were screened on the Inra BAC library, 87 of them being absent from our library. Thus the screening information for a total of 1303 markers (451 microsatellites, 471 genes, 127 EST, and 254 BAC ends) was used to anchor the contigs and validate our building strategy. These markers were derived from existing bovine genetic and radiation hybrid maps or according to their position on the human genome. In particular, we used 80 type I and II markers placed on a BTA26 comprehensive radiation hybrid map [17].
Fifty-five contigs contained at least one VE marker flanked by screened markers. In all cases, the locations of flanking markers were consistent, suggesting that VE markers introduced no error.
About thirty contigs were built in the course of this work by chromosome walking using framework preliminary maps with VE markers. BAC ends were sequenced and screened on the BAC library, making it possible to confirm 152 VE markers (data not shown). No erroneous fusion could be detected. These screening data made it possible to fuse some more contigs, resulting in the first generation bovine physical map.

The bovine physical map
The current release (June 2003) encompasses 6615 contigs, 747 of them being anchored. The contigs contain 15 clones on average (from two to 164 clones). Table III shows the number of contigs, the Q contigs and number of Q clones for 7 contig size classes.
Our 6615 contigs cover a total of 791 214 bands, each of them representing about 3.5 kb based on 82 000 Inra clones (∼120 kb) showing 35 bands and 19 000 CHORI clones (∼180 kb) showing 48 bands. Thus, our contigs cover about 400 kb on average and the map covers about 2769 Mb, i.e., more than 90% of the bovine genome. Singletons and cancelled clones containing large repetitive sequences represent about 117 000 additional bands (i.e., ∼14%). Therefore, the BAC map covers virtually all the bovine genome. However, this may represent a slight overestimation because of undetected overlaps.
In terms of anchored contigs, the map covers 171 500 bands representing about 600 Mb, i.e. 20% of the bovine genome.
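The coverage figures above follow from simple arithmetic, reproduced here as a short Python sketch. All input numbers are taken from the text; the variable and function names are illustrative only.

```python
# Clone counts, approximate insert sizes (kb) and mean band numbers from the text.
inra_clones, inra_kb, inra_bands = 82_000, 120, 35
chori_clones, chori_kb, chori_bands = 19_000, 180, 48

# Average amount of DNA represented by one band (about 3.5 kb).
total_kb = inra_clones * inra_kb + chori_clones * chori_kb
total_bands = inra_clones * inra_bands + chori_clones * chori_bands
kb_per_band = total_kb / total_bands


def bands_to_mb(n_bands, kb_each=3.5):
    """Convert a number of bands to megabases using the rounded 3.5 kb per band."""
    return n_bands * kb_each / 1000.0


map_mb = bands_to_mb(791_214)          # whole map, about 2769 Mb
anchored_mb = bands_to_mb(171_500)     # anchored contigs, about 600 Mb
mean_contig_kb = map_mb * 1000.0 / 6615  # roughly 400 kb per contig
```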
The bovine physical map will be continuously updated, based on new marker screening data. It will be publicly available through the BovMap database and webFPC at http://locus.jouy.inra.fr/cgi-bin/lgbc/mapping/common/intro2.pl?BASE=cattle.

A first draft of a BTA 26 physical map
In order to test the genome coverage of our contigs and the usefulness of our map, we screened a first set of 34 evenly spaced markers from BTA 26. It was thus possible to anchor 23 contigs and to detect two regions not represented in our library (Fig. 4). These 22 contigs covered about 30 Mb, i.e. 50% of the estimated 60 Mb total length of BTA26 [17]. Thus, screening of about 30 evenly spaced markers made it possible to increase the chromosome coverage with anchored contigs from 20% to 50%.
The remaining 44 markers from the BTA26 RH map [17] were then screened: five new contigs were identified and six previously identified contigs were enlarged or fused.
Finally, this preliminary BTA26 physical map covers about 35 Mb, which corresponds to about 60% of BTA26.
The results are concordant in terms of gene order with the radiation hybrid map data except for five contigs (Ctg520, Ctg146, Ctg525, Ctg530 and Ctg150). These discrepancies concern very closely related markers separated by a distance under the resolution limit of our panel [16].
In addition, these results are consistent in terms of contig length, except for Ctg146 and Ctg525, for which the physical sizes (1700 kb and 2050 kb) represent about 170 and 90 cR₃₀₀₀ instead of 60 and 70 cR₃₀₀₀ respectively, based on the converting ratio previously defined from human physical data [16].

DISCUSSION
We developed a first generation genome-wide BAC-based physical map of the bovine genome. This map consists of 6615 contigs assembled from 100 923 clones selected in two libraries. Restriction profiling was achieved using a fluorescent fingerprinting technique, based on capillary electrophoresis to analyze samples. This strategy has proven to be efficient in terms of speed and data quality. Four persons were sufficient to generate about 2000 fingerprint profiles per day on a MegaBace 1000, using little automation (one Hydra 96). Capillary electrophoresis reduces the variations of fragment sizes to less than 1 bp in a range of 50 to 750 bp, making it possible to detect overlaps more accurately. However, efficient fragment analysis software is necessary to simplify and speed up the data cleanup process, which required two persons for six months.
Our iterative strategy makes it possible to not only automate the manual merging step but also to precisely monitor the contig assembly process and thus to stop it before data is corrupted. No error could be detected based on 50 contigs containing at least one VE marker flanked by screened markers. Since these 50 contigs represent 118 virtual fusions, we can expect, with a risk lower than 5%, that less than 2.5% of the contigs could be falsely branched by VE markers.
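The 2.5% figure is consistent with an exact binomial upper bound for zero errors observed among 118 fusions. The sketch below shows that calculation under the assumption that fusions are independent and equally likely to be wrong; the text does not state which method was actually used, and the function name is illustrative.

```python
def zero_failure_upper_bound(n_trials, alpha=0.05):
    """Upper bound on the error rate when 0 errors are seen in n_trials.

    If the true error rate were p, the probability of observing no error
    in n independent trials would be (1 - p) ** n. Setting this equal to
    alpha and solving for p gives the one-sided bound at risk alpha.
    """
    return 1.0 - alpha ** (1.0 / n_trials)


# 118 virtual fusions checked, none found erroneous, risk of 5%.
bound = zero_failure_upper_bound(118, alpha=0.05)
```

With 118 fusions and a 5% risk, the bound evaluates to about 0.025, in line with the value quoted above.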
Moreover, the use of VE markers allows any user to trace all 'weak joints' introduced by manual editing. Indeed, the only way to detect a merge at a high cutoff value is to reanalyze this contig. VE markers clearly indicate which clones and which cutoff values were used to merge contigs. This could be of great interest for users accessing contigs through webFPC, for example.
The clones analyzed in this study were submitted to a double digestion, which generated about 40 bands on average. It should be possible to achieve a three to fourfold increase of band number since the MegaBace can accurately resolve fragments in the 50−750 bp range. A higher number of bands could be of great interest to detect smaller overlaps. For example, two clones with 30% overlap could share 11 bands out of 35 or 22 bands out of 70. Their coincidence scores would be 2E-08 and 9E-10, respectively. Many restriction enzymes could be used simultaneously to generate more bands and thus detect 20% overlaps. This strategy could be an alternative to the strategy of Ding et al. [13]. However, these authors suggest that too many bands (∼100 bands) could be troublesome for FPC to handle. A three-digestion strategy generating about 70 bands and permitting detection of 25−30% overlaps could thus be a good compromise.
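The effect of band number on the coincidence score can be explored with a binomial approximation in the style of the Sulston score, sketched below in Python. This is not the FPC code: the function name is hypothetical, and the gel length of 10 000 units (sizes being submitted as ×10) is an assumption chosen for illustration, since the value used for this map is not given in the text. Absolute values therefore depend on that choice; the point of the sketch is the trend at constant overlap fraction.

```python
from math import comb


def coincidence_score(shared, n_low, n_high, tolerance=5, gel_length=10_000):
    """Probability that at least `shared` bands match purely by chance.

    A band of the smaller clone matches by chance if at least one of the
    n_high bands of the other clone falls within +/- tolerance of it.
    The number of chance matches among n_low bands is then binomial.
    """
    p_no_match = (1.0 - 2.0 * tolerance / gel_length) ** n_high
    p_match = 1.0 - p_no_match
    return sum(
        comb(n_low, m) * p_match ** m * p_no_match ** (n_low - m)
        for m in range(shared, n_low + 1)
    )


# Same 30% overlap, with about 35 or about 70 bands per clone.
score_35 = coincidence_score(11, 35, 35)
score_70 = coincidence_score(22, 70, 70)
```

Under these assumptions, doubling the number of bands at the same overlap fraction lowers the chance-match probability, which is why more bands allow smaller overlaps to pass a given cutoff.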
Our BAC-based map will be a valuable tool for genomic research in ruminants, including targeted marker production, positional cloning or targeted sequencing of regions of specific interest. Even if our map provides only a four-genome equivalent coverage, it may not be worthwhile to spend more time fingerprinting additional clones. Three elements support this statement:
-Firstly, STS content mapping may be more effective than fingerprinting for achieving gap closure and contig joining. About 3000 additional markers evenly distributed on the whole genome should be sufficient to achieve a 60% coverage with anchored contigs based on our results on BTA26. About 6000 markers would allow most gap closure. At present, comparative mapping data combined with the human sequence make it possible to quickly identify these 6000 markers and to develop bovine specific primers from the numerous bovine EST available in databases. This is encouraging since the main applications of this physical map could be dedicated to large contig construction to assist positional cloning in the ETL regions.
-Secondly, our map provides a good framework to initiate a strategy similar to that of Gregory et al. [20] and establish high-resolution syntenies among ruminant, human and mouse genomes. About 60 000 bovine BES from the CHORI-240 library have been submitted to GenBank, 10 000 of them corresponding to clones integrated in our contigs. End sequencing of singletons and clones at the end of our contigs will provide about 60 000 additional BES. BLASTN comparison with the human genome should thus provide at least 5000 significant "hits", making it possible to align most bovine contigs along the human genome. The deduced bovine contig juxtaposition could be helpful to identify potential overlapping contigs, which could be easily joined by PCR screening.
-Thirdly, an international physical map is under development (www.bcgsc.bc.ca/projects/bovine mapping, http://www.livestockgenomics.csiro.au/cattle.shtml) by analyzing single digest fingerprints obtained from 280 000 BAC. Amongst these, 18 913 clones from the CHORI-240 library were also incorporated in our map. Even if two different fingerprinting strategies were used, clones common to both maps could serve as anchors to identify new fusions between contigs from the two maps. Cross validation between these two independently constructed maps should provide a reliable framework to start whole genome sequencing projects.