A specific pattern of splicing for the horse αS1-Casein mRNA and partial genomic characterization of the relevant locus

Mares' milk has a composition very different from that of cows' milk. It is much more similar to human milk, in particular in its casein fraction. This study reports on the sequence of a 994 bp amplified fragment corresponding to a horse αS1-Casein (αS1-Cn) cDNA and its comparison with its caprine, pig, rabbit and human counterparts. The alignment of these sequences revealed a specific pattern of splicing for this horse primary transcript. As in humans, exons 3', 6' and 13' are present whereas exons 5, 13 and 14 are absent in this equine mRNA sequence. BAC clones, screened from a horse BAC library, containing the αS1-Cn gene allowed the mapping of its locus by FISH on equine chromosome 3q22.2-q22.3 which is in agreement with the Zoo-FISH results. Genomic analysis of the αS1-Cn gene showed that the region from the second exon to the last exon is scattered within a nucleotide stretch nearly 15-kb in length which is quite similar in size to its ruminant and rabbit counterparts. The region between αS1- and β-Cn genes, suspected to contain cis-acting elements involved in the expression of all clustered casein genes, is similar in size (ca. 15-kb) to the caprine and mouse intergenic region.


INTRODUCTION
Milk, the natural food for new-born mammals, obviously contains all of the components necessary for the initial phases of postnatal development. With regard to the protein content of milk, marked qualitative and quantitative

RNA extraction from mares' milk cells
Mares' milk samples (2 × 50 mL) were freshly collected and total RNA was extracted directly from pelleted cells after centrifugation at 3 500 × g for 30 min at 4 °C, essentially according to the procedure described in [3]. Pelleted cells were lysed with 0.5 mL of denaturing solution. Pellets of RNA were stored in ethanol at −20 °C. For RNA analysis, the pellet was dissolved in 50 µL distilled water.

Reverse-transcription, PCR and cDNA sequence analysis
Total RNA (5 µL) prepared from the cellular pellet was reverse transcribed into first strand cDNA as previously described [18]. PCR primers were designed from conserved regions. Owing to the high conservation between species of the sequence encoding the signal peptide of calcium-sensitive caseins, the BT71 primer was chosen to pair with this sequence. The BT56 reverse primer was defined from another highly conserved region, detected in the 3′-UTR after the alignment of known αS1-Cn cDNA sequences from various species, and starts 13 to 15 nt upstream of the polyadenylation signal as previously described [18]. DNA fragments obtained after PCR on reverse-transcribed mRNA under standard conditions using the αS1-Cn specific primer pair BT56/71 were cloned into pUC18 essentially as previously described [18]. These clones were screened by PCR using the same αS1-Cn primer pair, BT56/71. PCR products, 5 µL of each reaction mix, were analysed by electrophoresis. A plasmid showing a ca. 1 kb amplified fragment was sequenced after purification with a QIAquick PCR kit (Qiagen), using the Prism ready reaction cycle sequencing kits and an ABI 377 automated sequencer (Applied Biosystems Inc.). The resulting sequences were compared using the BLAST and Align programmes.
Oligonucleotides were provided by Genosys Biotechnology (United Kingdom).
The letters in brackets represent the degeneracy at the relevant position.

Genomic analyses
To characterize the casein locus, BAC clones containing casein genes were isolated from the Inra horse BAC library [20] by PCR screening using αS1-Cn (Eq7/Eq8) and β-Cn (Eqβ1/β2) specific primer pairs. Clones were prepared as described previously [7,26] and their gene content was ascertained by PCR performed on miniprepped BAC DNA with the same screening primers. PCR fragments were purified with the QIAquick PCR purification kit (Qiagen) and sequenced on an ABI 377 automated sequencer (Applied Biosystems Inc.). Homologies to sequences present in the databases were searched for using the BLAST algorithm (www.ncbi.nlm.nih.gov). The BAC insert size was estimated after NotI digestion followed by field inverted gel electrophoresis (FIGE). BAC clones containing the casein locus were used for fluorescence in situ hybridization (FISH) mapping as described elsewhere [1,7].
Equine genomic DNA was prepared from peripheral blood, as previously described [7]. The Expand™ long template PCR system of Boehringer-Mannheim (Meylan, France) was used according to the manufacturer's protocol except that annealing was performed at 55 °C. An aliquot (5 µL) of each reaction mix was analysed by electrophoresis, then amplified DNA (10 µL) was digested with one of the following restriction enzymes: PstI or PvuII, under conditions specified by the supplier. The size of digested fragments was determined by electrophoresis in 1% agarose gel.

Nucleotide sequence of the cDNA encoding equine αS1-Cn
To decipher the equine αS1-Cn cDNA primary structure, we took advantage of the presence of mammary epithelial cells in milk to isolate RNA.
After cloning of the amplified fragments, recombinant plasmids were screened using αS1-Cn specific primers to identify a clone containing a horse αS1-Cn cDNA. Sequencing of the identified αS1-Cn cDNA insert revealed that the amplified fragment was 1 018 bp in length. The accession number of the αS1-Cn cDNA sequence reported here is AY049939. Due to the strategy used, the 5′ extremity of the amplified cDNA fragment corresponds to the sequence encoding the signal peptide and therefore this cDNA lacks the 5′-UTR. The coding region stretches over 627 bp coding for a 208 amino acid (aa) residue protein including the 15 aa residues of the signal peptide. Therefore, the mature horse αS1-Cn is 193 aa residues long. Most of the 3′-UTR is included in the amplified cDNA fragment, which stretches out 391 bp past the TGA stop codon.
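The lengths quoted above can be cross-checked with simple arithmetic. The following Python sketch is only an illustration: the variable and function names are ours, and it assumes that the 627 bp coding region includes the TGA stop codon and that the 391 bp of 3′-UTR are counted after that codon.

```python
# Figures taken from the text; names are illustrative only.
CODING_BP = 627          # coding region, assumed to include the stop codon
SIGNAL_PEPTIDE_AA = 15   # residues of the signal peptide
UTR3_BP = 391            # 3'-UTR bases past the stop codon


def protein_length(coding_bp: int, includes_stop: bool = True) -> int:
    """Return the number of residues encoded by a coding region."""
    if coding_bp % 3 != 0:
        raise ValueError("coding region length must be a multiple of 3")
    codons = coding_bp // 3
    # The stop codon does not encode a residue.
    return codons - 1 if includes_stop else codons


precursor_aa = protein_length(CODING_BP)
mature_aa = precursor_aa - SIGNAL_PEPTIDE_AA
fragment_bp = CODING_BP + UTR3_BP

print(f"precursor: {precursor_aa} aa")
print(f"mature protein: {mature_aa} aa")
print(f"coding region + 3'-UTR: {fragment_bp} bp")
```

Under these assumptions the three values agree with the 208 aa precursor, the 193 aa mature protein and the 1 018 bp fragment length given in this section.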
Multiple alignment of the cDNA sequences from different species (Fig. 1), taking into account the exon modular splitting derived from the known structural organisation of the genes as proposed by [19], points to constitutive horse specific exon-skipping events. Indeed, exons 5, 13 and 14 (numbering of the caprine gene) are lacking whereas the others, including exons 3′, 6′ and 13′, are present. Exons 3′ and 6′ are conserved during splicing as in the rabbit whereas they are absent in ruminant mRNA. Likewise, exon 13′, which was first detected in humans [18], is present in horse αS1-Cn mRNA as in the pig. It is worth noting that exon 13, which is a duplication of exon 10 in ruminants, is absent in the horse cDNA sequence reported here as in humans, pigs and rabbits. This absence could be due to mutation(s) in the splice site sequences surrounding this exon or to the absence of this duplication in equine species. The horse exon-skipping pattern is more similar to that exhibited by its rabbit counterpart (except for exons 13 and 14) than to its caprine or bovine counterpart. The specificity of the horse αS1-Cn pattern of splicing reported here resides in the skipping of exon 5, which is not observed in the caprine, rabbit, pig and human species. However, this work could not provide evidence of the absence of multiple alternatively processed transcripts as described in the goat [5,14]. Moreover, due to the lack of polymorphism analysis, we cannot exclude that genetic variability exists. The αS1-Cn splicing pattern presented here is the first described in the horse. The present transcript most probably corresponds to the longer of the two major αS1-Cn proteins identified in mares' milk (Miranda et al., unpublished). Given the multiplicity of alternative splicing events described in different species for this gene (see review [19]), we can assume that other patterns exist in the horse.
Furthermore, it is worth noting that 3 or 6 nucleotide deletions occur at the 5′ or 3′ extremities of the horse exons 4, 6 and 11 (Fig. 1). In goats, it was proposed that such events are due to the usage of cryptic 5′ or 3′ splice sites leading to the elimination of the first or the last codon of exons [14]. In the horse, these deletions could be due to inaccurate splicing promoting the selection of cryptic splice sites and/or to equine specific mutations changing the splice site sequence. These phenomena seem to be rather frequent with αS1-Cn transcripts, occurring in most of the species examined so far. Exon-skipping and cryptic splice site usage result in an interspecies variability of the αS1-Cn structure.
Regarding cDNA sequence comparison, different types of similarity analysis were performed. A global analysis with the entire sequences gave a degree of similarity with caprine, rabbit, human and pig full-length cDNA of ca. 60%. Due to the exon-skipping events, this global comparison does not reflect biological reality. The comparison of common exon sequences reveals higher degrees (ca. 75%) of similarity between the horse and other species. Furthermore, sequence comparisons taking into account the exon modular structure of the cDNA show a similarity ranging from 61 to 93% depending on the exons and the species compared. In particular, the equine exon 9 sequence, which encodes the highly conserved multiple phosphorylation site of the protein, has around 75% similarity with its caprine, rabbit and pig counterparts. Surprisingly, similarity with human exon 9 is lower (less than 65%). It is worth noting that equine exons 15 and 18 show a high similarity (> 80%) with the other species (Fig. 1).
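The difference between a global comparison and a comparison restricted to common exons can be illustrated with a small Python sketch. It is not the procedure used here (the comparisons were made with the BLAST and Align programmes); the function name, the `count_gaps` parameter and the two toy sequences are hypothetical and serve only to show why skipped exons lower a global similarity score.

```python
def percent_identity(seq_a: str, seq_b: str, count_gaps: bool = True) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Gap columns ('-') are counted as mismatches when count_gaps is True
    (global comparison) and ignored otherwise (shared positions only).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            if count_gaps:
                compared += 1  # a gap column counts as a mismatch
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical toy alignment: the gap mimics a skipped exon.
horse_like = "ATGAAG---CTT"
other_like = "ATGAAGCCTCTA"

print(f"global: {percent_identity(horse_like, other_like):.1f}%")
print(f"shared positions only: {percent_identity(horse_like, other_like, count_gaps=False):.1f}%")
```

In this toy case the score rises once the gap columns are left out, which mirrors the direction of the change from ca. 60% to ca. 75% reported above.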

Partial characterization of the casein gene locus
Using PCR to screen the Inra horse BAC library, the αS1-Cn (Eq7/Eq8) primers identified 12 clones likely to contain the relevant gene. One BAC clone (593-G-3) was studied in more detail. The insert size of this clone, estimated after NotI digestion followed by field inverted gel electrophoresis (FIGE) analysis, was ca. 90 kb. The presence of the αS1-Cn gene in this clone was confirmed by sequencing the PCR product obtained. A specific hybridisation signal obtained by FISH using this clone as a probe was observed on the equine chromosomal region 3q22.2-q22.3 (ECA3) after examination of the metaphase spread. This result is in agreement with Zoo-FISH analyses [22] and comparative mapping data [20]. Indeed, the casein locus was localised on the bovine chromosomal region 6q31 [28], on caprine 6q32 [8] and on human 4q21.1 [17], which are orthologous to ECA3.
With respect to the regulation of expression of the four casein genes, the αS1/β region suspected to contain cis-acting elements was studied. Since the β-Cn gene was not detected in BAC 593-G-3, a second screening of the BAC library was performed using primers specific for the β-Cn gene (Eqβ1/Eqβ2). Six clones were shown to contain the β-Cn gene and one of them (577-C-4) included both the αS1-Cn and the β-Cn genes. The size of the insert was estimated by FIGE to be ca. 82 kb. Given the close proximity of the caprine, bovine and mouse αS1-Cn and β-Cn loci [15,23,25], the size of this intergenic region in the horse was evaluated using long-range PCR. Electrophoresis of the PCR products obtained using Eq7 (matching the 5′ end of the last exon of the αS1-Cn gene) and Eqβ1 (matching the exon 8 sequence of the β-Cn gene; accession number AF214526) revealed an amplified fragment, the size of which was estimated to be ca. 11.2 kb after PstI or PvuII digestion and gel electrophoresis. This result (Fig. 2) demonstrates that these two genes are also convergently transcribed, as was first shown in goats [15], cattle [23], mice [25] and humans [24]. Interspecies comparison of the αS1/β-Cn region revealed its conservation during evolution. Only the distance separating these two casein genes differs markedly. In horses, the size of this intergenic region lies between that of its mouse counterpart (10 kb, [25]) and those of its ruminant counterparts (12 kb in caprine [15] and 20 kb in bovine [23]). In humans this region seems to be larger since the αS1- and β-Cn genes appear to be 60 kb apart [24].
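The interspecies comparison of the αS1/β-Cn intergenic distance can be summarised by ranking the values quoted in this paragraph. The short Python sketch below uses only those figures; the dictionary and variable names are illustrative.

```python
# Intergenic distances (kb) between the alphaS1-Cn and beta-Cn genes,
# as quoted in the text for each species.
INTERGENIC_KB = {
    "mouse": 10.0,
    "goat": 12.0,
    "cattle": 20.0,
    "human": 60.0,
    "horse": 11.2,
}

# Rank species from the shortest to the longest intergenic region.
ranked = sorted(INTERGENIC_KB.items(), key=lambda item: item[1])
order = [species for species, _ in ranked]

for species, kb in ranked:
    print(f"{species:<7}{kb:5.1f} kb")
```

The horse value falls between the mouse and goat values, as stated above.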
In the same way, the size of the αS1-Cn gene was determined by long-range PCR using the primer pairs Eq5 (matching exon 2)/Eq2 (matching exon 10), Eq1 (matching exon 9)/Eq8 (matching the last exon) and Eq1/Eq2 (Fig. 2). The sizes of the amplified fragments estimated after PstI or PvuII digestion and gel electrophoresis were ca. 5.5, 10.8 and 0.7 kb, respectively. These results suggest that the size of the αS1-Cn gene is at least 15 kb without the region spanning exons 1 to 2. The size of this segment is quite similar to those determined for ruminants and rabbits (ca. 15 kb).
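The estimate of at least 15 kb follows from combining the two large amplicons and subtracting the stretch they share. The Python sketch below makes this explicit; it assumes, from the primer positions given above, that the Eq1/Eq2 fragment (exon 9 to exon 10) corresponds to the overlap between the Eq5/Eq2 and Eq1/Eq8 fragments. The names are illustrative.

```python
# Amplicon sizes (kb) estimated by long-range PCR, as given in the text.
EQ5_EQ2_KB = 5.5    # exon 2 to exon 10
EQ1_EQ8_KB = 10.8   # exon 9 to the last exon
EQ1_EQ2_KB = 0.7    # exon 9 to exon 10, assumed to be the shared stretch


def span_of_overlapping_amplicons(left_kb: float, right_kb: float, overlap_kb: float) -> float:
    """Length covered by two amplicons that share overlap_kb of sequence."""
    if overlap_kb > min(left_kb, right_kb):
        raise ValueError("overlap cannot exceed either amplicon")
    return left_kb + right_kb - overlap_kb


gene_span_kb = round(
    span_of_overlapping_amplicons(EQ5_EQ2_KB, EQ1_EQ8_KB, EQ1_EQ2_KB), 1
)
print(f"exon 2 to last exon: ca. {gene_span_kb} kb")
```

Under this assumption the region from exon 2 to the last exon spans ca. 15.6 kb, consistent with the value of at least 15 kb.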
Lastly, the size of the β-Cn locus was determined by long-range PCR using the primers Eqβ3 (matching exon 2) and Eqβ2 (matching the 3′ extremity of the gene). Gel electrophoresis of the PCR product revealed that the size of this region, 5.5 kb in the horse, is at least 1 kb shorter than its ruminant counterpart. The characterisation of this gene is under investigation to determine the deletion site.

CONCLUSION
The results presented here are in agreement with the hypothesis that exon-skipping and cryptic splice site usage result in an interspecies variability of the αS1-Cn structure [19]. We have described here a horse specific pattern of αS1-Cn primary transcript splicing. Nevertheless, we cannot exclude the existence of other patterns, given the extensive alternative splicing of this gene described in other species. We have also analysed the genomic organisation of a part of the casein locus, which is broadly similar to the loci of the other species described so far. This knowledge provides tools to study the regulation of the casein gene cluster in equine species, and the homologous cDNA sequence will allow us to study αS1-Cn alternative splicing in the horse.