Comparative analysis on the structural features of the 5' flanking region of κ-casein genes from six different species

κ-casein plays an essential role in the formation, stabilisation and aggregation of milk micelles. Control of κ-casein expression reflects this essential role, although understanding of the mechanisms involved lags behind that of the other milk protein genes. We determined the 5'-flanking sequences of the murine, rabbit and human κ-casein genes and compared them to the published ruminant sequences. The most conserved region was not the proximal promoter but an approximately 400 bp region centred 800 bp upstream of the TATA box. This region contained two highly conserved MGF/STAT5 sites with common spacing relative to each other. It also contained six conserved short stretches of similarity that did not correspond to known transcription factor consensus sites. In contrast to the ruminant and human 5' regulatory sequences, the rabbit and murine 5'-flanking regions did not harbour any repetitive elements. We generated a phylogenetic tree of the six species based on a multiple alignment of the κ-casein sequences. This study identified conserved candidate transcriptional regulatory elements within the κ-casein gene promoter.


INTRODUCTION
Although milk casein composition varies considerably between livestock species, κ-casein seems to be ubiquitous, in accordance with its biological role [17]. The relative concentration of κ-casein versus the Ca-sensitive caseins varies among species and is influenced by the casein allelic variants within each species. The ratio of κ-casein to Ca-sensitive caseins has a significant influence on casein micelle size [15], which in turn alters the manufacturing properties and digestibility of milk [5]. In spite of the importance of κ-casein in the assembly and stability of casein micelles, a detailed analysis of its regulation and a comparison with the structural features of the most-studied β-casein promoter have not been performed. Specifically, although the κ-casein cDNA sequence is known for many species, the 5' flanking regions have only been analysed in three closely related ruminant species. Identification of DNA sequences involved in the transcriptional control of this gene will help the investigation of κ-casein expression using gene transfer methods.
As a first step to understanding how κ-casein expression is regulated, we compared six different κ-casein gene promoters at the sequence level. The presence of highly conserved, putative transcription factor binding sites in all the known 5' regulatory regions of the κ-caseins might indicate that interactions between these sites and the corresponding transcription factors contribute to the regulation of mammary gland-specific gene expression. We sequenced 1.9 kb of the rabbit and murine κ-casein 5' flanking regions, extended the published human κ-casein promoter sequence [7] further upstream, and compared these sequences to the corresponding regions in the ruminant κ-casein 5' flanking sequences.

Origin of sequences
The murine sequence was generated from a subclone of BAC clone 555-N16 (Research Genetics Inc., USA), which contains 105 kb of the murine casein locus [8]. The rabbit κ-casein promoter was derived from the λ24 genomic clone [2]. The human sequence [7] was extended further upstream using overlapping, unfinished sequence contigs obtained from the Human Genome Project (EMBL accession numbers M73628 and AC060228). The caprine, ovine and bovine sequences have EMBL/GenBank accession numbers Z33882, L31372 and M75887, respectively.

Promoter sequencing and sequence analysis
Sequencing was performed on both strands using fluorescent dye-labelled terminators and the cycle sequencing method (ABI PRISM™ Dye Terminator Cycle Sequencing Ready Reaction Kit with AmpliTaq® DNA Polymerase, FS; Perkin Elmer) in five steps.

Characterisation of murine and rabbit 5' sequences
The mouse sequence (acc. No. AJ309571) was generated from the BAC clone 555-N16 (Research Genetics Inc., USA), which contains 105 kb of the murine casein locus [8]. An approximately 24 kb BamHI fragment from this clone, containing the complete κ-casein gene, was subcloned into pPolyIII [11] and sequenced. Rabbit DNA was subcloned into the pPolyIII-I vector from the λ24 genomic clone [2] and sequenced (acc. No. AJ309572). The rabbit κ-casein promoter sequence corresponds to the "A" allele of the two variants described [10].
We generated 1 962 bp of murine and 1 908 bp of rabbit 5' flanking sequence. The murine and rabbit sequences include the putative TATA box that has been described for the bovine sequence [1]; it differs from this consensus sequence by one mismatch in the mouse and two in the rabbit. When these overlapping 5' flanking sequences are compared, excluding regions containing repetitive elements, the rabbit sequence shows 63% similarity to human, 58.6% to murine and 58% to ruminant κ-casein. Both sequences were analysed for the presence of all transcription factor consensus sites that have already been described in the 5' regulatory regions of casein genes. Table I shows that the rabbit has 6 AP-1 (activator protein 1), 11 C/EBP (CCAAT/enhancer binding protein), 1 CTF/NF1 (nuclear factor 1), 2 GR half sites (delayed secondary glucocorticoid response element), 2 MGF/STAT5 (signal transducer and activator of transcription 5), 6 PMF (pregnancy specific nuclear factor) and 8 YY1 (yin and yang factor 1) consensus sequences. The mouse sequence shows a similar picture, except that it also has a single Oct-1 (octamer binding protein 1) site. The murine sequence harbours 7 AP-1, 9 C/EBP, 2 CTF/NF1, 4 GR, 2 MGF/STAT5, 1 PMF, 1 Oct-1 and 3 YY1 consensus sequences.
Three of the sites (C/EBP, CTF/NF1 and MGF) found in the murine and rabbit promoters were identified as common motifs in 28 milk protein gene promoters [16]. Of the 30 consensus sequences found in the murine promoter and the 36 found in the rabbit, only three sites were spatially conserved (< 20 bp difference) between the two species: the C/EBP site at approximately −1200 and both MGF/STAT5 sites at approximately −1020 and −940. This spatial conservation, with respect to the transcriptional start site and relative to each other, may imply functional importance.
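A consensus-site search of this kind can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the procedure used in this study: the helper names are hypothetical, the toy sequence is invented, and the MGF/STAT5 consensus TTCNNNGAA is the commonly cited motif, assumed here for the example. Both strands are scanned, and positions are reported relative to the start of a TATA-like motif.

```python
import re

# Subset of nucleotide ambiguity codes, mapped to regex character classes.
# Only codes whose meaning agrees with the definitions given in the text are included.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "Y": "[CT]", "R": "[AG]", "W": "[AT]",
}


def consensus_to_regex(consensus):
    """Translate a consensus string such as 'TTCNNNGAA' into a regex."""
    return "".join(CODES[base] for base in consensus.upper())


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(sequence, consensus, tata_start):
    """Return sorted site positions relative to tata_start, scanning both strands."""
    seq = sequence.upper()
    # A lookahead is used so that overlapping matches are all reported.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    hits = set()
    for m in pattern.finditer(seq):
        hits.add(m.start() - tata_start)
    for m in pattern.finditer(revcomp(seq)):
        # Convert the reverse-strand coordinate to the forward-strand start.
        forward_start = len(seq) - m.start() - len(m.group(1))
        hits.add(forward_start - tata_start)
    return sorted(hits)


# Invented toy sequence with one STAT5-like site and a TATA-like motif.
toy = "GGGTTCCAGGAATTTTATAAA"
tata = toy.find("TATAAA")
hits = find_sites(toy, "TTCNNNGAA", tata)
print(hits)  # prints [-12]
```

Because the assumed STAT5 motif is palindromic, the same site is found on both strands; collecting hits in a set avoids counting it twice.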
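The spatial-conservation criterion (same factor, less than 20 bp positional difference) can be expressed as a short Python sketch. The function name and the site positions below are hypothetical and chosen only to resemble the approximate positions quoted above; they are not the values from Table I.

```python
def spatially_conserved(sites_a, sites_b, max_shift=20):
    """Pair sites of the same factor whose positions differ by less than max_shift bp.

    Each site is a (factor, position) tuple, with the position given
    relative to the TATA box.
    """
    conserved = []
    for factor_a, pos_a in sites_a:
        for factor_b, pos_b in sites_b:
            if factor_a == factor_b and abs(pos_a - pos_b) < max_shift:
                conserved.append((factor_a, pos_a, pos_b))
    return conserved


# Hypothetical positions for illustration only.
murine = [("C/EBP", -1203), ("MGF/STAT5", -1025), ("MGF/STAT5", -928), ("YY1", -300)]
rabbit = [("C/EBP", -1195), ("MGF/STAT5", -1008), ("MGF/STAT5", -943), ("YY1", -450)]

conserved = spatially_conserved(murine, rabbit)
for factor, pos_m, pos_r in conserved:
    print(factor, pos_m, pos_r)
```

With these invented inputs the C/EBP site and both MGF/STAT5 sites pass the threshold, while the YY1 sites do not.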

Comparison of six κ-casein promoter sequences
A high level of homology and similar locations of most putative transcription factor binding sites were reported among the published ovine, caprine and bovine κ-casein promoters [4]. Here we performed a comparative analysis that included the aforementioned sequences in addition to the human (EMBL acc. No. M73628; Human Genome Project acc. No. AC022672.00009 and AC060228.00059) and the newly sequenced murine and rabbit κ-casein promoters. The level of homology differs between the compared sequences: the ruminants are all well conserved at > 90% [4], while the level of homology between the rabbit, mouse and human was markedly lower at about 60%. We found similarities with respect to transcription factor consensus sequences within the proximal promoter region, but they were not conserved in all analysed sequences. In addition, this was not the most conserved region located by sequence alignment. An approximately 400 bp region located about 800 bp upstream of the proximal promoter was found to be the most conserved. This region is aligned for the six κ-casein promoter sequences in Figure 1. Notably, this conserved region contained the two conserved MGF/STAT5 sites, but not the single conserved C/EBP site. In all κ-casein promoters, the positions of these two putative transcription factor binding sites were the most highly conserved. They also appeared to share a common spacing with respect to each other. In the ruminants they are 96 bp apart, while in the mouse they are 97 bp apart. The spacing is slightly greater between the human MGF/STAT5 sites, which are separated by 104 bp, and less in the rabbit, where 65 bp separate the MGF sites. Among the other consensus sequences searched for, only two YY1 sites and one GR-half site were found in this region; however, they were not conserved in all six promoters.

Table I. Occurrence of putative transcription factor binding sites in the 5' region of the murine and rabbit κ-caseins. Positions are relative to the TATA boxes. Abbreviations are as described in the text; in addition, N is any nucleotide, N{0,8} means that up to eight nucleotides were allowed and M is A or T.

Conversely, six conserved short stretches of sequence similarity were found in this most conserved region, where the homology between the six sequences is greater than the average; B3 and B6 have already been described in the β-casein gene promoter [16], while conserved boxes CB1-4 were novel sequences (Fig. 1). These conserved box regions did not correspond to known transcription factor consensus sites. The CB4 box overlapped with the B6 block, while the other conserved β-casein-specific motif (B3) overlapped the conserved GR-half site at position −654 in the mouse. A further five conserved blocks (CB5-9) were detected throughout the complete aligned promoter region. At these boxes the homology is either absolute between the sequences, or only two types of nucleotide occur at a given position. The consensus sequences of these novel conserved blocks are written with the following codes: Y is C or T, R is A or G, W is A or T, K is G or T, S is G or T and H is A, T or C.

As identified by Coll et al. [4], the ruminant κ-casein 5'-flanking region contains repetitive elements. We located the repetitive elements and their relative positions in all six sequences analysed. The caprine and bovine κ-casein sequences contain two repetitive elements. The first is the same 114 bp long interspersed nuclear element (LINE) in both sequences, which belongs to the L1MA5A mammalian-specific sequence [24], and the second is a 206 bp short interspersed nuclear element (SINE), which belongs to the Bov-tA Bovidae family [4].
The LINE element is also conserved in the ovine gene, but it is unknown whether the adjacent SINE region is also conserved, as it has not been sequenced. In the human κ-casein promoter, a 206 bp LINE element was identified just 100 bp upstream of the TATA box. This insertion is a classical 5'-truncated sequence that contains only the 3' untranslated region of the original L1 sequence, and it belongs to the L1PA2 primate subfamily [24]. This repetitive element was not identified in an earlier analysis of the human κ-casein sequence, where only a single Alu element in the second intron was described [7]. LINE-related sequences have been described in the first and fourth introns of the rabbit κ-casein gene [10]. The lack of the two ruminant repetitive elements in the other three species and the lack of the L1PA2 insertion in the five other promoters indicate that the insertion of the L1MA5A and Bov-tA elements happened after the divergence of the ruminants, while the insertion of the L1PA2 element can be considered a recent evolutionary event, which happened well after the diversification of primates. Figure 2 shows a phylogenetic tree of the six species based on the multiple alignment of the κ-casein promoter sequences. Possible insertion points of the three repetitive elements L1MA5A, Bov-tA and L1PA2 are indicated.
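Pairwise similarity values and alignment-based trees both start from a column-by-column comparison of aligned sequences. The Python sketch below shows one simple way to compute percent identity from a gapped alignment, skipping any column in which either sequence has a gap. It is an illustration only: the function names and the toy alignment are hypothetical, and the similarity measure and tree-building method actually used for the reported values and Figure 2 may differ.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity between two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def pairwise_identities(alignment):
    """Percent identity for every pair of named sequences in an alignment."""
    return {
        (name_a, name_b): percent_identity(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(alignment, 2)
    }


# Invented toy alignment for illustration only.
toy_alignment = {
    "sp1": "ACGT-ACGTA",
    "sp2": "ACGTTACGAA",
    "sp3": "AC-TTTCGAA",
}

identities = pairwise_identities(toy_alignment)
for pair, value in identities.items():
    print(pair, round(value, 1))
```

Subtracting each identity from 100 gives a simple pairwise distance of the kind that distance-based tree methods take as input.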

DISCUSSION
The temporal and tissue-specific expression of milk protein genes is controlled by a distinct class of co-operating and antagonistic transcription factors, which associate with multiple, sometimes clustered, binding sites. The number and position of potential binding sites can play a decisive role in the outcome of these synergistic and antagonistic interactions [6]. We compared the κ-casein 5'-flanking sequences from six different species. The general theme is that common consensus sequences are present in all of them, but that their spatial arrangement differs between the promoters of different species.
Three consensus sequences previously deemed to be common to all milk protein genes [16] were found (C/EBP, CTF/NF1 and MGF). In addition, some similarities with other milk protein promoters were identified. For example, the frequently studied β-casein gene promoter harbours two lactogenic hormone response regions (LHRR), which are characterised by the presence of multiple C/EBP sites with at least one binding site for MGF/STAT5 [6]. Close to the highly conserved MGF/STAT5 sites, three and two C/EBP binding sites were identified in the mouse and rabbit κ-casein promoters, respectively (Tab. I). The corresponding regions therefore fulfil the structural criteria for a potentially active LHRR. In addition, an insulin response element (IRE) is present within the rabbit κ-casein promoter. This sequence contains a one-base mismatch compared to the consensus sequences found in other milk protein gene promoters [16], as does the IRE in both the bovine and caprine κ-casein promoters. This may reflect earlier in vitro data, in which neither insulin nor glucocorticoids noticeably amplified the action of prolactin on rabbit κ-casein gene expression [3].
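The structural criterion for an LHRR (multiple C/EBP sites together with at least one MGF/STAT5 site) can be written as a simple check, sketched below in Python. The function name, the 200 bp window used to define "close to", the minimum of two C/EBP sites and the site positions are all assumptions made for illustration; none of them is taken from the cited work or from Table I.

```python
def looks_like_lhrr(sites, window=200, min_cebp=2):
    """True if some MGF/STAT5 site has at least min_cebp C/EBP sites within window bp.

    Each site is a (factor, position) tuple; window and min_cebp are
    illustrative thresholds, not published values.
    """
    stat5_positions = [pos for factor, pos in sites if factor == "MGF/STAT5"]
    cebp_positions = [pos for factor, pos in sites if factor == "C/EBP"]
    for stat5 in stat5_positions:
        nearby = sum(1 for cebp in cebp_positions if abs(cebp - stat5) <= window)
        if nearby >= min_cebp:
            return True
    return False


# Hypothetical site lists for illustration only.
clustered = [("MGF/STAT5", -1025), ("MGF/STAT5", -928),
             ("C/EBP", -1203), ("C/EBP", -1100), ("C/EBP", -980)]
isolated = [("MGF/STAT5", -1025), ("C/EBP", -1100), ("C/EBP", -200)]

print(looks_like_lhrr(clustered))  # prints True
print(looks_like_lhrr(isolated))   # prints False
```

A check of this kind only flags candidate regions by site arrangement; whether such a region is functionally active has to be established experimentally.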
The differences between the newly characterised κ-casein sequences and other milk protein gene promoters were more noticeable. First, a common feature of several milk protein genes is the presence of a "milk box", e.g. YY1 motifs associated with two MGF binding sites [16]. In contrast to the ruminant κ-casein promoters, no association of MGF and YY1 sites was identified in the human, rabbit or murine promoters. Secondly, clusters of sequence motifs related to the delayed secondary glucocorticoid response elements have been identified in bovine, ovine and caprine κ-casein promoters along with other milk protein genes [4]. Notably, a GR-half site consensus (at position −654 in the mouse promoter) belongs to this cluster and is conserved in all the examined species except the rabbit, where a single base-pair difference has occurred (Fig. 1). Thirdly, overlapping OCT-1 and C/EBP sites, located 25 bp upstream of the TATA box, have been described in the bovine αs2- and β-casein genes and in the ruminant κ-casein genes [9,23]. However, although the C/EBP site is conserved, the OCT-1 consensus sequence is absent in the human, rabbit and murine κ-casein promoters. Remarkably, and in contrast to the ruminant κ-casein promoter, none of these features was found in the murine, rabbit or human promoters. Alignment analysis indicated that the proximal promoter was not the most conserved region. Rather, a 400 bp region residing approximately 800 bp upstream of the transcriptional start site was highly conserved in all six species. Notably, this region is characterised by the two MGF sites. These were the only two sites found to be spatially conserved in all six κ-casein 5' promoter regions. The importance of this region in regulating κ-casein gene expression has not been evaluated, except that it is present in all transgenic studies performed to date [2,20,22].
Several studies have tried to use κ-casein sequences to drive transgene expression in mice. Both the bovine and the caprine κ-casein genomic clones were either not expressed or only poorly expressed in transgenic mouse lines under their own regulatory regions [20,22]. The rabbit κ-casein genomic clone, which includes the 2.1 kb 5' regulatory region, directed low-level but tissue-specific expression in transgenic mice [2]. The presence of the repetitive LINE and SINE elements in the 5'-flanking regions of the ruminant and human κ-caseins may alter transcriptional efficiency [19]. It is tempting to speculate that the impaired expression levels of ruminant κ-casein transgenes could reflect the presence of repetitive elements in these genomic sequences. Further experiments are necessary to evaluate the importance of the most conserved region and the conserved lactogenic hormone response region, and to reveal the significance of the differences compared with other milk protein genes.