Genetic polymorphism of milk proteins in African Bos taurus and Bos indicus populations. Characterization of variants αs1-Cn H and κ-Cn J

The polymorphism of caseins, α-lactalbumin and β-lactoglobulin was investigated in African Bos taurus (N'Dama, Baoulé, Kuri) and Bos indicus (Shuwa Arab, Sudanese Fulani) populations. The respective frequencies of alleles αs1-Cn B and αs1-Cn C in the N'Dama (0.89 and 0.11) and Baoulé (0.92 and 0.08) breeds were almost opposite to those found in the Shuwa Arab zebu (0.22 and 0.78), a true zebu, which confirms a phenomenon already documented in the literature. Because the β-Cn A1, κ-Cn B haplotype was strongly predominant in N'Dama and Baoulé, as compared to the β-Cn A2, κ-Cn A haplotype in the Shuwa Arab zebu (0.63), an opposite trend in frequencies was also observed between taurines and zebus at the β-Cn and κ-Cn loci. These results confirm that the polymorphism of caseins provides an efficient marker system to discriminate Bos taurus from Bos indicus origins. The Kuri was at an intermediate position, since, in this population, the αs1-Cn B allele predominated as in taurines, while the αs1-Cn C, β-Cn A2, κ-Cn A haplotype was the most frequent, as in zebus. This may be interpreted as revealing intercrossings with zebus in the previous history of this cattle type. Conversely, but to a lesser degree, the polymorphism of the Sudanese Fulani zebu indicates a taurine influence, in accordance with what is accepted about the origins of this cattle type. No polymorphism of αs2-casein could be identified, while α-lactalbumin was polymorphic in all populations. Two additional variants, probably specific to African cattle, were observed. Variant H of αs1-casein, found in Kuri, is characterized by the deletion of the eight amino acid residues (51-58) coded by exon 8, a probable consequence of exon skipping. Allele αs1-Cn H is derived from allele αs1-Cn B. Variant J of κ-casein, found in Baoulé, is derived from variant B by the substitution Ser 155 (B) → Arg (J). The existence of at least one other allele of αs1-casein was suggested.

© Inra/Elsevier, Paris

genetic polymorphism / milk proteins / Africa / Bos taurus / Bos indicus

INTRODUCTION
More than 40 years after the pioneering work of Aschaffenburg and Drewry on β-lactoglobulin [2], a vast amount of information has been collected on the genetic polymorphism of the six main bovine lactoproteins: αs1-, αs2-, β- and κ-caseins, controlled by four tightly clustered loci (αs1-Cn, αs2-Cn, β-Cn, κ-Cn), and α-lactalbumin and β-lactoglobulin, controlled by independent loci (α-La, β-Lg) [24, 28]. Investigations were primarily carried out in dairy breeds of European origin and were stimulated by the search for correlations between those polymorphisms and milk production traits, which has proved successful [10, 28]. In addition, the work was extended to beef breeds, since milk protein polymorphisms are valuable markers for population studies [10, 11, 28]. Data available on African Bos taurus and Bos indicus populations, as well as on zebus as a whole, are comparatively scarce and, when they do exist, far less complete. As an example, the only publication providing haplotype frequencies of the casein cluster of genes is that by Grosclaude et al. on Madagascar zebus [12]. As early as 1968, Aschaffenburg and coworkers [1, 4] drew attention to the interesting features of the lactoprotein polymorphisms in Bos indicus, namely the predominance, at the αs1-casein locus, of the C allele, contrasting with the usual higher frequency of the B allele in Bos taurus, and the occurrence of a polymorphism of α-lactalbumin, contrasting with the monomorphism of this protein in the various breeds of Bos taurus which had been investigated at that time. α-Lactalbumin was, however, later found to be polymorphic in southern European breeds, and this made the differentiation between taurines and zebus less clear [21, 24, 28].
The lack of data on the genetic polymorphism of milk proteins in African cattle is unfortunate because the diversity of these populations is exceptionally high: they derive from successive Bos taurus and Bos indicus introductions, which tended to replace one another or to mix in complex ways. According to Epstein [7], the first domestic cattle in Africa were humpless longhorn animals introduced through Egypt from South-West Asia in the second half of the 5th millennium B.C. This type is now restricted to two West African populations, the N'Dama, whose breeding centre is the Fouta Djallon plateau in Guinea, and the Kuri, located in the Lake Chad basin (figure 1).
A second Bos taurus type, the humpless shorthorn cattle, originating from the same domestication area in South-West Asia, was introduced into Africa in the 2nd millennium B.C. In West Africa, humpless shorthorns, known as Baoulé, Somba, Muturu and Lagune, are now mainly found in the coastal regions from Gambia to Cameroon. Present African zebus are derived from shorthorned thoracic humped animals which spread rapidly westwards from the Horn of Africa after the Arab invasion (about 700 A.D.). In West Africa, this type now extends along a narrow belt south of the Sahara desert (from west to east: Maure, Tuareg, Azawak and Shuwa zebus). Finally, cattle of mixed origin are widely distributed in eastern and southern Africa. In West Africa, they are represented by the long or giant horned Fulani zebus, which extend between the taurine area in the south and the zebu belt in the north. According to Epstein [7], Fulani cattle were derived from crossbreeding between longhorn humpless cattle and thoracic humped zebus. This paper presents the results of the analysis of milk protein polymorphisms in the two longhorn humpless populations, N'Dama and Kuri, in the humpless shorthorn Baoulé, in the Shuwa Arab true zebu and in the Sudanese Fulani cattle. The four above-mentioned cattle groups are thus represented.

Equipment
The reverse phase-HPLC equipment was from Spectra Physics, San José, CA, USA; the absorbance detector (lambda Max 481) and automatic injector (712 WISP) were from Waters, Milford, MA, USA; the Nucleosil C18 N 225 column (250 × 4.6 mm, 10 nm, 5 µm) was from Shandon HPLC, Runcorn, Cheshire, UK; the Vydac C4 214TP54 column (150 × 4.6 mm, 30 nm, 5 µm) was from Touzart et Matignon, Vitry-sur-Seine, France; the FPLC system and Mono Q (HR10/10) column were from Pharmacia, Uppsala, Sweden; the amino acid analyser LC3000 was from Eppendorf-Biotronik, Maintal, Germany; the Procise 494-610A protein sequencer, 377 A automated DNA sequencer and 480 thermal cycler were from Perkin Elmer-Applied Biosystems, San José, CA, USA; the matrix-assisted laser desorption ionization linear time-of-flight mass spectrometer (MALDI-MS) G2025A, equipped with a Pentium PC running software supplied by the manufacturer, was from Hewlett Packard, Palo Alto, CA, USA; the QIAquick PCR purification kit was from Qiagen, Courtaboeuf, France.

Nomenclature
The known variants of αs1-casein being A, B, C, D, E [25], F [8, 30] and G [32, 33], the additional one found in the present study was named H. In the same way, the additional variant of κ-casein was named J, next to A, B, C, D, E [25], F [14], G [9], H and I [31].

Samples from Sudanese Fulani cows were collected, between 1990 and 1996, from private herds in 11 villages in Burkina Faso, nine of which are located around Bobo-Dioulasso (location 8), the remaining two being more distant (locations 6 and 7). Samples from Kuri cows were collected in 1994 from private herds in the Bol district, in the Lake Chad basin (location 9). After milking, the samples were kept frozen until dispatched by air to the laboratory in Jouy-en-Josas. Only a few samples were not suitable for analysis.
The Kuri cow whose milk was used to produce αs1-casein H was homozygous for the αs1-Cn H, β-Cn A1, κ-Cn [illegible] haplotype. Because no homozygous cow was available, κ-casein J was produced from the milk of two Baoulé cows, whose genotypes were αs1-Cn B/B, β-Cn A1/A2, κ-Cn B/J and αs1-Cn B/B, β-Cn A2/A2, κ-Cn A/J, respectively.

Electrophoresis of milk samples
Milk samples from Shuwa Arab cattle were analysed by starch gel and polyacrylamide gel electrophoresis as described by Grosclaude et al. [12].
Samples from the other populations were analysed by isoelectric focusing according to Mahé and Grosclaude [19].

Preparation of κ-casein
Whole casein, acid-precipitated at pH 4.6 from skim milk, was chromatographed on a Mono Q column as described by Guillou et al. [13]. The order of retention times of the non-glycosylated κ-casein fractions (κ0-Cn) of the three genetic variants was J < B < A. κ0-Cn fractions were exhaustively dialysed against distilled water and freeze-dried. Starch-gel electrophoresis of whole casein at an alkaline pH was carried out according to Aschaffenburg and Michalak [3]. Renneted samples were obtained by mixing 10 µL of a 1/50 diluted rennet solution (containing 520 mg chymosin per litre) with whole casein (24 mg/mL). Once coagulated (after 20 min at 32 °C), the samples were loaded onto the gel.

Preparation of deglycosylated CMP (CMPO) of the variants κ-Cn B and κ-Cn J
CMPOs B and J were prepared by a two-step precipitation of the supernatant of a chymosin hydrolysate of whole casein (κ-Cn A/J and κ-Cn B/J) with 5 and 12 % trichloroacetic acid successively, according to Yvon et al. [39]. The CMPO fraction was chromatographed at 40 °C on the Nucleosil C18 column at a flow rate of 1 mL/min, using a linear gradient from 100 % solvent A (0.115 % TFA) to 100 % solvent B (CH3CN/H2O/TFA 60/40/0.10 %), then collected and dried with a speedvac evaporator concentrator. Retention times of the CMPOs of variants A, B and J were in the order A < J < B.

Enzymatic and chemical hydrolysis
Chymosin hydrolysis (E/S: 10⁻⁵) of the whole casein was performed at 37 °C for 20 min in 25 mM citrate buffer, pH 6.5. The reaction was stopped by increasing the pH to 9.0 with NaOH. αs1-Casein H was hydrolysed by

Allelic and haplotypic frequencies
Among the six main lactoproteins, only αs2-casein was not found to be polymorphic with the techniques used. Table I gives the allelic frequencies at the loci of the five other proteins, and table II the frequencies of haplotypes of the casein loci cluster, calculated by the method of Ceppellini et al. [6]. This method assumes Hardy-Weinberg equilibrium, a requirement that was found to be satisfied at all three individual loci in the five populations. The allelic frequencies observed in the N'Dama and Baoulé samples are remarkably similar. To gauge this similarity, the allelic frequencies at the five polymorphic milk protein loci were used to calculate the genetic distances, according to Cavalli-Sforza or Nei, between a total of 23 populations (17 French breeds, 3 African Bos indicus and 3 African Bos taurus populations including N'Dama and Baoulé). Consensus trees were built using the UPGMA method and a bootstrap procedure was carried out (for references of the methods, see Moazami-Goudarzi et al. [26]). Among all pairwise comparisons, the closest distance was indeed observed between the N'Dama and Baoulé, the bootstrap value being as high as 97 % (not shown). In contrast, the frequencies observed in N'Dama and Baoulé differed markedly from those of the two true zebu populations, Shuwa Arab and Madagascar zebu. In N'Dama and Baoulé, αs1-Cn B, β-Cn A1 and κ-Cn B are the most frequent alleles, compared to αs1-Cn C, β-Cn A2 and κ-Cn A in zebus.
Consistently, haplotype BA1B (a simplified designation for αs1-Cn B, β-Cn A1, κ-Cn B) is the most frequent in taurines, in contrast to CA2A in zebus. The values in Sudanese Fulani cattle show a zebu-like pattern, but the rather high frequency of haplotype BA1B may be considered as revealing the influence of Bos taurus genes in the origin of this cattle type. In Kuri, allele αs1-Cn B prevails over αs1-Cn C, which is a taurine feature. The predominant haplotype is, however, CA2A, as in zebus, and overall the Kuri appears as an almost perfect intermediate between taurines and zebus. In contrast with a majority of west European breeds, α-lactalbumin is also polymorphic in the taurines studied here, but the frequencies of α-La A are significantly lower than in zebus.
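As a rough illustration of the distance comparison described above, the following sketch computes Nei's standard genetic distance from allele frequencies. It is not the authors' pipeline: the study used five loci and 23 populations, whereas this example uses only the single pair of frequencies quoted in the abstract, taken here (as an assumption) to be αs1-Cn B and αs1-Cn C. The function name and data layout are hypothetical.

```python
import math

def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    pop_x, pop_y: lists of loci; each locus is a list of allele
    frequencies given in the same allele order for both populations.
    """
    n_loci = len(pop_x)
    jx = sum(sum(p * p for p in locus) for locus in pop_x) / n_loci
    jy = sum(sum(q * q for q in locus) for locus in pop_y) / n_loci
    jxy = sum(
        sum(p * q for p, q in zip(locus_x, locus_y))
        for locus_x, locus_y in zip(pop_x, pop_y)
    ) / n_loci
    return -math.log(jxy / math.sqrt(jx * jy))

# Single-locus frequencies quoted in the abstract, assumed to be
# alpha-s1-Cn B and C (illustration only; the paper used five loci).
ndama = [[0.89, 0.11]]
baoule = [[0.92, 0.08]]
shuwa_arab = [[0.22, 0.78]]

print("N'Dama vs Baoule:    ", nei_standard_distance(ndama, baoule))
print("N'Dama vs Shuwa Arab:", nei_standard_distance(ndama, shuwa_arab))
```

Even with one locus, the N'Dama-Baoulé distance comes out far smaller than the N'Dama-Shuwa Arab distance, which is the qualitative pattern reported in the text.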
The occurrence of three additional variants was suspected at the αs1-Cn and κ-Cn loci. Two of them could be characterized by the biochemical analyses summarized hereafter and were given a regular designation: variant αs1-Cn H was found in 16 Kuri cows (nine C/H, six B/H, one H/H), and variant κ-Cn J in three Baoulé cows (one A/J, two B/J) and in one Fulani cow not belonging to the population sample (B/J) (figure 2).

Characterization of variant αs1-Cn H
RP-HPLC elution patterns of tryptic hydrolysates of αs1-caseins B and H showed the absence of one peak in αs1-casein H as the only difference. The fraction corresponding to this missing peak was identified, by Edman degradation, as peptide 43-58 of αs1-casein B, which suggests that the difference is located in this region (figure 3).
The sequence of the first 52 residues of the αs1-casein H protein was established unambiguously by Edman degradation. This sequence was identical to that of variant B up to residue 50, but the next two residues (51-52) were Gln-Met, instead of Asp-Gln in variant B. This result indicated the deletion, in the H protein, of residues 51-58 (figure 3) since, in the reference variant B, the Gln-Met sequence occurs at positions 59-60. MALDI-MS analysis of the purified CNBr peptides, carried out for confirmation, showed a difference for only one peptide. The molecular mass found for this peptide was 6 300 Da in αs1-Cn B, which corresponds to peptide 1-54 (theoretical mass: 6 285.12 Da).
In the αs1-Cn H variant, the measured molecular mass was 6 114 Da, which is very close to the theoretical value obtained for the sequence 1-52 (6 099.1 Da).
Altogether, these results established the deletion in variant H of the sequence of eight residues (51-58) which is coded by exon 8.
The non-deletion of exon 8 in the αs1-Cn H gene was ascertained by sequencing the PCR amplification product of the corresponding region of the gene, including exons 7 and 8, obtained from the DNA of two heterozygous cows of genotypes αs1-Cn B/H and αs1-Cn C/H. In both cases, only the normal, non-deleted sequence was obtained. This suggested that the deletion of peptide 51-58 was the consequence of exon skipping. Edman degradation of the C-terminal peptide of αs1-Cn H obtained with endoproteinase Asp-N showed that this variant had the same C-terminal sequence as αs1-casein B (Glu in position 192 instead of Gly for αs1-Cn C). Allele αs1-Cn H is thus derived from allele αs1-Cn B.
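The positional reasoning behind the deletion (Gln-Met moving from positions 59-60 to 51-52) can be checked with a small indexing sketch. Only the four residues named in the text are real; every other position is filled with a placeholder `X`, and the helper name is hypothetical.

```python
def delete_residues(sequence, first, last):
    """Remove residues first..last (1-based, inclusive) from a sequence."""
    return sequence[:first - 1] + sequence[last:]

# Stand-in for the first 60 residues of variant B. Only positions 51-52
# (Asp-Gln) and 59-60 (Gln-Met) come from the text; 'X' is a placeholder.
variant_b = ["X"] * 60
variant_b[50:52] = ["Asp", "Gln"]   # positions 51-52
variant_b[58:60] = ["Gln", "Met"]   # positions 59-60

# Model variant H as variant B lacking residues 51-58.
variant_h = delete_residues(variant_b, 51, 58)

print("B, positions 51-52:", variant_b[50:52])
print("H, positions 51-52:", variant_h[50:52])
print("Residues lost:", len(variant_b) - len(variant_h))
```

In this model the Met residue ends up at position 52 of variant H, which agrees with the CNBr peptide 1-52 reported for that variant.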

Characterization of variant κ-Cn J
The patterns observed in figure 2 strongly suggest that κ-Cn J has either one negative charge less or one positive charge more than κ-Cn B. The para-κ-caseins obtained by chymosin hydrolysis of κ-caseins A, B and J showed an identical electrophoretic migration, indicating that the particularity of the type J casein was to be sought in the caseinomacropeptide. RP-HPLC elution patterns of B and J CMPOs digested with S. aureus protease V8 showed a single difference in the fractions (data not shown, available upon request). Partial Edman degradation of the fraction corresponding to the extra peak of type J gave a sequence corresponding to residues 152-164 of κ-casein B, except that Arg replaced Ser at position 155 (Val-Ile-Glu-Arg-Pro-Pro-Glu-Ile-Asn-Thr-Val-Gln-Val OH). The results of carboxypeptidase A (CPA) degradation of κ0-Cn J showed no difference in its C-terminal part, which also corresponds to the C-terminal parts of CMPs and κ-caseins A and B [Thr: 2.68 (3); Ser: 1.25 (1); Ala: 0.67 (1); Val: 2.35 (3); Ile: 1.00 (1); Asn: 1.16 (1); Gln: 0.88 (1)].
In conclusion, the difference in charge between variants κ-Cn B and κ-Cn J is the consequence of a substitution Ser 155 (B) → Arg (J) due to a mutation (AGC → AGA or AGG) having occurred in the κ-Cn B allele. This result is consistent with the order of elution of variants A, B and J from an anion-exchange Mono Q column, as given above (J < B < A).
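The codon changes proposed for position 155 can be checked against the standard genetic code with a short enumeration. This is a minimal sketch, with hypothetical function names, that lists every single-nucleotide change of a codon yielding a given amino acid. Under the standard code it returns AGA and AGG for Ser (AGC) → Arg, as stated in the text, and also CGC, a first-position change that the text does not mention.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    AMINO_ACIDS,
))

def point_mutations_to(codon, target_aa):
    """Codons one nucleotide away from `codon` that encode `target_aa`
    (one-letter amino acid code), in sorted order."""
    hits = []
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == target_aa:
                hits.append(mutant)
    return sorted(hits)

print("AGC encodes:", CODON_TABLE["AGC"])                       # Ser
print("Ser -> Arg candidates:", point_mutations_to("AGC", "R"))
print("Ser -> Gly candidates:", point_mutations_to("AGC", "G"))
```

The same enumeration gives GGC as the only single-nucleotide route from AGC to Gly, matching the change reported for the κ-Cn E variant.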

DISCUSSION
The number of individual milk samples analysed varied markedly among the populations studied. A large number of samples makes the detection of rare alleles and haplotypes possible. This is well exemplified in the Fulani sample, where the BDB, CA1A, CBB and CBA haplotypes are rare recombinants between the αs1-Cn and β-Cn loci. If a large number of samples had been available, such rare recombinants might also have been found in the other populations. Taking into account this imbalance in sample sizes, the comparison of populations should be based only upon the main alleles and haplotypes.
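The dependence of rare-allele detection on sample size can be made concrete with a simple probability sketch. It assumes, for illustration only, that the 2n gene copies carried by n animals are drawn independently from the population; the allele frequency and sample sizes below are hypothetical and are not taken from the tables.

```python
def detection_probability(allele_freq, n_animals):
    """Probability that an allele of frequency `allele_freq` appears at
    least once among the 2 * n_animals gene copies sampled."""
    return 1.0 - (1.0 - allele_freq) ** (2 * n_animals)

# Hypothetical allele at 1 % frequency, for several sample sizes.
for n in (25, 50, 100, 200):
    print(n, "animals:", round(detection_probability(0.01, n), 3))
```

Under these assumptions, an allele at 1 % frequency is more likely to be missed than found in a sample of 25 animals, which is why only the main alleles and haplotypes are compared across populations.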
When comparing the protein polymorphism of geographically distant populations, the risk of homoplasy (different variants having the same electrophoretic behaviour) should be taken into consideration. In their work on Madagascar zebu, Grosclaude et al. [12] ascertained that the amino acid substitutions responsible for the differences in charge between variants αs1-Cn B and C, β-Cn A1 and A2, as well as κ-Cn A and B, were the same in this zebu as in western humpless cattle. In the present study, it was also checked, by DNA typing [17], that the β-Cn A' variant of the N'Dama and the κ-Cn C variant of the Kuri were identical to their European counterparts (data not shown).
Because κ-Cn C occurs at low frequencies in European breeds, its presence in the Kuri was unexpected. Since there is no record of any introduction of European cattle into the Kuri, κ-Cn C may be regarded, like κ-Cn A and B, as being common to European and African cattle, or at least to Bos taurus populations. The extension of research on the polymorphism of milk proteins to the so-far neglected African cattle populations was expected to lead to the discovery of additional variants specific to these populations. Three such unknown variants were observed, and two of them were characterized, αs1-Cn H and κ-Cn J, the amount of casein sample available being insufficient to characterize the third one.
The αs1-Cn H variant is the most interesting, since it is the fifth example of a casein variant having an internal deletion, most likely generated by exon skipping. Except for the D variant of bovine αs2-casein [5], the other examples all concern variants of either bovine or caprine αs1-casein.
The difference in charge between variants κ-Cn B and κ-Cn J is due to a single amino acid substitution, Ser 155 (B) → Arg (J). Interestingly, position 155 is the same as that affected in the κ-Cn E variant [25] but, in this case, the mutation (AGC → GGC) occurred in the A allele, inducing the amino acid substitution Ser 155 (A) → Gly (E).
The most striking result in tables I and II is the exceptional similarity of allelic and haplotypic frequencies in the N'Dama and Baoulé samples. In both cases, sampling was carried out in two experimental herds and not in the field populations, but it is difficult to believe that this procedure, which is not quite appropriate, could account for such a similarity. Frequencies observed in the Shuwa Arab zebu are not very different from those of the Madagascar zebu. The inversion of allelic frequencies between taurines and zebus, already well documented in the literature for the αs1-Cn locus [24, 28], may also be observed here at the β-Cn and κ-Cn loci. The polymorphism of casein loci thus appears to be particularly useful for population studies in the African continent, where, for a long time, taurines have coexisted and intercrossed with zebus. The fact that the frequencies in Sudanese Fulani cattle are intermediate between those of taurines and zebus, and closer to the latter, is in good agreement with the above-mentioned theory about the origins of this cattle type.
The Kuri, which is humpless and has the small metacentric chromosome of Bos taurus [29], is thus considered a taurine population. It was classed by Epstein [7], together with the N'Dama, in the humpless longhorn cattle group, descending from the first domesticated Bos introduced into Africa. Surprisingly, its allelic frequencies are not far from the mean of those of the N'Dama and of the true zebus, except for the κ-casein polymorphism, which is closer to that of zebus. This situation may be due to introgression of zebu genes into the breed.
MacHugh et al. [18] have indeed concluded that there is an east-to-west introgression gradient of microsatellite alleles of Indian Bos indicus into African populations, including sub-populations of the taurine type N'Dama, but the Kuri was not included in their study. As a matter of fact, crossbreeding with Shuwa Arab and M'Bororo zebus is a common practice in the areas fringing Lake Chad, while pure Kuri are restricted to the islands [22]. The frequencies of milk protein polymorphisms were, however, found to be exactly the same in a group of 103 cows considered, on the basis of phenotypic characters, to be pure Kuri, as in a group of 63 cows which, although humpless, could be suspected to carry zebu genes [37]. The possible gene flow between Kuri and zebus is thus not easily detectable in the present conditions. When considering the allelic differences between N'Dama and Kuri, one should remember that these two populations most likely originated from two distinct routes of introduction of domesticated humpless longhorn cattle into Africa. The ancestors of the N'Dama probably spread through northern and north-western Africa, and those of the Kuri through the Sahara during the 'green period'. The genetic differences between N'Dama and Kuri could thus have a quite remote origin. It is conceivable that the Kuri remained genetically closer than the N'Dama to the originally domesticated population of South-West Asia, the common origin of all taurines [7]. The gene frequencies in this original population were probably closer to those of the zebus than are those of the modern taurines of northern Europe and western Africa. This is supported by the distribution of the α-lactalbumin polymorphism in Europe. While the α-lactalbumin polymorphism is the rule in zebus, it is restricted, in Europe, to Podolic breeds, or breeds known to have been crossed with Podolic cattle, all of which are located in southern Europe [21, 24, 28].
Because the longhorn Podolic group is genetically and geographically the closest in Europe to the original domestic population of South-West Asia [20], it can be assumed that α-lactalbumin was polymorphic in this original population, which is more a Bos indicus than a Bos taurus feature. The phylogenetic status of the Kuri will be analysed with more genetic markers in a forthcoming paper.