Serial analysis of gene expression (SAGE) in bovine trypanotolerance: preliminary results

In Africa, trypanosomosis is a tsetse-transmitted disease that represents the most important constraint to livestock production. Several indigenous West African taurine (Bos taurus) breeds, such as the Longhorn (N'Dama) cattle, are well known for their ability to control trypanosome infections. This genetic ability, named "trypanotolerance", results from various biological mechanisms under multigenic control. The methodologies used so far have not succeeded in identifying the complete pool of genes involved in trypanotolerance. New post-genomic biotechnologies such as transcriptome analyses are efficient in characterising the pool of genes involved in the expression of specific biological functions. We used the serial analysis of gene expression (SAGE) technique to construct, from peripheral blood mononuclear cells of an N'Dama cow, two total mRNA transcript libraries: one at day 0 of a Trypanosoma congolense experimental infection and one at day 10 post-infection, corresponding to the peak of parasitaemia. Bioinformatic comparisons against the bovine genomic databases allowed the identification of 187 up- and down-regulated genes, ESTs and genes of unknown function. Identifying the genes involved in trypanotolerance will make it possible to set up specific microarray sets for further metabolic and pharmacological studies and to design field marker-assisted selection by introgression programmes.


INTRODUCTION
In central and sub-Saharan Africa, the most important constraint to livestock production is trypanosomosis. This tsetse-transmitted disease threatens about 60 million cattle and strongly affects their productivity (milk, meat, fertility, draught power, etc.) over 7 million km² spread across 37 countries. Animal trypanotolerance is the genetic ability of some breeds from several mammalian species (such as cattle, small ruminants, pigs, wild buffaloes and antelopes) to live normally and remain productive in tsetse-infested areas [12]. This phenomenon was described in Africa as early as the beginning of the 20th century [4,6,7,29,33,35]. Trypanotolerance results from various biological mechanisms under multigenic control, which relate either to the control of trypanosome infection, as measured by parasitaemia [10,13,22,34], or to the control of the pathogenic effects of the parasites, the most prominent of which is anaemia [1,36-38]. Two different pools of genes are probably involved in determining these two characteristics, and the various methodologies used so far have not succeeded in identifying them. Neither zootechnical studies [8,9,15,16,23], nor quantitative genetics approaches [39], nor the electrophoretic analysis of targeted proteins [31,32], nor MHC typing [21] have brought significant progress in the understanding of trypanotolerance. QTL studies developed more recently, first in mice [17-19] and then in cattle [14], have given more interesting results, but they are restricted to small parts of the cattle genome. Given the limited number of experimental animals used, the confidence intervals of the bovine QTLs are too wide to be useful in a marker-assisted selection (MAS) programme or in a positional candidate approach. The homology between the mouse and bovine genomes is limited, and there is no proof that the same genes are involved in the two species.
Finally, the QTL approach could give information on genes involved in innate immunity but not on those controlling acquired immunity, although both types of genes are involved in the overall trypanotolerance mechanisms. Furthermore, the crossbreeding plan needed to study QTL segregation in cattle is very long and expensive. Several recent biotechnologies allow exhaustive functional analysis through a transcriptomic approach, which is efficient for characterising the full set of genes involved in the expression of specific biological functions. Amongst them, the serial analysis of gene expression (SAGE), which we used in the present work, makes it possible to compare up- and down-regulated genes involved in the control of Trypanosoma congolense infection in N'Dama cattle.

Experimental animals and design
We used one animal of the N'Dama breed, a Longhorn indigenous West African taurine (Bos taurus) well known to be resistant to trypanosomosis. This animal was taken from the field in a heavily tsetse-infested area. A serological test confirmed the presence of specific T. congolense antibodies. Before the beginning of the experiment, the animal was treated against blood parasites (Veriben: diminazene aceturate, 7 mg·kg⁻¹) and gastrointestinal parasites (Vermitan: albendazole, 7.5 mg·kg⁻¹). After a few days of rest, a first blood sample was taken using a PAXgene Blood RNA tube (Qiagen, cat. No 762125), which contains a total RNA preservation medium. This first blood sample, at day 0, was used to develop the first reference SAGE library (D0L) from total white blood cells. The experimental design then consisted of a Trypanosoma congolense infection (Ser/71/STIB/212) by a single syringe inoculation of 8 × 10⁵ parasites [11,25,27,28]. Every couple of days, a parasitological examination of the buffy coat was used to check for the presence of the parasites and to follow the kinetics of their development (Fig. 1). The second blood sample was taken at the peak of parasitaemia, which occurred at day 10, to develop the second reference SAGE library (MPL). The D0L and MPL SAGE libraries were then used in a differential comparison of the genes expressed in this N'Dama animal before and after T. congolense infection.

The SAGE method
The serial analysis of gene expression (SAGE) technique [3,40,41] enhances the power and speed of transcriptome analysis. SAGE generates complete expression profiles of tissues or cell lines, and the results are quantitative and absolute. The technique consists of constructing total mRNA libraries for a quantitative analysis of all the transcripts expressed or inactivated at particular steps of cellular activation. It is based on three principles: (i) a short sequence tag (9-14 bp) obtained from a defined region within each mRNA transcript contains sufficient information to uniquely identify that transcript; (ii) sequence tags can be linked together to form long DNA molecules (concatemers) that can be cloned and sequenced, so that sequencing the concatemer clones quickly identifies numerous individual tags; (iii) the expression level of a transcript is quantified by the number of times its tag is observed.
We used the I-SAGE™ kit from Invitrogen (cat. No T5000-01) to develop our two transcript libraries, D0L and MPL.
Bioinformatic comparisons [30] against several genomic databases (UniGene, TIGR) allowed us, first, to identify the different activated and inactivated tags (known genes, ESTs or unknown genes) and, second, to compare their respective frequencies in the D0L and MPL libraries.
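The three principles of the method can be illustrated by a minimal Python sketch of how tags might be recovered and counted from sequenced concatemers. It is not the software used in this work. It assumes the classic SAGE layout, which the text does not specify: ditags separated by the NlaIII anchoring site CATG, and 10-bp tags. The names ANCHOR, TAG_LEN, extract_ditags and count_tags are hypothetical. Repeated ditags are counted only once, consistent with their exclusion from expression measurements.

```python
from collections import Counter

ANCHOR = "CATG"  # assumption: NlaIII anchoring-enzyme site
TAG_LEN = 10     # assumption: 10-bp tags (the text gives a 9-14 bp range)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_ditags(concatemer):
    """Split a concatemer on the anchor site and keep plausible ditags."""
    parts = concatemer.upper().split(ANCHOR)
    # A ditag carries two tags joined tail to tail; allow a few extra bases.
    return [p for p in parts if 2 * TAG_LEN <= len(p) <= 2 * TAG_LEN + 6]


def count_tags(concatemers):
    """Count tag occurrences, using each distinct ditag only once."""
    seen = set()
    counts = Counter()
    for concatemer in concatemers:
        for ditag in extract_ditags(concatemer):
            # The same ditag may be read from either strand.
            key = min(ditag, reverse_complement(ditag))
            if key in seen:
                continue  # repeated ditag: excluded from the counts
            seen.add(key)
            counts[ditag[:TAG_LEN]] += 1
            counts[reverse_complement(ditag[-TAG_LEN:])] += 1
    return counts


if __name__ == "__main__":
    # Toy concatemer with three ditags, two of them identical.
    toy = ("CATG" + "AAAAAAAAAA" + "GGGGGGGGGG"
           + "CATG" + "AAAAAAAAAA" + "GGGGGGGGGG"
           + "CATG" + "AAAAAAAAAA" + "TTTTTTTTTT" + "CATG")
    for tag, n in count_tags([toy]).most_common():
        print(tag, n)
```

Running one such count per library gives the tag frequencies that are compared between D0L and MPL. A tag that itself contains CATG would be split and discarded by this simple approach.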

RESULTS
The analyses of all identified tags are summarised in Table I. From 4763 sequenced tags, we identified 2281 distinct transcripts, 187 of which were differentially expressed between the D0L and MPL libraries. Contamination by linker sequences was negligible. Repeated ditags (not taken into account for the measurement of expression levels) represented 1.3% of the total ditag population, indicating a high complexity of the original mRNA population.
The tags showing the most significant differences in frequency (P < 0.001) between the D0L and MPL libraries are presented separately for the up-regulated (Tab. II) and down-regulated (Tab. III) transcripts.
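The text does not name the statistical test behind these P-values. Purely as an illustration of how a tag count can be compared between two libraries, the sketch below substitutes a two-sided Fisher's exact test, one common choice for SAGE count data. The function name and the library sizes in the usage example are hypothetical, since the per-library totals appear in Table I and not in the text.

```python
from math import comb


def fisher_exact_two_sided(count1, count2, total1, total2):
    """Two-sided Fisher's exact P-value for one tag in two libraries.

    count1, count2: occurrences of the tag in libraries 1 and 2.
    total1, total2: total number of tags sequenced in each library.
    """
    k = count1 + count2            # occurrences of the tag overall
    denom = comb(total1 + total2, k)

    def pmf(x):
        # Hypergeometric probability that x of the k occurrences
        # fall in library 1 if the tag is equally frequent in both.
        return comb(total1, x) * comb(total2, k - x) / denom

    lo = max(0, k - total2)
    hi = min(k, total1)
    p_obs = pmf(count1)
    probs = [pmf(x) for x in range(lo, hi + 1)]
    # Sum every outcome at least as unlikely as the observed one.
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical example: a tag seen 2 times at day 0 and 25 times
    # at the parasitaemia peak, with assumed library sizes.
    p = fisher_exact_two_sided(2, 25, 2300, 2463)
    print("up-regulated" if p < 0.001 else "not significant", p)
```

Because several thousand tags are tested at once, a strict threshold such as the P < 0.001 used here helps to limit the number of false positives.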
Another informative presentation of these results is a scatter plot (Fig. 2) in which each dot represents a particular tag. In this plot, "Genes" are tags matching well-identified genes; "cDNA/EST" are tags matching anonymous described sequences; "no match" are tags failing to match SAGEmap (rank 1 or 2) or UniGene sequences. Several dots correspond to known genes (immunoglobulins, B and T cell receptors, interleukins, MHC BoLA class I and II, metabolic and ribosomal proteins, etc.) or ESTs, but others correspond to unknown genes. These unknown genes could come from the N'Dama mRNA, but they could also come, to a small extent, from the mRNA of T. congolense parasites. To test this hypothesis, we carried out another bioinformatic comparison of all of these no-match tags against the two available Trypanosoma genome databases (T. brucei and T. cruzi). We identified five expressed genes that indeed originate from the Trypanosoma congolense genome and are probably ubiquitous in the genus Trypanosoma (Tab. IV).
This result opens a promising route for studying the interactions at the host-parasite interface through a parallel comparison of the parasite and host SAGE libraries.
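The matching of no-match tags against parasite sequences can be sketched as follows. This is an illustrative outline under stated assumptions, not the procedure actually applied: it assumes that each tag is the 10 bp immediately downstream of the 3'-most CATG site of a transcript, and the function names and example sequences are hypothetical.

```python
def virtual_tag(transcript, anchor="CATG", tag_len=10):
    """Return the tag expected from a transcript, or None if there is none.

    Assumption: the tag lies just downstream of the 3'-most anchor site.
    """
    seq = transcript.upper()
    pos = seq.rfind(anchor)
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def build_tag_index(transcripts):
    """Map each virtual tag to the names of the transcripts producing it."""
    index = {}
    for name, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            index.setdefault(tag, []).append(name)
    return index


def assign_tags(unmatched_tags, index):
    """Look up each unmatched tag; an empty list means still unassigned."""
    return {tag: index.get(tag, []) for tag in unmatched_tags}


if __name__ == "__main__":
    # Hypothetical parasite transcripts, for illustration only.
    parasite = {
        "tryp_gene_A": "GGGCATGAAACCCGGGTTTAAAA",
        "tryp_gene_B": "CATGAAAAAAAAAACATGTTTTTTTTTTGG",
    }
    index = build_tag_index(parasite)
    print(assign_tags(["AAACCCGGGT", "GGGGGGGGGG"], index))
```

A tag that maps to several transcripts is reported with all of them, since a 10-bp tag does not always identify a single gene.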

DISCUSSION
Amongst the 187 regulated tags, the pool of up-regulated transcripts (Tab. II) included several genes involved in immune mechanisms, which confirms several previous immunological results [2]. The most activated genes were those encoding different chains of immunoglobulin (IgG and IgM) molecules. This confirms their important role in the immune mechanisms involved in the control of trypanosome infections. Indeed, the literature on this topic [2,5,6,13,20,26] is rich in corroborating results indicating that, except for primary infection [24], the ability of resistant animals to control parasitaemia is due to a more efficient specific antibody response. The T-independent responses producing IgM antibodies are sufficient to control the parasitaemia, and IgM are more efficient than IgG as neutralising antibodies at the beginning of the infection [20]. The increase in the serological IgM level and the appearance of parasitaemia are simultaneous, while IgG antibodies generally appear later. IgM are mainly directed against the parasite surface antigens, while IgG are generally directed against the internal antigens [5]. It has often been reported [22] that trypanosomes are responsible for polyclonal proliferation of B and T cells. We confirmed the activation of the genes encoding the B and T cell receptors (Tab. II). Furthermore, the T cell receptor beta cluster is located on bovine chromosome 4 (Bta4) in the 4q3.1 and 4q3.6 region (IDVGA51-TGLA159/MGTG4B), where Hanotte et al. [14] described a QTL strongly associated with the "fewer parasites" trait (PARMLn) in N'Dama. We also found genes encoding cytokines such as interleukins (IL1 and IL10R), confirming their role in the induction of polyclonal cell activation, particularly for IgM antibodies [26]. Finally, MHC class II BoLA-DQB genes seemed to be activated (Tab. II), while other MHC genes of class I and class II (BoLA-DRA and BoLA-DMA) seemed to be down-regulated (Tab. III).
Apart from molecules of the immune system, we identified several up- and down-regulated genes involved in different metabolic pathways (such as the NADH-ubiquinone oxidoreductase chain 1, the bovine profilin or the glutathione peroxidase). Several ribosomal genes were also regulated. Among the regulated ESTs, one up-regulated (TC133588) and three down-regulated (BE236829, BF890336, Bt.57839) ESTs are of interest, but further development of the bovine map will be needed to identify them clearly. The up- and down-regulated unknown tags can be spotted on microarrays for further applications.
These preliminary results, obtained on a single experimental N'Dama animal, need to be reproduced on at least one other individual of this breed. To identify the genes implicated in the trypanotolerance mechanisms, we need to carry out similar differential analyses on at least two individuals from several other cattle breeds: another trypanotolerant taurine breed (Bos taurus), such as the Baoule, and a trypanosusceptible zebu breed (Bos indicus). Comparing the results obtained on the different trypanotolerant and trypanosusceptible cattle will make it possible to identify the pool of genes specifically involved in the control of parasitaemia. In addition, the kinetics of the packed cell volume (PCV) should be monitored in order to collect blood samples for SAGE libraries at the precise time when the PCV increases as a result of efficient mechanisms of anaemia control. This would lead to the constitution of two global pools of genes involved in the trypanotolerance trait, through the control of parasitaemia and/or the control of anaemia, which could be used to set up field marker-assisted selection and specific microarrays for further metabolic and pharmacological studies. Finally, these results could be compared with those of the QTL approach for cross-validation and to identify positional candidate genes useful for future selection/introgression programmes in different cattle breeds.
Comparative SAGE libraries applied to the Trypanosoma congolense parasite should also allow the identification of parasite genes that are specifically up- and/or down-regulated by the host defence mechanisms, with interesting consequences for drug development against animal and human trypanosomoses.