Genetic diversity of a large set of horse breeds raised in France assessed by microsatellite polymorphism

The genetic diversity and structure of horses raised in France were investigated using 11 microsatellite markers and 1679 animals belonging to 34 breeds. Between-breed differences explained about ten per cent of the total genetic diversity (Fst = 0.099). Values of expected heterozygosity ranged from 0.43 to 0.79 depending on the breed. According to genetic relationships, multivariate and structure analyses, breeds could be classified into four genetically differentiated groups: warm-blooded, draught, Nordic and pony breeds. Using complementary maximisation of diversity and aggregate diversity approaches, we conclude that particular efforts should be made to conserve five local breeds, namely the Boulonnais, Landais, Merens, Poitevin and Pottok breeds.


Introduction
During the twentieth century, horse breeding has undergone large changes in Europe. Previously considered as an agricultural, industrial and war tool, the horse is now essentially bred for hobby riding. Draught horses, in particular, have been less and less used as utility horses, and many draught breeds have undergone a dramatic decrease in population size: according to the Haras Nationaux, out of the nine French draught breeds, six have annual births below 1000. Measures for in situ conservation have been applied in France for several years but such measures are in general expensive. Therefore, it would be useful to set conservation priorities, and this requires characterising diversity and genetic relations between breeds [1].
With about 800 000 animals belonging to 50 different breeds (source: Haras Nationaux), France shows a large diversity of horse populations. Among these breeds, 21 have a French origin or have been bred in France for at least a century. According to the FAO, at least 15 populations have disappeared during the last 50 years, and eight indigenous breeds are still considered as endangered or endangered-maintained. Among those breeds, the majority are draught breeds, namely the Ardennais, Auxois, Boulonnais, Poitevin and Trait du Nord breeds, the other ones being the Merens warm-blooded breed and the Landais and Pottok pony breeds. Information on the genetic diversity of French endangered breeds could help breeders and providers decide where they should place more emphasis.
In the present study, we first analysed the genetic diversity of 39 horse populations reared in France: within-breed diversity, breed relationship and population structure were investigated, using microsatellite data. Then, we focussed on 19 breeds of French origin or having been raised in France for at least a century, and evaluated the conservation priorities between these populations, using different approaches to evaluate within, between and total diversity.

Populations sampled and microsatellite analysis
French nomenclature divides horse breeds into three groups: warm-blooded, draught horses and ponies. In this study, 39 populations were considered (Table 1). These 39 populations comprised 31 recognised breeds (including 13 warm-blooded breeds, nine draught breeds, and nine pony breeds), the primitive Przewalski horse (used as an outgroup), and seven populations originating from the splitting of two recognised breeds, namely the Anglo-Arab (AA) and Selle Français (SF) breeds (divided into four and three groups, respectively). The 2005 studbook rules define those groups according to the proportion of foreign genes that can be found from genealogical analysis: AA6 and AA9 are considered as pure AA, whereas AA5 and AA10 can have ancestors from another origin, the proportion of Arab origin being higher for AA5 and AA6 than the others. SF8 has a large proportion of PS origin and can therefore be used to produce AA, SFA97 constitutes a group closed to direct foreign influences, whereas SFB98 individuals can have a parent from another breed (under some conditions).
For each of the 39 populations, 23 to 50 animals born between 1996 and 2005 were sampled, amounting to 1679 animals in total. Except for the Przewalski horse, where no pedigree data was available, the sampled animals were known to have no common parents. For the conservation approach, the study focussed on 19 populations, either of French origin, or having been bred in France for at least 100 years (PS, AA and AR breeds). In this approach, 50 animals were randomly sampled among the four AA and the three SF subpopulations, respectively, to constitute two populations.

Statistical analysis
Allele frequencies, mean number of alleles (MNA), observed (Ho) and non-biased expected heterozygosity (He) were calculated using GENETIX [13]. Wright's Fis, Fit and Fst coefficients were also computed using the same software. GENEPOP [14] was used to evaluate pairwise genetic differentiation between breeds [15] and departure from Hardy-Weinberg equilibrium, using exact tests and sequential Bonferroni correction [16] on loci. Global tests on Hardy-Weinberg equilibrium were also performed using GENEPOP. Allelic richness was computed using FSTAT [17].
The matrix of Reynolds unweighted distances D R [18] was computed using POPULATION (Olivier Langella; http://bioinformatics.org/~tryphon/populations/). Based on the D R distance, a NeighborNet tree was drawn using SPLITSTREE 4.8 [19]. A factorial correspondence analysis (without the Przewalski horse) was also performed using GENETIX. Finally, the genetic structure of the populations was assessed using Bayesian clustering methods developed by Pritchard (STRUCTURE, [20]): using a model with admixture and correlated allele frequencies, we made 20 independent runs for each value of the putative number of sub-populations (K) between 1 and 22, with a burn-in period of 20 000 followed by 100 000 MCMC repetitions. Pairwise similarities (G) between runs were computed using CLUMPP [21].
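As an illustration of the within-breed statistics listed above, the following minimal sketch computes Ho and a non-biased He for a single locus. It assumes that the non-biased estimator is Nei's small-sample correction 2n/(2n - 1), and it uses the simple ratio 1 - Ho/He as a stand-in for Fis; the estimators actually implemented in GENETIX may differ, and the function names and toy genotypes are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, non-biased He) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid animal.
    """
    n = len(genotypes)
    # Observed heterozygosity: share of animals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies over the 2n sampled gene copies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [count / (2 * n) for count in counts.values()]
    # Assumed small-sample correction: multiply 1 - sum(p^2) by 2n / (2n - 1).
    he = (2 * n) / (2 * n - 1) * (1 - sum(p * p for p in freqs))
    return ho, he


def simple_fis(ho, he):
    """Heterozygote deficit (positive) or excess (negative), as 1 - Ho/He."""
    return 1 - ho / he


# Toy data (hypothetical): four animals typed at one microsatellite locus.
toy = [(1, 1), (1, 2), (2, 2), (1, 2)]
ho, he = heterozygosity(toy)
print(ho, he, simple_fis(ho, he))
```

In practice the per-locus values would be averaged over the 11 markers for each breed.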
To evaluate the conservation priorities in a set of populations, taking into account contributions to within-population and between-population genetic diversity, Ollivier and Foulley [12] have proposed the following method. First, the between-breed contribution (CB) is evaluated, based on the Weitzman [22] loss of diversity Vk when population k is removed from the whole set of breeds (in this study we used the D R distance). Then, the within-breed contribution (CW) is defined as:
$CW = 1 - H(S/k) / H(S)$ (1)
where H(S) is the average internal heterozygosity of the whole set S and H(S/k) the average internal heterozygosity of the set when k is removed. Finally, the aggregate diversity D of a population is defined as:
$D = F_{ST} \cdot CB + (1 - F_{ST}) \cdot CW$ (2)
The cryopreservation potential (CP) can be computed as the product of the between-breed contribution (CB) and the probability of extinction ($P_{ex}$) of the breed, assumed to be directly proportional to the inbreeding rate (ΔF). Following Simianer et al. [23], $P_{ex}$ can be approximated as:
$P_{ex} = c \, \Delta F = c / (2 N_e) = c (M + F) / (8 M F)$ (3)
where Ne is the effective population size, M and F are the numbers of breeding males and females, respectively, used inside the breed in 2005, and c is a constant to be chosen. Considering that the effective population size of a breed should not be lower than 50 to avoid extinction in the short term [24], we considered that $P_{ex}$ = 1 for Ne = 50. Therefore, c was set to 100 (see equation 3).
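The quantities described above can be sketched numerically as follows. This is a minimal illustration under stated assumptions: Ne is taken as 4MF/(M + F) and ΔF as 1/(2Ne), which is consistent with setting c = 100 so that the extinction probability equals 1 at Ne = 50; CW is taken as 1 - H(S/k)/H(S); and the aggregate diversity is assumed to weight CB by Fst and CW by (1 - Fst). The function names and the example numbers are hypothetical and are not taken from Table 3.

```python
def effective_size(males, females):
    """Effective population size from numbers of breeding males and females."""
    return 4 * males * females / (males + females)


def extinction_probability(males, females, c=100.0):
    """P_ex assumed proportional to the inbreeding rate: c * 1 / (2 * Ne)."""
    delta_f = 1 / (2 * effective_size(males, females))
    return c * delta_f


def within_contribution(h_all, h_without_k):
    """CW: relative change in mean heterozygosity when breed k is removed."""
    return 1 - h_without_k / h_all


def aggregate_diversity(cb, cw, fst):
    """D: between- and within-breed contributions weighted by Fst (assumed)."""
    return fst * cb + (1 - fst) * cw


def cryopreservation_potential(cb, males, females, c=100.0):
    """CP: between-breed contribution times extinction probability."""
    return cb * extinction_probability(males, females, c)


# Hypothetical breed with 25 breeding males and 25 breeding females: Ne = 50.
print(effective_size(25, 25))          # 50.0
print(extinction_probability(25, 25))  # 1.0 when c = 100
```

Note that, with this approximation, breeds with Ne below 50 obtain a value above 1, so the quantity is better read as a risk index than as a strict probability.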
Caballero and Toro [11] have developed a parallel approach. The total diversity GD T can be considered as the exact sum of the gene diversity within populations GD WS and the gene diversity between populations GD BS, considering the following equations:
$GD_T = 1 - \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} / n^2$ (4)
$GD_{WS} = 1 - \sum_{i=1}^{n} f_{ii} / n$ (5)
$GD_{BS} = \sum_{i=1}^{n} \sum_{j=1}^{n} D_{ij} / n^2$ (6)
where n is the number of populations, f ij is the average coancestry between populations i and j, and D ij is the Nei minimum distance between populations i and j. The contribution of a population to the diversity is evaluated by computing the loss or gain of diversity ΔGD when the population is removed.
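The decomposition described above can be checked numerically with a short sketch. It assumes that the Nei minimum distance is D ij = (f ii + f jj)/2 - f ij, in which case GD WS + GD BS equals GD T exactly. The coancestry matrix below is a made-up three-population example, and the function names are hypothetical.

```python
def gene_diversity(f):
    """Return (GD_WS, GD_BS, GD_T) from an n x n coancestry matrix f."""
    n = len(f)
    # Within-population diversity: one minus the mean self-coancestry.
    gd_ws = 1 - sum(f[i][i] for i in range(n)) / n
    # Between-population diversity: mean Nei minimum distance over all pairs,
    # assuming D_ij = (f_ii + f_jj) / 2 - f_ij.
    gd_bs = sum(
        (f[i][i] + f[j][j]) / 2 - f[i][j] for i in range(n) for j in range(n)
    ) / n ** 2
    # Total diversity: one minus the mean coancestry over all pairs.
    gd_t = 1 - sum(f[i][j] for i in range(n) for j in range(n)) / n ** 2
    return gd_ws, gd_bs, gd_t


def loss_or_gain(f, k):
    """Change in (GD_WS, GD_BS, GD_T) when population k is removed."""
    keep = [i for i in range(len(f)) if i != k]
    reduced = [[f[i][j] for j in keep] for i in keep]
    full = gene_diversity(f)
    part = gene_diversity(reduced)
    return tuple(p - q for p, q in zip(part, full))


# Hypothetical coancestry matrix for three populations.
f_toy = [
    [0.30, 0.20, 0.10],
    [0.20, 0.35, 0.15],
    [0.10, 0.15, 0.40],
]
print(gene_diversity(f_toy))
print(loss_or_gain(f_toy, 2))
```

A negative third value returned by `loss_or_gain` means that total diversity decreases when the population is removed, i.e. that the population contributes positively to GD T.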
The authors have also proposed to evaluate the contributions (c i ) of each population that maximise the total diversity at the next generation, using the following equation:
$GD_{TN} = 1 - \sum_{i=1}^{n} c_i \left[ f_{ii} - \sum_{j=1}^{n} D_{ij} c_j \right]$ (7)
The contributions can be computed by maximising GD TN in equation (7), with the following restrictions: for each population i, c i ≥ 0 and Σ i c i = 1.
Table 1 footnotes: a W = warm-blooded horse, D = draught horse, P = pony, Pr = primitive horse; b France = breeds of French origin or raised in France for at least 100 years; other countries = country of origin for breeds raised in France for less than 100 years; c In brackets, number of individuals of each AA and SF subpopulation used when aggregating the four and three subpopulations, respectively.
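The constrained maximisation of GD TN over the contributions c i can be sketched as follows. This is not the authors' implementation: it assumes that GD TN can be written as 1 - Σ i Σ j c i c j f ij (which follows if D ij = (f ii + f jj)/2 - f ij and the c i sum to one), and it uses a plain Frank-Wolfe iteration with exact line search as a stand-in for whatever solver was used. The coancestry values are hypothetical.

```python
def optimal_contributions(f, iterations=5000):
    """Maximise 1 - c'Fc subject to c_i >= 0 and sum(c_i) = 1.

    f: symmetric n x n coancestry matrix (list of lists).
    Returns (contributions, GD_TN at those contributions).
    """
    n = len(f)
    c = [1.0 / n] * n  # start from equal contributions
    for _ in range(iterations):
        fc = [sum(f[i][j] * c[j] for j in range(n)) for i in range(n)]
        # Vertex of the simplex with the smallest mean coancestry with the pool.
        s = min(range(n), key=lambda i: fc[i])
        d = [(1.0 if i == s else 0.0) - c[i] for i in range(n)]
        slope = sum(d[i] * fc[i] for i in range(n))
        if slope >= -1e-15:
            break  # no direction lowers the pooled coancestry any further
        fd = [sum(f[i][j] * d[j] for j in range(n)) for i in range(n)]
        curvature = sum(d[i] * fd[i] for i in range(n))
        # Exact line search along d, clipped so that c stays in the simplex.
        step = 1.0 if curvature <= 0 else min(1.0, -slope / curvature)
        c = [c[i] + step * d[i] for i in range(n)]
    pooled = sum(c[i] * f[i][j] * c[j] for i in range(n) for j in range(n))
    return c, 1 - pooled


# Hypothetical two-population example.
contributions, gd_tn = optimal_contributions([[0.3, 0.1], [0.1, 0.5]])
print(contributions, gd_tn)
```

Because each update is a convex combination of the current vector and a vertex of the simplex, both restrictions hold at every iteration, and a population can end with a contribution of exactly zero.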

Genetic variations
One hundred and nine alleles were found over all populations and all markers. The average number of alleles per locus was 9.8, ranging from seven (loci HTG4 and HMS1) to 15 (locus ASB2). Some alleles that were rare in the whole data set were found with a high frequency in the PRW population: for instance, at the HTG6 locus, the two most frequent alleles in the PRW population (70%) were seldom found in other breeds (less than 1%). Heterozygosities, mean number of alleles (MNA) and allelic richness (AR) are presented in Table 2. MNA and AR were highly correlated (r = 0.98, P < 0.0001). He ranged from 0.43 in the FRI breed to 0.79 in the PFS breed, while Fis per breed ranged from -0.08 (TDN breed) to 0.11 (PRE breed).
After correction, some significant heterozygote deficits were found for different loci and populations (see Table 2). Only one test exhibited a significant excess (AA5 with HMS1). Using global tests, five populations (AB, AR, AUX, CAM, PRE) and two markers (HMS3 and HTG10) showed a significant deficit in heterozygotes (P < 0.01). Other studies have shown similar results for these two markers [4].
Testing population differentiation, 11 pairs of populations out of the 741 tests performed were found not to be significantly differentiated, including AA5 with AA6 and AA9 with AA10. The Fis, Fit, and Fst values were 0.019, 0.116 and 0.099, respectively. We found a gene differentiation coefficient G ST [25] of 0.0989.
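Two of the figures reported above can be cross-checked with elementary arithmetic: the number of pairwise tests among 39 populations, and the standard identity linking Wright's coefficients, (1 - Fit) = (1 - Fis)(1 - Fst). The short sketch below only re-derives these numbers from the reported values; the function name is hypothetical.

```python
from math import comb


def fit_from(fis, fst):
    """Fit implied by Fis and Fst through (1 - Fit) = (1 - Fis)(1 - Fst)."""
    return 1 - (1 - fis) * (1 - fst)


# Number of distinct population pairs among the 39 populations.
n_pairs = comb(39, 2)

# Fit implied by the reported Fis = 0.019 and Fst = 0.099.
implied_fit = fit_from(0.019, 0.099)

print(n_pairs, round(implied_fit, 3))
```

The implied Fit agrees with the reported value of 0.116 to three decimals, and the pair count matches the 741 tests performed.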

Breed relationships and clustering
The NeighborNet network (Figure 1) clearly separated draught horses (also including the MER and HAF breeds) and warm-blooded horses, whereas most pony breeds were placed between these two groups. Nordic (IS, SHE, FJ) breeds formed a separate group. FRI and PRW populations were isolated from the other breeds, the closest groups being draught horses and Nordic breeds, for the FRI breed and PRW population, respectively.
Figure 1. Neighbour-Net for the 39 horse populations, based on Reynolds D R distance.
In Figure 2, the 38 populations (PRW being excluded) were placed according to the two main axes of the correspondence analysis (accounting for 27.4% and 11.5% of the inertia, respectively). Figure 3 shows the assignment of populations to clusters for each K, using runs having the highest pairwise similarity coefficients.
For K = 2, there was a clear separation between draught and warm-blooded horses, with other populations showing intermediate results. When K reached 3, Nordic/primitive breeds, ponies, and some warm-blooded horses segregated more or less clearly from the two other clusters.
Figure 2. Correspondence analysis of allele frequencies for 38 of the populations studied (PRW is not included). The projection is shown on the first two axes.
As K increased to 4 and 5, the five clusters consisted of Nordic/primitive breeds, draught horses, ponies, warm-blooded populations close to the AR breed and warm-blooded populations close to the PS breed. Some breeds were shared among the last three clusters, such as LAND between ponies and AR groups, and APPAL among the three clusters. When K reached 6, depending on the runs, FRI or PRW populations were alternately isolated, which led to a decrease of similarity across runs and explains the low similarity coefficient (0.562) in comparison with other values of K. When K = 7, these two populations were both isolated. The different runs highlighted differences among the sub-populations of the AA and SF breeds, indicating a larger proportion of AR genes in the AA5 and AA6 groups and in the SFA97 and SFB98 groups, respectively. Some warm-blooded (FRI until K = 6, MER) and pony breeds (HAF) were classified with draught horses, while the CAM warm-blooded breed was clustered with ponies. As K reached 15, most breeds were shared among different clusters. The ARD, AUX and TDN breeds constituted a single cluster while FJ/IS and LUS/PRE constituted two others. In a few cases, a single cluster was essentially associated with a single breed (BOUL, FRI, SHE, PRW).

Partition of diversity
In the set of the 19 French breeds, we found a gene diversity within populations GD WS of 0.685, a gene diversity between populations GD BS of 0.073, and a total gene diversity GD T of 0.758. Table 3 shows the between-breed, within-breed, and total contributions to, or variations of, diversity according to the approaches of Ollivier and Foulley [12] and Caballero and Toro [11]. For within-breed diversity, CW and ΔGD WS ranged from -0.48 to 0.50 and from -0.0055 to 0.0069, respectively. In both cases, the POIT breed showed a particularly low within-breed diversity. CW and ΔGD WS were negatively correlated (r = -0.715, P = 0.001). For between-breed diversity, CB and ΔGD BS ranged from 0.85 to 12.60 and from -0.0041 to 0.0024, respectively. Here, the POIT breed showed a particularly high contribution to the between-breed diversity. The correlation between CB and ΔGD BS was not significant. D and ΔGD T , accounting for total diversity, were negatively correlated (r = -0.53, P < 0.019). They ranged from -0.32 to 1.25 and from -0.0042 to 0.0039, respectively. In both cases, the ARD and PS breeds showed a particularly low and high diversity, respectively.
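As a quick consistency check on the values reported above, the three gene diversities satisfy the additive decomposition, and the between-population share of the total can be read off directly. The ratio below is a simple illustration computed from the rounded values for the 19-breed subset; it is not the same estimator as the Fst reported for the 39 populations.

```latex
\begin{align*}
GD_{WS} + GD_{BS} &= 0.685 + 0.073 = 0.758 = GD_T \\
\frac{GD_{BS}}{GD_T} &= \frac{0.073}{0.758} \approx 0.096
\end{align*}
```

Under this reading, roughly one tenth of the total gene diversity in the 19 breeds lies between populations, which is of the same order as the between-breed share found in the whole data set.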
Considering contributions to the between-breed diversity and probabilities of extinction, the BOUL, LAND and POIT breeds showed the highest cryopreservation potentials (2.95, 2.95 and 4.83, respectively).
Figure 3. Cluster assignment of each of the 39 populations to the K clusters.
Contributions of each population for an optimal GD T are given in Table 3: the composite PFS breed should contribute 70% of the pool, for a total GD T of 0.79. Besides, to maximise the total gene diversity, seven of the 19 breeds should be maintained, namely the BOUL, COBND, LAND, PFS, POT, PS and SF breeds.

Gene diversity and genetic relations among breeds
Differences between breeds explained 10% of the total genetic variation, which is quite similar to other analyses, where values ranged from 8% to 15% [2,3,4,9]. In previous studies using microsatellites, expected heterozygosities ranged from 0.47 for the FRI breed [6] to 0.80 for the Sicilian Indigenous breed [6]. In our study, only one result was found outside this range of values: 0.43 for the FRI breed, i.e. close to the value found by Luis et al. [6]. Plante et al. [9] recently analysed 22 Canadian and Spanish populations. Our estimated values of He were slightly lower (0.71 on average vs. 0.75, P = 0.048) for the eight breeds shared between their study and the present one. Differences in within-breed diversity among studies using microsatellites can be explained, on the one hand, by the loci used and, on the other hand, by the populations analysed, which may belong to the same breeds but have different recent histories. In the AR breed, we found a He value of 0.72 with a significant deficit of heterozygotes, which can be explained by the fact that this is an international breed in which mating between close relatives is common [26]. Plante et al. [9] and Luis et al. [6] have found similar results for the same breed, but not Aberle et al. [2], who observed a lower heterozygosity (0.57) without a heterozygote deficit. The PER population seemed to have a particularly high genetic diversity in the Plante study (He = 0.78), in comparison with the French PER population (He = 0.68). Because PER populations have been bred in America since the end of the 19th century, such results should be interpreted bearing in mind that the French PER population has probably suffered from recent bottlenecks due to several modifications of the selection aims.
The three approaches based on genetic relationships (genetic distances, FCA and clustering methods) gave similar results. The populations considered in the present study can be classified into four more or less differentiated clusters: warm-blooded, draught, Nordic and pony breeds. Similar patterns of clustering have been found in other studies [2,3,9,10]. The draught horses constitute a quite homogeneous group, including the nine French draught horse breeds and three breeds presently classified as pony (HAF) or warm-blooded (MER and, to a lesser extent, FRI) breeds. These three breeds were historically used as draught horse breeds and could therefore have been subject to crossbreeding with other draught horse populations in their past history. Pony breeds formed a group in an intermediate position in comparison to the other clusters. It also included the CAM breed, today recognised as a warm-blooded breed, but morphologically considered as a pony [27]. According to our analysis, FRI and PRW populations were found to be genetically isolated, which can be, to some extent, linked to a low genetic variability [28] due to historical bottlenecks within these breeds [2,29]. Moreover, another parameter explaining the isolation of the PRW breed is the presence of rare alleles, which was in agreement with other studies [2] and expected for a population considered as a primitive wild horse.
Table 3 legend: CW = contribution to within-breed diversity; CB = contribution to between-breed diversity; D = aggregate diversity; CP = cryopreservation potential; ΔGD WS = loss or gain of gene diversity within populations when the breed is removed; ΔGD BS = loss or gain of gene diversity between populations when the breed is removed; ΔGD T = loss or gain of total diversity when the breed is removed; C i = contribution of the breed to optimise GD T.
Population differentiation tests and Bayesian approaches indicate clear differences between sub-populations of AA and SF. Such results may be largely explained by differences in the proportion of thoroughbred (PS) origins in the gene pool of these sub-populations. Within the AA breed, AA5 and AA6 populations appeared distinct from AA9 and AA10 populations and close to the PS breed. This was in agreement with the studbook rules: on the basis of pedigree data, AA5, AA6, AA9 and AA10 populations were indeed found to have 94%, 89%, 44% and 59% of genes from PS origin, respectively (Sophie Danvy, personal communication). Within the SF breed, the SF8 (not differentiated from the PS breed) was distinct from SFA97 and SFB98 populations. This result was in agreement with previous results from pedigree data [30]: the SF8 was found to have 98% of genes from PS origin. The three draught breeds ARD, AUX and TDN were found to be quite similar, which is linked to a common historical and geographical origin (north of France) [27]. Iberian breeds (LUS and PRE) were also found to be genetically quite close. These results, and the fact that according to Bayesian approaches the likelihood became stable before K reached the number of breeds, indicate that the most relevant division is situated at a level above that of the breeds [31]. Such a subdivision of the whole set can be explained by the existing crossbreeding management system in several horse populations.

Conservation priorities
In the present study, an almost comprehensive sampling of French breeds was achieved. The different approaches used gave an estimation of the contribution of each breed to the whole French horse stock. Petit [32] has proposed allelic richness as a good parameter to evaluate the genetic diversity of a population, useful as an indicator of past bottlenecks [33]. In our study, the POIT breed was found to have the lowest allelic richness and also one of the lowest within-breed contributions to diversity according to the two other methods used in the study. Because of its strong correlation with the mean number of alleles, allelic richness seemed to be of limited additional value in our study.
The results given by the aggregate diversity and gene diversity approaches were slightly correlated. By definition, breeds with low contributions to aggregate and total diversities should have related breeds in the data set. Thus, ARD, TDN, and AUX breeds, which were genetically highly related, illustrate quite well such a hypothesis.
According to the approaches of Ollivier and Foulley [12] and Caballero and Toro [11], populations that contributed a lot to the total diversity were mostly non-endangered breeds (AR, PS, SF, TF). There were, however, some differences between the two methods when considering the eight breeds classified as endangered or endangered-maintained by the FAO (ARD, AUX, BOUL, LAND, MER, POIT, POT, TDN). Using the approach of Ollivier and Foulley [12], contributions to aggregate diversity D of BOUL, MER and POIT breeds were quite high, and taking into account population size, CP was the highest for BOUL, LAND and POIT breeds. Using the approach of Caballero and Toro [11], GD T decreased only when LAND and POT breeds were removed, and those two breeds plus the BOUL breed should have been kept to optimise GD T . The differences can be explained by the methods used in the two approaches, particularly considering the evaluation of the contributions to between-breed diversity. Using the approach of Caballero and Toro [11], some Weitzman criteria, such as the twin property [22], were not applied: for instance, assuming that two populations are genetically identical but very different from the whole set, removing one of them will largely decrease GD BS , which will not be the case when using the Weitzman approach. However, one advantage of the approach of Caballero and Toro [11] is the fact that there is no need to give weights to within- and between-breed diversities to compute total diversity, since by definition GD T is the sum of GD WS and GD BS . In fact, our results outline that both approaches should be considered as complementary to identify which breeds have to be taken into account in a context of genetic resource management. Therefore, conservation priorities should concern particularly BOUL, LAND, MER, POIT and POT breeds.
Another advantage of the method of Caballero and Toro [11] is the possibility of computing the contribution of each population to optimise total diversity. Such an approach was designed to conserve a large diversity of alleles. Therefore, it is not surprising to notice that the three breeds (PFS, SF, BOUL) that should have the highest contribution to optimise genetic diversity represent the three identified genetically differentiated groups. The importance of the PFS breed is due to the fact that this synthetic pony breed has the largest number of alleles. SF, another composite breed, has a smaller variability but carries alleles representative of the warm-blooded breed group, while the BOUL breed carries alleles seldom present in the two other breeds but frequent in draught horses.
Finally, several considerations have to be taken into account before taking final conservation decisions [34], such as the special range of performances for given traits, current production systems associated with the breed, sociocultural value, or dynamics of the group of breeders. Between 1998 and 2003, births remained more or less stable for BOUL, LAND, POIT and POT breeds, but decreased for the MER breed [35]. In the endangered breeds, specific uses should be supported to maintain a demand for such horses (production of mules for the POIT breed, ecotourism for local breeds, draught activities, meat production). Genetic variability should also be managed, especially since some of these breeds constitute a pool of original genes (BOUL, MER and POIT) (see Figure 3). For instance, sires with different origins should be used [36]. When populations of the same breed are raised in other countries (such as the POT breed in Spain [31]), regular exchanges should be organised between both countries to maintain a relatively large variety of breeding animals.

Conclusion
Based on this study, horse breeds raised in France can be clustered into four groups. These groups were found to be meaningful according to the use of breeds, morphological characteristics and/or geographical origins. The combined use of different methods allowed us to identify breeds for which conservation efforts should be a priority, in order to preserve the maximum genetic variability. Since several horse studies have used similar panels of markers [7,9,10], it would be interesting to merge the corresponding data.