Genetic diversity of eleven European pig breeds

A set of eleven pig breeds originating from six European countries, and including a small sample of wild pigs, was chosen for this study of genetic diversity. Diversity was evaluated on the basis of 18 microsatellite markers typed over a total of 483 DNA samples collected. Average breed heterozygosity varied from 0.35 to 0.60. Genotypic frequencies generally agreed with Hardy-Weinberg expectations, apart from the German Landrace and Schwäbisch-Hällisches breeds, which showed significantly reduced heterozygosity. Breed differentiation was significant as shown by the high among-breed fixation index (overall FST = 0.27), and confirmed by the clustering based on the genetic distances between individuals, which grouped essentially all individuals in 11 clusters corresponding to the 11 breeds. The genetic distances between breeds were first used to construct phylogenetic trees. The trees indicated that a genetic drift model might explain the divergence of the two German breeds, but no reliable phylogeny could be inferred among the remaining breeds. The same distances were also used to measure the global diversity of the set of breeds considered, and to evaluate the marginal loss of diversity attached to each breed. In that respect, the French Basque breed appeared to be the most "unique" in the set considered. This study, which remains to be extended to a larger set of European breeds, indicates that using genetic distances between breeds of farm animals in a classical taxonomic approach may not give clear resolution, but points to their usefulness in a prospective evaluation of diversity.

Keywords: genetic diversity / molecular marker / conservation / pig / European breed

INTRODUCTION
Europe contains a large proportion of the pig world population (circa 30%) as well as of the pig world genetic diversity (37% of the breeds included in the FAO inventory, according to Scherf [25]). However, the European pig industry relies predominantly on a limited number of breeds, since one single breed, the widely known Yorkshire (Large White in many countries), represents about one third of the slaughter pig's gene pool of the European Union. Europe thus needs sources of novel genetic variation in order to improve commercial lines, as exemplified by the Chinese Meishan breed included in several synthetic lines. Also, novel genetic variants may be needed in order to respond to changes in consumer demand or to be integrated in sustainable agricultural systems.
Conservation programmes, using both in situ and ex situ techniques, are already under way in several European countries. In particular, gene banks are currently being developed, though there are few for the pig. The need for quantifying biodiversity in order to better rationalize conservation policies is recognized (see Weitzman [32]).
In order to facilitate and rationalize the maintenance of pig genetic diversity, it is essential that simple assays be quickly developed taking advantage of the molecular genetics tools now available. Such tools have recently been developed through progress made in genome studies and genotyping technologies. Major contributions to the making of genetic maps have been made through the EC-co-ordinated Pig Gene Mapping Project (PiGMaP) over the period 1991-1996 (Archibald et al. [2]). In the second phase of this project, covering the period 1994-1996, a pilot study on genetic diversity was planned (Archibald [1]), along the recommendations made in 1993 to FAO by a working group (Barker et al. [4]). The results obtained are presented in this paper, and conclusions for further investigations are discussed.

The breeds sampled
In order to sample the European pig diversity, an initial set of 12 breeds belonging to 7 different countries was identified and animals were selected according to the following sampling protocol. In large breeds, the sampling objective was 50 animals (25 males, 25 females) unrelated at the grandparental level. For smaller breeds, as this was often not possible, the objective was a male and a female from each of 25 litters, each litter being farrowed by a different female, and the 25 litters representing as many different sires as possible. The 7 laboratories involved in the study were responsible for blood collection and preparation of the DNA samples in the breed(s) of their respective countries.
The 12 breeds of the study are listed in [1] (Tab. of p. 200). The Tamworth breed was eventually not sampled, and the remaining set in this analysis therefore included 11 breeds, originating from 6 countries. Table I gives the list of those breeds, the codes used in the following presentation and the sizes of the samples. It can be seen that the objective of 50 pigs per breed was only reached (or approached) in the first 8 breeds of Table I. It should also be mentioned that the Wild Pig sample provided by Sweden (SEWP) came from wild animals hunted in Poland. For that reason, this population could not be sampled according to the rules applied in domestic breeds. Finally a total of 483 DNA samples were collected (see Tab. I).
General information on those breeds is entered in the Animal Genetic Data Bank of the European Association for Animal Production (EAAP-AGDB). This information may be found in [26] and at http://www.tiho-hannover.de/einricht/zucht/eaap/index.htm. Similar information may be found in the FAO Domestic Animal Diversity Information System (DAD-IS: see [25] and http://www.fao.org/dad-is/).

The panel of microsatellite markers selected and the typings
A panel of microsatellite markers was selected by D. Milan (INRA) and M. Groenen (WAU), following the FAO recommendations for diversity analyses [4], and further approved by the FAO-ISAG Advisory Committee for genetic distance studies. The markers were chosen for their quality, polymorphism, and absence of null alleles at the time of selection. At least one marker on each chromosome was selected, apart from chromosome 18 (see Tab. II). When two markers were on the same chromosome, they were chosen with a minimal distance of 30 cM (for more information on the panel see http://www.toulouse.inra.fr/lgc/pig/panel.html). Table II also gives the numbers of alleles per locus in this set, which are on average markedly above those found in the reference families of [2] and [23].
The typings of the DNA samples were distributed among the five following laboratories: Castanet-Tolosan (Toulouse) for the four FR breeds and the BEPI, Wageningen for the NLLW, Hohenheim (Stuttgart) for the two DE breeds, Copenhagen for the DKSO and Uppsala for the SELR and SEWP breeds. All laboratories used automated ABI sequencers with fluorescent dyes, apart from the Hohenheim Laboratory where an ALF automated sequencer was used.
For further standardization of genotypes, 4 control animals were analysed either on the same gels (FR, BE, NL, DK, SE), or on control gels (DE). These 4 animals were chosen from the PiGMaP reference families [2], namely 2 French F1 animals from a Large White × Meishan cross and 2 Swedish F1 animals from a Wild Pig × Large White cross. Moreover, to avoid differences in primer synthesis, all laboratories used primers from a single synthesis provided by Max Rothschild (Ames, Iowa). Raw data (allele size) were collected in Toulouse for identification of genotypes (allele reference sizes are available at http://www.toulouse.inra.fr/lgc/pig/panel/refsize.htm).
In spite of the standardization, it was not always possible to unambiguously identify the genotypes analysed in 5 different laboratory conditions. Thus the number of genotypes identified was generally variable across breeds and loci, and the genotype could not be determined for some breed-marker combinations (see Tab. II). In particular, genotypes could not be unambiguously identified for 7 markers (SW72, S0228, S0101, S0386, S0068, S0355, SW936) in DELR and DESH. In addition, the CGA locus exhibited very long alleles that could not be resolved in most breeds and also had to be discarded. As a result, only 18 loci could be used for comparing the breeds. Finally, out of the 483 DNA samples collected a maximum of 467 animals could be used in the genetic analyses (see Tab. II).

Within-breed diversity
Observed heterozygosities and their unbiased estimates taking account of sample sizes were computed per autosomal locus and per breed, according to the method described in [6]. An exact test of Hardy-Weinberg equilibrium was performed (GENEPOP [20]), with a Bonferroni correction for repeated tests over 187 breed-locus combinations. The exact P-value was obtained either by the complete enumeration method [15] for loci with fewer than five alleles, or by the Markov Chain method of [12] otherwise.
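As an illustration of the two quantities computed per locus and per breed, the sketch below returns the observed heterozygosity and a sample-size-corrected expected heterozygosity. The method of [6] is not reproduced in this text, so the correction factor 2n/(2n − 1) (Nei's unbiased estimator) is an assumption, and the function name and input layout are hypothetical.

```python
from collections import Counter


def heterozygosities(genotypes):
    """Return (observed, unbiased expected) heterozygosity at one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per typed animal.
    The unbiased estimate uses the factor 2n / (2n - 1), which is an
    assumption here (the source only refers to the method of [6]).
    """
    n = len(genotypes)
    if n < 1:
        raise ValueError("need at least one genotype")
    # Proportion of animals carrying two different alleles
    observed = sum(a != b for a, b in genotypes) / n
    # Allele counts over the 2n sampled gene copies
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    homozygosity = sum((c / total) ** 2 for c in counts.values())
    expected = (total / (total - 1)) * (1.0 - homozygosity)
    return observed, expected
```

Averaging these per-locus values over the autosomal loci typed in a breed would give breed-level figures of the kind reported in the text.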

Between-breed diversity
Breed differentiation was evaluated by the fixation indices of Wright (see [30] and [22]). The null hypothesis of random mating within and between populations was tested by means of permutation tests (allele permutation within population to test for $F_{IS}$, and individual permutation between populations to test for $F_{ST}$) as shown by [6].
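The individual-permutation test for differentiation can be sketched as follows. This is only an illustration under stated assumptions: the statistic used is a simple single-locus (HT − HS)/HT ratio, swapped in for the estimators of [30] and [22], which are not reproduced here, and all function names, the number of permutations and the seed are hypothetical.

```python
import random
from collections import Counter


def gene_diversity(genotypes):
    """1 - sum of squared allele frequencies in a list of genotypes."""
    counts = Counter(a for genotype in genotypes for a in genotype)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def fst_like(populations):
    """(HT - HS) / HT at one locus; populations is a list of genotype lists.

    A simplified stand-in for Wright's F_ST, not the estimator of the paper.
    """
    pooled = [g for pop in populations for g in pop]
    ht = gene_diversity(pooled)
    if ht == 0.0:
        return 0.0
    hs = sum(len(pop) * gene_diversity(pop) for pop in populations) / len(pooled)
    return (ht - hs) / ht


def permutation_p_value(populations, n_perm=1000, seed=0):
    """Shuffle individuals between populations and compare with the observed value."""
    rng = random.Random(seed)
    observed = fst_like(populations)
    pooled = [g for pop in populations for g in pop]
    sizes = [len(pop) for pop in populations]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-cut the shuffled individuals into groups of the original sizes
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if fst_like(shuffled) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations
    return observed, (hits + 1) / (n_perm + 1)
```

Permuting whole individuals keeps each genotype intact, so only the assignment of animals to breeds is randomized, which matches the null hypothesis of no differentiation between populations.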
Genetic distances between individuals were estimated on the basis of their own genotypes, using a multi-locus estimation of the kinship coefficients. This between-individual genetic distance $D_{BI}$ is defined as $D_{BI} = 1 - P[\text{drawing two identical alleles from the two individuals}]$ [7,8], setting $D_{BI} = 0$, however, when the two individuals have identical genotypes.
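One way to read this definition is sketched below: at each locus one allele is drawn at random from each individual, the probability that the two are identical is averaged over the loci typed in both animals, and the distance is one minus that average. Averaging over loci with equal weight and the handling of missing genotypes are assumptions, as are the function names; the estimators of [7,8] may differ in detail.

```python
def locus_identity(g1, g2):
    """Probability that one allele drawn from g1 equals one allele drawn from g2."""
    return sum(a == b for a in g1 for b in g2) / 4.0


def distance_between_individuals(ind1, ind2):
    """Between-individual distance: 1 - mean probability of identical alleles.

    ind1, ind2: lists of genotypes (2-tuples), one per locus, in the same
    locus order; None marks a missing genotype (an assumed convention).
    """
    shared = [(g1, g2) for g1, g2 in zip(ind1, ind2)
              if g1 is not None and g2 is not None]
    if not shared:
        raise ValueError("no locus typed in both individuals")
    # Identical multi-locus genotypes are given a distance of 0 by definition
    if all(sorted(g1) == sorted(g2) for g1, g2 in shared):
        return 0.0
    p_identical = sum(locus_identity(g1, g2) for g1, g2 in shared) / len(shared)
    return 1.0 - p_identical
```

The explicit zero for identical genotypes matters because two identical heterozygotes would otherwise get a positive distance (0.5 at that locus).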
Genetic distances between breeds were calculated based on the allelic frequencies in each breed, or in each breed-sex combination with appropriate weight for the X-linked marker (1/3 for males and 2/3 for females). An equal number of males and females was assumed in the 2 breeds (SELR and SEWP) in which the sex was not identified. Two measures of distances were used, namely the Reynolds' [21] and the standard Nei's distances [17], taking account of the corrections needed for small sample size [18].
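The two breed-level distances can be sketched from allele frequencies as follows. This is a minimal version without the small-sample corrections of [18], using the commonly cited uncorrected forms of the standard Nei distance and of the Reynolds distance; the function names and the dictionary layout are hypothetical. The last helper applies the 1/3 and 2/3 weights mentioned above for the X-linked marker.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Standard Nei distance, uncorrected for sample size.

    freqs_x, freqs_y: lists with one dict per locus, allele -> frequency.
    Raises ValueError if the two breeds share no allele at any locus.
    """
    jx = jy = jxy = 0.0
    for px, py in zip(freqs_x, freqs_y):
        alleles = set(px) | set(py)
        jx += sum(px.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(py.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(px.get(a, 0.0) * py.get(a, 0.0) for a in alleles)
    return -math.log(jxy / math.sqrt(jx * jy))


def reynolds_distance(freqs_x, freqs_y):
    """Reynolds distance in its simple (unweighted, uncorrected) form."""
    numerator = denominator = 0.0
    for px, py in zip(freqs_x, freqs_y):
        alleles = set(px) | set(py)
        numerator += sum((px.get(a, 0.0) - py.get(a, 0.0)) ** 2 for a in alleles)
        denominator += 1.0 - sum(px.get(a, 0.0) * py.get(a, 0.0) for a in alleles)
    return numerator / (2.0 * denominator)


def x_linked_frequency(p_males, p_females):
    """Breed frequency at an X-linked locus: weight 1/3 for males, 2/3 for females."""
    return p_males / 3.0 + 2.0 * p_females / 3.0
```

The weights reflect the number of X chromosomes carried by each sex when males and females are equally numerous.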

Clustering, phylogenetic tree reconstruction and measures of breed diversity
Distances between individuals were used to infer phylogenies by the unweighted pair-group method with arithmetic mean (UPGMA) described in [13], [27] and [5]. Distances between breeds were also used for tree construction according to the neighbour-joining algorithm of [24], giving unrooted trees. The bootstrapping procedure of PHYLIP [9] was used to evaluate the significance of tree nodes and was extended to account for unequal sample size across breeds and loci.
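For readers unfamiliar with UPGMA, the clustering step can be sketched in a few lines of plain Python. This is a generic textbook version, not the software used in the study; the function name, the nested-tuple tree format and the `distance` callback are assumptions made for the example.

```python
from itertools import combinations


def upgma(labels, distance):
    """Cluster labels by UPGMA (average linkage).

    distance(a, b) returns the distance between two labels.
    Returns a nested tuple (left_subtree, right_subtree, node_height);
    leaves are the labels themselves.
    """
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    d = {frozenset((i, j)): distance(labels[i], labels[j])
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters
        closest = min(d, key=d.get)
        i, j = sorted(closest)
        height = d.pop(closest) / 2.0
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance to the new cluster: size-weighted mean of the old distances
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree
```

Applied to a between-individual distance, the leaves would be animals, and breed clusters would appear as subtrees containing animals of a single breed.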
Genetic distances can also be used to measure diversity, as proposed by Weitzman [31,32]. This approach has been implemented here to provide a further upward hierarchical representation of the breeds and to evaluate marginal losses of diversity due to various patterns of breed extinction, as advocated by [28].
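The Weitzman diversity function is usually defined by a recursion: the diversity of a set is the largest value, over all its members, of the diversity of the set without that member plus the distance from that member to its nearest remaining neighbour. The sketch below follows that textbook recursion with a single-element set given diversity 0; it is an illustration, not the implementation used in the study, and the function names and the dictionary of pairwise distances are hypothetical.

```python
from functools import lru_cache


def weitzman_diversity(breeds, dist):
    """Weitzman diversity of a set of breeds.

    breeds: iterable of breed names.
    dist: dict mapping frozenset({a, b}) to the distance between a and b.
    """
    @lru_cache(maxsize=None)
    def v(subset):
        if len(subset) <= 1:
            return 0.0
        best = 0.0
        for breed in subset:
            rest = subset - {breed}
            # Distance from the removed breed to its nearest remaining neighbour
            nearest = min(dist[frozenset((breed, other))] for other in rest)
            best = max(best, v(rest) + nearest)
        return best

    return v(frozenset(breeds))


def marginal_losses(breeds, dist):
    """Loss of diversity incurred when each breed in turn goes extinct."""
    breeds = list(breeds)
    total = weitzman_diversity(breeds, dist)
    return {b: total - weitzman_diversity([x for x in breeds if x != b], dist)
            for b in breeds}
```

With 11 breeds the recursion visits at most 2^11 subsets, so exhaustive evaluation is cheap. The loss attached to a pair of breeds can be obtained the same way by removing both, and it need not equal the sum of the two individual losses.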

Heterozygosity and deviation from Hardy-Weinberg equilibrium
For each breed, Table III shows the observed and expected heterozygosities and the numbers of alleles averaged across the 17 autosomal loci. Observed heterozygosities ranged from 0.35 (for FRBA) to 0.60 (for BEPI) and average numbers of alleles from 3.22 (FRBA) to 5.72 (DESH). Three loci, S0215, S0225 and SW951, were fixed in 6, 2 and 1 of our breeds respectively, and the 2 loci of chromosome 5, S0005 and IGF1, reached a 0.92 observed heterozygosity in the wild pig sample. The heterozygosities observed are close to their expectations in all breeds except in DELR and DESH which show a markedly reduced heterozygosity.
Deviations from Hardy-Weinberg equilibrium are significant for 8 locus-breed combinations out of 187, which represents a percentage slightly below the 5% expected in such a number of tests under the hypothesis of equilibrium.
However, the deviations are all observed in DESH and DELR, which are the only two breeds showing a globally significant deviation. In both cases, deviation from Hardy-Weinberg equilibrium is linked to a quite high positive $F_{IS}$. Table III also shows that the breeds vary relatively more in effective size than in heterozygosity. However, the significant rank correlation (0.8) between population size and heterozygosity among the breeds in Table II indicates a tendency for a positive association.

Breed differentiation and genetic distances
The fixation indices of Table IV show a generally high level of genetic differentiation between breeds, with quite large differences across loci.

Clustering and phylogenetic trees
The between-individual UPGMA tree of Figure 1 shows eleven clusters, each grouping the individuals which belong to the same breed. The only exceptions are an exchange between DESH and DELR and a DESH individual which does not fit in with any breed. The neighbour-joining trees based on both distances indicate that, apart from the two German breeds, no reliable phylogeny can be inferred, since only the node linking the two German breeds shows a bootstrap value (of 90%) close to significance. When the analysis was restricted to the 9 breeds for which genotypes were available at 25 loci (thus excluding the two German breeds), even lower bootstrap values were obtained (results not shown). This suggests that no reliable phylogeny can be constructed among those breeds, as if they had differentiated according to a radiative scheme of divergence. In an analysis restricted to the ten domestic breeds, after excluding the small sample of wild pigs, the phylogeny of Figure 2 was obtained, further confirming a radiative scheme of divergence.

Distribution and amount of diversity
The Weitzman representation, based on the Reynolds distance, is shown in Figure 3, in which the branch length of each breed can be read as approximately measuring its relative contribution to the corresponding diversity function. The marginal losses of diversity attached to each breed, which may be taken as a measure of their "uniqueness", are shown in Table VI, based on the two distances considered. On average, the highest and lowest losses of diversity are incurred with the extinction of the Basque or the Piétrain breeds, respectively. It can also be seen from Table VI that the loss of the two German breeds (DELR and DESH) induces a markedly higher loss than the sum of the corresponding individual breed losses, whereas the losses attached to two French local breeds (FRBA and FRLI) add up almost exactly.

Within population structure
In these European pig breeds, average heterozygosity observed is around 0.5 (Tab. III). This level of polymorphism is similar to the values so far reported for microsatellites in European pig and cattle breeds, e.g. by [10], [29] and [16], but below the values observed in human or chimpanzee populations where the expected heterozygosity ranges from 0.7 to 0.9 [11].
Figure 3. Dendrogram of relationship established by the method of Weitzman [31] using the Reynolds pairwise distances among the ten domestic breeds and the wild pig.
This level of polymorphism, when compared to the corresponding effective sizes of the breeds, ranging from 13 to over 30 000 (Tab. III), cannot be seen as the result of an equilibrium between drift and mutation. Under such a model, assuming a mutation rate $u$ of about $10^{-4}$ for microsatellites and with the effective sizes of Table III, $4N_e u$ should vary from 0.005 to 13 and the equilibrium values of heterozygosities would be expected to vary from 0.005 to 0.93. This contrast with the observed values, though based on current effective sizes which may not reflect past ones, tends to confirm that standard population genetics models cannot be easily extended to sets of breeds of farm animals, probably because they cannot be considered as separate closed populations.
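The range quoted above is consistent with the classical infinite-alleles result, in which the equilibrium heterozygosity equals $4N_e u / (1 + 4N_e u)$. The short sketch below reproduces the arithmetic; the choice of the infinite-alleles formula is an assumption (the text does not name the model), and the value 32 500 is a hypothetical effective size chosen to give $4N_e u = 13$, since the text only states "over 30 000".

```python
def equilibrium_heterozygosity(effective_size, mutation_rate=1e-4):
    """Drift-mutation equilibrium heterozygosity under the infinite-alleles model.

    H = theta / (1 + theta), with theta = 4 * Ne * u.
    """
    theta = 4.0 * effective_size * mutation_rate
    return theta / (1.0 + theta)


# Smallest effective size quoted in the text, and a hypothetical large one
for ne in (13, 32500):
    print(ne, round(equilibrium_heterozygosity(ne), 3))
```

Observed breed heterozygosities between 0.35 and 0.60 fall well inside this interval and vary far less than the model predicts, which is the contrast the paragraph draws.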
Since the 27 markers were selected, null alleles have been identified in other familial studies: for instance S0215 (Moser et al. unpublished), and S0386 (Archibald et al., personal communication). However, our study did not provide any evidence of null alleles since 179 breed-locus combinations out of 187 may be considered as being in Hardy-Weinberg equilibrium. Therefore, if null alleles existed in our breeds their frequencies would probably be low and would not greatly distort the genotypic frequencies. In addition, all loci showing a significant deviation from random union of gametes belonged to the two German breeds. This suggests some inbreeding effect, counterbalanced by high numbers of alleles (yielding a high expected heterozygosity), though the presence of null alleles only in these breeds cannot be excluded.

Genetic structure of the 11 breeds sampled
The microsatellites used did not exhibit any breed-specific allele allowing simple identification of the breed to which each animal belonged. However, the UPGMA tree of individuals is in very good agreement with the breed structure (Fig. 1). More precisely, using breed allelic frequencies to calculate the likelihood that an animal belongs to a given breed and then assigning the animal to the breed showing the largest likelihood (as proposed by Paetkau et al. [19]) allowed all animals to be correctly assigned. In most cases this result was obtained because an individual from one breed carried at least one allele which was absent in the other breeds. This indicates that these markers provide a way of measuring the genetic differentiation between the breeds considered. This strong differentiation is also confirmed by the very large $F_{ST}$ values of Table IV.
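The likelihood-based assignment described above can be sketched as follows, assuming Hardy-Weinberg proportions within each breed and independence between loci. The small frequency floor for alleles not observed in a breed is an assumption added to avoid a zero likelihood; the way the study handled such alleles is not stated here. Function names and the data layout are hypothetical.

```python
import math


def log_likelihood(genotypes, freqs_by_locus, floor=0.01):
    """Log-likelihood of a multi-locus genotype given one breed's allele frequencies.

    genotypes: list of (allele_a, allele_b), one per locus.
    freqs_by_locus: list of dicts allele -> frequency, in the same locus order.
    floor: assumed minimum frequency for alleles unseen in the breed.
    """
    total = 0.0
    for (a, b), freqs in zip(genotypes, freqs_by_locus):
        pa = max(freqs.get(a, 0.0), floor)
        pb = max(freqs.get(b, 0.0), floor)
        # Hardy-Weinberg genotype probability: p^2 or 2pq
        prob = pa * pa if a == b else 2.0 * pa * pb
        total += math.log(prob)
    return total


def assign(genotypes, breeds):
    """Return the breed with the largest likelihood.

    breeds: dict breed name -> list of per-locus allele frequency dicts.
    """
    return max(breeds, key=lambda name: log_likelihood(genotypes, breeds[name]))
```

In a careful application the animal being assigned would be removed from its own breed before the frequencies are computed, so that it does not contribute to its own likelihood.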
Neglecting the effects of migration, and assuming a low contribution of mutations to the genetic diversity between these breeds, the differences in allelic frequencies may be interpreted as primarily due to random genetic drift. The genetic differentiation may be seen as the result of an increased mean inbreeding coefficient $F$ over a rather recent period of time. Under this hypothesis, the most appropriate measure of diversification is provided by the Reynolds distance. This distance has an expected value of $0.5\,(F_1 + F_2)$, where $F_1$ and $F_2$ are the increases of inbreeding since divergence, or, more generally, the average $F_i$.
The tree of Figure 2 shows that DESH and DELR are closely related. The high distances separating them from the other breeds and their higher numbers of alleles suggest that genetic drift might have structured these breeds into 2 groups, a group of German breeds and another group of non-German breeds among which it is difficult to distinguish any particular structure. The assumption of a radiative divergence of the non-German breeds agrees with the tentative phylogeny of Figure 2, which may sum up our interpretation of the genetic differences observed between these European breeds. On the other hand, the dendrogram of Figure 3 could suggest the existence of a distinct subset of breeds belonging to the Landrace family, extending from the DELR to the FRNO branches. These interpretations are of course limited to the ten domestic breeds available in this study and they would obviously need to be confirmed on a larger set.

Breed diversity
This study gave an opportunity for evaluating the global diversity of the set of breeds considered, using the approach of Weitzman [31,32]. Table VI clearly shows the wide range of the contributions of each breed to the overall diversity, ranging from about 4 to 15%. Table VI also shows that the results are not entirely consistent over the 2 measurements of genetic distances used. It can be noted that the Reynolds distance appears to be slightly more discriminating between breeds, since contributions range from 4 to 17%. Based on this distance, the 4 French local breeds altogether account for half of the total diversity, which is an indication of the potential value of preserving local endangered breeds in the maintenance of a species' biodiversity. But, here again, our conclusions should be considered as relative to the limited sample of breeds considered, and do not preclude conclusions which might be obtained on a more comprehensive set of breeds.

CONCLUSIONS
This study may be one of the first demonstrations of the feasibility of evaluating genetic diversity across different countries following the FAO recommendations [4]. An evaluation of buffalo genetic diversity along the same lines by Barker et al. [3] is also to be mentioned. Once an agreement is reached on a common set of markers, the essential requirements for achieving comparability of allele sizing between different laboratories are (i) to include on the same gel a set of common control DNA samples previously distributed to the participants, and (ii) to preferably use primers derived from a single synthesis, as done in the present experiment. For further studies, we strongly suggest use of DNA from the control animals mentioned before, which are available upon request to L. Andersson and D. Milan.
The panel of markers used in this trial exhibited a very high polymorphism, confirming an early study of microsatellite polymorphisms in 4 major pig breeds by [10] and the study on Belgian pig breeds of [29]. There are also good indications that null alleles were at a low frequency in the samples investigated. The 11 breeds chosen exhibit a very strong differentiation. In spite of this, it appeared difficult to infer any reliable phylogeny among those populations. This may not be too surprising given that our present domestic breeds are not likely to have resulted from a strict tree-like branching process, as noted by [28]. On the other hand, there is a need for measuring the overall diversity of a set of breeds, since prospective evaluations of diversity are required for defining appropriate conservation policies, as advocated by [32]. Such an approach may be based on standard genetic distances, which is the Weitzman approach, though similar procedures may also be implemented from contingency tables of allelic frequencies, as shown by [14]. Our results certainly point to the usefulness of global evaluations of diversity using molecular markers for the choice of breeds worthy of preservation. However, as stressed by [4], final decisions should take into account additional information on traits of economic importance and on specific adaptive features.
We thank Prof. Max Rothschild (Ames, Iowa), US Pig Genome Co-ordinator, for having freely provided the primers to the five typing laboratories in this project.
Comments made by two anonymous referees are also gratefully acknowledged.