Genetic diversity measures of local European beef cattle breeds for conservation purposes

This study was undertaken to determine the genetic structure, evolutionary relationships, and genetic diversity of 18 local cattle breeds from Spain, Portugal, and France using 16 microsatellites. Heterozygosities, estimates of F_ST, genetic distances, multivariate and diversity analyses, and assignment tests were performed. Heterozygosities ranged from 0.54 in the Pirenaica breed to 0.72 in the Barrosã breed. Seven percent of the total genetic variability can be attributed to differences among breeds (mean F_ST = 0.07; P < 0.01). Five genetic distances were computed and compared; no correlation significantly different from 0 was found between distances based on the effective size of the population and those based on allele size. The Weitzman recursive approach and a multivariate analysis were used to measure each breed's contribution to diversity. The Weitzman approach suggests that the most important breeds to preserve are those grouped into two clusters: the Mirandesa and Alistana cluster and the Sayaguesa and Tudanca cluster. The hypothetical extinction of one of these clusters represents a 17% loss of diversity. A correspondence analysis not only distinguished four breed groups but also confirmed the results of previous studies identifying the breeds that contribute most to diversity. In addition, the variation between breeds was high enough to allow individuals to be assigned to their breed of origin with a probability of 99% for simulated samples.


INTRODUCTION
During the last forty years, it has become clear that biochemical analyses of genetic variation can provide valuable insight into the genetic structure and evolutionary history of cattle populations. Studies have been undertaken on a broad scale, encompassing not only populations from different regions of the globe but also closely related populations within particular regions [4,18,22,30,33,38]. Manwell and Baker [31] were the first to present a phylogenetic tree for the ten major cattle breed-groups of Europe, Western Asia, and Northern Africa. By reviewing the data on protein polymorphism, they showed that these data agreed with the morphological and geographical divisions of the major breed-groups. They were not able, however, to study relationships between individual breeds.
More recently, molecular techniques have provided new markers for the study of genetic variation [6,27,37]. Among these, microsatellites (repetitive elements containing simple sequence motifs, usually dimers or trimers) have quickly become the markers of choice for population genetic studies, as they offer advantages that are particularly valuable in conservation projects. First, they are widely available. Second, they are highly polymorphic. Third, they are comparatively easy to automate, with multiplex amplification of up to five loci in a single PCR and multiple loadings of up to fifteen loci per lane in some highly optimised gel systems. In addition, they are assumed to be neutral to selection, so that the observed genetic diversity is the consequence of two forces: genetic drift and mutation.
In the last five years, several studies of genetic relationships between cattle breeds using microsatellites have been published. MacHugh et al. [28] analysed 20 microsatellites in cattle populations from Africa, Europe, and Asia, highlighting a marked distinction between humpless (taurine) and humped (zebu) cattle that strongly supports the hypothesis of a separate origin for domesticated zebu cattle. Studies characterising relationships within the African group [45] or within the European group of cattle breeds have focused on breeds from Italy [10], Spain [32], Belgium [36], the British Isles [29], France [35], and Switzerland [42]. It is difficult, however, to combine the data from these studies to clarify the genetic relationships among the major types of cattle, because they do not use a common set of microsatellites. For this reason, the FAO has proposed a list of thirty microsatellites for the analysis of genetic diversity in European cattle breeds.
The primary goal of this study is to assess the genetic variation within and between breeds and groups of breeds. A secondary aim is to define a diversity measure that permits breeds to be ranked for conservation purposes. Together, these provide information on the relative contribution to genetic diversity of 18 local cattle breeds from Spain, Portugal, and France, assessed using 16 microsatellites (15 of which are from the FAO list).

MATERIALS AND METHODS
Cattle breeds
The breeds included in this study (Tab. I) are characterised by a widespread regional distribution, small population size, and ties to traditional production systems.
Regarding their morphological attributes, most of the breeds show pigmentation similar to that of their wild ancestor, from reddish-brown to brownish-black, with black pigmentation restricted to the extremities (Alistana, Mirandesa, Maronesa, Barrosã, Asturiana de los Valles, Asturiana de la Montaña, Aubrac). In some breeds (Tudanca, Gasconne and Bruna), the red pigmentation tends to lighten considerably as the animals age. The most commonly observed variants are solid black (Morucha and Avileña) and solid red (Retinta, Alentejana, Pirenaica, Salers), although one colour-sided breed (Mertolenga) was also included in this study. Most of the breeds included in the project have never been exposed to reproductive technology or other breeding tools related to artificial discriminative mating, which has limited male and female gene flow between breeds to individual dispersal at the local level. Nevertheless, because organised studbooks were lacking for many of the breeds until recently (most were created only in recent years), a certain degree of genetic introgression between breeds has occurred.

Sampling of populations
The sampling process is important because it determines the kinds of inference that can be made. To reflect the current genetic composition of each breed, individuals can be considered to have been sampled at random within a generation.
Fresh blood, collected in a preservative buffer, was taken from 50 individuals (25 males and 25 females) per breed.

DNA extraction and PCR amplification
DNA was extracted using established procedures [20,41] that guarantee the long-term stability of DNA samples. Primers and polymerase chain reaction (PCR) conditions are described in Table II. PCR products were analysed either on standard 7% denaturing polyacrylamide gels with silver staining [2] or, using fluorescently labelled PCR primers, on an automated DNA fragment analyser (Applied Biosystems 373 or 377). To ensure the compatibility of results from different equipment and laboratories, three types of reference DNA were used: Type 1, reference DNAs (n = 9) from AIRE 2006; Type 2, reference DNAs (n = 4) from this project; Type 3, reference DNAs (n = 2) from individual laboratories. The allele sizes of these 15 reference DNAs were checked by each of the four laboratories involved in the study. In addition, to ensure the compatibility of results within each laboratory, Type 3 DNAs were used as standards on each loaded gel.

Statistical analysis
The BIOSYS-1 package [47] was used to compute allele frequencies by direct counting, the number of alleles, and unbiased estimates of expected (H_e) and observed (H_o) heterozygosity.
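As an illustration of these per-locus statistics, the following Python sketch computes allele frequencies by direct counting, observed heterozygosity, and expected heterozygosity. It assumes that the "unbiased" expected heterozygosity is Nei's small-sample correction, 2n/(2n − 1)(1 − Σ p²); it is not a reproduction of BIOSYS-1, and the genotypes in the example are invented.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Allele frequencies at one locus by direct counting.

    genotypes: list of (allele1, allele2) tuples, e.g. fragment sizes in bp.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())  # 2n allele copies for n diploid animals
    return {allele: c / total for allele, c in counts.items()}

def observed_heterozygosity(genotypes):
    """H_o: proportion of heterozygous animals at one locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)

def expected_heterozygosity(genotypes):
    """H_e with Nei's small-sample correction (assumed here):
    H_e = 2n / (2n - 1) * (1 - sum_a p_a^2)."""
    n = len(genotypes)
    p = allele_frequencies(genotypes)
    return 2 * n / (2 * n - 1) * (1 - sum(f * f for f in p.values()))

# Invented genotypes at one microsatellite locus for four animals
locus = [(170, 170), (170, 172), (172, 174), (174, 174)]
print(allele_frequencies(locus))       # {170: 0.375, 172: 0.25, 174: 0.375}
print(observed_heterozygosity(locus))  # 0.5
print(expected_heterozygosity(locus))  # ≈ 0.75
```

Breed-level values such as those in Table I would then be averages of these per-locus quantities over the 16 loci.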
Five genetic distances, falling into three groups, were used: 1) distances considered appropriate under a pure drift model, in which genetic drift is assumed to be the main factor in genetic differentiation among closely related populations or over short-term evolution [39,48,52], namely the traditional between-population differentiation estimator F_ST [55] and the Reynolds genetic distance [39]; 2) distances that assume a stepwise mutation model, i.e., the average squared distance [16] and the delta-mu squared distance [17]; 3) a non-metric distance based on the proportion of shared alleles [5]. All genetic distances were estimated using MICROSAT [34], except for the Reynolds distance, for which the PHYLIP package [13] was used. The product-moment correlation (r) and the Mantel test statistic were computed for pairwise comparisons of distance matrices.
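To make the three families of distances concrete, the Python sketch below implements one simple estimator of each kind on allele-frequency dictionaries. Several assumptions apply: the Reynolds θ uses the large-sample ratio estimator without sample-size corrections; allele sizes in bp are converted to repeat units with an assumed dinucleotide repeat length (`repeat_bp=2`); and the shared-allele distance is a frequency-overlap version, whereas the original proportion of shared alleles is computed between individuals. F_ST itself is omitted. Results may differ from the MICROSAT and PHYLIP implementations.

```python
import math

# Each population: {locus: {allele_size_bp: frequency}}, frequencies summing to 1 per locus.

def reynolds_distance(x, y):
    """D = -ln(1 - theta), theta = sum_l sum_a (x_la - y_la)^2 / (2 * sum_l (1 - sum_a x_la * y_la))."""
    num = den = 0.0
    for locus in x:
        alleles = set(x[locus]) | set(y[locus])
        num += sum((x[locus].get(a, 0.0) - y[locus].get(a, 0.0)) ** 2 for a in alleles)
        den += 1.0 - sum(x[locus].get(a, 0.0) * y[locus].get(a, 0.0) for a in alleles)
    theta = num / (2.0 * den)
    return -math.log(1.0 - theta)

def delta_mu_squared(x, y, repeat_bp=2):
    """(delta mu)^2: squared difference in mean allele size (repeat units), averaged over loci."""
    total = 0.0
    for locus in x:
        mu_x = sum(size * f for size, f in x[locus].items()) / repeat_bp
        mu_y = sum(size * f for size, f in y[locus].items()) / repeat_bp
        total += (mu_x - mu_y) ** 2
    return total / len(x)

def average_squared_distance(x, y, repeat_bp=2):
    """ASD: mean squared difference in repeat number between one allele drawn
    from each population, averaged over loci."""
    total = 0.0
    for locus in x:
        total += sum(fx * fy * ((ax - ay) / repeat_bp) ** 2
                     for ax, fx in x[locus].items()
                     for ay, fy in y[locus].items())
    return total / len(x)

def shared_allele_distance(x, y):
    """1 - proportion of shared alleles, using the frequency overlap sum_a min(x_a, y_a)."""
    shared = sum(sum(min(x[locus].get(a, 0.0), y[locus].get(a, 0.0))
                     for a in set(x[locus]) | set(y[locus]))
                 for locus in x)
    return 1.0 - shared / len(x)

# Invented single-locus example
pop1 = {"L1": {100: 0.5, 102: 0.5}}
pop2 = {"L1": {100: 1.0}}
print(reynolds_distance(pop1, pop2))         # -ln(0.5) ≈ 0.693
print(delta_mu_squared(pop1, pop2))          # 0.25
print(average_squared_distance(pop1, pop2))  # 0.5
print(shared_allele_distance(pop1, pop2))    # 0.5
```

Note that the ASD of a population with itself is not zero, because it includes the within-population variance in allele size; the drift-based distances, by contrast, vanish for identical frequency profiles.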
After defining groups of breeds by country or by trunk (a set of breeds with a hypothetical common ancestor) using a priori information, a hierarchical analysis of variance was carried out to partition the total genetic variance into components due to differences among groups, among breeds within groups, and among individuals within breeds. Variance components were then used to compute fixation indices [55], whose significance was tested using the non-parametric permutation approach described by Excoffier et al. [12]. Computations were carried out using the AMOVA (Analysis of Molecular Variance) programme implemented in the ARLEQUIN package [43].
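The variance components themselves come from ARLEQUIN; the short Python sketch below only shows how the hierarchical fixation indices and the percentages of variation follow from the three components, using the usual definitions (σ²_a among groups, σ²_b among breeds within groups, σ²_c within breeds). The permutation test is not included, and the numbers in the example are made up.

```python
def hierarchical_f_statistics(sigma2_a, sigma2_b, sigma2_c):
    """Fixation indices from hierarchical variance components:
    sigma2_a: among groups (trunks or countries),
    sigma2_b: among breeds within groups,
    sigma2_c: among individuals within breeds."""
    total = sigma2_a + sigma2_b + sigma2_c
    return {
        "F_CT": sigma2_a / total,                  # groups relative to the total
        "F_SC": sigma2_b / (sigma2_b + sigma2_c),  # breeds relative to their group
        "F_ST": (sigma2_a + sigma2_b) / total,     # breeds relative to the total
        "percent": [100 * s / total for s in (sigma2_a, sigma2_b, sigma2_c)],
    }

# Made-up variance components, chosen only to give round numbers
f = hierarchical_f_statistics(1.0, 4.0, 15.0)
print(f["F_CT"], f["F_ST"])  # 0.05 0.25
# The three indices are linked: (1 - F_ST) = (1 - F_SC) * (1 - F_CT)
print(abs((1 - f["F_ST"]) - (1 - f["F_SC"]) * (1 - f["F_CT"])) < 1e-12)  # True
```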

Multivariate correspondence analysis
Phylogenetic reconstruction and the use of genetic distances do not take into account the effects of admixtures between branches. Alternatively, the representation of genetic relationships among a group of populations may be obtained using multivariate techniques which can condense the information from a large number of alleles and loci into a few synthetic variables.
Correspondence analysis [3,26] is a multivariate method analogous to principal component analysis but suited to categorical variables; it yields a simultaneous representation of breeds and loci as a cloud of points in a metric space. As in principal component analysis, the space is spanned by mutually orthogonal axes ranked by the fraction of information they account for. This information is measured by inertia, or dispersion: the direction of maximum inertia is the direction in which the cloud of points is most scattered. The concept of inertia can be related both to the well-established population parameter F_ST [19] and to genetic diversity [24].
Allele frequencies of all loci were used as variables to spatially cluster the breeds using a correspondence analysis based on Chi-square distances to judge proximity between them.
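A minimal NumPy sketch of a correspondence analysis on a breeds × alleles table is given below, computed through the singular value decomposition of standardised residuals. It returns the share of inertia on each axis, the breeds' principal coordinates, each breed's share of the total inertia, and each allele's contribution to each axis. The frequency table is invented; this is one standard way to compute the analysis, not necessarily the software used in the study.

```python
import numpy as np

def correspondence_analysis(table):
    """Correspondence analysis of a breeds x alleles table.

    Rows are breeds; columns are alleles pooled over all loci (drop alleles
    absent from every breed, whose zero column mass would divide by zero).
    """
    P = np.asarray(table, dtype=float)
    P = P / P.sum()
    r = P.sum(axis=1)                       # row (breed) masses
    c = P.sum(axis=0)                       # column (allele) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardised residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                       # principal inertia of each axis
    total = inertia.sum()                   # equals chi-square / grand total
    return {
        "axis_share": inertia / total,                          # fraction of inertia per axis
        "breed_coords": U * sv / np.sqrt(r)[:, None],           # principal coordinates of breeds
        "breed_inertia_share": (S ** 2).sum(axis=1) / total,    # each breed's share of total inertia
        "allele_contrib": Vt.T ** 2,                            # allele contributions; column k = axis k
    }

# Invented frequencies for three breeds at two loci (two alleles each)
freqs = np.array([
    [0.9, 0.1, 0.6, 0.4],
    [0.2, 0.8, 0.5, 0.5],
    [0.5, 0.5, 0.1, 0.9],
])
res = correspondence_analysis(freqs)
print(np.round(res["axis_share"], 3))
# Excluding a locus amounts to dropping its columns; excluding a breed, its row
res_without_locus2 = correspondence_analysis(freqs[:, :2])
```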

Computing diversity
Following the Weitzman approach [53,54], the Reynolds genetic distances were used to compute marginal losses of genetic diversity. After transforming the genetic distance matrix into a distance matrix with ultrametric properties, a maximum likelihood tree was drawn using NTSYS [40].
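Weitzman's diversity function is defined by a recursion over subsets of breeds. The Python sketch below implements that recursion directly on a toy ultrametric matrix (not the study's Reynolds distances, and without the ultrametric transformation or NTSYS tree drawing). It also shows how marginal and joint losses are obtained by recomputing diversity after removing breeds, including a case where the joint loss of two close breeds exceeds the sum of their individual losses.

```python
from functools import lru_cache

def weitzman(D):
    """Return V(S) for subsets S of indices into the symmetric distance matrix D,
    using Weitzman's recursion
        V(S) = max_{i in S} [ V(S - {i}) + min_{j in S - {i}} D[i][j] ],  V({i}) = 0.
    Memoised over subsets, so the cost grows exponentially with the number of breeds."""
    @lru_cache(maxsize=None)
    def V(S):
        if len(S) <= 1:
            return 0.0
        best = float("-inf")
        for i in S:
            rest = S - {i}
            link = min(D[i][j] for j in rest)  # distance from i to its closest remaining breed
            best = max(best, V(rest) + link)
        return best
    return V

def diversity_loss(D, extinct):
    """Diversity of the full set, and the loss when the breeds in `extinct` are removed."""
    V = weitzman(D)
    full = frozenset(range(len(D)))
    total = V(full)
    return total, total - V(full - frozenset(extinct))

# Toy ultrametric distances: breeds 0 and 1 are close, breed 2 is distant
D = [[0, 1, 3],
     [1, 0, 3],
     [3, 3, 0]]
print(diversity_loss(D, [0]))     # (4.0, 1.0)
print(diversity_loss(D, [2]))     # (4.0, 3.0)
print(diversity_loss(D, [0, 1]))  # (4.0, 4.0): more than 1.0 + 1.0, so 0 and 1 are not independent
print(diversity_loss(D, [0, 2]))  # (4.0, 4.0): equals 1.0 + 3.0, an additive case
```

The percentage loss for a breed or set of breeds is simply the loss divided by the total diversity.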

Breed assignment
The assignment of an anonymous animal i to one of a set of breeds r_1, ..., r_m was based on the maximum likelihood discriminant rule, i.e., animal i was assigned to the breed r that maximises the conditional probability P[i|r]. Let P_{r,l,a} be the frequency of allele a at locus l in breed r; then, for an animal carrying alleles a_l and b_l at locus l, and assuming Hardy-Weinberg and linkage equilibrium,
$$P[i \mid r] = \prod_{l} c_l \, P_{r,l,a_l} \, P_{r,l,b_l}, \qquad c_l = \begin{cases} 1 & \text{if } a_l = b_l \\ 2 & \text{if } a_l \neq b_l \end{cases}$$
When an allele was absent from a given breed, it was assigned a small but positive frequency in that breed, 1/(2n + 1), where n is the sample size of the breed [44]. A traditional way of expressing the significance of a particular result is the log of the likelihood ratio (LOD). To classify an anonymous sample into one of two possible populations, the distribution of this statistic under the null hypothesis (H_0) must be determined, either by bootstrapping or by simulation from the allele frequencies. Because the LOD distribution cannot be derived directly when many loci are used, we simulated 100 000 genotypes per breed from the allele frequencies, assuming Hardy-Weinberg and linkage equilibrium. The frequency with which simulated animals were correctly assigned to their breed gave the probability of assignment, and the distribution of LODs for pairs of breeds allowed confidence thresholds to be constructed.
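The procedure above can be sketched in Python as follows: the genotype likelihood under Hardy-Weinberg and linkage equilibrium, the 1/(2n + 1) frequency for unobserved alleles, the maximum likelihood assignment rule, a base-10 LOD between two breeds (the base is an assumption), and simulation of genotypes to estimate correct-assignment rates. The allele frequencies are invented, a common sample size n is assumed for all breeds, and ties are resolved in favour of the first breed listed.

```python
import math
import random

def log_prob(genotype, freqs, n):
    """ln P[i|r]: product over loci of Hardy-Weinberg genotype probabilities.
    genotype: {locus: (a, b)}; freqs: {locus: {allele: frequency > 0}} for breed r.
    Alleles not observed in the breed get frequency 1/(2n + 1)."""
    missing = 1.0 / (2 * n + 1)
    total = 0.0
    for locus, (a, b) in genotype.items():
        pa = freqs[locus].get(a, missing)
        pb = freqs[locus].get(b, missing)
        total += math.log(pa * pb * (1 if a == b else 2))
    return total

def assign(genotype, breeds, n):
    """Maximum likelihood rule: the breed with the largest P[i|r] (ties go to the first listed)."""
    return max(breeds, key=lambda r: log_prob(genotype, breeds[r], n))

def lod(genotype, freqs_1, freqs_2, n):
    """log10 likelihood ratio, breed 1 versus breed 2."""
    return (log_prob(genotype, freqs_1, n) - log_prob(genotype, freqs_2, n)) / math.log(10)

def simulate(freqs, rng):
    """One genotype drawn under Hardy-Weinberg and linkage equilibrium."""
    genotype = {}
    for locus, f in freqs.items():
        alleles, weights = zip(*f.items())
        genotype[locus] = tuple(rng.choices(alleles, weights=weights, k=2))
    return genotype

def assignment_rates(breeds, n, reps=1000, seed=1):
    """Proportion of simulated animals of each breed assigned back to that breed."""
    rng = random.Random(seed)
    return {r: sum(assign(simulate(f, rng), breeds, n) == r for _ in range(reps)) / reps
            for r, f in breeds.items()}

# Invented frequencies for two breeds at two loci; n = 50 animals assumed per breed
breeds = {
    "A": {"L1": {100: 0.9, 102: 0.1}, "L2": {150: 0.8, 152: 0.2}},
    "B": {"L1": {100: 0.1, 102: 0.9}, "L2": {150: 0.2, 152: 0.8}},
}
print(assignment_rates(breeds, n=50))
```

Storing the simulated LODs for each pair of breeds, rather than only the assignments, gives the empirical null distributions from which confidence thresholds are read.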

RESULTS
Variation within, and among, populations
A total of 173 distinct alleles were detected across the 16 loci analysed. The mean number of alleles (MNA) per locus per breed was 6.5 (Tab. I).
Apparent breed differentiation was considerable: multilocus F_ST values indicated that around 7% of the total genetic variation corresponded to differences between breeds, while the remaining 93% corresponded to differences among individuals. Table III presents pairwise F_ST values between breeds, which ranged from 3% for the Aubrac-Salers pair to 15% for the Mirandesa-Tudanca pair. All values were different from 0 (P < 0.01). Values above the diagonal in Table III give the number of individuals exchanged between populations per generation (Nm, where N is the total effective number of animals and m the migration rate) that would balance the diversifying effect of genetic drift.
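The text does not state how Nm was derived from F_ST. A common choice, assumed here, is Wright's infinite-island model at drift-migration equilibrium, F_ST = 1/(4Nm + 1). The Python sketch below inverts that relation; applying it to the extreme pairwise values quoted above is only an illustration of the formula, not a reading of Table III.

```python
def island_model_nm(f_st):
    """Nm implied by Wright's island model at drift-migration equilibrium (assumed):
    F_ST = 1 / (4Nm + 1)  =>  Nm = (1 - F_ST) / (4 F_ST)."""
    if not 0.0 < f_st < 1.0:
        raise ValueError("F_ST must lie strictly between 0 and 1")
    return (1.0 - f_st) / (4.0 * f_st)

for pair, f_st in [("Aubrac-Salers", 0.03), ("Mirandesa-Tudanca", 0.15)]:
    print(pair, round(island_model_nm(f_st), 2))
# Aubrac-Salers 8.08
# Mirandesa-Tudanca 1.42
```

The relation rests on strong assumptions (many equally sized populations exchanging migrants symmetrically, at equilibrium), which is one reason Nm values are best read as rough upper limits rather than actual migration rates.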
The AMOVA partitioned the genetic variability among different sources of variation, the main factors in this study being hypothetical trunks (or countries) and breeds. Results of the analysis of variance are shown in Table IV. Almost all of the variability not attributable to individuals was accounted for by the breed factor, leaving a low, yet significant, share of the genetic variability at the trunk level (Tab. IVa) or the country level (Tab. IVb): less than 1.5% of the total genetic differences was due to the hypothetical trunk (1.43%) or to the country of origin (1.36%) to which the breeds were assigned.

Correspondence Analysis
The first two axes account for 14% and 13% of the total inertia, respectively (Fig. 1). The Sayaguesa breed was isolated from the others and accounted for 12% of the total inertia of the 18 breeds. Axis 1 also separates the Mirandesa and Alistana breeds, although it shows no special proximity between the two. Axis 2 separates two blocks: Block I (Gasconne, Salers, Aubrac, Bruna) and Block II (Mirandesa, Alistana, Sayaguesa).
The most important alleles are INRA 032 (170 bp), which contributes 17% of the inertia of Axis 1 and 9% of that of Axis 2, and ETH 3 (109 bp), which contributes 8% and 6% to Axes 1 and 2, respectively. Allele INRA 032 (170 bp) is an almost unique characteristic of the Sayaguesa breed, in which its frequency is 40%; it was absent from the other breeds except the Gasconne and Salers (4% and 1%, respectively). Allele ETH 3 (109 bp), although present at a frequency of only 9% across all the breeds studied, is closely associated with the Alistana and Mirandesa breeds, in which its frequencies were 34% and 58%, respectively.
Given the importance of allele INRA 032 (170 bp), the analysis was repeated excluding this microsatellite, which changed the axes: the first axis (15%) now separated the Alistana and Mirandesa from the other breeds, and the second axis (11%) separated the Sayaguesa from the others. The inertia of the Sayaguesa breed dropped from 12% to 7.2%, so that it no longer discriminated this breed from the rest; the Mirandesa, for example, had an inertia of 9.4%. In summary, the Sayaguesa can be differentiated from the other breeds, but this result was clearly amplified by allele INRA 032 (170 bp), which had a frequency of 40% in this breed and was absent or rare in the others. Given the position of the Sayaguesa breed, we repeated the analysis excluding this breed. This radically changed the results, creating a zooming-in effect on the other 17 breeds that made the findings easier to interpret.

Evaluation of diversity
In contrast to traditional hierarchical clustering methods, the use of the concepts of link and representative elements (breeds) allows for a unique topology [49]. The tree generated by the algorithm (Fig. 2) has the property of a maximum evolutionary likelihood and the diversity function defined is equal to the total branch length of the tree. The loss of diversity caused by the extinction of a breed, or a set of breeds, can be approximately inferred by looking at the tree or can be exactly quantified by recalculating the total amount of diversity after eliminating the breed, or set of breeds, in question. For instance, a value of 11 585 was found when computing the diversity of the initial set of breeds, and it dropped down to 10 712, a 17% loss of diversity, after the elimination of the Sayaguesa and Tudanca breeds.

Breed assignment
Results for the assignment of animals to populations using 16 microsatellites are presented in Table V, where the assignment of 100 000 simulated individuals to the breeds is shown. Misclassified individuals were distributed among all breeds. The Sayaguesa and Mirandesa were the breeds most often correctly classified, and the Retinta and Barrosã those most frequently misclassified. Apart from the Salers breed in which 50% of the misclassified individuals were assigned to the Aubrac breed, we did not observe any systematic assignment of animals from one breed to another.
Table V. Breed assignment using 16 microsatellites and the maximum likelihood classification rule for eighteen bovine breeds.
The set of markers used in this study provided high discriminant power between pairs of breeds: for two closely related populations such as the two Asturiana breeds, only 1.2% of the individuals were misclassified. This can be interpreted from a classical hypothesis-testing point of view: if, for an anonymous sample, the test "H_0: the sample is Asturiana de los Valles; H_1: the sample is Asturiana de la Montaña" is carried out at a conservative significance level (0.01), the power of the test, 1 − Pr(type II error), is 0.98.
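The text does not detail how the power of 0.98 was computed. One plausible approach, assumed here, uses LOD scores from animals simulated under each hypothesis: the rejection threshold is the α-quantile of the LODs under H_0, and the power is the proportion of H_1 animals falling below it. In the Python sketch below, normal distributions stand in for simulated LODs; they are not the study's values.

```python
import numpy as np

def lod_threshold_and_power(lod_h0, lod_h1, alpha=0.01):
    """lod_h0: LODs (log10 L_breed1 / L_breed2) of animals simulated from breed 1 (H0 true);
    lod_h1: LODs of animals simulated from breed 2 (H1 true).
    H0 is rejected when the LOD falls below the alpha-quantile of lod_h0;
    power is the proportion of H1 animals that are rejected, 1 - Pr(type II error)."""
    threshold = np.quantile(np.asarray(lod_h0, dtype=float), alpha)
    power = float(np.mean(np.asarray(lod_h1, dtype=float) < threshold))
    return threshold, power

# Stand-in LOD distributions; in practice they come from simulated genotypes
rng = np.random.default_rng(0)
h0 = rng.normal(loc=3.0, scale=1.0, size=100_000)
h1 = rng.normal(loc=-3.0, scale=1.0, size=100_000)
threshold, power = lod_threshold_and_power(h0, h1, alpha=0.01)
print(round(float(threshold), 2), round(power, 3))
```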

DISCUSSION
Assuming that we are working with neutral polymorphisms, three forces remain that can explain the genetic diversity observed: mutation, genetic drift, and migration. Since mutation is important only over long periods of time, we take the forces relevant to this kind of study to be genetic drift, which generates differentiation between breeds, and migration, the opposing force, which tends to homogenise the breeds. Reproductive isolation, a consequence of the local use and management of a breed, reduces the effective population size and leads to a genetic subdivision that can be detected with drift-based measures computed from the microsatellite loci.
The degree of genetic differentiation among the breeds studied and the high significance of the between-population F_ST estimates indicate relatively low gene flow between these breeds and, equivalently, relatively high reproductive isolation. It is also clear that most of the genetic variation is between individuals, with only around seven percent of the total variation due to breed differences.
In the context of conserving the genetic variability of animal populations, migration values (Nm) can be interpreted as the upper limit on the number of migrants per generation compatible with maintaining the genetic differentiation observed between the breeds.
Although ancestral trunks are evident in studies based on morphological traits, e.g. Jordana et al. [21], they are far less apparent when neutral information is used to assign breeds to clusters such as the Brown trunk (both Asturian breeds, Alistana, Sayaguesa, Tudanca, etc.), the Turdetanus trunk (Pirenaica and Bruna), or the Iberian trunk (Avileña and Morucha). The results of this study are inconclusive in this respect, since a similar magnitude of differentiation was found among breeds within a trunk or within a country (5.7% and 6.1%, respectively). F_SC and F_ST measure the degree of resemblance between individuals within a breed. This resemblance can be interpreted in terms of the differences between individuals in different breeds, expressed either as a proportion of the total genetic variance (F_ST) or as a proportion of the trunk or country variance (F_SC). Conversely, F_CT measures the degree of resemblance between individuals of the same trunk, or country, as a proportion of the total variance. The degree of genetic differentiation among …
The lack of correlation between the group of genetic distances that apply under a classical random drift-mutation model and the group that applies under a pure drift model (Tab. VI) is a consequence of the nature of the populations included in this study, which cannot be considered separate, closed populations. European cattle breeds must be considered closely related, and the main factor describing their genetic variability is random drift.
Under this assumption, genetic distances that reflect only the consequences of genetic drift, such as F_ST and the Reynolds distance, can be considered the most appropriate measures of the degree of diversification [11]. The role of drift could also be assessed by comparing the observed heterozygosity values with the effective sizes of the breeds, which ranged from 21 (Sayaguesa) (Cañón, personal communication) to over 1 400 (Aubrac, Gasconne) (Renand, personal communication) [25].
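The comparisons between distance matrices summarised in Table VI rely on the product-moment correlation and the Mantel test. A minimal NumPy sketch of a Mantel test follows: the correlation is computed over each pair of breeds once, and significance is assessed by jointly permuting the rows and columns of one matrix. The two-sided test on |r|, the 999 permutations, and the toy matrices are assumptions for illustration; MICROSAT or other software may implement the test differently.

```python
import numpy as np

def mantel_test(d1, d2, permutations=999, seed=0):
    """Product-moment correlation between the off-diagonal elements of two symmetric
    distance matrices, with a two-sided permutation p-value obtained by permuting
    rows and columns of d2 together."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    iu = np.triu_indices_from(d1, k=1)  # each breed pair once
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        p = rng.permutation(d2.shape[0])
        r_perm = np.corrcoef(x, d2[np.ix_(p, p)][iu])[0, 1]
        hits += int(abs(r_perm) >= abs(r_obs))
    return r_obs, (hits + 1) / (permutations + 1)

# Toy example: distances among 8 "breeds" on a line, and a rescaled copy
pts = np.random.default_rng(1).random(8)
d_drift = np.abs(pts[:, None] - pts[None, :])
d_other = 2.0 * d_drift
print(mantel_test(d_drift, d_other))  # r ≈ 1 and a small p-value
```

Permuting whole rows and columns, rather than individual cells, preserves the dependence among distances that share a breed, which is why the ordinary significance test for r is not appropriate here.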
Regarding the correspondence analysis, the most striking result was the very strong separation of the Sayaguesa, although this depended on the presence of a single allele and is therefore not very robust. A very distinct clade is the Gasconne, Pirenaica, Salers and Aubrac block. An Alistana and Mirandesa block was also easily distinguished, even though these two breeds were not very close to one another. Finally, there is the Mertolenga, Barrosã, Alentejana and Maronesa block, although it is less homogeneous than the two cited above.
Figure 2 represents the contribution of each breed and clade to diversity. The reduction in diversity caused by the extinction of a clade equals the sum of the reductions caused by the extinction of its component breeds only if those breeds are independent; the loss of the Mirandesa and Alistana, for example, has this additive property. However, the joint extinction of the Sayaguesa and Tudanca breeds reduces total diversity by more than the sum of their individual reductions, so these two breeds cannot be considered independent of each other. An interesting question is to what extent the two procedures, correspondence analysis and the Weitzman approach, give similar results. It must be emphasised that correspondence analysis exploits within-breed variability while the Weitzman approach does not. The correlation between the contribution of breeds to diversity computed by the Weitzman procedure and their inertia in the correspondence analysis was 0.54 with the complete set of 16 markers and 0.64 when INRA 032 was excluded (P < 0.05). Moreover, of the four breeds that contributed most to diversity, three (Mirandesa, Sayaguesa and Alistana) were always present, regardless of the analysis procedure used.
Two additional considerations regarding the Weitzman diversity function should be kept in mind. First, caution is needed when interpreting its graphical representation as a phylogenetic tree, since it represents only the diversity found at the current time. Second, the graphical representation is sensitive to the model used to study divergence among the breeds: the order in which breeds appear in the tree strongly depends on the force (random drift or mutation) considered to determine the observed diversity. When F_ST and Reynolds genetic distances were used, breeds ranked in the same order (Spearman correlation = 1.0); however, no rank correlation significantly different from 0 was found between the breed order computed using the former distances, based on effective population size, and that computed using the genetic distances based on allele size. It should be noted that, despite criticism of the Weitzman approach [49], it remains a valid method of setting priorities for conservation investment, provided that the relationships between breeds, their survival probability distributions, and the costs of improving breed survival are known.
Further evidence that hypervariable microsatellites, with high heterozygosity and large numbers of alleles, provide an efficient way of evaluating genetic diversity among the bovine breeds considered comes from their statistical power for breed assignment. The results presented in Table V demonstrate that breed identities can be assigned to anonymous bovine samples, as previously shown in equines [9], cattle [29], sheep [8] and humans [44]. These molecular markers are a powerful tool for measuring genetic differentiation between breeds of domestic species.

CONCLUSIONS
The main objective of conservation genetics is to preserve variability within populations, under the hypothesis that genetic variation and population viability are correlated. Avoiding inbreeding has often been considered synonymous with maintaining heterozygosity. Heterozygosity is retained by maximising the inbreeding effective size, which depends primarily on the size of the parental generation. In populations with known pedigrees, maximising effective size while ignoring the ancestry of each individual may not be the most effective strategy for maintaining genetic diversity; a strategy that uses all pedigree information would better preserve genetic variation. Unfortunately, many of the local breeds included in this study have incomplete pedigrees, with one or both parents of some individuals unknown. In this context, molecular information can resolve some of the uncertainties, since it helps identify pedigree relationships and the genetically most important animals in order to maximise founder genome equivalents. Moreover, although additional information on productive, morphological, and fitness-related traits should be taken into account when ranking breeds for preservation purposes, strategies based on neutral markers can be efficient in maximising the retention of the highest number of neutral and non-neutral alleles in small populations [1].
This study contributes to knowledge of genetic diversity across different countries and to the molecular characterisation of small populations, many of which are threatened with extinction. It also shows how microsatellites can be used to construct an appropriate diversity function from the genetic relationships between populations. Additionally, the markers used in the present study provide reasonable statistical power for breed assignment, whether or not the breeds are closely related. These results should allow future management of the breeds to be based on better knowledge of their genetic structure and of the relationships between their populations.