Differentiation among Spanish sheep breeds using microsatellites

Genetic variability at 18 microsatellites was analysed on the basis of individual genotypes in five Spanish sheep breeds (Churra, Latxa, Castellana, Rasa-Aragonesa and Merino), with Awassi also studied as a reference breed. The degree of population subdivision among the Spanish breeds, calculated from FST diversity indices, was around 7% of total variability. A high degree of reliability was obtained for individual-breed assignment from the 18 loci using different approaches, among which the Bayesian method proved to be the most efficient, with an accuracy of approximately 99% using only nine microsatellites. Analysis of the Bayesian assignment criterion illustrated the divergence between each breed and the others, which was highest for Awassi sheep, while no great differences were evident among the Spanish breeds. Relationships between individuals were analysed from the proportion of shared alleles. The resulting dendrogram showed a remarkable breed structure: among the Spanish breeds, the highest level of clustering was found in Latxa and the lowest in Merino sheep, the latter exhibiting a peculiar pattern of clustering, with animals grouped into several closely set nodes. Analysis of individual genotypes provided valuable information for understanding intra- and inter-population genetic differences and allowed a comparison with previously reported results obtained using populations as taxonomic units.


INTRODUCTION
Investigation of genetic relationships among populations has traditionally been based on the analysis of allele frequencies at different loci as an estimate of genetic variability, since a comparison of population parameters allows their evolutionary history to be inferred. Highly variable loci such as microsatellites provide a large amount of genetic information, permitting alternative approaches based on individual genotypes that help to clarify the genetic relationships between populations or breeds. Two strategies have frequently been used: the assignment of individuals to populations and the analysis of inter-individual distances. Different assignment procedures have been developed, including those reported by Paetkau et al. [16], Rannala and Mountain [18] and Cornuet et al. [4]. These methods have a wide variety of applications (reviewed in [22]), such as identifying the source population of an individual genotype and evaluating population differentiation. Inter-individual distances based on the proportion of shared alleles, for their part, allow the construction of dendrograms showing the genetic relationships among individuals without any assumption concerning previously defined populations. They have proved useful in the analysis of human [2] and animal populations [6,12,13].
The genetic relationships of Spanish sheep breeds have previously been studied from population parameters [1]. In this paper, we present an analysis based on individual genotypes at 18 microsatellite loci, with a view to obtaining deeper insight into relationships within and between breeds. An assignment test was performed using several methods to evaluate their accuracy in identifying breeds from genotypes, and the information derived from this analysis was also used to compare breeds. Furthermore, intra- and inter-population relationships were investigated on the basis of pairwise individual distances derived from the proportion of shared alleles.
The Spanish breeds studied are classified on morphological grounds as follows: Merino type, "entrefino" type (Rasa-Aragonesa and Castellana) and "churro" type (Churra and Latxa). For a description of the breeds, see [1].

MATERIALS AND METHODS
Average heterozygosities were computed, and population subdivision was estimated through Wright's FST diversity indices obtained by both the variance and the heterozygosity methods, using the MICROSAT programme [14]. The former estimates FST as the standardised variance in allele frequencies among populations, while the latter measures the reduction in heterozygosity of subpopulations due to genetic drift.
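As a rough illustration of the two definitions above, the following Python sketch computes FST for a single locus from per-population allele frequencies. It uses the plain textbook formulas, without the sample-size corrections or locus weighting that MICROSAT applies, so it is not a reimplementation of that programme; the function names are hypothetical.

```python
"""Uncorrected single-locus F_ST by the variance and heterozygosity methods."""


def fst_variance(freqs):
    """Standardised variance of allele frequencies among populations.

    freqs: one list of allele frequencies per population, alleles in the
    same order. Variances and p_bar * (1 - p_bar) terms are summed over
    alleles before taking the ratio.
    """
    n_pop = len(freqs)
    num = den = 0.0
    for a in range(len(freqs[0])):
        ps = [pop[a] for pop in freqs]
        p_bar = sum(ps) / n_pop
        num += sum((p - p_bar) ** 2 for p in ps) / n_pop  # among-population variance
        den += p_bar * (1.0 - p_bar)
    return num / den


def fst_heterozygosity(freqs):
    """(H_T - H_S) / H_T with expected heterozygosity H = 1 - sum(p^2)."""
    n_pop = len(freqs)
    # H_S: mean within-population expected heterozygosity
    h_s = sum(1.0 - sum(p * p for p in pop) for pop in freqs) / n_pop
    # H_T: expected heterozygosity of the averaged (pooled) frequencies
    p_bar = [sum(pop[a] for pop in freqs) / n_pop for a in range(len(freqs[0]))]
    h_t = 1.0 - sum(p * p for p in p_bar)
    return (h_t - h_s) / h_t


toy = [[0.5, 0.5], [0.9, 0.1]]  # hypothetical two-allele locus, two populations
print(round(fst_variance(toy), 4), round(fst_heterozygosity(toy), 4))  # 0.1905 0.1905
```

Without corrections the two definitions coincide algebraically: the allele-frequency variances summed over alleles equal HT − HS, and the summed p̄(1 − p̄) terms equal HT. Small differences between published variance-based and heterozygosity-based estimates therefore come from the details of the estimators actually used, such as sample-size corrections.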
Individuals were assigned to populations using the GENECLASS programme of Cornuet et al. [4], which implements several estimation procedures; these have been described in detail by those authors and are only outlined here. The frequency method [16] assigns a genotype to the population in which it is most likely to occur, on the basis of the allele frequencies in the candidate populations. The Bayesian method [18] computes the likelihood of a genotype in each population taking into account the probability density of the population allele frequencies. The distance methods assign an individual to the population showing the closest genetic relationship to it. The programme uses six distances, adapted to individual-population estimation: Nei's standard (DS) and minimum (Dm) distances, the Cavalli-Sforza and Edwards chord distance, DA of Nei et al. [15], DAS of Chakraborty and Jin [3] and (δµ)² of Goldstein et al. [10].
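To make the contrast between the frequency and Bayesian criteria concrete, here is a minimal Python sketch of both for diploid genotypes. The frequency score multiplies Hardy-Weinberg genotype probabilities computed from sample allele frequencies; the value given to alleles absent from a sample (`floor`) is a hypothetical choice. The Bayesian score follows the Dirichlet predictive probability of Rannala and Mountain, assuming a prior weight of 1/k per allele (k alleles seen at the locus across all samples). GENECLASS may differ in its details, and all names here are illustrative.

```python
import math
from collections import Counter


def freq_log10_likelihood(genotype, pop_alleles, floor=0.01):
    """Frequency method: log10 P(genotype) under HWE from sample frequencies.

    genotype: list of (allele_a, allele_b) pairs, one per locus.
    pop_alleles: list (one per locus) of allele copies observed in the sample.
    floor: hypothetical frequency for alleles absent from the sample.
    """
    total = 0.0
    for (a, b), alleles in zip(genotype, pop_alleles):
        counts, n = Counter(alleles), len(alleles)
        pa = counts[a] / n if counts[a] else floor
        pb = counts[b] / n if counts[b] else floor
        total += math.log10(pa * pa if a == b else 2.0 * pa * pb)
    return total


def bayes_log10_likelihood(genotype, pop_alleles, allele_sets):
    """Bayesian method: log10 predictive P(genotype) under a Dirichlet prior.

    allele_sets: per locus, the set of alleles seen in any population;
    each allele gets prior weight 1/k, so the prior weights sum to 1.
    """
    total = 0.0
    for (a, b), alleles, allele_set in zip(genotype, pop_alleles, allele_sets):
        counts, n = Counter(alleles), len(alleles)
        lam = 1.0 / len(allele_set)
        ca, cb = counts[a] + lam, counts[b] + lam
        denom = (n + 1.0) * (n + 2.0)  # (N + 1)(N + 2) because the prior sums to 1
        p = ca * (ca + 1.0) / denom if a == b else 2.0 * ca * cb / denom
        total += math.log10(p)
    return total


def assign(genotype, samples, score):
    """Return the name of the sample in which `score` is highest."""
    return max(samples, key=lambda name: score(genotype, samples[name]))


# Hypothetical one-locus samples of four allele copies each
samples = {"A": [[1, 1, 1, 2]], "B": [[2, 2, 2, 1]]}
allele_sets = [{1, 2}]
print(assign([(1, 1)], samples, freq_log10_likelihood))  # A
print(assign([(1, 1)], samples,
             lambda g, s: bayes_log10_likelihood(g, s, allele_sets)))  # A
```

The Bayesian score never needs an arbitrary floor: an allele unseen in a sample still receives its prior weight, which is one practical reason it tends to be more robust with modest sample sizes.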
Following Bowcock et al. [2] and using the MICROSAT programme, a subset of animals genotyped for all loci was used to calculate the proportion of alleles shared by two individuals averaged over loci (Ps), the distance between two individuals being given as (1 − Ps). A tree was then constructed from the pairwise distances by the neighbour-joining method [19], using the PHYLIP package [8].
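The shared-allele distance can be sketched in Python as follows. The sketch assumes complete genotypes (as for the subset of animals used here) and adopts one common convention for counting shared alleles at a locus, the overlap of the two allele multisets, so that a homozygote AA and a heterozygote AB share one allele. It is an illustration, not MICROSAT's code, and the function names are hypothetical.

```python
from collections import Counter


def shared_allele_distance(ind1, ind2):
    """Return 1 - Ps for two diploid individuals typed at the same loci.

    Each individual is a list of (allele_a, allele_b) pairs, one per locus.
    Ps is the number of shared allele copies divided by 2 * number of loci.
    """
    if len(ind1) != len(ind2):
        raise ValueError("both individuals must be typed at the same loci")
    shared = 0
    for g1, g2 in zip(ind1, ind2):
        # multiset intersection: AA vs AB -> 1, AB vs AB -> 2, AB vs CD -> 0
        shared += sum((Counter(g1) & Counter(g2)).values())
    ps = shared / (2 * len(ind1))
    return 1.0 - ps


def distance_matrix(individuals):
    """Symmetric matrix of pairwise 1 - Ps distances."""
    n = len(individuals)
    return [[shared_allele_distance(individuals[i], individuals[j]) for j in range(n)]
            for i in range(n)]


x = [(1, 1), (1, 2), (3, 4)]
y = [(1, 2), (1, 2), (5, 6)]
print(shared_allele_distance(x, y))  # 3 shared copies out of 6 -> 0.5
```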

RESULTS
Genetic variability and population subdivision
Locus heterozygosity averaged over breeds ranged from 0.63 (BM1258 marker) to 0.86 (MAF70) for the Spanish sheep, with a mean estimate of 0.77, while overall mean heterozygosity including Awassi sheep was slightly lower (0.75).
FST estimates in Spanish sheep, indicators of population subdivision, reached similar values when calculated by the variance (0.073) and heterozygosity (0.068) methods. When Awassi sheep were included, the resulting average FST values were slightly greater (0.092 and 0.087 by the variance and heterozygosity methods, respectively).
Table I shows the results of the assignment test obtained through different procedures using data from all 18 microsatellites. Accuracy was generally high, with over 95% of individuals correctly assigned to their breed in all but one analysis. The best scores (> 99%) were obtained with the Bayesian method.
The assignment performance of each of the 18 microsatellites was analysed using the Bayesian method, and the results are shown in Table II. The percentage of correct assignment based on a single locus varied from 31.84% (ADCYC) to 61.05% (BM6444). On the basis of these results, the loci were divided into two groups of nine, and each set was evaluated by means of the Bayesian method. The nine loci with the highest individual scores (CSSM66, BM4621, TGLA13, OarFCB11, MAF70, MAF33, TGLA53, BM143 and BM6444), when used together, correctly assigned 98.88% of individuals, compared with 92.51% correct identification for the nine loci with the lowest individual scores (ADCYC, MAF65, OarCP34, MAF64, BM1258, CSSM6, ILSTS002, MAF36 and MAF48).
Table III. Comparison of the average −log10 likelihood (with standard error, SE) that genotypes sampled from a given breed occur in the same breed (left) with the average −log10 likelihood that genotypes sampled from another breed occur in that given breed (right).

Individual-breed assignment: comparison among breeds
To compare breeds, following Cornuet et al. [4], we analysed the results of the Bayesian method based on all 18 microsatellites. This method calculates the likelihood of observing a genotype in a breed and expresses the assignment criterion as minus the decimal logarithm of that value. The assignment criteria to a particular breed (e.g. Awassi) were separated into two sets of values, corresponding to individuals sampled from that same breed (Awassi animals) and to individuals sampled from other breeds (non-Awassi animals).
This procedure was performed for each of the six breeds, and the distributions of the resulting assignment criteria were plotted in each case. Awassi and Merino sheep showed the two extreme patterns, which are shown in Figure 1 (A and B), whereas the remaining breeds produced intermediate patterns (not shown). Table III summarises the results for all breeds, giving the average assignment criterion to a reference breed calculated from individuals sampled in that breed (left) or from any other breed (right).
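The split of assignment criteria into same-breed and other-breed sets, and the averages with standard errors summarised in Table III, can be sketched as below. The inputs are assumed to be −log10 likelihoods of each genotype in the reference breed (for instance from a Bayesian score); computing the SE as sample standard deviation over √n is an assumption, and the names and toy numbers are hypothetical.

```python
import math


def mean_se(values):
    """Mean and standard error (sample SD / sqrt(n)) of a list of numbers."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return m, sd / math.sqrt(n)


def criterion_summary(criteria, sampled_breed, reference):
    """Split -log10 likelihoods of assignment to `reference` by sampled breed.

    criteria: -log10 likelihood of each individual's genotype in `reference`.
    sampled_breed: breed each individual was actually sampled from.
    Returns ((mean, se) for `reference` animals, (mean, se) for all others).
    """
    same = [c for c, b in zip(criteria, sampled_breed) if b == reference]
    other = [c for c, b in zip(criteria, sampled_breed) if b != reference]
    return mean_se(same), mean_se(other)


# Hypothetical criteria for four animals, evaluated against the Awassi breed
crit = [15.0, 17.0, 40.0, 50.0]
breeds = ["Awassi", "Awassi", "Merino", "Churra"]
print(criterion_summary(crit, breeds, "Awassi"))
```

A lower same-breed average indicates a more uniform breed, and a wide gap between the two averages (with non-overlapping distributions) indicates strong divergence from the other breeds.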
Greater uniformity among the sampled Awassi genotypes was indicated by their log-likelihood average (15.70 ± 0.29), which was much lower than that obtained for Spanish sheep (≥ 21.13), among which the largest heterogeneity was calculated for the Merino breed (24.50 ± 0.27).
Comparison of the distributions of assignment criteria, as in Figure 1 (A and B), provides information about the divergence between a particular breed and the others. This divergence was highest for the Awassi breed, with no overlap between distributions and log-likelihood averages of 15.70 ± 0.29 vs. 46.45 ± 0.41 for Awassi and non-Awassi animals, respectively (Tab. III). Some overlap did, however, appear in the analysis of the Spanish breeds and was most noticeable in Merino sheep, with respective log-likelihood averages of 24.50 ± 0.27 vs. 32.78 ± 0.25 for Merino and non-Merino animals.
Figure 1. Distributions of the assignment criteria to the Awassi (A) and Merino (B) breeds. In A and B, the upper histogram represents the distribution of the log-likelihood that genotypes sampled from a given breed occur in the same breed, whereas the lower histogram represents the distribution of the log-likelihood that genotypes sampled from another breed occur in that given breed.

Clustering analysis of individuals
Inter-individual genetic distances for the whole population showed considerable variation (0.28 to 0.97), while estimates between animals from different breeds varied less (0.47 to 0.97). Average values between animals from different breeds covered a narrow range: from 0.72 (average of distances for Awassi/Churra pairs) to 0.76 (Castellana/Latxa pairs). The mean pairwise distance between individuals within breeds was 0.65 (Tab. IV), while it was 0.73 for animals from the whole population and 0.75 for animals from different breeds.
Figure 2 shows the neighbour-joining tree constructed from the pairwise inter-individual distances. It reveals a considerable degree of breed differentiation: of the 190 individuals, 147 (77%) formed discrete clusters, each one coinciding with a particular breed.
The first split separates Awassi from Spanish sheep, and all but one of the Awassis were found in this cluster. This was the only node exclusive to animals of a single breed, with no sheep from Spanish breeds included in it.
The percentage of individuals from a Spanish breed grouping into a single clade ranged from 54% in Merino to 100% in Latxa sheep, the only case where all the animals from a Spanish breed clustered together.
The Merino breed showed a peculiar pattern of clustering: only 19 of the 35 Merinos were found in a single clade, although most Merinos were included in a few nodes. Most of the Castellana animals were grouped amongst themselves or with Merinos. Finally, Rasa-Aragonesa and Churra sheep showed a similar degree of clustering (61% and 64%, respectively), but with several animals dispersed among the nodes of the other Spanish breeds.
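The within-breed, between-breed and overall averages of pairwise distances quoted above can be obtained from a distance matrix along the lines of the sketch below. It pools all pairs of each kind, which is one reasonable reading; Table IV may average differently (for example breed by breed), and the names and toy values are hypothetical.

```python
from itertools import combinations


def _mean(values):
    return sum(values) / len(values)


def mean_distances(dist, breeds):
    """Average pairwise distance within breeds, between breeds and overall.

    dist: symmetric matrix (list of lists) of inter-individual distances.
    breeds: breed label of each individual, in the same order as `dist`.
    Each unordered pair i < j is counted once.
    """
    within, between = [], []
    for i, j in combinations(range(len(breeds)), 2):
        target = within if breeds[i] == breeds[j] else between
        target.append(dist[i][j])
    return _mean(within), _mean(between), _mean(within + between)


dist = [[0.0, 0.2, 0.8],
        [0.2, 0.0, 0.6],
        [0.8, 0.6, 0.0]]
print(mean_distances(dist, ["Latxa", "Latxa", "Merino"]))
```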

DISCUSSION
Genetic variability and population subdivision
The overall mean heterozygosity estimated from 18 microsatellites over the five Spanish breeds (0.77) reflects notably high variability. This is characteristic of microsatellites, which have higher mutation rates than other genetic markers, and it makes them a valuable instrument in genetic differentiation analyses. Although some limitations have been pointed out for microsatellite loci, such as size homoplasy and constraints on allele-length variation, which would lead to an underestimation of genetic differentiation [2,6], such limitations seem to affect highly divergent populations more than closely related breeds such as ours [21].
Comparison of average and total heterozygosities indicated that most genetic diversity (93%) had an intrapopulational origin, in accordance with previous findings for microsatellite sequences and also for other markers. Such results were also evident from the comparison of the average inter-individual distance within breeds (0.65) and the mean value between animals from different breeds (0.75), calculated from the analysis of alleles shared by individual genotypes.
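The 93% figure follows directly from the heterozygosity form of FST: the fraction of total expected heterozygosity (HT) found within breeds (HS) is 1 − FST. Using the heterozygosity-based estimate for the Spanish breeds (0.068):

```latex
F_{ST} = \frac{H_T - H_S}{H_T}
\quad\Longrightarrow\quad
\frac{H_S}{H_T} = 1 - F_{ST} \approx 1 - 0.068 \approx 0.93
```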
Genetic differentiation among breeds was estimated by computing FST statistics. Other estimators have been developed under the assumption of a stepwise mutation model, presumably more appropriate for microsatellite loci. However, the mutation model at these sequences appears to be irregular [9], and for closely related populations, genetic drift rather than mutation seems to account for most of the genetic differentiation in microsatellite distributions [17]. Furthermore, FST values allow comparison with previous studies. In this regard, genetic differentiation among Spanish breeds (about 7% of total diversity) was of the same order of magnitude as that found at microsatellite loci in other species, though slightly lower than the values obtained in cattle by MacHugh et al. [13] or in pigs by Laval et al. [12], indicating a closer relationship. Moreover, including Awassi sheep in the computation raised the estimates to about 9%, in accordance with their more distant relationship with Spanish sheep. All these values must be evaluated in the particular context of microsatellites since, as Hedrick [11] points out, the high within-population variability at these markers may result in low values of differentiation measures.

Individual-breed assignment
Assigning individuals to populations has multiple applications, as reviewed by Waser and Strobeck [22], including the identification of the source population of a given genotype and the evaluation of population differentiation. Both are of interest to our study; from a practical point of view, a further application is the identification of the breed of origin of an animal product (e.g. a carcass) when this has economic significance, as with products bearing a designation of origin. Davies et al. [5] have reviewed the advantages and limitations of the assignment procedures made possible by the large amount of genetic information available from markers such as microsatellites. These procedures are based on multilocus genetic data and use both individual genotypes and population parameters.
A high degree of accuracy in breed assignment was obtained from 18 loci in our study. When the different methods were compared, partial agreement was found with the results reported by Cornuet et al. [4], who evaluated these procedures on simulated data. As in their results, the Bayesian method was the most efficient, while the (δµ)² distance method of Goldstein et al. showed markedly low performance. This low efficiency may be explained by the fact that, as indicated by Goldstein and Pollock [9], the (δµ)² distance performs better for highly divergent than for closely related populations.
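Why (δµ)² discriminates poorly among closely related breeds can be seen from its form: at each locus it compares only mean allele sizes (in repeat units). The sketch below is one plausible individual-to-population adaptation, comparing the mean of an individual's two allele sizes with the sample mean; this adaptation is an assumption, not the GENECLASS code, and the names are hypothetical.

```python
def delta_mu_squared(individual, population):
    """(delta mu)^2 of Goldstein et al., averaged over loci.

    individual: list of (size_a, size_b) allele sizes in repeat units, per locus.
    population: list (one per locus) of allele sizes observed in the sample.
    Only mean allele sizes enter the statistic.
    """
    total = 0.0
    for (a, b), sizes in zip(individual, population):
        mu_ind = (a + b) / 2.0
        mu_pop = sum(sizes) / len(sizes)
        total += (mu_ind - mu_pop) ** 2
    return total / len(individual)


ind = [(10, 12)]
print(delta_mu_squared(ind, [[10, 10, 12, 12]]))  # 0.0: same alleles, same mean
print(delta_mu_squared(ind, [[11, 11, 11, 11]]))  # 0.0: different alleles, same mean
```

Both calls return 0.0: the second toy population carries neither of the individual's alleles but has the same mean size, so the statistic cannot tell the two apart. Differences between closely related breeds that consist mainly of allele-frequency shifts with little change in mean size are largely invisible to it.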
The main difference between our results and those of Cornuet et al. [4] concerns the DA distance method of Nei et al., which in our study showed great accuracy, close to that of the Bayesian method and greater than that of the frequency method; in contrast, Cornuet et al. [4] reported that, according to preliminary results, the DA method performed far below the likelihood-based methods.
A possible explanation for this difference might be the different nature of the data analysed. Cornuet et al. [4] used simulated data generated under the assumptions of exact Hardy-Weinberg proportions at all loci and no linkage disequilibrium. In our study, by contrast, genotypes were sampled from real populations, and although markers were selected on the basis of non-linkage, our data revealed significant linkage disequilibrium in a few cases, in accordance with the genome-wide linkage disequilibrium detected in animal populations, for example by Farnir et al. [5] in cattle.
In addition, Hardy-Weinberg tests had revealed deviations from equilibrium for several of the microsatellites analysed in our study [1] (seven of the 108 tests showed significant deviations after Bonferroni correction for multiple testing). In this situation, distance methods, which do not rely on the assumption of Hardy-Weinberg equilibrium (HWE), may produce better results. Moreover, Takezaki and Nei [21], who evaluated genetic distances in phylogenetic analyses, pointed out the good performance of the DA distance of Nei et al. under different circumstances.
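As an illustration of a distance that does not rest on Hardy-Weinberg genotype probabilities, the sketch below computes the DA distance of Nei et al. between one individual and a population sample, DA = 1 − (1/r) Σ over loci Σ over alleles √(x·y), where x and y are allele frequencies and r is the number of loci. Treating the individual's genotype as allele frequencies of 1 (homozygote) or 0.5/0.5 (heterozygote) is an assumed adaptation; GENECLASS may implement it differently, and the names are hypothetical.

```python
import math
from collections import Counter


def nei_da(individual, population):
    """DA distance between an individual and a population sample.

    individual: list of (allele_a, allele_b) pairs, one per locus.
    population: list (one per locus) of allele copies in the sample.
    """
    similarity = 0.0
    for (a, b), alleles in zip(individual, population):
        ind_counts = Counter((a, b))  # e.g. (1, 1) -> {1: 2}
        pop_counts, n = Counter(alleles), len(alleles)
        similarity += sum(math.sqrt((c / 2.0) * (pop_counts[al] / n))
                          for al, c in ind_counts.items())
    return 1.0 - similarity / len(individual)


def assign_by_distance(individual, samples):
    """Assign to the sample with the smallest DA (no HWE assumption)."""
    return min(samples, key=lambda name: nei_da(individual, samples[name]))


samples = {"A": [[1, 1, 1, 2]], "B": [[2, 2, 2, 2]]}
print(assign_by_distance([(1, 1)], samples))  # A
```

Unlike the likelihood scores, nothing here multiplies genotype probabilities, so a locus that departs from Hardy-Weinberg proportions does not distort the criterion in the same way.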
A similarly high accuracy (approximately 99%) was obtained when the assignment test was based on the nine loci selected for their high individual performance, while the contrasting test using the nine loci with the lowest individual scores revealed a modest drop in efficiency (to approximately 92%). We would expect intermediate efficiency for nine randomly chosen microsatellites. Nine is a number of practical interest from an analytical point of view, since the methodology widely used in microsatellite genotyping ("one lane, four colours") permits the simultaneous analysis of several loci ("multiplex"), and nine is a suitable number of markers for this technique. Furthermore, about 10 microsatellites were also suggested as sufficient for a Bayesian assignment method in the theoretical study by Cornuet et al. [4], under conditions close to those of our study regarding population parameters such as heterozygosity and FST values, as well as the number of individuals sampled. The results described so far support the idea that microsatellites are a valuable tool for individual-breed assignment and that considerable accuracy can be obtained from a fairly small number of loci displaying high laboratory efficiency.

Clustering analysis of individuals
Another strategy for extracting information from individual genotypes is the analysis of inter-individual distances based on the proportion of shared alleles. This methodology allows the construction of dendrograms that show the genetic similarity among individuals without any assumption concerning previously established populations. In recent years, this property has made such analyses a valuable complement to traditional studies based on populations as genetic units, helping to clarify intra- and inter-population relationships.
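The neighbour-joining step that turns such a distance matrix into a tree can be sketched in plain Python as below. This is a minimal O(n³) version of the Saitou and Nei algorithm written for illustration, not the PHYLIP programme used in the study; it returns an unrooted tree in Newick format, breaks ties by input order, and assumes at least three uniquely labelled individuals. The function name is hypothetical.

```python
def neighbour_joining(dist, labels):
    """Neighbour-joining (Saitou and Nei) on a symmetric distance matrix.

    dist: list of lists of pairwise distances; labels: unique leaf names.
    Returns an unrooted tree in Newick format with 4-decimal branch lengths.
    """
    if len(labels) < 3:
        raise ValueError("need at least three individuals")
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # net divergence
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]  # Q criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # branch lengths from the new internal node to a and b
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2.0 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.4f},{b}:{lb:.4f})"
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # join the last three nodes at a single central node
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    return f"({a}:{la:.4f},{b}:{d[a][b] - la:.4f},{c}:{d[a][c] - la:.4f});"


toy = [[0, 3, 8, 9],
       [3, 0, 9, 10],
       [8, 9, 0, 9],
       [9, 10, 9, 0]]
print(neighbour_joining(toy, ["A", "B", "C", "D"]))
# (C:4.0000,D:5.0000,(A:1.0000,B:2.0000):3.0000);
```

The toy distances are additive (generated from a known four-leaf tree), so the sketch recovers that tree exactly; real 1 − Ps matrices are not additive, and neighbour-joining then yields an approximation.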
Various studies based on the investigation of genetic relationships among individuals have been performed in humans and animals, and have shown variable clustering levels, which are generally higher for more divergent populations [2,6,12,13].
The present study concerns five Spanish sheep breeds and Awassi, included as a reference breed. The tree constructed from pairwise individual distances showed a remarkable breed-clustering pattern, considering the close relationship among the Spanish breeds analysed. Although the proportion of individuals forming discrete clusters, each one coinciding with a particular breed (77%), was lower than the scores for correct assignment discussed above, the methodological differences between the two approaches must be taken into account. The assignment methods are based on individual-population relationships defined by allele frequencies, whereas the second kind of analysis relies on inter-individual distances and makes no assumption about previously defined populations.
As expected, Awassi sheep branched off from the Spanish breeds in a private node. The greater divergence of Awassis was also reflected in the analysis of the Bayesian assignment criterion, as represented in Figure 1A. Awassi sheep were also characterised by greater uniformity, as indicated both by average shared-allele distances (Tab. IV) and by the log-likelihood average (Tab. III). However, this lower genetic variation must be evaluated carefully, since animals were sampled from Spanish Awassi populations and a possible founder (bottleneck) effect cannot be ruled out.
Examination of Spanish sheep in the tree revealed frequent cases of animals dispersed among other breed nodes, in accordance with their close relationship. Moreover, Spanish breeds showed different patterns of clustering and these results offer complementary information to that obtained from a previous analysis based on populations [1].
It should be noted that Merinos differed somewhat from other breeds, since they appeared at various nodes, in accordance with their greater within-breed inter-individual distance, reflecting more variability. Furthermore, several Merinos were found dispersed in other breed clusters and, of these, all but one appeared related to Castellana sheep.
In accordance with results from the previous analysis of populations as taxonomic units, examination of the tree reveals that individuals of the "entrefino" type do not appear grouped at a node separate from the Merino and "churro" types. Notably, this result also applies here to Castellana sheep, which were not included in previous studies and which showed a closer genetic relationship to Merinos than to Rasa-Aragonesas. These data are consistent with the idea that both Merinos and Churros have a non-negligible degree of genetic relationship with breeds of the entrefino type. This weakens the hypothesis of an independent genetic origin for the latter, which had been suggested by Sánchez and Sánchez [20] on the basis of morphological traits.
Interestingly, the Churra and Latxa breeds, which both belong to the so-called "churro" type, showed very different patterns of clustering. Latxas were the only Spanish sheep grouped together into a single cluster, and no Latxas were found outside that node. This result indicates greater uniformity within the Latxa breed than within Churras, which may reflect the particular historical background of the Latxa, suggesting greater isolation than other Spanish sheep during the evolutionary process. By contrast, Churra sheep showed a lower level of clustering, with various animals scattered among the other Spanish breeds, in accordance with the breed assignment data already discussed; together, these results suggest greater gene flow between Churras and other Spanish sheep than between Latxas and other Spanish sheep.
These findings seem to contrast somewhat with the small difference in genetic variation previously detected between the two breeds from population parameters such as the average number of alleles and average heterozygosity [1].
The data reported here provide valuable insight into population structure and help in assessing inter-population dispersal, supporting the idea that the analysis of inter-individual relationships is a helpful complement to population studies based on allele frequencies.