History of Lipizzan horse maternal lines as revealed by mtDNA analysis

Sequencing of the mtDNA control region (385 or 695 bp) of 212 Lipizzans from eight studs revealed 37 haplotypes. The distribution of haplotypes among studs was biased: many haplotypes were private, and only one haplotype was present in all the studs. According to historical data, numerous Lipizzan maternal lines originating from founder mares of different breeds have been established during the breed's history, so a broad genetic base of the Lipizzan maternal lines was expected. A comparison of Lipizzan sequences with 136 sequences of domestic and wild horses from GenBank showed a clustering of Lipizzan haplotypes in the majority of haplotype subgroups present in other domestic horses. We assume that haplotypes identical to haplotypes of early domesticated horses can be found in several Lipizzan maternal lines as well as in other breeds. Therefore, domestic horses could have arisen either from a single large population or from several populations, provided there were strong migrations during the early phase after domestication. A comparison of Lipizzan haplotypes with 56 maternal lines (according to the pedigrees) showed a disagreement of biological parentage with pedigree data for at least 11% of the Lipizzans. The distribution of haplotype frequencies was unequal (0.2%–26%), mainly due to pedigree errors and haplotype sharing among founder mares.


INTRODUCTION
The Lipizzan horse breed was established in the Habsburg court stud at Lipica in 1580 [12]. This baroque horse breed is considered the oldest European cultural horse breed. In the 17th century, Lipizzans began to spread out from Lipica to the wide area of central and eastern Europe. Numerous maternal lines that follow very strict breeding rules have been developed in addition to six classical stallion lines, but some of them have died out during the history of the breed. Founder mares of existing Lipizzan maternal lines were born in the 18th, 19th and 20th centuries and they originated from many different breeds, including Karst, Spanish, Italian, Kladruber and Arabian horses [12]. Only maternal lines established at the stud of Lipica before the Second World War are recognised as classical maternal lines. All the other maternal lines originate from studs established on the territory of the former Austro-Hungarian Empire [12]. The present-day Lipizzan studs in this region are considered traditional studs and they represent important centres for the preservation of the Lipizzan breed.
The history of the Lipizzan maternal lines has been well described [12]; however, these descriptions are based mainly on historical and pedigree data. Additional information elucidating the history of maternal lines could be provided by examining the nucleotide sequences of the mtDNA control region. In our previous study [7] we showed that the sequence variability of the mtDNA control region within the Lipizzan horse breed is sufficient for differentiation of the majority of the Lipizzan maternal lines, and that the control region is a suitable genetic marker for tracing back the history of the Lipizzan maternal lines.
In this study, we sequenced the mtDNA control region of representatives of the Lipizzan maternal lines from eight traditional Lipizzan studs (Lipica, Slovenia; Piber, Austria; Monterotondo, Italy; Szilvasvarad, Hungary; Beclean and Fagaras, Romania; Đakovo, Croatia; Topolcianky, Slovakia). First, we described Lipizzan maternal lines in terms of mtDNA haplotypes, and estimated matrilineal diversity in the Lipizzan horse breed. Second, we examined the relationship among Lipizzan, domestic and wild horse haplotypes, in order to provide new information about the origin of the Lipizzan maternal lines. Finally, we compared the sequence data obtained with those of the Lipizzan maternal line pedigrees and described the genetic structure in eight Lipizzan studs, with the aim of testing the reliability of pedigrees and reconstructing the recent history of the Lipizzan maternal lines.

MATERIALS AND METHODS
The upstream part of the control region (GenBank X79547 [19], nt 15450-nt 15834) was sequenced for 212 Lipizzans representing 56 maternal lines from eight traditional Lipizzan studs: Lipica, Piber, Monterotondo, Szilvasvarad, Beclean, Fagaras, Đakovo and Topolcianky. We selected 1-6 blood samples per maternal line in order to cover wide portions of the pedigree. In the case of ambiguity (discrepancy of genetic and pedigree data), we sequenced additional samples from maternally related animals or, whenever such samples were not available, we analysed questionable samples again, to minimise the chance of errors during the processing of samples in the laboratory. To improve the estimation of the genetic relationship among mtDNA haplotypes, we sequenced the downstream part of the control region (GenBank X79547 [19], nt 16351-nt 16660) of one sample per distinct haplotype, as defined by the upstream part of the control region.
mtDNA was extracted following a standard procedure [18]. PCR amplification of the upstream part of the control region was performed on an MJ Research PTC100 thermal cycler, with an annealing temperature of 52 °C for 30 s.
Reactions (20 µL) contained template DNA (50 ng), 1 × PCR buffer II (Perkin Elmer), 1.5 mM MgCl2, 20 µM dNTPs, 0.4 U Taq Polymerase (Perkin Elmer) and 10 pmol of each primer (HDF: 5′-AGTCTCACCATCAACACCCAAAGC-3′ and HF: 5′-CCTGAAGTAGGAACCAGATG-3′). PCR fragments were sequenced using the ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction Kit and the ABI PRISM 310 DNA Sequencer (PE Applied Biosystems). For the sequence analysis of the downstream part of the control region, amplification of the entire control region was performed using the HDF and HDR primer pair (HDR: 5′-ACTCATCTAGGCATTTTCAGTG-3′) and an annealing temperature of 49 °C for 60 s.
Sequences of the upstream (385 bp) and downstream (310 bp) parts of the control region were aligned to the reference sequence (GenBank X79547, [19]). Kimura 2-parameter distances were calculated [9] and a Neighbour-joining (NJ) tree [15] was drawn with the Phylip program package [4]. The same package was used to perform bootstrap analysis on 1 000 data sets. For the graphical presentation of the tree, the Treeview program [14] was applied.
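The Kimura 2-parameter distance mentioned above can be illustrated with a minimal sketch. It is not the Phylip implementation used in the study; the function name `k2p_distance` and the handling of gaps (sites with a non-ACGT character are skipped) are assumptions made for illustration. With P the proportion of transitions and Q the proportion of transversions, the distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)].

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences.

    Illustrative helper (hypothetical name); sites where either sequence
    has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: not comparable
        compared += 1
        if a == b:
            continue
        # purine<->purine or pyrimidine<->pyrimidine is a transition
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: one transition (C<->T) in ten aligned sites
print(k2p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

For closely related haplotypes, such as those differing by one or two sites, the corrected distance is only slightly larger than the raw proportion of differing sites.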
The Lipizzan haplotypes obtained were compared to equine control region sequences retrieved from the GenBank database: AF064627-32, AF326635-86 [17], AF072975-96 [10], AF169009-10, AF014405-17 [8], AF056071 [8], D23665-6 [5], D14991 [5], AF055876-9 [13], AF132568-94 [2], AF431965-9, X79547 [19], representing sequences of domestic horses (a variety of breeds from all over the world), sequences of Przewalski's wild horses and sequences of late Pleistocene horses from Alaska. Multiple alignment was performed using Clustal W [16]. Identical sequences were joined into the same haplotype, regardless of the sequence length. The number of sequences joined in each haplotype was accounted for as the haplotype frequency. The NJ tree was constructed as described above. Median networks were constructed using an algorithm for speedy construction by hand [1]. They were generated from the binary data matrix, which comprises zeros at positions where the sequence haplotype in question matches the consensus type and ones where a (usually transitional) variant is present. In the process of construction, compatibilities between characters become manifest in simple branching, whereas incompatibilities increase dimensionality by doubling parts of the network. Unnecessarily large networks were avoided by reductions of some of the most obvious recurrent mutations (by splitting characters that account for hypothetical multiple hits).
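The two bookkeeping steps described above (joining identical sequences into haplotypes, and coding each haplotype as zeros and ones relative to a consensus type) can be sketched as follows. This is a simplified illustration, not the procedure of [1]: it assumes all sequences are aligned to the same length, takes the consensus as the most common base per column among the distinct haplotypes, and codes any non-consensus base as 1 without splitting characters for recurrent mutations. The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def collapse_haplotypes(sequences):
    """Count how many samples carry each distinct aligned sequence."""
    return Counter(sequences.values())


def binary_matrix(haplotypes):
    """Code haplotypes as 0 (matches consensus) or 1 (variant) per variable site."""
    length = len(haplotypes[0])
    # Consensus: most common base in each alignment column
    consensus = "".join(
        Counter(h[i] for h in haplotypes).most_common(1)[0][0]
        for i in range(length)
    )
    # Keep only columns where at least one haplotype departs from the consensus
    variable_sites = [
        i for i in range(length) if any(h[i] != consensus[i] for h in haplotypes)
    ]
    rows = [[int(h[i] != consensus[i]) for i in variable_sites] for h in haplotypes]
    return consensus, variable_sites, rows


# Toy data: four samples, three distinct haplotypes
samples = {"mare1": "ACGTA", "mare2": "ACGTA", "mare3": "ATGTA", "mare4": "ACGCA"}
counts = collapse_haplotypes(samples)
consensus, sites, matrix = binary_matrix(list(counts))
print(counts)
print(consensus, sites, matrix)
```

In this toy case the two variable columns are compatible, so the corresponding network would be a simple tree; incompatible columns are what introduce reticulations.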
Pedigree data were recorded during stud visits, and pedigrees were reconstructed by tracing-back the maternal line to the individual's founder mare. An animal with an unknown mother was defined as a founder mare. Haplotype frequencies were calculated for a sample of 416 breeding mares, which were present at eight traditional Lipizzan studs at the time of sample collection. For 212 mares, genetic data were obtained by sequencing and for the rest only pedigree data were used.
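A minimal sketch of how haplotype frequencies could be computed from such mixed data is given below. It assumes, as an illustration only, that a mare without sequence data is assigned the haplotype found in sequenced members of her maternal line; the function name, the data layout and the toy records are hypothetical.

```python
from collections import Counter


def haplotype_frequencies(mares, line_haplotype):
    """Return haplotype frequencies for a list of breeding mares.

    mares: list of (maternal_line, sequenced_haplotype_or_None)
    line_haplotype: haplotype assumed for unsequenced mares of each line
    """
    counts = Counter()
    for line, haplotype in mares:
        if haplotype is None:
            haplotype = line_haplotype[line]  # pedigree-based assignment
        counts[haplotype] += 1
    total = sum(counts.values())
    return {h: n / total for h, n in counts.items()}


# Toy records: three sequenced mares and one assigned by pedigree
mares = [("LineA", "H1"), ("LineA", None), ("LineB", "H2"), ("LineA", "H2")]
freqs = haplotype_frequencies(mares, {"LineA": "H1", "LineB": "H2"})
print(freqs)

# With 416 mares, a haplotype carried by a single mare has a frequency of about 0.2%
print(round(100 * 1 / 416, 1))
```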

RESULTS
Sequence analysis of 212 Lipizzan horses revealed 37 distinct mtDNA haplotypes. Twenty-four of them are presented in Table I; the remaining 13 haplotypes were described previously [7]. The alignment of 37 distinct Lipizzan sequences with the reference sequence (GenBank X79547, [19]) showed that the majority of polymorphic sites (47) were found in the upstream part of the control region, and only 14 in the downstream part. Many haplotypes with sequence differences in the upstream part have identical sequences in the downstream part. Lipizzan haplotypes differ from each other by 1 to 24 nucleotides, or from 0.14 to 3.5%. They are clustered into four groups: C1, C2, C3 and C4 (Fig. 1). The bootstrap values were high for groups C2, C3 and C4, but values for the C1 group were slightly lower. However, the integrity of the C1 group is supported by high sequence identity in the downstream region. Within the C1 group several low bootstrap values were observed; therefore, we did not define subgroups in this group, although at least two subgroups were well supported by relatively high bootstrap values. One of them consisted of the Slavina, Dubovina and X haplotypes and the other of the G, F and Trompeta haplotypes (Fig. 1). In contrast, the C2 group could be further divided into the C2a and C2b subgroups, and the C3 group into the C3a and C3b subgroups. Such division is well supported by high bootstrap values and by almost identical sequences in the downstream part of the control region.
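The percentages quoted above follow directly from the fragment lengths. The short check below recomputes them; the constant and function names are illustrative, and the 695 bp total is simply the sum of the two sequenced fragments given in Materials and Methods.

```python
# Fragment lengths from the nucleotide coordinates (inclusive ranges)
UPSTREAM_BP = 15834 - 15450 + 1    # 385
DOWNSTREAM_BP = 16660 - 16351 + 1  # 310
TOTAL_BP = UPSTREAM_BP + DOWNSTREAM_BP  # 695


def count_differences(seq_a, seq_b):
    """Number of differing positions between two aligned sequences."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_difference(n_diff, length=TOTAL_BP):
    """Differences expressed as a percentage of the compared length."""
    return 100.0 * n_diff / length


print(round(percent_difference(1), 2))   # smallest observed difference
print(round(percent_difference(24), 1))  # largest observed difference
```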
A comparison of the Lipizzan haplotypes with other equine haplotypes showed that the other haplotypes cluster into the same four groups (C1-C4) including several subgroups (Fig. 2). Lipizzan haplotypes can be found in almost all subgroups with only a few exceptions. For example, they are not present in the C3c group, which consists of five haplotypes determined in wild horses from the late Pleistocene from Alaska [17]. However, these five haplotypes are related to the C3b group of haplotypes, to which the Lipizzan haplotype Thais also belongs.
Lipizzan haplotypes are in general similar or even identical to other domestic horse haplotypes (Fig. 3). The networks show that some haplotypes are more frequent than others. They are common to several horse breeds and usually do not have unique polymorphic sites. Such haplotypes, for example the Gaetana, Monteaura, Allegra and O haplotypes, are present in the Lipizzan breed as well as in three to four other horse breeds. In the networks, the more frequent haplotypes are usually surrounded by haplotypes which differ from them by one or two nucleotides (Fig. 3). Each of these surrounding haplotypes is therefore characterised by one or two unique polymorphic sites.
Table I. Polymorphic sites within the upstream (nt 15450-nt 15834) and downstream (nt 16351-nt 16660) part of the control region for 24 Lipizzan mtDNA haplotypes and the reference sequence (GenBank X79547, [19]). The additional 13 Lipizzan haplotypes were described previously [7]. The DNA sequences of all 37 Lipizzan mtDNA haplotypes are available in GenBank under Acc. No. Y057408-34, AY057435-6, AF168689-98 and AF168699-705.

The distribution of the mtDNA haplotypes among Lipizzan studs is biased (Tab. II). Only the Batosta haplotype was present in all the studs, but many haplotypes were observed only in one or two studs. The haplotype frequency distribution in the Lipizzan breed was unequal, ranging from 26% for the most common haplotype, Capriola, present in 13 lines (Tabs. II and III), to 0.2% for the V, C and Z haplotypes (Tab. II).
Figure 3. Median network of the C2 main group haplotypes (Fig. 3a) and median network of the C4 main group haplotypes (Fig. 3b). Data matrix for the median network of the C2 main group haplotypes is based on sequence data of the upstream part of the control region, while a data matrix for the network of the C4 main group is based on the sequence data from both parts of the control region. Haplotypes are represented by circles, proportional to haplotype frequencies. Circles representing haplotypes found in the Lipizzan horse breed are black. The numbers on the branches denote nucleotide positions where mutations have occurred (positions > 16 000 are in black boxes). Reticulations within the network indicate ambiguity in the topology and parallel lines in a single reticulation represent the same mutation.
According to the pedigree data, Lipizzans from eight traditional studs belong to 56 maternal lines. In 38 lines only one haplotype was found, showing no discrepancy between genetic and pedigree data (Tab. III). However, two haplotypes were identified in 11 lines and three haplotypes in 7 lines. Different haplotypes found within the same maternal line indicate pedigree errors: each haplotype beyond the first requires at least one error, which gives 11 × 1 + 7 × 2 = 25. Therefore, at least 25 pedigree errors have occurred within the Lipizzan breed in the past. Because errors are transmitted vertically through generations, we estimate that the biological origin of at least 11% of the Lipizzans is in disagreement with their pedigree data.
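The lower bound of 25 pedigree errors can be reproduced from the counts given above, on the assumption that a maternal line containing k distinct haplotypes implies at least k - 1 errors. The sketch below uses a hypothetical function name and only the line counts stated in the text; it does not reproduce the 11% estimate, which depends on pedigree structure not given here.

```python
def minimum_pedigree_errors(lines_by_haplotype_count):
    """Lower bound on pedigree errors.

    lines_by_haplotype_count maps the number of distinct haplotypes found
    in a maternal line to the number of lines showing that count.
    A line with k haplotypes implies at least k - 1 errors.
    """
    return sum(
        n_lines * (n_haplotypes - 1)
        for n_haplotypes, n_lines in lines_by_haplotype_count.items()
    )


# Counts reported in the Results: 38 lines with one haplotype,
# 11 lines with two, 7 lines with three
observed = {1: 38, 2: 11, 3: 7}
total_lines = sum(observed.values())
min_errors = minimum_pedigree_errors(observed)
print(total_lines, min_errors)
```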

DISCUSSION
This study confirmed our preliminary assumption that the information collected by sequencing the upstream part of the mtDNA control region (nt 15450-nt 15834) is sufficient for characterisation of the mtDNA haplotypes. However, due to the high level of homoplasy observed in the upstream part, additional sequence information from the less variable downstream part (nt 16351-nt 16660) of the control region contributed considerably to the estimation of the genetic relationship among haplotypes, reflected in higher bootstrap values in the phylogenetic tree (Fig. 1) and in a slightly different relationship among groups and subgroups (Fig. 1 and Fig. 2) than in NJ trees based only on the upstream part of the control region [10,17].
The presence of 37 mtDNA haplotypes reflects the broad genetic base of Lipizzan maternal lines. According to historical data, numerous maternal lines from founder mares of different breeds were established during the history of the Lipizzan breed; therefore, a high number of mtDNA haplotypes in Lipizzans met our expectations. The number of haplotypes identified in the Lipizzans exceeds even the number of haplotypes found in Arabians (29; AF132568-94, AF064627/29). High matrilineal diversity has also been observed in other domestic horse breeds [2,10,17] and in 1000 to 2000 year old Viking Age horse bones found in a restricted area in northern Europe [17]. The reasons for this are an ancient origin of horse maternal lineages [10,17], incorporation of numerous matrilines into the gene pool of domestic horses during the process of domestication (multiple origins scenario) [17], and little congruence of haplotype with breed [10,17].
Only the haplotypes (or their derivatives) which survived two recent population bottlenecks (one was the mass extinction of large-bodied animals around 8 000 B.C., and the second was horse domestication, which took place around 4 000 B.C. [3,11]) can be found in present-day domestic horses. Given the control region mutation rate (2-4 × 10⁻⁸ per site per year) [6], we assume that haplotypes identical to the haplotypes of early domesticated horses, so-called "ancestral" haplotypes, can still be found in several Lipizzan maternal lines and in other breeds. We suggest that the more frequent haplotypes, for example Gaetana, Allegra, Monteaura and O (Fig. 3), which are characterised by the lack of unique polymorphic sites, have not changed at least since horse domestication. Haplotypes which can be found around those "ancestral" haplotypes in the network probably represent their evolutionary derivatives. However, the boundary between "ancestral" haplotypes and their "derivatives" is often unclear, because horses with closely related haplotypes, differing only by one or two polymorphic sites, may have been domesticated too (Fig. 3b).
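A rough back-of-the-envelope check of this argument is sketched below. It multiplies the quoted mutation rate by the sequenced length and the elapsed time to obtain the expected number of substitutions per lineage. The 695 bp length (both sequenced fragments), the span of about 6 000 years since domestication, and the Poisson model used to turn the expectation into a probability of no change are assumptions made for illustration, not values or methods stated by the study.

```python
import math


def expected_substitutions(rate_per_site_per_year, n_sites, years):
    """Expected number of substitutions along one lineage."""
    return rate_per_site_per_year * n_sites * years


def probability_unchanged(rate_per_site_per_year, n_sites, years):
    """Probability of zero substitutions under a Poisson model (assumption)."""
    return math.exp(-expected_substitutions(rate_per_site_per_year, n_sites, years))


N_SITES = 695  # assumed: upstream (385 bp) plus downstream (310 bp) fragments
YEARS = 6000   # assumed: roughly the time since domestication around 4 000 B.C.

for rate in (2e-8, 4e-8):
    expected = expected_substitutions(rate, N_SITES, YEARS)
    unchanged = probability_unchanged(rate, N_SITES, YEARS)
    print(f"rate={rate:.0e}: expected={expected:.4f}, P(unchanged)={unchanged:.3f}")
```

Under these assumptions, fewer than 0.2 substitutions are expected per lineage since domestication, which is consistent with the suggestion that many haplotypes have remained unchanged over that period.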
Clutton-Brock's model of domestication [3] suggests that the wild stock from which all domestic animals were bred inhabited the plains of southern Russia, from the Ukraine to the region of Turkestan. According to this model, the earliest domesticated horses spread out from this arc, and all the different types and breeds of horse that are known today were developed as a result of artificial selection. However, an alternative model was also suggested by the same author [3]: this is based on the assumption that there was a geographical cline in the populations of wild horses, with those in the northern part of the range being smaller and more sturdy than those in the south [3]. No obvious phylogeographic structure in domestic horses has been observed [10] and "ancestral" haplotypes can be found in many different horse breeds; therefore, domestic horses could have arisen either from a single large population or from several populations, provided there were strong migrations during the early phase after domestication. The first evidence that the wild horse populations might have been phylogeographically structured before domestication has already been introduced [17] (all mtDNA haplotypes determined in the late Pleistocene horses belong to the C3 group), but this single piece of evidence is probably not sufficient to confirm or completely reject either of the horse domestication models [3].
A comparison of mtDNA sequence data with the pedigrees of Lipizzan maternal lines has revealed only 37 different mtDNA haplotypes in 56 maternal lines. This is the consequence of haplotype sharing among maternal lines (Tab. III). Some of the maternal lines with identical haplotypes possibly belong to the same maternal line which is, due to the incomplete pedigree data, split into different maternal lines. The joining of such lines into the same maternal line could be supported by additional pedigree data which would close the gaps in the pedigree.
The present genetic structure of the eight Lipizzan studs is characterised by a biased distribution of haplotypes (Tab. II), with many private haplotypes and only one haplotype found in all studs. This suggests that the independent status of the studs has been well preserved up to now. It seems that the exchange of mares among studs was quite restricted. Consequently, private haplotypes remained mainly in the country of origin of the founder mare, although some exceptions have been observed.
A comparison of genetic data with pedigree data showed some inconsistencies in the distribution of 37 mtDNA haplotypes within 56 maternal lines (Tab. III). The relatively short existence of the Lipizzan maternal lines (not more than 400 years), in combination with a high degree of nucleotide sequence difference (at least two sites) among haplotypes within each maternal line (according to the pedigree), suggests that all inconsistencies could be explained solely by pedigree errors. In this context, heteroplasmy and subsequent purifying selection or drift appear to be a very unlikely alternative explanation for the presence of more than one haplotype within the same maternal line. Therefore, these results showed that pedigrees of Lipizzan maternal lines were not completely reliable. We identified 25 pedigree errors, but it is likely that some additional errors remained hidden, because pedigree errors cannot be detected in unilineal parts of a pedigree. Unfortunately, such parts occurred in the majority of the 56 Lipizzan matrilineal pedigrees. Therefore, biological founder mares could be different from founder mares identified by pedigree data. Conversely, it is possible that some of the Lipizzan maternal lines which were assumed to be extinct according to the pedigree data (e.g. Rozca, Khel il Massaid, Mersucha) biologically survived as a consequence of pedigree errors. Pedigree errors and the fact that some maternal lines shared the same haplotype had an impact on the unequal distribution of haplotype frequencies in the Lipizzan population (Tab. II). Several haplotypes were found only in one or two breeding mares. These rare haplotypes could disappear very soon if the discrepancy between genetic and pedigree data is not taken into consideration.
In spite of the fact that genetic and pedigree data are in some cases in disagreement, we do not recommend radical revision of Lipizzan maternal line pedigrees. In addition, genetic data obtained by the analysis of present horses would not allow reliable corrections of the pedigrees. Therefore, we suggest that the pedigree information about the maternal origin of each breeding mare should be accompanied by the information about its mtDNA haplotype. This would not require typing of all of the animals in order to avoid negative consequences of pedigree errors (e.g. extinction of rare maternal lines); occasional testing of some randomly selected animals from each line would be sufficient.