Genomic diversity and population structure of the indigenous Greek and Cypriot cattle populations

Background The indigenous cattle populations from Greece and Cyprus have decreased to small numbers and are currently at risk of extinction due to socio-economic reasons, geographic isolation and crossbreeding with commercial breeds. This study represents the first comprehensive genome-wide analysis of 10 indigenous cattle populations from continental Greece and the Greek islands, and one from Cyprus, and compares them with 104 international breeds using more than 46,000 single nucleotide polymorphisms (SNPs). Results We estimated several parameters of genetic diversity (e.g. heterozygosity and allelic diversity) that indicated a severe loss of genetic diversity for the island populations compared to the mainland populations, which is mainly due to the declining size of their population in recent years and subsequent inbreeding. This high inbreeding status also resulted in higher genetic differentiation within the Greek and Cyprus cattle group compared to the remaining geographical breed groups. Supervised and unsupervised cluster analyses revealed that the phylogenetic patterns in the indigenous Greek breeds were consistent with their geographical origin and historical information regarding crosses with breeds of Anatolian or Balkan origin. Cyprus cattle showed a relatively high indicine ancestry. Greek island populations are placed close to the root of the tree as defined by Gir and the outgroup Yak, whereas the mainland breeds share a common historical origin with Buša. Unsupervised clustering and D-statistics analyses provided strong support for Bos indicus introgression in almost all the investigated local cattle breeds along the route from Anatolia up to the southern foothills of the Alps, as well as in most cattle breeds along the Apennine peninsula to the southern foothills of the Alps. Conclusions All investigated Cyprus and Greek breeds present complex mosaic genomes as a result of historical and recent admixture events between neighbor and well-separated breeds. While the contribution of some mainland breeds to the genetic diversity pool seems important, some island and fragmented mainland breeds suffer from a severe decline of population size and loss of alleles due to genetic drift. Conservation programs that are a compromise between what is feasible and what is desirable should focus not only on the still highly diverse mainland breeds but also promote and explore the conservation possibilities for island breeds.

impact on agricultural development, diet alterations, cultural heritage and social structure [1,2]. Molecular evidence [3] has indicated two domestication events in cattle from which the taurine (Bos taurus) and indicine (Bos indicus) species arose, sharing the aurochs (Bos primigenius) as common ancestor ~ 250,000 years ago [4]. During the introduction of domesticated cattle into Europe, the Mediterranean coast played a crucial role [2,5]. Therefore, Greece and Cyprus, which are closely located to the core region of the domestication of cattle, i.e. in the near East, represented an important crossroad for the dispersion of human groups and their herds from Anatolia towards Europe [6][7][8].
Historically, the Southern Balkan peninsula has been characterized by the free movement of humans and animals, especially in the areas near the current borders, from almost the Neolithic throughout Antiquity, the Roman, Byzantine and Ottoman empire times to nearly 40 years before the present [6,9]. Pastoralism, the seasonal movement of herds and people to exploit the grazing lands across the Balkan region, is a common practice since centuries [10,11]. Migration events that enhanced the gene flow among the domesticated cattle populations, genetic drift, physical isolation due to geographic barriers [12] during the above historic periods, and human weak selection pressures led to the formation of well-adapted local cattle breeds in rather marginal and harsh environments [6,13].
The farming and breeding practices associated with these local breeds differ substantially from those in highperforming breeds [14]: (i) they are fairly undifferentiated and unselected; (ii) pedigree records are incomplete or do not exist; (iii) breeding associations either do not exist or are recently established for conservation purposes only; (iv) accordingly, there are no classical breed standards but a breed is defined by common origin; (v) systematic and standardized records of traits do not exist; and (vi) the infrastructure necessary for such recording is rudimentary. Therefore, these local indigenous populations of South East European cattle do not meet all the conditions to be referred as breeds. However, to avoid confusion between the terms breed, strain and population, we will use the term 'breed' only with the adjectives 'indigenous' , 'local' or 'rare' or without an adjective. These local breeds have been considered to be under constant risk mainly since the 1970s and onwards. In fact, in some extreme cases, the current population size of some local breeds consists of only a few animals. This global trend [15][16][17] reflects also, in generic terms, the situation of Greek local breeds.
In Greece and Cyprus, economic and social conditions as well as geomorphological and climatic reasons have not allowed the development of high-yield local cattle populations since 1950. After the 1960s, with the implementation of artificial insemination, many indigenous populations were crossbred with highly selected commercial breeds [18,19]. For example, Brachyceros, which is a Greek autochthonous breed with short horns resembling the Albanian and Buša cattle, was crossed with Swiss Brown in some regions [20]. In the middle of the 20th century, eight indigenous cattle breeds were reported in Greece. Nowadays, four of these are considered as officially extinct (Tinos, Andros, Chios, Kerkyra), three as threatened (Brachyceros, Katerini and Sykia) and one (Kea) as a rare breed [21,22]. The aforementioned breeds were mainly bred in mountainous regions and/or on islands with poor infrastructure. The latter combined with geographical and natural barriers as well as the absence of artificial insemination led to reproductive isolation, fragmentation, and gradual depletion of the genetic diversity in these breeds [23].
Genetic diversity studies during the past years have been conducted in a large number of domestic cattle breeds [13,[24][25][26][27][28][29][30]. Improving our knowledge on the genetic diversity within and among local breeds is considered an issue of crucial importance for enhancing their efficient use with regard to sustainable animal farming in a harsh and less intensified environment, and for implementing further conservation programs [31]. Since the indigenous breeds exhibit adaptive ability to their local environment and remarkable longevity, the genetic pool of unselected local breeds can represent a valuable source of genes [32]. However, with a few exceptions [33,34], little research has been carried out on the genetic diversity and genetic relationships of indigenous cattle from South East Europe with regard to Greece and Cyprus.
Thus, using the single nucleotide polymorphism (SNP) array technology, our objectives were: (i) to obtain unbiased estimations of the neutral genetic diversity of the Greek cattle populations, which represents the first comprehensive genome-wide analysis for these populations; (ii) to evaluate within-breeds'/ populations' different sources of genetic variance as well as their distinctiveness level; (iii) to predict recent admixture patterns of the highly selected and competitive breeds with the non-selected and heterogeneous indigenous breeds from Greece and Cyprus; (iv) to predict historical admixture patterns in cattle breeds in Greece and their expansion route towards the southern foothills of the Alps; and (v) to build an objective basis for the implementation of conservation programs for breeder's associations, and national and international bodies, as a solution towards the uncontrolled mating and outcrossing of certain rare breeds under high risk of extinction.
(KEA; n = 97), Agathonisi cattle (AGT; n = 6), Crete cattle (CRT; n = 11), Kastelorizo cattle (KAS; n = 4), Nisyros cattle (NSY; n = 7) and Cyprus cattle (CYP; n = 5). A detailed description of the local cattle breeds studied here is provided in Additional file 2 and Additional file 1: Table S2. The samples collected in this study were complemented by whole-genome genotypes for GRB (n = 19) and CYP (n = 9) reported by Flori et al. [34] and SYK cattle (n = 5) reported by Verdugo et al. [35] (see Additional file 1: Table S1). In addition, for comparison purposes, we included genetic information of 104 international breeds, based on genetic, historical and geographical criteria. More precisely the large dataset of the aboveselected breeds fell into eight main geographic groups (Minor Asia, South East Europe, East Podolian, Tyrrhenian (Apennin-Sicily-Sardinia-Corse), Alpine, France, Fig. 1 Origin of the breeds used in the analyzed dataset. Special square marks represent the influence of East-Podolian (grey), Alpine (green) and North-West (olive green) groups Iberian and North West European breeds) plus one outgroup that included Gir (GIR), Yak (YAK), and N'Dama (NDA). We included all the above breeds in our study because the geographical origin of some of them have an obvious proximity to Greece and Cyprus or because they have more or less affected the genetic pool of local breeds from Greece and Cyprus through long-term crossbreeding events during the past [36] and possibly till present. For example, the long-term crossing of the indigenous short horned (GRB) cattle and some breeds from the Alpine group (e.g. OBV, BBV and TGV) led to the formation of the KEA breed. However, the GRB and the short horned Buša cattle sampled in the neighboring Balkan countries (South East European group) probably share the same origin, which is why we sampled 18 additional Buša cattle individuals from North Macedonia (see Additional file 1: Table S1). The KTR and SYK breeds are hypothesized to share ancestry with the East Podolian steppe geographic cattle group. The AGT and NSY breeds are hypothesized to share ancestry with some strains of Podolian steppe and Anatolian origin, whereas CYP and KAS might share Anatolian and zebu origins. In addition to the Bos taurus breeds that come from Europe and Minor Asia, in our phylogenetic analyses, we used as outgroups YAK from Mongolia, NDA from West Africa representing African Bos taurus and GIR cattle originating from India but bred in Brazil, representing Bos indicus cattle. These 115 breeds and 10 geographical breed groups used in our analyses are described in Additional file 1: Table S1. We believe that the best way to display the grouping of breeds is by geographical origin (color code) without taking previous genetic knowledge into account, and to improve visualization, we added a symbol for some breeds to indicate possible or known genetic similarities with other breeds or groups. More specifically, these symbols combine the color of the group from which they originate geographically with the color of the group with which they have some genetic similarity. Thus, the breeds BURL (from the Tyrrhenian group), and PUST and PIN (from the Alpine group) display an additional symbol with a color that indicates a known influence from the North West group, the breeds KEA (from Greece), and CABA, AGER, and SBRU (from the Tyrrhenian group) display an additional color that indicates influence from the Alpine group, and some local breeds from Italy (RMG, MCH, CALV, CHI, MARE, and PODO) display an additional color that indicates possible podolian influence [37].

DNA isolation and SNPs
DNA isolation was performed using a commercial kit (QIAamp DNA MiniKit, QIAGEN) according to the manufacturer's instructions. For SNP genotyping, the Illumina BovineSNP50 BeadChip array was used following standard procedures (http://www.illum ina.com). Furthermore, quality controls were applied to obtain a high-quality dataset: (i) only the animals with a call rate higher than 0.95 were included; (ii) SNPs that mapped to unknown or sex chromosomes were removed; (iii) SNPs that were genotyped for less than 90% of the samples were removed; (iv) SNPs with a MAF lower than 0.025 and that deviated from Hardy-Weinberg equilibrium within breed (P ≤0.01) were removed. The dataset, at this stage, consisted of 46,678 autosomal SNPs genotyped in 3457 individuals.

Haplotyping and unified additive relationships (UAR)
The Beagle software package (v 5.0) was used for imputation of missing genotypes [38] and haplotype phasing [39]. To improve the efficiency of phasing and imputation, we considered genotyping data of all available bovine animals in the in-house database, which includes a large number of pairs and trios from other projects. Genome-wide relationships between individuals were estimated using a unified additive relationship (UAR) matrix [40] implemented in the R package snpReady [41] and applied to 46,678 SNP genotypes of 3457 animals.
Diversity, phylogeny and population structure analyses require samples of representative and least related animals in each breed. To retain the most representative animals, we excluded outlier individuals (i.e. erroneously sampled and/or admixed animals) by multivariate outlier analysis [42], and then decreased the level of relationship by successive exclusion of highly related animals. Both the multivariate outlier analysis and the reduction of familial structures within breeds rely on the genomewide additive genetic relationships stored in the UAR matrix. Finally, the dataset used in subsequent diversity and phylogenetic analyses included the 2858 most representative and unrelated animals. The starting and optimized sample sizes for each breed are listed in columns N and Nd (see Additional file 1: Table S1).

Haplotype diversity
To design the Illumina Bovine SNP50 BeadChip (Illumina), only five taurine breeds and one indicine breed were considered [43]. Therefore, since this array could contain biased sets of pre-ascertained SNPs, we adopted a 4-SNP-block approach as described previously to reduce this ascertainment bias [13,14]. Specifically, 4-SNP blocks (haplotypes) that spanned less than 150 kb and had an inter-marker distance shorter than than 50 kb were defined, leading to a compromise between the maximum number of SNPs and the minimum recombination probability within the block [14]. In total, 5756 SNP blocks were taken into account as multi-allelic markers and their haplotypes as alleles in the subsequent unbiased allelic diversity and heterozygosity analyses. Hereafter, SNP blocks are also referred to as multi-allelic markers.

Genetic diversity
The following population genetic parameters were estimated: total number of alleles (nA), mean number of alleles per block (mA), allelic richness (AR), observed (H O ) and expected heterozygosity (H E ), number of private alleles (npA; alleles observed only in one population), frequency of private alleles (fpA) and number of common alleles (ncA; observed in all subpopulations) as described previously [13,44,45]. In addition, we used the number of semi-private alleles (nspA), defined as the alleles observed in two populations only, to further assess the allelic diversity and to consider any genetic and geographical closeness of the included breed pairs. Inbreeding coefficients of each animal i (F i ) were estimated from the diagonal elements of the UAR matrix as F i = UAR (i,i) − 1. To improve the presentation and discussion of the summary statistics related to diversity, we standardized and then plotted these statistics onto a map with a tessellated projection using the R-script available with the package Tess (http://membr es-timc.imag.fr/Olivi er.Franc ois/TESS_Plot.html).
To assess the subpopulation-differentiation, we used the D EST estimator, which is analogous to the classical G ST for multi-allelic loci but is unbiased and more suitable when the level of gene diversity is high [46]. In this approach, we used the dataset of the genotypes for 5756 multi-allelic SNP blocks in 115 breeds.

Past effective population size based on linkage disequilibrium
We applied a linkage disequilibrium (LD) based method as implemented in the SneP tool [47] to estimate the historical and recent effective population size (Ne) for all the cattle breeds with sample size larger than 8. The estimation was performed on the SNPs with minimum and maximum distances equal to 20,000 and 10,000,000 bp, respectively, and by applying a recombination rate correction [48] and a sample size correction. The most recent effective population is represented by Ne 5 , (i.e. five generations ago), the effective population size in preindustrial times (i.e. 50 generations or 250 years ago) is represented by Ne 50 , and in times close to domestication (10,000 years ago) by Ne 2000 . To improve the presentation and discussion of the effective population size across time and space, we standardized and then plotted Ne 5 , Ne 50 and Ne 2000 onto a map with a tessellated projection using the R-package Tess as described above.

Phylogeny and population structure
We applied four phylogenetic and population structure analyses to infer relationships between animals and breeds. Two of these analyses rely on bi-allelic SNP genotypes and two on multi-allelic SNP-block genotypes. In addition, two of these analyses represent supervised clustering and two represent unsupervised clustering.

Supervised phylogeny of 115 cattle breeds
To elucidate the phylogenetic relationships among the studied populations, we implemented maximum likelihood (ML) methods using the TreeMix program [49]. In this approach, we used the dataset of SNPs genotyped in 115 breeds (Table 1), and set YAK as outgroup to root the tree. The second supervised approach used the allele frequencies of 5756 haplotype blocks to estimate the Nei's unbiased D A -distances [50]. Then, the D A -distance matrix of 115 breeds was used to reconstruct the neighbor-net with the SplitsTree4 software and the neighborjoining tree with the FigTree 1.4 software (http://tree.bio. ed.ac.uk/softw are/figtr ee/) [51]. We set YAK as outgroup to root the neighbor-joining tree.

Unsupervised population structure analyses
Similar to the above analysis, first we examined the population structure based on SNP genotypes. For this purpose, we conducted clustering with the program Admixture 1.23 [52] under the assumption that the number of clusters is equal to K, with K ranging from 1 to 115, i.e. the number of breeds plus 1. Since the admixture analysis does not need an outgroup, we used the dataset without YAK. To assess the quality of clustering and thus infer the most likely K, we performed 10 cross-validations [53] and estimated the cross-validation error for each K. To illustrate the results of the admixture analyses, we used the Python package Pong [54]. The second unsupervised approach used the proportion of genomewide shared SNP-block alleles (PS) among all pairs of 2858 animals. The PS matrix was then converted into an allele sharing distance matrix (D PS ) as described by Bowcock et al. [55]. Multidimensional scaling (MDS) [56] was used to project the multidimensional D PS distance matrix onto a two-dimensional (2D) plane for the 115 breeds. For a better illustration and interpretation of the MDS results, for each breed, we calculated the mean MDScoordinates of all the individuals, which correspond to the center of each breed symbol (circle), and then we estimated the standard deviation (SD) around that center. Specifically, for each breed, we calculated the spatial distance of each individual to the group center by applying the Pythagorean theorem, assuming that the hypotenuse is the distance between the center of the breed symbol      Then, we estimated the SD of these spatial distances. Finally, the SD was used as the radius around the breed center symbol, as a proxy for spatial dispersion of animals of each breed. For visualization purposes, plot dimensions were proportionally adjusted in R, considering a 1-inch (= 0.254 cm) length as the longest radius.
We calculated the mean D PS within breed to show the level of breed differentiation. Mean D PS values were standardized and then plotted as a tessellated projection onto a map, using the R-package Tess as described above.

D-statistics analysis
To investigate the historical admixture between taurine and indicine cattle, the D-statistics [57] was computed by using the qpDstats tool of the AdmixTool software package [58]. In brief, for a set of three populations P1, P2 and P3, and an outgroup O that fits to this phylogeny: (((P1,P2),P3).O), the numbers of shared alleles between P1 and P3 (BABA) and, P2 and P3 (ABBA) are calculated by assuming that allele "A" represents the ancestral allele and allele "B" the derived allele. The significant excess of either "ABBA" or "BABA" indicates admixture between populations P2 and P3, or P1 and P3, respectively. We used YAK as outgroup O and Bos indicus (GIR) as P3. We selected Highland cattle that originate from the most northwestern part of Europe (Scotland) as P2 and, thus as a Bos taurus breed with the lowest, if any, admixture level with Bos indicus. All other cattle breeds were tested as P1. To present the gradient of Bos indicus genes in European taurine cattle, we standardized and then plotted the D-values onto a map with a tessellated projection using the R-package Tess as described above. The D-values with Z > |3| were considered as significant and indicated on the map.

Genetic diversity
In total, 590 common alleles were detected among the 115 breeds studied, which represents only 0.7% of the total number (80,720) of alleles. All estimators of genetic diversity for the breeds studied here were highly differentiated among the predefined geographical groups ( Table 1). The geographic group of Minor Asia displays the highest average value for almost all the estimators of allelic diversity used, i.e. for: total number of alleles (nA), number of alleles per haplotype block (mA), number of private (npA), number of semi-private alleles (nspA) and allelic richness (AR) ( Table 1). The heterozygosity estimates based on multi-allelic SNP blocks and on biallelic SNPs are highest for the South East European Buša breeds. The only diversity parameter, which indicates a higher diversity in the central Europe group than in the Minor Asian group, is the observed heterozygosity estimator based on bi-allelic SNPs (H O[SNP] ) ( Table 1 and Fig. 2). Intermediate values were found for the breeds in the Greek and Cyprus group for mean number of observed alleles ( nA = 37,879), mean number of private ( npA = 58.6) and semi-private alleles ( nspA = 82.6). The Greek and Cyprus group showed the highest mean inbreeding coefficient ( F = 0.178) and relatively low values for: mean allelic richness ( AR = 3.54), mean observed ( H O = 0.656) and mean expected heterozygosity ( H E = 0.641). Within the Greek and Cyprus cattle breeds, CRT exhibited the lowest values for all genetic diversity parameters and, at the same time, the highest frequency of private alleles (fpA = 0.376) as well as the highest average inbreeding coefficient (F = 0.457). For GRB, most of the diversity estimates had the highest values but the frequency of private alleles (fpA = 0.021) and the inbreeding coefficient (F = 0.108) were low. Generally, all the analyzed island populations except the KEA breed, which was sampled on the island of Kea and on mainland, had very high levels of inbreeding and very low diversity parameters (Table 1). GRB and the South East Europe Buša cattle shared similar values for almost all diversity parameters. The tessellated projection of diversity statistics provided strong support for a high allelic diversity in breeds from Anatolia and part of South East Europe. Based on Additional file 3: Figure S1, a southeast to northwest gradient of genetic diversity could be inferred, but it is interrupted by the genetic diversity parameters of the Greek island breeds. However, if the Greek island breeds are excluded, this possible southeast to northwest gradient of genetic diversity remains consistent (see Additional file 3: Figure S1). To illustrate a possible ascertainment bias of the SNP chip data, we present the tessellated projection of the observed heterozygosity estimations based on multi-allelic blocks (H O ) and bi-allelic SNPs (H O[SNP] ) side-by-side in Fig. 2 and H O are estimators of the true diversity. Therefore, the diversity of the breeds placed above the overall trend line (e.g. North West Europe) is overestimated by H O [SNP] and the diversity of the breeds placed below this line (e.g. Minor Asia) is underestimated by H O [SNP] .

Effective population size Ne
The weighted mean of the current effective population size (Ne 5 ) is relatively small and ranges from 28 in   Figure S2). The tessellated projections provide strong support to the observation that as we go back in time the effective population size increases in regions closer to the domestication center and decreases in North West Europe (see Additional file 4: Figure S2). As Ne is inversely correlated to the extent of LD, our results suggest that the level of LD is high in the fragmented breeds under extinction pressure, which is most probably caused by uncontrolled inbreeding.

Genetic differentiation
Pairwise population differentiation values as estimated by the unbiased estimator developed for multi-allelic markers, D EST , are in Additional file 5:  Within the Greek and Cyprus group, low pairwise D EST values were obtained between two mainland breeds i.e. GRB-SYK (D EST = 0.073) and high values were obtained between two island breeds i.e. CRT-AGT (D EST = 0.413). It is remarkable that the CRT breed showed the highest differentiation level among all the investigated breeds ( D EST = 0.391). Again, the GRB and Buša breeds shared comparable D EST values.
In addition, we used average allele sharing distance (D PS ) between animals within a breed as a measure of the breed-level differentiation. As shown in Additional file 6: Figure S3, the highest D PS values were observed for most of the Buša and Anatolian breeds together with the Greek GRB and the Italian PODO breeds, whereas the lowest values were observed for the Greek island breeds.

Genetic distances and phylogenetic neighbor net
We estimated Nei's genetic distance D A based on multiallelic SNP blocks and present the values as a neighbornet (Fig. 3) and as a neighbor-joining tree routed by YAK (see Additional file 7: Figure S4). In Fig. 3 to improve the visibility of the main part of the tree, we shortened the branch length for YAK. Among the Greek and Cyprus group, the island breeds CYP, AGT and KAS were placed close to Bos indicus. This is also the case for the Anatolian breeds, which form a cluster with the aforementioned breeds. Interestingly, the GRB breed from the mainland Greek group, is positioned within the cluster of the Buša breeds with short branches. On the opposite, for the island breeds CRT, AGT, KAS and NSY, long branches result from a high inbreeding level. KTR and SYK, which represent Greek podolian cattle, are placed between the TRG and Italian podolic breeds. KEA is the only Greek breed that clusters together with some breeds from the Alpine region, as a result of crossbreeding that occurred between indigenous Greek cattle and some Alpine cattle breeds. This cluster includes also some Italian breeds with Brown-Swiss influence (CABA, AGER and SBRU). In addition, we used the allele counts of bi-allelic SNPs and reconstructed the phylogeny with the TreeMix program (Fig. 4) by using YAK to root the tree. Compared to other European cattle breeds, CYP, KAS and AGT were closer to the root of the tree and thus, closer to GIR, the Bos indicus representative. The aforementioned Greek breeds are gradually followed by the cattle breeds from Minor Asia, Greece, Bulgaria, North Macedonia, Kosovo and Serbia. In agreement with the neighbor-net, KEA is the only Greek breed that clusters in the Alpine cluster.
The remaining Buša breeds sampled along the Ionian-Adriatic route, i.e. Albania, Montenegro and Dalmatia, were placed after two clusters of Italian breeds. Our phylogenetic analyses, both with multi-allelic (Fig. 3) and bi-allelic markers (Fig. 4), do not aggregate the so-called Podolian or gray steppe cattle breeds (TRG, KTR, SYK, HRI, HRP, and UKP) in a single separate cluster, instead they are scattered along the   On the other hand, HRI, HRP and UKP do not form an own cluster but are placed among some of the Buša neighbor breeds (Fig. 4). The Italian-podolian breeds form a separate cluster, which is placed between some of the Buša and other Italian breeds and the East podolian breeds. Finally, in the phylogenetic tree reconstructed based on SNP allele counts (TreeMix), for some samples with a high inbreeding level (e.g. CRT and BHB) long branches are observed whereas for less differentiated and highly diverse breeds (e.g. TRG and RMB) short or no branches are found.

Assessment of population structure using unsupervised heuristic and unsupervised model-based methods
We used multi-allelic SNP blocks to estimate the allele sharing distance matrix (D PS ) among 2858 animals and projected these by multidimensional scaling (MDS) on a two-dimensional (2D) plane. The MDS projection of 115 breeds is shown in Fig. 5. MOSI, and RSIC) including all Italian podolic breeds (PODO, MARE, RMG, CALV, CHI, and MCH). CRT is isolated from the other two main Greek breed clusters. KEA is positioned in the geographic region of some Alpine and Italian breeds, showing a closer relationship to these breeds than to its own (Greek) geographic cluster. The second dimension of MDS (MDS2) separates the breeds of North Europe and Alpine geographic area from the breeds of central, west and southern Europe. The Admixture analysis presents the second unsupervised clustering method applied to 2832 animals (26 Mongolian yak excluded) using the genotypes of bi-allelic SNPs. The lowest cross-validation error (cv error = 0.536) was determined at K = 78. Figure 6 presents the clustering at K = 4, 10, 24 and 78. The reasons for choosing K = 78 are that: (i) with a K larger than 4, animals from highly selected breeds start to separate into distinct clusters; (ii) the clusters at K = 10 represent the number of pre-defined geographical breed groups plus 1, i.e. nine groups of European Bos taurus plus NDA and GIR; (iii) the cross-validation error decreases almost linearly from K = 1 to K = 24; and (iv) the clustering with the lowest cross-validation error is at K = 78. Additional file 8: Figure S5 presents the Admixture results with the default color assignment performed by the Pong package for K = 2 till 26. All breeds from Minor Asia and most of the South East geographic groups, as well as GRB, showed a high level of complex admixture even at K = 78. All highly selected breeds and isolated breeds formed their own cluster. GRB and KTR, and SYK to a lesser degree, shared the same pattern with most of the breeds belonging to the South East group with evidence of shared ancestry between these and the Anatolian breeds. The highly inbred and differentiated island breeds assigned to separate clusters. The same was also observed for smallsized mainland populations, such as PRG and ROG. At K = 24, a significant proportion of the Alpine ancestry (OBV and BBV) was estimated for KEA, which is also retained at the most probable true K = 78.

D-statistics analysis
The historical admixture between taurine and indicine cattle was confirmed for most of the cattle breeds that originate from Anatolia, Cyprus, Greek and South East Europe and most of the Central and South Tyrrhenian breeds. The KEA, AGER and SBRU breeds, which are influenced by the Alpine group, as well as the MPIS and SARD breeds deviate from the above described general geographic trend. In Fig. 7, the values of all these significant D-statistics (Z < |3|) are shown with a red spot on a tessellated map to highlight the gradient of the Bos indicus introgression from the Southeast to the Northwest direction. The results provide strong evidence that support the influence of Bos indicus along the Balkan continental route and along the Mediterranean route up to North Italy.

Discussion
In this study, we focused on the genome-wide structure of 11 indigenous cattle populations, 10 from Greece and one from Cyprus. The dataset included all the indigenous cattle populations that are reared in Greece and Cyprus and considered to be under constant risk mainly since the 1970s and onwards. An essential reason for this situation is the increased use of sires from the so-called cosmopolitan breeds mostly from the mainland. Sampling included as many individuals as possible from the whole Greek territory (Fig. 1). Some island indigenous breeds (CYP, AGT, KAS, and NSY) were represented by very small sample numbers (Table 1). However, due to the aforementioned limited population sizes and limitations regarding the herds (i.e. highly related, almost feral and difficult to handle), it was not realistic to consider that the number of samples can be increased. For the remaining Greek breeds, a sufficient number of 10 or more of the most representative individuals of the local breeds were sampled. Therefore, this study represents the most complete and up-to-date dataset available for the indigenous Greek cattle and integrated into the most complete Minor Asia and European sample collection. Improving our knowledge of the genetic diversity within and among local breeds is an issue of crucial importance for implementing further conservation programs that are necessary for sustainable development and future animal farming in changing environments [34]. This could be particularly important for traditional unselected breeds that cover a geographical area close to the domestication center [32].
In spite of the ascertainment bias of the BovineSNP50 chip data [13,43], it was highly informative for the analyzed Greek and Cyprus cattle populations as reflected by the consistent values obtained for various parameters of genetic diversity levels. These parameters indicated a more profound loss of genetic diversity as well as a higher risk of inbreeding in the Greek island populations than the Greek mainland breeds. Experimental studies [59] have demonstrated that a higher level of nucleotide diversity is associated with a stronger selection response under stressful conditions. Furthermore, Vilas et al. [60] reported that the high adaptive potential of a population is better indicated by the allelic diversity of neutral markers than by expected heterozygosity. Here, we investigated cattle breeds, whose geographical origin ranged from the cattle domestication center to the most western and the most northern parts of Europe. The overall diversity estimators suggest higher allelic diversity in breeds from the region of South East Europe up to Anatolia. However, this general trend is interrupted by some highly fragmented and inbred Greek cattle breeds (see Additional file 3: Figure S1), especially by island breeds such as CRT, AGT, NSY and KAS which are represented by samples of the last remaining indigenous animals. In complete accordance with the strong effect of genetic drift, the corresponding average frequency of private alleles (fpA; Table 1) reached the highest value in the Greek island breeds. Comparable high fpA-values were observed only in genetically isolated island breeds such as MNRQ and MALL (Table 1). We observed a general trend that characterizes the fragmented breeds at risk of extinction, i.e. that they are highly inbred with a small number of high-frequency private alleles. This is reflected in the high positive correlation between F and fpA (r (F, fpA) = 0.72) and low negative correlation between F and npA (r (F, npA) = − 0.21). As outlined in the Background section, all indigenous breeds of South East Europe, Greece, Cyprus, and Anatolia are either under weak artificial selection or not under any kind of coordinated artificial selection. However, there are substantial differences with respect to isolation. The estimated D EST levels (see Additional file 5: Table S4) suggest a substantial genetic differentiation of the Greek and Cyprus group compared to the remaining groups. This high differentiation is attributed to the genetic drift and inbreeding in the highly fragmented and naturally isolated island breeds (CRT, AGT, NSY, and KAS) and is confirmed by the low differentiation between animals within these breeds (see Additional file 6: Figure S3).
However, empirical studies [59] provide evidence that, rather than inbreeding, genetic drift led to a reduction in diversity in comparable scenarios. Moreover, purging in stressful environments (i.e. natural selection) can maintain a higher diversity level than expected by inbreeding.
In general, each island of the Aegean represents a different entity with a variety of environmental and socioeconomic factors that affect the livestock populations [61].
However, the cattle populations from the Greek islands have constantly suffered from declining population sizes during recent decades. In the case of the CRT breed, even 60 years ago, the statistical data for livestock numbers of indigenous cattle in Crete identified only 149 of the 1954 (7.6%) recorded animals as indigenous or local, which is a rather small percentage [62]. This statistic also documents a very small number of animals for such a large island (one cow per 4.23 km 2 ). For the whole territory of the Dodecanese islands, on which the AGT, NSY, and KAS breeds are raised, the same source identified 2434 of the 6210 recorded animals (39%) as local. Today, the population size of all the above-mentioned cattle breeds is very small (see Additional file 1: Table S3 and Additional file 2). In addition to small population sizes, genetic drift is strengthened by geographical isolation that further hampers animal exchange. Thus, in such cases, inbreeding and genetic drift are unavoidable factors that shape the diversity of these populations. More generally, there is an opposite trend between the economic significance of cattle and of small ruminants when the regions along the line from the Aegean islands toward the Greek mainland, Balkan and Middle Europe are compared.
In contrast to the island breeds, the lowest levels of genetic differentiation (D EST ) were observed for the breed groups from Minor Asia and South East Europe and for some mainland Greek breeds [e.g. GRB, PRG, SYK, and KTR (see Additional file 5: Table S4)]. This low differentiation is accompanied by a high level of allelic diversity including a large number of low-frequency private and semiprivate alleles. Low differentiation and high diversity are probably the consequences of a low artificial selection pressure combined with low genetic drift in effectively large (Table 1) and less isolated populations.
All supervised clustering methods applied in this study clearly reveal the level of genetic differentiation. The highly differentiated breeds show long branches in the neighbor network (Fig. 3), neighbor joining tree (see Additional file 7: Figure S4) and maximum likelihood tree (Fig. 4). This is independent of the evolutionary sources of the differentiation, i.e. the artificially selected and artificially isolated breeds (e.g. beef breeds from the UK) and the naturally isolated and naturally selected breeds (e.g. Greek island breeds) show very long branches. However, breeds that are under strong random sampling (island breeds) show even longer branches especially in the ML analysis. On the contrary, for local breeds with a low differentiation and high diversity level, the three methods reveal short branches or even no branches (Figs. 3 and 4; see Additional file 7: Figure S4). The unsupervised clustering methods provide a partly different tale. Most of the animals from the artificially unselected breeds of South East Europe, Greece, Cyprus and Anatolia remain unclustered with the admixture approach, even at very high K-values ( Fig. 6 and Additional file 8: Figure S5). The CRT breed which has the highest inbreeding level (0.457) among the 114 breeds investigated here was the first unselected breed that was clearly clustered with the admixture approach at K = 7. The K-value of 78 is associated with the smallest cross-validation error and should represent the true number of clusters in our design. Among the aforementioned unselected breeds, we observed clear clustering for the island breeds and the most inbred Buša breed BHB only, which are characterized by high inbreeding. We observed a similar trend regarding clustering for the Greek KTR, SYK, ROG and PRG populations but in different proportions, which reflects their isolation status (see Additional file 2 and Fig. 1). The remaining unselected breeds and/or animals remained as an unresolved mixture even at the K-value with the lowest cross-validation error.
The MDS projection places almost all the unselected breeds from Minor Asia, Greece and Cyprus, and geographic breed groups from South East Europe and East Podolian, in the overlapping space with suggestive trends (Fig. 5), i.e. Anatolian and Cyprus breeds and the Greek island populations KAS and AGT close to the root of the tree and to GIR. The remaining breeds that were sampled in Greece were placed among the above-mentioned island breeds, the South East European Buša and some Italian local breeds. Thereby the breeds that are geographic neighbor of each other overlap in the MDS presentation. Only the KEA breed, which is the historical product of a cross between indigenous animals similar to the modern-day breeds of Brachyceros and some Alpine breeds (OBV, BBV and TGV), is placed between the Alpine and Tyrrhenian breeds, which is also confirmed by the admixture results (K = 8 to 26; see Additional file 8: Figure S5).
Estimating the LD-based effective population size (see Additional file 4: Figure S2) for different time intervals during the evolution of cattle provides an interesting insight into their demographic history. For example, the pattern of Ne at the time of domestication (~ 2000 generations ago) suggests that compared to western European breeds, modern South East European cattle breeds had larger founder population sizes as a result of their proximity to the center of domestication. In fact, up to ~ 50 generations ago, cattle breeds from southeastern Europe had larger Ne than those from northwestern Europe. However, the current Ne indicates that the population size of the South East European cattle breeds such as Buša and GRB, has decreased substantially in the last 50 generations or so. Furthermore, this observation implies that such populations have accumulated a small number of common but long haplotypes, which contribute to the high LD. The industrial revolution, modern breeding practices and changes in customers' preference have led to the replacement of local cattle breeds with commercial cattle breeds, which mostly originate from northwestern Europe. This factor, coupled with uncontrolled breeding in the remaining fragmented populations due to the absence of modern reproduction and breeding management practices contributed to population shrinkage of most of the previously highly diverse cattle breeds from southeastern Europe.
Based on Fig. 5 and Additional file 8: Figure S5, and on the results of the previous studies (e.g. [34,63]), we assume that introgression of indicine ancestry into Anatolian and some Mediterranean cattle occurred. To test this putative indicine introgression, which could have occurred along the migration route from Anatolia, Greece and South East Europe to the southern foothills of the Alps and North West Europe, we calculated the D-statistics. For all the breeds from these areas (Minor Asia, Greece and Cyprus, South East Europe and East Podolian) except KEA and also all the Italian Podolian breeds and some other breeds from South and central Italy, we obtained significant D-statistic values (Fig. 7). The first breed from the southern Alps for which no significant Bos indicus influence was found is SIC (Z < |3|), with a D-value close to zero. The D-values clearly decrease as the spatial distance to the origin of Bos indicus increases. Interestingly, three breeds from the East Podolian group also show a significant indicine introgression, among which UKP that was sampled east of the Carpate Mountains. As discussed recently by Verdugo et al. [35], indicine introgression started ~ 4000 years ago and may have been stimulated by the onset of a period of increased aridity known as the 4.2-thousand-year abrupt climate change event. The increased level of diversity in the breeds from South East Europe, and in the Anatolian and some Greek cattle breeds could be, also, the result of various demographic events, including the introgression of Bos indicus alleles. It should be noted that we cannot distinguish the proportion of diversity caused by introgression from that caused by other evolutionary forces, e.g. low pressure of artificial selection. We also note that some estimators of allelic diversity, such as the number of private and semi-private alleles measures the proportion of unique alleles not present even in GIR or other neighbor breeds. The fact that we sampled and genotyped some of the close neighboring breeds intrinsically reduced the number of private and semi-private alleles. However, although GIR and many neighbor breeds from South East Europe and Greece were sampled (Fig. 1), the numbers of private and semi-private alleles remained larger than for breeds from other parts of Europe.
Conservation of genetic diversity for sustainable development must be understood as a global long-term task. The repeatedly confirmed evolutionary trade-off hypothesis [59,64] suggests that increased fitness in the environment of selection is accompanied by a decrease in fitness in other environments. This trade-off is applied for high-performance breeds that are adapted to benign (temperate) environments and for breeds that are adapted for production in stressful environments. In spite of the above-mentioned trade-off, replacement of well-adapted local breeds by highperformance breeds and/or animals from their crosses is forced by the free-market mechanism that intrinsically causes the extinction of local breeds [65]. This replacement process has either already reached the terminal stage or is close to reaching it in many regions [14]. If we consider animal diversity as a globally shared-resource and as a prerequisite for sustainable development in the changing environments, then gradual depletion of neutral diversity present in local breeds under soft selection pressure is a kind of "tragedy of the commons" [65,66]. Therefore, conservation breeding programs must be seen as a regulated long-term exploitation of common resources.
As already discussed above, most probably, because of an abrupt climate change event, farmers started to cross taurine cattle with Bos indicus [4,9] during the early Bronze Age. Such crosses increased the already high level of diversity of local soft-selected breeds. The subsequent long-term adaptation to local environments shaped the bovine mosaic genomes with an unknown but low proportion of indicine alleles in modern-day cattle breeds from Anatolia up to the southern foothills of the Alps. Therefore, conservation of a high diversity level in these partly fragmented breeds could offer valuable genetic resources for future human needs and future abrupt or gradual climate-change events.

Conclusions
The phylogenetic patterns derived from genome-wide information were quite consistent with the geographic positions and the historical information regarding crossbreeding between Greek and Cyprus cattle. All investigated Cyprus and Greek breeds present a complex mosaic genome of historical and recent admixture events between neighbor and well-separated breeds. While the contribution of some mainland breeds to the pool of genetic alleles seems important, some island and fragmented mainland cattle strains suffer from a severe decline in population size and a loss of alleles due to strong bottlenecks and genetic drift. However, in spite of a markedly reduced genetic diversity level in most island breeds, these show high fertility and longevity in stressful environments. Conservation programs that are a compromise between what is feasible and what is desirable should focus not only on the still highly diverse mainland breeds but also promote and explore the conservation possibilities for island breeds.