Molecular phylogeny of Salmo of the western Balkans, based upon multiple nuclear loci

Background Classification of species within the genus Salmo is still a matter of discussion due to their high level of diversity and to the low power of resolution of mitochondrial (mt)DNA-based phylogeny analyses that have been traditionally used in evolutionary studies of the genus. We apply a new marker system based on nuclear (n)DNA loci to present a novel view of the phylogeny of Salmo representatives and we compare it with the mtDNA-based phylogeny. Methods Twenty-two nDNA loci were sequenced for 76 individuals of the brown trout complex: Salmo trutta (Danubian, Atlantic, Adriatic, Mediterranean and Duero mtDNA lineages), Salmo marmoratus (marble trout), Salmo obtusirostris (softmouth trout), and Salmo ohridanus (Ohrid belvica or belushka). Sequences were phylogenetically analyzed using maximum-likelihood and Bayesian Inference methods. The divergence time of the major clades was estimated using the program BEAST. Results The existence of five genetic units i.e. S. salar, S. ohridanus, S. obtusirostris, S. marmoratus and the S. trutta complex, including its major phylogenetic lineages was confirmed. Contrary to previous observations, S. obtusirostris was found to be sister to the S. trutta complex and the S. marmoratus clade rather than to the S. ohridanus clade. Reticulate evolution of S. obtusirostris was confirmed and a time for its pre-glacial origin suggested. S. marmoratus was found to be a separate species as S. trutta and S. obtusirostris. Relationships among lineages within the S. trutta complex were weakly supported and remain largely unresolved. Conclusions Nuclear DNA-based results showed a fairly good match with the phylogeny of Salmo inferred from mtDNA analyses. The comparison of nDNA and mtDNA data revealed at least four cases of mitochondrial–nuclear DNA discordance observed that were all confined to the Adriatic basin of the Western Balkans. Together with the well-known extensive morphological and genetic variability of Balkan trouts, this observation highlights an interesting and variegated evolutionary history of Salmo in this area.


Background
Due to the high level of phenotypic diversity recorded in trout species, the classification of the genus Salmo is still a matter of discussion. According to a recent taxonomic evaluation [1], about 30 species have been identified, which is in marked contrast to the two-species view (e.g. [2]), which was held for many years and which recognized only S. trutta and S. salar (e.g. [3,4]). Genetically, representatives of the genus Salmo, with the exception of S. salar, are usually regarded as members of the Salmo trutta complex, which includes five mitochondrial (mt)DNA lineages named after the basins where they were first found: Danubian (DA), Atlantic (AT), Adriatic (AD) and Mediterranean (ME) [5], and the drainage Duero (DU; [6]). A separate lineage, marmoratus (MA), corresponds to the marble trout (S. marmoratus) sampled in the Po and Soča river drainages. Although a consensus has been reached on the existence of these different phylogeographic lineages, the relationships among the lineages are unclear and result in different interpretations of their evolution (e.g. [7][8][9][10]). Moreover, a number of studies have shown that some lineages are also naturally present outside of their described geographical range. For example, the AD and ME lineages are present respectively in the Iberian Peninsula [8] and in the Adriatic basin [11], while the AT lineage has been found in Sicily [12] and is also apparently naturally present in the Danubian drainage in northern Austria [13]. In addition, MA haplotypes have been detected in brown trout [10,11] and AD haplotypes in the southern population of marble trout, from the Neretva and Skadar (Shkodra in Albanian) basins [14].
From a taxonomic view, mtDNA lineages have not been validated, with the exception of marmoratus, despite the inconsistency reported for marble trout, which is regarded either as a separate species (e.g., [11,[15][16][17]) or a member of the S. trutta complex [7,18].
Besides S. salar and the S. trutta complex, there are two less well known but phenotypically highly distinct members of Salmo: S. ohridanus (Ohrid belvica; belushka in Albanian) and S. obtusirostris (softmouth trout), both of which have been ambiguously classified despite their distinct morphology, and their position within Salmonidae has been rearranged on numerous occasions (see Snoj et al. [19] for review). On the basis of molecular data, these two species have been positioned in the genus Salmo [20][21][22]. While S. ohridanus has turned out to be undisputedly sister to the S. trutta complex, the exact position of S. obtusirostris has yet to be resolved; depending on the molecular markers used and model of evolution applied, S. obtusirostris is assigned either to the S. trutta complex or as sister to S. ohridanus [19,23].
Although several other Salmo species have been described from the Balkans [1], the systematics of the majority of these species is still uncertain [24]. Therefore, in addition to S. trutta and S. salar, we will refer to the taxonomic classification of only those taxa for which separation has been previously described at the molecular level i.e. S. marmoratus, S. obtusirostris, S. ohridanus; for the other taxa, we use the designation brown trout Salmo trutta complex.
Some phylogeny studies of Salmo have also included nuclear (n)DNA single loci but were either subsequently reported to be phylogenetically non-informative due to selection pressure (transferrin [25,26]; ITS [27]) or screened in only a few Salmo taxa (lactate dehydrogenase LDH-C1* [19]; growth hormone [28]; ITS1 [20]) with no intention of resolving their phylogenetic position within the genus. Phylogeny studies using other sequences of nDNA in Salmo have not been undertaken and, therefore, no comprehensive nDNA information is available to verify the conclusions based on the analysis of the control region (CR) mtDNA.
The discrepancy that exists between gene trees and a species tree, and also between nDNA and mtDNA trees, is well known [29][30][31] and is especially problematic in closely related species or those with large population sizes [32], a situation commonly observed in Salmo. With the development of novel techniques, it is now becoming easier to collect data on multiple unlinked nuclear gene loci and multiple individuals per species [32][33][34]. In addition, new analytical methods have emerged to evaluate species trees, based either on concatenated datasets and previously used methods to construct phylogenies (maximum likelihood, parsimony, Bayesian analyses) or on the coalescent theory, which analyzes genetic loci individually and constructs a species tree from the results of independent analyses of individual loci [32].
In order to resolve phylogeographic and taxonomic inconsistencies in Salmo, predominantly in trouts of the Western Balkans, and to avoid misinterpretations due to possible discrepancies between gene and species trees, we have applied a new marker system based on a larger number of independent and neutral nDNA loci than that previously used and which are designed to distinguish purebred trouts from their hybrids [35].
DNA was isolated from fin clips conserved in 96% ethanol, using the high-salt extraction technique of Miller et al. [39].

Sequencing
Twenty-two nuclear regions were analyzed. Polymerase chain reaction (PCR) primers and conditions are described in Pustovrh et al. [36] and PCR were performed in 25 μL reaction mixtures according to [35]. Amplified DNA fragments were run on a 1.5% agarose gel and purified using the QIAEX II Gel Extraction Kit (QIAGEN). Approximately 100 ng of purified PCR product was used in cycle sequencing reactions following BigDye Terminator v3.1 Cycle Sequencing protocols (Applied Biosystems). The amplified, fluorescently labeled and terminated DNA was salt-precipitated and analyzed with an ABI 3130 XL Genetic Analyzer. All newly determined sequences were submitted to Genbank (see Table 2).

Alignment, data partitioning and phylogenetic analysis
Sequences of all 22 loci for each individual sample were combined in the same order as reported in Table 2 and aligned using the default parameters in CLUSTAL-W [40]. Genotypes of all loci in each of the analyzed samples are reported [See Additional file 1: Table S1].
All loci were tested for positive selection (HA: dN > dS) by the Nei-Gojobori method [41] using MEGA version 4 [42].  A phylogenetic tree based on all the individuals analyzed was constructed using maximum-likelihood (ML) and Bayesian Inference (BI) methods. ML was performed as implemented in GARLI Version 0.96b8 [43]. To avoid over partitioning and yet still effectively deal with heterogeneity, each locus was used as a criterion to define a partition (locus). Prior model selection for each partition was determined using the Bayesian information criterion (BIC) calculated in jMODELTEST v 3.06 [44] in conjunction with PAUP (models for each partition are listed in Table 2). For analysis, 2000 bootstrap replicates were carried out to identify the best partitioning scheme. Analysis was performed with the settings recommended by Zwickl [43] and the runs were set for an unlimited number of generations and automatic termination following 20,000 generations without a meaningful (ln L increase of 0.01) change in score.
For BI, MrBayes 3.1.2 [45] was used. Prior model selection for each partition was determined using the Akaike Information Criterion (AIC) calculated in MrModeltest 2.3 [46] in conjunction with PAUP (Table 2). Random starting trees were used and four Markov chains were run for one million generations: nucmodel = 4 by 4; nruns = 2; treesampling frequencies of 1 in 100 (10,000 trees saved). Convergence was assessed by examining the cumulative posterior probabilities of clades using the "Are We There Yet?" online program (AWTY; [47]).
To address inter-specific phylogenetic relationships within Salmo, the method for species tree reconstruction, implemented in *BEAST v1.7.2. [34], was also used. The program determines the likelihood of gene trees in a given species tree to find the most likely containing species tree for multiple genes sampled from individuals across a set of closely related species. It uses the Markov chain Monte Carlo approach to average over the tree space, so that each tree is weighted proportionally to its posterior probability. All settings were entered in BEAUti v1.7.0, a graphical software package that allows  Table 1. creation of BEAST XML input files. Each genetic region (locus) was partitioned using BIC calculated in jMO-DELTEST v 3.06 (Table 2). Time-measured phylogeny using the relaxed clock model [48] and tree model were set as unlinked for all partitions. All molecular clock estimates for the gene regions were examined using the uncorrelated lognormal model. The operational taxonomic units (OTU) that were compared corresponded with the clades defined on the basis of individual trees constructed in GARLI as described above (see Table 1). Using CIPRES Science Gateway V. 3.3 [49], a public resource for inference of large phylogenetic trees, the same XML file was run for seven independent simultaneous runs of 300 millions generations sampled once every 30,000 trees. After analyzing each run separately, the program Tracer v1.5 (distributed with BEAST) was used and the first 30 millions generations were discarded as burn-in samples. A summary tree was created for each run using TreeAnnotator v1.5.3 (distributed with BEAST). The separate log and tree files were combined using LogCombiner v1.5.3 (distributed with BEAST). The combined log file was analyzed in Tracer 1.5 to ensure that effective sample sizes were large enough. Combined trees were analyzed with TreeAnnotator 1.5.3 and a summary tree produced and viewed in FigTree version 1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/).
Each gene tree generated with BEAST was also analyzed and compared to the reference species tree. This was assessed by K tree score implemented in the program Ktreedist [50] that measures overall differences in the relative branch length and topology of two phylogenetic trees (two trees with very different relative branch lengths will get a high K score while two trees that follow a similar between-lineage rate variation will get a low K score, regardless of the overall rates in both trees; by comparing two trees you can choose the one that better follows the overall shape of a given reference tree [50].

Divergence date estimation
The time of divergence between major clades within Salmo since they last shared a common ancestor was estimated using the Relaxed Bayesian molecular clock model in BEAST v.1.7.2. The first prior was chosen to set the time to the most recent common ancestor (TMRCA) of S. ohridanus and S. trutta complex, which, on the basis of Cytb sequences was estimated to be 4 MYA (million years ago) [22] (prior was set to 4 MY (million years) with standard deviation (SD) = ±1 MY.) Another prior was chosen to set the time when the DA and AD-ME clades last shared a common ancestor. It is generally considered that the Mediterranean and Danubian basins separated about 700,000 years ago, and thus the prior was set to 0.7 MY with SD = ±0.2 MY [7,51]. All the other settings were the same as for the *BEAST analysis described above and median values and 95% highest posterior density intervals of the corresponding TMRCA were obtained. The results were analyzed with Tracer v 1.5.

Alignment and loci properties
Final alignment was made for 76 samples and 22 nuclear loci (Tables 1 and 2). Some loci could not be amplified in the out-group species (four in S. salar and 10 in P. perryi). When excluding the out-group species, the final alignment consisted of ca. 8000 bp with 234 variable sites, 196 of which were parsimony-informative. All DNA sequences have been deposited in GenBank (accessions numbers are in Table 2).
None of the loci analyzed proved to be under selection pressure according to the Nei-Gojobori method.
All the gene trees produced by *BEAST analysis were concordant and their topology and shape were similar to those of the reference species tree, as reflected by a low K score below 0.05 in the 22 comparisons (Table 2). Thus, none of the genes was excluded from the *BEAST input file and all gene trees were used to calculate the species tree.

Phylogenetic analysis
The topologies of the phylogenetic trees based on individuals using either the ML or BI method were very similar ( Figure 2) with two main branches: one branch corresponded to S. ohridanus and the other to a cluster of three groups of trouts that included northern and southern populations of marble trout and its dentex ecotype, softmouth trout, and a complex of brown trout lineages. Since these three groups split from the same internal node (polytomy), their evolutionary relationship could not be resolved.
The complex of brown trout lineages included two sister groups: one originating from the Atlantic basin and the other from the Danubian and Mediterranean basins. The split between these two groups was only moderately supported (MrBayes posterior probability (MrB PP) = 0.53; ML bootstrap value (ML BS) = 0.61). The Atlantic basin group was further split into two clades that corresponded to the AT and DU mtDNA phylogeographic lineages, while the Danubian and Mediterranean group formed two clades that corresponded to the DA and AD-ME mtDNA phylogeographic lineages. In the latter clade, AD and ME lineages could not be clearly distinguished i.e. individuals from the Zrmanja River (Adriatic basin) bearing AD haplotypes showed a sister relationship to individuals from Sardinia (Mediterranean lineage) bearing ME haplotypes.
The relationships among OTU shown on the phylogenetic tree based on species and lineages using the Bayesian *BEAST method ( Figure 3) were similar to those on the tree in Figure 2 but with a few specificities i.e. (1) instead of polytomy for marble trout, softmouth trout and brown trout, a split between softmouth trout and a group consisting of two sister clades (marble trout and brown trout) was observed, although with a low posterior probability in BEAST (BE PP = 0.82) and a high statistical support in ML (98%) and (2) within the brown trout complex, a split between individuals from the Atlantic and the Danubian and Mediterranean basins was supported with a high probability (BE PP = 0.93) like the split between individuals from the Adriatic basin and from Sardinia (BE PP = 1).

Divergence date estimates
Based on the output of *BEAST and with the two priors of 4 ±1 MYA for a split between S. trutta and S. ohridanus, and 0.7 ±0.2 MYA, when the DA and AD-ME clades last shared a common ancestor, as well as a relaxed clock, the time of the most recent common ancestor was estimated for each sister pair in Figure 3. Accordingly, the divergence date between Atlantic salmon and the trout complex was estimated in the late Miocene, between softmouth trout and the other trouts (brown and marble) in the late Pliocene, and between marble trout and the brown trout complex in the mid-Pleistocene about 1.5 MYA. For details on the divergence among brown trout and other intra-species, see Figure 3.

Phylogenetic relationship
Previously, phylogenetic analyses of the genus Salmo were based primarily on variations within CR mtDNA and in general, the resolution of the relationships among OTU was difficult. Furthermore, many examples confirm that a phylogeny based on mtDNA does not necessarily reflect the accepted phylogeny of the species. This is expected when dealing with species that have evolved recently as is the case of Salmo. For example, mtDNA analyses of S. obtusirostris salonitana (Jadro softmouth trout) and S. obtusirostris zetenzis (Zeta softmouth trout) have revealed haplotypes that are typical of the brown trout AD lineage rather than those that are normally present in softmouth trout. It is only after microsatellite analysis that mtDNA capture (reticulate evolution) was shown to be involved in both populations ( [52]; unpublished data). Whereas mtDNA describes only the maternal history of populations, nuclear markers always reflect maternal and paternal gene flow and allele histories, and are therefore more appropriate to reconstruct species tree. By using a new system based on 22 nuclear loci, we have improved the phylogeny of Salmo and checked its match with the CR mtDNA-based phylogeny.
Overall, our results show a fairly good match with the phylogeny of Salmo inferred from CR mtDNA, since several identical genetically homogeneous groups are found with both approaches. We confirm the existence of at least five distinct genetic units: Atlantic salmon, Ohrid belvica, softmouth trout, marble trout and the brown trout complex, all of which can be distinguished from each other. However, our results do not support the sister relationship between Ohrid belvica and softmouth trout as previously hypothesized based on external morphology [53] and mtDNA genetics [19]. Instead, our results point to a greater affinity of softmouth trout with the brown trout complex and marble trout. However, since these three species originate from one node, their relationships remain unresolved (Figures 2 and 3). We found that softmouth trout from the rivers Jadro and Zeta, despite having Adriatic haplotypes, cluster together with other softmouth trout in the same clade, which confirms the previously reported suggestion of reticulate evolution for this species [52].
The fact that softmouth trout is a spring spawner suggests that it evolved prior to the Pleistocene as a consequence of an evolutionary conservatism related to water temperature during spawning (contrary to glacial species such as brown and marble trout, which require colder water and thus spawn in winter; for details, see Karaman [54]). A pre-glacial origin for softmouth trout is also confirmed by the divergence time estimated from *BEAST analysis ( Figure 3). However, it is important to emphasize the somewhat unreliable statistical support for this divergence time, although molecular markers specific to softmouth trout have never been found outside of the species' present range and are exclusive to the narrow strip of the Western Balkans. Thus, both ecological and genetic observations imply that the origin of softmouth trout lies in this area, and that its distribution has remained limited to the Western Balkans since the time of its origin (ca. late Pliocene) until today. According to the mtDNA-based phylogeny, S. marmoratus is a separate lineage within the S. trutta complex (marmoratus, MA [7]), and closely related to the AD lineage. Both lineages are considered to have originated in the Italo-Adriatic and Balkans at a time when the ME lineage evolved in the western Mediterranean (reviewed in Bernatchez [7] and Cortey et al. [8]). However, we reposition S. marmoratus on the basis of the nDNA results reported here that classify it as a distinct cluster of the same or similar rank as that of S. trutta and S. obtusirostris. This observation supports the classification of marble trout as a distinct species; indeed, this is the most notable specificity that differentiates the nDNAbased tree topology ("species tree") presented here from the previously constructed mtDNA-based trees.
Our study reveals the genetic divergence between the northern and southern populations of marble trout, which could not be previously seen because of the polyphyletic origin of mtDNA haplotypes of these populations. It also provides new insight into the genetic structure of S. marmoratus with a considerably higher level of genetic polymorphism than previously reported on the basis of mtDNA studies [7]. For example, the genetic structure of seven populations of marble trout in Slovenia that are geographically separated and represent a resource for repopulation of the species [16] has been well studied both with mtDNA ( [23,55]; unpublished data) and microsatellite markers [15]. Whereas the mtDNA analysis was poorly informative due to the fixation of a single haplotype (MA1a) that suggested the absence of population substructuring, microsatellite analysis revealed a very strong inter-population genetic differentiation (pairwise fixation index (F ST ) from 0.3-0.9) probably because of a combination of a high level of microsatellite mutation and random drift (all the populations are very small). Analysis of the nDNA of the seven Slovenian populations grouped them into four genetically homogeneous units, their geographic distribution coinciding logically with the spatial proximity of the corresponding populations; according to our results, the populations diverged in the late Pleistocene. Thus, there is evidence that the marker system reported here is sufficiently polymorphic and informative to establish times of divergence for trout lineages within Salmo.
The least well supported branching on the tree concerns the S. trutta complex (Figures 2 and 3) for which the progress of how brown trout lineages developed is unclear. The lowest statistical significance was found for the relationship between OTU within the Mediterranean-Adriatic clade, but this poor resolution might be due to the small number of Mediterranean individuals sampled. However, the development of these lineages runs across a relatively narrow timeframe, as suggested both by the *BEAST-TMRCA analysis and by Bernatchez [7], which could be the most likely reason for such a low phylogenetic resolution in the rest of the S. trutta complex in this study too. Our results suggest that the Atlantic group was the first brown trout lineage to split off. Ancestral divergence in the AT lineage has been demonstrated at the mtDNA level [7], and also by the existence of a nucleotide polymorphism in the ITS nuclear gene [27] but this was not confirmed in later studies (e.g., mtDNA analyses [8][9][10], and transferrin nuclear gene analyses [8][9][10]), which considered the DA lineage as the oldest. We found that the DA lineage was sister to the ME-AD lineages, which, according to TMRCA, split around 0.66 MYA as previously estimated from variation in mtDNA [7]. The early mid-Pleistocene (0.7 MYA) saw the most drastic paleohydrological changes in the last three MY that led to the separation of previously connected river basins [56]. Taking into account both the nDNA and mtDNA data, the separation of the Danubian basin from the Adriatic basin seems to be one of the most marked hydrological changes that occurred at that time.

Mitochondrial-nuclear discordance
Comparison of nDNA and mtDNA data detected at least four cases of mitochondrial-nuclear discordance for Salmo (in S. marmoratus and S. obtusirostris; Figure 4), three of which had been previously reported based on microsatellite, or single gene analyses, or both, and mtDNA (S. obtusirostris salonitana [52]; S. obtusirostris zetensis [37]; S. marmoratus [36]). We also observed a new case of mitochondrial-nuclear discordance within softmouth trout with the population from the Vrljika River bearing a Neretva softmouth trout-specific mtDNA haplotype [21] although it clustered with the Jadro and Zeta populations in the same clade on the basis of nDNA data (Figures 2 and 4). Interestingly, all reported cases of mitochondrial-nuclear discordance in Salmo are limited to the Adriatic river systems in the Western Balkans. Such discordances have been reported for other salmonids (Oncorhynchus [57] and Salvelinus [58][59][60]) but to our knowledge, have not been detected in Salmo outside the above-mentioned area. The extensive morphological and genetic variability in Salmo, together with the large proportion of mitochondrial-nuclear discordance detected in the Adriatic basin of the Western Balkans, point to an interesting and variegated evolutionary history of Salmo taxa in the area.

Note on taxonomy aspects
Salmo trouts have undergone an evolutionary radiation in relatively recent times (from mid to late Pleistocene). For this reason, it has been difficult to genetically distinguish the most recently evolved populations, particularly in the Balkans, where trout have evolved into a variety of different forms. Despite this difficulty, the data reported here suggest a clear separation of the species and populations analyzed into several main evolutionary lineages, and further geographical subdivisions of these lineages. Thus, the phylogenetic trees presented here lend support to nomenclature, not only with regard to the species status of S. ohridanus, S. obtusirostris and S. marmoratus, but also as a means for naming the main lineages by names that already exist. Thus, S. trutta would apply to the AT lineage and S. labrax to the DA lineage, as already proposed [1]. The situation is not so clear for the AD-ME lineage, which comprises the most complicated conglomerate of taxa e.g. S. macrostigma, S. cettii, and S. farioides. However, several meaningful phylogenetic subdivisions have been found in trout from the rivers Krka and Zrmanja that correspond completely with the distribution of the questionable species S. visovacensis Taler, 1950 and S. zrmanjenzis Karaman, 1938. Similarly, the clear separation of Ohrid brown trout from the other trouts that inhabit nearby areas (River Neretva, the Skadar-Ohrid river system and River Bistrica in southern Albania) implies an independent taxonomic status for the Ohrid brown trout and provides support to maintaining its taxonomic epithet S. letnica as previously proposed [37].
Geographically clearly defined sub-lineages are evident also in marble trout. Since these share a similar phylogenetic hierarchical level to that of the main evolutionary lineages of S. trutta, they should be regarded as two distinct species. However, no morphological analysis has been undertaken to compare these lineages and they do not show any visible phenotypic differences. The notable exception is S. dentex, which appears to be a life-history form of Neretva marble trout [14]. Therefore, further research is necessary to determine whether marbled trout can be split into two geographically separated sister species. For such a classification, a detailed morphological analysis is required and to date, we propose that each lineage is considered as an evolutionary significant unit.