Sampling
For this study, we sampled buccal swabs from 64 Tibetan Terriers at dog shows or directly from breeders and owners: 24 western Tibetan Terriers (20 and 4 from the Lamleh and Luneville lineages, respectively), 22 native Tibetan Terriers, 8 TT-F1 (native × Lamleh), 6 TT-BC2 (F1 × Lamleh) and 4 TT-BC3 (TT-BC2 × Lamleh). The native Tibetan Terriers samples were collected at 22 locations in Tibet. Identification of purebred animals was based on specific morphological criteria and/or pedigree. DNA from buccal swabs was extracted using the standard protocol for the DNeasy Blood & Tissue kit (Qiagen, Germany). The quantity of DNA was estimated using NanoVue (GE Healthcare Life Sciences, USA).
Genotypic data
We performed genotyping using a microsatellite set and an Illumina 170 k CanineHD BeadChip. The data obtained from genotyping were merged with the publicly available SNP array data for 5406 dogs belonging to 163 breeds downloaded from the Dryad repository [14]. The number of dogs per breed ranged from 1 to 732 (see Additional file 2: Table S1).
Quality control and merging of the SNP array data were done with the SNP and Variation Suite v8.7.0 (Golden Helix, Inc., Bozeman, MT, http://www.goldenhelix.com). First, we filtered the data obtained from the genotyping of 24 Tibetan Terriers by excluding the markers on the X chromosome (leaving 168,102 SNPs) and by removing the SNPs with a call rate lower than 95% and an Hardy–Weinberg equilibrium P-value less than 0.00001 (leaving 156,708 SNPs. Then, less than 10% of missing calls per sample were allowed, and 18 out of 24 Tibetan Terrier samples passed the quality control. After merging this dataset with the publicly available data, 143,170 SNPs that were common to both datasets, were available and distributed across all the autosomal chromosomes. The average marker density was one SNP every 15,404 kb. Based on publicly available data, we excluded the village dogs and mixed breed dogs, and finally selected 19 or less representative animals with the highest call rates for each breed. Tibetan Terriers were divided into five subpopulations (native, Lamleh, Luneville, F1 individuals between native and Lamleh Tibetan Terriers, and Tibetan Terriers that were extracted from the Dryad database [14].
To compare CanineHD BeadChip data with microsatellite data, we examined 18 microsatellite loci suggested by the International Society of Animal Genetics (ISAG): AHTK211, CXX279, REN169O18, INU055, REN54P11, INRA21, AHT121, FH2054, REN162C04, AHT137, REN169D01, AHTH260, AHTK253, INU005, AHTH171 and REN247M23, using the Canine Genotyping Panel 1.1 (Thermo Scientific, California) reagent kit. Genotyping was performed on the capillary Genetic Analyser ABI 3130xl. Microsatellite alleles were called using the GENEMAPPER v 4.0 software (ABI, Foster City, CA, USA) and manually checked.
Neighbor-joining tree
We used various methods to provide a fine-scale assessment of the genetic structure of Tibetan Terriers. First, we computed pairwise Nei genetic distances between individuals belonging to 161 dog breeds (N = 1793) and grey wolf individuals (N = 14) using the ape R package [25]. A neighbor-joining tree was computed based on the distance matrix using the phyclust package for R [26].
NeighborNet
Phylogenetic relationships between Tibetan Terriers and both dogs from 27 dog breeds (N = 349) and grey wolf individuals (N = 14) were analysed using the NeighborNet network based on Nei distances and the SplitsTree4 software [27]. We split Tibetan Terriers into two subpopulations, the native and western populations of Tibetan Terriers.
Estimation of the migration rate using TreeMix
To infer the patterns of splits and mixtures in the populations of Tibetan Terriers and a subset of 28 dog breeds and the grey wolf, we used the Treemix algorithm [28] in which grey wolves (Gray_Wolf) were set as the rooting outgroup. First, we built a maximum likelihood tree of the populations with no migration events allowed. Then, we constructed a phylogenetic network for all selected populations, thereby increasing migration events sequentially up to 11 migrations. The residuals from the fit of the model to the data were visualized using the R script implemented in Treemix.
Population structure and admixture analysis
To investigate the population structure inferred from SNP array and microsatellite genotyping data, we used the model-based clustering method STRUCTURE [29]. Visualisation of the results and estimation of the best K value according to Evanno [30] were performed using the web-based tool CLUMPAK [31].
SNP array data were used to investigate the population structure of two datasets: (1) 366 dogs from 29 breeds and 14 grey wolves (by merging genotyped samples from this study and publicly available samples), and (2) a subset that included Tibetan Terrier samples only. We pruned the CanineHD BeadChip data from the 143,170 SNPs by setting a window size of 100, a step size of 20 and a pairwise r2 threshold of 0.1, which left data for 22,175 SNPs. A model that assumes admixture and correlated allele frequencies was used for all STRUCTURE runs, with a burn-in of 106 iterations followed by 105 MCMC iterations. Runs were repeated 10 times for each K value, which ranged from 1 to 20 for the larger dataset (29 dog breeds and grey wolf) and from 1 to 6 for the subset of Tibetan Terrier samples.
The STRUCTURE program was used to estimate the number of genetic populations/lineages in Tibetan Terriers based on allele frequencies of microsatellites with a burn-in of 106 iterations followed by 105 MCMC iterations. Runs were repeated 10 times for each K value ranging from 1 to 6. All Tibetan Terrier samples included in the SNP array analysis were present in the microsatellite dataset. However, the microsatellite dataset consisted of additional samples that were not genotyped on the SNP array.
Runs of homozygosity
Runs of homozygosity (ROH) were identified for each individual separately using the SNP and Variation Suite v8.8.3 (Golden Helix, Inc., Bozeman, MT, http://www.goldenhelix.com). ROH were defined as runs of 25 or more homozygous SNPs that had a minimum run length of 500 kb and in which no heterozygous SNPs, and no more than five missing SNPs were allowed. Similar to other studies [32], ROH were summarized in three length categories: short ranging from 0.5 to 2.5 Mb; medium ranging from 2.5 to 5.0 Mb; and long i.e. more than 5 Mb. Genomic inbreeding coefficients (FROH) were calculated as in [33] and were defined as the percentage of the autosomal genome present in a ROH. The length of the autosomal genome was set to 2,392,715,236 bp.