- Research Article
- Open Access
Age-based partitioning of individual genomic inbreeding levels in Belgian Blue cattle
© The Author(s) 2017
- Received: 3 July 2017
- Accepted: 13 December 2017
- Published: 22 December 2017
Inbreeding coefficients can be estimated either from pedigree data or from genomic data, and with genomic data, they are either global or local (when the linkage map is used). Recently, we developed a new hidden Markov model (HMM) that estimates probabilities of homozygosity-by-descent (HBD) at each marker position and automatically partitions autozygosity in multiple age-related classes (based on the length of HBD segments). Our objectives were to: (1) characterize inbreeding with our model in an intensively selected population such as the Belgian Blue Beef (BBB) cattle breed; (2) compare the properties of the model at different marker densities; and (3) compare our model with other methods.
When using 600 K single nucleotide polymorphisms (SNPs), the inbreeding coefficient (probability of sampling an HBD locus in an individual) was on average 0.303 (ranging from 0.258 to 0.375). HBD-classes associated to historical ancestors (with small segments ≤ 200 kb) accounted for 21.6% of the genome length (71.4% of the total length of the genome in HBD segments), whereas classes associated to more recent ancestors accounted for only 22.6% of the total length of the genome in HBD segments. However, these recent classes presented more individual variation than more ancient classes. Although inbreeding coefficients obtained with low SNP densities (7 and 32 K) were much lower (0.060 and 0.093), they were highly correlated with those obtained at higher density (r = 0.934 and 0.975, respectively), indicating that they captured most of the individual variation. At higher SNP density, smaller HBD segments are identified and, thus, more past generations can be explored. We observed very high correlations between our estimates and those based on homozygosity (r = 0.95) or on runs-of-homozygosity (r = 0.95). As expected, pedigree-based estimates were mainly correlated with recent HBD-classes (r = 0.56).
Although we observed high levels of autozygosity associated with small HBD segments in BBB cattle, recent inbreeding accounted for most of the individual variation. Recent autozygosity can be captured efficiently with low-density SNP arrays and relatively simple models (e.g., two HBD classes). The HMM framework provides local HBD probabilities that are still useful at lower SNP densities.
Two alleles are identical-by-descent (IBD) if they descend from a single allele in an ancestor. This measure is relative and depends on the definition of a reference (or base) population. Indeed, two alleles are declared IBD if the ancestor belongs to the reference population and identical-by-state (IBS) for more remote common ancestors. When two alleles are IBD within an individual, the terms “autozygous” or “homozygous-by-descent” (HBD) are used. The inbreeding coefficient F of an individual is related to these measures and is defined as the probability that two alleles at any locus in this individual are IBD . Inbreeding is associated with negative effects on fitness (e.g., [2–4]) and the occurrence of monogenic disorders increases in populations with higher levels of inbreeding . Thus, the study and management of inbreeding are of high importance in such populations. Belgian Blue Beef cattle (BBB) represent a good example of an intensively selected cattle population. A consequence of the selection process in this breed is the increase in the level of inbreeding, as illustrated by several recent outbreaks of genetic recessive defects [5–11].
There are several methods to estimate the inbreeding coefficient F. In the past, methods were based on the genealogy and estimated the expected inbreeding coefficient (based on the relationship between the two parents). With the development of genetic markers, several approaches allow the estimation of the realized inbreeding coefficient (“observed” in an individual), even in the absence of genealogy. Global approaches, including moments estimators (e.g., ), simple homozygosity measures (e.g., ) or based on the genomic relationship matrix , estimate the total amount of inbreeding in an individual and can work with sparse genetic maps. Methods that are based on runs of homozygosity (ROH) (e.g., ) are, most often, empirical rule-based methods, which assume that long stretches of identical alleles are HBD. For such rule-based methods, prior parameters have to be defined, i.e., the minimal number of homozygous markers, the minimal length and the maximum number of allowed heterozygous markers to consider a set of successive markers as HBD, etc. Likelihood-based approaches (e.g., [15, 16]) rely on probabilistic models, which use allele frequencies and genotyping error rates to determine whether ROH are autozygous (i.e., HBD), and derive from earlier works by Broman and Weber . Compared to global estimators, ROH-based methods require denser genetic maps and can provide estimators of local autozygosity. ROH have been used to study inbreeding in diverse species including humans [14, 16, 18], pigs , cattle [20, 21] and others, and to study genetic diversity and signatures of selection. In addition, ROH offer the possibility to distinguish between recent and more ancient inbreeding [16, 18, 22]. Indeed, segments that are inherited from recent ancestors are expected to be longer since the recombination process has fewer generations to split the fragment into smaller pieces. Finally, hidden Markov models (HMM) were developed to estimate the HBD probability of segments along chromosomes  and make use of all the available information about the sequences of homozygous/heterozygous markers, allele frequencies of markers, the genetic map, and genotyping error rates. These models can handle whole-genome sequence data , including low-fold experiments . All these HMM assume that (1) all the autozygosity results from a single event, (2) all the HBD segments trace back to one or several ancestors in a single generation, and (3) they all have the same expected length. However, natural and domesticated populations are complex. They result from a long demographic history with variable effective population size (Ne) and, sometimes, have undergone major demographic events such as bottlenecks.
To relax this strong assumption of the current HMM methods, we recently developed a new HMM with multiple age-based HBD-classes  in which the length of the HBD segments from different classes have distinct expected distributions (longer/shorter segments for more recent/ancient common ancestors). The model allows to fit genomic data better and to reveal the “recent” demographic history of populations. The aims of our study were to: (1) characterize inbreeding by using a model describing genomes as a mosaic of non-HBD and HBD segments and partitioning the latter in multiple age-related classes in an intensively selected cattle population such as BBB cattle; (2) investigate the effect of marker density and setting of parameters on the estimates; and (3) compare our estimates with those obtained with other methods (pedigree-based inbreeding coefficients, estimates from the genomic relationship matrix or rule-based ROH estimators).
Single nucleotide polymorphism (SNP) genotypes for the 735,293 SNPs from the Illumina BovineHD BeadChip (HD; Illumina, San Diego, CA) were available for 634 BBB sires. Moreover, whole-genome sequencing (WGS) data were also available for 50 of these sires (the bioinformatic processing of the WGS data is described in ). The pedigree including all known ancestors of the 634 bulls contained 7676 individuals. In addition, we extracted from the Widde database (http://widde.toulouse.inra.fr; ), Illumina BovineHD genotypes for animals belonging to 10 cattle breeds of European origin (originally provided by the BovineHD genotyping consortium). This set contained samples from 42 Angus, 22 Brown Swiss, 37 Charolais, 21 Guernsey, 35 Hereford, 60 Holstein, 38 Jersey, 50 Limousin, 21 Piedmontese and 21 Romagnola individuals.
All individuals had a call rate higher than 0.90. We selected SNPs that mapped to bovine autosomes (using the UMD3.1 build) and removed from the dataset those that had a call rate lower than 95% and minor allelic frequency lower than 0.01, that significantly deviated from Hardy–Weinberg proportions (p < 0.001) or that presented incompatible genotypes for more than one parent–offspring pair, which resulted in a set of 601,226 SNPs. Furthermore, SNPs located in segments that might be incorrectly mapped to the genome build were removed. Such putative errors were identified based on evidence from linkage information , linkage disequilibrium  or an excess of breaks in ROH from independent samples . Consequently, an additional 2.7% of the SNPs were filtered out, which resulted in a final BBB dataset of 585,159 SNPs. Removing potential map errors is essential for our applications since these might break long ROH into smaller fragments. For the other breeds, the number of conserved SNPs using the same rules ranged from 524,113 to 622,603 SNPs.
To study the effect of SNP density on the estimation of inbreeding, we used two subsets of the 585,159 SNPs selected for BBB cattle based on their presence on the bovine Illumina BovineSNP50 BeadChip v1 and v2 (32,412 SNPs conserved for this 50 K panel) or on both the 50 K panel and the Illumina BovineLD BeadChip (6844 SNPs conserved for this low-density (LD) panel).
For the sequence data, first we applied stringent filtering rules to select a high-quality subset of SNPs, as described in . Briefly, SNPs, which passed the calibration score and were present in other cattle WGS datasets (1000 bull genomes project , Holstein and Jersey individuals from New-Zealand  and a Dutch Holstein pedigree of 415 individuals that was used as a reference population for imputation in ), were selected, resulting in a set of ancient variants. We conserved only the SNPs that presented correct Mendelian segregation in the WGS Dutch Holstein pedigree (see  for more details). Regarding the genotyping data, we also removed variants with a MAF lower than 0.01 and some possibly incorrectly mapped regions (errors in the genome assembly) based on the rules described in . The final WGS dataset contained 5,653,911 bi-allelic SNPs.
Methods to estimate inbreeding coefficients and HBD probabilities
Multiple HBD-classes HMM
Our multiple HBD-classes model  is a HMM that describes individual genomes as mosaics of multiple HBD and non-HBD states. Although several non-HBD states can be fitted, here we used only one non-HBD state and K − 1 HBD states for a total of K states, where K is a parameter of the method that can be either predefined or selected by model comparison (see below). Each state k has its own rate parameter R k that defines the distribution of the lengths of the segments originating from that class: the lengths in Morgans are distributed exponentially with rate R k . The rate corresponds approximately to the size of the inbreeding loop measured in generations and is closely related to age in generations of the common ancestors. R k is approximately twice the number of generations to the common ancestor. Each state has also its own mixing proportion, which is equal to the frequency of segments originating from that class. Such a model with multiple-HBD classes will be referred to as a KR model, with K being equal to the number of distinct rates fitted, K − 1 for HBD states and 1 for the non-HBD state. In the case where a single HBD class and a single non-HBD class are fitted, we use a common rate for both (1R model) since such a model has better properties . Emission probabilities of the HMM correspond to the probabilities of observing a particular genotype conditionally on the underlying state (HBD or non-HBD). For non-HBD classes, these probabilities correspond to Hardy–Weinberg proportions  and for HBD classes, homozygotes AA are observed with a probability f A (1 − \( \upvarepsilon \)) and heterozygotes with a probability \( \upvarepsilon \), where f A is the frequency of allele A and \( \upvarepsilon \) is an error term corresponding to the probability of observing a heterozygous genotype in a HBD segment . With WGS data, these probabilities are integrated over the different genotype probabilities obtained from the VCF file . For each HBD class, the genome-wide HBD probability is estimated as the probability of belonging to that class averaged over the whole genome, whereas the local HBD probability is defined as the probability of belonging to that class at a specific genomic location (see  for more details). The genome-wide HBD probabilities correspond to the percentage of the genome that is associated with a specific HBD class, e.g., the proportion of the genome that is located within HBD segments of a certain length. To estimate the inbreeding coefficient, first the base population must be defined, which is done by deciding which classes are considered as truly autozygous. For instance, we might consider that ancestors associated with classes with a R k rate higher than a selected threshold T (i.e., R k ≥ T) are unrelated. Then, the corresponding inbreeding coefficient F G-T is estimated as the probability to belong to any of the HBD classes with a R k ≤ T averaged over the whole genome (e.g., the inbreeding coefficient is defined as the probability of sampling an HBD locus given a reference population). Since R k rates of HBD classes are approximately equal to twice the number of generations to the common ancestor, including HBD classes with a R k ≤ T amounts to setting the base population to approximately 0.5 * T generations ago. In the remainder of the manuscript, inbreeding coefficients or HBD probabilities reported without specifying a base population or R k , are obtained by including all HBD classes (e.g., using the most remote base population). In that case, the age of the base population or the smallest HBD segments captured are a function of the SNP density used. All the HBD probabilities are estimated with the forward–backward algorithm .
As an alternative to the KR model, we can use a set of pre-defined R k rates and estimate only the mixing proportions (MixKR model). This set of R k rates should be selected to cover a wide range of past generations. In our analyses, we used 13 HBD states with respective R k rates equal to [21, 22, 23, …, 213] and one non-HBD class with a rate of 213. These values were chosen to have a constant and limited degree of overlap between the exponential distributions that specify the HBD lengths for each successive class. The upper rate is determined by the SNP density that defines the size of the smallest HBD segments that we can capture. Such models proved efficient to estimate the genome-wide (global) and local autozygosity levels and to obtain information on recent demographic history . In addition, inbreeding coefficients are then estimated with respect to the same reference population and HBD classes are defined over identical periods in the past, allowing better comparisons between individuals.
With all the models, the parameters (mixing proportions for all models and R k rates for KR models only) were estimated with 1000 iterations of the expectation-maximization algorithm with constraints to force R k to be between 1 and 8192. The number of classes K is fixed for each run but the optimal value can be determined by comparing models with the Bayesian information criterion (BIC). All analyses were performed with the ZooRoH software (https://github.com/tdruet/ZooRoH).
Additional inbreeding coefficient estimators
The inbreeding coefficient based on pedigree data (FPED) was computed with the method of Sargolzaei et al. . We used several measures to estimate genomic inbreeding coefficients. The first measure uses the diagonal elements of the genomic relationship matrix (GRM) computed with the BLUPF90 package  without any pedigree information (\( \upalpha \) set to 1.0) and is based on the variance of the additive genetic values (FGRM; [13, 37]). The second, which was proposed and recommended by Yang et al.  for its smaller sampling variance, is based on the correlation between uniting gametes (FUNI) and was estimated using the GCTA software . The third more simple measure is defined as the homozygosity (FHOM) or the proportion of homozygous SNPs (e.g., ), which is closely related to the excess homozygosity estimator (FExHOM) implemented in plink . For FGRM, FUNI and FExHOM, we estimated allele frequencies based on the 31 bulls born before 1985. Finally, the fourth estimator measures the proportion of the genome covered by ROH (FROH), which contained at least 15 SNPs and were identified using plink  with 50-SNP windows (no heterozygous genotypes were accepted and up to five missing genotypes were possible). These parameters were selected based on published studies in cattle (e.g., [20, 22, 41]). The minimal SNP density, length of ROH and maximal SNP spacing were optimized for each panel as follows by order of increasing density: at least one SNP per 500, 100 and 10 kb, the length of ROH had to be at least 5 Mb, 1 Mb and 100 kb long and the maximum distance between two consecutive SNPs had to be 1 Mb, 500 kb and 200 kb.
Estimation and age-based partitioning of individual genomic inbreeding levels in the Belgian Blue Beef cattle population
Comparison of the results for BBB cattle with those of other breeds
Estimation of inbreeding coefficients and HBD probabilities with different SNP densities
Distribution of the length of HBD segments identified with a model with 13 HBD-classes with pre-defined R k rates for different SNP densities
HBD segment length category
50 K panel
Comparison of models
Models that estimate R k rates of HBD-classes (KR models)
Comparison of models used to estimate genomic inbreeding coefficients with different numbers of HBD classes (from 1 to 4)
Number of fitted HBD classes
Correlation with F G-8192 b
Median of estimated R k rates per HBD class
1st HBD class
2nd HBD class
3rd HBD class
4th HBD class
Models with pre-defined R k rates of HBD classes (MixKR models)
Estimation of genomic inbreeding coefficients with models using different numbers of HBD classes (from 1 to 4) with pre-defined R k rates that correspond to the expected length in Morgans of HBD segments and with the LD panel
Number of fitted HBD classes
Correlation with reference F G a
Predefined R k rates used for each HBD class
1st HBD class
2nd HBD class
3rd HBD class
Comparison to other inbreeding coefficient estimators
Summary statistics for the inbreeding coefficients estimated for the 634 Belgian Blue sires with different methods and using the HD panel
Correlations between inbreeding coefficients estimated for the 634 Belgian Blue sires with different methods and using the HD panel
Correlations of estimates from the traditional GRM with our estimates are moderately high (r = 0.73) and lower with homozygosity estimators (r = 0.63) and ROH-based estimators (0.61). The FGRM was computed with the formula proposed by , which divides all SNP contributions by the same weight. When estimated with the alternative formula, which divides each SNP contribution by its own weight 2f i (1 − f i ) (f i being the frequency of SNP i) as in Amin et al. , correlations were lower (i.e., 0.48 with FG, 0.34 with FHOM and 0.33 with FROH). The estimator based on the unified correlations between gametes proposed by Yang et al.  presented relatively high correlations with both FG and FGRM (respectively, 0.90 and 0.92) and slightly lower correlations with the other estimators (r = 0.87 and 0.85 with FHOM and FROH, respectively).
For several reasons, the Belgian Blue Beef cattle breed is considered as an extremely selected breed. It is famous for its exceptional muscular development referred to as “double muscling”, which is caused by an 11-bp deletion in the myostatin gene . This loss-of-function variant is almost fixed in the current population (e.g., ) but muscular development was further improved through intense selection . As a result, most often calving requires caesarian section. In addition, artificial insemination is more frequent in this breed compared to other beef cattle breeds, which allows a more intense use of key sires. In recent years, several outbursts of recessive defects associated with inbreeding have been reported. For instance, causative variants were identified for eight recessive defects including congenital muscular dystonia 1 and 2 , crooked tail syndrome [6, 7], stunted growth , gingival hamartome , prolonged gestation, lethal arthrogryposis syndrome  and junctional epidermolysis bullosa . Some of these defects have reached a high frequency in the population.
When estimated over all HBD-classes, the average genomic inbreeding coefficient was high (higher than 0.30) but these values were comparable to those obtained for other cattle breeds of European origin (i.e., BBB presented intermediate values). In agreement with Purfield et al. , samples from breeds that originated from the British Isles (Hereford, Angus, Jersey or Guernsey) presented high inbreeding coefficients (≈ 34 to 40%), possibly as a result of closed population histories and strict importation restrictions . Similarly, high levels of inbreeding in Holstein and Brown Swiss breeds were previously reported [21, 41, 47]. When focusing on recent common ancestors only (associated with HBD-classes with R k ≤ 64), we observed lower inbreeding coefficients in BBB cattle, ranging from 1.0 to 17.7% across animals (6.8% on average), with a positive trend: animals from the current population presenting 6% higher inbreeding coefficients on average than individuals born 30 years ago. Some individuals accumulated more than 10% recent autozygosity and carried HBD segments longer than 10 or even 50 Mb. The same model applied to other species, i.e., dog breeds or sheep populations that suffered severe bottlenecks revealed significantly higher levels of recent autozygosity . Conversely, some human and sheep populations presented lower levels of recent autozygosity (even lower than 1% on average). The recent HBD-classes are probably more relevant for management purposes because they account for most of the individual variation in genome-wide autozygosity. In addition, deleterious variants might be mostly associated to recent HBD segments because older variants have undergone more generations of selection against deleterious effects (e.g., [3, 48, 49]). Recent intensive selection of key sires allowed some deleterious variants to reach higher frequency than under natural selection. Indeed, strong bottlenecks that occur with domestication, breed creation or intensive selection in cattle result in the relaxation of purifying selection and increase the load of deleterious mutations (e.g., [50, 51]). For instance, all identified variants that cause recessive defects in BBB cattle are specific to this breed (suggesting their young age). We applied our model to previously genotyped cases (see ) and the causative variants were found on recent HBD segments (associated with HBD-classes with R k ≤ 32), also suggesting that these variants are relatively young.
Application of our model with different SNP densities showed large differences in average estimated inbreeding coefficients, with the average FG equal to 0.060, 0.093 and 0.303 using the LD, 50 K and HD panel, respectively. Correlations between these estimates were very high (even with the LD panel, r > 0.93). High-density panels allow the capture of shorter ROH that are associated with very ancient ancestors, are characteristic of the population (associated with past demographic history) and present little individual variation. For recent HBD classes, estimators were similar across SNP panels (up to R k = 32 with the LD panel and 256 with the 50 K panel). Small HBD segments, ranging from 10 kb to 1 Mb, accounted for most of the differences obtained with the HD panel compared to the lower density panels. A substantial proportion of HBD segments longer than respectively 1 and 5 Mb were identified with the LD and the 50 K panels. These observations are consistent with those of Ferenčaković et al.  and Purfield et al.  who showed that denser panels can be used to identify short ROH and that the 50 K panel proved suitable to identify ROH longer than 5 Mb. If the goal is to estimate the inbreeding coefficient with respect to a recent base population, which has more variation and is possibly the most functionally relevant one (see above), these LD and 50 K panels provide enough information (e.g., the correlation between FG estimated with the HD and the 50 K panels was equal to 0.975). Regarding the optimal model, our comparisons indicated that models with a few HBD classes (1 or 2 according to SNP density) achieved results that were as good as those obtained with 13 HBD classes in terms of FG and correlations with more complex models. Thus, such parsimonious models were selected based on the BIC. For each SNP panel, we recommend the use of the largest K that is optimal for a substantial proportion of individuals since that value is required for these animals and using a larger K will not penalize the other individuals. To make comparisons between individuals easier, we also recommend the use of a model with pre-defined R k rates and the same HBD-classes for all individuals. In that case, the use of at least two HBD-classes is preferable with low-density panels, one to capture the recent HBD segments and one that is associated with more remote ancestors. Three HBD-classes models present a parsimonious solution to distinguish recent from ancient autozygosity (similarly to ) but if the objective is to obtain a finer age-based partitioning of autozygosity, more HBD classes could be recommended.
Comparisons of inbreeding coefficients obtained with different estimators have already been reported in the literature. In this paper, we also report correlations with our estimates of genome-wide inbreeding. These comparisons are essentially indicative since different methods refer to different base populations and all estimators are not fully comparable (e.g., ). In addition, some estimators are sensitive to the estimated allelic frequencies. Here, we used frequencies that were estimated using the set of genotyped bulls born before 1985. At moderate to high SNP density, the genome-wide inbreeding coefficient estimated with our model, averaged over all SNPs and HBD classes, was highly correlated with homozygosity measures or ROH-based estimates, whereas lower correlations were obtained when compared to estimates based on the genomic relationship matrix. Low correlations between FGRM and homozygosity measures (homozygosity or ROH) were previously reported (e.g., ) although moderate to high correlations were also found (e.g., [2, 4]). It should be kept in mind that these results must be interpreted with caution because global estimators, and particularly FGRM, depend strongly on the estimation of allele frequencies in the population. In addition to global inbreeding coefficients, our model also estimates local autozygosity (i.e., it identifies HBD segments) and uses the linkage between SNPs as ROH-based estimators, conversely to global estimators that consider all SNPs as independent (FHOM, FGRM, FUNI or FPED). Correlations with homozygosity measures decreased at lower SNP densities when the use of linkage between successive SNP positions was more important to determine whether a position is IBD or not. ROH-based estimators are not frequently used with LD panels in cattle and previous studies concluded that LD panels were appropriate to identify recent inbreeding or HBD segments longer than 5 Mb [20, 22]. The HMM proved to work well with LD panels, i.e., it captured the recent HBD segments, presented high correlations with coefficients estimated at higher density, and provided HBD probabilities. It is indeed recommended to use such probabilities at low-density because they account for uncertainty due to lower informativeness as opposed to ROH-based classification or the Viterbi algorithm. We showed that, at lower SNP densities, the smallest HBD segments are not captured but also that the Viterbi algorithm even fails to identify some segments of moderate size. Therefore, we recommend the use of HBD probabilities that are obtained with the forward-backward algorithm. Most of the global estimators provided inbreeding coefficients relative to a base population, i.e., the founders of the pedigree or the population used to estimate allele frequencies, whereas the multiple-HBD class model provides an age-based partitioning of autozygosity. As a result, inbreeding coefficients estimated by including all HBD classes are higher because some HBD-classes trace back to more remote generations than the base population commonly used by other methods and the SNP density determines how ancient HBD segments can be captured. Compared to rule-based ROH, the HMM framework also allows to accommodate low-fold sequencing or genotype-by-sequencing data, i.e., when genotypes are not unambiguously determined, as described in Vieira et al.  and Druet and Gautier .
Moderate to high correlations between FPED and FROH (from 0.50 to 0.75) were reported in cattle (e.g., [3, 21, 22, 54, 55]). In addition, long ROH (> 5 Mb) were shown to be closely associated with pedigree inbreeding coefficients . Correlations between estimators obtained from the pedigree and the genomic relationship matrices are more variable, ranging from moderate (e.g., ) to high (e.g., ), whereas in other studies, these correlations were particularly low [54, 56]. As mentioned above, these differences might be due to the estimation of the allelic frequencies. Inbreeding coefficients estimated with the HMM had moderate correlations with pedigree-based inbreeding coefficients, lower than with methods based on homozygosity or ROH that were in the range with correlations reported in previous studies. However, correlations were higher with the autozygosity associated to the recent HBD-classes, which is a desired feature since these recent classes correspond to the autozygosity captured by the pedigree whereas old HBD-classes are associated to ancestors tracing further back than the genealogy. Similarly, correlations between HMM inbreeding coefficients estimated with the LD panel and pedigree-based estimates were higher since they capture only recent autozygosity (compared to higher density panels).
Although we observed high levels of inbreeding associated with small HBD segments in Belgian Blue Beef cattle, recent HBD segments account for most of the individual variation. Recent autozygosity can be captured efficiently with low-density SNP panels and with relatively simple models (e.g., two HBD classes) although we recommend the use of models with pre-defined R k rates that are associated with the expected length of HBD segments (the same HBD-classes are then used for all individuals) to make comparisons between individuals easier. More complex models (with more age-based HBD classes) are needed to obtain a finer age-based partitioning of inbreeding levels and indications of the past demographic history of a population. Such partitioning allows to better understand which HBD classes contribute to individual autozygosity. In addition, the use of more classes avoids the fragmentation of long HBD segments into smaller fragments with next-generation sequencing data. Estimates obtained with the HMM framework are highly correlated with those obtained based on relative homozygosity (or ROH). In addition, such HMM can use genotype probabilities (e.g., with low-fold sequencing data) and provide, beyond global estimates, local HBD probabilities that are still useful at lower SNP densities. Such local HBD probabilities might be useful to identify regions associated with inbreeding depression.
MS, ASG, TD, PF and AB performed the experiments. MS, TD, FF, MG and PF conceived the study, interpreted the results and wrote the manuscript. All authors read and approved the final manuscript.
This work and the ZooROH project were supported by the Fonds de la Recherche Scientifique-FNRS (F.R.S.-FNRS) under Grant J.0134.16. Tom Druet is Senior Research Associate from the F.R.S.-FNRS. We used the supercomputing facilities of the “Consortium d’Equipements en Calcul Intensif en FédérationWallonie-Bruxelles” (CECI), funded by the F.R.S.-FNRS.
The authors declare that they have no competing interests.
This work was supported by the Fonds de la Recherche Scientifique-FNRS (F.R.S.-FNRS) under Grant J.0134.16.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Malécot G. Les Mathématiques de l’hérédité. Paris: Masson et Cie; 1948.Google Scholar
- Bjelland DW, Weigel KA, Vukasinovic N, Nkrumah JD. Evaluation of inbreeding depression in Holstein cattle using whole-genome SNP markers and alternative measures of genomic inbreeding. J Dairy Sci. 2013;96:4697–706.View ArticlePubMedGoogle Scholar
- Leroy G. Inbreeding depression in livestock species: review and meta-analysis. Anim Genet. 2014;45:618–28.View ArticlePubMedGoogle Scholar
- Pryce JE, Haile-Mariam M, Goddard ME, Hayes BJ. Identification of genomic regions associated with inbreeding depression in Holstein and Jersey dairy. Genet Sel Evol. 2014;46:71.View ArticlePubMedPubMed CentralGoogle Scholar
- Charlier C, Coppieters W, Rollin F, Desmecht D, Agerholm JS, Cambisano N, et al. Highly effective SNP-based association mapping and management of recessive defects in livestock. Nat Genet. 2008;40:449–54.View ArticlePubMedGoogle Scholar
- Fasquelle C, Sartelet A, Li W, Dive M, Tamma N, Michaux C, et al. Balancing selection of a frame-shift mutation in the MRC2 gene accounts for the outbreak of the crooked tail syndrome in Belgian Blue cattle. PLoS Genet. 2009;5:e1000666.View ArticlePubMedPubMed CentralGoogle Scholar
- Sartelet A, Klingbeil P, Franklin CK, Fasquelle C, Géron S, Isacke CM, et al. Allelic heterogeneity of Crooked Tail Syndrome: result of balancing selection? Anim Genet. 2012;43:604–7.View ArticlePubMedGoogle Scholar
- Sartelet A, Druet T, Michaux C, Fasquelle C, Géron S, Tamma N, et al. A splice site variant in the bovine RNF11 gene compromises growth and regulation of the inflammatory response. PLoS Genet. 2012;8:e1002581.View ArticlePubMedPubMed CentralGoogle Scholar
- Sartelet A, Stauber T, Coppieters W, Ludwig CF, Fasquelle C, Druet T, et al. A missense mutation accelerating the gating of the lysosomal Cl−/H+-exchanger ClC-7/Ostm1 causes osteopetrosis with gingival hamartomas in cattle. Dis Model Mech. 2014;7:119–28.View ArticlePubMedGoogle Scholar
- Sartelet A, Li W, Pailhoux E, Richard C, Tamma N, Karim L, et al. Genome-wide next-generation DNA and RNA sequencing reveals a mutation that perturbs splicing of the phosphatidylinositol glycan anchor biosynthesis class H gene (PIGH) and causes arthrogryposis in Belgian Blue cattle. BMC Genomics. 2015;16:316.View ArticlePubMedPubMed CentralGoogle Scholar
- Sartelet A, Harland C, Tamma N, Karim L, Bayrou C, Li W, et al. A stop-gain in the laminin, alpha 3 gene causes recessive junctional epidermolysis bullosa in Belgian Blue cattle. Anim Genet. 2015;46:566–70.View ArticlePubMedGoogle Scholar
- Ritland K. Estimators for pairwise relatedness and individual inbreeding coefficients. Genet Res. 1996;67:175–85.View ArticleGoogle Scholar
- VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.View ArticlePubMedGoogle Scholar
- McQuillan R, Leutenegger AL, Abdel-Rahman R, Franklin CS, Pericic M, Barac-Lauc L, et al. Runs of homozygosity in European populations. Am J Hum Genet. 2008;83:359–72.View ArticlePubMedPubMed CentralGoogle Scholar
- Wang S, Haynes C, Barany F, Ott J. Genome-wide autozygosity mapping in human populations. Genet Epidemiol. 2009;33:172–80.View ArticlePubMedPubMed CentralGoogle Scholar
- Pemberton TJ, Absher D, Feldman MW, Myers RM, Rosenberg NA, Li JZ. Genomic patterns of homozygosity in worldwide human populations. Am J Hum Genet. 2012;91:275–92.View ArticlePubMedPubMed CentralGoogle Scholar
- Broman KW, Weber JL. Long homozygous chromosomal segments in reference families from the Centre d’Etude du Polymorphisme Humain. Am J Hum Genet. 1999;65:1493–500.View ArticlePubMedPubMed CentralGoogle Scholar
- Kirin M, McQuillan R, Franklin CS, Campbell H, McKeigue PM, Wilson JF. Genomic runs of homozygosity record population history and consanguinity. PLoS ONE. 2010;5:e13996.View ArticlePubMedPubMed CentralGoogle Scholar
- Bosse M, Megens HJ, Madsen O, Paudel Y, Frantz L, Schook L, et al. Regions of homozygosity in the porcine genome: consequence of demography and the recombination landscape. PLoS Genet. 2012;8:e1003100.View ArticlePubMedPubMed CentralGoogle Scholar
- Ferenčaković M, Sölkner J, Curik I. Estimating autozygosity from high-throughput information: effects of SNP density and genotyping errors. Genet Sel Evol. 2013;45:42.View ArticlePubMedPubMed CentralGoogle Scholar
- Ferenčaković M, Hamzić E, Gredler B, Solberg TR, Klemetsdal G, Curik I, et al. Estimates of autozygosity derived from runs of homozygosity: empirical evidence from selected cattle populations. J Anim Breed Genet. 2013;130:286–93.View ArticlePubMedGoogle Scholar
- Purfield DC, Berry DP, McParland S, Bradley DG. Runs of homozygosity and population history in cattle. BMC Genet. 2012;13:70.View ArticlePubMedPubMed CentralGoogle Scholar
- Leutenegger AL, Prum B, Génin E, Verny C, Lemainque A, Clerget-Darpoux F, et al. Estimation of the inbreeding coefficient through use of genomic data. Am J Hum Genet. 2003;73:516–23.View ArticlePubMedPubMed CentralGoogle Scholar
- Narasimhan V, Danecek P, Scally A, Xue Y, Tyler-Smith C, Durbin R. BCFtools/RoH: a hidden Markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics. 2016;32:1749–51.View ArticlePubMedPubMed CentralGoogle Scholar
- Vieira FG, Albrechtsen A, Nielsen R. Estimating IBD tracts from low coverage NGS data. Bioinformatics. 2016;32:2096–102.View ArticlePubMedGoogle Scholar
- Druet T, Gautier M. A model-based approach to characterize individual inbreeding at both global and local genomic scales. Mol Ecol. 2017;26:5820–41.View ArticlePubMedGoogle Scholar
- Charlier C, Li W, Harland C, Littlejohn M, Coppieters W, Creagh F, et al. NGS-based reverse genetic screen for common embryonic lethal mutations compromising fertility in livestock. Genome Res. 2016;26:1333–41.View ArticlePubMedPubMed CentralGoogle Scholar
- Sempéré G, Moazami-Goudarzi K, Eggen A, Laloë D, Gautier M, Flori L. WIDDE: a Web-Interfaced next generation database for genetic diversity exploration, with a first application in cattle. BMC Genom. 2015;16:940.View ArticleGoogle Scholar
- Druet T, Georges M. LINKPHASE3: an improved pedigree-based phasing algorithm robust to genotyping and map errors. Bioinformatics. 2015;31:1677–9.View ArticlePubMedGoogle Scholar
- Utsunomiya ATH, Santos DJA, Boison SA, Utsunomiya YT, Milanesi M, Bickhart DM, et al. Revealing misassembled segments in the bovine reference genome by high resolution linkage disequilibrium scan. BMC Genomics 2016;17:705.View ArticlePubMedPubMed CentralGoogle Scholar
- Faux P, Druet T. A strategy to improve phasing of whole-genome sequenced individuals through integration of familial information from dense genotype panels. Genet Sel Evol. 2017;49:46.View ArticlePubMedPubMed CentralGoogle Scholar
- Daetwyler HD, Capitan A, Pausch H, Stothard P, van Binsbergen R, Brøndum RF, et al. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nat Genet. 2014;46:858–65.View ArticlePubMedGoogle Scholar
- Kadri NK, Harland C, Faux P, Cambisano N, Karim L, Coppieters W, et al. Coding and noncoding variants in HFM1, MLH3, MSH4, MSH5, RNF212, and RNF212B affect recombination rate in cattle. Genome Res. 2016;26:1323–32.View ArticlePubMedPubMed CentralGoogle Scholar
- Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 1989;77:257–86.View ArticleGoogle Scholar
- Sargolzaei M, Iwaisaki H, Colleau JJ. A fast algorithm for computing inbreeding coefficients in large populations. J Anim Breed Genet. 2005;122:325–31.View ArticlePubMedGoogle Scholar
- Misztal I, Tsuruta S, Strabel T, Auvray B, Druet T, Lee DH. BLUPF90 and related programs (BGF90). In: Proceedings of 7th world congress on genetics applied to livestock production, 19–23 Aug 2002. Montpellier; 2002.Google Scholar
- VanRaden PM, Olson KM, Wiggans GR, Cole JB, Tooker ME. Genomic inbreeding and relationships among Holsteins, Jerseys, and Brown Swiss. J Dairy Sci. 2011;94:5673–82.View ArticlePubMedGoogle Scholar
- Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82.View ArticlePubMedPubMed CentralGoogle Scholar
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.View ArticlePubMedPubMed CentralGoogle Scholar
- Szmatoła T, Gurgul A, Ropka-Molik K, Jasielczuk I, Ząbek T, Bugno-Poniewierska M. Characteristics of runs of homozygosity in selected cattle breeds maintained in Poland. Livest Sci. 2016;188:72–80.View ArticleGoogle Scholar
- Amin N, van Duijn CM, Aulchenko YS. A genomic background based method for association analysis in related individuals. PLoS ONE. 2007;2:e1274.View ArticlePubMedPubMed CentralGoogle Scholar
- Boichard D. PEDIG: a fortran package for pedigree analysis suited for large populations. In: INRA, editor. Proceedings of 7th world congress on genetics applied to livestock production. 19–23 Aug 2002. Montpellier; 2002.Google Scholar
- Grobet L, Martin LJ, Poncelet D, Pirottin D, Brouwers B, Riquet J, et al. A deletion in the bovine myostatin gene causes the double-muscled phenotype in cattle. Nat Genet. 1997;17:71–4.View ArticlePubMedGoogle Scholar
- Druet T, Pérez-Pardal L, Charlier C. Gautier M. Identification of large selective sweeps associated with major genes in cattle. Anim Genet. 2013;44:758–62.View ArticlePubMedGoogle Scholar
- Druet T, Ahariz N, Cambisano N, Tamma N, Michaux C, Coppieters W, et al. Selection in action: dissecting the molecular underpinnings of the increasing muscle mass of Belgian Blue cattle. BMC Genom. 2014;15:796.View ArticleGoogle Scholar
- Peripolli E, Munari DP, Silva MVGB, Lima ALF, Irgang R, Baldi F. Runs of homozygosity: current knowledge and applications in livestock. Anim Genet. 2017;48:255–71.View ArticlePubMedGoogle Scholar
- Hinrichs D, Meuwissen THE, Ødegard J, Holt M, Vangen O, Woolliams JA. Analysis of inbreeding depression in the first litter size of mice in a long-term selection experiment with respect to the age of the inbreeding. Heredity (Edinburgh). 2007;99:81–8.View ArticleGoogle Scholar
- Szpiech ZA, Xu J, Pemberton TJ, Peng W, Zöllner S, Rosenberg NA, et al. Long runs of homozygosity are enriched for deleterious variation. Am J Hum Genet. 2013;93:90–102.View ArticlePubMedPubMed CentralGoogle Scholar
- Cruz F, Vilà C, Webster MT. The legacy of domestication: accumulation of deleterious mutations in the dog genome. Mol Biol Evol. 2008;25:2331–6.View ArticlePubMedGoogle Scholar
- Schubert M, Jónsson H, Chang D, Der Sarkissian C, Ermini L, Ginolhac A, et al. Prehistoric genomes reveal the genetic foundation and cost of horse domestication. Proc Natl Acad Sci USA. 2014;111:E5661–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Zhang Z, Guillaume F, Sartelet A, Charlier C, Georges M, Farnir F, et al. Ancestral haplotype-based association mapping with generalized linear mixed models accounting for stratification. Bioinformatics. 2012;28:2467–73.View ArticlePubMedGoogle Scholar
- Curik I, Ferenčaković M, Sölkner J. Inbreeding and runs of homozygosity: a possible solution to an old problem. Livest Sci. 2014;166:26–34.View ArticleGoogle Scholar
- Zhang Q, Calus MPL, Guldbrandtsen B, Lund MS, Sahana G. Estimation of inbreeding using pedigree, 50 k SNP chip genotypes and full sequence data in three cattle breeds. BMC Genet. 2015;16:88.View ArticlePubMedPubMed CentralGoogle Scholar
- Marras G, Gaspa G, Sorbolini S, Dimauro C, Ajmone-Marsan P, Valentini A, et al. Analysis of runs of homozygosity and their relationship with inbreeding in five cattle breeds farmed in Italy. Anim Genet. 2015;46:110–21.View ArticlePubMedGoogle Scholar
- Gurgul A, Szmatoła T, Topolski P, Jasielczuk I, Żukowski K, Bugno-Poniewierska M. The use of runs of homozygosity for estimation of recent inbreeding in Holstein cattle. J Appl Genet. 2016;57:527–30.View ArticlePubMedGoogle Scholar