Skip to main content

A comparison of marker-based estimators of inbreeding and inbreeding depression

Abstract

Background

The availability of genome-wide marker data allows estimation of inbreeding coefficients (F, the probability of identity-by-descent, IBD) and, in turn, estimation of the rate of inbreeding depression (ΔID). We investigated, by computer simulations, the accuracy of the most popular estimators of inbreeding based on molecular markers when computing F and ΔID in populations under random mating, equalization of parental contributions, and artificially selected populations. We assessed estimators described by Li and Horvitz (FLH1 and FLH2), VanRaden (FVR1 and FVR2), Yang and colleagues (FYA1 and FYA2), marker homozygosity (FHOM), runs of homozygosity (FROH) and estimates based on pedigree (FPED) in comparison with estimates obtained from IBD measures (FIBD).

Results

If the allele frequencies of a base population taken as a reference for the computation of inbreeding are known, all estimators based on marker allele frequencies are highly correlated with FIBD and provide accurate estimates of the mean ΔID. If base population allele frequencies are unknown and current frequencies are used in the estimations, the largest correlation with FIBD is generally obtained by FLH1 and the best estimator of ΔID is FYA2. The estimators FVR2 and FLH2 have the poorest performance in most scenarios. The assumption that base population allele frequencies are equal to 0.5 results in very biased estimates of the average inbreeding coefficient but they are highly correlated with FIBD and give relatively good estimates of ΔID. Estimates obtained directly from marker homozygosity (FHOM) substantially overestimated ΔID. Estimates based on runs of homozygosity (FROH) provide accurate estimates of inbreeding and ΔID. Finally, estimates based on pedigree (FPED) show a lower correlation with FIBD than molecular estimators but provide rather accurate estimates of ΔID. An analysis of data from a pig population supports the main findings of the simulations.

Conclusions

When base population allele frequencies are known, all marker-allele frequency-based estimators of inbreeding coefficients generally show a high correlation with FIBD and provide good estimates of ΔID. When base population allele frequencies are unknown, FLH1 is the marker frequency-based estimator that is most correlated with FIBD, and FYA2 provides the most accurate estimates of ΔID. Estimates from FROH are also very precise in most scenarios. The estimators FVR2 and FLH2 have the poorest performances.

Background

The characterization and management of inbreeding are important topics in conservation [1] and in population and quantitative genetics [2,3,4]. Proper estimation of individual inbreeding coefficients is necessary to estimate inbreeding depression, i.e., the change in the mean of a quantitative trait, which mainly results from the expression of recessive alleles with their effect being hidden in heterozygotes and exposed in homozygotes by inbreeding (see [2], Chapter 10). The inbreeding coefficient of an individual is defined as the correlation between homologous alleles [5] or the probability of identity-by-descent (IBD) of homologous alleles [6], and has been traditionally obtained from pedigree data [7]. However, inbreeding can also be obtained from genomic information (see e.g., [8]) and the development of high-density single nucleotide polymorphism (SNP) panels for an increasing number of species gives the opportunity to develop and implement methods that can accurately estimate the true realized inbreeding [9]. Genomic information also enables the levels of genetic variation across the genome to be evaluated [10,11,12] or the identification of genomic regions that are responsible for inbreeding depression [13, 14].

There are many metrics to estimate relatedness and inbreeding from marker data that are applied on a SNP-by-SNP basis and need the knowledge of marker allele frequencies [15,16,17,18,19,20,21,22,23,24]. Another useful marker-based measure to estimate inbreeding is the proportion of the autosomal genome that is composed of runs of homozygosity (ROH) [25], as well as other alternative analogous metrics [26]. ROH measures of inbreeding provide estimates of IBD and have been shown to be useful estimators of the inbreeding coefficient and as a basis for estimating inbreeding depression [9, 27, 28] but their performance depends on the criteria used to define a ROH [29, 30]. Marker frequency-based measures of inbreeding are expected to provide unbiased estimates of the average IBD measures of relatedness and inbreeding relative to an initial generation or base population that is assumed to consist of non-inbred and unrelated individuals, when the allele frequencies in that initial generation are known [19, 22, 31, 32]. However, allele frequencies in the base population are usually not available and inferring them is difficult [33], thus frequencies in the current generation are often used instead. In that case, marker frequency-based measures of inbreeding provide deviations from Hardy–Weinberg proportions or correlations of allele frequencies of homologous genes relative to the current generation of the population rather than estimates of IBD [22], and the estimates are confounded by the average coancestry of individuals with other individuals in the sample taken for analysis [34].

The different measures of molecular inbreeding have been evaluated in multiple studies, both based on empirical data [13, 14, 24, 34,35,36,37,38,39,40,41,42,43,44], and based on simulated data or theoretical analyses [9, 32, 34, 45,46,47]. Some empirical studies have used inbreeding estimators based on known [12] or estimated [35] allele frequencies from a predefined base population. In other studies, a frequency of 0.5 is assumed for all SNP alleles, which results in inbreeding estimates that have intermediate to high correlations with pedigree-based estimates or IBD [35, 37, 48]. Nevertheless, in most studies, allele frequencies in the current population are used to obtain different marker-based inbreeding estimates and these are compared with each other and with pedigree-based inbreeding estimates, with rather contrasting results. For example, the correlation of coefficients of inbreeding obtained from pedigrees with those obtained from markers is often intermediate ranging from 0.4 to 0.8 [13, 14, 39, 41, 42, 49], but can sometimes be low or even negative [39, 41, 44, 50, 51].

Previous simulation studies have investigated the accuracy of different molecular measures of inbreeding to estimate IBD. One of the most comprehensive studies is that from Wang [23], who showed that when the number of markers is large enough (around 10,000), marker-based estimates of inbreeding are more correlated with IBD than pedigree-based estimates. A later simulation study by Forutan et al. [48], who extended the analyses to artificially-selected populations, showed that marker-based inbreeding coefficients were close to the IBD values when the frequencies of the base population are known, as expected. However, when constant allele frequencies of 0.5 were used, marker-based coefficients overestimated IBD, although the correlation of marker-based estimates with IBD values was close to 1.

The accuracy of using molecular measures of inbreeding based on current allele frequencies to estimate the rate of inbreeding depression (ΔID) has also been addressed by different studies [9, 23, 32, 46, 47]. These studies have shown that marker-based measures of inbreeding and relatedness are more precise and more powerful to detect inbreeding depression than pedigree-based measures. In addition, simulation results have indicated that some estimators of molecular inbreeding, such as those obtained from the correlation of uniting gametes [20] and those from ROH, generally provide good estimates of ΔID but can give biased estimates in some cases. However, comprehensive studies are still needed to elucidate the performance of the use of different marker-based estimators to estimate ΔID under some scenarios with non-random contributions from parents to progeny. For example, in conservation programs it is common to equalize contributions from parents to offspring in order to maintain the highest genetic diversity [1]. In addition, many populations are subjected to artificial selection, where contributions from parents to progeny can be far from random.

The objective of this study was to evaluate how different estimators of molecular inbreeding perform as measures of IBD and how precise they are when estimating ΔID, with or without knowledge of allele frequencies in the base population of reference. We focus on the most popular marker-based estimators that make use of SNP allele frequencies or ROH [42,43,44, 51,52,53,54,55,56,57,58] and that are readily obtained by commonly used software such as PLINK [59, 60] and GCTA [36]. We carried out computer simulations that assumed a model of selection against deleterious mutations, with random or equal contributions from parents to progeny, or artificial selection for a neutral quantitative trait. We also analysed a pedigreed population of Iberian pigs where equalization of family sizes has been intended over 23 generations, and compared values of inbreeding and inbreeding depression obtained for the different measures of inbreeding based on a simulation of this population with their observed empirical values.

Methods

Simulation of populations under different breeding program designs

Large initial population

We used the software SLiM 3 [61] to simulate the genome of a large population of a sexually reproducing diploid species with random mating and discrete generations. As a model, we assumed the genomic characteristics of the pig genome [62], as some of the simulation results are compared with those obtained from empirical data of a strain of Iberian pigs (the Guadyerbas strain). A population with a constant size of N = 500 individuals was simulated for 5000 generations, which is the approximate effective population size estimated for the ancestral population of the Guadyerbas strain [63]. We assumed a total genome size of 2250 Mb, with 18 chromosomes of 125 Mb each. A summary of the main parameters used in the simulations is in Table 1. A recombination rate of 8 × 10–9 was assumed, with crossover frequencies following a Poisson distribution with a uniform probability across the genome and without interference. This implies a genome length of about 1 Morgan per chromosome and a recombination rate of 0.8 cM/Mb, which is close to that found in the pig genome [62]. The mutation rate per nucleotide was assumed to be 4 × 10–9 in order to obtain a number of SNPs close to that available for the pig population, but a model with a higher mutation rate (9 × 10–9) was also considered to investigate a higher density of SNPs.

Table 1 Simulation parameters for the default model, a model with a higher density of SNPs, and an alternative model of deleterious mutations with a lower mean effect than for the default model

In order to evaluate ΔID for fitness, it was assumed that 5% of the mutations were deleterious. The fitness of the wild type genotype, the heterozygote, and the mutant homozygote were 1, 1 − \(sh\), and 1 − \(s\), respectively, where the homozygous selection coefficient \(s\) was obtained from an exponential distribution with mean \(\overline{s }\) = 0.1, and the dominance coefficient \(h\) was obtained from a uniform distribution between 0 and \({e}^{(-ks)}\), where \(k\) was set to obtain an average value of \(\overline{h }\) = 0.2 [4, 64]. Individual genotypic fitness values were obtained multiplicatively across loci. We also assumed an alternative model with a lower mean effect of deleterious mutations (\(\overline{s }\) = 0.025) and a higher mutation rate (Table 1). The distribution of mutation effects and dominance coefficients assumed in the two models are shown in Additional file 1: Fig. S1.

Small populations with random contributions, equalization of parents´ contributions, or artificial selection

An in-house C program was used to simulate small populations of size \(N\) = 20 or 100 breeding individuals over 10 (for \(N\) = 20) or 50 (for \(N\) = 100) discrete generations, such that the average inbreeding coefficient (\(F\)) in the last generation was about the same for both population sizes. Populations of size \(N\) = 20 and \(N\) = 100 were also simulated for 20 and 100 generations, respectively, in order to approximately double the average inbreeding coefficient in the last generation. To start these populations, individuals were randomly sampled from the large initial population of 500 individuals that was simulated as described previously. The first generation (\(t\) = 0) was considered to be the base population that all inbreeding coefficients refer to. In one set of simulations, generation \(t\) = 10 was considered the base population. The genome length, the number of chromosomes, the recombination rate between nucleotides, and the fitness effects were as described previously. It was assumed that the same SNP panel was used to compute measures of inbreeding in the initial and final generations. Thus, only SNPs that segregated at generation t = 0 were considered in the analyses and the mutation rate was set to zero for the subsequent generations. In order to estimate IBD probabilities, about 4500 multiallelic loci uniformly distributed across the genome were also simulated, with unique alleles assigned to each individual in generation \(t\) = 0.

For the random contributions (RC) and equalization of parents´ contributions (EC) scenarios, a number \(N\) of individuals were randomly taken from the large initial population to create each replicated line. For the RC scheme, parents were randomly mated to generate \(N\) progeny, allowing polygamy but avoiding self-fertilization. For the EC scheme, the same procedure was followed, except that, in each generation, the \(N\) individuals were randomly paired in couples and two offspring were obtained from each couple. For the artificial selection scheme (SEL), \(T\) individuals (100 or 500) were taken at random from the large initial population to generate the base population of each replicate. From these individuals, \(N\) (20 or 100, respectively) were artificially selected based on their own phenotype for a quantitative trait (described below), implying a selected proportion of 20%. Two thousand replicates were carried out for each simulated scenario.

The genetic basis of the quantitative trait was established as described in the following. From the SNPs of the large initial population, a random 5% were chosen as quantitative trait loci (QTL) and an effect on the quantitative trait was assigned to one of the two alleles. Effects were taken from a normal distribution with mean zero and standard deviation 0.1, assuming additive gene action. Additivity was also assumed between loci, and thus genotypic values were obtained as the sum of the effects of all QTL. The phenotypic value was obtained by adding an environmental deviation to the genotypic value, which was taken from a normal distribution with mean 0 and environmental variance 2, a value chosen such that it generated an initial heritability for the trait of around 0.5. The quantitative trait was assumed to be not related to fitness.

Pedigrees were recorded over generations to obtain the pedigree inbreeding coefficient of each individual in the last generation (\(t\) = 10, 20, 50 or 100) to obtain an estimate of \({F}_{PED}\). The identity-by-descent inbreeding coefficient (\({F}_{IBD}\)) in the last generation was computed using the multiallelic loci. The genotypes of the SNPs (about 25,000–40,000 in the default model and more than twice as many in the higher density model; Table 1) were used to obtain the molecular inbreeding coefficients for each individual in the last generation. Loci that affected the selected quantitative trait or fitness were excluded when computing any measure of inbreeding. Marker allele-frequency based measures of \(F\) (described below) were obtained using the known allele frequencies of the initial generation (\(t\) = 0, considered the base population of reference), those of the last (current) generation (\(t\) = 10 or 20 for \(N\) = 20 and \(t\) = 50 or 100 for \(N\) = 100), or set to a constant value of 0.5. Runs of homozygosity were also obtained for individuals in the last generation. The mean and variance of the \(F\) values of individuals in the last generation were obtained for each estimator of inbreeding. Pairwise Pearson’s correlations were also obtained between all inbreeding measures. Average estimates of inbreeding from the different molecular estimators were compared with the inbreeding coefficient obtained based on IBD values (\({F}_{IBD}\)) in the last generation.

Estimates of the rate of inbreeding depression (ΔID) were obtained for fitness as the regression coefficient of the logarithm of individual fitnesses in the last generation on the corresponding estimated inbreeding coefficients (see [2], p. 262). Additional file 2: Table S1 and Additional file 3: Fig. S2 illustrate the removal of deleterious mutations by genetic purging and drift, the reduction in the expected inbreeding depression in consecutive generations, and the change in additive and dominance variance for fitness for one of the simulated scenarios.

Power to detect inbreeding depression was obtained following Wang [23], by counting the percentage of replicates for which the value of ΔID was significantly different from zero with a probability P < 0.05. A bootstrap method based on 1000 resampling sets with replacement of the 2000 pairs of simulated estimates was used to compare the estimates of ΔID obtained from \({F}_{IBD}\) with those from the different molecular marker estimators. To confirm the bootstrap results, a paired sample t-test was also carried out using R Commander [65].

Simulation of the Iberian pig population

The Guadyerbas population consists of a pedigreed population with 1206 individuals born across 23 cohorts, from which 219 (those born in cohorts 17–23) were genotyped with the Illumina Porcine SNP60 Bead Chip, which contains 35,519 SNPs that segregate in the Iberian breed [66]. Phenotypes for litter size were available for 832 females (2712 litters, with a mean litter size of 7.4 piglets and standard deviation of 2.3) that were born across all cohorts (i.e., from 0 to 23), so that ΔID could be calculated by including \({F}_{PED}\) as a covariate in an animal model. The detailed statistical model used is given by Saura et al. [14]. Estimates of ΔID from molecular measures of \(F\) were also obtained for 103 females (250 litters, with mean litter size of 7.1 piglets and standard deviation of 2.4) that were genotyped in cohorts 17–23 [14].

We carried out a gene dropping simulation of the whole known pedigree and ascribed to the individuals of the initial cohort the simulated genotypic data from the large initial population of 500 individuals, as described before. The pedigree was run 2000 times, starting from random samples from the large initial population to generate the genotypic values for the individuals of the initial cohort (cohort 0). The length of the genome, the number of chromosomes and the recombination rate assumed between nucleotides were the same as those detailed in the previous sections. Simulated offspring were obtained for each mating following the pedigree in the absence of selection and mutation. The different measures of \(F\) were obtained for all individuals across cohorts. The mean and variance of the different estimators of inbreeding and the correlations between them were calculated for the last cohort (23), either assuming the allele frequencies of cohort 17 (considered the base population to which \(F\) estimates refer to), assuming the frequencies of the current generation (cohort 23), or assuming frequencies equal to 0.5. Finally, using the available data for litter size from 103 females (born in cohorts 17–23), ΔID was obtained for each molecular \(F\) estimator using the simulated genotypes. The simulation results were compared with the empirical estimates.

Molecular measures of the inbreeding coefficient

The following estimators of the inbreeding coefficient were calculated based on the SNP data:

$${F}_{VR1}=\frac{{\sum }_{k=1}^{S}{\left({x}_{k}-2{p}_{k}\right)}^{2}}{{\sum }_{k=1}^{S}2{p}_{k}\left(1-{p}_{k}\right)}-1,$$

where \(S\) is the total number of markers, \({x}_{k}\) is the number of minor alleles of marker \(k\) (i.e., 0, 1 or 2 copies), and \({p}_{k}\) is the frequency (initial, current or 0.5) of the minor allele. This estimator was proposed by VanRaden [19] and it is based on the variance of additive genetic values.

$${F}_{VR2}=\frac{1}{S}\sum_{k=1}^{S}\left(\frac{{\left({x}_{k}-2{p}_{k}\right)}^{2}}{2{p}_{k}\left(1-{p}_{k}\right)}-1\right),$$

a metric proposed by Amin et al. [67], as cited by VanRaden [19]. It only differs from \({F}_{VR1}\) in that the weighting by the variance of allele frequencies is done for each SNP rather than by the summation of variances for all SNPs, so that it gives a larger weight to rare alleles.

$${F}_{YA2}=\frac{1}{S}\sum_{k=1}^{S}\frac{{x}_{k}^{2}-\left(1+2{p}_{k}\right){x}_{k}+2{p}_{k}^{2}}{2{p}_{k}\left(1-{p}_{k}\right)},$$

which is a coefficient based on the correlation between uniting gametes [20], such that homozygous genotypes are weighted by the inverse of their allele frequencies, and it is expected to show lower sampling variance than other estimators [36].

$${F}_{YA1}=\frac{{\sum }_{k=1}^{S}{x}_{k}^{2}-\left(1+2{p}_{k}\right){x}_{k}+2{p}_{k}^{2}}{{\sum }_{k=1}^{S}2{p}_{k}\left(1-{p}_{k}\right)},$$

an analogous estimator to that of Yang et al. [20] (\({F}_{YA2}\)) but considering the summation of terms separately in the numerator and denominator, as recommended by Zhang et al. [34].

$${F}_{LH1}=1-\frac{{\sum }_{k=1}^{S}{x}_{k}\left(2-{x}_{k}\right)}{{\sum }_{k=1}^{S}2{p}_{k}\left(1-{p}_{k}\right)},$$

the metric that was initially proposed by Li and Horvitz [16], which gives the deviation of the observed frequency of homozygotes from the expected values under Hardy–Weinberg equilibrium when current allele frequencies are considered.

$${F}_{LH2}=1-\frac{1}{S}\sum_{k=1}^{S}\left(\frac{{x}_{k}\left(2-{x}_{k}\right)}{2{p}_{k}\left(1-{p}_{k}\right)}\right),$$

for which the different weighting in relation to FLH1 is analogous to that between \({F}_{VR2}\) and \({F}_{VR1}\), and between \({F}_{YA2}\) and \({F}_{YA1}\).

$${F}_{q05},$$

which is obtained by setting \({p}_{k}\) = 0.5 for any of the previous estimators.

$${F}_{HOM}=1-\frac{1}{S}{\sum }_{k=1}^{S}{x}_{k}\left(2-{x}_{k}\right),$$

which is the average number of homozygous SNPs, which has a correlation of 1 with \({F}_{LH1}\) and \({F}_{q05}\).

The above estimators may receive different names in different studies and a summary of this nomenclature is in Table 1 of Villanueva et al. [12]. Here, we used a terminology, as simple as possible, by identifying the abbreviation for the authors who originally proposed the estimators (VR, YA, LH) plus a suffix 1 to indicate that the measure is obtained from a ratio of averages over loci (VR1, YA1, LH1) and a suffix 2 when the same estimator is obtained from the average of the ratios (VR2, YA2, LH2). This may simplify the existing “Babel tower” [44] of nomenclature.

The estimators are based either on genetic drift derivations (\({F}_{VR1-2}\) and \({F}_{YA1-2}\)) or on homozygosity derivations (\({F}_{LH1-2}\) and \({F}_{HOM}\)) and the equivalence between both approaches was already demonstrated by Cockerham [68]. A summary of the relationships between the estimators is given in the Supplementary Appendix of Caballero et al. [32], and a derivation of \({F}_{YA2}\) from the correlation between uniting gametes is given in the Appendix of this paper.

The measures \({F}_{VR2}\), \({F}_{YA2}\), \({F}_{LH1}\) and \({F}_{LH2}\) can readily be obtained by the software PLINK and GCTA (\({F}^{I}\), \({F}^{III}\), \(F\) and \({F}^{II}\), respectively), while \({F}_{VR1}\) can be obtained from the diagonal of the genomic relationship matrix obtained by GCTA with the option-make-grm-alg 1.

Note that, for these molecular measures to provide estimates of \({F}_{IBD}\) with reference to a given base population, the calculations must be done using the allele frequencies (\({p}_{k}\)) of all SNPs that segregate in the base population, rather than of only SNPs that segregate in the generation for which the estimate of inbreeding is computed. Estimates \({F}_{HOM}\) and the coefficient obtained assuming frequencies equal to 0.5 (\({F}_{q05}\)) were obtained using only SNPs that are segregating in the current generation.

Finally, estimates of the inbreeding coefficient based on ROH (\({F}_{ROH}\)) were obtained as:

$${F}_{ROH}= \frac{\sum {L}_{ROH}}{L},$$

where \(\sum {L}_{ROH}\) is the sum of the lengths of all ROH across the genome of the individual, and \(L\) is the genome length [25]. We used the software PLINK [59, 60] with default options, i.e., a minimum of 100 SNPs per ROH (30 SNPs in the simulated pig population to match the conditions assumed by Saura et al. [14]), at least 1 SNP per 50 kb (100 kb in the simulated pig population), a scanning window of 50 SNPs, and two SNPs in the same ROH no more than 1000 kb apart. We considered ROH lengths longer than 1 Mb (\({F}_{ROH-1}\)) or 5 Mb (\({F}_{ROH-5}\)).

Results

Comparison of estimators of inbreeding and inbreeding depression

Default model

Figure 1 shows the comparison of the different estimators of the inbreeding coefficient and \({F}_{IBD}\) values for a population size of \(N\) = 20 individuals run for \(t\) = 10 generations under the three scenarios: no selection with random contributions (RC), equalization of contributions (EC), and artificial selection (SEL). The two upper rows of the graphs show the averages and variances of the different measures of inbreeding, as well as estimates based on ROH. Note that when frequencies equal to 0.5 are used, all marker-based estimators are equivalent and a single purple bar is represented for all of them. Most of the patterns were very similar for the three schemes, with only some small differences between them.

Fig. 1
figure 1

Estimates of inbreeding based on alternative measures, correlation among them, and rate of inbreeding depression for fitness obtained for a population of N = 20 breeding individuals maintained for 10 discrete generations assuming random mating and random contributions from parents to progeny (RC), equalization of contributions from parents to progeny (EC), and artificial selection for a neutral quantitative trait (SEL). Mean (F) and variance (VF) of inbreeding coefficients at generation 10, correlation between estimated inbreeding coefficients and those obtained from IBD measures (r), and mean values of the rate of inbreeding depression (ΔID). Bars refer to true IBD values (FIBD; horizontal red line), and estimated from pedigree records (FPED) and from different marker frequency-based measures (FVR1, FVR2, FYA1, FYA2, FLH1, FLH2, FHOM; see text for definitions) assuming the frequencies of the base generation (blue bars), those of the current generation (yellow bars) or a constant frequency of 0.5 (Fq05; purple bars). Estimates from runs of homozygosity are shown for fragments larger than 1 Mb (FROH-1) or 5 Mb (FROH-5). Only subscripts of estimators are shown for the sake of clarity

The average \({F}_{IBD}\) under EC was about one half of that under random contributions, and that under artificial selection was slightly larger than that without selection, as expected (Fig. 1, first row). The average \({F}_{IBD}\) was well estimated by \({F}_{PED}\), and also by the five marker-based inbreeding coefficients when allele frequencies of the base population are known (blue bars), with a slight underestimation due to the excess of heterozygotes expected for markers in finite populations. When allele frequencies in the current generation were used instead (yellow bars), all estimators gave the expected average deviation from Hardy–Weinberg proportions (values slightly below zero, expected to be around − 1/2N). Estimates from direct average homozygosity of SNPs (\({F}_{HOM}\)) gave large overestimations of \({F}_{IBD}\), as expected. The average values of \({F}_{ROH}\) were slight overestimations of \({F}_{IBD}\), particularly when including shorter fragments (\({F}_{ROH-1}\)). Using allele frequencies equal to 0.5 when computing the marker-based estimators (\({F}_{q05}\), purple bars) gave also substantial overestimations of \({F}_{IBD}\).

The variances between individual estimates of inbreeding were substantially smaller for \({F}_{PED}\), \({F}_{HOM}\) and \({F}_{q05}\) than for \({F}_{IBD}\) (Fig. 1, second row, VF). When the initial allele frequencies were known (blue bars), variances of molecular estimators of inbreeding were very similar to the variance of \({F}_{IBD}\), except that for \({F}_{VR2}\) (particularly for the SEL scheme), which had the largest variance. When allele frequencies of the current generation were used (yellow bars), the variances for molecular estimators of inbreeding increased substantially, although not so much in the EC scenario. The estimator that had the closest variance to that of \({F}_{IBD}\) was \({F}_{YA2}\). The variances of \({F}_{ROH}\) were also close to those of \({F}_{IBD}\).

The third row of the graphs in Fig. 1 shows the correlation between the estimated and true \({F}_{IBD}\) values. When the initial allele frequencies were known (blue bars), all molecular estimators had a very high correlation with \({F}_{IBD}\), except for \({F}_{VR2}\). The \({F}_{ROH}\) estimators also had a very high correlation with \({F}_{IBD}\), while the \({F}_{PED}\) estimator had the lowest correlation with \({F}_{IBD}\). When using allele frequencies in the current generation (yellow bars), \({F}_{LH1}\), \({F}_{HOM},\) and \({F}_{q05}\) (which have a correlation of 1 among them) generally showed the highest correlations with \({F}_{IBD}\).

The fourth row of the graphs in Fig. 1 shows estimates of the inbreeding depression rate (ΔID) compared to the estimate based on \({F}_{IBD}\) (horizontal dotted line). When base population frequencies were known (blue bars), all measures of inbreeding provided estimates of inbreeding depression that were close to those obtained with \({F}_{IBD}\), with estimates based on \({F}_{VR2}\) and \({F}_{LH2}\) slightly biased downwards, particularly in the SEL scenario. Estimates of inbreeding depression based on \({F}_{PED}\) and \({F}_{q05}\) were also rather accurate but \({F}_{q05}\) provided a substantial overestimation of inbreeding depression under the EC scenario. When frequencies in the current generation were used (yellow bars), \({F}_{YA2}\) gave the most accurate estimates of inbreeding depression for the three scenarios, while \({F}_{HOM}\) resulted in very large overestimations and \({F}_{VR2}\) and \({F}_{LH2}\) in the largest underestimations. Estimates of ΔID based on \({F}_{ROH}\) were very close to those from \({F}_{IBD}\). Statistical tests to compare the mean difference between the estimates of ΔID based on \({F}_{IBD}\) with those based on the different marker-based estimators are shown in Additional file 4: Table S2. All differences were significantly greater than 0 except for estimates based on \({F}_{ROH-1}\) in the RC and EC scenarios, based on \({F}_{PED}\) in the EC scenario, and based on \({F}_{YA2}\) when using the current frequencies in the EC scenario.

Additional file 5: Fig. S3 shows results analogous to those in Fig. 1 but after 20 instead of 10 generations, so that the average inbreeding coefficient was doubled, while Additional file 6: Fig. S4 shows results for \(N\) = 100 run after 50 generations. In general, the results in these two figures were similar to those presented in Fig. 1, although there were some relevant differences. With \(N\) = 20 after 20 generations [see Additional file 5: Fig. S3], using base population allele frequencies equal to 0.5 (purple bars) resulted in underestimation of average inbreeding and ΔID for the RC and SEL schemes. With \(N\) = 100 after 50 generations [see Additional file 6: Fig. S4], the increase of \(F\) by selection (scheme SEL) was more evident than in the previous cases (with some overestimation of the molecular estimates), and the estimates of \({F}_{PED}\) were more clearly biased downwards.

Additional file 7: Table S3 gives the minimum and maximum values of individual inbreeding coefficients for the simulations presented in Fig. 1 and in Additional file 5: Fig. S3 and Additional file 6: Fig. S4. As expected, \({F}_{IBD}\), \({F}_{PED}\), \({F}_{HOM}\) and \({F}_{ROH}\), which are probabilities, ranged from 0 to 1, while \({F}_{VR1-2}\), \({F}_{YA1-2}\) and \({F}_{LH1-2}\), which are correlations or deviations from Hardy–Weinberg proportions, ranged from – 1 to 1. However, some values of \({F}_{VR2}\) were greater than 1 and some values of \({F}_{LH2}\) were smaller than – 1.

The power of the different measures of inbreeding to detect inbreeding depression is shown in Additional file 8: Fig. S5 for the scenarios considered in Fig. 1 and in Additional file 5: Fig. S3 and Additional file 6: Fig. S4. When allele frequencies of the base population were known, all estimators (as well as \({F}_{ROH}\) and \({F}_{q05}\)) resulted in approximately the same power as \({F}_{IBD}\), although some differences were observed between estimators in the case with \(N\) = 100 [see Additional file 8: Fig. S5c]. When allele frequencies in the base population were unknown, the most powerful estimator was generally \({F}_{YA2}\) (or \({F}_{YA1}\), with a slightly higher power in some occasions). The \({F}_{VR2}\) and \({F}_{LH2}\) estimators generally showed the lowest power. The power of \({F}_{PED}\) to detect inbreeding depression was usually lower than that of marker-based estimators.

Correlations between individual values for all estimators of inbreeding are shown in Additional file 9: Fig. S6. When allele frequencies in the base population were known [see Additional file 9: Fig. S6a–c], pairwise correlations between marker-based estimators were generally very large (≥ 0.9), with the correlation between \({F}_{VR2}\) and \({F}_{LH1-2}\) being somewhat lower. Correlations of \({F}_{PED}\) with the marker-based estimators were lower than correlations among the latter estimators. When the allele frequencies in the current generation were used, correlations were generally lower than when frequencies in the base population were used but still moderately high [see Additional file 9: Fig. S6d–f]. Some correlations with \({F}_{VR2}\) or \({F}_{LH2}\) were close to zero or negative. The \({F}_{ROH}\) estimator was highly correlated with the other estimators except with \({F}_{VR2}\), particularly in the SEL scenario.

Alternative models

In order to assess the robustness of the previous results, we also ran scenario RC under alternative conditions. Additional file 10: Fig. S7a shows results for the scenario with \(N\) = 20 after 10 generations but with double the density of SNPs than in Fig. 1 (see Table 1). Additional file 10: Fig. S7b shows results for the same scenario except that the effect of deleterious mutations was 1/4 of that used for the default model. In addition, Additional file 10: Fig. S7c presents results for \(N\) = 100, as in Additional file 6: Fig. S4, but after 100 generations instead of 50. In general, results for these alternative models were very similar to those of the default model, confirming the main results. Finally, Additional file 10: Fig. S7d shows that results were similar to those of Additional file 5: Fig. S3 with \(N\) = 20 after 20 generations but with the base population set up at generation 10 rather than generation 0. The only relevant difference was that \({F}_{ROH}\) overestimated \({F}_{IBD}\) and ΔID, which is expected because \({F}_{IBD}\) is relative to generation 10 whereas \({F}_{ROH}\) provides values of inbreeding relative to more ancestral generations.

Because the general patterns observed for the different scenarios (RC, EC, SEL, and different N values) were rather similar, Fig. 2 shows a summary of the main global findings by pooling the results for all breeding scenarios and simulations included in Fig. 1, Additional file 5: Fig. S3, Additional file 6: Fig. S4, and Additional file 10: Fig. S7a–c. Average correlations of the different estimates of the inbreeding coefficient with \({F}_{IBD}\) are presented as one minus the correlation (in percentage), such that lower values imply a greater accuracy of estimating \({F}_{IBD}\) (Fig. 2a). When the allele frequencies of the base population were known (blue bars), all molecular estimators of inbreeding showed a good accuracy for estimating \({F}_{IBD}\), although \({F}_{VR2}\) was slightly less accurate. When allele frequencies in the current generation were used (yellow bars), \({F}_{LH1}\) (or \({F}_{HOM}\)) gave the most accurate estimation of \({F}_{IBD}\). Good estimates of \({F}_{IBD}\) were also obtained when the frequencies of the base population were assumed to be 0.5 (purple bar) or when \({F}_{ROH}\) estimates were considered (brown bars).

Fig. 2
figure 2

a Correlation (r), expressed as 1–r (in percentage), of the different estimates of inbreeding with true IBD inbreeding, and b Average difference between estimates of the rate of inbreeding depression (ΔID) and those obtained from true IBD inbreeding. All results in the study from Fig. 1, Additional file 5: Fig. S3, Additional file 6: Fig. S4 and Additional file 10: Fig. S7a–c were averaged (intervals denote 1 standard deviation). The estimates refer to pedigree records (FPED; green bars) and to different marker frequency-based measures (FVR1, FVR2, FYA1, FYA2, FLH1, FLH2, Fq05; see text for definitions), assuming the frequencies of the base generation (blue bars), those of the current generation (yellow bars), or a constant frequency of 0.5 (purple bars). Estimates from runs of homozygosity are shown for fragments larger than 1 Mb (FROH-1) or 5 Mb (FROH-5). Only subscripts of estimators are shown for a better view

Figure 2b gives the average difference between the estimate of ΔID based on the different estimators compared to the estimate based on \({F}_{IBD}\) for all scenarios. Positive values imply underestimation of ΔID and negative values imply overestimation. Both \({F}_{PED}\) and all molecular estimators, except \({F}_{VR2}\) and \({F}_{LH2}\), gave relatively good average estimates of ΔID when allele frequencies in the base population were known or when a frequency of 0.5 was assumed. If allele frequencies in the base population were not known, the best estimator of ΔID was \({F}_{YA2}\), although \({F}_{ROH}\) also provided little biased estimates of ΔID. The \({F}_{HOM}\) estimator showed large overestimations of ΔID.

Comparison between simulated and empirical results in an Iberian pig population

Figure 3 presents simulated average estimates of \(F\), their variance, and correlations between different measures of \(F\) using individuals in the last generation (cohort 23). Estimates were obtained assuming allele frequencies from cohort 17, the current generation (cohort 23), or equal frequencies of 0.5. The figure also shows simulated values of ΔID (Fig. 3e, f). The simulation results generally agreed with those obtained for the discrete generation simulations, although results here referred to a single simulated pedigree (the empirical one). The empirical results were usually within the 95% of the distribution of simulation results across replicates, concurring with some of their major findings. For example, estimates of the average inbreeding coefficient based on \({F}_{VR2}\) and \({F}_{LH2}\) when the allele frequencies in the base population were known (\(F\), yellow diamonds) had the largest differences with \({F}_{IBD}\) and the largest variances in \(F\) values (Fig. 3a, c). In addition, estimates from these estimators were highly negatively correlated (Fig. 3g), as predicted by the simulations. When base population frequencies were assumed to be equal to 0.5 (\({F}_{q05}\)), substantially overestimated ΔID based on \({F}_{IBD}\) (Fig. 3e) were obtained, in agreement with the overestimations found in the scenario EC in the simulations (Fig. 1, Additional file 5: Fig. S3 and Additional file 6: Fig. S4).

Fig. 3
figure 3

Simulated (bars) and empirical (diamonds) estimates of inbreeding for the Guadyerbas pig population. Mean (F) and variance (VF) of simulated inbreeding coefficients in cohort 23, obtained from true IBD values (FIBD), and estimated from pedigree records (FPED) and from markers (FVR1, FVR2, FYA1, FYA2, FLH1, FLH2, FROH; see text for definitions) assuming the frequencies of the base generation (cohort 17 in a and c and cohort 0 in b and d; blue bars and yellow diamonds), those of the current generation (cohort 23; yellow bars and grey diamonds) or a constant frequency of 0.5 (purple bars). e and f Estimates of the inbreeding depression rate (ΔID) for litter size from 103 females born in cohorts 17–23. The estimate from FPED was obtained from data of 832 females born in all cohorts (0–23). g and h Correlations among simulated inbreeding coefficients in cohort 23. The intervals shown denote 95% of the variation across simulated replicates in order to see if the observed values are within the expected simulated values. Only subscripts of estimators are shown for a better view

Because ROH estimates of inbreeding consider more ancient inbreeding (i.e., inbreeding relative to a more ancient generation) than that of cohort 17, Fig. 3b, d, f, h shows simulation and observed results assuming that the base population is the initial cohort 0. The average inbreeding coefficient based on \({F}_{ROH-5}\) was very close to that based on \({F}_{IBD}\), while that based on \({F}_{ROH-1}\) was higher, as it may also consider more ancient inbreeding. This is well illustrated in Fig. 4, which shows the simulated and observed average inbreeding coefficients based on \({F}_{IBD}\), \({F}_{PED}\), \({F}_{ROH-1}\), and \({F}_{ROH-5}\) across generations. Estimates of inbreeding based on \({F}_{ROH-5}\) were very close to \({F}_{IBD}\) and \({F}_{PED}\) values. For the simulations, the average number of ROH per individual considering all individuals between cohorts 17 and 23 was about 86, with an average length of 10 Mb for ROH > 1 Mb, and about 60 with an average length of 13 Mb for ROH > 5 Mb. These values are similar to those found in the real data: an average of 77.7 ROH per individual with a mean length of 9.99 Mb for ROH > 1 Mb, and 47.9 ROH with a mean length of 14.04 Mb for ROH > 5 Mb. There was, however, some discrepancy between simulated and empirical estimates of ΔID based on \({F}_{ROH}\), with the empirical estimates of the latter being higher (Fig. 4f).

Fig. 4
figure 4

Change in the average simulated and observed inbreeding coefficient across generations for the Guadyerbas pig population. Lines refer to average simulated IBD values (FIBD), estimated from pedigree records (FPED) and from simulated (SIM) and observed (OBS; circles) runs of homozygosity considering fragments larger than 1 Mb (FROH-1) or 5 Mb (FROH-5). Only subscripts of estimators are shown for clarity

Discussion

Proper estimation of the inbreeding coefficient of individuals is necessary to estimate inbreeding depression and effective population size in domestic and wild populations. The possibility of obtaining data from large numbers of SNPs for an increasing number of species allows the estimation of genomic inbreeding in the absence of pedigree records or as a complement to them. Here, we carried out computer simulations to compare some of the most popular marker-based estimators of \(F\) that are readily obtained from widely-used software such as PLINK and GCTA under different scenarios, including equalization of parental contributions, and artificial and natural selection. Our results show that, all estimators generally had high correlations with \({F}_{IBD}\) and precise estimates of ΔID when the allele frequencies in the base population are known, with \({F}_{VR2}\) and \({F}_{LH2}\) having the lowest performance (Fig. 2). When allele frequencies in the base population are not known, FLH1 generally had the largest correlation with \({F}_{IBD}\), and \({F}_{YA2}\) generally gave the most accurate estimates of ΔID. The \({F}_{VR2}\) and \({F}_{LH2}\) estimators had very low correlations with \({F}_{IBD}\) and low accuracies to estimate ΔID in most scenarios. The assumption of allele frequencies of 0.5 in the calculation of the estimators (\({F}_{q05}\)) resulted in high correlations with \({F}_{IBD}\) and relatively good estimates of ΔID, except in the EC scenario, for which a substantial overestimation was found. Estimates of ΔID obtained with simple homozygosity measures (\({F}_{HOM}\)) resulted in large overestimations of ΔID, although \({F}_{HOM}\) has a correlation of 1 with \({F}_{LH1}\); however, its variance is much lower than that of \({F}_{IBD}\) (Fig. 1), resulting in estimates of ΔID based on \({F}_{HOM}\) to be biased upwards. This agrees with previous studies that showed that estimates of ΔID based on homozygosity measures of \(F\) are generally larger than those based on other estimators of \(F\) [37, 49]. Estimates of inbreeding based on ROH gave precise estimates of \({F}_{IBD}\) and ΔID, as also shown by previous studies [9, 27,28,29,30, 47]. Finally, estimates of \(F\) based on pedigrees (\({F}_{PED}\)) were less correlated with \({F}_{IBD}\) than molecular estimates, but provided estimates of ΔID with little bias. Our results for the pedigreed pig population showed generally good agreement between simulated and empirical results, supporting the above findings.

Estimation of the inbreeding coefficient

When allele frequencies for a base population of reference that consists of noninbred and unrelated individuals are known, all marker frequency-based estimators are expected to give almost unbiased estimates of \({F}_{IBD}\) [19, 22, 31, 32], as observed in our simulation results (Fig. 1). The slightly lower average inbreeding coefficients based on molecular estimates compared to \({F}_{IBD}\) (Fig. 1) is explained by the negative excess of heterozygotes expected for finite population sizes, which amounts approximately to − 1/2N [69, 70]. Note that, as the use of genomics becomes more common, it will be more frequent to have several previous generations genotyped and, therefore, to have information available for a base initial population of reference. Allele frequencies for a base population can also be estimated. Gengler et al. [33] and, more recently, Aldrige et al. [71] have proposed practical methods to calculate gene content using a linear regression. However, these methods are not feasible for deep pedigrees, and it is standard practice to use allele frequencies for the current genotyped population because of the ease of computation. Even if base population frequencies are available, the correlation with \({F}_{IBD}\) is not the same for all estimators. The \({F}_{LH1}\), \({F}_{VR1},\) and \({F}_{YA1-2}\) estimators provided accurate predictions, but \({F}_{VR2}\) and \({F}_{LH2}\) did not. The latter (called \({F}^{II}\) in the software PLINK and GCTA) is almost never used in practice. However, \({F}_{VR2}\) (\({F}^{I}\) in PLINK and GCTA) has been considered on many occasions [12, 41, 43, 44, 50, 58]. Our results argue against the use of both these estimators, as they showed the lowest correlations with \({F}_{IBD}\) and the most biased estimates of ΔID in the majority of scenarios, as summarized in Fig. 2.

When allele frequencies in the base population are not known, and those of the current population are used instead, which is the most frequent situation, the most accurate marker-frequency-based estimator of \({F}_{IBD}\) was \({F}_{LH1}\) (or \({F}_{HOM}\) or \({F}_{q05}\), which both had a correlation of 1 with \({F}_{LH1}\)). \({F}_{LH1}\) had the highest correlation with total homozygosity [32], and it is also the most useful to indicate if genetic diversity has been lost or gained in specific regions of the genome from the past generations to the current one [12]. For example, an increase in the value of local \({F}_{LH1}\) indicates a loss in heterozygosity, while a decrease indicates a gain in heterozygosity. In contrast, \({F}_{VR1}\), \({F}_{VR2},\) and \({F}_{YA2}\) do not provide useful information of whether heterozygosity has declined or increased in genomic regions [12].

Estimates of the inbreeding coefficient based on ROH were very accurate estimators of \({F}_{IBD},\) although they depended on the length of the fragments considered. For the pig data, Fig. 4 shows that estimates of \({F}_{ROH-5}\) were very close to the expected values of \({F}_{IBD}\) and \({F}_{PED}\) relative to the initial cohort, while \({F}_{ROH-1}\) provide larger estimates of inbreeding. Because the average rate of recombination for pigs is about 0.8 cM per Mb, \({F}_{ROH-5}\) values refer to events of inbreeding that occurred since about 12.5 generations ago [9], which probably includes most of the inbreeding that occurred since the foundation of the strain. However, \({F}_{ROH-1}\) refers to events of inbreeding that occurred further in the past (on average about 62.5 generations), which includes inbreeding that occurred before foundation of the strain, and explains why the mean \({F}_{ROH-1}\) in cohort 0 was greater than 0 (0.1).

Estimates of inbreeding based on current allele frequencies had very variable correlations with each other and with pedigree-based estimates of inbreeding. For example, the non-exhaustive compilation of correlations in Table 2 shows that the correlation of the estimates of inbreeding based on homozygosity (\({F}_{LH1}\) or \({F}_{HOM}\)) with \({F}_{PED}\) ranged from 0.1 to 0.9, with an average of about 0.5. Correlations of other marker-based estimators with \({F}_{PED}\) and with each other were also rather variable, with some negative values, usually involving \({F}_{VR2}\). These empirical results from Table 2 agree with our results, which also showed the lowest correlation between \({F}_{PED}\) and \({F}_{VR2}\), both in the simulations [see Additional file 9: Fig. S6] and in the pig data (Fig. 3). Correlations between molecular estimators of \(F\) were generally high, except for some correlations that involved \({F}_{VR2}\) particularly with \({F}_{LH1}\) and \({F}_{LH2}\), which were often low or negative (Table 2). This is also evident from our simulations results and the pig data ([see Additional file 9: Fig. S6] and Fig. 3), as well as from simulation results by Zhang et al. [34], where correlations of \({F}_{VR2}\) with \({F}_{PED}\), \({F}_{LH1}\), and \({F}_{YA1}\) were negative. Moreover, minima and maxima of individual inbreeding coefficients were within the expected ranges in our study [see Additional file 7: Table S3]), except for some values higher than 1 or lower than – 1 for \({F}_{VR2}\) and \({F}_{LH2}\), which are the estimators with the poorest performances. Therefore, the different estimators allow for a wide range of possible values of F, which likely depends on many factors such as the species, mating system, family structure, density and quality of markers, population size, etc. In addition, empirical results suggest that genotype imputation largely increases the variance of the estimators, thus producing substantial overestimates of average inbreeding coefficients and individual values well outside the expected ranges [44].

Table 2 Compilation from the Literature of correlations between individual measures of inbreeding from empirical data based on marker-based estimators of inbreeding assuming allele frequencies in the current generation and based on runs of homozygosity (ROH)

Estimation of the rate of inbreeding depression

Regarding estimation of ΔID when current allele frequencies are used, the most accurate molecular marker frequency-based estimator of inbreeding appears to be \({F}_{YA2}\). This is the estimator that results in a variance of F values that is closest to that of \({F}_{IBD}\) (see VF in Fig. 1, Additional file 5: Fig. S3 and Additional file 6: Fig. S4), and the lowest variance of individual estimates among all molecular estimators, as deduced theoretically by Yang et al. [36] and confirmed by other simulation [32] and empirical [42] results. \({F}_{YA2}\) was also the estimator with the greatest power to detect inbreeding depression [see Additional file 8: Fig. S5]. There has been a number of simulation studies that have investigated the performance of different measures of inbreeding to estimate ΔID when allele frequencies of the current population are assumed [32, 46, 47]. These studies suggest that \({F}_{YA2}\) and \({F}_{ROH}\) generally provide good estimations of ΔID, depending on the population size, while \({F}_{LH1}\) slightly underestimated ΔID and \({F}_{VR2}\) severely underestimated ΔID. In fact, \({F}_{YA2}\) and \({F}_{ROH}\) were highly correlated (Table 2) and previous work has shown that \({F}_{YA2}\) and \({F}_{ROH}\) have approximately the same power to detect inbreeding depression [13, 37], as also confirmed by our simulation results [see Additional file 8: Fig. S5]. It has also been found [32] that \({F}_{YA2}\) has the highest correlation with the phenotypic values of individuals and with the homozygous mutation load [9], as a proxy for fitness.

However, results from previous simulations demonstrate that \({F}_{YA2}\) can substantially overestimate ΔID for relatively small population sizes [32, 47]. For example, an overestimation by about 28% was observed for a population of size of N = 500 [32]. This substantial overestimation of ΔID based on \({F}_{YA2}\) was not observed in our results, where \({F}_{YA2}\) was always rather accurate. The reason for this discrepancy is that the previous studies analysed samples that were taken directly from a relatively large population, while we simulated a number of generations with a much smaller \(N\) (20 or 100) prior to the generation that was analysed, and mutation was not considered in the simulations. Thus, in our simulations, many rare alleles were likely lost by genetic drift during the period of reduced census size, which may have improved the estimation of the rate of inbreeding depression. In order to check this, we also analysed the case with \(N\) = 100 with the RC scheme at generation 0, i.e., immediately after sampling from the large original populations with \(N\) = 500. Resulting estimates of ΔID are given in Additional file 11: Fig. S8, which show that, with no minor allele frequency (MAF) filter (red bars), \({F}_{YA2}\) overestimated ΔID by about 28%, \({F}_{LH1}\) underestimated ΔID by about 30%, while \({F}_{ROH}\) produced almost unbiased estimates and \({F}_{VR2}\) and \({F}_{LH2}\) gave large underestimations. These results are in full agreement with those previously reported [32]. However, if these analyses are done when rare alleles were removed by applying a MAF filter of 0.01 or 0.05 to the data, overestimation of ΔID by \({F}_{YA2}\) was reduced to 3% for the 0.01 filter and became an underestimation by 15% for the 0.05 filter [see Additional file 11: Fig. S8]. Thus, part of the overestimation of ΔID that occurred with \({F}_{YA2}\) can be due to the contribution of SNPs with very low frequencies. Estimates of ΔID based on \({F}_{VR1}\), \({F}_{YA1}\), and \({F}_{LH1}\), i.e. those based on ratios of averages over loci, were not affected by applying an MAF filter, while those based on \({F}_{VR2}\), \({F}_{LH2}\), and \({F}_{ROH}\) were increased by an increase of the MAF threshold [see Additional file 11: Fig. S8].

The results of our study are concordant with those of Wang [23]. Under the conditions considered in our simulations (genome size of 18 Morgans, number of SNPs > 30,000, population sizes < 100, and number of generations > 20), molecular estimates were more precise estimates of \({F}_{IBD}\) and ΔID than those based on pedigree. Wang [23] considered random mating populations under a model of deleterious mutations on fitness, where the molecular inbreeding metric (\({F}_{LH1}\)) was calculated using current allele frequencies. We reached similar conclusions (Fig. 2a) for other molecular estimators of inbreeding and scenarios with equalization of contributions and artificial selection. Our results also concur with those of Forutan et al. [48], who compared estimates of inbreeding obtained from pedigree and from molecular data, i.e., \({F}_{VR1}\) and \({F}_{ROH}\), considering random mating and artificially selected populations, and a base population simulated assuming allele frequencies equal to 0.5 or uniformly distributed between 0 and 0.5. They showed that pedigree-based estimates of inbreeding underestimate inbreeding in selected populations, a result that we also observed in our simulations [see Additional file 6: Fig. S4]. This is as expected because selected individuals tend to be more related than the mean of the population [72]. They also found that \({F}_{VR1}\) estimates were close to \({F}_{IBD}\) estimates if the allele frequencies of the base population are known, but assuming a constant allele frequency of 0.5 resulted in overestimation when using \({F}_{VR1}\), although the correlation of these estimates with \({F}_{IBD}\) was close to 1. In fact, high correlations (> 0.6) between inbreeding values obtained based on \({F}_{PED}\) and \({F}_{q05}\) have been found in several empirical studies [35, 37, 48]. This agrees with our results, which also showed that \({F}_{q05}\) generally provided accurate estimates of ΔID (Fig. 2b), except for the EC scheme, where it resulted in overestimation (Fig. 1, Additional file 5: Fig. S3 and Additional file 6: Fig. S4), which was also reflected by the pig data (Fig. 3), a population that was intended to follow an EC protocol.

Pedigree-based inbreeding (\({F}_{PED}\)) showed rather accurate mean estimates of \({F}_{IBD}\) (see Fig. 1, Additional file 5: Fig. S3 and Additional file 6: Fig. S4), except in the case of artificial selection (see Additional file 6: Fig. S4 and [48]), but had only a moderate correlation with \({F}_{IBD}\) (Fig. 2a), as also found by Wang [23]. Estimates of ΔID by \({F}_{PED}\) were on the whole rather accurate (Fig. 2b), which concurs with the meta-analysis of Doekes et al. [73], who found moderate to high correlations of the estimates of ΔID based on \({F}_{PED}\) with those obtained based on molecular estimators of inbreeding: i.e., about 0.5–0.6 for \({F}_{ROH}\), about 0.6–0.7 for \({F}_{VR2}\) or \({F}_{YA2}\) (increasing to 0.8–0.9 for \({F}_{YA2}\) when base population frequencies of 0.5 were used), and about 0.6–0.7 for \({F}_{LH1}\). According to Wang [23], molecular estimates of inbreeding are more accurate predictors of \({F}_{IBD}\) than pedigree-based estimates when population sizes are smaller than ~ 200 (see his Fig. 3). This agrees with other simulation results assuming small population sizes [28]. However, Wang [23] also showed that for larger population sizes (> 200), the accuracy of pedigree-based estimates of inbreeding increased and became higher than that from molecular estimates of inbreeding. Whereas the accuracy of the molecular estimates of inbreeding was reduced with increasing population size, that from pedigree-based estimates remained invariable with increasing population size (see his Fig. 3). In addition, power to detect inbreeding depression became greater for pedigree-based estimates than for molecular estimates when population sizes were larger than about 500 (see Figure 5 of Wang [23]). Therefore, for large population sizes, under some situations, estimates of inbreeding and ΔID obtained from pedigrees can be as reliable, or even more reliable, than estimates based on molecular markers, although pedigrees for such large populations may only be available in domestic species. In most situations, therefore, estimates of inbreeding and ΔID should be more reliably obtained from molecular markers than from pedigrees.

Our simulations implied that the effective population size (\({N}_{e}\)) was close to the census size of breeders (\(N\)), except for the EC scenario, for which \({N}_{e}\) ≈ 2\(N\); see e.g., [4], p. 110). Thus, the results referred to scenarios with relatively small \({N}_{e}\) (≤ 100), with average inbreeding coefficients reaching relatively high values (\(F\) ≈ 0.1 − 0.4). This is a common situation in many animal breeding populations [74] and populations of conservation concern [75]. Other simulation studies have focused on populations with much larger population sizes (e.g., \(N\) > 10,000 individuals) and these have shown that molecular estimators of inbreeding perform also rather well [23, 32, 46].

In order to estimate inbreeding depression for fitness, we assumed a model of partially recessive deleterious mutations with average mutational parameters based on empirical data (see, e.g., [4]; p. 152–161). We also considered an alternative model with a larger number of mutations with much lower effects (Table 1 and Additional file 1: Fig. S1) but this did not affect the main findings of the study. However, we did not consider models of overdominance or epistasis. Overdominance can make some contribution to inbreeding depression but this is generally considered to be minor compared to partial dominance [76,77,78]. Nevertheless, further studies should be devoted to the investigation of the performance of the molecular estimators of inbreeding under other genetic models.

Conclusions

If the allele frequencies of the base population are known, all marker frequency-based estimators of inbreeding and inbreeding depression are reasonably highly correlated with \({F}_{IBD}\) and provide accurate estimates of ΔID, except for \({F}_{VR2}\) and \({F}_{LH2}\). If the allele frequencies of the base population are not known, \({F}_{LH1}\) is generally the estimator that is best correlated with \({F}_{IBD}\) and \({F}_{YA2}\) is the best estimator of ΔID. \({F}_{ROH}\) is a very accurate estimator of \({F}_{IBD}\) in most situations, while \({F}_{VR2}\) and \({F}_{LH2}\) perform very poorly in almost all scenarios. Estimates that are obtained assuming a constant frequency of 0.5 (\({F}_{q05}\)) give highly biased estimates of \({F}_{IBD}\) but are highly correlated with \({F}_{IBD}\) and provide reasonably good estimates of ΔID. Estimates from simple frequencies of homozygous markers (\({F}_{HOM}\)) cannot be used to estimate ΔID.

Availability of data and materials

All software and scripts used in the simulations are available in GitHub: https://github.com/armando-caballero/Molecular_Estimates_of_Inbreeding.

References

  1. Frankham R, Ballou JD, Briscoe DA. Introduction to conservation genetics. 2nd ed. Cambridge: Cambridge University Press; 2010.

    Book  Google Scholar 

  2. Lynch M, Walsh B. Genetics and analysis of quantitative traits. Sunderland: Sinauer Associates Inc.; 1998.

    Google Scholar 

  3. Hedrick PW. Genetics of populations. Sudbury: Jones and Bartlett Publishers; 2010.

    Google Scholar 

  4. Caballero A. Quantitative genetics. Cambridge: Cambridge University Press; 2020.

    Book  Google Scholar 

  5. Wright S. Coefficients of inbreeding and relationships. Am Nat. 1922;56:330–9.

    Article  Google Scholar 

  6. Malécot G. Les mathématiques de l’hérédité. Paris: Masson et Cie; 1948.

    Google Scholar 

  7. Wright S. Evolution and the genetics of populations. The theory of gene frequencies, vol. 2. Chicago: University of Chicago Press; 1969.

    Google Scholar 

  8. Howard JT, Pryce JE, Baes C, Maltecca C. Invited review: Inbreeding in the genomics era: Inbreeding, inbreeding depression, and management of genomic variability. J Dairy Sci. 2017;100:6009–24.

    Article  CAS  Google Scholar 

  9. Keller MC, Visscher PM, Goddard ME. Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data. Genetics. 2011;189:237–49.

    Article  Google Scholar 

  10. Kleinman-Ruiz D, Villanueva B, Fernández J, Toro MA, García-Cortés LA, Rodríguez-Ramilo ST. Intra-chromosomal estimates of inbreeding and coancestry in the Spanish Holstein cattle population. Livest Sci. 2016;185:34–42.

    Article  Google Scholar 

  11. Doekes HP, Veerkamp RF, Bijma P, Hiemstra SJ, Windig JJ. Trends in genome-wide and region-specific genetic diversity in the Dutch-Flemish Holstein-Friesian breeding program from 1986 to 2018. Genet Sel Evol. 2018;50:15.

    Article  Google Scholar 

  12. Villanueva B, Fernández A, Saura M, Caballero A, Fernández J, Morales-González E, et al. The value of genomic relationship matrices to estimate levels of inbreeding. Genet Sel Evol. 2021;53:42.

    Article  CAS  Google Scholar 

  13. Pryce JE, Haile-Mariam M, Goddard ME, Hayes BJ. Identification of genomic regions associated with inbreeding depression in Holstein and Jersey dairy cattle. Genet Sel Evol. 2014;46:71.

    Article  Google Scholar 

  14. Saura M, Fernández A, Varona L, Fernández AI, de Cara MÁ, Barragán C, et al. Detecting inbreeding depression for reproductive traits in Iberian pigs using genome-wide data. Genet Sel Evol. 2015;47:1.

    Article  Google Scholar 

  15. Toro M, Barragán C, Óvilo C, Rodrigañez J, Rodriguez C, Silió L. Estimation of coancestry in Iberian pigs using molecular markers. Conserv Genet. 2002;3:309–20.

    Article  CAS  Google Scholar 

  16. Li CC, Horvitz DG. Some methods of estimating the inbreeding coefficient. Am J Hum Genet. 1953;5:107–17.

    CAS  Google Scholar 

  17. Ritland K. Estimators for pairwise relatedness and individual inbreeding coefficients. Genet Res. 1996;67:175–85.

    Article  Google Scholar 

  18. Leutenegger A-L, Prum B, Genin E, Verny C, Lemainque A, Clerget-Darpoux F, et al. Estimation of the inbreeding coefficient through use of genomic data. Am J Hum Genet. 2003;73:516–23.

    Article  CAS  Google Scholar 

  19. VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.

    Article  CAS  Google Scholar 

  20. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of heritability for human height. Nat Genet. 2010;42:565–9.

    Article  CAS  Google Scholar 

  21. Toro M, García-Cortés LA, Legarra A. A note on the rationale for estimating genealogical coancestry from molecular markers. Genet Sel Evol. 2011;43:27.

    Article  Google Scholar 

  22. Wang J. Marker-based estimates of relatedness and inbreeding coefficients: an assessment of current methods. J Evol Biol. 2014;27:518–30.

    Article  CAS  Google Scholar 

  23. Wang J. Pedigrees or markers: which are better in estimating relatedness and inbreeding coefficient? Theor Popul Biol. 2016;107:4–13.

    Article  Google Scholar 

  24. Goudet J, Kay T, Weir BS. How to estimate kinship. Mol Ecol. 2018;27:4121–35.

    Article  Google Scholar 

  25. McQuillan R, Leutenegger AL, Abdel-Rahman R, Franklin CS, Pericic M, Barac-Lauc L, et al. Runs of homozygosity in European populations. Am J Hum Genet. 2008;83:359–72.

    Article  CAS  Google Scholar 

  26. Druet T, Gautier M. A model-based approach to characterize individual inbreeding at both global and local genomic scales. Mol Ecol. 2017;26:5820–41.

    Article  CAS  Google Scholar 

  27. Curik I, Ferenčaković M, Sölkner J. Inbreeding and runs of homozygosity: a possible solution to an old problem. Livest Sci. 2014;166:26–34.

    Article  Google Scholar 

  28. Kardos M, Taylor HR, Ellegren H, Luikart G, Allendorf FW. Genomics advances the study of inbreeding depression in the wild. Evol Appl. 2016;9:1205–18.

    Article  Google Scholar 

  29. Ferenčaković M, Hamzić E, Gredler B, Solberg TR, Klemetsdal G, Curik I, et al. Estimates of autozygosity derived from runs of homozygosity: empirical evidence from selected cattle populations. J Anim Breed Genet. 2013;130:286–93.

    Article  Google Scholar 

  30. Ferenčaković M, Sölkner J, Kapš M, Curik I. Genome-wide mapping and estimation of inbreeding depression of semen quality traits in a cattle population. J Dairy Sci. 2017;100:4721–30.

    Article  Google Scholar 

  31. Wang J. A new likelihood estimator and its comparison with moment estimators of individual genome-wide diversity. Heredity (Edinb). 2011;107:433–43.

    Article  CAS  Google Scholar 

  32. Caballero A, Villanueva B, Druet T. On the estimation of inbreeding depression using different measures of inbreeding from molecular markers. Evol Appl. 2020;14:416–28.

    Article  Google Scholar 

  33. Gengler N, Mayeres P, Szydlowski M. A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dual-purpose Belgian Blue cattle. Animal. 2007;1:21–8.

    Article  CAS  Google Scholar 

  34. Zhang QS, Goudet J, Weir BS. Rank-invariant estimation of inbreeding coefficients. Heredity. 2022;128:1–10.

    Article  CAS  Google Scholar 

  35. VanRaden PM, Olson KM, Wiggans GR, Cole JB, Tooker ME. Genomic inbreeding and relationships among Holsteins, Jerseys, and Brown Swiss. J Dairy Sci. 2011;94:5673–82.

    Article  CAS  Google Scholar 

  36. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82.

    Article  CAS  Google Scholar 

  37. Bjelland DW, Weigel KA, Vukasinovic N, Nkrumah JD. Evaluation of inbreeding depression in Holstein cattle using whole-genome SNP markers and alternative measures of genomic inbreeding. J Dairy Sci. 2013;96:4697–706.

    Article  CAS  Google Scholar 

  38. Eynard SE, Windig JJ, Leroy G, van Binsbergen R, Calus MPL. The effect of rare alleles on estimated genomic relationships from whole genome sequence data. BMC Genet. 2015;16:24.

    Article  Google Scholar 

  39. Zhang Q, Calus MP, Guldbrandtsen B, Lund MS, Sahana G. Estimation of inbreeding using pedigree, 50k SNP chip genotypes and full sequence data in three cattle breeds. BMC Genet. 2015;16:88.

    Article  Google Scholar 

  40. Huisman J, Kruuk LE, Ellis PA, Clutton-Brock T, Pemberton JM. Inbreeding depression across the lifespan in a wild mammal population. Proc Natl Acad Sci USA. 2016;113:3585–90.

    Article  CAS  Google Scholar 

  41. Brito LF, Kijas JW, Ventura RV, Sargolzaei M, Porto-Neto LR, Cánovas A, et al. Genetic diversity and signatures of selection in various goat breeds revealed by genome-wide SNP markers. BMC Genomics. 2017;18:229.

    Article  Google Scholar 

  42. Alemu SW, Kadri NK, Harland C, Faux P, Charlier C, Caballero A, et al. An evaluation of inbreeding measures using a whole genome sequenced cattle pedigree. Heredity. 2020;126:410–23.

    Article  Google Scholar 

  43. Morales-González E, Saura M, Fernández A, Fernández J, Pong-Wong R, Cabaleiro S, et al. Evaluating different genomic coancestry matrices for managing genetic variability in turbot. Aquaculture. 2020;520: 734985.

    Article  Google Scholar 

  44. Dadousis C, Ablondi M, Cipolat-Gotet C, van Kaam JT, Marusi M, Cassandro M, et al. Genomic inbreeding coefficients using imputed genotypes: assessing different estimators in Holstein-Friesian dairy cows. J Dairy Sci. 2022;105:5926–45.

    Article  CAS  Google Scholar 

  45. Kardos M, Luikart G, Allendorf FW. Measuring individual inbreeding in the age of genomics: marker-based measures are better than pedigrees. Heredity (Edinb). 2015;115:63–72.

    Article  CAS  Google Scholar 

  46. Yengo L, Zhu Z, Wray NR, Weir BS, Yang J, Robinson MR, et al. Detection and quantification of inbreeding depression for complex traits from SNP data. Proc Natl Acad Sci USA. 2017;114:8602–7.

    Article  CAS  Google Scholar 

  47. Nietlisbach P, Muff S, Reid JM, Whitlock MC, Keller LF. Non-equivalent lethal equivalents: models and inbreeding metrics for unbiased estimation of inbreeding load. Evol Appl. 2018;12:266–79.

    Article  Google Scholar 

  48. Forutan M, Mahyari SA, Baes C, Melzer N, Schenkel FS, Sargolzaei M. Inbreeding and runs of homozygosity before and after genomic selection in North American Holstein cattle. BMC Genomics. 2002;19:98.

    Article  Google Scholar 

  49. Bérénos C, Ellis PA, Pilkington JG, Pemberton JM. Genomic analysis reveals depression due to both individual and maternal inbreeding in a free-living mammal population. Mol Ecol. 2016;25:3152–68.

    Article  Google Scholar 

  50. Solé M, Gori AS, Faux P, Bertrand A, Farnir F, Gautier M, et al. Age-based partitioning of individual genomic inbreeding levels in Belgian Blue cattle. Genet Sel Evol. 2017;49:92.

    Article  Google Scholar 

  51. Yoshida GM, Cáceres P, Marín-Nahuelpi R, Koop BF, Yáñez JM. Estimates of autozygosity through runs of homozygosity in farmed Coho Salmon. Genes (Basel). 2020;11:490.

    Article  CAS  Google Scholar 

  52. Rodríguez-Ramilo ST, Reverter A, Sánchez JP, Fernández J, Velasco-Galilea M, González O, et al. Networks of inbreeding coefficients in a selected population of rabbits. J Anim Breed Genet. 2020;137:599–608.

    Article  Google Scholar 

  53. Antonios S, Rodríguez-Ramilo ST, Aguilar I, Astruc JM, Legarra A, Vitezica ZG. Genomic and pedigree estimation of inbreeding depression for semen traits in the Basco-Béarnaise dairy sheep breed. J Dairy Sci. 2021;104:3221–30.

    Article  CAS  Google Scholar 

  54. Shi L, Wang L, Liu J, Deng T, Yan H, Zhang L, et al. Estimation of inbreeding and identification of regions under heavy selection based on runs of homozygosity in a Large White pig population. J Anim Sci Biotechnol. 2020;11:46.

    Article  CAS  Google Scholar 

  55. Schiavo G, Bovo S, Bertolini F, Tinarelli S, Dall’Olio S, Nanni Costa L, et al. Comparative evaluation of genomic inbreeding parameters in seven commercial and autochthonous pig breeds. Animal. 2020;14:910–20.

    Article  CAS  Google Scholar 

  56. Adams SM, Derks MFL, Makanjuola BO, Marras G, Wood BJ, Baes CF. Investigating inbreeding in the turkey (Meleagris gallopavo) genome. Poult Sci. 2021;100: 101366.

    Article  CAS  Google Scholar 

  57. Polak G, Gurgul A, Jasielczuk I, Szmatoła T, Krupiński J, Bugno-Poniewierska M. Suitability of pedigree information and genomic methods for analyzing inbreeding of Polish cold-blooded horses covered by conservation programs. Genes (Basel). 2021;12:429.

    Article  CAS  Google Scholar 

  58. Nosrati M, Nanaei HA, Javanmard A, Esmailizadeh A. The pattern of runs of homozygosity and genomic inbreeding in world-wide sheep populations. Genomics. 2021;113:1407–15.

    Article  CAS  Google Scholar 

  59. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.

    Article  CAS  Google Scholar 

  60. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7.

    Article  Google Scholar 

  61. Haller BC, Messer PW. SLiM 3: forward genetic simulations beyond the Wright-Fisher model. Mol Biol Evol. 2019;36:632–7.

    Article  CAS  Google Scholar 

  62. Tortereau F, Servin B, Frantz L, Megens HJ, Milan D, Rohrer G, et al. A high density recombination map of the pig reveals a correlation between sex-specific recombination and GC content. BMC Genomics. 2012;13:586.

    Article  CAS  Google Scholar 

  63. Santiago E, Novo I, Pardiñas AF, Saura M, Wang J, Caballero A. Recent demographic history inferred by high-resolution analysis of linkage disequilibrium. Mol Biol Evol. 2020;37:3642–53.

    Article  CAS  Google Scholar 

  64. Caballero A, Keightley PD. A pleiotropic nonadditive model of variation in quantitative traits. Genetics. 1994;38:883–900.

    Article  Google Scholar 

  65. Fox J, The R. Commander: a basic-statistics graphical user interface to R. J Stat Softw. 2005;14:1–42.

    Article  Google Scholar 

  66. Saura M, Fernández A, Rodríguez MC, Toro MA, Barragán C, Fernández AI, et al. Genome-wide estimates of coancestry and inbreeding in a closed herd of Iberian pigs. PLoS One. 2013;8: e78314.

    Article  CAS  Google Scholar 

  67. Amin N, van Duijn CM, Aulchenko YS. A genomic background-based method for association analysis in related individuals. PLoS One. 2007;2: e1274.

    Article  Google Scholar 

  68. Cockerham CC. Variance of gene frequencies. Evolution. 1969;23:72–84.

    Google Scholar 

  69. Kimura M, Crow JF. The measurement of effective population number. Evolution. 1963;17:279–88.

    Google Scholar 

  70. Robertson A. The interpretation of genotypic ratios in domestic animal populations. Anim Sci. 1965;7:319–24.

    Article  Google Scholar 

  71. Aldridge MN, Vandenplas J, Calus MPL. Efficient and accurate computation of base generation allele frequencies. J Dairy Sci. 2019;102:1364–73.

    Article  CAS  Google Scholar 

  72. Robertson A. Inbreeding in artificial selection programmes. Genet Res. 1961;2:189–94.

    Article  Google Scholar 

  73. Doekes HP, Bijma P, Windig JJ. How depressing is inbreeding? A meta-analysis of 30 years of research on the effects of inbreeding in livestock. Genes (Basel). 2021;12:926.

    Article  CAS  Google Scholar 

  74. Hall SJ. Effective population sizes in cattle, sheep, horses, pigs and goats estimated from census and herdbook data. Animal. 2016;10:1778–85.

    Article  CAS  Google Scholar 

  75. Palstra FP, Ruzzante DE. Genetic estimates of contemporary effective population size: what can they tell us about the importance of genetic stochasticity for wild population persistence? Mol Ecol. 2008;17:3428–47.

    Article  Google Scholar 

  76. Hedrick PW. What is the evidence for heterozygote advantage selection? Trends Ecol Evol. 2012;27:698–704.

    Article  Google Scholar 

  77. Yang J, Mezmouk S, Baumgarten A, Buckler ES, Guill KE, McMullen MD, et al. Incomplete dominance of deleterious alleles contributes substantially to trait variation and heterosis in maize. PLoS Genet. 2017;13: e1007019.

    Article  Google Scholar 

  78. Pérez-Pereira P, López-Cortegano E, García-Dorado A, Caballero A. Prediction of fitness under different breeding designs in conservation programs. Anim Conserv. 2022. https://doi.org/10.1111/acv.12804.

    Article  Google Scholar 

Download references

Acknowledgements

We are grateful to Jack Dekkers, an anonymous Associate Editor and two anonymous reviewers for helpful comments on the different versions of the manuscript, and to Sebastián Ramos-Onsins for help with the implementation of SLiM3.

Funding

This work was funded by MCIN/AEI/10.13039/501100011033 (Project PID2020-114426GB), Xunta de Galicia (GRC, ED431C 2020-05), Centro singular de investigación de Galicia accreditation 2019-2022, and the European Union (European Regional Development Fund—ERDF) “A way to make Europe”. This study forms part of the Marine Science programme (ThinkInAzul) supported by Ministerio de Ciencia e Innovación and Xunta de Galicia with funding from European Union NextGenerationEU (PRTR-C17.I1) and European Maritime and Fisheries Fund.

Author information

Authors and Affiliations

Authors

Contributions

AC designed the initial steps of the study, wrote the programs and carried out the simulations. AF performed the analyses with pig data. All authors contributed in the gathering of empirical data for correlations among estimators, in the focus of the study and in the interpretation and discussion of results. All authors contributed in the writing of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Armando Caballero.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Figure S1.

Distribution of mutational homozygous effects and dominance coefficients assumed in the simulations. a Homozygous effects (s) had an exponential distribution with mean effect \(\overline{s }\) = 0.1 or 0.025. The number of mutations are scaled by the haploid mutation rate in each model. b Dominance coefficients (h) were assumed to have an inverse relationship with s values and were taken from a uniform distribution between 0 and e(−ks), where k is a constant needed to get an average value of \(\overline{h }\) = 0.2.

Additional file 2: Table S1.

Number of deleterious loci at generation 0 and those lost, fixed or segregating at generation 20 and their average frequency, for a range of selection coefficients (s). The results refer to a simulation with N = 20 individuals carried for 20 generations with scheme RC.

Additional file 3: Figure S2.

Average inbreeding load (B), additive (VA) and dominance (VD) variances for fitness for a population of N = 20 breeding individuals maintained for 20 discrete generations assuming random mating and random contributions from parents to progeny (RC). The inbreeding load (B) measures the cumulative deleterious effect of (partially) recessive mutations that is hidden in large non-inbred populations and is expressed by inbreeding and is quantified in terms of number of lethal equivalents per haploid genome. In the absence of selection, B equals the rate of inbreeding depression (ΔID). The values of B in the simulations were calculated as the sum over loci of 2dpq ([4], p. 180), where p and q = 1 − p are the frequencies for the wild-type and deleterious allele, respectively, and d is the dominance effect which accounts for the deviation of the fitness value of the heterozygote from the average fitness of the two homozygotes (d = s(1 − 2h)/2, where s is the selection coefficient and h the dominance coefficient for each locus; [4], p. 44). The upper graph shows the decline in B across generations due to the loss of deleterious mutations by genetic drift and genetic purging selection. The red circle indicates the value of ΔID from FIBD observed at generation 20. The lower graph shows the change in the average additive (VA) and dominance variance (VD) across generations for fitness, which were calculated as the sum over loci of 2α2pq and (2dpq)2, respectively, were α = s/2 + d(1 − 2q) is the average effect of an allelic substitution ([4], p. 44).

Additional file 4: Table S2.

Statistical tests of the difference between mean F values from marker-based estimators and mean FIBD values. Bootstraps are based on 1000 resamplings. 1Limits for 95% confidence intervals. 2Probability values < 0.001

Additional file 5: Figure S3.

Inbreeding estimates, correlations with FIBD and estimates of the rate of inbreeding depression for fitness obtained for a population of N = 20 breeding individuals maintained for 20 discrete generations assuming random mating and random contributions from parents to progeny (RC), equalization of contributions from parents to progeny (EC), and artificial selection for a neutral quantitative trait (SEL). Mean (F) and variance (VF) of inbreeding coefficients at generation 20, correlation between estimated inbreeding coefficients and those obtained from IBD measures (r), and mean values of the rate of inbreeding depression (ΔID). Bars refer to true IBD values (FIBD), and estimated from pedigree records (FPED) and from different marker-based measures (FVR1, FVR2, FYA1, FYA2, FLH1, FLH2, FHOM; see text for definitions) assuming the frequencies of the base generation (blue bars), those of the current generation (yellow bars) or a constant frequency of 0.5 (Fq05; purple bars). Estimates from runs of homozygosity are shown for fragments longer than 1 Mb (FROH-1) or 5 Mb (FROH-5). Only subscripts of estimators are shown for the sake of clarity.

Additional file 6: Figure S4.

Inbreeding estimates, correlations with FIBD and estimates of the rate of inbreeding depression for fitness obtained for a population of N = 100 breeding individuals maintained for 50 discrete generations assuming random mating and random contributions from parents to progeny (RC), equalization of contributions from parents to progeny (EC), and artificial selection for a neutral quantitative trait (SEL). Mean (F) and variance (VF) of inbreeding coefficients at generation 50, correlation between estimated inbreeding coefficients and those obtained from IBD measures (r), and mean values of the rate of inbreeding depression (ΔID). Bars refer to true IBD values (FIBD), and estimated from pedigree records (FPED) and from different marker-based measures (FVR1, FVR2, FYA1, FYA2, FLH1, FLH2, FHOM; see text for definitions) assuming the frequencies of the base generation (blue bars), those of the current generation (yellow bars) or a constant frequency of 0.5 (Fq05; purple bars). Estimates from runs of homozygosity are shown for fragments longer than 1 Mb (FROH-1) or 5 Mb (FROH-5). Only subscripts of estimators are shown for the sake of clarity.

Additional file 7: Table S3.

Minimum and maximum values of the individual inbreeding coefficient (F) for populations of size N run for t generations assuming random mating and random contributions from parents to progeny (RC), equalization of contributions from parents to progeny (EC), and artificial selection for a neutral quantitative trait (SEL). Values refer to true IBD values (FIBD), and estimated from pedigree records (FPED) and from different marker-based measures (FVR1, FVR2, FYA1, FYA2, FLH1, FLH2, FHOM, FROH; see text for definitions). The results correspond to those of Fig. 1, Additional file 5: Fig. S3 and Additional file 6: Fig. S4.

Additional file 8: Figure S5.

Power to detect inbreeding depression obtained by counting the percentage of replicates where the value of ΔID was significantly different from zero with a 95% probability for a population of N breeding individuals maintained assuming random mating and random contributions from parents to progeny (RC), equalization of contributions from parents to progeny (EC), and artificial selection for a neutral quantitative trait (SEL). Populations with N = 20 run for 10 (a) or 20 (b) generations, and for N = 100 run for 50 generations (c). Bars refer to true IBD values (FIBD), and estimated from pedigree records (FPED) and from different marker-based measures (FVR1, FVR2, FYA1, FYA2, FLH1, FLH2, FHOM; see text for definitions) assuming the frequencies of the base generation (blue bars), those of the current generation (yellow bars) or a constant frequency of 0.5 (Fq05; purple bars). Estimates from runs of homozygosity are shown for fragments longer than 1 Mb (FROH-1) or 5 Mb (FROH-5). Only subscripts of estimators are shown for the sake of clarity.

Additional file 9: Figure S6.

Correlation between the values of inbreeding obtained with the different estimators when base population frequencies are known or those from the current generation are used assuming random mating and random contributions from parents to progeny (RC), equalization of contributions from parents to progeny (EC), and artificial selection for the quantitative trait (SEL) under a neutral model of variation for fitness. Populations of N = 20 breeding individuals maintained for 10 discrete generations (a), of N = 20 breeding individuals maintained for 20 discrete generations (b), and of N = 100 breeding individuals maintained for 50 discrete generations (c). In the first set of graphs (ac) allele frequencies of the base population are assumed for the estimators. In the second set (df), current allele frequencies are assumed. Only subscripts of estimators are shown for the sake of clarity.

Additional file 10: Figure S7.

Mean estimates of the inbreeding coefficient (F), variance of F values (VF), correlation (r) between F estimates and IBD measures, and estimates of the rate of inbreeding depression for fitness (ΔID) obtained for a population maintained with random mating and random contributions from parents to progeny (RC). a Scenario with N = 20 individuals run for 10 generations but considering a density of SNPs more than double of that in Fig. 1 (see Table 1). b Scenario with N = 20 individuals run for 10 generations assuming an alternative model of deleterious mutations where the mean effect of homozygous effects was 1/4 of that considered in the previous figures. c Scenario with N = 100 individuals run for 100 generations instead of 50. d Scenario with N = 20 individuals run for 20 generations where the base population is set up at generation 10. Bars refer to true IBD values (FIBD), and estimated from pedigree records (FPED) and from different marker-based measures (FVR1, FVR2, FYA1, FYA2, FLH1, FLH2, FHOM; see text for definitions) assuming the frequencies of the base generation (blue bars), those of the current generation (yellow bars) or a constant frequency of 0.5 (Fq05; purple bars). Estimates from runs of homozygosity are shown for fragments longer than 1 Mb (FROH-1) or 5 Mb (FROH-5). Only subscripts of estimators are shown for the sake of clarity.

Additional file 11: Figure S8.

Estimates of the rate of inbreeding depression (ΔID) for a sample of N = 100 individuals sampled from a large population.The estimates refer to different marker frequency-based measures (FVR1, FVR2, FYA1, FYA2, FLH1, FLH2; see text for definitions), assuming the frequencies of the current generation, estimates from homozygosity of SNPs (FHOM), and estimates from runs of homozygosity for fragments longer than 1 Mb (FROH-1) or 5 Mb (FROH-5). Estimates are obtained assuming minor allele frequencies (MAF) equal to 0 (red bars), 0.01 (orange bars) and 0.05 (green bars). The horizontal red line indicates the true ΔID in the population.

Appendices

Appendix

Derivation of the estimator from Yang et al.

The estimator proposed by Yang et al. [20] is based on the correlation between the allele frequencies of uniting gametes. Let \({p}_{kg1}\) and \({p}_{kg2}\) be the allele frequencies for a biallelic locus \(k\) of the two gametes producing a given individual \(i\). Thus, \({p}_{kg1}\) and \({p}_{kg2}\) take values of 1 or 0 if the gametes carry or not a given allele of reference. The allele frequency of individual \(i\) at the locus is thus:

$${p}_{ki}=\frac{{p}_{kg1}+{p}_{kg2}}{2},$$
(1)

which takes values 0, ½ or 1, if the individual carries no copies, one copy or two copies of the reference allele.

The correlation of allele frequencies across all \(S\) loci for individual \(i\) is:

$${r}_{UG}=\frac{1}{S}{\sum }_{k=1}^{S}\frac{({p}_{kg1}-{p}_{k})({p}_{kg2}-{p}_{k})}{{p}_{k}{q}_{k}},$$
(2)

where \({p}_{k}\) and \({q}_{k}=1-{p}_{k}\) are the average frequencies for the two alleles at locus \(k\) in the whole population.

Substituting Eq. (1) into Eq. (2) we obtain:

$${r}_{UG}=\frac{1}{S}{\sum }_{k=1}^{S}\frac{(2{p}_{ki}-{p}_{kg2}-{p}_{k})(2{p}_{ki}-{p}_{kg1}-{p}_{k})}{{p}_{k}{q}_{k}}=\frac{1}{S}{\sum }_{k=1}^{S}\frac{{(2{p}_{ki}-{p}_{k})}^{2}-\left(2{p}_{ki}-{p}_{k}\right){p}_{kg1}-\left(2{p}_{ki}-{p}_{k}\right){p}_{kg2}+{p}_{kg1}{p}_{kg2}}{{p}_{k}{q}_{k}}=\frac{1}{S}{\sum }_{k=1}^{S}\frac{\left(2{p}_{ki}-{p}_{k}\right)\left[\left(2{p}_{ki}-{p}_{k}\right)-\left({p}_{kg1}+{p}_{kg2}\right)\right]+{p}_{kg1}{p}_{kg2}}{{p}_{k}{q}_{k}}=\frac{1}{S}{\sum }_{k=1}^{S}\frac{-\left(2{p}_{ki}-{p}_{k}\right){p}_{k}+{p}_{kg1}{p}_{kg2}}{{p}_{k}{q}_{k}}.$$

Adding and subtracting \(2{p}_{ki}^{2}\) in the numerator and operating, \({r}_{UG}=\frac{1}{S}{\sum }_{k=1}^{S}\frac{{({p}_{ki}-{p}_{k})}^{2}+{p}_{ki}^{2}-2{\left(\frac{{p}_{kg1}+{p}_{kg2}}{2}\right)}^{2}+{p}_{kg1}{p}_{kg2}}{{p}_{k}{q}_{k}}=\frac{1}{S}{\sum }_{k=1}^{S}\frac{{({p}_{ki}-{p}_{k})}^{2}+{p}_{ki}^{2}-\left(\frac{{p}_{kg1}^{2}+{p}_{kg2}^{2}}{2}\right)}{{p}_{k}{q}_{k}}\)

Because, \({p}_{kg1}^{2}={p}_{kg1}\) and \({p}_{kg2}^{2}={p}_{kg2}\), then,

$${r}_{UG}=\frac{1}{S}{\sum }_{k=1}^{S}\frac{{({p}_{ki}-{p}_{k})}^{2}-{p}_{ki}(1-{p}_{ki})}{{p}_{k}{q}_{k}}.$$
(3)

Equation (3), which was also shown in the Appendix of Caballero et al. [32], indicates that the estimator is based on the squared deviation of the allele frequencies of individuals from the population mean minus the expected variance within individuals. This contrasts with the estimator of VanRaden [19], which is based on the correlation of allele frequencies between individuals and does not include the correction for the expected variance within individuals in the numerator of its expression,

$${F}_{VR1}=\frac{{\sum }_{k=1}^{S}2{({p}_{ki}-{p}_{k})}^{2}}{{\sum }_{k=1}^{S}{p}_{k}{q}_{k}}-1.$$
(4)

(see Appendix of Caballero et al. [32]).

After some algebra on Eq. (3),

$${r}_{UG}=\frac{1}{S}{\sum }_{k=1}^{S}\frac{4{p}_{ki}^{2}-2{p}_{ki}\left(1+2{p}_{k}\right)+2{p}_{k}^{2}}{{p}_{k}{q}_{k}}.$$

And taking \({x}_{k}=2{p}_{ki}\) as the number of alleles of reference, i.e. \({x}_{k}\) takes values 0, 1 or 2 for individual \(i\),

$${F}_{YA2}={r}_{UG}=\frac{1}{S}\sum_{k=1}^{S}\frac{{x}_{k}^{2}-\left(1+2{p}_{k}\right){x}_{k}+2{p}_{k}^{2}}{2{p}_{k}{q}_{k}},$$
(5)

which is the equation shown in the main text for this estimator. Using the same change of terms, Eq. (4) becomes the equation for \({F}_{VR1}\) of the main text.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Caballero, A., Fernández, A., Villanueva, B. et al. A comparison of marker-based estimators of inbreeding and inbreeding depression. Genet Sel Evol 54, 82 (2022). https://doi.org/10.1186/s12711-022-00772-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12711-022-00772-0