Effects of pedigree errors on the efficiency of conservation decisions
© Oliehoek and Bijma; licensee BioMed Central Ltd. 2009
Received: 22 December 2008
Accepted: 14 January 2009
Published: 14 January 2009
Conservation schemes often aim at increasing genetic diversity by minimizing kinship, and the best method to achieve this goal, when pedigree data is available, is to apply optimal contributions. Optimal contributions calculate contributions per animal so that the weighted average mean kinship among candidate parents is minimized. This approach assumes that pedigree data is correct and complete. However, in practice, pedigrees often contain errors: parents are recorded incorrectly or are missing altogether. We used simulations to investigate the effect of these two types of errors on minimizing kinship. Our findings show that a low percentage of wrong parent information reduces the effect of optimal contributions. When the percentage of wrong parent information is above 15%, the population structure and the type of errors should be taken into account before applying optimal contributions. Optimal contributions based on pedigrees with missing parent information hamper the conservation of genetic diversity; however, missing parent information can be corrected. It is crucial to know which animals are founders. We strongly recommend that pedigree registration include whether missing parents are true founders or non-founders.
Genetic diversity within populations is necessary for adaptive capacity and avoidance of inbreeding depression in the long term. Small populations are at risk of losing their adaptive capacity because genetic drift constantly lowers genetic diversity. An important strategy in conservation genetics is the preservation of genetic diversity by minimizing the average mean kinship via the preferential breeding of genetically important, or distantly related, animals [1, 2]. In theory, the most efficient method to minimize kinship is to use optimal contribution selection (OCS) [3, 4], a strategy that calculates contributions so that the weighted average mean kinship among potential parents (candidates) is minimized. This strategy assigns higher contributions to genetically important animals, while animals with over-represented ancestors receive lower or zero contributions.
OCS has been implemented using either complete and correct information on pedigrees or a sufficient number of molecular markers per candidate [5, 6]. However, in other cases, pedigree information has been erroneous, either because of missing parent information, resulting in gaps in the pedigree, or because of wrong parent information, resulting in misidentified parents. In zoo populations, missing parent information is more often the rule than the exception, and even for many commercial domestic populations, it is well known that the recorded pedigree generally does not fully represent the true pedigree.
Table 1: Overview of percentage of wrong parent information.
German dairy cattle
Israeli Holstein cows
Israeli Holstein cows (same pop.)
Sheep, USA (mismothering)
Lipizzaner horse (mismothering)
UK dairy cattle (misfathering)
New Zealand dairy cattle
Sheep, New Zealand (misfathering)
Dutch dairy cows (misfathering)
Sheep, USA (misfathering)
Little is known about the effects of erroneous pedigree information on the efficiency of conservation decisions. In this article, we use Monte Carlo simulation to analyze the effect of missing parent or wrong sire information on the amount of diversity conserved when OCS is applied as a conservation strategy. We investigated the amount of diversity saved by comparing three different situations: (1) OCS based on observed pedigrees (including wrong and/or missing parent records), (2) OCS based on true pedigrees, and (3) breeding with equal contributions, a method that requires no (pedigree) information.
A simulation was conducted to produce 200 replicates of diploid populations with both true and observed pedigree information. True pedigrees were converted to erroneous pedigrees using two methods: (1) changing sire records, resulting in wrong sire information (WSI) and (2) setting parent records to missing, resulting in missing parent information (MPI). To understand the impact of population parameters, a panmictic standard population and deviations were simulated. For each replicate, the true kinship based on true pedigree and the observed kinship based on observed pedigree with WSI and/or MPI were calculated in the 10th generation. Subsequently, effects of pedigree errors in the 10th generation were assessed using statistical criteria for true and observed kinship, and by comparing saved diversity based on true versus observed kinship. Instead of evaluating the effects for only one generation, an additional breeding scheme evaluated effects over multiple generations. In all schemes, the population sizes and sex ratios varied.
A panmictic (random mating) population was used as the basic model. Populations were bred for 10 discrete generations from a base generation of (unrelated) founders. For each generation, 10 males and 50 females were randomly selected as parents of the next generation. Litter size per female was Poisson-distributed with a mean of 2.5. Males had a Poisson-distributed number of mates (on average 5), so that the average number of progeny per male was 12.5. For each generation, offspring were produced using random mating and both the true and observed pedigrees were recorded. Parameters derived from observed pedigree information are indicated with '~' in this paper. True kinship ($f$) between individuals was calculated from the true pedigree, and observed kinship ($\tilde{f}$) was calculated from the observed pedigree using the tabular method. The 10th generation had a fixed number of 100 individuals (candidate parents).
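The tabular method mentioned above can be sketched as follows. This is a minimal illustration, not the simulation code used in the study; the function name `kinship_matrix` and the pedigree encoding (a list of `(sire, dam)` index pairs, parents listed before their offspring, `None` for an unknown parent that is treated as an unrelated founder) are assumptions made for this example.

```python
def kinship_matrix(pedigree):
    """Kinship matrix by the tabular method.

    pedigree[i] = (sire, dam) as indices into the same list; parents must
    appear before their offspring. None marks an unknown parent, which is
    treated here as an unrelated founder (kinship 0 with everyone).
    """
    n = len(pedigree)
    f = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        # Kinship of i with an older individual j is the mean of the
        # kinships of i's two parents with j.
        for j in range(i):
            f_sire = f[sire][j] if sire is not None else 0.0
            f_dam = f[dam][j] if dam is not None else 0.0
            f[i][j] = f[j][i] = 0.5 * (f_sire + f_dam)
        # Self-kinship is 0.5 * (1 + inbreeding), where the inbreeding
        # coefficient of i equals the kinship between its parents.
        both_known = sire is not None and dam is not None
        f_parents = f[sire][dam] if both_known else 0.0
        f[i][i] = 0.5 * (1.0 + f_parents)
    return f


# Toy pedigree: two founders (0, 1), two full sibs (2, 3), and their offspring (4).
toy_pedigree = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
f = kinship_matrix(toy_pedigree)
print(f[2][3])  # kinship between the full sibs
print(f[4][4])  # self-kinship of the inbred offspring
```

In this toy pedigree the full sibs have kinship 0.25, so their offspring has an inbreeding coefficient of 0.25 and a self-kinship of 0.625.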
Wrong sire information (WSI)
For each generation, observed pedigrees were created from true pedigrees by substituting 0% to 25% of the true fathers by another father taken at random from the same generation as the true father.
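A minimal sketch of how such wrong sire records could be generated is given below. The function `add_wrong_sires`, its arguments, and the choice to replace exactly `round(rate * number of known sires)` records are assumptions for illustration; the text does not specify these details.

```python
import random


def add_wrong_sires(sires, males_by_generation, generation, rate, rng):
    """Return a copy of `sires` in which a fraction `rate` of the known sire
    records is replaced by another male from the true sire's generation.

    sires[i]               true sire id of individual i (None if unknown)
    males_by_generation[g] list of male ids in generation g
    generation[m]          generation of male m
    """
    observed = list(sires)
    known = [i for i, s in enumerate(sires) if s is not None]
    n_wrong = round(rate * len(known))  # assumption: exact number of errors
    for i in rng.sample(known, n_wrong):
        true_sire = sires[i]
        others = [m for m in males_by_generation[generation[true_sire]]
                  if m != true_sire]
        if others:  # keep the record if no alternative male exists
            observed[i] = rng.choice(others)
    return observed


# Hypothetical toy data: males 0-3 (generation 0) sire 20 offspring (ids 4-23).
rng = random.Random(1)
generation = {m: 0 for m in range(4)}
males_by_generation = {0: [0, 1, 2, 3]}
true_sires = [None] * 4 + [i % 4 for i in range(20)]
observed_sires = add_wrong_sires(true_sires, males_by_generation,
                                 generation, 0.25, rng)
n_changed = sum(o != t for o, t in zip(observed_sires, true_sires))
print(n_changed)
```

Because the replacement sire is drawn without regard to relatedness, this corresponds to the random type of sire error described in the text.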
Missing parent information (MPI)
For each generation, observed pedigrees were created from true pedigrees by setting the sire, or both parents, to missing for a random 0% to 100% of the individuals.
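The same idea can be sketched for missing parent information. The function name `add_missing_parents`, the `(sire, dam)` tuple encoding, and the `sires_only` switch are assumptions for this example, as is the choice to blank exactly `round(rate * n)` records.

```python
import random


def add_missing_parents(pedigree, rate, rng, sires_only=False):
    """Return a copy of `pedigree` in which a fraction `rate` of the
    individuals with two known parents has the sire (sires_only=True)
    or both parents set to missing (None)."""
    observed = list(pedigree)
    complete = [i for i, (s, d) in enumerate(pedigree)
                if s is not None and d is not None]
    n_missing = round(rate * len(complete))  # assumption: exact number of gaps
    for i in rng.sample(complete, n_missing):
        sire, dam = pedigree[i]
        observed[i] = (None, dam) if sires_only else (None, None)
    return observed


# Hypothetical toy pedigree: 4 founders followed by 10 offspring.
rng = random.Random(7)
true_pedigree = [(None, None)] * 4 + [(i % 2, 2 + i % 2) for i in range(10)]
observed_pedigree = add_missing_parents(true_pedigree, 0.3, rng)
n_gaps = sum(rec == (None, None) for rec in observed_pedigree[4:])
print(n_gaps)
```

Note that the observed pedigree alone no longer distinguishes the blanked records from the four true founders, which is the problem the recommendations in this paper address.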
WSI and MPI combined
The combined effect of WSI and MPI was investigated by applying 0% to 100% MPI on the standard population with 10% WSI.
Correction for missing pedigree information
Kinship can be corrected for MPI. VanRaden has stated that unknown parents should be related to all other parents by twice the mean inbreeding level of the period. In this study, the average mean kinship among parents was used instead of the mean inbreeding level.
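The principle behind such a correction can be illustrated with a small variation on the tabular method. This is a simplified sketch and not VanRaden's full algorithm or the implementation used in this study: the function `corrected_kinship_matrix` and the per-individual list `unknown_kinship` (the kinship assumed between an unknown parent of individual i and every other animal, 0 for true founders) are assumptions introduced here.

```python
def corrected_kinship_matrix(pedigree, unknown_kinship):
    """Tabular method in which an unknown parent of individual i is assumed
    to have kinship unknown_kinship[i] with every other animal, rather
    than 0. Use 0.0 for true founders."""
    n = len(pedigree)
    f = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        u = unknown_kinship[i]
        for j in range(i):
            f_sire = f[sire][j] if sire is not None else u
            f_dam = f[dam][j] if dam is not None else u
            f[i][j] = f[j][i] = 0.5 * (f_sire + f_dam)
        both_known = sire is not None and dam is not None
        f_parents = f[sire][dam] if both_known else u
        f[i][i] = 0.5 * (1.0 + f_parents)
    return f


# Animals 2 and 3 are truly full sibs of founders 0 and 1,
# but the parent records of animal 3 are missing.
observed_pedigree = [(None, None), (None, None), (0, 1), (None, None)]

uncorrected = corrected_kinship_matrix(observed_pedigree, [0.0, 0.0, 0.0, 0.0])
# Assumed correction value: the average mean kinship among the parents 0 and 1,
# (0.5 + 0.5 + 0 + 0) / 4 = 0.25.
corrected = corrected_kinship_matrix(observed_pedigree, [0.0, 0.0, 0.0, 0.25])
print(uncorrected[3][2], corrected[3][2])
```

Without the correction, animal 3 appears unrelated to its full sib; with it, the observed kinship is no longer zero. How close the corrected value comes to the true kinship depends on the population and on the value assumed for unknown parents.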
For each replicate, both true and observed kinships were calculated between all pairs of individuals from the 10th generation using the tabular method. The effect of WSI and/or MPI was investigated by comparing true and observed kinships using two types of criteria: (1) statistical criteria and (2) a diversity criterion.
Three statistical criteria were used for the analysis: (1) the correlation between true and observed kinships (ρ), the square of which measures the proportion of the variance in true kinship explained by observed kinship; (2) the regression coefficient of observed kinship on true kinship (β1), which is a measure of bias in the observed differences in kinship among pairs of individuals; and (3) the regression coefficient of true kinship on observed kinship (β2), which indicates whether observed kinship is an "unbiased" prediction of true kinship. In practice, the latter is important since conservation decisions are based on observed kinship and not on true values. Kinship of individuals with themselves was excluded from all three statistical criteria.
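The three criteria can be computed from a pair of kinship matrices as sketched below. The function name `kinship_statistics` and the toy matrices are assumptions for this example; only the off-diagonal elements are used, as stated above.

```python
from math import sqrt


def kinship_statistics(f_true, f_obs):
    """Return (rho, beta1, beta2) over all pairs i != j.

    rho   correlation between true and observed kinship
    beta1 regression of observed kinship on true kinship
    beta2 regression of true kinship on observed kinship
    """
    n = len(f_true)
    pairs = [(f_true[i][j], f_obs[i][j]) for i in range(n) for j in range(i)]
    m = len(pairs)
    mean_t = sum(t for t, _ in pairs) / m
    mean_o = sum(o for _, o in pairs) / m
    ss_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    ss_o = sum((o - mean_o) ** 2 for _, o in pairs)
    sp = sum((t - mean_t) * (o - mean_o) for t, o in pairs)
    rho = sp / sqrt(ss_t * ss_o)
    beta1 = sp / ss_t  # slope of observed on true
    beta2 = sp / ss_o  # slope of true on observed
    return rho, beta1, beta2


# Hypothetical example: observed kinships are exactly half the true ones.
f_true = [[0.5, 0.25, 0.125],
          [0.25, 0.5, 0.0],
          [0.125, 0.0, 0.5]]
f_obs = [[0.5, 0.125, 0.0625],
         [0.125, 0.5, 0.0],
         [0.0625, 0.0, 0.5]]
rho, beta1, beta2 = kinship_statistics(f_true, f_obs)
print(rho, beta1, beta2)
```

In this example the ranking of pairs is preserved perfectly (ρ = 1), yet differences in kinship are underestimated (β1 = 0.5) and observed kinship is a biased predictor of true kinship (β2 = 2). In general β1 · β2 = ρ², so the three criteria are not independent.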
Though statistical criteria are informative, they do not directly reveal the amount of genetic diversity conserved when observed pedigrees are used in practice. We therefore also applied a diversity criterion, DS, which evaluates the Diversity Saved when optimal contributions are based on observed pedigrees. DS was calculated from three underlying diversity measures, which are expressed on the scale of founder genome equivalents (FGE). FGEs are the number of equally contributing founders with no random loss of founder alleles in descendants that would be expected to produce the same genetic diversity (or kinship) as the population under study [20, 21]. This scale is expressed in natural numbers and is easier to interpret than probabilities or percentages. The three underlying diversity measures were (1) $N_{EC}$, the genetic diversity conserved when equal contributions were applied; (2) $N_{OC}$, the genetic diversity conserved when OCS was applied based on true kinship; and (3) $\tilde{N}_{OC}$, the genetic diversity conserved when OCS was applied based on observed kinship (hence the '~').
All three measures were calculated as
$$N = \frac{1}{2\,\mathbf{c}'\mathbf{F}\mathbf{c}} \qquad (1)$$
where F is a matrix of true kinships among all individuals, including kinship of individuals with themselves, and c is a column vector of proportional contributions of the candidate parents (always the 100 animals of the 10th generation) to the future generation, so that the sum of the elements of c equals one. By varying the contributions of individuals (c), the average mean kinship among candidates, $\mathbf{c}'\mathbf{F}\mathbf{c}$, and thus the average mean kinship in future generations, can be increased or decreased.
$N_{EC}$ was calculated by substituting c in Equation 1 with $\mathbf{c}_{EC}$, which is a vector of equal contributions per candidate parent, so that the sum of the elements of $\mathbf{c}_{EC}$ equals one. $N_{EC}$ is simply the average mean kinship of the current population, expressed on the scale of FGE.
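As a numerical illustration, the sketch below evaluates diversity on the FGE scale under equal contributions. It assumes that Equation 1 has the standard founder-genome-equivalent form N = 1 / (2 c'Fc); the function names and the toy kinship matrices are hypothetical.

```python
def founder_genome_equivalents(F, c):
    """Diversity on the FGE scale, assuming N = 1 / (2 c'Fc)."""
    n = len(c)
    mean_kinship = sum(c[i] * F[i][j] * c[j] for i in range(n) for j in range(n))
    return 1.0 / (2.0 * mean_kinship)


def equal_contributions(n):
    """Vector c_EC: every candidate contributes 1/n."""
    return [1.0 / n] * n


# Four unrelated, non-inbred founders: self-kinship 0.5, kinship 0 otherwise.
F_founders = [[0.5 if i == j else 0.0 for j in range(4)] for i in range(4)]
n_ec_founders = founder_genome_equivalents(F_founders, equal_contributions(4))

# Two full sibs (kinship 0.25) plus one unrelated animal.
F_sibs = [[0.5, 0.25, 0.0],
          [0.25, 0.5, 0.0],
          [0.0, 0.0, 0.5]]
n_ec_sibs = founder_genome_equivalents(F_sibs, equal_contributions(3))
print(n_ec_founders, n_ec_sibs)
```

Four unrelated founders give exactly four founder genome equivalents, which is the sanity check one expects from the definition. Three animals of which two are full sibs represent only 2.25 founder genome equivalents.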
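$N_{OC}$ was calculated by substituting c in Equation 1 with the optimal contribution vector $\mathbf{c}_{OC}$, which minimizes the average mean kinship among candidates:
$$\mathbf{c}_{OC} = \frac{\mathbf{F}^{-1}\mathbf{1}}{\mathbf{1}'\mathbf{F}^{-1}\mathbf{1}} \qquad (2)$$
where 1 is a column vector of ones. When negative contributions were obtained, the most negative contribution was set to zero and vector $\mathbf{c}_{OC}$ was recalculated until all contributions were non-negative. This method does not necessarily find the true optimal solution. True optimum was always found, however, when contributions were not fixed a . $N_{OC}$ measures the diversity that could be obtained in future generations (assuming overlap); a practical example is the selection of animals for a gene bank to reconstruct a future population.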
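The iterative procedure described above can be sketched as follows, assuming that Equation 2 has the usual closed form c = F⁻¹1 / (1'F⁻¹1). The helper `solve`, the function `optimal_contributions`, the tolerance `tol`, and the toy kinship matrix are assumptions for this example.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def optimal_contributions(F, tol=1e-12):
    """Contributions minimizing c'Fc subject to sum(c) = 1. The most negative
    contribution is fixed at zero and the rest recalculated until none is
    negative."""
    active = list(range(len(F)))
    while True:
        sub = [[F[i][j] for j in active] for i in active]
        x = solve(sub, [1.0] * len(active))  # F^-1 1
        total = sum(x)                       # 1' F^-1 1
        c_active = [v / total for v in x]
        worst = min(range(len(active)), key=lambda k: c_active[k])
        if c_active[worst] >= -tol:
            break
        del active[worst]  # this candidate's contribution stays at zero
    c = [0.0] * len(F)
    for i, v in zip(active, c_active):
        c[i] = max(v, 0.0)
    return c


# Hypothetical candidates: one sire (index 0) and four of his offspring,
# each from a different unrelated dam (paternal half sibs).
F = [[0.5, 0.25, 0.25, 0.25, 0.25],
     [0.25, 0.5, 0.125, 0.125, 0.125],
     [0.25, 0.125, 0.5, 0.125, 0.125],
     [0.25, 0.125, 0.125, 0.5, 0.125],
     [0.25, 0.125, 0.125, 0.125, 0.5]]
c_oc = optimal_contributions(F)
print(c_oc)
```

In this toy case the unconstrained solution gives the sire a negative contribution (-1/7), because his genes are already fully represented through his offspring. After fixing it at zero, the four half sibs each receive a contribution of 0.25.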
$\tilde{N}_{OC}$ was calculated by substituting c in Equation 1 with the observed optimum contribution vector ($\tilde{\mathbf{c}}_{OC}$). $\tilde{\mathbf{c}}_{OC}$ was calculated by substituting F in Equation 2 with the matrix of observed kinship ($\tilde{\mathbf{F}}$). $\tilde{N}_{OC}$ measures the diversity obtained when OCS is applied to observed pedigrees.
DS evaluates the Diversity Saved when optimal contributions were based on observed pedigrees, $\tilde{N}_{OC} - N_{EC}$, as a fraction of the full amount of diversity that could have been saved with optimal contributions based on true pedigree data, $N_{OC} - N_{EC}$:
$$DS = \frac{\tilde{N}_{OC} - N_{EC}}{N_{OC} - N_{EC}} \qquad (3)$$
Equal contributions were used as a base of comparison, as this would be the logical selection method if no information on kinship is available.
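A small worked example may help to interpret DS. The sketch assumes the forms N = 1 / (2 c'Fc) for Equation 1 and DS = (Ñ_OC - N_EC) / (N_OC - N_EC) for Equation 3; the function names, the kinship matrix, and the contribution vectors are hypothetical. All three diversity measures are evaluated against the true kinship matrix, and only the contribution vector differs.

```python
def founder_genome_equivalents(F, c):
    """Diversity on the FGE scale, assuming N = 1 / (2 c'Fc)."""
    n = len(c)
    mean_kinship = sum(c[i] * F[i][j] * c[j] for i in range(n) for j in range(n))
    return 1.0 / (2.0 * mean_kinship)


def diversity_saved(F_true, c_observed, c_optimal):
    """Fraction of the achievable gain over equal contributions that is
    realized when contributions are derived from the observed pedigree."""
    n = len(F_true)
    n_ec = founder_genome_equivalents(F_true, [1.0 / n] * n)
    n_oc = founder_genome_equivalents(F_true, c_optimal)
    n_oc_obs = founder_genome_equivalents(F_true, c_observed)
    return (n_oc_obs - n_ec) / (n_oc - n_ec)


# True situation: animals 0 and 1 are full sibs, animal 2 is unrelated.
F_true = [[0.5, 0.25, 0.0],
          [0.25, 0.5, 0.0],
          [0.0, 0.0, 0.5]]
c_optimal = [2 / 7, 2 / 7, 3 / 7]  # optimum for F_true

# Hypothetical pedigree error: animals 1 and 2 are recorded as the full sibs,
# so the observed optimum favours animal 0 instead of animal 2.
c_observed = [3 / 7, 2 / 7, 2 / 7]

ds = diversity_saved(F_true, c_observed, c_optimal)
print(round(ds, 3))
```

Here DS is negative (about -1.43): contributions based on the erroneous pedigree conserve less diversity than equal contributions would. DS equals 1 when the observed optimum coincides with the true optimum and 0 when it does no better than equal contributions; it is undefined when the true optimum offers no gain over equal contributions.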
Note that in practice not all individuals can become parents, even when desired, which causes genetic drift. This could cause a setback in the genetic diversity gained for both equal-contribution and optimal-contribution schemes.
The 'observed $N_{OC}$' was calculated by substituting c and F in Equation 1 with $\tilde{\mathbf{c}}_{OC}$ and $\tilde{\mathbf{F}}$. Breeders only have observed pedigrees. Therefore, the true genetic diversity obtained due to optimal contributions ($\tilde{N}_{OC}$) is not known to breeders. Hence, the observed $N_{OC}$ is the genetic diversity that breeders predict to obtain, based on the observed pedigrees.
Optimal contribution selection scheme for multiple generations
Each generation, contributions were optimized under the restriction that males and females each contribute half of the genes to the next generation:
$$\tilde{\mathbf{c}}_{OC} = \tfrac{1}{2}\,\tilde{\mathbf{F}}^{-1}\mathbf{Q}'\left(\mathbf{Q}\tilde{\mathbf{F}}^{-1}\mathbf{Q}'\right)^{-1}\mathbf{1} \qquad (4)$$
where $\tilde{\mathbf{c}}_{OC}$ is a vector of proportional contributions of the n selection candidates to the next generation, so that the contributions of males within $\tilde{\mathbf{c}}_{OC}$ sum to 1/2 and the contributions of females within $\tilde{\mathbf{c}}_{OC}$ sum to 1/2, $\tilde{\mathbf{F}}$ is a matrix of kinship based on observed pedigrees, 1 is a column vector of ones, and Q is a (2 × n) design matrix indicating the sex of the selection candidates. When negative contributions were obtained, the most negative contribution was set to zero and $\tilde{\mathbf{c}}_{OC}$ was recalculated until all contributions were non-negative.
Next, these continuous contributions per candidate were converted into a desired number of offspring per candidate. Each generation, mating began with a randomly assigned male and female that produced progeny, until one of them reached its desired number of offspring. Then, another random male or female candidate was assigned to the remaining male or female in order to produce progeny until one of them reached its desired number of offspring. This was repeated until all selected candidates reached their desired number of offspring, and the last generation resulted in 100 individuals.
$\tilde{N}_{OC}$, $N_{OC}$ and $N_{EC}$ were obtained by five generations of selection using Equation 4: for $\tilde{N}_{OC}$, selection was based on pedigrees containing errors; for $N_{OC}$, selection was based on true pedigrees; and for $N_{EC}$, selection was based on MPI of 100% (a scenario that comes close to equal contributions). DS was then calculated using Equation 3.
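The sex-restricted optimization can be sketched as below. The closed form used here follows from minimizing c'F̃c subject to Qc = ½·1 with a (2 × n) sex-indicator matrix Q; whether it matches the exact notation of Equation 4 is an assumption. The names `solve`, `dot`, `sex_constrained_contributions`, and the toy data are hypothetical, and the conversion of contributions into numbers of offspring is not shown.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def sex_constrained_contributions(F_obs, is_male, tol=1e-12):
    """Minimize c'Fc with male and female contributions each summing to 1/2.
    Negative contributions are fixed at zero one at a time (most negative
    first) and the remaining contributions are recalculated."""
    active = list(range(len(F_obs)))
    while True:
        sub = [[F_obs[i][j] for j in active] for i in active]
        q_m = [1.0 if is_male[i] else 0.0 for i in active]  # male row of Q
        q_f = [1.0 - v for v in q_m]                        # female row of Q
        u_m = solve(sub, q_m)                               # F^-1 q_m
        u_f = solve(sub, q_f)                               # F^-1 q_f
        # Elements of the symmetric 2 x 2 matrix Q F^-1 Q'
        a, b, d = dot(q_m, u_m), dot(q_m, u_f), dot(q_f, u_f)
        det = a * d - b * b
        # (Q F^-1 Q')^-1 applied to the vector (1/2, 1/2)
        lam_m = 0.5 * (d - b) / det
        lam_f = 0.5 * (a - b) / det
        c_active = [lam_m * x + lam_f * y for x, y in zip(u_m, u_f)]
        worst = min(range(len(active)), key=lambda k: c_active[k])
        if c_active[worst] >= -tol:
            break
        del active[worst]
    c = [0.0] * len(F_obs)
    for i, v in zip(active, c_active):
        c[i] = max(v, 0.0)
    return c


# Hypothetical candidates: males 0 and 1, females 2 and 3.
# Male 0 and female 2 are full sibs; all other pairs are unrelated.
F_obs = [[0.5, 0.0, 0.25, 0.0],
         [0.0, 0.5, 0.0, 0.0],
         [0.25, 0.0, 0.5, 0.0],
         [0.0, 0.0, 0.0, 0.5]]
is_male = [True, True, False, False]
c = sex_constrained_contributions(F_obs, is_male)
print([round(v, 3) for v in c])
```

In this example the two related animals each receive 0.2 and the two unrelated animals 0.3, so that each sex contributes exactly one half.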
Results and discussion
Wrong sire information (WSI)
Simulations with larger population sizes or differences in sex ratio showed the same trend for β1, β2, ρ and DS as the standard population (results not shown). The slope of DS was less steep when the sex ratio was higher. For example, with a 1:1 sex ratio, DS decreased by about 0.022 with each 1% increase in WSI, and DS would reach zero at approximately 45% WSI.
Our results indicate a moderately negative influence of wrong parent information on the genetic diversity saved by means of OCS in panmictic (random-mating) populations. Our findings suggest that in a panmictic population with approximately 10 to 20% WSI, which is common in practice (Table 1), OCS would, on average, save more genetic diversity than equal contributions. In some cases, however, selection of parents by OCS might decrease diversity more than the application of equal contributions would; equal contributions do not carry that risk. Note that in real populations, dam information may also be wrong.
Missing parent information (MPI)
Overestimation of diversity is also shown by β2 (Figure 4). To avoid overestimation of the conserved genetic diversity, it is important that observed kinship is an "unbiased" predictor of true kinship, which requires that β2 equals one. In the case of WSI, β2 gradually decreases. The strong decrease of β2 in the case of MPI indicates that the amount of conserved genetic diversity will be overestimated when selecting the least related individuals based on observed kinship. Although β2 indicates overestimation (Figure 4), it does not predict the strong overestimation of the observed $N_{OC}$ in Figure 5.
A similar trend for DS was observed in simulations where only sires were missing, though DS behaved slightly differently. Logically, correlation for missing sire information decreased less rapidly than with both parents missing (results not shown).
OCS breeding scheme for multiple generations
The fraction of diversity saved (DS) after five generations of breeding by OCS based on observed pedigrees gradually decreased with increasing percentages of wrong sires (WSI). With WSI of 0%, DS is 1 by definition; with 10%, DS was 0.73; and with 25%, DS was 0.43. DS decreased by roughly 0.022 with each 1% increase in WSI. Extrapolation showed that DS would be zero at around 46% WSI.
This research investigated a panmictic population, assuming control over the population. In practice, species or populations differ in population structure due to aspects such as unequal sex ratio and/or a limited number of progeny per female. Conservationists have to consider these constraints. With an unequal sex ratio, for example, equal contributions cannot be applied; instead, optimal management of mate selection across multiple generations yields the lowest rates of increase of kinship [24, 25].
The results imply that using only pedigree information in conservation warrants caution.
On average, the genetic diversity saved by optimal contributions is reduced even at low percentages of WSI. If WSI is over 35%, optimal contributions preserve, on average, less genetic diversity than equal contributions. The impact of WSI on genetic diversity for a single population, however, might deviate from this average (Figure 3). In addition, when pedigrees in a panmictic population are known to contain more than approximately 15% wrong parent information (misidentified fathers plus mothers), conservationists should consider alternative breeding methods, because the expected gain is relatively low compared to alternatives such as optimal management of mate selection across multiple generations. Populations in need of conservation, however, often deviate from a panmictic population. Furthermore, the type of error expected should also be taken into consideration. This research investigated the worst type of WSI, in which the recorded sire is drawn at random. In practice, misidentified sires are sometimes related to the true sire, for example when natural mating occurs within herds. We also found that DS decreased more slowly due to VanRaden-corrected MPI (Figure 6) than due to WSI (Figure 4). In conclusion, wrong parent information above 15% might be acceptable in practice, depending on the type of error and the population structure.
Traditionally, MPI is bypassed in pedigree analysis by the assumption that animals with unknown parents are founders, resulting in an overestimation of the available genetic diversity. Optimal contributions are extremely sensitive to differences in kinship between candidates. Small differences in pedigree can make the difference between a significant and a zero contribution for an individual animal. Animals with gaps in their pedigree will be considered unrelated and will therefore be given high contributions. In this situation, equal contributions to each candidate parent would maintain diversity. Therefore, optimal contributions based on pedigrees with MPI can perform less well than equal contributions.
Overall, this indicates that even low percentages of MPI should always be corrected prior to the application of OCS. Even a simple correction of MPI by randomly assigning parents would increase diversity, although this would leave breeders with wrong parent information. However, more sophisticated solutions have been presented to correct for gaps in pedigrees. Ballou and Lacy have proposed the calculation of kinship based only on the portion of the genome that descends from true founder animals, excluding the proportion due to animals with unknown parents. VanRaden corrected gaps in pedigrees by assuming that unknown parents are related to all other parents by twice the average inbreeding level of that period. The VanRaden correction is occasionally applied to calculate kinship. Compared to the VanRaden correction, the Ballou and Lacy correction creates more variance among kinship values, which has a possible negative impact on OCS. Therefore, the VanRaden correction was applied to correct for MPI in this research.
We recommend two policies for conservation. First, measures that avoid errors in pedigrees are encouraged. One obvious measure is to sample animal tissue, since DNA can be used both for parentage analysis and for kinship estimation. Second, pedigree registrations, such as herd books, should include information on the status of animals without parent records: whether they are (1) founders (wild-caught or otherwise known to be unrelated) or (2) related and descending from founders. Within kinship calculation, the latter should always be corrected, for example by using VanRaden's or a similar algorithm.
We thank Sipke Joost Hiemstra, Jack Windig and Johan van Arendonk for their thorough comments on previous versions and two anonymous referees for comments and suggestions. This work was financed by the Ministry of Agriculture, Nature and Food Quality through the Centre for Genetic Resources, the Netherlands (CGN).
1. Ballou JD, Lacy RC: Identifying genetically important individuals for management of genetic variation in pedigreed populations. Population management for survival and recovery: analytical methods and strategies in small population conservation. Edited by: Ballou JD, Gilpin ME, Foose TJ. 1995, New York: Columbia University Press, 375: 76-111.
2. Frankham R, Ballou JD, Briscoe DA: Introduction to Conservation Genetics. 2002, Cambridge, UK: Cambridge University Press
3. Pong-Wong R, Woolliams JA: Optimisation of contribution of candidate parents to maximise genetic gain and restricting inbreeding using semidefinite programming (Open Access publication). Genetics Selection Evolution. 2007, 39: 3-25. 10.1051/gse:2006031.
4. Sonesson AK, Meuwissen THE: Minimization of rate of inbreeding for small populations with overlapping generations. Genetical Research. 2001, 77: 285-292. 10.1017/S0016672301005079.
5. Eding H, Crooijmans R, Groenen MAM, Meuwissen THE: Assessing the contribution of breeds to genetic diversity in conservation schemes. Genetics Selection Evolution. 2002, 34: 613-633. 10.1051/gse:2002027.
6. Oliehoek PA, Windig JJ, van Arendonk JAM, Bijma P: Estimating Relatedness Between Individuals in General Populations With a Focus on Their Use in Conservation Programs. Genetics. 2006, 173: 483-496. 10.1534/genetics.105.049940.
7. Earnhardt JM, Thompson SD, Schad K: Strategic planning for captive populations: Projecting changes in genetic diversity. Animal Conservation. 2004, 7: 9-16. 10.1017/S1367943003001161.
8. Sanders K, Bennewitz J, Kalm E: Wrong and missing sire information affects genetic gain in the Angeln dairy cattle population. Journal of Dairy Science. 2006, 89: 315-
9. Laughlin AM, Waldron DF, Craddock BF, Engdahl GR, Dusek RK, Huston JE, Lupton CJ, Ueckert DN, Shay TL, Cockett NE: Use of DNA markers to determine paternity in a multiple-sire mating flock. Sheep and Goat Research Journal. 2003, 18: 14-17.
10. Kavar T, Brem G, Habe F, Sölkner J, Dovc P: History of Lipizzan horse maternal lines as revealed by mtDNA analysis. Genetics Selection Evolution. 2002, 34: 635- 10.1051/gse:2002028.
11. Visscher PM, Woolliams JA, Smith D, Williams JL: Estimation of pedigree errors in the UK dairy population using microsatellite markers and the impact on selection. Journal of Dairy Science. 2002, 85: 2368-
12. Spelman RJ: Utilisation of molecular information in dairy cattle breeding. Proceedings of the 7th World Congress on Genetics Applied to Livestock Production. 2002, 11-17.
13. Crawford AM, Tate ML, McEwan JC, Kumaramanickavel G, McEwan KM, Dodds KG, Swarbrick PA, Thompson P: How reliable are sheep pedigrees?. Proceedings of the New Zealand Society of Animal Production. 1993, 53: 363-366.
14. Wang S, Foote WC: Protein polymorphism in sheep pedigree testing. Theriogenology. 1990, 34: 1079-1085. 10.1016/S0093-691X(05)80007-X.
15. Bovenhuis H, Van Arendonk JA: Estimation of milk protein gene frequencies in crossbred cattle by maximum likelihood. Journal of Dairy Science. 1991, 74: 2728-
16. Ron M, Domochovsky R, Golik M, Seroussi E, Ezra E, Shturman C, Weller JI: Analysis of Vaginal Swabs for Paternity Testing and Marker-Assisted Selection in Cattle. Journal of Dairy Science. 2003, 86: 1818-1820.
17. Weller JI, Feldmesser E, Golik M, Tager-Cohen I, Domochovsky R, Alus O, Ezra E, Ron M: Factors Affecting Incorrect Paternity Assignment in the Israeli Holstein Population. Journal of Dairy Science. 2004, 87: 2627-2640.
18. Emik LO, Terrill CE: Systematic Procedures For Calculating Inbreeding Coefficients. Journal of Heredity. 1949, 40: 51-55.
19. VanRaden PM: Accounting for Inbreeding and Crossbreeding in Genetic Evaluation of Large Populations. Journal of Dairy Science. 1992, 75: 3136-3144.
20. Caballero A, Toro MA: Interrelations between effective population size and other pedigree tools for the management of conserved populations. Genetical Research. 2000, 75: 331-343. 10.1017/S0016672399004449.
21. Lacy RC: Analysis of founder representation in pedigrees: Founder equivalents and founder genome equivalents. Zoo Biology. 1989, 8: 111- 10.1002/zoo.1430080203.
22. Hoffrage U, Lindsey S, Hertwig R, Gigerenzer G: Communicating statistical information. Science. 2000, 290: 2261- 10.1126/science.290.5500.2261.
23. Meuwissen THE: Maximizing the response of selection with a predefined rate of inbreeding. Journal of Animal Science. 1997, 75: 934-940.
24. Fernandez J, Toro MA, Caballero A: Fixed contributions designs vs. minimization of global coancestry to control inbreeding in small populations. Genetics. 2003, 165: 885-894.
25. Sánchez L, Bijma P, Woolliams JA: Minimizing inbreeding by managing genetic contributions across generations. Genetics. 2003, 164: 1589-1595.
26. Cole JB: PyPedal: A computer program for pedigree analysis. Computers and Electronics in Agriculture. 2007, 57: 107-113. 10.1016/j.compag.2007.02.002.
27. Ballou JD: Ancestral inbreeding only minimally affects inbreeding depression in mammalian populations. Journal of Heredity. 1997, 88: 169-
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.