Pedigree and marker information requirements to monitor genetic variability

There are several measures available to describe the genetic variability of populations. The average inbreeding coefficient of a population based on pedigree information is a frequently chosen option. Due to the developments in molecular genetics it is also possible to calculate inbreeding coefficients based on genetic marker information. A simulation study was carried out involving ten sires and 50 dams. The animals were mated over a period of 20 discrete generations. The population size was kept constant. Different situations with regard to the level of polymorphism and initial allele frequencies and mating scheme (random mating, avoidance of full sib mating, avoidance of full sib and half sib mating) were considered. Pedigree inbreeding coefficients of the last generation using full pedigree or 10, 5 and 2 generations of the pedigree were calculated. Marker inbreeding coefficients based on different sets of microsatellite loci were also investigated. Under random mating, pedigree-inbreeding coefficients are clearly more closely related to true autozygosity (i.e., the actual proportion of loci with alleles identical by descent) than marker-inbreeding coefficients. If mating is not random, the demands on the quality and quantity of pedigree records increase. Greater attention must be paid to the correct parentage of the animals.


INTRODUCTION
In the initial stages of conservation, populations may not be concerned with genetic progress, but simply with conserving genetic variation. This means that the rate of inbreeding should be minimised. Various suggestions have been made to achieve this. In principle, two questions have to be answered: which animals to select and how to mate them? Caballero and Toro [3,4] discuss that the optimal choice of breeding individuals requires minimisation of the average coancestry among the reproductive individuals weighted by their contribution to the next generation. The same authors point out that the choice of the mating system is less simple because it depends on the time scale of interest. In many practical breeding programmes the interest of conservation is more in the short rather than in the long term. In this case, avoidance of inbred matings seems to be appropriate [4]. If special breeding strategies are a condition for the financial support of endangered breeds, a critical judgement of the mating system becomes more important. This can be done with measures based on molecular genetic information [16]. Another way of measuring the mating system within a population is the comparison of the expected inbreeding coefficient under random mating (mean kinship in generation t) and the observed mean inbreeding coefficient in generation t + 1 based on pedigree information. This simple comparison allows a statement whether the average level of inbreeding is higher or lower than that expected under random mating conditions.
One weakness, especially of the latter option, is that the inbreeding coefficient depends very much on the quality of pedigree information. Developments in molecular genetics make it possible to calculate several measures based on genetic marker information. The aim of this study was to compare measures based on pedigree or genetic marker information with regard to the monitoring of endangered populations. Minimum requirements for the quantity and quality of the underlying source of information necessary to detect autozygous individuals (i.e., individuals with a high proportion of alleles identical by descent) were investigated. The correlation between such measures and true autozygosity serves as an indicator of the quality of the underlying source of information.

SIMULATION STUDY
A simulation study was carried out for a population with ten sires and 50 dams. For each animal a genome was modelled consisting of 20 pairs of chromosomes with 50 loci each. In most situations, a total length of 30 Morgans for the whole genome was assumed. Two further genome lengths of 100 Morgans and 10 Morgans were investigated as well. The recombination rate between neighbouring loci was 0.03, 0.1 and 0.01, respectively. All loci were assumed to be neutral with regard to selection. No mutation events were modelled.
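The link between total genome length and the spacing of neighbouring loci can be checked with simple arithmetic. The sketch below is only an illustration: it assumes evenly spaced loci, divides the genome length by the total number of loci, and treats map distance and recombination rate as interchangeable for short intervals. The function name `spacing_per_locus` is hypothetical.

```python
# Illustrative arithmetic only: average map distance between neighbouring
# loci, assuming evenly spaced loci (an assumption made for this sketch).
N_CHROMOSOMES = 20
LOCI_PER_CHROMOSOME = 50


def spacing_per_locus(genome_length_morgans: float) -> float:
    """Genome length divided by the total number of simulated loci."""
    total_loci = N_CHROMOSOMES * LOCI_PER_CHROMOSOME  # 1000 loci
    return genome_length_morgans / total_loci


for length in (30, 100, 10):
    print(length, "Morgans ->", spacing_per_locus(length))
```

For short intervals the map distance in Morgans is close to the recombination rate, which is why the two are used interchangeably here.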
Three simple mating schemes were examined: Random mating (scheme I), mating of full sibs was avoided (scheme II) and a third scheme in which mating of half sibs and full sibs was not permitted (scheme III). In all schemes, each female was permitted to produce a maximum of two offspring (full sibs). Ten males and 50 females were generated as potential parents for the next generation. Animals were observed over a period of 20 discrete generations.
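The three mating schemes differ only in which pairs are excluded. A minimal sketch of such a filter is given here; the `Animal` record and the function names are hypothetical, and the rule that half sibs share exactly one known parent is an assumption of this illustration rather than a description of the simulation program.

```python
from typing import NamedTuple, Optional


class Animal(NamedTuple):
    ident: int
    sire: Optional[int]
    dam: Optional[int]


def are_full_sibs(a: Animal, b: Animal) -> bool:
    """Both parents known and shared."""
    return (a.sire is not None and a.dam is not None
            and a.sire == b.sire and a.dam == b.dam)


def are_half_sibs(a: Animal, b: Animal) -> bool:
    """Exactly one known parent shared."""
    share_sire = a.sire is not None and a.sire == b.sire
    share_dam = a.dam is not None and a.dam == b.dam
    return share_sire != share_dam


def mating_allowed(male: Animal, female: Animal, scheme: str) -> bool:
    """Scheme I: random; II: no full sibs; III: no full or half sibs."""
    if scheme == "I":
        return True
    if are_full_sibs(male, female):
        return False
    if scheme == "III" and are_half_sibs(male, female):
        return False
    return True


brother = Animal(3, sire=1, dam=2)
sister = Animal(4, sire=1, dam=2)
paternal_half_sister = Animal(6, sire=1, dam=5)
print(mating_allowed(brother, sister, "I"))                 # True
print(mating_allowed(brother, sister, "II"))                # False
print(mating_allowed(brother, paternal_half_sister, "II"))  # True
print(mating_allowed(brother, paternal_half_sister, "III")) # False
```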
The level of true autozygosity (proportion of loci with alleles identical by descent) and homozygosity (proportion of loci with alleles alike in state) at the loci of the whole genome was investigated for each animal of a reference population. The reference population was defined as the last simulated generation. True autozygosity was used as a reference for the measures described below.
The correlation between true autozygosity and several measures based on pedigree or marker information was calculated within each repetition for all animals of the reference population. The correlations presented in this paper are means and corresponding standard deviations of 100 repetitions for each situation.
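The summary statistic described above (a correlation within each repetition, then mean and standard deviation over repetitions) could be computed along the following lines. This is a sketch under the assumption that each repetition provides two equally long lists of per-animal values; the names `pearson` and `summarise` are hypothetical.

```python
import statistics


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def summarise(repetitions):
    """Mean and standard deviation of the within-repetition correlations.

    `repetitions` is a list of (true_autozygosity, measure) pairs, one
    pair of per-animal lists for each repetition.
    """
    rs = [pearson(true, measure) for true, measure in repetitions]
    return statistics.fmean(rs), statistics.stdev(rs)


# Toy data for two repetitions with three animals each (made-up numbers).
toy = [
    ([0.10, 0.20, 0.30], [0.05, 0.15, 0.35]),
    ([0.12, 0.25, 0.18], [0.10, 0.20, 0.22]),
]
mean_r, sd_r = summarise(toy)
print(mean_r, sd_r)
```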

Measures based on pedigree information
When pedigree inbreeding coefficients are computed in the sense of Malécot [13] or Wright [21], it is necessary to define the base population to which the present inbreeding is referred. In this simulation a "real" base population with unrelated individuals is present. In such a situation, the average inbreeding coefficient of the reference population can be taken as a measure of the true autozygosity. Under practical circumstances the "real" base population is never known; gaps occur very often, and false parentage sometimes, in pedigree records. These pedigree weaknesses influence the value of our measures of autozygosity. Three situations with regard to the quality of pedigrees were considered.

Length of pedigrees
Using a method described by VanRaden [19], we calculated inbreeding coefficients taking only 2, 5, 10 or the complete 20 generations into account. Studies on the genetic variability of several cattle breeds in Austria [15] and France [2] showed that a maximum number of 10 to 18 generations of animals in a defined reference population could be traced back. In the case of two highly endangered cattle breeds in Austria [1], this number was clearly lower (6 and 9).
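The effect of pedigree length can be illustrated with a small sketch. It does not reproduce the method of VanRaden [19]; it uses the plain recursive kinship definition and simply treats ancestors beyond the chosen depth as unrelated founders. It assumes discrete generations and identifiers numbered so that parents always have smaller numbers than their offspring. The names `truncate` and `inbreeding` are hypothetical.

```python
from functools import lru_cache


def truncate(pedigree, animal, generations):
    """Keep parent links only for `generations` generations back.

    `pedigree` maps id -> (sire, dam); animals absent from the returned
    dict are treated as unrelated founders.
    """
    kept = {}
    frontier = [animal]
    for _ in range(generations):
        next_frontier = []
        for a in frontier:
            parents = pedigree.get(a, (None, None))
            kept[a] = parents
            next_frontier += [p for p in parents if p is not None]
        frontier = next_frontier
    return kept


def inbreeding(pedigree, animal):
    """Inbreeding coefficient = kinship between the two parents."""

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0
        if a == b:
            s, d = pedigree.get(a, (None, None))
            return 0.5 * (1.0 + kinship(s, d))
        if a < b:
            a, b = b, a  # always recurse through the younger animal
        s, d = pedigree.get(a, (None, None))
        return 0.5 * (kinship(s, b) + kinship(d, b))

    sire, dam = pedigree.get(animal, (None, None))
    return kinship(sire, dam)


# Animal 5 is the offspring of full sibs 3 and 4.
ped = {3: (1, 2), 4: (1, 2), 5: (3, 4)}
print(inbreeding(ped, 5))                  # 0.25
print(inbreeding(truncate(ped, 5, 1), 5))  # 0.0, grandparents unknown
print(inbreeding(truncate(ped, 5, 2), 5))  # 0.25
```

In the toy pedigree the common ancestors are grandparents, so a pedigree that records only the parents no longer reveals the inbreeding of animal 5.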

Completeness of pedigrees
The maximum number of traceable generations gives no reliable information about gaps in the pedigree. A good way of describing the quality of a pedigree is the average complete generation equivalent (i.e., number of generations in a comparable complete pedigree) [2]. This measure was found to be very high (15.22) in Lipizzan horses [22], but clearly lower (1.73-6.18) in many cattle breeds [1,2,15]. We reduced the simulated pedigrees to mirror the quality of pedigrees in a rare Austrian cattle breed. The maximum number of traceable generations was set to 6 and known ancestors in the reduced pedigree were randomly exchanged against unknown ancestors. This resulted in average complete generation equivalents of 2 to 3. These strongly reduced pedigrees were used to calculate inbreeding coefficients.
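The complete generation equivalent is commonly computed as the sum of (1/2)^n over all known ancestors, where n is the number of generations separating the ancestor from the animal. The sketch below uses that common definition, which is assumed here to match the one in [2]; the function name and the toy pedigrees are hypothetical.

```python
def complete_generation_equivalent(pedigree, animal):
    """Sum of (1/2)**n over all known ancestors n generations back.

    `pedigree` maps id -> (sire, dam); unknown parents are None and
    animals without an entry are treated as having unknown parents.
    """
    total = 0.0
    for parent in pedigree.get(animal, (None, None)):
        if parent is not None:
            # The parent itself counts 1/2; its own ancestors are
            # discounted by a further factor of 1/2 per generation.
            total += 0.5 * (1.0 + complete_generation_equivalent(pedigree, parent))
    return total


complete = {3: (1, 2), 4: (1, 2), 5: (3, 4)}
with_gap = {3: (1, None), 4: (1, 2), 5: (3, 4)}
print(complete_generation_equivalent(complete, 5))  # 2.0
print(complete_generation_equivalent(with_gap, 5))  # 1.75
```

A pedigree that is complete for two generations gives a value of exactly 2, and each unknown ancestor lowers the value according to how recent the gap is.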

Correctness of pedigrees
Errors in pedigrees are known to occur due to mis-mothering, misidentification and incorrect recording procedures. Several studies [5,7,14] show that the misidentification rate in cattle pedigrees varies between about 3 and over 20%. Even in Lipizzan horse pedigrees where a great importance is attached to correct pedigree recording, a small number of pedigree errors has been revealed by mtDNA analysis [10]. To take this into account, 1, 5, 10 and 20% of the sires were exchanged randomly against wrong animals in each generation. Inbreeding coefficients were calculated according to these incorrect pedigrees.
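Introducing false paternity into simulated records could look roughly like the following sketch. It is an assumption of this illustration that each record is corrupted independently with the given probability, so the realised error rate varies around the target value; the function name `corrupt_sires` is hypothetical.

```python
import random


def corrupt_sires(generation, candidate_sires, error_rate, rng):
    """Return a copy of `generation` with some sires replaced by wrong ones.

    `generation` maps id -> (sire, dam). Each record is changed with
    probability `error_rate`; the wrong sire is drawn from the other
    candidate sires of the same generation.
    """
    corrupted = {}
    for animal, (sire, dam) in generation.items():
        if rng.random() < error_rate:
            wrong = [s for s in candidate_sires if s != sire]
            sire = rng.choice(wrong)
        corrupted[animal] = (sire, dam)
    return corrupted


rng = random.Random(1)
sires = list(range(1, 11))            # ten sires
records = {100 + i: (sires[i % 10], 50 + i) for i in range(60)}
noisy = corrupt_sires(records, sires, 0.20, rng)
changed = sum(records[a][0] != noisy[a][0] for a in records)
print(changed, "of", len(records), "sire records changed")
```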

Measures based on genetic marker information
Molecular technologies provide direct information on genotypes at polymorphic loci. Therefore it is possible to analyse the system of mating of a population as a deviation from the heterozygosity expected under Hardy-Weinberg equilibrium using the following formula [8,16]:

$$F = \frac{H_e - H_o}{H_e} \qquad (1)$$

where $H_e$ is the expected heterozygosity calculated from allele frequencies in a defined base population with random mating, and $H_o$ is the observed heterozygosity in a reference population. We used a similar formula to derive individual inbreeding coefficients based on marker information:

$$F = \frac{\sum_{L=1}^{n} H_{eL} - \sum_{L=1}^{n} H_{oL}}{\sum_{L=1}^{n} H_{eL}}$$

where $H_{eL}$ is the expected heterozygosity for marker locus $L$ (with $L = 1, 2, 3, \ldots, n$) derived from the allele frequencies at locus $L$ in the base population and $H_{oL}$ the observed heterozygosity at locus $L$. Assuming that all alleles at homozygous loci in the simulated true base population are alike in state but not identical by descent, the increase of homozygosity (i.e. the proportion of homozygous loci) can be used to estimate the true autozygosity of a reference population. In addition to these marker-based inbreeding coefficients, the level of homozygosity was calculated for each animal. In reality, the number of analysed marker loci is restricted, and allele frequencies in the "true" base population are usually not known. Also, genetic markers differ in polymorphism and allele frequencies. Several scenarios described below were considered.
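An individual marker-based coefficient of this kind can be sketched as follows. The form used, one minus the ratio of summed observed to summed expected heterozygosity over loci, is an assumption of this illustration, as are the function names. Expected heterozygosity at a locus is computed as one minus the sum of squared allele frequencies in the base population.

```python
from collections import Counter


def expected_heterozygosity(base_genotypes):
    """He at one locus: 1 - sum(p_i**2) from base-population genotypes.

    `base_genotypes` is a list of (allele, allele) tuples.
    """
    alleles = [a for genotype in base_genotypes for a in genotype]
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())


def marker_inbreeding(individual, he_per_locus):
    """1 - (observed heterozygous loci) / (sum of expected heterozygosity).

    `individual` is a list of (allele, allele) tuples, one per locus, in
    the same order as `he_per_locus`.
    """
    observed = sum(1 for a, b in individual if a != b)
    return 1.0 - observed / sum(he_per_locus)


# Two loci, each with two equally frequent alleles in the base population.
base_locus = [(1, 2), (1, 2), (1, 1), (2, 2)]
he = [expected_heterozygosity(base_locus)] * 2   # 0.5 at each locus
print(marker_inbreeding([(1, 1), (2, 2)], he))   # 1.0, fully homozygous
print(marker_inbreeding([(1, 2), (2, 2)], he))   # 0.0, as expected under HWE
print(marker_inbreeding([(1, 2), (1, 2)], he))   # -1.0, excess heterozygosity
```

The last line shows that the coefficient becomes negative when an animal is more heterozygous than expected from the base-population allele frequencies.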

Number of genetic markers
A marker inbreeding coefficient was calculated for different sets of 20, 50, 100 and 200 marker loci equally spaced over the whole genome to cover information from each chromosome. Genetic markers were assumed to be fully informative.

Table I. Assumed situations with regard to the number of alleles per locus and the initial allele frequencies in the base population.

Number of alleles per marker locus and allele frequencies
Various types of genetic markers are currently used. We considered one situation with seven different marker alleles per locus in the base population to mimic a microsatellite marker. In addition, marker loci with two alleles with different frequencies in the base population were simulated to evaluate the effect of SNP markers. The marker loci represent just a small part of the total genome which was modelled in the same way as the marker loci (Tab. I). This results in an ideal situation because marker loci mirror the rest of the genome. All loci were assumed to be in Hardy-Weinberg equilibrium.

Definition of the base population
Inbreeding coefficients must be related to a base population or they are meaningless [9]. The expected heterozygosity for each marker locus was calculated from allele frequencies in the true base population and for base populations 2, 5 and 10 generations back from the reference population.

Level of autozygosity and average pedigree inbreeding coefficients
Table II gives an overview of the results for pedigree inbreeding coefficients under random mating. The average inbreeding coefficient of animals in the reference population is a good measure of true autozygosity when pedigrees can be traced back to the true base population. This is still the case for complete pedigrees with 20% false parentage in each generation. With pedigrees reduced in length, true autozygosity is severely underestimated. Generally the standard deviations of replicates were similar for true average autozygosity and average pedigree inbreeding coefficients.
As expected, the level of true autozygosity was lower after 20 generations with avoidance of mating with close relatives (Tab. II). The potential to infer the average level of true autozygosity from pedigree inbreeding coefficients was not influenced by the mating scheme.
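For orientation, the level of inbreeding to be expected after 20 generations of random mating can be approximated with the textbook formulas for effective population size with unequal numbers of sires and dams. This is a rough sketch that is not taken from the simulation itself: it ignores the restriction of two offspring per female, which changes the variance of family size, and the function names are hypothetical.

```python
def effective_size(n_sires: int, n_dams: int) -> float:
    """Wright's approximation: Ne = 4 * Nm * Nf / (Nm + Nf)."""
    return 4.0 * n_sires * n_dams / (n_sires + n_dams)


def expected_inbreeding(n_sires: int, n_dams: int, generations: int) -> float:
    """F_t = 1 - (1 - dF)**t with dF = 1 / (2 * Ne)."""
    delta_f = 1.0 / (2.0 * effective_size(n_sires, n_dams))
    return 1.0 - (1.0 - delta_f) ** generations


print(effective_size(10, 50))                      # about 33.3
print(round(expected_inbreeding(10, 50, 20), 3))   # about 0.26
```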

Relationship between true autozygosity and pedigree inbreeding coefficients
Even with 2-generation pedigrees, the correlation between autozygosity and pedigree inbreeding coefficients was rather high (0.670) under random mating (Tab. II). It therefore seems possible to identify the most autozygous animals provided that parents and grandparents are known. Taking more than five generations of a correct pedigree into account leads to only a marginal increase in the correlation between pedigree inbreeding coefficients and autozygosity. Under random mating, inbreeding coefficients based on pedigrees of very low quality (reduced pedigrees) were still closely related to true autozygosity (0.613). With 10% and 20% incorrect paternity in a complete pedigree, this correlation dropped to 0.58 and 0.40, respectively. False parentage of 20% or more therefore seems to be a more severe problem for the identification of the most autozygous animals than incompleteness or shortness of pedigrees. With incomplete or short pedigrees, the autozygosity of all animals within a generation is underestimated to a similar extent, so the ranking of animals is largely preserved. With false parentage, two types of error can occur when using pedigree inbreeding coefficients: underestimation and overestimation of the true autozygosity of individuals. Because correct parents are randomly replaced by false ones, the relationship between true autozygosity and pedigree inbreeding coefficients becomes less close and the correlation drops. A noticeably higher standard deviation of the mean correlation over the 100 repetitions was observed for pedigrees with false parentage than for short and incomplete pedigrees.
In conservation breeding programmes, mating of close relatives is purposely avoided. Common ancestors could not be detected in short (two-generation) and strongly reduced pedigrees when full- and half-sib mating was strictly avoided (Tab. II). With false parentage it is almost impossible to make a statement about which individuals are highly autozygous.

Level of autozygosity and marker-based inbreeding coefficients
The results of inbreeding coefficients based on genetic marker information with an underlying total length of the genome of 30 Morgans are shown in Tables III and IV. Table III comprises the results for microsatellite markers (situation A). As in the case of pedigree-inbreeding coefficients, the true base population must be known to get a good estimate of the true level of autozygosity in a population. Otherwise true autozygosity is underestimated and the average marker-based inbreeding coefficient simply estimates the increase in homozygosity with regards to the defined base population (2, 5 or 10 generations back). In reality, allele frequencies several generations back are usually not known. In such a situation, no meaningful results for the level of autozygosity are obtained, if the expected heterozygosity is calculated on the allele frequencies of the current (reference) population (Tabs. III and IV). To get meaningful results on the evolution of genetic variability, genotyping of animals in different generations is necessary.
If only animals from the reference population were genotyped, the observed heterozygosity was usually slightly higher than the expected one resulting in negative values for the average level of marker inbreeding in the reference population (equation (1)). Templeton and Read [16] state that such negative values can be expected in finite populations with separate sexes because of random differences in allele frequency between sexes.
The average level of marker based inbreeding in the reference population was not influenced by the number of marker loci (Tab. III) but the standard deviation between replicates decreased with increasing number of marker loci. This can be explained by the increase of sample size (i.e. number of loci): the higher the number of marker loci the more reliable is the marker-based inbreeding coefficient.
Clearly, a high number of marker loci and knowledge about allele frequencies in the base population are necessary to get a reliable estimator for the level of autozygosity. In contrast, a quite low number of marker loci (20) was sufficient to measure the level of true homozygosity (Tab. III).

Relationship between true autozygosity and marker inbreeding coefficients
The correlations between marker inbreeding coefficients and true autozygosity show clearly that a rather high number of polymorphic loci must be analysed in order to detect autozygous individuals. The correlations between homozygosity calculated with different numbers of marker loci and true autozygosity are also shown. These correlations are identical to those derived using marker inbreeding coefficients, where the expected heterozygosity was calculated from allele frequencies in the real base population. It must be pointed out that even with 100 marker loci, this correlation is only as high as with very poor pedigree information (reduced pedigree information, r = 0.613) under random mating. The standard deviation of the mean correlation over the 100 replicates with marker-based inbreeding coefficients is quite high compared to that of the correlations with pedigree inbreeding coefficients based on short pedigrees. Only with 200 marker loci is the standard deviation of the correlation as low as or even lower than for pedigree inbreeding coefficients based on pedigrees of five or more generations. Several studies dealing with marker-based kinship measures [6,12] showed that a high number of polymorphic markers is necessary to obtain reliable estimates of the relatedness of individuals. Eding and Meuwissen [6] concluded that with the 10-15 loci presently used in studies of genetic diversity, it is impossible to distinguish even full sibs from half sibs. Our study shows that with such a low number of marker loci, the detection of highly inbred (autozygous) animals is not possible.

Table III. Arithmetic means (x̄) for investigated marker inbreeding coefficients with a different number of genetic markers of animals in the reference population and correlation with true autozygosity (r) for situation A; a total genome length of 30 Morgans and random mating; means calculated from 100 repetitions, standard deviations in italics.

The level of polymorphism and the allele frequencies also influence the usefulness of a genetic marker. Table IV shows that more highly polymorphic markers and even allele frequencies are more useful in identifying autozygous animals, which corresponds with results from Toro et al. [17].

Number
The information content of a set of genetic marker loci is also influenced by the length of the whole genome. In our simulation study, we considered three simple cases in which the length of the genome depended only on the given recombination rate between neighbouring loci. Table V shows the results for a fixed number of loci and chromosomes but different lengths of the total genome (10, 30 and 100 Morgans, respectively). For the same number of marker loci the correlation between the marker inbreeding coefficient and true autozygosity is closer for smaller genomes due to the smaller recombination rate.
The correlations between true autozygosity and marker inbreeding coefficients drop slightly in the case of non-random mating schemes. The decrease of these correlations is lower than the observed decrease for the pedigree inbreeding coefficients (Tab. II).

Pedigree inbreeding coefficients versus marker inbreeding coefficients
In all situations under random mating with fewer than 100 marker loci, the correlation between autozygosity and marker inbreeding coefficients (Tab. III) was lower than with pedigree inbreeding coefficients (except for pedigrees with 20% false paternity, see Tab. II). Only ideal genetic marker loci with regard to codominance and possible mutations were considered. Mutations and genotyping errors would lead to even worse results for marker-based inbreeding coefficients. Therefore, marker inbreeding coefficients do not appear to be a favourable method for identifying autozygous animals when reliable pedigree information is available. An additional point is that in most cases we assumed the allele frequencies in a defined base population to be known, which is quite unrealistic. Wang [20] points out that allele frequencies have to be estimated from samples and are therefore subject to sampling errors. For highly polymorphic markers and realistic sample sizes, the sampling errors for allele frequencies cannot be ignored. In our study we assumed that the total population was genotyped. Therefore, in reality even less favourable results for marker inbreeding coefficients must be expected.
Toro et al. [18] made an extensive comparison of several estimators of coancestry based on molecular markers. They also calculated the correlation between such estimators and the genealogical coancestry. They observed that the use of the true allelic frequencies in the base population increases the correlation of the investigated estimators with the genealogical coancestry. These results are in agreement with our results (Tab. IV). Toro et al. [18] also calculated coancestry estimators using allelic frequencies in the actual population. They found discrepancies between the estimates of coancestry obtained from molecular information and the values of the genealogical coancestry, which are comparable to our results in Tables III and IV, where the expected heterozygosity was calculated from allele frequencies in the reference population. The lack of information on the true allelic frequencies in the base population cannot be compensated for with a larger number of investigated loci. This has clearly been shown for coancestry estimators [18] and the marker inbreeding coefficients (Tab. III) calculated here.
Under non-random mating conditions the picture changes slightly. A complete pedigree with less than 5% false paternity or a correct and complete 5-generation pedigree leads to a closer correlation of pedigree inbreeding coefficients with true autozygosity than marker inbreeding coefficients based on 100 or fewer microsatellite marker loci.
Of course, under the avoidance of full and half sib mating, no conclusions can be drawn from pedigree inbreeding coefficients (based on a 2-generation pedigree) about the most inbred animals.
Marker inbreeding coefficients based on 200 evenly spaced microsatellite marker loci would lead to almost the same (0.738) or an even closer (0.643) correlation than the corresponding pedigree inbreeding coefficients based on perfectly complete pedigree records (0.763 and 0.583) under avoidance of full sib mating or avoidance of full and half sib matings, respectively (Tab. II).

CONCLUSIONS
Several papers that deal with the coancestry or kinship between individuals within a population were published only recently [6,12,18,20]. The average kinship between the parents selected from generation t equals the average inbreeding coefficient of their offspring in generation t + 1 (assuming random mating of the parents). This suggests that we can control inbreeding by controlling the average kinship of the parents [11]. For that reason many papers published focus on kinship. Nevertheless many conservation breeding programmes, including a number of them in Austria, are based on the maximum avoidance of inbreeding, i.e. mating pairs are chosen such that the inbreeding coefficients of the offspring are as small as possible. Animals with high inbreeding coefficients are excluded from financial support and further utilization as breeding animals. This rule serves to avoid the deliberate mating of close relatives. To control the compliance with such rules it is important to have reliable information on average and individual inbreeding coefficients. If special breeding strategies are a condition for the financial support of endangered breeds, a critical judgement of the previous mating systems becomes more important.
The simulations show that even pedigrees of low quality allow the identification of the most autozygous animals in a random mating population. Measures based on codominant marker loci lead to comparable results only when more than 100 (better 200) microsatellite loci are typed.
With regard to conservation breeding programmes, it seems wise to use pedigree information whenever available, especially considering the costs of genotyping for 100 or more microsatellite marker loci. The situation might change if typing a massive number of genotypes per individual at a reasonable cost becomes feasible. Genotyping of animals for the purpose of monitoring the genetic variability in small populations needs to be a continuous process. Molecular markers must be investigated over several generations. Under a non-random mating scheme, the demands on quality and quantity of pedigree records increase. Deeper pedigrees are necessary to draw conclusions from pedigree inbreeding coefficients on autozygosity. To control past breeding strategies in practical conservation breeding programmes, complete five-generation pedigree data should be obtained. Greater attention must be paid to the correct parentage of animals. Parentage verification with genetic markers is a valuable tool.