Individual increase in inbreeding allows estimating effective sizes from pedigrees

We present here a simple approach to obtain reliable estimates of the effective population size in real-world populations via the computation of the increase in inbreeding for each individual ($\Delta F_i$) in a given population. The values of $\Delta F_i$ are computed as $\Delta F_i = 1 - \sqrt[t]{1 - F_i}$, where $F_i$ is the individual inbreeding coefficient and $t$ is the number of equivalent complete generations for each individual. The values of $\Delta F_i$ computed for a pre-defined reference subset can be averaged and used to estimate effective size. A standard error of this estimate of $N_e$ can further be computed from the standard deviation of the individual increase in inbreeding. The methodology is demonstrated by applying it to several simulated examples and to a real pedigree in which other methodologies fail when reference subpopulations are considered. The main characteristics of the approach and its possible use are discussed both for predictive purposes and for analyzing genealogies.


INTRODUCTION
The effective population size ($N_e$), defined as 'the size of an idealized population which would give rise to the rate of inbreeding, or the rate of change in variance of gene frequencies observed in the population under consideration' [27], is a key parameter in conservation and population genetics because of its direct relationship with the level of inbreeding, fitness and the amount of genetic variation lost due to random genetic drift [5,7]. As a consequence, $N_e$ is usually considered a useful criterion for classifying livestock breeds according to their degree of endangerment [6,8].
When genealogies are available, the effective population size can be estimated from the increase in inbreeding ($\Delta F$) between two discrete generations as $N_e = \frac{1}{2\Delta F}$, with $\Delta F = \frac{F_t - F_{t-1}}{1 - F_{t-1}}$, where $F_t$ and $F_{t-1}$ are the average inbreeding at generations $t$ and $t-1$ [7]. The increase in inbreeding is constant for an ideal population of constant size with no migration, no mutation and no selection over discrete generations. However, in real populations with overlapping generations, the number of males and females is usually different and non-random mating is the rule, making $\Delta F$ a difficult parameter to deal with [7]. In most cases the definition of a 'previous' generation is quite difficult to establish. In fact, taking the average inbreeding of a pre-defined reference subpopulation and referring it to the founder population, in which inbreeding is null by definition, fits poorly in any given real population and is only acceptable in small populations with shallow pedigree files [1,10,12], leading to the risk of overestimating the actual effective population size.
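The two expressions above can be illustrated with a minimal Python sketch. The function names and the mean inbreeding values are hypothetical and serve only to show the arithmetic, including how a drop in mean inbreeding produces a negative estimate.

```python
def rate_of_inbreeding(f_t: float, f_prev: float) -> float:
    """Delta F = (F_t - F_{t-1}) / (1 - F_{t-1}) between two discrete generations."""
    return (f_t - f_prev) / (1.0 - f_prev)


def effective_size(delta_f: float) -> float:
    """N_e = 1 / (2 * Delta F); not defined when Delta F is zero."""
    if delta_f == 0.0:
        raise ValueError("Delta F is zero, so N_e is not defined")
    return 1.0 / (2.0 * delta_f)


# Hypothetical mean inbreeding in generations t-1 and t.
f_prev, f_t = 0.0500, 0.0595
delta_f = rate_of_inbreeding(f_t, f_prev)  # about 0.01
ne = effective_size(delta_f)               # about 50

# If mean inbreeding falls between generations, the estimate turns negative.
ne_decreasing = effective_size(rate_of_inbreeding(0.0500, 0.0595))

print(round(delta_f, 4), round(ne, 1), ne_decreasing < 0)
```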
Several approaches have been proposed to overcome these challenges in real populations, namely the computation of $N_e$ from the variances of family sizes of males and females [7,13,14] or the use of the regression coefficient of the individual inbreeding coefficients on the number of generations known for each animal as an estimate of $\Delta F$ [12]. In a scenario of overlapping generations, computation of $N_e$ based on family variances unrealistically ignores population subdivision and several other causes of variation of the parameter, such as mating between relatives, migration, or different representation of founders. Most methodologies applied to compute $N_e$ under overlapping generations are also affected by the difficulty of assigning individuals to generations, because records are usually kept by year of birth whereas the population is renewed once per generation interval. Likewise, the computation of regression coefficients with the aim of approximating $\Delta F = \frac{F_t - F_{t-1}}{1 - F_{t-1}}$ still requires defining the 'previous' generation with respect to the identified reference subpopulation. The effective size could instead be approximated by using $1 - F_t = \left(1 - \frac{1}{2N_e}\right)^t$ to derive its value from a log regression of $(1 - F)$ on generation number [20], thus avoiding the need to define a previous generation. When the value of $t$ is difficult to establish, it can be approximated by taking the year of birth as $t$ and then correcting for the length of the generation interval [20]. However, variations in the breeding policy, such as planning matings to minimize coancestry after a period in which mating between close relatives was preferred, can lead to a decrease in average inbreeding over time. When the animals of interest are those born in the period in which the inbreeding decreased, methods based on assessing the increase in inbreeding would lead to negative values of $N_e$.
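The log-regression idea described above can be sketched in Python. This is a minimal illustration, not the implementation used in [20]: it assumes an ordinary least-squares fit with intercept, uses discrete generation numbers only, and the data are generated from an idealized population with a hypothetical $N_e$ of 50. All function names are illustrative.

```python
import math


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (model with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def ne_from_log_regression(generations, mean_f):
    """Estimate N_e from the slope of ln(1 - F) on generation number.

    From 1 - F_t = (1 - 1/(2 N_e))**t the slope b equals ln(1 - 1/(2 N_e)),
    hence N_e = 1 / (2 * (1 - exp(b))).
    """
    log_one_minus_f = [math.log(1.0 - f) for f in mean_f]
    b = ols_slope(generations, log_one_minus_f)
    return 1.0 / (2.0 * (1.0 - math.exp(b)))


# Hypothetical mean inbreeding per generation for an idealized N_e of 50.
generations = list(range(1, 11))
mean_f = [1.0 - (1.0 - 1.0 / 100.0) ** t for t in generations]
ne_hat = ne_from_log_regression(generations, mean_f)
print(round(ne_hat, 3))
```

Because the regression is on $\ln(1 - F)$, a run of generations in which mean inbreeding falls gives a positive slope and therefore a negative estimate, which is the failure mode noted in the text.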
Moreover, in real populations in which selection is likely to occur, an increase in inbreeding is not a consequence of the sole accumulative change of gene frequency of a neutral gene over generations but of the long-term genetic contributions made by the ancestors [25,26]. In fact, the average inbreeding coefficient of a current reference subpopulation depends on both the number of generations separating this reference subpopulation from the founder population, and how rapidly the inbreeding accumulates.
The concept of effective size can therefore be interpreted not only as a useful parameter to predict inbreeding, but also as a tool to analyse genealogies [5]. Many attempts have been made to deal with the different real-world scenarios in order to obtain reliable estimates of the effective population size [4,5]. However, there is no standard method of general application to obtain the effective population size. Here we present a straightforward approach to this task based on the computation of the increase in inbreeding for each individual ($\Delta F_i$) in a given population. The values of $\Delta F_i$ are useful to obtain reliable estimates of $N_e$. The $N_e$ estimated this way roughly describes the history of the pedigrees in the population of interest. The approach directly accounts for differences in pedigree knowledge and completeness at the individual level but also, indirectly, for the effects of mating policy, drift, overlap of generations, selection, migration and different contributions from a different number of ancestors, because all of these are reflected in the pedigree of each individual in the analyzed population. This approach, which is based on the computation of the individual increase in inbreeding, also makes it possible to obtain confidence intervals for the estimates of $N_e$.

Individual increase in inbreeding
We will start from a population of $N$ individuals bred under the conditions of the idealized population [7]. Under these conditions the inbreeding at a hypothetical generation $t$ can be obtained by [7]:

$$F_t = 1 - \left(1 - \frac{1}{2N}\right)^t \qquad (1)$$

The idea presented here is to calculate inbreeding values and a measure of equivalent discrete generations for each animal belonging to a subgroup of animals of interest (the so-called reference subpopulation) in a scenario with overlapping generations. Then, from (1), and equating the individual inbreeding coefficient to that of a hypothetical population in which all individuals have the same pedigree structure ($F_t = F_i$), an individual increase in inbreeding ($\Delta F_i$) can be defined as

$$\Delta F_i = 1 - \sqrt[t]{1 - F_i} \qquad (2)$$

where $t$ is the 'equivalent complete generations' [3,18] calculated for the pedigree of the individual as the sum over all known ancestors of the term $(1/2)^n$, where $n$ is the number of generations separating the individual from each known ancestor. Notice that, on average, for a given reference subpopulation, $t$ is equivalent to the 'discrete generation equivalents' proposed by Woolliams and Mäntysaari [24], thus characterizing the amount of pedigree information in datasets with overlapping generations. Parameter $t$ has been widely used to characterize pedigree depth both in real [1,9,21] and simulated datasets [2]. The set of $\Delta F_i$ values computed for a number of individuals belonging to the reference subpopulation can be used to estimate $N_e$ regardless of the presence of individuals which would be assigned to different discrete generations according to their pedigree depth. The $\Delta F_i$ values of the individuals belonging to the reference subpopulation can be averaged to give $\overline{\Delta F}$. From this, a mean effective population size $\overline{N_e}$ can be straightforwardly computed as $\overline{N_e} = \frac{1}{2\overline{\Delta F}}$.
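The computation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the pedigree is a hypothetical dictionary mapping each individual to its (sire, dam) pair, the inbreeding coefficients are taken as given, and the equivalent complete generations are obtained with the recursion $t_i = \sum_p \frac{1}{2}(1 + t_p)$ over known parents $p$, which is equivalent to summing $(1/2)^n$ over all known ancestors. None of the names below come from the ENDOG program.

```python
def equivalent_complete_generations(ind, pedigree, _cache=None):
    """Sum of (1/2)**n over all known ancestors of `ind`.

    `pedigree` maps an individual to a (sire, dam) tuple; None marks an
    unknown parent and individuals absent from the mapping are founders.
    """
    if _cache is None:
        _cache = {}
    if ind in _cache:
        return _cache[ind]
    t = 0.0
    for parent in pedigree.get(ind, (None, None)):
        if parent is not None:
            t += 0.5 * (1.0 + equivalent_complete_generations(parent, pedigree, _cache))
    _cache[ind] = t
    return t


def individual_delta_f(f_i, t_i):
    """Individual increase in inbreeding: 1 - (1 - F_i)**(1 / t_i)."""
    if t_i <= 0:
        raise ValueError("t_i must be positive; founders carry no pedigree information")
    return 1.0 - (1.0 - f_i) ** (1.0 / t_i)


def mean_effective_size(f_values, t_values):
    """Mean N_e = 1 / (2 * mean of Delta F_i) over a reference subpopulation."""
    deltas = [individual_delta_f(f, t) for f, t in zip(f_values, t_values)]
    return 1.0 / (2.0 * (sum(deltas) / len(deltas)))


# Hypothetical pedigree: X has both parents and all four grandparents known.
pedigree = {"X": ("S", "D"), "S": ("SS", "SD"), "D": ("DS", "DD")}
t_x = equivalent_complete_generations("X", pedigree)  # 2 * (1/2) + 4 * (1/4)

# Hypothetical reference subpopulation with different pedigree depths whose
# inbreeding follows an idealized population with Delta F = 0.01.
t_values = [3.5, 5.0, 6.25]
f_values = [1.0 - 0.99 ** t for t in t_values]
ne_bar = mean_effective_size(f_values, t_values)
print(t_x, round(ne_bar, 3))
```

In this constructed example the three individuals have different inbreeding coefficients but the same $\Delta F_i$, which is why they can be averaged without assigning them to discrete generations.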
Notice that this way of computing effective population size is not dependent on the whole reference subpopulation mating policy but on the mating carried out throughout the pedigree of each individual.
Moreover, since we are assuming a different individual increase in inbreeding for each individual $i$ in the reference subpopulation, ascertaining the confidence in the estimate of $\overline{\Delta F}$ is also feasible, and the corresponding standard error can be easily computed. Kempen and Vliet [17] described how the variance of the ratio of the means of two variables $x$ and $y$ can be approximated using a Taylor series expansion. Assigning in our case $x = 1$ and $y = 2\overline{\Delta F}$, we can obtain the standard error of $\overline{N_e}$ as $\sigma_{\overline{N_e}} = \frac{2}{\sqrt{N}} \overline{N_e}^{\,2} \sigma_{\Delta F}$, with $N$ being the number of individuals in the reference subpopulation, $\sigma_{\Delta F}$ the standard deviation of $\Delta F$ and $\sigma_{\overline{N_e}}$ the standard error of $\overline{N_e}$. It can also be easily shown that this is equivalent to assuming that $\overline{N_e}$ has the same coefficient of variation as $\overline{\Delta F}$.
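A minimal Python sketch of the standard error expression above follows. The $\Delta F_i$ values are hypothetical, and the use of the sample standard deviation (divisor $N - 1$) is an assumption, since the text does not state which estimator is intended. The last lines check numerically that the coefficient of variation of the effective size equals that of the mean increase in inbreeding.

```python
import math
import statistics


def ne_with_standard_error(delta_f_values):
    """Return (mean N_e, standard error) from individual Delta F_i values."""
    n = len(delta_f_values)
    mean_delta = statistics.fmean(delta_f_values)
    sd_delta = statistics.stdev(delta_f_values)  # assumption: sample SD
    ne = 1.0 / (2.0 * mean_delta)
    se_ne = (2.0 / math.sqrt(n)) * ne ** 2 * sd_delta
    return ne, se_ne


# Hypothetical individual increases in inbreeding for a reference subpopulation.
delta_f_values = [0.018, 0.022, 0.025, 0.020, 0.015]
ne, se_ne = ne_with_standard_error(delta_f_values)

# Coefficient of variation of N_e versus that of the mean Delta F.
cv_ne = se_ne / ne
se_mean_delta = statistics.stdev(delta_f_values) / math.sqrt(len(delta_f_values))
cv_mean_delta = se_mean_delta / statistics.fmean(delta_f_values)
print(round(ne, 2), round(se_ne, 2), math.isclose(cv_ne, cv_mean_delta))
```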

Other methods to estimate N e using pedigree information
Several additional approaches were used for comparison with the estimates of $N_e$ obtained from the individual increase in inbreeding. First, $N_e$ was estimated from the rate of inbreeding ($\Delta F$) or the rate of coancestry ($\Delta f$) observed between two discrete generations as, respectively, $N_e = \frac{1}{2\Delta F}$ and $N_e = \frac{1}{2\Delta f}$, with $\Delta F = \frac{F_t - F_{t-1}}{1 - F_{t-1}}$ and $\Delta f = \frac{f_t - f_{t-1}}{1 - f_{t-1}}$, where $F_t$ and $F_{t-1}$, and $f_t$ and $f_{t-1}$, are the average inbreeding and the average coancestry at generations $t$ and $t-1$. Moreover, $N_e$ was estimated from the variances of family sizes as [13]

$$\frac{1}{N_e} = \frac{1}{16ML}\left[2 + \sigma^2_{mm} + 2\frac{M}{F}\,\mathrm{cov}(mm, mf) + \left(\frac{M}{F}\right)^2 \sigma^2_{mf}\right] + \frac{1}{16FL}\left[2 + \left(\frac{F}{M}\right)^2 \sigma^2_{fm} + 2\frac{F}{M}\,\mathrm{cov}(fm, ff) + \sigma^2_{ff}\right] \qquad (3)$$

where $M$ and $F$ are the number of male and female individuals born or sampled for breeding at each time period, $L$ is the average generation interval, $\sigma^2_{mm}$ and $\sigma^2_{mf}$ are the variances of the male and female offspring of a male, $\sigma^2_{fm}$ and $\sigma^2_{ff}$ are the variances of the male and female offspring of a female, and $\mathrm{cov}(mm, mf)$ and $\mathrm{cov}(fm, ff)$ are the respective covariances. Note that the family size of a parent (male or female) consists of its number of sons and daughters kept for reproduction [14]. The three approaches described above were applied to the simulated pedigree files with the data structured in discrete generations.
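The family-size approach can be illustrated with a short Python sketch. It assumes that the expression from [13] takes the form usually given for this formula, $\frac{1}{N_e} = \frac{1}{16ML}\left[2 + \sigma^2_{mm} + 2\frac{M}{F}\mathrm{cov}(mm, mf) + \left(\frac{M}{F}\right)^2\sigma^2_{mf}\right] + \frac{1}{16FL}\left[2 + \left(\frac{F}{M}\right)^2\sigma^2_{fm} + 2\frac{F}{M}\mathrm{cov}(fm, ff) + \sigma^2_{ff}\right]$. The function name, argument names and input values are hypothetical.

```python
def family_variance_effective_size(n_males, n_females, gen_interval,
                                   var_mm, var_mf, cov_m,
                                   var_fm, var_ff, cov_f):
    """N_e from variances and covariances of family sizes (assumed form of [13]).

    var_mm, var_mf: variances of sons and daughters per male parent
    var_fm, var_ff: variances of sons and daughters per female parent
    cov_m, cov_f:   covariance of sons and daughters within male / female parents
    """
    m_over_f = n_males / n_females
    f_over_m = n_females / n_males
    male_term = (2.0 + var_mm + 2.0 * m_over_f * cov_m
                 + m_over_f ** 2 * var_mf) / (16.0 * n_males * gen_interval)
    female_term = (2.0 + f_over_m ** 2 * var_fm + 2.0 * f_over_m * cov_f
                   + var_ff) / (16.0 * n_females * gen_interval)
    return 1.0 / (male_term + female_term)


# Equal sexes, Poisson family sizes (variance equal to the mean of one son and
# one daughter per parent), discrete generations: expected N_e = 4MF / (M + F).
ne_equal = family_variance_effective_size(100, 100, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0)

# Unequal sexes (10 males, 40 females) with Poisson variances equal to the
# mean number of sons and daughters kept per parent.
ne_unequal = family_variance_effective_size(10, 40, 1.0, 1.0, 4.0, 0.0, 0.25, 1.0, 0.0)
print(round(ne_equal, 2), round(ne_unequal, 2))
```

Both hypothetical cases reduce to the classical $4MF/(M + F)$ result, which is a useful sanity check, but the formula has no term that reacts to subdivision or to the relationships among mated individuals.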
When datasets with no discrete generations were analyzed, $N_e$ was estimated from the variances of family sizes but also from $\Delta F$ using three different approaches. First, following Gutiérrez et al. [12], the increase in inbreeding between two generations ($F_t - F_{t-1}$) was obtained from the regression coefficient ($b$) of the average inbreeding on the year of birth in the reference subpopulation, taking into account the average generation interval ($l$), as follows:

$$F_t - F_{t-1} = l \times b$$

with $F_{t-1}$ computed from the mean inbreeding in the reference subpopulation ($F_t$) as

$$F_{t-1} = F_t - l \times b$$

Second, in a similar way, $N_e$ was obtained using $t$ directly instead of converting years into generations through the generation interval. In this approach, $N_e$ was computed from the regression coefficient ($b$) of the individual inbreeding values on the individual equivalent complete generations approximating $t$. In this case

$$\Delta F = \frac{b}{1 - (F_t - b)} \qquad (4)$$

with $F_t$ being the average $F$ of the reference subpopulation.
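The two regression-based estimates described above can be sketched in Python as follows. The sketch assumes that the regression coefficient $b$ has already been estimated, that $F_t - F_{t-1}$ is approximated by $l \times b$ (with $l = 1$ when regressing on equivalent generations), and that $N_e = 1/(2\Delta F)$ with $\Delta F = (F_t - F_{t-1})/(1 - F_{t-1})$. The function name and the numerical inputs are hypothetical.

```python
def ne_from_inbreeding_trend(b, mean_f, gen_interval=1.0):
    """N_e from a regression coefficient of inbreeding on time.

    b:            regression coefficient of F on year of birth, or on
                  equivalent generations when gen_interval is 1
    mean_f:       mean inbreeding F_t of the reference subpopulation
    gen_interval: average generation interval l, in the units of b
    """
    increase = gen_interval * b            # approximates F_t - F_{t-1}
    if increase == 0.0:
        raise ValueError("flat inbreeding trend: N_e is not defined")
    f_prev = mean_f - increase             # F_{t-1}
    delta_f = increase / (1.0 - f_prev)
    return 1.0 / (2.0 * delta_f)


# Regression on year of birth: hypothetical slope of 0.002 per year,
# generation interval of 5 years, mean inbreeding of 0.06.
ne_by_year = ne_from_inbreeding_trend(0.002, 0.06, gen_interval=5.0)

# Regression on equivalent generations with a slightly negative slope, as can
# happen after a switch to minimum-coancestry matings.
ne_negative = ne_from_inbreeding_trend(-0.001, 0.13)
print(round(ne_by_year, 2), round(ne_negative, 1))
```

The second call shows why these estimators can return a negative effective size: the sign of the estimate follows the sign of the slope in the chosen reference subpopulation.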
Finally, we applied the approach developed by Pérez-Enciso [20] to estimate $N_e$ via a log regression of $(1 - F)$ (obtained from (1) as $1 - F_t = \left(1 - \frac{1}{2N_e}\right)^t$) on generation number. When datasets with no discrete generations were analyzed, $N_e$ was estimated by a log regression of $(1 - F)$ on the date of birth and then divided by the generation interval [20].

Examples
[...] $N_e = \frac{8 N_C L}{V_{km} + V_{kf} + 4}$ [13], where $N_C$ is the number of reproductive individuals included in the reference subpopulation (50), $V_{km}$ and $V_{kf}$ are, respectively, the variances of family sizes of reproductive males and females ($V_{km} = V_{kf} = 2$ under random conditions), and $L$ is the generation length in units of the specified time interval (2.5). Here $N_e$ equals 125. (iv) Like the simulated population (i) but with all parents having two offspring in the next generation. This is a case where mating is random but the variance of family sizes does not follow a Poisson distribution. The expected value of $N_e$ computed from the expression $N_e = \frac{8 N_C L}{V_{km} + V_{kf} + 4}$ [13], after setting $V_{km} = V_{kf} = 0$, is 400. The simulated pedigree files (i) to (iv) listed above are expected to characterize classical theoretical scenarios of populations evolving randomly with two sexes (i), population subdivision (ii), overlapping generations (iii), and non-Poisson variance of family sizes (iv). Within each pedigree file, a reference subset (RS) was defined as the last 400 animals born.
Additionally, the pedigree file of the Carthusian strain of the Spanish Purebred horse was used to demonstrate the methodology on a real example. It is a subpopulation of the pedigree file of the Andalusian horse (SPB, Spanish Purebred horse) [22] and included a total of 6318 individuals since the foundation of the studbook. This population is expanding, with 45% of the registered individuals born over the last 20 years. This period of time is roughly the last two generations (Fig. 1) [22]. The pedigree knowledge is reasonably high: 95% of ancestors tracing back seven generations were known and the mean equivalent complete generations for the animals born in the last decade was 9.1.
The Carthusian strain was chosen as a real example of an inbred population because it had been subjected to a planned mating strategy using the minimum coancestry approach beginning in the 1980s [22]. Due to this mating policy, a decrease in the mean inbreeding coefficients over the period covering the last generation was also found [22]. This makes it possible to test how a breeding policy introduced part-way through the recorded pedigree influences $N_e$. Two RSs were defined in the Carthusian pedigree file: the individuals born in the last 10 years of available records ($\mathrm{RS}_{10}$), and the individuals born in a given period of years allowing their use for reproduction (1977 to 1989; $\mathrm{RS}_{77-89}$). The pedigree files of the fitted RSs were also edited to include only individuals with four equivalent generations or more, and eight equivalent generations or more. The main parameters describing the Carthusian pedigree file are given in Table I.

Program used
The analyses were performed using the ENDOG program (current version v4.4) [11], which can be freely downloaded from the World Wide Web at http://www.ucm.es/info/prodanim/html/JP_Web.htm.

RESULTS
The results from the analyses carried out on the simulated pedigree files are summarized in Figure 2. A discontinuous line was drawn for the theoretical effective size as a reference under the different scenarios. In the case of subdivision (ii) the theoretical effective population size was also computed as the harmonic mean over generations, which expresses the expected $N_e$ for descriptive rather than predictive purposes.
Note the erratic behavior over generations, in Figure 2, of $N_e$ computed using the rate of inbreeding in the idealized (plot i) and non-Poisson offspring size variance (plot iv) populations. $N_e$ tended to fit better in the case of population subdivision (plot ii) and could not be used under a scenario with overlapping generations (plot iii). This erratic behavior was caused by the use of a single replicate in the simulation and could be overcome by using the harmonic mean of $N_e$ over generations. Estimates of $N_e$ based on the increase in coancestry are, however, more precise because they are computed using much more data (all pairs of individuals rather than the number of individuals), and are almost exact in the case of all animals having identical offspring size. $N_e$ values computed using $\Delta f$ and those based on the variance of family size tended to fit well in the idealized population and in the case of overlapping generations, but they failed in the case of population subdivision because these methods ignore that such a subdivision exists. After about eight generations, $N_e$ computed from the individual increase in inbreeding tended to fit better than the estimates based on $\Delta f$ and variance of family sizes in the idealized population. In the case of population subdivision, the $N_e$ computed from the individual increase in inbreeding fits very closely the $N_e$ computed as the harmonic mean of the number of animals over generations for descriptive purposes, whereas the $N_e$ based on the rate of inbreeding tended to approximate the theoretical $N_e$ for predictive purposes. The effective population size computed using $\Delta F_i$ accounts for the whole historical pedigree of the individuals, and the $N_e$ obtained summarizes all the genealogical information of each individual. Therefore, the genealogies recorded before subdivision weigh much more at times close to the population fission, but their weight decreases with the accumulation of generations.
If the estimation of $N_e$ from the generations after fission is carried out for predictive purposes, the harmonic mean of $N_e$ throughout generations would be preferred over the $N_e$ based on the individual increase in inbreeding, since the latter converges much more slowly towards the 'theoretical' $N_e$. However, the latter better addresses the history of the population if the estimation of $N_e$ is carried out for descriptive purposes. In the case of overlapping generations [...]

Table I gives the main parameters describing the real pedigree file. The pedigree size was 6318 and the size of the fitted RS was 1464 individuals for $\mathrm{RS}_{77-89}$ and 1721 for $\mathrm{RS}_{10}$. The mean equivalent generations ± standard deviation were the following: 6.6 ± 2.75 for the whole pedigree (WP), 9.1 ± 0.68 for $\mathrm{RS}_{10}$ and 8.2 ± 0.64 for $\mathrm{RS}_{77-89}$. Figure 3 shows the evolution of the mean inbreeding, mean individual increase in inbreeding, mean equivalent generations, and $N_e$ across years of birth for the whole Carthusian pedigree file. It can be noted that mean inbreeding became approximately stable in the last generation interval, whilst mean equivalent generations increased, leading to a reduction in the mean $\Delta F_i$ during this period. The flat or slightly negative trend of inbreeding coefficients would lead to illogical estimates of $N_e$ when using methods based on regression of inbreeding on either generations or year of birth. However, $N_e$ obtained from the individual increase in inbreeding remained approximately stable once pedigree knowledge reached about five equivalent generations. Table II gives the estimates of $N_e$ obtained using regression of the individual coefficients of inbreeding on equivalent generations, variance of family sizes and individual increase in inbreeding ($\Delta F_i$) in the whole pedigree file and the defined RSs of the Carthusian horse.
The estimates obtained using variances in family sizes are considerably larger than those obtained using both regression on equivalent generations and $\Delta F_i$. The values of $\overline{N_e}$ (computed from the individual increase in inbreeding) were stable regardless of any restrictions on the pedigree depth of the individuals included in the pedigree file or the corresponding RS, whilst the $N_e$ obtained from regression of $F$ on equivalent generations behaved inconsistently. It is possible to find a noticeable increase in $N_e$ (40.0) when $\mathrm{RS}_{10}$ includes individuals with four or more equivalent generations in the pedigree file, but also negative values of $N_e$ (−142.2) when $\mathrm{RS}_{10}$ includes individuals with eight or more equivalent generations in the pedigree file. Note again that a negative estimate of $N_e$ can be obtained when the increase in inbreeding is obtained by regression of the inbreeding coefficient on the year of birth and younger individuals are less inbred than older individuals. Thus, the $N_e$ obtained depends partially on the effect of changes in the mating policy. In the Carthusian population, the criterion of minimum coancestry has recently been used to define the mating policy [22]. However, the $N_e$ obtained from the individual increase in inbreeding shows a stable value of about 22.
The overall behavior of $N_e$ is summarized in Table III. In this table, the RS was defined as the last 400 individuals born, which covers the last two generations in the simulated examples with discrete generations. The real whole pedigree file and the RSs defined in the Carthusian pedigree file were also included together with the simulated examples. Different approaches to obtain $N_e$ were gathered: b(t) as described in (4), b(date) as described in Gutiérrez et al. [12], log b(date) as described in Pérez-Enciso [20], Var(offs) as described in (3), and $\overline{N_e}$ based on the individual increase in inbreeding together with its standard error. The theoretical $N_e$ is also given in the table when possible. The method using the variances of family size does not give accurate $N_e$ estimates in examples reflecting scenarios closer to reality (such as population subdivision) because it only considers one source of variation of $N_e$. The $N_e$ computed from regression approaches (regardless of whether the regression is on generations or on birth date) tends to perform poorly when RSs are defined in a pedigree file, particularly in extreme cases in which the subset of interest has a lower mean $F$ than the other individuals included in the pedigree file. Interestingly enough, $\overline{N_e}$ is quite stable and precise regardless of the pedigree file (real or simulated) analyzed and the particular conditions of the whole pedigree and the RSs defined.

DISCUSSION
The simple methodology presented here to assess $N_e$ in real populations accounts for the pedigree knowledge of each individual in a population in order to obtain individual increase in inbreeding values ($\Delta F_i$). The increase in inbreeding is not treated here as a single value but as a variable with an associated mean ($\overline{\Delta F}$) that can easily be used to compute $N_e$ (in fact $\overline{N_e}$) for a given RS as $\overline{N_e} = \frac{1}{2\overline{\Delta F}}$. The current methodology addresses $N_e$ directly from $\Delta F$ which, theoretically, becomes constant in a population with stable size and breeding policy. This is contrary to $F$, which increases from one generation to another [7]. Thus, $\Delta F$ is completely independent of pedigree depth in the idealized population [7]. The use of $\Delta F_i$ values additionally helps to overcome the problem of using the $F_i$ coefficients, because the latter are non-linearly dependent on the pedigree depth of each individual. When trying to assess $\Delta F$, after averaging $F_i$ coefficients by generation, differences in absolute mean values from one generation to another must still be divided by one minus the mean inbreeding in the previous generation. This is not easy to carry out in real populations, which usually have overlapping generations. In such a scenario, it is unrealistic to work under assumptions such as no inbreeding in previous generations, or a linear trend of inbreeding over generations. However, the individual increase in inbreeding is 'free' from these effects since it is also adjusted for the generation number of each animal. Of course, $\Delta F_i$ values are still dependent on the completeness of the analyzed pedigree file and need a few generations to become constant at the population level (Fig. 3). Notice also that in real diploid populations in which self-fertilization is not possible and where there are two different sexes, $\Delta F$ (and also $\overline{\Delta F}$) will need more generations to reach asymptotic values than expected in the idealized population.
In this respect, a simple plot of $\Delta F_i$ over equivalent generations helps to define the RSs correctly and to judge whether they can yield reliable estimates of $N_e$ (Fig. 4).
Of course, as with other methodologies, the $\overline{N_e}$ obtained is still dependent on the way the RS is defined. However, the present approach can be used to assess the confidence interval of $\overline{N_e}$ from the standard error of $\overline{\Delta F}$. Indeed, no standard error can be computed directly from individual values of $N_e$ because the presence in our RS of individuals with $\Delta F_i = 0$ would lead to individual estimates of $N_e = \infty$. Following the methodology presented here, values of $\Delta F_i = 0$ would affect $\overline{N_e}$ but an estimate of the parameter would still be possible.
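The point made above about individuals with $\Delta F_i = 0$ can be seen in a small Python sketch. The values are hypothetical, and representing an undefined individual estimate as floating-point infinity is an illustrative choice.

```python
import math


def individual_ne(delta_f_i):
    """Per-individual N_e = 1 / (2 * Delta F_i); infinite when Delta F_i is 0."""
    return math.inf if delta_f_i == 0.0 else 1.0 / (2.0 * delta_f_i)


# Hypothetical reference subpopulation; the first individual is not inbred.
delta_f_values = [0.0, 0.02, 0.03, 0.01]

# Averaging individual N_e values breaks down as soon as one of them is infinite.
per_individual = [individual_ne(d) for d in delta_f_values]
mean_of_individual_ne = sum(per_individual) / len(per_individual)

# Averaging Delta F_i first keeps the estimate finite.
mean_delta = sum(delta_f_values) / len(delta_f_values)
ne_bar = 1.0 / (2.0 * mean_delta)
print(mean_of_individual_ne, round(ne_bar, 2))
```

The non-inbred individual still lowers the mean $\Delta F_i$ and therefore raises the estimate, but the estimate and its standard error remain computable.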
When compared with the other methods to obtain $N_e$ assayed here, the $\overline{N_e}$ obtained from $\overline{\Delta F}$ was more stable regardless of the particular situation of the analyzed pedigree files. This was shown to be true regardless of whether we included individuals with shallower or deeper pedigrees in the RSs (Tab. II). In this respect, the use of the regression coefficient of the individual inbreeding values on the individual equivalent complete generations to compute $N_e$ gave estimates that were highly dependent on the mating policy carried out in the RS. This last concern was particularly noticeable when, after a period of mating between relatives, a mating plan based on low coancestry was implemented, leading to a decrease in the mean inbreeding and thus to negative $N_e$ values.
As expected, the $N_e$ computed from variances of family sizes was not useful to characterize the 'real' effective size, as shown in the Carthusian population analyzed here. This was also accurately reflected by the results obtained from the simulated example (ii) involving subdivision. If population subdivision is known, one would not treat a pedigree file as a single population, but each subpopulation separately. Then, the number of breeding animals would be smaller than when ignoring subdivision. The simulated example is only an unrealistic example of extreme subdivision, but almost all real populations have some degree of uncontrolled subdivision, reflected by the fact that inbreeding is usually higher than coancestry. However, the method based on variance of family size reflects a temporary mating policy and can be useful when the pedigree knowledge is limited and/or subdivision has not yet occurred.
The methodology shown here does not explicitly consider phenomena such as population subdivision and selection. However, since these phenomena are reflected in the pedigree of the individuals belonging to the analysed RS, the effective size computed from $\Delta F_i$ values is expected to capture these effects. The concept of effective size usually has an asymptotic meaning in a regular system and can be used for predictive purposes rather than for analyzing genealogies [12]. The effective size obtained here is also related to pedigree tools used for describing genetic diversity in real populations, such as the (effective) number of founders or ancestors [3,15]. The parameters computed here ($\overline{\Delta F}$ and $\overline{N_e}$) are therefore related to those computed using the concept of long-term genetic contributions of ancestors [25,26], even though the current approach does not need complex or iterative computations. The relationship between inbreeding (actually the partial inbreeding coefficients of Lacy et al. [16]) and founder contributions was used by Man et al. [19] to predict the frequency of carriers of an autosomal deleterious gene when the ancestral source(s) of the gene is known. There is an inverse relationship between partial inbreeding coefficients coming from founders or ancestors and the effective number of founders or ancestors. In fact, under a random mating scheme, these partial inbreeding coefficients should be the sum of their squared contributions, which is precisely the denominator in the computation of the effective number of founders or ancestors. The relationship between $F_i$ and $\Delta F_i$ values is straightforward, and the inverse relationship between the increase in inbreeding and $N_e$ is given by definition.
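The denominator mentioned above can be made concrete with a brief Python sketch of the effective number of founders, taken here as the reciprocal of the sum of squared founder contributions. The contribution vectors are hypothetical, and the check that contributions sum to one is an assumption about how the inputs are prepared.

```python
def effective_number_of_founders(contributions):
    """1 / sum(q_k**2), where q_k is the expected contribution of founder k."""
    if abs(sum(contributions) - 1.0) > 1e-9:
        raise ValueError("founder contributions are expected to sum to 1")
    return 1.0 / sum(q * q for q in contributions)


# Four founders contributing equally: the effective number equals the actual number.
fe_balanced = effective_number_of_founders([0.25, 0.25, 0.25, 0.25])

# The same four founders with one dominating: the effective number drops.
fe_unbalanced = effective_number_of_founders([0.70, 0.10, 0.10, 0.10])
print(fe_balanced, round(fe_unbalanced, 3))
```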
However, the imbalance of the contributions from founders or ancestors is not the sole cause of increases in inbreeding. These also depend on many other circumstances such as population structure, mating policy, changes in population size, etc. The estimates of $N_e$ based on the individual increase in inbreeding would accurately reflect the genetic history of the populations, namely the size of their founder population, their mating policy or bottlenecks due to the abusive use of reproductive individuals. All these phenomena influence the pedigree of the individual and are therefore reflected in the individual increase in inbreeding.
Inbreeding coefficients are widely used to calculate the rate of inbreeding and consequently $N_e$ [7]. However, $N_e$ can also be computed from the average coancestry of a RS [4]. In regular non-structured populations, average coancestry and inbreeding coefficients are analogous and the $N_e$ obtained using either approach should be the same. However, it has been reported that this is not always the case [23] and our results from the simulated population (ii) would confirm this fact. The effective size is defined here from the individual rate of inbreeding. However, the method can easily be extended to coancestry coefficients. To do this, each coancestry between two individuals in the RS would be treated as the inbreeding coefficient of a hypothetical offspring of that pair, with an equivalent discrete generation number of half the sum of those of the parents plus one, and the expression defining $\Delta F_i$ would then simply be applied.
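The extension to coancestry outlined above can be sketched in Python as follows. The function name and the numerical values are hypothetical. The sketch assumes that the coancestry of a pair and the equivalent complete generations of both individuals are already available, and that the individual expression $1 - (1 - F)^{1/t}$ is applied to the hypothetical offspring.

```python
def pair_delta_f(coancestry, t_a, t_b):
    """Increase in coancestry for a pair of individuals.

    The coancestry of the pair is treated as the inbreeding coefficient of a
    hypothetical offspring whose equivalent generations are (t_a + t_b) / 2 + 1.
    """
    t_offspring = (t_a + t_b) / 2.0 + 1.0
    return 1.0 - (1.0 - coancestry) ** (1.0 / t_offspring)


# Hypothetical pair with 5 and 7 equivalent generations, so the hypothetical
# offspring has 7; the coancestry is chosen to match an increase of 0.01.
f_ab = 1.0 - 0.99 ** 7
delta_f_ab = pair_delta_f(f_ab, 5.0, 7.0)
ne_pair = 1.0 / (2.0 * delta_f_ab)
print(round(delta_f_ab, 4), round(ne_pair, 1))
```

In practice the pairwise values would be averaged over all pairs in the RS before taking the reciprocal, in the same way as the individual values.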
To conclude, we show here a simple new approach to estimate $N_e$ in real populations that gives stable estimates of $N_e$. This methodology is derived from the definition of the individual increase in inbreeding, which is treated as a variable that can be computed for each individual and can be useful for other purposes. Moreover, the possibility of estimating $N_e$ using this approach and, after assuming that it is the true $N_e$ value, of making use of the many existing expressions that predict $N_e$ in different situations [4,7], provides a chance to estimate other useful parameters, for example rates of selection, migration and Hardy-Weinberg deviations. Although this parameter is still a novelty, its usefulness makes it worth investigating further.