### Definitions

Since the methods were applied to populations with overlapping generations, all definitions are based on birth cohorts rather than generations. A birth cohort *J* is a set of individuals born in a particular time interval, e.g. the individuals *B*_{t} born in year *t*, or the population *P*_{t} at time *t* [6]. Since the date of death is unknown in most cases, the population *P*_{t} consists of all individuals up to a particular age *T*. This age *T* could be the average age of individuals when their last offspring was born, or, for simplicity, it could be the generation interval *I*. Thus, population *P*_{t} consists of all individuals born in the time interval [*t* − *T*, *t*].

The **gene diversity** *GD*(*J*) of birth cohort *J* is the probability that two alleles chosen at random from the birth cohort are not identical by descent (IBD). We can write

$$GD(J) = P(X_J \text{ and } Y_J \text{ are not IBD}),$$

where alleles *X*_{J} and *Y*_{J} are randomly chosen with replacement from birth cohort *J*, and founder alleles are assumed to be pairwise different. An equivalent representation is $GD(J) = 1 - \bar{f}_J$, where $\bar{f}_J$ is the average coancestry in birth cohort *J*.
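
As a minimal sketch of this representation, the gene diversity of a cohort can be obtained from its coancestry matrix. The function name `gene_diversity` and the three-individual matrix are hypothetical; the sketch assumes that the average coancestry includes the diagonal (self-coancestries), because alleles are sampled with replacement.

```python
def gene_diversity(coancestry):
    """Return 1 minus the mean of all entries of a square coancestry matrix.

    The diagonal is included because the two alleles are assumed to be
    sampled with replacement from the cohort.
    """
    n = len(coancestry)
    mean_f = sum(sum(row) for row in coancestry) / (n * n)
    return 1.0 - mean_f


# Hypothetical cohort: two full sibs (coancestry 0.25) and one unrelated
# individual; non-inbred individuals have self-coancestry 0.5.
f = [
    [0.50, 0.25, 0.00],
    [0.25, 0.50, 0.00],
    [0.00, 0.00, 0.50],
]
gd = gene_diversity(f)  # mean coancestry is 2/9, so gd is 7/9
```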

Each allele descends from a particular founder. We distinguish between native founders and migrants, whereby a native founder is a founder that is not a migrant. A native founder is typically an individual with unknown pedigree that belongs to the population and was born before a certain date *t*_{s}. A migrant is typically an individual that either comes from another population (breed), or an individual with unknown pedigree that was born after this date. The date *t*_{s} could be chosen shortly after establishment of the stud book, when a sufficient portion of the population had been recorded. The set of founder alleles is thus the union of the set of alleles that come from native founders and the set of alleles that come from migrants.

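We define the **conditional gene diversity** *condGD*(*J*) of birth cohort *J* as the conditional probability that two alleles randomly chosen from the birth cohort are not IBD, given that both descend from native founders. That is,

$$condGD(J) = P(X_J \text{ and } Y_J \text{ are not IBD} \mid X_J \text{ and } Y_J \text{ descend from native founders}).$$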

The **founder genome equivalents** *FGE*(*J*) of birth cohort *J* are defined as the minimum number of founders that would be needed to establish a population that has the same gene diversity as the individuals in birth cohort *J*. They can be computed as

$$FGE(J) = \frac{1}{2\,\bigl(1 - GD(J)\bigr)},$$

see [4]. Analogously, we define the **native genome equivalents** *NGE*(*J*) of birth cohort *J* as the minimum number of founders that would be needed to create a population that has the same conditional gene diversity as the individuals in birth cohort *J*. We have

$$NGE(J) = \frac{1}{2\,\bigl(1 - condGD(J)\bigr)}.$$

However, this definition assumes that the native founders of the population are unrelated, which is not true. As a consequence, the NGE in the first generation would be almost as large as the total population size, and because of the limited effective size it would then decrease sharply shortly after the last native founders have entered the population. In order to avoid this artifact, we extrapolate the history of the breed back in time and use as the reference population not the founders listed in the stud book, but the population at an earlier time *t*_{0}. That is, all individuals are assumed to be unrelated in year *t*_{0}. In the applications, the base year was *t*_{0} = 1800. We define the conditional gene diversity of an age cohort *J*_{t} at time *t* ≥ *t*_{s} with respect to base year *t*_{0} as

$$condGD_{t_0}(J_t) = \left(1 - \frac{1}{2\,histN_e}\right)^{(t_s - t_0)/I} \cdot \frac{condGD(J_t)}{condGD(P_{t_s})},$$

where $P_{t_s}$ is the population at time *t*_{s}, *I* is the generation interval, and *histN*_{e} is the historic effective size of the population. The historic effective size can be estimated from marker data [7]. The term that defines the conditional gene diversity is the product of two factors. The first is the estimated gene diversity in the population at time *t*_{s}, and the second is the factor by which the conditional gene diversity decreased between *t*_{s} and *t*. Consequently, the **NGE with respect to base year** *t*_{0} can be calculated as

$$NGE_{t_0}(J_t) = \frac{1}{2\,\bigl(1 - condGD_{t_0}(J_t)\bigr)}.$$
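
The base-year correction can be sketched in a few lines. This is an illustration only, not the authors' implementation. It assumes that genome equivalents are 1/(2(1 − diversity)) and that the base-year conditional gene diversity is the product of the two factors described in the text, with the first factor taken from the standard idealized-population decay (1 − 1/(2·*histN*_{e})) per generation. All function names and the numeric inputs other than *t*_{0} = 1800 are hypothetical.

```python
def genome_equivalents(diversity):
    """Genome equivalents for a (conditional) gene diversity below 1."""
    return 1.0 / (2.0 * (1.0 - diversity))


def cond_gd_base_year(cond_gd_t, cond_gd_ts, hist_ne, t_s, t_0, gen_interval):
    """Conditional gene diversity at time t relative to base year t_0.

    First factor: diversity expected at t_s if all individuals were
    unrelated in t_0 and the population had effective size hist_ne.
    Second factor: relative change of conditional diversity from t_s to t.
    """
    generations = (t_s - t_0) / gen_interval
    gd_at_ts = (1.0 - 1.0 / (2.0 * hist_ne)) ** generations
    return gd_at_ts * cond_gd_t / cond_gd_ts


# Hypothetical inputs: only t_0 = 1800 is taken from the text.
cond_gd = cond_gd_base_year(
    cond_gd_t=0.97, cond_gd_ts=0.99, hist_ne=150.0,
    t_s=1970, t_0=1800, gen_interval=5.3,
)
nge_base_year = genome_equivalents(cond_gd)
```

With this construction the NGE no longer starts near the census size, because the diversity at *t*_{s} is already reduced by the assumed history between *t*_{0} and *t*_{s}.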

A further parameter that can be of interest is the **effective size** of the population. The effective size *N*_{e}([*t*_{1}, *t*_{2}]) of a population within a time interval [*t*_{1}, *t*_{2}] is the size of an idealized random mating population of constant size that causes the same decrease of gene diversity as the true population within (*t*_{2} − *t*_{1})/*I* generations. However, in breeds with steady gene flow from other populations, the gene diversity does not decrease below a certain level, so this definition of the effective size is not meaningful for populations with migration. Therefore, we use a slightly different definition. We define the **native effective size** *N*_{eN}([*t*_{1}, *t*_{2}]) as the size of an idealized random mating population of constant size that causes the same decrease of the *conditional* gene diversity *condGD*(*P*_{t}) as the true population within (*t*_{2} − *t*_{1})/*I* generations. The native effective size at time *t*, defined as *N*_{eN}(*t*) = lim_{ε→0} *N*_{eN}([*t* − *ε*, *t* + *ε*]), was calculated as described in [8], except that it was calculated from the conditional gene diversity. The native effective size quantifies the decrease of genome equivalents originating from native founders because the NGE depend only on the conditional gene diversity, as can be seen from the previous two equations. In a population without migration, *N*_{e} and *N*_{eN} are equal. However, in a population with steady gene flow from other populations, *N*_{eN} is smaller than *N*_{e} because the gene diversity approaches a plateau level, so *N*_{e}(*t*) goes to infinity.
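
The interval-based definition can be illustrated with a short sketch. It assumes the textbook relation for an idealized population, in which diversity is multiplied by (1 − 1/(2*N*)) per generation, and solves it for *N*. The function name and the numbers are hypothetical, and the method of [8] for the size at a single time point is not reproduced here.

```python
import math


def effective_size(div_start, div_end, t1, t2, gen_interval):
    """Size of an idealized population with the same diversity decrease.

    Works for gene diversity (N_e) or conditional gene diversity (N_eN).
    Returns infinity if the diversity did not decrease.
    """
    generations = (t2 - t1) / gen_interval
    per_generation = (div_end / div_start) ** (1.0 / generations)
    if per_generation >= 1.0:
        return math.inf
    return 1.0 / (2.0 * (1.0 - per_generation))


# Hypothetical check: simulate two generations of decay for N = 50.
start = 0.95
end = start * (1.0 - 1.0 / (2.0 * 50.0)) ** 2
ne = effective_size(start, end, t1=1990.0, t2=2000.6, gen_interval=5.3)

# A diversity that has reached a plateau yields an infinite effective size.
ne_plateau = effective_size(0.95, 0.95, 1990.0, 2000.6, 5.3)
```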

The population *P*_{t} at time *t*, which consists of all individuals up to an age of *T* years, has gene diversity *GD*(*P*_{t}), a number of native genome equivalents, and a genetic contribution from native founders. The genetic contribution from native founders to an age cohort *J* is the probability that a randomly chosen allele from *J* descends from a native founder. Besides monitoring of these quantities, a major task for a conservation program is the calculation of optimal genetic contributions for the breeding individuals that maximize the conditional gene diversity in the offspring and simultaneously maximize the genetic contribution from native founders in the offspring. Moreover, a sufficient level of gene diversity must be maintained in order to avoid inbreeding depression. In general, however, these quantities cannot be maximized simultaneously, so an objective function is needed that considers each appropriately.

The usual approach (Approach A) for populations without migration is the calculation of genetic contributions for the breeding individuals of population *P*_{t} such that the gene diversity of a hypothetical (infinitely large) offspring population is maximized. This approach is called minimum kinship selection [9]. Note that the gene diversity of this hypothetical offspring population is known as the potential diversity of the population at time *t* [6]. A more appealing approach for populations with migration (Approach B) is to use genetic contributions for the breeding individuals such that the resulting offspring population maximizes the probability that two randomly chosen alleles from the offspring are not IBD and both descend from native founders. As a third approach (Approach C), we consider maximization of the probability that two randomly chosen alleles from the offspring are not IBD and at least one of them descends from a native founder. Finally, we consider maximizing the conditional gene diversity in the offspring population (Approach D), i.e. the conditional probability that two randomly chosen alleles from the offspring are not IBD, given that both descend from native founders. This approach is intuitively appealing because it maximizes the NGE. It has, however, the disadvantage that the conditional gene diversity can be large even for offspring populations with very large migrant contributions, because it conditions on the event that the randomly chosen alleles *X*_{J} and *Y*_{J} originate from native founders. To see this, take a solution of the optimization problem and suppose that at least one migrant is a potential breeding individual. Then it can be shown mathematically that the genetic contribution of this migrant to the offspring population can be arbitrarily increased without changing the value of the objective function. Thus, the solution of the optimization problem may not be unique, and one solution maximizes migrant contributions. In order to avoid this, we put an additional constraint on the maximum permissible value for the genetic contribution from migrants to the offspring population.
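
The non-uniqueness can be made concrete with a toy enumeration. The sketch below is an illustration under simplifying assumptions, not the proof referred to above: every breeding individual carries two labelled founder alleles, alleles are IBD exactly when their labels match, and the migrant's contribution is increased while the native individuals' contributions are scaled down proportionally. All names and numbers are hypothetical.

```python
from fractions import Fraction
from itertools import product


def cond_gene_diversity(contrib, genotypes, native):
    """P(two sampled alleles are not IBD | both come from native founders).

    contrib:   individual -> genetic contribution (sums to 1)
    genotypes: individual -> tuple of two founder-allele labels
    native:    set of founder-allele labels from native founders
    """
    both_native = Fraction(0)
    both_native_not_ibd = Fraction(0)
    for (a, ca), (b, cb) in product(contrib.items(), repeat=2):
        for x, y in product(genotypes[a], genotypes[b]):
            p = ca * cb * Fraction(1, 4)  # each parental allele has prob. 1/2
            if x in native and y in native:
                both_native += p
                if x != y:
                    both_native_not_ibd += p
    return both_native_not_ibd / both_native


genotypes = {"n1": ("A", "B"), "n2": ("A", "C"), "m": ("M1", "M2")}
native = {"A", "B", "C"}

low_migrant = {"n1": Fraction(2, 5), "n2": Fraction(2, 5), "m": Fraction(1, 5)}
high_migrant = {"n1": Fraction(1, 10), "n2": Fraction(1, 10), "m": Fraction(4, 5)}

cgd_low = cond_gene_diversity(low_migrant, genotypes, native)
cgd_high = cond_gene_diversity(high_migrant, genotypes, native)
# Both equal 5/8 although the native contribution drops from 4/5 to 1/5.
```

In this toy case the objective cannot distinguish the two solutions, which is why a bound on the migrant contribution is needed.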

### Computations

To calculate the parameters defined in the previous section, the following quantities are needed. First, the coancestry *f*_{i,j} is needed for each pair of individuals *i, j*. It is the probability that two alleles randomly chosen from the individuals are IBD. That is,

$$f_{i,j} = P(X_i \text{ and } X_j \text{ are IBD}),$$

where allele *X*_{i} is randomly chosen from the two alleles of individual *i* at a particular locus.

Now we define an equivalence relation on the set of founder alleles. Two alleles *x*_{i}, *x*_{j} are equivalent (*x*_{i} ≡_{M} *x*_{j}) if they are IBD or if both are migrant alleles. For two alleles randomly chosen from individuals *i, j*, the probability for this to occur is $P(X_i \equiv_{M} X_j)$.

A second equivalence relation is defined as follows. Two alleles *x*_{i}, *x*_{j} are equivalent (*x*_{i} ≡_{FM} *x*_{j}) if both are native founder alleles or if both are migrant alleles. For two alleles randomly chosen from individuals *i, j*, the probability for this to occur is $P(X_i \equiv_{FM} X_j)$.

These probabilities have the advantage that they can easily be computed with existing software, e.g. with the function *kinship()* from the R-package *kinship*. For calculation of $P(X_i \equiv_{M} X_j)$, the parents of all migrants were identified with the same dummy individual, and a pedigree with several generations of selfing was added for this individual. The coancestry of individuals *i, j* computed from this extended pedigree is equal to $P(X_i \equiv_{M} X_j)$. Equality holds only approximately because only a finite number of generations of selfing was added. For calculation of $P(X_i \equiv_{FM} X_j)$, the parents of all migrants were identified with one single dummy individual, the parents of all native founders were identified with another single dummy individual, and pedigrees with several generations of selfing were added for both dummy individuals. The coancestry of individuals *i, j* computed from this extended pedigree is equal to $P(X_i \equiv_{FM} X_j)$. For example, consider two full sibs *i, j* whose sire is a migrant and whose dam is a native founder. Their coancestry is $f_{i,j} = 1/4$, but $P(X_i \equiv_{M} X_j) = 3/8$, and $P(X_i \equiv_{FM} X_j) = 1/2$.
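
The full-sib example can be checked by direct enumeration rather than through an extended pedigree. The sketch assumes that both parents are non-inbred and unrelated, so each sib's randomly chosen allele is one of the four distinct parental alleles with probability 1/4, independently for the two sibs. The variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Parental alleles as (origin, label); all four are distinct founder alleles.
sire = [("migrant", "s1"), ("migrant", "s2")]
dam = [("native", "d1"), ("native", "d2")]
parental_alleles = sire + dam

# Probability of each ordered pair of alleles drawn from sibs i and j.
p = Fraction(1, len(parental_alleles) ** 2)

f_ibd = f_m = f_fm = Fraction(0)
for (origin_x, label_x), (origin_y, label_y) in product(parental_alleles, repeat=2):
    ibd = label_x == label_y
    both_migrant = origin_x == origin_y == "migrant"
    both_native = origin_x == origin_y == "native"
    if ibd:
        f_ibd += p                      # ordinary coancestry
    if ibd or both_migrant:
        f_m += p                        # equivalence "IBD or both migrant"
    if both_native or both_migrant:
        f_fm += p                       # equivalence "same origin class"

# f_ibd == 1/4, f_m == 3/8, f_fm == 1/2
```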

For the *N*_{t} individuals of population *P*_{t}, three *N*_{t}×*N*_{t} matrices are used: the coancestry submatrix that is obtained from the true pedigree, the matrix that contains the probabilities $P(X_i \equiv_{M} X_j)$ for each pair of individuals *i, j* from population *P*_{t}, and the matrix that contains the probabilities $P(X_i \equiv_{FM} X_j)$. Rows and columns that correspond to dummy individuals or to individuals not born in the time interval [*t* − *T*, *t*] were excluded from these matrices.

Additionally, the *N*_{t}-dimensional vector $\mathbf{C}_t = (C_{t1}, \ldots, C_{tN_t})$ is needed, which contains the genetic contribution of native founders for each individual of population *P*_{t}. Note that the genetic contribution of native founders to population *P*_{t} is the mean of vector **C**_{t}. The means of the respective matrices are also used. It is well known that the gene diversity can be computed as one minus the mean of the coancestry matrix [4].

Proofs of all numbered equations, including the expression for the conditional gene diversity, are presented in Additional file 1.

Consider an arbitrary (hypothetical) offspring population of size *N* that is obtained from population *P*_{t} such that each breeding individual *a* ∈ *P*_{t} has genetic contribution *c*_{a} to the offspring population. The probability that an allele randomly chosen from the offspring population descends from a native founder is

and the conditional gene diversity in the offspring population is

so the optimum contributions for the breeding individuals with respect to objective function *ϕ*_{A} minimize

under side conditions *c*_{a} ≥ 0 and

. Additional side conditions can be added to fulfil biological and practical requirements. Moreover, we have

so the optimum contributions for the breeding individuals with respect to objective function *ϕ*_{B} minimize

under the side conditions described above, where **1** is a vector of ones. Since

the optimum contributions for the breeding individuals with respect to objective function *ϕ*_{C} minimize

under the side conditions. Finally, we have

where

is an *N*_{t} × *N*_{t} matrix. This function was maximized under the side conditions described above. Moreover, an additional side constraint was applied that requires the contribution of native founders to the offspring population to be at least a minimum permissible value. This is a quadratic fractional programming problem with linear constraints, so the objective function could have multiple local maxima. As mentioned in the previous section, one solution of the optimization problem maximizes migrant contributions, so the inequality constraint could be replaced by the corresponding equality constraint. For each offspring population *J* that satisfies this equality constraint, the objective function (i.e. the conditional gene diversity) satisfies

where the approximation is exact if the events
and
are independent. Therefore, an approximate solution was obtained by maximizing objective function *ϕ*_{B} under this additional equality constraint. The resulting contributions for the breeding individuals were used as starting values for general nonlinear optimization in order to obtain the exact solution. In the applications, the threshold value was chosen, quite arbitrarily, as the 75% quantile of the genetic contributions from native founders to individuals in the population. The same quantile was used for all breeds and years in order to make the results comparable. Results could be improved by choosing breed-dependent threshold values.
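
The reason why maximizing *ϕ*_{B} gives a good starting point can be sketched in one line. The notation is introduced here only for illustration: $D$ is the event that the two sampled alleles are not IBD, $N_X$ and $N_Y$ are the events that the first and the second allele descend from a native founder, and $\gamma$ is the native contribution fixed by the equality constraint. It is assumed, not confirmed by the text, that $N_X$ and $N_Y$ are the two events whose independence makes the approximation exact.

```latex
\begin{align*}
condGD(J) &= P(D \mid N_X \cap N_Y)
           = \frac{P(D \cap N_X \cap N_Y)}{P(N_X \cap N_Y)} \\
          &\approx \frac{P(D \cap N_X \cap N_Y)}{P(N_X)\,P(N_Y)}
           = \frac{\phi_B(J)}{\gamma^{2}}
\end{align*}
```

Because $\gamma$ is the same for every offspring population that satisfies the equality constraint, maximizing $\phi_B(J)$ under that constraint maximizes the right-hand side.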

We used the interior point method *ipop* in R-package *kernlab* (see [10]) for objective functions *ϕ*_{B} and *ϕ*_{D}, whereas for objective functions *ϕ*_{A} and *ϕ*_{C} with positive definite matrices we used *solve.QP* from R-package *quadprog*, which implements the dual method of Goldfarb and Idnani [11, 12].
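
For readers without access to these R packages, the structure of the simplest problem (Approach A) can be sketched in plain Python. This is not the Goldfarb and Idnani dual method or an interior point method. It swaps in projected gradient descent, and it assumes that the only side conditions are non-negative contributions that sum to one. The function names, the step size, and the example matrix are hypothetical. The step size must not exceed 1/(2λ), where λ is the largest eigenvalue of the coancestry matrix.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {c : c_a >= 0, sum(c) = 1}."""
    theta = 0.0
    cumulative = 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def min_kinship_contributions(f, step=0.5, iterations=2000):
    """Minimize c' f c over the simplex by projected gradient descent."""
    n = len(f)
    c = [1.0 / n] * n
    for _ in range(iterations):
        grad = [2.0 * sum(f[a][b] * c[b] for b in range(n)) for a in range(n)]
        c = project_to_simplex([c[a] - step * grad[a] for a in range(n)])
    return c


# Hypothetical candidates: two full sibs and one unrelated individual.
f = [
    [0.50, 0.25, 0.00],
    [0.25, 0.50, 0.00],
    [0.00, 0.00, 0.50],
]
c_opt = min_kinship_contributions(f)
# The unrelated individual receives the largest contribution (3/7),
# each full sib receives 2/7.
```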

### Materials

Only three local cattle varieties of Baden and Württemberg in the south-west of Germany have been preserved from extinction. These are the Vorderwald cattle, Hinterwald cattle, and Limpurg cattle. Other local breeds were replaced by Simmentaler Fleckvieh after its introduction at the beginning of the 19th century because the small landraces were not suitable for tillage [1].

The small *Hinterwald cattle* could be preserved as an almost pure breed until the beginning of the 20th century [13, 14] because the poor soil quality in its region of origin was not suitable for larger breeds. Nevertheless, this breed adopted the colour of the Simmentaler Fleckvieh during the 19th century [15]. The Hinterwald cattle were occasionally crossed with the Vorderwald cattle [16] and with Fleckvieh.

The red-and-white marked, colour-sided [17] *Vorderwald cattle* were frequently crossed with Simmentaler cattle. Consequently, the white stripe along the back had already become rare around 1900 [16]. After the Second World War, Vorderwald cattle were also crossed with Ayrshire, Red Holstein and Montbéliard cattle in order to improve milk yield. These crosses were registered as Vorderwald cattle. Extinction probabilities for Vorderwald and Hinterwald cattle were estimated in [18].

The yellow coloured *Limpurg cattle* were not only frequently crossed with Simmentaler cattle [19], but also occasionally with Braunvieh and Gelbvieh cattle [15] in order to increase body size. Nevertheless, the population size decreased dramatically. Only 17 Limpurg cows were registered in 1967, so the breeding association was dissolved. Several Limpurg cattle, however, were rediscovered in 1986 and a new stud book was established. Not only Limpurg cattle were registered, but also Fleckvieh crosses, and some Gelbvieh and Glan-Donnersberger bulls [16].

The data consisted of the pedigrees and additional information on 25 412 Hinterwald cattle, 185 315 Vorderwald cattle, and 4 150 Limpurg cattle. Vorderwald cattle without offspring were removed from the data in order to reduce the data set. Pedigrees of Hinterwald and Vorderwald cattle trace back only to 1948 because the stud books were renewed after the Second World War. Pedigrees of Limpurg cattle trace back only to 1970. Cattle from other breeds were considered to be migrants. Additionally, Hinterwald and Vorderwald cattle with unknown pedigree born after *t*_{s} = 1970 were also considered migrants, although some may have purebred ancestors. Limpurg cattle with unknown pedigree were considered to be migrants if they were born after *t*_{s} = 1988. The generation intervals were similar for the three breeds (unpublished results). Here, we assumed a generation interval of *I* = 5.3 years for all breeds.