Reliabilities of best linear unbiased predictions (BLUP) of breeding values are defined as the squared correlation between true and estimated breeding values and are helpful in assessing risk and genetic gain. Reliabilities can be computed from the prediction error variances for models with a single base population but are undefined for models that include several base populations and when unknown parent groups are modeled as fixed effects. In such a case, the use of metafounders in principle enables reliabilities to be derived.

Methods

We propose to compute the reliability of the contrast of an individual’s estimated breeding value with that of a metafounder based on the prediction error variances of the individual and the metafounder, their prediction error covariance, and their genetic relationship. Computing the required terms requires little extra work once the sparse inverse of the mixed model equations is available; alternatively, the terms can be approximated. This also allows the reliabilities of the metafounders to be obtained. We studied the reliabilities for both BLUP and single-step genomic BLUP (ssGBLUP), using several definitions of reliability in a large dataset with 1,961,687 dairy sheep and rams, most of which had phenotypes and among which 27,000 rams were genotyped with a 50K single nucleotide polymorphism (SNP) chip. There were 23 metafounders with progeny sizes between 2000 and 100,000 individuals.

Results

In models with metafounders, directly using the prediction error variance instead of the contrast with a metafounder leads to artificially low reliabilities because they refer to a population with maximum heterozygosity. When only one metafounder is fitted in the model, the reliability of the contrast is shown to be equivalent to the reliability of the individual in a model without metafounders. When there are several metafounders in the model, using a contrast with the oldest metafounder yields reliabilities that are on a meaningful scale and very close to reliabilities obtained from models without metafounders. The reliabilities using contrasts with ssGBLUP also resulted in meaningful values.

Conclusions

This work provides a general method to obtain reliabilities for both BLUP and ssGBLUP when several base populations are included through metafounders.

Background

In the traditional animal model, animals with unknown parents are assumed to be unrelated and to belong to the same base population. This base population is assumed to be of infinite size and with an average breeding value equal to zero. Under this assumption, using the correct model specification and with a complete pedigree, estimation of the breeding values \(\left(\mathbf{u}\right)\) and variance components is unbiased [1, 2]. However, pedigrees are incomplete in most livestock populations, in which case genetic groups or unknown parent groups (UPG) are typically used to handle incomplete pedigrees and unrecorded selection [3]. From the early stages of their implementation, UPG were fitted as fixed effects [3,4,5], so that the final breeding value \({\mathbf{u}}^{*}=\mathbf{Q}\mathbf{g}+\mathbf{u}\) is a weighted sum of the fixed group effects \(\left(\mathbf{Q}\mathbf{g}\right)\) and a random deviation \(\left(\mathbf{u}\right)\) [3]. As a result, the final breeding values (and their estimates) do not have an explicit base population; therefore, estimated breeding values (EBV) are not estimable, although their contrasts are [3]. Moreover, using fixed UPG is not justified from a quantitative genetics perspective because it assumes that the genetic variance is not changed by drift or selection [6]; in other words, it assumes that UPG are of infinite size and not related to each other.

The reliability of an individual EBV can be defined as the squared correlation between the true and estimated breeding value over repeated conceptual sampling: \(re{l}_{i}={r}^{2}\left({\widehat{u}}_{i},{u}_{i}\right)\). Reliabilities can be obtained as \(re{l}_{i}=1-\frac{\text{Var}\left({\widehat{u}}_{i}-{u}_{i}\right)}{\text{Var}\left({u}_{i}\right)}\) [7], where \(\text{Var}\left({\widehat{u}}_{i}-{u}_{i}\right)\) is the prediction error variance of \({\widehat{u}}_{i}\), which can be obtained from the inverse of the coefficient matrix of the mixed model equations (MME) [8]. However, the expression \(re{l}_{i}=1-\frac{\text{Var}\left({\widehat{u}}_{i}-{u}_{i}\right)}{\text{Var}\left({u}_{i}\right)}\) does not hold when UPG are fitted as fixed effects because then the covariance between true and estimated effects is not equal to the variance of the estimated effects [7]. Instead, when UPG are fitted as fixed effects, the reliability can be defined as \({r}^{2}\left({\widehat{u}}^{*},{u}^{*}\right)={r}^{2}(\mathbf{q}\widehat{\mathbf{g}}+\widehat{u},\mathbf{q}\mathbf{g}+u)\), where \(\mathbf{q}\) is a row of \(\mathbf{Q}\) (see p. 44 in [8]). However, the value of this correlation depends on the particular generalized inverse that is used to solve the MME (often implicitly) and is, thus, not uniquely defined (see p. 44 in [8]). A practical solution is to calculate the reliabilities from a model without UPG, but this overestimates the reliabilities because it ignores the uncertainty due to the estimation of UPG [9].

To account for an increase in inbreeding and relationships, VanRaden [10] considered UPG as random effects by assigning them an inbreeding coefficient equal to the average inbreeding of known parents from the same generation. In crosses of populations, Lo et al. [11] modeled the covariances across animals by considering the heterogeneity of the base populations (i.e., different allele frequencies). Their model requires knowledge of the genetic variance of each base population and the segregation variance for each cross, which can differ for each trait. Garcia-Cortés and Toro [12] presented an equivalent model.

In a genomic context and with access to genotypes, Legarra et al. [13] introduced the concept of metafounders, which are a generalization of genetic groups that account for inbreeding and segregation variances (assumed the same for all traits) by introducing relatedness between base populations—the more different the base populations are, the larger is the genetic segregation variance. However, a base population from which all relationships are computed must be defined when considering different populations and crosses. For the case of two populations, Lo et al. [11] proposed using the F2 cross as a “reference group”. Mostly for convenience reasons (compatibility with genomic relationships), metafounders have a conceptual base population that has maximum heterozygosity. This lacks a clear genetic interpretation, as it differs from the base population of a pedigree without metafounders.

Using random UPG or metafounders leads to a model where all components of \({\mathbf{u}}^{*}\) are random and, thus, \(re{l}_{i}=1-\frac{\text{Var}\left({\widehat{u}}_{i}-{u}_{i}\right)}{\text{Var}\left({u}_{i}\right)}\) holds. However, in preliminary tests we observed that the computed reliabilities for a model with metafounders were biased (too low), for instance, for proven dairy sires. Thus, we hypothesize that the reliabilities obtained from the inverse of the coefficient matrix when using metafounders do not refer to a meaningful breeding value in the usual quantitative genetics sense because they do not refer to a single unrelated base population. In fact, it is not obvious what the genetic base is when using random UPG or metafounders. Instead, we argue that \(re{l}_{i}=1-\frac{\text{Var}\left({\widehat{u}}_{i}-{u}_{i}\right)}{\text{Var}\left({u}_{i}\right)}\) does represent the squared correlation between the true and estimated breeding values, but across two sampling processes. In the case of metafounders, these two sampling processes are: (i) from the conceptual base population of metafounders (one with maximum heterozygosity) to the metafounders, and (ii) from each metafounder to the individual. In the case of random UPG, there is a sampling from 0 (the a priori average UPG effect) to the UPG values, followed by the Mendelian samplings of each individual, which, added to the UPG fractions, yield the final breeding value. Therefore, in order to compute meaningful reliabilities of EBV we must define a proper base (or reference, as in Lo et al. [11]) population, relative to which the breeding values are expressed [14]. The objectives of this study were to develop theory and computational methods to obtain meaningful reliabilities for EBV for models with metafounders. The method is illustrated with data from the Lacaune dairy breed under pedigree-based and genomic-based models.

where \(\mathbf{y}\) is the vector of phenotypes, \({\varvec{\upbeta}}\) is the vector of fixed effects, \(\mathbf{u}\) is the vector of breeding values, \(\mathbf{e}\) is the vector of errors, and \(\mathbf{X}\) and \(\mathbf{Z}\) are incidence matrices. As usual, it is assumed that:
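The displayed forms of Eqs. (1) to (3) are not reproduced in the text above. A sketch that is consistent with the symbols used in the surrounding paragraphs is given below; it assumes a single-trait model with independent residuals, and the residual variance symbol \(\sigma_e^2\) and the scaling of \(\mathbf{C}\) by the variances are assumptions rather than the authors' exact notation. The three lines are the model, its distributional assumptions, and the MME with coefficient matrix \(\mathbf{C}\):

\[
\begin{aligned}
\mathbf{y} &= \mathbf{X}{\varvec{\upbeta}}+\mathbf{Z}\mathbf{u}+\mathbf{e} && (1)\\
\left(\begin{array}{c}\mathbf{u}\\ \mathbf{e}\end{array}\right) &\sim N\left(\mathbf{0},\left(\begin{array}{cc}\mathbf{K}{\sigma }_{u}^{2}& \mathbf{0}\\ \mathbf{0}& \mathbf{I}{\sigma }_{e}^{2}\end{array}\right)\right) && (2)\\
\underbrace{\left(\begin{array}{cc}{\mathbf{X}}^{\prime}\mathbf{X}{\sigma }_{e}^{-2}& {\mathbf{X}}^{\prime}\mathbf{Z}{\sigma }_{e}^{-2}\\ {\mathbf{Z}}^{\prime}\mathbf{X}{\sigma }_{e}^{-2}& {\mathbf{Z}}^{\prime}\mathbf{Z}{\sigma }_{e}^{-2}+{\mathbf{K}}^{-1}{\sigma }_{u}^{-2}\end{array}\right)}_{\mathbf{C}}\left(\begin{array}{c}\widehat{{\varvec{\upbeta}}}\\ \widehat{\mathbf{u}}\end{array}\right) &= \left(\begin{array}{c}{\mathbf{X}}^{\prime}\mathbf{y}{\sigma }_{e}^{-2}\\ {\mathbf{Z}}^{\prime}\mathbf{y}{\sigma }_{e}^{-2}\end{array}\right) && (3)
\end{aligned}
\]

With this scaling, the block of \({\mathbf{C}}^{-1}\) that corresponds to \(\mathbf{u}\) contains the prediction error (co)variances directly.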

If the inverse of \(\mathbf{C}\) is \({\mathbf{C}}^{-1}=\left(\begin{array}{cc}{\mathbf{C}}^{11}& {\mathbf{C}}^{12}\\ {\mathbf{C}}^{21}& {\mathbf{C}}^{22}\end{array}\right)\), the reliability for the \(i\)th animal is calculated as [7]:
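The displayed form of Eq. (4) is likewise not reproduced here. Given the definition \(re{l}_{i}=1-\frac{\text{Var}\left({\widehat{u}}_{i}-{u}_{i}\right)}{\text{Var}\left({u}_{i}\right)}\) used in the Background, and assuming that \({\mathbf{C}}^{22}\) holds the prediction error (co)variances of \(\mathbf{u}\), it would presumably take the following form:

\[
re{l}_{i}=1-\frac{{\mathbf{C}}_{ii}^{22}}{{\mathbf{K}}_{ii}{\sigma }_{u}^{2}} \qquad (4)
\]

Here \({\mathbf{C}}_{ii}^{22}\) is the \(i\)th diagonal element of \({\mathbf{C}}^{22}\) and \({\mathbf{K}}_{ii}{\sigma }_{u}^{2}\) is the prior variance of \({u}_{i}\).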

In “regular” animal models, \(\mathbf{K}=\mathbf{A}\), where \(\mathbf{A}\) is the numerator relationship matrix. Now, we introduce a distinction by using metafounders. We keep \({u}_{i}\) for the breeding value of individual \(i\) estimated in the “regular” animal model, and \({u}_{i}^{*}\) for the breeding value estimated with metafounders. When using metafounders [13], \({\mathbf{K}}^{-1}\) in Eq. (3) is replaced by either \({\mathbf{H}}_{{\varvec{\Gamma}}}^{-1}\) or \({\mathbf{A}}_{{\varvec{\Gamma}}}^{-1}\) if single-step genomic best linear unbiased prediction (ssGBLUP) [13, 15, 16] or pedigree-based BLUP is used, respectively. For the structure of \({\mathbf{H}}_{{\varvec{\Gamma}}}^{-1}\) and \({\mathbf{A}}_{{\varvec{\Gamma}}}^{-1}\), see [13]. In addition, \({\sigma }_{u}^{2}\) in Eqs. (3) and (4) is substituted by \({\sigma }_{u{\text{-}}related}^{2}\), which is the genetic variance accounting for relatedness in the base population [13, 17].

Using metafounder models, the reliability calculated following Eq. (4) (i.e., \(1-\frac{\text{Var}\left({\widehat{u}}_{i}^{*}-{u}_{i}^{*}\right)}{\text{Var}\left({u}_{i}^{*}\right)}\)) will be referred to as \({rel}_{i}^{*}\). Initial empirical evidence (examples will be shown later) and analytical proofs (see Appendix) showed that calculating reliabilities with metafounders based on Eq. (4) leads to reliability values that are lower than expected—for instance, 0.60 for a proven bull with 100 progenies with records in a pedigree-BLUP evaluation with \({h}^{2}=0.25\). This is because Eq. (4) refers to a base population of maximum heterozygosity [13] and with an expected value equal to zero, which has no meaning for breeding purposes. Metafounders are conceptually drawn from this base population with covariance matrix \({\varvec{\Gamma}}{\sigma }_{u{\text{-}}related}^{2}\) [13], and breeding values of individual animals are subsequently drawn through Mendelian sampling. Although this conceptual base population has no meaningful genetic interpretation, metafounders do. For example, they represent founders of the pedigree [18], founders of pure breeds [19], or unknown parents of animals born in given time periods [20]. In order to refer reliabilities to a base population [14], we propose to define the reliabilities as a contrast to one of these populations, i.e., contrasts to a reference metafounder.

Reliability as a contrast

Thus, we define the reliability \(re{l}_{{c}_{i}}\) of the contrast of “the EBV of animal \(i\) minus the estimated effect of the metafounder \(mf\)”, in the usual manner as \(re{l}_{{c}_{i}}={r}^{2}\left({\widehat{u}}_{i}^{*}-{\widehat{u}}_{mf}^{*},{u}_{i}^{*}-{u}_{mf}^{*}\right)\), with the following analytical expression:
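The displayed form of Eq. (5) is not reproduced here. A reconstruction that agrees with the elements named in the following paragraphs (\({\mathbf{C}}_{{\Gamma }_{ii}}^{22}\), \({\mathbf{C}}_{{\Gamma }_{mf,mf}}^{22}\), \({\mathbf{C}}_{{\Gamma }_{i,mf}}^{22}\) and the corresponding elements of \({\mathbf{V}}_{\Gamma }\)) and with the variance of a difference of two random variables is sketched below; the exact layout is an assumption:

\[
re{l}_{{c}_{i}}=1-\frac{\text{Var}\left(\left({\widehat{u}}_{i}^{*}-{u}_{i}^{*}\right)-\left({\widehat{u}}_{mf}^{*}-{u}_{mf}^{*}\right)\right)}{\text{Var}\left({u}_{i}^{*}-{u}_{mf}^{*}\right)}=1-\frac{{\mathbf{C}}_{{\Gamma }_{ii}}^{22}+{\mathbf{C}}_{{\Gamma }_{mf,mf}}^{22}-2{\mathbf{C}}_{{\Gamma }_{i,mf}}^{22}}{\left({\mathbf{V}}_{{\Gamma }_{ii}}+{\mathbf{V}}_{{\Gamma }_{mf,mf}}-2{\mathbf{V}}_{{\Gamma }_{i,mf}}\right){\sigma }_{u{\text{-}}related}^{2}} \qquad (5)
\]

The numerator is the prediction error variance of the contrast and the denominator is its prior variance.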

where \({u}_{mf}^{*}\) and \({\widehat{u}}_{mf}^{*}\) are the true and estimated breeding values of the chosen reference metafounder, respectively. In Eq. (5), \({\mathbf{V}}_{{\Gamma }_{ii}}\) refers to either \({\mathbf{A}}_{{\Gamma }_{ii}}\) or \({\mathbf{H}}_{{\Gamma }_{ii}}\), whereas \({\mathbf{C}}_{{\Gamma }_{j,k}}^{22}\) refers to the \(\left(j,k\right)\) element of the corresponding block of the inverse of \(\mathbf{C}\) calculated by replacing \({\sigma }_{u}^{-2}{\mathbf{K}}^{-1}\) by \({\sigma }_{u{\text{-}}related}^{-2}{\mathbf{A}}_{{\varvec{\Gamma}}}^{-1}\) or \({\sigma }_{u{\text{-}}related}^{-2}{\mathbf{H}}_{{\varvec{\Gamma}}}^{-1}\). Note that Eq. (5) can also be used to calculate the reliability of the contrast of the estimated effects of two metafounders. Hence, it is possible to obtain a statistic to evaluate the precision of the estimate of a metafounder’s effect, something which is not easy to do for fixed-effect UPG.

Comparison to “regular” animal model reliabilities

Let \(re{l}_{i}=1-\frac{\text{Var}\left({\widehat{u}}_{i}-{u}_{i}\right)}{{A}_{\text{ii}}{\sigma }_{u}^{2}}\) be the “regular” animal model reliability without UPG or metafounders. In Appendix (Eq. 20), we show that for the case of a single metafounder and under mild assumptions, \(re{l}_{i}=re{l}_{{c}_{i}}\). This shows that our proposal for \(re{l}_{{c}_{i}}\) is a generalization of standard animal breeding theory [7]. The assumption for the proof in the Appendix to hold is that the product of the incidence matrix for the random effects (\(\mathbf{Z}\)) times a vector of 1s (\(\mathbf{1}\)) must belong to the column space of the incidence matrix \(\mathbf{X}\) for the fixed effects, i.e., \(\mathbf{Z1}\in \mathcal{C}\left(\mathbf{X}\right)\). From an interpretational point of view, this assumption implies that an overall mean is fitted implicitly or explicitly in the model (\(\mathbf{1}\in \mathcal{C}\left(\mathbf{X}\right)\)) and that the sum of each row of the incidence matrix \(\mathbf{Z}\) is constant (\(\mathbf{Z1}\propto \mathbf{1}\)). This assumption holds for a wide variety of models, including sire, animal (with and without maternal effects), and multi-trait models. For more complex models, such as competition models [21] and models with indirect genetic effects with non-constant group sizes [22], the assumption seems to hold if the models are sensibly specified.

We prove in the Appendix that the expression in Eq. (4), \({rel}_{i}^{*}=1-\frac{\text{Var}\left({\widehat{u}}_{i}^{*}-{u}_{i}^{*}\right)}{\text{Var}\left({u}_{i}^{*}\right)}\), gives a systematically lower value of reliability than the regular \(re{l}_{i}\); in fact, \({rel}_{i}^{*}=\frac{\left(2-\gamma \right){A}_{ii}}{\left(2-\gamma \right){A}_{ii}+2\gamma } {rel}_{i}\) (see Eq. (19) in Appendix). Because \(\gamma\) is twice the average inbreeding coefficient of a population, it ranges from 0 to 2 and, therefore, the scalar \(\frac{\left(2-\gamma \right){A}_{ii}}{\left(2-\gamma \right){A}_{ii}+2\gamma }\) always lies between 0 and 1. For instance, when \({A}_{ii}=1\) (no inbreeding) this gives \({rel}_{i}^{*}=\frac{2-\gamma }{2+\gamma } re{l}_{i}\), and for a typical value of \(\gamma\), say \(0.6\), this gives \({rel}_{i}^{*}=\frac{1.4}{2.6}re{l}_{i}\approx 0.54re{l}_{i}\).
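The size of this scaling for different values of \(\gamma\) and \({A}_{ii}\) can be checked with a few lines of Python. This is only a sketch of the arithmetic in the expression above; the function name `metafounder_scale` is hypothetical and not part of any software mentioned in this paper.

```python
def metafounder_scale(gamma: float, a_ii: float = 1.0) -> float:
    """Ratio rel*_i / rel_i for a single metafounder with self-relationship gamma.

    Implements (2 - gamma) * A_ii / ((2 - gamma) * A_ii + 2 * gamma).
    """
    if not 0.0 <= gamma <= 2.0:
        raise ValueError("gamma is expected to lie between 0 and 2")
    if a_ii <= 0.0:
        raise ValueError("A_ii must be positive")
    return (2.0 - gamma) * a_ii / ((2.0 - gamma) * a_ii + 2.0 * gamma)


# Print the ratio for a few values of gamma, for a non-inbred animal (A_ii = 1).
for gamma in (0.0, 0.3, 0.6, 1.0):
    print(f"gamma = {gamma:.1f}: rel*/rel = {metafounder_scale(gamma):.3f}")
```

The ratio equals 1 only when \(\gamma =0\), i.e., when the metafounder model reduces to the regular animal model.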

Reliabilities of metafounders

A point that is overlooked in genetic evaluations with UPG is the precision of the estimation of UPG values [9]. In principle, it is possible to compute the standard error of the contrasts of the estimates of two UPG from the elements of the inverse of the MME. However, to our knowledge, this is usually not done. In our study, the reliability of a contrast (Eq. 5) can also be applied to metafounders, which are treated like any other animals. Thus, it is possible to obtain reliabilities of contrasts of metafounders on the same 0 to 1 scale as individuals. So, the reliability of metafounder 2 compared to metafounder 1 also uses Eq. (5). Contrary to contrasts of UPG, the reliabilities of contrasts of metafounders do not need special computational treatment.

Computing strategies

For small datasets, the MME can be solved by inversion of \(\mathbf{C}\). Therefore, \({\mathbf{C}}_{{\Gamma }_{ii}}^{22}\), \({\mathbf{C}}_{{\Gamma }_{mf,mf}}^{22}\), and \({\mathbf{C}}_{{\Gamma }_{i,mf}}^{22}\) of Eq. (5) can be obtained from \({\mathbf{C}}^{-1}\). Also, \({\mathbf{K}}_{\Gamma }\) can be explicitly created and used to obtain the elements for the denominator of Eq. (5).
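The small-dataset strategy can be sketched with dense matrices in Python. The example below is illustrative only: the seven-animal pedigree, the variances, the value of \(\gamma\), and the function names are hypothetical, and a single metafounder (index 0) with one record per animal and an overall mean is assumed. It builds the relationship matrix by the tabular method, inverts \(\mathbf{C}\), and computes the regular reliability, the uncorrected reliability, and the reliability of the contrast with the metafounder.

```python
import numpy as np


def tabular_relationships(sire, dam, gamma):
    """Relationship matrix by the tabular method; index 0 is the metafounder.

    Animals are numbered 1..n, parents precede offspring, and an unknown
    parent is coded 0 (the metafounder, with self-relationship gamma).
    """
    n = len(sire)
    A = np.zeros((n + 1, n + 1))
    A[0, 0] = gamma
    for i in range(1, n + 1):
        s, d = sire[i - 1], dam[i - 1]
        A[i, :i] = 0.5 * (A[s, :i] + A[d, :i])
        A[:i, i] = A[i, :i]
        A[i, i] = 1.0 + 0.5 * A[s, d]
    return A


def pev(X, Z, K, var_u, var_e):
    """Random-effect block of the inverse of C (prediction error (co)variances)."""
    p = X.shape[1]
    C = np.block([[X.T @ X, X.T @ Z], [Z.T @ X, Z.T @ Z]]) / var_e
    C[p:, p:] += np.linalg.inv(K) / var_u
    return np.linalg.inv(C)[p:, p:]


# Hypothetical toy pedigree (0 = unknown parent) and variance components.
sire = [0, 0, 0, 1, 1, 4, 4]
dam = [0, 0, 0, 2, 3, 5, 0]
n = len(sire)
gamma, var_u, var_e = 0.6, 0.3, 0.7
var_u_related = var_u / (1.0 - gamma / 2.0)

X = np.ones((n, 1))  # overall mean
Z = np.eye(n)        # one record per animal

# Regular animal model (gamma = 0, metafounder row and column dropped).
A = tabular_relationships(sire, dam, 0.0)[1:, 1:]
rel = 1.0 - np.diag(pev(X, Z, A, var_u, var_e)) / (np.diag(A) * var_u)

# Metafounder model: the metafounder is an extra random effect without records.
A_g = tabular_relationships(sire, dam, gamma)
Z_g = np.hstack([np.zeros((n, 1)), Z])
P = pev(X, Z_g, A_g, var_u_related, var_e)
a_diag, p_diag = np.diag(A_g), np.diag(P)

# Uncorrected reliability, from the prediction error variance alone.
rel_star = 1.0 - p_diag[1:] / (a_diag[1:] * var_u_related)

# Reliability of the contrast with the metafounder.
num = p_diag[1:] + P[0, 0] - 2.0 * P[1:, 0]
den = (a_diag[1:] + A_g[0, 0] - 2.0 * A_g[1:, 0]) * var_u_related
rel_c = 1.0 - num / den

print(np.round(np.column_stack([rel, rel_star, rel_c]), 3))
```

In this toy setting, with one metafounder and \(\mathbf{Z1}\in \mathcal{C}\left(\mathbf{X}\right)\), `rel_c` should coincide with `rel`, and `rel_star` should equal `rel` multiplied by the scalar \(\frac{\left(2-\gamma \right){A}_{ii}}{\left(2-\gamma \right){A}_{ii}+2\gamma }\).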

For large datasets without genomic information, \({\mathbf{C}}_{{\Gamma }_{ii}}^{22}\) can be retrieved from a sparse inverse of \(\mathbf{C}\), or approximated using a reliability approximation method, e.g. [23, 24], or by a Gibbs sampler. The elements \({\mathbf{C}}_{{\Gamma }_{mf,mf}}^{22}\) and \({\mathbf{C}}_{{\Gamma }_{i,mf}}^{22}\) are obtained in \(\mathbf{x}\) by solving the system \(\mathbf{C}\mathbf{x}={\mathbf{e}}_{mf}\), where \({\mathbf{e}}_{mf}\) is a vector of 0s except for a 1 in the position of the reference metafounder. Note that the system needs to be solved only once, for the reference metafounder. The diagonal elements of \({\mathbf{A}}_{\Gamma }\) (\({\mathbf{A}}_{{\Gamma }_{ii}}\) and \({\mathbf{A}}_{{\Gamma }_{mf,mf}}\)) are calculated before setting up the MME, using either a modified version of the Meuwissen and Luo [25] algorithm [13] or recursion. The elements \({\mathbf{A}}_{{\Gamma }_{mf,i}}\) are obtained by applying the method of Colleau [31], modified to account for metafounders.
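The single solve for the reference metafounder can be illustrated as follows. This is a sketch under stated assumptions: a small random symmetric positive definite matrix stands in for \(\mathbf{C}\), the position `mf` is hypothetical, and a dense solver is used for brevity, whereas a sparse direct or iterative solver would be used for a real evaluation.

```python
import numpy as np

# Hypothetical stand-in for the MME coefficient matrix C
# (symmetric positive definite by construction).
rng = np.random.default_rng(0)
m = 6
B = rng.normal(size=(m, m))
C = B @ B.T + m * np.eye(m)

# Position of the reference metafounder among the equations (hypothetical).
mf = 2
e_mf = np.zeros(m)
e_mf[mf] = 1.0

# One solve returns the whole column of the inverse of C for the metafounder.
x = np.linalg.solve(C, e_mf)

pev_mf = x[mf]    # prediction error variance of the metafounder
pec_with_mf = x   # prediction error covariances with every other effect
```

Because \(\mathbf{x}\) is the column of \({\mathbf{C}}^{-1}\) that corresponds to the reference metafounder, its entries give \({\mathbf{C}}_{{\Gamma }_{mf,mf}}^{22}\) and \({\mathbf{C}}_{{\Gamma }_{i,mf}}^{22}\) for all animals at once.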

For large datasets with genomic information, \({\mathbf{C}}_{{\Gamma }_{ii}}^{22}\) can still be obtained from a sparse inverse of \(\mathbf{C}\) if the number of genotyped animals is small. Otherwise, \({\mathbf{C}}_{{\Gamma }_{ii}}^{22}\) can be approximated by a Gibbs sampler or by reliability approximation methods that account for genomic information, e.g. [26,27,28]. The calculation of \({\mathbf{C}}_{{\Gamma }_{mf,mf}}^{22}\) and \({\mathbf{C}}_{{\Gamma }_{i,mf}}^{22}\) is the same as with pedigree only. The diagonal elements of \({\mathbf{H}}_{\Gamma }\) can be calculated using the methods of Legarra et al. [29]. Finally, the elements \({\mathbf{H}}_{{\Gamma }_{i,mf}}\) can be obtained in \(\mathbf{x}\) by solving the system \({\mathbf{H}}_{{\varvec{\Gamma}}}^{-1}\mathbf{x}={\mathbf{e}}_{mf}\). A more efficient way would be to solve separate systems for genotyped and non-genotyped animals. For genotyped animals, it is necessary to calculate \(\mathbf{x}=\mathbf{G}{\mathbf{A}}_{{\Gamma }_{22}}^{-1}{\mathbf{A}}_{{\Gamma }_{21}}{\mathbf{e}}_{mf}\), where the subscripts \(1\) and 2 refer to the non-genotyped and genotyped animals, respectively. First, the product \(\mathbf{w}={\mathbf{A}}_{{\Gamma }_{22}}^{-1}{\mathbf{A}}_{{\Gamma }_{21}}{\mathbf{e}}_{mf}\) can be efficiently calculated using the method of Fernando et al. [30], as the solution to \({\mathbf{A}}_{{\varvec{\Gamma}}}^{22}\mathbf{w}={-\mathbf{A}}_{\varvec{\Gamma }}^{21}{\mathbf{e}}_{mf}\). Then, values of \(\mathbf{x}\) for genotyped animals are obtained from \(\mathbf{x}=\mathbf{G}\mathbf{w}\). For non-genotyped animals, it is necessary to calculate \({\mathbf{x}=\mathbf{A}}_{{\Gamma }_{11}}{\mathbf{e}}_{mf}+{\mathbf{A}}_{{\Gamma }_{12}}{\mathbf{A}}_{{\Gamma }_{22}}^{-1}\left(\mathbf{G}-{\mathbf{A}}_{{\Gamma }_{22}}\right){\mathbf{A}}_{{\Gamma }_{22}}^{-1}{\mathbf{A}}_{{\Gamma }_{21}}{\mathbf{e}}_{mf}\). 
The first term is calculated using the method of Colleau [31], whereas the second uses multiplications from right to left. For more details on these methods, we refer to Colleau et al. [32].

The above methods to implement Eq. (5) have been programmed in the BLUPF90+ [33] software and may be invoked by adding the following options to the parameter file:

OPTION store_accuracy n

OPTION store_pec_mf arg

where n refers to the number of the animal effect and arg defines which metafounder is used as a contrast (first, last, user nmf). The default uses the first (first) metafounder, but any metafounder (nmf) can be defined as a contrast through the argument user nmf.

Materials

The proposed method was tested on a sheep dataset from the Lacaune breed. The number of animals in the pedigree was 1,961,687 (primarily females), of which 1,791,268 had phenotypes for milk yield. In total, 29,138 rams were genotyped with a 50K single nucleotide polymorphism (SNP) chip. The pedigree records started in 1970 and the unknown parents of sheep until 1978 constituted the first metafounder. Then, metafounders were created every 2 years. Although pedigree completeness was above 90% and all males born after 1978 had both parents known, there were females with an unknown sire or with unknown sire and dam. Table 1 shows the number of animals with records assigned to each metafounder, which ranged from 2000 to 100,000.

Analysis

Two models (BLUP and ssGBLUP) and three reliability scenarios were used to explore the properties of regular reliability and the reliability of the contrast between an individual’s EBV and the metafounder’s effect. For BLUP, the first scenario did not include UPG or metafounders. The reliabilities calculated from this model, \(Re{l}_{i}=1-\frac{\text{Var}\left({\widehat{u}}_{i}-{u}_{i}\right)}{{\mathbf{K}}_{ii}{\sigma }_{u}^{2}}\), will be referred to as RelNoMF for BLUP and serve as a reference of properly defined and accurate reliabilities, as most animals have complete pedigree. For ssGBLUP, this scenario was not considered, as genomic reliabilities depend strongly on the genotype coding [14, 34]. The second scenario (used for BLUP and ssGBLUP) included metafounders but without any correction for the reference population. Thus, the reliability was calculated following Eq. (4): \(Re{l}_{i}^{*}=1-\frac{\text{Var}\left({\widehat{u}}_{i}^{*}-{u}_{i}^{*}\right)}{{\mathbf{K}}_{{\Gamma }_{i,i}}{\sigma }_{u{\text{-}}related}^{2}}\). This reliability, which is calculated from the inverse of the MME coefficient matrix and refers to a genetic base population with maximum heterozygosity, will be referred to as RelMFnc for BLUP and ssRelMFnc for ssGBLUP. Here, we compute reliabilities with sparse inversion for ssGBLUP because the use of metafounders with estimated \({\varvec{\Gamma}}\) puts \(\mathbf{G}\) and \(\mathbf{A}\) on the “same scale” [13, 17], leading to meaningful reliabilities. The third scenario used the same model with metafounders but reliabilities were defined as a contrast to a reference metafounder, as proposed in this study in Eq. (5): \(Re{l}_{{c}_{i}}=1-\frac{\text{Var}\left(\left({\widehat{u}}_{i}^{*}-{u}_{i}^{*}\right)-\left({\widehat{u}}_{mf}^{*}-{u}_{mf}^{*}\right)\right)}{\text{Var}\left({u}_{i}^{*}-{u}_{mf}^{*}\right)}\). These reliabilities will be referred to as RelMFc and ssRelMFc for BLUP and ssGBLUP, respectively. The reference metafounder was the first one, which happens to represent the “oldest” Lacaune population.

For all the scenarios with metafounders, their relationship matrix \(({\varvec{\Gamma}})\) was calculated using a modification of the method of Macedo et al. [20] as:

where \({\widehat{\upgamma }}_{0}\) is the estimate of the self-average relationship of the first metafounder \(\left({\gamma }_{0}\right)\), \(k\) is the time (in years) between consecutive metafounders, \(\mathbf{T}\) is a strictly lower triangular matrix of 1s, and \(\Delta {F}_{\upgamma }=\Delta F\left(1-0.5 {\upgamma }_{0}\right)\), where \(\Delta F\) is the change in average inbreeding per year (assumed constant but this can be modified). For the dataset used in this study, \(k=2\) and \(\Delta F=6.442 \times {10}^{-4}\) per year as obtained by pedigree analyses. Parameter \({\gamma }_{0}\) was estimated as \({\widehat{\upgamma }}_{0}=2 {n}^{-1}{\sum }_{i=1}^{n}{\left(2 {\widehat{p}}_{i}-1\right)}^{2}\) [18], where \(n\) is the number of genotyped markers and \({\widehat{p}}_{i}\) is the estimate of the minor allele frequency in the base population for the \(i\)th marker, which was calculated using the method of Gengler et al. [35], resulting in \({\widehat{\upgamma }}_{0}=0.46375\). Note that this technique to obtain \({\varvec{\Gamma}}\) is only valid in the context of a single breed and cannot be used when the population includes several breeds and crosses. For simplicity, we assumed a continuous decrease in heterozygosity, but this can be easily relaxed. All computations were done using BLUPF90+ [33].
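The two scalar ingredients of this construction, \({\widehat{\upgamma }}_{0}\) and \(\Delta {F}_{\upgamma }\), can be computed as in the following sketch. The function names and the example allele frequencies are hypothetical; the estimation of base allele frequencies itself (method of Gengler et al. [35]) is not shown, and the frequencies are simply taken as given.

```python
import numpy as np


def gamma0_from_frequencies(p):
    """gamma_0 = 2 * mean((2 p_i - 1)^2) over markers, given base allele frequencies."""
    p = np.asarray(p, dtype=float)
    if p.ndim != 1 or p.size == 0:
        raise ValueError("p must be a non-empty one-dimensional array")
    if np.any((p < 0.0) | (p > 1.0)):
        raise ValueError("allele frequencies must lie between 0 and 1")
    return 2.0 * float(np.mean((2.0 * p - 1.0) ** 2))


def delta_f_gamma(delta_f, gamma0):
    """Yearly change on the metafounder scale: delta_F * (1 - 0.5 * gamma_0)."""
    return delta_f * (1.0 - 0.5 * gamma0)


# Hypothetical base allele frequencies for five markers.
p_hat = [0.10, 0.35, 0.50, 0.42, 0.25]
print(gamma0_from_frequencies(p_hat))

# Values reported for the Lacaune dataset in the text.
print(delta_f_gamma(6.442e-4, 0.46375))
```

Markers with frequencies of 0.5 contribute nothing to \({\widehat{\upgamma }}_{0}\), whereas fixed markers contribute the maximum, which gives the range of 0 to 2.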

Reliabilities obtained from each scenario were compared for all animals and by the following categories of animals: selection candidates and males and females with two, one, or no parents known. Selection candidates were defined as genotyped young males without progeny.

Results

Figure 1 shows histograms for reliabilities for RelNoMF, RelMFc, and RelMFnc. The leftmost and larger peaks contain females, most having own records, whereas the rightmost and smaller peak consists of proven males. The average reliabilities of ~ 0.55 for females and ~ 0.90 for proven males in Fig. 1 agree with values from simple selection index theory, as the females have one to three records and proven males have a minimum of 30 daughters each, the heritability of the trait was 0.30 and the repeatability was 0.50. The values for RelNoMF and RelMFc align extremely well, as also shown in Fig. 2 (for females with both parents known). However, RelMFnc resulted in reliabilities that are too low for both males (~ 0.55) and females (~ 0.40), as expected based on the expression \({rel}_{i}^{*}=\frac{\left(2-\gamma \right){A}_{ii}}{\left(2-\gamma \right){A}_{ii}+2\gamma } {rel}_{i}\), because they refer to a conceptual base population with maximum heterozygosity.

Figures 2 and 3 show the comparison between scenarios using BLUP for females and males, respectively. The leftmost plots in Figs. 2 and 3, i.e., the comparison between RelNoMF and RelMFnc, show the inadequacy of Eq. (4) for calculating reliabilities with metafounders. The dots over the red dashed line indicate that the reliability is miscalculated, because adding metafounders should result in reliabilities for RelMFnc that are very similar to those for RelNoMF, given that most individuals have both parents known. The comparison of RelMFc with RelMFnc in Figs. 2 and 3 demonstrates the underestimation of the reliabilities when using metafounders without the correction that is suggested in Eq. (5). Finally, the rightmost plots show how the calculation of the reliability as a contrast with a reference metafounder corrects the underestimation in RelMFnc.

For ssGBLUP, because the vast majority of individuals are not genotyped, results for non-genotyped animals were equal to those from Figs. 2 and 3. For genotyped selection candidates (males with genotype but no offspring yet), Fig. 4 shows the comparison between ssRelMFnc and ssRelMFc. Again, ssRelMFnc was systematically lower than ssRelMFc, with the mode for genotyped selection candidates approximately equal to 0.62 for ssRelMFnc and 0.70 for ssRelMFc. Given that ssRelMFnc systematically underestimated reliabilities, ssRelMFc provides accurate reliabilities. The comparison of ssRelMFc and RelMFc in Fig. 4 shows the gain in reliability by adding genomic information.

Finally, Fig. 5 shows the reliabilities of the contrasts of each metafounder with the reference metafounder. Most reliabilities oscillated around 0.90, and they declined in the most recent years because there is less information. Given the large number of records per metafounder (Table 1), we expected higher reliabilities. This may illustrate that correct estimation of different base populations (metafounders or UPG) is difficult, probably due to confounding of metafounders with environmental effects (flock-year) and poor genetic connections between metafounders (which, at best, arise through the many common descendants of metafounders).

Discussion

We have not been able to find a discussion in the scientific literature on the definition of reliabilities for models with fixed UPG. Presumably, this is mainly because UPG are used in large datasets and reliabilities for EBV are approximated while ignoring UPG [23, 24, 36]. Another reason might be that reliabilities are not uniquely defined because the inverse of the additive relationship matrix is non-full rank [8]. Da et al. [37] presented methods to calculate prediction error variances for models with fixed UPG. Although these prediction error variances can be used in REML procedures, they cannot be used to obtain reliabilities because these prediction error variances and their derived reliabilities depend on the choice of the generalized inverse that is used to solve the mixed model equations. Hickey et al. [38] proposed a sampling-based method to calculate the reliability of EBV by simulating the true breeding value using random UPG. However, in this method, UPG effects are either drawn as random and estimated as random (in which case the associated variance component is unclear and the reliabilities refer to an undefined population), or UPG effects are drawn as fixed and estimated as fixed (in which case they suffer from the same problem of lack of uniqueness). In our opinion, metafounders are a structure that is better defined from a quantitative genetics point of view, as relationships between metafounders are functions of heterozygosity at markers [18].

Another point is how to define a base point in genetic evaluations [14]. The metafounder approach uses a conceptual, non-existing population that has maximum heterozygosity, which leads to reliabilities that are too low, i.e., in some way, the maximum heterozygosity population is too distant from the actual animals for the difference to be correctly estimated. In animal breeding practice, EBV are often referred to the average of a group of recent animals, such as “purebred females born in a given year”. This suggests that reliabilities should be defined as contrasts from the same average, which is conceptually feasible but not practical as it would imply manipulating thousands of prediction error variances and covariances. Thus, the breeder must choose a metafounder to provide a base for the contrast. In purebreds, a meaningful choice is an “old” population (representing the oldest founders of the breed), but it could also be the metafounder that has the most unknown parents assigned to it. In crossbreds, a natural choice is one of the parental breeds (or the oldest population within the breed), or even different breeds, which would result in different reliabilities for contrasts from different breeds. Our proposal of using metafounders within breeds assumes that differences in variances per breed and segregation variances are correctly defined by the matrix \({\varvec{\Gamma}}\).

Using genomic information added another layer of complexity to the interpretation of reliabilities calculated from the inverse of the MME. Stranden and Christensen [34] showed that different allele coding results in different reliabilities, although the EBV remain the same. Tier et al. [14] proposed a method to obtain prediction error variances that are independent of allele coding for GBLUP. Stranden et al. [39] proposed a method to fit the so-called J-factors and genetic groups, similar to metafounders, but as fixed instead of random effects. One of the drawbacks of their method is that the reliability is undefined. Our proposal of using metafounders together with contrasts overcomes these issues since the arbitrariness of allele frequencies is overcome by fixing them to 0.5 and the base population to which the reliabilities refer is explicitly addressed. Thus, our method adequately reports reliabilities of EBV from ssGBLUP models with different base populations.

An issue with the use of metafounders is the variance components that should be used for the covariance structure because they have no clear genetic interpretation without the associated \({\varvec{\Gamma}}\). For a single metafounder, the variance components associated with the metafounder and the genetic (co)variance of pedigree founders are proportional by \(\left(1-\frac{\gamma }{2}\right)\) [13]. For several metafounders, the proportionality constant is \(\left(1+\frac{\overline{diag\left({\mathbf{\Gamma}}\right)}}{2}-\overline{{\varvec{\Gamma}} }\right)\) [13]. However, this assumes a mixture in equal proportions of all individuals’ origins, which is not necessarily valid. Thus, future research could focus on which covariance matrix should be used when many metafounders are fitted in a model. In our Lacaune example, the proportionality constant \(\left(1+\frac{\overline{diag\left({\mathbf{\Gamma}}\right)}}{2}-\overline{{\varvec{\Gamma}} }\right)\) resulted in a value of 0.7654, which is very close to \(\left(1-\frac{{\Gamma }_{\left({1,1}\right)}}{2}\right)=0.7681\) (where \({\Gamma }_{\left({1,1}\right)}\) = 0.4638) because the increase in coancestry over time is very small compared to the initial heterozygosity of the breed. This may be true when using metafounders to model missing pedigrees within a breed.
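The proportionality constant for several metafounders is simple to evaluate from \({\varvec{\Gamma}}\). The sketch below uses a hypothetical function name and a hypothetical \(3\times 3\) matrix; it only illustrates the expression \(1+\frac{\overline{diag\left({\mathbf{\Gamma}}\right)}}{2}-\overline{{\varvec{\Gamma}} }\) and its reduction to \(1-\frac{\gamma }{2}\) for a single metafounder.

```python
import numpy as np


def variance_proportionality(Gamma):
    """1 + mean(diag(Gamma)) / 2 - mean(Gamma), with means taken over all elements."""
    G = np.asarray(Gamma, dtype=float)
    if G.ndim != 2 or G.shape[0] != G.shape[1]:
        raise ValueError("Gamma must be a square matrix")
    return 1.0 + float(np.mean(np.diag(G))) / 2.0 - float(np.mean(G))


# A single metafounder reduces to 1 - gamma / 2.
print(variance_proportionality([[0.4638]]))

# Hypothetical Gamma for three metafounders with slowly increasing coancestry.
Gamma = np.array([
    [0.464, 0.464, 0.464],
    [0.464, 0.465, 0.465],
    [0.464, 0.465, 0.466],
])
print(variance_proportionality(Gamma))
```

When the elements of \({\varvec{\Gamma}}\) differ only slightly, as in this hypothetical matrix, the constant stays close to the single-metafounder value.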

In breed crosses, however, it is not obvious how to obtain variance components with metafounders from “routine” evaluations. The current modeling of breed crosses does not consider segregation variances, so a model with metafounders is simply a different model. Poulsen et al. [40] showed by simulation that there is very good agreement between simulated segregation variances and their expected values based on \({\varvec{\Gamma}}\) multiplied by a single variance component \({\sigma }_{u{\text{-}}related}^{2}\), but more work is needed in this regard. Alternatively, one could simply estimate \({\sigma }_{u{\text{-}}related}^{2}\) and, if needed, express genetic variances on the usual “unrelated” scale by combining it with \({\varvec{\Gamma}}\) [41].

As mentioned before, Eq. (5) allows the reliability of the estimates of genetic groups or metafounders with respect to a reference group to be computed. Reliabilities of the effects of genetic groups are usually not calculated in genetic evaluations. The proposed Eq. (5) could help identify groups with low reliability, which can be a sign of non-optimal assignment of genetic groups. For example, if two consecutive metafounders defined by year of birth have a low reliability, combining them would allow more accurate modeling of genetic groups.
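The reliability of a contrast can be sketched with a few lines of code. The function below is not a transcription of Eq. (5); it implements the general expression 1 − Var(prediction error of the contrast) / Var(contrast), built from the prediction error variances, the prediction error covariance, and the prior (co)variances of the two effects. The function name, argument names, and numerical inputs are hypothetical.

```python
def contrast_reliability(pev_i, pev_ref, pec_i_ref, var_i, var_ref, cov_i_ref):
    """Reliability of the contrast (u_i - u_ref).

    pev_i, pev_ref : prediction error variances of the two effects
    pec_i_ref      : prediction error covariance between them
    var_i, var_ref : prior variances (e.g. sigma2_u-related times the
                     corresponding diagonal of the relationship matrix)
    cov_i_ref      : prior covariance between the two effects
    """
    error_var = pev_i + pev_ref - 2.0 * pec_i_ref
    contrast_var = var_i + var_ref - 2.0 * cov_i_ref
    if contrast_var <= 0.0:
        raise ValueError("the contrast must have a positive prior variance")
    return 1.0 - error_var / contrast_var


# Made-up inputs for illustration: either an individual or a metafounder
# can play the role of "i"; "ref" is the reference metafounder.
rel = contrast_reliability(pev_i=0.9, pev_ref=0.5, pec_i_ref=0.45,
                           var_i=1.6, var_ref=0.6, cov_i_ref=0.6)
```

With these inputs, the error variance of the contrast is about 0.5 and its prior variance about 1.0, so `rel` is about 0.5. The same function applies when both effects are metafounders, which is the use case for screening poorly estimated groups.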

Conclusions

Until now, there has been no expression for reliabilities in models with different base populations. Reliabilities that are calculated directly from prediction error variances obtained from the inverse of the MME in models with metafounders underestimate the reliability of EBV because they refer to a conceptual base population that has maximum heterozygosity. We propose to calculate reliability as a contrast to a chosen metafounder. This leads to expressions for reliability that are mathematically identical to those for a single population under pedigree-BLUP, and our empirical results show that they are adequate when there are several base populations. Given that the computations are not difficult, we propose that reliabilities as a contrast to a reference metafounder be used in routine evaluations when different base populations are fitted in the model, both for BLUP and ssGBLUP.

Data availability

The data used in this work are owned by the organizations responsible for the Lacaune breeding program (Upra Lacaune, Ovitest, Confederation Générale de Roquefort), from whom they may be requested.

References

Sorensen DA, Kennedy BW. Estimation of response to selection using least-squares and mixed model methodology. J Anim Sci. 1984;58:1097–106.

Legarra A, Christensen OF, Vitezica ZG, Aguilar I, Misztal I. Ancestral relationships using metafounders: finite ancestral populations and across population relationships. Genetics. 2015;200:455–68.

Tier B, Meyer K, Swan A. On implied genetic effects, relationships and alternate coding. In: Proceedings of the 11th world congress on genetics applied to livestock production: 11–16 February 2018; Auckland. 2018.

Legarra A, Aguilar I, Misztal I. A relationship matrix including full pedigree and genomic information. J Dairy Sci. 2009;92:4656–63.

Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J Dairy Sci. 2010;93:743–52.

Garcia-Baccino CA, Legarra A, Christensen OF, Misztal I, Pocrnic I, Vitezica ZG, Cantet RJ. Metafounders are related to F_{st} fixation indices and reduce bias in single-step genomic evaluations. Genet Sel Evol. 2017;49:34.

Xiang T, Christensen OF, Legarra A. Technical note: genomic evaluation for crossbred performance in a single-step approach with metafounders. J Anim Sci. 2017;95:1472–80.

Macedo FL, Astruc JM, Meuwissen THE, Legarra A. Removing data and using metafounders alleviates biases for all traits in Lacaune dairy sheep predictions. J Dairy Sci. 2022;105:2439–52.

Angarita BK, Han J, Cantet RJC, Chewning SK, Wurtz KE, Siegford JM, Ernst CW, Steibel JP. Estimation of direct and social effects of feeding duration in growing pigs using records from automatic feeding stations. J Anim Sci. 2021;99:skab042.

Poulsen BG, Ask B, Nielsen HM, Ostersen T, Christensen OF. Prediction of genetic merit for growth rate in pigs using animal models with indirect genetic effects and genomic information. Genet Sel Evol. 2020;52:58.

Tier B, Meyer K. Approximating prediction error covariances among additive genetic effects within animals in multiple-trait and random regression models. J Anim Breed Genet. 2004;121:77–89.

Ben Zaabza H, Mäntysaari EA, Strandén I. Using Monte Carlo method to include polygenic effects in calculation of SNP-BLUP model reliability. J Dairy Sci. 2020;103:5170–82.

Bermann M, Lourenco D, Misztal I. Efficient approximation of reliabilities for single-step genomic best linear unbiased predictor models with the algorithm for Proven and Young. J Anim Sci. 2022;100:skab353.

Legarra A, Aguilar I, Colleau JJ. Short communication: methods to compute genomic inbreeding for ungenotyped individuals. J Dairy Sci. 2020;103:3363–7.

Fernando RL, Dekkers JC, Garrick DJ. A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses. Genet Sel Evol. 2014;46:50.

Colleau JJ, Palhière I, Rodríguez-Ramilo ST, Legarra A. A fast indirect method to compute functions of genomic relationships concerning genotyped and ungenotyped individuals, for diversity management. Genet Sel Evol. 2017;49:87.

Lourenco D, Tsuruta S, Masuda Y, Bermann M, Legarra A, Misztal I. Recent updates in the BLUPF90 software suite. In: Proceedings of the 12th world congress on genetics applied to livestock production: 3–8 July 2022; Rotterdam. 2022.

Strandén I, Christensen OF. Allele coding in genomic evaluation. Genet Sel Evol. 2011;43:25.

Gengler N, Mayeres P, Szydlowski M. A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dual-purpose Belgian Blue cattle. Animal. 2007;1:21–8.

Da Y, Grossman M, Misztal I. Prediction error variance and restricted maximum likelihood estimation for animal model with relationship grouping. J Dairy Sci. 1989;72:2125–35.

Hickey JM, Keane MG, Kenny DA, Cromie AR, Mulder HA, Veerkamp RF. Estimation of accuracy and bias in genetic evaluations with genetic groups using sampling. J Anim Sci. 2008;86:1047–56.

Strandén I, Aamand GP, Mäntysaari EA. Single-step genomic BLUP with genetic groups and automatic adjustment for allele coding. Genet Sel Evol. 2022;54:38.

Poulsen BG, Ostersen T, Nielsen B, Christensen OF. Predictive performances of animal models using different multibreed relationship matrices in systems with rotational crossbreeding. Genet Sel Evol. 2022;54:25.

Aldridge MN, Vandenplas J, Bergsma R, Calus MPL. Variance estimates are similar using pedigree or genomic relationships with or without the use of metafounders or the algorithm for proven and young animals. J Anim Sci. 2020;98:skaa019.

This study was partially funded by Agriculture and Food Research Initiative Competitive Grant No. 2020-67015-31030 from the US Department of Agriculture’s National Institute of Food and Agriculture. This project has received funding from the European Union’s Horizon 2020 Research & Innovation program under Grant Agreement No. 772787-SMARTER.

Author information

Author notes

Andres Legarra

Present address: Council on Dairy Cattle Breeding, Bowie, MD, 20716, USA

Authors and Affiliations

Department of Animal and Dairy Science, University of Georgia, Athens, GA, USA

Matias Bermann, Daniela Lourenco & Ignacy Misztal

Instituto Nacional de Investigación Agropecuaria (INIA), Montevideo, Uruguay

Ignacio Aguilar

GenPhySE, INRAE, ENVT, Université de Toulouse, 31326, Castanet-Tolosan, France

AL conceived the idea of the study and conducted the analysis. MB formalized the theory. MB, AL, and IA took part in the discussion and co-wrote the manuscript. DL and IM edited the manuscript. All authors read and approved the final manuscript.

The authors declare that they do not have any conflict of interest.

Appendices

Appendix

Prediction error variances and reliabilities with a single metafounder

This Appendix shows that, under some assumptions, when there is a single metafounder that represents pedigree founders, the direct reliability of the breeding value has a lower numerical value than for the regular animal model with no metafounder. We also prove that the reliability of the contrast with a single metafounder is mathematically identical to the regular animal model with no metafounder.

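Consider the mixed model \(\mathbf{y}=\mathbf{Xb}+\mathbf{Zu}+\mathbf{e}\), with \(\text{Var}\left(\mathbf{u}\right)=\mathbf{K}{\upsigma }_{u}^{2}\) and \(\text{Var}\left(\mathbf{e}\right)=\mathbf{R}\), whose mixed model equations have coefficient matrix \(\mathbf{C}=\left(\begin{array}{cc}{\mathbf{X}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\mathbf{X}& {\mathbf{X}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\mathbf{Z}\\ {\mathbf{Z}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\mathbf{X}& {\mathbf{Z}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\mathbf{Z}+{\upsigma }_{u}^{-2}{\mathbf{K}}^{-1}\end{array}\right)\) and inverse \({\mathbf{C}}^{-1}=\left(\begin{array}{cc}{\mathbf{C}}^{11}& {\mathbf{C}}^{12}\\ {\mathbf{C}}^{21}& {\mathbf{C}}^{22}\end{array}\right)\). If \(\mathbf{Z1}\in \mathcal{C}\left(\mathbf{X}\right)\) (the column space of \(\mathbf{X}\)), then \({\mathbf{C}}^{22}{\mathbf{K}}^{-1}{\mathbf{1}}={\mathbf{1}}{\upsigma }_{u}^{2}\). That is, \({\upsigma }_{u}^{2}\) is an eigenvalue of \({\mathbf{C}}^{22}{\mathbf{K}}^{-1}\), and its associated eigenvector is \(\mathbf{1}\).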

Proof

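By partitioned matrix inverse rules [42], \({\mathbf{C}}^{22}={\left({\upsigma }_{u}^{-2}{\mathbf{K}}^{-1}+{\mathbf{Z}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\left(\mathbf{I}-{\mathbf{P}}_{\mathbf{X}}\right)\mathbf{Z}\right)}^{-1}\), where \({\mathbf{P}}_{\mathbf{X}}={\mathbf{X}\left({\mathbf{X}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\mathbf{X}\right)}^{-1}{\mathbf{X}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\) is an oblique projection operator onto the column space of \(\mathbf{X}\). Then,

\[\mathbf{K}{{\mathbf{C}}^{22}}^{-1}\mathbf{1}=\mathbf{K}\left({\upsigma }_{u}^{-2}{\mathbf{K}}^{-1}+{\mathbf{Z}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\left(\mathbf{I}-{\mathbf{P}}_{\mathbf{X}}\right)\mathbf{Z}\right)\mathbf{1}={\upsigma }_{u}^{-2}\mathbf{1}+\mathbf{K}{\mathbf{Z}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\left(\mathbf{I}-{\mathbf{P}}_{\mathbf{X}}\right)\mathbf{Z1}={\upsigma }_{u}^{-2}\mathbf{1},\]

because \(\mathbf{Z1}\in \mathcal{C}\left(\mathbf{X}\right)\) implies \(\left(\mathbf{I}-{\mathbf{P}}_{\mathbf{X}}\right)\mathbf{Z1}=\mathbf{0}\).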

Since \({\upsigma }_{u}^{-2}\) is an eigenvalue of \(\mathbf{K}{{\mathbf{C}}^{22}}^{-1}\) and \(\mathbf{K}{{\mathbf{C}}^{22}}^{-1}\) is non-singular, then \({\upsigma }_{u}^{2}\) is an eigenvalue of \({\mathbf{C}}^{22}{\mathbf{K}}^{-1}\), and its associated eigenvector is \(\mathbf{1}\).
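The property can also be checked numerically. The sketch below uses exact rational arithmetic on a toy example that is entirely hypothetical: three animals (two unrelated parents and their offspring), one record per animal, an overall mean as the only fixed effect so that \(\mathbf{Z1}=\mathbf{X}\), and arbitrary variances. The helper functions are illustrative and not part of any evaluation software.

```python
from fractions import Fraction as F


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Gauss-Jordan inverse of a small non-singular matrix of Fractions."""
    n = len(m)
    aug = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


# Toy pedigree: animals 1 and 2 are unrelated founders, animal 3 is their offspring.
A = [[F(1), F(0), F(1, 2)],
     [F(0), F(1), F(1, 2)],
     [F(1, 2), F(1, 2), F(1)]]
n = len(A)
sigma_u2, sigma_e2 = F(2), F(3)          # arbitrary variances, R = I * sigma_e2

# W = [X Z] with X = 1 (overall mean) and Z = I, so that Z1 lies in C(X).
W = [[F(1)] + [F(int(i == j)) for j in range(n)] for i in range(n)]
A_inv = inverse(A)

# Coefficient matrix: W'W / sigma_e2 plus A^{-1} / sigma_u2 in the animal block.
C = [[v / sigma_e2 for v in row] for row in matmul(transpose(W), W)]
for i in range(n):
    for j in range(n):
        C[1 + i][1 + j] += A_inv[i][j] / sigma_u2

C22 = [row[1:] for row in inverse(C)[1:]]
ones = [[F(1)] for _ in range(n)]
lhs = matmul(matmul(C22, A_inv), ones)   # C22 * K^{-1} * 1 with K = A
```

Every element of `lhs` equals `sigma_u2`, as stated by the property. Replacing the overall mean by a fixed effect whose columns do not span \(\mathbf{Z1}\) breaks the equality, which shows that the condition on \(\mathcal{C}\left(\mathbf{X}\right)\) is needed.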

Reliability for models with one metafounder

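When one metafounder is included in an animal model, the assumed linear mixed model is:

\[\mathbf{y}=\mathbf{Xb}+\left(\begin{array}{cc}\mathbf{0}& \mathbf{Z}\end{array}\right)\left(\begin{array}{c}{u}_{mf}\\ \mathbf{u}\end{array}\right)+\mathbf{e},\quad \text{Var}\left(\begin{array}{c}{u}_{mf}\\ \mathbf{u}\end{array}\right)={\mathbf{A}}_{\upgamma }{\upsigma }_{\text{u-related}}^{2}=\left(\begin{array}{cc}\upgamma & \upgamma {\mathbf{1}}^{\mathbf{\prime}}\\ \upgamma \mathbf{1}& \mathbf{A}\left(1-\frac{\upgamma }{2}\right)+\upgamma {\mathbf{11}}^{\mathbf{\prime}}\end{array}\right){\upsigma }_{\text{u}}^{2}{\left(1-\frac{\upgamma }{2}\right)}^{-1},\tag{11}\]

where \({u}_{mf}\) is the metafounder effect, \(\mathbf{u}\) is the vector of breeding values, and \(\upgamma\) is a scalar. For the genetic interpretation of \(\upgamma\), we refer the reader to Christensen [17] and Legarra et al. [13]. Using partitioned matrix inverse rules, it can be shown that \({\mathbf{A}}_{\upgamma }^{-1}={\left(1-\frac{\upgamma }{2}\right)}^{-1}\left(\begin{array}{cc}{\upgamma }^{-1}\left(1-\frac{\upgamma }{2}\right)+{\mathbf{1}}^{\mathbf{\prime}}{\mathbf{A}}^{-1}\mathbf{1} & -{\mathbf{1}}^{\mathbf{\prime}}{\mathbf{A}}^{-1}\\ -{\mathbf{A}}^{-1}\mathbf{1}& {\mathbf{A}}^{-1}\end{array}\right).\)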

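The coefficient matrix of the mixed model equations is:

\[\mathbf{M}=\left(\begin{array}{ccc}{\mathbf{X}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\mathbf{X}& \mathbf{0}& {\mathbf{X}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\mathbf{Z}\\ {\mathbf{0}}^{\mathbf{\prime}}& {\upsigma }_{\text{u}}^{-2}\left({\upgamma }^{-1}\left(1-\frac{\upgamma }{2}\right)+{\mathbf{1}}^{\mathbf{\prime}}{\mathbf{A}}^{-1}\mathbf{1}\right)& -{\upsigma }_{\text{u}}^{-2}{\mathbf{1}}^{\mathbf{\prime}}{\mathbf{A}}^{-1}\\ {\mathbf{Z}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\mathbf{X}& -{\upsigma }_{\text{u}}^{-2}{\mathbf{A}}^{-1}\mathbf{1}& {\mathbf{Z}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\mathbf{Z}+{\upsigma }_{\text{u}}^{-2}{\mathbf{A}}^{-1}\end{array}\right),\]

which thanks to the factorization of \({\left(1-\frac{\upgamma }{2}\right)}^{-1}\) is in terms of \({\upsigma }_{\text{u}}^{-2}\), not of \({\upsigma }_{\text{u-related}}^{-2}\), with inverse \({\mathbf{M}}^{-1}=\left(\begin{array}{ccc}{\mathbf{M}}^{11}& {\mathbf{M}}^{12}& {\mathbf{M}}^{13}\\ {\mathbf{M}}^{21}& {\mathbf{M}}^{22}& {\mathbf{M}}^{23}\\ {\mathbf{M}}^{31}& {\mathbf{M}}^{32}& {\mathbf{M}}^{33}\end{array}\right)\). Letting \({\mathbf{C}}^{22}={\left({\upsigma }_{u}^{-2}{\mathbf{A}}^{-1}+{\mathbf{Z}}^{\mathbf{\prime}}{\mathbf{R}}^{-1}\left(\mathbf{I}-{\mathbf{P}}_{\mathbf{X}}\right)\mathbf{Z}\right)}^{-1}\), i.e., the prediction error variance of the breeding values under a regular animal model, partitioned matrix inverse rules give:

\[{\mathbf{M}}^{22}={\left({\upsigma }_{\text{u}}^{-2}\left({\upgamma }^{-1}\left(1-\frac{\upgamma }{2}\right)+{\mathbf{1}}^{\mathbf{\prime}}{\mathbf{A}}^{-1}\mathbf{1}\right)-{\upsigma }_{\text{u}}^{-4}{\mathbf{1}}^{\mathbf{\prime}}{\mathbf{A}}^{-1}{\mathbf{C}}^{22}{\mathbf{A}}^{-1}\mathbf{1}\right)}^{-1}.\]

By the mixed model property derived before, \({\mathbf{C}}^{22}{\mathbf{A}}^{-1}{\mathbf{1}}{\upsigma }_{\text{u}}^{-4}={\mathbf{1}}{\upsigma }_{\text{u}}^{-2}\). Thus:

\[{\mathbf{M}}^{22}={\left({\upsigma }_{\text{u}}^{-2}{\upgamma }^{-1}\left(1-\frac{\upgamma }{2}\right)\right)}^{-1}={\upsigma }_{\text{u}}^{2}\frac{2\upgamma }{2-\upgamma }.\tag{14}\]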

Hence, the prediction error variance of the metafounder effect is \({\upsigma }_{\text{u}}^{2}\frac{2\upgamma }{2-\upgamma }\). Moreover, its prediction error variance is equal to its variance as seen from Eq. (11), since \({{\upsigma }_{\text{u}}^{2}\upgamma \left(1-\frac{\upgamma }{2}\right)}^{-1}={\upsigma }_{\text{u}}^{2}\frac{2\upgamma }{2-\upgamma }\).

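The prediction error variance of the breeding values is:

\[{\mathbf{M}}^{33}={\left({{\mathbf{C}}^{22}}^{-1}-\frac{{\mathbf{A}}^{-1}{\mathbf{11}}^{\mathbf{\prime}}{\mathbf{A}}^{-1}}{k}\right)}^{-1},\tag{15}\]

where \(k={\upsigma }_{\text{u}}^{2}\left({\upgamma }^{-1}\left(1-\frac{\upgamma }{2}\right)+{\mathbf{1}}^{\mathbf{\prime}}{\mathbf{A}}^{-1}{\mathbf{1}}\right)\). Applying a variant of the Sherman–Morrison formula ([42]; formula 14) to Eq. (15):

\[{\mathbf{M}}^{33}={\mathbf{C}}^{22}+\frac{{\mathbf{C}}^{22}{\mathbf{A}}^{-1}{\mathbf{11}}^{\mathbf{\prime}}{\mathbf{A}}^{-1}{\mathbf{C}}^{22}}{k-{\mathbf{1}}^{\mathbf{\prime}}{\mathbf{A}}^{-1}{\mathbf{C}}^{22}{\mathbf{A}}^{-1}\mathbf{1}}={\mathbf{C}}^{22}+{\mathbf{11}}^{\mathbf{\prime}}{\upsigma }_{\text{u}}^{2}\frac{2\upgamma }{2-\upgamma }.\tag{17}\]

Thus, the prediction error variance of the breeding values in a model with one metafounder is equal to the prediction error variance of the breeding values in a model without metafounders plus the prediction error variance of the metafounder effect.

The prediction error covariance between the metafounder effect and the breeding values is:

\[{\mathbf{M}}^{23}={\mathbf{M}}^{22}{\upsigma }_{\text{u}}^{-2}{\mathbf{1}}^{\mathbf{\prime}}{\mathbf{A}}^{-1}{\mathbf{C}}^{22}={\mathbf{1}}^{\mathbf{\prime}}{\upsigma }_{\text{u}}^{2}\frac{2\upgamma }{2-\upgamma }.\tag{18}\]

Hence, the prediction error covariance between an animal effect and the metafounder effect is equal to the prediction error variance of the metafounder.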

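Using Eqs. (14) and (17) and \(\text{Var}\left({\widehat{u}}_{i}-{u}_{i}\right)={\upsigma }_{\text{u}}^{2}{\mathbf{A}}_{ii}\left(1-{rel}_{i}\right)\) in the regular animal model, the reliability of an animal under a model with one metafounder is:

\[1-\frac{{\mathbf{M}}_{ii}^{33}}{{\upsigma }_{\text{u-related}}^{2}{\left({\mathbf{A}}_{\upgamma }\right)}_{ii}}=1-\frac{{\upsigma }_{\text{u}}^{2}{\mathbf{A}}_{ii}\left(1-{rel}_{i}\right)+{\upsigma }_{\text{u}}^{2}\frac{2\upgamma }{2-\upgamma }}{{\upsigma }_{\text{u}}^{2}\left({\mathbf{A}}_{ii}+\frac{2\upgamma }{2-\upgamma }\right)}={rel}_{i}\frac{{\mathbf{A}}_{ii}}{{\mathbf{A}}_{ii}+\frac{2\upgamma }{2-\upgamma }}\le {rel}_{i}.\]

For the contrast between the breeding value of animal \(i\) and the metafounder effect \({u}_{mf}\), Eqs. (14), (17), and (18) give a prediction error variance of \({\mathbf{M}}_{ii}^{33}+{\mathbf{M}}^{22}-2{\mathbf{M}}_{i}^{23}={\mathbf{C}}_{ii}^{22}\), and the variance of the contrast is \({\upsigma }_{\text{u-related}}^{2}\left({\left({\mathbf{A}}_{\upgamma }\right)}_{ii}+\upgamma -2\upgamma \right)={\upsigma }_{\text{u}}^{2}{\mathbf{A}}_{ii}\), so that:

\[re{l}_{{c}_{i}}=1-\frac{{\mathbf{C}}_{ii}^{22}}{{\upsigma }_{\text{u}}^{2}{\mathbf{A}}_{ii}}={rel}_{i}.\tag{20}\]

Thus, the reliability of the contrast (\(re{l}_{{c}_{i}}\)) in a model with one metafounder and the reliability of the regular animal model (\({rel}_{i}\)) are identical.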
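The single-metafounder results of this Appendix can be verified on a small example. The sketch below is a hypothetical toy case (three animals, one record each, an overall mean as the only fixed effect, arbitrary variances and \(\upgamma\)). It assumes that the metafounder has relationship \(\upgamma\) with itself and with every animal and that animals have relationships \(\mathbf{A}\left(1-\frac{\upgamma }{2}\right)+\upgamma {\mathbf{11}}^{\mathbf{\prime}}\), with \({\upsigma }_{\text{u-related}}^{2}={\upsigma }_{\text{u}}^{2}/\left(1-\frac{\upgamma }{2}\right)\). All function and variable names are illustrative.

```python
from fractions import Fraction as F


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Gauss-Jordan inverse of a small non-singular matrix of Fractions."""
    n = len(m)
    aug = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def mme_inverse(W, G_inv, n_fixed, sigma_e2):
    """Inverse of W'W / sigma_e2 with G_inv added to the random-effect block."""
    M = [[v / sigma_e2 for v in row] for row in matmul(transpose(W), W)]
    for i, row in enumerate(G_inv):
        for j, v in enumerate(row):
            M[n_fixed + i][n_fixed + j] += v
    return inverse(M)


A = [[F(1), F(0), F(1, 2)],
     [F(0), F(1), F(1, 2)],
     [F(1, 2), F(1, 2), F(1)]]
n = len(A)
sigma_u2, sigma_e2, gamma = F(2), F(3), F(1, 2)   # arbitrary values
c = 1 - gamma / 2
sigma_rel2 = sigma_u2 / c                          # sigma2_u-related

# Regular animal model: effects ordered (mean, animals).
W_reg = [[F(1)] + [F(int(i == j)) for j in range(n)] for i in range(n)]
G_inv_reg = [[v / sigma_u2 for v in row] for row in inverse(A)]
C22 = [row[1:] for row in mme_inverse(W_reg, G_inv_reg, 1, sigma_e2)[1:]]

# Model with one metafounder: effects ordered (mean, metafounder, animals).
A_gamma = [[gamma] * (n + 1)] + [
    [gamma] + [A[i][j] * c + gamma for j in range(n)] for i in range(n)]
W_mf = [[F(1), F(0)] + [F(int(i == j)) for j in range(n)] for i in range(n)]
G_inv_mf = [[v / sigma_rel2 for v in row] for row in inverse(A_gamma)]
M_inv = mme_inverse(W_mf, G_inv_mf, 1, sigma_e2)
pev_mf = M_inv[1][1]                               # M^22
pec = M_inv[1][2:]                                 # M^23
M33 = [row[2:] for row in M_inv[2:]]

g = sigma_u2 * 2 * gamma / (2 - gamma)             # sigma2_u * 2*gamma / (2 - gamma)
rel_regular = [1 - C22[i][i] / (sigma_u2 * A[i][i]) for i in range(n)]
rel_direct = [1 - M33[i][i] / (sigma_rel2 * A_gamma[i + 1][i + 1]) for i in range(n)]
rel_contrast = [
    1 - (M33[i][i] + pev_mf - 2 * pec[i])
    / (sigma_rel2 * (A_gamma[i + 1][i + 1] + gamma - 2 * gamma))
    for i in range(n)]
```

In this example, `pev_mf` and every element of `pec` equal `g`, `M33` equals `C22` plus `g` in every position, `rel_contrast` coincides exactly with `rel_regular`, and each element of `rel_direct` is smaller than the corresponding element of `rel_regular`.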

For multiple-trait models, the necessary mixed-model property is \(\left(\left(\mathbf{K}\otimes {\mathbf{G}}_{0}\right){{\mathbf{C}}^{22}}^{-1}\right)\left({\mathbf{1}}\otimes \mathbf{I}\right)=({\mathbf{1}}\otimes \mathbf{I})\), where \(\mathbf{K}\) is the covariance matrix for random effects and \({\mathbf{G}}_{0}\) is the covariance matrix between traits. Then, Eq. (20) holds because it can be proved that the results for Eqs. (14), (17), and (18) are \({\mathbf{G}}_{0}\frac{2\upgamma }{2-\upgamma }\), \({\mathbf{C}}^{22}+{\mathbf{11}}^{\mathbf{\prime}}\otimes {\mathbf{G}}_{0}\frac{2\upgamma }{2-\upgamma }\), and \({\mathbf{1}}\otimes {\mathbf{G}}_{0}\frac{2\upgamma }{2-\upgamma }\), respectively.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Bermann, M., Aguilar, I., Lourenco, D. et al. Reliabilities of estimated breeding values in models with metafounders.
Genet Sel Evol 55, 6 (2023). https://doi.org/10.1186/s12711-023-00778-2