# Prediction of the reliability of genomic breeding values for crossbred performance

Jérémie Vandenplas^{1}, Jack J. Windig^{1} and Mario P. L. Calus^{1}

**Received:** 5 July 2016

**Accepted:** 27 April 2017

**Published:** 12 May 2017

## Abstract

### Background

In crossbreeding programs, various genomic prediction models have been proposed for using phenotypic records of crossbred animals to increase the selection response for crossbred performance in purebred animals. One possible model assumes identical single nucleotide polymorphism (SNP) effects for the crossbred performance trait across breeds (ASGM). Another is a genomic model that assumes breed-specific effects of SNP alleles (BSAM) for crossbred performance. The aim of this study was to derive and validate equations for predicting the reliability of estimated genomic breeding values for crossbred performance in both these models. Prediction equations were derived for situations when all (phenotyping and) genotyping data have already been collected, i.e. based on the genetic evaluation model, and for situations when not all genotyping data are available yet, i.e. when designing breeding programs.

### Results

When all genotyping data are available, prediction equations are based on selection index theory. Without availability of all genotyping data, prediction equations are based on population parameters (e.g., heritability of the traits involved, genetic correlation between purebred and crossbred performance, effective number of chromosome segments). Validation of the equations for predicting the reliability of genomic breeding values without all genotyping data was performed based on simulated data of a two-way crossbreeding program, using either two closely-related breeds, or two unrelated breeds, to produce crossbred animals. The proposed equations can be used for an easy comparison of the reliability of genomic estimated breeding values across many scenarios, especially if all genotyping data are available. We show that BSAM outperforms ASGM for a specific breed, if the effective number of chromosome segments that originate from this breed and are shared by selection candidates of this breed and crossbred reference animals is less than half the effective number of all chromosome segments that are independently segregating in the same animals.

### Conclusions

The derived equations can be used to predict the reliability of genomic estimated breeding values for crossbred performance using ASGM or BSAM in many scenarios, and are thus useful to optimize the design of breeding programs. Scenarios can vary in terms of the genetic correlation between purebred and crossbred performances, heritabilities, number of reference animals, or distance between breeds.

## Background

Several livestock production systems are based on crossbreeding schemes (e.g., [1–3]), and take advantage of the increased performance of crossbred animals compared to purebred animals, along with breed complementarity. For such production systems based on crossbreeding, the breeding goal for the purebred populations is to optimize the performance of crossbred descendants. However, the selection of purebred animals for crossbred performance has not been extensively implemented in livestock, partly due to the difficulty of routine collection of pedigree information on crossbred animals [4].

With the advent of genomic selection, various genomic prediction models have been proposed, which use phenotypic records of crossbred animals to increase the selection response for crossbred performance in purebred animals (e.g., [2, 4–6]). These approaches predict breeding values for crossbred performance of selection candidates using the estimated allele substitution effects of many single nucleotide polymorphisms (SNPs). The SNP allele substitution effects are estimated from phenotypes of genotyped reference animals. In the context of crossbreeding, several breeds and their crosses are involved in genomic prediction, and purebred and crossbred performances are often considered to be different but correlated traits (e.g., [1, 3, 5, 7, 8]). Therefore, estimates of SNP allele substitution effects for purebred and crossbred performance traits may not be the same for purebred and crossbred populations, e.g., due to genotype by environment interactions. Assuming only additive gene action, one approach to accommodate this is to model differences between allele substitution SNP effects using a multivariate genomic model that assumes a correlation structure between the effects of SNPs across the purebred and crossbred populations, or equivalently, by assuming a genetic correlation structure across the trait measured in purebred and crossbred populations [9, 10]. These multivariate genomic models are referred to hereafter as across-breed SNP genotype models (ASGM), since the estimates of SNP allele substitution effects for the crossbred performance trait are also used to predict breeding values for crossbred performance of purebred selection candidates, regardless of their breed of origin [4, 6]. Thus, estimates of SNP effects for the crossbred performance trait using ASGM are not breed-specific. However, a number of factors may have an impact on the effect that can be measured for a SNP for the crossbred performance trait. 
First, the two parental alleles at a SNP in a crossbred animal may have different effects on the phenotype due to different levels of linkage disequilibrium (LD) with a quantitative trait locus (QTL) in the parental purebred populations. Second, different genetic backgrounds, such as dominance or epistatic interactions, can cause the effects of the same QTL to differ between purebred and crossbred animals. Third, purebred and crossbred animals may be exposed to different environments, leading to genotype by environment interactions. For these reasons, estimated allele substitution effects at SNPs for the crossbred performance trait may be breed-specific. To accommodate all these differences, an approach was previously proposed [3–6] that estimates breed-specific allele substitution effects for the crossbred performance trait (BSAM), assuming that the breed origin of SNP alleles in crossbred animals is known. Results from simulations have shown that BSAM can result in greater accuracy of genomic estimated breeding values (EBV) of purebreds for crossbred performance than ASGM under some conditions [2, 4, 11].

In order to be able to evaluate many different breeding program designs that apply genomic prediction for crossbreeding performance, it would be useful to be able to predict the reliability of genomic EBV using, for example, different genomic models or different breeding schemes. Prediction of reliability should preferably consider the genotype data of all reference animals and selection candidates when available, although it is also desirable to be able to predict the reliability when genotype data of, e.g., selection candidates is not available, i.e. when designing breeding programs. Various equations have been proposed in the literature to predict the reliability or the accuracy (i.e., the square root of reliability) of genomic EBV for (groups of) animals. The investigated genomic predictions rely on single-population genomic models [12, 13], and on ASGM [10, 14]. When genotypes are available for both reference animals and selection candidates, prediction equations are derived using selection index (SI) theory, while before availability of all genotyping data, they are derived using population parameters (e.g., heritability, number of reference animals) [10, 13, 15]. However, to our knowledge, equations for predicting the reliability of genomic EBV for crossbreeding performance for (groups of) animals have not yet been reported.

The primary aim of this study was to derive equations for predicting the reliability of genomic EBV for crossbred performance based on ASGM or BSAM. Prediction equations were derived for situations when all genotyping data are available for both reference animals and selection candidates (referred to as “with availability of genotyping data”), and for situations when the genotyping data are not available (referred to as “without availability of genotyping data”). The second aim was to compare the predictions of the reliability of genomic EBV without availability of genotyping data to the predictions obtained from the equations with availability of genotyping data, because the former are an approximation of the latter. Both reliabilities have the same expectation, since they both rely on prediction error variances (PEV) and assume absence of selection. Finally, the equation for predicting reliability without availability of genotyping data was used to investigate the expected ranges of reliabilities of genomic EBV using BSAM for a pig breeding program.

## Methods

The first part of this section describes equations for predicting the reliability of genomic EBV for crossbred performance using ASGM or BSAM. For the derivations of these equations, we assumed a crossbreeding program with two breeds, A and B, with their F1 being crossbred AB animals. In order to simplify the derivation of the equations, we assumed that phenotypes are corrected for all fixed and random effects, other than additive genetic effects. Furthermore, reference animals are defined as animals with genotypes and phenotypes, and selection candidates are defined as animals with genotypes but without their own phenotype. The assumption that all reference animals have genotypes is likely to be correct in the near future, as genotyping costs continue to decrease. The aim is to predict the reliability of genomic EBV for crossbred performance for selection candidates of breed A. For the reference population, three scenarios were investigated: (1) the reference population includes only breed A animals (PB–PB), i.e. purebred (PB) phenotypes are used to predict EBV for crossbred (CB) performance of PB selection candidates; (2) the reference population includes only crossbred AB animals (CB–PB), i.e. CB phenotypes are used to predict EBV for CB performance of PB selection candidates; and (3) the reference population includes both crossbred AB and breed A animals (CB + PB–PB), i.e. CB and PB phenotypes are used to predict EBV for CB performance of PB selection candidates. These scenarios represent situations where crossbred animals are terminal animals in commercial herds of pigs and chickens. The second part of this section describes simulations of the three scenarios used to validate the prediction equations without availability of genotyping data. In the equations below, reference animals are indicated by uppercase letters, while selection candidates are indicated by lowercase letters.

### Across-breed SNP genotype models

Equations for predicting the reliability of genomic EBV for crossbred performance using ASGM were developed for the three scenarios. As ASGM is assumed, breed A and crossbred AB animals can be considered as belonging to different populations, assuming the genetic correlation between the PB and CB performance traits (\(r_{PC}\)) to be the genetic correlation between these breed A and crossbred AB populations. Therefore, equations for predicting the reliability of genomic EBV for crossbred performance for the three scenarios using ASGM can be derived from previous studies by, for example, Daetwyler et al. [12] and Wientjes et al. [10], without availability of genotyping data, and by VanRaden [15] with availability of genotyping data.
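As a rough illustration of the kind of population-parameter equation referred to here, the sketch below assumes the widely used deterministic form \(r^{2} = r_{G}^{2} \, N h^{2} /\left( {N h^{2} + Me} \right)\), i.e. the single-population expression associated with Daetwyler et al. [12] scaled by a squared genetic correlation as in Wientjes et al. [10]. The function name, argument names, and the example inputs are hypothetical, and the exact parameterization used for each scenario in this study may differ.

```python
def reliability_without_genotypes(n_ref, h2, me, r_g=1.0):
    """Deterministic reliability from population parameters (assumed form).

    n_ref : number of reference animals with one phenotype each
    h2    : heritability of the trait recorded on the reference animals
    me    : effective number of chromosome segments
    r_g   : genetic correlation between the recorded trait and the target trait
            (e.g. r_PC when purebred records predict crossbred performance)
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    if n_ref < 0 or me <= 0:
        raise ValueError("n_ref must be >= 0 and me must be > 0")
    information = n_ref * h2
    return r_g ** 2 * information / (information + me)


# Hypothetical inputs, chosen only so the arithmetic is easy to follow:
# 1000 * 0.5 = 500 units of information against Me = 500 gives 0.5,
# which is then scaled by 0.7 ** 2 = 0.49.
example = reliability_without_genotypes(n_ref=1000, h2=0.5, me=500, r_g=0.7)
```

Under this assumed form, the reliability increases with the number of reference animals and the heritability, decreases with \(Me\), and can never exceed \(r_{G}^{2}\).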

#### PB–PB scenario

The PB–PB scenario considers breed A animals for both reference animals and selection candidates. Phenotypes are therefore associated with the purebred performance trait, while the trait of interest is the crossbred performance trait. Indeed, selection candidates must be selected to optimize crossbred performance of their crossbred descendants.

*i*th selection candidate of breed A of the \(N_{a} \times N_{A}\) genomic relationship matrix \({\mathbf{G}}_{a,A}\) between selection candidates of breed A and reference animals of breed A; \({\mathbf{G}}_{{a_{i} ,a_{i} }}\) is the diagonal element corresponding to the *i*th selection candidate of breed A of the \(N_{a} \times N_{a}\) genomic relationship matrix \({\mathbf{G}}_{a,a}\) between selection candidates of breed A; and matrix \({\mathbf{I}}\) is the identity matrix.

Matrices \({\mathbf{G}}_{A,A}\), \({\mathbf{G}}_{a,a}\), and \({\mathbf{G}}_{a,A}\) are parts of the genomic relationship matrix among all reference animals and selection candidates of breed A, i.e. \({\mathbf{G}} = \left[ {\begin{array}{*{20}c} {{\mathbf{G}}_{A,A} } & {\quad {\mathbf{G}}_{A,a} } \\ {{\mathbf{G}}_{a,A} } & {\quad {\mathbf{G}}_{a,a} } \\ \end{array} } \right]\). Without loss of generality, and similar to Wientjes et al. [10], matrix \({\mathbf{G}}\) is computed following the second method of VanRaden [15], i.e., \({\mathbf{G}} = \frac{{{\mathbf{ZZ}}^{\prime } }}{m}\) where *m* is the number of SNP genotypes, and matrix \({\mathbf{Z}}\) contains the standardized genotypes as \({\mathbf{Z}}_{lk} = \frac{{{\mathbf{M}}_{lk} - 2p_{k} }}{{\sqrt {2p_{k} \left( {1 - p_{k} } \right)} }}\), with \({\mathbf{M}}_{lk}\) being the SNP genotype (coded as 0 for one homozygous genotype, 1 for the heterozygous genotype, or 2 for the alternate homozygous genotype) of the *l*th animal of breed A for the *k*th locus, and \(p_{k}\) is the allele frequency at the *k*th locus.
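The construction of \({\mathbf{G}}\) described above can be sketched in a few lines of NumPy. This is a minimal illustration on simulated toy genotypes, not the software used in the study; the function name `vanraden_g`, the toy sizes, and the random allele frequencies are assumptions made for the example. Reference animals are placed first so that the blocks match the partition of \({\mathbf{G}}\) given above.

```python
import numpy as np


def vanraden_g(M, p):
    """G = ZZ'/m with Z_lk = (M_lk - 2 p_k) / sqrt(2 p_k (1 - p_k))."""
    M = np.asarray(M, dtype=float)
    p = np.asarray(p, dtype=float)
    Z = (M - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    m = M.shape[1]  # number of SNPs
    return Z @ Z.T / m


# Toy data (hypothetical sizes): reference animals first, then selection candidates.
rng = np.random.default_rng(1)
n_ref, n_cand, m = 6, 3, 200
p = rng.uniform(0.05, 0.95, size=m)               # allele frequencies of breed A
M = rng.binomial(2, p, size=(n_ref + n_cand, m))  # genotypes coded 0/1/2

G = vanraden_g(M, p)
G_AA = G[:n_ref, :n_ref]   # among reference animals
G_aA = G[n_ref:, :n_ref]   # selection candidates x reference animals
G_aa = G[n_ref:, n_ref:]   # among selection candidates
```

Loci with \(p_{k}\) equal to 0 or 1 would cause a division by zero in the standardization, so monomorphic SNPs need to be removed beforehand.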

#### CB–PB scenario

*i*th selection candidate of breed A of the \(N_{a} \times N_{AB}\) genomic relationship matrix \({\mathbf{G}}_{a,AB}\) between breed A selection candidates and crossbred AB reference animals. Similarly to Wientjes et al. [10], the genomic relationship matrix between breed A selection candidates and crossbred AB reference animals, \({\mathbf{G}}\), is computed following the second method of VanRaden [15] but taking into account that the selection candidates and reference animals belong to two different populations. It then follows that \({\mathbf{G}} = \left[ {\begin{array}{*{20}c} {{\mathbf{G}}_{AB,AB} } & {\quad {\mathbf{G}}_{AB,a} } \\ {{\mathbf{G}}_{a,AB} } & {\quad {\mathbf{G}}_{a,a} } \\ \end{array} } \right] = \frac{{{\mathbf{ZZ}}^{\prime } }}{m}\), where \(m\) is the number of SNPs and matrix \({\mathbf{Z}}\) contains the standardized genotypes as \({\mathbf{Z}}_{ljk} = \frac{{{\mathbf{M}}_{ljk} - 2p_{jk} }}{{\sqrt {2p_{jk} \left( {1 - p_{jk} } \right)} }}\), with \({\mathbf{M}}_{ljk}\) being the SNP genotype (coded as previously) of the *l*th individual from the *j*th population (i.e., purebred or crossbred) for the *k*th locus, and \(p_{jk}\) is the allele frequency of the *j*th population at the *k*th locus.
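The population-specific standardization can be sketched as follows. This is an illustrative example only: the function name, the population labels `"AB"` and `"A"`, and the way toy crossbred genotypes are generated (one allele drawn with the breed A frequency and one with a hypothetical breed B frequency) are assumptions made for the sketch, not a description of the software used in the study.

```python
import numpy as np


def standardize_by_population(M, pop, freqs):
    """Z_ljk = (M_ljk - 2 p_jk) / sqrt(2 p_jk (1 - p_jk)) for animal l of population j."""
    M = np.asarray(M, dtype=float)
    pop = np.asarray(pop)
    Z = np.full(M.shape, np.nan)  # rows stay NaN if their population has no frequencies
    for label, p_j in freqs.items():
        rows = pop == label
        Z[rows] = (M[rows] - 2.0 * p_j) / np.sqrt(2.0 * p_j * (1.0 - p_j))
    return Z


# Toy data (hypothetical): crossbred AB reference animals first, then breed A candidates.
rng = np.random.default_rng(7)
n_ab, n_a, m = 8, 4, 300
p_a = rng.uniform(0.1, 0.9, size=m)
p_b = rng.uniform(0.1, 0.9, size=m)
p_ab = 0.5 * (p_a + p_b)  # expected allele frequency in the F1 cross

M_ab = rng.binomial(1, p_a, size=(n_ab, m)) + rng.binomial(1, p_b, size=(n_ab, m))
M_a = rng.binomial(2, p_a, size=(n_a, m))
M = np.vstack([M_ab, M_a])
pop = np.array(["AB"] * n_ab + ["A"] * n_a)

Z = standardize_by_population(M, pop, {"AB": p_ab, "A": p_a})
G = Z @ Z.T / m
G_a_AB = G[n_ab:, :n_ab]  # breed A selection candidates x crossbred AB reference animals
```

Each row of \({\mathbf{Z}}\) is centred and scaled with the allele frequencies of the population that the animal belongs to, which is what distinguishes this construction from the single-population case.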

#### CB + PB–PB scenario

### Breed-specific allele substitution models

In crossbred populations, SNP effects may be breed-specific due to a number of factors [4], including different extents of LD between SNP and QTL between breeds, which can be accommodated by using BSAM, which fits breed-specific allele substitution effects [3, 4]. In this section, it is assumed that the breed origin of SNP alleles is known, as required by BSAM. Moreover, only the CB–PB and CB + PB–PB scenarios are considered, since the PB–PB scenario involves data on only one breed. To our knowledge, equations for predicting the reliability of genomic EBV using BSAM have not previously been developed.

#### CB–PB scenario

*k*th locus of the *l*th individual has breed A allele 1 or 2, respectively; and \(p_{Ak}\) is the frequency at the *k*th locus for breed A. Matrix \({\mathbf{Z}}_{AB}^{\left( B \right)}\) is defined similarly. Expectations and variances of \({\varvec{\upbeta}}_{c}^{\left( A \right)}\) and \({\varvec{\upbeta}}_{c}^{\left( B \right)}\) are assumed to be \(E\left[ {\begin{array}{*{20}c} {{\varvec{\upbeta}}_{c}^{\left( A \right)} } \\ {{\varvec{\upbeta}}_{c}^{\left( B \right)} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\mathbf{0}} \\ {\mathbf{0}} \\ \end{array} } \right]\) and

*i*th selection candidate of breed A is then equal to:

Since no equation has previously been proposed to predict the reliability of genomic EBV for BSAM without availability of genotyping data, here, we put forward a derivation based on mixed model theory [17], assuming that allele substitution effects for breeds A and B are estimated simultaneously. Equivalence between the mixed model and SI theories has previously been shown under certain conditions, including the use of the same estimates of the fixed effects [15, 17, 18]. Our derivation of the equation for predicting the reliability of genomic EBV for BSAM without availability of genotyping data [i.e., Eq. (8) below] is detailed in Additional file 2, and the result is briefly described in the following.

*k*th independent locus explains an equal amount of the breed A-specific additive genetic variance \(\sigma_{{c_{A} }}^{2}\), i.e., \(\sigma_{{c_{A} }}^{2} = Me_{a,AB}^{\left( A \right)} \sigma_{{\beta_{c}^{*\left( A \right)} }}^{2}\), with \(Me_{a,AB}^{\left( A \right)}\) being the effective number of chromosome segments underlying the crossbred performance trait for breed A and segregating in both breed A selection candidates and crossbred AB reference animals. The same assumption is made for the breed B-specific effect \({{\upbeta }}_{{c_{k} }}^{ *\left( B \right)}\). The genomic EBV (\({\text{c}}_{{a_{i} }}^{\left( A \right)}\)) for the *i*th selection candidate of breed A can be predicted as follows:

*i*th selection candidate of breed A and \({\hat{\varvec{\upbeta }}}_{c}^{*(A)}\) is the vector of the predictions of \({\varvec{\upbeta}}_{c}^{*\left( A \right)}\). Following mixed model theory [17, 19], the reliability of \({\hat{\text{c}}}_{{a_{i} }}^{\left( A \right)}\) can be computed from the prediction error variance, \(Var\left( {{\hat{\text{c}}}_{{a_{i} }}^{\left( A \right)} - {\text{c}}_{{a_{i} }}^{\left( A \right)} } \right)\), and is equal to:

*k*th independent locus explains an equal amount of the breed A-specific additive genetic variance \(\sigma_{{c_{A} }}^{2}\) and that the reliability of the estimated effect, \(r_{{\beta_{c}^{*\left( A \right)} }}^{2}\), is the same for each locus, it follows that:

*k*th effect, \({\hat{\varvec{\upbeta }}}_{{c_{ \ne k} }}^{*\left( A \right)}\), as well as for the breed B-specific allele substitution effects, \({\hat{\varvec{\upbeta }}}_{c}^{ *\left( B \right)}\). The prediction of \({{\upbeta }}_{{c_{k} }}^{*\left( A \right)}\) for the *k*th locus can then be performed using the following model:

#### CB + PB–PB scenario

*i*th selection candidate of breed A is then equal to:

Without availability of genotyping data, the prediction equation for the reliability of genomic EBV based on BSAM, \(r_{C + P\_BSAM\_without}^{2}\), can be derived similarly to the prediction equation for the CB–PB scenario, \(r_{C\_BSAM\_without}^{2}\). The derivation is based on mixed model theory and assumes that independent allele substitution effects for breeds A and B for both purebred and crossbred performances were estimated simultaneously. The detailed derivation can be found in Additional file 3.

*i*th selection candidate of breed A can be predicted as \({\hat{\text{c}}}_{{a_{i} }}^{\left( A \right)} = {\mathbf{z}}_{{a_{i} }}^{{{*}\left( A \right)}} {\hat{\varvec{\upbeta }}}_{c}^{*\left( A \right)}\) and its reliability is equal to:

*k*th independent locus can be performed using the phenotypes of both purebred and crossbred performances, \(\left[ {\begin{array}{*{20}c} {\widehat{{{\mathbf{y}}_{A}^{*} }}} \\ {\widehat{{{\mathbf{y}}_{AB}^{*} }}} \\ \end{array} } \right]\), corrected for all other fixed effects, as well as for the breed B-specific allele substitution effects and correlated effects, using the model:

### Computation of the effective number of chromosome segments (Me)

The proposed computation of \(Me\) requires genotypes for both selection candidates and reference animals, which may be inconsistent with its use in the computation of reliabilities without availability of genotyping data. However, it is reasonable to assume that genotypes are already available for a limited number of animals, for example at least 100, that have the right family structure that is representative of the evaluated scenario, such that an accurate approximation of \(Me\) can be computed [14].

The effective number of chromosome segments originating from a specific breed (*b*) and that are shared between purebred selection candidates (\(S\)) of this breed and crossbred reference animals (\(Rc\)), \(Me_{S,Rc}^{\left( b \right)}\), is required for the prediction equations for BSAM. In this study, \(Me_{a,AB}^{\left( A \right)}\) is required in Eqs. (8) and (10) and was assumed to be equal to \(Me_{a,A}\), which is required in Eqs. (2) and (6). The equality \(Me_{a,AB}^{\left( A \right)} = Me_{a,A}\) was assumed since the selection candidates were the same for Eqs. (2), (4), (6), (8), and (10), the numbers of reference animals \(R\) and \(Rc\) were large, and the parents of breed A and crossbred AB reference animals were sampled from the same finite pool.

### Simulated data

Data were simulated to validate Eqs. (2), (4), (6), (8), and (10), which predict the reliability of genomic EBV for crossbred performance using ASGM or BSAM, without availability of genotyping data. Two extreme scenarios were considered, in which either two closely-related or two unrelated breeds were used to produce crossbred animals. The reliabilities predicted by Eqs. (2), (4), (6), (8), and (10) were validated against the reliabilities computed with the corresponding prediction equations with availability of genotyping data, that is, Eqs. (1), (3), (5), (7), and (9). The reliabilities predicted by equations with availability of genotyping data are equivalent to those computed from PEV associated with selection candidates of a genomic best linear unbiased prediction including both reference animals and selection candidates, based on phenotypes corrected with the best linear unbiased estimates of the fixed effects, and assuming the absence of selection [15].
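The equivalence between a selection index reliability and a PEV-based reliability can be checked numerically on a toy example. The sketch below covers only a single-trait case with one record per reference animal and known variance components; it does not include the scaling by genetic correlations or the breed-specific terms of the scenarios in this study. The function names, the toy sizes, and the use of \(\lambda = \left( {1 - h^{2} } \right)/h^{2}\) as the residual-to-genetic variance ratio are assumptions made for the illustration.

```python
import numpy as np


def si_reliability(G, n_ref, h2):
    """Selection index form: g_i' (G_RR + lambda I)^-1 g_i / G_ii for each candidate."""
    lam = (1.0 - h2) / h2
    G_RR = G[:n_ref, :n_ref]
    G_cR = G[n_ref:, :n_ref]
    lhs = G_RR + lam * np.eye(n_ref)
    explained = np.einsum("ij,ji->i", G_cR, np.linalg.solve(lhs, G_cR.T))
    return explained / np.diag(G)[n_ref:]


def pev_reliability(G, n_ref, h2):
    """Mixed model form: 1 - PEV_i / (G_ii sigma_a^2), PEV from the inverse of the MME."""
    lam = (1.0 - h2) / h2
    n = G.shape[0]
    W = np.zeros((n, n))
    W[:n_ref, :n_ref] = np.eye(n_ref)       # only reference animals have a record
    C = np.linalg.inv(W + lam * np.linalg.inv(G))
    pev_over_var_a = lam * np.diag(C)[n_ref:]  # PEV = C * sigma_e^2
    return 1.0 - pev_over_var_a / np.diag(G)[n_ref:]


# Toy genotypes (hypothetical sizes); reference animals first, then candidates.
rng = np.random.default_rng(3)
n_ref, n_cand, m = 20, 5, 400
p = rng.uniform(0.1, 0.9, size=m)
M = rng.binomial(2, p, size=(n_ref + n_cand, m)).astype(float)
Z = (M - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
G = Z @ Z.T / m

r2_si = si_reliability(G, n_ref, h2=0.4)
r2_pev = pev_reliability(G, n_ref, h2=0.4)
```

In this toy setting the two vectors agree up to numerical precision, which follows from applying the Woodbury identity to the inverse of the mixed model coefficient matrix. The example standardizes genotypes with the generating allele frequencies so that \({\mathbf{G}}\) stays invertible; with frequencies estimated from the same animals, \({\mathbf{G}}\) would be singular and a small value would have to be added to its diagonal.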

#### Populations

In a second step, a two-way crossbreeding program with five generations of random selection was simulated. The animals of breeds A and B that were used to start the crossbreeding program were sampled from generation 2010 for the related breeds and from generation 2100 for the unrelated breeds. During the crossbreeding program, animals within each of breeds A and B were randomly selected and mated to produce the next generation, which had a constant size of 1000 males and 3000 females for each breed. From each of these five generations, animals of breeds A and B were randomly crossed to produce five generations of 4000 crossbred AB animals. Purebred animals used as parents of crossbred animals could also be parents of the next generation of purebred animals (Fig. 1).

#### Genotypes

The total length of the simulated genome was 10 Morgans (M) (10 chromosomes of 1 M and 4000 SNPs each). The positions of SNPs and of recombinations were randomized per chromosome and a recurrent mutation rate of 2.5 × 10^{−4} was assumed. All SNPs with a minor allele frequency (MAF) higher than or equal to 0.05 in the last historical generation (i.e., generation 2000) were used to simulate the SNP genotypes of the purebred and crossbred animals. For subsequent analyses, 2000 SNPs were randomly selected from these SNPs for each chromosome. The breed origin of each allele for each crossbred animal was recorded. All scenarios (including the historical populations) were replicated 10 times.

#### Validation of prediction equations without availability of genotyping data

The validation required a set of known genotypes, as described previously, but no phenotype, since the reliabilities predicted without availability of genotyping data were validated against the reliabilities predicted with availability of genotyping data. However, estimates of heritabilities and genetic correlations between purebred and crossbred performance were required. Heritabilities of 0.20, 0.40, and 0.95 were used for both the purebred and crossbred performance traits. A high heritability, such as 0.95, and a single record per reference animal can be assumed when phenotypes of reference animals are derived from highly reliable EBV (e.g., deregressed EBV) [10]. Genetic correlations between purebred and crossbred performance traits were assumed to be equal to 0.30 or 0.70.

In the simulated data, two groups of reference animals and one group of selection candidates were defined for each scenario of related and unrelated breeds. For the scenarios with related and unrelated breeds, the two groups of reference animals were randomly selected from generations 2012 and 2102, respectively. For scenarios PB–PB and CB–PB, the two groups of reference animals included 2000 and 4000 animals that were randomly chosen from breed A and crossbred AB animals, respectively. For scenario CB + PB–PB, the first group included 4000 randomly chosen breed A animals and 2000 randomly chosen crossbred AB animals and the second group included 4000 breed A animals and 4000 crossbred AB animals. For the selection candidates for scenarios PB–PB, CB–PB and CB + PB–PB, 1000 breed A animals were randomly selected from each generation, starting from generation 2013 for the related breeds scenario and from generation 2103 for the unrelated breeds scenario, to create the groups of selection candidates. In the following, selection candidates from generations 2013 or 2103 are referred to as “G1” selection candidates. Similarly, selection candidates from generations 2014 and 2104 and from generations 2015 and 2105 are referred to as “G2” and “G3” selection candidates, respectively.

For each ‘reference population-selection candidates’ combination and for each scenario, reliabilities of the genomic EBV for crossbred performance were computed using Eqs. (1), (3), (5), (7), and (9) for the scenarios with availability of genotyping data, and using Eqs. (2), (4), (6), (8), and (10) for the scenarios without availability of genotyping data. The required genomic relationship matrices and values of \(Me\) were computed using our in-house software calc_grm [22]. The predicted reliabilities were averaged across the 10 replicates.

### Application of a prediction equation

The proposed equations can be used to investigate the reliability of genomic EBV for crossbred performance in crossbreeding schemes. As an illustration, Eq. (10), which predicts the reliability of genomic EBV using both purebred and crossbred animals as reference animals by BSAM, was used to predict the reliability of genomic EBV for a pig production system for which 10,000 breed A animals were previously genotyped and phenotyped. The aim was to investigate the effect of the addition of crossbred AB animals to the reference population on the reliability of genomic EBV for crossbred performance. A heritability of 0.20 was assumed for both purebred and crossbred performance traits and the genetic correlation between purebred and crossbred performance traits for breed A, (\(r_{PC}^{\left( A \right)}\)), ranged from 0.0 to 1.0. Both values of \(Me\) required by Eq. (10) (i.e. \(Me_{a,AB}^{\left( A \right)}\) and \(Me_{a,A}\)) were assumed to be equal to 476.6, based on the equation \(Me = 2N_{e} L/\left( {\ln \left( {4N_{e} L} \right)} \right)\) [23], with \(N_{e}\) being the effective population size and \(L\) being the total length of the genome in M. For \(N_{e}\) and \(L\), we assumed values of 80 and 27 respectively, based on the study of Landrace pigs by Uimari and Tapio [24] and the study by Lin et al. [25]. The use of equal values of \(Me\) for the purebred and crossbred populations was based on the assumption that breed A parents of purebred and crossbred animals were sampled from the same pool.
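The value of 476.6 can be reproduced directly from the equation \(Me = 2N_{e} L/\ln \left( {4N_{e} L} \right)\) with \(N_{e} = 80\) and \(L = 27\) M. The short check below is only a worked calculation; the function and variable names are chosen for the example.

```python
import math


def effective_segments(n_e, genome_length_morgans):
    """Me = 2 * Ne * L / ln(4 * Ne * L), with L expressed in Morgans."""
    product = n_e * genome_length_morgans
    return 2.0 * product / math.log(4.0 * product)


# Ne = 80 and L = 27 M, as assumed in the text:
# numerator 2 * 80 * 27 = 4320, denominator ln(8640), approximately 9.064
me = effective_segments(80, 27)
print(round(me, 1))  # 476.6
```

Because \(Me\) grows almost linearly with \(N_{e} L\), doubling either the effective population size or the genome length nearly doubles the number of segments whose effects have to be estimated.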

## Results

This section first presents the results of the validation of the equations for predicting reliability without availability of genotyping data. As defined previously, the reliabilities without availability of genotyping data were validated against the reliabilities computed with availability of genotyping data. The second part of this section describes the increase in reliabilities from the addition of crossbred animals to a purebred reference population in a pig breeding program.

### PB–PB scenario

Reliabilities predicted without availability of genotyping data were always lower than those predicted with availability of genotyping data, which agrees with theory (see “PB–PB scenario” section in the “Methods” section). For the scenario with related breeds and \(r_{PC} = 0.3\) (Fig. 2), the differences between reliabilities predicted without and with availability of genotyping data were around 0.00 for \(h_{a}^{2} = 0.2\), in the range [−0.02; 0.00] for \(h_{a}^{2} = 0.4\), and in the range [−0.02; −0.01] for \(h_{a}^{2} = 0.95\) across all three groups of G1, G2 or G3 selection candidates and with 2000 breed A reference animals. When \(r_{PC} = 0.7\) (Fig. 3), the corresponding differences between reliabilities predicted without and with availability of genotyping data were in the range [−0.03; 0.00] for \(h_{a}^{2} = 0.2\), in the range [−0.06; −0.02] for \(h_{a}^{2} = 0.4\), and in the range [−0.11; −0.04] for \(h_{a}^{2} = 0.95\). The largest differences between reliabilities predicted without and with availability of genotyping data were always observed for the G1 selection candidates.

Similar results were obtained for the scenario with unrelated breeds (see Additional file 4: Tables S1, S2). Such similar results were expected since the distance between breeds is not taken into account by ASGM. The SD of the reliabilities across replicates were in the range [0.000; 0.001] (see Additional file 4: Tables S1, S2).

### CB–PB scenario

For the G1 selection candidates, the reliabilities for ASGM with availability of genotyping data were around 0.09 with 2000 crossbred reference animals, independent of the relationship between the breeds, and around 0.16 with 4000 crossbred reference animals, using \(h_{c}^{2} = 0.20\) (Figs. 4, 5). Differences between the reliabilities predicted without and with availability of genotyping data were around −0.01 for both 2000 and 4000 crossbred reference animals. The corresponding reliabilities using \(h_{c}^{2} = 0.95\) were around 0.37 and 0.58 with 2000 and 4000 crossbred reference animals, respectively. The corresponding differences between reliabilities predicted without and with availability of genotyping data were in the range [−0.13; −0.08].

For G1 selection candidates with related breeds, the reliabilities for BSAM with availability of genotyping data were around 0.06 and 0.11 with 2000 and 4000 crossbred reference animals, respectively, when using \(h_{c}^{2} = 0.20\) (Fig. 4). Differences between reliabilities predicted without and with availability of genotyping data were around −0.01 with both 2000 and 4000 crossbred reference animals. The corresponding reliabilities using \(h_{c}^{2} = 0.95\) were around 0.27 and 0.43 with 2000 and 4000 crossbred reference animals, respectively. Corresponding differences between reliabilities predicted without and with availability of genotyping data were in the range [−0.09; −0.06]. Similar differences were observed with unrelated breeds (Fig. 5). The SD of reliabilities across replicates were in the range [0.000; 0.002] (see Additional file 4: Tables S3, S4).

A comparison of reliabilities with availability of genotyping data between ASGM and BSAM showed that ASGM consistently performed better than BSAM. However, reliabilities for BSAM increased with increasing distance between breeds, while reliabilities for ASGM were only slightly affected (Figs. 4, 5). The increase in reliabilities with increasing distance between breeds, which compensates for the larger number of effects fitted in BSAM compared to ASGM, is in agreement with previous studies, e.g., Ibanez-Escriche et al. [4].

### CB + PB–PB scenario

The CB + PB–PB scenario included both breed A and crossbred AB animals in the reference population. The number of breed A reference animals was always 4000. The number of crossbred AB animals was equal to 2000 or 4000. The CB + PB–PB scenario also included both ASGM and BSAM.

### Reliabilities in a pig-breeding program

## Discussion

In this study, the term “reliability” refers to the precision of genomic EBV obtained by relating their PEV to the additive genetic variance of the base population, i.e., assuming absence of selection. Equations for predicting the reliability of genomic EBV for crossbred performance are proposed for reference populations that include purebred animals, crossbred animals, or both. Reliabilities were predicted for two models: ASGM and BSAM. For the BSAM, we used the true breed-of-origin of all alleles for the crossbred animals. In practice, the breed-of-origin would have to be estimated, which may negatively impact the reliability obtained. However, we expect this to have only a very minor effect, since we showed in previous studies that it is possible to accurately derive breed-of-origin of alleles in three-breed crossbred pigs [26, 27].
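
As a compact reminder of this definition, the relation between reliability and PEV can be written as below. The symbols \(PEV_{i}\) (prediction error variance of the genomic EBV of animal \(i\)) and \(\sigma_{a}^{2}\) (additive genetic variance of the base population) are notational assumptions made here for illustration and may differ from those used in the “Methods” section.

\[
r_{i}^{2} = 1 - \frac{PEV_{i}}{\sigma_{a}^{2}}
\]

Under this definition, a PEV equal to \(\sigma_{a}^{2}\) gives a reliability of 0, and a PEV of 0 gives a reliability of 1.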

Reliabilities of genomic EBV can be predicted either with availability of genotyping data, i.e., when genotype data have already been collected, or without availability of genotyping data. For scenarios without availability of genotyping data, it is assumed that the required genetic parameters are computed using pedigree instead of genomic data, or that estimates are available from the literature. The results of this study showed that the reliabilities of genomic EBV for crossbred performance predicted without availability of genotyping data were of the same order of magnitude as those predicted with availability of genotyping data. Therefore, while prediction of reliability should preferably take the genotype data of selection candidates into account when available, both methods can predict the reliability of genomic EBV for crossbred performance for different reference populations, heritabilities, and \(r_{PC}\). The derived equations can therefore be useful to optimize the design of breeding programs.

### Reliabilities predicted without and with availability of genotyping data

The aim of this study was to predict the precision of genomic EBV based on PEV in the absence of selection. Thus, the derivation of our prediction equations without and with availability of genotyping data was based on the SI and mixed model theories and assumed that phenotypes were corrected for all fixed and random effects other than the considered genetic additive effects. The equivalence between SI and mixed model theories under certain conditions, such as the use of the same estimates for the fixed effects, has previously been shown by several studies (e.g., [15, 17, 28, 29]). Therefore, reliabilities predicted with availability of genotyping data would be expected to be close to reliabilities computed from PEV obtained from genomic best linear unbiased prediction, in the absence of selection. Equations for predicting the reliability of genomic EBV without availability of genotyping data were validated against the equations for predicting reliability with availability of genotyping data, and not against the reliability of selection, i.e., the squared correlation between estimated and true genomic breeding values, which is often obtained by cross-validation. Indeed, the reliability of genomic EBV is not equivalent to the reliability of selection for populations that are under selection, although they are equivalent for populations without selection [30–33]. Reliability of selection can be predicted from the reliability of genomic EBV by considering the intensity of selection using, e.g., the equations proposed by Dekkers [30] and Bijma [31].

We also assumed that all additive genetic variance was captured by the SNPs in the derivation of the prediction equations. When only a portion of the additive genetic variance is captured by the SNPs, the prediction equations need to take this into account, as proposed by Goddard et al. [13] and Wientjes et al. [14]. This proportion could be empirically estimated when the reference population includes only one population by comparing predicted and realized (cross-validation) reliabilities [14].

For most scenarios, predicted reliabilities without availability of genotyping data underestimated the reliabilities predicted with availability of genotyping data (Figs. 2, 3, 4, 5, 6, 7, 8, 9). While this is in agreement with the theory, only a part of the underestimation is due to the fact that the decrease of the error variance when multiple loci are used was ignored (see “Methods” section; [12, 13]). This underestimation becomes greater as heritability and reliability approach 1 [13], as observed in our results. Most of the underestimation is, however, due to an overestimation of \(Me\), especially for the PB–PB and CB–PB scenarios with only one generation separating reference animals and selection candidates, for which the largest underestimations were observed. For instance, the fractional underestimation of \(r_{P\_ASGM\_with}^{2}\) that can be attributed to not considering the reduction in error variance \(\left( {1 - h_{a}^{2} r_{P\_ASGM\_without}^{2} } \right)\) (see Additional file 1) is approximately equal to \(0.03\) (i.e., 3% error) for the PB–PB scenario with one generation separating 4000 reference animals and selection candidates, \(h_{a}^{2} = 0.95\), \(r_{PC} = 1.0\), and \(Me_{a,A} = 3730\) (obtained from one random replicate). This, however, explains only part of the fractional underestimation of about 0.57 that is observed in Fig. 3. Thus, the underestimation appears to be mainly due to the overestimation of \(Me\), particularly when only one generation separates the reference animals and selection candidates. Indeed, while estimates of \(Me\) increased with decreasing predicted reliabilities with availability of genotyping data, the results show that the reliabilities predicted without availability of genotyping data decreased at a lower rate than reliabilities predicted with availability of genotyping data when the relationships between the reference population and the selection candidates decreased. Further work to improve estimation of \(Me\) is needed, especially for scenarios in which reference animals and selection candidates are highly related.

While predicted reliabilities without availability of genotyping data were underestimated for most scenarios, overestimations were observed for some scenarios with reference populations that included both purebred and crossbred animals (Figs. 6, 7, 8, 9). These overestimations may be the result of estimation of \(Me\) and assumptions taken for the derivation of the equations without availability of genotyping data (e.g., a diagonal residual (co)variance matrix for corrected phenotypes associated with BSAM and using purebred and crossbred reference animals).

### Potential use of the prediction equations

The equations derived in this study can be used to compare the effects of modifying the values of various factors (e.g., \(r_{PC}\), numbers of reference animals, or relationships between the reference population and the selection candidates) on the reliability of genomic EBV for crossbred performance and for the optimization of the design of breeding programs. However, the effects of some factors should be compared carefully. For example, the results show that the prediction equations without availability of genotyping data should be used with care for the comparison of the effects of different relationships between the reference population and the selection candidates. The prediction equations without availability of genotyping data should also be used with care for the comparison of the reliabilities of the ASGM and BSAM models, especially when the reference population includes both purebred and crossbred animals (e.g., for the PB + CB–PB scenario with unrelated breeds and \(r_{PC} = 0.7\); Fig. 9). Nevertheless, the prediction equations without availability of genotyping data can still provide some insight into the reliability of both models in different scenarios. For instance, the results (Figs. 4, 5, 6, 7, 8, 9) showed that reliabilities for BSAM tended to increase with increasing distance between breeds, while the reliabilities for ASGM were only slightly affected. The increase in reliabilities with increasing distance between breeds, which compensates for fitting more effects in BSAM in comparison to ASGM, is in agreement with previous studies, e.g., Ibanez-Escriche et al. [4]. As an illustration, assume that a reference population of a fixed number of crossbred AB animals is available, and that the heritabilities of the crossbred performance trait estimated with ASGM and with BSAM for breed A are equal. Then, from Eq. (4), \(r_{C\_ASGM\_without}^{2} = \frac{{N_{AB} h_{c}^{2} }}{{N_{AB} h_{c}^{2} + Me_{a,AB} }}\), and Eq. (8), \(r_{C\_BSAM\_without}^{2} = \frac{{N_{AB} h_{{c_{A} }}^{2} }}{{N_{AB} h_{{c_{A} }}^{2} + 2Me_{a,AB}^{\left( A \right)} }}\), the two expressions differ only in the last term of the denominator, and it follows that the reliability of genomic EBV based on BSAM would be higher than the reliability based on ASGM if \(Me_{a,AB} > 2Me_{a,AB}^{\left( A \right)}\). This will be the case if the LD patterns between breeds A and B are sufficiently different, which is more likely when the breeds diverged many generations ago [37]. This is in agreement with our results (Figs. 4, 5) and previous studies based on simulated data (e.g., [4, 11]) which show that reliabilities for BSAM increase with increasing distance between breeds. The additional effects fitted in BSAM are taken into account in Eq. (8) by the factor of 2, which was also considered by van Grevenhof and van der Werf [38], who evaluated the benefit of including crossbred animals in the reference population of a crossbreeding program using genomic selection.
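
The comparison between Eq. (4) and Eq. (8) can be checked numerically with a minimal sketch such as the one below. The function names and all input values (number of reference animals, heritability, and both values of \(Me\)) are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
def reliability_asgm(n_ab, h2_c, me_ab):
    """Eq. (4) as quoted in the text: N*h2 / (N*h2 + Me_{a,AB})."""
    return n_ab * h2_c / (n_ab * h2_c + me_ab)


def reliability_bsam(n_ab, h2_ca, me_ab_a):
    """Eq. (8) as quoted in the text: N*h2 / (N*h2 + 2*Me_{a,AB}^(A))."""
    return n_ab * h2_ca / (n_ab * h2_ca + 2.0 * me_ab_a)


# Hypothetical inputs (assumptions for illustration only).
n_ab = 4000      # number of crossbred AB reference animals
h2 = 0.20        # heritability, assumed equal for ASGM and BSAM (breed A)
me_ab = 5000.0   # effective number of segments ignoring breed origin

# Vary the breed-A-specific Me around the threshold me_ab / 2.
for me_ab_a in (2000.0, 2500.0, 3000.0):
    r2_asgm = reliability_asgm(n_ab, h2, me_ab)
    r2_bsam = reliability_bsam(n_ab, h2, me_ab_a)
    print(
        f"Me^(A)={me_ab_a:.0f}  ASGM={r2_asgm:.3f}  BSAM={r2_bsam:.3f}  "
        f"BSAM higher: {r2_bsam > r2_asgm}"
    )
```

With equal heritabilities, BSAM gives the higher predicted reliability only when the breed-specific \(Me\) is below half of the overall \(Me\), and the two models coincide at exactly half.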

### Computation of *Me*

The evaluation of different scenarios based on the prediction equations without availability of genotyping data requires accurate estimates of all parameters, and especially of \(Me\) (e.g., [10, 14, 39, 40]). Parameters such as heritabilities and correlations, if estimated inaccurately, would similarly bias reliabilities predicted without and with availability of genotyping data, since these parameters are used in both equations. However, \(Me\), the effective number of segments that are shared and segregating in both selection candidates and reference animals, is only used when predicting reliability without availability of genotyping data, and has a large impact. In our study, the estimates of \(Me\) were computed from the differences between genomic and pedigree relationships between reference animals and selection candidates, as proposed by Wientjes et al. [14]. However, our results showed that these estimates of \(Me\) did not adequately consider the close relationships that can exist between reference animals and selection candidates. As already proposed by Daetwyler et al. [39] and Brard and Ricard [40], another approach would be to invert the prediction equations without availability of genotyping data to compute \(Me\). The required reliabilities and other parameters should then be obtained from a reference population and different generations of selection candidates in which genomic prediction is already applied. However, estimates of \(Me\) obtained by inverting the prediction equations would be underestimates, since they would include a correction for the fact that the error variance decreases when multiple loci are used, which is trait-dependent.
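
A minimal sketch of this inversion, assuming the simple form of Eq. (4), \(r^{2} = Nh^{2} /(Nh^{2} + Me)\), is given below. The function names and input values are hypothetical, and the procedures used in [39, 40] may differ in their details.

```python
def reliability_without(n_ref, h2, me):
    """Predicted reliability without genotyping data, in the form of Eq. (4)."""
    return n_ref * h2 / (n_ref * h2 + me)


def me_from_reliability(r2, n_ref, h2):
    """Solve r2 = N*h2 / (N*h2 + Me) for Me.

    Rearranging gives Me = N*h2 * (1 - r2) / r2.
    """
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must lie strictly between 0 and 1")
    return n_ref * h2 * (1.0 - r2) / r2


# Hypothetical inputs (assumptions for illustration only).
n_ref = 4000        # number of reference animals
h2 = 0.20           # heritability of the trait
r2_observed = 0.16  # reliability taken from an existing genomic evaluation

me_implied = me_from_reliability(r2_observed, n_ref, h2)
print(f"Implied Me: {me_implied:.0f}")

# Round trip: plugging the implied Me back in recovers the input reliability.
print(f"Recovered r2: {reliability_without(n_ref, h2, me_implied):.3f}")
```

Because the input reliability already reflects the reduced error variance when multiple loci are used, the value of \(Me\) implied by this inversion inherits the trait-dependent underestimation described above.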

This study has introduced the concept of the effective number of chromosome segments originating from a specific breed (\(b\)), and shared by selection candidates (\(S\)) from this breed and crossbred reference animals (\(Rc\)), \(Me_{S,Rc}^{\left( b \right)}\). This \(Me_{S,Rc}^{\left( b \right)}\) is different from \(Me_{S,Rc}\) as defined previously, since the latter does not take the breed origin of the chromosome segments of the crossbred animals into consideration. Indeed, each purebred population has its own value of \(Me\), while the genome of crossbred animals combines segments from the different populations they originated from. Thus, the value of \(Me_{S,Rc}\) includes both the effective number of chromosome segments segregating in breed \(b\), and the effective number of chromosome segments segregating in the other breed(s) of origin for the crossbred animals, while \(Me_{S,Rc}^{\left( b \right)}\) only involves the effective number of chromosome segments segregating in breed \(b\). For this study, it was assumed that \(Me_{S,Rc}^{\left( b \right)}\) (i.e., \(Me_{a,AB}^{\left( A \right)}\)) was equal to \(Me_{S,R}\) (i.e., \(Me_{a,A}\)) for which the breed \(b\) selection candidates and reference animals (\(R\)) share the same parents as the crossbred \(Rc\) animals. This assumption appeared valid based on the results obtained. In practice, such an assumption may not hold, since the purebred and crossbred reference animals may not share the same parents, or reference animals may belong to different generations. Further research on accurate estimation of \(Me\) is therefore required.

## Conclusions

Several equations for predicting the reliability of genomic EBV for crossbred performance based on ASGM or on BSAM were derived for three different scenarios. These three scenarios involved a reference population that included only purebred animals, only crossbred animals, or both. The prediction equations were derived for application either without or with availability of genotyping data. Results showed that the reliabilities predicted without availability of genotyping data were of the same order of magnitude as those predicted with availability of genotyping data. Thus, the proposed equations applied either without or with availability of genotyping data can be used to evaluate the effects of several parameters on the reliability of genomic EBV for crossbred performance (e.g., the genetic correlation between purebred and crossbred performances, heritabilities of the traits, number of reference animals, distance between breeds), and for the optimization of the design of breeding programs. Moreover, we showed that model BSAM can outperform model ASGM for a breed, if the effective number of chromosome segments originating from this breed and shared by selection candidates of this breed and crossbred reference animals is less than half the effective number of all chromosome segments that are independently segregating in these same animals, provided all other parameters remain equal. Estimation of the effective number of chromosome segments needs to be improved to predict the reliability of genomic EBV without availability of genotyping data more accurately.

## Declarations

### Authors’ contributions

JV derived the equations, performed the analyses, and drafted the manuscript. JJW wrote the simulation program. All authors discussed the design of the simulations. All authors provided valuable insights throughout the analysis and writing process. All authors read and approved the final manuscript.

### Acknowledgements

Financial support from the Dutch Ministry of Economic Affairs, Agriculture, and Innovation (Public–private partnership “Breed4Food” Code BO-22.04-011-001-ASG-LR-3) is acknowledged. Discussions with Yvonne Wientjes and Piter Bijma, and useful comments of the two anonymous reviewers are acknowledged.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

## References

- Wei M, van der Werf JHJ. Maximizing genetic response in crossbreds using both purebred and crossbred information. Anim Sci. 1994;59:401–13.
- Toosi A, Fernando RL, Dekkers JCM. Genomic selection in admixed and crossbred populations. J Anim Sci. 2010;88:32–46.
- Christensen OF, Madsen P, Nielsen B, Su G. Genomic evaluation of both purebred and crossbred performances. Genet Sel Evol. 2014;46:23.
- Ibáñez-Escriche N, Fernando RL, Toosi A, Dekkers JC. Genomic selection of purebreds for crossbred performance. Genet Sel Evol. 2009;41:12.
- Dekkers JCM. Marker-assisted selection for commercial crossbred performance. J Anim Sci. 2007;85:2104–14.
- Zeng J, Toosi A, Fernando RL, Dekkers JC, Garrick DJ. Genomic selection of purebred animals for crossbred performance in the presence of dominant gene action. Genet Sel Evol. 2013;45:11.
- Lourenco DAL, Tsuruta S, Fragomeni BO, Chen CY, Herring WO, Misztal I. Crossbreed evaluations in single-step genomic best linear unbiased predictor using adjusted realized relationship matrices. J Anim Sci. 2016;94:909–19.
- Hidalgo AM, Bastiaansen JWM, Lopes MS, Harlizius B, Groenen MAM, de Koning D-J. Accuracy of predicted genomic breeding values in purebred and crossbred pigs. G3 (Bethesda). 2015;5:1575–83.
- Karoui S, Carabaño MJ, Díaz C, Legarra A. Joint genomic evaluation of French dairy cattle breeds using multiple-trait models. Genet Sel Evol. 2012;44:39.
- Wientjes YC, Veerkamp RF, Bijma P, Bovenhuis H, Schrooten C, Calus MP. Empirical and deterministic accuracies of across-population genomic prediction. Genet Sel Evol. 2015;47:5.
- Esfandyari H, Sørensen AC, Bijma P. A crossbred reference population can improve the response to genomic selection for crossbred performance. Genet Sel Evol. 2015;47:76.
- Daetwyler HD, Villanueva B, Woolliams JA. Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS One. 2008;3:e3395.
- Goddard ME, Hayes BJ, Meuwissen THE. Using the genomic relationship matrix to predict the accuracy of genomic selection. J Anim Breed Genet. 2011;128:409–21.
- Wientjes YCJ, Bijma P, Veerkamp RF, Calus MPL. An equation to predict the accuracy of genomic values by combining data from multiple traits, populations, or environments. Genetics. 2016;202:799–823.
- VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.
- Vitezica ZG, Varona L, Elsen JM, Misztal I, Herring W, Legarra A. Genomic BLUP including additive and dominant variation in purebreds and F1 crossbreds, with an application in pigs. Genet Sel Evol. 2016;48:6.
- Henderson CR. Applications of linear models in animal breeding. 2nd ed. Guelph: University of Guelph; 1984.
- de los Campos G, Hickey JM, Pong-Wong R, Daetwyler HD, Calus MPL. Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics. 2012;193:327–45.
- Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975;31:423–47.
- Powell JE, Visscher PM, Goddard ME. Reconciling the analysis of IBD and IBS in complex trait studies. Nat Rev Genet. 2010;11:800–5.
- Sargolzaei M, Schenkel FS. QMSim: a large-scale genome simulator for livestock. Bioinformatics. 2009;25:680–1.
- Calus MPL, Vandenplas J. Calc_grm—a program to compute pedigree, genomic, and combined relationship matrices. Wageningen: ABGC, Wageningen UR Livestock Research; 2016.
- Goddard ME. Genomic selection: prediction of accuracy and maximisation of long term response. Genetica. 2009;136:245–57.
- Uimari P, Tapio M. Extent of linkage disequilibrium and effective population size in Finnish Landrace and Finnish Yorkshire pig breeds. J Anim Sci. 2011;89:609–14.
- Lin Z, Hayes BJ, Daetwyler HD. Genomic selection in crops, trees and forages: a review. Crop Pasture Sci. 2014;65:1177–91.
- Sevillano CA, Vandenplas J, Bastiaansen JWM, Calus MPL. Empirical determination of breed-of-origin of alleles in three-breed cross pigs. Genet Sel Evol. 2016;48:55.
- Vandenplas J, Calus MPL, Sevillano CA, Windig JJ, Bastiaansen JWM. Assigning breed origin to alleles in crossbred animals. Genet Sel Evol. 2016;48:61.
- Mrode RA. Linear models for the prediction of animal breeding values. 2nd ed. Wallingford: CABI Publishing; 2005.
- Strandén I, Garrick DJ. Technical note: derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit. J Dairy Sci. 2009;92:2971–5.
- Dekkers JCM. Asymptotic response to selection on best linear unbiased predictors of breeding values. Anim Sci. 1992;54:351–60.
- Bijma P. Accuracies of estimated breeding values from ordinary genetic evaluations do not reflect the correlation between true and estimated breeding values in selected populations. J Anim Breed Genet. 2012;129:345–58.
- Van Grevenhof EM, Van Arendonk JA, Bijma P. Response to genomic selection: the Bulmer effect and the potential of genomic selection when the number of phenotypic records is limiting. Genet Sel Evol. 2012;44:26.
- Gorjanc G, Bijma P, Hickey JM. Reliability of pedigree-based and genomic evaluations in selected populations. Genet Sel Evol. 2015;47:65.
- Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J Dairy Sci. 2010;93:743–52.
- Christensen OF, Lund MS. Genomic prediction when some animals are not genotyped. Genet Sel Evol. 2010;42:2.
- Legarra A, Christensen OF, Aguilar I, Misztal I. Single step, a general approach for genomic selection. Livest Sci. 2014;166:54–65.
- de Roos APW, Hayes BJ, Spelman RJ, Goddard ME. Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle. Genetics. 2008;179:1503–12.
- van Grevenhof IE, van der Werf JH. Design of reference populations for genomic selection in crossbreeding programs. Genet Sel Evol. 2015;47:14.
- Daetwyler HD, Pong-Wong R, Villanueva B, Woolliams JA. The impact of genetic architecture on genome-wide evaluation methods. Genetics. 2010;185:1021–31.
- Brard S, Ricard A. Is the use of formulae a reliable way to predict the accuracy of genomic selection? J Anim Breed Genet. 2015;132:207–17.