An \(F_{1}\) population involves gametes from the parental populations 1 and 2. If dominance is present, and because allelic frequencies differ in each breed, the within-breed (additive) substitution effects are not equal to the substitution effects across the \(F_{1}\) population. Thus, purebred individuals have different breeding values depending on whether they are mated to individuals from the same or another breed/line. This situation is well known [3, 12, 13], and holds even if the genotype effects are constant across breeds or crossbred individuals.
Consider one locus/gene and two non-inbred populations, \(P_{1}\) and \(P_{2}\), each in Hardy–Weinberg equilibrium. An individual from \(P_{1}\) is crossed with a random individual from \(P_{2}\). Individuals in the \(F_{1}\) population have genotypes \(B_{1} B_{2}\), \(B_{1} b_{2}\), \(b_{1} B_{2}\) or \(b_{1} b_{2}\), where subscripts 1 and 2 indicate the origin of the allele, i.e. populations 1 or 2, respectively. The genotypic value \(G\) of an individual in the crossbred population \(F_{1}\) is equal to:
$$G_{{B_{1} B_{2} }} = a,\quad G_{{B_{1} b_{2} }} = G_{{b_{1} B_{2} }} = d\quad {\text{and}}\quad G_{{b_{1} b_{2} }} = - a ,$$
where \(a\) and \(d\) are deviations from the midpoint of the two homozygotes, and correspond to the (biological) additive and dominant effects of the gene, respectively. Let us assume that the genotypic values (\(a, d\) and \(- a\)) are the same in the two parental populations and in the crossbred population \(F_{1}\) (this assumption will be relaxed later) [1]. The genetic mean of the \(F_{1}\) population is then:
$$E\left( G \right) = \left( {pp^{\prime} - qq^{\prime}} \right)a + \left( {pq^{\prime} + qp^{\prime}} \right)d,$$
where \(p\) and \(q = 1 - p\) are the allelic frequencies of \(B_{1}\) and \(b_{1}\) in population 1, and \(p^{\prime}\) and \(q^{\prime}\) are the allelic frequencies of \(B_{2}\) and \(b_{2}\) in population 2. If the difference in allele frequencies between the two populations is denoted by \(y = p - p^{\prime} = q^{\prime} - q\), the genetic mean is, as in Falconer [1], equal to:
$$E(G) = \left( {p - q - y} \right)a + \left[ {2pq + \left( {p - q} \right)y} \right]d.$$
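The two expressions for the genetic mean can be compared numerically. The following is a minimal sketch; the function names and the parameter values are illustrative assumptions and are not taken from the source.

```python
def f1_mean(p, p_prime, a, d):
    """Genetic mean of the F1 from allele frequencies p (population 1) and p' (population 2)."""
    q, q_prime = 1.0 - p, 1.0 - p_prime
    return (p * p_prime - q * q_prime) * a + (p * q_prime + q * p_prime) * d


def f1_mean_falconer(p, y, a, d):
    """Same mean written with y = p - p', the difference in allele frequencies."""
    q = 1.0 - p
    return (p - q - y) * a + (2.0 * p * q + (p - q) * y) * d


# Illustrative values only (not taken from the paper).
p, p_prime, a, d = 0.7, 0.4, 1.0, 0.5
m1 = f1_mean(p, p_prime, a, d)
m2 = f1_mean_falconer(p, p - p_prime, a, d)
print(m1, m2)  # the two forms should agree up to rounding
```

With \(y = 0\) (equal allele frequencies), the second form reduces to the single-population mean \((p - q)a + 2pqd\).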
Following the classical parameterization, the genotypic values of individuals in the \(F_{1}\) population are the sum of the additive (or breeding) effects of the gametes that originate from populations \(P_{1}\) and \(P_{2}\) (\(u_{1}\) or \(u_{2}\)) and a dominant deviation \((v)\) which depends on the combination of alleles received [14]:
$$G = E\left( G \right) + u_{1} + u_{2} + v,$$
(1)
where \(u_{1}\) is the additive effect of a gamete from population 1 combined with a gamete from population 2, which differs from the effect of the gamete within the same population. Thus, \(u_{1}\) and \(u_{2}\) represent the general combining ability (GCA) of alleles \(B_{1}\) or \(b_{1}\), and \(B_{2}\) or \(b_{2}\), whereas \(v\) is the specific combining ability (SCA) between alleles \(B_{1}\) or \(b_{1}\), and \(B_{2}\) or \(b_{2}\). An equivalent expression that is often used in plant breeding is:
$$G = E(G) + GCA_{i} + GCA_{j} + SCA_{ij} ,$$
where the performance of an individual i is evaluated in terms of its average performance when it is crossed with another individual j [13].
Additive values \(u_{1}\) and \(u_{2}\) of the gametes include a substitution effect for each gene. Thus, \(\alpha_{1}\) is the additive (or breeding) effect of the gametes from population 1 crossed with population 2, and \(\alpha_{2}\) the additive (or breeding) effect of the gametes from population 2 crossed with population 1, which are equal to:
$$\alpha_{1} = a + d\left( {q^{\prime} - p^{\prime}} \right)\quad {\text{and}}\quad \alpha_{2} = a + d\left( {q - p} \right).$$
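As a brief worked step, \(\alpha_{1}\) can be recovered as the difference between the mean genotypic values of \(F_{1}\) individuals that received \(B_{1}\) or \(b_{1}\) from population 1, the second gamete being drawn at random from population 2. The symbols \(\bar{G}_{B_{1}}\) and \(\bar{G}_{b_{1}}\) for these conditional means are introduced here only for illustration:

$$\bar{G}_{B_{1}} = p^{\prime}a + q^{\prime}d,\quad \bar{G}_{b_{1}} = p^{\prime}d - q^{\prime}a,$$

$$\alpha_{1} = \bar{G}_{B_{1}} - \bar{G}_{b_{1}} = \left( {p^{\prime} + q^{\prime}} \right)a + \left( {q^{\prime} - p^{\prime}} \right)d = a + d\left( {q^{\prime} - p^{\prime}} \right).$$

The same argument with the roles of the two populations exchanged gives \(\alpha_{2}\). If \(p = p^{\prime}\), both reduce to the within-population substitution effect \(a + d\left( {q - p} \right)\).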
From the expression, \(\sigma_{G}^{2} = E\left( {G^{2} } \right) - \left( {E\left( G \right)} \right)^{2}\), the total genetic variance for the \(F_{1}\) population is equal to:
$$\sigma_{G}^{2} = \left( {pq + p^{\prime}q^{\prime}} \right)a^{2} - 2\left( {1 - q - q^{\prime}} \right)\left( {pq^{\prime} + p^{\prime}q} \right)ad + \left( {pq + p^{\prime}q^{\prime} - 4pqp^{\prime}q^{\prime}} \right)d^{2} .$$
We can partition the genetic variance \(\sigma_{G}^{2}\) into components due to individual additive value (breeding values, \(u\)), and dominance deviations (\(v\)). The additive genetic variance for the \(F_{1}\) population is:
$$\sigma_{A}^{2} = \frac{1}{2}\sigma_{{A_{1} }}^{2} + \frac{1}{2}\sigma_{{A_{2} }}^{2} ,$$
where \(\sigma_{{A_{1} }}^{2} = 2pq\left( {\alpha_{1} } \right)^{2}\) and \(\sigma_{{A_{2} }}^{2} = 2p^{\prime}q^{\prime}\left( {\alpha_{2} } \right)^{2}\).
The part of variance for each population is:
$$\sigma_{{A_{1} }}^{2} = 2pq\left( {\alpha_{1} } \right)^{2} = 2\left[ {pqa^{2} + 2pq\left( {q^{\prime} - p^{\prime}} \right)ad + pq\left( {q^{\prime} - p^{\prime}} \right)^{2} d^{2} } \right],$$
$$\sigma_{{A_{1} }}^{2} = 2pq\left[ {a + \left( {q^{\prime} - p^{\prime}} \right)d} \right]^{2} ,$$
(2)
$$\sigma_{{A_{2} }}^{2} = 2p^{\prime}q^{\prime}\left( {\alpha_{2} } \right)^{2} = 2\left[ {p^{\prime}q^{\prime}a^{2} + 2p^{\prime}q^{\prime}\left( {q - p} \right)ad + p^{\prime}q^{\prime}\left( {q - p} \right)^{2} d^{2} } \right],$$
$$\sigma_{{A_{2} }}^{2} = 2p^{\prime}q^{\prime}\left[ {a + \left( {q - p} \right)d} \right]^{2}.$$
(3)
\(\sigma_{{A_{1} }}^{2}\) \((\sigma_{{A_{2} }}^{2} )\) is the variance of the GCA of the alleles of individuals from population 1 crossed with individuals from population 2 (of the alleles of individuals from population 2 crossed with individuals from population 1). It can also be considered as the additive variance of gametes inherited from population 1 (from population 2) in the \(F_{1}\) population, as in Lo et al. [3].
The variance of the GCA (\(\sigma_{A}^{2}\)) is an important parameter for understanding whether selection of purebred individuals can increase crossbred performance [1]. If the variance of the GCA explains a large part of the total genetic variance for the \(F_{1}\) population, within-population selection will result in a large increase in crossbred performance, without resorting to specific matings to create crossbreds with large dominance deviations.
The term \(ad\) appears in \(\sigma_{G}^{2}\) but is completely embedded in \(\sigma_{{A_{1} }}^{2}\) and \(\sigma_{{A_{2} }}^{2}\). This term differs from 0 if there is covariance between \(a\) and \(d\), i.e. if \(a\) and \(d\) are of the same magnitude and direction or if there is overdominance. This covariance between additive and dominant effects of genes implies the presence of inbreeding depression or heterosis. Different models have been proposed to take the dependency between additive and dominant effects into account [15].
Thus, based on Eqs. (2) and (3), we can write the additive variance for the \(F_{1}\) population as:
$$\sigma_{A}^{2} = pq\left[ {a + \left( {q^{\prime} - p^{\prime}} \right)d} \right]^{2} + p^{\prime}q^{\prime}\left[ {a + \left( {q - p} \right)d} \right]^{2} .$$
Using this last expression of \(\sigma_{A}^{2}\) and the expression of the total genetic variance, i.e. \(\sigma_{G}^{2} = \sigma_{A}^{2} + \sigma_{D}^{2}\), the variance for the dominance deviation \((v)\) can be obtained as:
$$\sigma_{D}^{2} = \sigma_{G}^{2} - \sigma_{A}^{2} ,$$
$$\sigma_{D}^{2} = \left[ {pq\left( {1 - 2p^{\prime}q^{\prime}} \right) + p^{\prime}q^{\prime}\left( {1 - 2pq} \right)} \right]d^{2} - \left[ {pq\left( {1 - 2p^{\prime}} \right)^{2} + p^{\prime}q^{\prime}\left( {1 - 2p} \right)^{2} } \right]d^{2},$$
where the first and second terms are the \(d^{2}\) parts of the total genetic variance and of the breeding value (or GCA) variance, respectively (the terms in \(a^{2}\) and \(ad\) are common to both variances and cancel). Thus, the dominance genetic variance or the variance of the SCA is equal to:
$$\sigma_{D}^{2} = 4pqp^{\prime}q^{\prime}d^{2} ,$$
(4)
which leads to the result obtained for a single population if \(p = p^{\prime}\) (e.g., [1]).
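The partition of \(\sigma_{G}^{2}\) into GCA and SCA variances can be checked by enumerating the four \(F_{1}\) genotypes. The sketch below is illustrative only. It assumes the usual coding of gametic effects as deviations (\(q\alpha_{1}\) for \(B_{1}\) and \(- p\alpha_{1}\) for \(b_{1}\), and likewise for population 2), which the text does not spell out; the function name and the numerical values are hypothetical.

```python
from itertools import product


def f1_decomposition(p, p_prime, a, d):
    """Variances of G, u1, u2 and v in the F1, obtained by enumerating genotypes."""
    q, q_prime = 1.0 - p, 1.0 - p_prime
    alpha1 = a + d * (q_prime - p_prime)  # substitution effect, gametes from population 1
    alpha2 = a + d * (q - p)              # substitution effect, gametes from population 2
    # Each gamete: (copies of B, frequency, additive effect as a deviation).
    # The deviation coding (q*alpha, -p*alpha) is an assumption of this sketch.
    gametes1 = [(1, p, q * alpha1), (0, q, -p * alpha1)]
    gametes2 = [(1, p_prime, q_prime * alpha2), (0, q_prime, -p_prime * alpha2)]
    value = {2: a, 1: d, 0: -a}           # genotypic value by number of B alleles

    crosses = [(f1 * f2, value[b1 + b2], u1, u2)
               for (b1, f1, u1), (b2, f2, u2) in product(gametes1, gametes2)]
    mean = sum(f * g for f, g, _, _ in crosses)
    out = {"var_g": 0.0, "var_u1": 0.0, "var_u2": 0.0, "var_v": 0.0}
    for f, g, u1, u2 in crosses:
        v = g - mean - u1 - u2            # dominance deviation (SCA)
        out["var_g"] += f * (g - mean) ** 2
        out["var_u1"] += f * u1 ** 2
        out["var_u2"] += f * u2 ** 2
        out["var_v"] += f * v ** 2
    return out


# Illustrative values only (not taken from the paper).
p, p_prime, a, d = 0.7, 0.4, 1.0, 0.5
res = f1_decomposition(p, p_prime, a, d)
q, q_prime = 1.0 - p, 1.0 - p_prime
var_a_closed = (p * q * (a + (q_prime - p_prime) * d) ** 2
                + p_prime * q_prime * (a + (q - p) * d) ** 2)
var_d_closed = 4.0 * p * q * p_prime * q_prime * d ** 2
print(res["var_u1"] + res["var_u2"], var_a_closed)   # additive (GCA) variance
print(res["var_v"], var_d_closed)                    # dominance (SCA) variance
print(res["var_g"], var_a_closed + var_d_closed)     # total genetic variance
```

Under these assumptions, the enumerated variance of \(u_{1} + u_{2}\) should match the closed form of \(\sigma_{A}^{2}\), the variance of \(v\) should match \(4pqp^{\prime}q^{\prime}d^{2}\), and the two should sum to \(\sigma_{G}^{2}\).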
If \(a\) and \(d\) effects are considered as random variables with a covariance of 0 between \(a\) and \(d\), variance components for the \(F_{1}\) population can be obtained from these expressions using markers in a GBLUP context as detailed in the next section.
Equivalent genomic model based on SNPs
A model including (biological) additive and dominant effects of the SNPs can be written in matrix form for a set of individuals as [16]:
$${\mathbf{y}} = \mathbf{1\mu} + {\mathbf{Za}} + {\mathbf{Wd}} + \varvec{e}\text{,}$$
where \({\mathbf{y}}\) is the vector of phenotypic values of the individuals, \(\mu\) is the population mean and \(\varvec{e}\) is the vector of residuals. The vectors \({\mathbf{a}}\) and \({\mathbf{d}}\) contain the additive and dominant effects of each of the SNP markers. The matrix \({\mathbf{Z}} = ({\mathbf{z}}_{1} \ldots {\mathbf{z}}_{m} )\) has elements equal to 1, 0 and −1 for SNP genotypes \(BB\), \(Bb\) and \(bb\), respectively. For the dominant component, \({\mathbf{W}} = ({\mathbf{w}}_{1} \ldots {\mathbf{w}}_{m} )\) has elements equal to 0, 1 and 0 for SNP genotypes \(BB\), \(Bb\) and \(bb\), respectively. This model is general and applies to any population structure (purebred or crossed), as long as effects \(a\) and \(d\) are assumed constant across populations.
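The coding of \({\mathbf{Z}}\) and \({\mathbf{W}}\) can be sketched as follows. The genotype matrix is a made-up example, and storing genotypes as counts of allele \(B\) is an assumption of this sketch rather than something stated in the source.

```python
import numpy as np

# Hypothetical genotypes: rows are individuals, columns are SNPs,
# entries count copies of allele B (2 = BB, 1 = Bb, 0 = bb).
genotypes = np.array([[2, 1, 0],
                      [1, 1, 2],
                      [0, 2, 1],
                      [2, 0, 1]])

Z = genotypes - 1.0                  # additive coding: BB -> 1, Bb -> 0, bb -> -1
W = (genotypes == 1).astype(float)   # dominance coding: Bb -> 1, homozygotes -> 0

print(Z)
print(W)
```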
From this genotypic model, we can define \(u^{*}\) and \(v^{*}\) as the genotypic additive and dominant effects, i.e. the parts that are attributed to the additive and dominance “biological” effects [17, 18] of the markers for the whole population (individuals from populations 1 and 2 and the crossbred population \(F_{1}\)). Note that “biological” is used here to refer to genotypic additive and dominant values of the SNP, to distinguish it from the traditional treatment of quantitative genetics in terms of “statistical” effects (breeding values and dominance deviations). So, for a set of individuals, \({\mathbf{u}}^{*} = {\mathbf{Za}}\) and \({\mathbf{v}}^{*} = {\mathbf{Wd}}\). Under standard assumptions, the covariances across genotypic additive values are:
$$Cov\left( {{\mathbf{u}}^{ *} } \right) = {\mathbf{ZZ}}\varvec{'}\sigma_{a}^{2} ,$$
where \(\sigma_{a}^{2}\) is the SNP variance for the additive component. With a normalized matrix, this covariance is written as:
$$Cov\left( {{\mathbf{u}}^{ *} } \right) = \frac{{{\mathbf{ZZ}}\varvec{'}}}{{\left\{ {tr\left[ {{\mathbf{ZZ}}\varvec{'}} \right]} \right\}/n}}\sigma_{{A^{*} }}^{2} .$$
The division by \(\left\{ {tr\left[ {{\mathbf{ZZ}}\varvec{'}} \right]} \right\}/n\), where \(n\) is the number of individuals, scales the matrix so that the average of its diagonal elements is equal to 1. This covariance matrix is similar to the classical \({\mathbf{G}}\) matrix of genomic BLUP [19], but with a different variance component, i.e. \(\sigma_{{A^{*} }}^{2}\), the variance component that is associated with the genotypic additive values (this is not a genetic variance per se since it cannot be interpreted as the variance of the population). Based on \(\sigma_{{A^{*} }}^{2}\), the SNP variance for the additive component can be obtained as \(\sigma_{a}^{2} = \frac{{\sigma_{{A^{*} }}^{2} }}{{\left\{ {tr\left[ {{\mathbf{ZZ}}\varvec{'}} \right]} \right\}/n}}\).
Then, the covariance of genotypic values due to dominance is:
$$Cov\left( {{\mathbf{v}}^{ *} } \right) = \frac{{{\mathbf{WW}}\varvec{'}}}{{\left\{ {tr\left[ {{\mathbf{WW}}\varvec{'}} \right]} \right\}/n}}\sigma_{{D^{*} }}^{2} ,$$
where \(\sigma_{{D^{*} }}^{2}\) is the variance component associated with the genotypic dominant values. The SNP variance for the dominance component can be obtained as:
$$\sigma_{d}^{2} = \frac{{\sigma_{{D^{*} }}^{2} }}{{\left\{ {tr\left[ {{\mathbf{WW}}\varvec{'}} \right]} \right\}/n}}.$$
Therefore, the genotypic model is an equivalent model, which is useful to go from variance components (\(\sigma_{{A^{*} }}^{2}\), \(\sigma_{{D^{*} }}^{2}\)), with no particular interpretations, to marker variances (\(\sigma_{a}^{2} , \sigma_{d}^{2}\)).
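The normalization of the two covariance matrices and the conversion from variance components to marker variances can be sketched as below. The simulated genotypes and the values of \(\sigma_{{A^{*} }}^{2}\) and \(\sigma_{{D^{*} }}^{2}\) are hypothetical placeholders (in practice they would be estimated, e.g. by REML), and the function name is an assumption of this sketch.

```python
import numpy as np


def normalized_relationship(M):
    """Return MM' scaled to an average diagonal of 1, and the scaling factor tr(MM')/n."""
    K = M @ M.T
    scale = np.trace(K) / M.shape[0]
    return K / scale, scale


# Simulated genotypes (counts of allele B) for 6 individuals and 50 SNPs.
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(6, 50))
Z = genotypes - 1.0                  # additive coding 1, 0, -1
W = (genotypes == 1).astype(float)   # dominance coding 0, 1, 0

G_add, scale_a = normalized_relationship(Z)
G_dom, scale_d = normalized_relationship(W)

# Placeholder variance components; not estimates from any real data set.
sigma2_A_star, sigma2_D_star = 0.40, 0.10
sigma2_a = sigma2_A_star / scale_a   # SNP variance, additive component
sigma2_d = sigma2_D_star / scale_d   # SNP variance, dominance component
print(sigma2_a, sigma2_d)
```

By construction, \({\mathbf{ZZ}}\varvec{'}\sigma_{a}^{2}\) and the normalized matrix multiplied by \(\sigma_{{A^{*} }}^{2}\) are the same covariance, which is what makes the two parameterizations equivalent.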
To estimate SNP variances, the additive and dominance genetic variances in the \(F_{1}\) population are obtained from Eqs. (2), (3) and (4) extended to multiple loci. The extension to multiple loci assumes linkage equilibrium and uncorrelated marker effects, which are standard assumptions [19]. To estimate additive variances, we also assume a covariance of 0 between \(a\) and \(d\). Thus, the additive genetic variance due to alleles from population 1 in the \(F_{1}\) population can be written as:
$$\sigma_{{A_{1} }}^{2} = \mathop \sum \nolimits (2p_{i} q_{i} )\sigma_{a}^{2} + \mathop \sum \nolimits (2p_{i} q_{i} \left( {q_{i}^{\prime } - p_{i}^{\prime } } \right)^{2} )\sigma_{d}^{2} ,$$
(5)
and the additive genetic variance due to alleles from population 2 in the \(F_{1}\) population as:
$$\sigma_{{A_{2} }}^{2} = \mathop \sum \nolimits \left( {2p_{i}^{\prime } q_{i}^{\prime } } \right)\sigma_{a}^{2} + \mathop \sum \nolimits \left( {2p_{i}^{\prime } q_{i}^{\prime } \left( {q_{i} - p_{i} } \right)^{2} } \right)\sigma_{d}^{2} .$$
(6)
This equation is the variance of GCA among individuals from population 2 crossed with individuals from population 1. It should be recalled that the additive genetic variance for the \(F_{1}\) population is equal to:
$$\sigma_{A}^{2} = \frac{1}{2}\sigma_{{A_{1} }}^{2} + \frac{1}{2}\sigma_{{A_{2} }}^{2} .$$
We can also write the dominance genetic variance for the \(F_{1}\) population as:
$$\sigma_{D}^{2} = \mathop \sum \nolimits (4p_{i} q_{i} p_{i}^{\prime } q_{i}^{\prime } )\sigma_{d}^{2} .$$
(7)
For the additive and dominance genetic variances in the parental breeds/lines, expressions are in Vitezica et al. [18]. For instance, for population 1 \((P_{1} )\) with allele frequencies \(p\) and \(q\), variances are equal to:
$$\sigma_{{A_{{P_{1} }} }}^{2} = \mathop \sum \nolimits (2p_{i} q_{i} )\sigma_{a}^{2} + \mathop \sum \nolimits (2p_{i} q_{i} \left( {q_{i} - p_{i} } \right)^{2} )\sigma_{d}^{2} ,$$
and
$$\sigma_{{D_{{P_{1} }} }}^{2} = \mathop \sum \nolimits \left( {2p_{i} q_{i} } \right)^{2} \sigma_{d}^{2} .$$
Therefore, this approach makes it possible to estimate variance components for the \(F_{1}\) population under a genomic model with additive and non-additive (dominance) inheritance. The three variance components in Eqs. (5), (6) and (7) can be interpreted as variances of breeding values (or GCA) and of dominant deviations (or SCA).
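Eqs. (5), (6) and (7), together with the purebred expressions, translate directly into array operations on the allele frequencies. The following is a minimal sketch with hypothetical function names, allele frequencies and SNP variances; it relies on the assumptions stated above (linkage equilibrium, uncorrelated marker effects, and zero covariance between \(a\) and \(d\)).

```python
import numpy as np


def f1_variance_components(p1, p2, sigma2_a, sigma2_d):
    """F1 variance components from allele frequencies of B in populations 1 and 2."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    q1, q2 = 1.0 - p1, 1.0 - p2
    var_a1 = (np.sum(2 * p1 * q1) * sigma2_a
              + np.sum(2 * p1 * q1 * (q2 - p2) ** 2) * sigma2_d)   # Eq. (5)
    var_a2 = (np.sum(2 * p2 * q2) * sigma2_a
              + np.sum(2 * p2 * q2 * (q1 - p1) ** 2) * sigma2_d)   # Eq. (6)
    var_d = np.sum(4 * p1 * q1 * p2 * q2) * sigma2_d               # Eq. (7)
    return {"A1": var_a1, "A2": var_a2, "A": 0.5 * (var_a1 + var_a2), "D": var_d}


def purebred_variance_components(p, sigma2_a, sigma2_d):
    """Additive and dominance variances within one population."""
    p = np.asarray(p, dtype=float)
    q = 1.0 - p
    var_a = np.sum(2 * p * q) * sigma2_a + np.sum(2 * p * q * (q - p) ** 2) * sigma2_d
    var_d = np.sum((2 * p * q) ** 2) * sigma2_d
    return {"A": var_a, "D": var_d}


# Hypothetical allele frequencies at three SNPs and placeholder SNP variances.
p1 = [0.7, 0.2, 0.5]
p2 = [0.4, 0.6, 0.5]
f1 = f1_variance_components(p1, p2, sigma2_a=0.01, sigma2_d=0.005)
same = f1_variance_components(p1, p1, sigma2_a=0.01, sigma2_d=0.005)
pure = purebred_variance_components(p1, sigma2_a=0.01, sigma2_d=0.005)
print(f1)
print(same["A"], pure["A"], same["D"], pure["D"])
```

When both populations have the same allele frequencies, the \(F_{1}\) components should coincide with the purebred ones, which gives a simple consistency check on an implementation.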
The biological additive and dominant effects of SNPs may not be the same across the different populations, due to genotype by environment or genotype by genotype (i.e. epistasis) interactions.
A simple alternative is to model marker effects as correlated across populations [20], which implies correlated \({\mathbf{u}}^{ *}\) and \({\mathbf{v}}^{ *}\) [21, 22]. This generalizes the methods above.