Genomic evaluation of both purebred and crossbred performances

Background: For a two-breed crossbreeding system, Wei and van der Werf presented a model for genetic evaluation using information from both purebred and crossbred animals. The model provides breeding values for both purebred and crossbred performances. Genomic evaluation incorporates marker genotypes into a genetic evaluation system. Among popular methods are the so-called single-step methods, in which marker genotypes are incorporated into a traditional animal model by using a combined relationship matrix that extends the marker-based relationship matrix to non-genotyped animals. However, a single-step method for genomic evaluation of both purebred and crossbred performances has not been developed yet.

Results: An extension of the Wei and van der Werf model that incorporates genomic information is presented. The extension consists of four steps: (1) the Wei and van der Werf model is reformulated using two partial relationship matrices for the two breeds; (2) marker-based partial relationship matrices are constructed; (3) marker-based partial relationship matrices are adjusted to be compatible with pedigree-based partial relationship matrices; and (4) combined partial relationship matrices are constructed using information from both pedigree and marker genotypes. The extension of the Wei and van der Werf model can be implemented using software that allows inverse covariance matrices in sparse format as input.

Conclusions: A method for genomic evaluation of both purebred and crossbred performances was developed for a two-breed crossbreeding system. The method allows information from crossbred animals to be incorporated in a coherent manner for such crossbreeding systems.


Background
Production systems based on crossbreeding are predominant in pig and chicken breeding and take advantage of the increased performance of crossbred animals compared to purebred animals. For a two-breed crossbreeding system, Wei and van der Werf (Appendix 2 in [1]) presented a model for genetic evaluation using information from both purebred and crossbred animals. The model provides estimated breeding values for purebred (mating with own breed) and crossbred (mating with the other breed) performances that are different but correlated. The model is particularly attractive since it can fit a breeding goal that includes both purebred and crossbred performances (see [2]). This model is the starting point of our paper.
Genomic selection [3] has offered a new paradigm for livestock breeding and has been successfully applied for selection within purebred populations [4][5][6]. Moreover, genomic selection also offers greater opportunities for incorporating information from crossbreds and selecting for crossbred performance [7][8][9]. Genomic selection of purebreds for crossbred performance was proposed by Ibáñez-Escriche et al. [7], who used phenotypes on crossbreds only and a genomic model with breed of origin specific allele substitution effects. The resulting breeding values for purebred animals were for crossbred performance. Although the study included genomic data, it was less sophisticated than the Wei and van der Werf model [1] since each animal had only one breeding value and phenotype recordings in purebreds were not used. In addition, it assumed that all relevant animals were genotyped, which would not be a very likely scenario in practice.
In cases in which not all animals are genotyped, the so-called single-step methods [10][11][12] provide a coherent approach for genomic evaluation. These methods incorporate marker genotypes into a traditional animal model [13] by using a combined relationship matrix that extends the marker-based relationship matrix of VanRaden [14] to non-genotyped animals, and they have been shown to perform well for genomic evaluation of dairy cattle [11,15], pigs [16,17] and chickens [18]. Misztal et al. [19] provided an extension with "unknown-parent groups" to allow for different populations, but using such an approach on data from both purebred and crossbred animals would assume equal genetic variances in the two breeds and in the crossbreds, and also that breeding values for purebred and crossbred performances are the same. A single-step method for genomic evaluation of both purebred and crossbred performances has not been developed yet. In a genomic model, when crossbred animals are genotyped, it is natural to split the additive genetic effect of crossbreds into breed of origin specific genetic components, as in Ibáñez-Escriche et al. [7]. Each of these components is a partial genetic effect, in the sense that only breed-specific alleles are used. This use of the terminology "partial genetic effect" is consistent with the model of Garcia-Cortes and Toro [20], in which for multibreed analysis the additive genetic value is split into several independent parts depending on their genetic origin, with the variance-covariance structure of each part being determined by a partial relationship matrix (constructed from pedigree). A partial relationship matrix is a relationship matrix that describes relationships only according to genetic origin. From this point of view, partial relationship matrices are key when constructing a single-step method for both purebred and crossbred performances.
However, the Wei and van der Werf model is not formulated using partial relationship matrices, and it therefore needs to be reformulated for the purpose of incorporating genomic information.
The aim of this paper is to present an extension of the Wei and van der Werf model that incorporates genomic information. The extension consists of four steps: (1) the Wei and van der Werf model is reformulated using two partial relationship matrices [20] for the two breeds; (2) marker-based partial relationship matrices similar to VanRaden [14] are constructed; (3) marker-based partial relationship matrices are adjusted to be compatible with pedigree-based partial relationship matrices, similar to Christensen et al. [17]; and (4) combined partial relationship matrices are constructed using information from both pedigree and marker genotypes, similar to the combined relationship matrix of Legarra et al., Aguilar et al. and Christensen and Lund [10][11][12]. This extension of the Wei and van der Werf model can be implemented using software that allows inverse covariance matrices in sparse format as input.

The Wei and van der Werf model
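Here, the Wei and van der Werf model (Appendix 2 in [1]) is presented. The two breeds are named A and B, and it is assumed that all crossbred animals AB have known purebred parents. The number of animals in the pedigree is $n_A$ and $n_B$ for breed A and breed B, respectively, and the number of crossbred animals is $n_{AB}$. The model for the phenotypes is a trivariate model

$$
\begin{aligned}
y_A &= X_A\beta_A + Z_A a_A + e_A,\\
y_B &= X_B\beta_B + Z_B a_B + e_B,\\
y_{AB} &= X_{AB}\beta_{AB} + Z_{AB} c_{AB} + e_{AB},
\end{aligned}
\tag{1}
$$

where the vectors $y_A$, $y_B$ and $y_{AB}$ contain phenotypes on the breed A, breed B and crossbred AB animals, respectively, and for the three breed groups A, B and AB, the vectors $X_A\beta_A$, $X_B\beta_B$ and $X_{AB}\beta_{AB}$ contain fixed effects, and $e_A \sim N(0, \sigma^2_A I)$, $e_B \sim N(0, \sigma^2_B I)$ and $e_{AB} \sim N(0, \sigma^2_{AB} I)$ are the residual error vectors. The $n_A$-dimensional vector $a_A$ contains breeding values for purebred performance for breed A animals (mating within breed A), and matrix $Z_A$ is an incidence matrix assigning breeding values to records. Vector $a_B$ and matrix $Z_B$ are defined similarly for breed B, and $Z_{AB}$ is the corresponding incidence matrix for crossbred animals. Finally, the $n_{AB}$-dimensional vector $c_{AB}$ contains the additive genetic effects for crossbred animals, and these are related to the vectors of breeding values for purebred animals for crossbred performance (mating with the other breed) as follows

$$
c_{AB} = \tfrac{1}{2}\tilde{Z}_{AB,A}\, c_A + \tfrac{1}{2}\tilde{Z}_{AB,B}\, c_B + \phi_{AB},
\tag{2}
$$

where the matrices $\tilde{Z}_{AB,A}$ and $\tilde{Z}_{AB,B}$ assign purebred parents to crossbred offspring, $c_A$ is an $n_A$-dimensional vector containing breeding values for crossbred performance for breed A animals (mating with breed B animals), $c_B$ is an $n_B$-dimensional vector containing breeding values for crossbred performance for breed B animals (mating with breed A animals), and the vector $\phi_{AB}$ contains the Mendelian sampling effects. The genetic covariances are described by

$$
\mathrm{Var}\begin{pmatrix} a_A \\ c_A \end{pmatrix} = \Sigma^{(A)} \otimes A_A,
\qquad
\mathrm{Var}\begin{pmatrix} a_B \\ c_B \end{pmatrix} = \Sigma^{(B)} \otimes A_B,
$$

where $A_A$ and $A_B$ are the additive relationship matrices for breed A and breed B, respectively, $\otimes$ denotes the Kronecker product, and

$$
\Sigma^{(A)} = \begin{pmatrix} \Sigma^{(A)}_{11} & \Sigma^{(A)}_{12} \\ \Sigma^{(A)}_{21} & \Sigma^{(A)}_{22} \end{pmatrix},
\qquad
\Sigma^{(B)} = \begin{pmatrix} \Sigma^{(B)}_{11} & \Sigma^{(B)}_{12} \\ \Sigma^{(B)}_{21} & \Sigma^{(B)}_{22} \end{pmatrix}
$$

are the 2 × 2 variance-covariance matrices containing the genetic variances for purebred breeding values (element 11) and crossbred breeding values (element 22), and the covariance between the two, for breed A and breed B, respectively.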
The variance-covariance matrix of the Mendelian sampling term is a diagonal matrix $D_{AB}$ with elements

$$
(D_{AB})_{ii} = \tfrac{1}{4}\,\Sigma^{(A)}_{22}\,\bigl(1 - F_{f_A(i)}\bigr) + \tfrac{1}{4}\,\Sigma^{(B)}_{22}\,\bigl(1 - F_{f_B(i)}\bigr),
\tag{3}
$$

where $\Sigma^{(A)}_{22}$ and $\Sigma^{(B)}_{22}$ are the genetic variances for crossbred performance in breed A and breed B, $f_A(i)$ and $f_B(i)$ denote the breed A and breed B parents of crossbred animal $i$, and $F$ denotes an inbreeding coefficient.

The Wei and van der Werf model is an additive genetic model in the sense that the breeding values for purebred performance, $a_A$, $a_B$, are additive genetic effects, and the breeding values for crossbred performance, $c_A$, $c_B$, in combination with the genetic effects $c_{AB}$ are also additive genetic effects. The model therefore does not contain dominance genetic effects explicitly. In practice, such an additive genetic model may also partly capture dominant gene actions and other non-additive gene actions [21]. The fact that genetic correlations between purebred and crossbred performances are different from one would be due to the presence of dominant gene actions in combination with different allele frequencies in the two breeds [22], in addition to genetic effects being different in different environments. In addition, the model captures the general level of heterosis in crossbred animals since it has a separate fixed mean effect for crossbred animals.
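The diagonal elements of the Mendelian sampling variance matrix can be illustrated with a minimal Python sketch. It assumes that each element is one quarter of the breed A crossbred variance times one minus the inbreeding coefficient of the breed A parent, plus the analogous breed B term; the function name, argument names and numbers are hypothetical and chosen only for illustration.

```python
def mendelian_sampling_variances(parent_pairs, inbreeding, var_c_a, var_c_b):
    """Diagonal of D_AB for crossbred animals (illustrative sketch).

    parent_pairs: list of (breed A parent id, breed B parent id), one per crossbred.
    inbreeding:   dict mapping parent id to its inbreeding coefficient F.
    var_c_a, var_c_b: genetic variances for crossbred performance in breeds A and B.
    """
    return [
        0.25 * var_c_a * (1.0 - inbreeding[pa]) + 0.25 * var_c_b * (1.0 - inbreeding[pb])
        for pa, pb in parent_pairs
    ]


# One crossbred animal with a non-inbred breed A parent and a breed B parent with F = 0.25.
diagonal = mendelian_sampling_variances(
    parent_pairs=[("sireA", "damB")],
    inbreeding={"sireA": 0.0, "damB": 0.25},
    var_c_a=2.0,
    var_c_b=4.0,
)
```

The sketch makes the heterogeneity visible: two crossbred animals with differently inbred parents get different Mendelian sampling variances.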
Wei and van der Werf [1] also gave an alternative formulation of the model. The term $c_{AB}$ is not of interest for genetic evaluation when crossbred animals are not used for breeding, and Wei and van der Werf reformulated the model using $\epsilon_{AB} = \phi_{AB} + e_{AB}$ as the residual error term for the crossbred phenotypes, where $\phi_{AB}$ denotes the Mendelian sampling term, and thereby expressed the model as a reduced model using only the terms $a_A$, $c_A$, $a_B$ and $c_B$ with breeding values for purebred animals. Note that due to different levels of inbreeding of parents (see formula (3)), the term $\epsilon_{AB}$ has heterogeneous variance, and assuming a constant variance is an approximation. The reduced model can be implemented using software that handles multi-trait genetic models. For the purpose of this paper, observed marker genotypes on crossbred animals provide information on the Mendelian sampling term $\phi_{AB}$, and the absorption of this term into the residual error term is therefore not well-suited. For this reason we do not follow the reduced model in this paper.
Finally, in the special case where $\Sigma^{(A)}_{22} = \Sigma^{(B)}_{22}$, i.e. where the genetic variances for crossbred performance are equal in the two breeds, the model can be written by introducing artificial random vectors such that the genetic variance-covariance matrix can be expressed using a Kronecker product with $A$, the usual additive relationship matrix for all animals. This can therefore be implemented using a combined pedigree across all animals. We will return to this special case in the Discussion section.

Reformulated model
Here, the Wei and van der Werf [1] model is reformulated using breed-specific partial relationship matrices, as in Garcia-Cortes and Toro [20]. Partial relationship matrices describe relationships according to genetic origin.
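The starting point of the reformulation is the Mendelian sampling term for the crossbred animals in formula (2), denoted $\phi_{AB}$ here. This term can be split into breed of origin effects,

$$
\phi_{AB} = \phi^{(A)}_{AB} + \phi^{(B)}_{AB},
$$

where $\phi^{(A)}_{AB}$ and $\phi^{(B)}_{AB}$ are independent. Formula (3) can be formulated in matrix notation as $D_{AB} = \mathrm{Var}(\phi^{(A)}_{AB}) + \mathrm{Var}(\phi^{(B)}_{AB})$, with

$$
\mathrm{Var}(\phi^{(A)}_{AB}) = \Sigma^{(A)}_{22} D^{(A)}, \quad D^{(A)} = \tfrac{1}{4}\bigl(I_{n_{AB}} - F^{(A)}\bigr),
\qquad
\mathrm{Var}(\phi^{(B)}_{AB}) = \Sigma^{(B)}_{22} D^{(B)}, \quad D^{(B)} = \tfrac{1}{4}\bigl(I_{n_{AB}} - F^{(B)}\bigr),
$$

with $I_{n_{AB}}$ being an identity matrix of size $n_{AB}$, $F^{(A)}$ and $F^{(B)}$ being diagonal matrices containing the inbreeding coefficients of the breed A and breed B parents of the crossbred animals, and $\Sigma^{(A)}_{22}$ and $\Sigma^{(B)}_{22}$ denoting the genetic variances for crossbred performance in breed A and breed B. In other words, we decompose the Mendelian sampling term into Mendelian sampling terms for the A and B gametes. The additive genetic effect for crossbred animals in formula (2) can then be expressed as

$$
c_{AB} = c^{(A)}_{AB} + c^{(B)}_{AB},
\qquad
c^{(A)}_{AB} = \tfrac{1}{2}\tilde{Z}_{AB,A}\, c_A + \phi^{(A)}_{AB},
\qquad
c^{(B)}_{AB} = \tfrac{1}{2}\tilde{Z}_{AB,B}\, c_B + \phi^{(B)}_{AB},
$$

where $c^{(A)}_{AB}$ and $c^{(B)}_{AB}$ are independent, i.e. the genetic effect for crossbred animals is split into two breed of origin genetic effects. Thus, the model equation system (1) can be written as:

$$
\begin{aligned}
y_A &= X_A\beta_A + Z_A a_A + e_A,\\
y_B &= X_B\beta_B + Z_B a_B + e_B,\\
y_{AB} &= X_{AB}\beta_{AB} + Z_{AB} c^{(A)}_{AB} + Z_{AB} c^{(B)}_{AB} + e_{AB}.
\end{aligned}
$$

Focusing on breed A, the variance-covariance matrix of the genetic effects $c_A$ and $c^{(A)}_{AB}$ has the blocks

$$
\mathrm{Var}(c_A) = \Sigma^{(A)}_{22} A_A,
\quad
\mathrm{Cov}\bigl(c^{(A)}_{AB}, c_A\bigr) = \tfrac{1}{2}\Sigma^{(A)}_{22}\tilde{Z}_{AB,A} A_A,
\quad
\mathrm{Var}\bigl(c^{(A)}_{AB}\bigr) = \Sigma^{(A)}_{22}\bigl(\tfrac{1}{4}\tilde{Z}_{AB,A} A_A \tilde{Z}_{AB,A}^T + D^{(A)}\bigr),
$$

where $A_A$ is the additive relationship matrix for breed A. Therefore, the variance-covariance matrix of breed A specific genetic effects for crossbred performance equals

$$
\mathrm{Var}\begin{pmatrix} c_A \\ c^{(A)}_{AB} \end{pmatrix} = \Sigma^{(A)}_{22} A^{(A)},
$$

where the symmetric $(n_A + n_{AB})$-dimensional matrix

$$
A^{(A)} = \begin{pmatrix}
A_A & \tfrac{1}{2} A_A \tilde{Z}_{AB,A}^T \\
\tfrac{1}{2} \tilde{Z}_{AB,A} A_A & \tfrac{1}{4} \tilde{Z}_{AB,A} A_A \tilde{Z}_{AB,A}^T + D^{(A)}
\end{pmatrix}
\tag{5}
$$

is the breed A specific partial relationship matrix in Garcia-Cortes and Toro [20] (see below). Similarly, the variance-covariance matrix of breed B specific genetic effects for crossbred performance equals

$$
\mathrm{Var}\begin{pmatrix} c_B \\ c^{(B)}_{AB} \end{pmatrix} = \Sigma^{(B)}_{22} A^{(B)},
$$

where $A^{(B)}$, defined analogously to $A^{(A)}$, is the breed B specific partial relationship matrix (see below).

Garcia-Cortes and Toro [20] presented a partition of the variance-covariance matrix of additive genetic values into breed-specific and breed-segregation terms, where each term is a scaled partial relationship matrix. The partial relationship matrices are constructed using recursive formulas similar to usual recursive formulas for the additive relationship matrix [23]. For the two-breed terminal crossbreeding system, the partition results in breed A and breed B specific partial relationship matrices, but no breed-segregation partial relationship matrices; we refer to Garcia-Cortes and Toro [20] for the general case.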
The recursive formulas for the breed A specific partial relationships are:

$$
A^{(A)}_{ii} = f^A_i + \tfrac{1}{2} A^{(A)}_{f(i)\,m(i)},
\qquad
A^{(A)}_{ij} = \tfrac{1}{2}\bigl(A^{(A)}_{f(i)\,j} + A^{(A)}_{m(i)\,j}\bigr),
$$

where $f(i)$ and $m(i)$ are the two parents of animal $i$, animal $j$ is not a descendant of $i$, and $f^A_i$ is the breed A proportion of individual $i$ (equal to 1 for purebred A animals, 0 for purebred B animals and 0.5 for crossbred animals). To ensure that partial relationship matrices are invertible, Munilla-Leguizamón and Cantet [24] suggested redefining the partial relationship matrices such that only elements that are non-null by breed origin are included, i.e. for the breed A specific partial relationships shown here, the elements related to purebred B animals are excluded. In this paper, we followed that suggestion, and it is not difficult to check that the matrix in (5) is indeed the breed A specific partial relationship matrix. Using matrix formulation, the breed A specific partial relationship matrix is

$$
A^{(A)} = T D T^T,
$$

where $T$ is a lower triangular matrix and $D$ is a diagonal matrix. For matrix $T$, the inverse matrix $T^{-1}$ is a lower triangular matrix with diagonal elements equal to 1 and, below the diagonal, the only non-zero elements are −0.5 for offspring-parent elements. An example with a small pedigree is in Table 1, and the corresponding partial relationship matrices are in Tables 2 and 3.
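The recursion for the breed A specific partial relationships can be sketched in a few lines of Python. The sketch assumes the tabular form in which a diagonal element is the breed A proportion plus half the partial relationship between the parents, and an off-diagonal element is the average of the partial relationships of the parents with the other animal. The pedigree below is a hypothetical toy example (not the pedigree of Table 1), animals are assumed to be ordered with parents before offspring, and purebred B animals are dropped at the end as suggested by Munilla-Leguizamón and Cantet [24].

```python
def breed_a_partial_relationships(pedigree, breed_a_fraction):
    """Tabular construction of breed A specific partial relationships.

    pedigree: list of (sire, dam) index pairs, None for unknown parents,
              ordered so that parents come before their offspring.
    breed_a_fraction: breed A proportion of each animal (1, 0 or 0.5).
    """
    n = len(pedigree)
    rel = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        parents = [p for p in (sire, dam) if p is not None]
        # Off-diagonal: half the sum of the parents' relationships with animal j.
        for j in range(i):
            value = sum(0.5 * rel[p][j] for p in parents)
            rel[i][j] = rel[j][i] = value
        # Diagonal: breed A proportion plus half the relationship between parents.
        between_parents = rel[sire][dam] if len(parents) == 2 else 0.0
        rel[i][i] = breed_a_fraction[i] + 0.5 * between_parents
    return rel


# Animals 0 and 1: breed A founders; 2: purebred A offspring of 0 and 1;
# 3: breed B founder; 4: crossbred offspring of 2 (breed A) and 3 (breed B).
pedigree = [(None, None), (None, None), (0, 1), (None, None), (2, 3)]
breed_a_fraction = [1.0, 1.0, 1.0, 0.0, 0.5]

full = breed_a_partial_relationships(pedigree, breed_a_fraction)
keep = [i for i, f in enumerate(breed_a_fraction) if f > 0.0]  # drop purebred B animals
partial_a = [[full[i][j] for j in keep] for i in keep]
```

In this toy pedigree the crossbred animal has a diagonal element of 0.5 and a partial relationship of 0.5 with its breed A parent, while all elements involving the breed B founder are zero before that animal is removed.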
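The reformulation of the Wei and van der Werf model is completed by introducing two artificial random vectors $a^{(A)}_{AB}$ and $a^{(B)}_{AB}$ such that genetic variance-covariance matrices can be presented using Kronecker products. For breed A, the genetic covariances are described by

$$
\mathrm{Var}\begin{pmatrix} a_A \\ a^{(A)}_{AB} \\ c_A \\ c^{(A)}_{AB} \end{pmatrix} = \Sigma^{(A)} \otimes A^{(A)},
$$

and similarly for breed B, the genetic covariances are described by

$$
\mathrm{Var}\begin{pmatrix} a_B \\ a^{(B)}_{AB} \\ c_B \\ c^{(B)}_{AB} \end{pmatrix} = \Sigma^{(B)} \otimes A^{(B)},
$$

where $\Sigma^{(A)}$ and $\Sigma^{(B)}$ are the 2 × 2 genetic variance-covariance matrices for purebred and crossbred performances, and $A^{(A)}$ and $A^{(B)}$ are the breed-specific partial relationship matrices. Implementing the model requires inverses of the two partial relationship matrices. The inverse of a partial relationship matrix can be expressed by the usual formula

$$
\bigl(A^{(A)}\bigr)^{-1} = \bigl(T^{-1}\bigr)^T D^{-1}\, T^{-1},
$$

and the usual methods for computing the diagonal elements of the partial relationship matrix and the inverse partial relationship matrix in sparse format [25,26] can be applied.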
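The inverse of a partial relationship matrix can be accumulated animal by animal from the decomposition with $T^{-1}$ and $D$, without inverting a dense matrix. The following Python sketch illustrates this for a hypothetical four-animal example: two breed A founders, one purebred A offspring of the founders, and one crossbred offspring of that purebred animal. The function name, the hard-coded diagonal of $D$ and the example matrix are assumptions made for illustration only.

```python
def partial_relationship_inverse(parents_a, d_diag):
    """Accumulate (T^-1)' D^-1 T^-1 one animal at a time.

    parents_a: for each animal, a tuple with the indices of its known parents
               that carry breed A alleles (one parent for a crossbred animal).
    d_diag:    diagonal elements of D for each animal.
    """
    n = len(parents_a)
    inv = [[0.0] * n for _ in range(n)]
    for i, parents in enumerate(parents_a):
        weight = 1.0 / d_diag[i]
        # Row i of T^-1: 1 on the diagonal, -0.5 for each parent.
        coefficients = [(i, 1.0)] + [(p, -0.5) for p in parents]
        for row, c_row in coefficients:
            for col, c_col in coefficients:
                inv[row][col] += c_row * c_col * weight
    return inv


parents_a = [(), (), (0, 1), (2,)]
d_diag = [1.0, 1.0, 0.5, 0.25]  # founders, purebred offspring, crossbred offspring
a_partial = [
    [1.0, 0.0, 0.5, 0.25],
    [0.0, 1.0, 0.5, 0.25],
    [0.5, 0.5, 1.0, 0.5],
    [0.25, 0.25, 0.5, 0.5],
]

a_inverse = partial_relationship_inverse(parents_a, d_diag)
# Check: the product of the matrix and its accumulated inverse is the identity.
product = [
    [sum(a_partial[i][k] * a_inverse[k][j] for k in range(4)) for j in range(4)]
    for i in range(4)
]
```

Each animal contributes at most nine non-zero elements, which is why the inverse is sparse and can be passed to mixed-model software in sparse format.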
The model is a trivariate model with breed A and B specific genetic effects for both purebred and crossbred performances, and can be implemented using a software package for multivariate mixed models that can either explicitly construct inverses of partial relationship matrices from pedigree or use inverse covariance matrices in sparse format as input (e.g., DMU).

Extending the model to incorporate genomic information requires the construction of two combined breed-specific partial relationship matrices expressed as inverse matrices, and for this purpose, marker-based breed-specific partial relationship matrices need to be constructed, and marker-based and pedigree-based partial relationship matrices need to be made compatible. These are the topics of the following subsections.

Marker-based partial relationship matrix
Here, a marker-based breed A specific partial relationship matrix is constructed. The assumption here is that the marker genotypes for crossbred animals are phased such that it is known which allele originated from breed A and which allele originated from breed B. The marker genotype matrix $m^A$ for purebred A animals has elements $m^A_{ij} = -1$, 0 or 1 if SNP $j$ of individual $i$ is 11, 12, or 22, respectively. For crossbred animals, the breed A marker allele matrix $q^A$ has elements $q^A_{ij} = -0.5$ or 0.5 if locus $j$ of individual $i$ has breed A allele 1 or 2, respectively.
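Constructing a marker-based breed-specific partial relationship matrix similar to the marker-based relationship matrix of VanRaden [14] is done by using the breed-specific alleles only. The marker-based breed A specific partial relationship matrix $G^{(A)}$ is divided into submatrices with indices denoting genotyped breed A and crossbred animals,

$$
G^{(A)} = \begin{pmatrix} G^{(A)}_{A,A} & G^{(A)}_{A,AB} \\ G^{(A)}_{AB,A} & G^{(A)}_{AB,AB} \end{pmatrix},
$$

which are defined as

$$
G^{(A)}_{A,A} = \frac{\tilde{m}^A (\tilde{m}^A)^T}{s_A},
\qquad
G^{(A)}_{A,AB} = \bigl(G^{(A)}_{AB,A}\bigr)^T = \frac{\tilde{m}^A (\tilde{q}^A)^T}{s_A},
\qquad
G^{(A)}_{AB,AB} = \frac{\tilde{q}^A (\tilde{q}^A)^T}{s_A},
$$

with the centered matrices $\tilde{m}^A = m^A - \mathbf{1}(2\hat{p}^A - \mathbf{1})^T$ and $\tilde{q}^A = q^A - \mathbf{1}(\hat{p}^A - \tfrac{1}{2}\mathbf{1})^T$, where the vector $\hat{p}^A$ contains estimated breed A specific allele frequencies (of allele 2) based on marker genotypes for purebred animals and breed A specific marker alleles for crossbred animals, $\mathbf{1}$ denotes a vector of ones, and $s_A$ is a scaling parameter. The scaling parameter $s_A$ is unspecified here since we adjust the marker-based partial relationship matrix to make it compatible with the pedigree-based partial relationship matrices, similar to Christensen et al. [17] (see below).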
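A minimal Python sketch of the construction of the marker-based breed A specific partial relationship matrix is given below. It assumes that genotypes are centered by the estimated frequency of allele 2, and, because the scaling parameter is left unspecified in the text, it uses a VanRaden-style default of twice the sum of $p(1-p)$ over SNPs purely as an illustrative assumption. The function name and the toy genotypes are hypothetical.

```python
def marker_partial_relationships(m_a, q_a, scale=None):
    """Marker-based breed A specific partial relationship matrix (sketch).

    m_a: genotypes of purebred A animals coded -1, 0, 1 (rows are animals).
    q_a: breed A alleles of crossbred animals coded -0.5, 0.5 (rows are animals).
    scale: scaling parameter s_A; the default is an assumption, not from the text.
    """
    n1, n2, n_snp = len(m_a), len(q_a), len(m_a[0])
    # Frequency of allele 2 among breed A alleles: two per purebred, one per crossbred.
    p_hat = []
    for j in range(n_snp):
        copies = sum(row[j] + 1 for row in m_a) + sum(row[j] + 0.5 for row in q_a)
        p_hat.append(copies / (2 * n1 + n2))
    # Center purebred genotypes by 2p - 1 and crossbred alleles by p - 0.5.
    centered = [[row[j] - (2 * p_hat[j] - 1) for j in range(n_snp)] for row in m_a]
    centered += [[row[j] - (p_hat[j] - 0.5) for j in range(n_snp)] for row in q_a]
    if scale is None:
        scale = 2 * sum(p * (1 - p) for p in p_hat)
    n = n1 + n2
    return [
        [sum(centered[i][k] * centered[j][k] for k in range(n_snp)) / scale
         for j in range(n)]
        for i in range(n)
    ]


# Two purebred A animals and one crossbred animal genotyped at three SNPs.
m_a = [[-1, 0, 1], [1, 1, 0]]
q_a = [[0.5, -0.5, 0.5]]
g_a = marker_partial_relationships(m_a, q_a)
```

Purebred animals are placed first and crossbred animals last, so the four submatrices of the partitioned matrix correspond to the blocks of `g_a`. Because the centering uses the observed allele frequencies, the rows of `g_a` sum to zero, which is one reason an adjustment to the pedigree base is needed afterwards.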
The marker-based breed B specific partial relationship matrix $G^{(B)}$ is constructed similarly. Matrices $G^{(A)}$ and $G^{(B)}$ correspond to two different covariance structures, while matrix $G^{(AB)}$ does not exist. For crossbred animals that are genotyped, the genetic effect is the sum of two effects, with variance-covariance matrices proportional to $G^{(A)}_{AB,AB}$ and $G^{(B)}_{AB,AB}$, respectively. Since a genetic effect with a marker-based relationship matrix can be equivalently formulated as a sum of allele substitution effects, the genetic effect for crossbred animal $i$ therefore equals

$$
\sum_{j=1}^{p} \bigl(q^A_{ij} - \hat{p}^A_j + \tfrac{1}{2}\bigr)\,\alpha^A_j + \sum_{j=1}^{p} \bigl(q^B_{ij} - \hat{p}^B_j + \tfrac{1}{2}\bigr)\,\alpha^B_j,
$$

where $\hat{p}^A_j$ and $\hat{p}^B_j$ are the estimated breed-specific allele frequencies and $\alpha^A_j$, $\alpha^B_j$ are independent breed of origin specific substitution effects for SNP $j = 1, \ldots, p$. The model for crossbred animals is therefore as described by Ibáñez-Escriche et al. [7].

Compatibility of marker-based and pedigree-based partial relationship matrices
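Marker-based and pedigree-based partial relationship matrices must be compatible [17,27,28]. In order to achieve this, either the marker-based partial relationship matrix or the pedigree-based partial relationship matrix must be adjusted [28]. Here, we show how to adjust the breed A specific marker-based partial relationship matrix, $G^{(A)}$, to the breed A specific pedigree-based partial relationship matrix for the subset of genotyped animals, $A^{(A)}_{11}$, similar to Christensen et al. [17]. The adjustment is of the form

$$
G^{(A)}_a = G^{(A)}\beta + \begin{pmatrix}
1_{n_1}1_{n_1}^T & \tfrac{1}{2}\,1_{n_1}1_{n_2}^T \\
\tfrac{1}{2}\,1_{n_2}1_{n_1}^T & \tfrac{1}{4}\,1_{n_2}1_{n_2}^T
\end{pmatrix}\alpha
= G^{(A)}\beta + K\alpha,
$$

with submatrices corresponding to purebred genotyped and crossbred genotyped animals, 1 denoting a vector of ones (with sub-index denoting the dimension: $n_1$ is equal to the number of genotyped purebred animals and $n_2$ to the number of genotyped crossbred animals); matrix $K$ being implicitly defined; and $\alpha$ and $\beta$ being parameters that need to be estimated. The form of the adjustment above is explained in Appendix A. According to Christensen et al. [17], the parameters $\alpha$ and $\beta$ can be determined by solving a system of two equations

$$
\begin{aligned}
\mathrm{avg}\bigl(\mathrm{diag}(G^{(A)})\bigr)\,\beta + \mathrm{avg}\bigl(\mathrm{diag}(K)\bigr)\,\alpha &= \mathrm{avg}\bigl(\mathrm{diag}(A^{(A)}_{11})\bigr),\\
\mathrm{avg}\bigl(G^{(A)}\bigr)\,\beta + \mathrm{avg}(K)\,\alpha &= \mathrm{avg}\bigl(A^{(A)}_{11}\bigr),
\end{aligned}
$$

where avg denotes the average of the elements. Note that parameter $\beta$ is completely confounded with the scaling parameter $s_A$ of the marker-based partial relationship matrix, and the choice of the scaling parameter is therefore irrelevant.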
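The adjustment can be sketched in Python as follows. The sketch assumes that the two equations equate the average diagonal element and the average of all elements of the adjusted marker-based matrix with those of the pedigree-based matrix for genotyped animals, in the spirit of Christensen et al. [17]. The function names and the small input matrices are hypothetical.

```python
def mean_all(mat):
    """Average of all elements of a square matrix."""
    n = len(mat)
    return sum(sum(row) for row in mat) / (n * n)


def mean_diag(mat):
    """Average of the diagonal elements of a square matrix."""
    return sum(mat[i][i] for i in range(len(mat))) / len(mat)


def adjust_marker_matrix(g, a11, n1, n2):
    """Return (adjusted matrix, alpha, beta) with adjusted = beta * G + alpha * K.

    Animals are ordered with the n1 purebred animals first and the n2 crossbred
    animals last; K has blocks 1, 1/2 and 1/4 as in the text.
    """
    half = [1.0] * n1 + [0.5] * n2
    n = n1 + n2
    k = [[half[i] * half[j] for j in range(n)] for i in range(n)]
    # Solve the 2 x 2 linear system for beta and alpha by Cramer's rule.
    det = mean_diag(g) * mean_all(k) - mean_all(g) * mean_diag(k)
    beta = (mean_diag(a11) * mean_all(k) - mean_all(a11) * mean_diag(k)) / det
    alpha = (mean_diag(g) * mean_all(a11) - mean_all(g) * mean_diag(a11)) / det
    adjusted = [[beta * g[i][j] + alpha * k[i][j] for j in range(n)] for i in range(n)]
    return adjusted, alpha, beta


# Two genotyped purebred animals followed by one genotyped crossbred animal.
g = [[0.9, -0.3, -0.2], [-0.3, 0.8, -0.1], [-0.2, -0.1, 0.3]]
a11 = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.25], [0.25, 0.25, 0.5]]
g_adjusted, alpha, beta = adjust_marker_matrix(g, a11, n1=2, n2=1)
```

After the adjustment, `g_adjusted` and `a11` agree in both averages, and multiplying `g` by any constant changes only `beta`, which illustrates why the scaling parameter does not matter.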

Combined pedigree-based and marker-based partial relationship matrix
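The combined partial relationship matrix $H^{(A)}$ can be constructed similarly to the combined relationship matrix for purebred animals [10][11][12]. On the inverse scale, the elements of the matrix for non-genotyped animals do not depend on the marker genotypes. Therefore,

$$
\bigl(H^{(A)}\bigr)^{-1} = \begin{pmatrix}
\bigl(G^{(A)}_{\omega}\bigr)^{-1} - \bigl(A^{(A)}_{11}\bigr)^{-1} & 0 \\
0 & 0
\end{pmatrix} + \bigl(A^{(A)}\bigr)^{-1},
$$

with

$$
G^{(A)}_{\omega} = (1 - \omega)\, G^{(A)}_a + \omega\, A^{(A)}_{11},
$$

where $G^{(A)}_a$ is the adjusted marker-based partial relationship matrix and $A^{(A)}_{11}$ is the pedigree-based partial relationship matrix for genotyped animals. Parameter $\omega$ is the fraction of genetic variance not captured by the marker genotypes, and in practice should be chosen to maximize accuracy and minimize bias of the resulting estimated breeding values [17].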
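The construction of the inverse combined partial relationship matrix can be sketched in Python as below. The sketch assumes the usual single-step form, in which the inverse pedigree-based matrix is modified only in the block of genotyped animals, and it uses dense Gauss-Jordan inversion purely for illustration (in practice the pedigree-based inverse would be built in sparse format). All function names and numerical inputs are hypothetical.

```python
def invert(mat):
    """Dense matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(mat)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def combined_inverse(a_full, g_adjusted, genotyped, omega):
    """Inverse combined matrix: A^-1 plus (G_omega^-1 - A11^-1) in the genotyped block."""
    a11 = [[a_full[i][j] for j in genotyped] for i in genotyped]
    m = len(genotyped)
    g_omega = [[(1 - omega) * g_adjusted[r][c] + omega * a11[r][c] for c in range(m)]
               for r in range(m)]
    h_inv = invert(a_full)
    g_inv, a11_inv = invert(g_omega), invert(a11)
    for r, i in enumerate(genotyped):
        for c, j in enumerate(genotyped):
            h_inv[i][j] += g_inv[r][c] - a11_inv[r][c]
    return h_inv, g_omega


a_full = [
    [1.0, 0.0, 0.5, 0.25],
    [0.0, 1.0, 0.5, 0.25],
    [0.5, 0.5, 1.0, 0.5],
    [0.25, 0.25, 0.5, 0.5],
]
genotyped = [2, 3]                       # one purebred and one crossbred animal
g_adjusted = [[1.1, 0.45], [0.45, 0.55]]  # hypothetical adjusted marker-based matrix
h_inverse, g_omega = combined_inverse(a_full, g_adjusted, genotyped, omega=0.2)
h = invert(h_inverse)
```

Inverting the result shows the expected behaviour: the block of `h` for genotyped animals equals `g_omega`, while the relationships involving non-genotyped animals are obtained by propagating the marker information through the pedigree.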
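Computation of the submatrix $A^{(A)}_{11}$ follows the Colleau algorithm [29,30], which is based on the decomposition $A^{(A)} = T D T^T$ shown in a previous subsection. The essential idea is to compute the $i$th column of $A^{(A)}$ as the matrix-vector product $T D T^T x_i$, where $x_i$ is the $i$th unit vector, which avoids constructing the full matrix.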
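A column of the partial relationship matrix can be obtained from the decomposition $A^{(A)} = T D T^T$ by two passes through the pedigree, which is the idea exploited by the Colleau algorithm [29,30]. The Python sketch below is a simplified illustration under the assumption that animals are ordered with parents before offspring; the function name and the four-animal example (two breed A founders, their purebred offspring, and a crossbred offspring of the latter) are hypothetical.

```python
def partial_relationship_times_vector(parents_a, d_diag, x):
    """Compute T D T' x without forming the partial relationship matrix.

    parents_a: for each animal, a tuple with the indices of its known parents
               that carry breed A alleles; parents precede offspring.
    d_diag:    diagonal elements of D.
    x:         vector to be multiplied.
    """
    n = len(x)
    # Step 1: v = T' x, accumulated from the youngest animal to the oldest.
    v = list(x)
    for i in range(n - 1, -1, -1):
        for p in parents_a[i]:
            v[p] += 0.5 * v[i]
    # Step 2: w = D v.
    w = [d_diag[i] * v[i] for i in range(n)]
    # Step 3: u = T w, accumulated from the oldest animal to the youngest.
    u = [0.0] * n
    for i in range(n):
        u[i] = w[i] + 0.5 * sum(u[p] for p in parents_a[i])
    return u


parents_a = [(), (), (0, 1), (2,)]
d_diag = [1.0, 1.0, 0.5, 0.25]

# Column of the crossbred animal (index 3): multiply by the corresponding unit vector.
column_crossbred = partial_relationship_times_vector(parents_a, d_diag, [0.0, 0.0, 0.0, 1.0])
column_purebred = partial_relationship_times_vector(parents_a, d_diag, [0.0, 0.0, 1.0, 0.0])
```

Repeating the product for the unit vector of each genotyped animal and keeping only the entries of genotyped animals gives the required submatrix, at a cost that is linear in the pedigree size per column.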

Discussion
This paper demonstrates how to incorporate marker genotypes into the Wei and van der Werf model for genetic evaluation using both purebred and crossbred information. The approach builds on using partial relationship matrices, and assumes that the marker genotypes of crossbreds can be phased such that the breed of origin of alleles is known. Many different algorithms for phasing have been developed [31,32], and it has been shown that the accuracy of phasing depends, among other factors, on the size of the sample and the relatedness of animals within the sample.
An alternative to using combined partial relationship matrices would be to specify one combined relationship matrix across all animals in the three breed groups A, B and AB. As mentioned in the Methods section, this is actually a special case of the model where $\Sigma^{(A)}_{22} = \Sigma^{(B)}_{22}$, i.e. where the genetic variances for crossbred performance are equal in the two breeds. With this approach, only one marker-based relationship matrix would have to be created and there would be no need to know the breed of origin of alleles. However, the adjustment of the marker-based relationship matrix to be compatible with the pedigree-based relationship matrix becomes more complicated when both breeds are considered at the same time and, as mentioned, this model is less sophisticated than the model developed in this paper.
More complicated crossbreeding systems with three breeds (mating crossbred AB animals with purebred C animals) or four breeds (mating breeds A and B, mating breeds C and D, and finally mating the two groups of crossbred animals AB with CD) are typically used in pig and chicken production. The three-breed crossbreeding system was studied using pig data by Ibanez et al. [33], using the Garcia-Cortes and Toro [20] decomposition of the relationship matrix, and assuming breeding values for purebred and crossbred performances were identical. However, an extension of the Wei and van der Werf model to the three-breed crossbreeding system can be formulated as follows (only the vector containing genetic effects for terminal crossbreds is shown), where the genetic terms c (AB)C can be incorporated into the residual error, and the three breed crossbreeding model can be formulated using three breed-specific partial relationship matrices. Extending the three-breed crossbreeding model to include observed marker genotypes is currently being investigated.
The model presented in this paper is an additive genetic model (in the sense that it considers and estimates substitution effects), but in practice it may capture both additive gene actions and partly dominant gene actions. Using purebred pig data, Su et al. [34] showed that when an additive genomic model was extended to explicitly incorporate dominance genomic effects, improved accuracies of predictions of both total genetic values and breeding values were obtained. Using simulated data from crossbred animals, Zeng et al. [9] showed that an increased response to selection was obtained with a genomic model with dominance genetic effects compared to an additive genomic model. Lo et al. [35] extended the Wei and van der Werf model to include dominance genetic effects and this model has been used in several studies on real data [36,37]. The formulation of that extension is based on extending the reduced model of Wei and van der Werf (see the Methods section) by incorporating a dominance genetic effect for the purebred phenotypes and a full-sib family effect for the crossbred phenotypes. Similar to the reduced model, this model formulation does not directly contain individual genetic effects for crossbred animals and is, therefore, not well-suited for incorporating genomic information on crossbred animals. A marker-based dominance relationship matrix was proposed by Su et al. [34], but this would need to be extended to a combined dominance relationship matrix, and further extended to a crossbreeding system. Extending the model in this paper to contain dominance genetic effects would be an interesting topic for future research.

Conclusions
A method for genomic evaluation of both purebred and crossbred performances was developed for a two-breed crossbreeding system. The method allows information from crossbred animals to be incorporated in a coherent manner for such crossbreeding systems.

Appendix A
In this appendix, we present the explanation behind the adjustment of the marker-based partial relationship matrix. Marker-based relationships, with allele frequencies equal to the observed ones, reflect relationships relative to the genotyped animals, whereas pedigree-based relationships are relative to the base population of the pedigree. The idea behind the adjustment of the marker-based partial relationship matrix is to translate relationships to become relative to the base population of the pedigree, instead of being relative to the given set of animals, as suggested by Powell et al. [38], and which is also the idea behind the adjustment in Christensen et al. [17].
For a given set of animals (purebred and crossbred) and a given breed, let us assume that breed-specific gametes are randomly assigned to animals (purebred animals receive two gametes each, and crossbred animals receive one gamete each), and let $\alpha$ be equal to twice the gametic relationship coefficient. The partial relationship matrix for these animals, $A_p$, has entries

$$
(A_p)_{ij} = \begin{cases}
1 + \alpha/2 & i = j,\ \text{purebred},\\
\alpha & i \neq j,\ \text{both purebred},\\
1/2 & i = j,\ \text{crossbred},\\
\alpha/4 & i \neq j,\ \text{both crossbred},\\
\alpha/2 & \text{one purebred and one crossbred},
\end{cases}
$$

and can therefore be written as

$$
A_p = \begin{pmatrix} I_{n_1} & 0 \\ 0 & \tfrac{1}{2} I_{n_2} \end{pmatrix}\bigl(1 - \alpha/2\bigr)
+ \begin{pmatrix}
1_{n_1}1_{n_1}^T & \tfrac{1}{2}\,1_{n_1}1_{n_2}^T \\
\tfrac{1}{2}\,1_{n_2}1_{n_1}^T & \tfrac{1}{4}\,1_{n_2}1_{n_2}^T
\end{pmatrix}\alpha,
\tag{7}
$$

with submatrices corresponding to purebred and crossbred animals, 1 being a vector of ones and $I$ the identity matrix (with sub-indices denoting the dimension: $n_1$ equal to the number of purebred animals and $n_2$ to the number of crossbred animals). The matrix

$$
\begin{pmatrix} I_{n_1} & 0 \\ 0 & \tfrac{1}{2} I_{n_2} \end{pmatrix}
$$

would be a partial relationship matrix when gametes are unrelated ($\alpha = 0$), and is therefore the partial relationship matrix relative to the given set of animals. Hence, formula (7) shows how relationships relative to the given set of animals are related to relationships relative to the base population of the pedigree. Therefore, it provides a formula to translate a marker-based relationship matrix (with allele frequencies being the observed ones) to have the same base population as the pedigree-based relationship matrix. As in Christensen et al. [17], we substitute $\beta$ for $1 - \alpha/2$ to incorporate the scaling parameter $s_A$ of the marker-based partial relationship matrix.
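The structure of the partial relationship matrix under randomly assigned gametes can be checked numerically. The Python sketch below builds the matrix as the sum of a scaled block-diagonal matrix and a scaled block matrix of ones, and the resulting entries can be compared with the five cases listed above. The function name and the value of $\alpha$ are hypothetical.

```python
def gametic_partial_relationships(n1, n2, alpha):
    """Partial relationship matrix for n1 purebred and n2 crossbred animals.

    Built as (1 - alpha/2) * blockdiag(I, I/2) + alpha * K, where K has blocks
    of ones scaled by 1, 1/2 and 1/4 (purebred animals are listed first).
    """
    weight = [1.0] * n1 + [0.5] * n2
    n = n1 + n2
    return [
        [
            (1 - alpha / 2) * (weight[i] if i == j else 0.0) + alpha * weight[i] * weight[j]
            for j in range(n)
        ]
        for i in range(n)
    ]


# Two purebred and two crossbred animals, with alpha = 0.25.
a_p = gametic_partial_relationships(n1=2, n2=2, alpha=0.25)
```

With $\alpha = 0.25$, the purebred diagonal is 1.125, the crossbred diagonal stays at 0.5, and the off-diagonal elements are 0.25, 0.125 and 0.0625 for purebred-purebred, purebred-crossbred and crossbred-crossbred pairs, respectively.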