### Estimation of genetic correlation between breeds using genomic information

Varona et al. [

12] suggested that the SNP effects could be modeled assuming that there is a genetic correlation of SNP effects across breeds. These authors modeled breeding values (

**u**) as a sum of marker effects (

**g**) so that

**u=Zg** and ordered by breed (breeds 1 and 2 for illustration):

Marker effects were assumed to have a multivariate distribution:

Where **I** is an identity matrix of order equal to the number of SNP markers and **B** is a 2 x 2 breed covariance matrix for SNP effects.

VanRaden [

13] (and also [

14]) showed how models that assume normality of marker effects (the so-called “BLUP-SNP”, [

15]) can be transformed into equivalent BLUP animal models (usually known as

**G**BLUP) that use a “genomic” relationship matrix, usually termed

**G**, rather than a pedigree-based relationship matrix. Matrix

**G** is an estimator of the “true” proportions of genes that are identical by descent between individuals [

16,

17]. Based on this equivalence, the model by Varona et al. [

12] can be transformed into the following model:

Where **u**
_{breedi} is a vector of genomic breeding values (GBV) for breed i, **G** is a matrix of genomic relationships (animals in all breeds), and **G**
_{0} is a matrix of variances and covariances associated to GBV in each breed for a given trait. This model is, thus, a multiple-trait model with two “pseudo-traits”, reflecting the breeding value for the trait in breeds 1 and 2. This model resembles the MACE model [18] in which the breeding values of each bull in different countries are seen as different, correlated traits. In this model, the genetic distance (for each trait) between breeds is quantified by the genetic correlations between **u**
_{breed1} and **u**
_{breed2} (similar to the genetic correlations across countries in MACE). Note that if *σ*
_{
u1}
^{2} = *σ*
_{
u2}
^{2} = *σ*
_{
u1,2}, the model reduces to the regular GBLUP model as used, for instance, by Hayes et al. [6] or [9]. In addition, if *σ*
_{
u1,2} = 0, the model reduces to two independent GBLUP models, one for each breed. In addition to the theoretical appeal, one advantage of a multi-trait GBLUP model is the possibility of using standard estimators and existing software to predict breeding values and estimate variance components.

### Models

In all analyses, a given trait (i.e., milk production) was considered a different trait for each breed. To avoid confusion, these will be referred to as traits (milk production, fat content, fertility) and as scales (breeding values on the M, N or H scale).

In the first analysis, we estimated genetic variances and correlations between breeds and the heritability of each trait and each breed using a combined data set including the three breeds. For computational reasons (see later), instead of the multiple-trait model (MTM), an almost equivalent Random Regression Model (RRM) was used, similar to that used by [

11]. The general equation for this model was:

Where,

**y** is a vector of 2*DYD;

**X** is a matrix that allocates each DYD to a breed and

**b** a vector of average breed effects;

**W**
_{i} are design matrices allocating DYD to GBV (

**u**
_{M},

**u**
_{N} and

**u**
_{H}) for the M, N and H scales consecutively. For example, the equation corresponding to bull

*t* of breed H, will have a value of 1 in the (

*t*,

*t*) position of

**W**
_{H} and 0 in

**W**
_{M} and

**W**
_{N}, since no bull has daughters in several breeds. Vector

**ε** is a vector of uncorrelated random normal pseudo-errors (“pseudo”, because they include Mendelian sampling effects of the daughters and part of the breeding value of the mates). Homogeneous pseudo-error variances were assumed across breeds. The co(variance) structure for GBV,

**u**
_{i,i=M,N,H}, for one trait was:

Where, **G**
_{
0
} is the matrix of (co) variances of GBV in each of the three scales, M,N,H, for a given trait, named as genetic (co)variances henceforth; **G** is the genomic relationship matrix relating animals of the same and different breeds. The correlation of GBV in different scales for a given trait is denoted by r_{g}, and it will be named as genetic correlation between breeds henceforth.

Matrix

**G** was created as in VanRaden [

13]:

Where **Z** is a centered incidence matrix of genotype covariates (0/1/2); 2 ∑ p_{i} q_{i} is a scaling parameter in which p_{i} and q_{i} are the allelic frequencies for SNP i (i = 1: 43852), which were computed across breeds; **I** is an identity matrix (included in order to make **G** invertible). Matrix **I** could have been replaced by **A**, following VanRaden et al. [20], but this is not expected to affect results considering the low weight assigned to **I**.

To implement this model, the regular relationship matrix was replaced by **G** using facilities in the Blupf90 series of programs [21, 22]. Variance and covariance components in the RRM were estimated using Bayesian procedures via Gibbs sampling by the Gibbs2f90 program [23]. Moreover, estimates of genetic correlations between breeds were computed from the corresponding estimates of the genetic (co)variance components. The interest in using the RRM with Gibbs sampling rather than, e.g., REML or a multiple-trait model, was the fact that, on one hand, the relationship matrix needed to be stored just once (in contrast to regular REML, for instance), and on the other hand, no “data augmentation” of missing traits was needed with the RRM, in contrast to using regular Gibbs sampling with a multiple-trait model. Both of these resulted in large reductions in computing time and memory requirements. For instance, storing **G** (which is a 6330 x 6330 dense matrix) for the MTM would take nine times as much space.

The Gibbs sampler was run for a total of 20 000 iterations. The first 4000 iterations were discarded as burn-in. Convergence was checked visually and by the Geweke diagnostic of the Markov chain [24]. Posterior means of genetic variances for each trait and for each breed and of the correlation between breeds were computed. After the parameters were estimated, the (co)variances in the model were fixed at their estimates and the RRM was used in a GBLUP analysis to estimate the GEBV of all genotyped candidates in the validation dataset.

In a second set of analyses, BLUP with a multi-breed genomic relationship matrix (GBLUP) was applied to estimate the GBV of all genotyped bulls using the following MTM:

Where **y**
_{i} is a vector of 2*DYD for breed i = {M,N,H}. In this model, each record is allocated to its breed-specific effects and breeding values.

The covariance structure of

**u** was as for the RRM and estimated (co)variances obtained with the RRM were used in the MTM to estimate the corresponding GEBV. However, in the MTM, different residual variances for each breed were used:

Because 2*DYD is pre-corrected data, its pseudo-residual variance is not the same as the actual residual variance. Thus, we used *σ*
_{
ɛ,i
}
^{2} = (4*σ*
_{
ɛ,i
}
^{*2} + 2*σ*
_{
u,i
}
^{*2}), where the *σ*
^{*} indicates values from routine genetic evaluations for these breeds (S. Fritz, UNCEIA, Jouy-en-Josas, personal communication).

In both models (RRM or MTM), EDC were used as weighting factors and the GEBV were computed using BLUP90iod2 modified by Aguilar et al. [21].

### Accuracy of GEBV

For each model (MTM and RRM), three scenarios for the genetic correlation between breeds, r_{g}, were assumed to compare the accuracy of GEBV. In scenario 1, r_{g} was set to zero to simulate a situation where breeds were uncorrelated, which is equivalent to performing single-breed evaluations. In scenario 2, the estimated value for r_{g} was used and in scenario 3, r_{g} was set to 0.95, which is equivalent to the assumption that the population is close to homogenous (r_{g} = 1) [6, 9].

Accuracy and bias of the GEBV were assessed in the validation datasets, separately for each breed, by the coefficient of determination (R^{2}) and the estimated linear regression coefficients, *δ*
_{0} (intercept) and *δ*
_{1} (linear term) of the linear regression of 2*DYD on GEBV, weighted by the corresponding equivalent number of daughters (EDC), respectively.