### Deregression of national evaluations

Traditional national EBVs ($\hat{\mathbf{a}}$) are often computed by animal model methods [9] and, for a single trait (e.g. milk yield), can be represented approximately using a vector of daughter deviations (**y**), a diagonal matrix containing daughter equivalents (**D**), an additive relationship matrix (**A**), and a variance ratio (*k*) as:

$$(\mathbf{D} + \mathbf{A}^{-1}k)\,\hat{\mathbf{a}} = \mathbf{D}\mathbf{y}$$

Genomic EBVs ($\hat{\mathbf{g}}$) within each country can be represented approximately by replacing the pedigree relationships in **A** with the genomic relationship matrix (**G**), giving

$$(\mathbf{D} + \mathbf{G}^{-1}k)\,\hat{\mathbf{g}} = \mathbf{D}\mathbf{y}$$

Matrix **G** can be computed from genotypes as a quadratic form and can also include polygenic variation from **A** that is not linked to the markers [10]. Ratio *k* is a function of heritability ($h^2$) and was defined as $k = (4-2h^2)/h^2$ in the derivation of [9] or as $k = (4-h^2)/h^2$ by Fikse and Banos [11], with mate breeding values assumed known or unknown, respectively. Elements of **D**, known as daughter equivalents or effective daughter contributions, must match the definition of *k*.
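The two definitions of *k* can be compared numerically. The minimal Python sketch below evaluates both for an arbitrary example heritability ($h^2 = 0.30$) and converts a reliability to daughter equivalents; the relation $REL = DE/(DE+k)$ used for that conversion is an assumption of the sketch.

```python
def variance_ratio(h2: float, mates_known: bool) -> float:
    """k = (4 - 2*h2)/h2 if mate breeding values are known, else (4 - h2)/h2."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("heritability must lie in (0, 1]")
    return (4.0 - 2.0 * h2) / h2 if mates_known else (4.0 - h2) / h2


def rel_to_de(rel: float, k: float) -> float:
    """Daughter equivalents implied by a reliability (assumes REL = DE / (DE + k))."""
    return k * rel / (1.0 - rel)


def de_to_rel(de: float, k: float) -> float:
    """Reliability implied by daughter equivalents (same assumed relation)."""
    return de / (de + k)


h2 = 0.30  # arbitrary example value
k_known = variance_ratio(h2, mates_known=True)
k_unknown = variance_ratio(h2, mates_known=False)
print(k_known, k_unknown)
# The same REL corresponds to different DE under the two definitions of k,
# which is why the elements of D must match the definition of k.
print(rel_to_de(0.80, k_known), rel_to_de(0.80, k_unknown))
```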

For traditional MACE, elements of $\hat{\mathbf{a}}$ and pedigree files are provided to Interbull, and elements of **y** are backsolved from these. In the simplest case, **y** could be obtained by pre-multiplying $\hat{\mathbf{a}}$ by $\mathbf{D}^{-1}(\mathbf{D}+\mathbf{A}^{-1}k)$. However, vector $\hat{\mathbf{a}}$ should contain solutions for all ancestors, including unknown parent groups, but some of these are not included in the exchange formats, and the MACE model also includes an additional fixed effect of the country mean; all of these must be solved using iterative or other methods. Elements of **y** equal 0 for the ancestors and group effects because these are not observed directly, and matrix $\mathbf{A}^{-1}$ contains coefficients that link animals with observations to ancestors and unknown parent groups.
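In the simplest case described above (every animal has daughters, no unknown parent groups or country effect), backsolving is a single matrix product. The minimal Python sketch below uses a hypothetical two-animal pedigree (a sire and his son, dam unknown), hypothetical daughter equivalents and daughter deviations, and a small dense solver written only for this example. It first computes $\hat{\mathbf{a}}$ from **y** and then recovers **y** from $\hat{\mathbf{a}}$.

```python
def mat_vec(m, v):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def solve(m, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(m, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def coefficient_matrix(d, a_inv, k):
    """D + A^{-1} k, with D supplied as its list of diagonal elements."""
    n = len(d)
    return [[a_inv[i][j] * k + (d[i] if i == j else 0.0) for j in range(n)]
            for i in range(n)]


# Hypothetical pedigree: a sire and his son (dam unknown); A^{-1} by Henderson's rules.
a_inv = [[4 / 3, -2 / 3], [-2 / 3, 4 / 3]]
k = 12.0            # hypothetical variance ratio
d = [80.0, 15.0]    # hypothetical daughter equivalents
y = [10.0, 25.0]    # hypothetical daughter deviations

# National evaluation: solve (D + A^{-1} k) a_hat = D y.
lhs = coefficient_matrix(d, a_inv, k)
a_hat = solve(lhs, [di * yi for di, yi in zip(d, y)])

# Backsolve: y = D^{-1} (D + A^{-1} k) a_hat recovers the daughter deviations.
y_back = [v / di for v, di in zip(mat_vec(lhs, a_hat), d)]
print(a_hat)
print(y_back)
```

The one-step product works here only because every diagonal of **D** is non-zero; once ancestors or unknown parent groups without exchanged solutions enter the system, it has to be solved iteratively instead.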

For genomic MACE (GMACE), diagonal matrix $\mathbf{D}_g$ can contain the extra daughter equivalents from genomic data. Diagonals of $\mathbf{D}_g$ can be calculated in at least three ways ($\mathbf{D}_{g1}$, $\mathbf{D}_{g2}$, and $\mathbf{D}_{g3}$). The first method calculates diagonals of $\mathbf{D}_{g1}$ from the difference between genomic reliability ($REL_g$) and traditional reliability ($REL$) for each bull simply as

$$D_{g1} = k\left(\frac{REL_g}{1-REL_g} - \frac{REL}{1-REL}\right)$$
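The first method can be sketched in a few lines of Python. The sketch assumes that daughter equivalents and reliability are related by $DE = k\,REL/(1-REL)$, so the extra genomic daughter equivalents are the gain in DE when moving from $REL$ to $REL_g$; the bull and the value of *k* are hypothetical.

```python
def de_genomic_method1(rel_g: float, rel: float, k: float) -> float:
    """Extra genomic DE: the DE implied by REL_g minus the DE implied by REL."""
    if not 0.0 <= rel <= rel_g < 1.0:
        raise ValueError("expected 0 <= REL <= REL_g < 1")
    return k * (rel_g / (1.0 - rel_g) - rel / (1.0 - rel))


k = 12.0  # hypothetical variance ratio
# Hypothetical bull: traditional REL 0.50, genomic REL 0.75.
print(de_genomic_method1(0.75, 0.50, k))  # 24.0
```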

The second method obtains elements of $\mathbf{D}_{g2}$ by reversing standard reliability formulas like those of Misztal and Wiggans [12] such that the diagonals of $(\mathbf{D}+\mathbf{D}_{g2}+\mathbf{A}^{-1}k)^{-1}$ equal or approximate the diagonals of $(\mathbf{D}+\mathbf{G}^{-1}k)^{-1}$.

The third method is the simplest and sets all diagonals of $\mathbf{D}_{g3}$ equal to the same constant. When **G** becomes too large for inversion, this simple strategy will still be affordable. Traditional *REL* values, expressed as decimals rather than percentages, are summed over all genotyped animals, and the reliabilities of the corresponding parent averages ($REL_{pa}$) are subtracted. This result is multiplied by variance ratio *k* and divided by factor *n* to determine average daughter equivalents from genomic data. A value of *n* equal to 1500 for Holsteins, 1200 for Brown Swiss, and 700 for Jerseys is used to match estimated reliabilities to those observed from truncation studies in US breed evaluations [13]. An interpretation of *n* is the number of high-reliability bulls needed to obtain 50% $REL_g$, and a larger *n* is needed for breeds with greater effective population size [14].
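The interpretation of *n* follows directly from the arithmetic. The Python sketch below computes the common genomic DE as $k\sum(REL - REL_{pa})/n$ for an idealized reference population of exactly *n* bulls that each contribute $REL - REL_{pa} = 1$; the value of *k* is hypothetical, and converting DE back to reliability with $REL = DE/(DE+k)$ is an assumption of the sketch.

```python
def de_genomic_method3(rels, rels_pa, k, n):
    """Common extra genomic DE for every genotyped animal: k * sum(REL - REL_pa) / n."""
    gain = sum(r - p for r, p in zip(rels, rels_pa))
    return k * gain / n


def de_to_rel(de, k):
    """Reliability implied by daughter equivalents (assumed REL = DE / (DE + k))."""
    return de / (de + k)


k = 12.0   # hypothetical variance ratio
n = 1500   # value of n quoted above for Holsteins
# Idealized reference population: exactly n bulls with REL = 1 and REL_pa = 0.
de_g3 = de_genomic_method3([1.0] * n, [0.0] * n, k, n)
print(de_g3, de_to_rel(de_g3, k))  # 12.0 0.5
```

Under the same assumptions, doubling the reference population to 2*n* such bulls gives $DE = 2k$ and a genomic reliability of 2/3.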

Equality of approximate and published genomic reliabilities is an advantage of the second method. If the first or third method is used in GMACE, $REL_g$ will be biased upwards for genotyped animals with many relatives because genomic information in $\mathbf{D}_g$ is counted twice, once directly and once via relatives.

Matrix **G** is not expected to be available to Interbull for the Holstein breed, whereas vector $\hat{\mathbf{g}}$ is available. In North American evaluations, **G** is already a 30,000 × 30,000 dense matrix and is rapidly growing larger. Let $\mathbf{y}_g$ contain deregressed evaluations derived from the national $\hat{\mathbf{g}}$, which includes both the traditional and the genomic information. Vector $\mathbf{y}_g$ is obtained from $\hat{\mathbf{g}}$ using equations

$$(\mathbf{D} + \mathbf{D}_g + \mathbf{A}^{-1}k)\,\hat{\mathbf{g}} = (\mathbf{D} + \mathbf{D}_g)\,\mathbf{y}_g$$

The equations are solved iteratively because elements of $\mathbf{y}_g$ equal 0 for unknown parent groups, whereas the corresponding elements of $\hat{\mathbf{g}}$ must be estimated. As was the case for national models, **D** and $\mathbf{D}_g$ must now match the international definition [11] used for variance ratio *k*, which may or may not be the same definition that was used nationally [9]. Matrix $\mathbf{A}^{-1}$ distributes the genomic information in $\mathbf{y}_g$ to close relatives in the same way that phenotypic information is distributed.
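The iterative deregression can be sketched with a Gauss-Seidel loop over the rows whose solution is not exchanged (their $\mathbf{y}_g$ and weights are 0), followed by a backsolve of $\mathbf{y}_g$ for the observed rows. Gauss-Seidel is chosen here only for illustration; the text does not say which iterative method is used. The pedigree, the diagonals of $\mathbf{D}+\mathbf{D}_g$, the GEBV, and the function name `deregress` are all hypothetical.

```python
def deregress(g_hat, w, a_inv, k, observed, max_iter=500, tol=1e-12):
    """Return (g, y_g) satisfying (W + A^{-1} k) g = W y_g with W = diag(w) = D + D_g.

    Observed animals: g is known (national GEBV), y_g is unknown.
    Unobserved animals (ancestors, unknown parent groups): y_g = 0 and w = 0,
    g is unknown and is found by Gauss-Seidel iteration.
    """
    n = len(g_hat)
    g = list(g_hat)
    c = [[a_inv[i][j] * k for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        change = 0.0
        for j in range(n):
            if observed[j]:
                continue
            # Row j reads sum_l c[j][l] * g[l] = 0 because w[j] = 0 and y_g[j] = 0.
            new = -sum(c[j][l] * g[l] for l in range(n) if l != j) / c[j][j]
            change = max(change, abs(new - g[j]))
            g[j] = new
        if change < tol:
            break
    # Backsolve each observed row: w_i * y_i = w_i * g_i + sum_l c[i][l] * g[l].
    y_g = [0.0] * n
    for i in range(n):
        if observed[i]:
            y_g[i] = g[i] + sum(c[i][l] * g[l] for l in range(n)) / w[i]
    return g, y_g


# Hypothetical pedigree: animal 0 is a sire whose GEBV is not exchanged;
# animals 1 and 2 are his sons (dams unknown). A^{-1} follows Henderson's rules.
a_inv = [[5 / 3, -2 / 3, -2 / 3],
         [-2 / 3, 4 / 3, 0.0],
         [-2 / 3, 0.0, 4 / 3]]
k = 12.0
w = [0.0, 30.0, 45.0]        # hypothetical diagonals of D + D_g
g_hat = [0.0, 10.0, 20.0]    # national GEBV; the sire entry is only a starting value
observed = [False, True, True]

g, y_g = deregress(g_hat, w, a_inv, k, observed)
print(g, y_g)
```

In this example the sire's solution converges to 0.4 × (10 + 20) = 12, and both deregressed values (about 12.13 and 24.98) lie further from the sons' parent average (6, half the sire's solution) than their GEBV do.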

Genomic estimated breeding values (GEBV) can be decomposed into the parent average (*PA*), the deviation of traditional EBV from PA (estimated Mendelian sampling), and the deviation of GEBV from EBV (additional genomic information):

$$GEBV = PA + (EBV - PA) + (GEBV - EBV)$$

The total daughter equivalents ($DE_{total}$) can be similarly partitioned into:

$$DE_{total} = DE_{pa} + DE_{dau} + DE_{gen}$$

Furthermore, the extra daughter equivalents from genomics ($DE_{gen}$) can contain daughter equivalents from foreign daughters used to estimate SNP effects that are not included in the domestic daughter count ($DE_{dau}$). The traditional reliability from domestic daughters ($REL_{dau}$) is

$$REL_{dau} = \frac{DE_{dau}}{DE_{dau} + k}$$
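The partitions of merit and daughter equivalents can be illustrated for one hypothetical bull. The Python sketch below assumes $DE = k\,REL/(1-REL)$ and takes $DE_{dau}$ as the DE implied by *REL* minus that implied by $REL_{pa}$, and $DE_{gen}$ as the DE implied by $REL_g$ minus that implied by *REL*; these conversions, the function names, and all numbers are assumptions of the sketch.

```python
def rel_to_de(rel, k):
    """Assumed relation DE = k * REL / (1 - REL)."""
    return k * rel / (1.0 - rel)


def partition(pa, ebv, gebv, rel_pa, rel, rel_g, k):
    """Split GEBV and total DE into parent-average, daughter and genomic parts."""
    merit = {"PA": pa, "EBV - PA": ebv - pa, "GEBV - EBV": gebv - ebv}
    de = {
        "DE_pa": rel_to_de(rel_pa, k),
        "DE_dau": rel_to_de(rel, k) - rel_to_de(rel_pa, k),
        "DE_gen": rel_to_de(rel_g, k) - rel_to_de(rel, k),
    }
    return merit, de


k = 12.0  # hypothetical variance ratio
merit, de = partition(pa=100.0, ebv=130.0, gebv=150.0,
                      rel_pa=0.25, rel=0.50, rel_g=0.75, k=k)
rel_dau = de["DE_dau"] / (de["DE_dau"] + k)  # reliability from domestic daughters only
print(merit)
print(de, rel_dau)
```

Here $REL_{dau}$ (0.40) is lower than the traditional *REL* (0.50) because parent-average information is excluded from it.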

Deregression uses matrix algebra but can be represented approximately for bull *j* as division by $REL_{dau}$ to obtain the original daughter average before regression. The approximate formula $EBV = REL_{dau}\,y_j + (1-REL_{dau})\,PA$ can be rearranged to solve for $y_j$ as:

$$y_j = PA + \frac{EBV - PA}{REL_{dau}}$$
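The approximate single-bull deregression is a one-line rearrangement. The Python sketch below applies it to a hypothetical bull and then re-regresses the result to confirm that the original EBV is recovered; the DE, *k*, EBV, and PA values are invented for illustration.

```python
def deregress_approx(ebv: float, pa: float, rel_dau: float) -> float:
    """Solve EBV = REL_dau * y + (1 - REL_dau) * PA for y (the daughter average)."""
    if rel_dau <= 0.0:
        raise ValueError("REL_dau must be positive (the formula divides by it)")
    return pa + (ebv - pa) / rel_dau


k = 12.0                          # hypothetical variance ratio
de_dau = 36.0                     # hypothetical domestic daughter equivalents
rel_dau = de_dau / (de_dau + k)   # 0.75
y_j = deregress_approx(ebv=130.0, pa=100.0, rel_dau=rel_dau)
print(y_j)  # 140.0
# Regressing y_j back toward PA reproduces the EBV.
print(rel_dau * y_j + (1.0 - rel_dau) * 100.0)
```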

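Variance of vector **y** is partitioned into a part due to the additive relationship matrix **A** and a part due to the diagonal matrix $\mathbf{D}^{-1}$ containing variance of residuals:

$$\mathrm{Var}(\mathbf{y}) = \mathbf{A}\sigma^2_a + \mathbf{D}^{-1}\sigma^2_e, \qquad \sigma^2_e = k\,\sigma^2_a$$

Diagonals of $\mathbf{D}^{-1}$ for each bull are $1/DE_{dau}$ or equivalently $(1-REL_{dau})/(k\,REL_{dau})$.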
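A quick numerical check shows that the two forms of the $\mathbf{D}^{-1}$ diagonal agree, and that for a non-inbred bull ($A_{jj} = 1$) the variance of a deregressed value reduces to $\sigma^2_a/REL_{dau}$, so deregressed values of low-reliability bulls are noisier. The sketch assumes $\sigma^2_e = k\,\sigma^2_a$ and $REL_{dau} = DE_{dau}/(DE_{dau}+k)$; the helper names and numbers are hypothetical.

```python
def d_inv_diagonal(de_dau, k):
    """Both forms of a D^{-1} diagonal: 1/DE_dau and (1 - REL_dau)/(k * REL_dau)."""
    rel_dau = de_dau / (de_dau + k)
    return 1.0 / de_dau, (1.0 - rel_dau) / (k * rel_dau)


def var_deregressed(de_dau, k, sigma2_a=1.0, a_jj=1.0):
    """Var(y_j) = A_jj * sigma2_a + sigma2_e / DE_dau, with sigma2_e = k * sigma2_a."""
    return a_jj * sigma2_a + k * sigma2_a / de_dau


k = 12.0  # hypothetical variance ratio
for de_dau in (4.0, 12.0, 36.0, 100.0):
    form_a, form_b = d_inv_diagonal(de_dau, k)
    rel_dau = de_dau / (de_dau + k)
    print(de_dau, form_a, form_b, var_deregressed(de_dau, k), 1.0 / rel_dau)
```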

### Exchange of genomic estimated breeding values

Traditional MACE combines information from domestic and foreign relatives to increase reliability. Information from daughters contributes directly to **D** and **y**, whereas information from ancestors and sons contributes indirectly through $\mathbf{A}^{-1}$. MACE equations are very similar to those used for deregression, with the following exceptions: diagonals of **D** and elements of **y** from all countries are stored together in the same vector; genetic correlations across countries are accounted for using the Kronecker product of $\mathbf{A}^{-1}$ with the inverse of the genetic covariance matrix ($\mathbf{T}^{-1}$); use of $\mathbf{T}^{-1}$ instead of *k* requires dividing the diagonals of **D** by the residual variance ($\sigma^2_e$) of the corresponding country; and vector $\hat{\mathbf{a}}$ includes an *EBV* for each bull on each country scale, obtained using equations:

$$(\mathbf{D} + \mathbf{A}^{-1} \otimes \mathbf{T}^{-1})\,\hat{\mathbf{a}} = \mathbf{D}\mathbf{y}$$
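The structure of the MACE system can be made concrete with a tiny dense example: two related bulls evaluated in two countries. In the Python sketch below, ordering the unknowns bull-major (bull, then country), the unit genetic variances in **T**, and all DE, residual variances, and deregressed values are assumptions; real MACE systems are far larger and would not be solved by dense elimination.

```python
def kron(a, b):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [[a[i][j] * b[p][q] for j in range(len(a[0])) for q in range(len(b[0]))]
            for i in range(len(a)) for p in range(len(b))]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def solve(m, rhs):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(rhs)
    aug = [list(row) + [v] for row, v in zip(m, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


a_inv = [[4 / 3, -2 / 3], [-2 / 3, 4 / 3]]  # sire and son, dam unknown
t = [[1.0, 0.9], [0.9, 1.0]]                # genetic covariance matrix T (unit variances)
sigma2_e = [12.0, 14.0]                     # residual variance in each country
de = [[50.0, 0.0],                          # sire: daughters only in country 1
      [0.0, 20.0]]                          # son: daughters only in country 2
y = [[8.0, 0.0], [0.0, 6.0]]                # deregressed values (0 where no daughters)

n_c = len(t)
# Diagonals of D divided by the residual variance of the corresponding country.
d = [de[i][p] / sigma2_e[p] for i in range(len(de)) for p in range(n_c)]
lhs = kron(a_inv, inv2(t))
for idx, v in enumerate(d):
    lhs[idx][idx] += v
rhs = [d[idx] * y[idx // n_c][idx % n_c] for idx in range(len(d))]
a_hat = solve(lhs, rhs)  # EBV of each bull on each country scale
print(a_hat)
```

Even though the sire has no daughters in country 2, he receives an EBV on that scale through the genetic correlation in **T**.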

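Genomic MACE includes genomic information by applying deregression to national GEBV instead of EBV to obtain elements of $\mathbf{D}+\mathbf{D}_g$ and $\mathbf{y}_g$. Vectors and matrices are extended to include data from multiple countries, and vector $\hat{\mathbf{g}}$ includes international GEBVs on each country scale obtained using equations

$$(\mathbf{D} + \mathbf{D}_g + \mathbf{A}^{-1} \otimes \mathbf{T}^{-1})\,\hat{\mathbf{g}} = (\mathbf{D} + \mathbf{D}_g)\,\mathbf{y}_g$$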

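If any countries have used foreign data to estimate marker effects, then errors in $\mathbf{y}_g$ are no longer independent and should be modelled using a more general residual covariance matrix **R**, with $\mathbf{R}^{-1}$ in place of $\mathbf{D}+\mathbf{D}_g$:

$$(\mathbf{R}^{-1} + \mathbf{A}^{-1} \otimes \mathbf{T}^{-1})\,\hat{\mathbf{g}} = \mathbf{R}^{-1}\mathbf{y}_g$$

Approximate formulas to compute **R** are proposed in the next section.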

### Correlations among national evaluations

Exchange of genomic data between countries introduces additional correlations among their national evaluations that need to be modelled in GMACE. Residual effects can be correlated with residuals in other countries for two reasons: 1) multiple evaluation centers may include genomic and phenotypic data from foreign animals in national estimates of marker effects, and 2) genomic predictions act as repeated measures of the same portion of genetic merit rather than independent measures of genetic merit, especially for major gene marker(s). As an example of 1), marker effects in Canada and the United States may be highly correlated because the countries share genomic data and include MACE evaluations as input to the genomic equations in each country. As an example of 2), multiple countries could each test a bull for DGAT1, a gene with major effects on milk yield and components [15], and these repeated tests in different countries would not provide independent information about the bull's total breeding value.

Residuals are independent in traditional MACE because each daughter is measured in only one country, but they may be correlated in GMACE for the reasons described above. In genomic MACE, diagonals of **R** should be proportional to $1/(DE_{dau}+DE_{gen})$, and off-diagonals can be non-zero due to residual correlations that depend on the ratio $DE_{gen}/(DE_{dau}+DE_{gen})$ in each country. Correlations are non-zero when more than one country submits GEBV for the same genotyped bull. Let $d_1$ and $d_2$ be the ratios $DE_{gen}/(DE_{dau}+DE_{gen})$ in country 1 and country 2, respectively, and let $c_{12}$ be the fraction of genotyped bulls in common. For countries that share all genotypes, $c_{12}$ may be 1, whereas $c_{12}$ may be close to 0 for country pairs that only include genotypes of domestic bulls. The correlation of residuals $e_1$ and $e_2$ may be approximated using the additive genetic correlation, the fraction of common bulls, and the proportions of genomic information as:

$$\mathrm{corr}(e_1, e_2) \approx \mathrm{corr}(a_1, a_2)\, c_{12}\, \sqrt{d_1 d_2}$$
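The residual-correlation approximation can be evaluated for a few hypothetical situations. The Python sketch below assumes the product form $\mathrm{corr}(a_1,a_2)\,c_{12}\sqrt{d_1 d_2}$ with $d = DE_{gen}/(DE_{dau}+DE_{gen})$; the genetic correlation of 0.90, the shares of common bulls, and the DE values are invented for illustration.

```python
import math


def genomic_share(de_dau: float, de_gen: float) -> float:
    """d = DE_gen / (DE_dau + DE_gen): share of a bull's information that is genomic."""
    return de_gen / (de_dau + de_gen)


def residual_corr(genetic_corr: float, common: float, d1: float, d2: float) -> float:
    """Approximate corr(e1, e2) = corr(a1, a2) * c12 * sqrt(d1 * d2)."""
    return genetic_corr * common * math.sqrt(d1 * d2)


# Hypothetical young bull without daughters: all information is genomic (d = 1).
d_young = genomic_share(de_dau=0.0, de_gen=25.0)
# Hypothetical proven bull: half of the information is genomic (d = 0.5).
d_proven = genomic_share(de_dau=40.0, de_gen=40.0)

print(residual_corr(0.90, 1.0, d_young, d_young))     # shared genotypes, young bull
print(residual_corr(0.90, 1.0, d_proven, d_proven))   # shared genotypes, proven bull
print(residual_corr(0.90, 0.05, d_proven, d_proven))  # mostly domestic genotypes
```

For a young bull whose information is entirely genomic and whose genotypes are fully shared, the approximation reaches its upper limit, the genetic correlation itself.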

The genetic correlation $\mathrm{corr}(a_1, a_2)$ between true breeding values (BVs) in countries 1 and 2 is routinely estimated by Interbull and acts as an upper limit for the residual correlation $\mathrm{corr}(e_1, e_2)$ because marker effects differ in different environments, just as BVs differ. MACE equations may need just a few changes to accommodate GEBV. A bull's diagonal in country *i* ($R_{ii}$) depends, as above, on $DE_{dau}+DE_{gen}$ instead of only $DE_{dau}$:

$$R_{ii} = \frac{\sigma^2_{e_i}}{DE_{dau,i} + DE_{gen,i}}$$

Off-diagonals for the same bull in countries *i* and *j* ($R_{ij}$) are obtained by multiplying $\mathrm{corr}(e_i, e_j)$ by $\sqrt{R_{ii}R_{jj}}$, giving:

$$R_{ij} = \mathrm{corr}(e_i, e_j)\,\sqrt{R_{ii}R_{jj}}$$
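Assembling the residual covariance block for one bull combines the diagonal and off-diagonal rules. The Python sketch below assumes $R_{ii} = \sigma^2_{e_i}/(DE_{dau,i}+DE_{gen,i})$ and a residual correlation of 0.45 between the two countries; the function name `bull_r_block` and all inputs are hypothetical.

```python
import math


def bull_r_block(sigma2_e, de_dau, de_gen, corr_e):
    """Residual covariance block R for one bull evaluated in several countries.

    Diagonal:     R_ii = sigma2_e[i] / (DE_dau[i] + DE_gen[i])
    Off-diagonal: R_ij = corr(e_i, e_j) * sqrt(R_ii * R_jj)
    """
    n = len(sigma2_e)
    diag = [s / (dd + dg) for s, dd, dg in zip(sigma2_e, de_dau, de_gen)]
    return [[diag[i] if i == j else corr_e[i][j] * math.sqrt(diag[i] * diag[j])
             for j in range(n)] for i in range(n)]


# Hypothetical bull with GEBV submitted by two countries.
r = bull_r_block(sigma2_e=[12.0, 14.0], de_dau=[30.0, 10.0], de_gen=[30.0, 20.0],
                 corr_e=[[1.0, 0.45], [0.45, 1.0]])
for row in r:
    print(row)
```

Because off-diagonals are defined only for the same bull, the full **R** under this approximation is block-diagonal by bull, so its inverse can be formed one small block at a time.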

### Simulated genotypes

A world population was simulated and evaluated to test the ability of multi-country methods to combine information from genotypes or GEBV computed separately within each country. Genotypes and phenotypes were simulated using pedigrees and reliabilities for all 8,073 proven Brown Swiss bulls in the April 2009 Interbull file. Genotypes and true BV for another 120 young bulls born and sampled in the United States with no progeny records yet were simulated to test the predictions. Brown Swiss genotypes were simulated because Interbull is conducting research with actual genotypes for this breed.

Genotypes for 50,000 markers and 10,000 QTLs were simulated using the same methods as VanRaden [10]. Markers and QTLs were in equilibrium in the earliest generation and were transmitted to descendants with recombination from crossovers on 30 chromosome pairs. To make QTL effects correlated across countries, independent normal effects within each country were multiplied by the Cholesky decomposition of the genetic correlation matrix among countries. Then, QTL effects were transformed from a standard normal distribution ($z$) to a heavy-tailed distribution ($q$) using $q = z \cdot 1.9^{|z|-2}$, such that the largest $q$ explained 1-4% of genetic variation. Genetic correlations in the simulation were set equal to official estimates from Interbull [16]. Official correlations differ from the estimated correlations because of post-processing to ensure positive definiteness; they averaged about 0.90 but were lower for New Zealand than for the other countries.
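The two transformations applied to QTL effects can be sketched in Python: independent standard normals are multiplied by the lower-triangular Cholesky factor *L* (with $LL^T$ equal to the correlation matrix) and then passed through $q = z \cdot 1.9^{|z|-2}$. The three-country correlation matrix, the seed, and the function names are hypothetical; the sketch does not rescale effects to the 1-4% target, and the heavy-tail transform slightly alters the correlations induced by *L*.

```python
import math
import random


def cholesky(c):
    """Lower-triangular L with L L^T = c (c must be positive definite)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(c[i][i] - s)
            else:
                low[i][j] = (c[i][j] - s) / low[j][j]
    return low


def heavy_tail(z):
    """q = z * 1.9 ** (|z| - 2): shrinks |z| < 2 and inflates |z| > 2."""
    return z * 1.9 ** (abs(z) - 2.0)


def qtl_effects(corr, n_qtl, rng):
    """Per-QTL effects in each country: correlated via Cholesky, then heavy-tailed."""
    low = cholesky(corr)
    n = len(corr)
    effects = []
    for _ in range(n_qtl):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        zc = [sum(low[i][m] * z[m] for m in range(i + 1)) for i in range(n)]
        effects.append([heavy_tail(v) for v in zc])
    return effects


# Hypothetical 3-country correlation matrix (third country less correlated).
corr = [[1.0, 0.90, 0.75],
        [0.90, 1.0, 0.75],
        [0.75, 0.75, 1.0]]
rng = random.Random(2009)
effects = qtl_effects(corr, n_qtl=10_000, rng=rng)
print(len(effects), effects[0])
```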

True BVs were obtained by summing the 10,000 QTL effects, and phenotypes equalled true BVs plus an error with variance determined from each bull's *REL* for protein yield. Only one replicate was simulated to demonstrate the computations. For both proven and young bulls, observed reliabilities were computed as squared correlations of estimated with true BVs on all nine country scales.
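One way to turn a reliability into an error variance is $\sigma^2_a(1-REL)/REL$, which makes the expected squared correlation between phenotype and true BV equal to *REL*; this choice is an assumption of the sketch below, not necessarily the one used in the study. For a simple check, the sketch gives every simulated bull the same hypothetical *REL* and draws true BVs from a normal distribution as a stand-in for summed QTL effects; observed reliability is then the squared correlation, as in the text.

```python
import math
import random


def error_variance(rel: float, sigma2_a: float = 1.0) -> float:
    """Assumed error variance giving E[corr(phenotype, true BV)^2] = rel."""
    return sigma2_a * (1.0 - rel) / rel


def squared_corr(x, y):
    """Squared Pearson correlation, used here as the observed reliability."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


rng = random.Random(42)
rel = 0.60                                              # hypothetical REL for every bull
true_bv = [rng.gauss(0.0, 1.0) for _ in range(20_000)]  # stand-in for summed QTL effects
sd_e = math.sqrt(error_variance(rel))
pheno = [a + rng.gauss(0.0, sd_e) for a in true_bv]
print(round(squared_corr(pheno, true_bv), 3))           # close to rel
```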