A common reference population from four European Holstein populations increases reliability of genomic predictions

Background Size of the reference population and reliability of phenotypes are crucial factors influencing the reliability of genomic predictions. It is therefore useful to combine closely related populations. Increased accuracies of genomic predictions depend on the number of individuals added to the reference population, the reliability of their phenotypes, and the relatedness of the populations that are combined. Methods This paper assesses the increase in reliability achieved when combining four Holstein reference populations of 4000 bulls each, from European breeding organizations, i.e. UNCEIA (France), VikingGenetics (Denmark, Sweden, Finland), DHV-VIT (Germany) and CRV (The Netherlands, Flanders). Each partner validated its own bulls using their national reference data and the combined data, respectively. Results Combining the data significantly increased the reliability of genomic predictions for bulls in all four populations. Reliabilities increased by 10%, compared to reliabilities obtained with national reference populations alone, when they were averaged over countries and the traits evaluated. For different traits and countries, the increase in reliability ranged from 2% to 19%. Conclusions Genomic selection programs benefit greatly from combining data from several closely related populations into a single large reference population.


Background
Genomic predictions rely on linkage disequilibrium between single nucleotide polymorphisms (SNP) and polymorphisms in genes with effects on traits of interest. Linkage disequilibrium induces associations between SNP genotypes and phenotypes. SNP effects can then be estimated and combined to form genomic predictions. The accuracies of estimated SNP effects are expected to increase with the number and accuracy of available phenotypes. Therefore, the reliability of genomic predictions increases with the size of the reference population (RP) from which the relationship between phenotypes and SNP markers is determined [1,2]. Currently, a RP generally consists of genotyped and progeny tested bulls [1,2]. Because of the importance of the size of the RP, US and Canadian RP have been combined and it has been reported that exchanging data from reference populations is beneficial [3,4]. In European countries, the size of national Holstein RP is moderate, compared to that of the combined North American RP. In September 2009, four regional breeding organizations, UNCEIA (France), VikingGenetics (Denmark, Sweden, and Finland), DHV-VIT (Germany) and CRV (The Netherlands, Flanders), created a combined RP by each contributing 4000 bulls. The resulting enlarged joint European RP is expected to increase the reliabilities of genomic predictions considerably.
This study reports on the preliminary steps necessary to combine these four RP into a single one. It also assesses to what extent the combined RP improves genomic predictions by comparing the reliabilities of genomic predictions obtained with the combined and individual RP.

Joint genomic dataset
The joint dataset, hereafter called the EuroGenomics data, comprised 15966 progeny tested bulls. The distributions of the bulls in relation to birth year are plotted in Figure 1. Bulls provided by DHV-VIT and UNCEIA were predominantly born between 1999 and 2004, whereas those provided by VikingGenetics and CRV were predominantly born before 1999. Overall, the 15966 bulls had 19.4 million daughters, with 1389 bulls having more than 1000 daughters and 939 bulls having daughters in multiple countries. The average number of daughters per bull was 117, 85, 117 and 153 for bulls provided by DHV-VIT, UNCEIA, VikingGenetics and CRV, respectively.

Imputation of genotypes across SNP chips
Genotypes provided by CRV were obtained using two versions of a custom 50 K SNP chip. They shared from 10 to 17 K SNP with the commercial Illumina BovineSNP50 chip [5] that was used to genotype the bulls of the three other partners. SNP genotypes unique to each chip were imputed by genotyping 972 influential bulls with both SNP chips, and applying a combination of programs, including DAGPHASE [6] and Beagle [7]. An independent cross-validation within the 972 genotyped bulls indicated that SNP genotypes were imputed with less than 1% error [8].

Reference and validation data
Each partner validated its own bulls using the national RP and the EuroGenomics data. Deregressed proofs (DRP) [9,10], calculated from EBV on the scale of the target population obtained from the Interbull 2010-01 Multiple Across Country Evaluation (MACE) [11], were used to derive and validate genomic predictions (GBV) of domestic bulls for three populations; for French Holsteins, daughter yield deviations (DYD) from the October 2009 national evaluation were used, because QTL mapping had already been performed using these data. The national RP and EuroGenomics data were divided into reference and validation datasets by choosing a cut-off date for the birth date of bulls, so that approximately the 25% youngest national genotyped bulls were in the validation dataset. Records were included in the RP if the DRP/DYD had an effective daughter contribution (EDC) [12] of at least 20. A previous study [13] showed that reliabilities of genomic predictions for bulls whose sires were included in the reference population were much higher than for bulls without sires included. The proportion of bulls with their sires in the reference population differed among the four populations. Thus, to make results comparable, only the bulls whose sires were in the national RP were included in the validation data. In Germany, this criterion led to a significant decrease in the number of validation bulls. Thus, in order to increase the validation dataset for the German predictions, the German validation data included all bulls whose sire was included in the EuroGenomics RP when predictions were based on the EuroGenomics RP.
Figure 1. Distribution of bulls by birth year (x-axis: birth year, 1985 to 2005; y-axis: 0 to 1000).
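The splitting rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually run by any partner: the `Bull` record, its field names and the function `split_reference_validation` are hypothetical, and the cut-off is taken as the birth date of the oldest bull among the youngest 25% of national bulls.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Bull:
    # Hypothetical record layout, introduced only for this sketch.
    bull_id: str
    birth_date: date
    edc: float               # effective daughter contribution of the DRP/DYD
    sire_id: Optional[str]   # None if the sire is unknown
    national: bool           # True if the bull belongs to the partner's own population


def split_reference_validation(
    bulls: List[Bull],
    validation_fraction: float = 0.25,
    min_edc: float = 20.0,
) -> Tuple[date, List[Bull], List[Bull]]:
    """Split bulls into reference and validation sets by a birth-date cut-off."""
    national = sorted((b for b in bulls if b.national), key=lambda b: b.birth_date)
    if not national:
        raise ValueError("at least one national bull is required")

    # Cut-off date: roughly the youngest 25% of national bulls fall on or after it.
    n_validation = max(1, round(validation_fraction * len(national)))
    cutoff = national[-n_validation].birth_date

    # Reference: any bull (national or foreign) born before the cut-off with EDC >= 20.
    reference = [b for b in bulls if b.birth_date < cutoff and b.edc >= min_edc]
    reference_ids = {b.bull_id for b in reference}

    # Validation: young national bulls whose sire is in the reference set.
    validation = [
        b for b in national
        if b.birth_date >= cutoff and b.sire_id in reference_ids
    ]
    return cutoff, reference, validation
```

Passing only national bulls gives the national split; passing the pooled bulls of all partners gives the corresponding combined split, with the cut-off still determined by the national bulls.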
The numbers of animals in the reference and validation datasets are in Table 1 for Denmark, Sweden and Finland (DFS), in Table 2 for Germany (DEU), in Table 3 for The Netherlands (NLD) and in Table 4 for France (FRA). Analyses were carried out for protein yield, udder depth, somatic cell score (SCS), and for female fertility as non-return rate (NRR) or interval from calving to first insemination (ICF).

Genetic correlation between countries
The degree of genetic correlation for a given trait between countries reflects the importance of genotype by environment interactions. Table 5 shows for each population and each trait, the average genetic correlation with the three other populations, as obtained from INTERBULL [14]. These genetic correlations differed among countries and among traits. Among the traits studied here, udder depth had the highest genetic correlation between countries (0.98 on average), followed by protein yield (0.88) and SCS (0.88). Fertility had the lowest genetic correlation (0.70).
The average genetic correlation of one country with the three other countries was highest for DFS and DEU (0.89), followed by FRA (0.85) and NLD (0.83).

Statistical models
The four partners applied different genomic prediction models. The Nordic and German genomic predictions were obtained with a mixed linear model with random regression on coefficients of SNP genotypes, assuming equal variance of SNP effects over markers [15]. The Dutch/Flemish genomic predictions used a Bayesian mixture model for SNP effects, along with polygenic effects [16], assuming that most SNP had small effects and a few SNP had moderate or large effects. The French genomic predictions used a mixed linear model with a polygenic effect and random haplotype effects across the genome [17]. Included haplotypes were identified in an initial QTL detection step using LDLA [18] on the national RP. The QTL detection was also carried out with the EuroGenomics RP, but due to time constraints, the detection procedure used hidden states obtained from the Dualphase [6] software. Hence, two lists of QTL, differing by the RP in which they were detected, were used to estimate haplotype effects for the prediction models using the French or EuroGenomics RP, respectively. In all French analyses, 40% of the genetic variance was assumed to be explained by polygenes and 60% by markers. In all the models described above, the weighting factor, $w = r^2/(1 - r^2)$, was applied to account for heterogeneous residual variances due to different reliabilities ($r^2$) of DRP or DYD.
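To make the weighting concrete, the sketch below computes the weights from DRP reliabilities and solves a weighted random regression on SNP genotypes with a common SNP variance, which is the general form of the linear model described for the Nordic and German predictions. It is an illustration only and not the partners' software: the function names and the parameter `variance_ratio` (residual variance divided by the variance of a single SNP effect) are assumptions introduced here, and the model is reduced to a general mean plus SNP effects.

```python
import numpy as np


def drp_weights(reliabilities):
    """Weights w = r^2 / (1 - r^2) from the reliabilities r^2 of DRP (or DYD)."""
    r2 = np.asarray(reliabilities, dtype=float)
    if np.any((r2 <= 0.0) | (r2 >= 1.0)):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return r2 / (1.0 - r2)


def weighted_snp_blup(genotypes, drp, weights, variance_ratio):
    """Solve the weighted mixed model equations for y = 1*mu + Z*a + e.

    genotypes      : (n_bulls, n_snp) matrix Z of genotype covariates
    drp            : (n_bulls,) response variable y
    weights        : (n_bulls,) weights, so that var(e_i) is proportional to 1 / w_i
    variance_ratio : assumed ratio of residual variance to SNP-effect variance
    Returns the estimated mean and the vector of SNP effects.
    """
    Z = np.asarray(genotypes, dtype=float)
    y = np.asarray(drp, dtype=float)
    w = np.asarray(weights, dtype=float)
    n_bulls, n_snp = Z.shape

    X = np.hstack([np.ones((n_bulls, 1)), Z])   # column of ones for the mean
    XtW = X.T * w                               # scales each record by its weight
    lhs = XtW @ X
    lhs[1:, 1:] += variance_ratio * np.eye(n_snp)  # shrink SNP effects, not the mean
    rhs = XtW @ y

    solution = np.linalg.solve(lhs, rhs)
    return solution[0], solution[1:]


def direct_genomic_value(genotypes, snp_effects):
    """Direct genomic value as the sum of genotype covariates times SNP effects."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(snp_effects, dtype=float)
```

A bull whose DRP has a reliability of 0.5 receives weight 1, whereas a reliability of 0.8 gives a weight of about 4, so records backed by more daughter information pull the SNP estimates more strongly.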

Validation criteria
Derivation of the GBV used for validation differed between partners. The Nordic validation was based on direct estimated genomic breeding values (DGV), as obtained from the genomic prediction model. The German validation combined DGV of the genotyped bulls and EBV of all available progeny-tested bulls to obtain a genomically enhanced breeding value (GEBV) using the approach reported by Ducrocq and Liu [19]. GBV in the Dutch/Flemish and French validations resembled GEBV, since their models included polygenic effects. The reliability of GBV (i.e. DGV or GEBV) was measured as the squared weighted correlation between GBV and DRP (or DYD), divided by the weighted mean of DRP (or DYD) reliabilities. The slope and intercept of weighted regressions of DRP on GBV for bulls in the validation dataset were also used to assess unbiasedness of the genomic predictions. The weights for these analyses were the same as those used for genomic prediction, but standardized such that the mean weight equals 1. In addition, the reliability of the pedigree index (PI) for bulls in the validation datasets was calculated using the data of bulls born before the cut-off date used to divide the data into reference and validation datasets.
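The validation statistics described above can be computed as in the following sketch. It assumes that the same standardized weights are used for the correlation, the regression and the mean DRP reliability; the function name and the returned dictionary keys are hypothetical.

```python
import numpy as np


def weighted_validation_stats(gbv, drp, drp_reliability, weights):
    """Validation reliability and weighted regression of DRP on GBV.

    gbv             : genomic predictions (DGV or GEBV) of the validation bulls
    drp             : deregressed proofs (or DYD) of the same bulls
    drp_reliability : reliability of each DRP (or DYD)
    weights         : weights as used for genomic prediction
    """
    gbv = np.asarray(gbv, dtype=float)
    drp = np.asarray(drp, dtype=float)
    rel = np.asarray(drp_reliability, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.mean()  # standardize so that the mean weight equals 1

    mean_gbv = np.average(gbv, weights=w)
    mean_drp = np.average(drp, weights=w)
    cov = np.average((gbv - mean_gbv) * (drp - mean_drp), weights=w)
    var_gbv = np.average((gbv - mean_gbv) ** 2, weights=w)
    var_drp = np.average((drp - mean_drp) ** 2, weights=w)

    r2 = cov ** 2 / (var_gbv * var_drp)           # squared weighted correlation
    rel_gbv = r2 / np.average(rel, weights=w)     # divide by mean DRP reliability
    slope = cov / var_gbv                         # b1: regression of DRP on GBV
    intercept = mean_drp - slope * mean_gbv       # b0
    return {"rel_gbv": rel_gbv, "r2": r2, "b1": slope, "b0": intercept}
```

Applying the same function to the pedigree index instead of GBV gives the reliability of PI, and the difference between the two values corresponds to the gain of genomic prediction over the pedigree index. A slope close to 1 and an intercept close to 0 indicate predictions that are neither inflated nor shifted.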

Expected gains in reliability
Realized gains in reliability when the national RP was extended to the EuroGenomics RP were compared to the gains expected based on equations derived in Goddard and Hayes [20]. Factors such as the size of the national RP, the size of the EuroGenomics RP (which varies between populations), the average genetic correlations between traits measured within one country and in the other countries, and the reliability of DRP were taken into account.
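The equations of Goddard and Hayes [20] are not reproduced here. As a rough illustration of why a larger RP is expected to raise reliability, the sketch below substitutes a simpler deterministic approximation attributed to Daetwyler et al. (2008), in which the expected reliability is N r^2 / (N r^2 + M_e), with N the number of reference bulls, r^2 the reliability of their DRP and M_e the number of independent chromosome segments. This is not the method used in this study, it ignores genetic correlations between countries, and the values of N, r^2 and M_e in the example are purely hypothetical.

```python
def expected_reliability(n_reference, drp_reliability, n_segments):
    """Approximate expected reliability of genomic predictions.

    Simplified deterministic approximation (not the Goddard and Hayes equations):
    reliability = N * r2 / (N * r2 + Me).
    """
    if n_reference < 0:
        raise ValueError("n_reference must be non-negative")
    if not 0.0 < drp_reliability <= 1.0:
        raise ValueError("drp_reliability must be in (0, 1]")
    if n_segments <= 0:
        raise ValueError("n_segments must be positive")
    information = n_reference * drp_reliability
    return information / (information + n_segments)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the direction of the effect.
    me = 1000          # assumed number of independent chromosome segments
    r2 = 0.8           # assumed reliability of DRP
    national = expected_reliability(4000, r2, me)
    combined = expected_reliability(16000, r2, me)
    print(f"national RP: {national:.3f}")
    print(f"combined RP: {combined:.3f}")
    print(f"expected gain: {combined - national:.3f}")
```

Under this approximation the gain from adding bulls shrinks as the RP grows, and a lower DRP reliability for foreign bulls reduces the information each added bull contributes.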

Reliability of DRP in the national and the EuroGenomics datasets
Reliabilities of DRP (or DYD in the case of France) in the reference dataset reflect the amount of phenotypic information available for each genotyped bull (Table 6).
Although the heritability of SCS was much lower than that of protein yield and udder depth, the reliability of DRP for SCS was similar. Reliability of DRP for fertility was significantly lower than for the other traits, which is consistent with its very low heritability. Fertility is also the trait for which the reliability dropped most from the national RP to the EuroGenomics RP because the correlation between fertility traits among countries is lower than for the other traits. Reliabilities of DRP in the EuroGenomics reference data were generally lower than those in the national reference data. The difference in DRP reliabilities between the national and EuroGenomics data reflects the fact that genetic correlations between countries were less than one. Thus, the difference in DRP reliabilities between two datasets was largest for fertility.

Nordic validation
For the DFS reference population, substantial increases were observed in REL GBV-PI when using the EuroGenomics data instead of the national data (Table 1). On average, the reliability of DGV was 20% higher than the reliability of PI in the DFS reference population. The average increase in REL GBV-PI obtained by going from the national to the EuroGenomics data was 11%. The largest benefits from using the EuroGenomics instead of the national data were observed for protein yield, udder depth and SCS. The coefficients of regression of DRP on DGV ranged from 0.82 to 1.08, and the intercepts were between -1.02 and 2.80 genetic standard deviation units.
Table note: ICF: interval between calving and first insemination; REL PI: $r^2$(PI, DRP) / REL DRP, where DRP denotes the deregressed proofs in the validation dataset and REL DRP is the average reliability of DRP; REL GBV-PI: the difference between REL GBV and REL PI, where REL GBV is $r^2$(GBV, DRP) / REL DRP; $b_0$ is the intercept and $b_1$ is the regression coefficient of DRP on GBV; $\sigma_g$ is the genetic standard deviation.

German validation
Averaged over all traits, the reliability of GEBV from the German RP was 21% higher than the reliability of PI_1 (Table 2). The smallest increase was observed for NRR. The reliability of GEBV from the EuroGenomics data was 32% higher than the reliability of PI_2. REL GBV-PI from the EuroGenomics data averaged over all traits was 11% higher than REL GBV-PI from the national reference dataset. The coefficients of regression of DRP on GEBV varied from 0.83 to 1.01, and the intercepts ranged from -0.16 to 0.29 genetic standard deviation units.

The Dutch/Flemish validation
REL GBV-PI computed from the EuroGenomics data were on average 8% higher than those from the national data (Table 3). Reliabilities of GEBV were on average 20% higher than reliabilities of PI. In line with the Nordic validation, the largest benefits from using the EuroGenomics data were observed for protein yield, udder depth and SCS. The coefficients of regression of DRP on GEBV were around unity (0.94 to 1.06). In genetic standard deviation units, the intercepts ranged from -0.06 to 0.10.

French validation
The reliability of GEBV was significantly higher than the reliability of PI for all traits (Table 4). Averaged over the four traits, the reliability of GEBV obtained from the EuroGenomics data was 9% higher than that from the national data. The latter was 20% higher than the reliability of PI. The coefficients of regression of DRP on GEBV were between 0.79 and 0.98; the intercepts were in the range of -0.07 to 0.25 genetic standard deviation units.

Realized and expected gains in reliabilities from enlarged reference data
Realized and expected gains in reliabilities of genomic predictions when going from national to EuroGenomics data varied between traits and populations (Table 7). Across traits, expected gains increased in the order fertility (lowest), protein yield, SCS and udder depth (highest). Averaged over the four populations, the realized gains followed the same order, except for protein yield, which ranked second for expected gain but realized the lowest gain. This low outcome was observed in all the populations, except for DFS. For udder depth, high gains were generally achieved, especially for DEU and NLD. For SCS, the increase was generally high and was larger for DFS and DEU than for NLD and FRA. For fertility, DEU and FRA achieved larger gains than DFS and NLD.
Averaged over traits, the expected gains by population increased in the following order: NLD, DFS, FRA, and DEU. The order of the realized gains was the same, except for FRA, which had the second largest expected gain, but only ranked third highest in realized gain.

Discussion
When the reference datasets were combined, the reliability of genomic predictions, averaged over four populations and four traits, increased by 10% compared to genomic predictions using national RP alone. This demonstrates the benefit of combining four European Holstein RP into a single EuroGenomics RP. The size of the RP is one of the most important factors affecting the accuracy of genomic predictions. Currently, the RP generally consists of bulls which have already gone through a progeny test program. Goddard and Hayes [21] demonstrated that even for a trait for which the response variable has a reliability of 0.80 (such as DRP of progeny tested bulls) [...]
The magnitude of the expected increases in reliabilities from combining RP varied between the four partners and the four traits. The factors that explain most of this variation are differences in the actual increase in RP size and differences in reliabilities of DRP/DYD based on national and EuroGenomics data. The differences in reliabilities of foreign DRP are a consequence of differences in genetic correlations between countries (reflecting genotype by environment interactions), and differences in heritability and the number of daughters in the DRP. In general, the observed increases in reliabilities from combining RP were in line with the expected values (Table 7).

Different gains among countries
The average increase in reliability of genomic prediction was 11% for DEU, 11% for DFS, 9% for FRA and 8% for NLD. This trend was consistent with expectations, except for France, which had the second highest expected gain but only the third highest realized gain. The main factor generating the differences in the expected increase in reliability was the increase in the number of bulls in the reference populations. The cut-off points for dividing the EuroGenomics data into the reference dataset and the validation dataset differed between the four partners in order to meet the requirement that the size of the validation data should be about 25% of that of the national dataset. This was due to large differences in the age distribution of bulls in the different populations. Consequently, the differences between the size of national and EuroGenomics RP varied considerably (Tables 1, 2, 3 and 4). This led to increases in the size of RP reaching 10736, 7727, 9007 and 6073 for DEU, DFS, FRA and NLD, respectively. The expected gain was similar between DFS and FRA even though the RP increased more for FRA. One explanation is that FRA had the lowest average trait genetic correlations with the other three countries. The average genetic correlation between France and the other partners was only 0.57 for fertility. This is a consequence of FRA using conception rate (CR) rather than the NRR that is used by the other partners. These correlations are directly related to the accuracy of DRP of foreign bulls on the national scale, which causes different gains in reliabilities among the countries. The increase in reliability deviated most from expectations for France, where the gain was less than expected. France uses the most complicated procedure to predict GBV, including a QTL detection step and inclusion of haplotypes for which a likelihood ratio test exceeds a predefined liberal threshold.
This detection step was only performed on the national RP, so the EuroGenomics RP was not exploited to select which marker haplotypes were used in the final model. This is probably the main reason why France does not appear to reach the full potential of using the EuroGenomics versus the national RP.

Different gains among traits
Among the four traits in this study, using the EuroGenomics data improved reliabilities of genomic predictions most for udder depth, followed by SCS, protein yield and fertility. This order of improved genomic predictions is consistent with expectations, with the exception of protein yield. The largest gain, observed for udder depth (12 to 19%), is largely due to the very strong genetic correlation between countries (0.98) for this trait. Average genetic correlations between countries were 0.88 for both SCS and protein yield, but the average gain in reliability from using the EuroGenomics data was 11% for SCS and only 6% for protein yield. This might be explained by the fact that the reliability of DRP in the EuroGenomics data was much lower than that in the national data for protein yield, while differences in reliabilities were smaller for SCS. In other words, the EuroGenomics data provide more information for SCS but less information for protein yield. Generally, traits with a low heritability are expected to benefit relatively more from a larger reference population. However, in this study a relatively low gain was observed for fertility. The most likely reasons are that fertility had a low genetic correlation between countries (in part due to differences in trait definitions) and that reliability of DRP was much lower in the EuroGenomics data than in the national data. This is reflected in the calculated expectations of increased reliabilities, which is why fertility was also expected to show the lowest increase.
Longevity was not included in the analyses although it is an important trait in all breeding goals, because the definition of longevity differs substantially between countries. Our aim was to study the increase of reliabilities from combining training data for traits with different heritabilities (low for fertility, medium for SCS, and high for udder depth and protein yield) and different ranges of genetic correlations between countries (low for fertility, medium for SCS and protein yield, and highest for udder depth).

Genomic prediction using national reference populations
In the present study, the sizes of the four national reference datasets were almost the same and the reliabilities of DRP were also similar, but prediction models used by the EuroGenomics partners were different. Previous simulation studies (e.g. [22-24]) showed that variable selection models (e.g., BayesB) have a greater predictive ability than models allowing for weaker differentiation of variances among markers (e.g., BayesA), and the latter were superior to linear BLUP models. However, based on real data from dairy cattle, VanRaden et al. [2] reported that the predictive ability of a nonlinear BLUP model (a heavy-tailed prior model) was considerably better than a linear BLUP model for fat percentage and protein percentage, while their predictive abilities were similar for 25 other traits. Cole et al. [25] reported that a heavy-tailed prior (analogous to BayesA) provided a slightly higher GEBV reliability for all nine traits than a finite locus model with heavy tails (analogous to BayesB) and higher than a linear model for fat yield, fat % and protein %. Su et al. [26] reported that a common prior Bayesian model (analogous to BayesA) exhibited a greater predictive ability than a mixture prior Bayesian model (analogous to BayesB) for fertility, udder health and protein yield, but not for fat %. In the present study, DEU and DFS used a linear BLUP model (random regression on SNP), NLD applied a Bayesian mixture model including polygenic effects, and FRA used a mixed linear model including pre-selected haplotypes and polygenic effects. Although the partners applied different prediction models, the gains from genomic prediction over a conventional pedigree index using national reference data were similar between countries. Averaged over the four traits, the reliability of predicted breeding values was increased by 20-21% for the four partners. This suggests that the different models used in this study had a similar predictive ability.

Measure of the reliability of genomic prediction
In this study, reliabilities of DGV, GEBV and PI were measured as the squared correlation divided by reliability of DRP for bulls in the validation data. This measure of reliability is unbiased only if the validation bulls come from a random sample, but the bulls in this study were selected on the basis of PI. Directional selection is expected to reduce the correlation between PI (also DGV and GEBV) and DRP. Therefore, the reliabilities reported in this study might underestimate the reliability for a random group of bulls, especially for strongly selected traits. This underestimation could partly explain the difference in the presented PI reliability among the countries, as the selection intensities on the validation data could differ between countries. The amount of underestimation of reliability from the current validation might be similar to the difference ($D_{PI}$) between the expected reliability of PI estimated by traditional BLUP based on the whole population and the reliability of PI estimated from the validation-based selected data. Thus, estimates of the reliability of DGV and GEBV for an unselected population are approximately equal to the reported reliability in the current validation plus $D_{PI}$ [2].
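The adjustment described in this paragraph can be written compactly as follows. The superscript labels (BLUP, val, unselected) are introduced here only for illustration and are not notation used elsewhere in this paper.

```latex
\begin{align}
D_{PI} &= \mathrm{REL}_{PI}^{\mathrm{BLUP}} - \mathrm{REL}_{PI}^{\mathrm{val}}, \\
\mathrm{REL}_{GBV}^{\mathrm{unselected}} &\approx \mathrm{REL}_{GBV}^{\mathrm{val}} + D_{PI},
\end{align}
```

Here $\mathrm{REL}_{PI}^{\mathrm{BLUP}}$ is the expected reliability of PI from traditional BLUP based on the whole population, and the quantities labelled val are the reliabilities estimated from the selected validation bulls.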

Conclusions
This study showed that reliabilities of genomic predictions using EuroGenomics data were considerably higher than those using national reference data alone. The results confirm the importance of the size of reference populations for genomic prediction. A significant improvement of genomic prediction can be achieved through cooperation between countries by combining reference data.