A simple method to separate base population and segregation effects in genomic relationship matrices
 Laura Plieschke^{1},
 Christian Edel^{1},
 Eduardo CG Pimentel^{1},
 Reiner Emmerling^{1},
 Jörn Bennewitz^{2} and
 Kay-Uwe Götz^{1}
https://doi.org/10.1186/s1271101501308
© Plieschke et al. 2015
Received: 18 December 2014
Accepted: 27 May 2015
Published: 23 June 2015
Abstract
Background
Genomic selection and estimation of genomic breeding values (GBV) are widely used in cattle and plant breeding. Several studies have attempted to detect population subdivision by investigating the structure of the genomic relationship matrix G. However, the question of how these effects influence GBV estimation using genomic best linear unbiased prediction (GBLUP) has received little attention.
Methods
We propose a simple method to decompose G into two independent covariance matrices, one describing the covariance that results from systematic differences in allele frequencies between groups at the pedigree base (G_{A}^{*}) and the other describing genomic relationships (G_{S}) corrected for these differences. Using this decomposition and F_{st} statistics, we examined whether observed genetic distances between genotyped subgroups within populations resulted from the heterogeneous genetic structure present at the base of the pedigree and/or from breed divergence. We then tested three models in a forward prediction validation scenario on six traits using Brown Swiss and dual-purpose Fleckvieh cattle data. Model 0 (M0) used both components and is equivalent to the model using the standard G-matrix. Model 1 (M1) used G_{S} only and model 2 (M2), an extension of M1, included a fixed genetic group effect. Moreover, we analyzed the matrix of contributions of each base group (Q) and estimated the effects and prediction errors of each base group using M0 and M2.
Results
The proposed decomposition of G helped to examine the relative importance of the effects of base groups and segregation in a given population. We found significant differences between the effects of base groups for each breed. In forward prediction, differences between models in terms of validation reliability of estimated direct genomic values were small but predictive power was consistently lowest for M1. The relative advantage of M0 or M2 in prediction depended on breed, trait and genetic composition of the validation group. Our approach presents a general analogy with the use of genetic groups in conventional animal models and provides proof that standard GBLUP using G yields solutions equivalent to M0, where base groups are considered as correlated random effects within the additive genetic variance assigned to the genetic base.
Background
Genomic selection [1] and estimation of genomic breeding values (GBV) are currently used for many cattle populations. Genomic best linear unbiased prediction (GBLUP) using relationships estimated based on SNPs (single nucleotide polymorphisms) has been established as one of the most prominent methods for practical applications [2]. The question of how and to what extent population subdivision affects the genomic relationship matrix and genomic predictions was not addressed until applications of GBLUP across breeds or in admixed or crossbred populations were proposed e.g. [3–5]. However, several authors have shown that genomic relationship matrices can be used to detect population subdivision and to calculate measures of genetic distances (e.g. F_{st}) [6, 7].
Conventional methods to estimate breeding values consider that animals with unknown parents belong to an arbitrarily defined base population. Members of this base population are assumed to come from a single population with a mean breeding value of 0 and variance σ_{a}^{2}. Since this is rarely true in practical applications, many conventional methods to estimate breeding values include genetic groups or phantom parents [8–10] in the model. A more elaborate approach in the context of multibreed evaluations was proposed by García-Cortés and Toro [11], who partitioned the elements of the covariance matrix of the additive values into a breed-source term and a segregation term.
In spite of the large number of studies that deal with the use of genetic groups in conventional models, only a few have investigated this issue within the framework of genomic models. Makgahlela et al. [12–14] tested models that accounted for breed effects and compared allele frequencies in subgroups of Nordic Red cattle. They showed that a model that included a fixed breed effect [12, 13] increased the reliability of direct genomic values (DGV) by 2 to 3 % [13] for an admixed Nordic Red population. In a follow-up investigation, they found that using breed- or subpopulation-specific allele frequencies to calculate the genomic relationship matrix (G) did not result in higher validation reliabilities, although accounting for specific allele frequencies in the calculation of G changed the estimated GBV of some individuals considerably [14]. Tsuruta et al. [15] proposed an approach to assign unknown parent groups in one-step GBLUP for US Holstein cattle data. Their approach can be described as an application of the model that fits standard fixed genetic groups within the context of one-step GBLUP. The question of whether and how population subdivision influences the G-matrix was not addressed.
A simulation study by Vitezica et al. [16] compared five BLUP methods and investigated the effect of selection and genome-wide evaluation methods (one-step and multi-step) on bias and accuracy of genomic predictions. They examined the problem of unequal genetic levels between genotyped and non-genotyped animals in the one-step GBLUP procedure, where the genomic relationship matrix G and the pedigree-based relationship matrix A are combined. They proposed a correction of G and concluded that one-step estimation with a corrected G results in unbiased estimates of GBV, which have a similar inflation rate and a higher accuracy than estimates obtained with other methods. Christensen [17] presented an alternative approach for one-step models. For admixed populations, he suggested that the pedigree-based relationship matrix should be adjusted by assuming a parametric structure for the relationships between animals in the base population and estimating those parameters. He argued that this approach would be easier to extend and simpler than developing an appropriate method of adjusting the matrix of genomic relationships of genotyped animals across breeds.
The effects of population subdivision on the structure of the genomic relationship matrix G have also been investigated in contexts other than the estimation of GBV. There are numerous studies on the calculation of F_{st} statistics [6, 18] and principal component analysis (PCA), e.g. [19, 20], and corresponding extensions to the G-matrix [16]. These studies show that it is possible to detect population subdivision with G in the same manner as with A. This means that G includes information about population subdivision and that, in some cases, this information includes the genetic distance between potentially discriminable groups in the base population that is defined by the pedigree. Since base animals are rarely genotyped, these distances cannot be estimated directly. A simple and straightforward method to estimate allele frequencies in the base population was proposed by Gengler et al. [21] and is based on a mixed model approach. In this paper, we estimate allele frequencies in the base of different subpopulations that are present in our datasets and propose a method to separate the genomic relationship matrix (G) into two independent components: a base group (G_{A}^{*}) component and a segregation (G_{S}) component. Furthermore, we demonstrate that this decomposition leads to essentially the same results as ordinary GBLUP. Finally, we examine models that either ignore the effects of base groups or that consider base groups as fixed effects.
Methods
Material
In total, 7965 genotyped Fleckvieh (FV), 4257 genotyped Brown Swiss (BS) and 143 genotyped Original Braunvieh (OB) bulls were available for this study. BS and OB data were combined (hereafter called BS/OB, n = 4400) into a single dataset because these two subpopulations originated from a single breed. The term Brown Swiss is used to denote the modern Braunvieh, which resulted from an exchange of genetic material between Europe and North America. An OB animal is genetically characterized as a descendant of the old European Braunvieh population, with no or only minor genetic contributions from the reimported US Brown Swiss population. This labelling of OB animals within the European Braunvieh population is not necessarily applied in a uniform manner and small differences in the definition can occur between countries.
Number of animals per defined base group for the BS/OB population
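|        | EU_{b} | DE_{b} | AT_{b} | CH_{b} | IT_{b} | US_{b1} | US_{b2} | OB_{b1} | OB_{b2} |
|--------|--------|--------|--------|--------|--------|---------|---------|---------|---------|
| Year   | ≤1960  | >1960  | >1960  | >1960  | >1960  | ≤1955   | >1955   | ≤1960   | >1960   |
| Number | 2093   | 1482   | 743    | 1281   | 413    | 489     | 445     | 458     | 398     |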
Number of animals per defined base group for FV
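|        | DE_{b1} | DE_{b2}      | DE_{b3}      | DE_{b4} | HOL_{b1} | HOL_{b2} | AT_{b} | CZ_{b} | CH_{b} | FR_{b} | Div_{b} |
|--------|---------|--------------|--------------|---------|----------|----------|--------|--------|--------|--------|---------|
| Year   | <1960   | ≥1960, <1970 | ≥1970, <1980 | ≥1980   | <1960    | ≥1960    | All    | All    | All    | All    | All     |
| Number | 1368    | 6055         | 1661         | 773     | 528      | 427      | 3452   | 977    | 183    | 228    | 705     |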
We estimated DGV for three milk traits and three conformation traits from a dataset from which the last four years of phenotypic data had been removed (referred to as the reduced dataset). Daughter yield deviations (DYD) from the German-Austrian system [22] were used for FV bulls and deregressed MACE (multi-trait across country evaluations) proofs from Interbull [23] for BS/OB bulls. Deregression was done using the method proposed by Garrick et al. [24]. Group effects were not accounted for in the deregression. Traits analyzed were milk yield (MY), protein yield (PY), fat yield (FY), stature (STA), feet and legs (FL) and udder conformation (UD). These traits were a priori assumed to have a large genetic trend and/or to show considerable differences between base groups. DGV estimated from the reduced dataset were then compared to DYD and deregressed proofs from the corresponding April 2014 evaluations (current dataset) according to the guidelines of the Interbull GEBV test [25, 26]. In short, the validation group included bulls with no information on the offspring’s performances in the reduced dataset but corresponding information in the current dataset. Current information was assumed to be sufficient for the test when the effective daughter contribution (EDC) [27] based on offspring performances was equal to at least 20. The remaining bulls from 2010 with an EDC of at least 1 were included in the training set (Calib).
Technically, we tested DGV by a weighted regression of current DYD or deregressed proofs of the animals in the validation group on their DGV estimated from the reduced set. The resulting test statistics are the intercept and slope (b) of this regression as measures of bias and the coefficient of determination (R^{2}) of this regression as a measure of the reliability of the DGV. The R^{2} values were corrected for the uncertainty in DYD, as proposed by [28], i.e. they were divided by the average reliability of the DYD of validation bulls.
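The test statistics described above can be illustrated with a minimal sketch. It assumes ordinary weighted least squares with one weight per validation bull. The source does not state which weights were used, so the EDC-like values below are an assumption, and all numbers are made up for illustration rather than taken from the study. The function name `validation_stats` is hypothetical.

```python
import numpy as np

def validation_stats(y, dgv, weights, mean_reliability):
    """Weighted regression of current DYD (or deregressed proofs) on DGV.

    Returns intercept a, slope b, standard error of b, and R^2 divided by
    the average reliability of the DYD of the validation bulls.
    """
    y = np.asarray(y, dtype=float)
    dgv = np.asarray(dgv, dtype=float)
    w = np.asarray(weights, dtype=float)

    X = np.column_stack([np.ones_like(dgv), dgv])   # intercept and slope columns
    XtWX = X.T @ (w[:, None] * X)
    XtWy = X.T @ (w * y)
    a, b = np.linalg.solve(XtWX, XtWy)

    resid = y - (a + b * dgv)
    sse = np.sum(w * resid ** 2)                    # weighted residual sum of squares
    sigma2 = sse / (len(y) - 2)
    se_b = np.sqrt(sigma2 * np.linalg.inv(XtWX)[1, 1])

    y_bar = np.sum(w * y) / np.sum(w)               # weighted mean of y
    sst = np.sum(w * (y - y_bar) ** 2)
    r2 = 1.0 - sse / sst
    return a, b, se_b, r2 / mean_reliability

# Hypothetical toy data for six validation bulls
dgv = np.array([-1.2, -0.4, 0.1, 0.8, 1.5, 2.0])
dyd = np.array([-0.9, -0.1, 0.4, 0.5, 1.6, 1.4])
w = np.array([25.0, 40.0, 60.0, 30.0, 55.0, 20.0])  # assumed EDC-like weights

a, b, se_b, r2_corrected = validation_stats(dyd, dgv, w, mean_reliability=0.9)
print(f"a = {a:.3f}, b = {b:.3f} (s.e. {se_b:.3f}), corrected R2 = {r2_corrected:.3f}")
```

A slope below 1 indicates inflated DGV, which is how the slopes are read in the Results.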
Number of animals per validation group for the BS/OB and FV populations and the six traits considered
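| Breed | Trait | Training set | Validation set: DEA | Validation set: others | Validation set: OB |
|-------|-------|--------------|---------------------|------------------------|--------------------|
| BS/OB | MY    | 3262         | 416                 | 346                    | 8                  |
| BS/OB | PY    | 3262         | 416                 | 346                    | 8                  |
| BS/OB | FY    | 3262         | 416                 | 346                    | 8                  |
| BS/OB | STA   | 3535         | 464                 | 350                    | 51                 |
| BS/OB | FL    | 3551         | 461                 | 345                    | 43                 |
| BS/OB | UD    | 3550         | 458                 | 349                    | 43                 |
| FV    | MY    | 5276         | 2589                | 97                     | -                  |
| FV    | PY    | 5276         | 2581                | 97                     | -                  |
| FV    | FY    | 5276         | 2581                | 97                     | -                  |
| FV    | STA   | 5956         | 2264                | 139                    | -                  |
| FV    | FL    | 5956         | 2272                | 139                    | -                  |
| FV    | UD    | 5956         | 2272                | 139                    | -                  |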
Decomposition of G
Conceptually, this manipulation is equivalent to column-wise centering of C if current allele frequencies are used and if each marker is in Hardy-Weinberg equilibrium in the genotyped population.
Similar to conventional estimation of GBV, base animals can be grouped according to known or assumed population subdivisions and/or generations, when additional differentiation due to considerable genetic trend has to be taken into account. To estimate base group-specific allele frequencies, matrix 1 in Equation (6) is replaced by matrix Q. Matrices G_{T}, G_{S} and G_{A}^{*} can then be calculated as described above, using estimates for global and group-specific base allele frequencies, and again G_{T} = G_{S} + G_{A}^{*}.
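The following is a minimal sketch of how such a decomposition could be computed. It is not the authors' implementation and rests on several assumptions: base allele frequencies are obtained as a generalized least squares mean of gene content (in the spirit of the mixed-model approach of Gengler et al. [21]), both G_{T} and G_{S} are scaled by the same constant k based on the global base frequencies, and G_{A}^{*} is simply taken as the remainder G_{T} − G_{S}, so that the identity holds by construction. The function names and the toy data are hypothetical.

```python
import numpy as np

def base_freqs(C, X, A_inv):
    """GLS estimate of base allele frequencies (assumed estimator).

    X is a column of ones for global frequencies, or Q for
    group-specific frequencies. Returns one row per column of X.
    """
    lhs = X.T @ A_inv @ X
    rhs = X.T @ A_inv @ C
    return 0.5 * np.linalg.solve(lhs, rhs)   # gene content is coded 0/1/2

def decompose_G(C, Q, A_inv):
    """Return G_T, G_S and G_A_star with G_T = G_S + G_A_star."""
    n = C.shape[0]
    ones = np.ones((n, 1))
    p_T = base_freqs(C, ones, A_inv)         # 1 x m, global base frequencies
    P_g = base_freqs(C, Q, A_inv)            # g x m, group-specific frequencies
    k = 2.0 * np.sum(p_T * (1.0 - p_T))      # assumed common scaling factor

    Z_T = C - 2.0 * ones @ p_T               # centered with global frequencies
    Z_S = C - 2.0 * Q @ P_g                  # centered with group frequencies
    G_T = Z_T @ Z_T.T / k
    G_S = Z_S @ Z_S.T / k
    G_A_star = G_T - G_S                     # remainder: base group component
    return G_T, G_S, G_A_star

# Hypothetical toy data: 6 animals, 50 markers, 2 base groups
rng = np.random.default_rng(1)
C = rng.binomial(2, 0.3, size=(6, 50)).astype(float)
Q = np.array([[1.0, 0.0], [1.0, 0.0], [0.5, 0.5],
              [0.5, 0.5], [0.0, 1.0], [0.25, 0.75]])
A_inv = np.eye(6)   # toy stand-in; in practice the inverse pedigree relationship matrix

G_T, G_S, G_A_star = decompose_G(C, Q, A_inv)
print(np.allclose(G_T, G_S + G_A_star))
```

The sketch uses a Q with full column rank so that the linear system can be solved directly. With collinear group contributions a generalized inverse would be needed instead.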
Models

Standard model (model 0, M0): X = 1 and V_{uu} = G_{T} × σ_{u}^{2}.

Model 1 (M1): X = 1 and V_{uu} = G_{S} × σ_{u}^{2}.

Model 2 (M2): X = [1 Q] and V_{uu} = G_{S} × σ_{u}^{2}.
Models were tested in forward prediction by means of the test described in the subsection Material. To better understand the factors that influence the predictive ability of a specific model for different validation datasets, we analyzed the matrix of base group contributions (Q) and derived base group estimates, as well as their prediction errors, using M0 and M2. Differences between group effect estimates were calculated and tested by formulating linear hypotheses.
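To make the difference between the three models concrete, the sketch below fits each of them to toy data. It assumes the usual mixed model y = Xb + u + e with known variance components and, as a simplification, a homogeneous residual variance, whereas the study used weighted records. The function `gblup`, the variance values and all data are hypothetical, and the relationship matrices are built in a simplified way only to have something to fit.

```python
import numpy as np

def gblup(y, X, G, var_u, var_e):
    """BLUE of fixed effects and BLUP of genomic values for y = Xb + u + e,
    with Var(u) = G * var_u and Var(e) = I * var_e (variances assumed known)."""
    V = var_u * G + var_e * np.eye(len(y))
    V_inv = np.linalg.inv(V)
    # X = [1 Q] is rank deficient when the rows of Q sum to 1,
    # so a pseudo-inverse is used for the fixed-effect equations.
    b_hat = np.linalg.pinv(X.T @ V_inv @ X, rcond=1e-10) @ (X.T @ V_inv @ y)
    u_hat = var_u * G @ V_inv @ (y - X @ b_hat)
    return b_hat, u_hat

# Hypothetical toy data: 8 animals, 200 markers, 2 base groups
rng = np.random.default_rng(7)
n, m = 8, 200
Q = np.repeat(np.eye(2), 4, axis=0)                   # four animals per base group
P_g = np.vstack([np.full(m, 0.2), np.full(m, 0.6)])   # assumed group base frequencies
C = rng.binomial(2, Q @ P_g).astype(float)            # gene content coded 0/1/2

p_T = C.mean(axis=0) / 2.0        # overall mean as a stand-in for global base frequencies
k = 2.0 * np.sum(p_T * (1.0 - p_T))
Z_T = C - 2.0 * p_T
Z_S = C - 2.0 * Q @ P_g
G_T = Z_T @ Z_T.T / k
G_S = Z_S @ Z_S.T / k

y = Q @ np.array([0.0, 1.0]) + rng.normal(scale=0.5, size=n)

ones = np.ones((n, 1))
X2 = np.hstack([ones, Q])
var_u, var_e = 0.5, 0.5           # assumed known variance components

b0, u0 = gblup(y, ones, G_T, var_u, var_e)   # M0: group differences inside G_T
b1, u1 = gblup(y, ones, G_S, var_u, var_e)   # M1: group differences ignored
b2, u2 = gblup(y, X2, G_S, var_u, var_e)     # M2: groups as fixed effects

print("M2 fitted group levels:", np.round(X2 @ b2, 3))
```

Because [1 Q] does not have full column rank, individual elements of `b2` are not unique, but the fitted values `X2 @ b2` and differences between group effects are.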
Distance measures
Results
F_{st} statistics
Forward prediction
Results for the coefficient of determination (R^{2}) from the forward prediction for the BS/OB and FV populations for different models
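| Breed | Trait | M0 (G_{A}^{*} and G_{S}) | M1 (G_{S}) | M2 (G_{S} + fixed effects) |
|-------|-------|--------------------------|------------|----------------------------|
| BS/OB | MY    | 0.416                    | 0.386      | 0.421                      |
| BS/OB | PY    | 0.409                    | 0.370      | 0.417                      |
| BS/OB | FY    | 0.388                    | 0.349      | 0.395                      |
| BS/OB | STA   | 0.499                    | 0.382      | 0.505                      |
| BS/OB | FL    | 0.234                    | 0.216      | 0.220                      |
| BS/OB | UD    | 0.416                    | 0.394      | 0.410                      |
| FV    | MY    | 0.580                    | 0.530      | 0.557                      |
| FV    | PY    | 0.512                    | 0.463      | 0.491                      |
| FV    | FY    | 0.548                    | 0.490      | 0.521                      |
| FV    | STA   | 0.526                    | 0.515      | 0.516                      |
| FV    | FL    | 0.438                    | 0.425      | 0.415                      |
| FV    | UD    | 0.406                    | 0.404      | 0.405                      |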
Results for the intercept (a), slope (b) and its standard error (s.e.) from the forward prediction for the FV and BS/OB populations for different models
| Breed | Trait | M0 (G_{A}^{*} and G_{S}): a | M0: b (s.e.)  | M1 (G_{S}): a | M1: b (s.e.)  | M2 (G_{S} + fixed effects): a | M2: b (s.e.)  |
|-------|-------|-----------------------------|---------------|---------------|---------------|-------------------------------|---------------|
| BS/OB | MY    | 85.551                      | 0.828 (0.035) | 87.672        | 0.813 (0.037) | 85.091                        | 0.820 (0.035) |
| BS/OB | PY    | 3.152                       | 0.768 (0.033) | 3.221         | 0.748 (0.035) | 3.129                         | 0.765 (0.033) |
| BS/OB | FY    | 3.202                       | 0.762 (0.035) | 3.198         | 0.753 (0.037) | 3.178                         | 0.757 (0.034) |
| BS/OB | STA   | 14.934                      | 0.854 (0.029) | −3.706        | 1.020 (0.044) | 18.807                        | 0.817 (0.028) |
| BS/OB | FL    | 1.285                       | 0.979 (0.061) | −4.480        | 1.032 (0.068) | 24.889                        | 0.751 (0.059) |
| BS/OB | UD    | 22.008                      | 0.786 (0.032) | 9.036         | 0.904 (0.038) | 30.023                        | 0.711 (0.030) |
| FV    | MY    | 62.576                      | 0.660 (0.019) | 76.031        | 0.582 (0.018) | 76.031                        | 0.619 (0.018) |
| FV    | PY    | 3.213                       | 0.664 (0.019) | 3.914         | 0.593 (0.019) | 3.914                         | 0.644 (0.019) |
| FV    | FY    | 2.640                       | 0.734 (0.019) | 3.696         | 0.650 (0.019) | 3.696                         | 0.729 (0.020) |
| FV    | STA   | 0.046                       | 0.782 (0.024) | 0.076         | 0.774 (0.024) | 0.076                         | 0.786 (0.025) |
| FV    | FL    | −0.082                      | 0.900 (0.036) | −0.179        | 0.878 (0.036) | −0.179                        | 1.021 (0.038) |
| FV    | UD    | −0.013                      | 0.713 (0.033) | −0.031        | 0.708 (0.033) | −0.031                        | 0.736 (0.040) |
Brown Swiss and Original Braunvieh breeds
For the BS/OB data, we found a minimal advantage in terms of R^{2} for model M2, which fitted fixed groups. The exceptions were the traits FL and UD, for which the standard random model M0 showed the highest R^{2}. Across traits, R^{2} for M1 was 0.028 to 0.123 lower than that of the best model. In terms of slope, inflation of genomic predictions was lowest for conformation traits with model M1. For milk traits, the slope was slightly higher, and estimates were thus less inflated, with the random model M0 than with the fixed model M2.
Fleckvieh breed
Differences in R^{2} between M0 and M2 ranged from 0.001 to 0.021. For all six traits, M0 resulted in a higher R^{2} than the fixed group model M2. The R^{2} achieved with M1 was always lower than that achieved with M0 and M2. Nevertheless, the difference in R^{2} between M1 and M0 was only 0.002 for the UD trait. For the other traits, the R^{2} that was achieved with M1 was between 0.011 and 0.058 lower than that with M0. Based on slope, model M0 was superior and always led to the lowest inflation of estimates for milk traits. For conformation traits, the fixed model M2 led to the lowest inflation. However, differences between models were relatively small in many cases (between 0.004 and 0.143).
Base group effects
Differences between base group effects estimated with the fixed model for the BS/OB population for protein yield above the diagonal and stature below the diagonal
|         | EU_{b}   | DE_{b}        | AT_{b}    | CH_{b}    | IT_{b}       | US_{b1}       | US_{b2}       | OB_{b1}       | OB_{b2}   |
|---------|----------|---------------|-----------|-----------|--------------|---------------|---------------|---------------|-----------|
| Year    | ≤1960    | >1960         | >1960     | >1960     | >1960        | ≤1955         | >1955         | ≤1960         | >1960     |
| EU_{b}  | 0        | −64.86***     | −22.52*** | −13.97*** | −19.36***    | −26.06***     | −29.90***     | −14.01***     | −45.54*** |
| DE_{b}  | 25.48*** | 0             | 42.35***  | 50.90***  | 45.50***     | 38.80***      | 34.97***      | 50.85***      | 19.32***  |
| AT_{b}  | 15.66*** | −9.82***      | 0         | 8.55***   | 3.15^{n.s.}  | −3.55^{n.s.}  | −7.38^{n.s.}  | 8.50*         | −23.03*** |
| CH_{b}  | 1.21*    | −24.27***     | −14.45*** | 0         | −5.40**      | −12.10***     | −15.93***     | −0.05^{n.s.}  | −31.58*** |
| IT_{b}  | 19.63*** | −5.85***      | 3.97***   | 18.42***  | 0            | −6.70*        | −10.53***     | 5.35*         | −26.18*** |
| US_{b1} | 11.23*** | −14.25***     | −4.43***  | 10.02***  | −8.40***     | 0             | −3.83^{n.s.}  | 12.05**       | −19.48*** |
| US_{b2} | 23.05*** | −2.43^{n.s.}  | 7.39***   | 21.85***  | 3.42*        | 11.82***      | 0             | 15.88***      | −15.65*** |
| OB_{b1} | 3.56***  | −21.92***     | −12.11*** | 2.35***   | −16.08***    | −7.67***      | −19.50***     | 0             | −31.53*** |
| OB_{b2} | 18.05*** | −7.43***      | 2.38***   | 16.83***  | −1.59**      | 6.82***       | −5.01***      | 14.49***      | 0         |
Differences between base group effects estimated with the fixed model for the FV population for protein yield above the diagonal and stature below the diagonal
|          | DE_{b1}       | DE_{b2}       | DE_{b3}       | DE_{b4}       | HOL_{b1}  | HOL_{b2}      | AT_{b}        | CZ_{b}       | CH_{b}        | FR_{b}       | Div_{b}   |
|----------|---------------|---------------|---------------|---------------|-----------|---------------|---------------|--------------|---------------|--------------|-----------|
| Year     | <1960         | ≥1960, <1970  | ≥1970, <1980  | ≥1980         | <1960     | ≥1960         | All           | All          | All           | All          | All       |
| DE_{b1}  | 0             | −16.77***     | 1.06^{n.s.}   | −7.49***      | −50.43*** | −49.94***     | 18.21***      | −32.21***    | 10.89***      | −28.14***    | 49.76***  |
| DE_{b2}  | −0.29^{n.s.}  | 0             | 17.83***      | 9.28**        | −33.66*** | −33.17***     | 34.98***      | −15.45***    | 27.66***      | −11.37***    | 66.54***  |
| DE_{b3}  | −1.60^{n.s.}  | −1.31^{n.s.}  | 0             | −8.55***      | −51.49*** | −51.00***     | 17.15***      | −33.28***    | 9.82***       | −29.20***    | 48.701*** |
| DE_{b4}  | −0.24^{n.s.}  | 0.05^{n.s.}   | 1.36^{n.s.}   | 0             | −42.94*** | −42.45***     | 25.70***      | −24.73***    | 18.38***      | −20.65***    | 57.25***  |
| HOL_{b1} | 5.16***       | 5.45***       | 6.76***       | 5.40***       | 0         | 0.49^{n.s.}   | 68.64***      | 18.21***     | 61.32***      | 22.29***     | 100.19*** |
| HOL_{b2} | −1.49^{n.s.}  | −1.20^{n.s.}  | 0.11^{n.s.}   | −1.25^{n.s.}  | −6.65***  | 0             | 68.15***      | 68.14***     | 60.83***      | 21.80***     | 99.70***  |
| AT_{b}   | −0.14^{n.s.}  | 0.16^{n.s.}   | 1.46^{n.s.}   | 0.11^{n.s.}   | −5.30***  | 1.35^{n.s.}   | 0             | −50.43***    | −7.32**       | −46.35***    | 31.55***  |
| CZ_{b}   | −3.48***      | −3.19^{n.s.}  | −1.88^{n.s.}  | −3.24^{n.s.}  | −8.64***  | −1.99^{n.s.}  | −3.35^{n.s.}  | 0            | 43.11***      | 4.08^{n.s.}  | 81.98***  |
| CH_{b}   | −1.79^{n.s.}  | −1.50^{n.s.}  | −0.19^{n.s.}  | −1.55^{n.s.}  | −6.95***  | −0.30^{n.s.}  | −1.65^{n.s.}  | 1.69^{n.s.}  | 0             | −39.03***    | 38.88***  |
| FR_{b}   | 0.22^{n.s.}   | 0.51^{n.s.}   | 1.82^{n.s.}   | 0.46^{n.s.}   | −4.95***  | 1.71^{n.s.}   | 0.35^{n.s.}   | 3.70*        | 2.01^{n.s.}   | 0            | 77.91***  |
| Div_{b}  | −3.09***      | −2.80^{n.s.}  | −1.49^{n.s.}  | −2.85*        | −8.25***  | −1.60^{n.s.}  | −2.95*        | 0.39^{n.s.}  | −1.30^{n.s.}  | −3.31**      | 0         |
Brown Swiss and Original Braunvieh breeds
In the BS/OB dataset, we defined nine different base groups, which led to 36 possible contrasts between base groups. Differences were tested for significance using t-tests. For the PY trait, significant differences were found for the majority of group contrasts and only 5 out of 36 differences were not significant. The largest difference was between the European base group (EU_{b}) and the German base group (DE_{b}) (−64.86). Estimates for DE_{b} were significantly larger than estimates for all other groups. Differences between the EU_{b} group and the other groups were also large but clearly negative. The smallest difference was between the Swiss base group (CH_{b}) and the older Original Braunvieh base group (OB_{b1}) (−0.05). The differences between the Austrian (AT_{b}) and the Italian (IT_{b}) base groups were relatively small in many cases.
For the STA trait, all group differences were significant, except the difference between the German base group (DE_{b}) and the younger American base group (US_{b2}). The pattern of differences was similar to that for PY, although the magnitudes differed slightly. The largest difference was again between EU_{b} and DE_{b} (25.48), and the smallest was between the Swiss base group (CH_{b}) and the European base group (EU_{b}) (1.21).
Fleckvieh breed
For the FV breed, almost all group differences were significant for PY. The largest differences were between the older Red Holstein base group (HOL_{b1}) and the Austrian base group (AT_{b}), between the younger Red Holstein base group (HOL_{b2}) and AT_{b} and between HOL_{b2} and CZ_{b} (68.64, 68.15 and 68.14, respectively). The smallest difference was between the two Red Holstein base groups (0.49).
The situation for STA was almost the opposite. Only 16 group differences were significant, while 39 out of 55 differences were not significant. Of these 16 significant differences, 10 were between the older Red Holstein base group (HOL_{b1}) and all other base groups.
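The pairwise comparisons reported in this subsection can be sketched as linear contrasts k'b between estimated group effects, each divided by its standard error. The example below is an illustration under assumptions, not the authors' code: the estimates and their covariance matrix are invented, the function names are hypothetical, and the conversion of the t-statistic into a significance level (which requires degrees of freedom) is left out.

```python
from itertools import combinations
import numpy as np

def n_contrasts(n_groups):
    """Number of distinct pairwise contrasts between n_groups base groups."""
    return n_groups * (n_groups - 1) // 2

def pairwise_contrasts(b_hat, cov_b, names):
    """Return (group_i, group_j, difference, t-statistic) for every pair."""
    results = []
    for i, j in combinations(range(len(names)), 2):
        k = np.zeros(len(names))
        k[i], k[j] = 1.0, -1.0            # contrast vector: effect_i minus effect_j
        diff = k @ b_hat
        se = np.sqrt(k @ cov_b @ k)       # standard error of the contrast
        results.append((names[i], names[j], diff, diff / se))
    return results

# Hypothetical estimates for three base groups (not values from the study)
names = ["HOL_b1", "HOL_b2", "AT_b"]
b_hat = np.array([3.0, 2.5, -1.0])
cov_b = np.diag([0.04, 0.05, 0.01])       # assumed covariance of the estimates

contrasts = pairwise_contrasts(b_hat, cov_b, names)
for g1, g2, diff, t in contrasts:
    print(f"{g1} - {g2}: difference = {diff:.2f}, t = {t:.2f}")

print(n_contrasts(9), n_contrasts(11))    # nine BS/OB groups and eleven FV groups
```

The last line reproduces the 36 and 55 contrasts that follow from nine and eleven base groups, respectively.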
Base group contributions
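Results of the analysis of the Q-matrix for the BS/OB population
| BS/OB        |    | EU_{b} | DE_{b} | AT_{b} | CH_{b} | IT_{b} | US_{b1} | US_{b2} | OB_{b1} | OB_{b2} |
|--------------|----|--------|--------|--------|--------|--------|---------|---------|---------|---------|
| Year         |    | ≤1960  | >1960  | >1960  | >1960  | >1960  | ≤1955   | >1955   | ≤1960   | >1960   |
| Calib (3262) | m  | 0.02   | 0.02   | 0.01   | 0.01   | 0.01   | 0.24    | 0.62    | 0.03    | 0.03    |
|              | sd | 0.04   | 0.05   | 0.03   | 0.03   | 0.03   | 0.07    | 0.12    | 0.07    | 0.06    |
| DEA (416)    | m  | 0.02   | 0.03   | 0.01   | 0.00   | 0.00   | 0.23    | 0.62    | 0.03    | 0.06    |
|              | sd | 0.01   | 0.05   | 0.07   | 0.04   | 0.01   | 0.04    | 0.04    | 0.02    | 0.04    |
| OB (8)       | m  | 0.25   | 0.00   | 0.01   | 0.05   | 0.00   | 0.00    | 0.00    | 0.54    | 0.16    |
|              | sd | 0.25   | 0.00   | 0.02   | 0.09   | 0.00   | 0.00    | 0.00    | 0.19    | 0.15    |
| Others (346) | m  | 0.01   | 0.01   | 0.00   | 0.01   | 0.00   | 0.27    | 0.67    | 0.01    | 0.01    |
|              | sd | 0.01   | 0.01   | 0.01   | 0.01   | 0.01   | 0.03    | 0.05    | 0.01    | 0.02    |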
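Results of the analysis of the Q-matrix for the FV population
| FV           |    | DE_{b1} | DE_{b2}      | DE_{b3}      | DE_{b4} | HOL_{b1} | HOL_{b2} | AT_{b} | CZ_{b} | CH_{b} | FR_{b} | Div_{b} |
|--------------|----|---------|--------------|--------------|---------|----------|----------|--------|--------|--------|--------|---------|
| Year         |    | <1960   | ≥1960, <1970 | ≥1970, <1980 | ≥1980   | <1960    | ≥1960    | All    | All    | All    | All    | All     |
| Calib (5273) | m  | 0.13    | 0.61         | 0.04         | 0.01    | 0.04     | 0.03     | 0.09   | 0.01   | 0.04   | 0.01   | 0.00    |
|              | sd | 0.07    | 0.17         | 0.04         | 0.04    | 0.04     | 0.05     | 0.12   | 0.08   | 0.04   | 0.05   | 0.01    |
| DEA (2581)   | m  | 0.13    | 0.64         | 0.05         | 0.01    | 0.04     | 0.02     | 0.07   | 0.00   | 0.04   | 0.01   | 0.00    |
|              | sd | 0.03    | 0.08         | 0.02         | 0.03    | 0.03     | 0.02     | 0.06   | 0.00   | 0.02   | 0.01   | 0.00    |
| Others (97)  | m  | 0.07    | 0.36         | 0.02         | 0.00    | 0.09     | 0.08     | 0.05   | 0.25   | 0.04   | 0.03   | 0.02    |
|              | sd | 0.03    | 0.14         | 0.02         | 0.01    | 0.05     | 0.07     | 0.04   | 0.13   | 0.03   | 0.06   | 0.02    |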
Brown Swiss and Original Braunvieh
In the BS population, the two American base groups (US_{b1} and US_{b2}) represented between 80 % and 90 % of the overall genetic makeup of the genotyped population (Table 8). No differences in US contributions were detected between the training set (Calib) and the validation animals that were assigned to the DEA validation set, and only a slight increase in US contributions was found in the others validation set. The few validation animals that were unequivocally assigned to the OB group showed a marked difference in this respect, with no contributions at all from the US base groups. Standard deviations of contributions for training animals (Calib) were also highest for the two US groups. Comparing standard deviations of all contributions between Calib and validation groups showed that the validation animals tended to have less variation, again except for the OB group.
Fleckvieh
In the FV breed, the second German base group (DE_{b2}) had the largest contribution to all validation groups (Table 9). Its average contribution exceeded 0.60 for the Calib training set and the DEA validation set and was still considerable (0.36) for the others validation set. The contribution of the Czech group (CZ_{b}) to the others validation set was relatively high (0.25).
As for BS/OB, we found similar average contributions to Calib and DEA across all base groups, and decreasing standard deviations in base group contributions when comparing Calib to DEA, which indicates an ongoing equalization of contributions.
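For readers unfamiliar with how base group contributions are obtained, the sketch below shows one common way to build Q from a pedigree and to summarize it by column means and standard deviations, as in the two tables above. It assumes the usual recursion in which an animal receives half of each parent's contributions and an unknown parent counts fully towards its assigned base group. The pedigree is invented, the function name is hypothetical, and only the group labels are borrowed from the BS/OB analysis.

```python
import numpy as np

def group_contributions(pedigree, groups):
    """Build Q, the matrix of base group contributions.

    pedigree: list of (sire, dam) tuples, ordered so that parents precede
    their offspring. Each parent is either the row index of a known animal
    (int) or the name of a base group (str) for an unknown parent.
    """
    col = {g: j for j, g in enumerate(groups)}
    Q = np.zeros((len(pedigree), len(groups)))
    for i, parents in enumerate(pedigree):
        for p in parents:
            if isinstance(p, str):
                Q[i, col[p]] += 0.5       # unknown parent: half from its base group
            else:
                Q[i] += 0.5 * Q[p]        # known parent: half of its contributions
    return Q

groups = ["US_b1", "OB_b1"]
pedigree = [
    ("US_b1", "US_b1"),   # animal 0: both parents unknown, US base group
    ("OB_b1", "OB_b1"),   # animal 1: both parents unknown, OB base group
    (0, 1),               # animal 2: cross of animals 0 and 1
    (2, "US_b1"),         # animal 3: sire 2, dam unknown and assigned to US base
]

Q = group_contributions(pedigree, groups)
print(Q)
print("m :", Q.mean(axis=0))
print("sd:", Q.std(axis=0, ddof=1))
```

Each row of Q sums to 1, which is also the reason why a design matrix that combines an intercept with Q does not have full column rank.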
Discussion
In conventional methods for estimating breeding values, phantom parent groups are used in most practical applications. The reason for this is that the theoretical base population is rarely correctly represented in the available pedigree. The same is of course true for genomic evaluation models. Stratification of the population can be easily determined by F_{st} plots.
Concept and implementation
The decomposition of the standard G-matrix that we propose here is primarily an analytical tool. It allows studying the following aspects in some detail: (i) whether and how differences in allele frequencies between base groups contribute to the proportion of genetic variance explained by differences between base groups; and (ii) how the effects estimated for the base groups influence the current population and their genomic predictions. Conceptually, it follows the classical approach for modeling base groups in genetic evaluations and extends it to the GBLUP case. More fundamentally, it theoretically shows that parts of the genetic variation represented by the G-matrix can be assigned to systematic differences in allele frequencies between base populations. This implies that standard GBLUP is equivalent to a model that fits random genetic groups, where differences in group means are modeled as part of the natural additive-genetic variance (assumed to be known in the present investigation). Recently, Makgahlela et al. [13] showed that, in the case of the largely admixed Nordic Red population, a model that fits a fixed genetic group has some advantage in terms of the reliability of DGV over the standard GBLUP model. Modeling groups as fixed might be advantageous if true differences between groups are larger than what can be attributed to differences in allele frequencies of genetic markers. This can arise from inconsistent linkage disequilibrium phases between quantitative trait loci (QTL) and markers between subpopulations or breeds, or from different QTL segregating within groups. Both aspects have been used in the past to explain why across-breed genomic predictions based on 50 k genotypes have low accuracy [36–38].
As in the classical approach for modeling base groups, we assigned base animals to groups and calculated a matrix of genetic contributions Q using standard methodology. This matrix Q was then used to estimate average allele frequencies using mixed-model methodology, as described by Gengler et al. [21]. As mentioned in the Methods section, estimation of average allele frequencies in base groups is not essential for the proposed decomposition of G. However, it provides a convenient way to integrate new animals under practical conditions. Conceptually, it divides the genetic distance between any pair of animals into two parts, i.e. a distance that already exists in the base population and a distance that originates from the history of the breed as documented by the known pedigree. Moreover, estimating allele frequencies in base groups from subsets of genotypes may lead to similar problems as in standard applications of models that fit genetic groups, i.e., if the amount of data is not sufficient to estimate allele frequencies in base groups reliably, it can result in a loss of accuracy and introduction of bias [39]. This trade-off between defining all possible relevant base groups and estimability therefore needs to be taken into account. A closer examination of the required size and properties for an optimal design of base groups is beyond the scope of this paper.
Group effects were not accounted for when deregressing MACE breeding values for BS/OB animals because (i) group effects or group contributions are usually not reported to Interbull by the participating countries; (ii) Interbull introduces its own group categorizations based on birth year of bull dams for MACE evaluation; and (iii) Interbull does not report group effects or group contributions back to the participating countries. Because of these limitations, we cannot exclude that our results for BS/OB animals may be influenced in one way or the other by the properties of MACE breeding values.
Since we tested different models only in a single forward prediction, the generalization of our results is not straightforward. However, from a practical point of view, the steps that we followed allowed us to better characterize the genetic composition of the validation groups. This in turn might help to decide if a standard GBLUP model is sufficient or whether a different model should be preferred. However, modeling genetic groups in any of the proposed ways is neither intended nor expected to improve the prediction for a standard animal with a pedigree that has many generations and that is sufficiently complete. Predictions for an animal with an incomplete pedigree or a limited number of genotyped ancestors should, however, benefit from the inclusion of group effects in one form or the other.
Models
We compared three models, which treated effects of base groups as random (M0), as fixed (M2), or ignored them completely (M1). Model M1 consistently showed the lowest R^{2} values across both breeds and all traits. This was expected, since ignoring part of the genomic information should not result in increased predictive ability. However, it is interesting to note that the segregation term itself results in a relatively good prediction. Using M1, we observed differences in the decrease of the model R^{2} between traits, with the UD trait being the least influenced by G_{A}^{*}. We cannot exclude that there might be cases where omission of base groups will increase the R^{2} of predictions. However, the slopes of the regression of current DYD or deregressed proofs on DGV that we used as a test statistic here gave no indication that omitting G_{A}^{*} without adjusting the genetic variance could lead to less inflated estimates. Recently, Makgahlela et al. [14] compared predictions using a genomic relationship matrix based on average allele frequencies across breeds with predictions using breed-specific allele frequencies in the Nordic Red dairy cattle population. This comparison is conceptually quite close to what we did in the comparison between the reduced model (M1) and the fixed model (M2). The authors found a smaller predictive power and greater inflation of DGV when considering breed-specific allele frequencies. Since using breed-specific allele frequencies without modeling differences in allele frequencies in the base population is equivalent to our reduced model (M1), in this respect, their results are consistent with those presented here.
In terms of predictive power, M2 was better than M0 for all milk traits and one conformation trait for the BS/OB data (Table 5). With the FV data, we saw a clear advantage of M0 for all traits. In a preliminary study [40], we had reported that the OB and current BS populations were separated by a fairly large genetic distance. The validation BS/OB group that we used here included only very few OB animals. The observed genetic distance and the fact that this group of animals is small compared to the overall validation group might explain the small superiority of M2 observed for the BS/OB data. Genetic distances of similar magnitude were not detected in the FV population, for which M0 was clearly the best model. However, the German-Austrian cooperation for genetic evaluations in FV [22] recently fully opened the routine evaluations for the Czech population, which shows some differences in genetic composition compared to the current German-Austrian breeding population (Table 9). Additional investigations will be necessary to verify if M0 is still superior with an extended base population that will very likely be the result of this extended cooperation.
Genetic contributions and base group effects
Analysis of the matrix of genetic contributions Q revealed some interesting features. On the one hand, the average contributions of genetic groups to current animals showed that US animals had a strong impact on the current BS population in Europe. On the other hand, we found a substantial contribution of the “old” European base group (EU _{ b }) to the OB validation group. Averages and standard deviations of contributions are also an indirect indicator of how accurately base allele frequencies and base group effects can be estimated from the current data. However, since the information in Q naturally implies some degree of collinearity, this factor also has to be taken into account. Finally, differences in trait means between base groups can only be detected if there is enough variation in base group contributions within the training set (Calib). Such variation was observed for both breeds and was considerably smaller for the dominant groups of the validation set. This was expected since, in the last 20 years, much less migration has occurred in both populations, which probably resulted in less admixture in the more recent groups. Although this was not the primary focus of this investigation, it was interesting to note the extremely strong genetic contribution of American Brown Swiss animals to the current BS population. The validation group OB was clearly an exception, in the sense that a small or even non-existent contribution of American Brown Swiss cattle defines what an OB animal is. In contrast, the strong contribution of the DE_{b2} group to the FV population seems to be an artifact of the completeness of the pedigree used, i.e. most of the pedigrees traced back to this base group.
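The summaries of Q discussed above (average and standard deviation of each base group's contribution, plus an indication of collinearity between groups) could be computed along the following lines. This is a sketch under stated assumptions: Q is taken to be an animals-by-groups matrix whose rows sum to 1, and the function name, the group labels and the simulated contributions are hypothetical and not taken from the data analyzed here.

```python
import numpy as np


def summarize_contributions(Q, group_names):
    """Mean and SD of base group contributions, and their correlations.

    Q is assumed to be an (animals x base groups) matrix whose rows sum
    to 1. Returns ({group: (mean, sd)}, correlation matrix).
    """
    Q = np.asarray(Q, dtype=float)
    if not np.allclose(Q.sum(axis=1), 1.0):
        raise ValueError("each row of Q should sum to 1")

    means = Q.mean(axis=0)
    sds = Q.std(axis=0, ddof=1)          # little variation -> group effect hard to estimate
    corr = np.corrcoef(Q, rowvar=False)  # strong correlations indicate collinearity
    summary = {name: (m, s) for name, m, s in zip(group_names, means, sds)}
    return summary, corr


# Toy contributions (hypothetical): one dominant group and two minor groups
rng = np.random.default_rng(0)
Q = rng.dirichlet(alpha=[5.0, 1.0, 1.0], size=100)
summary, corr = summarize_contributions(Q, ["group_1", "group_2", "group_3"])
for name, (m, s) in summary.items():
    print(f"{name}: mean={m:.2f}, sd={s:.2f}")
```

Running the same summary separately for the training set and for each validation group makes the contrast in admixture between older and more recent animals directly visible.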
For both breeds and for the traits analyzed here, it was possible to estimate significant differences between the means of base groups in most cases (Tables 6 and 7). Treating base groups as fixed or random resulted in similar patterns, although they were more pronounced in the case of fixed effects. The observed effects were quite consistent with our expectations and seem reasonable when considering the limits imposed on estimability and precision by the collinearity and dependencies in Q (Q does not have full column rank). For example, the two Holstein base groups in the FV dataset had a clear advantage for protein yield, which is not surprising since Holstein bulls were introgressed for exactly that reason. In some cases, such as the advantage found for the DE_{b} group in BS, it was helpful for interpretation to know that the base group definition for DE_{b} also comprised relatively young base animals, whereas assignment to American Brown Swiss was linked more closely to a specific period further back in the history of the breed.
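One source of the dependencies mentioned above can be illustrated with a small sketch. If every row of Q sums to 1, the columns of Q add up to a column of ones, so a design matrix containing both an overall mean and Q is rank deficient; individual group effects are then not uniquely determined, but differences between groups still are. The simulated Q, the group means and the noise-free records are assumptions chosen for clarity, and real data may contain further dependencies that this toy example does not show.

```python
import numpy as np

rng = np.random.default_rng(1)
n_animals, n_groups = 60, 3

# Toy contributions (hypothetical); each row sums to 1
Q = rng.dirichlet(alpha=[4.0, 2.0, 1.0], size=n_animals)
g_true = np.array([10.0, 4.0, -3.0])  # hypothetical base group means

# Design matrix with an overall mean and the group contributions
X = np.column_stack([np.ones(n_animals), Q])
rank = np.linalg.matrix_rank(X)  # one less than the number of columns

# Noise-free records, so that the comparison below is exact
y = Q @ g_true

# lstsq returns the minimum-norm solution of the rank-deficient system
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Individual effects depend on the chosen solution, contrasts do not
est_contrasts = b_hat[1:] - b_hat[1]
true_contrasts = g_true - g_true[0]
print("columns:", X.shape[1], "rank:", rank)
print("estimated contrasts:", np.round(est_contrasts, 3))
print("true contrasts:     ", np.round(true_contrasts, 3))
```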
Both the distribution of genetic contributions and the precision of base group effects emphasize that, when genetic grouping is considered in genetic evaluation models, the question of estimability and relevance for the current population should always be addressed [39]. However, as already noted above, there is little reason to expect that the choice of model has a strong impact on predictive power if the animals used for validation show no differences in their genetic composition with respect to the base groups and if the majority of them have complete pedigrees of sufficient depth.
Additional considerations
This investigation demonstrates that, in many cases, the genomic relationship matrix includes an important component of variation that has no counterpart in the conventional numerator relationship matrix. However, many practical applications of the estimation of GBV include a step for scaling the genomic relationship matrix to the numerator relationship matrix to set them on the same genetic base (see for example [41]). Based on our results, it seems more suitable to do this scaling based on matrix G _{ S } only. This component of the G-matrix should be free of the effects of systematic differences in allele frequencies between base groups (represented in G _{A} ^{*} ), which might otherwise complicate the derivation of correct scaling factors. This issue was also raised by Makgahlela et al. [14] and might be of special importance for applications of one-step genomic evaluations [16, 17, 42, 43]. Furthermore, it suggests that estimating genetic parameters for genomic evaluations using G _{ T } might be preferred over a simple transfer of the parameters estimated with the numerator relationship matrix.
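As an illustration of what scaling based on G _{ S } only could look like, the sketch below applies one commonly used linear adjustment, a + b·G, in which a and b are chosen so that the average diagonal and the average of all elements match those of the numerator relationship matrix of the genotyped animals. This particular rule is an assumption made for illustration, not necessarily the scaling meant above, and the function name and toy matrices are hypothetical.

```python
import numpy as np


def scale_to_pedigree(G_s, A):
    """Return (a + b * G_s, a, b) so that the mean diagonal and the mean
    of all elements of the scaled matrix equal those of A.

    G_s: segregation part of the genomic relationship matrix.
    A:   numerator relationship matrix of the same animals.
    """
    G_s = np.asarray(G_s, dtype=float)
    A = np.asarray(A, dtype=float)

    # Two linear equations in the two unknowns a and b
    lhs = np.array([[1.0, np.mean(np.diag(G_s))],
                    [1.0, np.mean(G_s)]])
    rhs = np.array([np.mean(np.diag(A)), np.mean(A)])
    a, b = np.linalg.solve(lhs, rhs)
    return a + b * G_s, a, b


# Toy example (hypothetical): G_s is a shifted and shrunken version of A
A = 0.9 * np.eye(4) + 0.1 * np.ones((4, 4))
G_s = (A - 0.05) / 1.1
G_scaled, a, b = scale_to_pedigree(G_s, A)
print(f"a={a:.3f}, b={b:.3f}")
```

Applying the same function to the full G instead of G _{ S } would let differences in base allele frequencies enter both averages, which is the situation that the argument above suggests avoiding.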
Possible extensions of M0 were beyond the scope of this paper. Examples are an individual λ for group effects or, in the most general form, an identity matrix in place of G _{ A } combined with an individual λ for group effects (e.g. [39]). In addition, these extensions would require the estimation of a variance component for groups, which would be difficult because of the typically small number of degrees of freedom for the variance between group means. Using G _{ A } but assuming an individual λ for group effects is also somewhat questionable from a conceptual point of view, since it would be necessary to describe the covariance between and within subpopulations based on the same distance between allele frequencies but with different genetic variances.
Conclusions
We showed that the proposed decomposition of the G-matrix is helpful to examine the relative importance of base group and segregation effects in a dataset. Using the common genomic relationship matrix G is equivalent to our model M0, in which base groups and segregation terms are considered as random effects with the same genetic variance. Although it is interesting to examine contributions of different founder populations from a scientific point of view, we also conclude that the standard model M0 is preferred in many cases, e.g. if base group effects are small or difficult to estimate, or if the current population is homogeneous with balanced base group contributions. However, a fixed model (M2) might be preferred if base group effects are large (i.e. in the range of differences between breeds rather than between subpopulations) or if the genomic evaluation comprises two or more separated populations with only weak genetic links.
Declarations
Acknowledgments
We thank the contributors of the genotype pool Germany-Austria as well as the Intergenomics consortium for providing the genotypes. We gratefully acknowledge the Arbeitsgemeinschaft Süddeutscher Rinderzucht und Besamungsorganisationen e.V. for their financial support within the research cooperation “Zukunftswege”. Furthermore, we wish to thank the editors JCM Dekkers and H Hayes as well as two anonymous reviewers for their helpful suggestions to improve the final manuscript.
References
1. Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29.
2. Habier D, Fernando RL, Garrick DJ. Genomic BLUP decoded: a look into the black box of genomic prediction. Genetics. 2013;194:597–607.
3. Ibáñez-Escriche N, Fernando RL, Toosi A, Dekkers JCM. Genomic selection of purebreds for crossbred performance. Genet Sel Evol. 2009;41:12.
4. Harris BL, Johnson DL. Genomic predictions for New Zealand dairy bulls and integration with national genetic evaluation. J Dairy Sci. 2010;93:1243–52.
5. Erbe M, Hayes BJ, Matukumalli LK, Goswami S, Bowman PJ, Reich CM, et al. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J Dairy Sci. 2012;95:4114–29.
6. Caballero A, Toro MA. Analysis of genetic diversity for the management of conserved subdivided populations. Conserv Genet. 2002;3:289–99.
7. Álvarez I, Royo LJ, Gutiérrez JP, Fernández I, Arranz JJ, Goyache F. Relationship between genealogical and microsatellite information characterizing losses of genetic variability: Empirical evidence from the rare Xalda sheep breed. Livest Sci. 2008;115:80–8.
8. Thompson R. Sire evaluation. Biometrics. 1979;35:339–53.
9. Quaas RL, Pollak EJ. Modified equations for sire models with groups. J Dairy Sci. 1981;64:1868–72.
10. Westell RA, Quaas RL, Van Vleck LD. Genetic groups in an animal model. J Dairy Sci. 1988;71:1310–8.
11. García-Cortés LA, Toro MA. Multibreed analysis by splitting the breeding values. Genet Sel Evol. 2006;38:601–15.
12. Makgahlela ML, Mäntysaari EA, Strandén I, Koivula M, Sillanpää MJ, Nielsen US, et al. Across breed multi-trait random regression genomic predictions in the Nordic Red dairy cattle. Interbull Bull. 2011;44:42–6.
13. Makgahlela ML, Mäntysaari EA, Strandén I, Koivula M, Nielsen US, Sillanpää MJ, et al. Across breed multi-trait random regression genomic predictions in the Nordic Red dairy cattle. J Anim Breed Genet. 2013;130:10–9.
14. Makgahlela ML, Strandén I, Nielsen US, Sillanpää MJ, Mäntysaari EA. Using the unified relationship matrix adjusted by breed-wise allele frequencies in genomic evaluation of multibreed population. J Dairy Sci. 2014;97:1117–27.
15. Tsuruta S, Misztal I, Lourenco DAL, Lawlor TJ. Assigning unknown parent groups to reduce bias in genomic evaluations of final score in US Holstein. J Dairy Sci. 2014;97:5814–21.
16. Vitezica ZG, Aguilar I, Misztal I, Legarra A. Bias in genomic predictions for populations under selection. Genet Res. 2011;93:357–66.
17. Christensen OF. Compatibility of pedigree-based and marker-based relationship matrices for single-step genetic evaluation. Genet Sel Evol. 2012;44:37.
18. Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution. 1984;38:1358–70.
19. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2:e190.
20. Zou F, Lee S, Knowles MR, Wright FR. Quantification of population structure using correlated SNPs by shrinkage principal components. Hum Hered. 2010;70:9–22.
21. Gengler N, Mayeres P, Szydlowski M. A simple method to approximate gene content in large pedigree populations: applications to the myostatin gene in dual-purpose Belgian Blue cattle. Animal. 2007;1:21–8.
22. Edel C, Schwarzenbacher H, Hamann H, Neuner S, Emmerling R, Götz KU. The German-Austrian genomic evaluation system for Fleckvieh (Simmental) cattle. Interbull Bull. 2011;44:152–6.
23. Schaeffer LR. Multiple-country comparison of dairy sires. J Dairy Sci. 1994;77:2671–78.
24. Garrick DJ, Taylor JF, Fernando RL. Deregressing estimated breeding values and weighting information for genomic regression analyses. Genet Sel Evol. 2009;41:55.
25. Mäntysaari E, Liu Z, VanRaden PM. Interbull validation test for genomic evaluations. Interbull Bull. 2010;41:17–22.
26. Interbull CoP. Appendix VIII - Interbull validation test for genomic evaluations – GEBV test. 2013. https://wiki.interbull.org/public/CoPAppendixVIII?action=print&rev=44. Accessed 12 June 2014.
27. Fikse WF, Banos G. Weighting factors of sire daughter information in international genetic evaluations. J Dairy Sci. 2001;84:1759–67.
28. Habier D, Tetens J, Seefried FR, Lichtner P, Thaller G. The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genet Sel Evol. 2010;42:5.
29. International Organization for Standardization. Codes for the representation of names of countries and their subdivisions – Part 1: Country codes. 3rd ed. Geneva: ISO copyright office; 2013.
30. VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.
31. Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG. Measures of human population structure show heterogeneity among genomic regions. Genome Res. 2005;15:1468–76.
32. Chen C, Durand E, Forbes F, François O. Bayesian clustering algorithms ascertaining spatial population structure: A new computer program and a comparison study. Mol Ecol Notes. 2007;7:747–56.
33. Mrode RA. Linear models for the prediction of animal breeding values. 2nd ed. Oxfordshire: CABI Publishing; 2005.
34. Quaas RL. Additive genetic model with groups and relationships. J Dairy Sci. 1988;71:1338–45.
35. Nei M. Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci USA. 1973;70:3321–3.
36. Harris BL, Johnson DL, Spelman RJ. Genomic selection in New Zealand and the implications for national genetic evaluation. In: Proceedings of the 36th International Committee for Animal Recording Biennial Session: 16–20 June 2008; Niagara Falls. 2009:325–30.
37. Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME. Invited review: Genomic selection in dairy cattle: Progress and challenges. J Dairy Sci. 2009;92:433–43.
38. de Roos APW, Hayes BJ, Goddard ME. Reliability of genomic predictions across multiple populations. Genetics. 2009;183:1545–53.
39. Phocas F, Laloë D. Should genetic groups be fitted in BLUP evaluation? Practical answer for the French AI beef sire evaluation. Genet Sel Evol. 2004;36:325–45.
40. Plieschke L, Edel C, Pimentel E, Emmerling R, Bennewitz J, Götz KU. Influence of foreign genotypes on genomic breeding values of national candidates in Brown Swiss. In: Proceedings of the 10th World Congress of Genetics Applied to Livestock Production: 17–22 August 2014; Vancouver. https://asas.org/docs/defaultsource/wcgalpproceedingsoral/078_paper_8984_manuscript_342_0.pdf?sfvrsn=2. Accessed 12 June 2014.
41. Meuwissen THE, Luan T, Woolliams JA. The unified approach to the use of genomic and pedigree information in genomic evaluations revisited. J Anim Breed Genet. 2011;128:429–39.
42. Legarra A, Aguilar I, Misztal I. A relationship matrix including full pedigree and genomic information. J Dairy Sci. 2009;92:4656–63.
43. Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J Dairy Sci. 2010;93:743–52.
44. Rao CR. Least-squares theory using an estimated dispersion matrix and its application to measurement of signals. In: Proceedings of the fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press; 1967. 1:355–72.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.