Bayes factor for testing between different structures of random genetic groups: A case study using weaning weight in Bruna dels Pirineus beef cattle

The implementation of genetic groups in BLUP evaluations accounts for different expectations of breeding values in base animals. Notwithstanding, many feasible structures of genetic groups exist and there are no analytical tools described to compare them easily. In this sense, the recent development of a simple and stable procedure to calculate the Bayes factor between nested competing models allowed us to develop a new approach of that method focused on compared models with different structures of random genetic groups. The procedure is based on a reparameterization of the model in terms of intraclass correlation of genetic groups. The Bayes factor can be easily calculated from the output of a Markov chain Monte Carlo sampling by averaging conditional densities at the null intraclass correlation. It compares two nested models, a model with a given structure of genetic groups against a model without genetic groups. The calculation of the Bayes factor between different structures of genetic groups can be quickly and easily obtained from the Bayes factor between the nested models. We applied this approach to a weaning weight data set of the Bruna dels Pirineus beef cattle, comparing several structures of genetic groups, and the final results showed that the preferable structure was an only group for unknown dams and different groups for unknown sires for each year of calving.


INTRODUCTION
The best linear unbiased prediction (BLUP) assumes that the base population is unselected animals sampled from a normal distribution with a zero mean and a variance equal to the genetic variance [11]. Nevertheless, it implies an extensive knowledge of the pedigree of our livestock, which is often impossible in our experimental and commercial farms. The acquisition of selected animals with a short or null pedigree known, or the loss of genealogical information in our populations, are usual situations in livestock production and they may lead to an underestimation of the genetic trend and to a biased prediction of breeding values [20]. The inclusion of genetic groups in BLUP evaluation [22,32] accounts for differences in genetic values of base animals, overcoming the assumption of equality of expectations of breeding values across geographical origins or the temporal scale [12]. Moreover, it may be advantageous to consider genetic groups as random effects, especially in situations with low heritabilities [26] or when small group sizes can not be avoided [3]. Including genetic groups in the evaluation leads to an unbiased estimation of differences between those groups, but also leads to less accurate estimated breeding values due to an increased parameterization of the model [20]. In this sense, appropriate analytical tools to determine the preferable structure of the genetic groups become essential in the selection programs of our livestock.
Under the Bayesian framework, hypothesis testing is usually analyzed by calculating the Bayes factor, the ratio between the marginal probabilities of the data given the tested model, and after integrating out all parameters in the model [14]. This methodology suffers from disadvantages due to its complexity of computation in complex models or its strong dependence on the assumed prior distributions [14]. Notwithstanding, a simple and stable Bayes factor procedure has been described to test between nested models that only differ in a bounded variable [4,28]. This methodology shows an important advantage in terms of dependence to the prior distributions for all parameters, with the only exception of the boundary variable, because they are the same in both competing models and then, they are cancelled in the final calculation [28]. Taking this as a starting point, we developed a new approach to test between different structures of random genetic groups. Our methodology provides a Bayes factor comparison between all pairs of structures of genetic groups, as well as with the model without genetic groups, and it requires an only Bayes factor analysis for each structure of groups, greatly reducing the computational and temporal demands. The described method has been tested on a weaning weight data set of the Bruna dels Pirineus beef cattle breed with encouraging results.

Bayes factor between models with and without genetic groups
Now, we present a Bayes factor for a model containing genetic groups over a non-genetic groups model. Taking as the starting point a data set with n phenotypic records coming from m individuals, we assumed the following standard model (Model 1) with genetic groups as defined by Westell et al. [32]: where y contains the n phenotypic records, X is the incidence matrix of systematic effects (b), Z 1 is the incidence matrix of the permanent environmental effects (p) with k levels, Z 2 is the incidence matrix relating observations to additive genetic effects (a) and genetic groups (Qg), and e is the vector of residuals. Note that the q i j element of Q is a fraction relating the contribution of the jth genetic group to the total genetic value of the ith individual, and g is the column vector of order j containing the effects of genetic groups [32]. Based on the results from the infinitesimal model, p, a and e are assumed normally distributed: and, without loss of generality, g is also assumed normally distributed to provide a straightforward implementation of the Bayes factor: A being the m × m numerator relationship matrix, I p being an identity matrix with dimensions k × k, I e being an identity matrix with dimensions n × n, I g being an identity matrix with dimensions j × j, and σ 2 p , σ 2 a , σ 2 e and σ 2 g being the permanent environmental, additive genetic, residual and between genetic group variances, respectively. Model 1 can be reparameterized following the standard procedure defined by Varona et al. [28] as: Consequently, where ρ 2 g = σ 2 g /σ 2 e * is the intraclass correlation, and σ 2 e * = σ 2 g + σ 2 e is the variance of e * .
The joint distribution of all variables in Model 1 is: where p ∼ N 0, I p σ 2 p a ∼ N 0, Aσ 2 a e ∼ N 0, I e σ 2 e * . Then, the joint distribution of records and parameters is: where we can assume that prior distributions p 2 (b), p 2 p σ 2 p , p 2 σ 2 p , p 2 a Aσ 2 a , p 2 σ 2 a , and p 2 σ 2 e * are identical to the prior distributions of the previous model. And the likelihood of the model is: According to Varona et al. [28], only the analysis with the complex model (Model 1) is required to calculate the Bayes factor between Model 1 and Model 2 (BF 1,2 ). Following García-Cortés et al. [4] and Varona et al. [28]: because ρ 2 g = 0 = 1, or simultaneously: A BF 1,2 (BF 2,1 ) > 1 (<1) indicates that the model with between genetic group variance is more suitable. On the contrary, a BF 2,1 (BF 1,2 ) < 1 (>1) indicates that the model without genetic groups is more probable. Sampling from the conditional distribution of ρ 2 g can be performed using a Gibbs sampler [5], with a Metropolis-Hastings step [10]. Density p 1 ρ 2 g = 0 |y suffices to obtain BF and this value can be obtained from the Gibbs sampler output by averaging the full conditional densities of each cycle at ρ 2 g = 0 using the Rao-Blackwell argument [4,28].

Testing between different structures of genetic groups
The previously described Bayes factor procedure allows for a fast comparison between different structures of genetic groups. Assuming that there are several different structures of genetic groups to be tested, BF 1(p),2 is the Bayes factor between the model with the pth structure of genetic groups (Model 1(p)) and the model without genetic groups (Model 2). In this sense, the Bayes factor between the models with the pth and the qth structures of genetic groups is easily obtained from:

Bruna dels Pirineus weaning weight analysis
The analysis described above was applied to data from field records on weaning weight of the Bruna dels Pirineus breed, a local beef cattle breed located in the mountainous areas of Catalonia (northeastern Spain). The trait analyzed was weight standardized at 185 days of age (weaning weight in this population). Standardization was accomplished following BIF guidelines [1]. The available data consisted of 644 records registered in a herd between the years 1991 and 2003. Following Quintanilla et al. [23], the assumed operational model included the sex of the calf (male or female), the year of calving (years 1991 and 1992 were grouped due to the reduced number of calves born in 1992; Tab. I), and the age of the dam, with 6 categories (2, 3, 4, 5, 6 and >6 years), as systematic effects, as well as the permanent environmental effect characterized by the dam and the additive genetic effect of each calf as random sources of variation. The loss of pedigree information throughout the period analyzed allowed us to define different structures of genetic groups according to the sex of the ancestor and/or the year of calving: (i) a group for unknown sires and a group for unknown dams (S 1 D 1 ); (ii) a different genetic group for each year of calving without differentiating between sires and dams (YR); (iii) an only genetic group for unknown dams and a different genetic group for unknown sires within each year of calving (S YR D 1 ); (iv) an only genetic group for unknown sires and a different genetic group for unknown dams within each year of calving (S 1 D YR ); and (v) a different genetic group for each year of calving and separated groups for unknown sires and dams (S YR D YR ). With respect to genealogical information, the pedigree file consisted of 999 animals, 644 of them with weaning weight records (64.5%), with all dams and 84.0% of sires known for animals with phenotypic information (Tab. I). A total of 28 sires and 344 dams were included in the pedigree, with an average of 25.1 descendants per sire, and 19.3 of them with a registered weaning weight.
Our Bayes factor analysis was performed for each structure of genetic groups with a total of 12 500 iterations, and 10 000 iterations were used after discarding the first 2500 as burn-in. The analysis of convergence and the calculation of effective chain size followed the algorithms by Geyer [6] and Raftery and Lewis [24]. All correlated samples were used to calculate the posterior distribution of the intraclass correlation using the ergodic property of the chain [7].

Comparison with a likelihood ratio test
In order to compare our results with a standard frequentist approach, a likelihood ratio test (LRT) was calculated between each model with random genetic groups against the model without genetic groups according to the following expression: HPD = Highest posterior density at 95%; σ 2 e * = σ 2 e + σ 2 g ; h 2 = σ 2 a / σ 2 a + σ 2 p + σ 2 e * ; S 1 D 1 : a genetic group for unknown sires and a genetic group for unknown dams; YR: a different genetic group for each year of calving (sires and dams together); S YR D 1 : an only genetic group for unknown dams and a different genetic group for unknown sires within each year of calving; S 1 D YR : an only genetic group for unknown sires and a different genetic group for unknown dams within each year of calving; S YR D YR : a different genetic group for each year of calving and separated groups for unknown sires and dams.
where L 1 b ,p,â,σ 2 p ,σ 2 a ,σ 2 e * ,ρ 2 g was the likelihood under Model 1 at maximum likelihood estimates b ,p,â,σ 2 p ,σ 2 a ,σ 2 e * ,ρ 2 g and L 2 b ,p,â,σ 2 p ,σ 2 a ,σ 2 e * was the likelihood under Model 2 at the maximum likelihood estimates of that model. We tested those LRT values with a χ 2 -square distribution of 1 degree of freedom [28]. Maximum likelihood estimates were obtained through a simplex algorithm [21]. Unfortunately, comparisons by LRT between models with different structures of genetic groups could not be performed since this approach requires hierarchical models to be tested [17].

RESULTS
Bruna dels Pirineus calves reached an average weaning weight of 243.2 kg with a phenotypic variance of 758.1 kg. The number of required genetic groups differed greatly depending upon the structure used. Whereas the minimum number was achieved with the S 1 D 1 structure, a group for unknown sires and a group for unknown dams, 37 genetic groups were defined for the S YR D YR structure, with a reduced number of animals directly assigned to several groups (Tab. I). The remaining structures required between 16 (S 1 D YR ) and 23 genetic groups (S YR D 1 ). Estimated variance components for each model are shown in Table II. The mode of the additive genetic variance ranged between 161.7 (S YR D 1 ) and 193.5 (YR), whereas the permanent environmental variance fluctuated between 47.1 (S 1 D YR ) and 64.9 (S 1 D 1 ). The residual variance of Model 1, which included residuals and genetic groups after reparameterization of the model, showed the largest estimates, ranging from 502.8 (YR) to 544.4 (S YR D 1 ). Heritability (h 2 ) ranged from 0.213 (S YR D 1 ) to 0.256 (YR), with a permanent environmental coefficient (c 2 ) close to 0.07 (Tab. II). The Bayes factor analyses showed that only the models with the YR (BF = 1.42) and the S YR D 1 (BF = 11.23) structures of genetic groups were preferable to the model without genetic groups (Tab. III). For the remaining structures, genetic groups were discharged with BF values between 0.63 (S YR D YR ) and 0.83 (S 1 D 1 ). Comparable results were obtained with the likelihood ratio test, with significant differences observed for the YR (P < 0.05) and the S YR D 1 (P < 0.001) models. Notwithstanding, P-values for the remaining structures of genetic groups were lower or close to 0.1 (Tab. IV). In contrast to the likelihood ratio test, our approach allowed for a cross comparison of all structures of genetic groups among them, also including the model without genetic groups. In this sense, we obtained the BF estimate for the 30 possible combinations (by pairs) of the structures of genetic groups (Tab. III), with the effective analysis of five models only. The results showed that the most preferable structure of genetic groups was the S YR D 1 , with a substantial evidence (3.16 < BF < 10) in front of YR, and a strong evidence (10 < BF < 31.62) against the remaining structures, in accordance to the Jeffreys [13] levels of evidence. The value of the BF oscillated between 0.44 (S YR D 1 vs. S YR D YR ) and 2.27 (YR vs. S YR D YR ) between the remaining pairs of structures of genetic groups.
The effective size of the Markov chains for the intraclass correlation of genetic groups was close to 1000 for all analyses (Tab. IV). The intraclass correlations (ρ 2 g ) for the different structures obtained are shown in Table IV. Differences between the mode of the posterior density and the maximum likelihood estimated were minimal (±0.02). The largest ρ 2 g corresponded to S YR D 1 , with a mode of 0.26 and the highest posterior density interval at 95% (HPD95) ranged between 0.06 and 0.53. The remaining structures included the null correlation in their HPD95, with the only exception of the YR one (HPD95 = 0.01 to 0.40), with a mode of 0.13.

DISCUSSION
We developed a new approach to compare different structures of genetic groups in the context of the BLUP model taking as the starting point the García-Cortés' et al. [4] and Varona's et al. [28] Bayes factor approach to test between nested competing models. The use of this methodology allows for a fast and stable computation of the BF [4,28], in contrast to other numerical approximations to the BF or posterior probabilities such as the harmonic mean [18] or the Reversible jump Markov chain Monte Carlo [9]. The effective chain sizes (Tab. IV) were comparable with the ones reported by García-Cortés et al. [4] and it revealed the fast mixing of the Markov chains, with a burn-in period lower than 2500 cycles in all cases (results not shown). Moreover, assuming m different structures of genetic groups to test, we only have to effectively analyze m models, allowing for a fast and easy computation of the BF for the 2 m k=1 k possible combinations of models with different structures, including the model without genetic groups. It compares the advantage of a given model with any other one in terms of probability, without defining any null or alternative hypothesis, in contrast with LRT or other frequentist approaches. Moreover, LRT is only useful in testing hierarchical models. They can test a model with genetic groups against a model without genetic groups (Tab. IV), but the comparison between models with different structures of genetic groups is not feasible. In contrast with likelihood-based approaches for testing significance, Bayes factor shows several important advantages: (i) it includes all the information provided by the data after integrating out along the parameter space, not only conditioned by the maximum likelihood estimates; (ii) it does not need to invoke asymptotic assumptions, providing exact results even with small data sets [28]; (iii) it retains good behavior even when the hypothesis to be tested is close or at the boundary of the parameter space [4], where asymptotic properties of the likelihood ratio tests fail [25]; and (iv) it does not require one to define any null or alternative hypothesis model, providing a probability of both candidate models [14,16]. However, as the information increases, the probability of the preferable model also increases, as pointed out by García-Cortés et al. [4] and Varona et al. [29] within the scope of variance components and a QTL model, respectively.
The main disadvantage of the Bayesian approach is its dependence on the assumed prior distributions. Nevertheless, our BF approach assumed identical prior distributions for both competing models, with the only exception of the intraclass correlation, and they are cancelled in the final formula [28,30]. Ultimate methodology only includes the marginal prior and posterior distribution for the intraclass correlation and, as a consequence, the BF results are robust for modifications of this prior distribution. When the prior distribution of the intraclass correlation is modified, the posterior distribution changes in the same direction and, as a consequence, the BF remains stable [28].
The use of genetic groups in BLUP evaluation [22,32] is a useful tool to account for different expectations for breeding values in founders. Notwithstanding, there is no standard structure of genetic groups to be applied in the genetic evaluation of our livestock, and multiple assumptions can be done depending on several factors like the magnitude of the genetic trend in our population or the acquisition of foreign animals [8,19,20]. Several researches have shown that the need for grouping is greater with high differences between groups and could be counterproductive with small differences [15,20], but statistical methods specifically focused to assess the adequacy of different structures of genetic groups or the removal of genetic groups are unavailable. In this sense, our methodology becomes a useful tool to decide about the genetic groups and their preferable structure.
The Bruna dels Pirineus data set showed pedigree information loss distributed along the years and for both parents, with a 30.3% of animals in pedigree with at least one parent unknown (Tab. I), a percentage higher than the one described by Golden et al. [8] in Angus cattle. Pedigree losses are usual in extensive beef cattle where paternity information can be difficult to register and the acquisition of foreign animals, mainly bulls, is common. In this sense, the Bruna dels Pirineus data set is a suitable material to test our Bayes factor procedure on several feasible structures of genetic groups. Moreover, the reduced number of animals without known ancestors in some years does not allow the use of fixed genetic groups and requires a random approach for the grouping strategy [3], an important assumption in our approach. Average weaning weight was 243.2 kg, a value close to the ones reported by Quintanilla et al. [23] and Casellas and Piedrafita [2] in the Bruna dels Pirineus breed and comparable with the results obtained in Brown Swiss and Pirenaica [31]. The ratios of the variance components provided heritability estimates ranging between 0.213 and 0.256, similar to the values reported by Quintanilla et al. [23] in the same breed, and coefficients of permanent environment lower than 0.10, all of them comparable with the results of Quintanilla et al. [23].
The results obtained on the weaning weight mixed model showed that the S YR D 1 structure of genetic groups was preferable in Bruna dels Pirineus, an only group for unknown dams and different groups for unknown sires within each year of calving. The results obtained from the LRT were similar to the BF ones, in a similar way that Varona et al. [28] observed with the same methodology applied to Quantitative Trait Loci detection. It is important to note that the less (S 1 D 1 ) and the most (S YR D YR ) complex structures of genetic groups were penalized in our analysis (Tabs. III, IV), reaching a theoretical equilibrium between the model complexity and the need to account for differences between genetic groups. The clear advantage of the S YR D 1 structure suggested substantial differences in the average breeding value of unknown sires along years but not in dams. Indeed, the addition of a genetic group for unknown dams within each year of calving originated a 17.92 times less plausible model (Tab. III). The need of different genetic groups for sires is not a surprising result because, given the small number of sires used in cattle flocks, the replacement of some of them can imply important changes in their average breeding value. Moreover, this flock participated in the breeding and selection scheme of the Bruna dels Pirineus breed, allowing for a genetic trend on weaning weight that could increase genetic differences between animals of subsequent generations.
On the contrary, the advantage of an only group for unknown dams, in contrast with different groups according to the year, could be related with the high longevity of cows, with an average culling age of nine years [27]. This implies low annual replacement rates and therefore, a small and perhaps negligible change in the average breeding value of unknown dams in pedigree along years, just as suggested by the results of the BF and LRT analysis. Moreover, we knew the dam of all the calves with phenotypic record and the permanent environmental effect accounted for an alternative source of variation due to maternal effects (genetic or environmental) which were not constrained by the pedigree.
As a whole, our results showed that an accurate preliminary analysis becomes essential to decide if the inclusion of genetic groups is required and which is the best structure to account for different expectations in breeding values of base animals. In this context, our BF methodology allowed for a fast and easy comparison of all models, providing the probability for each candidate structure.