Genomic selection models for directional dominance: an example for litter size in pigs

Background The quantitative genetics theory argues that inbreeding depression and heterosis are founded on the existence of directional dominance. However, most procedures for genomic selection that have included dominance effects assumed prior symmetrical distributions. To address this, two alternatives can be considered: (1) assume the mean of dominance effects different from zero, and (2) use skewed distributions for the regularization of dominance effects. The aim of this study was to compare these approaches using two pig datasets and to confirm the presence of directional dominance. Results Four alternative models were implemented in two datasets of pig litter size that consisted of 13,449 and 11,581 records from 3631 and 2612 sows genotyped with the Illumina PorcineSNP60 BeadChip. The models evaluated included (1) a model that does not consider directional dominance (Model SN), (2) a model with a covariate b for the average individual homozygosity (Model SC), (3) a model with a parameter λ that reflects asymmetry in the context of skewed Gaussian distributions (Model AN), and (4) a model that includes both b and λ (Model Full). The results of the analysis showed that posterior probabilities of a negative b or a positive λ under Models SC and AN were higher than 0.99, which indicate positive directional dominance. This was confirmed with the predictions of inbreeding depression under Models Full, SC and AN, that were higher than in the SN Model. In spite of differences in posterior estimates of variance components between models, comparison of models based on LogCPO and DIC indicated that Model SC provided the best fit for the two datasets analyzed. Conclusions Our results confirmed the presence of positive directional dominance for pig litter size and suggested that it should be taken into account when dominance effects are included in genomic evaluation procedures. The consequences of ignoring directional dominance may affect predictions of breeding values and can lead to biased prediction of inbreeding depression and performance of potential mates. A model that assumes Gaussian dominance effects that are centered on a non-zero mean is recommended, at least for datasets with similar features to those analyzed here. Electronic supplementary material The online version of this article (10.1186/s12711-018-0374-1) contains supplementary material, which is available to authorized users.


Background
Since the availability of dense genotyping panels [1], genomic prediction [2,3] has become a very successful strategy for the prediction of breeding values of candidates for selection. Genomic prediction methods are based on the evaluation of the additive substitution effects of markers that capture a large part of the dominance and higher-order interaction effects [4]. However, estimating dominance effects may be relevant because their estimates can be used to allocate mates among candidates for selection [5,6].
Two approaches have been suggested to estimate the effects of dominance in genomic prediction methods. The first [7] directly models the additive (a) and dominance (d) effects, while for the second, Vitezica et al. [8] proposed to include allele substitution (α) and dominance deviation (δ) effects in order to compute appropriate breeding values. However, both these approaches impose a Gaussian regularization of additive and dominance effects that forces a symmetric distribution of the posterior estimates.
Nevertheless, the classical theory of quantitative genetics [9] argues that inbreeding depression and heterosis are based on the presence of directional dominance (i.e., a higher percentage of positive than negative dominance effects) and this contrasts with the assumption of symmetry of the above-described procedures. This discrepancy can be overcome in at least two ways: (1) by assuming that the mean of dominance effects differ from zero, which leads to the inclusion of a covariate for the average individual homozygosity in the statistical model, and (2) by using skewed distributions for the regularization of dominance effects. The first approach can be called regression on genomic inbreeding and it was empirically used by Sun et al. [6], Silió et al. [10], Aliloo et al. [11] and Zeng et al. [12], and proved by Xiang et al. [13]. As far as we are aware, the second approach was never applied in the field of animal genetics, although it may be of considerable value since it ensures that the most frequent dominance effects are close to zero. In contrast, regression on genomic inbreeding implies that the mean and mode of the dominance effects are equal, although they may differ from zero. In the statistical literature, there is a broad corpus on the specification of skewed distributions [14,15] and, among them, the family of skew-elliptical distributions defined by Sahu et al. [16] can be easily implemented in Bayesian regression using Markov chain Monte Carlo (MCMC) techniques [17].
The objectives of this study were: (1) to develop a genomic best linear unbiased prediction (BLUP) model that uses a prior skewed distribution for dominance effects; (2) to compare it with the model with inclusion of a covariate for inbreeding proposed by Xiang et al. [13]; and (3) to confirm the presence of directional dominance for pig litter size.

Data
The data used in this study were from two unrelated pig lines provided by Genus plc (Hendersonville, TN, USA). Genotypes for all sows were generated using the Illumina PorcineSNP60 BeadChip (Illumina, San Diego). After quality control, i.e. excluding genotypes from single nucleotide polymorphisms (SNPs) with a minor allele frequency lower than 0.05 and a call rate lower than 0.95 in each population, 37,900 and 37,011 genotypes for SNPs remained for lines 1 and 2, respectively. Individuals with a call rate lower than 0.95 were also removed. Finally, the number of sows included in the analysis were 3631 and 2612 for lines 1 and 2, respectively. In total, 13,449 and 11,581 records on litter size (number of piglets born alive) were available for these sows, with an average litter size of 11.7 ± 2.9 and 12.4 ± 3.0 for lines 1 and 2, respectively.

Genomic prediction models
The first step was the definition of a Full Model that included both approaches for directional dominance, i.e. regression on genomic inbreeding and a skewed distribution for SNP effects). Then, a subset of model parameters was set to 0 in order to identify reduced models. The Full Model was: where y is the vector of phenotypic records, µ is the general mean, b is a covariate that can be interpreted as inbreeding depression or heterosis, t is a vector of order of parity effects (4 levels − 1st, 2nd, 3rd and > 3rd), r is a vector of farm-year-month of farrowing effects (3163 levels for line 1 and 4293 for line 2), c is a vector of permanent environmental effects (3631 and 2612 levels for lines 1 and 2, respectively), a and d are vectors of additive and dominance effects (37,900 and 37,011 levels), and e is a vector of residuals. Furthermore, h is a vector of the average SNP homozygosity of the individuals and X, W, Q, Z and K are incidence matrices that link the phenotypic records with t, r, c, a and d, respectively. Under the Bayesian paradigm, prior distributions were uniform for µ, b, and for each element of t, univariate Gaussian for each element of r, c and a, and skew Gaussian for each element of d. Finally, prior distributions for the variances of farmyear-month of farrowing (σ 2 r ), permanent environmental (σ 2 c ), additive (σ 2 a ), dominance (σ 2 d ), and residual effects (σ 2 e ) were scaled inverted Chi square (see "Appendix" for a full description of the Bayesian inference). It should be noted that directional dominance comprises two model parameters, one covariate for the average SNP homozygosity (b) and one asymmetry parameter ( ) that is involved in the skew Gaussian prior distribution of dominance effects.
Based on the Full Model, three reduced models were defined as follows: •  [5].
The four models (Full, SC, AN, SN) were analyzed using a Gibbs sampler [18] (see "Appendix" for a full description) that provided posterior distributions for all unknowns in the model, i.e. individual breeding values (s a ) and dominance deviations (s d ), additive and dominance variances (V A and V D ), and the expected inbreeding depression per percentage of inbreeding (I D ). Models SC, AN and SN were analyzed by five chains of 75,000 iterations, after discarding the first 25,000. Each chain used a different random seed. As the convergence of the Full Model was clearly the worst, the Gibbs sampler implementation for this model was set to five chains of 250,000 iterations, after a burn-in of the first 50,000. Convergence and effective sample size were checked using the standard procedures [19] with CODA package [20] and by visual inspection of the chains. Finally, models were compared using the deviance information criteria (DIC) [21] and the logarithm of the conditional predictive ordinate (LogCPO) [22] (see "Appendix" for a full description).

Results
The results of the convergence and the effective size of the MCMC chains are presented in Additional file 1: Tables S1 and S2. The average number of iterations required until convergence was computed using the Raftery and Lewis approach [23] and ranged from 93.0 (for parameter b in the SC Model for line 1) to 9204.0 (b in the Full Model for line 1). The estimated effective sample size (EFS) of the MCMC chains [24] ranged from 82 (d 2 in the SN Model for line 2) to 16,510 (h 2 in the Full Model for line 1). Finally, the required numbers of samples to achieve an accuracy of 0.1 for the 0.5 quantile with a probability of 0.95 were calculated using the Raftery and Lewis approach [23] and ranged from 3210 (b in the SC Model for line 1) to 290,280 (b in the Full Model for line 1).
Posterior mean estimates (and posterior deviations) of variance components, asymmetry parameters, and expected inbreeding depression and results of the comparison of models are in Tables 1 and 2 for lines 1  Because of the above results, estimates of additive genetic variance (V A ) and narrow sense heritability (h 2 ) were higher for Models AN and Full than for Models SN and SC (Tables 1 and 2). In contrast, posterior mean estimates of the variance of dominance deviations (V D ) and percentage of dominance variation (d 2 ) were lower for Models SN and SC than for Models AN and Full (Tables 1 and 2).
Estimates of the variance of farm-year month effects (σ 2 r ) and of residuals (σ 2 e ) were consistent between

Table 1 Posterior mean (and posterior standard deviation) estimates for variance components, asymmetry parameters, inbreeding depression, ratios of additive and dominance variation and criteria for model comparison for line 1
b is the covariate with individual homozygosity, σ 2 a and σ 2 d are the variance of the additive and dominance SNP effects, σ 2 r is the variance of the permanent environmental effects, σ 2 c is the variance of the farm-year-month effects, d is the asymmetry parameters for the dominance effects, σ 2 e is the residual variance, I D is the inbreeding depression per percentage of inbreeding, V A and V D are the additive and dominance variance, h 2 and d 2 are the heritability and the ratio of dominance variance, LogCPO is the logarithm of the conditional predictive ordinate and DIC is the deviance information criterion  models, ranging from 0.160 to 0.161 for line 1 and from 0.296 to 0.299 for line 2 for the farm-year-month variance and from 6.567 to 6.570 and from 6.630 to 6.635 for the residual variance for lines 1 and 2, respectively. However, the estimates of the variance of permanent environmental effects (σ 2 c ) differed substantially between models (Tables 1 and 2), with posterior mean estimates for the SN and SC Models being the highest (0.478 and 0.572 for line 1 and 0.580 and 0.614 for line 2, respectively) and decreasing when asymmetry was allowed, reaching the lowest estimates for Models AN (0.308 and 0.380 for lines 1 and 2, respectively) and Full (0.394 and 0.333).
Posterior mean estimates of the asymmetry parameter for dominance effects ( ) were all positive (Tables 1 and  2  The regression coefficient on individual homozygosity (b) was estimated with Models SC and Full (Tables 1 and  2 and Fig. 2). With the SC Model, posterior mean estimates of b were clearly negative (− 12.15 and − 7.95 for lines 1 and 2, respectively), but equal to − 5.72 and 1.73 for lines 1 and 2 for the Full Model. It should also be noted that posterior standard deviations were higher for the Full than for the SC Model. The HPD95 regions for b included zero for the Full Model, but posterior probabilities of negative values were always higher than 0.99 for Model SC.
Results for the expected inbreeding depression (I D ) per percentage of inbreeding are in Tables 1 and 2  Correlations of estimates for the SNP additive (a) and dominance (d) effects and for breeding values (s a ) and dominance deviations (s d ) between the four models of analysis are in Additional file 2: Tables S3 and S4. Correlations of estimates of the additive and dominance effects between models were always higher than 0.990 and correlations of estimates of breeding values and dominance deviations between models were also close to 1. However, it should be noted that the correlations between the estimated breeding values from the SN Model and the dominance deviations from the AN Model with the estimates from the remaining models were remarkably lower than those from the other models. In the first case, they ranged from 0.933 to 0.944 in line 1 and from 0.794 to 0.842 in line 2 and in the second case, from 0.769 to 0.944 in line 1 and from 0.702 to 0.857 in line 2.
Results of the model comparison tests (logCPO and DIC) are also in Tables 1 and 2. In both lines, the model with the best fit for both tests was the SC Model, followed by the SN and AN Models. The Full Model had the worst fit.

Discussion
The advent of dense genotyping information has allowed the development of models for genomic evaluation [3] that have revolutionized the field of animal breeding during the last decade. Most models for genomic evaluation are designed to deal with the classical statistical problem of large p and small n, because the number of parameters to evaluate is frequently larger than the number of phenotypic data. The most common approach for dealing with this problem is the use of some kind Table 2 Posterior mean (and posterior standard deviation) estimates for variance components, asymmetry parameters, inbreeding depression, ratios of additive and dominance variation and criteria for model comparison for line 2 b is the covariate with individual homozygosity, σ 2 a and σ 2 d are the variance of the additive and dominance SNP effects, σ 2 r is the variance of the permanent environmental effects, σ 2 c is the variance of the farm-year-month effects, d is the asymmetry parameters for the dominance effects, σ 2 e is the residual variance, I D is the inbreeding depression per percentage of inbreeding, V A and V D are the additive and dominance variance, h 2 and d 2 are the heritability and the ratio of dominance variance, LogCPO is the logarithm of the conditional predictive ordinate and DIC is the deviance information criterion  of regularization of the effects of SNPs [25]. Several approaches have been suggested, ranging from simple Gaussian regularization [2] to more complex models that involve t-shaped [2], double exponential [26,27], or mixtures of distributions [2,28,29]. However, all these methods of regularization use symmetric distributions that, from a Bayesian perspective, imply that marker effects are centered at zero. This assumption seems reasonable for the additive or substitution effects, but it is not so clear for dominance effects. In fact, the classical theory of quantitative genetics attributes the phenomenon of inbreeding depression (or heterosis) to the presence of directional dominance or, in other words, a positive average of dominance effects, jointly with a decrease (or increase) in the degree of heterozygosity [9]. In this study, we considered two approaches to model directional dominance in genomic evaluation methods. The first assumed a prior distribution for dominance effects that allowed a mean that was different from 0, i.e. Model SC, following the work of Xiang et al. [13]; the second assumed that dominance effects followed a skew Gaussian distribution that has a higher probability of positive (or negative) effects, i.e. Model AN. Finally, both approaches were combined into a Full Model. All models were implemented using a Gibbs sampler. The analysis of the MCMC chains indicated that convergence was achieved with the proposed burn-in for all models and both lines (25,000 iterations for AN, SC and SN Models and 50,000 for the Full Model). Nevertheless, the EFS was heterogeneous across parameters and models. In general, the EFS of the variance of dominance effects was smaller than that of the variance of additive effects, and the EFS of the parameters related with directional dominance (b and ) were very large for the AN and SN Models and remarkably smaller for the Full Model. Nevertheless, the sizes of the five Gibbs sampler chains (5 × 75,000 iterations for AN, SN and SC Models and 5 × 200,000 for the Full Model) were always larger  Fig. 3 Posterior distribution of the expected inbreeding depression for an inbreeding level of 0.10 for lines 1 and 2 and S2). As a consequence, their posterior distributions were wider and they included zero in the HPD at 95% for b and (Figs. 1 and 2) and convergence and EFS for both these parameters were worse than with the SC and AN Models (see Additional file 1: Tables S1 and S2).
Models that allow the presence of directional dominance (SC, AN, Full) were able to predict the expected inbreeding depression (I D ) in populations that had a low range of levels of genealogical inbreeding. This approximation uses the classical additive model of inbreeding depression [9] but replacing dominance effects of causal polymorphisms with dominance effects of SNPs. In this approach, a linear relationship between inbreeding and inbreeding depression is assumed. Results were presented as the expected inbreeding depression per percentage of inbreeding. In the analyzed populations, the expected inbreeding depression coefficients under these models were around − 0.045 and between − 0.025 and − 0.028 piglets in lines 1 and 2, respectively. These results concur with those of Vitezica et al. [32], who also reported larger estimates of inbreeding depression in line 1 than in line 2 and they are close to the estimates of inbreeding depression for litter size in other pig populations [33][34][35]. In contrast, the estimates provided by the SN Model were substantially closer to 0, i.e. − 0.016 and − 0.008 piglets for lines 1 and 2, respectively. This may indicate that models that do not allow for directional dominance, such as the SN Model, cannot predict the magnitude of inbreeding depression (or heterosis) correctly and, thus, lead to biases if they are used for the prediction of future mate performance and mate allocation [5].
Nevertheless, there were some remarkable differences between the results obtained for the two lines, which are interesting to analyze further. Evidence of directional dominance was larger for line 1 than for line 2 for both approaches (estimates of − 12.15 vs. − 6.48 for b in Model SC and of 0.38 vs. 0.25 for in Model AN), although posterior estimates of the dominance variance were lower for line 1 for all models. This suggests that the magnitude of directional dominance (or inbreeding depression) is not necessarily related to the amount of dominance variance estimated from resemblance between relatives. In fact, in the presence of inbreeding, the total genetic variance is split into five components [36,37]: the additive and dominance genetic variances in the base population, the dominance genetic variance between homozygous individuals, the covariance between additive and dominance effects between homozygous individuals, and the square of the inbreeding depression. Traditional approaches to estimate dominance variance using genealogical [38,39] or genomic dominance relationships [32] only take the additive and dominance variance in the base population into account and ignore the remaining variance components. It is possible that the presence of directional dominance also allows some of the other variance components that are not considered under the assumption of multivariate normality to be captured.
Of particular significance is the fact that the estimates of the variance of dominance effects differed substantially between models. Lower estimates were obtained with the SC Model, whereas estimates from Models SN, AN and Full were higher. The cause of the inflation of dominance effects under the last three models may be the restrictions imposed by the assumed prior distributions. Under Model SN, the prior distribution forced the mean and mode of effects to be centered at zero. Thus, if directional dominance exists, specific estimates of the effects of SNPs would attempt to accommodate this, which would lead to an increase in the variance of dominance effects. Model AN allowed the presence of more positive (or negative) dominance effects but it forced the mode of the distributions to be close to zero. Estimates of the effects of SNPs for Model AN may be even larger than for Model SN, but as the prior distribution forced them to have a mode close to zero, the variance of dominance effects was also inflated in Model AN. Furthermore, the increase in the variance of dominance effects in Models SN, AN and Full with respect to Model SC was compensated by a corresponding decrease in the permanent environmental variance, as pointed out in other studies [40,41]. Thus, the estimate of the permanent environmental variance was the largest for Model SC for both lines.
The differences between models were also reflected in the correlations of estimates of breeding values and dominance deviations between models. Although the correlations for estimates of SNP additive and dominance effects between models were very high (see Additional file 2: Tables S3 and S4), the correlations for estimates of breeding values and dominance deviations provided some exceptions. For breeding values, estimates from the model that did not consider directional dominance (Model SN) had lower correlations with estimates from the other models (SC, AN and Full). This suggests that the inclusion of directional dominance with either of the two approaches would result in substantial changes in the ranking of individuals based on estimates of breeding values, which may have consequences for breeding decisions. In addition, the correlations between estimates of dominance deviations from Model AN with those from the other models were also lower (0.70-0.94), which may imply that the use of skewed prior distributions affects estimation of dominance deviations and the prediction of performance of future individuals (or crosses).

Comparison of models
The best model based on the two criteria used for comparison of models was the SC Model, followed by the SN and AN Models; the Full Model provided the worst fit for both lines. Model SN does not consider directional dominance and thus, it was penalized relative to Model SC. Models AN and Full were equally able to capture directional dominance since they led to similar estimates of inbreeding depression. However, they were penalized because the number of unknowns in these models is larger than in Model SC; they estimate and one auxiliary variable for the dominance effect of each SNP.
In the light of these results, the main finding of our study is that Model SC, as defined by Xiang et al. [13], is recommended for the analysis of traits when directional dominance (or inbreeding depression) is expected and when resulting estimates of dominance effects are used for prediction of performance of future mates and mate allocation [5]. This recommendation is strengthened by the ease with which the SC Model can be formulated based on the genomic dominance relationship matrix [8], which helps to reduce the computational burden and directly provides predictions of additive and dominance effects for each individual.
However, the application of skewed distributions should not be completely discarded for new lines of research. First, we assumed that the additive and dominance effects were independent, although it is possible to use multivariate asymmetric distributions [16], as in the models of Wellman and Bennewitz [42], which consider a relationship between the magnitudes of additive and dominance effects. Second, the assumption of Gaussian distributions can be replaced by the asymmetric version of any other distribution, such as t-shape or double exponential distribution, leading to asymmetric versions of the Bayes B [3] or Bayesian Lasso [26] approaches. These approaches may avoid the large increase in the variance of dominance effects since most of the estimates of the dominance effects of SNPs will be forced to be zero [3] or closer to zero [26] than with a prior Gaussian distribution.
Finally, all the approaches described here assume that directional dominance is homogeneous along the genome. However, there is evidence in the literature of local differences in the causes of inbreeding depression across the genome [43,44]. Further research is needed to investigate this phenomenon and, also, to model additional causes of inbreeding depression (or heterosis), such as epistatic interactions [45].

Conclusions
The results of our study confirm the presence of positive directional dominance for litter size in two lines of pigs. Ignoring this in genomic evaluation models with dominance effects alters the prediction of breeding values and may cause bias in the prediction of inbreeding depression (or heterosis) and of the performance of future mates. These effects can be avoided by using two alternative models, one that includes a non-zero mean of dominance effects and another that uses skewed prior distributions for them, with the latter providing a better fit. Thus, this approach should be recommended for modeling dominance effects, at least for datasets that have similar features as those analyzed here.
Author's contributions LV, AL and ZGV proposed the models of analysis. WH provided the pig data. LV developed the Fortran code, analyzed the data and wrote the first draft of the manuscript. All authors discussed the results, made suggestions and corrections. All authors read and approved the final manuscript.  where N r , N c and N SNP are the numbers of elements of r, c, and a, respectively. Furthermore, the prior distribution for d was the product of the following skew Gaussian distributions [16]:

Author details
where φ and denote the density function and cumulative distribution function of the standard Gaussian distribution. Following Sahu et al. [16], the first three moments of a skewed Gaussian (SN) distribution-x ∼ SN 0, σ 2 , -are: Note that when λ = 0, E(x) = 0 and Var(x) = σ 2 , as in the standard Gaussian distribution. Finally, the prior distributions for the variance components σ 2 r , σ 2 c , σ 2 a , σ 2 d and σ 2 e were scaled inverse Chi square distributions: with parameters s 2 = 0 and v = −2, that correspond to uniform distributions over the positive real scale.

The Gibbs sampler
The Gibbs sampler is an updating sampling scheme [18] that needs to sample from the full conditional distributions of all unknowns in the model. The full conditional distribution of µ, b and each level of t, r, c, and a were univariate Gaussian as in the standard implementation of the Gibbs sampler in the mixed model [46]. The conditional distribution of each level of d is the product of a Gaussian distribution with a skewed Gaussian (SN) distribution with parameters and σ 2 d : where d −i is the vector of the dominance effects without the i-th effect (d i ), k i is the i-th row of matrix K. In order to facilitate the conditional sampling of d i , we invoked the following property of the skewed Gaussian distribution [16]: which allows to understand the skewed Gaussian distribution as a mixture of an infinite number of standard Gaussian distributions with σ 2 variance and mean defined by u, u being a variable distributed by a half normal (HN) distribution. This property allows us to define the conditional densities for each level of d as the product of two Gaussian densities: where d −i is the vector of the dominance effects without the i-th effect (d i ), k i is the i-th row of the K matrix and u = {u i } is the vector of auxiliary variables with the following half normal (HN) prior distribution: Consequently, the conditional distribution for each element of u was: and the conditional distributions for was: Furthermore, the conditional distribution for σ 2 d was the following scaled inverted Chi square distributions: HN(0, I).
Finally, the full conditional distribution for σ 2 r , σ 2 c , σ 2 a , and σ 2 e were also the following scaled inverted Chi square distributions: The Gibbs sampler also allows to calculate posterior distributions of the any combination of parameters of the model. Thus, we calculated for each iteration of the Gibbs sampler the following approximations for the additive and dominance genetic variances [47] as: where N IND is the number of individuals in the population and s ai and s di the breeding and the dominance deviation values for the i-th individual. They were calculated as: where µ d = − b N SNP as defined by Xiang et al. [13], p j was the observed allelic frequency for the j-th SNP and Note that this parameterization implies a population under Hardy-Weinberg equilibrium and that the covariance between the breeding values and dominance deviations is equal to zero.
Finally, we also calculated the expected inbreeding depression (I D ) for each percentage of inbreeding as [9,37]: The computational costs of a Gibbs sampler implementation of the models with 75,000 iterations after a burn-in of the first 25,000 were approximately 100 and 80 CPU hours for populations 1 and 2, respectively. The analysis were performed with single thread process of an Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80 GHz processor.

Model comparison Deviance information criterion
The deviance information criterion (DIC) was defined by Spiegelhalter et al. [21]. It compares the global quality of two or more models accounting for model complexity.

Log-marginal probability (logCPO)
If we consider the data vector = y i , y −i , where y i is the i-th datum and y −i is the vector of data with i-th datum deleted, the conditional predictive distribution has a probability density equal to: where θ is the vector of parameters (µ, b, t, r, c, a, d, σ 2 r ,σ 2 c , σ 2 a , σ 2 d , and σ 2 e ) in the Full Model). Therefore, p y i |y −i can be interpreted as the probability of each datum given the rest of the data, and it is known as the conditional predictive ordinate (CPO) for the i-th datum. The pseudo log-marginal probability of the data is then: , and N is the number of Markov chain Monte Carlo (MCMC) draws, and θ j is the j-th draw from the posterior distribution of the corresponding parameter. The higher the value of the LogCPO, best fit of the model to data.