Impact of prior specifications in a shrinkage-inducing Bayesian model for quantitative trait mapping and genomic prediction

Background In quantitative trait mapping and genomic prediction, Bayesian variable selection methods have gained popularity in conjunction with the increase in marker data and computational resources. Whereas shrinkage-inducing methods are common tools in genomic prediction, rigorous decision making in mapping studies using such models is not well established, and posterior results are sensitive to misspecified assumptions because biological prior evidence is weak.

Methods Here, we evaluate the impact of prior specifications in a shrinkage-based Bayesian variable selection method, presented in a previous study, that is based on a mixture of uniform priors (MU) applied to genetic marker effects. Unlike most other shrinkage approaches, the use of a mixture of uniform priors provides a coherent framework for inference based on Bayes factors. To evaluate the robustness of genetic association under varying prior specifications, Bayes factors are compared as signals of positive marker association, whereas genomic estimated breeding values are considered for genomic selection. The impact of specific prior specifications is reduced by calculating combined estimates from multiple specifications. A Gibbs sampler is used to perform Markov chain Monte Carlo (MCMC) estimation, and a generalized expectation-maximization (GEM) algorithm is used as a faster alternative for maximum a posteriori point estimation. The performance of the method is evaluated using two publicly available data examples: the simulated QTLMAS XII data set and a real data set from a population of pigs.

Results Combined estimates of Bayes factors were very successful in identifying quantitative trait loci, and the ranking of Bayes factors was fairly stable among markers with positive signals of association under varying prior assumptions, but their magnitudes varied considerably. Genomic estimated breeding values using the mixture of uniform priors compared well to other approaches for both data sets, and the loss of accuracy with the generalized expectation-maximization algorithm was small compared to MCMC.

Conclusions Since no error-free method to specify priors is available for complex biological phenomena, exploring a wide variety of prior specifications and combining the results offers a partial solution to this problem. For this purpose, the mixture of uniform priors approach is especially suitable, because it comprises a wide and flexible family of distributions and computationally intensive estimation can be carried out in a reasonable amount of time.

Complete distributional specification of the likelihood as well as of the priors in MU, derivation of the fully conditional distributions for the Gibbs sampler and their expected values for GEM.
First, we present a generic formula for the posterior distribution of the model parameters in MU. To simplify the notation, let $Y = (Y_{kj})$ be the vector of the phenotype observations and $X = (x_{kjm})$ the $N \times M$ matrix of the genotype observations ($k = 1, \ldots, K$; $j = 1, \ldots, N_k$; $m = 1, \ldots, M$; $N = \sum_{k=1}^{K} N_k$). Further, we denote the vector of all model parameters by $\Theta = (\alpha, \beta_1, \ldots, \beta_M, u_1, \ldots, u_K, \sigma_u^2, \sigma^2)$. We use the generic notation $\Theta_{-\theta}$ to indicate the vector of all model parameters except the univariate parameter $\theta$. Mutual independence is assumed for all univariate random variables in $\Theta_{-\sigma_u^2}$ conditionally given $\sigma_u^2$, and all parameters except $u_1, \ldots, u_K$ are assumed independent of $\sigma_u^2$.
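According to Bayes' theorem, the joint posterior distribution of $\Theta$ can then be expressed up to a multiplicative constant as

$$\pi(\Theta \mid Y, X) \propto L(Y \mid X, \Theta)\, p(\alpha) \left[\prod_{m=1}^{M} p(\beta_m)\right] \left[\prod_{k=1}^{K} p(u_k \mid \sigma_u^2)\right] p(\sigma_u^2)\, p(\sigma^2), \tag{1}$$

where $\pi(\cdot)$ is the probability density function (pdf) of the joint posterior and, in the following, also the pdf of any fully conditional posterior, $L(\cdot)$ is the likelihood function, and $p(\cdot)$ denotes a univariate prior pdf.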

Likelihood
Assuming independent and identically normally distributed error terms $(\varepsilon_{kj})$, the likelihood of the parameter vector $\Theta$ can be expressed as

$$L(Y \mid X, \Theta) = \prod_{k=1}^{K} \prod_{j=1}^{N_k} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\left(Y_{kj} - \alpha - \sum_{m=1}^{M} \beta_m x_{kjm} - u_k\right)^2}{2\sigma^2}\right). \tag{2}$$

In the following, we list the specification of the prior distribution, derive the univariate fully conditional posterior pdf for each model parameter and provide the corresponding expected value. Except for the effect size parameters ($\beta_m$) in MU, the choice of prior is conjugate for all univariate fully conditional posteriors.
Note that in the following, $L(Y \mid X, \theta)$ denotes the univariate likelihood function of the parameter $\theta$, obtained from (2) by omitting a multiplicative constant involving only elements of $\Theta_{-\theta}$.
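As a minimal illustration of the likelihood under independent normal errors, the following Python sketch evaluates the residuals $Y_{kj} - \alpha - \sum_m \beta_m x_{kjm} - u_k$ and the resulting log-likelihood. The data layout (one row of `x` per individual, a `family` list mapping each individual to its family index $k$) and the function names are assumptions made for this example only.

```python
import math


def residuals(y, x, family, alpha, beta, u):
    """Residual of each individual: Y - alpha - sum_m beta_m * x_m - u_k.

    y: phenotypes, x: list of genotype rows, family: family index k of
    each individual (hypothetical layout chosen for this sketch).
    """
    return [
        y_i - alpha - sum(b * x_im for b, x_im in zip(beta, x_i)) - u[k]
        for y_i, x_i, k in zip(y, x, family)
    ]


def log_likelihood(y, x, family, alpha, beta, u, sigma2):
    """Log of the normal likelihood with common residual variance sigma2."""
    r = residuals(y, x, family, alpha, beta, u)
    n = len(y)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma2)
            - sum(e * e for e in r) / (2.0 * sigma2))
```

Working on the log scale avoids the underflow that the product of $N$ densities would cause for realistic sample sizes.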

Common intercept α
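Assigning a normally distributed prior with mean 0 and variance $c > 0$ to $\alpha$, its fully conditional posterior pdf is

$$\pi(\alpha \mid Y, X, \Theta_{-\alpha}) \propto L(Y \mid X, \alpha)\, p(\alpha) \propto \exp\left(-\frac{1}{2\sigma^2} \sum_{k=1}^{K} \sum_{j=1}^{N_k} \left(R^{(\alpha)}_{kj} - \alpha\right)^2\right) \exp\left(-\frac{\alpha^2}{2c}\right),$$

where $R^{(\alpha)}_{kj} = Y_{kj} - \sum_{m=1}^{M} \beta_m x_{kjm} - u_k$. The fully conditional posterior distribution of $\alpha$ is therefore Gaussian with expected value $m_\alpha$ and variance $v_\alpha$, where

$$m_\alpha = \frac{c \sum_{k=1}^{K} \sum_{j=1}^{N_k} R^{(\alpha)}_{kj}}{N c + \sigma^2}, \qquad v_\alpha = \frac{c\, \sigma^2}{N c + \sigma^2}.$$

Effect size parameters $\beta_m$ ($m = 1, \ldots, M$) in MU

Let $\phi(x)$ denote the pdf of the univariate standard normal distribution and $\Phi(x)$ its cumulative distribution function. The pdf of the fully conditional posterior of $\beta_m$ is

$$\pi(\beta_m \mid Y, X, \Theta_{-\beta_m}) \propto L(Y \mid X, \beta_m)\, p(\beta_m) \propto \phi\left(\frac{\beta_m - \mu_m}{\tau_m}\right) p(\beta_m),$$

where

$$\mu_m = \frac{\sum_{k=1}^{K} \sum_{j=1}^{N_k} x_{kjm} R^{(m)}_{kj}}{\sum_{k=1}^{K} \sum_{j=1}^{N_k} x_{kjm}^2}, \qquad \tau_m^2 = \frac{\sigma^2}{\sum_{k=1}^{K} \sum_{j=1}^{N_k} x_{kjm}^2}, \qquad R^{(m)}_{kj} = Y_{kj} - \alpha - \sum_{l \neq m} \beta_l x_{kjl} - u_k.$$

As the prior pdf of $\beta_m$ is a step function, namely [...], the fully conditional posterior is proportional to this normal pdf on each interval where the prior is constant. Its normalizing constant and its expected value can therefore be expressed through $\Phi$ and $\phi$ evaluated at the jump points of the prior.

Polygenic effects $u_k$ ($k = 1, \ldots, K$)

Assigning a normally distributed prior with mean 0 and variance $\sigma_u^2$ to $u_k$, its fully conditional posterior pdf is

$$\pi(u_k \mid Y, X, \Theta_{-u_k}) \propto L(Y \mid X, u_k)\, p(u_k \mid \sigma_u^2) \propto \exp\left(-\frac{1}{2\sigma^2} \sum_{j=1}^{N_k} \left(R^{(u)}_{kj} - u_k\right)^2\right) \exp\left(-\frac{u_k^2}{2\sigma_u^2}\right),$$

where $R^{(u)}_{kj} = Y_{kj} - \alpha - \sum_{m=1}^{M} \beta_m x_{kjm}$. The fully conditional posterior distribution of $u_k$ is therefore Gaussian with expected value $m_{u_k}$ and variance $v_{u_k}$, where

$$m_{u_k} = \frac{\sigma_u^2 \sum_{j=1}^{N_k} R^{(u)}_{kj}}{N_k \sigma_u^2 + \sigma^2}, \qquad v_{u_k} = \frac{\sigma^2 \sigma_u^2}{N_k \sigma_u^2 + \sigma^2}.$$

Between-families variance component $\sigma_u^2$

Assigning an inverse-gamma distribution with shape parameter $s_u > 0$ and rate parameter $r_u > 0$ as the prior of $\sigma_u^2$, its fully conditional posterior pdf is

$$\pi(\sigma_u^2 \mid Y, X, \Theta_{-\sigma_u^2}) \propto \left[\prod_{k=1}^{K} p(u_k \mid \sigma_u^2)\right] p(\sigma_u^2) \propto (\sigma_u^2)^{-(K/2 + s_u + 1)} \exp\left(-\frac{\sum_{k=1}^{K} u_k^2 / 2 + r_u}{\sigma_u^2}\right).$$

The fully conditional posterior distribution of $\sigma_u^2$ is therefore inverse-gamma with the shape parameter $K/2 + s_u$ and the rate parameter $\sum_{k=1}^{K} u_k^2 / 2 + r_u$. The expected value of this distribution is $\left(\sum_{k=1}^{K} u_k^2 / 2 + r_u\right) / (K/2 + s_u - 1)$.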
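For the effect sizes $\beta_m$, the only non-conjugate update, a short Python sketch may help to see how a step-function prior combines with the normal kernel of the likelihood. It relies only on what is stated above: the likelihood contributes a normal kernel in $\beta_m$, and the prior is piecewise constant. The breakpoints `breaks`, the heights `heights`, the partial residuals `partial_resid` (phenotype minus all model terms except that of marker $m$) and all function names are hypothetical and are not taken from the original specification of MU.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # STD_NORMAL.pdf is phi, STD_NORMAL.cdf is Phi


def normal_kernel(x_col, partial_resid, sigma2):
    """Location and variance of the normal kernel in beta_m from the likelihood."""
    sxx = sum(x * x for x in x_col)
    sxe = sum(x * e for x, e in zip(x_col, partial_resid))
    return sxe / sxx, sigma2 / sxx


def posterior_mean(loc, var, breaks, heights):
    """Expected value of a normal kernel times a step-function prior.

    The prior equals heights[i] on [breaks[i], breaks[i + 1]] and 0 elsewhere;
    it need not be normalized because the constant cancels.
    """
    sd = math.sqrt(var)
    z = [(t - loc) / sd for t in breaks]
    num = sum(h * (STD_NORMAL.pdf(z[i]) - STD_NORMAL.pdf(z[i + 1]))
              for i, h in enumerate(heights))
    den = sum(h * (STD_NORMAL.cdf(z[i + 1]) - STD_NORMAL.cdf(z[i]))
              for i, h in enumerate(heights))
    return loc + sd * num / den


def draw_beta(loc, var, breaks, heights, rng=random):
    """One draw: pick an interval, then sample a truncated normal by inversion."""
    sd = math.sqrt(var)
    cdf = [STD_NORMAL.cdf((t - loc) / sd) for t in breaks]
    # Posterior mass of each interval (fails if all masses underflow to 0).
    weights = [h * (cdf[i + 1] - cdf[i]) for i, h in enumerate(heights)]
    i = rng.choices(range(len(heights)), weights=weights)[0]
    p = rng.uniform(cdf[i], cdf[i + 1])
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # inv_cdf needs 0 < p < 1
    return loc + sd * STD_NORMAL.inv_cdf(p)
```

A mixture of two zero-centred uniform densities, for instance, is a step function with four breakpoints and three heights, so it fits this interface. When the kernel lies far outside the support of the prior, the interval masses can underflow; a production implementation would need a log-scale treatment, which this sketch omits.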

Residual variance $\sigma^2$
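Assigning an inverse-gamma distribution with shape parameter $s > 0$ and rate parameter $r > 0$ as the prior of $\sigma^2$, its fully conditional posterior pdf is

$$\pi(\sigma^2 \mid Y, X, \Theta_{-\sigma^2}) \propto L(Y \mid X, \sigma^2)\, p(\sigma^2) \propto (\sigma^2)^{-(N/2 + s + 1)} \exp\left(-\frac{\sum_{k=1}^{K} \sum_{j=1}^{N_k} R_{kj}^2 / 2 + r}{\sigma^2}\right),$$

where $R_{kj} = Y_{kj} - \alpha - \sum_{m=1}^{M} \beta_m x_{kjm} - u_k$. The fully conditional posterior distribution of $\sigma^2$ is therefore inverse-gamma with the shape parameter $N/2 + s$ and the rate parameter $\sum_{k=1}^{K} \sum_{j=1}^{N_k} R_{kj}^2 / 2 + r$. By analogy with $\sigma_u^2$, the expected value of this distribution is $\left(\sum_{k=1}^{K} \sum_{j=1}^{N_k} R_{kj}^2 / 2 + r\right) / (N/2 + s - 1)$.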
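The inverse-gamma updates for the two variance components share one pattern: the shape grows by half the number of terms and the rate by half their sum of squares. The following Python sketch illustrates this pattern under the parameterization used above (shape and rate of the inverse-gamma distribution); the function names are hypothetical. It returns both a Gibbs draw and the expected value, the quantity the derivation provides for GEM.

```python
import random


def inverse_gamma_conditional(sum_sq, n, shape0, rate0):
    """Shape and rate of the full conditional: n/2 + shape0 and sum_sq/2 + rate0.

    For sigma_u^2 use n = K and sum_sq = sum of u_k^2; for sigma^2 use
    n = N and sum_sq = sum of squared residuals R_kj^2.
    """
    return n / 2.0 + shape0, sum_sq / 2.0 + rate0


def draw_inverse_gamma(shape, rate, rng=random):
    """Gibbs draw: if G ~ Gamma(shape, scale=1/rate), then 1/G is inverse-gamma."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def inverse_gamma_mean(shape, rate):
    """Expected value rate / (shape - 1), which exists only for shape > 1."""
    if shape <= 1.0:
        raise ValueError("the expected value requires shape > 1")
    return rate / (shape - 1.0)
```

For example, with six polygenic effects whose squares sum to 8 and a prior with shape 1 and rate 1, `inverse_gamma_conditional(8.0, 6, 1.0, 1.0)` gives shape 4 and rate 5, so the expected value is 5/3.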