In this article, we successfully applied MU, a shrinkage-based Bayesian variable selection method that we had previously presented in [14], to the well-studied and publicly available QTLMAS XII and real data sets with genome-wide marker coverage. In particular, we focused our attention on comparing the impact of different prior specifications on the stability of QTL detection for genetic association and on the stability of breeding value prediction for genomic selection. A Gibbs sampler for MCMC simulation and a GEM algorithm for MAP point estimation were implemented as C extensions to the software package R [22]. The source code is publicly available as supporting information [see Additional files 2 and 3]. The computation time required by the implementations is feasible on a desktop PC: at most a few hours for MCMC and a few minutes for GEM.

We have compared our results regarding QTL detection and false positive signals with findings from previous studies of the QTLMAS XII data. Overall, our analyses by MU ranked well among the association and mapping methods summarized by Crooks et al. [18]. Only one method [41] clearly outperformed MU; instead of single SNP, it exploited haplotype information from multiple SNP. Arguably, integrating this additional information into the regression model via a revised genotype matrix could improve the performance of MU.

Especially in the context of QTL detection, the collinearity of the putative predictors (SNP) in genome-wide dense marker data may cause problems, such as the scattering of a QTL signal over several markers, in multi-locus models that assume mutual independence of predictors *a priori*. Several authors have suggested procedures to improve model performance in such settings, for example, removing part of the data to reduce the collinearity (e.g. [46]). How MU and other Bayesian variable selection or shrinkage methods should handle this general problem needs further research to improve their performance in QTL mapping.
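The signal-scattering effect of collinearity can be illustrated with a small simulation. The sketch below is in Python rather than our R/C implementation and uses plain ridge regression as a stand-in for a shrinkage prior with independent effects; the marker layout, LD level and shrinkage strength are arbitrary illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# One causal SNP (genotypes 0/1/2) and a second marker in strong LD with it
causal = rng.integers(0, 3, size=n).astype(float)
proxy = causal.copy()
flip = rng.random(n) < 0.05                        # ~5% of genotypes differ
proxy[flip] = rng.integers(0, 3, size=int(flip.sum()))

X = np.column_stack([causal, proxy])
X -= X.mean(axis=0)
y = 1.0 * causal + rng.normal(0.0, 1.0, size=n)    # only the first marker is causal
y -= y.mean()

# Posterior mean under independent Gaussian effect priors = ridge estimate
lam = 200.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
# The single causal effect is scattered over both correlated markers:
# both coefficients are clearly positive and each is well below the true effect of 1.
```

Under such a model, neither marker carries the full signal, which complicates declaring a single positive QTL position.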

The rather strong positive correlations among the Bayes factors of QTL signals observed across several MCMC chains suggest that the results regarding QTL detection are appreciably robust to varying prior assumptions. Although the Bayes factor has advantages (see e.g. [27, 47–50]) over measures such as *P*-values, the systematic differences in magnitude that we observed between the Bayes factors from different MCMC chains demonstrate the problems and limitations of the evidence categories suggested by Jeffreys [34] and Kass and Raftery [32]. Specifically, blind application of decision rules based on these categories to declare positive QTL signals in genetic association and QTL mapping studies seems inadvisable. We stress the importance of an exhaustive analysis under varying prior assumptions. The need for such a wide-ranging analysis is particularly evident because, in general, only weak prior knowledge exists on relevant biological parameters, such as the prior probability of a positive QTL signal and the shape of the prior distributions for genetic effects. We tried to alleviate this problem by combining Bayes factor information from several analyses under varying prior specifications.
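As an illustration of how such a combination might work, the following sketch averages hypothetical log10 Bayes factors over chains; the numbers and the geometric-mean combination rule are illustrative assumptions, not the exact procedure used in our analyses:

```python
import numpy as np

# Hypothetical log10 Bayes factors for five markers from three MCMC chains
# run under different prior specifications: the chains rank the markers
# similarly but differ systematically in magnitude.
log10_bf = np.array([
    [2.1,  0.3, 1.4, -0.2,  0.1],   # chain A
    [1.2, -0.1, 0.8, -0.5, -0.3],   # chain B
    [3.0,  0.9, 2.2,  0.2,  0.5],   # chain C
])

# Marker 2 exceeds a conventional "strong evidence" threshold
# (log10 BF > 1) in chains A and C but not in chain B, so a fixed
# category-based decision rule flips with the prior specification.
strong = log10_bf > 1.0

# A "consensus" score: the mean log10 BF over chains, i.e. the log of the
# geometric mean of the Bayes factors across prior specifications.
consensus = log10_bf.mean(axis=0)
ranking = np.argsort(-consensus)    # markers ordered by combined evidence
```

The ranking produced by the consensus score is stable even though the per-chain category assignments are not.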

Obviously, an exhaustive sensitivity analysis under varying prior assumptions is necessary in shrinkage-based and other Bayesian variable selection approaches in general. However, we argue that MU offers a partial solution to the open problem of selecting relevant variables in Bayesian shrinkage approaches (see [17]), because, in contrast to most other shrinkage approaches, it provides a formal framework for hypothesis testing and consequently for the calculation of Bayes factors.

For the purpose of genomic prediction, MU was competitive with the methods of other studies in the estimation of GEBV for both data sets. For the simulated QTLMAS XII data set, it is noteworthy that we considered only four prior specifications and did not attempt an exhaustive coverage of the hyper-parameter space; for this data set, our main focus was to compare MCMC and GEM estimation. Although point estimation via the GEM algorithm produced accuracies of GEBV that were inferior to those from MCMC for each single prior specification, combining GEBV across prior specifications improved the accuracy of GEM estimation to almost the same level as the MCMC results.

In the analysis of the real data set, we explored a larger part of the hyper-parameter space and considered a dense grid of hyper-parameter values. Thus, we were able to assess the accuracies of GEBV and their differences more comprehensively than for the QTLMAS XII data set. Our results showed that the estimated accuracies were very sensitive to the choice of the hyper-parameter *b* in MU and that this sensitivity increased with the number of markers. Unfortunately, comparison of this sensitivity with that of other Bayesian genomic prediction approaches was hampered, because accuracies from an extensive search across varying prior specifications are not documented for other approaches on this data set.

In severely ill-posed problems with many more SNP than individuals, as in the real data set, it may be beneficial to decrease the number of SNP to reduce this disparity prior to variable selection [38, 51]. This saves computer storage and may improve the convergence properties of the algorithms. In this study, we compared random sampling of SNP with sure independence screening (SIS) [38] as two simple methods of SNP pre-selection. For SIS, we followed the pre-selection and cross-validation procedure of [25] in order to compare MU with the Bayesian genomic prediction methods considered in that study. Our results suggest that MU is competitive with other approaches and that SIS produces superior accuracies of GEBV over large parts of the hyper-parameter space, although random sampling and SIS produce almost identical accuracies in the case of 10 000 SNP under optimally chosen hyper-parameters. However, the absolute level of the SIS accuracy estimates reported here may be biased upward, because SIS pre-selection depended not only on the training sets but also on the validation sets. This bias may hamper the comparison of the SIS and random sampling results.
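For readers unfamiliar with SIS, the core of the pre-selection step can be sketched as follows; this is a simplified Python illustration in which the function name and interface are our own, and the actual procedure of [25, 38] includes further details:

```python
import numpy as np

def sis_select(X, y, d):
    """Sure independence screening: keep the d columns (SNP) of X with the
    largest absolute marginal correlation with the phenotype y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc                                   # per-SNP covariance with y
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # guard against monomorphic SNP (zero variance -> zero correlation)
    r = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    return np.sort(np.argsort(-np.abs(r))[:d])

# Toy check: SNP 3 drives the phenotype and survives the screening
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
y = 2.0 * X[:, 3] + 0.5 * rng.normal(size=100)
kept = sis_select(X, y, d=5)
```

To avoid the upward bias discussed above, `X` and `y` here would have to come from the training set only.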

Pre-selection of SNP surely remains an issue for future research, and SIS cannot be the final solution, since some of its drawbacks remain unresolved. Because SIS exploits only the marginal correlations between markers and the phenotype and ignores the LD between markers, it risks pre-selecting too many SNP in the proximity of a QTL that carry essentially identical information. Conversely, a SNP with a low marginal correlation but still in LD with a QTL has no chance of entering the set of pre-selected SNP. Methods that simultaneously exploit the connection between SNP and the phenotype and the LD structure between markers could be more appropriate.

Three approaches are available for choosing hyper-parameters. First, cross-validation can be employed to detect the prior configuration that is optimal with respect to prediction accuracy for a specific set of individuals. This is probably the most suitable and most widely used approach for prediction purposes in experimental studies.
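A minimal sketch of this first approach is given below, using a generic ridge-type shrinkage model as a stand-in for one MU prior specification; the grid, synthetic data and accuracy measure are illustrative assumptions:

```python
import numpy as np

def cv_accuracy(X, y, lam, k=5, seed=0):
    """Mean correlation between held-out phenotypes and predictions of a
    ridge model, used here as a stand-in for one MU prior specification."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    accs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        Xt, yt = X[train], y[train]
        beta = np.linalg.solve(Xt.T @ Xt + lam * np.eye(X.shape[1]), Xt.T @ yt)
        accs.append(np.corrcoef(X[fold] @ beta, y[fold])[0, 1])
    return float(np.mean(accs))

# Pick the hyper-parameter value with the best cross-validated accuracy
rng = np.random.default_rng(2)
n, p = 120, 30
X = rng.normal(size=(n, p))
effects = np.zeros(p)
effects[:5] = 1.0                       # five causal markers
y = X @ effects + rng.normal(size=n)
grid = [0.1, 1.0, 10.0, 100.0]
acc = {lam: cv_accuracy(X, y, lam) for lam in grid}
best_lam = max(acc, key=acc.get)
```

The same loop applies to any hyper-parameter grid; its cost grows linearly with the number of grid points times the number of folds.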

Second, prior expectations about the heritability, together with limiting assumptions about the genetic architecture of the trait, yield a criterion to calibrate hyper-parameters via the prior variance of additive genetic effect sizes. In this study, we derived such a criterion for MU and tested its performance in the analysis of the real data. However, the results do not support the idea that hyper-parameter values that correspond to realistic heritabilities under this criterion improve prediction accuracy.
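For illustration, one widely used version of such a calibration (not the MU-specific criterion derived in this article) equates the marker-explained variance with the target genetic variance: with m markers of allele frequency p_j and i.i.d. additive effects of variance s2, the additive variance is s2 multiplied by the sum of 2 p_j (1 − p_j), which is solved for s2:

```python
import numpy as np

def prior_effect_variance(freqs, h2, var_p=1.0):
    """Solve h2 * var_p = s2 * sum_j 2 p_j (1 - p_j) for the common prior
    effect variance s2, assuming i.i.d. additive effects at all markers."""
    return h2 * var_p / np.sum(2.0 * freqs * (1.0 - freqs))

# Example: 1000 markers, all at allele frequency 0.5, target heritability 0.5
s2 = prior_effect_variance(np.full(1000, 0.5), h2=0.5)
```

Note that s2 shrinks as markers are added, which is one way in which the number of markers feeds back into the hyper-parameter choice.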

As a third alternative, expert knowledge may be available on the size of the hyper-parameters. However, none of these three approaches is free of error in practical situations and, therefore, doubt will remain for any specific prior choice. Combining results from varying prior specifications using "poor man's" model averaging, as was done here, may provide a partial solution, because a wider choice of hyper-parameters can be integrated into "consensus" estimates. For this purpose, an approach like MU is especially suitable, because it comprises a wide and flexible family of prior distributions.
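A minimal sketch of such "poor man's" model averaging is shown below; the standardization step is our own assumption, introduced so that no single prior's scale dominates, and the exact combination rule used in the analyses may differ:

```python
import numpy as np

def consensus_gebv(gebv_by_prior):
    """'Poor man's' model averaging: standardize the GEBV vector obtained
    under each prior specification to zero mean and unit variance, then
    average, so that no single prior's scale dominates the consensus."""
    G = np.asarray(gebv_by_prior, dtype=float)   # (n_priors, n_individuals)
    G = (G - G.mean(axis=1, keepdims=True)) / G.std(axis=1, keepdims=True)
    return G.mean(axis=0)

# Two prior specifications that agree on the ranking but not on the scale
combined = consensus_gebv([[1.0, 3.0, 2.0],
                           [10.0, 28.0, 19.0]])
```

When the per-prior rankings agree, the consensus preserves them while averaging away prior-specific magnitudes.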

To reduce the problem induced by the sensitivity to the choice of hyper-parameters, it is common practice in Bayesian modeling to add an extra layer to the hierarchy and to assign prior distributions of their own to at least some of the hyper-parameters. This is commonly done, for example, in the Bayesian LASSO and in stochastic search variable selection methods such as BayesCπ and BayesDπ (see e.g. [52]). We refrained from doing so in this study for the following reasons: (1) the sensitivity problem may simply be moved to the next layer of the hierarchy, and the method may then become sensitive to the parameters that control the prior distribution of a hyper-parameter (see [25]); and (2) even if the approach works in an MCMC implementation, the hyper-parameter may not be identifiable in faster maximum *a posteriori* estimation algorithms such as EM, GEM or variational Bayes algorithms (e.g., [16, 25]). This behaviour could have a negative impact on the performance of the GEM estimation algorithm that was introduced in this study to make the MU method scalable to large SNP panels with thousands of variables and individuals. Finally, assigning priors to hyper-parameters may result in poor separation of QTL signals [5], which is less important in genomic prediction but of major concern in genetic association studies.