An efficient unified model for genomewide association studies and genomic selection
 Hengde Li^{1}Email authorView ORCID ID profile,
 Guosheng Su^{2},
 Li Jiang^{1} and
 Zhenmin Bao^{3}Email author
Received: 12 February 2017
Accepted: 7 August 2017
Published: 24 August 2017
Abstract
Background
A quantitative trait is controlled both by major variants with large genetic effects and by minor variants with small effects. Genomewide association studies (GWAS) are an efficient approach to identify quantitative trait loci (QTL), and genomic selection (GS) with highdensity single nucleotide polymorphisms (SNPs) can achieve higher accuracy of estimated breeding values than conventional best linear unbiased prediction (BLUP). GWAS and GS address different aspects of quantitative traits, but, as statistical models, they are quite similar in their description of the genetic mechanisms that underlie quantitative traits.
Methods
Here, we propose a stepwise linear regression mixed model (StepLMM) to unify GWAS and GS in a single statistical model. First, the variance components of the genomicBLUP (GBLUP) model are estimated. Then, in the SNP selection step, the linear mixed model (LMM) for GWAS is equivalently transformed into a simple linear regression to improve computation speed, and the most significant SNP is selected and included into the evaluation model. In the SNP dropping step, the SNPs in the evaluation model are tested according to the standard errors of their estimated effects. If nonsignificant SNPs are present, the least significant one is dropped from the model and variance components are reestimated. We used extended Bayesian information criteria (eBIC) to evaluate the model optimization, i.e. the model with the smallest eBIC is the final one and includes only significant SNPs.
Results
We simulated scenarios with different heritabilities with 100 QTL. StepLMM estimated heritability accurately and mapped QTL precisely. Genomic prediction accuracy was much higher with StepLMM than with GBLUP. The comparison of StepLMM with other GWAS and GS methods based on a dataset from the 16th QTLMAS Workshop showed that StepLMM had medium mapping power, the lowest rate of false positives for QTL mapping, and the highest accuracy for genomic prediction.
Conclusions
StepLMM is a combination of GWAS and GBLUP. GWAS and GBLUP are beneficial to each other in a single statistical model, GWAS improves genomic prediction accuracy, while GBLUP increases mapping precision and decreases the rate of false positives of GWAS. StepLMM has a high performance in both GWAS and GS and is feasible for agricultural breeding programs and human genetic studies.
Background
The genetic mechanisms that underlie a quantitative trait are complicated processes to analyse. Under the infinitesimal model [1], it is assumed that a quantitative trait is determined by an infinite number of unlinked and nonepistatic loci, each with an infinitesimal effect, that satisfy normality and linearity. Based on the infinitesimal hypothesis, best linear unbiased prediction (BLUP) [2] is an effective method to predict genetic values based on the resemblance between geneticallyrelated animals within a pedigree. With the advancement of highthroughput genotyping, prediction of genetic values can also be inferred from genomewide single nucleotide polymorphism (SNP) data, referred to as genomic selection [3].
Many statistical models and algorithms are available for genomic prediction, which differ in the assumptions regarding the distribution of SNP effects. For example, Bayesian variable selection models [4] and least absolute shrinkage and selection operator (LASSO) models [5] assume that some SNPs have large or moderate effects and the others have small or null effects, while linear mixed models assume that the effects of all SNPs are normally distributed with equal variance [6]. The genomic BLUP (GBLUP) model is a linear mixed model, which integrates a genomic relationship matrix that is built using the information of SNPs, instead of a pedigreebased relationship matrix [6, 7]. This model has become a frequently used method for genomic prediction in plant and animal breeding [8–10].
A genomic estimated breeding value is obtained by summing the effects of all variants but it is generally acknowledged that, among the whole set of variants, some have a larger genetic effect on the trait of interest than others. Genomewide association studies (GWAS) have validated the existence of causal variants and have become an important tool to identify variants that underlie human diseases and agriculturally important traits. Nevertheless, performing GWAS and quantitative trait loci (QTL) analyses at the genomewide level is a challenging issue. Population structure or relatedness between individuals can lead to a high rate of false positives and to lower mapping precision and statistical power. The linear mixed model (LMM) is an effective method to handle population structure [11], but compared to the linear regression model (LRM), which is widely used in human genetics, LMM is computationally demanding. Although it takes only a few seconds to perform an association analysis for one variant, the total computation time necessary for millions of genomic variants is unimaginable. To improve the computation efficiency of GWAS, several methods were developed. The GRAMMAR method [12] first adjusts observations for family effects to estimate the residuals, and then analyses the association between SNPs and the residuals. To further decrease computational burden, Zhang et al. [13] compressed LMM by clustering individuals into groups and using a twostep association analysis to eliminate the need of recomputing variance components, which significantly improved computation efficiency. However, the clustering process by compression of the genetic relationship matrix eliminates the possibility of predicting genetic values, because individuals in the same group share the same genetic value. Kang et al. [14] proposed the EMMA algorithm to improve iteration speed, but solving mixed model equations (MME) with a large sample size and hundreds of thousands of variants remains computationally intensive. Meyer and Tier [15] found that the coefficient matrix \({\mathbf{C}}_{11}\) of MME and \({\mathbf{C}}_{11}^{  1}\) were constant across the multiple analyses for individual SNPs, and proposed a computing strategy named SNP Snappy.
The above methods are based on a singlelocus model combined with a realized genetic relationship matrix to account for confounding effects. Because quantitative traits are controlled by multiple loci, a multiplelocus model may be more appropriate. It was reported that a multiplelocus model outperforms a singlelocus model in traditional QTL mapping [16]. A multiplelocus model for GWAS is not only more robust than a singlelocus model in the statistical sense, but it also brings more computational burden. Shrinkage is an efficient method to select latent sparse predictors from genomewide variants. Lee et al. [17] developed a Bayesian method that uses genomewide markers to predict phenotypes simultaneously. Li et al. [18] proposed the Bayesian Lasso model for GWAS but it ignores population structure. Rakitsch et al. [19] presented the LMMLasso model, which corrects confounding effects with LMM and then selects candidate variants by Lasso regression. Although LMMLasso is efficient in computation cost, it approximates the variance components for confounding that is caused by the genetic background of variants and random errors, and these are assumed to remain unchanged after new variants become cofactors in the statistical model for GWAS, which may have unfavourable effects on the subsequent genomic prediction. Segura et al. [20] suggested a multiple loci mixed model (MLMM) that introduced a stepwise regression with forward inclusion and then backward elimination of variants. A more general and comprehensive stepwise regression would be to include a new variant into the model conditionally on the GWAS results based on residuals of the model and to drop the least significant variant through a significance test until all variants in the model are significant and those outside of the final model are not significant, as in multiple linear regression. From a genomic prediction point of view, a combination between the linear mixed model and the sparse regression model is more accurate than either model separately [21, 22].
As an alternative, we proposed a stepwise linear mixed regression model that is stable, flexible and computationally efficient. Importantly, this model can be used both for GWAS and GS simultaneously. We used the linear mixed model and a realized relationship matrix to handle population structure or confounding effects. At each regression step, the variance components are reestimated by an efficient mixed model (EMM) approach. Then, the linear mixed regression model is equivalently transformed into a simple linear model by removing the influence of random effects (see “Methods”), which decreases computation time significantly. For the genetic evaluation model, we used extended Bayesian information criteria (eBIC) as convergence criteria, which are useful and stringent criteria for model selection in GWAS [23]. The model for which the eBIC reaches the lowest value is considered as the final model, and all the variants that it includes are significant quantitative trait loci (QTL). We evaluated our approach and demonstrated its utility by applying it in GWAS and GS on simulated data for human genetic analyses and agricultural breeding.
Methods
Linear mixed model
Stepwise linear mixed model (StepLMM)

Step 1: estimate \(\sigma_{g}^{2}\) and \(\sigma_{e}^{2}\) using Model (3);

Step 2: calculate \({\mathbf{V}}\) and decompose it with Eq. (4), then transform \({\mathbf{y}}\), \({\mathbf{X}}\) and \({\mathbf{Z}}\) with Models (6), (7), and (8), respectively;

Step 3: perform a significance test for individual SNPs with Model (5);

Step 4: select the new, most significant SNP in the model and estimate variance components with Model (1);

Step 5: test the significance of old SNPs in the model with Eq. (9), and if nonsignificant SNPs exist, drop the one that is least significant and estimate variance components with Model (1);

Step 6: repeat Step 2 through Step 5 until the eBIC of the model cannot decrease anymore;

Step 7: estimate the breeding values and QTL effects.
Evaluation of the model
The proposed StepLMM model was evaluated in terms of mapping precision and prediction accuracy. Mapping precision was evaluated by the number of QTL detected and the number of QTL located at defined regions with certain genome lengths (0, 10, 20 kb) on either side of a true QTL. Genomic prediction accuracy was defined as the correlation between genomic estimated breeding values (GEBV) and true breeding values. In the above model, an individual GEBV is: \({\mathbf{GEBV}} = {\mathbf{Zq}} + {\mathbf{Wg}}\). The variance of QTL \(i\) is \(\sigma_{{q_{i} }}^{2} = 2p_{i} \left( {1  p_{i} } \right)q_{i}^{2}\), where \(p_{i}\) is the minor allele frequency of QTL \(i\), and the total QTL variance is \(\sigma_{q}^{2} = \mathop \sum \nolimits 2p_{i} (1  p_{i} )q_{i}^{2}\). Therefore, the heritability is calculated as \(h^{2} = \frac{{\sigma_{q}^{2} + \sigma_{g}^{2} }}{{\sigma_{q}^{2} + \sigma_{g}^{2} + \sigma_{e}^{2} }}\). To evaluate our StepLMM model, we also compared it with traditional genomic selection methods such as GBLUP and BayesB.
Data
We used Wellcome Trust Case Control Consortium (WTCCC) human genotypic data for the simulation (https://www.wtccc.org.uk/info/access_to_data_samples.html). The data consisted of 2000 samples each from the following disease collections: type 1 diabetes, type 2 diabetes, rheumatoid arthritis, inflammatory bowel disease, bipolar disorder, hypertension and coronary artery disease. The samples were genotyped with 500,568 SNPs. SNPs with a minor allele frequency (MAF) less than 0.05 were removed from the analysis. We simulated one complex trait controlled by 100 QTL. The causal variants were randomly chosen and the effect of each locus was randomly drawn from an exponential distribution with a rate of 1. The genetic value of an individual was defined as the sum of the effects of all loci, i.e., the 100 QTL accounted for the whole genetic variance. The phenotypic value was generated by adding a random residual to the genetic value, and a random residual was drawn from a normal distribution with a mean of zero and scaled variance to fix the trait heritability to 0.25, 0.50 and 0.75. For each scenario, 50 replicates were simulated to test the performance of our method.
In addition, we used QTLMAS16 datasets (http://qtlmas2012.kassiopeagroup.com/en/dataset.php) to compare our method with methods in the literature. The QTLMAS16 data consisted of simulated genotypes, true breeding values for three traits with different heritabilities for 4100 samples, among which 3000 samples were phenotyped as training data.
Results
Mapping precision
Mapping precision of the stepwise linear mixed model based on WTCCC simulated data
\(h^{2}\)  nQTL^{a}  Number of detected causal SNPs  Mapping precision  

0 kb  10 kb  20 kb  30 kb  0 kb  10 kb  20 kb  30 kb  
0.25  7.78 (0.345)  5.01 (0.171)  6.22 (0.154)  6.54 (0.154)  6.69 (0.132)  0.65 (0.022)  0.80 (0.019)  0.84 (0.019)  0.86 (0.017) 
0.50  18.52 (0.522)  12.22 (0.333)  15.37 (0.222)  15.37 (0.185)  16.67 (0.167)  0.66 (0.018)  0.83 (0.012)  0.87 (0.010)  0.90 (0.009) 
0.75  34.22 (0.640)  23.95 (0.376)  29.42 (0.308)  30.80 (0.240)  31.82 (0.205)  0.70 (0.011)  0.86 (0.009)  0.90 (0.007)  0.93 (0.006) 
Genomic prediction
Comparison of genomic prediction accuracy between the stepwise linear mixed model (StepLMM) and genomic best linear unbiased prediction (GBLUP) based on WTCCC simulated data
\(\varvec{h}^{2}\)  \(\widehat{\varvec{h}}^{2}\)  \(\widehat{\varvec{h}}_{\varvec{q}}^{2} /\widehat{\varvec{h}}^{{2^{\text{a}} }}\)  \(r_{qtl}^{\text{b}}\)  \(r_{poly}^{\text{c}}\)  \(r_{StepLMM}^{\text{d}}\)  \(r_{GBLUP}^{\text{e}}\)  \(b_{StepLMM}^{\text{f}}\)  \(b_{GBLUP}^{\text{g}}\) 

0.25  0.25 (0.009)  0.68 (0.026)  0.71 (0.010)  0.27 (0.009)  0.75 (0.007)  0.49 (0.008)  0.89 (0.011)  1.16 (0.055) 
0.50  0.52 (0.007)  0.81 (0.013)  0.87 (0.004)  0.23 (0.006)  0.90 (0.002)  0.71 (0.002)  0.94 (0.005)  1.04 (0.042) 
0.75  0.76 (0.004)  0.92 (0.007)  0.95 (0.002)  0.18 (0.007)  0.96 (0.001)  0.86 (0.001)  0.98 (0.002)  1.01 (0.022) 
Comparison with other methods
Comparison of the genomic prediction accuracy between stepwise linear mixed model (StepLMM) and other methods based on QTLMAS16 data
Method  Trait 1  Trait 2  Trait 3 

BayesB  0.79  0.83  0.83 
GBLUP  0.74  0.77  0.76 
GLASSO^{a}  0.79  0.85  0.84 
sgLASSO^{b}  0.80  0.85  0.82 
StepLMM  0.83  0.85  0.85 
Comparison of the mapping precision between stepwise linear mixed model (StepLMM) and other methods based on QTLMAS16 data with 50 simulated QTL
Method  Number of false positives  Number of true QTL  Ratio^{a}  

Trait1  Trait2  Trait3  Total  Trait1  Trait2  Trait3  Total  
RR_YD^{b}  9  15  5  29  8  6  8  22  0.43 
GRAMMAR^{c}  0  0  0  0  2  3  5  10  1.00 
RHM20^{d}  1  0  0  1  6  4  7  17  0.94 
RF_YD^{e}  3  2  0  5  3  3  5  11  0.69 
LDLA^{f}  3  3  1  7  6  2  5  13  0.65 
LA^{g}  4  3  1  8  0  1  2  3  0.27 
StepLMM  0  0  0  0  5  4  2  11  1.00 
In QTL mapping (Table 4), ridge regression on yield deviations (RR_YD) [29] detected the largest number of QTL, but also the largest number of false positives, i.e. RR_YD had the highest power and also the highest rates of false positives. StepLMM, as well as GRAMMAR [30], did not identify any false positive QTL for all three traits, but detected one more QTL than GRAMMAR, i.e. both methods had medium mapping power and very low rates of false positives. The combined linkage disequilibrium and linkage analysis method (LDLA) [31] had medium mapping power, but a relatively high rate of false positives. Among all the methods used for QTL mapping, regional heritability mapping with 20 SNPs (RHM20) [32] had a relatively high mapping power and low rate of false positives, while linkage analysis (LA) [33] had the lowest mapping power and a relatively high rate of false positives. If the proportion of true QTL among all detected QTL was used to measure mapping performance, StepLMM and GRAMMAR performed best.
Discussion
A quantitative trait is controlled both by variants with a large effect and by variants with a small effect. A statistical model that best captures the genetic architecture of a quantitative trait will fit the data of the trait better and provide more accurate estimates of the genetic effect. StepLMM divides the breeding values into QTL and polygenic effects, and fits them with a fixed effect and normally distributed random effect separately, which basically conforms to the genetic architecture of a quantitative trait. StepLMM not only detects variants with a large effect, but also estimates breeding values, hence it is a combination of GWAS and GS. StepLMM is not only an extension, but also a combination of stepwise linear regression and linear mixed regression model. The process of model optimization is also a process of significance test for SNPs in the evaluation model, which avoids determining the threshold values as in traditional association mapping with a singlelocus model. The computational burden of StepLMM depends on how many QTL are detected. Compared to other LMM methods, StepLMM needs to estimate variance components only a few times (equal to the number of detected QTL). Because the number of QTL is very small compared to the number of markers, StepLMM has a clear computational advantage. As a multilocus model, StepLMM can also detect variants with a large effect to improve genomic prediction for species without a genome map, because the stepwise procedure does not depend on either a physical or genetic map. It is useful for many species in aquaculture, for which no genome assembly is available so far.
We found that StepLMM has a high mapping precision and a low rate of false positives and that the balance between these two objectives is good, which is similar to GRAMMAR [30] and regional heritability mapping with 20 SNPs as a region (RHM20) [32]. These three methods can fit the population structure through a realized genetic relationship matrix, which can improve mapping precision and decrease the rate of false positives. A realized genetic relationship matrix constructed with marker data is more accurate than a numerator relationship matrix constructed with pedigree data, and thus improves mapping precision. StepLMM is a multipleQTL mapping model, which usually has more mapping precision than singleQTL models [16].
Many studies have shown that GBLUP models perform well for most traits in livestock [34, 35], but are not satisfactory for the analysis of simulated data where traits are controlled by a small number of QTL [36, 37]. The results from our analysis on simulated human data showed that accuracy of genomic prediction using GBLUP is equal to about the square root of heritability. An important feature of the simulated human data is that genetic relationships between individuals are very weak. The low prediction accuracy obtained with GBLUP indicates that this method is not sufficiently accurate for genomic prediction in a population with distantly related individuals.
Conclusions
A quantitative trait is controlled both by major variants with large effects and by polygenic effects, they are separately fitted with GWAS and GBLUP in StepLMM. GWAS identifies QTL and simultaneously improves genomic prediction accuracy; GBLUP accounts for polygenic effects, it also increases mapping precision and decreases the rate of false positives of GWAS. StepLMM has a high performance in both GWAS and GS and is feasible for agricultural breeding programs and human genetic studies.
Declarations
Authors’ contributions
HL and ZB designed the study. HL, LJ and SG carried out the study. HL, GS and ZB wrote the manuscript. GS and HL revised the manuscript. All authors read and approved the final manuscript.
Acknowledgements
This research was financially supported by the National Natural Science Foundation of China (Grant Number: 31672650).
Competing interests
The authors declare that they have no competing interests.
Data availability
The datasets used in this study were public datasets, and the information to access these has been provided. The program used in this study is attached as an additional file (see Additional file 1).
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Fisher RA. The correlation between relatives on the supposition of Mendelian inheritance. Trans R Soc Edin. 1918;52:399–433.View ArticleGoogle Scholar
 Henderson CR. Best linear unbiased estimation and prediction under a selected model. Biometrics. 1975;31:423–47.View ArticlePubMedGoogle Scholar
 Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genomewide dense marker maps. Genetics. 2001;157:1819–29.PubMedPubMed CentralGoogle Scholar
 Habier D, Fernando RL, Kizilkaya K, Garrick DJ. Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics. 2011;12:186.View ArticlePubMedPubMed CentralGoogle Scholar
 Usai MG, Goddard ME, Hayes BJ. LASSO with crossvalidation for genomic selection. Genet Res (Camb). 2009;91:427–36.View ArticleGoogle Scholar
 VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.View ArticlePubMedGoogle Scholar
 Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–9.View ArticlePubMedPubMed CentralGoogle Scholar
 Xu SZ, Zhu D, Zhang QF. Predicting hybrid performance in rice using genomic best linear unbiased prediction. Proc Natl Acad Sci USA. 2014;111:12456–61.View ArticlePubMedPubMed CentralGoogle Scholar
 Desta ZA, Ortiz R. Genomic selection: genomewide prediction in plant improvement. Trends Plant Sci. 2014;19:592–601.View ArticlePubMedGoogle Scholar
 Meuwissen THE, Hayes B, Goddard M. Accelerating improvement of livestock with genomic selection. Annu Rev Anim Biosci. 2013;1:221–37.View ArticlePubMedGoogle Scholar
 Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, et al. A unified mixedmodel method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38:203–8.View ArticlePubMedGoogle Scholar
 Aulchenko YS, de Koning DJ, Haley C. Genomewide rapid association using mixed model and regression: a fast and simple method for genomewide pedigreebased quantitative trait loci association analysis. Genetics. 2007;177:577–85.View ArticlePubMedPubMed CentralGoogle Scholar
 Zhang Z, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, Gore MA, et al. Mixed linear model approach adapted for genomewide association studies. Nat Genet. 2010;42:355–60.View ArticlePubMedPubMed CentralGoogle Scholar
 Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, et al. Efficient control of population structure in model organism association mapping. Genetics. 2008;178:1709–23.View ArticlePubMedPubMed CentralGoogle Scholar
 Meyer K, Tier B. SNP Snappy: a strategy for fast genomewide association studies fitting a full mixed model. Genetics. 2012;190:275–7.View ArticlePubMedPubMed CentralGoogle Scholar
 Jansen RC. Interval mapping of multiple quantitative trait loci. Genetics. 1993;135:205–11.PubMedPubMed CentralGoogle Scholar
 Lee SH, van der Werf JHJ, Hayes BJ, Goddard ME, Visscher PM. Predicting unobserved phenotypes for complex traits from wholegenome SNP data. PLoS Genet. 2008;4:e1000231.View ArticlePubMedPubMed CentralGoogle Scholar
 Li J, Das K, Fu G, Li R, Wu R. The Bayesian lasso for genomewide association studies. Bioinformatics. 2011;27:516–23.View ArticlePubMedGoogle Scholar
 Rakitsch B, Lippert C, Stegle O, Borgwardt K. A lasso multimarker mixed model for association mapping with population structure correction. Bioinformatics. 2013;29:206–14.View ArticlePubMedGoogle Scholar
 Segura V, Vilhjalmsson BJ, Platt A, Korte A, Seren U, Long Q, et al. An efficient multilocus mixedmodel approach for genomewide association studies in structured populations. Nat Genet. 2012;44:825–30.View ArticlePubMedPubMed CentralGoogle Scholar
 Zhou X, Carbonetto P, Stephens M. Polygenic modeling with bayesian sparse linear mixed models. PLoS Genet. 2013;9:e1003264.View ArticlePubMedPubMed CentralGoogle Scholar
 Moser G, Lee SH, Hayes BJ, Goddard ME, Wray NR, Visscher PM. Simultaneous discovery, estimation and prediction analysis of complex traits using a bayesian mixture model. PLoS Genet. 2015;11:e1004969.View ArticlePubMedPubMed CentralGoogle Scholar
 Chen J, Chen Z. Extended Bayesian information criteria for model selection with large model spaces. Biometrika. 2008;95:759–71.View ArticleGoogle Scholar
 Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, et al. Variance component model to account for sample structure in genomewide association studies. Nat Genet. 2010;42:348–54.View ArticlePubMedPubMed CentralGoogle Scholar
 Zhou X, Stephens M. Genomewide efficient mixedmodel analysis for association studies. Nat Genet. 2012;44:821–4.View ArticlePubMedPubMed CentralGoogle Scholar
 Lippert C, Listgarten J, Liu Y, Kadie CM, Davidson RI, Heckerman D. FaST linear mixed models for genomewide association studies. Nat Methods. 2011;8:833–5.View ArticlePubMedGoogle Scholar
 Usai MG, Gaspa G, Macciotta NPP, Carta A, Casu S. XVIth QTLMAS: simulated dataset and comparative analysis of submitted results for QTL mapping and genomic evaluation. BMC Proc. 2014;8:S1.View ArticlePubMedPubMed CentralGoogle Scholar
 Ogutu JO, Piepho HP. Regularized group regression methods for genomic prediction: bridge, MCP, SCAD, group bridge, group lasso, sparse group lasso, group MCP and group SCAD. BMC Proc. 2014;8:S7.View ArticlePubMedPubMed CentralGoogle Scholar
 Karacaören B. Association analyses of QTLMAS 2012 dataset using generalized least squares, principal components and discordant sibpair methodologies. In: Proceedings of the 16th QTLMAS Workshop: 24–25 May 2012; Alghero. 2012. http://qtlmas2012.kassiopeagroup.com/presentazioni/17_Karacaoeren.pdf. Accessed 17 July 2017.
 GrosseBrinkhaus C, Bergfelder S, Tholen E. Genome wide association analysis of the QTL MAS 2012 data investigating pleiotropy. BMC Proc. 2014;8:S2.View ArticlePubMedPubMed CentralGoogle Scholar
 García Gámez E, Sánchez JP, Gutiérrez Gil, Arranz JJ. Analysis of the MAS QTLXVI data set using an LDLA procedure and the DMU software package. In: Proceedings of the 16th QTLMAS workshop: 24–25 May 2012; Alghero. 2012. http://qtlmas2012.kassiopeagroup.com/presentazioni/15_GarciaGamez.pdf. Accessed 17 July 2017.
 Riggio V, PongWong R. Regional heritability mapping to identify loci underlying genetic variation of complex traits. BMC Proc. 2014;8:S3.View ArticlePubMedPubMed CentralGoogle Scholar
 Demeure O, Filagi O, Gilbert H, Moreno C, Legarra A, Elsen JM, et al. Linkage analysis of the XVI QTLMAS simulated dataset using QTLMap. In: Proceedings of the 16th QTLMAS workshop: 24–25 May 2012; Alghero. 2012. http://qtlmas2012.kassiopeagroup.com/presentazioni/16_Demeure.pdf. Accessed 17 July 2017.
 Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME. Invited review: genomic selection in dairy cattle: progress and challenges. J Dairy Sci. 2009;92:433–43.View ArticlePubMedGoogle Scholar
 VanRaden PM, Van Tassell CP, Wiggans GR, Sonstegard TS, Schnabel RD, Taylor JF, et al. Invited review: reliability of genomic predictions for North American Holstein bulls. J Dairy Sci. 2009;92:16–24.View ArticlePubMedGoogle Scholar
 Guo G, Lund MS, Zhang Y, Su G. Comparison between genomic predictions using daughter yield deviation and conventional estimated breeding value as response variables. J Anim Breed Genet. 2010;127:423–32.View ArticlePubMedGoogle Scholar
 Meuwissen T, Goddard M. Accurate prediction of genetic values for complex traits by wholegenome resequencing. Genetics. 2010;185:623–31.View ArticlePubMedPubMed CentralGoogle Scholar
 Lee SH, Weerasinghe WMSP, Wray NR, Goddard ME, van der Werf JHJ. Using information of relatives in genomic prediction to apply effective stratified medicine. Sci Rep. 2017;7:42091.View ArticlePubMedPubMed CentralGoogle Scholar
 Minozzi G, Pedretti A, Biffani S, Nicolazzi EL, Stella A. Genome wide association analysis of the 16th QTLMAS Workshop dataset using the Random Forest machine learning approach. BMC Proc. 2014;8:S4.View ArticlePubMedPubMed CentralGoogle Scholar