Right-hand-side updating for fast computing of genomic breeding values
© Calus; licensee BioMed Central Ltd. 2014
Received: 6 July 2013
Accepted: 26 February 2014
Published: 3 April 2014
Since both the number of SNPs (single nucleotide polymorphisms) used in genomic prediction and the number of individuals in training datasets are rapidly increasing, there is a growing need to improve the efficiency of genomic prediction models in terms of computing time and memory (RAM) requirements.
In this paper, two alternative algorithms for genomic prediction are presented that replace the originally suggested residual updating algorithm, without affecting the estimates. The first alternative algorithm continues to use residual updating, but takes advantage of the characteristic that the predictor variables in the model (i.e. the SNP genotypes) take only three different values, and is therefore termed “improved residual updating”. The second alternative algorithm, here termed “right-hand-side updating” (RHS-updating), extends the idea of improved residual updating across multiple SNPs. The alternative algorithms can be implemented for a range of different genomic predictions models, including random regression BLUP (best linear unbiased prediction) and most Bayesian genomic prediction models. To test the required computing time and RAM, both alternative algorithms were implemented in a Bayesian stochastic search variable selection model.
Compared to the original algorithm, the improved residual updating algorithm reduced CPU time by 35.3 to 43.3%, without changing memory requirements. The RHS-updating algorithm reduced CPU time by 74.5 to 93.0% and memory requirements by 13.1 to 66.4% compared to the original algorithm.
The presented RHS-updating algorithm provides an interesting alternative to reduce both computing time and memory requirements for a range of genomic prediction models.
Many models have been suggested for genomic prediction. The computing time required to estimate SNP (single nucleotide polymorphism) effects varies considerably between models. Computing time depends both on the number of SNPs used and on the number of animals in the training dataset. The latter is increasing rapidly, exceeding 15 000 animals in some cases. The number of SNPs used is also increasing rapidly with the availability of high-density SNP panels in cattle with 648 874 and 777 962 SNPs, and recently, investigations on the use of whole-genome sequence data in genomic prediction have been reported [5, 6]. These developments emphasize an increasing need to improve the efficiency of genomic prediction models in terms of computing time and memory requirements. To overcome computing limitations, fast implementations have been reported for genomic prediction models such as BayesA, BayesB [8, 9] and the Bayesian Lasso. At the same time, it has been suggested that variable selection methods such as BayesB are required to make optimal use of whole-genome sequence data in genomic prediction. The number of reports that compare such fast implementations of variable selection methods to their Markov chain Monte Carlo (MCMC) based counterparts has so far been limited, and all of the aforementioned studies were based on simulated data with a limited number of simulated QTL. To enable comparison of these fast methods with their MCMC-based counterparts on real datasets with whole-genome sequence data, efficient implementations of MCMC genomic prediction models are also required.
Genomic prediction models can be classified into those that estimate SNP effects implicitly (using genomic relationships) and those that estimate SNP effects explicitly. Models that explicitly estimate SNP effects commonly perform regression with SNP genotypes as predictor variables, coded as 0, 1, 2 or -1, 0, 1, referring respectively to one homozygous genotype, the heterozygous genotype, and the alternative homozygous genotype. The characteristic that the predictor variables can take only three possible values provides an interesting opportunity to reduce the computing time of algorithms that estimate SNP effects.
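To illustrate this three-valued structure, the following minimal Python sketch (the implementations discussed in this paper are in Fortran; the centring and scaling convention shown here is one common choice and is an assumption, not necessarily the exact convention used in the models above) shows that centring and scaling each SNP column leaves it with at most three distinct values:

```python
import numpy as np

def centre_scale(genotypes):
    """Centre and scale 0/1/2 SNP genotypes column-wise.

    Centring by twice the allele frequency and scaling by the genotype
    standard deviation is a common convention (an assumption here); the
    key point is that each column still takes only three distinct values
    after the transformation.
    """
    X = np.asarray(genotypes, dtype=float)
    p = X.mean(axis=0) / 2.0                 # allele frequencies per SNP
    X = X - 2.0 * p                          # centre
    X = X / np.sqrt(2.0 * p * (1.0 - p))     # scale
    return X

# Four animals, two SNPs, coded 0/1/2
X = centre_scale([[0, 2], [1, 1], [2, 0], [1, 1]])
# Each column contains at most three distinct values after the transformation
assert all(len(np.unique(np.round(col, 10))) <= 3 for col in X.T)
```

Because every predictor column takes only three values, per-animal multiplications can be replaced by per-genotype-class bookkeeping, which is the idea exploited by the algorithms below.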
The objective of this paper was to describe two efficient algorithms to estimate SNP effects that take advantage of the characteristic that each predictor variable (SNP genotype) can take only three different values. The efficiency of the two algorithms is compared in terms of memory and computing time requirements to that of a commonly used algorithm that is based on residual updating.
Updating schemes to estimate SNP effects
where λ_j = σ^2_e/σ^2_αj, σ^2_e is the residual variance, and σ^2_αj is the variance associated with locus j. Note that σ^2_αj in equation (2) can be estimated in several ways, as done in well-known models such as BayesA and BayesB, BayesC or Bayesian Stochastic Search Variable Selection [16, 17], or can be assumed to be known as in RR-BLUP [12, 18]. Note that equation (2) gives the value required in a Gauss-Seidel algorithm to compute BLUP estimates of the allele substitution effects, while it gives the mean of the conditional posterior density if a Bayesian model is used to estimate the allele substitution effects.
where γ_j is a vector that contains the (squared) centred and scaled values of the three genotypes that are present at locus j, r_j is a 3 × 1 vector that contains the sum of the residuals for each genotype i at locus j, i.e. element r_ij is the sum of the residuals e_k over all animals k that have genotype i at locus j, and vector n_j contains the number of animals for each genotype at locus j. It should be noted here that γ′_j r_j = x′_j e, but this notation is introduced here to clarify the implementation of the newly proposed algorithm, as will be shown later. Equation (6) involves only four multiplications and, thus, requires m − 3 fewer multiplications than equation (4) (noting that all values of γ_j can be computed once and stored). Those m − 3 multiplications are replaced by m − 3 summations, which are computationally less demanding than multiplications using standard Fortran functions. Hereafter, the algorithm that uses equations (5) and (6) will be referred to as “improved residual updating”.
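The equivalence that underlies improved residual updating can be checked numerically. In the following Python sketch (illustrative genotype values and variable names, not taken from the paper), the right-hand-side contribution x′_j e is computed once as a full dot product over all animals and once from only three per-genotype residual sums:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 1000                                 # number of animals
codes = rng.integers(0, 3, size=m)       # genotype class 0/1/2 per animal
gamma = np.array([-1.2, 0.1, 1.4])       # the three centred/scaled genotype
                                         # values at this locus (illustrative)
x = gamma[codes]                         # predictor variable for this locus
e = rng.normal(size=m)                   # current residuals

# Original residual updating: one multiplication per animal
rhs_original = x @ e

# Improved residual updating: first sum the residuals within each of the
# three genotype classes (m - 3 additions), then only three multiplications
r = np.array([e[codes == k].sum() for k in range(3)])
rhs_improved = gamma @ r

assert np.isclose(rhs_original, rhs_improved)
```

The two quantities are identical; the saving comes purely from replacing per-animal multiplications with additions.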
where r^(l+1)_(j+1) = t_(j+1) − N_(j,j+1) γ_j (α^(l+1)_j − α^(l)_j), i.e. a 3 × 1 vector that contains the sums of the residuals before updating locus j + 1 for each genotype k at locus j + 1 in iteration l + 1, and N_(j,j+1) is a 3 × 3 matrix that contains the number of animals that have any of the nine combinations of genotypes at loci j and j + 1. Note that N_(j,j+1) can be computed once and stored. The term t_(j+1) is a vector that contains the sum of the residuals for each of the three genotypes k at locus j + 1. Each of these sums is computed from the sums of residuals (before updating) for each of the three genotypes i at locus j that were computed within groups of animals having genotype k at locus j + 1, i.e. element t_(k,j+1) is the sum over the three genotypes i of the nine residual sums t_ki. Thus, for locus j, first 3 × 3 = 9 sums of residuals are calculated, one for each unique combination of genotypes at loci j and j + 1. These sums include the residuals before α^(l+1)_j is used to update them. The update at locus j is accounted for by the term N_(j,j+1) γ_j (α^(l+1)_j − α^(l)_j).
The proposed alternative updating scheme to compute the estimate for locus j + 1 involves applying equation (8) instead of (4) or (6). This updating scheme is hereafter referred to as “right-hand-side updating” (RHS-updating), since it essentially involves direct updating of the right-hand sides of the model used to estimate the SNP effects, without having to explicitly update the residuals every time a SNP effect is estimated.
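The RHS-updating step for a pair of loci can be sketched as follows (a Python sketch with illustrative genotype values; the paper's implementation is in Fortran). It verifies that updating the right-hand side of locus j + 1 from the nine genotype-combination residual sums and counts gives exactly the same value as first updating all m residuals explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 1000
codes_j  = rng.integers(0, 3, size=m)    # genotype classes at locus j
codes_j1 = rng.integers(0, 3, size=m)    # genotype classes at locus j + 1
gam_j  = np.array([-1.0, 0.0, 1.0])      # centred/scaled genotype values (illustrative)
gam_j1 = np.array([-1.3, 0.2, 1.1])
e = rng.normal(size=m)                   # residuals before updating locus j
delta_j = 0.05                           # change in the estimated effect of locus j

# 3 x 3 residual sums S and (precomputable) counts N over the nine
# combinations of genotypes at loci j and j + 1
S = np.zeros((3, 3))
N = np.zeros((3, 3))
for k in range(3):
    for i in range(3):
        sel = (codes_j1 == k) & (codes_j == i)
        S[k, i] = e[sel].sum()
        N[k, i] = sel.sum()

# RHS for locus j + 1 from the 3 x 3 sums, with the update at locus j
# accounted for through the counts, without touching the residual vector
r_j1 = S.sum(axis=1) - delta_j * (N @ gam_j)
rhs_blockwise = gam_j1 @ r_j1

# Reference: explicit residual updating followed by a full dot product
e_updated = e - delta_j * gam_j[codes_j]
rhs_explicit = gam_j1[codes_j1] @ e_updated

assert np.isclose(rhs_blockwise, rhs_explicit)
```

Only nine sums and counts per pair of loci are needed, instead of one pass over all m residuals per locus, which is where the reduction in operations comes from.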
where n_ki is the element of the 3 × 3 genotype-count matrix N_(j,j+1) that corresponds to genotype k at locus j + 1 and genotype i at locus j. Applying equations (9) and (10) for loci j and j + 1 finalizes the RHS-updating step, just like equation (5) finalizes the residual updating step.
This implies that the number of operations for locus j is similar for the residual updating and RHS-updating algorithms. However, for locus j + 1, applying equation (8) requires only 20 summations and subtractions and 11 multiplications, compared to the m − 3 summations and 4 multiplications that are required when applying equation (6). This indicates that the total number of operations is drastically reduced by the RHS-updating algorithm.
Formulas to predict RAM requirement for original and improved residual updating, and for RHS-updating
Predicted RAM requirements (Gb), with n the number of SNPs, m the number of individuals, and s the number of SNPs per RHS-block:
Original residual updating: nm × 10^-9
Improved residual updating: nm × 10^-9
RHS-updating: 2nm × 10^-9/s
The second point that should be noted is that, within the RHS-updating scheme, the initialization step described in equation (10) is the most time-consuming and recurs every two loci. By applying the same principles, the RHS-updating scheme can also be applied to more than two loci consecutively. Increasing the number of loci per RHS-block may decrease the relative cost of the initialization step (10), but this will eventually be offset by the exponential increase in the number of groups defined per RHS-block, which is equal to 3^s, where s is the number of SNPs per RHS-block. When the number of loci per RHS-block increases, at some stage the equivalent expression of equation (7) becomes computationally more demanding. The optimal value of s likely depends on the number of individuals in the training data (m) and will be derived empirically in this study.
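Grouping animals by their genotype combination within an RHS-block can be sketched with a base-3 group code. The following Python sketch assumes genotypes coded 0, 1, 2 (variable names are illustrative): with s SNPs per block there are at most 3^s groups, so residual sums need only 3^s accumulators instead of m:

```python
import numpy as np

rng = np.random.default_rng(3)
m, s = 1000, 4                            # animals, SNPs per RHS-block
block = rng.integers(0, 3, size=(m, s))   # 0/1/2 genotype classes in one block

# Encode the s genotypes of each animal as a single base-3 group code;
# with s SNPs per block there are at most 3**s distinct groups
powers = 3 ** np.arange(s)
group = block @ powers
assert group.max() < 3 ** s

# Residual sums per group: 3**s accumulators instead of m residuals
sums = np.bincount(group, weights=rng.normal(size=m), minlength=3 ** s)
```

With s = 4 there are at most 3^4 = 81 groups, which also illustrates why small integer types suffice to store the group codes, as discussed later.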
The three algorithms described above are mathematically equivalent, in the sense that they estimate SNP effects using the same information. Thus, all three algorithms are expected to give the same results.
Implementation of RHS-updating in Bayesian stochastic search variable selection
where y contains phenotypic records, μ is the overall mean, 1 is a vector of 1's, X is an m × n matrix that contains the scaled and centred genotypes of all individuals, α contains the (random) allele substitution effects for all loci, and e contains the random residuals. The specific parameterization of the genomic model that results in the Bayes SSVS model is described below.
where x_i denotes the SNP genotypes of animal i. Definitions of the unknowns and their prior distributions are described hereafter.
The prior for μ was a constant. The residual variance σ^2_e had a scaled inverted chi-square prior distribution χ^-2(−2, 0), which yields a flat prior.
where n is the number of loci.
Conditional posterior densities
where the numerator contains the squares of the current estimates of the allele substitution effects of all loci, weighted by the vector ω, which contains a value of 1 or 100 for each locus.
where 1 − π (π) is the prior probability that I_j = 1 (I_j = 0), r_j = x′_j y*, where y* contains the conditional phenotypes as defined previously, and f(r_j | I_j = δ), where δ is either 0 or 1, is proportional to v^(-1/2) exp(−r_j^2/(2v)), where v = (x′_j x_j)^2 σ^2_α/ω_j + x′_j x_j σ^2_e. It should be noted that v depends on I_j through its dependence on ω_j, i.e. if I_j = 0 (I_j = 1), then ω_j = 100 (ω_j = 1).
where m is the number of animals with records and e is a vector with the current residuals.
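Under a flat prior, the Gibbs draw for the residual variance can be sketched as e′e divided by a chi-square deviate. The degrees of freedom m − 2 in the Python sketch below follow from common practice for a flat prior on the variance and are an assumption here, as are the simulated residuals:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 5000
e = rng.normal(scale=2.0, size=m)   # current residuals (simulated, true variance 4)

# Draw sigma_e^2 from its conditional posterior under a flat prior:
# sigma_e^2 | e ~ e'e / chi-square(m - 2)
sigma_e2 = (e @ e) / rng.chisquare(m - 2)
```

With m = 5000 residuals of true variance 4, the draw lands close to 4, as expected for a scaled inverted chi-square posterior.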
Derivation of the optimal number of SNPs included per RHS-block
Simulated data - CPU time
To investigate to what extent the CPU time of the newly developed algorithms depended on the number of individuals included in the analysis, a dataset of 420 SNPs was simulated. The number of SNPs was limited to 420 to reduce total computation time; this limited number of SNPs was sufficient to compare the relative computation time of the different algorithms because total CPU time scales linearly with the number of SNPs. Further details of the simulation are not included here because the aim of the subsequent analysis was only to compare CPU time. The size of the dataset was increased in steps of 500 individuals from 500 to 100 000 individuals, and each dataset was analysed 11 times. The newly developed RHS-updating algorithm was used in nine of those analyses, using one to nine SNPs per RHS-block. The tenth analysis used original residual updating and the eleventh used improved residual updating. Each analysis was run for 900 iterations, and the CPU time for these 900 iterations was recorded.
Simulated data - RAM use
To investigate to what extent the required amount of RAM of the newly developed algorithm depended on the number of individuals in the analysis, a dataset of 50 000 SNPs was simulated. The number of individuals was increased in steps of 5000 individuals from 500 to 95 500 individuals. This number of SNPs and animals yielded a range of datasets with dimensions that correspond to currently used practical datasets, and each dataset was analysed 11 times. These analyses involved the same models and settings as used to evaluate CPU time. Each analysis was run until the iterations started, i.e. when the maximum RAM requirement was reached, at which point the used RAM was recorded and the process was aborted. The maximum RAM requirement was measured by retrieving the process ID and then storing the RAM use for that process. Details of this procedure are provided in Appendix 1.
All comparisons were run on a Windows XP-64 desktop pc with an Intel(R) Xeon(R) 64-bit CPU E5420 with a clock speed of 2.50 GHz. Comparisons on CPU time were also run on a Linux platform with an AMD Opteron 8431 64-bit CPU with a clock speed of 2.39 GHz running Ubuntu 12.04.3. The programs were compiled with the Intel® Fortran Compiler 11.0.075 for Windows and the Intel® Fortran Compiler 13.0.079 for Linux.
Coefficients of the regression of measured on predicted RAM requirements for original and improved residual updating, and for RHS-updating
Two alternative algorithms were presented that can be implemented in various genomic prediction models for fast computing of SNP effects. The algorithms replace the originally suggested residual updating algorithm, without affecting the results obtained. Differences in results between algorithms were similar to those within algorithms when using different random seeds, and correlations between different sets of results were greater than 0.99 (results not shown). Both algorithms use the characteristic that only three different genotypes are observed for each SNP. Both algorithms can accommodate loci with more than three genotypes, but this may reduce or eliminate their benefit in terms of computing time over the original residual updating scheme. The limitation on the number of genotypes per locus implies that imputed genotypes expressed as gene contents cannot be used for individuals in the training data. Nevertheless, a simple transformation of gene contents to the most likely genotype overcomes this problem. This transformation could for instance be (on a 0-2 scale): genotypes ≤ 0.5 are set to 0, genotypes ≥ 1.5 are set to 2, and all other genotypes are set to 1. Such transformations of imputed genotypes in the training data are expected to have a minor impact on the estimated SNP effects, provided that the genotypes are imputed with reasonable accuracy. Using gene contents for selection candidates, i.e. individuals whose genetic merit is predicted using SNP effects estimated from the training data, is not inhibited by the proposed algorithms, because their predicted genetic merit can simply be obtained outside the algorithm by multiplying their gene contents with the allele substitution effects estimated in the algorithm. The literature shows that, for selection candidates, predictions differ when gene contents or the most likely genotypes are used.
The residual updating algorithms were implemented using standard Fortran functions (e.g. dot_product). Computer-specific optimized libraries are available [21, 22] that can considerably reduce the CPU time required for, e.g., vector and matrix multiplications. Using such libraries may have a larger impact on CPU time for the residual updating algorithms than for the RHS-updating algorithm, since the former involve many more multiplications. However, even when using such optimized libraries, the RHS-updating algorithm is still expected to be considerably more efficient, because it drastically reduces the total number of required operations.
It should be noted that the RHS-updating scheme requires slightly more overhead in terms of computing time than the residual updating scheme, for instance to define the RHS-blocks and the group coding within those blocks. Once RHS-blocks and group coding are defined, they can be used in each iteration of the Gibbs chain. This may limit flexibility in the algorithm. For instance, one way to improve mixing of the Gibbs chains may be to permute the order in which SNP effects are evaluated between iterations. With the RHS-updating scheme, the order of evaluation of SNP effects within RHS-blocks must be the same throughout the Gibbs chain, such that group coding within RHS-blocks needs to be defined only once. Nevertheless, the order of evaluation of the RHS-blocks can still be permuted. Furthermore, to avoid neighbouring SNPs always being evaluated in the same order, SNPs can be allocated to RHS-blocks at random.
The RHS-updating scheme not only considerably reduced computing time, by up to 93%, but also reduced the amount of RAM used by up to 66%. Due to the nature of the RHS-updating algorithm, computing time and RAM use are linearly related to the number of SNPs considered, as they are for the residual updating algorithm. This implies that the relative benefit of using the RHS-updating algorithm compared to the residual updating algorithm is not affected by the number of SNPs included. In our implementation, which is written in Fortran 95, group codes within RHS-blocks were stored as an integer(2) variable, while the genotypes in the original implementation with residual updating were stored as an integer(1) variable. It should be noted that storing group codes as integer(1) would lead to a further reduction in RAM requirements of almost 50%, because the array that stores the group codes uses close to 100% of the RAM used by the algorithm. Storing group codes as integer(1) implies that the number of SNPs included per RHS-block should be four or less, since the maximum value an integer(1) variable can take is 127, and including 4 (5) SNPs per RHS-block yields 3^4 = 81 (3^5 = 243) groups. The maximum value an integer(2) variable can take is 32 767, which means that a maximum of nine SNPs can be included per RHS-block before the group code must be stored as integer(4), i.e. including 9 (10) SNPs per RHS-block yields 3^9 = 19 683 (3^10 = 59 049) groups. Our results show that with the largest number of individuals considered (100 000), the optimal number of SNPs included per RHS-block was six. This suggests that datasets large enough to justify including more than nine SNPs per RHS-block, and therefore to require storing group codes as integer(4), are unlikely to be reached in the near future.
In our implementation of RHS-updating, each RHS-block containing s SNPs is assumed to contain all 3^s possible groups, which is most likely not always the case. Moreover, clever grouping of SNPs within RHS-blocks can reduce the observed number of groups within each RHS-block. Such redundancy could be used to further reduce computing time, but would also likely result in a more complicated algorithm.
Two algorithms are presented to estimate SNP effects that can be implemented in a range of different genomic prediction models, as an alternative to the original residual updating scheme. The first alternative algorithm uses residual updating, here termed improved residual updating, and takes advantage of the characteristic that the predictor variables in the model (i.e. SNP genotypes) have only three possible values. The second alternative algorithm, here termed “RHS-updating”, extends the idea of improved residual updating across multiple SNPs. The improved residual updating algorithm achieved a reduction in computing time of 35.3 to 43.3%, but did not change the amount of RAM used, compared to the original residual updating scheme. The RHS-updating algorithm achieved a reduction in computing time of 74.5 to 93.0% and a reduction in RAM use of 13.1 to 66.4%, compared to the original residual updating scheme. Thus, the RHS-updating algorithm provides an interesting alternative to reduce both computing time and memory requirements.
Appendix 1. Pseudo-code to measure maximum RAM requirement
The maximum RAM requirement was measured by retrieving the process ID and then storing the RAM use for this particular process. On the Windows OS, this was done using the following Fortran code:
PROGRAM GIBBS
USE DFPORT !Portability module that contains the functions "GETPID" and "SYSTEM"
IMPLICIT NONE
INTEGER :: PROC_ID
CHARACTER(LEN = 60) :: SYS_CALL
PROC_ID = GETPID() !Retrieve the process ID of the current process
!Use the DOS command "tasklist" to write the RAM use of this process to file "tasklist.txt"
WRITE(SYS_CALL, '(A,I0,A)') 'tasklist /fi "PID eq ', PROC_ID, '" > tasklist.txt'
CALL SYSTEM(SYS_CALL)
END PROGRAM GIBBS
Two anonymous reviewers are thanked for their very valuable comments on the manuscript that helped to improve it. The author acknowledges financial support of CRV BV (Arnhem, the Netherlands).
- de los Campos G, Hickey JM, Pong-Wong R, Daetwyler HD, Calus MPL: Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics. 2013, 193: 327-345. 10.1534/genetics.112.143313.
- Daetwyler HD, Calus MPL, Pong-Wong R, de los Campos G, Hickey JM: Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics. 2013, 193: 347-365. 10.1534/genetics.112.147983.
- Lund M, de Roos S, de Vries A, Druet T, Ducrocq V, Fritz S, Guillaume F, Guldbrandtsen B, Liu Z, Reents R, Schrooten C, Seefried F, Su G: A common reference population from four European Holstein populations increases reliability of genomic predictions. Genet Sel Evol. 2011, 43: 43. 10.1186/1297-9686-43-43.
- Rincon G, Weber KL, Van Eenennaam AL, Golden BL, Medrano JF: Hot topic: performance of bovine high-density genotyping platforms in Holsteins and Jerseys. J Dairy Sci. 2011, 94: 6116-6121. 10.3168/jds.2011-4764.
- Hayes B, Anderson C, Daetwyler HD, Fries R, Guldbrandtsen B, Lund M, Boichard D, Stothard P, Veerkamp RF, Hulsegge I, Rocha D, Van Tassel CP, Coote D, Goddard M: Towards genomic prediction from genome sequence data and the 1000 bull genomes project. Book of Abstracts of the 4th International Conference on Quantitative Genetics: 17-22 June 2012; Edinburgh. 2012, 55.
- Meuwissen THE, Goddard ME: Accurate prediction of genetic value for complex traits by whole-genome resequencing. Genetics. 2010, 185: 623-631. 10.1534/genetics.110.116590.
- Hayashi T, Iwata H: EM algorithm for Bayesian estimation of genomic breeding values. BMC Genet. 2010, 11: 3.
- Shepherd RK, Meuwissen THE, Woolliams JA: Genomic selection and complex trait prediction using a fast EM algorithm applied to genome-wide markers. BMC Bioinformatics. 2010, 11: 529. 10.1186/1471-2105-11-529.
- Meuwissen THE, Solberg TR, Shepherd R, Woolliams JA: A fast algorithm for BayesB type of prediction of genome-wide estimates of genetic value. Genet Sel Evol. 2009, 41: 2. 10.1186/1297-9686-41-2.
- Cai X, Huang A, Xu S: Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping. BMC Bioinformatics. 2011, 12: 211. 10.1186/1471-2105-12-211.
- VanRaden PM: Efficient methods to compute genomic predictions. J Dairy Sci. 2008, 91: 4414-4423. 10.3168/jds.2007-0980.
- Meuwissen THE, Hayes BJ, Goddard ME: Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001, 157: 1819-1829.
- Stranden I, Christensen OF: Allele coding in genomic evaluation. Genet Sel Evol. 2011, 43: 25. 10.1186/1297-9686-43-25.
- Legarra A, Misztal I: Computing strategies in genome-wide selection. J Dairy Sci. 2008, 91: 360-366. 10.3168/jds.2007-0403.
- Habier D, Fernando R, Kizilkaya K, Garrick D: Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics. 2011, 12: 186. 10.1186/1471-2105-12-186.
- Calus MPL, Meuwissen THE, De Roos APW, Veerkamp RF: Accuracy of genomic selection using different methods to define haplotypes. Genetics. 2008, 178: 553-561. 10.1534/genetics.107.080838.
- Verbyla KL, Hayes BJ, Bowman PJ, Goddard ME: Accuracy of genomic selection using stochastic search variable selection in Australian Holstein Friesian dairy cattle. Genet Res. 2009, 91: 307-311. 10.1017/S0016672309990243.
- Habier D, Fernando R, Dekkers J: The impact of genetic relationship information on genome-assisted breeding values. Genetics. 2007, 177: 2389-2397.
- Jia Y, Jannink J-L: Multiple-trait genomic selection methods increase genetic value prediction accuracy. Genetics. 2012, 192: 1513-1522. 10.1534/genetics.112.144246.
- Mulder HA, Calus MPL, Druet T, Schrooten C: Imputation of genotypes with low-density chips and its effect on reliability of direct genomic values in Dutch Holstein cattle. J Dairy Sci. 2012, 95: 876-889. 10.3168/jds.2011-4490.
- Whaley RC, Dongarra JJ: Automatically tuned linear algebra software. Proceedings of the 1998 ACM/IEEE Conference on Supercomputing: 7-13 November 1998; Orlando. 1998, Washington, DC: IEEE Computer Society, 1-27.
- Dongarra JJ, Du Croz J, Hammarling S, Duff IS: A set of level 3 basic linear algebra subprograms. ACM TOMS. 1990, 16: 1-17. 10.1145/77626.79170.
- Aguilar I, Misztal I, Legarra A, Tsuruta S: Efficient computation of the genomic relationship matrix and other matrices used in single-step evaluation. J Anim Breed Genet. 2011, 128: 422-428. 10.1111/j.1439-0388.2010.00912.x.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.