More animals than markers: a study into the application of the single step T-BLUP model in large-scale multi-trait Australian Angus beef cattle genetic evaluation

Multi-trait single step genetic evaluation is increasingly facing the situation of having more individuals with genotypes than markers within each genotype. This creates a situation where the genomic relationship matrix (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf{G }$$\end{document}G) is not of full rank and its inversion is algebraically impossible. Recently, the SS-T-BLUP method was proposed as a modified version of the single step equations, providing an elegant way to circumvent the inversion of the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf{G }$$\end{document}G and therefore accommodate the situation described. SS-T-BLUP uses the Woodbury matrix identity, thus it requires an add-on matrix, which is usually the covariance matrix of the residual polygenic effet. In this paper, we examine the application of SS-T-BLUP to a large-scale multi-trait Australian Angus beef cattle dataset using the full BREEDPLAN single step genetic evaluation model and compare the results to the application of two different methods of using \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf{G }$$\end{document}G in a single step model. Results clearly show that SS-T-BLUP outperforms other single step formulations in terms of computational speed and avoids approximation of the inverse of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf{G }$$\end{document}G.


Background
Within the last decade, genotyping thousands of individuals with single nucleotide polymorphism (SNP) chips has become common practice in breeding programs of many species of economic relevance. However, due to cost effectiveness these individuals are being genotyped with low-to medium-density SNP chips, with usually not more than 50,000 markers.
To date, genetic evaluation systems accommodate SNP genotypes via the so-called single step model, in which most often markers are used to pre-calculate a relationship matrix, which subsequently augments the usual pedigree derived relationship matrix into a so-called H matrix (SS-H-BLUP) [1]. With the mixed model equations (MME) requiring the inverse of this matrix, and assuming that G is actually algebraically invertible, increasing numbers of genotyped individuals have imposed a large computational burden on genetic evaluation systems. To circumvent this problem an approximation of the inverse of G was proposed, but the effect of this approximation on estimated breeding values (EBV) is dataset-dependent and must therefore be empirically determined for every single application [2]. However, the situation described above of having more genotyped individuals than markers has led to a situation where G is not of full rank and therefore algebraically no longer invertible. An alternative solution is to not use G and move to a model which incorporates the markers directly (SS-SNP-BLUP). While SS-SNP-BLUP is generally equivalent to SS-H-BLUP, and some formulations such as [3] offer huge model flexibility, many of its final implementations suffer from convergence problems with regard to iterative solving [3] or demanding pre-conditioner computation [4]. However, recently an elegant intermediate model has been formulated, which may be seen as a mix of SS-H-BLUP and SS-SNP-BLUP and is called SS-T-BLUP [5,6]. SS-T-BLUP does not need G or its inverse and fits the marker indirectly. As it also fits G indirectly, it is generally algebraically equivalent to SS-H-BLUP. Thus, it provides EBV at the individual level, which can be readily transformed into SNP solutions but avoids the complex co-variance structure of SS-SNP-BLUP [3,5,7].
In this paper, we will examine the computational advantage of SS-T-BLUP for a large-scale multi-trait BREEDPLAN single step genetic evaluation of Australian Angus beef cattle. We will compare the results to those obtained by using an ordinary SS-H-BLUP approach.

Model
In the following, three equivalent representations of the inverse of the H matrix are derived which differ in their computational demand before and while solving the MME. Many of the formulas have been derived elsewhere [1,5,6,[8][9][10][11], but for convenience they are presented below.
The H matrix required for SS-H-BLUP can be written as: where A is the pedigree-based numerator relationship matrix, A 1,1 denotes a diagonal block of A related to the set of m n non-genotyped individuals, A 2,2 denotes a diagonal block of A related to the set of m g genotyped individuals, and A 1,2 and A 2,1 denote off-diagonal blocks of A located between the non-genotyped and genotyped individuals. G w is a genomic relationship matrix of dimension m g × m g which is constructed by G w = γ M D M ′ + C , where M is a centred and scaled matrix of marker genotypes of dimension m g × m m , D is an arbitrary but symmetric and positive definite matrix of dimension m m × m m , C is an arbitrary but symmetric and positive definite matrix of dimension m g × m g , and γ and are arbitrary non-zero weights. Note that in applications where all markers are weighted equally and the co-variance between markers is set to zero, D reduces to an identity matrix if M is centred and scaled. Furthermore, C may be a diagonal matrix of random noise which ensures invertibility of M D M ′ , and and γ are set to 1. Or C = A 2,2 , 0 < < 1 , γ = 1 − , where is interpreted as the proportion of the total additive genetic variance not explained by markers [8]. H −1 can be written as: (1) where A :,: is a respective block of the inverse of A.
Replacing G w with γ M D M ′ + C in Eq. 1 and inverting the resulting matrix yields: where according to the Woodbury matrix identity.

Computational implications when solving iteratively
The differences between the three approaches regarding computational time spent on preparing necessary data and solving the MME iteratively can be reduced to a set of very specific operations unique to the respective representation of the inverse of H . This also applies to the differences in memory requirements.
A widely used method when solving MME iteratively is the conditioned gradient descent method [also known as preconditioned gradient method (PCG)]. It requires the multiplication of a vector with the MME coefficient matrix once per iteration. Therefore, this method is affected by the way the inverse of H is presented. More specifically, during iteration the computational differences between the three approaches can be reduced to the multiplication of a vector of length m g , say z, with a dense matrix, which is ( Differences in peak memory requirement directly result from the size of the arrays, which must be kept in RAM simultaneously during preparation and iteration. Furthermore, for SS-H-BLUP and SS-H-BLUP, the computational task, in which peak memory usage occurs, changes as the number of genotyped individuals exceeds the number of markers.

Data
The SS-H-BLUP, SS-H-BLUP and SS-T-BLUP models were applied to an Australian Angus beef cattle dataset currently used in BREEDPLAN single step genetic evaluation [13]. The dataset comprised 35 traits with 9,565,814 records and 2,621,403 individuals in the pedigree. The number of animals with genotypes was 58,705 which comprised SNP genotypes of various densities and panel manufacturers imputed to a common set of 56,009 SNPs [14]. To increase the computational load, additional 91,295 and 341,295 dummy genotypes (total dataset size of 150k and 400k genotypes, respectively) were generated in a regression-sampling approach (see next paragraph). The 400k dataset was used only for SS-T-BLUP because the other models were computationally infeasible.
Dummy genotypes for 91,295 (341,295) individuals, sampled from the pool of non-genotyped individuals, were generated by M = A * ,2 A −1 2,2 M c , where M is a matrix of dimension 91,295 (341,295) × 56,009 of expected marker counts of the sampled non-genotyped individuals, M c is a matrix of real marker counts of dimension 58,705 × 56,009, which were centred using mean allele counts estimated from the data, and A * ,2 is the off-diagonal block of A between the sampled non-genotyped individuals and the 58,705 genotyped individuals. Outliers in M (< 0 and > 2 ) were truncated to 0 and 2, respectively, where the proportion of outliers was lower than 1%. Subsequently, each expected marker count M i,j was translated into a dummy marker genotype by drawing two samples from a binomial distribution with parameters p = M i,j /2 and q = 1 − M i,j /2 . Note that dummy genotypes that are generated this way may be affected by Mendelian inconsistency, but these were only generated for the purpose of increasing the computational load and are not part of the usual BREEDPLAN analysis.
The BREEDPLAN multi-trait model included pre-corrected phenotypes [15], a single fixed factor per trait, 27 correlated random genetic factors (including direct and maternal), 27 correlated random genetic group factors with 19 genetic groups (including direct and maternal), 3 correlated random maternal permanent environmental factors and 22 correlated random sire-by-herd interaction factors. For traits with repeated observations, repetitions were modelled as correlated traits sharing the same genetic factor. Accounting for the extensive production system and the widespread use of natural mating in large herds using groups of bulls, the pedigree and its derivatives (e.g. A , A −1 ) allowed for more than one pair of parents per animal if necessary [15]. The total number of equations was 76,823,378.
For all three models matrix, D was set to identity, matrix C = A 2,2 , and and γ were set as 0.05 and 0.95, respectively.

Software
The system of equations was solved with AGBU's current large-scale linear mixed model library solver, which uses the PCG algorithm for iteratively solving linear mixed models and integrates Intel(R) MKL(R), version 2017 update 8. For research and commercial purposes, the solver is available on request. Block-diagonal and diagonal pre-conditioners were used for random and fixed factors, respectively.
Denoting the MME as Xb = y, where X is the coefficient matrix, b is the solution vector and y is the right hand side vector, convergence was achieved when the L2 norm of vector (y − Xb) scaled by the L2 norm of vector y was ≤ 2.68E −9 . All computationally relevant integers and all real numbers were represented in 64 bit form. All matrices and vectors required for preparation and solving were stored in random access memory (RAM). Computations for the 150k dataset were carried out on a computer with two sockets each with an Intel(R) Xeon(R) CPU E5-2697 v3 with 2.60 GHz, a total of 28 cores and 528 GB of RAM. Computations for the 400k dataset were carried out on a computer with two sockets each with an Intel(R) Xeon(R) CPU E5-2697 v4 with 2.30 GHz, a total of 36 cores and 256 GB of RAM.

Results
Results for the different parts of the setup and solving steps are in Table 1    Additional approximate random access memory (RAM) requirements in gibabyte due to matrices and operations that are unique to the approaches are in the last row in Table 1 Table 1 shows the computing time and additional RAM requirement for SS-T-BLUP 400 . Note that SS-H-BLUP models using 400k dataset were computationally infeasible.

Discussion
SS-T-BLUP has been proposed as a single step model which can be helpful for datasets for which the number of genotyped individuals exceeds the number of markers and the G matrix is algebraically not invertible. These situations are becoming more common in commercial plant and livestock species where increasing numbers of individuals are genotyped with low-to medium-density SNP chips [6]. The method is enabled by reformulating the H matrix representation such that neither the G or A 2,2 matrices, nor their inverse matrices need to be built or approximated.
In terms of modelling capacity SS-T-BLUP, SS-H-BLUP, and SS-H-BLUP have drawbacks compared to SS-SNP-BLUP. The derivation of matrix −1 is dependent on a matrix C with weight , which is usually matrix A 2,2 or a diagonal matrix of random noise. This applies to matrices H −1 and H −1 as well, because invertibility of G is never guaranteed. In addition, SS-SNP-BLUP can be reformulated such that every single genetic effect in the model can have different γ and and every single marker in the model can have a different genetic co-variance matrix. Such a situation arises when markers have different effects within a trait and different effects in different traits. The former requires D to be non-identity diagonal, the latter a unique matrix D for every single trait. In a multi-trait analysis, the genetic covariance matrix for marker i may then be where is a global genetic co-variance matrix and D i is a diagonal matrix of weights of marker i in the different traits. This expansion is not possible for the models applied here. However, SS-SNP-BLUP usually comes at the cost of much higher model dimensionality and slow convergence rates when solved iteratively [3]. The latter can be dealt with by using a more elaborate pre-conditioner, which is still computationally demanding [4]. To our knowledge, it has not been shown yet that the model flexibility of SS-SNP-BLUP is required for more accurate EBV.
Since all models were equivalent, it was expected that the number of iterations needed for convergence was the same. However, surprisingly there was no difference in the number of iterations for convergence when using only the 58,705 real genotypes (results not shown), 150k genotypes or 400k genotypes. A possible explanation is the way the dummy genotypes were generated. Thus, it is very likely that a dataset with 400k real genotypes may require more iterations but that the time needed for preparation and per iteration will be similar to that observed in this study.
As shown by the results, SS-T-BLUP clearly outperforms SS-H-BLUP in terms of total processing time mainly due to the huge computational cost of setting-up G , A 2,2 and inverting both. In particular, the inversion cost grows cubic with m g , whereas at a constant m m the cost for generating M † grows less than linearly and the cost for K u grows proportional to (m m × (m m + 1))/2 × m g .

Conclusion
These results support the conclusion that SS-T-BLUP provides a feasible algorithm to calculate exact solutions for estimated breeding values when the number of genotyped individuals exceeds the number of markers. A limitation to the number of genotyped individuals is only set by the available RAM. Therefore, SS-T-BLUP allows solving single step equation systems iteratively without generating G or A 2,2 or their inverse matrices or any approximation of these matrices.