A rapid method for computing the inverse of the gametic covariance matrix between relatives for a marked Quantitative Trait Locus

The inverse of the gametic covariance matrix between relatives, G-1, for a marked quantitative trait locus (QTL) is required in best linear unbiased prediction (BLUP) of breeding values if marker data are available on a QTL. A rapid method for computing the inverse of a gametic relationship matrix for a marked QTL without building G itself is presented. The algorithm is particularly useful due to the approach taken in computing inbreeding coefficients by having to compute only few elements of G. Numerical techniques for determining, storing, and computing the required elements of G and the nonzero elements of the inverse are discussed. We show that the subset of G required for computing the inbreeding coefficients and hence the inverse is a tiny proportion of the whole matrix and can be easily stored in computer memory using sparse matrix storage techniques. We also introduce an algorithm to determine the maximum set of nonzero elements that can be found in G-1 and a strategy to efficiently store and access them. Finally, we demonstrate that the inverse can be efficiently built using the present techniques for very large and inbred populations.


INTRODUCTION
The utilization of marker quantitative trait loci associations in genetic evaluation is now possible and likely to be used more extensively in the future. Also, many authors have estimated gain through marker-assisted selection, e.g. [6,7,9,14].
Marker information will not replace phenotypic records because a full prediction of phenotype from DNA sequence is still far from achievable [3]. Joint utilization of marker and phenotype information in current genetic prediction models is, however, progressing at a rapid pace. Fernando and Grossman [2] explained how genetic markers associated with quantitative trait loci (QTLs) could be incorporated into mixed models. Marked QTL alleles were considered random in the context of the mixed model terminology, and algorithms to construct and invert the covariance matrix pertaining to QTL additive effects were developed.
Based on previous developments by Fernando and Grossman [2] and van Arendonk et al. [12], and using the partitioned matrix theory, Wang et al. [13] described an exact recursive method to obtain the inverse of the covariance matrix of the additive effects of a marked QTL in the case of complete marker data. If inbreeding is considered, certain elements of G are required, however, Wang et al. [13] did not specify how these elements could be computed separately.
The objective of the present paper is to develop a rapid method to obtain the inverse of the covariance matrix of the additive effects of a marked QTL in the case of complete marker data using a small subset of G. In addition to the partitioned matrix theory, we show that the inverse can also be obtained by factorizing the covariance matrix into LDL where L is a lower triangular matrix whose inverse can be directly computed from pedigree and marker data. Matrix D is shown to be proportional to the covariance matrix of Mendelian sampling at the QTL for given observed marker genotypes. We will show that D is block diagonal and can be computed from a small subset of G. The method is inspired by the rapid method of Henderson [4] to obtain the inverse of the numerator relationship matrix. In this work we will give special attention to computing efficiency. Numerical techniques to efficiently compute and store a subset of the covariance matrix and the nonzero elements of the inverse are discussed.

TABULAR METHODS FOR THE COVARIANCE MATRIX AND THE INVERSE
The covariance of marked QTL (MQTL) effects for given complete marker data was discussed by Fernando and Grossman [2], van Arendonk et al. [12], and Wang et al. [13]. The covariance can be divided into two parts: between individuals and within individuals. By definition, the genetic covariance between two alleles is the probability that they are identical by descent, multiplied by the additive genetic variance, σ 2 v . In animals, each locus consists of two alleles, hence for given known marker genotypes, four covariance values can be computed between each two individuals as described in definition (1). Also, within every individual, four covariance values can be computed as described in definition (2). Denote the two MQTL alleles of individual i by α 1 i and α 2 i ; in addition, denote the additive effects of the two MQTL alleles of individual i . Also, let P(α i ≡ α j |M) denote the probability that any two alleles, say α i and α j , are identical by descent given M with M defined as the event of observing marker genotypes, then In definition (2) the probability of identity by descent between an allele and itself, for M equals 1, and f i the probability that the two MQTL alleles of individual i are identical by descent for M; f will be referred to as the inbreeding coefficient.
If animals are ordered such that parents precede their progeny and are identified by integers from 1 to n, then a number of n 2 covariance matrices of order 2, described in (1) and (2), can be put together in a matrix of order 2n that is referred to as the conditional gametic relationship matrix for given marker data [13]. Denote the element located in row r and column c of any matrix A by A(r, c), and denote the entire rth row of A by A(r, ) and the entire cth , that we will refer to as C ij and C ii , can be written as and For example, the (1,1) element of C ij is the element of G located in the (2i−1)th row and the (2j − 1)th column. Because G is symmetric and C ij is not a scalar, C ji = (C ij ) . Moreover, all diagonal elements of G are equal to 1.

Tabular method for G
It can be shown that C ij can be computed from previous rows of G [13]. For i > j, Where s and d denote paternal and maternal parents, respectively, of individual i, Q i is a 2 × 4 matrix defined as The matrix Q i contains the probabilities that the paternal and maternal alleles of individual i descended from any of the four alleles of its two parents for given observed marker genotypes. Due to the marker-QTL association, the probability that an individual received the QTL allele that was in coupling phase in the parent with the marker allele it received from that parent is 1 − r where r is the recombination rate between the marker locus and the QTL. Based on this simple genetic fact, Wang et al. [13] computed Q i for the ith individual. Matrix Q i is required for each individual in the pedigree and hence its computing cost needs to be minimized. We present a general algorithm to efficiently compute Q i in Appendix A.
Because the relationship C ij , between the two individuals i and j, at the QTL, can be computed from already built relationships, i.e., C sj and C dj as shown in (5), there exists a recursive method to build new relationships from previous elements of G. The following formulation, as suggested by Wang et al. [13], adds the two rows corresponding to the ith individual to the lower triangle of G, and using the symmetry of G, the corresponding upper triangle elements are constructed: where A i is a 2 × 2(i − 1) matrix constructed by setting A(, 2s − 1) equal to Q(, 1), A(, 2s) equal to Q(, 2), A(, 2d − 1) equal to Q(, 3), and A(, 2d) equal to Q(, 4), the rest of A is set equal to 0. The matrix Q is defined in (6), and C ii is defined in (2). The inbreeding coefficient, f i , is the only element required to construct C ii and can be computed as described in Wang et al. [13]. It is important for future use to know that f i is a function of Q i , C ss , C dd , and C sd . Given observed marker genotypes and the recombination rate of 0.1, the conditional gametic relationship matrix for the pedigree listed in Table I is shown in Figure 1.

Decomposing G
In this section we decompose G following arguments similar to those Henderson [4] used in decomposing the numerator relationship matrix (NRM). The matrix G can be decomposed and written as G = LDL (8) where L is a lower triangular matrix and D is a block diagonal matrix. Matrix L can be recursively computed using relationship (9) that adds the two rows corresponding to individual i, i.e., rows 2i − 1 and 2i to L, where I 2 is an identity matrix of order 2; A i is defined in (7). The relationship (9) indicates that the smallest unit of L that can be built is a matrix of order 2, not a scalar as in the decomposed NRM. To illustrate this and subsequent computations, we use the pedigree of Table I. The matrix L is shown in Figure 2.
To illustrate the procedure of building L, denote the 2 × 2 matrix on the intersection of the two individuals i and j by R(i, j). Now given that 5 and 6 are parents of 7, we compute R(7, 2) as The variance and covariance of Mendelian sampling for an individual with two alleles at the marked QTL for given observed marker genotypes is described by a matrix of order 2, say d i , and not by a scalar as in the case of the infinitesimal model. Define D as diag[d 1 , d 2 , . . . , d n ], where the 2 × 2d i can be computed separately for each individual. It will be shown that σ 2 v D is the covariance matrix of Mendelian sampling due to a QTL linked to one marker for given observed marker genotypes. To find the conditional Mendelian sampling covariance for individual i, denote its Mendelian sampling by m i , then The rationale behind (10) is that Mendelian sampling in the case of a marked QTL could be computed just as in the case of the infinitesimal model, by subtracting the expected breeding value from the realized breeding value. It can now be proved that See Appendix B for a proof of (11). Further, for a proof that D is block diagonal, see Appendix C. From (11), d i can be computed as To illustrate (12), we compute d for individual 7, Now it is straightforward to verify the decomposition of G by the direct multiplication LDL .

Computing the inverse of G
The inverse of G is now computed by making use of the decomposition presented earlier. From the decomposition of G in (8), G −1 can be written as L −1 is easy to compute due to the recursive method used to construct L. The inverse of L can be computed according to the following recursive relationship as can be verified by showing that Using the recursive relationship (14), L −1 for the pedigree of Table I is As mentioned before, D is composed of the d i matrices along its diagonal, and d i is proportional to the conditional variance and covariance of Mendelian sampling within individual i. Therefore, d i is positive definite and can be written as where t i is a matrix of order 2 with 0 as its upper diagonal element. If d i is symbolically written as where p, k, q, and c are scalars. Notice that p and q are the conditional Mendelian sampling variances associated with α 1 i and α 2 i , respectively, and k is their conditional Mendelian sampling covariance. The results in (16) can be easily seen by inverting t i , obtained after the decomposition of d i in (15). Relationship (16) shows an easy way to compute t −1 i directly from elements of d i without having to decompose d i and then invert t i .
After every d i is decomposed as described in (15), D can be written as TT where T is lower triangular defined as diag[t 1 , t 2 , . . . , t n ]. The matrix D −1 can then be written as (T ) −1 T −1 and Since the inverse has the form (17), the contribution of each individual to G −1 is now easy to compute using the recursive method for constructing L −1 and the efficient way of expression (16) that can be used to directly obtain t −1 i from the elements of d i . From (17), the contribution of the ith individual to the inverse can be written as the cross product of t −1 i (−A i I 2 ) , where the cross product of any matrix, say B, is B B. Since the nonzero elements of A i are the elements of the matrix Q i , the cross product of is added to the following locations of G −1 , where i, s, d, and R(i, j) are consistent with their previous definitions, with R(i, s) for example, as the matrix of order 2 at the intersection of the individual and its paternal parent.

Algorithm
Next, we suggest an algorithm to compute and add the contributions of the ith individual to G −1 .
• For x = 1 to 6 For y = 1 to 6 The algorithm does not explicitly invert or decompose D, it only computes the elements of t −1 i according to (16) after computing d i for each individual. Furthermore, instead of carrying out the matrix product of (18), the algorithm directly assigns the multiplication results to the 2 × 6 matrix ∆.
To illustrate the computation of ∆, we compute ∆ 7 . From d 7 of Table I The matrix G −1 for the example follows: ( 0 (

One unknown parent
If one of the two parents of i is unknown, Wang et al. [13] have suggested replacing the two columns of Q i that belong to the unknown parent with zeros, i.e., considering the probability of QTL descent from the unknown parent as undefined. This approach, however, creates unusual singularities in some cases while inverting the gametic relationship matrix. For example, if the dam is unknown and both the sire and the individual have marker genotype A 1 A 2 , the contributions to the inverse due to i cannot be computed either by the current algorithm or by the Wang et al. [13] algorithm.
Phantom identification numbers could be assigned to the unknown parents and the problem becomes a pedigree with incomplete marker data. For incomplete marker data, alternative exact and approximate approaches are available (Wang et al., 1995). The current techniques are still useful for the case of one unidentified parent and the case of incomplete marker data in general. For instance, if d is a phantom parent of i, the most probable genotype of d for given s and i genotypes could simply be assigned to d, and approximate G or G −1 values could be built as described earlier.

COMPUTATIONAL TECHNIQUES FOR CONSTRUCTING THE INVERSE
In most animal breeding applications, large data are commonplace. In this case, handling matrices like G and its inverse within computer memory is most unlikely. Also, having to build these matrices on disk degrades performance due to the repeated search that has to take place for certain elements of G and G −1 . In this section we explain a scheme to compute the minimum possible set of G that contains elements required for computing the inverse. A sparse matrix technique to store this set is also presented. In addition, due to the sparse structure of the inverse, we suggest a method that can be used to determine the maximum possible set of the nonzero elements found in the inverse and corresponding sparse matrix techniques to efficiently store and retrieve them.

Computing a subset of G
Building the inverse as described earlier requires the 2 × 2 blocks C ii and C sd of G to be available if inbreeding is to be accounted for. As was shown by Tier [11], the diagonal of the NRM can be computed from a small subset of the matrix. Although the diagonal of G is known to consist of 1s, and hence need not be computed, we will use a similar approach to compute the C ii submatrices located on the diagonal of G. Besides, in our case, extra elements are needed for the inverse, i.e., the relationship of the two parents, but this does not increase the computational task because C sd is needed for computing C ii .
For the example pedigree, the set of filled cells in Table II contains the required elements. We express the subset in terms of the C ij and C ii submatrices instead of single elements. The reason for this is its computational advantage. The subset is first determined and then computed according to equation (5) and rules explained in Wang et al. [13]. First, to determine the subset, read the pedigree and flag cells C ii and C sd if s > d or C ds if d > s, i.e., the relationship between the two parents located in the lower triangle. The cells required for computing the previously flagged cells are determined as follows: starting from the second to the last row of cells and proceeding up and to the left, flag the two cells corresponding to C sj and C dj as described in equation (5). Second, after determining all the required cells, compute them row by row starting with row 1.
1 An asterisk indicates a cell required in the upper triangle that is taken from the lower triangle.
Constructing only the required subset of G found in its lower triangle saves computational time and storage requirements. For every row of cells, Q i is built only once and used for all cells in the row. An asterisk "*" indicates a required upper triangular cell that is obtained from the lower triangle. For example, in Table II, C 43 was flagged instead of C 34 . In the computing step however, (C 43 ) is used whenever C 34 is required.
For this method to be useful, it is necessary to employ a sparse storage scheme that allows efficient storage and retrieval of elements of the subset. A row-linked list approach is suggested in this case for two reasons: cells in a row are not determined and flagged in any particular order, and the number of filled cells in a row is not known a priori. Henceforth, a row-linked list will refer to the sequence of filled lower triangular elements in a row as stored in the linked lists.
To explain the storage scheme, specify the number of filled cells by n f and the number of individuals by n. Define the following arrays: an integer array of length n f , column, containing cell column indices; an integer array of length n f , link, containing pointers to the location of the next cell added to a list; a double n f × 4 array, values, a row of which contains the four values of an off-diagonal cell; and a double array of length n, f , containing inbreeding coefficients. Row Table III. Linked lists of the subset of the gametic relationship matrix required for building the inverse. indices are assumed to be sorted in ascending order corresponding to the first n entries of column, that is, for i = 1 to n, the column index of the first entry to row list i is column(i). A value of 0 in column(i) indicates no entries have yet been added to the ith list. A value of 0 in link(j) indicates a terminal link, that is, the last entry in a list. Linked lists for the example are given in Table III. To add an entry, C ij , to the lists, start at column(i) where i is the individual to which the entry belongs. If column(i) = 0, place the column index of the entry, j, in column(i) and the four values of C ij in values(i, ), otherwise proceed via links to search whether the array column already contains j. If not, store j and C ij in the next available entry of column and values, respectively. To add an entry, C ii , to the lists, only place the value of f i in the ith element of vector f , i.e., in f (i).
To retrieve the C ij entry from the linked lists, search for the entry starting at column(i) and proceed via links until the desired column index is found, i.e., j. The entries in a row list do not have to be sorted in any order because the search method we described does not require any ordering. It is likely that a better searching technique will require sorting the lists. In this case, the improved searching technique is useful only if the time saved is greater than the sorting time. Notice that in linked lists new elements are usually added to the lists by inserting them in order. This practice, when tested, consumed more time than just adding new elements to the next available entry as described earlier.

Sparse storage scheme for G −1
For large numbers of animals, neither G nor the inverse can be handled in memory. We introduce a sparse storage scheme that allows construction of the inverse within memory. The scheme first determines a maximum set of the nonzero elements found in the lower triangle of the inverse and then computes them. The scheme is similar to that described earlier in storing and retrieving the required subset of G, except that three of the four elements of the R(i, i) submatrices on the diagonal of G −1 must be stored instead of only one element, f i , of C ii . Only three elements are stored because of the symmetry of G −1 . Therefore the one-dimensional array, f , is replaced by an n × 3 array, say diagv.
Using the matrix of (19), the maximum set can be determined while reading the pedigree by adding to the lists the following entries corresponding to each individual. For the ith individual, add either R(d, s) or R(s, d) to the lists, depending on which of them is in the lower triangle; also add R(i, s), and R(i, d). Entries for R(i, i), R(s, s), and R(d, d) are automatically stored in the lists, in diagv. The sparse scheme sets an upper bound for the set of nonzero elements of the inverse that does not exceed 15n, and its proportion out of 4n 2 , the order of G, does not exceed 15/4n. This can be seen in Figure 3. The dark connectors in the figure indicate the maximum number of filled elements that individual i could ever cause. Notice that the proportion 15/4n indicates that the percentage of filled elements dramatically decreases as n increases. The simulated data summarized in Table IV clearly show that as the number of individuals in the pedigree increases, the percentage of filled cells in the inverse substantially decreases.    After the maximum set has been determined, the algorithm described earlier can be used to compute and add contributions of the ith individual to the inverse. Because the elements of (19) have to be retrieved and added to, perhaps several times, the values of the maximum set must be first set to 0. The same search method used with G is used here to retrieve the elements of the inverse. Searching via links is only required if it is for R(i, j) where i > j. If i = j, then R(i, i) can be directly retrieved from diagv (i, ).
Now it should be clear that storing the elements of the matrices in groups of 4, i.e., R(i, j), saves a great deal of computing time although it could contain zero elements. Notice that a single element of C ij or R(i, j) of the inverse is never required and hence searched for unless the other three elements are required as well. Also, no search is required if i = j. Although storing only the nonzero elements of the 2 × 2 blocks is more memory-efficient, it showed very poor performance in terms of speed when tested. More details of programming strategies can be inferred from the C code listed in Appendix C.

SIMULATION AND VALIDATION
In this section we use simulated pedigree and genotype data to investigate the efficiency of the algorithms. A modified nucleus scheme where sires are selected in two stages was simulated. The objective was to simulate a structural pedigree similar to what could be encountered in the U.S. Holstein population. Breeding values were simulated according to a finite locus model. A situation in which one QTL is associated with a known marker was simulated.
Data sets with variable sizes were simulated. Table IV shows that for larger data sets both the required subset of G and the number of nonzero elements of the inverse constitute a tiny proportion of 4n 2 . Results of three pedigree data simulated over 15, 30, and 40 years are listed in Table IV. The first pedigree comprising 18 801 animals started with 6 active sires and 14 young bulls with a maximum of 50 daughters per young bull. We used a base cow population of 2 000 cows with a maximum of 5 lactation seasons and with culling ratios of 0.22, 0.26, 0.29, 0.34, and 1 for parities 1, 2, 3, 4, and 5, respectively. The second and third pedigrees were simulated similarly, except that the simulation was continued for 30 and 40 years, resulting in a generation of 137 680 and 485 462 animals, respectively. The percentages presented in the table are the number of physically stored single elements and not the number of the R(i, j) matrices. However, this number does not include the overhead caused by storing the links and column indices. The CPU seconds presented in the table indicate that by using the current algorithms, building the inverse of the conditional gametic relationship matrix for a marked QTL is as trivial as building the inverse of the NRM.

DISCUSSION
An algorithm to directly build the inverse of a conditional gametic relationship matrix, from given marker data, was developed. The inverse algorithm is based on matrix decomposition instead of partitioned matrix theory. Numerical techniques that greatly improved computing performance were introduced. Extension to multiple markers should be straightforward provided that MQTL loci are independent. With multiple markers, efficiency should improve relatively because column indices and link pointers are the same for all markers and could be determined and stored only once.
It is imperative to mention that although both matrix decomposition and partitioned matrix theory produce the same elements that an individual contributes to the inverse, matrix decomposition offers a more computationally useful structure to the mixed model applications. First, D in our study could be used in a way similar to the Henderson D in the context of the reduced animal model [1,5,10] to absorb the non-parental equations of the MQTL and polygenes. Moreover, careful inspection of the mechanics of building D could lead to more useful reductions pertinent to the inclusion of markers in the mixed model. Reducing the number of equations is crucial if marker data are to be practically used in genetic evaluation models.
Furthermore, the decomposition allows for more flexibility in handling the mixed model equations, for instance, the Henderson L was used by Quass [8] in transforming the equations in a way that could be useful for variance components estimation methods. The decomposition we introduced allows for the same technique to be adapted to handle a mixed model with markers. This is only to name some examples, but strictly speaking, wherever the factors of the decomposed numerator relationship matrix or the Mendelian sampling variance are useful, the decomposed conditional gametic relationship matrix and the conditional Mendelian sampling covariance, introduced in this study, could be exploited similarly. The algorithm should motivate further research to build on past experience for the developing area of marker-assisted selection. [c2--2]=(1-r)/c;} } swap(&o1, &c1); swap(&o2, &c2); c = x2; /* for j = 2, c will act in place of x2 */ } l--; } return Q; }

APPENDIX B Computing Mendelian sampling conditional covariance
From relationship (10) in the text, Mendelian sampling of the ith individual is written as From (5) in the text, C is is computed as and C id is similarly computed as By substituting (B.2) and (B.3) for (B.1), we obtain

Proof that D is block diagonal
From (5) in the text, we have and from (10), we have where s i and d i are the sire and dam of individual i, respectively. To prove that D is block diagonal, it is sufficient to show that the covariance between m i and m j is null. From (C.2), But from equation (C.1), we have σ 2 v C ij = Q i C s i j C d i j , which leads us to conclude that the second term in (C.3), Cov(m i , v j |M), must be zero. Similarly, Cov(m i , v s j |M) and Cov(m i , v d j |M) could be shown to be null. Finally, given that Cov(m i , v j |M), Cov(m i , v s j |M), and Cov(m i , v d j |M) are all null, Cov(v i , v j |M) must be null.
To access this journal on line: www.edpsciences.org