Inversion of a part of the numerator relationship matrix using pedigree information

Background In recent theoretical developments, the information available (e.g. genotypes) divides the original population into two groups: animals with this information (selected animals) and animals without this information (excluded animals). These developments require inversion of the part of the pedigree-based numerator relationship matrix that describes the genetic covariance between selected animals (A22). Our main objective was to propose and evaluate methodology that takes advantage of any potential sparsity in the inverse of A22 in order to reduce the computing time required for its inversion. This potential sparsity is brought out by searching the pedigree for dependencies between the selected animals. Jointly, we expected distant ancestors to provide relationship ties that increase the density of matrix A22 but that their effect on A22-1 might be minor. This hypothesis was also tested. Methods The inverse of A22 can be computed from the inverse of the triangular factor (T-1) obtained by Cholesky root-free decomposition of A22. We propose an algorithm that sets up the sparsity pattern of T-1 using pedigree information. This algorithm provides positions of the elements of T-1 worth to be computed (i.e. different from zero). A recursive computation of A22-1 is then achieved with or without information on the sparsity pattern and time required for each computation was recorded. For three numbers of selected animals (4000; 8000 and 12 000), A22 was computed using different pedigree extractions and the closeness of the resulting A22-1 to the inverse computed using the fully extracted pedigree was measured by an appropriate norm. Results The use of prior information on the sparsity of T-1 decreased the computing time for inversion by a factor of 1.73 on average. Computational issues and practical uses of the different algorithms were discussed. Cases involving more than 12 000 selected animals were considered. Inclusion of 10 generations was determined to be sufficient when computing A22. Conclusions Depending on the size and structure of the selected sub-population, gains in time to compute A22-1 are possible and these gains may increase as the number of selected animals increases. Given the sequential nature of most computational steps, the proposed algorithm can benefit from optimization and may be convenient for genomic evaluations.

are possible and these gains may increase as the number of selected animals increases. Given the sequential nature of most computational steps, the proposed algorithm can benefit from optimization and may be convenient for genomic evaluations.

Background
For a population of n animals, the numerator relationship matrix (A), is an n-by-n matrix with the following properties: (1) a ij is the numerator relationship coefficient between two animals i and j among n, as defined by Wright [1]; (2) diagonal element a ij is equal to 1 + F i , where F i is the inbreeding coefficient [1] of animal i; (3) A is non-singular and symmetric: for two animals i and j among n, a ij = a ji .
Because the numerator relationship matrix describes the additive similarity between animals, it is an important element explaining genetic (co)variances between animals and has numerous applications in the field of animal genetics, the most important one being its use in setting up the mixed model equations for estimation of breeding values [2].
In some situations, a particular type of information (genomic information, foreign genetic evaluation, phenotypes on a particular trait, etc.) is only available for some animals, which are selected for this particular purpose, while other animals are excluded. The original population can therefore be split into two sub-populations: (1) a sub-population composed of animals called "excluded" hereafter; (2) a sub-population composed of animals called "selected" hereafter.
Splitting the original population in this way leads to the following partition of A: The four blocks include the relationships between excluded animals (A 11 ), between excluded and selected animals (A 12 and A 21 ) and between selected animals (A 22 ).
Recent methodological developments in animal breeding require inversion of A 22 , for example for genotyped animals in the context of genomic prediction using a single-step procedure [3][4][5]. Another example concerns external animals when integrating foreign information into a local genetic evaluation [6]. It is also noteworthy that the pedigree-based relationship matrix A 22 and the genomic relationship matrix (G, [7]) show structural similarities: both matrices express polygenic/genomic similarities among animals inherited from ancestors that are not represented in these matrices. Thus, the present research on A 22 can be extended to genomic relationships in G.
Based on the original work by Henderson [8] on inversion of A, a general framework for the inversion of relationship matrices follows (see Appendix 1). Henderson outlined a method that is based on the root-free factorization of A and showed the high sparsity of the inverse triangular factor of A. An efficient use of this sparsity then allows direct computation of A -1 as a sum of individual contributions based on a chronological reading of the pedigree. Applying partitioned matrix theory, van Arendonk et al. [9] gave a general expression for the sum of individual contributions outlined by Henderson [8]: an additional row/column in A leads to updating its inverse by increasing the order by 1 and by summing the square of a very sparse vector to A -1 . The very sparse vector is the corresponding row (below the diagonal) of the inverse triangular factor. All details on these developments are given in Appendix 1.
When required, the inverse of A 22 is currently obtained by brutal inversion algorithms (e.g. generalized inverse algorithm). In these algorithms, any potential sparsity occurring in the matrix to invert or in its inverse is brought out by matrix computations. In contrast, the main objectives of this paper were to investigate how potential sparsity in the inverse triangular factor of A 22 can be characterized using only the pedigree, thus without requiring matrix computations, and then use the sparsity pattern of the inverse triangular factor of A 22 in the computation of its inverse. Whereas the structure of the inverse triangular factor is known for A (positions are given by the pedigree; values are a priori known), no information is available on the structure of the inverse triangular factor of A 22 , neither on the positions of non-zero elements nor on the values of these elements. Moreover, the inverse triangular factor of A 22 may be close to dense. Therefore, we addressed our objective in the following five steps: (1) inversion of A 22 with an algorithm that uses the inverse triangular factor; (2) development of an algorithm that uses pedigree information to find the positions of the non-zero elements (sparsity pattern) in the inverse triangular factor of A 22 ; (3) inversion of A 22 with the algorithm of step (1) but restricting computations to the non-zero elements identified by the algorithm in step (2); (4) assessment of the time reduction when computing the inverse as in step (3) instead of as in step (1); (5) and evaluation of the effect of the number of generations in the pedigree used to compute A 22 , in order to reduce density of the inverse triangular factor.

Blockwise inversion of A 22
For simplicity, we assume that we are working on the last selected animal, indexed as animal n. Similarly to inversion of A (see equation 1.6 in Appendix 1), assume that A 22 is partitioned in a sub-matrix Z, of order (n-1), a (n-1)-long vector y, and a scalar m as: Using blockwise inversion, A −1 22 can be recursively computed using the following equation: where s is a scalar equal to m − y′Z − 1 y. Computing b = Z − 1 y and defining α = s − 1 simplifies equation (2) as follows: Similarly, as for A (see Appendix 1), there is a link between vector b and the root-free Cholesky factorization of A 22 (A 22 = TDT′), in that -b′ corresponds to the last row of the inverse triangular factor of A 22 (T -1 ).
Equation (3) shows that A −1 22 can be constructed recursively by adding a vector product to the previous result (Z -1 ). This recursive construction of A −1 22 will be called "Algorithm A" and implies, from the second row to the last row, the computation of the whole vector b.
If an animal and its parents are all selected, vector b is as sparse as in the case of A, i.e. the only non-zero elements of b correspond to parents. Restricting computations to these elements, i.e. discarding computations involving elements that we know equal 0, results in saving computing time. Such a case is, however, highly trivial. In the next sections, we propose a method to deal with more complex cases.
Contribution of selected animals to relationships in A 22 : characterizing the sparsity pattern of T -1 For animal n, vector b is the row of T -1 that spans from column 1 to column (n-1). By definition (b = Z -1 y thus Zb = y), vector b contains the required coefficients to compute relationships (y) of animal n with the n-1 preceding animals from the relationships between those n-1 preceding animals (Z). In the case of A, only known parents of animal n are required to compute its relationships with the preceding animals. Therefore, only positions of known parents have a value different from zero in vector b. In the case of A 22 , some selected animals replace the parents if they are excluded: the value in b of these selected animals is different from 0, which means that they are needed to compute relationships between selected animals (y) from the relationships between all selected animals (Z). This can be illustrated by the example pedigree in Figure 1 and Tables 1 and 2, which specifyA 22 and T -1 for the example pedigree. Three cases are outlined and detailed in the following: (i) animal G has two known parents, E and F. Animal E is excluded; its parent C (grandparent of G) is thus required (T − 1 GC ≠ 0) to explain the relationship between C and G (A 22;CG = 0.25).
(ii) animal K has one known parent, F, that is also selected. Any relationship that K shares with other selected animals is necessarily and only explained by F (∀X≠F; T −1 KX ¼ 0). (iii) animal L has one known parent, E, that is excluded. Its selected halfsib (G) and the selected parent of G (F, which is unrelated to L) are required, among others, to explain any relationship that L shares with other selected animals.
Animals that are required to explain relationships of a given selected animal with other selected animals will hereafter be denoted as the contributors of this selected animal. Contributors of a selected animal can be found by an exhaustive search of selected animals that replace any excluded parent of the selected animal. Their determination uses the pedigree and returns which elements of b (and thereby of T -1 ) are worth computing because they are expected to be non-zero. By subtraction, we obtain which elements are zeros, which is referred to as the "sparsity pattern" of T -1 in the following. In the next sub-section, we propose a heuristic algorithm that streamlines the determination of the sparsity pattern of T -1 . Similar methodologies [10,11] have been developed for the triangular factor of a symmetric-positive definite matrix rather than the inverse of the triangular factor.

An algorithm to set up the sparsity pattern
Our proposed heuristic algorithm to set up the sparsity pattern of the inverse triangular factor of A 22 (see pseudocode below) requires two inputs: the pedigree (of length n 0 , renumbered and ordered: parents precede progeny) and the subpopulation to which any animal belongs: excluded (population status is 1) or selected (population status is 2). The purpose of the algorithm is to complete two vectors of variable length for any animal i. The first vector (r (i) ) contains references to excluded parents of animal i. The second vector (c (i) ) contains selected contributors of  animal i. The positions of non-zeros in the i-th row in T -1 (sparsity pattern) includes any position of the i-th row that is listed in c (i) . Initialize a vector x as the integer sequence from 1 to n 0 .
For any animal i in the whole population (i goes from 1 to n 0 ), (0) initialize two vectors c (i) and r (i) as empty vectors (1) if the status of animal i is 2, then append element i to c (i) ; or else if the status of animal i is 1, append element i to r (i) (2) if the sire s of animal i is known and its status is 2, then append element s to c (i) ; or else if s is known but its status is 1, append vector r (s) to r (i) (3) if the dam d of animal i is known and its status is 2, then append element d to c (i) ; or else if d is known but its status is 1, append vector r (d) to r (i) (4) if the status of animal i is 2 and the vector r (i) is not empty, then: a. Select all elements of x that are at positions given in r (i) , remove duplicates and gather them in a temporary list t b. for any element k in list t, i. Append to c (i) the elements of vector c (k) not yet in c (i) ii. Select elements of x that are equal to k and replace them by i; or else if the status of animal i is 1 or if the vector r (i) is empty, do nothing.
If the whole population was selected (i.e. A 22 = A, every animal has status 2), it can be easily deduced from the algorithm that only the animal itself (in step (1)) and its known sire and dam (in steps (2) and (3)) would enter vector c (i) . The corresponding T -1 would be highly sparse, as it is for A. This also means that if numerous parents are selected, then this algorithm is expected to run very fast.
An example of the use of this algorithm is given in the Results section.

Use of the sparsity pattern in blockwise inversion of A 22
The algorithm for blockwise inversion of A 22 (Algorithm A, summarized in equation (3)) is modified to account for sparsity and will be called Algorithm B. For simplicity, we still consider the last selected animal (animal n). Algorithm B reduces computations to obtain b from y = Zb (equations 2 and 3) by three procedures, depending on the number (k) of elements in the corresponding vector c (n) and the length of b (n-1).
The first procedure (called EMPTY) is used when k = 0 (c (n) is empty). If so, only α is added to element A −1 22;nn . The value of α is just the inverse of A 22,nn .
The second procedure (called PROD, for matrix PRODuct) is used when k is smaller than but relatively close to (n-1). In such a case, we perform a line-wise partition (equation (4)) of b and Z -1 between non-zeros (of subscript u) and null (subscript v) entries of b in order to avoid useless computations: Since (n-1) is the number of elements in b and k the number of elements in b u , k dot products (of (n-1)-long vectors) would be performed instead of (n-1) dot products (of (n-1)-long vectors).
The third procedure (called LS, for Linear System of lower size) is used when k is much smaller than (n-1). In such a case, we extend the previous partition of b to a blockwise partition of Z and y (the non-zero and zero elements of b are respectively indexed by u and v): Then, applying partitioned matrix theory to equation (5) returns the following expressions for b u and b v (with and, since b v is a vector of zeros, it comes that computing b u shrinks to compute only Z −1 uu y u . In other words, the linear system Zb = y is replaced by a linear system of lower size Z uu b u = y u , and solving it is valuable only if the number of operations required to solve it is lower than the number of operations to achieve the product in procedure PROD. We chose the less expensive procedure (PROD or LS) by estimation of the number of expected floating-point multiplications. Table 2 Inverse of the triangular factor (T -1 ) of A 22 for the example of Figure 1 C Empty cells are 0.

Experimental design for tests on real populations
In order to evaluate Algorithm B in comparison with regular inversion (Algorithm A), different A 22 were computed on the basis of a real pedigree provided by the Luxembourg breeders society CONVIS. This pedigree includes dairy cows from Luxembourg with their ancestors tracing back up to 24 generations and contains 387 499 animals. Statistics of the pedigree data are Table 3.
Selected sub-populations of three sizes (4000, 8000 and 12 000 animals) were designed and are identified hereafter as the three size scenarios S4k, S8k and S12k. Animals of the selected sub-populations were randomly chosen from a pool of animals born after 1999 (128 465 animals) on the assumption that only recent animals could be of interest (those being genotyped or in production).
Because a pedigree with a lower number of extracted generations is expected to provide a sparser T -1 , the impact of the number of extracted generations was also evaluated for each size scenario. This enabled us to assess how many extracted generations were required in the pedigree to compute a A −1 22 that is a sufficient approximation to the A −1 22 computed using all available ancestors in the pedigree, which will be referred to as the "real inverse". Extracting no animals other than selected animals refers to "generation 0": the population is only made of selected animals. When extracting one generation of ancestors ("generation 1"), excluded parents enter the population. When extracting two generations of ancestors ("generation 2"), excluded grandparents also enter the population, and so on. Details on the number of animals extracted and the percentage of extraction after each generation, considered as the ratio between the number of animals in the population and the maximum number of animals available in the pedigree, are outlined in Figure 2.
Deviations from the real inverse were measured by the following norm: Matrix A 22 was computed in two steps. Inbreeding coefficients were first computed for each size scenario and number of extracted generations. The average inbreeding coefficient was never greater than 1.23 % and the greatest inbreeding coefficient was 44.53%. Matrix A 22 was then computed using the method of Colleau [12].

Two test software programs
In order to evaluate potential gains in time when using Algorithm B instead of Algorithm A to invert A 22 , we developed two test programs in Fortran 95. The programs were neither optimized for speed, nor parallelized. Therefore, all comparisons have to be interpreted as relative figures.
The first program applies the recursive construction of the inverse, as outlined in Algorithm A (equations (2) and (3)). Potential null entries in y are checked to avoid useless computations when performing product Z -1 y.
The second program restricts the same recursive construction of the inverse to non-zero elements by procedures EMPTY, PROD and LS. Potential null entries in y are also taken into account when performing the product Z u y (procedure PROD). The linear system Z uu b u = y u (procedure LS) is solved by factorization and by backward and forward substitutions.
For both programs, computing time was recorded using Fortran intrinsic subroutine CPU_TIME. For the program that uses Algorithm B, computing time includes the time required to determine the sparsity pattern. All computations and file storage were performed using double precision (15 digits). Each job was repeated 20 times on an Intel® Xeon® 64-bit processor (RAM: 8 Gb, cache size: 6 Mb, clock speed: 3 GHz).
No list to set up because r (11) is empty; x = x Animal 12. Status 2 and one known parent (status 1).

Effect of accounting for sparsity on CPU time for inversion of A 22
Algorithms A and B were both applied to the matrices created by different pedigree extractions of the three size scenarios. The elapsed CPU time results (averaged over 20 repetitions) are shown in Figure 3. Taking sparsity into account (Algorithm B) instead of using an inversion algorithm with cubic complexity (Algorithm A) reduced the elapsed CPU time for computing the inverse. For instance, the relative gains in computing speed of Algorithm B for the fully extracted pedigree were 1.67 faster for S4k, 1.75 faster for S8k, and 1.77 faster for S12k.

22
For each size scenario, A 22 was computed using different numbers of extracted generations and the inverses were compared (Figure 4) to A −1 22 computed using the fully extracted pedigree (after 23, 24 and 24 generations respectively for scenarios S4k, S8k and S12k) by computing the norm N. As shown in Figure 4, regardless of the size of the matrix, the norm stabilized after 14 generations to values less than 1E-13, which can be attributed to errors due to precision.

Discussion
Computation time required by the algorithm to characterize the sparsity pattern Figure 5 shows the elapsed CPU time (averaged over 20 repetitions) when running the proposed algorithm to determine the sparsity pattern of T -1 on populations with different numbers of selected animals (4000; 8000; 12 000) and that were extracted from several generations. The curves of the three size scenarios (S4k, S8k and S12k) presented a similar behavior. When the population consists only of selected animals (generation 0), the elapsed time was less than 1 second (S4k: 0.03 s, S8k: 0.11 s and S12k: 0.29 s). For this case, only non-zero entries occur for selected sires or dams of selected animals, a fortiori present in the pedigree. Then, elapsed CPU time increased linearly up to the 15th extracted generation, although at a different rate for the different size scenarios. Beyond that point, adding ancestors did not affect the elapsed time. These results have to be related with pedigree extraction (Figure 2): does it make sense to spend more time for additional generations? Almost all available ancestors have entered the population after extracting 10 generations (between 95-99% of the number of animals in the last extraction round). However, elapsed CPU time continued to increase at the same rate from generations 10 to 15. For instance, in scenario S12k, adding~3% of the final population cost an add-itional~4 seconds (or~22% of the total elapsed time). The usefulness of this small group of remote ancestors for inversion of A 22 is discussed hereafter (sub-section "Number of generations to extract").
For the fully extracted population (after 23, 24 and 24 generations for scenarios S4k, S8k and S12k, respectively), there was a close-to-linear relationship between the size of the selected population and the elapsed CPU time (approximately 6 seconds for 4000 additional animals in the selected sub-population). The effective computational complexity of this algorithm is difficult to establish, however, because it mostly depends, first, on how the population was split (for instance, a selected sub-population that includes mainly a few lines or families would not contain that many excluded parents) and, secondly, on how the population is structured (depth of the pedigree, effective size of the population, average inbreeding). The embedded loop in the algorithm Table 5 Sparsity pattern of T -1 for the example of Figure 1 C x indicates non-zero entries.   (step (4b) in the pseudo-code) is the main computational bottleneck and performs k iterations. In a population of n 0 animals, if k is related to the two factors mentioned above (i.e. splitting and structure of the population), then the computing time required by the algorithm would behave as n 0 ⋅ k, where k would be a case-specific factor. This agrees with the observations in Figure 5.

Memory requirements of the algorithm to characterize the sparsity pattern
For a population of n 0 animals with n selected animals, vectors c (i) and y (i) have the greatest RAM requirements. In our implementation, vector y (i) stores few elements (positions of excluded ancestors) for all animals (thus~n 0 integers). For selected animals, vector c (i) stores non-zero positions and includes approximately n⋅ n⋅ d À Á integers, where d is the average density of T -1 (number of non-zeros in the lower part of T -1 averaged by line). For excluded animals, c (i) accounts for potential selected ancestors, therefore including approximately n 0 −n ð Þ⋅ a integers, where a is the average number of selected ancestors per excluded animal. Memory would thus be allocated for approximately n 2 d þ n 0 −n ð Þ⋅ a integers. None of these integers may be declared as 3-byte integers when n 0 is lower than 2 24 (i.e. when pedigree contains less than 16.77 millions of animals).

Use of the algorithm to characterize the sparsity pattern on greater populations
If additional animals are selected, then the proportion of selected animals in the population would likely increase. In fact these additional animals would either bring new excluded ancestors (case 1), share ancestors with already selected animals (case 2), or have no registered parents in the pedigree (case 3). The two last cases are expected to be more important as the number of selected animals increases. Therefore, matrix T -1 of such a population should get sparser. These expectations were confirmed by randomly picking animals from the pool of 128 465 animals born after 1999, simulating eight larger selected sub-populations of 16 000 up to 128 000 animals. Table 6 gives sizes and proportions of the selected subpopulations. Using a computer with higher memory resources (64 Gb of RAM), the sparsity pattern of these new situations was computed. Then, the degree of sparsity was assessed as the percentage of null entries in the lower triangular part of T -1 for these new situations, as well as for previous size scenarios. The results in Figure 6 show that the degree of sparsity remained the same for low percentages of selected animals in the population (lower than 20%), while the degree of sparsity increased linearly beyond approximately 20 k animals in these specific cases. The average degree of sparsity by number of selected animals corresponded to the average number of contributors for a given animal in a given size situation. Figure 7 shows that the average number of contributors was linearly related to the number of selected animals up to~80 k selected animals, beyond which the average number of contributors was constant. We expected the average number of contributors to decrease as the number of selected animals increased. These new selected animals would then cover more of the relationships due to excluded animals. Note that the average number of contributors would be less than 2 if all animals were selected (i.e. A 22 = A).

Computation time required by the algorithm for inversion of A 22 using the sparsity pattern (Algorithm B)
When running Algorithm B, the procedure (EMPTY, PROD or LS) to compute vector b was chosen according to the estimated number of floating-point multiplications to be performed. A view of this choice along all (n-1) lines of T -1 is given in Figure 8 for each size scenario (A 22 was always computed using a fully extracted pedigree). Due to prior reordering of the pedigree by generation, the first lines of T -1 correspond to founders (unrelated animals) and are thus empty. Procedure LS occurred less than procedures EMPTY and PROD but was evenly distributed among line numbers, particularly for scenario S12k.
Considering Algorithm B led to estimation of the computational complexities based on the expected number of floating-point multiplications involved in the different tasks achieved by Algorithm B, as specified in Table 7. Total complexity is detailed for treatment of one line and for treatment of one full matrix of order n in Table 7, where treatment refers to all tasks to be performed, i.e. computing b and adding bb′ to the previous inverse. If k (average number of contributors) is considered as independent of n, the most complex term is O(n 2 ⋅ k), which is required when using the PROD procedure (proportion p P of the total). The PROD procedure is used less frequently for greater matrices (see Figures 8 and 9

Proportion of selected animals in population [%]
Number of selected animals in population  beyond 80 k animals). Treating k as independent of n is also a more reasonable assumption for greater matrices (Figure 7), since k is undoubtedly related to n for smaller matrices. The total complexity for a matrix of order n becomes: where d represents the average density of the matrix.
The most complex term d À Á 3 p L O n 4 ð Þ is tempered by two very low coefficients: the proportion of times the LS procedure is used (p L ), which may be very low for small matrices (Figure 9), and the cube of the average density d À Á , which was lower than 0.5 in our examples ( Figure 6) for matrices of order beyond 32 000. Thus, Algorithm B seems more suitable for large matrices than for small matrices, regardless of whether there is dependence between n and k or not.
The issue of numerical stability was also addressed. When using procedure PROD, the result of the previous iteration was used in the current iteration through α and b. Accumulating errors could lead to instabilities and/or divergences. However, in LS procedure, the result of the previous iteration does not affect the b that is computed. Choosing the LS procedure at regular intervals among iterations using the PROD procedure (see Figure 8) stops the accumulation of errors that could have resulted from continuously choosing the PROD procedure. Therefore, interlacing choices for both procedures is a good way to  prevent numerical instability. Independence between iterations also allows procedure LS to parallelized.

Memory requirements of the algorithm for inversion of A 22 using sparsity pattern (Algorithm B)
Algorithm B requires allocation of more than twice the RAM than Algorithm A because it cannot store the results of the inversion in the input matrix. This is due to procedure LS working on different parts of A 22 . However, since elements that are required for LS are identified when determining the sparsity pattern, they could be stored separately in order to reduce the amount of RAM required. For that reason, sparsity patterns should be established prior to computation of A 22 to determine which relationships are worth being computed.

Number of generations to extract
The depth of the pedigree to be used for instance in genetic evaluations, is still a question of debate, and often moderately deep pedigrees are used, especially when only recent data is analyzed.
Results in Figure 4 suggest that pedigree from a limited number of generations (5 to 10) is sufficient to compute A −1 22 with reasonable accuracy. The explanation is that distant ancestors do not greatly enhance a relationship. For instance, a common ancestor to animals i and j that enters the pedigree after g extracted generations and that is older than any selected animal, can only add up to 2 − 2g to the value of the relationship between i and j. In generation g, i and j can have a maximum of 2 g common ancestors. Therefore, extracting an additional generation can increase the relationship between i and j by only up to δ = 2 − g . Regardless of the number of animals added to the pedigree when extracting generation 10, the maximum change brought to any relationship reduces to less than 0.001, which would have a minor effect on the inverse scale, as confirmed by Figure 4. However, computing time required for determination of the sparsity pattern increases linearly after 10 generations ( Figure 5). Thus, limiting extraction of pedigree to 10 generations appears to be a good balance between taking into account relationships due to distant ancestors and computing time. Applying a similar study to pedigree extractions for routine genetic evaluations would be meaningful and may lead us to consider extracting a number of generations instead of a birth year limit, which is current common practice.

Practical use in a genomic background
For genomic evaluations, two specific situations where A −1 22 is needed may require the use of Algorithm B. First, as explained above and shown in equation (3), the inverse of the matrix is computed recursively by adding a block specific to the current animal to the previous inverse. At each genomic evaluation, A −1 22 could therefore be stored in a file and reused at the next evaluation cycle. At each evaluation, the matrix would be enhanced by adding newly genotyped animals. However, this approach has some limits: (1) Animals have to be listed by generation order and only animals younger than those already genotyped can be added because older animals may cause changes in the sparsity pattern. This could be easily implemented in a cattle breed such as Holstein, where only few animals are key ancestors of the breed. (2) The resulting file may be large but this could be reduced by sparse storage approaches.
Meyer et al. [13] recently applied a similar methodology for computation of the inverse of the genomic relationship matrix (G): their methodology also updates the previous inverse of G, necessitating its storage on disk from an evaluation to the next one.
Secondly, when using a pedigree of only one extracted generation, which contains genotyped animals and their ungenotyped parents, inversion of A 22 is even faster ( Figure 6) and the inverse seems to be a reasonable approximation of A −1 22 computed with a full extracted pedigree (see Figure 4 and Discussion here above). Such a fair approximation of A −1 22 may be useful as a preconditioner to solve A 22 x = v, for instance, as required in the iterative solution of MME of single-step genomic BLUP (best linear unbiased prediction) proposed by Legarra and Ducrocq [14].

Current limits
The algorithm to determine the sparsity pattern of the inverse triangular factor of A 22 is obviously useful only in inversion algorithms that use the inverse triangular factor. For other inversion algorithms, the algorithm to determine the sparsity pattern should not be useful.
Inversion algorithms that use the inverse triangular factor are useful in certain cases (e.g., for updating an inverted matrix or for obtaining quick approximations), but they would be less efficient, in terms of computing time, for the single purpose of inversion. The time required by Algorithms A and B was compared with the time required by subroutine "dkmxhf.f90" (K. Meyer, University of New England, Australia), which is a regular and efficient inversion algorithm. For inversion of the three different orders of A 22 (4000, 8000 and 12 000), computing times of dkmxhf.f90 were lower than computing times obtained with Algorithm A and similar to those obtained with Algorithm B (accounting for sparsity). For small numbers of extracted generations, computing times were slightly lower for Algorithm B than dkmxhf.f90, but were greater when greater numbers of generations were extracted. However, the computing speed of Algorithm B can benefit from several optimizations (e.g., parallelization of the LS procedure and use of specific libraries for matrix products).
For computational ease, a small population (less than 1 million animals) was used in this study. Gains in computing time have to be tested for other sizes of population. This study was also restricted to only one population by size scenario and used repetitions (20) of the algorithm on the same data. Use of a Holstein population may also be criticized because although the average computed inbreeding was never greater than 1.23%, such a population has few key ancestors. Having the key ancestors in the selected sub-population might avoid density, because they would be contributors of many other selected animals.

Conclusions
The determination of the sparsity pattern of T -1 using pedigree information is a prior step that allows gains in computing time for inversion based on the use of T -1 . This allowed the computing time for inversion of matrices of three different sizes (4000, 8000 and 12 000 selected animals) to be reduced by a factor 1.73 on average. Gains in computing time are expected to be higher if the number of selected animals exceeds 80 000. Memory requirements for inversion of such a matrix would increase and the algorithm would become numerically more stable, since the LS procedure would become more important than the PROD procedure. Moreover, computation of the inverse by a recursive method may be very helpful in the case of genomic prediction, where a new batch of younger selected animals at each upcoming evaluation must be added to the previous inverse matrix already computed.
The results on the number of pedigree generations required for the selected animals suggest that no more than 14 generations should be extracted. If the working precision is less than 15 digits, this can even be reduced. A good balance between computing time for determination of the sparsity pattern and accuracy may be achieved with 10 extracted generations.