Computation of all eigenvalues of matrices used in restricted maximum likelihood estimation of variance components using sparse matrix techniques

Summary - Restricted maximum likelihood (REML) estimates of variance components have desirable properties but can be very expensive computationally. Large costs result from the need for the repeated inversion of the large coefficient matrix of the mixed-model equations. This paper presents a method based on the computation of all eigenvalues using the Lanczos method, a technique reducing a large sparse symmetric matrix to a tridiagonal form. Dense matrix inversion is not required. It is accurate and not very demanding on storage requirements. The Lanczos method, the computation of eigenvalues, its application in a genetic context, and an example are presented.

(Received 16 March 1995; accepted 3 October 1995) Summary -Restricted maximum likelihood (REML) estimates of variance components have desirable properties but can be very expensive computationally. Large costs result from the need for the repeated inversion of the large coefficient matrix of the mixed-model equations. This paper presents a method based on the computation of all eigenvalues using the Lanczos method, a technique reducing a large sparse symmetric matrix to a tridiagonal form. Dense matrix inversion is not required. It is accurate and not very demanding on storage requirements. The Lanczos method, the computation of eigenvalues, its application in a genetic context, and an example are presented.
Lanczos method / sparse matrix / restricted maximum likelihood / eigenvalue Résumé -Calcul de toutes les valeurs propres des matrices utilisées dans l'estimation du maximum de vraisemblance restreinte des composantes de variance à l'aide de techniques applicables aux matrices creuses. Les estimations du maximum de vraisemblance restreinte (REML) des composantes de variance ont des propriétés intéressantes mais peuvent être coûteuses en temps de calcul et en besoin de mémoire. Le problème vient de la nécessité d'inverser de façon répétée la matrice des coefficients des équations du modèle mixte. Cet article présente une méthode basée sur le calcul des valeurs propres et sur l'utilisation de la méthode de Lanczos, une technique permettant de réduire une matrice creuse, symétrique et de grande taille en une matrice tridiagonale. L'inversion de matrices denses n'est pas nécessaire. Cette méthode donne des résultats précis et ne demande que très peu de stockage en mémoire. La méthode de Lanczos, le calcul des valeurs propres, son application dans le contexte génétique et un exemple sont présentés. INTRODUCTION The accuracy of estimates of variance components is dependent on the choice of data, method and model. The estimation of (co)variance components by restricted maximum likelihood (REML, Patterson and Thompson, 1971) procedures is generally considered to be the best method for animal breeding data. Furthermore, the animal model is considered to be the model which utilizes the information from the data in the most efficient way. Several different REML algorithms (derivative-free, expectation-maximization, Fisher-scoring) have been used with animal breeding data. Most methods are iterative and require the repeated manipulation of the mixed-model equations (Henderson, 1973).
The derivative-free (DF) algorithm (Smith and Graser, 1986) involves evaluating the log likelihood function explicitly and directly finding the parameters that maximize it. Estimation of (co)variance components does not involve matrix inversion but requires evaluation of: at each iteration, where C represents the coefficient matrix of the mixed-model equations.
The expectation-maximization (EM) algorithm (Dempster et al, 1977) which utilizes only first derivative information to obtain estimates that maximize the likelihood function of data, requires the diagonal blocks of the inverse of the coefficient matrix for random effects (C uu ) and traces of their products with the corresponding inverse of the numerator relationship matrix (A-1 ). These traces can be written as: where Z is the incidence matrix for any random effect, M is the absorption matrix of fixed effects and a is the variance ratio. For a comparison of EM-and DF-REML, see Misztal (1994). The Fisher-scoring (Meyer, 1985) algorithm, which is based on second derivatives of the likelihood function, requires the calculation of more complicated traces like: Expressions in [1-3] have straightforward multitrait extensions. Calculation of the inverses in [2] and [3] is the main computational limitation for the use of REML, particularly when the coefficient matrix is very large. Several attempts have been made to find direct or indirect methods for alleviating these numerical computations. While such algorithms such as DF-REML have proven robust and easy to use, they are generally slow to converge, often requiring many likelihood evaluations, in particular for multivariate analyses fitting several random factors. However, as noted by Graser et al (1987), they only require the factorization of a large matrix rather than its inverse and can be implemented efficiently using sparse matrix techniques for analyses involving several tens of thousands of animals. In the same way, Misztal et al (1993) studied the feasibility of estimating large-scale variance components in an animal model by an EM-REML algorithm using sparse matrix inversion (FSPACK). Other techniques for reducing computational demands based on algorithms using derivatives of the likelihood function (EM or method of scoring procedures) have involved the use of approximations (Boichard et al, 1992) or sampling techniques (Misztal, 1990;Thallman and Taylor, 1991;Garcfa-Cort6s et al, 1992). Along the same lines, Smith and Graser (1986) have advocated the use of sequences of Householder transformations to reduce the coefficient matrix C to a tridiagonal form, thus eliminating the need for direct matrix inversion. This is based on the observation that tr[C] = tr[QCQ'] for any Q such that QQ' = Q'Q = I and so that (!C(a' is tridiagonal. Furthermore, this idea has been extended by Colleau et al (1989) to compute from the tridiagonalized coefficient matrix the trace of matrix products required in a Fisher-scoring algorithm applied in a multivariate setting and by Ducrocq (1993) with an extra diagonalization step.
Diagonalization is the problem of finding eigenvectors and eigenvalues of C. As noted by Dempster et al (1984), Lin (1987) and Harville and Callanan (1990), matrix inversion is avoided if a spectral decomposition of the coefficient matrix is used. Then REML estimation of variance components in mixed models becomes computationally trivial and requires little computer storage. In expressions !l-3!, the problem amounts to finding eigenvalues of large sparse symmetric matrices. Indeed [1] can be written in terms of eigenvalues: where the A i are the eigenvalues of the coefficient matrix C and, if L is the Cholesky factor of A (A = LL'; Henderson, 1976 where the 7 i are the eigenvalues of L'Z'MZL. Until now, all of these methods have implied working on dense matrices stored in the core of the computer. These transformations are therefore limited to data sets of moderate size. The purpose of this paper is to present and apply in a genetic context a method for computing some or all eigenvalues of a very large sparse matrix with very little computer storage. This method, attributed to Lanczos (1950), generates a sequence of tridiagonal matrices T, with the property that eigenvalues of T! E J2! &dquo;! are progressively better estimates of eigenvalues of the large matrix as j increases.
This method was developed by Cullum and Willoughby (1985). The computer storage required is minimal.

METHODS
Computing all eigenvalues of a very sparse symmetric matrix Although the problem of computing eigensystems of symmetric dense matrices that fit easily in the core has been satisfactorily solved using sequences of Householder transformations or Givens rotations (Golub and Van Loan, 1983), the same cannot be said for very large sparse matrices. One way to find some or all eigenvalues of large sparse symmetric matrices B is to use the Lanczos method (Lanczos, 1950).
If a large matrix is sparse, then the advantages of methods which only use matrixvector multiplications are obvious as the matrix B need not be stored explicitly. All that is needed is a subroutine for efficiently computing the matrix vector product Bv for any given vector v. In the genetic context considered, it will be seen that such a computation is often easy. No storage of large matrices is needed. In the Lanczos method, only two vectors and the relevant information on B to build the sparse matrix-vector product Bv need to be used at each step if only the eigenvalues are required.
In theory, the Lanczos algorithm reduces the original matrix B (n x n) to a sequence of tridiagonal matrices: For a real symmetric matrix B, it involves the construction of a sequence of orthonormal vectors Vj (for j = 1, ... , n) from an arbitrary initial vector v i by recursively applying the equation: and /9 i = 0 and v o = 0. The vectors Vj are referred to as the 'Lanczos vectors'. If at each stage, the coefficient a j is chosen to make v j+l orthogonal to v j and { 3 j+1 is chosen to normalize Vj+l to unity, the vectors should form an orthonormal set and the tridiagonal matrix T n with diagonal elements a j and off-diagonal elements !3!+1 (j = 1, ... , n) should have the same eigenvalues as B. The coefficients o j and {3 j+1 are determined as follows: The resulting collection of scalar coefficients a j and { 3 1+l obtained in these orthogonalizations defines the corresponding Lanczos matrices T j . Paige (1971) showed in his thesis that for the basis Lanczos recursion [7] orthogonalization with respect to only the two most recently generated Lanczos vectors is sufficient to guarantee that each Lanczos vector is orthogonal to all previously generated Lanczos vectors. The idea of the Lanczos procedure is to replace the computation of eigenvalues for the matrix B by the computation of those of the simpler Lanczos matrices T j ( j = 1, ... , n). Cullum and Willoughby (1985) showed that the eigenvalues of the T j provide good approximations to some of the eigenvalues of B. So, if the Lanczos recursion is continued until j = n, then the eigenvalues of T n will be the eigenvalues of B. In this case, T n is simply an orthogonal transformation of B and must have the same eigenvalues as B.
Theoretically, the Lanczos method cannot determine the multiplicity of any multiple eigenvalue of the matrix B. If the matrix B has only m distinct eigenvalues (m < n) then the Lanczos recursion would terminate after m steps. Additional computations will be needed to determine which of these eigenvalues are multiple also to compute their multiplicities. The Lanczos method seems attractive for large sparse matrices because the requirements for storage are very small (the values of a! and (3j for all j). The elements of v j and v j -l for the current value of j must be stored, as well as B, in such a way that the subroutine for computing Bv from v takes full advantage of the sparsity of B. The eigenvalues of B can be found from those of the more easily handled symmetric matrices T j ( j = 1, ... , n), which are then determined by a standard method (eg, the so-called QL algorithm or a bisection method; Martin and Wilkinson, 1968).
However, because of rounding errors, Vj+1 will never be exactly orthogonal to Vk (for all k ! j -2). Consequently, the values of coefficients a! and ,!!+1 are inaccurate and the nice properties of the T j described above are quickly lost. Paige (1971) and Edwards et al (1979) showed that the loss in the orthogonality of the Lanczos vectors occurs essentially when the quantity [,3 j (ejx)] becomes small (e j is the jth base vector, ie, all components are equal to zero except for the jth one which is equal to one, and x is an arbitrary vector). A first approach to correct the problem consists of reorthogonalizing all or some of the vectors Vj at each iteration, but the costs of this operation and the computer storage required for all v j can be extremely large. Another approach is not to force the orthogonality of the Lanczos vectors by reorthogonalizing but to work directly with the basic Lanczos recursion, accepting the losses in orthogonality and then unravelling the effects of these losses. Because of this failure of orthogonality, the process does not terminate after n steps but can continue indefinitely to produce a tridiagonal matrix of any desired size. Paige (1971) showed that if the tridiagonal matrix is truncated after a number of iterative steps k much greater than n, the resulting tridiagonal matrix T k has a group of eigenvalues very close to the correct eigenvalues for each eigenvalue of the original matrix. He also showed that rounding errors only delayed convergence but did not stop it. Indeed, the eigenvalues of the Lanczos matrices are either 'good' eigenvalues, which are true approximations to the eigenvalues of the matrix B, or 'extra' or 'spurious' eigenvalues, which are caused by the losses in orthogonality. Two types of 'spurious' eigenvalues can be distinguished. Type one is a less accurate copy or 'ghost' copy of the good eigenvalue; type two is genuinely spurious. The main difficulty is to find out which of these eigenvalues correspond to eigenvalues of the original matrix B. For that, the inverse iteration method and the corresponding Rayleigh quotient (Wilkinson, 1965;Chatelin, 1988) are used.
The identification test which separates the bad eigenvalues from the good ones rests upon certain relationships between the eigenvalues of the Lanczos tridiagonal matrices T j and the eigenvalues of the submatrices T j obtained by deleting the first row and column of the matrix T j . Any eigenvalue of T j , which is also an eigenvalue of the corresponding T j matrix, is labelled as 'spurious' and is discarded from the list of computed eigenvalues. All remaining eigenvalues, including all numerically multiple ones, are accepted and labelled as 'good'. So one can directly identify those eigenvalues which are spurious. In the Lanczos procedure, numerically multiple eigenvalues, which differ from each other by less than a user-specified relative tolerance parameter, are accepted as accurate approximations of eigenvalues of the original matrix B and the others (simple eigenvalues) may or may not have converged. This is checked by computing error estimates (only on the resulting single isolated eigenvalues). These error estimates are obtained by calculating the product of the kth component (k being the size of Lanczos matrix considered) of the corresponding Lanczos matrix eigenvector u by {3 k+l (Cullum and Willoughby, 1985).
Therefore, in order to apply this technique, it is necessary to first choose a value k for the size of the Lanczos matrix (often greater than the size of the B matrix if all eigenvalues are required). Paige (1971) and Cullum and Willoughby (1985) showed that the primary factor determining whether or not it is feasible to compute large numbers of eigenvalues for a given k is the gap ratio (the ratio of the largest gap between two adjacent eigenvalues to the smallest such gap). The smaller this ratio, the easier it is to compute all the eigenvalues of the original matrix. The larger this ratio, the larger the size of the Lanczos matrix required to obtain all eigenvalues of the B matrix. Cullum and Willoughby (1985) showed that reasonably wellseparated eigenvalues on the extremes of the spectrum of B appear as eigenvalues of the Lanczos matrices for relatively small size of the Lanczos matrix. Therefore this method can also be applied to efficiently find the extreme eigenvalues of a sparse matrix, which are required for example in finding optimal relaxation parameters, when iterative successive relaxation methods are used to solve linear systems (Golub and Van Loan, 1983). Since there is no reorthogonalization, eigenvalues which have converged by a given of the Lanczos matrix may begin to replicate as the size of the Lanczos matrix is increased further.
The Lanczos algorithm does not give rules for determining how large the Lanczos matrix must be in order to compute all the eigenvalues. Paige (1971) showed that it takes more than n steps (generally between 2n and 8n are needed) to compute all eigenvalues of the spectrum. The stopping criterion cannot be determined beforehand.
A method to determine the multiplicities of multiple eigenvalues The Lanczos procedure cannot directly determine the multiplicities of the computed eigenvalues as eigenvalues of B. Unfortunately, the multiplicity of an eigenvalue of a Lanczos matrix has no relationship with its multiplicity in the original matrix B. Good eigenvalues may replicate many times as eigenvalues of a Lanczos matrix but be only single eigenvalues of the original matrix B. In fact, numerically multiple eigenvalues are accepted as converged approximations to eigenvalues of B.
Cullum (personal communication) proposed an approach to determine and compute the multiplicities of multiple eigenvalues of the matrix B, but it requires the computation of the eigenvalues of two or more associated matrices, ie, the algorithm presented previously has to be applied several times. The different matrices required are simple modifications of the original B matrix. The approach of Cullum is based upon the following property. If B is a real symmetric matrix and A is an eigenvalue of B with multiplicity p (p > 2) then A will also be an eigenvalue of the matrix obtained from B by adding a symmetric rank-one matrix: where VI is the starting vector used for B in the Lanczos algorithm and v, is an arbitrary scalar. Theoretically, if the Lanczos procedure is applied to B and with the same VI as starting vector, then the tridiagonal matrices T j generated for B would be related to those generated for B: One could continue this approach to determine the specific multiplicities of each of these multiple eigenvalues by considering the matrix: B = B+v 2 v 2 v2, where v 2 is the second Lanczos vector generated for B or any vector orthogonal to v l , and V2 is a scalar which can be equal to v l . Any eigenvalue of B that has multiplicity greater than two will be in the spectrum of B, and those with multiplicity equal to two will be in the spectrum of B and not in the spectrum of B. Thus, the procedure could continue with successive transformations of the B matrix until a matrix is obtained with no eigenvalues corresponding to eigenvalues of B. This approach seems attractive if the multiplicities of multiple eigenvalues of the matrix B are small. If this is not the case, this operation will be expensive, because the algorithm must be applied as many times as the largest multiplicities of multiple eigenvalues. We present here another approach which can be used when the number of multiple eigenvalues of the B matrix is small but their multiplicities are large. The above procedure is applied only once to determine which eigenvalues are multiple and then the following expressions are used to compute the multiplicities of each multiple eigenvalue. First, one makes use of the fact that the total number of eigenvalues of B is equal to its dimension: where N is the dimension of matrix B, N S is the number of single nonzero eigenvalues, r is the number of multiple nonzero eigenvalues and m i is the multiplicity of the ith multiple eigenvalue !.
It is also known that: where the A j are the simple eigenvalues.
Finally, the m i are obtained solving the (possibly overdetermined) system of three equations, for example, using integer linear programming subroutines (the m i must be integers).
The traces of B and B 2 are simply obtained by using the subroutine of the sparse matrix-vector multiplications which will be presented in the next part in a particular setting. The matrix B need not be stored explicitly. Only two vectors are needed to compute these traces: where the b ii are the diagonal elements of the B matrix, e i is the ith base vector, and (Be i ) i represents the ith element of the vector (Be i ).
The validity of the approach proposed here to determine multiplicities was verified on several examples for matrices of moderate size (up to n = 1000). For such matrices, a regular approach (for example, routine F02AAF of the NAG Fortran Library) working on dense matrices stored in the core can be used to compute all eigenvalues and their multiplicities. It was found that Cullum and Willoughby's method applied using a Lanczos matrix of size k = 2n and computing multiplicities from equations [11], [14] and [15] as proposed above, leads to exactly the same results as the regular approach.

Sparse computation of the matrix vector product Bv
Here we illustrate the previous method and its use with sparse matrix techniques to compute [2] and/or [3] in the context of a simple animal model with one fixed effect. In a genetic context, a typical mixed animal model is characterized by: where: y = data vector, (3 = vector of fixed effect, a = vector of random additive genetic effects, such that a -N(0, Ao,2), X, Z = known incidence matrices associated with vectors (3 and a respectively, e = vector of random residuals, such that e N N(0, I Q e ), A = numerator relationship matrix among animals (Henderson, 1975). The mixed-model equations (MME) of Henderson (1973) are: Absorption of the fixed effect equations leads to: The estimation of parameters Q a and Q e by an EM-REML (or a Fisher-scoring) procedure involves the computation of the trace given in [5] (or [5] and [6] for Fisher-scoring). The matrix B considered in the Lanczos method corresponds to L'Z'MZL here. We will assume that, before the computation of any matrix vector product Bv = u (where the vector v is arbitrary), a pedigree file and a data file have been read and that their information has been stored into three vectors of size n, where n is the size of A; the sire, dam and level of the fixed effect of each animal are stored in s i , d i and h i , respectively (h i = 0 if the animal has no record). Progeny must precede parents. Simultaneously, the number of observations in each level of the fixed effect is cumulated in a vector of size n f .
Note that u = Bv = L'Z'MZLv = L'Z'ZL V -L'Z'X(X'X)-X'ZLv. The sparse computation of Bv = u involves the following loops: (1) compute w = Lv and t = (X'Z)Lv (2) compute f = (X'X)-t (3) compute u = L'Z'(Zw -Xf) = L'r The computation of w = Lv in [1] and u = L'r in [3] is performed by solving the sparse triangular systems L-l w = v and L -T u = r. Each line i of L-1 and each column of L -T contains at most three nonzero elements which can be easily identified (the diagonal element + the elements in columns (or lines) s i and d i ). Vector t is obtained by simply cumulating elements of w corresponding to animals with records. Similarly, elements of r are equal to the difference between the elements of w and the appropriate elements of f for animals with records and to zero for the others. Note that only five vectors of size n (s, d, h, u and v) and two of size n f (f and t) are required.

EXAMPLE
To illustrate the method, a data set including the type scores of 9 686 dairy cows and their ancestors (21 269 animals in total) for 18 type traits was created. To estimate the genetic parameters of these type traits, the animal model [16] was used assuming precorrection of the data for age at calving and stage of lactation, and using a month-herd-class classifier effect as a unique fixed effect (292 levels). A canonical transformation of the data can be applied (eg, Meyer, 1985) because all traits are analyzed according to the same model with equal design matrices. However, it was considered that the repeated computations of expressions [2] and/or [3] for matrices of size n = 21269 could be advantageously replaced by a unique computation of all eigenvalues of the matrix B = L'Z'MZL followed by the repeated use of equalities [5] and [6] in a Fisher-scoring REML iterative procedure.
The main programs were supplied by Cullum and the sparse computation of the matrix vector product Bv was done according to the strategy described above on an IBM Rise 6000/590 computer. Tridiagonal matrices T k with k = 2n, 4n, 6n, 8n and lOn were computed using the Lanczos recursion [7]. The procedure was applied twice, once on B and once on B [9] in order to detect multiple eigenvalues, then their multiplicities were calculated using the relationships [11][12][13] and an integer linear programming subroutine.
REML estimates of genetic and residual (co)variances were obtained repeatedly using the eigenvalues computed from Lanczos matrices of increasing size in [5] and [6]. The quadratic forms required at each Fisher-scoring iteration were calculated by iteratively solving the mixed-model equations of a multiple-trait best linear unbiased prediction (BLUP) animal model, after canonical transformation and with starting values equal to the solutions of the previous system of equations. Estimates of asymptotic standard errors of (co)variance parameters were available from the inverse of the information matrix at convergence of the Fisher-scoring iterative procedure. Estimates of heritabilities, genetic and residual correlations and the corresponding asymptotic errors were also compared. Table I shows the number of distinct eigenvalues computed and the calculated multiplicities of the four multiple eigenvalues (0, 0.50, 0.6875, 0.75) encountered, and the CPU time required for the different values of k (2n, 4n, 6n, 8n and 10n). CPU time mainly consisted of two parts: the time required for the computation of the Lanczos matrices T k increased linearly with k. The computing cost for the determination of the eigenvalues of the tridiagonal matrices T k increased quadratically with k and was always much larger than the calculation of T k , for the simple model with only one fixed effect considered here.

RESULTS
Obviously, very large Lanczos matrices must be used in order to detect all eigenvalues: 31 new distinct eigenvalues were detected when the size of the Lanczos matrix was increased from 8n to 10n. For k = lOn, all the eigenvalues found had a good precision (at least 10-5 with very few precisions worse than 10-1°) but there is a possibility that other eigenvalues may still have been kept undetected. As a result, the multiplicities of the multiple eigenvalues obtained using [11][12][13] Cullum and Willoughby (1985). eigenvalues obtained for k = 10n are assumed to be true values. The striking result is that, whatever the value of a considered and although some distinct eigenvalues are undetected and the multiplicities of the multiple eigenvalues are incorrect, the traces considered are very well approximated when a Lanczos matrix of size k = 2n is used and are virtually exact when k ! 4n.
It is well known that small departures from the true values of the traces can lead to rather large biases in the final REML estimates due to the iterative algorithms used. This explained some disappointing conclusions when approximations of traces were proposed (eg, Boichard et al, 1992). Table III reports some characteristics of the estimates of the genetic parameters calculated using the eigenvalues obtained from Lanczos matrices of different size for the computations of traces like those in [5] and [6]. A Fisher-scoring algorithm was used and stopped after ten iterations for each run, although convergence was practically achieved on all parameters after five or six iterations. Each complete run for 18 traits treated simultaneously took about 4 min of CPU time, mainly devoted to the solution of the mixed-model equations.
Estimates of parameters (variances, heritabilities and correlations) were virtually identical regardless of the value of k. This is obviously a consequence of the good quality of the approximation of the traces reported in table II.

DISCUSSION AND CONCLUSION
This example clearly shows the feasibility of the computation of all eigenvalues for very large sparse matrices. The main difficulty is to determine the size k of the tridiagonal matrix T k to be used in practice. Cullum and Willoughby's approach has two main limitations: a) eigenvalues in small dense clusters are difficult to find; and b) there is no really satisfying way to compute the multiplicities of multiple eigenvalues when these multiplicities are very large. However, in the particular case when eigenvalues are used only to calculate traces, these two drawbacks annihilate each other, at least when the approach proposed here to determine the multiplicities is chosen and when the number of multiple eigenvalues is small; the use of incorrect multiplicities compensates for undetected eigenvalues. Therefore, there is no need to find all eigenvalues to have an excellent approximation of the quantities required in first-and second-order REML algorithms.
Our main objective was to show that the use of the simple expressions [5] and [6] is not limited to small, dense matrices. Many new directions of research are opened. The efficient computation of the matrix product Bv for any vector v in more complex models is a key point for extending this approach to other situations. For example, we were able to compute Bv = L'Z'MZLv when another fixed effect (a group of unknown parent effect) was added to the model used in our example. CPU time for the Lanczos recursion was doubled but the computation of the eigenvalues of the tridiagonal matrices remained by far the most timeconsuming part. Other promising techniques like the 'divide and conquer' approach (Dongarra and Sorensen, 1987) could be much more attractive than the traditional QL method used here. These techniques designed to take full advantage of parallel computations, when these are possible, may still significantly reduce the time required for the diagonalization of the Lanczos matrices when used in serial mode (Sorensen, personal communication).
The value of knowledge of the eigenvalues of large sparse matrices is not limited to REML estimation. It appears for example in the analysis of the contributions of different lines to the estimation of genetic parameters from selection experiments (Thompson and Atkins, 1994). Sometimes, only extreme eigenvalues are needed, eg, in the computation of the optimal relaxation factors in iterative algorithms for solving linear systems (Golub and Van Loan, 1983). In the Lanczos procedure, information about Bs extreme eigenvalues tends to emerge long before the tridiagonalization is complete (for k < n). In our example, the largest eigenvalue was detected with a value of k as low as 5.
In situations where the knowlege of all eigenvalues is necessary, the problem of the multiplicities of multiple eigenvalues may be tackled differently. The three nonzero eigenvalues observed in our example (0.5; 0.6875; 0.75) correspond to groups of animals with similar characteristics (full-sibs, same sire and maternal grandsire, half-sibs) as already pointed out by Thompson and Shaw (1990). However, we were not able to determine the expected multiplicities by a careful look at the pedigree file (except for full-sibs). If an efficient algorithm to compute these eigenvalues were available, the computation of eigenvalues using the Lanczos method could be limited to a modified smaller matrix, as suggested by Thompson and Shaw (1990), at least as long as the sparsity of the matrix is not significantly altered.