Approximate restricted maximum likelihood and approximate prediction error variance of the Mendelian sampling effect

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés. Approximate restricted maximum likelihood and approximate prediction error variance of the Mendelian sampling effect Didier Boichard, L.R. Schaeffer, A.J. Lee


INTRODUCTION
Restricted Maximum Likelihood (REl!!IL; Patterson and Thompson, 1971) is considered as the method of choice for estimating variance and covariance components. Applied to an animal model, REML may account at least partly for assortative matings, selection over generations and selection on a correlated trait (Meyer and Thompson, 1984;Sorensen and Kennedy, 1984). Increase in computational capacities and development of new algorithms, such as the derivative-free algorithm (Graser et al, 1cJ87;1B!Ieyer, 1989a, 19cJ1) made practical application of RENIL possible on medium-size data sets, particularly in analyses of selection experiments. However, there are still severe limitations with large data sets or with multiple trait models when some data are missing.
Conceptually, the Expectation-Maximization (EM) algorithm, proposed by Dempster et al (1!J77) is one of the simplest, exploiting first derivative information only. An important property of ER/I is that variance and covariance components estimates remain within the parameter space. It is usually slow to converge, but an acceleration (Laird et al, 1987) can substantially reduce the number of iterations required. However, tlie EM algorithm requires the inverse of tbe coefficient matrix for random effects. More than the repeated solution of animal model equations, calculation of this inverse is the primary limitation computationally, particularly when the coefficient matrix is large. Some attempts have already been made to approximate this inverse or at least its diagonal (Wright et al, 1987;Tavernier, 1990) but not under an animal model with complete relationships. The objectives of this paper were 1) to present an approximate method for computing tb-r trace involved in ew EA4-type REML algorithm for an animal model with one class of fixed effects and one class of random effects, 2) to derive an approximate variance-covariance component estimation procedure suited to large data sets and some kinds of multiple trait models, and 3) to examine the accuracy of this approximate method in applications.

METHODS
Use of an equivalent model For simplicity, the main development is described initially with a single trait model, and its extension to tlie multiple trait situation will be presented in a second step.
Let the model be: with Y being the vector of observations, p being the vector of fixed effects, assumed to include only one factor called management group, u being the vector of n additive genetic effects, with expectation E(u) = 0 and variance V(u) = Ao,', A being the numerator relationship matrix, e being the vector of residual effects, with expectation E(e) = 0, variance V(e) = 1 0 -; and zero covariance between u and e, and X and Z being the corresponding design matrices.
In an ElB!I-type RE1VIL, <7! is usually estimated iteratively by (Henderson, 1984): with C 22 being the n x n block of the inverse C of the coefficient matrix, pertaining to genetic effects, and [k] the round of iteration. In the following part, superscript [k] be will omitted.
Following Henderson (197G), if the individuals are sorted from the oldest to the youngest, the inverse of the coefficient matrix can be written as: L is a lower triangular matrix with one on the diagonal and at most 2 non-zero terms per row. cciual to -0.5 and relating a progeny to its parents. D is a diagonal matrix with general term d ii , with dii = 4/(2 -Øs -O d ) if both parents s and d of i are known, dii = 4/(3 -ø s ) if one parent, say s, is known, d ii = 1 if both parents of i are unknown, !9 being the inbreeding coefficient of the parent s. Quaas (1984) proposed an equivalent model based on the Mendelian sampling effect (w), ie the deviation of the progeny breeding value from parental average. with w = Lu, E(w) = 0 and V(w) = D-1(j!. Meyer (I!J87) showed that the use of this equivalent model may simplify the estimation of variance components. The two parts of the right-hand side in [1] can be rewritten as: with M being the matrix of fixed effects absorption, A the variance ratio at iteration k, and K the coefficient matrix of the equivalent model, after absorption of the fixed effects.
Because D is diagonal, only the diagonals of K-' are needed to calculate tr!D K-1!, and, noting that those are equal to the prediction error variances of the Mendelian sampling effects, [1] can be rewritten again as follows: The next step is to determine the prediction error variance of the individual Mendelian sampling effects or, equivalently, the diagonal of K-1 .
Simplification of K = L-1!Z'MZL-1 + AD L -1 is a lower triangular matrix with general term L2! being the expected proportion of i's genes coming from j. On the diagonal, L ii = 1. If i is a descendant of j and n the number of generations between i and j, then l ij = E0.5'!; l ij = 0 otherwise. If j appears several times in the pedigree of i, the contributions are summed over the different pathways. In absence of inbreeding, L2! = 0.5 if i is a progeny of j, 0.25 if i is a grand progeny of j, and so on. The structure of K may be examined. Its general term A:,! can be written as with d ij being the general term of D(di! = 0 if i different to j) and z!! the general term of Z'MZ. Accordingly, k2! is non-zero if one of the 4 following conditions is fullfilled: and are related; or i and j are contemporary (ie have a record in the same management group); or i and j have a common descendant; or both i and j have a descendant, and these 2 descendants are contemporary. Consequently, the K matrix is rather dense and the non-zero proportion is frequently over 50%. Therefore, its exact inverse is computationally expensive to obtain and 2 simplifications are proposed to derive a sparse approximate K matrix.
The covariance between contemporaries, generated by the management group absorption is assumed to be null. Consequently, Z'MZ remains diagonal with general term Zii equal to 1 -1 /nh, if i has a record, with n h the number of records in the management group h of i. Off-diagonal terms of Z'MZ, equal to -1/n h , are neglected. Obviously, the smaller n h , the greater the impact of this simplification.
Only the diagonals (1) and the first-order terms relating parents to progeny (0.5) of L-1 are taken into account, and the other terms are neglected.
After these 2 simplifications, the density of K is very low and its structure is simple. That is, an individual may be related with a non-zero term in K only to its parents, its progeny and its mates. Its structure looks like that of A-1 (Henderson, 1976) and consequently K may be obtained directly from a pedigree list and a data file, according to the following rules. Assuming z ii equal to 0 for animals without records and (1 -1/n h ) for animals with a record, contributions to K of animal i, with sire s and dam d, are the following: Approximate inversion of K More exactly, only the diagonal of K-1 is needed. A priori the structure of the K matrix is rather favourable since only the diagonal terms receive contributions of the variance ratio A, weighted by d ii , which is greater than or equal to one. Therefore, the diagonal terms are consistently higher than the off-diagonals, particularly when the variance ratio is high, ie when the heritability is low. Schaeffer (1990) proposed an approximation of the diagonal of the inverse by the inverse of the diagonal terms of K. According to the structure of K, similar to that of A-', Meyer's method (1989b) can be adapted. lVleyer's method is an approximate method to obtain prediction error variances of breeding values under an animal model. The basic idea is to adjust diagonal terms of each individual in the mixed model equations, by absorbing relatives equations, and to invert the resulting term. For each animal, only the most important equations, corresponding to its parents, its progeny and its management group are formally absorbed. However, processing the pedigree in the right order makes it possible to concentrate information from the whole population to a given animal. Such a process involves 2 steps. First, the sequential absorption of progeny equations into parents, from the youngest to the oldest progeny in the population, and secondly, the sequential absorption of parents equations into progeny, in the reverse order. The same algorithm can be applied to the K matrix.
Let i be an animal with sire s and dam d and let k.L i and k!t1 denote its diagonal term in K before and after adjustment respectively. Then the ith diagonal term of K-' is approximated by 1/k ii .
Extension to multiple trait models Consider now a model with q traits, possibly with missing data. Let G be the non singular q x q genetic variance-covariance matrix and G-1 its inverse. Let R7 be a generalized inverse of the q x q residual variance-covariance matrix corresponding to individual i, with null rows and columns according to missing data. Firstly, R7 is adjusted for the fixed effect absorption: If K ij is the q x q block of the K matrix corresponding to animals i and j, the rules to build the K matrix are similar to those in part B. Contributions of animal i, with sire s and dam d, are the following: Again, strategies of Schaeffer and Meyer can be applied. In the first one, offdiagonals blocks K ij are neglected and the K ii blocks are inverted. With Meyer's method, the 3 steps are the following: Absorption of progeny equations into parents, from the youngest to the oldest progeny in the population, gives Absorption of parents equations into progeny, in the reverse order, is performed using one of the formulae, according to whether one or both parents are known. If one parent, say s, is known, If both parents are known Finally, invert the K ii blocks.

Material
The accuracy of the present method was investigated at 2 different levels. First, the approximate trace tr (A -l C 22 ) was compared to the true one. Three different data sets were used. The first one was a small simulated data set with 150 animals over 5 generations and records in 17 management groups. It was used to measure the effect of each individual simplification (L-1 , management group absorption, inversion). The other 2 data sets, of medium size, corresponded to real examples. The &dquo;cattle&dquo; data set included 722 feed efficiency records of Holstein heifers of the Agriculture Canada experimental farm in Ottawa. Records were distributed in 44 management groups and, after adding pedigree information, 1 248 animals were evaluated. The &dquo;chicken&dquo; data set included residual feed intake (R) data of a chicken line, called R-and selected over 15 discrete generations (Bordas and Merat, 1984).
This line included 2 G20 chickens and 640 parents with a complex family structure. In these 3 situations, approximate traces obtained according Schaeffer's and Meyer's strategies were compared to the true trace under 4 heritabilities (0.01, 0.10, 0.25, 0.50).
At the second level, an approximate RENIL was implemented and compared to a true one. Results were based on the chicken data. The female residual feed intake (R) was defined as the deviation of observed feed intake from a theoretical feed intake predicted from maintenance, change in body weight and egg production. For the male trait, only maintenance and change in body weight were accounted for. Firstly, the female residual feed intake was analyzed alone in a single trait animal model. Next, because preliminary results led us to assume that the male and the female R were not the same trait, they were analysed in a 2 trait model.
To decrease the computation cost of the true REML, and particularly the bivariate one, requiring repeated inversion of the reduced animal model coefficient matrix, the first 12 generations only were analysed. The characteristics of the data set are in table I. To speed up convergence, an exponential acceleration (Laird et al, 1987) was used every 6 iterations but was applied only if the resulting variance-covariance matrices were positive definite. Table II shows the results obtained from the small simulated data set. The density of K was strongly reduced from 39.4% without approximation to 2.9% with simplifications of L-1 and management group absorption. This reduction is expected to be much more important in large applications since the number of nonzero terms in the approximate coefficient matrix K is less than 7 times the number of animals.

Comparison of true and approximate traces
Obviously, the true trace increased with heritability, because the prediction error variance of each Mendelian sampling effect increases with genetic variability. Generally, the simplification of L-1 led to a small increase of the trace, while the simplification of the management group absorption led to a decrease, particularly for high values of heritability. This example was rather unfavourable to the simplified methods since the average number of contemporaries n h was rather small (8), and moreover, contemporaries were often highly related.
The approximate inversion of K had no additional effect when the heritability was low but led to underestimating the trace when the heritability was high, and this bias was larger with Schaeffer's method, ie when off-diagonal terms were neglected, than with l!Ieyer's. When the heritability is low, the variance ratio A is high and K's off-diagonal terms are much lower than the diagonals and can be neglected. With a high heritability, this is no longer the case and Schaeffer's methods becomes clearly less efficient than lVleyer's method. Finally, when the 3 approximations were accumulated and when the lieritability was low, tr(A -l C 22 ) was well approximated by both methods, generally differing by much less than 1% from true value. When heritability increased, Meyer's method appeared more efficient than Schaeffer's but still underestimated tr(A-1 C 22 ). Results for the larger data sets ( &dquo;chicken&dquo; in table III and &dquo;cattle&dquo; in table IV) were basically the same. In the &dquo;cattle&dquo; data set with IB!Ieyer's method, the bias was slightly positive (0.09 to 0.55%) for a low or medium heritability and slightly negative (-0.51%) for a high heritability. This good result is probably related to the small number of generations and the large average number of contemporaries.
In the &dquo;chicken&dquo; data set, bias was generally negative and reached -2.19% when heritability was 0.05. This result, less favourable than in the previous example, is probably due to the number of generations and to the relatively small number of reproducers. In spite of a large average number of contemporaries, the effect of the absorption simplification was inflated because contemporaries were related, at least after several generations (the average inbreeding coefficient at the last generation was 0.28).
In both data sets with Schaeffer's method, the bias was very small for a low heritability but reached -5.02 and -6.85% with a heritability of 0.5. Therefore, in spite of its (relative) complexity, particularly in the multiple trait situation, lvleyer's method was chosen for the approximate RE1!!IL analysis presented in the following part B.

REML analysis
While the computation of tr(A-1 C 22 ) is usually the limiting factor of the EM-type REML, its cost is negligible in the approximate RE1!!IL compared to the repeated solution of animal model equations. Table V presents the results of the female &dquo;chicken&dquo; data analysis at the first iteration and at convergence. The starting value for the variance ratio was the same (3) in the true REML analysis and in the approximate one. At the first iteration, the contribution of the prediction error variances tr(A -l C 22 ) appeared 6 times larger than the contribution of the quadratic form of the estimated breeding values. Under this very unfavourable situation and with the approximate method, the bias in the estimation of the trace was almost undiluted and led to an almost equivalent bias in the estimate of the variance component. Tlie bias in the trace estimation was rather small at any one given iteration, for example -0.64% at the first and -0.40% at the convergence point of the true RENIL. However, the bias was accumulated over iterations and the heritability estimate at convergence was clearly underestimated (0.173 us 0.208). These estimates were independent of the starting value.
Results of the bivariate analysis of the &dquo;chicken&dquo; data are presented in table VI. They were basically the same as for the single trait analysis. At convergence, the estimates of the approximate method were found to be always the same, regardless of starting values. Although the approximate method gives rather accurate estimates of the prediction error variance contribution at any one iteration, it does not provide satisfactory results in the RENIL analysis. This apparent contradiction is explained by the properties of the animal model. The variance components are estimated as the sum of the quadratic form of the breeding values, which is the really informative part, and the prediction error variance, which should be only an adjustment factor. In the case of an animal model, the amount of information carried by each animal is much smaller than the adjustment factor. In this unfavourable situation, a small bias in estimating this adjustment factor estimate leads to a variance estimate which may not be very close to the RE1VIL solution. Because the accuracy of the Mendelian sampling effect estimate is not primarily dependent of the population size, this problem is not expected to be solved by increasing the size of the data sample. To a lesser extent, similar problems may arise in a true RE1VIL when t l' (A-1 C 22 ) is not computed accurately enough, because of rounding errors in the inversion of large coefficient matrices. This may explain differences in results between methods or algorithms, or some surprising convergence points (Groeneveld and Kovac, 1990).
To avoid this problem, 2 ways might be investigated in further analysis. One approach would be to develop a similar method suited to models with fewer animals involved, each concentrating more information, as for instance in the reduced animal model (Quaas and Pollack, 1980). In that case, the ratio u'A-l u/tr(A-i C 22 ) would be increased and the method would be more robust to any bias in tr(A-l C z2 ). Another way would be to quantify by simulation the effect on the bias of heritability, the distribution of the data in the contemporary groups and the family structure, in order to ajust the trace a priori.
Presently, the approximate method does not provide the same estimates as a true REML, and further developments are needed to make it more efficient. Although the examples presented here are not general, it can be concluded that the bias increases when heritability increases, when the size of contemporary group decreases and when animals in the same contemporary group are related. Owing to its ease of use, this approximate method can be recommended only as a first approach, when the true heritability is expected to be low or moderate and when the contemporary groups are large. However, its use is restricted to the class of models in which the residual components can be computed as a residual sum of squares. Until now, no approximation has been found for the residual components in the general case of multiple trait models with missing data.