Inclusion of genetically identical animals to a numerator relationship matrix and modification of its inverse.

In the field of animal breeding, estimation of genetic parameters and prediction of breeding values are routinely conducted by analyzing quantitative traits. Using an animal model and including the direct inverse of a numerator relationship matrix (NRM) into a mixed model has made these analyses possible. However, a method including a genetically identical animal (GIA) in NRM if genetic relationships between pairs of GIAs are not perfect, is still lacking. Here, we describe a method to incorporate GIAs into NRM using a K matrix in which diagonal elements are set to 1.0, off-diagonal elements between pairs of GIAs to (1-x) and the other elements to 0, where x is a constant less than 0.05. The inverse of the K matrix is then calculated directly by a simple formula. Thus, the inverse of the NRM is calculated by the products of the lower triangular matrix that identifies the parents of each individual, its transpose matrix, the inverse of the K matrix and the inverse of diagonal matrix D, in which the diagonal elements comprise a number of known parents and their inbreeding coefficients. The computing method is adaptable to the analysis of a data set including pairs of GIAs with imperfect relationships.


Introduction
Cloning animals is regarded as a means to multiply genetically identical animals (GIAs). In Japan, clones of bulls are routinely produced to test bulls' performance and in some cases to multiply fattening animals. A survey conducted by the Japanese Ministry of Agriculture, Forestry and Fisheries, and published on October 31, 2007, has recorded calves cloned from somatic cells of 535 animals and from embryonic cell nuclei of 716 animals.
In animal breeding, analysis of quantitative traits using a mixed model is essential to predict the breeding value of an individual and to estimate the genetic parameters of the traits. When applying an animal model to perform the genetic analysis, it is necessary to include the inverse of the numerator relationship matrix (NRM) in order to connect all the animals included in the mixed model; however, calculating the inverse of a large NRM requires exceptionally large computing power. On the one hand, Henderson [1] has developed a method of calculating directly A -1 , without calculating the A matrix itself in a non-inbred population. This innovation has made it possible to use a model in which the data set includes a large number of animals. On the other hand, Quaas [2] has extended the method for the application to inbred populations by including the inbreeding coefficients in the model. A faster computing method of inbreeding coefficients has been developed by Tier [3] and Meuwissen and Luo [4], where inbreeding coefficients are computed as a subset of the A matrix. In addidion, Famula [5] has pro-posed a simplified algorithm for inbred populations, incorporating parental uncertainty to the model. Inclusion of GIAs in the model raises the problem of a singular A matrix because of perfect additive genetic relationships between pairs of GIAs. In the case of an analysis with a singular A, Henderson [6] presented a method to solve a mixed model without inversion of the G matrix, where . A few years later, Kennedy and Schaeffer [7] proposed a model in which records on GIAs are treated as repeated records on the same genotype. Their model assumes perfect genetic relationships among GIAs. However in reality, the genetic relationship between pairs of GIAs is not perfect because genetic diversity between such pairs originates from several genetic factors such as the difference in the recipient cytoplasm, mutations in the somatic cell and gene imprinting in the nucleus of the somatic cell [8][9][10]. Recently a study on human monozygotic twins analysing copy number variation, has revealed that genetic and phenotypic diversities exist even in monozygotic twins within pairs [11].
The objective of this study was to develop a method to compute directly the inverse of the NRM that includes GIAs with imperfect additive genetic relationships within pairs of GIAs.

Procedure
The A matrix is decomposed according to Famula [5].
where I is the identity matrix, P is a lower triangular matrix which identifies the parents of each individual in the population. The D matrix is a diagonal matrix with d i , ith diagonal element of D. Then where F p and F q are the inbreeding coefficients of the parents of the ith animal. In (1), is the lower triangular matrix, and is its transpose matrix.
From (1), the inverse of the A matrix is as follows: Next, we introduce the K matrix into (1) so that the A matrix includes GIA. Then, A is and where K is the matrix where a diagonal element is set to 1 and the off-diagonal element is set to (1 -x) when ith and jth animals are genetically identical and x is a constant near 0; K is as follows: Generally, for a matrix with GIAs where the diagonal elements are set to 1, and the off-diagonal elements are (1x), the inverse matrix for n GIAs is where , and .
Thus, K -1 is calculated, and the diagonal element of D -1 is . Therefore, A -1 is calculated directly by the product of the matrices without computing the inverse of A.

Algorithm for computation
Provided d i is calculated by the methods of Quaas [2] and Famula [5], A -1 is calculated directly by the following steps: i) If both parents of i, say p and q are known, and when i has no GIA, if both parents of , say and , are known n, if only one parent, say , is known, if neither add to elements (p, p), (p, q), (q, p) and (q, when i has n GIAs (g ij , j = 1, 2, ... n), If i is a donor animal of the GIAs, then add to elements (i, g ij ) and (g ij , i).
ii) If only one parent, say p, is known, add to elements (p, i) and (i, p), add to element (p, p).
iii) If neither parent is known,

Simulation
For the simulation study, animal phenotypes were generated assuming that the heritability of a trait was 0.5 and the variance for both additive genetic effect and random residuals was 2500. The number of animals in the base population (G0) was 300 (150 males and 150 females).
Their phenotypic values were generated by the infinitesimal model using the random digits generator, ranlib [12]. The phenotypic value of a descendant animal in the latter two generations (G1 and G2) was formed by an average of the parents, a Mendelian sampling effect and a random residual. The number of animals was 750 (250 males and 500 females) in G1 and 1000 (no sex effect on recorded animals) in G2. The breeding animals in G1 were selected randomly. Records used to estimate variance components comprised only the phenotypic values of animals in G2.
The number of GIAs for each mating was two, i.e., a total of 1000 GIAs (500 GIA pairs). The variance components were estimated by restricted maximum likelihood (REML) using remlf90 [13]. The number of replicates for the simulation was 20 for each x value (from 0.01 to 1.0).   The D matrix has elements calculated in (2). Thus, Therefore, A -1 in (3) is as follows:

Effect of the K matrix
A from (4) is 2) When animals 4 and 5 are GIAs, the K matrix is as follows: Then the inverse of the sub-matrix of K is where , , and . Therefore, K -1 is A -1 in (5) is as follows: Then A in (4) is Diagram of the pedigree.  Here, if x → 0, A is as expected, Thus, the elements of animals 4 and 5 are those of the GIAs. Figure 2a presents the results of the estimated genetic variance. Averages of 20 estimates are shown together with their ranges (minimum and maximum). Genetic variances were overestimated at low (1-x) and around the true value at high (1-x) and estimated residual variances were underestimated at low (1-x) and around the true value at high (1-x) (data not shown). The difference between the models with 0.95 and 0.99 was statistically insignificant by the likelihood ratio test.

Discussion
The K matrix proposed in this study is directly calculated by a simple formula; thus, calculating the inverse of a large matrix can be avoided as in standard methods [5,1,2]. Using NRM with GIAs results in adding more animal records to a dataset for variance component estimation and in calculating genetic evaluation by the mixed model procedure.
The simulation study showed that the estimated genetic variance is reasonably accurate with a high (1-x); however, a low (1-x) resulted in overestimation of the genetic vari-ance, which is caused by false genetic relationships between pairs of GIAs. For instance, in the case of (1-x) equal to 0.0, where the GIAs were erroneously treated as full sibs, a large genetic variance and, consequently, a small residual variance were estimated because the variance component within full sibs was far smaller than that expected for full sibs and vice versa for the variance component between full sibs.
This simulation study assumed perfect genetic relationships between pairs of GIAs. Then, a (1-x) value higher than 0.95 can result in unbiased estimates of additive genetic variance for the simulated data sets. Therefore, a (1-x) value of 0.95 is adequate for the data set of this simulation; however, the choice of the x value may depend on  the size of the data set and the genetic constituents of a population: a higher (1-x) may be adequate for a large data set and/or a data set containing monozygotic twins.
In the analysis of real records, a lower (1-x) value is expected because perfect genetic relationships between pairs of GIAs are no longer attainable. The highest genetic relationship is found among monozygotic twins; however, diversity within pairs of twins was found to be larger than expected, according to the human study by Bruder et al. [11]. A similar diversity is observed in embryo splitting studies, but manipulating embryos may constitute a potential source of increased diversity. Clones obtained from embryotic cell nuclear transfer may show a higher diversity caused by the recipient cytoplasm [9]. In the case of clones obtained from somatic cell nuclear transfer, an additional source for genetic diversity can originate from mutations in the somatic cell and gene imprinting in the nucleus of the somatic cell [8,10]. Different types of GIAs with various degrees of genetic diversity do exist. Thus, GIAs can be regarded as highly related animals rather than identical animals. Although, (1-x) values can range from 0.0 to 1.0, it is probably nearer to 1.0. To resolve this question, statistical studies such as REML on the estimation of x are needed with a large data set including various types of GIA. The methodology presented here provides an analytical tool to analyse GIAs with an imperfect genetic relationship within pairs of GIAs.