Covariance between relatives for a marked quantitative trait locus

Summary - Best linear unbiased prediction (BLUP) can be applied to marker-assisted selection. This application requires computation of the inverse of the conditional covariance matrix (Gof additive effects for the quantitative trait locus (QTL) linked to the marker locus (ML), given marker genotypes. This paper presents theory and algorithms to construct G and to obtain its inverse efficiently. These algorithms are suf&ciently general to accommodate situations (1) where paternal or maternal origin of marker alleles cannot be determined and (2) where the marker genotypes of some individuals in the pedigree are unknown. genetic marker / marker-assisted selection / best linear unbiased prediction / covariance between relatives / gametic relationship


INTRODUCTION
Theory for covariance between relatives provides the basis for use of data from relatives in genetic evaluation. At present, genetic evaluations in animal populations are primarily obtained by best linear unbiased prediction (BLUP; Henderson 1973) using trait phenotypes (T-BLUP). Due to advances in molecular biology, genetic markers are becoming increasingly available for use in genetic evaluation. Several approaches for use in genetic evaluation using marker genotypes and trait phenotypes have been discussed (Geldermann, 1975;Soller, 1978;Soller and Beckmann, 1982;Smith and Simpson, 1986;Kashi et al, 1990). In addition, Fernando and Grossman (1989) described how BLUP can be used for genetic evaluation using marker genotypes and trait phenotypes (TM-BLUP). Some strategies have been proposed to make TM-BLUP computationally efficient (Cantet and Smith, 1991;Hoeschele, 1993;van Arendonk et al, 1994). TM-BLUP has also been extended to accommodate multiple markers (Goddard, 1992;van Arendonk et al 1994).
TM-BLUP requires computation of the inverse of the conditional covariance matrix (G v ) of additive effects for the quantitative trait locus linked to the marker locus, given marker genotypes. To compute this inverse, Fernando and Grossman (1989) provided an algorithm that required information on the parental (paternal or maternal) origin of marker alleles, in addition to information on marker genotypes.
The parental origin of marker alleles in an individual, however, is not always known. For example, if 2 parents and their offspring each has genotype A l A 2 at the same marker locus, marker allele A 1 in the offspring could have descended from either of the parents, thus the parental origin of A 1 in the offspring is unknown.
The objective of this paper is to present theory and algorithms to compute the conditional covariance matrix and its inverse when parental origin of the marker alleles may not be known. Theory and algorithms are developed for pedigrees where the marker genotype of each individual is known (complete marker data).
Application of this theory is given for pedigrees where the marker genotype of some individuals is unknown (incomplete marker data). Wang et al (1991) presented, without proof, a recursive equation to construct G v and an efficient algorithm to compute its inverse. This recursive equation has been used by van Arenonk et al (1994) and Hoeschele (1993). In the present paper, we prove that the recursive equation holds when marker data are complete, but does not hold generally when marker data are incomplete. Chevalet et al (1984) have described a method to compute G v given marker phenotypes. This method does not require knowing the parental origin of marker alleles and can accommodate missing marker phenotypes. The method, however, is not computationally feasible for the large pedigrees typically encountered in animal breeding. Computation of the conditional covariance matrix and its inverse become feasible by conditioning on marker genotypes instead of marker phenotypes.

NOTATION AND ASSUMPTIONS
Consider a single polymorphic marker locus (ML) closely linked to a quantitative trait locus ((aTL), which will be referred to as the marked QTL (MQTL  (1991). It will be shown later that [3] does not hold generally when marker data are incomplete.
Generalizing in (3!, Pr(Q7 i « Q; P IG obs ) is the conditional probability that allele Q/° in offspring i descended from allele Qp in parent p = s or d for k i , k P = 1 or 2.
This conditional probability will be referred to as the probability of descent for a QTL allele (PD(,!). There are 8 PD(!s for each individual, as shown in Appendix B, and each PDQ can be expressed as for k i = 1 or 2 and p = s or d, where p = r when k P = 1 and p = 1r when k P = 2, and where r is the recombination rate between the ML and MQTL. Further, Pr(Miki ! M; P ¡G obs ) is the conditional probability that marker allele Mk' in offspring i descended from marker allele M!' in parent p, given the pedigree and marker genotypes. This conditional probability will be referred to as the probability of descent for a marker allele (PDM). There are 8 PDMs for each individual, and their computations are explained in Appendix A. Note that the PDMs and PD(as associated with the unknown parent(s) are undefined.
Equation [4] explicitly shows the relationship between PDQs and PDMs in scalar notation. For convenience, it is rewritten in matrix notation as where Covariance within an individual The conditional covariance between additive effects vi t and v? of MQTL alleles (! z and Q? in individual i with parents s and d, given Gobs, can be written from [1] as where fi = Pr(Ql > Q/ )Gobs) is the conditional probability that 2 homologous alleles at the MQTL in individual i are identical by descent, given Gobs. Thus, f i is the conditional inbreeding coefficient of individual i for the MQTL, given Gobs. This is different from Wright's inbreeding coefficient, which is the conditional probability that 2 homologous alleles at any locus in individual i are identical by descent, given only the pedigree.
The pair of 2 homologous alleles at the MQTL, Ql and Q?, in individual i descended from 1 of the following parental pairs: (Qs, Qd), (Q9, Qd), 8 Q') or s Q §) . Let T,!skd denote the event that the pair of alleles in i descended from the parental pair (<3!°,<3!'') for k s , k d = 1 or 2. Now, f i can be written as Because (QI = Q2!Tks!d> Gobs) implies (QSS -Qdd !Gobs), [10] becomes The Pr(T k g kdI G obs ) can be expressed in terms of PD(as (see Appendix G! as For example, where B i (l, k) are elements of B i in (5!. If 1 of the denominators in !12! is zero, then the entire corresponding term is set to zero.
Tabular method to construct covariance matrix G v The conditional covariance matrix (G v ) between additive effects of MQTL alleles can be written, from [1] and !9!, as where A is the matrix of conditional probabilities that the 2 homologous alleles at MQTL are identical by descent, given Gobs. The matrix A includes a row and column for each of the 2 MQTL alleles in each individual. Thus the order of A is 2n, where n is the number of individuals in the pedigree. This matrix is the conditional gametic relationship matrix (Smith and Allaire, 1985), given Gobs. It follows that each diagonal element of this matrix is unity. The tabular method to construct A is explained below.
Following Henderson (1976), individuals are ordered such that parents precede their progeny, and individuals 1 through b are considered to be unrelated and noninbred. Thus, the upper left submatrix of A is an identity matrix of order 2b, which is expanded sequentially by the 2 rows and 2 columns corresponding to individual i, for i = b + 1, ... , n, as follows: Let 81 = 2(i &mdash; 1) + 1 and 6f = 2(i &mdash; 1) + 2 be the row indices of A corresponding to the 2 MQTL alleles Ql and Q2 of individual i. From !3!, the elements of the 2 rows 6/ and 6i , corresponding to the 2 MQTL alleles of individual i with parents s and d, are computed as for j = 61 -1, where B i (L, k) were defined in !6!. Element À p 61 = fi, where f i is given in !11!. Elements of columns 6! and 8; are obtained by symmetry. If 1 parent is unknown, terms involving the unknown parent are dropped from !14!.
For convenience, the tabular algorithm described above can be written in matrix notation. Let A i -i be the upper left submatrix of A expanded up to i -1. For individual i, with parents s and d, A i _ 1 is expanded to Ai as where and In (17!, q' is a 2 x 2(i-1) matrix with at most 8 non-zero elements, which are from B i and are located in columns 6s, 8;, 6d and 6d.
The above tabular algorithm to construct A is similar to that used to construct the numerator relationship matrix (Emik and Terrill, 1949;Henderson, 1976 (1994) used partitioned matrix theory to develop rules to invert the numerator relationship matrix efficiently for populations with unusual relationships. A similar approach is used here to invert A efficiently. From [13], Gj! = A -1 /a!. In general, the inverse of A i , partitioned as in !15!, can be obtained as where Di = Ci -q §Aj-i qi is 2 x 2 matrix (Searle, 1982). From !18!, the contribution of individual i to Ai is given by the second term on the right-hand side of this equation, for which, as shown below, there are at most 36 non-zero elements.
Because of the sparse structure of q i as shown in (17!, qiA i _ l qi can be written as B ics , d B §, where C s , d is the 4 x 4 conditional gametic relationship matrix for parents of i, s and d, the elements of which are in A Z _ 1 , and B i is the matrix of PDQs defined in (6!. Thus If fi, f s and f d are nulle, then where 1 2 is a 2 x 2 identity matrix. The submatrix qiDilq! in [18] is a square matrix of order 2(i &mdash; 1) that contains only 16 non-zero elements, which are given by B i Dz l Bi. The submatrix Di l qi is a matrix of order 2 x 2(i -1) that contains only 8 non-zero elements, which are given by DilB!. Thus, there is a total of 36 non-zero elements contributing to Ail i from individual i. For convenience, these 36 non-zero elements are collected into a 6 x 6 matrix: Because W i contains all contributions to A i l from individual i, we refer to it as the 'contribution matrix'. The position of contribution element W i (l, k) is given by element Il i (1, k), so we define the corresponding 'position matrix' for W i as where 6b = 2(a-1)+b for a = s, d, or i and b = 1 or 2. If both parents of individual i are known, then all elements in Ii i are defined. If at least 1 parent is unknown, then elements in II i associated with the unknown parent(s) are not defined.
Because q i has at most 8 non-zero elements, and the positions of these elements are simple functions of s and d, [18] leads to an efficient algorithm to invert A, where the number of arithmetic operations for inverting is proportional to 2n, the size of A. It is noteworthy that any symmetric positive definite matrix can be inverted using !18!. Unless q i is sparse and the positions of the non-zero elements can be determined easily, this approach will not be efficient. Note that [19] requires C s , d , which is from A i _ 1 . Thus for an inbred pedigree, C s , d needs first to be computed, similar to the situation where inbreeding coefficients need first to be computed when Henderson's rapid algorithm (Henderson, 1976)  submatrix of the conditional gametic relationship matrix A is an identity matrix of order 4. This submatrix will be expanded by the tabular method for individuals 3, 4, and 5, as shown below.
The matrix B 3 of PD(as for individual 3 with parents 1 and 2 is computed using S 3 according to [5]: Now, from [14], elements A5 , j and A6 , j , for j = l, ... , 4, which correspond to individual 3, are computed as linear functions of elements in the first 4 rows, which correspond to the parents 1 and 2: Diagonal elements A 5 , 5 and A 6 , 6 for individual 3 are unity. Off-diagonal element .!6,5, which is defined as the conditional inbreeding coefficient in !10!, is null because the parents of individual 3 are unrelated. For individual 3, therefore, numerical values of elements A 5 , j for j = 1, ... , 5 and !6,! for j = l, ... , 6 are The corresponding column elements are obtained by symmetry.
For the first 2 individuals, the parents are unknown. Thus, add Is to AIL A2!, A3! and A4!. For individual 3, PD(as (B 3 ) can be obtained as shown earlier.
Because individual 3 is not inbred, D 3 = I 2 -B3B!, from (20!. Matrix W 3 is in Algorithms to construct and invert the conditional gametic relationship matrix (A), given complete marker data, are based on the recursive equation !3). In deriving [3] from !2!, it was assumed, given complete marker data, that events Q7; {:::: Q § and Qs -Q i kj for example, are independent. They may not always be independent, however, when marker genotypes of the parents are unknown. Thus, although [2] holds for complete and incomplete marker data, [3] may not hold for incomplete marker data. Therefore, algorithms developed for complete marker data cannot be directly applied, in general, to pedigrees with incomplete marker data. In this section, we first demonstrate that [3] may not hold when marker genotypes of parents are unknown. A strategy to accommodate pedigrees with incomplete marker data is then presented.
The pedigree in table I is used to demonstrate that [3] may not hold when marker genotypes of the parents are unknown. In this pedigree, marker genotype of individual 2, the maternal parent of individuals 3 and 4, is unkown. Thus, as shown below, Pr(Q4 -Q2) cannot be computed using !3!.
The last 2 terms in [26] are null because the QTL alleles in the unknown parent of individual 4 cannot be identical by descent to QTL alleles in individual 3. In deriving [3] from !2!, it was assumed, given Gobs, that Q4 ! Q' and Qz = Q2, for example, are independent, ie Because the marker genotype for the maternal parent of individual 4 is unknown, however, the above equality does not hold. This is illustrated numerically.
Given the parents' genotypes, the genotypes of offspring are independent. Therefore, Pr(Qi « Q', Q' = !w3I!'robs) can be computed by conditioning on the genotype of individual 2 (parent of individuals 3 and 4) as The probabilities required in the above computation are From the above table, Pr(Q4 ! Q)]G obs ) and Pr(Q! = Q3IG obs ) can also be computed as The values of Pr(<! 4= Q2!Gobs) = 1/24, Pr(Q2 = Q3IGobs) = 1/2, and Pr (Q' 4 -<-Q2, Q2 = Q3!Gobs) = 3/400 illustrate that Pr(! 4= Q l , Ql -Q2 i G ob,,) = A Pr(''!4 ! Q 2l G obs) Pr ( ' w2 = Q 2 3 ob.,) Because [3] may not hold when marker genotypes of parents s and d are unknown, the tabular algorithm for complete marker data cannot be applied directly to construct A, given incomplete marker data. The tabular algorithm can be used, however, to construct A given incomplete marker data, as described below.. Let S2 be the set of all possible marker genotype configurations for individuals with unknown genotypes, and let Gobs be the observed marker genotypes for individuals with known genotypes. The conditional gametic relationship matrix given incomplete marker data, A IGobs' can then be computed as where A lw ,G Ob8 is the conditional gametic relationship matrix given marker genotypes w for individuals with unknown genotypes and Gobs for individuals with known genotypes, and Pr(w I G obs ) is the conditional probability of individuals with unknown genotypes having marker genotypes w, given marker genotypes Gobs for individuals with known genotypes. The matrix A lw , GOb8 can be constructed using the tabular method given complete marker data, and the probability Pr!Go!) can be computed as where Pr(w, G obs ) can be computed efficiently (Elston and Stewart, 1971;Bonney, 1984).
The conditional gametic relationship matrix (A) for the pedigree in table I, computed using !27!, is Computing A using [27] is not efficient when a large number of individuals have unknown genotypes because the summation in [27] is over all combinations of the unknown genotypes. Further, an efficient algorithm to invert AI Gobs has not been found. Therefore, 2 approximate methods to compute A¡ Gobs and its inverse are presented: 1) We have already shown that [3] may not hold for incomplete marker data because, given Gobs, Q7i ! Q! and Q9 = Q ki in [2], for example, may not be independent.
If we ignore this dependency, then [15] and (18!, which are based on (3!, can be used to approximate A and its inverse. This approximation will require PDMs for individuals with incomplete marker data. For individual i, with unknown marker genotypes for parents s and d, PDMs can be computed as where each summation is over all possible genotypes at the ML. If G s , G d , or G i is not missing, then the corresponding summation should be dropped from !28!. The computation of Pr(G,,Gd,GilG!b,) can be very time-consuming when a large number of individuals have unknown marker genotypes. An approximation for Pr(G s , G d , Gi ] Gobs) can be obtained, however, by conditioning only on marker information of 'close' relatives of i, s and d, where, for example, a set of 'close' relatives for an individual could be its parents, sibs and offspring. The conditional gametic relationship matrix (A), for the pedigree in table I, using this approximation is The consequence of this approximation is that the summation in [27] has been brought into inside of A and performed on B i (or S i , see [5]). 2) Let w max be the genotype configuration in S2 with the largest probability. Given w max and Gobs, [15] and [18] can be used to approximate A and its inverse. Sheehan et al (1993) proposed a sampling scheme to compute the probability of genotype configurations. For the pedigree in table I, given Gobs, G z = A i A 2 has the largest .conditional probability (2/3) among all possible genotypes for G 2 , ie w max = (Gz = AiA 2 ). Thus, [15] can be used to construct A with G Z = A i A 2 . The conditional gametic relationship matrix (A) using this approximation is: The consequence of this approximation is that the resulting A is conditional on wmax · A measure of how well an approximation compares to the exact method is the correlations coefficient, r' exact ,a.pp rox; between upper off-diagonal elements of AI Gob s , computed exactly by !27!, and corresponding elements computed by approximate methods. For the pedigree in table I, Texact , a pp roxl = 0.9877 for approximation 1 and ?'exact , a pp rox2 = 0.8735 for approximation 2.
To further examine these approximations rexa!t,apProxi and rexa!c,appTOx2 were computed for a pedigree of 99 individuals with 3 generations. The first generation consisted of 3 grandsires, each mated with 12 granddams. The second generation consisted of 2 sires and 10 dams from each grandsire for a total of 6 sires and 30 dams. Each sire was randomly mated with 4 dams, avoiding full-sib and halfsib matings. The third generation consisted of 2 grandsons and 2 granddaughters from each sire for a total of 12 grandsons and 12 granddaughters. Marker genotypes were assumed missing for the 30 maternal granddams. Thus covariances were only computed for the remaining 69 individuals in the pedigree. Marker genotypes for these 69 individuals were generated randomly. Granddaughters and dams without progeny were assigned missing marker genotypes with probability 0.6.
Exact and approximate covariances were computed for 20 randomly generated marker genotype configurations. The average for r exact , a pp rox i was 0.8923 and for !'exact,approx2 was 0.8939. The effect of these approximations on marker-assisted genetic evaluation needs to be studied.

DISCUSSION
Theory and algorithms are presented here to construct the conditional covariance matrix between relatives for a marked quantitative trait locus (G v = Au v 2) and to obtain its inverse efficiently. These algorithms extend those of Fernando and Grossman (1989) to accommodate situations (1) where paternal or maternal origin of marker alleles cannot be determined and (2) where marker genotypes of some individuals in the pedigree are unknown. The exact procedure presented here to construct A!Gobs for incomplete marker data may not be efficient for large pedigree. Therefore, we presented 2 alternative strategies to approximate A!Gobs and its inverse. Simulation results indicate that the 2 approximations are similar because they have similar correlations with the exact method ( ?'exact , a pp rox i = 0.8923, rexact,approx2 ! 0.8939). Approximation (1) is preferred, however, because it may be difficult to search for w max when a large number of individuals have unknown marker genotypes.
We also presented an algorithm to compute the conditional inbreeding coefficient ( fi) for a QTL given Gobs, which is different from Wright's inbreeding coefficient.
This conditional inbreeding coefficient is the probability that the 2 homologous alleles at the MQTL in an individual are identical by descent given the pedigree and marker information, whereas Wright's inbreeding coefficient is the conditional probability that the 2 homologous alleles at any locus in an individual are identical by descent given only the pedigree. A numerical example is used to show that equation !3!, which is the basis of tabular method to construct G v , does not hold generally when marker data are incomplete.
In most practical situations, marker information will not be available on distant ancestors. Thus, TM-BLUP cannot be computed. One of the 2 approximations presented in this paper, however, can be employed to compute A IG o bs' Thus available marker information can be used to obtain improved genetic evaluations by approximate TM-BLUP. Further, in general, information on distant ancestors has little impact on genetic evaluations.
If the ML and MQTL are in linkage disequilibrium, marker data provide information on the first moment of MQTL effects. In this situation, regression techniques can be used for genetic evaluation using marker and trait information (Lande and Thompson, 1990;Zhang and Smith, 1992). If the ML and MQTL are in linkage equilibrium, marker data do not provide information on the first moment of the MQTL effects. Even with equilibrium, however, marker data do provide information on covariances of MQTL effects. In this situation, TM-BLUP can be used for genetic evaluation by fitting MQTL effects as random effects within animal (Fernando and Grossman, 1989;Cantet and Smith, 1991;Goddard, 1992;Hoeschele, 1993).
Genetic evaluation by TM-BLUP requires knowledge of genetic parameters, such as r and o, v 2. This is also true for T-BLUP, which requires knowledge of genetic variances and covariances. In practice, true values of genetic parameters are unknown and estimates are used in their places. Both restricted maximum likelihood and maximum likelihood approaches can be used to estimate parameters required for TM-BLUP (Weller and Fernando, 1991).
Ideally, marker-assisted selection will be based on multiple marker loci. When the linkage phase between flanking marker loci is known in addition to the parental origin of marker alleles, the method presented by Goddard (1992) for multiple markers can be used for TM-BLUP. Further research is needed for TM-BLUP using multiple markers when both the linkage phase between flanking marker loci and the parental origin of marker alleles are unknown.
(LB Schook, HA Lewin, DG McLaren, eds), Marcel Dekker, New York, USA, 305-328 Zhang W, Smith C (1992)  Theory for computation of PDQs The conditional probability that allele Q7 i of individual i descended from allele QP P of parent p (fig 1), given Gobs, will be denoted by Pr(Qf° « QP P I G, b ,), which is called PDQ. This conditional probability can be expressed as Because Qf° and M ki are on the same chromosome of individual i, each must have descended from the same parent. Thus, Pr M,&dquo; 4--Mp&dquo;54,, Qk° ! QPP !Gobs) is null.

Now,
There are 2 probabilities on the right-hand side in !B2!. The first probability is a PDM for individual i (see [All for its computation). The second probability can be expressed in terms of PDMs and of the recombination rate r between the ML and the MQTL as explained below.
Given Mf° « M:!, the probability that Qf° descended from QP P does not depend on other information in the pedigree. Thus, If k § = kp, then recombination has not taken place, so that If k' p 54 kp, then recombination has taken place, so that For each combination of k i , kp, k) = 1, 2, we have The PD(as, Pr( Q7 i {= Q k, !Gobs), for ki, k P = 1, 2, can be obtained by using the above in !B2!: where p = s or d.
In summary, for k i = 1 or 2 and p = s or d, where p = r when kp = 1 and p = 1-r when kp = 2.
Note that PDC!s are now expressed in terms of PDMs and r.

APPENDIX C
Theory for computation ofPr(TkskdIGobs) The event that the pair of alleles (<!,<3!) in individual i descended from parental pair (Qs s , Qd d ) (fig 1), is denoted by T kskd for k s k d = 1 or 2. This event can occur in 1 of 2 ways: