Marker assisted selection using best linear unbiased prediction

Best linear unbiased prediction (BLUP) is applied to a mixed linear model with additive effects for alleles at a marked quantitative trait locus (MQTL) and additive effects for alleles at the remaining quantitative trait loci (QTL). A recursive algorithm is developed to obtain the covariance matrix of the effects of MQTL alleles. A simple method is presented to obtain its inverse. This approach allows simultaneous evaluation of fixed effects, effects of MQTL alleles, and effects of alleles at the remaining QTLs, using known relationships and phenotypic and marker information. The approach is sufficiently general to accommodate individuals with partial or no marker information. Extension of the approach to BLUP with multiple markers is discussed.


INTRODUCTION
Genetic engineering techniques have produced a variety of molecular genetic markers with the potential to identify a large number of genetic polymorphisms (Soller and Beckmann, 1982; Smith and Simpson, 1986; Schumm et al., 1988). Marker-assisted selection is one application of these techniques to animal and plant breeding. Information on marker loci that are linked to quantitative trait loci, together with phenotypic information, could be used to increase genetic progress by increasing accuracy of selection and by reducing generation interval (Soller, 1978; Smith and Simpson, 1986). Geldermann (1975) proposed a least-squares procedure to estimate effects of marker alleles on quantitative traits. Based on selection index principles, Soller (1978) combined marker information and phenotypic information to obtain genetic evaluations. This method has been used to study the additional genetic progress expected from marker-assisted selection (Soller, 1978; Soller and Beckmann, 1983; Smith and Simpson, 1986). Because of the complex nature of animal breeding data, however, these methods may not be applicable directly to marker-assisted selection with field data.
Data from field-recorded populations are affected by non-genetic nuisance factors, such as age of animal, age of dam, management system, season of birth and herd. Also, non-random mating, selection and overlapping generations contribute to the complexity of the data. Best linear unbiased prediction (BLUP; Henderson, 1973, 1975, 1982) deals with these complications when predicting breeding values from phenotypic data. The objective of this paper is to present methodology for the application of BLUP to marker-assisted selection in animal breeding. Each methodological development is illustrated with a numerical example using a single hypothetical pedigree.

METHODOLOGY
Consider a single polymorphic marker locus (ML), closely linked to a quantitative trait locus (QTL). Let $M_i^p$ and $M_i^m$ denote alleles at the ML that individual $i$ inherited from its paternal ($p$) and its maternal ($m$) parent, and let $Q_i^p$ and $Q_i^m$ denote alleles at the marked QTL (MQTL) linked to $M_i^p$ and $M_i^m$; that is, the paternal gamete carries $M_i^p$ and $Q_i^p$, and the maternal gamete carries $M_i^m$ and $Q_i^m$. Let $v_i^p$ and $v_i^m$ be the additive effects of $Q_i^p$ and $Q_i^m$. Additive effects of alleles at the remaining QTLs, unlinked to the ML, will be denoted by the residual additive effect $u_i$. Now, the additive effect for individual $i$, $a_i$, can be written as

$$a_i = v_i^p + v_i^m + u_i \tag{1}$$

The usual model to obtain BLUP of additive effects, given phenotypic information, is

$$y_i = x_i'\beta + a_i + e_i \tag{2}$$

where $y_i$ is the phenotypic value of individual $i$, $x_i$ is a vector of known constants, $\beta$ is a vector of unknown fixed effects, and $e_i$ is a random error. Using equ.(2), BLUP allows information from relatives to contribute to the predictor of $a_i$ through the covariance matrix of $a_i$ values. Note that this covariance matrix depends on the type of genetic information available. When only relationship information ($r$) is available, the covariance matrix of $a_i$ values, $G_{a|r}$, is proportional to the numerator relationship matrix (e.g., Henderson, 1976).
When marker information ($m$) is also available, the covariance matrix of $a_i$ values is $G_{a|r,m}$. It can be shown that $G_{a|r} \neq G_{a|r,m}$, in general. For example, the covariance between half-sibs that receive the same ML allele from their common parent is higher than the covariance between half-sibs that receive different ML alleles. This is because half-sibs receiving the same ML allele also receive the same MQTL allele with greater frequency than half-sibs receiving different ML alleles.

A. Marker model I
To obtain BLUP with phenotypic and marker information, it is convenient to use

$$y_i = x_i'\beta + v_i^p + v_i^m + u_i + e_i \tag{3}$$

which is equivalent to equ.(2). The covariance matrix of $v_i^p$ and $v_i^m$ values ($G_v$) depends on relationship and marker information. The covariance matrix of $u_i$ values ($G_u$) depends only on relationship information and is proportional to the numerator relationship matrix (e.g., Henderson, 1976). Given the covariance matrices $G_v$ and $G_u$, BLUPs of $v_i^p$, $v_i^m$ and $u_i$ values can be obtained using the mixed model equations (Henderson, 1973). The inverse of $G_u$, which is required in the mixed model equations, usually is obtained using an algorithm given by Henderson (1976). A recursive algorithm to construct $G_v$ is given in section B, and an algorithm to obtain its inverse is in section C.
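As a minimal sketch of how the mixed model equations could be set up for the model with separate MQTL and residual additive effects, the function below solves them with NumPy. The function name `solve_mme`, the incidence matrices `W` and `Z`, and the use of a single error variance `sigma_e2` are assumptions made for illustration; the covariance matrices are inverted explicitly here only for brevity, whereas sections B and C describe how the inverse of the MQTL covariance matrix can be built directly.

```python
import numpy as np

def solve_mme(X, W, Z, y, Gv, Gu, sigma_e2):
    """Solve Henderson's mixed model equations for y = X b + W v + Z u + e.

    X, W, Z : incidence matrices for fixed effects, MQTL allele effects
              and residual additive effects (hypothetical names).
    Gv, Gu  : covariance matrices of v and u.
    sigma_e2: error variance, with Var(e) = I * sigma_e2 assumed.
    """
    nb, nv = X.shape[1], W.shape[1]
    T = np.hstack([X, W, Z])
    lhs = T.T @ T
    # Add the variance-ratio terms to the random-effect diagonal blocks.
    lhs[nb:nb + nv, nb:nb + nv] += sigma_e2 * np.linalg.inv(Gv)
    lhs[nb + nv:, nb + nv:] += sigma_e2 * np.linalg.inv(Gu)
    sol = np.linalg.solve(lhs, T.T @ y)
    return sol[:nb], sol[nb:nb + nv], sol[nb + nv:]
```

With one record per individual, `W` would have two ones per row (paternal and maternal MQTL allele) and `Z` would be an identity matrix.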

B. Covariance matrix of MQTL effects
1. Theory. To construct $G_v$, consider the covariance between additive effects of MQTL alleles. Without loss of generality, consider only paternal MQTL alleles. Suppose arbitrary individuals $o$ and $o'$ have sires $s$ and $s'$. The MQTL alleles inherited by $o$ and $o'$ from their sires are $Q_o^p$ and $Q_{o'}^p$, having additive effects $v_o^p$ and $v_{o'}^p$. For paternal MQTL alleles in $o$ and $o'$, the covariance between their additive effects $v_o^p$ and $v_{o'}^p$ is

$$\mathrm{Cov}(v_o^p, v_{o'}^p) = P(Q_o^p \equiv Q_{o'}^p)\,\sigma_v^2 \tag{4}$$

where $\mathrm{Var}(v_o^p) = \sigma_v^2$ is the additive variance of an MQTL allele and $P(Q_o^p \equiv Q_{o'}^p)$ is the probability that $Q_o^p$ is identical by descent to $Q_{o'}^p$. For an arbitrary pair of individuals, at least one is not a direct descendant of the other. If $o$ is not a direct descendant of $o'$, $Q_o^p$ can be identical by descent to $Q_{o'}^p$ in 2 mutually exclusive ways: 1) $Q_o^p$ is identical by descent to the maternal MQTL allele of the sire of $o'$ ($Q_{s'}^m$) and $o'$ inherits $Q_{s'}^m$, or 2) $Q_o^p$ is identical by descent to the paternal MQTL allele of the sire of $o'$ ($Q_{s'}^p$) and $o'$ inherits $Q_{s'}^p$.
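If marker information is available, the conditional probability that $o'$ inherits $Q_{s'}^p$, given that $o'$ inherits $M_{s'}^p$, is $(1 - r)$, where $r$ is the recombination rate between the ML and the MQTL. Thus, if $o'$ inherits $M_{s'}^p$, the probability in equ.(4) can be written as

$$P(Q_o^p \equiv Q_{o'}^p) = (1 - r)\,P(Q_o^p \equiv Q_{s'}^p) + r\,P(Q_o^p \equiv Q_{s'}^m) \tag{5}$$

and, if $o'$ inherits $M_{s'}^m$, the weights $(1 - r)$ and $r$ are interchanged. If marker information is not available,

$$P(Q_o^p \equiv Q_{o'}^p) = 0.5\,P(Q_o^p \equiv Q_{s'}^p) + 0.5\,P(Q_o^p \equiv Q_{s'}^m) \tag{6}$$

This is because, in the absence of marker information, $Q_{s'}^p$ and $Q_{s'}^m$ have equal probability of being transmitted to $o'$.

The above development leads to a tabular method to construct $G_v$, which is similar to the method used to construct the numerator relationship matrix (e.g., Henderson, 1976). Note that $G_v$ has twice as many rows as individuals because each individual has 2 effects: 1 for the paternal and 1 for the maternal MQTL allele. The rows and columns of $G_v$ should be ordered so that those corresponding to progeny follow those for their parents. Let the row indices of $G_v$, corresponding to the effects of MQTL alleles of individual $o$ ($v_o^p$, $v_o^m$), be $i_o^p$, $i_o^m$; of its sire $s$ ($v_s^p$, $v_s^m$), be $i_s^p$, $i_s^m$; and of its dam $d$ ($v_d^p$, $v_d^m$), be $i_d^p$, $i_d^m$. Also, let element $ij$ of $G_v$ be $g_{ij}$. Then from equs.(4), (5) and (6)

$$g_{i_o^p,\,j} = (1 - p_o^p)\,g_{i_s^p,\,j} + p_o^p\,g_{i_s^m,\,j}, \qquad j = 1, \ldots, i_o^p - 1 \tag{7a}$$

$$g_{i_o^m,\,j} = (1 - p_o^m)\,g_{i_d^p,\,j} + p_o^m\,g_{i_d^m,\,j}, \qquad j = 1, \ldots, i_o^m - 1 \tag{7b}$$

where $p_o^p = r$ if $o$ inherited $M_s^p$, $p_o^p = (1 - r)$ if $o$ inherited $M_s^m$, and $p_o^p = 0.5$ if the ML allele inherited from the sire is unknown; $p_o^m$ is defined in the same way for the dam.

2. Numerical example. The matrix $G_v$ for the example pedigree is given in Table II. For convenience, we will assume that $\sigma_v^2 = 1$ and that $r = 0.1$. The first two individuals are assumed to be unrelated; thus the upper left 4 x 4 submatrix of $G_v$ is the identity matrix. Elements on the diagonal are equal to $\sigma_v^2 = 1$. Now, row elements below the diagonal can be obtained from equs.(7a) and (7b); column elements above the diagonal are obtained by symmetry. Each row element for $v_3^p$ is equal to $(1 - r) = 0.9$ times the corresponding row element for $v_1^p$ plus $r = 0.1$ times the corresponding row element for $v_1^m$. Each row element for $v_3^m$ is equal to $r = 0.1$ times the corresponding row element for $v_2^p$ plus $(1 - r) = 0.9$ times the corresponding row element for $v_2^m$. The ML allele inherited by 4 from its sire is unknown. Thus, each row element for $v_4^p$ is the mean ($p_4^p = 0.5$) of the corresponding row element for $v_1^p$ and for $v_1^m$.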
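The tabular rules in equs.(7a) and (7b) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' program: the pedigree encoding (0-based parent indices, `"p"`/`"m"`/`None` for the parental ML allele that was transmitted) and the names `weight` and `build_gv` are assumptions. The pedigree is the one described in the numerical example (individuals 1 and 2 unrelated, 3 from 1 x 2, 4 from 1 x 3).

```python
import numpy as np

def weight(origin, r):
    """Weight on the parent's maternal MQTL allele.

    origin: "p" or "m" for the parental ML allele transmitted,
    None when marker information is missing.
    """
    if origin == "p":
        return r
    if origin == "m":
        return 1.0 - r
    return 0.5

def build_gv(pedigree, r, sigma_v2=1.0):
    """Tabular construction of the covariance matrix of MQTL allele effects.

    pedigree: list of (sire, dam, sire_origin, dam_origin), parents listed
    before progeny; rows 2*i and 2*i + 1 hold the paternal and maternal
    allele effects of individual i.
    """
    n = len(pedigree)
    G = np.zeros((2 * n, 2 * n))
    for o, (sire, dam, s_origin, d_origin) in enumerate(pedigree):
        for k, (parent, origin) in enumerate(((sire, s_origin), (dam, d_origin))):
            i = 2 * o + k
            if parent is not None:
                p = weight(origin, r)
                pp, pm = 2 * parent, 2 * parent + 1
                # Row elements left of the diagonal, then symmetry.
                G[i, :i] = (1.0 - p) * G[pp, :i] + p * G[pm, :i]
                G[:i, i] = G[i, :i]
            G[i, i] = sigma_v2
    return G

pedigree = [
    (None, None, None, None),  # individual 1
    (None, None, None, None),  # individual 2
    (0, 1, "p", "m"),          # individual 3: sire 1, dam 2
    (0, 2, None, "p"),         # individual 4: sire 1 (marker unknown), dam 3
]
G_v = build_gv(pedigree, r=0.1)
```

Rows are filled in pedigree order, so every parental row needed by a progeny row is already complete when it is used.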
Marker information is available for $v_4^m$, so that each row element for $v_4^m$ is $(1 - r) = 0.9$ times the corresponding row element for $v_3^p$ plus $r = 0.1$ times the corresponding row element for $v_3^m$.

C. Algorithm for inverting $G_v$
1. Theory. The approach taken here follows that by Quaas et al. (1984) and Quaas (1988) to invert the matrix of additive relationships. We define a linear model to relate the effect of the paternal MQTL allele of an individual ($o$) to effects of paternal and maternal MQTL alleles of its sire ($s$)

$$v_o^p = (1 - p_o^p)\,v_s^p + p_o^p\,v_s^m + \varepsilon_o^p \tag{8a}$$

where $\varepsilon_o^p$ is a residual effect. Similarly, a linear model for the effect of the maternal MQTL allele of $o$ is

$$v_o^m = (1 - p_o^m)\,v_d^p + p_o^m\,v_d^m + \varepsilon_o^m \tag{8b}$$

It can be shown that the residuals $\varepsilon_o^p$ in equ.(8a) and $\varepsilon_o^m$ in (8b) have a diagonal covariance matrix ($G_\varepsilon$; see Appendix). Now, the vector of effects of MQTL alleles ($v$) can be written as

$$v = Pv + \varepsilon \tag{9}$$

where $P$ is a matrix with each row containing only two non-zero elements, if the parent is known, or containing only zeros, if the parent is unknown; and where $\varepsilon$ is a vector of residuals. For example, row $i_o^p$ will have $(1 - p_o^p)$ in column $i_s^p$ and $p_o^p$ in column $i_s^m$, if the sire of $o$ is known. Similarly, row $i_o^m$ will have $(1 - p_o^m)$ in column $i_d^p$ and $p_o^m$ in column $i_d^m$, if the dam of $o$ is known.
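To proceed, we need the diagonal elements of $G_\varepsilon$. Consider, for example, the variance of $\varepsilon_o^p$. From equ.(8a), if the sire of $o$ is known

$$\mathrm{Var}(v_o^p) = \mathrm{Var}\big[(1 - p_o^p)\,v_s^p + p_o^p\,v_s^m\big] + \mathrm{Var}(\varepsilon_o^p)$$

because effects of MQTL alleles of sire $s$ are uncorrelated with residuals of its offspring $o$ (see Appendix). Hence

$$\mathrm{Var}(\varepsilon_o^p) = \mathrm{Var}(v_o^p) - (1 - p_o^p)^2\,\mathrm{Var}(v_s^p) - (p_o^p)^2\,\mathrm{Var}(v_s^m) - 2(1 - p_o^p)\,p_o^p\,\mathrm{Cov}(v_s^p, v_s^m) \tag{10}$$

The covariance between the effects of paternal and maternal MQTL alleles can be written as

$$\mathrm{Cov}(v_s^p, v_s^m) = F_s\,\sigma_v^2 \tag{11}$$

where $F_s$ is the inbreeding of sire $s$. Now, equ.(10) can be written as

$$\mathrm{Var}(\varepsilon_o^p) = 2\sigma_v^2\,(1 - p_o^p)\,p_o^p\,(1 - F_s) \tag{12a}$$

because $\mathrm{Var}(v_o^p) = \mathrm{Var}(v_s^p) = \mathrm{Var}(v_s^m) = \sigma_v^2$, and where $(1 - p_o^p)\,p_o^p = (1 - r)r$ for $p_o^p = r$ or for $p_o^p = (1 - r)$. When the sire is not inbred: $\mathrm{Var}(\varepsilon_o^p) = 2\sigma_v^2(1 - r)r$, if marker information is available; or $\mathrm{Var}(\varepsilon_o^p) = \sigma_v^2/2$, if marker information is not available. If the sire is not known, $\mathrm{Var}(\varepsilon_o^p) = \sigma_v^2$.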
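Similarly, if the dam of $o$ is known, the variance of $\varepsilon_o^m$ is

$$\mathrm{Var}(\varepsilon_o^m) = 2\sigma_v^2\,(1 - p_o^m)\,p_o^m\,(1 - F_d) \tag{12b}$$

where $(1 - p_o^m)\,p_o^m = (1 - r)r$ for $p_o^m = r$ or for $p_o^m = (1 - r)$ and where $F_d$ is the inbreeding of dam $d$. When the dam is not inbred: $\mathrm{Var}(\varepsilon_o^m) = 2\sigma_v^2(1 - r)r$, if marker information is available; or $\mathrm{Var}(\varepsilon_o^m) = \sigma_v^2/2$, if marker information is not available. If the dam is not known, $\mathrm{Var}(\varepsilon_o^m) = \sigma_v^2$.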
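The residual variances described above reduce to a one-line rule per allele. The helper below is a small sketch of that rule; the function name `residual_variance` and its argument names are hypothetical, and `p` is the weight on the parent's maternal MQTL allele ($r$, $1 - r$, or 0.5 when the marker is unknown).

```python
def residual_variance(p, parent_known=True, f_parent=0.0, sigma_v2=1.0):
    """Diagonal element of the residual covariance matrix for one allele effect.

    p            : weight on the parent's maternal MQTL allele.
    parent_known : False when the relevant parent is unknown.
    f_parent     : inbreeding of that parent.
    sigma_v2     : additive variance of an MQTL allele.
    """
    if not parent_known:
        return sigma_v2
    return 2.0 * sigma_v2 * (1.0 - p) * p * (1.0 - f_parent)

r = 0.1
with_marker = residual_variance(r)        # 2 (1 - r) r with sigma_v2 = 1
without_marker = residual_variance(0.5)   # sigma_v2 / 2
unknown_parent = residual_variance(0.5, parent_known=False)
```

Because $(1 - p)p$ is symmetric in $p$ and $1 - p$, the result does not depend on which parental ML allele was transmitted, only on whether it is known.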
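Rearranging (9), $v$ can be written as

$$v = (I - P)^{-1}\varepsilon \tag{13}$$

for non-singular $(I - P)$, and thus $G_v$ can be written as

$$G_v = (I - P)^{-1}\,G_\varepsilon\,(I - P')^{-1} \tag{14}$$

From equ.(14), it is clear that $G_v^{-1}$ can be written as

$$G_v^{-1} = (I - P')\,G_\varepsilon^{-1}\,(I - P) \tag{15}$$

As shown earlier, $P$ has a simple structure, with each row containing at most 2 non-zero elements, and $G_\varepsilon$ is diagonal.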
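A quick numerical check of equs.(14) and (15) is possible for the four-individual example with $\sigma_v^2 = 1$ and $r = 0.1$. The sketch below writes $P$ and the diagonal of $G_\varepsilon$ out by hand from the description in the text (row order: paternal then maternal allele effect of individuals 1 to 4); the variable names are illustrative assumptions.

```python
import numpy as np

r = 0.1
P = np.zeros((8, 8))
P[4, [0, 1]] = [1 - r, r]    # paternal allele of 3, from sire 1
P[5, [2, 3]] = [r, 1 - r]    # maternal allele of 3, from dam 2
P[6, [0, 1]] = [0.5, 0.5]    # paternal allele of 4, marker unknown
P[7, [4, 5]] = [1 - r, r]    # maternal allele of 4, from dam 3

# Residual variances: founders 1, marker known 0.18, marker unknown 0.5.
G_eps = np.diag([1, 1, 1, 1, 0.18, 0.18, 0.5, 0.18])

I = np.eye(8)
T = np.linalg.inv(I - P)
G_v = T @ G_eps @ T.T                                        # equ.(14)
G_v_inv = (I - P).T @ np.diag(1.0 / np.diag(G_eps)) @ (I - P)  # equ.(15)

# The product should be the identity if (15) is the inverse of (14).
check = G_v @ G_v_inv
```

Only equ.(15) is needed in practice; the dense inverse of $(I - P)$ is computed here solely to confirm the identity on a small case.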
To obtain the rules for obtaining $G_v^{-1}$, equ.(15) is written as

$$G_v^{-1} = Q\,G_\varepsilon^{-1}\,Q' \tag{16}$$

where $Q = (I - P')$. Because $G_\varepsilon$ is diagonal, equ.(16) can be written as

$$G_v^{-1} = \sum_{j=1}^{2n} q_j\,q_j'\,d_j \tag{17}$$

where $n$ is the number of individuals in the pedigree, $q_j$ is column $j$ of $Q$, and $d_j$ is diagonal element $j$ of $G_\varepsilon^{-1}$. By definition of $Q$, element $j$ of $q_j$ is unity. Further, $q_j$ will have, at most, only 2 other non-zero elements; for $j = i_o^p$, element $i_s^p$ equals $-(1 - p_o^p)$ and element $i_s^m$ equals $-p_o^p$, if the sire of $o$ is known. Similarly, for $j = i_o^m$, element $i_d^p$ equals $-(1 - p_o^m)$ and element $i_d^m$ equals $-p_o^m$, if the dam of $o$ is known. Thus, given parent and marker information of an individual, the contributions to $G_v^{-1}$, corresponding to effects of paternal and maternal MQTL alleles of the individual, are easily obtained. Now, to obtain the inverse of $G_v$:
1) calculate diagonals of $G_\varepsilon$: when the parent is known, the diagonal is given by equ.(12a) or (12b), and when the parent is unknown, the diagonal is $\sigma_v^2$;
2) set $G_v^{-1}$ to the null matrix;
3) for each offspring $o$, with sire $s$ and dam $d$, add the following to the indicated elements of $G_v^{-1}$: if sire is known, add $(1 - p_o^p)^2\,d_{i_o^p}$ to diagonal element $i_s^p, i_s^p$; if dam is known, add $(1 - p_o^m)^2\,d_{i_o^m}$ to diagonal element $i_d^p, i_d^p$; ...

2. Numerical example. The algorithm is illustrated for the pedigree in Table I. To construct $G_\varepsilon$ we again take $\sigma_v^2 = 1$ and $r = 0.1$. Because the parents of individuals 1 and 2 are not known, the first 4 elements on the diagonal of $G_\varepsilon$ are $\sigma_v^2 = 1$. For individual 3, each parent is known and marker information is available. Thus, from equs.(12a) and (12b), the two diagonals of $G_\varepsilon$ corresponding to effects of paternal and maternal MQTL alleles of individual 3 are $2(1 - r)r = 0.18$. Each parent of individual 4 is also known, but the marker inherited from the sire is not known. Therefore, the diagonal of $G_\varepsilon$ corresponding to $v_4^p$ is 0.5, and that corresponding to $v_4^m$ is $2(1 - r)r = 0.18$.
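The accumulation implied by equ.(17) can be sketched as follows: each allele effect contributes $d_j\,q_j q_j'$, which touches at most a 3 x 3 block of the inverse. The pedigree encoding and the names `weight` and `gv_inverse` are assumptions for illustration, and the optional argument `f` (inbreeding of each individual, used when it acts as a parent) defaults to zero, which holds for the parents in the example but is an assumption in general.

```python
import numpy as np

def weight(origin, r):
    """Weight on the parent's maternal MQTL allele ("p", "m" or None)."""
    if origin == "p":
        return r
    if origin == "m":
        return 1.0 - r
    return 0.5

def gv_inverse(pedigree, r, sigma_v2=1.0, f=None):
    """Build the inverse covariance matrix of MQTL allele effects directly.

    pedigree: list of (sire, dam, sire_origin, dam_origin) with 0-based
    parent indices or None; rows 2*i and 2*i + 1 belong to individual i.
    f: inbreeding coefficients of the individuals (assumed zero if omitted).
    """
    n = len(pedigree)
    f = [0.0] * n if f is None else f
    G_inv = np.zeros((2 * n, 2 * n))
    for o, (sire, dam, s_origin, d_origin) in enumerate(pedigree):
        for k, (parent, origin) in enumerate(((sire, s_origin), (dam, d_origin))):
            j = 2 * o + k
            if parent is None:
                G_inv[j, j] += 1.0 / sigma_v2
                continue
            p = weight(origin, r)
            d = 1.0 / (2.0 * sigma_v2 * (1.0 - p) * p * (1.0 - f[parent]))
            # Non-zero elements of column j of Q and their positions.
            idx = np.array([j, 2 * parent, 2 * parent + 1])
            q = np.array([1.0, -(1.0 - p), -p])
            G_inv[np.ix_(idx, idx)] += d * np.outer(q, q)
    return G_inv

pedigree = [
    (None, None, None, None),  # individual 1
    (None, None, None, None),  # individual 2
    (0, 1, "p", "m"),          # individual 3: sire 1, dam 2
    (0, 2, None, "p"),         # individual 4: sire 1 (marker unknown), dam 3
]
G_v_inv = gv_inverse(pedigree, r=0.1)
```

No matrix is ever inverted, so the cost grows linearly with the number of individuals, which is the point of the rules.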
The $P$ matrix for this example is given in Table III. The first 4 rows of $P$ are null because parents of the first 2 individuals are not known. The sire of individual 3 is 1, and $M_1^p$ was transmitted to 3. Thus, the row corresponding to $v_3^p$ has $(1 - r) = 0.9$ in the column corresponding to $v_1^p$ and $r = 0.1$ in the column corresponding to $v_1^m$. Similarly, the dam of individual 3 is 2, and $M_2^m$ was transmitted to 3. Thus, the row corresponding to $v_3^m$ has $r = 0.1$ in the column corresponding to $v_2^p$ and $(1 - r) = 0.9$ in the column corresponding to $v_2^m$. The sire of individual 4 is 1, but marker information is not available. Thus, the row corresponding to $v_4^p$ has 0.5 in the columns corresponding to $v_1^p$ and $v_1^m$. The dam of individual 4 is 3, and $M_3^p$ was transmitted to 4. Thus, the row corresponding to $v_4^m$ has $(1 - r) = 0.9$ in the column corresponding to $v_3^p$ and $r = 0.1$ in the column corresponding to $v_3^m$. The matrix $Q = (I - P')$ is given in Table IV. The product $Q\,G_\varepsilon^{-1}\,Q'$ is given in Table V. It can be verified that this is identical to the inverse of the matrix $G_v$ in Table II.

D. BLUP with multiple markers
If information on another marker locus linked to a QTL is available, the model can be expanded to include effects of alleles of this MQTL. This approach, however, results in $2n$ additional equations for each marker introduced into the analysis. Thus, for a large number of individuals ($n$) and a large number of MQTLs, solving the mixed model equations may not be feasible. An alternative would be to use equ.(2), with

$$a_i = \sum_k \left(v_{ki}^p + v_{ki}^m\right) + u_i$$

where $v_{ki}^p$ and $v_{ki}^m$ are effects of paternal and maternal alleles of the $k$th MQTL.
The covariance matrix of effects of MQTL alleles at each locus ($G_{v_k}$) can be constructed using the tabular method described in section B. Then, assuming gametic equilibrium, the covariance matrix of $a_i$ values ($G_{a|r,m}$) can be obtained as

$$G_{a|r,m} = \sum_k Z\,G_{v_k}\,Z' + G_u$$

where $Z$ is an $n \times 2n$ matrix with elements for row $i$ containing a 1 corresponding to each of the paternal and maternal MQTL effects of individual $i$ and zeros for the remaining elements. The problem with this approach, however, is that it could not be applied to large systems, unless a simple algorithm to invert $G_{a|r,m}$ is available.
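Under the stated assumption of gametic equilibrium, the combined covariance matrix is a sum of per-locus contributions plus the residual additive part. The sketch below shows one way to assemble it; the names `allele_incidence` and `combined_covariance` are hypothetical, and the row ordering (paternal then maternal allele effect for each individual) is assumed to match the ordering used for each per-locus matrix.

```python
import numpy as np

def allele_incidence(n):
    """n x 2n matrix with ones for the two allele effects of each individual."""
    return np.kron(np.eye(n), np.ones((1, 2)))

def combined_covariance(gv_list, G_u):
    """Covariance of total additive effects from per-locus matrices.

    gv_list: list of 2n x 2n covariance matrices, one per MQTL.
    G_u    : n x n covariance matrix of residual additive effects.
    """
    n = G_u.shape[0]
    Z = allele_incidence(n)
    return Z @ sum(gv_list) @ Z.T + G_u
```

The result is dense in general, which is why the text notes that this route needs its own inversion algorithm before it can be used on large systems.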

DISCUSSION
Results presented here are an application of BLUP to marker-assisted selection. This is a generalization of the method presented by Soller (1978) and Soller and Beckmann (1983). This generalization allows simultaneous evaluation of fixed effects, MQTL effects and the residual QTL effects, using known relationships and phenotypic and marker information. It is sufficiently general to accommodate individuals with partial or no marker information. Several authors have calculated the additional genetic progress expected from marker-assisted selection (Soller, 1978; Soller and Beckmann, 1983; Smith and Simpson, 1986). Because the method presented here is a generalization of the method considered by these authors, their results give an indication of the advantage expected by using marker-assisted BLUP.
Application of this procedure requires knowledge of the recombination rate ($r$) between the marker and the MQTL and the variance of the additive effect of the MQTL alleles ($\sigma_v^2$). Assuming that effects of MQTL alleles are normally distributed, the model presented here could be used to estimate $r$ and $\sigma_v^2$ by restricted maximum likelihood (REML; Patterson and Thompson, 1971). The robustness of REML estimation, with respect to the distribution of effects of MQTL alleles, needs to be examined.