Reduced animal model for marker assisted selection using best linear unbiased prediction

A reduced animal model (RAM) version of the animal model (AM) incorporating independent marked quantitative trait loci (MQTL's) of Fernando and Grossman (1989) is presented. Both AM and RAM permit obtaining best linear unbiased predictions (BLUP) of MQTL effects plus the remaining portion of the breeding value that is not accounted for by the independent MQTL's. RAM reduces computational requirements by reducing the size of the system of equations. Non-parental MQTL effects are expressed as a linear function of parental MQTL effects. The fraction of the MQTL variance explained by the regression on parental MQTL effects is $2[(1 - r)^2 + r^2]/2$ for a non-inbred individual with known parents. Formulae are derived to simplify computations when solving for the MQTL effects and breeding values of non-parents in the case where all non-parents have a single record. A numerical example is also presented.

marker assisted selection / best linear unbiased prediction / reduced animal model / genetic marker

INTRODUCTION
In a recent paper, Fernando and Grossman (1989) obtained best linear unbiased predictors (Henderson, 1984) of the additive effects for alleles at a marked quantitative trait locus (MQTL) and of the remaining portion of the breeding value. They used an animal model (AM; Henderson, 1984) under a purely additive mode of inheritance. Letting p be the number of fixed effects in the model, n the number of animals in the pedigree file and m the number of MQTL's, the number of equations in the system for this AM is p + n(2m + 1). For large m, n or both, solving such a system may not always be feasible.
The reduced animal model (RAM; Quaas and Pollak, 1980) is an equivalent model, in the sense of Henderson (1985), to the AM and provides the same results, but with a smaller number of equations to be solved.
In this paper, the RAM version of the model of Fernando and Grossman (1989) is obtained. The resulting system of equations is of order p + s(2m + 1), s being the number of parents. In general s is much smaller than n, so the reduction in the number of equations obtained by using RAM is considerable. A numerical example is included to illustrate the application.

THEORY
For simplicity, derivations are presented for a model with one MQTL. The extension to the case of 2 or more independent MQTL's is covered in the section entitled More than one MQTL. In the notation of Fernando and Grossman (1989), $M_i^p$ and $M_i^m$ are the alleles at the marker locus that individual i inherited from its paternal (p) and its maternal (m) parents, and $v_i^p$ and $v_i^m$ are the additive effects of the paternal and maternal MQTL's, respectively. The recombination frequency between the marker locus and the MQTL is denoted as r. We will use the expression "breeding value" to refer to the additive effects of all genes that affect the trait excluding the MQTL(s).
Matrix expressions for the animal model with genetic marker information
A matrix version of equation (3) in Fernando and Grossman (1989) is:

$$y = X\beta + Zu + Wv + e \qquad (1)$$

where y is an n × 1 vector of records, and X, Z and W are n × p, n × n and n × 2n incidence matrices which relate the data to the unknown vector of fixed effects $\beta$, the random vector of additive breeding values u and the random vector v of additive MQTL effects, respectively. The 2n × 1 vector v is ordered within animal such that $v_i^p$ always precedes $v_i^m$. The matrices Z and W have zero rows for animals that do not have records themselves but are related to animals with records. Non-zero rows of Z and W have 1 and 2 elements equal to 1, respectively, with the remaining elements being zero. First and second moments of y are given by:

$$E(y) = X\beta, \qquad \mathrm{Var}(y) = ZAZ'\sigma_A^2 + WG_vW'\sigma_v^2 + I\sigma_e^2$$

where $A\sigma_A^2$ and $G_v\sigma_v^2$ are the variance-covariance matrices of u and v, respectively. The scalars $\sigma_A^2$, $\sigma_v^2$ and $\sigma_e^2$ are the variance components of the additive breeding values, of the MQTL additive effects and of the environmental effects, respectively.

RAM requires partitioning the data vector y into records of individuals with progeny ($y_P$; parents) and records of individuals without progeny ($y_N$; non-parents), so that $y' = [y_P', y_N']$. A conformable partition can be used in X, Z, W, u, v and e. Using this idea, (1) can be written as:

$$\begin{bmatrix} y_P \\ y_N \end{bmatrix} = \begin{bmatrix} X_P \\ X_N \end{bmatrix}\beta + \begin{bmatrix} Z_P & 0 \\ 0 & Z_N \end{bmatrix}\begin{bmatrix} u_P \\ u_N \end{bmatrix} + \begin{bmatrix} W_P & 0 \\ 0 & W_N \end{bmatrix}\begin{bmatrix} v_P \\ v_N \end{bmatrix} + \begin{bmatrix} e_P \\ e_N \end{bmatrix} \qquad (2)$$

To obtain RAM, $u_N$ and $v_N$ should be expressed as linear functions of $u_P$ and $v_P$, respectively. Since an individual's breeding value can be described as the average of the breeding values of its parents plus an independently distributed Mendelian sampling residual ($\phi$) (Quaas and Pollak, 1980), for $u_N$ we can write:

$$u_N = Pu_P + \phi \qquad (3)$$

where P is an (n − s) × s matrix relating non-parental to parental breeding values.
Each row of P contains at most two elements equal to 0.5, in the columns pertaining to the breeding values of the sire and of the dam. Now, $E(\phi) = 0$ and $\mathrm{Var}(\phi) = D_A\sigma_A^2$, where $D_A$ is a diagonal matrix with diagonal elements equal to:
$1 - 0.25(a_{ss} + a_{dd})$, if both sire and dam of the non-parent are known;
$1 - 0.25a_{ss}$, if only the sire is known;
$1 - 0.25a_{dd}$, if only the dam is known;
1, if both parents are unknown;
with $a_{ss}$ and $a_{dd}$ being the diagonal elements of A corresponding to the sire and the dam, respectively.

A scalar version of the relationship between $v_N$ and $v_P$ can be obtained from equations (8a) and (8b) in Fernando and Grossman (1989), and these are:

$$v_o^p = b_1 v_s^p + b_2 v_s^m + \varepsilon_o^p$$
$$v_o^m = b_3 v_d^p + b_4 v_d^m + \varepsilon_o^m$$

The subscripts o, s and d denote the individual, its sire and its dam, respectively.
The coefficients $b_i$ are either 1 − r or r, according to the 4 possible patterns of inheritance of the marker alleles:
Paternal marker: if $M_o^p = M_s^p$, then $b_1 = 1 - r$ and $b_2 = r$; if $M_o^p = M_s^m$, then $b_1 = r$ and $b_2 = 1 - r$.
Maternal marker: if $M_o^m = M_d^p$, then $b_3 = 1 - r$ and $b_4 = r$; if $M_o^m = M_d^m$, then $b_3 = r$ and $b_4 = 1 - r$.

The above developments lead us to the following relationship between $v_N$ and $v_P$:

$$v_N = Fv_P + \varepsilon \qquad (4)$$

The 2(n − s) × 2s matrix F relates the additive effects of the MQTL of non-parents to the additive effects of the MQTL of parents, and $\varepsilon$ is the vector with element i equal to the residual $\varepsilon_o^p$ and element i + 1 equal to the residual $\varepsilon_o^m$. Each row of F contains at most 2 non-zero elements: the $b_i$'s. Let i and k be the row indices for the MQTL marked by $M_o^p$ and $M_o^m$, respectively. Let j and j + 1 be the column indices corresponding to the additive effects of the MQTL for the sire that transmits i: j refers to the paternal grandsire and j + 1 to the paternal granddam. Also, let l and l + 1 be the column indices corresponding to the dam that transmits k: l corresponds to the maternal grandsire and l + 1 to the maternal granddam. Then $F_{i,j} = b_1$, $F_{i,j+1} = b_2$, $F_{k,l} = b_3$ and $F_{k,l+1} = b_4$. All remaining elements of F are 0. When marker information is unavailable, r is taken to be 0.5 (Fernando and Grossman, 1989) and all $b_i$'s are 0.5.

To exemplify, consider individuals 1 (male), 2 (female) and 3 (progeny of 1 and 2). Animals 1 and 2 are unrelated and 3 has paternal and maternal marker alleles originating from the dams of 1 and 2, namely alleles $M_1^m$ and $M_2^m$, respectively. Then, $v = [v_1^p, v_1^m, v_2^p, v_2^m, v_3^p, v_3^m]'$ with $v_P = [v_1^p, v_1^m, v_2^p, v_2^m]'$ and $v_N = [v_3^p, v_3^m]'$. The matrix W, with one record on each animal, is:

$$W = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}$$

For r = 0.2, the matrix F is 2 × 4 and equal to:

$$F = \begin{bmatrix} 0.2 & 0.8 & 0 & 0 \\ 0 & 0 & 0.2 & 0.8 \end{bmatrix}$$

The residuals $\varepsilon$ have $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = G_\varepsilon\sigma_v^2$. Fernando and Grossman (1989) showed that $G_\varepsilon\sigma_v^2$ is diagonal with non-zero elements equal to $\mathrm{Var}(\varepsilon_o^p) = 2r(1 - r)(1 - f_s)\sigma_v^2$ and $\mathrm{Var}(\varepsilon_o^m) = 2r(1 - r)(1 - f_d)\sigma_v^2$, where $f_s$ and $f_d$ are the inbreeding coefficients at the MQTL of the sire and of the dam, respectively.
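The construction of F for the three-animal example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `f_rows` and the 'p'/'m' codes recording which marker allele of each parent was transmitted are hypothetical and not notation from the paper, and both parents are assumed known.

```python
def f_rows(r, sire_col, sire_allele, dam_col, dam_allele, n_cols):
    """Return the two rows of F (paternal, maternal) for one non-parent.

    sire_col / dam_col: column of the parent's paternal MQTL effect in v_P
    (its maternal MQTL effect sits in the next column).
    sire_allele / dam_allele: 'p' or 'm', the marker allele of that parent
    inherited by the offspring (hypothetical coding).
    """
    def coeffs(allele):
        # Weight 1 - r goes to the MQTL allele linked to the inherited marker.
        return (1 - r, r) if allele == "p" else (r, 1 - r)

    paternal = [0.0] * n_cols
    maternal = [0.0] * n_cols
    paternal[sire_col], paternal[sire_col + 1] = coeffs(sire_allele)
    maternal[dam_col], maternal[dam_col + 1] = coeffs(dam_allele)
    return [paternal, maternal]


# Animal 3 inherited the maternal marker alleles of sire 1 and of dam 2; r = 0.2.
F = f_rows(0.2, sire_col=0, sire_allele="m", dam_col=2, dam_allele="m", n_cols=4)
for row in F:
    print(row)
```

With marker information unavailable, calling the same function with r = 0.5 gives all non-zero coefficients equal to 0.5, as stated in the text.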
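They express the probability that the paternal and maternal alleles of an individual for a given MQTL are the same. These f's are the off-diagonal elements in the 2 × 2 diagonal blocks of the matrix $G_v$ (Fernando and Grossman, 1989). Using (3) and (4), the records of non-parents can be written in terms of parental effects as $y_N = X_N\beta + Z_NPu_P + W_NFv_P + (Z_N\phi + W_N\varepsilon + e_N)$, from which the mixed model equations of the RAM are formed. The matrices $A_P$ and $G_{vP}$ are the corresponding submatrices of A and $G_v$ that belong to parents. Equations (7) give the solutions for RAM with genetic markers.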
Of practical importance is the case where all non-parents have only one record so that $Z_N = I$. Then, $W_NG_\varepsilon W_N'$ and $Q^{-1}$ are diagonal (see Appendix A). The diagonal elements of $W_NG_\varepsilon W_N'$ are derived in Appendix A and they are equal to:
$2r(1 - r)(2 - f_s - f_d)$, when both the sire and the dam of the non-parent are known;
$2r(1 - r)(1 - f_s) + 1$, when only the sire is known;
$2r(1 - r)(1 - f_d) + 1$, when only the dam is known;
2, if both the sire and the dam of the non-parent are unknown.
If there is zero probability that the paternal and maternal alleles at the MQTL of parent p are the same (ie $f_p = 0$), the contribution to the diagonal element of $W_NG_\varepsilon W_N'$ is 2r(1 − r) (if marker information is available) or 1/2 (if marker information is unavailable). The latter value arises because, in the absence of marker information, there is equal probability of receiving the MQTL from the grandsire and from the granddam, so that r = 0.5 (Fernando and Grossman, 1989).
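The four cases for the diagonal element can be collected in one small function. The sketch below is illustrative only: the name `wgw_diagonal` and the use of `None` to flag an unknown parent are assumptions made here, and each parent simply contributes either 2r(1 − r)(1 − f) when known or 1 when unknown.

```python
def wgw_diagonal(r, f_sire=None, f_dam=None):
    """Diagonal element of W_N G_eps W_N' for a non-parent with one record.

    f_sire / f_dam: inbreeding coefficient at the MQTL of the parent,
    or None when that parent is unknown (hypothetical convention).
    """
    def contribution(f_parent):
        if f_parent is None:
            # Unknown parent: the residual is the MQTL effect itself.
            return 1.0
        return 2.0 * r * (1.0 - r) * (1.0 - f_parent)

    return contribution(f_sire) + contribution(f_dam)


print(wgw_diagonal(0.1, 0.0, 0.0))   # both parents known, non-inbred at the MQTL
print(wgw_diagonal(0.1, f_sire=0.0)) # only the sire known
print(wgw_diagonal(0.1))             # both parents unknown
```

Because each value depends only on r and on the two parental coefficients, the diagonal can be filled in one pass over the non-parents without forming $W_N$ or $G_\varepsilon$ explicitly.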
A further simplification to (7) occurs when parents do not have records, so that $Z_P$ and $W_P$ are zero and the model becomes a sire-dam model. A program for RAM, such as the one presented by Schaeffer and Wilton (1987), modified to include marker information, can be employed to solve equations (7).

EXAMPLE
We use the same data that Fernando and Grossman (1989) employed. There are 4 individuals, 3 of them are parents and 1 is a non-parent. The file is: Notice that individual 4 is inbred. A fixed effect was included and the matrix resulting from adjoining the incidence matrix X and the vector of observations y, ie [X | y], is: Variance components used were $\sigma_A^2 = 100$, $\sigma_v^2 = 10$ and $\sigma_e^2 = 500$, and r = 0.1. The matrices G u and G U are presented in Fernando and Grossman (1989).

DISCUSSION
The advantage of RAM over AM increases as both the ratio between the number of non-parents and the number of parents and the number of independent MQTL increase. Goddard (1991) suggested the use of RAM to decrease the size of the resulting system of equations when working with information on flanking markers.
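To make the size comparison concrete, the two equation counts given in the Introduction, p + n(2m + 1) for AM and p + s(2m + 1) for RAM, can be evaluated directly. The function name and the numbers of animals, parents and MQTL's used below are hypothetical values chosen for illustration, not data from the paper.

```python
def n_equations(p, animals, m):
    """Order of the system: p fixed effects plus (2m + 1) equations per animal."""
    return p + animals * (2 * m + 1)


p, n, s, m = 10, 10000, 1000, 2      # hypothetical population
am_size = n_equations(p, n, m)       # every animal in the pedigree
ram_size = n_equations(p, s, m)      # parents only
print(am_size, ram_size, am_size / ram_size)
```

Raising either n relative to s or m in this sketch widens the gap between the two counts, which is the trend described above.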
Therefore, the fraction of the variance of the MQTL that is explained by parental segregation is $2[r^2 + (1 - r)^2]/2$. These proportions can also be worked out from equations (8a) and (8b) in Fernando and Grossman (1989) and they agree with formulae derived by Dekkers and Dentine (1991). A slight difference between their result and the one obtained here stems from the fact that they define the variance of the MQTL as one half the variance as defined by Fernando and Grossman (1989) ($0.5\sigma_v^2$).
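As a quick numerical check of this expression, the sketch below evaluates the explained fraction together with the residual fraction 2r(1 − r) for a few recombination rates. It assumes a non-inbred individual with both parents known; the function names are hypothetical.

```python
def explained_fraction(r):
    """Fraction of MQTL variance explained by parental MQTL effects: 2[r^2 + (1 - r)^2]/2."""
    return 2.0 * (r ** 2 + (1.0 - r) ** 2) / 2.0


def residual_fraction(r):
    """Fraction left in the residual for a non-inbred parent: 2r(1 - r)."""
    return 2.0 * r * (1.0 - r)


for r in (0.0, 0.1, 0.2, 0.5):
    # The two fractions sum to 1 for every r.
    print(r, explained_fraction(r), residual_fraction(r))
```

The explained fraction falls from 1 at r = 0 to 0.5 at r = 0.5, the value that also applies when marker information is unavailable.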
Both AM and RAM rest on knowing the variance components as well as the recombination rate between the marker gene and the QTL. As the latter parameter enters into the variance-covariance matrix of QTL effects in a rather complex manner, its estimation by the classical methods employed in animal breeding seems to be difficult, as discussed by Fernando (1990).
When more than one MQTL is being considered, covariances between pairs of MQTL effects are likely to be non-zero due to linkage disequilibrium caused by selection (Bulmer, 1985). Model (8) assumes that these covariances are zero. The extent of the error in predicting v (or functions of v) due to incorrectly assuming null covariances between MQTL effects will depend on the magnitude and sign of the covariances. If the covariances are mostly negative, which is likely to happen for a trait undergoing selection (Bulmer, 1985), MQTL effects may be overpredicted. Research is in progress to overcome this restriction of model (8).

APPENDIX A
Derivation of the diagonal elements of $W_NG_\varepsilon W_N'$ when all non-parents have one record

First we show that $W_NG_\varepsilon W_N'$ is diagonal. Because $G_\varepsilon$ is diagonal (Fernando and Grossman, 1989), we can write:

$$W_NG_\varepsilon W_N' = \sum_j w_jw_j'g_j$$

where $w_j$ is column j of $W_N$ and $g_j$ is diagonal element j of $G_\varepsilon$. Now, $w_j$ has all its elements equal to zero except for a single 1, in the position of the animal to which column j belongs. Therefore, the matrix $w_jw_j'g_j$ has all elements equal to zero except for the diagonal element of that animal, which is equal to $g_j$. The paternal and maternal MQTL additive effects of an animal are in consecutive columns of the matrix W (and $W_N$), $w_j$ and $w_{j+1}$ say, and these are equal. We then have:

$$w_jw_j'g_j + w_{j+1}w_{j+1}'g_{j+1} = w_jw_j'(g_j + g_{j+1})$$

and $W_NG_\varepsilon W_N'$ is diagonal with non-zero elements equal to $g_j + g_{j+1}$. Now, $(g_j + g_{j+1})\sigma_v^2 = \mathrm{Var}(\varepsilon_o^p) + \mathrm{Var}(\varepsilon_o^m) = [2r(1 - r)(1 - f_s) + 2r(1 - r)(1 - f_d)]\sigma_v^2$, where $f_s$ and $f_d$ are the inbreeding coefficients of sire and dam for the MQTL, respectively. The last equality follows from expressions (12a) and (12b) in Fernando and Grossman (1989). After some rearranging, the diagonal element of $W_NG_\varepsilon W_N'$ is:

$$2r(1 - r)(2 - f_s - f_d)$$

when both parents of the individual are known. If the sire is unknown, $\varepsilon_o^p = v_o^p$ and the diagonal element is $2r(1 - r)(1 - f_d) + 1$. If the dam is unknown, $\varepsilon_o^m = v_o^m$ and the diagonal element is $2r(1 - r)(1 - f_s) + 1$. If both parents are unknown, the diagonal element of $W_NG_\varepsilon W_N'$ is 2.
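The argument of this appendix can be checked numerically on a toy case. The sketch below forms $W_NG_\varepsilon W_N'$ by direct multiplication for two hypothetical non-parents with one record each, assuming all four parents are known and non-inbred at the MQTL; the helper name `wgw` and the chosen r are illustrative only.

```python
def wgw(W, g):
    """Compute W diag(g) W' using plain Python lists."""
    rows, cols = len(W), len(g)
    return [[sum(W[i][k] * g[k] * W[j][k] for k in range(cols))
             for j in range(rows)]
            for i in range(rows)]


# Two non-parents with one record each: every row of W_N holds a pair of 1s.
W_N = [[1, 1, 0, 0],
       [0, 0, 1, 1]]
r = 0.1
# Diagonal of G_eps with f_s = f_d = 0 for both animals (assumption).
g = [2 * r * (1 - r)] * 4

product = wgw(W_N, g)
for row in product:
    print(row)
```

The off-diagonal elements come out as zero and each diagonal element equals $g_j + g_{j+1}$, in agreement with the closed form $2r(1 - r)(2 - f_s - f_d)$ at $f_s = f_d = 0$.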