Prediction of breeding values with additive animal models for crosses from 2 populations

Summary - Recent advances in the theory of covariance between relatives in crosses between 2 populations, under additive inheritance, are used to predict breeding values (BV) by BLUP with an animal model. The consequences of an incorrect definition of the covariance matrix of BV are discussed. The theory of covariance between relatives in crosses from 2 populations is extended to the prediction of BV for several traits. A numerical example illustrates the prediction procedures.


INTRODUCTION
A common way to improve a local population genetically is the planned migration of genes from a superior one. For example, in developing countries, US Holstein sires are mated to local cows in order to genetically improve the local Holstein population. Genetic evaluation in such populations must take into consideration the genetic differences between the local and the superior populations.
Best linear unbiased prediction (BLUP) is widely used for genetic evaluation (Henderson, 1984). BLUP methodology requires modelling genotypic means and covariances. Genetic groups are used to model differences in genetic means between populations (Quaas, 1988). However, populations can also have different genetic variances. Under additive inheritance, Elzo (1990) provided a theory to incorporate heterogeneous genetic variances in genetic evaluation by BLUP. His procedure is based on computing the additive variance for a crossbred animal as a weighted mean of the additive variances of the parental populations plus one half the covariance between parents. Lo et al (1993) showed that Elzo's theory did not account for additive variation created by segregation of alleles between populations with different gene frequencies. For example, even though the additive variance for an F2 individual should be higher than for an F1, due to segregation (Lande, 1981), Elzo's formulation gives the same variance for both. Lo et al (1993) provided a theory to incorporate segregation variance in computing covariances between crossbred relatives, and to invert the genetic covariance matrix efficiently. The objectives of this paper are: 1) to demonstrate how the theory can be used for genetic evaluation, ie to predict breeding values (BV), by BLUP; 2) to study the consequences of using an incorrect genetic covariance matrix on prediction of BV; and 3) to extend the theory of Lo et al (1993) to accommodate multiple traits. A numerical example is used to illustrate the principles introduced here.

MODEL
Even though the theory presented by Lo et al (1993) allowed for several breeds or strains within a breed, we focus on the case of 2. A typical situation in beef or dairy cattle is when a 'local' (L) strain or breed is crossed with an 'imported' (I) one. Usually the program starts by mating genetically superior L females with I males to produce F1 progeny. Then superior F1 females are mated to I sires to produce backcross progeny. The program is continued by repeatedly mating superior backcross females to I sires. It should be noted that L, F1 and backcross sires are also used to produce progeny. Thus, the crosses generated by such a program may include F1 = I × L, F2 = F1 × F1, BI = I × F1, BL = F1 × L, BII = BI × I, 5/8I = BI × F1, 3/8I = BI × L, etc. It is shown below how genetic evaluations for such a mixture of crossbred animals can be obtained by BLUP using Henderson's (1984) mixed-model equations (MME).
Genetic evaluations are based on a vector of phenotypic records ($\mathbf{y}$), which can be modelled as:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{a} + \mathbf{e} \qquad [1]$$

where $\boldsymbol{\beta}$ is a vector of non-genetic fixed effects, $\mathbf{a}$ is a vector of additive genetic values or BV and $\mathbf{e}$ is a vector of random residuals, independent of $\mathbf{a}$, with null mean and covariance matrix $\mathbf{R}$. Although $\mathbf{R}$ can be any general symmetric matrix, it is usually taken to be diagonal, which simplifies computing solutions for $\boldsymbol{\beta}$ and predictions of $\mathbf{a}$. The incidence matrices $\mathbf{X}$ and $\mathbf{Z}$ relate $\boldsymbol{\beta}$ and $\mathbf{a}$, respectively, to $\mathbf{y}$. The mean and the covariance matrix of the vector of BV ($\mathbf{a}$) for crossbred individuals are modelled as:

$$E(\mathbf{a}) = \mathbf{Q}\mathbf{g} \qquad [2]$$

and

$$\mathrm{Var}(\mathbf{a}) = \mathbf{G} \qquad [3]$$

where $\mathbf{g}$ is a vector of genetic group effects for individuals in the I and L populations, and $\mathbf{Q}$ is a matrix relating $\mathbf{a}$ to the genetic groups. If there is only 1 group in each breed, $\mathbf{Q}$ specifies the breed composition of each individual. The matrix $\mathbf{G}$ contains the variances and covariances among BV as defined by Lo et al (1993).
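In modelling the mean of $\mathbf{a}$, genetic groups are only assigned to 'phantom' parents of known animals following the method proposed by Westell et al (1988). Quaas (1988) showed that $\mathbf{Q}$ can be expressed as:

$$\mathbf{Q} = (\mathbf{I} - \mathbf{P})^{-1}\mathbf{P}_b\mathbf{Q}_b \qquad [4]$$

where $\mathbf{P}$ relates progeny to parents, $\mathbf{P}_b$ relates progeny to phantom parents, and $\mathbf{Q}_b$ is an incidence matrix that relates phantom parents to genetic groups. Elements in each row of $[\mathbf{P}_b : \mathbf{P}]$ are all zero, except for two 1/2's in the columns pertaining to the parents of the animals in $\mathbf{a}$. It should be stressed that the above model for $\mathbf{a}$ assumes additive inheritance (Thompson, 1979; Quaas, 1988; Lo et al, 1993). In the genetic grouping theory of Quaas (1988), all the groups are assumed to have the same additive variance. In this model, however, we allow the I and L populations to have different additive variances, and the variances and covariances of crossbred animals are computed following the theory of Lo et al (1993). They showed that the covariance between crossbred relatives can be computed using the tabular method for purebreds (Emik and Terrill, 1949; Henderson, 1976), provided that the variance of a crossbred individual $i$ is computed as:

$$\mathrm{Var}(a_i) = f_i^L\sigma^2_{AL} + f_i^I\sigma^2_{AI} + 2\left(f_j^L f_j^I + f_k^L f_k^I\right)\sigma^2_{ALI} + \tfrac{1}{2}\,\mathrm{cov}(a_j, a_k) \qquad [5]$$

where $j$ and $k$ are the parents of $i$, and $f_j^I$, for example, is the breed I composition of dam $j$ (so that $f_j^L = 1 - f_j^I$), $\sigma^2_{AL}$ is the additive variance of population L, $\sigma^2_{AI}$ is the additive variance of population I, and $\sigma^2_{ALI}$ is the segregation variance, which results from differences in gene frequencies between the L and I populations. The term segregation variance was used by Wright (1968) and Lande (1981) to refer to the additional genetic variance due to segregation in the F2 generation over that in the F1. Following Quaas (1988), Lo et al (1993) further showed that the inverse of the genetic covariance matrix ($\mathbf{G}$), required to set up Henderson's MME, can be constructed as:

$$\mathbf{G}^{-1} = (\mathbf{I} - \mathbf{P}')\,\mathbf{G}_\varepsilon^{-1}\,(\mathbf{I} - \mathbf{P}) \qquad [6]$$

where $\mathbf{G}_\varepsilon$ is a diagonal matrix with the $i$th diagonal element, the variance of the Mendelian residual of $i$, defined as:

$$(\mathbf{G}_\varepsilon)_{ii} = f_i^L\sigma^2_{AL} + f_i^I\sigma^2_{AI} + 2\left(f_j^L f_j^I + f_k^L f_k^I\right)\sigma^2_{ALI} - \tfrac{1}{4}\left[\mathrm{Var}(a_j) + \mathrm{Var}(a_k)\right] \qquad [7]$$

that is, $\mathrm{Var}(a_i)$ in [5] minus the variance of the average of the parental BV, with the term for an unknown parent dropped. Note that these elements are linear functions of $\sigma^2_{AL}$, $\sigma^2_{AI}$ and $\sigma^2_{ALI}$.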
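The variance rule of Lo et al (1993) and the tabular method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `crossbred_covariances`, the 7-animal pedigree and the treatment of unknown parents as purebred are hypothetical choices made here, and the variance components (80, 120 and 50) are borrowed from the numerical example of this paper. The sketch shows that an F2 individual receives a larger additive variance than an F1 once the segregation variance is included.

```python
from fractions import Fraction as Fr


def crossbred_covariances(pedigree, frac_I, var_L, var_I, var_seg):
    """Tabular method with heterogeneous variances (2 populations, L and I).

    pedigree : list of (sire, dam) index pairs, parents listed before progeny;
               None marks an unknown parent (assumed purebred in this sketch).
    frac_I   : breed I fraction of each animal (breed L fraction = 1 - frac_I).
    Returns (G, mendelian): covariance matrix of BV and Mendelian variances.
    """
    n = len(pedigree)
    G = [[Fr(0)] * n for _ in range(n)]
    mendelian = []
    for i, (s, d) in enumerate(pedigree):
        known = [p for p in (s, d) if p is not None]
        # Off-diagonals: half the sum of the covariances with the known parents.
        for h in range(i):
            G[i][h] = G[h][i] = sum((G[h][p] for p in known), Fr(0)) / 2
        # Diagonal: weighted mean of the purebred variances ...
        var = (1 - frac_I[i]) * var_L + frac_I[i] * var_I
        # ... plus the segregation variance generated by each crossbred parent ...
        var += sum((2 * frac_I[p] * (1 - frac_I[p]) * var_seg for p in known), Fr(0))
        # ... plus half the covariance between the parents.
        if len(known) == 2:
            var += G[s][d] / 2
        G[i][i] = var
        # Mendelian variance: Var(a_i) minus the variance of the parental average.
        parental = sum((G[p][p] for p in known), Fr(0)) / 4
        if len(known) == 2:
            parental += G[s][d] / 2
        mendelian.append(var - parental)
    return G, mendelian


# Hypothetical pedigree: animals 0 and 2 are purebred I, 1 and 3 purebred L,
# 4 and 5 are F1 (0 x 1 and 2 x 3), and 6 is an F2 (4 x 5).
pedigree = [(None, None)] * 4 + [(0, 1), (2, 3), (4, 5)]
frac_I = [Fr(1), Fr(0), Fr(1), Fr(0), Fr(1, 2), Fr(1, 2), Fr(1, 2)]
G, mendelian = crossbred_covariances(pedigree, frac_I, var_L=80, var_I=120, var_seg=50)

print("Var(F1) =", G[4][4], " Var(F2) =", G[6][6])
print("Mendelian variances:", [str(v) for v in mendelian])
```

Setting `var_seg=0` in the call gives the same variance for the F1 and the F2, which is the behaviour attributed to Elzo's formulation in the Introduction.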

PREDICTION OF BREEDING VALUES
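Following Quaas (1988), MME for a model with genetic groups can be written as:

$$(\mathbf{H} + \mathbf{E})\begin{bmatrix}\hat{\boldsymbol{\beta}} \\ \hat{\mathbf{g}} \\ \hat{\mathbf{a}}\end{bmatrix} = \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{0} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y}\end{bmatrix} \qquad [8]$$

where

$$\mathbf{H} = \begin{bmatrix}\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{0} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{0} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z}\end{bmatrix} \qquad [9]$$

and

$$\mathbf{E} = \begin{bmatrix}\mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{Q}_b'\mathbf{P}_b'\mathbf{G}_\varepsilon^{-1}\mathbf{P}_b\mathbf{Q}_b & -\mathbf{Q}_b'\mathbf{P}_b'\mathbf{G}_\varepsilon^{-1}(\mathbf{I} - \mathbf{P}) \\ \mathbf{0} & -(\mathbf{I} - \mathbf{P}')\mathbf{G}_\varepsilon^{-1}\mathbf{P}_b\mathbf{Q}_b & (\mathbf{I} - \mathbf{P}')\mathbf{G}_\varepsilon^{-1}(\mathbf{I} - \mathbf{P})\end{bmatrix} \qquad [10]$$

The matrix $\mathbf{H}$ can be constructed efficiently using algorithms already available (eg, Groeneveld and Kovac, 1990). Quaas (1988) gave rules to construct $\mathbf{E}$ efficiently for a model with homogeneous additive variances across genetic groups. To construct $\mathbf{E}$ efficiently for a model with heterogeneous additive variances, replace $x$ (= 4/[number of unknown parents + 2]) in the rules of Quaas (1988) with $1/(\mathbf{G}_\varepsilon)_{ii}$. Henderson (1975a) showed that using an incorrect $\mathbf{G}$ leads to predictions that are unbiased but do not have minimum variance. His results are employed here to examine the consequences of using the same additive variance ($\sigma^2_{A*}$) for L, I and crossbred animals.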
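The rule for building the pedigree part of the MME can be sketched as follows. This is a hedged illustration, not the authors' program: the function name `pedigree_part`, the equation numbering (groups first, then animals) and the small pedigree are assumptions made here. Each animal adds 9 contributions, with weights 1 for itself and -1/2 for each parent or phantom-parent group, all scaled by the reciprocal of its Mendelian variance.

```python
from fractions import Fraction as Fr


def pedigree_part(parents, mendelian_var, n_equations):
    """Accumulate the group and animal block of E, one animal at a time.

    parents       : {animal: (p1, p2)}, where p1 and p2 are equation numbers of
                    known parents or of the groups of phantom parents.
    mendelian_var : {animal: variance of its Mendelian residual}.
    """
    E = [[Fr(0)] * n_equations for _ in range(n_equations)]
    for animal, (p1, p2) in parents.items():
        x = 1 / Fr(mendelian_var[animal])  # replaces 4 / (unknown parents + 2)
        members = [(animal, Fr(1)), (p1, Fr(-1, 2)), (p2, Fr(-1, 2))]
        for row, w_row in members:
            for col, w_col in members:
                E[row][col] += w_row * w_col * x
    return E


# Hypothetical layout: equations 0 and 1 are the groups for breeds I and L,
# equations 2 to 6 are animals. Animal 2 is purebred I, 3 is purebred L,
# 4 is an F1 from 2 x 3, 5 is an F1 with phantom parents, 6 is an F2 from 4 x 5.
parents = {2: (0, 0), 3: (1, 1), 4: (2, 3), 5: (0, 1), 6: (4, 5)}
mendelian_var = {2: 120, 3: 80, 4: 50, 5: 100, 6: 100}
E = pedigree_part(parents, mendelian_var, 7)

for row in E:
    print([str(v) for v in row])
```

Because the three weights of every animal sum to zero, each row of this block sums to zero. That linear dependency is one reason why restrictions on the fixed effects are needed when groups are in the model.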

CONSEQUENCES OF USING AN INCORRECT G
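Let $\mathbf{C}^{aa}$ be the submatrix, corresponding to $\mathbf{a}$, of a g-inverse of the coefficient matrix of the MME, but calculated with $\mathbf{G}^* = \mathbf{A}\sigma^2_{A*}$, where $\mathbf{A}$ is the additive relationship matrix. Then, as in Henderson (1975a) and Van Vleck (1993), the prediction error variance (PEV) of $\mathbf{a}$ is not equal to $\mathbf{C}^{aa}$, but is:

$$\mathrm{PEV}(\hat{\mathbf{a}}) = \mathbf{C}^{aa} + \mathbf{C}^{aa}\,\mathbf{G}^{*-1}(\mathbf{G} - \mathbf{G}^*)\,\mathbf{G}^{*-1}\,\mathbf{C}^{aa} \qquad [11]$$

where $\mathbf{G}$ is the correct covariance matrix of $\mathbf{a}$. Now, let $\mathbf{D}$ be a diagonal matrix with the $i$th diagonal element being equal to $0.5[1 - 0.5(F_{S_i} + F_{D_i})]\sigma^2_{A*}$ if the father ($S_i$) and the mother ($D_i$) of $i$ are known, where $F_{S_i}$ is the inbreeding coefficient of $S_i$, so that $\mathbf{G}^{*-1} = (\mathbf{I} - \mathbf{P}')\mathbf{D}^{-1}(\mathbf{I} - \mathbf{P})$. Because $\mathbf{G}^{-1} = (\mathbf{I} - \mathbf{P}')\mathbf{G}_\varepsilon^{-1}(\mathbf{I} - \mathbf{P})$, expression [11] can be written as:

$$\mathrm{PEV}(\hat{\mathbf{a}}) = \mathbf{C}^{aa} + \mathbf{C}^{aa}(\mathbf{I} - \mathbf{P}')\mathbf{D}^{-1}(\mathbf{G}_\varepsilon - \mathbf{D})\mathbf{D}^{-1}(\mathbf{I} - \mathbf{P})\mathbf{C}^{aa} \qquad [12]$$

Therefore, the PEV of $\mathbf{a}$ obtained from $\mathbf{C}^{aa}$ is in error by the second term on the right of [11] or [12]. As this term depends on the structure of $\mathbf{G}_\varepsilon$ and $\mathbf{D}$, no general result can be given. However, if $\mathbf{C}^{aa}(\mathbf{I} - \mathbf{P}')\mathbf{D}^{-1}(\mathbf{G}_\varepsilon - \mathbf{D})\mathbf{D}^{-1}(\mathbf{I} - \mathbf{P})\mathbf{C}^{aa}$ is positive definite, it adds to $\mathbf{C}^{aa}$ and the true PEV is underestimated. This happens if both $(\mathbf{G}_\varepsilon - \mathbf{D})$ and $\mathbf{C}^{aa}(\mathbf{I} - \mathbf{P}')\mathbf{D}^{-1}$ are positive definite (see, for example, theorem A.9 on page 183 of Toutenburg, 1982). Now, the fixed effects are reparameterized so that $[\mathbf{X} : \mathbf{ZQ}]$ is a full rank matrix. Then, $[\mathbf{C}^{aa}(\mathbf{I} - \mathbf{P}')\mathbf{D}^{-1}]^{-1} = \mathbf{D}(\mathbf{I} - \mathbf{P}')^{-1}(\mathbf{C}^{aa})^{-1}$ and $\mathbf{C}^{aa}(\mathbf{I} - \mathbf{P}')\mathbf{D}^{-1}$ is positive definite. Finally, if $(\mathbf{G}_\varepsilon - \mathbf{D})$ is positive definite its diagonal elements are positive (Seber, 1977, page 388), which in turn happens when the diagonal elements of $\mathbf{G}_\varepsilon$ are strictly greater than the corresponding elements of $\mathbf{D}$. For example, this may happen whenever $\sigma^2_{ALI}$ contributes to the variance of crossbred individuals (such as F2 or 5/8I), and this variance parameter is ignored. Under these conditions PEV will be underestimated, and the amount of underestimation will depend on the magnitude of $\sigma^2_{ALI}$. It has been shown that if all data employed to make selection decisions are available, then the BLUP of $\mathbf{a}$ can be computed ignoring selection (Henderson, 1975b; Goffinet, 1983; Fernando and Gianola, 1990). This result only holds when the correct covariance matrix of $\mathbf{a}$ is used to compute BLUP.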
Thus, in the improvement of a local breed by mating superior I sires to selected L females, the use of the same additive variance for the I and L populations will give biased results if the 2 populations are known to have different additive variances and the process of selection and mating to superior males is repeated.
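The effect of an incorrect G on PEV can be checked by hand in the simplest possible case. The sketch below is an illustration under strong assumptions that are not part of the paper: a single animal with one record, no fixed effects, a true additive variance `g`, an assumed variance `g_star` and a residual variance `r`. It evaluates the scalar form of Henderson's (1975a) expression for PEV under an incorrect G and compares it with a direct calculation of Var(â - a); the numbers (150, 100 and 400) mimic an F2 animal evaluated with an average variance.

```python
from fractions import Fraction as Fr


def reported_pev(g_star, r):
    """Inverse of the scalar MME coefficient built with the assumed variance."""
    return 1 / (Fr(1) / r + Fr(1) / g_star)


def true_pev(g, g_star, r):
    """C + C (1/g*) (g - g*) (1/g*) C, the scalar form of the PEV expression."""
    c = reported_pev(g_star, r)
    return c + c * (g - g_star) * c / Fr(g_star) ** 2


def direct_pev(g, g_star, r):
    """Var(b*y - a) for the predictor b*y with b = g* / (g* + r)."""
    b = Fr(g_star, g_star + r)
    return (1 - b) ** 2 * g + b ** 2 * r


g, g_star, r = 150, 100, 400          # hypothetical values for illustration
reported = reported_pev(g_star, r)    # what the MME would report as PEV
actual = true_pev(g, g_star, r)       # PEV of the predictor actually used
direct = direct_pev(g, g_star, r)     # same quantity from first principles
best = reported_pev(g, r)             # PEV of BLUP with the correct variance

print(reported, actual, direct, best)
```

In this case the reported PEV is smaller than the PEV of the predictor actually used, which in turn is larger than the PEV attainable with the correct variance.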

MULTIPLE TRAITS
The theory presented by Lo et al (1993) can be extended to obtain BLUP with multiple traits. Consider the extension for 2 traits: X and Y. To obtain BLUP for X and Y, the additive covariance matrices within trait X, within trait Y, and between traits X and Y are needed. The covariance matrices within trait X and within trait Y can each be computed as described by Lo et al (1993). It is shown below how to compute the additive covariance matrix between traits X and Y.
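Following the reasoning employed by Lo et al (1993) to derive the additive variance for a crossbred individual, it can be shown that the additive covariance between traits X and Y for a crossbred individual $i$ with parents $j$ and $k$ is:

$$\mathrm{cov}(a_i^X, a_i^Y) = f_i^L\sigma_{A(XY)L} + f_i^I\sigma_{A(XY)I} + 2\left(f_j^L f_j^I + f_k^L f_k^I\right)\sigma_{A(XY)LI} + \tfrac{1}{2}\,\mathrm{cov}(a_j^X, a_k^Y) \qquad [13]$$

where $f$ denotes breed composition, $\sigma_{A(XY)L}$ and $\sigma_{A(XY)I}$ are the additive covariances for traits X and Y in populations L and I, respectively, and $\sigma_{A(XY)LI}$ is the additive segregation covariance for populations L and I.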
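Provided that the covariance between traits X and Y for a crossbred individual is computed using equation [13], the covariance between trait X in individual $i$ and trait Y in individual $i'$ can be computed as:

$$\mathrm{cov}(a_i^X, a_{i'}^Y) = \tfrac{1}{2}\left[\mathrm{cov}(a_j^X, a_{i'}^Y) + \mathrm{cov}(a_k^X, a_{i'}^Y)\right] \qquad [14]$$

where $j$ and $k$ are the parents of $i$, provided that $i'$ is not a direct descendant of $i$. Now let $\mathbf{a}$ ($2q \times 1$) be the vector containing the BV of $q$ animals for the 2 traits, ordered by trait within animal. Then, $\mathbf{G}$ is the $2q \times 2q$ covariance matrix between traits and individuals. Following Elzo (1990), its inverse has the same form as for a single trait:

$$\mathbf{G}^{-1} = (\mathbf{I} - \mathbf{P}' \otimes \mathbf{I}_2)\,\mathbf{G}_\varepsilon^{-1}\,(\mathbf{I} - \mathbf{P} \otimes \mathbf{I}_2) \qquad [15]$$

where $\mathbf{G}_\varepsilon$ is now block diagonal, with one block $\mathbf{G}_{\varepsilon i}$ for each individual $i$. Now let $t$ be the number of traits, and let $j$ and $k$ be the parents (or the groups of the phantom parents) of $i$. For $m = 1$ to $t$ and $n = 1$ to $t$, add to $\mathbf{E}$ the following 9 contributions, where the row refers to trait $m$ and the column to trait $n$ of the individuals indicated:

$$\begin{array}{ll} G_{\varepsilon i}^{mn} & \text{to } (i, i) \\ -\tfrac{1}{2}\,G_{\varepsilon i}^{mn} & \text{to } (i, j),\ (i, k),\ (j, i),\ (k, i) \\ \tfrac{1}{4}\,G_{\varepsilon i}^{mn} & \text{to } (j, j),\ (j, k),\ (k, j),\ (k, k) \end{array} \qquad [16]$$

where $G_{\varepsilon i}^{mn}$ is element $(m, n)$ of the inverse of the $t \times t$ matrix $\mathbf{G}_{\varepsilon i}$, which is associated with individual $i$. If $\mathbf{E}$ is full-stored, every animal makes $9t^2$ contributions.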
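The accumulation of the multiple-trait contributions can be sketched as below. This is an illustration only: the function names, the ordering of equations (trait within individual) and the 2 × 2 Mendelian covariance block are hypothetical values chosen here, not taken from the paper. The sketch adds the contributions of one progeny with known sire and dam, and counts them to confirm the figure of 9t² per animal.

```python
from fractions import Fraction as Fr


def invert_2x2(m):
    """Inverse of a 2 x 2 matrix, kept exact with fractions."""
    det = Fr(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def add_contributions(E, animal, parent_1, parent_2, block_inverse, t):
    """Add the contributions of one animal to E; returns how many were made."""
    members = [(animal, Fr(1)), (parent_1, Fr(-1, 2)), (parent_2, Fr(-1, 2))]
    count = 0
    for row_ind, w_row in members:
        for col_ind, w_col in members:
            for m in range(t):
                for n in range(t):
                    # Equations are ordered by trait within individual.
                    E[row_ind * t + m][col_ind * t + n] += w_row * w_col * block_inverse[m][n]
                    count += 1
    return count


t = 2
# Hypothetical Mendelian (co)variance block of the progeny for traits X and Y.
mendelian_block = [[50, 10], [10, 20]]
block_inverse = invert_2x2(mendelian_block)

# Individuals: 0 = sire, 1 = dam, 2 = progeny; E has 3 * t rows and columns.
E = [[Fr(0)] * (3 * t) for _ in range(3 * t)]
n_contributions = add_contributions(E, 2, 0, 1, block_inverse, t)

print(n_contributions)
for row in E:
    print([str(v) for v in row])
```

With half-stored coefficient matrices only the upper (or lower) triangle would be accumulated, so fewer contributions per animal would be needed than in this full-stored sketch.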
To obtain BLUP under a maternal effects model (Willham, 1963), the additive covariance matrices for the direct effect, the maternal effects, and between the direct and maternal effects are needed. These matrices can be computed using the theory used to compute the covariance matrices for traits X and Y as described above.

NUMERICAL EXAMPLE
Consider a single trait situation where a sire from breed I (animal 1) serves 2 dams: animal 2, from breed I, and animal 3 from breed L. Individuals 1 and 2 are the parents of 4 (purebred I), and 1 and 3 are parents of 5 (an F1 male). Finally, the F2 animal 7 is the offspring of 5 and 6, the latter being an F1 dam with unknown parents. Individuals 1, 4 and 5 are males and the rest are females. Age at measure and observed data for animals 2-7 are 100 (age), 100 (data); 110, 103; 95, 160; 98, 175; 106, 105; and 100, 114; respectively. There are 2 genetic groups for breed I and 1 for L. The model of evaluation includes fixed effects of age (as a covariate), sex and genetic groups (A1, A2 and B), and random BV for animals 1 through 7. In order for $[\mathbf{X} : \mathbf{ZQ}]$ to have full rank we imposed the restriction: sex 1 + sex 2 = 0, or sex 1 = -sex 2. Hence, $\boldsymbol{\beta}$ contains only 2 parameters: 1) the age covariate; and 2) the sex 1 effect (or -sex 2). Matrices y, X and Q are then equal to: Variance components are $\sigma^2_{AL} = 80$, $\sigma^2_{AI} = 120$ and $\sigma^2_{ALI} = 50$. Using [7], the diagonal elements of matrix $\mathbf{G}_\varepsilon$ (the variances of Mendelian residuals) are 80, 80, 120, 40, 50, 100 and 100 for animals 1-7 respectively. Residual variance is $\mathbf{R} = \mathbf{I}_6(400)$. Matrix G is: MME are equal to the following matrix: Solutions are -2.903201, -28.95692, 402.73785, 431.59906, 452.07844, 402.68086, 417.28243, 452.16393, 409.6967, 427.70734, 441.69628 and 434.41686. The large absolute values of the solutions are due to multicollinearity associated with genetic groups in the model for the small example worked. This is evidenced by a small eigenvalue ($4.36 \times 10^{-7}$) in the coefficient matrix. Van Vleck (1990) obtained a similar result in an example with genetic groups for direct and maternal effects.
If groups are left out of the model, solutions are 1.3540226 for the age effect, -36.19053 for the sex 1 effect, and 0.602659, -0.247433, -1.030572, -0.276959, 1.1075823, 0.6457529, 3.6589865 for the BV of animals 1-7, respectively. The smallest eigenvalue is 0.0046413, almost 10 000 times larger than in the situation where genetic groups are in the model. The consequences of assuming an incorrect G can be seen by taking $\mathbf{G}^* = \mathbf{A}(100)$. The value of 100 for $\sigma^2_{A*}$ is chosen because it is the average of $\sigma^2_{AL} = 80$ and $\sigma^2_{AI} = 120$. To alleviate the problem associated with multicollinearity, the system is solved using regular MME (Henderson, 1984)

R is a diagonal matrix. A practical application is the analysis of data from crosses between 'foreign' and 'local' strains of a breed, as in dairy or beef breeding. Also, records from registered vs grade animals, or 'selected vs unselected', etc, can be analyzed in this fashion. Although the developments presented were in terms of 2 populations, inclusion of more than 2 can be done as indicated by Lo et al (1993). With $p$ being the number of populations, the number of parameters in G is $[p(p + 1)]/2$, so that for $p = 4$ there are 10 variances to consider. Some of these estimates may be highly correlated depending on the type and distribution of the crosses involved.
The approach taken in the present paper differs from Elzo (1990) in the inclusion of the segregation variance ($\sigma^2_{ALI}$). The magnitude of this parameter depends on differences in gene frequencies between the 2 populations. The change in gene frequency due to selection is inversely related to the number of loci, because the change in gene frequency at a locus due to selection is proportional to the magnitude of the average effect of gene substitution at that locus (Pirchner, 1969, page 145), and the magnitude of average effects across loci tends to be inversely related to the number of loci. Thus, $\sigma^2_{ALI}$ due to different selection criteria in 2 populations is expected to be inversely related to the number of loci. The change in gene frequency due to other forces (mutation, migration and random drift) is not related to the magnitude of average effects. Thus, $\sigma^2_{ALI}$ due to differences in gene frequency between populations brought about by these forces is not related to the number of loci. Now, the greater the value of $\sigma^2_{ALI}$, the larger the difference between the predictors calculated following the approach of Elzo (1990) and the one used in the present work. This is due to $\sigma^2_{ALI}$ not only entering into the diagonal elements of G, but also into off-diagonals, which are functions of the diagonal elements. For example, consider the additive covariance between paternal half sibs (cov(PHS)) $i$ and $i'$, from common sire $s$ and unrelated dams. By repeated use of expression [10] in Lo et al (1993), cov(PHS) is equal to:

$$\mathrm{cov(PHS)} = \mathrm{cov}(a_i, a_{i'}) = \tfrac{1}{4}\,\mathrm{Var}(a_s) \qquad [17]$$

Expression [17] shows that cov(PHS) is a function of the additive variance of the sire, and is not a function of the additive variance present in progeny genotypes.
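For I, F1, BI or F2 sires whose own parents are unrelated, cov(PHS) are equal to:

$$\begin{array}{ll} \text{I sire:} & \tfrac{1}{4}\sigma^2_{AI} \\ \text{F1 sire:} & \tfrac{1}{8}\sigma^2_{AL} + \tfrac{1}{8}\sigma^2_{AI} \\ \text{BI sire:} & \tfrac{1}{16}\sigma^2_{AL} + \tfrac{3}{16}\sigma^2_{AI} + \tfrac{1}{8}\sigma^2_{ALI} \\ \text{F2 sire:} & \tfrac{1}{8}\sigma^2_{AL} + \tfrac{1}{8}\sigma^2_{AI} + \tfrac{1}{4}\sigma^2_{ALI} \end{array} \qquad [18]$$

Note that $\sigma^2_{ALI}$ enters into the covariance of individuals whose sire belongs to later generations than the F1 (eg, BI or F2). It must be pointed out that although predictions are still unbiased, ignoring $\sigma^2_{ALI}$ would result in a larger shrinkage of predictions in [8] than otherwise. As there are no estimates of $\sigma^2_{ALI}$ so far, nothing can be said about the contribution of this parameter to the genetic variation of economic traits in livestock.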
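The half-sib covariances for the 4 types of sire can be worked out numerically. The sketch below is an illustration under assumptions made here: the function names are hypothetical, the parents of each sire are taken to be unrelated, and the variance components (80, 120 and 50) are those of the numerical example. It computes one quarter of the additive variance of the sire, with and without the segregation variance.

```python
from fractions import Fraction as Fr


def sire_variance(frac_I, parent_frac_I, var_L, var_I, var_seg):
    """Additive variance of a sire whose own parents are unrelated.

    frac_I        : breed I fraction of the sire.
    parent_frac_I : breed I fractions of the 2 parents of the sire.
    """
    base = (1 - frac_I) * var_L + frac_I * var_I
    segregation = sum(2 * f * (1 - f) * var_seg for f in parent_frac_I)
    return base + segregation


def cov_phs(frac_I, parent_frac_I, var_L, var_I, var_seg):
    """Covariance between paternal half sibs: one quarter of the sire variance."""
    return sire_variance(frac_I, parent_frac_I, var_L, var_I, var_seg) / 4


# Breed I fraction of each type of sire and of its 2 parents.
sires = {
    "I": (Fr(1), (Fr(1), Fr(1))),
    "F1": (Fr(1, 2), (Fr(1), Fr(0))),
    "BI": (Fr(3, 4), (Fr(1), Fr(1, 2))),
    "F2": (Fr(1, 2), (Fr(1, 2), Fr(1, 2))),
}

with_seg = {k: cov_phs(f, p, 80, 120, 50) for k, (f, p) in sires.items()}
without_seg = {k: cov_phs(f, p, 80, 120, 0) for k, (f, p) in sires.items()}

for kind in sires:
    print(kind, with_seg[kind], without_seg[kind])
```

Only the BI and F2 rows change when the segregation variance is set to zero, in agreement with the remark that this parameter matters for sires from generations later than the F1.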
If dominance is not null, the model should be properly modified to take into account this non-additive genetic effect. Proper specification of the variance-covariance matrix for additive and dominance effects in crosses of 2 populations can involve as many as 25 parameters for a single trait (Lo et al, 1995). Therefore, predictors of BV and dominance deviations may be difficult to compute for a general situation, involving animals from several crossbred genotypes. Lo (1993) presented an efficient algorithm for computing BLUP in the case of 2- and 3-breed terminal crossbreeding systems under additive and dominance inheritance.
Up to this point the animal model has been employed. However, the 'reduced animal model' (Quaas and Pollak, 1980) can alternatively be used by properly writing matrix Z with 1/2's, whenever a BV of a 'non-parent' (an animal that has no progeny in the data set) is expressed as a function of its parental BV. Residual genetic variances are obtained by means of expression [7].
In order to solve equations [8], the parameters $\sigma^2_{AL}$, $\sigma^2_{AI}$, $\sigma^2_{ALI}$ and the residual components have to be known. Usually variance components are unknown and should be estimated from the data. Elzo (1994) developed expressions for restricted maximum likelihood (REML) estimators of variance components (including $\sigma^2_{ALI}$) in multibreed populations, through the expectation-maximization algorithm.

CONCLUSIONS
For 1 or several traits governed by additive effects, predictions of BV from crosses between 2 populations can be obtained by means of animal models that allow for different additive means and heterogeneous additive genetic (co)variances. Calculations required are similar to those with homogeneous additive (co)variances.