Compatibility of pedigree-based and marker-based relationship matrices for single-step genetic evaluation

Genetics Selection Evolution201244:37

DOI: 10.1186/1297-9686-44-37

Received: 3 September 2012

Accepted: 20 November 2012

Published: 3 December 2012

Abstract

Background

Single-step methods provide a coherent and conceptually simple approach to incorporate genomic information into genetic evaluations. An issue with single-step methods is compatibility between the marker-based relationship matrix for genotyped animals and the pedigree-based relationship matrix. Therefore, it is necessary to adjust the marker-based relationship matrix to the pedigree-based relationship matrix. Moreover, with data from routine evaluations, this adjustment should in principle be based on both observed marker genotypes and observed phenotypes, but until now this has been overlooked. In this paper, I propose a new method to address this issue by 1) adjusting the pedigree-based relationship matrix to be compatible with the marker-based relationship matrix instead of the reverse and 2) extending the single-step genetic evaluation using a joint likelihood of observed phenotypes and observed marker genotypes. The performance of this method is then evaluated using two simulated datasets.

Results

The method derived here is a single-step method in which the marker-based relationship matrix is constructed assuming all allele frequencies equal to 0.5 and the pedigree-based relationship matrix is constructed using the unusual assumption that animals in the base population are related and inbred with a relationship coefficient γ and an inbreeding coefficient γ / 2. Taken together, this γ parameter and a parameter that scales the marker-based relationship matrix can handle the issue of compatibility between marker-based and pedigree-based relationship matrices. The full log-likelihood function used for parameter inference contains two terms. The first term is the REML-log-likelihood for the phenotypes conditional on the observed marker genotypes, whereas the second term is the log-likelihood for the observed marker genotypes. Analyses of the two simulated datasets with this new method showed that 1) the parameters involved in adjusting marker-based and pedigree-based relationship matrices can depend on both observed phenotypes and observed marker genotypes and 2) a strong association between these two parameters exists. Finally, this method performed at least as well as a method based on adjusting the marker-based relationship matrix.

Conclusions

Using the full log-likelihood and adjusting the pedigree-based relationship matrix to be compatible with the marker-based relationship matrix provides a new and interesting approach to handle the issue of compatibility between the two matrices in single-step genetic evaluation.

Introduction

Single-step methods for genetic evaluation [13] have recently become popular because they provide an approach to incorporate genomic information into genetic evaluations that is both coherent and conceptually simple. A single-step method extends the usual pedigree-based method by replacing the additive relationship matrix constructed from pedigree by an additive relationship matrix that combines the marker-based relationship matrix for genotyped animals with the pedigree-based relationship matrix.

An issue with a single-step method is compatibility between the marker-based relationship matrix for genotyped animals and the pedigree-based relationship matrix [46]. To handle this problem, it is necessary to determine which allele frequencies should be used in the marker-based relationship matrix and to adjust this matrix to the pedigree-based relationship matrix. In theory, one should use the allele frequencies in the founder population of the pedigree (base animals) for the marker-based and pedigree-based relationships to be compatible, but these allele frequencies are rarely available in practice since base animals are not genotyped. Chen et al. [5] and Forni et al. [7] concluded that using the observed allele frequencies improved accuracy of prediction compared to using allele frequencies equal to 0.5. Studies on how to adjust the marker-based relationship matrix to be compatible with the submatrix of the pedigree-based relationship matrix for genotyped animals have been reported [46]. These adjustments consisted of scaling and adding a number to all elements in the marker-based relationship matrix based on equating means of diagonal and off-diagonal elements in the two matrices, and it was demonstrated that accuracy of prediction increased and bias decreased. In relation to this problem of compatibility between marker-based and pedigree-based relationship matrices, with data from routine evaluations, selection affects the allele frequencies over time, and in principle both observed marker genotypes and observed phenotypes contain information about allele frequencies in the base population. Therefore, studies on the adjustment of the marker-based relationship matrix to the pedigree-based relationship matrix have overlooked the fact that it should in principle incorporate information on observed phenotypes.

This work explores two possibilities to solve the problem, i.e. 1) in which the pedigree-based relationship matrix is adjusted to the marker-based relationship matrix and 2) in which the single-step genetic evaluation is extended by using a joint likelihood of observed phenotypes and observed marker genotypes. This results in a single-step method in which the marker-based relationship matrix is constructed assuming all allele frequencies are equal to 0.5 and the pedigree-based relationship matrix is constructed using the unusual assumption that animals in the base population are related and inbreed with a relationship coefficient γ and an inbreeding coefficient γ / 2. The log-likelihood function used for parameter inference contains two terms. The first term is the REML-log-likelihood for the phenotypes conditional on the observed marker genotypes and the second term is the log-likelihood for the observed marker genotypes. The performance of the proposed method was evaluated using two simulated datasets.

Methods

This section presents, first, the statistical model on which the single-step methods are based along the lines of Christensen and Lund [3], then the proposal on how to adjust the pedigree-based relationship matrix and finally the adjustment of the marker-based relationship matrix as previously used.

Single-step model

The marker genotypes are summarised into a marker matrix m, where m ij  = −1, 0 or 1 if SNP j of individual i is 11, 12, or 22, respectively. In the following, capital M and lowercase m indicate whether marker genotypes are considered as random variables or as non-random variables (observed variables or integration variables), respectively.

Let us consider the simple model
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equa_HTML.gif
where y is the vector of phenotypes, μ is the general mean, 1 is a vector of ones, a is the vector of breeding values, Z is an incidence matrix, and e is the vector of residual errors. The breeding value may be decomposed into a = g + a r , where g is the vector of genomic effects and a r  = (a − g) is the vector of residual polygenic effects. The residual polygenic effects http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq1_HTML.gif , where matrix A is the pedigree-based additive relationship matrix and ω ∈[0;1] is the relative weight on the residual polygenic effect. The genomic values http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq2_HTML.gif , with http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq3_HTML.gif , where ρ = (ρ 1, … , ρ p ) with ρ j being the allele frequency for the jth marker and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq4_HTML.gif with v j being defined below. The model for the marker genotypes is that M is a multivariate Gaussian distribution with
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equb_HTML.gif
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equc_HTML.gif
where v j  = v j,j is a parameter and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq5_HTML.gif for j ≠ j . The crude assumption that the multivariate distribution is Gaussian is crucial in the derivation, whereas the unrealistic assumption that http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq6_HTML.gif for j ≠ j is made for simplicity. Dividing marker genotypes m into observed marker genotypes m o and un-observed marker genotypes m u , the joint marginal density of observed phenotypes y and observed marker genotypes m o is
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equd_HTML.gif
where
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Eque_HTML.gif
By rearranging terms and using that http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq7_HTML.gif , this becomes
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equf_HTML.gif

where f(ym o ) is defined implicitly. The first term f(ym o ) is the density for the phenotypes given the observed marker genotypes, whereas the second term f(m o ) is the multivariate Gaussian density for the observed marker genotypes.

Single-step methods are based on the density f(ym o ), which may be written (see [3]) as
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equg_HTML.gif
where the vector a has a mean zero and a variance-covariance matrix H ω with inverse
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equ1_HTML.gif
(1)

with G ω = (1 − ω)G(m o ) + ω A 11. In the above formula, the sub-division is made according to genotyped and non-genotyped animals. The matrices http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq8_HTML.gif and A 11 are the marker-based relationship matrix for the genotyped animals and the submatrix of the pedigree-based relationship matrix corresponding to genotyped animals, respectively. The sparse structure of the matrix in equation (1) is the corner-stone for efficient computing using a single-step method. Assuming that am o is Gaussian distributed, mixed model equations can be solved for BLUP predictions and AI-REML [8] provides REML parameter estimates.

Issues raised by the single-step method are 1) how are the allele frequencies ρ that are used in G(m o ) obtained and 2) how is the required compatibility between G(m o ) and A 11 that is evident in equation (1) reached. To investigate these issues, the joint density of observed phenotypes and marker genotypes, f(y, m o ) = f(ym o )f(m o ), is taken as the starting point. From this, the full marginal log-likelihood becomes
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equ2_HTML.gif
(2)
where http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq9_HTML.gif for fixed ω, ρ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq10_HTML.gif is the single-step REML-log-likelihood for the phenotypes conditional on the observed marker genotypes, and
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equh_HTML.gif

with n 1 denoting the number of genotyped animals, is the log-likelihood of the observed marker genotypes. Thus, the parameter ρ enters into both terms of the full log-likelihood in equation (2), and this implies that estimation of allele frequencies ρ should be in principle based on this log-likelihood. In particular, if selection has been performed the phenotypes will contain information about the allele frequencies which may cause bias if ignored. However, estimating ρ and s = s(v) by maximising http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq11_HTML.gif for all parameters jointly is not feasible in practice since ρ is a very high-dimensional parameter. Instead, in general the observed marker genotypes are used to estimate ρ and then they are plugged into http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq12_HTML.gif , and this log-likelihood is used to estimate the remaining parameters. Estimation of ρ based on observed marker genotypes may consist of simply using the observed allele frequencies [5, 7], or may be done by maximising mark (ρ v) as a function of ρ (this is essentially the method of Gengler et al. [9], although, for computational reasons, that method adds a small residual error to the distribution of M). The high-dimensional parameter v also enters into both terms of the full log-likelihood, although it only enters into http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq13_HTML.gif via the scaling parameter s = s(v), and therefore estimation of v should also be in principle based on maximising the full log-likelihood. Usually, estimating http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq14_HTML.gif is based on the observed marker genotypes, either using http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq15_HTML.gif , implying http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq16_HTML.gif as in [2, 3], or using s such that the average diagonal of G equals the average diagonal of A 11 as in [7]. Furthermore, it has been demonstrated that the accuracy of prediction is improved and bias is reduced by adjusting the marker-based relationship matrix as G(m o )β + α where α and β are based on the elements in G(m o ) and A 11; further details are given below. It should be noted that such an adjustment is based on observed marker genotypes only, and lacks a theoretical justification within the framework considered here. To summarise, the allele frequencies in G(m o ) and the adjustments necessary for the compatibility of G(m o ) and A 11 should be in principle derived based on both observed marker genotypes and observed phenotypes.

Furthermore, when computing BLUP breeding values http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq17_HTML.gif with plugged-in parameter estimates, the uncertainty in these parameters estimates is ignored, which is an important issue in the case of the high-dimensional parameter ρ. Alternatively, uncertainty in parameter estimates may be incorporated into the predictions by using a Bayesian approach. Demonstration that a Bayesian approach with prior distributions on ρ and v results in an method in which the pedigree-based relationship matrix is adjusted to a marker-based relationship matrix, is presented below.

Adjusting the pedigree-based relationship matrix

The derivation of the proposed adjustment of the pedigree-based relationship matrix is based on assigning priors on the high-dimensional parameters ρ and v in the previously described single-step model, and then considers the first and second order moments of the marginal distribution of M(integrating ρ and v). Appendix A shows that the resulting marker distribution satisfies
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equi_HTML.gif
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equj_HTML.gif
with http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq18_HTML.gif , j = 1, … , p, http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq19_HTML.gif being a parameter, and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq20_HTML.gif when j ≠ j . Matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq21_HTML.gif is an additive relationship matrix that satisfies the usual recursions but with the peculiar feature that base animals in the pedigree are related and inbred. Within the population of base animals, the relationship coefficient is γ and the inbreeding coefficient is γ / 2. Table 1 contains data for a small pedigree used to derive matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq22_HTML.gif as shown in Table 2. The mean and variance-covariance structure of M shown above is of the form in [3], with scaling parameter http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq23_HTML.gif , and hence the breeding values g have a combined relationship matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq24_HTML.gif , where the inverse is
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equ3_HTML.gif
(3)
Table 1

Example pedigree

id

sire

dam

1

0

0

2

0

0

3

1

2

4

1

3

5

1

2

6

5

4

Table 2

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq25_HTML.gif for the pedigree in Table 1

id

1

2

3

4

5

6

1

1 + γ / 2

     

2

γ

1 + γ / 2

    

3

1/2 + 3γ / 4

1/2 + 3γ / 4

1 + γ / 2

   

4

3/4 + 5γ / 8

1/4 + 7γ / 8

3/4 + 5γ / 8

5/4 + 3γ / 8

  

5

1/2 + 3γ / 4

1/2 + 3γ / 4

1/2 + 3γ / 4

1/2 + 3γ / 4

1 + γ / 2

 

6

5/4 + 3γ / 8

3/8 + 13γ / 32

5/8 + 11γ / 16

7/8 + 9γ / 16

3/4 + 3γ / 16

5/4 + 3γ / 8

with http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq26_HTML.gif . This construction is a single-step method for which the individuals in the base population are related and inbred, and the marker-based relationship matrix, http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq27_HTML.gif , has allele frequencies equal to 0.5.

Based on the derivation above (see also Appendix A), the full marginal log-likelihood function for parameter inference becomes
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equ4_HTML.gif
(4)
where http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq28_HTML.gif for fixed ω, γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq29_HTML.gif is the single-step REML-log-likelihood for the phenotypes conditional on the observed marker genotypes, with marker-based relationship matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq30_HTML.gif and pedigree-based relationship matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq31_HTML.gif , and
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equ5_HTML.gif
(5)

is the log-likelihood of the observed marker genotypes. Using the log-likelihood in equation (4) instead of the log-likelihood in equation (2) makes the estimation of parameters feasible in practice by numeric maximization methods since the high-dimensional parameters ρ and v are replaced by two parameters, a parameter γ that determines the relationship and inbreeding of individuals in the base population and a parameter http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq32_HTML.gif that scales the marker-based relationship matrix.

The computations require algorithms that compute A(γ)−1 and A 11(γ) efficiently. Computations of inbreeding coefficients with related base animals have been considered by [10, 11] in the context of incomplete pedigrees. In Appendix B, algorithms are presented that extend the approaches of Quaas [12] for computing A −1 and of Colleau [13] for computing A 11 to the case where base animals are related and inbred.

Maximisation of the full log-likelihood in equation (4) is done by first specifying a discrete three-dimensional grid of values for parameters http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq33_HTML.gif and then, for each value of http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq34_HTML.gif , computing the maximum values of the log-likelihood http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq35_HTML.gif and the log-likelihood of observed marker genotypes http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq36_HTML.gif . This provides a three-dimensional profile log-likelihood http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq37_HTML.gif , which can then be assessed to find the maximum.

For faster computing, an alternative to using the full log-likelihood is to determine parameters γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq38_HTML.gif based on observed marker genotypes only, i.e. by maximising http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq39_HTML.gif , and to estimate the remaining parameter based on http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq40_HTML.gif for a grid of values for ω, with estimates of γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq41_HTML.gif plugged in. Setting the derivative of http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq42_HTML.gif with respect to http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq43_HTML.gif equal to zero gives
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equk_HTML.gif
which has the solution
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equl_HTML.gif
Substituting http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq44_HTML.gif into equation (5), we obtain
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equm_HTML.gif

which has to be maximised numerically to estimate γ.

Adjusting the marker-based relationship matrix (G-adjust)

Alternatively, an adjustment of the form G a  = G β + α is used, where G is the marker-based relationship matrix with allele frequencies http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq45_HTML.gif equal to the observed ones and scaling parameter http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq46_HTML.gif , and parameters α and β are determined by fitting G a to A 11. This adjustment was used by VanRaden [14], Christensen et al. [6] and Gao et al. [15], with the first paper suggesting that α and β should be estimated by least square estimation, i.e. by minimizing the sum of squares of G β + α − A 11 and the other two papers suggesting that they should be to estimated by equating means of diagonal elements and all elements in the two matrices. Here, the later is applied and α and β are estimated by solving the two equations
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equ6_HTML.gif
(6)

for α and β, where http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq47_HTML.gif and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq48_HTML.gif denote means of all elements in the two matrices, and dG and dA denote means of diagonal elements in the two matrices.

Simulated example 1

This example is deliberately simple and very extreme, and constructed for the purpose of showing that parameter estimates of γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq49_HTML.gif can depend on the observed phenotypes.

The base population consists of two individuals, one sire and one dam with 15 bi-allelic markers, and their genotypes are simulated assuming independence between markers and with equal allele frequencies. Furthermore, it is assumed that the markers are independently inherited, are all QTL with allele substitution effect equal to 1, and heritability of the phenotype is equal to 1.

The two base individuals produce 100 offspring (generation 1) that all have observed phenotypes. The two individuals in generation 1 with the largest own phenotype value are selected as parents and produce 100 offspring (generation 2) that are all genotyped.

Two different approaches to estimate parameters are compared. In the first approach, all parameters are estimated using the full log-likelihood in equation (4). In the second approach, γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq50_HTML.gif are estimated based only on the log-likelihood of the observed marker genotypes in equation (5), and the remaining parameters are estimated based on the REML-log-likelihood http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq51_HTML.gif , with estimates of γand http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq52_HTML.gif plugged-in.

Simulated example 2

This example is inspired by a pig nucleus breeding scheme and consists of five generations in which all animals have recorded phenotypes. In each generation, 150 boars are mated to 1 500 sows to produce 15 000 offspring (50/50 males/females). For the next generation, boars with a high own phenotype value are chosen and sows are selected at random. The last three generations of selected boars are genotyped and a sixth generation of 300 candidate boars are also genotyped. The breeding value is the sum of 500 independent QTL effects simulated from a Gamma(5.4,0.42) distribution, and the heritability of the phenotype is 0.22. This dataset is described in more detail in Christensen and Lund [3].

Three different approaches are compared. In the first approach, all parameters are estimated using the full log-likelihood in equation (4). In the second approach, parameters γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq53_HTML.gif are estimated based only on the log-likelihood of the observed marker genotypes in equation (5), and the remaining parameters are estimated based on the REML-log-likelihood http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq54_HTML.gif , with estimates of γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq55_HTML.gif plugged in. Finally, the G-adjust approach is used for which α and β are estimated using equation (6) and the remaining parameters are estimated by http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq56_HTML.gif . For all three approaches, the correlation between predicted breeding value http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq57_HTML.gif and true breeding value a for the candidate boars is reported as well as the estimated regression coefficient (reg) for the regression of a on http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq58_HTML.gif , where deviation from one indicates bias.

Results

Simulated example 1

Table 3 shows that both γand http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq59_HTML.gif were smaller when estimated with the full log-likelihood than with the log-likelihood of the observed marker genotypes.
Table 3

Parameter estimates obtained with simulated dataset 1

 

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq60_HTML.gif

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq61_HTML.gif & http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq62_HTML.gif

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq67_HTML.gif

0.524

0.542

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq68_HTML.gif

1.068

1.081

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq69_HTML.gif

0.005

0.005

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq70_HTML.gif

1.186

1.355

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq71_HTML.gif

2.370

2.312

Parameters were estimated either, jointly using the full log-likelihood http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq63_HTML.gif , or by first estimating γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq64_HTML.gif using http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq65_HTML.gif and then the other parameters using http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq66_HTML.gif .

Simulated example 2

Table 4 shows that γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq72_HTML.gif were slightly smaller when estimated with the full log-likelihood than with the log-likelihood of the observed marker genotypes. Parameter ω was about 0.375 whether estimated with the full log-likelihood or with the log-likelihood http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq73_HTML.gif , with parameter estimates of γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq74_HTML.gif from http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq75_HTML.gif plugged-in. When presented in three dimensions, the profile log-likelihood showed a very weak association between http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq76_HTML.gif and ω. A contour plot of the profile log-likelihood surface for γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq77_HTML.gif is shown in Figure 1 and the profile log-likelihood function for ω is shown in Figure 2.
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Fig1_HTML.jpg
Figure 1

Profile log-likelihood for γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq78_HTML.gif . A contour-plot of the profile log-likelihood for parameters γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq79_HTML.gif based on the full log-likelihood in equation (5); the plot is constructed with values in a discrete grid (explaining the roughness of the plot) that have been standardised such that the maximum value is zero.

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Fig2_HTML.jpg
Figure 2

Profile log-likelihood for ω. The profile log-likelihood function for parameter ω based on the full log-likelihood in equation (5); the plot is constructed with values in a discrete grid that have been standardised such that the maximum value is zero.

Table 4

Parameter estimates and prediction performance obtained with simulated dataset 2

 

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq80_HTML.gif

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq81_HTML.gif & http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq82_HTML.gif

G-adjust

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq90_HTML.gif

0.4605

0.4615

 

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq91_HTML.gif

0.9996

1.0003

 

α

  

0.0134

β

  

1.0074

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq92_HTML.gif

0.375

0.375

0.60

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq93_HTML.gif

6.507

6.512

5.084

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq94_HTML.gif

15.816

15.816

15.797

http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq95_HTML.gif

0.536

0.536

0.493

reg

1.17

1.17

1.34

Parameters were estimated either, jointly using the full log-likelihood http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq83_HTML.gif , or by first estimating γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq84_HTML.gif using http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq85_HTML.gif and then the other parameters using http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq86_HTML.gif ; the last column contains parameter values estimated with the single-step method with an adjusted marker-based relationship matrix, in which first α and β were estimated using equation (6) and then the remaining parameters were estimated using http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq87_HTML.gif ; the last two rows show the correlation between predicted and true breeding values, http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq88_HTML.gif , for candidate boars, which is a measure of prediction performance, and the regression coefficient (reg) for the regression of aon http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq89_HTML.gif .

Estimated parameters obtained with the G-adjust approach are also shown in Table 4, where it should be noted that http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq96_HTML.gif .

In terms of prediction performance, the correlation between predicted and true breeding value and the regression coefficient were 0.536 and 1.17, respectively, in both cases when the pedigree-based matrix was adjusted, but 0.493 and 1.34, respectively, when the G-adjust approach was used.

Discussion

This paper provides a coherent approach to handle the issue of compatibility between pedigree-based and marker-based relationship matrices in single-step genetic evaluation. The approach is computationally fast and feasible for large datasets, as is the case for previously developed single-step methods. Parameter γ can adjust the pedigree-based relationship matrix to the marker-based relationship matrix that is scaled by parameter http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq97_HTML.gif , and these two parameters should in principle be estimated using the full log-likelihood for observed phenotypes and observed marker genotypes. This is computationally feasible by computing the full log-likelihood values for parameters γ, http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq98_HTML.gif and ω in a three-dimensional grid. However, in practice this can be computationally burdensome and an appealing alternative is to estimate γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq99_HTML.gif based on the observed marker genotypes only. Analysis of simulated datasets, shows that the estimates of parameters γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq100_HTML.gif depend on the observed phenotypes as well as on the observed marker genotypes, although this dependence is not too large. A conjecture is that in a scenario with a small number of genotyped animals it is more important to use the full log-likelihood than in a scenario with a large number of genotyped animals. Based on these two simulated datasets only, no general conclusion can be made and further studies are needed to determine in which scenarios it would be safe to base inference on a two-step procedure in which γ and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq101_HTML.gif are estimated based on the observed marker genotypes only, and the remaining parameters are then estimated by the log-likelihood of the phenotypes conditional on the observed marker genotypes.

Using the approach developed in this paper may also provide insight on performance of other approaches for making marker-based and pedigree-based relationship matrices compatible in single-step genetic evaluation. Examples of such approaches are the adjustments of the marker-based relationship matrix reported in [46, 14, 15] and also investigated here, and the approach by Meuwissen et al. [16]. In [16], first, an average of position specific identical by descent matrices based on linkage information [17] (hereafter denoted G FG ) were used instead of the combined relationship matrix (1), and second, the final method consisted of replacing the pedigree-based relationship matrix A by the matrix G FG in matrix (1) and then adjusting the marker-based relationship matrix to the G FG matrix. Based on simple simulated datasets, without selection, the method resulted in high accuracy and low bias. The computation of G FG is computationally burdensome and therefore this approach was not considered here. It should be noted that for these other approaches, the combined relationship matrix is constructed based on observed marker genotypes only, and the inference on remaining parameters is based on the log-likelihood conditional on the observed marker genotypes. The approach in this paper may provide guidelines on when it is safe to base inference on such procedures in which, first the observed marker genotypes are used to construct a combined relationship matrix, and second the remaining inference is based on the log-likelihood for the phenotypes conditional on observed marker genotypes.

Evaluation of accuracy and bias of prediction with the simulated dataset 2 shows that the approaches based on adjusting the pedigree-based relationship matrix, where http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq102_HTML.gif and reg = 1.17, seem to perform better than the G-adjust approach, where http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq103_HTML.gif and reg = 1.34. However, http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq104_HTML.gif is also larger in the second case (0.60) than in the first case (0.375). If ω is set at 0.375 in the second case then http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq105_HTML.gif , which is close to the value in the first case, but reg is somewhat larger, i.e. 1.30. If ω is set at 0.1, then http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq106_HTML.gif and reg = 1.00 in the first case, and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq107_HTML.gif and reg = 1.12 in the second case, and if ω is set at 0.01, then http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq108_HTML.gif and reg = 1.05 in the second case. Thus, the first two approaches seem to perform better in terms of estimating a more proper ω and have somewhat less bias, while accuracy is similar for all three approaches.

This paper suggests that the pedigree-based relationship matrix should be adjusted instead of the marker-based relationship matrix. On the one hand, from a practical point of view this would be rather inconvenient since standard software used for REML estimation and BLUP would have to be modified, but on the other hand, it is conceptually simpler and may be easier to extend. For example, when the population consists of a mixture of breeds, it may be simpler to extend the approach in this paper and specify a parametric structure on the relationships of the animals in the base population and estimate those parameters, instead of developing an appropriate way of adjusting the marker-based relationship matrix of the genotyped animals across breeds. An additional issue is the interpretation of the genetic variance parameter http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq109_HTML.gif . The parameter estimates in Table 1 are http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq110_HTML.gif when http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq111_HTML.gif , http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq112_HTML.gif when http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq113_HTML.gif , and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq114_HTML.gif for G-adjust, where base animals are unrelated, i.e. γ = 0. This may seem counter-intuitive since the variance of breeding values for a base animal is http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq115_HTML.gif . However, when studying the inverse relationship matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq116_HTML.gif , then the averages of diagonal elements are 3.807, 3.809 and 2.93 for the three cases, respectively, and since 3.807/6.507, 3.809/6.512 and 2.93/5.084 are roughly the same, the parameter estimates actually make good sense, although the interpretation of http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq117_HTML.gif is unclear. Finally, it should be noted that parameter γ could influence to some extent accuracy and bias of prediction, and neither γ = 0 nor γ estimated to make marker-based and pedigree-based relationships compatible, would be optimal for that purpose. Adjusting the pedigree-based relationship instead of the marked-based relationship matrix to make the two matrices compatible in a single-step method is an interesting alternative.

For the simulated dataset 1, the estimate of the relative polygenic weight ω was about 0, as was expected since in the simulation the markers were assumed to capture all the genetic variation. However, for the simulated dataset 2, http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq118_HTML.gif was large when adjusting the pedigree-based relationship matrix, i.e. 0.375, and even larger, i.e. 0.60, when the G-adjust approach was used. For the second dataset, the prediction biases were reduced when ω was set at 0.1. This poses the question whether it is actually possible to estimate ω at a reasonable value from data, or whether ω should be determined manually to control the bias. It should be noted that confidence intervals of this parameter are large in these examples and further studies are needed.

Conclusions

Using the full log-likelihood and adjusting the pedigree-based relationship matrix to be compatible with the marker-based relationship matrix provides a new and interesting approach to handle the issue of compatibility between the two matrices in single-step genetic evaluation.

Appendix A

The derivation of the proposed adjustment of the pedigree-based relationship matrix is based on assigning priors on the high-dimensional parameters ρ and v in the previously described single-step model. This derivation is presented here.

The model for the marker genotypes, M, given the allele frequencies ρ and variances v, is as in [3],
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equn_HTML.gif
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equo_HTML.gif

where v j  = v j,j is a parameter and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq119_HTML.gif for j ≠ j . Prior distributions for ρ and v are not specified explicitly, only the necessary assumptions are stated. Assuming that the alleles are randomly labeled 1 and 2, then a priori the expected allele frequency is E[ρ j ] = 0.5. Additional assumptions are that η 1 = Var[ρ j ] and η 2 = E[v j ] do not depend on j, and that http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq120_HTML.gif for j ≠ j .

With these prior distributions, a marginal distribution of M may be obtained by integrating ρ and v. Using well-known formulas for conditional expectations and covariances, the mean and covariances become
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equp_HTML.gif
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equq_HTML.gif
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equr_HTML.gif
Defining http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq121_HTML.gif , and γ = 4η 1 / (2η 1 + η 2), we obtain
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equs_HTML.gif

with http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq122_HTML.gif . It is not difficult to see that http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq123_HTML.gif satisfies the usual recursions for an additive relationship matrix. Requiring that 0 ≤ γ < 1 is equivalent to making a further assumption that 2η 1 < η 2 (this is for example satisfied when ρ j U] 0;1 and [v j  = 2ρ j (1 − ρ j ), where η 1 = 1/12 and η 2 = 1/3). The first and second order moments of M are therefore of the form considered in [3], with pedigree-based relationship matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq124_HTML.gif , allele frequencies http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq125_HTML.gif , scaling parameter http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq126_HTML.gif with http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq127_HTML.gif not depending on j, and marker-based relationship matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq128_HTML.gif .

Note that the random labelling of alleles as 1 or 2 is not important because the resulting expression does not depend on how alleles are labelled.

Using http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq129_HTML.gif , the log-likelihood for the observed marker genotypes becomes
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equt_HTML.gif

This log-likelihood may also be viewed as the log-likelihood for http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq130_HTML.gif being Wishart distributed http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq131_HTML.gif .

Appendix B

The aim here is to present algorithms for computing http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq132_HTML.gif and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq133_HTML.gif by recursions. Computations of inbreeding coefficients with related based animals were considered by [10, 11]. For simplicity, let’s assume that, either both parents are known or both parents are unknown. This is not an important restriction in practice, since the case with one unknown parent can be handled by assigning an artificial animal-id to this unknown parent and by letting both its parents be unknown.

Let us partition matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq134_HTML.gif (skipping the dependence on γ in the notation from now and onwards) according to whether the animals are base animals (both parents unknown) or not
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equu_HTML.gif
Similar to the usual A matrix (see [18]) a decomposition exists
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equv_HTML.gif

where T is a lower triangular matrix with entries T ii  = 1, T ik  = (T f(i)k + T m(i)k ) / 2 when i has known parents f(i) and m(i) and i > k, and T ik  = 0 otherwise (k > i or i has unknown parents), and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq135_HTML.gif is a diagonal matrix containing the variance of the Mendelian sampling part for matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq136_HTML.gif , i.e. http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq137_HTML.gif . Comparing this decomposition to the decomposition for the usual A matrix, then matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq138_HTML.gif has replaced an identity matrix and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq139_HTML.gif has replaced a diagonal matrix containing the Mendelian sampling terms related to A for the animals with known parents.

The inverse matrix becomes
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equw_HTML.gif

Here T −1 is a lower triangular matrix with ones on the diagonal and the only non-zero elements being −1 / 2 for offspring-parent entries, and matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq140_HTML.gif has diagonal elements equal to (1 + (n bas −3/2)γ) / ((1−γ/2)(1 + (n bas −1/2)γ)) and off-diagonal elements equal to − γ / ((1 − γ / 2)(1 + (n bas  − 1 / 2)γ)), where n bas is the dimension of http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq141_HTML.gif .

A procedure to obtain the diagonal of http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq142_HTML.gif and the inverse http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq143_HTML.gif can be constructed similar to the algorithm by Quaas [12], which utilises the form http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq144_HTML.gif where
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equx_HTML.gif

Matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq145_HTML.gif is a lower triangular matrix such that http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq146_HTML.gif . First, matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq147_HTML.gif is computed and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq148_HTML.gif is set to 1 + γ / 2 for the base animals. Second, the remaining part of the algorithm is the same as in Quaas [12], where recursively, rows in L are computed using http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq149_HTML.gif and http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq150_HTML.gif . Obtaining http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq151_HTML.gif at the same time is done by first setting elements in http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq152_HTML.gif for base animals equal to the elements in http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq153_HTML.gif , and then adding elements to http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq154_HTML.gif in the usual way (assuming a half-stored format, add http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq155_HTML.gif to the (i,i) element, add http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq156_HTML.gif to the (i,f(i)) and (i,m(i)) elements, and add http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq157_HTML.gif to the (f(i),f(i)), (m(i),m(i)) and (m(i),f(i)) elements).

Matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq158_HTML.gif can be obtained by a modification of the Colleau algorithm [13, 19, 20] by using an algorithm to compute http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq159_HTML.gif , where x is a vector. The product of matrix http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq160_HTML.gif and vector x is
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equ7_HTML.gif
(7)
Therefore, an algorithm (with notation similar to that in Appendix A in Misztal et al. [19]) consists of first computing r = T T x by solving the sparse system http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq161_HTML.gif for r, then computing
http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_Equy_HTML.gif

and finally computing http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq162_HTML.gif by solving the sparse system http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq163_HTML.gif for http://static-content.springer.com/image/art%3A10.1186%2F1297-9686-44-37/MediaObjects/12711_2012_2555_IEq164_HTML.gif .

Declarations

Acknowledgements

The work was performed in a project funded through the Green Development and Demonstration Programme (grant no. 3405-11-0279) by the Danish Ministry of Food, Agriculture and Fisheries, the Pig Research Centre and Aarhus University. The author thanks two anonymous reviewers, an associate editor and editor Jack Dekkers for their comments, and in particular editor Hélène Hayes for her careful and competent language revisions.

Authors’ Affiliations

(1)
Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University

References

  1. Legarra A, Aguilar I, Misztal I: A relationship matrix including full pedigree and genomic information. J Dairy Sci 2009, 92:4656–4663.PubMedView Article
  2. Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ: Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluations of Holstein final score. J Dairy Sci 2010, 93:743–752.PubMedView Article
  3. Christensen OF, Lund MS: Genomic prediction when some animals are not genotyped. Genet Sel Evol 2010, 42:2.PubMedView Article
  4. Vitezica ZG, Aguilar I, Misztal I, Legarra A: Bias in genomic predictions for populations under selection. Genet Res 2011, 93:357–366.View Article
  5. Chen CY, Misztal I, Aguilar I, Legarra A, Muir WM: Effect of different genomic relationship matrices on accuracy and scale. J Anim Sci 2011, 89:2673–2679.PubMedView Article
  6. Christensen OF, Madsen P, Nielsen B, Ostersen T, Su G: Single-step methods for genomic evaluation in pigs. Animal 2012, 6:1565–1571.PubMedView Article
  7. Forni S, Aguilar I, Misztal I: Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. Genet Sel Evol 2011, 43:1.PubMedView Article
  8. Gilmour AR, Thompson R, Cullis BR: Average information REML: an efficient algorithm for parameter estimation in linear mixed models. Biometrics 1995, 51:1440–1450.View Article
  9. Gengler N, Mayeres P, Szydlowski M: A simple method to approximate gene content in large pedigree populations: application to the myostation gene in dual-purpose Belgian Blue cattle. Animal 2007, 1:21–28.PubMedView Article
  10. VanRaden PM: Accounting for inbreeding and crossbreeding in genetic evaluation of large populations. J Dairy Sci 1992, 75:3136–3144.View Article
  11. Colleau JJ, Sargolzaei M: MIM: an indirect method to assess inbreeding and coancestry in large incomplete pedigrees of selected dairy cattle. J Anim Breed Genet 2011, 128:163–173.PubMedView Article
  12. Quaas RL: Computing the diagonal elements and inverse of a large numerator relationship matrix. Biometrics 1976, 32:949–953.View Article
  13. Colleau JJ: An indirect approach to the extensive calculation of relationship coefficients. Genet Sel Evol 2002, 34:409–421.PubMedView Article
  14. VanRaden PM: Efficient methods to compute genomic predictions. J Dairy Sci 2008, 91:4414–4423.PubMedView Article
  15. Gao H, Christensen OF, Madsen P, Nielsen US, Zhang Y, Lund MS, Su G: Comparison on genomic predictions using GBLUP models and two single-step blending methods with different relationship matrices in the Nordic Holstein population. Genet Sel Evol 2012, 44:8.PubMedView Article
  16. Meuwissen THE, Luan T, Woolliams JA: The unified approach to the use of genomic and pedigree information in genomic evaluations revisited. J Anim Breed Genet 2011, 128:429–439.PubMedView Article
  17. Fernando RL, Grossman M: Marker assisted selection using best linear unbiased prediction. Genet Sel Evol 1989, 21:467–477.View Article
  18. Mrode RA: Linear models for the prediction of animal breeding values. Wallingford: CABI Publishing; 2005.View Article
  19. Misztal I, Legarra A, Aguilar I: Computing procedures for genetic evaluation including phenotypic, full pedigree and genomic information. J Dairy Sci 2009, 92:4648–4655.PubMedView Article
  20. Aguilar I, Misztal I, Legarra A, Tsuruta S: Efficient computation of the genomic relationship matrix and other matrices used in single-step evaluation. J Anim Breed Genet 2011, 128:422–428.PubMedView Article

Copyright

© Christensen; licensee BioMed Central Ltd. 2012

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://​creativecommons.​org/​licenses/​by/​2.​0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Advertisement