Active international trade in semen and embryos of dairy cattle has created a need for global comparisons of the genetic merit of sires. The International Bull Evaluation Service, Interbull, was established in 1983 to respond to this need. International breeding values of dairy bulls are currently estimated three times a year; they are expressed in the units of each member country and relative to each country's own base group of animals [1]. Accurate evaluations require reliable genetic parameters, i.e., variance components and genetic correlations.

Daughter groups in different countries are assumed to be genetically correlated but environmentally uncorrelated. Therefore, each biological trait under evaluation is treated as a different trait for each country participating in the international sire evaluation. Typically, the genetic correlations between some countries are very high. The multi-dimensionality and high genetic correlations create several problems, such as over-parameterized models, increased sampling variances and an increased probability that parameter estimates fall outside the boundaries of the parameter space, e.g. [2]. For restricted maximum likelihood (REML) estimation, these problems, in turn, complicate maximization of the likelihood and thus increase the time needed to estimate variance components. The number of countries participating in the international Holstein sire evaluation for protein yield in 2011 is 28. This requires estimation of a 28 × 28 (co)variance (VCV) matrix described by 406 parameters if the genetic (co)variance matrix is considered to be unstructured. The current practice is to estimate this matrix by performing a number of separate analyses of selected sub-sets of countries [3, 4]. The resulting estimates are then combined to build up the complete VCV matrix. Typically, this results in a non-positive definite matrix, and a "bending" procedure is applied to ensure that the overall matrix is valid [5].
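
The parameter count quoted above follows from the symmetry of the matrix: an unstructured k × k (co)variance matrix has k(k + 1)/2 distinct elements. A minimal sketch of this arithmetic is given below; the function name is illustrative only and does not come from any evaluation software.

```python
def n_unstructured_params(k):
    """Number of distinct elements in a symmetric k x k (co)variance matrix."""
    # k variances on the diagonal plus k(k - 1)/2 covariances above it
    return k * (k + 1) // 2


print(n_unstructured_params(28))  # 406, the figure quoted for 28 countries
```

The count grows quadratically with the number of countries, which is why each additional country makes an unstructured analysis noticeably harder.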

Principal component (PC) and factor analytic (FA) models provide a highly parsimonious structure for the VCV matrix compared to the standard multi-trait model, e.g. [6, 7], and they have, therefore, attracted considerable interest for their potential to ease the burden of the estimation process for multiple-trait across country evaluations (MACE) [8]. Both approaches decompose the genetic covariance matrix into corresponding matrices of eigenvalues and eigenvectors. Each eigenvector, i.e., PC, forms a linear combination of the traits, while the corresponding eigenvalue gives the variance explained. The PC are orthogonal, i.e., uncorrelated with each other.

The aim of the PC method is to detect all necessary components explaining variation in multi-dimensional data without losing any important information. The first PC explains the maximum amount of genetic variability in the data and each successive PC explains the maximum amount of the remaining variability. For highly correlated traits, only the leading PC have practical influence on genetic variation and PC with a negligible effect can be omitted without impairing accuracy of estimation. Furthermore, the parameter reduction results in a rank reduction and in a reduction of the dimension of the mixed model equations.
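
To illustrate why highly correlated traits need only a few PC, consider a hypothetical two-country case with unit genetic variances and genetic correlation r. The sketch below uses the closed-form eigenvalues of a symmetric 2 × 2 matrix; the correlation values and function names are made up for illustration and are not estimates from any data set.

```python
import math


def eigenvalues_sym2x2(a, b, c):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius


def leading_pc_share(r):
    """Share of total variance explained by the first PC,
    assuming unit variances and correlation r (illustrative setup)."""
    lam1, lam2 = eigenvalues_sym2x2(1.0, r, 1.0)
    return lam1 / (lam1 + lam2)


for r in (0.50, 0.90, 0.99):
    share = leading_pc_share(r)
    print(f"r = {r:.2f}: first PC explains {share:.1%} of the variance")
```

In this setup the eigenvalues are 1 + r and 1 − r, so the first PC explains (1 + r)/2 of the total variance for non-negative r, and the second PC becomes negligible as r approaches one.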

The FA method is related to the PC method but its approach is different. The traits studied are assumed to be linear combinations of a few latent variables, referred to as common factors. Any variance not explained by these is modelled separately, i.e. as trait-specific, by fitting corresponding specific factors. Due to the partitioning of variance into common and trait-specific variance, the number of factors needed to explain the variability in the data is normally notably smaller than the number of PC needed in the PC approach. Further, since the factors are assumed to be uncorrelated, substantial sparsity of the mixed model equations (MME) is gained compared to the standard unstructured multivariate analysis. However, the resulting (co)variance matrix is of full rank if all trait-specific variances are non-zero. Furthermore, factor axes can be rotated. Normally, this is done to ease their interpretation, but it also makes it possible to use the Cholesky parameterization that enhances the convergence rate of maximum likelihood estimation, e.g. [7, 9].
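
The FA structure described above can be sketched as a covariance matrix built from a k × m matrix of factor loadings plus a diagonal matrix of specific variances. The sketch below is written under stated assumptions. The parameter counts are the ones commonly quoted for these parameterizations under the usual identification constraints, namely m(2k − m + 1)/2 for m PC and k more for an FA model with m factors, and they are not taken from this text. The loadings, the specific variances and the choice m = 5 are hypothetical.

```python
def pc_params(k, m):
    """Parameters needed to fit the m leading PC for k traits
    (assumed count: m(2k - m + 1)/2)."""
    return m * (2 * k - m + 1) // 2


def fa_params(k, m):
    """Parameters of an FA model with m common factors:
    the loadings plus k specific variances (assumed count)."""
    return pc_params(k, m) + k


def fa_covariance(loadings, specific):
    """Covariance matrix implied by an FA structure:
    loadings times their transpose, plus specific variances on the diagonal."""
    k = len(loadings)
    sigma = [
        [sum(gi * gj for gi, gj in zip(loadings[i], loadings[j]))
         for j in range(k)]
        for i in range(k)
    ]
    for i in range(k):
        sigma[i][i] += specific[i]
    return sigma


# Hypothetical comparison for k = 28 countries and m = 5 components or factors
print(pc_params(28, 5), fa_params(28, 5), 28 * 29 // 2)

# Hypothetical one-factor model for three traits
sigma = fa_covariance([[0.9], [0.8], [0.7]], [0.19, 0.36, 0.51])
for row in sigma:
    print([round(x, 2) for x in row])
```

With these made-up numbers, the specific variances bring each diagonal element to one, while the off-diagonal elements are determined by the single common factor alone.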

Madsen et al. [10] were the first to suggest the use of reduced rank covariance matrices for MACE. Instead of using the standard expectation-maximization algorithm for REML estimation of variance components for MACE, they studied the feasibility of exploiting an average-information (AI) algorithm that is known to be fast and effective. They developed an AI-REML algorithm which checks in each round whether or not the VCV matrix is positive definite. If a non-positive definite matrix is encountered, the original VCV matrix is decomposed and all eigenvalues less than the operational zero are replaced with a small positive number. Thus, their method is not a real reduced rank method in the sense that small or negative eigenvalues would have been removed. In turn, Leclerc et al. [11] studied both PC and FA approaches for a sub-set of well-linked base countries, performing dimension reduction for this sub-set and then estimating genetic correlations between the remaining and the base countries, keeping the genetic correlations among the base countries fixed. When applying the approach proposed by Leclerc et al. [11], special emphasis should be placed on selection of suitable base countries.
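
The eigenvalue replacement step described above can be sketched in isolation, outside any AI-REML iteration. This is a minimal illustration under stated assumptions and not the implementation of Madsen et al. [10]: the threshold `operational_zero`, the value `replacement`, the function name and the example matrix are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def floor_eigenvalues(vcv, operational_zero=1e-8, replacement=1e-6):
    """Replace eigenvalues below a threshold with a small positive number.

    Both default values are illustrative assumptions, not published settings.
    """
    vcv = np.asarray(vcv, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(vcv)  # eigenvalues in ascending order
    if eigvals[0] >= operational_zero:
        return vcv  # already positive definite, returned unchanged
    floored = np.where(eigvals < operational_zero, replacement, eigvals)
    # Rebuild the matrix from the eigenvectors and the adjusted eigenvalues
    return (eigvecs * floored) @ eigvecs.T


# Hypothetical, mutually inconsistent correlations among three countries
raw = np.array([
    [1.00, 0.95, 0.60],
    [0.95, 1.00, 0.95],
    [0.60, 0.95, 1.00],
])
fixed = floor_eigenvalues(raw)
print(np.linalg.eigvalsh(raw).min() < 0)    # True: raw is not positive definite
print(np.linalg.eigvalsh(fixed).min() > 0)  # True: the adjusted matrix is
```

Because the offending eigenvalue is replaced rather than dropped, the adjusted matrix keeps its full dimension, which is the sense in which this differs from a genuine reduced rank method.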

Mäntysaari [12] introduced a bottom-up PC approach that begins with a sub-set of countries and adds the remaining countries sequentially. By examining in each step whether or not the new country increases the rank of the genetic VCV matrix, the bottom-up approach only fits PC with non-negligible eigenvalues and thus avoids over-parameterized models. While this original study was performed with a simulated dataset, recent work has demonstrated the usefulness of this approach to estimate the variance components for MACE [13, 14].

Typically, the conventional PC analysis is done after the complete VCV matrix has been estimated. Then, the matrix is decomposed and, if possible, its dimension is reduced. Kirkpatrick and Meyer [15] suggested the direct estimation of the leading principal components (direct PC). However, this requires the appropriate rank to be known or to be estimated prior to the variance component analysis. Similarly, a VCV matrix imposing a FA structure can be estimated directly [6]. However, too stringent a parameter reduction should be avoided, since selecting too low a rank can lead to biased estimates of genetic parameters [14, 15]. This is because the number of available parameters is no longer sufficient to describe the (co)variance structure of the model adequately, and part of the genetic variance will be repartitioned into the residual variance. Furthermore, with more than one matrix to be estimated, the reduced rank estimator can be inconsistent, i.e. pick up the wrong subset of PC [2]. The risk of this happening is high when relatively few PC are considered.

Both direct PC and FA approaches have been applied to beef cattle datasets and have demonstrated their potential to be used for large, multi-trait datasets, e.g. [16, 17]. In addition, the direct PC approach proved to be an appealing method to estimate variance components for MACE in a recent study [14]. The objectives of this study are to evaluate the utility of the factor analytic approach for variance component estimation for MACE and to assess the impact of alternative parameterizations, both PC and FA, for practical prediction of breeding values with MACE.