Factor analysis models for structuring covariance matrices of additive genetic effects: a Bayesian implementation
G. de los Campos, D. Gianola

Multivariate linear models are increasingly important in quantitative genetics. In high dimensional specifications, factor analysis (FA) may provide an avenue for structuring (co)variance matrices, thus reducing the number of parameters needed for describing (co)dispersion. We describe how FA can be used to model genetic effects in the context of a multivariate linear mixed model. An orthogonal common factor structure is used to model genetic effects under Gaussian assumptions, so that the marginal likelihood is multivariate normal with a structured genetic (co)variance matrix. Under standard prior assumptions, all fully conditional distributions have closed form, and samples from the joint posterior distribution can be obtained via Gibbs sampling. The model and the algorithm developed for its Bayesian implementation were used to describe five repeated records of milk yield in dairy cattle, and a one-common-factor FA model was compared with a standard multiple trait model. The Bayesian Information Criterion favored the FA model.


INTRODUCTION
Multivariate mixed models are used in quantitative genetics to describe, for example, several traits measured on an individual [6][7][8], a longitudinal series of measurements of a trait, e.g., [23], or observations on the same trait in different environments [19]. A natural question is whether multivariate observations should be regarded as different traits or as repeated measures of the same response variable. The answer is provided by a formal model comparison. However, it is common to model each measure as a different trait, leading to a fairly large number of estimates of genetic correlations [7,8,19]. A justification for this is that the multiple-trait model is a more general specification, with the repeated measures (repeatability) model being a special case. However, individual genetic correlations differing from unity is not a sufficient condition for considering each measure as a different trait. While none of the genetic correlations may be equal to one, the vector of additive genetic values may still be approximated reasonably well by a linear combination of a smaller number of random variables, or common factors.
Another approach to multiple-trait analysis is to redefine the original records, so as to reduce dimension. For example, [25] suggested collapsing records on several diseases into simpler binary responses (e.g., "metabolic diseases", "reproductive diseases", "diseases in early lactation"). Likewise, for continuous characters, one may construct composite functions that are linear combinations of original traits. However, when records are collapsed into composites, some of the information provided by the data is lost. For instance, consider traits X and Y. If X + Y is analyzed as a single trait, information on the (co)variance between X and Y is lost.
Somewhere in between lies the procedure of using a multivariate technique such as principal components or factor analysis (PCA and FA, respectively), either for reducing the dimension of the vector of genetic effects (PCA) or for obtaining a more parsimonious model without reducing dimension (FA). Early uses of FA described multivariate phenotypes, e.g., [21,24]. PCA and FA have been used in quantitative genetics [1,3,5,11], and most applications consist of two steps. One approach, e.g., [3], consists of reducing the number of traits first, followed by fitting a quantitative genetic model to some common factors or principal components. In the first step, a transformation matrix (matrix of loadings) is obtained either by fitting a FA model to phenotypic records or by decomposing an estimate of the phenotypic (co)variance matrix into principal components. These loadings are used to transform the original records to a lower dimension. In the second step, a quantitative genetic model is fitted to the transformed data. Another approach fits a multiple trait model in the first step [1,11], leading to an estimate of the genetic (co)variance matrix, with each measure treated as a different trait. In the second step, PCA or FA is performed on the estimated genetic (co)variance matrix. However, as discussed by Kirkpatrick and Meyer [10] and Meyer and Kirkpatrick [15], two-step approaches have weaknesses, and it is theoretically more appealing to fit the model to the data in a single step.
This article discusses the use of FA as a way of modeling genetic effects. The paper is organized as follows: first, a multivariate mixed model with an embedded FA structure is presented, and all fully conditional distributions required for a Bayesian implementation via Gibbs sampling are derived. Subsequently, an application involving a data set on cows with five repeated records of milk yield each is presented, to illustrate the concept. Finally, a discussion of possible extensions of the model is given in the concluding section.

A COMMON FACTOR MODEL FOR CORRELATED GENETIC EFFECTS
In a standard FA model, a vector of random variables (u) is described as a linear combination of fewer unobservable random variables called common factors (f), e.g., [12,13,16]. The model equation for the i-th subject, when q common factors are considered for modeling the p observed variables, can be written as

u_ji = λ_j1 f_1i + ... + λ_jq f_qi + δ_ji, j = 1, ..., p,

or, in compact notation,

u_i = Λ f_i + δ_i. (1)

Above, u_i = (u_1i, ..., u_pi)'; Λ = {λ_jk} is the p × q matrix of factor loadings; f_i = (f_1i, ..., f_qi)' is the q × 1 vector of common factors peculiar to individual i, and δ_i = (δ_1i, ..., δ_pi)' is a vector of trait-specific factors peculiar to i. From (1), the equation for the entire data set can be written as

u = (I_n ⊗ Λ) f + δ, (2)

where u = (u_1', ..., u_n')', f = (f_1', ..., f_n')', and δ = (δ_1', ..., δ_n')'. Equation (1) can be seen as a multivariate multiple regression model in which both the random factor scores and the incidence matrix (Λ) are unobservable. Because of this, the standard assumption required for identification in the linear model, i.e., δ_i ⊥ f_i, is not enough. To see this, following [16], let H be any non-singular matrix of appropriate order, and form the expression Λf_i = ΛHH^-1 f_i = Λ* f_i*, where Λ* = ΛH and f_i* = H^-1 f_i. This implies that (1) can also be written as u_i = Λ* f_i* + δ_i, so that neither Λ* nor f* is unique. In the orthogonal factor model this identification problem is solved by assuming that the common factors are mutually uncorrelated. However, even with this assumption, factors are determined only up to an orthonormal transformation. To verify this, following [16], let T be an orthonormal matrix such that TT' = I.
Then

Cov(u_i) = ΛTT'Λ' + Ψ = Λ*Λ*' + Ψ,

where Ψ = Cov(δ_i) and Λ* = ΛT. This means that the factor loadings are determined only up to an arbitrary rotation in q-dimensional space, which must be fixed by convention to attain identification. The restrictions discussed above are arbitrary and not based on substantive knowledge; because of this, the method is particularly useful for exploratory analysis [9,12,13].
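The rotational invariance just described can be checked numerically in a few lines; this is a toy illustration (the loadings and rotation angle below are arbitrary, not taken from the paper):

```python
import math

# Toy check of rotational invariance: Lambda Lambda' is unchanged when
# Lambda is post-multiplied by an orthonormal matrix T with TT' = I.
# Here p = 3 observed variables and q = 2 common factors.
Lam = [[0.8, 0.1],
       [0.6, 0.5],
       [0.3, 0.9]]

theta = 0.7  # arbitrary rotation angle
T = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]

def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

LLt = matmul(Lam, transpose(Lam))                 # Lambda Lambda'
Lam_star = matmul(Lam, T)                         # rotated loadings Lambda* = Lambda T
LLt_star = matmul(Lam_star, transpose(Lam_star))  # Lambda* Lambda*'

# The two implied covariance contributions agree element-wise (up to rounding)
max_diff = max(abs(a - b) for ra, rb in zip(LLt, LLt_star)
               for a, b in zip(ra, rb))
```

Because ΛΛ' is all the likelihood "sees", no data-based criterion can distinguish Λ from ΛT, which is why a rotation must be fixed by convention.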
In addition to the restrictions described above, maximum likelihood or Bayesian inference requires distributional assumptions. The standard probability assumption for a Gaussian model with orthogonal factors is

f_i ~ iid N(0, I_q), δ_i ~ iid N(0, Ψ), with f_i independent of δ_i, (3)

where "iid" stands for "independent and identically distributed", and Ψ, of order p × p, is assumed to be a diagonal matrix. Combining (1) and (3), the marginal distribution of u_i is

u_i ~ N(0, ΛΛ' + Ψ). (4)

Consider now a standard multivariate additive genetic model for p traits measured on each of n subjects,

y_i = X_i β + Z_i u + ε_i, (5)

where y_i = (y_i1, ..., y_ip)' is a p × 1 vector of phenotypic measures taken on subject i (i = 1, ..., n); β and u are unknown vectors of regression coefficients and of additive genetic effects, respectively; X_i and Z_i are known incidence matrices of appropriate order, and ε_i is a p × 1 vector of model residuals. Stacking the records of the n subjects, the equation for the entire data set is

y = Xβ + Zu + ε,

where y = (y_1', ..., y_n')', X = (X_1', ..., X_n')', Z = Diag{Z_i}, u = (u_1', ..., u_n')', and ε = (ε_1', ..., ε_n')'. A standard probability assumption in quantitative genetics is

u ~ N(0, A ⊗ G_0), ε ~ N(0, I_n ⊗ R_0), with u independent of ε, (6)

where R_0 and G_0 are p × p (co)variance matrices of model residuals and of additive genetic effects, respectively, and A is the n × n additive relationship matrix.
Assume now that (2) holds for the vector of additive genetic effects in (5), so that

u = (I_n ⊗ Λ) f + δ, (7)

where Λ is as before, and f and δ are interpreted as vectors of common and specific additive genetic effects, respectively. Combining the assumptions of the orthogonal FA model described above with those of the additive genetic model leads to the joint distribution

f ~ N(0, A ⊗ I_q), δ ~ N(0, A ⊗ Ψ), with f independent of δ, (8)

where Ψ (p × p) is the (co)variance matrix of specific additive genetic effects, assumed to be diagonal, as stated earlier. Note that in (8), unlike in the standard FA model, i.e., (3), different levels of common and specific factors are correlated due to genetic relationships. With these assumptions, the conditional distribution of the data, given β, u and R_0, is

y | β, u, R_0 ~ N(Xβ + Zu, I_n ⊗ R_0). (9a)

Alternatively, using (7), one can write

y | β, λ, f, δ, R_0 ~ N(Xβ + Z[(I_n ⊗ Λ)f + δ], I_n ⊗ R_0). (9b)
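At the level of a single individual, the FA structure implies a sire (genetic) covariance matrix G_0 = ΛΛ' + Ψ. The following sketch builds G_0 for p = 5 traits and q = 1 common factor, matching the structure used in the application below; the loadings and specific variances are hypothetical values, not the paper's estimates:

```python
# Build G0 = Lambda Lambda' + Psi for p = 5 traits and q = 1 common factor.
# lam and psi below are illustrative values only (not the paper's estimates).
lam = [0.7, 0.9, 1.0, 0.9, 0.7]       # hypothetical loadings (p x 1)
psi = [0.10, 0.05, 0.04, 0.05, 0.10]  # hypothetical specific variances (diag of Psi)

p = len(lam)
# Off-diagonal elements are lam_j * lam_k; diagonals are lam_j**2 + psi_j.
G0 = [[lam[j] * lam[k] + (psi[j] if j == k else 0.0)
       for k in range(p)] for j in range(p)]
```

With one factor, the p(p+1)/2 = 15 free elements of an unstructured G_0 are replaced by the 2p = 10 numbers in `lam` and `psi`.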

Bayesian analysis and implementation
In a multivariate linear mixed model, a Bayesian implementation can be entirely based on Gibbs sampling because, under standard prior assumptions, the fully conditional posterior distributions of all unknowns have closed form, e.g., [20]. It turns out that in the model defined by (7) and (8), and under prior assumptions to be described below, all fully conditional distributions have closed form, and a Bayesian analysis can be based on a Gibbs sampler as well. Next, the prior assumptions are described, and the fully conditional distributions required for a Bayesian implementation of our FA model via Gibbs sampling are presented.

Prior distribution
Let λ = Vec(Λ), and consider the following specification of the joint prior distribution (omitting the dependence on hyper-parameters, for ease of notation):

p(β, λ, u, f, R_0, Ψ) = p(β) p(λ) p(u, f | λ, Ψ) p(R_0) p(Ψ). (10)

The prior distribution of the genetic effects implied by (7) and (8) is

p(u, f | λ, Ψ) = N(u | (I_n ⊗ Λ)f, A ⊗ Ψ) N(f | 0, A ⊗ I_q),

where the randomness of u is made explicit to the left of the conditioning bar. Next, assume bounded flat priors for β and λ; an inverted Wishart distribution for R_0, with scale matrix S_R0 and v_R prior degrees of freedom, denoted as IW_p(R_0 | S_R0, v_R); and independent scale inverted chi-square distributions for each of the diagonal elements of Ψ, denoted as χ^-2(Ψ_jj | v_j, S_j), j = 1, ..., p. With these prior assumptions, and using (9a) as the sampling model, the joint posterior distribution is

p(β, λ, u, f, R_0, Ψ | y) ∝ N(y | Xβ + Zu, I_n ⊗ R_0) N(u | (I_n ⊗ Λ)f, A ⊗ Ψ) N(f | 0, A ⊗ I_q) IW_p(R_0 | S_R0, v_R) Π_j χ^-2(Ψ_jj | v_j, S_j). (11)
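As a small sketch of the scale inverted chi-square prior assumed for each Ψ_jj: under one common parameterization (used here as an assumption, since the paper's hyper-parameter values are not reported), if x ~ chi-square(v), then ψ = vS/x is a scaled inverse chi-square draw with v degrees of freedom and scale S:

```python
import random

random.seed(1)

def rinvchisq(v, S):
    """Draw from a scale inverted chi-square(v, S) as v*S / chi-square(v).
    This parameterization (mean vS/(v-2) for v > 2) is an assumption,
    not taken from the paper."""
    x = random.gammavariate(v / 2.0, 2.0)  # chi-square(v) draw
    return v * S / x

v, S = 5.0, 1.0  # hypothetical hyper-parameters
draws = [rinvchisq(v, S) for _ in range(20000)]
mean = sum(draws) / len(draws)  # should be close to v*S/(v-2) = 5/3
```

Draws are always positive, as required for a variance component.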

Fully conditional posterior distributions
In what follows, when deriving fully conditional distributions, use is made of many well-known results for the Bayesian multivariate linear mixed model; a detailed description of these results is in [20].
From (11), the joint fully conditional distribution of the location effects is proportional to

p(β, u | else) ∝ N(y | Xβ + Zu, I_n ⊗ R_0) N(u | 0, A ⊗ G_0), with G_0 = ΛΛ' + Ψ,

where "else" denotes everything in the model that is not specified to the left of the conditioning bar (i.e., data, hyper-parameters and all other unknowns). The expression above is recognized as the kernel of the fully conditional distribution of the location effects in a standard multivariate mixed model. Therefore, the fully conditional distribution of (β, u) is as in the standard multivariate mixed model, that is,

β, u | else ~ N(r̂_1, C_1^-1), (12)

where r̂_1 and C_1 are the solution vector and coefficient matrix of the following standard mixed model equations:

[ X'(I_n ⊗ R_0^-1)X   X'(I_n ⊗ R_0^-1)Z                   ] [ β̂ ]   [ X'(I_n ⊗ R_0^-1)y ]
[ Z'(I_n ⊗ R_0^-1)X   Z'(I_n ⊗ R_0^-1)Z + A^-1 ⊗ G_0^-1 ] [ û ] = [ Z'(I_n ⊗ R_0^-1)y ].

Similarly, from (11), the fully conditional distribution of the residual (co)variance matrix is proportional to

p(R_0 | else) ∝ N(y | Xβ + Zu, I_n ⊗ R_0) IW_p(R_0 | S_R0, v_R),

which is the kernel of the fully conditional distribution of the residual (co)variance matrix in the standard multivariate mixed model. Thus,

R_0 | else ~ IW_p(E'E + S_R0, n + v_R), (13)

where E = (ε_1, ..., ε_p) is an n × p matrix in which column ε_j is the n × 1 vector of residuals for trait j.
Consider now the fully conditional distributions of the parameters of the FA model. From (7), (8) and (11), the joint fully conditional distribution of these parameters is proportional to

p(f, λ, Ψ | else) ∝ N(u | (I_n ⊗ Λ)f, A ⊗ Ψ) N(f | 0, A ⊗ I_q) p(λ) Π_j χ^-2(Ψ_jj | v_j, S_j), (14)

where F = (f_1, ..., f_q) is an n × q matrix of common factor values. From (14), the fully conditional distribution of the vector of common factors is proportional to

p(f | else) ∝ N(u | (I_n ⊗ Λ)f, A ⊗ Ψ) N(f | 0, A ⊗ I_q). (14a)

This is the kernel of the fully conditional distribution in a Gaussian model of random effects, f, with incidence matrix (I_n ⊗ Λ), u as "data", model residual (co)variance matrix A ⊗ Ψ, and prior distribution of the random effects N(f | 0, A ⊗ I_q). Therefore, the fully conditional distribution of the common factors is

f | else ~ N(f̂, C_2^-1), (15)

where f̂ and C_2 are the solution vector and coefficient matrix, respectively, of the following mixed model equations:

[(I_n ⊗ Λ)'(A^-1 ⊗ Ψ^-1)(I_n ⊗ Λ) + A^-1 ⊗ I_q] f̂ = (I_n ⊗ Λ)'(A^-1 ⊗ Ψ^-1) u,

so that C_2 = A^-1 ⊗ (Λ'Ψ^-1Λ + I_q). Similarly, noting that (7) can also be written as u = (F ⊗ I_p)λ + δ, the fully conditional distribution of the vector of factor loadings λ is proportional to

p(λ | else) ∝ N(u | (F ⊗ I_p)λ, A ⊗ Ψ) p(λ), (14b)

which is the kernel of the fully conditional distribution in a Gaussian model of "fixed" effects λ with bounded flat priors, incidence matrix F ⊗ I_p, residual (co)variance matrix A ⊗ Ψ, and u as "data". Therefore, the fully conditional posterior distribution of the vector of factor loadings is the truncated multivariate normal (truncation points are the bounds of the prior distribution of λ)

λ | else ~ N(λ̂, C_3^-1), truncated at the prior bounds, (16)

where λ̂ and C_3 are the solution vector and coefficient matrix, respectively, of the linear system

(F'A^-1F ⊗ Ψ^-1) λ̂ = (F'A^-1 ⊗ Ψ^-1) u.

Finally, from (14), the fully conditional distribution of the variance of each specific factor is proportional to

p(Ψ_jj | else) ∝ N(u_j | Fλ_j, Ψ_jj A) χ^-2(Ψ_jj | v_j, S_j), (14c)

where u_j and λ_j are the vector of genetic effects for the j-th trait and the (transposed) j-th row of Λ, respectively. Hence, the fully conditional posterior distributions are

Ψ_jj | else ~ χ^-2(v_j + n, [v_j S_j + δ_j' A^-1 δ_j] / (v_j + n)), j = 1, ..., p, (17)

where δ_j = u_j − Fλ_j is the vector of specific effects for the j-th trait. The preceding developments imply that one can sample the location parameters (β and u) and the residual (co)variance matrix with a Gibbs sampler for the standard multivariate linear mixed model, with G_0 = ΛΛ' + Ψ.
Once u has been sampled, the parameters of the common factor model can be sampled using (15), (16) and (17). In practice, the Gibbs sampler can be implemented by iterating along the following cycle: the location parameters (u, β), using distribution (12); the residual (co)variance matrix, using distribution (13); the vector of common factors, using (15); the vector of factor loadings, using (16) (if desired, loadings may then be rotated); and the variances of the specific factors, using (17).
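The FA part of the cycle can be sketched as follows. This toy Python implementation makes loud simplifying assumptions: the genetic effects u are treated as observed "data", individuals are unrelated (A = I_n), and q = 1, so only steps (15), (16) and (17) are illustrated; in the full algorithm u, β and R_0 are also resampled and A enters every fully conditional. All numerical values are hypothetical:

```python
import math
import random

random.seed(7)
n, p = 200, 3
true_lam = [0.8, 0.6, 0.4]   # hypothetical loadings used to simulate u
true_psi = 0.2               # hypothetical specific variance

# Simulate u_i = lam * f_i + delta_i for unrelated individuals (A = I)
f_true = [random.gauss(0.0, 1.0) for _ in range(n)]
u = [[true_lam[j] * f_true[i] + random.gauss(0.0, math.sqrt(true_psi))
      for j in range(p)] for i in range(n)]

lam, psi = [1.0] * p, [1.0] * p   # starting values
v0, S0 = 4.0, 0.1                 # hypothetical hyper-parameters of the psi prior
keep = []

for it in range(300):
    # (15) common factors: f_i | else ~ N(m_i, 1/prec), prec = lam'Psi^-1 lam + 1
    prec = sum(lam[j] ** 2 / psi[j] for j in range(p)) + 1.0
    f = [random.gauss(sum(lam[j] * u[i][j] / psi[j] for j in range(p)) / prec,
                      1.0 / math.sqrt(prec)) for i in range(n)]
    sf2 = sum(fi ** 2 for fi in f)
    # (16) loadings with a flat prior: lam_j | else ~ N(sum_i f_i u_ij / sf2, psi_j / sf2)
    lam = [random.gauss(sum(f[i] * u[i][j] for i in range(n)) / sf2,
                        math.sqrt(psi[j] / sf2)) for j in range(p)]
    # (17) specific variances: scale inverted chi-square update
    psi = []
    for j in range(p):
        ss = sum((u[i][j] - lam[j] * f[i]) ** 2 for i in range(n))
        chi2 = random.gammavariate((v0 + n) / 2.0, 2.0)  # chi-square(v0 + n) draw
        psi.append((v0 * S0 + ss) / chi2)
    if it >= 150:                 # discard the first half as burn-in
        keep.append(list(lam))

# Posterior means of the loadings over the retained samples
lam_hat = [sum(s[j] for s in keep) / len(keep) for j in range(p)]
```

With A = I_n and q = 1, C_2 in (15) reduces to the scalar precision computed above, and the sign of the loadings remains identified only up to a reflection, which is why the starting values matter.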

FA OF GENETIC EFFECTS: APPLICATION TO REPEATED RECORDS OF MILK YIELD IN PRIMIPAROUS DAIRY COWS
The concepts are illustrated by fitting an FA model to data consisting of five repeated records of milk yield on each of a set of first lactation dairy cows. In particular, a one common factor structure is used to model the random effect of the sire on each of the five traits, and this model is compared with a multiple trait (MT) model. In a one common factor model for five traits, the (co)variance matrix of the sire effects is modeled using 10 parameters (5 loadings and 5 variances of the specific factors), that is, 9 more dispersion parameters than in a repeatability model, but 5 fewer parameters than in the standard MT model, i.e., one with unstructured G_0.
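The parameter counts quoted above can be verified directly; the q(q−1)/2 rotational-constraint term for q > 1 is a standard FA counting rule, included here as an assumption (it vanishes for the q = 1 case used in this application):

```python
# Dispersion parameters in the sire (co)variance structure for p traits.
def n_mt(p):
    """Unstructured G0: p(p+1)/2 free (co)variances."""
    return p * (p + 1) // 2

def n_fa(p, q):
    """q-factor FA structure: p*q loadings plus p specific variances,
    minus q(q-1)/2 rotational constraints when q > 1 (standard FA rule,
    stated as an assumption; it is zero for q = 1)."""
    return p * q + p - q * (q - 1) // 2

p = 5
mt, fa, rep = n_mt(p), n_fa(p, 1), 1  # repeatability model: one sire variance
```

For p = 5 this gives 15 (MT), 10 (one-factor FA) and 1 (repeatability), reproducing the counts in the text.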

Data and methods
Data consisted of five repeated records of milk yield (MY) on 3827 first lactation daughters of 100 Norwegian Red (NRF) sires having their first progeny test in 1991 and 1992. Only complete records (i.e., five test-day records) of cows with a first calving in 1990 through 1992, and from herds with at least five daughters of any of these bulls, were included. Data were pre-adjusted with predictions of herd effects as described in [4]. First lactation was divided into five 60-day periods starting at calving. For each cow, a test-day record (the one closest to the mid-point of the period) was assigned to each period.
A standard multiple trait sire model for this data set is MY_ijk = µ_k + s_ik + ε_ijk, where µ_k (k = 1, ..., 5) is a test-day-specific mean, s_ik is the effect of sire i on trait k (i = 1, ..., 100), and ε_ijk is a residual specific to the k-th record of the j-th daughter (j = 1, ..., n_i) of sire i. The probability assumption was standard, as in (6), with A now being the additive relationship matrix due to sires and maternal grand-sires.
A single common genetic factor model for these data specifies s_ik = λ_k f_i + δ_ik, so that the equation for the k-th record on the j-th daughter of sire i is MY_ijk = µ_k + λ_k f_i + δ_ik + ε_ijk, with probability assumptions as in (8). Both models were fitted using a collection of R functions [18] written by the senior author that can be used for fitting multivariate linear mixed models, together with R functions created to sample the unknowns of the FA structure. R packages used by these functions are MASS [22], MCMCpack [14] and Matrix [2]. Post-Gibbs analysis was performed using the coda package of R [17].

Results
Posterior means of the log-likelihoods were −19 706.57 and −19 696.85 for the FA and MT models, respectively, indicating that both models had a similar "fit". The BIC difference (BIC_FA − BIC_MT) was −21.81, indicating that the data favored the FA model over the MT model. Table I shows posterior summaries for the test-day means. Posterior means and posterior standard deviations were similar for both models; this is expected, because the FA model imposes no restriction on the mean vector. Table II shows posterior summaries for the vector of loadings and the variances of the specific factors in the FA model. The posterior mean of the loadings increased from the first lactation period (0.751) to the second (0.984) and decreased thereafter. The sire variances of the specific factors were all small; those for test-days 1 and 5 were the largest. The relative importance of specific and common factors can be assessed by evaluating the proportion of the sire variance due to common factors (the communality). In this case, the contribution of the common factor to the sire variance on a trait is obtained by squaring the factor loading on that trait. Communality evaluated at the posterior means was high for all traits, ranging from around 0.88 (first and fifth lactation periods) to around 0.96 (lactation periods 2, 3 and 4). Table III shows posterior summaries of the dispersion parameters. Estimates of the residual (co)variance components were similar between the two specifications; again, this is expected, because the FA model imposes no structure on these parameters. However, some differences between the two specifications were observed for the sire (co)variance components. These differences arise because of the restrictions imposed by the FA specification. Another consequence of those restrictions is that posterior standard deviations were smaller in the FA specification.
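The reported BIC difference can be reproduced from the quantities given above, under the assumption that it was computed from the posterior mean log-likelihoods, with a difference of 5 dispersion parameters and n = 3827 cows in the penalty (the sample size used in the penalty is our assumption; it recovers the reported value). A small helper for the communality formula used in the text is included as well:

```python
import math

# Reported posterior mean log-likelihoods and parameter counts for the
# sire (co)variance structure (10 for FA, 15 for MT).
ll_fa, ll_mt = -19706.57, -19696.85
k_fa, k_mt = 10, 15
n = 3827  # assumed sample size in the BIC penalty (number of cows)

# BIC_FA - BIC_MT = -2(logL_FA - logL_MT) + (k_FA - k_MT) ln(n)
bic_diff = -2.0 * (ll_fa - ll_mt) + (k_fa - k_mt) * math.log(n)

def communality(lam_k, psi_k):
    """Proportion of the sire variance of trait k due to the common factor
    under the 1-factor structure: lam_k^2 / (lam_k^2 + psi_k)."""
    return lam_k ** 2 / (lam_k ** 2 + psi_k)
```

Under these assumptions, `bic_diff` is approximately −21.81, matching the value reported above; the negative sign indicates a lower (better) BIC for the FA model.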

CONCLUDING REMARKS
Multivariate linear mixed models are increasingly important in animal breeding, because the number of traits included in genetic evaluation programs has increased steadily over time. When the number of traits is large, FA can provide a useful way of structuring (co)variance matrices without reducing dimensionality. A more parsimonious specification is expected to lead to smaller posterior standard deviations, but may show lack of fit. Since the FA model imposes restrictions on the parameterization of the standard multiple trait model, a natural benchmark for evaluating the goodness/lack of fit of the model is the multiple trait model. In the example presented here, the "lack of fit" of the FA model (mainly due to differences in estimates of the sire (co)variance components) was more than overcome by the parsimony of the specification.
Although we focused on the orthogonal common factor model, only minor modifications are needed to accommodate confirmatory factor analysis schemes, which may be of interest in some applications. The model presented here did not consider the possibility of missing values. However, it can be shown that, under the model presented here, the fully conditional distribution of missing values is as in the standard multivariate linear mixed model (e.g., [20]). Similarly, the model can be extended easily to generalized linear model settings (e.g., probit, multi-threshold, censored log-normal).
Finally, although we addressed the use of FA for modeling genetic effects only, one may consider using the same strategy for modeling permanent environmental random effects, or model residuals. Extension to such models does not pose special difficulties. Similar ideas may also be used in the context of random regression models, with random regression coefficients modeled as functions of common and specific factors.