Performance of the No-U-Turn sampler in multi-trait variance component estimation using genomic data

Multi-trait genetic parameter estimation is an important topic for target traits with few records and with a low heritability and when the genetic correlation between target and secondary traits is strong. However, estimating correlations between multiple traits is difficult for both Bayesian and non-Bayesian inferences. We extended a Hamiltonian Monte Carlo approach using the No-U-Turn Sampler (NUTS) to a multi-trait animal model and investigated the performance of estimating (co)variance components and breeding values, compared to those for restricted maximum likelihood and Gibbs sampling with a population size of 2314 and 578 in a simulated and real pig dataset, respectively. For real data, we used publicly available data for three traits from the Pig Improvement Company (PIC). For simulation data, we generated two quantitative traits by using the genotypes of the PIC data. For NUTS, two prior distributions were adopted: Lewandowski-Kurowicka-Joe (LKJ) and inverse-Wishart distributions. For the two simulated traits with heritabilities of 0.1 and 0.5, most estimates of the genetic and residual variances for NUTS with the LKJ prior were closer to the true values and had smaller root mean square errors and smaller mean absolute errors, compared to NUTS with inverse-Wishart priors, Gibbs sampling and restricted maximum likelihood. The accuracies of estimated breeding values for lowly heritable traits for NUTS with LKJ and inverse-Wishart priors were 14.8% and 11.1% higher than those for Gibbs sampling and restricted maximum likelihood, respectively, with a population size of 578. For the trivariate animal model with real pig data, the estimates of the genetic correlations for Gibbs sampling and restricted maximum likelihood were strongly affected by population size, compared to NUTS. For both the simulated and pig data, the genetic variances and heritabilities for NUTS with an inverse-Wishart prior were overestimated for low-heritability traits when the population size was 578. The accuracies of variance components and breeding values estimates for a multi-trait animal model using NUTS with the LKJ prior were equal to or higher than those obtained with restricted maximum likelihood or Gibbs sampling. Therefore, when the population size is small, NUTS with an LKJ prior could be an alternative sampling method for multi-trait analysis in animal breeding.


Background
Selection of livestock is usually based on a combination of several traits of economic importance that may be phenotypically and genetically related. Multi-trait analysis was introduced in quantitative genetics by Henderson and Quaas [1]. It is based on the simultaneous evaluation of animals for several traits and makes use of the phenotypic and genetic correlations between them. Compared to analyzing each trait separately, the advantages of multi-trait analysis are an increase in prediction accuracy, statistical power and parameter estimation accuracy and decrease in trait selection bias [2][3][4]. In Page 2 of 13 Nishio and Arakawa Genetics Selection Evolution (2022) 54:51 particular, multi-trait analysis can provide more accurate estimations in the case of traits with a low heritability or populations of small size [5]. Accurate estimation of variance components and functional parameters, such as heritabilities and genetic correlations, is important because prediction error variances for predicted random effects increase as the differences between estimated and true values of variance components increase [6]. Accurate estimation of multi-trait variance components that considers the genetic correlations between economicallyimportant traits will contribute to improve the accuracy of genetic evaluation. Restricted maximum likelihood (REML) and Bayesian analyses have become standard estimation methods in animal breeding. Patterson and Thompson [7] first developed the REML approach, which has been widely used for the estimation of multi-trait (co)variance components thanks to the availability of several programs, e.g. MTDFREML [8], VCE [9], REMLf90 [10] or ASREML [11]. From a Bayesian viewpoint, REML is considered as the mode of the joint posterior distribution of all (co) variance components, with noninformative prior densities, once the fixed effects are marginalized by translation invariance functions of the data [12]. However, since the REML estimator relies on an asymptotic distribution, the inferences are valid strictly for a sample of infinite size [13,14]. Therefore, it is difficult to calculate reliable confidence intervals for REML-based variance component parameters [15]. An alternative to REML estimation is a full Bayesian approach through Markov chain Monte Carlo (MCMC) methods, which were introduced in quantitative genetics in the early 1990s [13,16]. Gibbs sampling (GS) is an MCMC method that repeatedly samples from the conditional distributions of one variable when all the other variables are assumed to be known [17]. In practice, GS is frequently used because it does not require the design of a proposal distribution and the procedure is simple to program. In animal breeding programs and in the case of a single-trait model, an inversegamma distribution is used for a prior distribution of variance components but, in practice, such a distribution has two major problems. One is that inappropriate parameters are used to make the inverse-gamma distribution as uniform as possible [13,16,17], and the other is that, if small values are set to make the distribution as least informative as possible such as 0.001, the inversegamma distribution will show a weak peak around 0, which might result in being unintentionally informative [18]. In the case of a multi-trait analysis, the inverse-Wishart (IW) conjugate family of distributions is used as priors for the covariance matrices between traits because the IW distribution is a multivariate generalization of the inverse-gamma distribution [19]. Consequently, the IW prior is expected to have the same problems as the inverse-gamma prior.
The Hamiltonian Monte Carlo (HMC) approach has become a popular alternative MCMC method, which is based on Hamiltonian dynamics used in physics and is a Metropolis strategy for all parameters simultaneously [20]. Hoffman and Gelman [21] developed the No U-Turn Sampler (NUTS), which automatically tunes the hyperparameters required for HMC. Recently, HMC and NUTS were applied to animal breeding [22,23]. In a single-trait analysis, Nishio and Arakawa [23] showed that NUTS performed well for the estimation of variance components, in the case of large effective sample sizes, low autocorrelations, and low skewness of posterior distributions, particularly when the heritability of the trait was low. NUTS can be implemented by probabilistic programming language, such as PyMC and Stan [24]. In addition, conjugate priors are not necessary for NUTS, thus appropriate priors other than the IW priors can be used for the covariance matrix. Therefore, NUTS that uses an appropriate prior might provide more accurate estimates of variance components and breeding values in multi-trait analysis than GS.
We have introduced two estimation methods, i.e. REML and Bayesian analysis, and two computing algorithms of Bayesian analysis, i.e. GS and NUTS. The goal of this study was to compare the performances of REML, GS and NUTS for the accuracy of the estimation of variance components and breeding values. This comparison was based on multi-trait genomic best linear unbiased prediction (GBLUP) models using simulated and real pig data.

Real pig data
Publicly available data including genotypic and phenotypic information on a single Pig Improvement Company (PIC) nucleus pig line were used (https:// acade mic. oup. com/ g3jou rnal/ artic le/2/ 4/ 429/ 60260 60). This dataset is composed of 3534 animals with phenotypes for five traits and genotypes from the PorcineSNP60 chip ( n = 64, 223 ). These phenotypes were already adjusted for environmental fixed effects: sex, farm and year of birth [25]. We used three traits (T1, T2 and T3) and extracted 2314 animals for which records for these three traits were available. The T1, T2 and T3 traits used in this paper correspond to the T1, T2 and T3 traits in the publicly available PIC data. Two scenarios were applied to estimate variance components: Scenario 1 that used the full data (2314 animals) and Scenario 2 that used data from 578 animals randomly selected from the 2314 animals under the assumption that few records were available for the T1, T2 and T3 traits. Criteria to exclude single nucleotide polymorphisms (SNPs) were: a minor allele frequency lower than 0.05, a call rate lower than 0.95 and a Hardy-Weinberg equilibrium cut-off P value lower than 0.001. After quality control, the final data set included 33,860 SNPs. In this study, we used the genomic relationship matrix as an additive genetic relationship matrix that was denoted A in this paper and computed according to VanRaden [26]: where p j is the frequency of the second allele at SNP j , M is the n × N snp matrix ( n is the number of genotyped animals and N snp is the number of SNPs) and the elements ( m ij ) of M for animal i at SNP j are calculated as m ij = g ij − 2p j , where g ij (coded as 0, 1 or 2) is the number of the second allele of animal i and SNP j.

Simulated data
To validate the performances of REML, GS and NUTS for the estimation of variance components, we also generated a simulated dataset from the genotypes of the real PIC pig dataset. Similar to the real data, we used the genomic relationship matrix ( A ) as an additive genetic relationship matrix. We simulated two quantitative traits (trait1 and trait2) by summing up the additive genetic effects a and the residuals e . Thus, the vector of phenotypes was calculated as y = a + e , where the a and e vectors were drawn from the multivariate normal distributions MVN (0, G 0 ⊗A) and MVN (0, R 0 ⊗I) , respectively; G 0 is the n × n additive genetic covariance matrix and R 0 is the n × n residual covariance matrix for n traits. We used the Cholesky decomposition of the covariances G(= G 0 ⊗A) and R(= R 0 ⊗I) to draw samples from the multivariate normal distribution. The random additive genetic effect a was calculated as a = L a z a , where z a ∼ MVN (0, I) and L a is the Cholesky factor L a L ′ a = G ; whereas the residual e was calculated as e = L e z e , where z e ∼ MVN (0, I) and L e is the Cholesky factor L e L ′ e = R . The heritabilities for trait1 and trait2 were set to 0.1 and 0.5, respectively. As for the real PIC pig data, we defined two scenarios with 2314 and 578 animals, respectively. In order to simulate correlated traits with a genetic correlation of 0.3 and a residual correlation of 0.1, the variance components were set as follows: In the statistical analysis, there are overall mean and no fixed effects. For each scenario, 10 replicates were simulated.

Statistical model
Following Henderson and Quaas [1], the multi-trait mixed linear model for n traits can be written as follows: where y i is the phenotype for trait i ; β i is a vector of fixed effects associated with trait i ; a i is a vector of random additive genetic effects associated with trait i ; e i is a vector of residuals with trait i ; and X i and Z i denote the incidence matrices relating the observations to the corresponding fixed and random effects.
n ] ′ , and e = [e ′ 1 , e ′ 2 , · · · , e ′ n ] ′ . Then, the mixed model equation for Model (1) can be expressed as follows: where G and R are the covariance matrices associated with a and e , respectively. Matrix R requires that each animal has either one record for all traits or none at all as is the case in our data.

Estimation of variance components by REML and GS
Variance components were estimated using REML and GS with the airemlf90 and gibbs2f90 software (available at http:// nce. ads. uga. edu/ wiki/), respectively [27]. For REML, first we ran expectation maximization (EM)-REML for all the initial 10 iterations and then switched to average information (AI) in the final iteration because the EM algorithm is much more stable than the AI algorithm and is very robust to poor initial estimates and can thus provide a good starting point for the AI algorithm [28]. Convergence was assumed when changes in the ratios of the corresponding estimates between two consecutive rounds were less than 10 −6 . The asymptotic standard error (SE) was computed following Houle and Meyer [29], as implemented in airemllf90.
For GS, the conditional distribution of y , given that the parameters are assumed to follow a multivariate normal distribution, is as follows: (2) In this study, the fixed effect was not included because the phenotypes were already corrected for environmental factors as described below. The additive genetic effects ( a ) were assigned multivariate normal distributions with a mean vector of zeros: and the residuals ( e ) were assumed to follow: For the covariance matrices ( G 0 and R 0 ), priors were derived from the IW distribution: and where S A and S E are the n × n scale parameter matrices, and v A and v E are the degrees of freedom for the additive genetic and residual covariances, respectively. The IW prior has gained popularity as the conjugate prior for multivariate normal distributions, facilitating computations via GS. In this study, we set S A = I , S E = I , v A = n and v E = n . The posterior distribution for each parameter was obtained by integration of multivariate density functions, considering a single chain with 10,000 iterations. The first 1000 iterations were discarded as burn-in and the thinning interval of the chain was 10. Posterior mean and posterior standard deviation were calculated as the parameter estimates and their SE.

Estimation of variance components by NUTS
In the Stan software, a Bayesian model is implemented by defining its likelihood and priors. Stan is an opensource software, with a publicly available manual online (https:// mc-stan. org/ users/ docum entat ion/). For the NUTS approach, we used RStan, which is the R interface for Stan.
We used a Lewandowski-Kurowicka-Joe (LKJ) distribution as a prior of the correlation. Following the separation strategy of Barnard et al. [30], the covariance matrices ( G 0 and R 0 ) were decomposed as G 0 = A A A and R 0 = E E E , where A and E are the n × n diagonal matrices with the genetic and residual standard deviations, and A and E are the n × n genetic and residual correlation matrices, respectively. For the correlation matrices ( A and E ), priors were derived from the LKJ distribution with one positive scalar shape parameter η [31]: A ∼ LKJ (η) and E ∼ LKJ (η) . Here, we set the shape parameter for the genetic correlation as equal to that for the residual correlation. The posterior density function of the LKJ distribution for A is: and is proportional to the determinant of the correlation matrix raised to the η − 1 power: Thus, the shape parameter η tunes the strength of the correlations; η = 1 leads to a uniform distribution on correlation matrices, while the magnitude of the correlations between components decreases as η → ∞ . In contrast, 0 < η < 1 leads to low correlations. In the current study, the value of η was set to 1 as the base value. Moreover, we investigated the effect of the scalar shape parameter η of the LKJ distribution. In scenario 1, the values of η were set to 0.25, 0.5, 1.0, 2.0 and 4.0.
For efficient calculation, we used Cholesky factor parameters for variance components. Let L A be the Cholesky factor of A : A = L A L ′ A . Let L A and L E be the Cholesky factors of A and E : Thus, the covariance matrices were redefined as: Stan provides an implicit parameterization of the LKJ correlation matrix density in terms of its Cholesky factor. For the Cholesky factors L A and L E derived from the LKJ Cholesky distribution: L � A ∼ LKJCholesky(η) and . The priors of the diagonals of the genetic and residual standard deviations were assigned Cauchy distributions: A ∼ Cauchy(0, 5) and E ∼ Cauchy(0, 5) . Hence, the random additive effects a was reshaped as a = L A z a A L ′ A . When z a ∼ MVN (0, I): Stan provides an implicit parameterization of the multivariate normal density in terms of its Cholesky factor. The conditional distribution of y follows a multivariate normal Cholesky distribution: y|L A , A , E , L A , L E , z a ∼ MVNCholesky Za, E L E , Nishio and Arakawa Genetics Selection Evolution (2022) 54:51 which implies that y|L A , A , E , L A , L E , z a ∼ M VN (Za, R 0 ⊗I) . The RStan code for NUTS with an LKJ prior in RStan is described in Additional file 1. In addition, we used the IW prior for NUTS to investigate whether either the sampling method or the prior, or both, contribute to the performance for the estimation of variance components. The RStan code for NUTS with the IW prior in RStan is described in Additional file 2. For NUTS, 2000 iterations were simulated to obtain posterior distributions and the first 1000 iterations were discarded as the warm-up phase.
Because for all but the most trivial model cases there is no analytical solution, NUTS uses a process called the leapfrog integration to draw a sketch of the posterior probability surface. The failures in this integrator are identified by "divergent transitions", which basically means that the sampler is no longer following the surface of the model correctly [32]. To check for the presence of divergent transitions, after the warm-up phase, we investigated the two important parameters that affect divergent transitions: the number of steps and the tree depth. In Stan, the limits for number of steps and tree depth were set to 1000 and 10, respectively.

Criteria for comparing methods
To investigate the accuracy of the estimation of variance components using the simulated data, we calculated two indices: the root mean square error (RMSE) and the mean absolute error (MAE). These indices of the estimator θ were calculated as follows: where θ is the estimated variance component obtained in each replication, θ is the true value used for the simulation and q is the number of replicates. To avoid the differences of scales between scenarios, the relative RMSE and MAE were set to 1.0 for NUTS. Therefore, the relative RMSE and MAE were calculated by dividing the RMSE and MAE values by those for NUTS.

Accuracy of estimated breeding values
The simulated population was divided into a training and a test population to investigate the accuracy of estimated breeding values. The training and test populations consisted of 2000 and 314 animals in Scenario 1 and of 500 and 78 animals in Scenario 2, respectively. The test population was randomly selected from the last generation. The training population had both phenotypic and genotypic values whereas the test population had only genotypic values. The accuracies of estimated breeding values (12)

Convergence diagnostics for MCMC
Establishing convergence of MCMC is one of the most important steps of Bayesian analysis. We used two MCMC diagnostic tools: the Gelman and Rubin's convergence diagnostic [33] and the Geweke's convergence diagnostic [34], which rely on multiple chains starting at initial points that are drawn from a density that is overdispersed with respect to the target density. Using parallel chains, the convergence diagnostic ( R ) is calculated by comparing the within-and between-chain variances.
A value of R that is much higher than 1 indicates a lack of convergence. A cutoff value of 1.1 is generally used by MCMC practitioners, as recommended by Gelman et al. [35]. In this study, values of R were calculated from three parallel chains. Geweke's convergence diagnostic is based on a test for equality of the means of the first and last parts of a Markov chain. The test statistic is a standard z-score, which is calculated under the assumption that the two parts of the chain are asymptotically independent. The absolute value of the z-score exceeding 1.96 (5% cutoff point of the standard normal distribution) indicates a lack of convergence. We calculated the z-scores using the first 10% and the last 50% as two parts of the Markov chain. The two convergence diagnostic statistics were calculated using the R "coda" package [36].

Comparison of parameter estimates
Compared to GS and REML, the average estimates of genetic variances and residual variances for trait2 obtained using NUTS with an LKJ prior were close to the true values in Scenario 1 ( Table 1), but there was little difference between the estimates for all methods. The relative RMSE and MAE of the residual variances and heritabilities for trait2 were larger using GS and REML than those using NUTS with the LKJ and IW priors (Fig. 1). In Scenario 2, all the estimates obtained using NUTS with an LKJ prior were close to the true values whereas, in contrast to Scenario 1, the estimates with the other methods greatly differed from the true values (Table 2). For all the estimates, the relative RMSE and MAE using NUTS with an LKJ prior were smaller than those with GS and REML (Fig. 2). The relative RMSE and MAE of the estimates for trait1 using NUTS with an IW prior were quite large. The relative RMSE and MAE of some the parameters for the traits with a low heritability were high when η = 0.25, whereas there were no differences in relative RMSE and MAE when η ≥ 0.5 (Fig. 3). The parameter estimates for the real PIC pig data obtained with the trivariate animal models using NUTS, GS and REML in Scenarios 1 and 2 are in Tables 3 and  4, respectively. In Scenario 1, the estimates of the genetic correlations between T1 and T2, and between T1 and T3, differed between the three methods whereas the other parameter estimates were almost the same. In Scenario 2, the differences in the estimates of genetic correlations were larger than those in Scenario 1. In particular, the estimates of the genetic correlation between T1 and T2 and their SE using GS and REML were quite high ( Table 4). The estimates of the genetic variance and heritability using NUTS with an IW prior were higher than those with the other methods. The posterior distributions of GS for the genetic correlations between T1 and T2, and between T1 and T3, were skewed compared to those of NUTS (Fig. 4).
There were no differences in accuracies and RMSE of estimated breeding values between the four methods in Scenario 1. In Scenario 2, the accuracies of estimated breeding values for trait1 using NUTS with the LKJ and IW priors were 14.8% and 11.1% higher than those using GS and REML, respectively ( Table 5). The RMSE of trait2 using NUTS with the LKJ and IW priors were smaller than those using GS and REML.

Performances of MCMC sampling using NUTS and GS
The R values of the Gelman and Rubin's R convergence diagnostics and the z-scores of Geweke's convergence diagnostics in Scenarios 1 and 2 using simulated and real PIC pig data are in Tables S1, S2, S3 and S4, respectively, (see Additional file 3: Table S1, Additional file 4: Table S2, Additional file 5: Table S3 and Additional file 6: Table S4). The convergences of the MCMC samplings using NUTS with the LKJ prior were established. Using NUTS with the IW prior, the R values were smaller than the value of 1.1 set as criterion, but the z-scores for the five parameters in Scenario 2 using simulated data exceeded the criterion value (1.96). Using GS, some of the R values and the z-scores exceeded the criterion values with both the simulated and real PIC pig data. There were no divergent transitions in both the simulated data and real PIC pig data (see Additional file 7: Table S5). The numbers of leapfrog steps ranged from 36.7 to 63.7 and the tree depths ranged from 5.0 to 5.6. These two parameters were below the limit values defined in Stan.

Computing time
Total computing times for REML were much shorter than those for NUTS and GS (See Additional file 8:  Table S6). The computing times per MCMC iteration for NUTS were longer than those for GS in all cases. The total computing times of 2000 iterations for NUTS were similar to those of 10,000 iterations for GS in Scenario 1 for both the simulated and real PIC pig data. In Scenario 2, the total computing times for NUTS with the LKJ and IW priors were 2.5 times longer compared to those for GS.

Discussion
Multi-trait analysis using mixed models tends to be more powerful and to provide more accurate estimates than single-trait analysis because the former method can take the underlying correlation structure that is present in multi-trait data into account. Thus, the estimation of (co)variance and correlation parameters in multi-trait analysis is an important topic in animal breeding programs. However, Bayesian and non-Bayesian inferences for multi-trait mixed models are complex. In this study, we focused on the NUTS approach and implemented NUTS with LKJ and IW priors for a multi-trait animal model using the recently developed software Stan, and compared the results with the commonly used REML and GS methods. The results obtained with the simulated and real pig data indicate that the estimation of genetic parameters for a multi-trait animal model is improved by using NUTS with an LKJ prior, particularly when the population size is small. Moreover, for real pig data, NUTS can provide the unimodal and bilaterally symmetrical posterior distributions of genetic correlations regardless of the level of the heritability. The NUTS approach has two advantages over GS. First, the NUTS algorithm is extremely effective for the MCMC sampling process because it can generate samples from a wide range of parameter spaces with a high level of acceptance probability and automatic tuning of the hyperparameters of HMC. Nishio and Arakawa [22] demonstrated how to use Stan for a single-trait animal model and showed that the mixing properties of Stan were better than the GS with no tuning. Second, conjugate priors are not necessary for NUTS, which opens up the possibility of other potentially beneficial priors. The choice of prior is important and can influence the posterior, particularly when the amount of data is small. A common choice is a conjugate prior, where both the prior and the posterior have the same distributional form. Typically, a conjugate prior is chosen to provide analytical solutions for the posterior and is a requirement for GS. The natural conjugate prior for a multivariate normal distribution is the IW distribution [28]. However, IW priors impose a degree of informativity and the posterior inferences are sensitive to the choice of hyperparameters [37] and there is an a priori dependence between correlations and variances [38]. These characteristics of the prior frequently result in biased estimates in the analysis of small datasets. In this study, we used the LKJ distributions as priors of correlations in the NUTS approach. This is one of the separation strategies in which the standard deviations and correlations are modeled independently and then combined to form a prior on the covariance matrix [30]. Alvarez et al. [39] showed that the separation strategy resulted in a better inference property than the use of an IW prior. In this study, we compared the performance of NUTS with LKJ and IW priors. When the population size was small, the RMSE and MAE of trait1 (trait with a low heritability) using NUTS with an IW prior were notably larger than those using NUTS with an LKJ prior.
In addition, Geweke's convergence diagnostic (z-scores) using NUTS with an IW prior were larger than the criterion value. These results indicate that the performance  of NUTS with an LKJ prior was superior to that of NUTS with an IW prior for estimating variance components and MCMC sampling convergence.
For an LKJ distribution, one positive scalar hyperparameter ( η ) tunes the strength of the correlations. In this study, we varied the values of η from 0.25 to 4.0 in the simulation Scenario 1. The effect of η on the performance of NUTS with an LKJ prior was negligible except when the values of η were very small. Our results indicate that values of η exceeding 0.5 are preferable. The Stan manual also recommends η ≥ 1.
Few studies have compared the performance of REML and GS for the estimation of variance components in a multi-trait analysis. In the animal breeding literature, Van Tassel and Van Vleck [19] reported that the posterior means of GS and REML estimates for additive genetic variances and correlations were quite similar for traits with a high heritability. In the plant breeding literature, Waldmann and Ericsson [40] reported that REML estimates were accurate and that the posterior means of GS were overestimated based on the results of two simulated traits with heritabilities of 0.1 and 0.5, respectively. These results are in concordance with those of our study: the estimates of the additive genetic variances and heritabilities obtained with GS were overestimated when using simulated data, particularly when the population size was small. However, in the simulation study of Mathew et al. [41], GS provided better estimates for the additive genetic correlations than the REML approach with a dataset for traits with a low heritability. The performance of GS could be strongly influenced by a prior with a low heritability. Thus, the choices of inference methods and priors are complex for multi-trait analyses and could be solved by using the NUTS algorithm with an LKJ prior as shown here. The predictions of breeding values by NUTS were superior to those by GS and REML when the population size was small. In animal breeding, there are cases where small datasets need to be analyzed, e.g. for rare breeds that are maintained in small population sizes. In Japan, 18 local public animal experimental stations have performed selection experiments in closed small populations for several generations using estimated breeding values. Moreover, for many difficult-to-measure or expensive traits, such as methane emission, heat tolerance, individual feed intake or immune response, NUTS is a promising sampling method for multi-trait analysis.
In this study, we used genomic information to generate the relationship matrix. The MCMC implementation of the animal model can become extremely slow when using a genomic relationship matrix instead of a pedigreebased relationship matrix. In order to decrease computing requirements, Villemereuil [42] suggested two promising approaches: Integrated Nested Laplace Approximations (INLA) [43] and HMC. Mathew et al. [41] showed that the genetic parameter estimates for the INLA approach and the MCMC method were almost the same in a multi-trait animal model when relationship matrices were dense. They concluded that the INLA approach could be a fast alternative to MCMC methods for multi-trait animal models. Our study showed the computing times of NUTS derived from HMC. The total and per iteration computing times of NUTS were longer than those of GS with both the simulated and real PIC pig data when the sample size was large. Conversely, the convergence performance of NUTS with an LKJ prior was superior to that of GS because the convergence conditions were sufficiently established for both the Gelman and Rubin's R convergence and the Geweke's convergence diagnostics in all scenarios. In addition, the effective sample size for NUTS is much larger than that for GS [22]. These results indicate that the computing time of NUTS with an LKJ prior could be reduced by decreasing the number of MCMC iterations. Recently, Arakawa et al. [23] developed the HMC method with optimized tunings of hyperparameters in a single-trait animal model. This method outperformed GS in terms of sampling from a wider range of parameter spaces. The computing time for their method was similar to that for GS. Thus, further study is needed to apply this method to multi-trait animal models.
Developing a program for NUTS is challenging because of its very complex algorithm; however, this can be overcome by using Stan, which has a simple programming language. In this study, we used Stan because a Bayesian model is implemented by defining its likelihood and priors. Recently, Burkner [44] developed the "brms" package, which allows R users to easily specify a wide range of Bayesian single-and multi-level models that are fitted with Stan. This package allows the writing of models in a relatively straightforward R syntax. Thus, it might be possible to write the program code of a multi-trait animal model easily using brms.

Conclusions
In this paper, we applied the NUTS approach with LKJ and IW priors to a multi-trait animal model and showed its performance for estimating variance components and breeding values. The simulated data showed that, compared to NUTS with an IW prior, GS and REML, most of the estimates of genetic parameters obtained by using NUTS with an LKJ prior were closer to the true values and RMSE and MAE were smaller. These tendencies were remarkable when the population size was small. The convergence performances of MCMC samplings using NUTS with an LKJ prior were superior to those of NUTS with an IW prior and to GS. Moreover, the accuracies of estimated breeding values for NUTS with LKJ and IW priors were higher than those for GS and REML when the population size was small. The real PIC pig data showed that the effect of population size on estimating genetic correlations using NUTS with an LKJ prior was smaller than that using GS and REML. For both the simulated and real PIC pig data, the genetic variances and heritabilities using NUTS with an IW prior were overestimated for traits with a low heritability when the population size was small. Developing a NUTS program for a multi-trait animal model is challenging because of its very complex algorithm but this can be overcome by using Stan and its simple programming language. However, application of NUTS to large datasets requires further study because the NUTS algorithm requires much computing time. Therefore, we conclude that NUTS with an LKJ prior could be an alternative sampling method for multi-trait analysis in animal breeding, particularly when the population size is small.