Bayesian analysis of calving ease scores and birth weights

Dans une procedure typique a deux etapes, l'evaluation genetique pour la difficulte de velage dans un modele a seuil est conditionnee par les matrices de covariance genetiques et residuelles. Ces matrices de covariance sont habituellement estimees au travers d'approximations analytiques. On decrit l'echantillonnage de Gibbs permettant d'effectuer des inferences bayesiennes completes a propos des effets fixes, des valeurs genetiques, des seuils, et des matrices de covariance genetiques et residuelles, pour analyser conjointement un caractere discret a categories multiples ordonnees (note de difficulte de velage) et un caractere continu gaussien (poids de naissance). L'echantillonnage de Gibbs est assez simple a partir de densites de divers types : normale (tronquee), uniforme et Wishart inverse. La methode est utile pour estimer les parametres genetiques a partir de leurs distributions marginales a posteriori, apres prise en compte des incertitudes concernant les autres parametres. L'echantillonnage de Gibbs n'est pas faisable en routine pour estimer les valeurs genetiques. On suggere le mode de la distribution conjointe a posteriori, pour des valeurs des seuils et des parametres de dispersion correspondant a leurs moyennes a posteriori. On decrit une analyse de notes de difficulte de velage et de poids de naissance simules.

making implementation of Gibbs sampling straightforward. The method should be useful for estimating genetic parameters based on features of their marginal posterior densities taking into full account uncertainties in estimating other parameters. For routine, largescale estimation of location parameters (breeding values), Gibbs sampling is impractical.
The joint posterior mode given the posterior mean estimates of thresholds and dispersion parameters is suggested. An analysis of simulated calving ease scores and birth weights is described. dystocia / beef cattle / threshold model / Bayesian method / Gibbs sampling Résumé -Analyse bayésienne des notes de difficultés de vêlage et des poids de naissance. Dans une procédure typique à deux étapes, l'évaluation génétique pour la diff'-cculté de vêlage dans un modèle à seuil est conditionnée par les matrices de covariance génétiques et résiduelles. Ces matrices de covariance sont habituellement estimées au travers d'approximations analytiques. On décrit l'échantillonnage de Gibbs permettant d'effectuer des inférences bayésiennes complètes à propos des effets fixes, des valeurs génétiques, des seuils, et des matrices de covariance génétiques et résiduelles, pour analyser conjointement un caractère discret à catégories multiples ordonnées (note de difficulté de vêlage) et un caractère continu gaussien (poids de naissance). L'échantillonnage de Gibbs est assez simple à partir de densités de divers types : normale (tronquée), uniforme et Wishart inverse. La méthode est utile pour estimer les paramètres génétiques à partir de leurs distributions marginales a posteriori, après prise en compte des incertitudes INTRODUCTION Calving ease is considered a calf trait and recorded subjectively as one of several exclusive ordered categories. For example, for American Simmental cattle, calving ease is scored as 1 (natural calving, no assistance), 2 (easy pull), 3 (hard pull) or 4 (mechanical force or Cesarean). Calf size (birth weight) affects ease of birth: the bigger the calf is, the more likely the birth will be difficult (Koger et al, 1967;Pollak, 1975).
In this paper, we consider joint modeling of calving ease scores and birth weights using the threshold model concept of Wright (1934). In a threshold model, an underlying continuous variable is postulated for calving ease. A set of thresholds divides this continuous variable into the discrete calving ease scores actually recorded. Gianola (1982) and Gianola and Foulley (1983) considered Bayesian analysis of single trait threshold models assuming known genetic variance. Harville and Mee (1984) and Foulley et al (1987) gave approximate methods for variance component estimation. Foulley et al (1983) developed a method to deal with a binary trait and two continuous traits without allowing for missing data, while Janss and Foulley (1993) extended the method to handle data with missing patterns. In 1990 at Cornell University, a system for routine sire evaluation of calving ease scores and birth weights jointly allowing for all possible missing data framework was implemented. This system assumed a sire-mgs (maternal grandsire) linear model for the underlying scale; it predicts the frequency of unassisted births for American Simmental cattle (Pollak et al, 1995 pers comm). This evaluation system also assumed that genetic and residual covariance matrices and thresholds were known. Variance components were estimated (Dong et al, 1991) (1983) and Janss and Foulley (1993) to a situation of one multiple ordered categorical trait and several continuous traits.
A difficulty in estimating parameters under threshold models is that the likelihood or marginal posterior distributions do not have closed forms and approximations are used. With the help of Monte Carlo methods, in particular Gibbs sampling (Geman and Geman, 1984;Gelfand et al, 1990), these approximations are no longer needed. Wang et al (1993, 1994a,b) described making Bayesian inferences in a univariate linear model in an animal breeding context using Gibbs sampling. Sorensen et al (1994) demonstrated how inference about response to selection in a linear model can be made. Berger et al (1995) applied the methods of Sorensen et al (1994) and Wang et al (1994b) to analyze a selection experiment of Tribolium. Jensen et al (1994) and Van Tassell (1994) extended the procedure to model maternal effects, while  further expanded the scope to multitrait linear models. Bayesian analysis of univariate thresholds via Gibbs sampling in an animal breeding context was recently described by Sorensen et al (1995) by extending Albert and Chib (1993). For a binary trait,  compared frequency properties of three variance component estimators: mode of approximate marginal likelihood (Foulley et al, 1987), marginal posterior mode and mean via Gibbs sampling. Jensen (1994) analyzed simulated data of one binary trait and one continuous trait via Gibbs sampling under a Bayesian framework. Wang et al (1995) gave a Bayesian method to analyze one multiple ordered categorical trait and one continuous trait with Gibbs sampling.  presented Bayesian analysis of twinning and ovulation rate using Gibbs sampling.
The purpose of this paper is to extend the work of Sorensen et al (1995) and Wang et al (1995) to one multiple ordered categorical trait (calving ease) and one continuous trait (birth weight) with all possible missing patterns of data under a Bayesian setting via Gibbs sampling. A set of full conditional posterior densities will be derived in closed form facilitating straightforward implementation of Gibbs sampling. Simulated data are analyzed to illustrate the methodology. MODEL Let Y lo be a vector of birth weights (BW), with o denoting observed record, and Y 20 be calving ease scores (CE, recorded as one of four scores (1 = no assistance, 2 = easy pull, 3 = hard pull and 4 = mechanical force or Cesarean), U lo = Y lo and U 2o be the corresponding underlying variable for observed calving ease scores, which is also known as augmented 'data' (Tanner, 1993). A sir!mgs model used for the American Simmental calving ease sire evaluation (Quaas, 1994) is assumed on the underlying scale (an animal model can be assumed by appropriately defining terms): where a i is sire effect (direct), and m m g s is maternal grandsire effect (1/4 direct BV plus 1/2 maternal BV) and e lo and e 20 are residual effects. AgeDam is age of dam effect and CG is contemporary group effect. Note that maternal effects for BW are not modeled for the Simmental population because the maternal contribution to the total genetic variance was found to be negligible (Garrick et al, 1989 where U contains U l and U z , W is composed of the design matrices -X i s, Z i s and rows of zeros associated with missing data, e contains location parameters 13 1 , 13 2 , a I , a 2 and m z , and e, the residuals e l and e z , with all the elements properly ordered and matched. For U, we assume where R is a block diagonal matrix containing n submatrices of residual covariances (R o ) for the record(s) of a particular animal with dimension 2, ie, R = R o &reg; 1,, if the data are sorted by animal and trait, and n is the number of animals with at least one trait recorded.
A uniform prior distribution is assigned to [3 i s, such that (eg, Gianola et al, 1990;Wang et al 1994a) which is similar to treating 0 as fixed in a traditional sense. We assume for the bull effects (genetic): where a contains a l , a 2 and m, q is the number of bulls (sires and mgs), G = G o l8iA with Go = Igij 1, i, j = 1, 2, 3, the covariance matrix among three genetic effects for a particular animal and A is the numerator relationship matrix among sires and mgs.
To describe prior uncertainty about Go, an inverted Wishart distribution (Johnson and Kotz, 1972;Jensen et al, 1994; is assigned with density where Sg is the location parameter matrix; vg is the scalar shape parameter (degrees of belief); Sg = E(G o iSg, Vg). A large value of vg indicates relative certainty that Go is similar to Sg; a small value, uncertainty, ie, a relatively flat distribution. (The subscript 3 of IW indicates the order of the covariance matrix.) Similarly for R o , The final parameters are the thresholds: t = (t l , t 2 , t 3 ), with to = -00 and t 4 = oo. These are assumed to be distributed as order statistics from a uniform distribution in the interval !tm;n, t max] (Sorensen et al, 1995): where I(.) is an indicator function and c = 4 in our case, the number of categories.
Applying Bayesian theorem, the joint posterior density of all the parameters including the augmented data (8, t, Go, R o , e lm , U 2 ) given the observed data (Y' = !Yio!'io!) and prior parameters, assuming prior independence of t, Go and with w li and/or W2 i the incidence vectors associated with 8 1 and/or 0 2 for the ith animal's record(s), respectively.
The first term in [8], p(Y 2oI U 2o , t), has a degenerate density (Albert and Chib, 1993;Sorensen et al, 1995): where I(.) is an indicator function. For example, for a particular CE score (= k), we have One way to ensure identifiability of parameters is to set constraints. Assuming full column rank for the incidence matrix W, two constraints need be specified. Usually, one threshold and the residual variance for CE on the underlying scale are set to 0 and 1, respectively (Harville and Mee, 1984). An equivalent parameterization (Sorensen et al, 1995) is to fix two thresholds and to estimate the CE residual variance. We followed the latter because it allows easy specification of the conditional density of R o . The two parameterizations, though equivalent, may not yield the same joint posterior density owing to different sets of priors specified.
Inference about location and dispersion parameters will be based on the joint posterior density [9], or on their respective marginal posterior densities. For example, if interest of inference is on the location parameters, we need to integrate out all other parameters in [9] other than e to obtain its marginal posterior density: Similarly, inference about Go is based on: These densities cannot be derived analytically. Monte Carlo methods, such as Gibbs sampling, draw samples from !9!. Such samples, if considered jointly, are from the joint posterior distribution or, viewed marginally, from an appropriate marginal posterior distribution. Inferences can be based on these drawn samples. Inferences about functions of parameters, such as heritabilities and genetic correlations, can be made based on transformed samples.

Fully conditional posterior densities (Gibbs sampler)
The Gibbs sampler consists of a set of fully conditional posterior densities of unknown parameters in the model, ie, the conditional density of a parameter given all other parameters and the data. These can be derived from the joint posterior density [8] or !9! . For location parameters (0), we keep terms in [8] that are functions of 0 such that: where S2 = L ! G_1 A -1 ' with blocks of Os corresponding to j3 (Gianola et al, lo Gol(&A-11, l 1990). This is a normal density, so where 6 satisfies Henderson's mixed model equations (MME) (Henderson, 1973(Henderson, , 1984: To sample a subvector or a scalar of 0, rewrite the MME as where C = {CZ! }, i, j = 1, 2, ... , N, is the coefficient matrix of the MME, with C jj s as blocks of C, and b = {b i }, i = 1, 2,..., N is the corresponding right-hand.  Sorensen, 1996). From !9!, the full conditional posterior density of the genetic covariance matrix, as in Gaussian linear models , is Similarly, the fully conditional posterior density of the residual covariance matrix is Now we proceed to derive the full conditional posterior densities of the underlying variable for CE, U Zo , used in !15! and of the missing residuals, e lm and e 2m , needed for SS e in !17!. From [8], in general, These distributions depend on which combination of records is observed for a calf: BW only, CE only or both BW and CE. For a particular calf, if a BW is observed (U lo ,i = Y l .,i) but CE is not, we need only to sample e 2m , i for CE, the distribution is only involved with p(UI8, Ro), which follows a univariate normal distribution with density: where 0(.) is a normal density function; / -t = be lo ,i = b(u lo , i -w!i8¡); w 12 is the incidence vector associated with A 1 ; Q 2 = r 22 -rî2/rll; b = r 12/ r l {r2!} = Ro.
If both BW (U li = Y ii ) and CE (Y 2i = k)) are observed, then only U 2o , i needs to be sampled, This is in a form of univariate truncated normal (TN) distribution such that with tk-1 < U20,i ! tk , where p = w!i82 + b(U1o , i -w!0i), ! as in [18] and w 21 is the incidence vector associated with 8 2 .
If only a CE (Y 2i = k) is observed, then both e lm , i and U 20 , 1 need to be sampled from a truncated bivariate normal, ie, Finally, the conditional posterior distribution of a threshold is uniform (Albert and Chib, 1993;Sorensen et al, 1995), if CE is not missing: if CE is missing.
As mentioned previously, for t = (t l , t 2 , t 3 ), there is only one estimable threshold, which to estimate is arbitrary. We took: Note that t l < t 2 . If only three categories of CE scores are available, there is no need to estimate thresholds under this parameterization. If the fourth category was rare, it would be tempting to combine scores into three categories to avoid estimating thresholds. sampler for our model. Gibbs sampling repeatedly draws samples from this set of full conditional posterior distributions. After burning-in, such drawn numbers are random samples, though dependent, from the joint posterior density !9!. Let the Gibbs samples of length m for a particular parameter, say for the direct genetic variance component g ll for BW, be x = {xi}, i = 1, 2 ... , m. An estimate of the mean of the marginal posterior density, p(g nI Y), is: and the posterior variance can be estimated by:

GIBBS SAMPLING AND POST-GIBBS ANALYSIS
Modes and medians can also be used to estimate location parameter of a posterior density (Wang et al, 1993), though usually requiring more Gibbs samples because the density needs to be estimated first. Both Geyer (1992) and used by Sorensen et al (1995).

ROUTINE GENETIC EVALUATION
The preceding sections describe a Bayesian analysis via Gibbs sampling for inferences about all the parameters in the model including fixed effects, (functions of) breeding values, genetic and residual covariance matrices, and thresholds based on marginal posterior densities. This is sensible because all uncertainties in estimating other parameters are taken into account when inference is made about a particular parameter of interest, say for the breeding value of a sire. However, it is computationally expensive to carry out large scale analyses routinely. A practical compromise is to estimate covariance matrices and thresholds using a full Bayesian analysis via Gibbs sampling once and subsequently to estimate location parameters based on conditional densities of location parameters given the estimated dispersion and threshold parameters. As data accumulate, covariance matrices and thresholds are reestimated. Explicitly, we suggest a two-stage procedure that might be useful for a large scale routine genetic evaluation program: 1) Estimate Go, R o and t using mean (mode or median), based on their respective marginal posterior densities, p(Go!Y), p(R oI Y) and p(t!Y), via Gibbs sampling, dropping prior parameters Sg, S e , V9 and Ve in notation for convenience; and 2) Estimate location parameters based on the following conditional density: which is an approximation to the corresponding marginal density: if the marginal density, p(t, Go, Ro !Y), is symmetric or peaked (Box and Tiao, 1973;Gianola and Fernando, 1986). In other words, if there is sufficient information in the data to estimate G o , R o and t well, then [25] is a good approximation to !26!.
This would be the case if the data set is one of the national data bases, for example, the American Simmental data set. Note that the underlying variable for CE (U 20 ) and the residual vector associated with the missing data (e im and e 2m ) have been integrated out of the joint posterior density !9!; that W matrix no longer contains blocks of Os corresponding to the missing data, however, the same notation is kept below. Note also that o and m denoting observed and missing data are dropped from the notation because missing data no longer play a role.
The joint posterior mode of [25] can be considered as point estimates for e, which is also known as maximum a p osteriori, or MAP for short . There are at least two ways to compute the MAP estimates: expectation-maximization (EM) (Zhao, 1987;Quaas, 1994Quaas, , 1996 and Newton-Raphson Foulley et al, 1983).
The EM iteration equation (Quaas, 1996) (Janss and Foulley, 1993;Quaas, 1994Quaas, , 1996, following closely the notation of Quaas (1996) (BIF, 1990) is: If information contained in the data conflicts with prior belief, then posterior variance could be larger than prior variance resulting in a negative accuracy. This is peculiar to a frequentist, particularly to a producer, why after collecting data on his animal, the uncertainty about his animal's BV has increased! Posterior variances of breeding values are usually approximated, based on large sample theory, by the inverse of Fisher's (expected) information matrix or by the inverse of negative Hessian matrix. The latter approximation is: where l = log{[25]}. This is the inverse of the coefficient matrix of [29]. Note that [27], the inverse of the coefficient matrix used for EM, is not a large sample approximation to the posterior variance matrix of (25!, or, at least, not a very good one. We shall return to this point in the numerical example section below.

NUMERICAL EXAMPLE
A data set representing continuous BW and discrete CE scores with ordered categories, 1-4, of 'Simmental calves' was simulated and analyzed to illustrate the methodology.

Model
The records were simulated under the sire-maternal grandsire model [1]. Model equations for the kth calf with ith bull as sire and jth bull as maternal grandsire were for BW with only direct effects and CE underlying variable with both direct and maternal effects, respectively. The random vectors [a lk , a 2k , m 2k] were drawn from a N(0, Go) distribution, ie, bulls were unrelated. Similarly !2lijk e 2zjk] -N(0, Ro). Parameter values were similar to previous estimates from the Simmental data (Dong et al, 1991). A residual correlation of 0.6 and a genetic correlation matrix, were scaled to obtain R o and Go. Assuming the phenotypic SD for BW was nine units and hb w = 0.4 ! afl! = 1/4 x 0.4 x 81. The CE components, a a2 and a m2' were assumed equal and computed by solving h!e = 2a x 2 4 a2 + x a e2 2 for a; = !a2 2 = a;' . 2 2Q! -I-Uez The residual SD which establishes the scale of U 2 was set to ten to be comparable with BW; hfl! = 0.2. The dispersion matrices were As found in previous analyses of Simmental data (Zhao, 1986(Zhao, , 1987Quaas et al, 1988), thresholds were equally spaced at 1 SD intervals: t i = 0, t 2 = 10 and t 3 = 20. Calving ease scores were assigned such that Y 2zj£ = k if t k _ 1 < U 2 ijk < t k .

Design
There were 75 bulls in three batches of 25. Bulls in the first batch were MGS of the calves sired by bulls in the second batch; bulls 26-50 had two progeny from daughters of each of the first 25 bulls. Likewise, bulls 51-75 (third batch) had two progeny from daughters of each of the second 25 bulls (second batch). Thus bulls 1-50 were each MGS of 50 calves; bulls 26-75 were each sires of 50 calves; only bulls 26-50 (second batch) were both sires and MGS. There were a total of 2500 calves, each with a BW and a CE score.

Gibbs sampling
Priors for 1 -1, t and R o were uniform while that for Go was the inverted Wishart in [5] with Sg a diagonal matrix of the genetic variances used for simulation and vg = 5. The latter 'slightly informative' prior was adopted because during initial testing of the programs, a nearly singular SSg would cause the sampler to crash.
The last Go would look reasonable but would have an effectively zero eigenvalue.
The informative prior was incorporated to ensure (SSg + Sg) was positive definite.
Subsequently, the problem was found in the simulation program where both h 2 were almost zero; Go was nearly singular! The safeguard, however, was left in the program.
The initial dispersion matrices were diagonal matrices of the variances used for simulation, bull effects were null and the average BW was assigned to !1. Thresholds and p 2 were computed from the observed frequencies: p 2 = 10 x !-!-1(Cl)! and t k = 10 x [-t -'(C k ) -!-1(Cl)! where C k is the observed cumulative frequency of scores 1 through k, k = 1, 2, ... 3. The values for t l (= 0) and t 2 were fixed; t 3 was estimated. Initial values are displayed in table I. From these starting values the parameters were repeatedly sampled in the following order: U 2ijk from [19], t 3 from (21!, ( J -L 1 p 2 ) jointly from [15], (a 12 a 2 i m 2 ) jointly from (15], Go from IW 3 (SSg+Sg, 75+5), (16), and R o from IW 2 (SS e , 2 500-3), (17!. Each bull's three effects were sampled jointly as were the two ps. Most of the results to be presented came from a single chain of 10 500 samples; all summary statistics were computed from the last 10 000. We feel a single chain of this length is sufficient to demonstrate the procedures but would probably not suffice for a real application. A few references will be made to results from replicated Gibbs chains and replicated data. Our purpose here, however, was not to study Monte Carlo error, single long chain versus multiple short chains nor the small sample properties of posterior means; no systematic study was carried out on these replications. Comparisons of frequentist properties of estimators of certain parameters are found in Jensen (1994) and .
The most time consuming step was sampling the CE underlying variables. This was due partly to the number of these but also because of the way the truncated normals in [19] were generated, ie, normals were sampled until one was obtained within the range determined by the CE score. The time for a complete cycle could almost triple; this variability was entirely due to generating the truncated normals. A more efficient truncated normal generator might be necessary for a large data set in which the probability of some categories is very small, eg, a score of four for a heifer calf out of a mature dam. In several million records, there will be a handful of these and they will cause problems. Starting values were the same as in the Gibbs sampler. Iteration was continued until log io of the maximum absolute change of any bull effect was < -10 at which point the modal estimates of p and bull effects computed by the alternative algorithms differed by < 10-l0 .

NUMERICAL RESULTS
Visual inspection of plots of parameters (or functions of parameters) against sample number suggests that a burn-in of 500 cycles was probably unnecessary (eg, fig 1). The parameters in figure 1 were chosen to illustrate the markedly different patterns observed as a result of correlations among sequential Gibbs samples. There were marked differences among parameters in these serial correlations (table II) which were particularly high for the threshold, t 3 , and also for the CE residual variance, r 22 . In contrast, mixing for the other r ij , all 9ij and the bull effects was much more rapid. However, overall convergence of a Gibbs chain is tied to parameters with slow mixing properties. In another words, one cannot declare that a Gibbs chain is converged for some parameters but not others.
Estimates of posterior means and SDs are presented in table I. The parameters involving the continuous trait, BW, were reasonably close to the values used to simulate the data. This was not the case, however, for the parameters affecting CE. The dispersion parameters, especially, were markedly smaller than the 'true' values. Too much cannot be made of this from a single sample and we cannot conclude that such a pattern is a consequence of using the marginal posterior mean as an estimator. It is quite possible that it is a matter of identifiability. For example, neither t k nor aú is identifiable but (t k -tk_1)/Q!e is identifiable. The 'true' thresholds were equally spaced 1 residual SD apart; analogous functions of t l , t 2 (fixed) and the posterior means of t 3 and r 22 are 0.996 and 0.954, quite close to the 'true' values. Likewise, estimated heritabilities and genetic correlations, table III, differed considerably less from the 'true' values than did the (co)variances.
For this small example precise estimates of the marginal posterior densities were not attempted. Coarse histograms were examined; most were slightly skewed away from zero; in figure 2 are typical examples. None were 'peaked' in the eyes of the authors.
Though the EM and Newton-Raphson algorithms both gave the same values for the joint conditional modes of [25], there was a marked difference in the pattern of convergence: linear versus quadratic (fig 3). The EM algorithm took six times as many rounds, but elapsed times were close: EM < two times longer. For a large, field data application it is questionable as to which algorithm will be faster; we do not expect a large difference. While EM is easier to code, approximations of SDs are not a by-product as they are if the Hessian is computed.
Correlations among bull effects are presented in table IV. The joint modes given the posterior mean estimates of t 3 , Go and R o , [25], were very highly correlated to the marginal posterior means of [26] despite the posterior densities for t 3 and dispersion parameters being neither symmetric nor particularly peaked. The similarity of marginal mean and joint conditional mode is graphically illustrated for the CE-D effect in figure 4. The regression of mode on mean is slightly greater than unity, 1.08, reflecting the greater spread of modes when there is no uncertainty in thresholds nor dispersion parameters. This was seen for all three bull effects. Correlations, within bull batch, were also computed between true and estimated (mode or mean) bull effects. These correlations caused debate among the authors.
The Bayesian amongst us was not sure how to interpret them whereas the lapsed frequentists had no difficulty whatsoever: a big number is good; a small number is not so good. Except for CE-D for batch 1 bulls and CE-MGS for batch 3 bulls where the estimation came entirely from correlated information, these correlations seem satisfyingly large. Of some interest was the observation that the correlations between true and estimated direct effects, both BW and CE-D, were a bit higher for the batch 3 bulls than for the batch 2 bulls. Both had 50 progeny but the latter also had 50 grandprogeny. What seems to be adding information from grandprogeny, apparently, is noise. This could be due to sampling, though the same pattern was seen in three independent samples of simulated data. The posterior SD were also examined. These were approximated in three ways: from the coefficient matrix of the MME used in EM [27!, from the Hessian used in Newton-Raphson [29] and by Gibbs sampling. In applications these are sometimes used interchangeably but they are not, of course, the same. The first is the posterior SD of p(elU 1, Û2, G_o, R_o) or the frequentist's standard error of prediction (SEP) for BLUP p(elU 1, -f J 2 , Go, R o ) note that this assumes the CE underlying variable is observed. The second is the large sample approximation of the joint posterior SD given t 3 , Go and R o . The last is the marginal posterior SD, subject to Monte Carlo error. These are plotted for the bulls' BW effects in figure 5. It is immediately obvious that the first behaves quite differently to either of the latter two. The former depends only on Go, R o and the information that is available, hence a different value for each of the three batches of bulls. The second also depends on Go and R o (and t 3 ,) but also on j!, â l , â 2 and m 2 minimally, hence the values vary a bit but noticeably so only for the bulls with progeny. (Considerably more variation was seen for the CE effects and the SDs were larger, in general, reflecting the fact that U 2 was not actually observed.) In marked contrast, the marginal posterior SD are all over the place. This is not just due to Monte Carlo error, see figure 6 where results from two independent Gibbs samples are plotted. It is due to the dependence of the SD on the posterior means. This is shown graphically in figure 7. Presumably this pattern is due primarily to the uncertainty of the dispersion parameters. Heuristically, a bull with average progeny BW will have zero â 1k regardless of the value of (or uncertainty about) hb w whereas the uncertainty about (SD) a lk for a bull whose progeny BW depart markedly from average will depend on uncertainty about hb w .

DISCUSSION AND CONCLUSIONS
Following closely the work of Sorensen et al (1995), we have extended the Gibbs sampling scheme to make full Bayesian inferences about location, dispersion, and thresholds from modeling one multiple ordered categorical trait (CE) and one continuous variable (BW) with the possibility of missing patterns of data. Inferences about genetic and residual covariance matrices and thresholds are based on their respective marginal posterior densities in a unified fashion without analytical approximations, in contrast to traditional methods based on approximations (Foulley et al, 1987;Harville and Mee, 1984).
Our model can be expanded to include heterogeneous variances in a similar way as for linear models (Gianola et al, 1992). Foulley and Gianola (1996) expanded a log-linear structural model to describe heterogeneous variances (Foulley et al, 1990;San Cristobal et al, 1993) to a single-trait threshold model. This idea can be extended to model heterogeneous variances in our situation as well.
It is advantageous to estimate location parameters including fixed effects and breeding values jointly with dispersion parameters because uncertainties in estimation of dispersion parameters can be taken into account, particularly in small populations. However, in real genetic evaluation systems with data sets of millions of records, joint modeling may be neither possible nor necessary. With good estimates for dispersion parameters and thresholds from the joint posterior density [9] from large data sets, we argue that the conditional posterior density of location parameters [25] with such estimated dispersion parameters as fixed values is a good approximation to the marginal posterior density of location parameters !26!. In our numerical example the joint mode of [25] approximated well the marginal posterior means of [26] even though the dispersion parameters were not very well estimated.
The posterior SD of [26], however, was not well approximated by the inverse of the coefficient matrix of [27] or [29]. The latter are large sample approximations; our data were not numerous.
A main purpose of the data augmentation used in the paper was to result in fully conditional posterior distributions of parameters in standard recognizable forms such that samples were easily drawn. We chose to augment the underlying variables of observed CE (U 2o ) and the residuals associated with missing BW and CE (e lm and e 2m ) l the whole augmented data vector was (e!m U!oe!m)' Another way is to augment the underlying variables of observed CE (U 2o ) and that of missing CE (U 2m ), and continuous data corresponding to missing BW (Y l , r ,); the whole augmented data vector is (Y!m!2o!2m)- Sorensen (1996) gave a parallel treatment of the problem under the latter data augmentation. In general, the whole design matrix W is considered to be fixed. In other words, the design vector w i associated with an observed or a missing (either BW or CE) calf record is assumed to be either fully or partially known and fixed. Our approach to data augmentation was, implicitly, equivalent to treating w i s associated with missing BW and CE as random and to assigning uniform priors to them, and integrating them out of the joint posterior density, as opposed to Sorensen (1996) in which the joint posterior density was conditioned on those w i s of missing BW and CE. We conjecture that our approach has computational advantages while not sacrificing theoretical simplicity, particularly when the missing data rate is high. For example, for American Simmental, about 1/3 of the records have either BW or CE missing.
It is clear also that other combinations of data augmentation are possible, such as ( e lm U 2 o U 2m) (Y!m U!oe!m) or (ei m e2 o ez m ). If no data augmentation is used, fully conditional distributions of certain parameters may not be in recognizable forms and alternative sampling procedures such as rejection sampling need to be used. Zeger and Karim (1991) presented algorithms for a single trait threshold model without data augmentation.
There is a danger in using improper priors in a Bayesian analysis. In a linear model setting, some improper priors induce improper joint posterior densities even though the fully conditional posterior densities are well defined (Hobert and Casella, 1996). We do not know whether or not the uniform priors we used in the numerical example for p, t and R o will induce a proper joint posterior density !9!. The fact that no difficulties were encountered in analysis does not necessarily mean that [9] was suitable. The safe way may be to use a noninformative but proper prior in an analysis.