Irreducibility and efficiency of ESIP to sample marker genotypes in large pedigrees with loops

Markov chain Monte Carlo (MCMC) methods have been proposed to overcome computational problems in linkage and segregation analyses. This approach involves sampling genotypes at the marker and trait loci. Among MCMC methods, scalar-Gibbs is the easiest to implement, and it is used in genetics. However, the Markov chain that corresponds to scalar-Gibbs may not be irreducible when the marker locus has more than two alleles, and even when the chain is irreducible, mixing has been observed to be slow. Joint sampling of genotypes has been proposed as a strategy to overcome these problems. An algorithm that combines the Elston-Stewart algorithm and iterative peeling (ESIP sampler) to sample genotypes jointly from the entire pedigree is used in this study. Here, it is shown that the ESIP sampler yields an irreducible Markov chain, regardless of the number of alleles at a locus. Further, results obtained by ESIP sampler are compared with other methods in the literature. Of the methods that are guaranteed to be irreducible, ESIP was the most efficient.


INTRODUCTION
QTL mapping includes the estimation of the locations of QTL, of the magnitudes of the QTL effects, and of the frequencies of QTL alleles. When QTL genotypes cannot be observed, marker genotypes are used together with trait phenotypes to map QTL by marker-QTL linkage analysis.
Typically, the mixed model of inheritance is used in linkage analyses. Under this model, the trait is assumed to be influenced by a single QTL linked to a marker (MQTL) and the remaining QTL are assumed to be unlinked to the marker (RQTL). Further, methods and programs (e.g. Loki) are also available for multiple QTL. The additive effects of the RQTL are usually assumed to be normally distributed. Under this model the marker-MQTL parameters can be estimated by likelihood or Bayesian approaches.
Both these approaches require computing the likelihood for the model given the observed pedigree, marker genotypes and trait phenotypes. Except for small pedigrees (less than 20 individuals), it is not feasible to compute the exact likelihood for the mixed model of inheritance [1,7,10,11]. Therefore, alternative models have been adopted for which the likelihood can be computed efficiently [1,7,28], or approximations of the likelihood for the mixed model of inheritance are used [10,11,20]. However, these approaches are limited because they cannot easily accommodate more general models.
Markov chain Monte Carlo (MCMC) methods have been proposed to overcome these limitations. In the application of MCMC to likelihood and Bayesian methods, samples are obtained from the conditional distributions, given the observed data, for the missing marker genotypes, the MQTL genotypes, and the additive effects of the RQTL [9,15,31,33]. Further, in the Bayesian approach samples are also obtained from the posterior distribution of the parameters in the model [15,31,33].
The scalar Gibbs sampler provides the easiest method to sample genotypes, where each genotype of an individual is sampled conditional on the genotypes of all the remaining pedigree members. Due to the Markov property of pedigrees [24], the genotype of an individual depends on only its phenotype and the genotypes of its neighbors -parents, spouses, and offspring. Because of this Markov property, the Gibbs sampler is easy to implement. However, Thomas and Cortessis [31] used a hypothetical example to show that when a marker locus has more than two alleles, sampling using the scalar Gibbs sampler may not yield samples from the conditional distribution because the resulting chain may not be irreducible. A chain is said to be irreducible if the probability is nonzero for moving between any two points in the state space in a finite number of steps.
Even when the chain is irreducible, samples may be highly correlated, which is called slow mixing. This is due to the dependence between genotypes of parents and progeny, with larger progeny groups causing greater dependence [15]. One strategy that was proposed to overcome this problem is the use of blocking Gibbs, which consists of sampling a block of genotypes jointly [15,17]. Although blocking Gibbs improves mixing, it does not result in a chain that is guaranteed to be irreducible [16]. Ideas to jointly sample the genotypes in complex pedigrees were independently proposed by Heath [13] and Fernández et al. [5]. These approaches propose to use an approximate method to obtain candidates that are accepted or rejected by a Metropolis-Hastings step. Heath [13] stated that the approximate peeling method of Thomas [30] seems to be a promising proposal distribution to obtain those candidates. Fernández et al. [5] proposed to use a "modified" pedigree as a proposal distribution. This "modified" pedigree is obtained by cutting the loops [29] and extending the pedigree at the cuts [34]. It has been shown that results obtained by "cutting" and "extending" the pedigree can also be obtained by iterative peeling without explicitly modifying the pedigree [34].
Fernández et al. [6] implemented a sampling method that combines Elston-Stewart algorithm and iterative peeling, which is called ESIP, to sample genotypes jointly from the entire pedigree. In Fernández et al. [6], the mixing properties of ESIP for a trait genotype were examined and documented. In this paper, we show that ESIP results in an irreducible and aperiodic chain even when sampling genotypes at a marker locus with more than two alleles. Here we present a brief description of the method of sampling, a proof that the resulting chain is irreducible and aperiodic, a strategy to improve the efficiency of the sampler, and a comparison of the proposed method with other methods.

METHOD FOR SAMPLING GENOTYPES JOINTLY
The method to sample genotypes jointly has been described in detail by Fernández et al. [6]. Here, only a brief description is provided to introduce the concepts necessary to prove irreducibility and aperiodicity.
When the pedigree does not have loops or the pedigree contains only simple loops, the entire pedigree is peeled using the Elston-Stewart algorithm [3]. Then genotypes are sequentially sampled using reverse peeling [14,15,17]. If the pedigree has complex loops, exact peeling is not feasible [16] and a joint sample is obtained from a pedigree modified to make peeling feasible [6]. This modified pedigree is used to generate the candidates in a Metropolis-Hastings algorithm [12,23].
This approach to jointly sample marker genotypes is now illustrated with the small pedigree shown in Figure 1a, where the marker genotypes m 3 and m 4 for individuals 3 and 4 are missing.
This pedigree is simple enough to be peeled exactly. However, to illustrate the proposed method the pedigree can be modified as shown in Figure 1b, where individual 4 * has been introduced to remove the loop. This individual is assigned the same genotype as 4, i.e., 4 * is assigned a missing genotype. A pedigree that is modified by duplicating a single individual as shown in Figure 1b will be referred to as a "cut" pedigree. In a cut pedigree, there are two kinds of individuals: those that correspond to individuals in the original pedigree and those that are introduced. Now, the missing genotypes for the original individuals in the cut pedigree can be sampled by reverse peeling [6,14,17]. For example in Figure 1b which is computed using the Elston-Stewart algorithm [3,6]. Then, m 3 is sampled from This gives a joint sample for m 3 and m 4 from Pr(m 3 , m 4 |m 1 , m 2 , m 5 , m 6 ).
In general, the missing genotypes for the original individuals are sampled conditional on the observed genotypes. This sample from the cut pedigree is either accepted or rejected according to Metropolis-Hastings algorithm as described below.
We use a special case of the Metropolis-Hastings algorithm known as the independence sampler. Let y be the vector of observed genotypes and m the vector of missing genotypes. In this algorithm, the candidate draw is accepted with probability where π(x) is the probability of sampling x from the pedigree in Figure 1a conditional on y, q(x) is the probability of sampling x from the pedigree in Figure 1b conditional on y, m c is the candidate sample obtained from the pedigree in Figure 1b, and m prev is the last vector of genotypes that was accepted. In general, the probability π(m) can be computed as where To compute q(m) we multiply the probabilities that were used in the sampling process. For this example, q(m) is q(m) = Pr(m 4 |y) Pr(m 3 |m 4 , y). (3)

Proof of irreducibility and aperiodicity
Let I be the state space for the vector of unobserved genotypes in the unmodified pedigree, and let m i and m j be two arbitrary states from I. The Markov chain for sampling genotypes is irreducible if the probability of moving from m i to m j in a finite number of steps is nonzero. We show below that for the ESIP sampler, the probability of going from m i to m j in one step is nonzero. This probability of going from m i to m j is Note that π(m i ) > 0 and π(m j ) > 0 because m i and m j are in I. Further, as shown in the Appendix, if π(m) > 0 then q(m) > 0. So in (4), η(m i , m j ) > 0 and q(m j ) > 0, and thus Pr(m j |m i ) > 0. This shows that the chain has a nonzero probability of moving from any state m i to any other state m j in a single step. Thus, this proves that the chain is irreducible and aperiodic.

IMPROVING EFFICIENCY
Sampling genotypes as described above can be inefficient in a pedigree with many loops. To illustrate, consider the case of a biallelic marker locus with alleles M 1 and M 2 . In the pedigree in Figure 1a, the marker genotypes of individuals 3 and 4 are unobserved. To sample genotypes we introduce individual 4 * to remove the loop (Fig. 1b). Assume that the genotypes of 1, 2, To compute η we also need q(m prev ). This quantity has already been calculated from a previous round of the sampler. Further, we need the probabilities π(m c ) of the candidate sample m c and π(m prev ) of the accepted sample m prev from the previous round. Computing π(m c ) is straightforward using (2). Again, π(m prev ) has already been computed in the previous round of sampling.
Suppose that m 3 was sampled as M 2 M 2 and m 4 as and π(m c ) = 0 because individual 4 with genotype M 2 M 2 cannot have offspring 5 with genotype M 1 M 1 . As a result η = 0 and the candidate sample will be rejected with probability 1. We showed earlier that π(m c ) > 0 implies q(m c ) > 0, but this example shows that q(m c ) > 0 does not imply π(m c ) > 0. The probability of getting a candidate rejected increases with the number of loops.
One strategy to improve efficiency of the sampler is to minimize the number of loops that are cut. When peeling is applied to a pedigree, intermediate results are stored in multidimensional tables called "cutsets" [2]. In a pedigree without loops, the largest cutset has dimension two. In a pedigree with loops, some cutsets have dimension greater than two. Depending on the pedigree, peeling can be efficient as long as the dimension of the largest cutset is about seven [6]. In the ESIP sampler, exact peeling is applied until the cutset size is too large for efficient computations. To proceed further, loops are cut.
A second strategy to improve efficiency of the sampler consists of extending the pedigree at the places it was cut. Wang et al. [34] have shown that the approximation to the likelihood obtained by cutting loops is improved when the pedigree is extended as shown in Figure 2. So, it seems reasonable to expect that cutting loops and extending the pedigrees will also reduce the probability of getting a candidate rejected. In Figure 2 the pedigree is extended by including individuals 5 * and 6 * as offspring of individuals 4 and 3 * . A pedigree modified by duplicating more than a single individual will be referred to as a "cut-extended" pedigree.
The probabilities of getting a rejected sample were obtained for the cut pedigree shown in Figure 1b and for the cut-extended pedigree shown in Figure 2. As before, it was assumed that individuals 1,2,5 and 6 have genotypes M 1 M 2 , M 1 M 2 , M 1 M 1 and M 1 M 2 , respectively. The gene frequencies were assumed to be 0.5 for each allele. The probabilities of getting a rejected sample were 0.333 for the cut pedigree and 0.111 for the cut-extended pedigree. "Cutting" and "extending" the pedigrees is difficult and the degree of difficulty increases as the loops are larger and more complex. In practice, the pedigree does not have to be extended explicitly. In Wang et al. [34] it was shown that genotype probabilities computed by iterative peeling are equivalent to genotype probabilities computed from a cut-extended pedigree. As explained in Fernández et al. [6], the ESIP sampler combines the Elston-Stewart algorithm and iterative peeling to sample genotypes jointly from the entire pedigree.
To speed up peeling, genotype elimination was implemented using the algorithm developed by Lange and Goradia [19]. This algorithm is an extension of Lange and Boehnke [18] and consists of identifying all those genotypes that are not consistent with the observed information in the pedigree. These genotypes have zero probability and are removed from the list of genotypes to be summed over in peeling.

PERFORMANCE OF THE ESIP SAMPLER
To assess the performance of ESIP we have compared its efficiency and accuracy with those of other MCMC methods proposed in the literature. One of the methods that is guaranteed to yield an irreducible chain is given by Sheehan and Thomas [24]. In this paper this method will be referred to as the Sheehan-Thomas sampler. Lin et al. [22] and Lin [21] have also proposed two methods for sampling marker genotypes. These will be referred to as Lin1 and Lin2 samplers, respectively. Sobel and Lange [25] have described how samples of descent graphs can be used for linkage analysis rather than samples of descent states. It has been argued that the space of descent graphs is much smaller than the space of descent states. However for comparison with ESIP, as described in Section 5, genotype probabilities can be estimated from a sample of descent graphs. This method will be referred to as the Descent-graph sampler.

Comparison of ESIP and Sheehan-Thomas samplers
Regardless of the number of the alleles at a locus, Sheehan and Thomas [24] have shown that if all penetrance probabilities are non-zero then irreducibility holds. Let π * (m) be the distribution of m given y after all zero penetrance probabilities have been replaced by some small positive probability (relaxation parameter). They showed that if samples are obtained from π * (m) and those for which π(m) = 0 are rejected, then the remaining samples are from π(m). Thus, to overcome the irreducibility problem they proposed to sample from π * (m) and only use samples for which π(m) > 0 to estimate genotype probabilities.
They also showed that if all transmission probabilities are non-zero irreducibility holds. So, an alternative π * (m) to sample missing genotypes from can be obtained by modifying the transmission probabilities and/or penetrance probabilities.
Sheehan and Thomas [24] estimated genotype probabilities by their method for the ABO blood type locus in the fictitious pedigree given in [24] (Fig. 1).
In this pedigree, squares represent males and circles represent females.  [24] explained, these phenotypes were deliberately chosen so that the mated pair [6,9] could be either (AO, BO) or (BO, AO) and these two states do not communicate. The same applies to the pair [10,15]. The assumed allele frequencies for A, B and O alleles are 0.2, 0.1 and 0.7, respectively. Even though this pedigree has loops, it is small enough that exact marginal probabilities can be calculated for all individuals.
Results obtained by the ESIP and Sheehan-Thomas samplers were compared to the true marginal probabilities. Sheehan and Thomas [24] explained that there is a trade-off between the size of the relaxation parameter and efficiency of the algorithm. If the relaxation parameter is too small then the Markov chain has slow mixing because stepping between non-communicating classes has too small a probability. On the other hand, if the relaxation parameter is too large too many samples will be rejected. They presented results for some individuals in the pedigree using different relaxation parameters. Based on those results the value of 0.025 was chosen for the relaxation parameter to estimate genotype probabilities for the entire pedigree.
Different versions of the ESIP sampler were used to compare with results obtained by Sheehan and Thomas [24]. The first version, which is called Direct, consists of peeling exactly the whole pedigree and then samples are obtained directly from the target distribution by reverse peeling. When the proposal is obtained by exactly peeling the pedigree until the cutset size is k and then iterative peeling is applied to the remainder, the sampler is called ESIP-k. For this pedigree, k = 2 and k = 3 were also used for comparison. The length of the chain for the three cases (Direct, ESIP-3, and ESIP-2) was 10 000 with no burn-in period.
The mean difference between Sheehan-Thomas sampler and the true marginal probabilities is 1.8 × 10 −3 , and the largest difference is 1.1 × 10 −2 . The total number of simulations for the Sheehan-Thomas sampler was 175 830 with a rejection rate of 94.31%, which yields a total of 10 000 legal samples. Also, genotype probabilities were obtained by the Direct, ESIP-2, and ESIP-3 samplers and compared to the true marginal probabilities. Detailed tables that show the difference mean, range and standard deviation by genotype are given in Fernández [4]. The mean difference with the Direct sampler is 1.6 × 10 −3 , and the largest difference is 1.1 × 10 −2 . The mean difference with the ESIP-2 sampler is 1.9 × 10 −3 and the largest difference is 1.2 × 10 −2 . The rejection rate for this sampler was 24.5%. For ESIP-3 (with 10 000 samples), the mean difference is 1.4 × 10 −3 and the largest difference is 1.1 × 10 −2 . These values are the same as the results obtained for the Direct sampler. The rejection rate for ESIP-3 was 6.5%. These differences show that the ESIP sampler yields results with the same level of accuracy than Sheehan-Thomas sampler. Also, the rejection rates for the ESIP sampler are much lower than Sheehan-Thomas sampler.
The accuracy of the estimates obtained by ESIP greatly improve as the number of samples is increased. For ESIP-3, the mean differences are 3.1×10 −4 and 1.2 × 10 −4 , for chain lengths of 100 000 and 1 000 000, respectively. The largest differences are 3.1 × 10 −3 and 1.6 × 10 −3 , respectively. The accuracy of the Sheehan-Thomas sampler may not increase when the number of samples is increased because it is well known that Gibbs sampler has slow mixing [6,8,15,17].
ESIP was run using a Pentium Pro-200. The computing times were 90, 36 and 12 s for ESIP-2, ESIP-3, and Direct, respectively. Sheehan and Thomas [24] used a SUN SPARC station SLC and the reported computing time is 344.64 s. But, it is difficult to compare the computing times of ESIP and Sheehan-Thomas because different computing systems were used. However, as explained below, for a single locus, the number of samples can be used for comparison instead of using the computing times.
The computing time for ESIP can be split into two components: peeling time and sampling time. Relative to sampling time, peeling time is negligible because it is done only once. Further, for an exactly peeled individual, the computations needed to sample the genotype by reverse peeling are very similar to the computations in the Gibbs sampler [6]. Thus, the number of samples from the Direct sampler are directly comparable to the number of samples from Sheehan-Thomas sampler, which is based on the Gibbs sampler. For this pedigree, the Direct sampler with a chain length of 10 000 yields the same level of accuracy than the Sheehan-Thomas sampler with 175 830 simulations. Therefore, the Direct sampler is more efficient.
As explained below, the number of samples from ESIP when iterative peeling is applied to a part of the pedigree, cannot be directly compared with the number of samples from the Sheehan-Thomas sampler. For the ESIP-k sampler, when an individual that was iteratively peeled has to be sampled, all the cutsets connected to this individual must be recalculated conditional on the genotypes that have already been sampled [6]. This can be very time consuming because iteratively peeled individuals are connected to cutsets that contain a mixture of individuals that are sampled and not sampled. Thus, this recalculation involves summing over all genotypes of the individuals that were not yet sampled conditional on the genotypes that have been already sampled. This process has to be repeated in each sample. On the contrary, when individuals are peeled exactly, all the other individuals in cutsets connected to the individual being sampled have already been sampled. Thus, there is no summing over that needs to be done. This indicates that a large improvement in the efficiency of the ESIP sampler will be possible if all loops in the pedigree are cut when the cutset size of k is reached. After cutting, exact peeling can be applied to obtain samples more efficiently. Briefly, exact peeling is first applied until cutset size is k. Second, iterative peeling is applied to the remaining individuals in the pedigree. Third, all loops in the pedigree are cut. Fourth, exact peeling is continued using the iteratively peeled probabilities where the loops were cut. As shown by Wang et al. [34] this is equivalent to cutting and extending the pedigrees at the cuts.

Comparison of ESIP and Lin1 samplers
Lin et al. [22] presented results obtained by the application of their method in a Volga German family to study Alzheimer's disease. The marker locus for the Alzheimer's disease (D14S43) has three alleles: A, B and C. The frequencies they used were 0.239, 0.760 and 0.001 for the three alleles, respectively. Lin et al. [22] presented results for nine individuals of the pedigree (shown in Fig. 3, [22]).
In the Lin1 sampler, marker genotypes are sampled using the scalar-Gibbs sampler. As described below, the irreducibility problem is overcome by coupling an auxiliary Markov chain that is irreducible with the scalar-Gibbs chain [22].
Let Γ θ be the scalar-Gibbs chain with equilibrium distribution P θ (g θ |y), where g θ denotes a genotypic configuration in the state space G θ of the scalar-Gibbs sampler. Similarly, Γ θ is the irreducible-auxiliary chain with equilibrium distribution P θ (g θ |y), where g θ denotes a genotypic configuration in the state space G θ of the auxiliary chain. These two chains are coupled by switching their states. If an appropriate switching probability is used, it has been shown that the coupled chain Γ * defined on the state space G θ × G θ is irreducible and has equilibrium distribution P θ (g θ |y)P θ (g θ |y). Thus, the {(g i θ )} component of the coupled chain converges to P θ (g θ |y) [22]. Lin et al. [22] showed that for a scalar-Gibbs sampler the chain is irreducible if and only if, each heterozygote genotype, has a positive penetrance probability. In the Lin1 sampler, the auxiliary chain Γ θ was constructed by setting each heterozygote genotype A m A n to have a small positive penetrance probability ρ mn . If ρ mn is too small, the probability of switching is too small. On the other hand, if ρ mn is too large, many of the g θ will have zero probability in the state space G θ , resulting in the switches being rejected. To overcome this, the heated Metropolis algorithm was used to simulate the auxiliary chain. Because a single heated auxiliary chain did not improve mixing in some cases, Lin et al. [22] used multiple auxiliary chains.
The mean difference of the results presented in by Lin et al. [22] and the true marginal probabilities is 9.4 × 10 −4 and the largest difference is 6.0 × 10 −3 . The estimates were obtained from 400 000 samples using three auxiliary chains. Thus, this requires generating four chains, each of length 400 000.
For comparison, a chain length of 20 000 with no burn-in period was used for all the ESIP samplers. Detailed tables that show the mean, range and standard deviation of the difference between the ESIP samplers and the true marginal probabilities, by genotype, are given in Fernández [4]. For the Direct sampler, the mean difference is 7.3 × 10 −4 and the largest difference is 2.2 × 10 −3 . This indicates that this sampler yields results more accurate than the Lin1 sampler. In addition, the number of samples required to obtain this level of accuracy using the ESIP sampler is much smaller than the samples required in the Lin1 sampler.
For the ESIP-2 sampler, the mean difference is 1.1 × 10 −3 and the largest difference is 4.9×10 −3 . The rejection rate for this sampler was 23.86%. These results have the same level of accuracy as the Lin1 sampler.
For the ESIP-3 sampler, the mean difference is 1.0 × 10 −3 and the largest difference is 6.3 × 10 −3 . The rejection rate for this sampler was 15.25%. Thus, the level of accuracy for this sampler with 20 000 samples is the same as the Lin1 sampler with 400 000 samples using three auxiliary chains.
The Lin1 sampler samples one variable at a time from the full conditional, so the number of samples from Lin1 and Direct gives a good measure of efficiency. The same level of accuracy was obtained from the Direct sampler with a chain length of 20 000 and the Lin1 sampler with 400 000 samples. Thus, the Direct sampler is more efficient.
Furthermore, as Lin et al. [22] explained, their approach may not be practical when a locus has more than three alleles because there are a larger number of non-communicating classes. This is not a problem for ESIP.

Comparison of ESIP and Lin2 samplers
The estimates obtained by the ESIP sampler were also compared to those obtained by Lin [21]. She used the same ABO-blood-type pedigree used by Sheehan and Thomas [24] to show the performance of her method. She proposed a method where an irreducible chain is constructed by jumping from one communicating class to another directly without the need of stepping through illegal configurations. This method also requires the explicit identification of non-communicating classes.
Lin [21] estimated genotype probabilities from a chain of length 3 000 cycles. For comparison, a chain length of 3 000 with no burn-in period was used for different ESIP samplers (Direct, ESIP-2 and ESIP-3). The same level of accuracy as for the Lin2 sampler was obtained with the ESIP samplers.
As Lin [21] explained, her algorithm is efficient as long as one can identify individuals in the pedigree who characterize the non-communicating classes. Her method is not a single component algorithm, since the first step is to identify all the non-communicating classes. Thus, as Lin [21] added, one can design blocking Gibbs sampling algorithms to accomplish the same task. The reason is that the identification of all non-communicating classes is also the basis of designing blocking Gibbs samplers.
On the contrary, for the chain generated by the ESIP sampler, all states communicate, and thus, irreducibility is guaranteed. The performance of ESIP sampler was also tested using large and complex pedigrees. For more details see Fernández et al. [6].

COMPARISON OF ESIP AND DESCENT-GRAPH SAMPLERS
The estimates obtained by the ESIP sampler were also compared to those obtained by the Descent graph sampler developed by Sobel and Lange [25].
To estimate probabilities on pedigrees, Thompson [32] proposed an alternative MCMC strategy, where segregation indicators are sampled rather than genotypes. She argued that the advantage of this method is that the space of segregation indicators is much smaller than the space of genotypes, especially for multiallelic loci. Sobel and Lange [25] have implemented such a sampler. This sampler will be referred to as the Descent-graph sampler. Results from the Descent-graph sampler can be used to estimate genotype probabilities. Here, we used the Descent-graph sampler to obtain genotype probabilities for the ABO-blood-type pedigree used by Sheehan and Thomas [24]. The results obtained from this sampler are compared to the true marginal probabilities. The Descent-graph and Direct samplers were run on the same system to compare the computing times. The computing times for 100 00 samples were 2 400 s and 250 s for Descent-graph and Direct, respectively. Thus, Direct is about 10 times faster than Descent-graph. We also ran ESIP-2 and ESIP-3 for 100 000 samples. The computing times were 1 320 and 660 seconds for ESIP-2 and ESIP-3, respectively.
A chain of 10 000 samples was used to obtain estimates from the Descentgraph sampler. The absolute mean difference across genotypes between the estimates and the true marginal probabilities is 3.6 × 10 −3 . The same level of accuracy was obtained form the Direct sampler with a chain of only 500 samples. The mean absolute difference for Direct was 6.2 × 10 −3 . The largest absolute difference for the Descent-graph sampler is 6.2 × 10 −2 , and the largest absolute difference for the Direct sampler is 5.8 × 10 −2 . Thus, these results show that the Direct sampler yields estimates with the same level of accuracy as the Descent-graph sampler in much less time. ESIP-2 and ESIP-3 yielded similar results to those obtained by Direct.
The Descent-graph sampler was also used to estimate probabilities for a biallelic locus in a large half-sib family. The allele frequencies are 0.75 and 0.25 for allele A and a, respectively. The pedigree consists of three founders: one sire and two dams, where each family has 35 offspring. In both nuclear families, the genotype for 34 of the offspring is known, 17 are homozygous AA and 17 heterozygous Aa. The genotype of the parents and one offspring in each nuclear family is unknown. Four different initial descent graphs (Descentgraph (1) , Descent-graph (2) , Descent-graph (3) and Descent-graph (4) ) were used to obtain estimates for this pedigree. In all the initial Descent graphs, the six founder alleles are labeled as: paternal alleles (3,4) and maternal alleles (1,2 and 5,6). The two families are labeled 1 and 2. Family 1 has founder alleles 1,2 and 3,4; and family 2 has founder alleles 5,6 and 3,4. In Descent-graph (1) : the AA offspring of family 1 inherited founder alleles 2 and 4, and the Aa offspring of family 1 inherited founder alleles 1 and 4; the AA offspring of family 2 inherited founder alleles 6 and 4, and the Aa offspring of family 2 inherited founder alleles 5 and 4. In Descent-graph (2) : the AA offspring of family 1  (1) 0.573753 0.426247 0.0 Descent-graph (2) 0.57298 0.42702 0.0 Descent-graph (3) 0.881964 0.118036 0.0 Descent-graph (4) 0.0 1.0 0.0 Offspring SALP 0.499999 0.500000 0.000001 ESIP 0.4998 0.5002 0.0 Descent-graph (1) 0.461184 0.538816 0.0 Descent-graph (2) 0.540265 0.459735 0.0 Descent-graph (3) 0.512912 0.487088 0.0 Descent-graph (4) 0.2425 0.5071 0.2504 Estimates by ESIP are from 10 000 samples. Estimates by Descent-graph are from 1 000 000 samples. Genotype probabilities were also estimated by ESIP, and exactly calculated by SALP [26,27]. Results for two individuals with unknown genotype (one parent and one offspring) are presented in Table I.
This example illustrates that Descent-graph sampler does not have good mixing properties for some pedigrees. For this pedigree, estimates based on 1 000 000 samples from Descent-graph (1) and Descent-graph (2) seem to converge to the true marginal probabilities only for the offspring with unknown genotype. Descent-graph (3) and Descent-graph (4) do not converge to the true marginal probabilities for any of the individuals with unknown genotype. On the other hand, the ESIP sampler converges to the true probabilities with only 10 000 samples.

SUMMARY AND CONCLUSIONS
In this paper we have showed that the ESIP sampler is aperiodic and irreducible. We also have compared the ESIP sampler with other samplers in the literature. The ESIP sampler seems to be more efficient than Sheehan-Thomas, Lin1 and Descent-graph samplers. For the same level of accuracy, the ESIP sampler needed much less samples than the the Sheehan-Thomas, Lin1 and Descent-graph samplers. These samplers are guaranteed to give irreducible chains. ESIP seems to be as efficient as the Lin2 sampler. They have the same accuracy in about the same number of samples, but the Lin2 sampler requires identifying non-communicating classes, which may be impossible in large pedigrees.
The Sheehan-Thomas and Lin1 samplers have addressed the irreducibility problem of the scalar-Gibbs sampler, but those samplers still are very inefficient. As Geyer and Thompson [8] indicated, methods that sample one variable at a time, like scalar-Gibbs sampler, can take long time to obtain a representative sample of genotypic configurations. This problem is even more evident in large pedigrees because the time increases exponentially with the number of individuals in the pedigree. This is not a problem for the ESIP sampler, which updates the genotypes jointly. Furthermore, the ESIP sampler has been tested in large pedigrees and it seems to perform well [6].
The Descent-graph sampler of Sobel and Lange [25] was not designed for computing genotype probabilities, but in this paper we used it to obtain genotype probabilities to compare with ESIP. We have shown that ESIP is more efficient and also that the Descent-graph sampler has poor mixing properties for some pedigrees.
In this paper we have examined the properties of ESIP when sampling genotypes at a single locus. ESIP can sample genotypes jointly at multiple linked loci, but this may be inefficient. A better strategy would be to sample genotypes at one locus conditional on other loci, but this method will have horizontal dependence problems. Thus, strategies to overcome horizontal dependence need to be examined. the sampling process, for this proof it is convenient to write it as 2) the summations are over the missing genotypes of the introduced individuals. Also, note that for (a) and (b) q j = π j , when j is a non-introduced founder or a non-introduced offspring of non-introduced individuals. Furthermore, if j is an introduced founder or an offspring (introduced or non-introduced) of introduced parents, where all of them have known genotype, then q j > 0. For (c), (d) and (e) individual j is a non-founder, therefore π j = Pr(m j |m m j , m f j ).
As shown below, for (c), π j = Pr(m j |m m j , m f j ) > 0 implies q j > 0. First, Pr(m m j * ) > 0 for all m m j * , and second, when m m j * = m m j , Pr(m j |m m j * , m f j ) = π j . Thus the term in (c) that corresponds to m m j * = m m j is greater than zero. Further, the other terms in (c) are greater than or equal to zero. So clearly, q j > 0. Similarly, also for (d), π j = Pr(m j |m m j , m f j ) > 0 implies q j > 0. Also for (e), as shown below, π j = Pr(m j |m m j , m f j ) > 0 implies q j > 0. The term in (e) that corresponds to m j * = m j is π j , which is greater than zero. The other terms in (e) are greater than or equal to zero, and so q j > 0. From (2), π(m) > 0 implies that π j > 0 for all j. Further, as shown above, for (a) and (b) q j = π j , and for (c), (d) and (e) π j > 0 implies q j > 0. So, from (A.1), q(m) > 0. In this term all of the factors are from π(m), and therefore it will be greater than zero whenever π(m) > 0. In general, q(m) can be written as a sum of factors, and for one of its terms the missing genotype for each i * will be equal to the sampled genotype for i. In this term all of the factors are from π(m), and therefore it will be greater than zero whenever π(m) > 0. The other terms in q(m) are non-negative, therefore q(m) > 0 whenever π(m) > 0.