Failing the four-gamete test enables exact phasing: the Corners’ Algorithm

Gomez-Raya, Luis; Rauw, Wendy M.

doi:10.1186/s12711-022-00763-1

Research Article
Open access
Published: 14 November 2022

Failing the four-gamete test enables exact phasing: the Corners’ Algorithm

Genetics Selection Evolution volume 54, Article number: 74 (2022) Cite this article

2450 Accesses
Metrics details

Abstract

Background

Failing the four-gamete test for two polymorphic DNA markers is an indication that two or three rather than four haplotypes segregate in the population. The objective of this paper is to show that when just three haplotypes are segregating, all three haplotypes can be fully and unambiguously phase-resolved.

Theory and methods

The Corners’ Algorithm tests the four corners in a 3 × 3 table of two-locus genotypes. If one of the four corners is filled with zeroes, then the missing haplotype is identified and the phases of all three haplotypes can be unambiguously resolved for all individuals. Three applications of this method are proposed when the four-gamete test fails: (1) direct estimation of linkage disequilibrium (LD), (2) haplotype-based genome-wide association studies (GWAS) of three haplotypes (single-marker GWAS tests for two out of three haplotypes only), and (3) haplotyping of chromosomal regions that are comprised of pairs of single nucleotide polymorphisms (SNPs) that consist of just three haplotypes. An example based on 435 sows with performance records for total number of piglets born is used to illustrate the methods.

Results

Of 20,339 SNPs, approximately 50% of the pairs of flanking SNPs failed the four-gamete test. For those, the expectation maximization (EM) algorithm gave the same results. The average of the absolute value of the difference in r² between flanking SNPs across the genome between the two methods was 0.00082. Single-marker GWAS (using two of three haplotypes) detected significant associations for total number of piglets born on chromosomes 1, 2, 6, 9, 10, 12, 13, 14, 15, and 18. Haplotype-based GWAS using the third haplotype resolved with the Corners’ Algorithm detected additional significant associations for total number of piglets born on chromosomes 2, 5, 10, 13, 14, 15, and 18. Estimated substitution effects ranged from 0.40 to 1.35 piglets. Haplotyping of chromosomal regions that failed the four-gamete test for any pair of SNPs covered 961 Mb out of the 2249 Mb by the SNP array.

Conclusions

The Corner’s Algorithm allows to fully phase haplotypes when the four-gamete test fails. Longer haplotypes in chromosomal regions in which the four-gamete test fails for any pair of SNPs can be used as a multi-allelic marker with increased polymorphism information content.

Background

Professor Rohan Fernando, together with Professor Grossman, made one of the earliest theoretical contributions to incorporate marker information to traditional pedigree information [1]. They provided equations to compute the covariance between relatives conditional on pedigree and single-marker information. Their pioneering work was performed in 1989 when availability of markers was rather poor and when no one could envision how the landscape of research in genetics would drastically change in just three decades with emerging technologies, such as DNA sequencing and genotyping by arrays of densely positioned single nucleotide polymorphisms (SNPs) that are in gametic phase linkage disequilibrium (LD) with each other and likely with causal loci. Linkage disequilibrium is the non-random association of alleles at two or more loci [2]. It has a great impact on the application of all molecular technologies that attempt to relate genotypes with disease or performance traits across species, including farm animals and humans.

Let us consider two polymorphic loci, A/a and B/b, which result in four possible haplotypes, AB, Ab, aB, and ab. Linkage disequilibrium is commonly defined as $D={f}_{AB}{f}_{ab}-{f}_{Ab}{f}_{aB}= {f}_{AB}{-f}_{A}{f}_{B}$, where ${f}_{i}$ is the frequency of the $i$-th haplotype ($i$ = AB, Ab, aB, ab). Consequently, the allele frequencies of A and B at the two loci are ${f}_{A}={f}_{AB}+{f}_{Ab}$, and ${f}_{B}={f}_{AB}+{f}_{aB}$, respectively. Estimation of LD in diploid species requires resolving haplotype phases for the individuals in the population by direct or inferential methods. Direct methods use specialized experimental techniques applied to genomic DNA derived from a single individual [3, 4], while inferential methods use statistical means to infer haplotypes. In populations of unrelated individuals from a diploid species, LD is often determined using the expectation maximization (EM) algorithm, which assumes Hardy–Weinberg equilibrium [5]. This algorithm is iterative and requires initial estimates of haplotype frequencies to converge to the maximum likelihood estimates. Alternatively, empirical correlation between allele dosages (0, 1, 2) can also be used to estimate LD but it does not provide phasing information. The main problem in resolving haplotypes in a two-locus system in diploids is that the haplotype phase in double heterozygous individuals can only be determined with an associated probability, that is, phases are not exact. Bayesian methods implemented in the software PHASE can improve accuracy of phasing of haplotypes but still does not unambiguously resolve all individual haplotypes [6, 7]. A review of existing phasing methods is given by Browning and Browning [8].

Given the allele frequencies, the maximum absolute value that LD (D) for a pair of loci can attain occurs in five cases:

(I) Allele A is fully associated with allele b ${(f}_{A}={f}_{Ab}$), while allele a is partially associated with both B and b alleles, and therefore, only three haplotypes (Ab, aB, ab) are segregating in the sample;

(II) Allele A is fully associated with allele B ${(f}_{A}={f}_{AB}$), while allele a is partially associated with both B and b alleles, and therefore, only three haplotypes (AB, aB, ab) are segregating in the population;

(III) Allele a is fully associated with allele b ${(f}_{a}={f}_{ab}$), while allele A is partially associated with both B and b alleles, and therefore, only three haplotypes (ab, AB, Ab) are segregating;

(IV) Allele a is fully associated with allele B ${(f}_{a}={f}_{aB}$), while allele A is partially associated with both B and b alleles, and therefore, only three haplotypes (aB, AB, Ab) are segregating;

(V) Allele A is fully associated with allele B ${(f}_{A}={f}_{B}={f}_{AB}$), while allele a is fully associated with allele b ${(f}_{a}={f}_{b}={f}_{ab}$), and therefore, only two haplotypes (AB and ab) are segregating; The same argument can be made for the full LD of allele A with b (haplotypes Ab and aB).

These five cases result in maximum LD given the allele frequencies because in the formula for calculating $D$, the product of either ${f}_{AB}{f}_{ab}$ or ${f}_{Ab}{f}_{aB}$ is zero in each of these five cases. Consequently, given the allele frequencies, segregation of only two or three haplotypes results in maximum LD. This situation fails the four-gamete test, which detects recombination events with four segregating haplotypes that cannot have arisen without either recombination or a repeat mutation. Hudson and Kaplan [9] defined the four-gamete test as “For the infinite site model, the mutation rate for any site is infinitesimal; therefore, at most one mutation event can occur in the history of the sample at that site. Thus, for any two sites there are at most four gametic types in the population. Furthermore, since the model does not allow for back mutation and recurrent mutation, the only way for all four gametic types to be present in the sample is for at least one recombination event to have occurred in the history of the sample between the two sites”. In practical terms, failing the four-gamete test means that there are less than four haplotypes segregating in a two-locus system.

Lewontin [10] proposed another measure of LD, $D^{\prime}$, as the ratio of $D$ to its maximum possible absolute value, given the allele frequencies. If $D^{\prime}=1$, at least one of the four possible haplotypes must be absent, regardless of allele frequencies. Today, the most widely used method to measure LD is ${r}^{2}$ [11], which is defined as the square of the correlation between locus allele dosages in the segregating haplotypes:

$$r^{2}=\frac{D^{2}}{f_{A} f_{a} f_{B} f_{b}}.$$

(1)

This measure is widely used in spite of its dependence on allele frequencies [12, 13]. The values of ${r}^{2}$ range from 0 to 1. The latter value can only occur when either ${f}_{A}={f}_{B}$, or ${f}_{A}={1-f}_{B}$. In fact, in many instances and depending on allele frequencies, the maximum ${r}^{2}$ may take values much lower than 1 [14].

In traditional single-marker genome-wide association studies (GWAS), a measure of the association or statistical dependence between the number of copies for one of the alleles at a SNP and the phenotype investigated is computed and repeated for each SNP. A large number of SNPs sparsely distributed across the genome are used to identify genotype–phenotype associations [15]. GWAS has triggered a vast number of studies that aimed at identifying genes that are responsible for diseases in humans [16]. In spite of its success, GWAS often explains only a small fraction of the observed phenotypic variability, a phenomenon referred to as the missing heritability [17, 18]. Most GWAS have considered one marker at a time but there is great interest and much research efforts in considering multiple makers and/or haplotypes in GWAS [19]. Exact phasing benefits the use of multi-marker GWAS by assigning haplotypes unambiguously to each individual.

The main objective of this study is to show that failing the four-gamete test (i.e., presence of less than four haplotypes) enables exact phasing by a newly developed Corners’ Algorithm. It enables the identification of haplotypes in two-locus systems. Our other objectives are to show applications of exact phasing for situations where the four-gamete test fails: (1) direct estimation of LD, (2) GWAS using haplotypes, and (3) haplotyping of chromosomal regions. An example using Iberian sows that are typed with a low-density SNP array and for which total number born records are available is used to illustrate the proposed methods.

Methods

Theory

Exact phasing of haplotypes when the four-gamete test fails: the Corners’ Algorithm

Genotype counts at two loci (${n}_{ij}$, $i$ = AA, Aa, aa; $j$ = BB, Bb, bb) can be arranged in a 3 × 3 table as shown in Table 1. If one of the four haplotypes is not existing, there will be no observations in one of the four corners. Which of the four corner holds zeroes depends on which of the four haplotypes is missing: haplotypes AB, Ab, aB, and ab for corners I, II, III, and IV, respectively (Table 2). Note that for all genotype pairs, with the exception of ${n}_{AaBb}$ in the center of the table, the haplotype phase can be unambiguously resolved. It must be assumed that corners with zeroes occur because their corresponding double genotypes (and corresponding haplotypes) are missing in the population. It is possible that those double genotypes are just not observed in our sample. However, if the sample size is not small then the chance that double genotypes corresponding to a corner would be observed in the sample but existing in the population is extremely low.

Table 1 Notation for genotype counts at two loci with alleles A/a and B/b, ${{n}}_{{i}{j}}$ (${i}$ = AA, Aa, aa; ${j}$ = BB, Bb, bb)

Full size table

Table 2 Possible two-locus genotype counts when one of the four possible haplotypes (AB, Ab, aB, ab) is missing, as determined by a zero genotype count at one of the four corners of the genotype count table

Full size table

When at least one of the corners has zero observations, double heterozygotes, AaBb, with genotype count ${n}_{AaBb}$, can be unambiguously phased by the Corners’ Algorithm that comprises the following steps:

(1)
Set the genotype counts for two loci in a 3 × 3 table as shown in Table 1.
(2)
Identify any of the four possible situations with one corner without observations, as portrayed in Table 2 (corner I, II, III, or IV). If two opposite corners in the table have zero observations (either I and III, or II and IV), then two haplotypes are missing (only two haplotypes are segregating) and the two loci are in complete LD.
(3)
Identify the missing haplotype corresponding to missing observations in corner I, II, III, or IV.
(4)
Resolve the linkage phase unambiguously for all individuals, including double heterozygotes. This can be done since the haplotypes in Table 2 are only possible when that given non-observed haplotype is excluded (haplotype AB, Ab, aB, and ab for corner I, II, III, and IV, respectively). Specifically, double heterozygotes can be phased as either AB/ab or Ab/aB. However, if haplotype AB is not observed in corner I, since none of AABB, AABb or AaBB are observed, all double heterozygotes must be phased as Ab/aB. The same argument can be made for the other corners.

Direct estimation of LD when the four-gamete test fails

Phasing all haplotypes when the four-gamete fails allows direct estimation of ${r}^{2}$, as described in the following. Consider corner I in Table 2, where the non-observed haplotype is AB. Computing haplotype frequencies from two-locus genotype counts is straightforward when recognizing that the double heterozygote must be phased as Ab/aB:

$${f}_{AB}=0,$$

$${f}_{Ab}=\frac{2{n}_{AAbb}+{n}_{AABb}+{n}_{Aabb}+{n}_{AaBb}}{2N},$$

$${f}_{aB}=\frac{2{n}_{aaBB}+{n}_{AaBB}+{n}_{aaBb}+{n}_{AaBb}}{2N},$$

$${f}_{ab}=\frac{2{n}_{aabb}+{n}_{aaBb}+ {n}_{Aabb}}{2N}.$$

Resulting allele frequencies are:

$${f}_{A}=\frac{2{n}_{AA}+{n}_{Aa}}{2N},$$

$${f}_{a}=\frac{2{n}_{aa}+{n}_{Aa}}{2N},$$

$${f}_{B}=\frac{2{n}_{BB}+{n}_{Bb}}{2N},$$

$${f}_{b}=\frac{2{n}_{bb}+{n}_{Bb}}{2N},$$

where genotypic counts for the three genotypes for each locus are:

$${{n}_{AA}=n}_{AABB}+{n}_{AABb}+ {n}_{AAbb};$$

$${{n}_{Aa}=n}_{AaBB}+{n}_{AaBb}+ {n}_{Aabb};$$

$${n}_{aa}={n}_{aaBB}+{n}_{aaBb}+ {n}_{aabb};$$

$${{n}_{BB}=n}_{AABB}+{n}_{AaBB}+ {n}_{aaBB};$$

$${n}_{Bb}={n}_{AABb}+{n}_{AaBb}+ {n}_{aaBb};$$

$${n}_{bb}={n}_{AAbb}+{n}_{Aabb}+ {n}_{aabb};$$

and the total number of individuals is: N = ${n}_{AA}+{n}_{Aa}+{n}_{aa}={n}_{BB}+{n}_{Bb}+{n}_{bb}$.

Substituting these values into Eq. (1) yields:

$${r}^{2}={\left[{-\frac{(2{n}_{AAbb}+{n}_{AABb}+{n}_{Aabb}+{n}_{AaBb})(2{n}_{aaBB}+{n}_{AaBB}+{n}_{aaBb}+{n}_{AaBb})}{\sqrt{(2{n}_{AA}+{n}_{Aa})(2{n}_{aa}+{n}_{Aa})(2{n}_{BB}+{n}_{Bb})(2{n}_{bb}+{n}_{Bb})}}}\right]}^{2}.$$

LD is computed in the same way for the three other situations, with missing haplotypes Ab, aB, or ab in corners II, III, and IV (see Appendix).

Estimation of haplotype-based GWAS when the four-gamete test fails

The three unambiguously and fully phased two-locus haplotypes for all individuals facilitates the application of haplotype-based GWAS instead of single-marker GWAS to test for association with performance or disease. Consider the single-marker GWAS that tests for the association between a single SNP genotype with performance or disease (Fig. 1). By testing SNP A/a, given a pair of consecutive SNPs (A/a and B/b) and the missing haplotype AB, the single-SNP GWAS compares the performance of all individuals with haplotype Ab (i.e., with allele A) with that of all individuals with haplotypes aB and ab (i.e., with allele a: Fig. 1a). Likewise, when SNP B/b is tested, the performance of individuals with haplotype aB (i.e., with allele B) is compared to individuals with haplotypes Ab and ab (i.e., with allele b: Fig. 1b). This means that haplotypes Ab and aB, but not haplotype ab are contrasted against the two other haplotypes. Thus, the single-SNP GWAS method does not take into account that haplotype ab may have a different effect on phenotype than haplotypes Ab and aB. We propose to use haplotypes that are unambiguously phased with the Corners’ Algorithm, to test the performance of the third haplotype, ab (Fig. 1c). This is achieved by attributing 0, 1, or 2 copies of the third haplotype to each individual, and by comparing the performance of individuals accordingly to the number of copies of this haplotype that they carry. This test assumes that each haplotype is genetically identical among individuals, i.e., individuals that share the same haplotype have the same DNA sequence within the fragment.

Haplotyping of chromosomal regions that fail the four-gamete test for any combination of SNP pairs

When haplotypes for any pair of SNPs in a chromosomal segment can be resolved for all individuals with the Corners’ Algorithm, the phasing can be extended to the rest of the chromosomal fragment. The strategy for phasing is:

(1)
Apply the Corners’ Algorithm to each pair of consecutive and nonconsecutive SNPs within a chromosomal fragment. When the four-gamete test fails for any pair of consecutive or nonconsecutive SNPs, the absence of recombination along the entire fragment is assured in the individuals of the sample;
(2)
Confirm that all pairs of SNPs have either two or three (but not four) haplotypes;
(3)
Phase the heterozygous haplotypes for each pair of SNPs for each individual using the Corners’ Algorithm;
(4)
Align all haplotype phases for all heterozygous SNPs for the entire chromosomal fragment for each individual using the information from point (3);
(5)
Haplotypes for each individual are then completed by filling the alleles for which each individual is homozygous for any SNP within the fragment.

Animal material

Genotypes and phenotypes of total number of piglets born for 435 sows from a composite line of Iberian pigs, Torbiscal, were used to illustrate our methods. The Torbiscal line resulted from the blending of four ancient Spanish and Portuguese Iberian strains and was genetically isolated between 1963 and 2013 [20].

SNP genotyping

DNA was isolated from blood using a standard phenol/chloroform protocol. Samples were genotyped with the Illumina Porcine SNP60 BeadChip [21] and the Infinium HD Assay Ultra protocol (Illumina Inc.). Genotypes at 62,163 SNPs were analyzed with the Genome Studio software (Illumina) using the Sscrofa10.2 assembly, which was the reference map available at the time of genotyping. Data quality control was performed according to the following criteria: the call rate of the sample had to be higher than 0.96; and SNPs were retained if they had a call rate higher than 0.99; a GenTrain score higher than 0.70; and an ABR mean higher than 0.35. SNPs located on the sex chromosomes or with at least one inconsistent inheritance from dam to daughter were also removed. This filtering resulted in 26,359 remaining SNPs. The next step was to move from assembly Sscrofa10.2 to Sscrofa11.1 based on the SNP name. In total, 1054 SNPs were removed because they were not present on both reference maps. In addition, 2447 SNPs had missing genotypes for one or more sows and were removed for all animals. This was necessary so that all animals could be used to construct haplotypes in chromosomal segments that failed the four-gamete test, as missing information in one SNP for one individual prevents exact phasing of the haplotypes for that individual. In total, 2519 SNPs with a MAF lower than 0.05 were removed, leaving 20,339 polymorphic SNPs for further analyses. Therefore, all analyses were performed with SNP positions in reference genome Sscrofa11.1.

Direct estimation of LD when the four-gamete test fails

Linkage disequilibrium was estimated for consecutive SNPs for which the four-gamete test failed and estimates of ${r}^{2}$ using the standard EM algorithm [5] and the Corners’ Algorithm were compared. Fortran 90 routines were written to perform the EM algorithm and the Corners’ Algorithm and are available at https://github.com/lgomezraya/CORNER

GWAS for total number of born piglets in Iberian pigs

Association analysis for total number of piglets born was performed using ASReml [22] with the following mixed model:

$$\mathbf{y}= \mathbf{W}\mathbf{b}+\mathbf{X}\mathbf{g}+\mathbf{Z}\mathbf{a}+\mathbf{e},$$

where $\mathbf{y}$ is a vector of records for total number of piglets born; $\mathbf{W}$ is a matrix allocating the fixed effects of parity and farrowing season; $\mathbf{b}$ is a vector of the fixed effects of parity (parity classes 1 to 6, with parity 6 representing 6 or more parities) and farrowing season; $\mathbf{X}$ is a design matrix allocating records to the haplotype effect (modeled as 0, 1, or 2 for homozygous, heterozygous, and alternate homozygous haplotypes, respectively); $\mathbf{g}$ is the fixed effect of the fitted haplotype; $\mathbf{Z}$ is a design matrix allocating records to individuals; $\mathbf{a}$ is a vector of additive values of the sows, assumed to be randomly distributed ${\sim \mathrm{N}(\boldsymbol{0}, \mathbf{G}\upsigma }_{\mathrm{a}}^{2}$), where $\mathbf{G}$ is a genomic relationship constructed using SNP genotype information following VanRaden [23] and ${\upsigma }_{\mathrm{a}}^{2}$ is the additive genetic variance; and $\mathbf{e}$ is a vector of random error.

In single-marker GWAS, each individual is recorded and has 0, 1, or 2 copies of a specific alleles of the evaluated SNP. For two consecutive SNPs, this means that contrasts are made for two haplotypes, one per SNP (Fig. 1). The Corners’ Algorithm allows phasing and testing the effect of the third haplotype against the other two haplotypes, i.e., testing individuals with 0, 1, or 2 copies of that particular haplotype. Thus, we run three GWAS tests for each pair of consecutive SNPs: two tests for the two single SNPs and a third test after phasing the third haplotype with the Corners’ Algorithm (Fig. 1).

The package “qqman” of the R statistical environment [24] was used to create the Manhattan plots [25]. Accounting for multiple testing in GWAS was performed by: (1) controlling the false discovery rate (FDR), defined as the expected proportion of false-positive associations among all associations that were declared significant [26, 27], and (2) reducing the total number of tests performed to partially accommodate LD between consecutive markers as well as the redundancy in testing the three haplotypes after phasing by the Corners’ Algorithm. Note that contrasts are between the performance for one haplotype against the other two, which are correlated because the tests share haplotypes. In total, 20,339 single-marker GWAS tests were performed with 3578 pairs of consecutive SNPs in full LD (two haplotypes segregating). There were 10,380 pairs of consecutive SNPs that failed the four-gamete test (three haplotypes segregating) as detected by the Corners’ Algorithm. Accounting for multiple testing was done with the Benjamini–Hochberg method [28]. This method starts by ordering the m tests by ascending p-values as estimated in the GWAS procedure. Then, adjusted p-values (P_i) corresponding to the i-th rank of each test are declared significant if ${P}_{i}\le \frac{i}{m}0.05$. That is, genome-wide FDR is set at a significance level of 0.05. This procedure assumes that tests are independent; this assumption is violated since there were SNPs in LD and the tests for haplotypes were not independent. An estimate of the number of effective tests was used instead of the actual number of tests for the calculation of adjusted p-values according to the Benjamini–Hochberg method. The effective number of tests (11,630) was the total number of SNPs (20,339) minus half of the SNPs in full LD (3578) and minus two thirds of the tests that failed the four-gamete test (10,380). This approach may still be conservative because the total number of tests included markers that were in partial LD.

Results

Exact phasing of haplotypes when the four-gamete test fails: the Corners’ Algorithm

The first step was to identify consecutive markers with maximum LD for the given allele frequencies. The Corners’ Algorithm revealed 10,380 consecutive pairs of SNPs with three haplotypes, which represents about 50% of all pairs of consecutive SNPs.

Direct estimation of LD when the four-gamete test fails

Estimation of ${r}^{2}$ was performed using both the EM algorithm and the Corners’ Algorithm for the 10,380 pairs of consecutive SNPs for which the four-gamete test failed. The average of the absolute value of the difference between the two was 0.00082. Figure 2 shows that, regardless of its value, the ${r}^{2}$ estimate was very similar for both methods, except for a few pairs of SNPs with ${r}^{2}$ values close to 0. The largest difference in ${r}^{2}$ estimates was 0.04 or more for 16 pairs of SNPs; in these cases, the genotypic counts showed inconsistencies. For example, SNPs at positions 39,631,490 and 39,638,306 kb on Sus scrofa chromosome 2 had just three genotypes AABB, AABb, and AaBb with 382, 17, and 36 individuals, respectively. The estimate of ${r}^{2}$ was 0.002 and 0.665 for the Corners’ Algorithm and the EM algorithm, respectively. The estimates of haplotype frequencies of Corners’ versus EM algorithms were 0.90 versus 0.94 (haplotype AB), 0.06 versus 0.02 (haplotype Ab), 0.04 versus 0.00 (haplotype aB) and 0.00 versus 0.04 (haplotype ab). Therefore, the Corner’ Algorithm does not assign any haplotype ab to the double heterozygotes whereas the EM algorithm does. Nevertheless, given an equal zygotic mortality for all haplotypes and random mating, one would expect to observe some individuals with genotype AaBB. Thus, at least one of the two SNPs in the pair likely had a genotyping error.

Figure 3 depicts a histogram of the distribution of ${r}^{2}$ estimated by the Corner’s Algorithm for consecutive SNPs for which the four-gamete failed. The estimates of ${r}^{2}$ were far from 1 for the majority of these pairs of consecutive SNPs.

GWAS for total number of born piglets in Iberian pigs

For pairs of consecutive SNPs for which the four-gamete test failed, we performed a single-marker GWAS for each of the two SNPs, as well as a haplotype-based GWAS for the third haplotype. The Manhattan plot showed some relationships between single marker and haplotype-based GWAS, which is expected because the tests are correlated, as each haplotype is tested against the other two (Fig. 4). Table 3 lists the genome-wide significant results at a significance level of 0.05 (chromosomes 1, 2, 6, 9, 10, 12, 13, 14, 15, and 18). Importantly, some significant haplotypes, on chromosomes 2, 5, 10, 13, 14, 15 and 18 were not detected using the single-marker GWAS. The allele or haplotype effects ranged between 0.40 and 1.35 piglets.

Table 3 Results for total number of born piglets using both single-marker and haplotype-based GWAS for pairs of SNPs that failed the four-gamete test

Full size table

Haplotyping of chromosomal regions that fail the four-gamete test for any combination of SNP pairs

Identification of chromosomal regions that have not undergone recombination can be accomplished by testing the four-gamete test for any possible SNP pair (consecutive and non-consecutive) within that region. If the four-gamete test fails in all cases then the entire region can be unambiguously phased. Table 4 shows the haplotypes for individual 65 for all pairs of SNPs that were resolved with the Corners’ Algorithm for the chromosomal region between bp 309,120 and 1,301,402. The SNPs that matter for phasing are the ones for which this individual is heterozygous. Individual 65 is heterozygous for three SNPs at positions 309,120/1,208,316/1,301,402. After applying the Corner’ Algorithm, the phases for double heterozygotes between 309,120 and 1,208,316 are AG/GA, and between 309,120 and 1,301,402 are also AG/GA. Therefore, the haplotypes for these three SNPs are AGG/GAA. This is confirmed by the phase between 1,208,316/1,301,402 (GG/AA) in Table 4. Then, alleles at homozygous SNPs are filled in when constructing the full haplotype of the chromosomal region. Thus, the haplotypes of this individual are AAGCGAAGG and GAGCGAAAA. After haplotyping all individuals, this chromosomal fragment can be considered as a multi-allelic locus site, for which each haplotype represents an allele since no recombination has been observed. Table 5 shows the estimates of the frequencies of the six haplotypes that are segregating in this region. They have a polymorphism information content (PIC) of 0.72 and a heterozygosity of 0.76. A list of all chromosomal regions that failed the four-gamete test is provided in Additional file 1: Table S1 and covered 961.02 of the 2249.32 Mb covered by the SNP array.

Table 4 Haplotyping for sow 65 for SNP base pair (bp) positions 309,120, 477,400, 705,066, 712,417, 768,502, 771,992, 887,856, 1,208,216 and 1,301,402 on chromosome 1

Full size table

Table 5 Estimates of haplotype frequencies based on the Corners’ Algorithm in a segment on SSC1 with nine SNPs at base pair positions 309,120, 477,400, 705,066, 712,417, 768,502, 771,992, 887,856, 1,208,216 and 1,301,402

Full size table

Discussion

The present study shows that LD can be estimated directly from haplotype frequencies for pairs of biallelic SNPs for which only three haplotypes are segregating and the four-gamete test fails. The use of iterative methods for haplotype frequency estimation and phasing, such as the EM algorithm of Excoffier and Slatkin [5], is then unnecessary. The EM algorithm is widely used and implemented in standard software such as Haploview [29]. The difference between a direct method to count genotypes and an EM algorithm to estimate LD is that the latter requires starting values of haplotype frequencies that do not necessarily converge to the absolute maximum. Excoffier and Slatkin [5] concluded that, for the EM algorithm, several starting frequencies may be necessary when sample sizes are small. The Corners’ Algorithm infers the phases of double heterozygotes based on the presence of zeroes in corners of the 3 × 3 table of genotype counts at two loci. Note that the probability of missing a haplotype also depends on the sample size and, therefore, both the EM and Corners’ Algorithms may be affected by small sample sizes. For larger sample sizes, our results show that both the EM algorithm and the Corners’ Algorithm gave exactly the same results, except when there are inconsistencies in two-marker genotypes, likely due to genotyping errors. The main advantage of the Corners’ Algorithm is that it facilitates exact phasing of haplotypes. The Corner´s Algorithm is also faster to compute than the EM algorithm since it is based on counting genotypes for pairs of SNPs without the need for iterative mathematical operations.

Failure of the four-gamete test has been used as a method for detecting recombination under the assumption that no back-mutation and/or recurrent mutation exists [9]. We have used a low-density SNP array in this study which may affect the number of tests that failed the four-gamete test. At a higher SNP density, as with data from next-generation sequencing (NGS), recombination events should be identified that would be missed in a low-density array, which would facilitate the construction of fine recombination maps. Another explanation for the abundance of three segregating haplotypes in our data is that the Iberian herd had been closed for many generations with a small population size and high inbreeding [20, 30]. In this context, haplotypes may have been lost by genetic drift.

Methods for phasing haplotypes make use of both LD and familial information that incorporates Mendelian segregation and linkage [31]. In the present study, haplotypes that were fully resolved by the Corners’ Algorithm allowed extension of single-marker GWAS to test all three haplotype effects. Our results illustrate that haplotype variants with an effect on performance that are detected with the Corners’ Algorithm may remain undetected when using single-marker GWAS. This may be one contributing factor to the missing heritability problem [17, 18]. However, identification of the causal mutation remains difficult. In addition, individuals with haplotypes that are bracketed by the same SNP alleles may not share identical DNA sequences within the haplotype fragment, i.e., for SNPs that were not genotyped. Further investigation using DNA sequencing and identifying all polymorphisms is required to reveal all haplotypes that segregate for a given chromosomal fragment and to detect their association with performance.

Genomic selection represents a major progress in methods for genetic improvement of farm animals [32]. One of the most popular applications of genomic selection is genomic best linear unbiased prediction (GBLUP), in which the relationship matrix based on pedigree is replaced by a genomic relationship matrix based on genetic markers [23]. The most popular method to construct genomic relationship matrices uses one SNP at a time when establishing genomic relationships between pairs of individuals. This method ignores haplotype information [23]. The Corners’ Algorithm proposed here could be adapted for genomic selection when three haplotypes are segregating, e.g., by treating each of the three haplotypes as different loci and, by incorporating them in the genomic relationship matrix. This is similar to multi-allelic markers which reduce to biallelic markers by considering one allele versus a pool of the other alleles. However, this may result in incorporating repeated information in the relationship matrix, which requires further research [33].

Haplotyping of chromosomal fragments that fail the four-gamete test for all pairs of SNPs facilitates the use of those haplotypes as alleles in a multi-allelic marker system. This may have a wide range of advantages over existing uses of single SNP information. It can help to identify chromosomal areas with no (or low) recombination when investigating the block structure of the animal genomes [34, 35]. Block determination typically uses a fixed-size sliding window [19] or is based on estimated LD [36]. There have been some contradictory results when using different algorithms for identifying haplotype blocks [37]. Essentially, haplotype blocks are separated from each other by historical recombination events, and, therefore, identifying regions within blocks with no recombination events by the Corners’ Algorithm may be helpful. Once blocks are established, LD between blocks constructed by the Corners’ Algorithm could be estimated using a multi-allelic approach [38, 39].

We also showed that haplotyping a chromosome segment that fails the four-gamete test for all pairs of SNPs (consecutive and non-consecutive) in the segment can be considered as a multi-allelic marker with increased polymorphism information content (PIC) and heterozygosity, which may be useful in a variety of scenarios such as paternity analysis, traceability, and DNA forensics. Ultimately, it can be used for finding local molecular relationships within and between populations that share non-recombinant haplotypes, as identified by the Corners’ Algorithm.

Conclusions

The Corners’ Algorithm allows to fully phase-resolve the haplotypes for all animals for SNPs for which the four-gamete test fails. Direct equations to estimate LD for such SNP pairs are provided that can replace iterative algorithms such as the EM algorithm. The resulting three haplotypes that are fully phase-resolved for all individuals can be implemented in GWAS, extending the testing of single-marker GWAS to haplotype-based GWAS. Haplotyping of chromosomal fragments that fail the four-gamete test for all SNP pairs can be used as a multi-allelic marker to increase PIC, elucidate haplotype blocks and reveal local historic relationships within and between populations.

Availability of data and materials

The genotyping data used are available in Zenodo (https://doi.org/10.5281/zenodo.6636369).

References

Fernando RL, Grossman M. Marker assisted selection using best linear unbiased prediction. Genet Sel Evol. 1989;21:467–77.
Article PubMed Central Google Scholar
Lewontin RC, Kojima K. The evolutionary dynamics of complex polymorphisms. Evolution. 1960;14:468–72.
Google Scholar
Snyder MW, Adey A, Kitzman JO, Shendure J. Haplotype-resolved genome sequencing: experimental methods and applications. Nat Rev Genet. 2015;16:344–58.
Article CAS PubMed Google Scholar
Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science. 2021;372:eabf7117.
Article CAS PubMed PubMed Central Google Scholar
Excoffier L, Slatkin M. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol. 1995;12:921–7.
CAS PubMed Google Scholar
Stephens M, Smith NJ, Donnelly P. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet. 2001;68:978–89.
Article CAS PubMed PubMed Central Google Scholar
Stephens M, Donnelly P. A comparison of bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet. 2003;73:1162–9.
Article CAS PubMed PubMed Central Google Scholar
Browning SR, Browning BL. Haplotype phasing: existing methods and new developments. Nat Rev Genet. 2011;12:703–14.
Article CAS PubMed PubMed Central Google Scholar
Hudson RR, Kaplan NL. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics. 1985;111:147–64.
Article CAS PubMed PubMed Central Google Scholar
Lewontin RC. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics. 1964;49:49–67.
Article CAS PubMed PubMed Central Google Scholar
Hill WG, Robertson A. Linkage disequilibrium in finite populations. Theor Appl Genet. 1968;38:226–31.
Article CAS PubMed Google Scholar
Lewontin RC. On measures of gametic disequilibrium. Genetics. 1988;120:849–52.
Article CAS PubMed PubMed Central Google Scholar
Wray NR. Allele frequencies and the r2 measure of linkage disequilibrium: impact on design and interpretation of association studies. Twin Res Hum Genet. 2005;8:87–94.
Article PubMed Google Scholar
VanLiere JM, Rosenberg NA. Mathematical properties of the r2 measure of linkage disequilibrium. Theor Popul Biol. 2008;74:130–7.
Article PubMed PubMed Central Google Scholar
Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–78.
Article Google Scholar
Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet. 2017;101:5–22.
Article CAS PubMed PubMed Central Google Scholar
Lee SH, Wray NR, Goddard ME, Visscher PM. Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet. 2011;88:294–305.
Article PubMed PubMed Central Google Scholar
Young AI. Solving the missing heritability problem. PLoS Genet. 2019;15: e1008222.
Article CAS PubMed PubMed Central Google Scholar
Howard DM, Hall LS, Hafferty JD, Zeng YN, Adams MJ, Clarke TK, et al. Genome-wide haplotype-based association analysis of major depressive disorder in Generation Scotland and UK Biobank. Transl Psychiatry. 2017;7:1263.
Article PubMed PubMed Central Google Scholar
Silio L, Barragan C, Fernandez AI, Garcia-Casco J, Rodriguez MC. Assessing effective population size, coancestry and inbreeding effects on litter size using the pedigree and SNP data in closed lines of the Iberian pig breed. J Anim Breed Genet. 2016;133:145–54.
Article CAS PubMed Google Scholar
Ramos AM, Crooijmans RP, Affara NA, Amaral AJ, Archibald AL, Beever JE, et al. Design of a high density SNP genotyping assay in the pig using SNPs identified and characterized by next generation sequencing technology. PLoS One. 2009;4: e6524.
Article PubMed PubMed Central Google Scholar
Gilmour A, Gogel RBJ, Cullis BR, Thompson R. ASREML User guide release 3.0. Hemel Hempstead: VSN International Ltd; 2009.
Google Scholar
VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.
Article CAS PubMed Google Scholar
R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2021.
Turner SD. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. J Open Source Softw. 2018;3:731.
Article Google Scholar
Benjamini Y, Drai D, Elmer G, Kafkafi N, Golani I. Controlling the false discovery rate in behavior genetics research. Behav Brain Res. 2001;125:279–84.
Article CAS PubMed Google Scholar
Kuo KHM. Multiple testing in the context of gene discovery in Sickle Cell disease using genome-wide association studies. Genomics Insights. 2017;10:1178631017721178.
Article PubMed PubMed Central Google Scholar
Benjamini Y, Hochberg Y. Controlling the false discovery rate - a practical and powerful approach to multiple testing. J R Stat Soc B. 1995;57:289–300.
Google Scholar
Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–5.
Article CAS PubMed Google Scholar
Gomez-Raya L, Rodriguez C, Barragan C, Silio L. Genomic inbreeding coefficients based on the distribution of the length of runs of homozygosity in a closed line of Iberian pigs. Genet Sel Evol. 2015;47:81.
Article PubMed PubMed Central Google Scholar
Dupuis MC, Zhang Z, Druet T, Denoix JM, Charlier C, Lekeux P, et al. Results of a haplotype-based GWAS for recurrent laryngeal neuropathy in the horse. Mamm Genome. 2011;22:613–20.
Article CAS PubMed Google Scholar
Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29.
Article CAS PubMed PubMed Central Google Scholar
Hickey JM, Kinghorn BP, Tier B, Clark SA, van der Werf JH, Gorjanc G. Genomic evaluations using similarity between haplotypes. J Anim Breed Genet. 2013;130:259–69.
Article CAS PubMed Google Scholar
Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES. High-resolution haplotype structure in the human genome. Nat Genet. 2001;29:229–32.
Article CAS PubMed Google Scholar
Wall JD, Pritchard JK. Haplotype blocks and linkage disequilibrium in the human genome. Nat Rev Genet. 2003;4:587–97.
Article CAS PubMed Google Scholar
Wu Y, Fan HZ, Wang YH, Zhang LP, Gao X, Chen Y, et al. Genome-wide association studies using haplotypes and individual SNPs in Simmental cattle. PLoS One. 2014;9: e109330.
Article PubMed PubMed Central Google Scholar
Schulze TG, Zhang K, Chen YS, Akula N, Sun FZ, McMahon FJ. Defining haplotype blocks and tag single-nucleotide polymorphisms in the human genome. Hum Mol Genet. 2004;13:335–42.
Article CAS PubMed Google Scholar
Hedrick PW. Gametic disequilibrium measures: proceed with caution. Genetics. 1987;117:331–41.
Article CAS PubMed PubMed Central Google Scholar
Zhao H, Nettleton D, Soller M, Dekkers JCM. Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genet Res. 2005;86:77–87.
Article CAS PubMed Google Scholar

Download references

Acknowledgements

This paper is dedicated to our friend Professor Rohan Fernando. We are thankful to him for being motivating and innovative in his career and we wish him the best in his retirement. We are thankful to Prof. Jack Dekkers and two anonymous reviewers for their comments to improve the manuscript. This research was carried out with data generated by the Spanish Ministry project AGL2016‐75942‐R, “Caracterización molecular de la eficiencia alimentaria y de los caracteres reproductivos en cerdo Ibérico” (IBERFIRE).

Funding

No funding was needed for this research.

Author information

Authors and Affiliations

Departamento de Mejora Genética Animal, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Ctra de La Coruña Km 7.5, 28040, Madrid, Spain
Luis Gomez-Raya & Wendy M. Rauw

Authors

Luis Gomez-Raya
View author publications
You can also search for this author in PubMed Google Scholar
Wendy M. Rauw
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

LGR wrote the Corners’ Algorithm and analyzed the genotypic data; WR together with LGR wrote the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Luis Gomez-Raya.

Ethics declarations

Ethics approval and consent to participate

All procedures followed the Spanish policy for the protection of animals used in research and other scientific purposes RD53/2013.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1.

Non-recombination chromosomal regions in Iberian pigs detected by the Corners’ Algorithm.

Appendices

Appendix

Direct LD estimation using the Corner’s Algorithm

Missing haplotype Ab

Estimation of ${r}^{2}$ when haplotype Ab is missing can be deducted based on Table 2 and by computing haplotype frequencies by using only genotype counts corresponding to unambiguous haplotypes. Double heterozygotes must have phase AB/ab. Thus:

$${f}_{AB}=\frac{{n}_{AaBb}+2{n}_{AABB}+{n}_{AABb}+{n}_{AaBB}}{2N},$$

$${f}_{Ab}=0,$$

$${f}_{aB}=\frac{2{n}_{aaBB}+{n}_{AaBB}{+n}_{aaBb}}{2N},$$

$${f}_{ab}=\frac{2{n}_{aabb}+{n}_{aaBb}{+n}_{Aabb}{+n}_{AaBb}}{2N},$$

$${f}_{A}=\frac{2{n}_{AA}+{n}_{Aa}}{2N},$$

$${f}_{a}=\frac{2{n}_{aa}+{n}_{Aa}}{2N},$$

$${f}_{B}=\frac{2{n}_{BB}+{n}_{Bb}}{2N},$$

$${f}_{b}=\frac{2{n}_{bb}+{n}_{Bb}}{2N},$$

with ${n}_{AA}={n}_{AABB}+{n}_{AABb}+ {n}_{AAbb}$;${n}_{Aa}={n}_{AaBB}+{n}_{AaBb}+ {n}_{Aabb}$; ${n}_{aa}={n}_{aaBB}+{n}_{aaBb}+ {n}_{aabb}$; ${n}_{BB}={n}_{AABB}+{n}_{AaBB}+ {n}_{aaBB}$;${n}_{Bb}={n}_{AABb}+{n}_{AaBb}+ {n}_{aaBb}$; ${n}_{bb}={n}_{AAbb}+{n}_{Aabb}+ {n}_{aabb}$; $N= {n}_{AA}+{n}_{Aa}+{n}_{aa}={n}_{BB}+{n}_{Bb}+{n}_{bb}$.

Substituting in Eq. (1) yields:

$${r}^{2}={\left[\frac{({n}_{AaBb}+2{n}_{AABB}+{n}_{AABb}+{n}_{AaBB})(2{n}_{aabb}+{n}_{aaBb}{+n}_{Aabb}{+n}_{AaBb})}{\sqrt{(2{n}_{AA}+{n}_{Aa})(2{n}_{aa}+{n}_{Aa})(2{n}_{BB}+{n}_{Bb})(2{n}_{bb}+{n}_{Bb})}}\right]}^{2}.$$

Missing haplotype aB

Estimation of ${r}^{2}$ when haplotype aB is missing can be deducted after using Table 2, and by computing haplotype frequencies by using only genotype counts corresponding to unambiguous haplotypes. Double heterozygotes must have phase AB/ab. Thus:

$${f}_{AB}=\frac{2{n}_{AABB}+{n}_{AABb}+{n}_{AaBB}+{n}_{AaBb}}{2N},$$

$${f}_{Ab}=\frac{2{n}_{AAbb}+{n}_{AABb}{+n}_{Aabb}}{2N},$$

$${f}_{aB}=0,$$

$${f}_{ab}=\frac{2{n}_{aabb}+{n}_{aaBb}{+n}_{Aabb}+{n}_{AaBb}}{2N},$$

$${f}_{A}=\frac{2{n}_{AA}+{n}_{Aa}}{2N},$$

$${f}_{a}=\frac{2{n}_{aa}+{n}_{Aa}}{2N},$$

$${f}_{B}=\frac{2{n}_{BB}+{n}_{Bb}}{2N},$$

$${f}_{b}=\frac{2{n}_{bb}+{n}_{Bb}}{2N},$$

which after substituting these frequencies in Eq. (1) yields:

$${r}^{2}={\left[\frac{(2{n}_{AABB}+{n}_{AABb}+{n}_{AaBB}+{n}_{AaBb})(2{n}_{aabb}+{n}_{aaBb}{+n}_{Aabb}+{n}_{AaBb})}{\sqrt{(2{n}_{AA}+{n}_{Aa})(2{n}_{aa}+{n}_{Aa})(2{n}_{BB}+{n}_{Bb})(2{n}_{bb}+{n}_{Bb})}}\right]}^{2}.$$

Missing haplotype ab

Estimation of ${r}^{2}$ when haplotype ab is missing can be deducted after using Table 2, by computing haplotype frequencies by using only genotype counts corresponding to unambiguous haplotypes. Double heterozygotes must have phase Ab/aB. Thus:

$${f}_{AB}=\frac{2{n}_{AABB}+{n}_{AABb}+ {n}_{AaBB}}{2N},$$

$${f}_{Ab}=\frac{2{n}_{AAbb}+{n}_{AABb}+ {n}_{Aabb}+{n}_{AaBb}}{2N},$$

$${f}_{aB}=\frac{2{n}_{aaBB}+{n}_{AaBB}{+n}_{AaBb}+{n}_{aaBb}}{2N},$$

$${f}_{ab}=0,$$

$${f}_{A}=\frac{2{n}_{AA}+{n}_{Aa}}{2N},$$

$${f}_{a}=\frac{2{n}_{aa}+{n}_{Aa}}{2N},$$

$${f}_{B}=\frac{2{n}_{BB}+{n}_{Bb}}{2N},$$

$${f}_{b}=\frac{2{n}_{bb}+{n}_{Bb}}{2N},$$

which after substituting these frequencies in Eq. (1) yields:

$${r}^{2}={\left[-\frac{(2{n}_{AAbb}+{n}_{AABb}+ {n}_{Aabb}+{n}_{AaBb})(2{n}_{aaBB}+{n}_{AaBB}{+n}_{AaBb}+{n}_{aaBb})}{\sqrt{(2{n}_{AA}+{n}_{Aa})(2{n}_{aa}+{n}_{Aa})(2{n}_{BB}+{n}_{Bb})(2{n}_{bb}+{n}_{Bb})}}\right]}^{2}.$$

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Gomez-Raya, L., Rauw, W.M. Failing the four-gamete test enables exact phasing: the Corners’ Algorithm. Genet Sel Evol 54, 74 (2022). https://doi.org/10.1186/s12711-022-00763-1

Download citation

Received: 17 December 2021
Accepted: 24 October 2022
Published: 14 November 2022
DOI: https://doi.org/10.1186/s12711-022-00763-1

Failing the four-gamete test enables exact phasing: the Corners’ Algorithm

Abstract

Background

Theory and methods

Results

Conclusions

Background

Methods

Theory

Exact phasing of haplotypes when the four-gamete test fails: the Corners’ Algorithm

Direct estimation of LD when the four-gamete test fails

Estimation of haplotype-based GWAS when the four-gamete test fails

Haplotyping of chromosomal regions that fail the four-gamete test for any combination of SNP pairs

Animal material

SNP genotyping

Direct estimation of LD when the four-gamete test fails

GWAS for total number of born piglets in Iberian pigs

Results

Exact phasing of haplotypes when the four-gamete test fails: the Corners’ Algorithm

Direct estimation of LD when the four-gamete test fails

GWAS for total number of born piglets in Iberian pigs

Haplotyping of chromosomal regions that fail the four-gamete test for any combination of SNP pairs

Discussion

Conclusions

Availability of data and materials

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher's Note

Supplementary Information

Additional file 1: Table S1.

Appendices

Appendix

Direct LD estimation using the Corner’s Algorithm

Missing haplotype Ab

Missing haplotype aB

Missing haplotype ab

Rights and permissions

About this article

Cite this article

Share this article

Genetics Selection Evolution

Contact us