Empirical determination of breed-of-origin of alleles in three-breed cross pigs

Sevillano, Claudia A.; Vandenplas, Jeremie; Bastiaansen, John W. M.; Calus, Mario P. L.

doi:10.1186/s12711-016-0234-9

Research Article
Open access
Published: 04 August 2016

Empirical determination of breed-of-origin of alleles in three-breed cross pigs

Claudia A. Sevillano^1,2,
Jeremie Vandenplas³,
John W. M. Bastiaansen¹ &
…
Mario P. L. Calus³

Genetics Selection Evolution volume 48, Article number: 55 (2016) Cite this article

2758 Accesses
25 Citations
2 Altmetric
Metrics details

Abstract

Background

Although breeding programs for pigs and poultry aim at improving crossbred performance, they mainly use training populations that consist of purebred animals. For some traits, e.g. residual feed intake, the genetic correlation between purebred and crossbred performance is low and thus including crossbred animals in the training population is required. With crossbred animals, the effects of single nucleotide polymorphisms (SNPs) may be breed-specific because linkage disequilibrium patterns between a SNP and a quantitative trait locus (QTL), and allele frequencies and allele substitution effects of a QTL may differ between breeds. To estimate the breed-specific effects of alleles in a crossbred population, the breed-of-origin of alleles in crossbred animals must be known. This study was aimed at investigating the performance of an approach that assigns breed-of-origin of alleles in real data of three-breed cross pigs. Genotypic data were available for 14,187 purebred, 1354 F₁, and 1723 three-breed cross pigs.

Results

On average, 93.0 % of the alleles of three-breed cross pigs were assigned a breed-of-origin without using pedigree information and 94.6 % with using pedigree information. The assignment percentage could be improved by allowing a percentage (f_r) of the copies of a haplotype to be observed in a purebred population different from the assigned breed-of-origin. Changing f_r from 0 to 20 %, increased assignment of breed-of-origin by 0.6 and 0.7 % when pedigree information was and was not used, respectively, which indicates the benefit of setting f_r to 20 %.

Conclusions

Breed-of-origin of alleles of three-breed cross pigs can be derived empirically without the need for pedigree information, with 93.7 % of the alleles assigned a breed-of-origin. Pedigree information is useful to reduce computation time and can slightly increase the percentage of assignments. Knowledge on the breed-of-origin of alleles allows the use of models that implement breed-specific effects of SNP alleles in genomic prediction, with the aim of improving selection of purebred animals for crossbred offspring performance.

Background

The genetic correlation between purebred and crossbred performance (r_pc) is a crucial parameter that determines the effect of selection at the nucleus level, where purebred animals are used, on the rate of genetic change at the production level, where crossbred animals are used [1, 2]. In many cases, r_pc deviates from 1 because of (1) different genetic backgrounds, and (2) different management procedures for purebreds and crossbreds [1, 3, 4]. As r_pc decreases, the benefit of using crossbred information increases [1, 5], e.g. Dekkers [6] reported that even with a r_pc as low as 0.7 using crossbred information was advantageous. When information on crossbred animals is used, effects of single nucleotide polymorphisms (SNPs) may be breed-specific because linkage disequilibrium (LD) patterns between a SNP and a quantitative trait locus (QTL) [4] and allele frequencies and allele substitution effects of a QTL may differ between breeds [7]. With genomic prediction, it is possible to determine the effect of alleles from different breeds and, thus, it can be used to select purebred animals for crossbred performance. An additive model that accounts for breed-specific SNP effects for genomic prediction using crossbred information was proposed by Ibánẽz-Escriche et al. [8] and Christensen et al. [9, 10]. Ibánẽz-Escriche et al. [8] and Esfandyari et al. [11] showed with simulated data that, under some conditions (i.e., low SNP density, large training data size, and low breed relatedness), the model that accounts for breed-specific SNP effects outperformed models in which SNP effects are assumed to be the same across breeds. If the above-mentioned conditions that favor the model that accounts for breed-specific effects with simulated data are met in real, then it is important to determine whether such models are also superior in real data.

To estimate the effect of a SNP allele that is present in a crossbred animal and originates from a purebred animal, the breed-of-origin of alleles in crossbreds must be known. While breed-of-origin of alleles was assumed to be known without error by Ibánẽz-Escriche et al. [8] and Esfandyari et al. [11], errors in breed-of-origin of alleles and the total percentage of alleles assigned to a breed-of-origin likely impact the accuracy of subsequent analyses such as genomic prediction.

For a two-way cross, determining the breed-of-origin of alleles is relatively easy, especially when both parents are genotyped [12]. However, in pig and chicken production, three-way crosses are commonly used. Bastiaansen et al. [4] developed an approach to assign breed-of-origin to alleles in three-breed cross animals. They used a long-range phasing method [13] to relax the dependency on genotyped parents and available pedigree information. Haplotypes that were derived from the long-range phasing method were assigned to a breed if they were present in only one of the purebred populations, which subsequently allowed assigning the breed-of-origin of alleles when that haplotype was observed in crossbred animals. Vandenplas et al. [14] improved and tested the approach to assign breed-of-origin of alleles on simulated data and obtained highly accurate allele assignments in three-breed cross animals without using pedigree information. Our aim was to investigate the performance of assignment of breed-of-origin of alleles on real data of three-breed cross pigs. The impact of using pedigree information on the crossbred animals on the assignment of breed-of-origin of alleles was also tested because in this dataset the pedigree was completely known and this approach is able to use such information when available.

Methods

Ethics statement

The data used for this study was collected as part of routine data recording in a commercial breeding program. Samples collected for DNA extraction were only used for routine diagnostic purposes of the breeding program. Data recording and sample collection were conducted strictly in line with the Dutch law on the protection of animals (Gezondheids- en welzijnswet voor dieren).

Genotypic data

We used pigs that originated from a three-way crossbreeding design, in which Landrace (LR) pigs were crossed with Large White (LW) pigs to produce F₁ (LR × LW or LW × LR) crossbred pigs, which in turn were crossed with synthetic boar (S) pigs to produce three-breed cross pigs [S (LR × LW) or S (LW × LR)]. Genotyping data was available for 14,187 purebred, 1354 F₁, and 1723 three-breed cross pigs (Table 1). All pigs were genotyped using one of the three following SNP panels: Illumina PorcineSNP60.v2 BeadChip (60K.v2), Illumina PorcineSNP60 BeadChip (60K), or Illumina PorcineSNP10 BeadChip (10K) (see Table 1 for details). LR, LW and S pigs were primarily genotyped with the 60K (N = 2352), 10K (N = 3618), and 10K (N = 1233) chips, respectively. F₁ pigs were primarily genotyped with the 60K.v2 (N = 786) chip and three-breed cross pigs with the 10K (N = 1432) chip. SNPs were removed from the data if they had the same position as another SNP (only one removed), if they had no position assigned, or if they were present on Sus scrofa chromosome (SSC) X or SSCY. The SNP set for subsequent analyses consisted of SNPs from the 60K.v2 that had a call rate higher or equal to 90 % across all purebred lines. Pigs genotyped with the 60K or 10K chips were imputed to the 60K.v2 panel. SNPs with low imputation accuracy across all purebred lines and F₁ crossbreds (concordance <0.80) were removed from the final set of SNPs. Finally, 52,164 SNPs remained for the analyses (Fig. 1).

Table 1 Number of genotyped pigs available per SNP panel, and per purebred line or cross

Full size table

Imputation

FImpute Version 2.2 software [15] was chosen for imputation with default parameter settings and using pedigree information because it is one of the most efficient available software programs for imputation [15, 16]. Within each of the three purebred lines, LR, LW, and S, imputation was performed in two steps: (1) pigs genotyped with the 10K chip were imputed to 60K, and (2) all pigs with 60K data were imputed to 60K.v2. For F₁ and three-breed cross pigs, imputation was done in a single step, i.e. pigs genotyped with the 10K and 60K chips were directly imputed to 60K.v2, because all ancestors were genotyped or already imputed to 60K.v2. The numbers of SNPs from each panel that were used in each imputation step are in Fig. 1.

Validation of imputation

Imputation accuracy was assessed in 80 pigs from each of the purebred lines, LR, LW, and S, and in 80 F₁ crossbred pigs, which were all genotyped with the 60K.v2 panel. Accuracy of imputation in three-breed cross pigs was not assessed because none of them were genotyped with the 60K.v2. All pigs that were chosen to assess imputation accuracy had no offspring and both their parents were genotyped with the 60K.v2, 60K, or 10K chips. In these pigs, the genotypes of all SNPs on the 60K.v2 panel were set to missing, except for the SNPs that were also on the 10K panel. Imputation accuracy was calculated for each SNP in two ways, based on concordance and Pearson correlation, using the real and imputed genotypes. Pearson correlations per SNP between the real and imputed genotypes were corrected for minor allele frequency (MAF), i.e., \( {\text{real genotype}}-2{\text{*MAF}} \) and \( {\text{imputed genotype}}-2{\text{*MAF}} \). The MAF for each SNP was calculated using the data for the 80 pigs tested from each population. SNPs with low imputation accuracy across all purebred lines and F₁ crossbreds (concordance <0.80) were removed from the final set of SNPs.

Assignment of breed-of-origin of alleles

To assign breed-of-origin of alleles to three-breed cross pigs, we used an approach that consisted of three steps: (1) phasing the haplotypes of both purebred and crossbred pigs, (2) determining the unique haplotypes among the pure breeds, and (3) assigning the breed-of-origin for each allele carried on the haplotypes of crossbred pigs, i.e. F₁ and three-breed cross pigs. For these steps, we used all the 52,164 SNPs in the final set.

Phasing

AlphaPhase1.1 software [13] that implements a long-range phasing and haplotype library imputation algorithm was used to phase the genotypes. Although FImpute [15] also searches for long shared haplotypes and builds a haplotype library, the breed-of-origin approach cannot use this program because it also searches for short shared haplotypes. Short shared haplotypes can be difficult to assign to a breed because they may be shared across breeds. Long-range phasing is also of particular interest because it does not rely on pedigree information. However, we tested both scenarios, phasing with and without pedigree information, to assess if allele assignment was improved when using pedigree information. Due to computational limitations, assigning breed-of-origin without using pedigree information was performed only for chromosomes 3, 4, 9, 12, and 16. For both scenarios, haplotypes were built using nine combinations of core and tail lengths: 350:50, 250:100, 300:100, 350:100, 150:200, 200:200, 250:200, 300:200, 350:200. The concepts of core and tails are outlined in detail in Hickey et al. [13]. Briefly, a core is a consecutive string of SNPs that are phased simultaneously, while tails are consecutive strings of SNPs that are immediately adjacent to either end of a core and that are used together with the core SNPs to identify which pigs in the data carry the same haplotype. Each combination of core and tails was run both considering “Offset” and “NotOffset” modes. The “Offset” mode shifts the start of the cores to halfway along the first core, creating 50 % overlaps between cores. These settings were chosen based on results of Vandenplas et al. [14] and allowed each allele to be considered 18 times through different haplotypes of variable length. Varying the haplotype lengths may improve the overall phasing when some animals do and others do not have close relatives present in the genotype data. For all phasing analyses, 1 % of genotype errors and 1 % disagreement between genotypes and haplotypes were allowed.

Assignment of breed for haplotypes and alleles

Assignment of breed-of-origin to haplotypes was performed as in Vandenplas et al. [14]. To assign a breed-of-origin to a haplotype, it was necessary that most of its copies were present in a specific breed. We tested two relaxation factors (f_r), i.e. 0 and 20 %, which is the maximum percentage of the copies of a haplotype that may be observed in a different purebred population. When the percentage of copies of a haplotype that was observed in a single breed was less than (100 − f_r) %, the breed-of-origin for that haplotype was set to unknown.

Assignment of breed-of-origin to each allele that is carried on the haplotypes of crossbred animals is based on the knowledge available for the breed-of-origin of the haplotypes, the zygosity (i.e., homozygosity or heterozygosity) of the locus, and the breed composition of the crossbred animals (see Vandenplas et al. [14] for the algorithm). Each allele at each locus can receive 18 breed-of-origin assignments, but, in some analyses, this number can be smaller when no breed is assigned to the haplotype.

Principal component analysis

A principal component analysis (PCA) was performed to check if three-breed cross pigs with a low assignment of breed-of-origin to their alleles were genetically distinct to the three-breed cross population. The PCA was performed by eigen decomposition of the genomic relationship matrix (G-matrix). The G-matrix was computed as in Yang et al. [17], using our in-house software calc_grm [18].

Results

Imputation and accuracies of imputation

Accuracies of imputation were very close to 1, both when based on concordance and Pearson correlation (Table 2). The Pearson correlation between imputed and real genotypes per SNP was greater than 0.96 across all pure lines and F₁ pigs (Table 2). The Pearson correlation per SNP was very similar across different MAF (Fig. 2). Some individual SNPs (N = 406) showed poor imputation accuracy (concordance <0.80) and were removed from the set of SNPs. The final set of SNPs considered for imputation and assignment of breed-of-origin for alleles of three-breed cross pigs included 52,164 SNPs from the 60K.v2 panel.

Table 2 Average imputation accuracies computed across pigs or SNPs

Full size table

Assignment of breed-of-origin for alleles

Comparison of different settings used for assignment of breed-of-origin

All pigs were used to assign the breed-of-origin of alleles but the results are presented only for three-breed cross pigs. Breed-of-origin assignments were obtained from analyses without pedigree information for chromosomes 3, 4, 9, 12, and 16, and from analyses with pedigree information for all autosomes. For chromosomes 3, 4, 9, 12, and 16, on average 93.0 % (±1.0 %) of the alleles of a three-breed cross pig were assigned to a breed-of-origin without using pedigree information and 94.6 % (±1.0 %) with using pedigree information, both with a relaxation factor (f_r) equal to 0 % (Table 3). For all autosomes, on average 93.9 % (±1.4 %) of the alleles of a three-breed cross pig were assigned to a breed-of-origin when using pedigree and f_r set at 0 %. Relaxing f_r from 0 to 20 % increased the assignment by 0.6 and 0.7 % with and without using pedigree information, respectively, for chromosomes 3, 4, 9, 12, and 16, and increased the assignment by 1.3 % with using pedigree information for all autosomes (Table 3). In general, increases in assignment percentage were small regardless of whether pedigree information was used or not or whether f_r was set to 0 or 20 %.

Table 3 Allele assignment (%) to purebred lines as breed-of-origin in four scenarios

Full size table

The assigned breed-of-origin of alleles for heterozygous genotypes may differ depending on the approach used. To assess the effect of using pedigree information, breed-of-origin assignments with or without the use of pedigree information were compared. Both scenarios included only chromosomes 3, 4, 9, 12, and 16 (Table 4, comparison A). Only 0.3 % of the assignments displayed a change in their breed-of-origin depending on the use of pedigree information or not. Assignments were concordant for 94.2 % of the genotypes and 5.5 % of the genotypes were assigned a breed-of-origin by only one of the two approaches.

Table 4 Comparison between different scenarios for the assignment of breed-of-origin of alleles

Full size table

To assess the impact of increasing the relaxation factor, assignments of breed-of-origin obtained with f_r set at 0 and 20 % were compared. In this case, both scenarios included pedigree information and only chromosomes 3, 4, 9, 12, and 16 were used (Table 4, comparison B). Only 0.1 % of the assignments displayed a change in their breed-of-origin between setting f_r at 0 or 20 %. The assignments were concordant for 99.2 % of the genotypes and 0.7 % of the genotypes were assigned a breed-of-origin by only one of the approaches. Because differences in breed-of-origin assignments between options were small, only results obtained with pedigree information and an f_r set at 20 % will be presented in the following.

Performance of assignment of breed-of-origin

Average assignment percentages were similar across three-breed cross pigs. On average, for each chromosome, at least 80 % of alleles were assigned a breed-of-origin to 98.7 % of the three-breed cross pigs. Of the three-breed cross pigs, 8 % (N = 141) had a chromosome for which less than 80 % of the alleles were assigned and 4 % (N = 66) had multiple such chromosomes. The assignment percentage of these 207 three-breed cross pigs is illustrated in Fig. 3. The chromosome that has the lowest percentage of assignment varied across these 207 pigs. The lowest assignment for a chromosome was observed in a three-breed cross pig for which only 19.0 % of the alleles on chromosome 9 were assigned to a breed. For this pig, chromosome 6 had the highest assignment, for which 67 % of the alleles were assigned to a breed. Two three-breed cross pigs, including the one mentioned above, had a low percentage of assignment for all 18 chromosomes (Fig. 3).

The average assignment of breed-of-origin of alleles was similar across chromosomes, with a standard deviation (SD) of 0.95 % among the 18 chromosomes. Within chromosome, the SD ranged from 3.36 % for chromosome 1 and 13, to 5.97 % for chromosome 2. The highest assignment was obtained for chromosome 17 (96.5 %) and the lowest for chromosome 12 (93.6 %) (see Additional file 1). For chromosome 17, 49.8 % of the alleles were assigned to the paternal S purebred line, 23.1 % to the maternal LR purebred line, and 23.6 % to the maternal LW purebred line. For chromosome 12, 49.3 % of the alleles were assigned to the paternal S purebred line, 21.7 % to the maternal purebred LR line, and 22.6 % to the maternal LW purebred line. The main differences between chromosomes were due to differences in the percentage assigned to the maternal purebred lines.

For most three-breed cross pigs, one chromosome of each pair was almost completely assigned to the paternal S purebred line, as shown for 25 random pigs in Fig. 4, while the other chromosome showed large blocks that were assigned to the maternal LR or LW purebred line. While it is expected that 50 % of the maternal chromosome originates from one of the two maternal purebred lines, these percentages can deviate strongly from this value for individual animals. The pattern in Fig. 4 is as expected based on the 1.2 recombination rate of chromosome 12 [19], and we observed on average one recombination per chromosome. However, near the ends of the maternal chromosomes, the number of alternate assignments of breed-of-origin of alleles between the maternal LR or LW purebred lines increased, which is consistent with the higher levels of recombination that are observed in these chromosome regions [19]. Assignment of breed-of-origin to each allele is also based on the breed composition of the crossbred animals. For one three-breed cross pig, if the origin of the maternal allele is assigned, the algorithm always assigns the paternal origin to the other allele at the same locus, i.e. in Fig. 4 no dark grey region is observed opposite to an assigned maternal allele. The other way around, if the origin of the paternal allele is assigned, the algorithm does not necessarily always assign the maternal origin to the other allele at the same locus, because it cannot choose between the two maternal purebred lines, as can be observed from dark grey regions opposite to an assigned paternal allele.

Principal component analysis

The principal component analysis of the genomic relationship matrix provided a clear separation between the purebred lines and between the three-breed cross pigs (Fig. 5). The first and second principal components together explained 16.9 % of variation, while the third principal component only explained 1.9 % of the variation, which is mainly associated with variation within the LR purebred line population. Previously, we detected two three-breed cross pigs with a low percentage of assignment for all 18 chromosomes. In Fig. 5, we plotted the first three principal components of the genomic relationship matrix and we observed that one of these pigs was placed within the paternal S purebred line population, while the other pig was placed outside the three-breed cross population, but also outside all purebred line populations. This indicates that these two pigs were genetically distinct from the three-breed cross population.

Discussion

Imputation

For the three purebred lines, LR, LW, and S, imputation was performed in two steps, 10K genotypes were imputed to 60K, and the output of the first step was imputed to 60K.v2. This strategy was chosen because the 10K panel shared more SNPs (8743) with the 60K panel than with the 60K.v2 panel (6861). Pedigree information was used for the imputation because it was available. However, in the absence of pedigree information and with high-density panels, family information can be captured by searching for long haplotypes and used for imputation [15]. The accuracies of imputation that were obtained in our study, using related pigs that were genotyped with high-density panels (60K or 60K.v2) and using pedigree information, were close to accuracies reported in the literature with similar datasets [16, 20]. Gualdron Duarte et al. [20], imputed 9K genotypes of F₂ individuals from a Duroc × Pietrain population to 60K, and obtained an accuracy of imputation higher than 0.94. With our data, the accuracy of imputation per SNP was very similar across different values of MAF, which indicates that rare variants were also accurately imputed. Similarly, Gualdron Duarte et al. [20] observed that SNPs with a MAF lower than 0.10 were imputed with reasonably good accuracy in the F₂ population. Ventura et al. [16], imputed crossbred beef cattle from 6K to 50K, and concluded that the accuracy of imputation of crossbred animals can be high if the number of reference animals genotyped with high-density panels is sufficiently large and if all breeds that have led to the crossbred animals are included. They also used the FImpute software [15] and obtained imputation accuracies higher than 0.94. However, accuracy of imputation was based only on concordance. Concordance estimates for imputation accuracy are generally higher than Pearson correlations. Imputation errors are generally due to the assignment of the major instead of the minor allele, and the probability of such errors decreases as MAF decreases. Therefore, SNPs with a low MAF generally show high concordance [21]. Moreover, the slightly lower accuracies reported by Ventura et al. [16] compared to those found in our study, can be explained by the fact that they lacked pedigree information. Another reason may be the higher levels of genomic divergence between the reference population and the group of animals to be imputed. In addition, the structure of the populations may have also contributed to this difference since pig breeding populations have a small effective population size, few boars with large family sizes, and generally complete pedigree information, while beef cattle populations have a larger number of sires with smaller family sizes and incomplete pedigree information [16]. Accuracy of imputation in our three-breed cross pigs data was not assessed because none of these animals were genotyped with the 60K.v2 chip. However, we would expect high imputation accuracies, i.e. similar to the results obtained for the purebreds and F₁ pigs. Shared haplotypes should have been found easily and accurately because the reference group was large and related to the target group [15]. Moreover, high imputation accuracy of rare variants was also expected in the three-breed cross pigs, because alleles present in the crossbreds must be present in the purebred parental lines [20].

Assignment of breed-of-origin

Percentage of assignment of breed-of-origin to alleles increased only slightly when pedigree information was used (1.6 % with f_r set at 0 %, and 1.5 % with f_r set at 20 %). Using pedigree information is recommended, first to increase allelic assignments, and second to reduce computation time during the phasing analyses. Only a small difference in assignment of breed-of-origin between using pedigree information or not was expected, because this information was only used for the phasing step, and it has been shown that long-range phasing, as implemented in AlphaPhase1.1 software, performs well in the absence of pedigree information [13]. Percentages of assignments were in line with the results based on simulated data that were reported by Vandenplas et al. [14]. In their simulation study [14], the distantly-related breeds scenario is comparable to our real data analysis. We obtained the highest percentage of assignment when using pedigree information and f_r equal to 20 %. Based on the simulation study of Vandenplas et al. [14], relaxing the maximum percentage of copies of the haplotype observed in another purebred population from 0 to 10 %, and then to 20 %, slightly increased the percentage of correct assignments but did not influence the percentage of incorrect assignments, and consequently slightly decreased the percentage of unknown assignments for crossbred animals that originated from distantly-related breeds. Across our results in Tables 3 and 4, 91 % of the alleles were always assigned and 2.8 % were never assigned, regardless of whether pedigree information was used or not. Therefore, 6.2 % of the alleles might switch from not being assigned to being assigned or vice versa, depending on whether pedigree information is used or not and the value set for f_r. Furthermore, we observed that assignments of breed-of-origin obtained with f_r set at 0 or 20 % were consistent. Therefore, relaxing the maximum percentage of copies of the haplotype to be observed in another purebred population from 0 to 20 % did appear to have resulted in extra assignments rather than rearrangement of assignments.

Animals with a low percentage of assignment of breed-of-origin

The percentage of assignment of breed-of-origin to alleles was high and constant across chromosomes. Two hundred and seven three-breed cross pigs had at least one chromosome for which less than 80 % of the alleles were assigned. It is difficult to characterize these animals, since 115 of these have only one or none of their parents genotyped. However across the whole data, 221 three-way crossbred animals also had only one or none of their parents genotyped, which means that 106 of them still achieved more than 80 % assignment for all chromosomes. Relatedness within these three-way crossbred animals does not seem to be the issue either. We found, a maximum of 16 half- or full-sibs (in the scenario with common sire A) and 11 half- or full-sibs (in the scenario with common sire B), however, sires A and B also produced 13 and 15 other half- or full-sibs, respectively, with more than 80 % assignment for all chromosomes. A low assignment percentage was found for the whole genome for two three-breed cross pigs. A principle component analysis of the genotype data showed that these two pigs do not overlap with the three-breed cross population. Thus, the approach used was not able to assign an origin to most of their haplotypes. We suspect that this absence of overlap of these two three-breed cross pigs with the three-breed cross population may be due to erroneous pig identification, i.e. the first pig might have originated from the paternal S purebred line and the second pig from a cross with another line that was not included in this study. This absence of overlap with the corresponding population was also observed for some purebred line pigs, likely for the same reasons. Pigs with low assignment of breed-of-origin to alleles along the whole genome should be removed from the dataset because they do not add information about breed-of-origin of alleles when it is used in further analyses, and because the low assignment may indicate an error in the data. The assignment of breed-of-origin to alleles of other three-breed cross pigs in the dataset should not be affected by these apparently incorrectly labelled pigs, even if the incorrect assignment occurs for the purebred line pigs, i.e. using breed-of-origin assignment with a f_r of 20 %, we still expected that at least 80 % of the alleles from the other purebred line pigs would be assigned the correct breed.

With the third principal component, we observed that pigs from the LR line were more variable compared to those from the other purebred lines (Fig. 5). This is probably because the recent history of the LR pigs used in our study involves animals that originated from two populations. As a result, the three-breed cross pigs were also sub-divided into two sub-groups, which probably depended on which of these two sub-populations the LR grand-dam came from. This variation within the LR population was mainly captured by the third principal component but it explained only 1.94 % of the extra variation.

Phasing and haplotype library

The first step to assign the breed-of-origin of alleles, was to phase the genotypes using the long-range phasing and haplotype library algorithm AlphaPhase1.1 (13). Phasing using pedigree information was on average three times faster than phasing without pedigree information. For the starting analysis, which includes phasing of the purebred animals, and the first batch of the crossbred animals, the assignment of breed-of-origin can still be accurately obtained without pedigree information, but one has to account for the increased computational demand. AlphaPhase1.1 builds a library of all unique haplotypes that long-range phasing has found in the dataset. This library can then be used in subsequent analyses for phasing new crossbred animals that are added to the dataset and that may or may not have pedigree information, without the need to phase the reference population again. Hickey et al. [22] tested this phasing strategy with simulations and obtained 81 to 94 % of correctly phased SNPs with a low error rate (<0.08 %). This phasing strategy can be applied for breed-of-origin assignment to speed up the assignment of alleles of new crossbred animals that are added to the dataset.

Application

Using crossbred performance for genetic predictions could be beneficial in breeding systems where production animals are crossbred, especially for traits with a low genetic correlation between purebred and crossbred performance. Genomic selection outperforms selection based on pedigree relationships and allows the use of crossbred performance information, even when pedigree information is not available. However, when using crossbred information for genomic prediction, we must take into account that effects of SNPs may be breed-specific because LD patterns between a SNP and a QTL may differ between breeds [4], and allele frequencies and allele substitution effects of QTL may also differ between breeds [7]. To include these differences between breeds in a prediction model, we first need to determine the breed-of-origin of alleles in three-breed cross animals with high accuracy, as in this study, and then use prediction models that estimate breed-specific SNP effects, as proposed by Ibánẽz-Escriche et al. [8] and Christensen et al. [9]. The benefit of this approach, training with crossbred data and using breed-specific SNP effects models, is that allele substitution effects of purebred alleles will be estimated against the genetic background that they will be expressed in. Thus, this approach can potentially incorporate the additive components of dominance and epistasis [8, 23]. This could be used in combination with reciprocal recurrent selection [1] using phenotypes and genotypes of crossbred animals instead of only phenotypes [23]. Under some conditions (i.e., low SNP density, large crossbred training data size, and low breed relatedness), Ibánẽz-Escriche et al. [8] and Esfandyari et al. [11], reported improved predictions using a model that accounts for the breed-of-origin of alleles compared to an additive or dominance model where SNP effects are assumed the same across breeds. In Ibánẽz-Escriche et al. [8] and Esfandyari et al. [11], simulated data were used and breed-of-origin of alleles was assumed to be known a priori. With the results obtained in our study, the genomic model that accounts for breed-of-origin of alleles can be tested with real data. Since applications of genomic prediction require frequent re-estimation of SNP effects to maintain prediction accuracy, genomic prediction based on crossbred performance and breed-of-origin knowledge would also require repeated derivation of breed-specific SNP effects.

In addition to genomic prediction analyses, knowledge of breed-of-origin of alleles can also be used in genome-wide association studies (GWAS), accounting for the fact that the effect of causative mutations on phenotypes may depend on breed-of-origin. The approach can be similar to that using parental origin of sequence variants [24], in which genomic imprinting restricts the effect to the allele inherited from a parent of a specific sex; however, to be able to distinguish between parental origin and breed-of-origin, reciprocal crosses will be needed.

The genomic prediction model and GWAS that account for breed-of-origin can also be tested using haplotypes instead of single SNPs, which can increase prediction accuracies in genomic prediction [25], and increase power and precision in GWAS [26]. However, although the output of the breed-of-origin approach provided 18 haplotypes libraries, it will still be necessary to combine them and redefine the start and endpoints of the haplotypes so that they are suitable for these types of analyses.

Conclusions

Breed-of-origin of alleles of crossbred animals can be empirically derived without pedigree information. Pedigree information is, however, useful to reduce computation time and slightly improves assignment percentage. Around 94 % of the alleles of three-breed cross pigs were assigned a breed-of-origin. The results of this approach for assigning breed-of-origin to alleles allows the use of models that implement breed-specific effects of SNP alleles in genomic prediction, with the aim to improve selection of purebred animals for crossbred offspring performance. Breed-of-origin information also opens new possibilities to study associations between SNPs and production traits.

References

Wei M, van der Steen HAM. Comparison of reciprocal recurrent selection with pure-line selection systems in animal breeding (a review). Anim Breed Abstr. 1991;59:281–98.
Google Scholar
Brandt H, Täubert H. Parameter estimates for purebred and crossbred performances in pigs. J Anim Breed Genet. 1998;115:97–104.
Article Google Scholar
Lutaaya E, Misztal I, Mabry JW, Short T, Timm HH, Holzbauer R. Genetic parameter estimates from joint evaluation of purebreds and crossbreds in swine using the crossbred model. J Anim Sci. 2001;79:3002–7.
CAS PubMed Google Scholar
Bastiaansen JWM, Bovenhuis H, Lopes M, Silva F, Megens H, Calus MPL. SNP effects depend on genetic and environmental context. In: Proceedings of the 10th World congress on genetics applied to livestock production, 17–22 August 2014; Vancouver. https://asas.org/docs/default-source/wcgalp-proceedings-oral/356_paper_10322_manuscript_1308_0.pdf?sfvrsn=2.
Bijma P, van Arendonk JAM. Maximizing genetic gain for the sire line of a crossbreeding scheme utilizing both purebred and crossbred information. Anim Sci. 1998;66:529–42.
Article Google Scholar
Dekkers JCM. Marker-assisted selection for commercial crossbred performance. J Anim Sci. 2007;85:2104–14.
Article CAS PubMed Google Scholar
Wientjes YCJ, Veerkamp RF, Bijma P, Bovenhuis H, Schrooten C, Calus MPL. Empirical and deterministic accuracies of across-population genomic prediction. Genet Sel Evol. 2015;47:5.
Article PubMed PubMed Central Google Scholar
Ibánẽz-Escriche N, Fernando RL, Toosi A, Dekkers JC. Genomic selection of purebreds for crossbred performance. Genet Sel Evol. 2009;41:12.
Article PubMed PubMed Central Google Scholar
Christensen OF, Madsen P, Nielsen B, Su G. Genomic evaluation of both purebred and crossbred performances. Genet Sel Evol. 2014;46:23.
Article PubMed PubMed Central Google Scholar
Christensen OF, Legarra A, Lund MS, Su G. Genetic evaluation for three-way crossbreeding. Genet Sel Evol. 2015;47:98.
Article PubMed PubMed Central Google Scholar
Esfandyari H, Sørensen AC, Bijma P. A crossbred reference population can improve the response to genomic selection for crossbred performance. Genet Sel Evol. 2015;47:76.
Article PubMed PubMed Central Google Scholar
Lopes MS. Genomic selection for improved crossbred performance. PhD thesis. Wageningen: Wageningen University; 2016.
Hickey JM, Kinghorn BP, Tier B, Wilson JF, Dunstan N, van der Werf JH. A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes. Genet Sel Evol. 2011;43:12.
Article PubMed PubMed Central Google Scholar
Vandenplas J, Calus M, Sevillano CA, Windig JJ, Bastiaansen JW. Assigning the breed origin of alleles in crossbred animals. Genet Sel Evol (in press).
Sargolzaei M, Chesnais JP, Schenkel FS. A new approach for efficient genotype imputation using information from relatives. BMC Genomics. 2014;15:478.
Article PubMed PubMed Central Google Scholar
Ventura RV, Lu D, Schenkel FS, Wang Z, Li C, Miller SP. Impact of reference population on accuracy of imputation from 6 K to 50 K single nucleotide polymorphism chips in purebred and crossbreed beef cattle. J Anim Sci. 2014;92:1433–44.
Article CAS PubMed Google Scholar
Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–9.
Article CAS PubMed PubMed Central Google Scholar
Calus MPL, Vandenplas J. Calc_grm—a program to compute pedigree, genomic, and combined relationship matrices. Wageningen: ABGC, Wageningen UR Livestock Research; 2015.
Google Scholar
Tortereau F, Servin B, Frantz L, Megens HJ, Milan D, Rohrer G, et al. A high density recombination map of the pig reveals a correlation between sex-specific recombination and GC content. BMC Genomics. 2012;13:586.
Article CAS Google Scholar
Gualdron Duarte JL, Bates RO, Ernst CW, Raney NE, Cantet RJC, Steibel JP. Genotype imputation accuracy in a F2 pig population using high density and low density SNP panels. BMC Genet. 2013;14:38.
Article CAS PubMed PubMed Central Google Scholar
Lin P, Hartz SM, Zhang Z, Saccone SF, Wang J, Tischfield JA, et al. A new statistic to evaluate imputation reliability. PLoS One. 2010;5:e9697.
Article PubMed PubMed Central Google Scholar
Hickey J, Kinghorn BP, Cleveland M, Tier B, Van der Werf JHJ. Recursive long range phasing and long haplotype library imputation: building a global haplotype library for Holstein cattle. In: Proceedings of the 9th World congress on genetics applied to livestock production, 1–6 Aug 2010, Leipzig.
Kinghorn BP, Hickey J, Van Der Werf JHJ. Reciprocal recurrent genomic selection for total genetic merit in crossbred individuals. In: Proceedings of the 9th World congress on genetics applied to livestock production, 1–6 August 2010, Leipzig.
Kong A, Steinthorsdottir V, Masson G, Thorleifsson G, Sulem P, Besenbacher S, et al. Parental origin of sequence variants associated with complex diseases. Nature. 2009;462:868–74.
Article CAS PubMed PubMed Central Google Scholar
Calus MPL, Meuwissen THE, de Roos APW, Veerkamp RF. Accuracy of genomic selection using different methods to define haplotypes. Genetics. 2008;178:553–61.
Article CAS PubMed PubMed Central Google Scholar
Pryce JE, Bolormaa S, Chamberlain AJ, Bowman PJ, Savin K, Goddard ME, et al. A validated genome-wide association study in 2 dairy cattle breeds for milk production and fertility traits using variable length haplotypes. J Dairy Sci. 2010;93:3331–45.
Article CAS PubMed Google Scholar

Download references

Authors’ contributions

CAS selected the genotypes and pedigree information, conducted the imputation and breed-of-origin assignment, prepared figures and tables, and wrote the first draft of the manuscript. JV conducted the programming for the assignment of breed-of-origin and was involved in the discussion of analysis issues. JWMB participated in the design of the study, designed the approach for the assignment of breed-of-origin, and was involved in the discussion and evaluation of analysis issues. MPLC participated in the design of the study and coordination, and was involved in the discussion on analysis issues. All authors read and approved the final manuscript.

Acknowledgements

This work was financially supported by the Netherlands Organisation for Scientific Research (NWO) through the LocalPork project W 08.250.102 in the Food and Business Global Challenges Program and by Breed4Food (BO-22.04-011-001-ASG-LR-3), a public–private partnership in the domain of animal breeding and genomics. We are grateful to Topigs Norsvin for providing the data required to perform this study.

Competing interests

The authors declare that they have no competing interests.

Author information

Authors and Affiliations

Animal Breeding and Genomics Centre, Wageningen University, PO Box 338, 6700 AH, Wageningen, The Netherlands
Claudia A. Sevillano & John W. M. Bastiaansen
Topigs Norsvin, PO Box 43, 6640 AA, Beuningen, The Netherlands
Claudia A. Sevillano
Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, 6700 AH, Wageningen, The Netherlands
Jeremie Vandenplas & Mario P. L. Calus

Authors

Claudia A. Sevillano
View author publications
You can also search for this author in PubMed Google Scholar
Jeremie Vandenplas
View author publications
You can also search for this author in PubMed Google Scholar
John W. M. Bastiaansen
View author publications
You can also search for this author in PubMed Google Scholar
Mario P. L. Calus
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Claudia A. Sevillano.

Additional file

12711_2016_234_MOESM1_ESM.docx

Additional file 1. Allele assignment (%) per chromosome (Chr) to synthetic boar (S), Landrace (LR), or Large White (LW) as breed-of-origin when using pedigree information and a relaxation factor of 20 %. The data provided represent the percentage of allele assignment to each purebred line as breed-of-origin per chromosome when using pedigree information and a relaxation factor of 20 %.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Sevillano, C.A., Vandenplas, J., Bastiaansen, J.W.M. et al. Empirical determination of breed-of-origin of alleles in three-breed cross pigs. Genet Sel Evol 48, 55 (2016). https://doi.org/10.1186/s12711-016-0234-9

Download citation

Received: 17 February 2016
Accepted: 27 July 2016
Published: 04 August 2016
DOI: https://doi.org/10.1186/s12711-016-0234-9

Empirical determination of breed-of-origin of alleles in three-breed cross pigs

Abstract

Background

Results

Conclusions

Background

Methods

Ethics statement

Genotypic data

Imputation

Validation of imputation

Assignment of breed-of-origin of alleles

Phasing

Assignment of breed for haplotypes and alleles

Principal component analysis

Results

Imputation and accuracies of imputation

Assignment of breed-of-origin for alleles

Comparison of different settings used for assignment of breed-of-origin

Performance of assignment of breed-of-origin

Principal component analysis

Discussion

Imputation

Assignment of breed-of-origin

Animals with a low percentage of assignment of breed-of-origin

Phasing and haplotype library

Application

Conclusions

References

Authors’ contributions

Acknowledgements

Competing interests

Author information

Authors and Affiliations

Corresponding author

Additional file

12711_2016_234_MOESM1_ESM.docx

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Genetics Selection Evolution

Contact us