Detection of quantitative trait loci in Bos indicus and Bos taurus cattle using genome-wide association studies

Background The apparent effect of a single nucleotide polymorphism (SNP) on phenotype depends on the linkage disequilibrium (LD) between the SNP and a quantitative trait locus (QTL). However, the phase of LD between a SNP and a QTL may differ between Bos indicus and Bos taurus because they diverged at least one hundred thousand years ago. Here, we test the hypothesis that the apparent effect of a SNP on a quantitative trait depends on whether the SNP allele is inherited from a Bos taurus or Bos indicus ancestor. Methods Phenotype data on one or more traits and SNP genotype data for 10 181 cattle from Bos taurus, Bos indicus and composite breeds were used. All animals had genotypes for 729 068 SNPs (real or imputed). Chromosome segments were classified as originating from B. indicus or B. taurus on the basis of the haplotype of SNP alleles they contained. Consequently, SNP alleles were classified according to their sub-species origin. Three models were used for the association study: (1) conventional GWAS (genome-wide association study), fitting a single SNP effect regardless of subspecies origin, (2) interaction GWAS, fitting an interaction between SNP and subspecies-origin, and (3) best variable GWAS, fitting the most significant combination of SNP and sub-species origin. Results Fitting an interaction between SNP and subspecies origin resulted in more significant SNPs (i.e. more power) than a conventional GWAS. Thus, the effect of a SNP depends on the subspecies that the allele originates from. Also, most QTL segregated in only one subspecies, suggesting that many mutations that affect the traits studied occurred after divergence of the subspecies or the mutation became fixed or was lost in one of the subspecies. Conclusions The results imply that GWAS and genomic selection could gain power by distinguishing SNP alleles based on their subspecies origin, and that only few QTL segregate in both B. indicus and B. taurus cattle. Thus, the QTL that segregate in current populations likely resulted from mutations that occurred in one of the subspecies and can have both positive and negative effects on the traits. There was no evidence that selection has increased the frequency of alleles that increase body weight.

Results: Fitting an interaction between SNP and subspecies origin resulted in more significant SNPs (i.e. more power) than a conventional GWAS. Thus, the effect of a SNP depends on the subspecies that the allele originates from. Also, most QTL segregated in only one subspecies, suggesting that many mutations that affect the traits studied occurred after divergence of the subspecies or the mutation became fixed or was lost in one of the subspecies.

Conclusions:
The results imply that GWAS and genomic selection could gain power by distinguishing SNP alleles based on their subspecies origin, and that only few QTL segregate in both B. indicus and B. taurus cattle. Thus, the QTL that segregate in current populations likely resulted from mutations that occurred in one of the subspecies and can have both positive and negative effects on the traits. There was no evidence that selection has increased the frequency of alleles that increase body weight.

Background
Taurine (Bos primigenius taurus) and zebu (Bos primigenius indicus) cattle are the only two surviving subspecies of wild cattle (Bos primigenius) and constitute the majority of the cattle populations in the world. The estimated time of divergence between Bos taurus (B. taurus) and Bos indicus (B. indicus) ranges from 117 000 to 275 000 years according to mtDNA analyses [1] and from 610 000 to 850 000 years according to microsatellite data analyses [2]. B. taurus and B. indicus cattle were domesticated independently in the Near East and in India respectively [3,4].
The accuracy of genomic selection [5,6] and the power of genome-wide association studies (GWAS) can be improved by increasing the number of animals with phenotypes and genotypes. Sample sizes can be increased by combining data from different breeds. However, genomic selection and GWAS depend on the presence of linkage disequilibrium (LD) between markers (usually single nucleotide polymorphisms or SNPs) and quantitative trait loci (QTL). Thus, since the extent and phase of LD can vary between breeds, combining data from different breeds can reduce the association between a SNP and trait phenotype. de Roos et al. [7,8] reported that the phase of LD between adjacent markers on the 50 K Bovine SNP chip frequently differed among B. taurus cattle breeds, which means that the power of association studies based on the 50 K chip will be reduced when using multiple breeds because the LD between SNPs and QTL is unlikely to be conserved across breeds.
In this study, we considered multiple breeds and used the Illumina high-density (HD) chip (~700 000 SNPs). We expected the LD phase to be largely conserved for typical distances between adjacent markers on the HD SNP chip (i.e., 5 kb) in B. taurus breeds (The Bovine Hap Map 2009). However, in addition to B. taurus breeds, beef cattle in Australia include a predominant B. indicus breed (Brahman) and composite breeds derived from crosses between B. taurus and B. indicus. Assuming a divergence time of about 500 000 years, only chromosome segments of approximately 1 kb in length that segregated in B. primigenius might still segregate in B. taurus or B. indicus cattle assuming 5 years per generation, recombinations over 10 5 generations will result in identity by descent segments of approximately 10 -5 Morgan = 1 kb; according to O'Rourke et al. [9]. Thus, it is unlikely that the LD between QTL and SNPs that are separated by~5 kb is conserved between the two subspecies. In fact, over 10 5 generations, it is likely that new QTL mutations occurred independently in B. indicus and B. taurus and some QTL, that segregated in the common ancestor, have become fixed in one of the two subspecies. Therefore, the aim of this paper was to determine how to analyse the HD SNP data on Brahman (B. indicus), B. taurus and composite breeds so that QTL are clearly mapped to regions of the genome.
The apparent effect of a SNP on phenotype depends on the LD between the SNP and the QTL, which as discussed above, can differ between the B. indicus and B. taurus breeds. If the Australian bovine dataset did not include composite breeds, one could simply classify all alleles present in B. taurus breeds as B.taurus alleles and all alleles in the Brahman breed as B. indicus alleles and estimate the effects of each allele separately for the two subspecies. However, the presence of composite breeds introduces a complication and an opportunity. The complication is that the SNP alleles present in the composite breeds can originate either from B. indicus or B. taurus. Therefore, to deal with this situation, all SNP alleles of the composite breeds were assigned either a B. indicus or B. taurus origin based on the identified origin (taurine or indicine) of the haplotype surrounding the SNP [10]. The opportunity is that the composite breeds will segregate for mutations that have been fixed for alternate alleles in B. taurus and B. indicus and therefore will lead to the identification of new QTL. This paper reports GWAS for growth, carcass and meat quality traits and compares the effect of indicine and taurine SNP alleles. We estimated the effect of the subspecies origin of each SNP allele, the effect of the SNP allele regardless of the subspecies origin, and the interaction between these two variables, and used this information to make inferences about the QTL that segregate in the two subspecies.  [11][12][13]. A total of 10 181 animals of the three breed types (3384 B. indicus, 3296 B. taurus, and 3501 B. taurus × B. indicus) with SNP genotypes and measured for at least one trait were used in this study.

Animals and phenotypes
Phenotypes for 12 different traits, including growth, feed intake, carcass and meat quality traits, were collated from five sources: the Beef Cooperative Research Centre Phase I (CRCI), Phase II (CRCII), Phase III (CRCIII), the Trangie selection lines, and the Durham Shorthorn group (detailed description is in Bolormaa et al. [6]). Not all cattle were measured for all traits. The number of genotyped cattle with each trait in each dataset, trait definitions, heritability estimates and means and standard deviations (SD) are in Table 1.

SNP data
Data on 729 068 SNPs were used in this study, which were obtained from five different SNP panels: (1)  Stringent quality control procedures were applied to the SNP data, separately for each platform and breed group. SNPs were excluded if the call rate per SNP (the proportion of SNP genotypes that have an Illumina GenCall score above 0.6) was less than 90% or showed an extreme departure from Hardy-Weinberg equilibrium (e.g., SNPs on autosomal chromosomes with both homozygous genotypes observed, but no heterozygotes). If two SNPs had the same position but with different names, one of them was deleted from the data. Furthermore, animals with a call rate less than 90% were removed from the SNP data.
High-density SNP genotypes were imputed for all animals using Beagle [14]. Details on imputation of the genomic dataset were described by Bolormaa et al. [6]. Briefly, imputations of the 7 K, 10 K and 50 K SNP genotype data to the 729 068 SNPs were performed in two sequential stages: from 7 K or 10 K or 50 K data to a common 50 K data set and then from the common 50 K data set to 800 K data. In the first stage, imputation was done within breed, using 30 iterations of Beagle. In the second stage, the HD genotypes of each breed type (501 B. taurus and 520 B. indicus) were used as a reference set to impute from the 50 K genotypes of each pure breed within the corresponding breed type. For the four composite breeds, all the HD genotypes (1698) were used as a reference set to impute the 50 K genotypes of each composite breed up to 800 K.

Classification of chromosome segments based on indicine or taurine origin
Each chromosome was divided into non-overlapping segments consisting of 30 or 31 consecutive SNPs. The genotypes were phased using Beagle so that two haplotypes were defined for each animal for each segment. The probability ('b') that a chromosome segment in a given animal carrying the i th haplotype was of B. indicus origin was estimated using the following formula: b ¼ p Bi i p Bi i þp Bt i [10], where p Bt i is frequency of the i th haplotype in B. taurus and p Bi i is frequency of the i th haplotype in B. indicus animals. Preliminary results showed that, in some cases, the origin of a segment was inconclusive but that many of the surrounding segments were of the same subspecies origin. Because crosses between B. taurus and B. indicus cattle are recent events and, therefore, long chromosome segments are expected to have the same origin, a rolling average of 'b' values across seven segments of 30 or 31 SNPs was calculated, which led to more segments being clearly classified as taurine or indicine. As shown in Figure 1, this procedure resulted in most segments having a 'b'

Statistical analysis
A mixed model (fitting fixed and random effects simultaneously) was used to perform genome-wide association studies (GWAS). The mixed model applied was: trait~mean + fixed effects + SNP i + animal + error, with animal and their 5-generation-ancestors and error fitted as random effects and dataset, breed, cohort and sex as fixed effects for all traits. Other fixed effects differed by Table 1 Number of records within and across five datasets and mean, standard deviation (SD) and heritability estimates (h 2 ) of each trait for the genotyped animals and their 5-generation ancestors trait. Further details of the effects used for each data set are in Johnston [15], Johnston et al. [11], Reverter et al. [16], Robinson and Oddy [17], Barwick et al. [12], Wolcott et al. [18] and Bolormaa et al. [6]. The fixed effects were fitted as nested within a dataset. In addition to the fixed and random effects described above, each analysis included the effect of one SNP position (SNP i ). Since there are four alleles defined for each position (i.e., indicine A, indicine B, taurine A, and taurine B), three contrasts of one degrees-of-freedom (e.g. sub-species origin, allele within B. taurus and allele within B. indicus) are needed to compare all four alleles. We carried out several different analyses using different parameterisations of these contrasts (see Table 2 for a definition of the variables x 1 to x 7 ).
First, we carried out a "conventional GWAS" by ignoring the subspecies origin of the alleles and contrasting the A and B alleles, i.e. by fitting only variable x 1 as defined in Table 2. Each animal has a genotype made up of two alleles, so the SNP variable analysed was the sum of the x 1 's for the two alleles, i.e. the analysis fits a regression of phenotype on the number of B alleles (0, 1 or 2) in the SNP genotype of each animal. This analysis was performed using all data together, as well as separately for each of the three breed types (Bos taurus, Brahman and composites).
Second, in the "interaction GWAS", the three contrasts were fitted simultaneously by fitting the effect of the SNP allele (A vs B, variable x 1 ), the effect of subspecies origin (B. indicus vs B. taurus, variable x 2 ) and the interaction between them (variable x 3 ). Each animal contains a paternal and maternal allele, so the full model for the effect of a SNP is: where x 1p is the x 1 variable for the paternal allele, x 1m is the x 1 variable for the maternal allele, etc., b 1 is the regression of phenotype on number of B alleles, b 2 is the regression of phenotype on the number of indicine alleles, and b 3 is the regression of phenotype on the number of indicine B alleles.
Since an interaction between SNP allele and subspecies origin can arise because a QTL segregates in only one of the two subspecies, a re-parameterisation of this "interaction" model was also tested by fitting subspecies origin (x 2 ), allele within B. taurus (A vs B or variable x 6 ) and allele within B. indicus (A vs B or x 7 ). As in the previous model, this model includes paternal and maternal alleles, so the x variables for the maternal and paternal alleles were added together, resulting in the following model: where b 6 and b 7 are the regressions of phenotype on the number of taurine and indicine B alleles, respectively. This model is equivalent to the interaction model described above but tests a different set of three onedegree-of-freedom contrasts. The most parsimonious assumption is that a QTL will have two alleles and that they will be in highest LD with one of the seven variables defined in Table 2. For instance, if the QTL segregates only in B. taurus, then variable x 4 or x 6 should be the most significant variable; if the mutant QTL allele is linked to the SNP A allele, then variable x 4 will be most significant; but if the mutant QTL allele is linked to SNP allele B, then variable x 6 should be most significant. In cases where one of x 4   to x 7 was the most significant variable we assume that the QTL segregates in only one subspecies and that the SNP allele which has an effect different to the other three alleles is tracking the mutant allele at the QTL. This also allows us to determine whether the effect of the QTL mutation is positive or negative for each trait. Therefore, in the final set of analyses, each variable x 1 , x 2 , x 4 , x 5 , x 6 , and x 7 (x 3 is the same as x 7 ) was tested one at a time, by including it in the full model described above, to identify the variable that had the greatest association with the phenotype (the "best variable GWAS").

Classification of chromosome segments based on subspecies origin
The distribution of the probability of B. indicus origin (rolling average of 'b' values) for B. taurus, B. indicus, and composite animals are shown in Figure 1. For B. taurus animals, 'b' values were low (close to 0) but only 7.1% of the 'b' values were found in the range of 0.1 < b < 0.2 and 1.5% in the range of 0.2 < =b < 0.32, indicating some uncertainty, probably because many B. taurus breeds were included in the dataset. Using 'b' < 0.6 to indicate taurine origin, the estimated frequency of indicine segments in B. taurus breeds was 0, as expected, 98% in Brahman, and 71 to 83% in composite breeds.
Genome-wide association studies Conventional GWAS GWAS, in which each SNP (variable x 1 ) was tested separately for an association with the trait, was performed for all animals together and separately for each breed type (B. taurus, B. indicus and B. taurus × B. indicus; Table 3). This resulted in, e.g. 489 SNPs to be significant (P < 10 -4 ) for residual feed intake (RFI) in the joint analysis of all breed types. With 729 068 SNPs tested, this corresponds to a false discovery rate (FDR) of 14% ( Table 3). Estimates of the FDR of SNPs declared significant (P < 10 -4 ) differed between traits, with a low value for live weight, muscle shear force (LLPF), and postweaning IGF-I concentration (PWIGF), ranging from 2 to 11% (Table 3), and a moderate value (14% to 21%) for the remaining traits. For all traits except PWIGF, FDR was lower when all data were analysed jointly than when separate analyses were performed within each breed type (Table 3). There was no consistent pattern for FDR when comparing the analyses of the three breed types; e.g., for RFI, FDR was lower for B. taurus data than for the composite cattle data but for growth traits FDR was lower for Brahman and B. taurus × B. indicus data than for B. taurus data ( Table 3). Such results might reflect the presence of mutations of larger effect segregating in particular breed types, such as the PLAG1 (pleomorphic adenoma gene 1) polymorphism (chr14:25001906..25052394), which segregates in Brahman and composite breeds but is close to fixation in B. taurus [19,20]. In such cases, many SNPs in the vicinity of the QTL were significant. Across traits, more than one SNP was found to be significant within short regions on chromosomes 5, 7, 14 and 29, which are near or within genes that are known to carry mutations with large effects affecting tenderness traits, such as PLAG1, CAPN1 (calpain 1) (chr29:44064420..44089990), and CAST (calpastatin) (chr7:98524257..98581260) [19][20][21][22][23][24][25][26]. In a number of cases, these SNPs had associations with more than one trait.

Interaction GWAS
For all traits, the number of significant (P < 10 -4 ) SNPs for the interaction GWAS, which fitted SNP allele (variable x 1 ), subspecies of origin (x 2 ), and their interaction (x 3 ), was substantially greater than the number of significant SNPs in the conventional GWAS (Table 4). This indicates that using the subspecies origin of each SNP allele increased power. Table 4 also shows the number of SNPs for which one of the three underlying variables was significant. The number of SNPs for which contrast of allele A vs B (variable x 1 ) was significant, was smaller than for the comparable contrast in the conventional GWAS (Table 3) because in the interaction analysis, this variable was fitted after accounting for subspecies origin (variable x 2 ) (i.e. a conditional F-test). Subspecies origin of the allele was significant for many SNPs, in part because adjacent SNPs are highly correlated for x 2 . Consequently, many SNPs surrounding the same QTL were significant for this variable and these contribute many counts in Table 4. The interaction between SNP and subspecies origin (variable x 3 ) was often significant, which indicates that the effect of an allele depended on the subspecies origin of the allele. If the LD phase between a QTL and a SNP differed randomly between B. indicus and B. taurus, one would expect that sometimes the phase would be the same in both subspecies and sometimes it would be reversed. When the phase is the same, we would observe a significant SNP effect and when it was reversed or absent, we would observe a significant interaction. Thus, our results are consistent with the hypothesis that LD differs between B. indicus and B. taurus.
In Table 4, the main effect of the SNP is only slightly more often significant than the interaction, which suggests that there is little consistency in LD between B. indicus and B. taurus. If a QTL was fixed for opposite alleles in the two subspecies, then subspecies origin of the allele (variable x 2 ) is expected to be the only significant variable. However, subspecies origin can be significant whenever there is a difference in QTL allele frequencies between B. indicus and B. taurus. When this occurs, the QTL might be segregating in one subspecies or in both. To investigate this possibility, we re-parameterised the interaction model to fit the effects of subspecies origin (x 2 ), of SNP allele within B. taurus (x 6 ), and of SNP allele within B. indicus (x 7 ). The number SNPs that was significant for these two additional variables is also included in Table 4. For most traits, there were many SNPs for which the difference between the A and B alleles was significant within a subspecies. An extreme example is the trait HUMP but this trait was only recorded in Brahman cattle and composite breeds, so it is not surprising that the difference between alleles was seldom significant within B. taurus (All other traits were recorded in all breed types). Table 4 does not indicate how often the effect of a SNP was significant for both B. indicus and B. taurus origins. Table 5 shows the number of SNPs for which one or more of the three one-degree-of-freedom contrasts was significant (P < 10 -4 ). For instance, for LLPF, two SNPs were significant for all three contrasts and these two SNPs were also significant in the conventional GWAS (in fact, these two SNPs are close to and within the CAST gene). The effect of a SNP was only rarely significant within both indicine and taurine origin (Table 5). For instance, for PW-lwt, 36 SNPs were significant within both indicine and taurine origin. On the rare occasions that this occurred, the effect was always in the same direction in both subspecies and the SNP was also significant in the conventional GWAS, which indicates that the phase of LD between the SNP and the QTL was the same in both subspecies.
The most common pattern observed was that the SNP was significant within one of the subspecies origins and also for subspecies of origin (Table 5), which is what one would expect if the QTL segregated in one subspecies only. Therefore, to address this possibility, we tested a series of one-degree-of-freedom contrasts at each position to find the most significant contrast.

Best variable GWAS
At each SNP position, we tested separately the effects of the main effect of SNP (x 1 ), subspecies origin (x 2 ), SNP within B. taurus (x 4 and x 6 ), and of SNP within B. indicus (x 5 and x 7 ) to find the most significant contrast among the four alleles (indicine A, indicine B, taurine A and taurine B). We expect that the pattern of the most Table 4 Interaction GWAS: number of significant variables (P < 10 -4 ) for the joint and individual effects for each trait Joint test of main effects of SNP, Origin, and the SNP × Origin interaction.
significant contrast will correspond to the pattern of segregation of the QTL, e.g. if x 4 is the most significant contrast we expect that the QTL segregates only in B. taurus and that the A allele of the SNP is in LD with a QTL allele that does not occur in B. indicus. Table 6 shows the number of SNPs for each trait for which at least one of these variables was significant at P < 10 -6 . For instance, 56 SNPs were significant for RFI. Table 6 also shows the proportion of these SNPs for which each variable was the most significant. For instance, among the 56 SNPs significant for RFI, 55% had the main SNP variable (x 1 ) as the most significant variable. Tables 7 lists the SNPs for which one of the variables was significant at P < 5 × 10 -8 . The number of significant SNPs varied widely between traits ( Table 6). The four traits with the largest number of significant SNPs were LLPF, PW_lwt, PW_hip, and X_lwt and for each of these traits there were many SNPs for which the variable recording the sub-species origin of the allele was the most significant. These SNPs clustered in several genome regions because subspecies origin was highly correlated between neighbouring SNPs. However, in these clusters subspecies origin was seldom the most significant variable. That is, there was usually one SNP in the region where other variables (x 1 , x 4 to x 7 ) were more significant than subspecies origin (x 2 ). Consequently, when the significance threshold was increased to 5 × 10 -8 , there were no SNPs for which subspecies of origin was the most significant variable (Table 7). For traits other than LLPF, PW_lwt, PW_hip, and X_lwt, the most significant variable for a SNP could be any of the main SNP effect, the effect of SNP within B. indicus or of SNP within B.taurus. This indicates that the QTL segregated in one or both of the subspecies.
To avoid counting the same QTL many times, we chose only the most significant SNP (P < 10 -4 ) for each chromosome by trait combination. This resulted in 345 SNP-trait combinations (85 conventional SNPs, 141 within B. taurus, and 119 within B. indicus), representing 339 unique SNPs. Figure 2 plots the standardized estimate of the effect of an allele (effect estimate / standard error) against the corresponding allele frequency for the 260 SNP-trait combinations, within B. taurus and within B. indicus. We interpret the allele with an effect different to that of the three other alleles to be tracking the mutant allele at the QTL. Therefore, the frequency and  OT-Nb = number of SNPs that have significant effects due to sub-species of origin and alleles within B. taurus but not for alleles within B. indicus; Numbers in brackets = number of SNPs that are significant for all three contrasts and that are also significant in the conventional GWAS for one or more of the 3 one-degree-of-freedom contrasts (see Table 3). Table 6 Best variable GWAS: number of SNPs at which one or more of the seven variables (x 1 to x 7 ) was significant at P < 10 -6 and the proportion of these for which each variable was the most significant     For CIMF and carcass rib fat (CRIB), all significant effects were found within B. taurus and most increased fatness or intra-muscular fat. No trait showed an obvious correlation between allele frequency and effect, as one might expect if selection was acting to increase the frequency of alleles with a positive effect.

Examples of significant SNPs
There were 52 trait-SNP combinations for which one of the one-degree-of-freedom contrasts was significant at P < 5 × 10 -8 (Table 7). Of these, 23 trait-SNP combinations (44%) showed the most significant contrast for the conventional SNP test, which suggests a QTL that segregated in both subspecies. However, in nearly half of these trait-SNP combinations (9/23), the MAF of the SNP was less than 0.05 in one of the subspecies, which suggests that the QTL only segregated in the other subspecies. In eight of the remaining 14 trait-SNP combinations, the significant SNP was near the PLAG1 gene on chromosome 14 that affects weight, height, fatness and RFI. For most SNPs in this region, the conventional SNP test was the most significant contrast but Fortes et al. [20] showed that the PLAG1 polymorphism is due to a B. taurus mutation that was introduced into Brahman cattle during grading up and was then selected for, perhaps because it increases height. The region between 47 and 50 Mb on chromosome 5 contains a series of SNPs that were significant for multiple traits, including X_lwt, PW_hip, HUMP, CIMF and CRIB (Table 7). These associations, except with HUMP, are likely due to the same QTL. The most significant x-variable differed between SNPs but in all cases, except for HUMP, the results indicate a SNP that was nearly fixed for alternate alleles in the two subspecies. Consequently, when the conventional GWAS within each breed type is examined, there is no evidence for this QTL, except in the composite breeds. There also appears to be a QTL on chromosome 6 at 40 Mb that affects hip height and live weight that appears to segregate only in B. indicus.

Discussion
The average of probability of B. indicus origin ('b' values) across the genome was 0.98 for Australian Brahman cattle, which indicates that about 2% of their genes were estimated to be of taurine origin. This is less than the 10% that was estimated based on 50 K SNP data by Bolormaa et al. [10]. Similarly, the present analysis estimated that 70% of the genome of the F1 crosses was of B. indicus origin. Since F1 crosses include Charbray × Brahman and Santa Gertrudis × Brahman, the percentage of the genome that was of B. indicus origin was expected to exceed 50%. The B. indicus content estimated in our work may be slightly overestimated because we classified some taurine haplotypes as B. indicus because they occurred more often in Brahman cattle than in the Australian B. taurus cattle. For instance, a taurine haplotype that was not found in Australian B. taurus cattle might have been incorporated into Brahman cattle during grading up. However, this should have little effect on the overall results.
For most traits, the FDR in the conventional GWAS was lower when all data were analysed jointly than when separate analyses were performed within each breed type. However, this probably reflects the increased size of the dataset in the joint analysis, rather than indicating that QTL segregate across B. taurus and B. indicus because a QTL segregating only in B. taurus would still segregate in the composite cattle. Indeed, evidence for a QTL in the same chromosome region in both B. taurus and composite breeds or in both B. indicus and composite breeds was frequent but the same QTL was rarely observed in all three breed types.
In the interaction GWAS, the interaction between SNP allele and subspecies origin was frequently significant, which indicates that the effects of the SNP alleles on a trait depended on the subspecies from which they originated. When we re-parameterised this analysis, we found that many SNPs had significant effects within B. taurus origin or within B. indicus origin but seldom within both. Thus, the simplest interpretation of the data is that QTL usually segregate either within B. taurus or within B. indicus. This is not surprising given that the two subspecies diverged about 10 5 generations ago and since then mutations have created new QTL independently in the two subspecies. If this is correct, then one subspecies is expected to be fixed for the ancestral allele and the other segregates for the ancestral and mutant allele. If the frequency of the mutant allele increases to a moderate value in the mutated subspecies, then a difference in effect between alleles from B. taurus and B. indicus is expected (i.e., significant variable x 2 ), as well as an effect within one subspecies. This is exactly the pattern that we found most commonly. The variable that best fitted the data was either x 4 , x 5 , x 6 or x 7 . These variables compare one allele from one subspecies against the other allele from the same subspecies and both alleles from the other subspecies. The simplest interpretation would be that one SNP allele is in linkage phase with the mutant allele at the QTL and the other three alleles are associated with the ancestral allele at the QTL. Even if this interpretation is correct only in a majority of cases, it allows us to estimate approximately the frequency and effect of the mutant allele at the QTL. Figure 2 shows, under this interpretation, that mutations that affect weight have occurred in both B. taurus and B. indicus, with some that increase and others that decrease weight. As a result of drift and selection, these mutant alleles have frequencies ranging from near 0 to near 1 but there was no evidence that selection had systematically increased the frequency of alleles that increase weight.
Not all traits show the same pattern as that obtained for weight. Mutations that affect hump appear to have occurred only in B. indicus and to predominantly increase hump size (Figure 2). This could be due to lack of power in our data to detect alleles in B. taurus that affect hump but it could also indicate that mutations that increase hump size have been selected in B. indicus cattle. Similarly, mutations that affect intra-muscular fat seem to have occurred primarily in B. taurus and most of these increased intra-muscular fat percentage, which could be a result of direct or indirect selection for marbling in B. taurus.
Some of the QTL detected here have been reported previously. The QTL near PLAG1 on chromosome 14 affects many traits (live weight, height, carcass, meat quality and IGF-I traits) and segregates in both B. taurus and B. indicus, although the mutation arose in B. taurus and was introgressed in Brahman cattle during grading up [20]. In our analysis, the most significant SNPs associated with hip height (Table 7) had a frequency of 0.95 in B. taurus, which indicates that the mutant allele, which has a positive effect on height, is almost fixed in the B. taurus breeds in Australia. The QTL for shear force that we detected on chromosomes 7 and 29 have been identified previously as polymorphisms in the genes calpastatin [22,26] and calpain 1 [19,24,25]. The polymorphism in the calpastatin gene was reported to segregate in both B. taurus and B. indicus [23] at allele frequencies of 0.82-0.99 and 0.57-0.65, respectively. In agreement with this, we found that the best variable for shear force on chromosome 7 was that for a conventional SNP test, with frequencies close to the frequencies of the mutation within each subspecies (i.e. 0.9 in B. taurus and 0.76 in B. indicus). For calpain 1, the frequencies of the favourable allele (that reduces shear force) in B. taurus and B. indicus were 0.11 and 0.81, respectively (Table 7), in agreement with previous results [23], i.e. 0.19-0.21 in B. taurus and 0.83-0.84 in B. indicus.
The QTL on chromosome 5 at 49 Mb (Table 7) affected many traits and appeared to be nearly fixed for alternate alleles in B. taurus and B. indicus and consequently it was not possible to determine which allele is the mutant or in which subspecies it occurred. The gene HMGA2 (high mobility group AT-hook 2) is near this position and has been reported to affect height, fatness and fat distribution in humans, mouse, horse and pig [27][28][29][30]. By comparison, the QTL on chromosome 6 at about 40 Mb (Table 7) appears to have arisen in B. indicus due to a mutation that decreases height and weight and now has an allele frequency of 0.1.
These results have implications for future GWAS and for genomic selection when cattle that have both B. taurus and B. indicus origins are used. The same SNP allele may not be in phase with the same QTL allele in both subspecies and so it would be useful to distinguish the subspecies origin of SNP alleles when they are used for GWAS or genomic selection. Selection in composite breeds can also benefit from selecting the taurine allele at some sites and the indicine allele at others to capture the 'best of the two subspecies'. However, only a few QTL are fixed for alternate alleles in the two subspecies. The most common pattern is that the QTL segregates in one subspecies and is fixed in the other.

Conclusions
Our results suggest that, although some QTL may segregate in both B. indicus and B. taurus, because the QTL existed in their common ancestor, it is more frequent that a mutation created a QTL in only one of the two subspecies since they diverged. Consequently, the LD between a SNP and nearby QTL is not expected to be the same in both subspecies and a SNP by subspecies interaction is associated with phenotype. By classifying SNP alleles according to subspecies origin, we were able, in many cases, to estimate which QTL allele was ancestral and which allele was derived and their effect on traits. The derived or mutant alleles occurred at a wide range of frequencies, with positive and negative effects on the traits studied. However, some traits are exceptions, i.e. QTL that affect the size of the hump were due to mutations having occurred in B. indicus and these mutations predominantly increased hump size.
Abbreviations B taurus or Bt: Bos taurus; B indicus or Bi: Bos indicus; LD: Linkage disequilibrium; GWAS: Genome-wide association studies; SNP: Single nucleotide polymorphism; QTL: Quantitative trait loci; FDR: False discovery rate; w/i: Within; conventional GWAS: GWAS fitting SNPs regardless of the sub-species origin; interaction GWAS: GWAS fitting SNPs Sub-species origin and their interaction interaction GWAS (re-parameterised); GWAS fitting sub-species origin: Allele within B. taurus, and allele within B. indicus; best variable GWAS: GWAS fitting each variable including individual SNPs, sub-species origin, allele A and B within B. taurus, and allele A and B within B. indicus.

Competing interests
The authors declare that they have no competing interests.
Authors' contributions MEG conceived and designed the experiments. SB combined the phenotype sets and performed the experiments. SB, JEP, BJH, KEK, BT, and YZ carried out quality control and imputation on genotype data. WB and AT contributed pedigree and genotype data. SB and MEG wrote the manuscript. All authors have read and approved the final manuscript.