Genome-wide association study of reproductive traits in Nellore heifers using Bayesian inference

Background An important goal of Zebu breeding programs is to improve reproductive performance. A major problem faced with the genetic improvement of reproductive traits is that recording the time for an animal to reach sexual maturity is costly. Another issue is that accurate estimates of breeding values are obtained only a long time after the young bulls have gone through selection. An alternative to overcome these problems is to use traits that are indicators of the reproductive efficiency of the herd and are easier to measure, such as age at first calving. Another problem is that heifers that have conceived once may fail to conceive in the next breeding season, which increases production costs. Thus, increasing heifer’s rebreeding rates should improve the economic efficiency of the herd. Response to selection for these traits tends to be slow, since they have a low heritability and phenotypic information is provided only later in the life of the animal. Genome-wide association studies (GWAS) are useful to investigate the genetic mechanisms that underlie these traits by identifying the genes and metabolic pathways involved. Results Data from 1853 females belonging to the Agricultural Jacarezinho LTDA were used. Genotyping was performed using the BovineHD BeadChip (777 962 single nucleotide polymorphisms (SNPs)) according to the protocol of Illumina - Infinium Assay II ® Multi-Sample HiScan with the unit SQ ™ System. After quality control, 305 348 SNPs were used for GWAS. Forty-two and 19 SNPs had a Bayes factor greater than 150 for heifer rebreeding and age at first calving, respectively. All significant SNPs for age at first calving were significant for heifer rebreeding. These 42 SNPs were next or within 35 genes that were distributed over 18 chromosomes and comprised 27 protein-encoding genes, six pseudogenes and two miscellaneous noncoding RNAs. Conclusions The use of Bayes factor to determine the significance of SNPs allowed us to identify two sets of 42 and 19 significant SNPs for heifer rebreeding and age at first calving, respectively, which explain 11.35 % and 6.42 % of their phenotypic variance, respectively. These SNPs provide relevant information to help elucidate which genes affect these traits.


Background
In tropical regions, Zebu cattle are the main breed used for beef production since they are better adapted to the climate and more resistant to parasites than taurine breeds. One factor that increases production costs is the late onset of reproduction in Zebu females. According to Brummati et al. [1], for beef cattle that are raised in tropical production systems, reproductive traits possess a high economic value and are up to 13 times economically more important than growth and carcass traits. Advancing the start of the reproductive cycle brings economic benefits to the producer and increases profitability.
In Brazil, beef cattle breeding programs have put emphasis mainly on traits such as growth and weight gain and little attention has been paid to reproductive traits, in spite of the considerable influence that these traits have on the productivity and reproductive efficiency of herds [2][3][4]. According to Evans [5], females of breeding age account for about 70 % of the costs of the beef cattle production system.
Heifer rebreeding (HR) refers to the calving success of cows that have already calved once. The success rate of rebreeding after first calving is a major issue in beef cattle farming. Several studies have reported significant reductions in calving rates between the first and second services [6][7][8]. Mercadante et al. [9] observed reductions of up to 20 % in calving rates from the first to the second conception of Nellore females. Therefore, improving the rebreeding rate of heifers should increase the economic efficiency of beef cattle production.
Since age at first calving (AFC) is easily measured and is a good indicator of female sexual precocity, this trait has been included as a selection objective in breeding programs [10]. However, direct selection for lower AFC is not simple since some producers delay the entry of females into breeding programs, which determines the age or weight at which reproductive activity begins and impairs the identification of sexually precocious females. Furthermore, the heritability estimates of this trait are generally low to moderate (0.09 to 0.28) [2,[10][11][12], which suggests that response to selection for AFC is a slow process.
Female reproductive traits are measured only on females and late in life, and generally have low heritability. As a consequence, a large number of individuals need to be analyzed for sufficiently accurate genetic evaluations. The advantage of genomic selection to estimate such reproductive traits is that, unlike traditional selection based on pedigree information and phenotypes only, it allows to select animals accurately without their own phenotypic measurements or those of their relatives and thus, genetic gains are increased compared to that obtained with traditional evaluation methods [13].
Usually, genome-wide association studies (GWAS) of traits of economic interest using high-density single nucleotide polymorphism (SNP) panels aim at identifying regions that can explain the inheritance of the traits [14]. GWAS results allow the identification of candidate regions that can then be used for genomic selection.
GWAS were first performed using the least square method, by applying Bonferroni correction to make inferences about the significance of individual SNPs. According to Peters [15], the main problems that are encountered with this method are high rates of false positives and overestimation of the effects of quantitative trait loci (QTL) and SNPs. One alternative to overcome these problems is to estimate SNP effects by Bayesian inference for which all SNPs are considered simultaneously.
The objective of this study was to perform a GWAS using a Bayesian approach to identify candidate genes that influence reproductive traits. Candidate genes are the target for future fine-mapping studies to search for causal mutations. Causal mutations that explain more than 1 % of the phenotypic variance are useful for inclusion in commercial low-density SNP arrays because since they are causal and are not in linkage disequilibrium (LD) with nearby SNPs, they are more cost-effective and informative, contribute information for the estimation of breeding values and it is not necessary to revalidate their SNP effects at each generation.

Phenotypic data
We used data from Nellore females that belonged to the company Agropecuária Jacarezinho Ltda. Heifers were born either in Valparaiso-SP or in Cotegipe-BA, Brazil. Bos indicus reaches puberty later in life than Bos taurus. In order to detect sexually precocious heifers, Agropecuária Jacarezinho Ltda. performs two breeding seasons. During the early breeding season, which occurs between February and April and lasts approximately 60 days, heifers are exposed to bulls at an early age (14 to 16 months). All heifers are exposed to bulls, irrespective of weight and body condition. Artificial insemination, controlled breeding or multiple-sire breeding, with a bull:cow ratio of 1:30, is used. Pregnancy is confirmed about 60 days after the end of the anticipated breeding season. Heifers that did not conceive during their first breeding season are exposed again to bulls at 2 years of age. The criteria for culling of females are: failure to conceive before 2 years of age, inability to conceive in any subsequent year, and low progeny performance. A small percentage of females are culled due to health problems. Close to the expected calving date, females are moved to calving paddocks. After calving, calves are divided according to sex and age and are transferred to another pasture together with their mothers where they remain until weaning. The calves are weaned at 7 months and grouped together until yearling age (18 months).
Heifer rebreeding (HR) is a binary trait and was defined by attributing a value of 2 (success) or 1 (failure) to females that calved or not, respectively, given that they had calved once. Age at first calving, measured in days, was obtained as the difference between the date of first calving and the date of birth of the heifer.
The contemporary group (CG) for HR was defined by farm, year and season of birth of the cow, and calf sex. Contemporary groups without variability in HR, i.e., groups in which all animals showed the same response category (1 or 2) were eliminated. For AFC, the CG was formed by farm, year and season of birth, and management group at birth, weaning and yearling. Phenotypic data outside the intervals given by the mean of the CG ± 3 standard deviations were also excluded. Fifteen CG were formed for HR with an average of 92 animals in each group, and 17 CG were formed for AFC with an average of 89 animals in each group. Age at first calving varied between 748 and 1253 days, with a mean of 1049 ± 141.3 days. The percentage of success in HR was equal to 72,42 % ± 19,21.

Genotypic data
Data from 2056 females born between 2007 and 2009, which were genotyped with the Illumina Bovine HD assay (Illumina, San Diego, CA, USA), were used. The quality control (QC) of genotypes was performed iteratively according to the following criteria: we excluded 22 851 SNPs in non-autosomal regions, 32 856 SNPs with a gene call score less than 0.70, 18 982 SNPs with a call rate less than 0.98, 362 148 SNPs with a minor allele frequency (MAF) less than 0.02, 9466 SNPs with a p-value for the Hardy-Weinberg equilibrium test less than 10 −5 and 26 341 SNPs that were highly correlated (r 2 > 0.995) with other SNPs from the same window that contained 100 consecutive SNPs. Samples with a call rate less than 0.90 were also excluded from the analysis. The QC process was repeated until no further SNP or sample was excluded which resulted in a final dataset of 1853 heifer and 305 348 SNPs.

Data analysis
SNP effects were estimated using the BAYESCπ method [16] in which they have a common variance, follow a scaled inverted chi-square distribution a priori, with ν g degrees of freedom and a scale parameter S g 2 . Thus, the effect of an SNP with probability (1-π) is a univariate Student's t (0, ν g, S g 2 ) distribution. In this study, ν g was set to 4.2 and S g 2 was calculated based on additive genetic variance according to Habier et al. [17].
The analyses were performed using the GS3 software developed by Legarra et al. (2011; http://snp.tolouse.inra. fr/~alegarra). A total of 300 000 MCMC iterations were used, with a burn-in period of 30 000 cycles and the results were saved every 30 cycles. Convergence was assessed by visual inspection of trace plots of the posterior density of genetic and residual variances.
The dependent variables used in the analysis were the phenotypes observed for HR and AFC. All females with available genotypes and phenotypes were used. The following model was applied: where y is a vector of phenotypes; X is an incidence matrix of systematic effects; b is a vector of systematic effects; Z is an incidence matrix of polygenic effects; u is a random vector of polygenic effects of all individuals in the pedigree; W is a matrix (n x s) consisting of the genotypes of s SNPs for each animal n; g is a random vector of SNP effects; and e is a vector of residual effects. A systematic effect of CG was assumed for each trait. For HR, the linear effect of the rest period (number of postpartum days until the beginning of the second breeding season) was included as a covariate. An inverted chi-square distribution with 4.2°of freedom was assumed for the a priori distribution of residual variance.
It is not possible to obtain p-values for the SNP effects using Bayesian approaches. An alternative to p-values is the Bayes factor [16], which was calculated to evaluate the significance of the SNPs on the traits as follows: where p is the posterior probability of an SNP to be assigned a non-zero effect and π is the a priori probability of an SNP to be included in the analysis. The following scale adapted by Kass and Raftery [18] and applied in QTL detection by Varona et al. [16] and Vidal et al. [19] was used: if BF are between 3 and 20, they provide suggestive evidence; if BF are between 20 and 150, they provide strong evidence; and if BF are greater than 150, they provide very strong evidence.
When BF are used, there is no need for Bonferroni correction because all the SNPs are introduced simultaneously in the analysis, and their estimates are already penalized by their prior information. In this study, SNPs with a BF greater than 150 were considered significant.

Identification of genes
The Map Viewer tool of the bovine genome (http:// www.ncbi.nlm.nih.gov/projects/mapview/map_search.cgi? taxid=9913&build=6) was used to determine the location of the significant SNPs on the genome. Genes that contained significant SNPs were listed. For SNPs that were not located within genes, the closest gene (either on the 5' or 3' end) was recorded with the distance between gene and SNP.

Results and discussion
Forty-two and 19 SNPs with a Bayes factor greater than 150 were detected for HR and AFC, respectively (Figs. 1 and 2). As expected, all SNPs that were significant for AFC were also significant for HR (Table 1), since both traits are indicators of reproductive efficiency in beef cattle and should be, at least in part, under the control of the same groups of genes. Gene symbols and their respective names are in Table 2.
The 42 SNPs that were significant for HR are located within or next to 35 genes that are distributed over 18 chromosomes; of these, 27 are protein-encoding genes, six are pseudogenes, and two are miscellaneous noncoding (nc) RNAs. A pseudogene is a nucleotide sequence that is similar to a normal gene, but is not expressed. Miscellaneous nc RNAs are small noncoding RNAs sequences that do not carry information for producing proteins but can have various important functions in the cell.
Genes that contained significant SNPs were listed (distance = 0); if the SNPs were not located within genes, the closest gene (either on the 5' or 3' end) was identified with the distance between SNP and gene. Espigolan et al. [20] reported that since LD in Nellore cattle was lower than in Bos taurus breeds, a distance of less than 30 kb was required for genomic prediction/association. However, in our study, although some genes were more than 30 kb away from a significant SNP, they were retained because they were the closest annotated genes.
The identified genes can be grouped according to metabolic function, including groups of genes that are involved in the formation and physiology of the central nervous system (LOC781274, NEUROD6, GPR98, GALC, MIR124A-2, MAPK8IP3, RBFOX1, and ODZ3); in the formation and physiology of the female reproductive system    FHIT, and LOC782601). The last group of genes plays a role in different metabolic pathways, including cellcell signaling, protein synthesis and transport, oxygen transport, cell proliferation and survival, transcription and metabolism of nucleotides and histones, membrane transport, and the formation of the cell membrane and its receptors, among others. Genes that are involved in the formation and physiology of the central nervous system play a role in reproduction, since they influence neuronal formation, differentiation and communication and the synthesis of reproductive hormones of the hypothalamus-pituitary axis by acting on the hormone cascade that coordinates the estrous cycle in females. Thus, because the genomic regions that are described here are biologically relevant to animal physiology, they are good candidates for marker-assisted selection. Fortes et al. [21][22][23] and Hawken et al. [24] identified other genes that belong to the same group, i.e., genes acting on the central nervous system, associated with puberty and fertility traits in Brahman cattle and tropical composite breeds.
Similarly, genes that are involved in the formation and physiology of the female reproductive system should affect the onset of the estrous cycle, conception, pregnancy establishment and maintenance, and calving. In general, the genes of this group that were detected here act on the formation of specific tissues and hormone synthesis. One example is the LOC782102 gene which encodes a component of the egg membrane that is responsible for sperm attraction. Polymorphisms in LOC782102 may result in a protein that is more or less functional for spermatozoid recognition and thus pregnancy will be either facilitated or impaired.
Lipid metabolism is intimately related to reproduction and according to [25], in dairy cattle, the success of postpartum rebreeding depends on the accumulation of fat reserves in the animal, which permits the cow to start cycling again. During the postpartum period, dairy cows enter a state of negative energy balance and mobilize body fat for milk production since they are unable to meet their energy requirement solely through feeding. As a consequence, the presence of favorable alleles of genes related to energy metabolism may be associated with better reproductive performance. Many genes related to fat metabolism and, consequently, to reproduction have been described in dairy cattle [26,27]. This mechanism has already been demonstrated in beef cattle and genes that are part of this metabolic pathway have been reported [21].
It is known that growth influences reproduction as demonstrated by the observation that heavier animals reach sexual maturity later in life [28]. Growth-related genes have been associated with fertility and puberty in Zebu and tropical composite breeds [23,24], in agreement with the results obtained here for genes related to bone growth. Our results on the chromosome location of the identified SNPs are similar to those of previous investigations on the association between SNPs and reproductive traits. Sahana et al. [29] detected SNPs for fertility traits on most of the chromosomes in Danish and Swedish Holstein females, which were associated with pregnancy rate, interval from first to last insemination in cows, number of inseminations per conception in cows, and interval from calving to first insemination. The authors reported the presence of significant SNPs in a region between 28.5 and 68.06 Mb on BTA13 (BTA for Bos taurus chromosome), which is larger than the region that we identified here. However, the SNPs that we found significant for AFC and HR are within the region reported in [29].
Schulman et al. [30] identified significant SNPs on BTA27 between 6.08 and 21.65 Mb, which were associated with non-return rate for heifers in Finnish Ayrshire heifers. Here, we detected three significant SNPs on BTA27 in the region between 3.08 and 44.15 Mb, which, although much larger than that reported in [30], contains an SNP at 12.3 Mb very close to the SNP identified by [30], i.e., at 11.2 Mb. This result suggests the presence of a QTL on this chromosome.
Pausch et al. [31] described three significant SNPs for calving ease on BTA21 in Fleckvieh females at 2.15, 2.33, and 2.38 Mb. Although we also detected one SNP on BTA21, it is located at quite a large distance from those reported by Pausch et al. [31].
Hawken et al. [24] reported 66 significant SNPs (P < 0.001) for postpartum anestrous interval in Brahman cattle. Although their results showed that BTA3 and 14 contained the largest numbers of SNPs, the most significant SNP was on BTA6 at 118 Mb. In our case, we found no significant SNP on BTA6. The same authors also described 68 significant SNPs for first postpartum ovulation before weaning mainly on BTA3, 6, 14, 17 and 21 in Brahman cattle. The most significant SNP was at 112.3 Mb on BTA3. In our case, three significant SNPs were detected on BTA3, but in the region between 6.8 and 58.3 Mb.
Although several studies have reported significant SNPs associated with fertility traits in cattle on BTA14 [23,24,[29][30][31], we did not detect any significant SNP associated with HR and AFC on this chromosome. This may be due to genetic differences between Bos taurus and Bos indicus, since Bos indicus females reach puberty later than Bos taurus females. Evidence from Hawken et al. [23], who reported that the number of significant SNPs on BTA14 associated with fertility traits was much smaller in Brahman cattle than that from other studies in Bos taurus [29][30][31], supports this hypothesis. Table 3 Significant SNPs for heifer rebreeding with chromosome number, percentage of phenotypic variance explained by the SNP (%PV) and cumulative percentage (%CPV) Taken together, the 42 SNPs significant for HR and the 19 SNPs significant for AFC explained 11.35 % ( Table 3) and 6.42 % (Table 4) of the phenotypic variance of these traits, respectively. These SNPs will be useful to generate a specific panel for Nellore animals.

Conclusions
The use of Bayes factors to determine the significance of SNPs allowed us to identify two sets of significant SNPs, i.e., 42 for HR and 19 for AFC that explain 11.35 % and 6.42 % of their phenotypic variance, respectively. These SNPs provide relevant information about HR and AFC that will contribute to elucidate which genes affect these traits. Our results led us to suggest a list of candidate genes for reproductive traits in beef cattle.