Multiple-trait QTL mapping and genomic prediction for wool traits in sheep

Background The application of genomic selection to sheep breeding could lead to substantial increases in profitability of wool production due to the availability of accurate breeding values from single nucleotide polymorphism (SNP) data. Several key traits determine the value of wool and influence a sheep’s susceptibility to fleece rot and fly strike. Our aim was to predict genomic estimated breeding values (GEBV) and to compare three methods of combining information across traits to map polymorphisms that affect these traits. Methods GEBV for 5726 Merino and Merino crossbred sheep were calculated using BayesR and genomic best linear unbiased prediction (GBLUP) with real and imputed 510,174 SNPs for 22 traits (at yearling and adult ages) including wool production and quality, and breech conformation traits that are associated with susceptibility to fly strike. Accuracies of these GEBV were assessed using fivefold cross-validation. We also devised and compared three approximate multi-trait analyses to map pleiotropic quantitative trait loci (QTL): a multi-trait genome-wide association study and two multi-trait methods that use the output from BayesR analyses. One BayesR method used local GEBV for each trait, while the other used the posterior probabilities that a SNP had an effect on each trait. Results BayesR and GBLUP resulted in similar average GEBV accuracies across traits (~0.22). BayesR accuracies were highest for wool yield and fibre diameter (>0.40) and lowest for skin quality and dag score (<0.10). Generally, accuracy was higher for traits with larger reference populations and higher heritability. In total, the three multi-trait analyses identified 206 putative QTL, of which 20 were common to the three analyses. The two BayesR multi-trait approaches mapped QTL in a more defined manner than the multi-trait GWAS. We identified genes with known effects on hair growth (i.e. FGF5, STAT3, KRT86, and ALX4) near SNPs with pleiotropic effects on wool traits. Conclusions The mean accuracy of genomic prediction across wool traits was around 0.22. The three multi-trait analyses identified 206 putative QTL across the ovine genome. Detailed phenotypic information helped to identify likely candidate genes. Electronic supplementary material The online version of this article (doi:10.1186/s12711-017-0337-y) contains supplementary material, which is available to authorized users.

fleece weight and staple length are positive and moderate to high [7] but there are significant antagonisms between some traits. For instance, unfavourable correlations exist between fleece weight and fibre diameter, fibre diameter and staple strength, and wrinkle score and fleece weight [8]. Thus, individual causal polymorphisms are likely to have pleiotropic effects on multiple traits, possibly because they operate through physiological mechanisms that affect wool growth generally. Genetic correlations between measurements at yearling and adult ages for the same trait are moderate to high and range from 0.6 to 0.9 for fleece weights and are higher for fibre diameter [9].
Genomic prediction is an attractive approach for sheep breeders because estimated breeding values (EBV) can be calculated from DNA marker data (genomic selection [10]) when animals are too young to be measured for some phenotypes. In several cases, non-linear Bayesian methods, such as BayesR, result in more accurate EBV than genomic best linear unbiased prediction (GBLUP) because they give more weight to markers that are close to the causal polymorphisms [11].
Genome-wide association studies (GWAS), which use similar data to genomic selection, have been widely used to map causal variants in livestock and humans [12,13]. Both genomic selection and GWAS are usually performed one trait at a time, which limits their power to detect single nucleotide polymorphisms (SNPs) that are associated with multiple traits and hence to study patterns of pleoitropy. Bolormaa et al. [14] showed that a novel multi-trait analysis, which combines the results from a single-trait GWAS for 56 individual body composition traits in sheep, increased the power to detect pleiotropic quantitative trait loci (QTL). However, precise mapping of QTL may be difficult in such GWAS studies because SNPs that are at a long distance from the QTL can be identified as associated with the QTL due to longrange linkage disequilibrium (LD) between SNPs in livestock. Genomic selection models, especially non-linear models, which fit all SNPs simultaneously, tend to be less affected by this problem, and map causal variants or QTL more precisely than GWAS [11]. Ideally, we would like to find a common set of SNPs with a maximum power to map QTL, to show their pleiotropic effects and to estimate breeding values.
Although multi-trait BLUP genomic selection methods are available, they become cumbersome when the number of traits is very large. Here, we present three approximate multi-trait analyses that use the results from single-trait BayesR and GWAS analyses as data to identify the SNPs that are closest to pleiotropic QTL. We applied these methods for 22 traits (each measured at two ages) that describe wool production and quality (i.e. measured and visually assessed wool traits), and indicator traits associated with susceptibility to breech flystrike on 5726 sheep with genotypes for 510,174 SNPs.

Phenotype data and traits
The Merino and Merino crossbred animals used in this study were sourced from the Information Nucleus (IN) flock of Cooperative Research Centre for Sheep Industry Innovation (Sheep CRC) [15,16]. In total, 7191 animals were available with phenotype records on 22 traits each measured at two ages ("yearling"; 150 < days < 550 days, and "adult"; ≥550 days), including wool production and quality traits and breech flystrike indicator traits. Trait definitions, numbers of records for each trait, raw means and standard deviations based on the phenotyped animals and number of genotyped animals are in Table 1. A complete description of the design, methods and analyses of wool production and quality assessments is in Hatcher et al. [17].
Prior to shearing at each IN site, the sheep were assessed for a series of visual wool scores, including staple structure (SSTRC), staple weathering (WEATH), wool character (CHAR), fleece rot (FLROT), dust penetration (DUST), and greasy colour (GCOL), and visual breech traits including breech cover (BCOV), crutch cover (CCOV), and dag (DAG) [18]. Each assessment was based on a five-point system in which low scores represent desirable attributes and high scores represent undesirable attributes. A mid-side wool sample (75 to 85 g) was taken from the right side of each animal using an electric handpiece. The samples were measured in a commercial laboratory (AWTA Limited, Melbourne, Vic., Australia) for a range of wool traits. Ten staples from each mid-side sample were randomly sub-sampled to measure staple length (SL) and staple strength (SS). The remainder of each mid-side sample was weighed, washed in hot water with detergent, rinsed in cold water twice, spun and oven-dried at 105 °C. The oven-dried weight was recorded and the 16% regain used to calculate the washing yield (YLD). A Shirley Analyser (AWTA Limited, Melbourne, Vic., Australia) was used to card the dried scoured sample before conditioning at 20 °C and 65% relative humidity for 24 h, after which 2-mm snippets were sampled via mini-coring. The snippets were measured for mean fibre diameter (FD), FD coefficient of variation (FDCV) and mean fibre curvature (CURV) by Sirolan ™ Laserscan (AWTA Limited). The washed carded sample was further subsampled and measured for various tristimulus values (T units) that are routinely used to describe aspects of clean colour (X, Y, Z and Y-Z), where X refers to reflected red light, Y to reflected green light and Z to reflected blue light. With wool, the Y value indicates brightness, with increasing values indicating increasing brightness, and the difference between the Y and Z values (Y-Z) indicating wool yellowness [19]. An additional visual breech score, i.e. breech wrinkle (BRWR), was scored post shearing [18]. Not all sheep were measured for all traits.

Genotype data
This study used the Ovine Infinium ® HD SNP BeadChip that was developed under the auspices of the International Sheep Genomics Consortium (http://www.farmiq. co.nz/) and includes 606,006 high-density (HD) SNPs, and the Illumina 50 k Ovine SNP chip (Illumina Inc., San Diego, CA, USA) that includes 54,241 (50 k) SNPs. All SNPs were mapped to the OAR 3.1 build of the ovine genome sequence using SNPchiMp v.3 [20]. Sporadic missing genotypes for the HD SNP chip were filled using FImpute [21]. Quality control of genotypes, imputation of sporadic missing genotypes within each SNP chip, and imputation of the 50 k SNP genotypes to HD SNPs are described in [14,22]. The details of the quality control are summarised below.
Stringent quality control procedures were applied to the SNP data, i.e. SNPs were excluded if the call rate per SNP (which is the proportion of SNP genotypes that have a GC (Illumina GenCall) score above 0.6) was less than 95%, the minor allele frequency was lower than 0.01 or if departure from Hardy-Weinberg equilibrium (P < 10 −5 ) was extreme. These criteria were applied on each batch of genotypes separately rather than to the whole dataset. Furthermore, if the average call rate per individual was less than 90%, those animals were removed from the SNP data. The final set of genotyped animals used in this study included 5726 animals with phenotypic records for at least one trait: 690 animals were genotyped for 510,174 SNPs, and the remaining 5036 animals were genotyped with the 50 k SNPs, which were imputed to 510,174 SNPs using a multi-breed population of 1735 animals. Crossvalidation within these 1735 HD genotypes revealed an average accuracy of imputation (correlation of imputed Table 1 Number of records, their mean, standard deviation (SD), estimated heritabilities (h 2 ), and variance explained by sire-by-flock interaction for each trait at yearling (Y) and adult (A) ages based on the animals with phenotypic measurements GFW = greasy fleece weight; YLD = wool yield; SL = staple length; SS = staple strength; FD = mean fibre diameter; FDCV = fibre diameter coefficient of variation; mean fibre curvature (CURV); BRWR = breech wrinkle; BCOV = breech cover; CCOV = crutch cover; DAG = dag; SSTRC = staple structure, WEATH = staple weathering; CHAR = wool character; FLROT = fleece rot; DUST = dust penetration; GCOL = greasy colour; COL(Z, YZ,Y, and X) = wool clean colour: Z = reflected blue light; YZ = yellowness; Y = brightness; X = reflected red light; SKINQ = skin quality; s.f. = proportion of phenotypic variance explained by sire-by-flock interaction

Single-trait genome-wide association studies
Mixed models that fit fixed and random effects simultaneously were used to estimate heritabilities and the effects of SNPs associated with each of the traits were studied. Pedigree heritabilities were estimated based on all animals for which genotype and phenotype data were available. The pedigree file included 10,360 animals (including 785 sires and 3891 dams). The analysis was performed using the ASReml software [23]. The same mixed model was used for the GWAS, except that each SNP (SNPi, i = 1, 2, 3, …, 510,174) was added to the model as a fixed effect, one at a time, and tested for association with the trait: where y is the vector of observed phenotypic values of the animals, 1 n is an n × 1 vector of 1s (n = number of animals with phenotypes), μ is the overall mean, X, Z 1 , and Z 2 are design matrices relating observations to the corresponding fixed and random effects, b is a vector of fixed effects (described below), a is a vector of polygenic additive genetic effects sampled from the distribution N (0, Aσ 2 a ), where σ 2 a is additive genetic variance and A is the additive relationship matrix constructed from the pedigree of the animals and their ancestors, q, s.f, and e are the vectors of random effects of breed (including Merino strains), sire-by-flock interaction, and residual error, respectively. Q is a matrix with breed and strain proportions calculated from pedigree (q ~ N (0, Iσ 2 q )) [24]; s i is a vector of genotypes for each animal at the ith SNP and α i is the corresponding SNP fixed effect. The sire-by-flock interaction effect being significant (P < 0.05), it was retained in the model. All models included contemporary group, flock, drop year, sex, birth type, and rearing type as fixed effects and age of measurement, age of dam and its squared value as covariates. Birth type (BT: single = 1, twin = 2, triplet = 3, and quadruplet = 4) and rearing type (RT: single = 1, twin = 2, and triplet = 3) were grouped together (BTRT). The significant interactions between fixed effects (including flock by BTRT, sex by BTRT, and BTRT by age of dam) were fitted in the model. SNPs were tested for significant association with particular traits at several significance thresholds, and the false discovery rate (FDR) [25] was calculated to account for the thousands of significance tests performed. Based on FDR, we chose stringent significance thresholds (P < 10 −5 and P < 5 × 10 −7 ) to minimise false discoveries ( Table 2).

Genomic prediction
Genomic prediction analyses were performed using two methods: GBLUP [26,27], with the genomic relationship matrix (GRM) constructed as described by [28] and BayesR [29]. Genomic EBV (GEBV) were estimated in GBLUP directly and calculated from SNP effects in BayesR.

Validation populations used for BayesR and GBLUP
The same validation populations were used for both methods. All 5726 genotyped and phenotyped animals were assigned to two groups (straightbred MER and crossbred MER) according to their breed using the breed proportions (Q) of animals, which were derived from pedigree [24]. Of the 5726 individuals, 3883 were straightbred MER (Q MER > 0.90) and 1843 were crossbred MER (0.25 < Q MER ≤ 0.90). All crossbred MER animals and the straightbred animals that had sires in common with crossbred MER were in all training sets but not in the validation set. The remaining straightbred MER individuals were split into five sets by allocating all offspring of randomly selected sires to one of the five datasets (fivefold cross-validation approach). In this way, no animal used for validation had paternal half-sibs in the training population. Thus, the analysis was performed five times using each data fold in turn as a validation group and the remaining fourfolds as the training population (i.e. fourfolds plus crossbred MER from above).

GBLUP
GEBV were predicted using Model (1), but no single SNP effect (s i ) was fitted and a was replaced by g, where g is a vector of GEBV ~N (0, Gσ 2 g ), where σ 2 g is the genetic variance and G is the genomic relationship matrix (GRM). For a SNP to be included in the GRM, its minor allele frequency had to be higher than 0.005, once genotypes (real and imputed) were combined in the whole dataset. Validation animals were included in the GRM but had unknown phenotypes in the calculation of GEBV.

BayesR
The BayesR method [29] assumes that SNP effects are from a mixture of four normal distributions with the variance of each distribution equal to 0, 0.01, 0.1 or 1% of the genetic variance. Gibbs sampling was used to sample from the posterior distributions of the parameters, running 40,000 iterations with 20,000 iterations of burn-in, which were averaged across five parallel chains. Since the BayesR software used in this study does not allow fitting a full-model, residuals were calculated by adjusting the phenotypes for fixed effects, breed effects, and the sire-by-flock interaction effect using ASReml [23]. These residuals were then used as phenotypes in the analysis. The BayesR analysis fits the effects of all SNPs and a residual polygenic effect, the latter with a covariance structure that is proportional to the numerator relationship matrix (A). The SNP effects from BayesR were then used to calculate GEBV for animals in the validation sets.

Estimation of the accuracy of GEBV
For each validation population, the accuracy of genomic prediction was calculated as the correlation between GEBV and the adjusted phenotype corrected for fixed effects, which was divided by the square root of the heritability of the trait (h 2 ) that was estimated by using the 8-generation pedigree of all recorded animals. Thus, we report accuracy as the estimated correlation between GEBV and true breeding values.

Multi-trait analyses to identify pleiotropic QTL
To identify the pleiotropic genomic regions that control a wide range of wool traits, we used three approximate multi-trait analyses: (1) multi-trait GWAS (multi-GWAS) following the procedure described by Bolormaa et al. [30]; (2) approximate BayesR posterior probability of SNP effects across traits (multi-PP); and (3) the linear combination of local GEBV that were derived from BayesR estimates of SNP effects within 250-kb windows across all traits for a total of 9813 windows (multi-LGEBV).

Multi-GWAS
Multi-trait analyses were performed following the procedure in Bolormaa et al. [30] based on SNP effects that were estimated from 44 individual single-trait GWAS. The multi-trait χ 2 statistic was calculated as: multi-trait where t i is a vector of the signed t-values of the effects of the ith SNP for the 44 traits and V −1 is the inverse of the 44 × 44 correlation matrix where the correlation is calculated over the 510,174 estimated SNP effects (signed t-values) between each pair of traits. The power of QTL detection was investigated by comparing the FDR [25] from the multi-trait test with the FDR from the single-trait GWAS. To avoid testing a large number of closely-linked SNPs, only the SNPs with the most stringent P values (P < 5 × 10 −7 ) within each 1-Mb window were selected from the multi-trait analysis. These SNPs Table 2 Number of significant SNPs (P < 10 −5 and P < 5 × 10 −7 ) and their false discovery rates (FDR, %) for each trait from the single-trait GWAS a Trait names see Table 1 b For empty cells, FDR are not available or are higher than 100% Trait a P < 1 × 10 −5 P < 5 × 10 −7 Trait a P < 1 × 10 −5 P < 5 × 10 −7 (P < 5 × 10 −7 ) were assumed to be near QTL that affect wool traits and were also examined as likely candidate gene positions.

Multi-PP
The posterior probability that a SNP had no effect on any trait was calculated as a product of posterior probabilities that a SNP had no effect on any individual trait. We use 1 minus this probability to approximate the probability that a SNP has an effect on at least one trait i.e.: We retained SNPs with a pp effect� =0 higher than 0.3.

Multi-LGEBV
GEBV in the 9813 non-overlapping 250-kb windows (local GEBV) for each animal were calculated based on the BayesR effects of all SNPs in the window. A high variance of local GEBV in a window, means that the window includes a QTL for that trait [11]. For each of the 250kb windows (segments), the covariance of local GEBV between each pair of the 44 traits was standardized for the variability of each trait as follows: where cov LGEBV (y,x) is the covariance of local GEBV between trait y and trait x and σ y and σ x are the phenotypic standard deviations of trait y and x. If a window contains a single QTL, we expect this covariance matrix to be dominated by one linear combination of traits representing the QTL effect. Therefore, we carried out a principal component (PC) analysis of the 44 × 44 covariance matrix and examined the highest eigenvalue. This eigenvalue is the variance, in phenotypic standard deviation units, of the linear combination of traits with the largest variance. Across all 9813 windows, the 120 windows with the highest eigenvalues of the first PC (PC1) were arbitrary selected as containing a QTL. After selecting the 120 top 250-kb windows, we further investigated which SNP, in each of the selected windows, was most highly associated with the PC1 of the (co)variance matrix (the 'best' SNP). To identify the best SNP, a pseudo trait (S LC ), which consisted of the linear combination of local standardised GEBV and the PC1 eigenvector across the 44 traits, was calculated for each animal at each of the selected 250-kb windows using the following formula: S LC = y ′ x, where y ′ is the transpose of a vector of the local GEBV that are standardised (divided) by the phenotypic standard deviation of each trait (1 × 44) in the corresponding 250-kb window, and x is the eigenvector of the PC1 (44 × 1), which was calculated based on the covariance matrix of the standardised local GEBV among 44 traits. Since not all animals were measured for all traits, the missing local GEBV were replaced by the mean of local GEBV across all animals, for which measurements for that particular trait were available in order to calculate a linear combination of the 44 traits. This resulted in a S LC for each animal in each of the 120 chosen windows, which were now used as a pseudo-trait in GWAS within each window to identify the "best" SNP that tagged variation within the segment. The model used in this GWAS was as follows: S LC ∼ mean + SNP i + animal + error, where animal and error were fitted as random effects and SNP i was fitted as a fixed effect, one at a time (the phenotypes used to calculate S LC had already been corrected for other fixed effects, breed effect, and sire-by-flock interaction effect prior to BayesR). After performing this new GWAS, the SNP with the highest F value within each of the corresponding 250-kb windows was chosen as the best SNP to tag the QTL.

Validation of SNP effects Predicting missing phenotypes
The multi-trait validation of SNP effects (i.e. using the linear index approach [30]) requires complete data for all traits at the individual level. For animals without records for a particular trait, missing phenotypes were predicted using a multiple regression approach. This multiple regression used phenotypes that were already corrected for fixed effects, breed effect, and sire-by-flock interaction effect. The multiple regression procedure uses the phenotypic (co)variance matrix between the 44 traits based on all animals (training and validation population), which was estimated using the available phenotypic values. Next, the phenotypic (co)variance matrix was inverted. Then, separately for each animal, traits with phenotypes were ordered before traits with missing phenotypes. Again, for each animal, the missing phenotypes (y n ) were then predicted using the following formula: where y m is a vector of the traits measured on a particular animal, U nn is the inverse of phenotypic covariance matrix between 44 traits with a missing record, and U nn is the inverse of phenotypic covariance matrix of the traits with and without a missing record.

Validation populations for single-trait and multi-trait analyses
To enable validation of SNP effects in independent animals, the 5726 animals with full phenotypic data (including the predicted phenotypic values) were divided into training and validation populations. The same cross-validation approach was used as described in the genomic prediction section above, except that crossbred animals were not excluded from validation sets. Then, one of the five divisions was randomly used as a validation population and the other four divisions as the training population. Only one 4:1 division (i.e. 4649 training animals: 1077 validation animals, out of 5726 genotyped animals) was tested for each of the traits studied.

Validation of SNP effects from the single-trait analysis
The GWAS for all traits were performed separately in the training and validation populations. SNPs with a significant effect in the training population were validated in the validation population for the five traits that displayed the largest number of significant associations. We counted the number of SNPs for which the effect was in the same direction in both the validation and training populations.

Selection of top SNPs from each of the three multi-trait methods in the training population
The single-trait BayesR analyses for all traits were also performed using only the training population (4649 animals). Then, the three multi-trait analyses described in the previous section (multi-GWAS, multi-PP, and multi-LGEBV) were repeated in the validation set to validate the top SNPs from each multi-trait analysis in the training population.

Use of a linear index in multi-trait validation
The top SNPs in each of the three multi-trait analyses (i.e., three different lists of SNPs) in the training population were validated separately in an independent set of validation animals. A linear index (y I ) was calculated for each putative QTL (from each of the three methods) and for each animal. It summarised the information across the 44 traits (22 traits at two ages) and was calculated using the following formula [30]: where animal and error were fitted as random effects and SNP i was fitted as a fixed effect, one at a time, for significant SNPs (P < 10 −5 for each 1 Mbp) discovered in the training population. After performing the GWAS, the significance (P < 0.05) and the consistency of the direction of effects (positive or negative) for the selected significant SNPs were compared between the training and validation populations.

Identification of the most likely candidate genes
The genes that were located 50 kb upstream and downstream of the best SNP were identified using UCSC Genome Bioinformatics (http://genome.ucsc.edu/) and Ensembl (http://www.ensembl.org/biomart/). If there was more than one gene, we retained only the gene that was located nearest to the SNP or the particular gene with a known effect on wool or hair.

Single-trait GWAS
The number of significant SNPs for each trait is in Table 2.

Genomic prediction
Using BayesR, mean accuracies of genomic prediction of 0.21 and 0.23 were obtained across wool traits at yearling and adult ages, respectively (Table 3). Accuracy tended to increase with Th 2 , with T being the number of phenotypes in the training set ( Fig. 1, R 2 = 0.34). GBLUP provided very similar mean accuracies (0.21 and 0.22 at yearling and adult ages, respectively). However, BayesR tended to result in higher accuracies than GBLUP for traits that had a large number of significant SNPs in the single-trait GWAS (Tables 2, 3; Fig. 2).

Multi-GWAS
The multi-trait analyses using GWAS identified many narrow regions that contained more than one significant SNP (e.g. on chromosomes OAR3, 6, 7, 13, 19, and 25, OAR for Ovis aries chromosome; Fig. 3a). Combining the single-trait GWAS in a multi-trait analysis resulted in 563 and 263 significant SNPs at significance thresholds of P < 10 −5 and P < 5 × 10 −7 , respectively. This corresponded to a FDR of 0.9 and 0.1% (respectively), which was lower than for any individual trait tested in the single-trait GWAS ( Table 2). In order to avoid testing a large number of closely-linked SNPs, we identified 64 SNPs that were significant at P < 5 × 10 −7 and which were separated from each other by at least 1 Mb. Figure 4a compares the multi-GWAS with five single-trait GWAS for a region around 59.0 Mb on OAR3. These five traits (AFDCV, YBWR, YFD, YFDCV, and YSL) were those for which the number of significant SNPs (P < 5 × 10 −7 ) was largest in the single-trait GWAS ( Table 2).

Multi-PP
One hundred and two SNPs had an approximate multitrait posterior probability (pp) higher than 0.3 (Fig. 3b), among which two SNPs had a pp effect� =0 higher than 0.05 for four traits, 11 for three traits, 34 for two traits and the remaining 55 SNPs for one trait. Thus, although the multi-trait pp was calculated using all traits, the pp for the 102 identified SNPs was mainly influenced by between one and four traits. For example, Fig. 4b compares the multi-pp of SNPs on OAR3 around 59.0 Mb with the pp for the five traits YSL, YFD, YFDCV, AFDCV, and YBRWR. Two SNPs that were 39 kb apart had a high pp i.e. higher than 0.82 (Fig. 4b) and were in high LD (r 2 = 0.44), which indicates that they may tag the same QTL. There is another SNP with a multi-trait pp ≈ 0.5 that is located 0.5 Mb upstream of these two SNPs.

Multi-LGEBV
Local GEBV, using only the SNPs within a 250-kb window, were calculated for each animal using the BayesR estimates of SNP effects. The variance of local GEBV for each window and trait was calculated. A high variance indicates that within the 250-kb window there is a QTL for that trait. The highest percentage of variance of local GEBV was equal to 1.9% of the phenotypic variance for yearling staple length (YSL), which indicates that we did not detect QTL with very large effects for any of the traits. Figure 4c shows the variance for five traits for four windows around OAR3:59 Mb.
For each of the 9813 windows, a PC analysis yielded 44 eigenvalues and eigenvectors. A high first eigenvalue indicates a window that contains a QTL. We selected the 120 windows with the highest eigenvalues as windows that potentially contain a QTL (Fig. 3c). The distribution of the log 10 of the first eigenvalues for 120 segments with the highest first eigenvalue is in Fig. 5a. If within the window, there is a single QTL that has an effect on multiple traits, we expect that the first eigenvector will explain most of the variance and that the other eigenvalues will be low. We selected 120 segments with the highest first eigenvalues across all 9813 segments and in 112 of these 120 selected segments, the PC1 explained more than 90% of the total variance. The distribution of the proportion of variance explained by PC1 eigenvalues across the 9813 windows is in Fig. 5b. Figure 4c shows the first eigenvalues for four segments on OAR3. The second segment (58.75-59.25 Mb) had the highest eigenvalue, but the segment to its left had a high variance of local GEBV for YFD in spite of a lower first eigenvalue. This suggests that the first segment contains a more pleiotropic QTL while the second segment contains a QTL that affects YFD mainly.
In order to identify the SNP that is closest to the QTL (best SNP), for each of the 120 segments, we performed a new GWAS by using the eigenvector of PC1 as a new trait and fitting each SNP, one at a time, within each segment. Thus, the dependent variable was the linear combination of traits as defined by the first eigenvector. By this process, we selected the top SNPs (with the highest F value) in each of the top 120 segments. For example, Fig. 4d shows the eigenvalues (×10 3 ) of the four neighboring windows at around 59.0 Mb on OAR3 and the F values of the SNP effects (×10 −3 ) for PC1 in the corresponding windows. The second segment had a high eigenvalue that explains 98% of the total variance and three SNPs (circled in blue, orange and green) were most highly associated with the first eigenvector for this segment. The first segment had a comparably lower

Table 3 Average accuracies of GEBV of the fivefold crossvalidation populations using BayesR and GBLUP methods for each trait at yearling (Y) and adult (A) ages
SE standard error of average accuracy of GEBV a Trait names see Table 1 b Average accuracy across traits first eigenvalue but this eigenvalue explained 96% of the total variance and there is only one SNP (circled in yellow) that was associated with the first eigenvector for that segment. The third segment had an even lower first eigenvalue (explaining 68% of the total segment variance), which probably indicates that there is no QTL in that segment. However, we did observe one SNP that was highly associated with the first eigenvector (Fig. 4d), which illustrates one of the drawbacks of this method. Since local GEBV are calculated from SNP genotypes, there will always be one or more SNPs associated with the first eigenvector but it should not be interpreted as evidence for a QTL unless the first eigenvalue is high.

Comparison and combination of results from the three multi-trait analyses
As described above, we selected 64 SNPs from the multi-GWAS (most significant SNP (P < 5 × 10 −7 ) in each 1-Mb window), 102 SNPs with the highest multi-PP and 120 from the analysis of local GEBV. Among these, 75 SNPs overlapped across two or three analyses. Therefore, the total number of SNPs identified was 206. Of these 206 SNPs, seven were identified by all three methods. In addition to these seven SNPs, 64 were identified by both local GEBV and multi-PP, and two by multi-PP and multi-GWAS ( Fig. 6 and see Additional file 1: Table S1). Of the 64 top multi-GWAS SNPs, 55 were not among the top SNPs selected by the other two multi-trait methods. In fact, 50 of these 55 SNPs were in segments that did not have a high eigenvalue based on local GEBV, and the first eigenvalue explained less than 90% of the total variance. Thus, it is possible that these SNPs are at some distance from the causal variant, which is located in another segment. Conversely, among the SNPs that were identified by the multi-LGEBV and multi-PP analyses, 38 (=9 + 13 + 16) were significant in the multi-GWAS (P < 10 −5 ) although they were not the most significant SNPs in their 1-Mb window, or failed to reach a P < 5 × 10 −7 . By adding these 38 SNPs to the 64 SNPs from the multi-GWAS, 20 (=7 + 13) are common to all three multi-trait approaches, nine in the multi-LGEBV and multi-GWAS and 18 (=2 + 16) in the multi-PP and multi-GWAS (Fig. 6). These results are illustrated in Fig. 4. The points that are circled in the same colour in Fig. 4a, b, d represent the same SNP. The second segment had the highest eigenvalue and explained 98% of the total variance of local GEBV in that segment. The SNPs circled in blue, orange, and green are those that were most highly associated with the first eigenvector (Fig. 4d) and had the highest multi-PP (Fig. 4b). These were also highly significant in the multi-GWAS (Fig. 4a). The SNP circled in blue (at 59,019,274 bp on OAR3) was identified by all three multi-trait methods (see Additional file 1: Table S1). In all three methods, the same traits (YSL, YBRWR, YFDCV, AFDCV) contributed to the multi-trait statistics, which supports the conclusion that this is a pleiotropic QTL with effects on at least these four traits. The first segment in Fig. 4 had a high variance of local GEBV for YFD, but when this trait was combined with the other 43 traits, the first eigenvalue was lower than that of the second segment. However, the SNP that is circled in yellow was associated with the first eigenvector, had a high multi-PP and was significant (P < 10 −5 ) in the multi-GWAS. In all three analyses, this SNP was associated with YFD.
In Bayes R, which generates the statistics for multi-PP and local GEBV, all SNPs are fitted simultaneously in the model. Thus, it is possible that the QTL tracked by the yellow circled SNP differed from the QTL tracked by the blue, orange and green circled SNPs. This hypothesis is supported by the traits that are affected by the SNP. Using the effect of each SNP on the 44 traits as estimated by the GWAS, it is possible to calculate the correlation between any pair of SNPs. Figure 7 displays the correlation among SNP effects in this region of OAR3 as a heat map. It shows that all the other SNPs except that circled in yellow have highly correlated effects and presumably track the same QTL. In the multi-GWAS (Fig. 4a), another significant SNP at 58.72 Mb was observed, which was not detected in either of the other analyses presumably because it tracked the same QTL as the SNP at 59.0 Mb circled in blue, orange and green colours. Figure 8 illustrates two more straightforward examples where the same SNP was identified by all three methods.

Validation of SNP effects Validation of SNPs from single trait GWAS
The most significant SNPs (P < 10 −5 ) in each 1-Mb window were tested in the validation population. Table 4 shows the results for five traits, which were among those that had the largest number of significant associations ( Table 2). The number of SNPs tested ranged from 14 (YYLD) to 101 (YFD). The proportion of these SNPs that were significant at P < 0.05 in the validation population varied from 0.07 (=1/14 for YYLD) to 0.30 (=7/23 for AYLD). The percentage of these SNPs that had an effect in the same direction in both the validation and training populations varied from 57 to 86% (Table 4). In addition, when a significance level was also imposed on SNPs discovered in the validation population, the percentage of SNP effects in the same direction across training and validation populations increased from 75 to 100% (small numbers of SNPs were discovered in the relatively small validation population partly due to lack of power).

Multi-trait validation using the linear index approach
Association between a SNP and its corresponding linear index was tested in the validation sample. The 105, 77, and 120 top SNPs were selected from the multi-GWAS (P < 5 × 10 −7 in each 1-Mb window), multi-PP, and multi-LGEBV analyses, respectively, in the training population. These SNPs were tested in a GWAS in the validation population (see Table 5). Of the 105 multi-GWAS SNPs that were significant in the training population, 70% had an effect in the same direction in both the training and validation populations and 16 were also significant (P < 0.05) in the validation population of which 88% had an effect in the same direction in both the validation and training populations ( Table 5). The results were slightly better for the 77 SNPs selected by the multi-PP analysis: 19 out of these 77 SNPs were significant and all had an effect in the same direction in both the validation and training populations. Thus, the number of validated SNPs (with a significant effect in the same direction) was largest with the multi-PP approach (19) followed by multi-LGEBV (15) and multi-GWAS (14 = 16 × 0.88).

Examples of QTL with a similar pattern of effects across traits
If two QTL affect the same physiological pathway, one might expect that they have the same pattern of effects. We assessed the similarity of SNP effects by the correlation between pairs of SNPs across the 44 traits. Two SNPs might have a similar pattern because they affect the same pathway or because they tag the same QTL. Among the 206 SNPs, strong correlations (>0.8) were mainly found among the SNPs located on the same chromosome. For instance, six SNPs were on OAR3: 59  allele that increases SL, SS and FD, but decreases FDCV. However, the SNPs differ in their effects on other traits, such as GFW, thus it is less likely that they act through the same pathway. Similarly, the SNP at OAR13:62.9 (near RALY) and OAR15:72.6 (near ALX4) both have an allele that increases FD and FDCV.

Identification of candidate genes
We searched for genes within 50-kb genomic regions up and downstream from each of the 206 SNPs selected from the three multi-trait analyses (see Additional file 1: Table S1) and identified many genes with known effects on wool and hair growth (Table 6). We also found strong effects for SNPs near the genes ALOX15, ANKS1B, ELOVL6, FASN, NCAPG, LCORL, FTL, and GK5, which are associated with growth, fatness and body composition in sheep and humans [14,[31][32][33].

Genomic prediction
Our results show that the estimated GEBV accuracies are affected by trait, size of the training population, and statistical method used (GBLUP vs. BayesR). As expected from theory [34] in which Th 2 is a critical parameter, traits with a high heritability and a large training population tended to result in higher accuracies than those with an average heritability across populations (Fig. 1). On average, BayesR and GBLUP resulted in similar GEBV accuracies but BayesR resulted in higher accuracy for traits (GFW, YLD, SL, SS, FD, FDCV, and CURV) for which there was a large number of significant SNPs in the GWAS. In an earlier study on the genomic prediction of wool and carcass traits using the 50 k SNP chip based on the same population, Daetwyler et al. [22] found no clear differences in accuracy between GBLUP and BayesR. Kemper et al. [11] found that accuracy of across-breed genomic predictions for selection candidates that are less related to the training animals was higher with BayesR than with GBLUP and the use of BayesR mapped QTL more accurately than GBLUP in dairy cattle. Many other studies in cattle and humans have reported that BayesR results in more accurate GEBV than GBLUP, in particular for traits for which mutations of moderate effect are segregating [29,[35][36][37].
There are few reports on the accuracy of genomic predictions in sheep [38][39][40][41][42][43]. Pickering et al. [43] reported the accuracy of genomic predictions for health traits including dagginess for several New Zealand breeds (Romney, Coopworth, Perendale, Texel and three breed crosses). For the dagginess traits, they found that accuracies of genomic predictions ranged from 0.11 to 0.56 for those breeds, while in our study an accuracy of 0.19 was estimated for YDAG using GBLUP. Daetwyler et al. [38] (on the same population as used here but with fewer records) reported that accuracies of GEBV with 50 k SNPs ranged from 0.15 to 0.79 for wool traits in Merino sheep and from −0.07 to 0.57 for meat traits in all breeds studied. These accuracies were higher than those that we obtained for wool traits. Several factors may explain the difference between our results and those in these previous studies: (1) only 107 animals were included in their validation set resulting in accuracies with larger standard errors; (2) some animals in their validation set were closely related to animals in the training dataset, whereas we deliberately limited relationships between training Table 6 List of candidate genes with known effects on wool or hair growth a Not the nearest gene to the particular SNP with a significant effect, but is a gene with a known effect on wool or hair  [71] and validation animals; and (3) we used within-strain GEBV (i.e., adjusted for genetic differences between Merino strains) whereas they computed GEBV accuracy in the overall population. Our validation approach, in which the relationship between training and validation populations is minimized, is relevant to the commercial use of genomic selection in which a sheep breeder relies on a training population that is not closely related to his own sheep. In our case, the average of the top 10% genomic relationships that a validation individual has with animals in the training population was equal to 0.02. When validation animals are more related to the training animals, it is likely that the estimated genomic prediction accuracies will be higher.

Gene function References
The level of genomic prediction accuracy may also depend on the accuracy of imputation [44]. However, based on the imputation tests from 50 k to HD SNPs (not reported here), the mean imputation accuracy between imputed and real non-50 k genotypes on our HD data (as the proportion of correctly imputed genotypes for non-HD SNPs) was higher than 0.98, which means that this is not a likely reason for the low prediction accuracies obtained with our data.
Overall, the accuracy of GEBV was low. The similar accuracies obtained with GBLUP and BayesR suggest that there are few QTL of moderate or large effect, which is supported by the single-trait GWAS results for many traits. Therefore, we investigated whether combining information from all traits could help to identify QTL for multiple traits.

Multi-trait analyses
It is known that genetic correlations exist between many of the traits studied here and thus there must be QTL with effects on multiple traits. We used three multi-trait analyses to map these QTL. All three methods involve a degree of approximation, thus it is difficult to apply precise significance tests. However, the fact that there is agreement among the results of the three multi-trait and with the single-trait results supports the conclusion that they detect pleiotropic QTL. This is also supported by the rates of validated individual SNPs.
The multi-trait GWAS method used here was previously shown to increase the power of QTL detection [14,30]. In most 1-Mb intervals that we selected, the pattern of the SNP effects is similar across traits, thus the high correlation among traits, which implies that there is probably only one major QTL within a given interval. The most significant SNP in a region varies from one trait to another due to sampling error even if there is only one QTL. Unless the errors for different traits are highly correlated, the multi-trait analysis reduces the sampling error, which results in a more precise localization of the QTL. However, because long range LD occurs in the ovine genome [45] a SNP that is located at a long distance from any QTL can still have a significant association with the trait. Combining this with the large number of QTL that affect most complex traits, a SNP associated with one QTL may merge with those associated with another nearby QTL, which decreases our ability to map the QTL. To overcome this difficulty, methods that fit all SNPs simultaneously, such as the BayesR method [29] used here, have been advocated for QTL mapping.
In dairy cattle, Kemper et al. [11] showed that BayesR maps QTL more precisely than GWAS. A multi-trait BayesR analysis with 44 traits would impose a very large computational burden, thus we used two approximate methods (multi-PP and multi-LGEBV) to combine the results from 44 single-trait BayesR analyses. In the single-trait BayesR analysis, high variance of local GEBV indicates the presence of a QTL in that 250-kb window. The equivalent multi-trait test is based on the first eigenvalue that is caused by one or more traits having a high variance of local GEBV. If there is more than one trait with a high variance and if the local GEBV are correlated between different traits (as expected if there is only one QTL), the first eigenvalue increases. Windows with more than one QTL can occur but for 112 of the 120 windows with the highest eigenvalues, the first eigenvalue explained more than 90% of the total variance, which indicates that windows with only one QTL predominated in our study. To identify the SNP that was located nearest to this QTL, we carried out a local GWAS using the local GEBV as a new trait.
Since multi-LGEBV and multi-PP both used the output from single-trait BayesR analyses, it is reassuring that they detected many common SNPs. In fact, of the 102 SNPs found by multi-PP, 64 were also detected among the 120 best SNPs from the multi-LGEBV (Fig. 6). However, there was less agreement between multi-GWAS and the BayesR based methods. In particular, when we considered only the top multi-GWAS SNPs (P < 5 × 10 −7 ) in each 1-Mb window, the multi-GWAS detected 64 SNPs, but only seven were found by the other two methods. Fifty-five of the SNPs detected by the multi-GWAS were not identified by the other two methods but 50 of them are located in the windows, which had a low eigenvalue of PC1 with PC1 explaining less than 90% of the total variance. Decreasing the threshold for the multi-GWAS to P < 10 −5 (Fig. 6) improves the agreement between the multi-GWAS and the other two methods.

Validation of SNP effects
The proportion of SNPs that were confirmed in the validation set by the multi-trait methods was equal to that obtained for the best single trait (fibre diameter) and higher than that for most single traits. Among the three multi-trait methods, multi-PP had the highest validation rate. This was unexpected since the two other methods used GWAS in both training and validation, whereas multi-PP did not use GWAS methods in the discovery process. However, a multi-PP can miss some QTL since it relies on finding individual SNPs with a high posterior probability. In some cases, the evidence for a QTL is spread across many SNPs each with a low posterior probability, and thus the local GEBV variance is more likely to detect the QTL than the posterior probability.

Identification of QTL with similar patterns of pleiotropic effects
Stearns et al. [46] pointed out that the relative advantage of multivariate over univariate approaches varied with the level of genetic covariance between traits. In this study, some of the wool traits are genetically highly correlated. Previously, Bolormaa et al. [14] used a correlation matrix of pairs of SNP effects across 56 meat and body composition traits to perform a hierarchical clustering analysis. Using this approach, they identified at least four groups of QTL with similar patterns of pleiotropic effects on body composition (the population of sheep was similar to that used here). In our study on wool traits, the clustering analysis based on 206 SNPs was also done using GWAS SNP effects, but there were no clear-cut clusters of SNPs (results not shown).

Candidate genes
By exploiting pleiotropic effects for mapping QTL, we identified 206 putative QTL, which were close to 130 genes (within a distance of 50 kb on either side of each SNP). In some cases, the known function of the candidate gene fits the observed phenotype well. Table 6 provides a list of candidate genes with known effects on wool or hair growth. For instance, SNPs that are located around 59.0 Mb on OAR3 and near the FOXI3 gene, are associated with multiple traits including wool quality and breech conformation traits (FDCV, SL, SS, FD, BRWR, BCOV, CCOV, and FLROT) at both ages. FOXI3 regulates multiple aspects of hair follicle development and homeostasis and loss of FOXI3 impedes hair follicle down-growth and progression of the hair cycle [47]. Shirokova et al. [47] showed that FOXI3 displays a highly dynamic expression pattern during hair morphogenesis and cycling. In mice, absence of FOXI3 results in a sparse fur phenotype and poor hair regeneration after hair plucking, and these effects are exacerbated with age due to impaired secondary hair germ activation leading to progressive depletion of stem cells. A SNP at 62.6 Mb on OAR4, which is located near (9.8 kb) the BMPER gene, was detected through multi-LGEBV and multi-GWAS (P = 2 × 10 −4 ). Bone morphogenetic protein (BMP) signalling regulates hair follicle cycle and development [48][49][50][51]. BMP signalling is a critical feature of the complex epithelial-mesenchymal cross-talk necessary to produce hair [50].
We also found significant SNPs (Table 7) close to genes (ALOX15, ELOVL6, FASN, NCAPG, and GK5) that are associated with variation in size, fatness and body composition in sheep, cattle and humans [7,14,32,33]. This is not surprising since greasy fleece weight and fibre diameter have positive genetic correlations with yearling weight (0.23 and 0.17, respectively; [7]). The SNPs that are located within or near (<5 kb) ALOX15, FASN, FTL, and NCAPG were associated with GFW (up to |t| = 8.9), while the SNPs in ELOVL6 and GK5 had significant associations with FD (up to |t| = 4.7). Another SNP with an effect of |t| = 4.3 for GFW is located at 37,559,817 bp on OAR6 with the nearest gene being LCORL (at 107 kb). Not surprisingly, the effects of SNPs in NCAPG and near LCORL were highly correlated (r > 0.8) in our data, which indicates that there may be only one QTL in this region [52][53][54]. Furthermore, we found that the LCORL SNP identified in our study (at OAR6:37,559,817 bp) is located 29 kb from the LCORL SNP that was detected in a multi-trait GWAS across carcass and growth traits (at OAR6:37,530,647 bp) [14]. The effects of these two SNPs are strongly correlated (r > 0.8), which indicates that they may be in strong LD with the same underlying causal mutation with pleiotropic effects: i.e. simultaneously increasing carcass and skeletal weights and lean meat yield and decreasing dressing percentage, fatness, and muscling (i.e. CEMD), while increasing wool growth.
In a single-trait GWAS with the 50 K SNP chip in Chinese Merino sheep, Wang et al. [55] identified 28 SNPs that affect fibre diameter, fibre diameter coefficient of variance, and crimp and are located within 12 genes. However, we did not find any significant SNPs (P < 10 −5 ) within these genes.

Application to wool quality improvement
The patterns of the effects of the QTL that we studied here indicate that they have various degrees of usefulness for selection. Some QTL have an allele with desirable effects on more than one trait and appear to be good targets for selection. For instance, the QTL on OAR11 (located at the edge of the FASN gene) has an allele that increases greasy fleece weight, wool yield, and staple length and decreases fibre diameter. This pattern of quality (sheep with higher fleece, higher yield, longer staples, and finer wool) is desirable for sheep breeders and the wool industry because these traits affect the price paid for wool to the producer and the processing efficiency and use of the wool in manufacturing. A SNP on OAR23 at about 44.2 Mb had an allele that increases staple strength and decreases fibre diameter coefficient of variance, breech wrinkle, breech cover, crutch cover and dag, which is a valuable pattern for resistance to breech strike and would reduce the need for flystrike prevention strategies such as mulesing and chemical treatment. SNPs on OAR3:137.1 Mb (KRT86), OAR3:133.9 Mb (WNT1), OAR8:89.0 Mb (MPC), and OAR12:49.9 Mb (RSC1A1) had an allele associated with better staple structure (staple with very fine bundles) and wool character (welldefined crimp with low variation along the staple), which is a desirable pattern for wool manufacturing. Generally, for a given trait, SNPs showed a similar association at both of the ages at which the trait was measured. However, none of the associations explained a large fraction of the variance for any trait. Therefore, although incorporation of the identified QTL may not increase the accuracy of single-trait EBV, they may be useful to manage unfavourable genetic correlations between traits.

Conclusions
For many wool traits, accuracy of genomic prediction was low (average over all traits = 0.22), especially for traits with a low heritability, few records and for which few QTL were identified. In an attempt to identify more QTL for these traits, we examined three approximate multi-trait methods. As well as a multi-trait GWAS, we describe two new multi-trait methods based on singletrait BayesR results. Collectively, these three methods mapped 206 putative QTL of which 20 were common to all methods. Sixteen genes that are located near a significant SNP have known effects on hair growth and a further five significant SNPs are near genes that were previously reported for QTL for growth and body composition. Future research should examine whether genomic prediction accuracy can be increased by using the QTL identified in this paper.