Genomic relationships based on X chromosome markers and accuracy of genomic predictions with and without X chromosome markers

Background Although the X chromosome is the second largest bovine chromosome, markers on the X chromosome are not used for genomic prediction in some countries and populations. In this study, we presented a method for computing genomic relationships using X chromosome markers, investigated the accuracy of imputation from a low density (7K) to the 54K SNP (single nucleotide polymorphism) panel, and compared the accuracy of genomic prediction with and without using X chromosome markers. Methods The impact of considering X chromosome markers on prediction accuracy was assessed using data from Nordic Holstein bulls and different sets of SNPs: (a) the 54K SNPs for reference and test animals, (b) SNPs imputed from the 7K to the 54K SNP panel for test animals, (c) SNPs imputed from the 7K to the 54K panel for half of the reference animals, and (d) the 7K SNP panel for all animals. Beagle and Findhap were used for imputation. GBLUP (genomic best linear unbiased prediction) models with or without X chromosome markers and with or without a residual polygenic effect were used to predict genomic breeding values for 15 traits. Results Averaged over the two imputation datasets, correlation coefficients between imputed and true genotypes for autosomal markers, pseudo-autosomal markers, and X-specific markers were 0.971, 0.831 and 0.935 when using Findhap, and 0.983, 0.856 and 0.937 when using Beagle. Estimated reliabilities of genomic predictions based on the imputed datasets using Findhap or Beagle were very close to those using the real 54K data. Genomic prediction using all markers gave slightly higher reliabilities than predictions without X chromosome markers. Based on our data which included only bulls, using a G matrix that accounted for sex-linked relationships did not improve prediction, compared with a G matrix that did not account for sex-linked relationships. 
A model that included a polygenic effect did not recover the loss of prediction accuracy from exclusion of X chromosome markers. Conclusions The results from this study suggest that markers on the X chromosome contribute to accuracy of genomic predictions and should be used for routine genomic evaluation.

Y chromosome and is inherited in an autosome-like fashion. This increases the complexity of the genetic relationships between individuals based on the X chromosome. Moreover, in genomic prediction of dairy cattle, deregressed proofs (DRP), daughter yield deviations (DYD) and estimated breeding values (EBV) are usually used as response variables. These variables are predicted using a model in which a pedigree-based relationship matrix is constructed based on inheritance of autosomes. In addition, the density of markers on the X chromosome is markedly lower than that on the autosomes in the current SNP (single nucleotide polymorphism) chips [4,5]. These characteristics may reduce the impact of X chromosome markers on accuracy of genomic prediction, and could be the reason why they are not used for genomic prediction in some countries and populations.
Based on the characteristics of the X chromosome, it can be hypothesized that X chromosome markers can contribute to the accuracy of genomic predictions, but will generally have a smaller impact than autosomal markers. Moreover, genomic prediction using a genomic relationship matrix that takes sex-linked inheritance for X-specific markers into account will probably perform better than using a genomic relationship matrix that does not distinguish between autosomal and X-specific markers. In addition, because marker density is lower on the X chromosome, imputation of X chromosome markers may be less accurate than that of autosomal markers. When genomic predictions are performed using data from SNP chips with different densities, genotypes of SNPs absent from low-density chips are usually inferred (imputed) from the higher density chips. Therefore, it is necessary to investigate the accuracy of imputation of markers on the X chromosome in order to perform genomic prediction using these markers. However, so far there are very few reports on the imputation accuracy of X chromosome markers [6] and on their contribution to accuracy of genomic predictions [7].
The objectives of this study were (i) to investigate the accuracy of imputing missing genotypes on the X chromosome, (ii) to demonstrate a method to calculate a genomic relationship matrix which correctly accounts for genetic relationships with regard to markers on the X chromosome, and (iii) to compare the accuracy of genomic predictions with and without X chromosome information using different models and different scenarios. Data from Nordic Holstein cattle were used to address these objectives.

Data
The data used in this analysis consisted of 5643 progeny-tested Nordic Holstein bulls born from 1974 to 2010. The data did not include cows since the number of Nordic Holstein cows available as reference animals was insufficient for the present analysis. Animals were genotyped with the Illumina Bovine SNP50 BeadChip [4]. In order to investigate the accuracy of imputation for markers on the X chromosome, low-density (LD) marker data were created from the SNP50 BeadChip marker data by masking markers that are absent from the Illumina BovineLD BeadChip [5]. The Bovine SNP50 BeadChip (about 54K) and the BovineLD BeadChip (about 7K) marker data were edited by removing markers with a minor allele frequency (MAF) lower than 0.01, an average GenCall score lower than 0.60, or an unknown location in UMD 3.1 [1]. After editing, 44 141 markers remained in the 54K data, and 6699 markers in the LD data. The numbers of markers available on the autosomes and on the X chromosome are given in Table 1.
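The three editing rules can be written as a per-marker filter. The following is a minimal sketch rather than the pipeline used in the study: the function names and input layout are hypothetical, genotypes are assumed to be coded as the number of copies of one allele (0, 1 or 2), and the thresholds are those quoted above.

```python
def marker_maf(genotypes):
    """Minor allele frequency from genotype codes 0/1/2 (assumed coding)."""
    p = sum(genotypes) / (2 * len(genotypes))  # frequency of the counted allele
    return min(p, 1 - p)


def keep_marker(genotypes, gencall_score, has_position,
                min_maf=0.01, min_gencall=0.60):
    """Return True if a marker passes all three editing rules.

    A marker is removed when its MAF is below min_maf, its average GenCall
    score is below min_gencall, or its map position is unknown.
    """
    return (has_position
            and gencall_score >= min_gencall
            and marker_maf(genotypes) >= min_maf)


# Hypothetical markers: one common and well-called, one nearly monomorphic.
common = keep_marker([0, 1, 2, 1], gencall_score=0.90, has_position=True)
rare = keep_marker([0] * 200 + [1], gencall_score=0.90, has_position=True)
```

In this toy case `common` passes and `rare` is removed, because its MAF (1 copy among 402 alleles) is below 0.01.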
The bulls were divided into a reference population and a test population according to birth date, i.e., 3995 bulls born before January 1 2005 constituted the reference population and the remaining 1648 bulls constituted the test population. Four sets of data were used to validate accuracies of genotype imputation and genomic prediction: (1) 54K_real: all animals had marker data from the 54K chip; (2) IMP_test: for the test animals, the 54K marker data were imputed from LD marker data; (3) IMP_0.5ref: for half (randomly chosen) of the reference animals, the 54K marker data were imputed from LD marker data, and (4) LD_real: all animals had LD marker data without imputation to the 54K marker data.
The phenotypic data for genomic prediction were DRP that were derived from the Nordic genetic evaluations of January 2013. Fifteen traits included in the Nordic Total Merit index (http://www.nordicebv.info) were analyzed. DRP with reliabilities lower than 10% for animals in the reference data and lower than 20% for animals in the test data were deleted. The number of animals with phenotypic information differed between traits because the number of bulls with published EBV differed between traits. The number of animals available for genomic prediction and the heritability (provided by Nordic Cattle Genetic Evaluation) for each trait are in Table 2.

Imputation methods
For datasets IMP_test and IMP_0.5ref, the LD marker data were imputed to the 54K data using two programs: Beagle version 3.3.1 [8] and Findhap version 2 [9]. Beagle uses population information and a hidden Markov model to impute missing genotypes. Findhap is a fast program that imputes missing genotypes using both family and population information and takes the inheritance pattern of the X chromosome into account. Therefore, when using Findhap, markers on the pseudo-autosomal region (PAR) of the X chromosome were treated as autosomal markers, while the rest were treated as X-specific markers. The PAR was approximately identified as the region of the X chromosome where markers had a substantial proportion of heterozygous genotypes (H%) in the genotyped bulls. The starting position of the region was determined by the criteria that the H% at a SNP was higher than 5%, and that at least five of the following 10 SNPs with a MAF larger than 0.05 had an H% higher than 5%. The PAR extended to the end of the X chromosome. For datasets 54K_real and LD_real, sporadic missing genotypes (4%) were imputed using Beagle. Genotypes for the imputed markers (in datasets IMP_test and IMP_0.5ref) were compared to their corresponding real genotypes in 54K_real. Imputation accuracy was measured by three statistics: the allele error rate, i.e., the ratio of the number of falsely imputed alleles to the total number of imputed alleles; the genotype error rate, i.e., the ratio of the number of falsely imputed genotypes to the total number of imputed genotypes; and the correlation between imputed and true genotypes.
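The three accuracy measures can be illustrated with a short sketch. It assumes that genotypes are coded as allele counts (0, 1, 2) and that the number of wrongly imputed alleles in a genotype is the absolute difference between the true and imputed codes. The function names and the toy data are hypothetical.

```python
import math


def allele_error_rate(true_g, imputed_g):
    """Falsely imputed alleles divided by all imputed alleles (2 per genotype)."""
    wrong_alleles = sum(abs(t - i) for t, i in zip(true_g, imputed_g))
    return wrong_alleles / (2 * len(true_g))


def genotype_error_rate(true_g, imputed_g):
    """Falsely imputed genotypes divided by all imputed genotypes."""
    wrong_genotypes = sum(1 for t, i in zip(true_g, imputed_g) if t != i)
    return wrong_genotypes / len(true_g)


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy autosomal marker: two genotypes are each wrong by one allele.
true_auto = [0, 1, 2, 1, 0, 2, 1, 1]
imp_auto = [0, 1, 2, 2, 0, 2, 1, 0]
# Toy X-specific marker in bulls (codes 0 or 2 only): one genotype is wrong.
true_x = [0, 2, 2, 0]
imp_x = [0, 2, 0, 0]
```

For the autosomal toy marker the genotype error rate (2/8) is twice the allele error rate (2/16). For the X-specific marker in bulls the two rates coincide, because a wrong genotype always means that both coded alleles are wrong.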
Genomic relationship matrix (G matrix) using marker data including X-specific markers
As presented by VanRaden [10] and Hayes et al. [11], a genomic relationship matrix (G) can be calculated as:
$$\mathbf{G} = \frac{\mathbf{M}\mathbf{M}'}{2\sum_{j} p_j (1 - p_j)}$$
where elements in column j ($m_{ij}$) of M are $0 - 2p_j$, $1 - 2p_j$ and $2 - 2p_j$ for SNP genotypes $A_1A_1$, $A_1A_2$ and $A_2A_2$, respectively, and $p_j$ is the frequency of allele $A_2$ at SNP j.
The G matrix is calculated based on identity by state (IBS), with centering and scaling. Consequently, elements of the G matrix are approximations of realized proportions of the genome that are identical by descent (IBD) between pairs of individuals [11], which makes the G matrix analogous to the conventional numerator relationship matrix [10]. The G matrix describes the realized genetic relationships between pairs of individuals at the autosomal markers. However, genetic relationships between individuals at markers on the sex chromosomes and the autosomes are different. For example, for markers on the X-specific region of the X chromosome, the genetic relationship is 0 between father and son, $1/\sqrt{2}$ between mother and son and between father and daughter, 0.50 between mother and daughter and between full brothers, 0.75 between full sisters, and $1/(2\sqrt{2})$ between full brother and sister. For autosomal loci, these relationships all have an expectation of 0.50. Therefore, sex-linked inheritance should be considered when building a genomic relationship matrix based on marker data that include X chromosome markers.
When X-specific markers are treated as autosomal markers, the resulting genomic relationship matrix reflects sex-linked relationships, but on an incorrect scale because males have one X chromosome while females have two. For example, the relationship between sire and son is 0, but the diagonal element for a male is 2, instead of 1. Consequently, the covariance structures for males, for females, and between males and females differ from each other.
Let $A_1O$ and $A_2O$ denote genotypes of an X-specific marker in males (O means null, since males have only one X chromosome), and $A_1A_1$, $A_1A_2$ and $A_2A_2$ denote genotypes in females. Assuming that $A_iO$ in males has the same effect on the performance of a trait as $A_iA_i$ in females, genotypes of an X-specific marker can be coded in the same way as autosomal markers. Thus, genotypes $A_1O$ and $A_2O$ of males are coded as 0 and 2, and genotypes $A_1A_1$, $A_1A_2$ and $A_2A_2$ of females are coded as 0, 1 and 2. In addition, define γ as the effect of $A_2$ (i.e., the allele effect on performance of a trait is expressed as the deviation from the effect of $A_1$, thus the effect of $A_1$ is zero), p as the frequency of $A_2$, and q = 1 − p. The expectation of the genetic value (μ) accounted for by an X-specific marker for a male is:
$$\mu = q \cdot 0 + p \cdot 2\gamma = 2p\gamma$$
Let x be the genotype code as defined above and assume that the allele effect is independent of allele frequency and is additive (i.e., absence of non-additive genetic effects), then the variance of genetic value ($\sigma^2$) at an X-specific locus in the population of males is:
$$\sigma^2 = \left[ q(0 - 2p)^2 + p(2 - 2p)^2 \right] \sigma^2_{\gamma} = 4pq\,\sigma^2_{\gamma}$$
where $\sigma^2_{\gamma}$ is the variance of the random additive allele effect γ.
For females, the expectation and variance are the same as those for autosomal markers, i.e. $\mu = 2p\gamma$ and $\sigma^2 = 2pq\,\sigma^2_{\gamma}$. Let $m_{ij}$ be the element of matrix M for individual i and marker j, as defined previously. The relationship coefficient between male k and male l caused by the X-specific marker j can then be calculated as:
$$\frac{m_{kj}\, m_{lj}}{4 p_j q_j}$$
The relationship coefficient between female k and female l has the same form as for autosomal markers, i.e.
$$\frac{m_{kj}\, m_{lj}}{2 p_j q_j}$$
The relationship coefficient between male k and female l is:
$$\frac{m_{kj}\, m_{lj}}{\sqrt{4 p_j q_j}\,\sqrt{2 p_j q_j}} = \frac{m_{kj}\, m_{lj}}{2\sqrt{2}\, p_j q_j}$$
Alternatively, it can be assumed that genotype $A_iO$ in males has half the effect of genotype $A_iA_i$ in females. Then, the genotypes can be coded as the number of copies of $A_2$, i.e., 0 and 1 for genotypes $A_1O$ and $A_2O$ of males, and 0, 1 and 2 for genotypes $A_1A_1$, $A_1A_2$ and $A_2A_2$ of females, respectively. For females, the expectation and variance accounted for by an X-specific marker are the same as above. The expectation of the genetic value for a male is:
$$\mu = p\gamma$$
and the variance of the genetic value for males is:
$$\sigma^2 = pq\,\sigma^2_{\gamma}$$
Let $m^*_{ij}$ be the element for individual i and marker j in the corresponding M matrix. Define $m^*_{ij} = 0 - p_j$ for genotype $A_1O$ and $m^*_{ij} = 1 - p_j$ for genotype $A_2O$ of males, and $m^*_{ij} = 0 - 2p_j$, $1 - 2p_j$ or $2 - 2p_j$ for genotypes $A_1A_1$, $A_1A_2$ or $A_2A_2$ of females. Then, $m^*_{ij} = m_{ij}/2$ for males, and $m^*_{ij} = m_{ij}$ for females. Then, the relationship coefficient between male k and male l caused by the X-specific marker j is:
$$\frac{m^*_{kj}\, m^*_{lj}}{p_j q_j} = \frac{m_{kj}\, m_{lj}}{4 p_j q_j}$$
the relationship coefficient between female k and female l is:
$$\frac{m^*_{kj}\, m^*_{lj}}{2 p_j q_j} = \frac{m_{kj}\, m_{lj}}{2 p_j q_j}$$
and the relationship coefficient between male k and female l is:
$$\frac{m^*_{kj}\, m^*_{lj}}{\sqrt{p_j q_j}\,\sqrt{2 p_j q_j}} = \frac{m_{kj}\, m_{lj}}{2\sqrt{2}\, p_j q_j}$$
This demonstrates that the two alternative assumptions for the effect of the male genotype of X-specific markers lead to the same relationship coefficient. Thus, the G matrix based on both autosomal and X chromosome markers can be calculated as for autosomal markers, but element $m_{ij}$ of the M matrix must be divided by $\sqrt{2}$ if marker j is an X-specific marker and individual i is a male, i.e.
$$m^{c}_{ij} = \frac{m_{ij}}{\sqrt{2}}$$
if marker j is X-specific and individual i is a male. To construct the M matrix, when the codes for $A_1A_1$, $A_1A_2$ and $A_2A_2$ are 0, 1 and 2, the X-specific genotypes $A_1O$ and $A_2O$ are coded as 0 and 2.
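The construction described above (center the 0/1/2 codes by twice the allele frequency, divide the elements of males at X-specific markers by the square root of 2, then scale the cross-product by twice the sum of p(1 − p)) can be sketched in a few lines. This is an illustrative sketch, not the software used in the study: the function name, argument layout and toy data are hypothetical, and allele frequencies are assumed to be supplied by the caller.

```python
import math


def genomic_relationship(genotypes, freqs, is_male, is_x_specific):
    """Sketch of a G matrix with sex-linked scaling (all names hypothetical).

    genotypes[i][j]: copies of allele A2 for animal i at marker j (0/1/2);
                     X-specific genotypes of males are coded 0 or 2.
    freqs[j]:        frequency of allele A2 at marker j.
    is_male[i], is_x_specific[j]: boolean flags.
    """
    n_animals = len(genotypes)
    # Common denominator: 2 * sum over markers of p_j * (1 - p_j).
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    centered = []
    for i in range(n_animals):
        row = []
        for j, p in enumerate(freqs):
            m_ij = genotypes[i][j] - 2.0 * p
            # Sex-linked correction: males at X-specific markers.
            if is_x_specific[j] and is_male[i]:
                m_ij /= math.sqrt(2.0)
            row.append(m_ij)
        centered.append(row)
    return [[sum(a * b for a, b in zip(centered[k], centered[l])) / scale
             for l in range(n_animals)]
            for k in range(n_animals)]


# Toy data: one X-specific marker with p = 0.5, two bulls and one cow.
genotypes = [[0], [2], [1]]
freqs = [0.5]
sex = [True, True, False]
G_c = genomic_relationship(genotypes, freqs, sex, is_x_specific=[True])
# Same marker treated as autosomal (no sex-linked correction).
G_0 = genomic_relationship(genotypes, freqs, sex, is_x_specific=[False])
```

In this toy case the diagonal element of a bull is 2 when the marker is treated as autosomal and 1 after the correction, which is the scale problem described in the text.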

Genomic prediction models
Genomic predictions based on marker data with and without markers on the X chromosome were carried out using the following GBLUP models implemented in the DMU package [12]:
(1) G(A): GBLUP with the G matrix built using autosomal markers ($\mathbf{G}_a$) only: $\mathbf{y} = \mu + \mathbf{Z}\mathbf{g}_a + \mathbf{e}$
(2) G(A + X): GBLUP with the G matrix built using all markers and treating X-specific markers as autosomal markers ($\mathbf{G}_0$): $\mathbf{y} = \mu + \mathbf{Z}\mathbf{g}_0 + \mathbf{e}$
(3) G_c(A + X): GBLUP with the G matrix built using all markers and accounting for the sex-linked inheritance of X-specific markers ($\mathbf{G}_c$): $\mathbf{y} = \mu + \mathbf{Z}\mathbf{g}_c + \mathbf{e}$
(4) G(A) + G(X): GBLUP using both the autosomal G matrix and the X chromosome G matrix ($\mathbf{G}_x$): $\mathbf{y} = \mu + \mathbf{Z}\mathbf{g}_a + \mathbf{Z}\mathbf{g}_x + \mathbf{e}$
(5) G(A) + Pol: model G(A) plus a residual polygenic effect: $\mathbf{y} = \mu + \mathbf{Z}\mathbf{g}_a + \mathbf{Z}_u\mathbf{u} + \mathbf{e}$
(6) G_c(A + X) + Pol: model G_c(A + X) plus a residual polygenic effect: $\mathbf{y} = \mu + \mathbf{Z}\mathbf{g}_c + \mathbf{Z}_u\mathbf{u} + \mathbf{e}$
In the above models, y is the vector of DRP, μ is the intercept, $\mathbf{g}_a$ is the vector of genomic breeding values accounted for by autosomes, $\mathbf{g}_x$ is the vector of genomic breeding values accounted for by the X chromosome, $\mathbf{g}_0$ is the vector of total genomic breeding values associated with the G matrix that treats X-specific markers as autosomal markers, $\mathbf{g}_c$ is the vector of total genomic breeding values associated with the G matrix that accounts for X-specific markers as sex-linked markers, Z is the incidence matrix relating genomic breeding values to y, u is the vector of residual polygenic effects, $\mathbf{Z}_u$ is the incidence matrix that associates u with y, and e is the vector of random residuals. Random effects are assumed distributed as follows: $\mathbf{g}_a \sim N(\mathbf{0}, \mathbf{G}_a\sigma^2_{g_a})$, $\mathbf{g}_x \sim N(\mathbf{0}, \mathbf{G}_x\sigma^2_{g_x})$, $\mathbf{g}_0 \sim N(\mathbf{0}, \mathbf{G}_0\sigma^2_{g_0})$, $\mathbf{g}_c \sim N(\mathbf{0}, \mathbf{G}_c\sigma^2_{g_c})$, $\mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma^2_u)$ and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{R}\sigma^2_e)$, where A is the pedigree-based relationship matrix, and R is a diagonal matrix used to account for heterogeneous residual variances due to different reliabilities of DRP ($r^2_{DRP}$). The diagonal element i of matrix R was computed as $R_{ii} = (1 - r^2_{DRP})/r^2_{DRP}$. Reliability of DRP was calculated as $r^2_{DRP} = \frac{EDC}{EDC + \lambda}$, where EDC is the equivalent daughter contribution and $\lambda = \frac{4 - h^2}{h^2}$, with $h^2$ the heritability [13].
All variances ($\sigma^2_{g_a}$, $\sigma^2_{g_x}$, $\sigma^2_{g_0}$, $\sigma^2_{g_c}$, $\sigma^2_u$ and $\sigma^2_e$) were estimated from the DRP data used in the analyses, using the corresponding models. The allele frequencies used to construct the G matrix were calculated from the current marker data of the genotyped animals.
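The weighting of residuals by DRP reliability can be sketched as follows. The reliability formula EDC/(EDC + λ) with λ = (4 − heritability)/heritability is taken from the text. The form assumed here for the diagonal of R, (1 − reliability)/reliability, is a common choice for DRP and should be read as an assumption of this sketch. The function names and numbers are hypothetical.

```python
def drp_reliability(edc, heritability):
    """Reliability of a DRP from the equivalent daughter contribution (EDC)."""
    lam = (4.0 - heritability) / heritability
    return edc / (edc + lam)


def residual_weight(reliability):
    """Assumed diagonal element of R: (1 - r2) / r2.

    A low reliability gives a large residual variance, so that record
    receives less weight in the prediction.
    """
    return (1.0 - reliability) / reliability


# Hypothetical bull: EDC of 45 for a trait with heritability 0.25.
r2 = drp_reliability(edc=45.0, heritability=0.25)  # lambda is 15 here
r_ii = residual_weight(r2)
```

With these numbers λ is 15, the reliability is 45/60 = 0.75 and the residual weight is 1/3.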
In addition to the above analyses, genomic predictions were also performed using four reduced 54K marker datasets. These datasets were: (1) Non-2: marker data excluding the markers on chromosome 2, which has a length similar to that of the X chromosome; (2) Non-10: marker data excluding the markers on chromosome 10, which is similar to the X chromosome in terms of number of annotated genes; (3) Non-26: marker data excluding the markers on chromosome 26, which is similar to the X chromosome in terms of number of markers; (4) Nonran: marker data excluding a random sample of 827 markers (equivalent to the number of markers available on the X chromosome). Genomic predictions based on these datasets were carried out using the GBLUP model $\mathbf{y} = \mu + \mathbf{Z}\mathbf{g}_r + \mathbf{e}$, where $\mathbf{g}_r$ is the vector of genomic breeding values accounted for by the reduced marker data. The G matrix used for the analyses considered sex-linked inheritance for X-specific markers.
Genomic predictions using different marker datasets and different models were validated by comparing genomic estimated breeding values (GEBV) and DRP for animals in the test data. GEBV were calculated as the sum of the genomic effect and the residual polygenic effect for models G(A) + Pol and G_c(A + X) + Pol, and as the sum of the autosomal effect and the X chromosome effect for model G(A) + G(X). Reliabilities of genomic predictions were estimated as the squared correlation between genomic predictions and DRP, divided by the average reliability of DRP, based on [14]:
$$\mathrm{Cor}^2(\mathrm{GEBV}, \mathrm{TBV}) = \frac{\mathrm{Cor}^2(\mathrm{GEBV}, \mathrm{DRP})}{\bar{r}^2_{DRP}}$$
where TBV is true breeding value and $\bar{r}^2_{DRP}$ is the average reliability of DRP in the test data. Bias of genomic predictions was assessed by regression of DRP on GEBV [15]. A necessary condition for unbiased prediction is that the regression coefficient does not deviate significantly from 1.
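The two validation statistics can be sketched as follows: the reliability is the squared correlation between GEBV and DRP divided by the mean DRP reliability, and the bias measure is the slope of the regression of DRP on GEBV. The function name and the toy numbers are hypothetical and serve only to show the arithmetic.

```python
def validation_stats(gebv, drp, drp_reliabilities):
    """Return (reliability of GEBV, regression coefficient of DRP on GEBV)."""
    n = len(gebv)
    mean_g, mean_d = sum(gebv) / n, sum(drp) / n
    s_gd = sum((g - mean_g) * (d - mean_d) for g, d in zip(gebv, drp))
    s_gg = sum((g - mean_g) ** 2 for g in gebv)
    s_dd = sum((d - mean_d) ** 2 for d in drp)
    squared_cor = s_gd ** 2 / (s_gg * s_dd)        # Cor^2(GEBV, DRP)
    mean_rel = sum(drp_reliabilities) / n          # average reliability of DRP
    reliability = squared_cor / mean_rel
    slope = s_gd / s_gg                            # DRP regressed on GEBV
    return reliability, slope


# Toy test set of four animals.
gebv = [1.0, 2.0, 3.0, 4.0]
drp = [1.5, 1.0, 3.5, 3.0]
rel, b = validation_stats(gebv, drp, [0.8, 0.8, 0.8, 0.8])
```

In this toy example the slope is 0.7, so the GEBV vary more than the DRP support, which corresponds to the kind of deviation from 1 discussed as prediction bias.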
The log-likelihood ratio statistic (−2lnLR) was used to test the difference in goodness of fit between model G(A) + G(X) and model G(A), and between model G_c(A + X) + Pol and model G_c(A + X). Taking G(A) + G(X) and G_c(A + X) + Pol as the alternative models and G(A) and G_c(A + X) as the null models, the log-likelihood ratio statistic was calculated as −2lnLR = −2ln(likelihood of null model / likelihood of alternative model). The P value of −2lnLR was calculated in two ways: assuming that −2lnLR is asymptotically $\chi^2_{df=1}$ distributed [16], and assuming that the asymptotic distribution of −2lnLR is a 50:50 mixture of $\chi^2_{df=0}$ and $\chi^2_{df=1}$, so that $P(\chi^2_{mixture}) = 0.5\,P(\chi^2_{df=1})$ [17]. Hotelling-Williams' t-test [18,19] was implemented to test the equality of two dependent correlations (Cor(GEBV, DRP)) from two models for the same trait. The log-likelihood ratio test and Hotelling-Williams' t-test were implemented in the analysis using the 54K_real marker data.
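The two P values for the likelihood ratio test can be computed with the standard library alone, because the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x/2)). The sketch below uses hypothetical function names and log-likelihood values, and it truncates a negative statistic at zero as a numerical safeguard, which is an assumption of this sketch.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def lrt_pvalues(loglik_null, loglik_alt):
    """Return (-2lnLR, P under chi2 with 1 df, P under the 50:50 mixture)."""
    # -2 ln(L_null / L_alt) = -2 (lnL_null - lnL_alt); truncated at 0.
    stat = max(0.0, -2.0 * (loglik_null - loglik_alt))
    p_chi2 = chi2_sf_df1(stat)
    p_mixture = 0.5 * p_chi2
    return stat, p_chi2, p_mixture


# Hypothetical log-likelihoods of a null and an alternative model.
stat, p, p_m = lrt_pvalues(loglik_null=-1002.0, loglik_alt=-1000.0)
```

The mixture P value is always half of the chi-square P value, so a test that is significant at P < 0.05 under the first assumption has a mixture P value below 0.025.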

Results
The accuracy of imputation from the 7K to the 54K SNP panel was high (Table 3). Using Beagle, the allele error rate for autosomal markers averaged over the two datasets (IMP_test and IMP_0.5ref) was 1.1%. Compared with autosomal markers, the allele error rates for X-specific markers and PAR markers were higher by 2.1 and 7.7%, respectively. The accuracy of imputation with Findhap was slightly lower than that with Beagle, with an increase of the allele error rate of about 0.7% for autosomes, 0.3% for X-specific markers, and 1.5% for PAR markers, averaged over the two datasets. Correlation coefficients between imputed and true genotypes for autosomal markers, pseudo-autosomal markers, and X-specific markers were 0.983, 0.856 and 0.937 with Beagle, and 0.971, 0.831 and 0.935 with Findhap.
Genotype error rate was nearly twice as large as the allele error rate for markers on autosomes and PAR, but almost the same for X-specific markers (Table 3). This was because animals in the present data were all bulls, thus genotype error was in principle equivalent to the allele error for X-specific markers. The reason for a slightly higher genotype error rate than allele error rate for X-specific markers was that some genotypes were heterozygous in the real 54K data (due to typing error) and in the imputed data (due to imputation error).
Although animals with LD genotypes in the IMP_test dataset had more ancestors with 54K genotypes, while animals with LD genotypes in the IMP_0.5ref dataset had more progeny with 54K genotypes, these two datasets had similar accuracies of imputation (Table 3). Allele error rates were equal to 1.9% with Findhap and 1.2% with Beagle, averaged over the two imputation datasets and calculated from the data pooled over the autosomes and the X chromosome markers.
As shown in Table 4, for the four datasets, genomic predictions using all markers gave a slightly higher reliability than predictions without markers on the X chromosome. Averaged over the 15 traits, the gain in reliability from using the X chromosome markers was 0.4 to 0.5% points when using models without a residual polygenic effect, and 0.3 to 0.4% points when using models with a residual polygenic effect. Models G(A + X) and G c (A + X) resulted in the same reliability of genomic predictions, which indicates that a G matrix that took sex-linked inheritance for X-specific markers into account did not improve genomic prediction more than a G matrix that dealt with X-specific markers as autosomal markers, possibly because animals in the present data were all bulls. In addition, model G(A) + G(X) did not improve predictions compared to models G(A + X) and G c (A + X), which suggests that it is reasonable to assume that the effects of the markers on the X chromosome and the autosomes have the same distribution.
A model that included a residual polygenic effect improved the reliability of predicted breeding values, with an average increase of about 0.8% points (Table 4). For all scenarios, the greatest improvement in reliability by including a residual polygenic effect in the model was observed for the traits longevity and other diseases. Reliability of GEBV using the LD genotypes was 5% points lower than when using the real 54K genotypes and applying models without a polygenic effect, and 3.4% points lower when applying models with a polygenic effect. Furthermore, genomic predictions based on the imputed datasets of IMP_test and IMP_0.5ref were almost as accurate as predictions based on the real 54K data.
Table 3 Allele error rate (ER_A, %), genotype error rate (ER_G, %) and correlation (COR) between imputed and true genotypes for different sets of markers (a) in two datasets (b)
(a) ALL: all markers; AUTO: markers on the autosomes; PAR: markers on the pseudo-autosomal region; X: X-specific markers on the X chromosome; (b) IMP_test: for the test animals in genomic prediction, the 54K marker data were imputed from LD marker data; IMP_0.5ref: for half (randomly chosen) of the reference animals, the 54K marker data were imputed from LD marker data.
Regression coefficients of DRP on genomic predictions based on the real 54K or imputed 54K genotype data ranged from 0.782 to 1.064, except for longevity, for which the regression coefficients ranged from 0.631 to 0.685 (Table 5). Averaged over the 15 traits, the regression coefficients were slightly closer to 1 with than without using the X chromosome markers for prediction. Regression coefficients were the same when using real versus imputed 54K genotype data. In addition, models that included a residual polygenic effect resulted in regression coefficients considerably closer to 1 than models without a polygenic effect, which indicates a reduction of prediction bias from including polygenic effects. Regression coefficients deviated more from 1 for genomic predictions based on LD genotype data than for predictions using the 54K genotype data, which indicates a larger prediction bias for the former. However, when using models with a residual polygenic effect, the regression coefficients based on LD genotypes were very close to those based on the 54K genotype data. Table 6 shows the reliability of genomic predictions when excluding one of four selected chromosomes or when deleting a random sample of markers. Compared to excluding the X chromosome, excluding chromosome 2 (similar to the X chromosome in length), chromosome 10 (similar to the X chromosome in number of annotated genes), and chromosome 26 (similar to the X chromosome in number of markers) led to larger losses in reliability. Excluding chromosome 10 led to the largest loss in reliability, while randomly deleting 827 markers (i.e. the same number of markers as on the X chromosome) led to no loss in reliability.
The log likelihood ratio test statistics in Table 7 indicate that model (G(A) + G(X)) using both autosomal and X chromosome markers had a significantly better goodness of fit than model (G(A)) using only autosomal markers for 13 of the 15 traits, and that model (G_c(A + X) + Pol) with a residual polygenic effect was significantly better than model (G_c(A + X)) without a polygenic effect for 12 traits. As shown in Table 7, the variance accounted for by the X chromosome was significantly different from 0 for 10 traits, and the variance accounted for by the residual polygenic effect was significant for 13 traits. On average, the X chromosome accounted for 1.7% of the total additive genetic variance, and the residual polygenic effect for 17.2% of the total additive genetic variance.
Table 4 Reliability (%) of genomic predictions based on four datasets (a) with or without X chromosome markers, using different models (b) and averaged over 15 traits
(a) 54K_real: all animals with marker data from the 54K chip; IMP_test: for the test animals in genomic prediction, the 54K marker data were imputed from LD marker data; IMP_0.5ref: for half (randomly chosen) of the reference animals, the 54K marker data were imputed from LD marker data; LD_real: all animals had LD marker data without extension to the 54K marker data.
Table 5 Regression coefficients of deregressed proofs on genomic predictions based on four datasets (a) with or without X chromosome markers, using different models (b) and averaged over 15 traits
Table 8 presents reliabilities of genomic predictions for each trait using models G(A), G(A) + G(X) and G_c(A + X) + Pol, based on the 54K_real dataset, and shows that the contribution of X chromosome markers to the reliability of genomic predictions differed between traits. An increase in reliability of around 2% points was observed for fertility and other diseases.
Correspondingly, the variances explained by the X chromosome were much higher for these two traits than for the other traits. Longevity also showed a significant benefit of including X chromosome markers, although the variance accounted for by the X chromosome was small for this trait. Averaged over the 15 traits, including the X chromosome improved the prediction reliability by 0.5% points.

Table column headings: Dataset | G(A) | G(A + X) | G_c(A + X) | G(A) + G(X) | G(A) + Pol | G_c(A + X) + Pol
The benefit of including polygenic effects in the model also differed among traits (Table 8). A significant increase in the reliability of genomic predictions from including a residual polygenic effect was obtained for four traits. The largest improvements were for longevity (3.6%) and other diseases (3.7%). For these two traits, the variance accounted for by the residual polygenic effect was more than 40% of the total additive genetic variance (Table 7). For the other traits, the average improvement in prediction reliability was 0.3%.
Table 7 footnotes: (a) Log likelihood ratio of model G(A) + G(X) to model G(A), where G(A) was the model with an autosomal G matrix and G(A) + G(X) was the model including an autosomal G matrix and an X chromosome G matrix; (b) Log likelihood ratio of model G_c(A + X) + Pol to model G_c(A + X), where G_c(A + X) was the model with a G matrix built using all markers and G_c(A + X) + Pol also included a residual polygenic effect; (c) Variance accounted for by the X chromosome, estimated from model G(A) + G(X); (d) Variance of the residual polygenic effect, estimated from model G_c(A + X) + Pol; (e) Variance in proportion to total additive genetic variance; * Significant at P < 0.05, where P was calculated as $P(\chi^2_{df=1})$; ¤ Significant at P_m < 0.05, where P_m was calculated as $0.5\,P(\chi^2_{df=1})$, e.g., when P < 0.05, P_m < 0.025.

Discussion
This study investigated the accuracy of genotype imputation for markers on the X chromosome and the impact of including X chromosome markers on reliability of genomic predictions. The results showed that averaged over the 15 traits evaluated, including X chromosome markers improved the reliability of genomic prediction slightly, ranging from 0.3 to 0.5% points in various datasets and using different models. The variance accounted for by the X chromosome was about 1.7% of the total additive genetic variance. Gains in reliability from including the X chromosome were smaller than observed in a previous study on USA Holstein cattle by VanRaden et al. [7], who reported an increase in reliability of 1.5%, averaged over nine traits, although the X chromosome accounted for only 1% of the total genetic variance in their study. When the genomic model included a residual polygenic effect, breeding values predicted using marker data that included X chromosome markers were still more accurate than those predicted without X chromosome markers. This means that a model that includes a residual polygenic effect does not recover the loss of prediction accuracy from exclusion of X chromosome markers. The loss of prediction accuracy from exclusion of the X chromosome was smaller than when an autosome of similar size (chromosome 2), or with an equivalent number of annotated genes (chromosome 10), or with an equivalent number of markers (chromosome 26) was excluded. There are two possible reasons why markers on the X chromosome contribute less to the reliability of genomic predictions than these three autosomes. One reason is that the density of markers on the X chromosome is much lower than that on autosomes; the average distance between adjacent markers is about 180 kb on the X chromosome and 60 kb on the autosomes in the 54K marker data. 
The second reason is that markers on the X chromosome represent weaker relationships between individuals in the present data, which consisted only of males. The impact of genetic relationships between animals in the reference and test datasets on reliability of genomic predictions for test animals has been reported in many previous studies [11,20-22]. Since the relationship between sires and sons is 0 for the X chromosome, information on a sire does not directly influence the part of the son's GEBV explained by the X chromosome. In contrast, information on a sire directly influences the part of the son's GEBV explained by the autosomes, as reported in previous studies that showed that reliability of GEBV is about 5 to 10% higher for test animals with their sires in the reference population than for those without [23,24].
When a random set of 827 markers (i.e. the number of markers on the X chromosome) was excluded from the analysis, there was no loss in reliability of genomic  than X-specific markers, although the markers on the PAR were about twice as dense as X-specific markers in both the 7K and the 54K data. This can be explained by the fact that the PAR is a small segment (about 11 Mbp based on our estimation), which could reduce imputation efficiency. Another explanation could be that X-specific markers may have lower recombination rates than PAR markers, since crossovers occur only in females. Poor imputation accuracy for PAR markers was also reported by Johnston et al. [6] in the imputation from the 3K to the 54K panel.

Conclusions
Although the accuracy of genotype imputation for markers on the X chromosome was lower than that for autosomal markers, the accuracy of imputation from the 7K to the 54K panel for markers on the X chromosome was still high in the Nordic Holstein population. Including markers on the X chromosome slightly increased the reliability of genomic predictions. Based on our data which included only bulls, using a G matrix that took the sex-linked inheritance of X-specific markers into account did not improve prediction compared to a G matrix that did not. Although the improvement in the reliability of genomic prediction obtained from the X chromosome is small, including X chromosome markers does not result in any extra cost. Therefore, it is recommended to use markers on the X chromosome for genomic evaluation.