Estimated allele substitution effects underlying genomic evaluation models depend on the scaling of allele counts

Background Genomic evaluation is used to predict direct genomic values (DGV) for selection candidates in breeding programs, but also to estimate allele substitution effects (ASE) of single nucleotide polymorphisms (SNPs). Scaling of allele counts influences the estimated ASE, because scaling of allele counts results in less shrinkage towards the mean for low minor allele frequency (MAF) variants. Scaling may become relevant for estimating ASE as more low MAF variants will be used in genomic evaluations. We show the impact of scaling on estimates of ASE using real data and a theoretical framework, and in terms of power, model fit and predictive performance. Results In a dairy cattle dataset with 630 K SNP genotypes, the correlation between DGV for stature from a random regression model using centered allele counts (RRc) and centered and scaled allele counts (RRcs) was 0.9988, whereas the overall correlation between ASE using RRc and RRcs was 0.27. The main difference in ASE between both methods was found for SNPs with a MAF lower than 0.01. Both the ratio (ASE from RRcs/ASE from RRc) and the regression coefficient (regression of ASE from RRcs on ASE from RRc) were much higher than 1 for low MAF SNPs. Derived equations showed that scenarios with a high heritability, a large number of individuals and a small number of variants have lower ratios between ASE from RRc and RRcs. We also investigated the optimal scaling parameter [from − 1 (RRcs) to 0 (RRc) in steps of 0.1] in the bovine stature dataset. We found that the log-likelihood was maximized with a scaling parameter of − 0.8, while the mean squared error of prediction was minimized with a scaling parameter of − 1, i.e., RRcs. Conclusions Large differences in estimated ASE were observed for low MAF SNPs when allele counts were scaled or not scaled because there is less shrinkage towards the mean for scaled allele counts. We derived a theoretical framework that shows that the difference in ASE due to shrinkage is heavily influenced by the power of the data. Increasing the power results in smaller differences in ASE whether allele counts are scaled or not.


Background
Genomic evaluation is used to predict direct genomic values (DGV) for selection candidates in breeding programs. In addition to the DGV, allele substitution effects (ASE) are or can be computed using genomic evaluation models. An ASE represents the effect that the presence of a copy of that allele has on the phenotype. This also applies for the estimation of such effects in genomic evaluation. The loci used do not have to be the causal variants; if they are in linkage disequilibrium (LD) with the causal loci, they can pick up the correlated part of the ASE of the causal loci. The estimated ASE from genomic evaluations can be used for various additional purposes such as rapid computation of DGV for newly genotyped individuals by multiplying their allele counts with the ASE [1], in genome-wide association studies (GWAS) to get insight on the genetic architecture of a trait [2,3], and to estimate DGV based on small genomic regions, so-called 'local DGV' for quantitative trait loci mapping [4,5].
Several genomic evaluation models estimate ASE first to predict the DGV, e.g., Bayesian stochastic search variable selection, SNP-best linear unbiased prediction (BLUP) or ridge regression [6,7]. Other methods such as genomic (G)BLUP, genomic restricted maximum likelihood estimation (GREML), and one-step methods, use a genomic relationship matrix (GRM) that is constructed from the SNP genotypes [8][9][10]. The DGV are then predicted as a result of solving the mixed model equations, and ASE are not explicitly computed. However, with a GBLUP or GREML approach, it is straightforward to back-solve the ASE from DGV based on the genotypes of the animals [11].
All genomic evaluation models require genotypes that are either used directly or to construct a GRM. There are different genotype coding methods for the three possible genotypes: homozygous allele 1 (e.g., AA), heterozygous (e.g., AB) and homozygous allele 2 (e.g., BB). Often the genotypes are represented as the number of copies of one allele (e.g., counting the B allele in the above example: 0, 1, 2), which means that, in genomic evaluation models, the ASE is estimated for the allele that is being counted. These allele counts can be centered resulting in a mean of 0, or both centered and scaled resulting in a mean of 0 and a standard deviation of 1. Using centering only in genomic evaluation gives ASE directly, however, using centered and scaled allele counts results in estimated effects for the scaled genotypes, instead of for 0, 1, 2 genotypes, and an additional transformation is needed to obtain the actual ASE. Stranden and Christensen [12] showed that differences in genotype coding gave correlations between ASE close to 1 (higher than 0.9998) and the same DGV as long as the estimated general mean was included in the DGV. However, they looked at centering and did not include scaling in the genotype coding methods studied. As indicated by de los Campos et al. [13], centering only influences the intercept, but scaling results in less shrinkage towards the mean for low minor allele frequency (MAF) variants compared with variants with intermediate MAF.
In the past, low MAF variants were often ignored by applying a MAF cut-off of 1 to 5%, or a minimum number of copies of the minor allele present in the population, because such variants were considered unreliable [14]. Moreover, SNP genotyping platforms in livestock species have been developed such that mainly common SNPs are on the SNP-panel of commercial genotyping chips. However, with the recent rise in available whole-genome sequence data, the use of rare variants in genomic evaluation and GWAS is increasing.
Goddard [15] indicated that optimal long-term genomic selection is achieved by putting more emphasis on SNPs with a low frequency of the favorable allele, such that all SNPs will be fixed at the same moment in time. Jannink [16] also showed that putting more weight on favorable alleles with a low frequency benefits long-term selection since the final gain from weighted genomic selection is higher. In conservation genetics, it might be desirable to put more emphasis on rare alleles to preserve the alleles that are at high risk of disappearing in a few generations. Eynard et al. [17] showed that relationships between individuals based on variants with a MAF between 1 and 5% are significantly different from relationships based on more common variants, and concluded that for conservation of rare alleles the relationships should be estimated using scaled allele counts.
Given that the use of low MAF variants in genomic evaluation is likely to increase, scaling may become a more important consideration for the estimation of ASE. This paper shows the impact of scaling the centered allele counts on the estimation of ASE. An SNP-BLUP model was applied to dairy cattle data to estimate DGV and ASE, using different scaling parameters. We present a theoretical framework to show the origin of the difference in ASE, resulting from scaling, and the impact of the power of the data on this difference. In addition, the bestfitting scaling parameter and that with the best predictive performance were investigated.

Data
Daughter yield deviations (DYD) for stature from 5554 Holstein bulls were available from CRV (Cooperative Cattle Improvement Organization, Arnhem, The Netherlands), as well as the number of daughters used to estimate the DYD (on average 549 daughters). The bulls were genotyped with the Illumina BovineHD Bead chip (734,403 SNPs; Illumina Inc., San Diego, CA, USA), or genotyped with a 50 K SNP panel and imputed to high-density (HD). SNPs with less than five copies of the minor allele segregating in the population were discarded. In addition, each possible SNP genotype had to occur at least once (i.e., at least one heterozygote and one homozygote carrying the minor allele), resulting in a final set of 627,440 SNPs. MAF ranged from 0.00045 (i.e., five alleles present in the population) to 0.5 with an average of 0.22; the frequency distribution of MAF is shown in Fig. 1.

Genomic evaluation
To show the impact of scaling on ASE, the effects were estimated with a random regression model (SNP-BLUP) using two different genotype coding methods: centered allele counts (RRc) and centered and scaled allele counts (RRcs). The SNP-BLUP model is a random regression model, which estimates the regression coefficients with BLUP, assuming a normal distribution. The following SNP-BLUP models were solved using ASReml software [18]: where y is a vector with phenotypes, here stature DYD; µ is the intercept; Z is a n × N design matrix containing centered allele counts for all individuals, where n is the number of animals and N the number of SNP; b RRc is a vector of random unknown SNP effects, the SNP effects were assumed to be identically and independently distributed with a mean 0 and variance σ 2 g , i.e., b ∼ N 0, Iσ 2 g ; and e is a vector of residual errors. The model assumed that e ∼ N 0, Dσ 2 e , where D is a diagonal matrix with elements computed as 1 wt i , with weight wt i being the number of daughters of individual i on which the DYD of i was based, and σ 2 e is the residual variance. Elements of Z are computed as z ij = x ij − 2p j , where x ij is an element of the X matrix containing the SNP genotype for individual i at locus j coded as 0, 1, or 2; and p j is the frequency of the allele whose homozygous genotype is coded as 2 at locus j. Note that 2p j is the mean allele count of the SNP used for the centering, and that the estimated SNP effects b RRc are the estimated ASE, i.e., α RRc =b RRc .

Centered and scaled (RRcs):
For centered and scaled allele counts, the Z matrix was replaced by a W matrix, which contained elements . Note that 2p j is the mean allele count of the SNP used for the centering, and that 2p j 1 − p j is the standard deviation used for the scaling. Since W contains scaled allele counts, the estimated y = 1µ + Zb RRc + e, y = 1µ + Wb RRcs + e.
SNP effects are not ASE, i.e., they are not equal to half the difference of the value between the two homozygotes [19]. The ASE can be obtained as: In the above paragraphs, we describe SNP-BLUP models but for convenience, we applied GREML models with back-solving to obtain the results (see "Appendix"). These two SNP-BLUP models are equivalent to GREML models with a centered GRM following Van-Raden's [10] method (1), and a centered and scaled GRM following VanRaden's [10] method (2). In the case of GREML, the DGV must be back-solved to obtain the estimated regression coefficients for the SNP and, for the scaled GRM, the transformation to ASE is needed (see "Appendix").

Relationship between ASE from unscaled and scaled allele counts
First, we empirically evaluated the relationship between ASE from RRc and RRcs. For variants with the same MAF, the correlation, mean ratio and the regression coefficient were calculated between the ASE obtained with RRc and RRcs. The mean ratio was calculated by dividing the ASE estimated with RRcs by the ASE estimated with RRc for each SNP and averaged over all SNPs with the same MAF. Regression coefficients were obtained per set of SNPs with the same MAF by regressing the ASE from RRcs on the ASE from RRc. The number of SNPs per MAF ranged from 14 to 1796 with an average of 94 SNPs.
Second, we theoretically evaluated the expected relationship between ASE from RRc and RRcs. We considered expressions for estimating the ASE directly for a single locus, and ignored possible covariances between estimated ASE of different loci, which may for instance arise due to LD between the loci. For RRc, the equivalent ridge regression BLUP model can be specified as in e.g., [15,20]:

RRc
, with σ 2 α RRc as SNP variance, i.e., , with σ 2 a as total additive genetic variance. Ignoring off-diagonal elements in Z ′ D −1 Z, and using vector z .j that is column j in matrix Z, we get: Thus, the ratio between both ASE, is equal to: Assuming that there is no relationship between the genotypes of the individuals and the information content of their phenotypes, e.g., the number of daughters in our study as represented in D −1 , and that the genotypes are in Hardy- where #dtrs i is the number of daughters in our case. The latter term could be replaced in other cases for instance by the (effective) number of own records, or if the individuals involved have only one own observation, then this term can simply be replaced by n (number of individuals). In the latter situation, we get: The value of the term j 2p j 1 − p j can be obtained as: where φ(p) is the probability density function of the distribution of the allele frequencies, which is required. Here, we consider two distributions, i.e., the uniform distribution, which generally applies for commonly used 50 K SNP chips [21,22] and the U-shaped distribution, which applies to whole-genome sequence data [21,23,24]. For the uniform distribution, incrementing p by steps of 1 2n , we obtain: Thus, when the allele frequencies are uniformly distributed: For the U-shaped distribution, the probability density function, is φ(p) ≈ Cp 4N e v−1 (1 − p) 4N e u−1 e 4N e sp(1−p) [25], where v and u are assumed to be equal and represent forward and backward mutation rates (here assumed to be 1 × 10 −8 ), N e is the effective population size (here assumed to be 65), s is the selection coefficient and C is a constant that scales the sum of all probabilities to 1. Assuming s = 0 for simplification, the term e 4N e sp(1−p) drops from the equation, such that In this case, Thus, we get: Substituting this in the earlier formula, we get for the U-shaped distribution: This expression is rather tedious. Here, we assumed that v = 1 × 10 −8 , and the effective population size N e = 65 [26] resulting in 4N e v = 2.6 × 10 −6 ≈ 0. Hence, in our case, and in other datasets where N e is small such that 4N e v ≈ 0, and considering that n is not extremely large, C can be approximated as: where γ is the Euler-Mascheroni constant [27] (see "Appendix" for a derivation).
Thus, for situations where 4N e v ≈ 0, and n is not extremely large, we get: Substituting this in the earlier formula, we get for the U-shaped distribution: The above formulae show that under the assumption that the ASE are not affected by LD with other SNPs, the ratio between E α RRc,j and E(α RRcs,j ) is the result of shrinkage, i.e., when the amount of information used becomes (very) large (i.e., many individuals), α RRc,j and α RRcs,j will be the same.

Optimal scaling parameter
To determine the optimal scaling parameter, both the fit of the model to the data and the predictive performance of the model were evaluated with different scaling parameters. Adopting the notation by Speed et al. [28], allele counts from matrix X were recoded resulting in matrix V with elements computed as . Hence the SNP-BLUP model became y = 1µ + Vb γ + e. Different scaling parameters were tested by varying γ from − 1 [i.e., scaling by the standard deviation (RRcs)] to 0 [i.e., no scaling (RRc)] in steps of 0.1 for elements of the matrix V.
The model fit was evaluated by comparing the log-likelihoods from the SNP-BLUP models using the complete stature dataset.
The mean squared error of prediction (MSEP) was evaluated by running the SNP-BLUP model with the different scaling parameters (γ) using the stature dataset, which was split in a training (3414 older bulls) and validation set (2140 young bulls). The MSEP for the validation animals was calculated as with wt i being the number of daughters of bull i included in the DYD i , which was used as the actual phenotype for stature for validation bull i, DGV i + µ being the DGV of bull i plus the general mean which together result in the predicted phenotype (i.e., ŷ) for the validation bull i. The model with the lowest MSEP was considered to be the most appropriate for genomic prediction, and thus also for estimating ASE.

Results
Using a reference dataset of 5554 bulls with HD genotypes (627,440 SNPs coded as 0, 1, 2) and stature phenotypes, we compared the ASE from a SNP-BLUP model using two commonly used genotype coding methods. The first method proposed (RRc) centers the 0, 1, 2 coded genotypes; the second method proposed (RRcs) centers and scales the 0, 1, 2 coded genotypes.

Comparison of ASE from unscaled and scaled allele counts
The correlation between DGV using RRc and RRcs was equal to 0.9988 (regression coefficient (regression of RRcs on RRc) = 1.0011; σ 2 DGV : RRc = 1.64, RRcs = 1.68 ), whereas the correlation between ASE using RRc and RRcs was equal to 0.27. The main difference in ASE between the two methods was found for SNPs with a MAF lower than 0.01, as shown in Fig. 2. Figure 3 shows that although the overall correlation is low, there is a relationship between the ASE from RRc and RRcs. Please note the differences between the x-axis and y-axis in the different plots. As the MAF increases, the ASE from RRc and RRcs become more similar, and Fig. 3 shows that ASE from SNPs with a higher MAF are located closer to the diagonal. With lower MAF, the ASE from RRcs become larger than ASE from RRc, and Fig. 3 shows that ASE from SNPs with a lower MAF have much steeper regression coefficients. Although ASE seem to correlate poorly between RRc and RRcs, the correlation between ASE of variants with the exact same MAF was high, i.e., ranging from 0.770 to 0.996, with an average correlation of 0.98 (Fig. 4a). However, on the one hand, both the mean ratio (ASE from RRcs divided by ASE from RRc, averaged per MAF; Fig. 4b) and the regression coefficient (regression of ASE from RRcs on ASE from RRc; Fig. 4c) were much higher than 1 for low MAF variants. On the other hand, the mean ratio and regression coefficients were often lower than 1 for SNPs with a MAF higher than 0.25 (results not shown) because the total variance explained by all SNPs together remains the same. The differences in ASE between RRc and RRcs, and hence the observed mean ratios and regression coefficients, are due to the difference in shrinkage towards the mean. For RRc, the ASE of variants with a low MAF are heavily shrunk back towards the mean; this shrinkage is much less strong in RRcs due to the scaling of the allele counts, which considers the allele frequencies. Figure 4b shows the ratio between ASE based on RRc and RRcs in the real data for stature. The red dashed line in Fig. 4b indicates the ratio based on the derived equation for the uniform distribution of allele frequencies using the heritability, number of individuals and SNP as well as the frequencies of the real data, and this shows that the equation accurately fits the general trend of those mean ratios estimated from the real data. The regression  Fig. 4c also fit well with the ratios from the equation for the uniform distribution of allele frequencies, although the maximum regression coefficient (282) was lower than the maximum from the equation for the uniform distribution (360). Figure 5 shows the ratio as derived for different simulated scenarios for both the uniform and U-shaped distributions. Allele frequencies lower than 0.01 have high ratios for both allele frequency distributions (Fig. 5). Scenarios with a large number of individuals (100 K), a small number of SNPs (50 K) and a high heritability (0.8) have ratios closer to 1 at the extremely low MAF compared with the other scenarios (Fig. 5). The ratio of estimated ASE for the low MAF variants from the U-shaped distribution was substantially lower than that for the uniform distribution of allele frequencies.
For the uniform distribution, SNPs with an intermediate MAF of ~ 0.2 showed ratios around 1, and SNPs with a MAF of 0.5 had ratios below 1 with a minimum of 0.67 (h 2 = 0.2, 1000 individuals, and 15 × 10 6 SNPs), and a maximum of 0.96 (h 2 = 0.8, 100,000 individuals, and 50 K SNPs). For the U-shaped distribution, ratios were around 1 for SNPs with a MAF of 0.065 for scenarios with 1000 individuals, and for SNPs with a MAF of 0.04 for scenarios with 100,000 individuals. At a MAF of 0.5, the U-shaped distribution reached ratios between 0.24 and 0.30 for scenarios with 1000 individuals, and between 0.16 (h 2 = 0.2, 15 × 10 6 SNPs) and 0.91 (h 2 = 0.8, 50 K SNPs) for scenarios with 100,000 individuals.

Optimal scaling parameter
We have shown, empirically and theoretically, that the scaling of allele counts influences the ASE of variants with a low MAF. Here, we attempted to determine which scaling parameter gave the best fit of the model to the data and which had the best predictive performance.
The optimal scaling parameter was determined by comparing the maximum likelihood for SNP-BLUP  (Fig. 6a), and was therefore the optimal scaling parameter for this dataset.
The optimal scaling parameter for predictive performance was determined by comparing the MSEP of the SNP-BLUP models for which scaling parameters varied from − 1 (i.e., RRcs) to 0 (i.e., RRc) in steps of 0.1. The stature dataset was split into a training set of 3414 older bulls and a validation set of 2140 young bulls for which the MSEP was assessed. The model with a scaling parameter of γ = −1 (i.e., RRcs) retained the lowest MSEP (Fig. 6b), and therefore had the best predictive performance for this dataset.

Discussion
The ASE indicates the effect of the variant on the phenotype under investigation. When searching for variants that cause variation in the phenotype, most of those detected are either common variants (MAF > 5%) or variants that explain a large proportion of the variance in the trait, e.g., Mendelian traits [29,30]. The most difficult causal variants to detect are those with a low MAF (0.5% < MAF < 5%) and low to moderate effect, as well as those that are rare (MAF < 0.5%) [29,30]. If the power of the study design is sufficient and the ASE moderate, genomic prediction models using such low MAF variants might explain part of the so-called missing heritability [29]. With whole-genome sequence data, low MAF variants and even rare variants can be Theoretical ratio (ASE RRcs /ASE RRc ) between allele substitution effects (ASE) based on unscaled (RRc) and scaled (RRcs) allele counts. Ratio for different allele frequencies (p) based on derivations for a uniform (left) and U-shaped (right) distribution of allele frequency for marker sets. Scenarios based on the number of individuals, i.e., 1000 (1 K; black) and 100,000 (100 K; gray); and number of SNPs, i.e., 50,000 (50 K; square), 800,000 (800 K; dot), 15,000,000 (15 M; triangle). Here, only allele frequencies lower than 0.01 are shown for heritabilities (h 2 ) of 0.2 (first row) and 0.8 (second row) identified and used in genomic prediction. However, in this study, we demonstrated that different genotype coding methods result in different estimated ASE, especially for low MAF variants. Due to the increased interest in low MAF variants with the advent of using whole-genome sequence data in genomic prediction, it will become increasingly important to determine how genotypes are coded prior to estimating the ASE. Here, we showed that scaling of allele counts (0, 1, 2 genotypes) influences the estimation of ASE of low MAF variants due to more (unscaled) or less (scaled) shrinkage towards the mean of the ASE. In addition, our results show that centered and scaled allele counts (e.g., RRcs) put more weight on low MAF variants, which suggests that scaling is more preferable for long-term genomic selection than no scaling (e.g., RRc) [15,16]. Although these results were expected based on the underlying model assumptions, our aim was to visualize the differences in ASE between scaling and no scaling, to provide a theoretical framework, and create awareness of the implications for future studies, in animal and plant breeding, and even in genomic prediction of disease risk in humans. Our empirical results closely resemble the expectation based on the presented theoretical framework. In practice, accurately estimating the ASE of low MAF variants remains difficult and requires powerful study designs.

Scaling
Although in this study we presented SNP-BLUP (or ridge regression BLUP) models, equivalent models that do not explicitly estimate ASE, i.e., GREML or GBLUP, will similarly result in different estimates of the ASE when the allele counts used in the construction of a GRM are scaled or not. The same applies to other regression-based models that include parameters that accommodate for differential shrinkage across loci, e.g., Bayesian variable selection models such as BayesC and BayesR. For those models, the difference in ASE when scaling is used or not may be smaller than for GBLUP and SNP-BLUP, because these differential shrinkage models have additional parameters to modify shrinkage per locus.
Optimal scaling depends on the true relationship between the ASE size and MAF in the data analyzed [31]. Causal alleles with a larger effect tend to have a lower MAF [32], hence for models with scaled allele counts that give low MAF variants a larger ASE (i.e., RRcs) may represent more closely the truth. Speed et al. [31] showed that scaling by the square root of the variance (i.e., RRcs; γ = −1) gave stable results in estimating the heritability (based on SNPs), regardless of the simulated genetic structure of the trait. In our study, we showed that a scaling parameter of γ = −0.8, gave the best fit of the model to the data, whereas a scaling parameter of γ = −1 (i.e., RRcs) gave the best predictive performance in terms of MSEP (Fig. 6), which suggests that RRcs gave better ASE estimates than RRc for our dataset. In a recent paper, Speed et al. [28] re-evaluated the scaling parameters for scenarios that include low MAF variants. They recommended a scaling parameter of γ = −0.25 when low MAF variants are included, while the estimated heritabilities (based on SNPs) were hardly influenced by the scaling parameter when analyzing only common SNPs. In their analysis, they used a model with two GRM, one for SNPs with a MAF lower than 0.1 and one for SNPs with a MAF higher than 0.1, versus a model with one GRM containing all SNPs. The scaling parameter that fitted equally well, based on the REML likelihood in both models, was recommended. In our study, the optimal scaling parameter for this dataset was tested with the aim of maximizing model fit and minimizing the MSEP for the validation data. In our case, differences in log-likelihood and MSEP among the scaling parameters applied were rather small. The optimal scaling parameter may also depend on the characteristics of the data, such as the MAF spectrum. Here, the genotypes were from the BovineHD SNP chip with 630,000 SNPs, which contained a fair number of SNPs with a MAF lower than 1% (5% of the SNPs) and even rare SNPs with a MAF lower than 0.5% (3.7% of the SNPs) (Fig. 1); future studies focusing on whole-genome sequence variants might contain even more low MAF variants, which might result in a different optimal scaling parameter.

Allele frequency distribution
The derived equations for the ratios of ASE from a uniform and U-shaped allele frequency distribution show the implications of (more/less) shrinkage towards the mean for different scenarios. The heritability, number of individuals, and the number of variants all have an impact on the ratio. For both allele frequency distributions, with a small number of individuals (i.e., 1000) there seems to be little impact of heritability and number of variants on the difference in shrinkage between scaling and no scaling. However, when the number of individuals is large (i.e., 100,000), it is clear that heritability and number of variants have an impact.
For variants with a low MAF, the ASE obtained with RRcs and RRc are more similar when the number of variants is small and especially when the number of individuals is large and the heritability is high. These results can be interpreted as follows. Differences in genotype coding effectively reflect a difference in prior belief of the impact of rare versus more common variants. RRcs assumes that rare variants are more important than common ones, and therefore, in many cases, the shrinkage of effects of low MAF variants is much lower for RRcs than for RRc. The data can override this prior information, if the dataset is sufficiently large. All possible aspects of the dataset that increase power, i.e., more animals, higher heritability and fewer variants, all reduced the ratios between both ASE, showing that shrinkage due to the prior assumptions of the models was reduced. In other words, the prior assumptions made when choosing the genotype coding become less important when the power of the data increases. However, when increasing power by increasing the population size, even more rare variants will appear with an even lower MAF than what was considered as rare beforehand. This is because rare variants were defined in terms of MAF instead of number of alleles observed. For those 'newly' discovered rare variants, the ratio between RRcs and RRc will be high again and the power will be too low to estimate their effects very accurately, hence power should be analyzed to determine the MAF at which the ASE can be estimated accurately given the data.
Remarkably, in all scenarios, the U-shaped allele frequency distribution showed lower ratios for low MAF variants compared to the uniform allele frequency distribution. This is probably because a U-shaped distribution has more low MAF variants, and also since the total variance remains the same, there is less room for the model to allocate an extremely high ASE for all those low MAF alleles.
The distribution of the allele frequencies for the real data with HD genotypes was not uniform, but more like the U-shape distribution (Fig. 6). This more U-shaped distribution for the HD genotypes is not only true for this dataset. This is due to the chip design, which allowed for more low MAF variants to be included compared to the BovineSNP50 chip. However, the ratios in ASE between RRcs and RRc for real data on stature in cattle were better aligned with the derived ratios for a uniform distribution. The most logical explanation is that the U-shaped distribution that we assumed resembles that of whole-genome sequence data, where the number of low MAF variants is relatively much larger than for the high-density chip data that we used. Effectively, the allele frequency distribution in our chip data may be closer to the uniform distribution than the considered U-shaped distribution. Theoretically, it should be possible to use the formula that was used for the U-shaped distribution to represent any other possible allele frequency distribution, by tuning its parameters. This would provide a more general applicable theoretical framework to predict the impact of different genotype coding on the shrinkage of estimated effects, if those parameters can be computed or derived empirically.
To derive the ratio of ASE from RRcs and RRc for both allele frequency distributions, we made several assumptions. One of these assumptions was that there are no covariances between the estimated ASE of different loci, however, covariances may, for instance, arise due to LD between the loci. Especially for the U-shape distribution, which represents whole-genome sequence data, high LD between variants is expected. In that case, the assumption "ignoring off-diagonal elements in Z ′ R −1 Z" (which is the same as ignoring LD between loci) is likely to be violated when allele coding is not independent of the MAF (e.g., minor allele counted). Nevertheless, the theoretical ratios based on the uniform distribution fitted nicely with the actual ratios from the real stature data example and did not seem to be hampered by the presence of LD (Fig. 4b).

Conclusions
The results of our study show that DGV are not influenced by scaling of centered allele counts, while the estimates of ASE are. Large differences in ASE between scaled and unscaled allele counts were observed for variants with a low MAF, mainly due to less shrinkage towards the mean for scaled allele counts. We derived a theoretical framework that shows that the difference in ASE due to (more/ less) shrinkage is heavily influenced by the power of the data. Increasing the power, by increasing the number of animals, increasing the heritability or decreasing the number of variants, resulted in smaller differences of the ASE between scaled and unscaled allele counts.

GREML
In this paper, we describe SNP-BLUP models, however GREML/GBLUP models are equivalent and were actually applied to the data. This appendix contains the methods for genomic evaluation followed by the backsolving method to obtain allele substitution effects using GREML with a centered GRM according to Method (1) in VanRaden [1] (VR1; equivalent to RRc) and a centered and scaled GRM according to Method (2) in Van-Raden [1] (VR2; equivalent to RRcs). We also provide the method to construct GRM with varying scaling factors for GREML (or GBLUP) models.

Genomic evaluation
To estimate DGV, GREML was performed using two GRM constructed in different ways to show the impact of scaling on ASE. The following mixed model equation was solved using GREML in ASReml software [2]: where y i is the phenotypic record, here stature DYD, of individual i; µ is the mean; u i is the DGV of individual i , with u ∼ N 0, Gσ 2 a , where G is the genomic relationship matrix, and σ 2 a the total additive genetic variance; and e i is the residual of individual i, with e ∼ N 0, Dσ 2 e , where D is a diagonal matrix with elements computed as 1 wt i , with wt i being the number of daughters of individual i on which the DYD of i was based as weight, and σ 2 e is the residual variance.
The different GRM used were: VR1 [Method (1) according to VanRaden [1]; equivalent to the SNP-BLUP model with centered allele counts (RRc)]: where Z contains the centered allele counts of the SNP, with elements computed as x ij − 2p j , where x ij is an element of the X matrix containing the SNP genotype for individual i at locus j coded as 0, 1, or 2; and p j is the frequency of the allele for which the homozygous genotype is coded as 2 at locus j. Note that 2p j is the mean allele count of the SNP for the centering.
VR2 [Method (2) according to VanRaden [1]; equivalent to the SNP-BLUP model with centered and scaled allele counts (RRcs)]: where N is the number of SNPs; and W contains the centered and scaled allele counts of the SNPs for all individuals at all loci, with elements computed as . Note that 2p j 1 − p j is the standard deviation used for the scaling.

Back-solving
Considering VR1, the ASE (α VR1 ) were back-solved as coefficients of the regression of the DGV (û VR1 ) on the allele counts. Considering that û VR1 = Zα VR1 yields the following expression [3]: where S = j 2p j 1 − p j . For VR2, the following applies: û VR2 = Wb VR2 , where b VR2 are regression coefficients, that can be obtained with: Note that the b VR2 values are not allele substitution effects, i.e., they are not equal to half the difference of the value between the two homozygotes [4], because W ′ contains scaled allele counts. The allele substitution effects for VR2 (α VR2 ) can be obtained as: . Thus, following the above definition, DGV based on VR2 can be computed as û VR2 = Zα VR2 . The back-solving procedure was verified, within each method, by recalculating the DGV as û = Zα, i.e., the back-solved ASE were multiplied with the 0, 1, 2 coded genotypes (i.e., allele counts) and the appropriate general mean was added, and then compared to the original DGV. This gave for both methods correlations and regression coefficients of exactly 1, which demonstrates that the back-solving procedures were correct.

Optimal scaling parameter
Adopting the notation of Speed et al. [5], GRM were constructed using matrix V with elements computed as v ij = x ij − 2p j × 2p j 1 − p j γ /2 , and G = VV ′ × ( j (2p j 1 − p j )) (−1−γ ) by varying γ from − 1 (i.e., VR2) to 0 (i.e., VR1) in steps of 0.1. The factor used to compute G, i.e., ( j (2p j 1 − p j )) (−1−γ ) , is derived as follows. With proper scaling of G, and assuming that allele frequencies used are computed from the current population, the average inbreeding is expected to be zero, hence E(trace(G)) = n, where n is the number of individuals. Consider that the variance of column j in V is var V .j = var X .j × 2p j 1 − p j γ = 2p j 1 − p j × 2p j 1 − p j γ = 2p j 1 − p j (1+γ ) .
Consequently, E trace VV ′ = E i j v 2 ij = E n j var V .j = n( j (2p j 1 − p j )) (1+γ ) . To get E(trace(G)) = n, then requires to multiply VV ′ by ( j (2p j 1 − p j )) (−1−γ ) . For all scaling parameters, the diagonal elements of the GRM were on average 1, resulting in an average inbreeding coefficient of 0.
Derivation of C for the U-shape distribution when 4N e v ≈ 0 For the U-shaped distribution, the probability density function, is φ(p) ≈ Cp 4N e v−1 (1 − p) 4N e u−1 e 4N e sp(1−p) [6], where v and u are assumed to be equal and represent forward and backward mutation rates, N e is the effective population size, s is the selection coefficient and C is a constant that scales the sum of all probabilities to 1. Assuming s = 0 for simplification, the term e 4N e sp(1−p) drops from the equation, such that φ(p) ≈ Cp 4N e v−1 (1 − p) 4N e v−1 . Let us consider here C −1 instead of C, for ease of notation. α VR2 = Ub VR2 , Furthermore, because the value of 4N e v used here is rather small (v = 1 × 10 −8 , N e = 65; 4N e v = 2.6 × 10 −6 ), we can assume that 4N e v − 1 ≈ −1. Thus, we obtain that: Now, we rewrite the above, such that the summation is expressed in terms of the "Harmonic series" [7], for which it is known that: where γ is the Euler-Mascheroni constant (being close to 0.57721) [8].
With p = i 2n , we get: We then note that this series is symmetric, e.g., both i = 1 and i = 2n − 1 result in 1 1− 1 2n : For this series, the ith element is equal to: Such that: The symmetry of the series implies that it is the sum of the harmonic series 1 i and its reverse 1 2n−i , which are the same, such that: Thus: