Vector space algebra for scaling and centering relationship matrices under non-Hardy–Weinberg equilibrium conditions

Background: Scales are linear combinations of variables with coefficients that add up to zero, and they have a meaning similar to that of a "contrast" in the analysis of variance. Scales are needed to incorporate genomic information into relationship matrices for genomic selection. Statistical and biological parameterizations using scales under different assumptions have been proposed to construct alternative genomic relationship matrices. Except for the natural and orthogonal interactions approach (NOIA) method, current methods to construct relationship matrices assume Hardy–Weinberg equilibrium (HWE). The objective of this paper is to apply vector space algebra to center and scale relationship matrices under non-HWE conditions, including orthogonalization by the Gram-Schmidt process.

Theory and methods: Vector space algebra provides a measure of the degree of orthogonality between the additive and dominance vectors of scales for each marker. Three alternative methods based on the Gram-Schmidt process (GSP-A, GSP-D, and GSP-N) are proposed to center and scale additive and dominance relationship matrices. GSP-A removes additive-dominance co-variation by fitting first the additive and then the dominance scales; GSP-D fits the scales in the opposite order. We show that GSP-A is algebraically the same as the NOIA model. GSP-N orthonormalizes the additive and dominance scales that result from GSP-A. An example with genotype information on 32,645 single nucleotide polymorphisms from 903 Large-White × Landrace crossbred pigs is used to construct existing and newly proposed additive and dominance relationship matrices.

Results: An exact test for departures from HWE showed that a majority of loci were not in HWE in the crossbred pigs. All methods, except the one that assumes HWE, succeeded in attaining an average of diagonal elements equal to one and an average of off-diagonal elements equal to zero. Variance component estimation for a recorded quantitative phenotype showed that the orthogonal methods (NOIA, GSP-A, GSP-N) adjust for additive-dominance co-variation when estimating the additive genetic variance, whereas GSP-D does so when estimating the dominance component. However, different methods to orthogonalize relationship matrices resulted in different proportions of additive and dominance components of variance.

Conclusions: Vector space methodology can be applied to measure orthogonality between vectors of additive and dominance scales and to construct alternative orthogonal models such as GSP-A and GSP-D, and an orthonormal model such as GSP-N. Under non-HWE conditions, GSP-A is algebraically the same as the previously developed NOIA model.

Background
Currently, large-scale single nucleotide polymorphism (SNP) genotyping of animals enables genomic prediction [1] with greater accuracy and response to selection than pedigree-based prediction of estimated breeding values. A commonly used technique is genomic best linear unbiased prediction (GBLUP) of breeding values, which uses a marker-based additive genomic relationship matrix (G matrix) instead of a relationship matrix based on pedigree [2]. There are two alternative parameterizations for constructing a genomic relationship matrix: statistical and biological. The statistical or classical parameterization describes breeding values in terms of the average substitution effect of a locus at the population level [3]. The classical parameterization is widely used in breeding value estimation for farm animals because it provides information on the impact of a parent on the expected performance of its offspring. The alternative is the biological parameterization, in which the effects of a locus are expressed as genotypic values; it is more intuitive and practical when analyzing variability in natural populations. The distinction between these two alternatives has been acknowledged previously [4] and implemented in the construction of the G matrix for the statistical [2,5] and biological [6] parameterizations. Most current applications assume that populations are in Hardy-Weinberg equilibrium (HWE). The conditions for HWE are a very large breeding population, random mating, no change in allele frequencies due to mutation, and absence of migration and selection. HWE is usually assumed in the construction of the G matrix for simplicity and because departures from it may not be important, in particular if dominance effects are not considered. However, some populations of commercial animals (e.g. pigs and poultry) result from crosses between distant populations, for which HWE does not apply.
Recently, Vitezica et al. [7] proposed the construction of genomic relationship matrices with orthogonality between additive and dominance scales based on the NOIA (natural and orthogonal interactions approach) method of Alvarez-Castro and Carlborg [8]. Orthogonality means that additive and dominance effects are uncorrelated. The NOIA method does not require the assumption of HWE. The numerators of the G matrices of VanRaden and NOIA are equivalent, but their denominators (scaling) differ [9]. Varona et al. [10] reviewed methodology for the construction of relationship matrices that incorporate nonadditive effects.
Vector spaces are mathematical objects that abstractly capture the geometry and algebra of linear equations. Some techniques of vector space algebra, such as the Gram-Schmidt process, have not yet been applied to the construction of relationship matrices. The objective of this paper is to present and characterize methods to construct orthogonal additive (G) and dominance (D) genomic relationship matrices using vector space algebra (the Gram-Schmidt process) without requiring HWE. The newly proposed methods to construct G and D matrices are compared with existing methods using a dataset of Large-White × Landrace crossbred pigs.

Scaling and centering the genomic relationship matrix
In the statistical parameterization, breeding values are modeled using the average allele substitution effects at genotyped loci, rather than genotypic values. This parameterization requires the assumption of HWE. As early as 1941, Fisher pointed out that computation of breeding values must assume random mating [11]. Later, along the same lines, Falconer stated: "The concept of breeding value is shown to have no useful meaning when mating is not random" [12]. Therefore, the statistical parameterization will not be considered further, and only the biological parameterization is addressed in the following sections.
The statistical model for genomic prediction with additive and dominance effects in the biological parameterization [6,13] is:

$$y_i = \mu + \sum_{j=1}^{m} z_{ij}a_j + \sum_{j=1}^{m} s_{ij}d_j + e_i, \qquad (1)$$

where $y_i$ is the phenotypic record of the $i$-th individual; $\mu$ is the population mean; $m$ is the number of markers; $a_j$ and $d_j$ are the additive and dominance effects of the $j$-th marker; $z_{ij}$ is 1, 0, and −1, and $s_{ij}$ is 0, 1, and 0, for the $i$-th individual with genotype AA, Aa and aa at the $j$-th marker, respectively; and $e_i$ is the error. In matrix notation, Model (1) is:

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}_a\mathbf{a} + \mathbf{S}_d\mathbf{d} + \mathbf{e},$$

where $\mathbf{y}$ is a vector of phenotypes; $\mu$ is the mean; $\mathbf{Z}_a$ and $\mathbf{S}_d$ are matrices with $n$ rows (number of individuals) and $m$ columns (number of markers), with elements $z_{ij}$ and $s_{ij}$ as defined above, relating each individual's genotypes to the additive and dominance effects; $\mathbf{a}$ and $\mathbf{d}$ are vectors of additive and dominance effects, respectively; and $\mathbf{e}$ is a vector of errors. The additive and dominance covariance matrices associated with the random additive and dominance effects in this model are:

$$\mathbf{G}\sigma^2_a = \frac{\mathbf{H}_a\mathbf{H}_a'}{SC_a}\sigma^2_a, \qquad \mathbf{D}\sigma^2_d = \frac{\mathbf{H}_d\mathbf{H}_d'}{SC_d}\sigma^2_d,$$

where $\mathbf{G}$ and $\mathbf{D}$ are the additive and dominance genomic relationship matrices; $\sigma^2_a$ and $\sigma^2_d$ are the additive and dominance variances, respectively; and $\mathbf{H}_a$ and $\mathbf{H}_d$ are matrices with $n$ rows and $m$ columns containing the additive and dominance scales. These scales are used to center all markers so that the mean contributed by each marker is zero. $SC_a$ and $SC_d$ are used to scale the additive and dominance relationship matrices. Different assumptions and methods to center and scale relationships lead to alternative G and D matrices. First, we present several existing models to construct additive and dominance relationship matrices. Second, we propose methods to test for lack of orthogonality between the additive and dominance scales. Third, we show how a general method from vector space algebra, the Gram-Schmidt process, can generate alternative orthogonal models.
Last, we present the results obtained by applying these methods to a dataset consisting of crossbred animals.
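To make the coding of Model (1) concrete, here is a minimal sketch that builds $\mathbf{Z}_a$ and $\mathbf{S}_d$ from a genotype array. It assumes, though the text does not specify this, that genotypes are stored as the number of copies of allele A (2 = AA, 1 = Aa, 0 = aa). The function name `incidence_matrices` is hypothetical.

```python
import numpy as np


def incidence_matrices(genotypes):
    """Return (Z_a, S_d) for Model (1) from an n x m genotype array.

    Assumed (illustrative) coding: 2 = AA, 1 = Aa, 0 = aa.
    """
    g = np.asarray(genotypes)
    if not np.isin(g, (0, 1, 2)).all():
        raise ValueError("genotype codes must be 0, 1 or 2")
    z = g - 1                   # z_ij: AA -> 1, Aa -> 0, aa -> -1
    s = (g == 1).astype(int)    # s_ij: AA -> 0, Aa -> 1, aa -> 0
    return z, s


# Three individuals (rows) genotyped at three markers (columns).
geno = np.array([[2, 1, 0],
                 [1, 1, 2],
                 [0, 2, 1]])
Z_a, S_d = incidence_matrices(geno)
print(Z_a)
print(S_d)
```

The centered and scaled $\mathbf{H}_a$ and $\mathbf{H}_d$ of the following sections replace these raw codes by marker-specific scales.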

Hardy-Weinberg equilibrium parameterization
Assuming Model (1), the population is segregating for the three genotypes $AA_{(j)}$, $Aa_{(j)}$ and $aa_{(j)}$ at the $j$-th locus, with alleles $A_{(j)}$ and $a_{(j)}$ at frequencies $p_{(j)}$ and $q_{(j)}$. Under an additive-dominance model, the genotypic values of the three genotypes are $a_{(j)}$, $d_{(j)}$ and $-a_{(j)}$, respectively. VanRaden [2] proposed centering additive marker effects by subtracting the mean additive effect, resulting in the additive genetic values $(2-2p_{(j)})a_{(j)}$, $(1-2p_{(j)})a_{(j)}$ and $-2p_{(j)}a_{(j)}$ for individuals with genotypes $AA_{(j)}$, $Aa_{(j)}$ and $aa_{(j)}$, respectively. Therefore, the scales in the column of $\mathbf{H}_a$ corresponding to the $j$-th marker are:

$$h_{a,ij} = \begin{cases} 2-2p_{(j)} & \text{if } AA \\ 1-2p_{(j)} & \text{if } Aa \\ -2p_{(j)} & \text{if } aa \end{cases} \qquad (2)$$

Assuming HWE, Su et al. [6] and Vitezica et al. [5] proposed centering dominance marker effects by subtracting the mean dominance effect, resulting in $-2p_{(j)}q_{(j)}d_{(j)}$, $(1-2p_{(j)}q_{(j)})d_{(j)}$ and $-2p_{(j)}q_{(j)}d_{(j)}$ for individuals with genotypes $AA_{(j)}$, $Aa_{(j)}$ and $aa_{(j)}$, respectively. Therefore, the scales in the column of $\mathbf{H}_d$ corresponding to the $j$-th marker are:

$$h_{d,ij} = \begin{cases} -2p_{(j)}q_{(j)} & \text{if } AA \\ 1-2p_{(j)}q_{(j)} & \text{if } Aa \\ -2p_{(j)}q_{(j)} & \text{if } aa \end{cases} \qquad (3)$$

Assuming HWE, VanRaden [2] scaled the additive relationship matrix constructed with the scales of Eq. (2) by $\sum_{j=1}^{m} 2p_{(j)}q_{(j)}$, while Su et al. [6] and Vitezica et al. [5], also under the assumption of HWE, proposed scaling the dominance relationship matrix constructed with the scales of Eq. (3) by $\sum_{j=1}^{m} 2p_{(j)}q_{(j)}(1-2p_{(j)}q_{(j)})$. This approach has two problems. First, it yields functional rather than statistical values associated with locus genotypes, which can be used to derive genotypic values of the individual but not breeding values that predict the performance of progeny, as discussed before. Second, the dominance incidence matrix is not necessarily centered when the population is not in HWE. For example, the mean of the centered dominance matrix for the $j$-th marker is

$$p_{AA(j)}\left(-2p_{(j)}q_{(j)}\right) + p_{Aa(j)}\left(1-2p_{(j)}q_{(j)}\right) + p_{aa(j)}\left(-2p_{(j)}q_{(j)}\right),$$

where $p_{AA(j)}$, $p_{Aa(j)}$ and $p_{aa(j)}$ are the frequencies of the AA, Aa and aa genotypes at the $j$-th marker, respectively. Because the genotype frequencies sum to one, this reduces to $p_{Aa(j)} - 2p_{(j)}q_{(j)}$, which is not equal to 0 if the population is not in HWE. For example, an F1 cross may have an excess of heterozygotes compared to HWE frequencies.
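The centering failure described above can be checked numerically. The following sketch evaluates the HWE scales of Eqs. (2) and (3) for one marker and computes their mean under the observed genotype frequencies. The helper names and the example frequencies are illustrative assumptions, not data from the paper.

```python
def hw_scales(p):
    """Additive (Eq. 2) and dominance (Eq. 3) scales assuming HWE.

    p is the frequency of allele A at the marker.
    """
    q = 1.0 - p
    h_a = {"AA": 2 - 2 * p, "Aa": 1 - 2 * p, "aa": -2 * p}
    h_d = {"AA": -2 * p * q, "Aa": 1 - 2 * p * q, "aa": -2 * p * q}
    return h_a, h_d


def scale_mean(scale, genotype_freqs):
    """Mean of a scale under the observed genotype frequencies."""
    return sum(genotype_freqs[g] * scale[g] for g in scale)


def allele_freq(genotype_freqs):
    """p = p_AA + p_Aa / 2."""
    return genotype_freqs["AA"] + 0.5 * genotype_freqs["Aa"]


# Marker in HWE with p = 0.3: both scales are centered.
hwe = {"AA": 0.09, "Aa": 0.42, "aa": 0.49}
h_a, h_d = hw_scales(allele_freq(hwe))
print(scale_mean(h_a, hwe), scale_mean(h_d, hwe))   # both ~0

# F1-like marker (all heterozygotes, p = 0.5): the additive scale is still
# centered, but the dominance mean equals p_Aa - 2pq = 1 - 0.5 = 0.5.
f1 = {"AA": 0.0, "Aa": 1.0, "aa": 0.0}
h_a, h_d = hw_scales(allele_freq(f1))
print(scale_mean(h_a, f1), scale_mean(h_d, f1))
```

The additive scale stays centered because its mean, $2p_{AA(j)} + p_{Aa(j)} - 2p_{(j)}$, is zero by the definition of $p_{(j)}$ whatever the genotype frequencies.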

Non-Hardy-Weinberg equilibrium parameterization
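Here, we propose a non-HWE parameterization using functional marker effects based on the mean and the variance of additive and dominance genotypic values. Multiplying genotypic frequencies by their corresponding values and summing, the additive mean for the $j$-th marker is:

$$\mu_{a(j)} = p_{AA(j)}a_{(j)} + p_{Aa(j)}\cdot 0 + p_{aa(j)}\left(-a_{(j)}\right) = \left(p_{AA(j)}-p_{aa(j)}\right)a_{(j)}.$$

Similarly, the variance of the additive genotypic effects is obtained by multiplying genotypic frequencies by the squares of their corresponding values, summing, and subtracting the square of the mean:

$$\sigma^2_{a(j)} = p_{AA(j)}a^2_{(j)} + p_{aa(j)}a^2_{(j)} - \mu^2_{a(j)}.$$

After some algebra, the additive variance becomes $\sigma^2_{a(j)} = \left[p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2\right]a^2_{(j)}$. Per unit of squared additive effect, the additive variance contributed by the $m$ markers is then:

$$\sum_{j=1}^{m}\left[p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2\right]. \qquad (4)$$

When the additive mean is subtracted from the genotypic values, the centered genotypic values for individuals with genotypes AA, Aa and aa at the $j$-th marker in the absence of HWE are $\left[1-\left(p_{AA(j)}-p_{aa(j)}\right)\right]a_{(j)}$, $-\left(p_{AA(j)}-p_{aa(j)}\right)a_{(j)}$, and $\left[-1-\left(p_{AA(j)}-p_{aa(j)}\right)\right]a_{(j)}$. Therefore, in the absence of HWE, the scales in the column of $\mathbf{H}_a$ corresponding to the $j$-th marker are:

$$h_{a,ij} = \begin{cases} 1-\left(p_{AA(j)}-p_{aa(j)}\right) & \text{if } AA \\ -\left(p_{AA(j)}-p_{aa(j)}\right) & \text{if } Aa \\ -1-\left(p_{AA(j)}-p_{aa(j)}\right) & \text{if } aa \end{cases} \qquad (5)$$

Note that these centered scales reduce to those of VanRaden [2] because $p_{(j)} = p_{AA(j)} + \frac{1}{2}p_{Aa(j)}$, so that $p_{AA(j)}-p_{aa(j)} = 2p_{(j)}-1$. However, the scaling of the additive relationships can be based on Eq. (4), which differs from the classical scaling of the G matrix in, e.g., VanRaden [2], namely $\sum_{j=1}^{m} 2p_{(j)}q_{(j)}$; the two coincide under HWE. The dominance genotypic effects for the $j$-th marker without the assumption of HWE, as first given by Alvarez-Castro and Carlborg [8], are $-p_{Aa(j)}d_{(j)}$, $\left(1-p_{Aa(j)}\right)d_{(j)}$, and $-p_{Aa(j)}d_{(j)}$ for individuals with genotypes AA, Aa and aa, respectively. In the absence of HWE, the scales in the column of $\mathbf{H}_d$ corresponding to the $j$-th marker are:

$$h_{d,ij} = \begin{cases} -p_{Aa(j)} & \text{if } AA \\ 1-p_{Aa(j)} & \text{if } Aa \\ -p_{Aa(j)} & \text{if } aa \end{cases} \qquad (6)$$

The variance contributed by the $j$-th marker is $\left(p_{Aa(j)}-p^2_{Aa(j)}\right)d^2_{(j)}$. Under HWE, where $p_{Aa(j)} = 2p_{(j)}q_{(j)}$, the variance of the dominance effects reduces to $2p_{(j)}q_{(j)}\left(1-2p_{(j)}q_{(j)}\right)d^2_{(j)}$, which is as given by Su et al. [6]. The resulting additive and dominance scales might be correlated (non-orthogonal), which makes estimation and interpretation of the results more difficult.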
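As a minimal numerical sketch of the non-HWE scales, the code below evaluates Eqs. (5) and (6) for one marker. It checks that both scales are centered and that their variances match the per-marker terms used for scaling. The marker frequencies are hypothetical, chosen to show an excess of heterozygotes.

```python
def nohw_scales(f):
    """Non-HWE additive (Eq. 5) and dominance (Eq. 6) scales for one marker.

    f maps genotype -> frequency: {"AA": p_AA, "Aa": p_Aa, "aa": p_aa}.
    """
    delta = f["AA"] - f["aa"]          # additive mean, in units of a_(j)
    u = {"AA": 1 - delta, "Aa": -delta, "aa": -1 - delta}
    v = {"AA": -f["Aa"], "Aa": 1 - f["Aa"], "aa": -f["Aa"]}
    return u, v


def additive_term(f):
    """Per-marker term of Eq. (4): p_AA + p_aa - (p_AA - p_aa)^2."""
    return f["AA"] + f["aa"] - (f["AA"] - f["aa"]) ** 2


def dominance_term(f):
    """Per-marker dominance variance per unit d^2: p_Aa - p_Aa^2."""
    return f["Aa"] - f["Aa"] ** 2


def moment(scale, f, power=1):
    """Sum over genotypes of frequency * scale**power."""
    return sum(f[g] * scale[g] ** power for g in f)


# Hypothetical marker with an excess of heterozygotes (not in HWE).
f = {"AA": 0.3, "Aa": 0.6, "aa": 0.1}
u, v = nohw_scales(f)
print(moment(u, f), moment(v, f))            # both ~0: centered
print(moment(u, f, 2), additive_term(f))     # both ~0.36
print(moment(v, f, 2), dominance_term(f))    # both ~0.24

# Scaling constants for G and D are sums of the per-marker terms.
markers = [f, {"AA": 0.25, "Aa": 0.5, "aa": 0.25}]
sc_a = sum(additive_term(m) for m in markers)
sc_d = sum(dominance_term(m) for m in markers)
```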

Orthogonal parameterization based on the NOIA method
Based on Cockerham [14], an orthogonal decomposition of additive and dominance variance components was proposed by Alvarez-Castro and Carlborg [8] and termed the natural and orthogonal interactions approach (NOIA). The requirements for the orthogonal partition of variance are:

$$\sum_{g} p_{g(j)}\,u^{NOIA}_{g(j)} = 0, \qquad (7)$$

$$\sum_{g} p_{g(j)}\,v^{NOIA}_{g(j)} = 0, \qquad (8)$$

$$\sum_{g} p_{g(j)}\,u^{NOIA}_{g(j)}\,v^{NOIA}_{g(j)} = 0, \qquad (9)$$

where $p_{g(j)}$ is the frequency of genotype $g$, and $u^{NOIA}_{g(j)}$ and $v^{NOIA}_{g(j)}$ are the additive and dominance scales for the $g$-th genotype ($g$ = AA, Aa, aa) at the $j$-th marker. Requirements (7) and (8) force the additive and dominance scales to be deviations around their means. Requirement (9) forces the additive and dominance scales to be uncorrelated (orthogonal). Vitezica et al. [7] implemented the orthogonalization of NOIA to construct additive and dominance relationships. Thus, the orthogonal additive scales proposed by Alvarez-Castro and Carlborg [8] and Vitezica et al. [7] for individuals with genotypes AA, Aa, and aa are:

$$u^{NOIA}_{AA(j)} = -\left(-p_{Aa(j)}-2p_{aa(j)}\right), \quad u^{NOIA}_{Aa(j)} = -\left(1-p_{Aa(j)}-2p_{aa(j)}\right), \quad u^{NOIA}_{aa(j)} = -\left(2-p_{Aa(j)}-2p_{aa(j)}\right). \qquad (10)$$

After some algebra (using $p_{AA(j)} = 1-p_{Aa(j)}-p_{aa(j)}$), the additive scales in the absence of HWE from Eq. (5) become the same as those of the NOIA method. They also reduce to the VanRaden scales, as shown by Joshi et al. [9].
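The dominance scales in the NOIA method are:

$$v^{NOIA}_{AA(j)} = -\frac{2p_{Aa(j)}p_{aa(j)}}{p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2}, \quad v^{NOIA}_{Aa(j)} = \frac{4p_{AA(j)}p_{aa(j)}}{p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2}, \quad v^{NOIA}_{aa(j)} = -\frac{2p_{AA(j)}p_{Aa(j)}}{p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2}. \qquad (11)$$

Equations (10) and (11) satisfy the orthogonality conditions of Eqs. (7), (8), and (9). Vitezica et al. [7] implemented orthogonalization of the NOIA approach by scaling the G and D matrices by $\mathrm{tr}(\mathbf{H}_a\mathbf{H}_a')/n$ and $\mathrm{tr}(\mathbf{H}_d\mathbf{H}_d')/n$, respectively, where $\mathbf{H}_a$ and $\mathbf{H}_d$ contain the scales for individuals according to their genotypes from Eqs. (10) and (11), respectively. Therefore, after scaling, the genomic and dominance relationship matrices become:

$$\mathbf{G} = \frac{\mathbf{H}_a\mathbf{H}_a'}{\mathrm{tr}(\mathbf{H}_a\mathbf{H}_a')/n}, \qquad (12)$$

$$\mathbf{D} = \frac{\mathbf{H}_d\mathbf{H}_d'}{\mathrm{tr}(\mathbf{H}_d\mathbf{H}_d')/n}. \qquad (13)$$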
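The NOIA construction can be sketched end to end with NumPy. The code builds $\mathbf{H}_a$ and $\mathbf{H}_d$ from Eqs. (10) and (11), checks requirements (7) to (9) column by column, and applies the trace scaling of Eqs. (12) and (13). The genotype coding (2 = AA, 1 = Aa, 0 = aa), the simulated genotypes, and the rule for dropping markers where Eq. (11) is undefined are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def noia_scale_matrices(geno):
    """NOIA additive (Eq. 10) and dominance (Eq. 11) scale matrices H_a, H_d.

    Illustrative assumptions: geno is an n x m array coded 2 = AA, 1 = Aa,
    0 = aa; genotype frequencies come from the same sample; markers with
    p_AA + p_aa - (p_AA - p_aa)^2 == 0 (monomorphic or all heterozygous)
    are dropped because Eq. (11) is undefined for them.
    """
    g = np.asarray(geno)
    pAA = (g == 2).mean(axis=0)
    pAa = (g == 1).mean(axis=0)
    paa = (g == 0).mean(axis=0)
    denom = pAA + paa - (pAA - paa) ** 2
    keep = denom > 1e-12
    g, pAA, pAa, paa, denom = g[:, keep], pAA[keep], pAa[keep], paa[keep], denom[keep]

    k = pAa + 2 * paa                                   # u_AA from Eq. (10)
    H_a = np.where(g == 2, k, np.where(g == 1, k - 1, k - 2))

    v_AA = -2 * pAa * paa / denom                       # Eq. (11)
    v_Aa = 4 * pAA * paa / denom
    v_aa = -2 * pAA * pAa / denom
    H_d = np.where(g == 2, v_AA, np.where(g == 1, v_Aa, v_aa))
    return H_a, H_d


def trace_scaled(H):
    """Relationship matrix HH' / (tr(HH') / n), as in Eqs. (12) and (13)."""
    HHt = H @ H.T
    return HHt / (np.trace(HHt) / H.shape[0])


rng = np.random.default_rng(2021)
geno = rng.integers(0, 3, size=(40, 500))   # simulated genotypes, not real data
H_a, H_d = noia_scale_matrices(geno)
G, D = trace_scaled(H_a), trace_scaled(H_d)

# Requirements (7), (8) and (9) hold column by column, up to rounding error.
print(np.abs(H_a.sum(axis=0)).max(),
      np.abs(H_d.sum(axis=0)).max(),
      np.abs((H_a * H_d).sum(axis=0)).max())
print(np.diag(G).mean(), np.diag(D).mean())   # both 1 by construction
```

Column sums stand in for the frequency-weighted sums of Eqs. (7) to (9) because, with frequencies estimated from the same individuals, each column sum equals $n$ times the corresponding weighted sum.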

Vector space algebra for orthogonal parameterization using the Gram-Schmidt process
We propose using the algebra of vector spaces to construct genomic and dominance relationship matrices. The Gram-Schmidt process takes a set of linearly independent but non-orthogonal functions and constructs an orthogonal basis with respect to an arbitrary weighting function [15]. First, we use vector space algebra to measure orthogonality between the additive and dominance scales.

Vector space algebra to measure orthogonality
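Define the additive ($\vec{u}_j$) and dominance ($\vec{v}_j$) vectors for the $j$-th marker, each with dimension equal to the number of individuals, $n$. The elements of $\vec{u}_j$ and $\vec{v}_j$ are the scales used to center the additive and dominance relationship matrices, respectively. Then, $\mathbf{H}_a$ and $\mathbf{H}_d$ can be constructed as:

$$\mathbf{H}_a = \left[\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_m\right], \qquad \mathbf{H}_d = \left[\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\right].$$

Based on Eq. (5), the elements of the additive vector $\vec{u}_j$ at the $j$-th marker for individuals with genotypes AA, Aa and aa under non-HWE conditions are:

$$u_{ij} = \begin{cases} 1-\left(p_{AA(j)}-p_{aa(j)}\right) & \text{if } AA \\ -\left(p_{AA(j)}-p_{aa(j)}\right) & \text{if } Aa \\ -1-\left(p_{AA(j)}-p_{aa(j)}\right) & \text{if } aa \end{cases}$$

The elements of $\vec{v}_j$ are the dominance scales for individuals with genotypes AA, Aa and aa under non-HWE conditions from Eq. (6):

$$v_{ij} = \begin{cases} -p_{Aa(j)} & \text{if } AA \\ 1-p_{Aa(j)} & \text{if } Aa \\ -p_{Aa(j)} & \text{if } aa \end{cases}$$

For a given marker, the vectors $\vec{u}_j$ and $\vec{v}_j$ form a basis of the subspace they span, because they are linearly independent. However, this basis is not necessarily orthogonal. In this setting, Alvarez-Castro and Carlborg [8] showed that orthogonality is only achieved when either the two homozygotes have the same frequency or there are no heterozygotes. The angle $\theta_j$ between the two vectors $\vec{u}_j$ and $\vec{v}_j$ provides a measure of the degree of orthogonality. From the definition of inner product (p. 519 in [15]), $\cos\theta_j$ for the $j$-th marker is given by:

$$\cos\theta_j = \frac{\vec{u}_j\cdot\vec{v}_j}{\|\vec{u}_j\|\,\|\vec{v}_j\|} = \frac{\vec{u}_j\cdot\vec{v}_j}{\sqrt{\vec{u}_j\cdot\vec{u}_j}\,\sqrt{\vec{v}_j\cdot\vec{v}_j}},$$

where $\vec{u}_j\cdot\vec{v}_j = \sum_{i=1}^{n} u_{ij}v_{ij}$ is the inner product of the vectors $\vec{u}_j$ and $\vec{v}_j$ and has an expected value equal to:

$$E\left(\vec{u}_j\cdot\vec{v}_j\right) = -n\,p_{Aa(j)}\left(p_{AA(j)}-p_{aa(j)}\right),$$

and

$$E\left(\vec{u}_j\cdot\vec{u}_j\right) = n\left[p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2\right], \qquad E\left(\vec{v}_j\cdot\vec{v}_j\right) = n\left(p_{Aa(j)}-p^2_{Aa(j)}\right).$$

Taking the arc cosine of the above formula gives $\theta_j$ in radians. For $\theta_j = 90°$, the two vectors are orthogonal. For $\theta_j \neq 90°$, the two vectors are non-orthogonal, and the degree of dependency is larger for values of $\theta_j$ near 0° or 180°.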
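The angle computation can be sketched directly from its definition. The code builds $\vec{u}_j$ and $\vec{v}_j$ for a marker, computes $\theta_j$ from the sample inner products, and compares it with the value implied by the expected inner products above. The two 10-individual markers are hypothetical. The frequency formula is undefined when $\vec{v}_j$ is the zero vector (no heterozygotes) or when $\vec{u}_j$ is zero.

```python
import math


def nohw_vectors(genotypes):
    """Vectors u_j (Eq. 5) and v_j (Eq. 6) for one marker.

    genotypes: list of "AA" / "Aa" / "aa" strings, one per individual;
    genotype frequencies are estimated from this same list.
    """
    n = len(genotypes)
    f = {g: genotypes.count(g) / n for g in ("AA", "Aa", "aa")}
    delta = f["AA"] - f["aa"]
    u_scale = {"AA": 1 - delta, "Aa": -delta, "aa": -1 - delta}
    v_scale = {"AA": -f["Aa"], "Aa": 1 - f["Aa"], "aa": -f["Aa"]}
    return [u_scale[g] for g in genotypes], [v_scale[g] for g in genotypes], f


def angle_degrees(x, y):
    """Angle between two vectors from cos(theta) = x.y / (|x| |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    cos_theta = max(-1.0, min(1.0, dot / (norm_x * norm_y)))  # guard rounding
    return math.degrees(math.acos(cos_theta))


def expected_angle_degrees(f):
    """Same angle from genotype frequencies via the expected inner products."""
    pAA, pAa, paa = f["AA"], f["Aa"], f["aa"]
    uu = pAA + paa - (pAA - paa) ** 2
    vv = pAa - pAa ** 2
    if uu <= 0 or vv <= 0:
        raise ValueError("angle undefined: u_j or v_j is a zero vector")
    return math.degrees(math.acos(-pAa * (pAA - paa) / math.sqrt(uu * vv)))


# Hypothetical markers with 10 individuals each.
skewed = ["AA"] * 3 + ["Aa"] * 6 + ["aa"]          # p_AA != p_aa
balanced = ["AA"] * 2 + ["Aa"] * 6 + ["aa"] * 2    # p_AA == p_aa
for marker in (skewed, balanced):
    u, v, f = nohw_vectors(marker)
    print(round(angle_degrees(u, v), 3), round(expected_angle_degrees(f), 3))
```

The skewed marker gives an angle of about 114.1°. The balanced marker, with equal homozygote frequencies, gives 90°, as the result of Alvarez-Castro and Carlborg predicts.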

Construction of orthogonal G and D matrices using the Gram-Schmidt process
The Gram-Schmidt process can be used to construct an orthogonal basis of the additive and dominance scales for each marker. The basic idea behind orthogonalization by the Gram-Schmidt process is that the first vector is kept unchanged, whereas the component that the second vector shares with the first is removed from the second vector. We explored three alternatives for applying the Gram-Schmidt process to genomic relationship matrices. The first, GSP-A, starts the process with $\vec{u}$ (additive), whereas the second, GSP-D, starts it with $\vec{v}$ (dominance). In the third alternative, GSP-N, the additive and dominance vectors of scales obtained from GSP-A are normalized to unit length. The orthogonality conditions of Eqs. (7), (8), and (9) are verified for GSP-A and GSP-D in Appendix 1.

GSP-A Gram-Schmidt process
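The goal of the process is to obtain orthogonal vectors $\vec{\tau}_A$ and $\vec{\tau}_D$ for the additive and dominance scales, respectively. The Gram-Schmidt process for GSP-A is:

$$\vec{\tau}_{A(j)} = \vec{u}_j, \qquad \vec{\tau}_{D(j)} = \vec{v}_j - \frac{\vec{v}_j\cdot\vec{\tau}_{A(j)}}{\vec{\tau}_{A(j)}\cdot\vec{\tau}_{A(j)}}\,\vec{\tau}_{A(j)}.$$

Therefore, the elements of $\vec{\tau}_{A(j)}$ are the additive scales from Eq. (5) for the individuals, depending on their genotype, AA, Aa, or aa:

$$\tau_{A,ij} = \begin{cases} 1-\left(p_{AA(j)}-p_{aa(j)}\right) & \text{if } AA \\ -\left(p_{AA(j)}-p_{aa(j)}\right) & \text{if } Aa \\ -1-\left(p_{AA(j)}-p_{aa(j)}\right) & \text{if } aa \end{cases} \qquad (14)$$

Replacing the inner products by their expected values, the projection coefficient is $-p_{Aa(j)}\left(p_{AA(j)}-p_{aa(j)}\right)/\left[p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2\right]$. Therefore, the elements of the orthogonal basis for the dominance vector, $\vec{\tau}_{D(j)}$, for the three genotypes are:

$$\tau_{D,ij} = v_{ij} + \frac{p_{Aa(j)}\left(p_{AA(j)}-p_{aa(j)}\right)}{p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2}\,u_{ij}, \qquad (15)$$

with $u_{ij}$ and $v_{ij}$ given by Eqs. (5) and (6). As shown in Appendix 2, the elements for centering the three genotypes from Eq. (15) are algebraically identical to those derived under the NOIA method:

$$\tau_{D,ij} = \begin{cases} -\dfrac{2p_{Aa(j)}p_{aa(j)}}{p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2} & \text{if } AA \\[2ex] \dfrac{4p_{AA(j)}p_{aa(j)}}{p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2} & \text{if } Aa \\[2ex] -\dfrac{2p_{AA(j)}p_{Aa(j)}}{p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2} & \text{if } aa \end{cases}$$

This first part of the Gram-Schmidt process gives an orthogonal basis of additive and dominance scales for each marker. The additive relationship matrix is scaled as shown in Eq. (4). For the dominance component, the variance contributed by the $j$-th marker is:

$$\frac{4p_{AA(j)}p_{Aa(j)}p_{aa(j)}\left[4p_{AA(j)}p_{aa(j)}+\left(p_{AA(j)}+p_{aa(j)}\right)-\left(p_{AA(j)}+p_{aa(j)}\right)^2\right]}{\left[p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2\right]^2}\,d^2_{(j)}.$$

After substituting $4p_{AA(j)}p_{aa(j)}+\left(p_{AA(j)}+p_{aa(j)}\right)-\left(p_{AA(j)}+p_{aa(j)}\right)^2$ by its value, $p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2$, the above equation reduces to:

$$\frac{4p_{AA(j)}p_{Aa(j)}p_{aa(j)}}{p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2}\,d^2_{(j)}.$$

The only difference between GSP-A and the NOIA orthogonalization is in the scaling of the G and D matrices. The scaling in Eqs. (12) and (13) uses the realized quantities $\mathrm{tr}(\mathbf{H}_a\mathbf{H}_a')/n$ and $\mathrm{tr}(\mathbf{H}_d\mathbf{H}_d')/n$, whereas GSP-A uses their expected values:

$$E\left[\mathrm{tr}(\mathbf{H}_a\mathbf{H}_a')/n\right] = \sum_{j=1}^{m}\left[p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2\right], \qquad E\left[\mathrm{tr}(\mathbf{H}_d\mathbf{H}_d')/n\right] = \sum_{j=1}^{m}\frac{4p_{AA(j)}p_{Aa(j)}p_{aa(j)}}{p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2}.$$

For very large $n$, the left- and right-hand sides of the above equations will be the same, and so will the relationship matrices constructed using either method.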
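The equivalence between GSP-A and NOIA can be checked numerically for a single marker. The sketch below applies one Gram-Schmidt step to sample vectors built from Eqs. (5) and (6) and compares the result with the closed-form NOIA dominance scales of Eq. (11). The marker (3 AA, 6 Aa, 1 aa) and the helper names are hypothetical. Because $\vec{u}_j$ is centered, the projection coefficient equals the least-squares slope of $\vec{v}_j$ on $\vec{u}_j$, which is the regression view mentioned in the Discussion.

```python
def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt_pair(first, second):
    """Keep `first`; subtract from `second` its projection on `first`.

    Because `first` is centered here, `coef` is also the least-squares slope
    of `second` on `first`, so the result is a regression residual.
    """
    coef = dot(second, first) / dot(first, first)
    return list(first), [b - coef * a for a, b in zip(first, second)]


# Hypothetical marker: 3 AA, 6 Aa and 1 aa individuals (n = 10).
genotypes = ["AA"] * 3 + ["Aa"] * 6 + ["aa"]
n = len(genotypes)
f = {g: genotypes.count(g) / n for g in ("AA", "Aa", "aa")}
delta = f["AA"] - f["aa"]
u = [{"AA": 1 - delta, "Aa": -delta, "aa": -1 - delta}[g] for g in genotypes]  # Eq. (5)
v = [{"AA": -f["Aa"], "Aa": 1 - f["Aa"], "aa": -f["Aa"]}[g] for g in genotypes]  # Eq. (6)

tau_A, tau_D = gram_schmidt_pair(u, v)      # GSP-A: additive vector first

# Closed-form NOIA dominance scales (Eq. 11) for the same frequencies.
V = f["AA"] + f["aa"] - delta ** 2
noia_d = {"AA": -2 * f["Aa"] * f["aa"] / V,
          "Aa": 4 * f["AA"] * f["aa"] / V,
          "aa": -2 * f["AA"] * f["Aa"] / V}

print(max(abs(t - noia_d[g]) for t, g in zip(tau_D, genotypes)))   # ~0
print(dot(tau_A, tau_D))                                            # ~0: orthogonal
print(dot(tau_D, tau_D) / n, 4 * f["AA"] * f["Aa"] * f["aa"] / V)  # both ~0.2
```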

GSP-D Gram-Schmidt process
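The Gram-Schmidt process for GSP-D starts with the vector of dominance scales. The component that the additive vector shares with the dominance vector is then removed when constructing the orthogonal basis:

$$\vec{\tau}_{D(j)} = \vec{v}_j, \qquad \vec{\tau}_{A(j)} = \vec{u}_j - \frac{\vec{u}_j\cdot\vec{\tau}_{D(j)}}{\vec{\tau}_{D(j)}\cdot\vec{\tau}_{D(j)}}\,\vec{\tau}_{D(j)}. \qquad (17)$$

Therefore, using Eq. (6), the dominance scales in $\vec{\tau}_{D(j)}$ for the three genotypes AA, Aa and aa are $-p_{Aa(j)}$, $1-p_{Aa(j)}$ and $-p_{Aa(j)}$. Substituting these scales, and the expected inner products, into Eq. (17) yields the projection coefficient:

$$\frac{E\left(\vec{u}_j\cdot\vec{v}_j\right)}{E\left(\vec{v}_j\cdot\vec{v}_j\right)} = \frac{-p_{Aa(j)}\left(p_{AA(j)}-p_{aa(j)}\right)}{p_{Aa(j)}-p^2_{Aa(j)}} = -\frac{p_{AA(j)}-p_{aa(j)}}{p_{AA(j)}+p_{aa(j)}}.$$

Therefore, the elements of $\vec{\tau}_{A(j)}$ (the additive scales) for the three genotypes are:

$$\tau_{A,ij} = \begin{cases} \dfrac{2p_{aa(j)}}{p_{AA(j)}+p_{aa(j)}} & \text{if } AA \\[2ex] 0 & \text{if } Aa \\[2ex] -\dfrac{2p_{AA(j)}}{p_{AA(j)}+p_{aa(j)}} & \text{if } aa \end{cases}$$

Per unit of squared additive effect, the additive variance contributed by the $m$ markers is:

$$\sum_{j=1}^{m}\frac{4p_{AA(j)}p_{aa(j)}}{p_{AA(j)}+p_{aa(j)}}.$$

The scaling for D is the same as in the non-Hardy-Weinberg parameterization, $\sum_{j=1}^{m}\left(p_{Aa(j)}-p^2_{Aa(j)}\right)$.

GSP-N Gram-Schmidt process
The first step of the GSP-N process gives an orthogonal basis of additive and dominance scales based on GSP-A. The next and final step is to obtain an orthonormal basis for the additive and dominance scales by dividing $\vec{\tau}_{A(j)}$ and $\vec{\tau}_{D(j)}$ by the norm of each vector. The resulting vectors in GSP-N have unit length, which implies that all markers contribute equally when constructing the genomic relationship matrices. Using the scales for $\vec{\tau}_{A(j)}$ and $\vec{\tau}_{D(j)}$ described in Eqs. (14) and (15), the norms of the additive and dominance vectors are obtained as follows:

$$\left\|\vec{\tau}_{A(j)}\right\| = \sqrt{n\left[p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2\right]}, \qquad \left\|\vec{\tau}_{D(j)}\right\| = \sqrt{\frac{4n\,p_{AA(j)}p_{Aa(j)}p_{aa(j)}}{p_{AA(j)}+p_{aa(j)}-\left(p_{AA(j)}-p_{aa(j)}\right)^2}}.$$

Then, matrices $\mathbf{H}_a$ and $\mathbf{H}_d$ become:

$$\mathbf{H}_a = \left[\frac{\vec{\tau}_{A(1)}}{\|\vec{\tau}_{A(1)}\|}, \ldots, \frac{\vec{\tau}_{A(m)}}{\|\vec{\tau}_{A(m)}\|}\right], \qquad \mathbf{H}_d = \left[\frac{\vec{\tau}_{D(1)}}{\|\vec{\tau}_{D(1)}\|}, \ldots, \frac{\vec{\tau}_{D(m)}}{\|\vec{\tau}_{D(m)}\|}\right].$$

The resulting relationship matrices still need to be scaled, which can easily be done by dividing G and D by $m/n$. Because every column of $\mathbf{H}_a$ and $\mathbf{H}_d$ has unit length, $\mathrm{tr}(\mathbf{H}_a\mathbf{H}_a') = \mathrm{tr}(\mathbf{H}_d\mathbf{H}_d') = m$, so this divisor equals the trace-based scaling of Eqs. (12) and (13).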
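The sketch below illustrates GSP-D and GSP-N on one hypothetical marker (3 AA, 6 Aa, 1 aa). GSP-D keeps $\vec{v}_j$ and orthogonalizes $\vec{u}_j$, and the result is compared with the closed-form additive scales derived above. GSP-N divides the GSP-A pair by their norms. The helper names are illustrative, and inner products are computed on the sample vectors rather than from expected values.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt_pair(first, second):
    """Keep `first`; subtract from `second` its projection on `first`."""
    coef = dot(second, first) / dot(first, first)
    return list(first), [b - coef * a for a, b in zip(first, second)]


def normalize(x):
    """Divide a vector by its Euclidean norm (unit length)."""
    length = math.sqrt(dot(x, x))
    return [a / length for a in x]


# Hypothetical marker: 3 AA, 6 Aa and 1 aa individuals (n = 10).
genotypes = ["AA"] * 3 + ["Aa"] * 6 + ["aa"]
n = len(genotypes)
f = {g: genotypes.count(g) / n for g in ("AA", "Aa", "aa")}
delta = f["AA"] - f["aa"]
u = [{"AA": 1 - delta, "Aa": -delta, "aa": -1 - delta}[g] for g in genotypes]
v = [{"AA": -f["Aa"], "Aa": 1 - f["Aa"], "aa": -f["Aa"]}[g] for g in genotypes]

# GSP-D: keep the dominance vector, orthogonalize the additive one (Eq. 17).
tau_D_gspd, tau_A_gspd = gram_schmidt_pair(v, u)
s = f["AA"] + f["aa"]
closed_form = {"AA": 2 * f["aa"] / s, "Aa": 0.0, "aa": -2 * f["AA"] / s}
print(max(abs(t - closed_form[g]) for t, g in zip(tau_A_gspd, genotypes)))  # ~0
print(dot(tau_A_gspd, tau_A_gspd) / n, 4 * f["AA"] * f["aa"] / s)          # both ~0.3

# GSP-N: orthonormalize the GSP-A pair (additive vector kept first).
tau_A_gspa, tau_D_gspa = gram_schmidt_pair(u, v)
tau_A_n, tau_D_n = normalize(tau_A_gspa), normalize(tau_D_gspa)
print(dot(tau_A_n, tau_A_n), dot(tau_D_n, tau_D_n), dot(tau_A_n, tau_D_n))  # ~1, ~1, ~0
```

In GSP-D the heterozygote entry of the additive vector is exactly zero. All additive information then comes from the homozygotes, which is one way to see why GSP-D attributes more variance to dominance than GSP-A does.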

Animals and data
A dataset comprising five trials (PHGC17, PHGC21, PHGC23, PHGC24, and PHGC25), aimed at investigating the genetic basis of resistance to porcine reproductive and respiratory syndrome virus (PRRSV) after natural infection, was used for variance component estimation. Right after weaning, 903 Landrace × Large White barrows were moved to farms with a history of PRRSV infections. Blood was drawn weekly, and curves of viremia over time were constructed using the LOESS (LOcal regrESSion) function.
LOESS fits a low-degree polynomial at each point by weighted least squares, giving more weight to observations near the point at which the response is being estimated and less weight to observations further away. This fitting was necessary to account for natural variation in the concentration of viremia due to sampling and to the methodology used to measure viremia in serum. The area under the viremia curve for each individual, referred to as viral load (VL), was the phenotype used for estimating variance components.
A tissue sample obtained from each pig was used for SNP genotyping. DNA extraction and genotyping were carried out using the Infinium HD Assay Ultra protocol (Illumina Inc.) and the Illumina Porcine SNP60 BeadChip [16]. In total, 32,645 SNPs were used to construct relationship matrices. Details are in Gomez-Raya et al. [17].

Construction of G and D matrices and variance component estimation
An exact test for HWE was carried out for all SNPs in the dataset, using the HardyWeinberg package in the R language (https://cran.r-project.org/web/packages/HardyWeinberg/HardyWeinberg.pdf).
The G and D matrices were constructed using the six methods described in the previous sections (HW, NO-HW, NOIA, GSP-A, GSP-D and GSP-N). The Kullback-Leibler divergence [18] was used to quantify the divergence between pairs of G matrices, and between pairs of D matrices, obtained with different methods. The Kullback-Leibler divergence [18] from Q to P was computed as:

$$D_{KL}(P\|Q) = \frac{1}{2}\left[\mathrm{tr}\left(\mathbf{Q}^{-1}\mathbf{P}\right) + \left(\boldsymbol{\mu}_Q-\boldsymbol{\mu}_P\right)'\mathbf{Q}^{-1}\left(\boldsymbol{\mu}_Q-\boldsymbol{\mu}_P\right) - n + \ln\frac{|\mathbf{Q}|}{|\mathbf{P}|}\right],$$

where $\boldsymbol{\mu}_P$ and $\boldsymbol{\mu}_Q$ are the vectors of means of the $n$ individuals under P and Q, respectively, and $\mathbf{P}$ and $\mathbf{Q}$ are the corresponding relationship (covariance) matrices. P represents a multivariate normal distribution, and the multivariate normal distribution Q represents an approximation to P. The Kullback-Leibler divergence is the expected amount of extra information required to encode samples of P using a code optimized for Q rather than one optimized for P. With natural logarithms, its unit is the natural unit of information (nats) [19]. A value of $D_{KL}(P\|Q)$ equal to zero means that P and Q are the same distribution. The Kullback-Leibler divergence is not a true metric, since it is not symmetric and does not obey the triangle inequality. Thus, the divergence from P to Q generally differs from the divergence from Q to P. An asymmetry with $D_{KL}(P\|Q) > D_{KL}(Q\|P)$ implies that more information is needed to approximate P with Q than the other way around. In our analysis, P and Q were either two additive or two dominance relationship matrices, as derived with the six methods investigated in this paper. The statistical model used to analyze VL included the fixed effects of the mean and trial. The random effects were the additive and dominance genomic effects of the biological parameterization, as described in the Theory and methods section. Heritability ($h^2$) was estimated as the ratio of the estimate of additive genomic variance to the sum of the estimates of the variances of all random components.
The proportion of dominance variance ( d 2 ) was estimated by dividing the estimate of the dominance variance component by the sum of the estimates of all random components in the model. The mixed linear models were fitted using ASReml [20], with G and D matrices as described in the previous sections.
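The divergence formula above can be sketched in NumPy as follows. Log-determinants are computed with `numpy.linalg.slogdet`, and linear solves are used instead of explicit inverses. The toy 2 × 2 matrices are made up for illustration. Relationship matrices built from exactly centered scales can be singular; the divergence is then undefined, and the function raises an error rather than guessing how such cases should be handled.

```python
import numpy as np


def kl_mvn(P, Q, mu_P=None, mu_Q=None):
    """D_KL(P || Q) between N(mu_P, P) and N(mu_Q, Q), in nats.

    P and Q are n x n covariance matrices (here, two G or two D matrices);
    both must be positive definite. Means default to zero vectors.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    n = P.shape[0]
    mu_P = np.zeros(n) if mu_P is None else np.asarray(mu_P, dtype=float)
    mu_Q = np.zeros(n) if mu_Q is None else np.asarray(mu_Q, dtype=float)
    sign_P, logdet_P = np.linalg.slogdet(P)
    sign_Q, logdet_Q = np.linalg.slogdet(Q)
    if sign_P <= 0 or sign_Q <= 0:
        raise ValueError("P and Q must be positive definite")
    diff = mu_Q - mu_P
    trace_term = np.trace(np.linalg.solve(Q, P))      # tr(Q^-1 P)
    quad_term = diff @ np.linalg.solve(Q, diff)       # (mu_Q-mu_P)' Q^-1 (mu_Q-mu_P)
    return 0.5 * (trace_term + quad_term - n + logdet_Q - logdet_P)


# Toy 2 x 2 "relationship matrices" (not from the pig data).
G1 = np.array([[1.0, 0.2], [0.2, 1.0]])
G2 = np.array([[1.1, 0.0], [0.0, 0.9]])
print(kl_mvn(G1, G1))                      # ~0.0
print(kl_mvn(G1, G2), kl_mvn(G2, G1))      # both positive and unequal
```

The two directions give about 0.0255 and 0.0263 nats, a small illustration of the asymmetry discussed above.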

Results
A genome-wide Fisher's exact test for departures from Hardy-Weinberg equilibrium was performed on all SNPs, using all five trials jointly. A Manhattan plot showed that Hardy-Weinberg disequilibrium is common in the Landrace × Large-White crosses, although the SNPs in disequilibrium did not appear to be randomly located along the genome (Fig. 1). There was an average excess of heterozygotes of 4.7% across the genome. These data are therefore appropriate for investigating and comparing the properties of alternative genomic and dominance relationship matrices under departures from HWE.
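The paper used the HardyWeinberg R package for this test. The sketch below is a language-neutral illustration of the same kind of exact test, not that package's code. It conditions on the allele counts, enumerates the exact null probability of every possible heterozygote count, and sums the probabilities of counts no more likely than the one observed. The function names and the example genotype counts are hypothetical.

```python
import math


def hwe_het_distribution(n_AA, n_Aa, n_aa):
    """Exact null distribution of the heterozygote count given allele counts.

    P(n_Aa) = n! / (n_AA! n_Aa! n_aa!) * 2^n_Aa * n_A! n_a! / (2n)!
    """
    n = n_AA + n_Aa + n_aa
    n_A = 2 * n_AA + n_Aa
    n_a = 2 * n_aa + n_Aa
    n_minor = min(n_A, n_a)

    def lfact(k):
        return math.lgamma(k + 1)   # log(k!)

    dist = {}
    for het in range(n_minor % 2, n_minor + 1, 2):   # same parity as n_minor
        hom_minor = (n_minor - het) // 2
        hom_major = n - het - hom_minor
        log_p = (lfact(n) - lfact(hom_major) - lfact(het) - lfact(hom_minor)
                 + het * math.log(2) + lfact(n_A) + lfact(n_a) - lfact(2 * n))
        dist[het] = math.exp(log_p)
    return dist


def hwe_exact_pvalue(n_AA, n_Aa, n_aa):
    """Sum of probabilities of heterozygote counts no more likely than observed."""
    dist = hwe_het_distribution(n_AA, n_Aa, n_aa)
    p_obs = dist[n_Aa]
    return min(1.0, sum(p for p in dist.values() if p <= p_obs * (1 + 1e-7)))


print(hwe_exact_pvalue(25, 50, 25))   # HWE-like counts: p-value close to 1
print(hwe_exact_pvalue(0, 100, 0))    # all heterozygotes (F1-like): tiny p-value
```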
A first look at the relationship matrices of the six methods revealed that all except the HW approach succeeded in attaining an average of diagonal elements equal to one and an average of off-diagonal elements equal to zero (Table 1). The average of the diagonal elements of the G matrix of the HW approach was 0.94 (Table 1). All eigenvalues were positive for all six methods used to construct genomic relationship matrices.
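Checks of the kind summarized in Table 1 can be computed with a small NumPy helper. The sketch reports the mean of the diagonal, the mean of the off-diagonal elements, and the smallest eigenvalue of a symmetric relationship matrix. The helper name and the toy 3 × 3 matrix are illustrative only.

```python
import numpy as np


def relationship_summary(M):
    """Mean diagonal, mean off-diagonal and smallest eigenvalue of a symmetric matrix."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    off_diagonal = M[~np.eye(n, dtype=bool)]
    return {
        "mean_diagonal": float(np.diag(M).mean()),
        "mean_off_diagonal": float(off_diagonal.mean()),
        "min_eigenvalue": float(np.linalg.eigvalsh(M).min()),
    }


toy = [[1.0, 0.2, -0.1],
       [0.2, 0.9, 0.0],
       [-0.1, 0.0, 1.1]]
print(relationship_summary(toy))
```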
Kullback-Leibler divergences between pairs of G matrices and between pairs of D matrices are in Table 2. The divergence from G (GSP-D) to the G matrices created by the other methods was larger than the divergence between other pairs of G matrices. It was also slightly asymmetrical, meaning that more information is needed to approximate G from GSP-D using G from the other methods than the other way around. For the dominance relationship matrices, the Kullback-Leibler divergence involving the HW relationship matrix is strongly asymmetrical: the NOIA, GSP-A, GSP-D, and GSP-N relationship matrices require much more information when using a code optimized for HW than the other way around. This could again be attributed to the excess heterozygosity in the crossbreds, which means that the scales in D from the HW method are not actually centered on zero. The divergences from the G (or D) matrices of the NOIA and GSP-A methods to the G (or D) matrices created by other methods were very similar, because the two methods are equivalent. The only difference between them is that NOIA uses $\mathrm{tr}(\mathbf{H}_a\mathbf{H}_a')/n$ and $\mathrm{tr}(\mathbf{H}_d\mathbf{H}_d')/n$ as the denominators of G and D, whereas GSP-A uses just the expected values of these expressions.
The algebra of vector spaces makes it possible to investigate orthogonality between additive and dominance vectors. An angle of 90° between the two vectors implies that the additive and dominance scales are orthogonal. Figure 2 shows a density plot of the estimates of the angle θ between additive and dominance vectors for all markers. For a majority of markers, the angle between the additive and dominance vectors was close to 90°. In Fig. 3, the $-\log_{10}(p\text{-value})$ of Fisher's exact test for departures from HWE is plotted against the allele frequency and against the angle between additive and dominance vectors for each marker, using the NO-HW method to construct the G and D matrices. Markers with intermediate allele frequencies tended to show significant departures from HWE and to have orthogonal vectors of additive and dominance scales. Table 3 shows variance component estimates obtained with the six alternative methods to construct G and D matrices. The statistical analysis yielded a highly significant trial effect, which is attributable to the uncontrolled conditions in each trial. The NO-HW method tended to give lower estimates of the additive and dominance variance components than the HW method, which is consistent with an upward bias of the HW method. Comparing estimates from the orthogonal methods: (a) NOIA and GSP-A resulted in nearly identical estimates of both the additive and dominance variance components, as expected; (b) GSP-D resulted in a larger estimate of the dominance variance component than any of the other methods, in contrast to the NOIA and GSP-A estimates; and (c) GSP-N resulted in an estimate of zero for the dominance variance.
Gomez-Raya et al. Genet Sel Evol (2021) 53:7

Discussion
Most studies in genetics applied to animal breeding or human genetics assume that the populations under study are in HWE. Advances in molecular technologies in recent years have renewed interest in the impact of HWE assumptions on genetic analyses [21]. Modern animal breeding now relies on genomic prediction [1]. One of the most widely used methods of genomic prediction is GBLUP, which consists of replacing the traditional pedigree-based relationship matrix by a genomic relationship matrix that incorporates genotype information on SNPs. This requires the genotype contributions of each SNP to be centered and scaled for both the G and D matrices. The first attempt to construct G matrices from marker genotypes was by VanRaden [2], whose method did not consider dominance and assumed HWE for the scaling of the relationship matrix. Later on, a distinction between genomic relationships constructed using biological and statistical parameterizations was proposed [5,6], which also assumed HWE. A newer method, NOIA, developed by Alvarez-Castro and Carlborg [8] and applied to the construction of genomic relationships by Vitezica et al. [7], uses an orthogonal partition of additive and dominance effects and does not require the assumption of HWE. In this paper, we show that vector space algebra can help in constructing relationship matrices and in evaluating the degree of departure from orthogonality between additive and dominance vectors of scales. We showed that, in our data, vectors of additive and dominance scales constructed using NO-HW are often orthogonal or nearly so (θ ≈ 90°). We also found that markers at intermediate allele frequencies tend to show significant departures from HWE and to have orthogonal vectors of additive and dominance scales. This is because the numerator of $\cos\theta_j$, $-n\,p_{Aa(j)}\left(p_{AA(j)}-p_{aa(j)}\right)$, becomes zero (orthogonality) when the two homozygotes have equal frequencies ($p_{AA(j)} = p_{aa(j)}$), which implies an allele frequency of 0.5.
We show in this paper that centering and scaling the G matrix using the NOIA method coincides with the GSP-A method, which is based on orthogonalization by the Gram-Schmidt process. The Gram-Schmidt process converts vectors into an orthogonal system. This is done by keeping one of the vectors and replacing the next vector by its component that is orthogonal to the former vector, i.e. by subtracting its projection on the former vector. We also showed that, algebraically, GSP-A (and consequently NOIA) only centers and scales the G matrix, accounting for the lack of HWE; orthogonality is achieved in the construction of the D matrix, after removing the variation that is common to the G and D matrices. Thus, the proposed applications of the Gram-Schmidt process are equivalent to removing, by linear regression, the additive-dominance covariation from the scales that enter the process second.
Table 2 Kullback-Leibler's divergence for the genomic ($D_{KL}(\mathbf{G}_1\|\mathbf{G}_2)$) and dominance ($D_{KL}(\mathbf{D}_1\|\mathbf{D}_2)$) …
Fig. 2 Density of the angle between additive and dominance components across the genome of the crossbred population. The blue vertical bar shows the angle of orthogonality.
Another alternative to deal with additive-dominance co-variation is inclusion of a covariance term between additive and dominance effects. This was explored by Xiang et al. [22] based on an equivalent statistical model, as proposed by Fernandez et al. [23]. More work is needed to compare this model with models that use NOIA or GSP-A to construct G and D matrices.
One of the most common situations where HWE does not hold is in crossbred populations. Lo et al. [24] first described how to use data on crossbreds and their corresponding purebreds to estimate breeding values and variance components. In their model, each individual has two breeding values; one for purebred performance and one for crossbred performance. Ibañez-Escriche et al. [25] first implemented a crossbred model that incorporates genomic information. More recently, Vitezica et al. [26] showed how additive and dominance components can be implemented in genomic prediction using purebred and crossbred performance to estimate breeding values for purebred animals and their crosses. Our analyses differ from those of Vitezica et al. [26] in that we do not use SNP genotype information on purebreds and just incorporate SNP genotype information from crossbreds into genomic relationship matrices, as an example with extreme departures from HWE (as expected and observed in our analyses). We did not differentiate between allele substitution effects according to the breed origin of the alleles either. In the analysis of crossbred data, the method of Vitezica et al. [26] is more appropriate if the goal is to estimate breeding values of purebreds, and SNP genotype information is available on the purebred parents. Nevertheless, their method also assumes HWE within each of the purebred parental populations, which may affect estimates of variance components.
We observed that roughly 15 to 25% of the total variation in viral load following PRRSV infection was of genetic origin. The methods used to center and scale the relationship matrices gave different answers regarding the relative proportions of additive and dominance variation: HW, NO-HW, and GSP-D yielded much higher estimates of dominance variance than of additive variance, whereas NOIA, GSP-A, and GSP-N yielded the opposite. This is expected because GSP-A, NOIA, and GSP-N are constructed by removing the common additive-dominance covariance when centering the dominance relationship matrix, so that the shared variation is attributed to the additive component. Fisher was the first to separate genetic variance into additive, dominance, and epistatic components using the least squares principle [27]. Later, Cockerham partitioned the epistatic variance into additive × additive, additive × dominance, dominance × additive, and dominance × dominance interaction components [14]. Cockerham also showed how to scale additive and dominance components using a regression model under the assumption of HWE [14]. He stated "This particular set of scales (among the many others mathematically possible) was chosen for its utility. The scales pertaining to the marginal comparisons of each locus were chosen to separate the marginal variance into the additive (linear) and dominance (quadratic) portions that were long ago shown to be useful for expressing simply the correlation between parent and offspring and between other relatives". Concerning hybrids, Stuber and Cockerham showed that a proper partitioning can be done in the parental populations, each assumed to be in HWE [28]. The NOIA model extends these scales to populations that are not in HWE [8]. NOIA (or GSP-A) certainly reflects the linear regression nature of, for example, parent-offspring resemblance better than GSP-D does, and it is also more appropriate for predicting response to selection.
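Why the order of orthogonalization changes the partition can be illustrated with a fixed-effect, single-marker analogue. This is a hypothetical example with made-up genotypes and phenotypes and ordinary sequential (type I) sums of squares; it is not the REML variance-component analysis behind the estimates reported above. Whichever vector is fitted first is credited with the part of the variation it shares with the other vector.

```python
import numpy as np


def sequential_ss(y, first, second):
    """Sums of squares credited to `first`, then to `second` orthogonalized on `first`.

    One Gram-Schmidt step applied to a fixed-effect regression. Assumes
    `first` and `second` are centered and linearly independent.
    """
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    second_orth = second - (second @ first) / (first @ first) * first
    ss_first = (y @ first) ** 2 / (first @ first)
    ss_second = (y @ second_orth) ** 2 / (second_orth @ second_orth)
    return ss_first, ss_second


# Hypothetical single-marker data with unequal homozygote frequencies (not from the paper).
x = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 2], dtype=float)
y = np.array([1.0, 1.2, 0.9, 2.1, 2.4, 2.0, 2.2, 2.5, 1.9, 2.3])
a = x - x.mean()              # centered additive vector
h = (x == 1).astype(float)
d = h - h.mean()              # centered dominance (heterozygote) vector

add_a, dom_a = sequential_ss(y, a, d)  # GSP-A order: additive first
dom_d, add_d = sequential_ss(y, d, a)  # GSP-D order: dominance first
print(f"GSP-A order: additive {add_a:.3f}, dominance {dom_a:.3f}")
print(f"GSP-D order: additive {add_d:.3f}, dominance {dom_d:.3f}")
```

With these made-up values the additive term receives about 2.34 when fitted first but only about 1.20 when fitted after the dominance term, while the total explained by both terms (about 2.87) is the same in either order. Only the split of the shared part changes, which mirrors the qualitative contrast between GSP-A and GSP-D.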
However, the different scales in GSP-A (or NOIA or GSP-N) and GSP-D yield different partitions of additive and dominance variance, which deserve further investigation to determine which partition is of interest and for which purpose. In addition, the GSP-N method standardizes the length of the additive and dominance vectors to one, which implies that all markers are weighted equally when constructing genomic relationship matrices, regardless of their frequencies (and/or locations). More work is needed to understand the implications and properties of alternative G and D matrices constructed using vector space algebra.
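The equal weighting of markers under GSP-N can be checked on simulated data. In this sketch the genotypes are hypothetical random draws with marker-specific genotype frequencies, GSP-N is taken to mean unit-length normalization of each marker's GSP-A additive and dominance vectors, and the final n/m scaling that gives an average diagonal of one is an assumption made here, not necessarily the scaling used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2024)
n, m = 200, 40

# Hypothetical genotypes (0/1/2) with marker-specific genotype frequencies.
X = np.column_stack([rng.choice(3, size=n, p=rng.dirichlet([2.0, 3.0, 2.0]))
                     for _ in range(m)]).astype(float)
# Keep markers where all three genotypes occur; otherwise the orthogonalized
# dominance vector is zero and cannot be normalized.
keep = np.array([np.unique(X[:, j]).size == 3 for j in range(m)])
X = X[:, keep]
m = X.shape[1]

# GSP-A: centered additive vectors, then dominance vectors orthogonalized on them.
A = X - X.mean(axis=0)
H = (X == 1).astype(float)
Dc = H - H.mean(axis=0)
D = Dc - ((Dc * A).sum(axis=0) / (A * A).sum(axis=0)) * A

# GSP-N: rescale each marker's additive and dominance vector to unit length.
A_n = A / np.linalg.norm(A, axis=0)
D_n = D / np.linalg.norm(D, axis=0)

# A marker's contribution to trace(W W') is its squared column norm.
contrib_gsp_a = (A**2).sum(axis=0)
contrib_gsp_n = (A_n**2).sum(axis=0)
print("GSP-A additive contributions:", contrib_gsp_a.min(), "to", contrib_gsp_a.max())
print("GSP-N additive contributions:", np.unique(np.round(contrib_gsp_n, 12)))

# Assumed scaling n / m so that the average diagonal element equals one.
G_N = n * A_n @ A_n.T / m
D_N = n * D_n @ D_n.T / m
print(np.mean(np.diag(G_N)), np.mean(np.diag(D_N)))
```

Under GSP-A the per-marker contributions to the trace vary with genotype frequencies, whereas after normalization every marker contributes exactly one, which is what equal weighting of markers means in this setting. Normalizing does not undo the per-marker orthogonality between the additive and dominance vectors.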

Conclusions
Vector space theory provides techniques that can be useful for the construction of relationship matrices in populations that are not in HWE. It can provide a measure of the degree of departure from orthogonality between additive and dominance components. It can also be applied to construct orthogonal or orthonormal relationship matrices, such as those based on GSP-A, GSP-D, or GSP-N. The GSP-A method coincides with the NOIA method. With the GSP-N method, all markers contribute equally when constructing relationship matrices. Alternative orthogonal models to construct relationship matrices result in different estimates of additive and dominance variances, which warrants further research.