Further insights of the variance component method for detecting QTL in livestock and aquacultural species: relaxing the assumption of additive effects

Complex traits may show some degree of dominance at the gene level that may influence the statistical power of simple models, i.e. assuming only additive effects to detect quantitative trait loci (QTL) using the variance component method. Little has been published on this topic even in species where relatively large family sizes can be obtained, such as poultry, pigs, and aquacultural species. This is important, when the idea is to select regions likely to be harbouring dominant QTL or in marker assisted selection. In this work, we investigated the empirical power and accuracy to both detect and localise dominant QTL with or without incorporating dominance effects explicitly in the model of analysis. For this purpose, populations with variable family sizes and constant population size and different values for dominance variance were simulated. The results show that when using only additive effects there was little loss in power to detect QTL and estimates of position, using or not using dominance, were empirically unbiased. Further, there was little gain in accuracy of positioning the QTL with most scenarios except when simulating an overdominant QTL.


INTRODUCTION
Quantitative trait loci (QTL) detection using mixed linear models is one of the preferred methods for estimating the contribution of a particular chromosomal segment to the observed variance in general pedigrees from outbred populations [2,19]. This method infers QTL segregation using as a covariance structure the number of alleles identical by descent (IBD) conditional on genetic markers in many positions of the genome [19,29].
It is customary that, when using crosses between outbred populations (the F 2 design), additive and dominance effects are fitted jointly in the regression analysis [1]. Using the variance component method, it is usually assumed that only additive effects are of importance and therefore only IBD matrices conditional on marker data are fitted in the restricted maximum likelihood (REML) procedure (see [10,22] for traditional implementations in outbred pedigrees of pigs and sheep). Although this is indeed correct under the assumption of no dominance, it is not clear under the variance component framework what is the most powerful test of linkage and by what extent variance components can be biased if dominance is not accounted for in the model of analysis.
In light of recent results in cattle [17], where significant dominance effects have been estimated in the DGAT1 locus, this may be of importance, for example when the interest is to select genomic regions showing evidence of QTL at particular chromosomes, when predicting breeding values due to the QTL in order to select candidates in marker assisted selection programmes [8,13] or when performing confirmation studies within commercial populations. This may be important in cases where the original experiments from crosses between outbred lines show evidence of non-additive gene action at the QTL [6,7].
Under the assumption of genes with infinitesimal effects, modelling dominance is difficult since it is necessary to maximise the likelihood of the data, fitting extra parameters, such as dominance variance and the covariance between additive and dominance effects under inbreeding [5], and it is likely that the estimates of these variance components are subjected to large sampling correlations [23,24]. Also under the infinitesimal model it is difficult conceptually to deal with inbreeding depression, since it is doubtful that a genetic model of an infinite number of loci exists with directional dominance [5]. Nevertheless, at least in theory, the use of more complex models may help to improve accuracy of estimation, as well as help to exploit non-additive genetic variation within breeds [12]. However, in practice it is not easy to disentangle variation due to common environmental effects and dominance effects, since when using full-sib structures as in poultry or fish breeding both terms are completely confounded. Under mixed inheritance, non-additive genetic variance can be accommodated explicitly by extending the mixed inheritance model of the QTL. The covariance structure of dominance effects is proportional to the probability that two relatives share the same genotype at a locus [9]. Very little has been presented in the literature about this subject, although in practice it is an important issue for detecting QTL in outbred populations [21].
In the present paper, we investigated the behaviour of the mixed linear model when modelling dominance variance at the QTL in species where relatively large family sizes can be obtained, such as in pigs, poultry, and aquacultural species. Since a priori, in a given experiment where the actual genetic model is not known, we investigated a two-step approach in which additive effects 586 V. Martinez and genotypic effects at the QTL are first modelled for QTL detection in order to obtain the most likely position of the QTL. Testing for dominance effects was carried out conditionally at the most likely position of the QTL, previously obtained from both models using the required covariance structure in the mixed linear model. Using these methods, we calculated empirical power and accuracy of estimating variance components, using different livestock population structures likely to be encountered in practice.

MATERIALS AND METHODS
The outline of this paper is as follows. First, we present the mixed model including dominance at the QTL and show how covariances can be computed in full-sib structures; then we performed the testing regimes used first for detecting the presence of a segregating QTL (with or without using information of dominance) and then we made inferences about the mode of gene action at the QTL.

Genetic model
Let us assume a population of non-inbred full-sib families measured for a normally distributed trait (y i ). The model used to explain the phenotype of individual i is where l is the contribution of fixed effects (such as the mean); a i is the additive effect at the putative biallelic QTL of the individual, with values equal to a for QQ, 0 for Qq and qQ, and Àa for qq; d i , is the dominance effect (d) expressed as the difference between the mid-homozygote value [9]; p i is the polygenic component, explaining unlinked genes in the rest of the genome, and e i is the residual. In scalar notation, the first and second moments of the vector of phenotypes are then equal to the following: where U i,j is the IBD proportion of alleles that individuals i and j share at the known QTL, D is a variable indicating whether individual i and j share the same genotype at the QTL, and p i,j is the additive relationship between individuals i and j. These variables are distributed according to the following distributions (by convention, the first allele is the paternal QTL allele inherited Mapping random QTL relaxing additive effects 587 from the father to individual i (q f i ) and the second allele is the maternal QTL allele inherited to individual i (q m i )): To set up the mixed model equations at the animal level, we need to obtain the inverse of the covariance structure due to polygenic effects and of additive and dominance effects at the QTL. We first explicitly constructed the actual matrices pertaining to each full-sib group, as detailed in the following section. These calculations were carried out using a computer program designed specifically for this purpose.

Computing covariance matrices given marker data for full-sib structures
Since the actual genotype of the QTL is not known, we inferred the expected IBD proportion between two full-sibs (/ i,j ) using the marginal distribution of IBD proportions (0, 0.5, and 1) at the QTL conditional on marker information [28]. For the purposes of this analysis, completely informative markers with known ordered genotypes were assumed (see the description in [28]). First, consider what is the probability that sib i inherits QTL alleles from the maternal or paternal first homolog or the second homolog conditional on the marker haplotype. There are four possible QTL allelic classes conditional on flanking markers, each depending on the probability of recombination between the markers and the QTL, and between flanking markers in the male and female parent (see [28] for details). With random mating, the conditional probabilities of QTL genotypes can be calculated as the product between the corresponding probabilities of inherited gametes.
The expected IBD proportion (/ i,j ) is then the product of the probabilities that both offspring receive, either the same or a different QTL allele from the mother or the father (therefore comparisons are made between paternally and maternally inherited alleles) multiplied each by the corresponding IBD 588 proportion (0, 0.5, and 1). In matrix notation, the value of (/ i,j ) between any pair of full-sibs conditional on marker data is equal to the product of the vector of conditional genotype probabilities given marker data (vector Q i ) and a matrix that represents all the possible alternatives of IBD proportion (C) between both full-sibs considered (Equation (5)): where C is a 4 · 4 symmetric matrix with diagonal elements equal to 1 and off-diagonal elements equal to 0 or 0.5. The notation used in the vector Q i stands for conditional probability of QTL genotype, given the ordered genotype at the flanking markers (where 1,i and 2,i represent the flanking markers (1 or 2)) and the allele inherited (say l) from the mother (m) or the father (f) (note that at most there are four different possibilities depending on whether the paternal or maternal allele (for the QTL or marker) inherited by the sib i came from the sire (fm, ff) or the dam (mf, mm) of each parent):  : ð7Þ The value of (d i,j ) between two full-sibs can be obtained similarly, as the probability that both share the same genotype at the QTL [9]. Without using marker data, this value is equal to 1/4 for full-sib individuals. Using marker data, the expectation conditional on marker data can be calculated as the product of the vector Q i transposed and the vector Q j : The first question to be addressed when mapping a QTL is to test whether there is evidence of segregation of QTL in the linkage group under analysis. This is irrespective of whether there is an additive or a dominant QTL segregating. According to Table I, it is possible to compute two different tests for detecting QTL: 1. ADDITIVE: This test is calculated along the linkage group under analysis as minus twice the difference between the log-likelihood of a reduced model (only fitting the polygenic effects) and the log-likelihood of a model fitting additive effects at the QTL in addition to polygenic effects (Tab. I; À2(L I À L II )). This is done adjusting the covariance structure due to additive effects at the QTL at every centiMorgan of the linkage group. This test assumes that an additive genetic model is the true underlying mode of gene action of the QTL. 2. GENOTYPIC: This test is calculated along the linkage group under analysis as minus twice the difference between the log-likelihood of a reduced model (only fitting the polygenic effects) and the log-likelihood of a model fitting additive and dominance effects at the QTL in addition to polygenic effects (Tab. I; À2(L I À L III )). This is done by simultaneously adjusting the covariance structure due to additive and dominance effects at the QTL at every centiMorgan of the linkage group. This test assumes that an additive plus dominance genetic model is the true underlying mode of gene action of the QTL.
In both cases, the test statistic is computed along the linkage group and the highest value of the likelihood ratio (LR) test provides the most likely position of the QTL. Note that the location may differ between tests that were used to detect the QTL (ADDITIVE and GENOTYPIC) but on average both tests should give very similar locations if they are unbiased (see Sect. 4 below).

Second stage: inferences about the mode of gene action at the QTL
It is important to test the actual mode of gene action given the data independently of whether an additive or dominant model is assumed for detecting a QTL. Testing significance of dominance variance can be accomplished within the framework of the ADDITIVE test by fitting dominance effects at the most likely position as obtained from this test (say at position k in the linkage group). At this position, we include in the model the dominance effect with a covariance 590 V. Martinez Table I. Hypothesis testing framework under dominance. In every alternative, the test statistic is computed as twice the difference between the log-likelihood of the null model (L 0 ) and (L 1 ).

Test
Full model Hypothesis tested (L 1 ) Null hypothesis (L 0 ) Mapping random QTL relaxing additive effects proportional to d i,j in addition to the additive effects. The LR test is equal to minus twice the difference between log-likelihoods of these two models at position k (Tab. I; À2(L II À L IV )).
Using the GENOTYPIC test, testing for dominance variance at the QTL position detected can be carried out, computing the LR as minus twice the difference between the log-likelihood of the model in which no dominance effects were fitted (only additive effects of the QTL) and that of the model fitting dominance and additive effects, simultaneously (Tab. I; À2(L II À L III )).

Simulations
We investigated different alternatives using the simulation of two generation outbred pedigrees structured as independent full-sib families with variable sizes (f n ; including parents and progeny). The scenarios simulated used typical values of f n equal to 10, 20, 50, or 100, as observed in many livestock and aquacultural species. The total population size was constant (n = 500), thus the number of full-sib families is then equal to n/( f n ). The analysis comprised a single linkage group of 50 cM with fully informative markers every 10 cM (six in total), with a QTL placed at position 25 cM. Phenotypes were simulated for parents and progeny with a broad sense heritability (including QTL and polygenic variance) equal to 0.4 and a constant additive genetic variance r 2 a ¼ 150 À Á due to a biallelic QTL. Allele frequencies at the QTL in the base population in Hardy-Weinberg equilibrium were equal to 0.5 and alleles of markers and QTL were uncorrelated (i.e. no linkage disequilibrium (LD) was assumed in the base population). Dominance variance due to the QTL was simulated using different ratios of the dominance (d) and additive (a) effects (Tab. II). For each case the residual genetic variance, due to variation under infinitesimal model assumptions (polygenic effects), was varied, such that it explained the remainder of the total genetic variance. The environmental variance was kept constant in all scenarios r 2 e ¼ 600 À Á . We calculated the significance thresholds for the different tests performed for detecting QTL (ADDITIVE and GENOTYPIC) for the null hypothesis of no segregating QTL (scenario 0; Tab. II). In this case, all the genetic variance was assumed to be due to polygenic effects, such that the heritability of the trait was equal to 0.40.
The unknown significance thresholds required for testing dominance variance were obtained under two different scenarios. In the first case, we simulated an additive segregating QTL that explained 25% of the total variance, giving a high probability of the design for detecting an additive QTL. The second case did not assume that a QTL is segregating; i.e. only polygenic effects were simulated.

V. Martinez
The heritability (including QTL and polygenic variance) was equal to 0.4. This second alternative reflects well a situation, where there is no prior knowledge of whether a QTL is segregating in the population. This may be of importance when empirical significance thresholds can be obtained while analysing real populations, by permuting genotypes and phenotypes within families due to the large family sizes especially in aquaculture [4]. In all these cases, 1000 replicates were simulated under the null hypothesis and each replicate comprised 50 full-sib families of size 10 (eight sibs and two parents), giving a population size equal to 500 individuals.
The variance components were estimated using REML with ASREML [11], using the defined matrices as explained in Section 2.2.

Test statistics used to detect QTL
The empirical distribution under the null hypothesis of no QTL for the different tests implemented is presented in Figure 1. Empirically, the ADDITIVE test is distributed as a v 2 distribution of between 1 and 2 degrees of freedom (DF); this is seen here irrespective of the family sizes evaluated in this paper. From this result, it is clear that there is very good agreement between the empirical distribution obtained here and the distribution of the LR under the null hypothesis, as obtained previously by others [27].
For the GENOTYPIC test (including additive and dominance effects at the QTL in the model), there is no previous empirical evidence in the literature  group [29]. The simulation results show that the significance thresholds tend to be more conservative, with a distribution very similar to a v 2 ð2Þ distribution (Fig. 1). Again the family size has little effect on the significance thresholds. The distribution of the test statistic used to detect dominance variance conditional on the best location obtained when using the ADDITIVE test is unknown. As explained in Section 2.4 we performed two scenarios, with or without simulating additive effects at the QTL, to obtain significance thresholds under the null hypothesis of no dominance. As is seen in Figure 2, there is very little difference between the significance thresholds and the shape of the distributions of this test obtained with these two alternative scenarios. Also, note that the significance threshold tends to be more conservative than the one expected at a single position from a mixture distribution of 0 with a probability of 1/2 and a one degree of freedom (v 2 ð1Þ ) with a probability of 1/2 (for a = 0.05; 2.71 [2,3]).
When using the GENOTYPIC test, the test of dominance variance was constructed as twice the difference between the log-likelihood of the GENOTYPIC model and the log-likelihood of the model in which only additive effects were fitted. This gave an empirical distribution very similar to a distribution from a mixture distribution of 0 with a probability of 1/2 and a one degree of freedom (v 2 ð1Þ ) with a probability of 1/2 (with a significance threshold similar to 2.71 for a = 0.05; Fig. 2).

First stage: empirical power of QTL detection
The power estimates (a = 0.05) over 100 replicates are presented in Table III. In this case, power was computed as the proportions of the 100 replicates with a LR greater than the average of the empirical significance thresholds for all the family sizes considered in the previous section. This was done due to the fact

Power of the additive model
The ADDITIVE test was used to detect the QTL, irrespective of whether the true model involved dominance or not. The power calculations are presented in Table III for the different family sizes simulated. We found two main trends in terms of power to detect the QTL in these scenarios; power increased asymptotically when increasing the family size and secondly, power increased when a dominant QTL was simulated. This suggests that most of the information about linkage between markers and QTL can be captured in the random model through fitting only the additive relationships conditional on marker data.

Power of the genotypic model
The power of the GENOTYPIC test tends to be very similar to the test that only includes additive effects in the model. In fact, absolute differences in power to detect the QTL were only at the most extreme case about 3% and this holds for all degrees of dominance simulated in the present examples (such as when including overdominance, see Tab. III). Although not shown, there was a slight increase in the mean test statistic of this test when compared to the additive test alone (about 10%), however, this increase in test statistic was counterbalanced by the fact that this test had a higher significance threshold compared to the one obtained for the ADDITIVE test (see Fig. 1). Furthermore, the correlation between both the ADDITIVE and GENOTYPIC test statistics was very high and consistent with the ratio between them (ranged from 0.90 to 0.97).

Second stage: power of detecting dominance conditional on location
Testing dominance at the best position as obtained from the ADDITIVE model assumes that the most likely position obtained from the additive test is unbiased (which happens to be the case (see Sect. 3.4.1)). This test has the advantage in that there is no need to fit the covariance structure due to dominance at every position tested. For this reason, it requires only once to construct the dominance relationship matrix conditional on marker information. Therefore, this strategy should be computationally more efficient. The first trend observed was that power to detect dominance variance is relatively low even in cases where overdominance was simulated (Tab. IV). In these scenarios, dominance variance accounted for as much as 15% of the total variance (Tab. II). Secondly, Mapping random QTL relaxing additive effects the optimum family size in terms of power was around 50 individuals per family, which is very similar to the optimum observed for the tests of linkage (Tab. IV). In this case, power almost reached an asymptotic value when increasing the family size in excess of 50 individuals per family (Tab. IV). Finally, similar power was obtained whether or not the QTL was detected including or not including dominance in the first stage on QTL detection (Tab. IV).

Position
Estimates of position obtained from both methods (ADDITIVE and GENO-TYPIC) are presented in Table V. In general, estimates of position obtained from both methods were empirically unbiased. We note, however, that correlations between position estimates were only moderate-high (ranging from 0.70 to 0.80) for complete dominance and smaller for overdominance (0.50-0.60). This is because in some replicates, the maximum LR was obtained at opposite positions in the chromosome.

Variance components and bias due to model misspecification
Using the model that only incorporates the additive component gave biased estimates of variance components, if the true model incorporates dominance. This bias, due to model misspecification, was measured using the mean bias (MB) which was calculated as: where h i is the simulated value of the heritability due to the QTL and^denotes the estimate from each replicate simulated. Not including dominance in the model gave upwardly biased estimates of the heritability of the QTL and the actual bias is clearly a function of the magnitude of the dominance variance explained by the QTL (Fig. 3). Due to the fact that the power of the variance component method is related to the magnitude of the variance component considered [14,25], this may explain why, under dominance, the power of the simple additive test is very similar to the test that includes dominance in the model. When explicitly including dominance effects in the model, estimates of the additive and dominance variance of the QTL tend to show only a slight (up/downward) bias, considering the complexity of the model fitted. Furthermore, standard deviations of variance components over replicates were relatively similar for the models fitting or not fitting dominance when detecting QTL in the first stage (results not shown).

DISCUSSION
In this paper, different models for detecting QTL under the variance component method were studied. It is customary to fit only additive effects to search for QTL, irrespectively, of whether the true underlying model involves dominance. In this investigation we obtained similar power for detecting the QTL using or not explicitly using dominance effects in the model of analysis when there is a non-additive genetic action influencing the trait under consideration. The most likely explanation of this finding is related to the amount of information required for estimating different parameters at the QTL. Recent results have shown the difficulties when trying to estimate additive and dominance effects using information from dense marker data using reasonable sample sizes [23]. This result may be explained largely by the high correlation between the IBD probabilities (/ i,j ) and genotype IBD (d i,j ) between individuals. Thus, separating additive versus dominance effects within families can be difficult when estimating highly correlated parameters using the variance component method [20,23,24].
Position estimates may differ according to the models fitted to the data, but the results obtained here suggest that shifts in position are not due to a better fit when including dominance, i.e. unbiased estimates of position are obtained with or without including dominance. This has also been found in models involving dominance and inbreeding in which relatively large shifts in position have been observed when including dominance in the analysis, even when this effect only borders significance [21]. To investigate this finding further, we calculated the correlation between the difference of position estimates from both models (including or not including dominance when detecting the QTL) and we used the LR to test dominance at the most likely position from the GENO-TYPIC model, in order to see whether changes in position are associated with a better fit of this model. A significant association should be obtained if shifts in position estimates between both models are due to the incorporation of dominance. An example of such a trend is presented in Figure 4. This figure clearly shows that there is very little evidence suggesting that changes in position are due to the better fit given by using dominance in the model. Note that the example gave maximal power to detect the QTL, but the same trend was observed when the power to detect the QTL was small (data not shown). In spite of this finding, some increase in accuracy can still be obtained when including dominance effects in the model of analysis, but this was only evident when there was overdominance (see Tab. V).
So far, we completely assumed informative markers and linkage phase known. These assumptions made it possible to use a simple and very fast deterministic Mapping random QTL relaxing additive effects procedure to obtain the matrices required for modelling the covariance structure at the QTL for additive and dominance effects, given the large family sizes simulated. This enabled us to obtain the distributions under the different null  Figure 4. Relationship between the difference between position estimates obtained with or without including dominance to detect the QTL and the LR test for detecting dominance (using the GENOTYPIC test). The examples are given for complete dominance (a) and overdominance at the QTL (b). The family size was equal to 50 individuals.

602
V. Martinez hypothesis assumed. It would be expected that fully informative markers will not produce bias, but give the highest possible values of power calculations given the scenarios simulated in this paper. In practice, it would be expected that when markers are not fully informative, information from multiple markers along the chromosome can be used for inferring IBD probabilities using hidden Markov models and this improves the effects of reduced marker heterozygosity.
In a literature review [16] regarding QTL analysis in plants, it is shown that more than 50% of all the QTL surveyed gave evidence of a dominance and overdominance mode of gene action as the most plausible models for the dominant QTL. These results, however, are likely due to the significance threshold imposed to declare that a QTL is real, which will cause very large bias in the dominance effects [16]. So the interpretation of a significant dominance, either a complete or an overdominant model, should be considered with caution. In practice, this can be particularly extreme in outbred populations where the family size is relatively low, markers may not be fully informative and the proportion of the parental crosses that is expected to be informative is small.
Significant quantities of genomic data are now available, with the potential for enhancing accuracy of position estimates of QTL effects. For instance, [26] reported a genetic variation map of the chicken genome containing 2.8 million single-nucleotide polymorphisms (SNP) and [15] found 2507 putative SNP in the salmon genome that could be valuable for this purpose. In order to capture the information from historical recombinations, calculation of IBD probabilities simultaneously require information from LD and linkage information [18] at specific targeted regions. Under these circumstances, greater accuracy of positioning the QTL is expected, however, this is associated with the extent of LD in the population. This is because, the values of d i,j only rely on LD but not on linkage information, which, for species with relatively small family sizes, would be a limitation when mapping dominant QTL.