Power of QTL detection by either fixed or random models in half-sib designs

The aim of this study was to compare the variance component approach for QTL linkage mapping in half-sib designs to the simple regression method. Empirical power was determined by Monte Carlo simulation in granddaughter designs. The factors studied (base values in parentheses) included the number of sires (5) and sons per sire (80), ratio of QTL variance to total genetic variance (λ = 0.1), marker spacing (10 cM), and QTL allele frequency (0.5). A single bi-allelic QTL and six equally spaced markers with six alleles each were simulated. Empirical power using the regression method was 0.80, 0.92 and 0.98 for 5, 10, and 20 sires, respectively, versus 0.88, 0.98 and 0.99 using the variance component method. Power was 0.74, 0.80, 0.93, and 0.95 using regression versus 0.77, 0.88, 0.94, and 0.97 using the variance component method for QTL variance ratios (λ) of 0.05, 0.1, 0.2, and 0.3, respectively. Power was 0.79, 0.85, 0.80 and 0.87 using regression versus 0.80, 0.86, 0.88, and 0.85 using the variance component method for QTL allele frequencies of 0.1, 0.3, 0.5, and 0.8, respectively. The log10 of type I error profiles were quite flat at close marker spacing (1 cM), confirming the inability to fine-map QTL by linkage analysis in half-sib designs. The variance component method showed slightly more potential than the regression method in QTL mapping.


INTRODUCTION
During the last ten years, many molecular genetic markers have been developed and improved, leading to the possibility of obtaining genotypes for many markers within a very small physical distance on the genome. The inheritance of chromosomal regions within a data set can be traced by molecular markers and used to detect and localize QTL in linkage analysis. Linkage analysis uses the recombination information within the marker genotype data in current generations.
QTL detection and mapping methods are based on the assumption that QTL is either a fixed or a random effect. A simple regression method for flanking markers in inbred line crosses was developed for interval mapping and multiple QTL in outbred half-sib designs [7,10,16]. The regression method is based on a general linear model that treats the QTL as a fixed effect [4,14,15,18]. The variance component approach is based on a mixed inheritance linear model and the QTL is treated as a random effect. This method is based on an identity by descent (IBD) probability matrix between all individuals at the putative QTL position. Mixed inheritance models use phenotypic records and genotypes of marker information in current generations. Fernando and Grossman [5] presented a methodology for the application of BLUP to marker-assisted selection (MAS) in which QTL alleles were considered random in the context of the mixed model terminology. They presented recursive algorithms to calculate IBD probabilities for a QTL as a gametic relationship matrix. Further studies developed this algorithm to efficiently compute the conditional covariance matrix and its inverse in less restrictive conditions [1,12,22,25]. Alternative methods for computing IBD probability are a correlation based algorithm and a segregation algorithm [2,3,6,20]. Both methods are based on the IBD probability matrix between all individuals at the putative QTL position obtained using the markers to trace the inheritance of the QTL. These approaches are possible for pedigrees with incomplete marker information, unknown linkage phase and multiple linked markers. The variance component approach using restricted maximum likelihood (REML) is an approach derived to estimate the position of the QTL. The likelihood is maximized over the parameters at each position and the position with the largest maximized likelihood is selected [6,9,21].
The aim of this study was to investigate the variance component approach to QTL linkage mapping in half-sib designs and to compare it with the simple regression method. Simulated data sets were generated and analyzed to determine the empirical power at four different levels of QTL variance ratios relative to total genetic variance, four different marker spacings, and a variety of granddaughter designs differing in number of sire families and sons per family. The variance component approach was considered for situations with and without marker information of the dam.

The simple regression model
The simple regression method is based on the multiple linked marker approach described by Knott et al. [16]. This is the analysis of individual linkage groups across several half-sib families. The probability of inheritance of the first haplotype of the parent of a linkage group at every 1 cM interval conditional to its marker genotype is computed for each offspring. The QTL are fitted at each fixed interval along the linkage group by regression of a phenotypic trait on the conditional probability. The analysis is based on the nested design within families and the residuals are pooled across families to calculate a test statistic. The multiple linear regression model with multiple linked markers for every linkage group is as follows: where y i j is the trait value of individual j from half-sib family i, x i j is the conditional probability of individual j inheriting the first haplotype of sire i, β i is the regression coefficient within family i, α i is the polygenic effect of sire i and e i j is the residual effect. Sires could be heterozygous for at least one marker locus. If all markers in a linkage group were uninformative for an individual, the conditional probability would be equal to 0.5. If the first allele of the sire was transmitted to his son, the conditional probability would be equal to one and otherwise is equal to zero. The test statistic was an F-test and F-ratios were computed at every map position. F-ratios were converted to log 10 of probabilities of type I errors in order to compare the regression method to the variance component method. The critical values for test statistics were computed from the F-test with the degrees of freedom of the numerator based on the number of sire families and the degrees of freedom of the denominator based on the total number of sons in the simulated data [16].

The mixed inheritance model (VC)
The mixed inheritance model for variance component estimation (VC) is presented in matrix notation as follows: where y is an (n × 1) vector of phenotypes of the quantitative trait, X is an (n × t) matrix of fixed effects, b is a (t × 1) vector of fixed effects, Z is an (n × p) incidence matrix relating phenotypes to random animal effects, g is a (p× 1) vector of additive polygenic, cumulative effects of infinitesimal loci (a), and additive QTL effects (q) and e is an (n × 1) residual vector. The random effects g and e are assumed to be distributed as multivariate normal variables (probability density function) as follows: g ≈ N (0, Hσ 2 g ) and e ≈ N (0, Rσ 2 e ). Where σ 2 g and σ 2 e are the variances of additive genetic and residual effects, respectively. H is the (co) variance matrix related to total additive genetic value of polygenic and QTL effects which can be expressed as H = (A * (1 − λ)) + (G * λ). A is a standard numerator relationship matrix for the polygenic effects. The (co) variance matrix, G, is a (p × p) matrix for the additive effects of the QTL conditional on marker information. The elements of G represent the proportion of alleles IBD between all pairs of individuals in the pedigree. Lambda (λ) is the proportion of total additive genetic variance due to additive QTL variance (λ = σ 2 q /σ 2 g ). The inverse of H was calculated directly. R is a known (n × n) diagonal matrix, which is equal to Iσ 2 e . The mixed linear model assuming no QTL segregating in the population is as follows: with a ≈ N (0, Aσ 2 a ) and e ≈ N (0, Rσ 2 e ). In this study, Loki software was used to compute the G matrix [11]. Thompson and Heath gave a detailed description of the Markov chain Monte Carlo (MCMC) method and segregation algorithm used in Loki [20].

Likelihood ratio test
The likelihood ratio test was used to estimate the position of the QTL using VC model. For every 1 cM on the chromosome segment, the (co) variance matrix, H, is estimated to obtain the position based on the VC model. The following likelihood function was used for the VC model [8]: where, l is the log of likelihood function for VC model, C is the coefficient matrix of mixed model equations, R is the error (co) variance matrix given by Iσ 2 e , H = (A * (1 − λ)) + (G * λ) as described above, n is the number of animals that have a record, t is the number of fixed effects or rank of the X matrix, and and W=[X: Z].
The likelihood was estimated at each putative QTL position and a likelihood function (Eq. (4)) was built along the chromosome region. The likelihood value was estimated for different values of lambda λ = 0, 0.1, 0.2, 0.3, 0.4 or 0.5. When λ is zero there was no QTL segregating in this chromosome region and there were only random polygenic effects in the model. The likelihood function from this reduced model was used in a likelihood ratio test. The null hypothesis in this study was that the QTL variance equals zero and there was no QTL segregation in this chromosome region. The following likelihood ratio test uses the test of the null hypothesis, where l r is the log likelihood value for the reduced model and l f is the log likelihood value for the full model with λ > 0. The critical values for test statistic were computed from a chi-square test with one degree of freedom. LR values were converted to log 10 of the probabilities of a type I error in order to be compared to the regression method.

DATA SIMULATIONS
Half-sib data, commonly found in outbred populations (i.e., dairy cattle), were simulated. The polygenic and residual variance components were assumed known. The genotype for all sires and sons for six marker loci and a QTL was simulated for a small segment of the genome. A single bi-allelic QTL and six equally spaced markers each with six alleles were simulated along with a recombination fraction (θ). The allele frequency in the QTL was equal to 0.5 and the frequencies of the six alleles of each marker were equal. The base haplotypes, which included the pair of haplotypes for each sire and dam, were generated assuming complete linkage and Hardy Weinberg Equilibrium. The paternal and maternal haplotype for each son were generated based on each parent using the Haldane mapping function (no interference). Sires and dams were assumed to be unrelated. Dam genotypes were simulated to consider the performance of the VC approach for the situation with dam genotypes known.
The heritability of the quantitative trait was set to 0.3 and the total phenotypic standard deviation was assumed to be 100. The total genetic variance was 3000 and in different scenarios the proportions of QTL variance relative to total genetic variance were 0.05, 0.1, 0.2 and 0.3. The daughter yield deviations (DYD) created for each son were based on a progeny test with 100 daughters. DYD is an average of the phenotypes of the daughters adjusted for systematic  environmental effects and genetic values of the daughters and the dams [24].
The DYD were generated for each son as: where n is the number of daughters per sire (i.e. 100), BV sire is the breeding value of the sire, BV dam is the breeding value of dam, σ 2 A is the total additive genetic variance, σ 2 P is the total phenotypic variance, E is the residual effect and δ ∼ N (0, V(δ)).
The critical values of the test statistic corresponding to 5% type I error were determined empirically by simulating 500 replicates based on the null hypothesis of no QTL segregating at the segment of the chromosome. To achieve the same variance structure, a QTL was simulated with the appropriate λ for a given scenario, but it segregated independently of the markers. Critical values for the maximal LR statistic in any bracket were determined from the 25th largest value in the 500 replicates. Another 500 replicates were simulated under the alternative hypothesis, with the QTL situated in the center of the markers, and the empirical power was computed as the proportion of replicates in which the test statistic exceeded the critical value.

RESULTS
QTL detection was performed using the VC model and the regression model for all data sets. The factors studied (base values in parentheses) included number of sires (5) and sons per sire (80), ratio of QTL variance to total genetic variance (λ = 0.1), marker spacing (10 cM), and QTL allele frequency (0.5). The results concerning the power of QTL detection in all scenarios examined are summarized in Table II.

Sire family size
By increasing the number of sire families (first scenario), the empirical power of the two models increased substantially. The VC model was slightly more powerful than the regression model, especially in the design with five families. The difference in power between the two models for a moderately sized design with 10 families was 6%, whereas for five sires the difference was 7.6%. The total number of animals genotyped in scenario one with five families was 405 animals. This family data set is economically justifiable for a QTL detection design. The power of the test with 20 sires using the VC model was not very different from 10 sires. The power of VC method using 10 sire families was 0.98 for detecting QTL, whereas, the optimal number of families to achieve this power using the regression method was 20 sires. The difference between the two models using 20 sires was not significant. The test statistic at the QTL position using the Reg model was exceeds the critical value for only 20 sires (P < 0.05). The test statistic at the QTL position using the VC model was exceeds the critical value with 10 (P < 0.05) and 20 (P < 0.01) sires. Figures 1 and 2 show the average of minuses log 10 P-values for Reg and VC models, respectively. The curves of the VC model (Fig. 2) compared to the curves of the Reg model were very similar (Fig. 1).

Sons per sire
By increasing the number of sons per sire, the empirical power of the two models was increased. The VC approach was slightly more powerful than the regression method. Both models showed more than a 0.96 level of power for   QTL detection only with 200 sons per sire. The total number of animals genotyped in this situation was 1000. The results confirmed that increasing the family size was more effective than increasing the number of families, given the constant number of animals genotyping, for increasing the power of the test for QTL detection. Increasing the number of families was more economically justifiable than increasing the number of sons per sire. The difference in power between the two models using a small number of sons per sire was significant.
The test statistic at the QTL position using the VC model was exceeds the critical value at 100 (P < 0.05) and 200 (P < 0.01) sons per sire. The test statistic at the QTL position using the regression model was exceeds the critical values only in a situation with 200 sons per sire.

Proportion of QTL variance
By increasing the QTL variance ratio, the empirical power increased in the two models. The VC approach was slightly more powerful than the regression method. The maximum power of the VC model was 0.97 at λ = 0.3. In this situation, the power of the regression model was 0.95. To increase the power of the test for QTL detection specifically when the QTL ratio is low required the number of sires in the data set to increase. When λ = 0.1 the difference between the two models was maximized. The minimum difference was at λ = 0.2. The test statistic at the QTL position using the regression model were exceeds the critical value with λ = 0.2 (P < 0.01) and λ = 0.3 (P < 0.01). The test statistics at the QTL position using the VC model were exceeds the critical value with λ = 0.2 (P < 0.05) and λ = 0.3 (P < 0.01).

Interval between markers
The empirical power of the test of the position of the QTL decreased with increased intervals between markers. The test statistic at the QTL position using the regression model was not exceeds the critical value for 1, 5, 10, and 20 cM. The test statistic at the QTL position using the VC model was not exceeds the critical value for 1, 5, 10, and 20 cM.

QTL allele frequency
The empirical power of the test decreased as the QTL allele (Q 1 ) frequency departed from 0.5 using the VC method. The maximum empirical power using the regression method was with the QTL allele (Q 1 ) frequency of 0.8. For the different allele frequencies there was very little difference between the regression and VC methods. Figure 3 shows the log 10 of the probabilities of type I error values using the variance component method with and without dam genotypes. Both scenarios reached maximum values at the QTL position. The graph with the dam genotypes known was sharper than the graph without dam genotypes. The two curves were very similar and the difference was not statistically significant.

DISCUSSION
Several statistical approaches have been developed for the detection and localization of QTL in outbred populations [6,9,13,16,17]. The regression model method includes the uncertain QTL genotypes as a covariate in the model [16], and the VC method.
Several studies have estimated the theoretical power for both methods using deterministic approaches [23,26,27]. The empirical powers of this study for both models were compared to a deterministic approach. Powers of all scenarios using the deterministic approach were overestimated compared to empirical power using simulation data. For scenarios with small population size, like 5 sires and 50 sons per sire, the differences between deterministic and empirical powers were significant (P < 0.05). With 5 sires and 200 sons per sire, the differences between deterministic and empirical power were small and not significant.
The VC method was slightly more accurate than the regression method in QTL mapping. The empirical power of the test of the position of the QTL decreased with an increased interval between markers. However, the simulations with small intervals assumed prior knowledge that a QTL exists in that small region. The log 10 P profiles were quite flat for both methods at close marker spacing (1 cM), confirming the inability of both methods to fine map QTL by linkage analysis in half-sib designs of the magnitude considered here.
The VC method showed slightly more potential than the regression method in QTL mapping and was certainly feasible for real data sets, even with extensive permutation testing to determine significance levels. The power of the regression method was slightly less than the VC approach in all scenarios.
The most likely position for a QTL was predicted in these studies to be in the same position regardless of the method and test statistic. The VC approach would yield proportionally better results than the regression model when the markers are less informative or very widely spaced, or when family size is small. The other advantage of the VC approach is that it is useful for complex pedigrees with unknown linkage phase and missing information.
The dam genotypes were considered using the variance component approach. The dam genotype models showed larger log 10 P-values at the QTL position than the model without dam genotypes. Without dam genotypes, Loki assumed marker data were missing on dams and used the MCMC method to calculate the expectation of the genotypes for dams. The difference between the two methods was not statistically significant. Therefore, the model without dam genotypes was more economically justifiable than the model with dam genotypes.
The test statistics using the VC approach were only slightly larger than with the regression method. The shape of the log 10 P-values for the test statistics across the entire chromosome was the same using the two methods. When the number of families, family sizes and QTL variance ratios to total genetic variance increased, the power of both methods increased and the statistical differences were not significant. The VC approach could better handle a small number of families, small family sizes, small QTL variance, missing marker data and unknown linkage phases. For detection of QTL or genome scans, the regression method and the VC approach are appropriate methods in halfsib designs. The power of QTL detection using these methods depends on the QTL effects and the recombination fractions between markers and QTL.
The linkage analysis methods were not successful for precise estimation of QTL positions. For fine mapping QTL both models were not successful. For tightly linked markers the VC approach was stronger than the regression method. The VC approach in a situation with 5 cM intervals was better than the regression method. The log 10 P profile using either model were quite flat at close marker spacings, such as 1 cM intervals.