Single Nucleotide Polymorphism (SNP) information has enabled the use of linkage disequilibrium to detect and localize loci affecting phenotypes. The first methods developed searched for disequilibrium between one or a few marker loci and loci responsible for disease susceptibility, using case–control designs [1]. Typically, data were analyzed to compare the frequency of marker alleles between healthy and diseased individuals, for instance using the relative risk criterion [2]. A similar approach for quantitative traits (including production traits in animals or plants) was to model the trait expectation as a linear combination of marker genotype, allele or haplotype effects. Grapes et al. [3] and Zhao et al. [4] demonstrated that the single-marker regression model is as powerful and precise as other, more sophisticated techniques, such as multiple regression, regression on haplotypes, or the IBD method proposed by Meuwissen and Goddard [5].
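In its simplest form, the single-marker regression model referred to above can be sketched as follows (the notation is ours, not taken from the cited papers):

```latex
y_i = \mu + \beta\, g_i + e_i , \qquad e_i \sim \mathcal{N}\!\left(0,\ \sigma_e^2\right),
```

where $y_i$ is the phenotype of individual $i$, $g_i \in \{0, 1, 2\}$ counts the copies of the tested marker allele carried by individual $i$, and $\beta$ is the allele substitution effect; the marker is declared associated with the trait when $H_0\colon \beta = 0$ is rejected.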

Detection of spurious associations is a major issue that has been investigated by many authors. Such errors occur when population classification based on marker information is confounded with another source of heterogeneity that affects the trait being analyzed. The problem of genetic heterogeneity has been the most widely studied. Two non-exclusive situations can occur: (i) the population consists of genetically different subpopulations and (ii) the population consists of related individuals, whose relationships may or may not be recorded in a pedigree. Several studies have clearly shown that neither relative risk nor simple regression is robust to genetic stratification of the population resulting from the mixture of different groups (breeds, lines, etc.) or families [6–9].

Many approaches have been proposed to avoid the effects of spurious associations. The first was to restrict the analysis to within-family comparisons, linking association analysis to transmission studies. Within this framework, samples must be carefully organized and *ad hoc* families recruited. These tests are based on the association, within families with heterozygous parents, between segregation distortion at a marker locus and progeny phenotypes. This idea was first implemented in the transmission disequilibrium test (TDT) designed by Spielman et al. [10] and then further developed by others. Ewens and Spielman [11], comparing the TDT with a “within-family contingency statistic” that is similar to the haplotype relative risk developed by Falk and Rubinstein [12], demonstrated the robustness of the TDT under various subdivision and admixture scenarios.

Two widely represented families of methods extend these within-family comparisons to quantitative traits: the “quantitative TDT” or QTDT of Abecasis et al. [13–16] and the family-based association tests or FBAT [17–20]. All these methods are robust to population stratification, have similar power [21, 22], and are more powerful than the first tests developed for family-based association studies [14].

Although limiting spurious associations by using within-family analyses was very successful, case–control association studies in populations consisting of individuals assumed to be unrelated were nevertheless frequently performed, in particular because the recruitment of the corresponding samples is much easier [23]. A number of techniques were derived to limit false positives: “genomic control” corrects the test statistic [24, 25], a structure effect can be added to the model of analysis [26–31], and marker transmission used in family-based tests can be generalized and used between generations [5, 32].

For quantitative traits, known or hidden population structure can be modeled in mixed models, where the phenotype expectation is the sum of fixed effects, including the effect of the genetic marker being tested, and a random individual polygenic effect. Covariances between the individual polygenic effects are proportional to the polygenic variance and to coancestry coefficients, which can be estimated from pedigree or marker information [33–36]. This mixed model has been a standard in animal breeding and genetics for many years [37, 38] and, more recently, in human genetics [39, 40].
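A minimal sketch of this standard mixed model, again in our own notation, is:

```latex
y = X\beta + Z u + e, \qquad
u \sim \mathcal{N}\!\left(0,\ \sigma_u^2 A\right), \qquad
e \sim \mathcal{N}\!\left(0,\ \sigma_e^2 I\right),
```

where $X\beta$ collects the fixed effects (including the effect of the tested marker), $u$ is the vector of individual polygenic effects, $A$ is the relationship matrix built from coancestry coefficients estimated from pedigree or marker information, and $\sigma_u^2$ and $\sigma_e^2$ are the polygenic and residual variances.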

In these mixed models, polygenic and residual variances must be estimated separately for each marker fitted before its significance is tested. This estimation phase, repeated for each marker tested, can be a limiting factor in large designs, and simpler approaches have been proposed. The GRAMMAR method was developed by Aulchenko et al. [41, 42] and Amin et al. [43]: marker effects are tested on phenotypes that have first been corrected for an estimate of each individual’s polygenic effect, so that the test itself uses a restricted model free of the polygenic effect. The FASTA approach described by Chen and Abecasis [44] is a score test derived from the generalized FBAT [18]. In a first step, environmental fixed effects and the polygenic and residual variances are estimated from a mixed model that excludes the marker effect. Then, using these estimates, corrected phenotypes are successively correlated with each marker’s genotypes, giving FBAT-type scores. A similar approach can be considered in which the second step is based on a simple fixed-effect model, as in GRAMMAR.
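As an illustration, the two-step logic underlying GRAMMAR can be sketched as follows, assuming the variance components have already been estimated in the first step and using Henderson's mixed model equations to obtain the polygenic BLUP (function and variable names are ours, not taken from the cited software):

```python
import numpy as np

def grammar_scores(y, X, A, markers, s2u, s2e):
    """Two-step GRAMMAR-like sketch.

    y       : (n,) phenotypes
    X       : (n, p) fixed-effect design matrix (no marker effects)
    A       : (n, n) relationship matrix (coancestry-based)
    markers : (n, m) genotype codes, one column per marker
    s2u, s2e: polygenic and residual variances (assumed pre-estimated)
    """
    n, p = X.shape
    lam = s2e / s2u
    A_inv = np.linalg.inv(A)

    # Step 1: solve Henderson's mixed model equations (Z = I here)
    # to obtain fixed-effect estimates and the polygenic BLUP u_hat.
    lhs = np.vstack([
        np.hstack([X.T @ X, X.T]),
        np.hstack([X, np.eye(n) + lam * A_inv]),
    ])
    rhs = np.concatenate([X.T @ y, y])
    sol = np.linalg.solve(lhs, rhs)
    b_hat, u_hat = sol[:p], sol[p:]

    # Corrected phenotypes: residuals after removing fixed and polygenic effects.
    y_star = y - X @ b_hat - u_hat

    # Step 2: simple regression of corrected phenotypes on each marker.
    betas = []
    for g in markers.T:
        gc = g - g.mean()                 # center genotype codes
        betas.append((gc @ y_star) / (gc @ gc))
    return np.array(betas)
```

In practice, the second step would also provide a test statistic for each marker (e.g. a t-test on the estimated effect); only the point estimates are shown here to keep the sketch short.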

Other approaches have been proposed with the aim of accelerating computations: EMMA (efficient mixed-model association) [45], EMMAX (EMMA eXpedited) [46] and P3D (population parameters previously determined) [40]. Finally, a few models deal with spurious associations arising from both subpopulation and family structures [39, 43, 47–49].

The above methods have been evaluated by simulations. Aulchenko et al. [41] compared GRAMMAR with the full mixed model, with the regression model without a polygenic effect, with the QTDT method, and with a simple FBAT, using simulated datasets that corresponded to typical pedigrees. Genomic control applied to GRAMMAR (GRAMMAR-GC) was evaluated in [43] by comparing GRAMMAR and GRAMMAR-GC. Price et al. [39] compared PCA (EIGENSTRAT), the Armitage test, EMMAX with or without PCA, and ROADTRIPS, proposed by Thornton and McPeek [50], in which genomic data are modeled as random variables. PCA-based approaches (EIGENSTRAT [26], PCA-based logistic regression [51], and LAPSTRUCT [52], which makes use of spectral graph theory to build principal components) were compared in [53] with the genomic control described by Devlin and Roeder [24] and with ROADTRIPS. Three GWAS (genome-wide association study) techniques were compared in [54]: simple regression, GRAMMAR, and an “MTDT”, which is a QTDT applied to Mendelian sampling terms.

On the whole, these numerical studies have shown that within-family approaches are less powerful than case–control analyses in populations of unrelated individuals [41, 48], and that there are no major differences among the latter [3]. They have clearly demonstrated that the simplest methods, such as the Armitage test or simple regression, are not robust [47, 53–55], whereas more elaborate models are robust to any type of stratification [39, 47, 49]. Furthermore, they have shown that approximate techniques such as GRAMMAR and EMMAX are very efficient both in error control when family structures exist and in computing speed, but are less powerful in certain situations [41, 46].

One of the main limitations of comparing methods by simulation is that the results cannot be generalized, and only a few studies have provided algebraic results, and then only for simple situations. For instance, Fan and Xiong [56] formalized single- or two-marker association analyses by regression, deriving their power as a function of the non-centrality parameter of the test statistic, which depends on the linkage disequilibrium (LD) between the markers and the quantitative trait locus (QTL). In [11], the relative risk, the within-family contingency statistic and the TDT were compared algebraically under a few admixture scenarios. The Cochran–Armitage test was studied by several authors [57–59]. The power of ANOVA- or regression-based association analyses was derived by Ambrosius et al. [60] as a function of allelic or genotypic frequencies, and recently extended by Kozlitina et al. [61]. Abecasis et al. [13] obtained results for the QTDT in population-mixture situations by deriving within- and between-family expectations with and without parental information. Boitard et al. [62] generalized the corresponding formulae for variances and tests. In [21], Lange et al. provided algebraic formulae for the power of FBAT as a function of parental and progeny genotypes.

The aim of the work presented here was to further develop the algebraic formulation of power and type I error rate for four of the aforementioned methods: simple regression, the approximate methods GRAMMAR [41, 43] and FASTA [44], and the QTDT described in [13]. Our goal was to explore the effect of population structure, focusing on hidden familial relationships rather than on population mixtures. In such situations, phenotypes are under the influence of both the QTL linked to the tested markers and the polygenic background. The reference model used in this study was the standard mixed model, which includes the coancestry coefficients as parameters. Our results show in which situations the methods studied here can be considered appropriate and provide some guidance for population sampling.