Simulation analysis to test the influence of model adequacy and data structure on the estimation of genetic parameters for traits with direct and maternal effects

Simulations were used to study the influence of model adequacy and data structure on the estimation of genetic parameters for traits governed by direct and maternal effects. To test model adequacy, several data sets were simulated according to different underlying genetic assumptions and analysed with both correct and incorrect models. Results showed that omission of one of the random effects leads to an incorrect decomposition of the other components. If maternal genetic effects exist but are neglected, direct heritability is overestimated, sometimes by more than a factor of two. The bias depends on the value of the genetic correlation between direct and maternal effects. To study the influence of data structure on the estimation of genetic parameters, several populations were simulated, with different degrees of known paternity and different levels of genetic connectedness between flocks. Results showed that the lack of connectedness affects estimates when flocks have different genetic means, because no distinction can be made between genetic and environmental differences between flocks. In this case, direct and maternal heritabilities are underestimated, whereas maternal environmental effects are overestimated. An insufficient pedigree also leads to biased estimates of genetic parameters.


INTRODUCTION
The animal model is extensively used for predicting genetic values and estimating genetic parameters, because the optimum combined use of all relationships and performances improves accuracy. However, despite the theoretical advantages of this model, some data and model conditions can affect the validity and precision of the estimation of variance components.
The first source of bias lies in the choice of the genetic model used to analyse data. Concerning maternally influenced traits, there is still a discrepancy between the theoretical studies on genetic parameter estimation and practical applications. The reasons for this can be problems of convergence with variance component estimation software, data structure (for example an incomplete pedigree), or the unavailability of efficient techniques (software or hardware), as is the case in some developing countries. When traits are governed by both direct and maternal effects, fitting only direct effects leads to an overestimation of direct heritability. For growth traits, most estimates of direct heritability obtained with both direct and maternal effects vary between 0.20 and 0.30 [30,38,47]. When maternal effects are ignored, published direct heritabilities can reach 0.73 for daily gain before weaning [23], 0.48 or 0.50 for birth weight [29], 0.35 for four-month weight [27], 0.56 for weights before weaning [6] or 0.45 for weaning weight [7]. However, the relative part of direct and maternal effects (genetic or environmental) and the nature and magnitude of the relation between these effects are determining conditions for the effectiveness of a selection scheme. Literature on the influence of model adequacy on the estimation of variance components is limited. There are some publications in which various models were tested in order to find the one best suited to the data. For example, simulations were used to study biometrical aspects of direct and maternal effects [41,43]. Meyer [33] studied the precision of genetic parameter estimation with different family structures. Robinson [41] and Lee and Pollak [28] tested the effect of sire × year variation on the genetic correlation between direct and maternal effects. Quintanilla Aguado [39] studied the importance of the models in maternal effects analysis by fitting an environmental correlation between the dam and the offspring.
These previous publications reported biases when using incorrect models. In this article, we quantify this bias for different values of true genetic parameters.
Data structure is the second source of bias likely to affect the estimation of variance components. In traditional farming systems, it is sometimes difficult to identify animals and to record performances and/or genealogy. The amount and the quality of the data are then affected by practical constraints. Although this is often the case in developing countries, this can also concern industrialised countries, in particular as regards hardy breeds managed in large flocks with several males used simultaneously for natural service. One of the consequences can be the use of a very incomplete pedigree resulting in a less thorough relationship matrix used in the animal model. Moreover, the lack of artificial insemination and a poor exchange of sires across breeding units limit gene flow and cause a partial or complete lack of genetic connectedness. Even in selection schemes under intensive breeding conditions, disconnectedness can be a problem when prediction of genetic values is done on a national scale and artificial insemination is organised into regions, as is the case for instance for the Montbéliarde and Holstein cattle breeds in France [19,20] or in North-American breeds [3,24,44]. The effect of data structure has been extensively studied in the context of genetic evaluation of animals. Absence of connectedness and poor genealogical information are responsible for biases and loss of accuracy in the prediction of genetic values by an animal or sire model [1,21,44]. However, not much is known about the effect of data structure on the estimation of genetic parameters by an animal model, especially in the presence of maternal effects. Diaz et al. [10] and Eccleston [11] studied the influence of disconnectedness on models with direct effects and found that it would act only on the precision of the estimation. Now, to propose strategies for improvement, it is necessary to assess the relative importance of deviations from the ideal situation. 
The second purpose of this article is to test, by simulation, the influence of data structure on the estimation of genetic parameters for traits subject to direct and maternal effects.

Simulated population
The simulation program was written in Fortran and NAG Libraries were used for all random processes.
As model adequacy can be a real problem in populations under extensive conditions, where data structure and unavailability of efficient techniques can be a constraint for the use of the correct model, we used a known African sheep population [12,13,35] to set some parameters of the simulated population (prolificacy, replacement rate, male/female ratio). Compared to the real population, the number of animals per flock was increased in order to avoid confusion between animal and flock effects. The base population consisted of 1 260 unrelated animals (60 males and 1 200 females) assigned randomly to 20 flocks of 63 animals each (3 males and 60 females). Once the base population was created, the simulation was carried out over 6 years. Each year, random mating (irrespective of the flock of origin) was practised with a ratio of one male for twenty females. The offspring were generated according to a prolificacy of 115%. Each year, 1/3 of the males and 1/5 of the females were replaced by offspring chosen at random. The remaining offspring were discarded so that the number of animals per flock and the number of flocks were constant over time. The average number of offspring per female was equal to 2.7. The data set corresponds to a fully connected population with complete pedigree.
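The base population and the mating ratio described above can be sketched as follows. This is only an illustrative sketch in Python (the original program was written in Fortran); the record layout, the function names and the way females are dealt to males in blocks of twenty are assumptions made for the example, not details taken from the original program.

```python
import random

N_FLOCKS, MALES_PER_FLOCK, FEMALES_PER_FLOCK = 20, 3, 60
FEMALES_PER_MALE = 20  # mating ratio stated in the text

def make_base_population():
    """Return unrelated founders (sire = dam = 0) assigned to flocks."""
    animals, next_id = [], 1
    for flock in range(1, N_FLOCKS + 1):
        for sex, count in (("M", MALES_PER_FLOCK), ("F", FEMALES_PER_FLOCK)):
            for _ in range(count):
                animals.append({"id": next_id, "sire": 0, "dam": 0,
                                "sex": sex, "flock": flock})
                next_id += 1
    return animals

def random_mating(animals, rng):
    """Pair every female with a male, ignoring the flock of origin."""
    males = [a["id"] for a in animals if a["sex"] == "M"]
    females = [a["id"] for a in animals if a["sex"] == "F"]
    rng.shuffle(females)
    # Deal the shuffled females to the males in blocks of twenty (assumption).
    return [(males[i // FEMALES_PER_MALE], dam) for i, dam in enumerate(females)]

base = make_base_population()
matings = random_mating(base, random.Random(1))  # arbitrary seed
```

Each element of `matings` is a hypothetical (sire, dam) pair from which offspring could then be generated.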

Models used for simulating data
The simulated models were similar to those used in Robinson's study [41], with A representing the genetic direct effects, M the genetic maternal effects, R the genetic correlation between direct and maternal effects, and C the maternal environmental effects. Some authors (Hohenboken and Brinks [22], Koch [25], Foulley and Ménissier [17] and Cantet [8]) have shown that a more complex biological model could exist, this model including a non genetic correlation between maternal effects of dams and daughters. Several biometrical models have been proposed to consider this correlation [8,40,41]. We could have used this model in our simulations, but we wanted to limit this work to the models most frequently used for the study of maternal effects. The models and the corresponding (co)variances are presented in Table I.
For the base population (which represents founder parents), random effects were sampled from normal distributions with zero mean and variances corresponding to each random effect. The direct genetic value $A_i$ of individual $i$ was sampled from a distribution $N(0, \sigma^2_{Ao})$, and the maternal genetic value $M_i$ of individual $i$ was simulated using:

$$M_i = r_{AoAm} \times \frac{\sigma_{Am}}{\sigma_{Ao}} \times A_i + \sqrt{1 - r^2_{AoAm}} \times Q_i \times \sigma_{Am}$$

where $r_{AoAm}$ is the genetic correlation between direct and maternal effects and $Q_i$ is a random variable sampled from a standard normal distribution $N(0, 1)$. Since dams were unknown for these animals, when the simulated model included maternal effects, no record was generated for this base population.
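As an illustration of how correlated direct and maternal values can be drawn for founders, the sketch below uses the usual construction in which the maternal value is a regression on the direct value plus an independent standard normal deviate $Q_i$. The function names, the seed and the numerical values (variances of 0.20 and 0.30, i.e. a phenotypic variance of 1) are assumptions chosen for the example.

```python
import math
import random

def sample_founder(sigma_ao, sigma_am, r_aoam, rng):
    """Draw (A_i, M_i) for one founder with genetic correlation r_aoam."""
    a_i = rng.gauss(0.0, sigma_ao)
    q_i = rng.gauss(0.0, 1.0)  # independent standard normal deviate
    m_i = (r_aoam * (sigma_am / sigma_ao) * a_i
           + math.sqrt(1.0 - r_aoam ** 2) * q_i * sigma_am)
    return a_i, m_i

def correlation(pairs):
    """Empirical Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2001)  # arbitrary seed
founders = [sample_founder(math.sqrt(0.20), math.sqrt(0.30), -0.50, rng)
            for _ in range(20000)]
```

With a large number of draws, `correlation(founders)` should be close to the requested value of −0.50 and the variance of the maternal values close to 0.30.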
In real data, the distribution of flock effects was close to a normal distribution. We therefore used a random variable distributed according to $N(0, \sigma^2_t)$ to generate the effect $t_k$ of flock $k$, which was considered as fixed in the variance component estimation model.
Over the successive years, genetic effects of the offspring were calculated as the mid-parent values plus a Mendelian deviation, calculated following the formulae [15]:

$$W_i(o) = R_i \times \sqrt{\tfrac{1}{2}\left(1 - \tfrac{F_p + F_m}{2}\right)} \times \sigma_{Ao} \qquad (1)$$

$$W_i(m) = r_{AoAm} \times \frac{\sigma_{Am}}{\sigma_{Ao}} \times W_i(o) + \sqrt{1 - r^2_{AoAm}} \times R'_i \times \sqrt{\tfrac{1}{2}\left(1 - \tfrac{F_p + F_m}{2}\right)} \times \sigma_{Am} \qquad (2)$$

where $W_i(o)$ and $W_i(m)$ are the Mendelian deviations of the offspring ($i$) for the direct effect (o) and the maternal effect (m), respectively; $R_i$ and $R'_i$ are independent random variables sampled from a standard normal distribution $N(0, 1)$; $F_p$ and $F_m$ are the coefficients of inbreeding of the sire ($F_p$) and the dam ($F_m$), respectively. The calculation of inbreeding coefficients was made using the algorithm proposed by Meuwissen and Luo [32]. Residual effects were simulated for offspring according to $N(0, \sigma^2_E)$ for direct effects and $N(0, \sigma^2_C)$ for the maternal environmental effect. Residuals corresponding to records of dams were independent from residuals corresponding to records of their progeny. Finally, a file of about 9 500 animals with a single record per animal (except for the base population) was obtained, corresponding to six years of simulation.
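The generation of offspring genetic values can be sketched as below. The sketch assumes the standard expression for the Mendelian sampling standard deviation, $\sigma \sqrt{\tfrac{1}{2}(1 - (F_p + F_m)/2)}$, and builds the maternal deviation so that it is correlated with the direct one; the function names and the dictionary layout for parents are hypothetical.

```python
import math
import random

def mendelian_sd(sigma, f_sire, f_dam):
    """Standard deviation of the Mendelian deviation given parental inbreeding."""
    return sigma * math.sqrt(0.5 * (1.0 - (f_sire + f_dam) / 2.0))

def offspring_values(sire, dam, sigma_ao, sigma_am, r_aoam, rng):
    """Return (A, M) of an offspring: mid-parent value plus Mendelian deviation.

    sire and dam are dicts with keys 'a' (direct value), 'm' (maternal value)
    and 'f' (inbreeding coefficient); this layout is an assumption.
    """
    w_o = rng.gauss(0.0, 1.0) * mendelian_sd(sigma_ao, sire["f"], dam["f"])
    w_m = (r_aoam * (sigma_am / sigma_ao) * w_o
           + math.sqrt(1.0 - r_aoam ** 2) * rng.gauss(0.0, 1.0)
           * mendelian_sd(sigma_am, sire["f"], dam["f"]))
    a = 0.5 * (sire["a"] + dam["a"]) + w_o
    m = 0.5 * (sire["m"] + dam["m"]) + w_m
    return a, m

rng = random.Random(15)  # arbitrary seed
sire = {"a": 0.4, "m": -0.1, "f": 0.0}
dam = {"a": -0.2, "m": 0.3, "f": 0.0}
a_off, m_off = offspring_values(sire, dam, math.sqrt(0.20), math.sqrt(0.30), -0.25, rng)
```

With non-inbred parents, the Mendelian variance is half the additive genetic variance, which is the familiar result for an unselected, non-inbred population.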

Values of parameters used in the simulation
Two sets of genetic parameter values were used for the simulations. The first set (called population 1) was intended to reflect genetic parameter values found in the literature for growth traits in cattle and sheep of temperate climates [30,38,47]: 0.20 for direct heritability ($h^2_{Ao}$), 0.30 for maternal heritability ($h^2_{Am}$) and 0.05 for the part of variance due to maternal environmental effects ($c^2$). The second set (called population 2) was chosen to reflect what can be found in countries with high constraints. The values were close to genetic parameters estimated on a Tunisian breed of sheep [5]: 0.05 for $h^2_{Ao}$, 0.10 for $h^2_{Am}$ and 0.25 for $c^2$. The genetic correlation between direct and maternal effects ($r_{AoAm}$) has often been found to be negative or equal to zero in cattle and sheep [2,17,31,34]. Consequently, three values, 0, −0.25 and −0.50, were used for both populations. Seven simulation models were used for each population: model A, including direct effects only; models AMR0, AMR25 and AMR50, including direct and maternal effects under the three alternative values of the genetic correlation; and models AMR0C, AMR25C and AMR50C which, in addition to the direct and maternal genetic effects, considered the maternal environmental effect. Values of variance components are presented in Table II.
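To make the link between these ratios and the variance components explicit, the sketch below converts $h^2_{Ao}$, $h^2_{Am}$, $c^2$ and $r_{AoAm}$ into variances. It assumes the usual decomposition $\sigma^2_P = \sigma^2_{Ao} + \sigma^2_{Am} + \sigma_{AoAm} + \sigma^2_C + \sigma^2_E$ and a phenotypic variance of 1; both are assumptions made for illustration and the resulting numbers are not taken from Table II.

```python
import math

def variance_components(h2_ao, h2_am, c2, r_aoam, var_p=1.0):
    """Convert variance ratios into variance components.

    Assumes var_p = var_ao + var_am + cov_aoam + var_c + var_e
    (hypothetical scale: var_p defaults to 1).
    """
    var_ao = h2_ao * var_p
    var_am = h2_am * var_p
    cov_aoam = r_aoam * math.sqrt(var_ao * var_am)
    var_c = c2 * var_p
    var_e = var_p - var_ao - var_am - cov_aoam - var_c
    return {"var_ao": var_ao, "var_am": var_am, "cov_aoam": cov_aoam,
            "var_c": var_c, "var_e": var_e}

# Population 1 under simulation model AMR50C (r = -0.50)
pop1_amr50c = variance_components(0.20, 0.30, 0.05, -0.50)
# Population 2 under simulation model AMR0C (r = 0)
pop2_amr0c = variance_components(0.05, 0.10, 0.25, 0.0)
```

Under these assumptions a negative genetic correlation lowers the covariance term and therefore raises the residual variance needed to reach the same phenotypic variance.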
Fifty replicates were made for each population and each of the seven models simulated. A distinct seed for the random number generator was set for each replicate. The same seed was used to simulate the genetic mean of flocks in order to limit the variability of samples.

Data analysis
The VCE program [37] was used to estimate genetic parameters by means of REML methodology. Four models were used for analysing the seven simulated data sets for each population. The first three models included direct effects only (model A), maternal and direct genetic effects (model AMR), maternal and direct genetic effects plus maternal environmental effects (model AMRC). In addition, a model including direct genetic and strictly environmental maternal effects (model AC) was used, and this fourth model assumes that maternal effect has no genetic component in the dam. These four analysis models, presented in Table I, were fitted to each of the seven data sets simulated under the genetic assumptions described above for populations 1 and 2. The average and the empirical standard deviation were calculated over the fifty replicates obtained for each model and each population.

Results and discussion
Results are shown in Tables III and IV for populations 1 and 2, respectively. Empirical standard deviations between replicates varied between 0.02 and 0.04 for heritabilities of direct and maternal effects. They were higher for the genetic correlation, particularly when true values tended to zero (AMR0, AMR0C) and when direct and maternal heritabilities were small (population 2).   For both populations, average parameters estimated with the true model (same simulation and analysis models) were very close to true values.

Simulation model A (only direct effects)
When data simulated according to a direct effect model were analysed with a more complex model (models AMR or AMRC), the direct heritability was unbiased and maternal effects (genetic or environmental) were estimated as equal to zero. Genetic correlation could not be estimated, because the maternal genetic variance was equal to zero in most of the cases.

Simulation models AMR0, AMR25, AMR50 (direct and maternal genetic effects)
When the dam effect was neglected (analysis model A) on data simulated according to a model with direct and maternal genetic effects, the direct heritability was overestimated. Estimates of direct heritability could reach more than twice the true value when the genetic correlation was equal to zero ($\hat{h}^2_{Ao}$ = 0.42 for population 1 and 0.13 for population 2). The bias increased as the genetic correlation approached zero. These results agree with those obtained by Waldron et al. [47] or Nasholm and Danell [36] on real data, and by Southwood et al. [43], Robinson [41] or Quintanilla Aguado [39] on simulated data. Results obtained for a selected population are similar [42]. When maternal effects are partially neglected, it is difficult, with an animal model, to distinguish between maternal effects and the contribution of the dam to the genotype of her offspring, the direct genetic variance being inflated by part of the genetic maternal variance. It seems that another part of the maternal genetic variance is included in the residual variance.
When adding an environmental maternal effect (analysis model AC), results were closer to true values: estimated direct genetic and residual variances and the estimated direct heritability decreased, part of the overall variance being accounted for by the added maternal effect. The direct heritability was slightly overestimated ($\hat{h}^2_{Ao}$ = 0.23 for population 1 and 0.07 for population 2) for the simulation model AMR25. For the simulation model AMR50, the direct heritability was equal to the true value for population 2 and slightly underestimated ($\hat{h}^2_{Ao}$ = 0.16) for population 1. The introduction of this non-genetic maternal effect allowed us to take into account a fraction of the genetic maternal effects, which in the previous model was included in the direct genetic and residual variances. However, and particularly for the first population, the estimated environmental maternal variance contained only a part of the genetic maternal variance. Accounting for non-genetic maternal effects does not compensate for the overall overestimation due to the maternal genetic effects being ignored.
With the introduction of both genetic and environmental maternal effects (analysis model AMRC which is an overparameterised model compared to the simulation model) estimates were similar to those estimated with the correct model.

Simulation models AMR0C, AMR25C, AMR50C (genetic direct effects, genetic and environmental maternal effects)
For these more complex simulation models, the direct heritability was overestimated when using analysis model A. Compared to the simulations excluding maternal environmental effects (AMR0, AMR25, AMR50), this overestimation was similar in population 1 but much higher in population 2, because part of the environmental maternal effect seems to be included in the direct genetic variation: when an environmental effect was added to the analysis model (AC) for population 2, direct heritability was no longer overestimated. As before, the bias found on direct heritability depended on the value of the genetic correlation: the overestimation was smaller for a genetic correlation of −0.50, as if the existence of a negative correlation between direct and maternal effects partially compensated for the bias. Hence, as for the cases AMR0, AMR25 and AMR50, we can expect that direct heritability will be even more biased for higher values of the genetic correlation. This is in agreement with the study of Waldron et al. [47] using real data: the genetic correlation estimated with a model including correlated direct and maternal effects varied between 0.09 and 0.30; with a model excluding maternal genetic effects, direct heritability was 1.3 to 3 times higher than the heritability estimated with the full model. Meyer [33] showed that there is a strong negative correlation (from −0.9 to −0.6) between the estimators of genetic maternal variance and direct-maternal covariance. This result shows that each modification of one of the components leads to a variation of the second one in the opposite direction, which could explain why the gap between the true and the estimated direct heritability increases when the genetic correlation tends to positive values.
With the analysis model AMR, for the first population, the direct heritability and the genetic correlation were correctly estimated, but the maternal heritability was overestimated ($\hat{h}^2_{Am}$ = 0.36). For the second population, only the direct heritability was correctly estimated. The estimated maternal heritability was four times higher than the true value and the genetic correlation was very negative ($r_{AoAm}$ = −0.40). The suppression of the common environment effect due to the dam acted on estimated genetic maternal effects by increasing them above their true value, irrespective of the true value of the genetic correlation. In fact, genetic maternal and environmental maternal effects are confounded depending upon the relationship among mothers. The value of maternal heritability estimated by this reduced model corresponded approximately to the maternal heritability increased by the $c^2$ value. Moreover, when environmental maternal effects were important, as in population 2, the large increase in maternal heritability was compensated by a decrease in the genetic correlation. These results agree with those of Waldron et al. [47] and Meyer [34] in cattle and those of Koerhuis and Thompson [26] in broiler chickens. These authors observed a decrease in the maternal heritability of growth traits, by an amount equivalent to the environmental maternal effects, when the latter were included in the model. A strong negative correlation between genetic and environmental maternal variance estimators helps to understand this result.
Generally speaking, a reduced model (with one or several random effects omitted) led to a variable bias, up to more than 50% of the true value, arising from a confusion between different variance components. In contrast, fitting unnecessary random effects yielded neither biased estimates (genetic parameters relative to these effects being either equal to zero or not estimable) nor substantial losses in the precision of estimates. This result differs from that of Meyer [33], who found that unnecessary effects influenced the precision of variance component estimation.
It seems, as pointed out by Cantet et al. [9], that there has been an evolution of estimates for growth traits over the last fifteen years. Estimates of direct heritability tend to increase, whereas those of maternal heritability decrease. The genetic correlation estimates are less negative than before. The overestimation of maternal heritability when environmental maternal effects are omitted should thus be related to the fact that estimates of maternal heritability obtained with older methods, prior to the REML-animal model, tend to be higher than those of more recent works. Indeed, the dam-offspring relationship, which was widely used to estimate genetic parameters before the use of the animal model, contains a common environment effect provided by the dam to her offspring and leads, when this effect is neglected, to an overestimation of the corresponding parameter [16,17,25]. The variance estimated by the dam-offspring relationship also contains a dominance covariance between direct and maternal effects, which can be another source of bias.

Data simulation
Effects of data structure on the quality of parameter estimation were studied using the population simulated in the previous part, the only differences being the level of connectedness and the percentage of sires known. The model used for simulating data included direct effects and genetic and environmental maternal effects. To simplify the model, we assumed that direct and maternal effects were uncorrelated (model AMR0C of the previous part).
Four levels of connectedness among flocks were simulated (connected, disconnected, 1 link sire, 2 link sires), which corresponded to four types of mating. In each situation, there was a ratio of one male for twenty females, and mating occurred randomly. Whatever the design, the offspring stayed in their dam's flock, so connections, when occurring, were only ensured by link sires. By and large, a part of the males and females were randomly mated within-flock (within-flock mating), the other part being randomly mated with reproducers from other flocks (between-flock mating). The number of animals of each type of mating (within-or between-flock) is presented in Table V. In the connected design, the between-flock mating represented 100% of the mating.
In the 2 link sires design, 2 males per flock contributed to connectedness by between-flock mating, the other matings being within flocks. In the 1 link sire design, between-flock mating concerned only 1 male and 20 females per flock. In the disconnected design, mating occurred within flock only. The coefficient of connectedness γ proposed by Foulley et al. [14] was applied in this paper. However, following Hanocq and Boichard [20], it was applied here to measure connectedness between flocks instead of connectedness between sires. With such an approach, γ can be defined as the ratio of the value of the prediction error variance of a contrast between two flocks using the full model to its value calculated under the reduced model. The full model involved both a fixed flock effect and a random sire effect. The reduced model, obtained by excluding the sire effect from the model, represented the optimal statistical situation from a connectedness viewpoint. The value of γ and the percentage of offspring coming from a link sire are presented in Table V. These levels of connectedness concern the direct effects: only the sires can have offspring in different flocks. Connectedness from a maternal viewpoint is only the consequence of the connectedness at the direct level ensured by sires.
Four pedigree structures were used to estimate genetic parameters: 100, 50, 20 or 10% of sires known. To simulate unknown paternity, a given percentage of sire identifications was randomly set to zero. Preliminary simulations on similar simulated populations (results not presented) have shown that below 10% of known sires, the number of different relationships was insufficient to estimate all variance components and estimated parameters depend on the starting value of the variance component estimation programme. With at least 10% of known sires, simulated and estimated parameters were equal.
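The removal of paternity information can be sketched as follows. The pedigree layout (tuples of animal, sire, dam with 0 for an unknown parent), the function name and the rounding of the number of masked sires are assumptions made for the example.

```python
import random

def mask_sires(pedigree, known_fraction, rng):
    """Randomly set sire identifications to zero.

    pedigree: list of (animal, sire, dam) tuples, 0 meaning unknown parent.
    known_fraction: share of currently known sires that stays known (e.g. 0.20).
    """
    with_sire = [i for i, (_, sire, _) in enumerate(pedigree) if sire != 0]
    n_unknown = round(len(with_sire) * (1.0 - known_fraction))
    to_mask = set(rng.sample(with_sire, n_unknown))
    return [(animal, 0 if i in to_mask else sire, dam)
            for i, (animal, sire, dam) in enumerate(pedigree)]

# Toy pedigree: 100 offspring of hypothetical sires 1-5 and dams 6-25
toy = [(100 + i, 1 + i % 5, 6 + i % 20) for i in range(100)]
masked = mask_sires(toy, 0.20, random.Random(10))  # arbitrary seed
```

Dam identifications are left untouched, so only the paternal relationships are removed from the relationship matrix.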
Two situations were modelled. In the first one, the population was genetically homogeneous, whereas in the second one, flocks initially had different genetic means.
Connectedness level and knowledge of paternity corresponded to two different problems. The connectedness level was first established through a mating plan assuming that all sires were known. Once all relationships and performances had been simulated, some sires were randomly sampled and treated as unknown for variance component estimation. Therefore, knowledge of paternity is an additional problem, independent of the population structure, but liable to hide the lack of connectedness.

Flocks with different genetic levels (cases GD10, GD20, GD30)
When flocks had different genetic levels, the genetic variance could be divided into two parts, the within-flock genetic variance ($\sigma^2_{Ao\,within}$ for direct effects and $\sigma^2_{Am\,within}$ for maternal effects) and the between-flock genetic variance ($\sigma^2_{Ao\,between}$ for direct effects and $\sigma^2_{Am\,between}$ for maternal effects). Thus the (direct or maternal) genetic value of an animal of the base population was the sum of its genetic value within the population and the genetic mean of the flock in which it was born. No record was generated for animals of the base population because dams were unknown.
The same flock genetic means were used for all replicates in order to limit sampling variability, after checking that, in spite of the small number of flocks, their distribution was not too different from a normal distribution. For the relative part of within- and between-flock variances, three cases were simulated: direct and maternal within-flock variances were equal to 90% (GD10), 80% (GD20) and 70% (GD30) of the total variance, and direct and maternal between-flock variances were equal to 10% (GD10), 20% (GD20) and 30% (GD30) of the total variance.
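The partition into within- and between-flock genetic variance can be sketched as below for the direct effect in case GD30. The total variance of 0.20, the function names and the seeds are assumptions for illustration; the flock means are drawn once and would be reused across replicates.

```python
import math
import random

def split_genetic_variance(total_var, between_share):
    """Return (within-flock, between-flock) genetic variance."""
    between_var = between_share * total_var
    return total_var - between_var, between_var

def founder_value(flock_mean, within_var, rng):
    """Founder genetic value: flock genetic mean plus within-flock deviate."""
    return flock_mean + rng.gauss(0.0, math.sqrt(within_var))

# Case GD30 for the direct effect, assuming a total variance of 0.20
within_var, between_var = split_genetic_variance(0.20, 0.30)

means_rng = random.Random(7)  # same seed for every replicate (arbitrary value)
flock_means = [means_rng.gauss(0.0, math.sqrt(between_var)) for _ in range(20)]

rep_rng = random.Random(101)  # distinct seed per replicate (arbitrary value)
founders = [founder_value(flock_means[k], within_var, rep_rng)
            for k in range(20) for _ in range(63)]
```

Setting `between_share` to 0 reproduces the homogeneous situation in which all flocks share the same genetic level.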

Same genetic level for all flocks (case GD0)
In this extreme situation, direct and maternal within-flock variances were equal to 100% of the total variance.

Subsequent years
The genetic simulation model was

$$Y_{ijk} = \mu + A_i + M_j + C_j + t_k + E_{ijk}$$

where $Y_{ijk}$ is the record of animal $i$ with dam $j$ in flock $k$; $\mu$ is the phenotypic mean of the population; $A_i$ is the direct genetic value of animal $i$; $M_j$ is the maternal genetic value of dam $j$; $C_j$ is the maternal environmental effect of dam $j$; $t_k$ is the environmental effect of flock $k$; $E_{ijk}$ is the residual.
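A record under this additive model can be generated as in the following sketch, in which the record is the sum of the population mean, the direct genetic value of the animal, the maternal genetic and maternal environmental effects of its dam, the flock effect and a residual. The function name, the dictionary layout for the dam and the numerical values are hypothetical.

```python
import math
import random

def simulate_record(mu, a_i, dam, t_k, var_e, rng):
    """Return one record for animal i with dam j in flock k.

    dam: dict with 'm' (maternal genetic value of the dam) and
         'c' (maternal environmental effect of the dam); assumed layout.
    """
    e_ijk = rng.gauss(0.0, math.sqrt(var_e))  # residual
    return mu + a_i + dam["m"] + dam["c"] + t_k + e_ijk

rng = random.Random(63)  # arbitrary seed
dam = {"m": 0.25, "c": -0.25}
record = simulate_record(mu=10.0, a_i=0.5, dam=dam, t_k=1.0, var_e=0.45, rng=rng)
```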
For offspring, residual effects were randomly generated and direct and maternal genetic values were equal to the average value of the parents plus the Mendelian deviation calculated using formulae (1) and (2), but the value used for the variances $\sigma^2_{Ao}$ and $\sigma^2_{Am}$ varied according to the mating system. The within-family genetic variance (due to meiosis) depends on the gene pool to which it is possible to refer in the base population [15,46]. For a fully connected system, one may consider all flocks as a single population, and consequently, the within-family variance was computed from the total genetic variance in the base population. In contrast, for a completely disconnected system, one may consider each flock as a separate sub-population, and the within-family variance was computed only from the within-flock genetic variance in the base population. For a partially disconnected system (1 or 2 link sires), the situation is intermediate between the previous two. Therefore, in this case, the within-family variance was computed by combining both within- and between-flock genetic variances according to the probabilities of gene origin of the parents considered.
A hundred replicates were run for each type of data set tested, with a distinct seed for the random number generator for each replicate. The same seed was used to simulate the genetic mean of flocks in order to limit the variability of samples.
Values of the variance components and parameters correspond to situation AMR0C of population 1 (Tab. II).

Data analysis
Data were analysed using the model with direct and maternal genetic effects and environmental maternal effects (model AMRC). A fixed flock effect was fitted in all situations. The data sets studied corresponded to the four connectedness designs (connected, 1 or 2 link sires, unconnected), the four levels of pedigree information (100, 50, 20 and 10% of paternity known) and degree of genetic differences among flocks (GD0, GD10, GD20 and GD30).
The average and the empirical standard deviation over replicates were calculated for the variance components: $\sigma^2_E$, $\sigma^2_{Ao}$, $\sigma^2_{Am}$, $\sigma_{AoAm}$ and $\sigma^2_C$.

Results and discussion
Variances and covariance are presented in Tables VI and VII.

Precision of the estimates
As shown in Table VI, and in contrast to what has been observed by several authors for the estimation of genetic parameters [10,11] or for the prediction of genetic value [1,21,45], no clear pattern of reduction of precision related to a lack of connectedness was observed in this study, but it is possible that the number of replicates was insufficient to see an effect of connectedness on the precision of the estimates. However, standard deviation between replicates of covariance increased when genetic difference between flocks became more important.
Regardless of the level of connectedness, the alteration of genealogical information, through a progressive elimination of paternity, affected the precision of variances and covariance, as shown in Table VII. Including the complete pedigree via the relationship matrix allows a better separation, on the one hand, of genetic and environmental effects and, on the other hand, of the genetic effects from one another, and provides greater precision.

Table VI. Estimated parameters with different levels of connectedness for a situation with full pedigree information and same genetic levels between flocks (GD0) or for situations with a genetic difference between flocks equal to 10% (GD10), 20% (GD20) and 30% (GD30) of the overall genetic variance. $\sigma^2_{Am}$: genetic maternal variance; $\sigma_{AoAm}$: genetic covariance between direct and maternal effects; $\sigma^2_C$: part of variance due to maternal environmental effects; $\sigma^2_P$: phenotypic variance. GD0: no genetic difference between flocks. GD10: genetic difference between flocks equal to 10% of the overall genetic variance. GD20: genetic difference between flocks equal to 20% of the overall genetic variance. GD30: genetic difference between flocks equal to 30% of the overall genetic variance.

Table VII. Estimated parameters with different percentages of known paternity for different situations of connectedness and a genetic difference between flocks equal to 30% of the overall genetic variance (GD30). $\sigma^2_{Ao}$: direct genetic variance; $\sigma^2_{Am}$: genetic maternal variance; $\sigma_{AoAm}$: genetic covariance between direct and maternal effects; $\sigma^2_C$: part of variance due to maternal environmental effects; $\sigma^2_P$: phenotypic variance.

Influence of disconnectedness and genetic difference between flocks
With all sires known, and no initial genetic difference between flocks and the connected design (Tab. VI), the means over replicates of estimated variance components were close to true values. A decrease of gene flow across flocks (1 or 2 link sire designs) had no effect on direct genetic variance estimates, but led to a decrease of estimated maternal genetic variance. In the GD0 case, estimated environmental maternal and residual effects remained stable whatever the connectedness level was.
In the connected design, when the genetic difference between flocks increased (GD10, GD20, GD30), estimated direct and maternal genetic variances decreased, but the trend was much more marked for maternal variance, which went down to 77% of the true value for a genetic difference of 30%, whereas direct variance was equal to 95% of the true value. In parallel, estimated maternal environmental variance increased as the genetic difference between flocks grew.
Observed results depended on the connectedness level but also on the replacement rate of the females (each year, 1/5 of the females were replaced). At the beginning of the simulation, the direct and maternal genetic variances corresponded to the within-flock genetic variance. When no connection was generated, the estimated genetic variance remained equal to the within-flock genetic variance. This is what was observed in the disconnected design: estimated direct and maternal genetic variances for the cases GD10, GD20 and GD30 were lower than the true value by approximately 10, 20 and 30%, respectively. We can conclude that part of the genetic variability was eliminated with the flock effect, so that, in the disconnected situation, all of the genetic variability between flocks disappeared. In the three other cases, the situation was different. Connections across flocks were progressively established by way of link sires, which made the genetic differences between flocks estimable and allowed them to be taken into account in the estimation of the overall genetic variance. The latter increased and came closer to the true value. According to Kennedy and Trus [24], relationships across flocks make it possible to reduce the sampling error of the genetic difference between flocks by adding a positive sampling covariance between them. When no link sires exist, neither the between-flock genetic variance nor the between-flock environmental variance is accounted for and, with such a data structure, the animal model is not able to separate the two components [24]. Thus, relationships across flocks, which restore connection, are required to separate fixed and random effects correctly and to estimate between-flock genetic variability. As connectedness was ensured by sires, the gap between true and estimated direct genetic variance decreased rapidly as the simulation progressed.
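The absorption of between-flock genetic variability by the flock effect can be illustrated with a minimal sketch. It is not the simulation used in this study: the function name, the numbers of flocks and animals, the unit total variance and the normal distributions are all illustrative assumptions. Breeding values are drawn so that a fraction `gd` of the total genetic variance lies between flocks, and the variance remaining after centring on flock means is expressed as a share of the total.

```python
import random


def within_flock_share(gd, n_flocks=20, n_per_flock=500, total_var=1.0, seed=0):
    """Pooled within-flock variance of simulated breeding values,
    expressed as a fraction of the total genetic variance.

    gd is the assumed fraction of total genetic variance lying between flocks.
    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    between_sd = (gd * total_var) ** 0.5
    within_sd = ((1.0 - gd) * total_var) ** 0.5

    sum_sq = 0.0
    for _ in range(n_flocks):
        flock_mean = rng.gauss(0.0, between_sd)
        values = [rng.gauss(flock_mean, within_sd) for _ in range(n_per_flock)]
        observed_mean = sum(values) / n_per_flock
        # Centring on the observed flock mean mimics a fixed flock effect
        # that absorbs every difference between flock averages.
        sum_sq += sum((v - observed_mean) ** 2 for v in values)

    # Pooled within-flock variance: one degree of freedom lost per flock mean.
    pooled_var = sum_sq / (n_flocks * n_per_flock - n_flocks)
    return pooled_var / total_var


if __name__ == "__main__":
    for gd in (0.0, 0.1, 0.2, 0.3):
        print(f"GD = {gd:.0%}: within-flock share = {within_flock_share(gd):.2f}")
```

Under these assumptions the share should be close to 1 - `gd`, that is, the part of the genetic variance that remains visible when no link sires tie the flocks together.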
However, our simulation showed that when the genetic difference between flocks was high (GD30), the genetic connectedness achieved over six years was insufficient for a correct estimation of the variance components. Other results (not presented) showed that with a larger number of simulated years, estimated values were closer to the true values (for example, when the simulation was run for three more years, direct and maternal genetic variances were equal to 69.9 and 88.9, respectively, in the connected situation for the GD30 case).
Concerning maternal genetic variance, more time is required for the simulated connected design to become effective. For example, at the end of the simulation, the maternal grand-dams had still passed on 78% of the genes of their flock, and the dams 39%. Therefore, six years were not sufficient to capture the overall genetic variability, and the estimate remained close to the within-flock genetic variance.

Influence of percentage of known paternity and disconnectedness
Because the results were similar across the cases tested, only the GD30 situation is described (Tab. VII), for different levels of connectedness and different percentages of known sires. In the connected population, with all sires known, both direct and maternal genetic variances were underestimated, whereas maternal environmental variance was overestimated. When connectedness was incomplete, estimated genetic variances decreased further, down to 68% of the true value for direct genetic variance and 71% of the true value for maternal genetic variance in the disconnected design, with part of the genetic variability being eliminated with the flock effect. Eliminating part of the paternity information accentuated the under-estimation of direct and maternal variances: when only 10% of sires were known in the disconnected population, estimated direct genetic variance was equal to only 40% of the true value. Thus, disconnecting the system and discarding sires correspond to two different mechanisms (the former due to population structure, the latter to data recording) but lead to the same result in terms of variance component estimation. Recording an incomplete pedigree can therefore mask a connectedness problem. Whatever the percentage of known sires, when the connectedness level decreased, the estimated covariance between direct and maternal genetic effects increased. Estimated maternal environmental variance was higher than its true value, except in the disconnected designs, where this component was unbiased.
It seems that, with disconnectedness and incomplete sire identification, some additional variability is attributed to the dam-offspring genetic covariance, and direct and maternal heritabilities are under-estimated. These results are in accordance with Gerstmayr's observations [18]: when one of the heritability estimates (direct or maternal) increases, the dam-offspring genetic correlation decreases, and vice versa.
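The role of paternal information can be made explicit with the phenotypic variance and the covariances between relatives expected under the standard quantitative genetic model with direct and maternal effects. These expressions are textbook results given as a reminder, not formulas taken from this study. They assume no inbreeding, unrelated mates and no environmental covariance between dam and offspring; the residual variance symbol \sigma^2_{E} is an assumption, since it is not defined in the table legends.

```latex
\begin{align*}
\sigma^2_{P} &= \sigma^2_{Ao} + \sigma^2_{Am} + \sigma_{AoAm} + \sigma^2_{C} + \sigma^2_{E} \\
\operatorname{Cov}(\text{paternal half-sibs}) &= \tfrac{1}{4}\sigma^2_{Ao} \\
\operatorname{Cov}(\text{maternal half-sibs}) &= \tfrac{1}{4}\sigma^2_{Ao} + \sigma_{AoAm} + \sigma^2_{Am} + \sigma^2_{C} \\
\operatorname{Cov}(\text{dam},\text{offspring}) &= \tfrac{1}{2}\sigma^2_{Ao} + \tfrac{5}{4}\sigma_{AoAm} + \tfrac{1}{2}\sigma^2_{Am}
\end{align*}
```

Only the paternal half-sib covariance isolates the direct genetic variance. When sires are unknown this contrast is lost, and the remaining covariances mix direct, maternal and covariance components, which is consistent with variability shifting between components when paternity is discarded.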
The ideal situation in which all flocks have the same genetic level is probably rather rare, in particular when the animals can have various genetic origins. The threshold beyond which the number of known sires becomes sufficient to obtain an unbiased and relatively precise estimate depends on the data structure of the studied population. For extensive systems, one reliable solution could be the use of DNA markers. The conditions for successful DNA fingerprinting depend not only on the cost of the method but also on the breeding system of the animals (number of males, number of animals per flock, grazing area, etc.). Barnett [4] recently showed that the cost of applying these methods to determine maternal pedigree in Australian flocks of Merino sheep for the prediction of genetic values was higher than the return from extra productivity.

CONCLUSION
The aim of this article was to study some of the difficulties in genetic parameter estimation. The adequacy of the estimation model is of particular interest since a reduced model leads to biased estimates. The size of the bias depends on the true values of the genetic parameters. When maternal heritability is low, excluding the dam effect does not affect the estimates much. However, estimated direct heritability can be more than doubled when maternal effects with a strong influence on the trait are ignored. The bias was accentuated when the true genetic correlation was close to zero. When maternal effects of environmental origin have little influence on the trait, as has been reported in the literature for temperate climates, the consequences for the estimates are only minor. However, when the part of variance due to maternal environmental effects reached 0.20, estimates of the other parameters were biased.
Data structure can affect the precision of variance component estimates. Insufficient paternal genealogical information increases the empirical standard deviations of the estimates between replicates, whereas insufficient genetic connectedness does not seem to affect the precision of the estimates. Data structure can also affect the unbiasedness of variance component estimates: estimates are biased by the absence of genetic connection and by unknown paternity. The fact that flocks have different genetic means greatly accentuates the bias, so the use of link sires to establish connections is a major concern. Finally, extreme cases in which sires are totally unknown or there is no genetic connection between flocks can make parameters non-estimable.
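The two criteria used here, bias (mean over replicates compared with the true value) and precision (empirical standard deviation between replicates), can be computed as in the following minimal sketch. The function name, the dictionary keys and the example numbers are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def summarise_replicates(estimates, true_value):
    """Summarise replicate estimates of one variance component.

    Returns the mean over replicates, the mean as a percentage of the true
    value, and the empirical standard deviation between replicates.
    """
    if len(estimates) < 2:
        raise ValueError("at least two replicates are needed")
    if true_value == 0:
        raise ValueError("true_value must be non-zero")
    avg = mean(estimates)
    return {
        "mean": avg,
        "percent_of_true": 100.0 * avg / true_value,
        "empirical_sd": stdev(estimates),
    }


if __name__ == "__main__":
    # Made-up replicate estimates, for illustration only (not results of the study).
    made_up = [9.0, 11.0, 10.0, 8.0, 12.0]
    print(summarise_replicates(made_up, true_value=12.5))
```

A percentage well below 100 with a small standard deviation would indicate a biased but precise estimate, whereas a percentage near 100 with a large standard deviation would indicate an unbiased but imprecise one.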
However, while data structure and analysis model affect the quality of the estimation, some situations will not be greatly affected. A bias of 10%, for example, will not be a problem for estimating genetic parameters, whereas it will have serious effects on the prediction of genetic values in a selection scheme.
It might be useful, under such conditions, to consider the application of DNA fingerprinting for pedigree determination.
Even if the animal model has the capacity to thoroughly describe genealogical relationships, the model of analysis and the structure of the animal population must be carefully controlled to obtain correct variance component estimates. In conclusion, the animal model is able to correctly separate variance components provided that all the necessary information is available.
These results were obtained in simplified simulation conditions. With real data, the problem becomes more complex and there are several additional causes of bias, due in particular to incorrect definition of the model, which are likely to interact. The statistical model can be inadequate, for example, if (co)variances are not well described, as in the case of discrete traits or heterogeneous variances. A good definition of the biological model is important because direct and maternal effects can interact depending on environmental conditions. Finally, the genetic model can be different from or more complex than the models used in the present experiment. This is the case, for example, when trait expression is governed by a limited number of genes, in a mixed inheritance situation, or when dominance and epistatic effects are present.