Connectedness in the French Holstein cattle population

An analysis of connectedness in the French Holstein cattle population was carried out. This study was motivated by the fact that artificial insemination (AI) bulls are evaluated at the national level, whereas they are usually progeny tested only in the region of their AI stud. Connectedness among AI studs was measured by the generalised coefficient of determination (CD) of contrasts between mean breeding values of bulls from the different AI studs. Four connectedness components were distinguished. The relative influence of each component was assessed through the increase in prediction error variance (PEV) of the contrasts after this information was discarded. CDs of contrasts were always higher than 0.80. Therefore, connectedness level among AI studs was high and provided an accurate national genetic evaluation. Out of the different components of connectedness, withdrawing of proven bull connection data caused the greatest increase in PEV (+47.5 %) primarily due to the change in the connecting structure of the data. Genetic relationships among bulls were the next important source of information. In contrast, contributions from the planned use of the bulls progeny tested were quantitatively limited (8 % increase in PEV) and foreign semen had a minor contribution (2 % increase in PEV). However, in spite of its limited quantitative impact compared to the other components, planned sampling bull connectedness is recommended because it provides high quality data for model validation and bias investigations. © Inra/Elsevier, Paris


INTRODUCTION
In dairy cattle, artificial insemination (AI) has enabled breeding programmes to develop and expand geographically. As a result, exchanges of germplasm have increased, leading to intense national and even international competition between breeding companies. In most cases, exchanges are determined by the genetic level of populations evaluated in different environmental conditions. In this context, the animal model BLUP became the most widely used method in the 1990s, because it provides the most likely ranking of animals [9]. However, this ranking is correct only if the design is well connected and provides unbiased and accurate comparisons between animals and between subpopulations in different environments, i.e. herds, regions or countries. In terms of connectedness, two problems could lead to an incorrect ranking of animals distributed in different environments. On the one hand, quantitatively limited genetic ties between candidates may cause inaccurate contrast predictions, as simulated by Hanocq et al. [8]. On the other hand, connecting data may be affected by uncontrolled factors, such as preferential treatment, and produce biased estimates.
In French dairy cattle, the extensive use of artificial insemination (over 90 %) a priori suggests that there are sufficient genetic ties among herds to allow across-herd genetic evaluation. A single genetic evaluation is performed for the whole country in order to provide a unique ranking of animals at the national level [1]. However, as each AI stud is specifically attached to a geographical region, sampling bulls (i.e. young bulls being progeny tested with their first crop of daughters) are mostly progeny tested within this region. Some connections are created through the exchange of proven bull semen between regions, but they are not controlled quantitatively or qualitatively. Although the quality of the design has been continuously improved since 1986 with the addition of planned connections from sampling bulls, it is worth studying the accuracy of contrasts of sampling bulls from different AI studs and, therefore, the connectedness of the overall design.
Connectedness was first an all-or-none statistical concept developed by Bose [2] followed by Eccleston and Hedayat [3]. As developed by Searle [15], a connected design is a design in which it is possible to estimate all contrasts.
Foulley et al. [5,6] proposed a continuous measure of connectedness suited to the very unbalanced designs commonly encountered in animal breeding and extended the connectedness measure to random effects. Recently, various criteria have been proposed to measure connectedness for a given contrast. To study the influence of including a factor in a model, Foulley et al. [5,6] proposed to express the prediction error variance (PEV) of a contrast relative to that observed in a reduced model without this factor. Laloë [13] extended the concept of coefficient of determination to any contrast between random effects. Kennedy and Trus [12] described connectedness as a gene flow between subpopulations, or in terms of relationships between animals distributed across levels of fixed effect. The purpose of this paper was to verify the assumption that connectedness among AI studs is sufficient to provide a reliable overall genetic evaluation in the French Holstein cattle population. Various components of connectedness were identified and their contribution to the overall connectedness among AI studs was determined.

MATERIALS
Milk yield records were extracted from the French national evaluation files of the Holstein breed. First lactation data recorded in 1993 (467 947 lactations) were analysed to study the connectedness between the five most important French AI studs. Only AI bulls were considered and classified into three groups according to their birth year and origin: proven bulls, foreign bulls or sampling bulls (i.e. young bulls being progeny tested with their first crop of daughters). Sampling bulls (1 091), proven bulls (150) and foreign bulls (33) were required to have at least 20, 200 and 200 daughters, respectively, with performance recorded in 1993. Eighty-four per cent of the females included in this analysis were born from proven bulls, 11.0 % from sampling bulls and 5.3 % from foreign sires.
Each AI stud is attached to a geographical area defined by the insemination activity of its membership co-operatives. As a consequence, its sampling bulls were mostly progeny tested within this area. Nevertheless, AI studs were connected to each other through link records. Later on in this paper, a link record is defined as a performance of a female recorded in a region different from the region of the stud of her sire. Link records contributed 21.5 % of the whole data set. Five main sources of connectedness were identified.
-Connections due to proven bulls. As proven bull semen was widely spread over the whole country, many daughters were recorded out of the region of origin of their sire. As shown in table I, exchanges were not fully balanced, reflecting differences in genetic superiority and marketing policy of each AI stud. For instance, exchange was heavier between AI studs A and C and between AI studs B and D than between others. Globally, proven bull link records were quantitatively very important (15.5 % of the total data set and 71.9 % of all link records). Consequently, these link data contributed largely to AI stud comparison and to overall connectedness.
-Connections due to sampling bulls. To a much smaller extent, sampling bull daughters were also spread over the whole country (table I). These links resulted from planned semen exchanges between AI studs, in order to voluntarily improve the connectedness of the progeny test programme and to test bulls simultaneously in several regions. Since 1986 and according to a national agreement, each AI stud exchanged with the others at least 10 % of the semen of at least 20 % of the sampling bulls. In practice, exchanges were more numerous, owing to specific collaborations between some AI studs and because of female trade. Sampling bull link records represented 0.8 % of the total data set and 3.6 % of all link records.
-Connections due to foreign bulls. Daughters sired by foreign bulls were spread over the whole country. They represented from 3 to 12 % of the females in each region, and 5.3 % on average (table I). They contributed 24.5 % of all link records.
-Connections arising from genetic relationships among bulls. The sampling bulls of different AI studs might have common ancestors. These genetic ties, considered through bull sires and maternal grand-sires, were taken into account through the relationship matrix $A$. The mean values of the elements, within and between AI studs, are shown in table II. Values of within as well as between AI stud coefficients were quite high because of the reduced number of origins, particularly the number of bull sires.
-Connections due to other relationships between females and particularly dam-daughter relationships. In total, 9 825 cows with a performance in another region than their dam were found in the data set. However, these connections arising from the sales of females and embryos, were not investigated in this study because they would have required more than 1 year of data and they cannot be studied with a sire model.
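The definition of a link record given above can be illustrated with a minimal sketch. The function name, the region labels and the toy records are hypothetical and serve only to show how the share of link records in a data set could be counted; they are not taken from the national files.

```python
def link_record_share(records, stud_region):
    """Fraction of records made outside the region of the sire's AI stud.

    records: list of (sire_stud, record_region) pairs (hypothetical layout).
    stud_region: dict mapping each AI stud to its home region.
    """
    links = sum(1 for stud, region in records if stud_region[stud] != region)
    return links / len(records)


# Toy data (assumed for illustration): two studs, each attached to one region.
stud_region = {"A": "north", "B": "south"}
records = [
    ("A", "north"),
    ("A", "north"),
    ("A", "south"),  # link record: daughter recorded outside region of stud A
    ("B", "south"),
    ("B", "north"),  # link record: daughter recorded outside region of stud B
]
share = link_record_share(records, stud_region)
print(f"share of link records: {share:.2f}")
```

The same count, restricted to the daughters of proven, sampling or foreign bulls, would give the contribution of each bull category to the link records.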

METHODS
This study measured the degree of connectedness between AI studs and the relative contribution of each component. To reach this goal, several methods could be used, each closely associated with a statistical model. The most straightforward model is the model used in genetic evaluation, i.e. an animal model applied to three lactations and to the complete data set spanning 25 years and describing the whole selection process. However, with such a complex data set, disentangling the connecting components and measuring their respective weight appeared to be impossible. A smaller data set reduced to first lactations and to 1 year of recording (1993), analysed using a sire model, made it possible to focus on the information of interest, i.e. the major links across AI studs. Two models might be considered, with or without genetic groups. A model with fixed group effects assumes different subpopulations and would provide [...]

The model was written as:

$$y = Xb + Zs + e \qquad (1)$$

where $y$ is the vector of records; $b$ is the vector of fixed effects; $s$ is the vector of random effects of the sampling bulls and their ancestors, assumed to be normally distributed, $s \sim N(0, A\sigma_s^2)$, with $A$ being the numerator relationship matrix and $\sigma_s^2$ the sire variance. Genetic relationships among sampling bulls were considered through their sire and maternal grand-sire. The heritability $h^2$ was assumed to be 0.25; $e$ is the vector of random residuals, assumed to be normally distributed, $e \sim N(0, I\sigma_e^2)$, with $\sigma_e^2$ being the residual variance; $X$ and $Z$ are incidence matrices.
Note that some bulls might appear both as proven sires and as sires or maternal grand-sires of sampling bulls. These bulls were considered twice in the model, once as a fixed effect, in order to study the impact of the link records of their daughters, and a second time as a random effect, in order to study the impact of relationships. This simple approach easily enabled the different components of connectedness to be distinguished, although it did not provide a totally exact picture of the true situation.
From equation (1), the PEV matrix relative to the sampling bulls is proportional to:

$$C = (Z'MZ + \lambda A^{-1})^{-1} \qquad (2)$$

where $M$ is the absorption matrix for fixed effects, equal to $I - X(X'X)^{-}X'$, and $\lambda = (4 - h^2)/h^2$ (= 15). $C$ can be partitioned into four blocks as follows:

$$C = \begin{pmatrix} C_{TT} & C_{TA} \\ C_{AT} & C_{AA} \end{pmatrix} \qquad (3)$$

where $C_{TT}$ and $C_{AA}$ are blocks corresponding to the sampling bulls and their ancestors, respectively. In order to measure connectedness among the sampling bulls, analysis can be restricted to $C_{TT}$.
For bull $k$, the individual CD was computed as:

$$CD_k = 1 - \lambda \frac{\{C_{TT}\}_{kk}}{\{A\}_{kk}}$$

where $\{C_{TT}\}_{kk}$ and $\{A\}_{kk}$ are its corresponding diagonal coefficients of the matrices $C_{TT}$ and $A$, respectively. When relationships were voluntarily partially or completely omitted from the analysis, an incorrect relationship matrix was used ($A^*$ instead of $A$). As a consequence, the inverse matrix corresponding to the sampling bulls in equation (3), obtained from $C^* = (Z'MZ + \lambda A^{*-1})^{-1}$, i.e. $C^*_{TT}$, does not directly lead to the PEV. The PEV were obtained from the sampling bull block of

$$C^* \left( Z'MZ + \lambda A^{*-1} A A^{*-1} \right) C^*$$

instead of $C_{TT}$ [10].
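The computation of individual CDs can be sketched on a toy design. The sketch below assumes that herd is the only fixed effect to absorb, that the sires are unrelated ($A = I$), and that $\lambda = (4 - h^2)/h^2$; the function names and the daughter counts are hypothetical and do not come from the data analysed here.

```python
def invert(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting for numerical safety.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def absorbed_sire_matrix(counts):
    """Z'MZ after absorbing herd effects; counts[s][h] = daughters of sire s in herd h."""
    n_s, n_h = len(counts), len(counts[0])
    herd_tot = [sum(counts[s][h] for s in range(n_s)) for h in range(n_h)]
    zmz = [[0.0] * n_s for _ in range(n_s)]
    for s in range(n_s):
        for t in range(n_s):
            zmz[s][t] = -sum(counts[s][h] * counts[t][h] / herd_tot[h]
                             for h in range(n_h) if herd_tot[h] > 0)
        zmz[s][s] += sum(counts[s])
    return zmz


def individual_cd(counts, a_inv, a_diag, h2=0.25):
    """CD_k = 1 - lambda * C_kk / A_kk with C = (Z'MZ + lambda * A^-1)^-1."""
    lam = (4 - h2) / h2
    zmz = absorbed_sire_matrix(counts)
    n = len(zmz)
    lhs = [[zmz[i][j] + lam * a_inv[i][j] for j in range(n)] for i in range(n)]
    c = invert(lhs)
    return [1 - lam * c[k][k] / a_diag[k] for k in range(n)]


# Toy design (assumed): 2 unrelated sires, 10 daughters each in each of 2 herds.
counts = [[10, 10], [10, 10]]
identity = [[1.0, 0.0], [0.0, 1.0]]
cds = individual_cd(counts, a_inv=identity, a_diag=[1.0, 1.0])
print([round(v, 4) for v in cds])
```

With only two sires compared within the same herds, a large part of the information is spent on estimating herd effects, which is why the CDs in this toy case are far lower than those of the real population.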

Connectedness criteria
The connectedness, measured with the CD criterion proposed by Laloë [13], was expressed for a given contrast $x's$ as follows:

$$CD(x) = \frac{Var(x's) - Var(x's \mid \hat{s})}{Var(x's)} = 1 - \lambda \frac{x'C_{TT}x}{x'A_{TT}x}$$

where $Var(x's \mid \hat{s})$ is the variance of the contrast given the predicted sire effect and equivalent to the PEV of the contrast. The PEV of the contrast ($x'C_{TT}x$) is related to its maximum theoretical value in the population ($x'A_{TT}x$). The closer to 1 the value of CD, the greater the accuracy of the comparison.
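As a minimal sketch, the CD of a contrast can be evaluated as $1 - \lambda\, x'C_{TT}x / x'A_{TT}x$ once $C_{TT}$ and $A_{TT}$ are available. The function names and the 2 x 2 matrices below are assumptions made for illustration (two unrelated sires, $\lambda = 15$); they are not values from this study.

```python
def quad_form(x, m):
    """Quadratic form x' M x for a vector x and a square matrix m (lists)."""
    n = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(n) for j in range(n))


def contrast_cd(x, c_tt, a_tt, lam=15.0):
    """CD of the contrast x's: 1 - lambda * x'C_TT x / x'A_TT x."""
    return 1.0 - lam * quad_form(x, c_tt) / quad_form(x, a_tt)


# Assumed toy matrices: C_TT is the inverse of [[25, -10], [-10, 25]], A_TT = I.
c_tt = [[25 / 525, 10 / 525], [10 / 525, 25 / 525]]
a_tt = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, -1.0]  # contrast between the two sires
cd = contrast_cd(x, c_tt, a_tt)
print(round(cd, 4))
```

Because the prediction errors of the two sires are positively correlated in this toy case, the contrast is predicted more accurately than either sire taken individually.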
To analyse connectedness among AI studs, two kinds of contrast $x's$ were defined. For AI studs $i$ and $j$, with $n_i$ and $n_j$ sampling bulls, respectively, the vector of contrast $x$ was defined with terms $1/n_i$ and $-1/n_j$ for bulls belonging to AI stud $i$ and $j$, respectively, and 0 otherwise. The vector of contrast between a particular AI stud $i$ and all the others jointly was defined with terms $1/n_i$ for bulls belonging to AI stud $i$, and $-1/\sum_{j \neq i} n_j$ otherwise.

The effect of connectedness on the CD of contrasts was analysed by comparing these CDs under the full model and a reduced model without herd effect. Under the full model, the CD reflected both the amount of data and the connectedness of the design. Under the reduced model, the design was fully connected and the CD of contrast between AI studs, called optimal CD, reached its maximum value and was only influenced by the amount of data involved.

The impact of each source of connectedness was analysed by comparing the PEV of the contrasts, after excluding the link records of interest ($PEV_R$; R for reduced data set), with its value calculated with all data included ($PEV_F$; F for full data set). It was expressed through the variance ratio $r$:

$$r = \frac{PEV_R - PEV_F}{PEV_F}$$

In contrast to Foulley et al. [6], the notations F (for full) and R (for reduced) characterise the data set used instead of the model. By using the criterion $r$, emphasis is laid on the increase in prediction error variance, or equivalently, on the decrease in accuracy, relative to the reference situation, with all data. The criterion $r$ can be expressed as a function of $CD_F$ and $CD_R$ computed for the full and reduced data set, respectively [7]:

$$r = \frac{CD_F - CD_R}{1 - CD_F}$$

However, exclusion of link data affected both the amount and the structure of information. For a fair comparison, the effect of data structure should be measured for a constant amount of data. Therefore, another analysis was carried out on a reduced data set obtained after randomly removing within each bull a number of records equal to the number of link records. Given that the data set is large enough, the random exclusions of data affected only the quantity of information, whereas the expected structure of the design was statistically unchanged. Several replicates were performed for random exclusion for each component and results were very similar over replicates.
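The two kinds of contrast vectors and the criterion $r$ can be sketched as follows. The function names and the stud labels are hypothetical. The criterion is taken here, as an assumption, to be the relative increase in PEV, $(PEV_R - PEV_F)/PEV_F$, which equals $(CD_F - CD_R)/(1 - CD_F)$ when both CDs refer to the same relationship matrix.

```python
def pairwise_contrast(stud_of_bull, i, j):
    """Contrast between the mean breeding values of sampling bulls of studs i and j."""
    n_i = stud_of_bull.count(i)
    n_j = stud_of_bull.count(j)
    return [1 / n_i if s == i else -1 / n_j if s == j else 0.0
            for s in stud_of_bull]


def one_vs_rest_contrast(stud_of_bull, i):
    """Contrast between stud i and all other studs taken jointly."""
    n_i = stud_of_bull.count(i)
    n_rest = len(stud_of_bull) - n_i
    return [1 / n_i if s == i else -1 / n_rest for s in stud_of_bull]


def relative_pev_increase(cd_full, cd_reduced):
    """r = (PEV_R - PEV_F) / PEV_F written with CDs (assumed form, same A in both)."""
    return (cd_full - cd_reduced) / (1 - cd_full)


# Hypothetical allocation of six sampling bulls to three AI studs.
studs = ["A", "A", "B", "C", "C", "C"]
x_pair = pairwise_contrast(studs, "A", "C")
x_rest = one_vs_rest_contrast(studs, "B")
r = relative_pev_increase(cd_full=0.85, cd_reduced=0.80)
print(x_pair, x_rest, round(r, 3))
```

Both vectors sum to zero, as required for a contrast, and the same functions apply to any grouping of bulls, since the grouping enters only through the definition of $x$.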
Comparison to a theoretical design

In practice, typical values of CD of contrast were illustrated by the definition of a simple theoretical design providing the same CD values. This simple design involved only unrelated sampling bulls in two AI studs, each attached to its own region. Numbers of bulls per AI stud, total number of daughters per bull and distribution of daughters across the regions were the three parameters influencing the CDs of contrasts. Numbers of bulls per AI stud were known (table I). The total number of daughters per bull ($n$) was the number of daughters required to reach the optimal CD of contrast derived from the real population, assuming the AI stud effect to be known (table III). The distribution of daughters across the regions, assuming sampling bulls having $n_1$ daughters in the region of their own AI stud and $n_2$ daughters in the other one ($n_1 + n_2 = n$), was then deduced from the true CD of contrast. The proportion of link records was computed as $n_2/n$.

RESULTS

Table IV presents the average individual sampling bull CDs within AI stud. They varied from 0.71 to 0.76 under the full model, and from 0.74 to 0.78 under the reduced fully connected model. These individual CDs were obtained with 38 (AI stud D) to 51 (AI stud A) actual daughters per bull, whereas they corresponded to 34-47 effective daughters only. This loss in accuracy was in agreement with the expected cost of herd effect estimation, owing to the limited contemporary group size, and was not related to connectedness. Because of the rather large size of sampling bull batches (93-362, according to the AI stud), individual CDs would not have been greatly affected even in a fully disconnected situation.
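The correspondence between a CD and an effective number of daughters can be sketched with the classical relation for an unrelated sire with $n$ daughters and no fixed effect to estimate, $CD = n/(n + \lambda)$, with $\lambda = (4 - h^2)/h^2$. This relation and the function names are assumptions made for illustration; the conversion actually used for the values reported here is not stated, so the sketch need not reproduce them exactly.

```python
def cd_from_daughters(n, h2=0.25):
    """CD of an unrelated sire with n daughters, ignoring fixed effects (assumption)."""
    lam = (4 - h2) / h2
    return n / (n + lam)


def effective_daughters(cd, h2=0.25):
    """Number of daughters giving the same CD under the relation CD = n / (n + lambda)."""
    lam = (4 - h2) / h2
    return lam * cd / (1 - cd)


n_eff = effective_daughters(0.75)
print(n_eff, cd_from_daughters(n_eff))
```

Under these assumptions a CD of 0.75 corresponds to 45 effective daughters, which is of the same order as the numbers quoted for the individual CDs.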
CDs of contrasts calculated for each AI stud pair (table V) [...] (table III), i.e. much more than the actual number of daughters. This result is due to the multiple sources of connections in the true situation, whereas only sampling bull link records are involved in the theoretical design.
Given the effective numbers of daughters computed above, the theoretical distribution of the daughters across regions which provided a similar decrease was computed. Thus, the simplified design which had the same characteristics as the real population was a design where the percentage of daughters with performance out of the region of their sire varied from 33 % (AI stud B) to 21 % (AI stud C). As a conclusion, the accuracy of the contrast between AI studs corresponds to the accuracy of a progeny test with 100-155 daughters, out of which 21-33 % are link records.
The various sources of connectedness were successively or simultaneously omitted. Their overall influence, including the effect of both the amount and structure of data, was measured by the impact of their omission on the PEVs of genetic difference estimates between AI studs. Results, shown in table VI, were very similar for all AI studs. The exclusion of proven bull connections resulted in the highest increase in PEV (47.5 %). Withdrawing sampling bull connections resulted in an increase in PEV of 8 %, whereas the increase was limited to 2 % when foreign semen data were excluded. When all connections between the AI studs (including pedigree) were excluded, the PEV increase was found to be 223 %. Of course, in this situation, the PEV reached its maximum theoretical value and corresponded to a zero CD. Nevertheless, computation of PEV was feasible because of the prior information on sampling sires, accounted for in the model as a random effect. Rao [14] showed that $Z'MZ + \lambda A^{-1}$ is always a positive-definite matrix and, therefore, invertible.
This increase in PEV was then attributed either to changes in the quantity or the connecting structure of the data. Results were fairly similar from one AI stud to another. A proportion of 89 % of the overall increase in PEV, observed when proven bull connections were omitted, was attributed to the change in the connecting structure of the design and only 11 % of the increase was due to a reduction in the amount of information. For sampling sires, the results were completely opposite. Of the overall effect of the omission of these data, 32 % were attributed to the change in the structure and 68 % to the change in the quantity of data. Of the overall increase in PEV, found by simultaneously excluding all components of connectedness, 83 % were due to the change in the connecting structure of the data.
The study of connectedness resulting from genetic relationships among bulls was carried out in two successive stages. First, only relationships between sampling bulls owned by different AI studs were removed from the analysis, while genetic relationships within AI organisation were still considered. As shown in table VII, an increase in PEV of around 39 % was observed, varying from 35 to 45 %. In contrast, when all between and within AI stud genetic relationships were excluded, PEV increased by only 7 to 15 % according to the AI stud, and 10 % on average.

Connectedness in the Holstein breed in France
Most individual CDs were at least 0.70. The effective numbers of daughters corresponding to the individual CDs were quite close to the real values and these differences (four daughters on average) could be almost entirely attributed to the cost of contemporary group effect estimation. Theoretically, individual CDs also depend on connectedness, i.e. on distribution of daughters over regions. However, owing to the quite high number of sampling bulls per AI stud, each bull is compared to a reasonably large subpopulation and the practical effect of connectedness on individual CDs is very limited. As a consequence, the individual CD is not a useful criterion to assess connectedness level. Every CD calculated for contrasts between AI stud pairs and for contrasts between a single AI stud and the others jointly was at least 0.80. In terms of connectedness, the actual design is equivalent to a theoretical design with 100-150 daughters per bull, of which 21-33 % are link records. It is interesting to note that the effective numbers of daughters corresponding to these CDs of contrasts between AI studs are much greater than the actual numbers of daughters. This emphasises that this high accuracy of contrasts depends on the whole design and all connectedness components, and not only on sampling bull link records.
These results suggest that accuracy is sufficient to allow comparisons of the genetic level of the AI studs in France. None of the AI studs appears to stand out in any way from the rest of the population. Therefore, although the national breeding programme involves several AI organisations, the connectedness of the design is sufficient to run a unique sire evaluation with a high theoretical accuracy.
For a more accurate description of connectedness among AI studs, the various sources of connectedness were analysed separately. From a quantitative point of view, proven bull connections provide the highest contribution to overall connectedness. Foreign semen is of minor importance. The contribution of sampling bull connections is of intermediate value. The weight of each source is not directly related to its amount of data. For instance, the relative influence of sampling bull link records on overall connectedness is higher than suggested by the limited number of data involved.
The impact of the different sources of connectedness is due to the quantity of the data, and to the connecting structure of these data. In these terms, proven and sampling bull connections play completely opposite roles. As shown previously, influence on accuracy was essentially due to the connecting structure of data in the former case, and to the amount of data in the latter case. There are two main reasons for this difference. First, excluding proven bull link records does not affect the primary information used to evaluate the sampling bulls, which still have the same number of daughters, whereas excluding sampling bull link records logically decreases their number of daughters. Second, semen exchanges are essentially bilateral for sampling bulls, whereas proven bull semen is distributed at a national level, allowing for a better connecting structure. Depending on the AI stud examined, the exclusion of foreign semen data had a variable but limited impact on the connecting structure of data. Indeed, use of foreign semen is very unbalanced from one AI stud to another and is generally restricted to a class of breeders who contribute little to progeny testing. Genetic relationships among bulls within and between AI studs had opposite effects. Connectedness increases with relationships between groups, whereas it decreases when the within-group relationship increases. This conclusion is in agreement with Kennedy and Trus [12] and Hofer [11]. Relationships between sires of a given group mean that genetic values of animals are correlated and lead to a decrease in the effective group size.
The fact that the computation of the PEV was feasible when excluding all connections between the AI studs (including pedigree) illustrates that connectedness may have a different significance when the effect of interest (sampling bull effect in this paper) is considered as random or fixed. In the case of a random effect, prior information (on expectation, variance and distribution of breeding values) is taken into account in addition to the amount and the structure of data, and as explained in detail by Foulley et al. [5], connectedness in the sense of estimability is always ensured, which can be a source of inaccuracy or bias if results are interpreted without caution [8].
From this study it can be said that in the Holstein breed in France, quantitative connectedness among AI studs is sufficient to run a national genetic evaluation covering all AI organisations. Genetic differences between AI studs are estimated with a high theoretical accuracy. This is absolutely necessary to ensure the overall efficiency of a selection scheme [8]. As a matter of fact, a high accuracy in estimation allows the highest overall selection intensity and therefore optimal gene flows across AI studs. Despite guarantees of quantitative aspects of the connectedness among AI studs, any evaluation of animals from different origins needs to be validated through post-analysis of the quality of the connecting data. In order to study this aspect, Hanocq [7] measured on the same data set the impact on the estimates of excluding sources of connectedness. He showed that none of the components studied, including foreign or French proven bulls whose use is not controlled for regional effect or preferential treatments, generated biased evaluations of sampling bulls. This finding agrees with the analyses of the distribution of the residuals by regions within sire origin.
In a more general way, the international situation can be considered as an extension of a national situation, by replacing AI studs or regions by countries. The increase in the international trade of genetic material from dairy cattle has intensified competition between countries and it is based on a unique ranking of bulls of different origins. The methods used in the present study could be readily applied to the international level.

Overall considerations on studies of connectedness

Connectedness was measured using two complementary approaches: i) based on the CD criterion characterising the accuracy of a contrast [13]; and ii) based on the comparison of PEV in two situations (complete or reduced by the factor studied) in a similar way to that of Foulley et al. [6]. The choice of criteria used to measure connectedness depends upon the purpose of the study and the structure of the population. In this paper, two criteria were used to describe first the absolute level of accuracy and second the relative impact on accuracy of a given component. It is very important to consider both these aspects simultaneously as illustrated by Hanocq et al. [8]. The CD approach was used to describe the first aspect because it is a very well-known concept providing easily interpretable values. The absolute level of accuracy being characterised, the PEV appears to be very convenient to measure a relative change in accuracy.
In this paper, the model used to describe connectedness was highly simplified in comparison to the animal model used in the official evaluation, but for the purpose of this study, this simplicity did not affect the results to any great extent. The model would not have provided reliable estimates of breeding values but it allowed a good measure of the real connectedness between AI studs. In fact, the use of the animal model would make it possible to account for a larger number of relationships, but many of these additional relationships are within herds. In terms of connectedness, the most important relationships are already accounted for in a sire model. The study focused on connectedness among the sampling bulls being tested, for which accurate selection is a key factor for the efficiency of the breeding programme. In this paper, bulls were grouped only through the definition of the contrasts used to measure connectedness. Thus, no assumption was made on connectedness between animals when allocating them to the same AI stud. This allowed a more realistic description of the effective connectedness. As in Hanocq [7], this flexible strategy could be extended to any contrast definition, for instance within AI stud, across herds, years, small regions or levels of fixed effects in the model. In contrast, if the bulls had been grouped through groups explicitly included in the model of analysis, the study of connectedness between AI studs would have been carried out under the unverified assumption of a perfect connection of the animals within a group.