The accuracy of DGV cannot be assessed in the training set but must be assessed in a sample of individuals that are not included in training. Multi-fold cross-validation in beef cattle has some advantages described below in comparison to partitioning the genotyped animals into two groups (old and young animals), with training in older animals and validation in younger animals, which is the usual approach in dairy cattle studies [17, 18]. Using multi-fold cross-validation, the DGV can be obtained for all genotyped animals in validation sets, while large training sets can be retained.
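The multi-fold scheme described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `cross_validation_splits`, the animal identifiers and the fold assignments are all hypothetical. Each fold is held out once as the validation set while the remaining folds form the training set, so every genotyped animal receives a DGV from a model that was not trained on it.

```python
from typing import Dict, Iterator, List, Tuple


def cross_validation_splits(
    fold_of: Dict[str, int],
) -> Iterator[Tuple[int, List[str], List[str]]]:
    """Yield (fold, training_ids, validation_ids), holding out each fold in turn."""
    for fold in sorted(set(fold_of.values())):
        validation = [animal for animal, f in fold_of.items() if f == fold]
        training = [animal for animal, f in fold_of.items() if f != fold]
        yield fold, training, validation


# Hypothetical animals assigned to three folds (illustrative values only).
fold_of = {"a1": 0, "a2": 0, "a3": 1, "a4": 1, "a5": 2, "a6": 2}
splits = list(cross_validation_splits(fold_of))

# Every animal appears in exactly one validation set.
validated = sorted(animal for _, _, val in splits for animal in val)
```

With three folds of two animals each, every training set retains four of the six animals, which illustrates how the training sets stay large while all animals are validated.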
Habier et al.  showed that the accuracies of DGV depend on both genetic relationships between individuals in the training and validation sets and on linkage disequilibrium (LD) between markers and quantitative trait loci (QTL). They showed that the accuracy of DGV for a selection candidate decreases as the average genetic relationship of the candidate to the training set individuals decreases. In beef cattle, many registered selection candidates are produced by natural mating sires, which may be distantly related to the individuals in training sets. If the accuracies of DGV are more dependent on genetic relationships than on marker-QTL LD, then the effectiveness of genomic selection will be limited in practice for such distantly-related selection candidates. Saatchi et al.  showed that conservative accuracies of DGV that are less affected by relationships can be obtained by minimizing the genetic relationships between training and validation sets using K-means clustering. In this study, we also used K-means clustering and found greater amax values between groups than reported by Saatchi et al.  for Angus beef cattle. This indicates that the genetic relationships were greater between training and validation sets for the Limousin and Simmental populations used here than for the Angus population used by Saatchi et al. .
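As a rough illustration of how relatedness between groups might be summarized, the sketch below computes, for each animal, the largest additive relationship to any animal in a different group, and then averages these maxima. This section does not define amax, so treating it as this per-animal maximum is an assumption; the relationship matrix, the group labels and the function name `amax_between_groups` are hypothetical.

```python
from typing import List, Sequence


def amax_between_groups(
    relationships: Sequence[Sequence[float]], groups: Sequence[int]
) -> List[float]:
    """For each animal, the largest relationship to any animal in another group."""
    n = len(groups)
    result = []
    for i in range(n):
        others = [relationships[i][j] for j in range(n) if groups[j] != groups[i]]
        result.append(max(others) if others else 0.0)
    return result


# Hypothetical additive relationships among four animals in two groups.
A = [
    [1.00, 0.50, 0.25, 0.00],
    [0.50, 1.00, 0.125, 0.00],
    [0.25, 0.125, 1.00, 0.50],
    [0.00, 0.00, 0.50, 1.00],
]
groups = [0, 0, 1, 1]

per_animal = amax_between_groups(A, groups)
mean_amax = sum(per_animal) / len(per_animal)
```

A clustering that separates families well yields small values of this summary, whereas random assignment to groups would typically leave close relatives on both sides of the training-validation split.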
In simulation studies, the correlation between DGV and true breeding values (TBV) has been used to represent the accuracy of DGV. However, in field data, TBV are not available and the correlation between DGV and the response variable (phenotype records, EBV, DEBV, etc.) is commonly used to derive the accuracy of DGV. In dairy cattle, the correlation between DGV and DEBV (or EBV) is a good estimate of the accuracy of DGV, because the reliabilities of DEBV are high. However, in beef cattle, for which the reliabilities of DEBV are usually low (less than 0.70, Table 2), these correlations typically underestimate the accuracy of DGV due to the contribution of environmental effects and random error to the DEBV. In some studies, the correlation between the DGV and the response variable is divided by the square root of the average reliability of the response variable [17, 20] to adjust for the underestimation of the accuracy. However, this adjustment does not consider the heterogeneous error variance that is associated with DEBV when they have different reliabilities in the validation animals, which may lead to bias. Saatchi et al.  standardized the covariance between DGV and DEBV by an estimate of the genetic variance to estimate the accuracy of DGV in American Angus beef cattle. In this study, estimates of genetic variance were not available to apply that method. Instead, the estimate of the genetic correlation of a trait with its DGV was used to estimate the accuracy of DGV, as the square of these correlations represents the proportion of genetic variance accounted for by the genomic information if the DGV has heritability 1.
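The two quantities contrasted above can be written out in a few lines. This is a sketch under stated assumptions rather than the analysis code of the study: the function names and all numerical inputs are hypothetical, and the variance components passed to `genetic_correlation` are assumed to come from a bivariate animal model fitted elsewhere.

```python
import math
from typing import Sequence


def adjusted_accuracy(correlation: float, reliabilities: Sequence[float]) -> float:
    """Divide a DGV-response correlation by the square root of the mean reliability."""
    mean_reliability = sum(reliabilities) / len(reliabilities)
    return correlation / math.sqrt(mean_reliability)


def genetic_correlation(cov_g: float, var_trait: float, var_dgv: float) -> float:
    """Genetic correlation from a genetic covariance and two genetic variances."""
    return cov_g / math.sqrt(var_trait * var_dgv)


# Hypothetical inputs: a raw correlation of 0.40 with response reliabilities of 0.64.
adjusted = adjusted_accuracy(0.40, [0.64, 0.64])

# Hypothetical variance components from a bivariate analysis.
r_g = genetic_correlation(cov_g=3.0, var_trait=9.0, var_dgv=4.0)
proportion_explained = r_g ** 2
```

The first function uses only the average reliability, so validation animals with very different reliabilities are treated alike, which is the limitation noted above. The second depends only on estimated genetic (co)variances, and its square is interpreted as the proportion of genetic variance explained when the DGV has a heritability of 1.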
In general, the DGV accuracies obtained here are lower than those reported for dairy cattle for traits with similar heritabilities [17, 18, 21]. Saatchi et al.  also reported that accuracies of DGV using the BovineSNP50 BeadChip were lower in Angus beef cattle than in dairy cattle. One reason is that the accuracies of EBV (used to derive the DEBV response variable) are lower in beef cattle than in dairy cattle, because artificial insemination is used less extensively and beef bulls therefore have fewer progeny with production records [22, 23]. Estimates of SNP effects and resulting DGV will be more accurate as the accuracies of EBV (or DEBV) increase, because the response variable will be closer to the TBV. In dairy cattle, the accuracies of DEBV in the training set are much higher than in beef cattle and the number of animals with high accuracy DEBV is greater. The average accuracy of the EBV for traits studied by Su et al.  was 0.89, compared to 0.57 and 0.52 in our Limousin and Simmental populations. The size of the training population is an important factor affecting the accuracies of DGV , and training populations are typically larger in dairy than in beef cattle [17, 18, 21]. Furthermore, it has been common for dairy cattle studies to validate DGV on progeny , and progeny are more highly related to the training population than the validation animals are in the situation we have created here with K-means clustering. Also, different extents and patterns of LD have been reported for beef and dairy cattle [24, 25], which could contribute to the lower accuracy of DGV reported here. The different approaches used to measure the DGV accuracies could also explain these differences.
Estimates of variances and of covariances between traits and their respective DGV obtained in this study indicate that the heritabilities of the DGV were 1 for most traits in both the Limousin and Simmental populations. Heritabilities of 1 are expected for perfectly inherited attributes, such as SNP genotypes or linear functions of SNP genotypes. However, heritabilities less than 1 (ranging from 0.75 to 0.95) were estimated in Angus cattle using a similar K-means clustering and cross-validation approach . In that study, the complete numerator relationship matrix among individuals of all clustered groups was used in the bivariate animal model, rather than a matrix with zero covariances between animals that are in different groups. By using the full relationship matrix, the heritability of the DGV was underestimated in  because the linear functions that predict DGV were different for each group. Using the complete numerator relationship matrix also caused estimates of the trait heritability from bivariate analyses to be biased downwards () compared to the values used in national evaluations (, Table 2). This downward bias was removed when the DEBV were used to estimate heritability in a single trait model, i.e. ignoring the correlated DGV. Setting the relationships between animals in different groups to zero results in the derivative of the likelihood function being pooled from the derivatives of the likelihood functions that would be obtained from separate analyses of each group. Furthermore, when setting relationships between groups to zero, the heritability of the DGV is depressed only by SNP genotyping errors and the heritability of the DEBV is essentially the same as that obtained from single trait analyses of DEBV. Zeroing relationships between groups gives a block-diagonal structure to the inverse of the variance-covariance matrix and avoids any cross-products between the DEBV used to derive an individual’s DGV and the individual’s DGV. Such cross-products introduce some residual covariance between DEBV and DGV, but these covariances are assumed to be zero in the bivariate model used here. We believe that this approach makes better use of the data than could be achieved by estimating the genetic correlation between DEBV and DGV separately in each group and then pooling those estimates.
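The zeroing of relationships between groups can be illustrated with a small sketch. The matrix, the group labels and the function name `zero_between_groups` are hypothetical, and the example shows only the matrix manipulation, not the bivariate animal model itself.

```python
from typing import List, Sequence


def zero_between_groups(
    relationships: Sequence[Sequence[float]], groups: Sequence[int]
) -> List[List[float]]:
    """Keep within-group relationships and set between-group relationships to zero."""
    n = len(groups)
    return [
        [relationships[i][j] if groups[i] == groups[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical numerator relationships for four animals in two clustered groups.
A = [
    [1.00, 0.50, 0.25, 0.00],
    [0.50, 1.00, 0.125, 0.00],
    [0.25, 0.125, 1.00, 0.50],
    [0.00, 0.00, 0.50, 1.00],
]
groups = [0, 0, 1, 1]

A_blocked = zero_between_groups(A, groups)
```

When animals are ordered by group, the result is block-diagonal, and the inverse of a block-diagonal matrix is itself block-diagonal, so no cross-products arise between animals in different groups.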
The estimated trait-DGV genetic correlations varied between traits due to different quantities of information and possibly different genetic architectures between the traits (Table 5). Hayes et al.  showed that the accuracy of genomic predictions is higher for traits with a higher proportion of large effect loci than for traits with no loci of large effect. Furthermore, the LD between BovineSNP50 BeadChip loci and QTL could differ between traits and between breeds. The difference in the trait-DGV genetic correlations was relatively small between low and high heritability traits due to the use of DEBV as the response variable, which makes accuracies less dependent on heritability itself and more a function of the EBV accuracies. In general, estimates of trait-DGV genetic correlations were higher in Limousin than in Simmental animals (the averages across traits were 0.55 and 0.50 for Limousin and Simmental, respectively). This may be because registered Limousin animals have a more homogeneous genetic background than the Simmental animals, as the amax values between groups were higher for Limousin than for Simmental. Both these US associations allow registration of crossbreds with other beef cattle breeds but upgrading and composite cattle are more common for the American Simmental Association than for the North American Limousin Foundation.
The estimated trait-DGV genetic correlations reported here for Limousin and Simmental animals were lower than those reported for Angus beef cattle by Saatchi et al.  for most traits that were common to both studies. This could be due to the different selection strategies practiced within Angus compared to Continental breeds or due to differing allele frequencies among the breeds, which could affect the extent of LD between markers and causative genes and consequently the accuracies of DGV. Also, larger training population sizes (about 3570 total genotyped animals) were used in Angus . Using 1006 Angus animals genotyped with a proprietary 384 SNP panel developed by Igenity (Duluth, GA), MacNeil et al.  reported a lower trait-DGV genetic correlation for marbling (0.38) than obtained here in Limousin and Simmental animals (correlations of 0.65 and 0.63, respectively, Table 5). The different SNP panels used for the prediction of DGV and the different training and validation populations largely explain these differences. In the commercial implementation of genomic prediction in these breeds, training will use all genotyped animals from each breed to predict DGV for selection candidates, which usually are young animals without phenotype records. Thus, higher correlations than reported here are expected due to the larger training datasets (from all genotyped animals) and closer genetic relationships between training and implementation populations.
In beef cattle, birth weight is typically the only observation on a young bull at the time of castration when a decision is made to retain the bull. While the animal’s birth weight may contribute to EPD calculated for weaning weight direct and yearling weight in a multi-trait analysis performed before selection, in general, parent average information is the main source of information available for selecting young animals. The efficiencies of selection based on DGV in comparison to selection based on parent average information (Table 6) indicate that selection on DGV was more efficient than selection on parent average information for some traits only. In general, the parent averages used here have higher reliabilities than parent averages that are available at the time of birth of their progeny because they include information on progeny and grandprogeny that would not normally exist at the time of selection. Also, the available parent averages are not adjusted for selection to account for the Bulmer effect. Bijma  showed that the accuracy of parent average is dramatically reduced by selection, by up to a factor of three, which is ignored when computing reliabilities in national genetic evaluations. This leads to an underestimation of the efficiency of selection on DGV relative to selection on parent average when the DGV accuracies obtained by cross-validation are compared to parent average accuracies obtained from national genetic evaluation.
Reliabilities of parent average are lower and so the efficiencies of selection on DGV versus parent average are higher for non-genotyped young animals, which represent the selection candidates. The distribution of parent average reliabilities exhibits considerable variation within and between traits and therefore there is considerable variation in efficiencies of selection on DGV versus parent average (see Additional file 1). Variation in parent average reliabilities within a trait reflects the difference in reliabilities between natural mating sires that have few progeny and AI sires that can have many progeny. Unlike in the dairy industry, natural mating is widely used in the beef industry. These factors lead to bimodal distributions of parent average reliabilities for some traits. Even for traits with efficiencies smaller than 1, there are opportunities for selection on DGV at least for that fraction of the population sired by natural mating sires. The proportions of registrations from natural mating sires were about 63% among Limousin calves in 2011 (Bob Weaber, K-State University, personal communication) and about 40% among Simmental calves (Wade Shafer, American Simmental Association, personal communication). This means that selecting on DGV offers greater benefits to breeders who use natural mating sires in their herds.
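The dependence of selection efficiency on parent average reliability can be made concrete with a short sketch. It assumes that efficiency is the ratio of the DGV accuracy to the parent average accuracy, with the latter taken as the square root of the parent average reliability; this definition, the function name `selection_efficiency` and all numerical values are assumptions made for illustration and are not taken from Table 6.

```python
import math


def selection_efficiency(dgv_accuracy: float, pa_reliability: float) -> float:
    """Ratio of DGV accuracy to parent average accuracy (sqrt of PA reliability)."""
    if pa_reliability <= 0.0:
        raise ValueError("parent average reliability must be positive")
    return dgv_accuracy / math.sqrt(pa_reliability)


# Hypothetical DGV accuracy and two hypothetical parent average reliabilities.
dgv_accuracy = 0.50
natural_mating_sire = selection_efficiency(dgv_accuracy, pa_reliability=0.16)
ai_sire = selection_efficiency(dgv_accuracy, pa_reliability=0.36)
```

With these illustrative inputs, the efficiency exceeds 1 for the calf of a natural mating sire and falls below 1 for the calf of a widely used AI sire, which mirrors the argument that DGV can be useful for part of the population even when the average efficiency for a trait is below 1.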
Here, we deliberately tried to minimize the relationship between training and validation set individuals by K-means clustering to establish lower bounds for prediction accuracies. In practice, training will be performed on all genotyped animals and predictions will be implemented in young animals (creating higher genetic relationships between training and validation or implementation sets) and so even higher efficiencies for selecting on DGV are expected. However, parent average information (and/or the animal’s own records) could be blended with DGV information for selecting young animals, for example using a selection index approach . Saatchi et al.  combined the DGV and adjusted parent average information in a selection index for 16 traits in Angus beef cattle and showed that the accuracies of blended information are equal to or slightly higher than the accuracies of the most informative source of information (either DGV or parent average).
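A simple two-source selection index shows why blending can only match or exceed the better of the two sources. The sketch below is not the index used by Saatchi et al.; it assumes standardized predictors whose prediction errors are independent, so that the correlation between DGV and parent average equals the product of their accuracies. The function name `blended_accuracy` and the input values are hypothetical.

```python
import math


def blended_accuracy(r_dgv: float, r_pa: float) -> float:
    """Accuracy of an optimal index of two predictors with independent errors.

    Assumes corr(DGV, PA) = r_dgv * r_pa, i.e. the predictors are correlated
    only through the true breeding value.
    """
    shared = r_dgv * r_pa
    if shared >= 1.0:
        return 1.0  # both sources are already perfect predictors
    numerator = r_dgv ** 2 + r_pa ** 2 - 2.0 * shared ** 2
    return math.sqrt(numerator / (1.0 - shared ** 2))


# Hypothetical accuracies for DGV and parent average.
combined = blended_accuracy(0.60, 0.50)
dgv_only = blended_accuracy(0.60, 0.0)
```

Under this assumption the blended accuracy is never below the larger of the two inputs, and the gain is small when one source is much more accurate than the other. If DGV and parent average share information beyond the true breeding value, the gain from blending would be smaller than this sketch suggests.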
Progeny testing is a strategy that is commonly used to increase the accuracy of the predicted genetic merit of selection candidates but it increases generation intervals from 2.5 years when using DGV to about 5.5 years . Furthermore, progeny testing increases the cost of breeding operations and, in cattle, is in practice limited to males, which potentially can have many progeny, while DGV can be obtained for females with the same accuracies as for males. The availability of DGV creates new opportunities for both commercial Limousin and Simmental producers to identify superior animals in their herds. To date, this opportunity has been limited to seed stock animals enrolled in performance-recording programs. The beef industry will need to continue recording phenotypes to retrain genomic predictions, because their accuracies will otherwise erode in successive generations [20, 29].
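The trade-off between accuracy and generation interval can be sketched with the standard expression for annual response, which is proportional to accuracy divided by generation interval when selection intensity and genetic standard deviation are held equal. The equal-intensity assumption, the function names and the progeny-test accuracy of 0.90 are hypothetical; only the generation intervals of 2.5 and 5.5 years come from the text.

```python
def relative_response_per_year(accuracy: float, generation_interval: float) -> float:
    """Annual response up to a constant (selection intensity times genetic SD)."""
    return accuracy / generation_interval


def break_even_accuracy(
    competing_accuracy: float, own_interval: float, competing_interval: float
) -> float:
    """Accuracy needed to match the annual response of the competing strategy."""
    return competing_accuracy * own_interval / competing_interval


# Hypothetical progeny-test accuracy; generation intervals as given in the text.
progeny_test_accuracy = 0.90
needed_dgv_accuracy = break_even_accuracy(
    progeny_test_accuracy, own_interval=2.5, competing_interval=5.5
)
```

Under these assumptions, a DGV accuracy above the break-even value would give a faster annual response than progeny testing, before any difference in cost or in the availability of predictions for females is taken into account.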