Spatial modelling improves genetic evaluation in smallholder breeding programs

Background: Breeders and geneticists use statistical models to separate genetic and environmental effects on phenotype. A common way to separate these effects is to model a descriptor of the environment, such as a contemporary group or herd, and to account for genetic relationships between animals across environments. However, separating the genetic and environmental effects in smallholder systems is challenging because herds are small and genetic connectedness across herds is weak. We hypothesised that accounting for spatial relationships between nearby herds can improve genetic evaluation in smallholder systems. Furthermore, geographically referenced environmental covariates are increasingly available and could model the underlying sources of spatial relationships. The objective of this study was therefore to evaluate the potential of spatial modelling to improve genetic evaluation in dairy cattle smallholder systems.

Methods: We performed simulations and analysed real dairy cattle data to test our hypothesis. We modelled environmental variation by estimating herd and spatial effects. Herd effects were considered independent, whereas spatial effects had distance-based covariance between herds. We compared these models using pedigree or genomic data.

Results: The results show that in smallholder systems (i) standard models do not separate genetic and environmental effects accurately, (ii) spatial modelling increases the accuracy of genetic evaluation for phenotyped and non-phenotyped animals, (iii) environmental covariates do not substantially improve the accuracy of genetic evaluation beyond simple distance-based relationships between herds, (iv) the benefit of spatial modelling is largest when separating the genetic and environmental effects is challenging, and (v) spatial modelling is beneficial when using either pedigree or genomic data.

Conclusions: We have demonstrated the potential of spatial modelling to improve genetic evaluation in smallholder systems. This improvement is driven by establishing environmental connectedness between herds, which enhances the separation of genetic and environmental effects. We suggest routine spatial modelling in genetic evaluations, particularly for smallholder systems. Spatial modelling could also have a major impact in studies of human and wild populations.
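The contrast drawn in the Methods between independent herd effects and spatial effects with distance-based covariance can be illustrated with a minimal sketch. It assumes an exponential covariance function purely for illustration, because the covariance function used in the study is not specified in this text. The herd coordinates, the variance and the range parameter are hypothetical.

```python
import numpy as np

def exponential_covariance(coords, variance, range_):
    """Covariance between herds that decays with Euclidean distance.

    Assumption: an exponential kernel, chosen only for illustration.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return variance * np.exp(-dist / range_)

# Hypothetical coordinates: two pairs of neighbouring herds, far from each other.
coords = np.array([[0.00, 0.00], [0.01, 0.00], [1.00, 1.00], [1.00, 1.01]])

# Independent herd effects: diagonal covariance, nothing is shared between herds.
herd_cov = 1.0 * np.eye(len(coords))

# Spatial effects: neighbouring herds are strongly correlated, distant herds barely.
spatial_cov = exponential_covariance(coords, variance=1.0, range_=0.1)

print(np.round(spatial_cov, 3))
```

With a diagonal covariance, a herd with few records is informed only by its own animals. With a distance-based covariance, it also borrows information from neighbouring herds, which is the environmental connectedness referred to in the Conclusions.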

Table S1 shows the average rank correlation for the top 100 individuals between the true and the estimated or predicted breeding values for all levels of genetic connectedness, using pedigree or genomic data, when the herd locations were simulated from a bivariate normal distribution with mean equal to the village locations and variance $3.5 \cdot 10^{-4} I_{2 \times 2}$ (intermediate herd clustering).

Table S1: Average rank correlation for the top 100 individuals over 60 replications for the different levels of genetic connectedness, using pedigree or genomic markers, and for both estimated breeding values (EBV) and predicted breeding values (PBV). The standard error had order of magnitude $10^{-2}$.

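The herd-location simulation described in this supplement (a bivariate normal distribution centred on each village, with covariance equal to a scalar variance times the 2×2 identity matrix) can be sketched as follows. The three variances are the ones quoted in the text. The number of villages, the number of herds per village, the village coordinates and the function name are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def simulate_herd_locations(village_xy, herds_per_village, variance, rng):
    """Draw herd coordinates around each village from N(village, variance * I)."""
    cov = variance * np.eye(2)
    locations = [
        rng.multivariate_normal(mean=village, cov=cov, size=herds_per_village)
        for village in village_xy
    ]
    return np.vstack(locations)

# Hypothetical setup: 5 villages in the unit square, 20 herds per village.
village_xy = np.random.default_rng(1).uniform(0.0, 1.0, size=(5, 2))
herds_per_village = 20

# Variances quoted in the text for the three herd-clustering scenarios.
clustering_variance = {"strong": 1e-4, "intermediate": 3.5e-4, "weak": 9e-4}

herd_xy = {
    label: simulate_herd_locations(
        village_xy, herds_per_village, var, np.random.default_rng(2)
    )
    for label, var in clustering_variance.items()
}

# Mean distance of herds from their village centre, per scenario.
centres = np.repeat(village_xy, herds_per_village, axis=0)
spread = {
    label: float(np.linalg.norm(xy - centres, axis=1).mean())
    for label, xy in herd_xy.items()
}
print(spread)
```

A smaller variance keeps herds close to their village centre (strong clustering), whereas a larger variance spreads them out (weak clustering).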

Changing proportion of spatial variance
Here we show the average accuracy and CRPS between TBV and EBV or PBV for all levels of genetic connectedness, using both pedigree and genomic data, when varying the proportion of spatial variance relative to the sum of spatial variance and herd effect variance. The herd locations were simulated from a bivariate normal distribution with mean equal to the village locations and variance $3.5 \cdot 10^{-4} I_{2 \times 2}$ (intermediate herd clustering). Table S2 and Table S3 show accuracy and CRPS, respectively, for weak genetic connectedness. Table S4 and Table S5 show accuracy and CRPS, respectively, for intermediate genetic connectedness. Table S6 and Table S7 show accuracy and CRPS, respectively, for strong genetic connectedness.

Table S6: Average accuracy for EBV and PBV with strong genetic connectedness, using pedigree or genomic data, with varying proportion of spatial variance. The standard error for some values had order of magnitude $10^{-2}$, and most had $10^{-3}$.

Changing the herd clustering
Table S8 and Table S9 show average accuracy and CRPS, respectively, between TBV and EBV/PBV for all levels of genetic connectedness, using both pedigree and genomic data, when the herd locations were simulated from a bivariate normal distribution with mean equal to the village locations and variance $1 \cdot 10^{-4} I_{2 \times 2}$ (strong herd clustering). Table S10 and Table S11 show average accuracy and CRPS, respectively, between TBV and EBV/PBV for all levels of genetic connectedness, using both pedigree and genomic data, when the herd locations were simulated from a bivariate normal distribution with mean equal to the village locations and variance $9 \cdot 10^{-4} I_{2 \times 2}$ (weak herd clustering).

Table S10: Average accuracy for the different levels of genetic connectedness for EBV and PBV, using pedigree or genomic data, and the herd locations simulated using variance $9 \cdot 10^{-4} I_{2 \times 2}$ (weak herd clustering). The standard error for some values had order of magnitude $10^{-2}$, and most had $10^{-3}$.

Table S11: Average CRPS for the different levels of genetic connectedness for EBV and PBV, using pedigree or genomic data, and the herd locations simulated using variance $9 \cdot 10^{-4} I_{2 \times 2}$ (weak herd clustering). The standard error for all values had order of magnitude $10^{-3}$.

Correlation between true spatial effect and EBV with changing herd clustering
Table S12 shows the average correlation between the EBV and the true spatial effect for all levels of genetic connectedness, using pedigree or genomic data, when the herd locations were simulated from a bivariate normal distribution with mean equal to the village locations and variance $1 \cdot 10^{-4} I_{2 \times 2}$ (strong herd clustering). Table S13 shows the same correlation when the variance was $3.5 \cdot 10^{-4} I_{2 \times 2}$ (intermediate herd clustering). This is an extended version of the table in the main paper, with model GHSC included. Table S14 shows the same correlation when the variance was $9 \cdot 10^{-4} I_{2 \times 2}$ (weak herd clustering).

Table S12: Average correlation between EBV and true spatial effect for all levels of genetic connectedness, using pedigree or genomic data, and the herd locations simulated using variance $1 \cdot 10^{-4} I_{2 \times 2}$ (strong herd clustering).

Table S14: Average correlation between EBV and true spatial effect for all levels of genetic connectedness, using pedigree or genomic data, and the herd locations simulated using variance $9 \cdot 10^{-4} I_{2 \times 2}$ (weak herd clustering).
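The summary statistics reported in these tables (accuracy, rank correlation for the top 100 individuals, and the continuous ranked probability score, CRPS) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code. Accuracy is taken as the Pearson correlation between true and estimated breeding values. The top 100 animals are selected by true breeding value, which the text does not specify. CRPS uses the closed form for a Gaussian predictive distribution, which assumes that each estimate is summarised by a posterior mean and standard deviation. All function names and the toy data are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def _ranks(x):
    """Ranks 0..n-1 of the values in x (valid when there are no ties)."""
    return np.argsort(np.argsort(x))

def accuracy(tbv, ebv):
    """Pearson correlation between true and estimated breeding values."""
    return float(np.corrcoef(tbv, ebv)[0, 1])

def top_rank_correlation(tbv, ebv, n_top=100):
    """Spearman correlation among the n_top animals with the highest TBV.

    Assumption: the top animals are chosen by TBV.
    """
    top = np.argsort(tbv)[::-1][:n_top]
    return float(np.corrcoef(_ranks(tbv[top]), _ranks(ebv[top]))[0, 1])

def gaussian_crps(y, mean, sd):
    """CRPS of a N(mean, sd^2) prediction for the observed value y.

    Closed form: sd * (z * (2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi)),
    with z = (y - mean) / sd. Lower is better.
    """
    z = (y - mean) / sd
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    two_cdf_minus_one = _erf(z / math.sqrt(2.0))
    return sd * (z * two_cdf_minus_one + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

# Toy data: noisy estimates of simulated true breeding values.
rng = np.random.default_rng(0)
tbv = rng.normal(size=500)
ebv = tbv + rng.normal(scale=0.5, size=500)

print("accuracy:", accuracy(tbv, ebv))
print("top-100 rank correlation:", top_rank_correlation(tbv, ebv))
print("mean CRPS:", float(np.mean(gaussian_crps(tbv, ebv, 0.5))))
```

Accuracy and rank correlation use only the point estimates, whereas CRPS also penalises a predictive distribution that is too narrow or too wide. This may explain why the supplement reports both.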

Real data
For the models G, GH, GS and GHS applied to the full real data, we present the DIC in Table S15.
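For reference, the deviance information criterion (DIC) is commonly defined as below. This is the standard textbook form, and the software used for the real-data analysis may compute the effective number of parameters $p_D$ slightly differently.

```latex
\begin{align}
  D(\theta) &= -2 \log p(y \mid \theta), \\
  p_D &= \bar{D} - D(\bar{\theta}), \\
  \mathrm{DIC} &= \bar{D} + p_D ,
\end{align}
```

Here $\bar{D}$ is the posterior mean of the deviance and $\bar{\theta}$ is the posterior mean of the parameters. A lower DIC indicates a better trade-off between model fit and complexity.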