Genetic parameters and genomic breeding values for digital dermatitis in Holstein Friesian dairy cattle: host susceptibility, infectivity and the basic reproduction ratio

Background For infectious diseases, the probability that an animal gets infected depends on its own susceptibility, and on the number of infectious herd mates and their infectivity. Together with the duration of the infectious period, susceptibility and infectivity determine the basic reproduction ratio of the disease (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ R_{0} $$\end{document}R0). \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ R_{0} $$\end{document}R0 is the average number of secondary cases caused by a typical infectious individual in an otherwise uninfected population. An infectious disease dies out when \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ R_{0} $$\end{document}R0 is lower than 1. Thus, breeding strategies that aim at reducing disease prevalence should focus on reducing \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ R_{0} $$\end{document}R0, preferably to a value lower than 1. In animal breeding, however, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ R_{0} $$\end{document}R0 has received little attention. Here, we estimate the additive genetic variance in host susceptibility, host infectivity, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ R_{0} $$\end{document}R0 for the endemic claw disease digital dermatitis (DD) in Holstein Friesian dairy cattle, and estimate genomic breeding values (GEBV) for these traits. We recorded DD disease status of both hind claws of 1513 cows from 12 Dutch dairy farms, every 2 weeks, 11 times. The genotype data consisted of 75,904 single nucleotide polymorphisms (SNPs) for 1401 of the cows. We modelled the probability that a cow got infected between recordings, and compared four generalized linear mixed models. All models included a genetic effect for susceptibility; Models 2 and 4 also included a genetic effect for infectivity, while Models 1 and 2 included a farm*period interaction. We corrected for variation in exposure to infectious herd mates via an offset. Results GEBV for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ R_{0} $$\end{document}R0 from the model that included genetic effects for susceptibility only had an accuracy of ~ 0.39 based on cross-validation between farms, which is very high given the limited amount of data and the complexity of the trait. Models with a genetic effect for infectivity showed a larger bias, but also a slightly higher accuracy of GEBV. Additive genetic standard deviation for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ R_{0} $$\end{document}R0 was large, i.e. ~ 1.17, while the mean \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ R_{0} $$\end{document}R0 was 2.36. Conclusions GEBV for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ R_{0} $$\end{document}R0 showed substantial variation. The mean \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ R_{0} $$\end{document}R0 was only about one genetic standard deviation greater than 1. These results suggest that lowering DD prevalence by selective breeding is promising.


Background
Infectious disease transmission in a host population is a dynamic process. The probability that an animal is infected depends on its own susceptibility, on the number of infected contact animals ("group mates"), and on the infectivity of those group mates. The number and composition of the infectious contact animals in a population varies over time, since some animals get infected while others recover. Whereas state-of-the-art epidemiological models take this variation into account, it should probably also be taken into account to make optimal genetic inference. However, most genetic studies use linear mixed models that ignore the dynamics of the transmission process in the population, such as the variation in exposure of susceptible individuals to infectious contact individuals. Moreover, studies on host genetic variability commonly connect an individual's disease status to its own pedigree or genotype only, and therefore capture genetic effects related to susceptibility (or resistance) only [1,2]. However, individuals probably also differ in infectivity, as suggested for example by the observation of superspreaders, which are animals that infect substantially more contact individuals compared to a typical infectious animal [3]. Recently, studies on selection against disease transmission have started that include the infectivity of the herd mates [4][5][6][7][8].
Genetic inference on infectious diseases can probably be improved by using quantitative genetic models that are founded on epidemiological theory. Moreover, such models would give estimates of genetic variation and breeding values for fundamental epidemiological parameters, such as the basic reproduction number R 0 (see also below). Such knowledge would also facilitate the prediction of response to selection while accounting for the non-linear nature of infectious diseases, including phenomena such as positive feedback and the eradication of a disease when R 0 falls below 1 [9].
An animals' infectivity affects the disease status of other animals, rather than its own disease status. Thus, if the observed variation in infectivity has a genetic component, then infectivity is a so-called indirect genetic effect (IGE). IGE can have a considerable effect on the rate and direction of evolution by natural selection, and on response to selective breeding [10][11][12]. Hence, IGE can and should be used for the genetic improvement of populations whenever they play a role.
Together, susceptibility, infectivity, and the duration of the infectious period determine the basic reproduction ratio ( R 0 ), which is the average number of secondary cases caused by a typical infectious individual in a fully susceptible population [13]. R 0 contains information on the ability of an infection to transmit and establish itself in the population [13], and has a threshold value of 1; if R 0 is lower than 1, an infectious animal will infect, on average, less than one susceptible animal and the disease will die out with certainty. If R 0 is higher than 1, an epidemic disease can affect a substantial proportion of the population, while an endemic disease may persist in the population. For endemic diseases, the prevalence at the equilibrium state depends on R 0 , and is (ignoring differences in susceptibility between individuals) equal to 1 − 1 R 0 when R 0 is higher than 1. Because R 0 determines the prevalence of a disease in a group of animals (e.g., a herd), breeding strategies that aim at reducing the prevalence should focus on reducing R 0 , ideally to a value lower than 1 [4].
In this study, we focused on the endemic infectious disease digital dermatitis (DD). DD is a claw disorder that affects (mainly) the hind feet of (dairy) cattle [14,15]. Typically, affected cattle have a round lesion above the interdigital space next to the heel bulbs [16]. These lesions can be painful and prone to bleed, and can develop filiform papillae or be surrounded by hyperkeratotic skin with hairs longer than normal [15]. Severely affected cows show signs of lameness; they bear their weight on the toes of the affected foot, shake the foot as if in pain, and show reluctance to move [15,17,18]. The herd prevalence of DD in 383 dairy herds in the Netherlands was estimated at 21.2% in 2003, and within these herds cow level prevalence estimates ranged from 0 to 83.0% [19]. Thus, DD has an impact on the welfare of cows and, furthermore, causes economic losses for the farmer [20,21].
The prevalence of DD is affected by many factors, such as herd management, lactation stage, flooring system, climate, and breed [19]. Optimizing management strategies are one way to reduce the DD prevalence on a dairy farm [22]. An additional strategy could be to improve claw health through genetic selection [23][24][25]. As we will argue, this is best done by selecting for lower R 0 .
The objective of our work was to quantify the genetic variation in host susceptibility, host infectivity, and R 0 for DD in Holstein Friesian dairy cattle, and to investigate our ability to estimate genomic breeding values (GEBV) for these traits. In addition, for model validation, models with and without genetic variation for infectivity will be compared for their ability to predict whether susceptible (infection-free) claws of an animal get infected.

Phenotype data
Phenotypes for DD were collected on 12 dairy cattle farms in the Netherlands, between November 2014 and April 2015. Two observers (author FB being one of them) visited these farms 11 times with a 2-week interval between visits [26]. On each farm, one of the observers rinsed and scored the hind feet using the method of Relun et al. [27], while the other observer recorded cow ID and DD status. Six distinct classes were scored: skin on which lesions were macroscopically absent (M0), displayed a small lesion of 0 to 2 cm (M1), a lesion of more than 2 cm (M2), a lesion covered by a scab (M3), altered skin with dyskeratotis or surface proliferation (M4), and a small lesion in addition to altered skin (M4.1) [28][29][30]. These classes were divided into susceptible (M0) and infected (M1, M2, M3, M4, and M4.1). Farmers were not informed on the DD status of the cows, but were allowed to identify and treat affected cows using their normal routine. Phenotypes were collected on 1513 cows, of which 1401 were genotyped (see below). On average, a cow was scored 8.7 times. Table 1 shows several characteristics of the farms enrolled in the study.

Genotype data
The cows were genotyped with the Eurogenomics 10 K chip. A first round of quality control was performed before imputation, following the standard procedure of the breeding company CRV. A marker (single nucleotide polymorphism, SNP) was included only when its call rate was higher than 0.85, the deviation of the observed frequency from the expected Hardy Weinberg equilibrium frequency was less than 0.15, and the minor allele frequency was higher than 0.025. Furthermore, inconsistent genotypes between parents and offspring were set to missing. SNPs that passed the quality control were imputed to a set of 76,438 SNPs based on the Illumina BovineSNP50 chip and a custom chip from the breeding company CRV, with a reference population of more than 1000 animals with genotypes on both chips and the combination of Beagle [31] and PHASEBOOK [32] software. A second round of quality control was performed on the imputed data. A SNP was included only when there was no strong deviation from Hardy Weinberg equilibrium (p > 10 −15 ), the missing rate was lower than 0.05, and the minor allele frequency was higher than 0.02. In total, 75,904 SNPs passed the quality control and were included in the analysis.

Models
In this section, we develop a generalized linear mixed model (GLMM) founded on epidemiological principles, to estimate genetic parameters for susceptibility, infectivity and R 0 . To develop the GLMM, we need to find the probability that a susceptible cow gets infected between two observations. We present an epidemiological model, derive the infection probability from this model, and present the resulting GLMM, building on work of Velthuis et al. [5], Lipschutz-Powell et al. [6], Anacleto et al. [7], Anche et al. [4], and Biemans et al. [8].
DD is an endemic infectious disease with infections reoccurring in the same animals. DD is transmitted via the environment [33], where the environment is defined as any possible pathogen reservoir through which transmission can occur. Thus, we used a stochastic compartmental susceptible-infected-susceptible-model (SIS-model) with an environmental route (E) see e.g. [34] to model disease transmission (Fig. 1). An SIS-model has two categories of individuals: (1) non-infected individuals who can become infected; these are referred to as susceptible; and (2) infected individuals that are also infectious, and who can recover; these are referred to as infected. Hence, the term "susceptible" merely denotes a non-infected individual that can, in principle, become infected. It does not indicate a high degree of susceptibility. In the SIS-model, the infection of a susceptible claw occurs randomly with a probability per observation interval, depending on the parameters of the model, the number of infected claws in the group, and the infection pressure coming from the environment.
In the SIS-model with an environmental route ( E ), the expected rate with which susceptible claws get infected is βS I+E N , where I is the number of currently infectious claws, S the number of susceptible claws, and S + I = N the total number of claws in a group (twice the number of cows). Here, it is assumed that infected cows are immediately infectious to their herd mates. E is the infection pressure coming from the environmental reservoir, expressed as the equivalent number of currently infectious claws (i.e., I and E are on the same scale). β is the transmission rate parameter that contains information on the contact rate and transmission probability between individuals [35].
Because our interest is in genetic variation in susceptibility and infectivity among cows, we consider the pairwise β between a susceptible cow and her infected herd mate. This pairwise β depends on the susceptibility of the cow and the infectivity of the herd mate. Thus, the transmission rate parameter β ij from an infected claw of herd mate j , with infectivity ϕ j , to a susceptible claw of cow i , with susceptibility γ i , is: In Eq. 1, the overall contact rate c serves as a scaling parameter, so that mean susceptibility and infectivity are approximately 1, γ = ϕ = 1 , and mean logarithms are 0, log (γ ) ≈ log (ϕ) ≈ 0. The latter is relevant for the inclusion of random effects in mixed models, which are commonly assumed to have a mean of 0.
Thus, the expected (i.e., average) rate with which a single susceptible claw of cow i gets infected when exposed to all infected claws in the group depends on the susceptibility of cow i , on the number of infected claws in the group, and on their average infectivity: where cγ i j ϕ j I g is the pairwise transmission rate parameter averaged over the genotyped infected herd mates j of susceptible cow i , I g is the number of infected claws of herd mates that had genotype records, and I tot is the total number of infected claws at the start of the observation interval. We distinguished between I g and I tot because some of the cows were not genotyped. While we estimated infectivity for the genotyped cows only, the nongenotyped infected cows also contributed to transmission. To account for all infected claws, we assumed that the claws of the non-genotyped cows ( n = 112) had the same infectivity as their infected genotyped herd mates ( j ϕ j /I g ).
In the j ϕ j I g term (Eq. 2), we should ideally average the infectivity over all claws that contribute to the current infection pressure. This also includes the infection pressure via the environment of the claws that were infected at an earlier time. However, in the statistical software we did not manage to keep track of all the weighted genotypes of claws that were infected at earlier times. Therefore, we only included claws that were infected at the start of the observation interval in the j ϕ j I g term. In contrast, the E-term included infection pressure of the full number of previously infected claws, but these were not weighted according to their infectivity. Thus, our estimates of genetic variation in infectivity use only part of the variation in the infection pressure. This issue is further addressed in the "Discussion" section.
The probability that a susceptible claw becomes infected in an observation interval varies among observation intervals, because it depends on the number of infected herd mates, the infectivity of those herd mates, and on the infection pressure coming from the environment. To estimate this probability, we assumed that the transmission rate (probability per unit of time) is constant within the interval, which is the default assumption in disease transmission models. With this assumption, transmission follows a so-called Poisson process, where the number of transmissions within the interval has a Poisson distribution with a mean equal to the product of the rate and the length of the interval; µ = rate × �t . A claw becomes a case when it is infected at least once within the interval. Hence, the probability of becoming a case is the complement of the probability of no infection, P = 1 − e −µ , where e −µ is the probability of a zero outcome from a Poisson distribution. Using Eq. 2: where, P i (t) is the probability that a susceptible claw of i is a case (becomes infected) in interval t , given the number of infected herd mates, the infectivity of those herd mates, and on the infection pressure coming from the environment at the start of the interval.
The number of cases for each interval (counting process in discrete time) follows a binomial distribution with a probability that follows from a Poisson process within the interval (counting process in continuous time). Fig. 1 Susceptible-infected-susceptible model with environmental route Therefore, the complementary log-log is the appropriate link function to connect the explanatory variables to the expected value of the observed variable [36]. Thus, the GLMM follows from applying the complementary loglog transformation to Eq. 3: where log (c) is an intercept, log (γ i ) is the logarithm of susceptibility of the focal individual, log j ϕ j I g (t) the log of the mean infectivity of infected (genotyped) herd mates, and log I tot (t)+E(t) �t is an offset, i.e., a known "explanatory variable". The offset accounts for the infectious pressure coming from the infected cows (both genotyped and non-genotyped) at time t : ( I tot (t)/N (t)) and from the environment at t : ( E(t)/N (t) ), and for the length of the interval ( t ). Note that the dependent variable for Eq. 4 is the number of cases for a cow with at least one susceptible claw. For cows with one susceptible claw, the binomial total equals 1 and the number of cases takes values C = 0 or 1. For cows with two susceptible claws, the binomial total equals 2 and the number of cases takes values C = 0, 1 or 2.
Equation 4 is linear in the logarithm of susceptibility, but not in the logarithm of infectivity. To allow the fitting of a linear model, we linearized the model term for infectivity following [37] in the models in which the infectivity of the group mates was included (Models 2 and 4, see below): Equation 5 follows from approximating the j ϕ j I g (t) , which is an arithmetic mean, by the corresponding geometric mean (see [37] for details). The errors caused by this approximation are less than 5% for infectivity effects up to a factor of 3 (i.e., ϕ j between 0.33 and 3.0). However, this error estimation is based on a model with SNP effects rather than polygenic breeding values. The size of the error is not known for a trait determined by many genes each with a small effect. The method will anyway take the real differences in exposure for different observation periods on the different farms into account. We acknowledge that this is a relevant issue that requires further investigation.
We calculated the infectious pressure coming from the environment ( E(t) ) as described in detail in Biemans et al. [26]. In short, claws that were infected at an earlier time could still contribute partly to the environmental reservoir at the moment of observation. Their contribution was assumed to decrease each interval t by a factor , which may be interpreted as a survival rate of the pathogen. The estimate for this survival rate is 0.9 [26]. Thus, the number of pathogens in the environment coming from a claw that was infected at t , is a fraction 0.9 at t + 1 , a fraction 0.9 2 = 0.81 at t + 2 , a fraction 0.9 3 = 0.729 at t + 3 , etc. Therefore, the values for E(t) were calculated as: where I tot (t − 1) is the total number of infected claws at t − 1 , and E(t − 1) is the infectious pressure coming from the environmental reservoir at t − 1.
Because we did not observe the number of infected claws in the period before the first observation ( I tot (t0) ), we estimated this number with a linear model. For each farm, we fitted the model to the number of infected claws over the observation period. The intercept of the model, I tot (t = 0) , was used as the average number of infected claws on that particular farm in the periods before observations started ( I tot (t0) ). Thereafter, the value for the environmental reservoir at the first observation was estimated as, E(t = 0) = 0. 9 1−0.9 I tot (t ≤ 0) , and was used in Eq. 6.

Implementation of the model
With the GLMM, we modelled the expectation of the number of cases over the number of susceptible feet (claws) of cow i within the interval t , P i (t) = E C i (t) F i (t) . Only the hind feet of the cows were scored, so a susceptible cow could have one or two susceptible feet ( F ) at the start of an interval, which were zero, one, or two cases by the end of the interval. Thus, the number of cases C (0, 1 or 2) for each susceptible cow followed a binomial distribution with binomial total F (1 or 2).
We tested four models (Table 2). Model 1 included a genetic effect for susceptibility only: where c 0 is the intercept. The fixed effects were farm ( Farm k with k = A to L), period ( Period τ with τ = 1 to 10), parity ( Parity l with l = 1, 2, or > 2), and months in milk ( MIM m , a continuous covariate with m = 1 to 12). Random effects included an interaction between farm and (6) period ( Farm k .Period τ with k = A to L and τ = 1 to 10), a non-genetic permanent animal effect for the susceptible animal i ( Animal i ) to account for repeated observations on the same animal in different periods, and an additive genetic effect for susceptibility of animal i ( log(γ ) ∼ N 0, Gσ 2 a , where G is the genomic relationships matrix among animals).
Model 2 included genetic effects for both susceptibility and infectivity: where j log(ϕ j ) are the random genetic effects for infectivity of the infected group mates j of animal i , with ( log (ϕ) ∼ N 0, Gσ 2 a ). We expected that the interaction between farm and period could be partly confounded with the genetic effect for infectivity, because previous IGE studies showed that omitting group-effects may substantially inflate estimated genetic parameters for infectivity [38]. To investigate this issue, we dropped the farm by period interaction from Models 1 and 2, resulting in Models 3 and 4, respectively.

Data analyses
The G-matrix was computed using method 1 of Van-Raden [39], with the calc_grm software [40]. We fitted the four models with ASReml v4.1.0 [41]. Model fit was assessed with Akaike information criterion (AIC). The susceptibility and infectivity estimates from ASReml were on the log scale because of the complementary log-log link function (Eq. 5). These random effects are zero on average. By taking the exponent of the estimates, we obtained the genomic estimated breeding values (GEBV) for susceptibility and infectivity, relative to a typical (average) individual that has a GEBV of 1 (thus log (γ ) = log (ϕ) = 0 , therefore γ ≈ ϕ ≈ 1).

Cross-validation
We validated the GEBV to investigate their bias and accuracy with a 12-fold cross-validation on all four models. In each analysis, all the cases ( C ) of one of the 12 farms were censored from the dataset. For each susceptible claw of cow i at interval t at the censored farm, we predicted the dependent variable of the GLMM, C i (t)/F i (t) , based on the information of the other 11 farms. In the following, we refer to this prediction as the predicted probability P i (t).
However, as the fixed effects were nonlinear on the normal scale because they were estimated with a complementary log-log link function, correction of the observed records for fixed effects was not straightforward. To solve this issue, we translated both the predicted probabilities and the observed records to a standard (i.e., average) farm. Subsequently, we validated the models using a weighted correlation and a regression of observed records on predicted probabilities (see "Appendix" for detailed methods). To calculate the correlation and regression coefficients, we used observations and predictions averaged over the number of times an animal was susceptible at the start of an interval, and this number was used as the weight.

Genetic variance and breeding values for the basic reproduction ratio
For the best fitting model (lowest bias, highest accuracy), we calculated the genetic variance and GEBV ( A R 0 ,i ) for the basic reproduction ratio.
The additive genetic variance of R 0 was calculated as [4]: where the approximation follows from γ ≈φ ≈ 1.
The GEBV for the basic reproduction ratio is the product of the GEBV for susceptibility ( γ i ), the GEBV for infectivity ( φ i ) , the contact rate ( c ), and the average duration of the infectious period ( 1/α ) [4]:

Table 2 Overview of the fixed and random effects included in the four models
All models contained fixed effects for farm, period, parity, and months in milk; and a non-genetic random animal effect for the susceptible animal With a previously estimated R 0 for DD of 2.36 on these farms [26] and the average product of the estimated relative susceptibility and relative infectivity, the value for c/α was calculated as c/α = 2.36 γ iφi . The breeding value for R 0 in Eq. 7 is on the absolute scale, and thus has an average equal to R 0 , rather than zero. This is convenient, because absolute breeding values for R 0 can be interpreted relative to a value of 1, which is the threshold for eradication of a disease. Note that, in Models 1 and 3, variation in infectivity is not estimated. Hence, for these models we used φ i = 1 for all individuals. GEBV are prone to overestimation, as illustrated by regression coefficients of validation phenotypes on predicted GEBV, which are often less than 1 e.g. [42]. To avoid overestimation of the variation in breeding values for R 0 , we shrunk the estimates for the transmission rate parameter where b is the regression coefficient of the observed C/F on the predicted probability, averaged over cross-validation sets. The corrected individual breeding values for the basic reproduction ratio were: After this correction, the regression coefficient of the observed C/F on the predicted probability in cross-validation was 1, implying that the GEBV have the correct variance. Moreover, when the regression coefficient of phenotypes on predicted breeding values equals 1, and observations and predictions follow an approximate normal distribution, then the accuracy of EBV equals the ratio of the standard deviations, cor Â R 0 , A R 0 = σÂ R 0 /σ A R 0 [43]. We used this result to obtain the approximate accuracy of the GEBV for R 0 .

Fixed effects
The farm effect was significant (P < 0.05) in Models 1 and 3, but not in Models 2 and 4. For all models, there was a significant effect for period, parity, and months in milk. The probability of becoming infected during an interval increased during the first six periods and stabilized thereafter. The transmission rate parameter increased with increasing parity; it was 21% higher for parity 2 compared to parity 1, and 69% higher for parities greater than 2 compared to parity 1. For months in milk, the transmission rate parameter decreased by 4% with every month in milk. A table of fixed effect estimates and standard errors on the log scale from Model 1 is presented in Additional file 1.β

Estimated variance components
For Models 1, 2, and 4, the estimated genetic variance for susceptibility was about 0.55, and was strongly significant. These models had a genetic effect for infectivity and/or the interaction term between farm and period. For Model 3, the estimated genetic variance for susceptibility was smaller (0.49); this model had neither a genetic effect for infectivity, nor a term for the interaction between farm and period. Similarly, the variance of the non-genetic random animal effect was about 0.95 for Models 1, 2, and 4; and was smaller, ~ 0.92, for Model 3. The estimated variance for infectivity was large, and not significant for Model 2. The interpretation of the magnitude of these variance components is discussed below, in the section on R 0 . Figure 2 shows the infectivity estimates from Model 4 plotted against the infectivity estimates from Model 2. The estimates varied from − 3.20 to 3.80 for Model 2 and from − 8.76 to 9.04 for Model 4. Thus, the GEBV for infectivity from Model 2 showed less variation compared to the GEBV from Model 4; part of the variation that is attributed to the interaction between farm and period in Model 2, is attributed to the genetic infectivity effect in Model 4. This suggests that the GEBV for infectivity from Model 4 include both a genetic and a non-genetic component, and may therefore be inflated. Figure 3 shows the weighted linear regression of the average observed number of cases over the number of susceptible feet ( C/F ) on the average predicted probability, and their correlation. Bias was smallest for models

Susceptibility Infectivity
Farm*period Animal without the infectivity effect (Models 1 and 3), since regression coefficients of these models were closest to 1. The weighted correlations coefficients were higher for models with the infectivity effect (Models 2 and 4). Thus, models without an infectivity effect showed significantly lower bias, while models with an infectivity effect had a slightly higher accuracy. The higher bias for Model 4 compared to Model 2 agrees with results in Fig. 2, and confirms that omitting the random farm-by-period interaction inflates the variation in GEBV for infectivity.

Basic reproduction ratio
The genomic estimated breeding values (GEBV) for susceptibility corrected for bias were approximately the same for Models 1, 2 and 4, and ranged from 0.49 to 2.81; the GEBV from Model 3 ranged from 0.50 to 2.53. A cow with a susceptibility GEBV of 0.50 is genetically about two times less susceptible than an average cow, whereas a cow with a susceptibility GEBV of 2.50 is genetically about 2.5 times more susceptible than an average cow. The GEBV for infectivity showed substantially more variation than the GEBV for susceptibility. We chose to illustrate the variation in GEBV for R 0 using results from Model 3, which is the most conservative model as judged by the estimated genetic variance in susceptibility (smallest value) and the bias. For this model, we calculated individual breeding values for the basic reproduction ratio ( Â R 0 ,i ) corrected for the bias that we found from the cross-validation. The average GEBV for susceptibility was 1.081, which is slightly higher than 1 because of the transformation from the log scale to the normal scale. In Model 3, there was no infectivity effect, thus infectivity was 1.00 for all individuals. With an average R 0 for DD of 2.36 on our farms [26], c/α was estimated at 2.18. GEBV for R 0 ranged from 1.089 to 5.515 (Fig. 4) (the GEBV not corrected for bias ranged from 0.62 to 6.68). This result indicates very substantial genetic variation in R 0 . The expected prevalence in a population of individuals similar to the genetically best individual equals approximately 1 − 1 R 0 = 1 − 1 1.089 = 8.2% [45], while the corresponding value for the genetically worst individual equals 81.9%.
The estimated additive genetic standard deviation for R 0 was 1.17 (this value is corrected for bias). This result shows that the current R 0 of 2.36 [26] is only about one genetic standard deviation greater than 1. Hence, this suggests that a genetic improvement of R 0 by a bit more than one genetic standard deviation would be sufficient to eradicate DD.
The approximate accuracy of Â R 0 ,i follows from the ratio of the standard deviation of GEBV for R 0 over the additive genetic standard deviation of the true breeding values for R 0 ; ρÂ . This accuracy equals ~ 39%. To be conservative, we did not correct (i.e., shrink) the σ A R 0 in the denominator of this expression, while we did correct the numerator for the bias. Note that this accuracy is based on validation within a generation, rather than on validation forward in time. The accuracy based on validation forward in time is relevant for response to selection, and may be somewhat lower because genetic relationships may be a little weaker.

Discussion
We estimated the additive genetic variation in host susceptibility, infectivity, and the basic reproduction ratio ( R 0 ) for DD in dairy cattle. Furthermore, we calculated GEBV for susceptibility, infectivity, and R 0 for each animal. Four models were compared for their ability to predict whether a susceptible animal becomes infected. All four models included a genetic effect for susceptibility; Models 2 and 4 also included a genetic effect for infectivity, while Models 1 and 2 included an interaction term between farm and period. In all models, the estimates are corrected for the variation in exposure of the susceptible individuals to infected group mates via the offset of the model. The estimated additive genetic standard deviation for R 0 was large, ~ 1.17, and the mean R 0 (2.36) was only about one genetic standard deviation greater than the important threshold value of 1. Furthermore, GEBV for R 0 (corrected for bias) showed large variation, ranging from 1.089 to 5.515, and the approximate accuracy of GEBV for R 0 was ~ 0.39. These results show that genetic selection against DD is very promising; there is substantial heritable variation and a meaningful accuracy can be obtained from a limited amount of data.
Farm, parity, period, and months in milk of the focal cow were included in the models as fixed effects. The transmission rate parameter was 21% higher for parity 2 compared to parity 1, and 69% higher for parity greater than 2 compared to parity 1. The prevalence also increased with parity. This is in contrast with most previous studies where DD was most prevalent in first and second parity cows [15,46]. For months in milk, the transmission rate parameter decreased by 4% per month in milk. This is in agreement with Argáez-Rodríguez et al. [46] who found that cows had the highest risk of getting DD in the first and third months of lactation, after which the risk decreased. The potential effect of parity and months in milk on the infectivity of a cow was not considered, because incorporating these factors in the summed effects of the infected group mates of a focal cow was methodologically too challenging.
We managed, only partly, to account for genetic effects on infectivity. For technical reasons, we included only the Fig. 3 Weighted linear regression and correlation coefficients between the average observed number of cases over the number of susceptible feet ( C/F ) and the average predicted probability for the observations. Regression coefficients smaller than 1 indicate over prediction genetic effects of claws that were infected at the start of the observation interval ( t ) in the statistical model. However, the majority of the infection pressure originated from earlier infected claws that still contribute to transmission via the environment. We estimated a survival rate (λ) of the pathogen of 0.9 [26], meaning that 90% of the total infectious pressure originates from infectivity present in the environment from claws that were infected before the start of the observation interval. This suggests that we missed a large part of the potential heritable variation in infectivity. Hence, the relevance of genetic variation in infectivity for DD may be larger than suggested by the estimates presented here (Table 3). Unfortunately, in the statistical software we did not manage to keep track of all the weighted genotypes (i.e., weighing the genotypes of the infected individuals for their contribution to the environment). Nevertheless, the accuracy of predicting the phenotype from the cross-validation for the model with genetic effects of infectivity (Model 2) was 9% higher than the accuracy for the corresponding model without these effects (Model 1). Thus, even including a small proportion of the apparent variation in infectivity in the model appears to increase the accuracy of GEBV.
Estimates of infectivity show relatively larger standard errors because variation in infectivity must be estimated indirectly, unlike variation in susceptibility. Infectivity estimates are based on the number of susceptible group mates of an infectious individual that become infected, and on differences in genotype among the infected group mates at different time points. When there are multiple infected group mates, the accuracy of the infectivity estimates decreases [7]. Especially in large groups, as in this study (~ 100 cows), more records and groups are needed to estimate genetic variation in infectivity accurately [47]. This issue is very similar to the estimation of indirect genetic effects from large groups. In addition, we observed DD transmission for a relatively short period of time. More accurate estimation of genetic variation in infectivity requires data that are recorded over longer time periods (i.e., to be able to observe the entire infectious period for the majority of the cows), more groups (herds), a better statistical model (i.e., inclusion of the genetic effects from earlier infected cows of which the pathogens are still present in the environment), and the inclusion of infectivity effects of cows without genotypes. The latter could be done using single-step GBLUP [48].
In the first two models, we included an interaction between farm and period to account for non-genetic effects of infectivity. This interaction serves to avoid overestimation of the genetic variance in infectivity [49], similar to the inclusion of a random group effect in the analysis of indirect genetic effects [38,50]. The genetic effect for infectivity and the interaction term were partly confounded, because both terms have an effect on the number of susceptible animals that become a case within a certain period on a certain farm. However, confounding is not complete because of genetic relationships between the infected animals across farms and periods. Our results also suggest that inclusion of a random farm-by-period effect is essential to avoid overestimation of the genetic variation due to infectivity.
The estimated variances for susceptibility (genetic and non-genetic) were lower for Model 3 that included neither a genetic effect for infectivity nor an interaction between farm and period. Anacleto et al. [7] showed that estimates for susceptibility are less accurate when genetic variation in infectivity is not accounted for. Indeed, we also found a slightly higher correlation in the cross-validation when infectivity was included in the model, although this was accompanied by an inflation of the GEBV, as shown by the regression coefficients in Fig. 3. However, inflation of GEBV can be remedied by shrinking them based on the results of cross-validation, whereas a reduction in correlation cannot. Therefore, even when genetic variation in infectivity is small, it might be beneficial to include infectivity in the model to more accurately estimate susceptibility GEBV [7].
In the cross-validation, we estimated a weighted correlation of about 0.2 between the observed and predicted number of cases over the number of susceptible feet. This value can be used to approximate the accuracy of the GEBV ( r g, g ) [51], where, r p, g is the correlation between the observations and the predictions, and h 2 is the heritability of the trait. Heritability estimates for digital dermatitis from previous studies range from 0.05 to 0.29, depending on the model used [52]. Assuming a heritability of 0.28 for DD [53], the accuracy of the predicted number of cases is 0.37.
In general, studies on genetic variability of infectious diseases commonly focus on individual differences in susceptibility only, and those differences are estimated with linear models that ignore variation in exposure between individuals. In this study, we used a GLMM founded on epidemiological theory to estimate genetic variability in susceptibility and infectivity. An important advantage of models founded on epidemiological theory is that they provide estimates for epidemiological parameters, such as R 0 , that are interpretable in the context of infectious disease dynamics. Our results, for example, show that R 0 is only slightly more than one genetic standard deviation away from the threshold value of 1, which suggests that eradication of DD by genetic improvement is in principle feasible. Further work is needed to quantify the benefits of GLMM based on epidemiological theory over simpler linear models, to better account for the potential genetic variation in infectivity via the environment, and to include genetic variation in the duration of the infectious period.

Conclusions
Genetic variance components for susceptibility and infectivity for digital dermatitis were estimated with four generalized linear mixed models. We managed, only partly, to account for genetic effects on infectivity. We estimated GEBV for the basic reproduction ratio from a (conservative) model including genetic effects for susceptibility only. GEBV for R 0 were corrected for bias, and showed substantial variation, ranging from 1.089 to 5.515. The mean R 0 (2.36) was only about one genetic standard deviation greater than 1. Based on cross-validation between farms, the approximate accuracy of GEBV for R 0 was 0.39, in spite of the relatively small dataset of only 12 genotyped herds. These results suggest that lowering prevalence of DD by selective breeding is very promising.

Cross-validation
In the 12-fold cross-validation, we censored the number of cases in each period and predicted for each susceptible animal the number of cases over the number of susceptible feet ( C i /F i ) based on information of the 11 other farms. In general, the predicted probability for the number of cases over the number of susceptible feet can be calculated with the estimated effects. Because of the complementary log-log link function, these effects need to be back-calculated to the original scale: when the genetic effect for infectivity was not included (Models 1 and 3), and when the genetic effect for infectivity was included (Models 2 and 4).
In Eqs. (9) and (10), P i (t) is the predicted probability for the number of cases over the number of susceptible feet for animal i in the time from t to t + 1 , FE is the sum of the estimates for the fixed effects, i.e., the (9) P i (t) = 1 − e −e FE+ log(γ i )+log FE+l og(γ i )+ 1 Ig (t) j l og ϕ j +log I tot (t)+E(t) N (t) �t estimates for the intercept, farm k , period τ , parity l , and months in milk. The log(γ i ) is the estimated genetic effect for susceptibility on the log scale and j log ϕ j is the sum of the estimated genetic effects for infectivity of the infected group mates of the susceptible individual on the log scale. The last term in Eqs. (9) and (10) is the offset. In the 12-fold cross-validation, the cases ( C ) of one entire farm were censored from the dataset, therefore, the random effects for the interaction between farm and period, and non-genetic animal effect for animal i could not be estimated for this farm. Thus, these random effects did not contribute to the predicted probabilities, and they are not included in Eqs. (9) and (10).
To validate the estimated genetic effects, we wanted to estimate the probability that an animal would be a case during an interval independent of the fixed effects in the model. We achieved this independence by standardizing both P i (t) and C i (t)/F i (t) . We obtained the regression coefficients for the fixed effects for each of the four models that were applied to the full dataset where no data was censored. With these regression coefficients, we calculated the average value of summed fixed effects FE .
The FE can be interpreted as a standard/average farm and a standard/average cow. This FE was used in Eqs. (9) and (10) instead of the estimated fixed effects: for Models 1 and 3, and for Models 2 and 4.
Here, P i (t) * is the predicted probability for the number of cases over the number of susceptible feet for animal i in a period, as if it were an average cow with an average parity and months in milk, during a standard period, on a standard farm. Note that the genetic susceptibility and infectivity did differ between cows, and thus had an effect on the predictions.
Similarly, we wanted to standardize the observed cases over the number of susceptible claws (C i (t)/F i (t)) , so that they would be independent of the fixed effects that contributed to that observation. The observations were transformed to the complementary log-log scale so that they were linear in the effects: log(− log 1 − C i (t) F i (t) = FE + log (γ i ) + log I tot (t) + E(t) N (t) �t for Models 1 and 3, and, for Models 2 and 4. Next, the summed fixed effects FE in Eqs. (13) and (14) were replaced with the average value of the summed fixed effects FE , and back-calculated to the original scale to obtain the observed number of cases over the number of susceptible feet independent of the fixed effects ((C i (t)/F i (t)) * ) , for all models: Here, (C i (t)/F i (t)) * is the observed number of cases over the number of susceptible feet for animal i in a period, as if observed on an average cow with an average parity and months in milk, on a standard farm. Again, the genetic susceptibility and infectivity did differ between cows (see Eqs. (13) and (14)), and affected the dependent variable.
We calculated weighted correlation coefficients between the "corrected" observed number of cases over the number of susceptible feet ( (C i (t)/F i (t)) * ) and the "corrected" predicted probabilities ( P i (t) * ), averaged over the number of times an animal was susceptible at the start of an interval, and this number was used as the weight. (14) log(− log 1 − C i (t) F i (t)