Genetic analysis of infectious diseases: estimating gene effects for susceptibility and infectivity
- Mahlet T. Anche^{1, 2},
- P. Bijma^{1} and
- Mart C. M. De Jong^{2}
Received: 17 December 2014
Accepted: 16 October 2015
Published: 4 November 2015
Abstract
Background
Genetic selection of livestock against infectious diseases can complement existing interventions to control infectious diseases. Most genetic approaches that aim at reducing disease prevalence assume that an individual’s disease status (infected/not-infected) is solely a function of its susceptibility to a particular pathogen. However, individual infectivity also affects the risk and prevalence of an infection in a population. Variation in susceptibility and infectivity between hosts affects transmission of an infection in the population, which is usually measured by the value of the basic reproduction ratio \(R_{0}\). \(R_{0}\) is an important epidemiological parameter that determines the risk and prevalence of infectious diseases. An individual’s breeding value for \(R_{0}\) is a function of its genes that influence both susceptibility and infectivity. Thus, to estimate the effects of genes on \(R_{0}\), we need to estimate the effects of genes on individual susceptibility and infectivity. To that end, we developed a generalized linear model (GLM) to estimate relative effects of genes for susceptibility and infectivity. We performed a simulation to investigate the bias and precision of the estimates, and how these are affected by \(R_{0}\), by the size of the effects of genes for susceptibility and infectivity, and by relatedness among group mates. We considered two bi-allelic loci, one affecting only the individuals’ susceptibility and the other affecting only their infectivity.
Results
A GLM with a complementary log–log link function can be used to estimate the relative effects of genes on the individual’s susceptibility and infectivity. The model was developed from an equation that describes the probability that an individual becomes infected as a function of its own susceptibility genotype and the infectivity genotypes of all its infected group mates. Results show that bias is smaller when \(R_{0}\) ranges approximately from 1.8 to 3.1 and when relatedness among group mates is higher. With larger effects, both absolute and relative standard deviations become clearly smaller, but the relative bias remains the same.
Conclusions
We developed a GLM to estimate the relative effects of genes that affect individual susceptibility and infectivity. This model can be used in genome-wide association studies that aim at identifying genes that influence the prevalence of infectious diseases.
Background
New and existing infectious diseases represent a major and increasing threat to domestic plants and animals, and to humans. Infectious diseases of animals are a worldwide concern, particularly because of their effects on the productivity and welfare of livestock and also because of their zoonotic threats to human health. Although antibiotics and vaccines are available, the undesirable environmental impact of antibiotic treatments and the rapid evolution of bacteria to resist antibiotics and of viruses to escape vaccine protection illustrate the need for additional control strategies that can usefully complement the interventions currently used to control disease [1].
Host susceptibility and tolerance are two of the ways that individuals respond to pathogens. Several studies on the genetics of diseases in animals have shown that the host’s susceptibility and tolerance to infectious diseases have a genetic basis, and thus that genotypic differences exist between individuals regarding their susceptibility and tolerance to infectious challenges [2]. A number of genome-wide association studies (GWAS) have reported single nucleotide polymorphisms (SNPs) associated with susceptibility to various infectious diseases [3, 4].
Most genetic approaches that aim at reducing the prevalence of an infection assume that an individual’s disease status (infected/not-infected) is solely a function of its own genes and of non-genetic factors [2]. Hence, these methods capture only the genetic variation in susceptibility or tolerance (strictly, this latter statement is restricted to the measurement of disease occurrence in groups of unrelated individuals [5]). However, the prevalence and dynamics of an infection also depend on the infectivity of infected individuals in the population. Moreover, accumulating evidence of “superspreaders” in epidemic outbreaks suggests that (phenotypic) variation in infectivity exists among hosts [6]. Thus, the classical quantitative genetic approach of disease analysis based on individual disease status captures only part of the heritable variation that is present in the host population and that affects the dynamics of infectious diseases [7].
Between-host variation in susceptibility and infectivity affects the transmission of an infection in the population. This effect is measured by the value of the basic reproduction ratio \(R_{0}\), defined as the average number of secondary cases produced by one typical infectious individual during its entire infectious lifetime, in an otherwise naïve population [8]. \(R_{0}\) has a threshold value of 1: a major disease outbreak or a stable endemic equilibrium can only occur when \(R_{0}\) is greater than 1, whereas when \(R_{0}\) is less than 1, the epidemic will die out. Thus, in order to reduce disease incidence and thereby prevalence, breeding strategies should aim at reducing \(R_{0}\), preferably to a value less than 1.
Genetic improvement that aims at reducing \(R_{0}\) should be based on individual breeding values for \(R_{0}\). An individual’s breeding value for \(R_{0}\) is the sum of the average effects of its alleles on \(R_{0}\) [5], which makes the effects of genes on \(R_{0}\) a relevant target of investigation. Anche et al. [5] showed that an individual’s breeding value for \(R_{0}\) is a function of its genotype for susceptibility and infectivity, and of the population’s average susceptibility and infectivity. Thus, in order to estimate the effects of genes on \(R_{0}\), the susceptibility and infectivity effects of the different alleles must be estimated.
Disease data are often available only in binary form (0/1), i.e. the value indicates whether or not an individual has become infected. Hence, methods for genetic analyses of disease traits have to be tailored to such data. Generalized linear models (GLM) are commonly used to analyse binary data; in a GLM, the expected value of the binary response variable, after transformation by a link function, is a linear function of the explanatory variables (traits) [14]. Velthuis et al. [9] showed that the effect of susceptibility and infectivity of hosts on the transmission rate parameter β can be estimated by fitting a GLM with a complementary log–log link function to binary disease data. Lipschutz-Powell et al. [10] showed that a GLM with a complementary log–log link function can be used to link the probability that an individual becomes infected to the susceptibility genotype of the individual itself and the infectivity genotypes of its infectious contacts. However, they observed that the infectivity component of the model was non-linear, and they neither provided an explicit GLM nor investigated the quality of the estimates resulting from such a GLM.
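As a small illustration of why the complementary log–log link suits this kind of data: if an individual escapes infection with probability \(\exp(-\lambda)\) (the zero-term of a Poisson distribution whose mean \(\lambda\) is the expected number of infectious contacts), its probability of becoming infected is \(P = 1 - \exp(-\lambda)\), and \(\mathrm{cloglog}(P) = \log(-\log(1 - P)) = \log\lambda\). Any factor that multiplies \(\lambda\), such as a susceptibility or infectivity value, therefore enters additively on the link scale. The Python sketch below only checks this identity numerically; the function names and the example numbers are illustrative.

```python
import math

def cloglog(p):
    """Complementary log-log link: log(-log(1 - p)), for 0 < p < 1."""
    return math.log(-math.log1p(-p))

def inv_cloglog(eta):
    """Inverse link: probability 1 - exp(-exp(eta))."""
    return -math.expm1(-math.exp(eta))

# Escape probability exp(-lam): the infection probability maps back to log(lam).
lam = 0.8
p_infected = 1 - math.exp(-lam)
print(cloglog(p_infected), math.log(lam))

# Multiplying lam by a factor (e.g. a susceptibility value of 0.6)
# shifts the link-scale value by log(0.6).
shift = cloglog(1 - math.exp(-0.6 * lam)) - cloglog(p_infected)
print(shift, math.log(0.6))
```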
In this study, we developed a GLM to estimate the relative effects of genes on individual susceptibility and infectivity, and investigated the quality of the resulting estimates in terms of bias and precision. We also investigated how the bias and precision of the estimates are affected by \(R_{0}\), by the size of the effects of the susceptibility and infectivity genes, and by population structure with respect to relatedness. The GLM was fitted to binary disease data (0/1) recorded at the end of the epidemic. Thus, the data analysed were counts of infected individuals of different genotypes. These data were obtained from a simulated genetically heterogeneous population in which individuals differed in susceptibility and infectivity.
Methods
Population structure
We assumed a diploid population with between-host genetic heterogeneity in susceptibility and infectivity. We modelled genetic heterogeneity in this population using two bi-allelic loci: one locus for the susceptibility effect \((\gamma )\), with alleles G and g and susceptibility values \(\gamma_{G}\) and \(\gamma_{g}\), and one locus for the infectivity effect \((\varphi )\), with alleles F and f and infectivity values \(\varphi_{F}\) and \(\varphi_{f}\), respectively. Both loci were assumed to have multiplicative allelic effects; the reason for this assumption is explained in the section “Generalized linear model (GLM)”.
Epidemiological model of disease dynamics
Disease dynamics that are caused by a microparasitic infection can be modelled with a basic compartmental stochastic susceptible, infected and recovered (SIR) model. In this model, two possible events can occur: infection of a susceptible individual, and recovery of an infectious individual [11]. In a stochastic model, these events occur randomly at rates (probabilities per unit of time) specified by the model parameters and the state variables. In the SIR model, these parameters are the transmission rate parameter (β) for S → I, with rate \(\beta \frac{SI}{N}\), and the recovery rate parameter (α) for I → R, with rate \(\alpha I\), where N denotes population size, S the number of susceptible individuals and I the number of infectious individuals (in this study, we assumed that an individual is infectious as soon as it is infected, so the terms infectious and infected are used interchangeably; the symbols S, I and R denote both the disease status and the number of individuals with that disease status). The transmission rate parameter β describes the probability per unit of time for one infected individual to infect any other individual in a totally susceptible population [8, 12] (this can be seen from the transmission rate \(dS/dt = - \beta SI/N\), with I = 1 and S = N).
In the following, we consider binary data at the end of an epidemic, which indicate for each individual whether or not it has become infected. Thus, binomial count data were available to quantify the occurrence of infected individuals according to genotype. As a step towards the GLM, we first derive the probability that an individual becomes infected.
In [13], equation 10, which is equivalent to our Eq. (5), was presented as the final size equation for a population that is heterogeneous for susceptibility and infectivity (in epidemiology, the so-called final size equation gives the fraction of infected individuals of each type by the end of an epidemic). Our Eq. (5) and equation 14 in [10] follow a similar derivation but, in our case, the equation is applied to the end of the epidemic.
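As a hedged illustration of what a final size equation computes, the sketch below solves its simplest special case rather than the heterogeneous equation of [13]: a large homogeneous population in which all individuals share the same susceptibility and infectivity, where the fraction z infected in a major outbreak satisfies the classical Kermack–McKendrick relation \(z = 1 - \exp(-R_{0} z)\). It is not the paper's Eq. (5), and the function name is illustrative.

```python
import math

def final_size(r0, tol=1e-12, max_iter=100_000):
    """Fraction z infected in a major outbreak of a large homogeneous SIR epidemic.

    Solves z = 1 - exp(-r0 * z) by fixed-point iteration started at z = 1,
    which decreases monotonically to the positive root when r0 > 1.
    For r0 <= 1 the only root is z = 0 (no major outbreak).
    """
    if r0 <= 1:
        return 0.0
    z = 1.0
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

for r0 in (0.6, 1.2, 2.0, 6.1):
    print(r0, round(final_size(r0), 3))
```

Below \(R_{0} = 1\) the function returns 0, mirroring the threshold behaviour described in the Background.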
Generalized linear model (GLM)
The dependent variables are thus the fractions of individuals of each type i that became infected (see below).
Equation (7) is linear in the log of susceptibility, \({ \log }(\gamma_{i} )\), and the log of infectivity, \({ \log }(\varphi_{j} )\): the expectation of the response variable, \(\mathrm{cloglog}(1 - P_{i} )\), is a linear expression in these terms. To be able to formulate the model in terms of allele counts within individuals, rather than in terms of individual genotypes, we assumed that the two alleles that make up the genotype of an individual act multiplicatively, so that their effects are additive on the log-scale.
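The following is a hedged sketch of how a model of the form of Eq. (7) can arise; it is not necessarily identical to Eqs. (5) and (7). It rests on assumptions introduced here for illustration: the transmission rate parameter factorises as \(\beta_{ij} = c\,\gamma_{i}\varphi_{j}\); each of the I infected group mates is infectious for its expected period \(1/\alpha\); and the reference alleles have values \(\gamma_{g} = \varphi_{f} = 1\). The symbols \(\pi_{i}\) (probability that a type-i individual becomes infected), \(n_{G,i}\) and \(n_{F,j}\) (allele counts) are introduced here.

```latex
% Escape probability of a susceptible of type i exposed to all I infected group mates
1 - \pi_{i} \approx \exp\Big( -\frac{c\,\gamma_{i}}{\alpha N} \sum_{j=1}^{I} \varphi_{j} \Big)

% Complementary log-log transform of the infection probability
\mathrm{cloglog}(\pi_{i}) = \log\frac{c}{\alpha} + \log\frac{I}{N} + \log\gamma_{i}
  + \log\Big( \frac{1}{I}\sum_{j=1}^{I} \varphi_{j} \Big)

% Approximating the arithmetic mean of infectivity by its geometric mean
\log\Big( \frac{1}{I}\sum_{j=1}^{I} \varphi_{j} \Big) \approx \frac{1}{I}\sum_{j=1}^{I} \log\varphi_{j}

% Multiplicative alleles: n_{G,i} and n_{F,j} count G and F alleles (0, 1 or 2)
\mathrm{cloglog}(\pi_{i}) \approx \log\frac{c}{\alpha} + \log\frac{I}{N}
  + n_{G,i}\log\gamma_{G} + \bar{n}_{F}\log\varphi_{F},
\qquad \bar{n}_{F} = \frac{1}{I}\sum_{j=1}^{I} n_{F,j}
```

In this form, \(\log(I/N)\) is a known offset, \(\log(c/\alpha)\) is an intercept, and \(\log\gamma_{G}\) and \(\log\varphi_{F}\) are regression coefficients on allele counts, so \(\widehat{\gamma}_{G}\) and \(\widehat{\varphi}_{F}\) follow by exponentiation.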
Simulation
To investigate the bias and precision of \(\widehat{\gamma }_{G}\) and \(\widehat{\varphi }_{F}\), one generation of a diploid population was simulated based on the above assumptions with respect to the effect of alleles at both loci. These two loci were the only genetic effects simulated. Allele frequencies at both loci were assumed to be 0.5, that is, \(p_{g} = p_{f} = 0.5\). The population was subdivided into 100 groups of 100 individuals each. Each group was set up such that group mates showed a certain genetic relatedness, r, at both loci. Here, relatedness is defined as the correlation of allele counts between group mates, irrespective of what causes the correlation. To limit the number of scenarios to be tested, relatedness at the susceptibility locus, \(r_{\gamma }\), and at the infectivity locus, \(r_{\varphi }\), were assumed to be the same (note that relatedness at both loci is expected to be the same when the loci are not under selection). To create a given degree of relatedness among group mates, a fraction of fully related individuals was added to each group, supplemented by randomly selected individuals. Since each individual carries both the susceptibility and the infectivity locus, these additions were done jointly (see Appendix 4 in [5] for a detailed description of the strategy to make these additions jointly).
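The exact procedure is given in Appendix 4 of [5]. One simple way to reach a target relatedness r, sketched below under assumptions of our own, is to let a fraction \(q = \sqrt{r}\) of each group be copies of one founder genotype (copied jointly at both loci) and to draw the remaining members at random under Hardy–Weinberg equilibrium. Two randomly chosen group mates are then both copies with probability close to \(q^{2} = r\) (slightly less in a finite group), and otherwise their allele counts are independent, so the correlation of their allele counts is approximately r. The helpers `make_group` and `pair_correlation` are hypothetical.

```python
import numpy as np

def make_group(rng, n=100, r=0.25, p_G=0.5, p_F=0.5):
    """Hypothetical group of n diploid individuals with relatedness ~r at both loci.

    Returns an (n, 2) integer array: column 0 counts G alleles, column 1 counts F alleles.
    A fraction sqrt(r) of members copy one founder genotype at both loci jointly;
    the others are drawn independently under Hardy-Weinberg equilibrium.
    """
    n_copies = int(round(np.sqrt(r) * n))
    founder = rng.binomial(2, [p_G, p_F])              # founder's (G, F) allele counts
    copies = np.tile(founder, (n_copies, 1))
    others = np.column_stack([rng.binomial(2, p_G, n - n_copies),
                              rng.binomial(2, p_F, n - n_copies)])
    group = np.vstack([copies, others])
    return group[rng.permutation(n)]                   # shuffle so copies are not listed first

def pair_correlation(rng, r, n_groups=4000, n=100, locus=0):
    """Correlation of allele counts between two random mates, across many groups."""
    pairs = np.array([make_group(rng, n, r)[:2, locus] for _ in range(n_groups)])
    return np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]

rng = np.random.default_rng(2015)
for r in (0.0, 0.25, 0.5625):
    print(r, round(pair_correlation(rng, r), 3))
```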
A basic stochastic SIR model as described above was used to simulate the disease dynamics [12]. In each group, the epidemic began with one randomly chosen infected individual. Then, the next event, which could be either infection of a susceptible individual or recovery of an infected individual, was determined using Gillespie’s direct algorithm [16]. The type of event, i.e. either infection or recovery, was decided by drawing a random number \(v_{1}\) from a uniform distribution, \(v_{1} \sim U(0,1)\). The next event was an infection of a susceptible individual if \(v_{1} < \frac{\sum_{i} \sum_{j} \beta_{ij} \frac{S_{i} I_{j}}{N}}{\sum_{i} \sum_{j} \beta_{ij} \frac{S_{i} I_{j}}{N} + I\alpha}\); otherwise it was recovery of an infected individual. The numerator of this ratio represents the total infection rate, and the denominator the total event rate, i.e., the sum of the infection and recovery rates. The sampling of the specific individual that became infected depended on individual susceptibility: the probability that the newly infected individual was of genotype i was its share of the total infection rate, \(\frac{\sum_{j} \beta_{ij} \frac{S_{i} I_{j}}{N}}{\sum_{i} \sum_{j} \beta_{ij} \frac{S_{i} I_{j}}{N}}\). Hence, the transmission rates were updated based on the numbers of susceptible and infected individuals of each genotype, while the transmission rate parameter \(\beta_{ij}\) remained constant. The epidemic ended when no infectious individuals remained in the population or when no susceptible individuals were left to be infected. At the end of the epidemic, the number of individuals that had become infected, together with their genotypes for susceptibility and infectivity, was recorded. The fraction of individuals of each genotype that became infected was the dependent variable in the analysis.
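The event loop described above can be sketched as follows for a single group. The sketch assumes \(\beta_{ij} = c\,\gamma_{i}\varphi_{j}\) with N equal to the group size, works with per-individual values rather than genotype classes, and draws no event times because only the final disease status is needed. Choosing a susceptible individual in proportion to its own infection rate is equivalent to choosing a genotype class i in proportion to \(\sum_{j}\beta_{ij}S_{i}I_{j}/N\) and then an individual uniformly within that class. The function name and defaults (c = 1.5, α = 0.5) are illustrative, and this is not the authors' simulation code.

```python
import numpy as np

def simulate_group(gamma, phi, c=1.5, alpha=0.5, rng=None):
    """Final infection status of one group after a stochastic SIR epidemic.

    gamma, phi: susceptibility and infectivity value of each individual.
    Uses Gillespie's direct method; event times are not needed for the final
    size, so only the type of each event is drawn.
    Returns a boolean array that is True for every individual ever infected.
    """
    rng = np.random.default_rng() if rng is None else rng
    gamma = np.asarray(gamma, dtype=float)
    phi = np.asarray(phi, dtype=float)
    n = gamma.size
    state = np.zeros(n, dtype=int)                 # 0 = S, 1 = I, 2 = R
    state[rng.integers(n)] = 1                     # one randomly chosen initial infected
    while True:
        susceptible = np.flatnonzero(state == 0)
        infected = np.flatnonzero(state == 1)
        if infected.size == 0 or susceptible.size == 0:
            break
        # Per-susceptible infection rate: gamma_i * c * sum_j(phi_j) / N
        rates = gamma[susceptible] * c * phi[infected].sum() / n
        total_infection = rates.sum()
        total_recovery = alpha * infected.size
        if rng.uniform() < total_infection / (total_infection + total_recovery):
            # Infection: choose the susceptible in proportion to its rate
            state[rng.choice(susceptible, p=rates / total_infection)] = 1
        else:
            # Recovery: every infected individual recovers at the same rate alpha
            state[rng.choice(infected)] = 2
    return state > 0

# Usage: multiplicative allele values with gamma_g = phi_f = 1 (assumed)
rng = np.random.default_rng(1)
n_G = rng.binomial(2, 0.5, 100)
n_F = rng.binomial(2, 0.5, 100)
ever_infected = simulate_group(0.6 ** n_G, 0.6 ** n_F, rng=rng)
print(ever_infected.mean())
```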
We hypothesised that different epidemiological and genetic factors affect the quality of the estimates, as measured by the bias and precision of \(\widehat{\gamma }_{G}\) and \(\widehat{\varphi }_{F}\). For that purpose, we simulated the different scenarios described below. The bias of an estimate was calculated as the difference between the ‘true’ and estimated values, and its precision was quantified by the standard deviation (SD) of the estimates.
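For a set of replicate estimates of one parameter, the two quality measures can be computed as below. The sign convention (true value minus the mean estimate) follows the wording above, but averaging the estimates over replicates before taking the difference is an assumption, as are the replicate values in the example.

```python
import statistics

def bias_and_sd(estimates, true_value):
    """Return (bias, SD) of replicate estimates of one parameter.

    bias = true value minus the mean of the estimates;
    SD   = sample standard deviation of the estimates (precision).
    """
    bias = true_value - statistics.fmean(estimates)
    sd = statistics.stdev(estimates)
    return bias, sd

# Hypothetical replicate estimates of gamma_G when the true value is 0.6
print(bias_and_sd([0.62, 0.66, 0.64, 0.68], true_value=0.6))
```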
Table 1 Simulated scenarios
Parameters | Scenario 1 | Scenario 2 | Scenario 3 |
---|---|---|---|
Contact rate, c | 1.5 | 0.75–7.5 | 1.5 |
Recovery rate, α | 0.5 | 0.5 | 0.5 |
Susceptibility value of allele G, \(\gamma_{G}\) | 0.6 | 0.6 | 0.97, 0.6 and 0.37 |
Infectivity value of allele F, \(\varphi_{F}\) | 0.6 | 0.6 | 0.3, 0.6 and 0.9 |
Relatedness, r | 0–1 | 0–1 | 0–1 |
\(R_{0}\) | 1.2 | 0.6–6.1 | 1.2 |
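The \(R_{0}\) values in Table 1 can be related to the other parameters with a short calculation. Assuming, as an illustration, that \(\beta_{ij} = c\,\gamma_{i}\varphi_{j}\), that the reference alleles have \(\gamma_{g} = \varphi_{f} = 1\), that allele effects are multiplicative, and that group mates are unrelated and the two loci are in linkage equilibrium, \(R_{0} = (c/\alpha)\,\bar{\gamma}\,\bar{\varphi}\), where \(\bar{\gamma}\) and \(\bar{\varphi}\) are the population mean genotypic values. Under these assumptions, c = 0.75, 1.5 and 7.5 with \(\gamma_{G} = \varphi_{F} = 0.6\) and α = 0.5 give 0.61, 1.23 and 6.14, in line with the table. The function names are illustrative.

```python
def mean_genotypic_value(allele_value, p, reference_value=1.0):
    """Population mean genotypic value under multiplicative allele effects.

    A genotype's value is the product of its two allele values; with alleles
    combined at random (Hardy-Weinberg), the mean is the mean allele value squared.
    """
    mean_allele = p * allele_value + (1 - p) * reference_value
    return mean_allele ** 2

def r0_from_parameters(c, alpha, gamma_G, phi_F, p_G=0.5, p_F=0.5):
    """R0 = (c / alpha) * mean susceptibility * mean infectivity (illustrative)."""
    return (c / alpha) * mean_genotypic_value(gamma_G, p_G) * mean_genotypic_value(phi_F, p_F)

for c in (0.75, 1.5, 7.5):
    print(c, round(r0_from_parameters(c, alpha=0.5, gamma_G=0.6, phi_F=0.6), 2))
```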
Second, to investigate the effect of \(R_{0}\) on the quality of \(\widehat{\gamma }_{G}\) and \(\widehat{\varphi }_{F}\), we simulated scenarios with different values of \(R_{0}\). We varied the contact rate c, so that \(R_{0}\) for a population consisting of groups of unrelated individuals varied from 0.6 (for which no major outbreaks can occur) to 6.1 (for which major outbreaks can occur; Table 1, scenario 2).
Third, to investigate the impact of the size of the effects of the genes for susceptibility and infectivity on the quality of \(\widehat{\gamma }_{G}\) and \(\widehat{\varphi }_{F}\), we simulated scenarios with different effect sizes for a constant value of \(R_{0} = 1.2\). We simulated all combinations of low, moderate and high values for \(\gamma_{G}\) and \(\varphi_{F}\) (Table 1, scenario 3).
Furthermore, in all of the above-mentioned scenarios, relatedness between group mates was varied between 0 and 1 to investigate the effect of population structure with respect to relatedness on the quality of \(\widehat{\gamma }_{G}\) and \(\widehat{\varphi }_{F}\). Relatedness was assumed to be the same at both loci (see [5] for details). The model was fitted in R using the glm function with a binomial distribution.
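The authors fitted the model with R's glm function. To keep this document's examples in one language, the sketch below fits the same kind of model, a binomial GLM with a complementary log–log link and an offset, by iteratively reweighted least squares (IRLS) in NumPy. The design used in the demonstration is an assumption for illustration: each row is a group × susceptibility-genotype cell, with columns for an intercept, the focal individuals' G-allele count and the mean F-allele count of the group's infected individuals, plus an offset \(\log(I/N)\). The data are drawn directly from that model rather than from an epidemic simulation, so the demonstration only checks that the routine recovers known coefficients.

```python
import numpy as np

def inv_cloglog(eta):
    """Inverse cloglog link: P(infected) = 1 - exp(-exp(eta))."""
    return -np.expm1(-np.exp(eta))

def fit_cloglog_glm(X, infected, trials, offset, n_iter=100, tol=1e-10):
    """Binomial GLM with a cloglog link and a known offset, fitted by IRLS.

    X: (n, k) design matrix; infected, trials: counts per row; offset: (n,) array.
    Returns the k estimated coefficients on the link (log) scale.
    """
    X = np.asarray(X, dtype=float)
    infected = np.asarray(infected, dtype=float)
    trials = np.asarray(trials, dtype=float)
    offset = np.asarray(offset, dtype=float)
    # Start from the observed proportions, nudged away from 0 and 1
    mu = (infected + 0.5) / (trials + 1.0)
    eta = np.log(-np.log1p(-mu))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        dmu = np.exp(eta - np.exp(eta))                     # d mu / d eta
        weights = trials * dmu ** 2 / (mu * (1.0 - mu))     # IRLS working weights
        z = eta - offset + (infected / trials - mu) / dmu   # working response
        xtw = X.T * weights                                 # X' W
        beta_new = np.linalg.solve(xtw @ X, xtw @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
        eta = offset + X @ beta
        mu = np.clip(inv_cloglog(eta), 1e-12, 1 - 1e-12)
    return beta

# Demonstration on data drawn from the assumed model
rng = np.random.default_rng(42)
n_cells = 300
n_G = rng.integers(0, 3, n_cells)                  # focal G-allele count (0, 1 or 2)
mean_F = rng.uniform(0.0, 2.0, n_cells)            # mean F-allele count of infected mates
offset = np.log(rng.uniform(0.05, 0.5, n_cells))   # log(I/N), treated as known
X = np.column_stack([np.ones(n_cells), n_G, mean_F])
true_beta = np.array([np.log(3.0), np.log(0.6), np.log(0.6)])  # log(c/alpha), log gamma_G, log phi_F
trials = np.full(n_cells, 5000)
infected = rng.binomial(trials, inv_cloglog(offset + X @ true_beta))

beta_hat = fit_cloglog_glm(X, infected, trials, offset)
gamma_G_hat, phi_F_hat = np.exp(beta_hat[1:])
print(gamma_G_hat, phi_F_hat)
```

IRLS is the standard fitting algorithm for GLMs, so a well-behaved data set like this one should give essentially the same coefficients as R's glm with a binomial family, cloglog link and the same offset.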
Results
A scatter plot of \((1 - \widehat{\gamma }_{G} )\) against \((1 - \widehat{\varphi }_{F} )\) for the 2000 replicates of the basic scenario with r = 0 shows that the estimated differences are uniformly distributed over their range without any pattern (Fig. 1). This plot also shows that \((1 - \widehat{\varphi }_{F} )\) is more often underestimated than overestimated, which agrees with the underestimation shown in Fig. 2 for r = 0.
For all values of \(R_{0}\), standard deviations of the estimates were greater for the infectivity effect than for the susceptibility effect, except for r = 1, for which they were nearly identical. Standard deviations decreased considerably as relatedness among group mates increased, particularly for the infectivity effect. For both susceptibility and infectivity effects, standard deviations were smaller for values of \(R_{0}\) for which the bias in \(\widehat{\gamma }_{G}\) and \(\widehat{\varphi }_{F}\) was smallest, i.e. when \(R_{0}\) ranged approximately from 1.8 to 3.1.
Discussion
In this work, a generalized linear model with a complementary log–log link function was developed to estimate the relative effects of genes on individual susceptibility and infectivity. The model was developed from an equation that describes the probability that an individual becomes infected as a function of its own susceptibility genotype and of the infectivity genotypes of its infected group mates. This GLM follows Velthuis et al. [9], who developed a GLM for binary data from a transmission trial to estimate the effect of susceptibility and infectivity of hosts on the transmission rate parameter β. A simulation study was performed to investigate the quality of the GLM. The statistical analysis of the simulated data yielded fairly precise estimates, except in some scenarios for which estimates were more biased, particularly for infectivity. The best estimates were found for schemes with intermediate \(R_{0}\) and related group members. For all the scenarios investigated, the sizes of the effects at both loci were underestimated.
The main objective of this study was to develop a methodology to estimate gene effects and also to investigate its quality in terms of bias and precision of the estimates. To test the methodology without introducing additional assumptions that may contribute to estimation error, we assumed additive allele effects on the log-scale for both susceptibility and infectivity. Thus, allelic effects were simulated multiplicatively on the original scale. This was done for two reasons. First, we wanted to formulate the model in terms of allele counts within individuals, rather than in terms of individual genotypes. In other words, we did not intend to estimate dominance effects. Whether allele effects are more likely to be additive on the log-scale than on the original scale is unknown at present. Second, since the objective of this study was to investigate the quality of the model rather than the assumptions on the genetic architecture, the data were simulated under a model that agreed with the assumptions of the statistical model.
Fraction of individuals infected at the end of the epidemic
\(R_{0}\) | r = 0 | r = 0.25 | r = 0.5625 | r = 1 |
---|---|---|---|---|
0.6 | 0.02 | 0.03 | 0.03 | 0.04 |
1.2 | 0.10 | 0.12 | 0.14 | 0.16 |
1.8 | 0.30 | 0.30 | 0.30 | 0.30 |
2.5 | 0.46 | 0.45 | 0.44 | 0.43 |
3.1 | 0.58 | 0.57 | 0.55 | 0.53 |
3.7 | 0.66 | 0.65 | 0.63 | 0.61 |
4.3 | 0.71 | 0.70 | 0.69 | 0.67 |
4.9 | 0.75 | 0.75 | 0.73 | 0.71 |
5.5 | 0.79 | 0.78 | 0.77 | 0.75 |
6.1 | 0.81 | 0.80 | 0.80 | 0.78 |
For each scenario, more relatedness between individuals resulted in better estimates for both traits. This is because more relatedness creates more variation between groups, which results in groups with below- or above-average susceptibility and/or infectivity. This occurs because an individual with a lower susceptibility will also have related group mates with below-average susceptibility, and vice versa. The same applies for infectivity. However, since we assumed absence of linkage disequilibrium (LD) between the susceptibility and infectivity loci, groups with below-average susceptibility will not always have below-average infectivity as well. Groups with above-average susceptibility and above-average infectivity will have epidemics with a greater final size, i.e. the fraction of individuals that is infected by the end of the epidemic, while those with below-average susceptibility and infectivity will have a smaller final size. This between-group variation improved the estimates of the effects of susceptibility and infectivity.
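The link between relatedness and between-group variation can be made explicit with a standard variance identity. If the allele counts of group members each have variance \(\sigma^{2}\) and pairwise correlation r (the relatedness defined in the Methods), the mean allele count \(\bar{x}\) of a group of n members has the variance below. With the allele frequency of 0.5 used here, \(\sigma^{2} = 2p(1-p) = 0.5\), and with n = 100:

```latex
\operatorname{Var}(\bar{x}) = \frac{\sigma^{2}}{n}\,\bigl[1 + (n - 1)\,r\bigr]

% n = 100, \sigma^{2} = 0.5:
% r = 0    : Var = (0.5/100) \times 1      = 0.005
% r = 0.25 : Var = (0.5/100) \times 25.75  \approx 0.129
% r = 1    : Var = (0.5/100) \times 100    = 0.5
```

Even moderate relatedness (r = 0.25) thus increases the between-group variance of the mean allele count more than twenty-fold, and under multiplicative effects the same holds for the group means of log-susceptibility and log-infectivity.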
We have made a number of assumptions in building our methodology. In the derivation of Eq. (5), we assumed that all individuals that escaped the infection had been exposed to all infected individuals. This assumption holds by construction in the simulations done here; to what extent it holds for real data remains to be seen. It seems reasonable to assume that individuals in relatively small and well-defined groups mix over space and time, as is often the case in animal husbandry: for example, in fattening pigs with group sizes of 10 to 30 individuals. The assumption is less reasonable for groups with a spatial structure, for example in tie stalls or when epidemics occur within barns subdivided into multiple groups. In such cases, data should be collected separately for different groups. We also assumed that epidemics could be completely recorded, so that the final disease status of all individuals is known, and all individuals that have escaped the infection have been exposed to all infected individuals. However, for reasons of, e.g., animal welfare and productivity, interventions are often carried out to limit the size of an epidemic. Hence, individuals may not have had the full potential to express their susceptibility and infectivity. For incomplete epidemics, the probability that an individual becomes infected follows from Eq. (5) when only the infected individuals to which the focal individual has been exposed are considered (see also [11]). Thus, extension to incompletely observed epidemics is straightforward (see also applications in subsequent papers citing [9]).
Bias and precision of the estimates may be improved when data are recorded within shorter time intervals. This may be particularly helpful for cases with high \(R_{0}\). In such cases, each interval forms an incompletely observed epidemic, which can be analysed with the same GLM approach [9]. When data are collected in sufficiently short time intervals, only a fraction of individuals will become infected in a single interval, even when \(R_{0}\) is high. This will contribute to the accuracy of the estimates. Moreover, collecting data in short time intervals also provides information on the order of infections, i.e., which animal infected which. This will increase the accuracy of estimated gene effects, particularly for infectivity [17]. Thus, using data from short time intervals can complement using groups composed of related individuals and data from multiple epidemics. The derivation and resulting model for such cases are very similar to those presented here, since the probability that an individual escapes infection follows from the zero-term of the Poisson distribution (see also [9, 11]). The key step is to identify the infectious individuals to which the focal individual has been exposed in a time period.
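A hedged sketch of the interval version, with symbols introduced here: let \(\tau_{j}\) be the time that infected individual j was infectious and in contact with the focal individual during the interval, and assume again \(\beta_{ij} = c\,\gamma_{i}\varphi_{j}\). The number of infectious contacts received by a susceptible of type i is then Poisson with mean \(\lambda_{i}\), its escape probability is the zero-term \(e^{-\lambda_{i}}\), and the complementary log–log transform of its infection probability is linear in \(\log\gamma_{i}\):

```latex
\lambda_{i} = \frac{c\,\gamma_{i}}{N} \sum_{j} \varphi_{j}\,\tau_{j},
\qquad
\Pr(\text{escape}) = e^{-\lambda_{i}},
\qquad
\mathrm{cloglog}\bigl(1 - e^{-\lambda_{i}}\bigr) = \log\frac{c}{N} + \log\gamma_{i} + \log\sum_{j}\varphi_{j}\,\tau_{j}
```

The exposure-weighted infectivity term would still need a linearisation, such as the geometric-mean approximation, to become linear in \(\log\varphi_{j}\).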
Lipschutz-Powell et al. [10] showed that, when there is genetic variation in susceptibility only, a complementary log–log link function can be used to link an equation that describes the probability that an individual becomes infected to a linear model that includes the individual’s genotype for susceptibility. They also suggested that, when there is genetic variation in infectivity, a Taylor-series expansion of the model term for infectivity can be used to further linearize the model in infectivity. In our study, we obtained a linear model for infectivity by approximating the arithmetic mean by a geometric mean. We quantified the error due to this approximation and found only negligible errors in the estimates (“Appendix”). Thus, this approximation can be ruled out as the cause of the observed bias, and, for cases in which there is variation in infectivity, the geometric mean approximation appears suitable to obtain a linear combination of the parameters of interest. A full investigation of the causes of the bias is beyond the scope of this study. However, the finite size of the groups used to estimate gene effects (100 individuals each) may be one of the reasons for the observed underestimation.
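The size of the arithmetic-to-geometric-mean approximation on the link scale is easy to compute for a given set of infected individuals. The sketch below returns the gap \(\log(\text{arithmetic mean}) - \log(\text{geometric mean})\) of their infectivity values, which is never negative. The example assumes \(\varphi_{f} = 1\) and \(\varphi_{F} = 0.6\) with multiplicative effects, giving genotypic values 1, 0.6 and 0.36, in Hardy–Weinberg proportions. It measures the error in one linear predictor only, not the error in the resulting estimates that the authors quantified in their Appendix; the function name is illustrative.

```python
import math

def log_mean_gap(values):
    """log(arithmetic mean) - log(geometric mean) of positive values (>= 0 by AM-GM)."""
    arithmetic = sum(values) / len(values)
    log_geometric = sum(math.log(v) for v in values) / len(values)
    return math.log(arithmetic) - log_geometric

# Infected group mates with genotypes ff, fF, fF, FF (phi_f = 1, phi_F = 0.6 assumed)
infectivities = [1.0, 0.6, 0.6, 0.36]
print(round(log_mean_gap(infectivities), 4))
```

For identical infectivity values the gap is zero, so the approximation is exact when all infected group mates share one infectivity genotype.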
Anche et al. [5] defined breeding value and heritable variation in \(R_{0}\). They showed that an individual’s breeding value for \(R_{0}\) is a function of the population’s average susceptibility and infectivity, of the gene frequencies within the individual and of the average effects of the alleles at both loci (Equation 7c in [5]). However, Anche et al. [5] assumed that effects of alleles at both loci were additive, whereas here we assumed that effects are multiplicative (so that they are additive on the log scale). Multiplicative effects introduce dominance on the original scale. Hence, before applying the expressions for breeding value and heritable variation of [5] to estimates obtained with the methods proposed here, the estimates need to be translated into average effects of alleles [15]. Using the common notation for the one-locus model [15], the additive effect is half the difference in genotypic value between both homozygotes, \(a_{\gamma } = (\gamma_{g}^{2} - \gamma_{G}^{2} )/2\) and \(a_{\varphi } = (\varphi_{f}^{2} - \varphi_{F}^{2} )/2\); the dominance deviation is the difference between the heterozygote and the average of both homozygotes, \(d_{\gamma } = \gamma_{g} \gamma_{G} - (\gamma_{g}^{2} + \gamma_{G}^{2} )/2\) and \(d_{\varphi } = \varphi_{f} \varphi_{F} - (\varphi_{f}^{2} + \varphi_{F}^{2} )/2\); and the average effects of alleles are given by \(\alpha_{\gamma } = a_{\gamma } + (p_{G} - p_{g} )d_{\gamma }\) and \(\alpha_{\varphi } = a_{\varphi } + (p_{F} - p_{f} )d_{\varphi }\), where p denotes allele frequency [15]. Hence, in Eqs. 7 and 11 of [5], \(\gamma_{g} - \gamma_{G}\) should be replaced by \(\alpha_{\gamma }\), and \(\varphi_{f} - \varphi_{F}\) should be replaced by \(\alpha_{\varphi }\).
For example, for \(\gamma_{g} = 1\) and \(\gamma_{G} = 0.6\), genotypic values are \(\gamma_{gg} = 1\), \(\gamma_{gG} = 0.6\) and \(\gamma_{GG} = 0.36\), the additive effect is \(a_{\gamma } = (1 - 0.36)/2 = 0.32\), the dominance deviation is \(d_{\gamma } = 0.6 - (1 + 0.36)/2 = - 0.08\), and the average effect is \(\alpha_{\gamma } = 0.32 - 0.08\,(p_{G} - p_{g} )\).
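The translation from multiplicative allele values to the additive effect, dominance deviation and average effect of [15] can be scripted directly from the expressions above; the sketch reproduces the worked example (\(\gamma_{g} = 1\), \(\gamma_{G} = 0.6\)) for a few allele frequencies. The function name is illustrative, and the same function applies to infectivity with \(\varphi_{f}\) and \(\varphi_{F}\) in place of \(\gamma_{g}\) and \(\gamma_{G}\).

```python
def average_effect(value_g, value_G, p_G):
    """Additive effect a, dominance deviation d and average effect alpha
    for a bi-allelic locus with multiplicative allele values.

    Genotypic values: gg = value_g**2, gG = value_g*value_G, GG = value_G**2.
    """
    p_g = 1.0 - p_G
    a = (value_g ** 2 - value_G ** 2) / 2                    # half the homozygote difference
    d = value_g * value_G - (value_g ** 2 + value_G ** 2) / 2  # heterozygote minus homozygote mean
    alpha = a + (p_G - p_g) * d
    return a, d, alpha

for p_G in (0.25, 0.5, 0.75):
    print(p_G, average_effect(1.0, 0.6, p_G))
```

At \(p_{G} = p_{g} = 0.5\) the dominance term drops out and the average effect equals the additive effect, 0.32.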
In this study, we assumed a model with two bi-allelic loci, i.e. one locus that affects individual susceptibility and one locus that affects individual infectivity. Furthermore, we assumed that it is known which locus affects infectivity and which affects susceptibility. This may be the case with candidate gene approaches, which include only genes whose function is related to the trait of interest. The effect of the putative causative gene is then examined in an association study. In such studies, the GLM developed here can be applied to estimate and confirm the effect of the candidate gene on the trait of interest. However, the candidate gene approach is limited because it relies on knowing the functional relation between the genes and the trait of interest. Recent advances in molecular genomics allow us to genotype individuals for thousands of SNPs, and to perform GWAS in which all SNPs are examined for their association with the trait of interest. The GLM developed here can also be used in GWAS that aim at identifying genes associated with susceptibility and/or infectivity. In such studies, it is not known whether a SNP affects infectivity and/or susceptibility; this has to be inferred from the significance of the estimated effects. To avoid the need to test all combinations of two SNPs, one could first screen SNPs for susceptibility effects, and then fit only the loci with significant susceptibility effects, together with all other loci for infectivity effects. Moreover, when modified so that gene effects are estimated as random effects, our model could probably be used for polygenic traits, for example in genomic prediction, for which the effects of all genes are estimated simultaneously and the interest lies in predicting the breeding value of entire genotypes [18].
Conclusions
We have developed a generalized linear model to estimate the relative effects of genes on individual susceptibility and infectivity. This model may be used in genome-wide association studies that aim at identifying genes that are involved in the prevalence of infectious diseases.
Declarations
Authors’ contributions
MTA conducted the study. MTA, PB and MdJ designed the statistical methods. MTA, PB and MdJ wrote the manuscript. All authors read and approved the final manuscript.
Acknowledgements
This study was financially supported by the Marie Curie Nematode Health project. The contribution of PB was supported by the foundation for applied sciences (STW) of the Dutch science council (NWO).
Competing interests
The authors declare that they have no competing interests.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
- Bishop S, de Jong M, Gray D. Opportunities for incorporating genetic elements into the management of farm animal diseases: policy issues. Commission on Genetic Resources for Food and Agriculture. Rome: FAO. 2002; p. 36.
- Axford RFE, Bishop SC, Nicholas FW, Owen JB. Breeding for disease resistance in farm animals. 2nd ed. Wallingford: CABI Publishing; 2000.
- Bermingham ML, Bishop SC, Woolliams JA, Pong-Wong R, Allen AR, McBride SH, et al. Genome-wide association study identifies novel loci associated with resistance to bovine tuberculosis. Heredity. 2014;112:543–51.
- Kirkpatrick BW, Shi X, Shook GE, Collins MT. Whole-genome association analysis of susceptibility to paratuberculosis in Holstein cattle. Anim Genet. 2011;42:149–60.
- Anche M, de Jong M, Bijma P. On the definition and utilization of heritable variation among hosts in reproduction ratio R0 for infectious diseases. Heredity. 2014;113:364–74.
- Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. Superspreading and the effect of individual variation on disease emergence. Nature. 2005;438:355–9.
- Lipschutz-Powell D, Woolliams JA, Bijma P, Doeschl-Wilson AB. Indirect genetic effects and the spread of infectious disease: are we capturing the full heritable variation underlying disease prevalence? PLoS One. 2012;7:e39551.
- Diekmann O, Heesterbeek JA, Metz JA. On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations. J Math Biol. 1990;28:365–82.
- Velthuis A, De Jong M, Kamp E, Stockhofe N, Verheijden J. Design and analysis of an Actinobacillus pleuropneumoniae transmission experiment. Prev Vet Med. 2003;60:53–68.
- Lipschutz-Powell D, Woolliams JA, Doeschl-Wilson AB. A unifying theory for genetic epidemiological analysis of binary disease data. Genet Sel Evol. 2014;46:15.
- Kermack WO, McKendrick AG. A contribution to the mathematical theory of epidemics. Proc R Soc A. 1927;115:700–21.
- Anderson RM, May RM, Anderson B. Infectious diseases of humans: dynamics and control. New York: Oxford University Press Inc.; 1992.
- Andreasen V. The final size of an epidemic and its relation to the basic reproduction number. Bull Math Biol. 2011;73:2305–21.
- McCullagh P, Nelder JA. Generalized linear models. 2nd ed. London: Chapman and Hall; 1989.
- Falconer D, Mackay TC. Introduction to quantitative genetics. 4th ed. Harlow: Pearson Education Limited; 1996.
- Gillespie DT. Exact stochastic simulation of coupled chemical reactions. J Phys Chem. 1977;81:2340–61.
- Pooley CM, Bishop SC, Marion G. Estimation of single locus effects on susceptibility, infectivity and recovery rates in an epidemic using temporal data. In: Proceedings of the 10th world congress of genetics applied to livestock production: 17–22 August 2014; Vancouver. 2014. https://asas.org/docs/default-source/wcgalp-proceedings-oral/221_paper_9069_manuscript_1681_0b.pdf?sfvrsn=2.
- Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29.