-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add reference paper for the reproduction number calculation #151
Conversation
@joshwlambert few points
would you be able to do this? Willing to see how you are storing references.
I am open to discussing this further. I am able to add a line that suits the best if appropriate. |
@avallecam I'm happy to add the reference to the bibliography. Please feel free to add information to the "Warning..." paragraph if you think it will be informative for users. It's worth adding it as we can always revert to the current commit if we decide not to include it. I'll let you add the text to the README first and then I'll add the reference. |
hi @joshwlambert, welcome back, and thank you for your patience! I just finished the paragraph content. This is ready for your review and to add the reference paper for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @avallecam. It's always good to reference epi literature/theory in the packages.
However, in looking through this again I'm not uncertain about whether the descriptions of Bjornstad et al. directly correspond to the individual-based simulation used in {simulist} (see in-text comments).
@jamesmbaazam given your experience with epidemic modelling and branching processes, would you mind taking a look at this PR and letting us know whether the descriptions added to the README and the code are correct?
@@ -94,9 +94,9 @@ onset_to_death <- epiparameter::epidist_db( | |||
) | |||
``` | |||
|
|||
To simulate a line list for COVID-19 with an Poisson contact distribution with a mean number of contacts of 2 and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. The mean number of contacts and probability of infection determine the outbreak reproduction number, if the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation). | |||
To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the addition of "per day" is incorrect, the contact distribution is independent of time, and only defines the number of contacts per case. It is the infectious period that defines the temporal aspect of transmission.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As outlined in @bjornstad2020a, the contact rate (
$k$ )
I'm now wondering if in fact the sim_linelist()
model is not parameterised slightly differently from the description given here and in Bjornstad et al. Related to the above comment, the contact_distribution
is not a rate and therefore I'm don't think it is equivalent to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the fact that infectious_period
is uniformly sampled in {simulist} mean that both processes give the same result (making an assumption about large sample sizes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the addition of "per day" is incorrect, the contact distribution is independent of time, and only defines the number of contacts per case. It is the infectious period that defines the temporal aspect of transmission.
The way the simulations are set up here assumes the number of contacts an individual has depends on time, i.e., generations of infections. In other words, it depends on how many generations the chain survives.
Thanks for tagging. I'll take a closer look tomorrow. |
@@ -94,9 +94,9 @@ onset_to_death <- epiparameter::epidist_db( | |||
) | |||
``` | |||
|
|||
To simulate a line list for COVID-19 with an Poisson contact distribution with a mean number of contacts of 2 and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. The mean number of contacts and probability of infection determine the outbreak reproduction number, if the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation). | |||
To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking that this sentence may clarify the difference between the method (branching process) and the relationship between the variables (from a SIR model)
To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation). | |
To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. This runs a branching process model to simulate a SIR model but with no depletion of susceptible individuals. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This runs a branching process model to simulate a SIR model but with no depletion of susceptible individuals.
EDIT:
I think it's fine to just say it simulates infections from a simple branching process as mentioning SIR models here might distract from the main point of telling them what the underlying engine is.
Here is a quick answer that I will update tomorrow. The transmission process in the SIR model described in Bjornstad's paper is not always the same as the result of the branching process defined here unless their assumptions match. The main difference here is that the SIR model assumes an exponentially distributed infectious period while the model here assumes an arbitrary infectious period (as far as I can tell from a quick look). The branching process also accounts for individual-level transmission heterogeneity whereas it is the same per capita in the SIR model. A really good reference for this that I can think of is Mathematical models of infectious disease transmission by Nicholas C. Grassly and Christophe Fraser. The section on "offspring" distribution is really insightful. You can reference equation 1, which states the expected number of offspring (here, an entry in the linelist) in terms of the infectious period and probability of susceptibility. |
@jamesmbaazam and @avallecam thank you both for inputing on this. Given that we agree there are fundamental differences between the branching process and compartmental models, succinctly wording an explanation in README will be difficult, and we want to avoid confusing users. Instead I propose we close this PR without merging the changes, and instead aim to add a model description vignette as outlined in epiverse-trace/discussions#83. What do you both think? This vignette can contain the main parts of this PR:
We can then add a line to the README that links to this vignette for users that would like a more detailed description of the model. |
I'm fine with that. There is quite a good amount of theory and mathematical formulations in epichains in the helpfile of
Here, I think the focus should be more on how and why to simulate case data: linelists, aggregated time series, etc., which then leads into the next bullet.
Why do you need to explain this here?
I think this is the same as the second bullet.
|
Does the "Why do you need to explain this here?" refer to "comparing branching process to compartmental models" or "providing a more mathematical explanation of the simulation"? If the former then this relates to Andree's changes in this PR which make comparison to Bjornstad et al. which discusses SIR modelling. If the latter then the reasoning is the same as outlined in epiverse-trace/discussions/83. |
Sorry, my bad. I was referring to the former. I don't see the need to do that comparison since this package doesn't use compartmental models. |
I see your point. The reason I added this was the changes made in this PR add context to the model and similarities to compartmental models. I'll let @avallecam chip in and see how he thinks we can best resolve #139. |
I actually think you have a good point to explore their similarities and differences, so ignore my previous comment. I, however, think it should be done in the scope of describing the different ways that one can generate synthetic linelist. For example, LLsim, though abandoned or in early stages, uses a compartmental model whereas simulist uses branching processes. A really good reference for these comparisons is Equivalence of the Erlang-Distributed SEIR Epidemic Model and the Renewal Equation. They do a nice comparison and alignment of compartmental models and renewal/branching processes. |
Hi, both; thanks for your review and additional references. I agree to close this PR. This agreement has two elements:
@joshwlambert Should these renaming proposals be part of a new and more specific PR to review? |
From meeting with @joshwlambert the method vignette will be an appropriate place to register {simulist} methods. Then, this vignette can be referenced in the README paragraph that starts with "To simulate a line list for..." where the function arguments are related to determine the outbreak reproduction number. Thus, I agree to close this PR. Additionally, here I draft some learning points from the meeting that clarify the renaming proposals mentioned above and, if appropriate, can be included in the methods vignette:
If any point needs to be rephrased, we can use the new PR that aims to include the methods for this. |
Thanks for summarising our discussion @avallecam! I'll now close this PR and tackle the simulation algorithm vignette in the future. |
NEWS.md
Fix #139
README paragraph Quick start section relates contact distribution and probability of infection for the Ro calculation. We can add a reference to this relationship.
This proposes to add complementary text to the README to refer to the equation relating
contact_distribution
,prob_infection
andinfectious_period
elements to calculate Ro, using as reference a paper cited in {epidemics} https://www.nature.com/articles/s41592-020-0822-zNo
This still requires adding the reference to the paper: https://www.nature.com/articles/s41592-020-0822-z to make
@bjornstad2020a
work. This can reuse content in https://github.com/epiverse-trace/epidemics/I wanted to include the
infectious_period
object to theWarning: ...
paragraph, but not sure how to phrase it or if it would be appropriate given the emphasis oncontact_distribution
andprob_infection
. So open for your assessment.