Coding Good Practices and Some Tips
Before going into specifics, here are a few common rules one should try to follow when doing any kind of programming.
- Fail Faster: It's better to crash early than to find out far in the future
- Fail Clearer: When your program fails, it should be explicit and clear why
- Code is read way more than it is written: Make the code clear for an external user (you in a week)
- Make it work first, optimize second: when in doubt make the code safe
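As a minimal sketch of the "Fail Faster" and "Fail Clearer" rules, a function can validate its inputs up front and stop with a message that names the offending value. The helper `check_prob` is hypothetical and not part of EpiModel or of this project.

```r
# Hypothetical helper: validate a probability before it is used anywhere.
check_prob <- function(value, name) {
  if (!is.numeric(value) || length(value) != 1 || is.na(value)) {
    stop("`", name, "` must be a single non-missing number", call. = FALSE)
  }
  if (value < 0 || value > 1) {
    stop("`", name, "` must be in [0, 1], got ", value, call. = FALSE)
  }
  invisible(value)
}

# The invalid value is rejected immediately, and the message says why.
result <- tryCatch(
  check_prob(1.2, "syph.prob"),
  error = function(e) conditionMessage(e)
)
print(result)
```

Checking at the point where the value enters the code avoids a silent failure many timesteps later.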
It is highly recommended to read the R for Data Science book. It will give you a good foundation for your R coding journey.
Attribute and parameter names should follow these rules:
- always.use.dot.case
- never use "underscore" _
- only lower case
These are not only good practices. Not respecting these rules (the first two in particular) can lead to actual errors due to EpiModel expectations.
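One possible way to catch naming mistakes early is a small check on candidate names. The helper `is_dot_case` below is hypothetical, and the regular expression is only one reasonable reading of the rules above.

```r
# Hypothetical helper: TRUE for names written in lower-case dot.case.
# perl = TRUE keeps the [a-z] range independent of the locale.
is_dot_case <- function(x) {
  grepl("^[a-z][a-z0-9]*(\\.[a-z0-9]+)*$", x, perl = TRUE)
}

candidate_names <- c("syph.inf.last", "syph_inf", "Syph.Inf", "gono.tx.prob")
valid <- is_dot_case(candidate_names)

# Names that break the convention
print(candidate_names[!valid])
```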
Attributes and Parameters are often of the same type. We try to indicate what to expect from an attribute and parameter with a set of common suffixes:
- Attributes:
  - .last: a timestep where something happened for the last time
  - .count: the number of times something happened
- Parameters:
  - .int: an interval as a number of timesteps
  - .or: an odds ratio
  - .prob: a probability [0, 1]
  - .rate: a rate, probability of something happening per timestep [0, 1]
To improve the model clarity, try to use a common prefix for the attributes and parameters referring to the same thing.
- gono.: things related to gonorrhea
- syph.: things related to syphilis
- ...
Finally, we often use the same components over and over. Try to use the commonly used terms to refer to them:
- dx: diagnosis / diagnosed
- ndx: not diagnosed
- tx: treatment / treated
- ntx: untreated
- inf: infection / infected
- test: test (diagnostic test / screening)
- sympt: symptom / symptomatic
- asympt: asymptomatic
- ...
Here are some syphilis attributes and parameters:
- Attributes:
  - syph.inf: is the node infected by syphilis? (0 or 1)
  - syph.inf.last: when did the last syphilis infection occur?
  - syph.inf.count: number of syphilis infections
  - syph.dx: is the node diagnosed with syphilis? (0 or 1)
  - syph.tx: is the node treated for syphilis? (0 or 1)
- Parameters:
  - syph.prob: probability of getting infected by syphilis per sex act
  - syph.sympt.tx.prob: probability of getting treated for syphilis if symptomatic
  - syph.screen.hivneg.rate: per timestep probability of getting screened for syphilis if HIV negative
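Because the suffix announces the expected type, it can also drive simple sanity checks. The sketch below stores the parameters in a plain named list and flags any `.prob` or `.rate` value outside [0, 1]. The values are made up, `syph.tx.int` is a hypothetical parameter added to show the `.int` suffix, and `find_out_of_range` is a hypothetical helper.

```r
# Illustrative values only, not calibrated estimates
param <- list(
  syph.prob               = 0.01,
  syph.sympt.tx.prob      = 0.8,
  syph.screen.hivneg.rate = 0.005,
  syph.tx.int             = 2      # hypothetical parameter, in timesteps
)

# Hypothetical helper: names of the bounded parameters outside [0, 1]
find_out_of_range <- function(param) {
  bounded <- grepl("\\.(prob|rate)$", names(param))
  values <- unlist(param[bounded])
  names(values)[values < 0 | values > 1]
}

# Empty when every .prob and .rate parameter is valid
bad <- find_out_of_range(param)
print(bad)
```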
When in doubt, try to mimic the conventions used in the project.
When assigning a variable using get_attr or get_param, keep the original name of the attribute or parameter.
All other variables should follow these rules:
- snake_case
- only lower case
- never use "dot" .
This naming distinction makes it easy to tell apart what comes from dat and what has been defined elsewhere.
Similar to attributes and parameters, a set of common suffixes is often used:
- _ids: positional IDs
- _acts: positions in the act list
- _name: name of something (not the thing)
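The two naming styles can be seen side by side in the sketch below. To keep it runnable without EpiModel, `get_attr` and `get_param` are replaced by simplified stand-ins that read from a toy `dat` list. They are assumptions made for illustration, not EpiModel's implementations, and the attribute values are made up.

```r
# Simplified stand-ins so the sketch runs without EpiModel installed.
# They are NOT EpiModel's implementations.
get_attr <- function(dat, item) dat$attr[[item]]
get_param <- function(dat, item) dat$param[[item]]

# Toy `dat` object with 6 nodes (made-up values)
dat <- list(
  attr = list(
    syph.inf = c(1, 0, 1, 1, 0, 0),
    syph.dx  = c(1, 0, 0, 1, 0, 0)
  ),
  param = list(syph.sympt.tx.prob = 0.8)
)

# Values read from `dat` keep their original dot.case names
syph.inf <- get_attr(dat, "syph.inf")
syph.dx <- get_attr(dat, "syph.dx")
syph.sympt.tx.prob <- get_param(dat, "syph.sympt.tx.prob")

# Variables defined locally use snake_case and the common suffixes
undiagnosed_ids <- which(syph.inf == 1 & syph.dx == 0)
n_undiagnosed <- length(undiagnosed_ids)
print(undiagnosed_ids)
```

At a glance, anything with a dot came from `dat`, and anything with an underscore was computed locally.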
Theoretically, an attribute can take any scalar value. However, it is easier when these rules are followed for the defaults:
- avoid NA as much as possible
  - someone HIV negative should have the value 0 for hiv.dx and not NA
  - this limits the need to always check for NA edge cases
- flags should be 0 or 1, it's rare to have a case where NA is useful
- timesteps like .last should be -Inf by default
  - it never occurred, and -Inf is a valid number to do computations with
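The benefit of `-Inf` over `NA` shows up as soon as a `.last` attribute is used in a computation. In this small sketch, `at` (the current timestep), `recent_window` and the attribute values are all made up for illustration.

```r
at <- 100            # hypothetical current timestep
recent_window <- 10  # hypothetical window, in timesteps

# Default of -Inf: nodes 2 and 4 were never infected
syph.inf.last <- c(95, -Inf, 60, -Inf)
recent_flag <- at - syph.inf.last <= recent_window
n_recent <- sum(recent_flag)   # counts only node 1

# Same data with NA as the default: the NAs propagate
last_with_na <- c(95, NA, 60, NA)
recent_flag_na <- at - last_with_na <= recent_window
n_recent_na <- sum(recent_flag_na)   # NA unless na.rm = TRUE is added

print(n_recent)
print(n_recent_na)
```

With `-Inf`, the elapsed time is `Inf`, so "never happened" naturally fails any "within the last n timesteps" test without special handling.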
Goal: get the positional IDs of nodes that match a given set of conditions
Example:
- HIV positive nodes
- Diagnosed for their HIV
- Not on PrEP
- Circumcised
- Infected with syphilis
- Treated for their syphilis
elig_ids <- which(
status == 1 &
diag.status == 1 &
prep == 0 &
circ == 1 &
syph.inf == 1 &
syph.tx == 1
)
Several of these conditions are redundant and can be dropped:
- Being diagnosed with HIV implies HIV infection (in this model)
- PrEP is not possible when diagnosed with HIV (in this model)
- Syphilis treatment implies syphilis infection (in this model)
elig_ids <- which(
diag.status == 1 &
circ == 1 &
syph.tx == 1
)
Because syph.tx == 1 is very rare, it is faster to first get these nodes, then keep only the circumcised ones, and finally keep only the ones with an HIV+ diagnosis.
elig_ids <- which(syph.tx == 1)
elig_ids <- elig_ids[circ[elig_ids] == 1]
elig_ids <- elig_ids[diag.status[elig_ids] == 1]
This last optimization is only useful when one of the conditions is rare (~10% of the population).
As a general rule, always use the simplest version first and optimize later if relevant.
When in doubt about the redundancy of two conditions, keep both.
Never use "common sense": only act if you are sure of what is happening within the model.
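One way to be sure before dropping a condition is to check that the simplified versions return exactly the same IDs as the full one. The sketch below does this on simulated attributes built to honor the three implications listed above. The prevalences are made up, and the attributes are plain vectors rather than values read from a real `dat` object.

```r
set.seed(1)
n <- 10000

# Simulated attributes honoring the model's implications (made-up prevalences)
status <- rbinom(n, 1, 0.2)
diag.status <- status * rbinom(n, 1, 0.8)      # diagnosed implies infected
prep <- (1 - diag.status) * rbinom(n, 1, 0.3)  # no PrEP once diagnosed
circ <- rbinom(n, 1, 0.5)
syph.inf <- rbinom(n, 1, 0.05)
syph.tx <- syph.inf * rbinom(n, 1, 0.5)        # treated implies infected

# Full set of conditions
full_ids <- which(
  status == 1 & diag.status == 1 & prep == 0 &
    circ == 1 & syph.inf == 1 & syph.tx == 1
)

# Redundant conditions removed
reduced_ids <- which(diag.status == 1 & circ == 1 & syph.tx == 1)

# Rarest condition first, then successive filtering
stepwise_ids <- which(syph.tx == 1)
stepwise_ids <- stepwise_ids[circ[stepwise_ids] == 1]
stepwise_ids <- stepwise_ids[diag.status[stepwise_ids] == 1]

# Fails loudly if any version disagrees with the full one
stopifnot(
  identical(full_ids, reduced_ids),
  identical(full_ids, stepwise_ids)
)
```

If the assumptions about the model were wrong, for example if `syph.tx` could be 1 for an uninfected node, the `stopifnot` call would fail and the redundant conditions should be kept.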