Prevalence to Relative Abundance Project

In this project we're attempting to understand how the prevalence of human pathogens translates into the relative abundance we see in wastewater metagenomics. This work is split across two repositories: in this repo we're collecting prevalence estimates and building the model, while determining relative abundances from existing data is in the mgs-pipeline repo.

Working with prevalence data

In Python, run import pathogens and then iterate over pathogens.pathogens. Each pathogen implements an estimate_prevalences method, which returns one or more estimates.
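As a minimal sketch of that iteration, the helper below counts how many estimates each pathogen yields. The name count_estimates is hypothetical and not part of the repo. Because this README does not say whether pathogens.pathogens is a mapping or a plain sequence, the sketch accepts either, and it assumes only that each entry has an estimate_prevalences method that takes no arguments.

```python
from collections.abc import Mapping


def count_estimates(collection):
    """Return {label: number of estimates} for each pathogen.

    `collection` may be a mapping of name -> pathogen or a plain iterable
    of pathogens (assumption: the README does not specify which).
    """
    if isinstance(collection, Mapping):
        entries = list(collection.items())
    else:
        # Fall back to a module/object name when no key is available.
        entries = [
            (getattr(p, "__name__", type(p).__name__), p) for p in collection
        ]
    # estimate_prevalences() is documented to give one or more estimates;
    # list() lets this work whether it returns a list or a generator.
    return {name: len(list(p.estimate_prevalences())) for name, p in entries}


# Inside the repo this would be used roughly as:
#   import pathogens
#   print(count_estimates(pathogens.pathogens))
```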

Run ./summarize.py to get an overview of the data.

Statistical model

For an overview of the statistical model see model.md.

To fit the model, run ./fit.py. This will create:

input.tsv, a table of the input data to the model
fits.tsv, a table of samples from the posterior distribution of model parameters
fits_summary.tsv, a table listing summary statistics of the posterior distributions of model parameters
fig/, a directory containing a large number of plots of posterior distributions and samples from posterior predictive distributions (see model.md for details)
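To inspect one of the generated tables, a sketch like the following may help. It uses only the standard library, and read_tsv is a hypothetical helper rather than something the repo provides. The column names are not documented in this README, so the sketch reports whatever header the file contains instead of assuming any.

```python
import csv


def read_tsv(path):
    """Read a tab-separated file and return (column_names, rows).

    Each row is a dict keyed by column name; all values are left as strings.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to [].
        return list(reader.fieldnames or []), rows


# After running ./fit.py, something like this would show the table's shape:
#   columns, rows = read_tsv("fits_summary.tsv")
#   print(columns, len(rows))
```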

Once the model has been fit, run ./plot_summaries.py to create plots of the posterior distribution of $RA(1\perthousand)$ for the write-up:

fig/incidence-violin.{pdf,png}, posteriors for all incidence viruses
fig/prevalence-violin.{pdf,png}, posteriors for all prevalence viruses
fig/by_location_incidence-violin.{pdf,png}, posteriors separated by location for the most common incidence viruses
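A quick way to confirm that ./plot_summaries.py produced every file listed above is to check for them explicitly. The sketch below is a hypothetical helper (missing_plots is not part of the repo); the file names come straight from the list above, and the default fig directory assumes you run it from the repository root.

```python
from pathlib import Path

# File stems and extensions taken from the list of expected plots.
STEMS = ["incidence-violin", "prevalence-violin", "by_location_incidence-violin"]
EXTENSIONS = ["pdf", "png"]


def missing_plots(fig_dir="fig"):
    """Return the expected plot paths that do not exist under fig_dir."""
    fig_dir = Path(fig_dir)
    expected = [
        fig_dir / f"{stem}.{ext}" for stem in STEMS for ext in EXTENSIONS
    ]
    return [path for path in expected if not path.exists()]


# Example use from the repository root:
#   for path in missing_plots():
#       print("missing:", path)
```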

Development

Making changes

Create a branch named yourname-purpose and push your changes to it. Then create a pull request. Use the "request review" feature to ask for a review: all PRs need to be reviewed by someone else, and for now include Jeff and Simon on all PRs unless they're OOO.

Once your change has been approved by your reviewers and passes presubmit, you can merge it. Don't merge someone else's PR without confirming with them: they may have other changes they've realized they needed to make, or a tricky branch structure that needs to be resolved in a particular order.

Handle incoming reviews at least twice a day (morning and afternoon) -- slow reviews add a lot of friction. As a PR author you can avoid this friction by creating another branch that diverges from the code you have under review; ask Jeff to show you how if you're interested. Configure notification routing on GitHub so that work-related notifications go to your work account.

Testing

Run ./test.py

Presubmit

Before creating a PR or submitting code, run ./check.sh. It will run the tests, then check your types and formatting. This also runs automatically on GitHub when you make PRs, but it's much faster to catch problems locally first.

If ./check.sh complains about formatting or import sorting, you can fix this automatically with ./check.sh --fix.

Installing pystan

Pystan should be installed along with the other requirements when you run:

python -m pip install -r requirements-dev.txt

However, on some non-Linux systems (including M2 MacBooks), one of pystan's dependencies, httpstan, may fail to install. To get around this problem, you can install httpstan from source. Once it is built and installed, you can then install the requirements file as above. (Note that you can clone the httpstan repo anywhere on your computer. I recommend doing it outside of the p2ra repo directory so that git doesn't try to track it.)

naobservatory/p2ra

Folders and files

Latest commit

History

Repository files navigation

Prevalence to Relative Abundance Project

Working with prevalence data

Statistical model

Development

Making changes

Testing

Presubmit

Installing pystan

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages