Parse VIDRL human sera pool measurements #158

Open
huddlej opened this issue Aug 14, 2024 · 2 comments · Fixed by #160

Comments

@huddlej
Contributor

huddlej commented Aug 14, 2024

Description

Summary of the plan for parsing human sera pool measurements from VIDRL spreadsheets:

  1. Parse past VIDRL spreadsheets to find all distinct serum id values for human sera pools, so we know what values need to be mapped to vaccine strain names
  2. Create a TSV file per subtype in fauna that maps human sera pool ids from VIDRL to vaccine strains (e.g., “SH 2024 EGG” to “A/Thailand/8/2022-egg” for H3N2) using seasonal-flu vaccine.json files (e.g., H3N2) as a source of truth
  3. Add logic to tdb/vidrl_upload.py to convert serum_id (e.g., “SH 2024”) and serum_passage (e.g., “EGG”) values from the parsed titer blocks to a key that appears in the TSV file mapping above and use that key to set the serum strain to the vaccine strain name
  4. Run the upload script on past spreadsheets (as a dry run?) and confirm that human sera pool measurements get extracted for H1, H3, and Vic
  5. Upload just the new human sera pool measurements to fauna
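
A minimal sketch of how steps 2 and 3 might fit together, assuming a two-column TSV. The column names `serum_key` and `vaccine_strain` and the helper functions `load_vaccine_mapping`, `build_serum_key`, and `resolve_serum_strain` are hypothetical and are not taken from fauna or `tdb/vidrl_upload.py`. Only the example pair “SH 2024 EGG” to “A/Thailand/8/2022-egg” comes from the plan above.

```python
import csv
import io

# Hypothetical mapping file contents. The real per-subtype TSV and its
# column names are assumptions made for this illustration.
EXAMPLE_TSV = (
    "serum_key\tvaccine_strain\n"
    "SH 2024 EGG\tA/Thailand/8/2022-egg\n"
)


def load_vaccine_mapping(handle):
    """Read a TSV of serum keys and vaccine strains into a dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["serum_key"]: row["vaccine_strain"] for row in reader}


def build_serum_key(serum_id, serum_passage):
    """Join serum id and passage into one upper-case, single-spaced key."""
    combined = f"{serum_id} {serum_passage}".upper()
    # split() with no arguments collapses runs of whitespace.
    return " ".join(combined.split())


def resolve_serum_strain(serum_id, serum_passage, mapping):
    """Return the mapped vaccine strain, or None when the key is unknown."""
    return mapping.get(build_serum_key(serum_id, serum_passage))


mapping = load_vaccine_mapping(io.StringIO(EXAMPLE_TSV))
strain = resolve_serum_strain("SH 2024", "EGG", mapping)
```

Returning `None` for an unmapped key lets a dry run report which serum ids still need an entry in the mapping file, instead of failing on the first unknown value.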
@huddlej huddlej added the enhancement New feature or request label Aug 14, 2024
@joverlee521 joverlee521 self-assigned this Aug 21, 2024
@joverlee521
Contributor

I'll focus on getting the 2024 human sera pool measurements into fauna before the VCM in September. We will revisit generalizing patterns for ingesting earlier human sera data at a later date, when there is less of a time crunch.

@joverlee521 joverlee521 linked a pull request Aug 26, 2024 that will close this issue
1 task
@joverlee521 joverlee521 reopened this Aug 29, 2024
@joverlee521
Contributor

Used this Snakefile to backfill all of the human sera data from 2024 with changes from #160.

I have not ingested any of the earlier data from 2023. I expect there will need to be updates for the regexes and definitely updates to the VACCINE_MAPPING for ingesting earlier data.
