@tomjemmett
Member

Largely follows what is in the nhp_model repo's notebook, but adds synthetic data generation to the data extraction pipeline.

Rather than extracting to the `dev` folder, it extracts to `synth`.

Closes The-Strategy-Unit/nhp_model#246

@tomjemmett self-assigned this Oct 1, 2025
@tomjemmett requested a review from a team as a code owner October 1, 2025 12:03
@tomjemmett added the enhancement and priority: could labels Oct 1, 2025
@tomjemmett marked this pull request as draft October 1, 2025 12:05
@tomjemmett
Member Author

tomjemmett commented Oct 1, 2025

Need to handle the TODO before this is ready for review. We currently rename two HRGs to HRG1/HRG2; this is needed purely because that is how they are labelled in the sample params.

Options:

  1. change the sample params to pick a valid HRG
  2. remap all HRGs to HRG1..N
  3. push the remap list from the ip step into the object so it can be picked up by the inequalities step (though this forces the steps to run in a specific order)
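Option 2 amounts to a deterministic lookup from real HRG codes to placeholder labels. A minimal Python sketch, assuming the HRGs are available as a plain list of strings; `build_hrg_mapping`, `remap_hrgs` and the example codes are hypothetical and not taken from the pipeline code:

```python
from collections.abc import Iterable


def build_hrg_mapping(hrgs: Iterable[str]) -> dict[str, str]:
    """Map each distinct HRG code to a placeholder label HRG1..HRGN (hypothetical helper).

    The distinct codes are sorted first, so the same set of codes always
    produces the same labels, regardless of input order or duplicates.
    """
    unique_codes = sorted(set(hrgs))
    return {code: f"HRG{i}" for i, code in enumerate(unique_codes, start=1)}


def remap_hrgs(hrgs: list[str], mapping: dict[str, str]) -> list[str]:
    """Replace each HRG code with its placeholder label (raises KeyError if unmapped)."""
    return [mapping[code] for code in hrgs]


if __name__ == "__main__":
    # Illustrative codes only, not real HRGs.
    mapping = build_hrg_mapping(["YY02B", "XX01A", "YY02B"])
    print(mapping)  # {'XX01A': 'HRG1', 'YY02B': 'HRG2'}
    print(remap_hrgs(["YY02B", "XX01A"], mapping))  # ['HRG2', 'HRG1']
```

Sorting the distinct codes before numbering keeps the labels reproducible between runs. Any step that refers to a remapped HRG (such as the inequalities step) would still need to see the same mapping, which is the coupling option 3 describes.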

@tomjemmett force-pushed the add_synthetic_data_generation branch from 0dcfee8 to 6a51c0a October 1, 2025 16:05
@tomjemmett marked this pull request as ready for review October 2, 2025 09:18
yiwen-h previously approved these changes Oct 2, 2025
Member

@yiwen-h left a comment

Awesome! Thank you @tomjemmett - makes loads of sense.

I wonder what's the best way of documenting this for developers? Add it to the README or a repo wiki page?

@tomjemmett merged commit f4700bb into main Oct 3, 2025
3 checks passed
@tomjemmett deleted the add_synthetic_data_generation branch October 3, 2025 09:04

Development

Successfully merging this pull request may close these issues.

synthetic data generation notebook is not reproducible

3 participants