Abstract table names #225
Conversation
Allows us to change where we read/write data from in a single place, so we can more easily switch between environments.
…prevent circular dependency
Pull Request Overview
This PR introduces a centralized configuration system for managing database table names and file paths through a new table_names.py module. The changes replace hard-coded table names and file paths throughout the codebase with references to a configurable TableNames dataclass, making the system more maintainable and environment-aware.
Key changes:
- Creates a new `table_names.py` module with a `TableNames` dataclass and environment-based configuration
- Refactors all files to import and use `table_names` instead of hard-coded strings
- Updates Databricks workflow configurations to remove redundant path parameters
- Standardizes function signatures by removing default parameters and using explicit Spark session passing
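As a rough illustration, a centralized module of this shape could look like the following. The field names, environment variable, and catalog/schema values are assumptions for the sketch; the actual contents of `table_names.py` may differ.

```python
# Hypothetical sketch of a centralized table-name module; the real
# table_names.py likely defines different fields and environments.
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TableNames:
    """Single place to configure where tables are read from / written to."""

    catalog: str
    schema: str

    @property
    def apc(self) -> str:
        # Fully qualified table name, e.g. "nhp.dev.apc" (illustrative).
        return f"{self.catalog}.{self.schema}.apc"


def _from_environment() -> TableNames:
    # Choose the schema once, from an (assumed) environment variable,
    # instead of hard-coding it at every call site.
    env = os.environ.get("NHP_ENV", "dev")
    return TableNames(catalog="nhp", schema=env)


# Module-level instance that the rest of the codebase imports.
table_names = _from_environment()
```

Importers would then write `from nhp.data.table_names import table_names` and reference attributes such as `table_names.apc`, so switching environments requires no changes at call sites.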
Reviewed Changes
Copilot reviewed 68 out of 69 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| src/nhp/data/table_names.py | New centralized configuration module defining all table names and paths |
| src/nhp/data/reference/*.py | Updated to use table_names for reference data tables and paths |
| src/nhp/data/raw_data/*.py | Updated to use table_names for raw data tables |
| src/nhp/data/nhp_datasets/*.py | Updated to use table_names for HES dataset references |
| src/nhp/data/population_projections/*.py | Updated to use table_names for population projection paths and tables |
| src/nhp/data/model_data/*.py | Updated to use table_names and standardize function signatures |
| src/nhp/data/inputs_data/*.py | Updated to use table_names and refactor main functions |
| src/nhp/data/aggregated_data/*.py | Updated to use table_names for aggregated data tables |
| src/nhp/data/get_spark.py | Simplified to remove catalog/schema parameters |
| databricks_workflows/*.yaml | Removed redundant path parameters now handled by table_names |
StatsRhian left a comment
Tom and I chatted through the changes in a code walk. Nice to get this stuff abstracted.
Previously, all table names and save paths were hard-coded. If the storage location of a table changed, every instance would need to be updated. This was going to prove an issue when we migrate to UDAL.

This PR abstracts the table names and file save paths into a new module, `nhp.data.table_names`, so we can more easily update names as needed. It also tidies up some potential circular dependencies and moves reference files out of more obscure locations, adding them into the code base as appropriate.
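The migration benefit the review describes can be sketched as follows. This is illustrative only: the class, method, and catalog/schema names are assumptions, not the real `nhp.data.table_names` API.

```python
# Illustrative only: a frozen config object in the spirit of TableNames;
# field names and environment values (including UDAL specifics) are assumed.
from dataclasses import dataclass


@dataclass(frozen=True)
class TableNames:
    catalog: str
    schema: str

    def table(self, name: str) -> str:
        # One place to change if storage locations move.
        return f"{self.catalog}.{self.schema}.{name}"


dev = TableNames(catalog="nhp", schema="dev")
prod = TableNames(catalog="nhp", schema="prod")

# Call sites reference the config object, so migrating to a new location
# changes only the constructed instance, not every read/write site:
print(dev.table("apc"))   # nhp.dev.apc
print(prod.table("apc"))  # nhp.prod.apc
```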