Skip to content

Conversation

@tomjemmett
Copy link
Member

previously, all table names and save paths were hard-coded in. If we were to change where the location of a table was stored all instances would need to be updated.

This was going to prove to be an issue when we migrate to UDAL.

This PR abstracts the table names/file save paths into a new module, nhp.data.table_names. That way we can more easily update names as needed.

Also tidied up some potential circular dependencies and moved reference files out of more obscure locations and adds them into the code base as appropriate.

Copilot AI review requested due to automatic review settings October 29, 2025 13:16
@tomjemmett tomjemmett requested a review from a team as a code owner October 29, 2025 13:16
@tomjemmett tomjemmett self-assigned this Oct 29, 2025
@tomjemmett tomjemmett added the enhancement New feature or request label Oct 29, 2025
@tomjemmett tomjemmett added this to the v4.3.0 milestone Oct 29, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a centralized configuration system for managing database table names and file paths through a new table_names.py module. The changes replace hard-coded table names and file paths throughout the codebase with references to a configurable TableNames dataclass, making the system more maintainable and environment-aware.

Key changes:

  • Creates a new table_names.py module with a TableNames dataclass and environment-based configuration
  • Refactors all files to import and use table_names instead of hard-coded strings
  • Updates Databricks workflow configurations to remove redundant path parameters
  • Standardizes function signatures by removing default parameters and using explicit spark session passing

Reviewed Changes

Copilot reviewed 68 out of 69 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
src/nhp/data/table_names.py New centralized configuration module defining all table names and paths
src/nhp/data/reference/*.py Updated to use table_names for reference data tables and paths
src/nhp/data/raw_data/*.py Updated to use table_names for raw data tables
src/nhp/data/nhp_datasets/*.py Updated to use table_names for HES dataset references
src/nhp/data/population_projections/*.py Updated to use table_names for population projection paths and tables
src/nhp/data/model_data/*.py Updated to use table_names and standardize function signatures
src/nhp/data/inputs_data/*.py Updated to use table_names and refactor main functions
src/nhp/data/aggregated_data/*.py Updated to use table_names for aggregated data tables
src/nhp/data/get_spark.py Simplified to remove catalog/schema parameters
databricks_workflows/*.yaml Removed redundant path parameters now handled by table_names

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@StatsRhian StatsRhian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tom and I chatted through the changes in a code walk. Nice to get this stuff abstracted.

@tomjemmett tomjemmett merged commit de4d41c into main Oct 30, 2025
3 checks passed
@tomjemmett tomjemmett deleted the abstract-table-names branch October 30, 2025 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants