
Restore string enum labels in UKSingleYearDataset.person/benunit/household #1498

Merged: nwoodruff-co merged 4 commits into main from fix/decoded-enum-columns-on-dataset on Feb 19, 2026


nwoodruff-co commented Feb 19, 2026

Summary

  • After #1497 (Optimise Simulation() init: ~10x speedup on warm loads), enum columns in dataset DataFrames are stored internally as int16 to speed up the warm-load cache. This broke sim.dataset[year].person for callers expecting string labels such as 'MALE' or 'FT_EMPLOYED'
  • Converts person, benunit and household on UKSingleYearDataset from plain attributes into properties that decode int16 enum columns back to strings on access
  • Raw int16 data lives in the private _person/_benunit/_household attributes; .tables (used by the simulation engine) still points at these raw tables, so there is no performance regression
  • _pre_encode_enum_columns in simulation.py now writes to _<table> directly
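The property pattern in the summary above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: SingleYearDatasetSketch, ENUM_COLUMNS, the Gender enum, decode_enum_columns and pre_encode_enum_columns are hypothetical names; the int16 codes are assumed to follow enum member order (MALE -> 0); and .tables is modelled as a plain dict of the raw tables.

```python
from enum import Enum

import numpy as np
import pandas as pd


class Gender(Enum):
    # Hypothetical enum: the int16 code is assumed to be the member's position.
    MALE = "MALE"
    FEMALE = "FEMALE"


# Hypothetical registry mapping column names to the Enum used to decode them.
ENUM_COLUMNS = {"gender": Gender}


def decode_enum_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with int16 enum codes replaced by member names."""
    decoded = df.copy()
    for column, enum_cls in ENUM_COLUMNS.items():
        if column in decoded.columns and decoded[column].dtype == "int16":
            names = np.array([member.name for member in enum_cls], dtype=object)
            # Vectorised lookup: each int16 code indexes into the names array.
            decoded[column] = names[decoded[column].to_numpy()]
    return decoded


class SingleYearDatasetSketch:
    """Hypothetical stand-in for UKSingleYearDataset, not the real class."""

    def __init__(self, person, benunit, household):
        # Raw tables live in private attributes and keep their int16 codes.
        self._person = person
        self._benunit = benunit
        self._household = household

    @property
    def tables(self):
        # Engine-facing access: the raw tables themselves, no decoding cost.
        return {
            "person": self._person,
            "benunit": self._benunit,
            "household": self._household,
        }

    # Caller-facing access: decoded copies, rebuilt on every read.
    @property
    def person(self):
        return decode_enum_columns(self._person)

    @property
    def benunit(self):
        return decode_enum_columns(self._benunit)

    @property
    def household(self):
        return decode_enum_columns(self._household)


def pre_encode_enum_columns(dataset):
    """Hypothetical analogue of _pre_encode_enum_columns: encode _<table> in place."""
    for table_name in ("person", "benunit", "household"):
        table = getattr(dataset, f"_{table_name}")
        for column, enum_cls in ENUM_COLUMNS.items():
            if column in table.columns and table[column].dtype != "int16":
                codes = {member.name: i for i, member in enumerate(enum_cls)}
                table[column] = table[column].map(codes).astype("int16")


person = pd.DataFrame({"gender": ["MALE", "FEMALE", "MALE"]})
dataset = SingleYearDatasetSketch(person, pd.DataFrame(), pd.DataFrame())
pre_encode_enum_columns(dataset)
print(dataset._person["gender"].dtype)  # int16
print(list(dataset.person["gender"]))  # ['MALE', 'FEMALE', 'MALE']
```

In this sketch person has no setter, so an encoder has to write to _person: assigning to dataset.person raises AttributeError, and mutating the DataFrame returned by dataset.person only changes a throwaway copy. The trade-off is that every read of .person re-decodes the table, which is why the engine-facing path goes through .tables instead.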

Test plan

  • sim.dataset[year].person['gender'].values returns ['MALE', 'FEMALE', ...] not integers
  • sim.dataset[year]._person['gender'].dtype is int16
  • Warm load time unchanged (~0.6s)
  • Existing core tests pass

…m labels

After the warm-load caching optimisation (#1497), enum columns in dataset
DataFrames are stored as int16 for performance. Add decoded_person,
decoded_benunit and decoded_household properties to UKSingleYearDataset that
return copies of the DataFrames with enum columns decoded back to their string
names (e.g. 0 -> 'MALE'). The internal int16 representation is unchanged so
simulation performance is unaffected.
…ings

Replace plain DataFrame attributes with properties that decode int16 enum
columns back to string names on access. The raw int16 data is stored in
private _person/_benunit/_household attributes and accessed via .tables by
the simulation engine, preserving the warm-load performance gain from #1497.

Callers (e.g. sim.dataset[year].person) now see string labels as before.
nwoodruff-co changed the title from "Restore string enum labels via decoded_* properties on UKSingleYearDataset" to "Restore string enum labels in UKSingleYearDataset.person/benunit/household" on Feb 19, 2026
nwoodruff-co merged commit dd61183 into main on Feb 19, 2026 (2 checks passed)
nwoodruff-co deleted the fix/decoded-enum-columns-on-dataset branch on February 19, 2026 at 14:45