Skip to content

Fix bugs and clean up technical debt#69

Merged
mihiarc merged 3 commits intomainfrom
dev/critical-review
Feb 7, 2026
Merged

Fix bugs and clean up technical debt#69
mihiarc merged 3 commits intomainfrom
dev/critical-review

Conversation

@mihiarc
Copy link
Owner

@mihiarc mihiarc commented Feb 6, 2026

Summary

Commit 1: Critical review bug fixes and technical debt

  • Fix version mismatch: __init__.py was 1.1.0b7, now matches pyproject.toml (1.2.3)
  • Fix FIA.__exit__ no-op: Added close() method to properly disconnect DB backend on context manager exit
  • Remove pandas dependency: Removed from both core and optional deps — zero pandas imports exist in src/pyfia/
  • Fix hardcoded year 2023: format_output now uses _extract_evaluation_year() instead of config.get("year", 2023)
  • Preserve AREA_TOTAL column: Was dropped in compute_per_acre_values despite being documented as output
  • Fix lru_cache memory leak: Replaced @lru_cache on instance method get_stratification_data with manual instance-level cache
  • Migrate carbon_pools to AggregationResult pattern: Eliminated last estimator using old instance-state pattern; removed dead code
  • Document variance formula: Added algebraic identity explaining equivalence of W_h²×n_h and textbook N_h²/n_h forms

Commit 2: Area variance fix, deprecated code removal, example updates

  • Fix Issue Variance underestimated for rare categories when using grp_by #68: Area variance severely underestimated for rare categories when using grp_by. Now cross-joins all stratum plots with group values so non-matching plots contribute y=0
  • Fix LazyFrame/DataFrame concat type mismatch in batch_query_by_values
  • Fix EVALIDator client using incorrect JSONDecodeError for empty responses
  • Pass area_domain to get_cond_columns in all estimators so domain filter columns are loaded
  • Remove deprecated assign_forest_type_group (use add_forest_type_group)
  • Remove 5 outdated example scripts, modernize 3 remaining ones
  • Add descriptive aliases for EVALIDator estimate type codes
  • Add 4 tests for rare category variance (Issue Variance underestimated for rare categories when using grp_by #68)

Test plan

  • All 677 unit tests pass (21 skipped — optional spatial deps)
  • ruff check and ruff format pass
  • Run validation tests against EVALIDator to confirm estimates unchanged
  • Verify FIA context manager properly disconnects with close()
  • Test area variance with grouped estimates against EVALIDator SE values

- Fix version mismatch: __init__.py now matches pyproject.toml (1.2.3)
- Fix FIA.__exit__ no-op: added close() to properly disconnect DB backend
- Remove pandas dependency: not used anywhere in src/pyfia/
- Fix hardcoded year 2023 default in format_output
- Preserve AREA_TOTAL column in aggregation output (was dropped prematurely)
- Replace lru_cache on instance method with manual cache to prevent GC leak
- Migrate carbon_pools to AggregationResult pattern, remove dead code
- Add algebraic identity documentation to variance formula
- Use NotImplementedError for aggregate_results (not abstract) to support
  estimators like AreaChangeEstimator that override estimate() entirely

All 677 unit tests pass.
…mples

Bug fixes:
- Fix Issue #68: Area variance severely underestimated for rare categories
  when using grp_by. Now cross-joins all stratum plots with group values
  so non-matching plots contribute y=0 to variance calculation.
- Fix LazyFrame/DataFrame type mismatch in batch_query_by_values concat
- Fix EVALIDator client using incorrect JSONDecodeError for empty responses
- Pass area_domain to get_cond_columns in all estimators so domain filter
  columns are loaded from the database
- Update area_change calculate_variance signature for AggregationResult

Cleanup:
- Remove deprecated assign_forest_type_group (use add_forest_type_group)
- Remove 5 outdated example scripts, modernize 3 remaining ones
- Add descriptive aliases for EVALIDator estimate type codes
- Fix data_reader type annotation for MotherDuck path

Tests:
- Add 4 tests for rare category variance (Issue #68)
- Update classification tests to use add_forest_type_group directly
- Update reference table download test for bundled zip handling
- Improve biomass validation skip reason documentation
The base class was fixed to use _extract_evaluation_year() but the area
estimator overrides format_output and still had pl.lit(2023). Now uses
the same pattern as all other estimators.
@mihiarc mihiarc merged commit 6e7ad9c into main Feb 7, 2026
3 checks passed
@mihiarc mihiarc deleted the dev/critical-review branch February 7, 2026 11:40
mihiarc added a commit that referenced this pull request Feb 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant