Implement ratio-of-means variance for per-acre estimates#70
Merged
Conversation
Replace simplified variance formula V = Σ_h EXPNS² × s²_yh × n_h with the exact Bechtold & Patterson (2005) formula that includes the V2 post-stratification correction term, matching rFIA's unitVar(): V_EU = (A²/n) × Σ_h W_h × s²_yh/n_h + (A²/n²) × Σ_h (1-W_h) × s²_yh/n_h This corrects a 1-3% discrepancy with EVALIDator that grows when stratum allocation is non-proportional (W_h ≠ n_h/n). - Load POP_ESTN_UNIT and compute STRATUM_WGT in data_loading.py - Rewrite both variance functions with exact V1+V2 per estimation unit - Propagate B&P columns through all_plots selection in base.py/grm_base.py - Add 8 new tests including hand-calculated values and V2 > 0 verification - Falls back to simplified formula when B&P columns are absent
Replace the simplified se_acre = se_total / total_area formula with the correct ratio-of-means variance from B&P (2005) Section 4.2: V(R) = (1/X²) × [V(Y) + R²×V(X) - 2R×Cov(Y,X)] The old formula treated the denominator (total area X) as a known constant, ignoring the covariance between Y and X. Since they are estimated from the same plots and positively correlated, the old formula overestimated per-acre SE. Changes: - Add calculate_ratio_of_means_variance() with exact B&P and simplified fallback paths to variance.py - Extend grouped variance functions to compute V(X) and Cov(Y,X) alongside V(Y) for ratio variance in se_acre - Update callers in base.py, grm_base.py, carbon_pools.py - Fix carbon_pools.py x_i=1.0 bug (now uses sum(CONDPROP_UNADJ)) - Add 17 unit tests including hand-calculated verification, backward compatibility, and non-negativity property tests
Prevents inf/NaN propagation in downstream variance calculations if a POP_ESTN_UNIT record has null or zero P1PNTCNT_EU.
The exact B&P post-stratified variance formula uses s²_h directly: V = (A²/n) × Σ W_h × s²_h + (A²/n²) × Σ (1-W_h) × s²_h The previous implementation incorrectly divided s²_h by n_h_design (P2POINTCNT), which underestimated variance by a factor of ~n_h (typically 15-100x). This caused SE values to be far below EVALIDator benchmarks for TPA, volume, biomass, and GRM estimators. Fix applied to both _calculate_exact_bp_variance (domain total) and _calculate_exact_bp_ratio_variance (ratio-of-means). Unit test hand-calculations updated to match corrected formula. All 703 unit tests and EVALIDator validation tests now pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
V(R) = (1/X²) × [V(Y) + R²×V(X) - 2R×Cov(Y,X)]pl.lit(1.0)instead ofsum(CONDPROP_UNADJ)per plot, making covariance meaninglessThe old formula
se_acre = se_total / total_areatreated the denominator as a known constant, ignoring covariance between Y (tree attribute) and X (area). Since they're estimated from the same sample plots and positively correlated, the old formula overestimated per-acre SE.Files changed
variance.pycalculate_ratio_of_means_variance()+ exact B&P and simplified helpers; extend grouped functions to compute V(X), Cov(Y,X)base.py_calculate_grouped_multi_metric_variance()and_calculate_overall_multi_metric_variance()grm_base.py_calculate_grm_variance()grouped and ungrouped pathscarbon_pools.pytest_variance_formulas.pyTestRatioOfMeansVarianceclass with 17 testsTest plan
uv run pytest tests/unit/test_variance_formulas.py -v— 76/76 pass (59 existing + 17 new)uv run pytest tests/unit/ -v— 703/703 pass, 0 regressionsuv run pytest tests/validation/ -v— EVALIDator comparison (SE values should improve slightly)