Conversation

@solresol solresol commented Jan 8, 2026

No description provided.

claude added 3 commits January 7, 2026 01:51
This commit implements several new features to analyze and visualize when
the Importance-Optimised p-adic LR model will outperform other models:

1. Extrapolation Analysis:
   - Calculate crossover points where UMLLR regression will intersect with
     NN and DT regressions
   - Use bootstrap resampling (n=1000) to calculate 95% confidence intervals
   - Estimate probability of crossover occurring

2. Dataset Growth Prediction:
   - Calculate linear growth rate for products and tags over time
   - Predict future dates when dataset will reach extrapolated crossover points
   - Include R² and growth rate statistics

3. Trajectory Charts:
   - New chart type showing model progression from oldest to newest data
   - Arrows indicate trajectory of each model type
   - Helps visualize improvement patterns over time

4. Statistical Tests:
   - Bootstrap-based confidence intervals for crossover predictions
   - P-values and R² for all regression analyses
   - Standard error estimates for prediction uncertainty

All analyses are displayed in HTML tables alongside existing regression
statistics for both "Model Performance vs Dataset Size" and
"Model Performance vs Feature Space" charts.
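The crossover and bootstrap steps described above can be sketched as follows. This is a minimal illustration assuming simple linear fits; the function names and the exact resampling scheme are assumptions, not the repository's actual `build_site.py` code:

```python
import numpy as np
from scipy import stats

def crossover_x(x1, y1, x2, y2):
    """x where the two fitted regression lines intersect.

    Returns None if either sample has fewer than two distinct x
    values (regression undefined) or the fitted slopes are parallel.
    """
    if len(np.unique(x1)) < 2 or len(np.unique(x2)) < 2:
        return None
    r1 = stats.linregress(x1, y1)
    r2 = stats.linregress(x2, y2)
    if abs(r1.slope - r2.slope) < 1e-10:
        return None
    return (r2.intercept - r1.intercept) / (r1.slope - r2.slope)

def bootstrap_crossover_ci(x1, y1, x2, y2, n_bootstrap=1000, seed=0):
    """Bootstrap 95% CI for the crossover point, plus the fraction of
    resamples in which a crossover exists at all (a rough probability
    that the lines will ever intersect)."""
    rng = np.random.default_rng(seed)
    crossings = []
    for _ in range(n_bootstrap):
        i1 = rng.choice(len(x1), size=len(x1), replace=True)
        i2 = rng.choice(len(x2), size=len(x2), replace=True)
        cx = crossover_x(x1[i1], y1[i1], x2[i2], y2[i2])
        if cx is not None:
            crossings.append(cx)
    lo, hi = np.percentile(crossings, [2.5, 97.5])
    return lo, hi, len(crossings) / n_bootstrap
```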
…PCNN and DT

The extrapolation should compare UMLLR against the unconstrained models
(UNN and ULR) that currently outperform it, not against the other p-adic
constrained models (PCNN and DT).
Now comparing UMLLR against UNN, ULR, and DT to predict when
UMLLR will outperform all three models.
@solresol solresol merged commit 6a4620e into main Jan 8, 2026
1 of 2 checks passed
@solresol solresol deleted the claude/model-performance-extrapolation-pwzRb branch January 8, 2026 00:36

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2334bfd80


Comment on lines +3029 to +3035
# Bootstrap sample for model 1
idx1 = rng.choice(len(x1), size=len(x1), replace=True)
result1 = stats.linregress(x1[idx1], y1[idx1])

# Bootstrap sample for model 2
idx2 = rng.choice(len(x2), size=len(x2), replace=True)
result2 = stats.linregress(x2[idx2], y2[idx2])


P2: Guard bootstrap regressions against identical x-values

The bootstrap loop calls stats.linregress on resampled x1[idx1]/x2[idx2] without checking variance. When there are only a few data points (e.g., 2 snapshots) or repeated num_products/num_tags values, resampling with replacement can easily produce all-identical x values, which causes linregress to raise a ValueError and abort site generation. Consider skipping samples where the resampled x has fewer than 2 unique values or catching the exception before continuing.
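The guard Codex suggests could be sketched like this. Variable names follow the snippet above, but the function wrapper is assumed, not taken from the PR:

```python
import numpy as np
from scipy import stats

def bootstrap_slopes(x, y, n_bootstrap=1000, seed=0):
    """Bootstrap regression slopes, skipping degenerate resamples.

    A resample whose x values are all identical would make
    stats.linregress raise ValueError and abort the run, so such
    samples are skipped before fitting.
    """
    rng = np.random.default_rng(seed)
    slopes = []
    for _ in range(n_bootstrap):
        idx = rng.choice(len(x), size=len(x), replace=True)
        xs = x[idx]
        if len(np.unique(xs)) < 2:
            continue  # all-identical x: regression undefined
        slopes.append(stats.linregress(xs, y[idx]).slope)
    return np.array(slopes)
```

With only a handful of snapshots (the failure mode the review describes), a noticeable fraction of resamples gets skipped, but site generation no longer crashes.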


claude bot commented Jan 8, 2026

PR Review: Analyze model performance trends and extrapolate future outcomes

Overall Assessment

This PR adds sophisticated statistical analysis capabilities to predict when the Importance-Optimised p-adic LR model will outperform other models. The implementation is mathematically sound and adds valuable analytical features, but there are several areas that need attention.

Strengths

  1. Well-structured statistical analysis: The crossover point calculation and bootstrap confidence intervals are properly implemented
  2. Good separation of concerns: Functions are modular and have clear responsibilities
  3. Comprehensive visualization: Both trajectory charts and extrapolation tables provide good insights
  4. Documentation: Functions have clear docstrings explaining their purpose

Critical Issues

1. Missing Error Handling for Division by Zero (build_site.py:2993)
While you check if abs(slope1 - slope2) < 1e-10, there's no try-except block. Recommendation: Add explicit error handling.
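A defensive version of the intersection step might look like this (illustrative only; the real code in build_site.py differs, and the tolerance value is an assumption):

```python
def safe_crossover(slope1, intercept1, slope2, intercept2, tol=1e-10):
    """Intersection x of y = slope1*x + intercept1 and y = slope2*x + intercept2.

    Returns None for (near-)parallel lines instead of dividing by a
    near-zero denominator, which would raise ZeroDivisionError or
    produce a wildly unstable estimate.
    """
    denom = slope1 - slope2
    if abs(denom) < tol:
        return None
    return (intercept2 - intercept1) / denom
```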

2. Hard-coded Bootstrap Iterations (build_site.py:3050)
The value n_bootstrap=1000 is hard-coded. Consider making this configurable.

3. No Unit Tests for New Functions - CRITICAL
The PR adds ~500 lines of complex statistical code but includes NO tests. This is a significant concern.

Functions that need tests:

  • _calculate_crossover_point()
  • _calculate_crossover_confidence()
  • _estimate_dataset_growth()
  • _predict_date_for_value()
  • _format_extrapolation_analysis_html()
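A first test for the crossover helper could look like the sketch below. The signature of `_calculate_crossover_point()` is assumed here (two slope/intercept pairs in, optional float out), so a stand-in implementation is included; the assertions would be pointed at the real helper in `build_site.py`:

```python
import math

def _calculate_crossover_point(slope1, intercept1, slope2, intercept2):
    """Stand-in for the helper in build_site.py (signature is assumed)."""
    if abs(slope1 - slope2) < 1e-10:
        return None  # parallel lines never cross
    return (intercept2 - intercept1) / (slope1 - slope2)

def test_crossover_of_known_lines():
    # y = x and y = 10 - x intersect at x = 5
    assert math.isclose(_calculate_crossover_point(1.0, 0.0, -1.0, 10.0), 5.0)

def test_parallel_lines_have_no_crossover():
    assert _calculate_crossover_point(2.0, 1.0, 2.0, 5.0) is None
```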

Code Quality Issues

4. Function Complexity (build_site.py:3159-3262)
The _format_extrapolation_analysis_html() function is 103 lines long and does too much. Break into smaller functions.

5. Inconsistent Return Types (build_site.py:7281-7283)
Returning a 6-tuple is hard to work with. Consider using a dataclass or NamedTuple.
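The suggested replacement could look like this; the field names are illustrative guesses, not the actual six return values:

```python
from typing import NamedTuple, Optional

class CrossoverResult(NamedTuple):
    """Named fields instead of an anonymous 6-tuple (names illustrative)."""
    crossover_x: Optional[float]
    ci_low: Optional[float]
    ci_high: Optional[float]
    probability: float
    slope_diff: float
    n_valid_samples: int

# Callers unpack by name instead of by position:
result = CrossoverResult(5.0, 4.2, 5.9, 0.93, -0.12, 874)
```

Because `NamedTuple` is still a tuple, existing positional unpacking keeps working while new call sites can use `result.crossover_x` instead of `result[0]`.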

6. Type Hints Missing
Complete type annotations would improve code quality.

7. Import Organization (build_site.py:3158)
The timedelta import is inside the function. Move to top of file.

Statistical Concerns

8. Assumption of Linear Trends
The analysis assumes linear regression trends will continue. Add a warning when R² is low (< 0.7).

9. No Correction for Multiple Comparisons
Multiple crossover points without adjusted confidence intervals could lead to false positives.
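The low-R² warning suggested in point 8 might be as simple as the following sketch (the threshold, function name, and message wording are all illustrative):

```python
R2_WARNING_THRESHOLD = 0.7

def extrapolation_caveat(r_squared: float) -> str:
    """Return an HTML caveat when the linear fit is too weak to trust."""
    if r_squared < R2_WARNING_THRESHOLD:
        return (f"<p class='warning'>R&sup2; = {r_squared:.2f} &lt; "
                f"{R2_WARNING_THRESHOLD}: linear extrapolation is unreliable.</p>")
    return ""
```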

Action Items

High Priority (Fix before merge):

  1. Add unit tests for all new statistical functions
  2. Add error handling for edge cases
  3. Improve type hints throughout
  4. Refactor complex functions

Medium Priority:

  5. Use NamedTuple instead of 6-tuple return values
  6. Extract repeated code to utilities
  7. Add logging for debugging
  8. Add warnings when R² is low

Conclusion

This is a solid contribution with sound statistical methods, but needs:

  • Better error handling for edge cases
  • Unit tests to ensure correctness
  • Code refactoring to improve maintainability

Recommendation: Request changes for test coverage and error handling before merging.
