Conversation

@solresol solresol commented Jan 8, 2026

No description provided.

claude added 3 commits January 7, 2026 01:51
This commit implements several new features to analyze and visualize when
the Importance-Optimised p-adic LR model will outperform other models:

1. Extrapolation Analysis:
   - Calculate crossover points where UMLLR regression will intersect with
     NN and DT regressions
   - Use bootstrap resampling (n=1000) to calculate 95% confidence intervals
   - Estimate probability of crossover occurring

2. Dataset Growth Prediction:
   - Calculate linear growth rate for products and tags over time
   - Predict future dates when dataset will reach extrapolated crossover points
   - Include R² and growth rate statistics

3. Trajectory Charts:
   - New chart type showing model progression from oldest to newest data
   - Arrows indicate trajectory of each model type
   - Helps visualize improvement patterns over time

4. Statistical Tests:
   - Bootstrap-based confidence intervals for crossover predictions
   - P-values and R² for all regression analyses
   - Standard error estimates for prediction uncertainty

All analyses are displayed in HTML tables alongside existing regression
statistics for both "Model Performance vs Dataset Size" and
"Model Performance vs Feature Space" charts.
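The crossover and bootstrap steps described above can be sketched as follows. This is a minimal illustration assuming simple linear fits; the function names and the exact resampling scheme are assumptions, not the repository's actual `build_site.py` code:

```python
import numpy as np
from scipy import stats

def crossover_x(x1, y1, x2, y2):
    """x where the two fitted regression lines intersect.

    Returns None if either sample has fewer than two distinct x
    values (regression undefined) or the fitted slopes are parallel.
    """
    if len(np.unique(x1)) < 2 or len(np.unique(x2)) < 2:
        return None
    r1 = stats.linregress(x1, y1)
    r2 = stats.linregress(x2, y2)
    if abs(r1.slope - r2.slope) < 1e-10:
        return None
    return (r2.intercept - r1.intercept) / (r1.slope - r2.slope)

def bootstrap_crossover_ci(x1, y1, x2, y2, n_bootstrap=1000, seed=0):
    """Bootstrap 95% CI for the crossover point, plus the fraction of
    resamples in which a crossover exists at all (a rough probability
    that the lines will ever intersect)."""
    rng = np.random.default_rng(seed)
    crossings = []
    for _ in range(n_bootstrap):
        i1 = rng.choice(len(x1), size=len(x1), replace=True)
        i2 = rng.choice(len(x2), size=len(x2), replace=True)
        cx = crossover_x(x1[i1], y1[i1], x2[i2], y2[i2])
        if cx is not None:
            crossings.append(cx)
    lo, hi = np.percentile(crossings, [2.5, 97.5])
    return lo, hi, len(crossings) / n_bootstrap
```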
…PCNN and DT

The extrapolation should compare UMLLR against the unconstrained models
(UNN and ULR) that currently outperform it, not against the other p-adic
constrained models (PCNN and DT).
Now comparing UMLLR against UNN, ULR, and DT to predict when
UMLLR will outperform all three models.
@solresol solresol merged commit 6a4620e into main Jan 8, 2026
1 of 2 checks passed
@solresol solresol deleted the claude/model-performance-extrapolation-pwzRb branch January 8, 2026 00:36

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2334bfd80


Comment on lines +3029 to +3035
# Bootstrap sample for model 1
idx1 = rng.choice(len(x1), size=len(x1), replace=True)
result1 = stats.linregress(x1[idx1], y1[idx1])

# Bootstrap sample for model 2
idx2 = rng.choice(len(x2), size=len(x2), replace=True)
result2 = stats.linregress(x2[idx2], y2[idx2])


P2: Guard bootstrap regressions against identical x-values

The bootstrap loop calls stats.linregress on resampled x1[idx1]/x2[idx2] without checking variance. When there are only a few data points (e.g., 2 snapshots) or repeated num_products/num_tags values, resampling with replacement can easily produce all-identical x values, which causes linregress to raise a ValueError and abort site generation. Consider skipping samples where the resampled x has fewer than 2 unique values or catching the exception before continuing.
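The guard Codex suggests could be sketched like this. Variable names follow the snippet above, but the function wrapper is assumed, not taken from the PR:

```python
import numpy as np
from scipy import stats

def bootstrap_slopes(x, y, n_bootstrap=1000, seed=0):
    """Bootstrap regression slopes, skipping degenerate resamples.

    A resample whose x values are all identical would make
    stats.linregress raise ValueError and abort the run, so such
    samples are skipped before fitting.
    """
    rng = np.random.default_rng(seed)
    slopes = []
    for _ in range(n_bootstrap):
        idx = rng.choice(len(x), size=len(x), replace=True)
        xs = x[idx]
        if len(np.unique(xs)) < 2:
            continue  # all-identical x: regression undefined
        slopes.append(stats.linregress(xs, y[idx]).slope)
    return np.array(slopes)
```

With only a handful of snapshots (the failure mode the review describes), a noticeable fraction of resamples gets skipped, but site generation no longer crashes.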


claude bot commented Jan 8, 2026

PR Review: Analyze model performance trends and extrapolate future outcomes

Overall Assessment

This PR adds sophisticated statistical analysis capabilities to predict when the Importance-Optimised p-adic LR model will outperform other models. The implementation is mathematically sound and adds valuable analytical features, but there are several areas that need attention.

Strengths

  1. Well-structured statistical analysis: The crossover point calculation and bootstrap confidence intervals are properly implemented
  2. Good separation of concerns: Functions are modular and have clear responsibilities
  3. Comprehensive visualization: Both trajectory charts and extrapolation tables provide good insights
  4. Documentation: Functions have clear docstrings explaining their purpose

Critical Issues

1. Missing Error Handling for Division by Zero (build_site.py:2993)
While you check if abs(slope1 - slope2) < 1e-10, there's no try-except block. Recommendation: Add explicit error handling.
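A defensive version of the intersection step might look like this (illustrative only; the real code in build_site.py differs, and the tolerance value is an assumption):

```python
def safe_crossover(slope1, intercept1, slope2, intercept2, tol=1e-10):
    """Intersection x of y = slope1*x + intercept1 and y = slope2*x + intercept2.

    Returns None for (near-)parallel lines instead of dividing by a
    near-zero denominator, which would raise ZeroDivisionError or
    produce a wildly unstable estimate.
    """
    denom = slope1 - slope2
    if abs(denom) < tol:
        return None
    return (intercept2 - intercept1) / denom
```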

2. Hard-coded Bootstrap Iterations (build_site.py:3050)
The value n_bootstrap=1000 is hard-coded. Consider making this configurable.

3. No Unit Tests for New Functions - CRITICAL
The PR adds ~500 lines of complex statistical code but includes NO tests. This is a significant concern.

Functions that need tests:

  • _calculate_crossover_point()
  • _calculate_crossover_confidence()
  • _estimate_dataset_growth()
  • _predict_date_for_value()
  • _format_extrapolation_analysis_html()
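A first test for the crossover helper could look like the sketch below. The signature of `_calculate_crossover_point()` is assumed here (two slope/intercept pairs in, optional float out), so a stand-in implementation is included; the assertions would be pointed at the real helper in `build_site.py`:

```python
import math

def _calculate_crossover_point(slope1, intercept1, slope2, intercept2):
    """Stand-in for the helper in build_site.py (signature is assumed)."""
    if abs(slope1 - slope2) < 1e-10:
        return None  # parallel lines never cross
    return (intercept2 - intercept1) / (slope1 - slope2)

def test_crossover_of_known_lines():
    # y = x and y = 10 - x intersect at x = 5
    assert math.isclose(_calculate_crossover_point(1.0, 0.0, -1.0, 10.0), 5.0)

def test_parallel_lines_have_no_crossover():
    assert _calculate_crossover_point(2.0, 1.0, 2.0, 5.0) is None
```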

Code Quality Issues

4. Function Complexity (build_site.py:3159-3262)
The _format_extrapolation_analysis_html() function is 103 lines long and does too much. Break into smaller functions.

5. Inconsistent Return Types (build_site.py:7281-7283)
Returning a 6-tuple is hard to work with. Consider using a dataclass or NamedTuple.
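The suggested replacement could look like this; the field names are illustrative guesses, not the actual six return values:

```python
from typing import NamedTuple, Optional

class CrossoverResult(NamedTuple):
    """Named fields instead of an anonymous 6-tuple (names illustrative)."""
    crossover_x: Optional[float]
    ci_low: Optional[float]
    ci_high: Optional[float]
    probability: float
    slope_diff: float
    n_valid_samples: int

# Callers unpack by name instead of by position:
result = CrossoverResult(5.0, 4.2, 5.9, 0.93, -0.12, 874)
```

Because `NamedTuple` is still a tuple, existing positional unpacking keeps working while new call sites can use `result.crossover_x` instead of `result[0]`.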

6. Type Hints Missing
Complete type annotations would improve code quality.

7. Import Organization (build_site.py:3158)
The timedelta import is inside the function. Move to top of file.

Statistical Concerns

8. Assumption of Linear Trends
The analysis assumes linear regression trends will continue. Add a warning when R² is low (< 0.7).

9. No Correction for Multiple Comparisons
Multiple crossover points without adjusted confidence intervals could lead to false positives.
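The low-R² warning suggested in point 8 might be as simple as the following sketch (the threshold, function name, and message wording are all illustrative):

```python
R2_WARNING_THRESHOLD = 0.7

def extrapolation_caveat(r_squared: float) -> str:
    """Return an HTML caveat when the linear fit is too weak to trust."""
    if r_squared < R2_WARNING_THRESHOLD:
        return (f"<p class='warning'>R&sup2; = {r_squared:.2f} &lt; "
                f"{R2_WARNING_THRESHOLD}: linear extrapolation is unreliable.</p>")
    return ""
```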

Action Items

High Priority (Fix before merge):

  1. Add unit tests for all new statistical functions
  2. Add error handling for edge cases
  3. Improve type hints throughout
  4. Refactor complex functions

Medium Priority:

  5. Use NamedTuple instead of 6-tuple return values
  6. Extract repeated code to utilities
  7. Add logging for debugging
  8. Add warnings when R² is low

Conclusion

This is a solid contribution with sound statistical methods, but needs:

  • Better error handling for edge cases
  • Unit tests to ensure correctness
  • Code refactoring to improve maintainability

Recommendation: Request changes for test coverage and error handling before merging.
