Analyze model performance trends and extrapolate future outcomes #72
This commit implements several new features to analyze and visualize when
the Importance-Optimised p-adic LR model will outperform other models:
1. Extrapolation Analysis:
- Calculate crossover points where the UMLLR regression will intersect the
NN and DT regressions
- Use bootstrap resampling (n=1000) to compute 95% confidence intervals
- Estimate the probability of a crossover occurring (sketched after this list)
2. Dataset Growth Prediction:
- Calculate the linear growth rate for products and tags over time
- Predict future dates when the dataset will reach extrapolated crossover
points (see the date-prediction sketch after the closing summary below)
- Include R² and growth-rate statistics
3. Trajectory Charts:
- New chart type showing model progression from oldest to newest data
- Arrows indicate trajectory of each model type
- Helps visualize improvement patterns over time
4. Statistical Tests:
- Bootstrap-based confidence intervals for crossover predictions
- P-values and R² for all regression analyses
- Standard error estimates for prediction uncertainty
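Items 1 and 4 combine naturally into one routine: fit a line to each model's scores, solve the two fitted lines for their intersection, and bootstrap the whole procedure to obtain a confidence interval and a crossover probability. The sketch below illustrates that idea only; the name `estimate_crossover`, its signature, and the seed handling are assumptions, not the PR's actual code.

```python
import numpy as np
from scipy import stats

def estimate_crossover(x1, y1, x2, y2, n_boot=1000, seed=0):
    """Estimate where two fitted lines cross, with a bootstrap 95% CI."""
    x1, y1, x2, y2 = map(np.asarray, (x1, y1, x2, y2))
    fit1 = stats.linregress(x1, y1)
    fit2 = stats.linregress(x2, y2)
    # Lines y = a1*x + b1 and y = a2*x + b2 cross at x = (b2 - b1) / (a1 - a2).
    point = (fit2.intercept - fit1.intercept) / (fit1.slope - fit2.slope)

    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_boot):
        i1 = rng.choice(len(x1), size=len(x1), replace=True)
        i2 = rng.choice(len(x2), size=len(x2), replace=True)
        # Skip degenerate resamples: identical x values break linregress.
        if np.unique(x1[i1]).size < 2 or np.unique(x2[i2]).size < 2:
            continue
        f1 = stats.linregress(x1[i1], y1[i1])
        f2 = stats.linregress(x2[i2], y2[i2])
        if np.isclose(f1.slope, f2.slope):
            continue  # near-parallel resample: no crossover to record
        samples.append((f2.intercept - f1.intercept) / (f1.slope - f2.slope))

    samples = np.array(samples)
    lo, hi = np.percentile(samples, [2.5, 97.5])
    # Fraction of resampled crossovers beyond the observed data range gives
    # a rough probability that the crossover is still ahead.
    p_future = float(np.mean(samples > max(x1.max(), x2.max())))
    return point, (lo, hi), p_future
```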
All analyses are displayed in HTML tables alongside existing regression
statistics for both "Model Performance vs Dataset Size" and
"Model Performance vs Feature Space" charts.
…PCNN and DT

The extrapolation should compare UMLLR against the unconstrained models (UNN and ULR) that currently outperform it, not against the other p-adic constrained models (PCNN and DT).
Now comparing UMLLR against UNN, ULR, and DT to predict when UMLLR will outperform all three models.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c2334bfd80
```python
from scipy import stats

# Excerpt from the bootstrap loop; rng (a numpy Generator) and the
# x1/y1/x2/y2 arrays are defined earlier in build_site.py.

# Bootstrap sample for model 1
idx1 = rng.choice(len(x1), size=len(x1), replace=True)
result1 = stats.linregress(x1[idx1], y1[idx1])

# Bootstrap sample for model 2
idx2 = rng.choice(len(x2), size=len(x2), replace=True)
result2 = stats.linregress(x2[idx2], y2[idx2])
```
Guard bootstrap regressions against identical x-values
The bootstrap loop calls stats.linregress on resampled x1[idx1]/x2[idx2] without checking variance. When there are only a few data points (e.g., 2 snapshots) or repeated num_products/num_tags values, resampling with replacement can easily produce all-identical x values, which causes linregress to raise a ValueError and abort site generation. Consider skipping samples where the resampled x has fewer than 2 unique values or catching the exception before continuing.
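A minimal sketch of the suggested guard, reusing the variable names from the excerpt above; the loop variable `n_bootstrap` and the exact placement in build_site.py are assumptions.

```python
import numpy as np
from scipy import stats

for _ in range(n_bootstrap):
    # Bootstrap sample for model 1; skip the resample rather than let
    # linregress raise ValueError on all-identical x values.
    idx1 = rng.choice(len(x1), size=len(x1), replace=True)
    if np.unique(x1[idx1]).size < 2:
        continue
    result1 = stats.linregress(x1[idx1], y1[idx1])

    # Bootstrap sample for model 2, with the same guard.
    idx2 = rng.choice(len(x2), size=len(x2), replace=True)
    if np.unique(x2[idx2]).size < 2:
        continue
    result2 = stats.linregress(x2[idx2], y2[idx2])
```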
PR Review: Analyze model performance trends and extrapolate future outcomes

Overall Assessment

This PR adds sophisticated statistical analysis capabilities to predict when the Importance-Optimised p-adic LR model will outperform other models. The implementation is mathematically sound and adds valuable analytical features, but there are several areas that need attention.

Strengths

Critical Issues

1. Missing Error Handling for Division by Zero (build_site.py:2993)
2. Hard-coded Bootstrap Iterations (build_site.py:3050)
3. No Unit Tests for New Functions - CRITICAL. Functions that need tests:

Code Quality Issues

4. Function Complexity (build_site.py:3159-3262)
5. Inconsistent Return Types (build_site.py:7281-7283)
6. Type Hints Missing
7. Import Organization (build_site.py:3158)

Statistical Concerns

8. Assumption of Linear Trends
9. No Correction for Multiple Comparisons

Action Items

High Priority (Fix before merge):

Medium Priority:

Conclusion

This is a solid contribution with sound statistical methods, but needs:

Recommendation: Request changes for test coverage and error handling before merging.
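As a sketch of how review items 1 and 2 might be addressed, assuming the crossover calculation divides by a slope difference as in the standard two-line intersection formula; the constant and helper names below are hypothetical, not the PR's code.

```python
import numpy as np

BOOTSTRAP_ITERATIONS = 1000  # hoisted out of the loop into a named constant

def crossover_x(fit1, fit2, atol=1e-12):
    """x where two linregress fits intersect, or None if (near) parallel."""
    slope_diff = fit1.slope - fit2.slope
    if np.isclose(slope_diff, 0.0, atol=atol):
        return None  # near-identical slopes: the crossover is undefined
    return (fit2.intercept - fit1.intercept) / slope_diff
```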