Fix feature alignment when applying XGBoost/HistGradientBoosting weights without specifying ss_main_score #167
base: master
Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
@copilot can you add a test for this using the https://github.com/PyProphet/pyprophet/tree/master/tests/data/test_data.osw test file? You can test and then apply the weights with
You can capture the stdout of the importance of var XX. They should be the same.
@jcharkow can you test if this works for you?
Not really, the only score that seems to work as intended is the
Command Run:
Apply scores run (same file for testing):
Furthermore, if I exclude any of the other parameters I get errors.
I am curious whether an easier solution would be to make --apply_weights its own command, with no parameters like --ss_... or --classifier or --level, so that all of these are inferred from the --apply_weights file.
Problem
When training an XGBoost or HistGradientBoosting model with a specific --ss_main_score parameter and then applying the trained weights without specifying the same parameter, features become misaligned, causing incorrect scoring results.

Example of the issue:

# Step 1: Train with specific main score
pyprophet score --in data.osw --level=ms1ms2 --classifier=XGBoost --ss_main_score=var_dotprod_score

The model trains successfully with var_dotprod_score as the main score, showing correct feature importances:

# Step 2: Apply weights WITHOUT specifying the main score
pyprophet score --in data.osw --level=ms1ms2 --classifier=XGBoost --apply_weights=weights.bin

This applies weights to incorrect features because --ss_main_score defaults to auto, potentially selecting a different main score and changing the feature order:

The root cause is that during training, features are prepared based on the specified ss_main_score, but when applying weights, if this parameter is not specified, it defaults to auto, which may select a different main score. This changes the feature order, causing the model to apply weights to the wrong features.

Solution
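The fix depends on one observation from the Problem section: learned weights are tied to column positions, not column names. The toy linear scorer below is not PyProphet code; it is a minimal sketch of how a changed feature order silently changes scores. Apart from var_dotprod_score, the feature names, the weights, and the particular reordering are assumptions chosen for illustration.

```python
# Illustrative only: a toy linear scorer, not PyProphet's classifier code.
# Feature names other than var_dotprod_score are assumed for the example.
TRAIN_ORDER = ["var_dotprod_score", "var_xcorr_shape", "var_intensity_score"]
WEIGHTS = [2.0, 0.5, -1.0]  # learned against TRAIN_ORDER, position by position


def score(row, feature_order, weights=WEIGHTS):
    """Weighted sum in which weights are matched to features by position only."""
    return sum(w * row[name] for w, name in zip(weights, feature_order))


row = {
    "var_dotprod_score": 0.9,
    "var_xcorr_shape": 0.4,
    "var_intensity_score": 0.1,
}

# Same order as training: each weight lands on the feature it was learned for.
aligned = score(row, TRAIN_ORDER)

# A different order at apply time (assumed here to come from a different
# main score): the same weights are now multiplied with the wrong features.
APPLY_ORDER = ["var_xcorr_shape", "var_dotprod_score", "var_intensity_score"]
misaligned = score(row, APPLY_ORDER)

print(f"aligned={aligned:.2f} misaligned={misaligned:.2f}")
```

No error is raised in the misaligned case, which is why the problem only shows up as incorrect scoring results.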
This PR stores metadata (ss_main_score, classifier, level) alongside the trained model and automatically restores the correct ss_main_score when applying weights.
Implementation
1. Enhanced Model Serialization (pyprophet/io/_base.py, pyprophet/io/scoring/osw.py)
Models are now saved with metadata:
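As a rough illustration of storing metadata alongside a model, the sketch below wraps the model and the three settings named above in one pickled dictionary. The function name, the "model" and "metadata" keys, and the use of pickle are assumptions made for this example; the PR's actual on-disk format may differ.

```python
import io
import pickle


def save_model_with_metadata(model, ss_main_score, classifier, level, fileobj):
    """Pickle the model together with the settings it was trained under.

    Hypothetical helper: the key names are illustrative, not PyProphet's.
    """
    payload = {
        "model": model,
        "metadata": {
            "ss_main_score": ss_main_score,
            "classifier": classifier,
            "level": level,
        },
    }
    pickle.dump(payload, fileobj)


# Stand-in for a trained classifier; any picklable object works here.
trained_model = {"kind": "placeholder"}

buffer = io.BytesIO()
save_model_with_metadata(
    trained_model, "var_dotprod_score", "XGBoost", "ms1ms2", buffer
)

# Read the payload back to confirm the metadata travels with the model.
buffer.seek(0)
restored = pickle.load(buffer)
print(restored["metadata"])
```

Keeping the settings in the same file as the model means the weights cannot be separated from the configuration they were trained with.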
2. Automatic Metadata Restoration (pyprophet/scoring/runner.py)
When loading weights:
If --ss_main_score=auto (default), automatically uses the stored value
Updates the ss_use_dynamic_main_score flag for correct semi-supervised learning behavior
3. Backward Compatibility
Old weight files (without metadata) are automatically detected and still work with an appropriate warning:
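The loading behavior described in steps 2 and 3 can be sketched as follows, assuming the weights file is a pickle that is either a bare model (old format) or a dictionary with "model" and "metadata" keys (new format). The function names, key names, and warning wording are hypothetical and are not taken from the PyProphet code base.

```python
import io
import pickle
import warnings


def load_model(fileobj):
    """Return (model, metadata); metadata is None for files without it."""
    obj = pickle.load(fileobj)
    if isinstance(obj, dict) and "model" in obj and "metadata" in obj:
        return obj["model"], obj["metadata"]
    # Old-style file: the pickled object is the model itself.
    warnings.warn(
        "Weights file has no metadata; pass --ss_main_score explicitly "
        "to match the training run."
    )
    return obj, None


def resolve_main_score(cli_value, metadata):
    """Keep an explicit CLI value; fall back to the stored one for 'auto'."""
    if cli_value == "auto" and metadata is not None:
        return metadata["ss_main_score"]
    return cli_value


# New-format file: metadata is restored when the CLI value is 'auto'.
new_file = io.BytesIO(
    pickle.dumps(
        {"model": "placeholder-model",
         "metadata": {"ss_main_score": "var_dotprod_score"}}
    )
)
_, new_meta = load_model(new_file)
new_main = resolve_main_score("auto", new_meta)

# Old-format file: still loads, with a warning, and 'auto' is left as is.
old_file = io.BytesIO(pickle.dumps("placeholder-model"))
old_model, old_meta = load_model(old_file)
old_main = resolve_main_score("auto", old_meta)

print(new_main, old_main)
```

In this sketch an explicitly passed main score always takes precedence, so only the default of auto is overridden by the stored value.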
Usage
After this fix, applying weights no longer requires manually specifying --ss_main_score:
Benefits
ss_main_score was used during training
Testing
Comprehensive testing demonstrates:
Related
Fixes issue: "--apply_weights requires --ss_main_score to be specified as in the original command"
Related to draft PR #117, which explored feature name tracking approaches.
Original prompt
Fixes #151