SLEP021: Unified API for computing feature importance #86
Conversation
ping @thomasjpfan. Feel free to edit and push directly in the branch.
Another API suggestion would be to introduce a `FeatureImportanceCalculatorInterface` kind of thing and pass that either to `__init__`, or to plotting methods, or to `get_feature_importance()`; or the other way around, pass an estimator to it, which is like the meta-estimator idea you have here.
It's a good start here, but this is gonna be a bumpy road lol
Definitely a user trap worth removing! Are there cases where selecting the feature importance method(s) results in different work or storage of information during `fit`?
I'm super excited about this proposal! Congratulations, @glemaitre and @thomasjpfan, for structuring this!

From my point of view, we also have the option of it always being a function (just like `sklearn.inspection.permutation_importance`). Any reason it's not listed? (Sorry if I missed some discussion that discourages this option.)
**Proposal 1**: Expose a parameter in `__init__` to select the method to use to compute the feature importance. The computation will be done using a method, e.g. `get_feature_importance`, that could take additional parameters requested by the feature importance method. This method could therefore be used internally by :class:`sklearn.model_selection.SelectFromModel`.
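A minimal sketch of what Proposal 1 could look like; the `importance_method` parameter and the `get_feature_importance` name are hypothetical, not a settled API:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance


class RandomForestWithImportance(RandomForestClassifier):
    """Hypothetical Proposal-1-style estimator: the importance method is
    selected in ``__init__`` and computed by ``get_feature_importance``."""

    def __init__(self, n_estimators=100, importance_method="mdi"):
        super().__init__(n_estimators=n_estimators)
        self.importance_method = importance_method

    def get_feature_importance(self, X=None, y=None, **method_params):
        # MDI is already computed at fit time; permutation importance
        # needs data and accepts extra parameters such as ``n_repeats``.
        if self.importance_method == "mdi":
            return self.feature_importances_
        if self.importance_method == "permutation":
            result = permutation_importance(self, X, y, **method_params)
            return result.importances_mean
        raise ValueError(f"unknown method {self.importance_method!r}")
```

Under this sketch, `SelectFromModel` could call `estimator.get_feature_importance(...)` instead of reading `feature_importances_` or `coef_` directly.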
Maybe I'm violating some SOLID principle, but we could incorporate estimator-agnostic feature importance techniques into some mixin like `ClassifierMixin` and `RegressorMixin`, and specific methods within each class when applicable. In this sense, we would have the "main" feature importance method selected during init (defining the behavior of `get_feature_importance`). Still, one could always use the others, because the estimator would have `get_permutation_importance`, `get_mdi_importance`, `get_abs_coef_importance`, etc.
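A rough sketch of this mixin idea, with all names hypothetical: estimator-agnostic getters live on a shared mixin, and `get_feature_importance` dispatches to whichever method the host estimator selected at init time.

```python
import numpy as np
from sklearn.inspection import permutation_importance


class FeatureImportanceMixin:
    """Hypothetical mixin collecting per-method importance getters.

    Assumes the host estimator sets ``self.importance_method`` in
    ``__init__``."""

    def get_permutation_importance(self, X, y, **params):
        # Estimator-agnostic: works for any fitted estimator.
        return permutation_importance(self, X, y, **params).importances_mean

    def get_abs_coef_importance(self):
        # Only meaningful for linear models exposing ``coef_``.
        return np.abs(self.coef_).ravel()

    def get_feature_importance(self, *args, **params):
        # Dispatch to the "main" method chosen in ``__init__``.
        getter = getattr(self, f"get_{self.importance_method}_importance")
        return getter(*args, **params)
```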
Currently scikit-learn provides only global feature importance. The previous API could be extended by providing a `get_samples_importance` to compute an explanation per sample if the given method supports it (e.g. Shapley values).
About "explain_one
", I'm trying to think of scenarios where you can use it and "explain_all
" inside the same Explainer
. I don't see many different scenarios than doing something like explain_all(X) = np.mean(np.abs(explain_one(X)), axis=1)
(for SHAP or LIME for instance). Did you have any other technique in mind?
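A toy illustration of that relationship, using made-up per-sample attributions of shape `(n_samples, n_features)`: the global score is the mean absolute attribution per feature, i.e. an aggregation over the sample axis.

```python
import numpy as np

rng = np.random.default_rng(0)
per_sample = rng.normal(size=(100, 5))  # stand-in for explain_one(X)


def explain_all(per_sample_attributions):
    # Average the absolute attributions over samples (axis=0) to get
    # one global score per feature.
    return np.mean(np.abs(per_sample_attributions), axis=0)


print(explain_all(per_sample).shape)  # (5,): one value per feature
```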
Theoretically speaking, I would be surprised if the average of the absolute values were always the way to go. Having a class would allow us to redefine the proper way to combine individual explanations.
**Proposal 4**: Create a meta-estimator `FeatureImportanceCalculator` that …
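The quoted proposal is truncated above, so the following is only a guess at its shape: a meta-estimator that wraps an estimator and owns the importance computation, with the method chosen on the wrapper rather than on the estimator.

```python
from sklearn.base import BaseEstimator, MetaEstimatorMixin, clone
from sklearn.inspection import permutation_importance


class FeatureImportanceCalculator(MetaEstimatorMixin, BaseEstimator):
    """Hypothetical meta-estimator owning the importance computation."""

    def __init__(self, estimator, method="permutation"):
        self.estimator = estimator
        self.method = method

    def fit(self, X, y=None, **fit_params):
        # Fit a clone so the wrapped estimator is left untouched.
        self.estimator_ = clone(self.estimator).fit(X, y, **fit_params)
        return self

    def get_feature_importance(self, X, y=None, **params):
        if self.method == "permutation":
            result = permutation_importance(self.estimator_, X, y, **params)
            return result.importances_mean
        raise NotImplementedError(self.method)
```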
I still need to choose my favorite proposal, although they are all very elegant (superficially thinking, I prefer the functions option). But I find proposal 1 less complex than the extra layer of abstraction proposed in 2, 3, and 4 (creating meta-estimators).

From the user's point of view, not having to "leave the estimator class" facilitates use and increases the chance of adoption, at least for a scikit-learn noob. I don't know if this is a concern. I often see scikit-learn development following this path and wanted to understand why. Is it just an object-orientation issue?
> From the user's point of view, not having to "leave the estimator class" facilitates use and increases the chance of adoption

This is a reasonable point, but I think that most of the inspection/display tools now live outside the estimator classes.
My main thought regarding using a class over a function is the ability to store state. Having state can also enable efficient computation via potential caching mechanisms. One solution that I did not mention here is to incorporate the plotting capabilities directly inside the meta-estimator. This would also be new.
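One way the state/caching argument could play out, sketched with hypothetical names: the explainer caches the expensive per-sample attributions so that a later aggregation or plotting call does not recompute them.

```python
import numpy as np


class CachingExplainer:
    """Hypothetical stateful explainer with a naive result cache."""

    def __init__(self, estimator):
        self.estimator = estimator
        self._cache = {}

    def explain_one(self, X):
        key = id(X)  # naive cache key, purely for illustration
        if key not in self._cache:
            self._cache[key] = self._compute_attributions(X)
        return self._cache[key]

    def explain_all(self, X):
        # Reuses cached per-sample attributions; a subclass could
        # override this to combine them differently.
        return np.mean(np.abs(self.explain_one(X)), axis=0)

    def _compute_attributions(self, X):
        # Placeholder for any per-sample attribution method.
        raise NotImplementedError
```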
At least to me, the one proposal that I liked was to have a method at the estimator level, replacing `feature_importances_` and taking all the arguments relevant to that method for that estimator, which I don't see in the proposals section here. We could also add different methods to explain samples and features separately, maybe?

I also don't mind the `Explainer`, but that might be a tricky one to implement?

Whatever we decide, we should make sure it's easy to add explainer methods without too much `__init__` param overhead, via separate classes or separate methods.
This proposal should also list the pros and cons of the different alternatives and suggest one to move forward. WDYT of having a draft PR for the one or two that we like, to get a better idea?
This method is therefore estimator agnostic.
- The linear estimators have a `coef_` attribute once fitted, which is sometimes used as their corresponding importance. We documented the limitations when it comes to interpreting those coefficients.
A link to the docs for this would be nice, for the record.
Additionally, `feature_importances_` and `coef_` are statistics derived from the training set. We already documented that the reported `feature_importances_` will potentially show biases for features used by the model to overfit. Thus, it will potentially negatively impact the feature …
Also, a link to the docs we're mentioning would be nice.
Reflection around a unified API to compute feature importance in scikit-learn