
SLEP021: Unified API for computing feature importance #86

Open · wants to merge 11 commits into base: main
Conversation

glemaitre (Member)
Reflection around a unified API to compute feature importance in scikit-learn

@glemaitre (Member Author)

ping @thomasjpfan. Feel free to edit and push directly in the branch.

@adrinjalali (Member) left a comment

Another API suggestion would be to introduce a FeatureImportanceCalculatorInterface kind of thing and pass it either to __init__, to plotting methods, or to get_feature_importance(); or the other way around, pass an estimator to it, which is like the meta-estimator idea you have here.

It's a good start here, but this is gonna be a bumpy road lol
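For concreteness, a rough sketch of what such a strategy interface could look like; all class and method names here are hypothetical, not an existing scikit-learn API:

```python
from abc import ABC, abstractmethod

import numpy as np


class FeatureImportanceCalculatorInterface(ABC):
    """Strategy object encapsulating one way of computing importances."""

    @abstractmethod
    def compute(self, estimator, X, y):
        """Return an array of shape (n_features,) of importances."""


class AbsCoefImportance(FeatureImportanceCalculatorInterface):
    """Example strategy: absolute values of a fitted linear model's coef_."""

    def compute(self, estimator, X, y):
        return np.abs(estimator.coef_).ravel()
```

Such an object could then be passed to an estimator's `__init__` or to a plotting method, or, inverted, receive the estimator itself via `compute(estimator, X, y)`.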

slep021/proposal.rst: 5 resolved review threads (outdated)
@jnothman (Member)

Definitely a user trap worth removing!

Are there cases where selecting the feature importance method(s) results in different work/storage of information during fit?

@vitaliset left a comment

I'm super excited about this proposal! Congratulations, @glemaitre and @thomasjpfan, for structuring this!

From my point of view, we also have the option of it always being a function (just like for sklearn.inspection.permutation_importance). Any reason it's not listed? (Sorry if I didn't follow some discussion that discourages this option.)

Comment on lines +135 to +139
**Proposal 1**: Expose a parameter in `__init__` to select the method to use
to compute the feature importance. The computation will be done using a method,
e.g. `get_feature_importance` that could take additional parameters requested
by the feature importance method. This method could therefore be used
internally by :class:`sklearn.model_selection.SelectFromModel`.


Maybe I'm violating some SOLID principle, but we could incorporate feature importance agnostic techniques into some mixin like ClassifierMixin and RegressorMixin and specific methods within each class when applicable. In this sense, we would have the "main" feature importance method selected during init (defining the behavior of get_feature_importance). Still, one could always use the others because the estimator would have get_permutation_importance, get_mdi_importance, get_abs_coef_importance etc.
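Sketching that mixin idea under stated assumptions (the mixin, its method names, and `ToyLinearModel` are all made up for illustration):

```python
import numpy as np


class FeatureImportanceMixin:
    """Hypothetical mixin: a "main" method chosen at __init__ dispatches
    to one of several specific importance methods."""

    def get_feature_importance(self, X=None, y=None):
        # Dispatch to the importance method selected at construction time.
        return getattr(self, f"get_{self.importance_method}_importance")(X, y)

    def get_abs_coef_importance(self, X=None, y=None):
        # Only applicable to estimators exposing a fitted coef_.
        return np.abs(self.coef_).ravel()


class ToyLinearModel(FeatureImportanceMixin):
    def __init__(self, importance_method="abs_coef"):
        self.importance_method = importance_method
        self.coef_ = np.array([[0.5, -1.5]])  # pretend this was fitted
```

The "main" method drives `get_feature_importance`, while the specific `get_*_importance` methods stay individually callable.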

Comment on lines +156 to +158
Currently scikit-learn provides only global feature importance. The previous
API could be extended by providing a `get_samples_importance` to compute an
explanation per sample if the given method supports it (e.g. Shapley values).


About "explain_one", I'm trying to think of scenarios where you can use it and "explain_all" inside the same Explainer. I don't see many different scenarios than doing something like explain_all(X) = np.mean(np.abs(explain_one(X)), axis=1) (for SHAP or LIME for instance). Did you have any other technique in mind?
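A tiny numpy illustration of that relation, assuming per-sample attributions of shape `(n_samples, n_features)` as SHAP or LIME typically produce (so the aggregation runs over the samples axis):

```python
import numpy as np

rng = np.random.default_rng(0)
per_sample = rng.normal(size=(5, 3))  # stand-in for explain_one(X)

# Global, per-feature importance as the mean absolute per-sample attribution.
global_importance = np.mean(np.abs(per_sample), axis=0)
```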

glemaitre (Member Author)

Theoretically speaking, I would be surprised that the average of the absolute would always be the way to go. Having a class would allow us to redefine the proper way to combine individual explanations.

**Proposal 4**: Create a meta-estimator `FeatureImportanceCalculator` that
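A hedged sketch of what such a meta-estimator might look like; the `FeatureImportanceCalculator` class and its parameters are invented here, and only `permutation_importance` and `clone` are existing scikit-learn API:

```python
from sklearn.base import clone
from sklearn.inspection import permutation_importance


class FeatureImportanceCalculator:
    """Hypothetical meta-estimator pairing an estimator with a method."""

    def __init__(self, estimator, method="permutation", **method_params):
        self.estimator = estimator
        self.method = method
        self.method_params = method_params

    def fit(self, X, y):
        # Fit a clone so the original estimator is left untouched.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def get_feature_importance(self, X, y):
        if self.method == "permutation":
            result = permutation_importance(
                self.estimator_, X, y, **self.method_params
            )
            return result.importances_mean
        raise ValueError(f"unknown method {self.method!r}")
```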


I still need to choose my favorite proposal, although they are all very elegant (superficially thinking, I prefer the functions option). But I find proposal 1 less complex than adding an extra layer of abstraction proposed in 2, 3, and 4 (creating meta-estimators).
From the user's point of view, not having to "leave the estimator class" facilitates and increases the chance of use, at least for a scikit-learn noob. I don't know if this is a concern. I always see the scikit-learn development following this path and wanted to understand why. Is it just an object orientation issue?

glemaitre (Member Author)

> From the user's point of view, not having to "leave the estimator class" facilitates and increases the chance of use

This is a reasonable point, but I think that most of the inspection/display tools now live outside the estimators.

@glemaitre (Member Author) left a comment

My main thought regarding the use of a class over a function is the ability to store state. Having state can also help with efficient computation through potential caching mechanisms. One solution that I did not mention here is to incorporate the plotting capabilities directly inside the meta-estimator. This would also be new.
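A minimal illustration of the state/caching argument (all names here are made up):

```python
class CachedExplainer:
    """Hypothetical explainer that memoizes expensive importance results."""

    def __init__(self, estimator):
        self.estimator = estimator
        self._cache = {}

    def importance(self, method="permutation"):
        # Compute once per method; later calls (e.g. from a plotting
        # helper) reuse the stored result instead of recomputing.
        if method not in self._cache:
            self._cache[method] = self._compute(method)
        return self._cache[method]

    def _compute(self, method):
        return [f"expensive result for {method}"]  # stand-in computation
```

A plain function would have to recompute (or push caching onto the caller) each time.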

@adrinjalali (Member) left a comment

At least to me, the one proposal I liked was to have a method at the estimator level, replacing feature_importances_ and taking all the arguments relevant to that method for that estimator, which I don't see in the proposals section here. We could also maybe add different methods to explain samples and features separately?

I also don't mind the Explainer, but that might be a tricky one to implement?

Whatever we decide, we should make sure it's easy to add explainer methods w/o too much __init__ param overhead via separate classes or separate methods.

This proposal should also provide pros and cons between different alternatives and suggest one to move forward. WDYT of having a draft PR for the one/two that we like to have a better idea?

This method is therefore estimator agnostic.
- The linear estimators have a `coef_` attribute once fitted, which is
  sometimes used as their corresponding importance. We documented the
  limitations when it comes to interpreting those coefficients.

a link to the docs for this would be nice for the record.

Comment on lines +119 to +122
Additionally, `feature_importances_` and `coef_` are statistics derived from
the training set. We already documented that the reported
`feature_importances_` will potentially show biases for features used by the
model to overfit. Thus, it will potentially negatively impact the feature

also a link to the docs we're mentioning would be nice
