
SLEP021: Unified API for computing feature importance #86

Open · wants to merge 11 commits into base: main
Conversation

glemaitre (Member)
Reflection around a unified API to compute feature importance in scikit-learn

@glemaitre (Member Author)

ping @thomasjpfan. Feel free to edit and push directly in the branch.

@adrinjalali (Member) left a comment

Another API suggestion would be to introduce a FeatureImportanceCalculatorInterface kind of thing and pass it either to __init__, to plotting methods, or to get_feature_importance(); or the other way around, pass an estimator to it, which is like the meta-estimator idea you have here.

It's a good start here, but this is gonna be a bumpy road lol
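For concreteness, a rough sketch of what such a strategy interface could look like; all class and method names here are hypothetical, not an existing scikit-learn API:

```python
from abc import ABC, abstractmethod

import numpy as np


class FeatureImportanceCalculatorInterface(ABC):
    """Strategy object encapsulating one way of computing importances."""

    @abstractmethod
    def compute(self, estimator, X, y):
        """Return an array of shape (n_features,) of importances."""


class AbsCoefImportance(FeatureImportanceCalculatorInterface):
    """Example strategy: absolute values of a fitted linear model's coef_."""

    def compute(self, estimator, X, y):
        return np.abs(estimator.coef_).ravel()
```

Such an object could then be passed to an estimator's `__init__` or to a plotting method, or, inverted, receive the estimator itself via `compute(estimator, X, y)`.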

slep021/proposal.rst: 5 resolved review threads (outdated)
@jnothman (Member)

Definitely a user trap worth removing!

Are there cases where selecting the feature importance method(s) results in different work/storage of information during fit?

@vitaliset left a comment

I'm super excited about this proposal! Congratulations, @glemaitre and @thomasjpfan, for structuring this!

From my point of view, we also have the option of it always being a function (just like for sklearn.inspection.permutation_importance). Any reason it's not listed? (Sorry if I didn't follow some discussion that discourages this option.)

Comment on lines +135 to +139
**Proposal 1**: Expose a parameter in `__init__` to select the method to use
to compute the feature importance. The computation will be done using a method,
e.g. `get_feature_importance` that could take additional parameters requested
by the feature importance method. This method could therefore be used
internally by :class:`sklearn.model_selection.SelectFromModel`.


Maybe I'm violating some SOLID principle, but we could incorporate feature importance agnostic techniques into some mixin like ClassifierMixin and RegressorMixin and specific methods within each class when applicable. In this sense, we would have the "main" feature importance method selected during init (defining the behavior of get_feature_importance). Still, one could always use the others because the estimator would have get_permutation_importance, get_mdi_importance, get_abs_coef_importance etc.
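Sketching that mixin idea under stated assumptions (the mixin, its method names, and `ToyLinearModel` are all made up for illustration):

```python
import numpy as np


class FeatureImportanceMixin:
    """Hypothetical mixin: a "main" method chosen at __init__ dispatches
    to one of several specific importance methods."""

    def get_feature_importance(self, X=None, y=None):
        # Dispatch to the importance method selected at construction time.
        return getattr(self, f"get_{self.importance_method}_importance")(X, y)

    def get_abs_coef_importance(self, X=None, y=None):
        # Only applicable to estimators exposing a fitted coef_.
        return np.abs(self.coef_).ravel()


class ToyLinearModel(FeatureImportanceMixin):
    def __init__(self, importance_method="abs_coef"):
        self.importance_method = importance_method
        self.coef_ = np.array([[0.5, -1.5]])  # pretend this was fitted
```

The "main" method drives `get_feature_importance`, while the specific `get_*_importance` methods stay individually callable.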

Comment on lines +156 to +158
Currently scikit-learn provides only global feature importance. The previous
API could be extended by providing a `get_samples_importance` to compute an
explanation per sample if the given method supports it (e.g. Shapley values).


About "explain_one", I'm trying to think of scenarios where you can use it and "explain_all" inside the same Explainer. I don't see many different scenarios than doing something like explain_all(X) = np.mean(np.abs(explain_one(X)), axis=1) (for SHAP or LIME for instance). Did you have any other technique in mind?
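A tiny numpy illustration of that relation, assuming per-sample attributions of shape `(n_samples, n_features)` as SHAP or LIME typically produce (so the aggregation runs over the samples axis):

```python
import numpy as np

rng = np.random.default_rng(0)
per_sample = rng.normal(size=(5, 3))  # stand-in for explain_one(X)

# Global, per-feature importance as the mean absolute per-sample attribution.
global_importance = np.mean(np.abs(per_sample), axis=0)
```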

glemaitre (Member Author)

Theoretically speaking, I would be surprised that the average of the absolute would always be the way to go. Having a class would allow us to redefine the proper way to combine individual explanations.

**Proposal 4**: Create a meta-estimator `FeatureImportanceCalculator` that
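A hedged sketch of what such a meta-estimator might look like; the `FeatureImportanceCalculator` class and its parameters are invented here, and only `permutation_importance` and `clone` are existing scikit-learn API:

```python
from sklearn.base import clone
from sklearn.inspection import permutation_importance


class FeatureImportanceCalculator:
    """Hypothetical meta-estimator pairing an estimator with a method."""

    def __init__(self, estimator, method="permutation", **method_params):
        self.estimator = estimator
        self.method = method
        self.method_params = method_params

    def fit(self, X, y):
        # Fit a clone so the original estimator is left untouched.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def get_feature_importance(self, X, y):
        if self.method == "permutation":
            result = permutation_importance(
                self.estimator_, X, y, **self.method_params
            )
            return result.importances_mean
        raise ValueError(f"unknown method {self.method!r}")
```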


I still need to choose my favorite proposal, although they are all very elegant (superficially thinking, I prefer the functions option). But I find proposal 1 less complex than adding an extra layer of abstraction proposed in 2, 3, and 4 (creating meta-estimators).
From the user's point of view, not having to "leave the estimator class" facilitates and increases the chance of use, at least for a scikit-learn noob. I don't know if this is a concern. I always see the scikit-learn development following this path and wanted to understand why. Is it just an object orientation issue?

glemaitre (Member Author)

> From the user's point of view, not having to "leave the estimator class" facilitates and increases the chance of use

This is a reasonable point, but I think that most of the inspection/display tools now live outside the estimators.

@glemaitre (Member Author) left a comment

My main thought regarding the use of a class over a function is the ability to store state. Having state can also help with efficient computation through potential caching mechanisms. One solution that I did not mention here is to incorporate the plotting capabilities directly inside the meta-estimator. This would also be new.
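A minimal illustration of the state/caching argument (all names here are made up):

```python
class CachedExplainer:
    """Hypothetical explainer that memoizes expensive importance results."""

    def __init__(self, estimator):
        self.estimator = estimator
        self._cache = {}

    def importance(self, method="permutation"):
        # Compute once per method; later calls (e.g. from a plotting
        # helper) reuse the stored result instead of recomputing.
        if method not in self._cache:
            self._cache[method] = self._compute(method)
        return self._cache[method]

    def _compute(self, method):
        return [f"expensive result for {method}"]  # stand-in computation
```

A plain function would have to recompute (or push caching onto the caller) each time.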

@adrinjalali (Member) left a comment

At least to me, the one proposal I liked was to have a method at the estimator level, replacing feature_importances_ and taking all the arguments relevant to that method for that estimator, which I don't see in the proposals section here. We could also maybe add different methods to explain samples and features separately?

I also don't mind the Explainer, but that might be a tricky one to implement?

Whatever we decide, we should make sure it's easy to add explainer methods w/o too much __init__ param overhead via separate classes or separate methods.

This proposal should also provide pros and cons between different alternatives and suggest one to move forward. WDYT of having a draft PR for the one/two that we like to have a better idea?

This method is therefore estimator agnostic.
- The linear estimators have a `coef_` attribute once fitted, which is
  sometimes used as their corresponding importance. We documented the
  limitations when it comes to interpreting those coefficients.

a link to the docs for this would be nice for the record.

Comment on lines +119 to +122
Additionally, `feature_importances_` and `coef_` are statistics derived from
the training set. We already documented that the reported
`feature_importances_` will potentially show biases for features used by the
model to overfit. Thus, it will potentially negatively impact the feature

also a link to the docs we're mentioning would be nice
