Releases: oegedijk/explainerdashboard
v0.3.0: reducing memory footprint
Version 0.3.0:
This is a major release and comes with lots of breaking changes to the lower-level ClassifierExplainer and RegressionExplainer API. The higher-level ExplainerComponent and ExplainerDashboard API has not been changed, however, except for the deprecation of the cats and hide_cats parameters.
Explainers generated with explainerdashboard <= 0.2.20.1 will not work with this version! So if you have stored explainers to disk, you either have to rebuild them with this new version or downgrade back to explainerdashboard==0.2.20.1! (Hope you pinned your dependencies in production! ;)
The main motivation for these breaking changes was to improve the memory usage of the dashboards, especially in production. This led to the deprecation of the dual cats grouped/not grouped functionality of the dashboard. Once I had committed to that breaking change, I decided to clean up the entire API and make all the needed breaking changes at once.
Breaking Changes
- Onehot encoded features (passed with the cats parameter) are now merged by default. This means that the cats=True parameter has been removed from all explainer methods, and the group cats toggle has been removed from all ExplainerComponents. This saves both on code complexity and memory usage. If you wish to see the individual contributions of onehot encoded columns, simply don't pass them to the cats parameter upon construction.
- Deprecated explainer attributes:
  - BaseExplainer:
    - shap_values_cats
    - shap_interaction_values_cats
    - permutation_importances_cats
    - get_dfs()
    - formatted_contrib_df()
    - to_sql()
    - check_cats()
    - equivalent_col
  - ClassifierExplainer:
    - get_prop_for_label
- Naming changes to attributes:
  - BaseExplainer:
    - importances_df() -> get_importances_df()
    - feature_permutations_df() -> get_feature_permutations_df()
    - get_int_idx(index) -> get_idx(index)
    - contrib_df() -> get_contrib_df() *
    - contrib_summary_df() -> self.get_summary_contrib_df() *
    - interaction_df() -> get_interactions_df() *
    - shap_values -> get_shap_values_df
    - plot_shap_contributions() -> plot_contributions()
    - plot_shap_summary() -> plot_importances_detailed()
    - plot_shap_dependence() -> plot_dependence()
    - plot_shap_interaction() -> plot_interaction()
    - plot_shap_interaction_summary() -> plot_interactions_detailed()
    - plot_interactions() -> plot_interactions_importance()
    - n_features() -> n_features
    - shap_top_interaction() -> top_shap_interactions
    - shap_interaction_values_by_col() -> shap_interactions_values_for_col()
  - ClassifierExplainer:
    - self.pred_probas -> self.pred_probas()
    - precision_df() -> get_precision_df() *
    - lift_curve_df() -> get_liftcurve_df() *
  - RandomForestExplainer/XGBExplainer:
    - decision_trees -> shadow_trees
    - decisiontree_df() -> get_decisionpath_df()
    - decisiontree_summary_df() -> get_decisionpath_summary_df()
    - decision_path_file() -> decisiontree_file()
    - decision_path() -> decisiontree()
    - decision_path_encoded() -> decisiontree_encoded()
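When migrating older code, the renames listed above can be collected into a lookup table. The following is a minimal sketch and not part of explainerdashboard: the RENAMED dict repeats only a subset of the pairs from this list, and the helper name suggest_new_name is hypothetical.

```python
# Hypothetical migration helper (not part of explainerdashboard).
# The pairs below are copied from the rename list in these release notes.
RENAMED = {
    "importances_df": "get_importances_df",
    "feature_permutations_df": "get_feature_permutations_df",
    "get_int_idx": "get_idx",
    "contrib_df": "get_contrib_df",
    "plot_shap_contributions": "plot_contributions",
    "plot_shap_summary": "plot_importances_detailed",
    "plot_shap_dependence": "plot_dependence",
    "plot_shap_interaction": "plot_interaction",
    "plot_interactions": "plot_interactions_importance",
    "precision_df": "get_precision_df",
    "lift_curve_df": "get_liftcurve_df",
    "decision_trees": "shadow_trees",
}


def suggest_new_name(old_name: str) -> str:
    """Return the 0.3.0 name for an old attribute, or the name unchanged."""
    return RENAMED.get(old_name, old_name)


print(suggest_new_name("plot_shap_summary"))
print(suggest_new_name("memory_usage"))
```

A table like this could be used to grep a codebase for old names before upgrading; names that are not in the table are returned as-is.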
New Features
- New Explainer parameter precision: defaults to 'float64'. Can be set to 'float32' to save on memory usage: ClassifierExplainer(model, X, y, precision='float32')
- New memory_usage() method to show which internal attributes take the most memory.
- For multiclass classifiers: keep_shap_pos_label_only(pos_label) method:
  - drops shap values and shap interactions for all labels except pos_label
  - this should significantly reduce memory usage for multiclass classification models.
  - not needed for binary classifiers.
- Added get_index_list(), get_X_row(index), and get_y(index) methods.
  - these can be overridden with .set_index_list_func(), .set_X_row_func() and .set_y_func().
  - by overriding these functions you can, for example, sample observations from a database or other external storage instead of from X_test, y_test.
- Added Popout buttons to all the major graphs that open a large modal showing just the graph. This makes it easier to focus on a particular graph without distraction from the rest of the dashboard and all its toggles.
- Added max_cat_colors parameters to plot_importances_detailed, plot_dependence and plot_interactions_detailed
  - prevents plotting getting slow with categorical features with many categories.
  - defaults to 5
  - can be set as **kwarg to ExplainerDashboard
- Adds category limits and sorting to RegressionVsCol component
- Adds property X_merged that gives a dataframe with the onehot columns merged.
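The overridable get_index_list() and get_X_row(index) lookups mentioned in this list could be backed by a database roughly as follows. This is only a sketch under assumptions: the table name, its columns, and the function names are made up, an in-memory SQLite database stands in for external storage, and the row is returned as a plain dict, whereas the real explainer may expect a different return type.

```python
import sqlite3

# In-memory database standing in for external storage (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (idx TEXT PRIMARY KEY, age REAL, fare REAL)")
conn.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?)",
    [("p1", 22.0, 7.25), ("p2", 38.0, 71.28)],
)


def index_list_from_db():
    """Return all available indexes, ordered by primary key."""
    rows = conn.execute("SELECT idx FROM passengers ORDER BY idx").fetchall()
    return [row[0] for row in rows]


def x_row_from_db(index):
    """Fetch the feature values for a single index as a dict."""
    row = conn.execute(
        "SELECT age, fare FROM passengers WHERE idx = ?", (index,)
    ).fetchone()
    if row is None:
        raise KeyError(f"index {index!r} not found")
    return {"age": row[0], "fare": row[1]}


# With a real explainer these would be registered along the lines of:
#   explainer.set_index_list_func(index_list_from_db)
#   explainer.set_X_row_func(x_row_from_db)
# The expected return types are an assumption here; check the documentation.
print(index_list_from_db())
print(x_row_from_db("p2"))
```

The point of the design is that the dashboard only needs a way to list indexes and fetch one row at a time, so the full X_test does not have to live in memory.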
Bug Fixes
- shap dependence: when no point cloud, do not highlight!
- Fixed bug with calculating contributions plot/table for whatif component,
when InputFeatures had not fully loaded, resulting in shap error.
Improvements
- saving X.copy(), instead of using a reference to X
  - this would result in more memory usage in development though, so you can del X_test to save memory.
- ClassifierExplainer only stores shap (interaction) values for the positive class: shap values for the negative class are generated on the fly by multiplying with -1.
- encoding onehot columns as np.int8, saving memory usage
- encoding categorical features as pd.category, saving memory usage
- added base TreeExplainer class that RandomForestExplainer and XGBExplainer both derive from
  - will make it easier to extend tree explainers to other models in the future
    - e.g. catboost and lightgbm
- got rid of the callable properties (that were there to assure backward compatibility), and replaced them with regular methods.
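Two of the memory savings in this release can be illustrated with the standard library alone. The sketch below is not the library's internal code: it uses the array module as a stand-in for numpy arrays, the values are made up, and the byte sizes are those of typical platforms.

```python
from array import array

# Made-up shap values for the positive class of a binary classifier.
values = [0.12, -0.05, 0.30, -0.02]

as_float64 = array("d", values)  # double precision: typically 8 bytes per value
as_float32 = array("f", values)  # single precision: typically 4 bytes per value

bytes64 = as_float64.itemsize * len(as_float64)
bytes32 = as_float32.itemsize * len(as_float32)
print(bytes64, bytes32)

# For a binary classifier, the negative-class values are the positive-class
# values with the sign flipped, so they can be derived instead of stored.
shap_pos = as_float64
shap_neg = array("d", (-v for v in shap_pos))
print(list(shap_neg))
```

Halving the precision halves the storage per value, and deriving the negative class on demand avoids keeping a second array of the same size around.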
v0.2.20.1: backward compatibility fix
0.2.20.1:
Bug Fixes
- fixes bug allowing single list of logins for ExplainerDashboard when passed on to ExplainerHub
- fixes bug with explainers generated with explainerdashboard <= 0.2.19 that did not have a .onehot_cols property
v0.2.20: supporting categorical features
0.2.20:
Breaking Changes
- WhatIfComponent deprecated. Use WhatIfComposite or connect components yourself to a FeatureInputComponent
- renaming properties:
  - explainer.cats -> explainer.onehot_cols
  - explainer.cats_dict -> explainer.onehot_dict
New Features
- Adds support for models with categorical features (e.g. CatBoost)
- Adds filter on number of categories to display in violin plots and pdp plot,
and how to sort the categories (alphabetical, by frequency or by mean abs shap)
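The category filter described above can be sketched in a few lines. This is an illustration only, not the dashboard's implementation: the function name and the sort option names are hypothetical, and sorting by mean abs shap is left out because it would need shap values.

```python
from collections import Counter


def top_categories(values, n_categories=5, sort="freq"):
    """Return at most n_categories category names, sorted as requested.

    sort="freq" orders by descending frequency (ties broken alphabetically);
    sort="alphabet" orders alphabetically.
    """
    counts = Counter(values)
    if sort == "freq":
        ordered = sorted(counts, key=lambda cat: (-counts[cat], cat))
    elif sort == "alphabet":
        ordered = sorted(counts)
    else:
        raise ValueError(f"unknown sort option: {sort!r}")
    return ordered[:n_categories]


embarked = ["S", "C", "S", "Q", "S", "C"]
print(top_categories(embarked, n_categories=2))
print(top_categories(embarked, n_categories=2, sort="alphabet"))
```

Limiting the number of displayed categories keeps violin and pdp plots readable when a feature has many distinct values.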
Bug Fixes
- fixes bug where str tab indicators returned e.g. the old ImportancesTab instead of ImportancesComposite
Improvements
- No longer depends on PDPbox: built own partial dependence functions with categorical feature support
- autodetect xgboost.core.Booster or lightgbm.Booster and give ValueError to use the sklearn compatible wrappers instead.
Other Changes
- Introduces list of categorical columns: explainer.categorical_cols
- Introduces dictionary with categorical columns categories: explainer.categorical_dict
- Introduces list of all categorical features: explainer.cat_cols
Bugfix: support custom dashboard components that don't take name or kwargs
Bug fix:
- custom ExplainerComponents that do not have name or **kwargs parameters in the __init__ are no longer broken.
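One general way to support components whose __init__ lacks name or **kwargs is to inspect the signature before instantiating. The sketch below shows that technique with the standard library; it is not necessarily how explainerdashboard fixed the bug, and build_component and both demo classes are hypothetical.

```python
import inspect


def build_component(component_cls, explainer, name=None, **kwargs):
    """Instantiate component_cls, passing only the arguments it accepts."""
    params = inspect.signature(component_cls.__init__).parameters
    accepts_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    candidates = dict(kwargs, name=name)
    if accepts_var_kwargs:
        # A **kwargs parameter accepts every keyword argument.
        call_kwargs = candidates
    else:
        # Otherwise keep only the keywords that are explicit parameters.
        call_kwargs = {k: v for k, v in candidates.items() if k in params}
    return component_cls(explainer, **call_kwargs)


class MinimalComponent:
    """Custom component that takes neither name nor **kwargs."""

    def __init__(self, explainer):
        self.explainer = explainer


class NamedComponent:
    """Custom component with the full signature."""

    def __init__(self, explainer, name=None, **kwargs):
        self.explainer = explainer
        self.name = name
        self.kwargs = kwargs


minimal = build_component(MinimalComponent, "explainer", name="c1", title="x")
named = build_component(NamedComponent, "explainer", name="c2", title="x")
print(type(minimal).__name__, named.name, named.kwargs)
```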
v0.2.19: ExplainerHub improvements (NavBar!)
0.2.19
Breaking Changes
- ExplainerHub: parameter user_json is now called users_file (and defaults to a users.yaml file)
- Renamed a bunch of ExplainerHub private methods:
  - _validate_user_json -> _validate_users_file
  - _add_user_to_json -> _add_user_to_file
  - _add_user_to_dashboard_json -> _add_user_to_dashboard_file
  - _delete_user_from_json -> _delete_user_from_file
  - _delete_user_from_dashboard_json -> _delete_user_from_dashboard_file
New Features
- Added NavBar to ExplainerHub
- Made users.yaml the default file for storing users and hashed passwords for ExplainerHub for easier manual editing.
- Added option min_height to ExplainerHub to set the size of the iFrame containing the dashboard.
- Added option fluid=True to ExplainerHub to stretch bootstrap container to width of the browser.
- added parameter bootstrap to ExplainerHub to override default bootstrap theme.
- added option dbs_open_by_default=True to ExplainerHub so that no login is required for dashboards for which there wasn't a specific list of users declared through db_users. So only dashboards for which users have been defined are password protected.
- Added option no_index to ExplainerHub so that no flask route is created for index "/", so that you can add your own custom index. The dashboards are still loaded on their respective routes, so you can link to them or embed them in iframes, etc.
- Added a "wizard" perfect prediction to the lift curve.
  - hide with hide_wizard=True, default to not show with wizard=False.
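The dbs_open_by_default rule described in this list can be written down as a small decision function. This is a sketch of the stated policy, not the hub's actual code: the function names are hypothetical, db_users is modelled as a dict from dashboard name to a list of usernames, and letting any logged-in user view a dashboard without its own user list is an assumption.

```python
def login_required(dashboard_name, db_users, dbs_open_by_default=False):
    """Return True if a login is needed to view the given dashboard."""
    has_user_list = bool(db_users.get(dashboard_name))
    if dbs_open_by_default:
        # Only dashboards with an explicit user list are protected.
        return has_user_list
    # Otherwise every dashboard sits behind a login.
    return True


def may_view(user, dashboard_name, db_users, dbs_open_by_default=False):
    """Return True if user (None when not logged in) may view the dashboard."""
    if not login_required(dashboard_name, db_users, dbs_open_by_default):
        return True
    if user is None:
        return False
    allowed = db_users.get(dashboard_name)
    # Assumption: without an explicit list, any logged-in user may view.
    return True if not allowed else user in allowed


db_users = {"db1": ["alice"]}
print(login_required("db1", db_users, dbs_open_by_default=True))
print(login_required("db2", db_users, dbs_open_by_default=True))
print(may_view("bob", "db1", db_users, dbs_open_by_default=True))
```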
Bug Fixes
- ExplainerHub.from_config() now works with non-cwd paths
- ExplainerHub.to_yaml("subdirectory/hub.yaml") now correctly stores the users.yaml file in the correct subdirectory when specified.
Improvements
- added a "powered by: explainerdashboard" footer. Hide it with hide_poweredby=True.
- added option "None" to shap dependence color col. Also removes the point cloud from the violin plots for categorical features.
- added option mode to ExplainerDashboard.run() that can override self.mode.
v0.2.18.2: fix bug with ExplainerHub and logins=None
added secret key docs
v0.2.18: ExplainerHub user management + CLI
v0.2.18.1
New Features
- ExplainerHub now does user management through Flask-Login and a user.json file
- Can now set specific access policies for specific explainer with db_users parameter
- adds an explainerhub cli to start explainerhubs and do user management from the command-line
v0.2.17.3: fixes version bump
Update setup.py
v0.2.17.2: sklearn v0.24 RandomForestRegressor bugfix
v0.2.17: Introducing ExplainerHub
0.2.17:
New Features
- Introducing ExplainerHub: combine multiple dashboards together behind a single frontend with convenient url paths.
  - code example:
        from explainerdashboard import ExplainerDashboard, ExplainerHub
        # explainer and explainer2 are previously constructed explainers
        db1 = ExplainerDashboard(explainer, title="Dashboard One", name='db1')
        db2 = ExplainerDashboard(explainer2, title="Dashboard Two", name='project_alpha', description="New proposed model")
        hub = ExplainerHub([db1, db2])
        hub.run()
        # store and recover from config:
        hub.to_yaml("hub.yaml")
        hub2 = ExplainerHub.from_config("hub.yaml")
- adds option dump_explainer to ExplainerDashboard.to_yaml() to automatically dump the explainer along with the .yaml.
- adds option use_waitress to ExplainerDashboard.run() and ExplainerHub.run(), to use the waitress python webserver instead of the Flask development server
- adds parameters to ExplainerDashboard:
  - name: this will be used to assign a url for ExplainerHub (otherwise defaults to dashboard1, dashboard2, etc)
  - description: this will be used for the title tooltip in the dashboard and in the ExplainerHub frontend.
Improvements
- the cli now uses the waitress server by default.