This repository contains the code of the experiments in the paper
Connecting Interpretability and Robustness in Decision Trees through Separation
Authors: Michal Moshkovitz, Yao-Yuan Yang, Kamalika Chaudhuri
Recent research has recognized interpretability and robustness as essential properties of trustworthy classification. Curiously, a connection between robustness and interpretability was empirically observed, but the theoretical reasoning behind it remained elusive. In this paper, we rigorously investigate this connection. Specifically, we focus on interpretation using decision trees and robustness to
pip install -r requirements.txt
- Install Gurobi: https://www.cvxpy.org/install/index.html#install-with-gurobi-support
- Install GLPK: https://www.gnu.org/software/glpk/
- Install CVXOPT with GLPK support:
export CVXOPT_BUILD_GLPK=1
export CVXOPT_GLPK_LIB_DIR=/path/to/glpk-X.X/lib
export CVXOPT_GLPK_INC_DIR=/path/to/glpk-X.X/include
pip install --upgrade cvxopt
git submodule init
git submodule update
cd risk-slim
pip install -r requirements.txt
pip install .
For more LCPA installation instructions, please visit https://github.com/ustunb/risk-slim
cd RobustTrees
git submodule update --init --recursive
./build.sh
cd python-package
python setup.py install
For more RobDT installation instructions, please visit https://github.com/chenhongge/RobustTrees
- params.py: lists all parameters used in the experiments
- rsep_explain/datasets/__init__.py: load datasets
- experiments/lin_sep_bbm_rob_3.py: run experiment for BBM-RS
- experiments/dt_interpret_rob_3.py: run experiment for DT (Breiman et al., 1984)
- experiments/xgboostrobdt_interpret_rob.py: run experiment for RobDT (Chen et al., 2019)
- experiments/risk_slim_3.py: run experiment for LCPA (Ustun & Rudin, 2019)
- experiments/calc_lin_separation.py: estimate the linear separateness of each dataset
- experiments/calc_separation.py: estimate the $r$-separateness of each dataset
- notebooks/case_study.ipynb: generate Table 1
- notebooks/separation.ipynb: generate Table 2
- notebooks/risk_score_3.ipynb: generate Tables 3 and 4
- notebooks/tradeoff.ipynb: generate images in Figures 1 and 4
- notebooks/plot_bbm.ipynb: generate images in Figure 5
usage: main.py [-h] [--no-hooks] --experiment
{lin_sep_bbm_rob_3,risk_slim_3,dt_interpret_rob_3,xgboostrobdt_interpret_rob,calc_lin_separation,calc_separation}
--dataset DATASET --preprocessor PREPROCESSOR --random_seed
RANDOM_SEED --rsep RSEP
Datasets: {risk_ionosphere, risk_diabetes, risk_breastcancer, risk_adult, risk_mushroom, risk_mammo, risk_spambase, risk_bank, risk_careval, risk_compasbin, risk_ficobin, risk_bank_2, risk_heart}
The result of each example is written to a joblib pickle file named temp.pkl.
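As a minimal sketch of inspecting that file: `joblib.load` is the standard joblib call for reading such pickles. The helper names `load_result` and `describe` are hypothetical, and the layout of the stored object is not documented here, so the code only reports the object's type and, if it happens to be a dictionary, its keys.

```python
def load_result(path="temp.pkl"):
    """Load a result file written by main.py (requires joblib)."""
    # Imported lazily so that describe() can be used without joblib installed.
    import joblib

    return joblib.load(path)


def describe(result):
    """Return a short description of a loaded result object.

    The layout of the result is an assumption: if it is a dict, list its
    keys; otherwise report only the type name.
    """
    if isinstance(result, dict):
        keys = ", ".join(sorted(str(k) for k in result))
        return f"dict with keys: {keys}"
    return type(result).__name__


# Example (run after one of the commands below has produced temp.pkl):
# print(describe(load_result("temp.pkl")))
```

Only load pickle files you produced yourself, since unpickling can execute arbitrary code.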
Run BBM-RS with --rsep 0.05 on the bank dataset.
python main.py --no-hooks --experiment lin_sep_bbm_rob_3 \
--dataset risk_bank --preprocessor rminmax \
--rsep 0.05 \
--random_seed 0
Run RobDT with a robust radius of $0.1$ on the mammo dataset.
python main.py --no-hooks --experiment xgboostrobdt_interpret_rob \
--dataset risk_mammo --preprocessor rminmax \
--rsep 0.1 \
--random_seed 0
Run LCPA on the mammo dataset.
python main.py --no-hooks --experiment risk_slim_3 \
--dataset risk_mammo --preprocessor rminmax \
--random_seed 0
Run DT on the heart dataset.
python main.py --no-hooks --experiment dt_interpret_rob_3 \
--dataset risk_heart --preprocessor rminmax \
--random_seed 0
Calculate the r-separateness of the heart dataset.
python main.py --no-hooks --experiment calc_separation \
--dataset risk_heart --preprocessor rminmax \
--random_seed 0
Calculate the linear separateness of the heart dataset.
python main.py --no-hooks --experiment calc_lin_separation \
--dataset risk_heart --preprocessor rminmax \
--random_seed 0
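To run one experiment over every dataset, the commands above can be generated programmatically. The following is a sketch under stated assumptions, not part of the repository: `build_command` and `sweep` are hypothetical helpers, the experiment and dataset names are copied from the usage message and dataset list above, and `--rsep` is passed only when a value is given, mirroring the examples that omit it.

```python
import shlex

# Names copied from the usage message and the dataset list in this README.
EXPERIMENTS = {
    "lin_sep_bbm_rob_3",
    "risk_slim_3",
    "dt_interpret_rob_3",
    "xgboostrobdt_interpret_rob",
    "calc_lin_separation",
    "calc_separation",
}
DATASETS = [
    "risk_ionosphere", "risk_diabetes", "risk_breastcancer", "risk_adult",
    "risk_mushroom", "risk_mammo", "risk_spambase", "risk_bank",
    "risk_careval", "risk_compasbin", "risk_ficobin", "risk_bank_2",
    "risk_heart",
]


def build_command(experiment, dataset, random_seed=0, rsep=None,
                  preprocessor="rminmax"):
    """Build the argument list for one main.py invocation."""
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment}")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    cmd = [
        "python", "main.py", "--no-hooks",
        "--experiment", experiment,
        "--dataset", dataset,
        "--preprocessor", preprocessor,
    ]
    # Assumption: --rsep is only needed by some experiments, as in the examples.
    if rsep is not None:
        cmd += ["--rsep", str(rsep)]
    cmd += ["--random_seed", str(random_seed)]
    return cmd


def sweep(experiment, random_seed=0, rsep=None):
    """Return one command per dataset for the given experiment."""
    return [build_command(experiment, d, random_seed, rsep) for d in DATASETS]


# Print the commands; pass each list to subprocess.run(cmd, check=True) to execute.
for cmd in sweep("dt_interpret_rob_3"):
    print(shlex.join(cmd))
```

Because each run is described as writing temp.pkl, consecutive runs presumably overwrite it, so you may want to rename the file between runs.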