Zhengxuan Wu*, Karel D'Oosterlinck*, Atticus Geiger*, Amir Zur, Christopher Potts
This codebase contains implementations of our preprint Causal Proxy Models For Concept-Based Model Explanations. In this paper, we introduce two variants of CPM:
- CPMIN: Input-based CPM uses an auxiliary token to represent the intervention and is trained in a supervised way to predict the counterfactual output. This model is built on an input-level intervention.
- CPMHI: Hidden-state CPM uses Interchange Intervention Training (IIT) to localize concept information within its representations and swaps hidden states to represent the intervention (see the sketch after this list). It is trained in a supervised way to predict the counterfactual output. This model is built on a hidden-state intervention.
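To make the hidden-state intervention concrete, below is a minimal sketch of an interchange intervention using an off-the-shelf `bert-base-uncased` encoder: a slice of one layer's hidden state is computed on a source input and swapped into the base input's forward pass. The layer index, slice size, use of the [CLS] position, and the hook-based implementation are illustrative assumptions for this sketch, not the exact mechanics of the training code in `Proxy_training.py`.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative settings for this sketch (assumptions, not the repo's exact values).
LAYER = 10   # encoder layer whose output is intervened on
H_DIM = 192  # size of the hidden-state slice that gets swapped

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

base = tokenizer("The food was great, but the service was slow.", return_tensors="pt")
source = tokenizer("The food was great, and the service was excellent.", return_tensors="pt")

with torch.no_grad():
    # 1) Run the source input and cache a slice of its [CLS] hidden state at LAYER.
    #    hidden_states[0] is the embedding output, so layer LAYER's output is index LAYER + 1.
    source_hidden = model(**source, output_hidden_states=True).hidden_states[LAYER + 1]
    source_slice = source_hidden[:, 0, :H_DIM]

    # 2) Re-run the base input, overwriting the same slice at the same layer via a forward hook.
    def interchange(module, inputs, output):
        hidden = output[0].clone()
        hidden[:, 0, :H_DIM] = source_slice
        return (hidden,) + output[1:]

    handle = model.encoder.layer[LAYER].register_forward_hook(interchange)
    counterfactual_cls = model(**base).last_hidden_state[:, 0]  # [CLS] after the swap
    handle.remove()
```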
This codebase contains implementations and experiments for both CPMIN and CPMHI. If you experience any issues or have suggestions, please contact us through the issues page or by email at wuzhengx@cs.stanford.edu or karel.doosterlinck@ugent.be.
If you use this repository, please consider citing our relevant papers:
```bibtex
@article{wu-etal-2021-cpm,
  title={Causal Proxy Models For Concept-Based Model Explanations},
  author={Wu, Zhengxuan and D'Oosterlinck, Karel and Geiger, Atticus and Zur, Amir and Potts, Christopher},
  year={2022},
  eprint={2209.14279},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@article{geiger-etal-2021-iit,
  title={Inducing Causal Structure for Interpretable Neural Networks},
  author={Geiger, Atticus and Wu, Zhengxuan and Lu, Hanson and Rozner, Josh and Kreiss, Elisa and Icard, Thomas and Goodman, Noah D. and Potts, Christopher},
  year={2021},
  eprint={2112.00826},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```
- Python 3.6 or 3.7 is supported.
- PyTorch version: 1.11.0
- Transformers version: 4.21.1
- Datasets version: 2.3.2
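As a quick sanity check (not part of the codebase), you can verify that the installed versions match the ones listed above:

```python
import sys

import datasets
import torch
import transformers

# Expected versions, per the list above.
print("python      :", sys.version.split()[0])    # 3.6.x or 3.7.x
print("torch       :", torch.__version__)         # 1.11.0
print("transformers:", transformers.__version__)  # 4.21.1
print("datasets    :", datasets.__version__)      # 2.3.2
```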
First, clone the repository. Then run the following command to initialize the submodules:

```bash
git submodule init; git submodule update
```
These models are available from the CEBaB website. Here is an example of how to load these models:
```python
# Note: BertForNonlinearSequenceClassification is the customized classification head used for
# the CEBaB black-box models; it is not part of stock transformers, so this import assumes the
# customized modeling code from this codebase is available on your path.
from transformers import AutoTokenizer, BertForNonlinearSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("CEBaB/bert-base-uncased.CEBaB.sa.5-class.exclusive.seed_42")
model = BertForNonlinearSequenceClassification.from_pretrained("CEBaB/bert-base-uncased.CEBaB.sa.5-class.exclusive.seed_42")
```
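Once loaded, the model can be queried like a standard transformers sequence classifier. The snippet below continues from the loading example above and is only a usage sketch; the example review is made up, and the `.logits` attribute is assumed from the usual transformers classification API:

```python
import torch

# Continuing from the loading snippet above (`tokenizer`, `model`).
review = "The food was delicious, but the service was painfully slow."
inputs = tokenizer(review, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # expected shape: [1, 5] for the 5-class task

print("predicted 5-way sentiment class:", logits.argmax(dim=-1).item())
```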
We aim to make all of our CPMs public. Currently, they can be found in our Hugging Face repo.
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("CPMs/cpm.hi.bert-base-uncased.layer.10.size.192")
model = AutoModelForSequenceClassification.from_pretrained("CPMs/cpm.hi.bert-base-uncased.layer.10.size.192")
```
Note that we also provide helpers for loading these models into our explainer module; please refer to the notebooks under the `experiments` folder.
To train CPMIN, we follow the standard fine-tuning setup, since the intervention is on the inputs. First, go to `CEBaB-inclusive/eval_pipeline/`, then run the following command to train:
```bash
python main.py \
--model_architecture bert-base-uncased \
--train_setting inclusive \
--model_output_dir model_output \
--output_dir output \
--flush_cache true \
--task_name opentable_5_way \
--batch_size 128 \
--k_array 19684
```
To train with different variants of approximate counterfactuals, change the flag to `--train_setting approximate` for metadata-sampled counterfactuals. Note that in this setting, you can ignore the `--k_array` field. Change `--model_architecture` to train different model architectures.
To train CPMHI, we adapt Interchange Intervention Training (IIT). You can use the following command; please refer to our paper for the configurations we used.
```bash
python Proxy_training.py \
--model_name_or_path ./saved_models/bert-base-uncased.opentable.CEBaB.sa.5-class.exclusive.seed_42/ \
--task_name CEBaB \
--dataset_name CEBaB/CEBaB \
--do_train \
--per_device_train_batch_size 256 \
--per_device_eval_batch_size 256 \
--learning_rate 8e-05 \
--output_dir ./proxy_training_results/your_first_try/ \
--cache_dir ./train_cache/ \
--seed 42 \
--report_to none \
--logging_steps 1 \
--alpha 1.0 \
--beta 1.0 \
--gemma 3.0 \
--overwrite_output_dir \
--intervention_h_dim 192 \
--counterfactual_type true \
--k 19684 \
--interchange_hidden_layer 10 \
--save_steps 10 \
--early_stopping_patience 20
```
To train with different variants of approximate counterfactuals, change the flag to `--counterfactual_type approximate` for metadata-sampled counterfactuals. Note that in this setting, you can ignore the `--k` field. Change `--model_name_or_path` to train with different model architectures; these black-box models can be downloaded from the CEBaB website.
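For intuition about the loss-weighting flags above (`--alpha`, `--beta`, `--gemma`), the toy sketch below shows the general shape of a weighted objective that matches the black box's factual output and its counterfactual output under an interchange intervention. Which term each flag actually weights is defined in `Proxy_training.py`; the pairing and the soft cross-entropy form used here are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, num_classes = 4, 5  # toy batch with 5 sentiment classes

# Stand-ins for model outputs (random tensors, for illustration only).
proxy_factual_logits = torch.randn(batch, num_classes)        # CPM on the original input
proxy_interchanged_logits = torch.randn(batch, num_classes)   # CPM after the hidden-state swap
blackbox_factual_probs = torch.randn(batch, num_classes).softmax(-1)        # black-box factual output
blackbox_counterfactual_probs = torch.randn(batch, num_classes).softmax(-1) # black-box counterfactual output

def soft_cross_entropy(logits, target_probs):
    # Cross-entropy against a soft target distribution.
    return -(target_probs * F.log_softmax(logits, dim=-1)).sum(-1).mean()

# Hypothetical weighting of the two matching terms; the real mapping of
# --alpha / --beta / --gemma to loss terms lives in Proxy_training.py.
alpha, beta = 1.0, 1.0
loss = (alpha * soft_cross_entropy(proxy_factual_logits, blackbox_factual_probs)
        + beta * soft_cross_entropy(proxy_interchanged_logits, blackbox_counterfactual_probs))
print("toy training loss:", float(loss))
```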