Mechanistically Interpreting Arithmetic in LLMs

This repository contains the code for the EMNLP 2023 paper "A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis".

Setup

The requirements are listed in requirements.txt. To install them, run:

pip install -r requirements.txt

Parameters are configured with Hydra. The configuration files are located in conf/, and the default configuration is conf/config.yaml.

Parameters

intervention_type: defines how the two input prompts differ.
- 1 -> the two prompts differ in the values of the operands. For example, 2 + 3 = and 4 + 5 =.
- 2 -> the two prompts differ in the values of the operands, but the result is the same. For example, 2 + 3 = and 4 + 1 =.
- 3 -> the two prompts differ in the operation. For example, 3 + 1 = and 3 - 1 =.
- 11 -> synthetic number retrieval task.
- 20 -> factual knowledge queries. For this task, set the lama_path parameter to the path of the locally downloaded LAMA dataset.
intervention_loc: defines the type of components on which the interventions take place. Use layer for MLPs and attention_layer_output for the attention modules.
model: EleutherAI/gpt-j-6B, EleutherAI/pythia-2.8b-deduped, or goat. For LLaMA, set this parameter to the path of the locally downloaded model weights.
model_ckpt: path to a fine-tuned version of one of the models above. Can be null.
n_operands: number of operands in the input prompts.
examples_per_template: number of prompt pairs generated per template.
n_shots: number of exemplars included in the prompts.
max_n: maximum value that the operands and the results can attain. Experiments with LLaMA and Goat require max_n=9. In that case, however, the constraint applies only to the value of the result, so 164 - 159 = is a valid prompt.
representation: arabic or words. Defines the representation used for the numbers in the input prompts.
all_tokens: if true, carry out the interventions on the components at each position of the input sequence. If false, carry out the interventions only on the components at the last position of the input sequence.
output_dir: path to the directory where the results will be saved.
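
To make the first three intervention_type values concrete, the following is a minimal sketch of how such prompt pairs could be built. It is not the repository's generation code: make_prompt_pair and sample_addends are hypothetical helpers, and the sketch assumes two operands, arabic numerals, zero-shot prompts, and that max_n bounds both operands and results.

```python
import random


def sample_addends(rng, max_n):
    """Sample (a, b) with a + b <= max_n, so operands and result stay in range."""
    a = rng.randint(0, max_n)
    b = rng.randint(0, max_n - a)
    return a, b


def make_prompt_pair(intervention_type, max_n=9, rng=None):
    """Return two prompts that differ as described for intervention types 1-3.

    Hypothetical helper: two operands, addition and subtraction only.
    Assumes max_n >= 2.
    """
    if rng is None:
        rng = random.Random(0)
    if intervention_type == 1:
        # Different operands; the results are allowed to differ as well.
        a, b = sample_addends(rng, max_n)
        c, d = sample_addends(rng, max_n)
        while (c, d) == (a, b):
            c, d = sample_addends(rng, max_n)
        return f"{a} + {b} =", f"{c} + {d} ="
    if intervention_type == 2:
        # Different operands, same result: split one result in two ways.
        result = rng.randint(2, max_n)
        a = rng.randint(0, result)
        c = rng.choice([x for x in range(result + 1) if x != a])
        return f"{a} + {result - a} =", f"{c} + {result - c} ="
    if intervention_type == 3:
        # Same operands, different operation; keep a + b <= max_n and a - b >= 0.
        a = rng.randint(1, max_n - 1)
        b = rng.randint(1, min(a, max_n - a))
        return f"{a} + {b} =", f"{a} - {b} ="
    raise ValueError(f"unsupported intervention_type: {intervention_type}")


if __name__ == "__main__":
    rng = random.Random(0)
    for kind in (1, 2, 3):
        print(kind, make_prompt_pair(kind, max_n=9, rng=rng))
```

Types 11 and 20 are left out because they depend on task data that this section does not describe.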

Run

To run the code with the default configuration, run:

python math_cma.py
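
Hydra also accepts key=value overrides on the command line, so individual parameters can be changed without editing conf/config.yaml. The sketch below builds such a command from a Python dictionary. It assumes the parameters listed above are top-level keys in the default configuration. build_command and hydra_value are hypothetical helpers, and the specific values are only illustrative.

```python
import sys


def hydra_value(value):
    """Render a Python value in the form Hydra expects on the command line."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_command(overrides, script="math_cma.py"):
    """Return an argv list: the interpreter, the script, then key=value overrides."""
    return [sys.executable, script] + [
        f"{key}={hydra_value(value)}" for key, value in overrides.items()
    ]


if __name__ == "__main__":
    cmd = build_command(
        {
            "intervention_type": 1,
            "intervention_loc": "layer",
            "model": "EleutherAI/pythia-2.8b-deduped",
            "model_ckpt": None,
            "all_tokens": False,
        }
    )
    print(" ".join(cmd))
    # To launch the run from Python: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues. When typing the overrides directly in a shell, the same key=value pairs follow python math_cma.py.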

Results

The results are saved as .feather files in the directory specified by the output_dir parameter. The notebooks/ directory contains notebooks that can be used to visualize the results.
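
For a quick look at the results outside the notebooks, the .feather files can be read with pandas. The sketch below is only an illustration: find_result_files and load_results are hypothetical helpers, the search is recursive because the layout under output_dir is not documented here, and pandas.read_feather needs pyarrow installed.

```python
from pathlib import Path


def find_result_files(output_dir):
    """Return all .feather files under output_dir (searched recursively), sorted."""
    return sorted(Path(output_dir).rglob("*.feather"))


def load_results(output_dir):
    """Load every result file into a dict keyed by its path relative to output_dir."""
    import pandas as pd  # imported lazily; read_feather also requires pyarrow

    root = Path(output_dir)
    return {
        path.relative_to(root).as_posix(): pd.read_feather(path)
        for path in find_result_files(root)
    }
```

The column names depend on the experiment, so inspecting df.columns on a loaded DataFrame is a reasonable first step.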

Citation

@inproceedings{stolfo-etal-2023-mechanistic,
    title = "A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis",
    author = "Stolfo, Alessandro  and
      Belinkov, Yonatan  and
      Sachan, Mrinmaya",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    year = "2023",
    url = "https://aclanthology.org/2023.emnlp-main.435/",
}
