This repository contains the code to reproduce the results and figures for:
Counting in Small Transformers: The Delicate Interplay between Attention and Feed-Forward Layers, by Freya Behrens, Luca Biggio, Lenka Zdeborová
The results are stored externally in publicly accessible wandb projects:
- feeds/phase_diagram_T32
- feeds/phase_diagram_T32_L15
- feeds/random_phase_diagram_T32
- feeds/phase_diagram_T64
The learning process is documented together with the model weights logged along the training trajectory. The notebooks that create the figures rely on downloading these results; they are named after the corresponding figure number in the paper, which should help you identify the results you are interested in.
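As a minimal sketch (assuming the projects above are readable with the public wandb API; the exact file layout and metric names are defined by the experiment scripts), runs and their logged files can be pulled like this:

```python
import wandb

# Sketch only: pull runs, configs and logged files from one public project.
api = wandb.Api()
runs = api.runs("feeds/phase_diagram_T32")   # entity/project from the list above

for run in runs:
    print(run.name, dict(run.config))        # hyperparameters of this run
    metrics = run.history()                  # metrics logged during training
    for f in run.files():                    # e.g. checkpointed weights
        f.download(root=f"downloads/{run.name}", replace=True)
```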
The results were produced by scripts that can be generated from [experiments]-scripts.ipynb. These scripts should be configured with your personal wandb id.
Running all experiments takes approximately 1 week on a single GPU (NVIDIA RTX A5000).
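A hedged sketch of where the personal wandb id enters (the entity, project name and config keys below are illustrative, not necessarily the exact ones used by the generated scripts):

```python
import wandb

WANDB_ENTITY = "your-wandb-id"          # replace with your own wandb entity
WANDB_PROJECT = "phase_diagram_T32"     # project the script should log to

run = wandb.init(entity=WANDB_ENTITY, project=WANDB_PROJECT,
                 config={"T": 32, "L": 15, "seed": 0})   # illustrative values
# ... training loop that calls wandb.log({...}) and uploads the weights ...
run.finish()
```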
In [theory]-explicit-constructions-d=T.ipynb, [theory]-explicit-constructions-d<T-mutual-coherence.ipynb and [theory]-explicit-constructions-d<T-softmax.ipynb and their dependencies we implemented the algorithms from our explicit constructions for the regimes d = T, d < T via mutual coherence, and d < T with softmax attention, respectively.
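For orientation only, here is a toy sketch (not the constructions from the paper, which live in the notebooks above) of the basic idea that, with d = T and orthogonal (one-hot) token embeddings, uniform attention exposes the token counts:

```python
import torch

# Toy illustration only: with d = T and one-hot token embeddings, uniform
# attention over the sequence returns the empirical token frequencies, so the
# count of each token can be read off after rescaling by the sequence length.
T = 32                                    # alphabet size = embedding dim d
L = 15                                    # sequence length
seq = torch.randint(0, T, (L,))           # a random sequence of L tokens
E = torch.eye(T)                          # one-hot token embeddings, d = T
X = E[seq]                                # (L, d) embedded sequence

attn = torch.full((L, L), 1.0 / L)        # uniform attention weights
mixed = attn @ X                          # each row = mean of all embeddings

counts = (mixed[0] * L).round().long()    # rescale frequencies to counts
assert torch.equal(counts, torch.bincount(seq, minlength=T))
```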
The code requires the following Python packages (all installable via pip):
- wandb
- torch
- numpy
- matplotlib
- seaborn
- tqdm
If you have any questions, feel free to contact us through the email addresses listed in the paper, or simply create an issue on this repo.