This repository contains the code to reproduce the results and figures for:
Counting in Small Transformers: The Delicate Interplay between Attention and Feed-Forward Layers, by Freya Behrens, Luca Biggio, Lenka Zdeborová
The results are stored externally in publicly accessible wandb projects:
- feeds/phase_diagram_T32
- feeds/phase_diagram_T32_L15
- feeds/random_phase_diagram_T32
- feeds/phase_diagram_T64
The learning process is documented together with the model weights logged along the training trajectory. The notebooks that create the figures rely on downloading these results; they are named after the corresponding figure number in the paper, which should help you identify the results you are interested in.
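As a minimal sketch (assuming the projects above are readable with the public wandb API; the exact file layout and metric names are defined by the experiment scripts), runs and their logged files can be pulled like this:

```python
import wandb

# Sketch only: pull runs, configs and logged files from one public project.
api = wandb.Api()
runs = api.runs("feeds/phase_diagram_T32")   # entity/project from the list above

for run in runs:
    print(run.name, dict(run.config))        # hyperparameters of this run
    metrics = run.history()                  # metrics logged during training
    for f in run.files():                    # e.g. checkpointed weights
        f.download(root=f"downloads/{run.name}", replace=True)
```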
The results were produced by scripts that can be generated from [experiments]-scripts.ipynb. These scripts should be configured with your personal wandb id.
Running all experiments takes approximately 1 week on a single GPU (NVIDIA RTX A5000).
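A hedged sketch of where the personal wandb id enters (the entity, project name and config keys below are illustrative, not necessarily the exact ones used by the generated scripts):

```python
import wandb

WANDB_ENTITY = "your-wandb-id"          # replace with your own wandb entity
WANDB_PROJECT = "phase_diagram_T32"     # project the script should log to

run = wandb.init(entity=WANDB_ENTITY, project=WANDB_PROJECT,
                 config={"T": 32, "L": 15, "seed": 0})   # illustrative values
# ... training loop that calls wandb.log({...}) and uploads the weights ...
run.finish()
```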
In [theory]-explicit-constructions-d=T.ipynb, [theory]-explicit-constructions-d<T-mutual-coherence.ipynb and [theory]-explicit-constructions-d<T-softmax.ipynb and their dependencies we implemented the algorithms from our explicit constructions for the regimes d = T, d < T via mutual coherence, and d < T with softmax attention, respectively.
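For orientation only, here is a toy sketch (not the constructions from the paper, which live in the notebooks above) of the basic idea that, with d = T and orthogonal (one-hot) token embeddings, uniform attention exposes the token counts:

```python
import torch

# Toy illustration only: with d = T and one-hot token embeddings, uniform
# attention over the sequence returns the empirical token frequencies, so the
# count of each token can be read off after rescaling by the sequence length.
T = 32                                    # alphabet size = embedding dim d
L = 15                                    # sequence length
seq = torch.randint(0, T, (L,))           # a random sequence of L tokens
E = torch.eye(T)                          # one-hot token embeddings, d = T
X = E[seq]                                # (L, d) embedded sequence

attn = torch.full((L, L), 1.0 / L)        # uniform attention weights
mixed = attn @ X                          # each row = mean of all embeddings

counts = (mixed[0] * L).round().long()    # rescale frequencies to counts
assert torch.equal(counts, torch.bincount(seq, minlength=T))
```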
The code requires the following Python packages (all installable via pip):
- wandb
- torch
- numpy
- matplotlib
- seaborn
- tqdm
If you have any questions, feel free to contact us through the email addresses listed in the paper, or simply create an issue on this repo.