Automated Attention Pattern Discovery at Scale in Large Language Models

This is the reproduction package for the paper "Automated Attention Pattern Discovery at Scale in Large Language Models".

The repository is structured into three main directories:

Clustering
Dataset
Model

The Clustering directory comprises all the code for clustering attention heads and visualizing these clusters. This code corresponds to Section 5 of the paper.

The Dataset directory comprises all the code for creating the dataset. This includes scraping GitHub repositories, extracting code files, removing autogenerated files, and removing exact- and near-duplicates between our custom dataset and Java-Stack v2. This code corresponds to Section 3 of the paper.
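As an illustration of the deduplication step only, the following is a minimal sketch of filtering exact and near duplicates against a reference corpus. It is not the code in the Dataset directory: the function names, the whitespace normalization, the 3-token shingles, and the Jaccard threshold are all assumptions chosen for the example. Refer to the Dataset README for the actual procedure.

```python
import hashlib
import re


def normalize(source: str) -> str:
    """Collapse whitespace so formatting-only changes do not affect matching."""
    return " ".join(source.split())


def exact_key(source: str) -> str:
    """Hash of the normalized text; equal keys are treated as exact duplicates."""
    return hashlib.sha256(normalize(source).encode("utf-8")).hexdigest()


def shingles(source: str, k: int = 3) -> set:
    """Set of k-token shingles over a simple word/symbol tokenization (assumed)."""
    tokens = re.findall(r"\w+|[^\w\s]", source)
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_duplicates(candidates, reference, threshold=0.7):
    """Keep candidates that are neither exact nor near duplicates of reference files.

    `threshold` is a hypothetical value, not one taken from the paper.
    """
    ref_keys = {exact_key(r) for r in reference}
    ref_shingles = [shingles(r) for r in reference]
    kept = []
    for candidate in candidates:
        if exact_key(candidate) in ref_keys:
            continue  # exact duplicate after whitespace normalization
        cand_shingles = shingles(candidate)
        if any(jaccard(cand_shingles, rs) >= threshold for rs in ref_shingles):
            continue  # near duplicate of some reference file
        kept.append(candidate)
    return kept
```

This sketch compares every candidate against every reference file, which is quadratic in the corpus size. At the scale of a real code corpus, an approximate method such as MinHash with locality-sensitive hashing is the usual choice.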

The Model directory comprises all the code for building the AP-MAE model. This includes the model architecture and the training setup. This code corresponds to Section 4 of the paper.

For further instructions on running the code, please refer to the README files in each directory.

We also provide additional visualizations, similar to Figure 10 of the paper, in the Visualization directory. For each SC2 size, there is one plot per task, split between correct and incorrect.

Links

We release the StackLessV2 Java dataset here.

We release the AP-MAE model collection here.

Repository: AISE-TUDelft/AP-MAE