Transformer Dissection Experiment

Ablation experiments that dissect the Transformer.

Boundary-conditioned character task (sequence length 20): position 0 holds the rule digit k; a single [B] boundary token appears at a position drawn uniformly from [2, T-3]; the remaining tokens are random letters and digits. The target output copies the input up to and including [B], then transforms the rest: digits add k mod 10, letters flip case. Loss and metrics are computed over all non-[PAD] tokens.
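
For concreteness, one training pair could be constructed roughly as follows. This is a minimal sketch matching the description above; the function name and token handling are illustrative assumptions, not the repo's actual data code.

    import random
    import string

    T = 20  # sequence length

    def make_sample():
        """Hypothetical generator for one (input, target) pair."""
        k = random.randint(0, 9)                 # rule digit at position 0
        b_pos = random.randint(2, T - 3)         # [B] position, uniform in [2, T-3]
        pool = string.ascii_letters + string.digits
        toks = [str(k)] + [random.choice(pool) for _ in range(T - 1)]
        toks[b_pos] = "[B]"

        def transform(t):
            if t.isdigit():
                return str((int(t) + k) % 10)    # digits add k mod 10
            return t.swapcase()                  # letters flip case

        # Copy through [B] inclusive, transform everything after it.
        target = toks[: b_pos + 1] + [transform(t) for t in toks[b_pos + 1:]]
        return toks, target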

Quickstart

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python train.py --config config/default.yaml

You can override defaults, e.g. fewer steps on CPU for a smoke run:

python train.py --max_steps 200 --device cpu

Training logs per-step loss/accuracy and per-epoch train/val metrics (overall, identity-region, transform-region, and digit-vs-letter splits). After training, attention heatmaps and MLP logit-shift heatmaps for one validation sample are saved to /tmp/{experiment_name}_attn.png and /tmp/{experiment_name}_mlp_logits.png.
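
For reference, an attention heatmap like the one saved above could be produced along these lines. This sketch assumes a PyTorch model whose attention module returns weights the way torch.nn.MultiheadAttention does with need_weights=True; the model.attn attribute is hypothetical, not the repo's actual layout.

    import matplotlib.pyplot as plt
    import torch

    def save_attention_heatmap(model, sample, path):
        """Illustrative: capture one layer's attention weights and plot them."""
        captured = {}

        def hook(module, inputs, output):
            # Assumes the module returns (values, attention_weights).
            captured["attn"] = output[1].detach()

        handle = model.attn.register_forward_hook(hook)  # model.attn is hypothetical
        with torch.no_grad():
            model(sample)
        handle.remove()

        weights = captured["attn"].squeeze(0).cpu().numpy()  # (T, T), head-averaged
        plt.imshow(weights, cmap="viridis")
        plt.xlabel("key position")
        plt.ylabel("query position")
        plt.colorbar()
        plt.savefig(path)
        plt.close()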

Ablations

Swap configs for ablations (a sketch of how the attention-zeroed variant might work follows the list):

  • No positional encoding: python train.py --config config/ablation_no_pos.yaml
  • Attention zeroed: python train.py --config config/ablation_no_attention.yaml
  • No MLP: python train.py --config config/ablation_no_mlp.yaml
  • No residuals: python train.py --config config/ablation_no_residual.yaml
  • Frozen embeddings: python train.py --config config/ablation_frozen_embeddings.yaml
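
One common way to implement the attention-zeroed ablation is to replace the attention output with zeros, so only the residual and MLP paths carry information. The wrapper below is an assumption about the mechanism, not the repo's code.

    import torch
    import torch.nn as nn

    class MaybeAttention(nn.Module):
        """Hypothetical wrapper: zeroes the attention path when disabled."""
        def __init__(self, d_model, n_heads, enabled=True):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.enabled = enabled

        def forward(self, x):
            if not self.enabled:
                return torch.zeros_like(x)  # ablation: attention contributes nothing
            out, _ = self.attn(x, x, x)
            return out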

Config

Defaults live in config/default.yaml; the CLI flags --config, --max_steps, and --device override them for quick runs (a sketch of the override mechanics follows). After training, a few synthetic samples are printed for sanity checks.
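
The override mechanics presumably look something like this; the flag handling in train.py may differ, so treat this as a sketch of the pattern, not the actual implementation.

    import argparse
    import yaml

    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="config/default.yaml")
    parser.add_argument("--max_steps", type=int, default=None)
    parser.add_argument("--device", default=None)
    args = parser.parse_args()

    with open(args.config) as f:
        cfg = yaml.safe_load(f)

    # CLI flags, when given, win over the YAML defaults.
    for key in ("max_steps", "device"):
        value = getattr(args, key)
        if value is not None:
            cfg[key] = value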
