Code and data for the paper "The Effect of Quantization on Memorization in Large Language Models".
We study how post-training quantization affects memorization in LLMs. Using Pythia 12B, we compare FP16 against LLM.int8(), NF4, and FP4 quantization via bitsandbytes, under both duplicated and deduplicated training regimes.
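For orientation, the quantized variants can be loaded with Hugging Face `transformers` and `bitsandbytes` roughly as below. This is an illustrative sketch, not the repository's own loading code (see `quantization_scripts/` for the actual setup); the model IDs are the public Pythia checkpoints, and actually instantiating the model requires a CUDA GPU and a large download, so that call is left commented out.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL = "EleutherAI/pythia-12b"  # or "EleutherAI/pythia-12b-deduped"

# LLM.int8(): 8-bit weight quantization
int8_config = BitsAndBytesConfig(load_in_8bit=True)

# NF4 / FP4: 4-bit quantization; switch bnb_4bit_quant_type
# between "nf4" and "fp4" to select the data type
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# model = AutoModelForCausalLM.from_pretrained(
#     MODEL, quantization_config=nf4_config, device_map="auto"
# )
```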
```
llm-quantization-memorization/
├── data/
│   ├── mem_eval_results/      # Memorization evaluation results (JSONL)
│   │   ├── deduped/
│   │   └── duped/
│   └── perf_eval_results/     # Downstream benchmark results (JSON/CSV)
│       ├── Open_LLM_Leaderboard_v1/
│       └── Pythia_Hugging_Face_eval/
├── download_scripts/          # Download models, datasets, and lm-evaluation-harness
├── main_scripts/              # Top-level pipeline scripts (setup, eval, plotting)
├── plots/
│   ├── mem_eval/              # Generated memorization plots (PDF)
│   ├── svg/                   # Generated plots (SVG)
│   └── plotting_scripts/      # Scripts for generating figures
├── quantization_scripts/      # Quantize Pythia 12B (8-bit, NF4, FP4)
├── tables/                    # Scripts for generating result tables
├── test_memorization/         # Fixed-prefix and variable-context memorization tests
├── test_performance/          # Downstream evaluation via lm-evaluation-harness
├── test_scripts/              # Setup verification scripts
└── requirements.txt
```
```bash
bash main_scripts/run_everything.sh
```

This runs setup, model download, quantization, all evaluations, and plot generation end-to-end.
| Step | Command |
|---|---|
| Install dependencies & download data | `bash main_scripts/run_all_installs.sh` |
| Run all evaluations (memorization + downstream) | `bash main_scripts/run_all_evals.sh` |
| Generate plots and tables | `bash main_scripts/run_all_plotting.sh` |
All raw results are included in `data/`:

- `mem_eval_results/` — Memorization evaluation results (JSONL)
- `perf_eval_results/` — Downstream benchmark results (JSON/CSV) from the LM Evaluation Harness
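The JSONL result files can be inspected with a few lines of standard-library Python. A minimal sketch — the example path is hypothetical, and the record schema depends on the evaluation scripts, so no field names are assumed here:

```python
import json

def load_jsonl(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

# Example (hypothetical filename -- substitute an actual results file):
# results = load_jsonl("data/mem_eval_results/deduped/results.jsonl")
```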