# Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models

Official repository for the paper. [[Paper (arXiv)](https://arxiv.org/abs/2505.16252)]
This work investigates whether parameter localization techniques can enhance knowledge unlearning in language models. Localized unlearning pinpoints the parameters that encode the unwanted knowledge and updates only that subset, aiming to erase the target knowledge while preserving the model's overall capabilities.
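For intuition, the core mechanic of localized unlearning can be sketched as a gradient step restricted by a binary parameter mask. This is a simplified NumPy illustration, not the repository's PyTorch implementation; the function and variable names here are hypothetical:

```python
import numpy as np

def masked_update(params, grads, mask, lr):
    """Take a gradient step only on parameters selected by a binary mask.

    params, grads, mask: dicts mapping parameter names to same-shaped arrays.
    Entries where mask == 0 are frozen; entries where mask == 1 are updated.
    """
    return {name: p - lr * grads[name] * mask[name] for name, p in params.items()}

# Toy layer: only the first and third entries are attributed to the target knowledge.
params = {"w": np.array([1.0, 2.0, 3.0, 4.0])}
grads = {"w": np.array([0.5, 0.5, 0.5, 0.5])}
mask = {"w": np.array([1.0, 0.0, 1.0, 0.0])}
updated = masked_update(params, grads, mask, lr=0.1)
# Entries with mask == 0 keep their original values.
```

The question this work examines is whether choosing that mask with a localization method (rather than at random) actually improves the removal-retention trade-off.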
However, we identify critical gaps that have largely been overlooked in this line of work:

- **Limited Evaluation:** Existing evaluations fail to capture the extent to which knowledge is stored in the parameters, and they ignore the trade-off between removal and retention.
- **Unverified Locality Assumption:** It remains untested whether localization success causally drives unlearning success.
Our main contributions are as follows:

- **Rigorous Evaluation Framework:** We introduce a comprehensive evaluation framework that thoroughly measures the removal-retention trade-off.
- **Empirical Analysis:** Applying this framework, we show that current localized unlearning methods perform no better than random updates.
- **Causal Analysis:** Controlled experiments reveal that localization success does not causally translate into unlearning success.
## Requirements

- Python 3.10
- CUDA 12.4+
- Conda package manager
## Installation

1. Clone the repository and set up the environment:

```bash
git clone https://github.com/HYU-NLP/loc-unlearn.git
cd loc-unlearn
conda create -n loc_unlearn python=3.10
conda activate loc_unlearn
```

2. Install dependencies:

```bash
conda install pytorch pytorch-cuda=12.4 -c pytorch -c nvidia
conda env update -f environment.yml
pip install flash-attn --no-build-isolation
```
## Mask Generation

Generate binary masks for parameter attribution using various localization methods:

```bash
CUDA_VISIBLE_DEVICES=$GPU_IDS python -m torch.distributed.run \
    --nproc_per_node=$NUM_GPUS \
    --master_port=$MASTER_PORT \
    generate_mask.py \
    --config-name get_mask_tofu \
    attribution=$ATTRIBUTION_METHOD
```

Parameters:

- `$GPU_IDS`: GPU device IDs (e.g., "0,1,2,3")
- `$NUM_GPUS`: Number of GPUs to use
- `$MASTER_PORT`: Master port for distributed training
- `$ATTRIBUTION_METHOD`: Attribution method (e.g., "memflex_down", "wagle_down", "activation"). See `mask_generator.py` for all available methods.
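For reference, the random baseline examined in the paper selects a fixed fraction of entries uniformly at random. Below is a minimal sketch of such a mask and its complement (NumPy for illustration only; the repository's actual mask format is defined in `mask_generator.py` and stored as PyTorch `.pt` files, and the `fraction`/`seed` naming here is an assumption based on the mask file names in the examples):

```python
import numpy as np

def random_mask(shapes, fraction=0.1, seed=42):
    """Build binary masks selecting roughly `fraction` of entries at random.

    shapes: dict mapping parameter names to array shapes.
    Returns a dict of 0/1 float arrays, one per parameter.
    """
    rng = np.random.default_rng(seed)
    return {name: (rng.random(shape) < fraction).astype(np.float32)
            for name, shape in shapes.items()}

masks = random_mask({"mlp.down_proj": (8, 8)}, fraction=0.1, seed=42)
# A complement mask (cf. the `random_complement` files in the examples) flips every entry.
complement = {name: 1.0 - m for name, m in masks.items()}
```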
## Unlearning

Apply the generated masks to perform localized unlearning:

```bash
CUDA_VISIBLE_DEVICES=$GPU_IDS python -m torch.distributed.run \
    --nproc_per_node=$NUM_GPUS \
    --master_port=$MASTER_PORT \
    unlearn.py \
    --config-name forget_tofu \
    model_family=$MODEL_FAMILY \
    model_path=$MODEL_PATH \
    forget_loss=$FORGET_LOSS \
    mask_path=$MASK_PATH \
    lr=$LR \
    hyper_param=$HP \
    save_checkpoint=True \
    num_epochs=5
```

Parameters:

- `$MODEL_FAMILY`: Model family (e.g., "llama3.1-8b-inst", "olmo2-7b-inst"). See `model_config.yaml` for all available models.
- `$MODEL_PATH`: Path to the fine-tuned model
- `$FORGET_LOSS`: Unlearning loss type (e.g., "learn", "wga", "dpo", "npo", "rmu_*", "rt_diff"). See `dataloader.py` for all available methods.
- `$MASK_PATH`: Path to the generated parameter mask
- `$LR`: Learning rate for unlearning
- `$HP`: Hyperparameter for the unlearning method
## Evaluation

To evaluate a single checkpoint (Forget Quality, Model Utility, Forget Strength, Retain Strength, etc.):

```bash
CUDA_VISIBLE_DEVICES=$GPU_IDS python -m torch.distributed.run \
    --nproc_per_node=$NUM_GPUS \
    --master_port=$MASTER_PORT \
    eval_tofu.py \
    --config-name eval_tofu \
    model_family=$MODEL_FAMILY \
    model_path=$MODEL_PATH
```

To measure AUES and MU95, use the provided script:

```bash
bash scripts/eval.sh
```

## Examples

Generate "Activation" mask:
```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    generate_mask.py \
    attribution=activation
```

Full Parameter Update (Learn Full Data):
```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    unlearn.py \
    model_family=llama3.1-8b-inst \
    forget_loss=learn \
    forget_split=full \
    lr=1e-5 \
    save_checkpoint=True \
    num_epochs=5
```

Local Parameter Update (Random Mask, Learn Forget Data):
```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    unlearn.py \
    model_family=llama3.1-8b-inst \
    forget_loss=learn_reg \
    forget_split=forget10 \
    model_path=llama3.1-8b-inst/TOFU/default/retain90/learn/lora_r0/masked_False/lr_1e-05_wd_0.01_ep_5_bs_8/checkpoint-565 \
    mask_path=mask/random_down/llama3.1-8b-inst/TOFU/default/forget10/random_down_mask_normal_0.1_s42.pt \
    lr=0.0002 \
    save_checkpoint=True \
    num_epochs=5
```

Local Parameter Update (Random Complement Mask, NPO Unlearning):
```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    unlearn.py \
    model_family=llama3.1-8b-inst \
    forget_loss=npo \
    forget_split=forget10 \
    model_path=llama3.1-8b-inst/TOFU/default/retain90/learn/lora_r0/masked_False/lr_1e-05_wd_0.01_ep_5_bs_8/checkpoint-565/forget10/learn_reg/lora_r0_lm_False/masked_mask/random_down/llama3.1-8b-inst/TOFU/default/forget10/random_down_mask_normal_0.1_s42.pt/lr_0.0002_wd_0.01_ep_5_bs_8/checkpoint-65 \
    mask_path=mask/random_down/llama3.1-8b-inst/TOFU/default/forget10/random_down_mask_random_complement_0.1_s42.pt \
    lr=1e-5 \
    save_checkpoint=True \
    num_epochs=5
```

Single-point Evaluation:
```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    eval_tofu.py \
    model_path=llama3.1-8b-inst/TOFU/default/retain90/learn/lora_r0/masked_False/lr_1e-05_wd_0.01_ep_5_bs_8/checkpoint-565
```

Comprehensive Evaluation (AUES/MU95):
1. Edit `scripts/eval.sh`:

```bash
# List of model directories to evaluate (unlearned models)
MODEL_DIR=(
    "/path/to/model_dir_1"
    "/path/to/model_dir_2"
    "/path/to/model_dir_3"
)

# Initial model directory
INIT_MODEL_DIR="/path/to/init_model_dir"
```

2. Run the evaluation:

```bash
bash scripts/eval.sh
```

Please refer to Appendix F (Hyperparameter Details) of the paper for the full, per-experiment configurations.
## Configuration

The pipeline uses YAML configuration files in the `config/` directory:

- `get_mask_tofu.yaml`: Mask generation configuration
- `forget_tofu.yaml`: Unlearning configuration
- `eval_tofu.yaml`: Evaluation configuration
- `model_config.yaml`: Model-specific settings
- `ds_config.json`: DeepSpeed configuration for distributed training
## Citation

```bibtex
@misc{lee2025doeslocalizationinformunlearning,
      title={Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models},
      author={Hwiyeong Lee and Uiji Hwang and Hyelim Lim and Taeuk Kim},
      year={2025},
      eprint={2505.16252},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.16252},
}
```

## Acknowledgments

We thank the authors of the following repositories for providing valuable code and implementations that served as references for this work: