# Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models

Official repository for the paper. [[Paper (arXiv)](https://arxiv.org/abs/2505.16252)]
This work investigates whether parameter localization techniques can enhance knowledge unlearning in language models. Localized unlearning pinpoints the parameters that encode the unwanted knowledge and updates only that subset, aiming to erase the target knowledge while preserving the model's overall capabilities.
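For intuition, the core mechanic of localized unlearning can be sketched as a gradient step restricted by a binary parameter mask. This is a simplified NumPy illustration, not the repository's PyTorch implementation; the function and variable names here are hypothetical:

```python
import numpy as np

def masked_update(params, grads, mask, lr):
    """Take a gradient step only on parameters selected by a binary mask.

    params, grads, mask: dicts mapping parameter names to same-shaped arrays.
    Entries where mask == 0 are frozen; entries where mask == 1 are updated.
    """
    return {name: p - lr * grads[name] * mask[name] for name, p in params.items()}

# Toy layer: only the first and third entries are attributed to the target knowledge.
params = {"w": np.array([1.0, 2.0, 3.0, 4.0])}
grads = {"w": np.array([0.5, 0.5, 0.5, 0.5])}
mask = {"w": np.array([1.0, 0.0, 1.0, 0.0])}
updated = masked_update(params, grads, mask, lr=0.1)
# Entries with mask == 0 keep their original values.
```

The question this work examines is whether choosing that mask with a localization method (rather than at random) actually improves the removal-retention trade-off.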
However, we identify critical gaps that have largely been overlooked in this line of work:

- **Limited Evaluation:** Existing evaluations fail to capture the extent to which knowledge is stored in the parameters, and they ignore the trade-off between removal and retention.
- **Unverified Locality Assumption:** It remains untested whether localization success causally drives unlearning success.
Our main contributions are as follows:

- **Rigorous Evaluation Framework:** We introduce a comprehensive evaluation framework that thoroughly measures the removal-retention trade-off.
- **Empirical Analysis:** Applying this framework, we show that current localized unlearning methods perform no better than random updates.
- **Causal Analysis:** Controlled experiments reveal that localization success does not causally translate into unlearning success.
## Requirements

- Python 3.10
- CUDA 12.4+
- Conda package manager
## Installation

1. Clone the repository and set up the environment:

```bash
git clone https://github.com/HYU-NLP/loc-unlearn.git
cd loc-unlearn
conda create -n loc_unlearn python=3.10
conda activate loc_unlearn
```

2. Install dependencies:

```bash
conda install pytorch pytorch-cuda=12.4 -c pytorch -c nvidia
conda env update -f environment.yml
pip install flash-attn --no-build-isolation
```
## Mask Generation

Generate binary masks for parameter attribution using various localization methods:

```bash
CUDA_VISIBLE_DEVICES=$GPU_IDS python -m torch.distributed.run \
    --nproc_per_node=$NUM_GPUS \
    --master_port=$MASTER_PORT \
    generate_mask.py \
    --config-name get_mask_tofu \
    attribution=$ATTRIBUTION_METHOD
```

Parameters:

- `$GPU_IDS`: GPU device IDs (e.g., "0,1,2,3")
- `$NUM_GPUS`: Number of GPUs to use
- `$MASTER_PORT`: Master port for distributed training
- `$ATTRIBUTION_METHOD`: Attribution method (e.g., "memflex_down", "wagle_down", "activation"). See `mask_generator.py` for all available methods.
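For reference, the random baseline examined in the paper selects a fixed fraction of entries uniformly at random. Below is a minimal sketch of such a mask and its complement (NumPy for illustration only; the repository's actual mask format is defined in `mask_generator.py` and stored as PyTorch `.pt` files, and the `fraction`/`seed` naming here is an assumption based on the mask file names in the examples):

```python
import numpy as np

def random_mask(shapes, fraction=0.1, seed=42):
    """Build binary masks selecting roughly `fraction` of entries at random.

    shapes: dict mapping parameter names to array shapes.
    Returns a dict of 0/1 float arrays, one per parameter.
    """
    rng = np.random.default_rng(seed)
    return {name: (rng.random(shape) < fraction).astype(np.float32)
            for name, shape in shapes.items()}

masks = random_mask({"mlp.down_proj": (8, 8)}, fraction=0.1, seed=42)
# A complement mask (cf. the `random_complement` files in the examples) flips every entry.
complement = {name: 1.0 - m for name, m in masks.items()}
```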
## Unlearning

Apply the generated masks to perform localized unlearning:

```bash
CUDA_VISIBLE_DEVICES=$GPU_IDS python -m torch.distributed.run \
    --nproc_per_node=$NUM_GPUS \
    --master_port=$MASTER_PORT \
    unlearn.py \
    --config-name forget_tofu \
    model_family=$MODEL_FAMILY \
    model_path=$MODEL_PATH \
    forget_loss=$FORGET_LOSS \
    mask_path=$MASK_PATH \
    lr=$LR \
    hyper_param=$HP \
    save_checkpoint=True \
    num_epochs=5
```

Parameters:

- `$MODEL_FAMILY`: Model family (e.g., "llama3.1-8b-inst", "olmo2-7b-inst"). See `model_config.yaml` for all available models.
- `$MODEL_PATH`: Path to the fine-tuned model
- `$FORGET_LOSS`: Unlearning loss type (e.g., "learn", "wga", "dpo", "npo", "rmu_*", "rt_diff"). See `dataloader.py` for all available methods.
- `$MASK_PATH`: Path to the generated parameter mask
- `$LR`: Learning rate for unlearning
- `$HP`: Hyperparameter for the unlearning method
## Evaluation

To evaluate a single checkpoint (Forget Quality, Model Utility, Forget Strength, Retain Strength, etc.):

```bash
CUDA_VISIBLE_DEVICES=$GPU_IDS python -m torch.distributed.run \
    --nproc_per_node=$NUM_GPUS \
    --master_port=$MASTER_PORT \
    eval_tofu.py \
    --config-name eval_tofu \
    model_family=$MODEL_FAMILY \
    model_path=$MODEL_PATH
```

To measure AUES and MU95, use the provided script:

```bash
bash scripts/eval.sh
```

## Examples

Generate "Activation" mask:
```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    generate_mask.py \
    attribution=activation
```

Full Parameter Update (Learn Full Data):
```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    unlearn.py \
    model_family=llama3.1-8b-inst \
    forget_loss=learn \
    forget_split=full \
    lr=1e-5 \
    save_checkpoint=True \
    num_epochs=5
```

Local Parameter Update (Random Mask, Learn Forget Data):
```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    unlearn.py \
    model_family=llama3.1-8b-inst \
    forget_loss=learn_reg \
    forget_split=forget10 \
    model_path=llama3.1-8b-inst/TOFU/default/retain90/learn/lora_r0/masked_False/lr_1e-05_wd_0.01_ep_5_bs_8/checkpoint-565 \
    mask_path=mask/random_down/llama3.1-8b-inst/TOFU/default/forget10/random_down_mask_normal_0.1_s42.pt \
    lr=0.0002 \
    save_checkpoint=True \
    num_epochs=5
```

Local Parameter Update (Random Complement Mask, NPO Unlearning):
```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    unlearn.py \
    model_family=llama3.1-8b-inst \
    forget_loss=npo \
    forget_split=forget10 \
    model_path=llama3.1-8b-inst/TOFU/default/retain90/learn/lora_r0/masked_False/lr_1e-05_wd_0.01_ep_5_bs_8/checkpoint-565/forget10/learn_reg/lora_r0_lm_False/masked_mask/random_down/llama3.1-8b-inst/TOFU/default/forget10/random_down_mask_normal_0.1_s42.pt/lr_0.0002_wd_0.01_ep_5_bs_8/checkpoint-65 \
    mask_path=mask/random_down/llama3.1-8b-inst/TOFU/default/forget10/random_down_mask_random_complement_0.1_s42.pt \
    lr=1e-5 \
    save_checkpoint=True \
    num_epochs=5
```

Single-point Evaluation:
```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    eval_tofu.py \
    model_path=llama3.1-8b-inst/TOFU/default/retain90/learn/lora_r0/masked_False/lr_1e-05_wd_0.01_ep_5_bs_8/checkpoint-565
```

Comprehensive Evaluation (AUES/MU95):
1. Edit `scripts/eval.sh`:

```bash
# List of model directories to evaluate (unlearned models)
MODEL_DIR=(
    "/path/to/model_dir_1"
    "/path/to/model_dir_2"
    "/path/to/model_dir_3"
)

# Initial model directory
INIT_MODEL_DIR="/path/to/init_model_dir"
```

2. Run the evaluation:

```bash
bash scripts/eval.sh
```

Please refer to Appendix F (Hyperparameter Details) of the paper for the full, per-experiment configurations.
## Configuration

The pipeline uses YAML configuration files in the `config/` directory:

- `get_mask_tofu.yaml`: Mask generation configuration
- `forget_tofu.yaml`: Unlearning configuration
- `eval_tofu.yaml`: Evaluation configuration
- `model_config.yaml`: Model-specific settings
- `ds_config.json`: DeepSpeed configuration for distributed training
## Citation

```bibtex
@misc{lee2025doeslocalizationinformunlearning,
      title={Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models},
      author={Hwiyeong Lee and Uiji Hwang and Hyelim Lim and Taeuk Kim},
      year={2025},
      eprint={2505.16252},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.16252},
}
```

## Acknowledgments

We thank the authors of the following repositories for providing valuable code and implementations that served as references for this work: