
Does Localization Inform Unlearning?

Official repository for "Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models" [Paper (arXiv)](https://arxiv.org/abs/2505.16252)

Hwiyeong Lee, Uiji Hwang, Hyelim Lim, and Taeuk Kim. Accepted to EMNLP 2025 as a short paper.


🔍 Overview

This work investigates whether parameter localization techniques can enhance knowledge unlearning in language models. Localized unlearning pinpoints the parameters that encode the unwanted knowledge and updates only that subset, aiming to erase the target knowledge while preserving the model's overall capabilities.

However, we identify critical gaps that have largely been overlooked in this line of work:

  • Limited Evaluation: Existing protocols fail to capture the extent to which the target knowledge is actually stored in the localized parameters, and they ignore the trade-off between removal and retention.
  • Unverified Locality Assumption: Whether localization success causally drives unlearning success remains untested.

Our main contributions are as follows:

  1. Rigorous Evaluation Framework: We introduce a comprehensive evaluation framework that thoroughly measures the removal-retention trade-off.
  2. Empirical Analysis: Applying this framework, we show that current localized unlearning methods perform no better than random updates.
  3. Causal Analysis: Controlled experiments reveal that localization success does not causally translate into unlearning success.

🚀 Quick Start

Prerequisites

  • Python 3.10
  • CUDA 12.4+
  • Conda package manager
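Before setting up, a quick sanity check can save a failed install later. The sketch below only verifies the interpreter version and that conda is on PATH; it is a hedged example, not part of the repository, and the CUDA 12.4+ requirement must still be checked against your driver (e.g. via `nvidia-smi`):

```shell
# Hedged pre-flight sketch; adjust to your system.
# The repo targets Python 3.10 -- print the interpreter version.
python3 -c 'import sys; print("python %d.%d" % sys.version_info[:2])'
# Conda is used for the environment below; check it is on PATH.
if command -v conda >/dev/null 2>&1; then echo "conda found"; else echo "conda missing"; fi
```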

Environment Setup

  1. Clone and setup environment:

    git clone https://github.com/HYU-NLP/loc-unlearn.git
    cd loc-unlearn
    conda create -n loc_unlearn python=3.10
    conda activate loc_unlearn
  2. Install dependencies:

    conda install pytorch pytorch-cuda=12.4 -c pytorch -c nvidia
    conda env update -f environment.yml
    pip install flash-attn --no-build-isolation

Running Experiments

1. Mask Generation

Generate binary masks for parameter attribution using various localization methods:

CUDA_VISIBLE_DEVICES=$GPU_IDS python -m torch.distributed.run \
    --nproc_per_node=$NUM_GPUS \
    --master_port=$MASTER_PORT \
    generate_mask.py \
    --config-name get_mask_tofu \
    attribution=$ATTRIBUTION_METHOD

Parameters:

  • $GPU_IDS: GPU device IDs (e.g., "0,1,2,3")
  • $NUM_GPUS: Number of GPUs to use
  • $MASTER_PORT: Master port for distributed training
  • $ATTRIBUTION_METHOD: Attribution method (e.g., "memflex_down", "wagle_down", "activation", etc.). See mask_generator.py for all available methods.
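For concreteness, the placeholders above might be exported as follows before invoking the command. These are illustrative values for a hypothetical 2-GPU machine, not recommended settings:

```shell
# Illustrative values only; adapt to your hardware.
export GPU_IDS=0,1                    # devices visible to the job
export NUM_GPUS=2                     # must match the number of IDs above
export MASTER_PORT=29500              # any free port
export ATTRIBUTION_METHOD=activation  # see mask_generator.py for the full list
echo "$GPU_IDS | $NUM_GPUS | $MASTER_PORT | $ATTRIBUTION_METHOD"
```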

2. Localized Unlearning

Apply the generated masks to perform localized unlearning:

CUDA_VISIBLE_DEVICES=$GPU_IDS python -m torch.distributed.run \
    --nproc_per_node=$NUM_GPUS \
    --master_port=$MASTER_PORT \
    unlearn.py \
    --config-name forget_tofu \
    model_family=$MODEL_FAMILY \
    model_path=$MODEL_PATH \
    forget_loss=$FORGET_LOSS \
    mask_path=$MASK_PATH \
    lr=$LR \
    hyper_param=$HP \
    save_checkpoint=True \
    num_epochs=5

Parameters:

  • $MODEL_FAMILY: Model family (e.g., "llama3.1-8b-inst", "olmo2-7b-inst"). See model_config.yaml for all available models.
  • $MODEL_PATH: Path to the fine-tuned model
  • $FORGET_LOSS: Unlearning loss type (e.g., "learn", "wga", "dpo", "npo", "rmu_*", "rt_diff", etc.). See dataloader.py for all available methods.
  • $MASK_PATH: Path to the generated parameter mask
  • $LR: Learning rate for unlearning
  • $HP: Hyperparameter for the unlearning method

3. Evaluation

Single-point Evaluation

For evaluating a single specific checkpoint (Forget Quality, Model Utility, Forget Strength, Retain Strength, etc.):

CUDA_VISIBLE_DEVICES=$GPU_IDS python -m torch.distributed.run \
    --nproc_per_node=$NUM_GPUS \
    --master_port=$MASTER_PORT \
    eval_tofu.py \
    --config-name eval_tofu \
    model_family=$MODEL_FAMILY \
    model_path=$MODEL_PATH

Comprehensive Evaluation (AUES and MU95)

To measure AUES and MU95, use the provided script:

bash scripts/eval.sh

4. Example Usage

Mask Generation

Generate an "activation" mask:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    generate_mask.py \
    attribution=activation

Localized Unlearning

Full Parameter Update (Learn Full Data):

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    unlearn.py \
    model_family=llama3.1-8b-inst \
    forget_loss=learn \
    forget_split=full \
    lr=1e-5 \
    save_checkpoint=True \
    num_epochs=5

Local Parameter Update (Random Mask, Learn Forget Data):

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    unlearn.py \
    model_family=llama3.1-8b-inst \
    forget_loss=learn_reg \
    forget_split=forget10 \
    model_path=llama3.1-8b-inst/TOFU/default/retain90/learn/lora_r0/masked_False/lr_1e-05_wd_0.01_ep_5_bs_8/checkpoint-565 \
    mask_path=mask/random_down/llama3.1-8b-inst/TOFU/default/forget10/random_down_mask_normal_0.1_s42.pt \
    lr=0.0002 \
    save_checkpoint=True \
    num_epochs=5

Local Parameter Update (Random Complement Mask, NPO Unlearning):

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    unlearn.py \
    model_family=llama3.1-8b-inst \
    forget_loss=npo \
    forget_split=forget10 \
    model_path=llama3.1-8b-inst/TOFU/default/retain90/learn/lora_r0/masked_False/lr_1e-05_wd_0.01_ep_5_bs_8/checkpoint-565/forget10/learn_reg/lora_r0_lm_False/masked_mask/random_down/llama3.1-8b-inst/TOFU/default/forget10/random_down_mask_normal_0.1_s42.pt/lr_0.0002_wd_0.01_ep_5_bs_8/checkpoint-65 \
    mask_path=mask/random_down/llama3.1-8b-inst/TOFU/default/forget10/random_down_mask_random_complement_0.1_s42.pt \
    lr=1e-5 \
    save_checkpoint=True \
    num_epochs=5

Evaluation

Single-point Evaluation:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run \
    --nproc_per_node=2 \
    eval_tofu.py \
    model_path=llama3.1-8b-inst/TOFU/default/retain90/learn/lora_r0/masked_False/lr_1e-05_wd_0.01_ep_5_bs_8/checkpoint-565

Comprehensive Evaluation (AUES/MU95):

  1. Edit scripts/eval.sh:
# List of model directories to evaluate (Unlearned Models)
MODEL_DIR=(
    "/path/to/model_dir_1"
    "/path/to/model_dir_2"
    "/path/to/model_dir_3"
)

# Initial model directory
INIT_MODEL_DIR="/path/to/init_model_dir"
  2. Run evaluation:
bash scripts/eval.sh

5. Hyperparameters

Please refer to Appendix F (Hyperparameter Details) of the paper for the full, per-experiment configurations.

6. Configuration Files

The pipeline uses YAML configuration files in the config/ directory:

  • get_mask_tofu.yaml: Mask generation configuration
  • forget_tofu.yaml: Unlearning configuration
  • eval_tofu.yaml: Evaluation configuration
  • model_config.yaml: Model-specific settings
  • ds_config.json: DeepSpeed configuration for distributed training
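Since run-specific values are supplied as `key=value` overrides on the command line (as in the commands above), the YAML files only need to hold defaults. A hypothetical fragment of `forget_tofu.yaml` might look like the following; the field names simply mirror the overrides used in this README, and the actual file may differ:

```yaml
# Illustrative defaults only -- the real forget_tofu.yaml may differ.
model_family: llama3.1-8b-inst
forget_loss: npo
lr: 1e-5
num_epochs: 5
save_checkpoint: true
```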

📚 Citation

@misc{lee2025doeslocalizationinformunlearning,
      title={Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models}, 
      author={Hwiyeong Lee and Uiji Hwang and Hyelim Lim and Taeuk Kim},
      year={2025},
      eprint={2505.16252},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.16252}, 
}

🙏 Acknowledgments

We thank the authors of the following repositories for providing valuable code and implementations that served as references for this work:
