- February 2025: Code cleaned and additional instructions for generating the Biography dataset added. The processed Biography dataset is available for download here.
- February 2025: Paper interpretations are now available on PaperWeekly and Zhihu.
- January 2025: The code for our ICLR 2025 paper, "Spurious Forgetting in Continual Learning of Language Models", is now publicly available! Explore the code and our findings.
Welcome to the repository for our ICLR 2025 paper, "Spurious Forgetting in Continual Learning of Language Models". This repository is organized into two main sections, each focusing on different experiments and use cases.
This section contains experiments utilizing the Biography Dataset, a synthetic dataset designed to simulate a controlled continual learning environment for language models.
To generate the Biography dataset, follow these steps:
- Navigate to the `physics_of_forgetting` directory:

  ```bash
  cd ./code_for_biography_dataset/physics_of_forgetting/
  ```

- Set the `PYTHONPATH` and run the preprocessing script:

  ```bash
  export PYTHONPATH=.
  python ./data/preprocess.py
  ```
After running the script for about 15 minutes, the preprocessed data will be saved in the following locations:
- Pretraining data: `./data/processed_final/biography/multi5_permute_fullname.json` (~1.17 GB)
- Fine-tuning QA data: `./data/processed_final/qa/all.json` (~183 MB)
Note: The pretraining data (`multi5_permute_fullname.json`) contains five instances of each person in the dataset, with the attributes shuffled to simulate a dynamic learning environment. For further details, refer to this paper.
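To make the shuffling concrete, here is a minimal sketch of how such permuted instances could be produced. It is not the repository's generator: the sentence templates and the `make_biographies` helper are illustrative only.

```python
import random

# Hypothetical sentence templates; the real generator uses its own phrasing pools.
TEMPLATES = {
    "company_city": "{name} held a job in {value}.",
    "birth_city": "{name}'s life journey started in {value}.",
    "major": "{name} specialized in {value}.",
    "university": "{name} completed his degree requirements at {value}.",
    "birthday": "{name} celebrates his special day on {value}.",
    "company_name": "{name} contributed his skills to {value}.",
}

def make_biographies(name, attributes, n_instances=5):
    """Render n_instances biographies with the attribute sentences in random order."""
    biographies = []
    for _ in range(n_instances):
        keys = list(attributes)
        random.shuffle(keys)  # permute the attribute order for each instance
        biographies.append(" ".join(TEMPLATES[k].format(name=name, value=attributes[k]) for k in keys))
    return biographies

person = {
    "birthday": "May 28, 1952",
    "birth_city": "Elk Grove, CA",
    "university": "Kansas State University",
    "major": "EMT and Paramedic",
    "company_name": "HP",
    "company_city": "Palo Alto, CA",
}
for bio in make_biographies("Lucy Damian Moscicki", person):
    print(bio)
```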
The preprocessed data is structured as follows:
- Pretraining Data (`multi5_permute_fullname.json`): Each person in the dataset has five entries, with shuffled attributes. Here is an example structure for a single person (Person 0); a loading sketch for both files follows this list:

  ```json
  {
    "0_0": {
      "biography": "Lucy Damian Moscicki held a job in Palo Alto, CA. Lucy Damian Moscicki's life journey started in Elk Grove, CA. Lucy Damian Moscicki specialized in EMT and Paramedic. Lucy Damian Moscicki completed his degree requirements at Kansas State University. Lucy Damian Moscicki celebrates his special day on May 28, 1952. Lucy Damian Moscicki contributed his skills to HP.",
      "token_info": {
        "company_city": {"first_token_position": 10, "first_token": 5226},
        "birth_city": {"first_token_position": 27, "first_token": 3599},
        "major": {"first_token_position": 41, "first_token": 33566},
        "university": {"first_token_position": 58, "first_token": 15391},
        "birthday": {"first_token_position": 73, "first_token": 2552},
        "company_name": {"first_token_position": 88, "first_token": 19517}
      },
      "tokenizer": "GPTNeoXTokenizerFast"
    }
  }
  ```
- QA Data (`all.json`): This file contains question-answer pairs about each person's biography. Here is an example for Person 0:

  ```json
  {
    "0": {
      "birthday": {"prompt": "What is the birth date of Lucy Damian Moscicki?\nAnswer:", "answer": " May 28, 1952"},
      "birth_city": {"prompt": "What is the birth city of Lucy Damian Moscicki?\nAnswer:", "answer": " Elk Grove, CA"},
      "university": {"prompt": "Which university did Lucy Damian Moscicki study?\nAnswer:", "answer": " Kansas State University"},
      "major": {"prompt": "What major did Lucy Damian Moscicki study?\nAnswer:", "answer": " EMT and Paramedic"},
      "company_name": {"prompt": "Which company did Lucy Damian Moscicki work for?\nAnswer:", "answer": " HP"},
      "company_city": {"prompt": "Where did Lucy Damian Moscicki work?\nAnswer:", "answer": " Palo Alto, CA"}
    }
  }
  ```
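To make these structures concrete, here is a minimal sketch that loads both files, checks the `token_info` positions, and runs a toy exact-match evaluation. The tokenizer checkpoint, the `exact_match` helper, and the assumption that `first_token_position` indexes directly into the tokenized biography are ours, not the repository's code.

```python
import json
from transformers import AutoTokenizer

# Load the two preprocessed files (paths from the steps above).
with open("./data/processed_final/biography/multi5_permute_fullname.json") as f:
    bio = json.load(f)
with open("./data/processed_final/qa/all.json") as f:
    qa = json.load(f)

# Any GPT-NeoX tokenizer works; "EleutherAI/pythia-160m" is an illustrative choice.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")

# token_info records the first token of each attribute value (useful for
# first-token accuracy); we assume the position indexes the tokenized biography.
entry = bio["0_0"]
token_ids = tokenizer(entry["biography"])["input_ids"]
for attr, info in entry["token_info"].items():
    pos, tok = info["first_token_position"], info["first_token"]
    print(attr, token_ids[pos] == tok, repr(tokenizer.decode([tok])))

# A toy exact-match evaluation over one person's QA pairs; `generate` stands in
# for any prompt -> completion callable (e.g. a wrapped language model).
def exact_match(generate, person_id):
    pairs = qa[person_id]
    hits = sum(generate(p["prompt"]).startswith(p["answer"]) for p in pairs.values())
    return hits / len(pairs)

print(exact_match(lambda prompt: " May 28, 1952", "0"))  # trivial stand-in "model"
```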
Alternatively, you can download the final preprocessed dataset directly from Google Drive, which contains the exact data used for all experiments in our paper. This data is identical to what you would generate by running the preprocessing script, except for the order of the person names. You can copy the pretraining and QA files from our directory `processed_0720_v0730` to your directory `processed_final`.
We generate data for a total of 200K persons. When running the pretraining or fine-tuning experiments, use the configuration files in the `./config` folder to specify which set of persons to use for each phase (a quick way to inspect a configuration follows the list):

- For pretraining on persons 0-100K, use the configuration file `./config/v0731/single/pre_training.json`.
- For fine-tuning on the first 50K persons and testing on the next 50K, use the configuration file `./config/v0731/single/fine_tuning.json`.
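If you want to confirm what a configuration selects before launching a run, you can simply pretty-print the JSON. The snippet below makes no assumptions about the schema; check the actual field names used by the training scripts.

```python
import json

# Inspect a configuration file to see which persons each phase uses.
with open("./config/v0731/single/pre_training.json") as f:
    config = json.load(f)
print(json.dumps(config, indent=2))
```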
- Pretraining:
  - Train a model on 100K individuals to establish a foundational knowledge base.
- Continual Finetuning:
  - Incrementally finetune the model on 20K individuals.
- Extended Settings:
  - Include more tasks.
  - Vary the number of individuals.
  - Explore diverse task types.
- Recovery Experiments:
  - Investigate the model's ability to recover performance on previously seen tasks.
- Feature Perspective:
  - Analyze residual stream shifts in the visualization directory `./code_for_biography_dataset/physics_of_forgetting/residual_stream_shift_analysis` (a minimal sketch of the idea follows this list).
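As a rough illustration of the residual-stream-shift idea (not the code in `residual_stream_shift_analysis`), one can compare a model's hidden states before and after continual finetuning on the same prompt. The checkpoint paths below are placeholders, and the tokenizer choice is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoints: a model before and after continual finetuning.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
before = AutoModelForCausalLM.from_pretrained("path/to/pretrained_checkpoint")
after = AutoModelForCausalLM.from_pretrained("path/to/finetuned_checkpoint")

prompt = "What is the birth city of Lucy Damian Moscicki?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    h_before = before(**inputs, output_hidden_states=True).hidden_states
    h_after = after(**inputs, output_hidden_states=True).hidden_states

# Per-layer cosine similarity of the residual stream at the final token:
# values well below 1 indicate a shift introduced by finetuning.
for layer, (a, b) in enumerate(zip(h_before, h_after)):
    sim = torch.cosine_similarity(a[0, -1], b[0, -1], dim=0)
    print(f"layer {layer}: cos = {sim.item():.4f}")
```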
This section extends the research to real-world scenarios, integrating methods and datasets that reflect practical continual learning challenges.
This section builds upon this incremental learning repository. For detailed instructions on dataset preprocessing and usage, refer to the README within this directory: `./code_for_realworld_scenarios/README.md`
- Continual Finetuning on Biography Dataset:
  - Methods: EWC, LAMOL, Task Vector, Gradient Projection, SEQ, REPLAY, Freeze (a minimal EWC sketch follows this list).
- Safety Alignment:
  - Methods: Freeze, SEQ.
- Continual Instruction Tuning:
  - Methods: Freeze, SEQ.
- Continual Knowledge Editing:
  - Methods: Freeze, SEQ.
- Instance Incremental Learning:
  - Methods: Freeze, SEQ.
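For orientation, here is a minimal sketch of the EWC regularizer named above. It shows the idea only; the `ewc_penalty` helper is ours, not this repository's implementation.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=0.4):
    """Elastic Weight Consolidation: penalize movement of parameters that were
    important (high Fisher information) for the previous task. `fisher` and
    `old_params` are dicts of tensors keyed by parameter name, recorded after
    training on that task."""
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam * loss

# During continual finetuning, add the penalty to the task loss:
#   total_loss = task_loss + ewc_penalty(model, fisher, old_params)
```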
- Task Vector:
  - Explore tradeoffs using `./code_for_realworld_scenarios/visualization-tradeoff`.
- Continual Learning Methods:
  - Visualize EWC, LAMOL, and Gradient Projection results with `./code_for_realworld_scenarios/visualization_continual_learning_methods`.
- Weight Update Perspective:
  - Examine orthogonal weight updates with `./code_for_realworld_scenarios/visualization-orthogonal-weight-update`.
- Loss Landscape Perspective:
  - Analyze the model's loss landscape with `./code_for_realworld_scenarios/visualization-loss-landscape` (a minimal interpolation sketch follows this list).
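A common way to probe the loss landscape between two checkpoints is to evaluate the loss along a straight line in weight space. The sketch below illustrates that idea only; `interpolate_loss` and `loss_fn` are hypothetical, and we assume every state-dict entry is a float tensor.

```python
import copy
import torch

def interpolate_loss(model_a, model_b, loss_fn, alphas):
    """1-D loss landscape: evaluate loss_fn along the line
    theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b."""
    probe = copy.deepcopy(model_a)
    sa, sb = model_a.state_dict(), model_b.state_dict()
    losses = []
    for alpha in alphas:
        probe.load_state_dict({k: (1 - alpha) * sa[k] + alpha * sb[k] for k in sa})
        with torch.no_grad():
            losses.append(loss_fn(probe).item())  # loss_fn returns a scalar tensor
    return losses

# Usage (illustrative): loss_fn evaluates the model on a fixed batch, e.g. the
# pretraining task, to check whether finetuning left its low-loss basin.
# losses = interpolate_loss(pretrained, finetuned, loss_fn, [i / 10 for i in range(11)])
```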
If you find this repository useful, please consider citing our research:
```bibtex
@inproceedings{
zheng2025spurious,
title={Spurious Forgetting in Continual Learning of Language Models},
author={Junhao Zheng and Xidi Cai and Shengjie Qiu and Qianli Ma},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=ScI7IlKGdI}
}
```
Help us grow by starring this repository on GitHub!
Thank you for your interest in our work. We look forward to your feedback and collaboration!
If you have questions about this repository, please feel free to contact me at junhaozheng47@outlook.com.