This repository provides a GPT-NeoX/Megatron-DeepSpeed pipeline to perform continual pre-training (CPT) with:
- Experience Replay: samples drawn from a disk-backed buffer (with async prefetch and a RAM cache) are mixed into each training batch.
- Gradient Alignment via lightweight Reptile/MER meta-updates applied at a configurable cadence.
If you use this repository, please cite the accompanying paper: “Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models” (2025). See the **Citation** section below.
- Drop-in mixed-batch replay that plugs into NeoX’s dataloader loop.
- Disk-resident buffer with streaming writes and async prefetch to hide I/O latency.
- MER/Reptile hook that interpolates weights every k steps with negligible compute/memory overhead.
- Metrics scripts for Forgetting Score, Retained Loss, and Learned Loss (sketched below), plus lm-eval integration.
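As a rough illustration only, the sketch below shows one common way to express these three quantities from held-out losses measured before and after CPT; the function name and the exact definitions are assumptions, and the repository's metric scripts may differ.

```python
def continual_metrics(loss_old_before, loss_old_after, loss_new_after):
    """Illustrative definitions only; the repository's scripts may compute these differently.

    loss_old_before / loss_old_after: held-out loss on previously seen data,
        measured before and after continual pre-training.
    loss_new_after: held-out loss on the new task after continual pre-training.
    """
    return {
        "retained_loss": loss_old_after,                       # how well old data is still modeled
        "learned_loss": loss_new_after,                        # how well the new task was learned
        "forgetting_score": loss_old_after - loss_old_before,  # increase in old-data loss (> 0 means forgetting)
    }
```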
**Mixed-batch replay:** each training step fills a fraction (1−α) of the batch from the current task stream and a fraction α from the disk-backed buffer M.
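A minimal sketch of the mixing logic, assuming an iterator over the task stream and a buffer object with a `sample` method; `mixed_batch`, `stream_iter`, `buffer`, and `replay_fraction` are hypothetical names, not the repository's actual API.

```python
import torch

def mixed_batch(stream_iter, buffer, batch_size, replay_fraction):
    """Assemble one training batch: (1 - alpha) fresh samples plus alpha replayed samples."""
    n_replay = int(round(replay_fraction * batch_size))  # alpha * B samples from the buffer M
    n_stream = batch_size - n_replay                     # (1 - alpha) * B samples from the task stream
    fresh = [next(stream_iter) for _ in range(n_stream)]
    replayed = buffer.sample(n_replay)                   # hypothetical buffer API
    # Assumes each example is an equal-length tensor of token ids.
    return torch.stack(fresh + list(replayed))
```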
**Disk-backed buffer:** examples are appended to disk as they arrive, while an asynchronous prefetcher stages upcoming replay samples in RAM to keep GPUs fed.
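A self-contained sketch of the idea, assuming one pickled example per file and a single background prefetch thread; the class name, file layout, and methods are illustrative, not the repository's implementation.

```python
import pickle
import queue
import random
import threading
import time
from pathlib import Path

class DiskReplayBuffer:
    """Illustrative disk-backed replay buffer: streaming appends plus a background
    prefetch thread that keeps a small RAM cache of ready-to-use replay batches."""

    def __init__(self, path, batch_size=8, prefetch_batches=4):
        self.dir = Path(path)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.count = 0
        self.batch_size = batch_size
        self._ready = queue.Queue(maxsize=prefetch_batches)  # RAM cache of prefetched batches
        threading.Thread(target=self._prefetch_loop, daemon=True).start()

    def append(self, example):
        """Streaming write: one pickle file per example (deliberately simplistic)."""
        with open(self.dir / f"{self.count}.pkl", "wb") as f:
            pickle.dump(example, f)
        self.count += 1

    def _prefetch_loop(self):
        while True:
            if self.count < self.batch_size:
                time.sleep(0.1)  # nothing to replay yet
                continue
            idxs = random.sample(range(self.count), self.batch_size)
            batch = []
            for i in idxs:
                with open(self.dir / f"{i}.pkl", "rb") as f:
                    batch.append(pickle.load(f))
            self._ready.put(batch)  # blocks while the RAM cache is full

    def sample(self):
        """Return a prefetched replay batch; blocks only if the cache is empty."""
        return self._ready.get()
```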
**MER/Reptile:** every k steps, take a Reptile-style outer step that interpolates the weights saved k steps ago (θ) toward the current weights (θ_k): θ ← (1−ε)·θ + ε·θ_k.
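A minimal sketch of that outer update on a PyTorch model; `reptile_outer_step`, `anchor_state`, `epsilon`, and the surrounding loop are illustrative names under assumed semantics, not the repository's hook.

```python
import torch

def reptile_outer_step(model, anchor_state, epsilon):
    """Set theta <- (1 - eps) * theta_anchor + eps * theta_current for every parameter.
    anchor_state holds the weights saved k steps ago; model holds the current weights theta_k."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            anchor = anchor_state[name]
            param.copy_((1.0 - epsilon) * anchor + epsilon * param)

# Illustrative usage inside a training loop (k and epsilon are hypothetical config values):
# anchor_state = {n: p.detach().clone() for n, p in model.named_parameters()}
# for step in range(k):
#     ...  # normal forward/backward/optimizer step
# reptile_outer_step(model, anchor_state, epsilon)
# anchor_state = {n: p.detach().clone() for n, p in model.named_parameters()}
```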
```bibtex
@inproceedings{abbes2025revisiting,
  title         = {Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models},
  author        = {Istabrak Abbes and Gopeshh Subbaraj and Matthew Riemer and Nizar Islah and Benjamin Therien and Tsuguchika Tabaru and Hiroaki Kingetsu and Sarath Chandar and Irina Rish},
  booktitle     = {Conference on Lifelong Learning Agents (CoLLAs)},
  year          = {2025},
  archivePrefix = {arXiv},
  eprint        = {2508.01908}
}
```