Does Catastrophic Forgetting Happen in Tiny Subspaces? [PDF]
Catastrophic forgetting remains a significant challenge in continual learning, where adapting to new tasks often disrupts previously acquired knowledge. Recent studies on neural network optimization indicate that, outside the continual setting, learning primarily occurs within the bulk subspace of the loss Hessian, i.e., the subspace spanned by the eigenvectors associated with its small eigenvalues. However, the role of the bulk subspace in continual learning, particularly in relation to forgetting, is not well understood. In this work, we investigate how constraining gradient updates to either the bulk or the dominant subspace affects learning and forgetting. Through experiments on Permuted MNIST, Split-CIFAR10, and Split-CIFAR100, we confirm that task-specific learning occurs in the bulk subspace of the loss Hessian. We also find evidence that forgetting may predominantly occur within the bulk subspace, although further large-scale experiments are needed to validate this. Our findings point to promising avenues for efficient implementations of algorithms that counter catastrophic forgetting.
- Rufat Asadli (22-953-632)
- Armin Begic (20-614-582)
- Jan Schlegel (19-747-096)
- Philemon Thalmann (18-111-674)
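To make the core idea concrete, the sketch below shows the projection step in plain PyTorch. It is a simplified illustration under stated assumptions, not the implementation in modules/subspace_sgd.py: the function project_gradient and the random placeholders for the eigenvectors and the gradient are hypothetical, and in the actual experiments the top-k Hessian eigenvectors are estimated with an eigensolver such as the bundled pytorch-hessian-eigenthings submodule (see the installation instructions below).

import torch


def project_gradient(grad, V, subspace="bulk"):
    """Project a flattened gradient onto the dominant or the bulk subspace.

    grad: flattened gradient, shape (num_params,)
    V:    top-k Hessian eigenvectors stacked as rows, shape (k, num_params),
          assumed to be orthonormal
    subspace: "dominant" keeps the span of V; "bulk" keeps its orthogonal complement
    """
    dominant_part = V.T @ (V @ grad)  # component lying in the span of the top-k eigenvectors
    if subspace == "dominant":
        return dominant_part
    return grad - dominant_part  # bulk component: orthogonal complement of the dominant subspace


# Illustrative usage with random placeholders (hypothetical shapes only)
num_params, k = 10_000, 10
Q, _ = torch.linalg.qr(torch.randn(num_params, k))  # orthonormal columns
V = Q.T  # shape (k, num_params)
grad = torch.randn(num_params)
bulk_grad = project_gradient(grad, V, subspace="bulk")
dominant_grad = project_gradient(grad, V, subspace="dominant")

Restricting SGD updates to bulk_grad or dominant_grad is what allows the experiments to attribute learning and forgetting to one subspace or the other.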
Clone this repository.
git clone git@github.com:JHSchlegel/cf-tiny-subspaces.git
cd cf-tiny-subspaces

We recommend using a conda environment to install the required packages. Run the following commands to create a new environment and install the dependencies.
conda create -n cf python=3.10 pip
conda activate cf
pip install -r requirements.txt

Moreover, this repository includes pytorch-hessian-eigenthings as a submodule; after cloning, run the following commands at the root level of the cf-tiny-subspaces repository to initialize, update and install the submodule:
git submodule init
git submodule update
cd pytorch_hessian_eigenthings
pip install -e .
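Once installed, the submodule can be used to estimate the top eigenvectors of the loss Hessian, which span the dominant subspace. The snippet below is a minimal sketch assuming the compute_hessian_eigenthings entry point from pytorch-hessian-eigenthings; the toy model, the random data, and num_eigenthings=10 are placeholders rather than the settings used in our experiments.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from hessian_eigenthings import compute_hessian_eigenthings

# Toy placeholder model and data; the actual experiments use the models in
# modules/ and the continual-learning datasets in utils/data_utils/.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
data = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))
dataloader = DataLoader(data, batch_size=64)
loss = nn.CrossEntropyLoss()

# Estimate the top-10 Hessian eigenpairs; the eigenvectors span the dominant subspace.
# Note: depending on the library version, a GPU may be assumed by default.
eigenvals, eigenvecs = compute_hessian_eigenthings(model, dataloader, loss, num_eigenthings=10)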
The repository is structured as follows:
├── LICENSE
├── README.md
├── requirements.txt
├── reports
│   ├── proposal.pdf                 # project proposal
│   └── paper.pdf                    # final project report
├── configs
│   ├── permuted_mnist.yaml          # configuration file for permuted MNIST
│   ├── split_cifar10.yaml           # configuration file for split CIFAR-10
│   └── split_cifar100.yaml          # configuration file for split CIFAR-100
├── modules
│   ├── __init__.py
│   ├── CLTrainer.py                 # trainer for continual learning
│   ├── JointTrainer.py              # trainer for multitask learning
│   ├── cnn.py                       # CNN model for split CIFAR-10 and split CIFAR-100
│   ├── mlp.py                       # MLP model for permuted MNIST
│   └── subspace_sgd.py              # SGD optimizer that allows for gradient projection into a subspace
├── notebooks
│   ├── multitask_learning.ipynb     # notebook for multitask learning
│   ├── subspace_sgd_examples.ipynb  # tutorial notebook for subspace-SGD
│   └── visualizations.ipynb         # notebook for visualizations
├── scripts
│   ├── ablation_study.sh            # script to run bulk space ablations
│   └── train.sh                     # script to run main experiments
├── train
│   ├── __init__.py
│   ├── train_permuted_mnist.py      # train script for permuted MNIST
│   ├── train_split_cifar10.py       # train script for split CIFAR-10
│   └── train_split_cifar100.py      # train script for split CIFAR-100
└── utils
    ├── data_utils
    │   ├── continual_dataset.py     # abstract dataset class for continual learning
    │   ├── permuted_mnist.py        # permuted MNIST dataset
    │   └── sequential_CIFAR.py      # sequential CIFAR datasets
    ├── __init__.py
    ├── metrics.py                   # overlap metrics
    ├── reproducibility.py           # utilities for PyTorch reproducibility
    └── wandb_utils.py               # utilities for logging to wandb

The main results of our work, as summarized in Table 1 of the report, can be reproduced by running the scripts/train.sh script:
bash scripts/train.sh

The bulk space ablations can be reproduced by running the scripts/ablation_study.sh script:
bash scripts/ablation_study.sh <dataset_name>

where <dataset_name> is one of cifar10, cifar100, or pmnist.
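For example, bash scripts/ablation_study.sh cifar10 runs the bulk space ablation for split CIFAR-10.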
All visualizations included in the report were created using the notebooks/visualizations.ipynb notebook.
Finally, the multitask oracle baseline can be reproduced by running the notebooks/multitask_learning.ipynb notebook.