This is the PyTorch implementation for Bonsai: Gradient-free Graph Distillation for Node Classification
For Cora, Citeseer, and Pubmed, the code downloads the datasets directly from PyTorch Geometric. For Flickr, Ogbn-arxiv, and Reddit, we use the datasets provided by GraphSAINT. They are available on this Google Drive link provided by the GraphSAINT team. Download the files and unzip them into the `datasets` directory at the repository root.
- Please install dependencies from `requirements.txt` in `python==3.9.19`.
- You can directly run `bash run.sh 0`. Here, `0` is the index of the GPU to run on. If no index is passed, the code runs on CPU.
- The outputs will be saved to the `saved_ours` directory. It also contains the `train_bonsai.py` script along with `train_all.sh` to train a `GCN`, `GAT`, and `GIN` on the saved outputs to get the results.
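As an illustration of the GPU-index convention described above, here is a minimal sketch of how an optional index argument could map to a device. The function name `resolve_device` and the argument handling are hypothetical and are not taken from `run.sh` or `main.py`; only the `cuda:<index>` / `cpu` device-string convention is standard PyTorch.

```python
def resolve_device(args):
    """Map an optional GPU index (first CLI argument) to a device string.

    Hypothetical helper: the repository's own scripts may do this differently.
    """
    if args and args[0].strip():
        gpu_index = int(args[0])  # e.g. "0" from `bash run.sh 0`
        if gpu_index < 0:
            raise ValueError("GPU index must be non-negative")
        return f"cuda:{gpu_index}"
    # No index passed: fall back to CPU, as described above.
    return "cpu"


print(resolve_device(["0"]))  # GPU 0 requested
print(resolve_device([]))     # no index, so CPU
```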
The code does not perform sampling by default. To enable it, pass a value satisfying 0 < `frac_to_sample` < 1 to the `repr_to_dist` function at Ln. 610 of `main.py`.
We override the default train/val/test split in Ln. 545-550 of main.py for all datasets. Specifically,
train, idx_test = train_test_split(range(nnodes), test_size=0.2, random_state=42)
rng = np.random.RandomState(seed=0)
idx_train = rng.choice(train, size=int(0.7 * len(train)), replace=False)
idx_val = list(set(range(nnodes)) - set(idx_train).union(set(idx_test)))
splits = {"train": idx_train, "val": idx_val, "test": idx_test}

Please refer to the README in the mag240m directory for the corresponding instructions.
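To make the effective proportions of the train/val/test override concrete, the sketch below computes the split sizes arithmetically, without loading any data. It assumes that `train_test_split` with a float `test_size` rounds the test share up and leaves the remainder as the training pool, which is our understanding of scikit-learn's behavior. The helper `split_sizes` is hypothetical, and 2708 is used as the node count of the standard Cora graph.

```python
import math


def split_sizes(nnodes, test_size=0.2, train_frac=0.7):
    """Return the train/val/test node counts implied by the split override."""
    # Assumption: the test share is rounded up; the rest forms the training pool.
    n_test = math.ceil(test_size * nnodes)
    n_pool = nnodes - n_test
    # idx_train keeps int(0.7 * len(train)) nodes drawn from the pool.
    n_train = int(train_frac * n_pool)
    # idx_val is every node that is in neither idx_train nor idx_test.
    n_val = nnodes - n_train - n_test
    return {"train": n_train, "val": n_val, "test": n_test}


sizes = split_sizes(2708)
print(sizes)  # {'train': 1516, 'val': 650, 'test': 542}
```

Under these assumptions the override yields roughly a 56% / 24% / 20% train/val/test split, since the training set is 70% of the 80% that remains after the test nodes are held out.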
@inproceedings{
gupta2025bonsai,
title={Bonsai: Gradient-free Graph Condensation for Node Classification},
author={Mridul Gupta and Samyak Jain and Vansh Ramani and Hariprasad Kodamana and Sayan Ranu},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=5x88lQ2MsH}
}