GC-Bench is an open and unified benchmark for Graph Condensation (GC) based on PyTorch and PyTorch Geometric. It covers 12 state-of-the-art graph condensation algorithms on node-level and graph-level tasks and analyzes their performance on 12 distinct graph datasets.
GC-Bench is a comprehensive Graph Condensation Benchmark designed to systematically analyze the performance of graph condensation methods in various scenarios. It examines the effectiveness, transferability, and complexity of graph condensation. We evaluate 12 state-of-the-art graph condensation algorithms on both node-level and graph-level tasks across 12 diverse graph datasets. Through benchmarking these GC algorithms, we make the following contributions:
- Comprehensive Benchmark. GC-Bench systematically integrates 12 representative and competitive GC methods on both node-level and graph-level tasks through unified condensation and evaluation, providing a thorough analysis in terms of effectiveness, transferability, and efficiency.
- Multi-faceted Evaluation and Analysis. We conduct a detailed evaluation of GC methods, examining their effectiveness, efficiency, and complexity. This analysis uncovers the strengths and limitations of current GC algorithms, offering insights for future research.
- Open-sourced Benchmark Library. GC-Bench is open-sourced and easy to extend with new methods and datasets. This facilitates further exploration and encourages reproducible research, helping to advance the field of graph condensation.
To get started with GC-Bench, please follow the instructions below:
- Installation

    git clone https://github.com/RingBDStack/GC-Bench.git
    cd GC-Bench
    pip install -r requirements.txt
    conda env create -f environment.yml
- Download Datasets

Download the node classification and graph classification datasets and store them in the specified directory. By default, this is the data directory, but you can customize it by changing the data_dir parameter in your configuration. The project structure should look like the following:

    GC-Bench
    ├── data
    │   ├── cora
    │   ├── citeseer
    │   └── ...
    └── DM
        └── ...

Alternatively, you can leverage PyG to download and manage these datasets directly, eliminating the need to manually place them in the data directory.
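If you place the datasets manually, it can help to confirm that the expected folders exist before launching a run. The following is a minimal sketch and is not part of GC-Bench: `missing_datasets` is a hypothetical helper, and it assumes each dataset sits in a subfolder of the data directory named after the dataset, as in the layout above.

```python
from pathlib import Path


def missing_datasets(data_dir, names):
    """Return the dataset names that have no subfolder under data_dir.

    Hypothetical helper for illustration; GC-Bench does not ship it.
    """
    root = Path(data_dir)
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    # "data" is the default directory named in this README.
    absent = missing_datasets("data", ["cora", "citeseer"])
    if absent:
        print("Missing dataset folders:", ", ".join(absent))
    else:
        print("All expected dataset folders are present.")
```

A missing folder is not necessarily an error, since PyG may download the dataset on first use.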
The graph condensation methods are grouped by family (gradient matching, distribution matching, kernel ridge regression, etc.), and each family is run from its corresponding directory.
For example, to run the Distribution Matching (DM) method, use the following command:
python DM/main.py --dataset=citeseer --epochs=2000 --gpu_id=0 --lr_adj=0.001 --lr_feat=0.01 --lr_model=0.1 --method=GCDM --nlayers=2 --outer=10 --reduction_rate=1 --save=1 --seed=1 --transductive=1
To run the Gradient Matching (GM) method for node classification, use the following command:
python GM/main_nc.py --dataset cora --transductive=1 --nlayers=2 --sgc=1 --lr_feat=1e-4 --lr_adj=1e-4 --r=0.5 --seed=1 --epoch=600 --save=1
To run the Gradient Matching (GM) method for graph classification, use the following command:
python GM/main_gc.py --dataset ogbg-molhiv --init real --nconvs=3 --dis=mse --lr_adj=0.01 --lr_feat=0.01 --epochs=1000 --eval_init=1 --net_norm=none --pool=mean --seed=1 --ipc=5 --save=1
Parameters can also be set in configuration files. To run experiments using a configuration file, use the following command:
python GM/main_nc.py --config config_DosCond --section DBLP-r0.250
This command will run the corresponding experiments with the parameters specified in the configuration file. The provided configuration files contain the parameters used to obtain the results presented in our benchmark.
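The `--config` and `--section` flags suggest a file divided into named sections. As an illustration only, the sketch below assumes an INI-style layout read with Python's standard `configparser`. The actual GC-Bench configuration format may differ, `load_section` is a hypothetical helper, and the values shown are placeholders rather than the benchmark's tuned settings.

```python
import configparser

# Illustrative config text. The keys reuse flags from the GM command above,
# but the values are placeholders, not the benchmark's tuned settings.
EXAMPLE_CONFIG = """
[DBLP-r0.250]
dataset = DBLP
r = 0.25
lr_feat = 1e-4
lr_adj = 1e-4
nlayers = 2
"""


def load_section(config_text, section):
    """Return one section of an INI-style config as a dict of strings.

    Hypothetical helper; assumes an INI layout, which may not match GC-Bench.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(section):
        raise KeyError(f"section {section!r} not found")
    return dict(parser[section])


if __name__ == "__main__":
    params = load_section(EXAMPLE_CONFIG, "DBLP-r0.250")
    for key, value in params.items():
        print(f"--{key}={value}")
```

Values come back as strings, so a caller would still convert them (for example with `float` or `int`) before use.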
For evaluation on different architectures, you can simply run:
python baselines/test_nc.py --method ${method} --dataset cora --gpu_id=0 --r=0.5 --nruns=5
Replace ${method} with the specific condensation method you used.
For evaluation on different tasks, you can simply run:
python evaluator/test_other_tasks.py --method ${method} --dataset cora --gpu_id=0 --r=0.5 --seed=1 --nruns=5 --task=LP
Replace ${method} with the specific condensation method you used. The --task parameter can be set to LP for link prediction, AD for anomaly detection, etc.
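To evaluate one condensed graph on several tasks in a row, the command above can be wrapped in a short script. This is a sketch under stated assumptions: `build_eval_command` and `run_all` are hypothetical helpers, they only reuse the flags shown above, and the script is assumed to be launched from the repository root.

```python
import subprocess
import sys


def build_eval_command(method, task, dataset="cora", r=0.5, gpu_id=0,
                       seed=1, nruns=5):
    """Build the argument list for evaluator/test_other_tasks.py.

    Hypothetical helper; it only assembles flags shown in this README.
    """
    return [
        sys.executable, "evaluator/test_other_tasks.py",
        "--method", method,
        "--dataset", dataset,
        f"--gpu_id={gpu_id}",
        f"--r={r}",
        f"--seed={seed}",
        f"--nruns={nruns}",
        f"--task={task}",
    ]


def run_all(method, tasks=("LP", "AD")):
    """Run the evaluation once per task, stopping at the first failure."""
    for task in tasks:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(build_eval_command(method, task), check=True)
```

Pass `run_all` the name of the condensation method you actually ran. Building the command as a list rather than a single string avoids shell quoting issues.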
Summary of Graph Condensation (GC) algorithms. We also provide public access to the official algorithm implementations. "KRR" is short for Kernel Ridge Regression and "CTC" is short for computation tree compression. "GNN" is short for Graph Neural Network, "GNTK" is short for Graph Neural Tangent Kernel, "SD" is short for Spectral Decomposition. "NC" is short for node classification, "LP" is short for link prediction, "AD" is short for anomaly detection, and "GC" is short for graph classification.