This repository is meant as a starting point for your own GNN research projects. It allows you to tune, train, and evaluate basic models on well-known graph datasets.
Projects based on this repository:
- Expressivity-Preserving GNN Simulation, NeurIPS, 2023: paper, code
- Expectation-Complete Graph Representations with Homomorphisms, ICML, 2023: paper, code
- Weisfeiler and Leman Return with Graph Transformations, MLG@ECMLPKDD, 2022: paper, code
- Reducing Learning on Cell Complexes to Graphs, GTRL@ICLR, 2022: paper, code
If you find this repository helpful please give it a ⭐.
Models:
- Message Passing Graph Neural Networks: `GIN`, `GCN`, `GAT`
- Equivariant Subgraph Aggregation Networks: `DS`, `DSS`
- Multilayer perceptron that ignores the graph structure: `MLP`
Datasets:
- `ZINC`
- `CSL`: please use cross-validation for this dataset
- OGB datasets: `ogbg-molhiv`, `ogbg-moltox21`, `ogbg-molesol`, `ogbg-molbace`, `ogbg-molclintox`, `ogbg-molbbbp`, `ogbg-molsider`, `ogbg-moltoxcast`, `ogbg-mollipo`
- Long Range Graph Benchmark datasets: `Peptides-struct`, `Peptides-func`, `PascalVOC-SP`
- QM9: `QM9`, or `QM9_i` if you only want to predict the i-th property (see the example after this list)
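For example, with the training script described below, selecting a single QM9 target might look like this (whether the property index is 0-based is an assumption):
```
# Train GIN on a single QM9 target (here the property with index 3; 0-based indexing is an assumption)
python Exp/run_model.py --model GIN --dataset QM9_3
```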
Clone this repository and open the directory:
```
git clone https://github.com/ocatias/BasicGNNProject
cd BasicGNNProject
```
Add this directory to the Python path. Let `$PATH` be the path to where this repository is stored (i.e. the result of running `pwd`):
```
export PYTHONPATH=$PYTHONPATH:$PATH
```
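If you do not want to re-run this export in every new shell, you can optionally persist it; the snippet below is a sketch that assumes bash and is run from the repository directory:
```
# Optional sketch (assumes bash): persist the PYTHONPATH change in ~/.bashrc
echo "export PYTHONPATH=\$PYTHONPATH:$(pwd)" >> ~/.bashrc
```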
Create a conda environment (this assumes miniconda is installed):
```
conda create --name GNNs
```
Activate the environment:
```
conda activate GNNs
```
Install the dependencies:
```
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 -c pytorch
conda install -c pyg pyg=2.2.0
pip install -r requirements.txt
```
By default, experiments are tracked via wandb; this can be disabled in `Configs/config.yaml`. If you want to make use of this tracking, you need a wandb account: the first time you train a model, you will be prompted to enter your wandb API key.
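As a hypothetical sketch of what disabling tracking might look like (the real option name in `Configs/config.yaml` may differ):
```
# Hypothetical sketch — check Configs/config.yaml for the actual option name
use_tracking: False
```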
To train a GNN `$GNN` once on a dataset `$dataset`, run
```
python Exp/run_model.py --model $GNN --dataset $dataset
```
For example, `python Exp/run_model.py --model GIN --dataset ZINC` trains the GNN GIN on the ZINC dataset a single time. The results of the training are shown in the terminal. The GNN's hyperparameters can be set via command-line parameters; for more details, call `python Exp/run_model.py -h`.
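As a sketch of a run with explicit hyperparameters (the flag names beyond `--model` and `--dataset` are illustrative assumptions; `-h` lists the real ones):
```
# Flag names beyond --model/--dataset are assumptions — run `python Exp/run_model.py -h` for the real ones
python Exp/run_model.py --model GIN --dataset ZINC --lr 0.001 --epochs 100
```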
The script `Exp/run_experiment.py` optimizes hyperparameters over a parameter grid and then evaluates the parameters with the best validation performance multiple times. For example:
```
python Exp/run_experiment.py -grid Configs/Benchmark/GIN_grid.yaml -dataset ogbg-molesol --candidates 20 --repeats 10
```
This command tries 20 hyperparameter configurations defined in the `GIN_grid.yaml` config on the `ogbg-molesol` dataset and evaluates the best parameters 10 times. The results of these experiments are stored in the directory `Results/ogbg-molesol_GIN_grid.yaml`; the averages over the best parameters are stored in `final.json`. If you have a dataset that requires cross-validation (e.g. `CSL`), then you also need to set the number of folds (for example `--folds 10`); see the sketch after this paragraph.
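As a rough sketch of what a parameter grid file might contain (the keys below are assumptions; `Configs/Benchmark/GIN_grid.yaml` defines the actual format):
```
# Hypothetical grid sketch — the real keys are defined by the configs in Configs/Benchmark/
lr: [0.001, 0.0001]
emb_dim: [64, 128]
num_layers: [3, 5]
```
A cross-validation run on `CSL` then combines the flags shown above:
```
python Exp/run_experiment.py -grid Configs/Benchmark/GIN_grid.yaml -dataset CSL --candidates 20 --repeats 10 --folds 5
```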
As `Exp/run_model.py` allows setting model hyperparameters from the command line, we can use WandB sweeps to optimize hyperparameters. In short: specify the parameters and the script to run in a config file (see `Configs/WandB_grids/example_grid.yaml`). The sweep can then be initialized with
```
wandb sweep Configs/WandB_Grids/example_grid.yaml
```
This command will tell you the command needed to join agents to the sweep. You can even join agents on different computers to the same sweep! Sweeps can also be initialized purely from scripts. More details on sweeps can be found here.
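As a sketch of such a config in the standard WandB sweep format (the swept parameters are assumptions; the repository's real example is `Configs/WandB_grids/example_grid.yaml`):
```
# Hypothetical sketch in the standard wandb sweep format — the swept parameters are assumptions
program: Exp/run_model.py
method: grid
parameters:
  model:
    value: GIN
  dataset:
    value: ZINC
  lr:
    values: [0.001, 0.0001]
```
Agents then join with `wandb agent <sweep_id>`, where the sweep ID is printed by `wandb sweep`.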
To run the integration tests:
```
python -m unittest
```
Models
- GIN: How Powerful are Graph Neural Networks?; Xu et al.; ICLR 2019
- GCN: Semi-Supervised Classification with Graph Convolutional Networks; Kipf and Welling; ICLR 2017
- GAT: Graph Attention Networks; Veličković et al.; ICLR 2018
- DS and DSS: Equivariant Subgraph Aggregation Networks; Bevilacqua et al.; ICLR 2022
Datasets
- ZINC: Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules; Gómez-Bombarelli et al.; ACS Central Science 2018
- ZINC: ZINC 15 – Ligand Discovery for Everyone; Sterling and Irwin; Journal of Chemical Information and Modeling 2015
- CSL: Relational Pooling for Graph Representations; Murphy et al.; ICML 2019
- OGB: Open Graph Benchmark: Datasets for Machine Learning on Graphs; Hu et al.; NeurIPS 2020
- Long Range Graph Benchmark: Long Range Graph Benchmark; Dwivedi et al.; NeurIPS 2022
- QM9: MoleculeNet: A Benchmark for Molecular Machine Learning; Wu et al.; Chemical Science 2018