This repository is the official PyTorch implementation of GraphGen, a generative graph model based on an auto-regressive model.
Nikhil Goyal, Harsh Vardhan Jain, and Sayan Ranu, GraphGen: A Scalable Approach to Domain-agnostic Labeled Graph Generation, in WWW, 2020.
Most of the code has been adapted from GraphRNN.
We recommend the Anaconda distribution for Python and other packages. The code has been tested with PyTorch 1.2.0 and Python 3.7.0.
Install PyTorch and pip with conda. Change the CUDA version to match your GPU hardware support.
conda install pip pytorch=1.2.0 torchvision cudatoolkit=10.1 -c pytorch
Then install the other dependencies.
pip install -r requirements.txt
Boost (version >= 1.70.0) and OpenMP are required for compiling the C++ binaries. Run the build.sh script in the project's root directory.
./build.sh
Then run the main script.
python3 main.py
- main.py is the main script file, and specific arguments are set in args.py.
- train.py contains the training iteration framework and calls the training files specific to each generative algorithm.
- datasets/preprocess.py and util.py contain preprocessing and utility functions.
- datasets/process_dataset.py reads graphs from various formats.
GraphGen:
- dfscode/dfs_code.cpp calculates the minimum DFS code required by GraphGen. It is adapted from kaviniitm.
- dfscode/dfs_wrapper.py is a Python wrapper for the C++ file.
- graphgen/model.py and graphgen/data.py contain the model and DataLoader class respectively.
- graphgen/train.py contains the core loss evaluation and generation algorithm for GraphGen.
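To make the notion of a DFS code concrete, here is a minimal Python sketch that builds the DFS code of one particular depth-first traversal of a small labeled, undirected graph. It is an illustration only and not the logic of dfscode/dfs_code.cpp: it assumes the gSpan-style convention of 5-tuples (t_u, t_v, L_u, L_e, L_v), it assumes a connected graph without self-loops, and it does not search over traversals for the minimum code. The function name dfs_code, the input format, and the sorted neighbour order are assumptions made for this example.

```python
def dfs_code(node_labels, edges, start):
    """Return the DFS code of one depth-first traversal (not the minimum one).

    node_labels: dict mapping node -> label
    edges: dict mapping frozenset({u, v}) -> edge label (undirected, no self-loops)
    start: node at which the traversal begins
    """
    # Build a sorted adjacency list so the traversal order is deterministic.
    adj = {n: [] for n in node_labels}
    for e in edges:
        u, v = tuple(e)
        adj[u].append(v)
        adj[v].append(u)
    for n in adj:
        adj[n].sort()

    time = {start: 0}  # discovery timestamp of each visited node
    used = set()       # edges already written to the code
    code = []

    def emit(u, v):
        e = frozenset((u, v))
        used.add(e)
        code.append((time[u], time[v], node_labels[u], edges[e], node_labels[v]))

    def visit(u):
        # Backward edges: from u to already-visited nodes, by increasing timestamp.
        back = [v for v in adj[u] if v in time and frozenset((u, v)) not in used]
        for v in sorted(back, key=time.get):
            emit(u, v)
        # Forward edges: discover each still-unvisited neighbour in turn.
        for v in adj[u]:
            if v not in time:
                time[v] = len(time)
                emit(u, v)
                visit(v)

    visit(start)
    return code


# Toy triangle: two "C" nodes and one "O" node, with "s" and "d" edge labels.
labels = {0: "C", 1: "C", 2: "O"}
bonds = {frozenset((0, 1)): "s", frozenset((1, 2)): "s", frozenset((0, 2)): "d"}
triangle_code = dfs_code(labels, bonds, start=0)
# triangle_code == [(0, 1, 'C', 's', 'C'), (1, 2, 'C', 's', 'O'), (2, 0, 'O', 'd', 'C')]
```

A minimum DFS code would additionally compare the codes produced by all valid traversals and keep the smallest one under the DFS lexicographic order, which this sketch leaves out.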
For baseline models:
- We extend the DeepGMG model for labeled graphs, based on DGL (Deep Graph Library). DeepGMG-specific files are contained in the baselines/dgmg/ folder.
- We extend the GraphRNN model for labeled graphs, based on the GraphRNN implementation. GraphRNN-specific code is contained in the baselines/graph_rnn/ folder.
Parameter setting:
- All input arguments and hyperparameter settings are included in args.py.
- Set args.note to specify which generative algorithm (GraphGen, GraphRNN or DeepGMG) to run.
- For example, args.device controls which device (GPU) is used to train the model, and args.graph_type specifies which dataset is used to train the generative model.
- See the documentation in args.py for more detailed descriptions of all fields.
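As a rough illustration of how these fields fit together, the sketch below uses a hypothetical stand-in for the settings object. Only the field names note, device and graph_type and the three algorithm names come from the description above. The class name ArgsSketch, the default values, the "cuda:0" device string, the placeholder dataset name and the select_algorithm helper are assumptions for this example, so check args.py for the actual fields and accepted values.

```python
class ArgsSketch:
    """Hypothetical stand-in for the settings object defined in args.py."""

    def __init__(self, note="GraphGen", device="cuda:0", graph_type="my_dataset"):
        self.note = note              # which generative algorithm to run
        self.device = device          # device used for training (assumed format)
        self.graph_type = graph_type  # dataset to train on (placeholder name)


# Algorithm names as listed in this README; the exact strings are an assumption.
ALGORITHMS = ("GraphGen", "GraphRNN", "DeepGMG")


def select_algorithm(args):
    """Validate args.note and return the chosen algorithm name."""
    if args.note not in ALGORITHMS:
        raise ValueError(f"args.note must be one of {ALGORITHMS}, got {args.note!r}")
    return args.note


chosen = select_algorithm(ArgsSketch(note="GraphRNN", device="cpu"))
```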
There are several different types of outputs, each saved into a different directory under a path prefix. The path prefix is set in args.dir_input. Suppose that this field is set to '':
- tensorboard/ contains TensorBoard event objects which can be used to view training and validation curves in real time.
- model_save/ stores the model checkpoints.
- tmp/ stores all the temporary files generated during training and evaluation.
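The layout above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the repository's own code: the helper names output_dirs and make_output_dirs are hypothetical, and the actual scripts may add further subfolders under each directory.

```python
import os
import tempfile


def output_dirs(prefix):
    """Return the three output directories that live under the given prefix."""
    names = ("tensorboard", "model_save", "tmp")
    return {name: os.path.join(prefix, name) for name in names}


def make_output_dirs(prefix):
    """Create the directories if they do not exist yet and return their paths."""
    dirs = output_dirs(prefix)
    for path in dirs.values():
        os.makedirs(path, exist_ok=True)
    return dirs


# Example with a throwaway prefix so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as prefix:
    created = make_output_dirs(prefix)
    all_exist = all(os.path.isdir(p) for p in created.values())
```

With an empty prefix, os.path.join returns the bare directory name, so the three directories resolve relative to the current working directory.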
- The evaluation is done in evaluate.py, where the user can choose which model to evaluate. Change the ArgsEvaluate class fields accordingly.
- We use the GraphRNN implementation for structural metrics.
- NSPDK is evaluated using the EDeN Python package.
- metrics/isomorph.cpp and metrics/unique.cpp contain C++ function calls to the Boost subgraph isomorphism algorithm to evaluate novelty and uniqueness.
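To give a feel for what novelty and uniqueness measure, here is a small, dependency-free Python sketch. It is not the repository's method: it swaps the Boost subgraph isomorphism calls for a brute-force full-isomorphism check that is only practical for tiny graphs, and the two definitions used here (share of distinct generated graphs, share of generated graphs absent from the training set) are simplifying assumptions. The graph format, with a list of node labels plus a dict of labeled edges, and all function names are hypothetical.

```python
from itertools import permutations


def isomorphic(g1, g2):
    """Brute-force labeled graph isomorphism; only practical for tiny graphs.

    A graph is (labels, edges): labels[i] is the label of node i, and edges
    maps frozenset({u, v}) -> edge label (labels are assumed not to be None).
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    if len(labels1) != len(labels2) or len(edges1) != len(edges2):
        return False
    if sorted(labels1) != sorted(labels2):
        return False
    n = len(labels1)
    for perm in permutations(range(n)):
        # perm maps node i of g1 to node perm[i] of g2.
        if any(labels1[i] != labels2[perm[i]] for i in range(n)):
            continue
        if all(edges2.get(frozenset(perm[x] for x in e)) == lab
               for e, lab in edges1.items()):
            return True
    return False


def uniqueness(generated):
    """Fraction of generated graphs that are distinct up to isomorphism."""
    representatives = []
    for g in generated:
        if not any(isomorphic(g, r) for r in representatives):
            representatives.append(g)
    return len(representatives) / len(generated)


def novelty(generated, training):
    """Fraction of generated graphs not isomorphic to any training graph."""
    novel = [g for g in generated if not any(isomorphic(g, t) for t in training)]
    return len(novel) / len(generated)


# Toy data: g_b is g_a with its two nodes swapped, g_c is a different graph.
g_a = (["C", "O"], {frozenset((0, 1)): "s"})
g_b = (["O", "C"], {frozenset((0, 1)): "s"})
g_c = (["C", "C"], {frozenset((0, 1)): "s"})
uniq_score = uniqueness([g_a, g_b, g_c])      # 2 distinct graphs out of 3
nov_score = novelty([g_a, g_b, g_c], [g_c])   # g_a and g_b are not in training
```

Both functions assume a non-empty list of generated graphs.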
To evaluate, run
python3 evaluate.py