Partial Implementation of Devign

This is a partial implementation of Devign, a graph neural network (GNN) model that identifies vulnerabilities in functions written in C. See the paper (listed under Citation) and the dataset.

Installation

Python Packages

From requirements.txt

One option is to install from requirements.txt using conda: conda create --name <env_name> --file requirements.txt -c conda-forge -c pytorch -c rusty1s

Manually

The following (and dependencies) will be required:

PyTorch
PyTorch Geometric
Natural Language Toolkit (NLTK)
PyTorch Lightning
TorchMetrics
Gensim
NumPy
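
After a manual install, a quick check that each package is importable can save a failed run later. The sketch below is not part of the project. It assumes the usual top-level import names for the packages listed above (for example torch_geometric for PyTorch Geometric) and only tests whether each module can be located, not whether the versions are compatible.

```python
from importlib.util import find_spec

# Import names assumed for the packages listed above (not taken from the project).
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "PyTorch Geometric": "torch_geometric",
    "Natural Language Toolkit": "nltk",
    "PyTorch Lightning": "pytorch_lightning",
    "TorchMetrics": "torchmetrics",
    "Gensim": "gensim",
    "NumPy": "numpy",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the display names of packages whose top-level module cannot be found."""
    return [name for name, module in modules.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it inside the environment you intend to use for training; an empty result means every module was found.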

Joern

Joern is the tool used to generate code property graphs; it can be installed according to the Joern installation instructions. Data preparation first checks for an installation in <project root>/joern/, i.e. it expects the Joern executable at <project root>/joern/joern/joern-cli/joern. If Joern is not installed there, ~/bin/joern/, the default Joern installation location, is searched instead.
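
The lookup order described above can be sketched as follows. This is an illustration, not the project's actual code: find_joern is a hypothetical helper, and the joern-cli/joern layout under ~/bin/joern/ is an assumption about the default Joern installation.

```python
from pathlib import Path


def find_joern(project_root, home=None):
    """Return the first existing Joern executable, or None if neither location has one."""
    home = Path(home) if home is not None else Path.home()
    candidates = [
        # Project-local install, as described in the text above.
        Path(project_root) / "joern" / "joern" / "joern-cli" / "joern",
        # Assumed layout of the default install under ~/bin/joern/.
        home / "bin" / "joern" / "joern-cli" / "joern",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Calling find_joern(".") from the project root returns the project-local executable when it exists and falls back to the home-directory location otherwise.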

Instructions

Setup

Upon first pulling the project, run python main.py setup to set up project directories that will be used for storing unpacked data and embedding models.

Running Processes

Run python main.py run -h for help. A sample dataset has been split off of the main dataset to verify that things are set up properly and to assess runtime. Training on this dataset uses fewer epochs.

Running Joern and training Devign on the full dataset takes a few hours on an RTX 2070 Super.

Preparing The Data

Data preparation unpacks the dataset and runs Joern to create the graphs. Run python main.py run sample prepare for the sample data or python main.py run full prepare for the full dataset. Preparation can also be combined with training by running the model with --rebuild.

Running The Model

If the corresponding data has been prepared, run python main.py run sample model flat or python main.py run full model flat for the baseline model described in the paper, or python main.py run sample model devign for the Devign model. Alternatively, you can run python main.py run sample model flat --rebuild to both prepare the sample data and run the model.
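
When scripting several runs, it can help to build these command lines programmatically. The sketch below is a hypothetical wrapper, not part of the project: build_command and run are illustrative names, and the only arguments it emits are the ones shown in this README (sample or full, prepare, model flat or model devign, and --rebuild).

```python
import subprocess
import sys

DATASETS = ("sample", "full")
MODELS = ("flat", "devign")


def build_command(dataset, model=None, rebuild=False):
    """Build the main.py argument list for preparing data (model=None) or running a model."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset must be one of {DATASETS}")
    command = [sys.executable, "main.py", "run", dataset]
    if model is None:
        # This helper attaches --rebuild only to model runs, the one use the README describes.
        if rebuild:
            raise ValueError("rebuild is only supported together with a model")
        command.append("prepare")
    else:
        if model not in MODELS:
            raise ValueError(f"model must be one of {MODELS}")
        command += ["model", model]
        if rebuild:
            command.append("--rebuild")
    return command


def run(dataset, model=None, rebuild=False):
    """Run the command from the project root and raise if it exits with an error."""
    subprocess.run(build_command(dataset, model, rebuild), check=True)
```

For example, run("sample", "flat", rebuild=True) corresponds to python main.py run sample model flat --rebuild, and run("full") corresponds to python main.py run full prepare.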

Citation

@inproceedings{NEURIPS2019_49265d24,
  author = {Zhou, Yaqin and Liu, Shangqing and Siow, Jingkai and Du, Xiaoning and Liu, Yang},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
  pages = {},
  publisher = {Curran Associates, Inc.},
  title = {Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks},
  url = {https://proceedings.neurips.cc/paper/2019/file/49265d2447bc3bbfe9e76306ce40a31f-Paper.pdf},
  volume = {32},
  year = {2019}
}
