VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements (SANER 2022)

This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements. Our model combines graph-based and sequence-based neural networks to capture both the local and the global context of a program graph and to learn code semantics and vulnerable patterns. This work was done by researchers from Columbia University and IBM Research.

Updates

Nov 2023: To make VELVET easier for PyTorch users to adopt and customize, and to let it integrate the latest deep-learning techniques, we are re-implementing it in PyTorch, building on the latest pre-trained code LM checkpoints and GNN architectures. Please check VELVET-PyTorch.

Data

This paper considers two datasets as the main resources for the evaluation:

Juliet Test Suite for C/C++
IBM D2A Dataset. Our processed function-level dataset can be found here.

Approach

Graph-based neural networks are effective at understanding the semantic order of programs, since they directly learn control flows and data dependencies through pre-defined edges. However, training relies on a message passing algorithm in which nodes only communicate with their neighbors. The ability to learn long-range dependencies is therefore limited by the number of message passing iterations, which is typically set to a small number (e.g., fewer than eight) due to computational cost. This limitation results in an inherently local model. In contrast, a Transformer allows global, program-wise information aggregation, and without pre-defined edges, its self-attention mechanism is expected to encode considerable code semantics, which can be complementary to those defined explicitly by the code graph. Therefore, to learn the diversity of vulnerable patterns, we train these two distinct models separately and combine their predictions in an ensemble learning setting at inference time.
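
To make the inference-time ensemble concrete, here is a minimal Python sketch. It assumes each model has already produced one vulnerability probability per statement of a function, and it blends the two lists with a weighted average before picking the highest-scoring statement. The weighted average, the function names `ensemble_scores` and `locate_vulnerable_statement`, the `gnn_weight` parameter, and the example numbers are all illustrative assumptions, not the combination rule or API of the released implementation.

```python
from typing import List, Sequence


def ensemble_scores(gnn_probs: Sequence[float],
                    transformer_probs: Sequence[float],
                    gnn_weight: float = 0.5) -> List[float]:
    """Blend per-statement probabilities from two separately trained models.

    Assumption: a weighted average is used here for illustration only.
    """
    if len(gnn_probs) != len(transformer_probs):
        raise ValueError("both models must score the same statements")
    if not 0.0 <= gnn_weight <= 1.0:
        raise ValueError("gnn_weight must be in [0, 1]")
    return [gnn_weight * g + (1.0 - gnn_weight) * t
            for g, t in zip(gnn_probs, transformer_probs)]


def locate_vulnerable_statement(scores: Sequence[float]) -> int:
    """Return the index of the statement with the highest blended score."""
    if not scores:
        raise ValueError("no statements to rank")
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Hypothetical scores for a function with four statements.
    gnn = [0.10, 0.60, 0.20, 0.05]
    transformer = [0.05, 0.30, 0.90, 0.10]
    combined = ensemble_scores(gnn, transformer)
    print("blended scores:", combined)
    print("predicted statement index:", locate_vulnerable_statement(combined))
```

In this toy input the two models disagree on the most suspicious statement, and the blended ranking depends on `gnn_weight`, which is the kind of complementarity the ensemble is meant to exploit.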

Our implementation for the model can be found here.

Citation

@inproceedings{ding2022velvet,
author = {Y. Ding and S. Suneja and Y. Zheng and J. Laredo and A. Morari and G. Kaiser and B. Ray},
booktitle = {2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)},
title = {VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements},
year = {2022},
issn = {1534-5351},
pages = {959-970},
keywords = {location awareness;codes;neural networks;static analysis;software;data models;security},
doi = {10.1109/SANER53432.2022.00114},
url = {https://doi.ieeecomputersociety.org/10.1109/SANER53432.2022.00114},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {mar}
}
