This dataset consists of vulnerable code snippets in C and C++ collected from repositories on GitHub. It is built from processed versions of four publicly available datasets that are widely used to train vulnerability detection models: Big-Vul, D2A, Devign and Juliet. The original datasets were deduplicated and made consistent to form this dataset.
The paper accompanying this dataset is "Data Quality for Software Vulnerability Datasets".
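
As a rough illustration of the step mentioned above, in which samples from the source datasets are made unique, the sketch below removes snippets that differ only in whitespace. It is not the procedure used to build this dataset, which is not described here. The whitespace normalization, the SHA-256 key, and the `code` and `source` field names are all assumptions made for the example.

```python
import hashlib
import re


def normalize(code: str) -> str:
    """Collapse every whitespace run to one space (an assumed normalization)."""
    return re.sub(r"\s+", " ", code).strip()


def deduplicate(samples: list[dict]) -> list[dict]:
    """Keep the first sample seen for each distinct normalized snippet.

    Each sample is a dict with a hypothetical "code" field holding the
    function source as a string.
    """
    seen: set[str] = set()
    unique = []
    for sample in samples:
        key = hashlib.sha256(normalize(sample["code"]).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(sample)
    return unique


if __name__ == "__main__":
    # Toy records; the "source" values only echo the dataset names above.
    samples = [
        {"code": "int f(char *s) {\n    return strlen(s);\n}", "source": "Devign"},
        {"code": "int f(char *s) { return strlen(s); }", "source": "Big-Vul"},
        {"code": "void g(void) {}", "source": "Juliet"},
    ]
    print(len(deduplicate(samples)))  # prints 2
```

Exact-match hashing of this kind only catches formatting-level duplicates. Near-duplicates, such as functions with renamed variables, would need a different technique.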