Dataset of known vulnerabilities in the Mozilla Firefox project.
Cite as:
@article{yu2018improving,
title={Improving Vulnerability Inspection Efficiency Using Active Learning},
author={Yu, Zhe and Theisen, Christopher and Williams, Laurie and Menzies, Tim},
journal={arXiv preprint arXiv:1803.06545},
year={2018}
}
Each row in vulnerabilities.csv corresponds to a bug report that human reviewers classified as security vulnerability-related.
Mapping from the vulnerability types in vulnerabilities.csv to the categories in the paper:
{
 'arbitrary-code': 'Protection Mechanism Failure',
 'injection': 'Protection Mechanism Failure',
 'Code - Security Features - Protection Mechanism Failure': 'Protection Mechanism Failure',
 'cross-site-scripting': 'Protection Mechanism Failure',
 'Code - Resource Management Error - Improper Resource Shutdown or Release': 'Resource Management Errors',
 'data-leakage': 'Resource Management Errors',
 'use-after-free': 'Resource Management Errors',
 'Code - Resource Management Error - Uncontrolled Resource Consumption': 'Resource Management Errors',
 'Code - Resource Management Error': 'Resource Management Errors',
 'spoofing': 'Resource Management Errors',
 'Code - Resource Management Error - Use After Free': 'Resource Management Errors',
 'denial-of-service': 'Resource Management Errors',
 'Code - Data Processing': 'Data Processing Errors',
 'memory-corruption': 'Data Processing Errors',
 'buffer-overflow': 'Data Processing Errors',
 'exploitable-crash': 'Data Processing Errors',
 'Code - Code Quality': 'Code Quality',
 'Configuration': 'Other',
 'Environment': 'Other',
 'Code - Traversal - Link Following': 'Other',
 'Code - Time and State - Race Conditions': 'Other',
 'privilege-escalation': 'Other',
 'Code - Traversal': 'Other',
 '?': 'Other'
}
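A minimal sketch of how the mapping above might be applied in Python to count bug reports per paper category. The column name "type", the helper count_categories, and the in-memory sample rows are assumptions made for illustration; they are not taken from vulnerabilities.csv, whose actual header should be checked first. Only an excerpt of the mapping is repeated here; the full dict above can be pasted in its place.

```python
import csv
import io
from collections import Counter

# Excerpt of the mapping listed above; substitute the full dict for real use.
TYPE_TO_CATEGORY = {
    "use-after-free": "Resource Management Errors",
    "spoofing": "Resource Management Errors",
    "buffer-overflow": "Data Processing Errors",
    "cross-site-scripting": "Protection Mechanism Failure",
    "?": "Other",
}


def count_categories(rows, mapping, type_column="type"):
    """Count rows per paper category.

    `rows` is an iterable of dicts, e.g. from csv.DictReader.
    `type_column` is a hypothetical column name (an assumption).
    A type missing from `mapping` raises KeyError rather than
    being silently assigned to a category.
    """
    return Counter(mapping[row[type_column]] for row in rows)


# Made-up stand-in for vulnerabilities.csv, used only to keep the
# example self-contained; it does not reflect the real file's contents.
sample = io.StringIO("type\nuse-after-free\nbuffer-overflow\n?\nspoofing\n")
counts = count_categories(csv.DictReader(sample), TYPE_TO_CATEGORY)

for category, n in sorted(counts.items()):
    print(category, n)
```

To run this against the real file, the io.StringIO stand-in would be replaced with open("vulnerabilities.csv", newline="") and type_column set to whatever the file's header actually calls the vulnerability type.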
The snapshot was taken from the main branch of the Mercurial repository on November 21, 2017.
Each row in the Combined data corresponds to a source file. The independent variables are the file's crash counts, software metrics, and source code; the dependent variable is the set of vulnerability categories the file contains. This data alone is sufficient to reproduce the results of the paper.