Hi, can anyone point me to prior evaluation work for Bandit that reports how accurately it detects vulnerabilities? For example, this would consist of running Bandit against a dataset of known-vulnerable Python code and counting the true/false positives and true/false negatives. I understand this could be quite difficult given the lack of reference datasets for testing security flaws in Python. I haven't come across any dataset specifically for Python, along the lines of the NIST Software Assurance Reference Dataset (SARD) for C/C++/Java. I also could not find any assessment in the documentation reporting evaluation statistics for Bandit, such as precision and recall. I was hoping someone could point me in the direction of an analysis of this tool. Thanks!
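To make the kind of evaluation I mean concrete, here is a minimal sketch, assuming a hypothetical hand-labeled ground-truth set (the file names and line numbers below are made up for illustration) and Bandit's JSON report, whose `results`, `filename` and `line_number` fields I believe the JSON formatter emits:

```python
import json

# Hypothetical ground truth: (filename, line) pairs known to be vulnerable.
# In practice this would come from a labeled dataset, which is what's missing.
ground_truth = {
    ("app/db.py", 42),
    ("app/views.py", 108),
}

# Findings from `bandit -r app -f json -o bandit.json`
with open("bandit.json") as fh:
    report = json.load(fh)

findings = {(r["filename"], r["line_number"]) for r in report["results"]}

tp = len(findings & ground_truth)   # flagged and actually vulnerable
fp = len(findings - ground_truth)   # flagged but not in the ground truth
fn = len(ground_truth - findings)   # vulnerable but missed by Bandit

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(f"precision={precision:.2f} recall={recall:.2f}")
```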
Replies: 1 comment
It's an interesting idea, but I haven't seen any effort on that. Bandit does assign a confidence level to each check it performs. That can give you a rough estimate of how likely a given finding is to be a false positive/negative.
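For example, the confidence also shows up in the JSON report, so you could group findings by it when estimating likely false positives. A rough sketch, assuming the `issue_confidence` field that Bandit's JSON formatter attaches to each result:

```python
import json
from collections import Counter

# Findings from `bandit -r project -f json -o bandit.json`
# (the CLI can also pre-filter by confidence, e.g. `-iii` for HIGH only).
with open("bandit.json") as fh:
    report = json.load(fh)

# Tally findings by the confidence Bandit attaches to each result.
by_confidence = Counter(r["issue_confidence"] for r in report["results"])
print(by_confidence)  # e.g. Counter({'HIGH': 12, 'MEDIUM': 3, 'LOW': 1})
```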