Presentation
k----n edited this page Dec 5, 2020
- How widespread are cloned files that contain known vulnerabilities?
- Perhaps go over the algorithm in https://github.com/woc-hack/hemlock/blob/master/program/README.md?
- The ob2b mapping did not exist; Dr. Mockus added it.
- Uncertain whether to use c2P or c2p; determined c2p was best.
- The sheer volume of data makes it hard to get results in a reasonable time. Spent some time on performance issues; adding some parallel processing gave the biggest improvement.
- Go over the output of the find_cloned_files script at https://woc-hack.github.io/hemlock/
- Chris' observations about labapart/polymcu: it has not been updated for a while, but since commits stopped, people have come along and expressed interest, suggesting that they are picking up and using this code without realizing there is a vulnerability: https://github.com/labapart/polymcu/issues/6
- There are 5000+ forks of QEMU; we identified 21 that were NOT vulnerable.
- Some forks of QEMU are very old, like http://github.com/jeffreymingyue/qemu, which is "30957 commits behind qemu:master". This suggests that forks with no extra commits beyond the QEMU project may be lower risk: they were a one-time fork and ignored after that.
- 572 projects copied QEMU code before this fix and redistribute it, for example in a subdirectory. Example: a Georgia Tech project, https://github.com/sslab-gatech/opensgx, "OpenSGX: An open platform for Intel SGX", was last edited in 2016 and has this bug. A comment this fall (https://github.com/sslab-gatech/opensgx/issues/67) proposes restarting the project. Newer forks, such as gutjuri/opensgx (last edited June 2020), contain the bug.
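
The parallel-processing speedup mentioned above can be illustrated with a minimal sketch. It is not the project's actual code: `lookup_projects` is a hypothetical stand-in for a slow per-blob lookup, and the use of a thread pool assumes the work is dominated by waiting on I/O rather than CPU.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def lookup_projects(blob_id: str) -> list[str]:
    """Hypothetical stand-in for a slow per-blob lookup."""
    # Placeholder logic only; a real lookup would query an external data source.
    return [f"project-for-{blob_id}"]


def lookup_all(blob_ids: list[str], workers: int = 8) -> dict[str, list[str]]:
    """Run the per-blob lookups concurrently and collect results by blob id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each id with its result.
        results = pool.map(lookup_projects, blob_ids)
        return dict(zip(blob_ids, results))
```

If the per-blob work turned out to be CPU-bound instead, `ProcessPoolExecutor` from the same module would be the closer fit.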
- Find more candidate projects
- search commit logs for CVE
- search nvd.nist.gov, cve.mitre.org, etc.
- Run the tool on many more projects
- Collect more in-depth information about a few specific vulnerable projects
- Try to find out when a vulnerability was introduced
- maybe diff fixed version with previous version to find lines that changed
- maybe use SZZ algorithm
- Crowdsource dataset to determine which blobs contain vulnerable code, and which lines of code are vulnerable (https://github.com/doccano/doccano)
- Looking at blobs might be weird (Kalvin note: it might be useful for drilling down into files containing parts of the code to figure out whether the vulnerable code still exists)
- Consider other sources for CVEs (a CVE might not be referenced in the commit message; consider messages in GHTorrent)
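
Searching commit logs for CVE references, as proposed in the list above, could start from something like the following sketch. The function name `find_cve_ids` is hypothetical, the pattern assumes the standard `CVE-YYYY-NNNN` identifier form (four or more digits in the last part), and the ids in the usage example are made up.

```python
from __future__ import annotations

import re

# CVE identifiers have the form CVE-<4-digit year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def find_cve_ids(message: str) -> list[str]:
    """Return the unique CVE ids in a commit message, upper-cased,
    in order of first appearance."""
    seen: list[str] = []
    for match in CVE_PATTERN.findall(message):
        cve = match.upper()
        if cve not in seen:
            seen.append(cve)
    return seen


# Usage with a made-up commit message (the ids are illustrative only).
ids = find_cve_ids("Fix overflow (CVE-2019-12345); see also cve-2019-12345")
```

As the last bullet notes, a fix commit may not mention a CVE at all, so a match here only yields candidates, not a complete list.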