While working on improving the htp module in Wapiti (wapiti-scanner/wapiti#344), I noticed several inconsistencies in the hashtheplanet database.
A version can appear in the hash table without having a counterpart in the version table:
sqlite> select count(*) from hash where versions like "%4.0.0-alpha4x%" and technology = "WordPress";
202
sqlite> select count(*) from version where technology = "WordPress" and version = "4.0.0-alpha4x";
0
This is particularly true of the aforementioned version, which appears with a lot of hashes (I cut the output):
GET https://blog.logrocket.com/wp-includes/js/tinymce/license.txt (0) led to technology ('magento2', '"{\\"versions\\": [\\"2.3.0\\", \\"2.3.1\\", \\"2.3.2\\", \\"2.3.2-p2\\", \\"2.3.3\\", \\"2.3.3-p1\\", \\"2.3.4\\", \\"2.3.4-p2\\", \\"2.3.5\\", \\"2.3.5-p1\\", \\"2.3.5-p2\\", \\"2.3.6\\", \\"2.3.6-p1\\", \\"2.3.7\\", \\"2.3.7-p1\\", \\"2.3.7-p2\\", \\"2.3.7-p3\\", \\"2.3.7-p4\\", \\"2.4.0\\", \\"2.4.0-p1\\", \\"2.4.1\\", \\"2.4.1-p1\\", \\"2.4.2\\", \\"2.4.2-p1\\", \\"2.4.2-p2\\", \\"2.4.3\\", \\"2.4.3-p1\\", \\"2.4.3-p2\\", \\"2.4.3-p3\\", \\"2.4.4\\", \\"4.0.0-alpha1\\", \\"4.0.0-alpha10\\", \\"4.0.0-alpha11\\", \\"4.0.0-alpha12\\", \\"4.0.0-alpha2\\", \\"4.0.0-alpha3\\", \\"4.0.0-alpha4\\", \\"4.0.0-alpha4x\\"]}"')
GET https://blog.logrocket.com/wp-includes/js/mediaelement/mediaelementplayer.css (0) led to technology ('joomla-cms', '"{\\"versions\\": [\\"4.0.0-alpha4x\\"]}"')
GET https://blog.logrocket.com/wp-includes/sodium_compat/src/Core/Curve25519/README.md (0) led to technology ('WordPress', '"{\\"versions\\": [\\"5.2\\", \\"3.10.0\\", \\"3.10.0-alpha1\\", \\"3.10.0-alpha2\\", \\"4.0.0\\", \\"4.0.0-alpha1\\", \\"4.0.0-alpha10\\", \\"4.0.0-alpha11\\", \\"4.0.0-alpha12\\", \\"4.0.0-alpha2\\", \\"4.0.0-alpha3\\", \\"4.0.0-alpha4\\", \\"4.0.0-alpha4x\\", \\"4.0.0-alpha5\\", \\"4.0.0-alpha6\\", \\"psr12anchor\\"]}"')
GET https://blog.logrocket.com/wp-content/themes/twentytwentytwo/templates/blank.html (0) led to technology ('underscore', '"{\\"versions\\": [\\"1.12.1\\", \\"1.13.0-0\\", \\"1.13.0-2\\", \\"1.13.0-1\\", \\"8.0-alpha10\\", \\"8.0-alpha11\\", \\"8.0-alpha12\\", \\"8.0-alpha13\\", \\"8.0-alpha2\\", \\"8.0-alpha3\\", \\"8.0-alpha4\\", \\"8.0-alpha5\\", \\"8.0-alpha6\\", \\"8.0-alpha7\\", \\"8.0-alpha8\\", \\"4.0.0\\", \\"4.0.0-alpha1\\", \\"4.0.0-alpha10\\", \\"4.0.0-alpha11\\", \\"4.0.0-alpha12\\", \\"4.0.0-alpha2\\", \\"4.0.0-alpha3\\", \\"4.0.0-alpha4\\", \\"4.0.0-alpha4x\\", \\"4.0.0-alpha5\\", \\"4.0.0-alpha6\\", \\"4.0.0-alpha7\\", \\"4.0.0-alpha8\\", \\"4.0.0-alpha9\\", \\"4.0.0-beta\\", \\"4.0.0-beta2\\", \\"4.0.0-beta3\\", \\"4.0.0-beta4\\", \\"4.0.0-beta5\\", \\"4.0.0-beta6\\", \\"4.0.0-beta7\\", \\"4.0.0-rc1\\", \\"psr12anchor\\", \\"psr12final\\", \\"search1\\"]}"')
Only the joomla-cms entry is relevant because that tag is specific to Joomla: https://github.com/joomla/joomla-cms/releases/tag/4.0.0-alpha4x
The same problem occurs with the tags psr12anchor and psr12final, and certainly with more.
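To list every affected tag rather than checking them one by one, something like the following could be used. It is only a sketch: the table and column names `hash.technology`, `hash.versions`, `version.technology` and `version.version` come from the queries above, but the `hash.hash` column, the double JSON encoding of `versions` (guessed from the escaped output above) and the sample rows are assumptions.

```python
import json
import sqlite3


def decode_versions(raw):
    """Decode the `versions` column, assumed to be JSON-encoded twice."""
    value = json.loads(raw)
    if isinstance(value, str):  # unwrap the second layer of encoding
        value = json.loads(value)
    return value["versions"]


def orphaned_versions(conn):
    """Map (technology, version) to the number of hash rows that reference
    a version with no matching row in the `version` table."""
    known = set(conn.execute("SELECT technology, version FROM version"))
    orphans = {}
    for technology, raw in conn.execute("SELECT technology, versions FROM hash"):
        for version in decode_versions(raw):
            key = (technology, version)
            if key not in known:
                orphans[key] = orphans.get(key, 0) + 1
    return orphans


# Tiny in-memory stand-in for the real database (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE version (technology TEXT, version TEXT)")
conn.execute("CREATE TABLE hash (hash TEXT, technology TEXT, versions TEXT)")
conn.execute("INSERT INTO version VALUES ('WordPress', '5.2')")
conn.execute("INSERT INTO version VALUES ('joomla-cms', '4.0.0-alpha4x')")
wp_versions = json.dumps(json.dumps({"versions": ["5.2", "4.0.0-alpha4x"]}))
joomla_versions = json.dumps(json.dumps({"versions": ["4.0.0-alpha4x"]}))
conn.execute("INSERT INTO hash VALUES ('aaa', 'WordPress', ?)", (wp_versions,))
conn.execute("INSERT INTO hash VALUES ('bbb', 'joomla-cms', ?)", (joomla_versions,))

print(orphaned_versions(conn))
# {('WordPress', '4.0.0-alpha4x'): 1}
```

Against the real database, `sqlite3.connect(":memory:")` would be replaced by the path to the hashtheplanet file, and the result would show which tags leak from one technology into another and how many hashes each one affects.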
Some hashes should also probably be blacklisted because they match files that can be found in a lot of software, such as (in the previous output):
- a file with a single empty line (blank.html)
- the default LGPL licence file
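One possible way to find blacklist candidates is to look for hashes that are attached to several unrelated technologies. The sketch below assumes the hash value is stored in a column named `hash` (not confirmed by the queries above), and the `min_technologies` threshold and sample rows are purely illustrative.

```python
import sqlite3


def shared_hashes(conn, min_technologies=2):
    """Return (hash, technology_count) for hashes seen in several technologies."""
    query = """
        SELECT hash, COUNT(DISTINCT technology) AS tech_count
        FROM hash
        GROUP BY hash
        HAVING COUNT(DISTINCT technology) >= ?
        ORDER BY tech_count DESC, hash
    """
    return conn.execute(query, (min_technologies,)).fetchall()


# Hypothetical sample: one hash for a blank file shared by three
# technologies, and one hash that is specific to a single technology.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE hash (hash TEXT, technology TEXT, versions TEXT)")
db.executemany(
    "INSERT INTO hash VALUES (?, ?, ?)",
    [
        ("h_blank", "WordPress", "{}"),
        ("h_blank", "underscore", "{}"),
        ("h_blank", "magento2", "{}"),
        ("h_unique", "joomla-cms", "{}"),
    ],
)

print(shared_hashes(db))
# [('h_blank', 3)]
```

Hashes returned by such a query would still need a manual review before being blacklisted, since some files are legitimately shared between related projects.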
These invalid version numbers certainly have an impact on the database size (issue #28).