
Inconsistency in database #30

@devl00p

While working on improving the htp module in Wapiti (wapiti-scanner/wapiti#344), I noticed several inconsistencies in the hashtheplanet database.

The problem is that a version can appear in the hash table without having a counterpart in the version table:

sqlite> select count(*) from hash where versions like "%4.0.0-alpha4x%" and technology = "WordPress";
202
sqlite> select count(*) from version where technology = "WordPress" and version = "4.0.0-alpha4x";
0
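
The check above can be generalized to list every version that is referenced from the hash table but missing from the version table. The following is only a minimal sketch, not the project's own code. It assumes the `hash.versions` column holds JSON of the form `{"versions": [...]}`, possibly JSON-encoded twice as the log output in this issue suggests. The helper names `decode_versions` and `orphan_versions` are hypothetical.

```python
import json
import sqlite3


def decode_versions(raw):
    """Return the version list stored in one row of the hash table.

    Assumption: the column holds JSON like {"versions": [...]}, and the
    JSON text may itself be wrapped in a JSON string (double encoding).
    """
    value = raw
    # Keep decoding while we still have a string, to unwrap double encoding.
    while isinstance(value, str):
        value = json.loads(value)
    return value.get("versions", [])


def orphan_versions(conn):
    """Count hash rows per (technology, version) pair missing from `version`."""
    known = set(conn.execute("SELECT technology, version FROM version"))
    orphans = {}
    for technology, raw in conn.execute("SELECT technology, versions FROM hash"):
        for version in decode_versions(raw):
            key = (technology, version)
            if key not in known:
                orphans[key] = orphans.get(key, 0) + 1
    return orphans
```

Calling `orphan_versions(sqlite3.connect(path))`, where `path` points at a copy of the database, would return a dictionary whose keys are the inconsistent (technology, version) pairs and whose values are the number of hash rows that reference them.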

This is particularly true of the aforementioned version, which appears with a lot of hashes (output truncated):

GET https://blog.logrocket.com/wp-includes/js/tinymce/license.txt (0) led to technology ('magento2', '"{\\"versions\\": [\\"2.3.0\\", \\"2.3.1\\", \\"2.3.2\\", \\"2.3.2-p2\\", \\"2.3.3\\", \\"2.3.3-p1\\", \\"2.3.4\\", \\"2.3.4-p2\\", \\"2.3.5\\", \\"2.3.5-p1\\", \\"2.3.5-p2\\", \\"2.3.6\\", \\"2.3.6-p1\\", \\"2.3.7\\", \\"2.3.7-p1\\", \\"2.3.7-p2\\", \\"2.3.7-p3\\", \\"2.3.7-p4\\", \\"2.4.0\\", \\"2.4.0-p1\\", \\"2.4.1\\", \\"2.4.1-p1\\", \\"2.4.2\\", \\"2.4.2-p1\\", \\"2.4.2-p2\\", \\"2.4.3\\", \\"2.4.3-p1\\", \\"2.4.3-p2\\", \\"2.4.3-p3\\", \\"2.4.4\\", \\"4.0.0-alpha1\\", \\"4.0.0-alpha10\\", \\"4.0.0-alpha11\\", \\"4.0.0-alpha12\\", \\"4.0.0-alpha2\\", \\"4.0.0-alpha3\\", \\"4.0.0-alpha4\\", \\"4.0.0-alpha4x\\"]}"')

GET https://blog.logrocket.com/wp-includes/js/mediaelement/mediaelementplayer.css (0) led to technology ('joomla-cms', '"{\\"versions\\": [\\"4.0.0-alpha4x\\"]}"')

GET https://blog.logrocket.com/wp-includes/sodium_compat/src/Core/Curve25519/README.md (0) led to technology ('WordPress', '"{\\"versions\\": [\\"5.2\\", \\"3.10.0\\", \\"3.10.0-alpha1\\", \\"3.10.0-alpha2\\",  \\"4.0.0\\", \\"4.0.0-alpha1\\", \\"4.0.0-alpha10\\", \\"4.0.0-alpha11\\", \\"4.0.0-alpha12\\", \\"4.0.0-alpha2\\", \\"4.0.0-alpha3\\", \\"4.0.0-alpha4\\", \\"4.0.0-alpha4x\\", \\"4.0.0-alpha5\\", \\"4.0.0-alpha6\\", \\"psr12anchor\\"]}"')


GET https://blog.logrocket.com/wp-content/themes/twentytwentytwo/templates/blank.html (0) led to technology ('underscore', '"{\\"versions\\": [\\"1.12.1\\", \\"1.13.0-0\\", \\"1.13.0-2\\", \\"1.13.0-1\\", \\"8.0-alpha10\\", \\"8.0-alpha11\\", \\"8.0-alpha12\\", \\"8.0-alpha13\\", \\"8.0-alpha2\\", \\"8.0-alpha3\\", \\"8.0-alpha4\\", \\"8.0-alpha5\\", \\"8.0-alpha6\\", \\"8.0-alpha7\\", \\"8.0-alpha8\\",  \\"4.0.0\\", \\"4.0.0-alpha1\\", \\"4.0.0-alpha10\\", \\"4.0.0-alpha11\\", \\"4.0.0-alpha12\\", \\"4.0.0-alpha2\\", \\"4.0.0-alpha3\\", \\"4.0.0-alpha4\\", \\"4.0.0-alpha4x\\", \\"4.0.0-alpha5\\", \\"4.0.0-alpha6\\", \\"4.0.0-alpha7\\", \\"4.0.0-alpha8\\", \\"4.0.0-alpha9\\", \\"4.0.0-beta\\", \\"4.0.0-beta2\\", \\"4.0.0-beta3\\", \\"4.0.0-beta4\\", \\"4.0.0-beta5\\", \\"4.0.0-beta6\\", \\"4.0.0-beta7\\", \\"4.0.0-rc1\\", \\"psr12anchor\\", \\"psr12final\\", \\"search1\\"]}"')

Only the joomla-cms entry is relevant, because that tag is specific to Joomla: https://github.com/joomla/joomla-cms/releases/tag/4.0.0-alpha4x

The same problem occurs with the tags psr12anchor and psr12final, and certainly with others.

Also, some hashes should perhaps be blacklisted because they match files that can be found in a lot of software, such as (from the previous output):

  • a file with a single empty line (blank.html)
  • the default LGPL licence file

Those invalid version numbers certainly have an impact on the database size (issue #28).
