Add subwords capability to ffuf_shortnames #2237

liquidsec · 2025-02-01T00:19:52Z

Also adds an ignore_case option to ffuf (useful for IIS, where case doesn't matter).

Subwords uses Python's nltk (Natural Language Toolkit) to look for shorter words at the beginning of shortnames. If it finds one, it sends the remainder off to the predictor. This works well because web developers have a habit of giving files two-word "VerbAction"-style names.
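To make the prefix idea concrete, here is a minimal sketch, not the module's actual code. The function name `split_prefix`, the `min_len` cutoff, the longest-match-first order, and the toy word set are all assumptions for illustration; the PR uses the nltk words corpus as its dictionary.

```python
def split_prefix(shortname, words, min_len=3):
    """Return (prefix, remainder) for the longest dictionary word that starts
    the shortname, or None if no word leaves a non-empty remainder.

    Hypothetical helper for illustration; not taken from ffuf_shortnames.py.
    """
    lowered = shortname.lower()
    # Try the longest candidate prefix first, always leaving at least one
    # character for the predictor to work on.
    for end in range(len(lowered) - 1, min_len - 1, -1):
        if lowered[:end] in words:
            return shortname[:end], shortname[end:]
    return None


# Toy stand-in for the much larger nltk word list
toy_words = {"get", "delete", "update"}

# "GETUSE" could be the truncated shortname of something like GetUserInfo.aspx
print(split_prefix("GETUSE", toy_words))
print(split_prefix("XYZABC", toy_words))
```

In this sketch the remainder ("USE") is what would be handed to the predictor, while the matched prefix ("GET") is kept fixed.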


codecov · 2025-02-03T20:09:19Z

Codecov Report

Attention: Patch coverage is 92.22222% with 7 lines in your changes missing coverage. Please review.

Project coverage is 93%. Comparing base (703a313) to head (b756df2).
Report is 25 commits behind head on dev.

Files with missing lines           Patch %   Lines
bbot/core/helpers/web/web.py       70%       4 Missing ⚠️
bbot/modules/deadly/ffuf.py        75%       2 Missing ⚠️
bbot/modules/ffuf_shortnames.py    98%       1 Missing ⚠️

Additional details and impacted files

@@          Coverage Diff          @@
##             dev   #2237   +/-   ##
=====================================
- Coverage     93%     93%   -0%     
=====================================
  Files        378     378           
  Lines      29363   29443   +80     
=====================================
+ Hits       27155   27227   +72     
- Misses      2208    2216    +8


TheTechromancer · 2025-02-10T16:27:09Z

bbot/modules/ffuf_shortnames.py


 from bbot.modules.deadly.ffuf import ffuf


 class ffuf_shortnames(ffuf):
    watched_events = ["URL_HINT"]
    produced_events = ["URL_UNVERIFIED"]
-    deps_pip = ["numpy"]
+    deps_pip = ["numpy", "nltk"]


Is there an advantage to using nltk over the builtin subword helper?

liquidsec · 2025-02-10

Mainly the size of the wordlist, which is massive and should be well maintained, being part of nltk.

Also, that helper's functionality doesn't quite match this use: it finds all the subwords (as a list), whereas this only checks for prefixes.

TheTechromancer · 2025-02-12T16:02:29Z

bbot/modules/ffuf_shortnames.py

+                self.debug("NLTK words data already present")
+            except LookupError:
+                self.debug("NLTK words data not found, downloading")
+                nltk.download("words", download_dir=self.nltk_dir, quiet=True)


self.helpers.wordlist("https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/words.zip")
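If the zip is fetched as a plain file rather than through `nltk.download`, the word list can be read with the standard library alone. The following is a rough sketch under stated assumptions: the function name `load_words_from_zip` is hypothetical, and the member path `words/en` is an assumption about the archive layout that should be checked against the real `words.zip`. The demo builds a tiny in-memory archive so it runs without network access.

```python
import io
import zipfile


def load_words_from_zip(zip_source, member="words/en"):
    """Read one word per line from a zip member into a lowercase set.

    zip_source may be a filesystem path or a file-like object.
    The default member name is an assumption about the archive layout.
    """
    with zipfile.ZipFile(zip_source) as archive:
        text = archive.read(member).decode("utf-8")
    # Skip blank lines and normalize case for case-insensitive matching
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


# Build a small stand-in archive in memory instead of downloading the real one
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("words/en", "Get\nupload\n\ndelete\n")
buffer.seek(0)

print(sorted(load_words_from_zip(buffer)))  # ['delete', 'get', 'upload']
```

In the module, the path passed in would presumably be whatever local file the wordlist helper produces for the URL above; that return value is assumed here, not confirmed by the thread.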

liquidsec added 10 commits January 31, 2025 19:17

adding subwords capability to ffuf_shortnames, adding ignore_case option to ffuf

3c10b34

adjust log message

b5dac08

adding debug message

a9e06e0

better ntlk data handling

ea9e736

ruff format

bf4c0d7

undoing error

958dad9

better status message

eee42f9

bug fix

2d9f75b

prevent ffuf_shortnames from trying to solve impossible URL_HINTs

e4c0711

adding optin description

64b201b

liquidsec requested a review from TheTechromancer February 4, 2025 23:14

TheTechromancer reviewed Feb 10, 2025


TheTechromancer reviewed Feb 12, 2025


liquidsec added 3 commits February 12, 2025 14:58

reworking wordlist download, removing deps

f61943d

ruff format

74839d5

removing unneeded imports

b70d1a0

TheTechromancer approved these changes Feb 12, 2025


readding numpy

b756df2
