Module Todo: Document Metadata Extraction #717
Replies: 5 comments 12 replies
-
Note: it would also be nice to emit the text from these documents as a generic event consumable by
-
As a precursor to this, I have written a proof-of-concept. @nicpenning, here is the module:
You can use it like this:
bbot -t evilcorp.com -f subdomain-enum -m filedownload
Pairing it with the web spider can also be very effective:
bbot -t evilcorp.com -f subdomain-enum -m filedownload -c web_spider_depth=2 web_spider_distance=2
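For illustration only, the core decision in a file-download step (which discovered URLs are worth fetching) could look roughly like the sketch below. This is not the module's actual code: the function name `looks_like_document` and the extension set are assumptions made up for this example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical extension list for this sketch; a real module would likely make it configurable.
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx"}


def looks_like_document(url: str) -> bool:
    """Return True if the URL path ends in one of the document extensions."""
    # Use only the path component so query strings and fragments are ignored.
    path = urlsplit(url).path
    return PurePosixPath(path).suffix.lower() in DOCUMENT_EXTENSIONS
```

Matching on the URL path alone is cheap but misses documents served from extensionless URLs; checking the Content-Type response header would be a complementary approach.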
-
This is probably relevant to this discussion: #907 (comment). Now there are As mentioned in the linked discussion, that is an ML model to detect human passwords in several file formats. Perhaps more interesting, though, is that it uses Apache Tika to extract the strings from
which we could then raise as
-
Circling back to this one: we've recently run into problems with unstructured. Overall, it's great that unstructured runs without a server component and without a Java dependency. However, we should be on the lookout for a better alternative, preferably one written in Rust or Go. These seem to be just starting to emerge. @domwhewell-sage, this is one to keep an eye on:
-
It would be useful to have a collection of modules that download documents (.pdf, .docx, etc.) and extract useful metadata such as usernames and internal domain names. Thanks to @pjhartlieb and @Sw3d1shPh1sh for requesting.
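As a rough sketch of what the extraction side could involve for one format: a .docx file is a ZIP archive, and its author fields normally live in `docProps/core.xml`. The example below reads them with the standard library only. The function name `docx_core_metadata` and the returned keys are assumptions for illustration, not part of any existing module.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the OOXML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def docx_core_metadata(data: bytes) -> dict:
    """Return creator / last-modified-by values from a .docx given as raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        try:
            xml_bytes = zf.read("docProps/core.xml")
        except KeyError:
            # The archive has no core-properties part.
            return {}
    root = ET.fromstring(xml_bytes)
    # Hypothetical output keys mapped to the elements that usually hold usernames.
    fields = {"creator": "dc:creator", "last_modified_by": "cp:lastModifiedBy"}
    found = {}
    for key, path in fields.items():
        node = root.find(path, NS)
        if node is not None and node.text:
            found[key] = node.text.strip()
    return found
```

The same approach applies to .xlsx and .pptx, which share the packaging format. PDFs store their metadata differently and would need a separate parser.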
Also, per @nicpenning:
Would require:
EDIT: Possible sources of metadata-extraction logic: