
feat: --noise-filter preset + fix binary garbage in parser output#88

Open
Vasco0x4 wants to merge 4 commits into blacklanternsecurity:master from Vasco0x4:master


Vasco0x4 commented Apr 7, 2026

Summary

Two independent improvements targeting common pain points when running
MANSPIDER against Windows infrastructure.


1. --noise-filter {moderate,aggressive} — suppress Windows system noise

Running against domain controllers or file servers produces a flood of
results from Windows system paths (WinSxS, PolicyDefinitions, System32, …)
that almost never contain useful data. This PR adds a --noise-filter flag
with two presets:

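| Mode       | Excluded dirs                                                               | Excluded extensions                   |
|------------|-----------------------------------------------------------------------------|---------------------------------------|
| moderate   | PolicyDefinitions, WinSxS, Servicing                                        | .adml .admx .mui .mof .cat .manifest  |
| aggressive | moderate dirs + System32, SysWOW64, Assembly, Fonts, Spool, Windows Defender | same as moderate                      |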

Presets feed directly into the existing exclude_dirnames /
exclude_extensions infrastructure, so they compose cleanly with
--exclude-dirnames and --exclude-extensions.
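To illustrate how a preset can compose with user-supplied exclusions, here is a minimal sketch. It assumes exclusions are held as lowercase directory names and dot-prefixed extensions, matched case-insensitively against each path component. `NOISE_FILTER_PRESETS`, `apply_noise_filter` and `is_excluded` are hypothetical names, not MANSPIDER's actual code.

```python
from pathlib import PureWindowsPath

# Hypothetical preset table built from the table above; names are
# illustrative only. Directory names are lowercase for case-insensitive matching.
_MODERATE_DIRS = {"policydefinitions", "winsxs", "servicing"}
_AGGRESSIVE_EXTRA_DIRS = {
    "system32", "syswow64", "assembly", "fonts", "spool", "windows defender",
}
_NOISE_EXTENSIONS = {".adml", ".admx", ".mui", ".mof", ".cat", ".manifest"}

NOISE_FILTER_PRESETS = {
    "moderate": (_MODERATE_DIRS, _NOISE_EXTENSIONS),
    "aggressive": (_MODERATE_DIRS | _AGGRESSIVE_EXTRA_DIRS, _NOISE_EXTENSIONS),
}


def _norm_ext(ext):
    # Accept "bak", ".bak" or ".BAK" and normalize to ".bak".
    return "." + ext.lower().lstrip(".")


def apply_noise_filter(mode, exclude_dirnames, exclude_extensions):
    """Union a preset with the user's exclusions so nothing user-supplied is lost."""
    dirs = {d.lower() for d in exclude_dirnames}
    exts = {_norm_ext(e) for e in exclude_extensions}
    if mode is not None:
        preset_dirs, preset_exts = NOISE_FILTER_PRESETS[mode]
        dirs |= preset_dirs
        exts |= preset_exts
    return sorted(dirs), sorted(exts)


def is_excluded(path, exclude_dirnames, exclude_extensions):
    """True if any parent directory or the file's extension is excluded."""
    p = PureWindowsPath(path)
    parents = {part.lower() for part in p.parts[:-1]}
    return bool(parents & set(exclude_dirnames)) or p.suffix.lower() in exclude_extensions


dirs, exts = apply_noise_filter("moderate", ["Backup"], ["BAK"])
print(is_excluded(r"SYSVOL\corp.local\Policies\PolicyDefinitions\en-US\a.adml", dirs, exts))  # True
print(is_excluded(r"IT\Scripts\deploy.ps1", dirs, exts))  # False
```

Taking a union rather than replacing the lists is what lets a preset stack on top of --exclude-dirnames and --exclude-extensions.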

Usage:

manspider <target> -f password --noise-filter moderate
manspider <target> -f password --noise-filter aggressive
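A flag restricted to these two values maps naturally onto argparse `choices`. The sketch below is a minimal stand-alone parser, not MANSPIDER's real argument setup. Only --noise-filter, --exclude-dirnames, --exclude-extensions and -f come from this PR; the positional `targets` argument and the `filenames` dest are assumptions.

```python
import argparse

# Minimal stand-alone parser; not MANSPIDER's real argument definitions.
parser = argparse.ArgumentParser(prog="manspider")
parser.add_argument("targets", nargs="+", help="hosts or subnets to spider")
parser.add_argument("-f", dest="filenames", nargs="+", default=[],
                    help="filename patterns to match")
parser.add_argument("--exclude-dirnames", nargs="+", default=[])
parser.add_argument("--exclude-extensions", nargs="+", default=[])
parser.add_argument(
    "--noise-filter",
    choices=["moderate", "aggressive"],
    default=None,  # no preset unless explicitly requested
    help="exclude well-known Windows system directories and extensions",
)

args = parser.parse_args(["10.0.0.5", "-f", "password", "--noise-filter", "moderate"])
print(args.noise_filter)  # moderate
```

With `choices`, any other value (for example `--noise-filter extreme`) is rejected by argparse with a usage error before any scanning starts.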

2. Fix: binary garbage chunks in parser output

Binary files such as PE executables and DLLs were being misidentified as
text by charset-normalizer. This produced large runs of \xef\xbf\xbd (the
UTF-8 encoding of U+FFFD) in match output. The fix has three parts:

  • is_text_file() now rejects files in which more than 1% of the decoded
    characters are Unicode replacement characters (U+FFFD)
  • extract_text() applies the same ratio check after every extraction path
    and falls back to extract_strings_from_binary() when the threshold is exceeded
  • Removed the -a flag from grep so binary stdin is no longer treated as
    text. A binary blob with few newlines is effectively one enormous line,
    so -a was causing single-line binary dumps even with -m 5
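The replacement-character heuristic and the strings fallback can be sketched roughly as follows. This is an illustration, not the PR's code. It substitutes a plain UTF-8 decode with `errors="replace"` for charset-normalizer. The helpers `replacement_ratio`, `looks_like_text`, `strings_from_binary` and `extract` are hypothetical stand-ins for is_text_file(), extract_text() and extract_strings_from_binary().

```python
import re

REPLACEMENT_CHAR = "\ufffd"   # what a lossy decode substitutes for invalid bytes
MAX_REPLACEMENT_RATIO = 0.01  # the 1% threshold described in this PR


def replacement_ratio(text):
    """Fraction of characters in text that are U+FFFD."""
    return text.count(REPLACEMENT_CHAR) / len(text) if text else 0.0


def looks_like_text(data):
    # Stand-in for charset-normalizer: a plain UTF-8 decode with errors="replace".
    decoded = data.decode("utf-8", errors="replace")
    return replacement_ratio(decoded) <= MAX_REPLACEMENT_RATIO


def strings_from_binary(data, min_len=4):
    """Return printable ASCII runs of at least min_len bytes, like strings(1)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return "\n".join(m.group().decode("ascii") for m in pattern.finditer(data))


def extract(data):
    """Decode data, falling back to strings extraction if the result is garbage."""
    text = data.decode("utf-8", errors="replace")
    if replacement_ratio(text) > MAX_REPLACEMENT_RATIO:
        return strings_from_binary(data)
    return text


blob = b"MZ\x90\x00" + bytes(range(0x80, 0x100)) * 8 + b"\x00password=Winter2024!\x00"
print(looks_like_text(blob))  # False
print(extract(blob))          # password=Winter2024!
```

In UTF-8, bytes such as 0x90 or 0xff are never valid on their own. A decoded PE image is therefore dense with U+FFFD, which is why the ratio cleanly separates it from real text.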

Testing

Tested against an internal AD environment with:

  • A DC SYSVOL share (noise-filter moderate/aggressive)
  • Binary files (.dll, .exe) in accessible shares (parser fix)

claude and others added 4 commits February 24, 2026 14:25
- is_text_file() now rejects files where >1% of decoded chars are
  Unicode replacement chars (U+FFFD), stopping charset-normalizer
  false positives on PE/DLL/binary files
- extract_text() now checks replacement char ratio after ANY extraction
  path (charset-normalizer or kreuzberg) and falls back to
  extract_strings_from_binary() when ratio exceeds 1%
- Removed grep -a flag to stop binary stdin being treated as text,
  which was causing massive single-line binary dumps even with -m 5

Fixes: large chunks of \xef\xbf\xbd garbage being logged as matches
when binary files were misidentified as text or extracted with corrupt
encoding.

https://claude.ai/code/session_01HhXFjA6jdctfoi1MTfG9jY
…stem noise

Adds two preset modes that auto-populate exclude_dirnames and
exclude_extensions with well-known Windows system paths/extensions
that clutter results without containing useful data:

moderate: PolicyDefinitions (ADMX/ADML), WinSxS, Servicing
aggressive: also System32, SysWOW64, Assembly, Fonts, Spool, Defender

Both modes also suppress: .adml .admx .mui .mof .cat .manifest

The presets feed directly into the existing dir/extension blacklist
infrastructure, so they compose cleanly with --exclude-dirnames
and --exclude-extensions.

https://claude.ai/code/session_01HhXFjA6jdctfoi1MTfG9jY
…FHCR

Claude/review project structure dfhcr

github-actions bot commented Apr 7, 2026


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


0 out of 2 committers have signed the CLA.
@claude
@Vasco0x4
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.
