
feat: Stringy v1.0 -- pipeline architecture, CLI hardening, and comprehensive testing#141

Merged
mergify[bot] merged 72 commits into main from finish_epic_v1
Mar 11, 2026
Conversation

unclesp1d3r (Member) commented Mar 10, 2026

Summary

This PR delivers the complete v1.0 implementation of Stringy, transforming it from a prototype into a production-ready binary string extraction tool. The work spans 56 commits across 132 files (+7,963 / -6,477 lines), organized into five major workstreams.

610 tests pass (23 platform-skipped), zero clippy warnings, all pre-commit hooks green.

What Changed

1. Pipeline Architecture (New)

Introduced a modular pipeline orchestrator that replaces ad-hoc extraction logic:

  • Pipeline::new(config).run(&path) -- single entry point for all analysis
  • PipelineConfig with builder pattern for extraction, filtering, output, and debug options
  • FilterEngine for tag inclusion/exclusion, encoding, min-length, and top-N filtering
  • Score normalizer with 0-100 display scores and section weight / semantic boost / noise penalty breakdown
  • mmap-guard integration for safe memory-mapped file I/O (replaces raw memmap2)
  • Graceful fallback for unknown formats (raw byte scanning with Info diagnostic)
  • Processing warnings via env-var injection for testable error paths

Key files: src/pipeline/mod.rs, src/pipeline/config.rs, src/pipeline/filter.rs, src/pipeline/normalizer.rs
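The builder-style configuration described above can be sketched roughly as follows. This is an illustrative, self-contained example of the pattern, not the crate's actual API -- the field and method names (`min_len`, `with_top`, `with_json`) are assumptions:

```rust
// Illustrative sketch of a builder-style pipeline configuration.
// The real PipelineConfig has more options (extraction, filtering,
// output, debug) and may use different names.
#[derive(Debug, Clone)]
struct PipelineConfig {
    min_len: usize,
    top: Option<usize>,
    json: bool,
}

impl PipelineConfig {
    fn new() -> Self {
        // Mirror the CLI's default minimum string length.
        Self { min_len: 4, top: None, json: false }
    }

    fn with_min_len(mut self, n: usize) -> Self {
        self.min_len = n;
        self
    }

    fn with_top(mut self, n: usize) -> Self {
        self.top = Some(n);
        self
    }

    fn with_json(mut self, on: bool) -> Self {
        self.json = on;
        self
    }
}

fn main() {
    // Pipeline::new(config).run(&path) would consume a config built like this.
    let config = PipelineConfig::new().with_min_len(6).with_top(20).with_json(true);
    println!("{config:?}");
}
```

The benefit of the chained `with_*` style is that new options can be added later without breaking existing call sites.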

2. CLI Hardening

Complete rewrite of the CLI layer with production-quality UX:

  • Typed exit codes: 0=success, 2=config/validation, 3=not-found, 4=permission-denied, 1=other
  • Short flags: -j (json), -m (min-len), -t (top), -e (enc)
  • Actionable error messages with fix suggestions
  • NO_COLOR env var support (no-color.org spec)
  • Graceful spinner fallback (no .expect(), hidden when non-TTY or NO_COLOR)
  • stdin support via patharg::InputArg with progress feedback
  • --notags renamed to --no-tags (kebab-case consistency)
  • --summary, --debug, --raw modes with proper conflict declarations
  • Help text with EXAMPLES section and all 21 tag names listed

Key files: src/main.rs, src/types/error.rs
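One plausible way the typed exit codes could be derived from underlying I/O errors is sketched below. The actual mapping lives in `StringyError` handling in `src/main.rs` and may differ in detail; this only illustrates the documented contract (0=success, 2=config/validation, 3=not-found, 4=permission-denied, 1=other):

```rust
use std::io;

// Map an I/O error kind to the documented CLI exit codes.
// Sketch only: the real code maps StringyError variants, not raw ErrorKind.
fn exit_code_for(kind: io::ErrorKind) -> i32 {
    match kind {
        io::ErrorKind::NotFound => 3,         // file not found
        io::ErrorKind::PermissionDenied => 4, // permission denied
        io::ErrorKind::InvalidInput => 2,     // config/validation
        _ => 1,                               // any other runtime error
    }
}

fn main() {
    assert_eq!(exit_code_for(io::ErrorKind::NotFound), 3);
    assert_eq!(exit_code_for(io::ErrorKind::PermissionDenied), 4);
    println!("exit-code mapping ok");
}
```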

3. Core Data Model Improvements

  • FoundString::new() + builder methods replace struct literals (forward-compatible)
  • StringContext constructors for cleaner extraction code
  • SectionInfo made #[non_exhaustive] with with_* builders
  • OutputMetadata with analysis duration, top tag distribution, builder pattern
  • FilterContext with builder methods for test ergonomics
  • ExtractionConfig::with_min_length() builder
  • Container parsers split into module directories (elf/, pe/, macho/) with separate test files

Key files: src/types/constructors.rs, src/types/found_string.rs, src/types/mod.rs
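The `#[non_exhaustive]` + `with_*` builder combination mentioned above can be sketched as follows. The field names here are illustrative, not `SectionInfo`'s real fields:

```rust
// Sketch of the #[non_exhaustive] + with_* builder pattern.
// External crates cannot construct a #[non_exhaustive] struct with a
// literal, so constructors + builders become the only (and therefore
// forward-compatible) way to build values.
#[non_exhaustive]
#[derive(Debug, Default, Clone)]
pub struct SectionInfo {
    pub name: String,
    pub executable: bool,
}

impl SectionInfo {
    pub fn new(name: impl Into<String>) -> Self {
        Self { name: name.into(), ..Default::default() }
    }

    pub fn with_executable(mut self, exec: bool) -> Self {
        self.executable = exec;
        self
    }
}

fn main() {
    let text = SectionInfo::new(".text").with_executable(true);
    println!("{text:?}");
}
```

Adding a field later only requires a new `with_*` method; downstream code keeps compiling.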

4. Output Enhancements

  • Plain text: Full C0 control character sanitization (not just newlines)
  • TTY table: Analysis duration with minute-level formatting, summary banner
  • JSON: Debug-only fields (section_weight, semantic_boost, noise_penalty) via skip_serializing_if
  • YARA: Long string skip behavior with deterministic comments
  • display_score always present in non-raw mode; forced to 0 in raw mode

Key files: src/output/table/plain.rs, src/output/table/tty.rs, src/output/json.rs
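The C0 sanitization item above can be sketched like this; the replacement character (`.`) is an assumption, and the real sanitizer may target exactly the C0 range rather than all control characters:

```rust
// Sketch: neutralize control characters in plain-text output so piped
// output stays one string per line. char::is_control covers C0
// (U+0000..=U+001F), DEL, and C1 -- a superset of the C0 range the
// PR describes.
fn sanitize_controls(s: &str) -> String {
    s.chars()
        .map(|c| if c.is_control() { '.' } else { c })
        .collect()
}

fn main() {
    // Embedded BEL and newline are both replaced, not just newlines.
    println!("{}", sanitize_controls("ab\x07cd\n"));
}
```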

5. Build & Infrastructure

  • Zig cross-compilation replaces Docker for test fixtures (ELF, PE, Mach-O)
  • Release profile optimization: strip = true, codegen-units = 1, lto = "thin" (release) / "fat" (dist)
  • mmap-guard dependency for safe memory-mapped I/O
  • cargo-dist version bump to 0.31.0
  • GOTCHAS.md added as living documentation for edge cases
  • CI workflow updates (action versions, scorecard)
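The release-profile settings listed above would look roughly like this in Cargo.toml; the `dist` profile name and `inherits` line are assumptions about how the fat-LTO variant is wired up:

```toml
# Sketch of the stated release-profile optimizations.
[profile.release]
strip = true        # strip symbols from the binary
codegen-units = 1   # better optimization at the cost of compile time
lto = "thin"        # thin LTO for regular release builds

[profile.dist]
inherits = "release"
lto = "fat"         # full LTO for distributed (cargo-dist) builds
```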

Test Plan

  • 610 tests pass across 25 test files (597 previously + 13 new short-flag tests)
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --check clean
  • All pre-commit hooks pass (fmt, clippy, cargo-check, mdformat, cargo-audit)
  • Integration test coverage for all CLI flags, exit codes, and conflict combinations
  • Score determinism verified across consecutive runs
  • Warning emission tested via debug-build env var injection
  • Unknown/empty binary fallback paths tested
  • Short flag equivalence verified (-j = --json, -m = --min-len, etc.)
  • NO_COLOR env var tested
  • stdin pipe tested with ELF binary and empty input
  • Permission denied exit code 4 tested (Unix)

New Test Files in This PR

File | Tests | Coverage Area
--- | --- | ---
integration_flows_1_5.rs | 16 | Quick analysis, filtering, top-N, JSON, YARA
integration_flows_6_7.rs | 11 | Plain text output, summary mode
integration_flow8_errors.rs | 11 | Argument conflicts, validation failures
integration_flow8_diagnostics.rs | 13 | Warnings, fallback, score determinism, debug
integration_cli_errors.rs | 10 | Exit codes, error messages
integration_cli_coverage.rs | 18 | Encoding, stdin, combined filters, permissions
integration_cli_short_flags.rs | 13 | Short flags, NO_COLOR, OR logic, edge cases
integration_flow1_minlen.rs | 2 | Min-length default and explicit filter
integration_flows_5_yara.rs | 1 | YARA long string deterministic skip

Breaking Changes

  • --notags renamed to --no-tags (CLI flag)
  • FoundString struct fields are no longer public for direct construction (use FoundString::new() + builders)
  • SectionInfo is #[non_exhaustive] (external crate consumers must use constructors)
  • Exit codes changed: file-not-found is now 3 (was 1), permission-denied is now 4 (was 1)

Risk Assessment

Risk Level: Medium

Factor | Assessment
--- | ---
Size | Large (132 files), but most changes are additive
Breaking changes | CLI flag rename + exit code changes -- documented above
Test coverage | Strong -- 610 tests covering all new paths
Dependencies | +mmap-guard (thin wrapper), +patharg (stdin handling)
Security | #![forbid(unsafe_code)] maintained, no new unsafe

Reviewer Notes

  • The pipeline module (src/pipeline/) is entirely new -- start review there
  • Container parser splits (elf.rs -> elf/mod.rs + elf/tests.rs) are mechanical moves with no logic changes
  • Documentation changes are extensive but mostly rewrites for accuracy
  • GOTCHAS.md is a new file worth reading for context on edge cases

Generated with Claude Code

unclesp1d3r and others added 30 commits March 7, 2026 16:57
- Deleted the `concept.md` file which outlined the Stringy tool's principles, architecture, and features.
- Removed the `2026-02-16-extraction-refactor-and-cli.md` plan detailing the refactor of the extraction module and CLI integration tasks.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…stency

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…ction functions

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
… documentation

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…rors

- Add assert_cmd + predicates for declarative CLI integration tests
- Add patharg for idiomatic file/stdin input (supports "-" for stdin)
- Rewrite and expand CLI tests from 4 to 18 covering help, version,
  validation, format conflicts, tag filters, and multi-format binaries
- Enrich UnsupportedFormat error to list supported formats (ELF, PE, Mach-O)
- Include file path context in I/O error messages
- Add author, long_about, and usage examples to --help output

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Add exit code verification tests (code 2 for clap errors, code 1 for
runtime errors). Add SerializationError variant to StringyError and
migrate JSON serialization errors from ConfigError to the new variant
for more precise error categorization.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…action and filtering

- Added `pipeline` module with configuration, filtering, and normalization functionalities.
- Implemented `Pipeline` struct to orchestrate the entire string extraction process.
- Created `FilterEngine` to apply post-extraction filters based on length, encoding, and tags.
- Added `ScoreNormalizer` to normalize internal scores for user-friendly display.
- Enhanced CLI to support new filtering options and output formats.
- Updated output metadata to include binary format and summary options.
- Refactored extraction and output logic to integrate with the new pipeline structure.
- Added tests for new features and ensured existing functionality remains intact.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
… for common pitfalls

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Replace std::fs::read() with mmap_guard::map_file() for zero-copy
read-only access with advisory locking. Empty files are short-circuited
to an empty buffer to preserve graceful pipeline behavior.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…troubleshooting

- Updated quick start guide to reflect current and planned features, improved output examples, and clarified usage instructions.
- Revised ranking algorithm documentation to simplify scoring formula, enhance section weight descriptions, and clarify semantic boosts and noise penalties.
- Streamlined troubleshooting section by removing outdated solutions, clarifying error messages, and improving performance issue guidance.
- Enhanced consistency in command examples and output formatting across all documentation sections.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Replace bare '|' assertion with ' | ' table separator check to avoid
false positives from legitimate strings containing pipe characters.
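The tightened assertion amounts to matching the padded separator rather than any pipe character; a minimal sketch (the helper name is illustrative):

```rust
// Match the padded table separator " | " rather than a bare '|',
// which can legitimately appear inside extracted strings.
fn looks_like_table_row(line: &str) -> bool {
    line.contains(" | ")
}

fn main() {
    println!("{}", looks_like_table_row("0x1000 | url | https://example.com"));
}
```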

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Replace all Docker-based build instructions with Zig cross-compilation
commands. Update AGENTS.md and GOTCHAS.md to reflect the new toolchain.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Add -v flag to zig cc and /v flag to zig rc so gen-fixtures shows
linker invocations instead of running silently.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Compiled fixtures (ELF, PE, Mach-O) were already in .gitignore but
still tracked. Remove them from the index so gitignore takes effect.
Add test_empty.bin and test_unknown.bin as committed static fixtures.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…alysis duration

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…warnings

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…ntent

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…p examples

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…ess and content details

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI left a comment

Pull request overview

Copilot reviewed 103 out of 112 changed files in this pull request and generated 4 comments.

Comment on lines +72 to +116
    /// Emit JSONL output (one JSON object per line)
    #[arg(long, conflicts_with = "yara")]
    json: bool,

    /// Emit YARA rule template output
    #[arg(long, conflicts_with = "json")]
    yara: bool,

    /// Only include strings with these tags (repeatable)
    #[arg(
        long = "only-tags",
        action = ArgAction::Append,
        value_parser = Tag::from_str,
        value_name = "TAG",
        long_help = "Include only strings with this tag. Repeat the flag for multiple tags (OR logic).\n\
            Valid tags: url, domain, ipv4, ipv6, filepath, regpath, guid, email, b64, fmt,\n\
            user-agent-ish, demangled, import, export, version, manifest, resource,\n\
            dylib-path, rpath, rpath-var, framework-path"
    )]
    only_tags: Vec<Tag>,

    /// Exclude strings with these tags (repeatable)
    #[arg(
        long = "notags",
        action = ArgAction::Append,
        value_parser = Tag::from_str,
        value_name = "TAG",
        long_help = "Exclude strings with this tag. Repeat the flag for multiple tags (OR logic).\n\
            Valid tags: url, domain, ipv4, ipv6, filepath, regpath, guid, email, b64, fmt,\n\
            user-agent-ish, demangled, import, export, version, manifest, resource,\n\
            dylib-path, rpath, rpath-var, framework-path"
    )]
    notags: Vec<Tag>,

    /// Minimum string length in bytes (must be >= 1)
    #[arg(long, short = 'l', default_value_t = 4)]
    min_length: usize,
    #[arg(long = "min-len", value_parser = parse_positive_usize)]
    min_len: Option<usize>,

    /// Show only the top N strings by score
    #[arg(long, value_parser = parse_positive_usize)]
    top: Option<usize>,

    /// Filter by encoding
    #[arg(long, value_enum)]
    enc: Option<CliEncoding>,

Copilot AI Mar 11, 2026


The CLI currently only defines long flags and doesn't implement the PR description's behavior around short flags (-j, -m, -t, -e). If short flags are intended for v1.0, add short = 'j'/'m'/'t'/'e' to the relevant #[arg] attributes (and update tests/docs accordingly). If not intended, please align the PR description/docs to the actual CLI.

Comment on lines +250 to +255
fn main() {
    let cli = Cli::parse();
    if let Err(e) = run(&cli) {
        eprintln!("Error: {e}");
        std::process::exit(1);
    }
}

Copilot AI Mar 11, 2026


main() always exits with status code 1 for any runtime error. This doesn't match the PR description that calls out typed exit codes (e.g., not-found vs permission-denied). If typed exit codes are part of the v1.0 contract, map StringyError variants / underlying std::io::ErrorKind to the documented exit codes here (and adjust tests/docs).

Comment on lines +152 to +161
/// Create an `indicatif` spinner targeting stderr.
fn create_spinner() -> ProgressBar {
    let pb = ProgressBar::with_draw_target(None, ProgressDrawTarget::stderr());
    let style = ProgressStyle::default_spinner()
        .template("{spinner} {msg}")
        .expect("invalid spinner template");
    pb.set_style(style);
    pb.enable_steady_tick(std::time::Duration::from_millis(120));
    pb
}

Copilot AI Mar 11, 2026


The pipeline always creates and updates an indicatif spinner targeting stderr. This can pollute stderr in non-interactive/scripted usage and doesn't match the PR description about hiding the spinner when non-TTY or NO_COLOR is set. Consider making create_spinner() return a hidden progress bar when !std::io::stderr().is_terminal() or when NO_COLOR is present.
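A minimal sketch of the gating this comment suggests, with the decision logic separated from indicatif so it is easy to test (the function name is illustrative):

```rust
use std::env;
use std::io::{self, IsTerminal};

// Show the spinner only when stderr is a real terminal and the
// NO_COLOR convention (no-color.org) is not in effect.
fn spinner_visible(stderr_is_tty: bool, no_color_set: bool) -> bool {
    stderr_is_tty && !no_color_set
}

fn main() {
    let visible = spinner_visible(
        io::stderr().is_terminal(),
        env::var_os("NO_COLOR").is_some(),
    );
    println!("spinner visible: {visible}");
}
```

The caller would pass the result to `ProgressDrawTarget::hidden()` versus `ProgressDrawTarget::stderr()` when constructing the bar.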

…thods

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…es command

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

mergify bot commented Mar 11, 2026

⚠️ The sha of the head commit of this PR conflicts with #140. Mergify cannot evaluate rules on this PR. Once #140 is merged or closed, Mergify will resume processing this PR. ⚠️

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 31

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
docs/src/string-extraction.md (1)

30-39: ⚠️ Potential issue | 🟠 Major

Update this example to stop accessing FoundString fields directly.

This snippet still uses string.text and string.offset, but the v1.0 API change made FoundString fields non-public. Copy-pasting this example will not compile. Update it to use the public accessors or another supported display API; the same issue recurs later in the filtering example. As per coding guidelines, "*.md: Review documentation for: Accuracy with current implementation".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/string-extraction.md` around lines 30 - 39, The docs example uses
non-public fields on FoundString (string.text and string.offset); update the
snippet that calls extract_ascii_strings and AsciiExtractionConfig to use the
public API instead (e.g., call the FoundString accessor methods or Display
implementation such as string.text() and string.offset() or format!("{}",
string) depending on the crate's public API). Replace all direct field accesses
in the printing loop and in the later filtering example with the public methods
on FoundString so the example compiles against the v1.0 API.
src/lib.rs (1)

1-2: ⚠️ Potential issue | 🟠 Major

Add #![deny(warnings)] to the crate root.

The library entry point still only forbids unsafe code. Without deny(warnings), the zero-warning policy is enforced only by CI, not by normal builds of the crate.

Proposed fix
 #![forbid(unsafe_code)]
+#![deny(warnings)]

As per coding guidelines, "src/lib.rs: This is the library entry point with module declarations and re-exports. Review for: - #![forbid(unsafe_code)] and #![deny(warnings)] are present".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib.rs` around lines 1 - 2, The crate root only has
#![forbid(unsafe_code)] but lacks #![deny(warnings)]; add the attribute
#![deny(warnings)] to the top of the library entry (near the existing
#![forbid(unsafe_code)] in lib.rs) so that warnings are treated as errors for
local builds as well as CI; ensure the deny attribute appears before any module
or use declarations to apply crate-wide.
tests/integration_extraction.rs (1)

497-498: 🧹 Nitpick | 🔵 Trivial

Consider using builder pattern for consistency.

The test directly mutates config.enabled_encodings.push(...) while other tests use with_enabled_encodings(...). For consistency with the new builder API, consider:

let config = ExtractionConfig::default()
    .with_enabled_encodings(vec![Encoding::Ascii, Encoding::Utf8, Encoding::Utf16Le]);

This is a minor style inconsistency - the current code functions correctly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_extraction.rs` around lines 497 - 498, Replace the direct
mutation of ExtractionConfig.enabled_encodings with the builder API: instead of
creating a mutable config and pushing Encoding::Utf16Le onto
config.enabled_encodings, call
ExtractionConfig::default().with_enabled_encodings(vec![Encoding::Utf16Le]) (or
include the full set like vec![Encoding::Ascii, Encoding::Utf8,
Encoding::Utf16Le]) so the test uses the same with_enabled_encodings builder
pattern as other tests.
♻️ Duplicate comments (6)
tests/integration_flows_5_yara.rs (1)

7-48: 🛠️ Refactor suggestion | 🟠 Major

Rename this test to the required test_<function>_<scenario> form.

flow5_yara_long_string_deterministic_skip is missing the test_ prefix required by the test naming convention.

As per coding guidelines, "Test names follow test_<function>_<scenario> pattern".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_flows_5_yara.rs` around lines 7 - 48, Rename the test
function flow5_yara_long_string_deterministic_skip to follow the
test_<function>_<scenario> convention by adding the test_ prefix (e.g.,
test_flow5_yara_long_string_deterministic_skip); update the fn declaration name
and any references to that symbol in the file so the test framework discovers
it.
tests/integration_flows_6_7.rs (1)

19-180: 🛠️ Refactor suggestion | 🟠 Major

Rename these integration tests to the required test_<function>_<scenario> form.

All seven test functions in this file miss the test_ prefix, so they do not follow the repository naming rule.

As per coding guidelines, "Test names follow test_<function>_<scenario> pattern".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_flows_6_7.rs` around lines 19 - 180, Several integration
test functions in this file violate the naming rule by omitting the required
"test_" prefix; rename each test function identifier by prepending "test_" so
they follow test_<function>_<scenario>. Specifically, rename:
flow6_piped_output_is_one_string_per_line_no_headers ->
test_flow6_piped_output_is_one_string_per_line_no_headers,
flow6_piped_plain_text_compatible_with_grep ->
test_flow6_piped_plain_text_compatible_with_grep,
flow7_summary_conflicts_with_json_exit_2 ->
test_flow7_summary_conflicts_with_json_exit_2,
flow7_summary_conflicts_with_yara_exit_2 ->
test_flow7_summary_conflicts_with_yara_exit_2, flow7_summary_non_tty_exit_2 ->
test_flow7_summary_non_tty_exit_2, and
flow7_tty_summary_block_contains_required_fields ->
test_flow7_tty_summary_block_contains_required_fields; keep the existing #[test]
attributes and update any internal references to those function names (e.g., if
referenced elsewhere), ensuring each new name is unique.
src/container/elf/tests.rs (1)

166-184: 🧹 Nitpick | 🔵 Trivial

API contract tests are compile-time only.

These tests assign method references to typed variables, which verifies signatures at compile time but adds no runtime coverage. The comments now clarify this intent, which is an improvement.

Consider removing these or converting to functional tests with fixture data if actual runtime behavior coverage is needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/container/elf/tests.rs` around lines 166 - 184, These two tests
(test_import_export_extraction_methods_exist and
test_library_extraction_behavior) only verify signatures at compile time by
assigning method references (ElfParser::extract_imports,
ElfParser::extract_exports, ElfParser::get_symbol_providing_library) and provide
no runtime coverage; either remove them or replace them with functional tests
that load a small ELF fixture and call ElfParser::extract_imports /
ElfParser::extract_exports and ElfParser::get_symbol_providing_library asserting
expected results, ensuring you construct a valid Elf value (or a test helper)
and exercise the real methods instead of merely binding to their function
pointers.
tests/integration_cli.rs (2)

303-307: ⚠️ Potential issue | 🟡 Minor

Case-insensitive tag matching is overly permissive.

Tags are canonically lowercase per Tag::from_str in src/types/mod.rs. Use exact match.

Use exact string comparison
         assert!(
             tags.iter()
-                .any(|t| t.as_str().is_some_and(|s| s.eq_ignore_ascii_case("url"))),
+                .any(|t| t.as_str() == Some("url")),
             "every result should contain the 'url' tag, got: {tags:?}"
         );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_cli.rs` around lines 303 - 307, The test in
tests/integration_cli.rs is using case-insensitive matching
(eq_ignore_ascii_case) against the tag string; since tags are canonicalized to
lowercase by Tag::from_str in src/types/mod.rs, change the assertion to require
an exact lowercase match (e.g., check that the tag option equals Some("url") via
equality or as_deref()) so the test uses strict equality rather than
case-insensitive comparison.

170-180: 🧹 Nitpick | 🔵 Trivial

Assertion could be more precise for Mach-O cross-platform handling.

The assertion result.status.success() || !result.stderr.is_empty() silently accepts crashes that produce no stderr. Consider checking exit code presence explicitly.

Document the assertion intent
-    // We just verify it runs without panicking
-    assert!(result.status.success() || !result.stderr.is_empty());
+    // Accept either success, or graceful failure with error message.
+    // Mach-O parsing may fail on non-macOS platforms due to fixture format.
+    assert!(
+        result.status.success() || (result.status.code().is_some() && !result.stderr.is_empty()),
+        "should either succeed or fail gracefully with an error message"
+    );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_cli.rs` around lines 170 - 180, The test cli_macho_binary
currently allows silent crashes because it only checks result.status.success()
|| !result.stderr.is_empty(); change the assertion to explicitly require either
success, a non-empty stderr, or an explicit exit code to reject
signal-terminated (no-code, no-stderr) runs. In function cli_macho_binary (the
result from stringy().arg(...).output()), replace the assert with one that
checks result.status.success() || result.status.code().is_some() ||
!result.stderr.is_empty() so the test fails on silent crashes while still
permitting platform differences.
tests/integration_flow8_diagnostics.rs (1)

13-244: 🧹 Nitpick | 🔵 Trivial

Test names don't follow test_<function>_<scenario> pattern.

Functions like flow8_injected_demangle_failures_emit_warning_on_stderr should be renamed to follow the convention (e.g., test_cli_injected_demangle_failures_emit_warning).

As per coding guidelines, "Test names follow test_<function>_<scenario> pattern".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_flow8_diagnostics.rs` around lines 13 - 244, Rename the
tests to follow the test_<function>_<scenario> naming convention; specifically
rename functions such as
flow8_injected_demangle_failures_emit_warning_on_stderr,
flow8_injected_classify_failures_emit_warning_on_stderr,
flow8_injected_both_failures_emit_combined_warning,
flow8_normal_processing_no_spurious_warnings,
flow8_unknown_binary_info_stderr_exit_0,
flow8_unknown_binary_json_has_raw_bytes_section_and_tags,
flow8_empty_binary_info_stderr_exit_0,
flow8_filters_match_nothing_info_stderr_exit_0,
score_determinism_across_two_runs, debug_flag_adds_score_breakdown_fields, and
no_debug_flag_omits_score_breakdown_fields to names like
test_cli_injected_demangle_failures_emit_warning,
test_cli_injected_classify_failures_emit_warning,
test_cli_injected_both_failures_emit_combined_warning,
test_cli_normal_processing_no_spurious_warnings,
test_cli_unknown_binary_info_exit_0,
test_cli_unknown_binary_json_has_raw_bytes_and_tags,
test_cli_empty_binary_exit_0, test_cli_filters_match_nothing_exit_0,
test_score_determinism_across_two_runs,
test_debug_flag_adds_score_breakdown_fields, and
test_no_debug_flag_omits_score_breakdown_fields respectively so they adhere to
the test_<function>_<scenario> pattern and update any references to these
functions in the test module if present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.zig-version:
- Line 1: The repository pins Zig to 0.15.2 but that release has a known
regression in `zig build` (GH `#25553`) that can hang or leak resources; update
documentation and CI to note this limitation and add mitigation steps: amend the
`.zig-version` explanation or project README to mention the regression, add a
short note in any build/fixture-generation scripts or docs that use Zig (e.g.,
the fixture generation and cross-compilation steps that invoke `zig build`)
instructing maintainers to monitor upstream fixes, and add a temporary guard in
CI/docs recommending downgrading to a prior working Zig patch or using an
alternative invocation/workaround until the issue is resolved. Ensure the
documentation references Zig 0.15.2 and the GitHub issue number so future
maintainers can track the fix.

In `@AGENTS.md`:
- Line 3: Replace the literal token "@GOTCHAS.md" with a proper Markdown link so
the reference is clickable; specifically locate the top-level pointer showing
"@GOTCHAS.md" and change it to the standard link format like GOTCHAS.md wrapped
as [GOTCHAS.md](GOTCHAS.md) to follow the "*.md: Review documentation for:
Proper markdown formatting" guideline.

In `@docs/src/api.md`:
- Around line 112-125: The docs incorrectly state FilterConfig.min_length
default is 4; update the documentation block for FilterConfig to reflect the
actual default (min_length: Option<usize> is initialized to None in
FilterConfig::new()), e.g. change "default: 4" to "default: none (no minimum)"
or similar wording so it matches the behavior of FilterConfig and
FilterConfig::new().

In `@docs/src/cli.md`:
- Around line 126-133: Update the "Exit Codes" table under the "Exit Codes"
section to match the v1.0 CLI contract: add rows for code 3 (file not found) and
code 4 (permission denied), remove "file not found" from code 1's description
(leave code 1 for runtime errors like tag overlap and `--summary` in non-TTY),
and adjust the textual meanings so codes 0,1,2,3,4 reflect: 0 = success, 1 =
runtime error (excluding file-not-found/permission issues), 2 = argument parsing
error, 3 = file not found, 4 = permission denied; ensure the table
header/formatting and wording match the existing style in the "Exit Codes"
section.

In `@docs/src/configuration.md`:
- Around line 3-4: Update the note so it no longer claims configuration is
"CLI-only": change the text to state that config-file support in the CLI is not
yet available and clarify that programmatic configuration via PipelineConfig and
the builders exists; then explicitly label the rest of the page as planned
config-file syntax (not yet implemented) so readers understand those sections
describe planned behavior rather than current runtime behavior. Reference the
PipelineConfig and builder types in the note to avoid implying CLI exclusivity
and rephrase the sentence that currently reads "Stringy currently uses CLI flags
exclusively for configuration" to something like "The CLI does not yet support
config files; programmatic configuration is available via PipelineConfig and
builder APIs, while the sections below describe the planned config-file syntax."

In `@docs/src/quickstart.md`:
- Around line 50-54: The examples mix raw "score" and user-facing
"display_score" and unnecessarily include "--debug" in the first JSON example;
update all JSON examples (including the ones referenced around the other ranges)
to consistently emit and filter on "display_score" (0–100) and remove the
"--debug" flag where not needed (e.g., change "stringy --json --debug binary.elf
| jq 'select(.display_score > 80)'" to "stringy --json binary.elf | jq
'select(.display_score > 80)'"), and replace any occurrences of ".score" in
examples with ".display_score" so the scoring field is consistent across the
doc.

In `@docs/src/troubleshooting.md`:
- Around line 278-285: Update the "Exit Code Reference" table under the "Exit
Code Reference" header so it matches the PR objectives and integration tests:
change the row currently listing "file not found" as exit code 1 to exit code 3,
add/ensure a row for "permission denied" with exit code 4, and adjust the
description for exit code 1 to reflect remaining runtime errors (e.g., tag
overlap, `--summary` in non-TTY) if applicable; reference the integration tests
`exit_code_3_missing_file` and `permission_denied_exit_code_4` and the table
titled "Exit Code Reference" to locate where to edit.
- Around line 227-237: Update the "File Not Found" documentation block so its
"Exit code" matches the implementation and test naming: change the documented
exit code from 1 to 3; ensure the "File Not Found" section text (the header and
the "Exit code" line) reflects exit code 3 to match the
`exit_code_3_missing_file` behavior.
- Around line 124-128: Wrap the example commands for different encodings in a
fenced code block with a bash language specifier so the linter stops
complaining; specifically, locate the four command lines starting with "stringy
--enc ascii", "stringy --enc utf8", "stringy --enc utf16le", and "stringy --enc
utf16be" and add a ```bash fence before them and a closing ``` fence after them.

In `@justfile`:
- Around line 310-311: Replace the Windows fixture creation that uses New-Item
-ItemType File -Force for "tests/fixtures/test_empty.bin" with an explicit write
of zero bytes using the .NET File.WriteAllBytes pattern used for
test_unknown.bin; specifically, call System.IO.File.WriteAllBytes (or the
PowerShell equivalent) with an empty byte[] to guarantee the file is truncated
to 0 bytes on repeated runs.

In `@mise.toml`:
- Line 24: Remove the duplicate cargo backend entry for git-cliff by deleting
the explicit "cargo:git-cliff" entry and keep the shorthand "git-cliff" (which
resolves to the aqua backend) so the tool is installed via prebuilt binaries;
locate the duplicate entries by searching for "git-cliff" and "cargo:git-cliff"
in the file and remove the "cargo:git-cliff" line only.

In `@README.md`:
- Around line 107-110: The README's JSONL example uses jq '.[] | select(.tags[]
| contains("Url"))' which expects a JSON array; update the example that follows
the "stringy --json target_binary" invocation to use a per-object jq filter
compatible with JSONL, e.g. replace the pipeline example with something like:
stringy --json target_binary | jq 'select(any(.tags[]; . == "Url"))' so the
filter evaluates each newline-delimited object (reference the examples using
"stringy --json" and the jq pipeline).
- Around line 167-180: The README contains a duplicated "## Features" heading;
remove or rename the second "## Features" header and merge its bullet list into
the original Features section (or rename it to something distinct like "##
Additional Features" or "## Capabilities") so the document outline is valid for
markdownlint; update the heading text and move the bullets under the canonical
Features section (referencing the second "## Features" header and its bullets:
Format Detection, Container Parsing, String Extraction, Semantic Classification,
Symbol Demangling, Ranking, Deduplication, Output Formats, PE Resources,
Import/Export Analysis, Pipeline Architecture) to resolve the duplicate.
- Around line 124-133: The fenced code block under the "Human-readable mode
(TTY):" section is missing a language tag and triggers markdownlint; update the
opening fence from ``` to ```text so the sample block (containing the table
lines like "String | Tags | Score | Section" and the example rows) is labeled as
text and the README becomes lint-clean.

In `@src/classification/mod.rs`:
- Line 41: SemanticClassifier is using its own GUID_REGEX, FORMAT_REGEX, and
USER_AGENT_REGEX which diverge from the public patterns in patterns::data
(FORMAT_STRING_REGEX, classify_guid(), classify_format(),
classify_user_agent()), causing inconsistent classification; fix this by
consolidating to one authoritative source: remove or replace the internal
GUID_REGEX, FORMAT_REGEX, and USER_AGENT_REGEX in mod.rs and have
SemanticClassifier import and reuse the public regexes/classifier functions from
patterns::data (or expose shared constants there), ensure GUID matching uses the
same optional-brace and case-insensitive semantics as classify_guid(), and
ensure format and user-agent detection use FORMAT_STRING_REGEX and the same
agent list used by classify_user_agent() so both public API and
SemanticClassifier yield identical results.

In `@src/extraction/ascii/extraction.rs`:
- Around line 122-139: Duplicate logic creating FoundString entries for
end-of-buffer ASCII handling should be consolidated; extract a small helper
(e.g., push_ascii_string or finalize_ascii_string) that takes the target
Vec<FoundString>, the taken bytes (current_string_bytes), start and len,
converts bytes to String via String::from_utf8, and pushes FoundString::new(...,
Encoding::Ascii, start as u64, len as u32, StringSource::SectionData). Replace
both inline duplicate blocks with calls to that helper (referencing
current_string_bytes, start, len, FoundString::new, Encoding::Ascii,
StringSource::SectionData) so end-of-buffer and max_length paths reuse the same
logic.
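The consolidation above can be sketched with simplified stand-ins for the crate's types; `FoundString` here is a minimal stub, not the real struct, and the helper name follows the suggestion in the comment:

```rust
// Stub type standing in for the crate's FoundString.
#[derive(Debug, PartialEq)]
struct FoundString {
    text: String,
    offset: u64,
    len: u32,
}

/// Convert the accumulated ASCII bytes into a FoundString and push it.
/// Both the end-of-buffer and max_length paths would call this instead
/// of duplicating the conversion inline.
fn push_ascii_string(out: &mut Vec<FoundString>, bytes: Vec<u8>, start: usize, len: usize) {
    // ASCII bytes are always valid UTF-8, so this conversion cannot fail
    // for the byte ranges the extractor accumulates.
    if let Ok(text) = String::from_utf8(bytes) {
        out.push(FoundString {
            text,
            offset: start as u64,
            len: len as u32,
        });
    }
}

fn main() {
    let mut found = Vec::new();
    push_ascii_string(&mut found, b"hello".to_vec(), 10, 5);
    assert_eq!(found[0].text, "hello");
    assert_eq!(found[0].offset, 10);
    println!("{found:?}");
}
```

In the real code the helper would take the encoding and `StringSource` as fixed arguments (or hardcode them), since both call sites pass `Encoding::Ascii` and `StringSource::SectionData`.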

In `@src/extraction/pe_resources/manifests.rs`:
- Around line 173-180: The length passed to FoundString::new is currently
computed from the decoded UTF-8 string (manifest_text.len()) which underreports
the original resource byte span for UTF-16; instead compute length from the
original manifest payload bytes used for decoding (after stripping any BOM) —
e.g., use the payload byte slice length (payload_bytes.len() minus any
bom_bytes.len()) that you passed into the decoder — and pass that byte-length
into FoundString::new so the emitted metadata reflects the actual resource span;
update the code around manifest_text, encoding, and FoundString::new to use that
payload-byte length variable.
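A minimal sketch of the byte-span point above, assuming the usual BOM prefixes; the real code would reuse whatever BOM detection the decoder already performs:

```rust
// Compute the byte length of the original resource payload after
// stripping a BOM, rather than the decoded UTF-8 string's length.
fn payload_len_without_bom(payload: &[u8]) -> usize {
    const UTF8_BOM: &[u8] = &[0xEF, 0xBB, 0xBF];
    const UTF16LE_BOM: &[u8] = &[0xFF, 0xFE];
    const UTF16BE_BOM: &[u8] = &[0xFE, 0xFF];
    let bom_len = if payload.starts_with(UTF8_BOM) {
        UTF8_BOM.len()
    } else if payload.starts_with(UTF16LE_BOM) || payload.starts_with(UTF16BE_BOM) {
        2
    } else {
        0
    };
    payload.len() - bom_len
}

fn main() {
    // "hi" as UTF-16LE with a BOM: 2 BOM bytes + 4 payload bytes.
    let utf16 = [0xFF, 0xFE, b'h', 0x00, b'i', 0x00];
    assert_eq!(payload_len_without_bom(&utf16), 4);
    // The decoded UTF-8 text "hi" is only 2 bytes, under-reporting the span.
    assert_eq!("hi".len(), 2);
}
```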
- Around line 175-183: The manifest extraction is incorrectly hardcoding the
file offset as 0 when constructing FoundString, losing provenance; update the
FoundString::new(...) call in the manifest handling to pass the real resource
file offset/RVA (not 0) obtained from the pelite DataEntry/resource directory
(propagate the resource's file offset or RVA into this function), ensuring the
same pattern is applied to other PE resource extractors that use FoundString so
offset/section/RVA are preserved before calling
.with_section(".rsrc".to_string()). Keep .with_tags(...) unchanged but remove
the placeholder 0 and wire the actual resource location value through the call
chain.

In `@src/extraction/traits.rs`:
- Around line 37-39: ExtractionConfig is now public and #[non_exhaustive], so
add an explicit public constructor ExtractionConfig::new() that returns a
default-initialized instance (delegating to Default::default()) and document it
as the preferred builder entry point; update the docs/examples to use
ExtractionConfig::new() instead of struct literals or field mutation. Locate the
ExtractionConfig type (the struct with #[non_exhaustive] and #[derive(Debug,
Clone)]) and implement pub fn new() -> Self { Self::default() } plus a short doc
comment; apply the same new() constructor pattern to the other public
non-exhaustive structs referenced in the file (lines ~112-183) to follow the
coding guideline.
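The requested pattern can be sketched as follows; the single field shown is illustrative, not the crate's actual ExtractionConfig layout:

```rust
/// Extraction settings. Marked #[non_exhaustive] so fields can be added
/// without a breaking change; downstream crates therefore cannot use
/// struct literals and need an explicit constructor.
#[non_exhaustive]
#[derive(Debug, Clone)]
pub struct ExtractionConfig {
    pub min_length: usize,
}

impl Default for ExtractionConfig {
    fn default() -> Self {
        Self { min_length: 4 }
    }
}

impl ExtractionConfig {
    /// Preferred builder entry point; delegates to Default.
    pub fn new() -> Self {
        Self::default()
    }
}

fn main() {
    let cfg = ExtractionConfig::new();
    assert_eq!(cfg.min_length, 4);
}
```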

In `@src/lib.rs`:
- Around line 23-29: The rustdoc example uses a deep import path (use
stringy::pipeline::{Pipeline, PipelineConfig}) instead of the crate-root
re-exports; update the doctest in lib.rs to import the public types from the
crate root (e.g., use stringy::Pipeline and stringy::PipelineConfig or the
appropriate re-exported modules such as stringy::extraction/stringy::types per
your lib.rs re-exports) and keep the rest of the example (Pipeline::new and
pipeline.run(...)) unchanged so the example teaches the ergonomic public API.

In `@src/main.rs`:
- Around line 45-53: The CLI encoding flag is only applied as a post-extraction
filter (cli_encoding_to_filter -> EncodingFilter), so UTF‑16 extraction and raw
mode are effectively ignored and the UTF‑8 mapping drops ASCII. Update the
wiring so the CLI encoding drives both extraction and filtering: change
cli_encoding_to_filter (or replace it with a function that returns a
FilterConfig and also updates ExtractionConfig.enabled_encodings) so that
CliEncoding::Utf16, Utf16Le, and Utf16Be enable wide/UTF‑16 scanning in
ExtractionConfig.enabled_encodings (not just EncodingFilter::Utf16Any), and so
that CliEncoding::Utf8 enables UTF‑8 extraction while still including ASCII (or
a combined enabled-encoding set). In BasicExtractor::extract_from_section,
ensure raw mode respects the chosen encoding by applying the extraction config
before the early return (or by applying the filter during raw extraction) so
the --enc flag affects extraction and raw output consistently.


In `@src/output/table/plain.rs`:
- Around line 79-83: The test currently uses contains(...) which allows extra
characters; replace those partial checks with exact equality checks: after
collecting let lines: Vec<&str> = result.lines().collect(), assert the sanitized
output exactly by using assert_eq!(lines[0], "tab\\there") and
assert_eq!(lines[1], "pipe|here") (replace the two assert!(...contains(...))
assertions that reference lines and result.lines()).
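The exact-equality style reads like this; the output string is a hypothetical sanitized result, stood in for the real table output:

```rust
fn main() {
    // Hypothetical sanitized table output: a tab escaped as "\t" text,
    // followed by a line containing a literal pipe character.
    let result = "tab\\there\npipe|here\n";
    let lines: Vec<&str> = result.lines().collect();
    // Exact matches reject stray characters that contains() would accept.
    assert_eq!(lines[0], "tab\\there");
    assert_eq!(lines[1], "pipe|here");
}
```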

In `@src/output/table/tty.rs`:
- Around line 136-139: The summary header currently writes metadata.binary_name
directly into the terminal (in the format! that builds block), allowing crafted
names with newlines or control/escape sequences to forge summary lines; before
formatting the header, sanitize metadata.binary_name (e.g., remove or escape
control characters like \n, \r and terminal escape sequences, or replace
non-printable chars) and use the sanitized value in the format! that constructs
block so only safe printable characters are emitted to the terminal.
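One possible sanitizer, a sketch only (the crate may prefer escaping over replacement); it replaces every control character, which covers \n, \r, and the ESC byte that starts terminal escape sequences:

```rust
/// Replace control characters in an untrusted name with U+FFFD so a
/// crafted binary name cannot forge summary lines or inject ANSI
/// escape sequences into the terminal.
fn sanitize_for_terminal(name: &str) -> String {
    name.chars()
        .map(|c| if c.is_control() { '\u{FFFD}' } else { c })
        .collect()
}

fn main() {
    let crafted = "evil\x1b[31m\nname";
    let safe = sanitize_for_terminal(crafted);
    // With ESC removed, the remaining "[31m" text is inert.
    assert!(!safe.contains('\x1b'));
    assert!(!safe.contains('\n'));
    println!("{safe}");
}
```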

In `@src/pipeline/mod.rs`:
- Around line 431-501: The test block is large and should be moved out of
mod.rs; create a new file src/pipeline/tests.rs containing the entire
#[cfg(test)] mod tests { ... } contents (keeping use super::* and the individual
test functions like warning_format_both_zero_returns_none,
warning_format_demangle_only, warning_format_classify_only,
warning_format_both_nonzero, load_file_happy_path,
load_file_empty_file_returns_empty_buffer,
load_file_missing_file_returns_io_error), then replace the moved block in
src/pipeline/mod.rs with a single declaration #[cfg(test)] mod tests; so the
tests still import the parent module symbols (format_processing_warnings,
load_file, StringyError) via super::* and compile unchanged.

In `@tests/integration_cli_coverage.rs`:
- Around line 138-145: The test currently uses case-insensitive matching to
detect a "url" tag; change the predicate in tests/integration_cli_coverage.rs so
it performs an exact lowercase match (e.g., compare the tag string to "url"
using == or eq) instead of eq_ignore_ascii_case, because tags are canonicalized
to lowercase by Tag::from_str in src/types/mod.rs; update the any(|t| ... )
closure to only flag tags that exactly equal "url".

In `@tests/integration_cli_short_flags.rs`:
- Around line 13-36: The tests currently only compare exit status or line counts
(variables long_output/short_output and long_lines/short_lines) which can miss
differences; update each short-flag test to assert full output equivalence by
comparing long_output.stdout == short_output.stdout and long_output.stderr ==
short_output.stderr, and for JSON cases parse both stdout values into records
(e.g., serde_json::from_str::<Vec<_>> or streaming deserialize) and assert the
parsed structures are equal instead of just counting lines; apply the same
change to the other short-flag blocks (the tests around the other ranges
mentioned) so -m/-t/-e and combined short-form paths are validated by content,
not just line count or status.
- Around line 147-180: The test currently only verifies each record contains
"url" or "domain" which still passes if the parser does "last value wins";
modify the test to run three invocations: one with --only-tags url, one with
--only-tags domain, and one with both --only-tags url --only-tags domain (the
existing multi run using stringy() and elf_path), parse each run's stdout lines
into JSON Values (as done with parsed and tags), collect a canonical identifier
for each record (e.g., the whole JSON string or an ID field) into three Sets,
then assert that the multi-run set equals the union of the single-url and
single-domain sets and also assert the multi-run set is not identical to either
single set alone to prove values are ORed rather than last-wins.
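The union check can be sketched with stand-in identifier sets (the actual sets would hold canonical record identifiers parsed from each run's JSONL output):

```rust
use std::collections::BTreeSet;

fn main() {
    // Stand-ins for the records matched by each invocation.
    let url_only: BTreeSet<&str> = ["a", "b"].into_iter().collect();
    let domain_only: BTreeSet<&str> = ["b", "c"].into_iter().collect();
    let multi: BTreeSet<&str> = ["a", "b", "c"].into_iter().collect();

    // OR semantics: the combined run matches exactly the union.
    let union: BTreeSet<&str> = url_only.union(&domain_only).copied().collect();
    assert_eq!(multi, union);
    // Guard against last-value-wins parsing: the multi run must not
    // collapse to either single-tag result alone.
    assert_ne!(multi, url_only);
    assert_ne!(multi, domain_only);
}
```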
- Around line 133-140: The current test test_no_color_env_var uses a piped
invocation so the CLI already behaves as non-TTY and the NO_COLOR env var isn't
being exercised; update the test that calls stringy() to force a color-producing
code path (e.g. add the CLI flag that forces color, such as "--color=always" or
otherwise invoke the color decision path) while still setting
env("NO_COLOR","1"), then assert that the command output contains no ANSI escape
sequences (use a regex like \x1b\[[0-9;]*m via your test assertion helpers) to
ensure NO_COLOR is respected; alternatively, add a focused unit test for the
color decision function to verify NO_COLOR overrides forced-color settings.
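A sketch of such a unit-testable decision function; the signature and flag plumbing are assumptions, not the CLI's actual code:

```rust
/// Decide whether to emit color. Per the no-color.org spec, a set
/// NO_COLOR env var wins over everything, including a forced
/// --color=always flag.
fn should_colorize(no_color_env: bool, force_color_flag: bool, is_tty: bool) -> bool {
    if no_color_env {
        return false; // NO_COLOR overrides forced color
    }
    force_color_flag || is_tty
}

fn main() {
    assert!(!should_colorize(true, true, true)); // NO_COLOR beats --color=always
    assert!(should_colorize(false, true, false)); // forced color in a pipe
    assert!(!should_colorize(false, false, false)); // non-TTY default
}
```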

In `@tests/integration_flows_1_5.rs`:
- Around line 245-264: The test function flow2_enc_utf16_filters_encoding
currently passes silently if stdout is empty, masking a broken --enc utf16;
update the test to use a fixture that contains a known UTF-16/wide string
(replace "tests/fixtures/test_binary_elf" with a fixture containing a wide
string or add such a fixture), invoke stringy().args([... , "--enc", "utf16",
"--json"]) as before, parse lines into serde_json::Value and assert that at
least one parsed result has
v["encoding"].as_str().unwrap_or("").starts_with("Utf16"), failing the test if
there are zero Utf16* results so the CLI behavior is actually exercised.
- Around line 105-123: The tests flow1_piped_output_no_score_header and
flow1_raw_piped_output_no_score_header currently use
predicate::str::contains("Score").not(), which can false-fail if the fixture
contains "Score"; change each assertion to check only the first line (or exact
plain-text snapshot) of stdout instead: capture stdout, split on newlines and
assert the first line does not contain "Score" (or compare the entire stdout to
a saved expected plain-text snapshot). Update the assertions in those two
functions to target the first-line or snapshot rather than using
contains("Score").not().

In `@tests/test_noise_filters.rs`:
- Around line 217-220: The tests build a FilterContext using
FilterContext::default() which implicitly sets section_weight to 0.5; make that
explicit by chaining .with_section_weight(0.5) on the constructed context (e.g.,
after
.with_section_type(SectionType::StringData).with_is_executable(false).with_is_writable(false)
add .with_section_weight(0.5)), and apply the same explicit
.with_section_weight(...) change to the other similar test case at the indicated
location so the tests no longer depend on the default.
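With a stub builder (FilterContext here is a stand-in for the crate's type, with only the relevant methods), the explicit chain looks like:

```rust
#[derive(Debug, Clone)]
struct FilterContext {
    section_weight: f64,
    is_executable: bool,
}

impl FilterContext {
    fn default_ctx() -> Self {
        // Mirrors the implicit default the tests currently rely on.
        Self { section_weight: 0.5, is_executable: false }
    }
    fn with_section_weight(mut self, w: f64) -> Self {
        self.section_weight = w;
        self
    }
    fn with_is_executable(mut self, e: bool) -> Self {
        self.is_executable = e;
        self
    }
}

fn main() {
    let ctx = FilterContext::default_ctx()
        .with_is_executable(false)
        .with_section_weight(0.5); // explicit, no longer relying on the default
    assert_eq!(ctx.section_weight, 0.5);
    assert!(!ctx.is_executable);
}
```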

---

Outside diff comments:
In `@docs/src/string-extraction.md`:
- Around line 30-39: The docs example uses non-public fields on FoundString
(string.text and string.offset); update the snippet that calls
extract_ascii_strings and AsciiExtractionConfig to use the public API instead
(e.g., call the FoundString accessor methods or Display implementation such as
string.text() and string.offset() or format!("{}", string) depending on the
crate's public API). Replace all direct field accesses in the printing loop and
in the later filtering example with the public methods on FoundString so the
example compiles against the v1.0 API.

In `@src/lib.rs`:
- Around line 1-2: The crate root only has #![forbid(unsafe_code)] but lacks
#![deny(warnings)]; add the attribute #![deny(warnings)] to the top of the
library entry (near the existing #![forbid(unsafe_code)] in lib.rs) so that
warnings are treated as errors for local builds as well as CI; ensure the deny
attribute appears before any module or use declarations to apply crate-wide.

In `@tests/integration_extraction.rs`:
- Around line 497-498: Replace the direct mutation of
ExtractionConfig.enabled_encodings with the builder API: instead of creating a
mutable config and pushing Encoding::Utf16Le onto config.enabled_encodings, call
ExtractionConfig::default().with_enabled_encodings(vec![Encoding::Utf16Le]) (or
include the full set like vec![Encoding::Ascii, Encoding::Utf8,
Encoding::Utf16Le]) so the test uses the same with_enabled_encodings builder
pattern as other tests.

---

Duplicate comments:
In `@src/container/elf/tests.rs`:
- Around line 166-184: These two tests
(test_import_export_extraction_methods_exist and
test_library_extraction_behavior) only verify signatures at compile time by
assigning method references (ElfParser::extract_imports,
ElfParser::extract_exports, ElfParser::get_symbol_providing_library) and provide
no runtime coverage; either remove them or replace them with functional tests
that load a small ELF fixture and call ElfParser::extract_imports /
ElfParser::extract_exports and ElfParser::get_symbol_providing_library asserting
expected results, ensuring you construct a valid Elf value (or a test helper)
and exercise the real methods instead of merely binding to their function
pointers.

In `@tests/integration_cli.rs`:
- Around line 303-307: The test in tests/integration_cli.rs is using
case-insensitive matching (eq_ignore_ascii_case) against the tag string; since
tags are canonicalized to lowercase by Tag::from_str in src/types/mod.rs, change
the assertion to require an exact lowercase match (e.g., check that the tag
option equals Some("url") via equality or as_deref()) so the test uses strict
equality rather than case-insensitive comparison.
- Around line 170-180: The test cli_macho_binary currently allows silent crashes
because it only checks result.status.success() || !result.stderr.is_empty();
change the assertion to explicitly require either success, a non-empty stderr,
or an explicit exit code to reject signal-terminated (no-code, no-stderr) runs.
In function cli_macho_binary (the result from stringy().arg(...).output()),
replace the assert with one that checks result.status.success() ||
result.status.code().is_some() || !result.stderr.is_empty() so the test fails on
silent crashes while still permitting platform differences.

In `@tests/integration_flow8_diagnostics.rs`:
- Around line 13-244: Rename the tests to follow the test_<function>_<scenario>
naming convention; specifically rename functions such as
flow8_injected_demangle_failures_emit_warning_on_stderr,
flow8_injected_classify_failures_emit_warning_on_stderr,
flow8_injected_both_failures_emit_combined_warning,
flow8_normal_processing_no_spurious_warnings,
flow8_unknown_binary_info_stderr_exit_0,
flow8_unknown_binary_json_has_raw_bytes_section_and_tags,
flow8_empty_binary_info_stderr_exit_0,
flow8_filters_match_nothing_info_stderr_exit_0,
score_determinism_across_two_runs, debug_flag_adds_score_breakdown_fields, and
no_debug_flag_omits_score_breakdown_fields to names like
test_cli_injected_demangle_failures_emit_warning,
test_cli_injected_classify_failures_emit_warning,
test_cli_injected_both_failures_emit_combined_warning,
test_cli_normal_processing_no_spurious_warnings,
test_cli_unknown_binary_info_exit_0,
test_cli_unknown_binary_json_has_raw_bytes_and_tags,
test_cli_empty_binary_exit_0, test_cli_filters_match_nothing_exit_0,
test_score_determinism_across_two_runs,
test_debug_flag_adds_score_breakdown_fields, and
test_no_debug_flag_omits_score_breakdown_fields respectively so they adhere to
the test_<function>_<scenario> pattern and update any references to these
functions in the test module if present.

In `@tests/integration_flows_5_yara.rs`:
- Around line 7-48: Rename the test function
flow5_yara_long_string_deterministic_skip to follow the
test_<function>_<scenario> convention by adding the test_ prefix (e.g.,
test_flow5_yara_long_string_deterministic_skip); update the fn declaration name
and any references to that symbol in the file so the test framework discovers
it.

In `@tests/integration_flows_6_7.rs`:
- Around line 19-180: Several integration test functions in this file violate
the naming rule by omitting the required "test_" prefix; rename each test
function identifier by prepending "test_" so they follow
test_<function>_<scenario>. Specifically, rename:
flow6_piped_output_is_one_string_per_line_no_headers ->
test_flow6_piped_output_is_one_string_per_line_no_headers,
flow6_piped_plain_text_compatible_with_grep ->
test_flow6_piped_plain_text_compatible_with_grep,
flow7_summary_conflicts_with_json_exit_2 ->
test_flow7_summary_conflicts_with_json_exit_2,
flow7_summary_conflicts_with_yara_exit_2 ->
test_flow7_summary_conflicts_with_yara_exit_2, flow7_summary_non_tty_exit_2 ->
test_flow7_summary_non_tty_exit_2, and
flow7_tty_summary_block_contains_required_fields ->
test_flow7_tty_summary_block_contains_required_fields; keep the existing #[test]
attributes and update any internal references to those function names (e.g., if
referenced elsewhere), ensuring each new name is unique.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3e46dc0d-a745-496f-a8cc-7eaaa23aa55c

📥 Commits

Reviewing files that changed from the base of the PR and between 459950e and 5cab95a.

⛔ Files ignored due to path filters (3)
  • mise.lock is excluded by !**/*.lock
  • tests/snapshots/output_table_integration__plain_preserves_special_characters.snap is excluded by !**/*.snap, !**/tests/snapshots/**
  • tests/snapshots/output_table_integration__tty_special_characters.snap is excluded by !**/*.snap, !**/tests/snapshots/**
📒 Files selected for processing (63)
  • .github/workflows/scorecard.yml
  • .zig-version
  • AGENTS.md
  • CHANGELOG.md
  • CONTRIBUTING.md
  • Cargo.toml
  • GOTCHAS.md
  • README.md
  • deny.toml
  • docs/src/api.md
  • docs/src/architecture.md
  • docs/src/cli.md
  • docs/src/configuration.md
  • docs/src/quickstart.md
  • docs/src/ranking.md
  • docs/src/string-extraction.md
  • docs/src/troubleshooting.md
  • justfile
  • mise.toml
  • project_plan/specs/Technical_Plan__Stringy_v1.0_Implementation.md
  • project_plan/tickets/Add_performance_optimizations_and_dependency_updates.md
  • project_plan/tickets/Build_Pipeline_orchestration_with_filtering_and_progress_feedback.md
  • src/classification/mod.rs
  • src/classification/patterns/data.rs
  • src/classification/patterns/ip.rs
  • src/classification/patterns/network.rs
  • src/classification/patterns/paths.rs
  • src/container/elf/mod.rs
  • src/container/elf/tests.rs
  • src/extraction/ascii/extraction.rs
  • src/extraction/basic_extractor.rs
  • src/extraction/filters/mod.rs
  • src/extraction/helpers.rs
  • src/extraction/macho_load_commands.rs
  • src/extraction/pe_resources/manifests.rs
  • src/extraction/pe_resources/string_tables.rs
  • src/extraction/pe_resources/version_info.rs
  • src/extraction/tests.rs
  • src/extraction/traits.rs
  • src/extraction/utf16/extraction.rs
  • src/lib.rs
  • src/main.rs
  • src/output/table/formatting.rs
  • src/output/table/plain.rs
  • src/output/table/tty.rs
  • src/pipeline/config.rs
  • src/pipeline/filter.rs
  • src/pipeline/mod.rs
  • src/types/constructors.rs
  • src/types/error.rs
  • tests/integration_cli.rs
  • tests/integration_cli_coverage.rs
  • tests/integration_cli_errors.rs
  • tests/integration_cli_short_flags.rs
  • tests/integration_extraction.rs
  • tests/integration_flow8_diagnostics.rs
  • tests/integration_flow8_errors.rs
  • tests/integration_flows_1_5.rs
  • tests/integration_flows_5_yara.rs
  • tests/integration_flows_6_7.rs
  • tests/test_ascii_integration.rs
  • tests/test_deduplication.rs
  • tests/test_noise_filters.rs

@@ -0,0 +1 @@
0.15.2

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Is Zig 0.15.2 a valid release? What is the latest stable version of Zig?

💡 Result:

Yes. Zig 0.15.2 is a valid official release: it appears on Zig’s official download page with release date 2025-10-11 and published release artifacts (tarballs/zips + minisig signatures). [1]

As of March 11, 2026, the latest stable Zig release is 0.15.2. [1] (The official documentation site also has a dedicated 0.15.2 version selector/page.) [2]

Sources: [1] [2]


🌐 Web query:

Are there any security advisories or known issues with Zig compiler version 0.15.2?

💡 Result:

  • Security advisories: The Zig compiler repo (ziglang/zig) currently has no published GitHub Security Advisories, and it also reports no SECURITY.md / security policy file. [1]
  • Known issues / regressions reported for 0.15.2: There are user-reported bugs against Zig 0.15.2, e.g. a report that some zig build invocations can hang, peg CPU, and leak memory on multiple OSes (regression vs 0.15.1). [2]
  • Context: The 0.15.2 release was presented as a bug-fix release, and the 0.15.2 milestone shows many issues closed for that release (including TLS/HTTP-related fixes), but that does not preclude additional post-release bugs like the one above. [3][4]

Sources:
[1] GitHub Security overview for ziglang/zig (github.com)
[2] “Crashing & Leaking Build Script Compilation with Zig 0.15.2” (issue #25553) (github.com)
[3] Zig 0.15.2 release thread on Ziggit (ziggit.dev)
[4] ziglang/zig milestone “0.15.2” (closed issues list) (github.com)


Zig 0.15.2 is valid but has a known build regression.

Zig 0.15.2 is confirmed as the latest stable release (released 2025-10-11), and no security advisories are published. However, there is a documented regression affecting zig build invocations that can cause hangs, CPU pegging, and memory leaks (GitHub issue #25553). Since this project uses Zig for fixture generation and cross-compilation as part of the build process, document this limitation and monitor for a fix in future releases, or evaluate a mitigation strategy if builds are affected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.zig-version at line 1, The repository pins Zig to 0.15.2 but that release
has a known regression in `zig build` (GH `#25553`) that can hang or leak
resources; update documentation and CI to note this limitation and add
mitigation steps: amend the `.zig-version` explanation or project README to
mention the regression, add a short note in any build/fixture-generation scripts
or docs that use Zig (e.g., the fixture generation and cross-compilation steps
that invoke `zig build`) instructing maintainers to monitor upstream fixes, and
add a temporary guard in CI/docs recommending downgrading to a prior working Zig
patch or using an alternative invocation/workaround until the issue is resolved.
Ensure the documentation references Zig 0.15.2 and the GitHub issue number so
future maintainers can track the fix.

@@ -1,5 +1,7 @@
# AI Agent Guidelines for Stringy

@GOTCHAS.md

⚠️ Potential issue | 🟡 Minor

Use a real Markdown link here.

@GOTCHAS.md renders as plain text, so the new top-level pointer is not actually clickable. Replace it with a normal link such as [GOTCHAS.md](GOTCHAS.md). As per coding guidelines, "*.md: Review documentation for: Proper markdown formatting".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` at line 3, Replace the literal token "@GOTCHAS.md" with a proper
Markdown link so the reference is clickable; specifically locate the top-level
pointer showing "@GOTCHAS.md" and change it to a standard Markdown link,
[GOTCHAS.md](GOTCHAS.md), to follow the "*.md: Review documentation for: Proper
markdown formatting" guideline.

Comment on lines +112 to +125
```text
pub struct FilterConfig {
/// Minimum string length to include (default: 4)
pub min_length: usize, // --min-len
/// Restrict to a specific encoding
pub encoding: Option<EncodingFilter>, // --enc
/// Only include strings with these tags (empty = no filter)
pub include_tags: Vec<Tag>, // --only-tags
/// Exclude strings with these tags (empty = no filter)
pub exclude_tags: Vec<Tag>, // --no-tags
/// Limit output to top N strings by score
pub top_n: Option<usize>, // --top
}
```

⚠️ Potential issue | 🟡 Minor

Documentation mismatch: min_length default value.

The documentation states min_length has "default: 4", but FilterConfig::new() in src/pipeline/config.rs initializes min_length: None. The struct field is Option<usize>, so None means no minimum filter is applied by default.

Suggested fix
 pub struct FilterConfig {
-    /// Minimum string length to include (default: 4)
-    pub min_length: usize,          // --min-len
+    /// Minimum string length to include (default: None, no minimum)
+    pub min_length: Option<usize>,  // --min-len
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 125-125: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 112-112: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/api.md` around lines 112 - 125, The docs incorrectly state
FilterConfig.min_length default is 4; update the documentation block for
FilterConfig to reflect the actual default (min_length: Option<usize> is
initialized to None in FilterConfig::new()), e.g. change "default: 4" to
"default: none (no minimum)" or similar wording so it matches the behavior of
FilterConfig and FilterConfig::new().

Comment on lines +126 to 133
## Exit Codes

| Code | Meaning |
| ---- | -------------------------------------------------------------------------- |
| 0 | Success (including unknown binary format, empty binary, no filter matches) |
| 1 | Runtime error (file not found, tag overlap, `--summary` in non-TTY) |
| 2 | Argument parsing error (invalid flag, flag conflict, invalid tag name) |


⚠️ Potential issue | 🟠 Major

Exit code docs are out of sync with the v1.0 CLI contract.

The PR contract moved file-not-found to 3 and permission-denied to 4, but this table still documents only 0, 1, and 2, and groups file-not-found under 1. That makes the scripting guidance wrong.

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 126-126: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/cli.md` around lines 126 - 133, Update the "Exit Codes" table under
the "Exit Codes" section to match the v1.0 CLI contract: add rows for code 3
(file not found) and code 4 (permission denied), remove "file not found" from
code 1's description (leave code 1 for runtime errors like tag overlap and
`--summary` in non-TTY), and adjust the textual meanings so codes 0,1,2,3,4
reflect: 0 = success, 1 = runtime error (excluding file-not-found/permission
issues), 2 = argument parsing error, 3 = file not found, 4 = permission denied;
ensure the table header/formatting and wording match the existing style in the
"Exit Codes" section.
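
The contract described in the PR summary (0 = success, 2 = config/validation, 3 = not-found, 4 = permission-denied, 1 = other) maps naturally onto `std::io::ErrorKind`. The following is an illustrative sketch only, not the actual `stringy` source; the `exit_code_for` name and the `InvalidInput` → 2 mapping are assumptions:

```rust
use std::io;

// Sketch of the v1.0 exit-code contract from the PR summary:
// 0 = success, 2 = config/validation, 3 = not-found,
// 4 = permission-denied, 1 = other runtime errors.
// (Hypothetical helper; InvalidInput -> 2 is an assumption.)
pub fn exit_code_for(err: &io::Error) -> i32 {
    match err.kind() {
        io::ErrorKind::NotFound => 3,
        io::ErrorKind::PermissionDenied => 4,
        io::ErrorKind::InvalidInput => 2,
        _ => 1,
    }
}

fn main() {
    let missing = io::Error::new(io::ErrorKind::NotFound, "no such file");
    assert_eq!(exit_code_for(&missing), 3);
    let denied = io::Error::new(io::ErrorKind::PermissionDenied, "permission denied");
    assert_eq!(exit_code_for(&denied), 4);
}
```

A scripting consumer can then branch on the documented codes (e.g. treat 3 as "skip missing file" and 4 as "rerun with privileges") instead of lumping everything under 1.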

Comment on lines +3 to +4
> [!NOTE]
> The configuration file system described below is planned but **not yet implemented**. Stringy currently uses CLI flags exclusively for configuration. See the [CLI Reference](cli.md) for available options.
⚠️ Potential issue | 🟠 Major

Do not describe configuration as CLI-only.

This note is now too broad: the PR adds programmatic configuration via PipelineConfig/builders, so "CLI flags exclusively" is inaccurate. It also does not prevent readers from treating the rest of this page as implemented behavior today. Narrow this to "the CLI has no config-file support yet" and explicitly frame the remaining config-file sections as planned syntax.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/configuration.md` around lines 3 - 4, Update the note so it no
longer claims configuration is "CLI-only": change the text to state that
config-file support in the CLI is not yet available and clarify that
programmatic configuration via PipelineConfig and the builders exists; then
explicitly label the rest of the page as planned config-file syntax (not yet
implemented) so readers understand those sections describe planned behavior
rather than current runtime behavior. Reference the PipelineConfig and builder
types in the note to avoid implying CLI exclusivity and rephrase the sentence
that currently reads "Stringy currently uses CLI flags exclusively for
configuration" to something like "The CLI does not yet support config files;
programmatic configuration is available via PipelineConfig and builder APIs,
while the sections below describe the planned config-file syntax."

Comment on lines +133 to +140
fn test_no_color_env_var() {
let elf_path = "tests/fixtures/test_binary_elf";

stringy()
.arg(elf_path)
.env("NO_COLOR", "1")
.assert()
.success();
⚠️ Potential issue | 🟠 Major

NO_COLOR is not exercised by this test.

Because the command is captured through pipes here, the CLI is already on a non-TTY path and color or spinner output is usually suppressed before NO_COLOR matters. This stays green even if the env var is ignored. Assert absence of ANSI escapes from a forced-color path, or unit-test the color decision directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_cli_short_flags.rs` around lines 133 - 140, The current
test test_no_color_env_var uses a piped invocation so the CLI already behaves as
non-TTY and the NO_COLOR env var isn't being exercised; update the test that
calls stringy() to force a color-producing code path (e.g. add the CLI flag that
forces color, such as "--color=always" or otherwise invoke the color decision
path) while still setting env("NO_COLOR","1"), then assert that the command
output contains no ANSI escape sequences (use a regex like \x1b\[[0-9;]*m via
your test assertion helpers) to ensure NO_COLOR is respected; alternatively, add
a focused unit test for the color decision function to verify NO_COLOR overrides
forced-color settings.
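
The suggested assertion can be sketched as a small standalone helper that scans captured output for ANSI SGR sequences (`ESC [ … m`). `contains_ansi_sgr` is a hypothetical test helper, not part of the crate; the real test would run it over stdout captured from a forced-color invocation with `NO_COLOR=1` set:

```rust
// Detect ANSI SGR escape sequences (ESC '[' params 'm') in captured
// CLI output. Hypothetical helper for asserting NO_COLOR is honored.
pub fn contains_ansi_sgr(s: &str) -> bool {
    let bytes = s.as_bytes();
    let mut i = 0;
    while i + 1 < bytes.len() {
        if bytes[i] == 0x1b && bytes[i + 1] == b'[' {
            // Scan parameter bytes until the final 'm' (SGR terminator).
            let mut j = i + 2;
            while j < bytes.len() && (bytes[j].is_ascii_digit() || bytes[j] == b';') {
                j += 1;
            }
            if j < bytes.len() && bytes[j] == b'm' {
                return true;
            }
        }
        i += 1;
    }
    false
}

fn main() {
    assert!(contains_ansi_sgr("\x1b[31mred\x1b[0m"));
    assert!(!contains_ansi_sgr("plain piped output"));
}
```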

Comment on lines +147 to +180
let output = stringy()
.arg(elf_path)
.arg("--only-tags")
.arg("url")
.arg("--only-tags")
.arg("domain")
.arg("--json")
.output()
.expect("Failed to run with multiple --only-tags");

assert!(output.status.success());

let stdout = String::from_utf8_lossy(&output.stdout);
let lines: Vec<&str> = stdout.lines().collect();

assert!(!lines.is_empty(), "Expected at least one result");

for line in lines {
let parsed: Value = serde_json::from_str(line).expect("Each line should be valid JSON");

let tags = parsed["tags"]
.as_array()
.expect("tags field should be an array");

let has_url_or_domain = tags.iter().any(|tag| {
let tag_str = tag.as_str().expect("tag should be a string");
tag_str.eq_ignore_ascii_case("url") || tag_str.eq_ignore_ascii_case("domain")
});

assert!(
has_url_or_domain,
"Every record should have either Url or Domain tag, got: {:?}",
tags
);
⚠️ Potential issue | 🟡 Minor

This does not prove repeated --only-tags values are ORed.

If argument parsing regresses to "last value wins", every record here can still satisfy url || domain. Compare the combined run against the union of the single-tag runs, or at least prove it is not equal to either single-tag result set.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_cli_short_flags.rs` around lines 147 - 180, The test
currently only verifies each record contains "url" or "domain" which still
passes if the parser does "last value wins"; modify the test to run three
invocations: one with --only-tags url, one with --only-tags domain, and one with
both --only-tags url --only-tags domain (the existing multi run using stringy()
and elf_path), parse each run's stdout lines into JSON Values (as done with
parsed and tags), collect a canonical identifier for each record (e.g., the
whole JSON string or an ID field) into three Sets, then assert that the
multi-run set equals the union of the single-url and single-domain sets and also
assert the multi-run set is not identical to either single set alone to prove
values are ORed rather than last-wins.
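
The union comparison described above can be sketched independently of the CLI. The three `HashSet`s stand in for record identifiers parsed from the two single-tag runs and the combined run; `is_or_semantics` is a hypothetical helper, not project code:

```rust
use std::collections::HashSet;

// OR semantics for repeated --only-tags: the combined run must equal the
// union of the single-tag runs, and (given the fixture has both kinds of
// string) must differ from either single-tag result alone.
fn is_or_semantics(
    url_only: &HashSet<String>,
    domain_only: &HashSet<String>,
    combined: &HashSet<String>,
) -> bool {
    let union: HashSet<_> = url_only.union(domain_only).cloned().collect();
    combined == &union && combined != url_only && combined != domain_only
}

fn main() {
    let urls: HashSet<String> = ["http://a"].iter().map(|s| s.to_string()).collect();
    let domains: HashSet<String> = ["b.com"].iter().map(|s| s.to_string()).collect();
    let both: HashSet<String> =
        ["http://a", "b.com"].iter().map(|s| s.to_string()).collect();
    assert!(is_or_semantics(&urls, &domains, &both));
    // "Last value wins" would make the combined run equal the domain run:
    assert!(!is_or_semantics(&urls, &domains, &domains.clone()));
}
```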

Comment on lines +105 to +123
fn flow1_piped_output_no_score_header() {
// In non-TTY (piped) mode the plain-text output should not contain a
// "Score" table header.
stringy()
.arg("tests/fixtures/test_binary_elf")
.assert()
.success()
.stdout(predicate::str::contains("Score").not());
}

#[test]
fn flow1_raw_piped_output_no_score_header() {
// Raw mode plain-text output should also not contain a "Score" header.
stringy()
.args(["tests/fixtures/test_binary_elf", "--raw"])
.assert()
.success()
.stdout(predicate::str::contains("Score").not());
}
⚠️ Potential issue | 🟡 Minor

Make the header absence check more specific.

contains("Score").not() will fail as soon as the fixture itself contains the word Score, even if the non-TTY header stays gone. Assert on the first line or snapshot the exact plain-text shape instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_flows_1_5.rs` around lines 105 - 123, The tests
flow1_piped_output_no_score_header and flow1_raw_piped_output_no_score_header
currently use predicate::str::contains("Score").not(), which can false-fail if
the fixture contains "Score"; change each assertion to check only the first line
(or exact plain-text snapshot) of stdout instead: capture stdout, split on
newlines and assert the first line does not contain "Score" (or compare the
entire stdout to a saved expected plain-text snapshot). Update the assertions in
those two functions to target the first-line or snapshot rather than using
contains("Score").not().
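
The first-line variant can be sketched as a tiny predicate. `first_line_lacks_header` is a hypothetical helper and the sample strings are illustrative; the point is that a fixture string like "Scoreboard" no longer causes a false failure:

```rust
// Check only the first stdout line for the header token, instead of the
// whole output. Hypothetical helper for the non-TTY header assertion.
fn first_line_lacks_header(stdout: &str, header: &str) -> bool {
    stdout.lines().next().map_or(true, |line| !line.contains(header))
}

fn main() {
    // A fixture string containing "Score" deeper in the output is fine:
    assert!(first_line_lacks_header("GET /index\nScoreboard string\n", "Score"));
    // A real table header on the first line is still caught:
    assert!(!first_line_lacks_header("Score  Offset  Section\n...", "Score"));
}
```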

Comment on lines +245 to +264
fn flow2_enc_utf16_filters_encoding() {
let assert = stringy()
.args(["tests/fixtures/test_binary_elf", "--enc", "utf16", "--json"])
.assert()
.success();

let stdout = String::from_utf8(assert.get_output().stdout.clone()).unwrap();
// The ELF fixture has no UTF-16 strings, so output should be empty.
// If it ever gains UTF-16 strings, verify encodings.
if !stdout.trim().is_empty() {
for line in stdout.lines().filter(|l| !l.is_empty()) {
let v: Value = serde_json::from_str(line).expect("valid JSON");
let enc = v["encoding"].as_str().unwrap_or("");
assert!(
enc.starts_with("Utf16"),
"expected Utf16* encoding, got: {enc}"
);
}
}
}
⚠️ Potential issue | 🟠 Major

This UTF-16 case can pass while --enc utf16 is broken.

The empty-output branch treats "fixture has no UTF-16 strings" and "the CLI never enabled UTF-16 extraction" the same. Use a fixture with a known wide string and assert at least one Utf16* result so this path is actually covered.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_flows_1_5.rs` around lines 245 - 264, The test function
flow2_enc_utf16_filters_encoding currently passes silently if stdout is empty,
masking a broken --enc utf16; update the test to use a fixture that contains a
known UTF-16/wide string (replace "tests/fixtures/test_binary_elf" with a
fixture containing a wide string or add such a fixture), invoke
stringy().args([... , "--enc", "utf16", "--json"]) as before, parse lines into
serde_json::Value and assert that at least one parsed result has
v["encoding"].as_str().unwrap_or("").starts_with("Utf16"), failing the test if
there are zero Utf16* results so the CLI behavior is actually exercised.
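
Building such a fixture can be sketched with the standard library alone: `encode_utf16` plus `to_le_bytes` yields the UTF-16LE byte pattern an extractor should find. The helper names and the `WideMarker` string are illustrative, not part of the test suite:

```rust
// Encode a &str as UTF-16LE bytes, the layout a wide-string scanner
// should detect inside a binary fixture.
fn utf16le_bytes(s: &str) -> Vec<u8> {
    s.encode_utf16().flat_map(|u| u.to_le_bytes()).collect()
}

// Naive subslice search, sufficient for a fixture sanity check.
fn contains_subslice(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

fn main() {
    let mut fixture = vec![0u8; 16]; // padding around the wide string
    fixture.extend(utf16le_bytes("WideMarker"));
    fixture.extend(vec![0u8; 16]);
    assert!(contains_subslice(&fixture, &utf16le_bytes("WideMarker")));
}
```

With a fixture like this checked in, the test can assert at least one `Utf16*` record instead of passing on empty output.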

Comment on lines +217 to +220
let context = FilterContext::default()
.with_section_type(SectionType::StringData)
.with_is_executable(false)
.with_is_writable(false);
🧹 Nitpick | 🔵 Trivial

Make the section weight explicit in these context tests.

FilterContext::default() currently sets section_weight to 0.5 in src/extraction/filters/mod.rs, so these cases now depend on a default rather than on the section semantics they are supposed to validate. The code-section test already pins its weight explicitly; these should do the same so a default change does not silently change the meaning of the assertions.

Also applies to: 243-243

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_noise_filters.rs` around lines 217 - 220, The tests build a
FilterContext using FilterContext::default() which implicitly sets
section_weight to 0.5; make that explicit by chaining .with_section_weight(0.5)
on the constructed context (e.g., after
.with_section_type(SectionType::StringData).with_is_executable(false).with_is_writable(false)
add .with_section_weight(0.5)), and apply the same explicit
.with_section_weight(...) change to the other similar test case at the indicated
location so the tests no longer depend on the default.

…mbol extraction test

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings March 11, 2026 07:48
@mergify

mergify bot commented Mar 11, 2026

⚠️ The sha of the head commit of this PR conflicts with #140. Mergify cannot evaluate rules on this PR. Once #140 is merged or closed, Mergify will resume processing this PR. ⚠️

@unclesp1d3r unclesp1d3r linked an issue Mar 11, 2026 that may be closed by this pull request
10 tasks
Copilot AI left a comment

Pull request overview

Copilot reviewed 115 out of 124 changed files in this pull request and generated 5 comments.


Basic functionality is implemented with the full interface in development:
# Exclude noisy tags
stringy --no-tags format_string target_binary
Copilot AI Mar 11, 2026

The README example uses --no-tags format_string, but Tag::from_str only accepts fmt for format strings (and serde renames the variant to "fmt"). Update the example to a supported tag value so users can copy/paste it successfully.

Suggested change
stringy --no-tags format_string target_binary
stringy --no-tags fmt target_binary

@@ -142,10 +143,12 @@ Score Offset Section Tags String
"encoding": "utf-8",
Copilot AI Mar 11, 2026

The JSON output example shows "encoding": "utf-8", but Encoding derives Serialize without serde(rename), so it serializes as "Utf8" (and similarly "Ascii", "Utf16Le", etc.). Please update the README example to match the actual JSON schema.

Suggested change
"encoding": "utf-8",
"encoding": "Utf8",

Comment on lines +111 to +112
// NOTE: --summary non-TTY exit 1 is covered in integration_flows_6_7.rs
// (flow7_summary_non_tty_exit_1).
Copilot AI Mar 11, 2026

This comment says the non-TTY --summary path exits 1 and references flow7_summary_non_tty_exit_1, but the actual test is flow7_summary_non_tty_exit_2 (validation error -> exit code 2). Update the comment to reflect the current contract to avoid confusion when maintaining the flow matrix.

Comment on lines +110 to +113
/// Sets the display score (debug mode)
#[must_use]
pub fn with_display_score(mut self, score: i32) -> Self {
self.display_score = Some(score);
Copilot AI Mar 11, 2026

with_display_score is documented as "(debug mode)", but display_score is also used in normal (non-raw) output after score normalization (e.g., TTY table prefers display_score). Please update the doc comment to avoid implying it is debug-only.

# Output formats
stringy --json target_binary
stringy --yara target_binary
stringy --json target_binary | jq '.[] | select(.tags[] | contains("Url"))'
Copilot AI Mar 11, 2026

The README pipes --json output into jq '.[] | ...', but stringy --json emits JSONL (one JSON object per line), not a JSON array. Update the example to a JSONL-compatible jq invocation (e.g., processing input line-by-line) so it works as written.

Suggested change
stringy --json target_binary | jq '.[] | select(.tags[] | contains("Url"))'
stringy --json target_binary | jq 'select(.tags[]? | contains("Url"))'

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
tests/integration_pe.rs (5)

554-556: ⚠️ Potential issue | 🟡 Minor

Another tautology: is_empty() || !is_empty() is always true.

The comment indicates the intent is to verify no panic occurs. If that's the goal, remove the assertion entirely - the test passing without panic is sufficient.

Proposed fix
     let strings = stringy::extraction::extract_resource_strings(&pe_data);
-    // Should return empty Vec without panicking
-    assert!(strings.is_empty() || !strings.is_empty()); // Either is fine, just no panic
+    // Should return without panicking - either empty or populated Vec is acceptable
+    let _ = strings; // Silence unused warning; test passes if no panic
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_pe.rs` around lines 554 - 556, The assertion
assert!(strings.is_empty() || !strings.is_empty()) is a tautology and should be
removed; simply call stringy::extraction::extract_resource_strings(&pe_data) and
either drop or bind the result (e.g., let _ = strings;) so the test verifies "no
panic" by completing without assertions—update the test around the variable
strings (from extract_resource_strings) accordingly.

121-124: ⚠️ Potential issue | 🟡 Minor

Tautological assertion provides no value.

is_some() || is_none() is always true for an Option. This assertion cannot fail and doesn't verify anything meaningful. Either remove it or assert something specific about the expected state.

Proposed fix
-        // Verify resources field exists (may be None for simple binaries)
-        assert!(
-            container_info.resources.is_some() || container_info.resources.is_none(),
-            "Resources field should exist in ContainerInfo"
-        );
+        // Resources may be None for simple binaries - no assertion needed,
+        // the type system guarantees the field exists.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_pe.rs` around lines 121 - 124, The assertion uses a
tautology (container_info.resources.is_some() ||
container_info.resources.is_none()) which can never fail; replace or remove it.
Update the test in integration_pe.rs to assert a concrete expectation about
ContainerInfo::resources (for example,
assert!(container_info.resources.is_some(), "expected resources to be present")
or assert_eq!(container_info.resources, None) as appropriate) or remove the
assertion entirely if no specific state is required; reference the
container_info variable and its resources field when making the meaningful
assertion.

138-146: 🧹 Nitpick | 🔵 Trivial

Inconsistent fixture-missing behavior across tests.

Tests like test_pe_resource_enumeration and test_pe_resource_strings_empty_binary silently return early when fixtures are missing, while others (e.g., test_pe_resource_extraction_with_resources) assert fixture presence with detailed build instructions.

Silent returns mask CI misconfigurations. Prefer consistent behavior: either all tests assert fixture presence, or use #[ignore] with a reason for optional fixtures.

Also applies to: 546-551

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_pe.rs` around lines 138 - 146, The test currently swallows
a missing PE fixture by printing and returning early (the match on
fs::read(&fixture_path) in tests/integration_pe.rs), causing inconsistent
behavior across tests; update this test (and the other early-return cases such
as in test_pe_resource_enumeration and test_pe_resource_strings_empty_binary) to
assert fixture presence instead of silently returning: replace the Err branch
with an assert!(false, "...") or panic! containing the same detailed message and
build instructions used by test_pe_resource_extraction_with_resources, or
alternatively mark the test with #[ignore = "..."] if the fixture is
optional—apply the same change consistently to the other affected blocks (lines
around the second occurrence) so all tests behave uniformly.

187-190: ⚠️ Potential issue | 🟡 Minor

Same tautological assertion pattern here.

Identical issue as line 121-124. Option<T> is always either Some or None by definition.

Proposed fix
-    // Verify the structure is correct even if empty
-    assert!(
-        container_info.resources.is_some() || container_info.resources.is_none(),
-        "Resources field should exist in ContainerInfo"
-    );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_pe.rs` around lines 187 - 190, The current assertion checks
a tautology on the Option type (container_info.resources.is_some() ||
container_info.resources.is_none()), which always succeeds; replace it with a
meaningful check: either assert that resources is present when the test expects
it (use container_info.resources.is_some()) or remove the assertion if absence
is acceptable — reference container_info.resources and the ContainerInfo test
expectation to make the correct choice and update the assert message
accordingly.

1-557: 🧹 Nitpick | 🔵 Trivial

File exceeds 500-line limit (557 lines).

Per coding guidelines, files should stay under 500 lines. Consider splitting into focused modules:

  • integration_pe_parsing.rs (detection, section classification, imports/exports)
  • integration_pe_resources.rs (resource enumeration, VERSIONINFO, STRINGTABLE extraction)

This also improves test isolation and parallel execution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration_pe.rs` around lines 1 - 557, The file is over the 500-line
limit—split tests into two focused integration files: move
detection/section/imports/exports/symbol snapshot tests (functions
test_pe_import_export_extraction, test_pe_section_classification,
test_pe_symbol_extraction_snapshot) into a new tests/integration_pe_parsing.rs
and move resource-related tests (test_pe_resource_enumeration,
test_pe_resource_extraction_with_resources,
test_pe_version_info_string_extraction, test_pe_string_table_extraction,
test_pe_resource_string_extraction_snapshot,
test_pe_resource_strings_empty_binary) into tests/integration_pe_resources.rs;
duplicate or extract the shared helper get_fixture_path and necessary use/import
lines into both files (or create a small tests/common util and duplicate if
simpler), delete the oversized tests/integration_pe.rs, and run cargo test to
ensure names (PeParser, stringy::extraction::extract_resource_strings, and the
test function names) still compile and tests run in parallel.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f9a14bd1-cd3a-4fc7-9b4c-210e194f34a0

📥 Commits

Reviewing files that changed from the base of the PR and between 5cab95a and 4787717.

⛔ Files ignored due to path filters (1)
  • tests/snapshots/integration_pe__pe_symbol_extraction.snap is excluded by !**/*.snap, !**/tests/snapshots/**
📒 Files selected for processing (2)
  • Cargo.toml
  • tests/integration_pe.rs

…talled

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@codecov

codecov bot commented Mar 11, 2026

@unclesp1d3r unclesp1d3r enabled auto-merge (squash) March 11, 2026 08:14
@unclesp1d3r unclesp1d3r disabled auto-merge March 11, 2026 08:14
@unclesp1d3r
Member Author

@Mergifyio queue

@mergify
Copy link
Contributor

mergify bot commented Mar 11, 2026

Merge Queue Status

This pull request spent 12 minutes in the queue, including 11 minutes 47 seconds running CI.

Required conditions to merge
  • check-success = coverage
  • check-success = msrv (stable minus 1 releases)
  • check-success = msrv (stable minus 2 releases)
  • check-success = msrv (stable minus 3 releases)
  • check-success = msrv (stable minus 4 releases)
  • check-success = msrv (stable)
  • check-success = quality
  • check-success = test
  • check-success = test-cross-platform (macos-latest, macOS)
  • check-success = test-cross-platform (ubuntu-latest, Linux)
  • check-success = test-cross-platform (windows-latest, Windows)
  • all of [🛡 Merge Protections rule CI must pass]:
    • check-success = coverage
    • check-success = msrv (stable minus 1 releases)
    • check-success = msrv (stable minus 2 releases)
    • check-success = msrv (stable minus 3 releases)
    • check-success = msrv (stable minus 4 releases)
    • check-success = msrv (stable)
    • check-success = quality
    • check-success = test
    • check-success = test-cross-platform (macos-latest, macOS)
    • check-success = test-cross-platform (ubuntu-latest, Linux)
    • check-success = test-cross-platform (windows-latest, Windows)
  • all of [🛡 Merge Protections rule Do not merge outdated PRs]:
  • all of [🛡 Merge Protections rule Enforce conventional commit]:
  • any of [🛡 GitHub repository ruleset rule main]:
    • check-neutral = Mergify Merge Protections
    • check-skipped = Mergify Merge Protections
    • check-success = Mergify Merge Protections

@mergify mergify bot added the queued label Mar 11, 2026
mergify bot added a commit that referenced this pull request Mar 11, 2026
@mergify mergify bot merged commit 2e446e7 into main Mar 11, 2026
30 checks passed
@mergify mergify bot deleted the finish_epic_v1 branch March 11, 2026 08:26
@mergify mergify bot removed the queued label Mar 11, 2026

Labels

documentation Improvements or additions to documentation enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create Classifier Trait and Classification Pipeline Architecture

2 participants