Hardening: defender timeouts, truncation, prefilter#1
Hardening: defender timeouts, truncation, prefilter#1Braath1 wants to merge 1 commit intolasso-security:devfrom
Conversation
📝 WalkthroughWalkthroughSafety constants and regex timeout support have been introduced to the prompt injection defender. The text extraction function now normalizes tool outputs more robustly with truncation, while the injection scanning function implements a lightweight pre-filter for pattern matching efficiency and timeout-based detection. Changes
Sequence DiagramsequenceDiagram
participant Client as Input Handler
participant Extract as Text Extractor
participant PreFilter as Pre-filter Logic
participant Compiler as Pattern Compiler
participant Matcher as Regex Engine
participant Detector as Detection Handler
Client->>Extract: Tool output (string/dict/list)
Extract->>Extract: Normalize content<br/>(multiple fallback paths)
Extract->>Extract: Truncate to 100 KB
Extract->>Client: Extracted text
Client->>PreFilter: Text + Patterns
PreFilter->>PreFilter: Extract cheap keywords<br/>from patterns
PreFilter->>PreFilter: Check if text < 2000 chars<br/>OR keywords match?
alt Short text OR Keywords match
PreFilter->>Compiler: Proceed with scan
else Long text AND no keywords
PreFilter->>Client: Skip full scan
Note over Client: Return no detection
end
Compiler->>Compiler: Compile pattern<br/>(regex or re module)
Compiler->>Matcher: Execute match<br/>with timeout (200ms)
Matcher->>Detector: Pattern match result
Detector->>Detector: Handle timeout/error<br/>as medium severity
Detector->>Client: Detection result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py (1)
116-128:⚠️ Potential issue | 🔴 Critical
UnboundLocalErrorwhencontentis neitherstrnorlistIf
tool_resultis adict,"content"exists, but its value isNone, anint, anotherdict, etc. — neither thestrnor thelistbranch executes, sotextis never assigned. The reference at line 170 (if text and len(text) > MAX_SCAN_CHARS) then raisesUnboundLocalError.🐛 Proposed fix — add an else branch
elif isinstance(content, list): texts = [] for block in content: if isinstance(block, dict) and "text" in block: texts.append(str(block["text"])) elif isinstance(block, str): texts.append(block) text = "\n".join(texts) + else: + try: + text = json.dumps(tool_result) + except (TypeError, ValueError): + text = str(tool_result)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py around lines 116 - 128, The code can leave `text` unassigned when `tool_result["content"]` exists but is neither a str nor a list, causing an UnboundLocalError later; update the branch handling around `if "content" in tool_result:` (the block that currently handles `content`, `text`, and `texts`) to add an else branch that assigns a safe fallback to `text` (e.g., set `text = ""` for None or `text = str(content)` for other types) so subsequent checks like `if text and len(text) > MAX_SCAN_CHARS` cannot fail.
🧹 Nitpick comments (1)
.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py (1)
94-94:re2is a well-known separate library — the alias is misleading
re2conventionally refers to Google's RE2 binding for Python, which has a different API and behavior. Using it as an alias for theregexmodule will surprise readers and could cause genuine confusion ifre2is ever added as a dependency alongsideregex.Consider
import regex as _regex(or simplyregex) to avoid the collision.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py at line 94, The import alias "re2" for the third-party "regex" module is misleading; change the import so it does not shadow the well-known google-re2 package name (e.g., import regex as _regex or import regex) and update usages in this file that reference "re2" to the new identifier (search for references to the "re2" symbol in this module and replace them with the new name) so callers like any_regex_matching_logic continue to function with the correct module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
@.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py:
- Around line 93-98: The optional import block that sets re2 and HAS_REGEX is
catching all exceptions; change the except clause to catch ImportError only so
runtime errors inside the import block aren't masked. Update the try/except
around "import regex as re2" to "except ImportError" and keep the fallback
assignments to re2 = None and HAS_REGEX = False so the rest of the code using
re2 and HAS_REGEX behaves the same.
- Around line 215-219: The nested "import re as _re" is redundant and should be
removed; inside the block that currently does "import re as _re" then "tokens =
_re.findall(...)" simply use the module already imported at top-level (call
re.findall(...)) and delete the inner import statement so the loop that iterates
over tokens[:3] and adds to keywords (variables: pat, tokens, keywords) uses the
existing re binding.
- Around line 260-270: The current except Exception block around the regex
search is too broad; change it to catch TimeoutError (from the regex module) and
re.error explicitly so you only treat timeouts and malformed patterns as
detections. In the try/except that wraps the search using compiled (and the
re2.Path check for re2.Pattern and PER_PATTERN_TIMEOUT_MS), replace the bare
Exception handler with two except clauses: one for TimeoutError and one for
re.error that append to detections the same tuple (category_name, pattern,
"pattern timeout or error", "medium"); let other exceptions bubble up so
logic/interrupts aren’t swallowed.
- Around line 176-183: The helper _compile_pattern currently catches all
Exception; narrow this to only regex compilation errors by catching re.error
(and re2.error when HAS_REGEX is True). Update the except clause in
_compile_pattern to catch the specific exception types raised by
re.compile/re2.compile (e.g., except (re.error, re2.error): or use getattr(re2,
"error", re.error) when checking HAS_REGEX) and continue to return None for
those cases.
- Around line 203-232: The prefilter can be bypassed when the extracted keywords
set is empty, so change the logic in the prefilter block that builds keywords
and determines do_full_scan: after populating keywords from config (used by
pattern extraction via item.get("pattern")), detect if keywords is empty and in
that case force do_full_scan = True (or skip the prefilter entirely) so the full
pattern matching always runs for long texts; update the code paths that
reference keywords, do_full_scan, lowered, and text to ensure an empty keywords
set does not cause an early return of [] for texts >= 2000 characters.
- Line 91: PER_PATTERN_TIMEOUT_MS is defined as milliseconds but passed directly
to regex.search(timeout=...), which expects seconds; change the constant to
represent seconds (e.g., PER_PATTERN_TIMEOUT_S = 0.2) or convert milliseconds to
seconds before calling regex.search (timeout=PER_PATTERN_TIMEOUT_MS / 1000.0),
and update both usages (the declaration PER_PATTERN_TIMEOUT_MS and the call
sites where regex.search(...) is invoked, including the second occurrence around
the other reported line) to use the corrected seconds value or the converted
expression so the timeout is actually ~200ms.
---
Outside diff comments:
In
@.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py:
- Around line 116-128: The code can leave `text` unassigned when
`tool_result["content"]` exists but is neither a str nor a list, causing an
UnboundLocalError later; update the branch handling around `if "content" in
tool_result:` (the block that currently handles `content`, `text`, and `texts`)
to add an else branch that assigns a safe fallback to `text` (e.g., set `text =
""` for None or `text = str(content)` for other types) so subsequent checks like
`if text and len(text) > MAX_SCAN_CHARS` cannot fail.
---
Nitpick comments:
In
@.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py:
- Line 94: The import alias "re2" for the third-party "regex" module is
misleading; change the import so it does not shadow the well-known google-re2
package name (e.g., import regex as _regex or import regex) and update usages in
this file that reference "re2" to the new identifier (search for references to
the "re2" symbol in this module and replace them with the new name) so callers
like any_regex_matching_logic continue to function with the correct module.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py
📜 Review details
🧰 Additional context used
🪛 Ruff (0.15.2)
.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py
[warning] 96-96: Do not catch blind exception: Exception
(BLE001)
[warning] 182-182: Do not catch blind exception: Exception
(BLE001)
[warning] 268-268: Do not catch blind exception: Exception
(BLE001)
|
|
||
| # safety constants | ||
| MAX_SCAN_CHARS = 100 * 1024 # 100 KB default max to scan | ||
| PER_PATTERN_TIMEOUT_MS = 200 # 200 ms per pattern timeout when using regex module |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the file structure and the relevant lines
fd "post-tool-defender.py" --type fRepository: lasso-security/claude-hooks
Length of output: 53
🏁 Script executed:
# Once we find the file, check the content around lines 91 and 263
if [ -f ".claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py" ]; then
echo "=== File found, checking lines 85-100 ==="
sed -n '85,100p' ".claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py"
echo ""
echo "=== Checking lines 255-270 ==="
sed -n '255,270p' ".claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py"
fiRepository: lasso-security/claude-hooks
Length of output: 1241
🌐 Web query:
Python regex module timeout parameter unit seconds or milliseconds
💡 Result:
In the third‑party Python regex module (PyPI package name regex, a drop‑in replacement for re), the timeout= parameter is specified in seconds (you can pass an int or float), and it applies to the entire matching operation, not milliseconds. [1]
Fix timeout unit: PER_PATTERN_TIMEOUT_MS is passed to regex.search() which expects seconds
The regex module's timeout parameter is in seconds, not milliseconds. Setting PER_PATTERN_TIMEOUT_MS = 200 and passing it as timeout=PER_PATTERN_TIMEOUT_MS results in a 200-second timeout (1000× larger than the intended 200 milliseconds), completely defeating the ReDoS protection this PR provides.
Proposed fix
-PER_PATTERN_TIMEOUT_MS = 200 # 200 ms per pattern timeout when using regex module
+PER_PATTERN_TIMEOUT_S = 0.2 # 200 ms per pattern timeout when using regex module- if re2.search(compiled, text, timeout=PER_PATTERN_TIMEOUT_MS):
+ if re2.search(compiled, text, timeout=PER_PATTERN_TIMEOUT_S):Also applies to: 263-263
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
@.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py
at line 91, PER_PATTERN_TIMEOUT_MS is defined as milliseconds but passed
directly to regex.search(timeout=...), which expects seconds; change the
constant to represent seconds (e.g., PER_PATTERN_TIMEOUT_S = 0.2) or convert
milliseconds to seconds before calling regex.search
(timeout=PER_PATTERN_TIMEOUT_MS / 1000.0), and update both usages (the
declaration PER_PATTERN_TIMEOUT_MS and the call sites where regex.search(...) is
invoked, including the second occurrence around the other reported line) to use
the corrected seconds value or the converted expression so the timeout is
actually ~200ms.
| try: | ||
| import regex as re2 | ||
| HAS_REGEX = True | ||
| except Exception: | ||
| re2 = None | ||
| HAS_REGEX = False |
There was a problem hiding this comment.
Use ImportError instead of bare Exception for the optional-dependency guard
except Exception catches unrelated runtime errors (e.g., a NameError inside the import body), masking genuine problems. The expected failure mode here is a missing package.
♻️ Proposed fix
try:
import regex as re2
HAS_REGEX = True
-except Exception:
+except ImportError:
re2 = None
HAS_REGEX = False🧰 Tools
🪛 Ruff (0.15.2)
[warning] 96-96: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
@.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py
around lines 93 - 98, The optional import block that sets re2 and HAS_REGEX is
catching all exceptions; change the except clause to catch ImportError only so
runtime errors inside the import block aren't masked. Update the try/except
around "import regex as re2" to "except ImportError" and keep the fallback
assignments to re2 = None and HAS_REGEX = False so the rest of the code using
re2 and HAS_REGEX behaves the same.
| def _compile_pattern(pattern: str, flags=0): | ||
| try: | ||
| if HAS_REGEX: | ||
| return re2.compile(pattern, flags) | ||
| else: | ||
| return re.compile(pattern, flags) | ||
| except Exception: | ||
| return None |
There was a problem hiding this comment.
Catch re.error specifically instead of bare Exception
re.compile / re2.compile raises re.error (or re2.error) for invalid patterns; catching Exception swallows unrelated runtime errors. Narrowing the catch also makes it clear what failure mode the silent None return is intended to represent.
♻️ Proposed fix
-def _compile_pattern(pattern: str, flags=0):
+def _compile_pattern(pattern: str, flags: int = 0):
try:
if HAS_REGEX:
return re2.compile(pattern, flags)
else:
return re.compile(pattern, flags)
- except Exception:
+ except re.error:
return None🧰 Tools
🪛 Ruff (0.15.2)
[warning] 182-182: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
@.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py
around lines 176 - 183, The helper _compile_pattern currently catches all
Exception; narrow this to only regex compilation errors by catching re.error
(and re2.error when HAS_REGEX is True). Update the except clause in
_compile_pattern to catch the specific exception types raised by
re.compile/re2.compile (e.g., except (re.error, re2.error): or use getattr(re2,
"error", re.error) when checking HAS_REGEX) and continue to return None for
those cases.
| # quick keyword prefilter: build a small set of keywords from patterns to cheaply decide | ||
| keywords = set() | ||
| for key in [ | ||
| "instructionOverridePatterns", | ||
| "rolePlayingPatterns", | ||
| "encodingPatterns", | ||
| "contextManipulationPatterns", | ||
| ]: | ||
| for item in config.get(key, []): | ||
| if isinstance(item, dict): | ||
| pat = item.get("pattern", "") | ||
| # extract cheap keyword tokens (alphanumeric sequences of length>=4) | ||
| import re as _re | ||
|
|
||
| tokens = _re.findall(r"[A-Za-z]{4,}", pat) | ||
| for t in tokens[:3]: | ||
| keywords.add(t.lower()) | ||
|
|
||
| # only run full patterns if at least one keyword matches OR text is short | ||
| do_full_scan = False | ||
| lowered = text.lower() | ||
| for k in keywords: | ||
| if k in lowered: | ||
| do_full_scan = True | ||
| break | ||
| if len(text) < 2000: | ||
| do_full_scan = True | ||
|
|
||
| if not do_full_scan: | ||
| return [] |
There was a problem hiding this comment.
Empty keywords set causes complete scan bypass on long text — security regression
If every pattern in patterns.yaml contains no alphabetic tokens of 4+ characters (e.g., purely symbolic or numeric regexes), keywords stays empty. The for k in keywords loop is a no-op, do_full_scan stays False, and when len(text) >= 2000 the function returns [] — silently reporting zero detections regardless of actual content. An attacker can also induce this path by prepending 2 000+ padding characters before the payload.
🐛 Proposed fix — skip the prefilter entirely when no keywords could be extracted
- # only run full patterns if at least one keyword matches OR text is short
- do_full_scan = False
+ # only run full patterns if at least one keyword matches, text is short,
+ # or no keywords were extracted (can't use prefilter)
+ do_full_scan = len(keywords) == 0 # no signal → must scan
lowered = text.lower()
for k in keywords:
if k in lowered:
do_full_scan = True
break
if len(text) < 2000:
do_full_scan = True🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
@.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py
around lines 203 - 232, The prefilter can be bypassed when the extracted
keywords set is empty, so change the logic in the prefilter block that builds
keywords and determines do_full_scan: after populating keywords from config
(used by pattern extraction via item.get("pattern")), detect if keywords is
empty and in that case force do_full_scan = True (or skip the prefilter
entirely) so the full pattern matching always runs for long texts; update the
code paths that reference keywords, do_full_scan, lowered, and text to ensure an
empty keywords set does not cause an early return of [] for texts >= 2000
characters.
| import re as _re | ||
|
|
||
| tokens = _re.findall(r"[A-Za-z]{4,}", pat) | ||
| for t in tokens[:3]: | ||
| keywords.add(t.lower()) |
There was a problem hiding this comment.
Redundant import re as _re inside a nested loop
re is already imported at module level (line 27). Importing it again inside the inner loop on every iteration is noise; just use the already-bound re name directly.
♻️ Proposed fix
- import re as _re
-
- tokens = _re.findall(r"[A-Za-z]{4,}", pat)
+ tokens = re.findall(r"[A-Za-z]{4,}", pat)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
@.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py
around lines 215 - 219, The nested "import re as _re" is redundant and should be
removed; inside the block that currently does "import re as _re" then "tokens =
_re.findall(...)" simply use the module already imported at top-level (call
re.findall(...)) and delete the inner import statement so the loop that iterates
over tokens[:3] and adds to keywords (variables: pat, tokens, keywords) uses the
existing re binding.
| try: | ||
| if HAS_REGEX and isinstance(compiled, re2.Pattern): | ||
| # use timeout | ||
| if re2.search(compiled, text, timeout=PER_PATTERN_TIMEOUT_MS): | ||
| detections.append((category_name, pattern, reason, severity)) | ||
| else: | ||
| if re.search(compiled, text): | ||
| detections.append((category_name, pattern, reason, severity)) | ||
| except Exception: | ||
| # timeouts and other issues: treat as potential detection | ||
| detections.append((category_name, pattern, "pattern timeout or error", "medium")) |
There was a problem hiding this comment.
Catch TimeoutError specifically instead of bare Exception
The regex module raises the built-in TimeoutError on a per-pattern timeout. Catching Exception also swallows KeyboardInterrupt-adjacent errors and any logic bugs in surrounding code, making debugging harder. If you also want to guard against malformed pattern objects, add re.error explicitly.
♻️ Proposed fix
- except Exception:
- # timeouts and other issues: treat as potential detection
+ except (TimeoutError, re.error):
+ # timeout = possible ReDoS probe → treat as detection; re.error = bad pattern
detections.append((category_name, pattern, "pattern timeout or error", "medium"))🧰 Tools
🪛 Ruff (0.15.2)
[warning] 268-268: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
@.claude/skills/prompt-injection-defender/hooks/defender-python/post-tool-defender.py
around lines 260 - 270, The current except Exception block around the regex
search is too broad; change it to catch TimeoutError (from the regex module) and
re.error explicitly so you only treat timeouts and malformed patterns as
detections. In the try/except that wraps the search using compiled (and the
re2.Path check for re2.Pattern and PER_PATTERN_TIMEOUT_MS), replace the bare
Exception handler with two except clauses: one for TimeoutError and one for
re.error that append to detections the same tuple (category_name, pattern,
"pattern timeout or error", "medium"); let other exceptions bubble up so
logic/interrupts aren’t swallowed.
|
Great hardening work. A few additional considerations from running hooks in production (160+ hours autonomous):
|
Add safety hardening: input truncation (100KB), per-pattern timeout with regex module, precompiled patterns and cheap keyword prefilter. This reduces ReDoS/CPU risks. Includes minor CLI scaffold.
Summary by CodeRabbit
Bug Fixes
Performance