guardllm uses a defense-in-depth security pipeline to harden MCP servers and MCP clients against prompt injection, data exfiltration, replay attacks, and trust-boundary violations. It targets content of unknown provenance, such as web search results, emails, documents, and calendar data.
| Layer | Name | Purpose | Primary Module |
|---|---|---|---|
| L0 | Input Sanitization | Strip hidden HTML, dangerous attributes/comments, invisible Unicode, and normalize content before further processing | guardllm.security.sanitizer |
| L1 | Content Isolation | Wrap untrusted input in <untrusted_content ...> tags with source attribution | guardllm.security.isolation |
| L2 | Source Gate | Enforce provenance-based KG extraction policies (allow, quarantine, block) | guardllm.security.source_gate |
| L3 | Outbound DLP | Block high-overlap egress, secret-like patterns, and deobfuscated variants (reversed text, spelled-out characters) | guardllm.security.outbound_dlp |
| L4 | Provenance Tracking | Track untrusted spans and block suspicious reuse across trust boundaries, including deobfuscated content variants | guardllm.security.provenance |
| L5 | Canary Detection | Session canary generation/detection to flag leakage/exfiltration | guardllm.security.canary |
| L6 | Rate Limiting | Per-context action throttling for abuse resistance | guardllm.security.rate_limiter |
| L7 | Error Sanitization | Sanitize error payloads before returning to clients | guardllm.security.error_sanitizer |
| L8 | OAuth Scope Resolution | Scope narrowing/escalation policy between auth/session states | Host application responsibility |
| L9 | Tool Firewall | Authorize tools by policy + explicit authorization events | guardllm.security.policy_engine |
| L10 | Validation | Validate tool arguments before dispatch | guardllm.security.validation |
| L11 | Request Binding | Bind tool execution to message hash + args hash + TTL | guardllm.security.request_binding |
| L12 | Action Gate | Optional interactive confirmation gate for sensitive actions | guardllm.security.action_gate |
| - | Audit Logging | Structured security event logging for analysis and incident response (cross-cutting observer) | guardllm.security.audit |
Note: Layer numbering follows pipeline execution order. L8 is documented for completeness but remains outside the library boundary.
This section is the source of truth for what is wired through guardllm.Guard today.
| Layer | Status | Guard API Surface |
|---|---|---|
| L0 Input Sanitization | Implemented | process_inbound(...) |
| L1 Content Isolation | Implemented | process_inbound(...) |
| L2 Source Gate | Implemented | guardllm.security.source_gate.check_extraction_allowed(...) |
| L3 Outbound DLP | Implemented | check_outbound(...) |
| L4 Provenance Tracking | Implemented | process_inbound(...), check_outbound(...) |
| L5 Canary Detection | Implemented | Guard(canary_session_id=...) + inbound/outbound checks |
| L6 Rate Limiting | Implemented | check_tool_call(...), check_outbound(...), guard_tool_call(...) |
| L7 Error Sanitization | Implemented | sanitize_exception(...) |
| L8 OAuth Scope Resolution | Not implemented in library | Host application responsibility |
| L9 Tool Firewall | Implemented | check_tool_call(...), guard_tool_call(...) |
| L10 Validation | Implemented | validate_tool_args(...), guard_tool_call(validate=True) |
| L11 Request Binding | Implemented | bind_request(...), check_tool_call(...), guard_tool_call(...) |
| L12 Action Gate | Implemented | confirm_action(...), guard_tool_call(..., require_confirmation=True) |
| Audit Logging | Implemented | Guard(audit_logger=...) emits security events |
The central orchestrator is guardllm.security.pipeline.SecurityPipeline, exposed through the high-level Guard API.
Inbound path:
- Sanitize untrusted input (L0)
- Isolate by trust level (L1)
- Ingest for outbound DLP comparisons (L3 data prep)
- Record provenance spans (L4)
- Check canary presence in inbound payloads (L5)
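
The first two inbound steps (L0 and L1) can be illustrated with a minimal standalone sketch. It does not call guardllm itself: the function names strip_invisible and isolate, the source attribute, and the choice to HTML-escape the wrapped body are all assumptions made for illustration.

```python
import html
import re
import unicodedata

# Zero-width and bidi control characters commonly used to hide instructions.
_INVISIBLE = re.compile(
    r"[\u200b-\u200f\u202a-\u202e\u2060-\u2064\u2066-\u2069\ufeff]"
)
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_invisible(text: str) -> str:
    """Drop HTML comments and invisible Unicode, then NFKC-normalize."""
    text = _HTML_COMMENT.sub("", text)
    text = _INVISIBLE.sub("", text)
    return unicodedata.normalize("NFKC", text)


def isolate(text: str, source: str) -> str:
    """Wrap text in an untrusted_content tag with source attribution."""
    # Escaping the body (an assumption of this sketch) keeps the payload
    # from closing the wrapper tag early.
    body = html.escape(text, quote=False)
    return (
        f'<untrusted_content source="{html.escape(source)}">'
        f"{body}</untrusted_content>"
    )
```

Sanitizing before wrapping means the isolation step only ever sees content that has already had hidden characters removed.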
L0 encoded payload handling:
- Base64 and URL-encoded segments are decoded and scored with the prompt-injection detector.
- This avoids relying only on fixed suspicious keyword lists.
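
As a rough illustration of the decode-then-score idea, the sketch below extracts URL-decoded and Base64-decoded variants of a payload and passes each to a scorer. The helper names decoded_variants and max_injection_score are hypothetical, the 16-character minimum for Base64 runs is an arbitrary choice, and the scorer is a caller-supplied stand-in for the library's prompt-injection detector, whose interface is not described here.

```python
import base64
import binascii
import re
from typing import Callable
from urllib.parse import unquote

# Runs that look like Base64 (length threshold is an assumption).
_B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decoded_variants(text: str) -> list[str]:
    """Return decodings of URL-encoded and Base64 segments found in text."""
    variants = []
    url_decoded = unquote(text)
    if url_decoded != text:
        variants.append(url_decoded)
    for match in _B64_RUN.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
            variants.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64 text, so skip it
    return variants


def max_injection_score(text: str, scorer: Callable[[str], float]) -> float:
    """Score the original text and every decoded variant; keep the worst."""
    return max(scorer(c) for c in [text, *decoded_variants(text)])
```

Taking the maximum across variants means an instruction hidden inside an encoded segment is scored the same way as one written in plain text.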
Tool-call path:
- Tool authorization/firewall checks (L9)
- Rate limiting (L6)
- Optional request binding verification (L11)
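
Request binding (L11) ties a tool call to a message hash, an args hash, and a TTL. The sketch below shows one way such a binding could be built and verified. It is not the library's implementation: the Binding dataclass, the bind and verify helpers, SHA-256, and canonical JSON for the arguments are all assumptions.

```python
import hashlib
import hmac
import json
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Binding:
    message_hash: str
    args_hash: str
    expires_at: float


def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _args_digest(args: dict) -> str:
    # Canonical JSON so that key order does not change the hash.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return _digest(canonical.encode("utf-8"))


def bind(message: str, args: dict, ttl_seconds: float,
         now: Optional[float] = None) -> Binding:
    now = time.time() if now is None else now
    return Binding(_digest(message.encode("utf-8")), _args_digest(args),
                   now + ttl_seconds)


def verify(binding: Binding, message: str, args: dict,
           now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    if now > binding.expires_at:
        return False  # expired: blocks deferred or replayed execution
    same_message = hmac.compare_digest(
        binding.message_hash, _digest(message.encode("utf-8")))
    same_args = hmac.compare_digest(binding.args_hash, _args_digest(args))
    return same_message and same_args
```

A call whose arguments or originating message changed after binding, or that arrives after the TTL, fails verification.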
Outbound path:
- DLP overlap/secret checks (L3), including deobfuscated variants (reversed text, spelled-out characters)
- Provenance reuse guard (L4), including deobfuscated variants
- Rate limiting (L6)
- Canary leakage detection (L5)
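
The deobfuscated variants mentioned for L3 and L4 can be pictured as extra candidate strings produced from the outbound text before matching. The sketch below is a simplified assumption of how that might look. The names collapse_spelled_out and outbound_variants and the separator set are hypothetical, and the pattern will also collapse legitimate runs of single letters.

```python
import re

# Single characters joined by one separator, e.g. "s-t-r-i-p-e".
_SPELLED = re.compile(r"\b(?:\w[-_. ]){2,}\w\b")


def collapse_spelled_out(text: str) -> str:
    """Join runs of separator-delimited single characters into one token."""
    return _SPELLED.sub(
        lambda m: re.sub(r"[-_. ]", "", m.group(0)), text)


def outbound_variants(text: str) -> set[str]:
    """Return the text plus reversed and collapsed forms for matching."""
    collapsed = collapse_spelled_out(text)
    return {text, text[::-1], collapsed, collapsed[::-1]}
```

Each variant would then go through the same overlap and secret-pattern checks as the original outbound text.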
Threshold tuning:
- L3 and L4 overlap thresholds are configurable per context via PolicyConfig(dlp_verbatim_lcs_min, dlp_ngram_overlap_min, provenance_verbatim_lcs_min, provenance_ngram_overlap_min).
- Defaults preserve prior behavior (100/0.40 for DLP, 50/0.30 for provenance).
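
To make the two threshold types concrete, the sketch below computes a verbatim longest-common-substring length and an n-gram overlap ratio, then compares them against the DLP defaults. Only the parameter names and default values come from this document. The metric definitions (character-level substring, word 3-gram containment) and the helper names are assumptions, and the library may measure overlap differently.

```python
from difflib import SequenceMatcher


def longest_common_substring(a: str, b: str) -> int:
    """Length in characters of the longest substring shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def ngram_overlap(untrusted: str, outbound: str, n: int = 3) -> float:
    """Fraction of outbound word n-grams that also occur in untrusted."""
    def grams(text: str) -> set:
        words = text.split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    out = grams(outbound)
    if not out:
        return 0.0
    return len(out & grams(untrusted)) / len(out)


def exceeds_thresholds(untrusted: str, outbound: str,
                       verbatim_lcs_min: int = 100,
                       ngram_overlap_min: float = 0.40) -> bool:
    """True if either metric reaches its threshold (DLP defaults shown)."""
    return (longest_common_substring(untrusted, outbound) >= verbatim_lcs_min
            or ngram_overlap(untrusted, outbound) >= ngram_overlap_min)
```

Under these definitions, lowering either threshold makes the check stricter. Passing 50 and 0.30 corresponds to the provenance defaults.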
guardllm supports explicit security contexts for:
- MCP server responses (context_mcp_server)
- MCP client requests (context_mcp_client)
- Documents (context_document)
- Web results (context_web)
You can also define custom SecurityContext values for sources like:
- email_content
- calendar_content
- tool_output
- rag_content
These source types integrate directly with source-gate and provenance behavior.
- Trust defaults to UNTRUSTED.
- Destructive tool calls are blocked unless explicitly enabled via PolicyConfig(enable_destructive=True).
- Destructive tool calls require authorization events in client mode.
- Request binding is optional but recommended for all write-capable actions.
- OAuth/OIDC integration is supported via host-side scope-to-policy mapping (see docs/oauth_integration.md).
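
The defaults for destructive tool calls amount to a default-deny decision, which the toy model below spells out. ToyPolicy and allow_tool_call are hypothetical stand-ins, not guardllm's PolicyConfig or policy engine. Allowing every non-destructive call is also a simplification of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPolicy:
    enable_destructive: bool = False   # mirrors the documented default
    client_mode: bool = False
    # Tool names that have received an explicit authorization event.
    authorized: set = field(default_factory=set)


def allow_tool_call(policy: ToyPolicy, tool: str, destructive: bool) -> bool:
    """Default-deny for destructive tools; others pass in this sketch."""
    if not destructive:
        return True
    if not policy.enable_destructive:
        return False  # blocked unless explicitly enabled
    if policy.client_mode and tool not in policy.authorized:
        return False  # client mode also needs an authorization event
    return True
```

In this model a destructive call in client mode needs both the policy flag and a recorded authorization for that tool.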
- Hidden-instruction prompt injection in HTML/text payloads
- Unicode obfuscation attacks (zero-width/bidi controls)
- Exfiltration by copying untrusted spans into outbound content
- Obfuscated exfiltration via reversed text or spelled-out characters (e.g. s-t-r-i-p-e)
- Replay/deferred tool execution after conversation state changes
- Over-privileged tool invocation and destructive action abuse
guardllm is an application-layer hardening library. It does not replace:
- network segmentation
- host/container isolation
- secret management systems
- transport-layer authN/authZ
- OAuth/OIDC token issuance, validation, and lifecycle management
Use guardllm as one layer in a full security architecture.