Security Architecture

guardllm applies a defense-in-depth security pipeline that hardens MCP servers and MCP clients against prompt injection, data exfiltration, replay attacks, and trust-boundary violations. It targets content of unknown provenance, such as web search results, emails, documents, calendar data, and other untrusted inputs.

Defense Layers

| Layer | Name | Purpose | Primary Module |
| --- | --- | --- | --- |
| L0 | Input Sanitization | Strip hidden HTML, dangerous attributes/comments, invisible Unicode, and normalize content before further processing | `guardllm.security.sanitizer` |
| L1 | Content Isolation | Wrap untrusted input in `<untrusted_content ...>` tags with source attribution | `guardllm.security.isolation` |
| L2 | Source Gate | Enforce provenance-based KG extraction policies (`allow`, `quarantine`, `block`) | `guardllm.security.source_gate` |
| L3 | Outbound DLP | Block high-overlap egress, secret-like patterns, and deobfuscated variants (reversed text, spelled-out characters) | `guardllm.security.outbound_dlp` |
| L4 | Provenance Tracking | Track untrusted spans and block suspicious reuse across trust boundaries, including deobfuscated content variants | `guardllm.security.provenance` |
| L5 | Canary Detection | Session canary generation/detection to flag leakage/exfiltration | `guardllm.security.canary` |
| L6 | Rate Limiting | Per-context action throttling for abuse resistance | `guardllm.security.rate_limiter` |
| L7 | Error Sanitization | Sanitize error payloads before returning to clients | `guardllm.security.error_sanitizer` |
| L8 | OAuth Scope Resolution | Scope narrowing/escalation policy between auth/session states | Host application responsibility |
| L9 | Tool Firewall | Authorize tools by policy + explicit authorization events | `guardllm.security.policy_engine` |
| L10 | Validation | Validate tool arguments before dispatch | `guardllm.security.validation` |
| L11 | Request Binding | Bind tool execution to message hash + args hash + TTL | `guardllm.security.request_binding` |
| L12 | Action Gate | Optional interactive confirmation gate for sensitive actions | `guardllm.security.action_gate` |
| - | Audit Logging | Structured security event logging for analysis and incident response (cross-cutting observer) | `guardllm.security.audit` |

Note: Layer numbering follows pipeline execution order. L8 is documented for completeness but remains outside the library boundary.

Guard API Coverage (Current)

This section is the source of truth for what is wired through `guardllm.Guard` today.

| Layer | Status | Guard API Surface |
| --- | --- | --- |
| L0 Input Sanitization | Implemented | `process_inbound(...)` |
| L1 Content Isolation | Implemented | `process_inbound(...)` |
| L2 Source Gate | Implemented | `guardllm.security.source_gate.check_extraction_allowed(...)` |
| L3 Outbound DLP | Implemented | `check_outbound(...)` |
| L4 Provenance Tracking | Implemented | `process_inbound(...)`, `check_outbound(...)` |
| L5 Canary Detection | Implemented | `Guard(canary_session_id=...)` + inbound/outbound checks |
| L6 Rate Limiting | Implemented | `check_tool_call(...)`, `check_outbound(...)`, `guard_tool_call(...)` |
| L7 Error Sanitization | Implemented | `sanitize_exception(...)` |
| L8 OAuth Scope Resolution | Not implemented in library | Host application responsibility |
| L9 Tool Firewall | Implemented | `check_tool_call(...)`, `guard_tool_call(...)` |
| L10 Validation | Implemented | `validate_tool_args(...)`, `guard_tool_call(validate=True)` |
| L11 Request Binding | Implemented | `bind_request(...)`, `check_tool_call(...)`, `guard_tool_call(...)` |
| L12 Action Gate | Implemented | `confirm_action(...)`, `guard_tool_call(..., require_confirmation=True)` |
| Audit Logging | Implemented | `Guard(audit_logger=...)` emits security events |

Unified Pipeline

The central orchestrator is `guardllm.security.pipeline.SecurityPipeline`, exposed through the high-level `Guard` API.

Inbound path:

Sanitize untrusted input (L0)
Isolate by trust level (L1)
Ingest for outbound DLP comparisons (L3 data prep)
Record provenance spans (L4)
Check canary presence in inbound payloads (L5)
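
To make the first two inbound steps concrete, here is a minimal standalone sketch of sanitization followed by isolation. It is not guardllm's implementation: the function names, the character ranges, and the `source` attribute on the wrapper tag are assumptions chosen for illustration, and a real sanitizer would also handle hidden HTML elements and dangerous attributes.

```python
import html
import re
import unicodedata

# Zero-width and bidirectional control characters often used to hide text.
# The exact set here is an assumption for this sketch.
_INVISIBLE = re.compile(
    r"[\u200b-\u200f\u202a-\u202e\u2060-\u2064\u2066-\u2069\ufeff]"
)
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def sanitize(text: str) -> str:
    """Drop HTML comments and invisible characters, then NFKC-normalize."""
    text = _HTML_COMMENT.sub("", text)
    text = _INVISIBLE.sub("", text)
    return unicodedata.normalize("NFKC", text)


def isolate(text: str, source: str) -> str:
    """Wrap text in an untrusted_content tag with source attribution."""
    # Escape angle brackets so the payload cannot close the wrapper early.
    body = html.escape(text, quote=False)
    attr = html.escape(source, quote=True)
    return f'<untrusted_content source="{attr}">{body}</untrusted_content>'
```

Calling `isolate(sanitize(raw), "web")` yields a single wrapped block in which any embedded closing tag has been escaped, so downstream prompts can treat everything inside the wrapper as data.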

L0 encoded payload handling:

Base64 and URL-encoded segments are decoded and scored with the prompt-injection detector.
This avoids relying only on fixed suspicious keyword lists.
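
The decode-then-score idea can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `decoded_views` and `max_score` are hypothetical names, the 16-character minimum for Base64 runs is arbitrary, and the detector is passed in as a plain callable because the real prompt-injection detector's interface is not described here.

```python
import base64
import binascii
import re
from typing import Callable, List
from urllib.parse import unquote

# Candidate Base64 runs; the minimum length is an assumption for this sketch.
_B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decoded_views(text: str) -> List[str]:
    """Return decoded variants of URL-encoded or Base64 segments in text."""
    views = []
    url_decoded = unquote(text)
    if url_decoded != text:
        views.append(url_decoded)
    for match in _B64_RUN.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
            views.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            # Not valid Base64 or not text; skip this candidate.
            continue
    return views


def max_score(text: str, detector: Callable[[str], float]) -> float:
    """Score the raw text and every decoded view; keep the highest score."""
    return max(detector(view) for view in [text, *decoded_views(text)])
```

Scoring the decoded views with the same detector used for plain text means an encoded payload is judged on what it says once decoded rather than on how it is spelled in transit.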

Tool-call path:

Tool authorization/firewall checks (L9)
Rate limiting (L6)
Optional request binding verification (L11)
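
Request binding (L11) ties a tool execution to a message hash, an args hash, and a TTL. The sketch below shows one way such a binding could be built and verified; the `Binding` type and the `bind` and `verify` helpers are hypothetical and do not reflect the signature of `bind_request(...)`. SHA-256 and canonical JSON are assumptions made for the example.

```python
import hashlib
import hmac
import json
import time
from dataclasses import dataclass
from typing import Optional


def _digest(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def _canonical(args: dict) -> str:
    # Sorted keys so logically equal arguments hash identically.
    return json.dumps(args, sort_keys=True, separators=(",", ":"))


@dataclass(frozen=True)
class Binding:
    message_hash: str
    args_hash: str
    expires_at: float


def bind(message: str, args: dict, ttl_seconds: float,
         now: Optional[float] = None) -> Binding:
    """Record what was approved and until when."""
    now = time.time() if now is None else now
    return Binding(_digest(message), _digest(_canonical(args)), now + ttl_seconds)


def verify(binding: Binding, message: str, args: dict,
           now: Optional[float] = None) -> bool:
    """Accept only the same message and args, before the binding expires."""
    now = time.time() if now is None else now
    if now > binding.expires_at:
        return False
    return (hmac.compare_digest(binding.message_hash, _digest(message))
            and hmac.compare_digest(binding.args_hash, _digest(_canonical(args))))
```

A binding created for one message no longer verifies once the arguments change or the TTL lapses, which is the property that defeats replayed or deferred tool calls.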

Outbound path:

DLP overlap/secret checks (L3), including deobfuscated variants (reversed text, spelled-out characters)
Provenance reuse guard (L4), including deobfuscated variants
Rate limiting (L6)
Canary leakage detection (L5)
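
The canary step at the end of the outbound path can be pictured with a small standalone sketch: a random per-session token is issued, and its appearance in outbound text signals leakage. `CanaryRegistry`, the token format, and the whitespace-collapsing match are assumptions for illustration, not the behavior of `guardllm.security.canary`.

```python
import secrets
from typing import Dict


class CanaryRegistry:
    """Issue one random canary per session and detect it in outbound text."""

    def __init__(self) -> None:
        self._by_session: Dict[str, str] = {}

    def issue(self, session_id: str) -> str:
        # Lowercase prefix plus 32 hex characters; the format is hypothetical.
        token = "canary-" + secrets.token_hex(16)
        self._by_session[session_id] = token
        return token

    def leaked(self, session_id: str, text: str) -> bool:
        token = self._by_session.get(session_id)
        if token is None:
            return False
        # Remove whitespace and case so trivial reformatting does not hide it.
        collapsed = "".join(text.split()).lower()
        return token in collapsed
```

Because the token is random and never shown to the user, any outbound message that contains it indicates that protected context was copied out.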

Threshold tuning:

L3 and L4 overlap thresholds are configurable per context via `PolicyConfig` (`dlp_verbatim_lcs_min`, `dlp_ngram_overlap_min`, `provenance_verbatim_lcs_min`, `provenance_ngram_overlap_min`).
Defaults preserve prior behavior: 100 (verbatim LCS) and 0.40 (n-gram overlap) for DLP, and 50 and 0.30 for provenance.
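
To show how a verbatim-run threshold and an n-gram overlap threshold might work together, here is a self-contained sketch. Several details are assumptions rather than documented behavior: LCS is read as longest common substring measured in characters, n-grams are word trigrams, overlap is the fraction of outbound n-grams found in the source, and either threshold being met counts as a hit. `OverlapThresholds` is a hypothetical stand-in, not `PolicyConfig`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverlapThresholds:
    verbatim_lcs_min: int = 100      # DLP default quoted above
    ngram_overlap_min: float = 0.40  # DLP default quoted above


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous run shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def ngram_overlap(outbound: str, source: str, n: int = 3) -> float:
    """Fraction of outbound word n-grams that also occur in source."""
    def grams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    out = grams(outbound)
    if not out:
        return 0.0
    return len(out & grams(source)) / len(out)


def exceeds(outbound: str, source: str,
            t: OverlapThresholds = OverlapThresholds()) -> bool:
    """True if either overlap measure reaches its threshold."""
    return (longest_common_substring(outbound, source) >= t.verbatim_lcs_min
            or ngram_overlap(outbound, source) >= t.ngram_overlap_min)
```

Passing `OverlapThresholds(50, 0.30)` reproduces the stricter provenance defaults in the same check.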

Unknown-Provenance Source Handling

guardllm supports explicit security contexts for:

MCP server responses (context_mcp_server)
MCP client requests (context_mcp_client)
Documents (context_document)
Web results (context_web)

You can also define custom SecurityContext values for sources like:

email_content
calendar_content
tool_output
rag_content

These source types integrate directly with source-gate and provenance behavior.
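
As a rough picture of how a source type could map to a source-gate decision, consider the sketch below. The three decisions (`allow`, `quarantine`, `block`) come from the layer table; everything else is hypothetical, including the `EXAMPLE_POLICY` assignments and the choice to block unknown sources, and none of it describes `check_extraction_allowed(...)`.

```python
from enum import Enum
from typing import Mapping


class Decision(Enum):
    ALLOW = "allow"
    QUARANTINE = "quarantine"
    BLOCK = "block"


# Purely illustrative assignments; a real deployment chooses its own.
EXAMPLE_POLICY: Mapping[str, Decision] = {
    "rag_content": Decision.ALLOW,
    "tool_output": Decision.QUARANTINE,
    "email_content": Decision.QUARANTINE,
    "calendar_content": Decision.QUARANTINE,
}


def extraction_decision(source_type: str,
                        policy: Mapping[str, Decision] = EXAMPLE_POLICY) -> Decision:
    """Look up the decision for a source type, failing closed when unknown."""
    return policy.get(source_type, Decision.BLOCK)
```

Failing closed keeps a newly introduced source type out of extraction until someone assigns it a policy explicitly.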

Default Security Posture

Trust defaults to UNTRUSTED.
Destructive tool calls are blocked unless explicitly enabled via PolicyConfig(enable_destructive=True).
Destructive tool calls require authorization events in client mode.
Request binding is optional but recommended for all write-capable actions.
OAuth/OIDC integration is supported via host-side scope-to-policy mapping (see docs/oauth_integration.md).
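
The destructive-tool rules above can be read as a short decision procedure. The sketch below is one interpretation, not the policy engine itself: `SketchPolicy`, `Session`, and the example tool names are hypothetical, and it assumes that authorization events are required only in client mode.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass(frozen=True)
class SketchPolicy:
    enable_destructive: bool = False  # mirrors the documented default
    client_mode: bool = True
    # Hypothetical tool names used only for this example.
    destructive_tools: FrozenSet[str] = frozenset({"delete_file", "send_email"})


@dataclass
class Session:
    authorized: Set[str] = field(default_factory=set)

    def authorize(self, tool: str) -> None:
        """Record an explicit authorization event for a tool."""
        self.authorized.add(tool)


def tool_allowed(tool: str, policy: SketchPolicy, session: Session) -> bool:
    if tool not in policy.destructive_tools:
        return True
    if not policy.enable_destructive:
        return False  # blocked by default
    if policy.client_mode and tool not in session.authorized:
        return False  # enabled, but no authorization event yet
    return True
```

Two independent conditions must hold before a destructive call proceeds in client mode: the policy opts in, and the session has recorded an authorization for that specific tool.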

Threats Covered

Hidden-instruction prompt injection in HTML/text payloads
Unicode obfuscation attacks (zero-width/bidi controls)
Exfiltration by copying untrusted spans into outbound content
Obfuscated exfiltration via reversed text or spelled-out characters (e.g. s-t-r-i-p-e)
Replay/deferred tool execution after conversation state changes
Over-privileged tool invocation and destructive action abuse
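
The obfuscated-exfiltration threat listed above (reversed text, spelled-out characters) can be illustrated with a small deobfuscation sketch. It is a simplified stand-in for whatever the DLP and provenance layers do internally: the helper names, the separator set, and the plain substring match are assumptions for the example.

```python
import re
from typing import List

# A run of single characters joined by one repeated separator, e.g. s-t-r-i-p-e.
_SPELLED = re.compile(r"\b[A-Za-z0-9]([-_. ])[A-Za-z0-9](?:\1[A-Za-z0-9])+\b")


def collapse_spelled_out(text: str) -> str:
    """Join spelled-out runs such as 's-t-r-i-p-e' into 'stripe'."""
    return _SPELLED.sub(lambda m: m.group(0).replace(m.group(1), ""), text)


def variants(text: str) -> List[str]:
    """The original text plus reversed and de-spelled views of it."""
    collapsed = collapse_spelled_out(text)
    return [text, text[::-1], collapsed, collapsed[::-1]]


def leaks(outbound: str, secret: str) -> bool:
    """True if the secret appears in any deobfuscated view of outbound."""
    needle = secret.lower()
    return any(needle in view.lower() for view in variants(outbound))
```

Checking every view against the same needle means a secret that would be caught verbatim is also caught when it is reversed or spelled out one character at a time.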

Operational Boundaries

guardllm is an application-layer hardening library. It does not replace:

network segmentation
host/container isolation
secret management systems
transport-layer authN/authZ
OAuth/OIDC token issuance, validation, and lifecycle management

Use guardllm as one layer in a full security architecture.
