Detect prompt injection attacks hidden in long content (emails, web pages, documents).
Powered by OpenGuardrails' state-of-the-art (SOTA) security detection capabilities.
GitHub: https://github.com/openguardrails/moltguard
npm: https://www.npmjs.com/package/@openguardrails/moltguard
OpenGuardrails achieves SOTA results across multilingual safety benchmarks, outperforming LlamaGuard, Qwen3Guard, and other leading guard models.
| Metric | Score | Comparison |
|---|---|---|
| English Prompt F1 | 87.1% | +2.8% vs next best |
| English Response F1 | 88.5% | +8.0% vs next best |
| Multilingual Prompt F1 | 97.3% | +12.3% vs next best |
| Multilingual Response F1 | 97.2% | +19.1% vs next best |
Core Capabilities:
- Unified LLM Architecture - Single 14B dense model quantized to 3.3B via GPTQ. Handles both content-safety and manipulation detection with superior semantic understanding.
- Configurable Policy Adaptation - Dynamic per-request policy with continuous sensitivity thresholds. Tune precision-recall trade-offs in real time via probabilistic logit-space control.
- 119 Languages - Robust multilingual coverage with SOTA results on English, Chinese, and cross-lingual benchmarks. Also contributes a 97k Chinese safety dataset.
- Production Efficiency - P95 latency of 274.6ms with high concurrency. GPTQ quantization enables real-time inference at enterprise scale without sacrificing accuracy.
Technical Paper: https://arxiv.org/abs/2510.19169
```text
Long Content (email/webpage/document)
        |
        v
+--------------+
|   Chunker    |  Split into 4000-char chunks with 200-char overlap
+--------------+
        |
        v
+--------------+
| LLM Analysis |  Analyze each chunk independently with full focus
|  (OG-Text)   |  "Is there a hidden prompt injection in this content?"
+--------------+
        |
        v
+--------------+
|   Verdict    |  Aggregate findings from all chunks -> isInjection: true/false
+--------------+
```
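The pipeline above can be sketched in TypeScript roughly as follows. This is a minimal illustration, not MoltGuard's source: it assumes chunks are cut at plain character offsets, that any flagged chunk marks the whole input as an injection (an OR over chunks), and that chunks are analyzed in parallel. `chunkText`, `aggregate`, `detect`, and the `Analyzer` callback (a stand-in for the OG-Text model call) are hypothetical names.

```typescript
// Illustrative sketch only; not MoltGuard's actual implementation.

interface ChunkFinding {
  chunkIndex: number; // 0-based position of the chunk in the input
  isInjection: boolean;
  reason?: string;
}

interface Verdict {
  isInjection: boolean;
  reasons: string[];
}

// Hypothetical stand-in for the OG-Text model call on a single chunk.
type Analyzer = (chunk: string) => Promise<{ isInjection: boolean; reason?: string }>;

// Split text into windows of at most maxChunkSize characters; consecutive
// windows share overlapSize characters. With the default sizes, a payload of up
// to overlapSize characters cut at one window's end appears whole in the next.
function chunkText(text: string, maxChunkSize = 4000, overlapSize = 200): string[] {
  if (overlapSize < 0 || overlapSize >= maxChunkSize) {
    throw new Error("overlapSize must be in [0, maxChunkSize)");
  }
  const chunks: string[] = [];
  const step = maxChunkSize - overlapSize;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChunkSize));
    if (start + maxChunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Assumed aggregation rule: the content is an injection if any chunk is flagged.
function aggregate(findings: ChunkFinding[]): Verdict {
  const flagged = findings.filter((f) => f.isInjection);
  return {
    isInjection: flagged.length > 0,
    reasons: flagged.map((f) => `Chunk ${f.chunkIndex + 1}: ${f.reason ?? "no reason given"}`),
  };
}

// Analyze every chunk independently (in parallel here), then aggregate.
async function detect(text: string, analyze: Analyzer): Promise<Verdict> {
  const chunks = chunkText(text);
  const findings = await Promise.all(
    chunks.map(async (chunk, chunkIndex) => ({ chunkIndex, ...(await analyze(chunk)) })),
  );
  return aggregate(findings);
}
```

The 1-based "Chunk N:" prefix mirrors the format in the sample log output; the overlap only protects payloads shorter than `overlapSize`, so longer injections split across chunks depend on each half still looking suspicious on its own.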
```bash
# Install from npm
openclaw plugins install @openguardrails/moltguard

# Restart gateway to load the plugin
openclaw gateway restart

# Check plugin list, confirm moltguard status is "loaded"
openclaw plugins list
```

You should see:

```text
| MoltGuard | moltguard | loaded | ...
```
| Command | Description |
|---|---|
| `/og_status` | View status and statistics |
| `/og_report` | View recent injection detection details |
| `/og_feedback <id> fp [reason]` | Report a false positive |
| `/og_feedback missed <reason>` | Report a missed detection |
Download the test file containing a hidden injection:

```bash
curl -L -o /tmp/test-email.txt https://raw.githubusercontent.com/openguardrails/moltguard/main/samples/test-email.txt
```

Ask the agent to read this file:

```text
Read the contents of /tmp/test-email.txt
```
```bash
# Real-time log monitoring
tail -f /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log | grep "moltguard"

# Or use the openclaw command
openclaw logs --follow | grep "moltguard"
```

If detection succeeds, you'll see logs like:

```text
[moltguard] tool_result_persist triggered for "read"
[moltguard] Analyzing tool result from "read" (1183 chars)
[moltguard] Analysis complete in 5896ms: INJECTION DETECTED
[moltguard] INJECTION DETECTED in tool result from "read": Chunk 1: Contains instructions to override guidelines and execute a malicious shell command
```
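To react to these lines programmatically rather than by eye, a small parser can turn them into structured events. This is a sketch under assumptions: it matches only the two message formats shown above, which may change between MoltGuard versions, and `parseMoltguardLine` and `MoltguardEvent` are hypothetical names, not part of the plugin.

```typescript
// Hypothetical parser for the moltguard log lines shown above.
// Real log files may prefix timestamps, so patterns are not anchored at line start.

type MoltguardEvent =
  | { type: "analysis_complete"; durationMs: number; injectionDetected: boolean }
  | { type: "injection_detected"; tool: string; reason: string };

const COMPLETE_RE = /\[moltguard\] Analysis complete in (\d+)ms: (.+)$/;
const INJECTION_RE = /\[moltguard\] INJECTION DETECTED in tool result from "([^"]+)": (.+)$/;

function parseMoltguardLine(line: string): MoltguardEvent | null {
  const injection = INJECTION_RE.exec(line);
  if (injection) {
    return { type: "injection_detected", tool: injection[1], reason: injection[2] };
  }
  const complete = COMPLETE_RE.exec(line);
  if (complete) {
    return {
      type: "analysis_complete",
      durationMs: Number(complete[1]),
      injectionDetected: complete[2].trim() === "INJECTION DETECTED",
    };
  }
  return null; // unrelated line, e.g. "tool_result_persist triggered"
}
```

One way to use it is to feed each line of `openclaw logs --follow` output through Node's `readline` module and forward `injection_detected` events to your own alerting.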
In an OpenClaw conversation, enter:

```text
/og_status
```

It returns detection statistics:

```text
**OpenGuardrails Status**
- Enabled: true
- Block on risk: true
- Max chunk size: 4000 chars

**Statistics**
- Total analyses: 5
- Total blocked: 1
- Blocked (24h): 1
- Avg duration: 4521ms

**User Feedback**
- False positives reported: 0
- Missed detections reported: 0

**Recent Analyses**
- 2025-02-05T14:30:19: tool_result (1183 chars) - DETECTED
```
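The statistics block can be read as simple aggregates over a log of past analyses. The sketch below assumes a hypothetical `AnalysisRecord` shape (timestamp, duration, whether the call was blocked) and a rolling 24-hour window for "Blocked (24h)"; MoltGuard's internal bookkeeping may differ, for example in how the average is rounded.

```typescript
// Hypothetical record of one analysis; not MoltGuard's internal type.
interface AnalysisRecord {
  timestamp: Date;
  durationMs: number;
  blocked: boolean;
}

interface StatusStats {
  totalAnalyses: number;
  totalBlocked: number;
  blocked24h: number;
  avgDurationMs: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function computeStats(records: AnalysisRecord[], now: Date = new Date()): StatusStats {
  const windowStart = now.getTime() - DAY_MS; // rolling 24-hour window
  const blocked = records.filter((r) => r.blocked);
  const totalDuration = records.reduce((sum, r) => sum + r.durationMs, 0);
  return {
    totalAnalyses: records.length,
    totalBlocked: blocked.length,
    blocked24h: blocked.filter((r) => r.timestamp.getTime() >= windowStart).length,
    // Rounded to whole milliseconds; 0 when there is nothing to average.
    avgDurationMs: records.length > 0 ? Math.round(totalDuration / records.length) : 0,
  };
}
```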
```text
/og_report
```

Shows recent injection detection details:

```text
**Recent Prompt Injection Detections**

**#1** - 2025-02-05T14:30:19
- Status: DETECTED
- Type: tool_result (1183 chars)
- Reason: Contains instructions to override guidelines and execute a malicious shell command
- Suspicious: "<Here comes prompt injection content>..."

Use `/og_feedback <id> fp` to report false positive
Use `/og_feedback missed <reason>` to report missed detection
```
If you encounter a false positive:

```text
/og_feedback 1 fp This is normal security documentation
```

If you encounter a missed detection:

```text
/og_feedback missed Email contained hidden injection that wasn't detected
```

Feedback is recorded for continuous improvement.
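The two `/og_feedback` forms take different arguments: a false-positive report needs a detection ID (the `#1` shown by `/og_report`) and takes an optional reason, while a missed-detection report requires a reason. A hypothetical parser makes that grammar explicit; `parseFeedback` and the `Feedback` type are illustrations rather than the plugin's command handler, and rejecting non-positive or non-integer IDs is an assumption.

```typescript
// Hypothetical parser for the /og_feedback syntax; not the plugin's handler.
type Feedback =
  | { kind: "false_positive"; id: number; reason?: string }
  | { kind: "missed"; reason: string };

function parseFeedback(input: string): Feedback | null {
  // Split on runs of whitespace; internal spacing in the reason is collapsed.
  const parts = input.trim().split(/\s+/);
  if (parts[0] !== "/og_feedback" || parts.length < 2) return null;

  // /og_feedback missed <reason>   (reason is required)
  if (parts[1] === "missed") {
    const reason = parts.slice(2).join(" ");
    return reason.length > 0 ? { kind: "missed", reason } : null;
  }

  // /og_feedback <id> fp [reason]  (reason is optional)
  const id = Number(parts[1]);
  if (!Number.isInteger(id) || id < 1 || parts[2] !== "fp") return null;
  const reason = parts.slice(3).join(" ");
  return reason.length > 0 ? { kind: "false_positive", id, reason } : { kind: "false_positive", id };
}
```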
When injection attacks are detected, warnings are logged immediately. You can get real-time notifications in two ways:

Option 1: Monitor Logs

```bash
# Real-time monitoring with alert filtering
tail -f /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log | grep "INJECTION DETECTED"
```

Option 2: Configure Webhook (Advanced)
Configure hooks in `~/.openclaw/openclaw.json` to forward alerts to Slack, Discord, or similar services:

```json
{
  "hooks": {
    "og-alert": {
      "url": "https://your-webhook-url.com/alert",
      "events": ["plugin:moltguard:injection-detected"]
    }
  }
}
```

You can set up scheduled tasks to have OpenClaw automatically report detection status:
In an OpenClaw conversation, enter:

```text
/cron add --name "OG-Daily-Report" --every 24h --message "/og_report"
```

This automatically executes `/og_report` every 24 hours and sends the detection report.

Other scheduling options:
- `--every 1h` - Every hour
- `--every 7d` - Every week
- `--cron "0 9 * * *"` - Every day at 9 AM (cron expression)
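The `--every` values are plain durations. As an illustration only, here is a hypothetical helper that converts the forms shown above (a number followed by `h` or `d`) into milliseconds and computes a job's next run time. `parseEvery` and `nextRun` are not OpenClaw APIs; OpenClaw's scheduler may accept other units and formats, and `--cron` expressions are not handled here.

```typescript
// Hypothetical helper; OpenClaw's scheduler may parse intervals differently.
const UNIT_MS: Record<string, number> = {
  h: 60 * 60 * 1000, // hours
  d: 24 * 60 * 60 * 1000, // days
};

// Convert an interval such as "1h", "24h" or "7d" to milliseconds.
function parseEvery(spec: string): number {
  const match = /^(\d+)([hd])$/.exec(spec.trim());
  if (!match) {
    throw new Error(`Unsupported interval: ${spec}`);
  }
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Next run time for a fixed-interval job, given when it last ran.
function nextRun(lastRun: Date, spec: string): Date {
  return new Date(lastRun.getTime() + parseEvery(spec));
}
```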
View scheduled tasks:

```text
/cron list
```

Remove a scheduled task:

```text
/cron remove OG-Daily-Report
```
Edit the OpenClaw config file (`~/.openclaw/openclaw.json`):

```json
{
  "plugins": {
    "entries": {
      "moltguard": {
        "enabled": true,
        "config": {
          "blockOnRisk": true,
          "maxChunkSize": 4000,
          "overlapSize": 200,
          "timeoutMs": 60000
        }
      }
    }
  }
}
```

| Option | Default | Description |
|---|---|---|
| `enabled` | `true` | Enable/disable the plugin |
| `blockOnRisk` | `true` | Block tool calls when an injection is detected |
| `maxChunkSize` | `4000` | Maximum characters per chunk |
| `overlapSize` | `200` | Overlap characters between chunks |
| `timeoutMs` | `60000` | Analysis timeout in milliseconds |
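The options above can be modeled as a typed object with defaults. This sketch is illustrative: `MoltguardOptions`, `resolveOptions`, `chunkCount`, and the validation rules (overlap smaller than the chunk size, positive timeout) are hypothetical and not taken from the plugin's source. `chunkCount` shows how `maxChunkSize` and `overlapSize` determine the number of chunks, and therefore model calls, for a given input length, assuming a plain sliding-window chunker.

```typescript
// Hypothetical typed view of the "config" object for the moltguard entry.
// Note: "enabled" sits next to "config" in openclaw.json, not inside it.
interface MoltguardOptions {
  blockOnRisk: boolean;
  maxChunkSize: number; // characters
  overlapSize: number; // characters
  timeoutMs: number; // milliseconds
}

const DEFAULT_OPTIONS: MoltguardOptions = {
  blockOnRisk: true,
  maxChunkSize: 4000,
  overlapSize: 200,
  timeoutMs: 60000,
};

// Merge user settings over the documented defaults and apply sanity checks
// (the checks are assumptions, not documented plugin behavior).
function resolveOptions(user: Partial<MoltguardOptions> = {}): MoltguardOptions {
  const opts = { ...DEFAULT_OPTIONS, ...user };
  if (!(opts.maxChunkSize > 0)) throw new Error("maxChunkSize must be positive");
  if (opts.overlapSize < 0 || opts.overlapSize >= opts.maxChunkSize) {
    throw new Error("overlapSize must be in [0, maxChunkSize)");
  }
  if (!(opts.timeoutMs > 0)) throw new Error("timeoutMs must be positive");
  return opts;
}

// Number of chunks (model calls) for an input of `length` characters,
// assuming a window that advances by maxChunkSize - overlapSize each step.
function chunkCount(length: number, opts: MoltguardOptions): number {
  if (length <= 0) return 0;
  if (length <= opts.maxChunkSize) return 1;
  const step = opts.maxChunkSize - opts.overlapSize;
  return 1 + Math.ceil((length - opts.maxChunkSize) / step);
}
```

Under this rule, the defaults split a 10,000-character page into 3 chunks, while the 1183-character sample email fits in a single chunk. Larger overlaps cost extra chunks for the same input.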
```bash
# Uninstall the plugin
openclaw plugins uninstall @openguardrails/moltguard
openclaw gateway restart
```

```bash
# Clone repository
git clone https://github.com/openguardrails/moltguard.git
cd moltguard

# Install dependencies
npm install

# Local development install
openclaw plugins install -l .
openclaw gateway restart

# Type check
npm run typecheck

# Run tests
npm test
```

License: MIT