This directory contains security hardening scripts for the cagent-action GitHub Action.
This action includes built-in security features for all agent executions:
- Authorization Check - Users are verified for comment-triggered events:
  - Only `OWNER`, `MEMBER`, and `COLLABORATOR` roles can trigger via comments (e.g., `/review`)
  - External contributors (`CONTRIBUTOR`, `FIRST_TIME_CONTRIBUTOR`, `NONE`) are blocked
  - The check is skipped for non-comment events (PR triggers, scheduled jobs, `workflow_dispatch`)
  - Comment-triggered actions are the main abuse vector; this check protects against cost/spam attacks
- Output Scanning - All agent responses are scanned for leaked secrets:
  - API key patterns: `sk-ant-*`, `sk-*`, `sk-proj-*`
  - GitHub tokens: `ghp_*`, `gho_*`, `ghu_*`, `ghs_*`, `github_pat_*`
  - Environment variable names in output
  - If secrets detected: workflow fails, security issue created
- Prompt Sanitization - User prompts are checked in three tiers:
  - Critical patterns (block): Direct secret exfiltration commands (`echo $API_KEY`, `console.log(process.env)`)
  - Suspicious patterns (strip + warn): Behavioral/natural language injection ("ignore previous instructions", "base64 decode", etc.); matching lines are stripped from the prompt before it reaches the agent
  - Medium-risk patterns (warn): API key variable names in configuration
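
As a rough illustration of the authorization rule listed above, the role check could look like the sketch below. The function name `is_authorized`, its two arguments, and the choice of `issue_comment` and `pull_request_review_comment` as the comment-triggered events are assumptions made for this example; the real `check-auth.sh` may be structured differently.

```bash
#!/usr/bin/env bash
# Sketch only: is_authorized is a hypothetical helper, not the real check-auth.sh.

# is_authorized EVENT_NAME AUTHOR_ASSOCIATION
# Returns 0 when the run may proceed, 1 when it must be blocked.
is_authorized() {
  local event_name="$1" association="$2"

  # Non-comment events (PR triggers, schedules, workflow_dispatch) skip the check.
  # Assumption: these two event names are the comment-triggered ones.
  case "$event_name" in
    issue_comment|pull_request_review_comment) ;;
    *) return 0 ;;
  esac

  # Only trusted roles may trigger the agent from a comment.
  case "$association" in
    OWNER|MEMBER|COLLABORATOR) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: a first-time contributor commenting /review is rejected.
if is_authorized "issue_comment" "FIRST_TIME_CONTRIBUTOR"; then
  echo "authorized"
else
  echo "blocked"
fi
```

Using an allowlist means any role not named explicitly is rejected by default.
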
The action implements a defense-in-depth approach:
┌─────────────────────────────────────────────────────────────┐
│ 1. Authorization Check (check-auth.sh) │
│ ✓ Verify user's author_association role │
│ ✓ Block external contributors by default │
│ ✓ Only OWNER, MEMBER, COLLABORATOR allowed │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ 2. Prompt Sanitization │
│ ✓ Detect prompt injection attempts │
│ ✓ Warn about suspicious patterns │
│ ✓ Check for encoded malicious content │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ 3. Agent Execution │
│ ✓ User-provided agent runs in isolated cagent runtime │
│ ✓ No direct access to secrets or environment vars │
│ ✓ Controlled execution environment │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ 4. Output Scanning │
│ ✓ Scan for leaked API keys (Anthropic, OpenAI, etc.) │
│ ✓ Scan for leaked tokens (GitHub PAT, OAuth, etc.) │
│ ✓ Block execution if secrets found │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ 5. Incident Response │
│ ✓ Create security issue with details │
│ ✓ Fail workflow with clear error │
│ ✓ Prevent secret exposure in logs │
└─────────────────────────────────────────────────────────────┘
`secret-patterns.sh` is the central source of truth for secret detection patterns. This file is sourced by:
- `sanitize-output.sh` - Uses the `SECRET_PATTERNS` array for comprehensive regex matching
- `action.yml` (Build safe prompt step) - Uses `SECRET_PATTERNS` for prompt verification
Why shared patterns?
- DRY principle: Single source of truth prevents drift
- Consistency: Same patterns across all security layers
- Maintainability: Update patterns in one place
Secret patterns detected:
SECRET_PATTERNS=(
'sk-ant-[a-zA-Z0-9_-]{30,}' # Anthropic API keys
'ghp_[a-zA-Z0-9]{36}' # GitHub personal access tokens
'gho_[a-zA-Z0-9]{36}' # GitHub OAuth tokens
'ghu_[a-zA-Z0-9]{36}' # GitHub user tokens
'ghs_[a-zA-Z0-9]{36}' # GitHub server tokens
'github_pat_[a-zA-Z0-9_]+' # GitHub fine-grained tokens
'sk-[a-zA-Z0-9]{48}' # OpenAI API keys
'sk-proj-[a-zA-Z0-9]{48}' # OpenAI project keys
)

sanitize-output.sh:

Purpose: Output scanning for leaked secrets
Function: Last line of defense; scans AI responses for leaked API keys/tokens
Patterns: Sources from `secret-patterns.sh` for comprehensive detection
Usage:
./sanitize-output.sh output-file.txt

Outputs:
- `leaked=true/false` to `$GITHUB_OUTPUT`
- Exits with code 1 if secrets detected
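
To make the output-scanning behavior concrete, here is a minimal sketch of a scan loop over `SECRET_PATTERNS`. It is not the actual `sanitize-output.sh`: the function name `scan_output` is hypothetical, and a subset of the patterns is inlined (rather than sourced from `secret-patterns.sh`) so the example is self-contained.

```bash
#!/usr/bin/env bash
# Sketch only: a subset of SECRET_PATTERNS is inlined here; the real script
# sources secret-patterns.sh instead.
SECRET_PATTERNS=(
  'sk-ant-[a-zA-Z0-9_-]{30,}'  # Anthropic API keys
  'ghp_[a-zA-Z0-9]{36}'        # GitHub personal access tokens
  'github_pat_[a-zA-Z0-9_]+'   # GitHub fine-grained tokens
)

# scan_output FILE (hypothetical helper)
# Appends leaked=true/false to $GITHUB_OUTPUT and returns 1 when a secret matches.
scan_output() {
  local file="$1" pattern leaked=false
  for pattern in "${SECRET_PATTERNS[@]}"; do
    # -E: extended regex, -q: quiet, -e: pass the pattern explicitly
    if grep -Eq -e "$pattern" -- "$file"; then
      leaked=true
      break
    fi
  done
  # Fall back to /dev/null when run outside GitHub Actions.
  echo "leaked=$leaked" >> "${GITHUB_OUTPUT:-/dev/null}"
  [ "$leaked" = false ]
}
```

A caller would then fail the step on a non-zero status, for example `scan_output output-file.txt || exit 1`.
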
sanitize-input.sh:

Purpose: Input sanitization for PR diffs and user prompts
Function:
- Removes code comments from diffs (prevents hidden instructions)
- Detects CRITICAL patterns (blocks execution with exit 1)
  - Direct secret extraction commands (`echo $API_KEY`, `console.log(process.env)`)
  - Environment variable extraction (`printenv ANTHROPIC_API_KEY`)
  - Secret file access (`cat .env`)
- Detects SUSPICIOUS patterns (strips matching lines from output, warns, exit 0)
  - Instruction override attempts ("ignore previous instructions")
  - System/mode overrides ("system mode", "debug mode")
  - Natural language secret requests ("show me the API key")
  - System prompt extraction attempts
  - Jailbreak attempts
  - Encoding/obfuscation (base64, hex)
- Detects MEDIUM-RISK patterns (warns but allows execution)
  - API key variable names in configuration
Usage:
./sanitize-input.sh input-file.txt output-file.txt

Outputs:
- `blocked=true/false` to `$GITHUB_OUTPUT` (true only for CRITICAL patterns)
- `stripped=true/false` to `$GITHUB_OUTPUT` (true when suspicious content was removed)
- `risk-level=low/medium/high` to `$GITHUB_OUTPUT`
- Exits with code 1 only for CRITICAL patterns (direct secret exfiltration)
- Removes all code comments before analysis (prevents hidden instructions)
- Strips patterns like "ignore previous instructions" and "show me the API key" (with a warning, without blocking)
- Detects encoded requests (base64, hex, ROT13)
- Scans for API key patterns with specific lengths and formats
- Checks for environment variable names in output
- Blocks execution if any secrets detected
- Creates security incident issues automatically
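
The block-versus-strip handling of prompt input described above can be sketched as follows. This is an illustration under stated assumptions, not the real `sanitize-input.sh`: the function name `sanitize_input` is hypothetical, both regexes are simplified stand-ins for the real rule set, and the comment-removal and `risk-level` steps are omitted.

```bash
#!/usr/bin/env bash
# Sketch only: these regexes are illustrative stand-ins, not the real rules.
CRITICAL_RE='(echo[[:space:]]+\$[A-Z_]*API_KEY|console\.log\(process\.env|printenv[[:space:]]+[A-Z_]*API_KEY|cat[[:space:]]+\.env)'
SUSPICIOUS_RE='(ignore previous instructions|system mode|debug mode|show me the api key|base64)'

# sanitize_input INPUT_FILE OUTPUT_FILE (hypothetical helper)
# Returns 1 (blocked) on a critical match; otherwise writes the input minus
# any suspicious lines to OUTPUT_FILE and returns 0.
sanitize_input() {
  local input="$1" output="$2" blocked=false stripped=false

  if grep -Eq -e "$CRITICAL_RE" -- "$input"; then
    # Critical tier: refuse to pass anything to the agent.
    blocked=true
  elif grep -Eiq -e "$SUSPICIOUS_RE" -- "$input"; then
    # Suspicious tier: warn and keep only the lines that do not match.
    stripped=true
    echo "Warning: suspicious prompt content was stripped" >&2
    grep -Eiv -e "$SUSPICIOUS_RE" -- "$input" > "$output"
  else
    cp -- "$input" "$output"
  fi

  {
    echo "blocked=$blocked"
    echo "stripped=$stripped"
  } >> "${GITHUB_OUTPUT:-/dev/null}"
  [ "$blocked" = false ]
}
```

Stripping instead of blocking lets legitimate content pass through when only part of the input looks like an injection attempt.
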
cd tests
# Run security test suite (21 tests)
./test-security.sh
# Run exploit simulation tests (6 tests)
./test-exploits.sh

test-security.sh (21 tests):
- Clean input (should pass)
- Prompt injection in comment (should strip, not block)
- Clean output (should pass)
- Leaked API key (should block)
- Leaked GitHub token (should block)
- Authorization - OWNER (should pass)
- Authorization - COLLABORATOR (should pass)
- Authorization - CONTRIBUTOR (should block)
- Clean prompt (should pass)
- Prompt injection in user prompt (should strip, not block)
- Encoded content in prompt (should strip, not block)
- Low risk input - normal code (should pass)
- Medium risk input - API key variable (should warn but pass)
- Critical input - secret exfiltration command (should block)
- Regex pattern in output (should NOT flag as leak)
- Real GitHub server token (should flag as leak)
- Release notes with 'system...models' (should NOT block)
- Real 'system mode' injection (should strip, not block)
- Verify suspicious content physically removed from output file
- Critical pattern (`echo $ANTHROPIC_API_KEY`) still blocks with exit 1
- Mixed suspicious + clean content preserves clean parts
test-exploits.sh (6 tests):
- Prompt injection via comment (should be stripped)
- High-risk behavioral injection (should be blocked)
- Output token leak (should be blocked)
- Prompt override attempt (should warn)
- Extra args parsing sanity check
- Quoted arguments handling
All tests must pass before deployment.
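
Most of the cases listed above boil down to "run a script and check its exit code". A minimal sketch of that pattern is shown below; `expect_exit` is a hypothetical helper invented for this example, and `true`/`false` stand in for the real script invocations.

```bash
#!/usr/bin/env bash
# Sketch only: expect_exit is a hypothetical helper, not taken from the suite.
PASSED=0
FAILED=0

# expect_exit NAME EXPECTED_CODE COMMAND [ARGS...]
# Runs COMMAND and records a pass when its exit code equals EXPECTED_CODE.
expect_exit() {
  local name="$1" expected="$2" actual=0
  shift 2
  "$@" > /dev/null 2>&1 || actual=$?
  if [ "$actual" -eq "$expected" ]; then
    PASSED=$((PASSED + 1))
    echo "PASS: $name"
  else
    FAILED=$((FAILED + 1))
    echo "FAIL: $name (expected exit $expected, got $actual)"
  fi
}

# Stand-in commands; a real case would invoke the sanitize scripts instead.
expect_exit "Clean input (should pass)" 0 true
expect_exit "Leaked API key (should block)" 1 false
echo "passed=$PASSED failed=$FAILED"
```
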
- name: Run Agent
  id: agent
  uses: docker/cagent-action@latest
  with:
    agent: my-agent
    prompt: "Analyze the logs"
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
- name: Check for security issues
  if: always()
  run: |
    if [ "${{ steps.agent.outputs.secrets-detected }}" == "true" ]; then
      echo "⚠️ Secret leak detected - incident issue created"
    fi
    if [ "${{ steps.agent.outputs.prompt-suspicious }}" == "true" ]; then
      echo "⚠️ Prompt had suspicious patterns"
    fi

All executions automatically include:
- Prompt sanitization warnings
- Output scanning for secrets
- Incident issue creation if secrets detected
- Workflow failure on security violations
When adding new secret patterns:
- Update `secret-patterns.sh` with the new regex pattern:
  SECRET_PATTERNS=(
    # ... existing patterns ...
    'new-provider-[a-zA-Z0-9]{40}' # New provider API keys
  )
- Add to `SECRET_PREFIXES` if needed for quick checks:
  SECRET_PREFIXES='(sk-ant-|...|new-provider-)'
- Run tests to verify:
  cd tests
  ./test-security.sh
  ./test-exploits.sh
- Consider adding a specific test case for the new pattern in `test-security.sh`
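
Before wiring a new pattern into the suite, it can help to check it by hand against a fake key. The sketch below does that for the placeholder `new-provider-` pattern from the steps above; `matches_pattern` is a hypothetical helper, and the fake key is generated so that no real credential appears in the example.

```bash
#!/usr/bin/env bash
# Sketch only: 'new-provider-' is the placeholder prefix from the steps above.
NEW_PATTERN='new-provider-[a-zA-Z0-9]{40}'

# matches_pattern PATTERN TEXT (hypothetical helper)
# Returns 0 when TEXT contains a match for PATTERN.
matches_pattern() {
  printf '%s\n' "$2" | grep -Eq -e "$1"
}

# Build a fake key: the prefix followed by 40 zeros.
fake_key="new-provider-$(printf '%040d' 0)"

# A full-length key should match.
if matches_pattern "$NEW_PATTERN" "token=$fake_key"; then
  echo "match: full-length key"
fi

# A truncated key should not match.
if ! matches_pattern "$NEW_PATTERN" "new-provider-short"; then
  echo "no match: truncated key"
fi

# The regex source text itself should not be flagged as a leak.
if ! matches_pattern "$NEW_PATTERN" "$NEW_PATTERN"; then
  echo "no match: regex text"
fi
```

The last check mirrors the "Regex pattern in output (should NOT flag as leak)" case: documentation that quotes the pattern should not trip the scanner.
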
Before deploying changes:
- All security tests pass (`test-security.sh`)
- All exploit tests pass (`test-exploits.sh`)
- Shared patterns are used consistently
- New patterns added to `secret-patterns.sh` only
- No hardcoded secrets in code
- Authorization checks cannot be bypassed
- Output scanning covers all execution paths
The action provides security-related outputs that can be checked in subsequent steps:
| Output | Description |
|---|---|
| `secrets-detected` | `true` if secrets were detected in output |
| `prompt-suspicious` | `true` if suspicious patterns were detected in prompt |
If you discover a security vulnerability, please:
- Do NOT open a public issue
- Email security concerns to the maintainers
- Provide detailed information about the vulnerability
- Allow time for a fix before public disclosure