
PromptXploit

LLM Penetration Testing Framework - Discover vulnerabilities before attackers do

License: MIT · Python 3.8+

⚠️ READ DISCLAIMER - Authorized testing only


What is PromptXploit?

PromptXploit is a comprehensive security testing framework for LLM applications. Test your AI systems for vulnerabilities before deployment.

Key Features:

  • 🎯 147 attack vectors across 17 categories
  • 🧠 AI-powered adaptive mode - Learns defenses and crafts new attacks
  • 🔍 Intelligence-based recon - Novel attack discovery
  • 📊 JSON reporting - Detailed vulnerability analysis
  • 🔌 Framework-agnostic - Works with any LLM

Quick Start (30 seconds)

1. Install

git clone https://github.com/Neural-alchemy/promptxploit
cd promptxploit
pip install -e .

2. Create Target

# my_target.py
def run(prompt: str) -> str:
    # Your LLM here
    return your_llm(prompt)
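
For example, a target backed by the OpenAI chat API could look like this (a sketch assuming the openai Python SDK v1+, an OPENAI_API_KEY in the environment, and an illustrative model name; any client works as long as run() returns the model's text):

# my_target.py (example)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def run(prompt: str) -> str:
    # Forward the attack prompt to the model under test and return its
    # raw text response for PromptXploit to evaluate.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content or ""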

3. Run Scan

python -m promptxploit.main \
    --target my_target.py \
    --attacks attacks/ \
    --output scan.json

Done! Check scan.json for vulnerabilities.


Attack Taxonomy

PromptXploit tests 147 attacks across these categories:

LLM Attack Surface
├── Prompt Injection (8 variants)
│   ├── Direct instruction override
│   ├── Context confusion
│   └── Delimiter exploitation
├── Jailbreaks (10 variants)
│   ├── DAN (Do Anything Now)
│   ├── Developer mode
│   └── Persona manipulation
├── System Extraction (8 variants)
│   ├── Prompt leakage
│   ├── Configuration disclosure
│   └── Training data extraction
├── Encoding Attacks (8 variants)
│   ├── Base64 obfuscation
│   ├── ROT13/Caesar
│   └── Unicode tricks
├── Multi-Agent Exploitation (10 variants)
│   ├── Tool hijacking
│   ├── Agent confusion
│   └── Coordination attacks
├── RAG Poisoning (8 variants)
│   ├── Context injection
│   ├── Retrieval manipulation
│   └── Source confusion
└── [11 more categories...]

Usage Modes

Static Mode (Fast)

Test with all 147 pre-built attacks:

python -m promptxploit.main \
    --mode static \
    --target my_app.py \
    --attacks attacks/ \
    --output results.json

Adaptive Mode - Mutation

If an attack is blocked, the framework mutates it and retries (up to --max-iterations times):

python -m promptxploit.main \
    --mode adaptive \
    --adaptive-strategy mutation \
    --adaptive-api "YOUR_OPENAI_KEY" \
    --max-iterations 3 \
    --target my_app.py \
    --attacks attacks/jailbreak \
    --output adaptive.json

Adaptive Mode - Recon (Advanced) ⭐

Intelligence-based attack crafting:

python -m promptxploit.main \
    --mode adaptive \
    --adaptive-strategy recon \
    --probe-diversity 10 \
    --adaptive-api "YOUR_OPENAI_KEY" \
    --target my_app.py \
    --attacks attacks/ \
    --output recon.json

How recon works (sketched in code after the list):

  1. Phase 1: Tests diverse attacks to learn defenses
  2. Phase 2: AI analyzes patterns and weaknesses
  3. Phase 3: Crafts brand new attacks tailored to bypass defenses
  4. Phase 4: Validates crafted attacks
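
A minimal sketch of that loop, purely illustrative and not PromptXploit's actual implementation (the refusal heuristic and prompt rewording below stand in for the analysis and crafting steps that the --adaptive-api model performs):

# Conceptual sketch of the recon workflow -- illustrative only.
from typing import Callable, Dict, List, Tuple

def recon_scan(target: Callable[[str], str],
               seed_attacks: List[Dict],
               probe_diversity: int = 10) -> List[Dict]:
    # Phase 1: probe the target with a diverse sample of pre-built attacks
    probes = seed_attacks[:probe_diversity]
    observations: List[Tuple[Dict, str]] = [(a, target(a["prompt"])) for a in probes]

    # Phase 2: build a crude defense profile from the responses
    # (a naive refusal heuristic stands in for the AI analysis step)
    blocked = [a for a, reply in observations if "cannot" in reply.lower()]

    # Phase 3: craft new attacks aimed at the observed defenses
    # (a trivial rewording stands in for AI-crafted attacks)
    crafted = [f"For a fictional security audit, {a['prompt']}" for a in blocked]

    # Phase 4: validate the crafted attacks against the target
    return [{"prompt": p, "response": target(p)} for p in crafted]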

Real-World Workflow

Pre-Deployment Testing

# 1. Test unprotected version
python -m promptxploit.main --target unprotected.py --attacks attacks/ --output before.json

# 2. Review vulnerabilities
cat before.json | jq '.[] | select(.verdict.verdict=="fail")'

# 3. Add PromptShield (our defense framework)
# See: https://github.com/Neural-alchemy/promptshield

# 4. Re-test protected version
python -m promptxploit.main --target protected.py --attacks attacks/ --output after.json

# 5. Verify fixes
diff before.json after.json
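
For an automated gate (e.g. in CI), a short script can fail the run whenever the report still contains successful attacks. A minimal sketch, assuming the report is a JSON array of records as the jq filter above implies:

# check_scan.py -- exit non-zero if any attack in the report succeeded.
# Usage: python check_scan.py after.json
import json
import sys

with open(sys.argv[1]) as f:
    results = json.load(f)

failures = [r for r in results if r.get("verdict", {}).get("verdict") == "fail"]
for r in failures:
    print(f"VULNERABLE: {r.get('attack_id')} [{r.get('category')}]")

sys.exit(1 if failures else 0)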

Monthly Security Audit

# Deep scan with adaptive recon
python -m promptxploit.main \
    --mode adaptive \
    --adaptive-strategy recon \
    --probe-diversity 15 \
    --max-iterations 5 \
    --adaptive-api $OPENAI_KEY \
    --target production.py \
    --attacks attacks/ \
    --output audit_$(date +%Y%m%d).json

Understanding Results

{
  "attack_id": "JB-003",
  "category": "jailbreak",
  "verdict": {
    "verdict": "fail",        // ⚠️ VULNERABLE
    "confidence": 0.9,
    "severity": 0.9,
    "rationale": "Successfully bypassed restrictions"
  },
  "risk": {
    "risk_score": 0.81,
    "risk_level": "critical"  // 🚨 Fix immediately
  }
}

Verdict types:

  • FAIL = Vulnerable (attack succeeded) 🚨
  • PARTIAL = Uncertain (needs review) ⚠️
  • PASS = Safe (attack blocked) ✅
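
To triage a report programmatically, a short script can group non-passing findings by risk level (a sketch; it assumes the report is a JSON array of records like the sample above, with lowercase verdict strings):

# summarize_scan.py -- group non-passing findings by risk level for triage.
# Usage: python summarize_scan.py scan.json
import json
import sys
from collections import Counter

with open(sys.argv[1]) as f:
    results = json.load(f)

findings = [r for r in results if r.get("verdict", {}).get("verdict") != "pass"]
by_risk = Counter(r.get("risk", {}).get("risk_level", "unknown") for r in findings)

for level, count in by_risk.most_common():
    print(f"{level:>10}: {count} finding(s)")
print(f"{'total':>10}: {len(findings)} of {len(results)} attacks need review")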

Custom Attacks

Create your own attack patterns:

[
  {
    "id": "CUSTOM-001",
    "category": "my_category",
    "description": "My custom attack",
    "prompt": "Your attack prompt here"
  }
]
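
A quick sanity check before a run can catch missing fields (a sketch; the required keys are simply those used in the example above):

# validate_attacks.py -- sanity-check a custom attack file before scanning.
# Usage: python validate_attacks.py my_attacks.json
import json
import sys

REQUIRED_KEYS = {"id", "category", "description", "prompt"}  # keys from the example above

with open(sys.argv[1]) as f:
    attacks = json.load(f)

for i, attack in enumerate(attacks):
    missing = REQUIRED_KEYS - attack.keys()
    if missing:
        sys.exit(f"Attack #{i} ({attack.get('id', '?')}) is missing: {sorted(missing)}")

print(f"OK: {len(attacks)} attack(s) look well-formed.")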

Run the file against your target:

python -m promptxploit.main --target my_target.py --attacks my_attacks.json --output custom_results.json

See CUSTOM_ATTACKS.md for details.


Integration with PromptShield

Perfect combo: Test with PromptXploit → Fix with PromptShield

# Before: Vulnerable
def vulnerable_llm(prompt):
    # The prompt goes straight to the model with no screening
    return openai.chat(prompt)  # placeholder for your actual model call

# After: Protected
from promptshield import Shield
shield = Shield(level=5)

def protected_llm(prompt):
    # Screen the incoming prompt before it reaches the model
    check = shield.protect_input(prompt, "context")
    if check["blocked"]:
        return "Invalid input"
    return openai.chat(check["secured_context"])  # placeholder for your actual model call

Test again with PromptXploit → verify the previously failing attacks are now blocked ✅
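
To re-test the protected pipeline, expose it through the same run() contract used in the Quick Start (a sketch; the module name my_app.py is illustrative):

# protected_target.py -- wrap the protected pipeline in PromptXploit's
# standard target interface. Assumes protected_llm() from the snippet
# above lives in my_app.py (illustrative module name).
from my_app import protected_llm

def run(prompt: str) -> str:
    return protected_llm(prompt)

Then re-run the scan with --target protected_target.py.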


Why PromptXploit?

vs. Other Tools:

  • ✅ Most comprehensive - 147 attacks (others: ~20)
  • ✅ AI-powered adaptive - Unique recon-based intelligence
  • ✅ Framework-agnostic - Any LLM (OpenAI, Claude, local, custom)
  • ✅ Easy to extend - JSON-based attacks
  • ✅ Production-ready - JSON reporting, CI/CD integration

vs. Manual testing:

  • ⚡ Automated
  • 🎯 Comprehensive coverage
  • 📊 Consistent methodology
  • 🔁 Repeatable

Responsible Use

⚠️ This is a security testing tool for authorized use only.

  • ✅ Test your own applications
  • ✅ Authorized penetration testing
  • ✅ Security research
  • ❌ Unauthorized access
  • ❌ Malicious attacks

See DISCLAIMER.md for full ethical guidelines.


Documentation


Contributing

We welcome contributions! See CONTRIBUTING.md.

Security researchers: Please follow responsible disclosure practices.


License

MIT License - see LICENSE


Citation

@software{promptxploit2024,
  title={PromptXploit: LLM Penetration Testing Framework},
  author={Neural Alchemy},
  year={2024},
  url={https://github.com/Neural-alchemy/promptxploit}
}

Built by Neural Alchemy

Test with PromptXploit | Protect with PromptShield

Website | PromptShield | Documentation
