10 changes: 8 additions & 2 deletions CONTRIBUTING.md
Expand Up @@ -18,23 +18,26 @@ Thank you for your interest in contributing to Deadend CLI! This document provid

- **Python 3.11+** required
- **Docker** - Required for running the pgvector database and sandbox execution
-- **uv** - Package manager for dependency management
+- **uv >= 0.5.30** - Package manager for dependency management
- **Playwright** - For browser automation

### Setting Up Your Development Environment

1. **Fork and clone the repository**:

```bash
git clone https://github.com/<your-username>/deadend-cli.git
cd deadend-cli
```

2. **Install dependencies**:

```bash
uv sync
```

3. **Install Playwright browsers**:

```bash
pipx install pytest-playwright
playwright install
Expand Down Expand Up @@ -140,7 +143,6 @@ class AgentOutput(BaseModel):
updated_state: dict[str, Any] | None = None
```


### Conventions Summary

- **Confidence scores**: Always 0.0 to 1.0 (float), not percentages
Expand Down Expand Up @@ -204,13 +206,15 @@ async def test_my_async_function():
### Pull Request Process

1. **Create a branch**:

```bash
git checkout -b feature/your-feature-name
```

2. **Make your changes** following the code style guidelines

3. **Run tests and formatting**:

```bash
black .
isort .
Expand All @@ -219,12 +223,14 @@ async def test_my_async_function():
```

4. **Commit your changes**:

```bash
git add .
git commit -m "Add: brief description of changes"
```

5. **Push and create a PR**:

```bash
git push origin feature/your-feature-name
```
35 changes: 28 additions & 7 deletions deadend_cli/README.md
Expand Up @@ -15,6 +15,7 @@ Achieves ~78% on XBOW benchmarks with fully local execution and model-agnostic a
Deadend CLI is an autonomous web application penetration testing agent that uses feedback-driven iteration to adapt exploitation strategies. When standard tools fail, it generates custom Python payloads, observes responses, and iteratively refines its approach until breakthrough.

**Key features:**

- Fully local execution (no cloud dependencies, zero data exfiltration)
- Model-agnostic design (works with any deployable LLM)
- Custom sandboxed tools (Playwright, Docker, WebAssembly)
Expand Down Expand Up @@ -51,11 +52,14 @@ The framework focuses on **intelligent security analysis** through:
## Quick Start

### Prerequisites

- Docker (required)
- Python 3.11+
- uv >= 0.5.30
- Playwright: `playwright install`

### Installation

```bash
# Install via pipx (recommended)
pipx install deadend_cli
Expand All @@ -67,6 +71,7 @@ uv sync && uv build
```

### First Run

```bash
# Initialize configuration
deadend-cli init
Expand All @@ -82,6 +87,7 @@ deadend-cli chat \
## Usage Examples

### Basic Vulnerability Testing

```bash
# Test OWASP Juice Shop
docker run -p 3000:3000 bkimminich/juice-shop
Expand All @@ -92,13 +98,15 @@ deadend-cli chat \
```

### API Security Testing

```bash
deadend-cli chat \
--target "https://api.example.com" \
--prompt "test authentication endpoints"
```

### Autonomous Mode

```bash
# Run without approval prompts (CTFs/labs only)
deadend-cli chat \
Expand All @@ -112,21 +120,27 @@ deadend-cli chat \
## Commands

### `deadend-cli init`

Initialize configuration and set up pgvector database

### `deadend-cli chat`

Start interactive security testing session

- `--target`: Target URL
- `--prompt`: Initial testing prompt
- `--mode`: `hacker` (approval required) or `yolo` (autonomous)

### `deadend-cli eval-agent`

Run evaluation against challenge datasets

- `--eval-metadata-file`: Challenge dataset file
- `--llm-providers`: AI model providers to test
- `--guided`: Run with subtask decomposition

### `deadend-cli version`

Display current version

---
Expand All @@ -149,12 +163,12 @@ The agent uses a two-phase approach (reconnaissance → exploitation) with a sup

Evaluated on XBOW's 104-challenge validation suite (black-box mode, January 2026):

-| Agent | Success Rate | Infrastructure | Blind SQLi |
-|-------|-------------|----------------|------------|
-| XBOW (proprietary) | 85% | Proprietary | ? |
-| Cyber-AutoAgent | 81% | AWS Bedrock | 0% |
-| **Deadend CLI** | **78%** | **Fully local** | **33%** |
-| MAPTA | 76.9% | External APIs | 0% |
+| Agent              | Success Rate | Infrastructure  | Blind SQLi |
+| ------------------ | ------------ | --------------- | ---------- |
+| XBOW (proprietary) | 85%          | Proprietary     | ?          |
+| Cyber-AutoAgent    | 81%          | AWS Bedrock     | 0%         |
+| **Deadend CLI**    | **78%**      | **Fully local** | **33%**    |
+| MAPTA              | 76.9%        | External APIs   | 0%         |

**Models tested:** Claude Sonnet 4.5 (~78%), Kimi K2 Thinking (~69%)

Expand All @@ -166,11 +180,13 @@ Perfect scores: GraphQL, SSRF, NoSQL injection, HTTP method tampering (100%)
## Operating Modes

**Hacker Mode (default):** Requires approval for dangerous operations

```bash
deadend-cli chat --target URL --mode hacker
```

**YOLO Mode:** Autonomous execution (CTFs/labs only)

```bash
deadend-cli chat --target URL --mode yolo
```
Expand All @@ -197,23 +213,26 @@ Configuration is managed via `~/.cache/deadend/config.toml`. Run `deadend-cli in
## Current Status & Roadmap

### Stable (v0.0.15)

✅ New architecture
✅ XBOW benchmark evaluation (78%)
✅ Custom sandboxed tools
✅ Multi-model support with liteLLM
✅ Two-phase execution (recon + exploitation)

### In Progress (v0.1.0)

🚧 **CLI Redesign** with enhanced workflows:

- Plan mode (review strategies before execution)
- Preset configuration workflows (API testing, web apps, auth bypass)
- Workflow automation (save/replay attack chains)

🚧 Context optimization (reduce redundant tool calls)
🚧 Secrets management improvements


### Future roadmap

The current architecture proves competitive autonomous pentesting (78%) is achievable without cloud dependencies. Next challenges:

- **Open-Source Models**: Achieve 75%+ with Llama/Qwen (eliminate proprietary dependencies)
Expand All @@ -229,6 +248,7 @@ Goal: Make autonomous pentesting accessible (open models), comprehensive (hybrid
## Contributing

Contributions welcome in:

- Context optimization algorithms
- Vulnerability test cases
- Open-weight model fine-tuning
Expand All @@ -239,6 +259,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md) for guidelines on how to contribute.
---

## Citation

```bibtex
@software{deadend_cli_2026,
author = {Yassine Bargach},
44 changes: 31 additions & 13 deletions deadend_cli/deadend_agent/src/deadend_agent/core.py
Expand Up @@ -11,6 +11,7 @@

 from pathlib import Path
 import hashlib
+import platform
 import subprocess
 import requests
 from deadend_agent.config.settings import Config
Expand All @@ -19,12 +20,29 @@
 from deadend_agent.rag.db_cruds import RetrievalDatabaseConnector


-PYTHON_SANDBOX_NAME = "python-sandbox-tool-linux"
-SIMPLE_PYTHON_SANDBOX_URL = (
-    "https://github.com/xoxruns/simple-python-interpreter-sandbox/"
-    "releases/download/v0.0.3/python-sandbox-tool-linux"
-)
-PYTHON_SANDBOX_SHA256 = "74b8a80709a912028600f39b9953889c011278a80acf066af5bd6979366455f4"
+# Platform-specific sandbox binary configurations
+SANDBOX_CONFIGS = {
+    "Linux": {
+        "name": "python-sandbox-tool-linux",
+        "url": "https://github.com/xoxruns/simple-python-interpreter-sandbox/releases/download/v0.0.3/python-sandbox-tool-linux",
+        "sha256": "74b8a80709a912028600f39b9953889c011278a80acf066af5bd6979366455f4",
+    },
+    "Darwin": {
+        "name": "python-sandbox-tool-macos",
+        "url": "https://github.com/xoxruns/simple-python-interpreter-sandbox/releases/download/v0.0.3/python-sandbox-tool-macos",
+        "sha256": "9dc49652b1314978544e3e56eef67610d10a2fbb51ecaf06bc10f9c27ad75d7c",
+    },
+}
+
+
+def get_sandbox_config():
+    """Get the sandbox configuration for the current platform."""
+    system = platform.system()
+    if system not in SANDBOX_CONFIGS:
+        raise RuntimeError(
+            f"Unsupported platform: {system}. Supported platforms: {', '.join(SANDBOX_CONFIGS.keys())}"
+        )
+    return SANDBOX_CONFIGS[system]

 def config_setup() -> Config:
     """Setup config"""
Expand All @@ -46,9 +64,10 @@ def sandbox_setup() -> SandboxManager:
     sandbox_manager = SandboxManager()
     return sandbox_manager

-def setup_model_registry(config: Config) -> ModelRegistry:
+async def setup_model_registry(config: Config) -> ModelRegistry:
     """Setup Model registry"""
     model_registry = ModelRegistry(config=config)
+    await model_registry.initialize()
     return model_registry

 def _file_matches_sha256(path: Path, expected_hash: str) -> bool:
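
Because `setup_model_registry` becomes a coroutine in this hunk, any synchronous caller now has to drive it through an event loop. The diff does not show the call sites, so the following is only a minimal sketch: the `ModelRegistry` class here is a hypothetical stand-in for the real one, and `build_registry_sync` is an illustrative helper name that does not appear in the PR.

```python
import asyncio


class ModelRegistry:
    """Hypothetical stand-in for deadend_agent's ModelRegistry (assumption)."""

    def __init__(self, config):
        self.config = config
        self.initialized = False

    async def initialize(self):
        # Placeholder for whatever asynchronous setup the real registry performs.
        await asyncio.sleep(0)
        self.initialized = True


async def setup_model_registry(config):
    """Mirrors the shape of the function in the diff: construct, await init, return."""
    model_registry = ModelRegistry(config=config)
    await model_registry.initialize()
    return model_registry


def build_registry_sync(config):
    """Hypothetical wrapper for synchronous entry points with no running event loop."""
    return asyncio.run(setup_model_registry(config))
```

Callers that already run inside an event loop would `await setup_model_registry(config)` directly, since `asyncio.run` raises `RuntimeError` when a loop is already running in the same thread.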
Expand All @@ -65,36 +84,35 @@ def _file_matches_sha256(path: Path, expected_hash: str) -> bool:

 def download_python_sandbox(
     destination_dir: Path | None = None,
-    expected_sha256: str = PYTHON_SANDBOX_SHA256,
 ) -> Path:
     """Download the Python sandbox binary to the local cache if missing or outdated.

     Args:
         destination_dir: Optional directory to store the sandbox binary. Defaults
             to ~/.cache/deadend/python/.
-        expected_sha256: Expected SHA-256 checksum of the binary.

     Returns:
         Path to the downloaded (or existing) sandbox binary.
     """
+    config = get_sandbox_config()
     cache_dir = destination_dir or Path.home() / ".cache" / "deadend" / "python"
     cache_dir.mkdir(parents=True, exist_ok=True)
-    sandbox_path = cache_dir / PYTHON_SANDBOX_NAME
+    sandbox_path = cache_dir / config["name"]

-    if _file_matches_sha256(sandbox_path, expected_sha256):
+    if _file_matches_sha256(sandbox_path, config["sha256"]):
         return sandbox_path

     if sandbox_path.exists():
         sandbox_path.unlink()

-    response = requests.get(SIMPLE_PYTHON_SANDBOX_URL, stream=True, timeout=120)
+    response = requests.get(config["url"], stream=True, timeout=120)
     response.raise_for_status()
     with open(sandbox_path, "wb") as fd:
         for chunk in response.iter_content(chunk_size=8192):
             if chunk:
                 fd.write(chunk)

-    if not _file_matches_sha256(sandbox_path, expected_sha256):
+    if not _file_matches_sha256(sandbox_path, config["sha256"]):
         sandbox_path.unlink(missing_ok=True)
         raise RuntimeError(
             "Downloaded Python sandbox binary failed checksum verification."
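
The download path above relies on `_file_matches_sha256`, whose body is collapsed in this view. As a rough illustration of what such a check could look like, here is a minimal sketch; it assumes the helper returns `False` for a missing file and hashes in fixed-size chunks, neither of which is confirmed by the diff. The name `file_matches_sha256` and the `chunk_size` parameter are illustrative only.

```python
import hashlib
from pathlib import Path


def file_matches_sha256(path: Path, expected_hash: str, chunk_size: int = 8192) -> bool:
    """Return True if the file at `path` exists and its SHA-256 hex digest matches.

    Sketch only: the real helper in core.py is not shown in the diff.
    """
    if not path.is_file():
        # Assumption: a missing file counts as a mismatch, which triggers a download.
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fd:
        # Read in chunks so large binaries are never loaded into memory at once.
        for chunk in iter(lambda: fd.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hash.lower()
```

Checking the digest both before the download (to skip it when the cached binary is current) and after it (to reject a corrupted or tampered file) matches the two call sites visible in the hunk.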