xoxruns · ldemesla · Jan 27, 2026
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -18,23 +18,26 @@ Thank you for your interest in contributing to Deadend CLI! This document provid
 
 - **Python 3.11+** required
 - **Docker** - Required for running the pgvector database and sandbox execution
-- **uv** - Package manager for dependency management
+- **uv >= 0.5.30** - Package manager for dependency management
 - **Playwright** - For browser automation
 
 ### Setting Up Your Development Environment
 
 1. **Fork and clone the repository**:
+
    ```bash
    git clone https://github.com/<your-username>/deadend-cli.git
    cd deadend-cli
    ```
 
 2. **Install dependencies**:
+
    ```bash
    uv sync
    ```
 
 3. **Install Playwright browsers**:
+
    ```bash
    pipx install pytest-playwright
    playwright install
@@ -140,7 +143,6 @@ class AgentOutput(BaseModel):
     updated_state: dict[str, Any] | None = None
 ```
 
-
 ### Conventions Summary
 
 - **Confidence scores**: Always 0.0 to 1.0 (float), not percentages
@@ -204,13 +206,15 @@ async def test_my_async_function():
 ### Pull Request Process
 
 1. **Create a branch**:
+
    ```bash
    git checkout -b feature/your-feature-name
    ```
 
 2. **Make your changes** following the code style guidelines
 
 3. **Run tests and formatting**:
+
    ```bash
    black .
    isort .
@@ -219,12 +223,14 @@ async def test_my_async_function():
    ```
 
 4. **Commit your changes**:
+
    ```bash
    git add .
    git commit -m "Add: brief description of changes"
    ```
 
 5. **Push and create a PR**:
+
    ```bash
    git push origin feature/your-feature-name
    ```

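Review note: the new `uv >= 0.5.30` prerequisite can be checked locally before running `uv sync`. The sketch below is not part of this PR. The helper names `parse_uv_version`, `uv_meets_minimum`, and `check_installed_uv` are hypothetical, and the code assumes `uv --version` prints a line that contains a dotted `major.minor.patch` version.

```python
import re
import shutil
import subprocess

# Minimum version stated in the updated CONTRIBUTING.md.
MIN_UV_VERSION = (0, 5, 30)


def parse_uv_version(output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from text such as 'uv 0.5.30 (...)'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"Could not find a version number in: {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def uv_meets_minimum(output: str, minimum: tuple[int, int, int] = MIN_UV_VERSION) -> bool:
    """Compare the parsed version against the minimum, component by component."""
    return parse_uv_version(output) >= minimum


def check_installed_uv() -> bool:
    """Return True only if uv is on PATH and reports a new enough version."""
    if shutil.which("uv") is None:
        return False
    result = subprocess.run(
        ["uv", "--version"], capture_output=True, text=True, check=True
    )
    return uv_meets_minimum(result.stdout)
```

Comparing integer tuples rather than strings avoids the trap where `"0.5.9"` sorts after `"0.5.30"` lexicographically.
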
diff --git a/deadend_cli/README.md b/deadend_cli/README.md
@@ -15,6 +15,7 @@ Achieves ~78% on XBOW benchmarks with fully local execution and model-agnostic a
 Deadend CLI is an autonomous web application penetration testing agent that uses feedback-driven iteration to adapt exploitation strategies. When standard tools fail, it generates custom Python payloads, observes responses, and iteratively refines its approach until breakthrough.
 
 **Key features:**
+
 - Fully local execution (no cloud dependencies, zero data exfiltration)
 - Model-agnostic design (works with any deployable LLM)
 - Custom sandboxed tools (Playwright, Docker, WebAssembly)
@@ -51,11 +52,14 @@ The framework focuses on **intelligent security analysis** through:
 ## Quick Start
 
 ### Prerequisites
+
 - Docker (required)
 - Python 3.11+
+- uv >= 0.5.30
 - Playwright: `playwright install`
 
 ### Installation
+
 ```bash
 # Install via pipx (recommended)
 pipx install deadend_cli
@@ -67,6 +71,7 @@ uv sync && uv build
 ```
 
 ### First Run
+
 ```bash
 # Initialize configuration
 deadend-cli init
@@ -82,6 +87,7 @@ deadend-cli chat \
 ## Usage Examples
 
 ### Basic Vulnerability Testing
+
 ```bash
 # Test OWASP Juice Shop
 docker run -p 3000:3000 bkimminich/juice-shop
@@ -92,13 +98,15 @@ deadend-cli chat \
 ```
 
 ### API Security Testing
+
 ```bash
 deadend-cli chat \
   --target "https://api.example.com" \
   --prompt "test authentication endpoints"
 ```
 
 ### Autonomous Mode
+
 ```bash
 # Run without approval prompts (CTFs/labs only)
 deadend-cli chat \
@@ -112,21 +120,27 @@ deadend-cli chat \
 ## Commands
 
 ### `deadend-cli init`
+
 Initialize configuration and set up pgvector database
 
 ### `deadend-cli chat`
+
 Start interactive security testing session
+
 - `--target`: Target URL
 - `--prompt`: Initial testing prompt
 - `--mode`: `hacker` (approval required) or `yolo` (autonomous)
 
 ### `deadend-cli eval-agent`
+
 Run evaluation against challenge datasets
+
 - `--eval-metadata-file`: Challenge dataset file
 - `--llm-providers`: AI model providers to test
 - `--guided`: Run with subtask decomposition
 
 ### `deadend-cli version`
+
 Display current version
 
 ---
@@ -149,12 +163,12 @@ The agent uses a two-phase approach (reconnaissance → exploitation) with a sup
 
 Evaluated on XBOW's 104-challenge validation suite (black-box mode, January 2026):
 
-| Agent | Success Rate | Infrastructure | Blind SQLi |
-|-------|-------------|----------------|------------|
-| XBOW (proprietary) | 85% | Proprietary | ? |
-| Cyber-AutoAgent | 81% | AWS Bedrock | 0% |
-| **Deadend CLI** | **78%** | **Fully local** | **33%** |
-| MAPTA | 76.9% | External APIs | 0% |
+| Agent              | Success Rate | Infrastructure  | Blind SQLi |
+| ------------------ | ------------ | --------------- | ---------- |
+| XBOW (proprietary) | 85%          | Proprietary     | ?          |
+| Cyber-AutoAgent    | 81%          | AWS Bedrock     | 0%         |
+| **Deadend CLI**    | **78%**      | **Fully local** | **33%**    |
+| MAPTA              | 76.9%        | External APIs   | 0%         |
 
 **Models tested:** Claude Sonnet 4.5 (~78%), Kimi K2 Thinking (~69%)
 
@@ -166,11 +180,13 @@ Perfect scores: GraphQL, SSRF, NoSQL injection, HTTP method tampering (100%)
 ## Operating Modes
 
 **Hacker Mode (default):** Requires approval for dangerous operations
+
 ```bash
 deadend-cli chat --target URL --mode hacker
 ```
 
 **YOLO Mode:** Autonomous execution (CTFs/labs only)
+
 ```bash
 deadend-cli chat --target URL --mode yolo
 ```
@@ -197,23 +213,26 @@ Configuration is managed via `~/.cache/deadend/config.toml`. Run `deadend-cli in
 ## Current Status & Roadmap
 
 ### Stable (v0.0.15)
+
 ✅ New architecture
 ✅ XBOW benchmark evaluation (78%)
 ✅ Custom sandboxed tools
 ✅ Multi-model support with liteLLM
 ✅ Two-phase execution (recon + exploitation)
 
 ### In Progress (v0.1.0)
+
 🚧 **CLI Redesign** with enhanced workflows:
+
 - Plan mode (review strategies before execution)
 - Preset configuration workflows (API testing, web apps, auth bypass)
 - Workflow automation (save/replay attack chains)
 
 🚧 Context optimization (reduce redundant tool calls)
 🚧 Secrets management improvements
 
-
 ### Future roadmap
+
 The current architecture proves competitive autonomous pentesting (78%) is achievable without cloud dependencies. Next challenges:
 
 - **Open-Source Models**: Achieve 75%+ with Llama/Qwen (eliminate proprietary dependencies)
@@ -229,6 +248,7 @@ Goal: Make autonomous pentesting accessible (open models), comprehensive (hybrid
 ## Contributing
 
 Contributions welcome in:
+
 - Context optimization algorithms
 - Vulnerability test cases
 - Open-weight model fine-tuning
@@ -239,6 +259,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md) for guidelines on how to contribute.
 ---
 
 ## Citation
+
 ```bibtex
 @software{deadend_cli_2026,
   author = {Yassine Bargach},

diff --git a/deadend_cli/deadend_agent/src/deadend_agent/core.py b/deadend_cli/deadend_agent/src/deadend_agent/core.py
@@ -11,6 +11,7 @@
 
 from pathlib import Path
 import hashlib
+import platform
 import subprocess
 import requests
 from deadend_agent.config.settings import Config
@@ -19,12 +20,29 @@
 from deadend_agent.rag.db_cruds import RetrievalDatabaseConnector
 
 
-PYTHON_SANDBOX_NAME = "python-sandbox-tool-linux"
-SIMPLE_PYTHON_SANDBOX_URL = (
-    "https://github.com/xoxruns/simple-python-interpreter-sandbox/"
-    "releases/download/v0.0.3/python-sandbox-tool-linux"
-)
-PYTHON_SANDBOX_SHA256 = "74b8a80709a912028600f39b9953889c011278a80acf066af5bd6979366455f4"
+# Platform-specific sandbox binary configurations
+SANDBOX_CONFIGS = {
+    "Linux": {
+        "name": "python-sandbox-tool-linux",
+        "url": "https://github.com/xoxruns/simple-python-interpreter-sandbox/releases/download/v0.0.3/python-sandbox-tool-linux",
+        "sha256": "74b8a80709a912028600f39b9953889c011278a80acf066af5bd6979366455f4",
+    },
+    "Darwin": {
+        "name": "python-sandbox-tool-macos",
+        "url": "https://github.com/xoxruns/simple-python-interpreter-sandbox/releases/download/v0.0.3/python-sandbox-tool-macos",
+        "sha256": "9dc49652b1314978544e3e56eef67610d10a2fbb51ecaf06bc10f9c27ad75d7c",
+    },
+}
+
+
+def get_sandbox_config() -> dict[str, str]:
+    """Return the sandbox binary configuration for the current platform."""
+    system = platform.system()
+    if system not in SANDBOX_CONFIGS:
+        raise RuntimeError(
+            f"Unsupported platform: {system}. Supported platforms: {', '.join(SANDBOX_CONFIGS)}"
+        )
+    return SANDBOX_CONFIGS[system]
 
 def config_setup() -> Config:
     """Setup config"""
@@ -46,9 +64,10 @@ def sandbox_setup() -> SandboxManager:
     sandbox_manager = SandboxManager()
     return sandbox_manager
 
-def setup_model_registry(config: Config) -> ModelRegistry:
+async def setup_model_registry(config: Config) -> ModelRegistry:
     """Setup Model registry"""
     model_registry = ModelRegistry(config=config)
+    await model_registry.initialize()
     return model_registry
 
 def _file_matches_sha256(path: Path, expected_hash: str) -> bool:
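Review note: `setup_model_registry` is now a coroutine function, so any caller that previously used its return value directly will receive a coroutine object unless it awaits the call. The sketch below illustrates the calling pattern under stated assumptions: `StubModelRegistry` is a stand-in (the real `ModelRegistry` internals are not shown in this diff), and `setup_model_registry_sync` is a hypothetical wrapper for synchronous entry points that have no event loop running.

```python
import asyncio


class StubModelRegistry:
    """Stand-in for ModelRegistry; only the async initialize() shape comes from the diff."""

    def __init__(self, config):
        self.config = config
        self.initialized = False

    async def initialize(self):
        await asyncio.sleep(0)  # placeholder for the real asynchronous setup work
        self.initialized = True


async def setup_model_registry(config):
    """Mirrors the new signature: build the registry, then await its initialization."""
    registry = StubModelRegistry(config=config)
    await registry.initialize()
    return registry


def setup_model_registry_sync(config):
    """Hypothetical wrapper for callers that are not themselves async."""
    # asyncio.run raises RuntimeError if an event loop is already running in this thread.
    return asyncio.run(setup_model_registry(config))
```

Callers that already run inside an event loop should use `await setup_model_registry(config)` directly instead of the synchronous wrapper.
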
@@ -65,36 +84,35 @@ def _file_matches_sha256(path: Path, expected_hash: str) -> bool:
 
 def download_python_sandbox(
     destination_dir: Path | None = None,
-    expected_sha256: str = PYTHON_SANDBOX_SHA256,
 ) -> Path:
     """Download the Python sandbox binary to the local cache if missing or outdated.
 
     Args:
         destination_dir: Optional directory to store the sandbox binary. Defaults
             to ~/.cache/deadend/python/.
-        expected_sha256: Expected SHA-256 checksum of the binary.
 
     Returns:
         Path to the downloaded (or existing) sandbox binary.
     """
+    config = get_sandbox_config()
     cache_dir = destination_dir or Path.home() / ".cache" / "deadend" / "python"
     cache_dir.mkdir(parents=True, exist_ok=True)
-    sandbox_path = cache_dir / PYTHON_SANDBOX_NAME
+    sandbox_path = cache_dir / config["name"]
 
-    if _file_matches_sha256(sandbox_path, expected_sha256):
+    if _file_matches_sha256(sandbox_path, config["sha256"]):
         return sandbox_path
 
     if sandbox_path.exists():
         sandbox_path.unlink()
 
-    response = requests.get(SIMPLE_PYTHON_SANDBOX_URL, stream=True, timeout=120)
+    response = requests.get(config["url"], stream=True, timeout=120)
     response.raise_for_status()
     with open(sandbox_path, "wb") as fd:
         for chunk in response.iter_content(chunk_size=8192):
             if chunk:
                 fd.write(chunk)
 
-    if not _file_matches_sha256(sandbox_path, expected_sha256):
+    if not _file_matches_sha256(sandbox_path, config["sha256"]):
         sandbox_path.unlink(missing_ok=True)
         raise RuntimeError(
             "Downloaded Python sandbox binary failed checksum verification."