102 changes: 82 additions & 20 deletions docs/hub/agents-local.md
# Local Agents with llama.cpp

You can run a coding agent entirely on your own hardware. Several open-source agents can connect to a local [llama.cpp](https://github.com/ggerganov/llama.cpp) server to give you an experience similar to Claude Code or Codex — but everything runs on your machine.

## Getting Started

Go to [huggingface.co/settings/local-apps](https://huggingface.co/settings/local-apps)

### 2. Find a Compatible Model

Browse for [Llama.cpp-compatible models](https://huggingface.co/models?apps=llama.cpp&sort=trending).


### 3. Launch the llama.cpp Server

On the model page, click the **"Use this model"** button and select `llama.cpp`. The page shows the exact commands for your setup. The first step is to start a llama.cpp server, e.g.

```bash
# Start a local OpenAI-compatible server:
llama-server -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M --jinja
```

This downloads the model and starts an OpenAI-compatible API server on your machine. See the [llama.cpp guide](./gguf-llamacpp) for installation instructions.
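Once the server is up, you can sanity-check it before wiring up an agent. A minimal sketch using only the Python standard library (assumes the default port 8080; `/v1/chat/completions` is part of llama.cpp's OpenAI-compatible API):

```python
import json
import urllib.error
import urllib.request

def chat(prompt, base_url="http://127.0.0.1:8080/v1"):
    """Send a single chat turn to a local OpenAI-compatible server."""
    payload = json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode()
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]
    except (urllib.error.URLError, OSError):
        return None  # server not running or still loading the model

reply = chat("Say hello in one word.")
print(reply if reply is not None else "llama-server is not reachable on port 8080")
```

If this prints a model reply, any OpenAI-compatible agent should be able to talk to the same endpoint.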

### 4. Connect Your Agent

Pick one of the agents below and follow the setup instructions.

## Pi

[Pi](https://pi.dev) is the agent behind [OpenClaw](https://github.com/openclaw) and is now integrated directly into Hugging Face, giving you access to thousands of compatible models.

Install Pi:

```bash
npm install -g @mariozechner/pi-coding-agent
Then add your local model to Pi's configuration file at `~/.pi/agent/models.json` (excerpt; keep the surrounding structure of your existing file):

```json
      "apiKey": "none",
      "models": [
        {
          "id": "ggml-org-gemma-4-26b-4b-gguf"
        }
      ]
    }
  }
}
```

Update the `id` field to match the model you launched in step 3.
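The server reports the ids it actually serves, so you can list them and copy the `id` exactly. A small sketch against the `/v1/models` endpoint (assumes the default port 8080):

```python
import json
import urllib.error
import urllib.request

def served_model_ids(base_url="http://127.0.0.1:8080/v1"):
    """Return the model ids exposed by a local OpenAI-compatible server."""
    try:
        with urllib.request.urlopen(base_url + "/models", timeout=10) as resp:
            return [m["id"] for m in json.load(resp)["data"]]
    except (urllib.error.URLError, OSError):
        return []  # server not reachable

ids = served_model_ids()
print(ids if ids else "llama-server is not reachable on port 8080")
```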

Start Pi in your project directory:

```bash
pi
```

Pi connects to your local llama.cpp server and gives you an interactive agent session.

![Demo](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/pi-llama-cpp-demo.gif)

## OpenClaw

[OpenClaw](https://github.com/openclaw) works locally with llama.cpp. You can set your model via the onboard command:

```bash
openclaw onboard --non-interactive \
--auth-choice custom-api-key \
--custom-base-url "http://127.0.0.1:8080/v1" \
--custom-model-id "ggml-org-gemma-4-26b-a4b-gguf" \
--custom-api-key "llama.cpp" \
--secret-input-mode plaintext \
--custom-compatibility openai \
--accept-risk
```

You can also run `openclaw onboard` interactively, set `custom-compatibility` to `openai`, and supply the same configuration.

## Hermes

[Hermes](https://hermes-agent.nousresearch.com/) works locally with llama.cpp. Define a default config as:

```yaml
model:
  provider: custom
  default: ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M
  base_url: http://127.0.0.1:8080/v1
  api_key: llama.cpp

custom_providers:
  - name: Local (127.0.0.1:8080)
    base_url: http://127.0.0.1:8080/v1
    api_key: llama.cpp
    model: ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M
```

## OpenCode

[OpenCode](https://opencode.ai) works locally with llama.cpp. Define a `~/.config/opencode/opencode.json`:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama.cpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama-server (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "gemma-4-26b-4b-it": {
          "name": "Gemma 4 (local)",
          "limit": {
            "context": 128000,
            "output": 8192
          }
        }
      }
    }
  }
}
```
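A malformed config file is a common failure mode with any of these agents. Below is a small hypothetical checker (the path and keys are OpenCode's; adjust for your agent) that confirms the file parses as JSON and points at the local server:

```python
import json
import pathlib

def check_opencode_config(path):
    """Parse an OpenCode config and print each provider's baseURL."""
    cfg = json.loads(pathlib.Path(path).expanduser().read_text())
    for name, provider in cfg.get("provider", {}).items():
        base_url = provider.get("options", {}).get("baseURL", "<missing>")
        print(f"{name}: baseURL={base_url}")
    return cfg

# check_opencode_config("~/.config/opencode/opencode.json")
```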

## How It Works

The setup has two components running locally:

1. **llama.cpp server** — Serves the model as an OpenAI-compatible API on `localhost`.
2. **Your agent** — The agent process that sends prompts to the local server, reasons about tasks, and executes actions.
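The loop in item 2 can be sketched as follows. This is a toy illustration of the reason/act cycle, not any particular agent's code; `fake_model` stands in for the HTTP call to the llama.cpp server, and `list_files` is a made-up tool:

```python
def fake_model(messages):
    """Stand-in for a POST to /v1/chat/completions on the local server."""
    if messages[-1]["role"] == "tool":
        # Tool result is in context: produce a final answer.
        return {"content": "The directory contains 2 files.", "tool_call": None}
    # Otherwise: ask for a tool call.
    return {"content": None, "tool_call": {"name": "list_files", "args": {"path": "."}}}

TOOLS = {"list_files": lambda path: "README.md\nmain.py"}

def agent_turn(user_prompt, model=fake_model):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = model(messages)
        call = reply["tool_call"]
        if call is None:                                  # final answer reached
            return reply["content"]
        result = TOOLS[call["name"]](**call["args"])      # execute the action locally
        messages.append({"role": "tool", "content": result})

print(agent_turn("What files are here?"))  # → The directory contains 2 files.
```

Real agents add streaming, error handling, and many tools, but the request/act/append cycle is the same.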

```
┌─────────┐      API calls      ┌──────────────────┐
│  Agent  │ ──────────────────▶ │ llama.cpp server │
│         │ ◀────────────────── │ (local model)    │
└─────────┘      responses      └──────────────────┘
```
llama.cpp also includes a built-in agent, `llama-agent`, which runs the model in the same process:

```bash
# Build the llama-agent target
cmake -B build
cmake --build build --target llama-agent

# Run (downloads the model automatically)
./build/bin/llama-agent -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M
```

Because tool calls happen in-process rather than over HTTP, there is no network overhead between the model and the agent. It also supports subagents, MCP servers, and an HTTP API server mode.