From 235e6795bde295d23789c111ccf31ef1e72213ee Mon Sep 17 00:00:00 2001
From: burtenshaw
Date: Fri, 3 Apr 2026 08:58:22 +0200
Subject: [PATCH 1/6] add hermes, claw and opencode. use gemma 4

---
 docs/hub/agents-local.md | 106 +++++++++++++++++++++++++++++++--------
 1 file changed, 86 insertions(+), 20 deletions(-)

diff --git a/docs/hub/agents-local.md b/docs/hub/agents-local.md
index b300a8f8df..89bc36d168 100644
--- a/docs/hub/agents-local.md
+++ b/docs/hub/agents-local.md
@@ -1,9 +1,6 @@
-# Local Agents with llama.cpp and Pi
+# Local Agents with llama.cpp
 
-You can run a coding agent entirely on your own hardware. [Pi](https://pi.dev) connects to a local [llama.cpp](https://github.com/ggerganov/llama.cpp) server to give you an experience similar to Claude Code or Codex — but everything runs on your machine.
-
-> [!TIP]
-> Pi is the agent behind [OpenClaw](https://github.com/openclaw) and is now integrated directly into Hugging Face, giving you access to thousands of compatible models.
+You can run a coding agent entirely on your own hardware. Several open-source agents connect to a local [llama.cpp](https://github.com/ggerganov/llama.cpp) server to give you an experience similar to Claude Code or Codex — but everything runs on your machine.
 
 ## Getting Started
 
@@ -15,22 +12,32 @@ Go to [huggingface.co/settings/local-apps](https://huggingface.co/settings/local
 
 ### 2. Find a Compatible Model
 
-Browse for models compatible with Pi: [huggingface.co/models?apps=pi&sort=trending](https://huggingface.co/models?apps=pi&sort=trending)
+Browse for models compatible with your agent of choice:
+
+- [Pi-compatible models](https://huggingface.co/models?apps=pi&sort=trending)
+- [OpenClaw-compatible models](https://huggingface.co/models?apps=openclaw&sort=trending)
+- [Hermes-compatible models](https://huggingface.co/models?apps=hermes&sort=trending)
+- [OpenCode-compatible models](https://huggingface.co/models?apps=opencode&sort=trending)
 
 ### 3. 
Launch the llama.cpp Server -On the model page, click the **"Use this model"** button and select `llama.cpp`. Pi will show you the exact commands for your setup. The first step is to start a llama.cpp server, e.g. +On the model page, click the **"Use this model"** button and select `llama.cpp`. It will show you the exact commands for your setup. The first step is to start a llama.cpp server, e.g. ```bash -# Start a local OpenAI-compatible server: -llama-server -hf unsloth/Qwen3.5-122B-A10B-GGUF:Q4_K_M --jinja +llama-server -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M ``` This downloads the model and starts an OpenAI-compatible API server on your machine. See the [llama.cpp guide](./gguf-llamacpp) for installation instructions. -### 4. Install and Configure Pi +### 4. Connect Your Agent + +Pick one of the agents below and follow the setup instructions. + +## Pi -In a separate terminal, install Pi: +[Pi](https://pi.dev) is the agent behind [OpenClaw](https://github.com/openclaw) and is now integrated directly into Hugging Face, giving you access to thousands of compatible models. + +Install Pi: ```bash npm install -g @mariozechner/pi-coding-agent @@ -47,7 +54,7 @@ Then add your local model to Pi's configuration file at `~/.pi/agent/models.json "apiKey": "none", "models": [ { - "id": "Qwen3.5-122B-A10B-GGUF" + "id": "ggml-org-gemma-4--gguf" } ] } @@ -55,10 +62,6 @@ Then add your local model to Pi's configuration file at `~/.pi/agent/models.json } ``` -Update the `id` field to match the model you launched in step 3. - -### 5. Run Pi - Start Pi in your project directory: ```bash @@ -69,17 +72,80 @@ Pi connects to your local llama.cpp server and gives you an interactive agent se ![Demo](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/pi-llama-cpp-demo.gif) +## OpenClaw + +[OpenClaw](https://github.com/openclaw) works locally with llama.cpp. 
You can set your model via the onboard command: + +```bash +openclaw onboard --non-interactive \ + --auth-choice custom-api-key \ + --custom-base-url "http://127.0.0.1:8080/v1" \ + --custom-model-id "ggml-org-gemma-4-26b-a4b-gguf" \ + --custom-api-key "llama.cpp" \ + --secret-input-mode plaintext \ + --custom-compatibility openai \ + --accept-risk +``` + +You can also run `openclaw onboard` interactively, select `custom-compatibility` with `openai`, and pass the same configuration. + +## Hermes + +[Hermes](https://github.com/anthropics/hermes) works locally with llama.cpp. Define a default config as: + +```yaml +model: + provider: custom + default: ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M + base_url: http://127.0.0.1:8080/v1 + api_key: llama.cpp + +custom_providers: + - name: Local (127.0.0.1:8080) + base_url: http://127.0.0.1:8080/v1 + api_key: llama.cpp + model: ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M +``` + +## OpenCode + +[OpenCode](https://opencode.ai) works locally with llama.cpp. Define a `~/.config/opencode/opencode.json`: + +```json +{ + "$schema": "https://opencode.ai/config.json", + "provider": { + "llama.cpp": { + "npm": "@ai-sdk/openai-compatible", + "name": "llama-server (local)", + "options": { + "baseURL": "http://127.0.0.1:8080/v1" + }, + "models": { + "gemma-4--it": { + "name": "Gemma 4 (local)", + "limit": { + "context": 128000, + "output": 8192 + } + } + } + } + } +} +``` + ## How It Works The setup has two components running locally: 1. **llama.cpp server** — Serves the model as an OpenAI-compatible API on `localhost`. -2. **Pi** — The agent process that sends prompts to the local server, reasons about tasks, and executes actions. +2. **Your agent** — The agent process that sends prompts to the local server, reasons about tasks, and executes actions. 
 ```
 ┌─────────┐    API calls     ┌──────────────────┐
-│   Pi    │ ───────────────▶ │ llama.cpp server │
-│ (agent) │ ◀─────────────── │  (local model)   │
+│  Agent  │ ───────────────▶ │ llama.cpp server │
+│         │ ◀─────────────── │  (local model)   │
 └─────────┘    responses     └──────────────────┘
      │
      ▼
@@ -100,7 +166,7 @@ cmake -B build
 cmake --build build --target llama-agent
 
 # Run (downloads the model automatically)
-./build/bin/llama-agent -hf unsloth/GLM-4.7-Flash-GGUF:UD-Q4_K_XL
+./build/bin/llama-agent -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M
 ```
 
 Because tool calls happen in-process rather than over HTTP, there is no network overhead between the model and the agent. It also supports subagents, MCP servers, and an HTTP API server mode.

From eb606cd06d8e70da36a5b9ffb016ccea3b272866 Mon Sep 17 00:00:00 2001
From: burtenshaw
Date: Fri, 3 Apr 2026 09:01:28 +0200
Subject: [PATCH 2/6] focus on llama cpp

---
 docs/hub/agents-local.md | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/docs/hub/agents-local.md b/docs/hub/agents-local.md
index 89bc36d168..7a9c3517c2 100644
--- a/docs/hub/agents-local.md
+++ b/docs/hub/agents-local.md
@@ -14,10 +14,8 @@ Go to [huggingface.co/settings/local-apps](https://huggingface.co/settings/local
 
 Browse for models compatible with your agent of choice:
 
-- [Pi-compatible models](https://huggingface.co/models?apps=pi&sort=trending)
-- [OpenClaw-compatible models](https://huggingface.co/models?apps=openclaw&sort=trending)
-- [Hermes-compatible models](https://huggingface.co/models?apps=hermes&sort=trending)
-- [OpenCode-compatible models](https://huggingface.co/models?apps=opencode&sort=trending)
+- [Llama.cpp-compatible models](https://huggingface.co/models?apps=llama.cpp&sort=trending)
+
 
 ### 3. 
Launch the llama.cpp Server

From a48d864b91be8825265b7e2d233beb8f11657b0d Mon Sep 17 00:00:00 2001
From: burtenshaw
Date: Fri, 3 Apr 2026 09:03:35 +0200
Subject: [PATCH 3/6] use actual sizes

---
 docs/hub/agents-local.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/hub/agents-local.md b/docs/hub/agents-local.md
index 7a9c3517c2..0f9cd26a51 100644
--- a/docs/hub/agents-local.md
+++ b/docs/hub/agents-local.md
@@ -22,7 +22,7 @@ Browse for models compatible with your agent of choice:
 On the model page, click the **"Use this model"** button and select `llama.cpp`. It will show you the exact commands for your setup. The first step is to start a llama.cpp server, e.g.
 
 ```bash
-llama-server -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M
+llama-server -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M --jinja
 ```
 
 This downloads the model and starts an OpenAI-compatible API server on your machine. See the [llama.cpp guide](./gguf-llamacpp) for installation instructions.
@@ -52,7 +52,7 @@ Then add your local model to Pi's configuration file at `~/.pi/agent/models.json
         "apiKey": "none",
         "models": [
           {
-            "id": "ggml-org-gemma-4--gguf"
+            "id": "ggml-org-gemma-4-26b-a4b-gguf"
           }
         ]
       }
@@ -120,7 +120,7 @@ custom_providers:
         "baseURL": "http://127.0.0.1:8080/v1"
       },
       "models": {
-        "gemma-4--it": {
+        "gemma-4-26b-a4b-it": {
           "name": "Gemma 4 (local)",
           "limit": {
             "context": 128000,

From 83e42e938b8cfa1e037bb0416500f70006f5c16a Mon Sep 17 00:00:00 2001
From: burtenshaw
Date: Fri, 3 Apr 2026 09:04:27 +0200
Subject: [PATCH 4/6] tidy

---
 docs/hub/agents-local.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/docs/hub/agents-local.md b/docs/hub/agents-local.md
index 0f9cd26a51..f0ca306dd1 100644
--- a/docs/hub/agents-local.md
+++ b/docs/hub/agents-local.md
@@ -12,9 +12,7 @@ Go to [huggingface.co/settings/local-apps](https://huggingface.co/settings/local
 
 ### 2. 
Find a Compatible Model
 
-Browse for models compatible with your agent of choice:
-
-- [Llama.cpp-compatible models](https://huggingface.co/models?apps=llama.cpp&sort=trending)
+Browse for [Llama.cpp-compatible models](https://huggingface.co/models?apps=llama.cpp&sort=trending).
 
 ### 3. Launch the llama.cpp Server

From d7f342799e01679a534fbceb758f9926819f116d Mon Sep 17 00:00:00 2001
From: burtenshaw
Date: Fri, 3 Apr 2026 20:06:06 +0200
Subject: [PATCH 5/6] Update docs/hub/agents-local.md

Co-authored-by: Pedro Cuenca
---
 docs/hub/agents-local.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/hub/agents-local.md b/docs/hub/agents-local.md
index f0ca306dd1..55dfe34818 100644
--- a/docs/hub/agents-local.md
+++ b/docs/hub/agents-local.md
@@ -1,6 +1,6 @@
 # Local Agents with llama.cpp
 
-You can run a coding agent entirely on your own hardware. Several open-source agents connect to a local [llama.cpp](https://github.com/ggerganov/llama.cpp) server to give you an experience similar to Claude Code or Codex — but everything runs on your machine.
+You can run a coding agent entirely on your own hardware. Several open-source agents can connect to a local [llama.cpp](https://github.com/ggerganov/llama.cpp) server to give you an experience similar to Claude Code or Codex — but everything runs on your machine.

From dccb1dc4e45c98ede6abf4eb8aa2f8e330ef2945 Mon Sep 17 00:00:00 2001
From: burtenshaw
Date: Fri, 3 Apr 2026 20:06:13 +0200
Subject: [PATCH 6/6] Update docs/hub/agents-local.md

Co-authored-by: Pedro Cuenca
---
 docs/hub/agents-local.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/hub/agents-local.md b/docs/hub/agents-local.md
index 55dfe34818..710da60db3 100644
--- a/docs/hub/agents-local.md
+++ b/docs/hub/agents-local.md
@@ -87,7 +87,7 @@ You can also run `openclaw onboard` interactively, select `custom-compatibility`
 
 ## Hermes
 
-[Hermes](https://github.com/anthropics/hermes) works locally with llama.cpp. 
Define a default config as: +[Hermes](https://hermes-agent.nousresearch.com/) works locally with llama.cpp. Define a default config as: ```yaml model: