diff --git a/docs/hub/agents-local.md b/docs/hub/agents-local.md index b300a8f8d..710da60db 100644 --- a/docs/hub/agents-local.md +++ b/docs/hub/agents-local.md @@ -1,9 +1,6 @@ -# Local Agents with llama.cpp and Pi +# Local Agents with llama.cpp -You can run a coding agent entirely on your own hardware. [Pi](https://pi.dev) connects to a local [llama.cpp](https://github.com/ggerganov/llama.cpp) server to give you an experience similar to Claude Code or Codex — but everything runs on your machine. - -> [!TIP] -> Pi is the agent behind [OpenClaw](https://github.com/openclaw) and is now integrated directly into Hugging Face, giving you access to thousands of compatible models. +You can run a coding agent entirely on your own hardware. Several open-source agents can connect to a local [llama.cpp](https://github.com/ggerganov/llama.cpp) server to give you an experience similar to Claude Code or Codex — but everything runs on your machine. ## Getting Started @@ -15,22 +12,28 @@ Go to [huggingface.co/settings/local-apps](https://huggingface.co/settings/local-apps) ### 2. Find a Compatible Model -Browse for models compatible with Pi: [huggingface.co/models?apps=pi&sort=trending](https://huggingface.co/models?apps=pi&sort=trending) +Browse for [llama.cpp-compatible models](https://huggingface.co/models?apps=llama.cpp&sort=trending). + ### 3. Launch the llama.cpp Server -On the model page, click the **"Use this model"** button and select `llama.cpp`. Pi will show you the exact commands for your setup. The first step is to start a llama.cpp server, e.g. +On the model page, click the **"Use this model"** button and select `llama.cpp`. The page will show you the exact commands for your setup. The first step is to start a llama.cpp server, e.g. 
```bash -# Start a local OpenAI-compatible server: -llama-server -hf unsloth/Qwen3.5-122B-A10B-GGUF:Q4_K_M --jinja +llama-server -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M --jinja ``` This downloads the model and starts an OpenAI-compatible API server on your machine. See the [llama.cpp guide](./gguf-llamacpp) for installation instructions. -### 4. Install and Configure Pi +### 4. Connect Your Agent + +Pick one of the agents below and follow the setup instructions. + +## Pi -In a separate terminal, install Pi: +[Pi](https://pi.dev) is the agent behind [OpenClaw](https://github.com/openclaw) and is now integrated directly into Hugging Face, giving you access to thousands of compatible models. + +In a separate terminal, install Pi: ```bash npm install -g @mariozechner/pi-coding-agent ``` @@ -47,7 +50,7 @@ Then add your local model to Pi's configuration file at `~/.pi/agent/models.json "apiKey": "none", "models": [ { - "id": "Qwen3.5-122B-A10B-GGUF" + "id": "ggml-org-gemma-4-26b-a4b-gguf" } ] } @@ -55,10 +58,6 @@ Then add your local model to Pi's configuration file at `~/.pi/agent/models.json } ``` -Update the `id` field to match the model you launched in step 3. - -### 5. Run Pi - Start Pi in your project directory: ```bash @@ -69,17 +68,80 @@ Pi connects to your local llama.cpp server and gives you an interactive agent session. ![Demo](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/pi-llama-cpp-demo.gif) +## OpenClaw + +[OpenClaw](https://github.com/openclaw) works locally with llama.cpp. 
You can set your model via the onboard command: + +```bash +openclaw onboard --non-interactive \ + --auth-choice custom-api-key \ + --custom-base-url "http://127.0.0.1:8080/v1" \ + --custom-model-id "ggml-org-gemma-4-26b-a4b-gguf" \ + --custom-api-key "llama.cpp" \ + --secret-input-mode plaintext \ + --custom-compatibility openai \ + --accept-risk +``` + +You can also run `openclaw onboard` interactively, select the `openai` compatibility mode, and enter the same configuration. + +## Hermes + +[Hermes](https://hermes-agent.nousresearch.com/) works locally with llama.cpp. Define a default config: + +```yaml +model: + provider: custom + default: ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M + base_url: http://127.0.0.1:8080/v1 + api_key: llama.cpp + +custom_providers: + - name: Local (127.0.0.1:8080) + base_url: http://127.0.0.1:8080/v1 + api_key: llama.cpp + model: ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M +``` + +## OpenCode + +[OpenCode](https://opencode.ai) works locally with llama.cpp. Define a `~/.config/opencode/opencode.json`: + +```json +{ + "$schema": "https://opencode.ai/config.json", + "provider": { + "llama.cpp": { + "npm": "@ai-sdk/openai-compatible", + "name": "llama-server (local)", + "options": { + "baseURL": "http://127.0.0.1:8080/v1" + }, + "models": { + "gemma-4-26b-a4b-it": { + "name": "Gemma 4 (local)", + "limit": { + "context": 128000, + "output": 8192 + } + } + } + } + } +} +``` + +## How It Works The setup has two components running locally: 1. **llama.cpp server** — Serves the model as an OpenAI-compatible API on `localhost`. -2. **Pi** — The agent process that sends prompts to the local server, reasons about tasks, and executes actions. +2. **Your agent** — The agent process that sends prompts to the local server, reasons about tasks, and executes actions. 
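Under the hood, every agent above drives the server with the same OpenAI-style chat-completions call. Here is a minimal sketch in Python; the port `8080`, the model id, and the prompt text are illustrative assumptions to match to your own setup, and the request is only built, not sent:

```python
import json
from urllib.request import Request

# OpenAI-compatible chat-completions payload, as an agent would send it
# to the local llama-server. Model id and port are assumptions.
payload = {
    "model": "ggml-org-gemma-4-26b-a4b-gguf",  # must match the served model id
    "messages": [
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "List the files in this project."},
    ],
    "stream": False,
}

# Build the HTTP request against the local server's endpoint.
# urllib.request.urlopen(req) would send it once the server from
# step 3 is running.
req = Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "Bearer none"},
    method="POST",
)
```

Any client that can make this request can drive the local server, which is why these agents are interchangeable here.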
``` ┌─────────┐ API calls ┌──────────────────┐ -│ Pi │ ───────────────▶ │ llama.cpp server │ -│ (agent) │ ◀─────────────── │ (local model) │ +│ Agent │ ───────────────▶ │ llama.cpp server │ +│ │ ◀─────────────── │ (local model) │ └─────────┘ responses └──────────────────┘ │ ▼ @@ -100,7 +162,7 @@ cmake -B build cmake --build build --target llama-agent # Run (downloads the model automatically) -./build/bin/llama-agent -hf unsloth/GLM-4.7-Flash-GGUF:UD-Q4_K_XL +./build/bin/llama-agent -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M ``` Because tool calls happen in-process rather than over HTTP, there is no network overhead between the model and the agent. It also supports subagents, MCP servers, and an HTTP API server mode.