102 changes: 82 additions & 20 deletions docs/hub/agents-local.md
# Local Agents with llama.cpp

You can run a coding agent entirely on your own hardware. Several open-source agents can connect to a local [llama.cpp](https://github.com/ggerganov/llama.cpp) server to give you an experience similar to Claude Code or Codex — but everything runs on your machine.

## Getting Started

Go to [huggingface.co/settings/local-apps](https://huggingface.co/settings/local-apps)

### 2. Find a Compatible Model

Browse for [Llama.cpp-compatible models](https://huggingface.co/models?apps=llama.cpp&sort=trending).


### 3. Launch the llama.cpp Server

On the model page, click the **"Use this model"** button and select `llama.cpp`. The page shows the exact commands for your setup. The first step is to start a llama.cpp server, e.g.

```bash
# Start a local OpenAI-compatible server:
llama-server -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M --jinja
```

This downloads the model and starts an OpenAI-compatible API server on your machine. See the [llama.cpp guide](./gguf-llamacpp) for installation instructions.
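Once the server is up, you can sanity-check it before wiring up an agent. A minimal sketch using only the Python standard library (assumes the default port 8080; `/v1/chat/completions` is part of llama.cpp's OpenAI-compatible API):

```python
import json
import urllib.error
import urllib.request

def chat(prompt, base_url="http://127.0.0.1:8080/v1"):
    """Send a single chat turn to a local OpenAI-compatible server."""
    payload = json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode()
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]
    except (urllib.error.URLError, OSError):
        return None  # server not running or still loading the model

reply = chat("Say hello in one word.")
print(reply if reply is not None else "llama-server is not reachable on port 8080")
```

If this prints a model reply, any OpenAI-compatible agent should be able to talk to the same endpoint.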

### 4. Connect Your Agent

Pick one of the agents below and follow the setup instructions.

## Pi

[Pi](https://pi.dev) is the agent behind [OpenClaw](https://github.com/openclaw) and is now integrated directly into Hugging Face, giving you access to thousands of compatible models.

Install Pi:

```bash
npm install -g @mariozechner/pi-coding-agent
Then add your local model to Pi's configuration file at `~/.pi/agent/models.json` (excerpt; keep the surrounding structure of your existing file):

```json
      "apiKey": "none",
      "models": [
        {
          "id": "ggml-org-gemma-4-26b-4b-gguf"
        }
      ]
    }
  }
}
```

Update the `id` field to match the model you launched in step 3.
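The server reports the ids it actually serves, so you can list them and copy the `id` exactly. A small sketch against the `/v1/models` endpoint (assumes the default port 8080):

```python
import json
import urllib.error
import urllib.request

def served_model_ids(base_url="http://127.0.0.1:8080/v1"):
    """Return the model ids exposed by a local OpenAI-compatible server."""
    try:
        with urllib.request.urlopen(base_url + "/models", timeout=10) as resp:
            return [m["id"] for m in json.load(resp)["data"]]
    except (urllib.error.URLError, OSError):
        return []  # server not reachable

ids = served_model_ids()
print(ids if ids else "llama-server is not reachable on port 8080")
```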

Start Pi in your project directory:

```bash
pi
```

Pi connects to your local llama.cpp server and gives you an interactive agent session.

![Demo](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/pi-llama-cpp-demo.gif)

## OpenClaw

[OpenClaw](https://github.com/openclaw) works locally with llama.cpp. You can set your model via the onboard command:

```bash
openclaw onboard --non-interactive \
--auth-choice custom-api-key \
--custom-base-url "http://127.0.0.1:8080/v1" \
--custom-model-id "ggml-org-gemma-4-26b-a4b-gguf" \
--custom-api-key "llama.cpp" \
--secret-input-mode plaintext \
--custom-compatibility openai \
--accept-risk
```

You can also run `openclaw onboard` interactively, set `custom-compatibility` to `openai`, and supply the same configuration.

## Hermes

[Hermes](https://hermes-agent.nousresearch.com/) works locally with llama.cpp. Define a default config as:

```yaml
model:
  provider: custom
  default: ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M
  base_url: http://127.0.0.1:8080/v1
  api_key: llama.cpp

custom_providers:
  - name: Local (127.0.0.1:8080)
    base_url: http://127.0.0.1:8080/v1
    api_key: llama.cpp
    model: ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M
```

## OpenCode

[OpenCode](https://opencode.ai) works locally with llama.cpp. Define a `~/.config/opencode/opencode.json`:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama.cpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama-server (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
        "gemma-4-26b-4b-it": {
          "name": "Gemma 4 (local)",
          "limit": {
            "context": 128000,
            "output": 8192
          }
        }
      }
    }
  }
}
```
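A malformed config file is a common failure mode with any of these agents. Below is a small hypothetical checker (the path and keys are OpenCode's; adjust for your agent) that confirms the file parses as JSON and points at the local server:

```python
import json
import pathlib

def check_opencode_config(path):
    """Parse an OpenCode config and print each provider's baseURL."""
    cfg = json.loads(pathlib.Path(path).expanduser().read_text())
    for name, provider in cfg.get("provider", {}).items():
        base_url = provider.get("options", {}).get("baseURL", "<missing>")
        print(f"{name}: baseURL={base_url}")
    return cfg

# check_opencode_config("~/.config/opencode/opencode.json")
```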

## How It Works

The setup has two components running locally:

1. **llama.cpp server** — Serves the model as an OpenAI-compatible API on `localhost`.
2. **Your agent** — The agent process that sends prompts to the local server, reasons about tasks, and executes actions.
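The loop in item 2 can be sketched as follows. This is a toy illustration of the reason/act cycle, not any particular agent's code; `fake_model` stands in for the HTTP call to the llama.cpp server, and `list_files` is a made-up tool:

```python
def fake_model(messages):
    """Stand-in for a POST to /v1/chat/completions on the local server."""
    if messages[-1]["role"] == "tool":
        # Tool result is in context: produce a final answer.
        return {"content": "The directory contains 2 files.", "tool_call": None}
    # Otherwise: ask for a tool call.
    return {"content": None, "tool_call": {"name": "list_files", "args": {"path": "."}}}

TOOLS = {"list_files": lambda path: "README.md\nmain.py"}

def agent_turn(user_prompt, model=fake_model):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = model(messages)
        call = reply["tool_call"]
        if call is None:                                  # final answer reached
            return reply["content"]
        result = TOOLS[call["name"]](**call["args"])      # execute the action locally
        messages.append({"role": "tool", "content": result})

print(agent_turn("What files are here?"))  # → The directory contains 2 files.
```

Real agents add streaming, error handling, and many tools, but the request/act/append cycle is the same.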

```
┌─────────┐      API calls      ┌──────────────────┐
│  Agent  │ ──────────────────▶ │ llama.cpp server │
│         │ ◀────────────────── │ (local model)    │
└─────────┘      responses      └──────────────────┘
```
llama.cpp also includes a built-in agent, `llama-agent`, which runs the model in the same process:

```bash
# Build the llama-agent target
cmake -B build
cmake --build build --target llama-agent

# Run (downloads the model automatically)
./build/bin/llama-agent -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M
```

Because tool calls happen in-process rather than over HTTP, there is no network overhead between the model and the agent. It also supports subagents, MCP servers, and an HTTP API server mode.