Give your AI a real browser. A pi extension that lets the LLM navigate, interact with, and see web pages through a fully automated Chromium instance.
You: Find the top story on Hacker News and summarize it
browser open https://news.ycombinator.com
→ 30 interactive elements
browser snapshot -i
→ 30 interactive elements
browser click @e3
→ Navigated to article
browser screenshot
→ Screenshot saved: /tmp/screenshot-1707441234.png
browser close
→ Browser closed
pi install npm:pi-agent-browser
Or try it without installing:
pi -e npm:pi-agent-browser
That's it. On first use, the extension will offer to install agent-browser and download Chromium automatically.
The extension registers a browser tool that the LLM can call. Under the hood, each call runs an agent-browser CLI command against a persistent Chromium session.
┌─────────────┐     ┌──────────────────┐     ┌────────────────┐
│  LLM calls  │────▶│ pi-agent-browser │────▶│ agent-browser  │
│  browser()  │     │ (this extension) │     │ CLI + Chromium │
└─────────────┘     └──────────────────┘     └────────────────┘
                             │
                      ┌──────┴──────┐
                      │  Returns:   │
                      │  • text     │
                      │  • images   │
                      │  • @refs    │
                      └─────────────┘
The typical workflow the LLM follows:
1. `open <url>`: navigate to a page
2. `snapshot -i`: get interactive elements with `@ref` handles (e.g. `@e1`, `@e2`)
3. Interact: `click @e1`, `fill @e2 "query"`, `press Enter`
4. Re-snapshot: see what changed after the interaction
5. `screenshot`: get a visual of the page (returned as an inline image)
6. `close`: done
Screenshots are returned as base64 images directly to the LLM. With a vision-capable model, the AI can literally see the page and describe what's on screen.
The snapshot -i command returns a structured list of interactive elements, each tagged with a clickable @ref handle. The LLM uses these to interact with buttons, links, inputs, and more, with no CSS selectors or XPath needed.
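To make the @ref idea concrete, here is a minimal sketch of pulling the handles out of snapshot text. The helper name `extractRefs` and the sample snapshot lines are assumptions for illustration; only the `@e1`-style handle format comes from this README, and the real snapshot layout may differ.

```typescript
// Hypothetical helper: collect the distinct @ref handles (e.g. "@e1") found
// in a snapshot's text, in order of first appearance.
function extractRefs(snapshotText: string): string[] {
  const all = snapshotText.match(/@e\d+/g) ?? [];
  // Keep only the first occurrence of each handle.
  return all.filter((ref, index) => all.indexOf(ref) === index);
}

// Assumed snapshot shape, for illustration only.
const sampleSnapshot = [
  'link "Hacker News" @e1',
  'link "new" @e2',
  'textbox "Search" @e3',
].join("\n");

console.log(extractRefs(sampleSnapshot)); // the handles @e1, @e2, @e3
```

A handle found this way can be passed straight back in a later command such as `click @e3`.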
Complex pages can produce enormous snapshot output. Large results are automatically truncated to fit context windows, with the full output saved to a temp file for reference.
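The truncation behavior described above could look roughly like the sketch below. The function name, the `maxChars` parameter, the notice wording, and the `saveFull` callback are all assumptions; the extension's actual limits and temp-file handling are not documented here.

```typescript
interface TruncatedResult {
  text: string;       // what is returned to the LLM
  truncated: boolean; // whether anything was cut
}

// Hypothetical helper. `saveFull` persists the complete output somewhere
// (for example a temp file) and returns the path it wrote to.
function truncateOutput(
  output: string,
  maxChars: number,
  saveFull: (full: string) => string
): TruncatedResult {
  if (output.length <= maxChars) {
    return { text: output, truncated: false };
  }
  const path = saveFull(output);
  const notice =
    "\n[Output truncated: showing " + maxChars + " of " + output.length +
    " characters. Full output: " + path + "]";
  return { text: output.slice(0, maxChars) + notice, truncated: true };
}
```

Passing the save step in as a callback keeps the truncation logic free of file-system details, which makes it easy to test in isolation.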
No setup required. If agent-browser isn't found on first use, the extension prompts to install it (npm package + Chromium binary), all from within pi.
The browser is automatically closed when the pi session ends. No orphaned Chromium processes left behind.
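One way such cleanup might be wired up is sketched below. `HostLike` is a hypothetical stand-in, not pi's real extension type; it models only the two things this README mentions, an `exec` call and a `session_shutdown` event. The `registerCleanup` name and the open/closed flag are likewise assumptions.

```typescript
// Minimal stand-in for the host API. This is NOT pi's actual interface.
interface HostLike {
  exec(command: string, args: string[]): Promise<unknown>;
  on(event: "session_shutdown", handler: () => Promise<void>): void;
}

// Registers a shutdown handler and returns a setter the tool would call
// to record whether a browser session is currently open.
function registerCleanup(host: HostLike): (open: boolean) => void {
  let browserOpen = false;

  host.on("session_shutdown", async () => {
    if (!browserOpen) return; // nothing to clean up
    browserOpen = false;
    try {
      await host.exec("agent-browser", ["close"]);
    } catch {
      // Best effort: the session is ending, so a failed close is ignored.
    }
  });

  return (open) => {
    browserOpen = open;
  };
}
```

Tracking an open flag avoids spawning a pointless `close` when the browser was never started during the session.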
Tool calls display cleanly in pi's terminal UI:
- Commands show as `browser open https://example.com`
- Snapshots show element counts: → 30 interactive elements
- Screenshots show the saved path
- Errors are highlighted in red
| Command | Description | Example |
|---|---|---|
| `open <url>` | Navigate to a URL | `open https://example.com` |
| `snapshot -i` | List interactive elements with `@ref` handles | `snapshot -i` |
| `click <@ref>` | Click an element | `click @e3` |
| `fill <@ref> <text>` | Clear field and type text | `fill @e5 "search query"` |
| `type <@ref> <text>` | Type text without clearing | `type @e5 "more text"` |
| `select <@ref> <value>` | Select a dropdown option | `select @e7 "Option B"` |
| `press <key>` | Press a keyboard key | `press Enter` |
| `scroll <dir> [px]` | Scroll the page | `scroll down 500` |
| `get text\|url\|title [@ref]` | Get page or element info | `get title` |
| `wait <@ref\|ms>` | Wait for element or time | `wait 2000` |
| `screenshot [--full]` | Take a screenshot (returned inline) | `screenshot --full` |
| `close` | Close the browser session | `close` |
Any valid agent-browser command works; the extension passes it through directly.
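Passing a command through to a CLI generally means turning a string such as `fill @e5 "search query"` into an argument list. The sketch below shows one simple way to do that; `toArgs` is a hypothetical helper, and whether the extension tokenizes commands this way is an assumption. It handles only double quotes and does not support escapes.

```typescript
// Hypothetical helper: split a command string into argv-style tokens,
// keeping double-quoted text together (the quotes themselves are dropped).
function toArgs(command: string): string[] {
  const args: string[] = [];
  const pattern = /"([^"]*)"|(\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(command)) !== null) {
    // Group 1 is quoted content (may be empty); group 2 is a bare word.
    args.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return args;
}

// Yields three tokens: fill, @e5, and "search query" as a single argument.
console.log(toArgs('fill @e5 "search query"'));
```

The resulting array is the kind of value that could be handed to a call like `pi.exec("agent-browser", [...args])`.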
You: Search Google for "pi coding agent" and tell me the first result
browser open https://www.google.com
browser snapshot -i
browser fill @e3 "pi coding agent"
browser press Enter
browser snapshot -i
browser close
The first result is...
You: Go to httpbin.org/forms/post and fill out the form
browser open https://httpbin.org/forms/post
browser snapshot -i
browser fill @e1 "John"
browser fill @e2 "john@example.com"
browser click @e5
browser close
You: Show me what the Anthropic homepage looks like
browser open https://www.anthropic.com
browser screenshot
// LLM sees the page and describes layout, content, design...
browser close
The Anthropic homepage features a clean design with...
- Node.js ≥ 20
- pi: the coding agent this extends
- agent-browser: installed automatically on first use, or manually:
  npm install -g agent-browser
  agent-browser install  # downloads Chromium
- Vision-capable model (for screenshots): Claude Sonnet/Opus, GPT-4o, Gemini Pro, etc.
pi-agent-browser/
├── extensions/
│   └── agent-browser.ts   # The pi extension (single file, ~180 lines)
├── docs/
│   └── plans/             # Implementation plans
├── package.json           # pi package manifest
├── LICENSE                # MIT
└── README.md
The extension is a single TypeScript file that:
- Registers the `browser` tool with pi's extension API
- Auto-detects the agent-browser installation and prompts to install it if missing
- Executes commands via `pi.exec("agent-browser", [...args])`
- Handles screenshots by reading the saved image and returning it as base64
- Truncates large outputs to protect context windows
- Renders results with custom TUI formatting
- Cleans up the browser on `session_shutdown`
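For the screenshot step in the list above, the saved file's path has to be recovered from the command output before the image can be read. A minimal sketch, assuming the output contains a line of the form `Screenshot saved: <path>` as in the transcript at the top of this README; `findScreenshotPath` is a hypothetical name, and paths containing spaces are not handled.

```typescript
// Hypothetical helper: find the saved screenshot path in CLI output.
// Returns null when no "Screenshot saved: ....png" line is present.
function findScreenshotPath(output: string): string | null {
  const match = /Screenshot saved:\s*(\S+\.png)/.exec(output);
  return match ? match[1] : null;
}

console.log(findScreenshotPath("Screenshot saved: /tmp/screenshot-1707441234.png"));
```

With the path in hand, Node's `fs.readFileSync(path).toString("base64")` produces the base64 string that can be returned as inline image content.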
If agent-browser is missing, the extension prompts to install it automatically. If that fails, install it manually:
npm install -g agent-browser
agent-browser install
Make sure Chromium's system dependencies are installed. On Debian/Ubuntu:
sudo apt-get install -y libx11-xcb1 libxcomposite1 libxdamage1 libxi6 \
  libxtst6 libnss3 libcups2 libxrandr2 libasound2 libpangocairo-1.0-0 \
  libatk1.0-0 libatk-bridge2.0-0 libgtk-3-0
Or run in headless mode (agent-browser's default).
Screenshots require a vision-capable model. Make sure you're using one of:
- Claude 3.5 Sonnet, Claude 3 Opus, Claude 4 Sonnet/Opus
- GPT-4o, GPT-4 Turbo
- Gemini 1.5 Pro, Gemini 2.0 Pro
Truncation of large outputs is by design. The full output is saved to a temp file (path shown in the truncation notice). The LLM usually has enough context from the truncated output to continue working.
If pi exits unexpectedly, you may need to clean up manually:
agent-browser close
# or
pkill -f chromium
Contributions welcome! This is a straightforward single-file extension; the entire implementation lives in extensions/agent-browser.ts.
git clone https://github.com/coctostan/pi-agent-browser.git
cd pi-agent-browser
# Test locally
pi -e extensions/agent-browser.ts