🌐 pi-agent-browser

Give your AI a real browser. A pi extension that lets the LLM navigate, interact with, and see web pages through a fully automated Chromium instance.

You: Find the top story on Hacker News and summarize it

  browser  open https://news.ycombinator.com
  ✓ 30 interactive elements

  browser  snapshot -i
  ✓ 30 interactive elements

  browser  click @e3
  ✓ Navigated to article

  browser  screenshot
  ✓ Screenshot saved: /tmp/screenshot-1707441234.png

  browser  close
  ✓ Browser closed

Install

pi install npm:pi-agent-browser

Or try it without installing:

pi -e npm:pi-agent-browser

That's it. On first use, the extension will offer to install agent-browser and download Chromium automatically.

How It Works

The extension registers a browser tool that the LLM can call. Under the hood, each call runs an agent-browser CLI command against a persistent Chromium session.

┌─────────────┐     ┌──────────────────┐     ┌────────────────┐
│  LLM calls  │────▶│  pi-agent-browser │────▶│  agent-browser  │
│  browser()  │     │  (this extension) │     │  CLI + Chromium │
└─────────────┘     └──────────────────┘     └────────────────┘
                           │
                    ┌──────┴──────┐
                    │  Returns:   │
                    │  • text     │
                    │  • images   │
                    │  • @refs    │
                    └─────────────┘

The typical workflow the LLM follows:

open <url> — Navigate to a page
snapshot -i — Get interactive elements with @ref handles (e.g. @e1, @e2)
Interact — click @e1, fill @e2 "query", press Enter
Re-snapshot — See what changed after interaction
screenshot — Get a visual of the page (returned as an inline image)
close — Done

Features

📸 Inline Screenshots

Screenshots are returned as base64 images directly to the LLM. With a vision-capable model, the AI can literally see the page and describe what's on screen.

🔍 Smart Snapshots with @refs

The snapshot -i command returns a structured list of interactive elements, each tagged with a clickable @ref handle. The LLM uses these to interact with buttons, links, inputs, and more — no CSS selectors or XPath needed.

📏 Output Truncation

Complex pages can produce enormous snapshot output. Large results are automatically truncated to fit context windows, with the full output saved to a temp file for reference.

🔧 Auto-Install

No setup required. If agent-browser isn't found on first use, the extension prompts to install it (npm package + Chromium binary) — all from within pi.

🧹 Session Cleanup

The browser is automatically closed when the pi session ends. No orphaned Chromium processes left behind.

🎨 Custom TUI Rendering

Tool calls display cleanly in pi's terminal UI:

Commands show as browser open https://example.com
Snapshots show element counts: ✓ 30 interactive elements
Screenshots show the saved path
Errors are highlighted in red

Command Reference

Command	Description	Example
`open <url>`	Navigate to a URL	`open https://example.com`
`snapshot -i`	List interactive elements with `@ref` handles	`snapshot -i`
`click <@ref>`	Click an element	`click @e3`
`fill <@ref> <text>`	Clear field and type text	`fill @e5 "search query"`
`type <@ref> <text>`	Type text without clearing	`type @e5 "more text"`
`select <@ref> <value>`	Select a dropdown option	`select @e7 "Option B"`
`press <key>`	Press a keyboard key	`press Enter`
`scroll <dir> [px]`	Scroll the page	`scroll down 500`
`get text\|url\|title [@ref]`	Get page or element info	`get title`
`wait <@ref\|ms>`	Wait for element or time	`wait 2000`
`screenshot [--full]`	Take a screenshot (returned inline)	`screenshot --full`
`close`	Close the browser session	`close`

Any valid agent-browser command works — the extension passes it through directly.

Examples

Search the web

You: Search Google for "pi coding agent" and tell me the first result

  browser  open https://www.google.com
  browser  snapshot -i
  browser  fill @e3 "pi coding agent"
  browser  press Enter
  browser  snapshot -i
  browser  close

The first result is...

Fill out a form

You: Go to httpbin.org/forms/post and fill out the form

  browser  open https://httpbin.org/forms/post
  browser  snapshot -i
  browser  fill @e1 "John"
  browser  fill @e2 "john@example.com"
  browser  click @e5
  browser  close

Take a visual snapshot

You: Show me what the Anthropic homepage looks like

  browser  open https://www.anthropic.com
  browser  screenshot
  // LLM sees the page and describes layout, content, design...
  browser  close

The Anthropic homepage features a clean design with...

Requirements

Node.js ≥ 20
pi — the coding agent this extends

agent-browser — installed automatically on first use, or manually:

npm install -g agent-browser
agent-browser install   # downloads Chromium

Vision-capable model (for screenshots): Claude Sonnet/Opus, GPT-4o, Gemini Pro, etc.

Architecture

pi-agent-browser/
├── extensions/
│   └── agent-browser.ts    # The pi extension (single file, ~180 lines)
├── docs/
│   └── plans/              # Implementation plans
├── package.json            # pi package manifest
├── LICENSE                 # MIT
└── README.md

The extension is a single TypeScript file that:

Registers the browser tool with pi's extension API
Auto-detects agent-browser installation, prompts to install if missing
Executes commands via pi.exec("agent-browser", [...args])
Handles screenshots by reading the saved image and returning it as base64
Truncates large outputs to protect context windows
Renders results with custom TUI formatting
Cleans up the browser on session_shutdown

Troubleshooting

"agent-browser not found"

The extension will prompt to install automatically. If that fails, install manually:

npm install -g agent-browser
agent-browser install

Chromium won't start

Make sure Chromium dependencies are installed. On Debian/Ubuntu:

sudo apt-get install -y libx11-xcb1 libxcomposite1 libxdamage1 libxi6 \
  libxtst6 libnss3 libcups2 libxrandr2 libasound2 libpangocairo-1.0-0 \
  libatk1.0-0 libatk-bridge2.0-0 libgtk-3-0

Or run in headless mode (agent-browser's default).

Screenshots aren't working with my model

Screenshots require a vision-capable model. Make sure you're using one of:

Claude 3.5 Sonnet, Claude 3 Opus, Claude 4 Sonnet/Opus
GPT-4o, GPT-4 Turbo
Gemini 1.5 Pro, Gemini 2.0 Pro

Large pages produce truncated snapshots

This is by design. The full output is saved to a temp file (path shown in the truncation notice). The LLM usually has enough context from the truncated output to continue working.

Browser left running after crash

If pi exits unexpectedly, you may need to clean up manually:

agent-browser close
# or
pkill -f chromium

Contributing

Contributions welcome! This is a straightforward single-file extension — the entire implementation lives in extensions/agent-browser.ts.

git clone https://github.com/coctostan/pi-agent-browser.git
cd pi-agent-browser

# Test locally
pi -e extensions/agent-browser.ts

License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🌐 pi-agent-browser

Install

How It Works

Features

📸 Inline Screenshots

🔍 Smart Snapshots with @refs

📏 Output Truncation

🔧 Auto-Install

🧹 Session Cleanup

🎨 Custom TUI Rendering

Command Reference

Examples

Search the web

Fill out a form

Take a visual snapshot

Requirements

Architecture

Troubleshooting

"agent-browser not found"

Chromium won't start

Screenshots aren't working with my model

Large pages produce truncated snapshots

Browser left running after crash

Contributing

License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
assets/images		assets/images
docs		docs
extensions		extensions
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
package.json		package.json

License

coctostan/pi-agent-browser

Folders and files

Latest commit

History

Repository files navigation

🌐 pi-agent-browser

Install

How It Works

Features

📸 Inline Screenshots

🔍 Smart Snapshots with @refs

📏 Output Truncation

🔧 Auto-Install

🧹 Session Cleanup

🎨 Custom TUI Rendering

Command Reference

Examples

Search the web

Fill out a form

Take a visual snapshot

Requirements

Architecture

Troubleshooting

"agent-browser not found"

Chromium won't start

Screenshots aren't working with my model

Large pages produce truncated snapshots

Browser left running after crash

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages