DhanvinG/Foundational-Browser-Agent

Foundational Browser Agent

Cora.Demo.Updated.1.1.1.mp4

This project is a Chrome-based browser agent that observes web pages, reasons over the DOM + screenshots using OpenAI models, and executes actions such as clicking, typing, and scrolling.

It combines:

  • A FastAPI backend (LLM controller / planner)
  • A Chrome MV3 extension (UI observation + action execution)
  • Screenshot-based multimodal reasoning
  • Overlay indexing for deterministic element selection

Architecture Overview

This project consists of two main components:

1️⃣ Chrome Extension (Frontend Agent)

Files

  • manifest.json
  • background.js
  • content.js
  • llmClient.js
  • prompts.js
  • onboarding.html
  • onboarding.js
  • onboarding.css

Responsibilities

  • Injects content scripts into pages
  • Detects clickable elements
  • Draws numbered overlays
  • Captures screenshots
  • Sends structured observations to backend
  • Executes validated actions returned by the model

2️⃣ FastAPI Backend (LLM Controller)

File

  • server.py

Responsibilities

  • Receives page context + screenshot
  • Sends multimodal prompt to OpenAI
  • Returns structured action JSON

Supported Endpoints

  • /agent-step — baseline loop
  • /execute-step
  • /status
  • /summarize
  • /intent
  • /profile-answer
  • /tts
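To make the backend's job concrete, here is a sketch of how a multimodal request to OpenAI might be assembled from the page context and screenshot. The field names, helper name, and model are illustrative assumptions, not the actual `server.py` schema:

```python
import json

# Sketch only: the model name and payload field names are assumptions.
MODEL = "gpt-4o"  # assumed multimodal model

def build_messages(goal, screenshot_b64, elements, history):
    """Assemble an OpenAI-style multimodal chat payload from page context."""
    element_lines = "\n".join(f"[{e['index']}] {e['text']}" for e in elements)
    return [
        {"role": "system",
         "content": "You control a browser. Reply with one JSON action."},
        {"role": "user",
         "content": [
             {"type": "text",
              "text": f"Goal: {goal}\nElements:\n{element_lines}\n"
                      f"History: {json.dumps(history)}"},
             # Screenshot travels as a base64 data URI image part.
             {"type": "image_url",
              "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"}},
         ]},
    ]
```

The system message pins the output contract (one JSON action), while the indexed element list lets the model refer to targets deterministically.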

🔄 End-to-End Agent Flow

  1. User starts agent (goal provided)

  2. background.js requests page observation

  3. content.js:

    • Distills interactive elements
    • Shows overlays
    • Returns structured elements[] + pageContext
  4. Background captures screenshot via:

    chrome.tabs.captureVisibleTab()
  5. llmClient.js sends:

    • goal
    • screenshot (base64)
    • elements
    • action history
    • metadata
  6. Backend sends multimodal request to OpenAI

  7. Model returns structured action:

    • click_index
    • type_text
    • scroll
    • finish
  8. Action is validated against allowlist

  9. content.js executes action

  10. Loop continues until finish or max steps
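The validation in step 8 can be sketched as a simple allowlist check before any action reaches content.js. The action names match the list above; the specific field checks are illustrative:

```python
# Only these action types may ever reach the executor (content.js).
ALLOWED_ACTIONS = {"click_index", "type_text", "scroll", "finish"}

def validate_action(action, num_elements):
    """Return True only if the model's action is safe to execute."""
    if not isinstance(action, dict):
        return False
    name = action.get("action")
    if name not in ALLOWED_ACTIONS:
        return False
    if name == "click_index":
        # The index must point at an overlay that actually exists.
        idx = action.get("index")
        return isinstance(idx, int) and 0 <= idx < num_elements
    if name == "type_text":
        return isinstance(action.get("text"), str)
    return True  # scroll / finish need no extra arguments here
```

Rejecting anything outside the allowlist means a malformed or adversarial model response can never trigger an arbitrary page action.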


📂 Repository Structure

cora/
│
├── manifest.json
├── background.js
├── content.js
├── llmClient.js
├── prompts.js
│
├── onboarding.html
├── onboarding.js
├── onboarding.css
│
└── server.py

⚙️ Installation & Setup

🔹 Backend Setup (FastAPI)

1️⃣ Install Python Dependencies

Create virtual environment:

python -m venv venv

Activate:

macOS / Linux

source venv/bin/activate

Windows

venv\Scripts\activate

Install packages:

pip install fastapi uvicorn openai websockets python-dotenv

2️⃣ Set OpenAI API Key

This project requires your OpenAI key as an environment variable.

Mac / Linux

export OPENAI_API_KEY="your-key-here"

Windows PowerShell

$env:OPENAI_API_KEY="your-key-here"

⚠️ Never hardcode API keys in source code.
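Since python-dotenv is among the installed dependencies, `server.py` can read the key from the environment (or a `.env` file). A minimal fail-fast pattern, with the helper name being an assumption rather than the actual code:

```python
import os

def require_api_key():
    """Fetch OPENAI_API_KEY from the environment, or fail loudly at startup."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Export it before starting the backend."
        )
    return key
```

Failing at startup is preferable to a cryptic authentication error surfacing mid-session inside the agent loop.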


3️⃣ Run Backend

uvicorn server:app --reload --port 8000

Backend runs at:

http://localhost:8000

🔹 Chrome Extension Setup

  1. Open Chrome
  2. Navigate to chrome://extensions
  3. Enable Developer Mode
  4. Click Load unpacked
  5. Select the project folder (where manifest.json lives)

The extension is now active.


▶️ Running the Agent

  • Ensure backend is running on localhost:8000
  • Open any webpage
  • Start agent via:
    • Extension UI
    • Onboarding interface
    • Custom trigger in your code

The agent will:

  • Index elements
  • Show overlays
  • Begin iterative reasoning loop
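Conceptually, the iterative loop mirrors the end-to-end flow above. A hedged Python sketch, where `observe`, `decide`, and `execute` are placeholders for the extension/backend round trip and the step cap is an assumed value:

```python
MAX_STEPS = 15  # assumed cap; the real limit lives in the extension

def run_agent(goal, observe, decide, execute):
    """Iterate observe -> decide -> execute until 'finish' or MAX_STEPS."""
    history = []
    for _ in range(MAX_STEPS):
        observation = observe()          # elements[] + screenshot + metadata
        action = decide(goal, observation, history)  # backend / LLM call
        if action.get("action") == "finish":
            break
        execute(action)                  # performed by content.js in practice
        history.append(action)
    return history
```

Keeping the executed-action history and passing it back into each decision is what lets the model avoid repeating steps on an unchanged page.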

🔐 Privacy & Security Notes

This project uses powerful browser permissions.

Permissions Used

  • <all_urls>
  • activeTab
  • tabs
  • scripting
  • webNavigation
  • storage

Screenshot Capture

The system captures the visible viewport when making LLM decisions.

Captured data includes:

  • Visible screenshot (PNG)
  • URL
  • Page title
  • Indexed elements (text + metadata)
  • Action history

Data flow:

Browser → Localhost Backend → OpenAI API

API Keys

  • Loaded from OPENAI_API_KEY
  • Never committed to repository
  • Never stored in Chrome extension

🚀 Future Improvements

  • Observation caching
  • Delta-based DOM diffing
  • Plan caching
  • Persistent memory layer
  • Element ranking (top-K distilled elements)
  • Local embedding store

About

Research repository for the foundational browser agent architecture using screenshots, DOM context, and step-by-step action grounding.
