From a1d70fa5b0ca38717cf223dc1a97cfb2ab7451cf Mon Sep 17 00:00:00 2001 From: zerob13 Date: Mon, 19 Jan 2026 14:49:42 +0800 Subject: [PATCH 1/6] docs: add specs for yo browesr context manager --- docs/specs/yobrowser-optimization/plan.md | 128 +++++++++++++++++++++ docs/specs/yobrowser-optimization/spec.md | 78 +++++++++++++ docs/specs/yobrowser-optimization/tasks.md | 86 ++++++++++++++ 3 files changed, 292 insertions(+) create mode 100644 docs/specs/yobrowser-optimization/plan.md create mode 100644 docs/specs/yobrowser-optimization/spec.md create mode 100644 docs/specs/yobrowser-optimization/tasks.md diff --git a/docs/specs/yobrowser-optimization/plan.md b/docs/specs/yobrowser-optimization/plan.md new file mode 100644 index 000000000..26821eb84 --- /dev/null +++ b/docs/specs/yobrowser-optimization/plan.md @@ -0,0 +1,128 @@ +# YoBrowser Optimization:实施方案(Plan) + +## 现状盘点(基于代码) + +- Renderer:`src/renderer/src/components/workspace/WorkspaceView.vue` 在 `agent` 模式下渲染 `WorkspaceBrowserTabs`,但不关心是否存在 tabs。 +- Renderer:`src/renderer/src/stores/yoBrowser.ts` 已维护 tabs 与 `tabCount`(由 IPC 事件更新)。 +- Main:YoBrowser 通过 `BrowserToolManager` 导出大量 `browser_*` 工具定义,并在 `AgentToolManager` 中注入到 agent 工具集中。 +- Agent loop:`src/main/presenter/agentPresenter/loop/toolCallProcessor.ts` 对部分 `browser_*` 工具有输出 offload 白名单。 + +## 总体设计 + +按你的要求分成三块: + +1) UI:Browser Tabs 分区只在 `tabCount > 0` 时出现。 +2) 完全移除 `browser_*`:从 agent 工具列表/路由/实现/文档/测试中彻底清理。 +3) CDP Skill 化:新增 `yo-browser-cdp`,并提供最小工具面(CDP send + tab 管理),工具仅在 skill 激活时可用。 + +> 约束:不做任何 system prompt / browser context 缩减。 + +--- + +## 1) UI:Browser Tabs 分区仅在 `tabCount > 0` 时渲染 + +- 修改 `WorkspaceView.vue`: + - 引入 `useYoBrowserStore()`。 + - 将 `showBrowserTabs` 改为:`chatMode.currentMode.value === 'agent' && yoBrowserStore.tabCount > 0`。 + +说明: +- `yoBrowserStore.tabCount` 已存在且由 tabs 数组计算。 +- tabs 更新依赖现有 `YO_BROWSER_EVENTS.*`(TAB_CREATED/TAB_CLOSED/TAB_COUNT_CHANGED 等),无需新增事件。 + +--- + +## 2) 完全移除 `browser_*` 工具体系 + +这里的“完全移除”需要同时覆盖: + +### 2.1 从 agent 工具注入与执行链路移除 + +- `src/main/presenter/agentPresenter/acp/agentToolManager.ts` + - 移除 `getAllToolDefinitions()` 中对 `yoBrowserPresenter.getToolDefinitions()` 的注入。 + - 移除 `callTool()` 中对 `browser_*`(例如 `toolName.startsWith('browser_')`)的路由分支。 + +### 2.2 删除旧工具实现与依赖 + +- 删除/移除构建引用链: + - `src/main/presenter/browser/BrowserToolManager.ts` + - `src/main/presenter/browser/tools/**` +- 同步清理任何残留引用(全局搜索 `browser_` / `BrowserToolManager`)。 + +### 2.3 同步更新外围逻辑、文档、测试 + +- `src/main/presenter/agentPresenter/loop/toolCallProcessor.ts` + - 从 `TOOLS_REQUIRING_OFFLOAD` 中移除 `browser_*`(至少 `browser_read_links` / `browser_get_clickable_elements`)。 +- `docs/architecture/tool-system.md` + - 移除 `browser_*` 相关说明与示例,改为描述:YoBrowser 自动化由 `yo-browser-cdp` skill 触发 + `yo_browser_*` 最小工具面。 +- 测试 + - 更新/删除任何假设 `browser_*` 存在的测试(例如 tool definitions 断言)。 + +风险: +- 删除 `browser_*` 后,若 repo 内仍存在引用会导致构建/运行失败;因此需要全局清理。 + +回滚: +- 按你的要求不保留旧实现,回滚需依赖 git 历史恢复。 + +--- + +## 3) CDP Skill 化:`yo-browser-cdp` + 最小工具面 + +### 3.1 Skill 目录与命名 + +- 新增 skill:`resources/skills/yo-browser-cdp/SKILL.md`(名称 `yo-browser-cdp`)。 + +### 3.2 最小工具面(仅在 skill 激活时暴露) + +工具名以 `yo_browser_` 前缀区分旧 `browser_*`: + +- `yo_browser_tab_list` + - 输出:tabs 列表 + active tabId。 +- `yo_browser_tab_new` + - 输入:`{ url?: string }`;输出:新 tabId + url。 +- `yo_browser_tab_activate` + - 输入:`{ tabId: string }`。 +- `yo_browser_tab_close` + - 输入:`{ tabId: string }`。 +- `yo_browser_cdp_send` + - 输入:`{ tabId?: string, method: string, params?: object }`。 + - 输出:`Debugger.sendCommand(method, params)` 的响应(推荐 JSON 文本)。 + +关键约束: +- 必须复用现有安全边界:当 tab URL 为 `local://` 时拒绝 CDP attach。 + - 推荐实现路径:通过 `BrowserTab.ensureSession()` 获取 session,再执行 `sendCommand`,避免绕过现有检查。 + +### 3.3 Skill gating 方案 + +- 目标:默认情况下 tool definitions **不包含** `yo_browser_*`。 +- 当且仅当 active skills 包含 `yo-browser-cdp` 时,才将 `yo_browser_*` 注入 tool definitions。 +- 建议复用现有 `deepchat-settings` 的 gating 模式: + - 在 `AgentToolManager.getAllToolDefinitions()` 中读取 `presenter.skillPresenter.getActiveSkills(conversationId)`,若包含 `yo-browser-cdp` 则追加 `yo_browser_*` tool definitions。 + +### 3.4 Skill 文档内容要点(SKILL.md) + +- 激活规则:仅当用户明确需要网页浏览/网页自动化/抓取验证时才激活;结束后及时 deactivate。 +- 心智模型:tab → active tab → CDP session → method/params。 +- 推荐 workflow: + - `yo_browser_tab_list`(确认 active tab) + - 必要时 `yo_browser_tab_new` / `yo_browser_tab_activate` + - `yo_browser_cdp_send` 执行: + - 页面导航:`Page.navigate` + - 读 DOM/内容:`Runtime.evaluate` + - 等待:`Runtime.evaluate` + 轮询/超时 +- 常见错误处理:element not found、导航超时、tab 被销毁、CDP attach 被拒绝(local://)。 + +### 3.5 测试策略 + +- Renderer + - 验证 `WorkspaceView`:tabCount=0 不显示;tabCount>0 显示。 +- Main + - tool definitions:默认无 `browser_*`,默认无 `yo_browser_*`。 + - 激活 `yo-browser-cdp` 后:出现 `yo_browser_*`。 + - 永久:`browser_*` 不再出现。 + +--- + +## 不在本计划内 + +- system prompt / browser context 的缩减或重写。 +- 任何对 YoBrowser UI 行为(窗口位置/大小等)的调整。 diff --git a/docs/specs/yobrowser-optimization/spec.md b/docs/specs/yobrowser-optimization/spec.md new file mode 100644 index 000000000..9950e2d1a --- /dev/null +++ b/docs/specs/yobrowser-optimization/spec.md @@ -0,0 +1,78 @@ +# YoBrowser Optimization(UI + CDP Skill 化) + +## 背景 + +当前 YoBrowser 在 Workspace 侧边栏与 agent 工具体系中存在两类问题: + +1) **UI**:`src/renderer/src/components/workspace/WorkspaceView.vue` 在 `agent` 模式下总会渲染 `WorkspaceBrowserTabs` 分区,即便没有任何 tab,也会出现一块空区域。 + +2) **工具体系**:main 侧通过 `BrowserToolManager` 暴露大量 `browser_*` 工具(navigate/action/content/tabs/download 等),你希望: +- **完全移除**这套 `browser_*` 工具(不暴露、不保留旧实现)。 +- 走“**完全 CDP**”路线:用 **skill** 教会模型如何操作 CDP,并只提供 **最小工具面**(优先 1 个通用 CDP send + 少量 tab 管理)。 + +> 重要约束:你明确要求 **不做 system prompt / browser context 的减少**,因此本需求不包含任何“缩减 prompt 注入”的工作。 + +## 目标(Goals) + +1. **UI**:只有存在 YoBrowser tabs 时,Workspace 侧边栏才显示 Browser Tabs 分区。 +2. **完全移除 `browser_*`**:从 agent 工具列表、路由、实现代码、文档与测试中彻底移除 `browser_*` 体系。 +3. **CDP Skill 化**:新增 `yo-browser-cdp` skill(目录命名按约定),通过 skill 文档讲清 CDP 工作流;并提供最小工具面以 CDP 为核心。 + +## 非目标(Non-Goals) + +- 不调整 YoBrowser window 的 UI、尺寸、布局、位置策略。 +- 不修改 `BrowserContextBuilder.buildSystemPrompt` 的注入策略(不做减少/压缩/裁剪)。 +- 不改造其他 agent 工具(filesystem/bash/mcp 等)。 + +## 用户故事(User Stories) + +- 作为用户,我不希望在没有任何浏览器 tab 的情况下,Workspace 侧边栏仍出现空的 Browser Tabs 分区。 +- 作为 agent 用户,我希望 YoBrowser 自动化能力以 CDP 为核心,并通过 skill 教会模型如何使用,而不是依赖大量高层封装工具。 +- 作为维护者,我希望彻底移除现有 `browser_*` 工具实现,减少长期维护面。 + +## 约束与假设(Constraints & Assumptions) + +- YoBrowser 现有实现已经基于 Electron Debugger/CDP(`CDPManager`, `BrowserTab.ensureSession()`)。 +- 安全边界:`local://` URL 禁止绑定 CDP(`BrowserTab` 现有逻辑已做限制),skill 必须明确写出该约束。 + +## 验收标准(Acceptance Criteria) + +### A. UI:Workspace Browser Tabs 展示逻辑 + +- [ ] `src/renderer/src/components/workspace/WorkspaceView.vue` 仅在 `chatMode === 'agent' && yoBrowserStore.tabCount > 0` 时渲染 `WorkspaceBrowserTabs`。 +- [ ] 当 `tabCount === 0` 时,不显示 Browser Tabs 分区(不保留空白区域)。 + +### B. 工具:完全移除 `browser_*` 系列工具 + +- [ ] agent tool definitions 中不再出现任何 `browser_*` 工具。 +- [ ] agent 的 tool call 路由中不再接受/处理 `browser_*`(例如 `toolName.startsWith('browser_')` 这类分支被移除)。 +- [ ] 旧工具实现不再保留:`BrowserToolManager` 与 `src/main/presenter/browser/tools/**`(以及其他相关封装)被删除或完全移出构建引用链。 +- [ ] repo 内所有对 `browser_*` 的残留引用被清理/更新: + - `src/main/presenter/agentPresenter/loop/toolCallProcessor.ts` 的 offload 白名单不再包含 `browser_*`。 + - `docs/architecture/tool-system.md` 不再描述 `browser_*` 使用方式。 + - 相关测试不再假设 `browser_*` 存在。 + +### C. Skill:新增 `yo-browser-cdp`(CDP 为核心 + 最小工具面) + +- [ ] 新增 skill:`resources/skills/yo-browser-cdp/SKILL.md`。 +- [ ] skill 明确写清:何时 activate/deactivate、tab/active tab、CDP session、`local://` 禁止 CDP、安全/失败处理。 +- [ ] `yo-browser-cdp` 激活后才暴露最小工具面(未激活时不可见/不可用)。 +- [ ] 最小工具面(名称为最终约定,后续实现需对齐): + - `yo_browser_tab_list`:列出 tabs 与 active tab。 + - `yo_browser_tab_new`:创建新 tab(可选 url)。 + - `yo_browser_tab_activate`:激活 tab。 + - `yo_browser_tab_close`:关闭 tab。 + - `yo_browser_cdp_send`:向指定/当前 tab 的 CDP session 发送 `{ method, params }`。 + +### D. Prompt/Context + +- [ ] `BrowserContextBuilder.buildSystemPrompt` 的注入保持现状(不做减少/压缩/裁剪)。 + +### E. 兼容性 + +- [ ] 不涉及数据迁移。 +- [ ] 现有 YoBrowser UI/窗口/Tab 生命周期保持可用。 + +## Open Questions + +无。 diff --git a/docs/specs/yobrowser-optimization/tasks.md b/docs/specs/yobrowser-optimization/tasks.md new file mode 100644 index 000000000..edf2d1418 --- /dev/null +++ b/docs/specs/yobrowser-optimization/tasks.md @@ -0,0 +1,86 @@ +# YoBrowser Optimization:任务拆分(Tasks) + +> 说明:本 tasks 用于后续落地实现的拆分;当前请求仅完善文档,因此这里只给出可执行顺序与验收点,不执行代码变更。 + +## Phase 1:UI(Workspace 侧边栏) + +1. 调整 Browser Tabs 分区显示条件 +- 文件:`src/renderer/src/components/workspace/WorkspaceView.vue` +- 改动:`WorkspaceBrowserTabs` 仅在 `chatMode === 'agent' && yoBrowserStore.tabCount > 0` 时渲染。 +- 验收:无 tabs 时不出现分区;有 tabs 时出现并能点击切换。 + +2.(可选)补 renderer 单测 +- 文件:`test/renderer/**`(按现有测试组织落位) +- 用例:tabCount=0/1 下的条件渲染。 + +--- + +## Phase 2:完全移除 `browser_*` 工具体系 + +3. 移除 agent 工具定义注入(browser_*) +- 文件:`src/main/presenter/agentPresenter/acp/agentToolManager.ts` +- 改动:删除 YoBrowser `getToolDefinitions()` 注入逻辑。 +- 验收:tool definitions 不再包含任何 `browser_*`。 + +4. 移除 agent 的 `browser_*` call 路由 +- 文件:`src/main/presenter/agentPresenter/acp/agentToolManager.ts` +- 改动:删除/禁止 `toolName.startsWith('browser_')` 分支。 +- 验收:任何 `browser_*` tool call 都会被视为 unknown tool。 + +5. 清理 agent loop 对 `browser_*` 的特殊处理 +- 文件:`src/main/presenter/agentPresenter/loop/toolCallProcessor.ts` +- 改动:从 `TOOLS_REQUIRING_OFFLOAD` 移除所有 `browser_*`。 + +6. 删除旧工具实现代码 +- 目录/文件(预计): + - `src/main/presenter/browser/BrowserToolManager.ts` + - `src/main/presenter/browser/tools/**` +- 验收:repo 中不再保留这套封装工具实现,且无残留引用。 + +7. 更新文档与示例 +- 文件:`docs/architecture/tool-system.md`(以及搜索到的其他文档) +- 改动:移除/更新关于 `browser_*` 的示例与说明,改为 `yo-browser-cdp` skill + `yo_browser_*` 最小工具面。 + +8. 更新/修复测试 +- 文件:`test/main/**`(按失败点定位) +- 改动:移除任何假设 `browser_*` 存在的断言。 + +--- + +## Phase 3:新增 `yo-browser-cdp`(CDP skill + 最小工具面) + +9. 创建 skill 文档 +- 文件:`resources/skills/yo-browser-cdp/SKILL.md` +- 内容要求: + - 激活/关闭规则(仅在需要网页自动化时)。 + - tab/active tab/CDP session 概念。 + - 安全边界:`local://` 禁止 CDP。 + - 推荐工作流:list tabs → activate/new → `yo_browser_cdp_send(Page.navigate)` → `yo_browser_cdp_send(Runtime.evaluate)` → wait/retry。 + - 常见错误处理:element not found、navigation 超时、tab 被销毁等。 + +10. 实现最小 `yo_browser_*` 工具集,并做 skill gating +- 目标:工具只在 `yo-browser-cdp` 激活时可用(默认不可见)。 +- 工具集合(最终以实现为准): + - `yo_browser_tab_list` + - `yo_browser_tab_new` + - `yo_browser_tab_activate` + - `yo_browser_tab_close` + - `yo_browser_cdp_send` + +11.(可选)补 main 单测覆盖 gating +- 验证: + - 默认无 `yo_browser_*`。 + - 激活 `yo-browser-cdp` 后出现 `yo_browser_*`。 + - `browser_*` 永久不存在。 + +--- + +## Phase 4:验收与质量门禁 + +12. 手工验收 +- Agent 模式下:无 tabs 时 Workspace 不显示 Browser Tabs;创建 tab 后显示。 +- 激活 `yo-browser-cdp`:模型可按 skill 指引使用 CDP 工具完成基本操作。 + +13. 质量门禁 +- `pnpm run format && pnpm run lint && pnpm run typecheck` +- `pnpm test` From 9b450d9ccc45c96c19528a4cc81dc769e937eeda Mon Sep 17 00:00:00 2001 From: zerob13 Date: Mon, 19 Jan 2026 18:08:36 +0800 Subject: [PATCH 2/6] feat(yo-browser): improve skill description and add yo_browser_cdp_send to offload whitelist --- resources/skills/yo-browser-cdp/SKILL.md | 219 ++++++++++++++++++ .../agentPresenter/loop/toolCallProcessor.ts | 3 +- .../loop/toolCallProcessor.test.ts | 8 +- 3 files changed, 224 insertions(+), 6 deletions(-) create mode 100644 resources/skills/yo-browser-cdp/SKILL.md diff --git a/resources/skills/yo-browser-cdp/SKILL.md b/resources/skills/yo-browser-cdp/SKILL.md new file mode 100644 index 000000000..bcfba53ec --- /dev/null +++ b/resources/skills/yo-browser-cdp/SKILL.md @@ -0,0 +1,219 @@ +--- +name: yo-browser-cdp +description: DeepChat's built-in YoBrowser capability, controlled via Chrome DevTools Protocol (CDP). Prefer this skill whenever the model needs web browsing, navigation, extraction, or interaction. +allowedTools: + - yo_browser_tab_list + - yo_browser_tab_new + - yo_browser_tab_activate + - yo_browser_tab_close + - yo_browser_cdp_send +--- + +# YoBrowser CDP Skill + +## Overview + +YoBrowser is DeepChat's built-in browser. This skill exposes browser capabilities by controlling YoBrowser through Chrome DevTools Protocol (CDP). + +Use this skill as the default choice when you need any web browsing or page-level interaction (navigate, read, extract, click, fill forms, verify UI behavior). It is especially suitable for tasks that require real browser behavior rather than simple HTTP fetching. + +## Mental Model + +The browser automation workflow follows this pattern: +1. **Tab Management**: Tabs are isolated browser instances. You can list, create, activate, and close tabs. +2. **Active Tab**: Each browser has one active tab that receives commands by default. +3. **CDP Session**: Each tab has a CDP session attached that accepts protocol commands. +4. **CDP Commands**: Send `{ method, params }` to the active tab's CDP session to perform operations. + +## Security Constraints + +**IMPORTANT**: Browser tabs with URLs starting with `local://` are **NOT** allowed to attach CDP sessions. This is a security boundary to prevent automation of DeepChat's internal UI pages. + +If you attempt to use CDP commands on a `local://` tab, the operation will fail with an error. + +## Recommended Workflow + +### Basic Navigation and Content Reading + +1. List current tabs: + ``` + yo_browser_tab_list() + ``` + +2. If no suitable tab exists, create a new one: + ``` + yo_browser_tab_new({ url: "https://example.com" }) + ``` + +3. Navigate to a target URL using CDP: + ``` + yo_browser_cdp_send({ + method: "Page.navigate", + params: { url: "https://example.com" } + }) + ``` + +4. Wait for page load (check navigation result): + - The `Page.navigate` response includes `result: { success: boolean }` + - If navigation is successful, proceed to content extraction + +5. Extract page content using CDP: + ``` + yo_browser_cdp_send({ + method: "Runtime.evaluate", + params: { + expression: "document.body.innerText", + returnByValue: true + } + }) + ``` + +### Advanced: DOM Interaction + +1. Find elements: + ``` + yo_browser_cdp_send({ + method: "Runtime.evaluate", + params: { + expression: "document.querySelector('.button').click()", + returnByValue: false + } + }) + ``` + +2. Wait for state changes: + ``` + yo_browser_cdp_send({ + method: "Runtime.evaluate", + params: { + expression: ` + new Promise(resolve => { + const check = () => { + if (document.querySelector('.loaded')) resolve(true); + else setTimeout(check, 100); + }; + check(); + }) + `, + awaitPromise: true + } + }) + ``` + +## Available Tools + +### `yo_browser_tab_list` +List all browser tabs and identify the active tab. + +**Returns**: JSON object with `activeTabId` and `tabs` (each tab has `id`, `url`, `title`, `isActive`). + +### `yo_browser_tab_new` +Create a new browser tab with an optional URL. + +**Parameters**: +- `url` (optional): Initial URL to navigate to + +**Returns**: New tab object with `id`, `url`, and `title`. + +### `yo_browser_tab_activate` +Make a specific tab the active tab. + +**Parameters**: +- `tabId`: ID of the tab to activate + +### `yo_browser_tab_close` +Close a specific browser tab. + +**Parameters**: +- `tabId`: ID of the tab to close + +### `yo_browser_cdp_send` +Send a CDP command to a tab. + +**Parameters**: +- `tabId` (optional): Target tab ID. If omitted, uses the active tab. +- `method`: CDP method name (e.g., `Page.navigate`, `Runtime.evaluate`) +- `params` (optional): Method parameters as an object + +**Returns**: CDP command response as JSON. + +## Common CDP Commands + +### Page Navigation +- `Page.navigate` - Navigate to a URL + - `params: { url: string }` + +### DOM Inspection +- `Runtime.evaluate` - Execute JavaScript in page context + - `params: { expression: string, returnByValue: boolean, awaitPromise: boolean }` +- `DOM.getDocument` - Get DOM tree +- `DOM.querySelector` - Find a DOM node + +### Page Interaction +- `Input.dispatchMouseEvent` - Simulate mouse events +- `Input.dispatchKeyEvent` - Simulate keyboard events +- `Runtime.evaluate` with `expression` containing `.click()` - Click elements + +## Error Handling + +Common errors and how to handle them: + +1. **Tab not found**: The specified `tabId` doesn't exist. Use `yo_browser_tab_list` to get valid tabs. + +2. **No active tab**: You tried to send a CDP command without specifying a tab ID, and no tab is active. Create or activate a tab first. + +3. **CDP attach rejected**: You attempted to attach CDP to a `local://` URL. This is a security restriction and cannot be bypassed. + +4. **Navigation failed**: `Page.navigate` returned `success: false`. The URL may be invalid or unreachable. Check the URL and try again. + +5. **Element not found**: Your DOM query returned null. Verify the selector is correct and the page has loaded completely. + +6. **Navigation timeout**: The page took too long to load. Consider adding retry logic or increasing timeout tolerance. + +7. **Tab destroyed**: The tab was closed during operations. List tabs again and recreate if needed. + +## Best Practices + +1. **Always list tabs first** to understand the current state before making assumptions. + +2. **Check navigation success** before proceeding with content extraction. + +3. **Use `awaitPromise: true`** when evaluating JavaScript that returns Promises. + +4. **Handle `local://` restrictions** by checking tab URLs before attempting CDP operations. + +5. **Close tabs when done** to free resources, especially if many tabs were created. + +6. **Use `returnByValue: true`** when you need the actual value from `Runtime.evaluate`, not a remote object reference. + +## Example Session + +```json +// 1. List tabs +{ "tool": "yo_browser_tab_list", "args": {} } + +// Response: { "activeTabId": "tab-1", "tabs": [{ "id": "tab-1", "url": "about:blank", "title": "", "isActive": true }] } + +// 2. Navigate to a page +{ "tool": "yo_browser_cdp_send", "args": { + "method": "Page.navigate", + "params": { "url": "https://example.com" } +}} + +// Response: { "result": { "success": true } } + +// 3. Extract page content +{ "tool": "yo_browser_cdp_send", "args": { + "method": "Runtime.evaluate", + "params": { + "expression": "document.title", + "returnByValue": true + } +}} + +// Response: { "result": { "type": "string", "value": "Example Domain" } } +``` + +## Deactivation + +When you're finished with browser operations, explicitly deactivate this skill to keep the tool set focused on the current task. diff --git a/src/main/presenter/agentPresenter/loop/toolCallProcessor.ts b/src/main/presenter/agentPresenter/loop/toolCallProcessor.ts index b36850a47..dddaf343b 100644 --- a/src/main/presenter/agentPresenter/loop/toolCallProcessor.ts +++ b/src/main/presenter/agentPresenter/loop/toolCallProcessor.ts @@ -52,8 +52,7 @@ const TOOLS_REQUIRING_OFFLOAD = new Set([ 'glob_search', 'grep_search', 'text_replace', - 'browser_read_links', - 'browser_get_clickable_elements' + 'yo_browser_cdp_send' ]) export class ToolCallProcessor { diff --git a/test/main/presenter/agentPresenter/loop/toolCallProcessor.test.ts b/test/main/presenter/agentPresenter/loop/toolCallProcessor.test.ts index a2a4168a0..ad4995e4b 100644 --- a/test/main/presenter/agentPresenter/loop/toolCallProcessor.test.ts +++ b/test/main/presenter/agentPresenter/loop/toolCallProcessor.test.ts @@ -18,8 +18,8 @@ describe('ToolCallProcessor tool output offload', () => { const toolDefinition = { type: 'function', function: { - name: 'mock_tool', - description: 'mock tool', + name: 'execute_command', + description: 'execute command', parameters: { type: 'object', properties: {} @@ -43,7 +43,7 @@ describe('ToolCallProcessor tool output offload', () => { }) it('offloads large tool responses and returns stub content', async () => { - const longOutput = 'x'.repeat(3001) + const longOutput = 'x'.repeat(5001) const rawData = { content: longOutput } as MCPToolResponse const processor = new ToolCallProcessor({ getAllToolDefinitions: async () => [toolDefinition], @@ -57,7 +57,7 @@ describe('ToolCallProcessor tool output offload', () => { const events: any[] = [] for await (const event of processor.process({ eventId: 'event-1', - toolCalls: [{ id: 'tool-1', name: 'mock_tool', arguments: '{}' }], + toolCalls: [{ id: 'tool-1', name: 'execute_command', arguments: '{}' }], enabledMcpTools: [], conversationMessages, modelConfig, From 6111bdc4ed4856fc8ed8344eab5b192f5a3a3c76 Mon Sep 17 00:00:00 2001 From: zerob13 Date: Mon, 19 Jan 2026 18:39:58 +0800 Subject: [PATCH 3/6] refactor(yobrowser): remove skill gating and make CDP tools always available in agent mode --- docs/architecture/tool-system.md | 37 ++- .../workspace-agent-refactoring-summary.md | 2 +- docs/specs/yobrowser-optimization/plan.md | 111 ++----- docs/specs/yobrowser-optimization/spec.md | 43 +-- docs/specs/yobrowser-optimization/tasks.md | 81 ++--- resources/skills/yo-browser-cdp/SKILL.md | 219 -------------- .../agentPresenter/acp/agentToolManager.ts | 45 ++- src/main/presenter/browser/BrowserTab.ts | 5 + .../presenter/browser/BrowserToolManager.ts | 103 ------- .../presenter/browser/YoBrowserPresenter.ts | 59 +--- .../browser/YoBrowserToolDefinitions.ts | 118 ++++++++ .../presenter/browser/YoBrowserToolHandler.ts | 110 +++++++ src/main/presenter/browser/tools/action.ts | 200 ------------- src/main/presenter/browser/tools/content.ts | 256 ---------------- src/main/presenter/browser/tools/download.ts | 89 ------ src/main/presenter/browser/tools/navigate.ts | 283 ------------------ .../presenter/browser/tools/screenshot.ts | 48 --- src/main/presenter/browser/tools/tabs.ts | 152 ---------- src/main/presenter/browser/tools/types.ts | 48 --- src/main/presenter/toolPresenter/index.ts | 1 - .../components/workspace/WorkspaceView.vue | 6 +- .../types/presenters/legacy.presenters.d.ts | 6 +- .../agentToolManagerSettings.test.ts | 10 +- 23 files changed, 367 insertions(+), 1665 deletions(-) delete mode 100644 resources/skills/yo-browser-cdp/SKILL.md delete mode 100644 src/main/presenter/browser/BrowserToolManager.ts create mode 100644 src/main/presenter/browser/YoBrowserToolDefinitions.ts create mode 100644 src/main/presenter/browser/YoBrowserToolHandler.ts delete mode 100644 src/main/presenter/browser/tools/action.ts delete mode 100644 src/main/presenter/browser/tools/content.ts delete mode 100644 src/main/presenter/browser/tools/download.ts delete mode 100644 src/main/presenter/browser/tools/navigate.ts delete mode 100644 src/main/presenter/browser/tools/screenshot.ts delete mode 100644 src/main/presenter/browser/tools/tabs.ts delete mode 100644 src/main/presenter/browser/tools/types.ts diff --git a/docs/architecture/tool-system.md b/docs/architecture/tool-system.md index 9403012da..f73c17213 100644 --- a/docs/architecture/tool-system.md +++ b/docs/architecture/tool-system.md @@ -29,7 +29,7 @@ graph TB AgentToolMgr[AgentToolManager] FsHandler[AgentFileSystemHandler] - Browser[Yo Browser Tools] + YoBrowser[Yo Browser CDP] end subgraph "外部服务" @@ -48,10 +48,10 @@ graph TB McpClient --> MCPServers AgentToolMgr --> FsHandler - AgentToolMgr --> Browser + AgentToolMgr --> YoBrowser FsHandler --> Files - Browser --> Web + YoBrowser --> Web classDef router fill:#e3f2fd classDef mcp fill:#fff3e0 @@ -60,7 +60,7 @@ graph TB class ToolP,Mapper router class McpP,ServerMgr,ToolMgr,McpClient mcp - class AgentToolMgr,FsHandler,Browser agent + class AgentToolMgr,FsHandler,YoBrowser agent class MCPServers,Files,Web external ``` @@ -622,23 +622,20 @@ class AgentFileSystemHandler { 3. **边界检查**:防止 `../` 越界访问 4. **正则验证**:`grep_search` 和 `text_replace` 使用 `validateRegexPattern` 防 ReDoS -### Browser 工具 +### YoBrowser CDP 工具 -```typescript -// 通过 Yo Browser Presenter 调用 -async callBrowserTool(toolName: string, args: any): Promise { - switch (toolName) { - case 'browser_navigate': - return await this.yoBrowserPresenter.navigate(args.url) - case 'browser_scrape': - return await this.yoBrowserPresenter.scrape(args.url) - case 'browser_screenshot': - return await this.yoBrowserPresenter.screenshot(args.url) - default: - throw new Error(`未知的 Browser 工具: ${toolName}`) - } -} -``` +YoBrowser 提供基于 Chrome DevTools Protocol (CDP) 的最小工具集,在 agent 模式下直接可用。 + +**可用工具**: +- `yo_browser_tab_list` - 列出所有浏览器 tabs +- `yo_browser_tab_new` - 创建新 tab +- `yo_browser_tab_activate` - 激活指定 tab +- `yo_browser_tab_close` - 关闭 tab +- `yo_browser_cdp_send` - 发送 CDP 命令 + +**安全约束**: +- `local://` URL 禁止 CDP attach(在 `BrowserTab.ensureSession()` 中检查) +- 所有 CDP 命令通过 `webContents.debugger.sendCommand()` 执行 ## 🔐 权限系统 diff --git a/docs/archives/workspace-agent-refactoring-summary.md b/docs/archives/workspace-agent-refactoring-summary.md index 0634495da..bc9d193e0 100644 --- a/docs/archives/workspace-agent-refactoring-summary.md +++ b/docs/archives/workspace-agent-refactoring-summary.md @@ -196,7 +196,7 @@ graph TB - MCP 工具:保持原始命名 - Agent FileSystem 工具:不加前缀(`read_file` 等) -- Yo Browser:保留 `browser_` 前缀 +- Yo Browser:使用 `yo_browser_` 前缀 ### 工具路由机制 diff --git a/docs/specs/yobrowser-optimization/plan.md b/docs/specs/yobrowser-optimization/plan.md index 26821eb84..37bfac326 100644 --- a/docs/specs/yobrowser-optimization/plan.md +++ b/docs/specs/yobrowser-optimization/plan.md @@ -4,16 +4,14 @@ - Renderer:`src/renderer/src/components/workspace/WorkspaceView.vue` 在 `agent` 模式下渲染 `WorkspaceBrowserTabs`,但不关心是否存在 tabs。 - Renderer:`src/renderer/src/stores/yoBrowser.ts` 已维护 tabs 与 `tabCount`(由 IPC 事件更新)。 -- Main:YoBrowser 通过 `BrowserToolManager` 导出大量 `browser_*` 工具定义,并在 `AgentToolManager` 中注入到 agent 工具集中。 -- Agent loop:`src/main/presenter/agentPresenter/loop/toolCallProcessor.ts` 对部分 `browser_*` 工具有输出 offload 白名单。 +- Main:YoBrowser 通过 `YoBrowserToolHandler` + `YoBrowserToolDefinitions` 暴露 `yo_browser_*` 工具,当前有 skill gating 逻辑(需要激活 `yo-browser-cdp` skill)。 +- Agent loop:`src/main/presenter/agentPresenter/loop/toolCallProcessor.ts` 中 `TOOLS_REQUIRING_OFFLOAD` 包含 `yo_browser_cdp_send`。 ## 总体设计 -按你的要求分成三块: - 1) UI:Browser Tabs 分区只在 `tabCount > 0` 时出现。 -2) 完全移除 `browser_*`:从 agent 工具列表/路由/实现/文档/测试中彻底清理。 -3) CDP Skill 化:新增 `yo-browser-cdp`,并提供最小工具面(CDP send + tab 管理),工具仅在 skill 激活时可用。 +2) YoBrowser 工具直接注入:agent 模式下直接提供 `yo_browser_*` 工具,不依赖 skills 体系。 +3) 工具实现保持 CDP 方式:`yo_browser_cdp_send` + tab 管理,参数 schema 按 CDP 定义。 > 约束:不做任何 system prompt / browser context 缩减。 @@ -31,94 +29,50 @@ --- -## 2) 完全移除 `browser_*` 工具体系 - -这里的“完全移除”需要同时覆盖: +## 2) YoBrowser 工具直接注入(agent 模式,不依赖 skills) -### 2.1 从 agent 工具注入与执行链路移除 - -- `src/main/presenter/agentPresenter/acp/agentToolManager.ts` - - 移除 `getAllToolDefinitions()` 中对 `yoBrowserPresenter.getToolDefinitions()` 的注入。 - - 移除 `callTool()` 中对 `browser_*`(例如 `toolName.startsWith('browser_')`)的路由分支。 +### 2.1 移除 tool definitions 的 skill gating -### 2.2 删除旧工具实现与依赖 +- `src/main/presenter/browser/YoBrowserToolHandler.ts` + - 删除 `getActiveSkills()` 方法或不再使用。 + - `getToolDefinitions()` 直接返回 `getYoBrowserToolDefinitions()`(不再受 `activeSkills` 控制)。 -- 删除/移除构建引用链: - - `src/main/presenter/browser/BrowserToolManager.ts` - - `src/main/presenter/browser/tools/**` -- 同步清理任何残留引用(全局搜索 `browser_` / `BrowserToolManager`)。 +### 2.2 同步更新 AgentToolManager 注入逻辑 -### 2.3 同步更新外围逻辑、文档、测试 - -- `src/main/presenter/agentPresenter/loop/toolCallProcessor.ts` - - 从 `TOOLS_REQUIRING_OFFLOAD` 中移除 `browser_*`(至少 `browser_read_links` / `browser_get_clickable_elements`)。 -- `docs/architecture/tool-system.md` - - 移除 `browser_*` 相关说明与示例,改为描述:YoBrowser 自动化由 `yo-browser-cdp` skill 触发 + `yo_browser_*` 最小工具面。 -- 测试 - - 更新/删除任何假设 `browser_*` 存在的测试(例如 tool definitions 断言)。 +- `src/main/presenter/agentPresenter/acp/agentToolManager.ts` + - `getAllToolDefinitions()` 中,在 agent 模式下直接追加 `yoBrowserPresenter.toolHandler.getToolDefinitions()`(不再传递/依赖 conversationId 做 gating)。 + - `callTool()` 中,`toolName.startsWith('yo_browser_')` 分支保持不变(继续路由到 YoBrowser handler)。 -风险: -- 删除 `browser_*` 后,若 repo 内仍存在引用会导致构建/运行失败;因此需要全局清理。 +### 2.3 移除 skill 文档与残留引用 -回滚: -- 按你的要求不保留旧实现,回滚需依赖 git 历史恢复。 +- 删除 `resources/skills/yo-browser-cdp/` 整个目录。 +- `docs/architecture/tool-system.md`: + - 删除或改写“YoBrowser CDP 工具仅在 `yo-browser-cdp` skill 激活时可用”的描述。 + - 改为:“YoBrowser CDP 工具在 agent 模式下直接可用”。 +- 全局搜索 `yo-browser-cdp` / `allowedTools` / `skill gated`,确保没有残留引用(代码、文档、测试)。 --- -## 3) CDP Skill 化:`yo-browser-cdp` + 最小工具面 - -### 3.1 Skill 目录与命名 +## 3) 工具实现:CDP 方式 + 参数定义(保持现状) -- 新增 skill:`resources/skills/yo-browser-cdp/SKILL.md`(名称 `yo-browser-cdp`)。 - -### 3.2 最小工具面(仅在 skill 激活时暴露) - -工具名以 `yo_browser_` 前缀区分旧 `browser_*`: +### 3.1 工具集合(无需改动) - `yo_browser_tab_list` - - 输出:tabs 列表 + active tabId。 - `yo_browser_tab_new` - - 输入:`{ url?: string }`;输出:新 tabId + url。 - `yo_browser_tab_activate` - - 输入:`{ tabId: string }`。 - `yo_browser_tab_close` - - 输入:`{ tabId: string }`。 - `yo_browser_cdp_send` - - 输入:`{ tabId?: string, method: string, params?: object }`。 - - 输出:`Debugger.sendCommand(method, params)` 的响应(推荐 JSON 文本)。 - -关键约束: -- 必须复用现有安全边界:当 tab URL 为 `local://` 时拒绝 CDP attach。 - - 推荐实现路径:通过 `BrowserTab.ensureSession()` 获取 session,再执行 `sendCommand`,避免绕过现有检查。 - -### 3.3 Skill gating 方案 - -- 目标:默认情况下 tool definitions **不包含** `yo_browser_*`。 -- 当且仅当 active skills 包含 `yo-browser-cdp` 时,才将 `yo_browser_*` 注入 tool definitions。 -- 建议复用现有 `deepchat-settings` 的 gating 模式: - - 在 `AgentToolManager.getAllToolDefinitions()` 中读取 `presenter.skillPresenter.getActiveSkills(conversationId)`,若包含 `yo-browser-cdp` 则追加 `yo_browser_*` tool definitions。 - -### 3.4 Skill 文档内容要点(SKILL.md) - -- 激活规则:仅当用户明确需要网页浏览/网页自动化/抓取验证时才激活;结束后及时 deactivate。 -- 心智模型:tab → active tab → CDP session → method/params。 -- 推荐 workflow: - - `yo_browser_tab_list`(确认 active tab) - - 必要时 `yo_browser_tab_new` / `yo_browser_tab_activate` - - `yo_browser_cdp_send` 执行: - - 页面导航:`Page.navigate` - - 读 DOM/内容:`Runtime.evaluate` - - 等待:`Runtime.evaluate` + 轮询/超时 -- 常见错误处理:element not found、导航超时、tab 被销毁、CDP attach 被拒绝(local://)。 - -### 3.5 测试策略 - -- Renderer - - 验证 `WorkspaceView`:tabCount=0 不显示;tabCount>0 显示。 -- Main - - tool definitions:默认无 `browser_*`,默认无 `yo_browser_*`。 - - 激活 `yo-browser-cdp` 后:出现 `yo_browser_*`。 - - 永久:`browser_*` 不再出现。 + +### 3.2 参数 schema(保持现状,无需改动) + +- `src/main/presenter/browser/YoBrowserToolDefinitions.ts`: + - `cdp_send` 参数:`{ tabId?: string, method: string, params?: object }`。 + - 其他 tab 管理工具参数保持不变。 + +### 3.3 安全边界(保持现状) + +- `src/main/presenter/browser/BrowserTab.ensureSession()`: + - 检查 `currentUrl.startsWith('local://')`,若为真则抛出错误(禁止 CDP attach)。 --- @@ -126,3 +80,4 @@ - system prompt / browser context 的缩减或重写。 - 任何对 YoBrowser UI 行为(窗口位置/大小等)的调整。 +- skills 体系(YoBrowser 不再使用 skills 来控制工具可见性)。 diff --git a/docs/specs/yobrowser-optimization/spec.md b/docs/specs/yobrowser-optimization/spec.md index 9950e2d1a..1180753e0 100644 --- a/docs/specs/yobrowser-optimization/spec.md +++ b/docs/specs/yobrowser-optimization/spec.md @@ -1,39 +1,31 @@ -# YoBrowser Optimization(UI + CDP Skill 化) +# YoBrowser Optimization(UI + CDP 工具) ## 背景 -当前 YoBrowser 在 Workspace 侧边栏与 agent 工具体系中存在两类问题: - -1) **UI**:`src/renderer/src/components/workspace/WorkspaceView.vue` 在 `agent` 模式下总会渲染 `WorkspaceBrowserTabs` 分区,即便没有任何 tab,也会出现一块空区域。 - -2) **工具体系**:main 侧通过 `BrowserToolManager` 暴露大量 `browser_*` 工具(navigate/action/content/tabs/download 等),你希望: -- **完全移除**这套 `browser_*` 工具(不暴露、不保留旧实现)。 -- 走“**完全 CDP**”路线:用 **skill** 教会模型如何操作 CDP,并只提供 **最小工具面**(优先 1 个通用 CDP send + 少量 tab 管理)。 - -> 重要约束:你明确要求 **不做 system prompt / browser context 的减少**,因此本需求不包含任何“缩减 prompt 注入”的工作。 +当前 YoBrowser 在 Workspace 侧边栏存在 UI 问题: +- `src/renderer/src/components/workspace/WorkspaceView.vue` 在 `agent` 模式下总会渲染 `WorkspaceBrowserTabs` 分区,即便没有任何 tab,也会出现一块空区域。 ## 目标(Goals) 1. **UI**:只有存在 YoBrowser tabs 时,Workspace 侧边栏才显示 Browser Tabs 分区。 -2. **完全移除 `browser_*`**:从 agent 工具列表、路由、实现代码、文档与测试中彻底移除 `browser_*` 体系。 -3. **CDP Skill 化**:新增 `yo-browser-cdp` skill(目录命名按约定),通过 skill 文档讲清 CDP 工作流;并提供最小工具面以 CDP 为核心。 +2. **Agent 工具直接注入**:YoBrowser 工具(`yo_browser_*`)在 agent 模式下直接可用,无需激活任何 skill。 ## 非目标(Non-Goals) - 不调整 YoBrowser window 的 UI、尺寸、布局、位置策略。 - 不修改 `BrowserContextBuilder.buildSystemPrompt` 的注入策略(不做减少/压缩/裁剪)。 - 不改造其他 agent 工具(filesystem/bash/mcp 等)。 +- 不使用 skills 系统来控制 YoBrowser 工具的可见性。 ## 用户故事(User Stories) - 作为用户,我不希望在没有任何浏览器 tab 的情况下,Workspace 侧边栏仍出现空的 Browser Tabs 分区。 -- 作为 agent 用户,我希望 YoBrowser 自动化能力以 CDP 为核心,并通过 skill 教会模型如何使用,而不是依赖大量高层封装工具。 -- 作为维护者,我希望彻底移除现有 `browser_*` 工具实现,减少长期维护面。 +- 作为 agent 用户,我希望 YoBrowser 自动化能力以 CDP 为核心,工具在 agent 模式下直接可用。 ## 约束与假设(Constraints & Assumptions) - YoBrowser 现有实现已经基于 Electron Debugger/CDP(`CDPManager`, `BrowserTab.ensureSession()`)。 -- 安全边界:`local://` URL 禁止绑定 CDP(`BrowserTab` 现有逻辑已做限制),skill 必须明确写出该约束。 +- 安全边界:`local://` URL 禁止绑定 CDP(`BrowserTab` 现有逻辑已做限制)。 ## 验收标准(Acceptance Criteria) @@ -42,27 +34,22 @@ - [ ] `src/renderer/src/components/workspace/WorkspaceView.vue` 仅在 `chatMode === 'agent' && yoBrowserStore.tabCount > 0` 时渲染 `WorkspaceBrowserTabs`。 - [ ] 当 `tabCount === 0` 时,不显示 Browser Tabs 分区(不保留空白区域)。 -### B. 工具:完全移除 `browser_*` 系列工具 +### B. 工具:YoBrowser CDP 工具直接注入(agent 模式) -- [ ] agent tool definitions 中不再出现任何 `browser_*` 工具。 -- [ ] agent 的 tool call 路由中不再接受/处理 `browser_*`(例如 `toolName.startsWith('browser_')` 这类分支被移除)。 -- [ ] 旧工具实现不再保留:`BrowserToolManager` 与 `src/main/presenter/browser/tools/**`(以及其他相关封装)被删除或完全移出构建引用链。 -- [ ] repo 内所有对 `browser_*` 的残留引用被清理/更新: - - `src/main/presenter/agentPresenter/loop/toolCallProcessor.ts` 的 offload 白名单不再包含 `browser_*`。 - - `docs/architecture/tool-system.md` 不再描述 `browser_*` 使用方式。 - - 相关测试不再假设 `browser_*` 存在。 +- [ ] agent tool definitions 中包含 `yo_browser_*` 工具(agent 模式下直接可用)。 +- [ ] agent 的 tool call 路由正确处理 `yo_browser_*` 工具(`toolName.startsWith('yo_browser_')`)。 +- [ ] 不依赖 skills 系统(不检查 `activeSkills`)。 -### C. Skill:新增 `yo-browser-cdp`(CDP 为核心 + 最小工具面) +### C. 工具实现:CDP 方式 + 合适的参数定义 -- [ ] 新增 skill:`resources/skills/yo-browser-cdp/SKILL.md`。 -- [ ] skill 明确写清:何时 activate/deactivate、tab/active tab、CDP session、`local://` 禁止 CDP、安全/失败处理。 -- [ ] `yo-browser-cdp` 激活后才暴露最小工具面(未激活时不可见/不可用)。 -- [ ] 最小工具面(名称为最终约定,后续实现需对齐): +- [ ] 工具集合: - `yo_browser_tab_list`:列出 tabs 与 active tab。 - `yo_browser_tab_new`:创建新 tab(可选 url)。 - `yo_browser_tab_activate`:激活 tab。 - `yo_browser_tab_close`:关闭 tab。 - `yo_browser_cdp_send`:向指定/当前 tab 的 CDP session 发送 `{ method, params }`。 +- [ ] 参数 schema 符合 CDP 使用方式(method、params 等)。 +- [ ] 保留安全边界:`local://` 禁止 CDP attach。 ### D. Prompt/Context diff --git a/docs/specs/yobrowser-optimization/tasks.md b/docs/specs/yobrowser-optimization/tasks.md index edf2d1418..25b972712 100644 --- a/docs/specs/yobrowser-optimization/tasks.md +++ b/docs/specs/yobrowser-optimization/tasks.md @@ -1,10 +1,8 @@ # YoBrowser Optimization:任务拆分(Tasks) -> 说明:本 tasks 用于后续落地实现的拆分;当前请求仅完善文档,因此这里只给出可执行顺序与验收点,不执行代码变更。 - ## Phase 1:UI(Workspace 侧边栏) -1. 调整 Browser Tabs 分区显示条件 +1. 调整调整 Browser Tabs 分区显示条件 - 文件:`src/renderer/src/components/workspace/WorkspaceView.vue` - 改动:`WorkspaceBrowserTabs` 仅在 `chatMode === 'agent' && yoBrowserStore.tabCount > 0` 时渲染。 - 验收:无 tabs 时不出现分区;有 tabs 时出现并能点击切换。 @@ -15,72 +13,49 @@ --- -## Phase 2:完全移除 `browser_*` 工具体系 +## Phase 2:移除 YoBrowser skill gating -3. 移除 agent 工具定义注入(browser_*) -- 文件:`src/main/presenter/agentPresenter/acp/agentToolManager.ts` -- 改动:删除 YoBrowser `getToolDefinitions()` 注入逻辑。 -- 验收:tool definitions 不再包含任何 `browser_*`。 +3. 移除 YoBrowser tool definitions 的 skill gating +- 文件:`src/main/presenter/browser/YoBrowserToolHandler.ts` +- 改动:删除 `getActiveSkills()` 方法或不再使用;`getToolDefinitions()` 直接返回 `getYoBrowserToolDefinitions()`。 +- 验收:不再依赖 `activeSkills`。 -4. 移除 agent 的 `browser_*` call 路由 +4. 调整 AgentToolManager 注入逻辑(不再依赖 conversationId 做 gating) - 文件:`src/main/presenter/agentPresenter/acp/agentToolManager.ts` -- 改动:删除/禁止 `toolName.startsWith('browser_')` 分支。 -- 验收:任何 `browser_*` tool call 都会被视为 unknown tool。 +- 改动:`getAllToolDefinitions()` 中,agent 模式下直接追加 `yoBrowserPresenter.toolHandler.getToolDefinitions()`(可不传 conversationId)。 +- 验收:tool definitions 包含 `yo_browser_*`。 -5. 清理 agent loop 对 `browser_*` 的特殊处理 -- 文件:`src/main/presenter/agentPresenter/loop/toolCallProcessor.ts` -- 改动:从 `TOOLS_REQUIRING_OFFLOAD` 移除所有 `browser_*`。 +5. 删除 skill 文档与残留引用 +- 删除 `resources/skills/yo-browser-cdp/` 整个目录。 +- 文件:`docs/architecture/tool-system.md`(以及搜索到的其他文档) +- 改动:删除或改写“仅在 `yo-browser-cdp` skill 激活时可用”的描述;改为“agent 模式下直接可用”。 +- 全局搜索:确认没有残留的 `yo-browser-cdp` / `skill gated` 引用。 -6. 删除旧工具实现代码 -- 目录/文件(预计): - - `src/main/presenter/browser/BrowserToolManager.ts` - - `src/main/presenter/browser/tools/**` -- 验收:repo 中不再保留这套封装工具实现,且无残留引用。 +--- -7. 更新文档与示例 -- 文件:`docs/architecture/tool-system.md`(以及搜索到的其他文档) -- 改动:移除/更新关于 `browser_*` 的示例与说明,改为 `yo-browser-cdp` skill + `yo_browser_*` 最小工具面。 +## Phase 3:验证工具实现(保持 CDP 方式) -8. 更新/修复测试 -- 文件:`test/main/**`(按失败点定位) -- 改动:移除任何假设 `browser_*` 存在的断言。 +6. 验证工具参数定义 +- 文件:`src/main/presenter/browser/YoBrowserToolDefinitions.ts` +- 验收:`yo_browser_cdp_send` 参数为 `{ tabId?: string, method: string, params?: object }`。 ---- +7. 验证安全边界 +- 文件:`src/main/presenter/browser/BrowserTab.ts` +- 验收:`ensureSession()` 中有 `local://` URL 检查。 -## Phase 3:新增 `yo-browser-cdp`(CDP skill + 最小工具面) - -9. 创建 skill 文档 -- 文件:`resources/skills/yo-browser-cdp/SKILL.md` -- 内容要求: - - 激活/关闭规则(仅在需要网页自动化时)。 - - tab/active tab/CDP session 概念。 - - 安全边界:`local://` 禁止 CDP。 - - 推荐工作流:list tabs → activate/new → `yo_browser_cdp_send(Page.navigate)` → `yo_browser_cdp_send(Runtime.evaluate)` → wait/retry。 - - 常见错误处理:element not found、navigation 超时、tab 被销毁等。 - -10. 实现最小 `yo_browser_*` 工具集,并做 skill gating -- 目标:工具只在 `yo-browser-cdp` 激活时可用(默认不可见)。 -- 工具集合(最终以实现为准): - - `yo_browser_tab_list` - - `yo_browser_tab_new` - - `yo_browser_tab_activate` - - `yo_browser_tab_close` - - `yo_browser_cdp_send` - -11.(可选)补 main 单测覆盖 gating +8.(可选)补 main 单测 - 验证: - - 默认无 `yo_browser_*`。 - - 激活 `yo-browser-cdp` 后出现 `yo_browser_*`。 - - `browser_*` 永久不存在。 + - agent 模式下 tool definitions 包含 `yo_browser_*`。 + - `callTool()` 正确路由到 YoBrowser handler。 --- ## Phase 4:验收与质量门禁 -12. 手工验收 +9. 手工验收 - Agent 模式下:无 tabs 时 Workspace 不显示 Browser Tabs;创建 tab 后显示。 -- 激活 `yo-browser-cdp`:模型可按 skill 指引使用 CDP 工具完成基本操作。 +- Agent 模式下:不激活任何 skill,`yo_browser_*` 工具直接可用。 -13. 质量门禁 +10. 质量门禁 - `pnpm run format && pnpm run lint && pnpm run typecheck` - `pnpm test` diff --git a/resources/skills/yo-browser-cdp/SKILL.md b/resources/skills/yo-browser-cdp/SKILL.md deleted file mode 100644 index bcfba53ec..000000000 --- a/resources/skills/yo-browser-cdp/SKILL.md +++ /dev/null @@ -1,219 +0,0 @@ ---- -name: yo-browser-cdp -description: DeepChat's built-in YoBrowser capability, controlled via Chrome DevTools Protocol (CDP). Prefer this skill whenever the model needs web browsing, navigation, extraction, or interaction. -allowedTools: - - yo_browser_tab_list - - yo_browser_tab_new - - yo_browser_tab_activate - - yo_browser_tab_close - - yo_browser_cdp_send ---- - -# YoBrowser CDP Skill - -## Overview - -YoBrowser is DeepChat's built-in browser. This skill exposes browser capabilities by controlling YoBrowser through Chrome DevTools Protocol (CDP). - -Use this skill as the default choice when you need any web browsing or page-level interaction (navigate, read, extract, click, fill forms, verify UI behavior). It is especially suitable for tasks that require real browser behavior rather than simple HTTP fetching. - -## Mental Model - -The browser automation workflow follows this pattern: -1. **Tab Management**: Tabs are isolated browser instances. You can list, create, activate, and close tabs. -2. **Active Tab**: Each browser has one active tab that receives commands by default. -3. **CDP Session**: Each tab has a CDP session attached that accepts protocol commands. -4. **CDP Commands**: Send `{ method, params }` to the active tab's CDP session to perform operations. - -## Security Constraints - -**IMPORTANT**: Browser tabs with URLs starting with `local://` are **NOT** allowed to attach CDP sessions. This is a security boundary to prevent automation of DeepChat's internal UI pages. - -If you attempt to use CDP commands on a `local://` tab, the operation will fail with an error. - -## Recommended Workflow - -### Basic Navigation and Content Reading - -1. List current tabs: - ``` - yo_browser_tab_list() - ``` - -2. If no suitable tab exists, create a new one: - ``` - yo_browser_tab_new({ url: "https://example.com" }) - ``` - -3. Navigate to a target URL using CDP: - ``` - yo_browser_cdp_send({ - method: "Page.navigate", - params: { url: "https://example.com" } - }) - ``` - -4. Wait for page load (check navigation result): - - The `Page.navigate` response includes `result: { success: boolean }` - - If navigation is successful, proceed to content extraction - -5. Extract page content using CDP: - ``` - yo_browser_cdp_send({ - method: "Runtime.evaluate", - params: { - expression: "document.body.innerText", - returnByValue: true - } - }) - ``` - -### Advanced: DOM Interaction - -1. Find elements: - ``` - yo_browser_cdp_send({ - method: "Runtime.evaluate", - params: { - expression: "document.querySelector('.button').click()", - returnByValue: false - } - }) - ``` - -2. Wait for state changes: - ``` - yo_browser_cdp_send({ - method: "Runtime.evaluate", - params: { - expression: ` - new Promise(resolve => { - const check = () => { - if (document.querySelector('.loaded')) resolve(true); - else setTimeout(check, 100); - }; - check(); - }) - `, - awaitPromise: true - } - }) - ``` - -## Available Tools - -### `yo_browser_tab_list` -List all browser tabs and identify the active tab. - -**Returns**: JSON object with `activeTabId` and `tabs` (each tab has `id`, `url`, `title`, `isActive`). - -### `yo_browser_tab_new` -Create a new browser tab with an optional URL. - -**Parameters**: -- `url` (optional): Initial URL to navigate to - -**Returns**: New tab object with `id`, `url`, and `title`. - -### `yo_browser_tab_activate` -Make a specific tab the active tab. - -**Parameters**: -- `tabId`: ID of the tab to activate - -### `yo_browser_tab_close` -Close a specific browser tab. - -**Parameters**: -- `tabId`: ID of the tab to close - -### `yo_browser_cdp_send` -Send a CDP command to a tab. - -**Parameters**: -- `tabId` (optional): Target tab ID. If omitted, uses the active tab. -- `method`: CDP method name (e.g., `Page.navigate`, `Runtime.evaluate`) -- `params` (optional): Method parameters as an object - -**Returns**: CDP command response as JSON. - -## Common CDP Commands - -### Page Navigation -- `Page.navigate` - Navigate to a URL - - `params: { url: string }` - -### DOM Inspection -- `Runtime.evaluate` - Execute JavaScript in page context - - `params: { expression: string, returnByValue: boolean, awaitPromise: boolean }` -- `DOM.getDocument` - Get DOM tree -- `DOM.querySelector` - Find a DOM node - -### Page Interaction -- `Input.dispatchMouseEvent` - Simulate mouse events -- `Input.dispatchKeyEvent` - Simulate keyboard events -- `Runtime.evaluate` with `expression` containing `.click()` - Click elements - -## Error Handling - -Common errors and how to handle them: - -1. **Tab not found**: The specified `tabId` doesn't exist. Use `yo_browser_tab_list` to get valid tabs. - -2. **No active tab**: You tried to send a CDP command without specifying a tab ID, and no tab is active. Create or activate a tab first. - -3. **CDP attach rejected**: You attempted to attach CDP to a `local://` URL. This is a security restriction and cannot be bypassed. - -4. **Navigation failed**: `Page.navigate` returned `success: false`. The URL may be invalid or unreachable. Check the URL and try again. - -5. **Element not found**: Your DOM query returned null. Verify the selector is correct and the page has loaded completely. - -6. **Navigation timeout**: The page took too long to load. Consider adding retry logic or increasing timeout tolerance. - -7. **Tab destroyed**: The tab was closed during operations. List tabs again and recreate if needed. - -## Best Practices - -1. **Always list tabs first** to understand the current state before making assumptions. - -2. **Check navigation success** before proceeding with content extraction. - -3. **Use `awaitPromise: true`** when evaluating JavaScript that returns Promises. - -4. **Handle `local://` restrictions** by checking tab URLs before attempting CDP operations. - -5. **Close tabs when done** to free resources, especially if many tabs were created. - -6. **Use `returnByValue: true`** when you need the actual value from `Runtime.evaluate`, not a remote object reference. - -## Example Session - -```json -// 1. List tabs -{ "tool": "yo_browser_tab_list", "args": {} } - -// Response: { "activeTabId": "tab-1", "tabs": [{ "id": "tab-1", "url": "about:blank", "title": "", "isActive": true }] } - -// 2. Navigate to a page -{ "tool": "yo_browser_cdp_send", "args": { - "method": "Page.navigate", - "params": { "url": "https://example.com" } -}} - -// Response: { "result": { "success": true } } - -// 3. Extract page content -{ "tool": "yo_browser_cdp_send", "args": { - "method": "Runtime.evaluate", - "params": { - "expression": "document.title", - "returnByValue": true - } -}} - -// Response: { "result": { "type": "string", "value": "Example Domain" } } -``` - -## Deactivation - -When you're finished with browser operations, explicitly deactivate this skill to keep the tool set focused on the current task. diff --git a/src/main/presenter/agentPresenter/acp/agentToolManager.ts b/src/main/presenter/agentPresenter/acp/agentToolManager.ts index 0438b613b..65051babf 100644 --- a/src/main/presenter/agentPresenter/acp/agentToolManager.ts +++ b/src/main/presenter/agentPresenter/acp/agentToolManager.ts @@ -1,4 +1,4 @@ -import type { IConfigPresenter, IYoBrowserPresenter, MCPToolDefinition } from '@shared/presenter' +import type { IConfigPresenter, MCPToolDefinition } from '@shared/presenter' import { zodToJsonSchema } from 'zod-to-json-schema' import { z } from 'zod' import fs from 'fs' @@ -52,14 +52,12 @@ export interface AgentToolCallResult { } interface AgentToolManagerOptions { - yoBrowserPresenter: IYoBrowserPresenter agentWorkspacePath: string | null configPresenter: IConfigPresenter commandPermissionHandler?: CommandPermissionService } export class AgentToolManager { - private readonly yoBrowserPresenter: IYoBrowserPresenter private agentWorkspacePath: string | null private fileSystemHandler: AgentFileSystemHandler | null = null private bashHandler: AgentBashHandler | null = null @@ -232,7 +230,6 @@ export class AgentToolManager { } constructor(options: AgentToolManagerOptions) { - this.yoBrowserPresenter = options.yoBrowserPresenter this.agentWorkspacePath = options.agentWorkspacePath this.configPresenter = options.configPresenter this.commandPermissionHandler = options.commandPermissionHandler @@ -275,17 +272,7 @@ export class AgentToolManager { this.agentWorkspacePath = effectiveWorkspacePath } - // 1. Yo Browser tools (agent mode only) - if (isAgentMode) { - try { - const yoDefs = await this.yoBrowserPresenter.getToolDefinitions(context.supportsVision) - defs.push(...yoDefs) - } catch (error) { - logger.warn('[AgentToolManager] Failed to load Yo Browser tool definitions', { error }) - } - } - - // 2. FileSystem tools (agent mode only) + // 1. FileSystem tools (agent mode only) if (isAgentMode && this.fileSystemHandler) { const fsDefs = this.getFileSystemToolDefinitions() defs.push(...fsDefs) @@ -324,6 +311,15 @@ export class AgentToolManager { } } + // 5. YoBrowser CDP tools (agent mode only) + if (isAgentMode) { + try { + defs.push(...presenter.yoBrowserPresenter.toolHandler.getToolDefinitions()) + } catch (error) { + logger.warn('[AgentToolManager] Failed to load YoBrowser tools', { error }) + } + } + return defs } @@ -335,17 +331,6 @@ export class AgentToolManager { args: Record, conversationId?: string ): Promise { - // Route to Yo Browser tools - if (toolName.startsWith('browser_')) { - const response = await this.yoBrowserPresenter.callTool( - toolName, - args as Record - ) - return { - content: typeof response === 'string' ? response : JSON.stringify(response) - } - } - // Route to FileSystem tools if (this.isFileSystemTool(toolName)) { if (!this.fileSystemHandler) { @@ -364,6 +349,14 @@ export class AgentToolManager { return await this.callChatSettingsTool(toolName, args, conversationId) } + // Route to YoBrowser CDP tools + if (toolName.startsWith('yo_browser_')) { + const response = await presenter.yoBrowserPresenter.toolHandler.callTool(toolName, args) + return { + content: response + } + } + throw new Error(`Unknown Agent tool: ${toolName}`) } diff --git a/src/main/presenter/browser/BrowserTab.ts b/src/main/presenter/browser/BrowserTab.ts index 11bcfcaea..348b7703e 100644 --- a/src/main/presenter/browser/BrowserTab.ts +++ b/src/main/presenter/browser/BrowserTab.ts @@ -64,6 +64,11 @@ export class BrowserTab { return await this.cdpManager.evaluateScript(session, script) } + async sendCdpCommand(method: string, params?: Record): Promise { + const session = await this.ensureSession() + return await session.sendCommand(method, params ?? {}) + } + async takeScreenshot(options?: ScreenshotOptions): Promise { await this.ensureSession() this.ensureAvailable() diff --git a/src/main/presenter/browser/BrowserToolManager.ts b/src/main/presenter/browser/BrowserToolManager.ts deleted file mode 100644 index 38f441e4d..000000000 --- a/src/main/presenter/browser/BrowserToolManager.ts +++ /dev/null @@ -1,103 +0,0 @@ -import type { RequestHandlerExtra } from '@modelcontextprotocol/sdk/shared/protocol.js' -import type { Notification, Request } from '@modelcontextprotocol/sdk/types.js' -import type { BrowserToolContext, BrowserToolDefinition } from './tools/types' -import { createNavigateTools } from './tools/navigate' -import { createActionTools } from './tools/action' -import { createContentTools } from './tools/content' -// import { createScreenshotTools } from './tools/screenshot' -import { createTabTools } from './tools/tabs' -import { createDownloadTools } from './tools/download' -import type { YoBrowserPresenter } from './YoBrowserPresenter' - -export class BrowserToolManager { - private readonly presenter: YoBrowserPresenter - private readonly tools: BrowserToolDefinition[] - - constructor(presenter: YoBrowserPresenter) { - this.presenter = presenter - this.tools = [ - ...createNavigateTools(), - ...createActionTools(), - ...createContentTools(), - // ...createScreenshotTools(), - ...createTabTools(), - ...createDownloadTools() - ] - } - - getToolDefinitions() { - return this.tools - } - - async executeTool( - toolName: string, - args: any, - extra?: RequestHandlerExtra - ) { - const tool = this.tools.find((t) => t.name === toolName) - if (!tool) { - return { - content: [{ type: 'text', text: `Unknown tool: ${toolName}` }], - isError: true - } - } - - const context = this.createContext() - return await tool.handler(args, context, extra || ({} as any)) - } - - private createContext(): BrowserToolContext { - return { - getTab: async (tabId?: string) => { - return await this.presenter.getBrowserTab(tabId) - }, - getActiveTab: async () => { - return await this.presenter.getBrowserTab() - }, - resolveTabId: async (args?: { tabId?: string }) => { - if (args?.tabId) { - return args.tabId - } - const active = await this.presenter.getActiveTab() - if (active) return active.id - const tabs = await this.presenter.listTabs() - if (tabs.length > 0) return tabs[0].id - const newTab = await this.presenter.createTab('about:blank') - if (!newTab) { - throw new Error('No available tab to operate on') - } - return newTab.id - }, - createTab: async (url?: string) => { - const tab = await this.presenter.createTab(url) - if (!tab) return null - return { id: tab.id, url: tab.url, title: tab.title || '' } - }, - listTabs: async () => { - const tabs = await this.presenter.listTabs() - const active = await this.presenter.getActiveTab() - return tabs.map((tab) => ({ - id: tab.id, - url: tab.url, - title: tab.title || '', - isActive: tab.id === active?.id - })) - }, - activateTab: async (tabId: string) => { - await this.presenter.activateTab(tabId) - }, - closeTab: async (tabId: string) => { - await this.presenter.closeTab(tabId) - }, - downloadFile: async (url: string, savePath?: string) => { - const download = await this.presenter.startDownload(url, savePath) - return { - id: download.id, - url: download.url, - filePath: download.filePath, - status: download.status - } - } - } - } -} diff --git a/src/main/presenter/browser/YoBrowserPresenter.ts b/src/main/presenter/browser/YoBrowserPresenter.ts index d87029b3f..5ffb46c4c 100644 --- a/src/main/presenter/browser/YoBrowserPresenter.ts +++ b/src/main/presenter/browser/YoBrowserPresenter.ts @@ -5,7 +5,6 @@ import { TAB_EVENTS, YO_BROWSER_EVENTS } from '@/events' import { BrowserTabInfo, BrowserContextSnapshot, ScreenshotOptions } from '@shared/types/browser' import { IYoBrowserPresenter, - MCPToolDefinition, DownloadInfo, IWindowPresenter, ITabPresenter @@ -14,9 +13,8 @@ import { BrowserTab } from './BrowserTab' import { CDPManager } from './CDPManager' import { ScreenshotManager } from './ScreenshotManager' import { DownloadManager } from './DownloadManager' -import { BrowserToolManager } from './BrowserToolManager' -import { zodToJsonSchema } from 'zod-to-json-schema' import { clearYoBrowserSessionData } from './yoBrowserSession' +import { YoBrowserToolHandler } from './YoBrowserToolHandler' export class YoBrowserPresenter implements IYoBrowserPresenter { private windowId: number | null = null @@ -28,14 +26,14 @@ export class YoBrowserPresenter implements IYoBrowserPresenter { private readonly cdpManager = new CDPManager() private readonly screenshotManager = new ScreenshotManager(this.cdpManager) private readonly downloadManager = new DownloadManager() - private readonly browserToolManager: BrowserToolManager private readonly windowPresenter: IWindowPresenter private readonly tabPresenter: ITabPresenter + readonly toolHandler: YoBrowserToolHandler constructor(windowPresenter: IWindowPresenter, tabPresenter: ITabPresenter) { this.windowPresenter = windowPresenter this.tabPresenter = tabPresenter - this.browserToolManager = new BrowserToolManager(this) + this.toolHandler = new YoBrowserToolHandler(this) eventBus.on(TAB_EVENTS.CLOSED, (tabId: number) => this.handleTabClosed(tabId)) } @@ -170,10 +168,6 @@ export class YoBrowserPresenter implements IYoBrowserPresenter { return this.toTabInfo(tab) } - async getBrowserTab(tabId?: string): Promise { - return await this.resolveTab(tabId) - } - async goBack(tabId?: string): Promise { const tab = await this.resolveTab(tabId) if (tab?.contents.canGoBack()) { @@ -316,49 +310,6 @@ export class YoBrowserPresenter implements IYoBrowserPresenter { return this.viewIdToTabId.get(viewId) ?? null } - async getToolDefinitions(_supportsVision: boolean): Promise { - // Only return browser_* tools from BrowserToolManager - const browserTools = this.browserToolManager.getToolDefinitions() - const browserMcpTools: MCPToolDefinition[] = browserTools.map((tool) => { - const jsonSchema = zodToJsonSchema(tool.schema) as { - type?: string - properties?: Record - required?: string[] - [key: string]: unknown - } - return { - type: 'function' as const, - function: { - name: tool.name, - description: tool.description, - parameters: { - type: 'object' as const, - properties: (jsonSchema.properties || {}) as Record, - required: (jsonSchema.required || []) as string[] - } - }, - server: { - name: 'yo-browser', - icons: '🌐', - description: 'DeepChat built-in Yo Browser' - } - } - }) - return browserMcpTools - } - - async callTool(toolName: string, params: Record): Promise { - const result = await this.browserToolManager.executeTool(toolName, params) - const textParts = result.content - .filter((c): c is { type: 'text'; text: string } => c.type === 'text') - .map((c) => c.text) - const textContent = textParts.join('\n\n') - if (result.isError) { - throw new Error(textContent || 'Tool execution failed') - } - return textContent - } - async captureScreenshot(tabId: string, options?: ScreenshotOptions): Promise { const tab = await this.resolveTab(tabId) if (!tab) { @@ -641,6 +592,10 @@ export class YoBrowserPresenter implements IYoBrowserPresenter { eventBus.sendToRenderer(YO_BROWSER_EVENTS.TAB_CLOSED, SendTarget.ALL_WINDOWS, tabId) } + async getBrowserTab(tabId?: string): Promise { + return await this.resolveTab(tabId) + } + private emitTabActivated(tabId: string) { eventBus.sendToRenderer(YO_BROWSER_EVENTS.TAB_ACTIVATED, SendTarget.ALL_WINDOWS, tabId) } diff --git a/src/main/presenter/browser/YoBrowserToolDefinitions.ts b/src/main/presenter/browser/YoBrowserToolDefinitions.ts new file mode 100644 index 000000000..c597a696d --- /dev/null +++ b/src/main/presenter/browser/YoBrowserToolDefinitions.ts @@ -0,0 +1,118 @@ +import { z } from 'zod' +import { zodToJsonSchema } from 'zod-to-json-schema' +import type { MCPToolDefinition } from '@shared/presenter' + +const yoBrowserSchemas = { + tab_list: z.object({}), + tab_new: z.object({ + url: z.string().url().optional().describe('Optional URL to navigate to when creating the tab') + }), + tab_activate: z.object({ + tabId: z.string().min(1).describe('ID of the tab to activate') + }), + tab_close: z.object({ + tabId: z.string().min(1).describe('ID of the tab to close') + }), + cdp_send: z.object({ + tabId: z.string().optional().describe('Optional tab ID. If omitted, uses the active tab'), + method: z + .string() + .min(1) + .describe('CDP method name (e.g., "Page.navigate", "Runtime.evaluate")'), + params: z + .record(z.unknown()) + .optional() + .describe('Optional parameters object for the CDP method') + }) +} + +export function getYoBrowserToolDefinitions(): MCPToolDefinition[] { + return [ + { + type: 'function', + function: { + name: 'yo_browser_tab_list', + description: 'List all browser tabs and identify the active tab', + parameters: zodToJsonSchema(yoBrowserSchemas.tab_list) as { + type: string + properties: Record + required?: string[] + } + }, + server: { + name: 'yobrowser', + icons: '🌐', + description: 'YoBrowser CDP automation' + } + }, + { + type: 'function', + function: { + name: 'yo_browser_tab_new', + description: 'Create a new browser tab with an optional URL', + parameters: zodToJsonSchema(yoBrowserSchemas.tab_new) as { + type: string + properties: Record + required?: string[] + } + }, + server: { + name: 'yobrowser', + icons: '🌐', + description: 'YoBrowser CDP automation' + } + }, + { + type: 'function', + function: { + name: 'yo_browser_tab_activate', + description: 'Make a specific tab the active tab', + parameters: zodToJsonSchema(yoBrowserSchemas.tab_activate) as { + type: string + properties: Record + required?: string[] + } + }, + server: { + name: 'yobrowser', + icons: '🌐', + description: 'YoBrowser CDP automation' + } + }, + { + type: 'function', + function: { + name: 'yo_browser_tab_close', + description: 'Close a specific browser tab', + parameters: zodToJsonSchema(yoBrowserSchemas.tab_close) as { + type: string + properties: Record + required?: string[] + } + }, + server: { + name: 'yobrowser', + icons: '🌐', + description: 'YoBrowser CDP automation' + } + }, + { + type: 'function', + function: { + name: 'yo_browser_cdp_send', + description: + 'Send a Chrome DevTools Protocol (CDP) command to a browser tab. Use this for navigation, content extraction, and DOM interaction', + parameters: zodToJsonSchema(yoBrowserSchemas.cdp_send) as { + type: string + properties: Record + required?: string[] + } + }, + server: { + name: 'yobrowser', + icons: '🌐', + description: 'YoBrowser CDP automation' + } + } + ] +} diff --git a/src/main/presenter/browser/YoBrowserToolHandler.ts b/src/main/presenter/browser/YoBrowserToolHandler.ts new file mode 100644 index 000000000..de0e30011 --- /dev/null +++ b/src/main/presenter/browser/YoBrowserToolHandler.ts @@ -0,0 +1,110 @@ +import logger from '@shared/logger' +import { getYoBrowserToolDefinitions } from './YoBrowserToolDefinitions' +import type { YoBrowserPresenter } from './YoBrowserPresenter' + +export class YoBrowserToolHandler { + private readonly presenter: YoBrowserPresenter + + constructor(presenter: YoBrowserPresenter) { + this.presenter = presenter + } + + getToolDefinitions(): any[] { + return getYoBrowserToolDefinitions() + } + + async callTool(toolName: string, args: Record): Promise { + try { + switch (toolName) { + case 'yo_browser_tab_list': { + return await this.handleTabList() + } + case 'yo_browser_tab_new': { + const url = typeof args.url === 'string' ? args.url : undefined + return await this.handleTabNew(url) + } + case 'yo_browser_tab_activate': { + const tabId = typeof args.tabId === 'string' ? args.tabId : '' + if (!tabId) { + throw new Error('tabId is required') + } + return await this.handleTabActivate(tabId) + } + case 'yo_browser_tab_close': { + const tabId = typeof args.tabId === 'string' ? args.tabId : '' + if (!tabId) { + throw new Error('tabId is required') + } + return await this.handleTabClose(tabId) + } + case 'yo_browser_cdp_send': { + const tabId = typeof args.tabId === 'string' ? args.tabId : undefined + const method = typeof args.method === 'string' ? args.method : '' + const params = + typeof args.params === 'object' && args.params !== null + ? (args.params as Record) + : {} + return await this.handleCdpSend(tabId, method, params) + } + default: + throw new Error(`Unknown YoBrowser tool: ${toolName}`) + } + } catch (error) { + logger.error('[YoBrowserToolHandler] Tool execution failed', { toolName, error }) + throw error + } + } + + private async handleTabList(): Promise { + const tabs = await this.presenter.listTabs() + const activeTab = await this.presenter.getActiveTab() + return JSON.stringify({ + activeTabId: activeTab?.id ?? null, + tabs: tabs.map((tab: any) => ({ + id: tab.id, + url: tab.url, + title: tab.title, + isActive: tab.id === activeTab?.id + })) + }) + } + + private async handleTabNew(url?: string): Promise { + const tab = await this.presenter.createTab(url) + if (!tab) { + throw new Error('Failed to create new tab') + } + return JSON.stringify({ + id: tab.id, + url: tab.url, + title: tab.title + }) + } + + private async handleTabActivate(tabId: string): Promise { + await this.presenter.activateTab(tabId) + return JSON.stringify({ success: true, tabId }) + } + + private async handleTabClose(tabId: string): Promise { + await this.presenter.closeTab(tabId) + return JSON.stringify({ success: true, tabId }) + } + + private async handleCdpSend( + tabId: string | undefined, + method: string, + params: Record + ): Promise { + if (!method) { + throw new Error('CDP method is required') + } + const browserTab = await this.presenter.getBrowserTab(tabId) + if (!browserTab) { + throw new Error(tabId ? `Tab ${tabId} not found` : 'No active tab available') + } + + const response = await browserTab.sendCdpCommand(method, params) + return JSON.stringify(response ?? {}) + } +} diff --git a/src/main/presenter/browser/tools/action.ts b/src/main/presenter/browser/tools/action.ts deleted file mode 100644 index 698f0d1bf..000000000 --- a/src/main/presenter/browser/tools/action.ts +++ /dev/null @@ -1,200 +0,0 @@ -import { z } from 'zod' -import type { BrowserToolDefinition } from './types' - -const BaseArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)') -}) - -const SelectorSchema = BaseArgsSchema.extend({ - selector: z.string().min(1).describe('CSS selector of the target element') -}) - -const ClickArgsSchema = SelectorSchema -const HoverArgsSchema = SelectorSchema - -const FormInputArgsSchema = SelectorSchema.extend({ - value: z.string().describe('Value to fill into the element'), - append: z - .boolean() - .optional() - .default(false) - .describe('Append to existing value instead of replacing') -}) - -const SelectArgsSchema = SelectorSchema.extend({ - value: z.union([z.string(), z.array(z.string())]).describe('Value or values to select') -}) - -const ScrollArgsSchema = BaseArgsSchema.extend({ - x: z.number().optional().default(0).describe('Horizontal scroll distance'), - y: z.number().optional().default(500).describe('Vertical scroll distance'), - behavior: z.enum(['auto', 'smooth']).optional().default('auto').describe('Scroll behavior') -}) - -const PressKeyArgsSchema = BaseArgsSchema.extend({ - key: z.string().min(1).describe('Key to press'), - count: z.number().int().min(1).optional().default(1).describe('Number of times to press the key') -}) - -export function createActionTools(): BrowserToolDefinition[] { - return [ - { - name: 'browser_click', - description: 'Click an element on the page using a CSS selector.', - schema: ClickArgsSchema, - handler: async (args, context) => { - const parsed = ClickArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - await tab.waitForSelector(parsed.selector, { timeout: 5000 }) - await tab.click(parsed.selector) - return { - content: [ - { - type: 'text', - text: `Clicked element ${parsed.selector}` - } - ] - } - } - }, - { - name: 'browser_hover', - description: 'Hover over an element on the page.', - schema: HoverArgsSchema, - handler: async (args, context) => { - const parsed = HoverArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - await tab.waitForSelector(parsed.selector, { timeout: 5000 }) - await tab.hover(parsed.selector) - return { - content: [ - { - type: 'text', - text: `Hovered over ${parsed.selector}` - } - ] - } - } - }, - { - name: 'browser_form_input_fill', - description: 'Fill text into an input or textarea element.', - schema: FormInputArgsSchema, - handler: async (args, context) => { - const parsed = FormInputArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - await tab.waitForSelector(parsed.selector, { timeout: 5000 }) - await tab.fill(parsed.selector, parsed.value, parsed.append) - return { - content: [ - { - type: 'text', - text: `Filled ${parsed.selector} with value` - } - ] - } - } - }, - { - name: 'browser_select', - description: 'Select value(s) within a select element.', - schema: SelectArgsSchema, - handler: async (args, context) => { - const parsed = SelectArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - await tab.waitForSelector(parsed.selector, { timeout: 5000 }) - await tab.select(parsed.selector, parsed.value) - return { - content: [ - { - type: 'text', - text: `Updated selection for ${parsed.selector}` - } - ] - } - } - }, - { - name: 'browser_scroll', - description: 'Scroll the page by specified offsets.', - schema: ScrollArgsSchema, - handler: async (args, context) => { - const parsed = ScrollArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - await tab.scroll({ - x: parsed.x, - y: parsed.y, - behavior: parsed.behavior - }) - return { - content: [ - { - type: 'text', - text: `Scrolled by x=${parsed.x}, y=${parsed.y}` - } - ] - } - } - }, - { - name: 'browser_press_key', - description: 'Send keyboard input to the active page.', - schema: PressKeyArgsSchema, - handler: async (args, context) => { - const parsed = PressKeyArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - await tab.pressKey(parsed.key, parsed.count) - return { - content: [ - { - type: 'text', - text: `Pressed key "${parsed.key}" ${parsed.count} time(s)` - } - ] - } - } - } - ] -} diff --git a/src/main/presenter/browser/tools/content.ts b/src/main/presenter/browser/tools/content.ts deleted file mode 100644 index cd10a692b..000000000 --- a/src/main/presenter/browser/tools/content.ts +++ /dev/null @@ -1,256 +0,0 @@ -import { z } from 'zod' -import TurndownService from 'turndown' -import type { BrowserToolDefinition } from './types' - -const BaseArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)') -}) - -const SelectorArgsSchema = BaseArgsSchema.extend({ - selector: z.string().optional().describe('Optional CSS selector to scope extraction') -}) - -const ContentArgsSchema = SelectorArgsSchema.extend({ - offset: z - .number() - .int() - .min(0) - .optional() - .default(0) - .describe('Character offset from which to start reading (0 = beginning)'), - limit: z - .number() - .int() - .min(1) - .max(16000) - .optional() - .default(4000) - .describe('Maximum number of characters to return from the offset') -}) - -const LinksArgsSchema = BaseArgsSchema.extend({ - limit: z.number().int().min(1).max(200).optional().default(50).describe('Maximum links to return') -}) - -const ClickableArgsSchema = BaseArgsSchema.extend({ - limit: z - .number() - .int() - .min(1) - .max(200) - .optional() - .default(50) - .describe('Maximum clickable elements to return') -}) - -const turndown = new TurndownService({ - headingStyle: 'atx' -}) - -const DEFAULT_TEXT_LIMIT = 4000 -const MAX_TEXT_LIMIT = 16000 - -const paginateText = ( - fullText: string | undefined, - offset?: number, - limit?: number -): { slice: string; meta?: string } => { - const text = fullText || '' - const length = text.length - const safeOffset = Math.max(0, offset ?? 0) - const safeLimit = Math.min(MAX_TEXT_LIMIT, Math.max(1, limit ?? DEFAULT_TEXT_LIMIT)) - - if (!text) { - return { slice: '', meta: undefined } - } - - if (safeOffset >= length) { - return { - slice: '', - meta: `Offset ${safeOffset} is beyond content length ${length}. No content returned.` - } - } - - const end = Math.min(length, safeOffset + safeLimit) - const slice = text.slice(safeOffset, end) - const remaining = length - end - - if (remaining <= 0) { - return { slice, meta: undefined } - } - - const nextOffset = end - const meta = `Content length: ${length} characters. Returned range: [${safeOffset}, ${end}) (${slice.length} characters). Remaining: ${remaining} characters. To continue reading, call this tool again with offset=${nextOffset}.` - - return { slice, meta } -} - -export function createContentTools(): BrowserToolDefinition[] { - return [ - { - name: 'browser_get_text', - description: - 'Extract visible text from the page or a specific element. Supports offset/limit pagination to avoid overly long outputs.', - schema: ContentArgsSchema, - handler: async (args, context) => { - const parsed = ContentArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - - const text = await tab.getInnerText(parsed.selector) - const { slice, meta } = paginateText(text, parsed.offset, parsed.limit) - - const content: { type: 'text'; text: string }[] = [] - - if (!slice && !meta) { - content.push({ type: 'text', text: '(no text found)' }) - } else { - if (meta) { - content.push({ - type: 'text', - text: `[pagination] ${meta}` - }) - } - if (slice) { - content.push({ - type: 'text', - text: slice - }) - } - } - - return { content } - }, - annotations: { - readOnlyHint: true - } - }, - { - name: 'browser_get_markdown', - description: - 'Extract the page content as Markdown. Supports offset/limit pagination to avoid overly long outputs.', - schema: ContentArgsSchema, - handler: async (args, context) => { - const parsed = ContentArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - - await tab.waitForNetworkIdle() - const html = await tab.getHtml(parsed.selector) - const markdown = html ? turndown.turndown(html) : '' - const { slice, meta } = paginateText(markdown, parsed.offset, parsed.limit) - - const content: { type: 'text'; text: string }[] = [] - - if (!slice && !meta) { - content.push({ type: 'text', text: '(no content found)' }) - } else { - if (meta) { - content.push({ - type: 'text', - text: `[pagination] ${meta}` - }) - } - if (slice) { - content.push({ - type: 'text', - text: slice - }) - } - } - - return { content } - }, - annotations: { - readOnlyHint: true - } - }, - { - name: 'browser_read_links', - description: 'List hyperlinks on the current page.', - schema: LinksArgsSchema, - handler: async (args, context) => { - const parsed = LinksArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - - const links = await tab.getLinks(parsed.limit) - const formatted = - links.length === 0 - ? 'No links found.' - : links - .map((link, index) => `${index + 1}. ${link.text || '(no text)'} -> ${link.href}`) - .join('\n') - - return { - content: [ - { - type: 'text', - text: formatted - } - ] - } - }, - annotations: { - readOnlyHint: true - } - }, - { - name: 'browser_get_clickable_elements', - description: 'List clickable elements with simple selectors.', - schema: ClickableArgsSchema, - handler: async (args, context) => { - const parsed = ClickableArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - - const elements = await tab.getClickableElements(parsed.limit) - const formatted = - elements.length === 0 - ? 'No clickable elements found.' - : elements - .map( - (element, index) => - `${index + 1}. [${element.tag}] ${element.text || element.ariaLabel || '(no text)'} -> ${element.selector}` - ) - .join('\n') - - return { - content: [ - { - type: 'text', - text: formatted - } - ] - } - }, - annotations: { - readOnlyHint: true - } - } - ] -} diff --git a/src/main/presenter/browser/tools/download.ts b/src/main/presenter/browser/tools/download.ts deleted file mode 100644 index 0a0fa0fa5..000000000 --- a/src/main/presenter/browser/tools/download.ts +++ /dev/null @@ -1,89 +0,0 @@ -import { z } from 'zod' -import type { BrowserToolDefinition } from './types' - -const DownloadListArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)') -}) - -const DownloadFileArgsSchema = z.object({ - url: z.string().url().describe('File URL to download'), - savePath: z.string().optional().describe('Optional file path to save as'), - tabId: z - .string() - .optional() - .describe('Tab identifier to use for download context (defaults to active tab)') -}) - -export function createDownloadTools(): BrowserToolDefinition[] { - return [ - { - name: 'browser_get_download_list', - description: 'Get download items for the current browser session.', - schema: DownloadListArgsSchema, - handler: async () => { - // Note: Download list functionality needs to be implemented in YoBrowserPresenter - // For now, return empty list - const downloads: Array<{ - filename: string - state: string - receivedBytes: number - totalBytes: number - url: string - }> = [] - const formatted = - downloads.length === 0 - ? 'No downloads yet.' - : downloads - .map( - (item) => - `- ${item.filename} [${item.state}] ${item.receivedBytes}/${item.totalBytes} bytes (${item.url})` - ) - .join('\n') - - return { - content: [ - { - type: 'text', - text: formatted - } - ] - } - } - }, - { - name: 'browser_download_file', - description: - 'Download a file using the browser session, preserving cookies of the active tab.', - schema: DownloadFileArgsSchema, - handler: async (args, context) => { - const parsed = DownloadFileArgsSchema.parse(args) - if (!context.downloadFile) { - return { - content: [{ type: 'text', text: 'Download functionality not available' }], - isError: true - } - } - - try { - const download = await context.downloadFile(parsed.url, parsed.savePath) - return { - content: [ - { - type: 'text', - text: `Download started: ${parsed.url}\nStatus: ${download.status}\nID: ${download.id}${ - download.filePath ? `\nSave path: ${download.filePath}` : '' - }` - } - ] - } - } catch (error) { - const errorMsg = error instanceof Error ? error.message : String(error) - return { - content: [{ type: 'text', text: `Download failed: ${errorMsg}` }], - isError: true - } - } - } - } - ] -} diff --git a/src/main/presenter/browser/tools/navigate.ts b/src/main/presenter/browser/tools/navigate.ts deleted file mode 100644 index 2d6e95938..000000000 --- a/src/main/presenter/browser/tools/navigate.ts +++ /dev/null @@ -1,283 +0,0 @@ -import { z } from 'zod' -import type { BrowserToolDefinition, ToolResult } from './types' - -const NavigateArgsSchema = z.object({ - url: z.string().url().describe('URL to navigate to'), - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)'), - newTab: z.boolean().optional().default(false).describe('Open navigation in a new tab'), - reuse: z - .boolean() - .optional() - .default(true) - .describe('Reuse an existing tab that matches the domain when true') -}) - -const NavigationOnlyArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)') -}) - -export function createNavigateTools(): BrowserToolDefinition[] { - return [ - { - name: 'browser_navigate', - description: 'Navigate the browser to the specified URL.', - schema: NavigateArgsSchema, - handler: async (args, context) => { - const parsed = NavigateArgsSchema.parse(args) - - // Handle new tab creation - if (parsed.newTab) { - if (!context.createTab) { - return { - content: [{ type: 'text', text: 'Tab creation not available' }], - isError: true - } - } - const newTab = await context.createTab(parsed.url) - if (!newTab) { - return { - content: [{ type: 'text', text: 'Failed to create new tab' }], - isError: true - } - } - return { - content: [ - { - type: 'text', - text: `Opened new tab ${newTab.id} -> ${parsed.url}\nTitle: ${newTab.title || 'unknown'}` - } - ] - } - } - - // Handle reuse logic - if (parsed.reuse && !parsed.tabId) { - // Try to find a reusable tab by domain - const tabs = await context.listTabs?.() - if (tabs && tabs.length > 0) { - try { - const targetHost = new URL(parsed.url).hostname - const reusableTab = tabs.find((t) => { - try { - return new URL(t.url).hostname === targetHost - } catch { - return false - } - }) - if (reusableTab) { - const tab = await context.getTab(reusableTab.id) - if (tab) { - await tab.navigate(parsed.url) - await context.activateTab?.(reusableTab.id) - return { - content: [ - { - type: 'text', - text: `Reused tab ${reusableTab.id} and navigated to ${parsed.url}\nTitle: ${tab.title || 'unknown'}` - } - ] - } - } - } - } catch { - // Ignore URL parse errors, fall through to normal navigation - } - } - } - - // Normal navigation - let tab = parsed.tabId ? await context.getTab(parsed.tabId) : await context.getActiveTab() - - if (!tab) { - // Create a new tab if none exists - if (context.createTab) { - const newTab = await context.createTab(parsed.url) - if (newTab) { - // Add a small delay to ensure BrowserTab is fully initialized - // This is especially important on first call when browser window is just created - await new Promise((resolve) => setTimeout(resolve, 100)) - // Get the BrowserTab object and wait for navigation to complete - // Note: createTab already started navigation via tabPresenter.createTab, - // so we just need to wait for it to complete - const browserTab = await context.getTab(newTab.id) - if (browserTab) { - try { - // createTab already started navigation via tabPresenter.createTab - // If tab is loading, wait for it to complete instead of calling navigate again - if (browserTab.contents.isLoading()) { - // Wait for current navigation to complete - await new Promise((resolve, reject) => { - let timeout: ReturnType - let onStopLoading: () => void - let onFailLoad: ( - _event: unknown, - errorCode: number, - errorDescription: string - ) => void - - const cleanup = () => { - clearTimeout(timeout) - browserTab.contents.removeListener('did-stop-loading', onStopLoading) - browserTab.contents.removeListener('did-fail-load', onFailLoad) - } - - onStopLoading = () => { - cleanup() - resolve() - } - - onFailLoad = (_event, errorCode, errorDescription) => { - cleanup() - reject(new Error(`Navigation failed ${errorCode}: ${errorDescription}`)) - } - - timeout = setTimeout(() => { - cleanup() - reject(new Error('Timeout waiting for page load')) - }, 15000) - - browserTab.contents.once('did-stop-loading', onStopLoading) - browserTab.contents.once('did-fail-load', onFailLoad) - }) - - // Check if URL matches after loading - const finalUrl = browserTab.contents.getURL() - if (finalUrl !== parsed.url) { - // URL doesn't match, need to navigate - await browserTab.navigate(parsed.url, 15000) // 15 second timeout - } - } else { - // Tab is not loading, check if URL matches - const currentUrl = browserTab.contents.getURL() - if (currentUrl !== parsed.url) { - // URL doesn't match, need to navigate - await browserTab.navigate(parsed.url, 15000) // 15 second timeout - } - } - - const result: ToolResult = { - content: [ - { - type: 'text' as const, - text: `Created new tab and navigated to ${parsed.url}\nTitle: ${browserTab.title || 'unknown'}` - } - ] - } - return result - } catch (error) { - console.error('[browser_navigate] Failed to navigate newly created tab:', error) - const errorMessage = error instanceof Error ? error.message : String(error) - const result: ToolResult = { - content: [ - { - type: 'text' as const, - text: `Failed to navigate new tab ${browserTab.tabId} to ${parsed.url}\nError: ${errorMessage}\nTitle: ${browserTab.title || 'unknown'}` - } - ], - isError: true - } - return result - } - } - // Fallback if getTab fails - const result: ToolResult = { - content: [ - { - type: 'text' as const, - text: `Created new tab and navigated to ${parsed.url}\nTitle: ${newTab.title || 'unknown'}` - } - ] - } - return result - } - } - const errorResult: ToolResult = { - content: [ - { - type: 'text' as const, - text: 'No active tab available' - } - ], - isError: true - } - return errorResult - } - - await tab.navigate(parsed.url) - const result: ToolResult = { - content: [ - { - type: 'text' as const, - text: `Navigated to ${parsed.url}\nTitle: ${tab.title || 'unknown'}` - } - ] - } - return result - } - }, - { - name: 'browser_go_back', - description: 'Go back to the previous page in the current tab.', - schema: NavigationOnlyArgsSchema, - handler: async (args, context) => { - const parsed = NavigationOnlyArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - - if (!tab) { - return { - content: [ - { - type: 'text', - text: `Tab ${tabId} not found` - } - ], - isError: true - } - } - - await tab.goBack() - return { - content: [ - { - type: 'text', - text: `Went back. Current URL: ${tab.url || 'about:blank'}` - } - ] - } - } - }, - { - name: 'browser_go_forward', - description: 'Go forward to the next page in the current tab.', - schema: NavigationOnlyArgsSchema, - handler: async (args, context) => { - const parsed = NavigationOnlyArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - - if (!tab) { - return { - content: [ - { - type: 'text', - text: `Tab ${tabId} not found` - } - ], - isError: true - } - } - - await tab.goForward() - return { - content: [ - { - type: 'text', - text: `Went forward. Current URL: ${tab.url || 'about:blank'}` - } - ] - } - } - } - ] -} diff --git a/src/main/presenter/browser/tools/screenshot.ts b/src/main/presenter/browser/tools/screenshot.ts deleted file mode 100644 index a4de7c1b6..000000000 --- a/src/main/presenter/browser/tools/screenshot.ts +++ /dev/null @@ -1,48 +0,0 @@ -import { z } from 'zod' -import type { BrowserToolDefinition } from './types' - -const ScreenshotArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)'), - selector: z.string().optional().describe('Capture only the element matching this selector'), - fullPage: z.boolean().optional().default(false).describe('Capture the full page'), - highlightSelectors: z - .array(z.string()) - .optional() - .describe('Selectors to highlight before capture') -}) - -export function createScreenshotTools(): BrowserToolDefinition[] { - return [ - { - name: 'browser_screenshot', - description: 'Capture a screenshot of the current page or a specific element.', - schema: ScreenshotArgsSchema, - handler: async (args, context) => { - const parsed = ScreenshotArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - - const base64 = await tab.takeScreenshot({ - selector: parsed.selector, - fullPage: parsed.fullPage, - highlightSelectors: parsed.highlightSelectors - }) - - return { - content: [ - { - type: 'text', - text: `data:image/png;base64,${base64}` - } - ] - } - } - } - ] -} diff --git a/src/main/presenter/browser/tools/tabs.ts b/src/main/presenter/browser/tools/tabs.ts deleted file mode 100644 index 5cdec59af..000000000 --- a/src/main/presenter/browser/tools/tabs.ts +++ /dev/null @@ -1,152 +0,0 @@ -import { z } from 'zod' -import type { BrowserToolDefinition } from './types' - -const BaseArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)') -}) - -const NewTabArgsSchema = z.object({ - url: z.string().url().optional().describe('Optional URL to open in the new tab') -}) - -const SwitchTabArgsSchema = z.object({ - tabId: z.string().describe('Tab identifier to activate') -}) - -const CloseTabArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier to close (defaults to active tab)') -}) - -export function createTabTools(): BrowserToolDefinition[] { - return [ - { - name: 'browser_new_tab', - description: 'Open a new browser tab (window) for the session.', - schema: NewTabArgsSchema, - handler: async (args, context) => { - const parsed = NewTabArgsSchema.parse(args) - if (!context.createTab) { - return { - content: [{ type: 'text', text: 'Tab creation not available' }], - isError: true - } - } - const tab = await context.createTab(parsed.url) - if (!tab) { - return { - content: [{ type: 'text', text: 'Failed to create new tab' }], - isError: true - } - } - return { - content: [ - { - type: 'text', - text: `Opened new tab ${tab.id}${parsed.url ? ` -> ${parsed.url}` : ''}` - } - ] - } - } - }, - { - name: 'browser_tab_list', - description: 'List all tabs (windows) for the current session.', - schema: BaseArgsSchema, - handler: async (_args, context) => { - if (!context.listTabs) { - return { - content: [{ type: 'text', text: 'Tab list not available' }], - isError: true - } - } - const tabs = await context.listTabs() - const formatted = - tabs.length === 0 - ? 'No tabs open.' - : tabs - .map( - (tab) => - `${tab.isActive ? '*' : ' '} Tab ${tab.id}: ${tab.title || 'Untitled'} (${tab.url || 'about:blank'})` - ) - .join('\n') - - return { - content: [ - { - type: 'text', - text: formatted - } - ] - } - } - }, - { - name: 'browser_switch_tab', - description: 'Activate a specific tab (window) by its id.', - schema: SwitchTabArgsSchema, - handler: async (args, context) => { - const parsed = SwitchTabArgsSchema.parse(args) - if (!context.activateTab) { - return { - content: [{ type: 'text', text: 'Tab activation not available' }], - isError: true - } - } - const tab = await context.getTab(parsed.tabId) - if (!tab) { - return { - content: [ - { - type: 'text', - text: `Tab ${parsed.tabId} not found` - } - ], - isError: true - } - } - - await context.activateTab(parsed.tabId) - return { - content: [ - { - type: 'text', - text: `Switched to tab ${parsed.tabId}: ${tab.title || 'Untitled'}` - } - ] - } - } - }, - { - name: 'browser_close_tab', - description: 'Close a tab (window). Defaults to the active tab.', - schema: CloseTabArgsSchema, - handler: async (args, context) => { - const parsed = CloseTabArgsSchema.parse(args) - if (!context.closeTab) { - return { - content: [{ type: 'text', text: 'Tab closing not available' }], - isError: true - } - } - const tabId = parsed.tabId - ? parsed.tabId - : (await context.getActiveTab())?.tabId || (await context.resolveTabId(undefined)) - if (!tabId) { - return { - content: [{ type: 'text', text: 'No active tab to close' }], - isError: true - } - } - await context.closeTab(tabId) - return { - content: [ - { - type: 'text', - text: `Closed tab ${tabId}` - } - ] - } - } - } - ] -} diff --git a/src/main/presenter/browser/tools/types.ts b/src/main/presenter/browser/tools/types.ts deleted file mode 100644 index eba58504d..000000000 --- a/src/main/presenter/browser/tools/types.ts +++ /dev/null @@ -1,48 +0,0 @@ -import type { RequestHandlerExtra } from '@modelcontextprotocol/sdk/shared/protocol.js' -import type { CallToolResult, Notification, Request } from '@modelcontextprotocol/sdk/types.js' -import type { ZodTypeAny } from 'zod' -import type { BrowserTab } from '../BrowserTab' - -export type ToolResult = CallToolResult - -export interface BrowserToolContext { - getTab: (tabId?: string) => Promise - getActiveTab: () => Promise - resolveTabId: ( - args: { tabId?: string } | undefined, - extra?: RequestHandlerExtra - ) => Promise - // Tab management methods - createTab?: (url?: string) => Promise<{ id: string; url: string; title: string } | null> - listTabs?: () => Promise> - activateTab?: (tabId: string) => Promise - closeTab?: (tabId: string) => Promise - // Download methods - downloadFile?: ( - url: string, - savePath?: string - ) => Promise<{ - id: string - url: string - filePath?: string - status: string - }> -} - -export interface BrowserToolDefinition { - name: string - description: string - schema: ZodTypeAny - handler: ( - args: any, - context: BrowserToolContext, - extra: RequestHandlerExtra - ) => Promise - annotations?: { - title?: string - readOnlyHint?: boolean - destructiveHint?: boolean - idempotentHint?: boolean - openWorldHint?: boolean - } -} diff --git a/src/main/presenter/toolPresenter/index.ts b/src/main/presenter/toolPresenter/index.ts index 8703b2b10..4d7975806 100644 --- a/src/main/presenter/toolPresenter/index.ts +++ b/src/main/presenter/toolPresenter/index.ts @@ -73,7 +73,6 @@ export class ToolPresenter implements IToolPresenter { // Initialize or update AgentToolManager if workspace path changed if (!this.agentToolManager) { this.agentToolManager = new AgentToolManager({ - yoBrowserPresenter: this.options.yoBrowserPresenter, agentWorkspacePath, configPresenter: this.options.configPresenter, commandPermissionHandler: this.options.commandPermissionHandler diff --git a/src/renderer/src/components/workspace/WorkspaceView.vue b/src/renderer/src/components/workspace/WorkspaceView.vue index d66fbfe13..c23b72f10 100644 --- a/src/renderer/src/components/workspace/WorkspaceView.vue +++ b/src/renderer/src/components/workspace/WorkspaceView.vue @@ -42,6 +42,7 @@ import { computed } from 'vue' import { Icon } from '@iconify/vue' import { useI18n } from 'vue-i18n' import { useWorkspaceStore } from '@/stores/workspace' +import { useYoBrowserStore } from '@/stores/yoBrowser' import { useChatMode } from '@/components/chat-input/composables/useChatMode' import WorkspacePlan from './WorkspacePlan.vue' import WorkspaceFiles from './WorkspaceFiles.vue' @@ -50,8 +51,11 @@ import WorkspaceBrowserTabs from './WorkspaceBrowserTabs.vue' const { t } = useI18n() const store = useWorkspaceStore() +const yoBrowserStore = useYoBrowserStore() const chatMode = useChatMode() -const showBrowserTabs = computed(() => chatMode.currentMode.value === 'agent') +const showBrowserTabs = computed( + () => chatMode.currentMode.value === 'agent' && yoBrowserStore.tabCount > 0 +) const i18nPrefix = computed(() => 'chat.workspace') const titleKey = computed(() => `${i18nPrefix.value}.title`) diff --git a/src/shared/types/presenters/legacy.presenters.d.ts b/src/shared/types/presenters/legacy.presenters.d.ts index 52de26233..50aaf4ad9 100644 --- a/src/shared/types/presenters/legacy.presenters.d.ts +++ b/src/shared/types/presenters/legacy.presenters.d.ts @@ -217,12 +217,14 @@ export interface IYoBrowserPresenter { canGoForward: boolean }> getTabIdByViewId(viewId: number): Promise - getToolDefinitions(supportsVision: boolean): Promise - callTool(toolName: string, params: Record): Promise captureScreenshot(tabId: string, options?: ScreenshotOptions): Promise startDownload(url: string, savePath?: string): Promise clearSandboxData(): Promise shutdown(): Promise + readonly toolHandler: { + getToolDefinitions(): any[] + callTool(toolName: string, args: Record): Promise + } } export interface IWindowPresenter { diff --git a/test/main/presenter/agentPresenter/agentToolManagerSettings.test.ts b/test/main/presenter/agentPresenter/agentToolManagerSettings.test.ts index d2a09741f..d06128922 100644 --- a/test/main/presenter/agentPresenter/agentToolManagerSettings.test.ts +++ b/test/main/presenter/agentPresenter/agentToolManagerSettings.test.ts @@ -18,6 +18,11 @@ vi.mock('@/presenter', () => ({ getActiveSkills: vi.fn(), getActiveSkillsAllowedTools: vi.fn() }, + yoBrowserPresenter: { + toolHandler: { + getToolDefinitions: vi.fn().mockReturnValue([]) + } + }, sessionPresenter: {}, windowPresenter: {} } @@ -27,9 +32,6 @@ describe('AgentToolManager DeepChat settings tool gating', () => { const configPresenter = { getSkillsEnabled: () => true } as any - const yoBrowserPresenter = { - getToolDefinitions: vi.fn().mockResolvedValue([]) - } as any beforeEach(() => { vi.clearAllMocks() @@ -40,7 +42,6 @@ describe('AgentToolManager DeepChat settings tool gating', () => { ;(presenter.skillPresenter.getActiveSkillsAllowedTools as any).mockResolvedValue([]) const manager = new AgentToolManager({ - yoBrowserPresenter, agentWorkspacePath: null, configPresenter }) @@ -64,7 +65,6 @@ describe('AgentToolManager DeepChat settings tool gating', () => { ]) const manager = new AgentToolManager({ - yoBrowserPresenter, agentWorkspacePath: null, configPresenter }) From c509b053fbc7bd11cb12594fb8b65697a437d9e2 Mon Sep 17 00:00:00 2001 From: zerob13 Date: Tue, 20 Jan 2026 10:36:07 +0800 Subject: [PATCH 4/6] refactor(yobrowser): add CDP method schema validation with strict enums - Add enum-based validation for cdp_send method (11 common CDP methods) - Add detailed union schemas for each method's parameters with examples - Add normalizeCdpParams method to handle both object and JSON string inputs - Prevent method typos and provide better type safety for CDP interactions --- .../browser/YoBrowserToolDefinitions.ts | 110 +++++++++++++++++- .../presenter/browser/YoBrowserToolHandler.ts | 22 +++- 2 files changed, 122 insertions(+), 10 deletions(-) diff --git a/src/main/presenter/browser/YoBrowserToolDefinitions.ts b/src/main/presenter/browser/YoBrowserToolDefinitions.ts index c597a696d..d8fbceff2 100644 --- a/src/main/presenter/browser/YoBrowserToolDefinitions.ts +++ b/src/main/presenter/browser/YoBrowserToolDefinitions.ts @@ -16,13 +16,111 @@ const yoBrowserSchemas = { cdp_send: z.object({ tabId: z.string().optional().describe('Optional tab ID. If omitted, uses the active tab'), method: z - .string() - .min(1) - .describe('CDP method name (e.g., "Page.navigate", "Runtime.evaluate")'), + .enum([ + 'Page.navigate', + 'Page.reload', + 'Page.captureScreenshot', + 'Runtime.evaluate', + 'DOM.getDocument', + 'DOM.querySelector', + 'DOM.querySelectorAll', + 'DOM.getOuterHTML', + 'Input.dispatchMouseEvent', + 'Input.dispatchKeyEvent' + ]) + .describe('Common CDP method name'), params: z - .record(z.unknown()) - .optional() - .describe('Optional parameters object for the CDP method') + .union([ + z + .object({ + url: z.string().url().describe('Example: "https://example.com"') + }) + .describe('For Page.navigate. Example: {"url":"https://example.com"}'), + z + .object({ + ignoreCache: z.boolean().optional().describe('Example: true'), + scriptToEvaluateOnLoad: z + .string() + .optional() + .describe('Example: "console.log(document.title)"') + }) + .describe('For Page.reload. Example: {"ignoreCache":true}'), + z + .object({ + format: z.enum(['png', 'jpeg']).optional().describe('Example: "png"'), + quality: z.number().int().min(0).max(100).optional().describe('Example: 80'), + clip: z + .object({ + x: z.number().describe('Example: 0'), + y: z.number().describe('Example: 0'), + width: z.number().positive().describe('Example: 800'), + height: z.number().positive().describe('Example: 600'), + scale: z.number().positive().optional().describe('Example: 1') + }) + .optional() + .describe('Example: {"x":0,"y":0,"width":800,"height":600,"scale":1}') + }) + .describe('For Page.captureScreenshot. Example: {"format":"png"}'), + z + .object({ + expression: z.string().min(1).describe('Example: "document.title"'), + returnByValue: z.boolean().optional().describe('Example: true'), + awaitPromise: z.boolean().optional().describe('Example: true') + }) + .describe( + 'For Runtime.evaluate. Example: {"expression":"document.title","returnByValue":true}' + ), + z + .object({ + depth: z.number().int().min(0).optional().describe('Example: 1'), + pierce: z.boolean().optional().describe('Example: true') + }) + .describe('For DOM.getDocument. Example: {"depth":1,"pierce":true}'), + z + .object({ + nodeId: z.number().int().positive().describe('Example: 1'), + selector: z.string().min(1).describe('Example: "body"') + }) + .describe('For DOM.querySelector. Example: {"nodeId":1,"selector":"body"}'), + z + .object({ + nodeId: z.number().int().positive().describe('Example: 1'), + selector: z.string().min(1).describe('Example: "a"') + }) + .describe('For DOM.querySelectorAll. Example: {"nodeId":1,"selector":"a"}'), + z + .object({ + nodeId: z.number().int().positive().describe('Example: 1') + }) + .describe('For DOM.getOuterHTML. Example: {"nodeId":1}'), + z + .object({ + type: z + .enum(['mousePressed', 'mouseReleased', 'mouseMoved']) + .describe('Example: "mousePressed"'), + x: z.number().describe('Example: 120'), + y: z.number().describe('Example: 240'), + button: z + .enum(['none', 'left', 'middle', 'right']) + .optional() + .describe('Example: "left"'), + clickCount: z.number().int().min(1).optional().describe('Example: 1') + }) + .describe( + 'For Input.dispatchMouseEvent. Example: {"type":"mousePressed","x":120,"y":240,"button":"left","clickCount":1}' + ), + z + .object({ + type: z.enum(['keyDown', 'keyUp', 'rawKeyDown', 'char']).describe('Example: "keyDown"'), + key: z.string().optional().describe('Example: "a"'), + code: z.string().optional().describe('Example: "KeyA"'), + text: z.string().optional().describe('Example: "a"') + }) + .describe( + 'For Input.dispatchKeyEvent. Example: {"type":"keyDown","key":"a","code":"KeyA","text":"a"}' + ) + ]) + .describe('Parameters for the selected CDP method. Must be an object, not a JSON string') }) } diff --git a/src/main/presenter/browser/YoBrowserToolHandler.ts b/src/main/presenter/browser/YoBrowserToolHandler.ts index de0e30011..84a17878b 100644 --- a/src/main/presenter/browser/YoBrowserToolHandler.ts +++ b/src/main/presenter/browser/YoBrowserToolHandler.ts @@ -40,10 +40,7 @@ export class YoBrowserToolHandler { case 'yo_browser_cdp_send': { const tabId = typeof args.tabId === 'string' ? args.tabId : undefined const method = typeof args.method === 'string' ? args.method : '' - const params = - typeof args.params === 'object' && args.params !== null - ? (args.params as Record) - : {} + const params = this.normalizeCdpParams(args.params) return await this.handleCdpSend(tabId, method, params) } default: @@ -107,4 +104,21 @@ export class YoBrowserToolHandler { const response = await browserTab.sendCdpCommand(method, params) return JSON.stringify(response ?? {}) } + + private normalizeCdpParams(value: unknown): Record { + if (typeof value === 'object' && value !== null && !Array.isArray(value)) { + return value as Record + } + if (typeof value === 'string' && value.trim()) { + try { + const parsed = JSON.parse(value) + if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) { + return parsed as Record + } + } catch { + return {} + } + } + return {} + } } From bcdefd39544eaf812c2bc4374ec4ce67d94e8d35 Mon Sep 17 00:00:00 2001 From: zerob13 Date: Tue, 20 Jan 2026 13:15:12 +0800 Subject: [PATCH 5/6] fix(yobrowser): add strict tab ID validation in CDP send handler --- src/main/presenter/browser/YoBrowserToolHandler.ts | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/src/main/presenter/browser/YoBrowserToolHandler.ts b/src/main/presenter/browser/YoBrowserToolHandler.ts index 84a17878b..ffc1f07a9 100644 --- a/src/main/presenter/browser/YoBrowserToolHandler.ts +++ b/src/main/presenter/browser/YoBrowserToolHandler.ts @@ -100,6 +100,13 @@ export class YoBrowserToolHandler { if (!browserTab) { throw new Error(tabId ? `Tab ${tabId} not found` : 'No active tab available') } + if (tabId) { + const resolvedTabId = + (browserTab as { tabId?: string; id?: string }).tabId ?? (browserTab as { id?: string }).id + if (resolvedTabId !== tabId) { + throw new Error(`Tab ${tabId} not found`) + } + } const response = await browserTab.sendCdpCommand(method, params) return JSON.stringify(response ?? {}) From 15d4f75d363e4d9d30a467beac6bf45da4bd6077 Mon Sep 17 00:00:00 2001 From: zerob13 Date: Tue, 20 Jan 2026 13:19:29 +0800 Subject: [PATCH 6/6] chore: update deps --- package.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/package.json b/package.json index c2f359b8e..5923db0e2 100644 --- a/package.json +++ b/package.json @@ -76,7 +76,7 @@ "chokidar": "^5.0.0", "compare-versions": "^6.1.1", "cross-spawn": "^7.0.6", - "diff": "^7.0.0", + "diff": "^8.0.3", "electron-log": "^5.4.3", "electron-store": "^8.2.0", "electron-updater": "^6.6.2",