diff --git a/docs/architecture/tool-system.md b/docs/architecture/tool-system.md index 9403012da..f73c17213 100644 --- a/docs/architecture/tool-system.md +++ b/docs/architecture/tool-system.md @@ -29,7 +29,7 @@ graph TB AgentToolMgr[AgentToolManager] FsHandler[AgentFileSystemHandler] - Browser[Yo Browser Tools] + YoBrowser[Yo Browser CDP] end subgraph "外部服务" @@ -48,10 +48,10 @@ graph TB McpClient --> MCPServers AgentToolMgr --> FsHandler - AgentToolMgr --> Browser + AgentToolMgr --> YoBrowser FsHandler --> Files - Browser --> Web + YoBrowser --> Web classDef router fill:#e3f2fd classDef mcp fill:#fff3e0 @@ -60,7 +60,7 @@ graph TB class ToolP,Mapper router class McpP,ServerMgr,ToolMgr,McpClient mcp - class AgentToolMgr,FsHandler,Browser agent + class AgentToolMgr,FsHandler,YoBrowser agent class MCPServers,Files,Web external ``` @@ -622,23 +622,20 @@ class AgentFileSystemHandler { 3. **边界检查**:防止 `../` 越界访问 4. **正则验证**:`grep_search` 和 `text_replace` 使用 `validateRegexPattern` 防 ReDoS -### Browser 工具 +### YoBrowser CDP 工具 -```typescript -// 通过 Yo Browser Presenter 调用 -async callBrowserTool(toolName: string, args: any): Promise { - switch (toolName) { - case 'browser_navigate': - return await this.yoBrowserPresenter.navigate(args.url) - case 'browser_scrape': - return await this.yoBrowserPresenter.scrape(args.url) - case 'browser_screenshot': - return await this.yoBrowserPresenter.screenshot(args.url) - default: - throw new Error(`未知的 Browser 工具: ${toolName}`) - } -} -``` +YoBrowser 提供基于 Chrome DevTools Protocol (CDP) 的最小工具集,在 agent 模式下直接可用。 + +**可用工具**: +- `yo_browser_tab_list` - 列出所有浏览器 tabs +- `yo_browser_tab_new` - 创建新 tab +- `yo_browser_tab_activate` - 激活指定 tab +- `yo_browser_tab_close` - 关闭 tab +- `yo_browser_cdp_send` - 发送 CDP 命令 + +**安全约束**: +- `local://` URL 禁止 CDP attach(在 `BrowserTab.ensureSession()` 中检查) +- 所有 CDP 命令通过 `webContents.debugger.sendCommand()` 执行 ## 🔐 权限系统 diff --git a/docs/archives/workspace-agent-refactoring-summary.md b/docs/archives/workspace-agent-refactoring-summary.md index 0634495da..bc9d193e0 100644 --- a/docs/archives/workspace-agent-refactoring-summary.md +++ b/docs/archives/workspace-agent-refactoring-summary.md @@ -196,7 +196,7 @@ graph TB - MCP 工具:保持原始命名 - Agent FileSystem 工具:不加前缀(`read_file` 等) -- Yo Browser:保留 `browser_` 前缀 +- Yo Browser:使用 `yo_browser_` 前缀 ### 工具路由机制 diff --git a/docs/specs/yobrowser-optimization/plan.md b/docs/specs/yobrowser-optimization/plan.md new file mode 100644 index 000000000..37bfac326 --- /dev/null +++ b/docs/specs/yobrowser-optimization/plan.md @@ -0,0 +1,83 @@ +# YoBrowser Optimization:实施方案(Plan) + +## 现状盘点(基于代码) + +- Renderer:`src/renderer/src/components/workspace/WorkspaceView.vue` 在 `agent` 模式下渲染 `WorkspaceBrowserTabs`,但不关心是否存在 tabs。 +- Renderer:`src/renderer/src/stores/yoBrowser.ts` 已维护 tabs 与 `tabCount`(由 IPC 事件更新)。 +- Main:YoBrowser 通过 `YoBrowserToolHandler` + `YoBrowserToolDefinitions` 暴露 `yo_browser_*` 工具,当前有 skill gating 逻辑(需要激活 `yo-browser-cdp` skill)。 +- Agent loop:`src/main/presenter/agentPresenter/loop/toolCallProcessor.ts` 中 `TOOLS_REQUIRING_OFFLOAD` 包含 `yo_browser_cdp_send`。 + +## 总体设计 + +1) UI:Browser Tabs 分区只在 `tabCount > 0` 时出现。 +2) YoBrowser 工具直接注入:agent 模式下直接提供 `yo_browser_*` 工具,不依赖 skills 体系。 +3) 工具实现保持 CDP 方式:`yo_browser_cdp_send` + tab 管理,参数 schema 按 CDP 定义。 + +> 约束:不做任何 system prompt / browser context 缩减。 + +--- + +## 1) UI:Browser Tabs 分区仅在 `tabCount > 0` 时渲染 + +- 修改 `WorkspaceView.vue`: + - 引入 `useYoBrowserStore()`。 + - 将 `showBrowserTabs` 改为:`chatMode.currentMode.value === 'agent' && yoBrowserStore.tabCount > 0`。 + +说明: +- `yoBrowserStore.tabCount` 已存在且由 tabs 数组计算。 +- tabs 更新依赖现有 `YO_BROWSER_EVENTS.*`(TAB_CREATED/TAB_CLOSED/TAB_COUNT_CHANGED 等),无需新增事件。 + +--- + +## 2) YoBrowser 工具直接注入(agent 模式,不依赖 skills) + +### 2.1 移除 tool definitions 的 skill gating + +- `src/main/presenter/browser/YoBrowserToolHandler.ts` + - 删除 `getActiveSkills()` 方法或不再使用。 + - `getToolDefinitions()` 直接返回 `getYoBrowserToolDefinitions()`(不再受 `activeSkills` 控制)。 + +### 2.2 同步更新 AgentToolManager 注入逻辑 + +- `src/main/presenter/agentPresenter/acp/agentToolManager.ts` + - `getAllToolDefinitions()` 中,在 agent 模式下直接追加 `yoBrowserPresenter.toolHandler.getToolDefinitions()`(不再传递/依赖 conversationId 做 gating)。 + - `callTool()` 中,`toolName.startsWith('yo_browser_')` 分支保持不变(继续路由到 YoBrowser handler)。 + +### 2.3 移除 skill 文档与残留引用 + +- 删除 `resources/skills/yo-browser-cdp/` 整个目录。 +- `docs/architecture/tool-system.md`: + - 删除或改写“YoBrowser CDP 工具仅在 `yo-browser-cdp` skill 激活时可用”的描述。 + - 改为:“YoBrowser CDP 工具在 agent 模式下直接可用”。 +- 全局搜索 `yo-browser-cdp` / `allowedTools` / `skill gated`,确保没有残留引用(代码、文档、测试)。 + +--- + +## 3) 工具实现:CDP 方式 + 参数定义(保持现状) + +### 3.1 工具集合(无需改动) + +- `yo_browser_tab_list` +- `yo_browser_tab_new` +- `yo_browser_tab_activate` +- `yo_browser_tab_close` +- `yo_browser_cdp_send` + +### 3.2 参数 schema(保持现状,无需改动) + +- `src/main/presenter/browser/YoBrowserToolDefinitions.ts`: + - `cdp_send` 参数:`{ tabId?: string, method: string, params?: object }`。 + - 其他 tab 管理工具参数保持不变。 + +### 3.3 安全边界(保持现状) + +- `src/main/presenter/browser/BrowserTab.ensureSession()`: + - 检查 `currentUrl.startsWith('local://')`,若为真则抛出错误(禁止 CDP attach)。 + +--- + +## 不在本计划内 + +- system prompt / browser context 的缩减或重写。 +- 任何对 YoBrowser UI 行为(窗口位置/大小等)的调整。 +- skills 体系(YoBrowser 不再使用 skills 来控制工具可见性)。 diff --git a/docs/specs/yobrowser-optimization/spec.md b/docs/specs/yobrowser-optimization/spec.md new file mode 100644 index 000000000..1180753e0 --- /dev/null +++ b/docs/specs/yobrowser-optimization/spec.md @@ -0,0 +1,65 @@ +# YoBrowser Optimization(UI + CDP 工具) + +## 背景 + +当前 YoBrowser 在 Workspace 侧边栏存在 UI 问题: +- `src/renderer/src/components/workspace/WorkspaceView.vue` 在 `agent` 模式下总会渲染 `WorkspaceBrowserTabs` 分区,即便没有任何 tab,也会出现一块空区域。 + +## 目标(Goals) + +1. **UI**:只有存在 YoBrowser tabs 时,Workspace 侧边栏才显示 Browser Tabs 分区。 +2. **Agent 工具直接注入**:YoBrowser 工具(`yo_browser_*`)在 agent 模式下直接可用,无需激活任何 skill。 + +## 非目标(Non-Goals) + +- 不调整 YoBrowser window 的 UI、尺寸、布局、位置策略。 +- 不修改 `BrowserContextBuilder.buildSystemPrompt` 的注入策略(不做减少/压缩/裁剪)。 +- 不改造其他 agent 工具(filesystem/bash/mcp 等)。 +- 不使用 skills 系统来控制 YoBrowser 工具的可见性。 + +## 用户故事(User Stories) + +- 作为用户,我不希望在没有任何浏览器 tab 的情况下,Workspace 侧边栏仍出现空的 Browser Tabs 分区。 +- 作为 agent 用户,我希望 YoBrowser 自动化能力以 CDP 为核心,工具在 agent 模式下直接可用。 + +## 约束与假设(Constraints & Assumptions) + +- YoBrowser 现有实现已经基于 Electron Debugger/CDP(`CDPManager`, `BrowserTab.ensureSession()`)。 +- 安全边界:`local://` URL 禁止绑定 CDP(`BrowserTab` 现有逻辑已做限制)。 + +## 验收标准(Acceptance Criteria) + +### A. UI:Workspace Browser Tabs 展示逻辑 + +- [ ] `src/renderer/src/components/workspace/WorkspaceView.vue` 仅在 `chatMode === 'agent' && yoBrowserStore.tabCount > 0` 时渲染 `WorkspaceBrowserTabs`。 +- [ ] 当 `tabCount === 0` 时,不显示 Browser Tabs 分区(不保留空白区域)。 + +### B. 工具:YoBrowser CDP 工具直接注入(agent 模式) + +- [ ] agent tool definitions 中包含 `yo_browser_*` 工具(agent 模式下直接可用)。 +- [ ] agent 的 tool call 路由正确处理 `yo_browser_*` 工具(`toolName.startsWith('yo_browser_')`)。 +- [ ] 不依赖 skills 系统(不检查 `activeSkills`)。 + +### C. 工具实现:CDP 方式 + 合适的参数定义 + +- [ ] 工具集合: + - `yo_browser_tab_list`:列出 tabs 与 active tab。 + - `yo_browser_tab_new`:创建新 tab(可选 url)。 + - `yo_browser_tab_activate`:激活 tab。 + - `yo_browser_tab_close`:关闭 tab。 + - `yo_browser_cdp_send`:向指定/当前 tab 的 CDP session 发送 `{ method, params }`。 +- [ ] 参数 schema 符合 CDP 使用方式(method、params 等)。 +- [ ] 保留安全边界:`local://` 禁止 CDP attach。 + +### D. Prompt/Context + +- [ ] `BrowserContextBuilder.buildSystemPrompt` 的注入保持现状(不做减少/压缩/裁剪)。 + +### E. 兼容性 + +- [ ] 不涉及数据迁移。 +- [ ] 现有 YoBrowser UI/窗口/Tab 生命周期保持可用。 + +## Open Questions + +无。 diff --git a/docs/specs/yobrowser-optimization/tasks.md b/docs/specs/yobrowser-optimization/tasks.md new file mode 100644 index 000000000..25b972712 --- /dev/null +++ b/docs/specs/yobrowser-optimization/tasks.md @@ -0,0 +1,61 @@ +# YoBrowser Optimization:任务拆分(Tasks) + +## Phase 1:UI(Workspace 侧边栏) + +1. 调整调整 Browser Tabs 分区显示条件 +- 文件:`src/renderer/src/components/workspace/WorkspaceView.vue` +- 改动:`WorkspaceBrowserTabs` 仅在 `chatMode === 'agent' && yoBrowserStore.tabCount > 0` 时渲染。 +- 验收:无 tabs 时不出现分区;有 tabs 时出现并能点击切换。 + +2.(可选)补 renderer 单测 +- 文件:`test/renderer/**`(按现有测试组织落位) +- 用例:tabCount=0/1 下的条件渲染。 + +--- + +## Phase 2:移除 YoBrowser skill gating + +3. 移除 YoBrowser tool definitions 的 skill gating +- 文件:`src/main/presenter/browser/YoBrowserToolHandler.ts` +- 改动:删除 `getActiveSkills()` 方法或不再使用;`getToolDefinitions()` 直接返回 `getYoBrowserToolDefinitions()`。 +- 验收:不再依赖 `activeSkills`。 + +4. 调整 AgentToolManager 注入逻辑(不再依赖 conversationId 做 gating) +- 文件:`src/main/presenter/agentPresenter/acp/agentToolManager.ts` +- 改动:`getAllToolDefinitions()` 中,agent 模式下直接追加 `yoBrowserPresenter.toolHandler.getToolDefinitions()`(可不传 conversationId)。 +- 验收:tool definitions 包含 `yo_browser_*`。 + +5. 删除 skill 文档与残留引用 +- 删除 `resources/skills/yo-browser-cdp/` 整个目录。 +- 文件:`docs/architecture/tool-system.md`(以及搜索到的其他文档) +- 改动:删除或改写“仅在 `yo-browser-cdp` skill 激活时可用”的描述;改为“agent 模式下直接可用”。 +- 全局搜索:确认没有残留的 `yo-browser-cdp` / `skill gated` 引用。 + +--- + +## Phase 3:验证工具实现(保持 CDP 方式) + +6. 验证工具参数定义 +- 文件:`src/main/presenter/browser/YoBrowserToolDefinitions.ts` +- 验收:`yo_browser_cdp_send` 参数为 `{ tabId?: string, method: string, params?: object }`。 + +7. 验证安全边界 +- 文件:`src/main/presenter/browser/BrowserTab.ts` +- 验收:`ensureSession()` 中有 `local://` URL 检查。 + +8.(可选)补 main 单测 +- 验证: + - agent 模式下 tool definitions 包含 `yo_browser_*`。 + - `callTool()` 正确路由到 YoBrowser handler。 + +--- + +## Phase 4:验收与质量门禁 + +9. 手工验收 +- Agent 模式下:无 tabs 时 Workspace 不显示 Browser Tabs;创建 tab 后显示。 +- Agent 模式下:不激活任何 skill,`yo_browser_*` 工具直接可用。 + +10. 质量门禁 +- `pnpm run format && pnpm run lint && pnpm run typecheck` +- `pnpm test` diff --git a/package.json b/package.json index c2f359b8e..5923db0e2 100644 --- a/package.json +++ b/package.json @@ -76,7 +76,7 @@ "chokidar": "^5.0.0", "compare-versions": "^6.1.1", "cross-spawn": "^7.0.6", - "diff": "^7.0.0", + "diff": "^8.0.3", "electron-log": "^5.4.3", "electron-store": "^8.2.0", "electron-updater": "^6.6.2", diff --git a/src/main/presenter/agentPresenter/acp/agentToolManager.ts b/src/main/presenter/agentPresenter/acp/agentToolManager.ts index 0438b613b..65051babf 100644 --- a/src/main/presenter/agentPresenter/acp/agentToolManager.ts +++ b/src/main/presenter/agentPresenter/acp/agentToolManager.ts @@ -1,4 +1,4 @@ -import type { IConfigPresenter, IYoBrowserPresenter, MCPToolDefinition } from '@shared/presenter' +import type { IConfigPresenter, MCPToolDefinition } from '@shared/presenter' import { zodToJsonSchema } from 'zod-to-json-schema' import { z } from 'zod' import fs from 'fs' @@ -52,14 +52,12 @@ export interface AgentToolCallResult { } interface AgentToolManagerOptions { - yoBrowserPresenter: IYoBrowserPresenter agentWorkspacePath: string | null configPresenter: IConfigPresenter commandPermissionHandler?: CommandPermissionService } export class AgentToolManager { - private readonly yoBrowserPresenter: IYoBrowserPresenter private agentWorkspacePath: string | null private fileSystemHandler: AgentFileSystemHandler | null = null private bashHandler: AgentBashHandler | null = null @@ -232,7 +230,6 @@ export class AgentToolManager { } constructor(options: AgentToolManagerOptions) { - this.yoBrowserPresenter = options.yoBrowserPresenter this.agentWorkspacePath = options.agentWorkspacePath this.configPresenter = options.configPresenter this.commandPermissionHandler = options.commandPermissionHandler @@ -275,17 +272,7 @@ export class AgentToolManager { this.agentWorkspacePath = effectiveWorkspacePath } - // 1. Yo Browser tools (agent mode only) - if (isAgentMode) { - try { - const yoDefs = await this.yoBrowserPresenter.getToolDefinitions(context.supportsVision) - defs.push(...yoDefs) - } catch (error) { - logger.warn('[AgentToolManager] Failed to load Yo Browser tool definitions', { error }) - } - } - - // 2. FileSystem tools (agent mode only) + // 1. FileSystem tools (agent mode only) if (isAgentMode && this.fileSystemHandler) { const fsDefs = this.getFileSystemToolDefinitions() defs.push(...fsDefs) @@ -324,6 +311,15 @@ export class AgentToolManager { } } + // 5. YoBrowser CDP tools (agent mode only) + if (isAgentMode) { + try { + defs.push(...presenter.yoBrowserPresenter.toolHandler.getToolDefinitions()) + } catch (error) { + logger.warn('[AgentToolManager] Failed to load YoBrowser tools', { error }) + } + } + return defs } @@ -335,17 +331,6 @@ export class AgentToolManager { args: Record, conversationId?: string ): Promise { - // Route to Yo Browser tools - if (toolName.startsWith('browser_')) { - const response = await this.yoBrowserPresenter.callTool( - toolName, - args as Record - ) - return { - content: typeof response === 'string' ? response : JSON.stringify(response) - } - } - // Route to FileSystem tools if (this.isFileSystemTool(toolName)) { if (!this.fileSystemHandler) { @@ -364,6 +349,14 @@ export class AgentToolManager { return await this.callChatSettingsTool(toolName, args, conversationId) } + // Route to YoBrowser CDP tools + if (toolName.startsWith('yo_browser_')) { + const response = await presenter.yoBrowserPresenter.toolHandler.callTool(toolName, args) + return { + content: response + } + } + throw new Error(`Unknown Agent tool: ${toolName}`) } diff --git a/src/main/presenter/agentPresenter/loop/toolCallProcessor.ts b/src/main/presenter/agentPresenter/loop/toolCallProcessor.ts index b36850a47..dddaf343b 100644 --- a/src/main/presenter/agentPresenter/loop/toolCallProcessor.ts +++ b/src/main/presenter/agentPresenter/loop/toolCallProcessor.ts @@ -52,8 +52,7 @@ const TOOLS_REQUIRING_OFFLOAD = new Set([ 'glob_search', 'grep_search', 'text_replace', - 'browser_read_links', - 'browser_get_clickable_elements' + 'yo_browser_cdp_send' ]) export class ToolCallProcessor { diff --git a/src/main/presenter/browser/BrowserTab.ts b/src/main/presenter/browser/BrowserTab.ts index 11bcfcaea..348b7703e 100644 --- a/src/main/presenter/browser/BrowserTab.ts +++ b/src/main/presenter/browser/BrowserTab.ts @@ -64,6 +64,11 @@ export class BrowserTab { return await this.cdpManager.evaluateScript(session, script) } + async sendCdpCommand(method: string, params?: Record): Promise { + const session = await this.ensureSession() + return await session.sendCommand(method, params ?? {}) + } + async takeScreenshot(options?: ScreenshotOptions): Promise { await this.ensureSession() this.ensureAvailable() diff --git a/src/main/presenter/browser/BrowserToolManager.ts b/src/main/presenter/browser/BrowserToolManager.ts deleted file mode 100644 index 38f441e4d..000000000 --- a/src/main/presenter/browser/BrowserToolManager.ts +++ /dev/null @@ -1,103 +0,0 @@ -import type { RequestHandlerExtra } from '@modelcontextprotocol/sdk/shared/protocol.js' -import type { Notification, Request } from '@modelcontextprotocol/sdk/types.js' -import type { BrowserToolContext, BrowserToolDefinition } from './tools/types' -import { createNavigateTools } from './tools/navigate' -import { createActionTools } from './tools/action' -import { createContentTools } from './tools/content' -// import { createScreenshotTools } from './tools/screenshot' -import { createTabTools } from './tools/tabs' -import { createDownloadTools } from './tools/download' -import type { YoBrowserPresenter } from './YoBrowserPresenter' - -export class BrowserToolManager { - private readonly presenter: YoBrowserPresenter - private readonly tools: BrowserToolDefinition[] - - constructor(presenter: YoBrowserPresenter) { - this.presenter = presenter - this.tools = [ - ...createNavigateTools(), - ...createActionTools(), - ...createContentTools(), - // ...createScreenshotTools(), - ...createTabTools(), - ...createDownloadTools() - ] - } - - getToolDefinitions() { - return this.tools - } - - async executeTool( - toolName: string, - args: any, - extra?: RequestHandlerExtra - ) { - const tool = this.tools.find((t) => t.name === toolName) - if (!tool) { - return { - content: [{ type: 'text', text: `Unknown tool: ${toolName}` }], - isError: true - } - } - - const context = this.createContext() - return await tool.handler(args, context, extra || ({} as any)) - } - - private createContext(): BrowserToolContext { - return { - getTab: async (tabId?: string) => { - return await this.presenter.getBrowserTab(tabId) - }, - getActiveTab: async () => { - return await this.presenter.getBrowserTab() - }, - resolveTabId: async (args?: { tabId?: string }) => { - if (args?.tabId) { - return args.tabId - } - const active = await this.presenter.getActiveTab() - if (active) return active.id - const tabs = await this.presenter.listTabs() - if (tabs.length > 0) return tabs[0].id - const newTab = await this.presenter.createTab('about:blank') - if (!newTab) { - throw new Error('No available tab to operate on') - } - return newTab.id - }, - createTab: async (url?: string) => { - const tab = await this.presenter.createTab(url) - if (!tab) return null - return { id: tab.id, url: tab.url, title: tab.title || '' } - }, - listTabs: async () => { - const tabs = await this.presenter.listTabs() - const active = await this.presenter.getActiveTab() - return tabs.map((tab) => ({ - id: tab.id, - url: tab.url, - title: tab.title || '', - isActive: tab.id === active?.id - })) - }, - activateTab: async (tabId: string) => { - await this.presenter.activateTab(tabId) - }, - closeTab: async (tabId: string) => { - await this.presenter.closeTab(tabId) - }, - downloadFile: async (url: string, savePath?: string) => { - const download = await this.presenter.startDownload(url, savePath) - return { - id: download.id, - url: download.url, - filePath: download.filePath, - status: download.status - } - } - } - } -} diff --git a/src/main/presenter/browser/YoBrowserPresenter.ts b/src/main/presenter/browser/YoBrowserPresenter.ts index d87029b3f..5ffb46c4c 100644 --- a/src/main/presenter/browser/YoBrowserPresenter.ts +++ b/src/main/presenter/browser/YoBrowserPresenter.ts @@ -5,7 +5,6 @@ import { TAB_EVENTS, YO_BROWSER_EVENTS } from '@/events' import { BrowserTabInfo, BrowserContextSnapshot, ScreenshotOptions } from '@shared/types/browser' import { IYoBrowserPresenter, - MCPToolDefinition, DownloadInfo, IWindowPresenter, ITabPresenter @@ -14,9 +13,8 @@ import { BrowserTab } from './BrowserTab' import { CDPManager } from './CDPManager' import { ScreenshotManager } from './ScreenshotManager' import { DownloadManager } from './DownloadManager' -import { BrowserToolManager } from './BrowserToolManager' -import { zodToJsonSchema } from 'zod-to-json-schema' import { clearYoBrowserSessionData } from './yoBrowserSession' +import { YoBrowserToolHandler } from './YoBrowserToolHandler' export class YoBrowserPresenter implements IYoBrowserPresenter { private windowId: number | null = null @@ -28,14 +26,14 @@ export class YoBrowserPresenter implements IYoBrowserPresenter { private readonly cdpManager = new CDPManager() private readonly screenshotManager = new ScreenshotManager(this.cdpManager) private readonly downloadManager = new DownloadManager() - private readonly browserToolManager: BrowserToolManager private readonly windowPresenter: IWindowPresenter private readonly tabPresenter: ITabPresenter + readonly toolHandler: YoBrowserToolHandler constructor(windowPresenter: IWindowPresenter, tabPresenter: ITabPresenter) { this.windowPresenter = windowPresenter this.tabPresenter = tabPresenter - this.browserToolManager = new BrowserToolManager(this) + this.toolHandler = new YoBrowserToolHandler(this) eventBus.on(TAB_EVENTS.CLOSED, (tabId: number) => this.handleTabClosed(tabId)) } @@ -170,10 +168,6 @@ export class YoBrowserPresenter implements IYoBrowserPresenter { return this.toTabInfo(tab) } - async getBrowserTab(tabId?: string): Promise { - return await this.resolveTab(tabId) - } - async goBack(tabId?: string): Promise { const tab = await this.resolveTab(tabId) if (tab?.contents.canGoBack()) { @@ -316,49 +310,6 @@ export class YoBrowserPresenter implements IYoBrowserPresenter { return this.viewIdToTabId.get(viewId) ?? null } - async getToolDefinitions(_supportsVision: boolean): Promise { - // Only return browser_* tools from BrowserToolManager - const browserTools = this.browserToolManager.getToolDefinitions() - const browserMcpTools: MCPToolDefinition[] = browserTools.map((tool) => { - const jsonSchema = zodToJsonSchema(tool.schema) as { - type?: string - properties?: Record - required?: string[] - [key: string]: unknown - } - return { - type: 'function' as const, - function: { - name: tool.name, - description: tool.description, - parameters: { - type: 'object' as const, - properties: (jsonSchema.properties || {}) as Record, - required: (jsonSchema.required || []) as string[] - } - }, - server: { - name: 'yo-browser', - icons: '🌐', - description: 'DeepChat built-in Yo Browser' - } - } - }) - return browserMcpTools - } - - async callTool(toolName: string, params: Record): Promise { - const result = await this.browserToolManager.executeTool(toolName, params) - const textParts = result.content - .filter((c): c is { type: 'text'; text: string } => c.type === 'text') - .map((c) => c.text) - const textContent = textParts.join('\n\n') - if (result.isError) { - throw new Error(textContent || 'Tool execution failed') - } - return textContent - } - async captureScreenshot(tabId: string, options?: ScreenshotOptions): Promise { const tab = await this.resolveTab(tabId) if (!tab) { @@ -641,6 +592,10 @@ export class YoBrowserPresenter implements IYoBrowserPresenter { eventBus.sendToRenderer(YO_BROWSER_EVENTS.TAB_CLOSED, SendTarget.ALL_WINDOWS, tabId) } + async getBrowserTab(tabId?: string): Promise { + return await this.resolveTab(tabId) + } + private emitTabActivated(tabId: string) { eventBus.sendToRenderer(YO_BROWSER_EVENTS.TAB_ACTIVATED, SendTarget.ALL_WINDOWS, tabId) } diff --git a/src/main/presenter/browser/YoBrowserToolDefinitions.ts b/src/main/presenter/browser/YoBrowserToolDefinitions.ts new file mode 100644 index 000000000..d8fbceff2 --- /dev/null +++ b/src/main/presenter/browser/YoBrowserToolDefinitions.ts @@ -0,0 +1,216 @@ +import { z } from 'zod' +import { zodToJsonSchema } from 'zod-to-json-schema' +import type { MCPToolDefinition } from '@shared/presenter' + +const yoBrowserSchemas = { + tab_list: z.object({}), + tab_new: z.object({ + url: z.string().url().optional().describe('Optional URL to navigate to when creating the tab') + }), + tab_activate: z.object({ + tabId: z.string().min(1).describe('ID of the tab to activate') + }), + tab_close: z.object({ + tabId: z.string().min(1).describe('ID of the tab to close') + }), + cdp_send: z.object({ + tabId: z.string().optional().describe('Optional tab ID. If omitted, uses the active tab'), + method: z + .enum([ + 'Page.navigate', + 'Page.reload', + 'Page.captureScreenshot', + 'Runtime.evaluate', + 'DOM.getDocument', + 'DOM.querySelector', + 'DOM.querySelectorAll', + 'DOM.getOuterHTML', + 'Input.dispatchMouseEvent', + 'Input.dispatchKeyEvent' + ]) + .describe('Common CDP method name'), + params: z + .union([ + z + .object({ + url: z.string().url().describe('Example: "https://example.com"') + }) + .describe('For Page.navigate. Example: {"url":"https://example.com"}'), + z + .object({ + ignoreCache: z.boolean().optional().describe('Example: true'), + scriptToEvaluateOnLoad: z + .string() + .optional() + .describe('Example: "console.log(document.title)"') + }) + .describe('For Page.reload. Example: {"ignoreCache":true}'), + z + .object({ + format: z.enum(['png', 'jpeg']).optional().describe('Example: "png"'), + quality: z.number().int().min(0).max(100).optional().describe('Example: 80'), + clip: z + .object({ + x: z.number().describe('Example: 0'), + y: z.number().describe('Example: 0'), + width: z.number().positive().describe('Example: 800'), + height: z.number().positive().describe('Example: 600'), + scale: z.number().positive().optional().describe('Example: 1') + }) + .optional() + .describe('Example: {"x":0,"y":0,"width":800,"height":600,"scale":1}') + }) + .describe('For Page.captureScreenshot. Example: {"format":"png"}'), + z + .object({ + expression: z.string().min(1).describe('Example: "document.title"'), + returnByValue: z.boolean().optional().describe('Example: true'), + awaitPromise: z.boolean().optional().describe('Example: true') + }) + .describe( + 'For Runtime.evaluate. Example: {"expression":"document.title","returnByValue":true}' + ), + z + .object({ + depth: z.number().int().min(0).optional().describe('Example: 1'), + pierce: z.boolean().optional().describe('Example: true') + }) + .describe('For DOM.getDocument. Example: {"depth":1,"pierce":true}'), + z + .object({ + nodeId: z.number().int().positive().describe('Example: 1'), + selector: z.string().min(1).describe('Example: "body"') + }) + .describe('For DOM.querySelector. Example: {"nodeId":1,"selector":"body"}'), + z + .object({ + nodeId: z.number().int().positive().describe('Example: 1'), + selector: z.string().min(1).describe('Example: "a"') + }) + .describe('For DOM.querySelectorAll. Example: {"nodeId":1,"selector":"a"}'), + z + .object({ + nodeId: z.number().int().positive().describe('Example: 1') + }) + .describe('For DOM.getOuterHTML. Example: {"nodeId":1}'), + z + .object({ + type: z + .enum(['mousePressed', 'mouseReleased', 'mouseMoved']) + .describe('Example: "mousePressed"'), + x: z.number().describe('Example: 120'), + y: z.number().describe('Example: 240'), + button: z + .enum(['none', 'left', 'middle', 'right']) + .optional() + .describe('Example: "left"'), + clickCount: z.number().int().min(1).optional().describe('Example: 1') + }) + .describe( + 'For Input.dispatchMouseEvent. Example: {"type":"mousePressed","x":120,"y":240,"button":"left","clickCount":1}' + ), + z + .object({ + type: z.enum(['keyDown', 'keyUp', 'rawKeyDown', 'char']).describe('Example: "keyDown"'), + key: z.string().optional().describe('Example: "a"'), + code: z.string().optional().describe('Example: "KeyA"'), + text: z.string().optional().describe('Example: "a"') + }) + .describe( + 'For Input.dispatchKeyEvent. Example: {"type":"keyDown","key":"a","code":"KeyA","text":"a"}' + ) + ]) + .describe('Parameters for the selected CDP method. Must be an object, not a JSON string') + }) +} + +export function getYoBrowserToolDefinitions(): MCPToolDefinition[] { + return [ + { + type: 'function', + function: { + name: 'yo_browser_tab_list', + description: 'List all browser tabs and identify the active tab', + parameters: zodToJsonSchema(yoBrowserSchemas.tab_list) as { + type: string + properties: Record + required?: string[] + } + }, + server: { + name: 'yobrowser', + icons: '🌐', + description: 'YoBrowser CDP automation' + } + }, + { + type: 'function', + function: { + name: 'yo_browser_tab_new', + description: 'Create a new browser tab with an optional URL', + parameters: zodToJsonSchema(yoBrowserSchemas.tab_new) as { + type: string + properties: Record + required?: string[] + } + }, + server: { + name: 'yobrowser', + icons: '🌐', + description: 'YoBrowser CDP automation' + } + }, + { + type: 'function', + function: { + name: 'yo_browser_tab_activate', + description: 'Make a specific tab the active tab', + parameters: zodToJsonSchema(yoBrowserSchemas.tab_activate) as { + type: string + properties: Record + required?: string[] + } + }, + server: { + name: 'yobrowser', + icons: '🌐', + description: 'YoBrowser CDP automation' + } + }, + { + type: 'function', + function: { + name: 'yo_browser_tab_close', + description: 'Close a specific browser tab', + parameters: zodToJsonSchema(yoBrowserSchemas.tab_close) as { + type: string + properties: Record + required?: string[] + } + }, + server: { + name: 'yobrowser', + icons: '🌐', + description: 'YoBrowser CDP automation' + } + }, + { + type: 'function', + function: { + name: 'yo_browser_cdp_send', + description: + 'Send a Chrome DevTools Protocol (CDP) command to a browser tab. Use this for navigation, content extraction, and DOM interaction', + parameters: zodToJsonSchema(yoBrowserSchemas.cdp_send) as { + type: string + properties: Record + required?: string[] + } + }, + server: { + name: 'yobrowser', + icons: '🌐', + description: 'YoBrowser CDP automation' + } + } + ] +} diff --git a/src/main/presenter/browser/YoBrowserToolHandler.ts b/src/main/presenter/browser/YoBrowserToolHandler.ts new file mode 100644 index 000000000..ffc1f07a9 --- /dev/null +++ b/src/main/presenter/browser/YoBrowserToolHandler.ts @@ -0,0 +1,131 @@ +import logger from '@shared/logger' +import { getYoBrowserToolDefinitions } from './YoBrowserToolDefinitions' +import type { YoBrowserPresenter } from './YoBrowserPresenter' + +export class YoBrowserToolHandler { + private readonly presenter: YoBrowserPresenter + + constructor(presenter: YoBrowserPresenter) { + this.presenter = presenter + } + + getToolDefinitions(): any[] { + return getYoBrowserToolDefinitions() + } + + async callTool(toolName: string, args: Record): Promise { + try { + switch (toolName) { + case 'yo_browser_tab_list': { + return await this.handleTabList() + } + case 'yo_browser_tab_new': { + const url = typeof args.url === 'string' ? args.url : undefined + return await this.handleTabNew(url) + } + case 'yo_browser_tab_activate': { + const tabId = typeof args.tabId === 'string' ? args.tabId : '' + if (!tabId) { + throw new Error('tabId is required') + } + return await this.handleTabActivate(tabId) + } + case 'yo_browser_tab_close': { + const tabId = typeof args.tabId === 'string' ? args.tabId : '' + if (!tabId) { + throw new Error('tabId is required') + } + return await this.handleTabClose(tabId) + } + case 'yo_browser_cdp_send': { + const tabId = typeof args.tabId === 'string' ? args.tabId : undefined + const method = typeof args.method === 'string' ? args.method : '' + const params = this.normalizeCdpParams(args.params) + return await this.handleCdpSend(tabId, method, params) + } + default: + throw new Error(`Unknown YoBrowser tool: ${toolName}`) + } + } catch (error) { + logger.error('[YoBrowserToolHandler] Tool execution failed', { toolName, error }) + throw error + } + } + + private async handleTabList(): Promise { + const tabs = await this.presenter.listTabs() + const activeTab = await this.presenter.getActiveTab() + return JSON.stringify({ + activeTabId: activeTab?.id ?? null, + tabs: tabs.map((tab: any) => ({ + id: tab.id, + url: tab.url, + title: tab.title, + isActive: tab.id === activeTab?.id + })) + }) + } + + private async handleTabNew(url?: string): Promise { + const tab = await this.presenter.createTab(url) + if (!tab) { + throw new Error('Failed to create new tab') + } + return JSON.stringify({ + id: tab.id, + url: tab.url, + title: tab.title + }) + } + + private async handleTabActivate(tabId: string): Promise { + await this.presenter.activateTab(tabId) + return JSON.stringify({ success: true, tabId }) + } + + private async handleTabClose(tabId: string): Promise { + await this.presenter.closeTab(tabId) + return JSON.stringify({ success: true, tabId }) + } + + private async handleCdpSend( + tabId: string | undefined, + method: string, + params: Record + ): Promise { + if (!method) { + throw new Error('CDP method is required') + } + const browserTab = await this.presenter.getBrowserTab(tabId) + if (!browserTab) { + throw new Error(tabId ? `Tab ${tabId} not found` : 'No active tab available') + } + if (tabId) { + const resolvedTabId = + (browserTab as { tabId?: string; id?: string }).tabId ?? (browserTab as { id?: string }).id + if (resolvedTabId !== tabId) { + throw new Error(`Tab ${tabId} not found`) + } + } + + const response = await browserTab.sendCdpCommand(method, params) + return JSON.stringify(response ?? {}) + } + + private normalizeCdpParams(value: unknown): Record { + if (typeof value === 'object' && value !== null && !Array.isArray(value)) { + return value as Record + } + if (typeof value === 'string' && value.trim()) { + try { + const parsed = JSON.parse(value) + if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) { + return parsed as Record + } + } catch { + return {} + } + } + return {} + } +} diff --git a/src/main/presenter/browser/tools/action.ts b/src/main/presenter/browser/tools/action.ts deleted file mode 100644 index 698f0d1bf..000000000 --- a/src/main/presenter/browser/tools/action.ts +++ /dev/null @@ -1,200 +0,0 @@ -import { z } from 'zod' -import type { BrowserToolDefinition } from './types' - -const BaseArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)') -}) - -const SelectorSchema = BaseArgsSchema.extend({ - selector: z.string().min(1).describe('CSS selector of the target element') -}) - -const ClickArgsSchema = SelectorSchema -const HoverArgsSchema = SelectorSchema - -const FormInputArgsSchema = SelectorSchema.extend({ - value: z.string().describe('Value to fill into the element'), - append: z - .boolean() - .optional() - .default(false) - .describe('Append to existing value instead of replacing') -}) - -const SelectArgsSchema = SelectorSchema.extend({ - value: z.union([z.string(), z.array(z.string())]).describe('Value or values to select') -}) - -const ScrollArgsSchema = BaseArgsSchema.extend({ - x: z.number().optional().default(0).describe('Horizontal scroll distance'), - y: z.number().optional().default(500).describe('Vertical scroll distance'), - behavior: z.enum(['auto', 'smooth']).optional().default('auto').describe('Scroll behavior') -}) - -const PressKeyArgsSchema = BaseArgsSchema.extend({ - key: z.string().min(1).describe('Key to press'), - count: z.number().int().min(1).optional().default(1).describe('Number of times to press the key') -}) - -export function createActionTools(): BrowserToolDefinition[] { - return [ - { - name: 'browser_click', - description: 'Click an element on the page using a CSS selector.', - schema: ClickArgsSchema, - handler: async (args, context) => { - const parsed = ClickArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - await tab.waitForSelector(parsed.selector, { timeout: 5000 }) - await tab.click(parsed.selector) - return { - content: [ - { - type: 'text', - text: `Clicked element ${parsed.selector}` - } - ] - } - } - }, - { - name: 'browser_hover', - description: 'Hover over an element on the page.', - schema: HoverArgsSchema, - handler: async (args, context) => { - const parsed = HoverArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - await tab.waitForSelector(parsed.selector, { timeout: 5000 }) - await tab.hover(parsed.selector) - return { - content: [ - { - type: 'text', - text: `Hovered over ${parsed.selector}` - } - ] - } - } - }, - { - name: 'browser_form_input_fill', - description: 'Fill text into an input or textarea element.', - schema: FormInputArgsSchema, - handler: async (args, context) => { - const parsed = FormInputArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - await tab.waitForSelector(parsed.selector, { timeout: 5000 }) - await tab.fill(parsed.selector, parsed.value, parsed.append) - return { - content: [ - { - type: 'text', - text: `Filled ${parsed.selector} with value` - } - ] - } - } - }, - { - name: 'browser_select', - description: 'Select value(s) within a select element.', - schema: SelectArgsSchema, - handler: async (args, context) => { - const parsed = SelectArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - await tab.waitForSelector(parsed.selector, { timeout: 5000 }) - await tab.select(parsed.selector, parsed.value) - return { - content: [ - { - type: 'text', - text: `Updated selection for ${parsed.selector}` - } - ] - } - } - }, - { - name: 'browser_scroll', - description: 'Scroll the page by specified offsets.', - schema: ScrollArgsSchema, - handler: async (args, context) => { - const parsed = ScrollArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - await tab.scroll({ - x: parsed.x, - y: parsed.y, - behavior: parsed.behavior - }) - return { - content: [ - { - type: 'text', - text: `Scrolled by x=${parsed.x}, y=${parsed.y}` - } - ] - } - } - }, - { - name: 'browser_press_key', - description: 'Send keyboard input to the active page.', - schema: PressKeyArgsSchema, - handler: async (args, context) => { - const parsed = PressKeyArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - await tab.pressKey(parsed.key, parsed.count) - return { - content: [ - { - type: 'text', - text: `Pressed key "${parsed.key}" ${parsed.count} time(s)` - } - ] - } - } - } - ] -} diff --git a/src/main/presenter/browser/tools/content.ts b/src/main/presenter/browser/tools/content.ts deleted file mode 100644 index cd10a692b..000000000 --- a/src/main/presenter/browser/tools/content.ts +++ /dev/null @@ -1,256 +0,0 @@ -import { z } from 'zod' -import TurndownService from 'turndown' -import type { BrowserToolDefinition } from './types' - -const BaseArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)') -}) - -const SelectorArgsSchema = BaseArgsSchema.extend({ - selector: z.string().optional().describe('Optional CSS selector to scope extraction') -}) - -const ContentArgsSchema = SelectorArgsSchema.extend({ - offset: z - .number() - .int() - .min(0) - .optional() - .default(0) - .describe('Character offset from which to start reading (0 = beginning)'), - limit: z - .number() - .int() - .min(1) - .max(16000) - .optional() - .default(4000) - .describe('Maximum number of characters to return from the offset') -}) - -const LinksArgsSchema = BaseArgsSchema.extend({ - limit: z.number().int().min(1).max(200).optional().default(50).describe('Maximum links to return') -}) - -const ClickableArgsSchema = BaseArgsSchema.extend({ - limit: z - .number() - .int() - .min(1) - .max(200) - .optional() - .default(50) - .describe('Maximum clickable elements to return') -}) - -const turndown = new TurndownService({ - headingStyle: 'atx' -}) - -const DEFAULT_TEXT_LIMIT = 4000 -const MAX_TEXT_LIMIT = 16000 - -const paginateText = ( - fullText: string | undefined, - offset?: number, - limit?: number -): { slice: string; meta?: string } => { - const text = fullText || '' - const length = text.length - const safeOffset = Math.max(0, offset ?? 0) - const safeLimit = Math.min(MAX_TEXT_LIMIT, Math.max(1, limit ?? DEFAULT_TEXT_LIMIT)) - - if (!text) { - return { slice: '', meta: undefined } - } - - if (safeOffset >= length) { - return { - slice: '', - meta: `Offset ${safeOffset} is beyond content length ${length}. No content returned.` - } - } - - const end = Math.min(length, safeOffset + safeLimit) - const slice = text.slice(safeOffset, end) - const remaining = length - end - - if (remaining <= 0) { - return { slice, meta: undefined } - } - - const nextOffset = end - const meta = `Content length: ${length} characters. Returned range: [${safeOffset}, ${end}) (${slice.length} characters). Remaining: ${remaining} characters. To continue reading, call this tool again with offset=${nextOffset}.` - - return { slice, meta } -} - -export function createContentTools(): BrowserToolDefinition[] { - return [ - { - name: 'browser_get_text', - description: - 'Extract visible text from the page or a specific element. Supports offset/limit pagination to avoid overly long outputs.', - schema: ContentArgsSchema, - handler: async (args, context) => { - const parsed = ContentArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - - const text = await tab.getInnerText(parsed.selector) - const { slice, meta } = paginateText(text, parsed.offset, parsed.limit) - - const content: { type: 'text'; text: string }[] = [] - - if (!slice && !meta) { - content.push({ type: 'text', text: '(no text found)' }) - } else { - if (meta) { - content.push({ - type: 'text', - text: `[pagination] ${meta}` - }) - } - if (slice) { - content.push({ - type: 'text', - text: slice - }) - } - } - - return { content } - }, - annotations: { - readOnlyHint: true - } - }, - { - name: 'browser_get_markdown', - description: - 'Extract the page content as Markdown. Supports offset/limit pagination to avoid overly long outputs.', - schema: ContentArgsSchema, - handler: async (args, context) => { - const parsed = ContentArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - - await tab.waitForNetworkIdle() - const html = await tab.getHtml(parsed.selector) - const markdown = html ? turndown.turndown(html) : '' - const { slice, meta } = paginateText(markdown, parsed.offset, parsed.limit) - - const content: { type: 'text'; text: string }[] = [] - - if (!slice && !meta) { - content.push({ type: 'text', text: '(no content found)' }) - } else { - if (meta) { - content.push({ - type: 'text', - text: `[pagination] ${meta}` - }) - } - if (slice) { - content.push({ - type: 'text', - text: slice - }) - } - } - - return { content } - }, - annotations: { - readOnlyHint: true - } - }, - { - name: 'browser_read_links', - description: 'List hyperlinks on the current page.', - schema: LinksArgsSchema, - handler: async (args, context) => { - const parsed = LinksArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - - const links = await tab.getLinks(parsed.limit) - const formatted = - links.length === 0 - ? 'No links found.' - : links - .map((link, index) => `${index + 1}. ${link.text || '(no text)'} -> ${link.href}`) - .join('\n') - - return { - content: [ - { - type: 'text', - text: formatted - } - ] - } - }, - annotations: { - readOnlyHint: true - } - }, - { - name: 'browser_get_clickable_elements', - description: 'List clickable elements with simple selectors.', - schema: ClickableArgsSchema, - handler: async (args, context) => { - const parsed = ClickableArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - - const elements = await tab.getClickableElements(parsed.limit) - const formatted = - elements.length === 0 - ? 'No clickable elements found.' - : elements - .map( - (element, index) => - `${index + 1}. [${element.tag}] ${element.text || element.ariaLabel || '(no text)'} -> ${element.selector}` - ) - .join('\n') - - return { - content: [ - { - type: 'text', - text: formatted - } - ] - } - }, - annotations: { - readOnlyHint: true - } - } - ] -} diff --git a/src/main/presenter/browser/tools/download.ts b/src/main/presenter/browser/tools/download.ts deleted file mode 100644 index 0a0fa0fa5..000000000 --- a/src/main/presenter/browser/tools/download.ts +++ /dev/null @@ -1,89 +0,0 @@ -import { z } from 'zod' -import type { BrowserToolDefinition } from './types' - -const DownloadListArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)') -}) - -const DownloadFileArgsSchema = z.object({ - url: z.string().url().describe('File URL to download'), - savePath: z.string().optional().describe('Optional file path to save as'), - tabId: z - .string() - .optional() - .describe('Tab identifier to use for download context (defaults to active tab)') -}) - -export function createDownloadTools(): BrowserToolDefinition[] { - return [ - { - name: 'browser_get_download_list', - description: 'Get download items for the current browser session.', - schema: DownloadListArgsSchema, - handler: async () => { - // Note: Download list functionality needs to be implemented in YoBrowserPresenter - // For now, return empty list - const downloads: Array<{ - filename: string - state: string - receivedBytes: number - totalBytes: number - url: string - }> = [] - const formatted = - downloads.length === 0 - ? 'No downloads yet.' - : downloads - .map( - (item) => - `- ${item.filename} [${item.state}] ${item.receivedBytes}/${item.totalBytes} bytes (${item.url})` - ) - .join('\n') - - return { - content: [ - { - type: 'text', - text: formatted - } - ] - } - } - }, - { - name: 'browser_download_file', - description: - 'Download a file using the browser session, preserving cookies of the active tab.', - schema: DownloadFileArgsSchema, - handler: async (args, context) => { - const parsed = DownloadFileArgsSchema.parse(args) - if (!context.downloadFile) { - return { - content: [{ type: 'text', text: 'Download functionality not available' }], - isError: true - } - } - - try { - const download = await context.downloadFile(parsed.url, parsed.savePath) - return { - content: [ - { - type: 'text', - text: `Download started: ${parsed.url}\nStatus: ${download.status}\nID: ${download.id}${ - download.filePath ? `\nSave path: ${download.filePath}` : '' - }` - } - ] - } - } catch (error) { - const errorMsg = error instanceof Error ? error.message : String(error) - return { - content: [{ type: 'text', text: `Download failed: ${errorMsg}` }], - isError: true - } - } - } - } - ] -} diff --git a/src/main/presenter/browser/tools/navigate.ts b/src/main/presenter/browser/tools/navigate.ts deleted file mode 100644 index 2d6e95938..000000000 --- a/src/main/presenter/browser/tools/navigate.ts +++ /dev/null @@ -1,283 +0,0 @@ -import { z } from 'zod' -import type { BrowserToolDefinition, ToolResult } from './types' - -const NavigateArgsSchema = z.object({ - url: z.string().url().describe('URL to navigate to'), - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)'), - newTab: z.boolean().optional().default(false).describe('Open navigation in a new tab'), - reuse: z - .boolean() - .optional() - .default(true) - .describe('Reuse an existing tab that matches the domain when true') -}) - -const NavigationOnlyArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)') -}) - -export function createNavigateTools(): BrowserToolDefinition[] { - return [ - { - name: 'browser_navigate', - description: 'Navigate the browser to the specified URL.', - schema: NavigateArgsSchema, - handler: async (args, context) => { - const parsed = NavigateArgsSchema.parse(args) - - // Handle new tab creation - if (parsed.newTab) { - if (!context.createTab) { - return { - content: [{ type: 'text', text: 'Tab creation not available' }], - isError: true - } - } - const newTab = await context.createTab(parsed.url) - if (!newTab) { - return { - content: [{ type: 'text', text: 'Failed to create new tab' }], - isError: true - } - } - return { - content: [ - { - type: 'text', - text: `Opened new tab ${newTab.id} -> ${parsed.url}\nTitle: ${newTab.title || 'unknown'}` - } - ] - } - } - - // Handle reuse logic - if (parsed.reuse && !parsed.tabId) { - // Try to find a reusable tab by domain - const tabs = await context.listTabs?.() - if (tabs && tabs.length > 0) { - try { - const targetHost = new URL(parsed.url).hostname - const reusableTab = tabs.find((t) => { - try { - return new URL(t.url).hostname === targetHost - } catch { - return false - } - }) - if (reusableTab) { - const tab = await context.getTab(reusableTab.id) - if (tab) { - await tab.navigate(parsed.url) - await context.activateTab?.(reusableTab.id) - return { - content: [ - { - type: 'text', - text: `Reused tab ${reusableTab.id} and navigated to ${parsed.url}\nTitle: ${tab.title || 'unknown'}` - } - ] - } - } - } - } catch { - // Ignore URL parse errors, fall through to normal navigation - } - } - } - - // Normal navigation - let tab = parsed.tabId ? await context.getTab(parsed.tabId) : await context.getActiveTab() - - if (!tab) { - // Create a new tab if none exists - if (context.createTab) { - const newTab = await context.createTab(parsed.url) - if (newTab) { - // Add a small delay to ensure BrowserTab is fully initialized - // This is especially important on first call when browser window is just created - await new Promise((resolve) => setTimeout(resolve, 100)) - // Get the BrowserTab object and wait for navigation to complete - // Note: createTab already started navigation via tabPresenter.createTab, - // so we just need to wait for it to complete - const browserTab = await context.getTab(newTab.id) - if (browserTab) { - try { - // createTab already started navigation via tabPresenter.createTab - // If tab is loading, wait for it to complete instead of calling navigate again - if (browserTab.contents.isLoading()) { - // Wait for current navigation to complete - await new Promise((resolve, reject) => { - let timeout: ReturnType - let onStopLoading: () => void - let onFailLoad: ( - _event: unknown, - errorCode: number, - errorDescription: string - ) => void - - const cleanup = () => { - clearTimeout(timeout) - browserTab.contents.removeListener('did-stop-loading', onStopLoading) - browserTab.contents.removeListener('did-fail-load', onFailLoad) - } - - onStopLoading = () => { - cleanup() - resolve() - } - - onFailLoad = (_event, errorCode, errorDescription) => { - cleanup() - reject(new Error(`Navigation failed ${errorCode}: ${errorDescription}`)) - } - - timeout = setTimeout(() => { - cleanup() - reject(new Error('Timeout waiting for page load')) - }, 15000) - - browserTab.contents.once('did-stop-loading', onStopLoading) - browserTab.contents.once('did-fail-load', onFailLoad) - }) - - // Check if URL matches after loading - const finalUrl = browserTab.contents.getURL() - if (finalUrl !== parsed.url) { - // URL doesn't match, need to navigate - await browserTab.navigate(parsed.url, 15000) // 15 second timeout - } - } else { - // Tab is not loading, check if URL matches - const currentUrl = browserTab.contents.getURL() - if (currentUrl !== parsed.url) { - // URL doesn't match, need to navigate - await browserTab.navigate(parsed.url, 15000) // 15 second timeout - } - } - - const result: ToolResult = { - content: [ - { - type: 'text' as const, - text: `Created new tab and navigated to ${parsed.url}\nTitle: ${browserTab.title || 'unknown'}` - } - ] - } - return result - } catch (error) { - console.error('[browser_navigate] Failed to navigate newly created tab:', error) - const errorMessage = error instanceof Error ? error.message : String(error) - const result: ToolResult = { - content: [ - { - type: 'text' as const, - text: `Failed to navigate new tab ${browserTab.tabId} to ${parsed.url}\nError: ${errorMessage}\nTitle: ${browserTab.title || 'unknown'}` - } - ], - isError: true - } - return result - } - } - // Fallback if getTab fails - const result: ToolResult = { - content: [ - { - type: 'text' as const, - text: `Created new tab and navigated to ${parsed.url}\nTitle: ${newTab.title || 'unknown'}` - } - ] - } - return result - } - } - const errorResult: ToolResult = { - content: [ - { - type: 'text' as const, - text: 'No active tab available' - } - ], - isError: true - } - return errorResult - } - - await tab.navigate(parsed.url) - const result: ToolResult = { - content: [ - { - type: 'text' as const, - text: `Navigated to ${parsed.url}\nTitle: ${tab.title || 'unknown'}` - } - ] - } - return result - } - }, - { - name: 'browser_go_back', - description: 'Go back to the previous page in the current tab.', - schema: NavigationOnlyArgsSchema, - handler: async (args, context) => { - const parsed = NavigationOnlyArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - - if (!tab) { - return { - content: [ - { - type: 'text', - text: `Tab ${tabId} not found` - } - ], - isError: true - } - } - - await tab.goBack() - return { - content: [ - { - type: 'text', - text: `Went back. Current URL: ${tab.url || 'about:blank'}` - } - ] - } - } - }, - { - name: 'browser_go_forward', - description: 'Go forward to the next page in the current tab.', - schema: NavigationOnlyArgsSchema, - handler: async (args, context) => { - const parsed = NavigationOnlyArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - - if (!tab) { - return { - content: [ - { - type: 'text', - text: `Tab ${tabId} not found` - } - ], - isError: true - } - } - - await tab.goForward() - return { - content: [ - { - type: 'text', - text: `Went forward. Current URL: ${tab.url || 'about:blank'}` - } - ] - } - } - } - ] -} diff --git a/src/main/presenter/browser/tools/screenshot.ts b/src/main/presenter/browser/tools/screenshot.ts deleted file mode 100644 index a4de7c1b6..000000000 --- a/src/main/presenter/browser/tools/screenshot.ts +++ /dev/null @@ -1,48 +0,0 @@ -import { z } from 'zod' -import type { BrowserToolDefinition } from './types' - -const ScreenshotArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)'), - selector: z.string().optional().describe('Capture only the element matching this selector'), - fullPage: z.boolean().optional().default(false).describe('Capture the full page'), - highlightSelectors: z - .array(z.string()) - .optional() - .describe('Selectors to highlight before capture') -}) - -export function createScreenshotTools(): BrowserToolDefinition[] { - return [ - { - name: 'browser_screenshot', - description: 'Capture a screenshot of the current page or a specific element.', - schema: ScreenshotArgsSchema, - handler: async (args, context) => { - const parsed = ScreenshotArgsSchema.parse(args) - const tabId = await context.resolveTabId(parsed) - const tab = await context.getTab(tabId) - if (!tab) { - return { - content: [{ type: 'text', text: `Tab ${tabId} not found` }], - isError: true - } - } - - const base64 = await tab.takeScreenshot({ - selector: parsed.selector, - fullPage: parsed.fullPage, - highlightSelectors: parsed.highlightSelectors - }) - - return { - content: [ - { - type: 'text', - text: `data:image/png;base64,${base64}` - } - ] - } - } - } - ] -} diff --git a/src/main/presenter/browser/tools/tabs.ts b/src/main/presenter/browser/tools/tabs.ts deleted file mode 100644 index 5cdec59af..000000000 --- a/src/main/presenter/browser/tools/tabs.ts +++ /dev/null @@ -1,152 +0,0 @@ -import { z } from 'zod' -import type { BrowserToolDefinition } from './types' - -const BaseArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier (defaults to active tab)') -}) - -const NewTabArgsSchema = z.object({ - url: z.string().url().optional().describe('Optional URL to open in the new tab') -}) - -const SwitchTabArgsSchema = z.object({ - tabId: z.string().describe('Tab identifier to activate') -}) - -const CloseTabArgsSchema = z.object({ - tabId: z.string().optional().describe('Tab identifier to close (defaults to active tab)') -}) - -export function createTabTools(): BrowserToolDefinition[] { - return [ - { - name: 'browser_new_tab', - description: 'Open a new browser tab (window) for the session.', - schema: NewTabArgsSchema, - handler: async (args, context) => { - const parsed = NewTabArgsSchema.parse(args) - if (!context.createTab) { - return { - content: [{ type: 'text', text: 'Tab creation not available' }], - isError: true - } - } - const tab = await context.createTab(parsed.url) - if (!tab) { - return { - content: [{ type: 'text', text: 'Failed to create new tab' }], - isError: true - } - } - return { - content: [ - { - type: 'text', - text: `Opened new tab ${tab.id}${parsed.url ? ` -> ${parsed.url}` : ''}` - } - ] - } - } - }, - { - name: 'browser_tab_list', - description: 'List all tabs (windows) for the current session.', - schema: BaseArgsSchema, - handler: async (_args, context) => { - if (!context.listTabs) { - return { - content: [{ type: 'text', text: 'Tab list not available' }], - isError: true - } - } - const tabs = await context.listTabs() - const formatted = - tabs.length === 0 - ? 'No tabs open.' - : tabs - .map( - (tab) => - `${tab.isActive ? '*' : ' '} Tab ${tab.id}: ${tab.title || 'Untitled'} (${tab.url || 'about:blank'})` - ) - .join('\n') - - return { - content: [ - { - type: 'text', - text: formatted - } - ] - } - } - }, - { - name: 'browser_switch_tab', - description: 'Activate a specific tab (window) by its id.', - schema: SwitchTabArgsSchema, - handler: async (args, context) => { - const parsed = SwitchTabArgsSchema.parse(args) - if (!context.activateTab) { - return { - content: [{ type: 'text', text: 'Tab activation not available' }], - isError: true - } - } - const tab = await context.getTab(parsed.tabId) - if (!tab) { - return { - content: [ - { - type: 'text', - text: `Tab ${parsed.tabId} not found` - } - ], - isError: true - } - } - - await context.activateTab(parsed.tabId) - return { - content: [ - { - type: 'text', - text: `Switched to tab ${parsed.tabId}: ${tab.title || 'Untitled'}` - } - ] - } - } - }, - { - name: 'browser_close_tab', - description: 'Close a tab (window). Defaults to the active tab.', - schema: CloseTabArgsSchema, - handler: async (args, context) => { - const parsed = CloseTabArgsSchema.parse(args) - if (!context.closeTab) { - return { - content: [{ type: 'text', text: 'Tab closing not available' }], - isError: true - } - } - const tabId = parsed.tabId - ? parsed.tabId - : (await context.getActiveTab())?.tabId || (await context.resolveTabId(undefined)) - if (!tabId) { - return { - content: [{ type: 'text', text: 'No active tab to close' }], - isError: true - } - } - await context.closeTab(tabId) - return { - content: [ - { - type: 'text', - text: `Closed tab ${tabId}` - } - ] - } - } - } - ] -} diff --git a/src/main/presenter/browser/tools/types.ts b/src/main/presenter/browser/tools/types.ts deleted file mode 100644 index eba58504d..000000000 --- a/src/main/presenter/browser/tools/types.ts +++ /dev/null @@ -1,48 +0,0 @@ -import type { RequestHandlerExtra } from '@modelcontextprotocol/sdk/shared/protocol.js' -import type { CallToolResult, Notification, Request } from '@modelcontextprotocol/sdk/types.js' -import type { ZodTypeAny } from 'zod' -import type { BrowserTab } from '../BrowserTab' - -export type ToolResult = CallToolResult - -export interface BrowserToolContext { - getTab: (tabId?: string) => Promise - getActiveTab: () => Promise - resolveTabId: ( - args: { tabId?: string } | undefined, - extra?: RequestHandlerExtra - ) => Promise - // Tab management methods - createTab?: (url?: string) => Promise<{ id: string; url: string; title: string } | null> - listTabs?: () => Promise> - activateTab?: (tabId: string) => Promise - closeTab?: (tabId: string) => Promise - // Download methods - downloadFile?: ( - url: string, - savePath?: string - ) => Promise<{ - id: string - url: string - filePath?: string - status: string - }> -} - -export interface BrowserToolDefinition { - name: string - description: string - schema: ZodTypeAny - handler: ( - args: any, - context: BrowserToolContext, - extra: RequestHandlerExtra - ) => Promise - annotations?: { - title?: string - readOnlyHint?: boolean - destructiveHint?: boolean - idempotentHint?: boolean - openWorldHint?: boolean - } -} diff --git a/src/main/presenter/toolPresenter/index.ts b/src/main/presenter/toolPresenter/index.ts index 8703b2b10..4d7975806 100644 --- a/src/main/presenter/toolPresenter/index.ts +++ b/src/main/presenter/toolPresenter/index.ts @@ -73,7 +73,6 @@ export class ToolPresenter implements IToolPresenter { // Initialize or update AgentToolManager if workspace path changed if (!this.agentToolManager) { this.agentToolManager = new AgentToolManager({ - yoBrowserPresenter: this.options.yoBrowserPresenter, agentWorkspacePath, configPresenter: this.options.configPresenter, commandPermissionHandler: this.options.commandPermissionHandler diff --git a/src/renderer/src/components/workspace/WorkspaceView.vue b/src/renderer/src/components/workspace/WorkspaceView.vue index d66fbfe13..c23b72f10 100644 --- a/src/renderer/src/components/workspace/WorkspaceView.vue +++ b/src/renderer/src/components/workspace/WorkspaceView.vue @@ -42,6 +42,7 @@ import { computed } from 'vue' import { Icon } from '@iconify/vue' import { useI18n } from 'vue-i18n' import { useWorkspaceStore } from '@/stores/workspace' +import { useYoBrowserStore } from '@/stores/yoBrowser' import { useChatMode } from '@/components/chat-input/composables/useChatMode' import WorkspacePlan from './WorkspacePlan.vue' import WorkspaceFiles from './WorkspaceFiles.vue' @@ -50,8 +51,11 @@ import WorkspaceBrowserTabs from './WorkspaceBrowserTabs.vue' const { t } = useI18n() const store = useWorkspaceStore() +const yoBrowserStore = useYoBrowserStore() const chatMode = useChatMode() -const showBrowserTabs = computed(() => chatMode.currentMode.value === 'agent') +const showBrowserTabs = computed( + () => chatMode.currentMode.value === 'agent' && yoBrowserStore.tabCount > 0 +) const i18nPrefix = computed(() => 'chat.workspace') const titleKey = computed(() => `${i18nPrefix.value}.title`) diff --git a/src/shared/types/presenters/legacy.presenters.d.ts b/src/shared/types/presenters/legacy.presenters.d.ts index 52de26233..50aaf4ad9 100644 --- a/src/shared/types/presenters/legacy.presenters.d.ts +++ b/src/shared/types/presenters/legacy.presenters.d.ts @@ -217,12 +217,14 @@ export interface IYoBrowserPresenter { canGoForward: boolean }> getTabIdByViewId(viewId: number): Promise - getToolDefinitions(supportsVision: boolean): Promise - callTool(toolName: string, params: Record): Promise captureScreenshot(tabId: string, options?: ScreenshotOptions): Promise startDownload(url: string, savePath?: string): Promise clearSandboxData(): Promise shutdown(): Promise + readonly toolHandler: { + getToolDefinitions(): any[] + callTool(toolName: string, args: Record): Promise + } } export interface IWindowPresenter { diff --git a/test/main/presenter/agentPresenter/agentToolManagerSettings.test.ts b/test/main/presenter/agentPresenter/agentToolManagerSettings.test.ts index d2a09741f..d06128922 100644 --- a/test/main/presenter/agentPresenter/agentToolManagerSettings.test.ts +++ b/test/main/presenter/agentPresenter/agentToolManagerSettings.test.ts @@ -18,6 +18,11 @@ vi.mock('@/presenter', () => ({ getActiveSkills: vi.fn(), getActiveSkillsAllowedTools: vi.fn() }, + yoBrowserPresenter: { + toolHandler: { + getToolDefinitions: vi.fn().mockReturnValue([]) + } + }, sessionPresenter: {}, windowPresenter: {} } @@ -27,9 +32,6 @@ describe('AgentToolManager DeepChat settings tool gating', () => { const configPresenter = { getSkillsEnabled: () => true } as any - const yoBrowserPresenter = { - getToolDefinitions: vi.fn().mockResolvedValue([]) - } as any beforeEach(() => { vi.clearAllMocks() @@ -40,7 +42,6 @@ describe('AgentToolManager DeepChat settings tool gating', () => { ;(presenter.skillPresenter.getActiveSkillsAllowedTools as any).mockResolvedValue([]) const manager = new AgentToolManager({ - yoBrowserPresenter, agentWorkspacePath: null, configPresenter }) @@ -64,7 +65,6 @@ describe('AgentToolManager DeepChat settings tool gating', () => { ]) const manager = new AgentToolManager({ - yoBrowserPresenter, agentWorkspacePath: null, configPresenter }) diff --git a/test/main/presenter/agentPresenter/loop/toolCallProcessor.test.ts b/test/main/presenter/agentPresenter/loop/toolCallProcessor.test.ts index a2a4168a0..ad4995e4b 100644 --- a/test/main/presenter/agentPresenter/loop/toolCallProcessor.test.ts +++ b/test/main/presenter/agentPresenter/loop/toolCallProcessor.test.ts @@ -18,8 +18,8 @@ describe('ToolCallProcessor tool output offload', () => { const toolDefinition = { type: 'function', function: { - name: 'mock_tool', - description: 'mock tool', + name: 'execute_command', + description: 'execute command', parameters: { type: 'object', properties: {} @@ -43,7 +43,7 @@ describe('ToolCallProcessor tool output offload', () => { }) it('offloads large tool responses and returns stub content', async () => { - const longOutput = 'x'.repeat(3001) + const longOutput = 'x'.repeat(5001) const rawData = { content: longOutput } as MCPToolResponse const processor = new ToolCallProcessor({ getAllToolDefinitions: async () => [toolDefinition], @@ -57,7 +57,7 @@ describe('ToolCallProcessor tool output offload', () => { const events: any[] = [] for await (const event of processor.process({ eventId: 'event-1', - toolCalls: [{ id: 'tool-1', name: 'mock_tool', arguments: '{}' }], + toolCalls: [{ id: 'tool-1', name: 'execute_command', arguments: '{}' }], enabledMcpTools: [], conversationMessages, modelConfig,