
feat: agent mode tool-calling via Vercel AI SDK #451

Open
gabrielste1n wants to merge 15 commits into main from feat/agent-tool-calling

Conversation

@gabrielste1n
Collaborator

Summary

  • Migrate agent mode streaming from manual fetch/SSE to Vercel AI SDK streamText() with stepCountIs() for multi-step tool calling
  • Add unified AI provider factory supporting OpenAI, Groq, Anthropic, Gemini, and custom endpoints
  • Add tool registry with search notes, web search, clipboard copy, and calendar tools
  • Remove manual AgentLoop (185 lines) in favor of AI SDK's built-in step management

Changes

New files:

  • src/services/ai/providers.ts — getAIModel() factory wrapping all AI SDK providers
  • src/services/tools/ — ToolRegistry, searchNotesTool, webSearchTool, calendarTool, clipboardTool

Modified:

  • src/services/ReasoningService.ts — Add processTextStreamingAI() with tool support, AgentStreamChunk type with content/tool_calls/tool_result/done variants
  • src/components/AgentOverlay.tsx — 3 streaming paths (tools/cloud/BYOK), mounted ref guard, AudioManager cleanup, proper tool-result display
  • src/services/tools/ToolRegistry.ts — toAISDKFormat() with error handling via try/catch

Deleted:

  • src/services/AgentLoop.ts — Replaced by AI SDK stepCountIs()

Details:

  • Groq models with disableThinking flag pass providerOptions: { groq: { reasoningEffort: "none" } }
  • Tool execution errors return { error } to AI SDK instead of throwing
  • Stream loops break on unmount via mountedRef to prevent state updates on unmounted component
  • AudioManager properly cleaned up on overlay unmount
  • All 5 AI SDK packages added: ai, @ai-sdk/openai, @ai-sdk/groq, @ai-sdk/anthropic, @ai-sdk/google

Test plan

  • Verify agent chat works with OpenAI, Groq, Anthropic, Gemini providers
  • Verify tool calling works (search notes, web search, clipboard, calendar)
  • Verify tool results display in UI (not hardcoded "Done")
  • Verify Groq Qwen3 32B works without thinking mode errors
  • Verify cloud agent mode still works (IPC path unchanged)
  • Verify local model fallback still works
  • Verify closing overlay mid-stream doesn't cause React warnings
  • Verify custom OpenAI-compatible endpoints work

Transform Agent Mode from text-only chat into a full agentic experience
with native tool/function calling. The agent can now search notes, copy
to clipboard, search the web (via cloud API), and check calendar events.

- Tool registry with OpenAI-compatible function calling format
- ReAct execution loop with parallel read-only tool execution
- SSE streaming with incremental tool call argument accumulation
- Inline tool execution UI (compact pills with status animations)
- Text input field alongside voice input for tool-heavy workflows
- Dynamic system prompt with tool usage instructions
- IPC handler for web search via OpenWhispr cloud API
- Database migration for tool message metadata
- i18n strings for all 10 supported locales

Replace manual SSE parsing and AgentLoop with AI SDK streamText +
stepCountIs for tool-calling agent mode. Add unified provider factory
supporting OpenAI, Groq, Anthropic, Gemini, and custom endpoints.

- Add ai, @ai-sdk/openai, @ai-sdk/groq, @ai-sdk/anthropic, @ai-sdk/google
- Add src/services/ai/providers.ts with getAIModel factory
- Add ToolRegistry.toAISDKFormat() using jsonSchema wrapper
- Add ReasoningService.processTextStreamingAI() with full tool support
- Remove AgentLoop.ts (replaced by stepCountIs)
- Remove dead toOpenAIFormat/OpenAIFunctionTool from ToolRegistry
- Simplify AgentOverlay to 3 streaming paths (tools/cloud/BYOK)
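The error-handling behavior this commit adds to toAISDKFormat() can be sketched in isolation. This is a simplified stand-in (the real code also wraps each tool's schema with the AI SDK's jsonSchema helper, omitted here); the key behavior is the try/catch execute wrapper returning { error } instead of throwing:

```typescript
// Illustrative types; the real ToolRegistry's shape is an assumption here.
interface RegisteredTool {
  name: string;
  description: string;
  execute: (args: unknown) => Promise<unknown>;
}

// Wrap a tool's execute so failures are surfaced to the LLM as data
// ({ error }) rather than as exceptions that would abort the agent step.
function wrapExecute(tool: RegisteredTool) {
  return async (args: unknown): Promise<unknown> => {
    try {
      return await tool.execute(args);
    } catch (err) {
      return { error: err instanceof Error ? err.message : String(err) };
    }
  };
}
```

Returning the error as a result lets the model read the failure and retry or explain it, instead of the whole multi-step run dying.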

…isableThinking

Uses AI SDK's providerOptions API to send reasoning_effort: "none" to
Groq for models flagged with disableThinking in the model registry.
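A hedged sketch of how the providerOptions object might be built from the model registry flag. The registry shape (disableThinking on a model entry) follows the PR description; the helper name is illustrative:

```typescript
// Hypothetical model-registry entry; only the disableThinking flag and the
// resulting providerOptions shape are taken from the PR description.
interface ModelEntry {
  id: string;
  provider: "groq" | "openai" | "anthropic" | "google";
  disableThinking?: boolean;
}

// Build the providerOptions passed to streamText(); Groq translates
// reasoningEffort: "none" into reasoning_effort: "none" on the wire.
function buildProviderOptions(model: ModelEntry): Record<string, unknown> | undefined {
  if (model.provider === "groq" && model.disableThinking) {
    return { groq: { reasoningEffort: "none" } };
  }
  return undefined;
}
```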

- Add mounted ref to guard state updates after overlay unmount
- Add AudioManager.cleanup() call on unmount
- Handle tool-result stream chunks to show actual results in UI
- Set tool status to "executing" until result arrives (fix state thrashing)
- Add try/catch in ToolRegistry.toAISDKFormat() execute wrapper
- Extend AgentStreamChunk type with tool_result variant
- Add success field to IPC web-search response for contract consistency
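The mounted-ref guard above can be shown framework-free: the consumer loop checks a mutable flag each iteration and breaks as soon as it flips, so no state update lands after unmount. In the actual component the flag lives in a React ref; this standalone sketch uses a plain object:

```typescript
// Consume a stream until it ends or the mounted flag flips to false.
// In React, `mounted` would be a useRef set to false in the effect cleanup.
async function consumeStream<T>(
  stream: AsyncIterable<T>,
  mounted: { current: boolean },
  onChunk: (chunk: T) => void
): Promise<void> {
  for await (const chunk of stream) {
    if (!mounted.current) break; // overlay closed mid-stream
    onChunk(chunk);
  }
}
```

Note the StrictMode caveat the later commit fixes: React 18 StrictMode mounts, unmounts, and remounts effects in development, so a ref set to false on the first cleanup must be reset to true on the next mount or the loop exits immediately and renders nothing.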

- Add get_note tool to fetch full note content by ID
- Add create_note tool with folder resolution and cloud sync
- Add update_note tool for title, content, and folder changes
- Add shared resolveFolderId utility for folder name lookup
- Include note ID in search_notes results for cross-tool reference
- Register tools and add system prompt instructions
- Add translation keys for all 10 locales

- Enable tool calling for cloud agent mode (clipboard, notes, web search, calendar)
- Implement NDJSON streaming via IPC batch approach for reliable event delivery
- Add multi-step tool-calling loop with AI SDK v6 message format
- Fix mountedRef StrictMode bug causing empty renders
- Return actual tool result data to LLM instead of just display text
- Extract MAX_TOOL_STEPS constant, remove redundant comments
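NDJSON delivery over IPC needs a small buffering parser, because a chunk boundary can fall mid-line. A minimal sketch (names are illustrative, not the PR's actual identifiers):

```typescript
// Buffer incoming text until a newline completes each JSON record,
// then parse and emit it. Incomplete trailing data stays in the buffer.
function createNDJSONParser(onRecord: (value: unknown) => void) {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    let newline: number;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line.length > 0) onRecord(JSON.parse(line));
    }
  };
}
```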

When cloud backup is enabled and the user is signed in, the search_notes
agent tool now uses the cloud hybrid search (pgvector + FTS) instead
of local SQLite FTS5 keyword search. Falls back to local search
transparently on cloud failure.
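The transparent fallback described above is essentially a try/catch around the cloud call. A sketch with the two search functions as placeholders for the real implementations:

```typescript
// Try cloud hybrid search (pgvector + FTS) first; on any failure,
// degrade silently to the local SQLite FTS5 keyword search.
async function searchNotes(
  query: string,
  cloudSearch: (q: string) => Promise<string[]>,
  localSearch: (q: string) => Promise<string[]>
): Promise<string[]> {
  try {
    return await cloudSearch(query);
  } catch {
    return localSearch(query);
  }
}
```

Injecting the two search functions keeps the fallback policy testable without network or database access.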

Replace the buffered IPC middleman with direct fetch from the renderer
to the API. Text now streams token-by-token instead of arriving all at
once after the full response completes. Adds AbortController support for
instant cancellation on unmount or user stop.

The renderer's fetch() can't access auth cookies stored on the Neon Auth
domain. Add a get-session-cookies IPC handler to retrieve them from the
main process cookie jar and forward as an explicit Cookie header.

Browser fetch() forbids setting the Cookie header, so direct renderer-to-
API streaming can't authenticate. Switch to event-based IPC: main process
reads the API stream and forwards each chunk via webContents.send() as it
arrives. This matches the pattern used for AssemblyAI/Deepgram streaming.

Replace basic tool pills with step-based visualization showing the full
tool lifecycle: shimmer accent while executing, checkmark pop on
completion, expandable detail, and contextual clipboard confirmation.

- ToolCallStep: left-border accent, per-tool icons, shimmer animation
- Clipboard: inline "Copied to your clipboard" with green check
- Input bar: thinking shimmer bar, tool icon in executing state
- Empty state: mic icon with dual-line CTA
- Title bar: shows agent name, softer shadow
- Tighter spacing throughout (gaps, heights, bubble widths)
- New CSS: tool-step-shimmer, tool-check-pop, tool-status-sweep
- i18n: copiedToClipboard + orType in all 10 locales

…ults

The executeToolCall callback was returning raw data (full JSON) which was
displayed in the tool step UI. Now returns { data, displayText } so the
LLM gets structured data while the UI shows a human-readable summary.

Also truncates Exa web search article text to 500 chars per result to
prevent massive payloads.
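A sketch of the { data, displayText } split and the 500-character truncation this commit describes. Names and the exact summary text are illustrative:

```typescript
// Tool results carry structured data for the LLM and a separate
// human-readable summary for the tool step UI.
interface ToolResult {
  data: unknown;        // structured payload fed back to the model
  displayText: string;  // short summary shown in the step pill
}

const MAX_ARTICLE_CHARS = 500;

// Cap web search article text to keep tool-result payloads small.
function truncateArticle(text: string): string {
  return text.length > MAX_ARTICLE_CHARS
    ? text.slice(0, MAX_ARTICLE_CHARS) + "…"
    : text;
}

function makeSearchResult(results: { title: string; text: string }[]): ToolResult {
  const data = results.map((r) => ({ title: r.title, text: truncateArticle(r.text) }));
  return { data, displayText: `Found ${results.length} result(s)` };
}
```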

- Increase MAX_TOOL_STEPS from 5 to 20 to prevent agent from getting
  stuck on multi-step workflows
- Add metadata field to ToolCallInfo so tool results can carry structured
  data (e.g. note ID) to the UI without mixing it into displayText
- Created/updated/fetched notes show as clickable steps with a primary
  accent — clicking opens the note in the control panel
- New agent-open-note IPC handler navigates the control panel to the note
- Note cards: when create/update/get_note completes, render a compact
  clickable card at the bottom of the message bubble with title, icon,
  and "Open note" label. Clicking opens the note in the control panel.
- Input bar: replace generic loading dots with dictation-panel-inspired
  indicators — pulsing blue circle for listening, accent wave bars for
  transcribing, shimmer sweep for thinking.
- Fix duplicate copiedToClipboard keys in all locale files.

…mode

The agent-open-note IPC was reusing navigate-to-meeting-note which
activates meeting recording mode. Add a dedicated navigate-to-note event
that only sets the active note and view without starting the recorder.