
feat: agent mode tool-calling via Vercel AI SDK #451

Open
gabrielste1n wants to merge 15 commits into main from feat/agent-tool-calling

Conversation

@gabrielste1n
Collaborator

Summary

  • Migrate agent mode streaming from manual fetch/SSE to Vercel AI SDK streamText() with stepCountIs() for multi-step tool calling
  • Add unified AI provider factory supporting OpenAI, Groq, Anthropic, Gemini, and custom endpoints
  • Add tool registry with search notes, web search, clipboard copy, and calendar tools
  • Remove manual AgentLoop (185 lines) in favor of AI SDK's built-in step management

Changes

New files:

  • src/services/ai/providers.ts — getAIModel() factory wrapping all AI SDK providers
  • src/services/tools/ — ToolRegistry, searchNotesTool, webSearchTool, calendarTool, clipboardTool

Modified:

  • src/services/ReasoningService.ts — Add processTextStreamingAI() with tool support, AgentStreamChunk type with content/tool_calls/tool_result/done variants
  • src/components/AgentOverlay.tsx — 3 streaming paths (tools/cloud/BYOK), mounted ref guard, AudioManager cleanup, proper tool-result display
  • src/services/tools/ToolRegistry.ts — toAISDKFormat() with error handling via try/catch

Deleted:

  • src/services/AgentLoop.ts — Replaced by AI SDK stepCountIs()

Details:

  • Groq models with disableThinking flag pass providerOptions: { groq: { reasoningEffort: "none" } }
  • Tool execution errors return { error } to AI SDK instead of throwing
  • Stream loops break on unmount via mountedRef to prevent state updates on unmounted component
  • AudioManager properly cleaned up on overlay unmount
  • All 5 AI SDK packages added: ai, @ai-sdk/openai, @ai-sdk/groq, @ai-sdk/anthropic, @ai-sdk/google

Test plan

  • Verify agent chat works with OpenAI, Groq, Anthropic, Gemini providers
  • Verify tool calling works (search notes, web search, clipboard, calendar)
  • Verify tool results display in UI (not hardcoded "Done")
  • Verify Groq Qwen3 32B works without thinking mode errors
  • Verify cloud agent mode still works (IPC path unchanged)
  • Verify local model fallback still works
  • Verify closing overlay mid-stream doesn't cause React warnings
  • Verify custom OpenAI-compatible endpoints work

Transform Agent Mode from text-only chat into a full agentic experience
with native tool/function calling. The agent can now search notes, copy
to clipboard, search the web (via cloud API), and check calendar events.

- Tool registry with OpenAI-compatible function calling format
- ReAct execution loop with parallel read-only tool execution
- SSE streaming with incremental tool call argument accumulation
- Inline tool execution UI (compact pills with status animations)
- Text input field alongside voice input for tool-heavy workflows
- Dynamic system prompt with tool usage instructions
- IPC handler for web search via OpenWhispr cloud API
- Database migration for tool message metadata
- i18n strings for all 10 supported locales

Replace manual SSE parsing and AgentLoop with AI SDK streamText +
stepCountIs for tool-calling agent mode. Add unified provider factory
supporting OpenAI, Groq, Anthropic, Gemini, and custom endpoints.

- Add ai, @ai-sdk/openai, @ai-sdk/groq, @ai-sdk/anthropic, @ai-sdk/google
- Add src/services/ai/providers.ts with getAIModel factory
- Add ToolRegistry.toAISDKFormat() using jsonSchema wrapper
- Add ReasoningService.processTextStreamingAI() with full tool support
- Remove AgentLoop.ts (replaced by stepCountIs)
- Remove dead toOpenAIFormat/OpenAIFunctionTool from ToolRegistry
- Simplify AgentOverlay to 3 streaming paths (tools/cloud/BYOK)
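The error-handling behavior this commit adds to toAISDKFormat() can be sketched in isolation. This is a simplified stand-in (the real code also wraps each tool's schema with the AI SDK's jsonSchema helper, omitted here); the key behavior is the try/catch execute wrapper returning { error } instead of throwing:

```typescript
// Illustrative types; the real ToolRegistry's shape is an assumption here.
interface RegisteredTool {
  name: string;
  description: string;
  execute: (args: unknown) => Promise<unknown>;
}

// Wrap a tool's execute so failures are surfaced to the LLM as data
// ({ error }) rather than as exceptions that would abort the agent step.
function wrapExecute(tool: RegisteredTool) {
  return async (args: unknown): Promise<unknown> => {
    try {
      return await tool.execute(args);
    } catch (err) {
      return { error: err instanceof Error ? err.message : String(err) };
    }
  };
}
```

Returning the error as a result lets the model read the failure and retry or explain it, instead of the whole multi-step run dying.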

…isableThinking

Uses AI SDK's providerOptions API to send reasoning_effort: "none" to
Groq for models flagged with disableThinking in the model registry.
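A hedged sketch of how the providerOptions object might be built from the model registry flag. The registry shape (disableThinking on a model entry) follows the PR description; the helper name is illustrative:

```typescript
// Hypothetical model-registry entry; only the disableThinking flag and the
// resulting providerOptions shape are taken from the PR description.
interface ModelEntry {
  id: string;
  provider: "groq" | "openai" | "anthropic" | "google";
  disableThinking?: boolean;
}

// Build the providerOptions passed to streamText(); Groq translates
// reasoningEffort: "none" into reasoning_effort: "none" on the wire.
function buildProviderOptions(model: ModelEntry): Record<string, unknown> | undefined {
  if (model.provider === "groq" && model.disableThinking) {
    return { groq: { reasoningEffort: "none" } };
  }
  return undefined;
}
```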

- Add mounted ref to guard state updates after overlay unmount
- Add AudioManager.cleanup() call on unmount
- Handle tool-result stream chunks to show actual results in UI
- Set tool status to "executing" until result arrives (fix state thrashing)
- Add try/catch in ToolRegistry.toAISDKFormat() execute wrapper
- Extend AgentStreamChunk type with tool_result variant
- Add success field to IPC web-search response for contract consistency
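The mounted-ref guard above can be shown framework-free: the consumer loop checks a mutable flag each iteration and breaks as soon as it flips, so no state update lands after unmount. In the actual component the flag lives in a React ref; this standalone sketch uses a plain object:

```typescript
// Consume a stream until it ends or the mounted flag flips to false.
// In React, `mounted` would be a useRef set to false in the effect cleanup.
async function consumeStream<T>(
  stream: AsyncIterable<T>,
  mounted: { current: boolean },
  onChunk: (chunk: T) => void
): Promise<void> {
  for await (const chunk of stream) {
    if (!mounted.current) break; // overlay closed mid-stream
    onChunk(chunk);
  }
}
```

Note the StrictMode caveat the later commit fixes: React 18 StrictMode mounts, unmounts, and remounts effects in development, so a ref set to false on the first cleanup must be reset to true on the next mount or the loop exits immediately and renders nothing.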

- Add get_note tool to fetch full note content by ID
- Add create_note tool with folder resolution and cloud sync
- Add update_note tool for title, content, and folder changes
- Add shared resolveFolderId utility for folder name lookup
- Include note ID in search_notes results for cross-tool reference
- Register tools and add system prompt instructions
- Add translation keys for all 10 locales

- Enable tool calling for cloud agent mode (clipboard, notes, web search, calendar)
- Implement NDJSON streaming via IPC batch approach for reliable event delivery
- Add multi-step tool-calling loop with AI SDK v6 message format
- Fix mountedRef StrictMode bug causing empty renders
- Return actual tool result data to LLM instead of just display text
- Extract MAX_TOOL_STEPS constant, remove redundant comments
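NDJSON delivery over IPC needs a small buffering parser, because a chunk boundary can fall mid-line. A minimal sketch (names are illustrative, not the PR's actual identifiers):

```typescript
// Buffer incoming text until a newline completes each JSON record,
// then parse and emit it. Incomplete trailing data stays in the buffer.
function createNDJSONParser(onRecord: (value: unknown) => void) {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    let newline: number;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line.length > 0) onRecord(JSON.parse(line));
    }
  };
}
```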

When cloud backup is enabled and the user is signed in, the search_notes
agent tool now uses the cloud hybrid search (pgvector + FTS) instead
of local SQLite FTS5 keyword search. Falls back to local search
transparently on cloud failure.
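The transparent fallback described above is essentially a try/catch around the cloud call. A sketch with the two search functions as placeholders for the real implementations:

```typescript
// Try cloud hybrid search (pgvector + FTS) first; on any failure,
// degrade silently to the local SQLite FTS5 keyword search.
async function searchNotes(
  query: string,
  cloudSearch: (q: string) => Promise<string[]>,
  localSearch: (q: string) => Promise<string[]>
): Promise<string[]> {
  try {
    return await cloudSearch(query);
  } catch {
    return localSearch(query);
  }
}
```

Injecting the two search functions keeps the fallback policy testable without network or database access.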

Replace the buffered IPC middleman with direct fetch from the renderer
to the API. Text now streams token-by-token instead of arriving all at
once after the full response completes. Adds AbortController support for
instant cancellation on unmount or user stop.

The renderer's fetch() can't access auth cookies stored on the Neon Auth
domain. Add a get-session-cookies IPC handler to retrieve them from the
main process cookie jar and forward as an explicit Cookie header.

Browser fetch() forbids setting the Cookie header, so direct renderer-to-
API streaming can't authenticate. Switch to event-based IPC: main process
reads the API stream and forwards each chunk via webContents.send() as it
arrives. This matches the pattern used for AssemblyAI/Deepgram streaming.

Replace basic tool pills with step-based visualization showing the full
tool lifecycle: shimmer accent while executing, checkmark pop on
completion, expandable detail, and contextual clipboard confirmation.

- ToolCallStep: left-border accent, per-tool icons, shimmer animation
- Clipboard: inline "Copied to your clipboard" with green check
- Input bar: thinking shimmer bar, tool icon in executing state
- Empty state: mic icon with dual-line CTA
- Title bar: shows agent name, softer shadow
- Tighter spacing throughout (gaps, heights, bubble widths)
- New CSS: tool-step-shimmer, tool-check-pop, tool-status-sweep
- i18n: copiedToClipboard + orType in all 10 locales

…ults

The executeToolCall callback was returning raw data (full JSON) which was
displayed in the tool step UI. Now returns { data, displayText } so the
LLM gets structured data while the UI shows a human-readable summary.

Also truncates Exa web search article text to 500 chars per result to
prevent massive payloads.
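A sketch of the { data, displayText } split and the 500-character truncation this commit describes. Names and the exact summary text are illustrative:

```typescript
// Tool results carry structured data for the LLM and a separate
// human-readable summary for the tool step UI.
interface ToolResult {
  data: unknown;        // structured payload fed back to the model
  displayText: string;  // short summary shown in the step pill
}

const MAX_ARTICLE_CHARS = 500;

// Cap web search article text to keep tool-result payloads small.
function truncateArticle(text: string): string {
  return text.length > MAX_ARTICLE_CHARS
    ? text.slice(0, MAX_ARTICLE_CHARS) + "…"
    : text;
}

function makeSearchResult(results: { title: string; text: string }[]): ToolResult {
  const data = results.map((r) => ({ title: r.title, text: truncateArticle(r.text) }));
  return { data, displayText: `Found ${results.length} result(s)` };
}
```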

- Increase MAX_TOOL_STEPS from 5 to 20 to prevent agent from getting
  stuck on multi-step workflows
- Add metadata field to ToolCallInfo so tool results can carry structured
  data (e.g. note ID) to the UI without mixing it into displayText
- Created/updated/fetched notes show as clickable steps with a primary
  accent — clicking opens the note in the control panel
- New agent-open-note IPC handler navigates the control panel to the note
- Note cards: when create/update/get_note completes, render a compact
  clickable card at the bottom of the message bubble with title, icon,
  and "Open note" label. Clicking opens the note in the control panel.
- Input bar: replace generic loading dots with dictation-panel-inspired
  indicators — pulsing blue circle for listening, accent wave bars for
  transcribing, shimmer sweep for thinking.
- Fix duplicate copiedToClipboard keys in all locale files.

…mode

The agent-open-note IPC was reusing navigate-to-meeting-note which
activates meeting recording mode. Add a dedicated navigate-to-note event
that only sets the active note and view without starting the recorder.