feat: Add Voice Mode API with WebSocket support and VAD processing for real-time audio interaction #6826

Draft
wants to merge 58 commits into
base: main
from

Conversation

Cristhianzl
Member

This pull request adds a new voice_mode API to the langflow project, along with new dependencies and several enhancements to the MCP module. The most important changes are grouped by theme below:

New Dependencies

  • Added webrtcvad and scipy to the project dependencies in pyproject.toml.

API Enhancements

  • Introduced the voice_mode_router and mcp_router to the main API router in src/backend/base/langflow/api/router.py.
  • Added the voice_mode_router to the API v1 initialization in src/backend/base/langflow/api/v1/__init__.py.

MCP Module Improvements

  • In src/backend/base/langflow/api/v1/mcp.py, added a global enable_progress_notifications variable and updated the handle_call_tool function to initialize it if it is not already set.
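
The initialize-if-unset pattern described for enable_progress_notifications can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the helper name `ensure_progress_notifications` and the `load_setting` callable are hypothetical stand-ins for however handle_call_tool actually reads the setting.

```python
from typing import Callable, Optional

# Module-level flag; None means "not initialized yet".
enable_progress_notifications: Optional[bool] = None


def ensure_progress_notifications(load_setting: Callable[[], bool]) -> bool:
    """Initialize the global flag on first use and return its value.

    `load_setting` is a hypothetical callable standing in for a settings lookup.
    """
    global enable_progress_notifications
    if enable_progress_notifications is None:
        enable_progress_notifications = load_setting()
    return enable_progress_notifications


calls = []


def fake_setting() -> bool:
    # Records each invocation so we can see the lookup runs only once.
    calls.append(1)
    return True


first = ensure_progress_notifications(fake_setting)
second = ensure_progress_notifications(fake_setting)
```

The second call returns the cached value without calling `fake_setting` again, which is the point of guarding the initialization behind a None check.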

Voice Mode API Implementation

  • Implemented the voice_mode API in src/backend/base/langflow/api/v1/voice_mode.py, including WebSocket handling, VAD processing, and integration with the OpenAI Realtime API.
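
As a rough illustration of the VAD step, the sketch below splits 16-bit mono PCM into fixed-size frames and classifies each one. It is not the PR's implementation: the 16 kHz sample rate, the 20 ms frame size, and the energy threshold are assumptions, and `energy_is_speech` is a simple stand-in detector. webrtcvad expects frames of 10, 20, or 30 ms of 16-bit mono PCM, so a `webrtcvad.Vad` instance's `is_speech(frame, sample_rate)` could be wrapped and passed in place of the stand-in.

```python
import math
import struct
from typing import Callable, Iterator, List

SAMPLE_RATE = 16000   # Hz; assumed (webrtcvad accepts 8000, 16000, 32000, 48000)
FRAME_MS = 20         # assumed (webrtcvad accepts 10, 20, or 30 ms frames)
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * BYTES_PER_SAMPLE


def split_frames(pcm: bytes, frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Yield complete fixed-size frames; a trailing partial frame is dropped."""
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def energy_is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Stand-in detector: RMS amplitude above a hypothetical threshold."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms > threshold


def speech_flags(
    pcm: bytes, is_speech: Callable[[bytes], bool] = energy_is_speech
) -> List[bool]:
    """Return one speech/non-speech decision per complete frame."""
    return [is_speech(frame) for frame in split_frames(pcm)]


# One frame of silence, one frame of a 440 Hz tone, then a partial frame.
n = FRAME_BYTES // BYTES_PER_SAMPLE
silence = struct.pack(f"<{n}h", *([0] * n))
tone = struct.pack(
    f"<{n}h",
    *(int(8000 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)) for i in range(n)),
)
flags = speech_flags(silence + tone + silence[:100])
```

Keeping the detector behind a callable means the framing logic can be tested without the native webrtcvad dependency installed.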

Utility Enhancements

  • In src/backend/base/langflow/base/mcp/util.py, enhanced the create_tool_coroutine function and added a new create_input_schema_from_json_schema function.
  • Refactored the connect_to_server method in src/backend/base/langflow/components/tools/mcp_sse.py to include a timeout mechanism.
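
One common way to add a timeout to an async connection routine is `asyncio.wait_for`. The sketch below shows that pattern under stated assumptions: `connect_with_timeout` and `open_session` are hypothetical names, the connection is simulated with a sleep, and the PR's actual connect_to_server may be structured differently.

```python
import asyncio


async def connect_with_timeout(delay_seconds: float, timeout_seconds: float) -> str:
    """Await a simulated connection attempt, giving up after `timeout_seconds`."""

    async def open_session() -> str:
        # Hypothetical placeholder for the real SSE handshake.
        await asyncio.sleep(delay_seconds)
        return "connected"

    try:
        return await asyncio.wait_for(open_session(), timeout=timeout_seconds)
    except asyncio.TimeoutError as exc:
        # Surface a clearer error to the caller instead of hanging indefinitely.
        raise ConnectionError(f"no response within {timeout_seconds} s") from exc


# A fast server connects within the limit.
fast_result = asyncio.run(connect_with_timeout(0.01, 1.0))

# A slow server exceeds the limit and the attempt is cancelled.
try:
    asyncio.run(connect_with_timeout(1.0, 0.05))
    slow_result = "connected"
except ConnectionError:
    slow_result = "timed out"
```

`asyncio.wait_for` cancels the pending coroutine when the deadline passes, so a stalled server does not leave the task running in the background.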

phact and others added 13 commits February 5, 2025 18:02
🔧 (frontend): Refactor import path for VoiceAssistant component
🔧 (frontend): Refactor class name for button in upload-file-button component
🔧 (frontend): Refactor class name for button in voice-button component
🔧 (frontend): Refactor class name for button in applies.css
🔧 (frontend): Refactor class name for button in styleUtils.ts
…s to enter their OpenAI API key for voice transcription

♻️ (index.tsx): Refactor VoiceAssistant component to check for the presence of OpenAI API key before starting voice transcription and show ApiKeyPopup component if key is missing
🔧 (apiKeyModal/index.tsx): Remove the obsolete APIKeyModal component as it is no longer needed after implementing ApiKeyPopup in the voice assistant feature
… OpenAI. Add components and hooks for handling audio recording, processing, and WebSocket communication. Implement functionality to start, stop recording, play audio chunks, handle WebSocket messages, and initialize audio context. Add support for entering API key for OpenAI.
@Cristhianzl Cristhianzl requested a review from phact February 25, 2025 20:23
@Cristhianzl Cristhianzl self-assigned this Feb 25, 2025
@github-actions github-actions bot added the enhancement New feature or request label Feb 25, 2025
…t component for recording and processing audio input

🔧 (use-post-voice.tsx): Remove unused file use-post-voice.tsx from the project
♻️ (use-handle-websocket-message.ts): Refactor useHandleWebsocketMessage function to improve readability and remove unnecessary console logs
♻️ (use-initialize-audio.ts): Refactor useInitializeAudio function to handle audio context creation and resume more efficiently
✅ (use-interrupt-playback.ts): Add useInterruptPlayback function to handle interrupting audio playback
✅ (use-start-conversation.ts): Add useStartConversation function to initiate a conversation using a WebSocket connection
📝 (chat-input.tsx): Update import path for VoiceAssistant component to match the new file structure

codspeed-hq bot commented Feb 25, 2025

CodSpeed Performance Report

Merging #6826 will degrade performance by 12.08%

Comparing cz/voice_mode (6cfeee8) with main (0134485)

Summary

⚡ 1 improvement
❌ 1 regression
✅ 17 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| test_build_flow_invalid_job_id | 8 ms | 9.1 ms | -12.08% |
| test_cancel_nonexistent_build | 13.2 ms | 10.6 ms | +24.3% |

… clear audio queue, stop playback, and send stop message to audio processor if it exists
@phact
Collaborator

phact commented Feb 27, 2025

Dupe of #4642 ?

Labels
enhancement New feature or request
2 participants