feat: Add Voice Mode API with WebSocket support and VAD processing for real-time audio interaction #6826

Cristhianzl · 2025-02-25T20:23:03Z

This pull request introduces several new features and improvements to Langflow: new dependencies, a new voice_mode API, and several enhancements to the MCP module. The most important changes are grouped by theme below:

New Dependencies

Added webrtcvad and scipy to the project dependencies in pyproject.toml.

API Enhancements

Introduced the voice_mode_router and mcp_router to the main API router in src/backend/base/langflow/api/router.py. [1] [2]
Added the voice_mode_router to the API v1 initialization in src/backend/base/langflow/api/v1/__init__.py. [1] [2]

MCP Module Improvements

Added a global enable_progress_notifications variable and updated the handle_call_tool function to initialize it if not already set in src/backend/base/langflow/api/v1/mcp.py. [1] [2]

Voice Mode API Implementation

Implemented the voice_mode API in src/backend/base/langflow/api/v1/voice_mode.py, including WebSocket handling, VAD processing, and integration with OpenAI Realtime API.
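
The description does not say exactly how VAD is wired in, so the following is only a sketch of the framing step that WebRTC VAD generally requires. webrtcvad's `Vad(mode).is_speech(frame, sample_rate)` accepts 16-bit mono PCM frames of 10, 20, or 30 ms at 8, 16, 32, or 48 kHz. OpenAI Realtime audio in `pcm16` format is 24 kHz, so it presumably has to be resampled first (possibly why scipy was added; that is an assumption). The energy-threshold classifier `energy_is_speech` and the debounce helper `first_speech_frame` are hypothetical stand-ins so the example runs without webrtcvad installed.

```python
import struct
from typing import Callable, Iterator, List

BYTES_PER_SAMPLE = 2                       # 16-bit mono PCM
VAD_RATES = (8000, 16000, 32000, 48000)    # sample rates webrtcvad accepts
VAD_FRAME_MS = (10, 20, 30)                # frame durations webrtcvad accepts


def iter_frames(pcm: bytes, sample_rate: int, frame_ms: int = 20) -> Iterator[bytes]:
    """Split raw PCM into fixed-size frames, dropping a trailing partial frame."""
    if sample_rate not in VAD_RATES or frame_ms not in VAD_FRAME_MS:
        raise ValueError(f"unsupported rate/frame for VAD: {sample_rate} Hz, {frame_ms} ms")
    size = sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm) - size + 1, size):
        yield pcm[start:start + size]


def energy_is_speech(frame: bytes, sample_rate: int) -> bool:
    """Hypothetical stand-in for webrtcvad.Vad(mode).is_speech: peak-amplitude threshold."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return max(abs(s) for s in samples) > 1000


def speech_flags(pcm: bytes, sample_rate: int,
                 is_speech: Callable[[bytes, int], bool], frame_ms: int = 20) -> List[bool]:
    """Classify every complete frame as voiced (True) or unvoiced (False)."""
    return [is_speech(f, sample_rate) for f in iter_frames(pcm, sample_rate, frame_ms)]


def first_speech_frame(flags: List[bool], min_consecutive: int = 3) -> int:
    """Index where the first run of min_consecutive voiced frames starts, or -1."""
    run = 0
    for i, voiced in enumerate(flags):
        run = run + 1 if voiced else 0
        if run >= min_consecutive:
            return i - min_consecutive + 1
    return -1
```

With webrtcvad installed, `webrtcvad.Vad(3).is_speech` could be passed in place of `energy_is_speech` (mode 3 is the most aggressive at filtering out non-speech). Requiring several consecutive voiced frames is a common way to avoid triggering on short clicks; the threshold of 3 here is arbitrary.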

Utility Enhancements

Enhanced the create_tool_coroutine function and added a new create_input_schema_from_json_schema function in src/backend/base/langflow/base/mcp/util.py. [1] [2]
Refactored the connect_to_server method to include a timeout mechanism in src/backend/base/langflow/components/tools/mcp_sse.py. [1] [2]
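
As a rough illustration of turning a tool's JSON Schema into an input model, the sketch below uses the standard library's `dataclasses.make_dataclass`. The name `schema_to_dataclass` and the type mapping are hypothetical; the PR's `create_input_schema_from_json_schema` may well build a Pydantic model instead and handle nested schemas, enums, and `$ref`, which this sketch ignores. Property names are assumed to be valid Python identifiers.

```python
import copy
from dataclasses import field, make_dataclass
from typing import Any, Dict, List, Optional, Tuple

# Simplified mapping from JSON Schema primitive types to Python types.
JSON_TO_PY = {
    "string": str, "integer": int, "number": float,
    "boolean": bool, "array": list, "object": dict,
}


def schema_to_dataclass(schema: Dict[str, Any], name: str = "InputSchema") -> type:
    """Build a dataclass with one field per top-level property of an object schema."""
    required = set(schema.get("required", []))
    required_fields: List[Tuple[Any, ...]] = []
    optional_fields: List[Tuple[Any, ...]] = []
    for prop, spec in schema.get("properties", {}).items():
        json_type = spec.get("type")
        # Union types such as ["string", "null"] fall back to Any in this sketch.
        py_type = JSON_TO_PY.get(json_type, Any) if isinstance(json_type, str) else Any
        if prop in required:
            required_fields.append((prop, py_type))
        else:
            default = spec.get("default")
            # Mutable defaults must go through default_factory in dataclasses.
            if isinstance(default, (list, dict)):
                fld = field(default_factory=lambda d=default: copy.deepcopy(d))
            else:
                fld = field(default=default)
            optional_fields.append((prop, Optional[py_type], fld))
    # Fields without defaults must precede fields with defaults.
    return make_dataclass(name, required_fields + optional_fields)
```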
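
A timeout around connection setup is commonly implemented with `asyncio.wait_for`. The sketch below is a guess at the general shape, not the code in mcp_sse.py: `connect_with_timeout`, the `connect` callable, and the choice to re-raise as `ConnectionError` are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def connect_with_timeout(connect: Callable[[], Awaitable[T]], timeout: float) -> T:
    """Await connect(), giving up after `timeout` seconds."""
    try:
        # wait_for cancels the inner coroutine if the deadline passes.
        return await asyncio.wait_for(connect(), timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise ConnectionError(f"server did not respond within {timeout} s") from exc
```

Raising a distinct error lets the caller report an unresponsive SSE server clearly instead of hanging indefinitely.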

🔧 (frontend): Refactor import path for VoiceAssistant component
🔧 (frontend): Refactor class name for button in upload-file-button component
🔧 (frontend): Refactor class name for button in voice-button component
🔧 (frontend): Refactor class name for button in applies.css
🔧 (frontend): Refactor class name for button in styleUtils.ts

…s to enter their OpenAI API key for voice transcription
♻️ (index.tsx): Refactor VoiceAssistant component to check for the presence of OpenAI API key before starting voice transcription and show ApiKeyPopup component if key is missing
🔧 (apiKeyModal/index.tsx): Remove the obsolete APIKeyModal component as it is no longer needed after implementing ApiKeyPopup in the voice assistant feature

… OpenAI. Add components and hooks for handling audio recording, processing, and WebSocket communication. Implement functionality to start and stop recording, play audio chunks, handle WebSocket messages, and initialize the audio context. Add support for entering an OpenAI API key.

…t component for recording and processing audio input
🔧 (use-post-voice.tsx): Remove unused file use-post-voice.tsx from the project
♻️ (use-handle-websocket-message.ts): Refactor useHandleWebsocketMessage function to improve readability and remove unnecessary console logs
♻️ (use-initialize-audio.ts): Refactor useInitializeAudio function to handle audio context creation and resume more efficiently
✅ (use-interrupt-playback.ts): Add useInterruptPlayback function to handle interrupting audio playback
✅ (use-start-conversation.ts): Add useStartConversation function to initiate a conversation using a WebSocket connection
📝 (chat-input.tsx): Update import path for VoiceAssistant component to match the new file structure

codspeed-hq · 2025-02-25T21:06:58Z

CodSpeed Performance Report

Merging #6826 will degrade performance by 12.08%

Comparing cz/voice_mode (6cfeee8) with main (0134485)

Summary

⚡ 1 improvement
❌ 1 regression
✅ 17 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| | Benchmark | `BASE` | `HEAD` | Change |
|---|---|---|---|---|
| ❌ | `test_build_flow_invalid_job_id` | 8 ms | 9.1 ms | -12.08% |
| ⚡ | `test_cancel_nonexistent_build` | 13.2 ms | 10.6 ms | +24.3% |
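
The Change column appears to be the relative speed change, BASE / HEAD - 1, so a negative value means HEAD is slower. This formula is inferred from the reported numbers rather than taken from CodSpeed's documentation; `speed_change_pct` is a hypothetical helper, and small differences come from the rounded times shown in the table.

```python
def speed_change_pct(base_ms: float, head_ms: float) -> float:
    """Relative speed change in percent; negative means HEAD is slower (inferred formula)."""
    return (base_ms / head_ms - 1.0) * 100.0


for name, base, head in [
    ("test_build_flow_invalid_job_id", 8.0, 9.1),
    ("test_cancel_nonexistent_build", 13.2, 10.6),
]:
    print(f"{name}: {speed_change_pct(base, head):+.2f}%")
```

This prints -12.09% and +24.53%, close to the reported -12.08% and +24.3%. The alternative (HEAD - BASE) / BASE would give +13.75% for the first row, which does not match.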

phact · 2025-02-27T17:52:32Z

Dupe of #4642?

phact and others added 30 commits November 12, 2024 23:59

WIP

294b498

works

5f6833f

stereo

931052d

ui v0

99d5f64

unnecessary import

8bd3f2d

Merge branch 'main' into voice_mode

e2163ae

update steps in voice ws

dca19cd

[autofix.ci] apply automated fixes

c2f15fe

unused

2863d48

merge

e97e561

Merge branch 'voice_mode' of github.com:phact/langflow into voice_mode

c4773b7

Merge branch 'main' into voice_mode

b48837a

[autofix.ci] apply automated fixes

86755db

Merge branch 'main' into voice_mode

de727ca

Merge branch 'voice_mode' of github.com:phact/langflow into voice_mode

5b8c4bd

Merge branch 'langflow-ai:main' into voice_mode

824d3fd

cleanly handle missing OPENAI key

86a630c

ruff

968baa3

Merge branch 'main' into voice_mode

4fa7b2e

Merge branch 'main' into voice_mode

3d77aed

[autofix.ci] apply automated fixes

47a18ab

Merge branch 'main' into voice_mode

d3d2c8a

fix genericIconComponent path

d75d9d8

merge main

9e443b3

update for recent async fixes

0f40fb9

Merge branch 'main' into voice_mode

28b6101

accidentally committed html file

55aad4d

better prompt and threading

d41cb0c

client barge-in detection

d3d425f

fmt

baa855f

phact and others added 13 commits February 5, 2025 18:02

global variable exception handling

4a11a82

don't close the websocket

b008fcf

fix double send bug

41f7db6

fix double send bug

762fca6

response.output_item event type typo

78bc7f1

voice_mode logging

c616465

vad + dummy check

e797175

merge fix

c3b1f8e

merge fix

b57decf

Merge branch 'main' into cz/voice_mode

3fd1f51

Cristhianzl requested a review from phact February 25, 2025 20:23

Cristhianzl self-assigned this Feb 25, 2025

github-actions bot added the enhancement label (New feature or request) Feb 25, 2025

♻️ (use-interrupt-playback.ts): refactor useInterruptPlayback hook to clear audio queue, stop playback, and send stop message to audio processor if it exists

2cff383

🔧 (gitattributes): add *.raw file extension as binary to prevent git from modifying the file contents

42eb9bc

[autofix.ci] apply automated fixes

6cfeee8
