feat(whisper): add Whisper plugin for LiveKit #1392
imsakg wants to merge 4 commits into livekit:main
Conversation
Introduce the Whisper plugin for LiveKit, enabling offline speech-to-text capabilities using local Whisper model inference. This includes:

- Initial setup of the plugin structure with classes for the Whisper model, speech-to-text processing, and audio utilities.
- Configuration files such as `CHANGELOG.md` and `README.md` for version tracking and documentation.
- Integration with essential libraries like `numpy`, `ctranslate2`, and `faster_whisper` for enhanced audio processing and transcription.
- Setup files (`setup.py`, `pyproject.toml`) for building and packaging the plugin.
Do you have example code for using this? I am having issues with it.
Hey, it should work like the other STT services. I made some changes on top of the Minimal Assistant example. I haven't tested it, though, so let me know if it works. Here is the example code:

```python
import asyncio
import logging

from dotenv import load_dotenv
from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    JobProcess,
    WorkerOptions,
    cli,
    llm,
    metrics,
)
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import openai, silero, whisper

load_dotenv()
logger = logging.getLogger("voice-assistant")


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()
    # Load Whisper here to avoid loading it in the main process
    proc.userdata["whisper"] = whisper.WhisperSTT()


async def entrypoint(ctx: JobContext):
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are a voice assistant created by LiveKit. Your interface with users will be voice. "
            "You should use short and concise responses, and avoid usage of unpronounceable punctuation."
        ),
    )

    logger.info(f"connecting to room {ctx.room.name}")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # wait for the first participant to connect
    participant = await ctx.wait_for_participant()
    logger.info(f"starting voice assistant for participant {participant.identity}")

    agent = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],
        stt=ctx.proc.userdata["whisper"],
        llm=openai.LLM(),
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
    )
    agent.start(ctx.room, participant)

    usage_collector = metrics.UsageCollector()

    @agent.on("metrics_collected")
    def _on_metrics_collected(mtrcs: metrics.AgentMetrics):
        metrics.log_metrics(mtrcs)
        usage_collector.collect(mtrcs)

    async def log_usage():
        summary = usage_collector.get_summary()
        logger.info(f"Usage: ${summary}")

    ctx.add_shutdown_callback(log_usage)

    # listen to incoming chat messages, only required if you'd like the agent to
    # answer incoming messages from Chat
    chat = rtc.ChatManager(ctx.room)

    async def answer_from_text(txt: str):
        chat_ctx = agent.chat_ctx.copy()
        chat_ctx.append(role="user", text=txt)
        stream = agent.llm.chat(chat_ctx=chat_ctx)
        await agent.say(stream)

    @chat.on("message_received")
    def on_chat_received(msg: rtc.ChatMessage):
        if msg.message:
            asyncio.create_task(answer_from_text(msg.message))

    await agent.say("Hey, how can I help you today?", allow_interruptions=True)


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            prewarm_fnc=prewarm,
        ),
    )
```
I have tried it, but I get the following error: "ERROR livekit.agents - initialization timed out, killing process"
Yeah, you're right. First, you need to download the required model files for Whisper. You can achieve that by running
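The command itself got cut off in the comment above. For reference, the livekit-agents CLI does expose a `download-files` subcommand that asks loaded plugins to fetch their model files ahead of time; assuming this Whisper plugin implements that hook, the intended command was probably along these lines (`assistant.py` is again a placeholder for your agent script):

```shell
# pre-download plugin model files before the first run,
# so model loading doesn't hit the worker's initialization timeout
python assistant.py download-files
```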
Tried this, but I'm facing the same issue.
I checked just now with the exact code that I shared above. Everything works fine on my MacBook, so I don't know what the situation is on your end. Since the devs aren't interested in this PR, I'm not looking into it anymore.
What is the correct way to install this plugin? I think I may have installed it incorrectly.
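Since this PR was never merged, there is no published package to `pip install`. The usual way to try an unmerged plugin is an editable install from the contributor's branch; the repo URL assumes the fork mirrors the upstream `livekit/agents` name, and the plugin directory path is a guess based on how the other LiveKit plugins are laid out:

```shell
# clone the fork containing this PR's branch (URL/path are assumptions)
git clone https://github.com/imsakg/agents.git
cd agents

# install the Whisper plugin in editable mode; adjust the path if the
# plugin lives in a differently named directory in the PR branch
pip install -e ./livekit-plugins/livekit-plugins-whisper
```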