feat(whisper): add Whisper plugin for LiveKit#1392

Closed
imsakg wants to merge 4 commits into livekit:main from imsakg:main

Conversation

@imsakg
Contributor

@imsakg imsakg commented Jan 20, 2025

Introduce the Whisper plugin for LiveKit, enabling offline speech-to-text capabilities using local Whisper model inference. This includes:

- Initial setup of the plugin structure with classes for the Whisper model, speech-to-text processing, and audio utilities.
- Configuration files such as `CHANGELOG.md` and `README.md` for version tracking and documentation.
- Integration with essential libraries like `numpy`, `ctranslate2`, and `faster_whisper` for enhanced audio processing and transcription.
- Setup files (`setup.py`, `pyproject.toml`) for building and packaging the plugin.
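As an illustration of the kind of audio utility a local-Whisper plugin needs: LiveKit delivers int16 PCM audio, while Whisper-family models (including `faster_whisper`) expect a float32 mono waveform normalized to [-1, 1] at 16 kHz. This is a minimal sketch of that conversion; the function name, sample rates, and interpolation approach are illustrative assumptions, not the plugin's actual API.

```python
import numpy as np

def frames_to_whisper_input(pcm: np.ndarray, src_rate: int = 48000,
                            dst_rate: int = 16000) -> np.ndarray:
    """Convert int16 PCM samples to the float32 waveform shape
    Whisper-family models expect (mono, [-1, 1], 16 kHz)."""
    # Normalize 16-bit samples to floats in [-1, 1]
    samples = pcm.astype(np.float32) / 32768.0
    # Naive resampling by linear interpolation; a real plugin would use
    # a proper resampler (e.g. scipy.signal.resample_poly or soxr)
    n_dst = int(len(samples) * dst_rate / src_rate)
    x_dst = np.linspace(0, len(samples) - 1, n_dst)
    return np.interp(x_dst, np.arange(len(samples)), samples).astype(np.float32)

# Example: 20 ms of audio at 48 kHz (960 samples) becomes 320 samples at 16 kHz
out = frames_to_whisper_input(np.zeros(960, dtype=np.int16))
print(out.shape)  # (320,)
```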
@changeset-bot

changeset-bot bot commented Jan 20, 2025

⚠️ No Changeset found

Latest commit: 0f1a143

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@Harras3

Harras3 commented Feb 9, 2025

Do you have example code to use this? I am having issues using it.

@imsakg
Contributor Author

imsakg commented Feb 10, 2025

> Do you have example code to use this? I am having issues using it.

Hey,

It should work like other STT services.

I made some changes to the Minimal Assistant example. I have not tested it, though, so let me know if it works.

Here is example code:

import asyncio
import logging

from dotenv import load_dotenv
from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    JobProcess,
    WorkerOptions,
    cli,
    llm,
    metrics,
)
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import openai, silero, whisper

load_dotenv()
logger = logging.getLogger("voice-assistant")


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()
    # We're loading Whisper over here to avoid loading it in the main process
    proc.userdata["whisper"] = whisper.WhisperSTT()


async def entrypoint(ctx: JobContext):
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are a voice assistant created by LiveKit. Your interface with users will be voice. "
            "You should use short and concise responses, and avoid usage of unpronounceable punctuation."
        ),
    )

    logger.info(f"connecting to room {ctx.room.name}")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # wait for the first participant to connect
    participant = await ctx.wait_for_participant()
    logger.info(f"starting voice assistant for participant {participant.identity}")

    agent = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],
        stt=ctx.proc.userdata["whisper"],
        llm=openai.LLM(),
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
    )

    agent.start(ctx.room, participant)

    usage_collector = metrics.UsageCollector()

    @agent.on("metrics_collected")
    def _on_metrics_collected(mtrcs: metrics.AgentMetrics):
        metrics.log_metrics(mtrcs)
        usage_collector.collect(mtrcs)

    async def log_usage():
        summary = usage_collector.get_summary()
        logger.info(f"Usage: {summary}")

    ctx.add_shutdown_callback(log_usage)

    # listen to incoming chat messages, only required if you'd like the agent to
    # answer incoming messages from Chat
    chat = rtc.ChatManager(ctx.room)

    async def answer_from_text(txt: str):
        chat_ctx = agent.chat_ctx.copy()
        chat_ctx.append(role="user", text=txt)
        stream = agent.llm.chat(chat_ctx=chat_ctx)
        await agent.say(stream)

    @chat.on("message_received")
    def on_chat_received(msg: rtc.ChatMessage):
        if msg.message:
            asyncio.create_task(answer_from_text(msg.message))

    await agent.say("Hey, how can I help you today?", allow_interruptions=True)


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            prewarm_fnc=prewarm,
        ),
    )

@Harras3

Harras3 commented Feb 10, 2025

I have tried it, but I get the following error: "ERROR livekit.agents - initialization timed out, killing process".

@imsakg
Contributor Author

imsakg commented Feb 10, 2025

Yeah, you're right. First, you need to download the required model files for Whisper.

You can do that by running `python my_agent.py download-files`.
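For context, the LiveKit Agents CLI exposes `download-files` and `dev` subcommands on any agent script run through `cli.run_app`. A typical workflow might look like the following; the plugin install path here is hypothetical, since the PR does not state where the package lives in the repo.

```shell
# Install the plugin from the PR branch (path is hypothetical)
pip install ./livekit-plugins/livekit-plugins-whisper

# Pre-download the Whisper model weights so worker
# initialization doesn't time out on first run
python my_agent.py download-files

# Run the agent in development mode
python my_agent.py dev
```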

@Harras3

Harras3 commented Feb 10, 2025

I tried this but I'm facing the same issue.

@imsakg
Contributor Author

imsakg commented Feb 10, 2025

I checked just now with the exact code that I shared above. Everything works fine on my MacBook.

I don't know what the situation is on your end. Since the devs aren't interested in this PR, I'm not looking into it anymore.

@Harras3

Harras3 commented Feb 10, 2025

What is the correct way to install this plugin? I think I may have installed it incorrectly.

@imsakg imsakg closed this Feb 13, 2025