feat(whisper): add Whisper plugin for LiveKit#1392

Closed
imsakg wants to merge 4 commits into livekit:main from imsakg:main

Conversation

@imsakg
Contributor

@imsakg imsakg commented Jan 20, 2025

Introduce the Whisper plugin for LiveKit, enabling offline speech-to-text capabilities using local Whisper model inference. This includes:

- Initial setup of the plugin structure with classes for the Whisper model, speech-to-text processing, and audio utilities.
- Configuration files such as `CHANGELOG.md` and `README.md` for version tracking and documentation.
- Integration with essential libraries like `numpy`, `ctranslate2`, and `faster_whisper` for enhanced audio processing and transcription.
- Setup files (`setup.py`, `pyproject.toml`) for building and packaging the plugin.
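As an illustration of the kind of audio utility a local-Whisper plugin needs: LiveKit delivers int16 PCM audio, while Whisper-family models (including `faster_whisper`) expect a float32 mono waveform normalized to [-1, 1] at 16 kHz. This is a minimal sketch of that conversion; the function name, sample rates, and interpolation approach are illustrative assumptions, not the plugin's actual API.

```python
import numpy as np

def frames_to_whisper_input(pcm: np.ndarray, src_rate: int = 48000,
                            dst_rate: int = 16000) -> np.ndarray:
    """Convert int16 PCM samples to the float32 waveform shape
    Whisper-family models expect (mono, [-1, 1], 16 kHz)."""
    # Normalize 16-bit samples to floats in [-1, 1]
    samples = pcm.astype(np.float32) / 32768.0
    # Naive resampling by linear interpolation; a real plugin would use
    # a proper resampler (e.g. scipy.signal.resample_poly or soxr)
    n_dst = int(len(samples) * dst_rate / src_rate)
    x_dst = np.linspace(0, len(samples) - 1, n_dst)
    return np.interp(x_dst, np.arange(len(samples)), samples).astype(np.float32)

# Example: 20 ms of audio at 48 kHz (960 samples) becomes 320 samples at 16 kHz
out = frames_to_whisper_input(np.zeros(960, dtype=np.int16))
print(out.shape)  # (320,)
```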
@changeset-bot

changeset-bot bot commented Jan 20, 2025

⚠️ No Changeset found

Latest commit: 0f1a143

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@Harras3

Harras3 commented Feb 9, 2025

Do you have example code to use this? I am having issues using it.

@imsakg
Contributor Author

imsakg commented Feb 10, 2025

> Do you have example code to use this? I am having issues using it.

Hey,

It should work like other STT services.

I made some changes to the Minimal Assistant example. I have not tested it, though, so let me know if it works.

Here is example code:

import asyncio
import logging

from dotenv import load_dotenv
from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    JobProcess,
    WorkerOptions,
    cli,
    llm,
    metrics,
)
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import openai, silero, whisper

load_dotenv()
logger = logging.getLogger("voice-assistant")


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()
    # We're loading Whisper over here to avoid loading it in the main process
    proc.userdata["whisper"] = whisper.WhisperSTT()


async def entrypoint(ctx: JobContext):
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are a voice assistant created by LiveKit. Your interface with users will be voice. "
            "You should use short and concise responses, and avoid usage of unpronounceable punctuation."
        ),
    )

    logger.info(f"connecting to room {ctx.room.name}")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # wait for the first participant to connect
    participant = await ctx.wait_for_participant()
    logger.info(f"starting voice assistant for participant {participant.identity}")

    agent = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],
        stt=ctx.proc.userdata["whisper"],
        llm=openai.LLM(),
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
    )

    agent.start(ctx.room, participant)

    usage_collector = metrics.UsageCollector()

    @agent.on("metrics_collected")
    def _on_metrics_collected(mtrcs: metrics.AgentMetrics):
        metrics.log_metrics(mtrcs)
        usage_collector.collect(mtrcs)

    async def log_usage():
        summary = usage_collector.get_summary()
        logger.info(f"Usage: {summary}")

    ctx.add_shutdown_callback(log_usage)

    # listen to incoming chat messages, only required if you'd like the agent to
    # answer incoming messages from Chat
    chat = rtc.ChatManager(ctx.room)

    async def answer_from_text(txt: str):
        chat_ctx = agent.chat_ctx.copy()
        chat_ctx.append(role="user", text=txt)
        stream = agent.llm.chat(chat_ctx=chat_ctx)
        await agent.say(stream)

    @chat.on("message_received")
    def on_chat_received(msg: rtc.ChatMessage):
        if msg.message:
            asyncio.create_task(answer_from_text(msg.message))

    await agent.say("Hey, how can I help you today?", allow_interruptions=True)


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            prewarm_fnc=prewarm,
        ),
    )

@Harras3

Harras3 commented Feb 10, 2025

I have tried it, but I get the following error: "ERROR livekit.agents - initialization timed out, killing process".

@imsakg
Contributor Author

imsakg commented Feb 10, 2025

Yeah, you're right. First, you need to download the required model files for Whisper.

You can do that by running `python my_agent.py download-files`.
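For context, the LiveKit Agents CLI exposes `download-files` and `dev` subcommands on any agent script run through `cli.run_app`. A typical workflow might look like the following; the plugin install path here is hypothetical, since the PR does not state where the package lives in the repo.

```shell
# Install the plugin from the PR branch (path is hypothetical)
pip install ./livekit-plugins/livekit-plugins-whisper

# Pre-download the Whisper model weights so worker
# initialization doesn't time out on first run
python my_agent.py download-files

# Run the agent in development mode
python my_agent.py dev
```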

@Harras3

Harras3 commented Feb 10, 2025

I tried this but I'm facing the same issue.

@imsakg
Contributor Author

imsakg commented Feb 10, 2025

I checked just now with the exact code that I shared above. Everything works fine on my MacBook.

I don't know what the situation is on your end. Since the devs aren't interested in this PR, I'm not looking into it anymore.

@Harras3

Harras3 commented Feb 10, 2025

What is the correct way to install this plugin? I think I may have installed it incorrectly.

@imsakg imsakg closed this Feb 13, 2025