Add text stream sink and multi text sink by lukasIO · Pull Request #1497 · livekit/agents

lukasIO · 2025-02-14T11:07:07Z

changeset-bot · 2025-02-14T11:07:11Z

🦋 Changeset detected

Latest commit: 6b9a3ff

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
livekit-agents	Patch


longcw · 2025-02-18T08:37:29Z

livekit-agents/livekit/agents/pipeline/room_io.py

            # TODO: support multiple participants
-            self._text_sink = RoomTranscriptEventSink(
-                room=self._room, participant=self._participant, is_stream=False
+            self._text_sink = MultiTextSink(


Should we make this configurable, so that one or both of the sinks can be enabled?

Not sure what other decisions were made for the io API, but in my opinion it would be nice for users to be able to set a TextSink themselves

longcw · 2025-02-20T10:30:43Z

livekit-agents/livekit/agents/pipeline/room_io.py

            self._track_id = track.sid
+
+
+class DataStreamSink(TextSink):


perhaps move this to the datastream_io

just wanted to keep all the text sinks together, but don't have a strong opinion if you think it makes more sense to move it to datastream_io.

maybe a text_sinks.py makes sense where we have all of them?

Should this be merged inside the RoomOutput? I don't think we need to have it in a separate class. Also we have an ongoing discussion about merging RoomInput and RoomOutput together in Slack.

Supporting users that only want text would then be an option added to the RoomIO constructor.

longcw · 2025-02-20T10:33:54Z

livekit-agents/livekit/agents/pipeline/transcription/synchronizer.py


        async def _capture_text():
-            await self._base_text_sink.capture_text(segment.text)
+            await self._base_text_sink.capture_text(segment.text, segment_id=segment.id)


this won't work for other text sinks

There is already a check for the segment id before capture

    if self._current_segment_id != segment.id:
        self._base_text_sink.flush()
        self._current_segment_id = segment.id

longcw · 2025-02-20T10:40:19Z

livekit-agents/livekit/agents/pipeline/room_io.py

+        self._participant_identity = identity
+        self._latest_text = ""
+
+    async def capture_text(self, text: str, *, segment_id: str | None = None) -> None:


I'd prefer not to add segment_id here, to keep the same API for all text sinks; ideally the user should call flush() to mark the end of the segment.

Though actually I also want a segment id here, in case text with different ids arrives out of order, like id1, id2, id1, .... wdyt @theomonnom?
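
For illustration only (not code from this PR): a minimal sketch of the out-of-order case described above. The chunk data and both helper functions are hypothetical. It shows that a consumer that ignores the id interleaves the two segments, while one keyed by segment_id can reassemble each segment.

```python
# Hypothetical chunks as (segment_id, text), arriving interleaved: id1, id2, id1, id2.
chunks = [("id1", "Hel"), ("id2", "Good"), ("id1", "lo"), ("id2", "bye")]


def concat_ignoring_ids(chunks):
    """Concatenate text in arrival order, as a sink without segment ids would."""
    return "".join(text for _, text in chunks)


def concat_per_segment(chunks):
    """Accumulate text separately for each segment id."""
    segments = {}
    for segment_id, text in chunks:
        segments[segment_id] = segments.get(segment_id, "") + text
    return segments


print(concat_ignoring_ids(chunks))
print(concat_per_segment(chunks))
```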

longcw · 2025-02-20T10:41:49Z

livekit-agents/livekit/agents/pipeline/io.py

 class TextSink(ABC):
    @abstractmethod
-    async def capture_text(self, text: str) -> None:
+    async def capture_text(self, text: str, *, segment_id: str | None = None) -> None:


ok, it was added here.

yeah, I think having it is better than not having it, guessing you're ok with the change as long as it's part of the base class here?

I still have a concern that segment_id conflicts with flush. If calling flush means the end of the segment, what is the "correct" behavior if, after a flush, the segment_id is the same as before?

I don't think that's a problem that's caused by the segment_id being part of the params.

Is your concern about replacing current_id with segment_id?
I just wanted to avoid any instances where there are multiple places trying to call capture_text with different segment ids. In my head the place to handle this is the sink itself, but we can also leave it as is if we can be certain that every caller checks for the segment_id and flushes if necessary

It's not about current_id or segment_id; it's just not clear what the expected behavior is if the same segment_id is captured after a flush. It sounds like flush and segment_id overlap in functionality.

I didn't see a logic change in the DataStreamSink from this commit 446e3a9.

For agent transcription with delta=True, the change of the segment_id is checked before calling the sink.capture_text() here https://github.com/livekit/agents/blob/lukas/ds-sink/livekit-agents/livekit/agents/pipeline/transcription/synchronizer.py#L356-L363,

    if self._current_segment_id != segment.id:
        self._base_text_sink.flush()
        self._current_segment_id = segment.id

    async def _capture_text():
        await self._base_text_sink.capture_text(segment.text)
        if segment.final:
            self._base_text_sink.flush()

    task = asyncio.create_task(_capture_text())

and for user transcription with delta=False, the capture is called here https://github.com/livekit/agents/blob/lukas/ds-sink/livekit-agents/livekit/agents/pipeline/room_io.py#L420-L426 so it only flushes after the final transcript is received.

    async def _capture_text():
        if ev.alternatives:
            data = ev.alternatives[0]
            await self._text_sink.capture_text(data.text)

        if ev.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
            self._text_sink.flush()

That means it needs to be guaranteed that capture_text() is not called again for the same segment_id after it has been flushed once with that segment_id.
Is that guaranteed?

For user transcription, yes, there is no segment id naturally in that case.
For agent transcription, it's not guaranteed but it streams delta so it should be fine.

could you point out what is the logic difference introduced by the commit if that's the concern?

the difference is that previously if a segment_id was provided to capture_text it would re-use that id for sending the text stream.

if we create a new id after a flush it will result in streams getting a new id, even though it might be originating from the same segment_id.
On the receiving side a new id would show up as a new message, while having the same id for the same segments would prompt the receiving side to replace the previous message instead of adding a new one.
Does that make it clearer?
Happy to jump on a call and discuss
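
To make the receiving-side behavior described above concrete, here is a rough sketch. `TranscriptView` and its methods are hypothetical and are not part of livekit-agents or any LiveKit client SDK; the sketch only assumes that a receiver keeps one displayed message per stream id.

```python
class TranscriptView:
    """Hypothetical receiver that keeps one displayed message per stream id."""

    def __init__(self) -> None:
        # dict preserves insertion order, so messages render in first-seen order
        self._messages = {}

    def on_stream(self, stream_id: str, text: str) -> None:
        # A known id replaces the earlier message; an unknown id adds a new one.
        self._messages[stream_id] = text

    def rendered(self) -> list:
        return list(self._messages.values())


# The same id is re-used for the same segment: the message is replaced.
reused = TranscriptView()
reused.on_stream("seg-1", "hel")
reused.on_stream("seg-1", "hello")

# A new id is generated after each flush: the same segment shows up twice.
fresh = TranscriptView()
fresh.on_stream("stream-a", "hel")
fresh.on_stream("stream-b", "hello")

print(reused.rendered())
print(fresh.rendered())
```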

theomonnom · 2025-02-20T17:14:44Z

livekit-agents/livekit/agents/pipeline/io.py

+class MultiTextSink(TextSink):
+    def __init__(self, sinks: list[TextSink]) -> None:
+        self._sinks = sinks
+
+    async def capture_text(self, text: str, *, segment_id: str | None = None) -> None:
+        await asyncio.gather(


Let's keep the public API minimal for V1.0, I wouldn't expose it for now and keep this class private.
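
For readers following along, a minimal self-contained sketch of the fan-out idea in the hunk above. Only the constructor and the `asyncio.gather` call are visible in the diff; the `flush()` forwarding and the `ListSink` helper are assumptions added for illustration, and `Optional[str]` stands in for `str | None` so the sketch also runs on older Python versions.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Optional


class TextSink(ABC):
    @abstractmethod
    async def capture_text(self, text: str, *, segment_id: Optional[str] = None) -> None: ...

    @abstractmethod
    def flush(self) -> None: ...


class ListSink(TextSink):
    """Hypothetical sink that records what it receives."""

    def __init__(self) -> None:
        self.chunks = []
        self.flush_count = 0

    async def capture_text(self, text: str, *, segment_id: Optional[str] = None) -> None:
        self.chunks.append((segment_id, text))

    def flush(self) -> None:
        self.flush_count += 1


class MultiTextSink(TextSink):
    def __init__(self, sinks: list) -> None:
        self._sinks = sinks

    async def capture_text(self, text: str, *, segment_id: Optional[str] = None) -> None:
        # Forward the same text and segment id to every sink concurrently.
        await asyncio.gather(
            *(sink.capture_text(text, segment_id=segment_id) for sink in self._sinks)
        )

    def flush(self) -> None:
        # Assumption: flush is synchronous and simply forwarded to each sink.
        for sink in self._sinks:
            sink.flush()


a, b = ListSink(), ListSink()
multi = MultiTextSink([a, b])
asyncio.run(multi.capture_text("hi", segment_id="s1"))
multi.flush()
```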

lukasIO

lgtm! thanks for fixing the outstanding issues 🙏

Co-authored-by: Long Chen <longch1024@gmail.com>

lukasIO changed the title ~~text stream sink~~ Add text stream sink and multi text sink Feb 17, 2025

lukasIO requested review from longcw and theomonnom and removed request for longcw February 17, 2025 12:03

lukasIO marked this pull request as ready for review February 17, 2025 12:05

lukasIO requested a review from davidzhao February 17, 2025 12:05

longcw reviewed Feb 18, 2025

longcw reviewed Feb 20, 2025

theomonnom reviewed Feb 20, 2025

theomonnom force-pushed the dev-1.0 branch from 3f4faf0 to 3ee34f2 Compare February 21, 2025 10:45

theomonnom force-pushed the lukas/ds-sink branch 5 times, most recently from 446e3a9 to 02493b2 Compare February 21, 2025 11:44

longcw force-pushed the lukas/ds-sink branch from 02493b2 to 6b9a3ff Compare February 21, 2025 13:23

lukasIO and others added 9 commits February 21, 2025 21:25

wip

83d997a

Merge branch 'dev-1.0' into lukas/ds-sink

f707f22

move capture delta to init

390da49

move capture delta to init

a805cd2

Add DataStream text sink

184f210

multi text sink

28337e2

properly close non delta streams

89b4549

Create eighty-bikes-speak.md

f759c84

generate new stream ids for each capturing phase

ca57770

lukasIO and others added 3 commits February 21, 2025 21:25

forward segment_id for TTS

d4465be

fix set text sink set participant

7c47f44

fix datastream text sink

6b9a3ff

lukasIO commented Feb 21, 2025

theomonnom approved these changes Feb 21, 2025

longcw merged commit f2c469b into dev-1.0 Feb 21, 2025
1 check passed

longcw deleted the lukas/ds-sink branch February 21, 2025 13:35

jayesh-mivi pushed a commit to mivi-dev-org/custom-livekit-agents that referenced this pull request Jun 4, 2025

Add text stream sink and multi text sink (livekit#1497)

fc6c401

Co-authored-by: Long Chen <longch1024@gmail.com>

3 participants
