Conversation
🦋 Changeset detectedLatest commit: 6b9a3ff The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
| # TODO: support multiple participants | ||
| self._text_sink = RoomTranscriptEventSink( | ||
| room=self._room, participant=self._participant, is_stream=False | ||
| self._text_sink = MultiTextSink( |
There was a problem hiding this comment.
should we make this configurable that enables one or both of the sinks?
There was a problem hiding this comment.
Not sure what other decisions were made for the io API, but in my opinion it would be nice for users to be able to set a TextSink themselves
| self._track_id = track.sid | ||
|
|
||
|
|
||
| class DataStreamSink(TextSink): |
There was a problem hiding this comment.
perhaps move this to the datastream_io
There was a problem hiding this comment.
just wanted to keep all the text sinks together, but don't have a strong opinion if you think it makes more sense to move it to datastream_io.
maybe a text_sinks.py makes sense where we have all of them?
There was a problem hiding this comment.
Should this be merged inside the RoomOutput? I don't think we need to have it in a separate class. Also we have an ongoing discussion about merging RoomInput and RoomOutput together in Slack.
Users that only want text would be an option added to the RoomIO constructor.
|
|
||
| async def _capture_text(): | ||
| await self._base_text_sink.capture_text(segment.text) | ||
| await self._base_text_sink.capture_text(segment.text, segment_id=segment.id) |
There was a problem hiding this comment.
this won't work for other text sinks
There was a problem hiding this comment.
There is already a check for the segment id before capture
if self._current_segment_id != segment.id:
self._base_text_sink.flush()
self._current_segment_id = segment.id
| self._participant_identity = identity | ||
| self._latest_text = "" | ||
|
|
||
| async def capture_text(self, text: str, *, segment_id: str | None = None) -> None: |
There was a problem hiding this comment.
I prefer to not add segment_id here to keep the same api for all text sinks, user should use flush() to mark the end of the segment ideally.
Though actually I also want a segment id here in case the text with different ids comes not in order like id1, id2, id1, .... wdyt @theomonnom ?
| class TextSink(ABC): | ||
| @abstractmethod | ||
| async def capture_text(self, text: str) -> None: | ||
| async def capture_text(self, text: str, *, segment_id: str | None = None) -> None: |
There was a problem hiding this comment.
yeah, I think having it is better than not having it, guessing you're ok with the change as long as it's part of the base class here?
There was a problem hiding this comment.
I still have a concern that the segment_id is conflicted with flush. If calling flush means the end of the segment, what is the "correct" behavior if the segment_id is the same as before after flush?
There was a problem hiding this comment.
I don't think that's a problem that's caused by the segment_id being part of the params.
Is you concern about replacing current_id with segment_id?
I just wanted to avoid any instances where there are multiple places trying to call capture_text with different segment ids. In my head the place to handle this is the sink itself, but we can also leave it as is if we can be certain that every caller checks for the segment_id and flushes if necessary
There was a problem hiding this comment.
not about the current_id or segment_id, but just not clear what is the expect behavior if the same segment_id captured after flush. Sounds like the flush and segment_id are overlapped in functionality.
There was a problem hiding this comment.
I didn't see a logic change in the DatastreamSink from this commit 446e3a9.
For agent transcription with delta=True, the change of the segment_id is checked before calling the sink.capture_text() here https://github.com/livekit/agents/blob/lukas/ds-sink/livekit-agents/livekit/agents/pipeline/transcription/synchronizer.py#L356-L363,
if self._current_segment_id != segment.id:
self._base_text_sink.flush()
self._current_segment_id = segment.id
async def _capture_text():
await self._base_text_sink.capture_text(segment.text)
if segment.final:
self._base_text_sink.flush()
task = asyncio.create_task(_capture_text())and for user transcription with delta=False, the capture is called here https://github.com/livekit/agents/blob/lukas/ds-sink/livekit-agents/livekit/agents/pipeline/room_io.py#L420-L426 so it only flush after the final transcript received.
async def _capture_text():
if ev.alternatives:
data = ev.alternatives[0]
await self._text_sink.capture_text(data.text)
if ev.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
self._text_sink.flush()There was a problem hiding this comment.
that means it needs to be guaranteed that captureText() is not called for the same segment_id again after it has been flushed once with that segment_id.
Is that guaranteed?
There was a problem hiding this comment.
For user transcription, yes, there is no segment id naturally in that case.
For agent transcription, it's not guaranteed but it streams delta so it should be fine.
There was a problem hiding this comment.
could you point out what is the logic difference introduced by the commit if that's the concern?
There was a problem hiding this comment.
the difference is that previously if a segment_id was provided to capture_text it would re-use that id for sending the text stream.
if we create a new id after a flush it will result in streams getting a new id, even though it might be originating from the same segment_id.
On the receiving side a new id would show up as a new message, while having the same id for the same segments would prompt the receiving side to replace the previous message instead of adding a new one.
Does that make it clearer?
Happy to jump on a call and discuss
| class MultiTextSink(TextSink): | ||
| def __init__(self, sinks: list[TextSink]) -> None: | ||
| self._sinks = sinks | ||
|
|
||
| async def capture_text(self, text: str, *, segment_id: str | None = None) -> None: | ||
| await asyncio.gather( |
There was a problem hiding this comment.
Let's keep the public API minimal for V1.0, I wouldn't expose it for now and keep this class private.
446e3a9 to
02493b2
Compare
02493b2 to
6b9a3ff
Compare
lukasIO
left a comment
There was a problem hiding this comment.
lgtm! thanks for fixing the outstanding issues 🙏
Co-authored-by: Long Chen <longch1024@gmail.com>
depends on livekit/python-sdks#362