avatar plugin based on v1.0 #1391

Merged

longcw merged 33 commits into dev-1.0 from longc/avatar-plugin

Feb 11, 2025

Contributor

longcw commented Jan 20, 2025 (edited)

AudioSink based on DataStream (see python-sdks#347, "Add data stream support")
Avatar worker example with video generation and A/V sync

longcw added 3 commits

January 17, 2025 18:13


          avatar plugin wip

d7a24d9


          add sink control and worker

9a96aa9


          update avatar io api

fc2bd2d

changeset-bot (bot) commented Jan 20, 2025 (edited)

⚠️ No Changeset found

Latest commit: f714070

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


longcw changed the base branch from main to dev-1.0

January 20, 2025 03:45


          update video generator protocol

ddde6af

longcw requested review from davidzhao and theomonnom

January 20, 2025 03:47


          fix wait_for_participant

f1ac76e

davidzhao reviewed


Member

davidzhao left a comment

This looks great! Just a few comments.

We'll also need some error handling in various parts. How do both sides handle the case where the other side disconnects? If the avatar participant is gone for longer than a reasonable timeout, the agent would likely need to report that error and shut itself down.

Similarly, if the controller is gone, the service on the other side might want to stop consuming resources and exit as well.
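A minimal sketch of the timeout-based shutdown described above, assuming each side can observe the peer's presence as an asyncio.Event that is set while the peer is connected. The helper name wait_for_peer_or_shutdown, the grace period, and the on_peer_lost callback are hypothetical and not part of the plugin; in a real worker the event would be driven by the room's participant connected/disconnected events.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_for_peer_or_shutdown(
    peer_present: asyncio.Event,
    grace_period: float,
    on_peer_lost: Callable[[], Awaitable[None]],
) -> bool:
    """Hypothetical helper, called after the remote side disconnects.

    Returns True if the peer reconnects (peer_present is set) within
    grace_period seconds; otherwise awaits on_peer_lost() and returns False.
    """
    try:
        await asyncio.wait_for(peer_present.wait(), timeout=grace_period)
        return True
    except asyncio.TimeoutError:
        # The peer stayed away too long: report the error and release resources.
        await on_peer_lost()
        return False


async def demo() -> tuple[bool, bool, list[str]]:
    log: list[str] = []

    async def shutdown() -> None:
        log.append("shutdown")

    # Case 1: the peer never comes back, so shutdown() runs.
    gone = asyncio.Event()
    lost = await wait_for_peer_or_shutdown(gone, 0.05, shutdown)

    # Case 2: the peer reconnects before the grace period ends.
    back = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, back.set)
    kept = await wait_for_peer_or_shutdown(back, 1.0, shutdown)
    return lost, kept, log


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (False, True, ['shutdown'])
```

The same helper works in both directions: the agent uses it to detect a missing avatar worker, and the avatar worker uses it to detect a missing controller.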

livekit-plugins/livekit-plugins-avatar/livekit/plugins/avatar/io.py Outdated

+              AUDIO_SENDER_ATTR = "__livekit_avatar_audio_sender"
+              AUDIO_RECEIVER_ATTR = "__livekit_avatar_audio_receiver"
+              RPC_INTERRUPT_PLAYBACK = "__livekit_avatar_interrupt_playback"

Member

davidzhao Jan 20, 2025

nit: for consistency, use the lk. namespace to identify LiveKit-specific actions

Suggested change

-            RPC_INTERRUPT_PLAYBACK = "__livekit_avatar_interrupt_playback"
+            RPC_INTERRUPT_PLAYBACK = "lk.interrupt_playback"

livekit-plugins/livekit-plugins-avatar/livekit/plugins/avatar/io.py Outdated

+                  async def start(self) -> None:
+                      """Wait for worker participant to join and start streaming"""
+                      # mark self as sender
+                      await self._room.local_participant.set_attributes({AUDIO_SENDER_ATTR: "true"})

Member

davidzhao Jan 20, 2025

I was thinking we could simplify this step. Instead, could the receiver just wait for an audio stream with a particular name?

livekit-plugins/livekit-plugins-avatar/livekit/plugins/avatar/io.py Outdated

+                      """Wait for worker participant to join and start streaming"""
+                      # mark self as sender
+                      await self._room.local_participant.set_attributes({AUDIO_SENDER_ATTR: "true"})
+                      self._remote_participant = await wait_for_participant(

Member

davidzhao Jan 20, 2025

What if, instead of waiting for an attribute, we:

1. take avatar_identity as a param in the sink (with a sane default)
2. create a token for that identity and send it to the other side as part of the initial handshake
3. here, just wait for that agreed-upon identity
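A minimal sketch of the handshake in the list above, assuming the agent chooses the avatar identity up front. ParticipantWaiter, DEFAULT_AVATAR_IDENTITY, and the timeout value are hypothetical; token minting is left as a comment, and a real implementation would wire on_participant_connected to the room's participant-connected event instead of calling it by hand.

```python
import asyncio

DEFAULT_AVATAR_IDENTITY = "avatar_worker"  # hypothetical default identity


class ParticipantWaiter:
    """Hypothetical stand-in for room events: resolves waiters by identity."""

    def __init__(self) -> None:
        self._present: set[str] = set()
        self._waiters: dict[str, list[asyncio.Future]] = {}

    def on_participant_connected(self, identity: str) -> None:
        # Would be hooked to the room's participant-connected event.
        self._present.add(identity)
        for fut in self._waiters.pop(identity, []):
            if not fut.done():
                fut.set_result(None)

    async def wait_for(self, identity: str, timeout: float) -> None:
        # Return immediately if the agreed-upon identity already joined.
        if identity in self._present:
            return
        fut = asyncio.get_running_loop().create_future()
        self._waiters.setdefault(identity, []).append(fut)
        try:
            await asyncio.wait_for(fut, timeout)
        finally:
            # Drop the future if it is still registered (e.g. after a timeout).
            waiters = self._waiters.get(identity)
            if waiters and fut in waiters:
                waiters.remove(fut)


async def demo() -> str:
    room = ParticipantWaiter()
    avatar_identity = DEFAULT_AVATAR_IDENTITY
    # ... mint a token for avatar_identity and send it in the handshake ...
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, room.on_participant_connected, "someone_else")
    loop.call_later(0.02, room.on_participant_connected, avatar_identity)
    # Only the agreed-upon identity unblocks the wait.
    await room.wait_for(avatar_identity, timeout=1.0)
    return avatar_identity


if __name__ == "__main__":
    print(asyncio.run(demo()))  # avatar_worker
```

Because the identity is fixed before the avatar worker joins, neither side needs to advertise a role through participant attributes.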

Contributor Author

longcw Jan 20, 2025

Sounds good! On the avatar side, it waits for the audio stream with a particular name from the participant with kind == 'agent'.

Contributor Author

longcw Jan 20, 2025

updated

livekit-plugins/livekit-plugins-avatar/livekit/plugins/avatar/io.py Outdated

+                          # start new stream
+                          # TODO: any better option to send the metadata?
+                          name = f"audio_{frame.sample_rate}_{frame.num_channels}"
+                          self._stream_writer = await self._room.local_participant.stream_file(

Member

davidzhao Jan 20, 2025

this is a good use of stream extensions:

writer = await room.local_participant.stream_file("audio",
    extensions={"sample_rate": "48000", "channels": "1"})

or

writer = await room.local_participant.stream_file("audio",
    extensions={"audio_settings": json.dumps({"sample_rate": 48000, "channels": 1})})
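A small sketch of the second variant above, assuming the stream metadata is a map of string keys to string values, as in both examples. The key name and helper functions are hypothetical; the keyword argument itself is called extensions in this review, with attributes suggested as a better name, so check the SDK version you are on.

```python
import json

AUDIO_SETTINGS_KEY = "audio_settings"  # hypothetical key name


def encode_audio_settings(sample_rate: int, channels: int) -> dict[str, str]:
    """Pack the audio format into a string-to-string map for the stream header."""
    return {AUDIO_SETTINGS_KEY: json.dumps({"sample_rate": sample_rate, "channels": channels})}


def decode_audio_settings(extensions: dict[str, str]) -> tuple[int, int]:
    """Recover (sample_rate, channels) on the receiving side."""
    settings = json.loads(extensions[AUDIO_SETTINGS_KEY])
    return int(settings["sample_rate"]), int(settings["channels"])


if __name__ == "__main__":
    ext = encode_audio_settings(48000, 1)
    print(decode_audio_settings(ext))  # (48000, 1)
```

Packing both values into one JSON string keeps them together under a single key; the flat variant with one string per value works just as well and avoids the JSON step.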

Contributor Author

longcw Jan 20, 2025

Oh, I see. So extensions are a kind of metadata? Then why is it named extensions?

Member

davidzhao Jan 20, 2025

Yeah, I think attributes is probably a better name.

livekit-plugins/livekit-plugins-avatar/livekit/plugins/avatar/io.py Outdated

+                      # mark self as receiver
+                      await self._room.local_participant.set_attributes({AUDIO_RECEIVER_ATTR: "true"})
+                      self._remote_participant = await wait_for_participant(

Member

davidzhao Jan 20, 2025

It seems we can just wait for participant.kind == agent here?

if we wanted to handle multiple avatars in the room, then the integration should take in the controller's identity.
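A sketch of the selection rule discussed here, assuming a simplified participant record. ParticipantInfo and pick_controller are hypothetical, and kind is a plain string for illustration, whereas the real SDK represents participant kinds with its own enum.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ParticipantInfo:
    """Hypothetical stand-in for a remote participant (identity and kind only)."""

    identity: str
    kind: str  # e.g. "agent" or "standard"


def pick_controller(
    participants: Iterable[ParticipantInfo],
    controller_identity: Optional[str] = None,
) -> Optional[ParticipantInfo]:
    """Return the participant whose audio the avatar should follow.

    With an explicit controller_identity (needed when several avatars share a
    room), only that identity matches; otherwise the first agent wins.
    """
    for p in participants:
        if controller_identity is not None:
            if p.identity == controller_identity:
                return p
        elif p.kind == "agent":
            return p
    return None


if __name__ == "__main__":
    room = [
        ParticipantInfo("user-1", "standard"),
        ParticipantInfo("agent-a", "agent"),
        ParticipantInfo("agent-b", "agent"),
    ]
    print(pick_controller(room).identity)             # agent-a
    print(pick_controller(room, "agent-b").identity)  # agent-b
```

The kind-based default covers the common single-agent room; passing the controller's identity removes the ambiguity once more than one agent is present.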

lukasIO reviewed


livekit-plugins/livekit-plugins-avatar/livekit/plugins/avatar/io.py Outdated

+                          reader: rtc.FileStreamReader, remote_participant_id: str
+                      ) -> None:
+                          if remote_participant_id != self._remote_participant.identity:
+                              logger.warning(

Contributor

lukasIO Jan 20, 2025

Would we really want to warn on every other incoming file stream? That seems like a rather narrow use case for this plugin.

Contributor Author

longcw Jan 20, 2025

Oh, I see. I'll filter for the audio stream first, so other data streams can still be processed by other handlers.

Btw, what is the use case for file_name in a data stream? Can I pass a tag and metadata like sample_rate and num_channels via the file name, or is there a better option for this?
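A sketch of the filtering order described above, assuming the audio stream is identified by an agreed-upon name (in line with the earlier suggestion to wait for an audio stream with a particular name). AUDIO_STREAM_NAME and should_accept_stream are hypothetical; the point is that only audio streams from the wrong sender produce a warning, while unrelated streams are left silently for other handlers.

```python
import logging
from typing import Optional

logger = logging.getLogger("avatar-sketch")

AUDIO_STREAM_NAME = "audio"  # hypothetical agreed-upon stream name


def should_accept_stream(
    stream_name: str, sender_identity: str, expected_sender: Optional[str]
) -> bool:
    """Decide whether the avatar's audio reader should consume this stream."""
    if stream_name != AUDIO_STREAM_NAME:
        # Not ours: stay silent so other data stream handlers can process it.
        return False
    if expected_sender is not None and sender_identity != expected_sender:
        # Right stream name, wrong sender: this one is worth a warning.
        logger.warning("ignoring audio stream from unexpected participant %s", sender_identity)
        return False
    return True


if __name__ == "__main__":
    print(should_accept_stream("chat", "agent-a", "agent-a"))   # False
    print(should_accept_stream("audio", "agent-a", "agent-a"))  # True
```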

Contributor

lukasIO Jan 20, 2025

See @davidzhao's comment above; the best option is the extensions map on the stream.

livekit-plugins/livekit-plugins-avatar/livekit/plugins/avatar/io.py Outdated

+                              reader = self._stream_readers.pop(0)
+                              async for data in reader.stream_reader:
+                                  yield rtc.AudioFrame(
+                                      data=data,

Contributor

lukasIO Jan 20, 2025

This pattern assumes that a single audio frame never exceeds STREAM_CHUNK_SIZE (~15 KB).

Contributor Author

longcw Jan 20, 2025

For audio bytes, splitting large chunks into smaller ones (each under 15 KB) before sending is fine. But for other use cases, receiving a different number of chunks than were sent may not be the desired behavior. Maybe add a size limit on the sending side?

Contributor

lukasIO Jan 20, 2025

I originally had a size limit in there, but @theomonnom preferred that we not enforce one. I agree, though, that it might make things trickier if we don't have a sender-side size limit.
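A minimal sketch of a sender-side limit for the audio case, assuming 16-bit interleaved PCM and a 15,000-byte ceiling (the "~15 KB" above). The constant and the split_pcm16 helper are local assumptions for illustration, not imported from the SDK.

```python
STREAM_CHUNK_SIZE = 15_000  # assumed ceiling, matching the ~15 KB discussed above


def split_pcm16(data: bytes, num_channels: int, max_bytes: int = STREAM_CHUNK_SIZE) -> list[bytes]:
    """Split interleaved 16-bit PCM into chunks of at most max_bytes.

    Chunk boundaries fall on whole sample frames (one 2-byte sample per
    channel), so every chunk can be turned back into an audio frame on its own.
    """
    frame_bytes = 2 * num_channels  # bytes per sample frame
    if len(data) % frame_bytes:
        raise ValueError("data is not a whole number of sample frames")
    chunk = (max_bytes // frame_bytes) * frame_bytes
    if chunk == 0:
        raise ValueError("max_bytes is smaller than one sample frame")
    return [data[i : i + chunk] for i in range(0, len(data), chunk)]


if __name__ == "__main__":
    one_second_mono = bytes(48_000 * 2)  # 1 s of silence at 48 kHz, mono
    print([len(c) for c in split_pcm16(one_second_mono, 1)])
    # [15000, 15000, 15000, 15000, 15000, 15000, 6000]
```

Aligning chunks to whole sample frames keeps the receiver's one-chunk-per-frame loop valid, with samples_per_channel = len(chunk) // (2 * num_channels).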

longcw added 16 commits

January 20, 2025 22:45


          update avatar connection

f27e8f6


          update example

ec1fadc


          move to example

2f1693a


          add dispatcher

b089a3a


          Merge remote-tracking branch 'origin/dev-1.0' into longc/avatar-plugin

6666a4d


          add readme

492d89b


          Merge remote-tracking branch 'origin/dev-1.0' into longc/avatar-plugin

e55b40a


          add connection info

e05d2af


          add wait_for_subscription for avatar worker


          Merge branch 'dev-1.0' into longc/avatar-plugin

b3ad4f6


          update datastream usage

bf36dfa


          Merge remote-tracking branch 'origin/dev-1.0' into longc/avatar-plugin

2e05b9c


          fix data stream for avatar plugin

f37d610


          fix video display

3fc6f78


          refactor avatar example

d3dfb10


          fix avatar worker for interruption

28145b3

longcw changed the title from "[draft] avatar plugin based on v1.0" to "avatar plugin based on v1.0"

longcw added 2 commits

February 4, 2025 19:06


          update wave viz

772a0b8


          revert data receiver

3f7dfc9

longcw and others added 6 commits

February 4, 2025 21:34


          fix types

a0b281a


          fix video generator audio flush

38e1973


          skip silence frame if queue not empty

01a45cf


          update package deps


          move the launch avatar to example

51192c0


          fix room input and make videogen clear buffer async

2b195b0

bcherry reviewed


examples/avatar/agent_worker.py

+              ) -> None:
+                  """Wait for worker participant to join and start streaming"""
+                  # create a token for the avatar worker
+                  # TODO(long): do we need to set agent=True here? in playground if not the video track is not automatically displayed

Contributor

bcherry Feb 10, 2025

We might want a new kind to distinguish it, but we probably shouldn't join with the standard participant kind.

bcherry reviewed


examples/avatar/agent_worker.py

+                      .with_name("Avatar Worker")
+                      .with_grants(api.VideoGrants(room_join=True, room=ctx.room.name, agent=True))
+                      .with_metadata("avatar_worker")
+                      .to_jwt()

Contributor

bcherry Feb 10, 2025

We should also specify the identity of the "original" agent, so that the integrator can add a guard to their RPC handler methods (if data.caller_identity != agent_identity: log "RPC call from unexpected participant" and return).

bcherry reviewed


examples/avatar/agent_worker.py

+                  logger.info(f"Sending connection info to avatar dispatcher {avatar_dispatcher_url}")
+                  connection_info = AvatarConnectionInfo(
+                      room_name=ctx.room.name, url=ctx._info.url, token=token
+                  )

Contributor

bcherry Feb 10, 2025

We should also specify the identity of the "original" agent, so that the integrator can add a guard to their RPC handler methods (if data.caller_identity != agent_identity: log "RPC call from unexpected participant" and return).

Contributor Author

longcw Feb 11, 2025

This makes sense! I added the check in the RPC handlers.
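A sketch of the guard suggested above, assuming the agent's identity reaches the avatar worker as part of the connection info. RpcCall is a hypothetical stand-in carrying only the fields used here, and guarded_rpc_handler is an illustrative wrapper, not the plugin's code; it raises PermissionError, whereas a real handler would more likely return the SDK's own RPC error type.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Awaitable, Callable

logger = logging.getLogger("avatar-sketch")


@dataclass
class RpcCall:
    """Hypothetical stand-in for RPC invocation data (only the fields used here)."""

    caller_identity: str
    payload: str


def guarded_rpc_handler(
    agent_identity: str, handler: Callable[[RpcCall], Awaitable[str]]
) -> Callable[[RpcCall], Awaitable[str]]:
    """Wrap an RPC handler so only the agent named in the handshake can call it."""

    async def wrapper(data: RpcCall) -> str:
        if data.caller_identity != agent_identity:
            logger.warning("RPC call from unexpected participant %s", data.caller_identity)
            raise PermissionError(data.caller_identity)
        return await handler(data)

    return wrapper


async def demo() -> list[str]:
    calls: list[str] = []

    async def interrupt_playback(data: RpcCall) -> str:
        calls.append(data.caller_identity)
        return "ok"

    handler = guarded_rpc_handler("agent-a", interrupt_playback)
    result = await handler(RpcCall("agent-a", ""))
    try:
        await handler(RpcCall("intruder", ""))
    except PermissionError:
        calls.append("rejected")
    return [result, *calls]


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['ok', 'agent-a', 'rejected']
```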

longcw mentioned this pull request

Extending VoicePipelineAgent with a generative video avatar (we have the model, we need to figure out how to plug it in) #1424

Closed

longcw added 4 commits

February 11, 2025 11:47


          add guard for rpc calls in data stream io

e628293


          fix room io example for wait_for_participant

53be3bf


          Merge remote-tracking branch 'origin/dev-1.0' into longc/avatar-plugin

3f265c7


          fix logs

f714070

longcw merged commit 26fa0a9 into dev-1.0

1 check passed

longcw deleted the longc/avatar-plugin branch

February 11, 2025 08:52

theomonnom reviewed


livekit-agents/livekit/agents/pipeline/room_io.py

                       async def _read_stream():
                           async for event in self._audio_stream:
                               yield event.frame
+                              await asyncio.sleep(0)

Member

theomonnom Feb 11, 2025

any reason why this is needed?

Contributor Author

longcw Feb 11, 2025

It was added for debugging an event loop issue. Will remove it.
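For context on the debugging line: when an async iterator already has items available, each iteration can complete without suspending, so other tasks on the event loop get no turn until the loop finishes; await asyncio.sleep(0) forces a yield to the event loop. The following is a generic asyncio illustration of that effect, not the room_io code itself.

```python
import asyncio


async def run(yield_each_item: bool) -> list[str]:
    """Record whether another task runs while an async generator is drained."""
    log: list[str] = []

    async def frames():
        for i in range(3):
            yield i
            if yield_each_item:
                # Hand control back to the event loop after each item.
                await asyncio.sleep(0)

    async def other_task() -> None:
        log.append("other")

    task = asyncio.create_task(other_task())  # scheduled, not yet running
    async for i in frames():
        log.append(f"frame{i}")
    await task
    return log


if __name__ == "__main__":
    print(asyncio.run(run(False)))  # ['frame0', 'frame1', 'frame2', 'other']
    print(asyncio.run(run(True)))   # ['frame0', 'other', 'frame1', 'frame2']
```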

theomonnom pushed a commit that referenced this pull request


          avatar plugin based on v1.0 (#1391)

9c26dcb

Co-authored-by: David Zhao <dz@livekit.io>


Labels

None yet