Support AWS plugin for TTS, STT and LLM #1302

jayeshp19 · 2024-12-26T20:22:29Z

This PR implements the AWS plugin for TTS and STT.

changeset-bot · 2024-12-26T20:22:33Z

🦋 Changeset detected

Latest commit: fe986ab

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
livekit-plugins-aws	Minor

livekit-plugins/install_local.sh

theomonnom · 2025-01-13T12:21:18Z

livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/_utils.py

+    return credentials.access_key, credentials.secret_key
+
+
+TTS_SPEECH_ENGINE = Literal["standard", "neural", "long-form", "generative"]


We should move this to another file. Check how we do it for other TTS/STT
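A minimal sketch of the layout being asked for, assuming the shared `Literal` aliases go into a dedicated module. The file name `models.py` and the `SPEECH_ENGINES` helper are assumptions for illustration; only `TTS_SPEECH_ENGINE` comes from the diff above.

```python
# models.py (hypothetical module name): type aliases only, no AWS imports
from typing import Literal, get_args

# Copied from the diff above: the engines accepted for speech synthesis.
TTS_SPEECH_ENGINE = Literal["standard", "neural", "long-form", "generative"]

# get_args() recovers the allowed values at runtime, e.g. for validation.
# SPEECH_ENGINES is an illustrative helper, not part of the PR.
SPEECH_ENGINES = get_args(TTS_SPEECH_ENGINE)
```

With this split, `_utils.py` would keep only the credential helpers, and `tts.py` would import the alias from the new module.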

theomonnom · 2025-01-13T12:21:54Z

livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/tts.py

+
+                response = await client.synthesize_speech(**_strip_nones(params))
+
+                if "AudioStream" in response:


nit: avoid the extra indent here

theomonnom · 2025-01-13T12:23:03Z

livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/stt.py

+                except Exception as e:
+                    logger.exception(f"an error occurred while streaming inputs: {e}")
+
+            handler = TranscriptEventHandler(stream.output_stream, self._event_ch)


Why do we create a separate class?

theomonnom · 2025-01-13T12:24:47Z

livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/tts.py

+        self,
+        *,
+        voice: str | None = "Ruth",
+        language: TTS_LANGUAGE | None = None,


Suggested change

language: TTS_LANGUAGE | None = None,

language: TTS_LANGUAGE | None = None,

We should always allow a str too here, we can't guarantee we will update the languages quickly
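As a rough sketch of the idea, assuming a small options container: the alias documents the known language codes, while a plain `str` is also accepted so callers are not blocked when a new language is missing from the list. The `PollyOptions` class and the two example codes are hypothetical; the real `TTS_LANGUAGE` alias is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Union

# Hypothetical subset of language codes, for illustration only.
TTS_LANGUAGE = Literal["en-US", "fr-FR"]


@dataclass
class PollyOptions:  # hypothetical container, not the PR's class
    voice: Optional[str] = "Ruth"
    # Equivalent to `TTS_LANGUAGE | str | None`: known codes stay documented
    # in the alias, and any other string is still accepted.
    language: Union[TTS_LANGUAGE, str, None] = None
```

A caller could then pass a code that the alias does not list yet, for example `PollyOptions(language="xx-YY")`, without a plugin release.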

theomonnom · 2025-01-13T12:25:05Z

livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/tts.py

+        *,
+        voice: str | None = "Ruth",
+        language: TTS_LANGUAGE | None = None,
+        output_format: TTS_OUTPUT_FORMAT = "pcm",


I don't think it makes sense to expose the output format. We only support PCM.

We do support MP3.

theomonnom · 2025-01-24T09:54:04Z

livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/_utils.py

+
+    # If API key and secret are provided, create a session with them
+    if api_key and api_secret:
+        session = boto3.Session(


Is this making network calls?

boto3.Session() doesn't make network calls, but session.get_credentials() does if the API key and secret aren't cached. Since we call it during initialization, it's a one-time operation.
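A small sketch of the one-time resolution described here, under the assumption that the plugin resolves credentials in its constructor and reuses them afterwards. `CredentialHolder` is a hypothetical name, and `session` is typed loosely so the sketch runs without boto3 installed; in practice it would be a `boto3.Session`.

```python
from typing import Any


class CredentialHolder:  # hypothetical helper, not the PR's code
    """Resolve the key pair once at construction and reuse it afterwards."""

    def __init__(self, session: Any) -> None:
        # With a boto3.Session, get_credentials() is the call that may reach
        # out to a credential provider; building the session itself does not.
        credentials = session.get_credentials()
        if credentials is None:
            # get_credentials() returns None when nothing could be resolved.
            raise ValueError("no AWS credentials could be resolved")
        self.api_key: str = credentials.access_key
        self.api_secret: str = credentials.secret_key
```

Because the lookup happens only in `__init__`, later synthesis or transcription calls read the stored values and never trigger another resolution.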

theomonnom · 2025-01-24T09:54:59Z

livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/stt.py

+                try:
+                    async for frame in self._input_ch:
+                        if isinstance(frame, rtc.AudioFrame):
+                            await stream.input_stream.send_audio_event(
+                                audio_chunk=frame.data.tobytes()
+                            )
+                    await stream.input_stream.end_stream()
+
+                except Exception as e:
+                    logger.exception(f"an error occurred while streaming inputs: {e}")
+
+            async def handle_transcript_events():
+                try:
+                    async for event in stream.output_stream:
+                        if isinstance(event, TranscriptEvent):
+                            self._process_transcript_event(event)
+                except Exception as e:
+                    logger.exception(
+                        f"An error occurred while handling transcript events: {e}"
+                    )


Instead of using try/finally, we have a utility for it here

theomonnom · 2025-01-24T09:55:28Z

livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/stt.py

+            finally:
+                await utils.aio.gracefully_cancel(*tasks)
+        except Exception as e:
+            logger.exception(f"An error occurred while streaming inputs: {e}")


I think this is swallowing exceptions? In this case the base class will not try to reconnect on failure
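To illustrate the concern, here is a minimal sketch in which the error is logged and then re-raised so that an outer retry loop can see it. The base class's actual reconnect logic is not shown in this thread, so `run_with_retry` is a hypothetical stand-in for it, and `connect` is any coroutine function that opens and drives the stream.

```python
import asyncio
import logging

logger = logging.getLogger("aws-stt-sketch")


async def run_stream(connect) -> None:
    """Log a failure, then re-raise so the caller can react to it."""
    try:
        await connect()
    except Exception:
        logger.exception("an error occurred while streaming inputs")
        raise  # without this, the caller never learns the stream failed


async def run_with_retry(connect, max_attempts: int = 3) -> int:
    """Stand-in for a base class retry loop; returns the successful attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            await run_stream(connect)
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
    raise ValueError("max_attempts must be at least 1")
```

If `run_stream` caught the exception without the bare `raise`, `run_with_retry` would treat the first failed attempt as a success and never retry.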

theomonnom · 2025-01-24T09:57:43Z

livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/tts.py

+    def get_client(self):
+        """Returns a client creator context."""
+        return self._session.create_client(
+            "polly",
+            region_name=self._opts.speech_region,
+            aws_access_key_id=self._api_key,
+            aws_secret_access_key=self._api_secret,
+        )


Should we hide this?

theomonnom · 2025-01-24T09:58:00Z

livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/tts.py

+from typing import Any, Callable
+
+import aiohttp
+from aiobotocore.session import AioSession, get_session  # type: ignore


They don't support types?

They don't have official support for types. I found this thread, boto/boto3#2213, and this library: https://github.com/youtype. Should we include it?

dasxran and others added 15 commits July 14, 2024 14:26

base files for AWS plugins

23efbf4

aws workflow

70fcfc3

rename aws folder

fa9a9f5

Polly TTS

7282f51

Update test_tts with aws

7d07511

Merge branch 'livekit:main' into dex/aws

485e410

Merge branch 'livekit:main' into dex/aws

c1676ed

aws transcribe

b19b2c8

livekit-agents v0.8.0

32e8f32

Merge remote-tracking branch 'dasxran/dex/aws' into aws-tts-stt

0d40699

ruff check & yml changes

8d7210f

setup changes

0d84256

Merge branch 'main' of https://github.com/livekit/agents into aws-tts…

9c0c373

…-stt

Merge branch 'main' of https://github.com/livekit/agents into aws-tts…

b2b5614

…-stt

updates

c2d26fc

jayeshp19 added 4 commits December 27, 2024 02:42

updates

ca7d609

updates

84741d9

updates

660d783

updates

81e2ebd

theomonnom reviewed Jan 13, 2025

livekit-plugins/install_local.sh

updates

93f8b19

theomonnom reviewed Jan 13, 2025

updates

69b286b

theomonnom reviewed Jan 13, 2025

jayeshp19 added 2 commits January 13, 2025 19:37

updates

9d52b78

updates

71f450d

jayeshp19 added 11 commits January 13, 2025 20:29

updates

dbf090d

updates

0f8b2a3

updates

121cc02

updates

a7dbac9

updates

98d9882

updates

856ba87

updates

599e197

updates

651390a

debug

c1127f3

debug

941ef3f

Merge branch 'main' of https://github.com/livekit/agents into aws-tts…

55768ae

…-stt

jayeshp19 changed the title ~~[draft] Support AWS plugin for TTS and STT~~ Support AWS plugin for TTS and STT Jan 20, 2025

jayeshp19 marked this pull request as ready for review January 20, 2025 09:51

jayeshp19 added 5 commits January 20, 2025 15:25

changeset

5af4053

Merge branch 'main' of https://github.com/livekit/agents into aws-tts…

33c787d

…-stt

updates

23de909

updates

6b8e706

Merge branch 'main' of https://github.com/livekit/agents into aws-tts…

1b56f02

…-stt

theomonnom reviewed Jan 24, 2025

jayeshp19 added 4 commits January 25, 2025 15:51

updates

827a09c

supprot bedrock llms

fa2efdd

ruff

c735e47

updates

e8557e1

jayeshp19 changed the title ~~Support AWS plugin for TTS and STT~~ Support AWS plugin for TTS, STT and LLM Jan 27, 2025

jayeshp19 added 6 commits January 27, 2025 20:21

updates

3e4869b

updates

c46554c

updates

1a07bb6

updates

c6687d6

updates

c9dabff

update test llm

fe986ab
