-
Notifications
You must be signed in to change notification settings - Fork 583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support AWS plugin for TTS, STT and LLM #1302
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: fe986ab The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
return credentials.access_key, credentials.secret_key | ||
|
||
|
||
TTS_SPEECH_ENGINE = Literal["standard", "neural", "long-form", "generative"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move this to another file. Check how we do it for other TTS/STT
|
||
response = await client.synthesize_speech(**_strip_nones(params)) | ||
|
||
if "AudioStream" in response: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit avoid the extra indent here
except Exception as e: | ||
logger.exception(f"an error occurred while streaming inputs: {e}") | ||
|
||
handler = TranscriptEventHandler(stream.output_stream, self._event_ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we create a separate class?
self, | ||
*, | ||
voice: str | None = "Ruth", | ||
language: TTS_LANGUAGE | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
language: TTS_LANGUAGE | None = None, | |
language: TTS_LANGUAGE | None = None, |
We should always allow a str too here, we can't guarantee we will update the languages quickly
*, | ||
voice: str | None = "Ruth", | ||
language: TTS_LANGUAGE | None = None, | ||
output_format: TTS_OUTPUT_FORMAT = "pcm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to expose the output format. we only support pcm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do support mp3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah true!
|
||
# If API key and secret are provided, create a session with them | ||
if api_key and api_secret: | ||
session = boto3.Session( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this making network calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boto3.Session()
doesn’t make network calls, but session.get_credentials() does if API keys and secrets aren’t cached., but we’re calling it during initialization, it’s a one-time operation.
try: | ||
async for frame in self._input_ch: | ||
if isinstance(frame, rtc.AudioFrame): | ||
await stream.input_stream.send_audio_event( | ||
audio_chunk=frame.data.tobytes() | ||
) | ||
await stream.input_stream.end_stream() | ||
|
||
except Exception as e: | ||
logger.exception(f"an error occurred while streaming inputs: {e}") | ||
|
||
async def handle_transcript_events(): | ||
try: | ||
async for event in stream.output_stream: | ||
if isinstance(event, TranscriptEvent): | ||
self._process_transcript_event(event) | ||
except Exception as e: | ||
logger.exception( | ||
f"An error occurred while handling transcript events: {e}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using try — finally. We have an utility for it here
finally: | ||
await utils.aio.gracefully_cancel(*tasks) | ||
except Exception as e: | ||
logger.exception(f"An error occurred while streaming inputs: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is swallowing exceptions? In this case the baseclass will not try to reconnect on failure
def get_client(self): | ||
"""Returns a client creator context.""" | ||
return self._session.create_client( | ||
"polly", | ||
region_name=self._opts.speech_region, | ||
aws_access_key_id=self._api_key, | ||
aws_secret_access_key=self._api_secret, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we hide this?
from typing import Any, Callable | ||
|
||
import aiohttp | ||
from aiobotocore.session import AioSession, get_session # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't support types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't have official support for types. I found this thread boto/boto3#2213 and this library https://github.com/youtype should we include it?
This PR implements AWS plugin for TTS and STT