Add support for voice styles to Text-to-Speech #1182
Replies: 4 comments 1 reply
-
For clarification: is the provided style set in the TTS configuration under the voice assistant settings, or can it also be set as part of the action to augment a message? It would be nice to include a style as a parameter.
-
What can this be used for? Why do we need it?
Those seem to be 18 different voices, not styles. I mean OK, technically they may be styles, but from an application standpoint, they can't really be used as such.
That's a test environment on your tenant, which can't be accessed by anyone who doesn't have the right permissions.
-
We discussed this one in the architecture core meeting last week. The idea/concept is OK to add. However, we think it should be part of the voice objects we already return. This existing dataclass could, in our opinion, be extended with a property that holds these styles. Also: maybe use "variants" or "moods" instead of styles? 🤷
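A minimal sketch of what extending the voice object might look like. `VoiceSketch` is a hypothetical stand-in, not the existing Home Assistant dataclass, and the field names `voice_id`, `name`, and `styles` are assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VoiceSketch:
    """Hypothetical stand-in for the voice object a TTS entity returns."""

    voice_id: str  # assumed field name
    name: str  # assumed field name
    # Assumed new property: an empty list means the voice offers no styles.
    styles: list[str] = field(default_factory=list)


plain = VoiceSketch(voice_id="voice-a", name="Voice A")
styled = VoiceSketch(voice_id="voice-b", name="Voice B", styles=["friendly", "sad"])
```

With this shape, the styles travel with the voice list that is already returned, so no extra lookup per voice would be needed.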
-
In addition to the voice styles, it would also be useful to be able to set the voice rate/speed.
-
Context
Text-to-Speech models can often generate a voice in different styles: happy, friendly, angry, sad, etc. Home Assistant is currently only able to expose a single style for each voice.
The Text-to-Speech entities currently allow listing the supported languages and, per language, getting the supported voices (docs).
The selected voice is passed as the `voice` key in the `options` dictionary when calling the `tts.speak` action. We have a UI to make this easy to configure in the media browser.
Decision
We add a new callback method `async_get_supported_styles(self, language: str, voice: str) -> None | list[str]`. The default method implementation returns `None`. This will indicate to Home Assistant that this voice does not support styles. If a list is returned, the user can pick one style and provide it as `style` in the TTS options.
Consequences
The number of available voices/styles that a user can choose from for Text-to-Speech providers will greatly increase.
Example integrations that will benefit:
`arctic` voice to see 18 different styles
Alternatives
As an alternative, we could list all styles of a voice as their own voice.
For example, we would list AmyNeural:friendly, AmyNeural:sad etc. The downside is that this will result in very long lists that are difficult to browse.
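To make the proposal concrete, here is a rough sketch of the callback from the Decision section next to the flattened-list alternative. Only `async_get_supported_styles`, the `voice` and `style` option keys, and the `AmyNeural` examples come from the text above. The class names, the `build_options` and `flatten_styles` helpers, and the `en-GB` language code are hypothetical and are not part of Home Assistant.

```python
from __future__ import annotations

import asyncio


class TextToSpeechEntitySketch:
    """Hypothetical stand-in for a TTS entity, not the real base class."""

    async def async_get_supported_styles(
        self, language: str, voice: str
    ) -> None | list[str]:
        # Default from the proposal: None means the voice has no styles.
        return None


class ExampleProviderEntity(TextToSpeechEntitySketch):
    """Hypothetical provider whose AmyNeural voice offers two styles."""

    _STYLES = {("en-GB", "AmyNeural"): ["friendly", "sad"]}

    async def async_get_supported_styles(
        self, language: str, voice: str
    ) -> None | list[str]:
        return self._STYLES.get((language, voice))


async def build_options(
    entity: TextToSpeechEntitySketch,
    language: str,
    voice: str,
    style: str | None = None,
) -> dict[str, str]:
    """Build the options dict, adding style only if the voice supports it."""
    options = {"voice": voice}
    supported = await entity.async_get_supported_styles(language, voice)
    if style is not None and supported is not None and style in supported:
        options["style"] = style
    return options


def flatten_styles(voice: str, styles: list[str]) -> list[str]:
    """The alternative: list every voice/style pair as its own voice."""
    return [f"{voice}:{style}" for style in styles]


options = asyncio.run(
    build_options(ExampleProviderEntity(), "en-GB", "AmyNeural", "sad")
)
flattened = flatten_styles("AmyNeural", ["friendly", "sad"])
```

With the callback, the voice list keeps one entry per voice and the style becomes a second, optional choice. With the flattened alternative, the list grows by one entry for every voice/style pair.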