Add support for voice styles to Text-to-Speech #1182

balloob · 2025-01-14T04:14:22Z

Maintainer

Context

Text-to-Speech models can often generate a voice in different styles, such as happy, friendly, angry, or sad. Home Assistant is currently only able to expose a single style for each voice.

The Text-to-Speech entities currently allow listing the supported languages, and per language get the supported voices (docs).

The selected voice is passed as the voice key in the options dictionary when calling the tts.speak action. We have a UI to make this easy to configure in the media browser.

Decision

We add a new callback method async_get_supported_styles(self, language: str, voice: str) -> None | list[str].

The default method implementation returns None. This will indicate to Home Assistant that this voice does not support styles.

If a list is returned, the user can pick one style and provide it as style in the TTS options.

Consequences

The number of available voices/styles that a user can choose from for Text-to-Speech providers will greatly increase.

Example integrations that will benefit:

Piper via Wyoming (see the examples and pick the US English arctic voice to hear 18 different styles)
Home Assistant Cloud (Azure available styles)

Alternatives

As an alternative, we could list all styles of a voice as their own voice.

For example, we would list AmyNeural:friendly, AmyNeural:sad etc. The downside is that this will result in very long lists that are difficult to browse.
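To make the downside concrete, here is a rough sketch of what the flattened alternative would produce. The function name and the input mapping are hypothetical; only the `voice:style` naming pattern comes from the example above.

```python
def flatten_voice_styles(styles_by_voice: dict[str, list[str]]) -> list[str]:
    """Expand every (voice, style) pair into its own "voice:style" entry."""
    flattened: list[str] = []
    for voice, styles in styles_by_voice.items():
        if not styles:
            # A voice without styles stays a single entry.
            flattened.append(voice)
            continue
        flattened.extend(f"{voice}:{style}" for style in styles)
    return flattened


if __name__ == "__main__":
    voices = {"AmyNeural": ["friendly", "sad"], "PlainVoice": []}
    print(flatten_voice_styles(voices))
```

The list length becomes the sum of the style counts over all voices, so a provider with many voices and many styles per voice ends up with a single, very long picker.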

AJediIAm · 2025-01-14T07:11:40Z

For clarification: is the provided style set in the configuration of the TTS in voice assistant settings or can it also be set as part of the action to augment a message?

It would be nice to include a style as a parameter.

1 reply

AJediIAm Jan 14, 2025

Using voice styles as parameters will allow us to announce the weather in a sad voice when it's raining on a workday and a happy voice when Venus is visible in the night sky.

tetele · 2025-01-14T07:18:24Z

What can this be used for? Why do we need it?

Piper via Wyoming (examples and pick US English, arctic voice to see 18 different styles

Those seem to be 18 different voices, not styles. I mean OK, technically they may be styles, but from an application standpoint, they can't really be used as such.

Home Assistant Cloud (Azure available styles)

That's a test env on your tenant, which can't be accessed by anyone who doesn't have the rights.

0 replies

frenck · 2025-02-19T21:22:12Z

Maintainer

We discussed this one in the architectural core meeting last week.

The idea/concept is OK to add. However, we think it should be part of the objects of voices we already return. This existing dataclass could, in our opinion, be extended with a property that holds these styles.

Also: Maybe use "variants" or "moods" instead of styles? 🤷

0 replies

noxhirsch · 2025-02-20T10:29:06Z

In addition to the voice styles, it would also be useful to be able to set the voice rate/speed.

0 replies
