The voice format is `<TTS System>:<Voice Name>[#<Speaker ID>]`. The Speaker ID is optional and only applies to multi-speaker models.
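As a rough illustration of that format, the Python sketch below splits a voice string into its parts. `parse_voice` and `VoiceSpec` are hypothetical names, not part of Offline TTS. The sketch also assumes that none of the three fields may contain `:` or `#`.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for "<TTS System>:<Voice Name>[#<Speaker ID>]".
# Assumption: none of the three fields contains ':' or '#'.
_VOICE_RE = re.compile(
    r"(?P<system>[^:#]+):(?P<voice>[^:#]+)(?:#(?P<speaker>[^:#]+))?"
)


class VoiceSpec(NamedTuple):
    system: str                # e.g. "tts" or "espeak"
    voice: str                 # e.g. "en_vctk" or "en-gb"
    speaker_id: Optional[str]  # only meaningful for multi-speaker models


def parse_voice(spec: str) -> VoiceSpec:
    """Split a voice string into TTS system, voice name and optional speaker ID."""
    match = _VOICE_RE.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"invalid voice spec: {spec!r}")
    # match["speaker"] is None when no "#<Speaker ID>" suffix is present.
    return VoiceSpec(match["system"], match["voice"], match["speaker"])


print(parse_voice("espeak:en-gb-scotland"))  # no speaker ID
print(parse_voice("tts:en_vctk#p228"))       # multi-speaker model plus speaker ID
```

The speaker suffix is parsed only when it is present. Checking whether a speaker ID actually matches a multi-speaker model is left to the engine.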
The following TTS engines are used:
- Coqui-TTS: a patched, embedded build of the latest Coqui-TTS dev version (0.7.0)
  - TTS System name: `tts`
  - Voice quality: Good
  - Performance: Poor; requires a powerful CPU and plenty of memory
  - Resource overhead: High
  - Built-in Voice Models:
    - `zh_baker`: Chinese voice from Baker [F]
    - `en_vctk`: English multi-speaker voice [MF]
- ESpeaker
  - TTS System name: `espeak`
  - Voice quality: Poor (robotic-sounding)
  - Performance: Very good
  - Resource overhead: Low
  - Built-in Voice Models:
    - `en-029`: English_(Caribbean) [M]
    - `en-gb`: English_(Great_Britain) [M]
    - `en-gb-scotland`: English_(Scotland) [M]
    - `en-gb-x-gbclan`: English_(Lancaster) [M]
    - `en-gb-x-gbcwmd`: English_(West_Midlands) [M]
    - `en-gb-x-rp`: English_(Received_Pronunciation) [M]
    - `en-us`: English_(America) [M]
    - `zh-cmn`: Chinese_(Mandarin) [M]
    - `zh-yue`: Chinese_(Cantonese) [M]
If your input text begins with a left angle bracket (`<`) character, it will be interpreted as SSML.
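A minimal Python sketch of that detection rule follows. The helper names are hypothetical. The sketch assumes that leading whitespace is not stripped before the check, and the add-on itself may behave differently. When the rule matches, the markup is parsed with the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import Tuple, Union


def looks_like_ssml(text: str) -> bool:
    # The documented rule: input is SSML if its first character is '<'.
    # Assumption: leading whitespace is NOT stripped before the check.
    return text.startswith("<")


def classify_input(text: str) -> Tuple[str, Union[str, ET.Element]]:
    """Return ("ssml", parsed root element) or ("text", original text)."""
    if looks_like_ssml(text):
        # Raises xml.etree.ElementTree.ParseError if the markup is malformed.
        return "ssml", ET.fromstring(text)
    return "text", text


kind, payload = classify_input('<speak><s lang="en-us">Hello</s></speak>')
print(kind, payload.tag)
print(classify_input("1 < 2 is plain text")[0])
```

Only the first character matters, so text such as "1 < 2" is still read as plain text.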
A subset of SSML is supported:
- `<speak>` - wrap around SSML text
  - `lang` - set language for document
- `<s>` - sentence (disables automatic sentence breaking)
  - `lang` - set language for sentence
- `<w>` / `<token>` - word (disables automatic tokenization)
- `<voice name="...">` - set voice of inner text
  - `voice` - name or language of voice
    - Name format is `tts:voice` (e.g., "glow-speak:en-us_mary_ann") or `tts:voice#speaker_id` (e.g., "coqui-tts:en_vctk#p228")
    - If `voice` is one of the supported languages, a preferred voice for that language is used (override with `--preferred-voice <lang> <voice>`)
- `<say-as interpret-as="">` - force interpretation of inner text
  - `interpret-as` - one of "spell-out", "date", "number", "time", or "currency"
  - `format` - way to format text depending on `interpret-as`
    - number - one of "cardinal", "ordinal", "digits", "year"
    - date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
- `<break time="">` - pause for the given amount of time
  - `time` - seconds ("123s") or milliseconds ("123ms")
- `<sub alias="">` - substitute `alias` for inner text
For example:
<speak>
<s lang="zh">欢迎使用离线语音合成</s>
<s lang="en-us">Welcome to Offline Speech Synthesis.</s>
</speak>
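A document like this can also be assembled programmatically. The hypothetical Python helpers below use the standard library's `xml.etree.ElementTree` to build `<speak>`/`<s lang="...">` markup, optionally separating sentences with a `<break>`, and convert a `time` value to seconds. The 500 ms default pause and the acceptance of decimal values such as "1.5s" are assumptions, not documented behavior.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple


def parse_break_time(value: str) -> float:
    """Convert a <break time="..."> value such as "123s" or "123ms" to seconds."""
    # Assumption: decimal amounts like "1.5s" are accepted as well.
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(ms|s)\s*", value)
    if match is None:
        raise ValueError(f"unsupported break time: {value!r}")
    amount, unit = float(match.group(1)), match.group(2)
    return amount / 1000.0 if unit == "ms" else amount


def build_ssml(sentences: Iterable[Tuple[str, str]], pause: Optional[str] = "500ms") -> str:
    """Wrap (lang, text) pairs in <s lang="..."> elements inside <speak>.

    If `pause` is not None, a <break time="..."/> is placed between sentences.
    """
    if pause is not None:
        parse_break_time(pause)  # fail early on an invalid pause value
    speak = ET.Element("speak")
    for index, (lang, text) in enumerate(sentences):
        if index > 0 and pause is not None:
            ET.SubElement(speak, "break", time=pause)
        sentence = ET.SubElement(speak, "s", lang=lang)
        sentence.text = text  # ElementTree escapes &, < and > on output
    return ET.tostring(speak, encoding="unicode")


print(build_ssml([
    ("zh", "欢迎使用离线语音合成"),
    ("en-us", "Welcome to Offline Speech Synthesis."),
]))
print(parse_break_time("250ms"))
```

The generated string starts with `<speak>`, so its first character is `<` and it is treated as SSML. Letting ElementTree serialize the text avoids hand-escaping characters such as `&` inside sentences.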
- Coqui-TTS
- ESpeaker
- Mainly inspired by OpenTTS.
- Many thanks to OpenTTS: without it, there would be no Offline TTS.
- Upgrade Coqui-TTS from 0.3.1 to the latest version, 0.7.0dev
- fix: Check whether optional dependencies are installed before loading the ZH/JA phonemizer
- Remove matplotlib (it is only needed during the training analysis phase)
- Optimize Coqui-TTS model size
- Optimize Coqui-TTS models for embedded devices
- Espeak Chinese locale missing
- Show used languages only
- Cannot use SSML on HA
- Cannot modify options on HA, because `/data/options.json` cannot be read by a regular (non-root) user
- Add preferred voice for language option