LLaSA_training

LLaSA: Scaling Train-time and Test-time Compute for LLaMA-based Speech Synthesis (Coming Soon!)

Training

torchrun --nproc_per_node=8 train_tts.py config.json 

or

sbatch run_slurm.sh
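
The repository's run_slurm.sh is not reproduced here; as a rough sketch, such a script would wrap the same torchrun command in a Slurm batch job. The job name, GPU request, and time limit below are placeholders, not values from the repo:

```bash
#!/bin/bash
#SBATCH --job-name=llasa_tts   # placeholder job name
#SBATCH --nodes=1              # single node with 8 GPUs, matching the torchrun example
#SBATCH --gres=gpu:8           # GPU request; adjust to your cluster
#SBATCH --time=72:00:00        # placeholder time limit

# Same launch command as the single-node example above.
torchrun --nproc_per_node=8 train_tts.py config.json
```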

Data

You can download tokenized open-source speech data here. This includes LibriHeavy, Emilia (in both Chinese and English), and WenetSpeech4TTS, totaling approximately 160,000 hours of open-source data.

Our models are trained on 250,000 hours of speech data. Of this, 160,000 hours come from the open-source datasets mentioned above, while the remaining 90,000 hours are from internal datasets, which are not yet available for open-source release.

Use directly from Hugging Face

Codec: xcodec2 (please install the updated version: xcodec2==0.1.3; a usage sketch follows the model list below)

LLaMA-based TTS, 3B version: Llasa-3B

LLaMA-based TTS, 1B version: Llasa-1B

LLaMA-based TTS, 8B version: Llasa-8B (Coming Soon!)
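
For inference, the released checkpoints can be driven with the standard transformers API and the generated speech tokens decoded back to audio with xcodec2. The sketch below follows that pattern; the repo ids, the special-token names, the chat-style prompt, and the XCodec2Model/decode_code API are assumptions drawn from the xcodec2 package and the model cards rather than from this README, so check the model cards for authoritative usage.

```python
# Minimal text-to-speech sketch. Assumptions (not from this README):
# the HKUSTAudio/* repo ids, the XCodec2Model import path, the
# <|TEXT_UNDERSTANDING_*|> / <|SPEECH_GENERATION_*|> prompt tags, the
# <|s_N|> speech-token format, and the 16 kHz output rate.
import torch
import soundfile as sf
from transformers import AutoTokenizer, AutoModelForCausalLM
from xcodec2.modeling_xcodec2 import XCodec2Model  # assumed import path

tokenizer = AutoTokenizer.from_pretrained("HKUSTAudio/Llasa-3B")
model = AutoModelForCausalLM.from_pretrained("HKUSTAudio/Llasa-3B").eval()
codec = XCodec2Model.from_pretrained("HKUSTAudio/xcodec2").eval()

text = "Hello, this is a test of LLaSA speech synthesis."
chat = [
    {"role": "user",
     "content": "Convert the text to speech:"
                f"<|TEXT_UNDERSTANDING_START|>{text}<|TEXT_UNDERSTANDING_END|>"},
    {"role": "assistant", "content": "<|SPEECH_GENERATION_START|>"},
]
input_ids = tokenizer.apply_chat_template(
    chat, tokenize=True, return_tensors="pt", continue_final_message=True
)

with torch.no_grad():
    out = model.generate(
        input_ids,
        max_length=2048,
        do_sample=True,
        top_p=0.95,
        eos_token_id=tokenizer.convert_tokens_to_ids("<|SPEECH_GENERATION_END|>"),
    )

# Keep only the newly generated tokens and map each <|s_N|> token
# string back to its integer codec id N.
gen_tokens = tokenizer.convert_ids_to_tokens(out[0, input_ids.shape[1]:].tolist())
speech_ids = [int(t[4:-2]) for t in gen_tokens
              if t.startswith("<|s_") and t.endswith("|>")]

# Decode the codec ids back to a waveform; a (1, 1, T) layout is assumed.
codes = torch.tensor(speech_ids).unsqueeze(0).unsqueeze(0)
wav = codec.decode_code(codes)  # assumed xcodec2 decode entry point
sf.write("gen.wav", wav[0, 0, :].cpu().numpy(), 16000)
```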
