LLaSA_training

LLaSA: Scaling Train-time and Test-time Compute for LLaMA-based Speech Synthesis (Coming Soon!)

Training

torchrun --nproc_per_node=8 train_tts.py config.json 

or

sbatch run_slurm.sh
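
The repository's run_slurm.sh is not reproduced here; as a rough sketch, such a script would wrap the same torchrun command in a Slurm batch job. The job name, GPU request, and time limit below are placeholders, not values from the repo:

```bash
#!/bin/bash
#SBATCH --job-name=llasa_tts   # placeholder job name
#SBATCH --nodes=1              # single node with 8 GPUs, matching the torchrun example
#SBATCH --gres=gpu:8           # GPU request; adjust to your cluster
#SBATCH --time=72:00:00        # placeholder time limit

# Same launch command as the single-node example above.
torchrun --nproc_per_node=8 train_tts.py config.json
```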

Data

You can download tokenized open-source speech data here. This includes LibriHeavy, Emilia (in both Chinese and English), and WenetSpeech4TTS, totaling approximately 160,000 hours of open-source data.

Our models are trained on 250,000 hours of speech data. Of this, 160,000 hours come from the open-source datasets mentioned above, while the remaining 90,000 hours are from internal datasets, which are not yet available for open-source release.

Use directly from Hugging Face

Codec: xcodec2 (please install the updated version: xcodec2==0.1.3; a usage sketch follows the model list below)

LLaMA-based TTS, 3B version: Llasa-3B

LLaMA-based TTS, 1B version: Llasa-1B

LLaMA-based TTS, 8B version: Llasa-8B (Coming Soon!)
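
For inference, the released checkpoints can be driven with the standard transformers API and the generated speech tokens decoded back to audio with xcodec2. The sketch below follows that pattern; the repo ids, the special-token names, the chat-style prompt, and the XCodec2Model/decode_code API are assumptions drawn from the xcodec2 package and the model cards rather than from this README, so check the model cards for authoritative usage.

```python
# Minimal text-to-speech sketch. Assumptions (not from this README):
# the HKUSTAudio/* repo ids, the XCodec2Model import path, the
# <|TEXT_UNDERSTANDING_*|> / <|SPEECH_GENERATION_*|> prompt tags, the
# <|s_N|> speech-token format, and the 16 kHz output rate.
import torch
import soundfile as sf
from transformers import AutoTokenizer, AutoModelForCausalLM
from xcodec2.modeling_xcodec2 import XCodec2Model  # assumed import path

tokenizer = AutoTokenizer.from_pretrained("HKUSTAudio/Llasa-3B")
model = AutoModelForCausalLM.from_pretrained("HKUSTAudio/Llasa-3B").eval()
codec = XCodec2Model.from_pretrained("HKUSTAudio/xcodec2").eval()

text = "Hello, this is a test of LLaSA speech synthesis."
chat = [
    {"role": "user",
     "content": "Convert the text to speech:"
                f"<|TEXT_UNDERSTANDING_START|>{text}<|TEXT_UNDERSTANDING_END|>"},
    {"role": "assistant", "content": "<|SPEECH_GENERATION_START|>"},
]
input_ids = tokenizer.apply_chat_template(
    chat, tokenize=True, return_tensors="pt", continue_final_message=True
)

with torch.no_grad():
    out = model.generate(
        input_ids,
        max_length=2048,
        do_sample=True,
        top_p=0.95,
        eos_token_id=tokenizer.convert_tokens_to_ids("<|SPEECH_GENERATION_END|>"),
    )

# Keep only the newly generated tokens and map each <|s_N|> token
# string back to its integer codec id N.
gen_tokens = tokenizer.convert_ids_to_tokens(out[0, input_ids.shape[1]:].tolist())
speech_ids = [int(t[4:-2]) for t in gen_tokens
              if t.startswith("<|s_") and t.endswith("|>")]

# Decode the codec ids back to a waveform; a (1, 1, T) layout is assumed.
codes = torch.tensor(speech_ids).unsqueeze(0).unsqueeze(0)
wav = codec.decode_code(codes)  # assumed xcodec2 decode entry point
sf.write("gen.wav", wav[0, 0, :].cpu().numpy(), 16000)
```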
