
Training on a single speaker's voice always fails, what is the cause? #1061

Open
isnew1234 opened this issue Mar 11, 2025 · 1 comment

Comments

@isnew1234

isnew1234 commented Mar 11, 2025

Pretrained model CosyVoice2-0.5B, single-speaker audio. I have trained many times, and the best CV performance always comes at the 2nd epoch. Switching to different data gives the same result.
Training on a single Mac Pro, CPU only: the 2nd epoch is best.
Later I trained on Google Colab with the same outcome: from the 3rd epoch onward, the training loss keeps falling while the validation loss keeps rising.

For example, in this run:
Training set: 724 files in total, total duration 1 h 56 min 59.36 s
Validation set: 181 files in total, total duration 0 h 29 min 16.13 s
The training and validation sets were randomly split from 900 audio clips.
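The pattern described above (training loss falling while CV loss rises after the 2nd epoch) is the classic overfitting signature on a small single-speaker set. A minimal sketch of early stopping on the validation loss, which would halt the run around the point where CV stops improving (illustrative only, not CosyVoice's trainer API):

```python
def should_stop(cv_losses, patience=2):
    """Return True once the CV loss has failed to improve
    for `patience` consecutive evaluations."""
    best = float("inf")
    since_best = 0
    for loss in cv_losses:
        if loss < best:
            best = loss
            since_best = 0
        else:
            since_best += 1
        if since_best >= patience:
            return True
    return False

# CV losses reported later in this issue, epochs 0..6
cv = [4.0663, 4.0458, 4.1128, 4.2530, 4.5165, 4.7581, 5.3815]
print(should_stop(cv))  # True: no improvement after epoch 1
```

With `patience=2`, this run would have stopped after epoch 3 instead of continuing to epoch 6, where the CV loss is already more than a full point worse than the best checkpoint.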

The config settings are roughly as follows:
train_conf:
    optim: adam
    optim_conf:
        lr: 1e-5 # change to 1e-5 during sft
    scheduler: constantlr # change to constantlr during sft
    scheduler_conf:
        warmup_steps: 300
    max_epoch: 10
    grad_clip: 2
    accum_grad: 2
    log_interval: 50
    save_per_step: -1
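One detail worth keeping in mind when reading the step counts below: with accum_grad: 2, gradients from two micro-batches are combined before each optimizer update, which is why the log reports "new batch size is 2 times larger than before". A generic sketch of the mechanism, with a plain float standing in for a gradient tensor (illustrative, not CosyVoice's executor code):

```python
def train_epoch(micro_batch_grads, accum_grad=2):
    """Accumulate gradients over `accum_grad` micro-batches,
    then take one optimizer step. Returns the step count."""
    steps = 0
    grad = 0.0
    for i, g in enumerate(micro_batch_grads):
        grad += g / accum_grad          # scale so the sum is a mean
        if (i + 1) % accum_grad == 0:
            steps += 1                  # optimizer.step()
            grad = 0.0                  # optimizer.zero_grad()
    return steps

print(train_epoch([0.1] * 8))  # 4 optimizer steps for 8 micro-batches
```

This halves the number of optimizer steps per epoch relative to the batch count, so "Step 203" at the end of epoch 0 corresponds to roughly 406 micro-batches.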

Partial log:
2025-03-10 09:06:17,280 INFO training on multiple gpus, this gpu 0, rank 0, world_size 1
2025-03-10 09:06:52,889 INFO [Rank 0] Checkpoint: save to checkpoint /content/CosyVoice/examples/libritts/cosyvoice2/exp/cosyvoice2/llm/torch_ddp/init.pt
start step 0 start epoch -1
2025-03-10 09:06:52,893 INFO Epoch 0 TRAIN info lr 1e-05 rank 0
2025-03-10 09:06:52,893 INFO using accumulate grad, new batch size is 2 times larger than before
iterations to have unused parameters. (function operator())
2025-03-10 09:07:45,188 DEBUG TRAIN Batch 0/100 loss 2.229862 acc 0.135678 lr 0.00001000 grad_norm 3.552090 rank 0
2025-03-10 09:08:11,549 DEBUG TRAIN Batch 0/200 loss 1.861251 acc 0.210526 lr 0.00001000 grad_norm 3.053562 rank 0
2025-03-10 09:08:38,172 DEBUG TRAIN Batch 0/300 loss 1.903610 acc 0.160396 lr 0.00001000 grad_norm 2.526870 rank 0
2025-03-10 09:09:08,649 DEBUG TRAIN Batch 0/400 loss 2.031664 acc 0.145357 lr 0.00001000 grad_norm 2.179606 rank 0
2025-03-10 09:09:10,200 INFO Epoch 0 Step 203 on_batch_end True CV rank 0
2025-03-10 09:09:37,698 DEBUG CV Batch 0/100 loss 3.654108 acc 0.161337 rank 0
2025-03-10 09:09:38,796 INFO Epoch 0 Step 203 CV info lr 1e-05 0 rank loss 4.066304985330908 acc 0.14319655109834933
2025-03-10 09:09:46,884 INFO [Rank 0] Checkpoint: save to checkpoint /content/CosyVoice/examples/libritts/cosyvoice2/exp/cosyvoice2/llm/torch_ddp/epoch_0_whole.pt
2025-03-10 09:10:36,655 DEBUG TRAIN Batch 1/100 loss 2.029551 acc 0.139842 lr 0.00001000 grad_norm 3.222941 rank 0
2025-03-10 09:11:03,631 DEBUG TRAIN Batch 1/200 loss 1.926054 acc 0.146417 lr 0.00001000 grad_norm 5.332690 rank 0
2025-03-10 09:11:30,268 DEBUG TRAIN Batch 1/300 loss 2.058213 acc 0.138947 lr 0.00001000 grad_norm 2.970304 rank 0
2025-03-10 09:12:00,848 DEBUG TRAIN Batch 1/400 loss 2.029423 acc 0.134094 lr 0.00001000 grad_norm 2.306992 rank 0
2025-03-10 09:12:02,675 INFO Epoch 1 Step 405 on_batch_end True CV rank 0
2025-03-10 09:12:28,417 DEBUG CV Batch 1/100 loss 3.617022 acc 0.162791 rank 0
2025-03-10 09:12:29,520 INFO Epoch 1 Step 405 CV info lr 1e-05 0 rank loss 4.045755437724498 acc 0.1469015193002
2025-03-10 09:12:45,271 INFO [Rank 0] Checkpoint: save to checkpoint /content/CosyVoice/examples/libritts/cosyvoice2/exp/cosyvoice2/llm/torch_ddp/epoch_1_whole.pt
2025-03-10 09:12:45,290 INFO Epoch 2 TRAIN info lr 1e-05 rank 0

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2025-03-10 09:13:35,745 DEBUG TRAIN Batch 2/100 loss 1.855622 acc 0.197115 lr 0.00001000 grad_norm 3.473299 rank 0
2025-03-10 09:14:02,876 DEBUG TRAIN Batch 2/200 loss 1.696587 acc 0.177778 lr 0.00001000 grad_norm 4.587659 rank 0
2025-03-10 09:14:29,606 DEBUG TRAIN Batch 2/300 loss 1.496050 acc 0.247082 lr 0.00001000 grad_norm 3.525435 rank 0
2025-03-10 09:15:00,061 DEBUG TRAIN Batch 2/400 loss 1.872693 acc 0.168975 lr 0.00001000 grad_norm 3.511331 rank 0
2025-03-10 09:15:02,284 INFO Epoch 2 Step 608 on_batch_end True CV rank 0
2025-03-10 09:15:29,138 DEBUG CV Batch 2/100 loss 3.668330 acc 0.145349 rank 0
2025-03-10 09:15:30,292 INFO Epoch 2 Step 608 CV info lr 1e-05 0 rank loss 4.112808653004262 acc

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2025-03-10 09:16:32,691 DEBUG TRAIN Batch 3/100 loss 1.780373 acc 0.201531 lr 0.00001000 grad_norm 5.435814 rank 0
2025-03-10 09:16:47,219 WARNING get infinite grad_norm, check your code/data if it appears frequently
2025-03-10 09:16:59,619 DEBUG TRAIN Batch 3/200 loss 1.673165 acc 0.212766 lr 0.00001000 grad_norm 7.111817 rank 0
2025-03-10 09:17:26,375 DEBUG TRAIN Batch 3/300 loss 1.754496 acc 0.210953 lr 0.00001000 grad_norm 6.716798 rank 0
2025-03-10 09:17:57,078 DEBUG TRAIN Batch 3/400 loss 2.292027 acc 0.129323 lr 0.00001000 grad_norm 5.269428 rank 0
2025-03-10 09:18:25,056 DEBUG CV Batch 3/100 loss 3.691987 acc 0.171388 rank 0
2025-03-10 09:18:26,186 INFO Epoch 3 Step 810 CV info lr 1e-05 0 rank loss 4.252981220161058 acc 0.13736484442626573

do model average and final checkpoint is /content/CosyVoice/examples/libritts/cosyvoice2/exp/cosyvoice2/llm/torch_ddp/llm.pt
Namespace(dst_model='/content/CosyVoice/examples/libritts/cosyvoice2/exp/cosyvoice2/llm/torch_ddp/llm.pt', src_path='/content/CosyVoice/examples/libritts/cosyvoice2/exp/cosyvoice2/llm/torch_ddp', val_best=True, num=7)
best val (epoch, step, loss, tag) = [[1, 404, 4.045755437724498, 'CV'], [0, 202, 4.066304985330908, 'CV'], [2, 607, 4.112808653004262, 'CV'], [3, 809, 4.252981220161058, 'CV'], [4, 1011, 4.516476654874686, 'CV'], [5, 1213, 4.7580557422743315, 'CV'], [6, 1415, 5.3814836127981955, 'CV']] The 2nd epoch has the best CV performance.
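The model-average step above selects the num=7 checkpoints with the lowest CV loss (val_best=True) and averages their weights; with only 7 checkpoints saved, all of them, including the heavily overfit later epochs, go into the average. A minimal sketch of that kind of parameter averaging, with plain floats standing in for state_dict tensors (illustrative, not CosyVoice's average_model script):

```python
def average_checkpoints(state_dicts):
    """Average parameters across checkpoints, key by key.
    Plain floats stand in for real parameter tensors."""
    n = len(state_dicts)
    return {key: sum(sd[key] for sd in state_dicts) / n
            for key in state_dicts[0]}

# Three hypothetical checkpoints with a single weight each
ckpts = [{"w": 1.0}, {"w": 2.0}, {"w": 3.0}]
print(average_checkpoints(ckpts))  # {'w': 2.0}
```

Averaging only the top 2-3 checkpoints by CV loss (i.e. a smaller num) would keep the overfit epoch 4-6 weights out of the final llm.pt.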

An audio example from the test set:
归档.zip

@JohnHerry

The config file says the learning rate and the lr scheduler should be adjusted for SFT training. I don't see you adjusting them? The comments are right there at the end of those lines.
