aishell1 zipformer recipe does not work #2069
Open
Description
According to the README, the training command is:
export CUDA_VISIBLE_DEVICES="0,1"
./zipformer/train.py \
  --world-size 2 \
  --num-epochs 60 \
  --start-epoch 1 \
  --use-fp16 1 \
  --context-size 1 \
  --enable-musan 0 \
  --exp-dir zipformer/exp \
  --max-duration 500 \
  --base-lr 0.045 \
  --lr-batches 7500 \
  --lr-epochs 18 \
  --spec-aug-time-warp-factor 20 \
  --use-ctc 1 \
  --use-cr-ctc 1 \
  --use-transducer 0 \
  --enable-spec-aug 0 \
  --cr-loss-scale 0.2

However, even after I decreased base-lr to 0.02, the following error still appeared:
========================= NOTE =========================
If you see this error, it means that the gradient scale is too small.
The default base_lr is 0.045 / 0.05 (depends on which recipe you are
using), this is an empirical value obtained mostly using 4 * 32GB V100
GPUs with a max_duration of approx. 1,000.
The proper value of base_lr may vary depending on the number of GPUs
and the value of max-duration you are using.
To fix this issue, you may need to adjust the value of base_lr accordingly.
We would suggest you to decrease the value of base_lr by 0.005 (e.g.,
from 0.045 to 0.04), and try again. If the error still exists, you may
repeat the process until base_lr hits 0.02. (Note that this will lead to
certain loss of performance, but it should work. You can compensate this by
increasing the num_epochs.)
If the error still exists, you could try to seek help by raising an issue,
with a detailed description of (a) your computational resources, (b) the
base_lr and (c) the max_duration you are using, (d) detailed configuration
of your model.
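
To make the note's advice concrete, here is a minimal Python sketch (not part of icefall; the helper names `base_lr_candidates` and `audio_seconds_per_step` are hypothetical) that lists the base_lr values the note suggests trying and compares the amount of audio per optimizer step in this run against the reference setup the note mentions. It assumes max-duration is applied per GPU, so the total per step is world-size times max-duration.

```python
def base_lr_candidates(start=0.045, floor=0.02, step=0.005):
    """Return the base_lr values to try, stepping down from start to floor."""
    n_steps = round((start - floor) / step)
    # Round each value to avoid floating-point noise such as 0.039999999.
    return [round(start - i * step, 3) for i in range(n_steps + 1)]


def audio_seconds_per_step(world_size, max_duration):
    """Total seconds of audio per optimizer step.

    Assumption: max_duration is a per-GPU limit, so the total scales
    with the number of GPUs (world_size).
    """
    return world_size * max_duration


candidates = base_lr_candidates()
print("base_lr values to try:", candidates)

# Reference setup from the note: 4 GPUs, max_duration of approx. 1,000.
reference = audio_seconds_per_step(world_size=4, max_duration=1000)
# Setup used in this issue: 2 GPUs, max_duration 500.
mine = audio_seconds_per_step(world_size=2, max_duration=500)
ratio = mine / reference
print(f"audio per step: {mine} s vs. reference {reference} s (ratio {ratio})")
```

Under that assumption, this run processes about a quarter of the audio per step that the reference setup does, which may be relevant when choosing base_lr.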