aishell1 zipformer recipe does not work #2069

@menggedu

According to the README, the training command is:

export CUDA_VISIBLE_DEVICES="0,1"
./zipformer/train.py \
  --world-size 2 \
  --num-epochs 60 \
  --start-epoch 1 \
  --use-fp16 1 \
  --context-size 1 \
  --enable-musan 0 \
  --exp-dir zipformer/exp \
  --max-duration 500 \
  --base-lr 0.045 \
  --lr-batches 7500 \
  --lr-epochs 18 \
  --spec-aug-time-warp-factor 20 \
  --use-ctc 1 \
  --use-cr-ctc 1 \
  --use-transducer 0 \
  --enable-spec-aug 0 \
  --cr-loss-scale 0.2

However, even after I decreased base-lr to 0.02, the following error still appeared:

========================= NOTE =========================
If you see this error, it means that the gradient scale is too small.

    The default base_lr is 0.045 / 0.05 (depends on which recipe you are 
    using), this is an empirical value obtained mostly using 4 * 32GB V100 
    GPUs with a max_duration of approx. 1,000. 
    The proper value of base_lr may vary depending on the number of GPUs 
    and the value of max-duration you are using. 

    To fix this issue, you may need to adjust the value of base_lr accordingly.

    We would suggest you to decrease the value of base_lr by 0.005 (e.g., 
    from 0.045 to 0.04), and try again. If the error still exists, you may 
    repeat the process until base_lr hits 0.02. (Note that this will lead to 
    certain loss of performance, but it should work. You can compensate this by
    increasing the num_epochs.)
    
    If the error still exists, you could try to seek help by raising an issue, 
    with a detailed description of (a) your computational resources, (b) the 
    base_lr and (c) the max_duration you are using, (d) detailed configuration 
    of your model.
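
For reference, the retry procedure the note describes (lower base_lr by 0.005 per attempt until it reaches 0.02) can be sketched in Python. This is only an illustration: the helper name `base_lr_candidates` is hypothetical and is not part of icefall; it just lists the values one would pass to `--base-lr` in turn.

```python
from decimal import Decimal


def base_lr_candidates(start="0.045", floor="0.02", step="0.005"):
    """List base_lr values from `start` down to `floor` (inclusive).

    Hypothetical helper, not an icefall API. Decimal is used so that
    repeated subtraction does not accumulate floating-point error.
    """
    lr, lowest, delta = Decimal(start), Decimal(floor), Decimal(step)
    values = []
    while lr >= lowest:
        values.append(float(lr))
        lr -= delta
    return values


if __name__ == "__main__":
    # Print one candidate flag per retry, highest learning rate first.
    for lr in base_lr_candidates():
        print(f"--base-lr {lr}")
```

With the defaults above, the last candidate is 0.02, which is the value I have already tried without success.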
