The decoder swallows one or more words. Is this due to the training or the decoder itself? #2071

@BorisV113

Description

We are training the streaming model on our own dataset, which has been manually reviewed and corrected; the audio and transcripts are verified to roughly 99.9% accuracy.
For training we use the LibriSpeech-based script (egs/librispeech/ASR/zipformer/train.py) with all parameters at their defaults, training for 20 and for 30 epochs.
After decoding with the zipformer pretrained script (egs/librispeech/ASR/zipformer/pretrained.py), some words are missing from the output text. The missing words most often occur after longer pauses (more than 3 seconds), but not after every pause; it appears to happen quite randomly.
We also tried the sherpa-onnx decoder, and it makes the same errors.
Is this a problem with the training or with the two decoders?

What can we try? Do we need different training parameters, or extra techniques for training or decoding?
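One way to check whether the long pauses themselves trigger the deletions is to split each recording at pauses longer than 3 seconds and decode the chunks separately; if the missing words reappear, the pauses are the likely cause. Below is a minimal, hypothetical sketch of such a splitter using simple frame-energy silence detection (this helper is not part of icefall or sherpa-onnx, and the threshold would need tuning for real audio):

```python
import numpy as np

def split_on_long_pauses(samples, sample_rate, max_pause_s=3.0,
                         frame_s=0.025, energy_thresh=1e-4):
    """Split a mono float waveform into chunks at pauses longer than
    max_pause_s, using mean frame energy as a crude silence detector.

    Hypothetical diagnostic helper -- not an icefall/sherpa-onnx API.
    """
    frame_len = int(frame_s * sample_rate)
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    silent = (frames ** 2).mean(axis=1) < energy_thresh

    max_pause_frames = int(max_pause_s / frame_s)
    chunks, start, run = [], 0, 0
    for i, is_silent in enumerate(silent):
        if is_silent:
            run += 1
            if run == max_pause_frames:
                # Close the current chunk just before the long pause begins.
                end = (i - run + 1) * frame_len
                if end > start:
                    chunks.append(samples[start:end])
        else:
            if run >= max_pause_frames:
                # Resume a new chunk after the long pause ends.
                start = i * frame_len
            run = 0
    chunks.append(samples[start:])  # trailing chunk
    return chunks
```

Each chunk can then be passed to the decoder independently; comparing the concatenated chunk transcripts against the full-utterance transcript isolates the effect of the pauses.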
