We are training a streaming model on our own dataset, which has been manually reviewed and corrected; we estimate the audio and transcripts are about 99.9% accurate.
For training, we use the script from the LibriSpeech recipe (egs/librispeech/ASR/zipformer/train.py).
All parameters are left at their defaults, and we have trained for both 20 and 30 epochs.
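For reference, our training invocation follows the recipe's documented streaming setup and looks roughly like the sketch below. The exp dir, world size, and max duration are placeholders for our values; `--causal 1` selects the streaming (causal) variant, and everything else is left at its default:

```shell
# Assumed invocation, run from egs/librispeech/ASR; flag values are placeholders.
./zipformer/train.py \
  --world-size 4 \
  --num-epochs 30 \
  --start-epoch 1 \
  --use-fp16 1 \
  --causal 1 \
  --exp-dir zipformer/exp-causal \
  --max-duration 1000
```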
When we decode with the pretrained script (egs/librispeech/ASR/zipformer/pretrained.py), some words are missing from the output. The deletions most often occur after long pauses (longer than about 3 seconds), but not after every such pause; it appears to happen quite randomly.
We also tried the sherpa-onnx decoder, and it makes the same errors.
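The sherpa-onnx run used our exported streaming model with the command-line decoder, roughly as below; the model and token file names are placeholders for our exported files:

```shell
# Assumed invocation of the sherpa-onnx streaming decoder; file names are placeholders.
sherpa-onnx \
  --tokens=tokens.txt \
  --encoder=encoder.onnx \
  --decoder=decoder.onnx \
  --joiner=joiner.onnx \
  test.wav
```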
Is this a problem with the training or with the two decoders?
What can we try? Do we need different training parameters, or additional methods for training or decoding?