Some speech zone with low-volume or high distortion are not detected #45
-
Hi there! I tried to use whisper_timestamped for better time stamping. My test samples are the premade voice lines recorded in one .wav file separated by 2 seconds. For me it looked like it should be an easy task for a Whisper. After it failed I moved to whisper_timestamped. I attach the results I got. The blue graph is the waveform of my audio, while red ranges are the timelines provided by whisper_timestamped. Most frequent errors I get are:
So, while overall performance is good, I would like to ask if I can do something to improve performance. I tried allowing refining whisper model up to 5 sec. |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 3 replies
-
Can you please describe more precisely what is the problem:
If you are using the trainscribe() function in python you can :
|
Beta Was this translation helpful? Give feedback.
Can you please describe more precisely what is the problem:
If you are using the trainscribe() function in python you can :
beam_size=5, best_of=5, temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
(for cases where some phrases are missing from the transcription)trust_whisper_timestamps = False
(then no need to tunerefine_whisper_precision
, it will just recompute all timestamps)