Option --translate doesn't work in most places. And when it works, there is no info about it. #690

pcbua · 2023-03-30T00:53:13Z

Hi,
I'm using --translate (translate from the source language to English), and sometimes the translation is done.
But many times the translation is not done: I see only a note about the language, or the text appears in the original language.
Also, the results do not indicate whether translation was applied; having this information would be very helpful.

I'm running Whisper CPP on several long recordings (20 min to 2 h) with default settings and 15 threads (-p 1 -t 15).
I've tried the multilingual models: base, small, medium, large.
I'm using WAV files, not streaming.

Examples:
model=small:
[00:32:12.000 --> 00:32:15.000] (speaking in Spanish)
[00:33:18.000 --> 00:33:27.000] (speaking in Spanish)
[00:33:28.000 --> 00:33:31.000] (speaking in Spanish)
[00:33:31.000 --> 00:33:45.000] (speaking in Spanish)
[00:33:45.000 --> 00:33:48.000] (speaking in Spanish)

model=small, --max-len 1
[00:01:32.000 --> 00:01:32.580] Much
[00:01:32.580 --> 00:01:32.840] os
[00:01:32.840 --> 00:01:33.850] gracias
[00:01:33.850 --> 00:01:34.130] ,
[00:01:34.130 --> 00:01:34.950] amigos
[00:01:34.950 --> 00:01:35.000] .
[00:01:35.000 --> 00:01:35.160]
[00:01:35.160 --> 00:01:35.570] Hola
[00:01:35.570 --> 00:01:35.870] ,
[00:01:35.870 --> 00:01:36.560] amigo
[00:01:36.560 --> 00:01:37.000] .
[00:01:37.000 --> 00:01:37.000] ¿
[00:01:37.000 --> 00:01:37.000] Cómo
[00:01:37.000 --> 00:01:37.000] estás
[00:01:37.000 --> 00:01:37.000] ?
[00:01:37.000 --> 00:01:37.160] Ma
[00:01:37.160 --> 00:01:37.650] ñana
[00:01:37.650 --> 00:01:37.770] ir
[00:01:37.770 --> 00:01:37.940] é
[00:01:37.940 --> 00:01:38.020] a
[00:01:38.020 --> 00:01:38.190] la
[00:01:38.190 --> 00:01:38.710] ciudad
[00:01:38.710 --> 00:01:39.030] .

model=medium
[00:02:36.480 --> 00:02:37.960] [SPEAKING SPANISH]
[00:02:37.960 --> 00:02:40.960] [SPEAKING SPANISH]
[00:02:40.960 --> 00:02:41.960] [SPEAKING SPANISH]
[00:02:41.960 --> 00:02:42.960] [SPEAKING SPANISH]

model=large, --max-len 1
[00:01:32.000 --> 00:01:34.010] [
[00:01:34.010 --> 00:01:36.000] S
[00:01:36.000 --> 00:01:42.010] pan
[00:01:42.010 --> 00:01:47.980] ish
[00:01:47.980 --> 00:01:50.000] ]

I see a lot of similar cases.
Love Whisper CPP,
Piotr
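
Since the output gives no indication of whether a segment was actually translated, one workaround is to scan it for the placeholder lines shown in the examples above. The following is only a rough sketch in Python, not part of Whisper CPP: the function name `find_placeholder_segments` and both regular expressions are assumptions made for illustration, and they cover only the "(speaking in Spanish)" and "[SPEAKING SPANISH]" forms from this post.

```python
import re

# Timestamped line as printed in the examples above:
# "[00:32:12.000 --> 00:32:15.000] some text"
SEGMENT_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\]\s*(.*)$"
)

# Assumption: a placeholder is a whole segment of the form
# "(speaking in <language>)" or "[SPEAKING <LANGUAGE>]".
PLACEHOLDER_RE = re.compile(r"^[\(\[]\s*speaking\b.*[\)\]]$", re.IGNORECASE)


def find_placeholder_segments(output):
    """Return (start, end, text) for segments that hold only a language placeholder."""
    hits = []
    for line in output.splitlines():
        match = SEGMENT_RE.match(line.strip())
        if not match:
            continue  # not a timestamped segment line
        start, end, text = match.groups()
        text = text.strip()
        if PLACEHOLDER_RE.match(text):
            hits.append((start, end, text))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "[00:32:12.000 --> 00:32:15.000] (speaking in Spanish)",
        "[00:02:36.480 --> 00:02:37.960] [SPEAKING SPANISH]",
        "[00:01:32.000 --> 00:01:35.000] Thank you very much, friends.",
    ])
    for start, end, text in find_placeholder_segments(sample):
        print(f"{start} --> {end}: {text}")
```

This does not catch the --max-len 1 cases above, where the placeholder is split across several one-token segments, nor segments that were left in the original language; those would need a different check.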

jhormigo · 2023-04-04T20:13:52Z

For me it worked with the large model.

You'll also want to add the -l es flag to avoid the default translation behavior.
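
To make that suggestion concrete, here is a small Python sketch that assembles a command line combining -l es with --translate and the thread settings from the original post. It is illustrative only: the helper name `build_whisper_command`, the binary path `./main`, and the file names are assumptions, so adjust them to your own build and files.

```python
import shlex


def build_whisper_command(wav_path, model_path, language="es", translate=True,
                          threads=15, processors=1, binary="./main"):
    """Build an argument list for a Whisper CPP run.

    The binary path is an assumption; the flags are the ones mentioned
    in this thread (-l, --translate, -t, -p) plus -m and -f for the
    model and input file.
    """
    cmd = [
        binary,
        "-m", model_path,       # model file
        "-f", wav_path,         # input WAV file
        "-l", language,         # spoken (source) language, e.g. "es"
        "-t", str(threads),     # number of threads
        "-p", str(processors),  # number of processors
    ]
    if translate:
        cmd.append("--translate")  # translate from the source language to English
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_whisper_command("recording.wav", "models/ggml-large.bin")
    print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd)` as-is; keeping it as a list avoids shell quoting problems with file names that contain spaces.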
