Implement -float-stream option for whisper-cli #2741

Open
haraldrudell opened this issue Jan 14, 2025 · 0 comments

Suggestion: implement a -float-stream option for whisper-cli, which simplifies ingestion and enables infinite streams.

The simplest ingestion path is raw PCM samples in a fixed, conventional format, read from standard input:

ffmpeg -hide_banner -i i.mp4 -ac 1 -ar 16k -f f32le - |
whisper-cli -float-stream --model ggml-large-v3-turbo.bin… | myTextConsumer

A raw PCM stream is easily produced by upstream software, which enables mixing, file sequencing and infinite streams:

  • the upstream software effectively acts as a media player
  • avoids the 32-bit size field in the WAV header, which caps file length at about 30 hours
  • eliminates sample caching on local storage by ffmpeg and the use of intermediate files
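
The file-sequencing case can be sketched as a small upstream driver. This is only an illustration in Python, not part of whisper-cli: the names `ffmpeg_args` and `stream_files` are hypothetical, and the ffmpeg arguments are copied from the pipeline above. Each file is decoded in turn and its raw samples are written to the same output, so the consumer sees one continuous stream.

```python
import subprocess
from typing import BinaryIO, Iterable


def ffmpeg_args(path: str) -> list[str]:
    """Build the ffmpeg command that decodes `path` to mono 16 kHz f32le on stdout."""
    return ["ffmpeg", "-hide_banner", "-i", path,
            "-ac", "1", "-ar", "16k", "-f", "f32le", "-"]


def stream_files(paths: Iterable[str], out: BinaryIO) -> None:
    """Decode each file in order, appending its raw samples to `out`.

    `out` must be backed by a real file descriptor (for example
    sys.stdout.buffer), because ffmpeg writes to it directly.
    """
    for path in paths:
        subprocess.run(
            ffmpeg_args(path),
            stdin=subprocess.DEVNULL,  # keep ffmpeg from consuming our stdin
            stdout=out,
            check=True,                # stop the sequence if a file fails to decode
        )
```

Calling `stream_files(["a.mp4", "b.mp4"], sys.stdout.buffer)` from a script and piping that script into the proposed `whisper-cli -float-stream` would then transcribe both files as one stream; the file names here are placeholders.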


For a 5 h file, ffmpeg delivers the first samples within 3 s; producing a WAV file with headers for the same input otherwise takes about 10 minutes.


whisper-cli ingests the stream by reading 30 s of samples at a time until EOF.
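
A minimal sketch of that read loop, written in Python for brevity (whisper-cli itself is C++, so a real implementation would look different). The function name `read_chunks` and its parameters are hypothetical. It assumes mono 32-bit float samples at 16 kHz, keeps reading until a full 30 s chunk is available or EOF is reached, and drops any trailing bytes that do not form a whole sample.

```python
import io
from typing import BinaryIO, Iterator

SAMPLE_RATE = 16_000   # Hz, fixed by the proposal
BYTES_PER_SAMPLE = 4   # 32-bit float
CHUNK_SECONDS = 30     # amount read per iteration


def read_chunks(stream: BinaryIO,
                seconds: int = CHUNK_SECONDS,
                sample_rate: int = SAMPLE_RATE) -> Iterator[bytes]:
    """Yield raw sample chunks of up to `seconds` of audio until EOF."""
    chunk_bytes = seconds * sample_rate * BYTES_PER_SAMPLE
    while True:
        buf = bytearray()
        # A pipe may return fewer bytes than requested, so loop until the
        # chunk is full or the stream signals EOF with an empty read.
        while len(buf) < chunk_bytes:
            part = stream.read(chunk_bytes - len(buf))
            if not part:
                break
            buf += part
        # Only pass on whole samples; a trailing partial sample is dropped.
        usable = len(buf) - len(buf) % BYTES_PER_SAMPLE
        if usable:
            yield bytes(buf[:usable])
        if len(buf) < chunk_bytes:
            return  # EOF reached


# Tiny demo: an in-memory stream stands in for standard input.
demo = io.BytesIO(bytes(40))
print([len(c) for c in read_chunks(demo, seconds=1, sample_rate=4)])  # prints [16, 16, 8]
```

With a real pipe the loop would be driven by `read_chunks(sys.stdin.buffer)`; the final chunk is simply shorter than 30 s.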

The stream format is always 32-bit float, little-endian, 16 kHz:

  • keeps whisper-cli as simple as possible
  • whisper-cli swaps bytes on the rare big-endian hardware
  • little-endian is the typical WAV format
  • optional: multiple channels for diarization
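
The decoding side of these points can be sketched as follows, again in Python as an illustration rather than the actual whisper-cli code. `decode_f32le` and `split_channels` are hypothetical names. The byte swap only runs on big-endian hosts, and the channel split assumes interleaved samples, which is how raw multi-channel PCM is normally laid out; the issue itself does not specify the layout.

```python
import struct
import sys
from array import array


def decode_f32le(chunk: bytes) -> array:
    """Convert little-endian 32-bit float bytes into native float samples."""
    samples = array("f")        # 4-byte floats on common platforms
    samples.frombytes(chunk)    # raises ValueError on a partial sample
    if sys.byteorder == "big":
        samples.byteswap()      # input is little-endian; flip on big-endian hosts
    return samples


def split_channels(samples: array, channels: int) -> list[array]:
    """Split interleaved samples into one array per channel (assumed layout)."""
    return [samples[c::channels] for c in range(channels)]


# Demo with four samples, treated as two interleaved channels.
raw = struct.pack("<4f", 0.5, -1.0, 0.25, 2.0)
left, right = split_channels(decode_f32le(raw), channels=2)
print(list(left), list(right))  # prints [0.5, 0.25] [-1.0, 2.0]
```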


ffmpeg remains unavoidable upstream, since it is what allows any input file format to be converted to the specific sample format.
