WIP Very rough cut of streaming from stdin. #1823

regularfry · 2024-02-01T21:17:19Z

This is a feature I've been wanting for a while but haven't seen anywhere else: streaming raw audio from stdin. It differs from the stdin support in main because it doesn't need to slurp the entire stream before processing it. That means you can, for instance, use ffmpeg to pipe audio from the network straight into stream-stdin without knowing how long the stream is going to be.

There are naturally a couple of trade-offs to this. Because it's raw audio, there's no metadata to tell it what the audio format is, so right now the input must be 16kHz mono pcm_s16le. The upside is that it doesn't need to be compiled against SDL, and it doesn't need to know anything about the wav file format, so dr_wav.h isn't needed either.

Given an input wav file, you might want to try it with a command like:

$ ffmpeg -i capture.wav -acodec pcm_s16le -f s16le -ac 1 -ar 16000 - | ./stream-stdin -m ./models/ggml-base.en.bin
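
For anyone unfamiliar with the raw format: pcm_s16le is a headerless stream of little-endian signed 16-bit samples, and a consumer typically scales them to floats in [-1, 1) before further processing. The following is a minimal sketch of that conversion, not code from this PR; the helper name `s16le_to_float` is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): decode raw pcm_s16le bytes into
// float samples in [-1, 1). A trailing odd byte, if any, is ignored.
std::vector<float> s16le_to_float(const unsigned char * bytes, std::size_t n_bytes) {
    std::vector<float> out;
    out.reserve(n_bytes / 2);
    for (std::size_t i = 0; i + 1 < n_bytes; i += 2) {
        // Assemble each sample byte by byte so the result does not depend
        // on the endianness of the host.
        const std::uint16_t u = static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        const std::int16_t  s = static_cast<std::int16_t>(u);
        out.push_back(s / 32768.0f);
    }
    return out;
}
```

At 16kHz mono, one second of audio in this format is 32000 bytes.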

Implementation-wise this is very rough: I've basically copied and pasted examples/stream.cpp, and written something that's got a similar enough interface to audio_async to not need too many changes. I'm very much aware that it's not exactly in a state where it would want to be merged, so what I would like is some indication either way as to whether it's worth my doing the refactoring work to share the streaming consumption code and get rid of the copy/paste duplication.

bobqianic · 2024-02-01T23:05:33Z

Welcome back @regularfry!

ggerganov

Very cool! This is a very useful example to have.

> I'm very much aware that it's not exactly in a state where it would want to be merged, so what I would like is some indication either way as to whether it's worth my doing the refactoring work to share the streaming consumption code and get rid of the copy/paste duplication.

Yes, it would be nice if you can figure out a way to reduce the copy-paste. If it turns out to be too complicated or too much effort, we can probably just merge it as it is, despite the copy-paste.

shanelenagh · 2024-02-06T19:18:09Z

I love the simplicity of this, and I use ffmpeg for many of my workflows (e.g., rtsp publishing of a USB condenser mic source in my daughter's nursery), so I can appreciate the flexibility and minimalism of this approach. I would love to find a way to abstract out some of the common code with the SDL stream example (stream.cpp could be largely unchanged, and audio_async is 90% the same as well), as I ran into the same thing with my gRPC PR; I think I could use the same abstraction interface to do my gRPC work more efficiently. But of course, those abstractions require time and effort to tease out. :-) Let me know if you would be open to a collaborator on that, @regularfry.

shanelenagh · 2024-02-15T04:29:05Z

Early on it felt a little bit like I was dodging lasers and stretching to find "common" code for these two versions (like the bad old days of OOP, where everyone was attempting to use inheritance where it didn't fit), but I think I have sensibly factored out and pushed up an abstract base class using templates (int16 vs float buffers, with float always being the "result" output, of course) that I use for both the common-sdl and audio-stdin versions. Now I just need to bring together the two versions of stream.cpp into one that has a CLI param for either "stdin" or the old/existing SDL source: shanelenagh@59a1906

With this, the SDL version contains mostly SDL-specific code, and the stdin version has this fairly short implementation:

audio_stdin::audio_stdin(int len_ms) : audio_async(len_ms) { }

audio_stdin::~audio_stdin() {
    // Nothing to do here, we don't own m_fd
}

bool audio_stdin::init(whisper_params params, int sample_rate) {
    audio_async::init(params, sample_rate);
    m_audio.resize((m_sample_rate*m_len_ms)/1000);

    return true;
}

void audio_stdin::get(int ms, std::vector<float> & result) {
    if (!m_running) {
        fprintf(stderr, "%s: not running!\n", __func__);
        return;
    }

    result.clear();

    {
        std::lock_guard<std::mutex> lock(m_mutex);

        if (ms <= 0) {
            ms = m_len_ms;
        }

        size_t n_samples = (m_sample_rate * ms) / 1000;

        assert(n_samples <= m_audio.size()/sizeof(int16_t));
        // stdin is PCM mono 16kHz in s16le format. Use ffmpeg to make that happen.
        // read() may return fewer bytes than requested.
        ssize_t nread = read(STDIN_FILENO, m_audio.data(), n_samples*sizeof(int16_t));
        if (nread <= 0) {
            // End of stream or read error: stop.
            m_running = false;
            return;
        }
        transfer_buffer(result, 0, nread / sizeof(int16_t));
    }
}
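
A note on the `read()` call above: on a pipe, POSIX `read()` may return fewer bytes than requested, possibly an odd number that splits a sample. Whether that matters here depends on how `transfer_buffer` and the caller treat short buffers, which the snippet doesn't show. If it does matter, one option is a small loop that keeps reading until the requested amount has arrived or the stream ends. The helper below is only a sketch with a made-up name (`read_fully`); it is not code from this branch.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: read up to `want` bytes from `fd`, looping over short
// reads. Returns the number of bytes actually read; this is less than `want`
// only on end of stream or on a read error.
std::size_t read_fully(int fd, void * buf, std::size_t want) {
    std::size_t got = 0;
    while (got < want) {
        const ssize_t n = read(fd, static_cast<char *>(buf) + got, want - got);
        if (n > 0) {
            got += static_cast<std::size_t>(n);
        } else if (n < 0 && errno == EINTR) {
            continue; // interrupted by a signal before any data arrived: retry
        } else {
            break;    // n == 0 is end of stream, n < 0 is a real error
        }
    }
    return got;
}
```

The trade-off is that the call blocks until the whole chunk has arrived or the stream ends, so a caller that wants partial results sooner would keep the single `read()`.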

bnolan · 2024-04-09T00:46:16Z

I've got this running on mac with this command:

sox -d -c1 -b16 -e signed -L -traw -r16000 - | ./stream-stdin

It doesn't work well (stream works perfectly); I'm trying a few options to see if I can get it running better.

main: processing 48000 samples (step = 3.0 sec / len = 10.0 sec / keep = 0.2 sec), 4 threads, lang = en, task = transcribe, timestamps = 0 ...
main: n_new_line = 2, no_context = 1

[Start speaking]
 [BLANK_AUDIO]
 the question
 Brown
 Box. Jump.
 over
 glaze, z dot,
 Oh.
 They Quick
 I can be around.
 Fox jump Folks jump
 Verducks jump, glaze jump.
 The docks jump. Vlogs jump.
 Next jump. Next jump.
 Quick jump. Brows jump.
 and fox jump Fox jump
 jumps jumps and works jump
 Overlooks jump. Lays jump.
 D docs jump
 The quick round. The quick round.
 for the quick the quick
 Jumped the quick. Go the quick.
 The quick. The late. The quick.
 Easy done. The quick. Oh, the quick.
In:0.00% 00:00:13.82 [00:00:00.00] Out:220k  [      |      ]        Clip:0    ^C
Aborted.

muety · 2024-06-08T04:21:20Z

I first came across @shanelenagh's awesome gRPC streaming example before I found this PR, which seems like an even simpler and more straightforward solution and would perfectly fit my needs as well. Thanks a lot! Would love to see this get its final polish and be merged. 🙌

openaudible · 2024-08-29T22:38:24Z

What's the status on this? I have an audiobook player that streams 2-channel pcm16s (16000 Hz) data to the audio device. It would be fun to send the data to stream-stdin and get transcription results.

What's the latest source code? https://github.com/regularfry/whisper.cpp/tree/stream-from-stdin

Unfortunately I'm a little rusty on C++ make.

Update: I was able to compile using "make stream-stdin" and had to add #include <unistd.h> to audio-stdin.cpp.

I converted an audio file to test.pcm (using the ffmpeg example in the readme) and can "cat test.pcm | ./stream-stdin" and it plays the first 3 words of the book. Then it stops without error and prints the whisper_print_timings.

So super cool. I would need it to support 2 channels and not quit, but otherwise it looks promising. Will keep playing with it and update this comment.

regularfry · 2024-08-30T13:42:38Z

I've not had an opportunity to loop back to it in a while, and it's going to be hideously behind master. That being said, I do still want to follow it through, and the idea of abstracting the streaming mechanism has to be right...

openaudible · 2024-08-30T16:32:55Z

Thanks for checking in. I thought I had updated my comment, but I was able to compile and get it to work. I just added:

#ifdef _WIN32
    _setmode(_fileno(stdin), _O_BINARY);
#endif
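
For context: on Windows, stdin opens in text mode by default, which translates CR-LF pairs and treats Ctrl-Z (0x1A) as end of file, so raw PCM bytes can be altered or cut short. A sketch of the same fix wrapped in a portable function, with the headers that `_setmode`, `_fileno` and `_O_BINARY` need; the function name `set_stdin_binary` is hypothetical.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h> // _O_BINARY
#include <io.h>    // _setmode, _fileno
#endif

// Hypothetical wrapper: switch stdin to binary mode where that distinction
// exists. Returns false only if the mode change fails on Windows.
bool set_stdin_binary() {
#ifdef _WIN32
    return _setmode(_fileno(stdin), _O_BINARY) != -1;
#else
    return true; // POSIX systems do not distinguish text and binary streams
#endif
}
```

It would need to be called once, before the first read from stdin.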

I'm adding a -ch 2 command line argument for two-channel data, which is what I'm sending to the speakers.
EDIT: After looking further, I'm sending a 44100 sample rate to the speaker, so I will need to downsample anyway, and mono is fine!

Will keep playing with your code.
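
For reference, if interleaved two-channel input ever did need handling in-process, the downmix itself is small. The sketch below assumes interleaved L/R 16-bit samples and uses a made-up function name; resampling from 44100 Hz to 16000 Hz is a separate step that is not shown (the `-ac 1 -ar 16000` options in the ffmpeg command from the PR description cover both).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: average interleaved stereo frames (L, R, L, R, ...)
// into mono. An unpaired trailing sample is dropped.
std::vector<std::int16_t> stereo_to_mono(const std::vector<std::int16_t> & interleaved) {
    std::vector<std::int16_t> mono;
    mono.reserve(interleaved.size() / 2);
    for (std::size_t i = 0; i + 1 < interleaved.size(); i += 2) {
        // Sum in int so that two large samples cannot overflow int16_t.
        const int sum = static_cast<int>(interleaved[i]) + static_cast<int>(interleaved[i + 1]);
        mono.push_back(static_cast<std::int16_t>(sum / 2));
    }
    return mono;
}
```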

muety · 2025-02-04T07:18:05Z

Any chance of getting this merged soon? 👀

regularfry added 2 commits February 1, 2024 20:58

Very rough cut of streaming from stdin.

aa9370f

Fix a bad command in the README.md

1a5cf60

bobqianic requested changes Feb 3, 2024

examples/CMakeLists.txt (outdated, resolved)

examples/stream-stdin/CMakeLists.txt (resolved)

bobqianic added 4 commits February 3, 2024 00:22

Update examples/CMakeLists.txt

fff48c0

Update examples/stream-stdin/CMakeLists.txt

64d54b4

Update CMakeLists.txt

a308108

Update CMakeLists.txt

ad14ed0

ggerganov reviewed Feb 5, 2024

ggerganov mentioned this pull request Feb 6, 2024

Bidirectional gRPC streaming (async) transcription #1833

Open
