feat(Whisper-alfa): Add Whisper processor #399

Merged
n0th1ng-else merged 2 commits into master from offline on Sep 8, 2024

Conversation

n0th1ng-else
Owner

In this commit we establish an early version of the Whisper processor. We still need to check how to run it safely so it does not kill the process, and we need to verify that its memory consumption stays well below the limits of our low-resource availability model. We also probably want to have Whisper as a singleton, and we need to figure out how to download the recognition model: it takes a lot of space, so we cannot just store it in the repository. A rough sketch of the singleton idea follows below.
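
A minimal sketch of the lazy-singleton idea, assuming a hypothetical `WhisperProcessor` wrapper. The class name, its API, the `ensureModel` helper, and the model URL/path parameters are illustrative only; the actual Whisper integration is not wired up in this PR.

```typescript
import { access, mkdir, writeFile } from "node:fs/promises";
import { dirname } from "node:path";

// Hypothetical wrapper around the Whisper binding; illustrative only.
export class WhisperProcessor {
  private static instance: Promise<WhisperProcessor> | null = null;

  private constructor(private readonly modelPath: string) {}

  // Lazily create and cache a single instance, fetching the recognition model
  // on first use instead of keeping it in the repository.
  public static get(modelUrl: string, modelPath: string): Promise<WhisperProcessor> {
    if (!WhisperProcessor.instance) {
      WhisperProcessor.instance = WhisperProcessor.ensureModel(modelUrl, modelPath).then(
        () => new WhisperProcessor(modelPath),
      );
    }
    return WhisperProcessor.instance;
  }

  private static async ensureModel(url: string, path: string): Promise<void> {
    try {
      await access(path); // the model is already on disk, nothing to do
      return;
    } catch {
      // not downloaded yet, fall through to the download below
    }
    await mkdir(dirname(path), { recursive: true });
    const res = await fetch(url);
    if (!res.ok) {
      throw new Error(`Model download failed with status ${res.status}`);
    }
    // Note: for a model this large the real code should stream to disk
    // instead of buffering the whole file in memory.
    await writeFile(path, Buffer.from(await res.arrayBuffer()));
  }

  public async transcribe(wavFilePath: string): Promise<string> {
    // Placeholder: the actual call into Whisper is not enabled yet.
    throw new Error(`Whisper is not enabled yet (model: ${this.modelPath}, file: ${wavFilePath})`);
  }
}
```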

We do not enable Whisper for now due to the challenges mentioned above, but we do change the way we transform the voice file: instead of prism-media we use fluent-ffmpeg, and we now always dump the WAV file to the file system as well (see the conversion sketch after this paragraph). This is a requirement for the Whisper model, but I also want to unify the interface and stop having separate processing flows for audio and video. This change will be enabled with the release.
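
A minimal sketch of the fluent-ffmpeg based conversion. The `convertToWav` name and the 16 kHz mono PCM settings are assumptions for illustration (Whisper expects 16 kHz mono WAV input), not the exact code in this PR.

```typescript
import ffmpeg from "fluent-ffmpeg";

// Converts an incoming voice or video file into a 16 kHz mono WAV file on
// disk, which is the input format the Whisper model expects.
export const convertToWav = (inputPath: string, outputPath: string): Promise<string> =>
  new Promise((resolve, reject) => {
    ffmpeg(inputPath)
      .noVideo()               // drop any video stream so audio and video share one flow
      .audioCodec("pcm_s16le") // uncompressed 16-bit PCM
      .audioFrequency(16000)   // Whisper models work with 16 kHz audio
      .audioChannels(1)        // mono is enough for speech recognition
      .format("wav")
      .on("end", () => resolve(outputPath))
      .on("error", (err: Error) => reject(err))
      .save(outputPath);       // always dump the result to the file system
  });
```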


sonarcloud bot commented Sep 8, 2024

n0th1ng-else merged commit dd45212 into master on Sep 8, 2024
9 checks passed
n0th1ng-else deleted the offline branch on September 8, 2024 at 15:36

github-actions bot commented Sep 8, 2024

🎉 This PR is included in version 4.30.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
