This project is a transcription app built using the Faster Whisper model, which transcribes audio and video files into text. It is powered by Gradio for a user-friendly web interface and supports audio or video file uploads for transcription.
- 🎧 Transcribe both audio and video files (e.g., MP3, MP4, AVI, etc.)
- ⚖️ Supports multiple model sizes for performance vs. accuracy balance
- 🚀 GPU support for faster transcription using CUDA
- 🎥 Extracts audio from video files automatically
- 🔍 High-precision transcription with options for beam search and other configurations
- 🖥️ Simple UI built with Gradio for easy access and use
- ⬇️ Download the transcript in `.txt` format
- 🎛️ Tune the model parameters via the interface
💻 You can try the Colab version here (remember to select GPU in 'Runtime Type' for faster execution ⚡)
- 🖥️ MAC/AMD Support (if you have one of those contact me 📩)
- You tell me! 🙂
- 🐍 Python 3.11+
- 🔥 PyTorch (CUDA version + CUDA Toolkit if using GPU)
- 🎬 FFmpeg (must be installed and added to your system's PATH)
- 🖼️ Gradio
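Before launching the app, the requirements above can be sanity-checked from Python. This is a minimal sketch, not part of the repository; `check_environment` is a hypothetical helper name:

```python
import shutil
import sys

def check_environment() -> dict:
    """Report whether the basic requirements are met (sketch)."""
    return {
        # The project requires Python 3.11+
        "python_ok": sys.version_info >= (3, 11),
        # FFmpeg must be installed and discoverable on PATH
        "ffmpeg_ok": shutil.which("ffmpeg") is not None,
    }

if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'MISSING'}")
```

PyTorch and Gradio are checked implicitly when `pip install` runs, so this only covers the two requirements that live outside the virtual environment.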
git clone https://github.com/CorsiDanilo/whisper-utility.git
python3 -m venv .venv
source .venv/bin/activate # On Windows, use `.venv\Scripts\activate`
pip install -r requirements.txt
- 🐧 Linux: Install via your package manager (e.g., `sudo apt install ffmpeg`)
- 🍎 macOS: Install via Homebrew (`brew install ffmpeg`)
- 🖥️ Windows: Download FFmpeg and add it to your system's PATH.
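Once FFmpeg is on the PATH, extracting the audio track from a video (as the app does automatically) comes down to a single command. A sketch of how such a command might be assembled, assuming Whisper's preferred 16 kHz mono input; `build_extract_cmd` is a hypothetical name, not the app's actual function:

```python
from pathlib import Path

def build_extract_cmd(video: str, out_wav: str) -> list[str]:
    """Build an ffmpeg command that strips the audio track from a video.

    -vn drops the video stream; 16 kHz mono PCM is what Whisper expects.
    """
    return [
        "ffmpeg", "-y",          # overwrite the output without asking
        "-i", str(Path(video)),  # input video file
        "-vn",                   # no video in the output
        "-ac", "1",              # mono
        "-ar", "16000",          # 16 kHz sample rate
        str(Path(out_wav)),
    ]

# The resulting list can be handed to subprocess.run(cmd, check=True).
```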
- Follow this guide to ensure it's in your system's PATH.
- Download and install PyTorch; under `Compute Platform`, select the latest CUDA version.
- Download and install the CUDA Toolkit.
- 🐧 Linux: follow this guide.
- 🖥️ Windows: follow this guide.
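The GPU setup above is optional: the app falls back to CPU when CUDA is unavailable. A minimal sketch of how that device selection might look; `pick_device` is a hypothetical name, and the guarded import keeps the check working even when PyTorch is not installed:

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-enabled PyTorch build sees a GPU, else "cpu"."""
    try:
        import torch  # optional: only needed for GPU detection
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass  # PyTorch not installed: fall back to CPU
    return "cpu"
```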
Run the application:
python whisper.py
The Gradio interface will open in your default web browser. From there, you can upload an audio or video file, and the transcription will be displayed.
💡 REMEMBER: When you are done, click Close and Clear if you want to clean up the temporary files folder.
- Upload an audio or video file: Accepts audio formats like MP3, WAV, and video formats like MP4, AVI.
- Transcribe: Click this button to start the transcription process.
- Close and Clear: This button clears the folder where the file was temporarily stored and closes the application.
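Under the hood, the cleanup step behind Close and Clear could be as simple as removing the upload folder. A sketch under that assumption; `clear_temp_folder` and the folder layout are hypothetical, not the app's actual code:

```python
import shutil
import tempfile
from pathlib import Path

def clear_temp_folder(folder: Path) -> None:
    """Delete the folder holding temporarily uploaded files, if it exists."""
    if folder.is_dir():
        shutil.rmtree(folder)

# Example: create a scratch uploads folder with one file, then clear it.
scratch = Path(tempfile.mkdtemp()) / "uploads"
scratch.mkdir()
(scratch / "audio.mp3").write_bytes(b"")
clear_temp_folder(scratch)
print(scratch.exists())  # → False
```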
- Language: Set the transcription language. Default is Italian 🇮🇹 (`it`), but you can change it to English 🇬🇧 (`en`) or other languages.
- Model Size: By default, the large version of the Whisper model is used (`large-v3`), but you can switch to a smaller model (e.g., `small`) for faster inference.
- Device: The model automatically selects the device based on GPU availability (`cuda` or `cpu`).
- Beam Size: Set the beam size for decoding. Default is `4`, but you can reduce it to `1` for faster inference.
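The defaults above can be collected into a single options mapping that the UI overrides. A sketch, assuming the defaults listed in this section; `make_transcribe_options` and the key names are hypothetical, not the app's actual API:

```python
DEFAULTS = {
    "language": "it",         # transcription language (ISO 639-1 code)
    "model_size": "large-v3", # Whisper model variant
    "beam_size": 4,           # lower it to 1 for faster, less accurate decoding
}

def make_transcribe_options(**overrides) -> dict:
    """Merge UI overrides into the defaults, rejecting unknown keys (sketch)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}

# e.g. make_transcribe_options(language="en", beam_size=1)
```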
- If you get the following error: `Could not locate cudnn_ops_infer64_8.dll. Please make sure it is in your library path!`
  - Download the missing dll from here and put it into the `bin` folder of your `CUDA` installation folder.
  - 🗂️ The usual path is: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vXX.X\bin`.
This project is licensed under the MIT License. See the LICENSE file for details.
- Faster Whisper by Guillaume Klein
- Gradio for the UI interface