CorsiDanilo/Whisper-Audio-Video-Transcription-App

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🎙️ Whisper Audio/Video Transcription App

📝 Description

This project is a transcription app built using the Faster Whisper model, which transcribes audio and video files into text. It is powered by Gradio for a user-friendly web interface and supports audio or video file uploads for transcription.

✨ Features

  • 🎧 Transcribe both audio and video files (e.g., MP3, MP4, AVI, etc.)
  • ⚖️ Supports multiple model sizes for performance vs. accuracy balance
  • 🚀 GPU support for faster transcription using CUDA
  • 🎥 Extracts audio from video files automatically
  • 🔍 High-precision transcription with options for beam search and other configurations
  • 🖥️ Simple UI built with Gradio for easy access and use
  • ⬇️ Download the transcript in .txt format
  • 🎛️ Tuning the model parameters via the interface
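The README doesn't show how the app extracts audio from video. Below is a minimal sketch, assuming the FFmpeg CLI is on PATH. It converts a video's audio track to a 16 kHz mono WAV, the sample rate Whisper models work at. `build_extract_cmd`, `extract_audio`, `needs_extraction` and the `VIDEO_SUFFIXES` set are hypothetical names, not the app's code:

```python
import subprocess
from pathlib import Path

# Hypothetical: an example set of video extensions, not the app's actual list.
VIDEO_SUFFIXES = {".mp4", ".avi", ".mkv", ".mov"}


def build_extract_cmd(video_path: str, wav_path: str) -> list[str]:
    """Return an FFmpeg argument list that writes a 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # input video
        "-vn",             # drop the video stream
        "-ac", "1",        # downmix to a single channel
        "-ar", "16000",    # resample to 16 kHz
        wav_path,
    ]


def needs_extraction(path: str) -> bool:
    """True if the file extension looks like a video container."""
    return Path(path).suffix.lower() in VIDEO_SUFFIXES


def extract_audio(video_path: str) -> str:
    """Write a WAV next to the video and return its path (requires FFmpeg)."""
    wav_path = str(Path(video_path).with_suffix(".wav"))
    subprocess.run(build_extract_cmd(video_path, wav_path), check=True, capture_output=True)
    return wav_path
```

For example, `needs_extraction("lecture.MP4")` is true, so `extract_audio("lecture.MP4")` would write `lecture.wav` alongside the video.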

Demo

💻 You can try the Colab version here (remember to select a GPU under Runtime > Change runtime type for faster execution ⚡)

🛠️ To do/fix

  • 🖥️ macOS/AMD support (if you have one of those, contact me 📩)
  • You tell me! 🙂

📋 Requirements

📦 Installation

Step 1: Clone the repository

git clone https://github.com/CorsiDanilo/whisper-utility.git

Step 2: Set up a virtual environment (optional but recommended):

python3 -m venv .venv
source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`
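You can confirm the virtual environment is active before installing dependencies. This small sketch is not part of the app. It compares `sys.prefix` with `sys.base_prefix`, and the two differ inside a `venv`:

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv (its prefix differs from the base interpreter's)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```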

Step 3: Install the required dependencies

pip install -r requirements.txt
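To confirm the install worked, a small check can look up the two packages this README names. The helper is hypothetical and not part of the repo. `faster_whisper` and `gradio` are the import names of the `faster-whisper` and `gradio` packages, and the actual requirements file may list more:

```python
import importlib.util

# Assumption: only the core packages mentioned in this README; the real
# requirements file may contain additional dependencies.
REQUIRED_MODULES = ["faster_whisper", "gradio"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("all dependencies found" if not missing else f"missing: {', '.join(missing)}")
```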

Step 4: Install FFmpeg (if not already installed):

  • 🐧 Linux: Install via your package manager (e.g., sudo apt install ffmpeg)
  • 🍎 macOS: Install via Homebrew (brew install ffmpeg)
  • 🖥️ Windows: Download FFmpeg and add it to your system's PATH.
    • Follow this guide to ensure it's in your system's PATH.
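To check that FFmpeg is reachable from Python after adding it to PATH, here is a quick sketch using `shutil.which`. The helper name `ffmpeg_version` is hypothetical:

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg not found on PATH")
```

If this prints "ffmpeg not found on PATH", open a new terminal after editing PATH and try again.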

(OPTIONAL) Step 5: Install PyTorch and the CUDA Toolkit for NVIDIA GPU support
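After installing, you can confirm that PyTorch actually sees the GPU. This sketch only checks PyTorch's view of CUDA. How the app itself picks a device is not shown in this README:

```python
def cuda_available() -> bool:
    """True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch  # optional dependency, only needed for this check
    except (ImportError, OSError):  # OSError covers broken DLL loads on Windows
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("CUDA available:", cuda_available())
```

A result of `False` on a machine with an NVIDIA GPU usually means a CPU-only PyTorch build was installed.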

🚀 Usage

Run the application:

python whisper.py 

The Gradio interface will open in your default web browser. From there, you can upload an audio or video file, and the transcription will be displayed.
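The app's internals aren't shown in this README. A minimal sketch of the upload-to-text flow using faster-whisper's `WhisperModel` might look like the following. The defaults mirror the Model Configuration section (Italian, `large-v3`, beam size 4). The helper names and the `compute_type` choices are assumptions, and `faster_whisper` is imported inside the function so the rest of the file loads without it:

```python
from pathlib import Path
from typing import Iterable


def join_segments(segments: Iterable) -> str:
    """Concatenate segment texts (each item needs a `.text` attribute)."""
    return " ".join(seg.text.strip() for seg in segments).strip()


def save_transcript(text: str, out_path: str) -> str:
    """Write the transcript as UTF-8 text and return the path."""
    Path(out_path).write_text(text, encoding="utf-8")
    return out_path


def transcribe(path: str, model_size: str = "large-v3", device: str = "cpu",
               language: str = "it", beam_size: int = 4) -> str:
    """Transcribe one audio file with faster-whisper (imported lazily)."""
    from faster_whisper import WhisperModel  # pip install faster-whisper

    # Assumption: float16 on CUDA, int8 on CPU to keep memory use down.
    compute_type = "float16" if device == "cuda" else "int8"
    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(path, language=language, beam_size=beam_size)
    # `segments` is a generator: decoding happens as it is iterated here.
    return join_segments(segments)
```

For example, `save_transcript(transcribe("meeting.mp3"), "meeting.txt")` would produce a .txt transcript similar to the one the interface offers for download.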

💡 REMEMBER: When you are done, click Close and Clear if you want to clean up the temporary files folder.

🎛️ Interface Guide

  • Upload an audio or video file: Accepts audio formats such as MP3 and WAV, and video formats such as MP4 and AVI.
  • Transcribe: Click this button to start the transcription process.
  • Close and Clear: This button clears the folder where the file was temporarily stored and closes the application.
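This README doesn't say where the app keeps its temporary folder. The following is a rough sketch of what Close and Clear does, based on the description above rather than the app's code. Uploads are staged in a dedicated temp directory that can be removed in one call. `UploadWorkspace` is a hypothetical name:

```python
import shutil
import tempfile
from pathlib import Path


class UploadWorkspace:
    """Hypothetical temp-folder manager: stage uploads, then clear them all at once."""

    def __init__(self) -> None:
        # A fresh, uniquely named directory under the system temp location.
        self.root = Path(tempfile.mkdtemp(prefix="whisper_uploads_"))

    def stage(self, src: str) -> Path:
        """Copy an uploaded file into the workspace and return its new path."""
        dest = self.root / Path(src).name
        shutil.copy2(src, dest)
        return dest

    def clear(self) -> None:
        """Delete the workspace folder and everything in it."""
        shutil.rmtree(self.root, ignore_errors=True)
```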

⚙️ Model Configuration

  • Language: Sets the transcription language. The default is Italian 🇮🇹 (it), but you can change it to English 🇬🇧 (en) or any other language Whisper supports.
  • Model Size: By default, the large version of the Whisper model (large-v3) is used, but you can switch to a smaller model such as small for faster transcription at some cost in accuracy.
  • Device: The model automatically selects the device based on GPU availability (cuda or cpu).
  • Beam Size: Sets the beam size used for decoding. The default is 4, but you can reduce it to 1 (greedy decoding) for faster inference.
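The options above can be sketched as a small validated config object. This is an illustration only. `TranscriptionConfig`, `make_config`, and the `MODEL_SIZES` subset are hypothetical. How the app actually detects a GPU is not shown here, so in this sketch the caller passes `cuda_available` in:

```python
from dataclasses import dataclass

# Assumption: an illustrative subset of model names that faster-whisper accepts.
MODEL_SIZES = {"tiny", "base", "small", "medium", "large-v2", "large-v3"}


@dataclass
class TranscriptionConfig:
    language: str = "it"          # README default: Italian
    model_size: str = "large-v3"  # README default: large-v3
    beam_size: int = 4            # README default; 1 means greedy decoding
    device: str = "cpu"

    def __post_init__(self) -> None:
        # Reject values the model loader or decoder would not accept.
        if self.model_size not in MODEL_SIZES:
            raise ValueError(f"unknown model size: {self.model_size!r}")
        if self.beam_size < 1:
            raise ValueError("beam_size must be >= 1")
        if self.device not in {"cpu", "cuda"}:
            raise ValueError(f"unsupported device: {self.device!r}")


def make_config(cuda_available: bool, **overrides) -> TranscriptionConfig:
    """Default to cuda when a GPU is present, then apply any user overrides."""
    device = "cuda" if cuda_available else "cpu"
    return TranscriptionConfig(**{"device": device, **overrides})
```

For example, `make_config(cuda_available=True, model_size="small", beam_size=1)` describes the fastest of these settings on a GPU.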

🛠️ Troubleshooting

  • If you get the following error:
    Could not locate cudnn_ops_infer64_8.dll. Please make sure it is in your library path!
    
    Download the missing DLL from here and place it in the bin folder of your CUDA installation.
    • 🗂️ The usual path is: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vXX.X\bin.
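Before copying the DLL, it can help to check whether it is already somewhere Windows looks. The sketch below assumes the CUDA installer set the `CUDA_PATH` environment variable. It normally does, but verify this on your machine. `find_dll` is a hypothetical helper:

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def find_dll(name: str, extra_dirs: Iterable[str] = ()) -> Optional[Path]:
    """Search extra_dirs, then CUDA_PATH/bin, then every PATH entry for `name`."""
    candidates = list(extra_dirs)
    cuda_path = os.environ.get("CUDA_PATH")  # assumption: set by the CUDA installer
    if cuda_path:
        candidates.append(os.path.join(cuda_path, "bin"))
    candidates.extend(os.environ.get("PATH", "").split(os.pathsep))
    for directory in candidates:
        # Skip empty PATH entries and return the first directory holding the file.
        if directory and (Path(directory) / name).is_file():
            return Path(directory) / name
    return None


if __name__ == "__main__":
    hit = find_dll("cudnn_ops_infer64_8.dll")
    print(f"found: {hit}" if hit else "not found in CUDA bin or on PATH")
```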

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🙏 Acknowledgments
