WhisperXAPI

WhisperXAPI is a real-time transcription and speaker diarization API built using OpenAI's Whisper, pyannote-audio, and FastAPI. The API supports both real-time audio streaming from a user's microphone and file uploads for transcription. It provides speaker-labeled transcriptions that are processed in real-time or from uploaded audio files.

Features

Real-time Transcription: Stream audio directly from a microphone for on-the-fly transcription.
Speaker Diarization: Detect and label multiple speakers in the audio.
File Upload: Upload audio files for batch transcription with speaker labels.
GPU Support: Leverages GPU acceleration for faster processing (if available).
Asynchronous API: Built with FastAPI for handling multiple concurrent requests efficiently.

Getting Started

Prerequisites

Conda (Miniconda or Anaconda)
Python 3.8+
Git
FFmpeg (optional, for audio handling)

Installation

Clone the repository:

git clone https://github.com/your-username/whisperxapi.git
cd whisperxapi

Set up the Conda environment:

conda create -n whisperxapi_env python=3.8
conda activate whisperxapi_env

Install dependencies:

Install PyTorch with GPU support (if available):

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

For CPU-only installations, use:

conda install pytorch torchvision torchaudio cpuonly -c pytorch

Install FastAPI, WhisperX, and pyannote-audio:

pip install fastapi uvicorn
pip install git+https://github.com/m-bain/whisperX.git
pip install pyannote-audio

Optionally, install FFmpeg for handling audio processing:
```
conda install -c conda-forge ffmpeg
```

Running the API

Start the FastAPI server:
```
uvicorn main:app --reload
```
Access the API documentation: Once the server is running, navigate to http://127.0.0.1:8000/docs in your browser to interact with the API's Swagger-generated documentation.

API Endpoints

POST /transcribe/stream: Accepts audio streamed from the user's microphone and returns real-time transcription and diarization.
POST /transcribe/upload: Accepts uploaded audio files (e.g., WAV, MP3) for transcription and diarization.

Example Request for File Upload

You can use a tool like curl to test the file upload functionality:

curl -X 'POST' \
  'http://127.0.0.1:8000/transcribe/upload' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'audio_file=@path_to_audio_file.wav'

Usage

WhisperXAPI can be used for various applications that require transcription and diarization, such as:

Real-time transcription for meetings, lectures, or interviews.
Automatic content creation for podcasts, videos, and other media.
Customer support transcription for call centers.

Project Structure

whisperxapi/
├── main.py            # Main FastAPI application
├── test_env.py        # Environment test script
├── requirements.txt   # Dependencies list
└── README.md          # Project documentation

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WhisperXAPI

Features

Getting Started

Prerequisites

Installation

Running the API

API Endpoints

Example Request for File Upload

Usage

Project Structure

License

About

Releases

Packages

License

jurgenskrause/WhisperXAPI

Folders and files

Latest commit

History

Repository files navigation

WhisperXAPI

Features

Getting Started

Prerequisites

Installation

Running the API

API Endpoints

Example Request for File Upload

Usage

Project Structure

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages