diff --git a/README.md b/README.md
index 0b0c502..07cd8bc 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,37 @@ See the [WhisperX Documentation](https://github.com/m-bain/whisperX) for details
 - `.env` contains definition of logging level using `LOG_LEVEL`, if not defined **DEBUG** is used in development and **INFO** in production
 - `.env` contains definition of environment using `ENVIRONMENT`, if not defined **production** is used
 - `.env` contains a boolean `DEV` to indicate if the environment is development, if not defined **true** is used
+- `.env` contains a boolean `FILTER_WARNING` to enable or disable filtering of specific warnings, if not defined **true** is used
+
+### Supported File Formats
+
+#### Audio Files
+
+- `.oga`, `.m4a`, `.aac`, `.wav`, `.amr`, `.wma`, `.awb`, `.mp3`, `.ogg`
+
+#### Video Files
+
+- `.wmv`, `.mkv`, `.avi`, `.mov`, `.mp4`
+
+### Available Services
+
+1. Speech-to-Text (`/speech-to-text`)
+   - Upload audio/video files for transcription
+   - Supports multiple languages and Whisper models
+
+2. Speech-to-Text URL (`/speech-to-text-url`)
+   - Transcribe audio/video from URLs
+   - Same features as direct upload
+
+3. Individual Services:
+   - Transcribe (`/service/transcribe`): Convert speech to text
+   - Align (`/service/align`): Align transcript with audio
+   - Diarize (`/service/diarize`): Speaker diarization
+   - Combine (`/service/combine`): Merge transcript with diarization
+
+4. Task Management:
+   - Get all tasks (`/task/all`)
+   - Get task status (`/task/{identifier}`)
 
 ### Task management and result storage
 
@@ -28,6 +59,39 @@ See documentation for driver definition at [Sqlalchemy Engine configuration](htt
 
 Structure of the of the db is described in [DB Schema](app/docs/db_schema.md)
 
+### Compute Settings
+
+Configure compute options in `.env`:
+
+- `DEVICE`: Device for inference (`cuda` or `cpu`, default: `cuda`)
+- `COMPUTE_TYPE`: Computation type (`float16`, `float32`, `int8`, default: `float16`)
+  > Note: When using CPU, `COMPUTE_TYPE` must be set to `int8`
+
+### Available Models
+
+WhisperX supports these model sizes:
+
+- `tiny`, `tiny.en`
+- `base`, `base.en`
+- `small`, `small.en`
+- `medium`, `medium.en`
+- `large`, `large-v1`, `large-v2`, `large-v3`
+
+Note: `large-v3-turbo` is not yet supported by WhisperX.
+
+Set default model in `.env` using `WHISPER_MODEL=` (default: tiny)
+
+## System Requirements
+
+- Docker with GPU support (nvidia-docker)
+- NVIDIA GPU with CUDA support
+- At least 8GB RAM (16GB+ recommended for large models)
+- Storage space for models (varies by model size):
+  - tiny/base: ~1GB
+  - small: ~2GB
+  - medium: ~5GB
+  - large: ~10GB
+
 ## Getting Started
 
 ### Local Run
@@ -115,6 +179,48 @@ The models used by whisperX are stored in `root/.cache`, if you want to avoid do
 
 - faster-whisper cache: `root/.cache/huggingface/hub`
 - pyannotate and other models cache: `root/.cache/torch`
+## Known Issues
+
+1. **ctranslate2 Compatibility**
+
+- Only `ctranslate2==4.4.0` is supported due to CUDA compatibility issues with CTranslate2, as newer versions have different CUDA requirements.
+
+2. **faster-whisper Compatibility**
+
+- Only `faster-whisper==1.0.0` is supported due to compatibility issues with WhisperX.
+
+## Troubleshooting
+
+### Common Issues
+
+1. **Environment Variables Not Loaded**
+   - Ensure your `.env` file is correctly formatted and placed in the root directory.
+   - Verify that all required environment variables are defined.
+
+2. **Database Connection Issues**
+   - Check the `DB_URL` environment variable for correctness.
+   - Ensure the database server is running and accessible.
+
+3. **Model Download Failures**
+   - Verify your internet connection.
+   - Ensure the `HF_TOKEN` is correctly set in the `.env` file.
+
+4. **GPU Not Detected**
+   - Ensure NVIDIA drivers and CUDA are correctly installed.
+   - Verify that Docker is configured to use the GPU (`nvidia-docker`).
+
+5. **Warnings Not Filtered**
+   - Ensure the `FILTER_WARNING` environment variable is set to `true` in the `.env` file.
+
+### Logs and Debugging
+
+- Check the logs for detailed error messages.
+- Use the `LOG_LEVEL` environment variable to set the appropriate logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`).
+
+### Support
+
+For further assistance, please open an issue on the [GitHub repository](https://github.com/pavelzbornik/whisperX-FastAPI/issues).
+
 ## Related
 
 - [ahmetoner/whisper-asr-webservice](https://github.com/ahmetoner/whisper-asr-webservice)
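
Note on the Compute Settings hunk: the README states that `COMPUTE_TYPE` must be `int8` when `DEVICE` is `cpu`, and that the defaults are `cuda` and `float16`. One way to catch a bad combination at startup is sketched below. This is a minimal illustration and not code from the repository: the function name `resolve_compute_settings` and the choice to raise `ValueError` are assumptions; only the variable names, allowed values, and defaults come from the README text in this diff.

```python
import os

# Allowed values and defaults as documented in the README hunk.
VALID_DEVICES = {"cuda", "cpu"}
VALID_COMPUTE_TYPES = {"float16", "float32", "int8"}


def resolve_compute_settings(env=None):
    """Hypothetical helper: read DEVICE and COMPUTE_TYPE and validate them.

    `env` is any mapping of variable names to strings; it defaults to
    os.environ so the helper can be tested without touching the process
    environment.
    """
    env = os.environ if env is None else env
    device = env.get("DEVICE", "cuda").strip().lower()
    compute_type = env.get("COMPUTE_TYPE", "float16").strip().lower()

    if device not in VALID_DEVICES:
        raise ValueError(f"DEVICE must be one of {sorted(VALID_DEVICES)}, got {device!r}")
    if compute_type not in VALID_COMPUTE_TYPES:
        raise ValueError(
            f"COMPUTE_TYPE must be one of {sorted(VALID_COMPUTE_TYPES)}, got {compute_type!r}"
        )
    # Rule stated in the README: CPU inference requires int8.
    if device == "cpu" and compute_type != "int8":
        raise ValueError("COMPUTE_TYPE must be int8 when DEVICE is cpu")
    return device, compute_type
```

Under these assumptions, `resolve_compute_settings({"DEVICE": "cpu", "COMPUTE_TYPE": "int8"})` passes, while setting `DEVICE=cpu` and leaving `COMPUTE_TYPE` at its `float16` default raises an error, which matches the README note.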