feat: update README with new environment variables, supported file formats, compute settings, and troubleshooting guidelines
pavelzbornik · Nov 23, 2024 · 70ed509
1 parent daf32ee
commit 70ed509
Showing 1 changed file with 106 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -15,6 +15,37 @@ See the [WhisperX Documentation](https://github.com/m-bain/whisperX) for details
 - `.env` contains definition of logging level using `LOG_LEVEL`, if not defined **DEBUG** is used in development and **INFO** in production
 - `.env` contains definition of environment using `ENVIRONMENT`, if not defined **production** is used
 - `.env` contains a boolean `DEV` to indicate if the environment is development, if not defined **true** is used
+- `.env` contains a boolean `FILTER_WARNING` to enable or disable filtering of specific warnings, if not defined **true** is used
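Review note: the boolean flags above (`DEV`, `FILTER_WARNING`) default to true when undefined. The sketch below shows one way such a flag could be read. The helper name `env_bool` and the accepted spellings of "true" are assumptions made for illustration, not the project's actual parsing code.

```python
import os

# Assumption: these spellings count as "true"; the project may accept others.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_bool(name, default, environ=None):
    """Read a boolean flag from the environment, falling back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        # Undefined or empty: use the documented default.
        return default
    return raw.strip().lower() in _TRUE_VALUES


# Defaults taken from the README: both flags are true when undefined.
FILTER_WARNING = env_bool("FILTER_WARNING", default=True)
DEV = env_bool("DEV", default=True)
```

Passing a plain dict as `environ` makes the helper easy to check without touching the real environment.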
+
+### Supported File Formats
+
+#### Audio Files
+
+- `.oga`, `.m4a`, `.aac`, `.wav`, `.amr`, `.wma`, `.awb`, `.mp3`, `.ogg`
+
+#### Video Files
+
+- `.wmv`, `.mkv`, `.avi`, `.mov`, `.mp4`
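Review note: the two extension lists above lend themselves to a simple pre-upload check. This is a minimal sketch: the extension sets are copied from the README, while the function name `media_kind` is hypothetical and the service itself may validate files differently.

```python
from pathlib import Path

# Extension lists copied from the README sections above.
AUDIO_EXTENSIONS = {".oga", ".m4a", ".aac", ".wav", ".amr", ".wma", ".awb", ".mp3", ".ogg"}
VIDEO_EXTENSIONS = {".wmv", ".mkv", ".avi", ".mov", ".mp4"}


def media_kind(filename):
    """Return "audio", "video", or None when the extension is not listed."""
    suffix = Path(filename).suffix.lower()  # case-insensitive comparison
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None
```

Only the final suffix is inspected, so a name like `archive.tar.gz` is treated as `.gz` and rejected.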
+
+### Available Services
+
+1. Speech-to-Text (`/speech-to-text`)
+   - Upload audio/video files for transcription
+   - Supports multiple languages and Whisper models
+
+2. Speech-to-Text URL (`/speech-to-text-url`)
+   - Transcribe audio/video from URLs
+   - Same features as direct upload
+
+3. Individual Services:
+   - Transcribe (`/service/transcribe`): Convert speech to text
+   - Align (`/service/align`): Align transcript with audio
+   - Diarize (`/service/diarize`): Speaker diarization
+   - Combine (`/service/combine`): Merge transcript with diarization
+
+4. Task Management:
+   - Get all tasks (`/task/all`)
+   - Get task status (`/task/{identifier}`)
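Review note: a client would typically build full URLs from the paths listed above. The sketch below only assembles URLs and sends no requests. The base URL `http://localhost:8000` and the helper name `task_status_url` are assumptions; the README does not state HTTP methods or request parameters, so none are shown.

```python
from urllib.parse import quote, urljoin

# Assumption: the API is served locally on port 8000; adjust for your deployment.
BASE_URL = "http://localhost:8000"

# Paths as listed in the README.
ENDPOINTS = {
    "speech_to_text": "/speech-to-text",
    "speech_to_text_url": "/speech-to-text-url",
    "transcribe": "/service/transcribe",
    "align": "/service/align",
    "diarize": "/service/diarize",
    "combine": "/service/combine",
    "all_tasks": "/task/all",
}


def endpoint_url(name, base_url=BASE_URL):
    """Return the absolute URL for one of the named endpoints."""
    return urljoin(base_url, ENDPOINTS[name])


def task_status_url(identifier, base_url=BASE_URL):
    """Return the URL for `/task/{identifier}`, percent-encoding the identifier."""
    return urljoin(base_url, "/task/" + quote(identifier, safe=""))
```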
 
 ### Task management and result storage
 
@@ -28,6 +59,39 @@ See documentation for driver definition at [Sqlalchemy Engine configuration](htt
 
 Structure of the db is described in [DB Schema](app/docs/db_schema.md)
 
+### Compute Settings
+
+Configure compute options in `.env`:
+
+- `DEVICE`: Device for inference (`cuda` or `cpu`, default: `cuda`)
+- `COMPUTE_TYPE`: Computation type (`float16`, `float32`, `int8`, default: `float16`)
+    > Note: When using CPU, `COMPUTE_TYPE` must be set to `int8`
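Review note: the CPU constraint above is easy to miss, because the default `COMPUTE_TYPE` (`float16`) is invalid once `DEVICE=cpu`. The sketch below validates both settings at startup using the values and defaults from the README. The function name `load_compute_settings` is hypothetical and the project may enforce this differently.

```python
import os

VALID_DEVICES = {"cuda", "cpu"}
VALID_COMPUTE_TYPES = {"float16", "float32", "int8"}


def load_compute_settings(environ=None):
    """Return (device, compute_type), applying README defaults and checks."""
    environ = os.environ if environ is None else environ
    device = environ.get("DEVICE", "cuda")
    compute_type = environ.get("COMPUTE_TYPE", "float16")

    if device not in VALID_DEVICES:
        raise ValueError(f"DEVICE must be one of {sorted(VALID_DEVICES)}, got {device!r}")
    if compute_type not in VALID_COMPUTE_TYPES:
        raise ValueError(
            f"COMPUTE_TYPE must be one of {sorted(VALID_COMPUTE_TYPES)}, got {compute_type!r}"
        )
    # README note: when using CPU, COMPUTE_TYPE must be set to int8.
    if device == "cpu" and compute_type != "int8":
        raise ValueError("DEVICE=cpu requires COMPUTE_TYPE=int8")
    return device, compute_type
```

With this check, setting only `DEVICE=cpu` fails fast instead of surfacing later as an inference error.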
+
+### Available Models
+
+WhisperX supports these model sizes:
+
+- `tiny`, `tiny.en`
+- `base`, `base.en`
+- `small`, `small.en`
+- `medium`, `medium.en`
+- `large`, `large-v1`, `large-v2`, `large-v3`
+
+Note: `large-v3-turbo` is not yet supported by WhisperX.
+
+Set the default model in `.env` with `WHISPER_MODEL` (default: `tiny`).
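Review note: since `large-v3-turbo` is called out as unsupported, it may be worth rejecting unknown model names early. The sketch below checks `WHISPER_MODEL` against the list in this section. The function name `resolve_model` is hypothetical.

```python
import os

# Model names copied from the README list above.
SUPPORTED_MODELS = {
    "tiny", "tiny.en",
    "base", "base.en",
    "small", "small.en",
    "medium", "medium.en",
    "large", "large-v1", "large-v2", "large-v3",
}


def resolve_model(environ=None):
    """Return the configured model name, defaulting to "tiny"."""
    environ = os.environ if environ is None else environ
    model = environ.get("WHISPER_MODEL", "tiny")
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported WHISPER_MODEL: {model!r}")
    return model
```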
+
+## System Requirements
+
+- Docker with GPU support (nvidia-docker)
+- NVIDIA GPU with CUDA support
+- At least 8GB RAM (16GB+ recommended for large models)
+- Storage space for models (varies by model size):
+  - tiny/base: ~1GB
+  - small: ~2GB
+  - medium: ~5GB
+  - large: ~10GB
+
 ## Getting Started
 
 ### Local Run
@@ -115,6 +179,48 @@ The models used by whisperX are stored in `root/.cache`, if you want to avoid do
 - faster-whisper cache: `root/.cache/huggingface/hub`
 - pyannote and other models cache: `root/.cache/torch`
 
+## Known Issues
+
+1. **ctranslate2 Compatibility**
+
+- Only `ctranslate2==4.4.0` is supported, because newer CTranslate2 versions have different CUDA requirements. See <https://github.com/SYSTRAN/faster-whisper/issues/1086>.
+
+2. **faster-whisper Compatibility**
+
+- Only `faster-whisper==1.0.0` is supported due to compatibility issues with WhisperX.
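Review note: the two pins above can be verified in a running environment with the standard library's `importlib.metadata`. This is a sketch only; the helper name `check_pins` is hypothetical, and the pinned versions are the ones stated in this section.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins as stated in the Known Issues section.
PINNED = {"ctranslate2": "4.4.0", "faster-whisper": "1.0.0"}


def check_pins(pins=PINNED, get_version=version):
    """Return a list of problems; an empty list means every pin matches."""
    problems = []
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{package}=={installed} installed, expected {expected}")
    return problems
```

The `get_version` parameter exists so the check can be exercised without the packages installed; in normal use, calling `check_pins()` with no arguments queries the active environment.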
+
+## Troubleshooting
+
+### Common Issues
+
+1. **Environment Variables Not Loaded**
+   - Ensure your `.env` file is correctly formatted and placed in the root directory.
+   - Verify that all required environment variables are defined.
+
+2. **Database Connection Issues**
+   - Check the `DB_URL` environment variable for correctness.
+   - Ensure the database server is running and accessible.
+
+3. **Model Download Failures**
+   - Verify your internet connection.
+   - Ensure the `HF_TOKEN` is correctly set in the `.env` file.
+
+4. **GPU Not Detected**
+   - Ensure NVIDIA drivers and CUDA are correctly installed.
+   - Verify that Docker is configured to use the GPU (`nvidia-docker`).
+
+5. **Warnings Not Filtered**
+   - Ensure the `FILTER_WARNING` environment variable is set to `true` in the `.env` file.
+
+### Logs and Debugging
+
+- Check the logs for detailed error messages.
+- Use the `LOG_LEVEL` environment variable to set the appropriate logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`).
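Review note: the README describes `LOG_LEVEL` falling back to **DEBUG** in development and **INFO** in production, with `ENVIRONMENT` defaulting to production. The sketch below resolves the level that way. It assumes the development value of `ENVIRONMENT` is the literal string `development`, and the function name `resolve_log_level` is hypothetical.

```python
import logging
import os

ALLOWED_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def resolve_log_level(environ=None):
    """Return a `logging` level constant based on LOG_LEVEL and ENVIRONMENT."""
    environ = os.environ if environ is None else environ
    # Assumption: ENVIRONMENT=development marks a development setup.
    environment = environ.get("ENVIRONMENT", "production").lower()
    default = "DEBUG" if environment == "development" else "INFO"
    level = environ.get("LOG_LEVEL", default).upper()
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {sorted(ALLOWED_LEVELS)}, got {level!r}")
    return getattr(logging, level)


# Typical use at startup:
#   logging.basicConfig(level=resolve_log_level())
```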
+
+### Support
+
+For further assistance, please open an issue on the [GitHub repository](https://github.com/pavelzbornik/whisperX-FastAPI/issues).
+
 ## Related
 
 - [ahmetoner/whisper-asr-webservice](https://github.com/ahmetoner/whisper-asr-webservice)