feat: update README with new environment variables, supported file formats, compute settings, and troubleshooting guidelines
pavelzbornik committed Nov 23, 2024
1 parent daf32ee commit 70ed509
Showing 1 changed file with 106 additions and 0 deletions.
106 changes: 106 additions & 0 deletions README.md
@@ -15,6 +15,37 @@ See the [WhisperX Documentation](https://github.com/m-bain/whisperX) for details
- `.env` sets the logging level with `LOG_LEVEL`; if not defined, **DEBUG** is used in development and **INFO** in production
- `.env` sets the environment with `ENVIRONMENT`; if not defined, **production** is used
- `.env` contains a boolean `DEV` that indicates whether the environment is development; if not defined, **true** is used
- `.env` contains a boolean `FILTER_WARNING` that enables or disables filtering of specific warnings; if not defined, **true** is used
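
The defaults listed above can be illustrated with a small sketch. It is not the project's actual settings code: `parse_bool` and `load_settings` are hypothetical names, the set of strings treated as true is an assumption, and so is the choice to derive the `LOG_LEVEL` default from `ENVIRONMENT` rather than from `DEV`.

```python
from typing import Mapping, Optional

# Assumption: these strings count as "true"; the README does not specify them.
TRUE_VALUES = {"1", "true", "yes", "on"}


def parse_bool(value: Optional[str], default: bool) -> bool:
    """Interpret an environment string as a boolean, using default when unset."""
    if value is None or value.strip() == "":
        return default
    return value.strip().lower() in TRUE_VALUES


def load_settings(env: Mapping[str, str]) -> dict:
    """Resolve the four variables with the defaults described in the README."""
    environment = env.get("ENVIRONMENT", "production").lower()
    # Assumption: "development" is decided by ENVIRONMENT, not by DEV.
    default_level = "DEBUG" if environment == "development" else "INFO"
    return {
        "ENVIRONMENT": environment,
        "DEV": parse_bool(env.get("DEV"), default=True),
        "FILTER_WARNING": parse_bool(env.get("FILTER_WARNING"), default=True),
        "LOG_LEVEL": env.get("LOG_LEVEL", default_level).upper(),
    }
```

Passing `os.environ` to `load_settings` would resolve the values for the current process; passing a plain dictionary makes the defaults easy to inspect.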

### Supported File Formats

#### Audio Files

- `.oga`, `.m4a`, `.aac`, `.wav`, `.amr`, `.wma`, `.awb`, `.mp3`, `.ogg`

#### Video Files

- `.wmv`, `.mkv`, `.avi`, `.mov`, `.mp4`
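
A client could check a file name against these lists before uploading. The sketch below is only illustrative: `media_kind` is a hypothetical helper, and comparing extensions case-insensitively is an assumption rather than documented service behavior.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the lists above.
AUDIO_EXTENSIONS = {".oga", ".m4a", ".aac", ".wav", ".amr", ".wma", ".awb", ".mp3", ".ogg"}
VIDEO_EXTENSIONS = {".wmv", ".mkv", ".avi", ".mov", ".mp4"}


def media_kind(filename: str) -> Optional[str]:
    """Return "audio", "video", or None for an unsupported extension."""
    # Assumption: extensions are matched case-insensitively.
    suffix = Path(filename).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None
```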

### Available Services

1. Speech-to-Text (`/speech-to-text`)
   - Upload audio/video files for transcription
   - Supports multiple languages and Whisper models

2. Speech-to-Text URL (`/speech-to-text-url`)
   - Transcribe audio/video from URLs
   - Same features as direct upload

3. Individual Services:
   - Transcribe (`/service/transcribe`): Convert speech to text
   - Align (`/service/align`): Align transcript with audio
   - Diarize (`/service/diarize`): Speaker diarization
   - Combine (`/service/combine`): Merge transcript with diarization

4. Task Management:
   - Get all tasks (`/task/all`)
   - Get task status (`/task/{identifier}`)
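
As a rough illustration of the task management endpoints, the sketch below builds the two task URLs and fetches them with the standard library. The base URL `http://localhost:8000`, the use of `GET`, and the JSON response are all assumptions, and the helper names are hypothetical; the upload endpoints are left out because their request format is not described here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the service is reachable locally on port 8000.
BASE_URL = "http://localhost:8000"


def all_tasks_url(base_url: str = BASE_URL) -> str:
    """URL of the endpoint that lists all tasks."""
    return f"{base_url.rstrip('/')}/task/all"


def task_url(identifier: str, base_url: str = BASE_URL) -> str:
    """URL of the status endpoint for one task; the identifier is percent-encoded."""
    return f"{base_url.rstrip('/')}/task/{quote(identifier, safe='')}"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With a running instance, `fetch_json(task_url(identifier))` would return whatever the status endpoint reports for that task.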

### Task management and result storage

@@ -28,6 +59,39 @@ See documentation for driver definition at [Sqlalchemy Engine configuration](htt

The structure of the database is described in [DB Schema](app/docs/db_schema.md)

### Compute Settings

Configure compute options in `.env`:

- `DEVICE`: Device for inference (`cuda` or `cpu`, default: `cuda`)
- `COMPUTE_TYPE`: Computation type (`float16`, `float32`, `int8`, default: `float16`)
  > Note: When using CPU, `COMPUTE_TYPE` must be set to `int8`.

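
The rule in the note can be checked at startup. The following is a minimal sketch, assuming the settings arrive as a mapping of strings; `resolve_compute_settings` is a hypothetical name and not part of the project.

```python
from typing import Mapping, Tuple

ALLOWED_DEVICES = {"cuda", "cpu"}
ALLOWED_COMPUTE_TYPES = {"float16", "float32", "int8"}


def resolve_compute_settings(env: Mapping[str, str]) -> Tuple[str, str]:
    """Apply the documented defaults and reject unsupported combinations."""
    device = env.get("DEVICE", "cuda").lower()
    compute_type = env.get("COMPUTE_TYPE", "float16").lower()
    if device not in ALLOWED_DEVICES:
        raise ValueError(f"DEVICE must be one of {sorted(ALLOWED_DEVICES)}, got {device!r}")
    if compute_type not in ALLOWED_COMPUTE_TYPES:
        raise ValueError(
            f"COMPUTE_TYPE must be one of {sorted(ALLOWED_COMPUTE_TYPES)}, got {compute_type!r}"
        )
    # Rule from the note above: CPU inference requires int8.
    if device == "cpu" and compute_type != "int8":
        raise ValueError("COMPUTE_TYPE must be int8 when DEVICE is cpu")
    return device, compute_type
```

Because the default `COMPUTE_TYPE` is `float16`, setting only `DEVICE=cpu` is rejected in this sketch, which makes the missing `COMPUTE_TYPE=int8` visible early.
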
### Available Models

WhisperX supports these model sizes:

- `tiny`, `tiny.en`
- `base`, `base.en`
- `small`, `small.en`
- `medium`, `medium.en`
- `large`, `large-v1`, `large-v2`, `large-v3`

Note: `large-v3-turbo` is not yet supported by WhisperX.

Set the default model in `.env` using `WHISPER_MODEL` (default: `tiny`).
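
A sketch of how the `WHISPER_MODEL` value could be validated against the list above. `resolve_model` is a hypothetical helper; only the model names and the `tiny` default come from this README.

```python
from typing import Mapping

# Model names copied from the list above.
SUPPORTED_MODELS = (
    "tiny", "tiny.en",
    "base", "base.en",
    "small", "small.en",
    "medium", "medium.en",
    "large", "large-v1", "large-v2", "large-v3",
)


def resolve_model(env: Mapping[str, str]) -> str:
    """Return the configured model name, defaulting to tiny, or raise if unsupported."""
    name = env.get("WHISPER_MODEL", "tiny")
    if name not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported WHISPER_MODEL {name!r}; choose one of {SUPPORTED_MODELS}")
    return name
```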

## System Requirements

- Docker with GPU support (nvidia-docker)
- NVIDIA GPU with CUDA support
- At least 8GB RAM (16GB+ recommended for large models)
- Storage space for models (varies by model size):
  - tiny/base: ~1GB
  - small: ~2GB
  - medium: ~5GB
  - large: ~10GB
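
The approximate storage figures can be turned into a quick pre-flight check. This is a sketch only: the helper names are hypothetical, the figures are the rough values listed above, and treating 1GB as 1024³ bytes is an assumption.

```python
# Approximate sizes in GB, taken from the list above.
APPROX_MODEL_STORAGE_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def model_family(model_name: str) -> str:
    """Map a model name such as "tiny.en" or "large-v2" to its size family."""
    return model_name.split(".")[0].split("-")[0]


def required_storage_gb(model_name: str) -> int:
    """Approximate storage needed for a model, in GB."""
    return APPROX_MODEL_STORAGE_GB[model_family(model_name)]


def has_enough_space(model_name: str, free_bytes: int) -> bool:
    """Compare free space against the approximate requirement."""
    # Assumption: 1GB is counted as 1024**3 bytes.
    return free_bytes >= required_storage_gb(model_name) * 1024**3
```

The free space of the volume holding the model cache can be read with `shutil.disk_usage(path).free` and passed as `free_bytes`.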

## Getting Started

### Local Run
@@ -115,6 +179,48 @@ The models used by whisperX are stored in `root/.cache`, if you want to avoid do
- faster-whisper cache: `root/.cache/huggingface/hub`
- pyannote and other models cache: `root/.cache/torch`

## Known Issues

1. **ctranslate2 Compatibility**
   - Only `ctranslate2==4.4.0` is supported, because newer CTranslate2 versions have different CUDA requirements; see <https://github.com/SYSTRAN/faster-whisper/issues/1086>.
2. **faster-whisper Compatibility**
   - Only `faster-whisper==1.0.0` is supported due to compatibility issues with WhisperX.
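
To confirm that an environment matches these pins, the installed versions can be compared with the standard library's `importlib.metadata`. The sketch below is illustrative; `check_pins` is a hypothetical helper, and its `get_version` parameter exists only so the check can be exercised without the packages installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Pins taken from the Known Issues list above.
PINNED_VERSIONS = {"ctranslate2": "4.4.0", "faster-whisper": "1.0.0"}


def check_pins(get_version: Callable[[str], str] = version) -> Dict[str, Optional[str]]:
    """Return packages whose installed version differs from the pin.

    The value is the installed version, or None if the package is missing.
    """
    problems: Dict[str, Optional[str]] = {}
    for package, expected in PINNED_VERSIONS.items():
        try:
            installed: Optional[str] = get_version(package)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            problems[package] = installed
    return problems
```

An empty result from `check_pins()` means both packages are installed at the pinned versions.
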

## Troubleshooting

### Common Issues

1. **Environment Variables Not Loaded**
   - Ensure your `.env` file is correctly formatted and placed in the root directory.
   - Verify that all required environment variables are defined.
2. **Database Connection Issues**
   - Check the `DB_URL` environment variable for correctness.
   - Ensure the database server is running and accessible.
3. **Model Download Failures**
   - Verify your internet connection.
   - Ensure the `HF_TOKEN` is correctly set in the `.env` file.
4. **GPU Not Detected**
   - Ensure NVIDIA drivers and CUDA are correctly installed.
   - Verify that Docker is configured to use the GPU (`nvidia-docker`).
5. **Warnings Not Filtered**
   - Ensure the `FILTER_WARNING` environment variable is set to `true` in the `.env` file.
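
The first three checks can be partly automated by reporting variables that are unset or empty. This is a minimal sketch: `missing_variables` is a hypothetical helper, and treating `DB_URL` and `HF_TOKEN` as the required set is an assumption based on the items above.

```python
from typing import Iterable, List, Mapping

# Assumption: these are the variables worth checking first.
REQUIRED_VARIABLES = ("DB_URL", "HF_TOKEN")


def missing_variables(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_VARIABLES
) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_variables(os.environ)` inside the container shows whether the `.env` values actually reached the process.
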

### Logs and Debugging

- Check the logs for detailed error messages.
- Use the `LOG_LEVEL` environment variable to set the appropriate logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`).
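
For reference, a `LOG_LEVEL` string maps onto Python's standard `logging` levels roughly as follows. This sketch is not the application's logging setup; `configure_logging`, the log format, and the fallback to `INFO` for unknown values are assumptions.

```python
import logging
from typing import Optional

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


def configure_logging(level_name: Optional[str], fallback: str = "INFO") -> int:
    """Configure the root logger from a level name and return the numeric level."""
    name = (level_name or fallback).upper()
    if name not in VALID_LEVELS:
        # Assumption: unknown values fall back instead of raising.
        name = fallback
    level = getattr(logging, name)
    # force=True replaces any handlers configured earlier (Python 3.8+).
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,
    )
    return level
```

Setting `LOG_LEVEL=DEBUG` while reproducing a problem, then passing `os.environ.get("LOG_LEVEL")` to a function like this, yields the most detailed output.
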

### Support

For further assistance, please open an issue on the [GitHub repository](https://github.com/pavelzbornik/whisperX-FastAPI/issues).

## Related

- [ahmetoner/whisper-asr-webservice](https://github.com/ahmetoner/whisper-asr-webservice)