Commit f043430
docs: update README.md
Fedir Zadniprovskyi committed May 26, 2024
1 parent ce5dbe5

Showing 3 changed files with 64 additions and 38 deletions.

README.md (64 additions, 36 deletions)

# Faster Whisper Server
`faster-whisper-server` is an OpenAI API compatible transcription server which uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as its backend.
Features:
- GPU and CPU support.
- Easily deployable using Docker.
- Configurable through environment variables (see [config.py](./faster_whisper_server/config.py)).
- OpenAI API compatible.

Please create an issue if you find a bug, have a question, or have a feature suggestion.

## OpenAI API Compatibility ++
See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.
- Audio file transcription via `POST /v1/audio/transcriptions` endpoint.
- Unlike OpenAI's API, `faster-whisper-server` also supports streaming transcriptions (and translations). This is useful when you want to process large audio files and would rather receive the transcription in chunks as they are processed, rather than waiting for the whole file to be transcribed. It works similarly to how chat messages are streamed when chatting with LLMs (a minimal consumption sketch follows this list).
- Audio file translation via `POST /v1/audio/translations` endpoint.
- (WIP) Live audio transcription via `WS /v1/audio/transcriptions` endpoint.
- LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
- Only transcription of single-channel, 16000 sample rate, raw, 16-bit little-endian audio is supported.
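
As a minimal sketch of consuming the streaming endpoint from Python (hypothetical client code; it assumes the server accepts the same `streaming` form field shown in the curl examples below and writes incremental JSON objects to the response body):

```python
# Hypothetical streaming client sketch; the exact chunk framing may differ.
import requests

with open("audio.wav", "rb") as f:
    response = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        files={"file": f},
        data={"streaming": "true", "model": "distil-medium.en", "language": "en"},
        stream=True,
    )
    response.raise_for_status()
    # Print each transcription chunk as soon as it arrives.
    for chunk in response.iter_content(chunk_size=None):
        print(chunk.decode(), end="", flush=True)
```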

## Quick Start
Using Docker
```bash
docker run --gpus=all --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:cuda
# or
docker run --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:cpu
```
Using Docker Compose
```bash
curl -sO https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
docker compose up --detach faster-whisper-server-cuda
# or
docker compose up --detach faster-whisper-server-cpu
```
## Usage
### OpenAI API CLI
```bash
export OPENAI_API_KEY="cant-be-empty"
export OPENAI_BASE_URL=http://localhost:8000/v1/
```
```bash
openai api audio.transcriptions.create -m distil-medium.en -f audio.wav --response-format text

openai api audio.translations.create -m distil-medium.en -f audio.wav --response-format verbose_json
```
### OpenAI API Python SDK
```python
from openai import OpenAI

client = OpenAI(api_key="cant-be-empty", base_url="http://localhost:8000/v1/")

audio_file = open("audio.wav", "rb")
transcript = client.audio.transcriptions.create(
model="distil-medium.en", file=audio_file
)
print(transcript.text)
```
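
The SDK can also pass a language hint and call the translation endpoint. A short sketch using standard OpenAI SDK calls, assuming the server honors the same parameters as the curl examples below:

```python
from openai import OpenAI

client = OpenAI(api_key="cant-be-empty", base_url="http://localhost:8000/v1/")

# Transcription with an explicit language hint (skips language detection).
with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="distil-medium.en", file=audio_file, language="en"
    )
    print(transcript.text)

# Translation (non-English speech to English text).
with open("audio.wav", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="distil-medium.en", file=audio_file
    )
    print(translation.text)
```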

### CURL
```bash
# If `model` isn't specified, the default model is used
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.mp3"
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "streaming=true"
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "streaming=true" -F "model=distil-large-v3"
# It's recommended to always specify the language, as that reduces transcription time
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "streaming=true" -F "model=distil-large-v3" -F "language=en"

curl http://localhost:8000/v1/audio/translations -F "file=@audio.wav"
```
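
The same multipart requests can be made from Python without the OpenAI SDK; a sketch using `requests`, mirroring the curl examples above:

```python
import requests

# Plain multipart upload; field names match the curl examples above.
with open("audio.wav", "rb") as f:
    response = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        files={"file": f},
        data={"model": "distil-large-v3", "language": "en"},
    )
print(response.json())
```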

### Live Transcription
[websocat](https://github.com/vi/websocat?tab=readme-ov-file#installation) installation is required.
Transcribing live audio data from a microphone.
```bash
ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/transcriptions
```
Streaming audio data from a file.
```bash
ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - > audio.raw
# send all data at once
cat audio.raw | websocat --no-close --binary ws://localhost:8000/v1/audio/transcriptions
# Output: {"text":"One,"}{"text":"One, two, three, four, five."}{"text":"One, two, three, four, five."}
# stream at 16000 samples per second; each sample is 2 bytes, hence 32000 bytes per second
cat audio.raw | pv -qL 32000 | websocat --no-close --binary ws://localhost:8000/v1/audio/transcriptions
# Output: {"text":"One,"}{"text":"One, two,"}{"text":"One, two, three,"}{"text":"One, two, three, four, five."}{"text":"One, two, three, four, five. one."}
```
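
If you'd rather not depend on websocat, here is a minimal Python client sketch for the WebSocket endpoint. It assumes the third-party `websockets` package, and the audio must already be single-channel, 16000 sample rate, raw, 16-bit little-endian PCM, as noted above:

```python
# Minimal live-transcription client sketch (pip install websockets).
import asyncio

import websockets

BYTES_PER_SECOND = 16000 * 2  # 16000 samples/s, 2 bytes per sample


async def main() -> None:
    async with websockets.connect(
        "ws://localhost:8000/v1/audio/transcriptions"
    ) as ws:
        with open("audio.raw", "rb") as f:
            # Send 0.5 s of audio at a time, paced roughly at real time.
            while chunk := f.read(BYTES_PER_SECOND // 2):
                await ws.send(chunk)
                await asyncio.sleep(0.5)
        # Drain whatever transcription messages the server has sent.
        try:
            while True:
                print(await asyncio.wait_for(ws.recv(), timeout=5))
        except asyncio.TimeoutError:
            pass


asyncio.run(main())
```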
Transcribing a file
```bash
# convert the file if it has a different format
ffmpeg -i output.wav -ac 1 -ar 16000 -f s16le output.raw
curl -X POST -F "file=@output.raw" http://localhost:8000/v1/audio/transcriptions
# Output: "{\"text\":\"One, two, three, four, five.\"}"
```
## Roadmap
- [ ] Support file transcription (non-streaming) of multiple formats.
- [ ] CLI client.
- [ ] Separate the web server related code from the "core", and publish "core" as a package.
- [ ] Additional documentation and code comments.
- [ ] Write benchmarks for measuring streaming transcription performance. Possible metrics:
- Latency (time between when the audio is received and when its transcription is sent)
- Accuracy (already being measured when testing but the process can be improved)
- Total seconds of audio transcribed / audio duration (since each audio chunk is being processed at least twice)
- [ ] Get the API response closer to the format used by OpenAI.
- [ ] Integrations...
Binary file added audio.wav
2 changes: 0 additions & 2 deletions faster_whisper_server/main.py
```diff
@@ -237,7 +237,6 @@ async def transcribe_stream(
     ws: WebSocket,
     model: Annotated[Model, Query()] = config.whisper.model,
     language: Annotated[Language | None, Query()] = config.default_language,
-    prompt: Annotated[str | None, Query()] = None,
     response_format: Annotated[
         ResponseFormat, Query()
     ] = config.default_response_format,
@@ -246,7 +245,6 @@
     await ws.accept()
     transcribe_opts = {
         "language": language,
-        "initial_prompt": prompt,
         "temperature": temperature,
         "vad_filter": True,
         "condition_on_previous_text": False,
```
