A service for performing forced alignment between audio files and transcripts, providing word-level timestamps. The service can run in two modes: HTTP API (FastAPI) or RabbitMQ Consumer.
- Forced Alignment: Aligns audio files (MP3) with transcripts to generate word-level timestamps
- Dual Mode Operation: Run as an HTTP API server or as a RabbitMQ message consumer
- Language Support: Configurable language support (default: Arabic)
- Optional Authentication: Secure HTTP endpoints with secret key authentication
- Docker Support: Ready-to-use Docker container
- Python 3.10+
- FFmpeg
- Git
- Perl
- Build tools (for compiling dependencies)
- Create and activate a virtual environment:
python3 -m venv .env
source .env/bin/activate
- Install dependencies:
pip install -r requirements.txt
The service supports two operational modes, controlled by the MODE environment variable:
Runs a FastAPI server that exposes a REST API endpoint for forced alignment.
Start the HTTP server:
export MODE=http
python main.py
The server will start on the configured host and port (default: 0.0.0.0:5000).
Consumes messages from a RabbitMQ queue, processes them, and publishes results to another queue.
Start the RabbitMQ consumer:
export MODE=rabbitmq
python main.py
The consumer will listen for messages on the configured queue and process them automatically.
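The mode selection described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual main.py: the names resolve_mode and run and the handler mapping are hypothetical, and normalizing case and whitespace is an assumption. The default of rabbitmq comes from the configuration table.

```python
import os

VALID_MODES = ("http", "rabbitmq")


def resolve_mode(environ=None):
    """Return the operation mode named by MODE, defaulting to "rabbitmq"."""
    env = os.environ if environ is None else environ
    # Assumption: the value is normalized before comparison.
    mode = env.get("MODE", "rabbitmq").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"Unsupported MODE {mode!r}; expected one of {VALID_MODES}")
    return mode


def run(mode, handlers):
    """Call the start function registered for the selected mode."""
    return handlers[mode]()


if __name__ == "__main__":
    # Hypothetical handlers standing in for the HTTP server and the consumer.
    handlers = {
        "http": lambda: print("would start the FastAPI server"),
        "rabbitmq": lambda: print("would start the RabbitMQ consumer"),
    }
    run(resolve_mode(), handlers)
```

Rejecting unknown values early makes a typo in MODE fail loudly instead of silently falling back to one of the two modes.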
| Variable | Description | Default | Required |
|---|---|---|---|
| MODE | Operation mode: http or rabbitmq | rabbitmq | No |
| Variable | Description | Default | Required |
|---|---|---|---|
| HTTP_HOST | Host address to bind the HTTP server | 0.0.0.0 | No |
| HTTP_PORT | Port number for the HTTP server | 5000 | No |
| ALIGN_SECRET_KEY | Secret key for API authentication (if set, authentication is required) | None | No |
| RELOAD | Enable auto-reload for development (true/false) | false | No |
| Variable | Description | Default | Required |
|---|---|---|---|
| RABBITMQ_URL | RabbitMQ connection URL | amqp://guest:guest@localhost:5672// | No |
| CONSUME_QUEUE_NAME | Name of the queue to consume messages from | forced_alignment | No |
| CONSUME_ROUTING_KEY | Routing key for the consume queue | forcedalignment.processing | No |
| Variable | Description | Default | Required |
|---|---|---|---|
| RESULT_QUEUE_NAME | Name of the queue to publish results to | forced_alignment_result | No |
| RESULT_QUEUE_EXCHANGE | Exchange name for the result queue | forced_alignment_result | No |
| RESULT_QUEUE_ROUTING_KEY | Routing key for the result queue | forcedalignment_result.processing | No |
| RESULT_CELERY_TASK_NAME | Celery task name for result messages | forced-alignment-result | No |
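As an illustration of how the variables in the tables above could be read with their documented defaults, here is a small sketch. The Settings dataclass and load_settings function are hypothetical names, not part of the service; only the variable names and default values are taken from the tables.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    mode: str
    http_host: str
    http_port: int
    align_secret_key: Optional[str]
    reload: bool
    rabbitmq_url: str
    consume_queue_name: str
    consume_routing_key: str
    result_queue_name: str
    result_queue_exchange: str
    result_queue_routing_key: str
    result_celery_task_name: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read every documented variable, falling back to the documented default."""
    env = os.environ if env is None else env
    return Settings(
        mode=env.get("MODE", "rabbitmq"),
        http_host=env.get("HTTP_HOST", "0.0.0.0"),
        http_port=int(env.get("HTTP_PORT", "5000")),
        # An unset or empty key means authentication is disabled.
        align_secret_key=env.get("ALIGN_SECRET_KEY") or None,
        reload=env.get("RELOAD", "false").lower() == "true",
        rabbitmq_url=env.get("RABBITMQ_URL", "amqp://guest:guest@localhost:5672//"),
        consume_queue_name=env.get("CONSUME_QUEUE_NAME", "forced_alignment"),
        consume_routing_key=env.get("CONSUME_ROUTING_KEY", "forcedalignment.processing"),
        result_queue_name=env.get("RESULT_QUEUE_NAME", "forced_alignment_result"),
        result_queue_exchange=env.get("RESULT_QUEUE_EXCHANGE", "forced_alignment_result"),
        result_queue_routing_key=env.get(
            "RESULT_QUEUE_ROUTING_KEY", "forcedalignment_result.processing"
        ),
        result_celery_task_name=env.get("RESULT_CELERY_TASK_NAME", "forced-alignment-result"),
    )
```

Passing a plain dictionary instead of os.environ makes the loader easy to exercise in tests.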
export MODE=http
export HTTP_HOST=0.0.0.0
export HTTP_PORT=5000
export ALIGN_SECRET_KEY=your_secret_key_here # Optional
python main.py
POST /align
Performs forced alignment between an audio file and transcript.
Request Body (JSON):
{
"mp3_url": "https://example.com/audio.mp3",
"text": "your transcript text here",
"language": "ar", // Optional, defaults to "ar"
"romanize": true, // Optional, defaults to true
"batch_size": 4 // Optional, defaults to 4
}
Response:
[
{
"word": "example",
"start": 0.23,
"end": 0.56,
"score": 0.98
},
...
]
- word: The word from the transcript
- start: Start time in seconds
- end: End time in seconds
- score: Alignment confidence score (0-1)
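For callers working from Python, the request and response shapes above might be handled along these lines. This is a sketch using only the standard library: build_align_request, parse_words, and Word are hypothetical helper names, and passing the secret key as the raw Authorization header value follows the authenticated request example in this README.

```python
import json
import urllib.request
from typing import List, NamedTuple, Optional


class Word(NamedTuple):
    word: str
    start: float  # seconds
    end: float    # seconds
    score: float  # alignment confidence, 0-1


def build_align_request(base_url: str, mp3_url: str, text: str,
                        secret_key: Optional[str] = None,
                        **options) -> urllib.request.Request:
    """Build (but do not send) a POST /align request."""
    # options may carry the optional fields: language, romanize, batch_size.
    payload = {"mp3_url": mp3_url, "text": text, **options}
    headers = {"Content-Type": "application/json"}
    if secret_key:
        headers["Authorization"] = secret_key
    return urllib.request.Request(
        base_url.rstrip("/") + "/align",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def parse_words(body: str) -> List[Word]:
    """Turn the JSON response body into typed records."""
    return [
        Word(item["word"], float(item["start"]), float(item["end"]), float(item["score"]))
        for item in json.loads(body)
    ]


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as response:
#       words = parse_words(response.read().decode("utf-8"))
```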
Example Request (without authentication):
curl -X POST "http://localhost:5000/align" \
-H "Content-Type: application/json" \
-d '{
"mp3_url": "https://example.com/audio.mp3",
"text": "your transcript here"
}'
Example Request (with authentication):
curl -X POST "http://localhost:5000/align" \
-H "Content-Type: application/json" \
-H "Authorization: your_secret_key" \
-d '{
"mp3_url": "https://example.com/audio.mp3",
"text": "your transcript here",
"language": "ar",
"romanize": true,
"batch_size": 4
}'
export MODE=rabbitmq
export RABBITMQ_URL=amqp://user:password@rabbitmq-host:5672//
export CONSUME_QUEUE_NAME=forced_alignment
export CONSUME_ROUTING_KEY=forcedalignment.processing
export RESULT_QUEUE_NAME=forced_alignment_result
export RESULT_QUEUE_EXCHANGE=forced_alignment_result
export RESULT_QUEUE_ROUTING_KEY=forcedalignment_result.processing
export RESULT_CELERY_TASK_NAME=forced-alignment-result
python main.py
The consumer expects messages in the following format:
[
[
"https://example.com/audio.mp3", // audio_url
"your transcript text here", // text
{} // additional (optional metadata)
]
]
The consumer will:
- Download and process the audio file
- Perform forced alignment with the transcript
- Publish results to the configured result queue in Celery-compatible format
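The input message format shown above can be produced and unpacked with a few lines of Python. The helper names encode_job and decode_job are hypothetical, and treating a missing third element as an empty metadata object is an assumption based on it being described as optional.

```python
import json


def encode_job(audio_url, text, additional=None):
    """Serialize one job as [[audio_url, text, additional]]."""
    # ensure_ascii=False keeps non-Latin transcripts (e.g. Arabic) readable.
    return json.dumps([[audio_url, text, additional or {}]], ensure_ascii=False)


def decode_job(body):
    """Unpack a job message into (audio_url, text, additional)."""
    payload = json.loads(body)
    args = payload[0]
    audio_url, text = args[0], args[1]
    # Assumption: the metadata element may be absent.
    additional = args[2] if len(args) > 2 else {}
    return audio_url, text, additional
```

Whatever is placed in additional is passed back alongside the timestamps in the result arguments, so it is a natural place for a correlation identifier.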
Result Message Format: The result is published as a Celery task message with:
- Task ID: UUID v4
- Task name: Value of RESULT_CELERY_TASK_NAME
- Arguments: [words_timestamps, additional]
- Headers: Standard Celery headers
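To make the result format more concrete, the sketch below assembles a message in the shape of Celery's message protocol version 2, where the body is [args, kwargs, embed] and the task name and ID travel in the headers. Which headers and properties the service actually sets is not stated here, so the exact set below is an assumption, and build_result_message is a hypothetical name.

```python
import json
import uuid


def build_result_message(words_timestamps, additional,
                         task_name="forced-alignment-result"):
    """Return (headers, properties, body) for a Celery-style task message."""
    task_id = str(uuid.uuid4())  # UUID v4, as described above
    # Assumed subset of the protocol v2 headers.
    headers = {
        "lang": "py",
        "task": task_name,
        "id": task_id,
        "root_id": task_id,
        "parent_id": None,
        "group": None,
    }
    # Protocol v2 body: positional args, keyword args, embedded options.
    embed = {"callbacks": None, "errbacks": None, "chain": None, "chord": None}
    body = json.dumps([[words_timestamps, additional], {}, embed], ensure_ascii=False)
    properties = {
        "correlation_id": task_id,
        "content_type": "application/json",
        "content_encoding": "utf-8",
    }
    return headers, properties, body
```

A Celery worker that registers a task under the same name would then receive words_timestamps and additional as its two positional arguments.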
docker build -t forced-alignment .
docker run -p 5000:5000 \
-e MODE=http \
-e HTTP_HOST=0.0.0.0 \
-e HTTP_PORT=5000 \
-e ALIGN_SECRET_KEY=your_secret_key \
forced-alignment
docker run \
-e MODE=rabbitmq \
-e RABBITMQ_URL=amqp://user:password@rabbitmq-host:5672// \
-e CONSUME_QUEUE_NAME=forced_alignment \
-e CONSUME_ROUTING_KEY=forcedalignment.processing \
-e RESULT_QUEUE_NAME=forced_alignment_result \
-e RESULT_QUEUE_EXCHANGE=forced_alignment_result \
-e RESULT_QUEUE_ROUTING_KEY=forcedalignment_result.processing \
-e RESULT_CELERY_TASK_NAME=forced-alignment-result \
forced-alignment
If a pre-built image is available:
docker run -p 5000:5000 \
-e MODE=http \
-e HTTP_PORT=5000 \
natiqquran/forced-alignment:latest
For development, you can enable auto-reload:
export MODE=http
export RELOAD=true
python main.py
Or with Docker:
docker run -p 5000:5000 \
-e MODE=http \
-e RELOAD=true \
forced-alignment
forced-alignment/
├── core/
│ └── align.py # Core alignment logic
├── modes/
│ ├── http.py # HTTP API mode implementation
│ └── rabbitmq.py # RabbitMQ consumer mode implementation
├── main.py # Entry point and mode selection
├── requirements.txt # Python dependencies
├── Dockerfile # Docker configuration
└── README.md # This file
- The service automatically uses CUDA if available, otherwise falls back to CPU
- Audio files are temporarily downloaded and converted to WAV format for processing
- The RabbitMQ consumer processes one message at a time (prefetch_count=1)
- Failed messages are rejected without requeueing
- Authentication is optional for HTTP mode; if ALIGN_SECRET_KEY is not set, the endpoint is publicly accessible
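The optional-authentication rule in the last note could be expressed as a small check like the one below. This is a sketch rather than the service's code: is_authorized is a hypothetical name, and the constant-time comparison is a design choice made here, not something this README confirms.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str],
                  secret_key: Optional[str]) -> bool:
    """Decide whether a request may proceed under the optional-key rule."""
    if not secret_key:
        # No ALIGN_SECRET_KEY configured: the endpoint is publicly accessible.
        return True
    if authorization_header is None:
        return False
    # Constant-time comparison avoids leaking how much of the key matched.
    return hmac.compare_digest(
        authorization_header.encode("utf-8"), secret_key.encode("utf-8")
    )
```

Because an unset key leaves the endpoint open, deployments reachable from outside a trusted network would normally set ALIGN_SECRET_KEY.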
See LICENSE file for details.