Forced Alignment Service

A service for performing forced alignment between audio files and transcripts, providing word-level timestamps. The service can run in two modes: HTTP API (FastAPI) or RabbitMQ Consumer.

Features

Forced Alignment: Aligns audio files (MP3) with transcripts to generate word-level timestamps
Dual Mode Operation: Run as an HTTP API server or as a RabbitMQ message consumer
Language Support: Configurable alignment language (default: Arabic, `ar`)
Optional Authentication: Secure HTTP endpoints with secret key authentication
Docker Support: Ready-to-use Docker container

Setup

Prerequisites

Python 3.10+
FFmpeg
Git
Perl
Build tools (for compiling dependencies)

Local Setup

Create and activate a virtual environment:

python3 -m venv .env
source .env/bin/activate

Install dependencies:
```
pip install -r requirements.txt
```

Modes

The service supports two operational modes, selected by the MODE environment variable (default: `rabbitmq`):

1. HTTP Mode

Runs a FastAPI server that exposes a REST API endpoint for forced alignment.

Start the HTTP server:

export MODE=http
python main.py

The server will start on the configured host and port (default: 0.0.0.0:5000).

2. RabbitMQ Mode

Consumes messages from a RabbitMQ queue, processes them, and publishes results to another queue.

Start the RabbitMQ consumer:

export MODE=rabbitmq
python main.py

The consumer will listen for messages on the configured queue and process them automatically.

Environment Variables

Mode Selection

| Variable | Description | Default | Required |
| --- | --- | --- | --- |
| `MODE` | Operation mode: `http` or `rabbitmq` | `rabbitmq` | No |
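The mode selection can be pictured as a small dispatch on the environment variable. The following is a minimal sketch, not the project's actual `main.py`: the helper name `resolve_mode` is hypothetical, and the case-insensitive matching is an assumption.

```python
import os

VALID_MODES = ("http", "rabbitmq")
DEFAULT_MODE = "rabbitmq"  # default documented for MODE


def resolve_mode(environ=None):
    """Return the operation mode named by MODE, or the default if unset.

    Hypothetical helper for illustration; the real entry point may differ.
    """
    env = os.environ if environ is None else environ
    # Assumption: values are matched case-insensitively and trimmed.
    mode = env.get("MODE", DEFAULT_MODE).strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"MODE must be one of {VALID_MODES}, got {mode!r}")
    return mode
```

Failing fast on an unknown value avoids silently starting the wrong mode when the variable is misspelled.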

HTTP Mode Variables

| Variable | Description | Default | Required |
| --- | --- | --- | --- |
| `HTTP_HOST` | Host address to bind the HTTP server | `0.0.0.0` | No |
| `HTTP_PORT` | Port number for the HTTP server | `5000` | No |
| `ALIGN_SECRET_KEY` | Secret key for API authentication (if set, authentication is required) | `None` | No |
| `RELOAD` | Enable auto-reload for development (`true`/`false`) | `false` | No |
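To make the defaults concrete, here is a minimal sketch of reading these variables into a typed settings object. `HttpSettings` and `load_http_settings` are hypothetical names, and treating an empty `ALIGN_SECRET_KEY` as unset is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HttpSettings:
    host: str
    port: int
    secret_key: Optional[str]
    reload: bool


def load_http_settings(environ=None):
    """Read the HTTP mode variables, applying the documented defaults."""
    env = os.environ if environ is None else environ
    return HttpSettings(
        host=env.get("HTTP_HOST", "0.0.0.0"),
        port=int(env.get("HTTP_PORT", "5000")),
        # Assumption: an empty string is treated the same as "not set".
        secret_key=env.get("ALIGN_SECRET_KEY") or None,
        reload=env.get("RELOAD", "false").strip().lower() == "true",
    )
```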

RabbitMQ Mode Variables

Consumer Configuration

| Variable | Description | Default | Required |
| --- | --- | --- | --- |
| `RABBITMQ_URL` | RabbitMQ connection URL | `amqp://guest:guest@localhost:5672//` | No |
| `CONSUME_QUEUE_NAME` | Name of the queue to consume messages from | `forced_alignment` | No |
| `CONSUME_ROUTING_KEY` | Routing key for the consume queue | `forcedalignment.processing` | No |
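The double slash at the end of the default `RABBITMQ_URL` is easy to misread. In the convention used by Kombu-style AMQP URLs, the first slash separates the host from the virtual host, so `//` selects the vhost named `/`. The sketch below illustrates that reading with the standard library only. This README does not say which client library parses the URL, and `describe_amqp_url` is a hypothetical helper.

```python
from urllib.parse import unquote, urlsplit


def describe_amqp_url(url):
    """Break an AMQP URL into its connection parts (illustrative only)."""
    parts = urlsplit(url)
    # Drop the single leading "/" that separates host from vhost; an empty
    # remainder is treated here as the broker's default vhost "/".
    vhost = unquote(parts.path[1:]) or "/"
    return {
        "host": parts.hostname,
        "port": parts.port or 5672,  # 5672 is the standard AMQP port
        "username": parts.username,
        "vhost": vhost,
    }
```

For example, `describe_amqp_url("amqp://guest:guest@localhost:5672//")` reports the vhost as `/`, while a URL ending in `/media` reports `media`.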

Result Queue Configuration

| Variable | Description | Default | Required |
| --- | --- | --- | --- |
| `RESULT_QUEUE_NAME` | Name of the queue to publish results to | `forced_alignment_result` | No |
| `RESULT_QUEUE_EXCHANGE` | Exchange name for the result queue | `forced_alignment_result` | No |
| `RESULT_QUEUE_ROUTING_KEY` | Routing key for the result queue | `forcedalignment_result.processing` | No |
| `RESULT_CELERY_TASK_NAME` | Celery task name for result messages | `forced-alignment-result` | No |

Usage

HTTP Mode

Starting the Server

export MODE=http
export HTTP_HOST=0.0.0.0
export HTTP_PORT=5000
export ALIGN_SECRET_KEY=your_secret_key_here  # Optional
python main.py

API Endpoint

POST /align

Performs forced alignment between an audio file and transcript.

Request Body (JSON):

{
  "mp3_url": "https://example.com/audio.mp3",
  "text": "your transcript text here",
  "language": "ar", // Optional, defaults to "ar"
  "romanize": true, // Optional, defaults to true
  "batch_size": 4 // Optional, defaults to 4
}

Response:

[
  {
    "word": "example",
    "start": 0.23,
    "end": 0.56,
    "score": 0.98
  },
  ...
]

word: The word from the transcript
start: Start time in seconds
end: End time in seconds
score: Alignment confidence score (0-1)
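Once the response is decoded from JSON, each entry is a plain dictionary with the four fields above. As a rough sketch of how a client might use them, the helpers below flag words whose score falls under a threshold and render times for display. Both helper names and the 0.5 threshold are illustrative assumptions, and the second sample entry is invented for the example.

```python
def low_confidence_words(words, threshold=0.5):
    """Return the entries whose alignment score is below the threshold."""
    return [w for w in words if w["score"] < threshold]


def format_timestamp(seconds):
    """Render a time in seconds as MM:SS.mmm."""
    millis = round(seconds * 1000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


# Sample data in the documented response shape (second entry is made up).
sample = [
    {"word": "example", "start": 0.23, "end": 0.56, "score": 0.98},
    {"word": "second", "start": 0.60, "end": 1.10, "score": 0.41},
]

for entry in low_confidence_words(sample):
    print(entry["word"], format_timestamp(entry["start"]), format_timestamp(entry["end"]))
```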

Example Request (without authentication):

curl -X POST "http://localhost:5000/align" \
  -H "Content-Type: application/json" \
  -d '{
    "mp3_url": "https://example.com/audio.mp3",
    "text": "your transcript here"
  }'

Example Request (with authentication):

curl -X POST "http://localhost:5000/align" \
  -H "Content-Type: application/json" \
  -H "Authorization: your_secret_key" \
  -d '{
    "mp3_url": "https://example.com/audio.mp3",
    "text": "your transcript here",
    "language": "ar",
    "romanize": true,
    "batch_size": 4
  }'
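The same requests can be issued from Python. The sketch below uses only the standard library and mirrors the curl examples: the secret key is sent as the raw value of the `Authorization` header, and optional fields are passed through unchanged. The function names and the 300-second timeout are assumptions made for illustration.

```python
import json
import urllib.request


def build_align_request(base_url, mp3_url, text, secret_key=None, **options):
    """Build a POST /align request; options may hold language, romanize, batch_size."""
    payload = {"mp3_url": mp3_url, "text": text, **options}
    headers = {"Content-Type": "application/json"}
    if secret_key:
        # Matches the curl example: the key is sent as-is, with no scheme prefix.
        headers["Authorization"] = secret_key
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/align",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def align(request, timeout=300):
    """Send the request and return the decoded list of word timestamps."""
    # Assumption: alignment of long audio can be slow, hence the generous timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call would look like `align(build_align_request("http://localhost:5000", "https://example.com/audio.mp3", "your transcript here", secret_key="your_secret_key"))`.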

RabbitMQ Mode

Starting the Consumer

export MODE=rabbitmq
export RABBITMQ_URL=amqp://user:password@rabbitmq-host:5672//
export CONSUME_QUEUE_NAME=forced_alignment
export CONSUME_ROUTING_KEY=forcedalignment.processing
export RESULT_QUEUE_NAME=forced_alignment_result
export RESULT_QUEUE_EXCHANGE=forced_alignment_result
export RESULT_QUEUE_ROUTING_KEY=forcedalignment_result.processing
export RESULT_CELERY_TASK_NAME=forced-alignment-result
python main.py

Message Format

The consumer expects messages in the following format:

[
  [
    "https://example.com/audio.mp3", // audio_url
    "your transcript text here", // text
    {} // additional (optional metadata)
  ]
]

The consumer will:

Download and process the audio file
Perform forced alignment with the transcript
Publish results to the configured result queue in Celery-compatible format
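The first step of that pipeline, unpacking the incoming message, can be sketched as follows. This assumes the body is JSON-encoded and that the first element of the outer list holds the positional arguments. The `//` annotations in the format example are explanatory only and would not be valid in a real JSON payload. `parse_alignment_message` is a hypothetical helper, not the consumer's actual code.

```python
import json


def parse_alignment_message(body):
    """Extract (audio_url, text, additional) from a message body."""
    payload = json.loads(body)
    if not (isinstance(payload, list) and payload and isinstance(payload[0], list)):
        raise ValueError("expected a body shaped like [[audio_url, text, additional]]")
    args = payload[0]
    if len(args) < 2:
        raise ValueError("message must carry at least audio_url and text")
    audio_url, text = args[0], args[1]
    # The metadata slot is optional; fall back to an empty dict when absent.
    additional = args[2] if len(args) > 2 and args[2] is not None else {}
    return audio_url, text, additional
```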

Result Message Format: The result is published as a Celery task message with:

Task ID: UUID v4
Task name: Value of RESULT_CELERY_TASK_NAME
Arguments: [words_timestamps, additional]
Headers: Standard Celery headers
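For orientation, the sketch below assembles a message along these lines, assuming Celery's message protocol version 2, in which the task name and ID travel in the headers and the body is a JSON list of `[args, kwargs, embed]`. The exact headers the service sends are not listed in this README, so the set shown is a subset chosen for illustration, and `build_result_message` is a hypothetical helper.

```python
import json
import uuid


def build_result_message(task_name, words_timestamps, additional):
    """Return (headers, body) for a Celery-style task message (protocol v2 assumed)."""
    task_id = str(uuid.uuid4())  # UUID v4, as described above
    headers = {
        "lang": "py",
        "task": task_name,
        "id": task_id,
        "root_id": task_id,
        "parent_id": None,
        "group": None,
    }
    args = [words_timestamps, additional]
    kwargs = {}
    embed = {"callbacks": None, "errbacks": None, "chain": None, "chord": None}
    body = json.dumps([args, kwargs, embed])
    return headers, body
```

A Celery worker that registers a task under the configured name would then receive `words_timestamps` and `additional` as its two positional arguments.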

Running with Docker

Build the Image

docker build -t forced-alignment .

Run HTTP Mode

docker run -p 5000:5000 \
  -e MODE=http \
  -e HTTP_HOST=0.0.0.0 \
  -e HTTP_PORT=5000 \
  -e ALIGN_SECRET_KEY=your_secret_key \
  forced-alignment

Run RabbitMQ Mode

docker run \
  -e MODE=rabbitmq \
  -e RABBITMQ_URL=amqp://user:password@rabbitmq-host:5672// \
  -e CONSUME_QUEUE_NAME=forced_alignment \
  -e CONSUME_ROUTING_KEY=forcedalignment.processing \
  -e RESULT_QUEUE_NAME=forced_alignment_result \
  -e RESULT_QUEUE_EXCHANGE=forced_alignment_result \
  -e RESULT_QUEUE_ROUTING_KEY=forcedalignment_result.processing \
  -e RESULT_CELERY_TASK_NAME=forced-alignment-result \
  forced-alignment

Using Pre-built Image

If a pre-built image is available:

docker run -p 5000:5000 \
  -e MODE=http \
  -e HTTP_PORT=5000 \
  natiqquran/forced-alignment:latest

Development

Enable Auto-reload (HTTP Mode)

For development, you can enable auto-reload:

export MODE=http
export RELOAD=true
python main.py

Or with Docker:

docker run -p 5000:5000 \
  -e MODE=http \
  -e RELOAD=true \
  forced-alignment

Project Structure

forced-alignment/
├── core/
│   └── align.py          # Core alignment logic
├── modes/
│   ├── http.py           # HTTP API mode implementation
│   └── rabbitmq.py       # RabbitMQ consumer mode implementation
├── main.py               # Entry point and mode selection
├── requirements.txt      # Python dependencies
├── Dockerfile            # Docker configuration
└── README.md             # This file

Notes

The service automatically uses CUDA if available, otherwise falls back to CPU
Audio files are temporarily downloaded and converted to WAV format for processing
The RabbitMQ consumer processes one message at a time (prefetch_count=1)
Failed messages are rejected without requeueing
Authentication is optional for HTTP mode; if ALIGN_SECRET_KEY is not set, the endpoint is publicly accessible
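The CUDA fallback mentioned in the first note typically comes down to a single check. The sketch below assumes PyTorch is the underlying framework, which this README does not state. `pick_device` is a hypothetical helper that also degrades to CPU when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" when a CUDA device is usable, otherwise "cpu"."""
    try:
        import torch  # assumption: the alignment model runs on PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Running alignment on: {pick_device()}")
```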

License

See LICENSE file for details.
