
Commit f1afd8b

Add network aliases for speaker service in Docker Compose configuration

1 parent: 2f89970

4 files changed: +41 −33 lines changed

Docs/audio-pipeline-architecture.md

Lines changed: 28 additions & 31 deletions

```diff
@@ -497,9 +497,8 @@ Session Starts
 └─────────────┬───────────────────┘
               ↓ (when conversation ends)
 ┌─────────────────────────────────┐
-│ Post-Conversation Pipeline      │ ← Parallel batch jobs
+│ Post-Conversation Pipeline      │
 ├─────────────────────────────────┤
-│ • transcribe_full_audio_job     │
 │ • recognize_speakers_job        │
 │ • memory_extraction_job         │
 │ • generate_title_summary_job    │
@@ -597,32 +596,16 @@ Session Starts
 
 ### Post-Conversation Pipeline
 
-All jobs run **in parallel** after conversation completes:
+**Streaming conversations**: Use streaming transcript saved during conversation. No batch re-transcription.
 
-#### 1. Transcribe Full Audio Job
+**File uploads**: Batch transcription job runs first, then post-conversation jobs depend on it.
 
-**File**: `backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py`
-
-**Function**: `transcribe_full_audio_job()`
-
-**Input**: Audio file from disk (`data/chunks/*.wav`)
-
-**Process**:
-- Batch transcribes entire conversation audio
-- Validates meaningful speech
-- Marks conversation `deleted` if no speech detected
-- Stores transcript, segments, words in MongoDB
-
-**Container**: `rq-worker`
-
-#### 2. Recognize Speakers Job
+#### 1. Recognize Speakers Job
 
 **File**: `backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py`
 
 **Function**: `recognize_speakers_job()`
 
-**Prerequisite**: `transcribe_full_audio_job` completes
-
 **Process**:
 - Sends audio + segments to speaker recognition service
 - Identifies speakers using voice embeddings
@@ -634,13 +617,13 @@ All jobs run **in parallel** after conversation completes:
 
 **External Service**: `speaker-recognition` container (if enabled)
 
-#### 3. Memory Extraction Job
+#### 2. Memory Extraction Job
 
 **File**: `backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py`
 
 **Function**: `memory_extraction_job()`
 
-**Prerequisite**: `transcribe_full_audio_job` completes
+**Prerequisite**: Speaker recognition job
 
 **Process**:
 - Uses LLM (OpenAI/Ollama) to extract semantic facts
@@ -654,32 +637,46 @@ All jobs run **in parallel** after conversation completes:
 - `ollama` or OpenAI API (LLM)
 - `qdrant` or OpenMemory MCP (vector storage)
 
-#### 4. Generate Title Summary Job
+#### 3. Generate Title Summary Job
 
 **File**: `backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py`
 
 **Function**: `generate_title_summary_job()`
 
-**Prerequisite**: `transcribe_full_audio_job` completes
+**Prerequisite**: Speaker recognition job
 
 **Process**:
-- Uses LLM to generate:
-  - Title (short summary)
-  - Summary (1-2 sentences)
-  - Detailed summary (paragraph)
+- Uses LLM to generate title, summary, detailed summary
 - Updates conversation document in MongoDB
 
 **Container**: `rq-worker`
 
-#### 5. Dispatch Conversation Complete Event
+#### 4. Dispatch Conversation Complete Event
 
 **File**: `backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py`
 
 **Function**: `dispatch_conversation_complete_event_job()`
 
 **Process**:
 - Triggers `conversation.complete` plugin event
-- Only runs for **file uploads** (not streaming sessions)
+
+**Container**: `rq-worker`
+
+#### Batch Transcription Job
+
+**File**: `backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py`
+
+**Function**: `transcribe_full_audio_job()`
+
+**When used**:
+- File uploads via `/api/process-audio-files`
+- Manual reprocessing via `/api/conversations/{id}/reprocess-transcript`
+- NOT used for streaming conversations
+
+**Process**:
+- Reconstructs audio from MongoDB chunks
+- Batch transcribes entire audio
+- Stores transcript with word-level timestamps
 
 **Container**: `rq-worker`
```
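The reordered pipeline in this doc diff can be sketched as a simple dependency chain. Below is a minimal in-memory sketch, not the backend's actual RQ scheduler (the real jobs run on the `rq-worker` container, and the memory and title jobs may run in parallel once speaker recognition finishes); job names come from the document:

```python
# Simplified sketch of the job ordering described in the updated doc.
# The scheduling itself is illustrative, not the backend's real
# RQ-based implementation.

def pipeline_order(is_file_upload: bool) -> list[str]:
    """Return a valid job execution order for one finished conversation."""
    jobs = []
    if is_file_upload:
        # File uploads (and manual reprocessing) batch-transcribe first;
        # streaming sessions reuse the transcript saved live instead.
        jobs.append("transcribe_full_audio_job")
    # Speaker recognition runs first among the post-conversation jobs;
    # the memory and title/summary jobs list it as their prerequisite.
    jobs += [
        "recognize_speakers_job",
        "memory_extraction_job",
        "generate_title_summary_job",
        "dispatch_conversation_complete_event_job",
    ]
    return jobs

print(pipeline_order(is_file_upload=False)[0])  # recognize_speakers_job
```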
extras/speaker-recognition/docker-compose.yml

Lines changed: 8 additions & 0 deletions

```diff
@@ -33,6 +33,10 @@ services:
       interval: 30s
       timeout: 10s
       retries: 3
+    networks:
+      default:
+        aliases:
+          - speaker-service
 
   # GPU Profile Configuration
   speaker-service-gpu:
@@ -50,6 +54,10 @@ services:
           - driver: nvidia
             count: all
             capabilities: [gpu]
+    networks:
+      default:
+        aliases:
+          - speaker-service
 
   # React Web UI
   web-ui:
```
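Why the alias matters: both the CPU and GPU profile services now answer to the same DNS name on the Compose network, so a client never needs to know which profile is active. A hypothetical consumer (service name, image, and port below are illustrative, not from this commit):

```yaml
services:
  backend:
    image: example/backend:latest
    environment:
      # "speaker-service" resolves to whichever profile's container is
      # running, because both declare it as a network alias.
      - SPEAKER_SERVICE_URL=http://speaker-service:8085
```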

status.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -43,8 +43,8 @@ def get_container_status(service_name: str) -> Dict[str, Any]:
 
     try:
         # Get container status using docker compose ps
-        # Use 'ps -a' to get all containers regardless of profile
-        cmd = ['docker', 'compose', 'ps', '-a', '--format', 'json']
+        # Only check containers from active profiles (excludes inactive profile services)
+        cmd = ['docker', 'compose', 'ps', '--format', 'json']
 
         result = subprocess.run(
             cmd,
```
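One wrinkle with the command above: the shape of `docker compose ps --format json` output differs across Compose v2 releases, which emit either a single JSON array or one JSON object per line (NDJSON). A hedged sketch of a parser that tolerates both (the helper name is mine, not from `status.py`):

```python
import json

def parse_compose_ps(output: str) -> list[dict]:
    """Parse `docker compose ps --format json` output.

    Newer Compose versions emit one JSON object per line (NDJSON);
    older ones emit a single JSON array, so accept both shapes.
    """
    output = output.strip()
    if not output:
        return []  # no containers from the active profiles
    if output.startswith("["):
        return json.loads(output)
    return [json.loads(line) for line in output.splitlines() if line.strip()]

# Example with NDJSON-style output (fields abbreviated):
sample = '{"Name": "speaker-service", "State": "running"}\n' \
         '{"Name": "web-ui", "State": "running"}'
print([c["Name"] for c in parse_compose_ps(sample)])  # ['speaker-service', 'web-ui']
```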

wizard.py

Lines changed: 3 additions & 0 deletions

```diff
@@ -249,6 +249,9 @@ def run_service_setup(service_name, selected_services, https_enabled=False, serv
 
     # For speaker-recognition, pass HF_TOKEN from centralized configuration
     if service_name == 'speaker-recognition':
+        # Define the speaker env path
+        speaker_env_path = 'extras/speaker-recognition/.env'
+
         # HF Token should have been provided via setup_hf_token_if_needed()
         if hf_token:
             cmd.extend(['--hf-token', hf_token])
```
