Closed
29 changes: 28 additions & 1 deletion extras/speaker-recognition/.env.template
@@ -39,8 +39,35 @@ REACT_UI_HOST=0.0.0.0
REACT_UI_PORT=5174
REACT_UI_HTTPS=false

-# Optional: External Services
+# ===================================================================
+# Transcription Provider Configuration
+# ===================================================================
+
+# Choose transcription provider: 'deepgram' or 'parakeet'
+# If not set, auto-detects based on available credentials
+TRANSCRIPTION_PROVIDER=deepgram
+
+# Option 1: Deepgram (cloud-based, requires API key)
DEEPGRAM_API_KEY=your_deepgram_api_key_here
+
+# Option 2: Parakeet ASR (local/offline transcription)
+# Point to Parakeet service from extras/asr-services or backends/advanced
+# PARAKEET_ASR_URL=http://parakeet-asr:8767
+
+# ===================================================================
+# Diarization Configuration
+# ===================================================================
+
+# Speaker diarization mode: auto, native, pyannote, or none
+# - auto: Use provider's native diarization if available, otherwise Pyannote (default)
+# - native: Use only provider's native diarization (Deepgram has native, Parakeet doesn't)
+# - pyannote: Always use standalone Pyannote diarization
+# - none: Skip diarization entirely (transcription only)
+DIARIZATION_MODE=auto
+
+# ===================================================================
+# Other External Services
+# ===================================================================
GROQ_API_KEY=your_groq_api_key_here

# Test Configuration (for docker-compose-test.yml)
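
The template comments above describe two resolution rules: the provider is auto-detected from available credentials when `TRANSCRIPTION_PROVIDER` is unset, and `DIARIZATION_MODE=auto` falls back to Pyannote when the provider has no native diarization. A minimal Python sketch of those rules follows, for illustration only. The function names, the preference for Deepgram when both credentials are present, and the error handling are assumptions, not code from this PR.

```python
from typing import Mapping, Optional

# Native diarization support, as stated in the template comments
# (Deepgram has it, Parakeet does not).
NATIVE_DIARIZATION = {"deepgram": True, "parakeet": False}
VALID_MODES = {"auto", "native", "pyannote", "none"}


def resolve_provider(env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: pick the provider, auto-detecting when unset."""
    explicit = env.get("TRANSCRIPTION_PROVIDER", "").strip().lower()
    if explicit:
        if explicit not in NATIVE_DIARIZATION:
            raise ValueError(f"unknown TRANSCRIPTION_PROVIDER: {explicit!r}")
        return explicit
    # Assumption: Deepgram wins when both credentials are configured.
    if env.get("DEEPGRAM_API_KEY"):
        return "deepgram"
    if env.get("PARAKEET_ASR_URL"):
        return "parakeet"
    return None


def resolve_diarization(env: Mapping[str, str], provider: Optional[str]) -> str:
    """Hypothetical helper: map DIARIZATION_MODE to 'native', 'pyannote' or 'none'."""
    mode = env.get("DIARIZATION_MODE", "auto").strip().lower() or "auto"
    if mode not in VALID_MODES:
        raise ValueError(f"unknown DIARIZATION_MODE: {mode!r}")
    has_native = NATIVE_DIARIZATION.get(provider, False)
    if mode == "auto":
        return "native" if has_native else "pyannote"
    if mode == "native" and not has_native:
        raise ValueError(f"provider {provider!r} has no native diarization")
    return mode
```

With only `PARAKEET_ASR_URL` set, this sketch resolves to the `parakeet` provider and, under `auto`, to Pyannote diarization.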
4 changes: 2 additions & 2 deletions extras/speaker-recognition/Dockerfile
@@ -42,5 +42,5 @@ ENV PYTHONPATH=/app
EXPOSE 8085

# Run the service
-# Use shell form to allow environment variable expansion
-CMD uv run --extra ${PYTORCH_CUDA_VERSION} --no-dev simple-speaker-service
+# Use JSON form with shell wrapper for environment variable expansion
+CMD ["sh", "-c", "uv run --extra ${PYTORCH_CUDA_VERSION} --no-dev simple-speaker-service"]
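
Both the old shell form and the new JSON form end up running through `sh -c`; the JSON form only makes the shell explicit. What would not work is a plain exec-form array without the wrapper, because no shell would expand `${PYTORCH_CUDA_VERSION}`. The Python sketch below mimics that difference with `subprocess` on a POSIX system. It is an analogy for illustration, not part of the Dockerfile, and `cu126` is a placeholder value.

```python
import os
import subprocess

# Placeholder value; in the image this comes from the build/runtime environment.
env = dict(os.environ, PYTORCH_CUDA_VERSION="cu126")

# Exec-style argv: no shell is involved, so the variable reference stays literal.
literal = subprocess.run(
    ["echo", "--extra", "${PYTORCH_CUDA_VERSION}"],
    env=env, capture_output=True, text=True, check=True,
).stdout.strip()

# Same argv shape as the new CMD: sh -c receives one string and expands it.
expanded = subprocess.run(
    ["sh", "-c", "echo --extra ${PYTORCH_CUDA_VERSION}"],
    env=env, capture_output=True, text=True, check=True,
).stdout.strip()

print(literal)   # --extra ${PYTORCH_CUDA_VERSION}
print(expanded)  # --extra cu126
```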
18 changes: 9 additions & 9 deletions extras/speaker-recognition/README.md
@@ -15,8 +15,8 @@ cp .env.template .env
# Edit .env and add your Hugging Face token
```
Get your HF token from https://huggingface.co/settings/tokens
-Accept the terms and conditions for
-https://huggingface.co/pyannote/speaker-diarization-3.1
+Accept the terms and conditions for
+https://huggingface.co/pyannote/speaker-diarization-community-1
Comment on lines +18 to +19

⚠️ Potential issue | 🟡 Minor

Format the bare URL as a markdown link.

Line 19 contains a bare URL that should be wrapped in markdown link syntax for proper formatting and compliance with markdown standards.

Apply this diff to format the URL correctly:

-Accept the terms and conditions for
-https://huggingface.co/pyannote/speaker-diarization-community-1
+Accept the terms and conditions for:
+[pyannote/speaker-diarization-community-1](https://huggingface.co/pyannote/speaker-diarization-community-1)
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

19-19: Bare URL used

(MD034, no-bare-urls)

https://huggingface.co/pyannote/segmentation-3.0


@@ -829,7 +829,7 @@ The advanced backend communicates with this service through the `client.py` modu

## Laptop Client

-A command-line client (`laptop_client.py`) that can record from your microphone and interact with the speaker recognition service.
+A command-line client (`scripts/laptop_client.py`) that can record from your microphone and interact with the speaker recognition service.

### Setup for Laptop Client

@@ -854,22 +854,22 @@ pip install pyaudio
docker compose --profile cpu up -d

# Enroll a new speaker (records 10 seconds)
-python laptop_client.py enroll --speaker-id "john" --speaker-name "John Doe" --duration 10
+python scripts/laptop_client.py enroll --speaker-id "john" --speaker-name "John Doe" --duration 10

# Identify a speaker (records 5 seconds)
-python laptop_client.py identify --duration 5
+python scripts/laptop_client.py identify --duration 5

# Verify against a specific speaker (records 3 seconds)
-python laptop_client.py verify --speaker-id "john" --duration 3
+python scripts/laptop_client.py verify --speaker-id "john" --duration 3

# List all enrolled speakers
-python laptop_client.py list
+python scripts/laptop_client.py list

# Remove a speaker
-python laptop_client.py remove --speaker-id "john"
+python scripts/laptop_client.py remove --speaker-id "john"

# Use different service URL
-python laptop_client.py --service-url "http://192.168.1.100:8001" identify
+python scripts/laptop_client.py --service-url "http://192.168.1.100:8001" identify
```

### Laptop Client Features
2 changes: 1 addition & 1 deletion extras/speaker-recognition/docker-compose-test.yml
@@ -5,7 +5,7 @@ services:
context: .
dockerfile: Dockerfile
args:
-PYTORCH_CUDA_VERSION: ${COMPUTE_MODE:-cpu}
+PYTORCH_CUDA_VERSION: ${PYTORCH_CUDA_VERSION:-cpu}
image: speaker-recognition:test
ports:
# Map host test port (default 8086) to container port 8085
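
The build-arg rename matters because `${NAME:-default}` interpolation falls back to the default whenever `NAME` is unset or empty. The Python sketch below emulates that rule to show the effect, assuming the host exports `PYTORCH_CUDA_VERSION` rather than `COMPUTE_MODE`; the helper name and the `cu126` value are illustrative, not taken from the PR.

```python
from typing import Mapping


def interpolate_default(env: Mapping[str, str], name: str, default: str) -> str:
    """Emulate `${NAME:-default}`: use the default when NAME is unset or empty."""
    value = env.get(name, "")
    return value if value else default


# Assumed host environment: only PYTORCH_CUDA_VERSION is exported.
host_env = {"PYTORCH_CUDA_VERSION": "cu126"}

old_arg = interpolate_default(host_env, "COMPUTE_MODE", "cpu")          # old compose line
new_arg = interpolate_default(host_env, "PYTORCH_CUDA_VERSION", "cpu")  # new compose line

print(old_arg)  # cpu
print(new_arg)  # cu126
```

Under that assumption the old line would silently build the CPU variant, while the new line passes the requested CUDA extra through.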
3 changes: 3 additions & 0 deletions extras/speaker-recognition/docker-compose.yml
@@ -26,7 +26,10 @@ services:
- SIMILARITY_THRESHOLD=${SIMILARITY_THRESHOLD:-0.15}
- SPEAKER_SERVICE_HOST=${SPEAKER_SERVICE_HOST:-0.0.0.0}
- SPEAKER_SERVICE_PORT=${SPEAKER_SERVICE_PORT:-8085}
+# Transcription provider configuration
+- TRANSCRIPTION_PROVIDER=${TRANSCRIPTION_PROVIDER:-deepgram}
- DEEPGRAM_API_KEY=${DEEPGRAM_API_KEY}
+- PARAKEET_ASR_URL=${PARAKEET_ASR_URL:-http://parakeet-asr:8767}
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8085/health"]
Empty file.
23 changes: 5 additions & 18 deletions extras/speaker-recognition/pyproject.toml
@@ -8,7 +8,7 @@ dependencies = [
"fastapi>=0.115.12",
"uvicorn>=0.34.2",
"scipy>=1.10.0",
-"pyannote.audio>=3.3.2",
+"pyannote.audio>=4.0.0",
"aiohttp>=3.8.0",
"python-multipart>=0.0.6",
"pydantic>=2.0.0",
@@ -43,26 +43,20 @@ cpu = [
"torchaudio>=2.0.0",
]

-cu121 = [
-"torch>=2.0.0",
-"torchaudio>=2.0.0",
-]
-
cu126 = [
-"torch>=2.0.0",
-"torchaudio>=2.0.0",
+"torch>=2.8.0",
+"torchaudio>=2.8.0",
]

cu128 = [
-"torch>=2.0.0",
-"torchaudio>=2.0.0",
+"torch>=2.8.0",
+"torchaudio>=2.8.0",
]

[tool.uv]
conflicts = [
[
{ extra = "cpu" },
-{ extra = "cu121" },
{ extra = "cu126" },
{ extra = "cu128" },
],
@@ -71,13 +65,11 @@ conflicts = [
[tool.uv.sources]
torch = [
{ index = "pytorch-cpu", extra = "cpu" },
-{ index = "pytorch-cu121", extra = "cu121" },
{ index = "pytorch-cu126", extra = "cu126" },
{ index = "pytorch-cu128", extra = "cu128" },
]
torchaudio = [
{ index = "pytorch-cpu", extra = "cpu" },
-{ index = "pytorch-cu121", extra = "cu121" },
{ index = "pytorch-cu126", extra = "cu126" },
{ index = "pytorch-cu128", extra = "cu128" },
]
@@ -87,11 +79,6 @@ name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

-[[tool.uv.index]]
-name = "pytorch-cu121"
-url = "https://download.pytorch.org/whl/cu121"
-explicit = true
-
[[tool.uv.index]]
name = "pytorch-cu126"
url = "https://download.pytorch.org/whl/cu126"
9 changes: 7 additions & 2 deletions extras/speaker-recognition/run-test.sh
@@ -55,6 +55,9 @@ fi

print_info "Speaker Recognition Integration Test Runner"
print_info "=========================================="

# Load environment variables (CI or local)
if [ -f ".env" ]; then
print_info ".env file exists: $([ -f .env ] && echo 'yes' || echo 'no')"
Comment on lines +59 to 61

⚠️ Potential issue | 🔴 Critical

Critical syntax error: orphaned if statement.

Lines 60-61 contain an unclosed if statement that causes the "syntax error: unexpected end of file" reported by the pipeline at line 171. This appears to be leftover debug code.

Apply this diff to fix the syntax error:

-
-if [ -f ".env" ]; then
-print_info ".env file exists: $([ -f .env ] && echo 'yes' || echo 'no')"
-
 # Load environment variables (CI or local)
 if [ -f ".env" ] && [ -z "${HF_TOKEN:-}" ]; then

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In extras/speaker-recognition/run-test.sh around lines 59 to 61 there is an
unclosed if statement causing a syntax error; close the conditional by adding a
matching "fi" after the print_info line (or remove the orphaned if/debug block
entirely) so the if [ -f ".env" ]; then ... is properly terminated and the
script no longer errors on unexpected end of file.
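
An unclosed `if` like this can be caught before the pipeline runs by asking the shell to parse the script without executing it (`-n`). The Python sketch below wraps that check around a simplified copy of the flagged lines, with `echo` standing in for the script's `print_info` helper. The function name and the choice of `sh` are assumptions; for the real `run-test.sh`, pass the shell named in its shebang.

```python
import subprocess
import tempfile
from pathlib import Path


def shell_syntax_ok(script_text: str, shell: str = "sh") -> bool:
    """Parse the script with `<shell> -n` (no execution) and report whether it is valid."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "snippet.sh"
        path.write_text(script_text)
        result = subprocess.run(
            [shell, "-n", str(path)], capture_output=True, text=True
        )
    return result.returncode == 0


# Simplified stand-in for the flagged lines: an `if` with no matching `fi`.
BROKEN = 'if [ -f ".env" ]; then\necho ".env file exists"\n'
# One of the two fixes suggested in the review: close the conditional.
FIXED = BROKEN + "fi\n"

print(shell_syntax_ok(BROKEN))  # False
print(shell_syntax_ok(FIXED))   # True
```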


# Load environment variables (CI or local)
@@ -109,8 +112,10 @@ if [ -z "$DEEPGRAM_API_KEY" ]; then
exit 1
fi

-print_info "HF_TOKEN length: ${#HF_TOKEN}"
-print_info "DEEPGRAM_API_KEY length: ${#DEEPGRAM_API_KEY}"
+# Now we can safely check the variables
+print_info "Environment configuration:"
+print_info " HF_TOKEN length: ${#HF_TOKEN}"
+print_info " DEEPGRAM_API_KEY length: ${#DEEPGRAM_API_KEY}"

# Export variables early so docker compose can use them
export HF_TOKEN