This document provides a detailed explanation of the video processing pipeline's architecture, design decisions, and implementation details.
The video processing pipeline is a distributed, event-driven system designed for converting uploaded videos into adaptive bitrate streaming (HLS) format with multiple quality levels.
- Distributed: Multiple workers process videos in parallel
- Event-Driven: NATS JetStream coordinates asynchronous processing
- Stateless Workers: Workers can be scaled horizontally
- Idempotent: Retry-safe operations yield effectively-once processing on top of at-least-once delivery
- Resilient: Automatic retries and failure recovery
Each component has a single, well-defined responsibility:
- API Server: Handle HTTP requests and coordinate uploads
- Workers: Process specific video qualities
- NATS: Message distribution and event coordination
- MySQL: Persistent state tracking
- S3: Durable storage
Components communicate through:
- Events (NATS messages) - not direct API calls
- Shared storage (S3) - not in-memory data
- Database state (MySQL) - not local files
This allows components to be deployed, scaled, and to fail independently of one another.
Video processing is asynchronous:
- Upload confirmation returns immediately
- Processing happens in background
- Status tracked in database
- Final result appears in S3
- Workers retry failed operations (max 5 attempts)
- Temporary files cleaned up after success/failure
- Dead letter queues for permanent failures
- Graceful degradation (missing audio/transcript doesn't fail video)
Responsibilities:
- Accept video upload requests
- Generate S3 presigned URLs
- Publish upload events to NATS
- Track video state in database
Technology:
- Gin HTTP framework
- RESTful endpoints
- CORS middleware for web clients
Key Operations:
- POST /upload/signed/url - Generate upload URL
- POST /upload/confirmation - Trigger processing
Types:
Transcoding Workers (240p, 480p, 720p)
- Download raw video from S3
- Transcode with FFmpeg to target resolution
- Generate HLS segments (6-second chunks)
- Write segments to local temporary storage
- Update database status
- Publish completion event
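The transcode-and-segment step above boils down to one FFmpeg invocation per quality. A minimal Go sketch of the argument list, assuming libx264 with the preset/CRF values mentioned later in this document (the exact flags in the real workers may differ):

```go
package main

import (
	"fmt"
	"os/exec"
)

// hlsArgs builds FFmpeg arguments that transcode the input to the given
// height and emit 6-second HLS segments, as described above. The codec,
// preset, and CRF choices are illustrative assumptions.
func hlsArgs(input, outDir string, height int) []string {
	return []string{
		"-i", input,
		"-vf", fmt.Sprintf("scale=-2:%d", height), // keep aspect ratio, even width
		"-c:v", "libx264", "-preset", "medium", "-crf", "22",
		"-c:a", "aac",
		"-hls_time", "6", // 6-second chunks
		"-hls_playlist_type", "vod",
		"-hls_segment_filename", outDir + "/seg_%03d.ts",
		outDir + "/index.m3u8",
	}
}

func main() {
	args := hlsArgs("raw/UUID.mp4", "tmp/240p", 240)
	cmd := exec.Command("ffmpeg", args...) // built but not run here
	fmt.Println(cmd.Args)
}
```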
Audio Worker
- Download raw video
- Extract audio track with FFmpeg
- Send to Gemini AI for transcription
- Save transcript as text
- Update database status
- Publish completion event
Done Worker
- Listen for completion events
- Check if all tasks complete
- Generate master.m3u8 playlist
- Upload all HLS files to S3
- Update database to "done" status
- Clean up temporary files
Stream Configuration:
Name: VIDEO
Subjects: VIDEO.>
Storage: File-based (persistent)
Retention: Limits-based
Event Flow:
VIDEO.uploaded → Workers (240p, 480p, 720p, audio)
↓
Workers → VIDEO.processing.done
↓
Done Worker → Finalization
Consumer Properties:
- Durable (survive restarts)
- Explicit ACK (manual confirmation)
- 30-second ACK timeout
- Max 5 delivery attempts
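With the github.com/nats-io/nats.go client, these consumer properties map onto subscription options roughly as follows (a sketch, not the project's actual code; `js` is a `nats.JetStreamContext` and the durable name is a placeholder):

```go
// Subscribe with the consumer properties listed above.
sub, err := js.Subscribe("VIDEO.uploaded", handleUpload,
	nats.Durable("worker-240p"),  // durable: survives restarts
	nats.ManualAck(),             // explicit ACK after successful processing
	nats.AckWait(30*time.Second), // redeliver if not ACKed within 30s
	nats.MaxDeliver(5),           // at most 5 delivery attempts
)
```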
Schema:
video_processing_state
├── id (VARCHAR, PRIMARY KEY) - Video UUID
├── video_id (VARCHAR) - Original S3 key
├── hls_240_done (BOOLEAN) - 240p complete
├── hls_480_done (BOOLEAN) - 480p complete
├── hls_720_done (BOOLEAN) - 720p complete
├── audio_done (BOOLEAN) - Audio extracted
├── transcript_done (BOOLEAN) - Transcription complete
├── master_done (BOOLEAN) - Master playlist created
├── uploaded (BOOLEAN) - Uploaded to S3
├── status (ENUM) - processing | done | failed
├── created_at (TIMESTAMP)
└── updated_at (TIMESTAMP)

State Transitions:
NULL → INSERT (status='processing')
↓
UPDATE flags (hls_*_done, audio_done, etc.)
↓
UPDATE (status='done') when all flags true
↓
UPDATE (uploaded=true) after S3 upload
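The "all flags true" transition can be sketched as a pure predicate over the row above. Exactly which flags gate finalization is an assumption here (the document later notes a missing transcript is not critical for playback):

```go
package main

import "fmt"

// ProcessingState mirrors the boolean flags of video_processing_state.
type ProcessingState struct {
	HLS240Done, HLS480Done, HLS720Done bool
	AudioDone, TranscriptDone          bool
}

// readyToFinalize reports whether the done worker may flip status to
// 'done' and build the master playlist. Treating the transcript as
// non-blocking is an assumption based on the graceful-degradation notes.
func readyToFinalize(s ProcessingState) bool {
	return s.HLS240Done && s.HLS480Done && s.HLS720Done && s.AudioDone
}

func main() {
	s := ProcessingState{HLS240Done: true, HLS480Done: true, HLS720Done: true, AudioDone: true}
	fmt.Println(readyToFinalize(s))
}
```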
Bucket Structure:
video-pipline/
├── raw/ # Original uploads
│ └── UUID.mp4
└── processed/ # Transcoded output
└── VIDEO_ID/
├── master.m3u8 # Master playlist
├── 240p/
│ ├── index.m3u8
│ └── seg_*.ts
├── 480p/
│ ├── index.m3u8
│ └── seg_*.ts
└── 720p/
├── index.m3u8
└── seg_*.ts
URL Formats:
- Virtual-hosted style: https://bucket.s3.region.amazonaws.com/key
- Path style: https://s3.region.amazonaws.com/bucket/key ✓ (used)
Path-style addressing sidesteps the DNS/TLS issues that virtual-hosted style can hit with certain bucket names (notably names containing dots, which break the wildcard certificate).
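A minimal Go helper for building the path-style object URLs used by this project (the region value is a placeholder; `net/url` takes care of escaping):

```go
package main

import (
	"fmt"
	"net/url"
)

// pathStyleURL builds a path-style S3 object URL of the form
// https://s3.<region>.amazonaws.com/<bucket>/<key>.
func pathStyleURL(region, bucket, key string) string {
	u := url.URL{
		Scheme: "https",
		Host:   fmt.Sprintf("s3.%s.amazonaws.com", region),
		Path:   "/" + bucket + "/" + key, // slashes in the key are preserved
	}
	return u.String()
}

func main() {
	fmt.Println(pathStyleURL("us-east-1", "video-pipline", "raw/UUID.mp4"))
}
```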
Client
↓ [1] POST /upload/signed/url
API Server
↓ [2] GeneratePresignedURL(S3)
S3
↓ [3] Return URL
Client
↓ [4] PUT video to presigned URL
S3
↓ [5] Store video
Client
↓ [6] POST /upload/confirmation
API Server
↓ [7] INSERT database record
↓ [8] PUBLISH VIDEO.uploaded
NATS
NATS VIDEO.uploaded
↓
┌─────────┬─────────┬─────────┬─────────┐
│ 240p │ 480p │ 720p │ Audio │
│ Worker │ Worker │ Worker │ Worker │
└─────────┴─────────┴─────────┴─────────┘
↓ ↓ ↓ ↓
Download Download Download Download
↓ ↓ ↓ ↓
FFmpeg FFmpeg FFmpeg FFmpeg
Transcode Transcode Transcode Extract
↓ ↓ ↓ ↓
HLS HLS HLS Gemini
Segments Segments Segments AI
↓ ↓ ↓ ↓
UPDATE UPDATE UPDATE UPDATE
DB DB DB DB
↓ ↓ ↓ ↓
PUBLISH PUBLISH PUBLISH PUBLISH
done done done done
└─────────┴─────────┴─────────┘
↓
Done Worker
↓
Check if all
tasks complete
↓
Generate master
playlist
↓
Upload to S3
↓
UPDATE DB
(status=done)
↓
Cleanup temp
files
S3 (processed/VIDEO_ID/)
↓
CDN (optional)
↓
Client (HLS Player)
↓ [1] Request master.m3u8
↓ [2] Parse quality options
↓ [3] Request index.m3u8 (selected quality)
↓ [4] Download segments (seg_*.ts)
↓ [5] Adaptive switching based on bandwidth
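Step [2] of playback, parsing quality options from master.m3u8, relies on the fact that each #EXT-X-STREAM-INF tag is immediately followed by its variant's URI line. A minimal sketch of that extraction:

```go
package main

import (
	"fmt"
	"strings"
)

// variantURIs extracts the per-quality playlist URIs from a master
// playlist: the line after each #EXT-X-STREAM-INF tag is the variant URI.
func variantURIs(master string) []string {
	var uris []string
	lines := strings.Split(master, "\n")
	for i, line := range lines {
		if strings.HasPrefix(line, "#EXT-X-STREAM-INF") && i+1 < len(lines) {
			uris = append(uris, strings.TrimSpace(lines[i+1]))
		}
	}
	return uris
}

func main() {
	master := "#EXTM3U\n" +
		"#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240\n240p/index.m3u8\n" +
		"#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720\n720p/index.m3u8\n"
	fmt.Println(variantURIs(master))
}
```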
Benefits:
- Decoupling: Components don't know about each other
- Scalability: Add workers without changing code
- Resilience: Failures don't cascade
- Async: Non-blocking operations
- Replay: Can reprocess events if needed
Tradeoffs:
- More complex than synchronous calls
- Eventual consistency (not immediate)
- Debugging is harder (distributed traces needed)
| Event | Subject | Payload | Consumers |
|---|---|---|---|
| Upload Complete | VIDEO.uploaded | {id, key, bucket, timestamp} | All workers |
| Task Complete | VIDEO.processing.done | {videoId, dirPath, bucket} | Done worker |
Fan-out (VIDEO.uploaded):
NATS
↓
┌────┼────┬────┐
↓ ↓ ↓ ↓
240p 480p 720p Audio
Fan-in (VIDEO.processing.done):
240p 480p 720p Audio
↓ ↓ ↓ ↓
└────┼────┴────┘
↓
Done Worker
Challenge: Network failures can cause duplicate messages.
Solution:
- Idempotent operations: Database updates use WHERE clauses
- Message deduplication: NATS JetStream uses message IDs
- State checks: Done worker checks flags before acting
Example:
// Idempotent update
UPDATE video_processing_state
SET hls_240_done = true
WHERE id = ? AND hls_240_done = false

┌──────────┐
│ START │
└────┬─────┘
↓
┌──────────────┐
│ PROCESSING │ ← Initial state
└──────┬───────┘
↓
┌───────────────────────┐
│ All flags = true? │
└───┬───────────────┬───┘
NO YES
↓ ↓
Continue ┌──────────┐
Processing │ DONE │
└──────────┘
Optimistic Locking:
UPDATE video_processing_state
SET status = 'done'
WHERE id = ?
AND status = 'processing' // Prevents double-finalization
AND hls_240_done = true
AND hls_480_done = true
AND hls_720_done = true

Atomic Operations:
- Single database transaction per state change
- All-or-nothing S3 uploads (checked before marking complete)
Workers:
1 worker → 1 video/time
2 workers → 2 videos/time
N workers → N videos/time
Deploy multiple worker instances:
# Kubernetes
kubectl scale deployment worker-240p --replicas=5
# Docker Compose
docker-compose up --scale worker-240p=5

NATS:
- Handles thousands of messages/second
- Workers pull from queue (competing consumers)
- No coordination needed
Bottlenecks:
- Database: Connection pool (tunable)
- S3: Request limits (usually not a problem)
- FFmpeg: CPU/memory per worker
- Disk: Temporary storage per worker
FFmpeg Performance:
- CPU: More cores = faster transcoding
- Memory: 2-4GB per worker recommended
- Disk: SSD for temp files (huge speedup)
Recommendations:
- 240p/480p workers: 2 CPU, 2GB RAM
- 720p workers: 4 CPU, 4GB RAM
- Audio workers: 1 CPU, 1GB RAM
Scenarios:
- Worker crashes mid-processing
- FFmpeg command fails
- S3 upload fails
- Database connection lost
Handling:
Failure
↓
NATS: NAK (negative acknowledgment)
↓
Message returns to queue
↓
Another worker picks it up
↓
Retry (up to MaxDeliver=5 times)
↓
If still failing → Dead Letter Queue (planned, not yet implemented)
Cleanup:
- Temporary files removed on failure
- Database status remains "processing"
- Manual intervention possible via admin tools
S3 Connectivity:
- Presigned URLs expire after 10 minutes
- Workers cache downloads locally
- Retries with exponential backoff
NATS Connectivity:
- Workers reconnect automatically
- In-flight messages preserved (JetStream)
- Processing resumes after reconnection
Database Connectivity:
- Connection pool handles transient failures
- Workers retry database operations
- Long outages cause worker backlog
Missing Quality: If 720p fails but others succeed:
- Video still usable (240p, 480p available)
- Master playlist excludes failed quality
- Can be reprocessed later
Missing Audio:
- Video processing continues
- Transcript unavailable
- Not critical for playback
Current:
- No authentication (demo)
- S3 presigned URLs provide time-limited access
- NATS has no authentication (local)
Production TODO:
- Add JWT authentication for API
- Use NATS authentication tokens
- Implement user quotas and rate limiting
- Add API keys for programmatic access
At Rest:
- S3: Enable server-side encryption (AES-256)
- MySQL: Encrypt sensitive columns
- Consider: Client-side encryption for uploads
In Transit:
- S3: HTTPS for all operations
- NATS: TLS encryption (configure)
- MySQL: TLS connections
Current:
- File type validation (TODO)
- Size limits (TODO)
- Malicious content scanning (TODO)
Production TODO:
- Validate video formats (only allow MP4, AVI, etc.)
- Scan for malware before processing
- Rate limit uploads per user
- Content moderation (violence, explicit content)
Current:
- Preset: "medium" (balance speed/quality)
- CRF: 22 (constant quality)
- GOP: 48 frames (2 seconds at 24fps)
Tuning:
- Faster encoding: preset="fast" (at some compression-efficiency cost)
- Smaller files: preset="slow" and/or a higher CRF (higher CRF = lower quality, smaller size)
- GPU: NVENC can speed up transcoding dramatically (often ~10x), with some quality tradeoff versus libx264
Current:
- 3 transcoding workers + 1 audio worker = 4x parallelism
- Each worker processes different qualities
Optimization:
- Run multiple instances of each worker type
- Use different machines for different qualities
- Priority queues for urgent videos
Opportunities:
- Thumbnail cache: Generate once, reuse
- CDN: CloudFront in front of S3
- Database: Query result caching
- Warm pool: Pre-started worker containers
Memory:
- Stream large files (don't load entirely)
- Limit concurrent FFmpeg processes
- Monitor with Prometheus metrics
Disk:
- Clean temp files aggressively
- Use separate disk for temp storage
- Monitor disk usage alerts
Network:
- Parallel S3 uploads (multipart)
- Compress logs before sending
- Regional deployments (closer to users)
This architecture balances:
- Simplicity: Easy to understand and debug
- Scalability: Horizontal scaling for workers
- Reliability: Automatic retries and failure recovery
- Performance: Parallel processing and optimized FFmpeg
The event-driven model with NATS JetStream provides flexibility to add new features (like thumbnails, subtitles) without changing existing code.
Questions or suggestions? Open an issue or discussion on GitHub!