This document provides a detailed explanation of the video processing pipeline's architecture, design decisions, and implementation details.
The video processing pipeline is a distributed, event-driven system designed for converting uploaded videos into adaptive bitrate streaming (HLS) format with multiple quality levels.
- Distributed: Multiple workers process videos in parallel
- Event-Driven: NATS JetStream coordinates asynchronous processing
- Stateless Workers: Workers can be scaled horizontally
- Idempotent: Retry-safe operations yield effectively-once processing on top of at-least-once delivery
- Resilient: Automatic retries and failure recovery
Each component has a single, well-defined responsibility:
- API Server: Handle HTTP requests and coordinate uploads
- Workers: Process specific video qualities
- NATS: Message distribution and event coordination
- MySQL: Persistent state tracking
- S3: Durable storage
Components communicate through:
- Events (NATS messages) - not direct API calls
- Shared storage (S3) - not in-memory data
- Database state (MySQL) - not local files
This allows components to be deployed, scaled, and to fail independently of one another.
Video processing is asynchronous:
- Upload confirmation returns immediately
- Processing happens in background
- Status tracked in database
- Final result appears in S3
- Workers retry failed operations (max 5 attempts)
- Temporary files cleaned up after success/failure
- Dead letter queues for permanent failures
- Graceful degradation (missing audio/transcript doesn't fail video)
Responsibilities:
- Accept video upload requests
- Generate S3 presigned URLs
- Publish upload events to NATS
- Track video state in database
Technology:
- Gin HTTP framework
- RESTful endpoints
- CORS middleware for web clients
Key Operations:
- POST /upload/signed/url - Generate upload URL
- POST /upload/confirmation - Trigger processing
Types:
Transcoding Workers (240p, 480p, 720p)
- Download raw video from S3
- Transcode with FFmpeg to target resolution
- Generate HLS segments (6-second chunks)
- Write segments to local temporary storage
- Update database status
- Publish completion event
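The transcode-and-segment step above boils down to one FFmpeg invocation per quality. A minimal Go sketch of the argument list, assuming libx264 with the preset/CRF values mentioned later in this document (the exact flags in the real workers may differ):

```go
package main

import (
	"fmt"
	"os/exec"
)

// hlsArgs builds FFmpeg arguments that transcode the input to the given
// height and emit 6-second HLS segments, as described above. The codec,
// preset, and CRF choices are illustrative assumptions.
func hlsArgs(input, outDir string, height int) []string {
	return []string{
		"-i", input,
		"-vf", fmt.Sprintf("scale=-2:%d", height), // keep aspect ratio, even width
		"-c:v", "libx264", "-preset", "medium", "-crf", "22",
		"-c:a", "aac",
		"-hls_time", "6", // 6-second chunks
		"-hls_playlist_type", "vod",
		"-hls_segment_filename", outDir + "/seg_%03d.ts",
		outDir + "/index.m3u8",
	}
}

func main() {
	args := hlsArgs("raw/UUID.mp4", "tmp/240p", 240)
	cmd := exec.Command("ffmpeg", args...) // built but not run here
	fmt.Println(cmd.Args)
}
```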
Audio Worker
- Download raw video
- Extract audio track with FFmpeg
- Send to Gemini AI for transcription
- Save transcript as text
- Update database status
- Publish completion event
Done Worker
- Listen for completion events
- Check if all tasks complete
- Generate master.m3u8 playlist
- Upload all HLS files to S3
- Update database to "done" status
- Clean up temporary files
Stream Configuration:
Name: VIDEO
Subjects: VIDEO.>
Storage: File-based (persistent)
Retention: Limits-based
Event Flow:
VIDEO.uploaded → Workers (240p, 480p, 720p, audio)
↓
Workers → VIDEO.processing.done
↓
Done Worker → Finalization
Consumer Properties:
- Durable (survive restarts)
- Explicit ACK (manual confirmation)
- 30-second ACK timeout
- Max 5 delivery attempts
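With the github.com/nats-io/nats.go client, these consumer properties map onto subscription options roughly as follows (a sketch, not the project's actual code; `js` is a `nats.JetStreamContext` and the durable name is a placeholder):

```go
// Subscribe with the consumer properties listed above.
sub, err := js.Subscribe("VIDEO.uploaded", handleUpload,
	nats.Durable("worker-240p"),  // durable: survives restarts
	nats.ManualAck(),             // explicit ACK after successful processing
	nats.AckWait(30*time.Second), // redeliver if not ACKed within 30s
	nats.MaxDeliver(5),           // at most 5 delivery attempts
)
```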
Schema:
video_processing_state
├── id (VARCHAR, PRIMARY KEY) - Video UUID
├── video_id (VARCHAR) - Original S3 key
├── hls_240_done (BOOLEAN) - 240p complete
├── hls_480_done (BOOLEAN) - 480p complete
├── hls_720_done (BOOLEAN) - 720p complete
├── audio_done (BOOLEAN) - Audio extracted
├── transcript_done (BOOLEAN) - Transcription complete
├── master_done (BOOLEAN) - Master playlist created
├── uploaded (BOOLEAN) - Uploaded to S3
├── status (ENUM) - processing | done | failed
├── created_at (TIMESTAMP)
└── updated_at (TIMESTAMP)

State Transitions:
NULL → INSERT (status='processing')
↓
UPDATE flags (hls_*_done, audio_done, etc.)
↓
UPDATE (status='done') when all flags true
↓
UPDATE (uploaded=true) after S3 upload
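The "all flags true" transition can be sketched as a pure predicate over the row above. Exactly which flags gate finalization is an assumption here (the document later notes a missing transcript is not critical for playback):

```go
package main

import "fmt"

// ProcessingState mirrors the boolean flags of video_processing_state.
type ProcessingState struct {
	HLS240Done, HLS480Done, HLS720Done bool
	AudioDone, TranscriptDone          bool
}

// readyToFinalize reports whether the done worker may flip status to
// 'done' and build the master playlist. Treating the transcript as
// non-blocking is an assumption based on the graceful-degradation notes.
func readyToFinalize(s ProcessingState) bool {
	return s.HLS240Done && s.HLS480Done && s.HLS720Done && s.AudioDone
}

func main() {
	s := ProcessingState{HLS240Done: true, HLS480Done: true, HLS720Done: true, AudioDone: true}
	fmt.Println(readyToFinalize(s))
}
```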
Bucket Structure:
video-pipline/
├── raw/ # Original uploads
│ └── UUID.mp4
└── processed/ # Transcoded output
└── VIDEO_ID/
├── master.m3u8 # Master playlist
├── 240p/
│ ├── index.m3u8
│ └── seg_*.ts
├── 480p/
│ ├── index.m3u8
│ └── seg_*.ts
└── 720p/
├── index.m3u8
└── seg_*.ts
URL Formats:
- Virtual-hosted style: https://bucket.s3.region.amazonaws.com/key
- Path style: https://s3.region.amazonaws.com/bucket/key ✓ (used)
Path-style addressing sidesteps the DNS/TLS issues that virtual-hosted style can hit with certain bucket names (notably names containing dots, which break the wildcard certificate).
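A minimal Go helper for building the path-style object URLs used by this project (the region value is a placeholder; `net/url` takes care of escaping):

```go
package main

import (
	"fmt"
	"net/url"
)

// pathStyleURL builds a path-style S3 object URL of the form
// https://s3.<region>.amazonaws.com/<bucket>/<key>.
func pathStyleURL(region, bucket, key string) string {
	u := url.URL{
		Scheme: "https",
		Host:   fmt.Sprintf("s3.%s.amazonaws.com", region),
		Path:   "/" + bucket + "/" + key, // slashes in the key are preserved
	}
	return u.String()
}

func main() {
	fmt.Println(pathStyleURL("us-east-1", "video-pipline", "raw/UUID.mp4"))
}
```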
Client
↓ [1] POST /upload/signed/url
API Server
↓ [2] GeneratePresignedURL(S3)
S3
↓ [3] Return URL
Client
↓ [4] PUT video to presigned URL
S3
↓ [5] Store video
Client
↓ [6] POST /upload/confirmation
API Server
↓ [7] INSERT database record
↓ [8] PUBLISH VIDEO.uploaded
NATS
NATS VIDEO.uploaded
↓
┌─────────┬─────────┬─────────┬─────────┐
│ 240p │ 480p │ 720p │ Audio │
│ Worker │ Worker │ Worker │ Worker │
└─────────┴─────────┴─────────┴─────────┘
↓ ↓ ↓ ↓
Download Download Download Download
↓ ↓ ↓ ↓
FFmpeg FFmpeg FFmpeg FFmpeg
Transcode Transcode Transcode Extract
↓ ↓ ↓ ↓
HLS HLS HLS Gemini
Segments Segments Segments AI
↓ ↓ ↓ ↓
UPDATE UPDATE UPDATE UPDATE
DB DB DB DB
↓ ↓ ↓ ↓
PUBLISH PUBLISH PUBLISH PUBLISH
done done done done
└─────────┴─────────┴─────────┘
↓
Done Worker
↓
Check if all
tasks complete
↓
Generate master
playlist
↓
Upload to S3
↓
UPDATE DB
(status=done)
↓
Cleanup temp
files
S3 (processed/VIDEO_ID/)
↓
CDN (optional)
↓
Client (HLS Player)
↓ [1] Request master.m3u8
↓ [2] Parse quality options
↓ [3] Request index.m3u8 (selected quality)
↓ [4] Download segments (seg_*.ts)
↓ [5] Adaptive switching based on bandwidth
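Step [2] of playback, parsing quality options from master.m3u8, relies on the fact that each #EXT-X-STREAM-INF tag is immediately followed by its variant's URI line. A minimal sketch of that extraction:

```go
package main

import (
	"fmt"
	"strings"
)

// variantURIs extracts the per-quality playlist URIs from a master
// playlist: the line after each #EXT-X-STREAM-INF tag is the variant URI.
func variantURIs(master string) []string {
	var uris []string
	lines := strings.Split(master, "\n")
	for i, line := range lines {
		if strings.HasPrefix(line, "#EXT-X-STREAM-INF") && i+1 < len(lines) {
			uris = append(uris, strings.TrimSpace(lines[i+1]))
		}
	}
	return uris
}

func main() {
	master := "#EXTM3U\n" +
		"#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240\n240p/index.m3u8\n" +
		"#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720\n720p/index.m3u8\n"
	fmt.Println(variantURIs(master))
}
```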
Benefits:
- Decoupling: Components don't know about each other
- Scalability: Add workers without changing code
- Resilience: Failures don't cascade
- Async: Non-blocking operations
- Replay: Can reprocess events if needed
Tradeoffs:
- More complex than synchronous calls
- Eventual consistency (not immediate)
- Debugging is harder (distributed traces needed)
| Event | Subject | Payload | Consumers |
|---|---|---|---|
| Upload Complete | VIDEO.uploaded | {id, key, bucket, timestamp} | All workers |
| Task Complete | VIDEO.processing.done | {videoId, dirPath, bucket} | Done worker |
Fan-out (VIDEO.uploaded):
NATS
↓
┌────┼────┬────┐
↓ ↓ ↓ ↓
240p 480p 720p Audio
Fan-in (VIDEO.processing.done):
240p 480p 720p Audio
↓ ↓ ↓ ↓
└────┼────┴────┘
↓
Done Worker
Challenge: Network failures can cause duplicate messages.
Solution:
- Idempotent operations: Database updates use WHERE clauses
- Message deduplication: NATS JetStream uses message IDs
- State checks: Done worker checks flags before acting
Example:
// Idempotent update
UPDATE video_processing_state
SET hls_240_done = true
WHERE id = ? AND hls_240_done = false

┌──────────┐
│ START │
└────┬─────┘
↓
┌──────────────┐
│ PROCESSING │ ← Initial state
└──────┬───────┘
↓
┌───────────────────────┐
│ All flags = true? │
└───┬───────────────┬───┘
NO YES
↓ ↓
Continue ┌──────────┐
Processing │ DONE │
└──────────┘
Optimistic Locking:
UPDATE video_processing_state
SET status = 'done'
WHERE id = ?
AND status = 'processing' // Prevents double-finalization
AND hls_240_done = true
AND hls_480_done = true
AND hls_720_done = true

Atomic Operations:
- Single database transaction per state change
- All-or-nothing S3 uploads (checked before marking complete)
Workers:
1 worker → 1 video/time
2 workers → 2 videos/time
N workers → N videos/time
Deploy multiple worker instances:
# Kubernetes
kubectl scale deployment worker-240p --replicas=5
# Docker Compose
docker-compose up --scale worker-240p=5

NATS:
- Handles thousands of messages/second
- Workers pull from queue (competing consumers)
- No coordination needed
Bottlenecks:
- Database: Connection pool (tunable)
- S3: Request limits (usually not a problem)
- FFmpeg: CPU/memory per worker
- Disk: Temporary storage per worker
FFmpeg Performance:
- CPU: More cores = faster transcoding
- Memory: 2-4GB per worker recommended
- Disk: SSD for temp files (huge speedup)
Recommendations:
- 240p/480p workers: 2 CPU, 2GB RAM
- 720p workers: 4 CPU, 4GB RAM
- Audio workers: 1 CPU, 1GB RAM
Scenarios:
- Worker crashes mid-processing
- FFmpeg command fails
- S3 upload fails
- Database connection lost
Handling:
Failure
↓
NATS: NAK (negative acknowledgment)
↓
Message returns to queue
↓
Another worker picks it up
↓
Retry (up to MaxDeliver=5 times)
↓
If still failing → Dead Letter Queue (planned, not yet implemented)
Cleanup:
- Temporary files removed on failure
- Database status remains "processing"
- Manual intervention possible via admin tools
S3 Connectivity:
- Presigned URLs expire after 10 minutes
- Workers cache downloads locally
- Retries with exponential backoff
NATS Connectivity:
- Workers reconnect automatically
- In-flight messages preserved (JetStream)
- Processing resumes after reconnection
Database Connectivity:
- Connection pool handles transient failures
- Workers retry database operations
- Long outages cause worker backlog
Missing Quality: If 720p fails but others succeed:
- Video still usable (240p, 480p available)
- Master playlist excludes failed quality
- Can be reprocessed later
Missing Audio:
- Video processing continues
- Transcript unavailable
- Not critical for playback
Current:
- No authentication (demo)
- S3 presigned URLs provide time-limited access
- NATS has no authentication (local)
Production TODO:
- Add JWT authentication for API
- Use NATS authentication tokens
- Implement user quotas and rate limiting
- Add API keys for programmatic access
At Rest:
- S3: Enable server-side encryption (AES-256)
- MySQL: Encrypt sensitive columns
- Consider: Client-side encryption for uploads
In Transit:
- S3: HTTPS for all operations
- NATS: TLS encryption (configure)
- MySQL: TLS connections
Current:
- File type validation (TODO)
- Size limits (TODO)
- Malicious content scanning (TODO)
Production TODO:
- Validate video formats (only allow MP4, AVI, etc.)
- Scan for malware before processing
- Rate limit uploads per user
- Content moderation (violence, explicit content)
Current:
- Preset: "medium" (balance speed/quality)
- CRF: 22 (constant quality)
- GOP: 48 frames (2 seconds at 24fps)
Tuning:
- Faster encoding: preset="fast" (at some compression-efficiency cost)
- Smaller files: preset="slow" and/or a higher CRF (higher CRF = lower quality, smaller size)
- GPU: NVENC can speed up transcoding dramatically (often ~10x), with some quality tradeoff versus libx264
Current:
- 3 transcoding workers + 1 audio worker = 4x parallelism
- Each worker processes different qualities
Optimization:
- Run multiple instances of each worker type
- Use different machines for different qualities
- Priority queues for urgent videos
Opportunities:
- Thumbnail cache: Generate once, reuse
- CDN: CloudFront in front of S3
- Database: Query result caching
- Warm pool: Pre-started worker containers
Memory:
- Stream large files (don't load entirely)
- Limit concurrent FFmpeg processes
- Monitor with Prometheus metrics
Disk:
- Clean temp files aggressively
- Use separate disk for temp storage
- Monitor disk usage alerts
Network:
- Parallel S3 uploads (multipart)
- Compress logs before sending
- Regional deployments (closer to users)
This architecture balances:
- Simplicity: Easy to understand and debug
- Scalability: Horizontal scaling for workers
- Reliability: Automatic retries and failure recovery
- Performance: Parallel processing and optimized FFmpeg
The event-driven model with NATS JetStream provides flexibility to add new features (like thumbnails, subtitles) without changing existing code.
Questions or suggestions? Open an issue or discussion on GitHub!