yeshu2004/video-pipline


Architecture Deep Dive

This document provides a detailed explanation of the video processing pipeline's architecture, design decisions, and implementation details.


System Overview

The video processing pipeline is a distributed, event-driven system designed for converting uploaded videos into adaptive bitrate streaming (HLS) format with multiple quality levels.

Key Characteristics

  • Distributed: Multiple workers process videos in parallel
  • Event-Driven: NATS JetStream coordinates asynchronous processing
  • Stateless Workers: Workers can be scaled horizontally
  • Idempotent: retry-safe operations, so at-least-once delivery still produces exactly-once effects
  • Resilient: Automatic retries and failure recovery

Design Principles

1. Separation of Concerns

Each component has a single, well-defined responsibility:

  • API Server: Handle HTTP requests and coordinate uploads
  • Workers: Process specific video qualities
  • NATS: Message distribution and event coordination
  • MySQL: Persistent state tracking
  • S3: Durable storage

2. Loose Coupling

Components communicate through:

  • Events (NATS messages) - not direct API calls
  • Shared storage (S3) - not in-memory data
  • Database state (MySQL) - not local files

This allows components to be deployed, scaled, and to fail independently of one another.

3. Eventual Consistency

Video processing is asynchronous:

  • Upload confirmation returns immediately
  • Processing happens in background
  • Status tracked in database
  • Final result appears in S3

4. Fail-Safe Design

  • Workers retry failed operations (max 5 attempts)
  • Temporary files cleaned up after success/failure
  • Dead letter queues for permanent failures
  • Graceful degradation (missing audio/transcript doesn't fail video)

Component Architecture

API Server (main.go)

Responsibilities:

  • Accept video upload requests
  • Generate S3 presigned URLs
  • Publish upload events to NATS
  • Track video state in database

Technology:

  • Gin HTTP framework
  • RESTful endpoints
  • CORS middleware for web clients

Key Operations:

  1. POST /upload/signed/url - Generate a presigned upload URL
  2. POST /upload/confirmation - Trigger processing

Workers (worker/*)

Types:

  1. Transcoding Workers (240p, 480p, 720p)

    • Download raw video from S3
    • Transcode with FFmpeg to target resolution
    • Generate HLS segments (6-second chunks)
    • Write segments to local temporary storage (the Done Worker uploads them to S3)
    • Update database status
    • Publish completion event
  2. Audio Worker

    • Download raw video
    • Extract audio track with FFmpeg
    • Send to Gemini AI for transcription
    • Save transcript as text
    • Update database status
    • Publish completion event
  3. Done Worker

    • Listen for completion events
    • Check if all tasks complete
    • Generate master.m3u8 playlist
    • Upload all HLS files to S3
    • Update database to "done" status
    • Clean up temporary files

NATS JetStream

Stream Configuration:

Name: VIDEO
Subjects: VIDEO.>
Storage: File-based (persistent)
Retention: Limits-based

Event Flow:

VIDEO.uploaded → Workers (240p, 480p, 720p, audio)
                    ↓
Workers → VIDEO.processing.done
                    ↓
         Done Worker → Finalization

Consumer Properties:

  • Durable (survive restarts)
  • Explicit ACK (manual confirmation)
  • 30-second ACK timeout
  • Max 5 delivery attempts

MySQL Database

Schema:

video_processing_state
├── id (VARCHAR, PRIMARY KEY) - Video UUID
├── video_id (VARCHAR) - Original S3 key
├── hls_240_done (BOOLEAN) - 240p complete
├── hls_480_done (BOOLEAN) - 480p complete
├── hls_720_done (BOOLEAN) - 720p complete
├── audio_done (BOOLEAN) - Audio extracted
├── transcript_done (BOOLEAN) - Transcription complete
├── master_done (BOOLEAN) - Master playlist created
├── uploaded (BOOLEAN) - Uploaded to S3
├── status (ENUM) - processing | done | failed
├── created_at (TIMESTAMP)
└── updated_at (TIMESTAMP)

State Transitions:

NULL → INSERT (status='processing')
    ↓
UPDATE flags (hls_*_done, audio_done, etc.)
    ↓
UPDATE (status='done') when all flags true
    ↓
UPDATE (uploaded=true) after S3 upload
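The "all flags true" gate above is the check the Done Worker runs on every completion event. A sketch of it as a pure function over the schema's boolean columns (master_done and uploaded are set after finalization, so they are not part of the gate):

```go
package main

import "fmt"

// processingState mirrors the boolean flags of video_processing_state.
type processingState struct {
	HLS240Done, HLS480Done, HLS720Done bool
	AudioDone, TranscriptDone          bool
}

// readyToFinalize reports whether the Done Worker may flip status to 'done'
// and generate the master playlist.
func readyToFinalize(s processingState) bool {
	return s.HLS240Done && s.HLS480Done && s.HLS720Done &&
		s.AudioDone && s.TranscriptDone
}

func main() {
	s := processingState{HLS240Done: true, HLS480Done: true, HLS720Done: true}
	fmt.Println(readyToFinalize(s)) // false: audio/transcript still pending
	s.AudioDone, s.TranscriptDone = true, true
	fmt.Println(readyToFinalize(s)) // true: safe to finalize
}
```

In practice the check and the status update should happen in one guarded SQL statement (see "Optimistic Locking" below) rather than as a read-then-write, so two Done Worker instances cannot both finalize.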

AWS S3

Bucket Structure:

video-pipline/
├── raw/                  # Original uploads
│   └── UUID.mp4
└── processed/            # Transcoded output
    └── VIDEO_ID/
        ├── master.m3u8   # Master playlist
        ├── 240p/
        │   ├── index.m3u8
        │   └── seg_*.ts
        ├── 480p/
        │   ├── index.m3u8
        │   └── seg_*.ts
        └── 720p/
            ├── index.m3u8
            └── seg_*.ts

URL Formats:

  • Virtual-hosted style: https://bucket.s3.region.amazonaws.com/key
  • Path style: https://s3.region.amazonaws.com/bucket/key ✓ (used)

Path-style is more predictable here: virtual-hosted addressing turns the bucket name into a DNS subdomain, which breaks TLS wildcard certificates for bucket names containing dots and adds a DNS dependency per bucket.
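Both formats are simple string templates. A sketch of the two builders side by side (the region value in the example is an assumption, not taken from the repo's config):

```go
package main

import "fmt"

// pathStyleURL builds the format this project uses:
// https://s3.region.amazonaws.com/bucket/key
func pathStyleURL(region, bucket, key string) string {
	return fmt.Sprintf("https://s3.%s.amazonaws.com/%s/%s", region, bucket, key)
}

// virtualHostedURL is shown for contrast: the bucket becomes a subdomain.
func virtualHostedURL(region, bucket, key string) string {
	return fmt.Sprintf("https://%s.s3.%s.amazonaws.com/%s", bucket, region, key)
}

func main() {
	// "us-east-1" is an assumed region for illustration.
	fmt.Println(pathStyleURL("us-east-1", "video-pipline", "processed/abc/master.m3u8"))
	fmt.Println(virtualHostedURL("us-east-1", "video-pipline", "processed/abc/master.m3u8"))
}
```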


Data Flow

Upload Phase

Client
  ↓ [1] POST /upload/signed/url
API Server
  ↓ [2] GeneratePresignedURL(S3)
S3
  ↓ [3] Return URL
Client
  ↓ [4] PUT video to presigned URL
S3
  ↓ [5] Store video
Client
  ↓ [6] POST /upload/confirmation
API Server
  ↓ [7] INSERT database record
  ↓ [8] PUBLISH VIDEO.uploaded
NATS

Processing Phase

NATS VIDEO.uploaded
  ↓
┌─────────┬─────────┬─────────┬─────────┐
│ 240p    │ 480p    │ 720p    │ Audio   │
│ Worker  │ Worker  │ Worker  │ Worker  │
└─────────┴─────────┴─────────┴─────────┘
     ↓         ↓         ↓         ↓
Download  Download  Download  Download
     ↓         ↓         ↓         ↓
FFmpeg    FFmpeg    FFmpeg    FFmpeg
Transcode Transcode Transcode Extract
     ↓         ↓         ↓         ↓
HLS       HLS       HLS       Gemini
Segments  Segments  Segments  AI
     ↓         ↓         ↓         ↓
UPDATE    UPDATE    UPDATE    UPDATE
DB        DB        DB        DB
     ↓         ↓         ↓         ↓
PUBLISH   PUBLISH   PUBLISH   PUBLISH
done      done      done      done
     └─────────┴─────────┴─────────┘
                   ↓
              Done Worker
                   ↓
            Check if all
            tasks complete
                   ↓
            Generate master
            playlist
                   ↓
            Upload to S3
                   ↓
            UPDATE DB
            (status=done)
                   ↓
            Cleanup temp
            files

Delivery Phase

S3 (processed/VIDEO_ID/)
  ↓
CDN (optional)
  ↓
Client (HLS Player)
  ↓ [1] Request master.m3u8
  ↓ [2] Parse quality options
  ↓ [3] Request index.m3u8 (selected quality)
  ↓ [4] Download segments (seg_*.ts)
  ↓ [5] Adaptive switching based on bandwidth

Event-Driven Model

Why Event-Driven?

Benefits:

  1. Decoupling: Components don't know about each other
  2. Scalability: Add workers without changing code
  3. Resilience: Failures don't cascade
  4. Async: Non-blocking operations
  5. Replay: Can reprocess events if needed

Tradeoffs:

  • More complex than synchronous calls
  • Eventual consistency (not immediate)
  • Debugging is harder (distributed traces needed)

Event Types

Event            Subject                  Payload                         Consumers
Upload Complete  VIDEO.uploaded           {id, key, bucket, timestamp}    All workers
Task Complete    VIDEO.processing.done    {videoId, dirPath, bucket}      Done worker

Consumer Patterns

Fan-out (VIDEO.uploaded):

        NATS
         ↓
    ┌────┼────┬────┐
    ↓    ↓    ↓    ↓
  240p  480p  720p Audio

Fan-in (VIDEO.processing.done):

  240p  480p  720p Audio
    ↓    ↓    ↓    ↓
    └────┼────┴────┘
         ↓
      Done Worker

Exactly-Once Processing

Challenge: Network failures can cause duplicate messages.

Solution:

  1. Idempotent operations: Database updates use WHERE clauses
  2. Message deduplication: NATS JetStream uses message IDs
  3. State checks: Done worker checks flags before acting

Example:

-- Idempotent update: the WHERE clause makes redelivery a no-op
UPDATE video_processing_state
SET hls_240_done = true
WHERE id = ? AND hls_240_done = false

State Management

State Lifecycle

            ┌──────────┐
            │  START   │
            └────┬─────┘
                 ↓
         ┌──────────────┐
         │  PROCESSING  │ ← Initial state
         └──────┬───────┘
                ↓
    ┌───────────────────────┐
    │  All flags = true?    │
    └───┬───────────────┬───┘
        NO              YES
        ↓                ↓
    Continue      ┌──────────┐
    Processing    │   DONE   │
                  └──────────┘

State Consistency

Optimistic Locking:

UPDATE video_processing_state
SET status = 'done'
WHERE id = ?
  AND status = 'processing'  -- prevents double-finalization
  AND hls_240_done = true
  AND hls_480_done = true
  AND hls_720_done = true
  AND audio_done = true
  AND transcript_done = true

Atomic Operations:

  • Single database transaction per state change
  • All-or-nothing S3 uploads (checked before marking complete)

Scalability

Horizontal Scaling

Workers:

1 worker  → 1 video at a time
2 workers → 2 videos in parallel
N workers → N videos in parallel

Deploy multiple worker instances:

# Kubernetes
kubectl scale deployment worker-240p --replicas=5

# Docker Compose
docker-compose up --scale worker-240p=5

NATS:

  • Handles thousands of messages/second
  • Workers pull from queue (competing consumers)
  • No coordination needed

Bottlenecks:

  1. Database: Connection pool (tunable)
  2. S3: Request limits (usually not a problem)
  3. FFmpeg: CPU/memory per worker
  4. Disk: Temporary storage per worker

Vertical Scaling

FFmpeg Performance:

  • CPU: More cores = faster transcoding
  • Memory: 2-4GB per worker recommended
  • Disk: SSD for temp files (huge speedup)

Recommendations:

  • 240p/480p workers: 2 CPU, 2GB RAM
  • 720p workers: 4 CPU, 4GB RAM
  • Audio workers: 1 CPU, 1GB RAM

Failure Handling

Worker Failures

Scenarios:

  1. Worker crashes mid-processing
  2. FFmpeg command fails
  3. S3 upload fails
  4. Database connection lost

Handling:

Failure
  ↓
NATS: NAK (negative acknowledgment)
  ↓
Message returns to queue
  ↓
Another worker picks it up
  ↓
Retry (up to MaxDeliver=5 times)
  ↓
If still failing → Dead Letter Queue (not yet implemented)

Cleanup:

  • Temporary files removed on failure
  • Database status remains "processing"
  • Manual intervention possible via admin tools

Network Failures

S3 Connectivity:

  • Presigned URLs expire after 10 minutes
  • Workers cache downloads locally
  • Retries with exponential backoff

NATS Connectivity:

  • Workers reconnect automatically
  • In-flight messages preserved (JetStream)
  • Processing resumes after reconnection

Database Connectivity:

  • Connection pool handles transient failures
  • Workers retry database operations
  • Long outages cause worker backlog

Partial Failures

Missing Quality: If 720p fails but others succeed:

  • Video still usable (240p, 480p available)
  • Master playlist excludes failed quality
  • Can be reprocessed later

Missing Audio:

  • Video processing continues
  • Transcript unavailable
  • Not critical for playback

Security Considerations

Authentication & Authorization

Current:

  • No authentication (demo)
  • S3 presigned URLs provide time-limited access
  • NATS has no authentication (local)

Production TODO:

  • Add JWT authentication for API
  • Use NATS authentication tokens
  • Implement user quotas and rate limiting
  • Add API keys for programmatic access

Data Protection

At Rest:

  • S3: Enable server-side encryption (AES-256)
  • MySQL: Encrypt sensitive columns
  • Consider: Client-side encryption for uploads

In Transit:

  • S3: HTTPS for all operations
  • NATS: TLS encryption (configure)
  • MySQL: TLS connections

Input Validation

Current:

  • File type validation (TODO)
  • Size limits (TODO)
  • Malicious content scanning (TODO)

Production TODO:

  • Validate video formats (only allow MP4, AVI, etc.)
  • Scan for malware before processing
  • Rate limit uploads per user
  • Content moderation (violence, explicit content)

Performance Optimization

FFmpeg Settings

Current:

  • Preset: "medium" (balance speed/quality)
  • CRF: 22 (constant quality)
  • GOP: 48 frames (2 seconds at 24fps)

Tuning:

  • Faster encodes: preset="fast" or "veryfast" (lower quality per bit)
  • Smaller files: preset="slow" and/or higher CRF
  • GPU: NVENC hardware encoding for large speedups (often cited as ~10x)

Parallelization

Current:

  • 3 transcoding workers + 1 audio worker = 4x parallelism
  • Each worker processes different qualities

Optimization:

  • Run multiple instances of each worker type
  • Use different machines for different qualities
  • Priority queues for urgent videos

Caching

Opportunities:

  1. Thumbnail cache: Generate once, reuse
  2. CDN: CloudFront in front of S3
  3. Database: Query result caching
  4. Warm pool: Pre-started worker containers

Resource Management

Memory:

  • Stream large files (don't load entirely)
  • Limit concurrent FFmpeg processes
  • Monitor with Prometheus metrics

Disk:

  • Clean temp files aggressively
  • Use separate disk for temp storage
  • Monitor disk usage alerts

Network:

  • Parallel S3 uploads (multipart)
  • Compress logs before sending
  • Regional deployments (closer to users)

Conclusion

This architecture balances:

  • Simplicity: Easy to understand and debug
  • Scalability: Horizontal scaling for workers
  • Reliability: Automatic retries and failure recovery
  • Performance: Parallel processing and optimized FFmpeg

The event-driven model with NATS JetStream provides flexibility to add new features (like thumbnails, subtitles) without changing existing code.


Questions or suggestions? Open an issue or discussion on GitHub!

About

A distributed, event-driven video processing system for adaptive bitrate streaming (ABS). It converts uploaded videos into multiple resolutions (240p, 480p, 720p) with HLS segmentation, enabling bandwidth-adaptive playback similar to YouTube or Netflix.
