Serverless thumbnail and proxy generation for high-resolution media on VAST DataEngine.
oiio-proxy-generator is a VAST DataEngine function that automatically generates thumbnails and H.264 proxies from EXR, DPX, and other high-resolution media files as they are ingested into a VAST S3 bucket. It applies color space transforms via OpenColorIO, persists generation metadata to VAST DataBase, and publishes completion events via Kafka for downstream consumption.
Media file uploaded to S3 bucket (.exr, .dpx, .tif, .tiff)
--> VAST DataEngine ElementCreated trigger
--> oiio-proxy-generator function container
--> S3 download via boto3
--> OCIO color space detection
--> OpenImageIO resize (oiiotool)
--> JPEG thumbnail generation (sRGB, 256x256)
--> ffmpeg H.264 encoding (Rec.709, 1920x1080)
--> S3 upload for outputs
--> Persist metadata to VAST DataBase
--> Publish Kafka event (spaceharbor.proxy topic)
--> Return structured JSON result
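The resize and encode steps in the flow above could be sketched as follows. This is a minimal illustration, not the code in oiio_processor.py: the helper names (`is_supported`, `thumbnail_command`, `proxy_command`) are hypothetical, the color space names depend on the active OCIO config, and the ffmpeg input is assumed to be an intermediate image already converted to Rec.709.

```python
from pathlib import PurePosixPath

# Extensions listed in the flow above as triggering the function.
SUPPORTED_EXTENSIONS = {".exr", ".dpx", ".tif", ".tiff"}


def is_supported(s3_key: str) -> bool:
    """Return True when the object key has one of the triggering extensions."""
    return PurePosixPath(s3_key).suffix.lower() in SUPPORTED_EXTENSIONS


def thumbnail_command(src: str, dst: str, source_space: str) -> list[str]:
    """Build an oiiotool call: convert to sRGB, resize to 256x256, write dst.

    "sRGB" is an assumed color space name; it must exist in the OCIO config.
    """
    return [
        "oiiotool", src,
        "--colorconvert", source_space, "sRGB",
        "--resize", "256x256",
        "-o", dst,
    ]


def proxy_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg call: scale to 1920x1080 and encode H.264 into dst."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-vf", "scale=1920:1080",
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        dst,
    ]
```

Each command list can be passed to `subprocess.run(cmd, check=True)` so that a non-zero exit from oiiotool or ffmpeg raises an exception instead of failing silently.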
Performance: The function processes files concurrently with minimal ephemeral disk usage; S3 transfers and OCIO transforms are streamed. Processing typically takes 5-30 seconds, depending on file size and resolution.
| Output | Format | Resolution | Color Space | Purpose |
|---|---|---|---|---|
| Thumbnail | JPEG | 256x256 | sRGB | Web preview, UI display |
| Proxy | H.264 MP4 | 1920x1080 | Rec.709 | VFX review, editing, streaming |
Metadata persisted to VAST DataBase:
- Detected source color space
- Thumbnail and proxy S3 locations
- File sizes (source, thumb, proxy)
- Processing duration
- Generation timestamp and version
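As a rough illustration of the metadata listed above, the record could be modeled as a dataclass and serialized to JSON for the Kafka completion event. Only `file_id`, `thumbnail_s3_key`, and `proxy_s3_key` appear elsewhere in this README; every other field name, and the `to_event` helper, is an assumption. See docs/DATABASE_SCHEMA.md for the actual proxy_outputs columns.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ProxyRecord:
    # Illustrative field names; the real table schema may differ.
    file_id: str
    source_color_space: str
    thumbnail_s3_key: str
    proxy_s3_key: str
    source_size_bytes: int
    thumbnail_size_bytes: int
    proxy_size_bytes: int
    processing_seconds: float
    generated_at: str  # ISO 8601 timestamp
    generator_version: str


def to_event(record: ProxyRecord) -> bytes:
    """Serialize the record as a UTF-8 JSON payload suitable for a Kafka producer."""
    return json.dumps(asdict(record), sort_keys=True).encode("utf-8")


record = ProxyRecord(
    file_id="abc123",
    source_color_space="ACEScg",
    thumbnail_s3_key="thumbs/frame.0001.jpg",
    proxy_s3_key="proxies/frame.0001.mp4",
    source_size_bytes=48_000_000,
    thumbnail_size_bytes=18_000,
    proxy_size_bytes=420_000,
    processing_seconds=7.4,
    generated_at="2025-01-01T00:00:00Z",
    generator_version="0.0.0",
)
payload = to_event(record)
```

Keeping one record type for both the database row and the event payload means downstream consumers of the spaceharbor.proxy topic see the same fields that are persisted.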
functions/oiio_proxy_generator/
  main.py                    # DataEngine handler (init + handler)
  vast_db_persistence.py     # VAST DataBase persistence, file_id computation
  oiio_processor.py          # OpenImageIO thumbnail/proxy generation
  ocio_transform.py          # OCIO color space detection and transforms
  publisher.py               # Kafka event publishing
  requirements.txt           # Python dependencies
  Aptfile                    # System packages (libopenimageio-dev, ffmpeg, etc.)
  Dockerfile.fix             # LD_LIBRARY_PATH fix for CNB buildpack images
docs/
  DEPLOYMENT.md              # Build, deploy, and configure guide
  DATABASE_SCHEMA.md         # proxy_outputs table schema, queries
  ARCHITECTURE.md            # Event flow, module design, parallel pipeline
  CONFIGURATION.md           # Environment variables reference
  TROUBLESHOOTING.md         # Common issues and solutions
# Clone
git clone https://github.com/ssotoa70/oiio-proxy-generator.git
cd oiio-proxy-generator
# Install dependencies (local development)
pip install -r functions/oiio_proxy_generator/requirements.txt
# Run tests (no VAST cluster required)
pytest functions/oiio_proxy_generator/test_vast_db_persistence.py -v
# Build container image
vastde functions build oiio-proxy-generator --target functions/oiio_proxy_generator --pull-policy never
# See docs/DEPLOYMENT.md for the full deployment guide

| Document | Description |
|---|---|
| Deployment Guide | Build, push, create function, configure pipeline, manage triggers |
| Database Schema | proxy_outputs table definition, JOIN with exr-inspector, query examples |
| Architecture | Event flow, module responsibilities, design decisions, parallel pipeline |
| Configuration | Environment variables, credentials, secrets, pipeline setup |
| Troubleshooting | Common issues, oiiotool/ffmpeg errors, OCIO config, database failures |
- VAST Cluster 5.4+ with DataEngine enabled
- vastde CLI v5.4.1+
- Docker with min-api-version: "1.38" (see Troubleshooting)
- Python 3.12 (container runtime)
- S3 bucket with DataEngine element trigger configured
- Database-enabled bucket for VAST DataBase persistence (optional, but recommended)
This function operates in parallel with exr-inspector, sharing the same VAST DataBase schema (exr_metadata_2). Both functions:
- Trigger on the same ElementCreated event (EXR files)
- Use identical file_id computation (SHA256-based)
- Persist to the same schema, different tables
This enables cross-functional queries via SQL JOINs:
SELECT e.file_path, e.multipart_count, p.thumbnail_s3_key, p.proxy_s3_key
FROM exr_metadata_2.files e
JOIN exr_metadata_2.proxy_outputs p ON e.file_id = p.file_id
WHERE e.is_deep = true
ORDER BY e.size_bytes DESC;

For shared trigger configuration, see DEPLOYMENT.md.
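The JOIN above only works if both functions derive the same file_id for the same object. A minimal sketch of a deterministic SHA-256 identifier is shown below. The choice of hashing the string "bucket/key" is an assumption made for illustration; the real inputs are defined in vast_db_persistence.py and may differ.

```python
import hashlib


def compute_file_id(bucket: str, key: str) -> str:
    """Return the hex SHA-256 digest of 'bucket/key'.

    Hashing the bucket and key is a hypothetical choice; what matters for
    the JOIN is that both functions hash exactly the same inputs.
    """
    return hashlib.sha256(f"{bucket}/{key}".encode("utf-8")).hexdigest()


file_id = compute_file_id("media-ingest", "shots/sh010/frame.0001.exr")
```

Because the identifier depends only on its inputs, the two functions can run independently and in any order while still producing rows that match on file_id.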
The function currently generates two outputs per file: a thumbnail (256x256) and a proxy (native resolution, capped at 1920x1080).
A future release will add a third output, a full-resolution preview, for detailed inspection and QC workflows:
| Output | Resolution | Use Case |
|---|---|---|
| Thumbnail | 256x256 | Grid views, asset browsers |
| Proxy | 1920x1080 (or native if smaller) | Quick review, editorial playback |
| Preview | Full source resolution | Detailed inspection, QC, pixel-level review |
Expected cost: +15-20% processing time per file. The expensive EXR decode + colorconvert happens once; adding a third branch only costs resize + JPEG encode.
Storage impact: Preview files will be significantly larger (~2-10 MB for 4K sources) than the capped proxy (~100-500 KB).
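The sizing rule in the table above (proxy capped at 1920x1080 or native if smaller, preview at full source resolution) could be expressed as below. This is a sketch under assumptions: the function names are hypothetical, and preserving the aspect ratio and rounding down to even dimensions are choices made here, not confirmed behavior of the function.

```python
def capped_size(src_w: int, src_h: int,
                max_w: int = 1920, max_h: int = 1080) -> tuple[int, int]:
    """Fit the source within max_w x max_h, keeping aspect ratio, never upscaling."""
    scale = min(max_w / src_w, max_h / src_h, 1.0)
    # Round down to even numbers: H.264 with yuv420p needs even width and height.
    width = max(2, int(src_w * scale) // 2 * 2)
    height = max(2, int(src_h * scale) // 2 * 2)
    return width, height


def plan_outputs(src_w: int, src_h: int) -> dict[str, tuple[int, int]]:
    """Target dimensions for each output branch after the single decode step."""
    return {
        "thumbnail": (256, 256),
        "proxy": capped_size(src_w, src_h),
        "preview": (src_w, src_h),
    }


plan = plan_outputs(3840, 2160)
```

Computing all target sizes up front fits the decode-once design: the decoded, color-converted image is produced a single time and each branch only resizes and encodes.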
- Sequence detection and assembly (handled by future sequence-assembler function)
- Middle-frame thumbnail for sequences (instead of per-frame)
- Poster-frame override (user-designatable hero frame)
- OCIO ACES config support for facility-specific color pipelines