Research project exploring what video compression looks like when designed from scratch for GPU parallelism, rather than adapting CPU-era algorithms.
Rust + wgpu compute shaders (WGSL). Cross-platform: Metal, Vulkan, DX12, WebGPU/WASM. Patent-free.
Traditional codecs (H.264, HEVC, AV1) are shaped by decades of CPU constraints — sequential processing, complex prediction modes, intricate entropy coding with state chains. GPUs offer thousands of parallel threads, but these codecs can't exploit them.
GNC asks: if you start from zero with a GPU-first mindset, what do you end up with?
The answer so far: tile-independent processing, fully parallel entropy coding (256 independent streams per tile), and wavelet transforms that map naturally to GPU workgroups. The result competes with JPEG on compression while encoding/decoding at 40–70 fps at 1080p — including a full I/P/B video pipeline.
Spatial codec (I+P+B) is production-ready — 40 dB PSNR at 2–8 bpp, 20–60 fps encode/decode at 1080p, lossless mode, full video pipeline with motion estimation.
Temporal wavelet mode (--temporal-wavelet haar) is experimental — works for low-motion content but loses 2–5 dB on high-motion sequences; off by default.
See RESEARCH_LOG.md for detailed benchmarks and analysis.
| Quality | PSNR | BPP | Encode | Decode |
|---|---|---|---|---|
| q=25 (high compression) | 33.2 dB | 1.71 | 39 fps | 72 fps |
| q=50 (balanced) | 37.7 dB | 2.37 | 40 fps | 60 fps |
| q=75 (good quality) | 42.8 dB | 4.01 | 40 fps | 59 fps |
| q=90 (high quality) | 50.5 dB | 8.90 | 40 fps | 63 fps |
31.7 fps (1080p, q=75, keyframe interval 8, I+P+B frames)
Everything runs as wgpu compute shaders. The pipeline:
RGB → YCoCg-R → Wavelet → Quantize → Entropy Code → Bitstream
↕ ↕ ↕ ↕
(lossless (CDF 9/7 (adaptive, (Rice+ZRL:
integer) or 5/3) CfL, AQ) 256 streams)
Each tile (256x256) is fully independent — no cross-tile dependencies. This gives parallelism, random access, and error resilience for free. See docs/PIPELINE.md for a detailed stage-by-stage breakdown.
- Color space — YCoCg-R via lifting (integer-exact, lossless-capable)
- Wavelet transform — CDF 9/7 for lossy (q=1–99), LeGall 5/3 for lossless (q=100), 3–4 decomposition levels
- Adaptive quantization — Per-block variance analysis on LL subband, geometric mean normalization, 3×3 spatial smoothing
- Quantization — Uniform scalar with perceptual subband weights, dead zone, adaptive QP from AQ weight map. Fused quantize+histogram kernel when CfL is off.
- Chroma-from-Luma (CfL) — Per-tile per-subband least-squares alpha (14-bit), active at q=50–85. Encodes chroma residuals instead of raw coefficients.
- Entropy coding — Rice+ZRL (default): significance map + Golomb-Rice + zero-run-length, 256 independent streams per tile. rANS (32 streams), Huffman (64-symbol), and Bitplane also available but parked.
- I/P/B frames — motion-compensated prediction with half-pel bilinear interpolation
- Motion estimation — hierarchical coarse-to-fine block matching (16x16, ±32px search)
- Container — GNV1 format with frame index table, keyframe seeking
- Error resilience — per-tile CRC-32 checksums, corrupt tile detection and recovery
cargo build --releasegnc encode -i input.png -o output.gpuc -q 75
gnc decode -i output.gpuc -o output.pnggnc benchmark -i input.png -q 75 # Rice+ZRL (default)
gnc benchmark -i input.png -q 75 --rans # rANS entropy (better compression)gnc rd-curve -i input.png # sweep q=10..100, output CSV
gnc rd-curve -i input.png --compare-codecs # also compare vs JPEG, JPEG 2000gnc encode-sequence -i "frames/%04d.png" -o video.gnv -q 75 --keyframe-interval 8
gnc decode-sequence -i video.gnv -o "output/%04d.png"
gnc decode-sequence -i video.gnv -o "output/%04d.png" --seek 5.0 # seek to 5scargo test --release # 148 tests: unit, regression, conformancecd test_material && bash fetch_test_frames.shDownloads representative broadcast frames from Xiph.org (requires ffmpeg and curl).
GNC has four entropy coding backends, all running as GPU compute shaders:
| Coder | Streams/tile | Compression | Speed | Patent risk |
|---|---|---|---|---|
| Rice+ZRL (default) | 256 | 4.01 bpp @ q=75 | 1.5–2× faster | None |
rANS (--rans) |
32 | 4.22 bpp @ q=75 | Baseline | Possible (MS patent) |
| Huffman (parked) | 256 | 64-symbol + escape | Moderate | None |
| Bitplane (parked) | Per-block | Sign + magnitude bitplanes | Moderate | None |
Rice is the default because it eliminates the sequential state chain that limits rANS. Each of the 256 streams encodes independently — no shared state, no synchronization, minimal shared memory (< 1 KB vs rANS's 16 KB frequency tables). rANS, Huffman, and Bitplane are available but parked — they'll be revisited once speed targets are met.
Smooth, monotonic quality scaling from lossless to extreme compression:
q=100 Lossless — bit-exact round-trip (LeGall 5/3 integer wavelet)
q=90 High quality — 49 dB PSNR, near-transparent
q=75 Production — 42 dB PSNR, good general-purpose quality
q=50 Balanced — 37 dB PSNR, CfL + adaptive quantization
q=25 Compressed — 33 dB PSNR, broadcast-suitable
q=5 Extreme — 27 dB PSNR, 0.47 bpp (preview/thumbnail)
The full decoder compiles to WebAssembly (263 KB) and runs in browsers via WebGPU:
wasm-pack build --target web --releaseBrowser demo in examples/web/index.html.
src/
├── lib.rs Core types, quality_preset(), codec config
├── main.rs CLI (encode, decode, benchmark, rd-curve, ...)
├── format.rs Bitstream serialization (GP11 frame, GNV1 sequence)
├── encoder/
│ ├── pipeline.rs Encoder orchestration
│ ├── sequence.rs Video sequence, B-frames, rate control
│ ├── rice.rs CPU Rice encoder/decoder (reference)
│ ├── rice_gpu.rs GPU Rice encoder/decoder
│ ├── rans.rs CPU rANS encoder/decoder
│ ├── rans_gpu_encode.rs GPU rANS encoder
│ ├── huffman_gpu.rs GPU Huffman encoder
│ ├── motion.rs Motion estimation and compensation
│ ├── cfl.rs Chroma-from-Luma prediction
│ ├── adaptive.rs Adaptive quantization
│ ├── fused_block.rs Block DCT-8×8 mega-kernel
│ └── ...
├── decoder/
│ ├── pipeline.rs Decoder orchestration
│ ├── frame_data.rs Frame data upload
│ └── gpu_work.rs GPU dispatch
├── shaders/ 42 WGSL compute shaders
│ ├── rice_encode.wgsl, rice_decode.wgsl
│ ├── rans_encode.wgsl, rans_decode.wgsl
│ ├── transform_97.wgsl, transform_53.wgsl
│ ├── block_match.wgsl, motion_compensate.wgsl
│ └── ...
├── bench/ BD-rate, codec comparison, quality metrics
└── experiments/ Experimental features
tests/
├── quality_regression.rs Golden-baseline regression (q=25/50/75/90)
├── conformance.rs 5 conformance bitstreams + corruption tests
└── golden_baselines.toml Reference PSNR/SSIM/bpp values
docs/PIPELINE.md— Detailed encode pipeline descriptiondocs/BITSTREAM_SPEC.md— Complete bitstream format specification (GP11 frame, GNV1 sequence)RESEARCH_LOG.md— Experiment log with hypotheses, results, analysis
All code is patent-free. No H.264/5/6 patent pool or MPEG-LA encumbered techniques. All dependencies are open source.