AI-powered ultra-low bitrate audio codec CLI built on SNAC
Compress audio at 0.98–2.6 kbps — up to 392x smaller than WAV — with near-original quality using neural audio codecs.
WAV: 1 min speech = 2.88 MB (24 kHz, 16-bit mono)
MP3: 1 min speech = 960 KB (128 kbps)
SNAC: 1 min speech = 7.35 KB (0.98 kbps) ← 392:1 compression vs. WAV
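The sizes above follow from size = bitrate × duration / 8. A quick sanity check, assuming the WAV is 24 kHz, 16-bit mono and the MP3 is 128 kbps (these parameters are inferred from the quoted sizes, not stated by the project; `size_bytes` is a hypothetical helper):

```python
def size_bytes(bitrate_bps: float, seconds: float) -> float:
    """Size of a constant-bitrate stream, ignoring container overhead."""
    return bitrate_bps * seconds / 8

# Assumed parameters, inferred from the sizes quoted above.
WAV_BPS = 24_000 * 16 * 1   # sample rate * bit depth * channels
MP3_BPS = 128_000           # 128 kbps
SNAC_BPS = 980              # 0.98 kbps

wav = size_bytes(WAV_BPS, 60)
mp3 = size_bytes(MP3_BPS, 60)
snac = size_bytes(SNAC_BPS, 60)

print(f"WAV  {wav / 1e6:.2f} MB")
print(f"MP3  {mp3 / 1e3:.0f} KB")
print(f"SNAC {snac / 1e3:.2f} KB  ({wav / snac:.0f}:1)")
```

Under these assumptions the ratio works out to about 391.8, which rounds to the 392:1 quoted above.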
- Ultra-low bitrate: 0.98 kbps (speech) to 2.6 kbps (music)
- Near-original quality: MUSHRA 88.4/100 for speech at 0.98 kbps
- 3 pretrained models: Optimized for speech (24kHz), music (32kHz), high-fidelity (44kHz)
- Custom .snac format: Compact binary format with 44-byte header + token data
- Quality metrics: SNR, SI-SNR, spectral distance, PESQ (optional)
- Streaming mode: Chunked encoding for large files (1-hour+ podcasts)
- GPU acceleration: CUDA, MPS (Apple Silicon), and CPU support
- Rich terminal UI: Progress bars, colored tables, styled panels
Requires Python 3.10+
pip install audioquant
# With quality metrics (optional)
pip install audioquant[metrics]
# Install from source (development)
git clone https://github.com/wjddusrb03/audioquant
cd audioquant
pip install -e ".[dev]"

# Compress audio (WAV 2.88MB → SNAC 7.35KB)
audioquant compress podcast.wav
# Decompress back to WAV
audioquant decompress podcast.snac
# Get file info
audioquant info podcast.snac
# Compare all models
audioquant compare podcast.wav
# Benchmark across multiple files
audioquant benchmark file1.wav file2.mp3 file3.flac

Compress an audio file to .snac format.
audioquant compress input.wav # Default: snac_24khz
audioquant compress input.wav -m snac_44khz # High-quality music
audioquant compress input.wav -o output.snac # Custom output path
audioquant compress input.wav -d cuda # Force GPU
audioquant compress input.wav --no-progress # No progress bar

Output:
╭──────── Compression Complete ────────╮
│ Input:  podcast.wav (2.88 MB)        │
│ Output: podcast.snac (7.35 KB)       │
│ Model:  snac_24khz                   │
│ Ratio:  392:1                        │
│ Time:   0.85s (7.1x realtime)        │
│ Device: CUDA (NVIDIA RTX 4090)       │
╰──────────────────────────────────────╯
Decompress a .snac file back to audio.
audioquant decompress podcast.snac # Default: WAV
audioquant decompress podcast.snac --format flac # FLAC output
audioquant decompress podcast.snac -o restored.wav # Custom path
audioquant decompress podcast.snac -d cuda # Force GPU

Display metadata about any audio or .snac file.
audioquant info podcast.wav # Audio file info
audioquant info podcast.snac # SNAC file info

Compare compression across all three SNAC models.
audioquant compare input.wav # Table output
audioquant compare input.wav --json # JSON output
audioquant compare input.wav -m snac_24khz,snac_44khz # Specific models
audioquant compare input.wav --no-metrics # Skip quality metrics

Output:
Benchmark Results
┌────────────┬──────┬────────┬───────┬───────┐
│ Model      │ Rate │ Size   │ Ratio │ SNR   │
├────────────┼──────┼────────┼───────┼───────┤
│ snac_24khz │ 24k  │ 7.3 KB │ 392:1 │ 28.5  │
│ snac_32khz │ 32k  │ 14.2KB │ 203:1 │ 31.2  │
│ snac_44khz │ 44k  │ 19.5KB │ 148:1 │ 33.8  │
└────────────┴──────┴────────┴───────┴───────┘
Benchmark models across multiple audio files.
audioquant benchmark *.wav # All WAV files
audioquant benchmark file1.wav file2.mp3 --json # JSON output
audioquant benchmark file1.wav -m snac_24khz,snac_44khz # Specific models
audioquant benchmark file1.wav -d cuda # Force GPU

Encode large files using chunked streaming mode (lower memory usage).
audioquant stream long_podcast.wav -o podcast.snac
audioquant stream lecture.wav -o lecture.snac --chunk-duration 2.0
audioquant stream lecture.wav -o lecture.snac -m snac_44khz -d cuda

| Model | Sample Rate | Bitrate | Quality (MUSHRA) | Best For | Params |
|---|---|---|---|---|---|
| snac_24khz | 24,000 Hz | 0.98 kbps | 88.4 | Speech, podcasts | 19.8M |
| snac_32khz | 32,000 Hz | 1.9 kbps | — | Music, sound effects | 54.5M |
| snac_44khz | 44,100 Hz | 2.6 kbps | — | High-fidelity music | 54.5M |
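
The sample rates in this table determine how many samples each chunk holds in streaming mode (`--chunk-duration`). A minimal sketch of fixed-duration chunking, assuming chunks are consecutive, non-overlapping slices; the actual `streaming.py` may overlap or pad chunks, and `iter_chunks` is a hypothetical name:

```python
from typing import Iterator, Sequence

# Sample rates taken from the model table.
SAMPLE_RATES = {"snac_24khz": 24_000, "snac_32khz": 32_000, "snac_44khz": 44_100}


def iter_chunks(samples: Sequence[float], model: str,
                chunk_duration: float = 2.0) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most chunk_duration seconds each."""
    chunk_len = int(chunk_duration * SAMPLE_RATES[model])
    if chunk_len <= 0:
        raise ValueError("chunk_duration must be positive")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


# Five seconds of silence at 24 kHz, split into 2-second chunks.
audio = [0.0] * (5 * 24_000)
sizes = [len(chunk) for chunk in iter_chunks(audio, "snac_24khz", chunk_duration=2.0)]
print(sizes)  # [48000, 48000, 24000]
```

Only one chunk needs to be in memory at a time, which is where the lower memory usage of streaming mode would come from under this assumption.
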
| Codec | Bitrate | Speech Quality | Music Quality | CLI Tool |
|---|---|---|---|---|
| SNAC (AudioQuant) | 0.98 kbps | 88.4 | 76.8 | ✅ |
| EnCodec (Meta) | 1.5 kbps | 78.3 | 64.4 | ✅ |
| DAC (Descript) | 2.5 kbps | 85.0 | 54.0 | ✅ |
Audio File → Resample → AI Encoder → Multi-Scale Tokens → .snac File
- Audio File: WAV/MP3 input
- Resample: to the model's sample rate
- AI Encoder: SNAC neural network
- Multi-Scale Tokens: 3-4 RVQ levels at different temporal rates
- .snac File: 44-byte header + uint16 tokens

SNAC uses Residual Vector Quantization (RVQ) at multiple temporal scales (rates vary by model):
- Level 0 (coarse, 10–14 Hz): Captures overall melody and tone
- Level 1 (medium, 21–29 Hz): Captures timbre and emotion
- Level 2 (fine, 42–57 Hz): Captures pronunciation and detail
- Level 3 (finest, 83–115 Hz): High-frequency detail (32kHz/44kHz models only)
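
The per-model bitrates follow from these token rates: each level emits tokens at its own rate, and each token costs log2(codebook size) bits. A rough check, assuming a 4096-entry codebook (12 bits per token) and the approximate per-level rates below; both come from the upstream SNAC description rather than from this README, so treat them as assumptions:

```python
import math

CODEBOOK_SIZE = 4096  # assumed: 12 bits per token

# Assumed approximate token rates per level, in Hz (coarse to fine).
TOKEN_RATES_HZ = {
    "snac_24khz": [12, 23, 47],
    "snac_32khz": [10, 21, 42, 83],
    "snac_44khz": [14, 29, 57, 115],
}


def bitrate_kbps(rates_hz: list[int]) -> float:
    """Total bitrate = (tokens per second over all levels) * bits per token."""
    bits_per_token = math.log2(CODEBOOK_SIZE)
    return sum(rates_hz) * bits_per_token / 1000


for model, rates in TOKEN_RATES_HZ.items():
    print(f"{model}: {bitrate_kbps(rates):.2f} kbps")
```

With these inputs the results are 0.98, 1.87 and 2.58 kbps, in line with the 0.98 / 1.9 / 2.6 kbps quoted for the three models.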
┌─────────────────────────────────────┐
│ Header (44 bytes) │
│ Magic: "SNAC" │
│ Version: 1 │
│ Model: "snac_24khz" │
│ Sample rate, duration, channels │
├─────────────────────────────────────┤
│ Token Data │
│ Level 0: [count] [tokens...] │
│ Level 1: [count] [tokens...] │
│ Level 2: [count] [tokens...] │
└─────────────────────────────────────┘
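
To make the layout concrete, here is one way a 44-byte header and a token level could be packed with Python's `struct` module. The field order, field widths, and the uint32 `count` are assumptions chosen so the header totals 44 bytes; the real layout in `format.py` may differ, and all function names here are hypothetical.

```python
import struct

# Hypothetical little-endian layout, 44 bytes total:
#   4s  magic "SNAC"
#   H   format version
#   H   channel count
#   I   sample rate in Hz
#   d   duration in seconds
#   24s model name, NUL-padded ASCII
HEADER = struct.Struct("<4sHHId24s")
assert HEADER.size == 44


def pack_header(model: str, sample_rate: int, duration: float,
                channels: int = 1, version: int = 1) -> bytes:
    return HEADER.pack(b"SNAC", version, channels, sample_rate,
                       duration, model.encode("ascii"))


def unpack_header(blob: bytes) -> dict:
    magic, version, channels, sample_rate, duration, model = \
        HEADER.unpack(blob[:HEADER.size])
    if magic != b"SNAC":
        raise ValueError("not a .snac file")
    return {
        "version": version,
        "channels": channels,
        "sample_rate": sample_rate,
        "duration": duration,
        "model": model.rstrip(b"\0").decode("ascii"),
    }


def pack_level(tokens: list[int]) -> bytes:
    """One token level: assumed uint32 count followed by uint16 tokens."""
    return struct.pack(f"<I{len(tokens)}H", len(tokens), *tokens)


header = pack_header("snac_24khz", 24_000, 60.0)
print(len(header), unpack_header(header)["model"])
```

Checking the magic bytes before reading anything else lets a reader reject non-.snac input early, and a fixed-size header means the token data always starts at the same offset.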
| Format | Read | Write |
|---|---|---|
| WAV | ✅ | ✅ |
| MP3 | ✅ | ✅ |
| FLAC | ✅ | ✅ |
| OGG | ✅ | ✅ |
| OPUS | ✅ | ✅ |
| SNAC | ✅ | ✅ |
audioquant/
├── src/audioquant/
│ ├── __init__.py # Package version
│ ├── models.py # Dataclasses & model registry
│ ├── codec.py # SNAC wrapper (encode/decode)
│ ├── audio_io.py # Audio file I/O (WAV/MP3/FLAC)
│ ├── format.py # .snac binary file format
│ ├── metrics.py # Quality metrics (SNR, PESQ, spectral)
│ ├── streaming.py # Chunked encode/decode for large files
│ ├── benchmark.py # Cross-model comparison
│ ├── display.py # Rich terminal output
│ └── cli.py # Click CLI commands
├── tests/ # Test suite
├── pyproject.toml
├── README.md
├── README_KO.md
└── LICENSE
# Run all fast tests (no SNAC model required)
pytest tests/ -v
# Run specific test file
pytest tests/test_format.py -v
pytest tests/test_metrics.py -v

| Package | Purpose |
|---|---|
| snac | SNAC neural audio codec |
| torch | Tensor operations |
| torchaudio | Audio I/O and resampling |
| click | CLI framework |
| rich | Terminal formatting |
| numpy | Numerical operations |
| pesq (optional) | PESQ quality metric |
| Metric | Description | Range / Interpretation |
|---|---|---|
| SNR | Signal-to-Noise Ratio | Higher = better (dB) |
| SI-SNR | Scale-Invariant SNR | Higher = better (dB) |
| Spectral Distance | Multi-resolution STFT distance | Lower = better |
| PESQ | Perceptual speech quality | 1.0–4.5 |
| Compression Ratio | Original / compressed size | Higher = smaller file |
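
For reference, SNR and SI-SNR can be written in a few lines. This sketch uses the standard textbook definitions and only the standard library; it is not taken from `metrics.py`, which may differ in details such as mean removal or the epsilon used.

```python
import math

EPS = 1e-12  # guards against division by zero for identical signals


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB: reference energy over error energy."""
    noise = [r - e for r, e in zip(reference, estimate)]
    return 10 * math.log10(_dot(reference, reference) / (_dot(noise, noise) + EPS))


def si_snr_db(reference, estimate):
    """Scale-invariant SNR in dB: project the estimate onto the reference first."""
    ref_mean = sum(reference) / len(reference)
    est_mean = sum(estimate) / len(estimate)
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]
    scale = _dot(est, ref) / (_dot(ref, ref) + EPS)
    target = [scale * r for r in ref]          # part of the estimate explained by the reference
    noise = [e - t for e, t in zip(est, target)]
    return 10 * math.log10(_dot(target, target) / (_dot(noise, noise) + EPS))


# A reference tone and the same tone at half the volume.
ref = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
quiet = [0.5 * x for x in ref]
print(f"SNR    {snr_db(ref, quiet):.2f} dB")
print(f"SI-SNR {si_snr_db(ref, quiet):.1f} dB")
```

The half-volume copy scores only about 6 dB SNR because plain SNR penalizes the gain change, while SI-SNR stays very high because it ignores overall scale. That difference is why both metrics are useful for codecs whose output level may not match the input exactly.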
Found a bug? Have a feature request? Please let us know!
- Bug reports: Create an issue
- Feature requests: Create an issue
All feedback is welcome. If something doesn't work as expected, please report it — it helps make AudioQuant better for everyone!
MIT
- SNAC — Multi-Scale Neural Audio Codec by Hubert Siuzdak
- TurboQuant — Inspiration for the practical CLI-on-top-of-research pattern