AudioQuant

AI-powered ultra-low bitrate audio codec CLI built on SNAC

Compress audio to 0.98–2.6 kbps, up to 392x smaller than WAV, with near-original quality using the SNAC neural audio codec.

WAV:  1 min speech = 2.88 MB
MP3:  1 min speech = 960 KB
SNAC: 1 min speech = 7.35 KB  ← 392:1 compression
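The sizes above follow from simple bitrate arithmetic. The sketch below reproduces them, assuming the WAV is 24 kHz, 16-bit mono and the MP3 is encoded at 128 kbps. Neither assumption is stated in this README; they are simply the values that match the sizes shown. The function names are illustrative and not part of AudioQuant.

```python
def pcm_bytes(seconds, sample_rate=24_000, bytes_per_sample=2, channels=1):
    """Size of raw PCM audio (WAV payload, file header ignored)."""
    return seconds * sample_rate * bytes_per_sample * channels


def bitrate_bytes(seconds, kbps):
    """Size of a constant-bitrate stream; 1 kbps = 1000 bits per second."""
    return seconds * kbps * 1000 / 8


wav = pcm_bytes(60)              # assumed: 24 kHz, 16-bit, mono
mp3 = bitrate_bytes(60, 128)     # assumed: 128 kbps MP3
snac = bitrate_bytes(60, 0.98)   # snac_24khz bitrate from this README

print(f"WAV:  {wav / 1e6:.2f} MB")
print(f"MP3:  {mp3 / 1e3:.0f} KB")
print(f"SNAC: {snac / 1e3:.2f} KB  ({wav / snac:.0f}:1)")
```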

Features

Ultra-low bitrate: 0.98 kbps (speech) to 2.6 kbps (music)
Near-original quality: MUSHRA 88.4/100 for speech at 0.98 kbps
Three pretrained models: Optimized for speech (24 kHz), music (32 kHz), and high-fidelity music (44.1 kHz)
Custom .snac format: Compact binary format with 44-byte header + token data
Quality metrics: SNR, SI-SNR, spectral distance, PESQ (optional)
Streaming mode: Chunked encoding for large files (1-hour+ podcasts)
GPU acceleration: CUDA, MPS (Apple Silicon), and CPU support
Rich terminal UI: Progress bars, colored tables, styled panels

Installation

Requires Python 3.10+

pip install audioquant

# With quality metrics (optional)
pip install "audioquant[metrics]"

From source

git clone https://github.com/wjddusrb03/audioquant
cd audioquant
pip install -e ".[dev]"

Quick Start

# Compress audio (WAV 2.88MB → SNAC 7.35KB)
audioquant compress podcast.wav

# Decompress back to WAV
audioquant decompress podcast.snac

# Get file info
audioquant info podcast.snac

# Compare all models
audioquant compare podcast.wav

# Benchmark across multiple files
audioquant benchmark file1.wav file2.mp3 file3.flac

Commands

`compress`

Compress an audio file to .snac format.

audioquant compress input.wav                          # Default: snac_24khz
audioquant compress input.wav -m snac_44khz            # High-quality music
audioquant compress input.wav -o output.snac           # Custom output path
audioquant compress input.wav -d cuda                  # Force GPU
audioquant compress input.wav --no-progress            # No progress bar

Output:

╭──────── Compression Complete ────────╮
│ Input:  podcast.wav  (2.88 MB)       │
│ Output: podcast.snac  (7.35 KB)      │
│ Model:  snac_24khz                   │
│ Ratio:  392:1                        │
│ Time:   0.85s  (7.1x realtime)       │
│ Device: CUDA (NVIDIA RTX 4090)       │
╰──────────────────────────────────────╯

`decompress`

Decompress a .snac file back to audio.

audioquant decompress podcast.snac                     # Default: WAV
audioquant decompress podcast.snac --format flac       # FLAC output
audioquant decompress podcast.snac -o restored.wav     # Custom path
audioquant decompress podcast.snac -d cuda             # Force GPU

`info`

Display metadata about any audio or .snac file.

audioquant info podcast.wav       # Audio file info
audioquant info podcast.snac      # SNAC file info

`compare`

Compare compression across all three SNAC models.

audioquant compare input.wav                    # Table output
audioquant compare input.wav --json             # JSON output
audioquant compare input.wav -m snac_24khz,snac_44khz  # Specific models
audioquant compare input.wav --no-metrics       # Skip quality metrics

Output:

         Benchmark Results
┌────────────┬──────┬────────┬───────┬───────┐
│ Model      │ Rate │ Size   │ Ratio │ SNR   │
├────────────┼──────┼────────┼───────┼───────┤
│ snac_24khz │ 24k  │ 7.3 KB │ 392:1 │ 28.5  │
│ snac_32khz │ 32k  │ 14.2KB │ 203:1 │ 31.2  │
│ snac_44khz │ 44k  │ 19.5KB │ 148:1 │ 33.8  │
└────────────┴──────┴────────┴───────┴───────┘

`benchmark`

Benchmark models across multiple audio files.

audioquant benchmark *.wav                          # All WAV files
audioquant benchmark file1.wav file2.mp3 --json     # JSON output
audioquant benchmark file1.wav -m snac_24khz,snac_44khz  # Specific models
audioquant benchmark file1.wav -d cuda              # Force GPU

`stream`

Encode large files using chunked streaming mode (lower memory usage).

audioquant stream long_podcast.wav -o podcast.snac
audioquant stream lecture.wav -o lecture.snac --chunk-duration 2.0
audioquant stream lecture.wav -o lecture.snac -m snac_44khz -d cuda
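
Chunked encoding keeps memory bounded because only one slice of the waveform is processed at a time. The generator below is a minimal sketch of that idea, not the code in AudioQuant's `streaming.py`. `iter_chunks` is a hypothetical helper, and it assumes `--chunk-duration` is expressed in seconds.

```python
def iter_chunks(samples, sample_rate, chunk_duration=2.0):
    """Yield consecutive slices of `samples`, each at most chunk_duration seconds long."""
    chunk_len = int(sample_rate * chunk_duration)
    if chunk_len <= 0:
        raise ValueError("chunk_duration is too short for this sample rate")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


# Five seconds of silence at 24 kHz, split into 2-second chunks.
audio = [0.0] * (24_000 * 5)
lengths = [len(chunk) for chunk in iter_chunks(audio, 24_000, 2.0)]
print(lengths)
```

A real encoder would probably also need chunk lengths aligned to the model's frame size so that token boundaries line up between chunks; this sketch ignores that detail.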

SNAC Models

Model	Sample Rate	Bitrate	Quality (MUSHRA)	Best For	Params
`snac_24khz`	24,000 Hz	0.98 kbps	88.4	Speech, podcasts	19.8M
`snac_32khz`	32,000 Hz	1.9 kbps	—	Music, sound effects	54.5M
`snac_44khz`	44,100 Hz	2.6 kbps	—	High-fidelity music	54.5M

vs Other Codecs

Codec	Bitrate	Speech Quality (MUSHRA)	Music Quality (MUSHRA)	CLI Tool
SNAC (AudioQuant)	0.98 kbps	88.4	76.8	✅
EnCodec (Meta)	1.5 kbps	78.3	64.4	✅
DAC (Descript)	2.5 kbps	85.0	54.0	✅

How It Works

Encoding (Compression)

Audio File  →  Resample  →  AI Encoder  →  Multi-Scale Tokens  →  .snac File
(WAV/MP3)     (to model     (SNAC neural    (3-4 RVQ levels       (44-byte header
               rate)         network)         at different           + uint16 tokens)
                                              temporal rates)
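
The "RVQ levels" in the diagram refer to residual vector quantization: each stage quantizes whatever error the previous stage left behind. The toy below shows that mechanism on a single 2-D vector with two tiny hand-made codebooks. It is an illustration only: the codebooks and function names are invented here, and it does not model SNAC's learned codebooks or its per-level temporal rates.

```python
from math import dist

# Hypothetical codebooks: a coarse grid, then small corrections around zero.
COARSE = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
FINE = [(0.0, 0.0), (0.25, 0.25), (-0.25, 0.25), (0.25, -0.25), (-0.25, -0.25)]


def nearest(codebook, vec):
    """Index of the codeword closest to vec (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: dist(codebook[i], vec))


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the remaining residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


vec = (0.8, 0.3)
indices = rvq_encode(vec, [COARSE, FINE])
decoded = rvq_decode(indices, [COARSE, FINE])
print(indices, decoded)
print("coarse-only error:", dist(vec, COARSE[indices[0]]))
print("two-stage error:  ", dist(vec, decoded))
```

Only the list of indices needs to be stored; adding a stage lowers the reconstruction error at the cost of one more token.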

SNAC uses Residual Vector Quantization (RVQ) at multiple temporal scales (rates vary by model):

Level 0 (coarse, 10–14 Hz): Captures overall melody and tone
Level 1 (medium, 21–29 Hz): Captures timbre and emotion
Level 2 (fine, 42–57 Hz): Captures pronunciation and detail
Level 3 (finest, 83–115 Hz): High-frequency detail (32kHz/44kHz models only)
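
The per-level rates also explain where the bitrates come from. The sketch below assumes nominal token rates of 12, 23, and 47 Hz for `snac_24khz` and 12 bits per token (a 4096-entry codebook). Both figures are assumptions not stated in this README, chosen because they fall inside the ranges listed above and reproduce the 0.98 kbps figure. The helper names are illustrative.

```python
import math

RATES_24KHZ = (12, 23, 47)   # assumed nominal token rates (Hz) for levels 0-2
BITS_PER_TOKEN = 12          # assumed: log2 of a 4096-entry codebook


def tokens_per_level(duration_s, rates_hz):
    """Approximate number of tokens each level emits for a clip of this length."""
    return [math.ceil(duration_s * rate) for rate in rates_hz]


def bitrate_bps(rates_hz, bits_per_token):
    """Total bits per second across all levels."""
    return sum(rates_hz) * bits_per_token


counts = tokens_per_level(60, RATES_24KHZ)
print(counts)                                      # [720, 1380, 2820]
print(bitrate_bps(RATES_24KHZ, BITS_PER_TOKEN))    # 984
```

Under these assumptions one minute of speech is about 4,920 tokens, and 984 bits per second rounds to the 0.98 kbps listed for `snac_24khz`.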

`.snac` File Format

┌─────────────────────────────────────┐
│ Header (44 bytes)                   │
│   Magic: "SNAC"                     │
│   Version: 1                        │
│   Model: "snac_24khz"               │
│   Sample rate, duration, channels   │
├─────────────────────────────────────┤
│ Token Data                          │
│   Level 0: [count] [tokens...]      │
│   Level 1: [count] [tokens...]      │
│   Level 2: [count] [tokens...]      │
└─────────────────────────────────────┘
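
To make the layout concrete, here is a minimal round-trip sketch using Python's `struct` module. Only the 44-byte header size, the `SNAC` magic, version 1, the listed header fields, and the uint16 tokens come from this README. The field order, field widths, reserved padding, and the uint32 per-level count are assumptions made for illustration and may differ from what `format.py` actually writes.

```python
import struct

# Hypothetical little-endian layout, 44 bytes in total:
# magic(4) version(2) model(16) sample_rate(4) duration(4) channels(2) levels(2) reserved(10)
HEADER = struct.Struct("<4sH16sIfHH10x")
assert HEADER.size == 44


def pack_snac(model, sample_rate, duration, channels, levels):
    """Serialize the header, then for each level a uint32 count and uint16 tokens."""
    out = bytearray(HEADER.pack(b"SNAC", 1, model.encode("ascii"),
                                sample_rate, duration, channels, len(levels)))
    for tokens in levels:
        out += struct.pack("<I", len(tokens))
        out += struct.pack(f"<{len(tokens)}H", *tokens)
    return bytes(out)


def unpack_snac(blob):
    """Parse a blob produced by pack_snac back into metadata and token lists."""
    magic, version, model, rate, duration, channels, n_levels = HEADER.unpack_from(blob, 0)
    if magic != b"SNAC":
        raise ValueError("not a .snac file")
    offset, levels = HEADER.size, []
    for _ in range(n_levels):
        (count,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        levels.append(list(struct.unpack_from(f"<{count}H", blob, offset)))
        offset += 2 * count
    meta = {
        "version": version,
        "model": model.rstrip(b"\0").decode("ascii"),  # strip the null padding
        "sample_rate": rate,
        "duration": duration,
        "channels": channels,
    }
    return meta, levels


tokens = [[1, 2], [3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13, 14]]
blob = pack_snac("snac_24khz", 24_000, 60.0, 1, tokens)
meta, decoded = unpack_snac(blob)
print(len(blob), meta, decoded == tokens)
```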

Supported Formats

Format	Read	Write
WAV	✅	✅
MP3	✅	✅
FLAC	✅	✅
OGG	✅	✅
OPUS	✅	✅
SNAC	✅	✅

Project Structure

audioquant/
├── src/audioquant/
│   ├── __init__.py       # Package version
│   ├── models.py         # Dataclasses & model registry
│   ├── codec.py          # SNAC wrapper (encode/decode)
│   ├── audio_io.py       # Audio file I/O (WAV/MP3/FLAC)
│   ├── format.py         # .snac binary file format
│   ├── metrics.py        # Quality metrics (SNR, PESQ, spectral)
│   ├── streaming.py      # Chunked encode/decode for large files
│   ├── benchmark.py      # Cross-model comparison
│   ├── display.py        # Rich terminal output
│   └── cli.py            # Click CLI commands
├── tests/                # Test suite
├── pyproject.toml
├── README.md
├── README_KO.md
└── LICENSE

Testing

# Run all fast tests (no SNAC model required)
pytest tests/ -v

# Run specific test file
pytest tests/test_format.py -v
pytest tests/test_metrics.py -v

Dependencies

Package	Purpose
`snac`	SNAC neural audio codec
`torch`	Tensor operations
`torchaudio`	Audio I/O and resampling
`click`	CLI framework
`rich`	Terminal formatting
`numpy`	Numerical operations
`pesq` (optional)	PESQ quality metric

Quality Metrics

Metric	Description	Interpretation
SNR	Signal-to-Noise Ratio	dB; higher = better
SI-SNR	Scale-Invariant SNR	dB; higher = better
Spectral Distance	Multi-resolution STFT distance	Lower = better
PESQ	Perceptual speech quality	1.0–4.5; higher = better
Compression Ratio	Original / compressed size	Higher = smaller output
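
For reference, the two SNR variants can be written in a few lines. This is a sketch of the standard textbook definitions in pure Python, not necessarily how `metrics.py` computes them; the function names are illustrative. Both functions assume the two signals are time-aligned, equally long, and not identical (identical signals would give zero error power).

```python
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB: reference power over error power."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10 * math.log10(signal / noise)


def si_snr_db(reference, estimate):
    """Scale-invariant SNR in dB: project the estimate onto the reference first."""
    ref_mean = sum(reference) / len(reference)
    est_mean = sum(estimate) / len(estimate)
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]
    # Optimal gain that maps the reference onto the estimate.
    scale = sum(e * r for e, r in zip(est, ref)) / sum(r * r for r in ref)
    target = [scale * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    return 10 * math.log10(sum(t * t for t in target) / sum(n * n for n in noise))


reference = [1.0, 0.0, -1.0, 0.0]
estimate = [0.9, 0.1, -0.9, -0.1]
louder = [3 * e for e in estimate]

print(f"SNR:    {snr_db(reference, estimate):.2f} dB")
print(f"SI-SNR: {si_snr_db(reference, estimate):.2f} dB")
print(f"SI-SNR (3x gain): {si_snr_db(reference, louder):.2f} dB")
```

Changing the gain of the decoded audio changes SNR but leaves SI-SNR unchanged, which is why the two are reported separately.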

Issues & Contributions

Found a bug? Have a feature request? Please let us know!

Bug reports: Create an issue
Feature requests: Create an issue

All feedback is welcome. If something doesn't work as expected, please report it — it helps make AudioQuant better for everyone!

License

MIT

Acknowledgments

SNAC — Multi-Scale Neural Audio Codec by Hubert Siuzdak
TurboQuant — Inspiration for the practical CLI-on-top-of-research pattern
