AudioQuant

AI-powered ultra-low bitrate audio codec CLI built on SNAC

Compress audio to 0.98–2.6 kbps, up to 392x smaller than uncompressed WAV, while keeping near-original quality, using the SNAC neural audio codec.

WAV:  1 min speech = 2.88 MB
MP3:  1 min speech = 960 KB
SNAC: 1 min speech = 7.35 KB  ← 392:1 compression
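
The figures above can be reproduced with a few lines of arithmetic. This is only a sketch: the README does not state the encodings, so the 24 kHz 16-bit mono WAV and the constant 128 kbps MP3 are assumptions that happen to match the quoted sizes.

```python
# Assumed encodings (not stated in the README):
#   WAV: 24 kHz, 16-bit, mono PCM; MP3: constant 128 kbps.
# SNAC uses the README's 0.98 kbps figure for snac_24khz.
SECONDS = 60

wav_bytes = 24_000 * 2 * 1 * SECONDS   # sample rate * bytes per sample * channels * seconds
mp3_bytes = 128_000 * SECONDS // 8     # bits per second * seconds / 8
snac_bytes = 980 * SECONDS // 8        # 0.98 kbps * seconds / 8

print(f"WAV : {wav_bytes / 1e6:.2f} MB")
print(f"MP3 : {mp3_bytes / 1e3:.0f} KB")
print(f"SNAC: {snac_bytes / 1e3:.2f} KB")
print(f"Ratio: {wav_bytes / snac_bytes:.0f}:1")
```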

Features

  • Ultra-low bitrate: 0.98 kbps (speech) to 2.6 kbps (music)
  • Near-original quality: MUSHRA 88.4/100 for speech at 0.98 kbps
  • Three pretrained models: Optimized for speech (24 kHz), music (32 kHz), and high-fidelity music (44.1 kHz)
  • Custom .snac format: Compact binary format with 44-byte header + token data
  • Quality metrics: SNR, SI-SNR, spectral distance, PESQ (optional)
  • Streaming mode: Chunked encoding for large files (1-hour+ podcasts)
  • GPU acceleration: CUDA, MPS (Apple Silicon), and CPU support
  • Rich terminal UI: Progress bars, colored tables, styled panels

Installation

Requires Python 3.10+

pip install audioquant

# With quality metrics (optional)
pip install audioquant[metrics]

From source

git clone https://github.com/wjddusrb03/audioquant
cd audioquant
pip install -e ".[dev]"

Quick Start

# Compress audio (WAV 2.88MB → SNAC 7.35KB)
audioquant compress podcast.wav

# Decompress back to WAV
audioquant decompress podcast.snac

# Get file info
audioquant info podcast.snac

# Compare all models
audioquant compare podcast.wav

# Benchmark across multiple files
audioquant benchmark file1.wav file2.mp3 file3.flac

Commands

compress

Compress an audio file to .snac format.

audioquant compress input.wav                          # Default: snac_24khz
audioquant compress input.wav -m snac_44khz            # High-quality music
audioquant compress input.wav -o output.snac           # Custom output path
audioquant compress input.wav -d cuda                  # Force GPU
audioquant compress input.wav --no-progress            # No progress bar

Output:

╭──────── Compression Complete ────────╮
│ Input:  podcast.wav  (2.88 MB)       │
│ Output: podcast.snac  (7.35 KB)      │
│ Model:  snac_24khz                   │
│ Ratio:  392:1                        │
│ Time:   0.85s  (7.1x realtime)       │
│ Device: CUDA (NVIDIA RTX 4090)       │
╰──────────────────────────────────────╯

decompress

Decompress a .snac file back to audio.

audioquant decompress podcast.snac                     # Default: WAV
audioquant decompress podcast.snac --format flac       # FLAC output
audioquant decompress podcast.snac -o restored.wav     # Custom path
audioquant decompress podcast.snac -d cuda             # Force GPU

info

Display metadata about any audio or .snac file.

audioquant info podcast.wav       # Audio file info
audioquant info podcast.snac      # SNAC file info

compare

Compare compression across all three SNAC models.

audioquant compare input.wav                    # Table output
audioquant compare input.wav --json             # JSON output
audioquant compare input.wav -m snac_24khz,snac_44khz  # Specific models
audioquant compare input.wav --no-metrics       # Skip quality metrics

Output:

         Benchmark Results
┌────────────┬──────┬────────┬───────┬───────┐
│ Model      │ Rate │ Size   │ Ratio │ SNR   │
├────────────┼──────┼────────┼───────┼───────┤
│ snac_24khz │ 24k  │ 7.3 KB │ 392:1 │ 28.5  │
│ snac_32khz │ 32k  │ 14.2KB │ 203:1 │ 31.2  │
│ snac_44khz │ 44k  │ 19.5KB │ 148:1 │ 33.8  │
└────────────┴──────┴────────┴───────┴───────┘

benchmark

Benchmark models across multiple audio files.

audioquant benchmark *.wav                          # All WAV files
audioquant benchmark file1.wav file2.mp3 --json     # JSON output
audioquant benchmark file1.wav -m snac_24khz,snac_44khz  # Specific models
audioquant benchmark file1.wav -d cuda              # Force GPU

stream

Encode large files using chunked streaming mode (lower memory usage).

audioquant stream long_podcast.wav -o podcast.snac
audioquant stream lecture.wav -o lecture.snac --chunk-duration 2.0
audioquant stream lecture.wav -o lecture.snac -m snac_44khz -d cuda
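
To illustrate what chunked encoding means, here is a minimal sketch that splits a sample buffer into fixed-duration chunks. It is not the code in streaming.py; the function name `iter_chunks` is hypothetical, and the 2.0-second default simply mirrors the `--chunk-duration 2.0` example above.

```python
from typing import Iterator, Sequence


def iter_chunks(samples: Sequence[float], sample_rate: int,
                chunk_duration: float = 2.0) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most chunk_duration seconds each."""
    chunk_len = int(round(sample_rate * chunk_duration))
    if chunk_len <= 0:
        raise ValueError("chunk_duration must cover at least one sample")
    for start in range(0, len(samples), chunk_len):
        # The last slice may be shorter than chunk_len.
        yield samples[start:start + chunk_len]


# 5 seconds of silence at 24 kHz, split into 2-second chunks.
audio = [0.0] * (5 * 24_000)
chunks = list(iter_chunks(audio, sample_rate=24_000, chunk_duration=2.0))
print([len(c) for c in chunks])  # [48000, 48000, 24000]
```

Only one chunk has to be held in memory at a time, which is where the lower memory usage comes from. A real encoder would likely also have to handle chunk boundaries (padding or overlap) so that the decoded audio has no audible seams.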

SNAC Models

| Model | Sample Rate | Bitrate | Quality (MUSHRA) | Best For | Params |
|---|---|---|---|---|---|
| snac_24khz | 24,000 Hz | 0.98 kbps | 88.4 | Speech, podcasts | 19.8M |
| snac_32khz | 32,000 Hz | 1.9 kbps | | Music, sound effects | 54.5M |
| snac_44khz | 44,100 Hz | 2.6 kbps | | High-fidelity music | 54.5M |

vs Other Codecs

| Codec | Bitrate | Speech Quality | Music Quality | CLI Tool |
|---|---|---|---|---|
| SNAC (AudioQuant) | 0.98 kbps | 88.4 | 76.8 | |
| EnCodec (Meta) | 1.5 kbps | 78.3 | 64.4 | |
| DAC (Descript) | 2.5 kbps | 85.0 | 54.0 | |

How It Works

Encoding (Compression)

Audio File  →  Resample  →  AI Encoder  →  Multi-Scale Tokens  →  .snac File
(WAV/MP3)     (to model     (SNAC neural    (3-4 RVQ levels       (44-byte header
               rate)         network)         at different           + uint16 tokens)
                                              temporal rates)

SNAC uses Residual Vector Quantization (RVQ) at multiple temporal scales (rates vary by model):

  • Level 0 (coarse, 10–14 Hz): Captures overall melody and tone
  • Level 1 (medium, 21–29 Hz): Captures timbre and emotion
  • Level 2 (fine, 42–57 Hz): Captures pronunciation and detail
  • Level 3 (finest, 83–115 Hz): High-frequency detail (32kHz/44kHz models only)
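
The quoted bitrates follow from the per-level token rates. The sketch below assumes the token rates and the 4096-entry codebooks (12 bits per token) published for the upstream SNAC models; neither is stated in this README, so treat them as assumptions.

```python
import math

CODEBOOK_SIZE = 4096  # assumption: 4096 entries, i.e. 12 bits per token

# Assumed per-level token rates in Hz, coarse to fine, for each model.
TOKEN_RATES_HZ = {
    "snac_24khz": [12, 23, 47],
    "snac_32khz": [10, 21, 42, 83],
    "snac_44khz": [14, 29, 57, 115],
}


def bitrate_kbps(rates_hz):
    """Total bitrate: tokens per second across all levels times bits per token."""
    bits_per_token = math.log2(CODEBOOK_SIZE)
    return sum(rates_hz) * bits_per_token / 1000


for name, rates in TOKEN_RATES_HZ.items():
    print(f"{name}: {bitrate_kbps(rates):.2f} kbps")
```

Under these assumptions the results round to the 0.98, 1.9, and 2.6 kbps figures used elsewhere in this README.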

.snac File Format

┌─────────────────────────────────────┐
│ Header (44 bytes)                   │
│   Magic: "SNAC"                     │
│   Version: 1                        │
│   Model: "snac_24khz"               │
│   Sample rate, duration, channels   │
├─────────────────────────────────────┤
│ Token Data                          │
│   Level 0: [count] [tokens...]      │
│   Level 1: [count] [tokens...]      │
│   Level 2: [count] [tokens...]      │
└─────────────────────────────────────┘
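
As an illustration of how such a container could be read and written, here is a sketch using Python's `struct` module. The README only specifies the 44-byte header size, the listed fields, and uint16 tokens; the exact field widths, the reserved padding, and the uint32 per-level count below are hypothetical choices made so the example is self-consistent. See format.py for the real layout.

```python
import struct

# Hypothetical header layout, chosen only so that it totals 44 bytes:
# magic (4s), version (uint16), model name (16s), sample rate (uint32),
# duration in seconds (float64), channels (uint16), 8 reserved pad bytes.
HEADER = struct.Struct("<4sH16sIdH8x")
assert HEADER.size == 44


def pack_snac(model, sample_rate, duration, channels, levels):
    """Serialize a header plus one [count][tokens...] record per RVQ level."""
    blob = bytearray(
        HEADER.pack(b"SNAC", 1, model.encode("ascii"), sample_rate, duration, channels)
    )
    for tokens in levels:
        blob += struct.pack("<I", len(tokens))            # assumed uint32 count
        blob += struct.pack(f"<{len(tokens)}H", *tokens)  # uint16 tokens
    return bytes(blob)


def unpack_snac(blob):
    """Parse bytes produced by pack_snac back into (metadata, levels)."""
    magic, version, model, sample_rate, duration, channels = HEADER.unpack_from(blob)
    if magic != b"SNAC":
        raise ValueError("missing SNAC magic bytes")
    meta = {
        "version": version,
        "model": model.rstrip(b"\0").decode("ascii"),
        "sample_rate": sample_rate,
        "duration": duration,
        "channels": channels,
    }
    levels, offset = [], HEADER.size
    while offset < len(blob):
        (count,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        levels.append(list(struct.unpack_from(f"<{count}H", blob, offset)))
        offset += 2 * count
    return meta, levels


# Round trip with three levels whose lengths double from coarse to fine.
levels = [[1, 2], [3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13, 14]]
blob = pack_snac("snac_24khz", 24_000, 0.17, 1, levels)
meta, decoded = unpack_snac(blob)
print(len(blob), meta["model"], decoded == levels)  # 84 snac_24khz True
```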

Supported Formats

| Format | Read | Write |
|---|---|---|
| WAV | | |
| MP3 | | |
| FLAC | | |
| OGG | | |
| OPUS | | |
| SNAC | | |

Project Structure

audioquant/
├── src/audioquant/
│   ├── __init__.py       # Package version
│   ├── models.py         # Dataclasses & model registry
│   ├── codec.py          # SNAC wrapper (encode/decode)
│   ├── audio_io.py       # Audio file I/O (WAV/MP3/FLAC)
│   ├── format.py         # .snac binary file format
│   ├── metrics.py        # Quality metrics (SNR, PESQ, spectral)
│   ├── streaming.py      # Chunked encode/decode for large files
│   ├── benchmark.py      # Cross-model comparison
│   ├── display.py        # Rich terminal output
│   └── cli.py            # Click CLI commands
├── tests/                # Test suite
├── pyproject.toml
├── README.md
├── README_KO.md
└── LICENSE

Testing

# Run all fast tests (no SNAC model required)
pytest tests/ -v

# Run specific test file
pytest tests/test_format.py -v
pytest tests/test_metrics.py -v

Dependencies

| Package | Purpose |
|---|---|
| snac | SNAC neural audio codec |
| torch | Tensor operations |
| torchaudio | Audio I/O and resampling |
| click | CLI framework |
| rich | Terminal formatting |
| numpy | Numerical operations |
| pesq (optional) | PESQ quality metric |

Quality Metrics

| Metric | Description | Range / Interpretation |
|---|---|---|
| SNR | Signal-to-Noise Ratio | Higher = better (dB) |
| SI-SNR | Scale-Invariant SNR | Higher = better (dB) |
| Spectral Distance | Multi-resolution STFT distance | Lower = better |
| PESQ | Perceptual speech quality | 1.0–4.5 (higher = better) |
| Compression Ratio | Original / compressed size | Higher = smaller output |
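
The difference between SNR and SI-SNR is easiest to see in code. The sketch below uses the standard textbook definitions in plain Python; it is not taken from metrics.py, and the function names `snr_db` and `si_snr_db` are illustrative.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def snr_db(ref, est, eps=1e-12):
    """Plain SNR: reference energy over the energy of the error (ref - est)."""
    noise = [r - e for r, e in zip(ref, est)]
    return 10 * math.log10(_dot(ref, ref) / (_dot(noise, noise) + eps))


def si_snr_db(ref, est, eps=1e-12):
    """Scale-invariant SNR: project est onto ref so overall gain is ignored."""
    mean_ref = sum(ref) / len(ref)
    mean_est = sum(est) / len(est)
    ref = [r - mean_ref for r in ref]
    est = [e - mean_est for e in est]
    scale = _dot(est, ref) / (_dot(ref, ref) + eps)
    target = [scale * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    return 10 * math.log10(_dot(target, target) / (_dot(noise, noise) + eps))


# Reference tone, a slightly noisy copy, and the same copy at half volume.
n = 200
ref = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
est = [r + 0.01 * math.sin(2 * math.pi * 37 * i / n) for i, r in enumerate(ref)]
half = [0.5 * e for e in est]

print(f"SNR    full: {snr_db(ref, est):.1f} dB, half volume: {snr_db(ref, half):.1f} dB")
print(f"SI-SNR full: {si_snr_db(ref, est):.1f} dB, half volume: {si_snr_db(ref, half):.1f} dB")
```

Halving the volume drops plain SNR sharply, because the gain change counts as error, while SI-SNR stays essentially unchanged. That makes SI-SNR the more informative of the two when a codec's output level differs slightly from the input.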

Issues & Contributions

Found a bug? Have a feature request? Please let us know!

All feedback is welcome. If something doesn't work as expected, please report it — it helps make AudioQuant better for everyone!

License

MIT

Acknowledgments

  • SNAC — Multi-Scale Neural Audio Codec by Hubert Siuzdak
  • TurboQuant — Inspiration for the practical CLI-on-top-of-research pattern
