Peak-based audio fingerprinting. Zero overhead. Written in Rust.
wavio is a high-throughput acoustic fingerprinting library built for DSP engineers who need fast, deterministic audio identification without the weight of an ML stack. No embeddings, no models, no runtime — just spectral peaks, combinatorial hashing, and raw speed.
- Full DSP pipeline — WAV loading → FFT spectrogram → constellation peak detection → combinatorial hashing
- In-memory & on-disk indexing — query against thousands of tracks in microseconds
- Deterministic — same input always produces the same fingerprints
- Zero unsafe code —
#![forbid(unsafe_code)] - Parallel processing — optional rayon-based parallelism
- Python bindings — use from Python via PyO3/maturin
- CLI tool — index and query from the command line
[dependencies]
wavio = "0.1"
# With optional features:
wavio = { version = "0.1", features = ["persist", "parallel"] }| Flag | Description |
|---|---|
parallel |
Rayon-based parallel fingerprinting and peak extraction |
persist |
On-disk index persistence via sled |
python |
Python bindings via PyO3 |
git clone https://github.com/MinLee0210/wavio.git
cd wavio
cargo build --release --all-featuresuse wavio::dsp::audio::load_wav;
use wavio::dsp::spectrogram::{compute_spectrogram, SpectrogramConfig};
use wavio::dsp::peaks::{extract_peaks, PeakExtractorConfig};
use wavio::hash::{generate_hashes, HashConfig};
use wavio::index::Index;
// Load → Spectrogram → Peaks → Hashes → Index → Query
let audio = load_wav("track.wav").unwrap();
let spec = compute_spectrogram(&audio.samples, &SpectrogramConfig::default()).unwrap();
let peaks = extract_peaks(&spec, &PeakExtractorConfig::default());
let hashes = generate_hashes(&peaks, &HashConfig::default());
let mut index = Index::default();
index.insert("my_track", &hashes);
let result = index.query(&hashes);
assert_eq!(result.unwrap().track_id, "my_track");The wavio-cli binary provides three subcommands:
# Index a folder of WAV files into a database
wavio-cli index --db ./wavio.db ./music/
# Identify a clip against the database
wavio-cli query --db ./wavio.db ./clip.wav
# Print database statistics
wavio-cli info --db ./wavio.dbAdd --verbose for detailed output (peak count, hash count, timing).
# Build the CLI (requires the persist feature)
cargo build --release --bin wavio-cli --features persistInstall via maturin:
cd python/
pip install maturin
maturin develop --features pythonUsage:
import wavio
fp = wavio.PyFingerprinter()
hashes = fp.fingerprint_file("track.wav")
index = wavio.PyIndex()
index.insert("my_track", hashes)
result = index.query(hashes)
print(result) # {'track_id': 'my_track', 'score': ..., 'offset_secs': ...}See python/README.md for more details.
Platform: macOS, release profile · Criterion · 22,050 Hz synthetic audio
| Benchmark | Median |
|---|---|
| Fingerprint single 3-min track | 88.6 ms |
| Index 1,000 tracks (20 hashes each) | 1.04 ms |
| Query 1,000 lookups | 568 µs (~0.57 µs/query) |
Reproduce with:
cargo bench --features parallelSee docs/BENCHMARKS.md for full results.
See ARCHITECTURE.md for the pipeline diagram, design decisions (sample rate, FFT size, hash bit-packing), and known limitations.
See .github/CONTRIBUTING.md for development setup, code style, and PR checklist.