Robust audio/video authentication system that survives compression using speech-optimized perceptual hashing and quantum-resistant cryptography.
- 🔐 Quantum-Resistant: Ed25519 + future Dilithium support
- 🎙️ Speech-Optimized: Robust features that survive compression
- 📹 Audio + Video: Dual-modal authentication
- 📦 Compression-Resistant: Verified up to 60.9% file reduction
- 🛡️ Tampering Detection: Phase inversion (9.6%), silence (62.4%), noise (46.7%), video (51.0%) - all detected
- ⚡ Two Modes: Embedded (self-contained) or External (.qrmn file)
pip install -r requirements.txt# 1. Generate keys (one time)
python3 keygen.py generate
# 2. Sign media (creates .qrmn file)
python3 external_signer.py example.mp4 example_private.json
# 3. Verify media
python3 external_verifier.py example.mp4 example.mp4.qrmn example_public.json
# 4. Run comprehensive demo
python3 demo.py- Video: .mp4, .avi, .mov, .mkv, .webm
- Audio: .wav, .mp3, .flac, .m4a, .ogg
- 🟢 GREEN: Authentic and verified
- 🟡 AMBER: Possibly authentic, verify manually
- 🔴 RED: Not authentic or tampered
1. File Integrity (SHA-256)
- Cryptographic hash of entire file
- Detects any bit-level modification
- Fast rejection of altered files
2. Content Verification (Perceptual Hashing)
- Audio: F0 patterns, MFCC, spectral envelope, energy, rhythm + polarity fingerprinting
- Video: Edge patterns, color histograms, texture, brightness, contrast
- Compression-resistant features with 80% (audio) / 75% (video) thresholds
3. Cryptographic Chain
- Hash-chain links all segments (SHA-256)
- Ed25519 signatures (quantum-resistant ready)
- Prevents insertion, deletion, reordering
Compression changes bits but preserves content:
- Original features:
[1,0,1,1,0,1,...] - After compression:
[1,0,1,1,1,1,...](91% similar) - System accepts ≥80% = authentic ✅
This follows industry best practices from academic research on robust media hashing.
- Speech verification - Podcasts, interviews, recordings
- Deepfake detection - AI-generated content identification
- Legal/forensic evidence - Chain of custody for media
- Content distribution - Verify authenticity across platforms
- Compression tolerance - Works after YouTube, TikTok, messaging apps
| Attack Type | Result | Similarity |
|---|---|---|
| Phase Inversion | ✅ Detected | 9.6% |
| Silence Injection | ✅ Detected | 62.4% |
| Noise Replacement | ✅ Detected | 46.7% |
| Video Tampering | ✅ Detected | 51.0% |
| Scenario | File Reduction | Status |
|---|---|---|
| Heavy AAC | -22% | ✅ PASS |
| Extreme Low | 61% | ✅ PASS |
| MP3 Transcode | -19% | ✅ PASS |
| OGG Vorbis | 22% | ✅ PASS |
| Mono Resample | 49% | ✅ PASS |
| Speed Change | 27% | ✅ PASS |
All 6/6 scenarios pass with high confidence (95-100% similarity)
from src.perceptual_hash import PerceptualHasher
# Balanced (default)
hasher = PerceptualHasher(
speech_optimized=True,
polarity_threshold=0.60, # Detection sensitivity
polarity_penalty=0.15 # Rejection severity
)Conservative (fewer false positives):
polarity_threshold=0.70, polarity_penalty=0.20Aggressive (maximum security):
polarity_threshold=0.50, polarity_penalty=0.10Features Extracted (48 bits):
- F0 patterns (pitch)
- MFCC coefficients (vocal tract)
- Spectral envelope (voice quality)
- Energy patterns (amplitude)
- Temporal rhythm (speech rate)
Polarity Fingerprint (32 bits):
- Waveform mean
- Temporal skewness
- Time-weighted features
- Zero-crossing asymmetry
- Peak sign detection
Total: 10 bytes per window
# Stage 1: Polarity check (fast rejection)
if polarity_similarity < 0.60:
return full_similarity * 0.15 # 85% penalty
# Stage 2: Full comparison
return hamming_similarity(hash1, hash2)Result:
- Phase inversion: 9.8% similarity (detected)
- Compression: 97.5% similarity (authentic)
Features Extracted (96 bits):
- Edge density (Canny)
- Color histograms (HSV)
- dHash (difference hash)
- Texture analysis
- Brightness/contrast
Total: 12 bytes per keyframe
Genesis → Window 0 → Window 1 → ... → Window N → Video 0 → Video 1 → ...
Each link contains:
perceptual_hash: Binary feature vectorchain_hash: SHA-256(prev_hash + current_features)metadata: Timestamp, method, indexsignature: Ed25519 signature
{
"version": "2.0",
"method": "external",
"file_hash": "sha256...",
"media_file": "filename.mp4",
"total_windows": 55,
"messages": [...],
"signatures": [...],
"audio_info": {...},
"video_keyframes": [...],
"video_messages": [...],
"video_signatures": [...]
}| Level | Audio | Video | Meaning |
|---|---|---|---|
| 🟢 Very High | ≥95% | ≥90% | Bit-for-bit identical or minimal processing |
| 🟢 High | ≥90% | ≥80% | Authentic, minor compression |
| 🟢 Medium | ≥80% | ≥70% | Authentic, heavy compression |
| 🟡 Low | ≥70% | ≥60% | Possibly authentic, verify manually |
| 🔴 Rejected | <70% | <60% | Not authentic, tampering detected |
Audio: 80%
- Features (F0, MFCC) very stable under compression
- Achieves 97.5% average → 17.5 point margin
Video: 75%
- Visual features more affected by quantization
- Achieves 89.6% average → 14.6 point margin
- Block artifacts and DCT compression cause more drift
- Embedded mode: Not compression-resistant (use external mode with
.qrmnfiles) - Tolerance-based: Uses ≥80% similarity threshold, not cryptographic exact matching
- Phase manipulation: Detects full inversion (×-1), but not sophisticated partial phase shifts
- Extreme distortion: May fail with severe noise/distortion beyond tested scenarios
# Run rigorous test suite
python3 rigorous_test.py
# Run comprehensive demo (6 compression scenarios)
python3 demo.pyTests phase inversion, splicing attacks, noise handling, and false positives
- Phase inversion detection (~9.6% similarity)
- Audio tampering detection (silence, noise, phase inversion)
- Audio compression resistance (≥80%)
- Video identity (100% similarity)
- Video tampering detection (black, white, noise frames)
- Video compression resistance (≥75%)
- Hash size validation
├── README.md # This file
├── requirements.txt # Python dependencies
├── src/ # Core modules
│ ├── perceptual_hash.py # Audio perceptual hashing + polarity detection
│ ├── video_perceptual_hash.py # Video perceptual hashing
│ ├── enhanced_verification.py # Adaptive quality + temporal consistency
│ ├── crypto_signing.py # Ed25519/Dilithium signatures
│ ├── hashchain.py # Cryptographic chain
│ ├── canonicalization.py # Audio preprocessing
│ └── embedding.py # DCT steganography
├── external_signer.py # Create .qrmn signature files
├── external_verifier.py # Verify with .qrmn files
├── embedded_signer.py # Embed signatures in media
├── embedded_verifier.py # Verify embedded signatures
├── keygen.py # Generate cryptographic keys
├── demo.py # Comprehensive demonstration
└── rigorous_test.py # Rigorous test suite
For production use, consider:
- Additional speech feature extraction methods
- More post-quantum algorithm support
- Real-time processing optimization
- GUI interfaces
- API endpoints
MIT License
- Ed25519: RFC 8032
- Dilithium: NIST PQC Round 3
- Audio hashing: Robust audio fingerprinting research
- Video hashing: Perceptual hashing literature
⚡ Get Started: Run python3 demo.py to see the full system in action!