Deepfake audio generators (such as ElevenLabs) synthesize sound mathematically. While the output sounds real to the human ear, it fails to replicate the organic chaos of a physical vocal tract, and standard detectors miss this because they look for semantic patterns, not physical ones.
Spectre is a Convolutional Neural Network (CNN) that doesn't just "listen" to audio; it "looks" at the physics of the sound.
Test the forensic AI yourself. I have deployed the model to Hugging Face Spaces: upload any audio file (WAV/MP3) and watch the physics engine flag synthetic artifacts in real time.
- Input: Mel-Spectrograms (a visual representation of frequency content; see the extraction sketch below).
- Method: Distinguishes "mathematical perfection" (smooth sine waves) from "organic physics" (jitter and noise).
- Result: 100% detection accuracy on the synthetic validation set (see the generalization note below).
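For reference, the feature-extraction step can be reproduced with a minimal librosa sketch like the one below. The function name and defaults are illustrative assumptions; only the 64-band mel setting comes from the model spec.

```python
import librosa
import numpy as np

def audio_to_mel(path: str, sr: int = 22050, n_mels: int = 64) -> np.ndarray:
    """Load an audio file and convert it to a log-scaled mel-spectrogram."""
    y, sr = librosa.load(path, sr=sr)                # decode and resample to a fixed rate
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)      # log scale, as CNNs expect

# spec = audio_to_mel("sample.wav")  # hypothetical file; spec.shape == (64, n_frames)
```

Log scaling matters here: the "jitter/noise" cues live at low energies that a linear power scale would crush.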
Note on Generalization: The model achieves 100% detection accuracy on algorithmic-synthesis datasets (e.g., standard GANs). However, it currently exhibits lower sensitivity to SOTA latent diffusion models (e.g., Minimax, ElevenLabs v3), which effectively simulate organic noise floors.
- Current Status: Proof of Concept for Spectral Analysis.
- Next Steps: Implementing Bispectral Analysis (Phase Coupling) to detect non-linear artifacts in diffusion models (a rough sketch of the idea follows below).
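Bispectral analysis is not implemented yet, but the idea can be sketched. The bispectrum B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)] averages triple products of Fourier coefficients across frames: it stays large when frequency pairs are phase-coupled (as in a non-linear physical vocal tract) and averages toward zero for independently phased partials. Here is a minimal NumPy estimate using the direct (frame-averaged) method, with illustrative names:

```python
import numpy as np

def bispectrum(x: np.ndarray, nfft: int = 128) -> np.ndarray:
    """Direct bispectrum estimate: B(f1, f2) = mean over frames of
    X(f1) * X(f2) * conj(X(f1 + f2))."""
    frames = x[: len(x) // nfft * nfft].reshape(-1, nfft)   # non-overlapping frames
    X = np.fft.fft(frames * np.hanning(nfft), axis=1)       # windowed FFT per frame
    half = nfft // 2
    B = np.zeros((half, half), dtype=complex)
    for f1 in range(half):
        for f2 in range(half):
            B[f1, f2] = np.mean(X[:, f1] * X[:, f2] * np.conj(X[:, f1 + f2]))
    return B

# Strong |B[f1, f2]| off the axes indicates quadratic phase coupling,
# which linear (Gaussian) synthesis cannot produce.
```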
- Signal Processing: Librosa (FFT, Spectral Analysis).
- AI: PyTorch (Custom ResNet-style CNN).
- Environment: Google Colab (T4 GPU Accelerated).
(See the full analysis in Project_spectre.ipynb.)
- Architecture: ResNet-style CNN (custom 3-layer architecture; an illustrative sketch follows this list).
- Training Data: Synthetic audio dataset (sine waves vs. organic noise).
- Input Features: Mel-Spectrograms (64 bands).
- Performance: 100% accuracy on the synthetic validation set.
- Limitations: Vulnerable to latent diffusion artifacts (e.g., Minimax); see the generalization note above.
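For the architecture line above, a minimal PyTorch sketch of a "3-layer ResNet-style" classifier over 64-band mel-spectrograms might look like the following. Channel counts and block layout are illustrative assumptions; the exact trained configuration is in Project_spectre.ipynb.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv -> BN -> ReLU with a skip connection (the 'ResNet-style' part)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return torch.relu(x + self.bn(self.conv(x)))

class SpectreCNN(nn.Module):
    """Illustrative stem + two residual blocks over (1, 64, n_frames) inputs."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block1 = ResidualBlock(16)
        self.block2 = ResidualBlock(16)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, n_classes))

    def forward(self, x):            # x: (batch, 1, 64, n_frames)
        return self.head(self.block2(self.block1(self.stem(x))))

# logits = SpectreCNN()(torch.randn(8, 1, 64, 128))  # dummy batch of spectrograms
```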
The model successfully distinguishes between the perfect sine-wave structures of AI audio and the chaotic frequency distribution of organic speech.
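To make that contrast concrete, a toy generator in the spirit of the training data might look like this (a hypothetical sketch; the real dataset construction is in Project_spectre.ipynb):

```python
import numpy as np

SR = 22050  # sample rate; matches the feature-extraction sketch above

def perfect_tone(freq: float = 440.0, duration: float = 1.0) -> np.ndarray:
    """'Mathematical perfection': a clean harmonic stack, zero jitter, zero noise."""
    t = np.linspace(0, duration, int(SR * duration), endpoint=False)
    return sum(np.sin(2 * np.pi * freq * k * t) / k for k in (1, 2, 3))

def organic_tone(freq: float = 440.0, duration: float = 1.0) -> np.ndarray:
    """'Organic physics': the same harmonics plus slow pitch jitter and a noise floor."""
    t = np.linspace(0, duration, int(SR * duration), endpoint=False)
    drift = 1 + 0.01 * np.cumsum(np.random.randn(t.size)) / np.sqrt(t.size)  # slow pitch wander
    tone = sum(np.sin(2 * np.pi * freq * k * t * drift) / k for k in (1, 2, 3))
    return tone + 0.05 * np.random.randn(t.size)     # broadband noise floor
```

Run both through the mel-spectrogram front end and the difference is visible by eye: the perfect tone yields razor-thin harmonic lines, while the organic tone smears them and fills the noise floor.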