HeyDittoNet

Spoken "Hey Ditto" activation using a CNN-LSTM model. The model is trained on both synthetic and real human voices, along with background-noise samples from various scenes around the world.

Getting Started

Install the required packages: pip install -r requirements.txt
Run python main.py to test activation on your default microphone.
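
To give a rough idea of what testing activation on a live audio stream can involve, here is a minimal sketch of a sliding-window detector. It is not the code in main.py: the function name detect_activations, the window_size and threshold parameters, and the score_fn callback (a stand-in for a call into the trained model) are all hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def detect_activations(
    chunks: Iterable[List[float]],
    score_fn: Callable[[List[float]], float],
    window_size: int = 16000,
    threshold: float = 0.5,
) -> Iterator[int]:
    """Yield the index of each audio chunk at which an activation fires.

    chunks      -- successive blocks of audio samples (e.g. from a mic callback)
    score_fn    -- hypothetical stand-in for the model; maps a window of
                   samples to a score in [0, 1]
    window_size -- number of most recent samples scored at once (assumed value)
    threshold   -- score at or above which the window counts as an activation
    """
    window = deque(maxlen=window_size)  # keeps only the newest samples
    for i, chunk in enumerate(chunks):
        window.extend(chunk)
        if len(window) < window_size:
            continue  # wait until a full window has been buffered
        if score_fn(list(window)) >= threshold:
            yield i
            window.clear()  # avoid firing again on the same audio
```

In a real setup, chunks would come from a microphone library and score_fn would wrap the model's prediction call. Clearing the buffer after a hit is one simple way to avoid repeated triggers on the same utterance.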

Model Architecture

The CNN-LSTM model, whose architecture is shown below, reaches 99% testing accuracy on roughly 30,000 audio samples.
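
For readers unfamiliar with the pattern, the following is a minimal sketch of a generic CNN-LSTM binary classifier over spectrogram input. It is not the HeyDittoNet architecture itself: it assumes Keras (via TensorFlow), and the function name build_cnn_lstm, the input shape (98 time frames by 40 frequency bins), and all layer sizes are hypothetical values chosen for illustration.

```python
import numpy as np
from tensorflow.keras import layers, models


def build_cnn_lstm(time_steps=98, n_mels=40):
    """Build a small CNN-LSTM that outputs one activation probability.

    All sizes here are illustrative assumptions, not the project's values.
    """
    inputs = layers.Input(shape=(time_steps, n_mels, 1))

    # Convolutions extract local time-frequency features.
    x = layers.Conv2D(16, (3, 3), padding="same", activation="relu")(inputs)
    # Pool along frequency only, so the time axis keeps its length.
    x = layers.MaxPooling2D(pool_size=(1, 2))(x)
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(1, 2))(x)

    # Flatten frequency and channels into one feature vector per time step.
    x = layers.Reshape((time_steps, (n_mels // 4) * 32))(x)

    # The LSTM summarizes the sequence of per-frame features.
    x = layers.LSTM(32)(x)

    # A single sigmoid unit: probability that the wake phrase was spoken.
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


if __name__ == "__main__":
    model = build_cnn_lstm()
    dummy_batch = np.zeros((2, 98, 40, 1), dtype="float32")
    print(model.predict(dummy_batch, verbose=0).shape)
```

The convolutional layers pick up short spectral patterns, while the LSTM models how those patterns unfold over the duration of the phrase.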

Training Metrics

CNN-LSTM Training Loss: