A state-of-the-art deep learning application that classifies urban environmental sounds using a hybrid U-Net architecture. This project leverages the UrbanSound8K dataset to identify common urban sounds with exceptional accuracy.
The Urban Sound Classifier is an audio classification system designed to identify and categorize urban environmental sounds. Using deep learning and a hybrid U-Net architecture, it distinguishes between ten urban sound categories with 96.63% test accuracy.
- High Accuracy: 96.63% classification accuracy on test data
- Real-time Processing: Supports both file uploads and real-time microphone recording
- Robust Feature Extraction: Advanced mel-spectrogram feature extraction pipeline
- User-friendly Interface: Modern, responsive web interface with real-time feedback
- Versatile Audio Support: Handles multiple audio formats (WAV, MP3, OGG, FLAC, M4A)
- Multi-class Classification: Identifies 10 distinct urban sound categories
- Confidence Scoring: Provides confidence metrics for each prediction
- Spectrogram Analysis: Converts audio to mel-spectrograms for neural network processing
- Drag-and-Drop: Easy file upload with drag-and-drop functionality
- Real-time Microphone Recording: Record and analyze sounds directly from your microphone
- Audio Visualization: Visual feedback during recording with waveform display
- Responsive Design: Works seamlessly across desktop and mobile devices
- Hybrid U-Net Architecture: Combines feature extraction capabilities of CNNs with context-preserving properties of U-Net
- Audio Preprocessing Pipeline: Automatic conversion, normalization, and feature extraction
- RESTful API: Endpoints for classification and retrieving available classes
- Modular Design: Well-structured codebase with separation of concerns
The classifier uses a sophisticated hybrid U-Net architecture that combines:
- Feature Extraction: Convolutional layers extract hierarchical features from mel-spectrograms
- Context Preservation: U-Net's skip connections maintain spatial context information
- Multi-scale Analysis: Captures both fine-grained details and broader patterns in audio signals
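For illustration, here is a minimal Keras sketch of this kind of hybrid design: a small U-Net-style encoder/decoder over a mel-spectrogram input, with skip connections feeding the encoder feature maps back into the decoder, topped by a classification head. The layer sizes and input shape are assumptions for demonstration, not the project's exact configuration.

```python
# Illustrative sketch only (not the project's exact architecture):
# U-Net-style encoder/decoder over mel-spectrograms + classification head.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_hybrid_unet(input_shape=(128, 128, 1), num_classes=10):
    inputs = layers.Input(shape=input_shape)

    # Encoder: convolutional feature extraction with downsampling
    c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D(2)(c2)

    # Bottleneck
    b = layers.Conv2D(64, 3, padding="same", activation="relu")(p2)

    # Decoder: upsampling with skip connections to preserve context
    u2 = layers.Concatenate()([layers.UpSampling2D(2)(b), c2])
    c3 = layers.Conv2D(32, 3, padding="same", activation="relu")(u2)
    u1 = layers.Concatenate()([layers.UpSampling2D(2)(c3), c1])
    c4 = layers.Conv2D(16, 3, padding="same", activation="relu")(u1)

    # Classification head on top of the decoded feature map
    x = layers.GlobalAveragePooling2D()(c4)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The skip connections are what preserve the time-frequency context mentioned above: encoder feature maps are concatenated back into the decoder before the classification head.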
This project includes pre-trained models that can be used directly for urban sound classification. The models are located in the models/unetropolis-hybrid-unet-urbansound_96.63_weights directory.
- UNet models for each fold (best_model_fold1_UNet.h5, best_model_fold2_UNet.h5, etc.)
- SimpleCNN models for each fold (best_model_fold1_SimpleCNN.h5, etc.)
These models are used together in an ensemble to achieve higher accuracy.
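As a rough illustration of how such an ensemble can be combined (the project's `predict_with_models` may differ), a common approach is to average the per-class probabilities of all fold models and take the argmax:

```python
# Hypothetical sketch of probability averaging across fold models;
# predict_with_models in this project may combine them differently.
import numpy as np

def ensemble_average(models, features):
    # Each model outputs class probabilities of shape (n_samples, 10);
    # averaging them and taking the argmax gives the ensemble prediction.
    probs = np.mean([m.predict(features) for m in models], axis=0)
    return probs, np.argmax(probs, axis=1)
```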
Run the main script to load the models and see information about them:
```bash
python double_unet_audio_classifier.py
```
This will load all available models and display information about them. If the UrbanSound8K dataset is available, it will also evaluate the models on a sample of the dataset.
To classify a single audio file, use the prediction script:
```bash
python predict_sample.py
```
This script will prompt you to enter the path to an audio file, and then it will classify the sound using the pre-trained models.
To use the pre-trained models in your own code, follow these steps:
```python
from double_unet_audio_classifier import load_pretrained_models, predict_with_models

# Load models
models = load_pretrained_models('path/to/models', model_types=['UNet', 'SimpleCNN'])

# Process your audio features
# features = ...

# Make predictions
ensemble_preds, ensemble_classes = predict_with_models(models, features)
```
This architecture significantly outperforms traditional CNN models for audio classification tasks, achieving 96.63% accuracy on the UrbanSound8K dataset.
This project uses the UrbanSound8K dataset, which contains 8,732 labeled sound excerpts (≤ 4s) of urban sounds from 10 classes:
- Air conditioner
- Car horn
- Children playing
- Dog bark
- Drilling
- Engine idling
- Gun shot
- Jackhammer
- Siren
- Street music
The dataset should be placed in the following directory structure:
```
D:\Urban-Sound_Classifier-Project\data\UrbanSound8K\
├── UrbanSound8K.csv
├── fold1\
├── fold2\
├── ...
└── fold10\
```
The code is configured to look for the dataset in this location. If you have the dataset in a different location, you can modify the path in the double_unet_audio_classifier.py file.
- Clone the repository
```bash
git clone https://github.com/yourusername/Urban-Sound_Classifier-Project.git
cd Urban-Sound_Classifier-Project
```
- Create and activate a virtual environment
```bash
python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
```
- Install dependencies
```bash
pip install -r requirements.txt
```
- Run the application
```bash
python src/app.py
```
Then open your browser and navigate to http://localhost:5000
- File Upload: Click "Select Audio File" or drag and drop an audio file
- Microphone Recording: Click "Use Microphone" to record a sound for classification
- Analysis: Click "Analyze Sound" to process the audio and view results
- `POST /predict` - Upload an audio file for classification
- `GET /classes` - Get a list of all sound classes the model can predict
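As a quick illustration, the endpoints can be called from Python with the requests library; the form field name `file` shown here is an assumption (check `src/app.py` for the actual field name and response schema):

```python
# Hedged example of calling the API, assuming a server on localhost:5000
# and a multipart form field named "file".
import requests

# List the sound classes the model can predict
print(requests.get("http://localhost:5000/classes").json())

# Upload an audio file for classification
with open("sample.wav", "rb") as f:
    response = requests.post("http://localhost:5000/predict", files={"file": f})
print(response.json())
```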
- Audio Conversion: Converts various audio formats to WAV
- Resampling: Standardizes to 22050Hz sample rate
- Duration Normalization: Adjusts to 4-second segments
- Feature Extraction: Generates mel-spectrograms with 128 mel bands
- Normalization: Scales features for optimal model performance
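A minimal sketch of these steps using librosa is shown below; the exact parameters and normalization used by the project may differ:

```python
# Illustrative preprocessing sketch (assumed parameters, not the exact
# project pipeline): resample, fix duration, mel-spectrogram, normalize.
import numpy as np
import librosa

def extract_features(path, sr=22050, duration=4.0, n_mels=128):
    # Load, convert to mono, and resample to 22050 Hz
    audio, _ = librosa.load(path, sr=sr, mono=True)

    # Pad or trim to a fixed 4-second segment
    target_len = int(sr * duration)
    if len(audio) < target_len:
        audio = np.pad(audio, (0, target_len - len(audio)))
    else:
        audio = audio[:target_len]

    # Mel-spectrogram with 128 mel bands, converted to decibels
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)

    # Min-max scale to [0, 1] and add a channel axis for the CNN
    mel_norm = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)
    return mel_norm[..., np.newaxis]
```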
- Backend: Flask-based RESTful API
- Frontend: Modern HTML5/CSS3/JavaScript interface
- Real-time Processing: Asynchronous audio handling with Web Audio API
The project now includes real-time audio classification capabilities:
```bash
python realtime_classifier.py
```
This will start recording from your microphone and classify sounds in real-time. Press Ctrl+C to stop.
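Conceptually, the real-time loop records short windows from the microphone and runs them through the same feature pipeline. A hedged sketch using the sounddevice package (which may not be what `realtime_classifier.py` actually uses) looks like this:

```python
# Hypothetical sketch of one real-time analysis window, assuming the
# sounddevice package for microphone capture.
import numpy as np
import sounddevice as sd
import librosa

SR = 22050      # sample rate used by the preprocessing pipeline
DURATION = 4    # seconds per analysis window

# Record one 4-second window from the default microphone
audio = sd.rec(int(SR * DURATION), samplerate=SR, channels=1)
sd.wait()  # block until the recording is finished
audio = audio.flatten()

# Convert to a mel-spectrogram, ready to feed to the trained models
mel = librosa.feature.melspectrogram(y=audio, sr=SR, n_mels=128)
print("feature shape:", mel.shape)
```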
Generate visualizations of the model architecture and layer activations:
```bash
python visualize_model.py
```
The visualizations will be saved in the model_visualization directory.
Run comprehensive evaluation of the model:
```bash
python evaluate_model.py
```
This will generate evaluation metrics, confusion matrix, and classification report in the evaluation_results directory.
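The confusion matrix and classification report can be produced with scikit-learn; the sketch below uses made-up labels purely to show the calls, whereas `evaluate_model.py` works on real test data:

```python
# Hedged sketch of the evaluation step with placeholder labels.
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_true = np.array([0, 3, 3, 9, 1])   # placeholder ground-truth classes
y_pred = np.array([0, 3, 2, 9, 1])   # placeholder model predictions

print(confusion_matrix(y_true, y_pred, labels=list(range(10))))
print(classification_report(y_true, y_pred, labels=list(range(10)),
                            zero_division=0))
```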
- Model Fine-tuning: Further optimization for specific urban environments
- Continuous Learning: Implementation of feedback mechanisms for model improvement
- Multi-label Classification: Detection of overlapping sound categories
- Mobile Application: Native mobile apps for iOS and Android
- Edge Deployment: Optimization for edge devices and IoT applications
This project is licensed under the MIT License - see the LICENSE file for details.
- UrbanSound8K dataset - J. Salamon, C. Jacoby, and J. P. Bello
- TensorFlow - For the deep learning framework
- Flask - For the web application framework
- Librosa - For audio processing capabilities
Developed by Sachin Paunikar
© 2025 Sachin Paunikar. All Rights Reserved.