WiFi CSI Gesture Detection is a deep learning system that uses WiFi Channel State Information (CSI) to detect human presence and gestures without cameras or wearables. A dual-path ensemble of CNN2D and CNN1D-LSTM models classifies occupancy states with 90.2% accuracy using commodity WiFi hardware, offering a privacy-preserving, contact-free alternative to vision-based sensing.
This project was built and developed by Sreehari S J.
- Ensemble Architecture: Dual-path fusion of CNN2D (spectrogram) and CNN1D-LSTM (temporal) models
- Contact-Free Sensing: Detects gestures through WiFi signals without cameras or wearables
- Privacy-Preserving: No visual data is captured, and CSI signals cannot be used to reconstruct images
- Multi-Class Detection: Classifies Not Occupied, Occupied Static, and Occupied Motion
- Real-Time Performance: <50ms inference time on CPU hardware
- ESP32 Integration: Commodity WiFi hardware for CSI data collection
- Advanced Features: STFT spectrograms + 8 statistical features per subcarrier
- Robust Pipeline: End-to-end preprocessing, feature extraction, and training workflow
- Through-Wall Detection: RF signals penetrate non-metallic barriers
- 90.2% Accuracy: Validated on a balanced dataset of 1,680 samples across 3 classes
The system employs a multi-stage pipeline for CSI signal processing and classification:
- Preprocessing Module: Raw CSI amplitude normalization, outlier detection, and sliding window segmentation
- Feature Extraction: Short-Time Fourier Transform (STFT) spectrograms and statistical feature computation
- Deep Learning Models: Ensemble fusion of CNN2D (spectrogram analysis) and CNN1D-LSTM (temporal sequence modeling)
- Training Framework: Cross-validation with early stopping and learning rate scheduling
- Evaluation Metrics: Multi-class accuracy, precision, recall, F1-score, and confusion matrix analysis
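The preprocessing stage listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `preprocessing.py`: the function names, the 3-sigma clipping rule, and the window and step sizes are all assumptions chosen for the example.

```python
import numpy as np

def clip_outliers(amplitudes, z_thresh=3.0):
    """Clip each subcarrier to mean +/- z_thresh * std (assumed outlier rule)."""
    mean = amplitudes.mean(axis=0, keepdims=True)
    std = amplitudes.std(axis=0, keepdims=True)
    return np.clip(amplitudes, mean - z_thresh * std, mean + z_thresh * std)

def normalize(amplitudes):
    """Min-max scale the whole recording to the range [-1, 1]."""
    lo, hi = amplitudes.min(), amplitudes.max()
    if hi == lo:
        return np.zeros_like(amplitudes, dtype=np.float32)
    return (2.0 * (amplitudes - lo) / (hi - lo) - 1.0).astype(np.float32)

def sliding_windows(amplitudes, window=128, step=64):
    """Cut a (time, subcarriers) array into overlapping windows."""
    starts = range(0, amplitudes.shape[0] - window + 1, step)
    return np.stack([amplitudes[s:s + window] for s in starts])

# Synthetic stand-in for one recording: 512 packets x 128 subcarriers.
rng = np.random.default_rng(42)
raw = rng.normal(loc=20.0, scale=3.0, size=(512, 128))
windows = sliding_windows(normalize(clip_outliers(raw)))
print(windows.shape)  # (n_windows, window, subcarriers)
```

Clipping before normalization keeps a single spike from compressing the rest of the signal into a narrow band of the [-1, 1] range.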
Ensemble Configuration:
- CNN2D Branch: Processes 128×128 STFT spectrograms for frequency-time pattern recognition
- CNN1D-LSTM Branch: Analyzes 121-timestep sequences with 8 per-subcarrier statistical features
- Fusion Strategy: Weighted combination (α=0.6 for CNN2D, α=0.4 for LSTM)
- Total Parameters: ~2.1M trainable parameters
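The weighted fusion can be illustrated with a few lines of NumPy. This sketch assumes the two branches are combined at the level of softmax probabilities; the source gives only the weights (0.6 and 0.4), so fusing logits instead would be an equally plausible reading. The logit values and function names are made up for the example.

```python
import numpy as np

CLASSES = ["Not Occupied", "Occupied Static", "Occupied Motion"]

def softmax(logits):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def fuse(cnn2d_logits, lstm_logits, w_cnn=0.6, w_lstm=0.4):
    """Weighted average of the two branches' class probabilities."""
    return w_cnn * softmax(cnn2d_logits) + w_lstm * softmax(lstm_logits)

# Hypothetical branch outputs for a single sample (batch of 1, 3 classes).
cnn2d_logits = np.array([[2.0, 0.5, 0.1]])
lstm_logits = np.array([[0.2, 0.3, 2.5]])

probs = fuse(cnn2d_logits, lstm_logits)
prediction = CLASSES[int(probs.argmax(axis=1)[0])]
print(probs.round(3), prediction)
```

Because the weights sum to 1, the fused vector is still a valid probability distribution, and the CNN2D branch wins ties when the two branches disagree with similar confidence.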
Performance Metrics:
- Test Accuracy: 90.2%
- Validation Accuracy: 88-92%
- Inference Time: <50ms per prediction (CPU)
- Classes: Not Occupied, Occupied Static, Occupied Motion
data/
├── raw/ # Raw CSI CSV files (timestamp, amplitude vectors)
│ ├── not_occupied/ # Background/empty room recordings
│ ├── occupied_static/ # Stationary human presence
│ └── occupied_motion/ # Active hand gestures and movements
├── preprocessed/ # Normalized and segmented CSI windows (.npy)
└── features/ # Extracted spectrograms and statistical features (.npy)
Core Libraries:
- PyTorch >= 2.0.0 (Deep learning framework)
- NumPy >= 1.24.0 (Numerical computing)
- Librosa >= 0.10.0 (Audio signal processing for STFT)
- Scikit-learn >= 1.3.0 (Machine learning utilities)
- SciPy >= 1.10.0 (Scientific computing)
Execute the main training script:
python main.py
Available Operations:
- Complete pipeline execution (preprocessing + feature extraction + training)
- Ensemble model training (requires preprocessed features)
- Performance metrics visualization
- Training log inspection
.
├── ai_pipeline/ # Core machine learning pipeline
│ ├── preprocessing.py
│ ├── feature_extraction.py
│ ├── models.py
│ ├── training.py
│ ├── train_ensemble.py
│ └── models/ # Trained model weights (.pth)
├── data/ # Dataset repository
├── results/ # Training outputs and evaluation metrics
│ ├── logs/ # JSON metrics and text logs
│ └── plots/ # Training curves and confusion matrices
├── hardware/ # ESP32 CSI data collection firmware
├── utils/ # Utility functions and ONNX export
└── main.py # Entry point
Input CSV Format:
timestamp,csi_raw_data
2025-11-10 00:35:22.099022,"CSI_DATA,STA,MAC_ADDRESS,...,[amplitude_values]"
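A row in this format could be parsed along the following lines. This is a hedged sketch: it assumes the amplitude vector is the only bracketed field and that its values are separated by commas or whitespace. The MAC address and amplitude values in the sample row are placeholders, and `parse_row` is a hypothetical helper, not a function from the repository.

```python
import csv
import io
import re
from datetime import datetime

# Matches the bracketed vector at the end of the csi_raw_data field.
BRACKETED = re.compile(r"\[([^\]]*)\]")

def parse_row(row):
    """Return (timestamp, amplitude list) for one CSV row."""
    timestamp = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S.%f")
    match = BRACKETED.search(row["csi_raw_data"])
    if match is None:
        raise ValueError("no bracketed CSI vector found")
    tokens = re.split(r"[,\s]+", match.group(1).strip())
    return timestamp, [float(tok) for tok in tokens if tok]

# Placeholder data; a real row carries more metadata fields and 128 values.
sample = io.StringIO(
    "timestamp,csi_raw_data\n"
    '2025-11-10 00:35:22.099022,"CSI_DATA,STA,AA:BB:CC:DD:EE:FF,[12.5 -3 7.25 0]"\n'
)
rows = [parse_row(r) for r in csv.DictReader(sample)]
print(rows[0])
```

Using `csv.DictReader` matters here because the second column is quoted and contains commas of its own.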
CSI Vector:
- Length: 128 subcarriers
- Type: Amplitude values (float32)
- Range: Normalized to [-1, 1]
Feature Representation:
- Spectrogram: 128×128 frequency-time matrix (STFT with Hann window)
- Statistical: 8 features × 121 subcarriers (mean, std, skewness, kurtosis, RMS, range, min, max)
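Both feature types can be sketched in plain NumPy. The example below is illustrative only: the project lists Librosa for the STFT, whereas this version frames the signal by hand with a Hann window so it has no extra dependencies. The FFT size, hop length, and input shapes are assumptions, and how the 128×128 spectrogram is reached (padding, resizing, or parameter choice) is not stated in the source.

```python
import numpy as np

FEATURE_NAMES = ["mean", "std", "skewness", "kurtosis", "rms", "range", "min", "max"]

def statistical_features(window):
    """Map a (time, subcarriers) window to a (subcarriers, 8) feature array."""
    mean = window.mean(axis=0)
    std = window.std(axis=0)
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero
    z = (window - mean) / safe_std
    skewness = (z ** 3).mean(axis=0)
    kurtosis = (z ** 4).mean(axis=0) - 3.0  # excess kurtosis (assumed convention)
    rms = np.sqrt((window ** 2).mean(axis=0))
    lo, hi = window.min(axis=0), window.max(axis=0)
    return np.stack([mean, std, skewness, kurtosis, rms, hi - lo, lo, hi], axis=1)

def stft_magnitude(signal, n_fft=64, hop=16):
    """Magnitude STFT of a 1-D signal with a Hann window, shaped (freq, time)."""
    hann = np.hanning(n_fft)
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.stack([signal[s:s + n_fft] * hann for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T

rng = np.random.default_rng(42)
window = rng.normal(size=(121, 128))  # arbitrary demo shape
features = statistical_features(window)

# A pure tone with 8 cycles per 64 samples should peak in frequency bin 8.
tone = np.sin(2 * np.pi * 8 * np.arange(256) / 64)
spec = stft_magnitude(tone)
print(features.shape, spec.shape)
```

The column order of `features` follows the feature list above, so `FEATURE_NAMES[i]` labels column `i`.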
Hyperparameters:
- Optimizer: AdamW (lr=2e-3, weight_decay=1e-4)
- Batch Size: 32
- Epochs: 40 (with early stopping, patience=10)
- Loss Function: Cross-Entropy Loss
- Learning Rate Schedule: ReduceLROnPlateau (factor=0.5, patience=5)
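How early stopping and the plateau schedule interact can be shown with a framework-free simulation. In PyTorch the corresponding objects would be `torch.optim.AdamW` and `torch.optim.lr_scheduler.ReduceLROnPlateau`; the class below is a hypothetical stand-in that mimics their bookkeeping, and it assumes validation accuracy is the monitored quantity. The accuracy history is invented for the demonstration.

```python
class PlateauController:
    """Halve the LR after a plateau and signal early stopping (illustrative)."""

    def __init__(self, lr=2e-3, factor=0.5, lr_patience=5, stop_patience=10):
        self.lr = lr
        self.factor = factor
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.best = float("-inf")
        self.stale_epochs = 0  # epochs since the last improvement
        self.lr_stale = 0      # stale epochs since the last LR reduction

    def step(self, val_accuracy):
        """Return (improved, should_stop) for one finished epoch."""
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.stale_epochs = 0
            self.lr_stale = 0
            return True, False  # a checkpoint would be saved here
        self.stale_epochs += 1
        self.lr_stale += 1
        if self.lr_stale > self.lr_patience:
            self.lr *= self.factor
            self.lr_stale = 0
        return False, self.stale_epochs >= self.stop_patience

# Made-up validation accuracies: three improving epochs, then a plateau.
history = [0.70, 0.80, 0.85] + [0.84] * 37  # 40 epochs at most
controller = PlateauController()
stopped_epoch = None
for epoch, acc in enumerate(history, start=1):
    improved, should_stop = controller.step(acc)
    if should_stop:
        stopped_epoch = epoch
        break
print(stopped_epoch, controller.lr, controller.best)
```

With a scheduler patience of 5 and an early-stopping patience of 10, a run that plateaus gets one learning-rate reduction before training is cut short.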
Data Split:
- Training: 80%
- Validation: 20%
- Stratified sampling to maintain class balance
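A stratified 80/20 split can be written directly in NumPy, as in the sketch below. The project may instead rely on scikit-learn's `train_test_split(..., stratify=labels)`, which does the same job; the helper name here is hypothetical, and the 560-samples-per-class figure is inferred from the balanced 1,680-sample dataset described earlier.

```python
import numpy as np

def stratified_split(labels, val_fraction=0.2, seed=42):
    """Split indices so each class keeps the same train/validation ratio."""
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_val = int(round(len(idx) * val_fraction))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.sort(np.array(train_idx)), np.sort(np.array(val_idx))

# Three balanced classes, 560 samples each (1,680 in total).
labels = np.repeat([0, 1, 2], 560)
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), np.bincount(labels[val_idx]))
```

Fixing the seed makes the split reproducible across runs, which is what allows accuracy figures from different training sessions to be compared.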
Trained models and training metrics are automatically saved to:
ai_pipeline/models/ensemble_model_YYYYMMDD_HHMMSS.pth
results/logs/ai_training/metrics_YYYYMMDD_HHMMSS.json
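The timestamped file names can be generated with the standard library. This is a small sketch of the naming pattern only; `output_paths` is a hypothetical helper, and whether the project uses local time or UTC for the stamp is an assumption.

```python
from datetime import datetime
from pathlib import Path

def output_paths(now=None):
    """Build the model and metrics paths for one training run."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    model_path = Path("ai_pipeline/models") / f"ensemble_model_{stamp}.pth"
    metrics_path = Path("results/logs/ai_training") / f"metrics_{stamp}.json"
    return model_path, metrics_path

# A fixed datetime keeps the example deterministic.
model_path, metrics_path = output_paths(datetime(2025, 11, 10, 0, 35, 22))
print(model_path.as_posix())
print(metrics_path.as_posix())
```

Sharing one timestamp between the two files makes it easy to pair a checkpoint with the metrics from the same run.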
Performance visualizations:
- Training/validation loss curves
- Accuracy progression
- Confusion matrix heatmap
- Per-class precision-recall metrics
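The numbers behind the confusion matrix and per-class plots can be computed as in this NumPy sketch. The labels and predictions are toy data, and the function names are illustrative; scikit-learn offers equivalent utilities.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def per_class_metrics(cm):
    """Return per-class precision, recall, and F1 from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurred
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Toy labels: 0 = Not Occupied, 1 = Occupied Static, 2 = Occupied Motion.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1])

cm = confusion_matrix(y_true, y_pred)
precision, recall, f1 = per_class_metrics(cm)
accuracy = np.trace(cm) / cm.sum()
print(cm, precision, recall, f1, accuracy, sep="\n")
```

The `where=` guards return 0 instead of dividing by zero when a class is never predicted or never present in the evaluation set.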
Models can be exported to ONNX format for deployment:
- Dynamic batch size support
- Optimized for CPU inference
- Compatible with ONNX Runtime
This project is licensed under the MIT License.
- System designed for training pipeline only (real-time inference components removed)
- Optimized for batch processing of collected CSI datasets
- GPU acceleration supported (CUDA-compatible devices)
- Model checkpoint saving after each improvement in validation accuracy
- Reproducible results with fixed random seed (seed=42)

