WiFi CSI-based gesture recognition using dual-path ensemble deep learning (CNN2D + CNN1D-LSTM). 90.19% accuracy on ESP32 hardware with 426K parameters.

sjsreehari/Wi-Fi-gesture-dectection

WiFi CSI Gesture Detection

WiFi CSI Gesture Detection is a deep learning system that uses WiFi Channel State Information (CSI) to detect human presence and motion without cameras or wearables. A dual-path ensemble of CNN2D and CNN1D-LSTM models classifies three occupancy states (Not Occupied, Occupied Static, Occupied Motion) with 90.2% test accuracy, using commodity WiFi hardware as a privacy-preserving, contact-free alternative to vision-based sensing.

This project was built and developed by Sreehari S J.

Key Features

  • Ensemble Architecture: Dual-path fusion of CNN2D (spectrogram) and CNN1D-LSTM (temporal) models
  • Contact-Free Sensing: Detects gestures through WiFi signals without cameras or wearables
  • Privacy-Preserving: No visual data capture - CSI signals cannot reconstruct images
  • Multi-Class Detection: Classifies Not Occupied, Occupied Static, and Occupied Motion
  • Real-Time Performance: <50ms inference time on CPU hardware
  • ESP32 Integration: Commodity WiFi hardware for CSI data collection
  • Advanced Features: STFT spectrograms + 8 statistical features per subcarrier
  • Robust Pipeline: End-to-end preprocessing, feature extraction, and training workflow
  • Through-Wall Detection: RF signals penetrate non-metallic barriers
  • 90.2% Accuracy: Validated on balanced dataset with 1,680 samples across 3 classes

System Architecture

The system employs a multi-stage pipeline for CSI signal processing and classification:

  1. Preprocessing Module: Raw CSI amplitude normalization, outlier detection, and sliding window segmentation
  2. Feature Extraction: Short-Time Fourier Transform (STFT) spectrograms and statistical feature computation
  3. Deep Learning Models: Ensemble fusion of CNN2D (spectrogram analysis) and CNN1D-LSTM (temporal sequence modeling)
  4. Training Framework: Cross-validation with early stopping and learning rate scheduling
  5. Evaluation Metrics: Multi-class accuracy, precision, recall, F1-score, and confusion matrix analysis
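Stages 1 and 2 can be illustrated with a minimal NumPy sketch. The window length, hop size, FFT size, and function names below are assumptions chosen for illustration, not values taken from the repository (which lists Librosa for its STFT), and the input is synthetic noise standing in for a real recording.

```python
import numpy as np


def sliding_windows(csi, window, hop):
    """Split a (time, subcarriers) CSI amplitude array into overlapping windows."""
    starts = range(0, csi.shape[0] - window + 1, hop)
    return np.stack([csi[s:s + window] for s in starts])


def stft_magnitude(signal, n_fft, hop):
    """Magnitude STFT of a 1-D signal using a Hann window (no padding)."""
    win = np.hanning(n_fft)
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.stack([signal[s:s + n_fft] * win for s in starts])
    # rfft keeps the n_fft // 2 + 1 non-negative frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq, time)


# Synthetic stand-in for a recording: 1000 packets x 128 subcarriers.
rng = np.random.default_rng(42)
csi = rng.normal(size=(1000, 128)).astype(np.float32)

windows = sliding_windows(csi, window=256, hop=128)  # assumed sizes
spec = stft_magnitude(windows[0][:, 0], n_fft=64, hop=16)
print(windows.shape, spec.shape)
```

Each window yields one spectrogram per subcarrier here; how the project combines subcarriers into a single 128×128 image is not specified in this README.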

Technical Specifications

Model Architecture

Ensemble Configuration:

  • CNN2D Branch: Processes 128×128 STFT spectrograms for frequency-time pattern recognition
  • CNN1D-LSTM Branch: Analyzes 121-timestep sequences with 8 per-subcarrier statistical features
  • Fusion Strategy: Weighted combination (α=0.6 for CNN2D, α=0.4 for LSTM)
  • Total Parameters: ~2.1M trainable parameters
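The weighted fusion can be sketched as follows. This assumes the weights are applied to each branch's softmax probabilities; the README does not say whether fusion happens on probabilities, logits, or intermediate features, and the logits below are made up.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def fuse(cnn2d_logits, lstm_logits, w_cnn2d=0.6, w_lstm=0.4):
    """Weighted average of the two branches' class probabilities."""
    return w_cnn2d * softmax(cnn2d_logits) + w_lstm * softmax(lstm_logits)


CLASSES = ["Not Occupied", "Occupied Static", "Occupied Motion"]

# Hypothetical logits for a batch of two samples.
cnn2d_logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.3, 1.5]])
lstm_logits = np.array([[0.1, 1.8, 0.2], [0.1, 0.2, 2.0]])

probs = fuse(cnn2d_logits, lstm_logits)
predictions = [CLASSES[i] for i in probs.argmax(axis=1)]
print(predictions)
```

Because the weights sum to 1, each fused row remains a valid probability distribution.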

Performance Metrics:

  • Test Accuracy: 90.2%
  • Validation Accuracy: 88-92%
  • Inference Time: <50ms per prediction (CPU)
  • Classes: Not Occupied, Occupied Static, Occupied Motion

Dataset Structure

data/
├── raw/                    # Raw CSI CSV files (timestamp, amplitude vectors)
│   ├── not_occupied/      # Background/empty room recordings
│   ├── occupied_static/   # Stationary human presence
│   └── occupied_motion/   # Active hand gestures and movements
├── preprocessed/          # Normalized and segmented CSI windows (.npy)
└── features/              # Extracted spectrograms and statistical features (.npy)
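A loader for this layout might map each class folder to an integer label. The sketch below is illustrative only: `list_recordings` is a hypothetical helper, and the label order is an assumption.

```python
from pathlib import Path

# Class folder names as they appear in the tree above; the index order
# (0, 1, 2) is an assumption, not taken from the repository.
CLASS_DIRS = ["not_occupied", "occupied_static", "occupied_motion"]


def list_recordings(raw_root):
    """Return (csv_path, label_index) pairs for every CSV under raw_root."""
    pairs = []
    for label, name in enumerate(CLASS_DIRS):
        for csv_path in sorted(Path(raw_root, name).glob("*.csv")):
            pairs.append((csv_path, label))
    return pairs
```

Calling `list_recordings("data/raw")` would then return one entry per raw CSV file, ready to be passed to preprocessing.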

Dependencies

Core Libraries:

  • PyTorch >= 2.0.0 (Deep learning framework)
  • NumPy >= 1.24.0 (Numerical computing)
  • Librosa >= 0.10.0 (Audio signal processing for STFT)
  • Scikit-learn >= 1.3.0 (Machine learning utilities)
  • SciPy >= 1.10.0 (Scientific computing)

Usage

Training Pipeline

Execute the main training script:

python main.py

Available Operations:

  1. Complete pipeline execution (preprocessing + feature extraction + training)
  2. Ensemble model training (requires preprocessed features)
  3. Performance metrics visualization
  4. Training log inspection

Directory Structure

.
├── ai_pipeline/           # Core machine learning pipeline
│   ├── preprocessing.py
│   ├── feature_extraction.py
│   ├── models.py
│   ├── training.py
│   ├── train_ensemble.py
│   └── models/           # Trained model weights (.pth)
├── data/                 # Dataset repository
├── results/              # Training outputs and evaluation metrics
│   ├── logs/            # JSON metrics and text logs
│   └── plots/           # Training curves and confusion matrices
├── hardware/            # ESP32 CSI data collection firmware
├── utils/               # Utility functions and ONNX export
└── main.py              # Entry point

Data Format Specification

Input CSV Format:

timestamp,csi_raw_data
2025-11-10 00:35:22.099022,"CSI_DATA,STA,MAC_ADDRESS,...,[amplitude_values]"
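A row in this format could be parsed with the standard library alone. The sketch assumes the amplitude list is the last bracketed group in `csi_raw_data` and that values are separated by commas or whitespace; the sample row uses made-up values, and `parse_row` is a hypothetical helper.

```python
import csv
import io
import re

# Matches the last bracketed list at the end of the raw CSI string.
BRACKETED = re.compile(r"\[([^\[\]]*)\]\s*$")


def parse_row(row):
    """Return (timestamp, amplitudes) for one CSV row dictionary."""
    match = BRACKETED.search(row["csi_raw_data"])
    if match is None:
        raise ValueError("no bracketed amplitude list found")
    # Accept commas or whitespace between values (separator is an assumption).
    values = [float(v) for v in re.split(r"[,\s]+", match.group(1).strip()) if v]
    return row["timestamp"], values


# Synthetic example in the documented layout, with made-up values.
sample = (
    "timestamp,csi_raw_data\n"
    '2025-11-10 00:35:22.099022,"CSI_DATA,STA,00:00:00:00:00:00,[0.5 -0.25 1.0]"\n'
)
rows = [parse_row(r) for r in csv.DictReader(io.StringIO(sample))]
print(rows)
```

`csv.DictReader` handles the quoted field, so the commas inside `csi_raw_data` do not split the row.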

CSI Vector:

  • Length: 128 subcarriers
  • Type: Amplitude values (float32)
  • Range: Normalized to [-1, 1]
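One way to reach the [-1, 1] range is min-max scaling, sketched below. Whether the project scales per window, per subcarrier, or over the whole dataset is not stated, so per-window scaling is an assumption here.

```python
import numpy as np


def normalize_amplitudes(window, eps=1e-8):
    """Min-max scale a (time, subcarriers) window into [-1, 1]."""
    lo, hi = window.min(), window.max()
    scaled = (window - lo) / max(hi - lo, eps)  # now in [0, 1]
    return (2.0 * scaled - 1.0).astype(np.float32)


window = np.array([[3.0, 9.0], [6.0, 15.0]])
print(normalize_amplitudes(window))
```

The `eps` guard avoids division by zero for a constant window, which then maps to -1 everywhere.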

Feature Representation:

  • Spectrogram: 128×128 frequency-time matrix (STFT with Hann window)
  • Statistical: 8 features × 121 subcarriers (mean, std, skewness, kurtosis, RMS, range, min, max)
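The eight statistics can be computed per column with NumPy, as in this sketch. It uses population (biased) moments and excess kurtosis, which are assumptions about conventions the README does not specify; the window length of 256 and the random input are also illustrative.

```python
import numpy as np


def statistical_features(window):
    """Compute 8 statistics per subcarrier for a (time, subcarriers) window.

    Returns an array of shape (8, subcarriers) in the order:
    mean, std, skewness, kurtosis, RMS, range, min, max.
    """
    mean = window.mean(axis=0)
    std = window.std(axis=0)
    # Standardize, guarding against constant subcarriers (std == 0).
    z = (window - mean) / np.where(std > 0, std, 1.0)
    skewness = (z ** 3).mean(axis=0)
    kurtosis = (z ** 4).mean(axis=0) - 3.0  # excess kurtosis
    rms = np.sqrt((window ** 2).mean(axis=0))
    lo, hi = window.min(axis=0), window.max(axis=0)
    return np.stack([mean, std, skewness, kurtosis, rms, hi - lo, lo, hi])


rng = np.random.default_rng(42)
window = rng.normal(size=(256, 121))  # synthetic window, 121 columns
features = statistical_features(window)
print(features.shape)  # (8, 121)
```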

Model Training

Hyperparameters:

  • Optimizer: AdamW (lr=2e-3, weight_decay=1e-4)
  • Batch Size: 32
  • Epochs: 40 (with early stopping, patience=10)
  • Loss Function: Cross-Entropy Loss
  • Learning Rate Schedule: ReduceLROnPlateau (factor=0.5, patience=5)
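How the two patience values interact can be traced without any framework. The sketch below is a simplified stand-in, not PyTorch's `ReduceLROnPlateau` implementation: it halves the learning rate after more than 5 consecutive epochs without improvement and stops once 10 epochs have passed since the best validation loss. The exact stopping convention and the loss values are assumptions.

```python
def run_schedule(val_losses, lr=2e-3, factor=0.5, lr_patience=5,
                 stop_patience=10, max_epochs=40):
    """Trace LR reductions and early stopping over a list of validation losses.

    Returns (epochs_run, final_lr).
    """
    best = float("inf")
    bad_for_lr = 0      # epochs without improvement since the last LR change
    bad_for_stop = 0    # epochs without improvement since the best epoch
    epochs_run = 0
    for loss in val_losses[:max_epochs]:
        epochs_run += 1
        if loss < best:
            best = loss
            bad_for_lr = bad_for_stop = 0
            continue
        bad_for_lr += 1
        bad_for_stop += 1
        if bad_for_lr > lr_patience:
            lr *= factor
            bad_for_lr = 0
        if bad_for_stop >= stop_patience:
            break
    return epochs_run, lr


# Hypothetical run: loss improves for 5 epochs, then plateaus.
losses = [1.0, 0.8, 0.6, 0.5, 0.45] + [0.45] * 35
print(run_schedule(losses))
```

With these settings a plateau triggers one learning rate reduction before early stopping ends the run, so the 40-epoch budget acts as an upper bound.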

Data Split:

  • Training: 80%
  • Validation: 20%
  • Stratified sampling to maintain class balance
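A stratified 80/20 split can be sketched in NumPy as below; scikit-learn's `train_test_split` with the `stratify` argument provides the same behaviour. The function name is hypothetical, and the balanced label array is built from the 1,680-sample, 3-class figure quoted in this README.

```python
import numpy as np


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_idx, val_idx) with each class split in the same ratio."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_val = int(round(len(idx) * val_fraction))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.sort(train_idx), np.sort(val_idx)


# 1,680 samples across 3 balanced classes, i.e. 560 per class.
labels = np.repeat([0, 1, 2], 560)
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```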

Results

Trained models are automatically saved to:

  • ai_pipeline/models/ensemble_model_YYYYMMDD_HHMMSS.pth
  • results/logs/ai_training/metrics_YYYYMMDD_HHMMSS.json
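The timestamped names follow the pattern `YYYYMMDD_HHMMSS`, which corresponds to the `strftime` format `%Y%m%d_%H%M%S`. A small sketch, with `output_paths` as a hypothetical helper:

```python
from datetime import datetime
from pathlib import Path


def output_paths(now=None):
    """Build the timestamped checkpoint and metrics paths for one run."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    model_path = Path("ai_pipeline/models") / f"ensemble_model_{stamp}.pth"
    metrics_path = Path("results/logs/ai_training") / f"metrics_{stamp}.json"
    return model_path, metrics_path


model_path, metrics_path = output_paths(datetime(2025, 11, 10, 0, 35, 22))
print(model_path.name)    # ensemble_model_20251110_003522.pth
print(metrics_path.name)  # metrics_20251110_003522.json
```

Sharing one timestamp between both files makes it easy to pair a checkpoint with its metrics.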

Performance visualizations:

  • Training/validation loss curves
  • Accuracy progression
  • Confusion matrix heatmap
  • Per-class precision-recall metrics
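The confusion matrix and per-class precision and recall can be derived from predicted and true labels as in this sketch. The labels are hypothetical and the helper names are not from the repository.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def precision_recall(cm):
    """Per-class precision and recall; classes with no samples score 0."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)
    actual = cm.sum(axis=1)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall


# Hypothetical labels: 0 = Not Occupied, 1 = Occupied Static, 2 = Occupied Motion.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])
cm = confusion_matrix(y_true, y_pred)
precision, recall = precision_recall(cm)
print(cm)
```

In this toy example one Occupied Static sample is predicted as Occupied Motion, which lowers recall for the former and precision for the latter.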

Model Export

Models can be exported to ONNX format for deployment:

  • Dynamic batch size support
  • Optimized for CPU inference
  • Compatible with ONNX Runtime

License

This project is licensed under the MIT License.

Technical Notes

  • System designed for training pipeline only (real-time inference components removed)
  • Optimized for batch processing of collected CSI datasets
  • GPU acceleration supported (CUDA-compatible devices)
  • Model checkpoint saving after each improvement in validation accuracy
  • Reproducible results with fixed random seed (seed=42)
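Fixing the seed typically means seeding every random number generator in use. The sketch below is an assumption about how this might be done, not the repository's code; it seeds PyTorch only if it is installed.

```python
import random

import numpy as np


def set_seed(seed=42):
    """Seed Python, NumPy, and (if installed) PyTorch random generators."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


set_seed(42)
first = random.random(), np.random.rand()
set_seed(42)
second = random.random(), np.random.rand()
print(first == second)  # True
```

Bit-for-bit reproducibility on GPU may need additional deterministic settings beyond seeding.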
