This repository contains the complete implementation of the thesis project:
"Data Collection System for Training a Stream Learning Model for Activities of Daily Living (ADL) Classification"
The system was designed to support real-time classification of human activities using inertial data collected from smartphones (accelerometer and gyroscope).
The project integrates:
- Offline supervised training
- Online incremental (stream) learning
- Selective label request policy
- Stability and reaction analysis over time
The architecture follows a client–server model with persistent storage and incremental adaptation.
The objective is to design and implement a client–server architecture capable of:
- Collecting inertial sensor data from smartphones
- Segmenting data into temporal windows
- Training supervised models (SVM, Random Forest)
- Deploying an incremental Stream Learning model
- Performing real-time ADL classification
- Requesting labels selectively based on model uncertainty
- Evaluating stability and reaction performance over time
The system is divided into two major phases:
In the offline phase:
- Smartphones collect raw IMU signals.
- Data are stored locally and exported as CSV files.
- The dataset is cleaned and filtered.
- Signals are segmented into fixed-length time windows (at 20 Hz and 50 Hz sampling rates).
- Class balancing techniques are applied (Random Oversampling / SMOTE).
- Supervised models (SVM, Random Forest) are trained and compared.
- The best configuration is selected and exported as the baseline model.
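The offline pipeline above (windowing, feature extraction, supervised training, model comparison) can be sketched as follows. This is an illustrative example on synthetic data, not the thesis code: the window size (40 samples, i.e. 2 s at 20 Hz), the four summary statistics per channel, and the labeling of a window by its last sample are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def make_windows(signal, labels, size=40, step=20):
    """Segment a (n_samples, n_channels) signal into fixed-size windows.

    At 20 Hz, size=40 gives 2-second windows with 50% overlap.
    Each window is summarized by simple per-channel statistics.
    """
    X, y = [], []
    for start in range(0, len(signal) - size + 1, step):
        w = signal[start:start + size]
        feats = np.concatenate([w.mean(axis=0), w.std(axis=0),
                                w.min(axis=0), w.max(axis=0)])
        X.append(feats)
        y.append(labels[start + size - 1])  # assumed: label of the window's last sample
    return np.array(X), np.array(y)

# Synthetic stand-in for 6-channel IMU data (3-axis accelerometer + gyroscope)
rng = np.random.default_rng(0)
signal = rng.normal(size=(2000, 6))
labels = rng.integers(0, 3, size=2000)

X, y = make_windows(signal, labels)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Train and compare the two offline candidates
for name, model in [("SVM", SVC(probability=True)),
                    ("Random Forest", RandomForestClassifier(random_state=0))]:
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 3))
```

For class balancing, `imblearn.over_sampling.SMOTE` (or `RandomOverSampler`) would be applied to `X_tr, y_tr` before fitting.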
In the online phase:
- The client app streams windowed data (features + metadata).
- The server receives and validates windows.
- A classification module generates predictions + confidence.
- A policy engine decides whether to request a label.
- Labeled windows are aligned and stored.
- An incremental learner (Hoeffding Tree or Adaptive Random Forest) updates the model continuously.
- Performance is monitored using learning curves, stability and reaction metrics.
The server is divided into:
- Ingestion/Validation module
- Classification module
- Rule/Policy engine
- Label aligner
- Incremental trainer
- Database persistence
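The ingestion/validation module's role can be illustrated with a minimal payload check. The required field names and the 24-element feature vector are hypothetical; the actual schema depends on the client app's window format.

```python
REQUIRED_KEYS = {"device_id", "window_id", "timestamp", "features"}
N_FEATURES = 24  # hypothetical feature-vector length

def validate_window(payload: dict):
    """Return (ok, reason) for one incoming window payload."""
    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        return False, f"missing fields: {sorted(missing)}"
    feats = payload["features"]
    if len(feats) != N_FEATURES:
        return False, f"expected {N_FEATURES} features, got {len(feats)}"
    if not all(isinstance(v, (int, float)) for v in feats):
        return False, "non-numeric feature value"
    return True, "ok"
```

Rejected windows never reach the classifier, so downstream modules can assume well-formed input.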
The internal decision flow can be summarized as:
- Window → Baseline classifier (SVM + Calibrator)
- Prediction + confidence → Policy engine
- Label request (if needed)
- Label alignment
- Incremental update (HT / ARF)
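A selective label request policy of the kind in the flow above can be sketched as a confidence threshold combined with a labeling budget. The specific threshold (0.7) and budget (20% of windows) are placeholder values, not the thesis configuration.

```python
class LabelPolicy:
    """Request a label when prediction confidence is low, subject to a budget."""

    def __init__(self, threshold: float = 0.7, budget: float = 0.2):
        self.threshold = threshold  # minimum confidence to skip labeling
        self.budget = budget        # max fraction of windows to label
        self.seen = 0
        self.requested = 0

    def should_request(self, confidence: float) -> bool:
        self.seen += 1
        under_budget = self.requested < self.budget * self.seen
        if confidence < self.threshold and under_budget:
            self.requested += 1
            return True
        return False
```

The budget keeps annotation effort bounded even when the model is persistently uncertain, which is the point of the selective policy.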
Key results and analyses:
- Final selected offline model: SVM, pocket position, 20 Hz
- Comparative analysis against Random Forest
- Incremental evaluation using River
- Stability analysis across sessions
- Reaction time analysis after model updates
- Per-class performance metrics
- Learning curve overlays
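One simple way to visualize stability and reaction over time is a sliding-window accuracy curve over the prediction stream; a flat curve indicates stability, and the recovery slope after a model update reflects reaction time. A minimal sketch (window length of 50 is an arbitrary choice):

```python
import numpy as np

def rolling_accuracy(y_true, y_pred, window: int = 50):
    """Sliding-window accuracy over a prediction stream.

    Returns one accuracy value per window position (mode='valid'),
    suitable for plotting as a learning/stability curve.
    """
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    kernel = np.ones(window) / window
    return np.convolve(correct, kernel, mode="valid")
```

Overlaying such curves from different sessions gives the learning curve overlays listed above.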
Technologies used:
- Python
- Scikit-learn
- River (Stream Learning)
- Pandas
- NumPy
- Matplotlib
- Imbalanced-learn (SMOTE)
Main contributions:
- Integration of offline supervised learning with online incremental adaptation
- Selective labeling strategy to reduce annotation burden
- Stability and reaction metrics for continuous evaluation
- End-to-end architecture from data collection to deployment simulation
Author: Paula Sofía Muñoz
Electronic and Telecommunications Engineering
Universidad del Cauca
This project is intended for academic and research purposes.