This repository contains the code and instructions for MICA-Net, a multimodal action recognition framework that fuses data from multiple sensors to improve human action recognition. It also covers deploying the MICA-Net model on an NVIDIA Jetson AGX Xavier edge device and testing it in a human-machine interaction system with a home-appliance control application.
The paper "MICA-Net: A Multimodal Cross-Attention Network for Human Action Recognition" presents a solution by designing a compact, wrist-worn device similar to a smartwatch, which facilitates easy portability and mobility. Moreover, we propose a new action recognition method called MICA-Net that uses a cross-attention mechanism to simultaneously extract correlations from multimodal data collected by different sensors, thereby improving model efficiency. The paper has three key contributions:
- 🎯 We introduce MICA-Net, a new framework for efficient multimodal human action recognition. Validation on three benchmarks demonstrates superior performance over single-modality methods and several existing fusion methods.
- ⌚ We present a new wrist-worn sensor prototype with a more compact, robust, and user-friendly design. The prototype, which can function as a smartwatch, is easily deployable in ubiquitous environments and suits applications such as human life logging and human-machine interaction.
- 🏠 We develop an application that controls home appliances with hand gestures captured by our prototype and recognized on an edge device, demonstrating its potential for practical use.
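For intuition, here is a minimal PyTorch sketch of a cross-attention fusion block in the spirit of MICA-Net, where tokens from one modality query tokens from the other. The class name, dimensions, and layer choices are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative cross-attention block: one modality queries the other.
    Names and dimensions are assumptions, not MICA-Net's exact design."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, inertial: torch.Tensor) -> torch.Tensor:
        # Visual tokens attend to inertial tokens (Q = visual, K/V = inertial)
        fused, _ = self.attn(visual, inertial, inertial)
        return self.norm(visual + fused)  # residual + layer norm

# Example: fuse 16 visual tokens with 32 inertial tokens per sample
v = torch.randn(2, 16, 256)
i = torch.randn(2, 32, 256)
print(CrossAttentionFusion()(v, i).shape)  # torch.Size([2, 16, 256])
```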
- 🐍 Python 3.x
- 🔥 PyTorch
- 📦 torchvision
- 🎥 OpenCV
- 🔢 NumPy
- Clone the repository:
```bash
git clone https://github.com/drkhanusa/MICA-Net.git
cd MICA-Net
```
- Install the required packages:
```bash
pip install -r requirements.txt
```
- 📥 Download datasets: UTD-MHAD, UESTC-MMEA-CL, and MuWiGes.
- ⚙️ Preprocess: for inertial data, we use the Gramian Angular Difference Field (GADF) algorithm to transform the time series into 8-channel images; for video data, we sample 16 frames per video and resize them to 172×172. All preprocessing routines are in `algorithm.py` (a minimal GADF sketch follows this list).
- 📝 Configure training: adjust the parameters in the training file of each model: the GAFormer model, the MoViNet model, and the MICA-Net model.
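For reference, a minimal sketch of the Gramian Angular Difference Field transform for one sensor channel; the function name, window length, and normalization are illustrative assumptions, and `algorithm.py` in this repository is the authoritative implementation.

```python
import numpy as np

def gadf(series: np.ndarray) -> np.ndarray:
    """Gramian Angular Difference Field for a single 1-D channel (sketch)."""
    # Rescale to [-1, 1] so that arccos is defined
    x = 2 * (series - series.min()) / (series.max() - series.min() + 1e-8) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))  # polar-coordinate angles
    # GADF[i, j] = sin(phi_i - phi_j)
    return np.sin(phi[:, None] - phi[None, :])

# Example: an 8-channel inertial window becomes an 8-channel GADF image
window = np.random.randn(128, 8)                          # 128 timesteps, 8 channels
image = np.stack([gadf(window[:, c]) for c in range(8)])  # shape (8, 128, 128)
```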
▶️ Run training:
- With the GAFormer model:
```bash
python Training_GAFormer.py
```
- With the MoViNet model:
```bash
python Training_MoViNet.py
```
- After training both the GAFormer and MoViNet models, we cut off each model's last fully connected layer, freeze the remaining layers, and run inference: the outputs of the two branches (inertial and visual) are feature vectors, which are saved as NumPy files (.npy) in separate folders (a sketch of this step follows the command below). Then train the MICA-Net fusion model with the following command:
```bash
python Training_MICA-Net.py
```
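Below is a minimal sketch of the feature-extraction step described above; the head attribute name (`fc`), the loader, and the output layout are assumptions for illustration, not the repository's exact code.

```python
import numpy as np
import torch
import torch.nn as nn

def extract_features(model: nn.Module, loader, out_dir: str, device: str = "cuda"):
    """Cut the classifier head, freeze the backbone, and save features (sketch)."""
    model.fc = nn.Identity()          # assumes the last FC layer is named `fc`
    for p in model.parameters():      # freeze all remaining layers
        p.requires_grad = False
    model.eval().to(device)
    with torch.no_grad():
        for i, (clips, _) in enumerate(loader):
            feats = model(clips.to(device)).cpu().numpy()
            np.save(f"{out_dir}/features_{i:05d}.npy", feats)  # one .npy per batch
```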
We evaluate the models using the following metrics: accuracy, precision, recall, F1-score, the confusion matrix, and a t-SNE plot. These results are displayed immediately after training finishes.
| 📚 Dataset | 🎯 Accuracy (%) | 🎯 Precision (%) | 🎯 Recall (%) | 🎯 F1-Score (%) |
|---|---|---|---|---|
| UTD-MHAD | 92.10 | 93.33 | 92.18 | 92.22 |
| UESTC-MMEA-CL | 99.22 | 99.28 | 99.28 | 99.37 |
| MuWiGes | 98.98 | 99.00 | 98.92 | 99.16 |
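For reference, these metrics can be computed from model predictions with scikit-learn; the label arrays and the macro averaging below are illustrative assumptions, since the exact averaging used in the paper is not stated here.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels standing in for a validation split
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
print(confusion_matrix(y_true, y_pred))
```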
- 🛠 Install JetPack SDK: follow NVIDIA's instructions to install the JetPack SDK with SDK Manager. In this paper, we use a Jetson AGX Xavier with 32 GB of RAM and JetPack 5.1.2.
- 📂 Transfer code: copy the MICA-Net code to the Jetson AGX Xavier.
- 🔄 Convert the PyTorch model to an ONNX model:
```bash
python convertPytorch2onnx.py
```
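As a reference, a minimal sketch of what `convertPytorch2onnx.py` likely does using `torch.onnx.export`; the checkpoint name, input layout, and opset version are assumptions for illustration.

```python
import torch

# Hypothetical checkpoint name; a full-model checkpoint is assumed
model = torch.load("MoViNet_trained.pth", map_location="cpu")
model.eval()

# One 16-frame RGB clip at 172x172, laid out as (N, C, T, H, W) -- assumed
dummy = torch.randn(1, 3, 16, 172, 172)

torch.onnx.export(
    model, dummy, "MoViNet.onnx",
    input_names=["input"], output_names=["logits"],
    opset_version=13,
)
```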
- ⚡ Optimize the model: TensorRT is NVIDIA's neural-network optimization library. On NVIDIA's family of edge devices, the trtexec tool (part of the TensorRT library) is available to convert ONNX models into TensorRT engines. Build the TensorRT engine with the following command:
```bash
/usr/src/tensorrt/bin/trtexec --onnx=MoViNet.onnx --saveEngine=MoViNet.trt --fp16
```
▶️ Run inference: run the TensorRT models on the Jetson AGX Xavier and evaluate both speed and accuracy:
```bash
python TensorRT_validation.py
```
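For reference, a minimal sketch of running the built engine with the TensorRT 8.x Python API (as shipped with JetPack 5.1.2), assuming static shapes, a single input as binding 0, and a single output as the last binding; `TensorRT_validation.py` is the authoritative script.

```python
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine produced by trtexec
with open("MoViNet.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate pinned host buffers and device buffers for every binding
bindings, host_bufs, dev_bufs = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)             # assumes static shapes
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

# Random stand-in for a preprocessed clip (binding 0 assumed to be the input)
host_bufs[0][:] = np.random.rand(host_bufs[0].size).astype(host_bufs[0].dtype)
cuda.memcpy_htod(dev_bufs[0], host_bufs[0])
context.execute_v2(bindings)
cuda.memcpy_dtoh(host_bufs[-1], dev_bufs[-1])       # last binding assumed output
print("predicted class:", int(np.argmax(host_bufs[-1])))
```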
This research is funded by Hanoi University of Science and Technology (HUST) under project number T2021-SAHEP-003.