Violence recognition in streaming video using Transfer Learning and MoViNets. The project leverages state-of-the-art deep learning techniques to create an efficient and accurate violence detection system.

engares/MoViNets-for-Violence-Detection-in-Live-Video-Streaming

MoViNets for Violence Detection in Video Streaming

Overview

This project uses MoViNet models to detect instances of violence in video streams. Through transfer learning and fine-tuning, the objective is a high-performance model that runs efficiently on edge devices (such as a Raspberry Pi or other single-board computers), which often have limited computational resources. The MoViNet-A3 model has shown promising results, achieving an accuracy of 85%. This level of accuracy points to the model's potential for real-time applications where quick and reliable video analysis is critical.

Key Features

  • Model Training: Uses MoViNets, an architecture from Google Research (Kondratyuk et al., 2021) known for its efficiency in mobile and edge computing environments. Applies transfer learning to these models, which are pre-trained on human action recognition, to improve learning efficacy and reduce the need for extensive computational resources. The code is available in 'movinet_training.ipynb'. The training and evaluation metrics are available in the folders above, along with a further analysis of those results.
  • Real-time Operation: Optimized for real-time applications, ensuring swift and accurate violence detection. Inference can be performed through 'movinet_inference.ipynb'.
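
To illustrate what real-time operation on a stream can involve, here is a minimal sketch of cutting a continuous frame stream into overlapping fixed-length clips for a video classifier. It is not taken from this repository's notebooks: the function name `sliding_clips` and the `clip_len` and `stride` parameters are assumptions made for illustration, and the actual inference code may buffer frames differently.

```python
from collections import deque
from typing import Any, Iterable, Iterator, List


def sliding_clips(frames: Iterable[Any], clip_len: int, stride: int) -> Iterator[List[Any]]:
    """Yield overlapping clips of `clip_len` frames, one every `stride` frames.

    Hypothetical helper: in practice each clip would be passed to the
    classifier, and `frames` would come from a camera or video decoder.
    """
    buffer: deque = deque(maxlen=clip_len)  # keeps only the most recent frames
    since_last = 0                          # frames seen since the last emitted clip
    for frame in frames:
        buffer.append(frame)
        since_last += 1
        # Emit a clip once the buffer is full and `stride` new frames have arrived.
        if len(buffer) == clip_len and since_last >= stride:
            since_last = 0
            yield list(buffer)


if __name__ == "__main__":
    # Integers stand in for decoded frames.
    for clip in sliding_clips(range(10), clip_len=4, stride=2):
        print(clip)
```

A smaller `stride` produces predictions more often at the cost of more compute per second of video, which is the main trade-off on a single-board computer.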

Example of the visual interface for the inference

[Images: Fight_2, Fight_1]

More examples are available in the 'example_videos' folder.

Usage of normal inference

You can use the Colab Notebook directly here (RECOMMENDED).

Alternatively, you can run the Python script in a virtual environment:

Requirements

  • Python 3.10+
  • TensorFlow 2.15+
  • A Linux distribution
  • Other dependencies listed in requirements.txt

Steps

  1. Clone the repository
    git clone https://github.com/engares/MoViNets-for-Violence-Detection-in-Live-Video-Streaming.git
    cd MoViNets-for-Violence-Detection-in-Live-Video-Streaming
  2. Install the required packages
    pip install -r requeriments.txt
    sudo apt update && sudo apt install -y ffmpeg
  3. Download the models (1.8 GB, this may take time)
    git clone https://huggingface.co/engares/MoViNet4Violence-Detection
    
  4. Run 'movinet_inference.py', passing the path to the video and selecting one of the trained models by its hyperparameters. (The best model is chosen by default.)
    python movinet_inference.py [/path/to/video.mp4] --model_id a3 --lr 0.001 --bs 64 --dr 0.3 --trly 0
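
As a reading aid for the command above, the following is a minimal sketch of how a script could parse those arguments with `argparse`. It is not the actual contents of 'movinet_inference.py'. The flag names are copied from the command, but the interpretations in the help strings (learning rate, batch size, dropout rate, trainable layers) and the default values, which simply mirror the example command, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the README command."""
    parser = argparse.ArgumentParser(
        description="Select a trained MoViNet model and run it on a video."
    )
    parser.add_argument("video_path", help="Path to the input video file")
    # Flag meanings below are assumed from their abbreviations.
    parser.add_argument("--model_id", default="a3", help="MoViNet variant, e.g. a3")
    parser.add_argument("--lr", type=float, default=0.001, help="Learning rate used in training (assumed)")
    parser.add_argument("--bs", type=int, default=64, help="Batch size used in training (assumed)")
    parser.add_argument("--dr", type=float, default=0.3, help="Dropout rate used in training (assumed)")
    parser.add_argument("--trly", type=int, default=0, help="Number of trainable layers (assumed)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Read this way, the hyperparameter flags identify which of the downloaded models to load rather than changing how inference runs.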

The full list of models with their performance metrics is available in this .csv

Note. This TensorFlow implementation does not work with TF Lite.

Usage for tf-lite inference

Tested on an Orange Pi 5

Requirements

  • Python 3.10+
  • TensorFlow 2.15+
  • Jupyter Notebook
  • A single-board computer running a Linux distribution
  • Other dependencies listed in requirements_tf_lite.txt

Steps

  1. Clone the repository
    git clone https://github.com/engares/MoViNets-for-Violence-Detection-in-Live-Video-Streaming.git
    cd MoViNets-for-Violence-Detection-in-Live-Video-Streaming
  2. Install the required packages
    sudo apt-get install pkg-config libhdf5-dev
    pip install -r ./MoViNets-for-Violence-Detection-in-Live-Video-Streaming/requeriments_tflite.txt
    
  3. Open 'movinet_tf_lite_inference.ipynb' and select the file. The notebook is also available here on Colab, if you want to connect it to your local runtime.

Note. The default TF Lite model is the one available in this repo, corresponding to the best model trained. If you want to export your own TF Lite model, check the last section of this Colab Notebook.
