Multi-Edge Reinforced Collaborative Data Acquisition for Continuous Video Analytics by Prioritizing Quality over Quantity

A reinforced active video acquisition (AVA) framework designed to select the most valuable video partitions for training. (AAAI'25)

🌈 Overview

In this work, we propose a multi-edge collaborative active video acquisition (AVA) framework that collaboratively learns a reinforced video acquisition strategy to identify, across multiple edge nodes, the informative video frames that best improve model accuracy while avoiding redundancy across edges. Extensive experiments on three video datasets demonstrate that our method achieves performance comparable to full-set video training while using only 20% of the data in classification tasks. In object detection tasks, our method maintains strong accuracy while reducing the number of training video frames by nearly 70%.

[Figure: framework architecture]
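
One way to picture the training loop described above: in each episode, per-node agents score candidate frames, a budgeted subset of the highest-scoring frames is acquired, and the accuracy of the model retrained on that subset drives the policy reward. The sketch below is only a conceptual illustration under simplifying assumptions, not the repository's implementation; all names in it (score_frames, train_and_evaluate, FRAME_BUDGET) are hypothetical, and the cross-edge redundancy term and the actual policy update are omitted.

# Conceptual sketch only -- NOT the repository's implementation.
import random

FRAME_BUDGET = 0.2   # acquire roughly 20% of frames, echoing the classification setting above
NODE_CNT = 3         # number of edge nodes

def score_frames(frames):
    """Hypothetical agent policy: assign an informativeness score to each frame."""
    return [random.random() for _ in frames]   # placeholder for the learned policy

def train_and_evaluate(selected):
    """Hypothetical environment step: retrain the task model and return its accuracy."""
    return min(1.0, 0.5 + 0.001 * len(selected))  # placeholder for real training

def run_episode(node_streams):
    selected = []
    for frames in node_streams:                 # one video stream per edge node
        scores = score_frames(frames)
        budget = int(FRAME_BUDGET * len(frames))
        top = sorted(range(len(frames)), key=lambda i: -scores[i])[:budget]
        selected.extend(frames[i] for i in top)
    accuracy = train_and_evaluate(selected)
    reward = accuracy                           # reward would drive the policy update (omitted)
    return selected, reward

if __name__ == "__main__":
    streams = [[f"node{n}_frame{i}" for i in range(100)] for n in range(NODE_CNT)]
    chosen, reward = run_episode(streams)
    print(f"acquired {len(chosen)} frames, episode reward = {reward:.3f}")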

🚀 Quick start

  1. Install all dependencies from requirements.txt after creating your own Python environment (Python >= 3.10):

pip3 install -r requirements.txt

  2. Download the datasets and unzip them in the project root directory.
  3. Start the experiments:

python3 train_policy.py \
    --dataset "visdrone" \
    --source_root "/path/to/visdrone"

🛠️ Configuration

You can configure the training process by passing arguments to the script. Some of the key options are listed below, followed by an example invocation:

  • --dataset: Specify the dataset to use (e.g., visdrone_n3, core50_ni_c10, mot15_n3).
  • --model: Choose the model architecture (e.g., fasterrcnn_resnet50_fpn, resnet50, vit_base).
  • --agent_net: Define the network type for the agent (e.g., transformer).
  • --episodes: Set the number of training episodes.
  • --env_task: Choose the task type (e.g., classification or detection).
  • --node_cnt: Number of nodes (agents) for distributed training.
  • --checkpoint_interval: Save a checkpoint every N episodes.
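
For instance, a classification run that combines several of these options could look like the command below; the specific dataset, model, and numeric values are illustrative only and should be adapted to your setup:

python3 train_policy.py \
    --dataset "core50_ni_c10" \
    --model "resnet50" \
    --agent_net "transformer" \
    --env_task "classification" \
    --episodes 50000 \
    --node_cnt 3 \
    --checkpoint_interval 1000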

📦 Datasets

Dataset              Task            Description
core50_ni_c10        Classification  CORe50 dataset with 10 classes, scenario NI
core50_ni_c50        Classification  CORe50 dataset with 50 classes, scenario NI
core50_simulation    Classification  Simulated CORe50 dataset
visdrone_n1          Detection       VisDrone dataset with a 1-node configuration
visdrone_n3          Detection       VisDrone dataset with a 3-node configuration
visdrone_n5          Detection       VisDrone dataset with a 5-node configuration
visdrone_n8          Detection       VisDrone dataset with an 8-node configuration
mot15_n1             Detection       MOT15 dataset with a 1-node configuration
mot15_n3             Detection       MOT15 dataset with a 3-node configuration
mot15_n5             Detection       MOT15 dataset with a 5-node configuration
mot15_n8             Detection       MOT15 dataset with an 8-node configuration

✨ Models

Model                                  Task            Description
resnet18                               Classification  ResNet-18 model
resnet34                               Classification  ResNet-34 model
resnet50                               Classification  ResNet-50 model
resnet101                              Classification  ResNet-101 model
resnet152                              Classification  ResNet-152 model
vgg11                                  Classification  VGG-11 model
vgg13                                  Classification  VGG-13 model
vgg16                                  Classification  VGG-16 model
vgg19                                  Classification  VGG-19 model
vit_tiny                               Classification  Vision Transformer Tiny model
vit_small                              Classification  Vision Transformer Small model
vit_base                               Classification  Vision Transformer Base model
vit_large                              Classification  Vision Transformer Large model
swin_tiny                              Classification  Swin Transformer Tiny model
swin_small                             Classification  Swin Transformer Small model
swin_base                              Classification  Swin Transformer Base model
swin_large                             Classification  Swin Transformer Large model
cnn                                    Classification  Custom CNN model
fasterrcnn_resnet50_fpn                Detection       Faster R-CNN with ResNet-50 FPN backbone
fasterrcnn_resnet50_fpn_v2             Detection       Faster R-CNN with ResNet-50 FPN v2 backbone
fasterrcnn_mobilenet_v3_large_fpn      Detection       Faster R-CNN with MobileNet v3 large FPN backbone
fasterrcnn_mobilenet_v3_large_320_fpn  Detection       Faster R-CNN with MobileNet v3 large 320 FPN backbone
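
The detection models pair with the detection datasets listed above. As one illustrative (not prescribed) pairing:

python3 train_policy.py \
    --dataset mot15_n3 \
    --model fasterrcnn_mobilenet_v3_large_fpn \
    --env_task detection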

🧩 Multi-GPU Training

To start training, run the following command:

python3 train_policy.py --dataset visdrone_n3 --model fasterrcnn_resnet50_fpn --episodes 100000

For distributed training across multiple GPUs, use the --parallel flag to specify the number of GPUs:

python3 train_policy.py --parallel <number_of_gpus>
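
Combining the two commands above, a multi-GPU detection run might look like the following; the GPU count and episode budget here are illustrative:

python3 train_policy.py \
    --dataset visdrone_n3 \
    --model fasterrcnn_resnet50_fpn \
    --episodes 100000 \
    --parallel 4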

📖 Citation

If you use this work in your research, please cite it. An example BibTeX entry will be provided after the paper review process.

📚 Acknowledgments

The code builds on the following projects:

📝 License

The project is available as open source under the terms of the MIT License.
