[CVPRW 2024] Multi-View Spatial-Temporal Learning for Understanding Unusual Behaviors in Untrimmed Naturalistic Driving Videos

This repository contains the source code for Track 3 (Naturalistic Driving Action Recognition) of the AI City Challenge 2024.

  • Team Name: SKKU-AutoLab.
  • Team ID: 05.

1. Setup

1.1 Run from conda (for both training and inference)

Using environment.yml

conda env create --name track3 --file=environment.yml
conda activate track3
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 --extra-index-url https://download.pytorch.org/whl/cu117

Using requirements.txt

conda create --name track3 python=3.10.13
conda activate track3
pip install -r requirements.txt
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install detectron2-0.6-cp310-cp310-linux_x86_64.whl
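
With either setup path, an optional sanity check (not part of the original instructions) is to confirm that the CUDA 11.7 build of PyTorch and the Detectron2 wheel import correctly:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import detectron2; print(detectron2.__version__)"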

1.2 Run from Docker (only for inference)

sudo docker load < docker_aic24_track3_final.tar
sudo docker run --ipc=host --gpus all -v <LOCAL_INPUT_DATA>:/usr/src/aic24-track_3/B/ \
				      -v <LOCAL_OUTPUT_FOLDER>:/usr/src/aic24-track_3/output_submission/ \
				      -it <IMAGE_ID>
bash run_infer_all.sh

For example:

sudo docker run --ipc=host --gpus all -v /home/vsw/Downloads/B/:/usr/src/aic24-track_3/B/ \
                                      -v /home/vsw/Downloads/output_submission/:/usr/src/aic24-track_3/output_submission/ \
                                      -it 96f8bfc76877
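
If you are unsure which image ID to pass, sudo docker images lists the images loaded by docker load, and the two mounted host folders must exist before the container starts. A minimal helper sketch (the placeholders are the same as above):

sudo docker images               # look up <IMAGE_ID> of the loaded image
mkdir -p <LOCAL_OUTPUT_FOLDER>   # the mounted output folder must exist on the host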

2. Dataset preparation

To get the cut videos for training X3D, UniformerV2_1, and VideoMAE, please download them from this link. After downloading, extract the archive and place its contents in the three folders X3D_train/data, VideoMAE_train/data/A1_clip (place only the subfolders inside the A1_clip folder), and UniformerV2_1_train/data.

To get the custom cut videos for training UniformerV2_2, please download them from this link. After downloading, extract the archive and place its contents in the folder UniformerV2_2_train/data.

To get the pretrained weights for UniformerV2_1 and UniformerV2_2, please download them from this link and this link, respectively. After downloading, extract the files and place them in the folders UniformerV2_1_train and UniformerV2_2_train.

To get the pretrained weights for VideoMAE, please download them from this link. After downloading, extract the file and place it in the folder VideoMAE_train.

To get the Docker image for running inference on a custom dataset, please download it from this link.
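
As a rough guide, placing the downloaded archives can look like the sketch below; the archive names are placeholders for the files behind the links above, so substitute the actual filenames:

# Hypothetical archive names; adjust to the files you actually downloaded.
unzip A1_clip.zip -d X3D_train/data/
unzip A1_clip.zip -d UniformerV2_1_train/data/
unzip A1_clip.zip -d /tmp/a1_clip && mv /tmp/a1_clip/A1_clip/* VideoMAE_train/data/A1_clip/   # only the subfolders go inside A1_clip
unzip A1_clip_custom.zip -d UniformerV2_2_train/data/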

3. Weight preparation

To get the X3D weights, please download them from this link. After downloading, extract the file and place it in the folder X3D_train.

To get the UniformerV2_1 weights, please download them from this link. After downloading, extract the file and place it in the folder UniformerV2_1_train.

To get the UniformerV2_2 weights, please download them from this link. After downloading, extract the file and place it in the folder UniformerV2_2_train.

To get the VideoMAE weights, please download them from this link. After downloading, extract the file and place it in the folder VideoMAE_train.
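
One way to spot-check that the extracted weights match the directory trees in the next section (folder names are taken from those trees):

ls X3D_train/checkpoint_x3d/                          # *.pyth checkpoints
ls UniformerV2_1_train/checkpoint_uniformerv2_full/   # *.pyth checkpoints
ls UniformerV2_2_train/checkpoint_uniformerv2_4cls/   # *.pyth checkpoints
ls VideoMAE_train/                                    # extracted VideoMAE weights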

4. Dataset structure

4.1 X3D

For the X3D model, the dataset is organized with the following structure:

X3D_train
|_ data
|  |_ A1_clip
|  |  |_ 0
|  |  |  |_ *.mp4
|  |  |_ 1
|  |  |  |_ *.mp4
|  |  |_ ...
|  |  |  |_ *.mp4
|  |  |_ 15
|  |  |  |_ *.mp4
|  |_ *.csv
|_ pickle_x3d
|  |_ A2
|  |  |_ *.pkl
|_ checkpoint_x3d
|  |_ *.pyth

4.2 UniformerV2_1

For the UniformerV2_1 model, the dataset is organized with the following structure:

UniformerV2_1_train
|_ A2
|  |_ user_id_12670
|  |  |_ *.mp4
|  |_ user_id_13148
|  |  |_ *.mp4
|  |_ ...
|  |  |_ *.mp4
|  |_ user_id_96715
|  |  |_ *.mp4
|_ data
|  |_ A1_clip
|  |  |_ 0
|  |  |  |_ *.mp4
|  |  |_ 1
|  |  |  |_ *.mp4
|  |  |_ ...
|  |  |  |_ *.mp4
|  |  |_ 15
|  |  |  |_ *.mp4
|_ pickle_uniformerv2_full
|  |_ *.pkl
|_ checkpoint_uniformerv2_full
|  |_ *.pyth
|_ k710_uniformerv2_l14_8x336.pyth
|_ vit_saved
|  |_ vit_b16.pth
|  |_ vit_l14.pth
|  |_ vit_l14_336.pth

4.3 UniformerV2_2

For the UniformerV2_2 model, the dataset is organized with the following structure:

UniformerV2_2_train
|_ A2
|  |_ user_id_12670
|  |  |_ *.mp4
|  |_ user_id_13148
|  |  |_ *.mp4
|  |_ ...
|  |  |_ *.mp4
|  |_ user_id_96715
|  |  |_ *.mp4
|_ data
|  |_ A1_clip_custom
|  |  |_ 0
|  |  |  |_ *.mp4
|  |  |_ 1
|  |  |  |_ *.mp4
|  |  |_ 2
|  |  |  |_ *.mp4
|  |  |_ 3
|  |  |  |_ *.mp4
|_ pickle_uniformerv2_4lcs
|  |_ *.pkl
|_ checkpoint_uniformerv2_4cls
|  |_ *.pyth
|_ k710_uniformerv2_l14_8x336.pyth
|_ vit_saved
|  |_ vit_b16.pth
|  |_ vit_l14.pth
|  |_ vit_l14_336.pth

4.4 VideoMAE

For the VideoMAE model, the dataset is organized with the following structure:

VideoMAE_train
|_ data
|  |_ A1_clip
|  |  |_ 0
|  |  |  |_ *.mp4
|  |  |_ 1
|  |  |  |_ *.mp4
|  |  |_ ...
|  |  |  |_ *.mp4
|  |  |_ 15
|  |  |  |_ *.mp4
|  |  |_ *.csv
|_ pretrained_models
|  |_ vit_l_hybrid_pt_800e_k700_ft.pth

5. Usage

5.1 X3D

To train X3D, follow the code snippet below:

cd X3D_train
# Step 1: Train X3D
bash train.sh
# Step 2: Rename and move checkpoints
python move_ckpt.py
cd ..

5.2 UniformerV2_1

To train UniformerV2_1, follow the code snippet below:

cd UniformerV2_1_train
# Step 1: Train UniformerV2_1
bash train.sh
# Step 2: Rename and move checkpoints
python move_ckpt.py
cd ..

5.3 UniformerV2_2

To train UniformerV2_2, follow the code snippet below:

cd UniformerV2_2_train
# Step 1: Train UniformerV2_2
bash train.sh
# Step 2: Rename and move checkpoints
python move_ckpt.py
cd ..

5.4 VideoMAE

To train VideoMAE, follow the code snippet below:

cd VideoMAE_train
# Step 1: Train VideoMAE
bash scripts/cls/train_fold0.sh
bash scripts/cls/train_fold1.sh
bash scripts/cls/train_fold2.sh
bash scripts/cls/train_fold3.sh
bash scripts/cls/train_fold4.sh
# Step 2: Rename and move checkpoints
python move_ckpt.py
cd ..

6. Ensemble model

To ensemble the four models, run the following scripts:

bash run_infer_x3d.sh
bash run_infer_uniformerv2_1.sh
bash run_infer_uniformerv2_2.sh
bash run_infer_videomae.sh
# copy all checkpoints to the infer folder and create the submission file
bash run_infer_all.sh

7. Citation

If you find our work useful, please cite the following:

@inproceedings{nguyen2024multi,
  title={Multi-view spatial-temporal learning for understanding unusual behaviors in untrimmed naturalistic driving videos},
  author={Nguyen, Huy-Hung and Tran, Chi Dai and Pham, Long Hoang and Tran, Duong Nguyen-Ngoc and Tran, Tai Huu-Phuong and Vu, Duong Khac and Ho, Quoc Pham-Nam and Huynh, Ngoc Doan-Minh and Jeon, Hyung-Min and Jeon, Hyung-Joon and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7144--7152},
  year={2024}
}

8. Contact

If you have any questions, feel free to contact Huy H. Nguyen (huyhung411991@gmail.com), Chi D. Tran (ctran743@gmail.com), or the Automation Lab (automation.skku@gmail.com).

9. Acknowledgement

Our framework is built on multiple open-source projects; we thank them for their great contributions.