This repo is the official PyTorch implementation of the paper:
"Long-Short Temporal Co-Teaching for Weakly Supervised Video Anomaly Detection"
Install the dependencies listed in requirements.txt.
For feature extraction, you can use a pre-trained I3D model (such as pytorch-resnet3d) or a C3D model.
You can also download the extracted I3D features from the links below:
ShanghaiTech I3D features (code:8XJB) / ShanghaiTech I3D features (code:KV44)
UCF-Crime I3D features (code:6EB8) / UCF-Crime I3D features (code:344D)
UBnormal I3D features (code:PYL5) / UBnormal I3D features (code:34A4)
Taking ShanghaiTech as an example, run the following commands:
python --encoder_weight_init --regressor_weight_init --FFN_layerNorm --MHA_dropout 0.3 --FFN_dropout 0.3 --dataset_path SHT_I3D_16PATCH.h5 --gpu 0
Generate the pseudo labels of the spatio-transformer:
python --dataset SHT --n_patch 16 --FFN_layerNorm --threshold 0.9 --pseudo_labels_path STN_pseudo_labels.npy --training_txt SH_Train_new.txt --dataset_path SHT_I3D_16PATCH.h5 --gpu 0
python --part_len 3 --MHA_layerNorm --FFN_layerNorm --relative_position_encoding --pseudo_labels_path STN_pseudo_labels.npy --dataset_path SHT_I3D_16PATCH.h5 --gpu 0
Generate the pseudo labels of the temporal-transformer:
python --dataset SHT --relative_position_encoding --n_hidden 4096 --n_patch 16 --n_head 8 --d_k 256 --d_v 256 --part_len 3 --MHA_layerNorm --FFN_layerNorm --dataset_path SHT_I3D_16PATCH.h5 --temporal_model_path temporal_model --classifier_model_path classifier_model --pseudo_labels_path LTN_pseudo_labels.npy --training_txt SH_Train_new.txt --threshold 0.65 --gpu 0
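The README does not say how `--threshold` is applied when the pseudo labels are written out. Purely as an illustration, the sketch below assumes one common pseudo-labeling convention: a snippet whose predicted anomaly score reaches the threshold is labeled 1 (anomalous) and every other snippet is labeled 0. The function name `binarize_scores` is hypothetical and is not taken from this repo.

```python
def binarize_scores(scores, threshold):
    """Turn per-snippet anomaly scores in [0, 1] into hard 0/1 pseudo labels.

    Assumption for illustration only: a score at or above `threshold`
    counts as anomalous. The repo's own rule may differ.
    """
    return [1 if score >= threshold else 0 for score in scores]


# With a strict threshold (0.9) fewer snippets are marked anomalous
# than with a looser one (0.65).
example_scores = [0.10, 0.70, 0.92, 0.65]
strict_labels = binarize_scores(example_scores, 0.9)
loose_labels = binarize_scores(example_scores, 0.65)
```

Under this reading, the higher value used for the first stage (0.9) would keep only the most confident detections, while the lower value used for the second stage (0.65) would admit more snippets.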
For multi-GPU training, add the following options to the command:
--data_parallel --gpu id0,id1
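As a rough sketch of how a comma-separated `--gpu` value and a `--data_parallel` switch could be handled, the example below parses both flags with `argparse` and exposes the selected devices through the standard `CUDA_VISIBLE_DEVICES` environment variable. The helper names `parse_gpu_args` and `apply_gpu_selection` are hypothetical, and the repo's own scripts may process these flags differently.

```python
import argparse
import os


def parse_gpu_args(argv):
    """Parse --gpu (comma-separated ids) and --data_parallel from argv."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu", type=str, default="0")
    parser.add_argument("--data_parallel", action="store_true")
    # parse_known_args ignores the many other training options.
    args, _unknown = parser.parse_known_args(argv)
    gpu_ids = [int(tok) for tok in args.gpu.split(",") if tok.strip()]
    return gpu_ids, args.data_parallel


def apply_gpu_selection(gpu_ids):
    """Restrict visible GPUs and return the renumbered local device ids."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    # After masking, the visible devices are renumbered starting from 0.
    return list(range(len(gpu_ids)))


gpu_ids, use_data_parallel = parse_gpu_args(["--data_parallel", "--gpu", "0,1"])
local_ids = apply_gpu_selection(gpu_ids)
```

The environment variable must be set before CUDA is initialized, so a script following this pattern would call `apply_gpu_selection` before creating any tensors on the GPU.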
You can download the checkpoint models from the links below:
ShanghaiTech (code:L958) / ShanghaiTech (code:3UJ9)
For ShanghaiTech:
python --dataset SHT --temporal_MHA_layerNorm --temporal_FFN_layerNorm --temporal_relative_position_encoding --dataset_path SHT_I3D_16PATCH.h5 --temporal_model_path shanghaitech_temporal_model_oneCrop_I3D_RGB_0.9779.ckpt --classifier_model_path shanghaitech_classifier_model_oneCrop_I3D_RGB_0.9779.ckpt --gpu 0
For UBnormal:
python --dataset UBnormal --d_model 1024 --part_len 5 --temporal_MHA_layerNorm --temporal_FFN_layerNorm --temporal_relative_position_encoding --dataset_path UBnormal_I3D_16PATCH.h5 --temporal_model_path UBnormal_temporal_model_oneCrop_I3D_RGB_0.7551.ckpt --classifier_model_path UBnormal_classifier_model_oneCrop_I3D_RGB_0.7551.ckpt --test_mask_dir data/UBnormal/test_frame_mask --training_txt data/UBnormal/train_video_names_frames.txt --testing_txt data/UBnormal/test_video_names_frames.txt --gpu 0
For UCF-Crime:
python --n_patch 9 --part_num 32 --part_len 2 --dataset_path UCF_I3D_9PATCH.h5 --temporal_MHA_layerNorm --temporal_FFN_layerNorm --temporal_model_path UCF_temporal_model_oneCrop_I3D_RGB_0.8570.ckpt --classifier_model_path UCF_classifier_model_oneCrop_I3D_RGB_0.8570.ckpt --relative_position_encoding --gpu 0
Tips: If the model was trained in multi-GPU mode, you must add the command
in the inference stage.
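Some background on why the training mode matters at inference: when a model is wrapped in PyTorch's `torch.nn.DataParallel`, the original network lives under the wrapper's `module` attribute, so a state dict saved from the wrapper has every key prefixed with `module.`. Assuming the checkpoints here were saved that way (the README does not confirm this), an alternative to wrapping the model again is to strip the prefix before loading. The helper `strip_module_prefix` below is a hypothetical sketch that works on any key-to-tensor mapping and is not part of this repo.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from keys.

    Keys that do not start with the prefix are kept unchanged, so the
    function is safe to call on single-GPU checkpoints as well.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in values are used here instead of real tensors.
multi_gpu_state = OrderedDict(
    [("module.encoder.weight", 1), ("module.encoder.bias", 2)]
)
single_gpu_state = strip_module_prefix(multi_gpu_state)
```

The cleaned mapping could then be passed to `load_state_dict` on an unwrapped model.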
This repo is released under the MIT License.
If this repo is useful for your research, please consider citing our paper:
title={Long-short temporal co-teaching for weakly supervised video anomaly detection},
author={Sun, Shengyang and Gong, Xiaojin},
booktitle={2023 IEEE International Conference on Multimedia and Expo (ICME)},
Parts of the code are based on MIST; we sincerely thank the authors for their contributions.