- we choose YOLOv5 as the object detector instead of Faster R-CNN; it is faster and more convenient
- we use a tracker (DeepSORT) to assign action labels to all objects (with the same IDs) across different frames
- our processing speed reaches 24.2 FPS at an inference batch size of 30 (on a single RTX 2080Ti GPU)
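The tracker-based label assignment above can be sketched roughly as follows. This is a minimal illustration with hypothetical data structures, not the repo's actual code: SlowFast predicts one action per track over a clip, and the track ID maps that label onto every frame where the object appears.

```python
# Sketch of propagating clip-level action labels to per-frame
# detections via tracker IDs (hypothetical data, not the repo's code).

# DeepSORT output: frame index -> list of (track_id, bounding_box)
tracks = {
    0: [(1, (10, 10, 50, 80)), (2, (100, 20, 140, 90))],
    1: [(1, (12, 11, 52, 81)), (2, (101, 22, 141, 92))],
}

# SlowFast output for the clip: one action label per track ID
actions = {1: "walking", 2: "sitting"}

# Attach the same action label to every detection sharing a track ID
labeled = {
    frame: [(tid, box, actions.get(tid, "unknown")) for tid, box in dets]
    for frame, dets in tracks.items()
}

for frame, dets in sorted(labeled.items()):
    for tid, box, action in dets:
        print(f"frame {frame}: id={tid} action={action}")
```

Because the label is keyed on the track ID rather than the detection, every frame in which object 1 appears is annotated with the same action.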
Relevant information: FAIR/PytorchVideo; Ultralytics/Yolov5
-
2023.03.31 fixed some bugs (possibly caused by a YOLOv5 version upgrade); added support for real-time testing (on a camera or video stream).
-
2022.01.24 optimized the pre-processing method (no need to extract the video to images before processing); faster and cleaner.
-
clone this repo:
git clone https://github.com/wufan-tb/yolo_slowfast
cd yolo_slowfast
-
create a new python environment (optional):
conda create -n {your_env_name} python=3.7.11
conda activate {your_env_name}
-
install requirements:
pip install -r requirements.txt
-
download the weights file (ckpt.t7) from [deepsort] to this folder:
./deep_sort/deep_sort/deep/checkpoint/
-
test on your video/camera/stream:
python yolo_slowfast.py --input {path to your video/camera/stream}
The first time you execute this command, it may take a while to download the YOLOv5 code and its weights file from torch.hub, so keep your network connection alive.
Set --input 0 to test on your local camera, or set --input {stream path, such as "rtsp://xxx" or "rtmp://xxxx"} to test on a video stream.
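Internally, an input argument like this has to distinguish a local camera index from a file path or stream URL: OpenCV's cv2.VideoCapture takes an integer for cameras and a string otherwise. A minimal sketch of that dispatch (parse_input is a hypothetical helper, not code from this repo):

```python
def parse_input(source: str):
    """Return an int camera index for digit strings (e.g. "0"),
    otherwise the original path/URL unchanged, matching what
    cv2.VideoCapture expects. Hypothetical helper, not repo code."""
    return int(source) if source.isdigit() else source

# cv2.VideoCapture(parse_input("0"))          -> local camera 0
# cv2.VideoCapture(parse_input("demo.mp4"))   -> a video file
# cv2.VideoCapture(parse_input("rtsp://xxx")) -> an RTSP stream
```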
Thanks for these great works:
[2] ZQPei/deepsort
[4] AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions. paper
[5] SlowFast Networks for Video Recognition. paper
If you find our work useful, please cite it as follows:
@misc{yolo_slowfast,
  author = {Wu Fan},
  title = {A realtime action detection framework based on PytorchVideo},
  year = {2021},
  url = {\url{https://github.com/wufan-tb/yolo_slowfast}}
}