This repository provides the official implementation of the paper "End-to-End Video Scene Graph Generation with Temporal Propagation Transformer".
pip3 install -r requirements.txt
- Install PyTorch>=1.5 and torchvision>=0.6 following the official instructions at https://pytorch.org.
pip install pycocotools
- Install the MultiScaleDeformableAttention package:
python src/trackformer/models/ops/setup.py build --build-base=src/trackformer/models/ops/ install
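If the build and install succeed, the compiled CUDA extension should be importable. A minimal check, assuming the extension keeps the MultiScaleDeformableAttention module name used by Deformable DETR's ops package:

```python
# Sanity check: this import only works after the CUDA extension above has
# been built and installed (requires a CUDA-capable environment).
import MultiScaleDeformableAttention  # noqa: F401
print("MultiScaleDeformableAttention extension is importable")
```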
1. Preprocess and dump frames following https://github.com/JingweiJ/ActionGenome
2. Convert to COCO annotation format (a quick sanity check for the generated annotations is sketched after the VidHOI steps below)
python src/generate_coco_from_actiongenome.py
1. Download and prepare VidHOI following https://github.com/coldmanck/VidHOI
2. Dump frames
python src/generate_coco_from_vidhoi.py --task dump_frames
3. Convert to COCO annotation format
python src/generate_coco_from_vidhoi.py --task convert_coco_annotations
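Either converter should emit standard COCO-style JSON annotations, so pycocotools can serve as a quick sanity check. A minimal sketch; the annotation path is a placeholder, not the repository's actual output location:

```python
# Hedged sanity check for the generated COCO-style annotations.
from pycocotools.coco import COCO

# Placeholder path -- point this at the JSON file the converter writes.
coco = COCO("annotations/train.json")
print(f"{len(coco.getImgIds())} frames, "
      f"{len(coco.getAnnIds())} annotations, "
      f"{len(coco.getCatIds())} categories")

# Inspect one frame entry to confirm file names and sizes look right.
img = coco.loadImgs(coco.getImgIds()[:1])[0]
print(img["file_name"], img["width"], img["height"])
```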
All run scripts are in the ./runs directory.
1. Train a Transformer-based detector for object detection in individual video frames.
sh ./runs/vidhoi_detr.sh your_output_dir
2. Fine-tune the Transformer-based detector together with the QPM module to build temporal associations between detected instances.
sh ./runs/vidhoi_detr+tracking.sh your_output_dir
3. Freeze all parameters of the architecture learned in the previous step and optimize only the relation recognition modules (the freezing mechanism is sketched after this list).
sh ./runs/vidhoi_detr+tracking+hoi.sh your_output_dir
4. Jointly fine-tune the whole framework.
# Set freeze_detr=False in ./runs/vidhoi_detr+hoi.sh
sh ./runs/vidhoi_detr+hoi.sh your_output_dir
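For step 3, freezing relies on PyTorch's standard gradient-disabling mechanism. A minimal illustration with placeholder modules; the real detector and relation head live in src/trackformer and are selected by the scripts above:

```python
import torch

def freeze(module: torch.nn.Module) -> None:
    # Parameters with requires_grad=False receive no gradients and are
    # therefore left untouched by the optimizer.
    for p in module.parameters():
        p.requires_grad = False

# Placeholder modules standing in for the detector+tracking backbone and
# the relation recognition head; not the repository's actual classes.
detector = torch.nn.Linear(256, 91)
relation_head = torch.nn.Linear(256, 50)

freeze(detector)  # step 3: keep the detector fixed
optimizer = torch.optim.AdamW(
    [p for p in relation_head.parameters() if p.requires_grad], lr=1e-4
)

# Step 4 (joint fine-tuning, freeze_detr=False) re-enables all gradients:
for p in detector.parameters():
    p.requires_grad = True
```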
The codebase builds upon DETR, Deformable DETR, TrackFormer, STTran, and ByteTrack. We thank the authors for their wonderful work.