
[Preprint] TRACE: Temporal Grounding Video LLM via Causal Event Modeling

gyxxyg/TRACE

If our project helps you, please give us a star ⭐ and cite our paper!

News

  • 10/10/2024, 🔥 Annotation files of training data are released!
  • 10/10/2024, 🔥 Our model checkpoints and code are released!

TODO

  • Release the model checkpoints
  • Release the inference and evaluation code
  • Release the training and fine-tuning code
  • Release the training data
  • Release TRACE-Retrieval, which outputs timestamps of input frames instead of predicting unseen timestamps.
  • Train TRACE models on more tasks.

Overview

In this work

  • We model videos as a series of events, and propose a causal event modeling framework to capture the inherent structure of videos.
  • We present a novel task-interleaved video LLM model, TRACE, tailored to implement the causal event modeling framework through the sequential encoding/decoding of timestamps, salient scores, and textual captions.
Overview of TRACE.
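
To make the event-based view concrete, here is a minimal sketch (not the TRACE implementation) that treats a video as an ordered list of events, each carrying timestamps, a salient score, and a caption, and emits them in that order. The `Event` class, the `serialize` function, the score scale, and the textual layout are all hypothetical and chosen only for illustration; TRACE uses its own encoders, decoders, and token formats.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """One video event: a time span, a salient score, and a caption."""
    start: float    # start time in seconds
    end: float      # end time in seconds
    score: float    # salient score (the scale here is an assumption)
    caption: str    # textual description of the event


def serialize(events: List[Event]) -> str:
    """Emit events in temporal order: timestamps, then score, then caption.

    The text layout is illustrative only and is not TRACE's token format.
    """
    ordered = sorted(events, key=lambda e: (e.start, e.end))
    lines = []
    for e in ordered:
        lines.append(
            f"<time {e.start:.1f} {e.end:.1f}> <score {e.score:.1f}> {e.caption}"
        )
    return "\n".join(lines)


events = [
    Event(12.0, 30.5, 3.0, "Chop the onions."),
    Event(0.0, 11.5, 2.0, "Heat oil in a pan."),
]
print(serialize(events))
```

Sorting by start time reflects the sequential nature of the framework: each event is produced after the events that precede it in the video.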

Environments

We use NPU environments for training and fine-tuning, and V100 GPUs for evaluation. The dependencies for each environment are listed in npu-requirements and gpu-requirements.

Model Zoo

Checkpoints | Description | URL
Initialization | Weights initialized from VideoLLaMA2 | trace-init
Stage-1 | Model checkpoints trained after stage-1 | trace-stage1
Stage-2 | Model checkpoints trained after stage-2 | trace
FT-Charades | Fine-tuned on Charades-STA dataset | trace-ft-charades
FT-Youcook2 | Fine-tuned on Youcook2 dataset | trace-ft-youcook2
FT-QVHighlights | Fine-tuned on QVHighlights dataset | trace-ft-qvhighlights

Inference and Evaluation

Please make sure the model and video paths are correct before running the code.
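
A quick way to catch a wrong path before launching anything is to check that each one exists. The sketch below is not part of this repository; the `missing_paths` helper and the placeholder paths are hypothetical and should be replaced with your own checkpoint and video locations.

```python
from pathlib import Path
from typing import Dict, List


def missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the names of all entries whose path does not exist on disk."""
    return [name for name, p in paths.items() if not Path(p).exists()]


# Placeholder paths (assumptions): point these at your own files.
paths = {
    "model": "path/to/trace-checkpoint",
    "video": "path/to/video.mp4",
}

for name in missing_paths(paths):
    print(f"Missing {name} path: {paths[name]}")
```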

Training

Stage 1 training

bash TRACE/scripts/train/pretrain-128.sh

Stage 2 training

bash TRACE/scripts/train/sft-128.sh

Fine-tune on a downstream task

bash TRACE/scripts/train/sft-youcook2.sh

Please configure the data and model paths before running the scripts.
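
If you want to run the stages back to back, a small driver can help. The following is a sketch under assumptions, not a script shipped with this repository: it assumes the stages are run in the order listed above and from the directory that contains `TRACE/`, and the `run_stages` helper is hypothetical. By default it only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List

# Script paths as listed in this README, in the order they are presented.
STAGES = [
    ("stage 1", "TRACE/scripts/train/pretrain-128.sh"),
    ("stage 2", "TRACE/scripts/train/sft-128.sh"),
    ("fine-tune (Youcook2)", "TRACE/scripts/train/sft-youcook2.sh"),
]


def run_stages(dry_run: bool = True) -> List[List[str]]:
    """Run each training script in order, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = []
    for name, script in STAGES:
        cmd = ["bash", script]
        commands.append(cmd)
        if dry_run:
            print(f"[dry run] {name}: {' '.join(cmd)}")
            continue
        if not Path(script).is_file():
            raise FileNotFoundError(f"{name} script not found: {script}")
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(cmd, check=True)
    return commands


run_stages(dry_run=True)
```

Call `run_stages(dry_run=False)` only after the data and model paths inside each script have been set.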

Results

Youcook2 (Zero-Shot) | CIDER | METEOR | SODA_c | F1
TRACE | 8.1 | 2.8 | 2.2 | 22.4

Charades-STA (Zero-Shot) | 0.3 | 0.5 | 0.7 | mIOU
TRACE | 58.6 | 40.3 | 19.4 | 38.7

QVHighlights (Zero-Shot) | mAP | Hit@1
TRACE | 26.8 | 42.7

ActivityNet-DVC | CIDER | METEOR | SODA_c | F1
TRACE | 25.9 | 6.0 | 6.4 | 39.3

ActivityNet-MR | 0.3 | 0.5 | 0.7 | mIOU
TRACE | 53.0 | 37.7 | 24.0 | 39.0
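
For readers new to the moment-retrieval columns (0.3, 0.5, 0.7, mIOU), the sketch below shows how such numbers are conventionally computed. It assumes, as is common for Charades-STA style evaluation, that each threshold column reports the percentage of queries whose predicted span reaches at least that temporal IoU with the ground truth, and that mIOU is the mean IoU. This is an illustration, not the evaluation code of this repository; `temporal_iou`, `summarize`, and the example spans are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two time spans."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def summarize(
    preds: List[Span],
    gts: List[Span],
    thresholds: Sequence[float] = (0.3, 0.5, 0.7),
) -> Dict[str, float]:
    """Percentage of queries at or above each IoU threshold, plus mean IoU."""
    if not preds or len(preds) != len(gts):
        raise ValueError("preds and gts must be non-empty and of equal length")
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)
    result = {f"R@{t}": 100.0 * sum(i >= t for i in ious) / n for t in thresholds}
    result["mIOU"] = 100.0 * sum(ious) / n
    return result


# Made-up spans for illustration only; these are not results from the paper.
preds = [(0.0, 10.0), (5.0, 15.0)]
gts = [(0.0, 10.0), (10.0, 20.0)]
print(summarize(preds, gts))
```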

Demo

Demo of TRACE.

Acknowledgement

We are grateful for the following awesome projects:

Bibliography

If you find this repository helpful for your project, please consider citing:

@misc{guo2024tracetemporalgroundingvideo,
      title={TRACE: Temporal Grounding Video LLM via Causal Event Modeling}, 
      author={Yongxin Guo and Jingyu Liu and Mingda Li and Xiaoying Tang and Qingbin Liu and Xi Chen},
      year={2024},
      eprint={2410.05643},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.05643}, 
}