NjtechCVLab/TVPReid-Dataset

TVPReid Dataset for our ACM MM 2024 accepted paper "TVPR: Text-to-Video Person Retrieval and a New Benchmark" [1]

Introduction

🚨 The currently published dataset is an optimized version of the dataset used in the paper, so results obtained on it differ significantly from those reported in the original paper. The latest experimental results on this dataset are given below.

To address the lack of data for the Text-to-Video Person Retrieval (TVPR) task, we construct a large-scale dataset named the Text-to-Video Person Re-identification (TVPReid) dataset. It consists of 6559 pedestrian videos built from three existing person re-identification datasets: PRID-2011 [2], iLIDS-VID [3], and DukeMTMC-VideoReID [4]. These source datasets contain images of different pedestrians taken from surveillance videos, so we first aggregate them into the video form required by the task: images of the same pedestrian from the same time period and the same viewpoint are merged into one video with OpenCV. After integrating the three complete person re-identification datasets and cleaning the data, we obtain a total of 6559 unique pedestrian videos.

We then annotate each pedestrian video with two different sentence descriptions, for a total of 13118 sentences. The descriptions are written in natural language and contain rich details about the pedestrian's appearance, actions, and the environmental elements the pedestrian interacts with. The average sentence length is 30 words, and the longest sentence contains 83 words.

The TVPReid dataset is divided into a training set, a validation set, and a test set with a ratio of 0.8125:0.0625:0.125, following the division used for the MSRVTT [5] dataset. The details of each sub-dataset are shown in the following table: TVPReid-PRID has 2268 sentence descriptions, TVPReid-iLIDs has 1200, and the largest sub-dataset, TVPReid-Duke, has 9650.

Details for three sub-datasets in TVPReid dataset

| Split (videos) | TVPReid-PRID | TVPReid-iLIDs | TVPReid-Duke |
|---|---|---|---|
| Train | 921 | 488 | 3920 |
| Validate | 71 | 37 | 302 |
| Test | 142 | 75 | 603 |
| Total | 1134 | 600 | 4825 |
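
The split sizes above can be cross-checked against the figures quoted in the introduction (6559 videos, two sentences per video, and a 0.8125:0.0625:0.125 split). The following is only a small sanity-check sketch: the counts are copied from the table, while the dictionary layout and the `summarize` helper are illustrative names, not part of the dataset release.

```python
# Video counts per split, copied from the table above.
splits = {
    "TVPReid-PRID":  {"train": 921,  "val": 71,  "test": 142},
    "TVPReid-iLIDs": {"train": 488,  "val": 37,  "test": 75},
    "TVPReid-Duke":  {"train": 3920, "val": 302, "test": 603},
}

# Each video is annotated with two sentence descriptions.
CAPTIONS_PER_VIDEO = 2


def summarize(splits):
    """Return total videos, total sentences, and split fractions per sub-dataset."""
    summary = {}
    for name, counts in splits.items():
        total = sum(counts.values())
        summary[name] = {
            "videos": total,
            "sentences": total * CAPTIONS_PER_VIDEO,
            "fractions": {k: v / total for k, v in counts.items()},
        }
    return summary


summary = summarize(splits)
total_videos = sum(s["videos"] for s in summary.values())

for name, s in summary.items():
    fr = s["fractions"]
    print(
        f"{name}: {s['videos']} videos, {s['sentences']} sentences, "
        f"train/val/test = {fr['train']:.4f}/{fr['val']:.4f}/{fr['test']:.4f}"
    )
print("total videos:", total_videos)
print("total sentences:", total_videos * CAPTIONS_PER_VIDEO)
```

The per-split fractions come out close to, but not exactly, the nominal ratio, because the video counts have to be whole numbers.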

🚨 The TVPReid dataset released by this project is an updated version of the dataset introduced in the paper. The following table shows experimental results on the new version of the TVPReid dataset. (Because the dataset lacks audio data, the original paper ran MMT with its audio branch removed. The corresponding results on the new version have limited reference value, so they are not included in the table below.)

Comparative experimental results on the new version of TVPReid dataset

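| Method | Sub-dataset | R@1 | R@5 | R@10 | R@50 | MdR |
|---|---|---|---|---|---|---|
| Frozen-in-time | TVPReid-PRID | 38.0 | 74.0 | 85.6 | 98.6 | 3.0 |
| Frozen-in-time | TVPReid-iLIDs | 19.4 | 49.4 | 65.4 | 99.4 | 6.0 |
| Frozen-in-time | TVPReid-Duke | 30.5 | 61.5 | 71.7 | 91.7 | 3.0 |
| X-pool | TVPReid-PRID | 45.1 | 82.4 | 90.5 | - | 2.0 |
| X-pool | TVPReid-iLIDs | 24 | 63.3 | 76.0 | - | 3.0 |
| X-pool | TVPReid-Duke | 34.1 | 65.4 | 76.1 | - | 3.0 |
| ours | TVPReid-PRID | 48.4 | 81.5 | 90.1 | 100.0 | 1.7 |
| ours | TVPReid-iLIDs | 31.1 | 67.6 | 79.8 | 100.0 | 2.5 |
| ours | TVPReid-Duke | 37.1 | 66.2 | 74.6 | 94.2 | 3.0 |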
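
The table uses R@K and MdR without defining them. Under the usual retrieval convention (assumed here, since this README does not spell it out), R@K is the percentage of text queries whose ground-truth video is ranked within the top K, and MdR is the median rank of the ground-truth video. The sketch below computes both from a text-to-video similarity matrix. The function names and the toy scores are hypothetical, and this is not the evaluation code used for the reported numbers.

```python
from statistics import median


def rank_of_target(scores, target):
    """1-based rank of scores[target] when candidates are sorted by descending score.

    Ties are resolved optimistically: only strictly higher scores push the rank down.
    """
    target_score = scores[target]
    return 1 + sum(1 for s in scores if s > target_score)


def retrieval_metrics(sim, targets, ks=(1, 5, 10, 50)):
    """Compute R@K (in percent) and MdR.

    sim[i][j] is the similarity between text query i and video j;
    targets[i] is the index of the ground-truth video for query i.
    """
    ranks = [rank_of_target(row, t) for row, t in zip(sim, targets)]
    metrics = {
        f"R@{k}": 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in ks
    }
    metrics["MdR"] = median(ranks)
    return metrics


# Toy example: 3 text queries against 3 videos (made-up scores).
sim = [
    [0.9, 0.1, 0.2],  # ground truth (video 0) ranked first
    [0.3, 0.2, 0.8],  # ground truth (video 1) ranked third
    [0.1, 0.5, 0.4],  # ground truth (video 2) ranked second
]
metrics = retrieval_metrics(sim, targets=[0, 1, 2])
print(metrics)
```

Higher R@K is better, while lower MdR is better.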

🚨 For Frozen-in-time and X-pool, we conducted experiments using the publicly available source code provided in their respective papers, while preserving their original hyperparameter settings. For our own method, we also performed multiple experiments on the new version of the dataset, and the final results reported are the averages of these multiple runs.

Dataset Access

Google Drive

Link: https://drive.google.com/drive/folders/1lmLik5zEPckDGrwakWcAM9XpQQ7Q8NmW?usp=sharing

Baidu Netdisk

Link: https://pan.baidu.com/s/1cKOqq_RJb1zcT3DsJIqy1A?pwd=9jpd Password: 9jpd

Folder structure of the dataset

As shown in the structure below, TVPReid_dataset contains three sub-datasets. Taking TVPReid-Duke/ as an example, each sub-dataset folder contains captions/ and videos/. captions/ holds the complete text descriptions of the sub-dataset (TVPReid-Duke.json) and the partition files train.csv, test.csv, and val.csv, which store the video IDs of each split. The folder structures of TVPReid-iLIDs/ and TVPReid-PRID/ are the same as that of TVPReid-Duke/.

TVPReid_dataset
├── TVPReid-Duke
│   ├── captions
│   │   ├── train.csv
│   │   ├── test.csv
│   │   ├── val.csv
│   │   └── TVPReid-Duke.json
│   └── videos
│       ├── person0001.mp4
│       ├── person0002.mp4
│       ...
│
├── TVPReid-iLIDs
│
└── TVPReid-PRID
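
Given the layout above, one split of a sub-dataset could be loaded roughly as follows. This is a minimal sketch under several assumptions that this README does not confirm: that the first column of each partition CSV holds the video ID, that an ID matches the file stem under videos/ (for example person0001), and that a header row may or may not be present (hence the `has_header` flag). The JSON schema is not documented here, so the captions are returned exactly as parsed. `read_split_ids` and `load_subset` are hypothetical helper names.

```python
import csv
import json
from pathlib import Path


def read_split_ids(csv_path, has_header=False):
    """Read video IDs from a partition file (assumed: ID in the first column)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [row for row in csv.reader(f) if row]  # skip empty lines
    if has_header:
        rows = rows[1:]
    return [row[0].strip() for row in rows]


def load_subset(root, name, split="train", has_header=False):
    """Return (ids, captions, videos) for one split of one sub-dataset.

    root is the TVPReid_dataset folder; name is e.g. "TVPReid-Duke".
    """
    base = Path(root) / name
    ids = read_split_ids(base / "captions" / f"{split}.csv", has_header)
    with open(base / "captions" / f"{name}.json", encoding="utf-8") as f:
        captions = json.load(f)  # schema not assumed; returned as parsed
    # Assumed mapping from video ID to file name: videos/<id>.mp4
    videos = {vid: base / "videos" / f"{vid}.mp4" for vid in ids}
    return ids, captions, videos
```

A call such as `load_subset("TVPReid_dataset", "TVPReid-Duke", "train")` would then return the training IDs, the parsed caption file, and the expected video paths. It is worth inspecting one CSV and the JSON file first and adjusting the column and key handling accordingly.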

Note

Using this project will download third-party open source datasets. Please check the license terms of these open source dataset projects before use.

Citation

If you find the TVPReid dataset useful in your work, please cite the following paper [1]:

@inproceedings{zhang2024tvpr,
  title={TVPR: Text-to-Video Person Retrieval and a New Benchmark},
  author={Zhang, Xu and Ni, Fan and Dong, Guan-Nan and Zhu, Aichun and Wu, Jianhui and Ni, Mingcheng and Liu, Hui},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={10105--10113},
  year={2024}
}

Reference

[1] Zhang X, Ni F, Dong G N, et al. TVPR: Text-to-Video Person Retrieval and a New Benchmark[C]//Proceedings of the 32nd ACM International Conference on Multimedia. 2024: 10105-10113.

[2] Hirzer M, Beleznai C, Roth P M, et al. Person re-identification by descriptive and discriminative classification[C]//Image Analysis: 17th Scandinavian Conference, SCIA 2011, Ystad, Sweden, May 2011. Proceedings 17. Springer Berlin Heidelberg, 2011: 91-102.

[3] Wang T, Gong S, Zhu X, et al. Person re-identification by video ranking[C]//Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part IV 13. Springer International Publishing, 2014: 688-703.

[4] Wu Y, Lin Y, Dong X, et al. Exploit the unknown gradually: One-shot video-based person re-identification by stepwise learning[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 5177-5186.

[5] Xu J, Mei T, Yao T, et al. Msr-vtt: A large video description dataset for bridging video and language[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 5288-5296.
