DATASET

CNVid-3.5M is a large-scale Chinese video-text dataset.

Download

We provide the following files:

cn_vid3_5m_v1.csv: A CSV file containing the URLs of all CNVid videos.
caption.json: A JSON file containing the captions for all CNVid videos. It is stored as a dictionary keyed by video_id. Each value is another dictionary with the keys ['text', 'start', 'end'], where 'text' is a list of all captions for the given video_id, and 'start' and 'end' are arrays holding the corresponding start and end timestamps of each caption (in seconds).
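
The caption layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the official tooling: the helper names `load_captions` and `iter_segments` are hypothetical, and it assumes that the 'text', 'start', and 'end' lists of a video are aligned index by index.

```python
import json
from typing import Dict, Iterator, Tuple


def load_captions(path: str) -> Dict[str, dict]:
    """Load caption.json into a dictionary keyed by video_id."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def iter_segments(
    captions: Dict[str, dict],
) -> Iterator[Tuple[str, str, float, float]]:
    """Yield (video_id, text, start, end) for every caption segment."""
    for video_id, entry in captions.items():
        texts, starts, ends = entry["text"], entry["start"], entry["end"]
        # Assumption: the three lists are aligned index by index.
        if not (len(texts) == len(starts) == len(ends)):
            raise ValueError(f"misaligned caption arrays for {video_id}")
        for text, start, end in zip(texts, starts, ends):
            # Timestamps are in seconds.
            yield video_id, text, float(start), float(end)
```

Typical usage would be `for vid, text, start, end in iter_segments(load_captions("caption.json")): ...`, which walks every timed caption in the dataset without assuming anything beyond the three documented keys.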

It is recommended to use SVD-download to download the original videos in the CNVid-3.5M dataset.

MediaCrawler is another project that may help with downloading the original videos in the CNVid-3.5M dataset.

The CNVid-3.5M dataset contains 3.5 million <video, text> pairs.
Topics: CNVid-3.5M contains a total of 5.7 million topics, and each video has 1.63 topics on average.
Keywords: CNVid-3.5M contains a total of 21.5 million keywords, and each video has 6.15 keywords on average.
Video Durations: The maximum, minimum, and average video durations in CNVid-3.5M are 1,974s, 4s, and 36.34s, respectively.
PoS Tags of the ASR Text: CNVid-3.5M contains a total of 17.0 million verbs, 9.1 million nouns, and 3.2 million adjectives.