The YouTube-8M Segments dataset, released in June 2019, provides human-verified segment-level annotations for approximately 237,000 segments across 1,000 classes. It was derived from the validation set of the YouTube-8M dataset.
- Frame Level Data Size: 1.71 TB
- Number of Shards: 3,844
The data is organized with the following schema:
- "video-id": Unique identifier for each video.
- "labels": A list of labels associated with that video.
Each frame in the dataset includes the following features:
- "rgb": Float array of length 1,024.
- "audio": Float array of length 128.
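The schema and per-frame features above can be sketched as plain Python types. This is a minimal illustration, not the loader used in this project: the class names `Frame` and `VideoRecord`, the integer label type, and the `mean_rgb` helper are assumptions introduced here. Only the field names and the lengths 1,024 and 128 come from the lists above.

```python
from dataclasses import dataclass
from typing import List

RGB_DIM = 1024   # length of the "rgb" feature, as listed above
AUDIO_DIM = 128  # length of the "audio" feature, as listed above


@dataclass
class Frame:
    """One frame's features (hypothetical container)."""
    rgb: List[float]
    audio: List[float]

    def __post_init__(self) -> None:
        # Reject frames whose feature vectors have the wrong length.
        if len(self.rgb) != RGB_DIM:
            raise ValueError(f"rgb must have {RGB_DIM} values, got {len(self.rgb)}")
        if len(self.audio) != AUDIO_DIM:
            raise ValueError(f"audio must have {AUDIO_DIM} values, got {len(self.audio)}")


@dataclass
class VideoRecord:
    """One video: its id, its labels, and its frames (hypothetical container)."""
    video_id: str
    labels: List[int]  # assumption: labels are integer class indices
    frames: List[Frame]

    def mean_rgb(self) -> List[float]:
        """Average the rgb vectors over all frames (simple mean pooling)."""
        if not self.frames:
            raise ValueError("record has no frames")
        n = len(self.frames)
        return [sum(column) / n for column in zip(*(f.rgb for f in self.frames))]
```

For example, `VideoRecord("abc", [3, 7], frames).mean_rgb()` returns one 1,024-value vector summarizing the whole video, which is a common starting point before trying learned pooling.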
We have provided images to illustrate the architecture and visual aspects of our implementation.
The diagram shows the architecture of our implementation: the components and the data flow used to process and analyze the YouTube-8M dataset.
We use ipywidgets to play back our predictions in real time.
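A playback control of this kind could be put together roughly as follows. This is a sketch under stated assumptions, not the notebook code from this project: the helper names `top_k` and `build_player`, the per-second list of `{label: score}` dictionaries, and the one-second step interval are all hypothetical. The ipywidgets calls used (`Play`, `IntSlider`, `Output`, `link`, `observe`) are standard.

```python
from typing import Dict, List, Tuple

try:
    import ipywidgets as widgets
except ImportError:  # keeps top_k usable outside a notebook environment
    widgets = None


def top_k(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def build_player(per_second_scores: List[Dict[str, float]], k: int = 3):
    """Build a play button and slider that step through predictions.

    Assumption: per_second_scores[i] maps label names to scores for second i.
    """
    if widgets is None:
        raise RuntimeError("ipywidgets is not installed")
    if not per_second_scores:
        raise ValueError("no predictions to play back")

    last = len(per_second_scores) - 1
    # interval is in milliseconds: advance one step per second.
    play = widgets.Play(value=0, min=0, max=last, step=1, interval=1000)
    slider = widgets.IntSlider(value=0, min=0, max=last, description="second")
    widgets.link((play, "value"), (slider, "value"))  # keep both in sync
    out = widgets.Output()

    def render(change) -> None:
        # Redraw the top-k predictions for the newly selected second.
        with out:
            out.clear_output(wait=True)
            for label, score in top_k(per_second_scores[change["new"]], k):
                print(f"{label}: {score:.2f}")

    play.observe(render, names="value")
    render({"new": 0})  # show the first second before playback starts
    return widgets.VBox([widgets.HBox([play, slider]), out])
```

In a notebook cell, `build_player(predictions)` as the last expression displays the control; pressing play then steps through the predictions one entry per second.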
- Dataset: YouTube-8M Dataset
- YouTube-8M: A Large-Scale Video Classification Benchmark: Paper
- Learnable pooling with Context Gating for video classification: Antoine Miech, Ivan Laptev, and Josef Sivic. Paper
- Context-gated DBoF models for YouTube-8M: Paul Natsev. 2018. PDF
- LinkedIn spark-tfrecord: GitHub Repository
- Kafka in Action: Building a Distributed Multi-Video Processing Pipeline with Python and Confluent: Article