This repository maintains the DeViBench (Degraded Video Understanding Benchmark) from the HotNets paper "Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI".
The benchmark is continuously growing, and we are considering converting more existing streaming video understanding benchmarks into the DeViBench style.
The dataset file datasets.csv contains the following columns:
| Column | Description | 
|---|---|
| sample_folder | Corresponding video ID. This dataset uses the same video files as the StreamingBench Real-Time Visual Understanding dataset; see the download link below. | 
| start_time | Start time of the video segment. Questions refer to a 5-second segment starting from this timestamp. | 
| question | The question content. | 
| options | Available options for the question. | 
| standard_answer | Standard answer to the question. | 
| task_type | Task type of the question. | 
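As a quick illustration, here is a minimal sketch of reading the dataset with pandas, assuming `datasets.csv` sits at the repository root (pandas is not a requirement of the benchmark):

```python
import pandas as pd

# Load the benchmark questions; columns follow the table above.
df = pd.read_csv("datasets.csv")

# Inspect the first sample.
row = df.iloc[0]
print(row["sample_folder"], row["start_time"], row["task_type"])
print(row["question"])
print(row["options"])
print("Answer:", row["standard_answer"])
```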
The prompt format for asking questions:
You are a multiple-choice question answering assistant.
Your task is to output only the letter of the correct option, without any explanation or extra text.
Question:
[Question]
Options:
[Options]
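A minimal sketch of filling this template from one row of `datasets.csv`; the helper name `build_prompt` is ours, for illustration only:

```python
PROMPT_TEMPLATE = (
    "You are a multiple-choice question answering assistant.\n"
    "Your task is to output only the letter of the correct option, "
    "without any explanation or extra text.\n"
    "Question:\n"
    "{question}\n"
    "Options:\n"
    "{options}\n"
)

def build_prompt(question: str, options: str) -> str:
    """Fill the prompt template with the question and options fields of one sample."""
    return PROMPT_TEMPLATE.format(question=question, options=options)
```

The model's reply is then compared against the single option letter in `standard_answer`.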
Video files can be downloaded from:
https://huggingface.co/datasets/mjuicem/StreamingBench/tree/main
Please match the video files with the corresponding sample_folder in datasets.csv.
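Since each question refers to a 5-second segment starting at `start_time`, the relevant clip can be cut from the downloaded video, for example with ffmpeg (this tooling choice and the paths are ours, not part of the benchmark):

```python
import subprocess

def cut_segment(video_path: str, start_time: float, out_path: str,
                duration: float = 5.0) -> None:
    """Cut the 5-second segment that a question refers to."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-ss", str(start_time),   # seek to the question's start_time
            "-i", video_path,         # full video matched via sample_folder
            "-t", str(duration),      # keep 5 seconds
            out_path,
        ],
        check=True,
    )
```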
An example (200 kbps vs. 2000 kbps) is shown in `example.mp4`. If the video fails to load or shows color distortion, please try the Chrome browser.
Question: How does the road gradient (GRAD) change as the cyclist progresses?
Options:
- (A) Remains constant at -1
- (B) Gradually becomes more negative
- (C) Fluctuates between -1 and -2
- (D) Gradually becomes less negative
Standard Answer: B
Answer from 200 kbps: D
Answer from 2000 kbps: B
Please refer to our HotNets paper for details on context-aware video streaming. Our method allocates more bits to chat-important regions (e.g., purple circles) and fewer bits to chat-irrelevant regions (e.g., yellow circles), thus improving MLLM accuracy.
To achieve fine-grained QP control, we adopt the H.265 codec implemented by Kvazaar to encode both our method and the baseline. Except for the QP values, our method and the baseline use the same encoding parameters.
The specific Kvazaar command line is as follows:

`kvazaar -i {input.yuv} --input-res={resolution} --gop 0 --period 0 --input-fps {fps} --qp {qp} [--roi roi.txt] -o {output.mp4}`

The fps matches the original video. The resolution gradually decreases as the bitrate decreases: for example, we use 1920×1080 at 800 kbps, 1600×900 at 600 kbps, 1280×720 at 400 kbps, and 1024×576 at 200 kbps. `--roi` is optional and is only used in our method. Both our method and the baseline reach the target bitrates by adjusting `--qp`.
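A rough sketch of wrapping this command in Python, using the resolution-per-bitrate values above; finding the `--qp` value that hits a given target bitrate (e.g., by a small per-video search) is not shown. Passing a `--roi` file applies only to our method:

```python
import subprocess

# Resolutions used at each target bitrate (from the text above).
RES_BY_BITRATE_KBPS = {
    800: "1920x1080",
    600: "1600x900",
    400: "1280x720",
    200: "1024x576",
}

def encode(input_yuv: str, bitrate_kbps: int, fps: float, qp: int,
           output_path: str, roi_file: str | None = None) -> None:
    """Encode one video with Kvazaar using the parameters from the command above."""
    cmd = [
        "kvazaar",
        "-i", input_yuv,
        f"--input-res={RES_BY_BITRATE_KBPS[bitrate_kbps]}",
        "--gop", "0",
        "--period", "0",
        "--input-fps", str(fps),
        "--qp", str(qp),
    ]
    if roi_file is not None:
        cmd += ["--roi", roi_file]  # region-of-interest file (roi.txt); our method only
    cmd += ["-o", output_path]
    subprocess.run(cmd, check=True)
```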
@article{wu2025chat,
  title={Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI},
  author={Wu, Jiangkai and Ren, Zhiyuan and Liu, Liming and Zhang, Xinggong},
  journal={arXiv preprint arXiv:2507.10510},
  year={2025}
}
The data is model-generated and may contain minor errors. We recommend using it primarily for exploratory analysis and demonstration purposes.




