VideoQA contains the dataset and the algorithms used in the paper *Unifying the Video and Question Attentions for Open-Ended Video Question Answering*. The dataset consists of three parts:
- file_map: contains the Tumblr URLs of the videos
- QA: contains the question-answer pairs
- Split: contains the dataset split used in the paper (a hedged loading sketch follows this list)
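
For convenience, here is a minimal loading sketch in Python. The file names (`file_map`, `QA`, `Split`) come from the list above, but the tab-separated line format is an assumption, so adjust it to the actual files.

```python
# Minimal sketch of loading the dataset files listed above.
# The 'id<TAB>field' layout is an assumption, not a documented format.

def load_file_map(path="file_map"):
    """Map each video id to its Tumblr URL (assumed 'id<TAB>url' lines)."""
    video_urls = {}
    with open(path) as f:
        for line in f:
            vid, url = line.rstrip("\n").split("\t", 1)
            video_urls[vid] = url
    return video_urls

def load_qa(path="QA"):
    """Load (video id, question, answer) triples (assumed tab-separated)."""
    triples = []
    with open(path) as f:
        for line in f:
            vid, question, answer = line.rstrip("\n").split("\t", 2)
            triples.append((vid, question, answer))
    return triples
```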
The baseline methods compared in the paper:

- [E-SA](https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/viewFile/14906/14319)
- [SS-VQA](https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/viewFile/14906/14319)
- Mean-VQA: a baseline designed for comparison, in which image QA is performed on each frame (a sketch of the idea follows this list)
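
To make the Mean-VQA idea concrete, here is a hedged sketch: an image-QA model scores the candidate answers on every frame, and the per-frame scores are averaged. The `image_qa` callable and the score-vector interface are hypothetical stand-ins, not the implementation used in the paper.

```python
import numpy as np

def mean_vqa(frames, question, image_qa, answer_vocab):
    """Answer a video question by averaging per-frame image-QA scores.

    `image_qa(frame, question)` is a hypothetical callable returning a
    score vector over `answer_vocab`; averaging these vectors over all
    frames is the essence of the Mean-VQA baseline described above.
    """
    scores = np.mean([image_qa(f, question) for f in frames], axis=0)
    return answer_vocab[int(np.argmax(scores))]
```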
Some example predictions (an illustrative accuracy computation follows the examples):

- Question: What is a boy combing his hair with?
- Groundtruth: with his fingers
- Prediction: with his hands
- Question: What runs up a fence?
- Groundtruth: a cat
- Prediction: a cat
- Question: What is a young girl in a car adjusting?
- Groundtruth: her dark glasses
- Prediction: her hair
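
The examples above pair each prediction with its groundtruth answer. As an illustration only (not necessarily the evaluation protocol used in the paper), a simple exact-match accuracy over normalized answer strings could be computed like this:

```python
def exact_match_accuracy(predictions, groundtruths):
    """Fraction of predictions that equal the groundtruth answer after
    lowercasing and stripping whitespace. Illustrative metric only."""
    hits = sum(p.strip().lower() == g.strip().lower()
               for p, g in zip(predictions, groundtruths))
    return hits / len(predictions)

# The three examples above: only the second prediction matches exactly.
preds = ["with his hands", "a cat", "her hair"]
golds = ["with his fingers", "a cat", "her dark glasses"]
print(exact_match_accuracy(preds, golds))  # 0.333...
```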
Run the model with:

```bash
python main.py
```
If you use the code or our dataset, please cite our paper:

```bibtex
@article{xue2017unifying,
  title={Unifying the Video and Question Attentions for Open-Ended Video Question Answering},
  author={Xue, Hongyang and Zhao, Zhou and Cai, Deng},
  journal={IEEE Transactions on Image Processing},
  year={2017},
  publisher={IEEE}
}
```