This is the official repository for the ActionAtlas benchmark. The benchmark evaluates large multimodal models on videos of complex actions in specialized domains. This first version of the benchmark focuses on sports moves.
You can install the package from source with pip or poetry:

```bash
pip install -e .
```
- Download the metadata either from this Google Drive or from HuggingFace.
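If you fetch the metadata from HuggingFace, a minimal sketch with `huggingface_hub` may look like the following; the repo ID and filename below are placeholders, so substitute the values shown on the dataset page:

```python
from huggingface_hub import hf_hub_download

# Placeholder repo_id/filename: use the ones listed on the ActionAtlas dataset page.
metadata_path = hf_hub_download(
    repo_id="<actionatlas-dataset-repo>",
    filename="metadata.json",
    repo_type="dataset",
)
print(metadata_path)
```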
- Each sample in the metadata contains a YouTube ID and the metadata of that video. Download the videos from YouTube; please take a look at `action_atlas/download_yt_videos.py` provided in this repo. A rough sketch of this step is shown below.
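The sketch below is not the provided script itself: it reads the metadata and fetches each video with yt-dlp. The top-level JSON layout and the `youtube_id` field name are assumptions, so check the actual file:

```python
import json
import subprocess

# Assumes the metadata is a JSON list of sample dicts with a "youtube_id" field;
# inspect the downloaded file to confirm the exact layout and field names.
with open("/path/to/metadata.json") as f:
    samples = json.load(f)

for sample in samples:
    url = f"https://www.youtube.com/watch?v={sample['youtube_id']}"
    subprocess.run(
        ["yt-dlp", "-o", "/path/to/downloaded_yt_videos/%(id)s.%(ext)s", url],
        check=True,
    )
```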
- Extract ActionAtlas video segments from the original videos using `action_atlas/extract_segments.py`:
```bash
python action_atlas/extract_segments.py \
    --data_fpath /path/to/metadata.json \
    --yt_videos_dir /path/to/downloaded_yt_videos \
    --out_segments_dir /path/to/output_dir/for/segments \
    --max_workers 32
```
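Conceptually, each segment is just a clip cut out of the full downloaded video. A minimal sketch of that operation with ffmpeg is shown below; the start/end values are illustrative, and the provided script reads the actual timestamps from the metadata:

```python
import subprocess

def cut_segment(video_path: str, start: float, end: float, out_path: str) -> None:
    """Cut the [start, end] window (in seconds) out of a downloaded video with ffmpeg."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", video_path,
            "-ss", str(start),
            "-to", str(end),
            "-c", "copy",  # stream copy is fast but cuts on keyframes; re-encode for exact boundaries
            out_path,
        ],
        check=True,
    )

cut_segment("/path/to/downloaded_yt_videos/VIDEO_ID.mp4", 12.0, 18.5, "/path/to/segments/VIDEO_ID_0.mp4")
```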
- Some of the videos contain on-screen text that leaks information about the action. We have already found the polygons for obfuscating this text using the Google Cloud Vision API and provided them in the metadata. You can reuse them to obfuscate the text by running `action_atlas/obfuscate_text.py`:
```bash
python action_atlas/obfuscate_text.py \
    obfuscate_text_in_videos_with_masks \
    --data_fpath /path/to/metadata.json \
    --video_segments_dir /path/to/extracted_segments \
    --out_dir /path/to/output_dir/for/final/segments/including/obfuscated \
    --max_workers 32
```
Note that after running the above command, all videos in ActionAtlas will be stored in `out_dir`, including those with obfuscated text.
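For intuition, obfuscation amounts to filling the provided text polygons in each frame. A minimal OpenCV sketch is shown below; the polygon format is an assumption, and the provided script remains the authoritative implementation:

```python
import cv2
import numpy as np

def obfuscate_frame(frame: np.ndarray, polygons: list) -> np.ndarray:
    """Fill each text polygon with black so on-screen text is unreadable.

    Assumes `polygons` is a list of point lists in pixel coordinates, e.g.
    [[(x1, y1), (x2, y2), ...], ...]; the actual mask format in the metadata may differ.
    """
    for polygon in polygons:
        points = np.array(polygon, dtype=np.int32)
        cv2.fillPoly(frame, [points], color=(0, 0, 0))
    return frame
```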
- We have provided example scripts to evaluate both proprietary and open models. Please take a look at `action_atlas/eval_proprietary.py` and `action_atlas/eval_qwen2_vl.py`.
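At a high level, evaluation checks whether a model picks the correct action for each video segment. A toy version of a multiple-choice evaluation loop is sketched below; the sample field names and the `query_model` callable are placeholders, not the exact interface used by the scripts:

```python
def evaluate(samples, query_model) -> float:
    """Toy multiple-choice evaluation loop.

    Assumes each sample dict holds a question, a list of candidate actions,
    the index of the correct one, and a path to the video segment, and that
    `query_model` returns the predicted option index.
    """
    correct = 0
    for sample in samples:
        prediction = query_model(sample["video_path"], sample["question"], sample["options"])
        correct += int(prediction == sample["answer_index"])
    return correct / len(samples)
```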
If you use this dataset in your research, please cite the following paper:
```bibtex
@misc{salehi2024actionatlasvideoqabenchmarkdomainspecialized,
      title={ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition},
      author={Mohammadreza Salehi and Jae Sung Park and Tanush Yadav and Aditya Kusupati and Ranjay Krishna and Yejin Choi and Hannaneh Hajishirzi and Ali Farhadi},
      year={2024},
      eprint={2410.05774},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.05774},
}
```