Official PyTorch implementation of the CVPR 2023 paper Aligning Step-by-Step Instructional Diagrams to Video Demonstrations.
Data Preparation
- Download the dataset in JSON format from here.
- Follow the instructions to download the data.
- Resize the short side of both page and step images to 224px.
- Use the script script/gen_image_pickle.py to generate the image pickle files.
- Resize the short side of the videos to 224px.
- Follow the split files to split the videos into 10-second clips and store the frames in NumPy format.
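The arithmetic behind the resizing and clip-splitting steps above can be sketched as follows. This is a minimal illustration, not the repository's own preprocessing code: the helper names `short_side_size` and `clip_frame_ranges` are hypothetical, and the clip helper assumes uniform, non-overlapping 10-second windows at a known frame rate, whereas the actual clip boundaries come from the split files. Image/video decoding and saving the frames (for example with `numpy.save`) are left out.

```python
from typing import List, Tuple


def short_side_size(width: int, height: int, target: int = 224) -> Tuple[int, int]:
    """Return (new_width, new_height) with the shorter side scaled to `target`.

    Hypothetical helper: the aspect ratio is preserved and the longer side
    is rounded to the nearest pixel.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target / min(width, height)
    if width <= height:
        # Width is the short side: pin it to `target`, scale the height.
        return target, max(target, round(height * scale))
    # Height is the short side: pin it to `target`, scale the width.
    return max(target, round(width * scale)), target


def clip_frame_ranges(
    num_frames: int, fps: float, clip_seconds: float = 10.0
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) frame ranges for consecutive clips.

    Assumption: uniform windows of `clip_seconds`; the last clip may be shorter.
    """
    frames_per_clip = int(round(fps * clip_seconds))
    if frames_per_clip <= 0:
        raise ValueError("fps * clip_seconds must round to at least one frame")
    return [
        (start, min(start + frames_per_clip, num_frames))
        for start in range(0, num_frames, frames_per_clip)
    ]


if __name__ == "__main__":
    print(short_side_size(640, 480))
    print(clip_frame_ranges(600, fps=25.0))
```

The returned size can be passed to any image or video resizing routine, and each frame range can be used to slice a decoded frame array before storing it.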
Installation
# clone project
git clone https://github.com/DavidZhang73/AssemblyVideoManualAlignment.git
# [Optional] create conda virtual environment
conda create -n <env_name> python=<3.8|3.9|3.10>
conda activate <env_name>
# [Optional] use mamba instead of conda
conda install mamba -n base -c conda-forge
# [Optional] install pytorch according to the official guide to support GPU acceleration, etc.
# https://pytorch.org/get-started/locally/
# install requirements
pip install -r requirements.txt
Train
python src/main.py fit -c configs/exp/ours.yaml -c configs/exp/{exp_name}.yaml --trainer.logger.name {log_name}
Inference
python src/main.py test -c configs/exp/ours.yaml -c configs/exp/{exp_name}.yaml --trainer.logger.name {log_name}
Citation
@inproceedings{Zhang2023Aligning,
author = {Zhang, Jiahao and Cherian, Anoop and Liu, Yanbin and Ben-Shabat, Yizhak and Rodriguez, Cristian and Gould, Stephen},
title = {Aligning Step-by-Step Instructional Diagrams to Video Demonstrations},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2023},
}