This is the official implementation of the WACV 2024 paper: Object-centric Video Representation for Long-term Action Anticipation.
If you are using OSCAR (Brown University's cluster):
module load python/3.9.0 ffmpeg/4.0.1 gcc/10.2
Clone this repository.
git clone git@github.com:brown-palm/ObjectPrompt.git
cd ObjectPrompt
Set up a Python 3.9 virtual environment and install the PyTorch build that matches your CUDA version (the command below targets CUDA 11.7).
python3 -m venv venvs/objectprompt
source venvs/objectprompt/bin/activate
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
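After installing, it can help to confirm that the environment picked up a CUDA-enabled build. The following is a minimal sketch, not part of the repository; the helper name `torch_install_summary` is hypothetical. It reports the installed version, the CUDA version the wheel was built for, and whether a GPU is visible.

```python
import importlib.util


def torch_install_summary():
    """Return basic facts about the installed PyTorch build, or None if absent."""
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed in this environment
    import torch

    return {
        "torch_version": torch.__version__,    # CUDA 11.7 wheels carry a "+cu117" suffix
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_install_summary())
```

If `cuda_available` is false on a GPU node, the wheel's CUDA version and the driver on the machine are the first things to compare.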
Install other packages.
pip install -r requirements.txt
Download the Ego4D dataset, annotations, and pretrained models. Remember to set path_to_ego4d to the Ego4D data path.
# set download dir for Ego4D
# export EGO4D_DIR=path_to_ego4d
# link data to the current project directory
mkdir -p data/ego4d/annotations/ data/ego4d/clips_hq/ data/ego4d/clips/
ln -s ${EGO4D_DIR}/v1/annotations/* data/ego4d/annotations/
ln -s ${EGO4D_DIR}/v1/clips/* data/ego4d/clips_hq/
ln -s ${EGO4D_DIR}/v1/clips_low_res/* data/ego4d/clips/
# link model files to current project directory
mkdir -p pretrained_models
ln -s ${EGO4D_DIR}/v1/lta_models/pretrained_models/* pretrained_models/
Download image-level and object-level features extracted using GLIP and CLIP to the following paths.
data/glip/object_manual_olcs
data/glip/image
data/clip/object_manual_olcs
data/clip/image
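Before training, a quick check that every linked or downloaded directory is present can save a failed run. The sketch below is an illustration rather than repository code: the helper `find_missing_dirs` is hypothetical, and the directory list is copied from the setup steps in this README.

```python
from pathlib import Path

# Directories named in the setup steps, relative to the repository root.
EXPECTED_DIRS = [
    "data/ego4d/annotations",
    "data/ego4d/clips_hq",
    "data/ego4d/clips",
    "pretrained_models",
    "data/glip/object_manual_olcs",
    "data/glip/image",
    "data/clip/object_manual_olcs",
    "data/clip/image",
]


def find_missing_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that are absent or empty under root."""
    missing = []
    for rel in expected:
        path = Path(root) / rel
        # A missing directory fails is_dir(); an empty one has no entries.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in find_missing_dirs():
        print(f"missing or empty: {rel}")
```

An empty result means all eight directories exist and contain at least one entry; it does not validate the contents themselves.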
T: number of frames at 1 FPS.
GLIP image-level feature size: [T, 256]
GLIP object-level feature size: [T, 10, 262]. 10: number of objects per frame. 262 = 256 (object feature size) + 4 (object bbox: x1, y1, x2, y2) + 1 (object class ID) + 1 (confidence score)
CLIP image-level feature size: [T, 256]
CLIP object-level feature size: [T, 10, 774]. 774 = 768 (object feature size) + 4 (object bbox) + 1 (object class ID) + 1 (confidence score)
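To make the packed layout concrete, here is a small sketch that splits one object-level vector into its parts. It assumes the components are stored in the order they are listed above (feature, then bbox, then class ID, then confidence score); that ordering is an assumption and should be checked against the feature files. The helper name `split_object_vector` is hypothetical.

```python
def split_object_vector(vec, feat_dim):
    """Split one object-level vector into (feature, bbox, class_id, score).

    Assumes the layout [feature | x1, y1, x2, y2 | class ID | confidence].
    """
    if len(vec) != feat_dim + 6:
        raise ValueError(f"expected length {feat_dim + 6}, got {len(vec)}")
    feature = vec[:feat_dim]
    bbox = vec[feat_dim:feat_dim + 4]
    class_id = vec[feat_dim + 4]
    score = vec[feat_dim + 5]
    return feature, bbox, class_id, score


# Synthetic CLIP-sized vector: 768 + 4 + 1 + 1 = 774 values.
clip_vec = [0.0] * 768 + [10.0, 20.0, 110.0, 220.0] + [3.0] + [0.9]
feature, bbox, class_id, score = split_object_vector(clip_vec, feat_dim=768)
```

The function only uses slicing and indexing, so it works the same way on a Python list or on one row of a loaded feature array.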
NCCL_P2P_DISABLE="1" python -m scripts.run \
--cfg configs/ego4d/sf_video.yaml \
--exp_name ego4d/sf_video
This will create log files and checkpoints in lightning_logs/ego4d/sf_video. You can launch TensorBoard to monitor the training process.
After training, please go to lightning_logs/ego4d/sf_video to find the best checkpoint. Set CKPT_PATH to the path of the best checkpoint.
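Locating candidate checkpoints can be scripted. The sketch below only lists `.ckpt` files (the extension PyTorch Lightning uses for checkpoints) under a log directory, newest first. Which file is "best" depends on the checkpoint callback settings, which are not described here, so the final choice is left to the reader. The helper name `list_checkpoints` is hypothetical.

```python
from pathlib import Path


def list_checkpoints(log_dir):
    """Return all *.ckpt files under log_dir, newest first."""
    ckpts = Path(log_dir).rglob("*.ckpt")
    return sorted(ckpts, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for ckpt in list_checkpoints("lightning_logs/ego4d/sf_video"):
        print(ckpt)
```

The printed path of the chosen file is what CKPT_PATH should be set to.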
NCCL_P2P_DISABLE="1" python -m scripts.run \
--cfg configs/ego4d/sf_video.yaml \
--exp_name ego4d/sf_video \
train.enable False \
test.enable True \
ckpt_path CKPT_PATH
# GLIP object features:
NCCL_P2P_DISABLE="1" python -m scripts.run \
--cfg configs/ego4d/sf_video_image_object_glip.yaml \
--exp_name ego4d/sf_video_image_object_glip_lr1e-3_epoch30 \
data.image.base_path data/glip/image \
data.object.base_path data/glip/object_manual_olcs
# CLIP object features:
NCCL_P2P_DISABLE="1" python -m scripts.run \
--cfg configs/ego4d/sf_video_image_object_clip.yaml \
--exp_name ego4d/sf_video_image_object_clip_lr1e-3_epoch30 \
data.image.base_path data/clip/image \
data.object.base_path data/clip/object_manual_olcs
This will create log files and checkpoints in lightning_logs/$exp_name$, where $exp_name$ is the value passed to --exp_name. You can launch TensorBoard to monitor the training process.
After training, please go to lightning_logs/$exp_name$ to find the best checkpoint. Set CKPT_PATH to the path of the best checkpoint.
# GLIP object features:
NCCL_P2P_DISABLE="1" python -m scripts.run \
--cfg configs/ego4d/sf_video_image_object_glip.yaml \
--exp_name ego4d/sf_video_image_object_glip_lr1e-3_epoch30 \
train.enable False \
test.enable True \
ckpt_path CKPT_PATH
# CLIP object features:
NCCL_P2P_DISABLE="1" python -m scripts.run \
--cfg configs/ego4d/sf_video_image_object_clip.yaml \
--exp_name ego4d/sf_video_image_object_clip_lr1e-3_epoch30 \
train.enable False \
test.enable True \
ckpt_path CKPT_PATH