Epic-Kitchens (we used RGB frames)
Ego4D (we used the FHO subset)
Epic:
- Download hand crops for Epic-Kitchens from the following repo.
- We preprocess the provided crops by taking the union of all visible hands and the objects that are in contact with hands, keeping the default parameters of the respective library.
- Store the result in a pickle with the format dict[segment_id][frame_idx] = (left, top, right, bottom), as in the sketch below. (Without this pre-extraction and preprocessing, running the detection library on the fly takes too long.)
- Save the file under the name: hand_thd0.8_obj_thd0.01_ONLY_inter_obj_with_HANDS_v2
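A minimal sketch of how such a pickle could be assembled, assuming the per-frame hand and object boxes have already been read from the detector output; the segment id, frame index, and boxes below are made up for illustration:

import pickle
from collections import defaultdict

def union_box(boxes):
    # Enclosing box (union) of a list of (left, top, right, bottom) boxes.
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))

# dict[segment_id][frame_idx] = (left, top, right, bottom)
crops = defaultdict(dict)

# Hypothetical example: one segment, one frame, two visible hands and one
# object in contact with a hand.
segment_id = "P01_01_0"
frame_idx = 123
hand_boxes = [(100, 80, 180, 200), (300, 90, 380, 210)]
contact_object_boxes = [(150, 60, 340, 190)]
crops[segment_id][frame_idx] = union_box(hand_boxes + contact_object_boxes)

with open("hand_thd0.8_obj_thd0.01_ONLY_inter_obj_with_HANDS_v2", "wb") as f:
    pickle.dump(dict(crops), f)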
Ego4D:
- Download hand crop detections for Ego4D here and apply similar preprocessing: https://github.com/Chuhanxx/helping_hand_for_egocentric_videos
All splits of shared and unique (novel) noun and verb classes are in the folder anno/
- Follow CoOp to install the prerequisites. However, skip the installation of Dassl, as a modified version is already integrated into this framework and its requirements will be installed in the next step.
- Go to the Dassl folder and run:
cd x-mic/Dassl.pytorch
# Install dependencies
pip install -r requirements.txt
# Install this library (no need to re-build if the source code is modified)
python setup.py develop
- [Only needed if there is no internet connection during training] In general, the CLIP model is downloaded automatically. If your machines are offline during training, download CLIP ViT-B/16 manually and set its location as the default value of the "root" parameter of the _download function in 'x-mic/clip/clip' (see the sketch below).
Otherwise this step can be skipped.
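A minimal sketch of the manual download, assuming the standard public OpenAI URL for the ViT-B/16 checkpoint (verify the URL and the target folder against the _download function in 'x-mic/clip/clip' before relying on them):

import os
import urllib.request

# Pre-fetch the CLIP ViT-B/16 weights so that training can run offline.
# Use the same directory that you set as the default of the "root" parameter
# of _download in x-mic/clip/clip.
CLIP_VITB16_URL = (
    "https://openaipublic.azureedge.net/clip/models/"
    "5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt"
)
root = os.path.expanduser("~/.cache/clip")
os.makedirs(root, exist_ok=True)
urllib.request.urlretrieve(CLIP_VITB16_URL, os.path.join(root, "ViT-B-16.pt"))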
Epic config: extract_EPIC_clip_vitb16_segments.yaml
To change:
DATASET.ROOT - where your dataset is located, with the structure DATASET.ROOT/annotations and DATASET.ROOT/epic_kitchens_videos_256ss
OUTPUT_DIR
Ego4D config: extract_EGO4D_clip_vitb16.yaml
To change:
DATA.PATH_TO_DATA_DIR - path to the annotations
DATA.PATH_PREFIX - path to the videos
DATASET.ROOT - path to the videos (same as DATA.PATH_PREFIX)
OUTPUT_DIR
Epic config: extract_EPIC_clip_vitb16_segments_handcrops.yaml
Same as the full-frame Epic config above, plus:
DATASET.DETECTION_ROOT - path to the hand crop annotations
Ego4D config: extract_EGO4D_clip_vitb16_handcrops.yaml
To run the script on a subset distributed over 8 GPUs:
export OMP_NUM_THREADS=64; export NCCL_ASYNC_ERROR_HANDLING=1; torchrun --standalone --nproc_per_node=8 --nnodes 1 feat_extractor_segments_distributed.py --config_name XX --split YY --distributed --seed 42
To run the script on a subset on a single GPU:
python feat_extractor_segments.py --config_name XX --split YY --div 0
XX - config name, without the ".yaml" extension and the folder path
YY - train or validation
Similarly, features can be extracted with the DINO and LaViLa models.
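As a hedged illustration (not the repo's extractor), per-frame DINO features could be computed roughly as follows; the frame path is hypothetical and the public torch.hub DINO ViT-B/16 backbone is assumed, which produces 768-dimensional features (so DATALOADER.DINO_DIM below would need to match):

import torch
from PIL import Image
from torchvision import transforms

# Load the public DINO ViT-B/16 backbone; its forward pass returns the
# 768-dimensional CLS embedding.
model = torch.hub.load("facebookresearch/dino:main", "dino_vitb16")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

# Hypothetical hand-cropped frame saved during preprocessing.
frame = Image.open("frame_0000000123.jpg").convert("RGB")
with torch.no_grad():
    feat = model(preprocess(frame).unsqueeze(0))  # shape: (1, 768)

In practice the provided extractor scripts and configs above should be used; this only illustrates the kind of features that go into the DINO feature paths of the training config below.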
Config params:
DATA.PATH_TO_DATA_DIR - location of the Ego4D dataset annotations
DATA.PATH_PREFIX - Ego4D features that are classified with the adapted classifier (best results with hand-cropped frames)
DATA.PATH_PREFIX_DINO - Ego4D features used for adaptation (best results with hand-cropped frames)
DATA.PATH_PREFIX_DINO2 - Ego4D features used for adaptation; these and the previous features are combined in the adaptation module (best results with full frames)
DATALOADER.FEATURES_NAME - Epic features that are classified with the adapted classifier (best results with hand-cropped frames)
DATALOADER.FEATURES_NAME_DINO - Epic features used for adaptation (best results with hand-cropped frames)
DATALOADER.FEATURES_NAME_DINO2 - Epic features used for adaptation; these and the previous features are combined in the adaptation module (best results with full frames)
Note that all of these features can be the same. To use the model without hand crops, set DATALOADER.USE_DINO_FEATURES2 = False.
Set the dimension of the conditioning features in DATALOADER.DINO_DIM if it is different from 512.
If only one dataset is available, disable cross-dataset evaluation by setting TEST.CROSS_DATASET.EVAL = False.
Train X-MIC config: XMIC_vitb16.yaml
Set up the data or feature paths for one or two datasets.
XX - name of the config file located in the scripts/configs folder
With a single GPU:
Epic nouns:
sh scripts/baselines/epic_gpu1.sh noun XX
Epic verbs:
sh scripts/baselines/epic_gpu1.sh verb XX
Ego4d nouns:
sh scripts/baselines/ego_gpu1.sh noun XX
Ego4d verbs:
sh scripts/baselines/ego_gpu1.sh verb XX
With 8 GPUs:
Epic nouns:
sh scripts/baselines/epic_gpu8.sh noun XX
Epic verbs:
sh scripts/baselines/epic_gpu8.sh verb XX
Ego4d nouns:
sh scripts/baselines/ego_gpu8.sh noun XX
Ego4d verbs:
sh scripts/baselines/ego_gpu8.sh verb XX
- The model code is in trainers/xmic.py
- To add an additional trainer, also include it in train.py or train_dist.py (see the sketch below).
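As a hedged sketch of what this involves (the trainer class and file names below are hypothetical; the registry API is the one shipped with the bundled Dassl):

# trainers/my_trainer.py (hypothetical)
from dassl.engine import TRAINER_REGISTRY, TrainerX

@TRAINER_REGISTRY.register()
class MyTrainer(TrainerX):
    # Custom trainer skeleton; fill in build_model / forward_backward as needed.

    def forward_backward(self, batch):
        raise NotImplementedError

The new module then needs to be imported in train.py (and train_dist.py for distributed training) so that the registry entry is created, mirroring how the existing xmic trainer is included.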
Unfortunately, after my internship all models and data were deleted due to internal refactoring. As a result, I lost all the pretrained models and parts of the code, and could not do a final verification of the code.
Feel free to contact me via email in case of any questions.
I sincerely apologise for any inconvenience this may cause.
If you use our work, please consider citing:
@inproceedings{kukleva2024xmic,
title={X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization},
author={Kukleva, Anna and Sener, Fadime and Remelli, Edoardo and Tekin, Bugra and Sauser, Eric and Schiele, Bernt and Ma, Shugao},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}