The HOCap Toolkit is a Python package that provides evaluation and visualization tools for the HO-Cap dataset.
HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction
Jikai Wang, Qifan Zhang, Yu-Wei Chao, Bowen Wen, Xiaohu Guo, Yu Xiang
[ arXiv ] [ Project page ]
- ⚠️ 2025-01-13: We fixed a bug in the image labels for "hand_joints_3d" and "hand_joints_2d". Please re-download the labels and regenerate the HPE split dataset.
- 2025-01-13: The code for image label visualization is added! Please check it here (item 4).
- 2024-12-15: The training codes and datasets for YOLO11 and RT-DETR are added! Please check them here.
- 2024-12-15: The Object Collection dataset is added! Please check the project page for more details.
- 2024-12-14: The Object Collection dataset is added! Please check the project page for more details.
- 2024-12-14: The HO-Cap dataset is updated! Please check the project page for more details.
- 2024-06-24: The HO-Cap dataset is released! Please check the project page for more details.
If HO-Cap helps your research, please consider citing the following:
@misc{wang2024hocapcapturedataset3d,
title={HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction},
author={Jikai Wang and Qifan Zhang and Yu-Wei Chao and Bowen Wen and Xiaohu Guo and Yu Xiang},
year={2024},
eprint={2406.06843},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2406.06843},
}
HOCap Toolkit is released under the GNU General Public License v3.0.
This code is tested with Python 3.10 and CUDA 11.8 on Ubuntu 20.04. Make sure CUDA 11.8 is installed on your system before running the code.
- Clone the HO-Cap repository from GitHub.
    git clone https://github.com/IRVLUTD/HO-Cap.git
- Change the current directory to the cloned repository.
    cd HO-Cap
- Create the conda environment.
    conda create -n hocap-toolkit python=3.10
- Activate the conda environment.
    conda activate hocap-toolkit
- Install PyTorch and torchvision.
    python -m pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118 --no-cache-dir
- Install the hocap-toolkit package.
    python -m pip install -e .
- Download the MANO models and code (mano_v1_2.zip) from the MANO website and place the extracted .pkl files under the config/mano_models directory. The directory should look like this:
    ./config/mano_models
    ├── MANO_LEFT.pkl
    └── MANO_RIGHT.pkl
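Before running the examples, it can help to confirm that both MANO files are where the toolkit expects them. The snippet below is a minimal sketch, not part of the toolkit: the helper name `missing_mano_models` is hypothetical, and the file names are taken from the directory listing above.

```python
from pathlib import Path

# File names taken from the directory listing above.
REQUIRED_MANO_FILES = ("MANO_LEFT.pkl", "MANO_RIGHT.pkl")


def missing_mano_models(model_dir="config/mano_models"):
    """Return the required MANO files that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_MANO_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # Run from the repository root so the relative path resolves.
    missing = missing_mano_models()
    if missing:
        print("Missing MANO model files:", ", ".join(missing))
    else:
        print("MANO models found.")
```

An empty list means both files are in place; otherwise the returned names show which file still needs to be copied.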
- Run the code below to download the whole dataset:
    python tools/hocap_downloader.py --subject_id all
- Or download the dataset for a specific subject:
    python tools/hocap_downloader.py --subject_id subject_1
- The downloaded .zip files will be extracted to the ./datasets directory. The directory should look like this:
    ./datasets
    ├── calibration
    ├── models
    ├── subject_1
    │   ├── 20231025_165502
    │   │   ├── 037522251142
    │   │   │   ├── color_000000.jpg
    │   │   │   ├── depth_000000.png
    │   │   │   ├── label_000000.npz
    │   │   │   └── ...
    │   │   ├── 043422252387
    │   │   ├── ...
    │   │   ├── hololens_kv5h72
    │   │   ├── meta.yaml
    │   │   ├── poses_m.npy
    │   │   ├── poses_o.npy
    │   │   └── poses_pv.npy
    │   ├── 20231025_165502
    │   └── ...
    ├── ...
    └── subject_9
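The layout above suggests that every camera folder stores one color image, one depth image, and one label archive per frame. The following sketch builds those paths for a given frame and counts the frames in a camera folder. It assumes the six-digit, zero-padded frame index seen in the listing; the helper names `frame_files` and `count_frames` are hypothetical and not part of the toolkit.

```python
from pathlib import Path


def frame_files(camera_dir, frame_idx):
    """Build the color, depth, and label paths for one frame of one camera."""
    camera_dir = Path(camera_dir)
    stem = f"{frame_idx:06d}"  # assumption: zero-padded 6-digit frame index
    return {
        "color": camera_dir / f"color_{stem}.jpg",
        "depth": camera_dir / f"depth_{stem}.png",
        "label": camera_dir / f"label_{stem}.npz",
    }


def count_frames(camera_dir):
    """Count frames in a camera folder by counting its color_*.jpg files."""
    return len(list(Path(camera_dir).glob("color_*.jpg")))
```

For example, `frame_files("datasets/subject_1/20231025_165502/037522251142", 0)["label"]` points at `label_000000.npz`. Such an archive can be opened with `numpy.load`, whose `.files` attribute lists the names of the stored arrays.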
The HOCap dataset provides the following labels:
- 3D hand keypoints
- 2D hand keypoints
- hand bounding boxes
- hand sides
- hand MANO poses
- object 6D poses
- segmentation masks
- The example below shows how to visualize the pose annotations of one frame:
    python examples/sequence_pose_viewer.py
- The example below shows how to visualize a sequence with the interactive 3D viewer:
    python examples/sequence_3d_viewer.py
  The 3D viewer provides the following functionalities:
    - Background: change the background color.
    - Point Size: change the point size.
    - Show Skybox: display/hide the skybox.
    - Show Axes: display/hide the axes of the world coordinate frame.
    - Crop Points: crop the points outside the table area.
    - Point Clouds: display/hide the point clouds.
    - Hand Mesh: display/hide the hand mesh.
    - Object Mesh: display/hide the object mesh.
    - Frame Slider: change the frame index.
    - Reset: reset the camera view and the frame index.
    - Pause/Play: pause/play the sequence.
    - Exit: close the viewer.
    - Help Tab: show the help information.
- The example below shows how to render the sequence offline:
    python examples/sequence_renderer.py
  This will render the color image and segmentation map for all the frames in the sequence. The rendered images will be saved in the <sequence_folder>/renders/ directory.
- The example below shows how to visualize the image labels:
    python examples/image_label_viewer.py
HO-Cap provides the benchmark evaluation for three tasks:
- Hand Pose Estimation (HPE) (A2J-Transformer [1] and HaMeR [2])
- Object Pose Estimation (OPE) (MegaPose [3] and FoundationPose [4])
- Object Detection (ODET) (CNOS [5], GroundingDINO [6], YOLO11 [7] and RT-DETR [8]).
Run the code below to download the example evaluation results:
python config/benchmarks/benchmark_downloader.py
If your own results are saved in the same format, the evaluation scripts below can be used to evaluate them.
- Evaluate the hand pose estimation performance:
    python examples/evaluate_hand_pose.py
  You should see the following output:
    PCK (0.05)  PCK (0.10)  PCK (0.15)  PCK (0.20)  MPJPE (mm)
    45.319048   81.247619   91.357143   95.080952   25.657379
- Evaluate the novel object pose estimation performance:
    python examples/evaluate_object_pose.py
  You should see the following output:
    Object_ID       ADD-S_err (cm)  ADD_err (cm)    ADD-S_AUC (%)   ADD_AUC (%)
    |-------------- |-------------- |-------------- |-------------- |-------------- |
    G01_1           0.42            0.72            95.79           92.82
    G01_2           0.37            0.69            96.39           93.38
    G01_3           0.45            0.82            95.72           92.08
    G01_4           0.61            2.73            94.14           74.19
    Average         0.46            1.24            95.43           88.04
- Evaluate the object detection performance:
    python examples/evaluate_object_detection.py
  You should see the following output:
    Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.016
    Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.023
    Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.018
    Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.002
    Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.018
    Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.014
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.036
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.036
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.036
    Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.005
    Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.037
    Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.017
    AP: 0.016 | AP_50: 0.023 | AP_75: 0.018 | AP_s: 0.002 | AP_m: 0.018 | AP_l: 0.014
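For readers unfamiliar with the metric names in the outputs above, the sketch below gives their textbook definitions in plain Python: MPJPE and PCK for hand joints, ADD and ADD-S for object poses, and the box IoU on which the detection AP thresholds are based. It is an illustration only and not the toolkit's evaluation code. In particular, the `scale` argument of `pck` is an assumption, since the text does not say how the PCK thresholds are normalized, and the AUC computation is not shown. All function names are hypothetical, and the code is unit-agnostic (the outputs above use mm for MPJPE and cm for ADD/ADD-S).

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance between joints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def pck(pred, gt, threshold, scale=1.0):
    """Percentage of joints whose error divided by scale is within threshold.

    scale is an assumed normalization length (for example a bounding-box size).
    """
    hits = sum(math.dist(p, g) / scale <= threshold for p, g in zip(pred, gt))
    return 100.0 * hits / len(gt)


def transform(points, rotation, translation):
    """Apply a rigid transform (rows of a 3x3 rotation, 3-vector translation)."""
    return [
        tuple(sum(r * c for r, c in zip(row, p)) + t
              for row, t in zip(rotation, translation))
        for p in points
    ]


def add_error(points, pose_pred, pose_gt):
    """ADD: mean distance between corresponding model points under both poses."""
    pred = transform(points, *pose_pred)
    gt = transform(points, *pose_gt)
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(points)


def adds_error(points, pose_pred, pose_gt):
    """ADD-S: mean distance from each ground-truth point to its nearest predicted point."""
    pred = transform(points, *pose_pred)
    gt = transform(points, *pose_gt)
    return sum(min(math.dist(g, p) for p in pred) for g in gt) / len(points)


def box_iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0
```

Because ADD-S matches each point to its nearest neighbor instead of its own counterpart, it is never larger than ADD for the same pose pair, which is consistent with the ADD-S errors above being lower than the ADD errors for every object.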
The train/valid/test split is defined separately for each task (HPE, ODET, OPE) by the files config/hocap_hpt.json, config/hocap_odt.json, and config/hocap_ope.json. Each configuration file has the following structure:
{
"train": [[0, 0, 0, 0], ...],
"valid": [...],
"test": [...]
}
Each item is in the format [subject_index, sequence_index, camera_index, frame_index]. For example, [0, 0, 0, 0] refers to the subject_1/20231022_190534/105322251564 folder and the frame color_000000.jpg / depth_000000.png.
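As an illustration of the index format, the sketch below resolves one split item to a folder and its frame file names, and counts the items in each split. The lookup tables `subjects`, `sequences`, and `cameras` are hypothetical inputs that the caller must supply; how the toolkit orders sequences and cameras internally is not stated here, so this is a sketch under that assumption and not the toolkit's own loader.

```python
import json


def resolve_item(item, subjects, sequences, cameras):
    """Map [subject_index, sequence_index, camera_index, frame_index] to paths.

    subjects:  list of subject folder names (hypothetical lookup table)
    sequences: dict from subject name to its list of sequence folder names
    cameras:   list of camera serial numbers
    """
    sub_idx, seq_idx, cam_idx, frame_idx = item
    subject = subjects[sub_idx]
    folder = f"{subject}/{sequences[subject][seq_idx]}/{cameras[cam_idx]}"
    return folder, f"color_{frame_idx:06d}.jpg", f"depth_{frame_idx:06d}.png"


def split_sizes(config_text):
    """Count the items in each split of a configuration file's JSON text."""
    return {name: len(items) for name, items in json.loads(config_text).items()}
```

With `subjects = ["subject_1"]`, `sequences = {"subject_1": ["20231022_190534"]}`, and `cameras = ["105322251564"]`, the item `[0, 0, 0, 0]` resolves to the folder and frame files named in the example above.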
To save time, we provide pre-defined splits for each task; the split datasets can be downloaded here.
Alternatively, run the code below to split the HOCap dataset manually. The split dataset will be saved in the ./datasets directory.
- Hand Pose Estimation (HPE) task:
    python tools/hocap_dataset_split.py --task hpe
- Object Pose Estimation (OPE) task:
    python tools/hocap_dataset_split.py --task ope
- Object Detection (ODET) task:
    - COCO annotation type:
        python tools/hocap_dataset_split.py --task odet --anno_type coco
    - YOLO annotation type:
        python tools/hocap_dataset_split.py --task odet --anno_type yolo
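The two annotation types differ mainly in how a bounding box is written. In the standard COCO format a box is [x_min, y_min, width, height] in pixels, while a YOLO label line stores the class id followed by the box center and size, normalized by the image dimensions. The sketch below converts between these conventions. It illustrates the general formats and is not the code used by tools/hocap_dataset_split.py; the function names are hypothetical.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to a
    YOLO box (x_center, y_center, width, height) normalized to [0, 1]."""
    x_min, y_min, width, height = bbox
    return (
        (x_min + width / 2) / image_width,
        (y_min + height / 2) / image_height,
        width / image_width,
        height / image_height,
    )


def yolo_label_line(class_id, bbox, image_width, image_height):
    """Format one line of a YOLO label file: class id, then the normalized box."""
    values = coco_to_yolo(bbox, image_width, image_height)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)
```

For a 400x200 image, the COCO box [100, 50, 200, 100] is centered in the image and covers half of each dimension, so every normalized value is 0.5.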
Footnotes
[1] A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image
[3] MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare
[4] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
[5] CNOS: A Strong Baseline for CAD-based Novel Object Segmentation
[6] Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
[7] YOLOv11: An Overview of the Key Architectural Enhancements