
CapeNext

This repo is the official implementation of "CapeNext: Rethinking and Refining Dynamic Support Information for Category-Agnostic Pose Estimation" (AAAI 2026).

Introduction

The paper is available at arXiv. CapeNext is a new framework that integrates hierarchical cross-modal interaction with dual-stream feature refinement, enhancing the joint embeddings with both class-level and instance-specific cues drawn from textual descriptions and the specific images. Experiments on the MP-100 dataset demonstrate that, regardless of the network backbone, CapeNext consistently outperforms state-of-the-art CAPE methods by a large margin.

[Figure: mainFig]

Results on the MP-100 dataset

| Methods | Img Backbone | CLIP Backbone | Split1 | Split2 | Split3 | Split4 | Split5 | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| POMNet | ResNet-50 | - | 84.23 | 78.25 | 78.17 | 78.68 | 79.17 | 79.70 |
| CapeFormer | ResNet-50 | - | 89.45 | 84.88 | 83.59 | 83.53 | 85.09 | 85.31 |
| ESCAPE | ResNet-50 | - | 86.89 | 82.55 | 81.25 | 81.72 | 81.32 | 82.74 |
| MetaPoint+ | ResNet-50 | - | 90.43 | 85.59 | 84.52 | 84.34 | 85.96 | 86.17 |
| X-Pose | ResNet-50 | ViT-Base-32 | 89.07 | 85.05 | 85.26 | 85.52 | 85.79 | 86.14 |
| SDPNet | HRNet-32 | - | 91.54 | 86.72 | 85.49 | 85.77 | 87.26 | 87.36 |
| GraphCape | Swinv2-T | - | 91.19 | 87.81 | 85.68 | 85.87 | 85.61 | 87.23 |
| CapeX | HRNet-w32 | ViT-Base-32 | 89.1 | 85.0 | 81.9 | 84.4 | 85.4 | 85.2 |
| CapeX | ViT-Base-16 | ViT-Base-32 | 90.75 | 82.87 | 83.18 | 85.95 | 85.49 | 85.65 |
| CapeX | DINOv2-ViT-S | ViT-Base-32 | 90.6 | 83.74 | 83.67 | 86.87 | 85.93 | 86.18 |
| CapeX | Swinv2-T | ViT-Base-32 | 91.9 | 86.97 | 84.41 | 86.13 | 88.64 | 87.61 |
| CapeNext | HRNet-w32 | ViT-Base-32 | 90.2 | 86.0 | 82.9 | 85.4 | 87.1 | 86.3 |
| CapeNext | ViT-Base-16 | ViT-Base-32 | 90.84 | 86.73 | 86.5 | 82.44 | 87.91 | 86.88 |
| CapeNext | DINOv2-ViT-S | ViT-Base-32 | 92.12 | 87.75 | 83.76 | 87.16 | 88.95 | 87.95 |
| CapeNext | Swinv2-T | ViT-Base-32 | 92.44 | 87.31 | 85.44 | 86.47 | 90.17 | 88.37 |

Getting Started

Conda Environment Set-up

Please run:

```bash
conda env create -f capenext_env.yml
conda activate capenext
```
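
To verify the environment is working (a quick sanity check; this assumes the environment ships PyTorch):

```bash
# Print the installed PyTorch version and whether CUDA is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```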

MP-100 Dataset

Please follow the official guide to prepare the MP-100 dataset for training and evaluation, and organize the data as sketched below.
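
For reference, a typical layout looks like the following (a sketch following the MP-100 convention used by prior CAPE repos such as CapeFormer; the exact file names are an assumption and may differ):

```
data/
└── mp100/
    ├── annotations/
    │   ├── mp100_split1_train.json   # one train/val/test triplet per split
    │   ├── mp100_split1_val.json
    │   ├── mp100_split1_test.json
    │   └── ...
    └── images/
        └── ...                       # per-category image folders
```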

Then, download Pose Anything's updated annotation file, which includes all the skeleton definitions, from the following link.

Training

Backbone

Pretrained weights for Swin-Transformer-V2-Tiny are taken from this repo and can be downloaded from the following link. Place the weights in the ./pretrained folder.
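
For example (a sketch; the checkpoint file name swinv2_tiny_patch4_window8_256.pth follows the upstream Swin Transformer releases and is an assumption here):

```bash
# Create the folder expected by the configs and move the downloaded weights into it
mkdir -p pretrained
mv swinv2_tiny_patch4_window8_256.pth pretrained/
```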

Training

To train the model, run:

```bash
python train.py --config [path_to_config_file] --work-dir [path_to_work_dir]
```

For example:

```bash
# capenext setting
python train.py --config configs/clip/clip_split1_config.py \
    --work-dir work_dirs/tiny/clip/capenext/split1 \
    --cfg-options data.samples_per_gpu=32 data.workers_per_gpu=32 \
    additional_module_cfg.module_name="SimpleMultiModalModule"
```
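
To train all five splits in sequence, a loop like the following can be used (a sketch; it assumes the configs for splits 2-5 follow the same clip_splitK_config.py naming pattern as split 1):

```bash
# Train every MP-100 split with the capenext setting
for k in 1 2 3 4 5; do
  python train.py --config configs/clip/clip_split${k}_config.py \
      --work-dir work_dirs/tiny/clip/capenext/split${k} \
      --cfg-options data.samples_per_gpu=32 data.workers_per_gpu=32 \
      additional_module_cfg.module_name="SimpleMultiModalModule"
done
```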

Evaluation and Pretrained Models

Here we provide the evaluation results of our pretrained models on the MP-100 dataset, along with the config files and checkpoints:

| Setting | Backbone | Split 1 | Split 2 | Split 3 | Split 4 | Split 5 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CapeNext | Swinv2-Tiny | 92.44 | 87.31 | 85.44 | 86.47 | 90.17 | 88.37 |
| | | weight / config | weight / config | weight / config | weight / config | weight / config | |

Evaluation

To evaluate the pretrained model, run:

```bash
python test.py [path_to_config_file] [path_to_pretrained_ckpt]
```

For example:

```bash
# capenext setting
python test.py configs/clip/clip_split1_config.py \
    work_dirs/tiny/clip/capenext/split1/split1_epoch_200.pth \
    --cfg-options additional_module_cfg.module_name="SimpleMultiModalModule"
```
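
All five splits can be evaluated the same way (a sketch; it assumes the per-split config and checkpoint paths follow the naming pattern shown above):

```bash
# Evaluate every MP-100 split with the capenext setting
for k in 1 2 3 4 5; do
  python test.py configs/clip/clip_split${k}_config.py \
      work_dirs/tiny/clip/capenext/split${k}/split${k}_epoch_200.pth \
      --cfg-options additional_module_cfg.module_name="SimpleMultiModalModule"
done
```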

Acknowledgement

Our code builds on the following repositories:
