RHM2DGen: Towards Rich Human Motion2D Generation.

framework (figure)

results (figure)

Dataset

Our model was trained on the following datasets (~1.5M samples); the top and bottom 5% of each sample list were held out from training for validation and testing (a minimal sketch of the split follows the list):

  • skeleton_trainv8_flux_0.txt_new.txt # Flux-generated data
  • skeleton_trainv8_flux_1.txt_new.txt # Flux-generated data
  • skeleton_trainv8_pinterest.txt_new.txt # Web-crawled data
  • skeleton_trainv8_reelshort.txt_new.txt # Vertical short-video data
  • skeleton_trainv8_vcg_0.txt_new.txt # Web-crawled data
  • skeleton_trainv8_vcg_1.txt_new.txt # Web-crawled data
  • skeleton_trainv8_vcg_2.txt_new.txt # Web-crawled data
  • skeleton_trainv8_vcg_3.txt_new.txt # Web-crawled data
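
A minimal sketch of the 5% hold-out split, assuming each .txt file lists one sample per line (the repo's actual split logic may differ):

    # Drop the first and last 5% of samples from a list file;
    # the held-out 10% is reserved for validation/testing.
    def split_samples(path):
        with open(path, encoding="utf-8") as f:
            lines = [line.strip() for line in f if line.strip()]
        k = int(len(lines) * 0.05)
        train = lines[k:len(lines) - k]
        heldout = lines[:k] + lines[len(lines) - k:]
        return train, heldout

    train, heldout = split_samples("skeleton_trainv8_pinterest.txt_new.txt")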

We additionally prepared skeleton data from the open-source COYO dataset, though due to project adjustments these files were not used in the final model iterations:

  • coyo_two_people_0.txt
  • coyo_two_people_1.txt
  • coyo_two_people_2.txt
  • coyo_two_people_3.txt
  • coyo_two_people_4.txt
  • coyo274w_0.txt
  • coyo274w_1.txt
  • coyo274w_2.txt
  • coyo274w_3.txt
  • coyo274w_4.txt
  • coyo274w_5.txt
  • coyo274w_6.txt
  • coyo274w_7.txt
  • coyo274w_8.txt
  • coyo274w_9.txt

Dataset Processing

framework (figure)

  • Dataset labels were generated with GPT-4o using complex prompts, yielding multi-dimensional annotations (see tools/gen_caption.py)
  • During training, descriptions from different dimensions are randomly combined to create a richer text distribution (see the sketch after this list)
  • We also provide simple single-sentence prompts, generated via gen_prompt_simple in tools/gen_caption.py
  • These simple prompts were used to train our evaluation model: RHM2DGen_eval
  • Pipeline: face/human detection → region extraction using detection boxes + SAM (resolving overlaps in multi-person cases) → skeleton extraction and labeling based on the SAM masks
  • Note: so that GPT-4o can recognize character relationships and keep descriptions consistent in multi-person scenes, we pass each person's SAM-detected region to define character names (subject0/1) while keeping the whole image as context for the detailed descriptions
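
A minimal sketch of the random combination idea, assuming each sample carries one caption per annotation dimension; the dimension names and captions below are hypothetical (see tools/gen_caption.py for the actual annotation format):

    import random

    # Hypothetical per-dimension annotations for one two-person sample.
    captions = {
        "appearance": "subject0 wears a red coat; subject1 wears a grey suit.",
        "action": "subject0 reaches toward subject1's shoulder.",
        "scene": "They stand in a dimly lit hallway.",
    }

    def sample_prompt(captions, p_keep=0.7):
        # Keep a random subset of dimensions (at least one), then shuffle
        # their order, so one image yields many distinct text prompts.
        dims = [d for d in captions if random.random() < p_keep]
        if not dims:
            dims = [random.choice(list(captions))]
        random.shuffle(dims)
        return " ".join(captions[d] for d in dims)

    print(sample_prompt(captions))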

Evaluation Datasets

(We provide original images, JSON files, and test prompts)

  • eval_single_1k
  • eval_double_1k
  • For image data requests (non-commercial use only), please email: wxktongji@163.com
  • Download link: Baidu Drive (Code: 4ujq)

Environment

  • environment.yml
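
A minimal setup sketch, assuming the environment is managed with conda (the environment name is defined inside environment.yml):

conda env create -f environment.yml
conda activate <env-name-from-environment.yml>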

Train

python3 train.py

Infer

python3 infer.py

Model Download

Evaluation Model

  • Code is located in RHM2DGen_eval/, developed on top of MDM's evaluation framework (modified for skeleton points)
  • Environment: RHM2DGen_eval/environment.yml
  • Prompts: simple prompts generated via gen_prompt_simple in tools/gen_caption.py
  • Training script: RHM2DGen_eval/train.py
  • Evaluation script: RHM2DGen_eval/eval.py (example invocations below)
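
Example invocations, assuming the RHM2DGen_eval environment is active and default arguments suffice (check each script's argument parser for required flags):

python3 RHM2DGen_eval/train.py
python3 RHM2DGen_eval/eval.py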

Eval model

Technical Report

A detailed technical report will be released later.

Contribution

Primary contributors: Xuekuan Wang, Haoyu Yin, Haoyu Zheng, Yuqiu Huang, Keqiang Sun, Feng Qiu, Yunhao Shui, Junru Qiu
