This repository contains the PyTorch implementation of the paper "Aligning Motion Generation with Human Perceptions".
MotionCritic can score a single motion with just a few lines of code. First, prepare the SMPL body model files:

```bash
bash prepare/prepare_smpl.sh
```

Then run:
```python
import torch

from lib.model.load_critic import load_critic
from parsedata import into_critic

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
critic_model = load_critic("critic/motioncritic_pre.pth", device)

example = torch.load("criexample.pth", map_location=device)
example_motion = example['motion']  # [bs, 25, 6, frame]: rot6d, 24 SMPL joints plus 1 XYZ root location

# motion pre-processing: convert to the critic's input format
preprocessed_motion = into_critic(example_motion)  # [bs, frame, 25, 3]: axis-angle, 24 SMPL joints plus 1 XYZ root location

# critic score
critic_scores = critic_model.module.batch_critic(preprocessed_motion)
print(f"critic scores are {critic_scores}")  # the critic score is 4.1297 for this example
```
The corresponding motion is rendered in `criexample.mp4`.
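The same pipeline works for your own data. Below is a minimal sketch, assuming your motion is already a rot6d tensor in the `[bs, 25, 6, frame]` layout described above; the random tensor only stands in for real data and is there to illustrate the expected shapes:

```python
import torch

from lib.model.load_critic import load_critic
from parsedata import into_critic

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
critic_model = load_critic("critic/motioncritic_pre.pth", device)

# Stand-in for your own data: a batch of 2 motions, 60 frames each,
# in the [bs, 25, 6, frame] rot6d format (random values, illustration only).
my_motion = torch.randn(2, 25, 6, 60, device=device)

# Convert to the critic's axis-angle input format and score the batch.
scores = critic_model.module.batch_critic(into_critic(my_motion))
print(scores)  # one critic score per motion in the batch
```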
You can also score and render multiple motions with a bit more code. First, prepare the demo data:

```bash
bash prepare/prepare_demo.sh
```

Then run:
```python
import torch

from lib.model.load_critic import load_critic
from parsedata import into_critic
from render.render import render_multi

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
critic_model = load_critic("critic/motioncritic_pre.pth", device)
example = torch.load("visexample.pth", map_location=device)

# calculate critic scores
critic_scores = critic_model.module.batch_critic(into_critic(example['motion']))
print(f"critic scores are {critic_scores}")

# render the motions
render_multi(example['motion'], device, example['comment'], example['path'])
```
The rendered results are shown in `demo.mp4`.
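Because the critic scores a whole batch at once, the scores can be used directly to rank candidate motions. A small sketch, assuming `critic_scores` from the snippet above is a 1-D tensor with one score per motion:

```python
import torch

# Rank the batch from highest to lowest critic score
# (assumes `critic_scores` holds one score per motion).
flat_scores = critic_scores.flatten()
order = torch.argsort(flat_scores, descending=True)
for rank, idx in enumerate(order.tolist(), start=1):
    print(f"rank {rank}: motion {idx}, critic score {flat_scores[idx].item():.4f}")
```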
To set up the environment, create and activate the conda environment:

```bash
conda env create -f environment.yml
conda activate mocritic
```
Download the pre-processed datasets and pretrained models:
```bash
bash prepare/prepare_dataset.sh     # Download pre-processed datasets
bash prepare/prepare_pretrained.sh  # Download pretrained models
```
Alternatively, you can manually download the files from the following links:
- Pre-processed datasets: Google Drive Link
- Pretrained MotionCritic model: Google Drive Link
To build your own dataset from the original motion files and annotation results:
```bash
bash prepare/prepare_fullannotation.sh
bash prepare/prepare_fullmotion.sh
```
Manual downloads are available here:
- Full annotation results: Google Drive Link
- Complete motion .npz files: Google Drive Link
After pre-processing the complete data, build your dataset with:
```bash
cd MotionCritic
python parsedata.py
```
Reproduce the results from the paper by running:
```bash
cd MotionCritic/metric
python metrics.py
python critic_score.py
```
Train your own critic model with the following command:
```bash
cd MotionCritic
python train.py --gpu_indices 0 --exp_name my_experiment --dataset mdmfull_shuffle --save_latest --lr_decay --big_model
```
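After training, you can evaluate your own checkpoint the same way as the pretrained critic. A minimal sketch, where the checkpoint path is hypothetical and should point to wherever your training run saved its weights:

```python
import torch

from lib.model.load_critic import load_critic
from parsedata import into_critic

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical path: replace with the checkpoint written by your training run.
my_checkpoint = "checkpoints/my_experiment/latest.pth"
critic_model = load_critic(my_checkpoint, device)

# Score the bundled example motion with the newly trained critic.
example = torch.load("criexample.pth", map_location=device)
scores = critic_model.module.batch_critic(into_critic(example['motion']))
print(scores)
```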
To fine-tune MDM with MotionCritic supervision, first prepare the MDM baseline:

```bash
bash prepare/prepare_MDM_dataset.sh
bash prepare/prepare_MDM_pretrained.sh
```
If you encounter any issues, refer to the MDM baseline setup.
Next, start MotionCritic-supervised fine-tuning:
```bash
cd MDMCritic

python -m train.tune_mdm \
    --dataset humanact12 --cond_mask_prob 0 --lambda_rcxyz 1 --lambda_vel 1 --lambda_fc 1 \
    --resume_checkpoint ./save/humanact12/model000350000.pt \
    --reward_model_path ./reward/motioncritic_pre.pth \
    --device 0 \
    --num_steps 1200 \
    --save_interval 100 \
    --reward_scale 1e-4 --kl_scale 5e-2 --random_reward_loss \
    --ddim_sampling \
    --eval_during_training \
    --sample_when_eval \
    --batch_size 64 --lr 1e-5 \
    --denoise_lower 700 --denoise_upper 900 \
    --use_kl_loss \
    --save_dir save/finetuned/my_experiment \
    --wandb my_experiment
```
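Conceptually, `--reward_scale` and `--kl_scale` weight a critic-based reward against a KL term that keeps the fine-tuned model close to the original MDM. The sketch below only illustrates how such an objective can be combined and is inferred from the flag names; the actual training code lives in `MDMCritic/train` and may differ in detail:

```python
import torch

def critic_finetune_objective(critic_scores: torch.Tensor,
                              kl_div: torch.Tensor,
                              reward_scale: float = 1e-4,
                              kl_scale: float = 5e-2) -> torch.Tensor:
    # Illustrative only: maximize the critic reward (hence the minus sign)
    # while penalizing divergence from the original diffusion model.
    reward_term = -reward_scale * critic_scores.mean()
    kl_term = kl_scale * kl_div.mean()
    return reward_term + kl_term
```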
Additional Python scripts for various fine-tuning purposes can be found in `MDMCritic/train` and are detailed in the fine-tuning documentation.
If you find our work useful for your project, please consider citing the paper:
```bibtex
@article{motioncritic2024,
  title={Aligning Motion Generation with Human Perceptions},
  author={Wang, Haoru and Zhu, Wentao and Miao, Luyi and Xu, Yishu and Gao, Feng and Tian, Qi and Wang, Yizhou},
  journal={arXiv preprint arXiv:2407.02272},
  year={2024}
}
```
If you use MotionPercept and MotionCritic in your work, please also cite the original datasets and methods on which our work is based.
MDM:
```bibtex
@inproceedings{tevet2023human,
  title={Human Motion Diffusion Model},
  author={Guy Tevet and Sigal Raab and Brian Gordon and Yoni Shafir and Daniel Cohen-or and Amit Haim Bermano},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=SJ1kSyO2jwu}
}
```

HumanAct12:
```bibtex
@inproceedings{guo2020action2motion,
  title={Action2motion: Conditioned generation of 3d human motions},
  author={Guo, Chuan and Zuo, Xinxin and Wang, Sen and Zou, Shihao and Sun, Qingyao and Deng, Annan and Gong, Minglun and Cheng, Li},
  booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
  pages={2021--2029},
  year={2020}
}
```

FLAME:
```bibtex
@inproceedings{kim2023flame,
  title={Flame: Free-form language-based motion synthesis \& editing},
  author={Kim, Jihoon and Kim, Jiseob and Choi, Sungjoon},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={37},
  number={7},
  pages={8255--8263},
  year={2023}
}
```

UESTC:
```bibtex
@inproceedings{ji2018large,
  title={A large-scale RGB-D database for arbitrary-view human action recognition},
  author={Ji, Yanli and Xu, Feixiang and Yang, Yang and Shen, Fumin and Shen, Heng Tao and Zheng, Wei-Shi},
  booktitle={Proceedings of the 26th ACM International Conference on Multimedia},
  pages={1510--1518},
  year={2018}
}
```

DSTFormer:
```bibtex
@inproceedings{zhu2023motionbert,
  title={Motionbert: A unified perspective on learning human motion representations},
  author={Zhu, Wentao and Ma, Xiaoxuan and Liu, Zhaoyang and Liu, Libin and Wu, Wayne and Wang, Yizhou},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={15085--15099},
  year={2023}
}
```

SMPL:
```bibtex
@incollection{loper2023smpl,
  title={SMPL: A skinned multi-person linear model},
  author={Loper, Matthew and Mahmood, Naureen and Romero, Javier and Pons-Moll, Gerard and Black, Michael J},
  booktitle={Seminal Graphics Papers: Pushing the Boundaries, Volume 2},
  pages={851--866},
  year={2023}
}
```
We also recommend exploring other motion metrics, including PoseNDF, NPSS, NDMS, MoBERT, and PFC. You can also check out a survey of different motion generation metrics, datasets, and approaches.