
Official repo for MotionChain

MotionChain: Conversational Motion Controllers via Multimodal Prompts

arXiv Paper • Demo • FAQ • Citation

Intro MotionChain

MotionChain is a unified vision-motion-language generative pre-trained model that performs conversational motion generation tasks from multi-modal inputs via language models.

Technical details

Recent advancements in language models have demonstrated their adeptness at conducting multi-turn dialogues and retaining conversational context. However, this proficiency remains largely unexplored in other multimodal generative models, particularly in human motion models. By integrating multi-turn conversations into the control of continuous virtual human movements, generative human motion models can support an intuitive, step-by-step process of human task execution for humanoid robotics, game agents, or other embodied systems. In this work, we present MotionChain, a conversational human motion controller that generates continuous and long-term human motion from multimodal prompts. Specifically, MotionChain consists of multi-modal tokenizers that transform various data types, such as text, image, and motion, into discrete tokens, coupled with a Vision-Motion-aware Language model. By leveraging large-scale language, vision-language, and vision-motion data to assist motion-related generation tasks, MotionChain comprehends each instruction in a multi-turn conversation and generates human motions that follow these prompts. Extensive experiments validate the efficacy of MotionChain, demonstrating state-of-the-art performance in conversational motion generation, as well as a more intuitive manner of controlling and interacting with virtual humans.

(Figure: MotionChain pipeline)
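To make the tokenizer-plus-language-model design above concrete, here is a minimal, hypothetical sketch (not the official MotionChain code): a VQ-style motion tokenizer maps motion features to discrete token ids, so conversational motion generation reduces to next-token prediction over a shared vocabulary of text, image, and motion tokens. All class names, dimensions, and the codebook size are illustrative assumptions.

```python
# Hypothetical sketch, NOT the official MotionChain implementation.
# Illustrates the idea: motion features -> discrete tokens that a single
# autoregressive language model can consume alongside text/image tokens.
import torch
import torch.nn as nn

class ToyMotionTokenizer(nn.Module):
    """Assumed VQ-style motion tokenizer: motion features -> discrete token ids."""
    def __init__(self, feat_dim=263, codebook_size=512, hidden=256):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, hidden)          # per-frame encoder
        self.codebook = nn.Embedding(codebook_size, hidden)  # learned code vectors

    def encode(self, motion):                 # motion: (T, feat_dim)
        z = self.encoder(motion)              # (T, hidden)
        dist = torch.cdist(z, self.codebook.weight)  # distance to each code
        return dist.argmin(dim=-1)            # nearest code id per frame: (T,)

# Conceptual generation loop:
#   multi-turn prompt tokens (text + image) -> language model -> motion tokens
#   -> motion decoder -> continuous human motion
```

The sketch only covers the tokenization step; in the paper, the tokenizers and the Vision-Motion-aware Language model are trained on large-scale language, vision-language, and vision-motion data.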

🚩 News

  • [2024/07/15] Conversation dataset released.
  • [2024/04/02] Uploaded paper and initialized project 🔥🔥🔥

⚡ Quick Start

โ–ถ๏ธ Demo

👀 Visualization

โš ๏ธ FAQ

Question-and-Answer

📖 Citation

If you find our code or paper helpful, please consider citing:

@misc{jiang2024motionchain,
      title={MotionChain: Conversational Motion Controllers via Multimodal Prompts},
      author={Biao Jiang and Xin Chen and Chi Zhang and Fukun Yin and Zhuoyuan Li and Gang YU and Jiayuan Fan},
      year={2024},
      eprint={2404.01700},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgments

Thanks to BEDLAM, TMR, vector-quantize-pytorch, Motion-GPT, Motion-latent-diffusion, T2m-gpt, TEMOS, ACTOR, HumanML3D, and joints2smpl; our code partially borrows from them.

License

This code is distributed under an MIT LICENSE.

Note that our code depends on other libraries, including SMPL, SMPL-X, and PyTorch3D, and uses datasets that each have their own licenses, which must also be followed.
