FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions

The Humanoid Intelligence Team from FudanNLP and OpenMOSS

Project Webpage | Paper on arXiv | GitHub Code | Hugging Face Data | Hugging Face Model | License

🔥 Introduction

For more information, please refer to our project page and technical report.

Humanoid robots are capable of performing various actions such as greeting, dancing, and even backflipping. However, these motions are often hard-coded or specifically trained, which limits their versatility. In this work, we present FRoM-W1¹, an open-source framework designed to achieve general humanoid whole-body motion control using natural language.

To universally understand natural language and generate corresponding motions, and to enable various humanoid robots to execute these motions stably in the physical world under gravity, FRoM-W1 operates in two stages:

(a) H-GPT
Utilizing massive human data, a large-scale language-driven human whole-body motion generation model is trained to generate diverse natural behaviors. We further leverage the Chain-of-Thought technique to improve the model's generalization in instruction understanding.

(b) H-ACT
After retargeting the generated human whole-body motions into robot-specific actions, a motion controller that is pretrained and then fine-tuned through reinforcement learning in physical simulation enables humanoid robots to perform the corresponding actions accurately and stably. The controller is then deployed on real robots via a modular sim-to-real component.

We extensively evaluate FRoM-W1 on the Unitree H1 and G1 robots. Results demonstrate superior performance on the HumanML3D-X benchmark for human whole-body motion generation, and the reinforcement learning fine-tuning we introduce consistently improves both the motion tracking accuracy and the task success rates of these humanoid robots. We open-source the entire FRoM-W1 framework and hope it will advance the development of humanoid intelligence.

📑 Roadmap

  • 🎉 Release the initial codebase for the H-GPT and H-ACT modules
  • 🎉 Release the amazing humanoid-robot deployment framework RoboJuDo
  • Release the CoT datasets of the HumanML3D-X and Motion-X benchmarks, and the δHumanML3D-X benchmark
  • Release checkpoints for the baseline models: SMPL-X versions of T2M, MotionDiffuse, MLD, and T2M-GPT
  • 🎉 Release the Technical Report and Project Page of FRoM-W1!
  • More powerful models are in progress

💾 Datasets

Due to license restrictions, we cannot publicly share all of the data. Below are reference links for downloading and processing the relevant datasets:

H-GPT Module

  • HumanML3D-X: Please refer to the process in the Motion-X repo to download and process the corresponding AMASS data. The CoT part can be downloaded here.
  • δHumanML3D-X: After obtaining the HumanML3D-X data, replace its textual instructions with the perturbed versions provided here.
  • Motion-X: Please refer to the original Motion-X repo. Note that we did not use the Motion-X++ version; specifically, we used the version from [2024.2.6].

H-ACT Module

  • AMASS: Please refer to the download and processing procedures for the AMASS dataset in the human2humanoid project.
  • AMASS-H1: The retargeted dataset for the Unitree H1 can be obtained from the link provided by human2humanoid.
  • AMASS-G1: We provide a retargeted dataset for the Unitree G1, with the link available here.

🧠 Models

To keep the repo organized, we provide a subset of core model checkpoints below:

H-GPT Module

  • Eval Model: HuggingFace link; the evaluation models were trained following the T2M pipeline with the SMPL-X format.
  • Baseline Models: HuggingFace link, including the SMPL-X versions of the T2M, MotionDiffuse, MLD, and T2M-GPT models.
  • H-GPT w/o CoT: HuggingFace link; refer to this script to merge the LoRA parameters with the original Llama-3.1 model (see the merge sketch below).
  • H-GPT: HuggingFace link; refer to this script to merge the LoRA parameters with the original Llama-3.1 model.
  • H-GPT++ w/o CoT: HuggingFace link; refer to this script to merge the LoRA parameters with the original Llama-3.1 model.
  • H-GPT++: HuggingFace link; refer to this script to merge the LoRA parameters with the original Llama-3.1 model.
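
The LoRA-merge step above is the standard PEFT workflow. Below is a minimal sketch, assuming the released H-GPT checkpoints are ordinary PEFT LoRA adapters on top of a Llama-3.1 base model; the model id and paths are placeholders, and the merge script shipped in the repo should be preferred.

# Minimal LoRA-merge sketch. Assumes the H-GPT checkpoints are standard PEFT
# adapters for Llama-3.1; the base model id and all paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"   # placeholder: use the base model noted on the model card
adapter_dir = "path/to/h-gpt-lora"             # downloaded H-GPT LoRA checkpoint
output_dir = "path/to/h-gpt-merged"

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
model = PeftModel.from_pretrained(base, adapter_dir)  # attach the LoRA adapter
merged = model.merge_and_unload()                     # fold the LoRA weights into the base model

merged.save_pretrained(output_dir)
AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)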

H-ACT Module

  • H1-Full: Teacher Policy, Student Policy
  • H1-Clean: Teacher Policy, Student Policy
  • G1-Full: Teacher Policy, Student Policy
  • G1-Clean: Teacher Policy, Student Policy

If you require additional model checkpoints, please contact us.

🚀 Quick Start

1. Setup

Clone our GitHub repo:

git clone --depth 1 git@github.com:OpenMOSS/FRoM-W1.git
cd ./FRoM-W1

Setup the conda environment:

conda create -n fromw1 python=3.10
conda activate fromw1
pip install -r requirements.txt

2. Whole-Body Human Motion Generation

[26/02/04] Note: The motion generation part is not fully organized yet. We are currently cleaning it up and will update the files once they have been checked. Please stay tuned.

  • Download the H-GPT whole-body motion tokenizer and the motion generator from HuggingFace.
  • Replace the paths to the motion tokenizer and the motion generator at lines 55 and 78 of ./H-GPT/hGPT/configs/config_deployment_cot.yaml (a quick path sanity check is sketched below).
  • Run bash ./H-GPT/app.sh to deploy the H-GPT model as a Gradio app and generate human motions.
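
Before launching the app, it can help to verify that the checkpoint paths written into the config actually exist. The sketch below assumes nothing about the YAML's key names: it simply walks every string value in the file and flags path-like entries that are missing.

# Generic sanity check for ./H-GPT/hGPT/configs/config_deployment_cot.yaml:
# flag any path-like string value that does not exist on disk.
import os
import yaml  # pip install pyyaml

CONFIG = "./H-GPT/hGPT/configs/config_deployment_cot.yaml"

def iter_strings(node):
    # Recursively yield every string value in the parsed YAML tree.
    if isinstance(node, dict):
        for v in node.values():
            yield from iter_strings(v)
    elif isinstance(node, list):
        for v in node:
            yield from iter_strings(v)
    elif isinstance(node, str):
        yield node

with open(CONFIG) as f:
    cfg = yaml.safe_load(f)

for value in iter_strings(cfg):
    if ("/" in value or value.endswith((".pt", ".ckpt", ".bin", ".safetensors"))) and not os.path.exists(value):
        print(f"[warning] path not found: {value}")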

3. Human-to-Humanoid Motion Retargeting

After generating a human motion sequence, we need to retarget it into specific humanoid robot poses.

(a) The retargeting module requires the SMPL (body) and MANO (hand) models. Before use, download the corresponding model files:

  • Download SMPL models: Visit the SMPL official website and download SMPL_NEUTRAL.pkl, SMPL_MALE.pkl, and SMPL_FEMALE.pkl into models/smpl. Note: download the SMPL_python_v.1.1.0.zip file, unzip it, and rename ./models/basicmodel_{m/f/neutral}_lbs_10_207_0_v1.1.0.pkl to ./models/SMPL_{MALE/FEMALE/NEUTRAL}.pkl.
  • Download MANO models: Visit the MANO official website and download the model files (MANO_LEFT.pkl, MANO_RIGHT.pkl, via the Models & Code link) into models/mano.

The folder structure should look like this:

./H-ACT/retarget/models/
├── smpl/
│   ├── SMPL_NEUTRAL.pkl
│   ├── SMPL_MALE.pkl
│   └── SMPL_FEMALE.pkl
└── mano/
    ├── MANO_LEFT.pkl
    └── MANO_RIGHT.pkl

You may need the MANO lib for hand visualization.

pip install git+https://github.com/otaheri/MANO

(b) Then download the retargeting assets via this Hugging Face link and put them into the ./H-ACT/retarget/assets folder. The folder structure should look like this:

./H-ACT/retarget/assets/
├── beta/
├── meta/
└── robot/
    ├── dex3/
    ├── g1/
    ├── h1/
    └── inspire/

(c) Then put the motion feature sequences generated by H-GPT into the ./H-ACT/retarget/data/ folder. You should have:

./H-ACT/retarget/data/
├── 623/                       # stores the 623-dimensional motion data generated by H-GPT
│   ├── data1.npy              # output file from H-GPT
│   └── data2.npy
├── smplx/                     # stores intermediate SMPL-X motion sequences
└── output/                    # stores final robot and dexterous-hand joint sequences

We have put an example motion in the ./H-ACT/retarget/data/623 folder.
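
To sanity-check an H-GPT output before retargeting, you can load it with NumPy and inspect its shape; each file is expected to hold per-frame 623-dimensional motion features (the exact feature layout is defined by H-GPT), and the file name below is just the one from the layout above.

# Quick inspection of an H-GPT output file before retargeting.
# Each .npy is expected to contain per-frame 623-dimensional motion features.
import numpy as np

motion = np.load("./H-ACT/retarget/data/623/data1.npy")  # substitute your own output file
print("shape:", motion.shape, "dtype:", motion.dtype)
if motion.ndim != 2 or 623 not in motion.shape:
    print("[warning] unexpected shape for a 623-dim motion feature sequence")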

(d) Finally, run the following command to retarget the motion representations into robot-specific joint sequences:

cd ./H-ACT/retarget
python main.py

The module currently supports the following robots and dexterous hands:

  • Unitree H1
  • Unitree G1
  • Inspire Hand
  • Dex3 Hand

You can modify lines 47–48 in ./H-ACT/retarget/main.py to select a target robot:

robot_data = process_data(amass_data, "G1")  # available robots: H1, G1, H121 (H1 with 19 DoF plus 2 wrist DoF)
hand_data = retarget_from_rotvec(smpl_dict['poses'][:, 66:], hand_type="dex3")  # available hands: inspire, dex3

The terminal output should look like this:

tensor([0.0014, 0.0014], grad_fn=<SelectBackward0>)
[MujocoKinematics] Loaded 14 joints from assets/robot/dex3/dex3.xml
(256, 29)

Note: Motion controllers like BeyondMimic require input motion data in CSV format, so you first have to convert the retargeted robot motion data to CSV. We provide a Python script for this: ./H-ACT/retarget/scripts/pkl_2_csv.py.
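
The conversion itself is simple; the sketch below only illustrates the idea and assumes nothing about the pickle's key names (it dumps every 2-D array it finds to its own CSV). The provided pkl_2_csv.py remains the reference implementation, since specific controllers may expect a particular column layout.

# Illustrative pkl -> CSV conversion; prefer ./H-ACT/retarget/scripts/pkl_2_csv.py.
# Assumes nothing about the pickle's keys: every 2-D array (frames x values)
# found in the file is written to its own CSV.
import pickle
import numpy as np

with open("./H-ACT/retarget/data/output/0_feats_out.pkl", "rb") as f:
    data = pickle.load(f)

arrays = data if isinstance(data, dict) else {"motion": data}
for name, value in arrays.items():
    if isinstance(value, np.ndarray) and value.ndim == 2:
        np.savetxt(f"{name}.csv", value, delimiter=",")
        print(f"wrote {name}.csv with shape {value.shape}")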

4. Sim and Real Humanoid Robot Deployment

After obtaining the retargeted robot sequence, you can conveniently use our RoboJuDo repo to track it with various policies in both simulation and real-world scenarios.

RoboJuDo supports:

  • A unified, clean interface for integrating custom policy models with minimal effort
  • Sim2sim & sim2real deployment using BeyondMimic, Human2Humanoid, TWIST, and more
  • Pretrained policy models for quick real-robot deployment

We have made RoboJuDo available as a standalone module for everyone to use, so you need to set it up by following the instructions in the RoboJuDo README.

We have placed a retargeted g1+dex3 example pkl file, 0_feats_out.pkl, in the ./H-ACT/retarget/data/output folder. After setting up the RoboJuDo module, copy it to the assets/motions/g1/phc_29/singles directory of RoboJuDo, modify the motion_name path in the G1MotionCtrlCfg class within RoboJuDo/robojudo/config/g1/ctrl/g1_motion_ctrl_cfg.py to match the path of the pkl file in the assets directory, and then run

python scripts/run_pipeline.py -c g1_h2h

to track the motion in the simulation.
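
The config edit is a one-line change; the snippet below shows roughly what it looks like. The actual G1MotionCtrlCfg class in RoboJuDo contains additional fields, and the exact form of the path follows RoboJuDo's conventions, so treat this only as an orientation aid.

# RoboJuDo/robojudo/config/g1/ctrl/g1_motion_ctrl_cfg.py (sketch: the real class has
# more fields; only the motion_name entry needs to point at the copied pkl file)
class G1MotionCtrlCfg:
    motion_name = "motions/g1/phc_29/singles/0_feats_out.pkl"  # assumed form: path under RoboJuDo's assets directory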

Since H2H is an earlier work, its tracking performance may be relatively limited. You can use newer and better tracking policies in RoboJuDo, such as TWIST and BeyondMimic.

Have fun with it!

πŸ› οΈ Model Training and Evaluation

1. H-GPT

Please refer to the corresponding H-GPT README file in the subfolder.

2. H-ACT

Please refer to the corresponding H-ACT README file in the subfolder.

πŸ™ Acknowledgements

We extend our gratitude to Biao Jiang for discussions and assistance regarding the motion generation models, and to Tairan He and Ziwen Zhuang for their discussions and help with the motion tracking part.

We also thank all the relevant open-source datasets and codebases; it is these open-source projects that have propelled the advancement of the entire field!

📄 Citation

If you find our work useful, please cite it as follows:

@misc{li2026fromw1generalhumanoidwholebody,
      title={FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions}, 
      author={Peng Li and Zihan Zhuang and Yangfan Gao and Yi Dong and Sixian Li and Changhao Jiang and Shihan Dou and Zhiheng Xi and Enyu Zhou and Jixuan Huang and Hui Li and Jingjing Gong and Xingjun Ma and Tao Gui and Zuxuan Wu and Qi Zhang and Xuanjing Huang and Yu-Gang Jiang and Xipeng Qiu},
      year={2026},
      eprint={2601.12799},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2601.12799}, 
}

You are welcome to star ⭐ our GitHub repo, raise issues, and submit PRs!

Footnotes

  1. Foundational Humanoid Robot Model - Whole-Body Control, Version 1
