Runfa Chen1, Jiaqi Han1, Fuchun Sun1 2, Wenbing Huang3 4
1Department of Computer Science and Technology, Institute for AI, BNRist Center, Tsinghua University, 2THU-Bosch JCML Center, 3Gaoling School of Artificial Intelligence, Renmin University of China, 4Beijing Key Laboratory of Big Data Management and Analysis Methods
This is a PyTorch-based implementation of our Subequivariant Graph Reinforcement Learning. In this work, we introduce a new morphology-agnostic RL benchmark that extends the widely adopted 2D-Planar setting to 3D-SGRL, permitting a significantly larger exploration space for agents with arbitrary initial locations and target directions. To learn a policy in this massive search space, we design SET, a novel model that preserves geometric symmetry by construction. Experimental results strongly support the necessity of encoding symmetry into the policy network and its wide applicability to learning to navigate in various 3D environments.
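To make the symmetry constraint concrete, here is a minimal NumPy sketch (an illustration, not the SET architecture itself) of subequivariance: since gravity breaks the full O(3) symmetry, the policy only needs to commute with rotations about the gravity (z) axis. Any map built from z-rotation invariants, such as the toy function below, satisfies this by construction:

```python
import numpy as np

def rot_z(theta):
    # Rotation about the gravity (z) axis -- the subgroup that remains a symmetry.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def toy_policy(x):
    # Toy subequivariant map: scale each 3D vector by a function of quantities
    # that are invariant under z-rotations (the norm and the height). Any such
    # map commutes with rot_z, but generally not with arbitrary 3D rotations.
    scale = np.tanh(np.linalg.norm(x, axis=-1, keepdims=True) + x[..., 2:3])
    return scale * x

x = np.random.randn(5, 3)          # 5 toy 3D feature vectors
R = rot_z(0.7)
out1 = toy_policy(x @ R.T)         # rotate inputs about z, then apply the map
out2 = toy_policy(x) @ R.T         # apply the map, then rotate outputs
assert np.allclose(out1, out2)     # subequivariance w.r.t. rotations about z
```

The same commutation test fails for a generic rotation that tilts the z-axis, which is exactly the distinction between full O(3) equivariance and the gravity-aware subequivariance used here.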
If you find this work useful in your research, please cite using the following BibTeX:
```bibtex
@inproceedings{chen2023sgrl,
  title        = {Subequivariant Graph Reinforcement Learning in 3D Environment},
  author       = {Chen, Runfa and Han, Jiaqi and Sun, Fuchun and Huang, Wenbing},
  booktitle    = {International Conference on Machine Learning},
  year         = {2023},
  organization = {PMLR}
}
```
- Python-3.8
- PyTorch-1.12
- CUDA-11.3
- MuJoCo-210
```sh
pip install --upgrade pip
pip install -r requirements.txt
```
| Flags and Parameters | Description |
|---|---|
| `--env_name <STRING>` | The name of the experiment project folder and the project name in wandb |
| `--morphologies <STRING>` | Find existing environments matching each keyword for training (e.g., walker, hopper, humanoid, cheetah, whh, cwhh) |
| `--expID <STRING>` | Experiment name for creating the saving directory |
| `--exp_path <STRING>` | The directory path where the experimental results are saved |
| `--config_path <STRING>` | The path to the configuration file |
| `--gpu <INT>` | The GPU device ID (e.g., 0, 1, 2, 3) |
| `--custom_xml <PATH>` | Path to a custom XML file for training the morphology-agnostic policy. When `<PATH>` is a file, train on that XML morphology only; when `<PATH>` is a directory, train on all XML morphologies found in the directory |
| `--actor_type <STRING>` | Type of actor to use (e.g., smp, swat, set, mlp) |
| `--critic_type <STRING>` | Type of critic to use (e.g., smp, swat, set, mlp) |
| `--seed <INT>` | (Optional) Seed for Gym, PyTorch and NumPy |
- Train SET on `3D_Hopper++` (3 variants of hopper):

```sh
cd src/
bash start.sh
```
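If you prefer invoking the trainer directly rather than editing `start.sh`, a run might look like the following. Note that the entry-point script name `main.py` and the `--env_name` value are assumptions for illustration; check `start.sh` for the actual script and its default arguments:

```sh
cd src/
# NOTE: main.py is a hypothetical entry point -- see start.sh for the real one.
python main.py \
    --env_name sgrl_hopper \
    --morphologies 3d_hopper \
    --expID demo_run \
    --actor_type set \
    --critic_type set \
    --gpu 0 \
    --seed 0
```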
| 3D Hopper | | |
|---|---|---|
| 3d_hopper_3_shin | 3d_hopper_4_lower_shin | 3d_hopper_5_full |
| 3D Humanoid | | | |
|---|---|---|---|
| 3d_humanoid_7_left_arm | 3d_humanoid_7_lower_arms | 3d_humanoid_7_right_arm | 3d_humanoid_7_right_leg |
| 3d_humanoid_8_left_knee | 3d_humanoid_9_full | 3d_humanoid_7_left_leg | 3d_humanoid_8_right_knee |
For the results reported in the paper, the following agents are in the held-out set for the corresponding experiments:
- 3D_Walker++: 3d_walker_3_left_knee_right_knee, 3d_walker_6_right_foot
- 3D_Humanoid++: 3d_humanoid_7_left_leg, 3d_humanoid_8_right_knee
- 3D_Cheetah++: 3d_cheetah_11_leftbkneen_rightffoot, 3d_cheetah_12_tail_leftffoot
All other agents in the corresponding experiments are used for training.
The RL code is based on this open-source implementation, and the morphology-agnostic implementation is built on top of the SMP (Huang et al., ICML 2020), Amorpheus (Kurin et al., ICLR 2021) and SWAT (Hong et al., ICLR 2022) repositories.