(ICML 2024) The official code for EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search
EvoRainbow integrates the latest advancements in ERL methods for policy search. Specifically, EvoRainbow contributes as follows:
- 🔱 Decomposes ERL methods into five perspectives, under which all current ERL work for policy search can be viewed as a combination of mechanisms from these five aspects.
- 📊 Analyzes different mechanisms within each aspect across tasks with diverse characteristics, identifying the most efficient mechanisms.
- 🎭 Integrates the most effective mechanisms into EvoRainbow and EvoRainbow-Exp.
- 🔑 EvoRainbow expands the testing domain of current ERL beyond MUJOCO tasks, advocating for researchers to focus on tasks with various other characteristics.
- 🏆 EvoRainbow achieves the best performance in the field of ERL for policy search across various tasks.
For more detailed information, please refer to our paper.
> [!TIP]
> 🔥 🔥 🔥 If you are interested in ERL for policy search or other hybrid algorithms combining EA and RL, we strongly recommend reading our survey paper: Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms. It provides a comprehensive and accessible overview of research directions and classifications suitable for researchers with various backgrounds.
EvoRainbow = Parallel Mode + Shared Architecture + CEM + Genetic Soft Update + H-Step Bootstrap with Critic. EvoRainbow-Exp = Parallel Mode + Private Architecture + CEM + Genetic Soft Update. Our repository primarily provides the implementation codes for EvoRainbow and EvoRainbow-Exp on MUJOCO and Metaworld. We will consider releasing the code for other environments in the future.
If you find our paper or this repository helpful (or would simply like to offer us some encouragement), please consider giving it a star and citing our paper.
@inproceedings{lievorainbow,
  title={EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search},
  author={Li, Pengyi and Zheng, Yan and Tang, Hongyao and Fu, Xian and Jianye, HAO},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024}
}
You need to create a Weights & Biases account for visualizing results, and you should already have conda and MUJOCO installed.
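For example, once your conda environment (created in the steps below) is active and assuming it includes the `wandb` package, you can authenticate the Weights & Biases CLI so experiments can upload their metrics:

```bash
# Log in to Weights & Biases; it will prompt for the API key from your account settings.
wandb login
```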
Different benchmarks require different versions of libraries, so we recommend constructing different environments based on the specific benchmarks.
First, select the task you want to experiment with. If it's a MUJOCO task, navigate to the MUJOCO directory by running `cd ./MUJOCO`. If it's a MetaWorld task, navigate to the MetaWorld directory by running `cd ./MetaWorld`.
Then, create the conda environment from the provided `environment.yml`:
conda env create -f environment.yml
Note: modify the `name` field in the `environment.yml` file to change the environment name to your desired name.
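Alternatively, if you prefer not to edit `environment.yml`, recent conda versions let you override the environment name on the command line with the `-n` flag (the name `evorainbow` below is just an example):

```bash
# Create the environment under a custom name without modifying environment.yml
conda env create -f environment.yml -n evorainbow
```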
Activate the environment:
conda activate env_name
Here, `env_name` refers to the name of the conda environment you created.
Then run `run.sh` directly. We recommend checking the commands inside the `run.sh` file first, as it uses `nohup`. If you want to run the commands directly in the foreground, please remove the `nohup`-related parts.
chmod 777 ./run.sh
./run.sh
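For reference, the entries in `run.sh` generally follow the pattern below; the script name and arguments here are placeholders rather than the actual contents of `run.sh`. Removing the `nohup`-related parts turns a background run into an ordinary foreground run:

```bash
# Hypothetical example only -- check run.sh for the real script name and arguments.
# Background form with nohup (survives logout, output redirected to a log file):
nohup python main.py -env HalfCheetah-v2 > halfcheetah.log 2>&1 &

# Foreground equivalent after removing the nohup-related parts:
python main.py -env HalfCheetah-v2
```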
In MUJOCO tasks:
- `-Value_Function` is a hyperparameter used to select between the Critic and PeVFA.
- `-theta` is the probability of using the surrogate.
- `-EA_tau` is the hyperparameter for Genetic Soft Update.
- `-damp` is the hyperparameter for CEM.

In MetaWorld tasks, we remove `-Value_Function` for simplicity. These hyperparameters should be set according to the original paper; other hyperparameters can be kept at their default settings. (For EvoRainbow on Humanoid tasks, `theta` is 0.8 instead of 0.2.)
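As a rough illustration of how these flags are passed (the script name and all values below are placeholders, not a verified command; take the real ones from `run.sh` and the original paper):

```bash
# Hypothetical invocation -- every value below is a placeholder.
python main.py -Value_Function 1 -theta 0.2 -EA_tau 0.3 -damp 0.001
```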
EvoRainbow is licensed under the MIT license. We use the implementations of other open-source ERL algorithms, including ERL, PDERL, CEM-RL, PGPS, and ERL-Re$^2$.
For any questions, please feel free to email lipengyi@tju.edu.cn.