diff --git a/README.md b/README.md
index b91fc42f..ea9b3da6 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,8 @@
-
-
-
+[comment]: <> ()
+
+[comment]: <> ()
+
+[comment]: <> ()
MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library
@@ -10,10 +12,9 @@
[![GitHub issues](https://img.shields.io/github/issues/Replicable-MARL/MARLlib)](https://github.com/Replicable-MARL/MARLlib/issues)
[![PyPI version](https://badge.fury.io/py/marllib.svg)](https://badge.fury.io/py/marllib)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Replicable-MARL/MARLlib/blob/sy_dev/marllib.ipynb)
-[![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/resources/awesome.html)
[![Organization](https://img.shields.io/badge/Organization-ReLER_RL-blue.svg)](https://github.com/Replicable-MARL/MARLlib)
[![Organization](https://img.shields.io/badge/Organization-PKU_MARL-blue.svg)](https://github.com/Replicable-MARL/MARLlib)
-
+[![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/resources/awesome.html)
> __News__:
> We are excited to announce that a major update has just been released. For detailed version information, please refer to the [version info](https://github.com/Replicable-MARL/MARLlib/releases/tag/1.0.2).
@@ -55,7 +56,7 @@ Here we provide a table for the comparison of MARLlib and existing work.
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | 4 cooperative | 1 | share + separate | MLP + GRU | :x: |
| [MAlib](https://github.com/sjtu-marl/malib) | 4 self-play | 10 | share + group + separate | MLP + LSTM | [![Documentation Status](https://readthedocs.org/projects/malib/badge/?version=latest)](https://malib.readthedocs.io/en/latest/?badge=latest)
| [EPyMARL](https://github.com/uoe-agents/epymarl)| 4 cooperative | 9 | share + separate | GRU | :x: |
-| **[MARLlib](https://github.com/Replicable-MARL/MARLlib)** | 11 **no task mode restriction** | 18 | share + group + separate + **customizable** | MLP + CNN + GRU + LSTM | [![Documentation Status](https://readthedocs.org/projects/marllib/badge/?version=latest)](https://marllib.readthedocs.io/en/latest/) |
+| **[MARLlib](https://github.com/Replicable-MARL/MARLlib)** | 12 **no task mode restriction** | 18 | share + group + separate + **customizable** | MLP + CNN + GRU + LSTM | [![Documentation Status](https://readthedocs.org/projects/marllib/badge/?version=latest)](https://marllib.readthedocs.io/en/latest/) |
| Library | Github Stars | Documentation | Issues Open | Activity | Last Update
|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|
@@ -108,7 +109,7 @@ First, install MARLlib dependencies to guarantee basic usage.
following [this guide](https://marllib.readthedocs.io/en/latest/handbook/env.html), finally install patches for RLlib.
```bash
-$ conda create -n marllib python=3.8
+$ conda create -n marllib python=3.8 # or 3.9
$ conda activate marllib
$ git clone https://github.com/Replicable-MARL/MARLlib.git && cd MARLlib
$ pip install -r requirements.txt
@@ -185,6 +186,7 @@ Most of the popular environments in MARL research are supported by MARLlib:
| **[GRF](https://github.com/google-research/football)** | collaborative + mixed | Full | Discrete | 2D |
| **[Hanabi](https://github.com/deepmind/hanabi-learning-environment)** | cooperative | Partial | Discrete | 1D |
| **[MATE](https://github.com/XuehaiPan/mate)** | cooperative + mixed | Partial | Both | 1D |
+| **[GoBigger](https://github.com/opendilab/GoBigger)** | cooperative + mixed | Both | Continuous | 1D |
Each environment has a readme file, standing as the instruction for this task, including env settings, installation, and
important notes.
@@ -320,7 +322,11 @@ More tutorial documentations are available [here](https://marllib.readthedocs.io
## Awesome List
-A collection of research and review papers of multi-agent reinforcement learning (MARL) is available [here](https://marllib.readthedocs.io/en/latest/resources/awesome.html). The papers have been organized based on their publication date and their evaluation of the corresponding environments.
+A collection of research and review papers on multi-agent reinforcement learning (MARL) is available. The papers are organized by publication date and by the environments they are evaluated on.
+
+- Algorithms: [![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/resources/awesome.html)
+- Environments: [![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/handbook/env.html)
+
## Community
diff --git a/ROADMAP.md b/ROADMAP.md
index 3ead2415..eed984ae 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -11,7 +11,8 @@ This list describes the planned features including breaking changes.
- [ ] manual training, refer to issue: https://github.com/Replicable-MARL/MARLlib/issues/86#issuecomment-1468188682
- [ ] new environments
- [x] MATE: https://github.com/UnrealTracking/mate
- - [ ] Go-Bigger: https://github.com/opendilab/GoBigger
+ - [x] Go-Bigger: https://github.com/opendilab/GoBigger
- [ ] Voltage Control: https://github.com/Future-Power-Networks/MAPDN
- [ ] Overcooked: https://github.com/HumanCompatibleAI/overcooked_ai
-- [ ] Support Transformer architecture
+ - [ ] CloseAirCombat: https://github.com/liuqh16/CloseAirCombat
+- [ ] Support Transformer-based architectures
diff --git a/docs/source/handbook/env.rst b/docs/source/handbook/env.rst
index f3dcfd6c..baec9148 100644
--- a/docs/source/handbook/env.rst
+++ b/docs/source/handbook/env.rst
@@ -594,4 +594,52 @@ Installation
.. code-block:: shell
- pip3 install git+https://github.com/XuehaiPan/mate.git#egg=mate
\ No newline at end of file
+ pip3 install git+https://github.com/XuehaiPan/mate.git#egg=mate
+
+
+.. _GoBigger:
+
+GoBigger
+==============
+.. only:: html
+
+ .. figure:: images/env_gobigger.gif
+ :width: 320
+ :align: center
+
+
+GoBigger is a game engine that offers an efficient and easy-to-use platform for agar-like game development, providing a variety of interfaces designed specifically for game AI development. Its game mechanics are similar to those of Agar, a popular massively multiplayer online action game created by the Brazilian developer Matheus Valadares. The objective in GoBigger is to navigate one or more circular balls across a map, consuming Food Balls and smaller balls to grow while avoiding larger balls that can consume them. Each player starts with a single ball but can split it into two once it reaches a certain size, gaining control over multiple balls.
+
+Official Link: https://github.com/opendilab/GoBigger
+
+.. list-table::
+ :widths: 25 25
+ :header-rows: 0
+
+ * - ``Original Learning Mode``
+ - Cooperative + Mixed
+ * - ``MARLlib Learning Mode``
+ - Cooperative + Mixed
+ * - ``Observability``
+ - Partial + Full
+ * - ``Action Space``
+ - Continuous
+ * - ``Observation Space Dim``
+ - 1D
+ * - ``Action Mask``
+ - No
+ * - ``Global State``
+ - No
+ * - ``Global State Space Dim``
+ - /
+ * - ``Reward``
+ - Dense
+ * - ``Agent-Env Interact Mode``
+ - Simultaneous
+
+
+Installation
+-----------------
+
+.. code-block:: shell
+
+ conda install -c opendilab gobigger
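+
+
+Usage
+-----------------
+
+A minimal sketch of training on GoBigger through MARLlib's high-level API. The algorithm (MAPPO), model settings, and stop condition below are illustrative choices, not required values:
+
+.. code-block:: python
+
+    from marllib import marl
+
+    # make the GoBigger environment; the map name follows st_t{teams}p{players}
+    env = marl.make_env(environment_name="gobigger", map_name="st_t1p2")
+
+    # pick an algorithm and build the model
+    mappo = marl.algos.mappo(hyperparam_source="common")
+    model = marl.build_model(env, mappo, {"core_arch": "mlp", "encode_layer": "128-256"})
+
+    # launch training
+    mappo.fit(env, model, stop={"timesteps_total": 1000000}, share_policy="group")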
diff --git a/docs/source/images/env_gobigger.gif b/docs/source/images/env_gobigger.gif
new file mode 100644
index 00000000..918f74fa
Binary files /dev/null and b/docs/source/images/env_gobigger.gif differ
diff --git a/marllib/envs/base_env/__init__.py b/marllib/envs/base_env/__init__.py
index c06d11be..c917fff3 100644
--- a/marllib/envs/base_env/__init__.py
+++ b/marllib/envs/base_env/__init__.py
@@ -88,3 +88,9 @@
except Exception as e:
ENV_REGISTRY["mate"] = str(e)
+try:
+ from marllib.envs.base_env.gobigger import RLlibGoBigger
+ ENV_REGISTRY["gobigger"] = RLlibGoBigger
+except Exception as e:
+ ENV_REGISTRY["gobigger"] = str(e)
+
diff --git a/marllib/envs/base_env/config/gobigger.yaml b/marllib/envs/base_env/config/gobigger.yaml
new file mode 100644
index 00000000..aee7b3c5
--- /dev/null
+++ b/marllib/envs/base_env/config/gobigger.yaml
@@ -0,0 +1,33 @@
+# MIT License
+
+# Copyright (c) 2023 Replicable-MARL
+
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+env: gobigger
+
+env_args:
+  map_name: "st_t1p2" # st(andard)_t(eam)1p(layer)2: one team of two players
+  # num_teams: 1
+  # num_agents: 2
+  frame_limit: 1600
+
+mask_flag: False
+global_state_flag: False
+opp_action_in_cc: True
+fixed_batch_timesteps: 3200 # optional; all scenarios use this batch size (only valid for on-policy algorithms)
diff --git a/marllib/envs/base_env/gobigger.py b/marllib/envs/base_env/gobigger.py
new file mode 100644
index 00000000..dbbb003a
--- /dev/null
+++ b/marllib/envs/base_env/gobigger.py
@@ -0,0 +1,202 @@
+# MIT License
+
+# Copyright (c) 2023 Replicable-MARL
+
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+import copy
+
+from gobigger.envs import create_env_custom
+from gym.spaces import Dict as GymDict, Box
+from ray.rllib.env.multi_agent_env import MultiAgentEnv
+import numpy as np
+
+
+policy_mapping_dict = {
+ "all_scenario": {
+ "description": "mixed scenarios to t>2 (num_teams > 1)",
+ "team_prefix": ("team0_", "team1_"),
+ "all_agents_one_policy": True,
+ "one_agent_one_policy": True,
+ },
+}
+
+
+class RLlibGoBigger(MultiAgentEnv):
+
+ def __init__(self, env_config):
+
+ map_name = env_config["map_name"]
+
+ env_config.pop("map_name", None)
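+        # map_name encodes the scenario, e.g. "st_t2p2" -> 2 teams, 2 players per team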
+ self.num_agents_per_team = int(map_name.split("p")[-1][0])
+ self.num_teams = int(map_name.split("_t")[1][0])
+ if self.num_teams == 1:
+ policy_mapping_dict["all_scenario"]["team_prefix"] = ("team0_",)
+ self.num_agents = self.num_agents_per_team * self.num_teams
+ self.max_steps = env_config["frame_limit"]
+ self.env = create_env_custom(type='st', cfg=dict(
+ team_num=self.num_teams,
+ player_num_per_team=self.num_agents_per_team,
+ frame_limit=self.max_steps
+ ))
+
+ self.action_space = Box(low=-1,
+ high=1,
+ shape=(2,),
+ dtype=float)
+
+ self.rectangle_dim = 4
+ self.food_dim = self.num_agents * 100
+ self.thorns_dim = self.num_agents * 6
+ self.clone_dim = self.num_agents * 10
+ self.team_name_dim = 1
+ self.score_dim = 1
+
+ self.obs_dim = self.rectangle_dim + self.food_dim + self.thorns_dim + \
+ self.clone_dim + self.team_name_dim + self.score_dim
+
+ self.observation_space = GymDict({"obs": Box(
+ low=-1e6,
+ high=1e6,
+ shape=(self.obs_dim,),
+ dtype=float)})
+
+ self.agents = []
+ for team_index in range(self.num_teams):
+ for agent_index in range(self.num_agents_per_team):
+ self.agents.append("team{}_{}".format(team_index, agent_index))
+
+ env_config["map_name"] = map_name
+ self.env_config = env_config
+
+ def reset(self):
+ original_obs = self.env.reset()
+ obs = {}
+ for agent_index, agent_name in enumerate(self.agents):
+
+ rectangle = list(original_obs[1][agent_index]["rectangle"])
+
+ overlap_dict = original_obs[1][agent_index]["overlap"]
+
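+            # overlap holds variable-length entity lists; each is truncated or
+            # zero-padded to a fixed length (food: 4 values per entry,
+            # thorns: 6, clone: 10) so the flat observation is always obs_dim long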
+ food = overlap_dict["food"]
+ if 4 * len(food) > self.food_dim:
+ food = food[:self.food_dim // 4]
+ else:
+ padding = [0] * (self.food_dim - 4 * len(food))
+ food.append(padding)
+ food = [item for sublist in food for item in sublist]
+
+ thorns = overlap_dict["thorns"]
+ if 6 * len(thorns) > self.thorns_dim:
+ thorns = thorns[:self.thorns_dim // 6]
+ else:
+ padding = [0] * (self.thorns_dim - 6 * len(thorns))
+ thorns.append(padding)
+ thorns = [item for sublist in thorns for item in sublist]
+
+ clone = overlap_dict["clone"]
+ if 10 * len(clone) > self.clone_dim:
+ clone = clone[:self.clone_dim // 10]
+ else:
+ padding = [0] * (self.clone_dim - 10 * len(clone))
+ clone.append(padding)
+ clone = [item for sublist in clone for item in sublist]
+
+ team = original_obs[1][agent_index]["team_name"]
+ score = original_obs[1][agent_index]["score"]
+
+ all_elements = rectangle + food + thorns + clone + [team] + [score]
+ all_elements = np.array(all_elements, dtype=float)
+
+ obs[agent_name] = {
+ "obs": all_elements
+ }
+
+ return obs
+
+ def step(self, action_dict):
+ actions = {}
+ for i, agent_name in enumerate(self.agents):
+ actions[i] = list(action_dict[agent_name])
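+            # GoBigger expects [direction_x, direction_y, action_type];
+            # the appended -1 keeps the discrete action_type slot at "no action"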
+ actions[i].append(-1)
+
+ original_obs, team_rewards, done, info = self.env.step(actions)
+
+ rewards = {}
+ obs = {}
+ infos = {}
+
+ for agent_index, agent_name in enumerate(self.agents):
+
+ rectangle = list(original_obs[1][agent_index]["rectangle"])
+
+ overlap_dict = original_obs[1][agent_index]["overlap"]
+
+ food = overlap_dict["food"]
+ if 4 * len(food) > self.food_dim:
+ food = food[:self.food_dim // 4]
+ else:
+ padding = [0] * (self.food_dim - 4 * len(food))
+ food.append(padding)
+ food = [item for sublist in food for item in sublist]
+
+ thorns = overlap_dict["thorns"]
+ if 6 * len(thorns) > self.thorns_dim:
+ thorns = thorns[:self.thorns_dim // 6]
+ else:
+ padding = [0] * (self.thorns_dim - 6 * len(thorns))
+ thorns.append(padding)
+ thorns = [item for sublist in thorns for item in sublist]
+
+ clone = overlap_dict["clone"]
+ if 10 * len(clone) > self.clone_dim:
+ clone = clone[:self.clone_dim // 10]
+ else:
+ padding = [0] * (self.clone_dim - 10 * len(clone))
+ clone.append(padding)
+ clone = [item for sublist in clone for item in sublist]
+
+ team = original_obs[1][agent_index]["team_name"]
+ score = original_obs[1][agent_index]["score"]
+
+ all_elements = rectangle + food + thorns + clone + [team] + [score]
+ all_elements = np.array(all_elements, dtype=float)
+
+ obs[agent_name] = {
+ "obs": all_elements
+ }
+
+ rewards[agent_name] = team_rewards[team]
+
+ dones = {"__all__": done}
+ return obs, rewards, dones, infos
+
+ def get_env_info(self):
+ env_info = {
+ "space_obs": self.observation_space,
+ "space_act": self.action_space,
+ "num_agents": self.num_agents,
+ "episode_limit": self.max_steps,
+ "policy_mapping_info": policy_mapping_dict
+ }
+ return env_info
+
+ def close(self):
+ self.env.close()
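+
+
+if __name__ == "__main__":
+    # Minimal smoke test, not part of the library API. Assumes GoBigger is
+    # installed and that "st_t2p2" / frame_limit=100 are valid settings.
+    env = RLlibGoBigger({"map_name": "st_t2p2", "frame_limit": 100})
+    obs = env.reset()
+    actions = {agent: env.action_space.sample() for agent in env.agents}
+    obs, rewards, dones, infos = env.step(actions)
+    print("agents:", env.agents)
+    print("step rewards:", rewards)
+    env.close()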
diff --git a/marllib/envs/global_reward_env/__init__.py b/marllib/envs/global_reward_env/__init__.py
index d5088910..8ab46a8d 100644
--- a/marllib/envs/global_reward_env/__init__.py
+++ b/marllib/envs/global_reward_env/__init__.py
@@ -24,56 +24,70 @@
try:
from marllib.envs.global_reward_env.mpe_fcoop import RLlibMPE_FCOOP
+
COOP_ENV_REGISTRY["mpe"] = RLlibMPE_FCOOP
except Exception as e:
COOP_ENV_REGISTRY["mpe"] = str(e)
try:
from marllib.envs.global_reward_env.magent_fcoop import RLlibMAgent_FCOOP
+
COOP_ENV_REGISTRY["magent"] = RLlibMAgent_FCOOP
except Exception as e:
COOP_ENV_REGISTRY["magent"] = str(e)
try:
from marllib.envs.global_reward_env.mamujoco_fcoop import RLlibMAMujoco_FCOOP
+
COOP_ENV_REGISTRY["mamujoco"] = RLlibMAMujoco_FCOOP
except Exception as e:
COOP_ENV_REGISTRY["mamujoco"] = str(e)
try:
from marllib.envs.global_reward_env.smac_fcoop import RLlibSMAC_FCOOP
+
COOP_ENV_REGISTRY["smac"] = RLlibSMAC_FCOOP
except Exception as e:
COOP_ENV_REGISTRY["smac"] = str(e)
try:
from marllib.envs.global_reward_env.football_fcoop import RLlibGFootball_FCOOP
+
COOP_ENV_REGISTRY["football"] = RLlibGFootball_FCOOP
except Exception as e:
COOP_ENV_REGISTRY["football"] = str(e)
try:
from marllib.envs.global_reward_env.rware_fcoop import RLlibRWARE_FCOOP
+
COOP_ENV_REGISTRY["rware"] = RLlibRWARE_FCOOP
except Exception as e:
COOP_ENV_REGISTRY["rware"] = str(e)
try:
from marllib.envs.global_reward_env.lbf_fcoop import RLlibLBF_FCOOP
+
COOP_ENV_REGISTRY["lbf"] = RLlibLBF_FCOOP
except Exception as e:
COOP_ENV_REGISTRY["lbf"] = str(e)
try:
from marllib.envs.global_reward_env.pommerman_fcoop import RLlibPommerman_FCOOP
+
COOP_ENV_REGISTRY["pommerman"] = RLlibPommerman_FCOOP
except Exception as e:
COOP_ENV_REGISTRY["pommerman"] = str(e)
-
try:
from marllib.envs.global_reward_env.mate_fcoop import RLlibMATE_FCOOP
+
COOP_ENV_REGISTRY["mate"] = RLlibMATE_FCOOP
except Exception as e:
COOP_ENV_REGISTRY["mate"] = str(e)
+try:
+ from marllib.envs.global_reward_env.gobigger_fcoop import RLlibGoBigger_FCOOP
+
+ COOP_ENV_REGISTRY["gobigger"] = RLlibGoBigger_FCOOP
+except Exception as e:
+ COOP_ENV_REGISTRY["gobigger"] = str(e)
diff --git a/marllib/envs/global_reward_env/gobigger_fcoop.py b/marllib/envs/global_reward_env/gobigger_fcoop.py
new file mode 100644
index 00000000..455314c8
--- /dev/null
+++ b/marllib/envs/global_reward_env/gobigger_fcoop.py
@@ -0,0 +1,207 @@
+# MIT License
+
+# Copyright (c) 2023 Replicable-MARL
+
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+import copy
+
+from gobigger.envs import create_env_custom
+from gym.spaces import Dict as GymDict, Box
+from ray.rllib.env.multi_agent_env import MultiAgentEnv
+import numpy as np
+
+policy_mapping_dict = {
+ "all_scenario": {
+ "description": "cooperative scenarios to t=1 (num_teams=1)",
+ "team": ("team0_"),
+ "all_agents_one_policy": True,
+ "one_agent_one_policy": True,
+ },
+}
+
+
+class RLlibGoBigger_FCOOP(MultiAgentEnv):
+
+ def __init__(self, env_config):
+
+ map_name = env_config["map_name"]
+
+ env_config.pop("map_name", None)
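+        # only the players-per-team digit of map_name is used here; the
+        # cooperative wrapper always runs a single team (e.g. "st_t1p2" -> 2 players)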
+ self.num_agents_per_team = int(map_name.split("p")[-1][0])
+ self.num_teams = 1
+ self.num_agents = self.num_agents_per_team * self.num_teams
+ self.max_steps = env_config["frame_limit"]
+ self.env = create_env_custom(type='st', cfg=dict(
+ team_num=self.num_teams,
+ player_num_per_team=self.num_agents_per_team,
+ frame_limit=self.max_steps
+ ))
+
+ self.action_space = Box(low=-1,
+ high=1,
+ shape=(2,),
+ dtype=float)
+
+ self.rectangle_dim = 4
+ self.food_dim = self.num_agents * 100
+ self.thorns_dim = self.num_agents * 6
+ self.clone_dim = self.num_agents * 10
+ self.team_name_dim = 1
+ self.score_dim = 1
+
+ self.obs_dim = self.rectangle_dim + self.food_dim + self.thorns_dim + \
+ self.clone_dim + self.team_name_dim + self.score_dim
+
+ self.observation_space = GymDict({"obs": Box(
+ low=-1e6,
+ high=1e6,
+ shape=(self.obs_dim,),
+ dtype=float)})
+
+ self.agents = []
+ for team_index in range(self.num_teams):
+ for agent_index in range(self.num_agents_per_team):
+ self.agents.append("team{}_{}".format(team_index, agent_index))
+
+ env_config["map_name"] = map_name
+ self.env_config = env_config
+
+ def reset(self):
+ original_obs = self.env.reset()
+ obs = {}
+ for agent_index, agent_name in enumerate(self.agents):
+
+ rectangle = list(original_obs[1][agent_index]["rectangle"])
+
+ overlap_dict = original_obs[1][agent_index]["overlap"]
+
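+            # same fixed-length flattening as RLlibGoBigger: truncate or
+            # zero-pad the food / thorns / clone lists to match obs_dim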
+ food = overlap_dict["food"]
+ if 4 * len(food) > self.food_dim:
+ food = food[:self.food_dim // 4]
+ else:
+ padding = [0] * (self.food_dim - 4 * len(food))
+ food.append(padding)
+ food = [item for sublist in food for item in sublist]
+
+ thorns = overlap_dict["thorns"]
+ if 6 * len(thorns) > self.thorns_dim:
+ thorns = thorns[:self.thorns_dim // 6]
+ else:
+ padding = [0] * (self.thorns_dim - 6 * len(thorns))
+ thorns.append(padding)
+ thorns = [item for sublist in thorns for item in sublist]
+
+ clone = overlap_dict["clone"]
+ if 10 * len(clone) > self.clone_dim:
+ clone = clone[:self.clone_dim // 10]
+ else:
+ padding = [0] * (self.clone_dim - 10 * len(clone))
+ clone.append(padding)
+ clone = [item for sublist in clone for item in sublist]
+
+ team = original_obs[1][agent_index]["team_name"]
+ score = original_obs[1][agent_index]["score"]
+
+ all_elements = rectangle + food + thorns + clone + [team] + [score]
+
+            if len(all_elements) != self.obs_dim:
+                raise ValueError(
+                    "observation has length {} but obs_dim is {}".format(
+                        len(all_elements), self.obs_dim))
+
+ all_elements = np.array(all_elements, dtype=float)
+
+ obs[agent_name] = {
+ "obs": all_elements
+ }
+
+ return obs
+
+ def step(self, action_dict):
+ actions = {}
+ for i, agent_name in enumerate(self.agents):
+ actions[i] = list(action_dict[agent_name])
+ actions[i].append(-1)
+
+ original_obs, team_rewards, done, info = self.env.step(actions)
+
+ rewards = {}
+ obs = {}
+ infos = {}
+
+ for agent_index, agent_name in enumerate(self.agents):
+
+ rectangle = list(original_obs[1][agent_index]["rectangle"])
+
+ overlap_dict = original_obs[1][agent_index]["overlap"]
+
+ food = overlap_dict["food"]
+ if 4 * len(food) > self.food_dim:
+ food = food[:self.food_dim // 4]
+ else:
+ padding = [0] * (self.food_dim - 4 * len(food))
+ food.append(padding)
+ food = [item for sublist in food for item in sublist]
+
+ thorns = overlap_dict["thorns"]
+ if 6 * len(thorns) > self.thorns_dim:
+ thorns = thorns[:self.thorns_dim // 6]
+ else:
+ padding = [0] * (self.thorns_dim - 6 * len(thorns))
+ thorns.append(padding)
+ thorns = [item for sublist in thorns for item in sublist]
+
+ clone = overlap_dict["clone"]
+ if 10 * len(clone) > self.clone_dim:
+ clone = clone[:self.clone_dim // 10]
+ else:
+ padding = [0] * (self.clone_dim - 10 * len(clone))
+ clone.append(padding)
+ clone = [item for sublist in clone for item in sublist]
+
+ team = original_obs[1][agent_index]["team_name"]
+ score = original_obs[1][agent_index]["score"]
+
+ all_elements = rectangle + food + thorns + clone + [team] + [score]
+
+            if len(all_elements) != self.obs_dim:
+                raise ValueError(
+                    "observation has length {} but obs_dim is {}".format(
+                        len(all_elements), self.obs_dim))
+
+ all_elements = np.array(all_elements, dtype=float)
+
+ obs[agent_name] = {
+ "obs": all_elements
+ }
+
+ rewards[agent_name] = team_rewards[team]
+
+ dones = {"__all__": done}
+ return obs, rewards, dones, infos
+
+ def get_env_info(self):
+ env_info = {
+ "space_obs": self.observation_space,
+ "space_act": self.action_space,
+ "num_agents": self.num_agents,
+ "episode_limit": self.max_steps,
+ "policy_mapping_info": policy_mapping_dict
+ }
+ return env_info
+
+ def close(self):
+ self.env.close()
diff --git a/marllib/marl/algos/README.md b/marllib/marl/algos/README.md
deleted file mode 100644
index dcd00f6c..00000000
--- a/marllib/marl/algos/README.md
+++ /dev/null
@@ -1,37 +0,0 @@
-10 environments are available for Independent Learning
-
-- Football
-- MPE
-- SMAC
-- mamujoco
-- RWARE
-- LBF
-- Pommerman
-- Magent
-- MetaDrive
-- Hanabi
-
-
-7 environments are available for Value Decomposition
-
-- Football
-- MPE
-- SMAC
-- mamujoco
-- RWARE
-- LBF
-- Pommerman
-
-9 environments are available for Centralized Critic
-
-- Football
-- MPE
-- SMAC
-- mamujoco
-- RWARE
-- LBF
-- Pommerman
-- Magent
-- Hanabi
-
-
diff --git a/marllib/marl/algos/hyperparams/common/coma.yaml b/marllib/marl/algos/hyperparams/common/coma.yaml
index b7a455b0..44274589 100644
--- a/marllib/marl/algos/hyperparams/common/coma.yaml
+++ b/marllib/marl/algos/hyperparams/common/coma.yaml
@@ -29,6 +29,6 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 10
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/common/facmac.yaml b/marllib/marl/algos/hyperparams/common/facmac.yaml
index ad22a4d4..8ffbe6d9 100644
--- a/marllib/marl/algos/hyperparams/common/facmac.yaml
+++ b/marllib/marl/algos/hyperparams/common/facmac.yaml
@@ -36,6 +36,6 @@ algo_args:
buffer_size_episode: 1000
target_network_update_freq_episode: 1
tau: 0.002
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/algos/hyperparams/common/happo.yaml b/marllib/marl/algos/hyperparams/common/happo.yaml
index 42b71349..1388564b 100644
--- a/marllib/marl/algos/hyperparams/common/happo.yaml
+++ b/marllib/marl/algos/hyperparams/common/happo.yaml
@@ -38,4 +38,4 @@ algo_args:
entropy_coeff: 0.01
vf_clip_param: 10.0
min_lr_schedule: 1e-11
- batch_mode: "complete_episodes"
\ No newline at end of file
+ batch_mode: "truncate_episodes"
\ No newline at end of file
diff --git a/marllib/marl/algos/hyperparams/common/hatrpo.yaml b/marllib/marl/algos/hyperparams/common/hatrpo.yaml
index 85fc6f5f..ddf4e9c5 100644
--- a/marllib/marl/algos/hyperparams/common/hatrpo.yaml
+++ b/marllib/marl/algos/hyperparams/common/hatrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
vf_loss_coeff: 1.0
entropy_coeff: 0.01
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
kl_threshold: 0.00001
accept_ratio: 0.5
critic_lr: 0.00005
diff --git a/marllib/marl/algos/hyperparams/common/ia2c.yaml b/marllib/marl/algos/hyperparams/common/ia2c.yaml
index 2b2c4fa6..76af2158 100644
--- a/marllib/marl/algos/hyperparams/common/ia2c.yaml
+++ b/marllib/marl/algos/hyperparams/common/ia2c.yaml
@@ -29,6 +29,6 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 10
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/common/iddpg.yaml b/marllib/marl/algos/hyperparams/common/iddpg.yaml
index cfbe62aa..c4971a4a 100644
--- a/marllib/marl/algos/hyperparams/common/iddpg.yaml
+++ b/marllib/marl/algos/hyperparams/common/iddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
buffer_size_episode: 1000
target_network_update_freq_episode: 1
tau: 0.002
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/common/ippo.yaml b/marllib/marl/algos/hyperparams/common/ippo.yaml
index dad13578..8df638d1 100644
--- a/marllib/marl/algos/hyperparams/common/ippo.yaml
+++ b/marllib/marl/algos/hyperparams/common/ippo.yaml
@@ -35,5 +35,5 @@ algo_args:
entropy_coeff: 0.01
clip_param: 0.3
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/common/itrpo.yaml b/marllib/marl/algos/hyperparams/common/itrpo.yaml
index 1b0ad894..66d1e072 100644
--- a/marllib/marl/algos/hyperparams/common/itrpo.yaml
+++ b/marllib/marl/algos/hyperparams/common/itrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
vf_loss_coeff: 1.0
entropy_coeff: 0.01
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
kl_threshold: 0.00001
accept_ratio: 0.5
critic_lr: 0.00005
diff --git a/marllib/marl/algos/hyperparams/common/maa2c.yaml b/marllib/marl/algos/hyperparams/common/maa2c.yaml
index 449462d6..df3b0abb 100644
--- a/marllib/marl/algos/hyperparams/common/maa2c.yaml
+++ b/marllib/marl/algos/hyperparams/common/maa2c.yaml
@@ -29,6 +29,6 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 10
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/common/maddpg.yaml b/marllib/marl/algos/hyperparams/common/maddpg.yaml
index 20d42498..5c957a8d 100644
--- a/marllib/marl/algos/hyperparams/common/maddpg.yaml
+++ b/marllib/marl/algos/hyperparams/common/maddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
buffer_size_episode: 1000
target_network_update_freq_episode: 1
tau: 0.002
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/common/mappo.yaml b/marllib/marl/algos/hyperparams/common/mappo.yaml
index c03dcb26..efcbb7f2 100644
--- a/marllib/marl/algos/hyperparams/common/mappo.yaml
+++ b/marllib/marl/algos/hyperparams/common/mappo.yaml
@@ -35,6 +35,6 @@ algo_args:
entropy_coeff: 0.01
clip_param: 0.3
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/common/matrpo.yaml b/marllib/marl/algos/hyperparams/common/matrpo.yaml
index 4d44a416..76a86a8d 100644
--- a/marllib/marl/algos/hyperparams/common/matrpo.yaml
+++ b/marllib/marl/algos/hyperparams/common/matrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
vf_loss_coeff: 1.0
entropy_coeff: 0.01
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
kl_threshold: 0.00001
accept_ratio: 0.5
critic_lr: 0.00005
diff --git a/marllib/marl/algos/hyperparams/common/vda2c.yaml b/marllib/marl/algos/hyperparams/common/vda2c.yaml
index f2d0e24d..95c03bb6 100644
--- a/marllib/marl/algos/hyperparams/common/vda2c.yaml
+++ b/marllib/marl/algos/hyperparams/common/vda2c.yaml
@@ -29,7 +29,7 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 10
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
mixer: "qmix" # vdn
diff --git a/marllib/marl/algos/hyperparams/common/vdppo.yaml b/marllib/marl/algos/hyperparams/common/vdppo.yaml
index 04e90420..3adf7000 100644
--- a/marllib/marl/algos/hyperparams/common/vdppo.yaml
+++ b/marllib/marl/algos/hyperparams/common/vdppo.yaml
@@ -35,5 +35,5 @@ algo_args:
entropy_coeff: 0.01
clip_param: 0.3
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/facmac.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/facmac.yaml
index 62b186af..1d7f09ce 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/facmac.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/facmac.yaml
@@ -36,6 +36,6 @@ algo_args:
buffer_size_episode: 1000
target_network_update_freq_episode: 1
tau: 0.002
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/happo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/happo.yaml
index 1b2707dd..1451a6f9 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/happo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/happo.yaml
@@ -36,6 +36,6 @@ algo_args:
lr: 0.0001
entropy_coeff: 0.01
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
min_lr_schedule: 1e-11
gain: 0.01
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/hatrpo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/hatrpo.yaml
index fd289dc4..45436c7c 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/hatrpo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/hatrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
vf_loss_coeff: 1.0
entropy_coeff: 0.01
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
kl_threshold: 0.00001
accept_ratio: 0.5
critic_lr: 0.0005
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/ia2c.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/ia2c.yaml
index 2b2c4fa6..76af2158 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/ia2c.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/ia2c.yaml
@@ -29,6 +29,6 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 10
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/iddpg.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/iddpg.yaml
index babce71a..e6e84b55 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/iddpg.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/iddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
buffer_size_episode: 1000
target_network_update_freq_episode: 1
tau: 0.002
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/ippo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/ippo.yaml
index 6dde1d9d..c25d964c 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/ippo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/ippo.yaml
@@ -35,5 +35,5 @@ algo_args:
entropy_coeff: 0.01
clip_param: 0.3
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/itrpo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/itrpo.yaml
index 2578927b..e38d6d2c 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/itrpo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/itrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
vf_loss_coeff: 1.0
entropy_coeff: 0.01
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
kl_threshold: 0.00001
accept_ratio: 0.5
critic_lr: 0.0005
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/maa2c.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/maa2c.yaml
index 449462d6..df3b0abb 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/maa2c.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/maa2c.yaml
@@ -29,6 +29,6 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 10
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/maddpg.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/maddpg.yaml
index 476e6a96..9dc602d9 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/maddpg.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/maddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
buffer_size_episode: 1000
target_network_update_freq_episode: 1
tau: 0.002
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/mappo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/mappo.yaml
index eef75ff0..802aed8f 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/mappo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/mappo.yaml
@@ -35,6 +35,6 @@ algo_args:
entropy_coeff: 0.01
clip_param: 0.3
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/matrpo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/matrpo.yaml
index 5942888f..9770b9a2 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/matrpo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/matrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
vf_loss_coeff: 1.0
entropy_coeff: 0.01
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
kl_threshold: 0.00001
accept_ratio: 0.5
critic_lr: 0.0005
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/vda2c.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/vda2c.yaml
index f2d0e24d..95c03bb6 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/vda2c.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/vda2c.yaml
@@ -29,7 +29,7 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 10
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
mixer: "qmix" # vdn
diff --git a/marllib/marl/algos/hyperparams/finetuned/mamujoco/vdppo.yaml b/marllib/marl/algos/hyperparams/finetuned/mamujoco/vdppo.yaml
index fe3f1bd4..d1b53e56 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mamujoco/vdppo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mamujoco/vdppo.yaml
@@ -35,5 +35,5 @@ algo_args:
entropy_coeff: 0.01
clip_param: 0.3
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/coma.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/coma.yaml
index 26ad593f..3b54ae3b 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/coma.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/coma.yaml
@@ -29,6 +29,6 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 128
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/facmac.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/facmac.yaml
index 2c8d62b7..f42ce4ec 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/facmac.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/facmac.yaml
@@ -36,6 +36,6 @@ algo_args:
buffer_size_episode: 1000
target_network_update_freq_episode: 1
tau: 0.002
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/happo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/happo.yaml
index 4ab06ad1..afef9151 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/happo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/happo.yaml
@@ -38,4 +38,4 @@ algo_args:
entropy_coeff: 0.01
vf_clip_param: 10.0
min_lr_schedule: 1e-11
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/hatrpo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/hatrpo.yaml
index 588d1ed3..a0d81929 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/hatrpo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/hatrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
vf_loss_coeff: 1.0
entropy_coeff: 0.01
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
kl_threshold: 0.00001
accept_ratio: 0.5
critic_lr: 0.0005
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/ia2c.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/ia2c.yaml
index 2b2c4fa6..76af2158 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/ia2c.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/ia2c.yaml
@@ -29,6 +29,6 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 10
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/iddpg.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/iddpg.yaml
index 94ba33ef..621beb54 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/iddpg.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/iddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
buffer_size_episode: 1000
target_network_update_freq_episode: 1
tau: 0.002
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/ippo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/ippo.yaml
index aa8d522d..8c6c08b4 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/ippo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/ippo.yaml
@@ -35,5 +35,5 @@ algo_args:
entropy_coeff: 0.01
clip_param: 0.3
vf_clip_param: 20.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/itrpo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/itrpo.yaml
index 3e8cc247..f41374a8 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/itrpo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/itrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
vf_loss_coeff: 1.0
entropy_coeff: 0.01
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
kl_threshold: 0.00001
accept_ratio: 0.5
critic_lr: 0.0005
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/maa2c.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/maa2c.yaml
index a5201c1f..74dccc18 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/maa2c.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/maa2c.yaml
@@ -29,6 +29,6 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 128
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/maddpg.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/maddpg.yaml
index 61ec7e6c..2faf2b4e 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/maddpg.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/maddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
buffer_size_episode: 10000
target_network_update_freq_episode: 1
tau: 0.002
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/mappo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/mappo.yaml
index e5f13fc5..823705a1 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/mappo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/mappo.yaml
@@ -35,6 +35,6 @@ algo_args:
entropy_coeff: 0.01
clip_param: 0.3
vf_clip_param: 20.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/matrpo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/matrpo.yaml
index 3a3da10f..6ded245c 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/matrpo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/matrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
vf_loss_coeff: 1.0
entropy_coeff: 0.01
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
kl_threshold: 0.00001
accept_ratio: 0.5
critic_lr: 0.0005
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/vda2c.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/vda2c.yaml
index e11990b1..7053131f 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/vda2c.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/vda2c.yaml
@@ -29,7 +29,7 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 128
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
mixer: "qmix" # vdn
diff --git a/marllib/marl/algos/hyperparams/finetuned/mpe/vdppo.yaml b/marllib/marl/algos/hyperparams/finetuned/mpe/vdppo.yaml
index dc45d4cb..5df3d881 100644
--- a/marllib/marl/algos/hyperparams/finetuned/mpe/vdppo.yaml
+++ b/marllib/marl/algos/hyperparams/finetuned/mpe/vdppo.yaml
@@ -35,5 +35,5 @@ algo_args:
entropy_coeff: 0.01
clip_param: 0.3
vf_clip_param: 20.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/algos/hyperparams/test/coma.yaml b/marllib/marl/algos/hyperparams/test/coma.yaml
index f320a3f1..e3019d38 100644
--- a/marllib/marl/algos/hyperparams/test/coma.yaml
+++ b/marllib/marl/algos/hyperparams/test/coma.yaml
@@ -29,6 +29,6 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 2
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/test/facmac.yaml b/marllib/marl/algos/hyperparams/test/facmac.yaml
index 40c0d4df..e41bbc68 100644
--- a/marllib/marl/algos/hyperparams/test/facmac.yaml
+++ b/marllib/marl/algos/hyperparams/test/facmac.yaml
@@ -36,6 +36,6 @@ algo_args:
buffer_size_episode: 10
target_network_update_freq_episode: 1
tau: 0.002
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/algos/hyperparams/test/happo.yaml b/marllib/marl/algos/hyperparams/test/happo.yaml
index 85ed5d79..dfbbc47d 100644
--- a/marllib/marl/algos/hyperparams/test/happo.yaml
+++ b/marllib/marl/algos/hyperparams/test/happo.yaml
@@ -38,4 +38,4 @@ algo_args:
entropy_coeff: 0.01
vf_clip_param: 10.0
min_lr_schedule: 1e-11
- batch_mode: "complete_episodes"
\ No newline at end of file
+ batch_mode: "truncate_episodes"
\ No newline at end of file
diff --git a/marllib/marl/algos/hyperparams/test/hatrpo.yaml b/marllib/marl/algos/hyperparams/test/hatrpo.yaml
index 3b74bca1..33af497c 100644
--- a/marllib/marl/algos/hyperparams/test/hatrpo.yaml
+++ b/marllib/marl/algos/hyperparams/test/hatrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
vf_loss_coeff: 1.0
entropy_coeff: 0.01
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
kl_threshold: 0.00001
accept_ratio: 0.5
critic_lr: 0.00005
diff --git a/marllib/marl/algos/hyperparams/test/ia2c.yaml b/marllib/marl/algos/hyperparams/test/ia2c.yaml
index faed5009..5d830e6a 100644
--- a/marllib/marl/algos/hyperparams/test/ia2c.yaml
+++ b/marllib/marl/algos/hyperparams/test/ia2c.yaml
@@ -29,6 +29,6 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 2
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/test/iddpg.yaml b/marllib/marl/algos/hyperparams/test/iddpg.yaml
index d52f814d..a1f237f3 100644
--- a/marllib/marl/algos/hyperparams/test/iddpg.yaml
+++ b/marllib/marl/algos/hyperparams/test/iddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
buffer_size_episode: 10
target_network_update_freq_episode: 1
tau: 0.002
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/test/ippo.yaml b/marllib/marl/algos/hyperparams/test/ippo.yaml
index c40456a9..e13de22e 100644
--- a/marllib/marl/algos/hyperparams/test/ippo.yaml
+++ b/marllib/marl/algos/hyperparams/test/ippo.yaml
@@ -35,5 +35,5 @@ algo_args:
entropy_coeff: 0.01
clip_param: 0.3
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/test/itrpo.yaml b/marllib/marl/algos/hyperparams/test/itrpo.yaml
index ed85d536..ce0093d6 100644
--- a/marllib/marl/algos/hyperparams/test/itrpo.yaml
+++ b/marllib/marl/algos/hyperparams/test/itrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
vf_loss_coeff: 1.0
entropy_coeff: 0.01
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
kl_threshold: 0.00001
accept_ratio: 0.5
critic_lr: 0.00005
diff --git a/marllib/marl/algos/hyperparams/test/maa2c.yaml b/marllib/marl/algos/hyperparams/test/maa2c.yaml
index 1a199a75..cca3b1e3 100644
--- a/marllib/marl/algos/hyperparams/test/maa2c.yaml
+++ b/marllib/marl/algos/hyperparams/test/maa2c.yaml
@@ -29,6 +29,6 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 2
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
diff --git a/marllib/marl/algos/hyperparams/test/maddpg.yaml b/marllib/marl/algos/hyperparams/test/maddpg.yaml
index efe7c914..a4f3197c 100644
--- a/marllib/marl/algos/hyperparams/test/maddpg.yaml
+++ b/marllib/marl/algos/hyperparams/test/maddpg.yaml
@@ -36,5 +36,5 @@ algo_args:
buffer_size_episode: 10
target_network_update_freq_episode: 1
tau: 0.002
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/test/mappo.yaml b/marllib/marl/algos/hyperparams/test/mappo.yaml
index f13392e2..c96c5f9a 100644
--- a/marllib/marl/algos/hyperparams/test/mappo.yaml
+++ b/marllib/marl/algos/hyperparams/test/mappo.yaml
@@ -35,6 +35,6 @@ algo_args:
entropy_coeff: 0.01
clip_param: 0.3
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
diff --git a/marllib/marl/algos/hyperparams/test/matrpo.yaml b/marllib/marl/algos/hyperparams/test/matrpo.yaml
index 915e843d..29972443 100644
--- a/marllib/marl/algos/hyperparams/test/matrpo.yaml
+++ b/marllib/marl/algos/hyperparams/test/matrpo.yaml
@@ -34,7 +34,7 @@ algo_args:
vf_loss_coeff: 1.0
entropy_coeff: 0.01
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
kl_threshold: 0.00001
accept_ratio: 0.5
critic_lr: 0.00005
diff --git a/marllib/marl/algos/hyperparams/test/vda2c.yaml b/marllib/marl/algos/hyperparams/test/vda2c.yaml
index 3f0bd5c4..c3889033 100644
--- a/marllib/marl/algos/hyperparams/test/vda2c.yaml
+++ b/marllib/marl/algos/hyperparams/test/vda2c.yaml
@@ -29,7 +29,7 @@ algo_args:
lambda: 1.0
vf_loss_coeff: 1.0
batch_episode: 2
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
lr: 0.0005
entropy_coeff: 0.01
mixer: "qmix" # vdn
diff --git a/marllib/marl/algos/hyperparams/test/vdppo.yaml b/marllib/marl/algos/hyperparams/test/vdppo.yaml
index b2b4c59e..0e792bec 100644
--- a/marllib/marl/algos/hyperparams/test/vdppo.yaml
+++ b/marllib/marl/algos/hyperparams/test/vdppo.yaml
@@ -35,5 +35,5 @@ algo_args:
entropy_coeff: 0.01
clip_param: 0.3
vf_clip_param: 10.0
- batch_mode: "complete_episodes"
+ batch_mode: "truncate_episodes"
mixer: "qmix" # qmix or vdn
diff --git a/marllib/marl/ray/ray.yaml b/marllib/marl/ray/ray.yaml
index 9ba4d281..b58285be 100644
--- a/marllib/marl/ray/ray.yaml
+++ b/marllib/marl/ray/ray.yaml
@@ -24,7 +24,7 @@
local_mode: False # True for debug mode only
share_policy: "group" # individual(separate) / group(division) / all(share)
-evaluation_interval: 10 # evaluate model every 10 training iterations
+evaluation_interval: 50 # evaluate model every 50 training iterations
framework: "torch"
num_workers: 1 # thread number
num_gpus: 1 # gpu to use