Proposal
Currently, in multi-agent reinforcement learning (MARL) environments, we can define observation spaces separately for each agent using a dictionary. However, the `state_space`, which serves as the global input to centralized critics in MAPPO, is assumed to be shared across all agents.
I would like to propose extending MAPPO to support agent-specific `state_space` definitions, enabling:
- Independent value functions per agent (like IPPO).
- More flexible privileged critic inputs separate from actor observations.
- Hybrid training where some agents use a centralized critic while others do not.
- This can be thought of as a `state_space` that provides a global view from each agent's perspective.
Example:
Location: `IsaacLab/source/isaaclab_tasks/isaaclab_tasks/direct/shadow_hand_over/shadow_hand_over_env_cfg.py`, line 123 in ecf551f:
```python
@configclass
class ShadowHandOverEnvCfg(DirectMARLEnvCfg):
    possible_agents = ["right_hand", "left_hand"]
    action_spaces = {"right_hand": 20, "left_hand": 20}
    observation_spaces = {"right_hand": 157, "left_hand": 157}

    # Current (single shared state_space)
-   state_space = 290
    # Proposed (agent-specific state_space)
+   state_space = {"right_hand": 290, "left_hand": 266}
```
Location: `IsaacLab/source/isaaclab_tasks/isaaclab_tasks/direct/shadow_hand_over/shadow_hand_over_env.py`, line 227 in ecf551f:
```python
def _get_states(self) -> torch.Tensor:
    # Current (single state shared across all agents)
-   return state
    # Proposed (agent-specific state)
    right_states = ...
    left_states = ...
+   return {"right_hand": right_states, "left_hand": left_states}
```
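For concreteness, here is a standalone, runnable sketch of the proposed return shape. The buffer names and feature contents are hypothetical placeholders, not the actual ShadowHandOver implementation:

```python
import torch

# Hypothetical per-environment buffers standing in for the real simulation state;
# the sizes match the proposed per-agent state_space config above.
num_envs = 4
right_hand_state = torch.randn(num_envs, 290)  # assumed right-hand privileged features
left_hand_state = torch.randn(num_envs, 266)   # assumed left-hand privileged features

def get_states() -> dict[str, torch.Tensor]:
    """Return one privileged state tensor per agent instead of a single shared tensor."""
    return {"right_hand": right_hand_state, "left_hand": left_hand_state}

for agent, state in get_states().items():
    print(agent, tuple(state.shape))  # right_hand (4, 290) / left_hand (4, 266)
```

Note that under this proposal the return annotation would also change from `torch.Tensor` to `dict[str, torch.Tensor]`.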
Alternatives
- Use IPPO instead of MAPPO: This works for independent critics but does not allow privileged information in the critic while maintaining decentralized execution.
- Manually concatenate different state spaces: This is inefficient and forces unnecessary computation on agents that do not need a shared critic.
- Modify policy architecture outside MAPPO: Requires significant changes and breaks existing frameworks designed for centralized training with decentralized execution (CTDE).
Additional Context
- This feature would enable heterogeneous critic inputs, improving learning efficiency and performance in asymmetric multi-agent scenarios.
- The current assumption that `state_space` is identical for all agents is too restrictive.
- This change should be backward-compatible: if `state_space` is defined as a single value, it should still work as expected (a sketch of this behavior follows this list).
- To support this feature, a PR has been opened in the skrl repository: Add support for per-agent `state_space` in MAPPO (Toni-SM/skrl#274).
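One way the backward compatibility could be kept is to normalize the config at load time: a scalar `state_space` is broadcast to every agent, while a dict is validated and passed through. The helper below is a minimal sketch under that assumption; `normalize_state_space` is a made-up name, not an existing skrl or IsaacLab API:

```python
def normalize_state_space(state_space, possible_agents):
    """Broadcast a scalar state_space to all agents; validate and pass dicts through."""
    if isinstance(state_space, dict):
        missing = set(possible_agents) - set(state_space)
        if missing:
            raise ValueError(f"state_space missing entries for agents: {missing}")
        return state_space
    # Legacy path: one shared global state size, replicated for every agent.
    return {agent: state_space for agent in possible_agents}

agents = ["right_hand", "left_hand"]
print(normalize_state_space(290, agents))  # legacy scalar -> same size for every agent
print(normalize_state_space({"right_hand": 290, "left_hand": 266}, agents))  # dict passes through
```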
Checklist
- I have checked that there is no similar issue in the repo (required)
Acceptance Criteria
- Allow specifying `state_space` as a dictionary per agent.
- Ensure MAPPO correctly handles different critic inputs for different agents (see the sketch below).
- Maintain backward compatibility with the existing single `state_space` setup.
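For illustration, a setup satisfying the second criterion might build one value head per agent, each sized by that agent's state space. This is a plain PyTorch sketch of the idea, not skrl's actual MAPPO implementation; the layer sizes are arbitrary assumptions:

```python
import torch
import torch.nn as nn

state_spaces = {"right_hand": 290, "left_hand": 266}  # per-agent critic input sizes

# One independent value function per agent, sized by its own state space.
critics = nn.ModuleDict({
    agent: nn.Sequential(nn.Linear(size, 256), nn.ELU(), nn.Linear(256, 1))
    for agent, size in state_spaces.items()
})

# During a value update, each agent's critic consumes its own state tensor.
states = {agent: torch.randn(8, size) for agent, size in state_spaces.items()}
values = {agent: critics[agent](states[agent]) for agent in state_spaces}
print({agent: tuple(v.shape) for agent, v in values.items()})  # (8, 1) per agent
```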