
[Proposal] Support Agent-Specific state_space in MAPPO #1887

@bikcrum

Description


Proposal

Currently, in multi-agent reinforcement learning (MARL) environments, we can define observation spaces separately for each agent using a dictionary. However, the state_space, which serves as the global input for centralized critics in MAPPO, is assumed to be shared across all agents.

I would like to propose extending MAPPO to support agent-specific state_space definitions, enabling:

  • Independent value functions per agent (like IPPO).
  • More flexible privileged critic inputs separate from actor observations.
  • Hybrid training where some agents use a centralized critic while others do not.
  • In effect, each agent's state_space becomes a global view of the environment from that agent's perspective.

Example:

Location:

@configclass
class ShadowHandOverEnvCfg(DirectMARLEnvCfg):
    possible_agents = ["right_hand", "left_hand"]
    action_spaces = {"right_hand": 20, "left_hand": 20}
    observation_spaces = {"right_hand": 157, "left_hand": 157}

    # Current (single shared state_space)
-   state_space = 290  

    # Proposed (agent-specific state_space)
+   state_space = {"right_hand": 290, "left_hand": 266}  

Location:

def _get_states(self) -> torch.Tensor:
    # Current (single state shared across all agents)
-   return state
    # Proposed (agent-specific states)
+   right_states = .....
+   left_states = .....
+   return {"right_hand": right_states, "left_hand": left_states}
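To keep the single-value form working alongside the proposed dict form, the config could be normalized once up front. The following is a minimal sketch of such a helper; `resolve_state_spaces` is a hypothetical name, not part of Isaac Lab or skrl:

```python
# Hypothetical helper: normalize a `state_space` config entry so both the
# current single-value form and the proposed per-agent dict form map to
# one {agent: state_dim} dict.
def resolve_state_spaces(state_space, possible_agents):
    """Return a {agent: state_dim} dict for either config style."""
    if isinstance(state_space, dict):
        # Proposed form: already agent-specific.
        return dict(state_space)
    # Current form: one shared state size, replicated for every agent.
    return {agent: state_space for agent in possible_agents}

agents = ["right_hand", "left_hand"]
print(resolve_state_spaces(290, agents))
# {'right_hand': 290, 'left_hand': 290}
print(resolve_state_spaces({"right_hand": 290, "left_hand": 266}, agents))
# {'right_hand': 290, 'left_hand': 266}
```

With this normalization, downstream MAPPO code can always iterate over a per-agent dict, regardless of which config style the user wrote.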

Alternatives

  • Use IPPO instead of MAPPO: This works for independent critics but does not allow privileged information in the critic while maintaining decentralized execution.
  • Manually concatenate different state spaces: This is inefficient and requires unnecessary additional computation for agents that do not need a shared critic.
  • Modify policy architecture outside MAPPO: Requires significant changes and breaks existing frameworks designed for centralized training with decentralized execution (CTDE).

Additional Context

  • This feature would enable heterogeneous critic inputs, improving learning efficiency and performance in asymmetric multi-agent scenarios.
  • The current assumption that state_space is identical for all agents is too restrictive.
  • This change should be backward-compatible, ensuring that if state_space is defined as a single value, it still works as expected.
  • To support this feature, a PR has been opened in the skrl repository: Add support for per-agent state_space in MAPPO Toni-SM/skrl#274

Checklist

  • I have checked that there is no similar issue in the repo (required)

Acceptance Criteria

  • Allow specifying state_space as a dictionary per agent.
  • Ensure MAPPO correctly handles different critic inputs for different agents.
  • Maintain backward compatibility with the existing single state_space setup.
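The second criterion can be expressed as a simple invariant: each agent's critic batch must match that agent's declared state size rather than one shared size. A toy, torch-free sketch (all names here are illustrative, not skrl API):

```python
# Hypothetical shape check: with per-agent state spaces, each agent's
# critic input must match that agent's own declared state size.
state_spaces = {"right_hand": 290, "left_hand": 266}

def check_critic_inputs(states, state_spaces):
    """Validate that every row in each agent's state batch has that
    agent's declared state dimension."""
    for agent, size in state_spaces.items():
        batch = states[agent]
        assert all(len(row) == size for row in batch), agent
    return True

# Toy batches of two "states" per agent (zeros stand in for real features).
states = {a: [[0.0] * n for _ in range(2)] for a, n in state_spaces.items()}
print(check_critic_inputs(states, state_spaces))  # True
```

Under the current shared-state assumption, both batches would be forced to width 290; the per-agent form lets the left hand's critic consume its narrower 266-dimensional state.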
