
[Proposal] Support Agent-Specific state_space in MAPPO #1887

@bikcrum

Description


Proposal

Currently, in multi-agent reinforcement learning (MARL) environments, we can define observation spaces separately for each agent using a dictionary. However, the state_space, which serves as the global input for centralized critics in MAPPO, is assumed to be shared across all agents.

I would like to propose extending MAPPO to support agent-specific state_space definitions, enabling:

  • Independent value functions per agent (like IPPO).
  • More flexible privileged critic inputs separate from actor observations.
  • Hybrid training where some agents use a centralized critic while others do not.
  • In effect, each agent's state_space becomes a global view of the environment from that agent's perspective.

Example:

Location:

@configclass
class ShadowHandOverEnvCfg(DirectMARLEnvCfg):
    possible_agents = ["right_hand", "left_hand"]
    action_spaces = {"right_hand": 20, "left_hand": 20}
    observation_spaces = {"right_hand": 157, "left_hand": 157}

    # Current (single shared state_space)
-   state_space = 290  

    # Proposed (agent-specific state_space)
+   state_space = {"right_hand": 290, "left_hand": 266}  

Location:

def _get_states(self) -> torch.Tensor:
    # Current (single state shared across all agents)
-   return state
    # Proposed (agent-specific states)
+   right_states = .....
+   left_states = .....
+   return {"right_hand": right_states, "left_hand": left_states}
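To keep the single-value form working alongside the proposed dict form, the config could be normalized once up front. The following is a minimal sketch of such a helper; `resolve_state_spaces` is a hypothetical name, not part of Isaac Lab or skrl:

```python
# Hypothetical helper: normalize a `state_space` config entry so both the
# current single-value form and the proposed per-agent dict form map to
# one {agent: state_dim} dict.
def resolve_state_spaces(state_space, possible_agents):
    """Return a {agent: state_dim} dict for either config style."""
    if isinstance(state_space, dict):
        # Proposed form: already agent-specific.
        return dict(state_space)
    # Current form: one shared state size, replicated for every agent.
    return {agent: state_space for agent in possible_agents}

agents = ["right_hand", "left_hand"]
print(resolve_state_spaces(290, agents))
# {'right_hand': 290, 'left_hand': 290}
print(resolve_state_spaces({"right_hand": 290, "left_hand": 266}, agents))
# {'right_hand': 290, 'left_hand': 266}
```

With this normalization, downstream MAPPO code can always iterate over a per-agent dict, regardless of which config style the user wrote.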

Alternatives

  • Use IPPO instead of MAPPO: This works for independent critics but does not allow privileged information in the critic while maintaining decentralized execution.
  • Manually concatenate different state spaces: This is inefficient and requires unnecessary additional computation for agents that do not need a shared critic.
  • Modify policy architecture outside MAPPO: Requires significant changes and breaks existing frameworks designed for centralized training with decentralized execution (CTDE).

Additional Context

  • This feature would enable heterogeneous critic inputs, improving learning efficiency and performance in asymmetric multi-agent scenarios.
  • The current assumption that state_space is identical for all agents is too restrictive.
  • This change should be backward-compatible, ensuring that if state_space is defined as a single value, it still works as expected.
  • To support this feature, a PR has been opened in the skrl repository: Add support for per-agent state_space in MAPPO Toni-SM/skrl#274

Checklist

  • I have checked that there is no similar issue in the repo (required)

Acceptance Criteria

  • Allow specifying state_space as a dictionary per agent.
  • Ensure MAPPO correctly handles different critic inputs for different agents.
  • Maintain backward compatibility with the existing single state_space setup.
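The second criterion can be expressed as a simple invariant: each agent's critic batch must match that agent's declared state size rather than one shared size. A toy, torch-free sketch (all names here are illustrative, not skrl API):

```python
# Hypothetical shape check: with per-agent state spaces, each agent's
# critic input must match that agent's own declared state size.
state_spaces = {"right_hand": 290, "left_hand": 266}

def check_critic_inputs(states, state_spaces):
    """Validate that every row in each agent's state batch has that
    agent's declared state dimension."""
    for agent, size in state_spaces.items():
        batch = states[agent]
        assert all(len(row) == size for row in batch), agent
    return True

# Toy batches of two "states" per agent (zeros stand in for real features).
states = {a: [[0.0] * n for _ in range(2)] for a, n in state_spaces.items()}
print(check_critic_inputs(states, state_spaces))  # True
```

Under the current shared-state assumption, both batches would be forced to width 290; the per-agent form lets the left hand's critic consume its narrower 266-dimensional state.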
