
Switch to PettingZoo API #10

Merged: 11 commits merged into master from pettingzoo on Jul 18, 2024
Conversation

@rchaput (Contributor) commented Jul 18, 2024

We previously used the Gymnasium library, which provides a standard and well-known API for single-agent Reinforcement Learning. However, our EthicalSmartGrid is inherently multi-agent; this required some tweaks to the Gym API.

PettingZoo is an alternative to Gymnasium, very similar to it and developed by the same foundation (Farama), that provides an API for multi-agent RL.
Switching to PettingZoo ensures that our environment will be compatible with most multi-agent learning algorithms. Although our tweaks were fairly common (for example, they can be found in an older version of MultiParticleEnvironment), we cannot guarantee that all learning algorithms expect the same tweaks.

* Added an `agents_by_name` attribute (dictionary) to access agents based on
  their name.
* Changed the previous `agents` attribute to a view over the dict's values.
  This should not change anything, as we can still iterate over this attribute.
* Constructed the dict and the view from the iterable of Agents passed to the
  constructor. This parameter was previously typed as a `List`, but was changed
  to `Iterable`, as we do not require a strict list. This changes nothing for
  existing callers, since `Iterable` is a parent of `List` (i.e., passing a
  `List` still works), and it makes clearer that other types also work (e.g.,
  a generator expression). See the sketch below.
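
For illustration, a minimal sketch of this construction (the host class name and the `Agent.name` attribute are assumptions here; only `agents_by_name` and `agents` come from this commit):

```python
from typing import Iterable

class Environment:  # hypothetical host class; the real one may differ
    def __init__(self, agents: Iterable['Agent']):
        # Build the name-indexed dict from any iterable (list, generator, ...).
        self.agents_by_name = {agent.name: agent for agent in agents}
        # `agents` is a (live) view over the dict's values; iterating over it
        # still works exactly as with the previous list.
        self.agents = self.agents_by_name.values()
```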

This commit is necessary and paves the way for using the "truly" multi-agent
PettingZoo API, instead of the single-agent Gymnasium API (which we had
extended in our own way to support multiple agents).
…f namedtuples

The classes for the Observations used `namedtuple`, which was great for
ensuring they are immutable and that users can access their fields by name,
e.g., `obs.comfort` (instead of having to find the right index in a list).

However, namedtuples have two major disadvantages:
* they cannot be inherited (and thus extended) easily;
* only the fields' names are specified, not their types.

Instead, `dataclasses` are a more recent and better tool to create such
classes: they are immutable, their fields can be accessed by name, we can add
methods to them, and they can be extended through inheritance.

This commit thus refactors those classes to use `dataclasses` instead; the
usage should be quite similar, with the following additions:
* The `fields` method can be used to get the fields' names (whereas it
  previously required defining the list of names somewhere or using private
  methods).
* The `asdict` method can be used to represent the observation as a dictionary,
  when a dataclass does not do the job.
* The `__array__` magic method simplifies the transformation into a NumPy
  ndarray: an observation can be passed to `np.asarray(obs)` to efficiently
  (and easily) obtain such an array. This is particularly useful, since
  learning and decision-making algorithms typically work on ndarrays. We can
  thus pass dataclasses around, knowing that they will be converted
  automatically to ndarrays when necessary. See the sketch below.
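
A minimal sketch of such a dataclass-based observation (the `payoff` field is illustrative; only `comfort` is mentioned above, and the real classes have more fields):

```python
import dataclasses
import numpy as np

@dataclasses.dataclass(frozen=True)
class LocalObservation:
    comfort: float   # field mentioned above
    payoff: float    # illustrative field

    @classmethod
    def fields(cls) -> list:
        # Names of all declared fields, in declaration order.
        return [f.name for f in dataclasses.fields(cls)]

    def asdict(self) -> dict:
        return dataclasses.asdict(self)

    def __array__(self, dtype=None):
        # Lets NumPy convert the observation transparently.
        return np.array(dataclasses.astuple(self), dtype=dtype)

obs = LocalObservation(comfort=0.8, payoff=0.5)
print(obs.fields())     # ['comfort', 'payoff']
print(np.asarray(obs))  # [0.8 0.5]
```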

The `Observation` class is temporarily changed to avoid raising exceptions
(because it can no longer access the `GlobalObservation` and `LocalObservation`
fields), but will be deeply refactored.
* Observations (Local and Global) now use dataclasses instead of namedtuples.
* The base class `BaseObservation` was added to avoid re-implementing some common
  methods, such as `fields`, `asdict`, and the magic method `__array__`. This
  also helps with type hints.
* Removed the previous `Observation` class, which was completely useless.
* In order to support passing a custom class for local and global observations,
  we now dynamically create a new `Observation` dataclass that merges all fields
  from the given classes (by default, `LocalObservation` and `GlobalObservation`
  are used, as previously). See the sketch after this list.
* This dynamic dataclass is instantiated by the `ObservationManager` when computing
  observations, by using calculations from the global and local observations.
  It also holds references to the original global and local observations, so that
  users can choose to get them if their learning algorithm makes a distinction
  between local and global (e.g., COMA).
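
A hedged sketch of how such a dynamic merge could work with `dataclasses.make_dataclass` (a simplified view; the real `_create_observation_type` also stores references to the original local and global observations):

```python
import dataclasses

def _create_observation_type(local_cls, global_cls):
    # Merge the fields of both dataclasses into a single, dynamically
    # created 'Observation' dataclass.
    merged = (
        [(f.name, f.type) for f in dataclasses.fields(local_cls)]
        + [(f.name, f.type) for f in dataclasses.fields(global_cls)]
    )
    return dataclasses.make_dataclass('Observation', merged, frozen=True)
```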

Notes:
* For now, we determine the observation space directly in `BaseObservation`, by
  assuming a range of ``[0, 1]`` for every field (see the sketch after these
  notes). In the future, it would be better to override the method in the
  derived classes...
* We also created an empty abstract class `Observation` that only serves for type
  hints. Because the true `Observation` dataclass is dynamically created, we cannot
  use it as type hints. If we use `BaseObservation` instead, we lack information
  about the `create`, `get_global_observation` and `get_local_observation` methods.
  This empty class declares these methods (but does nothing), so that the type hints
  are truly helpful.
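
A sketch of the ``[0, 1]`` assumption, using Gymnasium's `Box` space (the method name `get_observation_space` is hypothetical):

```python
import dataclasses
import numpy as np
from gymnasium import spaces  # PettingZoo reuses Gymnasium's space classes

@dataclasses.dataclass(frozen=True)
class BaseObservation:
    @classmethod
    def get_observation_space(cls) -> spaces.Box:
        # Assume every field lies in [0, 1]; derived classes could
        # override this with tighter, per-field bounds.
        n_fields = len(dataclasses.fields(cls))
        return spaces.Box(low=0.0, high=1.0, shape=(n_fields,), dtype=np.float64)
```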

It represents a massive breaking change from the way we used observations previously,
but will make it easier for both "simple" (using the observations as-is) and
"advanced" (splitting global and local observations) use-cases in the future.

Force the qualname (fully qualified name) to be `'Observation'` instead of the longer
(but more accurate) `_create_observation_type.<locals>._Observation`. This is
especially important when people want to get a string representation of an Observation
(through the `str` or `repr` methods), because dataclasses return a string in the
following format: `qualname(field1=value1, field2=value2, ...)`.
When qualname is long and difficult to read (`<locals>`, etc.), this might confuse
users.
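
For example (a minimal sketch; at module level the qualname would not contain `<locals>`, but the idea is the same):

```python
import dataclasses

cls = dataclasses.make_dataclass('_Observation', [('comfort', float)], frozen=True)
# The dataclass-generated `__repr__` uses `__qualname__`; inside a function
# it would be '_create_observation_type.<locals>._Observation'. Force it:
cls.__qualname__ = 'Observation'
cls.__name__ = 'Observation'

obs = cls(comfort=0.8)
print(repr(obs))  # Observation(comfort=0.8)
```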

Another approach, which would not require "lying" about the qualname, would be
to override the `__repr__` method to reproduce behaviour similar to the default
dataclass repr, but hardcoding the first part to be `Observation` instead of
the qualname. It is a bit more complicated, however, as it requires re-parsing
the fields; a sketch follows.
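
A sketch of what that override could look like (not adopted in this PR):

```python
import dataclasses

def __repr__(self) -> str:
    # Rebuild the dataclass-like repr manually, hardcoding the name.
    fields = ', '.join(
        f'{f.name}={getattr(self, f.name)!r}'
        for f in dataclasses.fields(self)
    )
    return f'Observation({fields})'
```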
…sium

The Gymnasium API is inherently single-agent; we had tweaked it (in a similar
manner to an older version of MultiParticleEnvironment) to support multiple
agents. However, the "true" multi-agent API is PettingZoo, which is developed
by the same foundation (Farama) as Gymnasium.

This commit refactors the `SmartGrid` environment to inherit from PettingZoo's
`ParallelEnv` instead of the traditional Gymnasium `Env`. In a `ParallelEnv`,
all agents act at the same time in the environment (contrary to the `AEC` API,
in which agents act in turn), which corresponds to our previous code.

It represents a breaking change, as the two APIs differ on several points:

* PettingZoo uses dictionaries almost everywhere, indexed by the agents' names,
  instead of lists (or even single elements in the traditional Gymnasium API).
* Observation and action spaces must be accessed through new methods that take
  an agent name, instead of the previous list attributes: new code should use
  `env.action_space(agent_name)` in place of `env.action_space[agent_index]`.
* The `step` method takes actions as a dict instead of a list. Code should now
  use `step({agent_name: agent_action})`, instead of `step([agent_action])`.
* The `step` method returns a tuple of dicts, instead of a tuple of lists.
  The dicts hold the same data as the lists, but are again indexed by agents'
  names. The `_get_obs`, `_get_reward` and `_get_info` methods follow the same
  change.
* The `reset` method now returns two elements: the initial observations and
  an (empty) dictionary of additional information. This follows the PettingZoo
  API.
* The `agents` property iterates over the agents' **names**, which can be
  useful, e.g., to construct the actions dictionary. PettingZoo requires this
  property to return names (IDs); to get the agents themselves, see the
  `get_agent(agent_name)` method.
* The `n_agents` property is replaced with `num_agents`.
* The Gymnasium `Env` instantiated a `_np_random` generator in its `reset`
  method, but PettingZoo's `ParallelEnv` does not. We must thus define this
  attribute in our own `SmartGrid` class. In addition, `ParallelEnv.reset()`
  raises a `NotImplementedError`, which means we should not call
  `super().reset()`.
* The environment is no longer `register`ed as a Gymnasium environment.
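
A sketch of a rollout under the new API (constructing the env is omitted; the `seed` parameter and the 5-tuple returned by `step` follow the standard PettingZoo `ParallelEnv` convention and may differ in detail here):

```python
def run_episode(env, seed=42, max_steps=10):
    """Minimal rollout against the refactored environment."""
    obs, infos = env.reset(seed=seed)  # two return values, per the new API
    for _ in range(max_steps):
        # `env.agents` yields the agents' *names*; spaces are per-agent
        # methods, and actions are passed as a dict indexed by those names.
        actions = {name: env.action_space(name).sample() for name in env.agents}
        obs, rewards, terminations, truncations, infos = env.step(actions)
    return obs, rewards
```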
Our custom wrappers used the Gymnasium API, which is not compatible with
PettingZoo. In addition, PettingZoo only provides wrappers for its AEC envs
(turn-by-turn, not all agents in parallel); it would hinder performance to
transform our environment to AEC, wrap it, then convert it back to Parallel.
Instead, we have made our `RewardAggregator` base class the equivalent of
Gymnasium's `RewardWrapper`: it provides seamless access to the wrapped
class and allows intercepting the rewards through its `reward` method.

All our reward aggregators now return dicts instead of lists, to follow
the PettingZoo API.

It would have been overkill to allow intercepting other values as well
(e.g., obs, infos, ...), because we have no need for it now. It would be
fairly easy to add in the future if it becomes necessary (provide other
methods, and call them in `step` similarly to how we call `reward`).
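
A simplified sketch of this design (the `_env` attribute name is an assumption, and the 5-tuple follows the PettingZoo convention):

```python
class RewardAggregator(SmartGrid):
    """Sketch of the PettingZoo-style equivalent of Gymnasium's RewardWrapper."""

    def __init__(self, env):
        self._env = env  # wrapped environment ('_' => handled by the wrapper)

    def reward(self, rewards: dict) -> dict:
        # Identity by default; subclasses override this to transform or
        # aggregate the rewards dict (one entry per agent name).
        return rewards

    def step(self, actions: dict):
        obs, rewards, terminations, truncations, infos = self._env.step(actions)
        # Intercept the rewards, mirroring Gymnasium's RewardWrapper.
        return obs, self.reward(rewards), terminations, truncations, infos
```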

Because we inherit from `SmartGrid` to make the wrapper act as a drop-in
replacement, the magic method `__getattr__` does not "intercept" attribute
accesses when the attribute is found in the inheritance graph. Instead, we
have to use `__getattribute__`, which intercepts *all* accesses and is a bit
more complicated to use.
All attributes or methods that must be accessed directly on the wrapper
(instead of the wrapped env) *MUST* start with an underscore (`_`) or be
listed as an exception in `__getattribute__`; otherwise, the code will crash
or loop infinitely. See the sketch below.
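
A sketch of such a `__getattribute__` (the set of exceptions is illustrative):

```python
def __getattribute__(self, name: str):
    # Names starting with '_' (plus a few explicit exceptions) belong to
    # the wrapper itself; every other access is delegated to the wrapped
    # env. Using `object.__getattribute__` avoids infinite recursion.
    if name.startswith('_') or name in ('reward', 'step'):
        return object.__getattribute__(self, name)
    wrapped = object.__getattribute__(self, '_env')
    return getattr(wrapped, name)
```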

Because the SmartGrid env changed, we must also update the learning algorithms (models).

* Observations are now dicts (one value per agent), and no longer split between local and global (we do not need to reconstruct them anymore).
* Actions should be dicts, not lists.
* Agents must be accessed by names instead of indices.

Because the SmartGrid env changed, we must update our tests.

* Documentation now mentions PettingZoo instead of Gymnasium.
* Updated examples to follow the PettingZoo API (e.g., `obs, _ = env.reset()`).
* Removed obsolete parts of documentation.
* Fixed a few typos.
* Improved appearance of some paragraphs.

@rchaput merged commit bb52e75 into master on Jul 18, 2024 (7 checks passed).
@rchaput deleted the pettingzoo branch on July 19, 2024 at 11:04.