Switch to PettingZoo API #10
Merged
* Added an `agents_by_name` attribute (dictionary) to access agents by their name.
* Changed the previous `agents` attribute to a view over the dict's values. This should not change anything, as we can still iterate over this attribute.
* Constructed the dict and the view from the iterable of Agents passed to the constructor. This parameter was previously typed as a `List`, but was changed to `Iterable`, as we do not require it to be strictly a list. This changes nothing when using the class, since `Iterable` is a parent of `List` (i.e., passing a `List` will still work), and it makes clearer that other types work too (e.g., passing a generator expression).

This commit is necessary and paves the way for using the "truly" multi-agent PettingZoo API, instead of the single-agent Gymnasium API (which we extended in our own ways to support multiple agents).
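The constructor change described above can be sketched as follows; the `Agent` and `AgentRegistry` classes are minimal hypothetical stand-ins, not the project's actual code:

```python
from typing import Iterable


class Agent:
    """Hypothetical minimal Agent with just a name."""
    def __init__(self, name: str):
        self.name = name


class AgentRegistry:
    """Sketch of the change: accept any Iterable of Agents, build a dict
    indexed by name, and expose `agents` as a view over the dict's values."""
    def __init__(self, agents: Iterable[Agent]):
        self.agents_by_name = {agent.name: agent for agent in agents}
        # A view over the dict's values: iteration still works as before.
        self.agents = self.agents_by_name.values()


# A generator expression is accepted, since only an Iterable is required.
registry = AgentRegistry(Agent(f"agent_{i}") for i in range(3))
print([a.name for a in registry.agents])  # ['agent_0', 'agent_1', 'agent_2']
```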
…f namedtuples

The classes for the Observations used `namedtuple`, which was great to ensure they are immutable and that users can access their fields by name, e.g., `obs.comfort` (instead of having to find their index in a given list). However, namedtuples bring 2 major disadvantages:

* they cannot be inherited (and thus extended) easily;
* only the fields' names are specified, not their types.

Instead, `dataclasses` are a more recent and better tool to create such classes: they are immutable, their fields can be accessed by name, we can add functions to them, and they can be extended through inheritance. This commit thus refactors those classes to use `dataclasses` instead; the usage should be quite similar, with the following additions:

* The `fields` method can be used to get the fields' names (whereas it previously required defining the list of names somewhere or using private methods).
* The `asdict` method can be used to represent the observation as a dictionary, when a dataclass does not do the job.
* The `__array__` magic method simplifies the transformation into a NumPy ndarray: an observation can be passed to `np.asarray(obs)` to efficiently (and easily) obtain such an array. This is particularly useful, since learning and decision-making algorithms typically use ndarrays. We can thus pass dataclasses around, knowing that they will be converted automatically to ndarrays when necessary.

The `Observation` class is temporarily changed to avoid raising exceptions (because it can no longer access the `GlobalObservation` and `LocalObservation` fields), but will be deeply refactored.
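A minimal sketch of the dataclass pattern described above; the `ComfortObservation` class and its field names are illustrative assumptions, not the project's actual classes:

```python
import dataclasses

import numpy as np


@dataclasses.dataclass(frozen=True)
class ComfortObservation:
    """Hypothetical observation: frozen, so immutable like a namedtuple,
    but with typed fields and extensible through inheritance."""
    comfort: float
    payoff: float

    @classmethod
    def fields(cls):
        # Field names, without resorting to private namedtuple methods.
        return [f.name for f in dataclasses.fields(cls)]

    def asdict(self):
        return dataclasses.asdict(self)

    def __array__(self, dtype=None, copy=None):
        # Lets NumPy convert the observation directly via np.asarray(obs).
        return np.asarray(dataclasses.astuple(self), dtype=dtype)


obs = ComfortObservation(comfort=0.8, payoff=0.5)
print(obs.fields())     # ['comfort', 'payoff']
print(np.asarray(obs))  # [0.8 0.5]
```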
* Observations (Local and Global) now use dataclasses instead of namedtuples.
* The base class `BaseObservation` was added to avoid re-implementing common methods, such as `fields`, `asdict`, and the magic method `__array__`. This also helps with type hints.
* Removed the previous `Observation` class, which was completely useless.
* To support passing a custom class for local and global observations, we now dynamically create a new `Observation` dataclass that merges all values from the given classes (by default, `LocalObservation` and `GlobalObservation` are used, as previously).
* This dynamic dataclass is instantiated by the `ObservationManager` when computing observations, using calculations from the global and local observations. It also holds references to the original global and local observations, so that users can choose to get them if their learning algorithm makes a distinction between local and global (e.g., COMA).

Notes:

* For now, we determine the observation space directly in `BaseObservation`, by assuming a range of ``[0, 1]`` for every field. In the future, it would be better to override the method in the derived classes...
* We also created an empty abstract class `Observation` that only serves for type hints. Because the true `Observation` dataclass is dynamically created, we cannot use it in type hints. If we used `BaseObservation` instead, we would lack information about the `create`, `get_global_observation` and `get_local_observation` methods. This empty class declares these methods (but does nothing), so that the type hints are truly helpful.

This represents a massive breaking change from the way we used observations previously, but will make it easier for both "simple" (using the observations as-is) and "advanced" (splitting global and local observations) use-cases in the future.
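The dynamic merging of local and global fields can be sketched with `dataclasses.make_dataclass`; the helper name and the example field names are assumptions, not the project's exact implementation:

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class LocalObservation:
    comfort: float  # hypothetical local field


@dataclasses.dataclass(frozen=True)
class GlobalObservation:
    equity: float  # hypothetical global field


def create_observation_type(local_cls, global_cls):
    """Sketch: build a new 'Observation' dataclass whose fields are the
    union of the global and local observation classes' fields."""
    merged = [(f.name, f.type) for f in dataclasses.fields(global_cls)]
    merged += [(f.name, f.type) for f in dataclasses.fields(local_cls)]
    return dataclasses.make_dataclass("Observation", merged, frozen=True)


Observation = create_observation_type(LocalObservation, GlobalObservation)
obs = Observation(equity=0.9, comfort=0.7)
print(obs)  # Observation(equity=0.9, comfort=0.7)
```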
Force the qualname (fully qualified name) to be `'Observation'` instead of the longer (but more accurate) `_create_observation_type.<locals>._Observation`. This is especially important when people want a string representation of an Observation (through the `str` or `repr` methods), because dataclasses return a string in the following format: `qualname(field1=value1, field2=value2, ...)`. When the qualname is long and difficult to read (`<locals>`, etc.), this might confuse users.

Another approach, which would not require "lying" about the qualname, would be to override the `__repr__` method to reproduce a behaviour similar to the default dataclass's, but hardcoding the first part to be `Observation` instead of the qualname. It is a bit more complicated, however, as it requires re-parsing the fields.
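The qualname override can be shown in a few lines; the single `comfort` field is a placeholder, and this is a sketch of the idea rather than the project's code:

```python
import dataclasses


def _create_observation_type():
    @dataclasses.dataclass(frozen=True)
    class _Observation:
        comfort: float

    # Without this line, repr() would show the unwieldy qualname
    # "_create_observation_type.<locals>._Observation(...)", because the
    # dataclass-generated __repr__ reads __class__.__qualname__ at call time.
    _Observation.__qualname__ = "Observation"
    return _Observation


Observation = _create_observation_type()
print(repr(Observation(comfort=0.5)))  # Observation(comfort=0.5)
```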
…sium

The Gymnasium API is inherently single-agent; we had tweaked it (in a similar manner to an older version of MultiParticleEnvironment) to support multiple agents. However, the "true" multi-agent API is PettingZoo, which is developed by the same foundation (Farama) as Gymnasium. This commit refactors the `SmartGrid` environment to inherit from PettingZoo's `ParallelEnv` instead of the traditional Gymnasium `Env`. `ParallelEnv` means that multiple agents act at the same time in the environment (contrary to the AEC way, in which agents act in turn), which corresponds to our previous code.

This represents a breaking change, as the two APIs differ on several points:

* PettingZoo uses dictionaries almost everywhere, indexed by the agents' names, instead of lists (or even single elements in the traditional Gymnasium API).
* Observation and action spaces must be accessed through new methods that take an agent name. Previous code used list attributes, which means that new code should use `env.action_space(agent_name)` (in place of `env.action_space[agent_index]`).
* The `step` method takes actions as a dict instead of a list. Code should now use `step({agent_name: agent_action})`, instead of `step([agent_action])`.
* The `step` method returns a tuple of dicts, instead of a tuple of lists. The dicts hold the same data as the lists, but are again indexed by the agents' names. The `_get_obs`, `_get_reward` and `_get_info` methods follow the same change.
* The `reset` method now returns two elements: the initial observations and an (empty) dictionary of additional information. This follows the PettingZoo API.
* The `agents` property iterates over the agents' **names**, which can be useful, e.g., to construct the actions dictionary. PettingZoo requires the property to return names (IDs); to get the agents themselves, see the `get_agent(agent_name)` method.
* The `n_agents` property is replaced with `num_agents`.
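The dict-based shapes listed above can be illustrated with a toy environment; `TinyParallelEnv` is a hypothetical stand-in that only mimics the Parallel API's signatures, not the actual SmartGrid environment:

```python
class TinyParallelEnv:
    """Minimal sketch of the PettingZoo Parallel API shape."""

    def __init__(self):
        # `agents` holds the agents' *names*, per the PettingZoo convention.
        self.agents = ["agent_0", "agent_1"]

    def reset(self, seed=None):
        observations = {name: 0.0 for name in self.agents}
        return observations, {}  # initial observations + (empty) infos

    def step(self, actions):
        # `actions` is a dict {agent_name: action}; every returned
        # element is a dict indexed by the agents' names.
        observations = {name: 0.0 for name in self.agents}
        rewards = {name: float(actions[name]) for name in self.agents}
        terminations = {name: False for name in self.agents}
        truncations = {name: False for name in self.agents}
        infos = {name: {} for name in self.agents}
        return observations, rewards, terminations, truncations, infos


env = TinyParallelEnv()
obs, infos = env.reset()
actions = {name: 1 for name in env.agents}
obs, rewards, terminations, truncations, infos = env.step(actions)
print(rewards)  # {'agent_0': 1.0, 'agent_1': 1.0}
```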
* The Gymnasium `Env` instantiated a `_np_random` generator in its `reset` method, but PettingZoo's `ParallelEnv` does not. We must thus define this attribute in our own `SmartGrid` class. In addition, `ParallelEnv.reset()` raises a `NotImplementedError`, which means we should not call `super()`.
* The environment is no longer `register`ed as a Gymnasium environment.
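Defining the generator ourselves might look like the sketch below; the class name and the use of `np.random.default_rng` are assumptions about how the attribute could be created, not the project's exact code:

```python
import numpy as np


class SmartGridLike:
    """Sketch: since ParallelEnv.reset() raises NotImplementedError,
    we create the `_np_random` attribute ourselves instead of
    relying on super().reset()."""

    def reset(self, seed=None):
        # Seeded generator, reproducible when the same seed is given.
        self._np_random = np.random.default_rng(seed)
        return {}, {}  # observations, infos


env = SmartGridLike()
env.reset(seed=42)
```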
Our custom wrappers used the Gymnasium API, which is not compatible with PettingZoo. In addition, PettingZoo only has wrappers for its AEC envs (turn-by-turn, not all agents in parallel). It would hinder performance to transform our environment to AEC, wrap it, then convert it back to Parallel.

Instead, we have made our `RewardAggregator` base class the equivalent of Gymnasium's `RewardWrapper`: it provides seamless access to the wrapped class and allows intercepting the rewards through its `reward` method. All our reward aggregators now return dicts instead of lists, to follow the PettingZoo API. It would have been overkill to allow intercepting other values as well (e.g., obs, infos, ...), because we have no need for it now. It would be fairly easy to do in the future if it becomes necessary (provide other methods, and call them in `step` similarly to how we call `reward`).

Because we inherit from `SmartGrid` to make the wrapper act as a drop-in replacement, the magic method `__getattr__` does not "intercept" method calls if they are found in the inheritance graph. Instead, we have to use `__getattribute__`, which intercepts *all* accesses and is a bit more complicated to use. All attributes or methods that must be accessed directly on the wrapper (instead of the wrapped env) *MUST* start with an underscore (`_`) or be listed as an exception in `__getattribute__`. Otherwise, the code will crash or loop infinitely.
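The `__getattribute__` delegation pattern can be sketched as below; `Env`, the doubling `reward`, and the exception set are illustrative assumptions, not the project's actual classes:

```python
class Env:
    """Hypothetical wrapped environment."""

    def __init__(self):
        self.value = 10

    def step(self):
        return {"agent_0": 1.0}


class RewardAggregator(Env):
    """Sketch: delegate every access to the wrapped env, except
    underscore-prefixed names and explicitly listed exceptions."""

    _EXCEPTIONS = {"reward", "step"}

    def __init__(self, env):
        self._env = env

    def __getattribute__(self, name):
        # Underscore-prefixed names and exceptions resolve on the wrapper
        # itself; everything else is forwarded to the wrapped env.
        if name.startswith("_") or name in RewardAggregator._EXCEPTIONS:
            return object.__getattribute__(self, name)
        return getattr(object.__getattribute__(self, "_env"), name)

    def reward(self, rewards):
        # Intercept rewards; here, a trivial transformation for the demo.
        return {name: r * 2 for name, r in rewards.items()}

    def step(self):
        return self.reward(self._env.step())


wrapped = RewardAggregator(Env())
print(wrapped.value)   # 10 (delegated to the wrapped env)
print(wrapped.step())  # {'agent_0': 2.0} (intercepted and doubled)
```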
Because the SmartGrid env changed, we must also update the learning algorithms (models).

* Observations are now dicts (one value per agent), and no longer split between local and global (we do not need to reconstruct them anymore).
* Actions should be dicts, not lists.
* Agents must be accessed by names instead of indices.
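On the model side, the change boils down to building the actions dict by agent name; `RandomModel` and the agent names below are placeholders for illustration only:

```python
import random


class RandomModel:
    """Hypothetical per-agent model; only the interface shape matters."""

    def act(self, observation):
        return random.choice([0, 1])


agent_names = ["agent_0", "agent_1"]
models = {name: RandomModel() for name in agent_names}

# Observations and actions are dicts keyed by agent *name*,
# no longer lists indexed by agent position.
observations = {name: 0.0 for name in agent_names}
actions = {name: models[name].act(observations[name]) for name in agent_names}
print(sorted(actions))  # ['agent_0', 'agent_1']
```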
Because the SmartGrid env changed, we must update our tests.
* Documentation now mentions PettingZoo instead of Gymnasium.
* Updated examples to follow the PettingZoo API (e.g., `obs, _ = env.reset()`).
* Removed obsolete parts of the documentation.
* Fixed a few typos.
* Improved the appearance of some paragraphs.
We previously used the Gymnasium library, which provides a standard and well-known API for single-agent Reinforcement Learning. However, our EthicalSmartGrid is inherently multi-agent; this required some tweaks to the Gym API.
PettingZoo is an alternative to Gymnasium, very similar and developed by the same foundation, that provides an API for multi-agent RL.
Switching to PettingZoo ensures that our environment will be compatible with most multi-agent learning algorithms; although our tweaks were common ones (found, for example, in an older version of MultiParticleEnvironment), we cannot guarantee that all learning algorithms expect the same tweaks.