Merge branch 'master' of github.com:cameronangliss/poke-env into open-sheets-fill-battle-obj

cameronangliss committed Jan 4, 2025
2 parents 2178ac2 + ea4b5f3 commit 3b1d398
Showing 22 changed files with 474 additions and 578 deletions.
2 changes: 1 addition & 1 deletion docs/source/examples/index.rst
@@ -11,4 +11,4 @@ This page lists detailled examples demonstrating how to use this package. They a
quickstart
using_a_custom_teambuilder
connecting_to_showdown_and_challenging_humans
-rl_with_open_ai_gym_wrapper
+rl_with_gymnasium_wrapper
@@ -1,18 +1,18 @@
-.. _rl_with_open_ai_gym_wrapper:
+.. _rl_with_gymnasium_wrapper:

-Reinforcement learning with the OpenAI Gym wrapper
+Reinforcement learning with the Gymnasium wrapper
==================================================

-The corresponding complete source code can be found `here <https://github.com/hsahovic/poke-env/blob/master/examples/rl_with_new_open_ai_gym_wrapper.py>`__.
+The corresponding complete source code can be found `here <https://github.com/hsahovic/poke-env/blob/master/examples/rl_with_new_gymnasium_wrapper.py>`__.

-The goal of this example is to demonstrate how to use the `open ai gym <https://gym.openai.com/>`__ interface proposed by ``EnvPlayer``, and to train a simple deep reinforcement learning agent comparable in performance to the ``MaxDamagePlayer`` we created in :ref:`max_damage_player`.
+The goal of this example is to demonstrate how to use the `farama gymnasium <https://gymnasium.farama.org/>`__ interface proposed by ``EnvPlayer``, and to train a simple deep reinforcement learning agent comparable in performance to the ``MaxDamagePlayer`` we created in :ref:`max_damage_player`.

-.. note:: This example necessitates `keras-rl <https://github.com/keras-rl/keras-rl>`__ (compatible with Tensorflow 1.X) or `keras-rl2 <https://github.com/wau/keras-rl2>`__ (Tensorflow 2.X), which implement numerous reinforcement learning algorithms and offer a simple API fully compatible with the Open AI Gym API. You can install them by running ``pip install keras-rl`` or ``pip install keras-rl2``. If you are unsure, ``pip install keras-rl2`` is recommended.
+.. note:: This example necessitates `keras-rl <https://github.com/keras-rl/keras-rl>`__ (compatible with Tensorflow 1.X) or `keras-rl2 <https://github.com/wau/keras-rl2>`__ (Tensorflow 2.X), which implement numerous reinforcement learning algorithms and offer a simple API fully compatible with the Gymnasium API. You can install them by running ``pip install keras-rl`` or ``pip install keras-rl2``. If you are unsure, ``pip install keras-rl2`` is recommended.

Implementing rewards and observations
*************************************

-The open ai gym API provides *rewards* and *observations* for each step of each episode. In our case, each step corresponds to one decision in a battle and battles correspond to episodes.
+The Gymnasium API provides *rewards* and *observations* for each step of each episode. In our case, each step corresponds to one decision in a battle and battles correspond to episodes.
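As an illustration of this mapping, a minimal rollout under the standard Gymnasium contract might look like the sketch below (``env`` stands for any ``EnvPlayer``-based environment; the loop is generic Gymnasium usage, not code from this commit):

.. code-block:: python

    # One episode corresponds to one battle; each step is one decision.
    observation, info = env.reset()
    done = False
    episode_reward = 0.0
    while not done:
        action = env.action_space.sample()  # stand-in for a trained policy
        observation, reward, terminated, truncated, info = env.step(action)
        episode_reward += reward
        done = terminated or truncated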

Defining observations
^^^^^^^^^^^^^^^^^^^^^
@@ -26,9 +26,9 @@ Observations are embeddings of the current state of the battle. They can be an a

To define our observations, we will create a custom ``embed_battle`` method. It takes one argument, a ``Battle`` object, and returns our embedding.

-In addition to this, we also need to describe the embedding to the gym interface.
+In addition to this, we also need to describe the embedding to the gymnasium interface.
To achieve this, we need to implement the ``describe_embedding`` method where we specify the low bound and the high bound
-for each component of the embedding vector and return them as a ``gym.Space`` object.
+for each component of the embedding vector and return them as a ``gymnasium.Space`` object.
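For concreteness, a minimal ``embed_battle``/``describe_embedding`` pair might look like the following sketch (the two-component embedding is purely illustrative; the example this page describes uses a richer one):

.. code-block:: python

    import numpy as np
    from gymnasium.spaces import Box

    def embed_battle(self, battle):
        # Illustrative embedding: remaining HP fraction of each active Pokémon.
        return np.array(
            [
                battle.active_pokemon.current_hp_fraction,
                battle.opponent_active_pokemon.current_hp_fraction,
            ],
            dtype=np.float32,
        )

    def describe_embedding(self):
        # Low/high bounds for each component, returned as a gymnasium Space.
        return Box(
            low=np.array([0.0, 0.0]),
            high=np.array([1.0, 1.0]),
            dtype=np.float32,
        )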

Defining rewards
^^^^^^^^^^^^^^^^
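The body of this section is collapsed in this view. As a sketch of the kind of reward it defines, ``reward_computing_helper`` (part of poke-env's env-player API) can combine faint, HP, and victory terms; the weights below are illustrative:

.. code-block:: python

    def calc_reward(self, last_battle, current_battle) -> float:
        # Illustrative weights: fainted mons, HP swings, and victory.
        return self.reward_computing_helper(
            current_battle, fainted_value=2.0, hp_value=1.0, victory_value=30.0
        )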
@@ -108,7 +108,7 @@ Our player will play the ``gen8randombattle`` format. We can therefore inherit f
Instantiating and testing a player
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Now that our custom class is defined, we can instantiate our RL player and test if it's compliant with the OpenAI gym API.
+Now that our custom class is defined, we can instantiate our RL player and test if it's compliant with the Gymnasium API.
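A minimal compliance check might look like this sketch, mirroring the pattern used elsewhere in this commit (``SimpleRLPlayer`` is the custom class defined above):

.. code-block:: python

    from gymnasium.utils.env_checker import check_env

    from poke_env.player import RandomPlayer

    opponent = RandomPlayer(battle_format="gen8randombattle")
    test_env = SimpleRLPlayer(
        battle_format="gen8randombattle", start_challenging=True, opponent=opponent
    )
    check_env(test_env)
    test_env.close()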

.. code-block:: python
@@ -340,7 +340,7 @@ To use the ``cross_evaluate`` method, the strategy is the same to the one used f
Final result
************

-Running the `whole file <https://github.com/hsahovic/poke-env/blob/master/examples/rl_with_new_open_ai_gym_wrapper.py>`__ should take a couple of minutes and print something similar to this:
+Running the `whole file <https://github.com/hsahovic/poke-env/blob/master/examples/rl_with_gymnasium_wrapper.py>`__ should take a couple of minutes and print something similar to this:

.. code-block:: console
2 changes: 1 addition & 1 deletion docs/source/getting_started.rst
@@ -41,7 +41,7 @@ Agents in ``poke-env`` are instances of the ``Player`` class. Explore the follow

- Basic agent: :ref:`/examples/cross_evaluate_random_players.ipynb`
- Advanced agent: :ref:`max_damage_player`
-- RL agent: :ref:`rl_with_open_ai_gym_wrapper`
+- RL agent: :ref:`rl_with_gymnasium_wrapper`
- Using teams: :ref:`ou_max_player`
- Custom team builder: :ref:`using_a_custom_teambuilder`

2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -6,7 +6,7 @@ Poke-env: A Python Interface for Training Reinforcement Learning Pokémon Bots

Poke-env provides an environment for engaging in `Pokémon Showdown <https://pokemonshowdown.com/>`__ battles with a focus on reinforcement learning.

-It boasts a straightforward API for handling Pokémon, Battles, Moves, and other battle-centric objects, alongside an `OpenAI Gym <https://gym.openai.com/>`__ interface for training agents.
+It boasts a straightforward API for handling Pokémon, Battles, Moves, and other battle-centric objects, alongside a `Farama Gymnasium <https://gymnasium.farama.org/>`__ interface for training agents.

.. attention:: While poke-env aims to support all Pokémon generations, it was primarily developed with the latest generations in mind. If you discover any missing or incorrect functionalities for earlier generations, please `open an issue <https://github.com/hsahovic/poke-env/issues>`__ to help improve the library.

4 changes: 2 additions & 2 deletions docs/source/modules/player.rst
@@ -21,10 +21,10 @@ Player
:undoc-members:
:show-inheritance:

-OpenAIGymEnv
+GymnasiumEnv
************

-.. automodule:: poke_env.player.openai_api
+.. automodule:: poke_env.player.gymnasium_api
:members:
:undoc-members:
:show-inheritance:
20 changes: 10 additions & 10 deletions examples/openai_example.py → examples/gymnasium_example.py
@@ -7,13 +7,13 @@
from poke_env.environment.abstract_battle import AbstractBattle
from poke_env.player import (
    Gen8EnvSinglePlayer,
+    GymnasiumEnv,
    ObservationType,
-    OpenAIGymEnv,
    RandomPlayer,
)


-class TestEnv(OpenAIGymEnv):
+class TestEnv(GymnasiumEnv):
    def __init__(self, **kwargs):
        self.opponent = RandomPlayer(
            battle_format="gen8randombattle",
@@ -66,31 +66,31 @@ def describe_embedding(self) -> Space:
        return Box(np.array([0, 0]), np.array([6, 6]), dtype=int)


-def openai_api():
-    gym_env = TestEnv(
+def gymnasium_api():
+    gymnasium_env = TestEnv(
        battle_format="gen8randombattle",
        server_configuration=LocalhostServerConfiguration,
        start_challenging=True,
    )
-    check_env(gym_env)
-    gym_env.close()
+    check_env(gymnasium_env)
+    gymnasium_env.close()


def env_player():
    opponent = RandomPlayer(
        battle_format="gen8randombattle",
        server_configuration=LocalhostServerConfiguration,
    )
-    gym_env = Gen8(
+    gymnasium_env = Gen8(
        battle_format="gen8randombattle",
        server_configuration=LocalhostServerConfiguration,
        start_challenging=True,
        opponent=opponent,
    )
-    check_env(gym_env)
-    gym_env.close()
+    check_env(gymnasium_env)
+    gymnasium_env.close()


if __name__ == "__main__":
-    openai_api()
+    gymnasium_api()
    env_player()
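To run the renamed example, a Pokémon Showdown server must be listening locally (that is what ``LocalhostServerConfiguration`` points at); the script can then be invoked directly:

.. code-block:: console

    $ python examples/gymnasium_example.py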
File renamed without changes.
@@ -72,7 +72,7 @@ def describe_embedding(self) -> Space:

async def main():
    # First test the environment to ensure the class is consistent
-    # with the OpenAI API
+    # with the Gymnasium API
    opponent = RandomPlayer(battle_format="gen8randombattle")
    test_env = SimpleRLPlayer(
        battle_format="gen8randombattle", start_challenging=True, opponent=opponent
106 changes: 32 additions & 74 deletions integration_tests/test_env_player.py
@@ -1,7 +1,7 @@
import numpy as np
import pytest
from gymnasium.spaces import Box, Space
-from gymnasium.utils.env_checker import check_env
+from pettingzoo.test.parallel_test import parallel_api_test

from poke_env.player import (
    Gen4EnvSinglePlayer,
@@ -10,7 +10,6 @@
    Gen7EnvSinglePlayer,
    Gen8EnvSinglePlayer,
    Gen9EnvSinglePlayer,
-    RandomPlayer,
)


@@ -80,81 +79,61 @@ def embed_battle(self, battle):
        return np.array([0])


-def play_function(player, n_battles):
+def play_function(env, n_battles):
    for _ in range(n_battles):
        done = False
-        player.reset()
+        env.reset()
        while not done:
-            _, _, terminated, truncated, _ = player.step(player.action_space.sample())
-            done = terminated or truncated
+            actions = {name: env.action_space(name).sample() for name in env.agents}
+            _, _, terminated, truncated, _ = env.step(actions)
+            done = any(terminated.values()) or any(truncated.values())
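The rewritten loop reflects PettingZoo's parallel API, in which ``reset`` and ``step`` exchange dicts keyed by agent name rather than single values. A sketch of that contract (generic PettingZoo usage, assuming ``env`` is a parallel env):

.. code-block:: python

    # Every value is a dict keyed by agent name.
    observations, infos = env.reset()
    actions = {name: env.action_space(name).sample() for name in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    battle_over = any(terminations.values()) or any(truncations.values())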


@pytest.mark.timeout(30)
-def test_random_gym_players_gen4():
-    random_player = RandomPlayer(battle_format="gen4randombattle", log_level=25)
-    env_player = RandomGen4EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=False
-    )
+def test_random_gymnasium_players_gen4():
+    env_player = RandomGen4EnvPlayer(log_level=25, start_challenging=False)
    env_player.start_challenging(3)
    play_function(env_player, 3)


@pytest.mark.timeout(30)
-def test_random_gym_players_gen5():
-    random_player = RandomPlayer(battle_format="gen5randombattle", log_level=25)
-    env_player = RandomGen5EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=False
-    )
+def test_random_gymnasium_players_gen5():
+    env_player = RandomGen5EnvPlayer(log_level=25, start_challenging=False)
    env_player.start_challenging(3)
    play_function(env_player, 3)


@pytest.mark.timeout(30)
-def test_random_gym_players_gen6():
-    random_player = RandomPlayer(battle_format="gen6randombattle", log_level=25)
-    env_player = RandomGen6EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=False
-    )
+def test_random_gymnasium_players_gen6():
+    env_player = RandomGen6EnvPlayer(log_level=25, start_challenging=False)
    env_player.start_challenging(3)
    play_function(env_player, 3)


@pytest.mark.timeout(30)
-def test_random_gym_players_gen7():
-    random_player = RandomPlayer(battle_format="gen7randombattle", log_level=25)
-    env_player = RandomGen7EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=False
-    )
+def test_random_gymnasium_players_gen7():
+    env_player = RandomGen7EnvPlayer(log_level=25, start_challenging=False)
    env_player.start_challenging(3)
    play_function(env_player, 3)


@pytest.mark.timeout(30)
-def test_random_gym_players_gen8():
-    random_player = RandomPlayer(battle_format="gen8randombattle", log_level=25)
-    env_player = RandomGen8EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=False
-    )
+def test_random_gymnasium_players_gen8():
+    env_player = RandomGen8EnvPlayer(log_level=25, start_challenging=False)
    env_player.start_challenging(3)
    play_function(env_player, 3)


@pytest.mark.timeout(30)
-def test_random_gym_players_gen9():
-    random_player = RandomPlayer(battle_format="gen9randombattle", log_level=25)
-    env_player = RandomGen9EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=False
-    )
+def test_random_gymnasium_players_gen9():
+    env_player = RandomGen9EnvPlayer(log_level=25, start_challenging=False)
    env_player.start_challenging(3)
    play_function(env_player, 3)


@pytest.mark.timeout(60)
def test_two_successive_calls_gen8():
-    random_player = RandomPlayer(battle_format="gen8randombattle", log_level=25)
-    env_player = RandomGen8EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=False
-    )
+    env_player = RandomGen8EnvPlayer(log_level=25, start_challenging=False)
    env_player.start_challenging(2)
    play_function(env_player, 2)
    env_player.start_challenging(2)
@@ -163,10 +142,7 @@ def test_two_successive_calls_gen8():

@pytest.mark.timeout(60)
def test_two_successive_calls_gen9():
-    random_player = RandomPlayer(battle_format="gen9randombattle", log_level=25)
-    env_player = RandomGen9EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=False
-    )
+    env_player = RandomGen9EnvPlayer(log_level=25, start_challenging=False)
    env_player.start_challenging(2)
    play_function(env_player, 2)
    env_player.start_challenging(2)
@@ -175,39 +151,21 @@ def test_two_successive_calls_gen9():

@pytest.mark.timeout(60)
def test_check_envs():
-    random_player = RandomPlayer(battle_format="gen4randombattle", log_level=25)
-    env_player_gen4 = RandomGen4EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=True
-    )
-    check_env(env_player_gen4)
+    env_player_gen4 = RandomGen4EnvPlayer(log_level=25, start_challenging=True)
+    parallel_api_test(env_player_gen4)
    env_player_gen4.close()
-    random_player = RandomPlayer(battle_format="gen5randombattle", log_level=25)
-    env_player_gen5 = RandomGen5EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=True
-    )
-    check_env(env_player_gen5)
+    env_player_gen5 = RandomGen5EnvPlayer(log_level=25, start_challenging=True)
+    parallel_api_test(env_player_gen5)
    env_player_gen5.close()
-    random_player = RandomPlayer(battle_format="gen6randombattle", log_level=25)
-    env_player_gen6 = RandomGen6EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=True
-    )
-    check_env(env_player_gen6)
+    env_player_gen6 = RandomGen6EnvPlayer(log_level=25, start_challenging=True)
+    parallel_api_test(env_player_gen6)
    env_player_gen6.close()
-    random_player = RandomPlayer(battle_format="gen7randombattle", log_level=25)
-    env_player_gen7 = RandomGen7EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=True
-    )
-    check_env(env_player_gen7)
+    env_player_gen7 = RandomGen7EnvPlayer(log_level=25, start_challenging=True)
+    parallel_api_test(env_player_gen7)
    env_player_gen7.close()
-    random_player = RandomPlayer(battle_format="gen8randombattle", log_level=25)
-    env_player_gen8 = RandomGen8EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=True
-    )
-    check_env(env_player_gen8)
+    env_player_gen8 = RandomGen8EnvPlayer(log_level=25, start_challenging=True)
+    parallel_api_test(env_player_gen8)
    env_player_gen8.close()
-    random_player = RandomPlayer(battle_format="gen9randombattle", log_level=25)
-    env_player_gen9 = RandomGen9EnvPlayer(
-        log_level=25, opponent=random_player, start_challenging=True
-    )
-    check_env(env_player_gen9)
+    env_player_gen9 = RandomGen9EnvPlayer(log_level=25, start_challenging=True)
+    parallel_api_test(env_player_gen9)
    env_player_gen9.close()
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "poke_env"
version = "0.8.2"
version = "0.8.3"
description = "A python interface for training Reinforcement Learning bots to battle on pokemon showdown."
readme = "README.md"
requires-python = ">=3.9.0"
1 change: 1 addition & 0 deletions requirements.txt
@@ -1,6 +1,7 @@
gymnasium
numpy
orjson
+pettingzoo
requests
tabulate
websockets==12.0
1 change: 1 addition & 0 deletions src/poke_env/environment/effect.py
@@ -828,6 +828,7 @@ def is_from_move(self) -> bool:
"FLOWERVEIL": Effect.FLOWER_VEIL,
"FOCUSBAND": Effect.FOCUS_BAND,
"FOCUSENERGY": Effect.FOCUS_ENERGY,
"FOCUSPUNCH": Effect.FOCUS_PUNCH,
"FOLLOWME": Effect.FOLLOW_ME,
"FORESIGHT": Effect.FORESIGHT,
"FOREWARN": Effect.FOREWARN,
2 changes: 2 additions & 0 deletions src/poke_env/environment/pokemon.py
@@ -276,6 +276,8 @@ def forme_change(self, species: str):

    def heal(self, hp_status: str):
        self.set_hp_status(hp_status)
+        if self.fainted:
+            self._status = None

    def invert_boosts(self):
        self._boosts = {k: -v for k, v in self._boosts.items()}
