Commit

Merge branch 'release/1.3.0'
ronaldosvieira committed Sep 19, 2022
2 parents 8fe6d14 + a60e6b4 commit cf02be3
Showing 15 changed files with 555 additions and 438 deletions.
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -19,5 +19,5 @@ license: MIT
message: "If you use this software, please cite it as below."
repository-code: "https://github.com/ronaldosvieira/gym-locm"
title: "OpenAI Gym Environments for Legends of Code and Magic"
version: "1.2.0"
version: "1.3.0"
...
28 changes: 21 additions & 7 deletions README.md
@@ -243,8 +243,9 @@ engine, and with a specific random seed:
### Train draft agents with deep reinforcement learning

We provide scripts to train deep reinforcement learning draft agents as described in our
thesis <a href="#vieira2020a">[2]</a> and paper <a href="#vieira2020b">[3]</a>. Further instructions are available in the README.md in
the [experiments](https://github.com/ronaldosvieira/gym-locm/tree/master/gym_locm/experiments)
thesis <a href="#vieira2020a">[2]</a> and SBGames 2020 paper <a href="#vieira2020b">[3]</a>.
Further instructions are available in the README.md file in
the [experiments](gym_locm/experiments)
package.

To install the dependencies necessary to run the scripts, install
@@ -253,16 +254,25 @@ the repository with
pip install -e .['experiments']
```

### Use trained draft agents

We provide a collection of draft agents trained with deep
We also provide a collection of draft agents trained with deep
reinforcement learning, and a script to use them in LOCM's original engine.
Further details on these agents and instructions for the script are available in the
README.md in the
[trained_models](https://github.com/ronaldosvieira/gym-locm/tree/master/gym_locm/trained_models)
[trained_models](gym_locm/trained_models)
package. The use of these draft agents with the Runner script is not implemented yet.

### Train battle agents with deep reinforcement learning

We provide scripts to train deep reinforcement learning battle agents as described in our
SBGames 2022 paper <a href="#vieira2022a">[4]</a>. Further instructions are available
in the README.md file in the [experiments/papers/sbgames-2022](gym_locm/experiments/papers/sbgames-2022)
package.

The use of these draft agents with the Runner script is not implemented yet.
To install the dependencies necessary to run the scripts, install
the repository with
```
pip install -e .['experiments']
```

## References
1. <span id="kowalski2020">Kowalski, J., Miernik, R. (2020). Evolutionary
@@ -276,5 +286,9 @@ of Minas Gerais, Belo Horizonte, Brazil.</span>
Collectible Card Games via Reinforcement Learning. 19th Brazilian Symposium of Computer Games
and Digital Entertainment (SBGames).</span>

4. <span id="vieira2022a">Vieira, R., Tavares, A. R., Chaimowicz, L. (2022). Exploring Deep
Reinforcement Learning for Battling in Collectible Card Games. 21st Brazilian Symposium
of Computer Games and Digital Entertainment (SBGames).</span>

## License
[MIT](https://choosealicense.com/licenses/mit/)
2 changes: 2 additions & 0 deletions gym_locm/agents.py
@@ -1198,8 +1198,10 @@ def act(self, state):
"pass": PassBattleAgent,
"random": RandomBattleAgent,
"greedy": GreedyBattleAgent,
"osl": GreedyBattleAgent,
"rule-based": RuleBasedBattleAgent,
"max-attack": MaxAttackBattleAgent,
"ma": MaxAttackBattleAgent,
"coac": CoacBattleAgent,
"mcts": MCTSBattleAgent
}
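
The two added keys are aliases: `osl` resolves to the same class as `greedy`, and `ma` to the same class as `max-attack`, matching the MA/OSL naming used in the experiment READMEs below. As a hedged illustration of how such a name-to-class mapping can be used (the dict and helper names here are hypothetical, since the enclosing dict's name is not visible in this hunk):

```python
# Illustrative only: mirrors the aliases added above; names are hypothetical.
from gym_locm.agents import GreedyBattleAgent, MaxAttackBattleAgent

_battle_agents = {
    "greedy": GreedyBattleAgent,
    "osl": GreedyBattleAgent,         # alias added by this commit
    "max-attack": MaxAttackBattleAgent,
    "ma": MaxAttackBattleAgent,       # alias added by this commit
}

def parse_battle_agent(name: str):
    """Resolve a battle agent name (or alias) into an agent instance."""
    return _battle_agents[name]()

agent = parse_battle_agent("osl")     # same behavior as requesting "greedy"
```
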
37 changes: 32 additions & 5 deletions gym_locm/envs/battle.py
@@ -200,13 +200,15 @@ def get_episode_rewards(self):

class LOCMBattleSingleEnv(LOCMBattleEnv):
def __init__(self, battle_agent=RandomBattleAgent(),
play_first=True, **kwargs):
play_first=True, alternate_roles=False, **kwargs):
# init the env
super().__init__(**kwargs)

# also init the battle agent and the new parameter
# also init the battle agent and the new parameters
self.battle_agent = battle_agent
self.play_first = play_first
self.alternate_roles = alternate_roles
self.rewards_single_player = []

# reset the battle agent
self.battle_agent.reset()
@@ -216,6 +218,9 @@ def reset(self) -> np.array:
Resets the environment.
The game is put into its initial state and all agents are reset.
"""
if self.alternate_roles:
self.play_first = not self.play_first

# reset what is needed
encoded_state = super().reset()

@@ -227,6 +232,8 @@ def reset(self) -> np.array:
while self.state.current_player.id != PlayerOrder.SECOND:
super().step(self.battle_agent.act(self.state))

self.rewards_single_player.append(0.0)

return encoded_state

def step(self, action):
@@ -253,17 +260,27 @@ def step(self, action):
if not self.play_first:
reward = -reward

try:
self.rewards_single_player[-1] += reward
except IndexError:
self.rewards_single_player = [reward]

return state, reward, done, info

def get_episode_rewards(self):
return self.rewards_single_player


class LOCMBattleSelfPlayEnv(LOCMBattleEnv):
def __init__(self, play_first=True, adversary_policy=None, **kwargs):
def __init__(self, play_first=True, alternate_roles=True, adversary_policy=None, **kwargs):
# init the env
super().__init__(**kwargs)

# also init the new parameters
self.play_first = play_first
self.adversary_policy = adversary_policy
self.alternate_roles = alternate_roles
self.rewards_single_player = []

def reset(self) -> np.array:
"""
@@ -273,8 +290,8 @@ def reset(self) -> np.array:
# reset what is needed
encoded_state = super().reset()

# also reset the battle agent
self.play_first = not self.play_first
if self.alternate_roles:
self.play_first = not self.play_first

# if playing second, have first player play
if not self.play_first:
Expand All @@ -288,6 +305,8 @@ def reset(self) -> np.array:
state, reward, done, info = super().step(0)
break

self.rewards_single_player.append(0.0)

return encoded_state

def step(self, action):
@@ -315,4 +334,12 @@ def step(self, action):
if not self.play_first:
reward = -reward

try:
self.rewards_single_player[-1] += reward
except IndexError:
self.rewards_single_player = [reward]

return state, reward, done, info

def get_episode_rewards(self):
return self.rewards_single_player
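
To make the new `alternate_roles` flag and the per-episode reward bookkeeping concrete, a minimal usage sketch follows. It assumes a standard Gym-style interaction loop and that sampled actions are acceptable to the environment; it is not code from this commit.

```python
# Hedged sketch exercising alternate_roles and get_episode_rewards() as added above.
from gym_locm.envs.battle import LOCMBattleSingleEnv
from gym_locm.agents import RandomBattleAgent

env = LOCMBattleSingleEnv(battle_agent=RandomBattleAgent(),
                          play_first=True, alternate_roles=True)

for episode in range(4):                      # roles swap on every reset()
    obs, done = env.reset(), False
    while not done:
        action = env.action_space.sample()    # stand-in for a trained policy
        obs, reward, done, info = env.step(action)

print(env.get_episode_rewards())              # one cumulative reward per episode
```
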
2 changes: 0 additions & 2 deletions gym_locm/envs/draft.py
@@ -152,8 +152,6 @@ def step(self, action: Union[int, Action]) -> (np.array, int, bool, dict):

done = True

del info['turn']

if reward_before is None:
raw_rewards = (0.0,) * len(self.reward_functions)
else:
2 changes: 0 additions & 2 deletions gym_locm/envs/full_game.py
@@ -107,8 +107,6 @@ def step(self, action):
if winner is not None:
reward = 1 if winner == PlayerOrder.FIRST else -1

del info['turn']

return self.encode_state(), reward, done, info

def _encode_state_battle(self):
4 changes: 2 additions & 2 deletions gym_locm/experiments/README.md
@@ -17,7 +17,7 @@ python3 gym_locm/experiments/hyp-search.py --approach <approach> --battle-agent
--path hyp_search_results/ --seed 96765 --processes 4
```

The list and range of hyperparameted explored is available in the Appendix of our paper and in Attachment A of
The list and range of hyperparameters explored is available in the Appendix of our paper and in Attachment A of
our thesis. We performed hyperparameter tunings for all combinations of `<approach>` (`immediate`, `history`
and `lstm`) and `<battle_agent>` (`max-attack` and `greedy`). Each run of the script took around 2 days with the
`max-attack` battle agent and more than a week with the `greedy` battle agent. To learn about other script's
@@ -37,7 +37,7 @@ python3 gym_locm/experiments/training.py --approach <approach> --battle-agent <b
```

We trained 20 draft agents (ten 1st players and ten 2nd players) of each combination of `<approach>` and
`<battle agent>`, using the best sets of hyperparameters found for them in the previous experiment. That comprises
`<battle_agent>`, using the best sets of hyperparameters found for them in the previous experiment. That comprises
ten runs of the script, in which we used the seeds 32359627, 91615349, 88803987, 83140551, 50731732, 19279988, 35717793,
48046766, 86798618 and 62644993.

109 changes: 109 additions & 0 deletions gym_locm/experiments/papers/entcom-2022/README.md
@@ -0,0 +1,109 @@
# Reproducing the experiments from our Entertainment Computing 2022 paper

This readme file contains the information necessary to reproduce the experiments
from our paper in Entertainment Computing 2022 named "_Exploring Deep Reinforcement Learning for
Drafting in Collectible Card Games_." Please contact
me at [ronaldo.vieira@dcc.ufmg.br](mailto:ronaldo.vieira@dcc.ufmg.br) in case any
of the instructions below do not work.

The game engine for LOCM 1.2 can be found at [engine.py](../../../engine.py), which is used by the OpenAI
Gym environments (more info on the repository's main page). The implementation of our
approaches can be found in the experiment files mentioned below. The resulting agents can be found in the
[trained_models](../../../trained_models) folder, along with instructions on how to use them.

## Section 4.1: hyperparameter search

To perform a hyperparameter tuning, simply execute the [hyp-search.py](../../../experiments/hyp-search.py) script:

```
python3 gym_locm/experiments/hyp-search.py --approach <approach> --battle-agent <battle_agent> \
--path hyp_search_results/ --seed 96765 --processes 4
```

The list and range of hyperparameters explored are available in Appendix A of our paper. We performed
hyperparameter tunings for all combinations of `<approach>` (`immediate`, `history` and `lstm`) and
`<battle_agent>` (`ma` and `osl`). To learn about the script's other parameters, execute it with the
`--help` flag.

## Section 4.2: comparison between our approaches

To train **two** draft agents (a 1st player and a 2nd player) with a specific draft approach and battle agent,
in asymmetric self-play, simply execute the [training.py](../../../experiments/training.py) script:

```
python3 gym_locm/experiments/training.py --approach <approach> --battle-agent <battle_agent> \
--path training_results/ --switch-freq <switch_freq> --layers <layers> --neurons <neurons> \
--act-fun <activation_function> --n-steps <batch_size> --nminibatches <n_minibatches> \
--noptepochs <n_epochs> --cliprange <cliprange> --vf-coef <vf_coef> --ent-coef <ent_coef> \
--learning-rate <learning_rate> --seed 32359627 --concurrency 4
```

We trained ten draft agents (five 1st players and five 2nd players) of each combination of `<approach>` and
`<battle_agent>`, using the best sets of hyperparameters found for them in the previous experiment. That comprises
five runs of the script, in which we used the seeds `32359627`, `91615349`, `88803987`, `83140551` and `50731732`.

To learn about the script's other parameters, execute it with the `--help` flag. Running the script with all default
parameters will train an `immediate` drafter with the `ma` battler, using the best set of hyperparameters
we found for that combination. The best sets of hyperparameters for the other combinations are available in
Appendix A of our paper.

## Section 4.3: comparison with other draft agents

To run one of the tournaments, simply execute the [tournament.py](../../../experiments/tournament.py) script:
```
python3 gym_locm/experiments/tournament.py \
--drafters random max-attack coac closet-ai icebox chad \
gym_locm/trained_models/<battle_agent>/immediate-1M/ \
gym_locm/trained_models/<battle_agent>/lstm-1M/ \
gym_locm/trained_models/<battle_agent>/history-1M/ \
--battler <battle_agent> --concurrency 4 --games 1000 --path tournament_results/ \
--seeds 32359627 91615349 88803987 83140551 50731732
```
replacing `<battle_agent>` with either `ma` or `osl` to run the corresponding tournament as
depicted in our paper. The script will create files at `tournament_results/` describing
the individual win rates of every set of matches, the aggregate win rates, average mana curves (section 4.3.2)
and every individual draft choice made by every agent, in CSV format, for human inspection, and as serialized
Pandas data frames (PKL format), for easy further data manipulation. To learn about the other script's
parameters, execute it with the `--help` flag.

To reproduce the table of agent similarities and the plot containing the agents' three-dimensional coordinates
found via Principal Component Analysis and grouped via K-Means (section 4.3.3), simply execute the
[similarities.py](../../../experiments/similarities.py) script:
```
python3 gym_locm/experiments/similarities.py \
--files ma_tournament_results/choices.csv osl_tournament_results/choices.csv
```
which will write the similarities table (in CSV and PKL formats) and the plot (in PNG format)
to the current folder.
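
For intuition, the sketch below shows the kind of analysis involved (pandas and scikit-learn assumed; the column layout and cluster count are guesses, and this is not the actual `similarities.py` implementation):

```python
# Illustrative only: project per-agent draft-choice vectors to 3-D and group them.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Assumed layout: one row per agent, one numeric column per draft-choice feature.
choices = pd.read_csv("ma_tournament_results/choices.csv", index_col=0)

coords = PCA(n_components=3).fit_transform(choices.values)      # 3-D coordinates
clusters = KMeans(n_clusters=4, n_init=10).fit_predict(coords)  # group the agents

summary = pd.DataFrame(coords, index=choices.index,
                       columns=["pc1", "pc2", "pc3"]).assign(cluster=clusters)
print(summary)
```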

## Section 4.4: agent improvement in the SCGAI competition

We used the source code of the Strategy Card Game AI competition
([2019](https://github.com/acatai/Strategy-Card-Game-AI-Competition/tree/master/contest-2019-08-COG) and
[2020](https://github.com/acatai/Strategy-Card-Game-AI-Competition/tree/master/contest-2020-08-COG) editions)
to re-run the matches, replacing the *max-attack* player (named Baseline2) with a personalized player featuring
our best draft agent and the battle portion of the *max-attack* player. This can be reproduced by altering line
11 (2019) or line 2 (2020) of the runner script (`run.sh`) from `AGENTS[10]="python3 Baseline2/main.py"` to
```bash
AGENTS[10]="python3 gym_locm/toolbox/predictor.py --battle \"python3 Baseline2/main.py\" \
--draft-1 path/to/gym_locm/trained_models/max-attack/immediate-1M/1st/6.json \
--draft-2 path/to/gym_locm/trained_models/max-attack/immediate-1M/2nd/8.json"
```
and then executing it. Parallelism can be achieved by running the script in multiple processes or machines. Save the
output to text files named `out-*.txt` (with a number instead of `*`) in the same folder, then run `analyze.py`
to extract win rates. The runner script can take up to several days, and the analyze script can take up to a few hours.
See the [trained_models](../../../trained_models) package for more information on the predictor script.

## Section 4.5: importance of being history-aware in LOCM

This experiment is simply a re-execution of the OSL tournament from section 4.3, adding a new draft agent to the
tournament (`historyless`). To reproduce it, execute the following script:
```
python3 gym_locm/experiments/tournament.py \
--drafters random max-attack coac closet-ai icebox chad historyless \
gym_locm/trained_models/<battle_agent>/immediate-1M/ \
gym_locm/trained_models/<battle_agent>/lstm-1M/ \
gym_locm/trained_models/<battle_agent>/history-1M/ \
--battler osl --concurrency 4 --games 1000 --path osl_historyless_tournament_results/ \
--seeds 32359627 91615349 88803987 83140551 50731732
```
74 changes: 74 additions & 0 deletions gym_locm/experiments/papers/sbgames-2022/README.md
@@ -0,0 +1,74 @@
# Reproducing the experiments from our SBGames 2022 paper

This readme file contains the information necessary to reproduce the experiments
from our paper in SBGames 2022 named "_Exploring Deep Reinforcement Learning for
Battling in Collectible Card Games_." Although we mention in the paper that we use
gym-locm's version 1.3.0, any future version should also suffice. Please contact
me at [ronaldo.vieira@dcc.ufmg.br](mailto:ronaldo.vieira@dcc.ufmg.br) in case any
of the instructions below do not work.

## Experiment 1: hyperparameter search

We use Weights and Biases (W&B) to orchestrate our hyperparameter search. The
`hyp-search.yaml` file contains the search configuration, including hyperparameter
ranges. Having W&B installed, executing the following command on a terminal will
create a "sweep" on W&B:

```commandline
wandb sweep gym_locm/experiments/sbgames-2022/hyp-search.yaml
```

This command will output a _sweep ID_, including the entity and project names.
Save it for the next step.
From this moment on, the hyperparameter search can be observed on W&B's website.
However, no training sessions will happen until you "recruit" one or more
computers to run the training sessions. That can be done by executing the following
command on a terminal:

```commandline
wandb agent <sweep_id>
```

Where the `sweep_id` parameter should be the _sweep ID_ saved from the output of
the previous command. From now on, the recruited computers will run training sessions
continuously until you tell them to stop. That can be done on W&B's website or by
issuing a CTRL + C on the terminal where the training sessions are being executed.
In our paper, we executed 35 training sessions. All the statistics can be seen on
W&B's website, including which sets of hyperparameters yielded the best results.
For more info on W&B sweeps, see [the docs](https://docs.wandb.ai/guides/sweeps).

## Experiment 2: training in self-play

Using the best set of hyperparameters found in the previous experiment, we executed
five training sessions, each with a different random seed. To reproduce the training
sessions we used for the paper, execute the following command on a terminal:

```commandline
python gym_locm/experiments/training.py --act-fun=relu --adversary=self-play \
--cliprange=0.2 --concurrency=4 --draft-agent=random --ent-coef=0.005 \
--eval-episodes=500 --gamma=0.99 --layers=7 --learning-rate=0.0041142387646692325 \
--n-steps=512 --neurons=455 --nminibatches-divider=1 --noptepochs=1 --num-evals=100 \
--path=gym_locm/experiments/papers/sbgames-2022/self-play --role=alternate \
--seed=<seed> --switch-freq=10 --task=battle --train-episodes=100000 --vf-coef=1
```

Repeat the command five times, each time with a different `seed` parameter. The seeds we used were:
`91577453`, `688183`, `63008694`, `4662087`, and `58793266`.

## Experiment 3: training against a fixed battle agent

This experiment uses almost the same command as the previous one:

```commandline
python gym_locm/experiments/training.py --act-fun=relu --adversary=fixed \
--battle-agent=<battle_agent> --cliprange=0.2 --concurrency=4 --draft-agent=random \
--ent-coef=0.005 --eval-episodes=500 --gamma=0.99 --layers=7 \
--learning-rate=0.0041142387646692325 --n-steps=512 --neurons=455 \
--nminibatches-divider=1 --noptepochs=1 --num-evals=100 \
--path=gym_locm/experiments/papers/sbgames-2022/fixed --role=alternate --seed=<seed> \
--switch-freq=10 --task=battle --train-episodes=100000 --vf-coef=1
```

Repeat the command ten times, each time with a different combination of `battle_agent` and `seed`
parameters. The seeds we used were: `91577453`, `688183`, `63008694`, `4662087`,
and `58793266`. The battle agents we used were `max-attack` (MA) and `greedy` (OSL).