# Reproducing the experiments from our Entertainment Computing 2022 paper

This readme file contains the information necessary to reproduce the experiments
from our paper in Entertainment Computing 2022, titled "_Exploring Deep Reinforcement Learning for
Drafting in Collectible Card Games_." Please contact
me at [ronaldo.vieira@dcc.ufmg.br](mailto:ronaldo.vieira@dcc.ufmg.br) in case any
of the instructions below do not work.

The game engine for LOCM 1.2 can be found at [engine.py](../../../engine.py), which is used by the OpenAI
Gym environments (more info on the repository's main page). The implementation of our
approaches can be found in the experiment files mentioned below. The resulting agents are in the
[trained_models](../../../trained_models) folder, along with instructions on how to use them.
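
As a quick orientation, here is a minimal sketch of how the Gym environments are typically used (the environment ID `LOCM-draft-v0` is an assumption; check the repository's main page for the registered names):

```python
import gym
import gym_locm  # importing registers the LOCM environments

# Environment ID is an assumption; see the repository's main page
# for the actual list of registered environments.
env = gym.make("LOCM-draft-v0")

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a draft agent's choice
    obs, reward, done, info = env.step(action)
env.close()
```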

## Section 4.1: hyperparameter search

To perform the hyperparameter tuning, simply execute the [hyp-search.py](../../../experiments/hyp-search.py) script:

```
python3 gym_locm/experiments/hyp-search.py --approach <approach> --battle-agent <battle_agent> \
    --path hyp_search_results/ --seed 96765 --processes 4
```

The list and range of hyperparameters explored are available in Appendix A of our paper. We performed
hyperparameter tunings for all combinations of `<approach>` (`immediate`, `history`, and `lstm`) and
`<battle_agent>` (`ma` and `osl`). To learn about the script's other parameters, execute it with the
`--help` flag.
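
For convenience, all six combinations can be run in sequence with a small shell loop (a sketch; the per-combination output directories are illustrative):

```bash
# Run the hyperparameter search for every approach/battle-agent combination.
for approach in immediate history lstm; do
  for battler in ma osl; do
    python3 gym_locm/experiments/hyp-search.py --approach "$approach" \
      --battle-agent "$battler" --path "hyp_search_results/$approach-$battler/" \
      --seed 96765 --processes 4
  done
done
```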

## Section 4.2: comparison between our approaches

To train **two** draft agents (a 1st player and a 2nd player) with a specific draft approach and battle agent,
in asymmetric self-play, simply execute the [training.py](../../../experiments/training.py) script:

```
python3 gym_locm/experiments/training.py --approach <approach> --battle-agent <battle_agent> \
    --path training_results/ --switch-freq <switch_freq> --layers <layers> --neurons <neurons> \
    --act-fun <activation_function> --n-steps <batch_size> --nminibatches <n_minibatches> \
    --noptepochs <n_epochs> --cliprange <cliprange> --vf-coef <vf_coef> --ent-coef <ent_coef> \
    --learning-rate <learning_rate> --seed 32359627 --concurrency 4
```

We trained ten draft agents (five 1st players and five 2nd players) for each combination of `<approach>` and
`<battle_agent>`, using the best sets of hyperparameters found for them in the previous experiment. That comprises
five runs of the script per combination, in which we used the seeds `32359627`, `91615349`, `88803987`, `83140551`, and `50731732`.
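
Those five runs can be scripted with a simple loop (a sketch; replace the placeholders with the best hyperparameters from section 4.1):

```bash
for seed in 32359627 91615349 88803987 83140551 50731732; do
  python3 gym_locm/experiments/training.py --approach <approach> --battle-agent <battle_agent> \
    --path "training_results/seed-$seed/" --seed "$seed" --concurrency 4
done
```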

To learn about the script's other parameters, execute it with the `--help` flag. Running the script with all default
parameters will train an `immediate` drafter with the `ma` battler, using the best set of hyperparameters
we found for that combination. The best sets of hyperparameters for the other combinations are available in
Appendix A of our paper.

## Section 4.3: comparison with other draft agents

To run one of the tournaments, simply execute the [tournament.py](../../../experiments/tournament.py) script:
```
python3 gym_locm/experiments/tournament.py \
    --drafters random max-attack coac closet-ai icebox chad \
        gym_locm/trained_models/<battle_agent>/immediate-1M/ \
        gym_locm/trained_models/<battle_agent>/lstm-1M/ \
        gym_locm/trained_models/<battle_agent>/history-1M/ \
    --battler <battle_agent> --concurrency 4 --games 1000 --path tournament_results/ \
    --seeds 32359627 91615349 88803987 83140551 50731732
```
replacing `<battle_agent>` with either `ma` or `osl` to run the respective tournament
depicted in our paper. The script will create files at `tournament_results/` describing
the individual win rates of every set of matches, the aggregate win rates, average mana curves (section 4.3.2),
and every individual draft choice made by every agent, both in CSV format, for human inspection, and as serialized
Pandas data frames (PKL format), for further data manipulation. To learn about the script's
other parameters, execute it with the `--help` flag.
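
The PKL files can be loaded back with Pandas for further analysis, along these lines (a sketch; the file name `win_rates.pkl` is illustrative, so use the names the script actually writes):

```python
import pandas as pd

# File name is illustrative; check tournament_results/ for the actual names.
win_rates = pd.read_pickle("tournament_results/win_rates.pkl")
print(win_rates.head())
```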

To reproduce the table of agent similarities and the plot containing the agents' three-dimensional coordinates
found via Principal Component Analysis and grouped via K-Means (section 4.3.3), simply execute the
[similarities.py](../../../experiments/similarities.py) script:
```
python3 gym_locm/experiments/similarities.py \
    --files ma_tournament_results/choices.csv osl_tournament_results/choices.csv
```
which will create files containing the similarities table (in CSV and PKL formats) and the plot (in PNG format)
in the current folder.
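
For reference, the PCA and K-Means steps of section 4.3.3 have this general shape (an illustration only, not the script itself; the feature matrix and cluster count are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Placeholder data: one row per agent, one column per draft-choice feature.
agent_features = np.random.rand(9, 50)

coords = PCA(n_components=3).fit_transform(agent_features)  # 3-D coordinates
labels = KMeans(n_clusters=3, n_init=10).fit_predict(coords)  # group the agents
print(coords.shape, labels)
```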

## Section 4.4: agent improvement in the SCGAI competition

We used the source code of the Strategy Card Game AI competition
([2019](https://github.com/acatai/Strategy-Card-Game-AI-Competition/tree/master/contest-2019-08-COG) and
[2020](https://github.com/acatai/Strategy-Card-Game-AI-Competition/tree/master/contest-2020-08-COG) editions)
to re-run the matches, replacing the *max-attack* player (named Baseline2) with a personalized player that combines
our best draft agent with the battle portion of the *max-attack* player. This can be reproduced by altering line
11 (2019) or line 2 (2020) of the runner script (`run.sh`) from `AGENTS[10]="python3 Baseline2/main.py"` to
```bash
AGENTS[10]="python3 gym_locm/toolbox/predictor.py --battle \"python3 Baseline2/main.py\" \
    --draft-1 path/to/gym_locm/trained_models/max-attack/immediate-1M/1st/6.json \
    --draft-2 path/to/gym_locm/trained_models/max-attack/immediate-1M/2nd/8.json"
```
and then executing it. Parallelism can be achieved by running the script in multiple processes or machines. Save the
output to text files named `out-*.txt` (with a number in place of `*`) in the same folder, then run `analyze.py`
to extract win rates. The runner script can take several days to finish, and the analysis script up to a few hours.
See the [trained_models](../../../trained_models) package for more information on the predictor script.
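
One possible way to launch several runner processes and collect their outputs for `analyze.py` (a sketch, assuming a single machine; the exact invocation of `analyze.py` may differ):

```bash
# Launch four independent runner processes, saving each output for analysis.
for i in 1 2 3 4; do
  ./run.sh > "out-$i.txt" 2>&1 &
done
wait
python3 analyze.py
```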

## Section 4.5: importance of being history-aware in LOCM

This experiment is simply a re-execution of the OSL tournament from section 4.3, adding a new draft agent
(`historyless`) to the tournament. To reproduce it, execute the following command:
```
python3 gym_locm/experiments/tournament.py \
    --drafters random max-attack coac closet-ai icebox chad historyless \
        gym_locm/trained_models/osl/immediate-1M/ \
        gym_locm/trained_models/osl/lstm-1M/ \
        gym_locm/trained_models/osl/history-1M/ \
    --battler osl --concurrency 4 --games 1000 --path osl_historyless_tournament_results/ \
    --seeds 32359627 91615349 88803987 83140551 50731732
```
# Reproducing the experiments from our SBGames 2022 paper

This readme file contains the information necessary to reproduce the experiments
from our paper in SBGames 2022, titled "_Exploring Deep Reinforcement Learning for
Battling in Collectible Card Games_." Although we mention in the paper that we use
gym-locm's version 1.3.0, any later version should also suffice. Please contact
me at [ronaldo.vieira@dcc.ufmg.br](mailto:ronaldo.vieira@dcc.ufmg.br) in case any
of the instructions below do not work.

## Experiment 1: hyperparameter search

We use Weights & Biases (W&B) to orchestrate our hyperparameter search. The
`hyp-search.yaml` file contains the search configuration, including hyperparameter
ranges. With W&B installed, executing the following command in a terminal will
create a "sweep" on W&B:

```commandline
wandb sweep gym_locm/experiments/sbgames-2022/hyp-search.yaml
```

This command will output a _sweep ID_, including the entity and project names;
save it for the next step.
From this moment on, the hyperparameter search can be observed on W&B's website.
However, no training sessions will happen until you "recruit" one or more
computers to run them. That can be done by executing the following
command in a terminal:

```commandline
wandb agent <sweep_id>
```

where the `sweep_id` parameter should be the _sweep ID_ saved from the output of
the previous command. From then on, the recruited computers will run training sessions
continuously until you tell them to stop, either on W&B's website or by
issuing a CTRL + C in the terminal where the training sessions are running.
In our paper, we executed 35 training sessions. All the statistics can be seen on
W&B's website, including which sets of hyperparameters yielded the best results.
For more info on W&B sweeps, see [the docs](https://docs.wandb.ai/guides/sweeps).
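
For reference, W&B sweep configurations follow this general shape (an illustrative sketch, not the actual contents of `hyp-search.yaml`; the program name, metric, and ranges below are placeholders):

```yaml
program: gym_locm/experiments/training.py
method: bayes
metric:
  name: win_rate  # placeholder metric name
  goal: maximize
parameters:
  learning-rate:
    min: 0.00001
    max: 0.01
  layers:
    values: [1, 3, 5, 7, 9]
```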

## Experiment 2: training in self-play

Using the best set of hyperparameters found in the previous experiment, we executed
five training sessions, each with a different random seed. To reproduce the training
sessions we used for the paper, execute the following command in a terminal:

```commandline
python gym_locm/experiments/training.py --act-fun=relu --adversary=self-play \
    --cliprange=0.2 --concurrency=4 --draft-agent=random --ent-coef=0.005 \
    --eval-episodes=500 --gamma=0.99 --layers=7 --learning-rate=0.0041142387646692325 \
    --n-steps=512 --neurons=455 --nminibatches-divider=1 --noptepochs=1 --num-evals=100 \
    --path=gym_locm/experiments/papers/sbgames-2022/self-play --role=alternate \
    --seed=<seed> --switch-freq=10 --task=battle --train-episodes=100000 --vf-coef=1
```

Repeat the command five times, each with a different `seed` parameter. The seeds we used were:
`91577453`, `688183`, `63008694`, `4662087`, and `58793266`.

## Experiment 3: training against a fixed battle agent

This experiment uses almost the same command as the previous one:

```commandline
python gym_locm/experiments/training.py --act-fun=relu --adversary=fixed \
    --battle-agent=<battle_agent> --cliprange=0.2 --concurrency=4 --draft-agent=random \
    --ent-coef=0.005 --eval-episodes=500 --gamma=0.99 --layers=7 \
    --learning-rate=0.0041142387646692325 --n-steps=512 --neurons=455 \
    --nminibatches-divider=1 --noptepochs=1 --num-evals=100 \
    --path=gym_locm/experiments/papers/sbgames-2022/fixed --role=alternate --seed=<seed> \
    --switch-freq=10 --task=battle --train-episodes=100000 --vf-coef=1
```

Repeat the command ten times, once for each combination of the `battle_agent` and `seed`
parameters. The seeds we used were: `91577453`, `688183`, `63008694`, `4662087`,
and `58793266`. The battle agents we used were `max-attack` (MA) and `greedy` (OSL).
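
The ten runs can be scripted as a nested loop (a sketch; the remaining flags are the ones shown in the command above):

```bash
for battler in max-attack greedy; do
  for seed in 91577453 688183 63008694 4662087 58793266; do
    python gym_locm/experiments/training.py --adversary=fixed \
      --battle-agent="$battler" --seed="$seed" \
      --path=gym_locm/experiments/papers/sbgames-2022/fixed  # plus the other flags above
  done
done
```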