GitHub - mughees-asif/pommerman-java-qmul: Monte Carlo Tree Search with Progressive Bias and Decaying Reward for the Pommerman (Java version) game.

Monte Carlo Tree Search with Progressive Bias and Decaying Reward

Abstract

This research focuses on Pommerman, a baseline multi-agent game with partial- and full-observability options. The goal of the game is to be the last agent or team standing, using bombs that damage player health. The main aim of the study was to improve upon a classical Statistical Forward Planning algorithm, Monte Carlo Tree Search (MCTS). MCTS is a highly selective best-first search method that determines good decisions in a given domain by drawing random samples from the decision space and constructing a search tree from the results. The new agent, MCTSBias, extends MCTS with progressive bias and a decaying reward strategy. In the experiments, MCTSBias decisively outperformed vanilla MCTS. Other players with different technical architectures were also tested to validate MCTSBias's performance.
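The two modifications named in the abstract can be illustrated with a minimal sketch. It assumes the textbook forms of both ideas: a UCB1 score with an added heuristic term that shrinks as a child gains visits (progressive bias), and a rollout reward multiplied by a discount factor raised to the depth (decaying reward). The class name, method names, the constant `C`, and the parameters `heuristic` and `gamma` are all hypothetical; the exact formulas and values used by MCTSBias are not stated here and may differ.

```java
/** Hypothetical sketch; names and constants are illustrative, not taken from the repository. */
final class ProgressiveBiasSketch {

    /** UCB1 exploration constant; sqrt(2) is a common default and is assumed here. */
    static final double C = Math.sqrt(2);

    /**
     * UCB1 score plus a progressive-bias term.
     * The bias heuristic / (childVisits + 1) matters most while a child has few
     * visits and fades as the child accumulates visits.
     */
    static double score(double totalReward, int childVisits, int parentVisits, double heuristic) {
        if (childVisits == 0) {
            return Double.POSITIVE_INFINITY; // simplification: unvisited children are tried first
        }
        double exploitation = totalReward / childVisits;
        double exploration = C * Math.sqrt(Math.log(parentVisits) / childVisits);
        double bias = heuristic / (childVisits + 1);
        return exploitation + exploration + bias;
    }

    /** Discounts a reward by gamma^depth so that outcomes found deeper in a rollout count less. */
    static double decayedReward(double reward, double gamma, int depth) {
        return reward * Math.pow(gamma, depth);
    }

    public static void main(String[] args) {
        System.out.println(score(3.0, 5, 20, 0.8));
        System.out.println(decayedReward(1.0, 0.9, 3));
    }
}
```

With `gamma` set to 1.0 the reward is left unchanged, and with `heuristic` set to 0.0 the score reduces to plain UCB1, so the sketch falls back to vanilla MCTS behaviour in both cases.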

Setup

Open the project using a suitable IDE, such as IntelliJ.
Clone the following repository: git clone https://github.com/GAIGResearch/java-pommerman
Navigate to the players directory.
Clone this repository to the players package of java-pommerman: git clone git@github.com:mughees-asif/pommerman-java-qmul.git

java-pommerman
│   README.md  
│   
└───...
│   
└───src
│   │   
│   └───core 
│   │   
│   └───...
│   │                   
│   └───players
│   │   │    
│   │   └───mcts
│   │   │    
│   │   └───mctsbias
│   │   │   
│   │   └───...
│   │   │    
│   │   └───rhea

Open Run.java (in src/).
This class is executed with 8 arguments, at indices 0 to 7 (passing none runs a default mode). The usage instructions are as follows:
- [arg index = 0] Game Mode. 0: FFA; 1: TEAM
- [arg index = 1] Number of level generation seeds [S]. "-1" to execute with the ones from the paper (20).
- [arg index = 2] Repetitions per seed [N]. "1" for one game only with visuals.
- [arg index = 3] Vision Range [VR]. (0, 1, 2 for PO; -1 for Full Observability)
- [arg index = 4-7] Agents. In TEAM mode, the agents at indices 4 and 6 are teammates, as are those at indices 5 and 7:
  - 0 DoNothing
  - 1 Random
  - 2 OSLA
  - 3 SimplePlayer
  - 4 RHEA 200 iterations, shift buffer On, pop size 1, random init, length: 12
  - 5 MCTS 200 iterations, length: 12
  - 6 Human Player (controls: cursor keys + space bar)
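The argument layout above can be sketched as a small parser. This is a hypothetical helper written for illustration, not code from java-pommerman or from Run.java: the class name, field names, and methods are assumptions. It reads the eight positional arguments, maps a seed count of -1 to the 20 seeds from the paper, and pairs teammates by argument index. Agent ids are parsed but not range-checked, since a given build may register agents beyond those listed.

```java
/** Hypothetical helper mirroring the argument layout described above; not part of java-pommerman. */
final class RunArgsSketch {

    /** "-1" as the seed count selects the 20 seeds from the paper. */
    static final int PAPER_SEED_COUNT = 20;

    final int gameMode;     // 0: FFA, 1: TEAM
    final int seedCount;    // number of level-generation seeds
    final int repetitions;  // games played per seed
    final int visionRange;  // -1 for full observability
    final int[] agents;     // agent type ids taken from arg indices 4-7

    RunArgsSketch(String[] args) {
        if (args.length != 8) {
            throw new IllegalArgumentException("expected 8 arguments, got " + args.length);
        }
        gameMode = Integer.parseInt(args[0]);
        if (gameMode != 0 && gameMode != 1) {
            throw new IllegalArgumentException("game mode must be 0 (FFA) or 1 (TEAM)");
        }
        int seeds = Integer.parseInt(args[1]);
        seedCount = (seeds == -1) ? PAPER_SEED_COUNT : seeds;
        repetitions = Integer.parseInt(args[2]);
        visionRange = Integer.parseInt(args[3]);
        agents = new int[4];
        for (int i = 0; i < agents.length; i++) {
            agents[i] = Integer.parseInt(args[4 + i]); // no range check, see note above
        }
    }

    /** Total number of games: seeds multiplied by repetitions per seed. */
    int totalGames() {
        return seedCount * repetitions;
    }

    /** In TEAM mode, arg indices 4 and 6 are teammates, as are 5 and 7. */
    static int teammateArgIndex(int argIndex) {
        if (argIndex < 4 || argIndex > 7) {
            throw new IllegalArgumentException("agent arguments use indices 4-7");
        }
        return argIndex < 6 ? argIndex + 2 : argIndex - 2;
    }

    public static void main(String[] args) {
        RunArgsSketch parsed = new RunArgsSketch("1 -1 10 4 5 3 5 3".split(" "));
        System.out.println(parsed.totalGames());
    }
}
```

A seed count of -1 with 10 repetitions gives 20 × 10 games, which is why the sketch resolves -1 before computing the total.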

Examples:

A single game with full observability, FFA. This is also the default mode when no arguments are passed:
- java -jar run.jar 0 1 1 -1 2 3 4 5
A single game with partial observability, FFA, where you're in control of one player:
- java -jar run.jar 0 1 1 2 0 1 2 6
Several games, headless, FFA. Two different random seeds for the level generation, repeated 5 times each (for a total of 5 × 2 = 10 games):
- java -jar run.jar 0 2 5 4 2 3 4 1
Executes several games, headless, TEAM, repeated 10 times each. Same configuration as the one used in the paper, including the 20 seeds.
- java -jar run.jar 1 -1 10 4 5 3 5 3

Group-AS:

Azar Park
Mughees Asif
Shrabana Biswas Shruti

