mughees-asif/pommerman-java-qmul

Monte Carlo Tree Search with Progressive Bias and Decaying Reward

Abstract

This research focuses on Pommerman, a baseline multi-agent game with partial and full-observability options. The goal of the game is to be the last agent or team standing, with every player equipped with bombs that damage player health. The main aim of the study was to improve upon a classical Statistical Forward Planning algorithm, Monte Carlo Tree Search (MCTS). MCTS is a highly selective best-first search method that estimates the best action in a given domain by drawing random samples from the decision space and building a search tree from the results. The new agent, MCTSBias, extends MCTS with progressive bias and a decaying reward strategy. The results show a decisive improvement in overall performance over vanilla MCTS. Players with different technical architectures were also tested to validate MCTSBias performance.
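The two modifications named in the abstract can be sketched as follows. This is a minimal illustration, assuming the commonly used form of progressive bias, in which a heuristic term H / (n + 1) is added to the UCB1 value, and assuming an exponential discount gamma^depth for the decaying reward. The class name, method names, heuristic scores and constants are hypothetical and are not taken from the MCTSBias source.

```java
// Illustrative sketch only: names and constants are assumptions, not the repository's code.
final class ProgressiveBiasSketch {

    /**
     * UCB1 value plus a progressive-bias term.
     * childMeanReward: average rollout reward of the child.
     * parentVisits / childVisits: visit counts of the parent and the child.
     * heuristic: domain-knowledge score for the action (hypothetical here).
     * c: exploration constant.
     */
    static double selectionValue(double childMeanReward, int parentVisits, int childVisits,
                                 double heuristic, double c) {
        if (childVisits == 0) {
            return Double.POSITIVE_INFINITY; // always try unvisited children first
        }
        double exploration = c * Math.sqrt(Math.log(parentVisits) / childVisits);
        double bias = heuristic / (childVisits + 1); // influence fades as visits grow
        return childMeanReward + exploration + bias;
    }

    /** Discounts a reward by gamma^depth, so rewards found deeper in a rollout count less. */
    static double decayedReward(double reward, int depth, double gamma) {
        return reward * Math.pow(gamma, depth);
    }

    /** Returns the index of the child with the highest selection value. */
    static int selectChild(double[] meanRewards, int[] visits, double[] heuristics, double c) {
        int parentVisits = 0;
        for (int v : visits) {
            parentVisits += v;
        }
        int best = 0;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < visits.length; i++) {
            double value = selectionValue(meanRewards[i], parentVisits, visits[i], heuristics[i], c);
            if (value > bestValue) {
                bestValue = value;
                best = i;
            }
        }
        return best;
    }
}
```

In this sketch the heuristic dominates while a child has few visits and shrinks as statistics accumulate, so the search gradually falls back to the sampled rewards. The actual heuristic and decay rate used by MCTSBias may differ.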

Setup

  • Clone the following repository: git clone https://github.com/GAIGResearch/java-pommerman
  • Open the project using a suitable IDE, such as IntelliJ.
  • Navigate to the players directory (src/players).
  • Clone this repository into the players package of java-pommerman: git clone git@github.com:mughees-asif/pommerman-java-qmul.git
java-pommerman
│   README.md  
│   
└───...
│   
└───src
│   │   
│   └───core 
│   │   
│   └───...
│   │                   
│   └───players
│   │   │    
│   │   └───mcts
│   │   │    
│   │   └───mctsbias
│   │   │   
│   │   └───...
│   │   │    
│   │   └───rhea
  • Open Run.java (in src/).
  • Run this class with 8 arguments, at indices 0 to 7 (passing none runs a default mode). The usage instructions are as follows:
    • [arg index = 0] Game Mode. 0: FFA; 1: TEAM
    • [arg index = 1] Number of level generation seeds [S]. "-1" to execute with the ones from the paper (20).
    • [arg index = 2] Repetitions per seed [N]. "1" for one game only with visuals.
    • [arg index = 3] Vision Range [VR]. (0, 1, 2 for PO; -1 for Full Observability)
    • [arg index = 4-7] Agents. In TEAM mode, the agents at indices 4 and 6 are teammates, as are those at indices 5 and 7:
      • 0 DoNothing
      • 1 Random
      • 2 OSLA
      • 3 SimplePlayer
      • 4 RHEA 200 iterations, shift buffer on, pop size 1, random init, length: 12
      • 5 MCTS 200 iterations, length: 12
      • 6 Human Player (controls: cursor keys + space bar)

Examples:

  • A single game with full observability, FFA. This is also the default mode when no arguments are passed:
    • java -jar run.jar 0 1 1 -1 2 3 4 5
  • A single game with partial observability, FFA, where you're in control of one player:
    • java -jar run.jar 0 1 1 2 0 1 2 6
  • Several headless FFA games, using two different random seeds for the level generation, each repeated 5 times (2 × 5 = 10 games in total):
    • java -jar run.jar 0 2 5 4 2 3 4 1
  • Several headless TEAM games, each seed repeated 10 times. This is the same configuration as the one used in the paper, including the 20 seeds:
    • java -jar run.jar 1 -1 10 4 5 3 5 3
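To make the argument layout concrete, the sketch below reads the positional values into named fields and derives the number of games and the TEAM pairings from them. RunArgsSketch is a hypothetical helper written for illustration; it is not a class in java-pommerman, and Run.java may parse and validate its arguments differently.

```java
// Illustrative sketch only: RunArgsSketch is a hypothetical helper, not part of java-pommerman.
final class RunArgsSketch {
    final int gameMode;     // 0: FFA, 1: TEAM
    final int seeds;        // number of level generation seeds; -1 selects the paper's 20 seeds
    final int repetitions;  // repetitions per seed
    final int visionRange;  // -1 for full observability, otherwise the partial-observability range
    final int[] agents;     // four agent type IDs, taken from argument indices 4-7

    RunArgsSketch(String[] args) {
        if (args.length != 8) {
            throw new IllegalArgumentException("Expected 8 arguments, got " + args.length);
        }
        gameMode = Integer.parseInt(args[0]);
        seeds = Integer.parseInt(args[1]);
        repetitions = Integer.parseInt(args[2]);
        visionRange = Integer.parseInt(args[3]);
        agents = new int[4];
        for (int i = 0; i < 4; i++) {
            agents[i] = Integer.parseInt(args[4 + i]);
        }
    }

    /** Total games: seeds times repetitions, with -1 standing for the paper's 20 seeds. */
    int totalGames() {
        int effectiveSeeds = (seeds == -1) ? 20 : seeds;
        return effectiveSeeds * repetitions;
    }

    /** In TEAM mode, argument indices 4 and 6 are teammates, as are 5 and 7. */
    boolean areTeammates(int argIndexA, int argIndexB) {
        boolean inRange = argIndexA >= 4 && argIndexA <= 7 && argIndexB >= 4 && argIndexB <= 7;
        return gameMode == 1 && inRange && argIndexA != argIndexB
                && (argIndexA % 2) == (argIndexB % 2);
    }
}
```

Under these assumptions, the arguments 0 2 5 4 2 3 4 1 give 10 games, and 1 -1 10 4 5 3 5 3 give 20 × 10 = 200 games with the agents at indices 4 and 6 on one team.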

Group-AS:

  • Azar Park
  • Mughees Asif
  • Shrabana Biswas Shruti
