Commit

Merge pull request #2 from FarInHeight/vitabile
Final commit, maybe
Vitabile authored Jan 21, 2024
2 parents bbe0340 + 17c177f commit 33dcba0
Showing 20 changed files with 5,133 additions and 583 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2023 Davide Sferrazza
Copyright (c) 2024 Davide Sferrazza, Davide Vitabile

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
80 changes: 79 additions & 1 deletion README.md
@@ -1 +1,79 @@
# Computational-Intelligence-Project
# Computational Intelligence Project

## Players - Design Choices

Since during the semester we developed several agents based on the techniques explained in the lectures, for the project we focused mainly on methods that we had not already explored in the laboratories or in the additional material proposed in our personal repositories.

Keeping this in mind, we decided to implement the following methods:
- [x] Human Player
- [x] MinMax
- [x] MinMax + Alpha-Beta pruning
- [x] Monte Carlo Reinforcement Learning (TD learning + Symmetries)
- [x] Monte Carlo Tree Search

Although _Monte Carlo Tree Search_ is not a topic of the course, we included it because _Quixo_ has a large branching factor and we wanted an agent able to cope with this problem.
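
For reference, the core MCTS loop (selection with UCB1, expansion, random rollout, backpropagation) can be sketched as follows. This is a minimal, illustrative sketch: the helpers `legal_moves`, `apply_move` and `rollout_result` are hypothetical placeholders, and the snippet does not reproduce the actual implementation in [monte_carlo_tree_search.py](players/monte_carlo_tree_search.py).

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state      # any game state representation
        self.parent = parent
        self.children = {}      # move -> Node
        self.visits = 0
        self.wins = 0.0

def ucb1(child, parent_visits, c=1.41):
    # exploration/exploitation trade-off; unvisited children are tried first
    if child.visits == 0:
        return float('inf')
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(root_state, legal_moves, apply_move, rollout_result, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1) selection: descend while the current node is fully expanded
        while node.children and len(node.children) == len(legal_moves(node.state)):
            node = max(node.children.values(), key=lambda ch: ucb1(ch, node.visits))
        # 2) expansion: add one untried move, if any
        untried = [m for m in legal_moves(node.state) if m not in node.children]
        if untried:
            move = random.choice(untried)
            node.children[move] = Node(apply_move(node.state, move), parent=node)
            node = node.children[move]
        # 3) simulation: random playout; assumed to return 1 if the root player wins, else 0
        #    (a two-player implementation must flip the reward at alternating levels; omitted here)
        reward = rollout_result(node.state)
        # 4) backpropagation
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent
    # recommend the most visited move
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```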

### Space Optimization

Since _Quixo_ has a huge number of states, we focused our attention on optimizing the space required by our serialized agents. With the previous representation the Monte Carlo RL player weighed more than 1 GB, while now its size is 57 KB.
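
To give an idea of the kind of saving involved (a hypothetical illustration, not the actual representation used by our agents): serializing a plain dictionary keyed by a compact byte encoding of the board, instead of pickling full game objects, already reduces the footprint dramatically.

```python
import pickle
import numpy as np

def board_key(board: np.ndarray) -> bytes:
    # 25 cells with values in {-1, 0, 1} fit into 25 bytes (fewer with bit packing)
    return (board.astype(np.int8) + 1).tobytes()

# hypothetical value table: compact board key -> estimated state value
value_table: dict[bytes, float] = {}
value_table[board_key(np.full((5, 5), -1, dtype=np.int8))] = 0.0

# 'monte_carlo_rl_values.pkl' is a made-up file name for this example
with open('monte_carlo_rl_values.pkl', 'wb') as f:
    pickle.dump(value_table, f)
```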

### Players Improvements

To improve the performance of the players, we implemented the following techniques (a sketch of the hash-table idea in Alpha-Beta search is shown right after this list):
- [x] parallelization
- [x] hash tables
- [x] symmetries
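
As a rough sketch of how a hash table (transposition table) fits into Alpha-Beta pruning, assuming hypothetical helpers `legal_moves`, `apply_move` and `evaluate`, and not reproducing the code in [min_max.py](players/min_max.py):

```python
def alpha_beta(state, depth, alpha, beta, maximizing, table, legal_moves, apply_move, evaluate):
    # states already evaluated at this depth are reused instead of recomputed
    # (a complete implementation would also record whether the value is exact or a bound)
    key = (state, depth, maximizing)            # state must be hashable, e.g. the board as bytes
    if key in table:
        return table[key]
    if depth == 0 or not legal_moves(state):
        return evaluate(state)
    if maximizing:
        value = float('-inf')
        for move in legal_moves(state):
            value = max(value, alpha_beta(apply_move(state, move), depth - 1, alpha, beta,
                                          False, table, legal_moves, apply_move, evaluate))
            alpha = max(alpha, value)
            if beta <= alpha:                   # beta cut-off
                break
    else:
        value = float('inf')
        for move in legal_moves(state):
            value = min(value, alpha_beta(apply_move(state, move), depth - 1, alpha, beta,
                                          True, table, legal_moves, apply_move, evaluate))
            beta = min(beta, value)
            if alpha >= beta:                   # alpha cut-off
                break
    table[key] = value
    return value
```

Symmetries plug into the same idea, since equivalent boards can be mapped to a single canonical key before the lookup, while parallelization distributes independent sub-trees or training games across processes.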

### Failed Attempts

We also tried to include a Q-learning player in the project, but we failed resoundingly due to the huge number of _state-action_ pairs to learn. For this reason, we removed it from the repository.

We tried to use the same agents implemented for the last laboratory, but we failed because the update formulas we used were not able to estimate the expected returns over the millions and millions of states that _Quixo_ can reach. \
We performed several trials and, after a consultation with [Riccardo Cardona](https://github.com/Riden15/Computational-Intelligence), we found that the formula he used for his project is quite efficient and effective.
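
For context, a standard constant-α (every-visit) Monte Carlo update, as described in Sutton & Barto, looks like the sketch below; the exact formula adopted in the project after the consultation is not reproduced here.

```python
def monte_carlo_update(values: dict, episode: list, alpha: float = 0.1, gamma: float = 0.95) -> dict:
    """episode: list of (state_key, reward) pairs collected during one game."""
    g = 0.0
    # iterate backwards so that g accumulates the discounted return G_t
    for state_key, reward in reversed(episode):
        g = reward + gamma * g
        old = values.get(state_key, 0.0)
        values[state_key] = old + alpha * (g - old)   # V(s) <- V(s) + alpha * (G - V(s))
    return values
```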

## Repository Structure

- [players](players): this directory contains the implemented agents
    - [human_player.py](players/human_player.py): class which implements a human player
    - [min_max.py](players/min_max.py): class which implements the MinMax algorithm and the Alpha-Beta pruning technique
    - [monte_carlo_rl.py](players/monte_carlo_rl.py): class which implements the Monte Carlo Reinforcement Learning player
    - [monte_carlo_tree_search.py](players/monte_carlo_tree_search.py): class which implements the Monte Carlo Tree Search algorithm
    - [random_player.py](players/random_player.py): class which implements a player that plays randomly
- [trained_agents](trained_agents): this directory contains the trained agents
- [utils](utils): this directory contains files which are necessary for the agents to play and which implement the performance improvements
    - [investigate_game.py](utils/investigate_game.py): class which extends `Game` and is used by our agents
    - [symmetry.py](utils/symmetry.py): class which implements all the possible board symmetries and is used by our agents (see the sketch after this list)
- [project_summary.ipynb](project_summary.ipynb): notebook used to train the agents and to show the results
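
To illustrate the symmetry idea (a sketch under the assumption of a plain 5x5 `numpy` board, not the code in [symmetry.py](utils/symmetry.py)): the eight symmetries of the square board, i.e. the rotations and reflections of the dihedral group D4, can be generated with `numpy`, and a canonical representative can serve as the single lookup key for all equivalent states.

```python
import numpy as np

def board_symmetries(board: np.ndarray) -> list[np.ndarray]:
    """Return the 8 boards equivalent to `board` under rotations and reflections (D4)."""
    rotations = [np.rot90(board, k) for k in range(4)]
    reflections = [np.fliplr(r) for r in rotations]
    return rotations + reflections

def canonical_key(board: np.ndarray) -> bytes:
    # deterministic representative: all equivalent boards share one table entry
    return min(b.tobytes() for b in board_symmetries(board))
```

A real player must also map the chosen move back from the canonical orientation to the original one, which is the less obvious part of the bookkeeping.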

The serialized `MinMax` and `MinMax + Alpha-Beta pruning` players with a non-empty hash table can be found in the release section.

## How to run

To run a specific `module.py` file, open the terminal and type the following command from the root of the project:
```bash
python -m folder.module
```
As an example, run the `min_max.py` file as follows:
```bash
python -m players.min_max
```

If you are using VS Code as your editor, you can add
```json
"terminal.integrated.env.[your os]":
{
"PYTHONPATH": "${workspaceFolder}"
}
```
to your settings and run the module directly using the <kbd>▶</kbd> button.

## Resources

* Sutton & Barto, _Reinforcement Learning: An Introduction_ [2nd Edition]
* Russell & Norvig, _Artificial Intelligence: A Modern Approach_ [4th Edition]
* Nils J. Nilsson, _Artificial Intelligence: A New Synthesis_, Morgan Kaufmann Publishers, Inc. (1998)
* [Quixo Is Solved](https://arxiv.org/pdf/2007.15895.pdf)
* [aimacode/aima-python](https://github.com/aimacode/aima-python/tree/master) + [Monte Carlo Tree Search implementation example](https://github.com/aimacode/aima-python/blob/master/games4e.py#L178)

## License
[MIT License](LICENSE)
Empty file added __init__.py
Empty file.
55 changes: 29 additions & 26 deletions game.py
@@ -7,6 +7,10 @@


class Move(Enum):
    '''
    Selects where you want to place the taken piece. The rest of the pieces are shifted
    '''

    TOP = 0
    BOTTOM = 1
    LEFT = 2
@@ -21,6 +25,9 @@ def __init__(self) -> None:
    @abstractmethod
    def make_move(self, game: 'Game') -> tuple[tuple[int, int], Move]:
        '''
        The game accepts coordinates of the type (X, Y). X goes from left to right, while Y goes from top to bottom, as in 2D graphics.
        Thus, the coordinates that this method returns shall be in the (X, Y) format.
        game: the Quixo game. You can use it to override the current game with yours, but everything is evaluated by the main game
        return values: this method shall return a tuple of X,Y positions and a move among TOP, BOTTOM, LEFT and RIGHT
        '''
@@ -30,13 +37,20 @@ def make_move(self, game: 'Game') -> tuple[tuple[int, int], Move]:
class Game(object):
    def __init__(self) -> None:
        self._board = np.ones((5, 5), dtype=np.uint8) * -1
        self.current_player_idx = 1

    def get_board(self):
    def get_board(self) -> np.ndarray:
        '''
        Returns the board
        '''
        return deepcopy(self._board)

    def get_current_player(self) -> int:
        '''
        Returns the current player
        '''
        return deepcopy(self.current_player_idx)

    def print(self):
        '''Prints the board. -1 are neutral pieces, 0 are pieces of player 0, 1 pieces of player 1'''
        print(self._board)
@@ -57,15 +71,13 @@ def check_winner(self) -> int:
                return self._board[0, y]
        # if a player has completed the principal diagonal
        if self._board[0, 0] != -1 and all(
            [self._board[x, x]
             for x in range(self._board.shape[0])] == self._board[0, 0]
            [self._board[x, x] for x in range(self._board.shape[0])] == self._board[0, 0]
        ):
            # return the relative id
            return self._board[0, 0]
        # if a player has completed the secondary diagonal
        if self._board[0, -1] != -1 and all(
            [self._board[x, -(x + 1)]
             for x in range(self._board.shape[0])] == self._board[0, -1]
            [self._board[x, -(x + 1)] for x in range(self._board.shape[0])] == self._board[0, -1]
        ):
            # return the relative id
            return self._board[0, -1]
@@ -74,15 +86,14 @@ def check_winner(self) -> int:
    def play(self, player1: Player, player2: Player) -> int:
        '''Play the game. Returns the winning player'''
        players = [player1, player2]
        current_player_idx = 1
        winner = -1
        while winner < 0:
            current_player_idx += 1
            current_player_idx %= len(players)
            self.current_player_idx += 1
            self.current_player_idx %= len(players)
            ok = False
            while not ok:
                from_pos, slide = players[current_player_idx].make_move(self)
                ok = self.__move(from_pos, slide, current_player_idx)
                from_pos, slide = players[self.current_player_idx].make_move(self)
                ok = self.__move(from_pos, slide, self.current_player_idx)
            winner = self.check_winner()
        return winner

@@ -142,17 +153,13 @@ def __slide(self, from_pos: tuple[int, int], slide: Move) -> bool:
        # if the piece position is in a corner
        else:
            # if it is in the upper left corner, it can be moved to the right and down
            acceptable_top: bool = from_pos == (0, 0) and (
                slide == Move.BOTTOM or slide == Move.RIGHT)
            acceptable_top: bool = from_pos == (0, 0) and (slide == Move.BOTTOM or slide == Move.RIGHT)
            # if it is in the lower left corner, it can be moved to the right and up
            acceptable_left: bool = from_pos == (4, 0) and (
                slide == Move.TOP or slide == Move.RIGHT)
            acceptable_left: bool = from_pos == (4, 0) and (slide == Move.TOP or slide == Move.RIGHT)
            # if it is in the upper right corner, it can be moved to the left and down
            acceptable_right: bool = from_pos == (0, 4) and (
                slide == Move.BOTTOM or slide == Move.LEFT)
            acceptable_right: bool = from_pos == (0, 4) and (slide == Move.BOTTOM or slide == Move.LEFT)
            # if it is in the lower right corner, it can be moved to the left and up
            acceptable_bottom: bool = from_pos == (4, 4) and (
                slide == Move.TOP or slide == Move.LEFT)
            acceptable_bottom: bool = from_pos == (4, 4) and (slide == Move.TOP or slide == Move.LEFT)
        # check if the move is acceptable
        acceptable: bool = acceptable_top or acceptable_bottom or acceptable_left or acceptable_right
        # if it is
@@ -164,35 +171,31 @@ def __slide(self, from_pos: tuple[int, int], slide: Move) -> bool:
                # for each column starting from the column of the piece and moving to the left
                for i in range(from_pos[1], 0, -1):
                    # copy the value contained in the same row and the previous column
                    self._board[(from_pos[0], i)] = self._board[(
                        from_pos[0], i - 1)]
                    self._board[(from_pos[0], i)] = self._board[(from_pos[0], i - 1)]
                # move the piece to the left
                self._board[(from_pos[0], 0)] = piece
            # if the player wants to slide it to the right
            elif slide == Move.RIGHT:
                # for each column starting from the column of the piece and moving to the right
                for i in range(from_pos[1], self._board.shape[1] - 1, 1):
                    # copy the value contained in the same row and the following column
                    self._board[(from_pos[0], i)] = self._board[(
                        from_pos[0], i + 1)]
                    self._board[(from_pos[0], i)] = self._board[(from_pos[0], i + 1)]
                # move the piece to the right
                self._board[(from_pos[0], self._board.shape[1] - 1)] = piece
            # if the player wants to slide it upward
            elif slide == Move.TOP:
                # for each row starting from the row of the piece and going upward
                for i in range(from_pos[0], 0, -1):
                    # copy the value contained in the same column and the previous row
                    self._board[(i, from_pos[1])] = self._board[(
                        i - 1, from_pos[1])]
                    self._board[(i, from_pos[1])] = self._board[(i - 1, from_pos[1])]
                # move the piece up
                self._board[(0, from_pos[1])] = piece
            # if the player wants to slide it downward
            elif slide == Move.BOTTOM:
                # for each row starting from the row of the piece and going downward
                for i in range(from_pos[0], self._board.shape[0] - 1, 1):
                    # copy the value contained in the same column and the following row
                    self._board[(i, from_pos[1])] = self._board[(
                        i + 1, from_pos[1])]
                    self._board[(i, from_pos[1])] = self._board[(i + 1, from_pos[1])]
                # move the piece down
                self._board[(self._board.shape[0] - 1, from_pos[1])] = piece
        return acceptable
43 changes: 0 additions & 43 deletions human_player.py

This file was deleted.

19 changes: 4 additions & 15 deletions main.py
@@ -1,19 +1,8 @@
from game import Game
from min_max import MinMaxPlayer, AlphaBetaMinMaxPlayer
from random_player import RandomPlayer
from human_player import HumanPlayer
import time
from utils.investigate_game import InvestigateGame
from players.random_player import RandomPlayer


if __name__ == '__main__':
    g = Game()
    g.print()
    # player1 = AlphaBetaMinMaxPlayer(0, depth=4)
    player1 = RandomPlayer()
    # player2 = RandomPlayer()
    player2 = AlphaBetaMinMaxPlayer(1, depth=5, symmetries=True)
    start = time.time()
    winner = g.play(player1, player2)
    total_time = time.time() - start
    g.print()
    print(f"Winner: Player {winner}")
    print(f'Game duration: {total_time:.2E} sec, {total_time / 60:.2E} min')
    g.play(RandomPlayer(), RandomPlayer())