Commit

Merge pull request #2 from FarInHeight/vitabile
Final commit, maybe
Vitabile authored Jan 21, 2024
2 parents bbe0340 + 17c177f commit 33dcba0
Showing 20 changed files with 5,133 additions and 583 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2023 Davide Sferrazza
Copyright (c) 2024 Davide Sferrazza, Davide Vitabile

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
80 changes: 79 additions & 1 deletion README.md
@@ -1 +1,79 @@
# Computational-Intelligence-Project
# Computational Intelligence Project

## Players - Design Choices

Since during the semester we developed several agents based on the techniques explained in the lectures, for the project we focused mainly on methods that we had not already explored in the laboratories or in the additional material proposed in our personal repositories.

Keeping this in mind, we decided to implement the following methods:
- [x] Human Player
- [x] MinMax
- [x] MinMax + Alpha-Beta pruning
- [x] Monte Carlo Reinforcement Learning (TD learning + Symmetries)
- [x] Monte Carlo Tree Search

Although _Monte Carlo Tree Search_ is not a topic of the course, we included it because _Quixo_ has a large branching factor and we wanted an agent able to cope with this problem.
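
For reference, the core MCTS loop (selection with UCB1, expansion, random rollout, backpropagation) can be sketched as follows. This is a minimal, illustrative sketch: the helpers `legal_moves`, `apply_move` and `rollout_result` are hypothetical placeholders, and the snippet does not reproduce the actual implementation in [monte_carlo_tree_search.py](players/monte_carlo_tree_search.py).

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state      # any game state representation
        self.parent = parent
        self.children = {}      # move -> Node
        self.visits = 0
        self.wins = 0.0

def ucb1(child, parent_visits, c=1.41):
    # exploration/exploitation trade-off; unvisited children are tried first
    if child.visits == 0:
        return float('inf')
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(root_state, legal_moves, apply_move, rollout_result, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1) selection: descend while the current node is fully expanded
        while node.children and len(node.children) == len(legal_moves(node.state)):
            node = max(node.children.values(), key=lambda ch: ucb1(ch, node.visits))
        # 2) expansion: add one untried move, if any
        untried = [m for m in legal_moves(node.state) if m not in node.children]
        if untried:
            move = random.choice(untried)
            node.children[move] = Node(apply_move(node.state, move), parent=node)
            node = node.children[move]
        # 3) simulation: random playout; assumed to return 1 if the root player wins, else 0
        #    (a two-player implementation must flip the reward at alternating levels; omitted here)
        reward = rollout_result(node.state)
        # 4) backpropagation
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent
    # recommend the most visited move
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```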

### Space Optimization

Since _Quixo_ has a huge number of states, we focused our attention on optimizing the space required by our serialized agents. With the previous representation the Monte Carlo RL player weighed more than 1 GB, while now its size is 57 KB.
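
To give an idea of the kind of saving involved (a hypothetical illustration, not the actual representation used by our agents): serializing a plain dictionary keyed by a compact byte encoding of the board, instead of pickling full game objects, already reduces the footprint dramatically.

```python
import pickle
import numpy as np

def board_key(board: np.ndarray) -> bytes:
    # 25 cells with values in {-1, 0, 1} fit into 25 bytes (fewer with bit packing)
    return (board.astype(np.int8) + 1).tobytes()

# hypothetical value table: compact board key -> estimated state value
value_table: dict[bytes, float] = {}
value_table[board_key(np.full((5, 5), -1, dtype=np.int8))] = 0.0

# 'monte_carlo_rl_values.pkl' is a made-up file name for this example
with open('monte_carlo_rl_values.pkl', 'wb') as f:
    pickle.dump(value_table, f)
```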

### Players Improvements

To improve the performance of the players, we implemented the following techniques (a sketch of the hash-table idea in Alpha-Beta search is shown right after this list):
- [x] parallelization
- [x] hash tables
- [x] symmetries
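
As a rough sketch of how a hash table (transposition table) fits into Alpha-Beta pruning, assuming hypothetical helpers `legal_moves`, `apply_move` and `evaluate`, and not reproducing the code in [min_max.py](players/min_max.py):

```python
def alpha_beta(state, depth, alpha, beta, maximizing, table, legal_moves, apply_move, evaluate):
    # states already evaluated at this depth are reused instead of recomputed
    # (a complete implementation would also record whether the value is exact or a bound)
    key = (state, depth, maximizing)            # state must be hashable, e.g. the board as bytes
    if key in table:
        return table[key]
    if depth == 0 or not legal_moves(state):
        return evaluate(state)
    if maximizing:
        value = float('-inf')
        for move in legal_moves(state):
            value = max(value, alpha_beta(apply_move(state, move), depth - 1, alpha, beta,
                                          False, table, legal_moves, apply_move, evaluate))
            alpha = max(alpha, value)
            if beta <= alpha:                   # beta cut-off
                break
    else:
        value = float('inf')
        for move in legal_moves(state):
            value = min(value, alpha_beta(apply_move(state, move), depth - 1, alpha, beta,
                                          True, table, legal_moves, apply_move, evaluate))
            beta = min(beta, value)
            if alpha >= beta:                   # alpha cut-off
                break
    table[key] = value
    return value
```

Symmetries plug into the same idea, since equivalent boards can be mapped to a single canonical key before the lookup, while parallelization distributes independent sub-trees or training games across processes.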

### Failed Attempts

We also tried to include a Q-learning player in the project, but we failed resoundingly due to the huge number of _state-action_ pairs to learn. For this reason, we removed it from the repository.

We tried to use the same agents implemented for the last laboratory, but we failed because the update formulas we used were not able to estimate the expected returns over the millions and millions of states that _Quixo_ can reach. \
We performed several trials and, after a consultation with [Riccardo Cardona](https://github.com/Riden15/Computational-Intelligence), we found that the formula he used for his project is quite efficient and effective.
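
For context, a standard constant-α (every-visit) Monte Carlo update, as described in Sutton & Barto, looks like the sketch below; the exact formula adopted in the project after the consultation is not reproduced here.

```python
def monte_carlo_update(values: dict, episode: list, alpha: float = 0.1, gamma: float = 0.95) -> dict:
    """episode: list of (state_key, reward) pairs collected during one game."""
    g = 0.0
    # iterate backwards so that g accumulates the discounted return G_t
    for state_key, reward in reversed(episode):
        g = reward + gamma * g
        old = values.get(state_key, 0.0)
        values[state_key] = old + alpha * (g - old)   # V(s) <- V(s) + alpha * (G - V(s))
    return values
```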

## Repository Structure

- [players](players): this directory contains the implemented agents
    - [human_player.py](players/human_player.py): class which implements a human player
    - [min_max.py](players/min_max.py): class which implements the MinMax algorithm and the Alpha-Beta pruning technique
    - [monte_carlo_rl.py](players/monte_carlo_rl.py): class which implements the Monte Carlo Reinforcement Learning player
    - [monte_carlo_tree_search.py](players/monte_carlo_tree_search.py): class which implements the Monte Carlo Tree Search algorithm
    - [random_player.py](players/random_player.py): class which implements a player that plays randomly
- [trained_agents](trained_agents): this directory contains the trained agents
- [utils](utils): this directory contains files which are necessary for the agents to play and which implement the performance improvements
    - [investigate_game.py](utils/investigate_game.py): class which extends `Game` and is used by our agents
    - [symmetry.py](utils/symmetry.py): class which implements all the possible board symmetries and is used by our agents (see the sketch after this list)
- [project_summary.ipynb](project_summary.ipynb): notebook used to train the agents and to show the results
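
To illustrate the symmetry idea (a sketch under the assumption of a plain 5x5 `numpy` board, not the code in [symmetry.py](utils/symmetry.py)): the eight symmetries of the square board, i.e. the rotations and reflections of the dihedral group D4, can be generated with `numpy`, and a canonical representative can serve as the single lookup key for all equivalent states.

```python
import numpy as np

def board_symmetries(board: np.ndarray) -> list[np.ndarray]:
    """Return the 8 boards equivalent to `board` under rotations and reflections (D4)."""
    rotations = [np.rot90(board, k) for k in range(4)]
    reflections = [np.fliplr(r) for r in rotations]
    return rotations + reflections

def canonical_key(board: np.ndarray) -> bytes:
    # deterministic representative: all equivalent boards share one table entry
    return min(b.tobytes() for b in board_symmetries(board))
```

A real player must also map the chosen move back from the canonical orientation to the original one, which is the less obvious part of the bookkeeping.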

The serialized `MinMax` and `MinMax + Alpha-Beta pruning` players with a non-empty hash table can be found in the release section.

## How to run

To run a specific `module.py` file, open the terminal and type the following command from the root of the project:
```bash
python -m folder.module
```
As an example, run the `min_max.py` file as follows:
```bash
python -m players.min_max
```

If you are using VS Code as your editor, you can add
```json
"terminal.integrated.env.[your os]":
{
"PYTHONPATH": "${workspaceFolder}"
}
```
to your settings and run the module directly using the <kbd>▶</kbd> button.

## Resources

* Sutton & Barto, _Reinforcement Learning: An Introduction_ [2nd Edition]
* Russell & Norvig, _Artificial Intelligence: A Modern Approach_ [4th Edition]
* Nils J. Nilsson, _Artificial Intelligence: A New Synthesis_, Morgan Kaufmann Publishers, Inc. (1998)
* [Quixo Is Solved](https://arxiv.org/pdf/2007.15895.pdf)
* [aimacode/aima-python](https://github.com/aimacode/aima-python/tree/master) + [Monte Carlo Tree Search implementation example](https://github.com/aimacode/aima-python/blob/master/games4e.py#L178)

## License
[MIT License](LICENSE)
Empty file added __init__.py
Empty file.
55 changes: 29 additions & 26 deletions game.py
@@ -7,6 +7,10 @@


class Move(Enum):
    '''
    Selects where you want to place the taken piece. The rest of the pieces are shifted
    '''

    TOP = 0
    BOTTOM = 1
    LEFT = 2
@@ -21,6 +25,9 @@ def __init__(self) -> None:
    @abstractmethod
    def make_move(self, game: 'Game') -> tuple[tuple[int, int], Move]:
        '''
        The game accepts coordinates of the type (X, Y). X goes from left to right, while Y goes from top to bottom, as in 2D graphics.
        Thus, the coordinates that this method returns shall be in the (X, Y) format.
        game: the Quixo game. You can use it to override the current game with yours, but everything is evaluated by the main game
        return values: this method shall return a tuple of X,Y positions and a move among TOP, BOTTOM, LEFT and RIGHT
        '''
@@ -30,13 +37,20 @@ def make_move(self, game: 'Game') -> tuple[tuple[int, int], Move]:
class Game(object):
    def __init__(self) -> None:
        self._board = np.ones((5, 5), dtype=np.uint8) * -1
        self.current_player_idx = 1

    def get_board(self):
    def get_board(self) -> np.ndarray:
        '''
        Returns the board
        '''
        return deepcopy(self._board)

    def get_current_player(self) -> int:
        '''
        Returns the current player
        '''
        return deepcopy(self.current_player_idx)

    def print(self):
        '''Prints the board. -1 are neutral pieces, 0 are pieces of player 0, 1 pieces of player 1'''
        print(self._board)
@@ -57,15 +71,13 @@ def check_winner(self) -> int:
                return self._board[0, y]
        # if a player has completed the principal diagonal
        if self._board[0, 0] != -1 and all(
            [self._board[x, x]
             for x in range(self._board.shape[0])] == self._board[0, 0]
            [self._board[x, x] for x in range(self._board.shape[0])] == self._board[0, 0]
        ):
            # return the relative id
            return self._board[0, 0]
        # if a player has completed the secondary diagonal
        if self._board[0, -1] != -1 and all(
            [self._board[x, -(x + 1)]
             for x in range(self._board.shape[0])] == self._board[0, -1]
            [self._board[x, -(x + 1)] for x in range(self._board.shape[0])] == self._board[0, -1]
        ):
            # return the relative id
            return self._board[0, -1]
@@ -74,15 +86,14 @@ def check_winner(self) -> int:
    def play(self, player1: Player, player2: Player) -> int:
        '''Play the game. Returns the winning player'''
        players = [player1, player2]
        current_player_idx = 1
        winner = -1
        while winner < 0:
            current_player_idx += 1
            current_player_idx %= len(players)
            self.current_player_idx += 1
            self.current_player_idx %= len(players)
            ok = False
            while not ok:
                from_pos, slide = players[current_player_idx].make_move(self)
                ok = self.__move(from_pos, slide, current_player_idx)
                from_pos, slide = players[self.current_player_idx].make_move(self)
                ok = self.__move(from_pos, slide, self.current_player_idx)
            winner = self.check_winner()
        return winner

@@ -142,17 +153,13 @@ def __slide(self, from_pos: tuple[int, int], slide: Move) -> bool:
        # if the piece position is in a corner
        else:
            # if it is in the upper left corner, it can be moved to the right and down
            acceptable_top: bool = from_pos == (0, 0) and (
                slide == Move.BOTTOM or slide == Move.RIGHT)
            acceptable_top: bool = from_pos == (0, 0) and (slide == Move.BOTTOM or slide == Move.RIGHT)
            # if it is in the lower left corner, it can be moved to the right and up
            acceptable_left: bool = from_pos == (4, 0) and (
                slide == Move.TOP or slide == Move.RIGHT)
            acceptable_left: bool = from_pos == (4, 0) and (slide == Move.TOP or slide == Move.RIGHT)
            # if it is in the upper right corner, it can be moved to the left and down
            acceptable_right: bool = from_pos == (0, 4) and (
                slide == Move.BOTTOM or slide == Move.LEFT)
            acceptable_right: bool = from_pos == (0, 4) and (slide == Move.BOTTOM or slide == Move.LEFT)
            # if it is in the lower right corner, it can be moved to the left and up
            acceptable_bottom: bool = from_pos == (4, 4) and (
                slide == Move.TOP or slide == Move.LEFT)
            acceptable_bottom: bool = from_pos == (4, 4) and (slide == Move.TOP or slide == Move.LEFT)
        # check if the move is acceptable
        acceptable: bool = acceptable_top or acceptable_bottom or acceptable_left or acceptable_right
        # if it is
@@ -164,35 +171,31 @@ def __slide(self, from_pos: tuple[int, int], slide: Move) -> bool:
                # for each column starting from the column of the piece and moving to the left
                for i in range(from_pos[1], 0, -1):
                    # copy the value contained in the same row and the previous column
                    self._board[(from_pos[0], i)] = self._board[(
                        from_pos[0], i - 1)]
                    self._board[(from_pos[0], i)] = self._board[(from_pos[0], i - 1)]
                # move the piece to the left
                self._board[(from_pos[0], 0)] = piece
            # if the player wants to slide it to the right
            elif slide == Move.RIGHT:
                # for each column starting from the column of the piece and moving to the right
                for i in range(from_pos[1], self._board.shape[1] - 1, 1):
                    # copy the value contained in the same row and the following column
                    self._board[(from_pos[0], i)] = self._board[(
                        from_pos[0], i + 1)]
                    self._board[(from_pos[0], i)] = self._board[(from_pos[0], i + 1)]
                # move the piece to the right
                self._board[(from_pos[0], self._board.shape[1] - 1)] = piece
            # if the player wants to slide it upward
            elif slide == Move.TOP:
                # for each row starting from the row of the piece and going upward
                for i in range(from_pos[0], 0, -1):
                    # copy the value contained in the same column and the previous row
                    self._board[(i, from_pos[1])] = self._board[(
                        i - 1, from_pos[1])]
                    self._board[(i, from_pos[1])] = self._board[(i - 1, from_pos[1])]
                # move the piece up
                self._board[(0, from_pos[1])] = piece
            # if the player wants to slide it downward
            elif slide == Move.BOTTOM:
                # for each row starting from the row of the piece and going downward
                for i in range(from_pos[0], self._board.shape[0] - 1, 1):
                    # copy the value contained in the same column and the following row
                    self._board[(i, from_pos[1])] = self._board[(
                        i + 1, from_pos[1])]
                    self._board[(i, from_pos[1])] = self._board[(i + 1, from_pos[1])]
                # move the piece down
                self._board[(self._board.shape[0] - 1, from_pos[1])] = piece
        return acceptable
43 changes: 0 additions & 43 deletions human_player.py

This file was deleted.

19 changes: 4 additions & 15 deletions main.py
@@ -1,19 +1,8 @@
from game import Game
from min_max import MinMaxPlayer, AlphaBetaMinMaxPlayer
from random_player import RandomPlayer
from human_player import HumanPlayer
import time
from utils.investigate_game import InvestigateGame
from players.random_player import RandomPlayer


if __name__ == '__main__':
    g = Game()
    g.print()
    # player1 = AlphaBetaMinMaxPlayer(0, depth=4)
    player1 = RandomPlayer()
    # player2 = RandomPlayer()
    player2 = AlphaBetaMinMaxPlayer(1, depth=5, symmetries=True)
    start = time.time()
    winner = g.play(player1, player2)
    total_time = time.time() - start
    g.print()
    print(f"Winner: Player {winner}")
    print(f'Game duration: {total_time:.2E} sec, {total_time / 60:.2E} min')
    g.play(RandomPlayer(), RandomPlayer())