MiniMax

BB33 edited this page Sep 16, 2021 · 3 revisions

A way to explicitly model adversarial agents using the techniques of adversarial game-tree search.

Pruning - ignore portions of the search tree that make no difference to the optimal move.

Options:

  • use a heuristic for each state
  • play out the game from the state with random moves and average the results (this is roughly what happens in Monte Carlo Tree Search)
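The second option can be sketched on a toy game (the game and all names here are illustrative, not from the notes): a pile of counters where players alternately take 1 or 2, and whoever takes the last counter wins. We estimate a state's value for MAX by averaging the results of random playouts:

```python
import random

def legal_moves(n):
    """Moves available with n counters left: take 1 or 2."""
    return [m for m in (1, 2) if m <= n]

def random_playout(n, max_to_move, rng):
    """Play uniformly random moves to the end; return 1 if MAX wins, else 0."""
    while n > 0:
        n -= rng.choice(legal_moves(n))
        max_to_move = not max_to_move
    # the player who moved last took the final counter and won
    return 0 if max_to_move else 1

def playout_value(n, max_to_move, playouts=500, seed=0):
    """Average of many random playouts: a crude estimate of the state's value."""
    rng = random.Random(seed)
    wins = sum(random_playout(n, max_to_move, rng) for _ in range(playouts))
    return wins / playouts
```

With one counter left and MAX to move, every playout is a MAX win, so the estimate is exactly 1.0; larger piles give a fraction between 0 and 1.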

Most games studied in AI are perfect-information, zero-sum games: there is no win-win outcome.

A zero-sum game has two players: MAX and MIN.

MIN wants the minimum minimax value, and MAX wants the maximum minimax value (the minimax value of a state is the best utility MAX can achieve from that state, assuming MIN plays optimally).

MAX moves first, then the players take turns. Points are awarded to the winner. MAX wants to find a sequence of actions that leads to a win; MIN wants to prevent this. MAX therefore needs a conditional plan: a response to each of MIN's possible moves.

In game-tree diagrams, MAX nodes are drawn as upward-pointing triangles (/\) and MIN nodes as downward-pointing triangles (\/).

For games with only win-or-lose outcomes you could use AND-OR search, but for games with a range of utility outcomes we need minimax search.

ply - one move by one player, bringing us one level deeper in the game tree.

minimax value - the utility for MAX of being in that state, assuming that both players play optimally from there to the end of the game.

Terminal nodes at the bottom level get their utility from the game's utility function.

Minimax algorithm

It's essentially a complete depth-first search of the game tree. If the maximum depth of the tree is m and there are b legal moves at each point, then

  • the time complexity is O(b^m)
  • space complexity is O(bm)

Minimax is impractical for complex games, but provides a basis for more practical algorithms.

Alpha-beta pruning

Can effectively cut the exponent in the depth of the tree in half. It lets us compute the correct minimax decision without examining every state, by pruning.

  • keep in mind that each node belongs to either MAX or MIN (MAX goes first)
  • you maintain a range of values that can possibly be picked. Initially the range is [-inf, inf] at every node.
  • then you go down, look at the utility of some leaves, and work out what each player will pick.

Then, if you know that the node higher up is going to pick the largest value, the exact value of a subtree sometimes doesn't matter: as soon as an upper bound on that subtree falls below a value MAX can already guarantee elsewhere, the rest of the subtree can be pruned.
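A minimal sketch of alpha-beta pruning over an explicit game tree (a leaf is a utility number for MAX; an internal node is a list of subtrees). alpha tracks the best value MAX can already guarantee, beta the best MIN can; once alpha >= beta, the remaining siblings cannot affect the decision:

```python
import math

def alphabeta(tree, is_max=True, alpha=-math.inf, beta=math.inf):
    """Minimax value of the tree, pruning branches that cannot matter."""
    if isinstance(tree, (int, float)):      # terminal node
        return tree
    if is_max:
        value = -math.inf
        for child in tree:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                       # MIN above would never allow this
        return value
    value = math.inf
    for child in tree:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break                           # MAX above already has better
    return value
```

On the tree [[3, 12, 8], [2, 4, 6], [14, 5, 2]] this returns 3, the same answer as full minimax, but after seeing the leaf 2 in the middle branch it prunes the leaves 4 and 6: that MIN node's value can only be ≤ 2, which MAX would never prefer to the 3 already secured.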

Move ordering

  • Move ordering can also be used to improve efficiency. If (when you are MAX) you examine the likely-best moves first, you establish good bounds early, and any branch whose bound already shows it to be non-optimal does not have to be searched in full.

Without move ordering the number of nodes examined is roughly O(b^(3m/4)), compared to the best case of O(b^(m/2)) with perfect ordering, which is clearly much better.
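A small illustrative demo of the effect (the trees here are made up for the example): the same alpha-beta search over the same set of branches examines fewer leaves when each player's best move is tried first. Here the tree is a number (leaf utility) or a list of subtrees, and `count` accumulates how many leaves get evaluated:

```python
import math

def alphabeta_counting(tree, is_max, alpha, beta, count):
    """Alpha-beta that also counts the leaves it evaluates (count is [int])."""
    if isinstance(tree, (int, float)):
        count[0] += 1
        return tree
    value = -math.inf if is_max else math.inf
    for child in tree:
        v = alphabeta_counting(child, not is_max, alpha, beta, count)
        if is_max:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:
            break                          # prune the remaining siblings
    return value

def search(tree):
    count = [0]
    value = alphabeta_counting(tree, True, -math.inf, math.inf, count)
    return value, count[0]

# Same branches, two orderings of the children:
good = [[3, 12, 8], [2, 4, 6], [2, 14, 5]]   # best moves tried first
bad  = [[2, 4, 6], [14, 5, 2], [3, 12, 8]]   # best branch tried last
```

Both orderings return the minimax value 3, but `search(good)` evaluates 5 leaves while `search(bad)` evaluates all 9: with the best ordering, every branch after the first is cut off at its first leaf.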

Dynamic move-ordering schemes, such as trying first the moves that were found to be best in the past, bring us quite close to this theoretical limit.

This ordering could come from iterative deepening:

  • search one ply deep and record the ranking of moves based on their evaluations, then search one ply deeper, using the previous ranking to inform move ordering, and so on. The increased search time from iterative deepening can be more than made up for by the better move ordering.

Killer moves - the moves found to be best (e.g. ones that caused a cutoff) in earlier searches; trying them first is a cheap form of move ordering.

Transpositions

  • different permutations of moves can lead to the same position; this is called a transposition. We can handle it with a transposition table that caches the heuristic value of states.
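A sketch of the idea on the same toy "take 1 or 2 counters, last counter wins" game used above (the game and names are illustrative; real chess programs key the table on a Zobrist hash of the position). Different move sequences reach the same (counters, player-to-move) position, so we cache each position's value in a dict:

```python
table = {}  # (counters_left, max_to_move) -> minimax value: the transposition table

def minimax_value(n, max_to_move):
    """Minimax value for MAX (+1 win, -1 loss), with transposition caching."""
    state = (n, max_to_move)
    if state in table:
        return table[state]        # position already reached via another move order
    if n == 0:
        # the previous player took the last counter and won
        value = -1 if max_to_move else 1
    elif max_to_move:
        value = max(minimax_value(n - m, False) for m in (1, 2) if m <= n)
    else:
        value = min(minimax_value(n - m, True) for m in (1, 2) if m <= n)
    table[state] = value
    return value
```

In this game a pile that is a multiple of 3 is a loss for the player to move, so `minimax_value(3, True)` is -1 and `minimax_value(4, True)` is +1; without the table, the same positions would be re-searched once per path that reaches them.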

Problems

Even with alpha-beta pruning and clever move ordering, minimax won't work for games like chess and Go because the state space is too big to explore in the available time.

Type A strategy considers all possible moves to a certain depth in the search tree, and then uses a heuristic evaluation function to estimate the utility of states at that depth. This is a wide but shallow exploration of the tree.

Type B ignores moves that look bad and follows promising lines "as far as possible". This is deep and narrow.

Chess prefers type A whereas Go prefers type B. (recently type B has been smashing it tho)