A Gomoku game written in Haskell with the Gloss graphics library, driven by a reinforcement-learning AI. It is played on a 15x15 board by default, though the board size can be changed.
```
Usage: gomoku [OPTION...]
  -d USEC            --delay=USEC              delay microseconds
  -D WIDTH,HEIGHT    --dimension=WIDTH,HEIGHT  board dimension width,height
  -w CONNECTED       --wincond=CONNECTED       number of connected moves to win
  -p PARAMETER_FILE  --theta=PARAMETER_FILE    parameter file name
  -m MODE,MODE       --mode=MODE,MODE          playing mode ([human|ai],[human|ai])
```
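For example, assuming the sample parameter file shipped with the repository is named `theta`, an invocation like the following starts a 15x15 game with five connected stones to win, a human player against the AI:

```
gomoku -D 15,15 -w 5 -m human,ai -p theta
```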
![screenshot](https://github.com/slcz/gomoku/blob/master/board.png)
The AI strategy is based on temporal difference learning. The state value at time t+1 is used to update the state value at time t,

V(t) ← V(t) + α · ( r(t) + λ · V(t+1) − V(t) )

where V(t) is the value of the state at time t, α is the learning rate, λ is the decay factor that credits the reward back to earlier moves, and r(t) is the instantaneous reward received at time t. If the terminal state is reached at time T, the end-game reward r(T) is 1 for a win, 0.5 for a tie and 0 for a loss. This reward is then propagated back to the earlier states, attenuated by the decay factor λ. For all intermediate moves, r(t) is set to 0, meaning there is no instantaneous reward.
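As a minimal illustration of this backup (a sketch, not the repository's actual training code; `tdBackup` is a hypothetical helper), the following recomputes the state values of one finished game, treating the end-game reward as the value of the terminal state:

```haskell
-- V(t) <- V(t) + alpha * (lambda * V(t+1) - V(t)), with r(t) = 0 for every
-- intermediate move and the terminal state's value taken to be the
-- end-game reward r(T).
tdBackup :: Double    -- ^ alpha, learning rate
         -> Double    -- ^ lambda, decay factor
         -> Double    -- ^ end-game reward: 1 win, 0.5 tie, 0 loss
         -> [Double]  -- ^ V(0), V(1), ..., V(T-1) of the finished game
         -> [Double]  -- ^ updated state values
tdBackup alpha lambda reward vs =
    zipWith update vs (drop 1 vs ++ [reward])
  where
    update v vNext = v + alpha * (lambda * vNext - v)
```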
Since the game is played by two parties in turn, the mover always plays to maximize its own value and minimize the opponent's value. The actual value function therefore follows a minimax strategy, i.e., a position is scored as the mover's own value minus the value the opponent assigns to the same position.
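A minimal sketch of this move selection is shown below. It is an illustration under assumed types, not this repository's API: `valueMe`, `valueOpp` and `play` stand in for the evaluation network and for applying a move to a position.

```haskell
import Data.List (maximumBy)
import Data.Ord  (comparing)

type Move = (Int, Int)

-- Score each candidate move as (mover's value of the resulting position)
-- minus (the opponent's value of the same position) and pick the maximum.
bestMove :: (pos -> Double)        -- ^ position value from the mover's side
         -> (pos -> Double)        -- ^ position value from the opponent's side
         -> (pos -> Move -> pos)   -- ^ apply a move to a position
         -> [Move]                 -- ^ candidate moves (assumed non-empty)
         -> pos                    -- ^ current position
         -> Move
bestMove valueMe valueOpp play moves position =
    maximumBy (comparing score) moves
  where
    score m = let p' = play position m
              in valueMe p' - valueOpp p'
```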
AI training happens offline after each game, since the reward (win/loss/tie) is only known once the game ends. Each step is trained with a 3-layer neural network, using the target V(t+1) - V(t). Input features are gathered by scanning the board over every window of 5 consecutive positions (horizontally, vertically and diagonally), keeping only windows that contain the mover's stones and/or empty cells; a sketch of this scan is given after the pattern list below. In particular, the following patterns are recognized as input features ('o' marks the mover's stone, '.' an empty cell):
```
...oo
..o.o
..oo.
..ooo
.o..o
.o.o.
.o.oo
.oo.o
.ooo.
.oooo
o...o
o..oo
o.o.o
o.ooo
oo.oo
ooooo
```
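The window scan referenced above can be sketched as follows. This is an illustration under assumed data types (a board as a map from coordinates to stones), not the repository's actual implementation: windows of 5 cells are collected in the four scan directions, windows containing an opponent stone are dropped, windows with fewer than two of the mover's stones are ignored (the listed patterns all contain at least two stones), and a window and its mirror image are normalized to the same pattern string before counting.

```haskell
import qualified Data.Map.Strict as M

data Stone = Black | White deriving (Eq, Show)
type Board = M.Map (Int, Int) Stone

-- Count, for one side, how often each of the patterns above occurs.
patternCounts :: Int -> Int -> Board -> Stone -> M.Map String Int
patternCounts width height board me =
    M.fromListWith (+)
        [ (canon w, 1)
        | w <- windows
        , all (/= 'x') w                      -- no opponent stone in the window
        , length (filter (== 'o') w) >= 2     -- ignore single-stone windows
        ]
  where
    canon w       = min w (reverse w)         -- mirror-image windows match the same pattern
    directions    = [(1, 0), (0, 1), (1, 1), (1, -1)]
    starts        = [ (x, y) | x <- [0 .. width - 1], y <- [0 .. height - 1] ]
    windows       = [ map cell line
                    | (dx, dy) <- directions
                    , (x, y)   <- starts
                    , let line = [ (x + i * dx, y + i * dy) | i <- [0 .. 4] ]
                    , all inside line
                    ]
    inside (x, y) = x >= 0 && x < width && y >= 0 && y < height
    cell p = case M.lookup p board of
        Nothing            -> '.'             -- empty cell
        Just s | s == me   -> 'o'             -- mover's stone
               | otherwise -> 'x'             -- opponent's stone
```

The per-pattern counts (or their per-move deltas) are what the next paragraph aggregates into the network's input vector.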
To obtain the input feature set, the delta of the pattern counts is computed for each side, and the two sides are then aggregated into a single vector: the first mover's feature vector I(m) is weighted by a credit factor α and combined with the next mover's feature vector I(f). The result I is the final input vector fed to the feed-forward neural network.
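One hedged sketch of this aggregation, assuming the first mover's delta features are scaled by the credit factor and the next mover's features are subtracted (the exact combination used by the repository is an assumption here):

```haskell
-- Hypothetical aggregation of the two sides' delta feature vectors; the
-- subtraction reflects that the opponent's patterns reduce the mover's value.
aggregate :: Double    -- ^ alpha, credit factor for the first mover
          -> [Double]  -- ^ I(m), delta feature vector of the first mover
          -> [Double]  -- ^ I(f), delta feature vector of the next mover
          -> [Double]  -- ^ I, input to the feed-forward network
aggregate alpha im ifv = zipWith (\a b -> alpha * a - b) im ifv
```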
The attached sample parameter file (theta) plays decently well. It is the result of the AI playing against itself for a few thousand games, and it easily beats beginner-to-intermediate human players.