CS_534_MsPacman

Reinforcement Learning for Ms. Pacman and Space Invaders

Deep Reinforcement Learning

DQN Algorithm

This implementation of reinforcement learning aims to capture the idea of end-to-end learning: the agent takes the complete game frame (the RGB values of each pixel) and maps it directly to actions. At every state (frame) s, an action a is chosen according to an epsilon-greedy policy; the agent receives a reward r and proceeds to the next state s'. The Q-value update for this transition is: Q(s,a) = r + gamma * max_a' Q(s',a'), where gamma is the discount factor and the maximum is taken over the actions a' available in s'.
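The action choice and the update target described above can be sketched in a few lines. This is a minimal illustration, not the code from dqn.py: the function names, the example Q-values, and the `done` flag (which drops the bootstrap term at the end of an episode) are assumptions made for the example.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def td_target(reward, next_q_values, gamma, done):
    """Target for Q(s, a): r + gamma * max_a' Q(s', a'), or just r at episode end."""
    if done:  # assumption: no future reward is bootstrapped from a terminal state
        return float(reward)
    return float(reward + gamma * np.max(next_q_values))

rng = np.random.default_rng(0)
q_s = np.array([0.1, 0.5, 0.2])      # hypothetical Q(s, .) for 3 actions
q_next = np.array([1.0, 3.0, 2.0])   # hypothetical Q(s', .)

action = epsilon_greedy(q_s, epsilon=0.0, rng=rng)  # epsilon = 0 means always greedy
target = td_target(reward=1.0, next_q_values=q_next, gamma=0.9, done=False)
```

With epsilon set to 0 the agent always exploits; in training, epsilon typically starts near 1 and is decayed so that early episodes explore more.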

Experience Replay

Every transition (s, a, r, s') is stored in a memory buffer of constant size. At a predefined frequency, this memory is sampled at random and the sampled transitions are fed to the neural network to calculate the loss: Loss = sum((Q(s,a)_old - Q(s,a)_new)^2)
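A fixed-size buffer with uniform random sampling, plus the summed squared-error loss, might look like the sketch below. The class name `ReplayMemory`, the capacity, and the batch size are hypothetical; the agents in this repository may organize their memory differently.

```python
import random
from collections import deque

import numpy as np

class ReplayMemory:
    """Fixed-size buffer; the oldest transition is dropped when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random sample without replacement, capped at the buffer size.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

def squared_error_loss(q_old, q_new):
    """Sum over the batch of (Q_old - Q_new)^2."""
    q_old = np.asarray(q_old, dtype=float)
    q_new = np.asarray(q_new, dtype=float)
    return float(np.sum((q_old - q_new) ** 2))

memory = ReplayMemory(capacity=3)
for t in range(5):  # push 5 dummy transitions into a size-3 buffer
    memory.add(state=t, action=0, reward=1.0, next_state=t + 1)

batch = memory.sample(batch_size=2)
loss = squared_error_loss([1.0, 2.0], [1.5, 4.0])  # 0.25 + 4.0
```

Sampling at random rather than replaying transitions in order breaks the strong correlation between consecutive frames, which tends to stabilize training.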

Installation Instructions

pip install gym gym[atari]

On OS X or macOS:

brew install cmake boost boost-python sdl2 swig wget

On Ubuntu 14.04:

apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig

GitHub

git clone https://github.com/jmcmahon443/CS_534_MsPacman.git

Sample Agents

python do_nothing.py

python do_random.py

DQN Agent

This agent takes the complete frame, downsamples it, and converts it to grayscale. The 2-D frame matrix is then flattened into a single row vector, which is fed into the neural network. The network therefore works on a feature set of size matrix_breadth x matrix_height.
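The preprocessing steps above could be sketched as follows. The downsampling factor of 2, the luminance weights, and the scaling to [0, 1] are assumptions chosen for the example, not necessarily what dqn.py uses.

```python
import numpy as np

def preprocess(frame, factor=2):
    """Downsample an RGB frame, convert it to grayscale, flatten to a row vector."""
    small = frame[::factor, ::factor, :]  # keep every factor-th pixel in each direction
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)  # standard luminance weights
    gray = small.astype(np.float32) @ weights  # (H, W, 3) -> (H, W)
    return gray.reshape(1, -1) / 255.0         # shape (1, H * W), values in [0, 1]

frame = np.zeros((210, 160, 3), dtype=np.uint8)  # Atari frames are 210 x 160 RGB
features = preprocess(frame)                     # shape (1, 105 * 80)
```

With these settings the network input has 105 x 80 = 8400 features per frame, compared with 100800 raw RGB values.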

python dqn.py

Or, if working on a university cluster:

sbatch dqn.py

CNN Agent

This agent takes the complete RGB frame and stacks 4 consecutive frames as input to the neural network. The CNN agent effectively learns the best features for closely approximating the Q-value function.
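Stacking consecutive frames gives the network access to motion, which a single frame cannot convey. One way to maintain such a stack is sketched below; the `FrameStack` class and the choice to concatenate along the channel axis are assumptions for illustration, and cnn.py may stack frames differently.

```python
from collections import deque

import numpy as np

class FrameStack:
    """Keeps the k most recent frames and returns them as one array."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # At episode start, fill the stack with copies of the first frame.
        for _ in range(self.k):
            self.frames.append(frame)
        return self.observation()

    def step(self, frame):
        self.frames.append(frame)  # the oldest frame falls out automatically
        return self.observation()

    def observation(self):
        # Concatenate along the channel axis: 4 frames of (H, W, 3) -> (H, W, 12).
        return np.concatenate(list(self.frames), axis=-1)

stack = FrameStack(k=4)
obs = stack.reset(np.zeros((210, 160, 3), dtype=np.uint8))
obs = stack.step(np.ones((210, 160, 3), dtype=np.uint8))  # newest frame occupies the last 3 channels
```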

python cnn.py

Or, if working on a university cluster:

sbatch cnn.py

Analysis

These scripts analyze whatever .csv files are in the /analysis/ folder.

python last_100_avg_std.py

python mv_avg_plot.py
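Judging by their names, these scripts report the mean and standard deviation of the last 100 scores and plot a moving average. The sketch below shows those two computations on a hypothetical list of per-episode scores; it is a guess at the underlying arithmetic, not the scripts' actual code, and it omits CSV loading and plotting.

```python
import numpy as np

def last_n_mean_std(scores, n=100):
    """Mean and standard deviation of the final n scores."""
    tail = np.asarray(scores, dtype=float)[-n:]
    return float(tail.mean()), float(tail.std())

def moving_average(scores, window=100):
    """Average over a sliding window; output has len(scores) - window + 1 points."""
    scores = np.asarray(scores, dtype=float)
    return np.convolve(scores, np.ones(window) / window, mode="valid")

scores = np.arange(1, 11)                     # hypothetical per-episode scores 1..10
mean, std = last_n_mean_std(scores, n=4)      # statistics over [7, 8, 9, 10]
smoothed = moving_average(scores, window=5)   # 6 points: 3.0, 4.0, ..., 8.0
```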

References:

[1] http://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/

[2] https://jaromiru.com/2016/09/27/lets-make-a-dqn-theory/
