UNDER DEVELOPMENT!!
- pip
Just run

    pip install -r requirements.txt

to install the dependencies. Be careful: this installs into the currently active Python, so check your Python version and watch out for cluttering your global packages.
- virtualenv
If you're familiar with virtualenv, you can create the environment with

    virtualenv demo

and activate the virtual environment:

    source bin/activate

Finally, use pip to install the requirements:

    pip install -r requirements.txt

Of course, virtualenvwrapper makes this more pleasant.
- pipenv (highly recommended)
If you can use pipenv, that's perfect. Run

    pipenv install

to create the project and install all the dependencies for it. Make sure Python 3.6 is installed on your system.
- pip/virtualenv
Run python3.6 main.py -h directly to see the help page.
- pipenv
Run pipenv run python3.6 main.py -h to see the help.
usage: main.py [-h] {train,run} ...
This is a demo to show how Q_learning makes agent intelligent
optional arguments:
-h, --help show this help message and exit
mode:
{train,run} Choose a mode
train Train an agent
run Make an agent run
Help for train subcommand
usage: main.py train [-h] [-m {c,r}] [-r ROUND] [-l] [-s] [-c CONFIG_FILE]
[-d {t}] [-a]
optional arguments:
-h, --help show this help message and exit
-m {c,r}, --mode {c,r}
Training mode, by rounds or by convergence
-r ROUND, --round ROUND
Training rounds, neglect when convergence is chosen
-l, --load Whether to load Q table from a csv file when training
-s, --show Show the training process.
-c CONFIG_FILE, --config_file CONFIG_FILE
Config file for significant parameters
-d {t}, --demo {t} Choose a demo to run
-a, --heuristic Whether to use a heuristic iteration
-g {Q,SARSA}, --algorithm {Q,SARSA}
Training algorithm: Q or SARSA, default is Q
Details (see the argparse sketch after this list):
- m
Mode of termination for training. c stands for 'convergence', r stands for 'round'. If c is chosen, the agent stops only when the Q table has converged. If r is chosen, the agent is trained for a fixed number of rounds (which can be set with the -r flag).
- l
Load the Q table from a csv file. The file name can be modified in the program. If this flag is not given, a new Q table is built.
- r
Number of rounds to train the warrior. Ignored if -m c is chosen.
- s
Show the training process when this flag is selected.
- c
A config file for the training parameters can be specified with this argument.
- d
Choose a demo to train.
- a
Whether to use the heuristic policy to accelerate the training progress.
- g
Choose an algorithm from {Q, SARSA, DoubleQ}
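For orientation, here is a minimal sketch of how a command-line interface with these train/run subcommands and flags could be wired up with argparse. It only mirrors the help text above; the defaults shown are placeholders and this is not the project's actual main.py.

    # Hypothetical argparse layout mirroring the help text; main.py may differ.
    import argparse

    def build_parser():
        parser = argparse.ArgumentParser(
            description="This is a demo to show how Q_learning makes agent intelligent")
        sub = parser.add_subparsers(title="mode", dest="command", help="Choose a mode")

        train = sub.add_parser("train", help="Train an agent")
        train.add_argument("-m", "--mode", choices=["c", "r"], default="r",
                           help="Training mode, by rounds or by convergence")
        train.add_argument("-r", "--round", type=int, default=100,
                           help="Training rounds, ignored when convergence is chosen")
        train.add_argument("-l", "--load", action="store_true",
                           help="Load the Q table from a csv file when training")
        train.add_argument("-s", "--show", action="store_true",
                           help="Show the training process")
        train.add_argument("-c", "--config_file",
                           help="Config file for significant parameters")
        train.add_argument("-d", "--demo", choices=["t"], help="Choose a demo to run")
        train.add_argument("-a", "--heuristic", action="store_true",
                           help="Use a heuristic iteration")
        train.add_argument("-g", "--algorithm", choices=["Q", "SARSA"], default="Q",
                           help="Training algorithm: Q or SARSA, default is Q")

        run = sub.add_parser("run", help="Make an agent run")
        run.add_argument("-d", "--demo", choices=["t"], help="Choose a demo to run")
        run.add_argument("-q", help="Choose a Q table from a csv file")
        return parser

    if __name__ == "__main__":
        print(build_parser().parse_args())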
Help for run subcommand
usage: main.py run [-h] [-d {t}] [-q Q]
optional arguments:
-h, --help show this help message and exit
-d {t}, --demo {t} Choose a demo to run
-q Q Choose a Q table from a csv file
Details:
- d
Choose a demo to run.
- q
Specify a Q table csv file to use when running (see the sketch after this list).
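Both the -l flag of train and the -q flag of run refer to a Q table stored as a csv file. As a rough illustration only (the demo's actual csv layout and file names may differ), such a table, indexed by state with one column per action, could be handled with pandas:

    # Hypothetical Q-table persistence; the real csv format may differ.
    import pandas as pd

    ACTIONS = ["left", "right"]  # assumed action set for a 1D demo

    def new_q_table(n_states):
        # One row per state, one column per action, all Q values start at 0.
        return pd.DataFrame(0.0, index=range(n_states), columns=ACTIONS)

    def save_q_table(q_table, path="q_table.csv"):
        q_table.to_csv(path)

    def load_q_table(path="q_table.csv"):
        return pd.read_csv(path, index_col=0)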
The config file must be a YAML file containing the following parameters:

    size: 10
    epsilon: 0.9
    gamma: 0.9
    alpha: 0.1
    speed: 0.1

- size
The length of the map.
- epsilon
The probability of choosing a random action; otherwise the action that maximizes the Q value of the current state is chosen (see the sketch after this list).
- gamma
Discount factor.
- alpha
Learning rate.
- speed
Speed of displaying.
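To see how these parameters interact, here is a minimal, self-contained sketch of epsilon-greedy action selection and one Q-learning update, using the example values above and the parameter meanings as described (epsilon as the random-action probability). It is illustrative only and not the project's code.

    # Illustrative Q-learning step with the example config values; not the demo's code.
    import random

    cfg = {"size": 10, "epsilon": 0.9, "gamma": 0.9, "alpha": 0.1}

    ACTIONS = ["left", "right"]
    Q = {(s, a): 0.0 for s in range(cfg["size"]) for a in ACTIONS}

    def choose_action(state):
        # With probability epsilon pick a random action, otherwise the greedy one.
        if random.random() < cfg["epsilon"]:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def q_learning_update(state, action, reward, next_state):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += cfg["alpha"] * (
            reward + cfg["gamma"] * best_next - Q[(state, action)])

SARSA differs only in the target: it uses the Q value of the action actually taken in the next state instead of the max. The speed parameter only affects how fast frames are displayed and plays no role in the update.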
After convergence of training:

    Xo_________T
    X_o________T
    X__o_______T
    X___o______T
    X____o_____T
    X_____o____T
    X______o___T
    X_______o__T
    X________o_T
    X_________oT
    X__________o
The agent can find the treasure directly.
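The frames above come from a 1D treasure hunt: the agent 'o' walks along a line of cells and the episode ends when it reaches the treasure 'T' on the right. A toy environment in that spirit (purely illustrative, not the demo's code) looks like:

    # Toy 1D treasure-hunt step function; the treasure sits at the right end.
    def step(state, action, size=10):
        """Return (next_state, reward, done) on a line of `size` cells."""
        next_state = max(0, state - 1) if action == "left" else min(size - 1, state + 1)
        done = next_state == size - 1          # treasure reached
        return next_state, 1.0 if done else 0.0, done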
    pipenv run python main.py train -d 2d -s

Enjoy the training process.

    pipenv run python main.py run -d 2d

Watch the result.
|@| | |+| | | | | | | | |+|X| | | | |+| | | | | |X| | | | | | | | | | | | | |X|X|+| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |X| | |+| | | | | | | | |X|X| |+| | | |+| | | | | | | | | | | | |+| | | |X|#|