
Recreating the iconic SNAKE 🐍 game from the unbreakable NOKIA phone. Except a NEURAL NET gets to play it :)


0m-a-D/reinforcement_rs


Uses eligibility traces E(s, a) to assign rewards efficiently to (state, action) pairs. More specifically: dutch eligibility traces.

-- λ: trace-decay parameter, set to 0.1

-- α: learning parameter (step size), α ∈ (0, 1]

-- γ: discount factor in the update rule; has to be fine-tuned as training proceeds, γ ∈ (0, 0.3)

-- R: reward, R ∈ [0, 1]

-- Gradually decrease the temperature β in the softmax to slowly increase exploitation.
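The temperature schedule in the last item can be sketched as follows. This is only an illustration, not the repository's code: the function names `softmax` and `anneal`, the exponential decay schedule, and the floor `beta_min` are all assumptions. The sketch treats β as a temperature that divides the Q-values, so a smaller β sharpens the distribution towards the greedy action.

```rust
/// Softmax (Boltzmann) action probabilities for a slice of Q-values.
/// `beta` is treated as a temperature: large values flatten the
/// distribution (more exploration), small values sharpen it
/// (more exploitation).
fn softmax(q_values: &[f64], beta: f64) -> Vec<f64> {
    assert!(beta > 0.0, "temperature must be positive");
    // Subtract the maximum before exponentiating for numerical stability.
    let max_q = q_values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = q_values
        .iter()
        .map(|q| ((q - max_q) / beta).exp())
        .collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Hypothetical annealing schedule: multiply the temperature by `decay`
/// after each episode, but never let it fall below `beta_min`.
fn anneal(beta: f64, decay: f64, beta_min: f64) -> f64 {
    (beta * decay).max(beta_min)
}
```

Calling `anneal` once per episode and passing the result to `softmax` gives action probabilities that start close to uniform and move gradually towards the greedy choice. The floor keeps the division by β well-behaved.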

Learning policy: Q-learning instead of SARSA -> q(s_t, a_t) ← q(s_t, a_t) + α(R_{t+1} + γ max_a' q(s_{t+1}, a') - q(s_t, a_t))
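To make the update rule concrete, here is a minimal tabular sketch that combines it with dutch eligibility traces. It is not the repository's implementation, which uses a neural net. The `Agent` struct, its fields, and the `update` method are hypothetical names, and the sketch leaves out both the correction terms of true online methods and the trace cutting of Watkins's Q(λ). The tabular dutch trace used here is E(s, a) ← (1 - α) γλ E(s, a) + 1 for the visited pair and E ← γλ E for all others.

```rust
/// Tabular Q-learning agent with dutch eligibility traces (illustrative).
struct Agent {
    q: Vec<Vec<f64>>, // Q(s, a)
    e: Vec<Vec<f64>>, // E(s, a), eligibility traces
    alpha: f64,       // learning parameter
    gamma: f64,       // discount factor
    lambda: f64,      // trace-decay parameter
}

impl Agent {
    fn new(n_states: usize, n_actions: usize, alpha: f64, gamma: f64, lambda: f64) -> Self {
        Agent {
            q: vec![vec![0.0; n_actions]; n_states],
            e: vec![vec![0.0; n_actions]; n_states],
            alpha,
            gamma,
            lambda,
        }
    }

    /// Apply one update for the transition (s, a, r, s_next) and
    /// return the TD error.
    fn update(&mut self, s: usize, a: usize, r: f64, s_next: usize, terminal: bool) -> f64 {
        // Q-learning target: bootstrap from the greedy value of the next state.
        let max_next = if terminal {
            0.0
        } else {
            self.q[s_next].iter().copied().fold(f64::NEG_INFINITY, f64::max)
        };
        let delta = r + self.gamma * max_next - self.q[s][a];

        // Decay every trace by gamma * lambda.
        let decay = self.gamma * self.lambda;
        for row in self.e.iter_mut() {
            for trace in row.iter_mut() {
                *trace *= decay;
            }
        }
        // Dutch trace for the visited pair: (1 - alpha) * decayed trace + 1.
        self.e[s][a] = (1.0 - self.alpha) * self.e[s][a] + 1.0;

        // Move every Q-value in proportion to its trace.
        let step = self.alpha * delta;
        for (q_row, e_row) in self.q.iter_mut().zip(self.e.iter()) {
            for (q, trace) in q_row.iter_mut().zip(e_row.iter()) {
                *q += step * trace;
            }
        }

        // Traces do not carry over between episodes.
        if terminal {
            for row in self.e.iter_mut() {
                row.iter_mut().for_each(|trace| *trace = 0.0);
            }
        }
        delta
    }
}
```

With λ = 0.1 and γ < 0.3, the product γλ is small, so credit for a reward mostly reaches only the one or two most recent (state, action) pairs.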
