Using eligibility traces -> E(s,a) for efficient credit assignment of rewards to (state, action) pairs. More specifically: Dutch eligibility traces.
-- λ: trace decay parameter, set to 0.1
-- α: learning rate, α ∈ (0, 1]
-- λ: has to be fine-tuned as training proceeds, λ ∈ (0, 0.3)
-- R ∈ [0, 1]
-- gradually decrease the temperature β in the softmax action selection to slowly increase exploitation (see the sketch below).
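A minimal sketch of the softmax (Boltzmann) action selection with an annealed temperature β, as noted above. The helper names and the schedule constants (beta_start, beta_min, decay) are illustrative assumptions, not values from these notes.

```python
import numpy as np

def softmax_action(q_values, beta):
    """Sample an action with probability proportional to exp(Q / beta)."""
    prefs = q_values / beta
    prefs -= prefs.max()               # subtract max for numerical stability
    probs = np.exp(prefs)
    probs /= probs.sum()
    return np.random.choice(len(q_values), p=probs)

def anneal_beta(episode, beta_start=1.0, beta_min=0.05, decay=0.995):
    """Exponentially decay the temperature; a lower beta means greedier (more exploitative) behaviour."""
    return max(beta_min, beta_start * decay ** episode)
```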
Learning policy: Q-learning instead of SARSA -> q(s_t, a_t) ← q(s_t, a_t) + α·(R_{t+1} + γ·max_{a'} q(s_{t+1}, a') - q(s_t, a_t))
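A minimal sketch of one training episode combining this Q-learning update with the eligibility traces from the list above, for the tabular case. Assumptions not in these notes: the Dutch trace is taken to reduce, for a tabular (one-hot) representation, to decaying all traces by γλ and then bumping the visited pair by 1 - α·E[s,a]; traces are cut after a non-greedy action (Watkins-style Q(λ)); the environment interface (env.reset(), env.step() returning (state, reward, done)) and the default γ are placeholders. It reuses softmax_action from the sketch above.

```python
import numpy as np

def run_episode(env, Q, alpha=0.1, gamma=0.9, lam=0.1, beta=1.0):
    """One episode of tabular Q-learning with Dutch-style eligibility traces (sketch)."""
    E = np.zeros_like(Q)                      # eligibility traces E(s, a)
    s = env.reset()                           # assumed to return an integer state index
    done = False
    while not done:
        a = softmax_action(Q[s], beta)        # Boltzmann exploration (see sketch above)

        # Watkins-style cut (assumption): an exploratory action breaks the greedy
        # return that earlier traces were accumulating toward, so clear them.
        if a != int(Q[s].argmax()):
            E.fill(0.0)

        s_next, r, done = env.step(a)         # assumed to return (state, reward, done)

        # TD error with the off-policy Q-learning target: R_{t+1} + gamma * max_a' Q(s', a')
        td_error = r + (0.0 if done else gamma * Q[s_next].max()) - Q[s, a]

        # Dutch-style trace: decay all traces by gamma*lambda, then bump the
        # visited pair so that E[s,a] becomes (1 - alpha)*gamma*lambda*E[s,a] + 1.
        E *= gamma * lam
        E[s, a] += 1.0 - alpha * E[s, a]

        # Propagate the TD error to all recently visited (state, action) pairs.
        Q += alpha * td_error * E

        s = s_next
    return Q
```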