A reinforcement learning project where I train an AI to play Tetris
We have created a Tetris engine (with gravity) in which the agent has the following options at each state:
- Move left
- Move right
- Rotate piece
- Do nothing
- Hard drop
For all actions except hard drop, the piece drops by one row.
These actions are sufficient for strategic piece placement while avoiding unnecessary complexity. Real Tetris includes soft drops, lock delays, hold mechanics, wall kicks, combo scoring, and multi-frame movement. These features make gameplay richer for humans but dramatically increase the complexity for RL agents.
This simplification results in a deterministic, single-step control scheme where each action immediately affects the board state. The agent focuses solely on where to place the current piece, not when.
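For illustration, here is a minimal sketch of how this five-action set could be expressed as a Gymnasium discrete space; the enum names and their ordering are assumptions, not the engine's actual encoding:

```python
from enum import IntEnum

import gymnasium as gym


class Action(IntEnum):
    # Hypothetical ordering; the real engine may encode actions differently.
    MOVE_LEFT = 0
    MOVE_RIGHT = 1
    ROTATE = 2
    DO_NOTHING = 3
    HARD_DROP = 4


# Five discrete actions; every action except HARD_DROP also lets gravity pull
# the piece down by one row.
action_space = gym.spaces.Discrete(len(Action))
```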
We also use 3 heuristics:
- Aggregate height: how tall the stack is; the lower, the better
- Bumpiness: the jaggedness of the surface; the flatter, the better
- Holes: empty cells with filled cells above them; the fewer, the better
We calculate bumpiness as the sum of absolute differences in height between each pair of adjacent columns (1 & 2, 2 & 3, 3 & 4, and so on); a sketch of all three heuristics follows.
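Below is a sketch of how these three heuristics can be computed from a binary 20x10 board. It assumes row 0 is the top of the board and locked cells are 1s; the project's actual implementation may differ.

```python
import numpy as np


def board_heuristics(board: np.ndarray) -> tuple[int, int, int]:
    """Return (aggregate_height, bumpiness, holes) for a 20x10 binary board.

    board[0] is the top row; a cell is 1 if occupied by a locked piece.
    Illustrative sketch only.
    """
    rows, cols = board.shape

    # Column height = number of rows from the topmost filled cell down to the floor.
    heights = np.zeros(cols, dtype=int)
    for c in range(cols):
        filled = np.nonzero(board[:, c])[0]
        heights[c] = rows - filled[0] if filled.size else 0

    aggregate_height = int(heights.sum())

    # Bumpiness: sum of |height difference| over adjacent column pairs (1&2, 2&3, ...).
    bumpiness = int(np.abs(np.diff(heights)).sum())

    # Holes: empty cells that have at least one filled cell above them in the same column.
    holes = 0
    for c in range(cols):
        filled = np.nonzero(board[:, c])[0]
        if filled.size:
            holes += int((board[filled[0]:, c] == 0).sum())

    return aggregate_height, bumpiness, holes
```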
We track the number of steps and stop an episode at 500, because an intermediate model might otherwise cause an infinite loop by never using hard drop.
The observation space fed to our dqn_sb3 model is (2, 20, 10) + (3,). We feed the board as two channels, one containing the board with the fixed pieces and the other containing just the falling piece. We also feed it the three heuristics mentioned above.
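One way to express this composite observation in Gymnasium is a Dict space; the key names here are assumptions, and stable-baselines3's DQN can consume such a space through its MultiInputPolicy:

```python
import numpy as np
from gymnasium import spaces

# Sketch of the composite observation space; the key names ("board", "heuristics")
# are illustrative, not necessarily the ones used in the project.
observation_space = spaces.Dict(
    {
        # Channel 0: the board with the fixed pieces; channel 1: the falling piece.
        "board": spaces.Box(low=0, high=1, shape=(2, 20, 10), dtype=np.uint8),
        # Aggregate height, bumpiness, holes.
        "heuristics": spaces.Box(low=0, high=np.inf, shape=(3,), dtype=np.float32),
    }
)
```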
I have tried training a model several times. At first there was no gravity: you could move left or right and rotate the piece as many times as you wanted, then hard drop.
- Version 1: The death penalty was 1000 times greater than the step penalty (10 vs. 0.01), so the model would never drop and kept playing with the piece indefinitely. Result 1: training got stuck at evaluation.
- Version 2: Added max steps to avoid infinite episodes. Result 2: the model made 500 steps without ever dropping a piece.
- Version 3: Started varying the step penalty. Result 3: if it was too large, the model died as soon as possible to avoid it; if it was too small, the model never dropped a piece. I couldn't find a suitable middle ground.
- Version 4: Introduced penalties for changing the heuristics in a negative direction. Result 4: still stuck in the dilemma from the previous version (see the reward sketch after this list).
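As a rough sketch of the early reward scheme: only the 10 and 0.01 magnitudes come from the notes above; the sign conventions and the line-clear term are assumptions. With these numbers, stalling for 500 steps (the later cap) costs only 5 in step penalties versus 10 for dying, which is why the agent preferred never to drop.

```python
DEATH_PENALTY = 10.0   # from the notes: 1000x the step penalty
STEP_PENALTY = 0.01

def reward_v1(died: bool, lines_cleared: int) -> float:
    """Version-1-style reward (sketch): a small per-step cost, a large cost on
    death, and an assumed positive reward for clearing lines."""
    r = -STEP_PENALTY
    if lines_cleared:
        r += lines_cleared  # assumed: +1 per cleared line
    if died:
        r -= DEATH_PENALTY
    return r
```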
The later versions moved to the gravity-enabled engine:
- Version 5: Added gravity, so pieces fall one row after each move, which avoids infinite loops by guaranteeing that every episode eventually ends. Removed the step penalty and the penalty for increasing aggregate height. (V3) Result 5: the model survives for around 250-300 steps but doesn't clear any lines.
- Version 6: Reintroduced the aggregate-height penalty and introduced a survival reward of 0.01. (V4) Result 6: same as result 5, but the model seems close to clearing lines.
- Version 7: Increased exploration_fraction from 0.3 to 0.5 and exploration_final_eps from 0.02 to 0.05. (V5) Result 7: same as result 6.
- Version 8: Tuned many of the model's hyperparameters and trained for 3M timesteps (V6); a sketch of this setup is below.
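For reference, a rough sketch of the version-7/8 stable-baselines3 DQN setup. Only exploration_fraction, exploration_final_eps, and the 3M-timestep budget come from the notes above; the policy choice and the env variable are assumptions.

```python
from stable_baselines3 import DQN

# env is assumed to be the gravity-enabled Tetris environment described above,
# exposing the Dict observation space (board channels + heuristics).
model = DQN(
    "MultiInputPolicy",            # handles the Dict observation space
    env,
    exploration_fraction=0.5,      # raised from 0.3 in version 7
    exploration_final_eps=0.05,    # raised from 0.02 in version 7
    verbose=1,
)
model.learn(total_timesteps=3_000_000)  # version 8: train for 3M timesteps
```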