FindGem_v1 is a game in which a robot walks through a two-dimensional maze containing fire pits, stone pillars, and diamonds. If the robot falls into a fire pit, the game is over. If it finds a diamond, it earns a reward and the game ends! The goal is to design the best strategy so the robot finds the diamond as quickly as possible and collects the reward.
Every time the environment is reset, only the robot's position changes; the positions of the gems, fire pits, and stone pillars stay fixed.
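The reset behavior described above can be sketched as follows. This is a hypothetical layout for illustration only (the real map lives in grid_findGem_v1.py); the point is that the map is built once and reset() only re-samples the robot's cell:

```python
import random

class FindGemSketch:
    """Minimal sketch: fixed map, only the robot moves on reset."""

    def __init__(self, width=5, height=5, seed=0):
        self.width, self.height = width, height
        self.rng = random.Random(seed)
        # Fixed at construction time; never changed by reset().
        self.pits = {(1, 1), (3, 2)}   # fire pits: stepping in ends the episode
        self.pillars = {(2, 2)}        # stone pillars: impassable cells
        self.gem = (4, 4)              # diamond: reaching it ends the episode
        self.robot = None

    def _free_cells(self):
        blocked = self.pits | self.pillars | {self.gem}
        return [(x, y) for x in range(self.width) for y in range(self.height)
                if (x, y) not in blocked]

    def reset(self):
        # Only the robot's position is re-sampled; the map itself is untouched.
        self.robot = self.rng.choice(self._free_cells())
        return self.robot
```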
- video
FindGem's env setting is adapted from this link. Based on the original author's code, with appropriate bug fixes and modifications to the robot's movement logic.
pip install -r requirements.txt
- First, copy the custom env code into the gym path under your site-packages directory:
/anaconda3/Lib/site-packages/gym/envs
mkdir /anaconda3/Lib/site-packages/gym/envs/user
cp ./env/grid_findGem_v1.py /anaconda3/Lib/site-packages/gym/envs/user
cp ./env/__init__.py /anaconda3/Lib/site-packages/gym/envs/user
- Register the environment in gym
Open
.../anaconda3/Lib/site-packages/gym/envs/__init__.py
and add the following code:
register(
    id='GridWorld-v1',
    entry_point='gym.envs.user:GridEnv1',
    max_episode_steps=200,
    reward_threshold=100,
)
The DQN model is adapted from the official PyTorch tutorials, and the saving of training-process results is adapted from the tensorflow.org tutorials. The reward mechanism was modified so that rewards shrink as the number of steps grows, and the state-acquisition logic was updated.
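One plausible reading of "rewards shrink as the number of steps grows" is a step-penalized terminal reward. The function below is a sketch under that assumption (the penalty rate and flooring are illustrative, not taken from the source):

```python
def shaped_reward(base_reward, step, step_penalty=0.01):
    """Scale a positive terminal reward down by the episode length.

    base_reward: reward from the env (e.g. +100 for the gem, negative for a pit)
    step: number of steps taken so far in the episode
    step_penalty: assumed per-step deduction (hypothetical value)
    """
    if base_reward > 0:
        # Each elapsed step shaves a little off the gem reward, floored at 0
        # so a long episode never turns a success into a penalty.
        return max(base_reward - step_penalty * step, 0.0)
    # Penalties (fire pit) are left untouched.
    return base_reward
```

This pushes the agent toward shorter paths: finding the gem in 10 steps pays more than finding it in 150.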
python customEnvFindGem.py --mode train --BATCH_SIZE 256 --num_episodes 50000
python customEnvFindGem.py --mode test
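The two commands above imply a small CLI with `--mode`, `--BATCH_SIZE`, and `--num_episodes` flags. A sketch of such a parser is shown below; the flag names follow the commands, but the defaults and choices are assumptions, not taken from customEnvFindGem.py:

```python
import argparse

def build_parser():
    # Hypothetical reconstruction of the script's CLI; defaults are assumed.
    p = argparse.ArgumentParser(
        description="Train or test a DQN agent on GridWorld-v1")
    p.add_argument("--mode", choices=["train", "test"], required=True,
                   help="train a new model or run a trained one")
    p.add_argument("--BATCH_SIZE", type=int, default=128,
                   help="minibatch size sampled from replay memory")
    p.add_argument("--num_episodes", type=int, default=500,
                   help="number of training episodes")
    return p

args = build_parser().parse_args(
    ["--mode", "train", "--BATCH_SIZE", "256", "--num_episodes", "50000"])
```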
The result file will be saved at ./result/result_polyDL.mp4