Training a reinforcement learning agent for OpenAI Gym's CarRacing environment.
- Deep Q-Learning[1]
We implement a Deep Q-Network and its forward pass in the DQN class in model.py. Our network takes a single frame as input.
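The actual architecture lives in model.py; the NumPy sketch below only illustrates the interface that matters for the rest of the pipeline (a single frame in, one Q-value per action out). The 96×96 frame size, the dense layers, and the hidden width are illustrative assumptions, not the repo's real network.

```python
import numpy as np

class TinyDQN:
    """Illustrative Q-network: flatten a single frame -> hidden layer -> 7 Q-values.
    The real DQN class in model.py may differ (e.g. convolutional layers)."""
    def __init__(self, frame_shape=(96, 96), n_actions=7, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        n_in = int(np.prod(frame_shape))
        self.W1 = rng.normal(0.0, 0.01, (n_in, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.01, (hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def forward(self, frame):
        x = frame.reshape(-1)                       # flatten the single input frame
        h = np.maximum(0.0, x @ self.W1 + self.b1)  # ReLU hidden layer
        return h @ self.W2 + self.b2                # one Q-value per discrete action

net = TinyDQN()
q = net.forward(np.zeros((96, 96)))  # Q-value vector over the 7 actions
```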
The training loop for the Deep Q-network is defined in deepq.py. The target network updates and the deep Q-learning step are defined in learning.py.
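The core of the deep Q-learning step is the TD target that the network is regressed towards. A minimal sketch of that computation (the function name and batch layout are assumptions; the real version in learning.py operates on framework tensors):

```python
import numpy as np

def dqn_targets(q_next_target, rewards, dones, gamma=0.99):
    """Standard DQN TD targets: r + gamma * max_a' Q_target(s', a'),
    with the bootstrap term zeroed out for terminal transitions."""
    max_next = q_next_target.max(axis=1)          # greedy value under the target net
    return rewards + gamma * (1.0 - dones) * max_next
```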
The action space is defined in action.py. We experimented with various action sets and eventually settled on the 7 actions defined there.
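CarRacing's native control space is continuous (steer, gas, brake), so a discrete agent needs a fixed mapping from action indices to control triples. The concrete triples below are hypothetical; the actual 7 actions live in action.py.

```python
# Hypothetical 7-action discretisation of CarRacing's continuous
# (steer, gas, brake) controls; the real mapping is in action.py.
ACTIONS = [
    (-1.0, 0.0, 0.0),  # hard left
    (+1.0, 0.0, 0.0),  # hard right
    (-0.5, 0.0, 0.0),  # soft left
    (+0.5, 0.0, 0.0),  # soft right
    ( 0.0, 1.0, 0.0),  # accelerate
    ( 0.0, 0.0, 0.8),  # brake
    ( 0.0, 0.0, 0.0),  # coast
]
```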
schedule.py defines the exploration-exploitation tradeoff. We begin with a p_initial value of 1, so that the agent focuses on exploration early in training.
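A common way to implement such a schedule is to anneal the exploration probability linearly from p_initial down to some floor. This sketch assumes linear annealing and a p_final of 0.1; the actual scheme and constants are in schedule.py.

```python
class LinearSchedule:
    """Linearly anneal the exploration probability from p_initial to p_final
    over total_steps environment steps (a sketch; see schedule.py for the real scheme)."""
    def __init__(self, total_steps, p_initial=1.0, p_final=0.1):
        self.total_steps = total_steps
        self.p_initial = p_initial
        self.p_final = p_final

    def value(self, step):
        # Fraction of the annealing period completed, clipped to [0, 1].
        frac = min(step / self.total_steps, 1.0)
        return self.p_initial + frac * (self.p_final - self.p_initial)
```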
- Double Deep Q-Learning[2]
We implement a Double Deep Q-Network and its forward pass in the DQN class in model.py. As in the Deep Q-learning experiment, our network takes a single frame as input.
The training loop for the Double Deep Q-network is defined in deepq_double.py. The target network update and the double deep Q-learning step are defined in learning_double.py.
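The double deep Q-learning step differs from the standard one only in how the bootstrap value is computed: the online network selects the greedy next action, and the target network evaluates it, which reduces the overestimation bias of plain DQN. A sketch under the same assumed batch layout as above (the real version in learning_double.py operates on framework tensors):

```python
import numpy as np

def double_dqn_targets(q_next_online, q_next_target, rewards, dones, gamma=0.99):
    """Double DQN targets: the online net picks argmax_a' Q_online(s', a'),
    the target net evaluates that action; terminal transitions get no bootstrap."""
    best = q_next_online.argmax(axis=1)                    # action selection (online)
    evaluated = q_next_target[np.arange(len(best)), best]  # action evaluation (target)
    return rewards + gamma * (1.0 - dones) * evaluated
```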
For this experiment, we use the same action space as in the Deep Q-learning experiment.
We use the same exploration-exploitation scheme as in the Deep Q-learning experiment.
- Replay buffer for storing the agent's memories
- Target Q-network to stabilize Q-learning
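A replay buffer is typically a fixed-size FIFO of transitions that training batches are sampled from uniformly. The class name and transition layout below are illustrative, not necessarily what this repo uses:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions.
    Oldest memories are evicted first once capacity is reached."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlations
        # between consecutive transitions, which stabilizes Q-learning.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```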
- extract `sdc_gym.zip`
- `cd sdc_gym`
- `pip install -e ".[box2d]"`
- `python evaluate_racing.py score`