This project adapts a Deep Q-Network (DQN) agent to learn to walk in the Bipedal Walker environment from OpenAI Gym. Because DQN requires a discrete action space, the environment's continuous action space is discretized before learning.
The Bipedal Walker environment presents a challenging task: the agent must control a two-legged robot to walk across terrain, which involves balancing, coordinating both legs, and handling uneven surfaces.
- Discretization of Action Space: The continuous action space of the Bipedal Walker is discretized into a finite set of actions so that DQN, which only handles discrete actions, can be applied (a sketch of one possible discretization follows this list).
- DQN Implementation: Utilizes a neural network to approximate Q-values for state-action pairs.
- Experience Replay Buffer: Stores past experiences to break the correlation between consecutive samples and to improve learning stability.
- Target Network: Helps stabilize training by keeping the target Q-values fixed between periodic synchronizations with the Q-network.
- Training and Evaluation: Includes training scripts and evaluation modes to visualize and assess the agent's performance.
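As an illustration of the discretization idea: BipedalWalker's action is a vector of four joint torques in [-1, 1], so one common approach is to pick a few torque levels per joint and enumerate their combinations. The binning below (three levels per joint) is an assumption for this sketch, not necessarily what the project uses.

```python
import itertools
import numpy as np

# Assumed binning: 3 torque levels per joint, giving 3**4 = 81 discrete actions.
TORQUE_LEVELS = [-1.0, 0.0, 1.0]
DISCRETE_ACTIONS = np.array(list(itertools.product(TORQUE_LEVELS, repeat=4)),
                            dtype=np.float32)

def to_continuous(action_index: int) -> np.ndarray:
    """Map a discrete action index back to the 4-dim continuous action Gym expects."""
    return DISCRETE_ACTIONS[action_index]
```

Finer binning gives smoother control but enlarges the Q-network's output layer, since there is one Q-value per discrete action.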
The DQN model consists of:
- Input Layer: Accepts the state representation from the environment.
- Hidden Layers: Two fully connected layers with ReLU activations.
- Output Layer: Outputs Q-values for each possible action.
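A minimal sketch of this architecture, assuming PyTorch; the hidden-layer width of 256 is an illustrative assumption.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """State in, one Q-value per discrete action out."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),    # input layer -> first hidden layer
            nn.ReLU(),
            nn.Linear(hidden, hidden),       # second hidden layer
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # output layer: Q-values per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```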
Initialization:
- Initialize the replay buffer, Q-network, and target network.
- Set exploration parameters (starting exploration rate, minimum exploration rate).
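A sketch of this setup, reusing the QNetwork class above; the buffer size and exploration values are illustrative assumptions.

```python
from collections import deque

state_dim, num_actions = 24, 81                 # BipedalWalker state size; assumed action count
q_network = QNetwork(state_dim, num_actions)
target_network = QNetwork(state_dim, num_actions)
target_network.load_state_dict(q_network.state_dict())  # start both networks identical

replay_buffer = deque(maxlen=100_000)           # fixed-size buffer of transitions

epsilon = 1.0                                   # starting exploration rate
epsilon_min = 0.01                              # minimum exploration rate
```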
Action Selection:
- Use an epsilon-greedy policy to balance exploration and exploitation. With probability ε, select a random action; otherwise, select the action with the highest Q-value.
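A sketch of epsilon-greedy selection over the discretized action set, assuming PyTorch:

```python
import random
import torch

def select_action(state, q_network, epsilon, num_actions):
    """Random action with probability epsilon, otherwise the argmax of the Q-values."""
    if random.random() < epsilon:
        return random.randrange(num_actions)                 # explore
    with torch.no_grad():
        state_t = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        return int(q_network(state_t).argmax(dim=1).item())  # exploit
```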
Experience Replay:
- Store transitions (state, action, reward, next state, done) in the replay buffer.
- Sample mini-batches of transitions from the replay buffer to train the Q-network.
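For example, the buffer can be a simple deque of tuples with uniform random sampling of mini-batches; the batch size here is an assumption.

```python
import random

def store(replay_buffer, state, action, reward, next_state, done):
    """Append one transition in the (s, a, r, s', done) layout described above."""
    replay_buffer.append((state, action, reward, next_state, done))

def sample(replay_buffer, batch_size=64):
    """Uniformly sample a mini-batch and unzip it into per-field tuples."""
    batch = random.sample(replay_buffer, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    return states, actions, rewards, next_states, dones
```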
Optimization:
- Calculate the loss between the current Q-values and the target Q-values.
- Perform backpropagation and update the Q-network parameters.
- Periodically update the target network to match the Q-network.
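A sketch of one optimization step, assuming PyTorch, the standard DQN target r + gamma * max_a' Q_target(s', a'), and an MSE loss (the loss function and gamma value are assumptions):

```python
import numpy as np
import torch
import torch.nn.functional as F

def optimize(q_network, target_network, optimizer, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch
    states = torch.as_tensor(np.asarray(states), dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(np.asarray(next_states), dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    # Current Q-values for the actions that were actually taken.
    q_values = q_network(states).gather(1, actions).squeeze(1)

    # Target Q-values, computed with the frozen target network.
    with torch.no_grad():
        next_q = target_network(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)

    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Periodically copy the Q-network weights into the target network, e.g.:
# target_network.load_state_dict(q_network.state_dict())
```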
Training Episodes:
- Train the agent over multiple episodes, each consisting of multiple steps in the environment.
- Save the trained model periodically and monitor the agent's performance.
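Putting the pieces together, a high-level training loop might look like the sketch below. It reuses the helpers from the previous sections and assumes the classic Gym API (reset() returns the observation, step() returns four values); the episode count, epsilon decay, sync interval, learning rate, and save filename are illustrative assumptions.

```python
import gym
import torch

env = gym.make("BipedalWalker-v3")
optimizer = torch.optim.Adam(q_network.parameters(), lr=1e-4)  # learning rate is an assumption

for episode in range(2500):
    state, done, episode_reward = env.reset(), False, 0.0
    while not done:
        action_idx = select_action(state, q_network, epsilon, num_actions)
        next_state, reward, done, _ = env.step(to_continuous(action_idx))
        store(replay_buffer, state, action_idx, reward, next_state, done)
        if len(replay_buffer) >= 64:
            optimize(q_network, target_network, optimizer, sample(replay_buffer))
        state, episode_reward = next_state, episode_reward + reward

    epsilon = max(epsilon_min, epsilon * 0.995)                  # decay exploration per episode
    if episode % 10 == 0:
        target_network.load_state_dict(q_network.state_dict())  # periodic target sync
    if episode % 100 == 0:
        torch.save(q_network.state_dict(), f"agents/dqn_{episode}.pt")  # hypothetical filename
```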
The DQN agent learned an approximate walking gait, with its best performance around episodes 2100-2500. Pre-trained agents are available in the agents directory for evaluation.
To train or evaluate the DQN agent, run: `python main.py`