Implementations of reinforcement learning algorithms in Python.
Algorithms implemented:
- epsilon-greedy on 10-armed bandit testbed
- Softmax action selection using the Gibbs distribution on 10-armed bandit testbed
- UCB1
- Median Elimination Algorithm
- Q-learning on puddle-world using OpenAI Gym
- SARSA on puddle-world using OpenAI Gym
- SARSA-Lambda on puddle-world using OpenAI Gym
- Policy Gradients on chakra & vishamC world using OpenAI Gym
- SMDP Q-learning on four-room grid world environment using OpenAI Gym
- Intra-Option Q-learning on four-room grid world environment using OpenAI Gym
- Deep Q-Network (DQN) on the CartPole environment of OpenAI Gym using TensorFlow
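
As an illustration of the first item in the list above, here is a minimal sketch of epsilon-greedy action selection on a 10-armed testbed. It is not the code in this repository: the function name `run_epsilon_greedy`, its parameters, and the choice of unit-variance Gaussian rewards with sample-average estimates are all assumptions made for the example, and it uses only the standard library.

```python
import random


def run_epsilon_greedy(n_arms=10, steps=1000, epsilon=0.1, seed=0):
    """Run one epsilon-greedy agent on a stationary Gaussian bandit.

    Hypothetical helper for illustration. Returns (rewards, optimal_flags):
    the reward at each step and whether the optimal arm was pulled.
    """
    rng = random.Random(seed)
    # Assumed testbed: true action values ~ N(0, 1), rewards ~ N(q*(a), 1).
    true_values = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
    best_arm = max(range(n_arms), key=lambda a: true_values[a])

    estimates = [0.0] * n_arms  # sample-average estimates Q(a)
    counts = [0] * n_arms       # number of pulls per arm
    rewards, optimal_flags = [], []

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: uniform random arm
        else:
            # exploit: greedy arm, breaking ties at random
            top = max(estimates)
            arm = rng.choice([a for a in range(n_arms) if estimates[a] == top])
        reward = rng.gauss(true_values[arm], 1.0)
        counts[arm] += 1
        # incremental sample-average update of the pulled arm's estimate
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        rewards.append(reward)
        optimal_flags.append(arm == best_arm)
    return rewards, optimal_flags
```

Averaging `rewards` and `optimal_flags` over many independently seeded runs gives curves of the kind listed under Plots.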
Plots:
- Regret
- Average Reward
- Percentage of optimal arm pulls
- Visualizing Optimal Policy
- Visualizing state values
- The trajectory followed by learned agent
- Learning curves: average steps to goal, average total discounted return, episode length
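
The quantities behind several of these plots can be computed with a few lines of plain Python. The sketch below is one possible way to do it, not the repository's plotting code: the function names are hypothetical, regret is taken to be the expected (pseudo-)regret against the best arm's true mean, and the default discount factor `gamma` is an assumed value.

```python
def cumulative_regret(true_values, pulled_arms):
    """Cumulative expected regret after each pull.

    true_values: true mean reward of each arm (assumed known, as in a testbed).
    pulled_arms: sequence of arm indices chosen by the agent.
    """
    best = max(true_values)
    total, curve = 0.0, []
    for arm in pulled_arms:
        total += best - true_values[arm]  # gap to the optimal arm
        curve.append(total)
    return curve


def percent_optimal(optimal_flags_per_run):
    """Per-step percentage of runs in which the optimal arm was pulled.

    optimal_flags_per_run: one list of booleans per run, all of equal length.
    """
    n_runs = len(optimal_flags_per_run)
    return [100.0 * sum(step) / n_runs for step in zip(*optimal_flags_per_run)]


def discounted_return(rewards, gamma=0.9):
    """Total discounted return of one episode: sum over t of gamma**t * r_t."""
    g = 0.0
    for r in reversed(rewards):  # accumulate backwards: G_t = r_t + gamma * G_{t+1}
        g = r + gamma * g
    return g
```

For example, `percent_optimal([[True, False], [True, True]])` yields `[100.0, 50.0]`, which is the kind of per-step series that would be plotted against the step index.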