- Components of reinforcement learning and environment dynamics.
- Custom environment creation using the OpenAI Gymnasium API.
- Registering and running environments.
- Analysis and code for grid-walk environments (Bandit-Walk and Random-Walk environments).
- Theory of methods based on Markov decision processes and the Bellman equation.
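For reference, the Bellman expectation equation for the state-value function of a policy π, in standard notation (not specific to this repo's code), is:

```math
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_\pi(s')\bigr]
```

Here `p(s', r | s, a)` is the environment's transition dynamics and γ is the discount factor; the prediction and control methods listed below all estimate fixed points of equations of this form.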
- Algorithms to explore an environment:
  * Greedy approach (pure exploitation)
  * Pure exploration
  * Epsilon-greedy approach
  * Decaying-epsilon approach
  * Softmax exploration strategy
  * UCB strategy
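The first few strategies can be sketched in a few lines of numpy (toy helpers for illustration; `Q` is a states-by-actions value table, and the function names are my own, not the repo's):

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else the greedy one.
    epsilon=0 gives pure exploitation; epsilon=1 gives pure exploration."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))   # explore
    return int(np.argmax(Q[state]))            # exploit

def linear_decay(eps_start, eps_min, episode, decay_episodes):
    """Decaying epsilon: interpolate linearly over the first decay_episodes episodes."""
    frac = min(episode / decay_episodes, 1.0)
    return eps_start + frac * (eps_min - eps_start)

def softmax_action(Q, state, tau, rng):
    """Softmax exploration: sample actions with probability proportional to exp(Q/tau)."""
    prefs = Q[state] / tau
    prefs -= prefs.max()                       # subtract max for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(probs), p=probs))
```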
- Probabilistic prediction methods:
  * First-visit Monte Carlo (MC-FV)
  * Every-visit Monte Carlo (MC-EV)
  * Temporal difference (TD)
  * n-step TD
  * TD(λ)
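As one example from this family, TD(0) prediction can be sketched on a self-contained toy random walk (uniform-random policy, hypothetical chain; illustrative only, not the repo's implementation):

```python
import numpy as np

def td0_random_walk(n_states=7, episodes=500, alpha=0.1, gamma=1.0, seed=0):
    """TD(0) state-value prediction under a uniform-random policy on a
    random-walk chain with terminal states at both ends (+1 at the right end)."""
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states)                      # terminal values stay 0
    terminals = (0, n_states - 1)
    for _ in range(episodes):
        s = n_states // 2                       # start in the middle
        while s not in terminals:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # TD(0) update: bootstrap from V[s2] unless s2 is terminal
            target = r + gamma * V[s2] * (s2 not in terminals)
            V[s] += alpha * (target - V[s])
            s = s2
    return V
```

With γ = 1 the true values for the five interior states are 1/6, 2/6, …, 5/6, so the learned `V` should increase from left to right.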
- Control algorithms:
  * First-visit Monte Carlo control
  * Every-visit Monte Carlo control
  * SARSA
  * Q-learning
  * Double Q-learning
  * SARSA(λ) (accumulating and replacing traces)
  * Q(λ) (accumulating and replacing traces)
  * Dyna-Q (model-based)
  * Trajectory sampling (model-based)
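As a representative of the control family, here is a minimal tabular Q-learning sketch on the same hypothetical 7-state walk (off-policy, with a uniform-random behavior policy for simplicity; toy code, not the repo's implementation):

```python
import numpy as np

def q_learning_walk(n_states=7, episodes=1000, alpha=0.1, gamma=0.99, seed=0):
    """Off-policy Q-learning on a walk chain: reward +1 at the right end only.
    The behavior policy is uniform random; the update bootstraps off max Q."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))                 # actions: 0 = left, 1 = right
    terminals = (0, n_states - 1)
    for _ in range(episodes):
        s = n_states // 2
        while s not in terminals:
            a = int(rng.integers(2))            # uniform-random behavior
            s2 = s + (1 if a == 1 else -1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            done = s2 in terminals
            # Q-learning target: r + gamma * max_a' Q(s', a'), zero if terminal
            Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
            s = s2
    return Q
```

The greedy policy extracted from the learned `Q` should prefer moving right, since the only reward sits at the right terminal; SARSA differs only in bootstrapping off the behavior policy's next action instead of the max.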
- Deep reinforcement learning:
  * Neural Fitted Q iteration (NFQ)
  * DQN, DDQN, D3QN
  * PER-D3QN
  * REINFORCE
  * VPG
  * DDPG
  * TD3
  * PPO
- Plots and reports based on the above methods.
- The assignment contains all of the implementations and plots.
- It has two solutions: the one I wrote and the one provided by the tutors.
- The code in this repo may contain errors; please verify it before use.