This repository contains the implementation of two fundamental reinforcement learning algorithms, Q-learning and SARSA, applied to the Cliff Walking environment. The project explores how these algorithms learn to navigate the gridworld, avoid the cliff, and reach the goal while minimizing penalties.

Parts:
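- Q-learning: An off-policy algorithm that learns the optimal action values by bootstrapping from the highest estimated value in the next state, regardless of which action the agent actually takes there.
- SARSA: An on-policy algorithm that updates its action values using the action actually taken in the next state, so the cost of exploratory moves is reflected in what it learns. This can lead to safer but less aggressive strategies.
- Comparison: A detailed comparison of the paths chosen by each algorithm, highlighting how the off-policy and on-policy updates lead to different routes under the same exploration scheme.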
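
The two algorithms above differ only in the value they bootstrap from. The following is a minimal sketch of both update rules, not the repository's actual code: it assumes a tabular `Q` stored as a dict of per-state action-value lists, and the function names and the hyperparameters `alpha` (step size) and `gamma` (discount) are hypothetical.

```python
def q_learning_update(Q, s, a, r, s_next, done, alpha=0.5, gamma=1.0):
    """Off-policy update: bootstrap from the best action value in s_next."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.5, gamma=1.0):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# Tiny illustration with two states and two actions (made-up values).
# In state 1, action 0 is cheap and action 1 is very costly.
Q = {0: [0.0, 0.0], 1: [-1.0, -10.0]}

# Q-learning assumes the best action (index 0) will be taken in state 1.
q_learning_update(Q, s=0, a=0, r=-1, s_next=1, done=False)

# SARSA uses the action that was really selected in state 1 (here the costly one).
sarsa_update(Q, s=0, a=1, r=-1, s_next=1, a_next=1, done=False)
```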

Tasks:
- Implement and evaluate the Q-learning algorithm.
- Implement and evaluate the SARSA algorithm.
- Compare and analyze the optimal policies derived from both algorithms.
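
One way the tasks above could be approached is sketched below. It is an illustration under stated assumptions, not the repository's implementation: it assumes the common 4x12 Cliff Walking layout (start in the bottom-left corner, goal in the bottom-right, cliff cells between them, reward -1 per move and -100 for falling), and every name and hyperparameter (`step`, `train`, `greedy_path`, `alpha`, `gamma`, `eps`) is hypothetical.

```python
import random

ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def step(state, action):
    """Return (next_state, reward, done) for the assumed cliff layout."""
    r = min(max(state[0] + MOVES[action][0], 0), ROWS - 1)
    c = min(max(state[1] + MOVES[action][1], 0), COLS - 1)
    if r == 3 and 1 <= c <= 10:      # stepped into the cliff
        return START, -100, False    # large penalty, sent back to the start
    return (r, c), -1, (r, c) == GOAL


def epsilon_greedy(Q, state, eps, rng):
    """Random action with probability eps, otherwise a greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(MOVES))
    return Q[state].index(max(Q[state]))


def train(sarsa, episodes=500, alpha=0.5, gamma=1.0, eps=0.1, seed=0):
    """Tabular SARSA if sarsa is True, otherwise tabular Q-learning."""
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * 4 for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        s, done = START, False
        a = epsilon_greedy(Q, s, eps, rng)
        while not done:
            s2, reward, done = step(s, a)
            if sarsa:
                # On-policy: pick the next action first, then bootstrap from it.
                a2 = epsilon_greedy(Q, s2, eps, rng)
                bootstrap = Q[s2][a2]
            else:
                # Off-policy: bootstrap from the best next action value.
                bootstrap = max(Q[s2])
            target = reward if done else reward + gamma * bootstrap
            Q[s][a] += alpha * (target - Q[s][a])
            if not sarsa:
                # Q-learning selects its next action after the update.
                a2 = epsilon_greedy(Q, s2, eps, rng)
            s, a = s2, a2
    return Q


def greedy_path(Q, max_steps=100):
    """Follow the greedy policy from START; stop at GOAL or after max_steps moves."""
    s, path = START, [START]
    while s != GOAL and len(path) <= max_steps:
        s, _, _ = step(s, Q[s].index(max(Q[s])))
        path.append(s)
    return path


for name, use_sarsa in [("Q-learning", False), ("SARSA", True)]:
    learned = train(sarsa=use_sarsa)
    moves = len(greedy_path(learned)) - 1  # equals max_steps if the goal is not reached
    print(name, "greedy path:", moves, "moves")
```

Printing the coordinates returned by `greedy_path` for each algorithm gives a direct way to compare the learned routes. With settings like these, Q-learning tends to favor the short route along the cliff edge and SARSA a longer route farther from it, but the exact paths depend on the seed, the number of episodes, and the exploration rate.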