Q-learning
is an off-policy reinforcement learning algorithm that learns which action is best to take in each state. It is considered off-policy because its value estimates are updated toward the best available next action, regardless of which action the agent actually takes next. The agent can therefore learn from actions outside the policy it is evaluating, such as random exploratory actions, and does not need to follow the policy it is learning.
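To make the off-policy idea concrete, here is a minimal tabular sketch. The environment is a hypothetical five-state chain invented for illustration (it comes from none of the links below), and the names `step`, `train`, `alpha`, `gamma`, and `epsilon` are assumptions of this example. The agent acts epsilon-greedily, so some actions are random, yet each update targets the best next action.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption of this sketch): returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behavior policy: epsilon-greedy, so some actions are purely random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Off-policy target: uses the best next action's value,
            # no matter which action the agent ends up taking next.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy read off the learned table for the non-terminal states.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

With these settings the greedy policy should come out as moving right in every state, even though roughly a third of the actions taken during training were random. Replacing `max(q[next_state])` with the value of the action actually taken next would turn this into an on-policy method (SARSA) instead.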
- https://www.youtube.com/watch?v=yMk_XtIEzH8
- https://towardsdatascience.com/simple-reinforcement-learning-q-learning-fcddc4b6fe56
- https://www.mygreatlearning.com/blog/simplified-reinforcement-learning-q-learning/
- https://en.wikipedia.org/wiki/Q-learning
- https://towardsdatascience.com/a-beginners-guide-to-q-learning-c3e2a30a653c (use incognito)