- Course Overview
- Main Textbook
- Slides and Papers
- Lecture 1: Introduction to Reinforcement Learning
- Lecture 2: Exploration and Exploitation
- Lecture 3: Finite Markov Decision Processes
- Lecture 4: Dynamic Programming
- Lecture 5: Monte Carlo Methods
- Lecture 6: Temporal-Difference Learning
- Lecture 7: n-step Bootstrapping
- Lecture 8: Planning and Learning with Tabular Methods
- Lecture 9: On-policy Prediction with Approximation
- Lecture 10: On-policy Control with Approximation
- Lecture 11: Off-policy Methods with Approximation
- Lecture 12: Eligibility Traces
- Lecture 13: Policy Gradient Methods
- Lecture 14: Deep Reinforcement Learning
- Lecture 15: Applications
- Lecture 16: Useful Toolkits and Libraries
- Additional Resources
- Class Time and Location
- Projects
- Grading
- Prerequisites
- Topics
- Account
- Academic Honor Code
- Questions
- Miscellaneous
Course Overview:
In this course, you will learn the foundations of Reinforcement Learning. To realize the dreams and impact of AI requires autonomous systems that learn to make good decisions. Reinforcement learning is one powerful paradigm for doing so, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling and healthcare.
Main Textbook:
- Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
Recommended Slides & Papers:
Lecture 1: Introduction to Reinforcement Learning
Required Reading:
- Slide: An Introduction to Reinforcement Learning by Hossein Hajiabolhassan
- Slide: Introduction by Hado van Hasselt
Suggested Reading:
- Blog: An Introduction to Reinforcement Learning by Thomas Simonini
- Blog: Reinforcement Learning Introduction: Foundations and Applications by Nikolay Manchev
Additional Resources:
- Blog: Reinforcement Learning Tutorial
- Blog: Reinforcement Learning: What is, Algorithms, Types & Examples by Daniel Johnson
- Blog: The Unsupervised Reinforcement Learning Benchmark by Misha Laskin and Denis Yarats
Lecture 2: Exploration and Exploitation
Required Reading:
- Slide: An Introduction to Reinforcement Learning by Hossein Hajiabolhassan
- Slide: Exploration and Exploitation by Hado van Hasselt
- Paper: A Tutorial on Thompson Sampling by Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen
- Lecture: Introduction to Thompson Sampling by Erik Waingarten (Instructor: Shipra Agrawal)
Suggested Reading:
- Blog: Bandit Algorithms by Tor Lattimore and Csaba Szepesvari
- Slide: Exploration and Exploitation by David Silver
- Lecture: Stochastic Multi-Armed Bandits, Regret Minimization by Walter Cai, Emisa Nategh, Jennifer Rogers (Lecturer: Kevin Jamieson)
- Blog: Beta Distribution — Intuition, Examples, and Derivation by Aerin Kim
- Blog: Visualizing Beta Distribution and Bayesian Updating by Shaw Lu
- Blog: Conjugate Prior Explained: With Examples & Proofs by Aerin Kim
Additional Resources:
- Tool: The Calculator for Beta Distribution by Dr. Bognar
- Tool: Probability Distribution Explorer: This is a tool for you to explore commonly used probability distributions, including information about the stories behind them (e.g., the outcome of a coin flip is Bernoulli distributed), their probability mass/probability density functions, their moments, etc.
- Blog: Learn Thompson Sampling by Building an Ad Auction! by Will Kurt
- Blog: Do You Know Credible Interval by Shaw Lu
- Toolkit: Multi-armed Bandit Demo by Mark Reid
- Code (Python): Reinforcement Learning: The K-armed bandit problem by Nikolay Manchev
- Code (Python): Multi-Armed Bandit Python Example using UCB by HackDeploy
- Code (Python): Multi-Armed Bandits: Epsilon-Greedy Algorithm with Python Code by Artemis Nika
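To complement the bandit readings and code links above, here is a minimal, self-contained sketch (not taken from any of the listed sources) of Thompson sampling on a Bernoulli bandit; the arm means and priors are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.7]           # hypothetical arm means, unknown to the agent
successes = np.ones(len(true_means))    # Beta(1, 1) uniform priors
failures = np.ones(len(true_means))

for t in range(1000):
    # Sample a mean estimate for each arm from its Beta posterior.
    samples = rng.beta(successes, failures)
    arm = int(np.argmax(samples))
    reward = rng.random() < true_means[arm]   # Bernoulli reward
    successes[arm] += reward
    failures[arm] += 1 - reward

print("Posterior means:", successes / (successes + failures))
```

Each arm keeps a Beta posterior over its success probability; sampling from the posteriors and playing the argmax naturally balances exploration and exploitation, as discussed in the Thompson sampling tutorial above.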
Lecture 3: Finite Markov Decision Processes
Required Reading:
- Slide: Dynamic Programming by Hossein Hajiabolhassan
- Slide: MDPs & Dynamic Programming by Diana Borsa
Suggested Reading:
- Blog: Understanding Markov Chains with the Black Friday Puzzle by Will Kurt
- Blog: The Intuition Behind Markov Chains by Kyle Chan
Additional Resources:
- Slide: An Introduction to Markov Decision Processes by Bob Givan and Ron Parr
Lecture 4: Dynamic Programming
Required Reading:
- Slide: Dynamic Programming by Hossein Hajiabolhassan
- Slide: MDPs & Dynamic Programming by Diana Borsa
- Blog: GridWorld: Dynamic Programming Demo by Andrej Karpathy
- Blog: Why Does the Optimal Policy Exist? by Alireza Modirshanechi
- Blog: Optimizing Jack's Car Rental by Alexander Kozlov
- Note: How to Gamble If You Must by Kyle Siegrist
- Blog: Hyperbolic Discounting — The Irrational Behavior That Might be Rational After All by Chris Said
Suggested Reading:
To get more familiar with dynamic programming, I recommend reading the following blogs:
- Blog: Overlapping Subproblems Property in Dynamic Programming
- Blog: Optimal Substructure Property in Dynamic Programming
- Blog: Longest Increasing Subsequence
- Blog: Longest Common Subsequence
Additional Resources:
- Algorithms: Visualizations of Graph Algorithms: Some important algorithms of this area are presented and explained, each with an interactive applet and pseudocode.
- Blog: Bellman–Ford Algorithm
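Since this lecture centers on dynamic programming for MDPs, a tiny value-iteration sketch may help make the Bellman optimality backup concrete. The two-state MDP below is entirely hypothetical:

```python
import numpy as np

# A toy 2-state, 2-action MDP (hypothetical numbers, for illustration only).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma, theta = 0.9, 1e-8
V = np.zeros(len(P))

# Value iteration: V(s) <- max_a sum_{s',r} p(s',r|s,a) [r + gamma * V(s')]
while True:
    delta = 0.0
    for s in P:
        q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s]]
        v_new = max(q)
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new
    if delta < theta:
        break

print("Optimal state values:", V)
```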
Lecture 5: Monte Carlo Methods
Required Reading:
- Slide: Model-Free Prediction by Hado van Hasselt
- Blog: Introduction to Monte Carlo Methods by Asael Alonzo Matamoros
- Blog: Introduction to Monte Carlo simulation by Kinder Chen
- Blog: Off Policy Monte Carlo Prediction with Importance sampling by Shangeth Rajaa
Suggested Reading:
- Paper: Monte Carlo Methods by Jonathan Pengelly
- Blog: What is Rejection Sampling? by Kapil Sachdeva
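As a companion to the Monte Carlo readings, here is a rough sketch of first-visit Monte Carlo prediction; the episode generator is a hypothetical stand-in for sampling trajectories under a fixed policy:

```python
import random
from collections import defaultdict

def run_episode():
    """Hypothetical stand-in for sampling one episode under a fixed policy;
    returns a list of (state, reward) pairs ending in a terminal state."""
    episode, state = [], 0
    while state < 3:
        episode.append((state, random.random()))
        state += random.choice([1, 2])
    return episode

gamma = 0.9
returns_sum, returns_count = defaultdict(float), defaultdict(int)
V = defaultdict(float)

for _ in range(5000):
    episode = run_episode()
    G = 0.0
    # Walk backwards through the episode, accumulating the return G.
    for t in reversed(range(len(episode))):
        state, reward = episode[t]
        G = gamma * G + reward
        # First-visit check: only record G at the earliest occurrence of the state.
        if all(s != state for s, _ in episode[:t]):
            returns_sum[state] += G
            returns_count[state] += 1
            V[state] = returns_sum[state] / returns_count[state]

print(dict(V))
```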
Lecture 6: Temporal-Difference Learning
Required Reading:
- Blog: Reinforcement Learning Tutorial Part 1: Q-Learning by Juha Kiili
Suggested Reading:
- Blog: Deep Double Q-Learning — Why You Should Use It by Ameet Deshpande
- Blog: 5 Steps to Master the Reinforcement Learning with a Q-Learning Python Example by Rune
- Blog: Reinforcement Learning — Generalisation of Continuing Tasks by Jeremy Zhang
Additional Resources:
- Blog: Dopamine and Temporal Difference Learning: A Fruitful Relationship Between Neuroscience and AI by Will Dabney and Zeb Kurth-Nelson
- Blog: Temporal-Difference (TD) Learning (Using Gym) by Christian Herta
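The Q-learning material above translates into very little code in the tabular case. Below is a sketch on a made-up five-state chain environment (all names and constants are illustrative):

```python
import random
from collections import defaultdict

# A hypothetical 5-state chain: move left/right, reward 1 at the right end.
def step(state, action):
    next_state = max(0, min(4, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward, next_state == 4

Q = defaultdict(float)          # Q[(state, action)]
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for _ in range(2000):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            action = random.choice([0, 1])
        else:
            action = max((0, 1), key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstrap off the greedy action in the next state.
        target = reward + (0.0 if done else gamma * max(Q[(next_state, a)] for a in (0, 1)))
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = next_state

print([round(max(Q[(s, a)] for a in (0, 1)), 2) for s in range(5)])
```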
Lecture 7: n-step Bootstrapping
Required Reading:
- Slide: Multi-step Bootstrapping by Doina Precup
Lecture 8: Planning and Learning with Tabular Methods
Required Reading:
- Blog: Integrating Real and Simulated Data in Dyna-Q Algorithm by Ranko Mosic
Suggested Reading:
- Blog: Monte Carlo Tree Search – Beginners Guide by Kamil Czarnogórski
- Blog: Monte Carlo Tree Search: An Introduction by Benjamin Wang
- Blog: Introduction to Monte Carlo Tree Search: The Game-Changing Algorithm behind DeepMind's AlphaGo by Ankit Choudhary
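Dyna-Q, the subject of the first reading above, interleaves learning from real transitions with planning updates replayed from a learned model. Here is a minimal sketch, reusing the hypothetical chain environment from the earlier Q-learning example:

```python
import random
from collections import defaultdict

def step(state, action):
    # Same hypothetical 5-state chain as in the Q-learning sketch above.
    next_state = max(0, min(4, state + (1 if action == 1 else -1)))
    return next_state, 1.0 if next_state == 4 else 0.0, next_state == 4

Q = defaultdict(float)
model = {}                       # (s, a) -> (r, s', done), learned from experience
alpha, gamma, epsilon, n_planning = 0.1, 0.95, 0.1, 10

def td_update(s, a, r, s2, done):
    target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in (0, 1)))
    Q[(s, a)] += alpha * (target - Q[(s, a)])

for _ in range(500):
    state, done = 0, False
    while not done:
        action = random.choice([0, 1]) if random.random() < epsilon else \
                 max((0, 1), key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        td_update(state, action, reward, next_state, done)   # learn from real data
        model[(state, action)] = (reward, next_state, done)  # update the model
        for _ in range(n_planning):                          # planning: replay the model
            (s, a), (r, s2, d) = random.choice(list(model.items()))
            td_update(s, a, r, s2, d)
        state = next_state

print([round(max(Q[(s, a)] for a in (0, 1)), 2) for s in range(5)])
```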
Lecture 9: On-policy Prediction with Approximation
Required Reading:
- Slide: Function Approximation in Reinforcement Learning by Hado van Hasselt
- Blog: Tile-Coding: An Efficient Sparse-Coding Method for Real-Valued Data by Hamid Maei
- Blog: State Aggregation with Monte Carlo
Suggested Reading:
- Blog: Radial Basis Function Neural Network Simplified by Luthfi Ramadhan
- Blog: RBF Neural Networks
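State aggregation, covered in the readings above, is the simplest form of function approximation: groups of states share one weight each, updated by semi-gradient TD(0). A sketch on a made-up 100-state random walk (all constants illustrative):

```python
import random
import numpy as np

n_states, n_groups = 100, 10          # 100-state random walk, 10 aggregated groups
w = np.zeros(n_groups)                # one weight per group of 10 states
alpha, gamma = 0.05, 1.0

group = lambda s: s // (n_states // n_groups)

for _ in range(5000):
    state = n_states // 2
    while True:
        next_state = state + random.choice([-1, 1])
        if next_state < 0 or next_state >= n_states:
            # Terminal transition: reward -1 on the left, +1 on the right.
            reward = -1.0 if next_state < 0 else 1.0
            w[group(state)] += alpha * (reward - w[group(state)])
            break
        # v_hat(s) = w[group(s)]; the gradient is an indicator on the group,
        # so the semi-gradient TD(0) update touches only one weight.
        td_error = 0.0 + gamma * w[group(next_state)] - w[group(state)]
        w[group(state)] += alpha * td_error
        state = next_state

print(np.round(w, 2))
```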
Lecture 10: On-policy Control with Approximation
Required Reading:
- Slide: Function Approximation in Reinforcement Learning by Hado van Hasselt
Suggested Reading:
- Blog: Tile Coding Software by Richard S. Sutton
Lecture 11: Off-policy Methods with Approximation
Required Reading:
- Slide: Multi-step & Off Policy by Hado van Hasselt
Lecture 12: Eligibility Traces
Required Reading:
- Slide: Eligibility Traces by Doina Precup
Lecture 13: Policy Gradient Methods
Required Reading:
- Slide: Policy Gradient by David Silver
- Slide: Policy-Gradient & Actor-Critic methods by Hado van Hasselt
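To make the policy-gradient slides concrete, here is a minimal REINFORCE sketch in a single-step (bandit) setting with a softmax policy and a running-average baseline; the reward means are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                       # preferences over two actions
true_means, alpha = np.array([0.2, 0.8]), 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

baseline = 0.0
for t in range(2000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    reward = rng.normal(true_means[a], 0.1)
    # REINFORCE update with a baseline:
    # grad log pi(a) = one_hot(a) - pi for a softmax policy.
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta += alpha * (reward - baseline) * grad_log_pi
    baseline += 0.01 * (reward - baseline)  # running-average baseline

print("pi =", np.round(softmax(theta), 3))
```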
Lecture 14: Deep Reinforcement Learning
Required Reading:
- Slide: Deep Reinforcement Learning 1 by Matteo Hessel
- Slide: Deep Reinforcement Learning 2 by Matteo Hessel
Suggested Reading:
- Blog: Deep Reinforcement Learning: Pong from Pixels by Andrej Karpathy
- Blog: Reinforcement Learning with Neural Network by Kumar Chandrakant
Additional Resources:
- Toolkit: Welcome to Spinning Up in Deep RL! This is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning (deep RL).
- Blog: A Free course in Deep Reinforcement Learning from Beginner to Expert by Thomas Simonini
Lecture 15: Applications
Required Reading:
- Slide: Classic Games by David Silver
Additional Resources:
- Blog: Applications by David Silver
- Blog: Emergent Tool Use from Multi-Agent Interaction by OpenAI
- Blog: Solving Rubik’s Cube with a Robot Hand by OpenAI
Lecture 16: Useful Toolkits and Libraries
Required Reading:
- Toolkit: Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball.
- Blog: Algorithms
- Blog: Classic Control
- Blog: Robotics
- Blog: MuJoCo
- Blog: Atari
- Blog: Wrappers
Suggested Reading:
- Blog: Tutorial: writing a custom OpenAI Gym environment by Vadim Liventsev
- Python Module: Deque in Python
Additional Resources:
- Package: Highway-env’s Documentation provides a collection of environments for decision-making in autonomous driving.
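A typical interaction loop with Gym looks like the sketch below. Note that it follows the classic Gym API, where `reset()` returns an observation and `step()` returns four values; newer Gymnasium releases changed both signatures, as flagged in the comments:

```python
import gym  # classic OpenAI Gym API; newer Gymnasium versions differ slightly

env = gym.make("CartPole-v1")
for episode in range(3):
    obs = env.reset()                     # in Gymnasium: obs, info = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()           # random policy, for illustration
        obs, reward, done, info = env.step(action)   # Gymnasium returns 5 values
        total_reward += reward
    print(f"episode {episode}: return = {total_reward}")
env.close()
```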
Additional Resources:
Papers:
- Slide: Distributed RL by Richard Liaw
- PDF: Acme: A Research Framework for Distributed Reinforcement Learning
- Blog: Acme: A New Framework for Distributed Reinforcement Learning
- GitHub: Must-read Papers on GNN by Natural Language Processing Lab at Tsinghua University
Online Demos:
- Blog: ConvNetJS Deep Q Learning Demo by Andrej Karpathy
Codes:
- Code: Reinforcement Learning: An Introduction by Shangtong Zhang
Courses:
- Blog: Reinforcement Learning Lecture Series 2021 (DeepMind) by Hado van Hasselt, Diana Borsa & Matteo Hessel
- Blog: A course taught by David Silver:
  - Introduction to Reinforcement Learning
  - Reinforcement Learning
Class Time and Location:
- Saturday and Monday
- Tuesday
Projects:
Projects are programming assignments that cover the topics of this course. Each project is written in a Jupyter Notebook. Projects will require the use of Python 3.7, as well as additional Python libraries.
Google Colab is a free cloud service, and it supports free GPU access!
- How to Use Google Colab by Souvik Mandal
- Primer for Learning Google Colab
- Deep Learning Development with Google Colab, TensorFlow, Keras & PyTorch
- Technical Notes On Using Data Science & Artificial Intelligence: To Fight For Something That Matters by Chris Albon
Students can include mathematical notation in the markdown cells of their Jupyter Notebooks using LaTeX; see the sample cell after the resources below.
- A Brief Introduction to LaTeX PDF
- Math in LaTeX PDF
- Sample Document PDF
- TikZ: A collection of LaTeX files of PGF/TikZ figures (including various neural networks) by Petar Veličković.
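For example, a markdown cell like the following renders the Bellman optimality equation:

```latex
$$
v_*(s) = \max_{a} \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_*(s')\bigr]
$$
```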
Grading:
- Projects and Midterm – 50%
- Endterm – 50%
- First Midterm Examination:
- Second Midterm Examination:
- Final Examination:
Prerequisites:
General mathematical sophistication and a solid understanding of Algorithms, Linear Algebra, and Probability Theory, at the advanced undergraduate or beginning graduate level, or equivalent.
- Video: Professor Gilbert Strang's Video Lectures on linear algebra.
- Learn Probability and Statistics Through Interactive Visualizations: Seeing Theory was created by Daniel Kunin while an undergraduate at Brown University. The goal of this website is to make statistics more accessible through interactive visualizations (designed using Mike Bostock’s JavaScript library D3.js).
- Statistics and Probability: This website provides training and tools to help you solve statistics problems quickly, easily, and accurately - without having to ask anyone for help.
- Jupyter NoteBooks: Introduction to Statistics by Bargava
- Video: Professor John Tsitsiklis's Video Lectures on Applied Probability.
- Video: Professor Krishna Jagannathan's Video Lectures on Probability Theory.
Topics:
Have a look at some assignments of Stanford students (Reinforcement Learning) to get some general inspiration.
Account:
You need a GitHub account to share your projects. GitHub offers plans for both private repositories and free accounts. GitHub is like the hammer in your toolbox; therefore, you need to have it!
Academic Honor Code:
Honesty and integrity are vital elements of academic work. All of your submitted assignments must be entirely your own (or your own group's).
We will follow the standard approach of the Department of Mathematical Sciences:
- You can get help, but you MUST acknowledge the help on the work you hand in.
- Failure to acknowledge your sources is a violation of the Honor Code.
- You can talk to others about the algorithm(s) to be used to solve a homework problem, as long as you then mention their name(s) on the work you submit.
- You should not use or look at others' code when writing your own: you can talk to people, but you have to write your own solution/code.
Questions:
I will hold office hours for this course on Saturday (09:00–10:00 AM). If this is not convenient, email me at hhaji@sbu.ac.ir or talk to me after class.