Concepts
- Offline Reinforcement Learning
- Markov Decision Process
RL Tutorials
- Introduction to the Gym Toolkit
- Code-Driven Introduction to Reinforcement Learning
- CartPole using Cross-Entropy
- FrozenLake using Cross-Entropy
- FrozenLake using Value Iteration
- FrozenLake using Q-Learning
- CartPole using REINFORCE in PyTorch
- CartPole in PyTorch
- Q-Learning on Lunar Lander and Frozen Lake
- REINFORCE
- Importance Sampling
- Kullback-Leibler Divergence
- MDP with Dynamic Programming in PyTorch
- REINFORCE in PyTorch
- MDP Basics with Inventory Control
- n-step algorithms and eligibility traces
- Q-Learning vs SARSA and Q-Learning extensions
RecSys Tutorials
- Multi-armed Bandit for Banner Ad
- Contextual Recommender with Vowpal Wabbit
- Top-K Off-Policy Correction for a REINFORCE Recommender System
- Neural Interactive Collaborative Filtering
- Batch-Constrained Deep Q-Learning
- Pydeep Recsys
- RecSim Catalyst
- Solving Multi-armed Bandit Problems
- Deep Reinforcement Learning in Large Discrete Action Spaces
- Off-Policy Learning in Two-stage Recommender Systems
- Comparing Simple Exploration Techniques: ε-Greedy, Annealing, and UCB
- Predicting rewards with the state-value and action-value functions
- Real-Time Bidding in Advertising
- GAN User Model for RL-based Recommendation System