- Introduction: learning from experience
- Understanding reinforcement learning
- Defining the agent-environment interface of a reinforcement learning system
- The theoretical foundations of RL
- Markov decision processes
- The mathematical formulation of Markov decision processes
- Visualization of a Markov process
- Episodic versus continuing tasks
- RL terminology: return, policy, and value function
- The return
- Policy
- Value function
- Dynamic programming using the Bellman equation
- Reinforcement learning algorithms
- Dynamic programming
- Policy evaluation – predicting the value function with dynamic programming
- Improving the policy using the estimated value function
- Policy iteration
- Value iteration
- Reinforcement learning with Monte Carlo
- State-value function estimation using MC
- Action-value function estimation using MC
- Finding an optimal policy using MC control
- Policy improvement – computing the greedy policy from the action-value function
- Temporal difference learning
- TD prediction
- On-policy TD control (SARSA)
- Off-policy TD control (Q-learning)
- Implementing our first RL algorithm
- Introducing the OpenAI Gym toolkit
- Working with the existing environments in OpenAI Gym
- A grid world example
- Implementing the grid world environment in OpenAI Gym
- Solving the grid world problem with Q-learning
- Implementing the Q-learning algorithm
- A glance at deep Q-learning
- Training a DQN model according to the Q-learning algorithm
- Replay memory
- Determining the target values for computing the loss
- Implementing a deep Q-learning algorithm
- Chapter and book summary
Please refer to the README.md file in ../ch01
for more information about running the code examples.
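Below is a minimal, hedged sketch of tabular Q-learning on a grid-world-style task, illustrating the kind of algorithm the outline above covers. It is not the chapter's implementation: it uses `FrozenLake-v1` as a stand-in for the custom grid world environment and assumes the classic OpenAI Gym API in which `env.reset()` returns an observation and `env.step()` returns a 4-tuple (newer gym/gymnasium releases return `(obs, info)` and a 5-tuple instead).

```python
# Minimal tabular Q-learning sketch (assumption: classic OpenAI Gym API).
from collections import defaultdict
import numpy as np
import gym

env = gym.make('FrozenLake-v1')        # stand-in for the chapter's grid world
n_actions = env.action_space.n

q_table = defaultdict(lambda: np.zeros(n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration

def choose_action(state):
    """Epsilon-greedy action selection from the current Q-table."""
    if np.random.rand() < epsilon:
        return env.action_space.sample()
    return int(np.argmax(q_table[state]))

for episode in range(1000):
    state = env.reset()
    done = False
    while not done:
        action = choose_action(state)
        next_state, reward, done, _ = env.step(action)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        best_next = np.max(q_table[next_state])
        td_target = reward + gamma * best_next * (not done)
        q_table[state][action] += alpha * (td_target - q_table[state][action])
        state = next_state
```

In the same spirit, the target values for the deep Q-learning loss can be sketched as below. The names (`q_network`, the tensor arguments) are hypothetical and not taken from the book's code; the only assumption is a PyTorch module that maps a batch of states to per-action Q-values.

```python
# Hedged sketch of the TD target used in the DQN loss (hypothetical names).
import torch

def compute_td_targets(rewards, next_states, dones, q_network, gamma=0.99):
    """Return r + gamma * max_a' Q(s', a'), masking the bootstrap term
    for terminal transitions."""
    with torch.no_grad():
        next_q = q_network(next_states).max(dim=1).values
    return rewards + gamma * next_q * (1.0 - dones.float())
```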