---
layout: post
mathjax: true
title: Notations
---
This is a list of notations and definitions used throughout the series.
| Symbol | Meaning |
|---|---|
| $$\mathcal{S}$$ | States |
| $$\mathcal{A}$$ | Actions |
| $$\mu(s)$$ | Initial distribution of states |
| $$p(s', r \mid s, a)$$ | State-reward transition probability of getting the next state $$s'$$ and reward $$r$$ |
| $$p(s' \mid s, a)$$ | State transition probability |
| $$r(s, a, s')$$ | State-action-state reward |
| $$r(s, a)$$ | State-action reward |
| $$\pi(a \mid s)$$ | Policy |
| $$\tau = (s_0, a_0, s_1, a_1, \ldots)$$ | State-action trajectory |
| $$\tau = (s_0, a_0, r_1, s_1, a_1, r_2, \ldots)$$ | State-action-reward trajectory |
| $$\gamma$$ | Discount factor |
| $$G(\tau)$$ | Return of the state-action-reward trajectory $$\tau$$ |
| $$J(\pi)$$ | Agent objective |
| $$V^{\pi}(s)$$ | State value function |
| $$Q^{\pi}(s, a)$$ | Action value function |
| $$A^{\pi}(s, a)$$ | Advantage function |
| $$V_*(s), Q_*(s, a)$$ | Optimal state and action value functions |
| $$\mathbb{N}, \mathbb{Z}, \mathbb{R}$$ | The sets of nonnegative integers, integers, and real numbers |
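For reference, these symbols combine as follows in the usual infinite-horizon discounted setting (a standard formulation, written out here for convenience rather than quoted from the series):

$$G(\tau) = \sum_{t \ge 0} \gamma^t r_{t+1}, \qquad J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[G(\tau)\right]$$

$$V^{\pi}(s) = \mathbb{E}_{\tau \sim \pi}\left[G(\tau) \mid s_0 = s\right], \qquad Q^{\pi}(s, a) = \mathbb{E}_{\tau \sim \pi}\left[G(\tau) \mid s_0 = s, a_0 = a\right]$$

$$A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s)$$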
| Term | Meaning |
|---|---|
| $$p$$ | the model |
| $$\pi$$ | the policy |
| bootstrapping | an algorithm is bootstrapping if it uses predicted outputs as targets |
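A minimal sketch of bootstrapping, using a TD(0)-style value update as the example (the function and toy states here are illustrative, not taken from the series): the target $$r + \gamma V(s')$$ contains the algorithm's own prediction $$V(s')$$ rather than an observed return.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) update of the state-value table V.

    The target bootstraps: it uses the current *estimate* V[s_next]
    of the next state's value, not a complete observed return.
    """
    target = r + gamma * V[s_next]       # bootstrapped target
    V[s] += alpha * (target - V[s])      # move estimate toward target
    return V

# Toy two-state example: after observing (s=0, r=1.0, s'=1),
# V[0] moves from 0.0 toward the target 1.0 + 0.9 * V[1] = 1.0.
V = {0: 0.0, 1: 0.0}
V = td0_update(V, s=0, r=1.0, s_next=1)  # V[0] becomes 0.1
```

By contrast, a Monte Carlo update would wait for the full trajectory and use the observed return $$G(\tau)$$ as the target, so it does not bootstrap.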