👍 There is no supervisor, only a reward signal
👍 Feedback is delayed, not instantaneous
👍 Time really matters (sequential)
👍 Agent's actions affect the subsequent data it receives
-
A reward is a scalar feedback signal; it indicates how well the agent is doing at step t. The agent's job is to maximise cumulative reward. RL is based on the reward hypothesis.
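A minimal numerical sketch (my own illustration, not from the notes) of what "maximise cumulative reward" means: the cumulative reward is just the sum of the scalar rewards that follow a step. The `gamma` discount factor is an assumption added for generality:

```python
# Sketch: cumulative (optionally discounted) reward from a sequence of scalar rewards.
def cumulative_reward(rewards, gamma=1.0):
    """G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ..."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

print(cumulative_reward([0.0, 0.0, 1.0]))        # 1.0: a purely delayed reward still counts
print(cumulative_reward([0.0, 0.0, 1.0], 0.9))   # 0.81: discounting shrinks delayed reward
```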
-
Sequential Decision Making:
- Goal: select actions to maximise total future reward
- Actions may have long-term consequences.
- Reward may be delayed
- It may be better to sacrifice immediate reward to gain more long-term reward (see the sketch after this list).
- Examples:
  - A financial investment (may take months to mature)
  - Refuelling a helicopter
  - Blocking opponent moves
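A toy illustration (my own, assuming the two reward sequences below) of why sacrificing immediate reward can pay off:

```python
# Two reward sequences over the same number of steps: the "patient" one gives up
# the small immediate reward and wins on total reward.
greedy_rewards  = [1, 0, 0, 0]   # grab the small immediate reward
patient_rewards = [0, 0, 0, 5]   # accept nothing now for a larger delayed payoff

print(sum(greedy_rewards), sum(patient_rewards))  # 1 vs 5
```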
-
Agent and Environment:
- At each step t the agent:
  - Executes action A_t
  - Receives observation O_t
  - Receives scalar reward R_t
- The environment:
  - Receives action A_t
  - Emits observation O_{t+1}
  - Emits scalar reward R_{t+1}
- t increments at the environment step (see the loop sketch below)
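A minimal sketch of this interaction loop. `Environment` and `Agent` are placeholder classes made up for illustration, not part of any particular library:

```python
import random

class Environment:
    def step(self, action):
        observation = random.random()            # emits O_{t+1}
        reward = 1.0 if action == 1 else 0.0     # emits R_{t+1}
        return observation, reward

class Agent:
    def act(self, observation):
        return random.choice([0, 1])             # chooses A_t given the observation

env, agent = Environment(), Agent()
observation = 0.0
for t in range(5):                               # t increments at each environment step
    action = agent.act(observation)              # agent executes A_t
    observation, reward = env.step(action)       # env receives A_t, emits O_{t+1}, R_{t+1}
    print(t, action, round(observation, 3), reward)
```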
-
History and State:
- The history is the sequence of {observations, actions, rewards}: H_t = O_1, R_1, A_1, ..., A_{t-1}, O_t, R_t
- All observable variables up to time t
- The sensorimotor stream of a robot or embodied agent.
- What happens next depends on the history
- State is the information used to determine what happens next
- Formally, state is a function of the history:
S_t = f(H_t)
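A rough sketch (my own example) of the history as a stream of (observation, reward, action) tuples, with state defined as some function f of that history:

```python
# H_t as a list of (observation, reward, action) tuples; the last action is None
# because the history ends at O_t, R_t before A_t is chosen.
history = [("O1", 0.0, "A1"), ("O2", 1.0, "A2"), ("O3", 0.5, None)]

def f(history):
    """One possible choice of state: keep only the most recent observation."""
    latest_observation, _, _ = history[-1]
    return latest_observation

S_t = f(history)   # S_t = f(H_t)
print(S_t)         # "O3"
```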
-
Environment State:
- The environment state S_t^e is the environment's private representation
- The environment state is not usually visible to the agent
-
Agent State:
- The agent state S_t^a is the information used by reinforcement learning algorithms
- It can be any function of the history: S_t^a = f(H_t)
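A hedged sketch (my own example, reusing the history format above) of one possible choice of agent state function:

```python
# S_t^a = f(H_t): here f keeps the k most recent observations rather than just the latest.
def agent_state(history, k=2):
    return tuple(obs for (obs, _, _) in history[-k:])

example_history = [("O1", 0.0, "A1"), ("O2", 1.0, "A2"), ("O3", 0.5, None)]
print(agent_state(example_history))  # ('O2', 'O3')
```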