Policy Gradient for Gym
Praveen Batra edited this page Mar 10, 2019 · 2 revisions
[Deprecated: we are moving to an object-oriented design] Proposed architecture.
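# Create the environment, agent, and sampler once so the policy persists across rollouts.
env = ...  # Gym environment
agent = Agent(...)
sampler = Sampler(...)

trajectories = []
for i in range(num_rollouts):
    # Get the initial state from the env, or initialize appropriately (classic Gym API shown).
    state_obs = env.reset()
    done = False
    trajectory = []
    while not done:
        old_state_obs = state_obs
        # Sample an action from the distribution the agent predicts for the current state.
        action = sampler.sample_distribution(agent.eval(state_obs))
        state_obs, reward, done, info = env.step(action)
        # Record the state the action was taken in, not the state it led to.
        trajectory.append((old_state_obs, action, reward))
    trajectories.append(trajectory)

# Update the policy on a random sample of the collected trajectories.
for trajectory in random_sample(trajectories):
    agent.update_policy(trajectory)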
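For readers who want to run the loop above end to end, here is a minimal, standard-library-only sketch. Everything beyond the names on this page is an assumption made for illustration: `CorridorEnv` is a hypothetical stand-in for a Gym environment (using the classic `reset()` / 4-tuple `step()` interface), `Agent` is a tabular softmax policy, `update_policy` is implemented as a plain REINFORCE step (this page does not say which update rule is intended), and `random.Random.sample` plays the role of `random_sample`.

```python
import math
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right from position 0 to `length`."""

    def __init__(self, length=4, max_steps=20):
        self.length = length
        self.max_steps = max_steps

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right; action 0 moves left (clamped at position 0).
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        self.steps += 1
        reached_goal = self.pos == self.length
        done = reached_goal or self.steps >= self.max_steps
        reward = 1.0 if reached_goal else -0.1
        return self.pos, reward, done, {}


class Agent:
    """Tabular softmax policy: one logit per (state, action) pair."""

    def __init__(self, num_states, num_actions, learning_rate=0.1, gamma=0.99):
        self.logits = [[0.0] * num_actions for _ in range(num_states)]
        self.learning_rate = learning_rate
        self.gamma = gamma

    def eval(self, state_obs):
        """Return the action probabilities for the observed state."""
        row = self.logits[state_obs]
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        return [e / total for e in exps]

    def update_policy(self, trajectory):
        """REINFORCE step: logits += lr * G_t * grad log pi(a_t | s_t)."""
        # Discounted return G_t for every step, computed back to front.
        ret, returns = 0.0, []
        for _, _, reward in reversed(trajectory):
            ret = reward + self.gamma * ret
            returns.append(ret)
        returns.reverse()
        # Evaluate all probabilities before changing any logit.
        steps = [(s, a, g, self.eval(s)) for (s, a, _), g in zip(trajectory, returns)]
        for state_obs, action, g, probs in steps:
            for a, p in enumerate(probs):
                grad = (1.0 if a == action else 0.0) - p  # d log softmax / d logit
                self.logits[state_obs][a] += self.learning_rate * g * grad


class Sampler:
    """Draws an action index from a list of probabilities."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def sample_distribution(self, probs):
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]


def collect_trajectory(env, agent, sampler):
    """One rollout, recording (state, action, reward) for each step."""
    state_obs = env.reset()
    done = False
    trajectory = []
    while not done:
        old_state_obs = state_obs
        action = sampler.sample_distribution(agent.eval(state_obs))
        state_obs, reward, done, info = env.step(action)
        trajectory.append((old_state_obs, action, reward))
    return trajectory


def train(num_iterations=200, num_rollouts=10, batch_size=5, seed=0):
    rng = random.Random(seed)
    env = CorridorEnv()
    agent = Agent(num_states=env.length + 1, num_actions=2)
    sampler = Sampler(seed)
    for _ in range(num_iterations):
        trajectories = [collect_trajectory(env, agent, sampler) for _ in range(num_rollouts)]
        for trajectory in rng.sample(trajectories, batch_size):
            agent.update_policy(trajectory)
    return agent


if __name__ == "__main__":
    trained = train()
    print("P(left), P(right) in the start state:", trained.eval(0))
```

The sketch repeats the collect-then-update cycle for several iterations so that later rollouts use the updated policy. It uses no baseline or return normalization, so the updates can be noisy; a real Gym setup would likely replace the table with a function approximator.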