# Quadrotor Juggling

Keeping a ball in the air by bouncing it off a quadcopter as many times as possible. This project was an exploration of reinforcement learning algorithms.

## Algorithms in action

SARSA · VPG · PPO

## Team Members

## Dependencies

We recommend running the code on Ubuntu 16.

## Running the quadcopter environment on the V-REP simulator

Navigate to the directory where the simulator is installed and launch it with the path to the provided environment file:

```sh
./vrep.sh quad_env.ttt
```

To run in headless mode:

```sh
./vrep.sh -h quad_env.ttt
```

## Running the code

Download and unzip the code, navigate to the unzipped folder, and run:

```sh
$ python main.py [algorithm] [action] [number of episodes] [steps per episode]
```

| Option             | Values                  |
| ------------------ | ----------------------- |
| algorithm          | `pg`, `vpg`, or `ppo`   |
| action             | `eval` or `train`       |
| number of episodes | default = 200           |
| steps per episode  | default = 50            |
- `zz_GraveYard.zip` contains code we worked on initially and later abandoned because we could not resolve its issues (it uses ROS, Gazebo, and Sphinx).
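The command-line interface described above could be parsed with `argparse` roughly as follows. This is a hypothetical sketch of how `main.py` might handle its arguments, not the repository's actual code:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical argument parser mirroring the documented CLI;
    # the real main.py may differ.
    parser = argparse.ArgumentParser(description="Quadrotor juggling RL runner")
    parser.add_argument("algorithm", choices=["pg", "vpg", "ppo"],
                        help="which policy gradient variant to use")
    parser.add_argument("action", choices=["eval", "train"])
    parser.add_argument("episodes", nargs="?", type=int, default=200,
                        help="number of episodes (default 200)")
    parser.add_argument("steps", nargs="?", type=int, default=50,
                        help="steps per episode (default 50)")
    return parser.parse_args(argv)

args = parse_args(["ppo", "train", "300", "50"])
print(args.algorithm, args.action, args.episodes, args.steps)
```

Omitted trailing arguments fall back to the documented defaults, so `python main.py ppo train` would train PPO for 200 episodes of 50 steps each.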

## Policy Gradient Methods

Policy gradient methods are a class of reinforcement learning techniques that optimize parametrized policies with respect to the expected return (long-term cumulative reward) by gradient ascent. The actor directly learns a policy function that maps states to actions.

- Simple Policy Gradient (SARSA)
- Vanilla Policy Gradient (VPG)
- Proximal Policy Optimization (PPO)
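As a concrete illustration of the underlying idea, here is a minimal REINFORCE-style sketch (not the project's implementation): a softmax policy over three discrete actions is improved by gradient ascent on a Monte Carlo return estimate. The toy reward function and all names are invented for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

n_actions = 3
theta = np.zeros(n_actions)  # policy parameters (action logits)
alpha = 0.1                  # learning rate

for episode in range(200):
    probs = softmax(theta)
    a = rng.choice(n_actions, p=probs)
    reward = 1.0 if a == 2 else 0.0   # toy task: action 2 is best
    # gradient of log pi(a) w.r.t. the logits of a softmax policy
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # ascend the estimated expected return
    theta += alpha * reward * grad_log_pi

print(softmax(theta))
```

Over the episodes, probability mass should concentrate on the rewarded action. VPG and PPO refine this basic estimator with baselines, advantage estimation, and (for PPO) a clipped surrogate objective.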