Quadrotor Juggling

The goal is to keep a ball in the air by bouncing it off a quadcopter as many times as possible. We used this task to explore reinforcement learning algorithms.

Algorithms in action

SARSA	VPG	PPO

Team Members for this project

Tanishka Singh (tsingh22@asu.edu)
Deepak Kala Vasudevan (dkalavas@asu.edu)
Nikhil Agarwal (nagarw22@asu.edu)

Dependencies

We recommend running the code on Ubuntu 16.

Install the latest version of V-REP Pro Edu
Python 2.7 is required
Install the latest version of TensorFlow using pip install tensorflow

Running the Quadcopter environment on VREP Simulator

Navigate to the directory where the simulator was downloaded and run the following, passing the path to the provided environment file:

./vrep.sh quad_env.ttt

To run in headless mode:

./vrep.sh -h quad_env.ttt

To run the code

Download and unzip the code, navigate to the unzipped folder and run:

$ python main.py [algorithm] [action] [number of episodes] [steps per episode]

options	values
algorithm	pg or vpg or ppo
action	eval or train
number of episodes	default = 200
steps per episode	default = 50
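
The positional arguments and defaults in the table above can be pictured with a small argparse sketch. This is not the project's main.py: the function name parse_args and the attribute names episodes and steps are hypothetical, and only the option values and defaults are taken from the table.

```python
import argparse


def parse_args(argv):
    """Hypothetical parser mirroring the documented command line."""
    parser = argparse.ArgumentParser(
        description="Illustrative mirror of the documented main.py arguments")
    # Required positionals, restricted to the values listed in the table.
    parser.add_argument("algorithm", choices=["pg", "vpg", "ppo"])
    parser.add_argument("action", choices=["eval", "train"])
    # Optional positionals fall back to the documented defaults.
    parser.add_argument("episodes", nargs="?", type=int, default=200)
    parser.add_argument("steps", nargs="?", type=int, default=50)
    return parser.parse_args(argv)


args = parse_args(["ppo", "train"])
print(args.algorithm, args.action, args.episodes, args.steps)
```

Under these assumptions, omitting the last two arguments is equivalent to passing 200 and 50 explicitly.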

zz_GraveYard.zip contains code from our initial approach, which we later abandoned because we could not resolve its issues. It uses ROS, Gazebo, and Sphinx.

Policy Gradient Methods

Policy gradient methods are a class of reinforcement learning techniques that optimize parametrized policies with respect to the expected return (long-term cumulative reward) by gradient ascent. The actor directly learns the policy function that maps states to actions.

Simple Policy Gradient (SARSA)
Vanilla Policy Gradient (VPG)
Proximal Policy Optimization (PPO)
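
As a rough illustration of the idea shared by these methods, the sketch below applies a basic score-function (REINFORCE-style) update to a toy two-armed bandit. It is not taken from this repository and does not use TensorFlow or V-REP. The bandit, the reward values, the learning rate, and the names softmax and train_bandit are all assumptions made for the example.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(rewards, steps=500, lr=0.1, seed=0):
    """Toy policy gradient on a bandit with fixed per-arm rewards."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # policy parameters, one per action
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy.
        u, cum, action = rng.random(), 0.0, len(probs) - 1
        for i, p in enumerate(probs):
            cum += p
            if u < cum:
                action = i
                break
        ret = rewards[action]  # one-step episode: return equals reward
        # For a softmax policy, d log pi(action) / d theta_k
        # equals 1[k == action] - pi_k.
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * ret * grad_log  # step up the return gradient
    return softmax(theta)


probs = train_bandit([0.0, 1.0])
print(probs)
```

The policy shifts probability toward the arm with the higher reward. Practical variants typically add a baseline or a clipped objective to reduce variance and limit the size of each update; neither is shown here.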
