Unity Tennis environment

Introduction

This repository contains an implementation of PPO that solves the Unity Tennis environment. The implementation is based on existing implementations by Shangtong Zhang and Herimiaina Andria-Ntoanina. Mr. Zhang in particular maintains a DRL repository with modular implementations of several algorithms; be sure to check it out.

[GIF: Tennis solved]

The set-up is as follows: two tennis paddles are trained to bounce the ball back and forth over the net. A paddle receives a reward of +0.1 for hitting the ball over the net, and a reward of -0.01 for letting it fall. At each timestep, each paddle receives its own 8-dimensional observation vector, encoding the position and velocity of the ball and paddle. The action space consists of two continuous variables: moving towards or away from the net, and jumping.
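
For reference, interacting with the environment through the unityagents wrapper looks roughly like the sketch below (a sketch, not the notebook's exact code; Tennis.app is the macOS build name, so adjust the file name for your OS):

    from unityagents import UnityEnvironment
    import numpy as np

    env = UnityEnvironment(file_name="Tennis.app")   # adjust per OS
    brain_name = env.brain_names[0]

    env_info = env.reset(train_mode=True)[brain_name]
    states = env_info.vector_observations            # one observation row per paddle
    num_agents = states.shape[0]                     # 2 paddles

    actions = np.clip(np.random.randn(num_agents, 2), -1, 1)  # 2 continuous actions each
    env_info = env.step(actions)[brain_name]
    rewards = env_info.rewards                       # one reward per paddle
    dones = env_info.local_done                      # episode-end flags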

Solving the Environment

Each paddle maximises its reward by cooperating with its opponent, so the optimal policy is identical for both paddles. Due to the symmetry of their observation vectors, the experiences of both paddles may therefore be combined to train a single PPO agent that dictates the policy for both.
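
Concretely, this means each paddle's transitions can be treated as independent samples for one shared policy. A minimal sketch of merging the two paddles' rollouts (names here are illustrative, not the notebook's):

    import numpy as np

    def merge_paddle_rollouts(states, actions, rewards):
        """Flatten the paddle axis so one PPO agent trains on both paddles' data.

        states: (T, 2, obs_dim), actions: (T, 2, act_dim), rewards: (T, 2)
        """
        T, n_agents = rewards.shape
        return (states.reshape(T * n_agents, -1),
                actions.reshape(T * n_agents, -1),
                rewards.reshape(T * n_agents))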

To solve the environment, the paddles must achieve an average score of +0.5 over 100 consecutive episodes, where the score for each episode is the maximum score obtained by either paddle in that episode.
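
Tracked in code, the solve criterion might look like this (a sketch with illustrative names):

    from collections import deque
    import numpy as np

    scores_window = deque(maxlen=100)   # scores of the last 100 episodes

    def episode_solved(episode_rewards_per_paddle):
        """episode_rewards_per_paddle: total undiscounted reward of each paddle."""
        scores_window.append(np.max(episode_rewards_per_paddle))  # best paddle counts
        return len(scores_window) == 100 and np.mean(scores_window) >= 0.5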

Solution

We use the Proximal Policy Optimization (PPO) deep reinforcement learning algorithm to solve the environment.
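
At the heart of PPO is the clipped surrogate objective, which keeps each policy update close to the policy that collected the data. A minimal PyTorch sketch (the notebook's own network and hyperparameters may differ):

    import torch

    def ppo_clip_loss(new_log_probs, old_log_probs, advantages, eps=0.2):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = torch.exp(new_log_probs - old_log_probs)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
        # Minimise the negative of the surrogate to maximise the objective.
        return -torch.min(unclipped, clipped).mean()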

Try it yourself!

  1. Install Anaconda from here.

  2. Install Unity ML-Agents using the instructions here.

  3. Download the Tennis environment from one of the links below. You need only select the environment that matches your operating system:

  4. Download the Tennis_PPO.ipynb notebook from this repository to train the agent, following these simple instructions:

  5. Open the relevant terminal and create a conda environment:

     conda create -n myenv python=3.6

  6. Activate the environment and open Jupyter Notebook:

     conda activate myenv
     jupyter notebook

Then open up the Tennis_PPO notebook and run it.

If you're not on a Mac, make sure to change the filename of the environment in the notebook.
