
My final project for CS 485: training an RL (Rocket League) bot using RL (reinforcement learning).

mdion4/Final_project_bots

Training a PPO-Based Rocket League Bot to Outperform Baseline AI Opponents

MATT DION

G01286436

Table of Contents

  1. Introduction and Motivation
  2. Related Work
  3. Problem Definition and Data Collection
  4. Theory, Metrics, and Background
  5. Approach and Implementation Details
  6. Experimental Setup
  7. Results and Analysis
  8. Discussion and Future Work
  9. Conclusion

Introduction and Motivation

The stock bots included in Rocket League are weak: most players quickly outperform them after gaining a basic understanding of the controls and strategy. Rocket League is a relatively simple game, with only a few distinct actions and a single objective of scoring more goals than the opponent, yet it has an effectively unlimited skill ceiling.

Reinforcement learning is the key to making better bots that can exceed human capabilities, and its potential for team-based cooperative-competitive breakthroughs is great. Rocket League's very large state space and dynamic environment make hand-coded bots infeasible, and cooperative-competitive MARL is still an open challenge in robotics and machine learning.

This tutorial is for those who want to use open-source frameworks to train a bot that is better than the stock Rocket League bots. This specific implementation is for 1v1, but 2v2 and 3v3 are supported.

This tutorial provides the resources needed to build a bot using PPO that beats all Psyonix bots as well as the level 3 (of 5) community-designated benchmark bot. Level 5 bots are better than most human players and are still improving; they can even give pros a hard time until the pro finds a weakness to exploit.

Related Work

Problem Definition and Data Collection

  • Original Goal
    Beat the Rookie (level 1) Psyonix bots in a 2v2 match with bots trained via machine learning.

    • Intermediate Milestones
      • Touch the ball.
      • Score a goal.
      • Move well.
      • Train a 1v1 bot.
  • Revised Goal
    Beat the level 3 bot, Self-driving car, with a PPO-trained bot.

    • Intermediate Milestones
      • Train a 1v1 bot to this level.
      • Train a 2v2 bot to this level.
      • Learn aerials.
      • Learn dribbling.
      • Consistently get the ball in the net on purpose.
      • Make saves.
      • Play like a human.
  • Environment and Datasets
    RLBot runs in Rocket League and has training available, but we won't be using that. RLgym_sim simulates the Rocket League environment and allows many more instances to be run much faster than real time. The collision data and physics behavior of Rocket League are extracted and used in the simulator, so the simulated environment is functionally identical to the game. A minimal environment-setup sketch appears after this list.

  • Data Collection Process
    Metrics are compiled in Weights and Biases (Wandb), which tracks statistics such as policy reward and mean KL divergence. A minimal logging sketch appears below.
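
For orientation, here is a minimal sketch of spinning up a simulated 1v1 match. It assumes RLgym_sim mirrors the rlgym 1.x make() API; the exact module paths, keyword arguments, and the reward and terminal-condition classes shown are assumptions, not code taken from this repo.

```python
import rlgym_sim
from rlgym_sim.utils.reward_functions.common_rewards import VelocityBallToGoalReward
from rlgym_sim.utils.terminal_conditions.common_conditions import (
    GoalScoredCondition,
    TimeoutCondition,
)

# Build a simulated 1v1 match -- no Rocket League client needed.
env = rlgym_sim.make(
    team_size=1,              # 1v1; 2v2 and 3v3 are also supported
    spawn_opponents=True,     # put a second car on the opposing team
    tick_skip=8,              # the agent acts once every 8 physics ticks
    reward_fn=VelocityBallToGoalReward(),
    terminal_conditions=[GoalScoredCondition(), TimeoutCondition(500)],
)

obs = env.reset()  # initial observation(s) for the controlled agent(s)
```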

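Logging itself is a one-liner once a run is initialized; the project, run, and metric names below are hypothetical placeholders, since the learner used here reports its own statistics.

```python
import wandb

# Hypothetical project/run/metric names -- the PPO learner in this repo
# handles its own Weights & Biases reporting.
run = wandb.init(project="rlgym-ppo-bot", name="bot6_corner_addict")
run.log({
    "policy_reward": 1.23,         # mean reward under the current policy
    "mean_kl": 0.004,              # KL divergence between old and new policy
    "cumulative_timesteps": 1_000_000,
})
run.finish()
```
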
Theory, Metrics, and Background

  • PPO background
    PPO was introduced by Schulman et al. in the 2017 paper "Proximal Policy Optimization Algorithms." It improved on existing policy gradient methods by running multiple epochs of minibatch updates on each batch of collected experience, and it ended up being simpler while performing better. A sketch of its clipped objective appears at the end of this section.

  • Metric
    Being "better" than a bot is defined as the probability of scoring the next goal being greater than 50% with a 95% confidence level. Model the game as a binomial distribution, where each goal is regarded as an independent random trial. Number of trials (n=69) and p=0.5 (null hypothesis). Stopping rule: achieve 42 goals before the opponent scores 28 to reject the null hypothesis with p<0.05.

  • Why This Metric?
    This is how RLGym Discord Server Roles are assigned.

    "Special roles are offered to people whose ML bot beats the following bots (any gamemode, in RLBot):

    • Rookie (@Rookie Ruiner)
    • TensorBot (@TensorBot Trouncer)
    • Self-driving car (@Self-Driving Car Sabotager)
    • Element (@Element Eliminator)
    • Nexto (@Nexto Nullifier)

    In the spirit of science, winning is conditioned on statistical significance; the probability of your bot scoring the next goal should be larger than 50% with a 95% confidence interval. (see #general)" - Rolv
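
To ground the PPO background above, here is a minimal PyTorch sketch of the clipped surrogate objective from Schulman et al. (2017). It is an illustration of the core idea only, not the loss code used by the learner in this project, and the tensor names are assumptions.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective (illustrative sketch)."""
    # Probability ratio between the updated policy and the policy that collected the data.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    # Clipping keeps each update close to the old policy, which is what lets
    # PPO safely run several epochs of optimization on the same batch.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Maximize the surrogate objective => minimize its negative mean.
    return -torch.min(unclipped, clipped).mean()
```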

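The stopping rule above can be sanity-checked with a one-sided binomial test; SciPy is used here only as an assumed convenience for the check, not as part of the training code.

```python
from scipy.stats import binom

# P(X >= 42 goals out of n = 69) under the null hypothesis p = 0.5.
# binom.sf(41, ...) is the survival function P(X > 41), i.e. P(X >= 42).
p_value = binom.sf(41, 69, 0.5)
print(f"p-value for scoring 42 of 69 goals: {p_value:.3f}")  # ~0.046 < 0.05
```
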
Approach and Implementation Details

  • Step-by-Step Instructions
    See Setup to get everything installed and running. See Learner for hyperparameter explanations and suggested values. See rewards for explanations of what a reward is, the ZeroSum wrapper, curriculum learning with AnnealRewards, combining all rewards into one function, and my personal sequence of reward iterations that led to a level 3 bot. A sketch of combining and staging rewards appears at the end of this section.

  • Bots in Repo

    • Multiple iterations of bots are included; ultimately, all bots prior to bot6 were failures.
    • Bot6 is my pride and joy, the bot that beat the level 3 bot.
    • The PPO_Policy.pt file is included; this is the bot's current brain.
    • bronze_scoring.py is the main file, modified from example.py.
    • Each bot has an increasing number of files taken from other open-source repos.
    • The trained RLBot version is located in Corner_Addict. You can copy it into RLBot under ./MyBots.
  • Modifications from Base Implementations

    • Many hyperparameters were experimented with, and they change as learning progresses. The reward functions and weights mentioned by Zealan in his guide only work for getting the bot to push the ball around; he suggests ideas for new reward functions, but not their implementations. I wrote my own reward function, TouchVelocityReward(), which sought to make the bot strike the ball instead of farming the touch reward by pushing the ball and maintaining contact (an illustrative sketch of this idea appears at the end of this section). I pulled some reward functions from the Apollo bot and some from the rlgym libraries. The getCheckpoint function is from Apollo, which likely pulled it from another bot, while the command-line rendering function is mine. The AnnealRewards() function was also pulled from Apollo, but my method of making multiple combined reward functions and updating the current reward function throughout training was my own.

    • Deciding which reward functions to use, at which points, and with which weights to guide the bot's behavior was entirely my own work. The basic reward functions are well known, but their weights and usage vary significantly between bot makers.
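
To make the reward-staging idea concrete, here is a sketch of two combined reward stages. It assumes RLgym_sim exposes a CombinedReward and common reward classes mirroring rlgym's; the rewards, weights, and stage split are illustrative rather than the sequence documented in rewards.md, and AnnealRewards (from Apollo) blends between such stages instead of switching abruptly.

```python
from rlgym_sim.utils.reward_functions import CombinedReward
from rlgym_sim.utils.reward_functions.common_rewards import (
    EventReward,
    TouchBallReward,
    VelocityBallToGoalReward,
)

# Stage 1: early training -- reward touching the ball and moving it toward
# the opponent's goal.
stage_1 = CombinedReward(
    reward_functions=(TouchBallReward(), VelocityBallToGoalReward()),
    reward_weights=(1.0, 0.5),
)

# Stage 2: later training -- shift the weight toward scoring and away from
# simple ball-pushing, and punish conceding.
stage_2 = CombinedReward(
    reward_functions=(VelocityBallToGoalReward(), EventReward(goal=10.0, concede=-10.0)),
    reward_weights=(0.3, 1.0),
)
```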

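Below is an illustrative sketch of the TouchVelocityReward() idea: pay out in proportion to how much a touch changes the ball's velocity, so a single hard strike earns far more than continuously pushing the ball. This is not the repo's implementation; it assumes the rlgym-style RewardFunction interface and GameState/PlayerData fields, and 6000 uu/s is simply Rocket League's maximum ball speed used as a normalizer.

```python
import numpy as np
from rlgym_sim.utils.gamestates import GameState, PlayerData
from rlgym_sim.utils.reward_functions import RewardFunction


class TouchVelocityRewardSketch(RewardFunction):
    """Reward hard hits: proportional to the change in ball velocity on touch."""

    MAX_BALL_SPEED = 6000.0  # uu/s, normalizes the reward into roughly [0, 1]

    def __init__(self):
        self.prev_ball_vel = np.zeros(3)

    def reset(self, initial_state: GameState):
        self.prev_ball_vel = np.array(initial_state.ball.linear_velocity)

    def get_reward(self, player: PlayerData, state: GameState, previous_action: np.ndarray) -> float:
        ball_vel = np.array(state.ball.linear_velocity)
        reward = 0.0
        if player.ball_touched:
            # A sustained push barely changes the ball's velocity each step,
            # so it earns much less than one strong strike.
            reward = float(np.linalg.norm(ball_vel - self.prev_ball_vel)) / self.MAX_BALL_SPEED
        self.prev_ball_vel = ball_vel
        return reward

    def get_final_reward(self, player: PlayerData, state: GameState, previous_action: np.ndarray) -> float:
        return self.get_reward(player, state, previous_action)
```
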
Experimental Setup

  • Training Procedure
    Bot6 "Corner_Addict" trained for 2 billion steps across 2 days. The reward sequence can be found in rewards.md.

  • Opponent Setup
    The opponent bots are already included in RLBot. The match is a normal 1v1 Rocket League match with unlimited time.

Results and Analysis

Discussion and Future Work

  • Ideas for Future Improvements
    2v2 is my next challenge, along with better dribbling and aerials. 2v2 will need more custom reward functions, such as ones for proper teammate spacing and rotation. What needs a reward function to guide it, and what should emerge naturally during training? We will have to see.

    I will also be attempting more custom reward functions for the 1v1 bot, now that I know what works.

Conclusion

While I was unable to beat the Rookie bots in a 2v2 match, I did beat 3 of the 5 community baseline bots. I did not attempt level 4 because of how close the level 3 results were. I did blow past many intermediate milestones: getting the bot to push the ball around was the first suggested milestone, and it was achieved almost immediately, which instilled a false sense of confidence about the rest of the challenge. My hand was held up to that point; once things weren't so clearly laid out, I struggled for a long time, suffering brain death after brain death.

I did achieve all of the original intermediate milestones: touching the ball, scoring, and moving well (fast and in the right direction). An earlier bot, bot4, discovered wavedashes on its own. Of the revised goals, I was able to score consistently, shadow-defend well, exhibit some behaviors that are better than human at a similar level (such as patience), dribble with the ball on the ground, and make baby aerials. My only failure was not doing this in 2v2 as originally intended.