This repository provides an example of a reinforcement learning environment on AWS that uses CloudWatch metrics as the observables. The CloudFormation script and associated code provide an environment that can be used to train and test reinforcement learning algorithms for provisioning resources based on CloudWatch metrics.
The CloudFormation script will set up the VPC, two public subnets, an elastic load balancer (ELB) and autoscaling group (ASG) for your web server, an EC2 instance for the agent, and an EC2 instance for driving calls to the web service.
The web service is currently just a simple "Hello World" page and should be replaced with something more meaningful. The web server instances sit behind an ELB and are scaled by an ASG. The maximum number of instances is set to 5 (this can be changed directly in the awsenv/cloudopt_aws.py file). Cost penalties for utilization are defined in awsenv/cloudopt.py.
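As a rough illustration of what a utilization cost penalty could look like (a hypothetical sketch, not the actual logic in awsenv/cloudopt.py), one might combine a per-instance running cost with a penalty for drifting away from a target utilization:

```python
# Hypothetical sketch of a utilization cost penalty; the real logic
# lives in awsenv/cloudopt.py and may differ. All parameter names and
# values here are illustrative assumptions.

def cost_penalty(num_instances, avg_cpu_util,
                 instance_cost=1.0, target_util=0.5, util_weight=2.0):
    """Return a penalty (lower is better) for the current fleet state.

    num_instances -- instances currently running (1..max)
    avg_cpu_util  -- average CPU utilization across the fleet, in [0, 1]
    """
    running_cost = instance_cost * num_instances
    # Penalize deviation from a target utilization band, in either direction.
    util_cost = util_weight * abs(avg_cpu_util - target_util)
    return running_cost + util_cost
```

A shaping like this rewards the agent for keeping the fleet small while staying near the target utilization.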
The Driver instance includes very simple code, which will have the correct ELB DNS name substituted in at launch. It is placed in the home directory of the ec2-user user. Run the driver at the command line as ec2-user:
$ ./driver/driver.py 100
where 100 is the number of iterations and can be adjusted as needed. If the driver is not started before the Agent code, no CloudWatch metrics will be returned and the state will be all 0's.
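The all-zero state corresponds to CloudWatch returning no datapoints for the requested window. A minimal sketch of that fallback (a hypothetical helper, not the repo's actual code) might pad a possibly-empty datapoint list into a fixed-length observation:

```python
# Hypothetical sketch: build a fixed-length observation vector from
# CloudWatch datapoints. With no driver traffic, CloudWatch returns no
# datapoints and the state falls back to all zeros. The dict shape
# assumed here matches boto3's get_metric_statistics output.

def build_state(datapoints, length=5):
    """datapoints: list of dicts with an 'Average' key, newest first."""
    values = [dp.get('Average', 0.0) for dp in datapoints[:length]]
    # Pad with zeros when fewer datapoints exist than the state length.
    values += [0.0] * (length - len(values))
    return values
```

With no traffic, `build_state([])` yields `[0.0, 0.0, 0.0, 0.0, 0.0]`, matching the all-zero state described above.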
The Agent EC2 instance automatically installs the AWS environment used with this system. Three algorithms have been implemented: tabular Q-learning, Deep Q-Learning (DQN), and Double Dueling Deep Q-Learning (D3Q). They can be launched with the following commands at the command line:
$ cd AWS-RL-Env
$ python DQN.py
$ python D3Q.py
$ python QLearn.py
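For orientation, the tabular variant follows the standard Q-learning update rule. A minimal, generic sketch of that update (not the exact implementation in QLearn.py, whose hyperparameters and state encoding may differ):

```python
import random
from collections import defaultdict

# Generic tabular Q-learning sketch; hyperparameter values are
# illustrative, not taken from QLearn.py.
Q = defaultdict(float)           # maps (state, action) -> value
ALPHA, GAMMA = 0.1, 0.99         # learning rate, discount factor

def q_update(state, action, reward, next_state, actions):
    """One step of Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])

def epsilon_greedy(state, actions, eps=0.1):
    """Explore with probability eps, otherwise pick the greedy action."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```

The DQN and D3Q variants replace the table `Q` with neural-network function approximators but optimize toward the same temporal-difference target.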
The DQN and D3Q implementations were provided by Zhiguang Wang. Andy Spohn provided essential feedback on AWS-related aspects, including cost model and CloudFormation.