Note
This project is part of the DevOpsTheHardWay course.
Tip
The project builds upon the concepts covered in our previous Docker project. We recommend completing the Docker project first.
In the previous Docker project, you've developed and deployed containerized service which detects objects in images sent through a Telegram chatbot. In addition, your service could effortlessly be launched on any machine using (almost) a single Docker compose command. Great job!
But how services are really deployed in the cloud? What about scalability? Consider if your service were to receive 30 images per second - would it effectively handle the load? Additionally, in the event of an accidental virtual machine shutdown, does your service stay available? does it maintain high availability?
In this project, you'll deploy your Polybot Service on AWS following best practices for a well-architected infrastructure. You'll utilize the majority of the AWS resources covered in the tutorials, including EC2, VPC, IAM, S3, SQS, DynamoDB, ELB, ASG, Lambda, CloudWatch, and more.
- Fork this repo by clicking Fork in the top-right corner of the page.
- Clone your forked repository by:
Change
git clone https://github.com/<your-username>/<your-project-repo-name>
<your-username>
and<your-project-repo-name>
according to your GitHub username and the name you gave to your fork. E.g.git clone https://github.com/johndoe/PolybotServiceAWS
. - Open the repo as a code project in your favorite IDE (Pycharm, VSCode, etc..). It is also a good practice to create an isolated Python virtual environment specifically for your project (see here how to do it in PyCharm).
- This project involves working with many AWS services. Note that you are responsible for the costs of any resources you create. You'll mainly pay for 2-4 EC2 instances and Application Load Balancer. If you work properly, the cost estimation is 35 USD for a month, assuming your instances are running for 8 hours a day for a whole month (the project can be completed in much less than a month. You can, and must, stop you instances and delete the ALB at the end of usage to avoid additional charges).
Later on, you are encouraged to change the README.md
file content to provide relevant information about your service project.
Let's get started...
- If you don't have it yet, create a VPC with at least two public subnets in different AZs.
- During the project you'll deploy the Polybot Service in EC2 instances. For that, your instances has to have permission on some AWS services (E.g. for example to upload/download photos in S3). The correct way to do it is by attaching an IAM role with the required permissions to your EC2 instance. Don't use or generate access keys.
Warning
Never commit AWS credentials into your git repo nor your Docker containers!
-
The Polybot microservice should be running in a
micro
EC2 instance. You'll find the code skeleton underpolybot/
. Take a look at it - it's similar to the one used in the previous project, so you can leverage your code implementation from before. -
The app should be running automatically as the EC2 instances are being started (in case the instance was stopped), without any manual steps.
There are many approached to achieve that, for example by using the
--restart always
flag in thedocker run
command, or using a Linux service. -
The service should be highly available. For that, you'll use an Application Load Balancer (ALB) that routes the traffic across the instances located in different AZs.
The ALB must have an HTTPS listener, as working with HTTP is not allowed by Telegram. To use HTTPS you need a TLS certificate. You can get it either by:
-
Generate a self-signed certificate and import it to the ALB listener. In that case the certificate
Common Name
(CN
) must be your ALB domain name (E.g.test-1369101568.eu-central-1.elb.amazonaws.com
), and you must pass the certificate file when setting the webhook inbot.py
(i.e.self.telegram_bot_client.set_webhook(..., certificate=open(CERTIFICATE_FILE_NAME, 'r'))
).Or
-
Register a real domain using Route53 (
.click
is one of the cheapest). After registering your domain, in the domain's Hosted Zone you should create a subdomain A alias record that routes traffic to your ALB. In addition, you need to request a public certificate for your domain address, since the domain has been issued by Amazon, issuing a certificate can be easily done with AWS Certificate Manager (ACM).
-
-
Read Telegram's webhook docs to get the CIDR of Telegram servers. Since your ALB is publicly accessible, it's better to restrict incoming traffic access to the ALB exclusively to Telegram servers by applying inbound rules to the Security Group.
-
Your Telegram token is a sensitive data. It should be stored in AWS Secret Manager. Create a corresponding secret in Secret Manager, under Secret type choose Other type of secret.
- The Yolo5 microservice should be running within a
medium
EC2 instance. The service files can be found underyolo5/
. Inapp.py
you'll find a code skeleton that periodically consumes jobs from an SQS queue. - Polybot -> Yolo5 communication: When the Polybot microservice receives a message from Telegram servers, it uploads the image to the S3 bucket.
Then, instead of talking directly with the Yolo5 microservice using a simple HTTP request, the bot sends a "job" to an SQS queue.
The job message contains information regarding the image to be processed, as well as the Telegram
chat_id
. The Yolo5 microservice acts as a consumer, consumes the jobs from the queue, downloads the image from S3, processes the image, and writes the results to a DynamoDB table (instead of MongoDB as done in the previous Docker project. Change your code accordingly). - Yolo5 -> Polybot communication: After writing the results to DynamoDB, the Yolo5 microservice then sends a
POST
HTTP request to the Yolo5 microservice, to/results?predictionId=<predictionId>
, while<predictionId>
is the prediction ID of the job the yolo5 worker has just completed. The request is done via the ALB domain address (you can use the existed HTTPS listener or create an HTTP listener). The/results
endpoint in the Polybot microservice then should retrieve the results from DynamoDB and sends them to the end-user Telegram chat by utilizing thechat_id
value. - The Yolo5 microservice should be auto-scaled. For that, all Yolo5 instances would be part of an Auto-scaling Group. Create a Launch Template and Autoscaling Group with desired capacity 1, and maximum capacity of 2 (obviously in real-life application it would be more than 2). The automatic scaling based on CPU utilization, with a scaling policy triggered when the CPU utilization reaches 60%.
- The Yolo5 app should be automatically up and running when an instance is being started as part of the ASG, without any manual steps. You can utilize User Data to achieve that.
- Send multiple images to your Telegram bot within a short timeframe to simulate increased traffic.
- Navigate to the CloudWatch console, observe the metrics related to CPU utilization for your Yolo5 instances. You should see a noticeable increase in CPU utilization during the period of increased load.
- After ~3 minutes, as the CPU utilization crosses the threshold of 60%, CloudWatch alarms will be triggered. Check your ASG size, you should see that the desired capacity increases in response to the increased load.
- After approximately 15 minutes of reduced load, CloudWatch alarms for scale-in will be triggered.
We now want to automate the deployment process of the service, both for the Polybot and the Yolo5 microservices.
Meaning, when you make code changes locally, then commit and push them, a new version of Docker image is being built and deployed into the designated EC2 instances (either the two instances of the Polybot or all ASG instances of the Yolo5).
The process has to be fully automated.
While the expected outcome is simple, there are many ways to implement it.
You are given a skeleton for GitHub Action workflows for the Polybot and the Yolo5 under .github/workflows/polybot-deployment.yaml
and .github/workflows/yolo5-deployment.yaml
respectively.
Beyond the provided YAMLs, you are free to design and implement your CI/CD pipeline in any way you see, here are some ideas:
-
To build your images:
- Simply by
docker
commands in the GitHub Actions workflow. - Use AWS CodeBuild.
- Simply by
-
To deploy a new image version:
- Simply by a bash script that lists EC2 instances according to some tag and execute commands within each instance to pull and run the new image version.
- Use Ansible playbook.
- AWS CodeDeploy.
- ASG instance refresh.
Test your CI/CD pipline for both the Polybot and Yolo5 microservices.
You are highly encouraged to share your project with others by creating a Pull Request.
Create a Pull Request from your repo, branch main
(e.g. johndoe/PolybotServiceAWS
) into our project repo (i.e. exit-zero-academy/PolybotServiceAWS
), branch main
.
Feel free to explore other's pull requests to discover different solution approaches.
As it's only an exercise, we may not approve your pull request (approval would lead your changes to be merged into our original project).