A hands-on dive into AWS Batch.
- AWS CLI and credentials for an AWS account with permissions to create resources
- Terraform
- Docker
- Python
1. Clone this repository.
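For example (the clone URL below is a placeholder for this repository's URL):

```bash
# Clone the repo and move into it; substitute the real clone URL and directory name
git clone <repository-url>
cd <repository-name>
```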
2. Deploy the infrastructure using Terraform.
2.1 Navigate to the infrastructure directory:

`cd src/infrastructure/`
2.2 You need an existing VPC and subnets. Take note of the VPC ID and subnet IDs so you can pass them to the plan command. Use the `prefix` variable to create custom names for the resources.
`terraform init`

`terraform plan -var prefix=<prefix> -var subnet_ids='["<subnet-1>", "<subnet-2>"]' -var vpc_id=<vpc-id> -out tfplan`
2.3 Verify the plan and apply it if it looks good. Resources that are to be created:
- S3 Bucket that will be used as source and destination for the batch jobs
- ECR Repository to store the docker image
- Security Group for the ECS task hosting the Docker image / Batch job
- IAM Roles and Policies for the Batch Job and ECS Task
- AWS Batch Compute Environment
- AWS Batch Job Queue
- AWS Batch Job Definition

Apply the plan:

`terraform apply tfplan`
2.4 Take a look at the resources that have been created in the AWS Console.
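If you prefer the CLI over the console, you can also list the Batch resources Terraform just created (a quick sketch, assuming your default AWS CLI profile and region point at the account you deployed to):

```bash
# Names of the compute environments, job queues, and active job definitions
aws batch describe-compute-environments --query 'computeEnvironments[].computeEnvironmentName'
aws batch describe-job-queues --query 'jobQueues[].jobQueueName'
aws batch describe-job-definitions --status ACTIVE --query 'jobDefinitions[].jobDefinitionName'
```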
3. Build the Docker image and push it to ECR.

3.1 Login to ECR
`aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com`

The registry URL is of the form `<account-id>.dkr.ecr.<region>.amazonaws.com`.
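If you don't have the account ID at hand, you can look it up with the CLI:

```bash
# Print the ID of the account your current credentials belong to
aws sts get-caller-identity --query Account --output text
```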
3.2 Build Image
Change directory to the location of the Dockerfile:

`cd ../python/single/`
On an Apple Silicon (M1) Mac, you need the `--platform linux/amd64` flag to build the image for the correct architecture:

`docker build --platform linux/amd64 -t <ecr-name> .`

Otherwise, just use:

`docker build -t <ecr-name> .`
3.3 Tag Image
`docker tag <ecr-name>:latest <ecr-uri>:latest`

The image URI is of the form `<account-id>.dkr.ecr.<region>.amazonaws.com/<ecr-name>`.
3.4 Push Image
`docker push <ecr-uri>:latest`
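To confirm the push worked, you can list the image tags in the repository (here `<ecr-name>` is assumed to be the ECR repository name created by Terraform):

```bash
# Show the tags of all images stored in the repository
aws ecr describe-images --repository-name <ecr-name> --query 'imageDetails[].imageTags'
```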
4. Unpack the data and upload to S3.
4.1 Change the directory:

`cd ../../data/`
4.2 Unpack the data with the command below or any other tool of your choice.

`unzip data.zip`
4.3 Upload the data to the S3 bucket that has been created by Terraform.

`aws s3 cp data s3://<bucket-name>/source --recursive`
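A quick way to double-check the upload (the `source` prefix matches the destination of the copy above):

```bash
# List the first few objects that landed under the source prefix
aws s3 ls s3://<bucket-name>/source/ --recursive | head
```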
5. Submit a job to AWS Batch that runs the Python script on the data you uploaded to S3. After the job has finished, you will find the results under the destination prefix in the S3 bucket. The source and destination are passed to the job as environment variables.
5.1 Command to submit the job:
`aws batch submit-job --job-name <job-name> --job-queue <job-queue> --job-definition <job-definition> --container-overrides '{"command": ["python", "script.py"], "environment": [{"name": "BUCKET", "value": "<bucket-name>"}, {"name": "PREFIX", "value": "source"}, {"name": "OUTPUT_PREFIX", "value": "output"}]}'`
5.2 View the job in the AWS Batch Console
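`submit-job` also prints a `jobId`; if you prefer the CLI over the console, you can poll the job with it (assuming your default AWS CLI profile and region):

```bash
# Current state of the job, e.g. RUNNABLE, STARTING, RUNNING, SUCCEEDED, FAILED
aws batch describe-jobs --jobs <job-id> --query 'jobs[0].status' --output text
```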
5.3 Check the ECS cluster where the job is being executed (ECS Console). A task will spin up and execute the job.
5.4 Check the S3 bucket for the results (S3 Console).
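Or from the CLI (the `output` prefix matches the `OUTPUT_PREFIX` passed in 5.1):

```bash
# List the result objects and optionally download them for inspection
aws s3 ls s3://<bucket-name>/output/ --recursive
aws s3 cp s3://<bucket-name>/output/ ./results --recursive
```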
6. Change to the infrastructure directory and run the destroy command.
`cd ../infrastructure/`

`terraform plan -var prefix=<prefix> -var 'subnet_ids=["<subnet-1>", "<subnet-2>"]' -var vpc_id=<vpc-id> -destroy -out tfplan`
Verify the plan and apply it if it looks good.
`terraform apply tfplan`
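Note: if the destroy fails because the S3 bucket still contains objects (whether that happens depends on how the bucket is configured in the Terraform code), empty it first and re-run the destroy:

```bash
# Remove all objects from the bucket so Terraform can delete it
aws s3 rm s3://<bucket-name> --recursive
```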