Building a Serverless Data Pipeline with AWS for Batch ETL Processing

Project Overview

This project demonstrates building a serverless batch ETL pipeline using AWS services, including CloudFormation, Aurora MySQL, Step Functions, Lambda, S3, Athena, and QuickSight. The primary goal was to learn the fundamentals of setting up and managing a data pipeline on AWS. To read more about this project, check out my blog post, "Building My First AWS Batch Data Pipeline: A Hands-On Journey with CloudFormation, Step Functions, and More."

Milestone 1: Provision Resources using CloudFormation

  • Provisioned an S3 data lake bucket, an Aurora MySQL instance, and a Step Function using CloudFormation.
  • Focused on deploying these resources as a single stack to understand infrastructure as code (IaC) and AWS resource management; a deployment sketch follows below.
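
For reference, here is a minimal sketch of deploying such a stack with boto3. The stack name, template path, and the `CAPABILITY_NAMED_IAM` flag are assumptions for illustration, not taken from this repo's actual template:

```python
import boto3

# Hypothetical names -- substitute your own stack name and template file.
STACK_NAME = "batch-etl-demo"
TEMPLATE_PATH = "template.yaml"

def deploy_stack():
    """Create the CloudFormation stack that provisions the S3 data lake
    bucket, the Aurora MySQL instance, and the Step Function."""
    cfn = boto3.client("cloudformation")
    with open(TEMPLATE_PATH) as f:
        template_body = f.read()
    cfn.create_stack(
        StackName=STACK_NAME,
        TemplateBody=template_body,
        # Required if the template creates named IAM roles (an assumption here).
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # Block until CloudFormation reports CREATE_COMPLETE (or the stack fails).
    waiter = cfn.get_waiter("stack_create_complete")
    waiter.wait(StackName=STACK_NAME)
    print(f"Stack {STACK_NAME} created.")

if __name__ == "__main__":
    deploy_stack()
```

Using a waiter like this makes the script block until the stack finishes creating, which keeps deployment scripts easy to reason about.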

Milestone 2: Create a Pipeline to ETL Data from MySQL to S3

  • Created a batch processing pipeline in which a Step Function orchestrates a Lambda function.
  • Exported data from MySQL to S3, demonstrating the ETL process; a sketch of the Lambda handler appears after the screenshot below.

[Screenshot: Successful Lambda test execution]
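
Below is a minimal sketch of what such a Lambda handler could look like, using pymysql and boto3. The table name, column names, environment variable names, and S3 key are hypothetical placeholders, not the repo's actual schema:

```python
import csv
import io
import os

import boto3
import pymysql  # packaged with the Lambda deployment or provided via a layer

def handler(event, context):
    """Export rows from a (hypothetical) Aurora MySQL sales table to CSV in S3."""
    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT user_id, sale_date, amount FROM sales")
            rows = cur.fetchall()
    finally:
        conn.close()

    # Serialize the result set as CSV in memory.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["user_id", "sale_date", "amount"])
    writer.writerows(rows)

    # Land the extract in the data lake bucket for Athena to query later.
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=os.environ["DATA_LAKE_BUCKET"],
        Key="sales/sales.csv",
        Body=buf.getvalue().encode("utf-8"),
    )
    return {"rows_exported": len(rows)}
```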

Milestone 3: Define Athena Tables and Visualize Data with QuickSight

  • Defined Athena tables over the data exported to S3.
  • Connected the Athena tables to QuickSight for data visualization; a table-definition sketch appears after the screenshots below.

[Screenshots: Successful pipeline execution; Sum of Sales Per User, Per Day; Sales Summary Statistics Per Day]
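
Athena tables over S3 data are typically defined with a `CREATE EXTERNAL TABLE` DDL statement pointing at the S3 prefix. Here is a minimal sketch of issuing that DDL through boto3; the bucket, database, table, and column names are assumptions for illustration:

```python
import boto3

# Hypothetical names -- adjust to match your bucket and Athena database.
DATA_LAKE_BUCKET = "my-data-lake-bucket"
ATHENA_DATABASE = "etl_demo"
RESULTS_LOCATION = f"s3://{DATA_LAKE_BUCKET}/athena-results/"

CREATE_TABLE_SQL = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS sales (
    user_id   STRING,
    sale_date DATE,
    amount    DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://{DATA_LAKE_BUCKET}/sales/'
TBLPROPERTIES ('skip.header.line.count' = '1')
"""

def create_sales_table():
    """Register the S3 extract as an Athena table so QuickSight can query it."""
    athena = boto3.client("athena")
    athena.start_query_execution(
        QueryString=CREATE_TABLE_SQL,
        QueryExecutionContext={"Database": ATHENA_DATABASE},
        ResultConfiguration={"OutputLocation": RESULTS_LOCATION},
    )

if __name__ == "__main__":
    create_sales_table()
```

Once the table exists, QuickSight can use Athena as a data source and query the S3 data directly.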

Key Learnings

  • The importance of planning the architecture in detail and managing permissions carefully.
  • Practical experience with various AWS services and IaC.

Future Goals

  • Develop expertise in pipeline design and event-driven architectures.
  • Enhance understanding of IAM roles and permissions for smoother deployments.
