I started this project to become familiar with a few AWS services and concepts that are commonly used in data engineering.
Dataset: Trending YouTube Video Statistics (source: https://www.kaggle.com/datasets/datasnaek/youtube-new), collected using the YouTube API.
- Glue
- Athena
- S3
- IAM
- CLI
- Lambda
- Root account best practices
  - Always protect the root account
  - Don't use the root account for everyday tasks
  - Enable MFA wherever you can
  - Rotate passwords and keys periodically
  - Avoid storing credentials files on shared or private computers
  - Always follow the principle of least privilege
- AWS layers & policies
- Creating users and roles with IAM
- Creating an S3 bucket and uploading data
- Data Catalog
- Building a data lake from scratch with Amazon S3
- Joining structured and semi-structured data
- Creating crawlers in AWS Glue
- ETL in Glue and Lambda
- SQL queries with Amazon Athena
- Data Ingestion
- Data Lake
- AWS Cloud
- ETL Design
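
To make the "joining structured and semi-structured data" step more concrete, here is a minimal sketch in plain Python. It is not the project's actual Glue or Lambda code. It assumes the Kaggle dataset ships one CSV of videos per region plus a JSON file of categories whose entries sit in a nested `items` array, and that the two are linked by `category_id`. The sample data, column names, and function names below are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical sample mirroring the assumed layout of a category JSON file.
CATEGORY_JSON = """
{
  "kind": "youtube#videoCategoryListResponse",
  "items": [
    {"id": "10", "snippet": {"title": "Music"}},
    {"id": "24", "snippet": {"title": "Entertainment"}}
  ]
}
"""

# Hypothetical sample mirroring a few assumed columns of a regional CSV file.
VIDEOS_CSV = """video_id,title,category_id,views
abc123,Some song,10,1500
def456,Some show,24,900
ghi789,Unknown category,99,10
"""


def flatten_categories(raw_json):
    """Turn the nested "items" array into a flat {category_id: title} lookup."""
    doc = json.loads(raw_json)
    # The id is a string in the JSON sample, so cast it to match the CSV side.
    return {int(item["id"]): item["snippet"]["title"] for item in doc["items"]}


def join_videos_with_categories(raw_csv, categories):
    """Inner-join CSV rows to the category lookup on category_id."""
    joined = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        category_id = int(row["category_id"])
        if category_id not in categories:
            continue  # inner join: drop rows without a matching category
        joined.append({
            "video_id": row["video_id"],
            "category": categories[category_id],
            "views": int(row["views"]),
        })
    return joined


categories = flatten_categories(CATEGORY_JSON)
for record in join_videos_with_categories(VIDEOS_CSV, categories):
    print(record)
```

In an AWS setup, the flattening step would typically run in Lambda or a Glue job and the join would be an Athena SQL query over the cataloged tables. The explicit `int` cast stands in for the kind of type alignment that such a join usually needs when one side comes from JSON.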