Capstone project created for Promineo Tech's Data Engineering program. Utilized AWS tools and services to build a pipeline that moves, cleans, stores, and analyzes data scraped and generated with Python.
Mapped out the steps in the business process
Created visual models using Visual Paradigm
Created MS SQL Server database instance in AWS RDS
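The instance can be provisioned from the console or scripted with Boto3. A minimal sketch, assuming hypothetical names, region, and the SQL Server Express engine:

```python
import boto3

# Hypothetical identifiers; "sqlserver-ex" is the SQL Server Express edition.
rds = boto3.client("rds", region_name="us-east-1")
rds.create_db_instance(
    DBInstanceIdentifier="capstone-db",
    DBInstanceClass="db.t3.micro",
    Engine="sqlserver-ex",
    AllocatedStorage=20,
    MasterUsername="admin",
    MasterUserPassword="change-me",  # use Secrets Manager in practice
    PubliclyAccessible=True,         # needed so DBeaver can reach it
)
```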
Connected to RDS instance using DBeaver
Used DDL script to create tables within database instance
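The DDL was run against the instance (via the DBeaver connection above), but the same kind of statement could also be executed programmatically. A sketch with pyodbc, using a hypothetical endpoint, credentials, and table; the real script defines the project's own tables:

```python
import pyodbc

# Hypothetical endpoint and credentials for the RDS SQL Server instance.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=capstone-db.xxxxxxxx.us-east-1.rds.amazonaws.com,1433;"
    "DATABASE=capstone;UID=admin;PWD=change-me"
)
# Example table only; the actual DDL script creates the project's schema.
conn.execute("""
    CREATE TABLE customers (
        customer_id INT PRIMARY KEY,
        name        NVARCHAR(100),
        email       NVARCHAR(255)
    )
""")
conn.commit()
conn.close()
```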
Utilized BeautifulSoup, Faker, and the ChatGPT API to generate data, then stored it in CSV files using Boto3
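A minimal sketch of the generate-and-upload step, using Faker for synthetic rows and Boto3 for the upload; the bucket name and columns are hypothetical, and the scraped and ChatGPT-generated data followed the same path:

```python
import csv
import io

import boto3
from faker import Faker

fake = Faker()

# Build a CSV of synthetic customer rows in memory.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["customer_id", "name", "email"])
for i in range(100):
    writer.writerow([i, fake.name(), fake.email()])

# Hypothetical bucket; upload the CSV with Boto3.
s3 = boto3.client("s3")
s3.put_object(Bucket="capstone-raw-data", Key="customers.csv", Body=buf.getvalue())
```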
Employed the DBeaver Import Wizard to load the CSV files into their respective tables
Created a data lake using AWS Lake Formation, an Amazon S3 bucket, and a Glue Data Catalog hosted on the administrator account
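The bucket, catalog database, and Lake Formation registration can all be scripted with Boto3. A sketch with hypothetical names, run under the administrator account:

```python
import boto3

# Hypothetical bucket; assumes us-east-1 (other regions need CreateBucketConfiguration).
s3 = boto3.client("s3")
s3.create_bucket(Bucket="capstone-data-lake")

# Hypothetical catalog database for the lake's tables.
glue = boto3.client("glue")
glue.create_database(DatabaseInput={"Name": "capstone_lake"})

# Register the bucket with Lake Formation so it governs access to the location.
lf = boto3.client("lakeformation")
lf.register_resource(
    ResourceArn="arn:aws:s3:::capstone-data-lake",
    UseServiceLinkedRole=True,
)
```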
Used Glue ETL job to transfer raw data from RDS instance to S3 data lake, populating the Glue catalog
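A sketch of what such a job script can look like; it runs inside Glue (the awsglue modules are not available locally), and the database, table, and path names are hypothetical:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read the raw table from the Glue catalog (backed by the RDS connection).
raw = glueContext.create_dynamic_frame.from_catalog(
    database="capstone_raw", table_name="customers"
)

# Write it to the S3 data lake as Parquet.
glueContext.write_dynamic_frame.from_options(
    frame=raw,
    connection_type="s3",
    connection_options={"path": "s3://capstone-data-lake/raw/customers/"},
    format="parquet",
)
```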
Inspected data in Athena to check the data quality
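Quality checks can be simple aggregate queries. A sketch using the Athena API via Boto3, with a hypothetical table, column, and results bucket:

```python
import boto3

athena = boto3.client("athena")

# Count rows with missing emails as one example of a quality check.
athena.start_query_execution(
    QueryString="""
        SELECT COUNT(*) AS missing_emails
        FROM customers
        WHERE email IS NULL OR email = ''
    """,
    QueryExecutionContext={"Database": "capstone_lake"},
    ResultConfiguration={"OutputLocation": "s3://capstone-athena-results/"},
)
```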
Used Glue ETL job to correct data issues and move the clean data into the data lake
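Continuing the earlier job sketch, a hypothetical cleaning pass using Glue's built-in transforms before writing to a clean prefix:

```python
from awsglue.transforms import DropNullFields, Filter

# Drop columns that are entirely null, then filter out rows with blank emails.
cleaned = DropNullFields.apply(frame=raw)
cleaned = Filter.apply(frame=cleaned, f=lambda row: row["email"] not in (None, ""))

glueContext.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://capstone-data-lake/clean/customers/"},
    format="parquet",
)
```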
Created a Glue crawler to register the updated data in the Glue catalog
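The crawler can also be created and started with Boto3. A sketch with hypothetical names; the role needs Glue and S3 permissions:

```python
import boto3

glue = boto3.client("glue")

# Crawl the clean prefix and write table definitions into the catalog database.
glue.create_crawler(
    Name="capstone-clean-crawler",
    Role="GlueServiceRole",
    DatabaseName="capstone_lake",
    Targets={"S3Targets": [{"Path": "s3://capstone-data-lake/clean/"}]},
)
glue.start_crawler(Name="capstone-clean-crawler")
```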
Set up AWS Redshift cluster
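A sketch of the cluster setup with Boto3, using hypothetical identifiers; a single-node cluster is enough at capstone scale:

```python
import boto3

redshift = boto3.client("redshift")

# Hypothetical names and credentials; use Secrets Manager in practice.
redshift.create_cluster(
    ClusterIdentifier="capstone-cluster",
    NodeType="dc2.large",
    ClusterType="single-node",
    DBName="capstone",
    MasterUsername="admin",
    MasterUserPassword="Change-me-1",
)
```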
Imported the updated Glue catalog into Redshift, creating and populating the cluster schema
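One way to expose a Glue catalog to Redshift is an external schema (Redshift Spectrum). A sketch via the Redshift Data API, with hypothetical cluster, database, and IAM role values:

```python
import boto3

rsd = boto3.client("redshift-data")

# Map the Glue catalog database into Redshift as an external schema.
rsd.execute_statement(
    ClusterIdentifier="capstone-cluster",
    Database="capstone",
    DbUser="admin",
    Sql="""
        CREATE EXTERNAL SCHEMA lake
        FROM DATA CATALOG
        DATABASE 'capstone_lake'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole'
    """,
)
```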
Used Visual Paradigm to visualize data from Redshift cluster