Cloud data lake for Sparkify Data

About

Sparkify is a fictional music application that stores songs and user activity logs in separate JSON files. As the application grew, it became extremely difficult for the company to manage and benefit from these files. The proposed solution is to invest in a cloud-based approach: in this project, Amazon Web Services is used, with Apache Spark transforming the raw data and loading it back into a new S3 bucket.

ETL Pipeline Logic

The pipeline in etl.py performs the following steps (a minimal PySpark sketch is shown after the list):

  • Load the AWS credentials
  • Read the raw JSON data from an S3 bucket
  • Transform the data by creating five separate tables
  • Load the tables into a new S3 bucket
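
For illustration, a minimal PySpark sketch of the extract and transform steps is below. The bucket paths, the dl.cfg credentials file, and the column selection are assumptions for the example rather than the exact contents of etl.py; the five tables are most likely a songplays fact table plus users, songs, artists, and time dimension tables, with the songs table shown here as one example.

```python
import configparser
import os

from pyspark.sql import SparkSession

# Load AWS credentials from a local config file (hypothetical dl.cfg).
config = configparser.ConfigParser()
config.read("dl.cfg")
os.environ["AWS_ACCESS_KEY_ID"] = config["AWS"]["AWS_ACCESS_KEY_ID"]
os.environ["AWS_SECRET_ACCESS_KEY"] = config["AWS"]["AWS_SECRET_ACCESS_KEY"]

# Create a Spark session with the Hadoop AWS package so that s3a:// paths can be read.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:2.7.0")
    .getOrCreate()
)

# Read the raw JSON files from the source bucket (placeholder paths).
song_data = spark.read.json("s3a://source-bucket/song_data/*/*/*/*.json")
log_data = spark.read.json("s3a://source-bucket/log_data/*/*.json")

# Transform: for example, the songs dimension table is a deduplicated
# projection of the song data.
songs_table = song_data.select(
    "song_id", "title", "artist_id", "year", "duration"
).dropDuplicates(["song_id"])
```

The remaining tables are built in the same way from song_data and log_data; the partitioned writes back to the destination bucket are sketched under the NOTE below.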

NOTE:

  • Songs table files are partitioned by year and then artist.
  • Time table files are partitioned by year and month.
  • Songplays table files are partitioned by year and month.
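
Continuing the sketch above, the writes that produce this layout might look as follows, assuming the partition columns are named year, artist_id, and month, that time_table and songplays_table have been built analogously to songs_table, and that the output paths are placeholders for the new bucket:

```python
# Write each table to the destination bucket as partitioned Parquet files.
songs_table.write.mode("overwrite").partitionBy("year", "artist_id") \
    .parquet("s3a://destination-bucket/songs/")

time_table.write.mode("overwrite").partitionBy("year", "month") \
    .parquet("s3a://destination-bucket/time/")

songplays_table.write.mode("overwrite").partitionBy("year", "month") \
    .parquet("s3a://destination-bucket/songplays/")
```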

User Manual:

To run the code, follow these instructions in the exact order given:

  1. Open a terminal (or a bash shell on Windows)
  2. Run python etl.py and wait until processing is complete
