A music streaming startup, Sparkify, has grown its user base and song database and wants to move its processes and data onto the cloud. Its data resides in S3, in a directory of JSON logs of user activity on the app, as well as a directory with JSON metadata on the songs in the app.
This project builds an ETL pipeline that extracts the JSON log data from the S3 bucket, stages it in a Redshift cluster, and aggregates it into analytics tables that any BI tool can utilize. I use my local machine as the client that provisions the infrastructure, creates the staging and analytics tables, loads the data, and runs queries against the cluster.
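As a rough sketch of the staging step, the snippet below copies the JSON logs from S3 into a Redshift staging table with psycopg2. The table name, config sections, and JSONPaths key are assumptions for illustration, not necessarily the project's actual names:

```python
import configparser
import psycopg2

# Connection settings come from dwh.cfg (section/key names are assumptions)
config = configparser.ConfigParser()
config.read("dwh.cfg")

conn = psycopg2.connect(
    host=config["CLUSTER"]["HOST"],
    dbname=config["CLUSTER"]["DB_NAME"],
    user=config["CLUSTER"]["DB_USER"],
    password=config["CLUSTER"]["DB_PASSWORD"],
    port=config["CLUSTER"]["DB_PORT"],
)
cur = conn.cursor()

# COPY raw JSON events straight from S3 into a staging table; Redshift
# parses the JSON server-side using a JSONPaths file, so no local
# processing is needed.
copy_staging_events = """
    COPY staging_events
    FROM '{}'
    IAM_ROLE '{}'
    JSON '{}'
    REGION 'us-west-2';
""".format(
    config["S3"]["LOG_DATA"],
    config["IAM_ROLE"]["ARN"],
    config["S3"]["LOG_JSONPATH"],
)
cur.execute(copy_staging_events)
conn.commit()
conn.close()
```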
In the dwh.cfg file you will need to supply the credentials of an AWS IAM user that you have created and granted access for this application. This allows the boto3 Python package to set up the initial infrastructure in your AWS account.
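For reference, a dwh.cfg skeleton might look like the following; every section and key name here is an assumption to be aligned with what the scripts actually read, and real credentials should never be committed:

```ini
[AWS]
KEY=YOUR_AWS_ACCESS_KEY_ID
SECRET=YOUR_AWS_SECRET_ACCESS_KEY

[CLUSTER]
HOST=
DB_NAME=dwh
DB_USER=dwhuser
DB_PASSWORD=
DB_PORT=5439

[IAM_ROLE]
ARN=

[S3]
LOG_DATA=s3://your-bucket/log_data
LOG_JSONPATH=s3://your-bucket/log_json_path.json
SONG_DATA=s3://your-bucket/song_data
```

With those credentials in place, boto3 can provision the cluster. Below is a minimal sketch assuming the [AWS] section above; the region, node type, cluster identifier, and role ARN are placeholders, not the project's required values:

```python
import configparser
import boto3

# Read the IAM user's credentials from dwh.cfg (section/key names assumed above)
config = configparser.ConfigParser()
config.read("dwh.cfg")

redshift = boto3.client(
    "redshift",
    region_name="us-west-2",  # assumed region
    aws_access_key_id=config["AWS"]["KEY"],
    aws_secret_access_key=config["AWS"]["SECRET"],
)

# Provision a small multi-node cluster; sizes and names are illustrative only.
redshift.create_cluster(
    ClusterType="multi-node",
    NodeType="dc2.large",
    NumberOfNodes=4,
    DBName="dwh",
    ClusterIdentifier="sparkify-dwh",
    MasterUsername="dwhuser",
    MasterUserPassword="REPLACE_ME",
    IamRoles=["arn:aws:iam::123456789012:role/redshift-s3-read"],  # placeholder ARN
)
```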