To structure and simplify Sparkify's access to user activity.
- facilitating queries concerning user activity (songs listened to)
- moving from a directory of JSON logs into a star-schema relational database on PostgreSQL
Currently, Sparkify doesn't have an easy way to query their data, which resides in a directory of JSON logs of user activity on the app, as well as a directory of JSON metadata on the songs in their app.
- Define a STAR-like data structure in PostgreSQL
- Define a set of SQL queries to (a minimal sketch follows this list):
  - drop tables
  - create the designed tables (schema)
  - populate the tables
- Define a set of scripts for batch processing of the gathered JSON logs
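As an illustration only, the drop/create/populate queries might look like the following sketch in the style of `sql_queries.py`. The column types, and the `ON CONFLICT` upsert for `users`, are assumptions made for illustration, not the project's actual definitions.

```python
# Hypothetical excerpt in the style of sql_queries.py; table and column
# names follow the schema overview below, but the types are assumptions.

songplay_table_drop = "DROP TABLE IF EXISTS songplays;"

songplay_table_create = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id SERIAL PRIMARY KEY,
    start_time  TIMESTAMP NOT NULL,
    user_id     INT NOT NULL,
    level       VARCHAR,
    song_id     VARCHAR,
    artist_id   VARCHAR,
    session_id  INT,
    location    VARCHAR,
    user_agent  VARCHAR
);
"""

# UPSERT: if the user already exists, keep their subscription level current.
user_table_insert = """
INSERT INTO users (user_id, first_name, last_name, gender, level)
VALUES (%s, %s, %s, %s, %s)
ON CONFLICT (user_id) DO UPDATE SET level = EXCLUDED.level;
"""
```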
- `sql_queries.py` - defines the schema structure and the queries to DROP and UPSERT into tables
- `create_tables.py` - a helper script to clear the existing schema and create a blank (empty) structure ready to fill
- `etl.py` - functions to batch-process the files in a) `song_data` and b) `log_data` in the provided directory (see the sketch after this list)
- `etl.ipynb` - a notebook to document and test the functions inserting data into tables. The notebook works on a single (example) file and demonstrates/prototypes the functions and processing for the proper upload of data into PostgreSQL
- `test.ipynb` - a notebook to document that the data was properly formatted and uploaded into the designed schema. A set of simple SQL queries to display table structure.
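As an illustration of the batch-processing pattern described for `etl.py`, here is a minimal sketch; the helper names (`get_files`, `process_data`) and the per-file callback are assumptions, not the project's exact code.

```python
# Hypothetical sketch of batch-processing a directory tree of JSON files.
import glob
import os


def get_files(filepath):
    """Collect every *.json file found under the given directory tree."""
    all_files = []
    for root, _dirs, _files in os.walk(filepath):
        all_files.extend(glob.glob(os.path.join(root, "*.json")))
    return all_files


def process_data(cur, conn, filepath, func):
    """Apply a per-file processing function, committing after each file."""
    for datafile in get_files(filepath):
        func(cur, datafile)  # e.g. a function that parses the file and INSERTs rows
        conn.commit()
```

In this pattern, `create_tables.py` would be run first to reset the schema, and `process_data` would then be called once for `song_data` and once for `log_data`.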
Overview:
- `songplays` - records in log data associated with song plays, i.e. records with page `NextSong`: `songplay_id`, `start_time`, `user_id`, `level`, `song_id`, `artist_id`, `session_id`, `location`, `user_agent`
- `users` - users in the app: `user_id`, `first_name`, `last_name`, `gender`, `level`
- `songs` - songs in the music database: `song_id`, `title`, `artist_id`, `year`, `duration`
- `artists` - artists in the music database: `artist_id`, `name`, `location`, `latitude`, `longitude`
- `time` - timestamps of records in `songplays` broken down into specific units (see the sketch after this list): `start_time`, `hour`, `day`, `week`, `month`, `year`, `weekday`
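As an illustration of how the `time` dimension can be derived from the raw log timestamps, here is a minimal pandas sketch; the millisecond-epoch `ts` values are an assumption about the log format.

```python
# Hypothetical sketch: breaking raw timestamps into the time-table units.
import pandas as pd

# Assumed: logs carry a `ts` field in milliseconds since the epoch.
ts = pd.to_datetime(pd.Series([1541990000796, 1542069417796]), unit="ms")

time_df = pd.DataFrame({
    "start_time": ts,
    "hour": ts.dt.hour,
    "day": ts.dt.day,
    "week": ts.dt.isocalendar().week,
    "month": ts.dt.month,
    "year": ts.dt.year,
    "weekday": ts.dt.weekday,
})
print(time_df)
```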
Please see the `test.ipynb` notebook for basic example queries.

The central "fact" table is `songplays`, which can easily be queried to obtain the desired information.
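For example, here is a minimal sketch of an analytic query against the fact table; the connection parameters are placeholders.

```python
# Hypothetical example: count song plays per subscription level by
# joining the songplays fact table to the users dimension.
import psycopg2

conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
cur = conn.cursor()

cur.execute("""
    SELECT u.level, COUNT(*) AS plays
    FROM songplays sp
    JOIN users u ON sp.user_id = u.user_id
    GROUP BY u.level;
""")
for level, plays in cur.fetchall():
    print(level, plays)

conn.close()
```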