GitHub - BryanHolbrook/spark-startup-data-modeling

1. Purpose of the database in the context of the startup, Sparkify, and their analytical goals

This Postgres database gives Sparkify easy access to data from its new music streaming app, so the team can gain further insight into its customers' habits and listening patterns. The data covers users, songs, and user activity.

2. How to run Python scripts

Run the scripts in the Jupyter notebook by selecting the cell containing the query you are interested in and clicking 'Run' in the panel above, or call the functions individually via the 'etl.py' file.
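As a rough sketch of running the scripts outside the notebook, the helper below launches them one after another from Python. The order ('create_tables.py' first, then 'etl.py') is an assumption based on the file descriptions, not something this README states, and `build_commands` and `run_pipeline` are hypothetical names introduced only for this example.

```python
import subprocess
import sys
from pathlib import Path

# Assumed order: create the tables first, then load data into them.
SCRIPTS = ["create_tables.py", "etl.py"]


def build_commands(repo_dir="."):
    """Return one command per script, using the current Python interpreter."""
    return [[sys.executable, str(Path(repo_dir) / name)] for name in SCRIPTS]


def run_pipeline(repo_dir="."):
    """Run each script in turn; stop at the first one that fails."""
    for cmd in build_commands(repo_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the repository root would then run both scripts, provided the Postgres database they connect to is reachable.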

3. Explanation of files in the repository

The repository has the following files:
'data' - The current data that we are working with.
'etl.ipynb' - The notebook analyzing the data and walking through the process of understanding the code.
'test.ipynb' - Code tests ensuring functionality.
'create_tables.py' - Functions for creating our tables.
'etl.py' - Python functions to run our code independently of the notebook.
'sql_queries.py' - SQL queries for creating tables and inserting data into them.
'README.md' - Description of the project.

4. Database schema design, ETL pipeline, and design justification

The database schema has four dimension tables, 'users', 'songs', 'artists', and 'time', and one fact table, 'songplays'. The ETL pipeline pulls data provided by Sparkify, combining the 'song_data' and 'log_data' files to tie users and listening times to the songs data for user activity insights. These tables have been designed to optimize queries for song play analysis of Sparkify's users.
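To illustrate how a fact table and its dimension tables support song play analysis, here is a minimal sketch. It uses Python's built-in `sqlite3` with an in-memory database instead of Postgres so that it runs without a server. The column names, the sample rows, and the `plays_per_song` helper are all assumptions made for illustration; the actual definitions live in 'sql_queries.py'.

```python
import sqlite3

# In-memory SQLite stands in for Postgres so the sketch needs no server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Column names below are illustrative assumptions, not the project's schema.
cur.executescript("""
CREATE TABLE users   (user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT);
CREATE TABLE artists (artist_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE songs   (song_id TEXT PRIMARY KEY, title TEXT,
                      artist_id TEXT REFERENCES artists(artist_id));
CREATE TABLE time    (start_time INTEGER PRIMARY KEY, hour INTEGER, weekday INTEGER);
-- Fact table: one row per song play, pointing at each dimension.
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    start_time  INTEGER REFERENCES time(start_time),
    user_id     INTEGER REFERENCES users(user_id),
    song_id     TEXT REFERENCES songs(song_id),
    artist_id   TEXT REFERENCES artists(artist_id)
);
""")

# Made-up sample rows, only to give the query something to aggregate.
cur.executemany("INSERT INTO users VALUES (?, ?, ?)",
                [(1, "Ada", "free"), (2, "Lin", "paid")])
cur.execute("INSERT INTO artists VALUES ('AR1', 'Artist One')")
cur.executemany("INSERT INTO songs VALUES (?, ?, ?)",
                [("SO1", "Song One", "AR1"), ("SO2", "Song Two", "AR1")])
cur.executemany("INSERT INTO time VALUES (?, ?, ?)",
                [(1000, 9, 0), (2000, 10, 0), (3000, 21, 5)])
cur.executemany("INSERT INTO songplays VALUES (?, ?, ?, ?, ?)",
                [(1, 1000, 1, "SO1", "AR1"),
                 (2, 2000, 2, "SO1", "AR1"),
                 (3, 3000, 2, "SO2", "AR1")])
conn.commit()


def plays_per_song(connection):
    """Count plays per song title by joining the fact table to 'songs'."""
    query = """
        SELECT s.title, COUNT(*) AS plays
        FROM songplays sp
        JOIN songs s ON s.song_id = sp.song_id
        GROUP BY s.title
        ORDER BY plays DESC, s.title
    """
    return connection.execute(query).fetchall()


print(plays_per_song(conn))
```

Because each play is a single row in 'songplays', questions such as "which songs are played most" or "when do paid users listen" reduce to a join between the fact table and one or two dimensions followed by an aggregate.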

