Crowdfunding ETL Data Pipeline Project

In this repository you will have an ETL mini project.
We worked with ETL pipeline using Python, Pandas, and either Python dictionary methods or regular expressions to extract and transform the data.
After transforming the data, we followed the steps to create four CSV files and use the CSV file data to create an ERD and a table schemas.
Finally, we uploaded the CSV file data into a #Postgres database.

This repository contains 2 directories and 1 Jupyter Notebook file.

1. Directory “Resources”:

• contacts.xlsx & crowdfunding.xlsx files which were given. These are the first source of information used to run the Jupyter Notebook file. • campaign.csv / category.csv / subcategory.csv & contacts.csv files. These are created automatically when running the codes in Jupyter Notebook file.

2. Directory “Crowdfunding_db_schema and ERD_Diagram”:

• crowdfunding_db_schema.sql Contains the creation of tables to later be used for importing the CSV files. • Data modeling_CrowdFunding(Mini Project) (EXTRA) (Roshni-Laura).sql These references the last part of the challenge, where we create the database schema and the data
modeling in SQL.

• ERD Diagram_Data modeling_CrowdFunding(Mini Project)(Roshni-Laura).png

• ERD_FKeys_ PKeys_Data modeling_CrowdFunding(Mini Project)(Roshni-Laura).pdf

File: ETL_Mini_Project_Starter_Code_RRana_LRoa.ipynb contains all the codes to clean-up the database and export the CSV files.
Each step is well commented and has the desired results.

Please, note that in contacts dataframe, 2 options were given to dataframe creations.
Following the first option was given an error which was not understood by us, so we decided to go with second option.
Please, ignore option 1 for this purposes.

Exctracted Category - Sub- category Data Frame

Extracted Campaign DataFrame

Contacts Data Frame

Creating the Crowdfunding Database

Instructions to run:

crowdfunding_bd_schema.sql (Select statements provided to view the tables)
ETL_Mini_Project_Starter_Code_RRana_LRoa.ipynb

Challenges faced:

Altering the datatype while importing the CSV files to Postgres database.
Kernel needed to be restarted frecuently, showing constant errors when the code were correct.
Date&Time datatype codes were giving an error, needed to further investigate.
Str.split not working in pandas, opted for second option using regex.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
Crowdfunding_db_schema and ERD_Diagram		Crowdfunding_db_schema and ERD_Diagram
Resources		Resources
ETL_Mini_Project_Starter_Code_RRana_LRoa.ipynb		ETL_Mini_Project_Starter_Code_RRana_LRoa.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Crowdfunding ETL Data Pipeline Project

This repository contains 2 directories and 1 Jupyter Notebook file.

1. Directory “Resources”:

2. Directory “Crowdfunding_db_schema and ERD_Diagram”:

Exctracted Category - Sub- category Data Frame

Extracted Campaign DataFrame

Contacts Data Frame

Creating the Crowdfunding Database

Instructions to run:

Challenges faced:

About

Releases

Packages

Contributors 2

Languages

RoshniRanaDS27/Crowdfunding_ETL_Data_Engineering_Pipeline

Folders and files

Latest commit

History

Repository files navigation

Crowdfunding ETL Data Pipeline Project

This repository contains 2 directories and 1 Jupyter Notebook file.

1. Directory “Resources”:

2. Directory “Crowdfunding_db_schema and ERD_Diagram”:

Exctracted Category - Sub- category Data Frame

Extracted Campaign DataFrame

Contacts Data Frame

Creating the Crowdfunding Database

Instructions to run:

Challenges faced:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages