Skip to content

ETL pipeline using Python, Pandas, and Python dictionary methods and uploaded the CSV file data into a #Postgres database

Notifications You must be signed in to change notification settings

RoshniRanaDS27/Crowdfunding_ETL_Data_Engineering_Pipeline

Repository files navigation

Crowdfunding ETL Data Pipeline Project

Project 18

  • In this repository you will have an ETL mini project.
  • We worked with ETL pipeline using Python, Pandas, and either Python dictionary methods or regular expressions to extract and transform the data.
  • After transforming the data, we followed the steps to create four CSV files and use the CSV file data to create an ERD and a table schemas.
  • Finally, we uploaded the CSV file data into a #Postgres database.

This repository contains 2 directories and 1 Jupyter Notebook file.

image

1. Directory “Resources”:

• contacts.xlsx & crowdfunding.xlsx files which were given. These are the first source of information used to run the Jupyter Notebook file. • campaign.csv / category.csv / subcategory.csv & contacts.csv files. These are created automatically when running the codes in Jupyter Notebook file.
image

2. Directory “Crowdfunding_db_schema and ERD_Diagram”:

• crowdfunding_db_schema.sql Contains the creation of tables to later be used for importing the CSV files. • Data modeling_CrowdFunding(Mini Project) (EXTRA) (Roshni-Laura).sql These references the last part of the challenge, where we create the database schema and the data
modeling in SQL.
image image

• ERD Diagram_Data modeling_CrowdFunding(Mini Project)(Roshni-Laura).png image

• ERD_FKeys_ PKeys_Data modeling_CrowdFunding(Mini Project)(Roshni-Laura).pdf

File: ETL_Mini_Project_Starter_Code_RRana_LRoa.ipynb contains all the codes to clean-up the database and export the CSV files.
Each step is well commented and has the desired results.

Please, note that in contacts dataframe, 2 options were given to dataframe creations.
Following the first option was given an error which was not understood by us, so we decided to go with second option.
Please, ignore option 1 for this purposes.

Exctracted Category - Sub- category Data Frame

image image image image image image image image

Extracted Campaign DataFrame

image image image image image image image image image image

Contacts Data Frame

image image image image image image image image image image image image image image image image image

Creating the Crowdfunding Database

image image

Instructions to run:

  1. crowdfunding_bd_schema.sql (Select statements provided to view the tables)
  2. ETL_Mini_Project_Starter_Code_RRana_LRoa.ipynb

Challenges faced:

  • Altering the datatype while importing the CSV files to Postgres database.
  • Kernel needed to be restarted frecuently, showing constant errors when the code were correct.
  • Date&Time datatype codes were giving an error, needed to further investigate.
  • Str.split not working in pandas, opted for second option using regex.

About

ETL pipeline using Python, Pandas, and Python dictionary methods and uploaded the CSV file data into a #Postgres database

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published