This project is a proof-of-concept ETL pipeline built around a Kaggle dataset of salary information for people living in San Francisco. The dataset is a CSV file holding personal information such as name and salary, and is extracted from local DBFS. The raw data is batch-loaded into a staging area, where two new columns are added: ingest date and ingest time. The staged data is then cleaned and analyzed with various Python libraries. Finally, the cleaned data is loaded into a SQL database (SQLite, queried through DBeaver) and query operations are run against it.

Tech stack: Azure Databricks, DBeaver/SQLite
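A minimal sketch of the batch-ingest step on Databricks, where a `spark` session is predefined in every notebook. The DBFS paths (`/FileStore/tables/salaries.csv`, `/FileStore/staging/salaries`) are illustrative placeholders, not the project's actual locations:

```python
from pyspark.sql import functions as F

# Read the raw CSV from DBFS; first row holds column names,
# and Spark infers column types from the data.
raw_df = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("dbfs:/FileStore/tables/salaries.csv")
)

# Stamp every row with the date and time it was ingested.
staged_df = (
    raw_df
    .withColumn("ingest_date", F.current_date())
    .withColumn("ingest_time", F.date_format(F.current_timestamp(), "HH:mm:ss"))
)

# Land the stamped data in the staging area.
staged_df.write.mode("overwrite").parquet("dbfs:/FileStore/staging/salaries")
```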
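A hedged sketch of the cleaning and analysis step using pandas. The column names (`EmployeeName`, `JobTitle`, `TotalPay`) follow the public Kaggle SF Salaries schema and are an assumption about this project's copy of the data:

```python
import pandas as pd

# The POC dataset is small, so collecting the staged Spark frame
# into pandas for cleaning is practical.
df = staged_df.toPandas()

# Coerce the pay column to numeric; non-numeric strings such as
# "Not Provided" become NaN.
df["TotalPay"] = pd.to_numeric(df["TotalPay"], errors="coerce")

# Drop exact duplicates and rows missing the fields we analyze.
df = df.drop_duplicates().dropna(subset=["EmployeeName", "JobTitle", "TotalPay"])

# Example analysis: average total pay per job title.
avg_pay = (
    df.groupby("JobTitle")["TotalPay"]
      .mean()
      .sort_values(ascending=False)
)
print(avg_pay.head(10))
```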
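A minimal sketch of the load-and-query step, assuming the cleaned frame is written to a local SQLite file (`salaries.db`) that DBeaver then connects to; the database file and table name are illustrative:

```python
import sqlite3

conn = sqlite3.connect("salaries.db")

# Write the cleaned data; replace the table if it already exists.
df.to_sql("sf_salaries", conn, if_exists="replace", index=False)

# Example query: the five highest-paid job titles on average.
cursor = conn.execute(
    """
    SELECT JobTitle, AVG(TotalPay) AS avg_pay
    FROM sf_salaries
    GROUP BY JobTitle
    ORDER BY avg_pay DESC
    LIMIT 5
    """
)
for row in cursor.fetchall():
    print(row)

conn.close()
```

The same queries can be run interactively in DBeaver by opening `salaries.db` as a SQLite connection.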