NYC-Taxi-Data-Engineering-Project

Overview

This project extracts NYC taxi data from an API and stores it in a data lake. In Databricks, the data is transformed with PySpark following the medallion architecture, moving through bronze, silver, and gold layers. Delta tables are created in Azure Data Lake Storage (ADLS), and Power BI connects to them for dynamic visualizations.
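As a rough illustration of how the three layers could be laid out in ADLS, the sketch below builds `abfss://` URIs for each layer. It assumes one ADLS Gen2 container per layer, named `bronze`, `silver`, and `gold`, in a hypothetical storage account `nyctaxistorage`; neither the container names nor the account name come from this project.

```python
# Hypothetical names: the storage account below is a placeholder, and the
# one-container-per-layer layout is an assumption, not this project's setup.
STORAGE_ACCOUNT = "nyctaxistorage"
LAYERS = ("bronze", "silver", "gold")


def layer_path(layer: str, dataset: str, account: str = STORAGE_ACCOUNT) -> str:
    """Return the abfss:// URI of a dataset inside one medallion layer.

    Assumes each layer is an ADLS Gen2 container named after the layer.
    """
    if layer not in LAYERS:
        raise ValueError(f"unknown layer {layer!r}; expected one of {LAYERS}")
    return f"abfss://{layer}@{account}.dfs.core.windows.net/{dataset}"


if __name__ == "__main__":
    # Print where a hypothetical "trip_data" dataset would live in each layer.
    for layer in LAYERS:
        print(layer_path(layer, "trip_data"))
```

In a Databricks notebook, such a URI would typically be passed to a reader or writer, for example `spark.read.parquet(layer_path("bronze", "trip_data"))` or `df.write.format("delta").save(layer_path("silver", "trip_data"))`.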

Data Architecture

Lessons Learned

Developed a parameterized data pipeline to extract data dynamically from the website.
Enabled Databricks to access ADLS through a service principal for secure, controlled access.
Adopted the medallion architecture (bronze, silver, and gold layers) for systematic data transformation and enrichment.
Used Delta Lake for efficient data storage and created Delta tables on top of the stored data.
Leveraged Delta Lake data versioning and time travel to retrieve historical data and roll back changes.
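The idea behind the parameterized extraction pipeline can be sketched as a function that turns a few parameters (taxi type and a month range) into one source URL per month. This is a minimal sketch, assuming the files follow the NYC TLC public trip-record naming scheme `<taxi_type>_tripdata_<YYYY-MM>.parquet` under the base URL shown; the source this project actually pulls from may differ.

```python
from datetime import date

# Assumption: NYC TLC public trip-record file layout; this project's
# actual endpoint and naming may differ.
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def month_range(start: date, end: date):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1


def build_source_urls(taxi_type: str, start: date, end: date) -> list[str]:
    """Build one download URL per month.

    taxi_type and the date range play the role of pipeline parameters.
    """
    return [
        f"{BASE_URL}/{taxi_type}_tripdata_{d:%Y-%m}.parquet"
        for d in month_range(start, end)
    ]


if __name__ == "__main__":
    for url in build_source_urls("green", date(2023, 1, 1), date(2023, 3, 1)):
        print(url)
```

Keeping the month and taxi type as parameters means the same pipeline can backfill a range of months or fetch a single new month without code changes.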
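Service-principal access from Databricks to ADLS Gen2 is commonly configured through the ABFS OAuth settings in the Spark session. The sketch below only assembles those settings as a dictionary; the storage account, client ID, secret, and tenant ID are placeholders, and it is an assumption that this project used session-level configuration rather than, for example, a mount point.

```python
def service_principal_spark_conf(
    storage_account: str, client_id: str, client_secret: str, tenant_id: str
) -> dict[str, str]:
    """Return Spark settings for OAuth (client-credentials) access to ADLS Gen2.

    All argument values are placeholders supplied by the caller.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
```

In a notebook, each pair would be applied with `spark.conf.set(key, value)`, and the client secret would normally come from a secret scope via `dbutils.secrets.get(scope=..., key=...)` rather than being written into the notebook.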
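A typical bronze-to-silver step parses raw fields, drops invalid rows, and derives date columns. The following is a plain-Python stand-in for that kind of PySpark transformation, not the code in this project's silver notebook; the field names `pickup_datetime` and `trip_distance` and the filtering rules are hypothetical.

```python
from datetime import datetime


def to_silver(bronze_rows: list[dict]) -> list[dict]:
    """Clean raw (bronze) trip records into conformed (silver) records.

    Plain-Python stand-in for a PySpark step; field names are hypothetical.
    In PySpark this would roughly be df.filter(...) plus withColumn(...)
    calls using functions such as to_date, year, and month.
    """
    silver = []
    for row in bronze_rows:
        try:
            pickup = datetime.fromisoformat(row["pickup_datetime"])
            distance = float(row["trip_distance"])
        except (KeyError, TypeError, ValueError):
            continue  # drop rows with missing or unparseable fields
        if distance <= 0:
            continue  # drop trips with no recorded distance
        silver.append({
            **row,
            "pickup_datetime": pickup,  # typed timestamp instead of a string
            "trip_distance": distance,  # numeric instead of a string
            "trip_date": pickup.date(),
            "trip_year": pickup.year,
            "trip_month": pickup.month,
        })
    return silver
```

A gold step would then aggregate these cleaned rows (for example, trips per day) into tables shaped for reporting.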
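Creating Delta tables over data already stored in ADLS, and using versioning and time travel on them, comes down to a few Delta Lake SQL statements. The helpers below only build those statements for `spark.sql(...)`; the table names and locations are hypothetical, and the values are interpolated directly, so they should come from trusted configuration rather than user input.

```python
def create_table_sql(table: str, location: str) -> str:
    """Register a Delta table over files already written to ADLS."""
    return f"CREATE TABLE IF NOT EXISTS {table} USING DELTA LOCATION '{location}'"


def history_sql(table: str) -> str:
    """List the table's versions, one row per commit."""
    return f"DESCRIBE HISTORY {table}"


def time_travel_sql(table: str, version: int) -> str:
    """Read the table as it was at an earlier version."""
    return f"SELECT * FROM {table} VERSION AS OF {int(version)}"


def restore_sql(table: str, version: int) -> str:
    """Roll the table back to an earlier version."""
    return f"RESTORE TABLE {table} TO VERSION AS OF {int(version)}"
```

For example, `spark.sql(history_sql("gold.trips"))` shows which versions exist, `spark.sql(time_travel_sql("gold.trips", 0))` reads the first version, and the DataFrame reader offers the same through `spark.read.format("delta").option("versionAsOf", 0).load(path)`.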

Repository: Maaztajmohammed/NYC-Taxi-Data-Engineering-Project
