Developed and implemented an end-to-end ETL pipeline using Mage.ai to extract, transform, and load the Uber dataset into Google BigQuery for data analysis. Utilized Google Cloud Storage to store and manage raw data files throughout the data processing workflow. Structured and optimized the data in BigQuery for fast, scalable querying and reporting. Created interactive data visualizations and dashboards in Looker Studio to present insights on ride patterns, peak usage times, and customer behavior. Ensured data accuracy and performance by applying best practices in cloud-based data engineering.
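The transform step of a pipeline like this could look roughly like the following stdlib-only sketch. It assumes the raw CSV uses the TLC yellow-taxi column names (`VendorID`, `tpep_pickup_datetime`, and so on) and models the output as a simple star schema: a datetime dimension plus a fact table joined on `datetime_id`. The `transform` function, the table layout, and the sample rows are hypothetical illustrations, not the project's actual Mage.ai transformer block.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows in the shape of the TLC yellow-taxi columns (not the real file).
SAMPLE_CSV = """VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,fare_amount,total_amount
1,2016-03-01 00:00:00,2016-03-01 00:07:55,1,2.50,9.0,12.35
2,2016-03-01 08:15:00,2016-03-01 08:40:00,2,5.10,19.5,23.80
"""

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def transform(raw_csv: str) -> tuple[list[dict], list[dict]]:
    """Split raw trip rows into a datetime dimension and a fact table (assumed star schema)."""
    datetime_dim: list[dict] = []
    fact_table: list[dict] = []
    reader = csv.DictReader(io.StringIO(raw_csv))
    for trip_id, row in enumerate(reader):
        pickup = datetime.strptime(row["tpep_pickup_datetime"], TIMESTAMP_FORMAT)
        dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIMESTAMP_FORMAT)
        # One dimension row per trip, keyed by a surrogate id.
        datetime_dim.append({
            "datetime_id": trip_id,
            "pick_hour": pickup.hour,
            "pick_weekday": pickup.weekday(),  # Monday == 0
            "drop_hour": dropoff.hour,
        })
        # The fact row keeps the numeric measures plus a foreign key into the dimension.
        fact_table.append({
            "trip_id": trip_id,
            "datetime_id": trip_id,
            "vendor_id": int(row["VendorID"]),
            "passenger_count": int(row["passenger_count"]),
            "trip_distance": float(row["trip_distance"]),
            "total_amount": float(row["total_amount"]),
        })
    return datetime_dim, fact_table


if __name__ == "__main__":
    dims, facts = transform(SAMPLE_CSV)
    print(dims[1])
    print(facts[1])
```

Inside a Mage.ai block the same logic would more typically operate on a pandas DataFrame; plain lists of dicts are used here only to keep the sketch dependency-free. Each resulting table could then be exported to its own BigQuery table for querying from Looker Studio.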
- Programming Language (Python)
- MySQL
- Google Cloud Platform
- BigQuery
- Cloud Storage
- Looker Studio
- Compute Engine (VM instance)
- Mage.ai (Modern Data Pipeline Tool)
Modern Data Pipeline Tool: https://www.mage.ai/
TLC Trip Record Data: Yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
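The pick-up and drop-off timestamps described above are enough to derive trip duration and the busiest pick-up hours, the kind of "peak usage" figure a dashboard would show. The sketch below is a hypothetical illustration: the `Trip` class and `peak_pickup_hours` function are invented names, and only a subset of the record fields is modeled.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Trip:
    """A subset of the yellow-taxi trip fields (illustrative names, not the CSV headers)."""
    pickup: datetime
    dropoff: datetime
    trip_distance: float  # miles, as reported by the taximeter
    passenger_count: int  # driver-entered value

    @property
    def duration_minutes(self) -> float:
        return (self.dropoff - self.pickup).total_seconds() / 60


def peak_pickup_hours(trips: list[Trip], top: int = 3) -> list[tuple[int, int]]:
    """Return the `top` busiest pick-up hours as (hour, trip_count) pairs."""
    counts = Counter(trip.pickup.hour for trip in trips)
    return counts.most_common(top)


if __name__ == "__main__":
    trips = [
        Trip(datetime(2016, 3, 1, 8, 5), datetime(2016, 3, 1, 8, 20), 2.0, 1),
        Trip(datetime(2016, 3, 1, 8, 30), datetime(2016, 3, 1, 8, 45), 3.1, 2),
        Trip(datetime(2016, 3, 1, 18, 0), datetime(2016, 3, 1, 18, 25), 4.4, 1),
    ]
    print(trips[0].duration_minutes)  # 15.0
    print(peak_pickup_hours(trips, top=1))  # [(8, 2)]
```

In BigQuery the equivalent would be a `GROUP BY` on the pick-up hour; computing it in Python like this is mainly useful for sanity-checking the transformed data before loading.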
Dataset used: https://github.com/DillipKumarNayak2000/New_York_taxi_Projects/blob/main/uber_data.csv
- Original Data Source: https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
- Data Dictionary: https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf
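Fields such as rate type and payment type are stored as numeric codes, and the data dictionary above defines their meanings. A small lookup like the following sketch could turn them into readable labels for a dimension table or a dashboard. The code tables reflect codes 1 to 6 as commonly listed in the yellow-taxi dictionary; newer revisions may add values, so treat the mapping as an assumption to verify against the linked PDF. The `decode` helper is hypothetical.

```python
# Code tables assumed from the TLC yellow-taxi data dictionary (codes 1-6).
RATE_CODES = {
    1: "Standard rate",
    2: "JFK",
    3: "Newark",
    4: "Nassau or Westchester",
    5: "Negotiated fare",
    6: "Group ride",
}

PAYMENT_TYPES = {
    1: "Credit card",
    2: "Cash",
    3: "No charge",
    4: "Dispute",
    5: "Unknown",
    6: "Voided trip",
}


def decode(raw_code: str, table: dict[int, str]) -> str:
    """Map a raw CSV code (e.g. "1" or "1.0") to its label; unmapped codes are flagged, not dropped."""
    try:
        code = int(float(raw_code))
    except ValueError:
        return "Invalid"
    return table.get(code, f"Unmapped ({code})")


if __name__ == "__main__":
    print(decode("1", PAYMENT_TYPES))  # Credit card
    print(decode("2.0", RATE_CODES))  # JFK
    print(decode("99", RATE_CODES))  # Unmapped (99)
```

Accepting strings like `"2.0"` is a defensive choice: code columns that contain missing values are often re-serialized as floats by tools such as pandas.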