An end-to-end data engineering project delivering ride-hailing data analytics and insights, built with GCP services (Compute Engine and others), Mage AI (an open-source ETL tool), and Python pandas for scripting the ETL processes.
ARCHITECTURE:
Technologies Used:
- GCP Services Used:
  a. Cloud Storage: For storing the dataset in CSV form.
  b. Compute Engine: A VM that runs the Mage instance executing the ETL pipeline.
  c. BigQuery: For storing the transformed data.
  d. Looker: For analyzing insights through visualizations.
- Programming Language: Python, scripting with pandas.
- Modern Data Pipeline Tool: Mage - https://www.mage.ai/
- Dataset Used: TLC Trip Record Data. Yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
  Cloud Storage link for the dataset - https://storage.googleapis.com/uber_data_analytics_sravya/uber_data.csv
  More info about the dataset can be found here:
  Website - https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
  Data Dictionary - https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf
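The extract and transform steps that Mage orchestrates boil down to pandas operations over these trip records. Below is a minimal, hedged sketch of that flow: it uses a tiny in-memory sample mirroring the TLC yellow taxi column names (`tpep_pickup_datetime`, `tpep_dropoff_datetime`, etc.) instead of the real Cloud Storage CSV, and the derived `trip_duration_min` column is an illustrative transform, not necessarily the exact one used in the pipeline.

```python
import io
import pandas as pd

# Hypothetical two-row sample mirroring the TLC yellow taxi schema;
# in the real pipeline, pd.read_csv would point at the Cloud Storage URL.
SAMPLE_CSV = """tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,fare_amount
2016-03-01 00:00:00,2016-03-01 00:07:55,1,2.50,9.0
2016-03-01 00:00:00,2016-03-01 00:11:06,2,2.90,11.0
"""

def extract(source) -> pd.DataFrame:
    """Extract step: read the raw trip records into a DataFrame."""
    return pd.read_csv(source)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform step: parse timestamps and derive trip duration in minutes."""
    df = df.copy()
    df["tpep_pickup_datetime"] = pd.to_datetime(df["tpep_pickup_datetime"])
    df["tpep_dropoff_datetime"] = pd.to_datetime(df["tpep_dropoff_datetime"])
    df["trip_duration_min"] = (
        df["tpep_dropoff_datetime"] - df["tpep_pickup_datetime"]
    ).dt.total_seconds() / 60
    return df

df = transform(extract(io.StringIO(SAMPLE_CSV)))
print(df[["trip_distance", "trip_duration_min"]])
```

In Mage, the `extract` and `transform` functions above would live in separate data-loader and transformer blocks, with an exporter block writing the result to BigQuery.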
DATA MODEL:
- Published Looker Dashboard Results : https://lookerstudio.google.com/reporting/fd70012d-e6fe-4ce2-8f83-917e6a07af9a
Thanks to Darshil Parmar for providing such a great learning experience.