sarutlaa/Ride-Hailing-Data-Analytics

Ride-Hailing-Data-Analytics

An end-to-end data engineering project for ride-hailing data analytics and insights, built with GCP services (Cloud Storage, Compute Engine, BigQuery, Looker), Mage AI (an open-source ETL tool), and Python pandas for scripting the ETL processes.

ARCHITECTURE:

[Architecture diagram]
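According to the services listed under Technologies Used, the raw CSV sits in Cloud Storage, a Mage pipeline on a Compute Engine VM transforms it, the result is loaded into BigQuery, and Looker visualizes it. The extract-and-clean step could look roughly like the pandas sketch below. This is an assumption, not the project's actual Mage block: the column names follow the TLC yellow-taxi data dictionary, and `extract`, `clean`, and the `trip_id` surrogate key are hypothetical names. In Mage, logic like this would typically live in a data loader block and a transformer block.

```python
import io

import pandas as pd

# Public CSV referenced in this README (reading it requires network access).
DATA_URL = "https://storage.googleapis.com/uber_data_analytics_sravya/uber_data.csv"

# Assumed timestamp columns, named as in the TLC yellow-taxi data dictionary.
DATETIME_COLUMNS = ["tpep_pickup_datetime", "tpep_dropoff_datetime"]


def extract(source=DATA_URL) -> pd.DataFrame:
    """Read the raw trip CSV from a URL, local path, or file-like object."""
    return pd.read_csv(source)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, parse timestamps, and add a hypothetical trip_id key."""
    df = df.drop_duplicates().reset_index(drop=True)
    for col in DATETIME_COLUMNS:
        df[col] = pd.to_datetime(df[col])
    df["trip_id"] = df.index  # surrogate key 0..n-1 after de-duplication
    return df


if __name__ == "__main__":
    # Tiny in-memory sample (made-up values) so the sketch runs offline.
    sample = io.StringIO(
        "VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,trip_distance,fare_amount\n"
        "1,2016-03-01 00:00:00,2016-03-01 00:07:55,2.50,9.0\n"
        "1,2016-03-01 00:00:00,2016-03-01 00:07:55,2.50,9.0\n"
        "2,2016-03-01 00:00:01,2016-03-01 00:11:06,2.90,11.0\n"
    )
    trips = clean(extract(sample))
    print(len(trips))  # 2: the duplicated first row is dropped
```

Calling `extract()` with no argument would read the full dataset from the Cloud Storage link instead of the sample.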

Technologies Used:

  1. GCP Services Used :

     a. Cloud Storage : Stores the raw dataset as a CSV file.
     
     b. Compute Engine : A VM that hosts the Mage instance running the ETL pipeline.
     
     c. BigQuery : Data warehouse that stores the transformed data.
     
     d. Looker Studio : For analysing insights through visualizations.
    
  2. Programming Language : Python, using pandas for the ETL scripts.

  3. Modern Data Pipeline Tool : Mage - https://www.mage.ai/

  4. Dataset Used : TLC Trip Record Data. The yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.

    Cloud Storage link for the dataset - https://storage.googleapis.com/uber_data_analytics_sravya/uber_data.csv

    More information about the dataset can be found here:

    Website - https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page

    Data Dictionary - https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf

  5. DATA MODEL:

[Data model diagram]
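As an illustration only, a star schema for this dataset could be derived with pandas as sketched below. The table and key names (`rate_code_dim`, `payment_type_dim`, `datetime_dim`, `fact_table`, and the `*_id` columns) are hypothetical and may not match the project's actual model; the code-to-name lookups come from the TLC yellow-taxi data dictionary. The sketch assumes a cleaned DataFrame with one row per hypothetical `trip_id` plus the TLC columns `RatecodeID`, `payment_type`, `tpep_pickup_datetime`, `tpep_dropoff_datetime`, `trip_distance`, `fare_amount`, and `total_amount`.

```python
import pandas as pd

# Code lookups from the TLC yellow-taxi data dictionary.
RATE_CODE_NAMES = {
    1: "Standard rate", 2: "JFK", 3: "Newark",
    4: "Nassau or Westchester", 5: "Negotiated fare", 6: "Group ride",
}
PAYMENT_TYPE_NAMES = {
    1: "Credit card", 2: "Cash", 3: "No charge",
    4: "Dispute", 5: "Unknown", 6: "Voided trip",
}


def build_lookup_dim(trips: pd.DataFrame, code_col: str, names: dict, id_col: str) -> pd.DataFrame:
    """Build a dimension table with one row per distinct code in `code_col`."""
    dim = trips[[code_col]].drop_duplicates().sort_values(code_col).reset_index(drop=True)
    dim[id_col] = dim.index  # hypothetical surrogate key 0..n-1
    dim[f"{code_col}_name"] = dim[code_col].map(names)
    return dim[[id_col, code_col, f"{code_col}_name"]]


def build_star_schema(trips: pd.DataFrame) -> dict:
    """Split cleaned trips into hypothetical dimension tables plus a fact table."""
    rate_dim = build_lookup_dim(trips, "RatecodeID", RATE_CODE_NAMES, "rate_code_id")
    payment_dim = build_lookup_dim(trips, "payment_type", PAYMENT_TYPE_NAMES, "payment_type_id")

    # One datetime row per trip, with a few derived attributes for reporting.
    datetime_dim = trips[["trip_id", "tpep_pickup_datetime", "tpep_dropoff_datetime"]].copy()
    datetime_dim["pick_hour"] = datetime_dim["tpep_pickup_datetime"].dt.hour
    datetime_dim["pick_weekday"] = datetime_dim["tpep_pickup_datetime"].dt.weekday
    datetime_dim = datetime_dim.rename(columns={"trip_id": "datetime_id"})

    # Swap raw codes for surrogate keys; sort so rows follow trip_id order.
    fact = (
        trips.merge(rate_dim[["rate_code_id", "RatecodeID"]], on="RatecodeID")
        .merge(payment_dim[["payment_type_id", "payment_type"]], on="payment_type")
        .sort_values("trip_id")
        .reset_index(drop=True)
    )
    fact["datetime_id"] = fact["trip_id"]
    fact = fact[["trip_id", "datetime_id", "rate_code_id", "payment_type_id",
                 "trip_distance", "fare_amount", "total_amount"]]

    return {
        "rate_code_dim": rate_dim,
        "payment_type_dim": payment_dim,
        "datetime_dim": datetime_dim,
        "fact_table": fact,
    }
```

Keeping the descriptive labels in small dimension tables keeps the fact table narrow, and BigQuery and the dashboard can join back to readable names through integer keys.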

Published Looker Studio Dashboard : https://lookerstudio.google.com/reporting/fd70012d-e6fe-4ce2-8f83-917e6a07af9a
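Dashboard charts are usually driven by aggregate queries over the modeled tables. As a hypothetical example (not taken from the published report), trips, revenue, and average fare per payment type could be computed with pandas as below; in the deployed setup the same aggregation would more likely run as a BigQuery query feeding the dashboard. The column names (`trip_id`, `payment_type_id`, `fare_amount`, `total_amount`, `payment_type_name`) and the sample values are assumptions for illustration.

```python
import pandas as pd


def revenue_by_payment_type(fact: pd.DataFrame, payment_dim: pd.DataFrame) -> pd.DataFrame:
    """Aggregate trips, revenue, and average fare per payment type, largest revenue first."""
    joined = fact.merge(payment_dim, on="payment_type_id", how="left")
    return (
        joined.groupby("payment_type_name", as_index=False)
        .agg(
            trips=("trip_id", "count"),
            total_revenue=("total_amount", "sum"),
            avg_fare=("fare_amount", "mean"),
        )
        .sort_values("total_revenue", ascending=False)
        .reset_index(drop=True)
    )


if __name__ == "__main__":
    # Made-up toy rows, only to show the shape of the result.
    fact = pd.DataFrame({
        "trip_id": [0, 1, 2],
        "payment_type_id": [0, 1, 0],
        "fare_amount": [9.0, 52.0, 6.5],
        "total_amount": [11.3, 58.3, 7.8],
    })
    payment_dim = pd.DataFrame({
        "payment_type_id": [0, 1],
        "payment_type_name": ["Credit card", "Cash"],
    })
    print(revenue_by_payment_type(fact, payment_dim))
```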

Thanks to Darshil Parmar for such a valuable learning experience.