This project showcases real-world taxi data analytics for New York City. Using GCP services, Python, a Compute Engine VM, and the Mage data pipeline tool, I prepare the raw trip data for analysis. The findings are visualized in Power BI and Looker Studio to inform data-driven decisions.
- Programming Language: Python
- Google Cloud Platform (GCP):
  - Cloud Storage
  - Compute Engine (VM instance)
  - BigQuery
  - Looker Studio
- Data Pipeline Tool: Mage-AI
- Data Visualization: Power BI & Looker Studio
I use the TLC Trip Record Data, which covers yellow and green taxi trip records, including pick-up and drop-off times, locations, trip distances, fares, and more. These datasets are provided under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP).
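The preliminary transformations mentioned below can be illustrated with a short pandas sketch. The column names follow the public TLC data dictionary for yellow taxis; the two rows of values are fabricated for illustration, since the real dataset is downloaded separately.

```python
import pandas as pd

# Tiny hand-made sample mimicking the TLC yellow-taxi schema
# (column names from the data dictionary; values are illustrative).
trips = pd.DataFrame({
    "tpep_pickup_datetime": ["2023-03-01 08:15:00", "2023-03-01 09:02:00"],
    "tpep_dropoff_datetime": ["2023-03-01 08:40:00", "2023-03-01 09:30:00"],
    "passenger_count": [1, 2],
    "trip_distance": [3.2, 5.8],
    "fare_amount": [14.5, 22.0],
})

# Parse timestamps and derive trip duration in minutes -- the kind of
# preliminary cleaning done in the Jupyter Notebook stage.
for col in ["tpep_pickup_datetime", "tpep_dropoff_datetime"]:
    trips[col] = pd.to_datetime(trips[col])
trips["trip_duration_min"] = (
    trips["tpep_dropoff_datetime"] - trips["tpep_pickup_datetime"]
).dt.total_seconds() / 60
```

The same duration column later feeds fare-per-minute style metrics in the dashboards.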
- Minimized Dataset Sample: Download Here
- Download the dataset to your local machine. Dataset Download
- Perform preliminary data transformations in Jupyter Notebooks. Codebase
- Initialize the GCP Console.
- Create and configure a GCP VM Instance. Installation Scripts
- Upload the dataset to a Google Cloud Storage bucket with 'Public Access' enabled so the pipeline can fetch it over HTTP.
- Establish a new Mage environment. Refer to Mage documentation for details.
- Implement a new firewall rule for Mage project access.
- Develop the ETL pipeline. ETL Framework
- Configure `io_config.yml` for BigQuery data export. (For assistance, contact the support team.)
- Create analytical data products using SQL scripts. SQL Scripts
- Develop a dashboard in Power BI or Looker Studio for visualization.
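The transformation step of the pipeline above can be sketched as a plain Python function. This is a minimal sketch, not the project's actual code: it assumes a star-schema style design where a datetime dimension is split out of the trip records, and in Mage this logic would sit in a transformer block between the loader (reading the CSV from Cloud Storage) and the exporter (writing to BigQuery via `io_config.yml`).

```python
import pandas as pd

def transform_datetime_dim(trips: pd.DataFrame) -> pd.DataFrame:
    """Build a datetime dimension table from pickup timestamps.

    Hypothetical column names; the surrogate key `datetime_id`
    would be joined back to the fact table before export.
    """
    dim = trips[["tpep_pickup_datetime"]].drop_duplicates().reset_index(drop=True)
    ts = pd.to_datetime(dim["tpep_pickup_datetime"])
    dim["pickup_hour"] = ts.dt.hour
    dim["pickup_day"] = ts.dt.day
    dim["pickup_month"] = ts.dt.month
    dim["pickup_weekday"] = ts.dt.weekday  # Monday=0 ... Sunday=6
    dim["datetime_id"] = dim.index         # surrogate key
    return dim

# Illustrative usage with two fabricated rows:
sample = pd.DataFrame({"tpep_pickup_datetime": ["2023-03-01 08:15:00",
                                                "2023-03-05 17:40:00"]})
datetime_dim = transform_datetime_dim(sample)
```

Each dimension (datetime, rate code, payment type, location, and so on) gets its own transform of this shape, and the exporter writes every resulting table to its BigQuery dataset.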

