
tpham45/GCP-Data-Engineer-Project

NYC Yellow Taxi Data Analytics Project

Introduction

This project analyzes real-world New York City yellow taxi trip data. I prepare the data with Python and the Mage data pipeline tool running on a Google Cloud Platform (GCP) virtual machine, then export it to BigQuery for analysis. The findings are visualized in Power BI and Looker Studio to support data-driven decisions.

Architecture Framework

Architecture Diagram

Technology Stack

  • Programming Language: Python
  • Google Cloud Platform (GCP):
    • Google Cloud Storage
    • Compute Engine (VM Instance)
    • BigQuery
    • Looker Studio
  • Data Pipeline Tool: Mage-AI
  • Data Visualization: Power BI & Looker Studio

Data Source

I use the TLC Trip Record Data, which contains yellow and green taxi trip records, including pick-up and drop-off times and locations, trip distances, fares, and more. The data were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP).
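As a rough illustration of the fields described above, the following sketch parses a couple of trip rows with the Python standard library and derives a trip duration. The column names are assumed to follow the TLC yellow taxi data dictionary (`tpep_pickup_datetime`, `PULocationID`, and so on) and may differ from the file used in this project; the sample rows and the `parse_trip` and `load_trips` helpers are made up for the example.

```python
import csv
import io
from datetime import datetime

# Two made-up rows in an assumed yellow taxi layout; values are illustrative only.
SAMPLE_CSV = """tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,PULocationID,DOLocationID,fare_amount
2023-01-01 00:00:00,2023-01-01 00:07:55,1,2.50,148,79,9.0
2023-01-01 00:10:00,2023-01-01 00:40:00,2,10.00,132,230,30.5
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_trip(row):
    """Convert one raw CSV row (all strings) into typed values."""
    pickup = datetime.strptime(row["tpep_pickup_datetime"], TIME_FORMAT)
    dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIME_FORMAT)
    return {
        "pickup": pickup,
        "dropoff": dropoff,
        # Derived field: trip duration in minutes.
        "duration_min": (dropoff - pickup).total_seconds() / 60,
        "passenger_count": int(row["passenger_count"]),
        "trip_distance": float(row["trip_distance"]),
        "pickup_zone": int(row["PULocationID"]),
        "dropoff_zone": int(row["DOLocationID"]),
        "fare_amount": float(row["fare_amount"]),
    }


def load_trips(handle):
    """Read every row from an open CSV file handle."""
    return [parse_trip(row) for row in csv.DictReader(handle)]


trips = load_trips(io.StringIO(SAMPLE_CSV))
for trip in trips:
    print(trip["pickup"], round(trip["duration_min"], 2), trip["fare_amount"])
```

To try this on a real download, replace the `io.StringIO(SAMPLE_CSV)` argument with an open file handle, after checking that the header matches the assumed column names.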

Additional Resources

ERD Diagram


Step-by-Step Guide

  1. Download the dataset to your local machine. Dataset Download
  2. Perform preliminary data transformations in Jupyter Notebooks. Codebase
  3. Initialize the GCP Console.
  4. Create and configure a GCP VM Instance. Installation Scripts
  5. Upload the dataset to Google Cloud Storage with 'Public Access' settings.
  6. Establish a new Mage environment. Refer to Mage documentation for details.
  7. Implement a new firewall rule for Mage project access.
  8. Develop the ETL pipeline. ETL Framework
  9. Configure io_config.yml for BigQuery data export. (For assistance, contact the support team)
  10. Create analytical data products using SQL scripts. SQL Scripts
  11. Develop a dashboard in Power BI or Looker Studio for visualization.
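The extract, transform, load, and query flow of steps 8 through 10 can be sketched in miniature as follows. This is not the project's Mage pipeline: it uses only the Python standard library, and an in-memory SQLite database stands in for BigQuery so the example runs anywhere. The sample rows, the zero-distance cleaning rule, the `trips` table layout, and all function names are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Made-up rows standing in for the file uploaded in step 5.
RAW_CSV = """tpep_pickup_datetime,passenger_count,trip_distance,fare_amount
2023-01-01 08:15:00,1,2.0,10.0
2023-01-01 08:45:00,2,4.0,20.0
2023-01-01 17:30:00,1,0.0,5.0
2023-01-01 17:50:00,3,6.0,30.0
"""


def extract(handle):
    """Extract: read raw rows as dictionaries of strings."""
    return list(csv.DictReader(handle))


def transform(rows):
    """Transform: cast types, derive the pickup hour, drop zero-distance trips."""
    cleaned = []
    for row in rows:
        distance = float(row["trip_distance"])
        if distance <= 0:
            continue  # hypothetical cleaning rule, not taken from the project
        cleaned.append((
            row["tpep_pickup_datetime"],
            int(row["tpep_pickup_datetime"][11:13]),  # hour of day
            int(row["passenger_count"]),
            distance,
            float(row["fare_amount"]),
        ))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records to a table (SQLite here, not BigQuery)."""
    conn.execute(
        "CREATE TABLE trips (pickup_datetime TEXT, pickup_hour INTEGER, "
        "passenger_count INTEGER, trip_distance REAL, fare_amount REAL)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()


def average_fare_by_hour(conn):
    """A small analytical query of the kind step 10 describes."""
    query = """
        SELECT pickup_hour, COUNT(*) AS trip_count, AVG(fare_amount) AS avg_fare
        FROM trips
        GROUP BY pickup_hour
        ORDER BY pickup_hour
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(io.StringIO(RAW_CSV))))
summary = average_fare_by_hour(conn)
for hour, trip_count, avg_fare in summary:
    print(hour, trip_count, avg_fare)
```

In a Mage project, each of the three stages would typically live in its own block (data loader, transformer, data exporter), with the exporter writing to BigQuery using the credentials configured in io_config.yml.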


About

Data Engineer - Data Analyst Project
