Mukesh-Sajjan/Data-Engineering-Project

Powerlifting Meets Data Pipeline

A data pipeline for batch processing of two CSV files (Open Powerlifting and Meets)

Web --> Prefect --> GCS --> DBT --> BigQuery --> Looker Studio

Objective

The project builds a weekly batch data pipeline for Meets.csv and OpenPowerlifting.csv. A Python script orchestrated with Prefect loads both files from the web into GCS, and from there they are ingested into the BigQuery data warehouse. Transformations then inner join the two files on meetid to bring their information together, and convert the data types of existing columns. The result is a new Powerliftingmaster table in BigQuery, which is connected to Looker Studio to build the dashboard.
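As a rough illustration of the join and type conversion described above, here is a minimal Python sketch using only the standard library. It is not the project's dbt model: the sample rows, the column names other than the join key (`MeetName`, `MeetCountry`, `Date`, `Name`, `TotalKg`, `Place`), and the helper names `read_rows` and `inner_join_on_meet_id` are assumptions made for the example.

```python
import csv
import io

# Tiny in-memory stand-ins for the two files. The rows and all column names
# except the join key are assumptions for illustration only.
MEETS_CSV = """MeetID,MeetName,MeetCountry,Date
0,Open Tournament,USA,2016-10-29
1,Spring Classic,Canada,2017-03-11
"""

LIFTERS_CSV = """MeetID,Name,TotalKg,Place
0,Alice,372.5,1
0,Bob,410.0,2
2,Carol,300.0,1
"""


def read_rows(text):
    """Parse CSV text into a list of dicts (every value is a string)."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join_on_meet_id(lifters, meets):
    """Inner join lifter rows to meet rows on MeetID, casting two columns."""
    # Index meets by key so each lifter row needs a single lookup.
    meets_by_id = {row["MeetID"]: row for row in meets}
    joined = []
    for lifter in lifters:
        meet = meets_by_id.get(lifter["MeetID"])
        if meet is None:
            # Inner join: drop lifter rows with no matching meet.
            continue
        merged = {**meet, **lifter}
        # Convert string columns to typed values.
        merged["MeetID"] = int(merged["MeetID"])
        merged["TotalKg"] = float(merged["TotalKg"])
        joined.append(merged)
    return joined


master = inner_join_on_meet_id(read_rows(LIFTERS_CSV), read_rows(MEETS_CSV))
for row in master:
    print(row["MeetID"], row["Name"], row["MeetName"], row["TotalKg"])
```

In this sample, the lifter row with MeetID 2 has no matching meet and is dropped, and the meet with MeetID 1 has no lifters and so contributes no rows, which is the behavior an inner join implies for the master table.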

Dataset

The Powerlifting dataset was downloaded from Kaggle. It provides two CSV data files that share the meetid field, which can be used to join them and integrate their fields into one master file. The Meets file describes the events and the places where they were held. The Open Powerlifting file covers the athletes, the events they took part in, their rankings, etc.

Tools and Technologies

  • Cloud - Google Cloud Platform
  • Orchestration and Batch processing - Prefect
  • Transformation - dbt
  • Data Lake - Google Cloud Storage
  • Data Warehouse - BigQuery
  • Data Visualization - Looker Studio
  • Language - Python

Architecture

image

Final Dashboard

image
