Data Engineering Portfolio

Overview

This repository contains a collection of data engineering projects demonstrating ETL pipelines, SQL optimization, cloud data processing, orchestration with Apache Airflow, and Power BI dashboards. The goal is to showcase practical skills in Python, SQL, PySpark, Azure, and Power BI for real-world data engineering workflows.

Projects

1. ETL Pipeline with Python & SQL

  • Goal: Extract, transform, and load data from an API or CSV source into a relational database.
  • Tech Stack: Python, Pandas, PostgreSQL, Airflow.
  • Key Features:
    • Fetches and cleans data.
    • Loads data into a SQL database.
    • Automates pipeline execution with Apache Airflow.
  • Project Code
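The extract → transform → load flow described above can be sketched in a few lines. This is a minimal stand-in, not the repository's code: it uses the standard library's csv and sqlite3 modules in place of Pandas and PostgreSQL, and all file, table, and column names are illustrative.

```python
import csv
import io
import sqlite3

# Extract: hypothetical raw CSV input (in the real pipeline, an API or file).
RAW_CSV = """id,name,revenue
1, Alice ,1200
2,Bob,
3,Carol,950
"""

def extract(text):
    # Parse CSV rows into dictionaries.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Clean the data: trim whitespace, drop rows with missing revenue.
    cleaned = []
    for row in rows:
        if row["revenue"].strip():
            cleaned.append((int(row["id"]), row["name"].strip(), float(row["revenue"])))
    return cleaned

def load(rows, conn):
    # Load cleaned rows into a SQL table (sqlite3 standing in for PostgreSQL).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, revenue REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 2 rows survive cleaning
```

In the Airflow-automated version, each of the three functions would become a separate task so failures can be retried independently.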

2. Power BI Dashboard with Cloud Data

  • Goal: Connect Power BI to a cloud database and create an interactive dashboard.
  • Tech Stack: Power BI, Azure Synapse, SQL.
  • Key Features:
    • Live data visualization.
    • DAX calculations for KPIs.
    • Automated data refresh.
  • Project Code

3. Cloud Data Engineering with Azure

  • Goal: Process and analyze big data using Azure Data Lake & PySpark.
  • Tech Stack: Azure Data Lake, Databricks, PySpark.
  • Key Features:
    • Stores large datasets in Azure Data Lake.
    • Uses PySpark for transformation.
    • Loads processed data into Azure Synapse.
  • Project Code

4. SQL Optimization & Performance Tuning

  • Goal: Improve query performance using indexing and partitioning.
  • Tech Stack: PostgreSQL, MySQL.
  • Key Features:
    • Benchmarks query execution time.
    • Implements indexing for optimization.
    • Compares before/after performance.
  • Project Code
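The benchmark-then-index-then-compare approach can be demonstrated end to end with the standard library's sqlite3 module (standing in for PostgreSQL/MySQL; table and index names here are illustrative, not the project's):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 5000, float(i)) for i in range(200_000)],
)

def bench(query, args, n=50):
    # Time n repetitions of the query.
    start = time.perf_counter()
    for _ in range(n):
        conn.execute(query, args).fetchall()
    return time.perf_counter() - start

q = "SELECT total FROM orders WHERE customer_id = ?"
before = bench(q, (1234,))  # full table scan

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = bench(q, (1234,))   # index lookup

plan = conn.execute("EXPLAIN QUERY PLAN " + q, (1234,)).fetchone()
print(f"before: {before:.3f}s  after: {after:.3f}s")
print(plan)  # the plan should now mention idx_orders_customer
```

The same pattern carries over to PostgreSQL/MySQL with EXPLAIN ANALYZE, where partitioning can also be compared alongside indexing.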

5. Data Orchestration with Apache Airflow

  • Goal: Automate ETL workflows using Apache Airflow.
  • Tech Stack: Python, Airflow, Docker.
  • Key Features:
    • Uses Airflow DAGs to schedule ETL jobs.
    • Monitors pipeline execution.
    • Containerized using Docker.
  • Project Code

Technologies Used

  • Programming: Python, SQL, PySpark
  • Databases: PostgreSQL, MySQL, NoSQL
  • Cloud: Azure (Data Lake, Synapse, Databricks)
  • Orchestration: Apache Airflow, Azure Data Factory
  • Visualization: Power BI, DAX
  • Containerization: Docker, Kubernetes
  • Big Data Tools: Hadoop, Hive, Kafka

Setup Instructions

1. Clone the Repository

git clone https://github.com/erictreacy/data-engineering-portfolio.git
cd data-engineering-portfolio
