This repository contains a collection of data engineering projects demonstrating ETL pipelines, SQL optimization, cloud data processing, orchestration with Apache Airflow, and Power BI dashboards. The goal is to showcase practical skills in Python, SQL, PySpark, Azure, and Power BI for real-world data engineering workflows.
- Goal: Extract, transform, and load data from an API/CSV into a database.
- Tech Stack: Python, Pandas, PostgreSQL, Airflow.
- Key Features:
  - Fetches and cleans data.
  - Loads data into a SQL database.
  - Automates pipeline execution with Apache Airflow.
- Project Code
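
The extract, clean, and load steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses the standard library's `csv` and `sqlite3` modules in place of Pandas and PostgreSQL so that it runs without extra dependencies, and the `sales` table, its columns, and the inline `RAW_CSV` data are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical input; a real pipeline would read a file or an API response.
RAW_CSV = """id,name,amount
1, Alice ,10.50
2,Bob,
3,Carol,7.25
"""


def extract(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Trim whitespace, cast types, and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((int(row["id"]), row["name"].strip(), float(amount)))
    return cleaned


def load(conn, records):
    """Create the target table if needed and upsert the cleaned records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


# SQLite in memory stands in for the PostgreSQL target (an assumption).
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
rows = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(rows)
```

Keeping the three stages as separate functions means each one can later be wrapped in its own scheduler task without changing the logic.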
- Goal: Connect Power BI to a cloud database and create an interactive dashboard.
- Tech Stack: Power BI, Azure Synapse, SQL.
- Key Features:
  - Live data visualization.
  - DAX calculations for KPIs.
  - Automated data refresh.
- Project Code
- Goal: Process and analyze big data using Azure Data Lake & PySpark.
- Tech Stack: Azure Data Lake, Databricks, PySpark.
- Key Features:
  - Stores large datasets in Azure Data Lake.
  - Uses PySpark for transformation.
  - Loads processed data into Azure Synapse.
- Project Code
- Goal: Improve query performance using indexing and partitioning.
- Tech Stack: PostgreSQL, MySQL.
- Key Features:
  - Benchmarks query execution time.
  - Implements indexing for optimization.
  - Compares before/after performance.
- Project Code
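
A before/after index comparison of the kind described above might look like the sketch below. It is an assumption-laden illustration rather than the project's benchmark: it uses SQLite through Python's `sqlite3` module instead of PostgreSQL or MySQL, the `orders` table and `idx_orders_customer` index are hypothetical, and it covers indexing only, not partitioning.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
# Synthetic data: 100,000 orders spread evenly over 1,000 customers.
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(100_000)],
)
conn.commit()

QUERY = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"


def query_plan(conn):
    """Return SQLite's plan description for QUERY as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


def time_query(conn, repeats=50):
    """Return the average wall-clock seconds per execution of QUERY."""
    start = time.perf_counter()
    for _ in range(repeats):
        conn.execute(QUERY, (42,)).fetchone()
    return (time.perf_counter() - start) / repeats


plan_before, secs_before = query_plan(conn), time_query(conn)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after, secs_after = query_plan(conn), time_query(conn)

print(f"before: {secs_before:.6f}s  {plan_before}")
print(f"after:  {secs_after:.6f}s  {plan_after}")
```

Checking the query plan alongside the timing confirms that any speedup comes from the index being used rather than from caching or measurement noise. On PostgreSQL or MySQL, the equivalent check would rely on that engine's own `EXPLAIN` output.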
- Goal: Automate ETL workflows using Apache Airflow.
- Tech Stack: Python, Airflow, Docker.
- Key Features:
  - Uses Airflow DAGs to schedule ETL jobs.
  - Monitors pipeline execution.
  - Containerized using Docker.
- Project Code
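
To illustrate how an Airflow DAG can schedule an ETL job, here is a minimal sketch. It assumes Airflow 2.4 or later (for the `schedule` argument); the `example_etl` DAG id, the task name, and the sample rows are hypothetical and not taken from the project. The Airflow import is guarded so the plain Python functions can still be run and tested where Airflow is not installed.

```python
from datetime import datetime


def extract():
    """Return raw rows; a stand-in for an API or CSV read."""
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": None}]


def transform(rows):
    """Drop rows with a missing amount and cast the rest to float."""
    return [
        {"id": r["id"], "amount": float(r["amount"])}
        for r in rows
        if r["amount"] is not None
    ]


def run_etl():
    """Task callable: extract, transform, and report how many rows survived."""
    cleaned = transform(extract())
    print(f"processed {len(cleaned)} rows")
    return len(cleaned)


try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:
    DAG = None  # Airflow not installed; the functions above still work alone

if DAG is not None:
    # Runs once per day; catchup=False skips backfilling past dates.
    with DAG(
        dag_id="example_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="run_etl", python_callable=run_etl)
```

A single task keeps the sketch short. Splitting extract, transform, and load into separate tasks would give per-step retries and monitoring in the Airflow UI, at the cost of passing data between tasks.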
- Programming: Python, SQL, PySpark
- Databases: PostgreSQL, MySQL, NoSQL
- Cloud: Azure (Data Lake, Synapse, Databricks)
- Orchestration: Apache Airflow, Azure Data Factory
- Visualization: Power BI, DAX
- Containerization: Docker, Kubernetes
- Big Data Tools: Hadoop, Hive, Kafka
git clone https://github.com/erictreacy/data-engineering-portfolio.git
cd data-engineering-portfolio