End-to-End Azure Data Engineering Project

Overview

This project demonstrates an end-to-end data engineering pipeline implemented using Azure Databricks, Azure Data Factory (ADF), Azure Logic Apps, and Azure SQL Database, drawing on data sources such as CSV, XLSX, and JSON files, PostgreSQL and other SQL databases, and APIs. The pipeline follows the Medallion Architecture and is designed to ingest, transform, and process large-scale data, storing it in a structured, clean format ready for reporting and analysis.

The project also includes integration with Azure Logic Apps for email notifications, ensuring that stakeholders are informed of pipeline successes or failures.


Table of Contents

  • Project Description
  • Technologies Used
  • Data Pipeline Architecture

Project Description

This repository contains an end-to-end data pipeline that extracts data from multiple sources, transforms it, and loads it into an analytics-ready data store, following the Medallion Architecture.

The key steps of the project, sketched in PySpark after this list, are:

  • Ingestion: Data is pulled from multiple sources, including CSV, XLSX, and JSON files, PostgreSQL and other SQL databases, and APIs.
  • Transformation: Data is processed, cleaned, and transformed using Azure Databricks (Apache Spark).
  • Storage: Cleaned and transformed data is stored in Azure SQL Database.
  • Aggregation: Aggregated views are created in the Gold Layer for reporting and analytics.
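
A minimal PySpark sketch of the ingest-transform-store flow is shown below. The paths, column names, and JDBC connection details are illustrative placeholders, not the project's actual configuration:

```python
# Minimal sketch of the ingest -> transform -> store flow.
# All paths, column names, and credentials below are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-pipeline").getOrCreate()

# Ingestion: read raw CSV files from the landing zone (Bronze).
raw = spark.read.option("header", True).csv("/mnt/bronze/sales/*.csv")

# Transformation: clean and standardize the data (Silver).
clean = (
    raw.dropDuplicates()
       .filter(F.col("order_id").isNotNull())
       .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
)

# Storage: write the cleaned data to Azure SQL Database over JDBC.
(clean.write
      .format("jdbc")
      .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
      .option("dbtable", "silver.sales")
      .option("user", "<user>")
      .option("password", "<password>")
      .mode("overwrite")
      .save())
```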

The solution also leverages Azure Logic Apps to send email notifications on key pipeline events, ensuring transparency and real-time alerts for data engineers or stakeholders.


Technologies Used

  • Azure Databricks: For scalable data processing, transformation, and analysis with Apache Spark.
  • Azure Data Factory (ADF): For orchestrating ETL processes and moving data between systems.
  • Azure Logic Apps: For sending email notifications upon pipeline completion, success, or failure.
  • Azure SQL Database: For storing processed and structured data.

Data Sources:

  • CSV: Data files in Comma Separated Values format.
  • XLSX: Data in Microsoft Excel format.
  • JSON: Data in JavaScript Object Notation format.
  • PostgreSQL: Data sourced from PostgreSQL databases.
  • SQL databases: Data from other relational, SQL-based database systems.
  • APIs: External data sources accessed via API requests.
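
For the database and API sources specifically, ingestion in Databricks typically goes through JDBC and HTTP. The sketch below assumes placeholder hostnames, credentials, and an example endpoint URL; the PostgreSQL JDBC driver must be available on the cluster:

```python
# Illustrative reads for the PostgreSQL and API sources; all connection
# details and the endpoint URL below are placeholders.
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# PostgreSQL: read a table over JDBC.
pg_df = (spark.read
              .format("jdbc")
              .option("url", "jdbc:postgresql://<host>:5432/<database>")
              .option("dbtable", "public.customers")
              .option("user", "<user>")
              .option("password", "<password>")
              .load())

# API: fetch JSON records over HTTP and turn them into a DataFrame.
records = requests.get("https://api.example.com/v1/orders", timeout=30).json()
api_df = spark.createDataFrame(records)
```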

Data Pipeline Architecture

[Architecture diagram: diagram-export-12-25-2024-10_35_26-AM]

The project follows the Medallion Architecture, a multi-layer data architecture that allows for scalable and structured data processing.

  1. Bronze Layer (Raw Data):
    • Raw data from sources like CSV, XLSX, JSON, and APIs is ingested and stored.
    • The data is loaded into Azure Data Lake Storage or a similar raw storage solution.
  2. Silver Layer (Cleaned Data):
    • The raw data is processed and cleaned using Azure Databricks.
    • Data is validated, transformed, and enriched before being loaded into Azure SQL Database.
  3. Gold Layer (Aggregated Data):
    • The data is aggregated and transformed into analytics-ready forms.
    • Aggregated tables and views are created in Azure SQL Database for reporting and analysis; a sketch of this step follows below.
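
As an illustration of the Gold Layer step, a hypothetical aggregated reporting table could be built from a Silver table like this (table and column names are assumptions, not the project's actual schema):

```python
# Hypothetical Gold-layer aggregation: daily revenue per region, built
# from a cleaned Silver table and saved back as an analytics-ready table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
silver = spark.table("silver.sales")  # assumes the Silver table is registered

gold = (silver
        .groupBy("region", "order_date")
        .agg(F.sum("amount").alias("total_revenue"),
             F.countDistinct("order_id").alias("order_count")))

gold.write.mode("overwrite").saveAsTable("gold.daily_revenue_by_region")
```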

Additionally, Azure Logic Apps is integrated to send email notifications whenever key pipeline events occur, such as successful completion or failure. A sketch of how the pipeline could trigger a Logic App follows below.
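
A Logic App with an HTTP request trigger can be invoked from ADF (via a Web activity) or directly from a notebook. The following is one possible call from Python; the callback URL and payload fields are placeholders, since the real Logic App defines its own expected request schema:

```python
# Illustrative notification call to an Azure Logic App HTTP trigger.
# The trigger URL and payload fields are placeholders; the Logic App's
# "When an HTTP request is received" trigger defines the actual schema.
import requests

LOGIC_APP_URL = "https://<logic-app-http-trigger-url>"

payload = {
    "pipeline": "medallion-pipeline",   # hypothetical pipeline name
    "status": "Succeeded",              # or "Failed"
    "message": "Gold layer refresh completed.",
}

resp = requests.post(LOGIC_APP_URL, json=payload, timeout=30)
resp.raise_for_status()  # surface HTTP errors so failed sends are visible
```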

