This project sets up an end-to-end data pipeline that ingests and transforms historical data from an Azure SQL Database into a structured format in Azure Synapse Analytics, using Azure Data Lake Storage (ADLS) and Delta Tables for efficient storage and querying.
- Python
- Azure SQL Database
- T-SQL (Transact-SQL)
- Azure Synapse Analytics
- Azure Data Lake Storage (ADLS)
- Azure Logic App
- Azure Notebook
- PySpark
- Delta Tables
The data pipeline consists of the following stages:
- Bronze Layer: Raw, unprocessed data ingested directly from each table in the Azure SQL Database. These tables are stored in Parquet format in Azure Data Lake Storage (ADLS) for further processing.
- Silver Layer: Cleaned and transformed data stored as Delta Tables for optimized querying and performance.
- Gold Layer: The final, optimized dataset containing dimension and fact tables, designed for high-performance analytics and reporting.
For detailed activity descriptions, see Pipeline Activities.