
Data Migration with Azure Synapse, PySpark, and Delta Tables


This project sets up an end-to-end data pipeline that processes and transforms historical data from an Azure SQL Database into a structured format in Azure Synapse Analytics, using Azure Data Lake Storage (ADLS) and Delta Tables for efficient storage and querying.

Fintech Azure Synapse Pipeline Complete Run
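As a rough sketch of the ingestion step described above, the snippet below reads one source table from the Azure SQL Database over JDBC and lands it as Parquet in ADLS. The server, credentials, table name (`dbo.Accounts`), and storage account are placeholders for illustration, not values from this repository:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bronze_ingestion").getOrCreate()

# JDBC connection string for the source Azure SQL Database
# (<server> and <database> are placeholders).
jdbc_url = (
    "jdbc:sqlserver://<server>.database.windows.net:1433;"
    "database=<database>;encrypt=true"
)

# Read one source table over JDBC; dbo.Accounts is a hypothetical table name.
accounts_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.Accounts")
    .option("user", "<sql_user>")
    .option("password", "<sql_password>")
    .load()
)

# Land the raw table as Parquet in the bronze container on ADLS.
bronze_path = "abfss://bronze@<storageaccount>.dfs.core.windows.net/Accounts"
accounts_df.write.mode("overwrite").parquet(bronze_path)
```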

Tech Stack

  • Python
  • Azure SQL Database
  • T-SQL (Transact-SQL)
  • Azure Synapse Analytics
  • Azure Data Lake Storage (ADLS)
  • Azure Logic App
  • Azure Notebook
  • PySpark
  • Delta Tables

Table of Contents

  1. Pipeline Overview
  2. Pipeline Activities
  3. Parameters
  4. Pipeline Steps
  5. Summary of Flow

Pipeline Overview

The data pipeline consists of the following stages:

Storage Account Layers


  • Bronze Layer: Raw, unprocessed data ingested directly from each table in the Azure SQL database. These tables are stored in Parquet format in Azure Data Lake Storage (ADLS) for further processing.
  • Silver Layer: Cleaned and transformed data stored as Delta Tables for optimized querying and performance.
  • Gold Layer: The final, optimized dataset containing dimension and fact tables, designed for high-performance analytics and reporting (see the sketch after this list).
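
A minimal PySpark sketch of the silver and gold steps, assuming a Synapse Spark session with Delta Lake support and hypothetical column names (`AccountID`, `OpenDate`, and so on); the actual notebooks in this repository may differ:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("silver_gold").getOrCreate()

# Container-per-layer layout on ADLS; <storageaccount> is a placeholder.
adls = "abfss://{container}@<storageaccount>.dfs.core.windows.net"

# Silver: clean the raw bronze Parquet and persist it as a Delta Table.
accounts_raw = spark.read.parquet(adls.format(container="bronze") + "/Accounts")
accounts_clean = (
    accounts_raw
    .dropDuplicates(["AccountID"])                  # hypothetical key column
    .withColumn("OpenDate", F.to_date("OpenDate"))  # normalize date types
    .filter(F.col("AccountID").isNotNull())
)
(accounts_clean.write.format("delta").mode("overwrite")
    .save(adls.format(container="silver") + "/Accounts"))

# Gold: derive a dimension table optimized for reporting.
dim_accounts = accounts_clean.select("AccountID", "CustomerID", "AccountType")
(dim_accounts.write.format("delta").mode("overwrite")
    .save(adls.format(container="gold") + "/dim_accounts"))
```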

For detailed activity descriptions, see Pipeline Activities.