This project demonstrates an end-to-end data pipeline built using Databricks, Delta Live Tables, and DBT (Data Build Tool) to process, transform, and model flight booking data for advanced analytics and business intelligence.
The project simulates a real-world data engineering workflow: ingesting flight-related datasets (passengers, bookings, flights, airports), applying transformations and data quality rules, modeling a star schema in the Gold layer, and building downstream analytics views with DBT.
It covers:
- Ingesting CSV files from a managed volume into the Bronze layer with Auto Loader
- Building the Silver layer with Delta Live Tables (DLT), using DLT expectations for data quality
- Creating a dynamically generated star schema in the Gold layer
- Consuming Gold-layer data with DBT to build curated business views
The architecture follows the lakehouse paradigm, organizing data into Bronze, Silver, and Gold (medallion) layers, with an integrated warehouse and a DBT transformation layer on top.
The Bronze pipeline ingests raw CSV files with Auto Loader and writes them to the Bronze layer, using dynamic iteration logic to process each source dataset.
- Uses a lookup notebook to track the last load for incremental ingestion
- Loads multiple files incrementally, with schema evolution enabled
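A minimal sketch of the Bronze step, assuming the lookup notebook yields a list of dataset names and that raw files land in one folder per dataset inside a Unity Catalog volume. The volume path, folder layout, `DATASETS`, and both function names are hypothetical. The options helper is plain Python; `ingest_to_bronze` only runs on Databricks, where `spark` is the provided SparkSession and Auto Loader (`cloudFiles`) is available.

```python
# Hypothetical Unity Catalog volume; substitute your own catalog/schema/volume.
BASE = "/Volumes/workspace/raw/rawvolume"

# Hypothetical output of a lookup notebook: the datasets to load on this run.
DATASETS = ["passengers", "bookings", "flights", "airports"]


def autoloader_options(dataset: str, base: str = BASE) -> dict:
    """Auto Loader (cloudFiles) reader options for one CSV dataset."""
    return {
        "cloudFiles.format": "csv",
        # Auto Loader stores the inferred schema here and evolves it over time.
        "cloudFiles.schemaLocation": f"{base}/bronze/{dataset}/_schema",
        # On a new column the stream stops once, then resumes with the
        # updated schema when restarted.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
        "header": "true",
    }


def ingest_to_bronze(spark, dataset: str, base: str = BASE):
    """Incrementally load new files for `dataset` into a Bronze Delta path.

    Only runs on Databricks, where the cloudFiles source exists.
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(dataset, base))
        .load(f"{base}/source/{dataset}/")
        .writeStream.format("delta")
        # The checkpoint records which files were already ingested,
        # so each run only processes newly arrived files.
        .option("checkpointLocation", f"{base}/bronze/{dataset}/_checkpoint")
        .option("mergeSchema", "true")  # let the Delta sink accept new columns
        .trigger(availableNow=True)     # process the backlog, then stop
        .start(f"{base}/bronze/{dataset}/data")
    )
```

On Databricks, `for ds in DATASETS: ingest_to_bronze(spark, ds)` would run one load per dataset; a job-level for-each task over the same list is another way to drive the iteration. The Auto Loader checkpoint is what makes each load incremental at the file level.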
The Silver layer is built using Delta Live Tables (DLT) with the following features:
- Streaming ingestion from the Bronze layer
- Transformations such as joins, filtering, and normalization
- DLT expectations to enforce data quality
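DLT expectations attach a named condition to a dataset and choose what happens to rows that fail it: keep them and record the violation (`@dlt.expect`), drop them (`@dlt.expect_or_drop`), or fail the update (`@dlt.expect_or_fail`). The plain-Python sketch below models those three policies on in-memory rows so the behavior is easy to see; it is not the DLT API, and the rule names and columns are hypothetical. Inside a pipeline, the drop policy would look like `@dlt.expect_or_drop("valid_booking_id", "booking_id IS NOT NULL")` on a `@dlt.table` function.

```python
from typing import Callable

Row = dict
Rule = Callable[[Row], bool]


class ExpectationFailed(Exception):
    """Raised for the 'fail' policy, mirroring how expect_or_fail stops an update."""


def apply_expectations(rows: list[Row], rules: dict[str, Rule], action: str = "warn"):
    """Apply named rules with one policy: 'warn' keeps all rows, 'drop' removes
    violating rows, 'fail' raises on the first violation. Returns the kept rows
    and a per-rule violation count (similar to DLT's data quality metrics)."""
    if action not in ("warn", "drop", "fail"):
        raise ValueError(f"unknown action: {action}")
    violations = {name: 0 for name in rules}
    kept = []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        for name in failed:
            violations[name] += 1
        if failed and action == "fail":
            raise ExpectationFailed(f"row {row} violated {failed}")
        if failed and action == "drop":
            continue
        kept.append(row)
    return kept, violations


# Hypothetical Silver-layer rules for bookings.
rules = {
    "valid_booking_id": lambda r: r.get("booking_id") is not None,
    "positive_amount": lambda r: (r.get("amount") or 0) > 0,
}

bookings = [
    {"booking_id": "B1", "amount": 120.0},
    {"booking_id": None, "amount": 80.0},
    {"booking_id": "B3", "amount": -5.0},
]

kept, stats = apply_expectations(bookings, rules, action="drop")
print(len(kept), stats)  # → 1 {'valid_booking_id': 1, 'positive_amount': 1}
```

With `action="warn"` all three rows would be kept but the same violation counts reported, which matches the trade-off DLT offers between completeness and strictness.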
The Gold layer includes:
- Gold_Dimension notebook: dynamically creates the dimension tables
- Gold_Fact notebook: builds the central factbookings table
The schema is generated dynamically, which keeps it modular and easy to extend.
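One way to make dimension creation configuration-driven, sketched under assumptions: each dimension is described by a small config (target name, Silver source, business key, attribute columns), and a generator emits a Delta Lake `MERGE` that upserts it as a Type 1 dimension (overwrite on change). The `DimConfig` shape, table names, and columns are hypothetical, not taken from the Gold_Dimension notebook, and the sketch assumes the target tables already exist. On Databricks each generated string would be passed to `spark.sql(...)`.

```python
from dataclasses import dataclass


@dataclass
class DimConfig:
    """Hypothetical description of one dimension table."""
    name: str            # target dimension, e.g. "dim_passengers"
    source: str          # Silver source table
    business_key: str    # natural key used to match existing rows
    columns: list[str]   # attributes carried into the dimension


def dimension_merge_sql(cfg: DimConfig, gold_schema: str = "gold") -> str:
    """Build a Type 1 (overwrite-on-change) upsert for one dimension."""
    cols = [cfg.business_key] + [c for c in cfg.columns if c != cfg.business_key]
    updates = ", ".join(f"t.{c} = s.{c}" for c in cols if c != cfg.business_key)
    lines = [
        f"MERGE INTO {gold_schema}.{cfg.name} AS t",
        f"USING {cfg.source} AS s",
        f"ON t.{cfg.business_key} = s.{cfg.business_key}",
    ]
    # A key-only dimension has nothing to update, so skip the MATCHED clause.
    if updates:
        lines.append(f"WHEN MATCHED THEN UPDATE SET {updates}")
    lines.append(
        f"WHEN NOT MATCHED THEN INSERT ({', '.join(cols)}) "
        f"VALUES ({', '.join('s.' + c for c in cols)})"
    )
    return "\n".join(lines)


# Hypothetical dimension configs; adding a dimension means adding an entry.
DIMENSIONS = [
    DimConfig("dim_passengers", "silver.passengers", "passenger_id",
              ["name", "gender", "nationality"]),
    DimConfig("dim_airports", "silver.airports", "airport_id",
              ["airport_name", "city", "country"]),
]

for cfg in DIMENSIONS:
    print(dimension_merge_sql(cfg))  # on Databricks: spark.sql(...)
```

Generating the statement from data rather than writing one notebook per table is what keeps the schema extensible: the merge logic lives in one place, and each dimension differs only in its config entry.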
DBT is used to build curated business views on top of the Gold layer. For example, one model aggregates booking amounts by country.
- Models are materialized as tables
- Transformation logic is kept under version control
- Enables SQL-based transformations on curated data
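The country aggregation can also be written as a dbt Python model, which the dbt-databricks adapter runs on Spark. This is an illustrative alternative form, not the project's own model. It assumes a dbt source named `gold`, declared in a sources YAML file, with a `factbookings` table exposing `country` and `booking_amount` columns; those column names and the file path are hypothetical.

```python
# models/booking_amount_by_country.py  (hypothetical path)
def model(dbt, session):
    # Materialize the result as a table, as described above.
    dbt.config(materialized="table")

    # Gold tables are built by the Databricks notebooks, so dbt reads them
    # as a declared source rather than ref()-ing another dbt model.
    bookings = dbt.source("gold", "factbookings")

    # Total booking amount per country. GroupedData.sum names its output
    # column "sum(booking_amount)", so rename it to something readable.
    return (
        bookings.groupBy("country")
        .sum("booking_amount")
        .withColumnRenamed("sum(booking_amount)", "total_booking_amount")
    )
```

Running `dbt run` would build the model as a table in the target schema; an equivalent SQL model would use `{{ source('gold', 'factbookings') }}` with a `GROUP BY country`.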
flight DB & DBT/
├── Lookup Notebook.python # Tracks the last load for incremental ingestion
├── Bronze Layer.python # Ingests data using Auto Loader
├── DLT_DIMENSION/ # Contains the DLT streaming logic for the Silver layer
├── Gold_Dimension.python # Dynamically builds the star schema dimensions
├── Gold_Fact.python # Builds the fact table
└── Script Notebook.python # Auxiliary scripts