This project demonstrates key PySpark performance optimization techniques on a synthetic banking transactions dataset (~5,000 records), built on Databricks with Delta Lake.
Topics: pyspark, data-engineering, parquet, partitioning, databricks, etl-pipeline, bucketing, delta-lake, broadcast-join, spark-optimization, spark-performance, adaptive-query-execution
Updated Aug 12, 2025 - Python