This project provides a detailed overview of building an automated data engineering pipeline. It integrates Apache Airflow for workflow orchestration, uses Apache Spark on AWS EMR for large-scale data processing, and employs Snowflake for data warehousing. Tableau is then used to build visualizations for analyzing the US real estate market.
A detailed blog post about this project can be found here.
To build the entire pipeline, the process was as follows:
- Configuring the necessary AWS services
- Setting up Airflow
- Collecting the data (a hypothetical sketch follows this list)
- Transforming the data using Spark on AWS EMR (sketched below)
- Connecting all the tasks into a single Airflow DAG (sketched below, together with the Snowflake load)
- Warehousing the data in Snowflake
- Visualizing the results in Tableau
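Since this README does not spell out where the raw data comes from, the collection step below is only a hypothetical sketch: it assumes the raw listings arrive as a CSV from an HTTP endpoint and are landed in S3 with boto3. The URL, bucket name, and object key are all placeholders.

```python
# collect.py -- hypothetical sketch of the data collection task.
# The endpoint URL, bucket name, and object key are placeholders;
# the actual project may pull from a different source entirely.
import boto3
import requests

RAW_DATA_URL = "https://example.com/us-real-estate/listings.csv"  # placeholder
BUCKET = "example-bucket"                                          # placeholder

def collect_raw_data() -> str:
    """Download the raw CSV and land it in S3 for the EMR step to pick up."""
    response = requests.get(RAW_DATA_URL, timeout=60)
    response.raise_for_status()

    key = "raw/listings.csv"
    s3 = boto3.client("s3")
    s3.put_object(Bucket=BUCKET, Key=key, Body=response.content)
    return f"s3://{BUCKET}/{key}"

if __name__ == "__main__":
    print(f"Raw data written to {collect_raw_data()}")
```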
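The transformation step runs as a Spark job on the EMR cluster. Below is a minimal sketch of what such a job could look like; the S3 paths and column names (`price`, `listed_date`) are illustrative assumptions, not the project's actual schema.

```python
# transform.py -- submitted to the EMR cluster as a Spark step.
# A minimal sketch; bucket names, paths, and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("real-estate-transform").getOrCreate()

# Read the raw CSV that the collection task landed in S3.
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/listings.csv")

# Basic cleaning: drop rows missing a price, cast types, parse dates.
clean = (
    raw.dropna(subset=["price"])
       .withColumn("price", F.col("price").cast("double"))
       .withColumn("listed_date", F.to_date("listed_date"))
)

# Write columnar output for Snowflake to ingest.
clean.write.mode("overwrite").parquet("s3://example-bucket/transformed/listings/")
```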
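Finally, the tasks are chained into one Airflow DAG. The sketch below shows one plausible wiring using the standard Amazon and Snowflake provider operators: the collection function runs first, the Spark job is submitted as an EMR step and watched by a sensor, and a `COPY INTO` loads the parquet output into Snowflake. The connection IDs, EMR job-flow ID, S3 paths, stage, and table names are all assumptions.

```python
# real_estate_dag.py -- a minimal sketch of wiring the tasks into one DAG.
# Connection IDs, the EMR job-flow ID, S3 paths, and the Snowflake stage
# and table are assumptions, not the project's actual values.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

from collect import collect_raw_data  # the collection sketch above

SPARK_STEP = [{
    "Name": "transform_listings",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "s3://example-bucket/scripts/transform.py"],
    },
}]

with DAG(
    dag_id="real_estate_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # 1. Pull the raw data and land it in S3.
    collect = PythonOperator(task_id="collect_raw_data",
                             python_callable=collect_raw_data)

    # 2. Submit the Spark transformation as a step on a running EMR cluster.
    add_step = EmrAddStepsOperator(
        task_id="add_emr_step",
        job_flow_id="j-XXXXXXXXXXXXX",  # placeholder cluster ID
        steps=SPARK_STEP,
        aws_conn_id="aws_default",
    )

    # 3. Wait for the step to finish before loading the warehouse.
    watch_step = EmrStepSensor(
        task_id="watch_emr_step",
        job_flow_id="j-XXXXXXXXXXXXX",
        step_id="{{ task_instance.xcom_pull(task_ids='add_emr_step')[0] }}",
        aws_conn_id="aws_default",
    )

    # 4. COPY the transformed parquet files from S3 into Snowflake.
    load = SnowflakeOperator(
        task_id="load_to_snowflake",
        snowflake_conn_id="snowflake_default",
        sql="""
            COPY INTO analytics.public.listings
            FROM @listings_stage/transformed/listings/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
        """,
    )

    collect >> add_step >> watch_step >> load
```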
An overview of the complete pipeline:
The final output of the dashboard created using Tableau:

For questions or feedback about the project, don't hesitate to reach out to me on LinkedIn.