This is a collection of the projects completed while following the syllabus of the Data Engineering Nanodegree offered by Udacity (https://www.udacity.com/course/data-engineer-nanodegree--nd027).
The course is divided into four blocks of lessons. Each block consists of a theoretical introduction to its topics, a series of demos for hands-on practice with the concepts explained, and one or two projects:
1. Data Modeling
   - Introduction to Data Modeling
   - Relational Data Models
   - [Proj1]: Data Modeling with Postgres
   - NoSQL Data Models
   - [Proj2]: Data Modeling with Apache Cassandra
2. Cloud Data Warehouses
   - Introduction to Data Warehouses
   - Introduction to Cloud Computing and AWS
   - Implementing Data Warehouses on AWS
   - [Proj3]: Data Warehouse
3. Data Lakes with Spark
   - The Power of Spark
   - Data Wrangling with Spark
   - Debugging and Optimization
   - Introduction to Data Lakes
   - [Proj4]: Data Lake
4. Data Pipelines with Airflow
   - Data Pipelines
   - Data Quality
   - Production Data Pipelines
   - [Proj5]: Data Pipelines
5. Bonus: [CapstoneProject] - ####