
Design data models, build data warehouses, data lakes & lakehouse, automate data pipelines - SQL | NoSQL | AWS | Spark | Airflow


phphoebe/Udacity-Data-Engineering-with-AWS


Udacity Data Engineering with AWS Nanodegree

Design data models, build data warehouses and data lakes, automate data pipelines, and manage massive datasets.

  • Create user-friendly Relational and NoSQL data models
  • Create scalable and efficient data warehouses
  • Work efficiently with massive datasets
  • Build and interact with a cloud-based data lake
  • Automate and monitor data pipelines
  • Develop proficiency in Spark, Airflow, and AWS tools

Create Relational and NoSQL data models to fit the diverse needs of data consumers. Use ETL to build databases in PostgreSQL and Apache Cassandra.
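The ETL flow described above can be sketched in miniature. This is a minimal, illustrative example using Python's built-in `sqlite3` as a stand-in for PostgreSQL; the table and column names (`users`, `plays`) are hypothetical, not from the course projects.

```python
import sqlite3

# A minimal ETL sketch using sqlite3 as a stand-in for PostgreSQL.
# Table and column names (users, plays) are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Model: a user dimension plus a fact-like table referencing it.
cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE plays (
    play_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),
    song TEXT)""")

# Extract (raw records), Transform (split into entities), Load (insert).
raw = [{"user": "ana", "song": "Blue"}, {"user": "ana", "song": "Red"}]
cur.execute("INSERT INTO users (user_id, name) VALUES (1, 'ana')")
for i, rec in enumerate(raw, start=1):
    cur.execute("INSERT INTO plays VALUES (?, 1, ?)", (i, rec["song"]))

cur.execute(
    "SELECT name, COUNT(*) FROM plays JOIN users USING (user_id) GROUP BY name"
)
print(cur.fetchall())  # [('ana', 2)]
```

A Cassandra model would instead be designed query-first, with one denormalized table per access pattern rather than joined entities.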

Lessons

  1. Introduction to Data Modeling
  2. Relational Data Models
  3. NoSQL Data Models

Projects

Create cloud-based data warehouses. Sharpen data warehousing skills, deepen your understanding of data infrastructure, and get an introduction to data engineering on the cloud using Amazon Web Services (AWS).

Lessons

  1. Introduction to Data Warehouses
  2. ELT and Data Warehouse Technology in the Cloud
  3. AWS Data Technologies
  4. Implementing Data Warehouses on AWS

Project

Build a data lake on AWS and a data catalog following the principles of data lakehouse architecture. Learn about the big data ecosystem and the power of Apache Spark for data wrangling and transformation. Work with AWS data tools and services to extract, load, process, query, and transform semi-structured data in data lakes.
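At course scale, the wrangling described above is done with Apache Spark; the pure-stdlib sketch below only shows the shape of one common transformation, flattening nested semi-structured JSON into rows (what Spark's `explode()` does for array columns). The record layout and field names (`user`, `events`) are illustrative assumptions.

```python
import json

# Flattening semi-structured records, as Spark would do at scale.
# The nested "events" field and sample data are illustrative only.
raw = """[
  {"user": "ana", "events": [{"type": "click", "ts": 1}, {"type": "view", "ts": 2}]},
  {"user": "bo",  "events": [{"type": "click", "ts": 3}]}
]"""

records = json.loads(raw)
# "Explode" each nested event into its own flat row.
flat = [{"user": r["user"], **e} for r in records for e in r["events"]]
print(flat[0])   # {'user': 'ana', 'type': 'click', 'ts': 1}
print(len(flat))  # 3
```

In a lakehouse, rows like these would be written back to S3 in a columnar format and registered in a data catalog (e.g. AWS Glue) for querying.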

Lessons

  1. Big Data Ecosystem, Data Lakes, & Spark
  2. Spark Essentials
  3. Using Spark & Data Lakes in the AWS Cloud
  4. Ingesting & Organizing Data in Lakehouse Architecture on AWS

Project

Dive into the concept of data pipelines.

  • Focus on applying data pipeline concepts through Apache Airflow, covering data validation and DAGs.
  • Venture into AWS-specific concepts such as copying S3 data, connections and hooks, and Redshift Serverless.
  • Explore data quality through data lineage, data pipeline schedules, and data partitioning.
  • Put data pipelines into production by extending Airflow with plugins, implementing task boundaries, and refactoring DAGs.
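An Airflow DAG is, at its core, a set of tasks with upstream dependencies that the scheduler resolves into a valid execution order. As a minimal stdlib sketch of that idea (not Airflow's actual API), Python's `graphlib` can compute the same ordering; the task names below are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# A DAG of pipeline tasks, in the spirit of an Airflow DAG definition.
# Each task name maps to the set of tasks it depends on (its upstreams).
dag = {
    "stage_to_s3": set(),
    "copy_to_redshift": {"stage_to_s3"},
    "run_quality_checks": {"copy_to_redshift"},
}

# Airflow's scheduler resolves this kind of ordering before running tasks.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['stage_to_s3', 'copy_to_redshift', 'run_quality_checks']
```

In Airflow itself, the same dependencies would be declared with operators and the `>>` bitshift syntax inside a `DAG` context.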

Lessons

  1. Data Pipelines
  2. Airflow & AWS
  3. Data Quality
  4. Production Data Pipelines

Project


See the Program Syllabus; more information about this program can be found by visiting Udacity Data Engineering ND.
