Udacity-Data-Engineering-Nanodegree

C1. Welcome to the Data Engineering Nanodegree Program

Program Introduction: projects, prerequisites, instructors, careers team, getting help
Introduction to Data Engineering
What do data engineers do?

C2. Data Modeling

NoSQL Database: Apache Cassandra
OLAP vs. OLTP
Normalization, Denormalization
Fact and Dimension tables
Star and Snowflake Schema
Project 1: Data Modeling with Postgres Details

This project builds a star schema in Postgres, with fact and dimension tables designed for analytics. A Python ETL pipeline loads data from local JSON files into these tables.
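The ETL step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `sqlite3` stands in for Postgres so the sketch runs without a database server, and the table names (`dim_user`, `fact_play`), the field names, and the inline sample records are all hypothetical.

```python
import json
import sqlite3

# Hypothetical event records, one JSON object per line.
# The real project reads local JSON files instead of an inline string.
RAW_EVENTS = """
{"user_id": 1, "name": "Ada", "song": "Song A", "ts": 1000}
{"user_id": 2, "name": "Bo", "song": "Song B", "ts": 1005}
{"user_id": 1, "name": "Ada", "song": "Song B", "ts": 1010}
"""


def run_etl(conn, raw_lines):
    """Create one dimension table and one fact table, then load the events."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute(
        "CREATE TABLE fact_play ("
        "play_id INTEGER PRIMARY KEY, "
        "user_id INTEGER REFERENCES dim_user(user_id), "
        "song TEXT, ts INTEGER)"
    )
    for line in raw_lines.strip().splitlines():
        event = json.loads(line)
        # Dimension rows are inserted once per user; repeats are skipped.
        cur.execute(
            "INSERT OR IGNORE INTO dim_user VALUES (?, ?)",
            (event["user_id"], event["name"]),
        )
        # Every event becomes one fact row that points at its dimension row.
        cur.execute(
            "INSERT INTO fact_play (user_id, song, ts) VALUES (?, ?, ?)",
            (event["user_id"], event["song"], event["ts"]),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
run_etl(conn, RAW_EVENTS)

# An analytical query joins the fact table to a dimension table.
plays_per_user = conn.execute(
    "SELECT u.name, COUNT(*) FROM fact_play f "
    "JOIN dim_user u ON u.user_id = f.user_id "
    "GROUP BY u.name ORDER BY u.name"
).fetchall()
```

With Postgres, the same structure would typically use a driver such as psycopg2 and `ON CONFLICT DO NOTHING` in place of SQLite's `INSERT OR IGNORE`.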

Apache Cassandra exercises 1-4: create tables, primary key, clustering column, WHERE clause
Project 2: Data Modeling with Apache Cassandra Details

This project involves designing and implementing an ETL pipeline to analyze music streaming data using Apache Cassandra. The goal is to transform raw event data stored in CSV files into a structured database optimized for specific query patterns.
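To make "optimized for specific query patterns" concrete, here is a small plain-Python model of how a Cassandra table groups rows by partition key and orders them by clustering column. It does not use the Cassandra driver, and the column names (`session_id`, `item_in_session`, `song`) are hypothetical.

```python
from collections import defaultdict


def build_table(rows, partition_key, clustering_column):
    """Group rows by partition key and sort each partition by clustering column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for partition in partitions.values():
        partition.sort(key=lambda r: r[clustering_column])
    return dict(partitions)


# Hypothetical event rows, deliberately out of order.
events = [
    {"session_id": 7, "item_in_session": 2, "song": "B"},
    {"session_id": 7, "item_in_session": 1, "song": "A"},
    {"session_id": 9, "item_in_session": 1, "song": "C"},
]

table = build_table(events, "session_id", "item_in_session")

# A query that filters on the partition key is a single lookup here,
# and the rows come back already ordered by the clustering column.
songs_in_session_7 = [row["song"] for row in table[7]]
```

In this model, a query that does not supply the partition key would have to scan every partition, which is why each table is designed around the query it must answer.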

C3. Cloud Data Warehouse

L1. Introduction to Data Warehouse

Data Warehouse
Primary key, partition key, composite key, and clustering key
Dimensional Modeling: Fact & Dimensions
Exercise 1: 3NF to Star Schema

This exercise converts a database from Third Normal Form (3NF) to a star schema. The key step is deliberate denormalization to optimize for analytical queries: a central fact table (sales_fact) is surrounded by descriptive dimension tables (dim_customer, dim_product, etc.), which simplifies data retrieval for business intelligence tasks such as sales analysis. The exercise highlights the difference between a schema designed for transactional efficiency (3NF) and one designed for analytical speed and simplicity (star schema).
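The conversion can be sketched on a toy dataset. The target table names (`sales_fact`, `dim_customer`, `dim_product`) come from the exercise; the 3NF source tables, their columns, and the sample rows are hypothetical, and `sqlite3` is used only so the sketch runs without a database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical 3NF source tables
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     product_id INTEGER, amount REAL);

INSERT INTO city VALUES (1, 'Lyon'), (2, 'Oslo');
INSERT INTO customer VALUES (10, 'Ada', 1), (11, 'Bo', 2);
INSERT INTO product VALUES (100, 'Widget'), (101, 'Gadget');
INSERT INTO orders VALUES (1, 10, 100, 5.0), (2, 10, 101, 7.5), (3, 11, 100, 5.0);

-- Dimension: the city lookup is folded into the customer row (denormalized)
CREATE TABLE dim_customer AS
SELECT c.customer_id, c.name, ci.city
FROM customer c JOIN city ci ON ci.city_id = c.city_id;

CREATE TABLE dim_product AS
SELECT product_id, title FROM product;

-- Fact: one row per sale, holding keys into the dimensions plus the measure
CREATE TABLE sales_fact AS
SELECT order_id, customer_id, product_id, amount FROM orders;
""")

# In the star schema, revenue by city needs one join instead of two.
revenue_by_city = conn.execute("""
    SELECT d.city, SUM(f.amount)
    FROM sales_fact f
    JOIN dim_customer d ON d.customer_id = f.customer_id
    GROUP BY d.city
    ORDER BY d.city
""").fetchall()
```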

DWH Architecture: Kimball’s Bus Architecture, Independent Data Marts, Inmon’s Corporate Information Factory (CIF), Hybrid Kimball Bus & Inmon CIF
Data marts
OLAP Cubes and exercises: Roll-up, Drill-down, Slice, Dice, query optimization
OLAP Cubes Technologies: MOLAP, ROLAP
Exercise 3: Column format in ROLAP
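The four OLAP cube operations listed above can be illustrated with ordinary SQL aggregation. This is a minimal sketch on a hypothetical `sales` table with made-up rows, using `sqlite3` so it runs locally; the course notebooks themselves may use a different dataset and database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month INTEGER, country TEXT, city TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "FR", "Lyon", 10.0),
        (1, "FR", "Paris", 20.0),
        (1, "NO", "Oslo", 5.0),
        (2, "FR", "Paris", 30.0),
        (2, "NO", "Oslo", 15.0),
    ],
)


def query(sql, params=()):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql, params).fetchall()


# Slice: fix one dimension to a single value (month = 1).
slice_month_1 = query(
    "SELECT country, city, SUM(revenue) FROM sales WHERE month = ? "
    "GROUP BY country, city ORDER BY country, city",
    (1,),
)

# Dice: restrict two or more dimensions to subsets of their values.
dice_fr = query(
    "SELECT month, city, SUM(revenue) FROM sales "
    "WHERE month IN (1, 2) AND country IN ('FR') "
    "GROUP BY month, city ORDER BY month, city"
)

# Drill-down: aggregate at the finer level of the hierarchy (city).
by_city = query(
    "SELECT country, city, SUM(revenue) FROM sales "
    "GROUP BY country, city ORDER BY country, city"
)

# Roll-up: aggregate at the coarser level of the hierarchy (country).
by_country = query(
    "SELECT country, SUM(revenue) FROM sales GROUP BY country ORDER BY country"
)
```

Databases that support `GROUPING SETS` or `CUBE` can compute several of these aggregation levels in a single statement; SQLite does not, so each level is a separate query here.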

L2. Introduction to Cloud Computing and AWS

Create an IAM Role, Security Group, an IAM User, Bucket
Launch and delete a Redshift Cluster
Create PostgreSQL Database

L3. Implementing Data Warehouse on AWS

Redshift technology, Architecture, ETL
Exercise 1: Launch Redshift cluster
Infrastructure as Code (IaC)
Exercise 2: Creating Redshift Cluster using the AWS Python SDK
Exercise 3: Parallel ETL
Optimizing Table Design
Distribution Style: Even, All, Auto, Key
Sorting Key
Exercise 4: Table Design
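The Even, All, and Key distribution styles listed above can be pictured with a small simulation. This is only an illustration of the idea in plain Python: the `distribute` function, the use of `crc32` as the hash, and the sample rows are assumptions made for the sketch, and Redshift's internal placement logic is not reproduced here. The Auto style, where Redshift chooses for you, is not modeled.

```python
from zlib import crc32


def distribute(rows, style, num_nodes, dist_key=None):
    """Assign rows to nodes according to a simplified distribution style."""
    nodes = [[] for _ in range(num_nodes)]
    for index, row in enumerate(rows):
        if style == "EVEN":
            # Round-robin: rows are spread evenly regardless of their values.
            nodes[index % num_nodes].append(row)
        elif style == "ALL":
            # A full copy of the table is placed on every node.
            for node in nodes:
                node.append(row)
        elif style == "KEY":
            # Rows with the same key value always land on the same node.
            target = crc32(str(row[dist_key]).encode()) % num_nodes
            nodes[target].append(row)
        else:
            raise ValueError(f"unsupported style: {style}")
    return nodes


# Hypothetical rows: six sales by three users.
rows = [{"user_id": uid, "amount": amount}
        for uid, amount in [(1, 5), (2, 7), (3, 9), (1, 2), (2, 4), (3, 6)]]

even_sizes = [len(node) for node in distribute(rows, "EVEN", 3)]
all_sizes = [len(node) for node in distribute(rows, "ALL", 3)]
key_nodes = distribute(rows, "KEY", 3, dist_key="user_id")
```

The trade-off the sketch hints at: Key distribution keeps joinable rows together but can be skewed if the key values are unevenly spread, All avoids data movement for small dimension tables at the cost of storage, and Even balances load when no join key dominates.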

L4. Project

Project: Data Warehouse Details

This project designs and implements a cloud-based Data Warehouse on Amazon Redshift. It demonstrates an end-to-end ETL pipeline that extracts song and user activity data from files stored in an Amazon S3 bucket, stages and processes it in Redshift, and transforms it into a set of optimized dimensional tables (a star schema).
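The two stages of that pipeline, staging from S3 and transforming into the star schema, usually come down to a `COPY` statement followed by `INSERT ... SELECT` statements. The sketch below only builds the SQL strings and does not connect to Redshift. Every name in it (the bucket path, the IAM role ARN, the region, and the `staging_events` and `dim_user` tables and their columns) is a hypothetical placeholder, not taken from the project.

```python
def build_copy_sql(table, s3_path, iam_role_arn, region):
    """Build a Redshift COPY statement that loads JSON files from S3."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"REGION '{region}'\n"
        f"FORMAT AS JSON 'auto';"
    )


# Stage 1: bulk-load raw files into a staging table (placeholder values).
copy_sql = build_copy_sql(
    table="staging_events",
    s3_path="s3://example-bucket/log_data",
    iam_role_arn="arn:aws:iam::123456789012:role/exampleRedshiftRole",
    region="us-west-2",
)

# Stage 2: transform staged rows into a dimension table of the star schema.
insert_dim_user_sql = (
    "INSERT INTO dim_user (user_id, first_name, last_name)\n"
    "SELECT DISTINCT user_id, first_name, last_name\n"
    "FROM staging_events\n"
    "WHERE user_id IS NOT NULL;"
)

statements = [copy_sql, insert_dim_user_sql]
```

In a real run these strings would be executed in order through a database connection to the cluster. Loading with `COPY` before transforming lets Redshift ingest the S3 files in parallel, rather than inserting rows one at a time.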

