DataSystemsGroupUT/dataeng

Data Engineering:

Repository for the Data Engineering Course (LTAT.02.007)

Graph View

Teaching Assistants:

Acknowledgments

Special thanks to Emanuele Della Valle and Marco Brambilla from Politecnico di Milano for letting me "steal" some of their great slides.

Overview

Course Goal: Build a data pipeline about Internet Memes

Lectures

Date Title Material Mandatory Reads Extras
01/09 Course Intro Slides - pdf slide 45-109
03/09 Data Modeling Slides - pdf slide 1-44 Chp 4 p111-127, Chp 5 p151-156, Chp 6 p199-205 of [3]
10/09 DM for Relational Databases Slides - pdf slide 45-109 Chp 2, 6, and 7 (Normal Forms) of [1] Relational Model
10/09 DM for Data Warehouse Slides - pdf slide 109-118 pdf video Chp 2 of [2]
17/09 DM for Big Data Slides - pdf Chp 2 of [3], video paper
17/09 Key Value Stores Slides 1, Slides 2 pdf nosql
24/09 Column Oriented Databases Slides 1 Slides 2 pdf nosql
24/09 Document Databases Slides 1 Slides 2 pdf nosql
01/10 Graph Databases Slides 1 Slides 2 pdf1 pdf2 Chp 3 and 5 of [5] book
08/10 Data Ingestion Slides 1 Slide 2 Slide 3 Slide 4
15/10 Part 1 Recap Slides 1 pdf
22/10 Midterm
29/10 Data Engineering Pipelines (Part1) Slides 1 slide 2 pdf
05/11 Data Engineering Pipelines (Part2) Slides 1 Slides 2 Slides 3 Chp 10 of [3] R. Chang Pt 2 R. Chang Pt 3
12/11 Streaming Data (Part 1) Slide 1 Slide 2 Chp 11 of [3] Streaming 101 Streaming 102
19/11 Data Journey Slides
26/11 Streaming Data (Part 2) Slide 1 Slide 2
03/12 Data Wrangling (Part 1) pdf
10/12 Data Wrangling (Part 2) pdf

Practices (videos will be available after Group 2 issue)

Date Title Material Reads Videos Branch Notes
07-8/09 Docker Slides - Video GP1 Video GP2 Lab Branch QA GP2 only
14-15/09 Modeling and Querying Relational Data with Postgres Slides Chp 32 of [1]§ Video Homework 1
21-22/09 Modeling and Querying Key Value Data with Redis Slides Video Homework 2
28-29/09 Modeling and Querying Document Data with MongoDB Slides Video Homework 3
5-6/10 Modeling and Querying Graph Data with Neo4J Slides CypherManual Video Homework 4
19-20-26-27/10 Data Ingestion with Apache Kafka Slides Video 1 Video 2 Video 3 Video 4 Homework 5
10-11/11 Apache Airflow Data Pipelines Slides Video 1 Video 2 Homework 6
16-17/11 Stream Processing with Kafka Streams Slides Video 1 Video 2 Homework 7
23-24/11 Stream Processing with KSQL Slides Video 1 Video 2 Homework 7
07-8/12 Data Cleansing Slides Video 1 Video 2 Homework 8
14-15/12 Data Augmentation Slides Video 1 Video 2 Homework 8

Extras

Contributing

  • Modeling and Querying RDF data: SPARQL
  • Domain Driven Design: a summary
  • Event Sourcing: a summary
  • Data Pipelines with Luigi
  • Data Pipelines with Apache NiFi
  • Data Processing with Apache Flink

Syllabus

  • What is (Big) Data?
  • The Role of Data Engineer
  • Data Modeling
    • Data Replication
    • Data Partitioning
    • Transactions
  • Relational Data
  • NoSQL
    • Document
    • Graph
  • Data Warehousing
    • Star and Snowflake schemas
  • Data Vault
  • (Big) Data Pipelines
    • Big Data Systems Architectures
    • ETL and Data Pipelines
      • Best Practices and Anti-Patterns
    • Batch vs Streaming Processing
  • Data Cleansing
  • Data Augmentation

Books
