
Welcome to the Hours With Experts Labs Repo

This repository is the central location for the hands-on programming component of the course.

Course Work Overview

The goal of the course is to build an end-to-end data pipeline that processes Amazon reviews.

The data pipeline you construct will look like the diagram below:

[Image: data pipeline diagram]

Repo Overview

  • Week 1 - Environment Setup - configure your environment to begin the programming course work
  • Week 2 - Spark SQL - write a Python Spark application to analyze local Amazon review data
  • Week 3 - Write to Amazon S3 - extend the program to connect to Amazon S3 and write data to it
  • Week 4 - Kafka + Bronze layer - read from Kafka instead of the local file, and use Spark Structured Streaming to write the output to Amazon S3, creating the Bronze layer
  • Week 5 - Silver layer - transform and enrich data from the Bronze layer, creating the Silver layer
  • Week 6 - Gold layer - define a schema for the Silver layer, stream the data from the Silver layer, transform it, and establish the Gold layer
  • TODO: Week 7 BI
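
The Bronze, Silver, and Gold steps above can be sketched in a few lines of plain Python. This is only an illustration of the layering idea, not the course solution: the labs use Spark, Kafka, and Amazon S3, while this sketch uses in-memory lists. The field names (`review_id`, `product_id`, `star_rating`), the sample messages, and the exact responsibility assigned to each layer are assumptions made for the example.

```python
import json
from collections import defaultdict

# Hypothetical raw messages, standing in for what a Kafka topic might deliver.
RAW_MESSAGES = [
    '{"review_id": "r1", "product_id": "p1", "star_rating": "5"}',
    '{"review_id": "r2", "product_id": "p1", "star_rating": "3"}',
    '{"review_id": "r3", "product_id": "p2", "star_rating": "oops"}',
    'not json at all',
]


def to_bronze(raw_messages):
    """Bronze (assumed role): store every message exactly as received."""
    return [{"value": message} for message in raw_messages]


def to_silver(bronze_rows):
    """Silver (assumed role): parse, apply types, and drop invalid records."""
    silver = []
    for row in bronze_rows:
        try:
            record = json.loads(row["value"])
            cleaned = {
                "review_id": record["review_id"],
                "product_id": record["product_id"],
                "star_rating": int(record["star_rating"]),
            }
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing fields, or a non-numeric rating.
            continue
        silver.append(cleaned)
    return silver


def to_gold(silver_rows):
    """Gold (assumed role): aggregate to one summary row per product."""
    totals = defaultdict(lambda: [0, 0])  # product_id -> [rating sum, count]
    for record in silver_rows:
        entry = totals[record["product_id"]]
        entry[0] += record["star_rating"]
        entry[1] += 1
    return {
        product_id: {"review_count": count, "avg_rating": rating_sum / count}
        for product_id, (rating_sum, count) in totals.items()
    }


bronze = to_bronze(RAW_MESSAGES)
silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping the raw messages untouched in the first layer means a stricter or looser validation rule can be replayed later without reading from the source again. In the labs, the same separation would be expressed with Spark DataFrames written to S3 rather than Python lists.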

Important Course Resources

Continued Learning

Want to continue your learning in Data Engineering? Great -- check out these links:

  • STL Big Data - Innovation, Data Engineering, Analytics Group: A meetup for users of Big Data services and tools in the Saint Louis area (with Kit Menke and Matt Harris). We are interested in Innovation (new tools, techniques, and services), Data Engineering (architecture and design of data movement systems), and Analytics (converting information into meaning).

  • Data Engineering Podcast: This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Some Previous "STL Big Data - I.D.E.A" Meetups

Apache Iceberg Presentation - August 2023

LakeFS Presentation - June 2023