MahmoudHousam/data-engineering

Project-based Learning - Data Engineering

This repository is for building healthcare data engineering skills through practical projects and exercises. The data comes from Synthea, a synthetic clinical data simulator that outputs realistic, but not real, patient data. The goal is hands-on experience with Python and SQL, along with a range of technologies and tools commonly used in data engineering.
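
As a rough illustration of how Python and SQL combine on Synthea-style data, here is a minimal sketch that parses a small patient CSV and aggregates it with SQL. Everything specific in it is an assumption made for illustration: the sample rows are invented, the column subset (`Id`, `BIRTHDATE`, `GENDER`, `CITY`) only mimics a Synthea patients export, the helper names `load_patients` and `patients_per_city` are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example runs with the standard library alone. It is not taken from the repository's code.

```python
import csv
import io
import sqlite3

# Hypothetical sample mimicking a few columns of a Synthea patients CSV export.
SAMPLE_CSV = """Id,BIRTHDATE,GENDER,CITY
p-001,1984-03-12,F,Boston
p-002,1990-07-30,M,Worcester
p-003,1975-11-02,F,Boston
"""


def load_patients(conn, csv_text):
    """Create a patients table and bulk-insert rows parsed from CSV text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patients ("
        "id TEXT PRIMARY KEY, birthdate TEXT, gender TEXT, city TEXT)"
    )
    rows = [
        (r["Id"], r["BIRTHDATE"], r["GENDER"], r["CITY"])
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    # Parameterized inserts avoid building SQL strings by hand.
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def patients_per_city(conn):
    """Aggregate patient counts per city with plain SQL."""
    cur = conn.execute(
        "SELECT city, COUNT(*) FROM patients GROUP BY city ORDER BY city"
    )
    return cur.fetchall()


# SQLite in memory is a stand-in here; a real pipeline would target PostgreSQL.
conn = sqlite3.connect(":memory:")
loaded = load_patients(conn, SAMPLE_CSV)
print(loaded, patients_per_city(conn))
```

The same parse, load, and query shape would carry over to a PostgreSQL target by swapping the connection for a PostgreSQL driver and adjusting the placeholder syntax.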

Tech Stack

Programming Languages

  • Python
  • SQL

Technologies and Tools

  • Docker
  • Terraform
  • PostgreSQL
  • Google Cloud Platform (GCP)
  • Mage (alternative to Airflow)
  • BigQuery
  • dbt (data build tool)
  • Apache Spark (Python & SQL)
  • Kafka
  • Faust
  • KSQL
  • ksqlDB
  • Make

Modules

  • Module 1: Containerization and Infrastructure as Code (IaC)
    • Docker
    • Terraform
    • GCP
  • Module 2: Workflow Orchestration
    • Data Lake
    • Mage
    • Airflow
  • Module 3: Data Warehouse
    • Data Warehouse
    • BigQuery
  • Module 4: Analytics Engineering
    • ELT vs. ETL
    • dbt
    • Testing (unit & integration testing)
  • Module 5: Batch Processing
    • Apache Spark (Python & SQL)
  • Module 6: Streaming
    • Kafka
    • Faust
    • KSQL
    • ksqlDB
    • Exposure to examples with Java & Scala

Workshops

  • Workshop 1: Data Ingestion
  • Workshop 2: Stream Processing with SQL
