Week 2: Workflow Orchestration

If you're looking for Airflow videos from the 2022 edition, check the 2022 cohort folder.

Files.

Vincenzo's DE Zoomcamp Prefect Repo - For Week 2

Data Lake (GCS)

  • What is a Data Lake
  • ELT vs. ETL
  • Alternatives to components (S3/HDFS, Redshift, Snowflake, etc.)
  • Video
  • Slides

1. Introduction to Workflow Orchestration

  • What is orchestration?
  • Workflow orchestrators vs. other types of orchestrators
  • Core features of a workflow orchestration tool
  • Different types of workflow orchestration tools that currently exist
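
To make the core features listed above more concrete, here is a minimal sketch of two things a workflow orchestrator typically does: run tasks in dependency order and retry a task that fails. It uses only the Python standard library and is not how Prefect or any other tool is implemented. The names `run_workflow`, `depends_on`, `max_retries`, and the three toy tasks are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
    max_retries: int = 2,
) -> List[str]:
    """Run every task after its dependencies, retrying failed tasks."""
    completed: List[str] = []
    # static_order() yields a task only after all of its predecessors.
    for name in TopologicalSorter(depends_on).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception:
                if attempt == max_retries + 1:
                    raise  # out of retries: fail the whole run


    return completed


# Toy tasks (assumptions for illustration only).
log: List[str] = []
attempts = {"load": 0}


def extract() -> None:
    log.append("extract")


def transform() -> None:
    log.append("transform")


def load() -> None:
    # Fails on the first call to simulate a transient error.
    attempts["load"] += 1
    if attempts["load"] == 1:
        raise RuntimeError("transient failure")
    log.append("load")


order = run_workflow(
    {"extract": extract, "transform": transform, "load": load},
    # Each key lists the tasks that must finish before it starts.
    {"transform": {"extract"}, "load": {"transform"}},
)
print(order)
```

Real orchestrators add scheduling, state tracking, logging, and a UI on top of this basic loop.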

🎥 Video - TBA

2. Introduction to Prefect concepts

  • What is Prefect?
  • Installing Prefect
  • Prefect flow
  • Creating an ETL
  • Prefect task
  • Blocks and collections
  • Orion UI
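
As a rough preview of the flow and task concepts above, the sketch below wires three small tasks into one flow using Prefect's `@flow` and `@task` decorators. The function names, the in-memory sample rows, and the flow name are assumptions made for illustration, not the code used in the videos. The `try`/`except` fallback is also an assumption: it only lets the sketch run where Prefect is not installed.

```python
try:
    from prefect import flow, task
except ImportError:
    # Fallback so the sketch still runs without Prefect installed:
    # the decorators become pass-throughs that expose `.fn`, as Prefect does.
    def _passthrough(fn=None, **_options):
        def wrap(func):
            func.fn = func
            return func

        return wrap(fn) if callable(fn) else wrap

    flow = task = _passthrough


@task(retries=3)
def extract() -> list:
    """Hypothetical source: returns a few in-memory sample rows."""
    return [
        {"row_id": 1, "passenger_count": 2},
        {"row_id": 2, "passenger_count": 0},
    ]


@task
def transform(rows: list) -> list:
    """Drop rows that report zero passengers."""
    return [row for row in rows if row["passenger_count"] > 0]


@task
def load(rows: list) -> int:
    """Stand-in for a real sink: reports how many rows would be written."""
    return len(rows)


@flow(name="etl-sketch")
def etl_flow() -> int:
    # Inside a flow, tasks are called like ordinary functions.
    rows = extract()
    cleaned = transform(rows)
    return load(cleaned)
```

Calling `etl_flow()` runs the three tasks in order. Each task's undecorated function is available as `.fn` (for example `transform.fn(rows)`), which is handy for quick checks outside a flow run.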

🎥 Video - TBA

3. ETL with GCP & Prefect

  • Flow 1: Putting data to Google Cloud Storage

🎥 Video - TBA

4. From Google Cloud Storage to BigQuery

  • Flow 2: From GCS to BigQuery

🎥 Video - TBA

5. Parametrizing Flow & Deployments

  • Parametrizing the script from your flow
  • Parameter validation with Pydantic
  • Creating a deployment locally
  • Setting up Prefect Agent
  • Running the flow
  • Notifications

🎥 Video - TBA

6. Schedules & Docker Storage with Infrastructure

  • Scheduling a deployment
  • Flow code storage
  • Running tasks in Docker

🎥 Video - TBA

7. Prefect Cloud and Additional Resources

  • Using Prefect Cloud instead of local Prefect
  • Workspaces
  • Running flows on GCP

🎥 Video - TBA

Homework

TBA

Community notes

Did you take notes? You can share them here.

  • Add your notes here (above this line)

2022 notes

Most of these notes are about Airflow, but you will most likely find them useful too.