Skip to content

Latest commit

 

History

History

04-analytics-engineering

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

Module 4: Analytics Engineering

Goal: Transforming the data loaded in DWH into Analytical Views developing a dbt project.

Prerequisites

By this stage of the course you should have already:

  • A running warehouse (BigQuery or postgres)
  • A set of running pipelines ingesting the project dataset (week 3 completed)
  • The following datasets ingested from the course Datasets list:
    • Yellow taxi data - Years 2019 and 2020
    • Green taxi data - Years 2019 and 2020
    • fhv data - Year 2019.

Note

  • We have two quick hack to load that data quicker, follow this video for option 1 or check instructions in week3/extras for option 2

Setting up your environment

Note

the cloud setup is the preferred option.

the local setup does not require a cloud database.

Alternative A Alternative B
Setting up dbt for using BigQuery (cloud) Setting up dbt for using Postgres locally
- Open a free developer dbt cloud account following this link - Open a free developer dbt cloud account following this link

- Following these instructions to connect to your BigQuery instance - follow the official dbt documentation or
- follow the dbt core with BigQuery on Docker guide to setup dbt locally on docker or
- use a docker image from oficial Install with Docker.
- More detailed instructions in dbt_cloud_setup.md - You will need to install the latest version with the BigQuery adapter (dbt-bigquery).
- You will need to install the latest version with the postgres adapter (dbt-postgres).
After local installation you will have to set up the connection to PG in the profiles.yml, you can find the templates here

Content

Introduction to analytics engineering

  • What is analytics engineering?
  • ETL vs ELT
  • Data modeling concepts (fact and dim tables)

What is dbt?

  • Introduction to dbt

Starting a dbt project

Alternative A Alternative B
Using BigQuery + dbt cloud Using Postgres + dbt core (locally)
- Starting a new project with dbt init (dbt cloud and core)
- dbt cloud setup
- project.yml

- Starting a new project with dbt init (dbt cloud and core)
- dbt core local setup
- profiles.yml
- project.yml

dbt models

  • Anatomy of a dbt model: written code vs compiled Sources
  • Materialisations: table, view, incremental, ephemeral
  • Seeds, sources and ref
  • Jinja and Macros
  • Packages
  • Variables

Note

This video is shown entirely on dbt cloud IDE but the same steps can be followed locally on the IDE of your choice

Tip

  • If you recieve an error stating "Permission denied while globbing file pattern." when attempting to run fact_trips.sql this video may be helpful in resolving the issue

Testing and documenting dbt models

  • Tests
  • Documentation

Note

This video is shown entirely on dbt cloud IDE but the same steps can be followed locally on the IDE of your choice

Deployment

Alternative A Alternative B
Using BigQuery + dbt cloud Using Postgres + dbt core (locally)
- Deployment: development environment vs production
- dbt cloud: scheduler, sources and hosted documentation
- Deployment: development environment vs production
- dbt cloud: scheduler, sources and hosted documentation

Visualising the transformed data

🎥 Google data studio Video (Now renamed to Looker studio)

🎥 Metabase Video

Extra resourses

Note

If find the videos above overwhelming, we recommend completing the dbt Fundamentals course and then rewatching the module. It provides a solid foundation for all the key concepts you need in this module.

Advanced concepts

Community notes

Did you take notes? You can share them here.

Useful links