Dbt_With_SnowFlake

Table of Contents

  • Overview
  • Data Build Tool
  • dbt Installation
  • Connecting dbt with Snowflake
  • Folder Structure
  • Program Flow
  • Program Execution
  • Data Model
  • Level Up
  • Documentation and Material
  • Tools and Technologies

Overview

  • The purpose of this project was to learn how the transformation step can be made easy using the data build tool (dbt).
  • Explore the different features dbt supports that can save development time.
  • Snowflake was used as the warehouse; there is a free 30-day trial when you sign up.

Data Build Tool

  • dbt enables analytics engineers to transform data in their warehouses by simply writing select statements.
  • dbt handles turning these select statements into tables and views. Models can also be materialized incrementally, and dbt manages the incremental logic (a minimal model example follows this list).
  • dbt does the T in ELT (Extract, Load, Transform) processes.
  • dbt provides a web UI that can be used to visualize the lineage and dependency of models.
  • The dbt web UI also provides details about each model, what it depends on, as well as the test cases defined on certain models.
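
  • For example, a dbt model is just a select statement saved as a .sql file under the models folder. A minimal sketch (the model, seed, and column names here are hypothetical, not taken from this repo):

  -- models/staging/stg_city.sql
  -- materialized as a view in the warehouse
  {{ config(materialized='view') }}

  select
      city_id,
      city_name,
      country_id
  from {{ ref('city') }}  -- "city" being a seed or upstream model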

dbt Installation

  • Install git and python.

  • dbt installation on Linux has some problems, and because of that we need to install it from the dbt-core project source.

  • We will be doing all of this inside a virtual environment.

  • This will create a virtual env

  python3 -m venv dbt-env
  • Activate the env
    • Remember to be in the correct folder.
  source dbt-env/bin/activate
  • In this repo dbt-core has already been downloaded, but you can clone it as well.
  git clone https://github.com/dbt-labs/dbt.git
  • Go into the dbt folder and install dbt.
  • In the requirements.txt you can specify which adapters to install, e.g. snowflake, big-query.
  cd dbt
  pip install -r requirements.txt
  • Command to verify that dbt is installed:
  dbt --version
  • Install the Snowflake plugin if it was not specified in the requirements.txt file:
  pip install dbt-snowflake
  • Open your editor of choice (VS Code is used in this demo). Running the command below will scaffold the project with different folders (seeds, models, etc.):
  dbt init <project-name>

Connecting dbt with Snowflake
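
  • dbt reads connection settings from the profiles.yml file (created under ~/.dbt by default). Add a Snowflake profile similar to the one below.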

dbt_model:
  outputs:
    dev:
      account: ap12345.ap-south-1.aws
      database: <database>
      password: <password>
      role: <role-from-snowflake>
      schema: <schema>
      threads: 2
      type: snowflake
      user: <username>
      warehouse: <warehouse-name-from-snowflake>
  target: dev
  • Once the Snowflake profile has been set, run the following command to check the connection:
  dbt debug

Folder Structure

  • dbt     - the cloned dbt repo used for installation
  • dbt-env   - Python virtual environment files
  • dbt-model
    • dbt-model   - after dbt init this is created
      • analyses
      • macros       - create macros here and refer later
      • models       - tables, views, incremental load, merge
      • seeds         - flat files, in case you want to load them to staging tables using dbt
      • snapshots       - SCD tables
      • tests        - tests on different models
      • dbt_project.yml    - one place to configure everything (see the sketch after this list)
      • packages.yml    - dbt has many packages which can be downloaded
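
  • As an illustration, a minimal dbt_project.yml sketch (assuming dbt 1.x; the project name matches the dbt_model profile above, other values are placeholders):

  name: dbt_model
  version: '1.0.0'
  profile: dbt_model          # must match the profile name in profiles.yml

  model-paths: ["models"]
  seed-paths: ["seeds"]
  snapshot-paths: ["snapshots"]
  test-paths: ["tests"]
  macro-paths: ["macros"]

  models:
    dbt_model:
      staging:
        +materialized: view
      core:
        +materialized: table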

Program Flow


Program Execution

  • Before executing any of the commands remember to be in the correct folder.
  cd <project-name>
  • To load file from seeds folder to Stage Tables in snowflake.
  dbt seed
  • The data will be in the Stage Tables; now we will load it into the Core/Dim tables.
    • City, Country, and Transactions will be loaded directly, as they need no history handling.
  dbt run
  • To run only a specific folder inside the models folder:
  dbt run -m <folder-name>
  • The snapshots folder has all the models on which SCD Type 2 is applied (a snapshot sketch follows this list).
  dbt snapshot
  • We can also run the test cases that are defined on different models, snapshots, and seeds (an example schema test follows this list).
  dbt test
  • dbt provides a web UI that can be accessed using the commands below.
    • Internally, all the metadata is saved as JSON and used by the web UI.
  dbt docs generate
  dbt docs serve
  • You can explore the different models in the UI, as well as the lineage graph.
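
  • For reference, a snapshot is defined in a .sql file inside the snapshots folder. A minimal SCD Type 2 sketch (table, key, and timestamp column names are hypothetical, and the target schema is an assumption):

  {% snapshot dim_customers_snapshot %}

  {{
      config(
          target_schema='snapshots',
          unique_key='customer_id',
          strategy='timestamp',
          updated_at='updated_at'
      )
  }}

  -- dbt adds dbt_valid_from / dbt_valid_to columns to track history
  select * from {{ ref('stg_customers') }}

  {% endsnapshot %}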
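
  • Tests are declared in a .yml file next to the models. A minimal sketch using dbt's built-in generic tests (model and column names are hypothetical):

  version: 2

  models:
    - name: stg_city
      columns:
        - name: city_id
          tests:
            - unique
            - not_null
        - name: country_id
          tests:
            - not_null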


Data Model

  • The source system provides a full dump (all of the data every time).
  • The Transactions table is append-only data; an incremental model (sketched after this list) suits it well.
  • Most of the tables in Core/Dim are SCD Type 1 or Type 2.
  • Country and City do not change much, so they can be loaded manually whenever needed.
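
  • A minimal sketch of an incremental model for such an append-only table (model, column, and key names are assumptions):

  -- models/core/fct_transactions.sql
  {{ config(materialized='incremental', unique_key='transaction_id') }}

  select
      transaction_id,
      city_id,
      amount,
      created_at
  from {{ ref('stg_transactions') }}

  {% if is_incremental() %}
  -- on incremental runs, only pick up rows newer than what is already loaded
  where created_at > (select max(created_at) from {{ this }})
  {% endif %}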

Level Up

  • Right now we are only loading data to Core/Dim.
  • Create a pipeline and orchestrate it using Airflow or Airbyte.
  • Build a Fact table and create visualizations on top of it.
  • Use dbt test for testing purposes, along with the different packages that can aid development.

Documentation and Material


Tools and Technologies

  • dbt
  • Snowflake
  • Git
