- Overview
- Data Build Tool
- Folder Structure
- Program Flow
- Program Execution
- Data Model
- Level Up
- Documentation and Material
- Tools and Technologies
- The purpose of doing this was to learn how the Transformation step can be made easy using data build tool (dbt).
- Check what different functionalities dbt supports that can save development time.
- Snowflake was used as the warehouse; it comes with a free 30-day trial when you sign up.
- dbt enables analytics engineers to transform data in their warehouses by simply writing select statements.
- dbt handles turning these select statements into tables and views. These tables can also be built incrementally, and dbt will handle that (a minimal model sketch follows this list).
- dbt does the T in ELT (Extract, Load, Transform) processes.
- dbt provides a web UI that can be used to visualize the lineage and dependency of models.
- The dbt web UI also provides detail about each model, what it depends on, as well as the test cases defined on certain models.
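- A model is just a select statement saved as a .sql file; dbt wraps it in the DDL needed to build the table or view. A minimal sketch, assuming a hypothetical stg_city staging table loaded through dbt:
-- models/dim_city.sql (hypothetical model)
{{ config(materialized='table') }}

select
    city_id,
    city_name,
    country_id
from {{ ref('stg_city') }}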
- Install Git and Python.
- dbt installation on Linux has some problems, so we need to install the dbt-core project instead.
- We will be doing all of this in a virtual environment.
- This will create a virtual environment:
python3 -m venv dbt-env
- Activate the env
- Remember to be in the correct folder.
source dbt-env/bin/activate
- In this repo dbt-core has already been downloaded, but you can clone it as well.
git clone https://github.com/dbt-labs/dbt.git
- Go into the dbt folder and install dbt.
- In requirements.txt you can specify which adapters to download, e.g. Snowflake, BigQuery.
cd dbt
pip install -r requirements.txt
- Command to verify dbt is installed:
dbt --version
- Install the Snowflake plugin if it was not specified in the requirements.txt file.
pip install dbt-snowflake
- Open your editor of choice (VS Code is used in this demo). Running the command below will populate the project with different folders (seeds, models, etc.):
dbt init <project-name>
- Snowflake provides a 30-day free trial.
- Once dbt is installed and configured, a .dbt folder will be visible in the home directory.
- Two files will be present:
- profiles.yml
- .user.yml
- In profiles.yml we provide our Snowflake credentials. Refer to the dbt documentation:
- https://docs.getdbt.com/reference/warehouse-profiles/snowflake-profile
dbt_model:
  outputs:
    dev:
      account: ap12345.ap-south-1.aws
      database: <database>
      password: <password>
      role: <role-from-snowflake>
      schema: <schema>
      threads: 2
      type: snowflake
      user: <username>
      warehouse: <warehouse-name-from-snowflake>
  target: dev
- When the Snowflake profile has been set, run the command to check the connection:
dbt debug
- dbt - the cloned dbt repo used for installation
- dbt-env - Python virtual environment files
- dbt-model - created after running dbt init
  - analyses
  - macros - create macros here and reference them later (a small macro sketch follows this list)
  - models - tables, views, incremental load, merge
  - seeds - flat files in case you want to load them to staging tables using dbt
  - snapshots - SCD tables
  - tests - tests on different models
  - dbt_project.yml - one place to configure everything
  - packages.yml - dbt has many packages that can be downloaded
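- Macros are reusable Jinja snippets that any model can call. A minimal sketch, assuming a hypothetical cents_to_dollars macro and an amount column stored in cents:
-- macros/cents_to_dollars.sql (hypothetical macro)
{% macro cents_to_dollars(column_name) %}
    ({{ column_name }} / 100)::number(16, 2)
{% endmacro %}
- Inside any model it can then be called as {{ cents_to_dollars('amount') }}.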
- Before executing any of the commands remember to be in the correct folder.
cd <project-name>
- To load files from the seeds folder to Stage Tables in Snowflake:
dbt seed
- The data is now in the Stage Tables; next, we load it to the Core/Dim tables.
- City, Country, and Transactions will be loaded, as they have no history handling needed.
dbt run
- To run a specific folder inside the models folder:
dbt run -m <folder-name>
- The snapshots folder has all the models on which SCD Type 2 is being used.
dbt snapshot
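- A snapshot is a select statement wrapped in a snapshot block; on each run dbt compares the results against the existing snapshot table and maintains the SCD Type 2 validity columns. A minimal sketch, assuming a hypothetical stg_customer staging table with customer_id and updated_at columns:
-- snapshots/customers_snapshot.sql (hypothetical snapshot)
{% snapshot customers_snapshot %}
{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}
select * from {{ ref('stg_customer') }}
{% endsnapshot %}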
- We can also run the test cases that are defined on different models, snapshots, and seeds.
dbt test
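- Tests can be generic ones declared in YAML (unique, not_null, etc.) or singular tests written as plain SQL in the tests folder; dbt test fails if such a query returns any rows. A minimal sketch, assuming a hypothetical transactions model with an amount column:
-- tests/assert_transaction_amount_not_negative.sql (hypothetical singular test)
select *
from {{ ref('transactions') }}
where amount < 0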
- dbt provides a web UI that can be accessed using the commands below.
- Internally, all metadata is saved as JSON and used by the web UI.
dbt docs generate
dbt docs serve
- You can check different things in the UI, including the lineage of each model.
- The source system provides a Full Dump (all of the data every time).
- The Transactions table is append-only data (an incremental model sketch follows this list).
- Most of the tables in Core/Dim are SCD Type 1 or Type 2.
- Country and City do not change much, so they can be loaded manually whenever needed.
- Right now data is only being loaded to Core/Dim.
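- For append-only data like Transactions, an incremental model only processes rows newer than what is already in the target table. A minimal sketch, assuming hypothetical stg_transactions, transaction_id, and created_at names:
-- models/transactions.sql (hypothetical incremental model)
{{ config(materialized='incremental', unique_key='transaction_id') }}
select
    transaction_id,
    customer_id,
    amount,
    created_at
from {{ ref('stg_transactions') }}
{% if is_incremental() %}
where created_at > (select max(created_at) from {{ this }})
{% endif %}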
- Create a pipeline and orchestrate it using Airflow or Airbyte.
- Build a Fact Table and visualizations on top of it.
- Use dbt test for testing purposes, along with the different packages that can aid in development.
- dbt documentation
- dbt profile setup
- dbt YouTube Playlist
- Snowflake YouTube Playlist
- Thanks to Kahan Data Solutions for the demo videos.
- dbt
- Snowflake
- Git