This project is meant for experimenting with dbt
and the dbt-clickhouse
adapter for Analytics,
using NYC TLC Trip Record dataset as the datasource, with Kimball dimensional modeling technique.
1. Install dependencies from pyproject.toml and activate the created virtualenv:
uv sync && source .venv/bin/activate
2. (Optional) Install pre-commit:
brew install pre-commit
# From root folder where `.pre-commit-config.yaml` is located, run:
pre-commit install
3. Setup dbt profiles.yaml accordingly (use the profiles.tmpl.yaml
as template)
3.1. By default, the profiles_dir is the user '$HOME/.dbt/'
mkdir -p ~/.dbt/
cat profiles.tmpl.yml >> ~/.dbt/profiles.yml
3.2. Set the environment variables for dbt-clickhouse
:
export DBT_CLICKHOUSE_HOST=localhost
export DBT_CLICKHOUSE_PORT=8123
export DBT_CLICKHOUSE_FQDN_NYC_TAXI=fqdb_nyc_taxi
export DBT_CLICKHOUSE_TARGET_DATABASE=nyc_tlc_record_data
export DBT_CLICKHOUSE_USER=clickhouse
export DBT_CLICKHOUSE_PASSWORD=clickhouse
4. Install dbt dependencies and trigger the pipeline
4.1. Run dbt deps
to install dbt plugins
dbt deps
4.2. Run dbt seed
to push/create the tables from the .csv seed files to the target schema
dbt seed
4.3. Run dbt run to trigger the dbt models to run
dbt build --target [prod|dev]
# Alternatively you can run only a subset of the models with:
## +models/staging: Runs the dependencies/preceding models first that lead
## to 'models/staging', and then the target models
dbt [build|run] --select +models/staging --target [prod|dev]
## models/staging+: Runs the target models first, and then all models that depend on it
dbt [build|run] --select models/staging+ --target [prod|dev]
5. Generate the Docs and the Data Lineage graph with:
dbt docs generate
dbt docs serve
Access the generated docs at:
open http://localhost:8080
1. With your instance of Clickhouse and Postgres up, connect to ClickHouse and run:
CREATE DATABASE fqdb_nyc_taxi
ENGINE = PostgreSQL('host.docker.internal:5432', 'nyc_taxi', 'postgres', 'postgres', 'public', 0);
2. Build the Docker Image with:
docker build -t dbt-clickhouse:latest . --no-cache
3. Fire up the container with it:
docker run -d --rm \
-e DBT_CLICKHOUSE_HOST=host.docker.internal \
-e DBT_CLICKHOUSE_FQDN_NYC_TAXI=fqdb_nyc_taxi \
-e DBT_CLICKHOUSE_TARGET_DATABASE=nyc_tlc_record_data \
-e DBT_CLICKHOUSE_USER=clickhouse \
-e DBT_CLICKHOUSE_PASSWORD=clickhouse \
--name dbt-clickhouse \
dbt-clickhouse
- PEP-517: Packaging and dependency management with
uv
- Bootstrap dbt with ClickHouse Adapter (dbt-clickhouse)
- Generate and serve docs and Data Lineage Graphs locally
- Run
dbt-core
in Docker - Build at least one dbt staging_model based on Federated Queries on PostgreSQL