This project builds a clean, BI‑ready table user_base in BigQuery by:
- Moving complex SQL out of BI and into dbt layers for consistency, speed, and testability.
- Building a robust location dimension from component tables and deduping per source location to avoid conflicting geographies.
- Unifying user linkage across roles/routes (learners, educators, invite codes, sites/partners) and preserving users without links via left joins, producing a clear grain for dashboards.
- Documenting lineage with dbt Docs and enforcing quality with dbt tests.
- Project:
oroboro-dw - Datasets:
bronze_raw(sources),analytics_dev(models) - dbt source name:
raw
-
staging: thin cleans of raw sources
-
intermediate: reusable transforms and joins
intermediate/locations_clean.sql— one row perfrom_location_idwithcity/county/state/country+ coordinatesintermediate/stacked_users_partners.sql— one row per user + partner/site/classroom
-
marts: BI‑ready outputs
marts/user_base.sql— final table queried by BI
bronze_raw (raw sources)
├─ user_user
├─ location_* (components, types, base)
└─ (classroom, invites, partners, sites)
intermediate views
├─ locations_clean -- unique per from_location_id
└─ stacked_users_partners -- user ↔ partner/site/classroom
mart
└─ user_base (table) -- one row per user_id + partner_id + site_id
A single from_location_id can map to several to_location_ids representing different types (city/locality, county, state, country). The pipeline:
- Collects candidates and tags by type.
- Ranks city candidates (e.g., distance checks, non‑street heuristics).
- Picks one value per type and emits a single, consistent row per source location. This prevents partial/ambiguous geography from appearing in BI and supports stable filtering and grouping.
Users can be associated via multiple routes (classroom membership, invitations, partner codes, etc.). The pipeline:
- Normalizes these routes into a stacked intermediate with a consistent set of fields.
- Joins back to
user_userand the location dimension in the final mart. - Preserves users with no current link (via left joins) so global user counts remain accurate.
Grain of user_base: (user_id, partner_id, site_id).
intermediatemodels: views (lightweight, always reflect raw)marts.user_base: table (fast dashboards, predictable cost)
Change to a view by editing dbt_project.yml:
models:
user_base_project:
marts:
+materialized: viewmodels/marts/marts_schema.yml includes:
not_nullonuser_iddbt_utils.unique_combination_of_columnson(user_id, partner_id, site_id)Will add staging tests (e.g.,unique/not_nullon natural keys) as needed.
pip install dbt-bigquery
# Configure ~/.dbt/profiles.yml from profiles.example.yml
dbt deps # if using packages (dbt_utils)
dbt debug
# Build
dbt run --select intermediate.locations_clean intermediate.stacked_users_partners marts.user_base
# Test
dbt test
# Docs
dbt docs generate && dbt docs serve- Connect Metabase to dataset
analytics_devand queryuser_base. - Keep Metabase dynamic filter queries using fully‑qualified names when Metabase injects SQL. Aliases inside dbt models are fine and don’t affect this.
.
├─ dbt_project.yml
├─ profiles.example.yml
├─ packages.yml
├─ macros/
│ └─ utils.sql
├─ models/
│ ├─ sources.yml
│ ├─ intermediate/
│ │ ├─ locations_clean.sql
│ │ └─ stacked_users_partners.sql
│ └─ marts/
│ ├─ user_base.sql
│ └─ marts_schema.yml
└─ README.md
- dbt Core ≥ 1.5
- Adapter:
dbt-bigquery