GitHub - rpalloni/mp-dwh: Python ingestion workflow > DuckDB warehouse > dbt transformations

Architectural Decision Record

Tables (in transformer/models) follow the medallion architecture design principle:

🥉 Staging – Raw data ingestion, untouched
🥈 Marts – Cleaned and transformed data (facts and dimensions)
🥇 Reports – Final data products for analytics, BI, or ML

This structure enables incremental processing and data quality improvements at each layer, with progressive refinement from raw ingestion to silver (cleaned and enriched) and gold (business-ready) tables.
It assigns clear responsibilities per step, fully decouples ingestion from transformations, enhances governance and access control, and simplifies data object discovery for analysts and data scientists.

Generic tests for data quality are included in the sources.yml while specific tests on logic are in the /tests repo.

Strategy

This project leverages a DuckDB data warehouse and runs a dbt project via the dbt-duckdb adapter with workflow (including ingestion from file sources) handled by a Typer app.
The next first two tools to add are:

A data ingestion platform (Fivetran, Airbyte) to load data from different sources into the source area of the DWH
A data orchestration platform (Airflow, Dagster) to automate the workflow execution

Results

Top tier mentors show a 62% rebooking rate compared to the 24% (combined) of standard mentors.
This is confirmed in both calculation approaches (using booking sessions or mentoring sessions) with only minor difference.
Results are shown by tier to highlight the relatively good performance of silver mentors and the alarming one of bronze.
Booking system reliability looks good despite some minor data inconsistencies and several possible improvements (see stg_events).

tier	tier_category	total_users_w_sessions	rebooked_count	rebooking_rate_pct
Gold	top_tier	42	26	61.90
Silver	standard_tier	57	22	38.60
Bronze	standard_tier	33	0	0.0

How to Run with Docker

docker server installed and running locally

git clone <repo> && cd <project-dir>
docker compose up --build

How to Run Locally

install uv

git clone <repo> && cd <project-dir>
uv sync
source .venv/bin/activate

Complete pipeline:

uv run src/workflow.py full
uv run src/workflow.py report

Or step-by-step:

uv run src/workflow.py ingest
uv run src/workflow.py run
uv run src/workflow.py test
uv run src/workflow.py report

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.github/workflows		.github/workflows
data		data
src		src
transformer		transformer
.gitignore		.gitignore
.python-version		.python-version
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Architectural Decision Record

Strategy

Results

How to Run with Docker

How to Run Locally

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Architectural Decision Record

Strategy

Results

How to Run with Docker

How to Run Locally

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages