schema_drift_impact

A tool to analyze the impact of schema drift (disappearing columns) in dbt source tables by generating column lineage reports.

Overview

This project helps track how changes in source table columns (such as columns being removed or renamed) affect downstream dbt models. It uses column lineage data generated by dbt-colibri to create HTML reports showing direct and downstream usage of source columns.

Dependencies

- Python 3.6+
- dbt-colibri (install via pip install dbt-colibri)

Setup

Install dbt-colibri:
```
pip install dbt-colibri
```
In your dbt project directory, run the following commands to generate the required manifest:
```
dbt compile
dbt docs generate
colibri generate
```
This will create colibri-manifest.json in your dbt project's dist/ directory.
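
Before running the analysis, it can help to confirm that the manifest was actually produced. The following is a minimal sketch, not part of this repository: `load_manifest` is a hypothetical helper, and it assumes only that the file is valid JSON, nothing about its internal structure.

```python
import json
from pathlib import Path


def load_manifest(path):
    """Hypothetical helper: load the colibri manifest or fail with a clear message."""
    manifest_path = Path(path)
    if not manifest_path.is_file():
        # The file is missing, most likely because the setup commands were not run.
        raise FileNotFoundError(
            f"{manifest_path} not found; run 'colibri generate' in your dbt project first"
        )
    with manifest_path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Example call, run from the dbt project directory (adjust the path if yours differs):
# manifest = load_manifest("dist/colibri-manifest.json")
```

Failing early here gives a clearer error than a stack trace from deeper inside the report generation.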

Usage

Update the MANIFEST_PATH variable in schema_drift_impact.py to point to your colibri-manifest.json file (e.g., ../integration_tests/dist/colibri-manifest.json).
Modify the INPUTS list in schema_drift_impact.py to specify the source tables and columns you want to analyze. Each input should be a dictionary with:
- source: The fully qualified source name (e.g., "source.integration_tests.kdrogaieva.source_table")
- source_column: The column name to track (e.g., "dummy_varchar")
Set OUTPUT_FORMAT to "html" (the default) to generate an HTML report, or to "json" for JSON output.
Run the script:
```
python schema_drift_impact.py
```
If using HTML output, the report will be saved as column_lineage_report.html and also printed to the console.
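
As an illustration, the three settings described in this section might look like the following once edited. The values are the examples given above; `validate_inputs` is a hypothetical helper added here for illustration and is not claimed to exist in schema_drift_impact.py.

```python
# Example values taken from the usage notes; replace them with your own.
MANIFEST_PATH = "../integration_tests/dist/colibri-manifest.json"

INPUTS = [
    {
        "source": "source.integration_tests.kdrogaieva.source_table",
        "source_column": "dummy_varchar",
    },
]

OUTPUT_FORMAT = "html"  # or "json"


def validate_inputs(inputs, output_format):
    """Hypothetical check: every input has both keys and the format is known."""
    if output_format not in ("html", "json"):
        raise ValueError(f"Unsupported OUTPUT_FORMAT: {output_format!r}")
    for entry in inputs:
        missing = {"source", "source_column"} - set(entry)
        if missing:
            raise ValueError(f"Input {entry!r} is missing keys: {sorted(missing)}")
    return True


validate_inputs(INPUTS, OUTPUT_FORMAT)
```

A check like this catches a misspelled key or format before the script starts reading the manifest.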

Output

The report shows for each specified source column:

- Direct Usage: Models that directly reference the column.
- Downstream Usage: Models that depend on the direct-usage models.
- Errors: Any issues encountered during analysis or notification. If a column is not used anywhere, nothing is found for it.
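
The difference between direct and downstream usage can be pictured as a graph traversal: start from the models that reference the column and follow dependencies outward. The sketch below shows that idea only. The `children` mapping and the model names are hypothetical, and this is not necessarily how colibri_lineage.py or the colibri manifest represents lineage.

```python
from collections import deque


def downstream_models(direct_models, children):
    """Return models reachable from direct_models, excluding the direct ones.

    children maps a model name to the models that select from it
    (a hypothetical structure chosen for this illustration).
    """
    seen = set(direct_models)
    queue = deque(seen)
    downstream = set()
    while queue:
        model = queue.popleft()
        for child in children.get(model, []):
            if child not in seen:
                seen.add(child)
                downstream.add(child)
                queue.append(child)
    return downstream


# Hypothetical lineage: stg_orders reads the source column directly.
children = {
    "stg_orders": ["int_orders"],
    "int_orders": ["fct_orders", "fct_revenue"],
}
print(sorted(downstream_models(["stg_orders"], children)))
# prints ['fct_orders', 'fct_revenue', 'int_orders']
```

Here `stg_orders` would appear under Direct Usage, and the three printed models under Downstream Usage.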

Files

- schema_drift_impact.py: Main script to generate reports
- colibri_lineage.py: Library for processing column lineage data
- column_lineage_report.html: Generated HTML report (example)

Purpose

This tool is particularly useful for:

- Assessing the impact of schema changes in source systems
- Identifying which dbt models would be affected by column removals
- Planning data migration or refactoring efforts
- Ensuring data pipeline reliability when source schemas evolve
