Automatic Schema Lineage Discovery for SQL and Notebook Pipelines. Includes an implementation of Algorithm 1, an SDG builder, a synthetic corpus generator, and a Colab quickstart for reproducible experiments.
big-data reproducible-research data-engineering sql-parser data-lineage duckdb sqlglot schema-dependency
Updated Nov 7, 2025 - Jupyter Notebook