
MobileTeleSystems/data-rentgen


Data.Rentgen


What is Data.Rentgen?

Data.Rentgen is a data lineage service compatible with the OpenLineage specification.

Note: the service is under active development and is not yet ready for use.

Goals

  • Collect lineage events produced by OpenLineage clients & integrations (Spark, Airflow, Flink, custom ones).
  • Store events at operation granularity (unlike Marquez, which stores them at job granularity) for finer detail.
  • Provide an API for run ↔ dataset lineage, as well as parent run → child run lineage.
  • Handle large volumes of lineage events by using Kafka as an event buffer and storing data in tables partitioned by event timestamp.
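To make the goals above more concrete, here is a minimal sketch of the kind of run event an OpenLineage client emits, and of how an event timestamp could be mapped to a table partition. It is an illustration only, not Data.Rentgen's actual code: the helper names `make_run_event` and `monthly_partition`, the namespaces, the producer URL, the schema version in `schemaURL`, and the monthly partition granularity are all assumptions made for this example.

```python
import uuid
from datetime import datetime, timezone


def make_run_event(event_type, job_name, inputs, outputs, event_time):
    """Build a minimal OpenLineage-style run event as a plain dict.

    Hypothetical helper for illustration; real clients (Spark, Airflow,
    Flink integrations) build and send these events themselves.
    """
    return {
        "eventType": event_type,                 # e.g. START, RUNNING, COMPLETE, FAIL
        "eventTime": event_time.isoformat(),     # ISO 8601 timestamp with offset
        "producer": "https://example.com/hypothetical-producer",  # placeholder
        # Example schema reference; the version shown is an assumption.
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": "example_namespace", "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
    }


def monthly_partition(event):
    """Derive a partition label from the event timestamp.

    Monthly granularity is an assumption; the README only says that
    tables are partitioned by event timestamp.
    """
    ts = datetime.fromisoformat(event["eventTime"])
    return f"{ts.year:04d}_{ts.month:02d}"


event = make_run_event(
    "COMPLETE",
    "etl.load_orders",
    inputs=[("postgres://source:5432", "public.orders")],
    outputs=[("hive://warehouse", "dwh.orders")],
    event_time=datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc),
)
print(monthly_partition(event))  # 2024_01
```

Each event ties one run to its input and output datasets, which is the information behind run ↔ dataset lineage. Partitioning by timestamp keeps queries over a time window from scanning the whole event history.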

Non-goals

  • This is not a data catalog. Use DataHub or OpenMetadata instead.
  • Static dataset → dataset lineage (such as view → table) is not supported.
  • Column-level lineage is not currently supported.

Documentation

See https://data-rentgen.readthedocs.io/