MobileTeleSystems/data-rentgen

What is Data.Rentgen?

Data.Rentgen is a Data Motion Lineage service compatible with the OpenLineage specification.

Note: the service is under active development and is not ready for use yet.

Goals

  • Collect lineage events produced by OpenLineage clients and integrations (Spark, Airflow).
  • Support consuming large volumes of lineage events by using Kafka as an event buffer and storing data in tables partitioned by event timestamp.
  • Store operation-level events (rather than job-level events, as Marquez does) for finer detail.
  • Provide an API for building run ↔ dataset lineage, as well as parent run → child run lineage.
  • Build the lineage graph within specific time boundaries (unlike Marquez, where lineage is built only for the last job run).
  • Build the lineage graph at different levels of granularity, e.g. merge all individual Spark operations into a Spark applicationId or Spark applicationName.
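
The last two goals (time boundaries and granularity) can be illustrated with a small in-memory sketch. This is not Data.Rentgen's actual schema or API: the `OperationEvent` record, the `build_lineage` function, and the sample dataset names are all hypothetical and exist only to show the idea of filtering operation-level events by time and then merging them into coarser nodes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OperationEvent:
    """Hypothetical operation-level lineage record (not the real Data.Rentgen schema)."""

    application_name: str  # e.g. Spark applicationName
    application_id: str    # e.g. Spark applicationId
    operation: str         # a single operation inside the application
    dataset: str
    direction: str         # "input" or "output"
    event_time: datetime


def build_lineage(events, since, until, granularity="operation"):
    """Return sorted unique (source, target) edges for events in [since, until)."""
    # Each granularity maps an event to the graph node that represents it.
    node_key = {
        "operation": lambda e: f"{e.application_id}/{e.operation}",
        "application_id": lambda e: e.application_id,
        "application_name": lambda e: e.application_name,
    }[granularity]

    edges = set()
    for event in events:
        # Time boundaries: ignore events outside the requested window.
        if not (since <= event.event_time < until):
            continue
        node = node_key(event)
        if event.direction == "input":
            edges.add((event.dataset, node))  # dataset -> run
        else:
            edges.add((node, event.dataset))  # run -> dataset
    return sorted(edges)


def ts(day, hour, minute=0):
    """Helper for building sample timestamps in February 2025 (UTC)."""
    return datetime(2025, 2, day, hour, minute, tzinfo=timezone.utc)


# Made-up sample data: the same application runs on two consecutive days.
EVENTS = [
    OperationEvent("etl", "app-1", "op-1", "hive://raw.orders", "input", ts(1, 10, 0)),
    OperationEvent("etl", "app-1", "op-1", "hive://dm.orders", "output", ts(1, 10, 5)),
    OperationEvent("etl", "app-1", "op-2", "hive://dm.orders", "input", ts(1, 10, 10)),
    OperationEvent("etl", "app-1", "op-2", "hive://dm.report", "output", ts(1, 10, 15)),
    OperationEvent("etl", "app-2", "op-1", "hive://raw.orders", "input", ts(2, 10, 0)),
    OperationEvent("etl", "app-2", "op-1", "hive://dm.orders", "output", ts(2, 10, 5)),
]

if __name__ == "__main__":
    # Lineage for the first day only, at the finest granularity.
    for edge in build_lineage(EVENTS, ts(1, 0), ts(2, 0), "operation"):
        print(edge)
    # Lineage for both days, merged by application name.
    for edge in build_lineage(EVENTS, ts(1, 0), ts(3, 0), "application_name"):
        print(edge)
```

In this sketch, narrowing the window drops the second day's run entirely, and switching to `application_name` collapses both runs and all their operations into a single `etl` node. A real service would apply the same filter against timestamp-partitioned tables rather than a Python list.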

Non-goals

  • This is not a Data Catalog. Use DataHub or OpenMetadata instead.
  • Static Data Lineage, such as view → table, is not supported.
  • Column-level lineage is currently collected by OpenLineage, but is not yet consumed by Data.Rentgen.

Documentation

See https://data-rentgen.readthedocs.io/