This repository contains a curated list of awesome Data & AI Governance platforms and tools that help you discover, manage, and observe data and AI solutions in your organization.
As Generative AI (GenAI) solutions gain prominence, robust Metadata Management Systems equipped with Large Language Model (LLM) capabilities become increasingly critical. Such systems improve context awareness: they enable semantic understanding, bridge the gap between business and technical metadata, and power Retrieval-Augmented Generation (RAG) and Agent-to-Agent (A2A) interactions. We observe a notable convergence: RAG systems, initially context-limited, are incorporating Metadata Management capabilities, while Metadata Management platforms are adopting GenAI functionality to serve new, more sophisticated use cases.
The following categories outline key GenAI readiness features available across various Metadata Management solutions:
- AI & Semantic Metadata Enrichment
- Semantic Translation & Business Glossaries
- Technical Metadata & Query Analytics
- Data Quality, Observability & Governance
Disclaimer: The GenAI feature landscape evolves rapidly, and thus the provided list, feature groupings, and matrix may frequently change. Contributions, updates, and community-driven insights are strongly encouraged to keep this resource comprehensive and current.
The AI & Semantic Metadata Enrichment group covers built-in AI capabilities essential for Generative AI applications, enhancing metadata through semantic enrichment and context awareness. It supports Agent-to-Agent (A2A) protocols and Retrieval-Augmented Generation (RAG) systems by facilitating semantic understanding, vectorization, and natural language interactions. These capabilities align closely with the Model Context Protocol (MCP), which enables standardized metadata exchange and interoperability across GenAI applications, so that AI agents and RAG systems can use rich, structured metadata efficiently.
Product | Vendor / Origin | Offering Type | GitHub Repo | Official Website | MCP Support | GenAI Metadata Enrichment | Classic ML Metadata Enrichment | Built-in AI Assistant | Semantic Search | Vector Store Collector | API Access |
---|---|---|---|---|---|---|---|---|---|---|---|
Open Data Discovery | Provectus / Community | OSS | GitHub | opendatadiscovery.org | β | βοΈ | βοΈ | β | β | βοΈ | βοΈ |
OpenMetadata | OpenMetadata Community (Collate, ex-Atlas) | OSS | GitHub | open-metadata.org | βοΈ | βοΈ | βοΈ | β | β | β | βοΈ |
Amundsen | Lyft / LF AI & Data Foundation | OSS | GitHub | amundsen.io | β | βοΈ | βοΈ | β | β | β | βοΈ |
DataHub | LinkedIn / Community | OSS | GitHub | datahub.io | βοΈ | βοΈ | βοΈ | β | β | βοΈ | βοΈ |
Marquez | WeWork / LF AI & Data Foundation | OSS | GitHub | marquezproject.ai | β | β | β | β | β | β | βοΈ |
Gravitino | Apache Software Foundation | OSS | GitHub | gravitino | βοΈ | β | β | β | β | β | βοΈ |
Soda Core | Soda | OSS | GitHub | soda.io | β | β | β | β | β | β | β |
Elementary | Elementary Data | OSS | GitHub | elementary-data.com | β | βοΈ | β | β | β | β | β |
Egeria | LF / ODPi | OSS | GitHub | egeria.odpi.org | β | β | β | β | β | β | βοΈ |
Magda | CSIRO / Community | OSS | GitHub | magda.io | β | β | β | β | β | β | βοΈ |
Atlas | Apache Software Foundation | OSS | GitHub | atlas.apache.org | β | β | β | β | β | β | βοΈ |
Grai Core | Grai | OSS | GitHub | grai.io | β | β | β | β | β | β | βοΈ |
CKAN | Datopian / Link Digital | OSS | GitHub | ckan.org | β | β | β | β | β | β | βοΈ |
Hamilton | DagWorks / Apache Software Foundation | OSS | GitHub | hamilton | β | β | β | β | β | β | β |
DataHub Cloud | Acryl Data / SaaS DataHub | Prop | β | datahub.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Collate AI Platform | Collate / SaaS OpenMetadata | Prop | β | getcollate.io | βοΈ | βοΈ | βοΈ | βοΈ | β | βοΈ | βοΈ |
Atlan | Atlan | Prop | GitHub | atlan.com | β | βοΈ | β | βοΈ | βοΈ | βοΈ | βοΈ |
Ataccama ONE | Ataccama | Prop | β | ataccama.com | β | βοΈ | β | βοΈ | βοΈ | βοΈ | βοΈ |
Monte Carlo | Monte Carlo | Prop | β | montecarlodata.com | β | β | β | β | β | β | βοΈ |
Select Star | Select Star | Prop | β | selectstar.com | βοΈ | βοΈ | βοΈ | β | β | ? | βοΈ |
OvalEdge Data Catalog | OvalEdge | Prop | β | ovaledge.com | β | βοΈ | βοΈ | β | β | βοΈ | βοΈ |
Alation Data Catalog | Alation | Prop | β | alation.com | β | βοΈ | βοΈ | β | β | βοΈ | βοΈ |
Informatica Data Catalog | Informatica | Prop | β | informatica.com | β | βοΈ | βοΈ | β | β | βοΈ | βοΈ |
Precisely Data Integrity Suite | Precisely | Prop | β | precisely.com | β | βοΈ | βοΈ | β | β | β | βοΈ |
erwin Data Intelligence | Quest | Prop | β | quest.com | β | βοΈ | βοΈ | β | β | βοΈ | βοΈ |
OneTrust Data Discovery | OneTrust | Prop | β | onetrust.com | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
Collibra Data Catalog | Collibra | Prop | GitHub | collibra.com | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
Grai Cloud | SaaS Grai | Prop | - | grai.io | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
DataGalaxy Data Catalog | DataGalaxy | Prop | - | datagalaxy.com | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
data.world Data Catalog Platform | data.world / ServiceNow | Prop | GitHub | data.world | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
Zeenea Data Catalog | Zeenea / Actian division of HCL Software | Prop | GitHub | zeenea.com | β | βοΈ | βοΈ | β | βοΈ | β | βοΈ |
Explorium | Explorium | Prop | GitHub | explorium.ai | βοΈ | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
Talend Data Fabric | Talend / Qlik | Prop | β | talend.com | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
Datafold | Datafold | Prop | β | datafold.com | β | βοΈ | βοΈ | β | βοΈ | β | βοΈ |
Metaplane | Metaplane / Datadog | Prop | β | metaplane.dev | β | βοΈ | βοΈ | β | βοΈ | β | βοΈ |
BigID Data Intelligence Platform | BigID | Prop | β | bigid.com | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
IBM watsonx.data intelligence | IBM | Prop | β | ibm.com | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
DataKitchen DataOps Observability | DataKitchen | Prop | β | datakitchen.io | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
IBM Data Observability by Databand | Databand / IBM | Prop | β | ibm.com | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
Databricks Unity Catalog | Databricks | Cloud | β | databricks.com | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
Snowflake Horizon Catalog | Snowflake | Cloud | β | snowflake.com | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
AWS DataZone | Amazon | Cloud | β | aws.amazon.com | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
Google Cloud Dataplex | Google Cloud | Cloud | β | cloud.google.com | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
Microsoft Purview | Microsoft | Cloud | β | azure.microsoft.com | β | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ |
Features enabling a semantic bridge between business concepts and technical metadata. These capabilities enable precise mapping between business terms and data assets, fostering accurate, context-aware query generation.
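The idea of a semantic bridge can be illustrated with a minimal Python sketch. It is not taken from any product listed here: the `GlossaryTerm` class, the `resolve_terms` helper, and the column names are all hypothetical, and real catalogs expose this mapping through their own APIs. The sketch only shows how business terms linked to columns could be resolved from a natural-language question before a query is generated.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """A business glossary term linked to technical columns (hypothetical model)."""
    name: str
    description: str
    # Fully qualified column names; the naming scheme is an assumption.
    columns: list[str] = field(default_factory=list)


def resolve_terms(question: str, glossary: list[GlossaryTerm]) -> dict[str, list[str]]:
    """Return the linked columns for every glossary term mentioned in the question."""
    lowered = question.lower()
    return {t.name: t.columns for t in glossary if t.name.lower() in lowered}


glossary = [
    GlossaryTerm("Active Customer", "Customer with an order in the last 90 days",
                 ["crm.customers.is_active"]),
    GlossaryTerm("Net Revenue", "Revenue after refunds and discounts",
                 ["finance.orders.net_amount"]),
]

matches = resolve_terms("What is the net revenue per active customer?", glossary)
print(matches)
```

A production system would more likely match terms with embeddings or a search index than with substring checks; the substring match is used here only to keep the sketch self-contained.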
Product | Vendor / Origin | Offering Type | GitHub Repo | Official Website | Data Asset Level Business Name | Column Level Business Name | Data Asset Level Description | Column Level Description | Data Asset Level Tagging | Column Level Tagging | Business Glossary | Glossary to Data Asset Lineage | Glossary to Column Lineage | Glossary to Glossary Lineage | Glossary to Query Example Lineage |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Open Data Discovery | Provectus / Community | OSS | GitHub | opendatadiscovery.org | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
OpenMetadata | OpenMetadata Community (Collate, ex-Atlas/Databook) | OSS | GitHub | open-metadata.org | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β |
Amundsen | Lyft / LF AI & Data Foundation | OSS | GitHub | amundsen.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | β | β |
DataHub | LinkedIn-origin / DataHub Project | OSS | GitHub | datahub.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β |
Marquez | WeWork-origin / LF AI & Data Foundation | OSS | GitHub | marquezproject.ai | β | β | βοΈ | βοΈ | βοΈ | βοΈ | β | β | β | β | β |
Gravitino | Apache Software Foundation | OSS | GitHub | gravitino | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | β | β | β |
Soda Core | Soda | OSS | GitHub | soda.io | β | β | β | β | β | β | β | β | β | β | β |
Elementary | Elementary Data | OSS | GitHub | elementary-data.com | β | β | β | β | β | β | β | β | β | β | β |
Egeria | LF / ODPi | OSS | GitHub | egeria.odpi.org | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β |
Magda | magda.io / CSIRO-origin | OSS | GitHub | magda.io | βοΈ | β | βοΈ | βοΈ | βοΈ | β | βοΈ | β | β | βοΈ | β |
Atlas | Apache Software Foundation | OSS | GitHub | atlas.apache.org | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β |
CKAN | Datopian / Link Digital | OSS | GitHub | ckan.org | βοΈ | β | βοΈ | β | βοΈ | β | β | β | β | β | β |
Hamilton | DagWorks / Apache Software Foundation | OSS | GitHub | hamilton | β | β | βοΈ | β | β | β | β | β | β | β | β |
DataHub Cloud | Acryl Data / SaaS DataHub | Prop | - | datahub.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Collate AI Platform | Collate / SaaS OpenMetadata | Prop | - | getcollate.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Atlan | Atlan | Prop | GitHub | atlan.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Grai Core | Grai | OSS | GitHub | grai.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Ataccama ONE | Ataccama | Prop | - | ataccama.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Monte Carlo | Monte Carlo | Prop | - | montecarlodata.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | βοΈ | β | β | β | β |
Select Star | Select Star | Prop | - | selectstar.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
OvalEdge Data Catalog | OvalEdge | Prop | - | ovaledge.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Alation Data Catalog | Alation | Prop | - | alation.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Informatica Data Catalog | Informatica | Prop | - | informatica.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Precisely Data Integrity Suite | Precisely | Prop | - | precisely.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β |
erwin Data Intelligence | Quest | Prop | - | quest.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
OneTrust Data Discovery | OneTrust | Prop | - | onetrust.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Collibra Data Catalog | Collibra | Prop | GitHub | collibra.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Grai Cloud | SaaS Grai | Prop | - | grai.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
DataGalaxy Data Catalog | DataGalaxy | Prop | - | datagalaxy.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
data.world Data Catalog Platform | ServiceNow | Prop | GitHub | data.world | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Zeenea Data Catalog | Zeenea / Actian division of HCL Software | Prop | GitHub | zeenea.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Explorium | Explorium | Prop | GitHub | explorium.ai | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Talend Data Fabric | Talend / Qlik | Prop | - | talend.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Datafold | Datafold | Prop | - | datafold.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β |
Metaplane | Metaplane / Datadog | Prop | - | metaplane.dev | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β |
BigID Data Intelligence Platform | BigID | Prop | - | bigid.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
IBM watsonx.data intelligence | IBM | Prop | - | ibm.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
DataKitchen DataOps Observability | DataKitchen | Prop | - | datakitchen.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
IBM Data Observability by Databand | Databand / IBM | Prop | - | ibm.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Databricks Unity Catalog | Databricks | Cloud | - | databricks.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Snowflake Horizon Catalog | Snowflake | Cloud | - | snowflake.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
AWS DataZone | AWS | Cloud | - | aws.amazon.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Google Cloud Dataplex | Google Cloud | Cloud | - | cloud.google.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Microsoft Purview | Microsoft | Cloud | - | azure.microsoft.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Core features providing detailed technical insights, query analytics, and usage patterns critical for metadata-informed query generation, optimization, and efficient GenAI-driven data interactions. Essential for automated database query generation, lineage tracking, and metadata-informed GenAI processes.
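As a rough illustration of metadata-informed query generation, the sketch below renders technical metadata (column names, data types, and curated query examples) into a text block that could be placed in an LLM prompt. The `Column` class, the `render_table_context` function, and the table and column names are assumptions made for this example and do not correspond to any specific catalog's API.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Column-level technical metadata (hypothetical shape)."""
    name: str
    data_type: str
    description: str = ""


def render_table_context(table: str, columns: list[Column],
                         example_queries: list[str]) -> str:
    """Render table metadata as plain text suitable for a prompt."""
    lines = [f"Table: {table}", "Columns:"]
    for c in columns:
        # Append the description only when one is available.
        suffix = f" -- {c.description}" if c.description else ""
        lines.append(f"  - {c.name} ({c.data_type}){suffix}")
    if example_queries:
        lines.append("Example queries:")
        lines.extend(f"  {q}" for q in example_queries)
    return "\n".join(lines)


columns = [
    Column("order_id", "BIGINT", "Primary key"),
    Column("net_amount", "DECIMAL(12,2)"),
]
context = render_table_context(
    "finance.orders", columns, ["SELECT SUM(net_amount) FROM finance.orders"]
)
print(context)
```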
Product | Vendor / Origin | Offering Type | GitHub Repo | Official Website | Data Asset Level Technical (System) Name | Column Level Technical (System) Name | Column Data Type | Column Level Constraints | Data Asset Relationships (Data Model) | Data Asset Level Lineage | Column Level Lineage | Data Asset Level Metadata Versioning & History | Column Level Metadata Versioning & History | Curated Query Examples | Most Frequent Queries | Recent Queries | Data Sample |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Open Data Discovery | Provectus / Community | OSS | GitHub | opendatadiscovery.org | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | β | βοΈ | βοΈ | β | β | β |
OpenMetadata | OpenMetadata Community (Collate, ex-Atlas/Databook) | OSS | GitHub | open-metadata.org | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | βοΈ | βοΈ |
Amundsen | Lyft / LF AI & Data Foundation | OSS | GitHub | amundsen.io | βοΈ | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ | β | β | βοΈ | βοΈ | βοΈ | βοΈ |
DataHub | LinkedIn-origin / DataHub Project | OSS | GitHub | datahub.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | β | β |
Marquez | WeWork-origin / LF AI & Data Foundation | OSS | GitHub | marquezproject.ai | βοΈ | βοΈ | βοΈ | ? | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | ? | ? | β |
Gravitino | Apache Software Foundation | OSS | GitHub | gravitino | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | β | β | β | β | β |
Soda Core | Soda | OSS | GitHub | soda.io | βοΈ | βοΈ | βοΈ | βοΈ | β | β | β | β | β | β | β | β | βοΈ |
Elementary | Elementary Data | OSS | GitHub | elementary-data.com | βοΈ | βοΈ | βοΈ | βοΈ | β | βοΈ | βοΈ | β | β | β | β | β | β |
Egeria | LF / ODPi | OSS | GitHub | egeria.odpi.org | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | ? | ? | ? | βοΈ |
Magda | magda.io / CSIRO-origin | OSS | GitHub | magda.io | βοΈ | βοΈ | βοΈ | β | β | β | β | β | β | β | βοΈ | βοΈ | βοΈ |
Atlas | Apache Software Foundation | OSS | GitHub | atlas.apache.org | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | β | β | β |
Grai Core | Grai | OSS | GitHub | grai.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | β | β | β |
CKAN | Datopian / Link Digital | OSS | GitHub | ckan.org | βοΈ | βοΈ | βοΈ | β | β | β | β | βοΈ | β | β | β | β | βοΈ |
Hamilton | DagWorks / Apache Software Foundation | OSS | GitHub | hamilton | βοΈ | β | β | β | β | βοΈ | βοΈ | βοΈ | βοΈ | β | β | β | β |
DataHub Cloud | Acryl Data / SaaS DataHub | Prop | - | datahub.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | β | β |
Collate AI Platform | Collate / SaaS OpenMetadata | Prop | - | getcollate.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | βοΈ | βοΈ |
Atlan | Atlan | Prop | GitHub | atlan.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Ataccama ONE | Ataccama | Prop | - | ataccama.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Monte Carlo | Monte Carlo | Prop | - | montecarlodata.com | βοΈ | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | β | β | β |
Select Star | Select Star | Prop | - | selectstar.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
OvalEdge Data Catalog | OvalEdge | Prop | - | ovaledge.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Alation Data Catalog | Alation | Prop | - | alation.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Informatica Data Catalog | Informatica | Prop | - | informatica.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Precisely Data Integrity Suite | Precisely | Prop | - | precisely.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
erwin Data Intelligence | Quest | Prop | - | quest.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
OneTrust Data Discovery | OneTrust | Prop | - | onetrust.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Collibra Data Catalog | Collibra | Prop | GitHub | collibra.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Grai Cloud | SaaS Grai | Prop | - | grai.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
DataGalaxy Data Catalog | DataGalaxy | Prop | - | datagalaxy.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
data.world Data Catalog Platform | ServiceNow | Prop | GitHub | data.world | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Zeenea Data Catalog | Zeenea / Actian division of HCL Software | Prop | GitHub | zeenea.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Explorium | Explorium | Prop | GitHub | explorium.ai | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Talend Data Fabric | Talend / Qlik | Prop | - | talend.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Datafold | Datafold | Prop | - | datafold.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Metaplane | Metaplane / Datadog | Prop | - | metaplane.dev | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
BigID Data Intelligence Platform | BigID | Prop | - | bigid.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
IBM watsonx.data intelligence | IBM | Prop | - | ibm.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
DataKitchen DataOps Observability | DataKitchen | Prop | - | datakitchen.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
IBM Data Observability by Databand | Databand / IBM | Prop | - | ibm.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Databricks Unity Catalog | Databricks | Cloud | - | databricks.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Snowflake Horizon Catalog | Snowflake | Cloud | - | snowflake.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
AWS DataZone | Amazon | Cloud | - | aws.amazon.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Google Cloud Dataplex | Google Cloud | Cloud | - | cloud.google.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Microsoft Purview | Microsoft | Cloud | - | azure.microsoft.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Essential features for maintaining data reliability, trustworthiness, observability, collaborative governance, and secure access management. Integral to ensuring metadata-driven AI operations remain accurate, dependable, securely managed, and compliant with sensitive data handling requirements.
Product | Vendor / Origin | Offering Type | GitHub Repo | Official Website | Data Quality | Data Statistics | Popularity Scoring | Data Status | Collaborative Editing | User Feedback | Sensitive Data Detection | RBAC on Metadata |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Open Data Discovery | Provectus / Community | OSS | GitHub | opendatadiscovery.org | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | βοΈ | βοΈ |
OpenMetadata | OpenMetadata Community (Collate, ex-Atlas/Databook) | OSS | GitHub | open-metadata.org | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Amundsen | Lyft / LF AI & Data Foundation | OSS | GitHub | amundsen.io | β | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | β |
DataHub | LinkedIn / DataHub Project | OSS | GitHub | datahubproject.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Marquez | WeWork / LF AI & Data Foundation | OSS | GitHub | marquezproject.ai | β | βοΈ | β | βοΈ | β | β | β | x |
Gravitino | Apache Software Foundation | OSS | GitHub | Gravitino | β | βοΈ | β | β | β | β | β | x |
Soda Core | Soda | OSS | GitHub | soda.io | βοΈ | βοΈ | β | β | β | β | βοΈ | β |
Elementary | Elementary Data | OSS | GitHub | elementary-data.com | βοΈ | βοΈ | β | β | β | β | β | β |
Egeria | LF / ODPi | OSS | GitHub | egeria.odpi.org | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Magda | CSIRO / magda.io | OSS | GitHub | magda.io | β | βοΈ | β | βοΈ | βοΈ | βοΈ | β | βοΈ |
Atlas | Apache Software Foundation | OSS | GitHub | atlas.apache.org | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Grai Core | Grai | OSS | GitHub | grai.io | βοΈ | βοΈ | β | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
CKAN | Datopian & Link Digital | OSS | GitHub | ckan.org | β | βοΈ | β | βοΈ | βοΈ | βοΈ | β | β |
Hamilton | Apache Software Foundation | OSS | GitHub | dagworks.io/hamilton | β | β | β | β | β | β | β | β |
DataHub Cloud | Acryl Data / SaaS DataHub | Prop | - | datahub.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Collate AI Platform | Managed OpenMetadata | Prop | - | getcollate.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Atlan | Atlan | Prop | GitHub | atlan.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Ataccama ONE | Ataccama | Prop | - | ataccama.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Monte Carlo | Monte Carlo | Prop | - | montecarlodata.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Select Star | Select Star | Prop | - | selectstar.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
OvalEdge | OvalEdge | Prop | - | ovaledge.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Alation | Alation | Prop | - | alation.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Informatica Data Catalog | Informatica | Prop | - | informatica.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Precisely Data Integrity Suite | Precisely | Prop | - | precisely.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
erwin Data Intelligence | Quest | Prop | - | quest.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
OneTrust Data Discovery | OneTrust | Prop | - | onetrust.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Collibra Data Catalog | Collibra | Prop | GitHub | collibra.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Grai Cloud | SaaS Grai | Prop | - | grai.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
DataGalaxy Data Catalog | DataGalaxy | Prop | - | datagalaxy.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
data.world | ServiceNow | Prop | GitHub | data.world | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Zeenea | Zeenea / Actian division of HCL Software | Prop | GitHub | zeenea.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Explorium | Explorium | Prop | GitHub | explorium.ai | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Talend Data Fabric | Talend / Qlik | Prop | - | talend.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Datafold | Datafold | Prop | - | datafold.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Metaplane | Metaplane / Datadog | Prop | - | metaplane.dev | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
BigID Data Intelligence Platform | BigID | Prop | - | bigid.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
IBM watsonx.data intelligence | IBM | Prop | - | ibm.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
DataKitchen DataOps Observability | DataKitchen | Prop | - | datakitchen.io | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
IBM Data Observability by Databand | Databand / IBM | Prop | - | ibm.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Databricks Unity Catalog | Databricks | Cloud | - | databricks.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Snowflake Horizon Catalog | Snowflake | Cloud | - | snowflake.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
AWS DataZone | Amazon | Cloud | - | aws.amazon.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Google Cloud Dataplex | Google Cloud | Cloud | - | cloud.google.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Microsoft Purview | Microsoft | Cloud | - | azure.microsoft.com | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Tool | Specification-based | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|---|
Alation | β | βοΈ | β | βοΈ | β | β | βοΈ | β | β | β | β |
Amundsen | β | βοΈ | βοΈ | βοΈ | β | β | β | β | β | β | β |
Ataccama | β | βοΈ | β | βοΈ | β | β | βοΈ | β | β | β | β |
Atlan | β | βοΈ | β | βοΈ | β | β | βοΈ | β | β | βοΈ | βοΈ |
Atlas | β | βοΈ | β | βοΈ | β | β | β | β | β | β | β |
Microsoft Purview | β | βοΈ | ? | βοΈ | β | β | ? | β | β | β | β |
CKAN | βοΈ DCAT, DCAT-AP, Schema.org and more | βοΈ | β | β | βοΈ details | β | βοΈ details | β | βοΈ details | β | β |
Collibra | β | βοΈ | ? | βοΈ | β | β | ? | β | β | β | β |
DataGalaxy | β | βοΈ | βοΈ | βοΈ | β | β | β | βοΈ | βοΈ | ? | ? |
Databand | β | ? | ? | ? | β | ? | ? | ? | βοΈ | β | β |
Datafold | β | βοΈ | βοΈ | βοΈ | β | β | βοΈ | β | βοΈ | β | β |
DataHub | βοΈ details | βοΈ | βοΈ | βοΈ | β | β | β | β | β | β | β |
Google Cloud Dataplex | β | βοΈ | β | βοΈ | β | β | ? | β | β | β | β |
Informatica | β | βοΈ | βοΈ | βοΈ | β | β | βοΈ | β | β | ? | β |
Magda | β | βοΈ | β | β | βοΈ | β | β | β | β | β | β |
Marquez | OpenLineage | βοΈ | β | βοΈ | ? | β | β | β | β | βοΈ | β |
Monte Carlo | β | βοΈ | β | βοΈ | β | β | βοΈ | β | βοΈ | β | β |
Select Star | β | βοΈ | βοΈ | βοΈ | βοΈ | β | β | βοΈ | β | βοΈ | βοΈ |
Open Data Discovery | ODD Specification | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | β | βοΈ |
OpenMetadata | JSON Schema | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ | βοΈ |
Stemma | β | βοΈ | βοΈ | βοΈ | β | β | ? | βοΈ | β | β | β |
Talend | β | βοΈ | ? | βοΈ | β | β | βοΈ | β | β | β | β |
Grai | Grai Schemas | βοΈ | β | βοΈ | β | βοΈ | βοΈ | β | β | βοΈ | βοΈ |
Hamilton | Hamilton | βοΈ | βοΈ | ? | βοΈ | β | βοΈ | Β½ | βοΈ | βοΈ | β |
Definitions:
- Specification-based - uses an open standard for collecting metadata to allow efficient time-to-discovery and federating data catalogs
- Search-based - lets users search for data assets
- Network-based - provides rich context about data asset ownership
- Lineage-based - provides lineage for all entities the solution operates
- Federation - the ability to map multiple data catalogs into a single UI to avoid repeated data collection.
- ML 1st citizen - treats ML entities as first-class entities, so you can use them like any other data asset.
- Data Quality - includes mature data quality assurance tools.
- End-to-end lineage - data lineage that includes all data assets used in the organization across all its data catalogs and ML tools.
- Column-level lineage - data lineage with column level granularity
- Data collaboration - makes it possible to bring together data from various internal and external sources to unlock combined data insights
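To make the column-level lineage definition above concrete, here is a small Python sketch that walks a lineage graph upstream from one column. The `LINEAGE` mapping, the column names, and the `upstream_columns` function are hypothetical and only illustrate the concept; each catalog stores and exposes lineage in its own way.

```python
from collections import deque

# Hypothetical lineage edges: downstream column -> columns it is derived from.
LINEAGE = {
    "mart.revenue.net_amount": ["staging.orders.amount", "staging.refunds.amount"],
    "staging.orders.amount": ["raw.orders.amount"],
    "staging.refunds.amount": ["raw.refunds.amount"],
}


def upstream_columns(column: str, lineage: dict[str, list[str]]) -> set[str]:
    """Collect every direct and transitive upstream column via breadth-first search."""
    seen: set[str] = set()
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream_columns("mart.revenue.net_amount", LINEAGE)))
```

The same traversal run in the opposite direction (upstream column to its dependents) is the basis of lineage impact analysis.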
Amundsen is a popular open-source data catalog for metadata management and data discovery that originated at Lyft. Stemma, created by Amundsen maintainers, provides a managed enterprise data catalog inspired by Amundsen.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
β | βοΈ | βοΈ | βοΈ | β | β | β | β | β | β | β |
More features
- Strategy: Push
- UX personalization: No
- AI autowiring: No
- Rich data profiling: No
- Recommendations: Yes
- Schemas, Description: Yes
- Complex schemas: No
- Data preview: Yes
- Column statistics: Yes
- Data owner: Yes
- Top data users: Yes
- Change notifications: No
- Change feed: No
- Deployment:
- Supported data sources: Hive, Redshift, Druid, RDBMS, Presto, Snowflake
DataHub is an open-source data catalog enabling data discovery, data observability and federated governance that originated from LinkedIn and is commercially offered by Acryl Data as a cloud-hosted SaaS offering.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
---|---|---|---|---|---|---|---|---|
βοΈ details | βοΈ | βοΈ | βοΈ | β | β | β | β | β |
More features
- Strategy: Push, Pull
- Customizable metadata model: Yes. The metadata model can be declared using the open-source Pegasus language, and is interoperable with JSONSchema and Avro
- Rich data profiling: Yes
- Recommendations: Yes
- Schemas, Description: Yes
- Complex schemas: Yes
- Data preview: Yes
- Column statistics: Yes
- Data owner: Yes
- Top data users: Yes
- Lineage impact analysis: Yes
- Change notifications: Yes
- Change feed: No
- Automation: Yes
- UX personalization: No
- Deployment: docker-compose / Kubernetes with Helm, or using Acryl Data's SaaS offering
- Supported data sources:
- Snowflake
- BigQuery
- Redshift
- Hive
- Athena
- Postgres
- MySQL
- SQL Server
- Trino
- Delta Lake
- S3
- Looker
- PowerBI
- Tableau
- Mode
- Metabase
- Redash
- Superset
- Airflow
- Great Expectations
- dbt
- Feast
- SageMaker
- Glue
- Kafka
- NiFi
- Okta
- LDAP
- Slack
- There are 50+ integrations; see the docs for the latest.
Marquez is an open-source data catalog for collecting, aggregating, and visualizing a data ecosystem's metadata. It originated at WeWork.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
OpenLineage | βοΈ | β | βοΈ | ? | β | β | β | β | βοΈ | β |
More features
- Strategy: Push
- UX personalization: No
- AI autowiring: No
- Rich data profiling: No
- Recommendations: No
- Schemas, Description: Yes
- Complex schemas: No
- Data preview: Yes
- Column statistics: No
- Data owner: Yes
- Top data users: ?
- Change notifications: No
- Change feed: No
- Deployment:
- Supported data sources: S3, Kafka
Apache Atlas is an open-source data catalog for metadata collection, governance, and data democratization.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
β | βοΈ | β | βοΈ | β | β | β | β | β | β | β |
More features
- Strategy: Push
- UX personalization: No
- AI autowiring: No
- Rich data profiling: No
- Recommendations: No
- Schemas, Description: Yes
- Complex schemas: No
- Data preview: No
- Column statistics: No
- Data owner: No
- Top data users: ?
- Change notifications: Yes
- Change feed: No
- Deployment:
- Supported data sources: HBase, Hive, Sqoop, Kafka, Storm
CKAN is an open-source data catalog for data management, widely adopted by governments, NGOs, research institutions, and enterprises. It is actively maintained by a global community, with four of the eight core maintainers currently funded by Link Digital.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
✔️ Yes (DCAT, DCAT-AP, schema.org via plugins) | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ (ckanext-qa) | ❌ | ✔️ (ckanext-archiver) | ❌ | ✔️ (possible via external integrations or Drupal front-end) |
More features
- Strategy: Push. CKAN is used to publish or upload datasets into a central catalog.
- UX personalization: Achievable through integration with CMS platforms like Drupal or any custom front end.
- AI autowiring: Yes. Experimental support is available via ckanext-embeddings.
- Rich data profiling: Yes, via ckanext-validation extension.
- Recommendations: Possible, but custom development or integration is required.
- Metadata Schemas and Custom Fields: Yes, using ckanext-scheming, one of the most widely used extensions for complex metadata schemas.
- Complex schemas: Yes (via ckanext-dcat extension)
- Data preview: Yes
- Visual previews: Yes
- Column statistics: No
- Data owner: No
- Top data users: Achievable using the stats-extension, which provides insights into user activity and dataset popularity.
- Change notifications: Achievable using ckanext-email-notification.
- Change feed: Yes
- Deployment: Yes (Self-hosted, cloud-hosted)
- Supported data sources: Yes; various formats:
- Postgres
- MySQL
- SQL Server
- PowerBI
- Tableau
- CSV
- Croissant
- JSON
- GeoJSON
- XLS
- ...and more!
Magda is an open-source data catalog that features data discovery, metadata enrichment, and federation, focused on geodata.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
More features
- Strategy: Push via UI
- UX personalization: No
- AI autowiring: No
- Rich data profiling: No
- Recommendations: No
- Schemas, Description: Yes
- Complex schemas: No
- Data preview: Yes
- Column statistics: No
- Data owner: Yes
- Top data users: ?
- Change notifications: No
- Change feed: No
- Deployment:
- Supported data sources: Mostly geodata
ODD Platform is the first open-source data discovery and observability platform. It is based on the ODD Specification.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ |
More features
- Strategy: Push/Pull
- UX personalization: No
- Rich data profiling: Yes
- Data collaboration: Yes
- Schemas, Description: Yes
- Complex schemas: Yes
- Data preview: No
- Column statistics: Yes
- Data owner: Yes
- Change notifications: Yes
- Change feed: Yes
- Metadata versioning: Yes
- SaaS: No
- Third-party integrations: Airflow, Apache Spark, dbt, Great Expectations, and Prefect
- Supported data sources: Airflow, Athena, AzureSQL, BigQuery, ClickHouse, Databricks, Delta Lake, Druid, DynamoDB, Fivetran, Glue, Hive, Kafka, Looker, MariaDB, MLflow, MSSQL, MySQL, Oracle, Postgres, Presto, Redash, Redpanda, Redshift, Snowflake, Tableau, and Vertica
OpenMetadata is the all-in-one platform for data collaboration, discovery, governance, lineage, and quality that lets you focus on building and analyzing.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
More features
- Strategy: Push/Pull
- UX personalization: No
- Rich data profiling: Yes
- Data collaboration: Yes
- Schemas, Description: Yes
- Complex schemas: Yes
- Data preview: Yes
- Column statistics: Yes
- Data owner: Yes
- Change notifications: Yes
- Change feed: Yes
- Metadata versioning: Yes
- SaaS: Yes
- Third-party integrations: dbt, Great Expectations, and Prefect
- Supported data sources: Airbyte, Airflow, Athena, AzureSQL, BigQuery, ClickHouse, Dagster, Databricks, DB2, Delta Lake, Druid, DynamoDB, Fivetran, Glue, Hive, Kafka, Looker, MariaDB, Metabase, MLflow, Mode, MSSQL, MySQL, NiFi, Oracle, Postgres, PowerBI, Presto, Redash, Redpanda, Redshift, Salesforce, SingleStore, Snowflake, Superset, Tableau, Trino, and Vertica
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
Grai Schemas | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ✔️ |
More features
- Strategy: Push, Pull
- Customizable metadata model: Yes. The metadata model can be flexibly extended or modified as needed.
- Rich data profiling: No
- Recommendations: No
- Schemas, Description: Yes
- Complex schemas: Yes
- Data preview: No
- Column statistics: No
- Data owner: Yes
- Top data users: No
- CI Integration: Yes
- Lineage impact analysis: Yes
- Change notifications: Yes
- Change feed: Yes
- Automation: Yes
- UX personalization: Yes
- Deployment: docker-compose / Kubernetes with Helm, or using Grai's SaaS offering
- Supported data sources:
- Snowflake
- BigQuery
- Redshift
- Postgres
- MySQL
- dbt
- Slack
- ...and many others; see the docs for a full list.
Hamilton is a popular open-source framework for describing transformations that comes with a catalog (the Hamilton UI). The project originated at Stitch Fix. DAGWorks Inc. (YC W23), founded by the Hamilton maintainers, provides a managed version of the Hamilton UI in addition to self-hosted on-premise features.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
Hamilton | ✔️ | ✔️ | ? | ✔️ | ❌ | ✔️ | ½ | ✔️ | ✔️ | ❌ |
More features
- Strategy: Use Python to describe the DAG, then add one line of code to capture lineage and provenance, metadata, data summary profiles, versions, and execution telemetry. [See the blog for details](https://blog.dagworks.io/p/hamilton-ui-streamlining-metadata).
- UX personalization: No
- AI autowiring: No
- Rich data profiling: Pluggable
- Recommendations: No
- Schemas, Description: Yes
- Complex schemas: Yes
- Data preview: Only summary statistics
- Column statistics: Yes, only for Python.
- Data owner: Via tags.
- Top data users: Not yet
- Change notifications: Not yet
- Change feed: Not yet
- Deployment: pip installable locally, self-hosted via docker, managed SaaS
- Supported data sources: No direct integrations yet.
Collibra is an enterprise data catalog that helps to discover and understand data that matters and drive impactful insights from it.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |
More features
- Strategy: Push
- UX personalization: Yes
- AI autowiring: ?
- Network-based: No
- Rich data profiling: ?
- Supported data sources:
Informatica is an enterprise data catalog that provides an AI-powered data discovery engine to scan and catalog data assets.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ? | ❌ |
More features
- Strategy: Push
- UX personalization: ?
- AI autowiring: ?
- Network-based: Yes
- Rich data profiling: Yes
- Supported data sources:
Alation is a collaborative data catalog that helps companies to drive value and business impact from their data.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
More features
- Strategy: Push
- UX personalization: Yes
- AI autowiring: No
- Network-based: No
- Rich data profiling: No
- Supported data sources:
Atlan is a modern data catalog offering data discovery, data profiling, data quality, data lineage and data governance.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ |
More features
- Strategy: Pull
- UX personalization: ?
- AI autowiring: ?
- Network-based: No
- Rich data profiling: ?
- Supported data sources: Presto, Deequ, Atlas, Airflow, Hudi
DataGalaxy is a modern data catalog offering data discovery, data profiling, data quality, data lineage and data governance.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ? | ? |
More features
- Strategy: Pull & Push
- UX personalization: Yes
- AI autowiring: Yes
- Network-based: Yes
- Rich data profiling: Yes
- Supported data sources: [Available connectors](https://www.datagalaxy.com/fr/integrations-connecteurs/)
Talend is a data catalog that helps enterprises power critical business decisions with trusted data.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
More features
- Strategy: Push
- UX personalization: Yes
- AI autowiring: ?
- Network-based: ?
- Rich data profiling: Yes
- Supported data sources:
Select Star is an intelligent data discovery platform that automatically analyzes and documents your data. It provides an easy-to-use data portal where anyone can find and understand data.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ✔️ |
More features
- Strategy: Pull
- AI autowiring: Yes
- Network-based: Yes
- Rich data profiling: No
- ER Diagram generation: Yes
- Role & Policy based access control: Yes
- Popularity & usage: Yes
- Description & Tag propagation: Yes
- Data preview: Yes
- Data owners: Yes
- Top data users: Yes
- UX personalization: No
- Supported data sources:
- Snowflake
- BigQuery
- Redshift
- Postgres
- Looker
- PowerBI
- Tableau
- Mode
- Sigma
- Sisense
- Metabase
- Looker Studio
- dbt Cloud & Core
- Slack
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ |
More features
- Strategy: Full-featured, with a UI for a single user. An enterprise version is available for teams.
- UX personalization: No
- AI autowiring: DataOps TestGen is a data quality verification tool that performs five main tasks: (1) data profiling, (2) new dataset screening and hygiene review, (3) AI/algorithmic generation of data quality validation tests, (4) ongoing production testing of new data refreshes, and (5) continuous periodic monitoring of datasets for anomalies.
- Network-based: Data Journey based
- Rich data profiling: 51 characteristics, with UI
- Supported data sources: Snowflake, Redshift, Tableau, Synapse, Postgres, PowerBI, Airflow, Fivetran, Databricks, dbt, Databricks Azure Data Factory, SSIS, Synapse Pipelines, ADF-managed Airflow, Google Composer, AWS S3, Qlik Sense, Amazon Managed Workflows for Apache Airflow, Talend Cloud, Azure Functions (via Event Hub), Azure ADLS/Blob Storage (via Event Hub)
Monte Carlo is a data observability tool that helps to increase trust in data by eliminating or preventing data downtime.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ |
More features
- Strategy: Pull
- UX personalization: ?
- AI autowiring: ?
- Network-based: ?
- Rich data profiling: ?
- Supported data sources: Snowflake, Hive, Kafka, Looker, Redshift, Tableau, BigQuery, Airflow, Fivetran, Presto, Mode, Periscope, Databricks, Glue, dbt, Chartio, Spark, AWS, S3, data.world, Google Cloud Platform
Databand is an observability platform that helps data engineers identify and troubleshoot pipeline issues and data quality problems.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ? | ? | ? | ❌ | ? | ? | ? | ✔️ | ? | ? |
More features
- Strategy: Push
- UX personalization: ?
- AI autowiring: ?
- Network-based: ?
- Rich data profiling: ?
- Supported data sources:
Datafold is a data monitoring and observability platform that gives you confidence in your data quality through diffs, profiling, and anomaly detection.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ? | ? |
More features
- Strategy: Push
- UX personalization: ?
- AI autowiring: ?
- Network-based: ?
- Rich data profiling: ?
- Supported data sources:
Ataccama is an enterprise data catalog and observability tool featuring data profiling and data quality management, designed for data professionals.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
More features
- Strategy: Pull
- UX personalization: Yes
- AI autowiring: No
- Network-based: No
- Rich data profiling: Yes
- Supported data sources:
Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and third-party sources.
Google Cloud Dataplex Universal Catalog is an integrated platform that combines data discovery, cataloging, governance, quality, and exploration into one cohesive service.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |
More features
- Strategy: Pull
- UX personalization: ?
- AI autowiring: ?
- Network-based: No
- Rich data profiling: No
- Supported data sources:
Microsoft Purview Unified Catalog is a central platform for discovering, classifying, and managing data assets across an organization.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |
More features
- Strategy: Pull
- UX personalization: ?
- AI autowiring: ?
- Network-based: ?
- Rich data profiling: ?
- Supported data sources: