This connector extracts dataset-level data profiles from Unity Catalog using the Unity Catalog API.
Create a dedicated access token by following the Setup guide for the general Unity Catalog connector. The owner of the access token must have the SELECT
privilege on the tables so that the connector can analyze the table statistics:
GRANT SELECT ON TABLE * TO <user_role>
Create a YAML config file based on the following template.
hostname: <cluster_or_warehouse_hostname>
http_path: <http_path>
token: <access_token>
See this page for details on how to set the values for hostname and http_path.
See Filter Configurations for more information on the optional filter config.
See Output Config for more information on the optional output config.
The maximum number of concurrent queries to the Databricks compute node can be configured as follows:
max_concurrency: <max_number_of_queries> # Defaults to 10
To run an ANALYZE TABLE query when a table has no statistics, enable the following option:
analyze_if_no_statistics: true # Default is false
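Putting the settings together, a complete config file might look like the sketch below. It assumes the optional settings sit at the top level of the same file as the required ones, as the individual snippets suggest. The placeholders are carried over from the template, and the value 5 for max_concurrency is an arbitrary illustration rather than a recommendation.

```yaml
# Required connection settings (replace the placeholders with real values)
hostname: <cluster_or_warehouse_hostname>
http_path: <http_path>
token: <access_token>

# Optional: cap the number of concurrent queries (illustrative value; default is 10)
max_concurrency: 5

# Optional: run ANALYZE TABLE when a table has no statistics (default is false)
analyze_if_no_statistics: true
```

Omitting the two optional settings leaves the connector at its documented defaults.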
Follow the Installation instructions to install metaphor-connectors in your environment (or virtualenv). Make sure to include either the all or the unity_catalog extra.
Run the following command to test the connector locally:
metaphor unity_catalog.profile <config_file>
Manually verify the output after the command finishes.