Get query logs from unity catalog #651

Merged
merged 16 commits into from
Nov 1, 2023

Conversation

usefulalgorithm
Contributor

@usefulalgorithm usefulalgorithm commented Oct 30, 2023

🤔 Why?

  • To get query logs and related information from the Databricks system.
  • To use the newer Databricks SDK.
    (SC links: SC21951, SC21623)

🤓 What?

  • Updated the connector to use the new Databricks SDK.
  • Get query logs with WorkspaceClient.query_history.list, and return those logs along with the datasets.

🧪 Tested?

  • Modified and ran unit tests.
  • Tested with staging configurations and was able to extract query logs with the new SDK. See the description for a sample query log entry.

Description

A sample query log entry:

        {
          "_id": "UNITY_CATALOG:01ee7724-1377-18f6-9781-b0aa2bb3fd13",
          "bytesRead": 0.0,
          "bytesWritten": 0.0,
          "duration": 33686.0,
          "platform": "UNITY_CATALOG",
          "queryId": "01ee7724-1377-18f6-9781-b0aa2bb3fd13",
          "rowsRead": 0.0,
          "rowsWritten": 3.0,
          "sql": "-- this is a system generated query from catalog explorer\ndescribe history `main`.`default`.`testing` LIMIT 100;",
          "sqlHash": "fd1d327232c9df5c354a6c7cd51276a7",
          "startTime": "2023-10-30T12:58:52.420000+00:00",
          "userId": "dev@metaphor.io"
        }
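To make the shape of the entry above easier to follow, here is a minimal sketch of how such a record could be assembled from raw query metadata. It is not the connector's actual code: the function name `to_query_log` and its parameters are hypothetical, and it assumes that `_id` is the platform name joined to the query ID (as the sample suggests) and that `sqlHash` is an MD5 hex digest of the SQL text (the sample hash is 32 hex characters, which is consistent with MD5 but not confirmed by this PR).

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

PLATFORM = "UNITY_CATALOG"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_query_log(
    query_id: str,
    sql: str,
    user_id: str,
    start_time_ms: int,
    duration_ms: float,
    rows_read: float = 0.0,
    rows_written: float = 0.0,
    bytes_read: float = 0.0,
    bytes_written: float = 0.0,
) -> Dict[str, Any]:
    """Build a dict shaped like the sample query log entry (hypothetical helper)."""
    # Convert epoch milliseconds to a timezone-aware UTC timestamp without float rounding.
    start_time = EPOCH + timedelta(milliseconds=start_time_ms)
    return {
        # Assumption: the ID is "<platform>:<query id>", as in the sample entry.
        "_id": f"{PLATFORM}:{query_id}",
        "bytesRead": bytes_read,
        "bytesWritten": bytes_written,
        "duration": duration_ms,
        "platform": PLATFORM,
        "queryId": query_id,
        "rowsRead": rows_read,
        "rowsWritten": rows_written,
        "sql": sql,
        # Assumption: MD5 of the UTF-8 encoded SQL text; the real hashing scheme may differ.
        "sqlHash": hashlib.md5(sql.encode("utf-8")).hexdigest(),
        "startTime": start_time.isoformat(),
        "userId": user_id,
    }


entry = to_query_log(
    query_id="01ee7724-1377-18f6-9781-b0aa2bb3fd13",
    sql="describe history `main`.`default`.`testing` LIMIT 100;",
    user_id="dev@metaphor.io",
    start_time_ms=1698670732420,
    duration_ms=33686.0,
    rows_written=3.0,
)
```

Keeping the conversion in one small function would make it straightforward to unit test the mapping separately from any call to the Databricks API.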

@shortcut-integration

This pull request has been linked to Shortcut Story #21951: Get query log from Unity Catalog.

@usefulalgorithm
Contributor Author

usefulalgorithm commented Oct 30, 2023

TODO

  • Add config values to the crawler configs, something like:

        @dataclass(config=ConnectorConfig)
        class SnowflakeQueryLogConfig:
            # Number of days back of query logs to fetch, if 0, don't fetch query logs
            lookback_days: int = 1

            # Query log filter to exclude certain usernames
            excluded_usernames: Set[str] = field(default_factory=lambda: set())

            # The number of query logs to fetch from Snowflake in one batch
            fetch_size: int = DEFAULT_QUERY_LOG_FETCH_SIZE

            # Queries larger than this size will not be processed
            max_query_size: int = DEFAULT_MAX_QUERY_SIZE

  • Test in a real environment that the new list_table/column_lineage works.
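
As a rough illustration of how the Snowflake-style options in the TODO above might carry over to Unity Catalog, the sketch below defines a comparable config and applies it to already-fetched log entries. Everything here is an assumption rather than the merged implementation: `UnityCatalogQueryLogConfig`, `filter_query_logs`, and the default `max_query_size` are hypothetical, and the sketch uses the standard-library `dataclass` instead of the project's `@dataclass(config=ConnectorConfig)` so that it runs on its own.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, Set


@dataclass
class UnityCatalogQueryLogConfig:  # hypothetical name, modeled on the Snowflake config
    # Number of days back of query logs to fetch; 0 disables query log fetching
    lookback_days: int = 1

    # Usernames whose queries should be excluded
    excluded_usernames: Set[str] = field(default_factory=set)

    # Queries longer than this many characters are skipped (placeholder default)
    max_query_size: int = 100_000


def filter_query_logs(
    logs: Iterable[Dict[str, Any]], config: UnityCatalogQueryLogConfig
) -> Iterator[Dict[str, Any]]:
    """Yield only the log entries allowed by the config (hypothetical helper)."""
    if config.lookback_days <= 0:
        # Query log extraction is disabled entirely.
        return
    for log in logs:
        if log.get("userId") in config.excluded_usernames:
            continue
        if len(log.get("sql", "")) > config.max_query_size:
            continue
        yield log


logs = [
    {"queryId": "a", "userId": "dev@metaphor.io", "sql": "select 1"},
    {"queryId": "b", "userId": "bot@example.com", "sql": "select 2"},
]
config = UnityCatalogQueryLogConfig(excluded_usernames={"bot@example.com"})
kept = list(filter_query_logs(logs, config))
```

Filtering on the client side like this is only one option; where the API supports server-side filters (for example by time range), pushing the lookback window into the request would avoid fetching entries that are later discarded.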

Review threads:

  • pyproject.toml (outdated, resolved)
  • metaphor/unity_catalog/config.py (outdated, resolved)
  • metaphor/unity_catalog/extractor.py (resolved)
  • metaphor/unity_catalog/extractor.py (outdated, resolved)

@usefulalgorithm usefulalgorithm force-pushed the tsung-julii/sc-21951/get-query-log-from-unity-catalog branch from 68cd94d to b108bab Compare October 30, 2023 13:10
@usefulalgorithm usefulalgorithm force-pushed the tsung-julii/sc-21951/get-query-log-from-unity-catalog branch from b108bab to 9769fde Compare October 30, 2023 13:12
@usefulalgorithm usefulalgorithm marked this pull request as ready for review October 30, 2023 13:24
@usefulalgorithm usefulalgorithm requested review from mars-lan and elic-eon and removed request for mars-lan October 30, 2023 13:25
@github-actions

github-actions bot commented Oct 30, 2023

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
| --- | --- | --- | --- | --- |
| 12470 | 11278 | 90% | 85% | 🟢 |

New Files

No new covered files...

Modified Files

| File | Coverage | Status |
| --- | --- | --- |
| metaphor/unity_catalog/config.py | 100% | 🟢 |
| metaphor/unity_catalog/extractor.py | 93% | 🟢 |
| metaphor/unity_catalog/models.py | 100% | 🟢 |
| metaphor/unity_catalog/utils.py | 100% | 🟢 |
| TOTAL | 98% | 🟢 |

updated for commit: 26a236a by action🐍

@codecov

codecov bot commented Oct 30, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

| Files | Coverage Δ |
| --- | --- |
| metaphor/unity_catalog/config.py | 100.00% <100.00%> (ø) |
| metaphor/unity_catalog/models.py | 100.00% <100.00%> (+1.31%) ⬆️ |
| metaphor/unity_catalog/utils.py | 100.00% <100.00%> (ø) |
| metaphor/unity_catalog/extractor.py | 93.15% <92.75%> (+1.15%) ⬆️ |

... and 31 files with indirect coverage changes

Contributor

@mars-lan mars-lan left a comment

Please also update https://github.com/MetaphorData/connectors/blob/main/metaphor/unity_catalog/README.md with the new configs and the requirement of Databricks Premium or above.

Contributor

@mars-lan mars-lan left a comment

Please also include a sample entry of the query log in the description.

Review threads:

  • metaphor/unity_catalog/models.py (resolved)
  • metaphor/unity_catalog/utils.py (outdated, resolved)
  • metaphor/unity_catalog/utils.py (resolved)
  • metaphor/unity_catalog/models.py (outdated, resolved)

Contributor

@mars-lan mars-lan left a comment

LGTM at a high level. Will let @elic-eon give the final ship-it.

@mars-lan mars-lan enabled auto-merge (squash) November 1, 2023 01:59
@mars-lan mars-lan merged commit 926b3df into main Nov 1, 2023
@mars-lan mars-lan deleted the tsung-julii/sc-21951/get-query-log-from-unity-catalog branch November 1, 2023 01:59