Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1906607: Fix telemetry collection for all SnowflakePlan #2967

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

sfc-gh-aalam
Copy link
Contributor

@sfc-gh-aalam sfc-gh-aalam commented Jan 31, 2025

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1906607

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

    Currently snowflake plan metrics related telemetry is only collected for apis that are tagged with df_collect_api_telemetry. This PR fixes this issue and collects telemetry for all snowflake plans under type snowpark_compilation_stage_statistics.

@sfc-gh-aalam sfc-gh-aalam added the NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md label Jan 31, 2025
api_calls[0][TelemetryField.THREAD_IDENTIFIER.value] = threading.get_ident()
except Exception:
pass
api_calls[0][TelemetryField.THREAD_IDENTIFIER.value] = threading.get_ident()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would threading.get_ident() throw an exception under any case? if yes, let's keep the try catch part

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is method of in built python module threading which is unlikely to fail under. Only issue I could find relating to this throwing an exception is this: python/cpython#128189 which looks like it was more of a user error than a library error.

CompilationStageTelemetryField.QUERY_PLAN_HEIGHT.value
] = plan_state[PlanState.PLAN_HEIGHT]
api_calls[0][
CompilationStageTelemetryField.QUERY_PLAN_NUM_SELECTS_WITH_COMPLEXITY_MERGED.value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfc-gh-aalam what is the kind of dicrepancy you are referring to in the slack thread https://snowflake.slack.com/archives/C03MJ5AA8CS/p1738347392255399?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our telemetry collected for job etl logs, we have sql statements submitted by snowpark client which have failed with error codes we are interested in, but the corresponding telemetry for the same plan uuid does not exist in client telemtery being sent from this part of code. This is a decorator -df_collect_api_telemetry is only put on select few apis and is not applied on all functions. As a result, will miss some important cases like dataframe.write.save_as_table.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated, but do we think that any of the existing logic in the decorator may also be relevant to not-currently-decorated functions like save_as_table?

@@ -765,6 +773,45 @@ def get_result_set(

return result, result_meta

def send_plan_metrics_telemetry(self, plan: SnowflakePlan) -> None:
"""Extract the SnowflakePlan's metrics and including plan_state, uuid identifiers, complexity
classification breakdown, and complexity score.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we get a parameter her to guard the sending of the new telemetry? it now occurs on the critical pass of the query execution, and we do now know if there will be any side effect introduced

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should ideally be no side effects but having a param protection sounds reasonable given how unstable we have been recently.

Copy link
Contributor

@sfc-gh-helmeleegy sfc-gh-helmeleegy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks Afroz.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants