OCPBUGS-55238: spyglass: hide disruption events for localhost #29710
@vrutkovs: This pull request references Jira Issue OCPBUGS-55238, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/jira refresh |
@vrutkovs: This pull request references Jira Issue OCPBUGS-55238, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: c900caa
New tests seen in this PR at sha: c900caa
The problem with leaving expected disruption in the data and only hiding it in the UI is the larger system that monitors disruption data. Every part of it needs the same accommodation; otherwise it flags localhost disruption as real disruption and starts monitoring it for changes. That includes the Grafana dashboard, the alerts in the dpcr cluster, the metrics published by Sippy for those alerts, and the scheduled queries in BigQuery used for reporting.
Do you intend to have this monitored for changes in disruption and to pursue fixes for those issues? If so, then maybe we leave it in (but then we wouldn't want to hide it on interval charts). If not, these intervals really should be classified with a different source. That would immediately remove them from the analysis framework, and they would not appear in this chart.
Also remember that the new intervals UI under debug tools is at https://github.com/openshift/sippy/blob/main/sippy-ng/src/prow_job_runs/IntervalsChart.js and is largely based on categorizing by Source.
Localhost disruptions are expected when a pod restarts (on rollout), so showing them alongside real disruption can be misleading.
We're hiding them on the main chart, but leaving them on non-spyglass charts for completeness.
I don't think these are being sent for analysis anyway |
They have been spamming #trt-alerts for weeks now, up to and including today, so they are definitely going into the analysis system. Can you skip generating the intervals when the disruption is expected?
c900caa to c3917f0
I think it's easier to move them to a different source |
Localhost disruptions on apiservers are useful to record, but some of them are expected (e.g. during installer pod rollout). Instead of hiding them entirely, they are now created under a separate source and hidden in the spyglass/sippy view. Other views still display them in case they help find correlations.
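The "separate source" approach described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this PR: the `Interval` type, the source names, the backend names, and the rule of matching "localhost" in the backend name are all hypothetical and chosen for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical source names; the identifiers actually used by the PR
// are not shown in this conversation.
const (
	SourceDisruption          = "Disruption"
	SourceLocalhostDisruption = "LocalhostDisruption"
)

// Interval is a simplified, assumed stand-in for a monitor interval.
type Interval struct {
	Source  string
	Backend string
	Message string
}

// classify moves disruption intervals whose backend name contains
// "localhost" to a separate source. Matching on the backend name is an
// assumption made for this sketch.
func classify(in Interval) Interval {
	if in.Source == SourceDisruption && strings.Contains(in.Backend, "localhost") {
		in.Source = SourceLocalhostDisruption
	}
	return in
}

// visibleOnMainChart drops intervals from the localhost source, the way a
// source-keyed chart or analysis job would skip a source it does not select.
func visibleOnMainChart(intervals []Interval) []Interval {
	out := make([]Interval, 0, len(intervals))
	for _, in := range intervals {
		if in.Source != SourceLocalhostDisruption {
			out = append(out, in)
		}
	}
	return out
}

func main() {
	// Backend names below are made up for the example.
	raw := []Interval{
		{Source: SourceDisruption, Backend: "kube-api-new-connections", Message: "stopped responding"},
		{Source: SourceDisruption, Backend: "kube-api-localhost-new-connections", Message: "stopped responding"},
	}

	all := make([]Interval, 0, len(raw))
	for _, in := range raw {
		all = append(all, classify(in))
	}

	// Every interval is still recorded; only the main chart filters by source.
	fmt.Println("recorded:", len(all))
	fmt.Println("shown on main chart:", len(visibleOnMainChart(all)))
}
```

Because the intervals are reclassified when they are created rather than filtered at render time, any consumer that selects on the disruption source stops seeing them without needing its own special case, while views that show all sources keep them.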
c3917f0 to da1a05c
This looks great, thank you, just waiting to see the resulting files. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dgoodwin, vrutkovs The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel Yup, looks good |
@vrutkovs: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Job Failure Risk Analysis for sha: da1a05c
Don't display localhost-related disruptions on spyglass. They are still displayed on non-spyglass reports in case unexpected localhost disruption happens.