OCPBUGS-62264: vSphere snapshot options test should wait for operator to settle #30336
Conversation
|
@RomanBednar: This pull request references Jira Issue OCPBUGS-62264, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: RomanBednar The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/jira refresh |
|
@RomanBednar: This pull request references Jira Issue OCPBUGS-62264, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (wduan@redhat.com), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-serial |
|
@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/650dd560-a03d-11f0-9b26-9d92c77dd6e8-0 |
|
/test help |
|
@jsafrane: The specified target(s) for The following commands are available to trigger optional jobs: Use In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-serial |
|
@jsafrane: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c38b5a20-a292-11f0-9c93-98188bb0b34c-0 |
| err := util.WaitForOperatorProgressingFalse(ctx, oc.AdminConfigClient(), providerName) | ||
| o.Expect(err).NotTo(o.HaveOccurred()) |
providerName is csi.vsphere.vmware.com, and WaitForOperatorProgressingFalse tries to read a ClusterOperator with this name... Am I missing anything?
Yes, this should be the cluster operator name instead (it would be more obvious if the function name said ClusterOperator).
| // Wait for operator to be Progressing=False to ensure all pod creation events complete before test ends. | ||
| // This allows the pathological event matcher (newVsphereConfigurationTestsRollOutTooOftenEventMatcher in | ||
| // pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go) to accurately attribute | ||
| // pod events to this test's time window (interval); any events emitted later would not be matched. | ||
| err := util.WaitForOperatorProgressingFalse(ctx, oc.AdminConfigClient(), providerName) | ||
| o.Expect(err).NotTo(o.HaveOccurred()) |
This whole chunk belongs in AfterEach, where the original configuration is restored. We should ensure that events produced by the restore are counted within the test interval too.
Agreed, but shouldn't we wait after both the config change and the restore? Just to make sure the config is mounted on the controller pods before we start validateSnapshotCreation.
| // This allows the pathological event matcher (newVsphereConfigurationTestsRollOutTooOftenEventMatcher in | ||
| // pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go) to accurately attribute | ||
| // pod events to this test's time window (interval); any events emitted later would not be matched. | ||
| err := util.WaitForOperatorProgressingFalse(ctx, oc.AdminConfigClient(), providerName) |
Can WaitForOperatorProgressingFalse() observe the ClusterOperator as not progressing before the CSI driver operator has even had a chance to see the new config and act on it?
Looks like that's not guaranteed with the current test code; we can make it more robust. But I don't think there's a way to directly check whether the operator picked up the change. What about adding a helper function, called immediately after the config change, that waits for the operator to start progressing and then to stop progressing for a few seconds?
cc57d96 to 7069684
|
/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-serial |
|
@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/026832e0-a35b-11f0-90c6-17256641cd71-0 |
|
Job Failure Risk Analysis for sha: 7069684
|
| e2e.Logf("Waiting for storage operator to be Progressing=True") | ||
| o.Eventually(exutil.WaitForOperatorProgressingTrue(ctx, oc.AdminConfigClient(), "storage")).WithTimeout(time.Second * 10).Should(o.Succeed()) |
Not all tests actually modify the existing ClusterCSIDriver, so the operator won't always become Progressing.
I would wait ~10 seconds to see whether the operator becomes Progressing and, if it does not, assume nothing changed.
Or each test case could have its own flag saying whether Progressing is expected, and fail if the condition unexpectedly changes (or does not change).
True, one of them just tests defaults. I like the idea of having a parameter for it; adding.
But we should test for Progressing=False in two places, right? When the test is not supposed to progress, and when we restore the config. Only the latter addresses the original issue of events being emitted after the test exits.
Yes, the check should be in both places.
7069684 to 52d75ed
|
/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-serial |
|
@jsafrane: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/99afc5b0-aaa3-11f0-84c2-1ab0df8424f3-0 |
| setClusterCSIDriverSnapshotOptions(ctx, oc, t.clusterCSIDriverOptions) | ||
| if operatorShouldProgress { | ||
| o.Eventually(exutil.WaitForOperatorProgressingTrue(ctx, oc.AdminConfigClient(), "storage")).WithTimeout(time.Second * 10).Should(o.Succeed()) |
After it detects Progressing=True, should it wait for Progressing=False?
| // in pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go) to accurately attribute | ||
| // pod events to this test's time window (interval); any events emitted later would not be matched. | ||
| if operatorShouldProgress { | ||
| o.Consistently(exutil.WaitForOperatorProgressingFalse(ctx, oc.AdminConfigClient(), "storage")).WithTimeout(time.Second * 10).Should(o.Succeed()) |
Just curious here, how does Consistently.WithTimeout("10s") work when the called function (exutil.WaitForOperatorProgressingFalse) takes longer than 10s?
What we need is to check for Progressing=true in the first 10 seconds, and if it happens, then wait indefinitely for Progressing=false. Is that what the code does?
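On the first question, one relevant detail is that Go evaluates call arguments before the call itself. Assuming the helper blocks and returns an `error` (as the earlier `err := util.WaitForOperatorProgressingFalse(...)` snippet suggests), it would run to completion once, and the asynchronous assertion would then only re-check that already-computed value. The plain-Go mimic below illustrates the difference between passing a value and passing a function; `blockingWait`, `pollValue`, and `pollFunc` are hypothetical and are not Gomega APIs.

```go
package main

// evaluations counts how often the hypothetical blocking helper runs.
var evaluations int

// blockingWait stands in for a helper that waits and then returns an error.
func blockingWait() error {
	evaluations++
	return nil
}

// pollValue mimics polling a plain value: the caller evaluated the argument
// once, before polling started, so every attempt sees the same result.
func pollValue(v error, attempts int) error {
	for i := 0; i < attempts; i++ {
		if v != nil {
			return v
		}
	}
	return nil
}

// pollFunc mimics polling a function: it is re-invoked on every attempt.
func pollFunc(f func() error, attempts int) error {
	for i := 0; i < attempts; i++ {
		if err := f(); err != nil {
			return err
		}
	}
	return nil
}
```

With `pollValue(blockingWait(), 3)` the helper runs once; with `pollFunc(blockingWait, 3)` it runs three times. If the intent is for the timeout to bound repeated checks, wrapping the condition read in a function is the usual way to get that.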
|
@RomanBednar: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |