OCPBUGS-62264: vSphere snapshot options test should wait for operator to settle #30336

RomanBednar · 2025-10-03T09:31:16Z

No description provided.

openshift-ci-robot · 2025-10-03T09:31:23Z

@RomanBednar: This pull request references Jira Issue OCPBUGS-62264, which is invalid:

expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2025-10-03T09:33:53Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: RomanBednar
Once this PR has been reviewed and has the lgtm label, please assign tsmetana for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

test/extended/storage/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

RomanBednar · 2025-10-03T09:43:05Z

/jira refresh

openshift-ci-robot · 2025-10-03T09:43:13Z

@RomanBednar: This pull request references Jira Issue OCPBUGS-62264, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.21.0) matches configured target version for branch (4.21.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (wduan@redhat.com), skipping review request.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

RomanBednar · 2025-10-03T09:43:18Z

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-serial

openshift-ci · 2025-10-03T09:43:27Z

@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/650dd560-a03d-11f0-9b26-9d92c77dd6e8-0

jsafrane · 2025-10-06T08:53:30Z

/test help

openshift-ci · 2025-10-06T08:53:40Z

@jsafrane: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test e2e-aws-csi

/test e2e-aws-jenkins

/test e2e-aws-ovn-fips

/test e2e-aws-ovn-image-registry

/test e2e-aws-ovn-microshift

/test e2e-aws-ovn-microshift-serial

/test e2e-aws-ovn-serial-1of2

/test e2e-aws-ovn-serial-2of2

/test e2e-gcp-csi

/test e2e-gcp-ovn

/test e2e-gcp-ovn-builds

/test e2e-gcp-ovn-image-ecosystem

/test e2e-gcp-ovn-upgrade

/test e2e-metal-ipi-ovn-ipv6

/test e2e-vsphere-ovn

/test e2e-vsphere-ovn-upi

/test images

/test lint

/test okd-scos-images

/test unit

/test verify

/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-agnostic-ovn-cmd

/test e2e-aws-disruptive

/test e2e-aws-etcd-certrotation

/test e2e-aws-etcd-recovery

/test e2e-aws-ovn

/test e2e-aws-ovn-cgroupsv2

/test e2e-aws-ovn-edge-zones

/test e2e-aws-ovn-etcd-scaling

/test e2e-aws-ovn-kube-apiserver-rollout

/test e2e-aws-ovn-kubevirt

/test e2e-aws-ovn-serial-fast

/test e2e-aws-ovn-serial-ipsec

/test e2e-aws-ovn-serial-publicnet-1of2

/test e2e-aws-ovn-serial-publicnet-2of2

/test e2e-aws-ovn-single-node

/test e2e-aws-ovn-single-node-serial

/test e2e-aws-ovn-single-node-techpreview

/test e2e-aws-ovn-single-node-techpreview-serial

/test e2e-aws-ovn-single-node-upgrade

/test e2e-aws-ovn-upgrade

/test e2e-aws-ovn-upgrade-rollback

/test e2e-aws-ovn-upi

/test e2e-aws-proxy

/test e2e-azure

/test e2e-azure-ovn-etcd-scaling

/test e2e-azure-ovn-upgrade

/test e2e-baremetalds-kubevirt

/test e2e-external-aws

/test e2e-external-aws-ccm

/test e2e-external-vsphere-ccm

/test e2e-gcp-disruptive

/test e2e-gcp-fips-serial-1of2

/test e2e-gcp-fips-serial-2of2

/test e2e-gcp-ovn-etcd-scaling

/test e2e-gcp-ovn-rt-upgrade

/test e2e-gcp-ovn-techpreview

/test e2e-gcp-ovn-techpreview-serial-1of2

/test e2e-gcp-ovn-techpreview-serial-2of2

/test e2e-gcp-ovn-usernamespace

/test e2e-hypershift-conformance

/test e2e-metal-ipi-ovn

/test e2e-metal-ipi-ovn-bgp-virt-dualstack

/test e2e-metal-ipi-ovn-bgp-virt-dualstack-techpreview

/test e2e-metal-ipi-ovn-dualstack

/test e2e-metal-ipi-ovn-dualstack-bgp

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

/test e2e-metal-ipi-ovn-dualstack-local-gateway

/test e2e-metal-ipi-ovn-kube-apiserver-rollout

/test e2e-metal-ipi-serial-1of2

/test e2e-metal-ipi-serial-2of2

/test e2e-metal-ipi-serial-ovn-ipv6-1of2

/test e2e-metal-ipi-serial-ovn-ipv6-2of2

/test e2e-metal-ipi-virtualmedia

/test e2e-metal-ovn-single-node-live-iso

/test e2e-metal-ovn-single-node-with-worker-live-iso

/test e2e-metal-ovn-two-node-arbiter

/test e2e-metal-ovn-two-node-fencing

/test e2e-openstack-ovn

/test e2e-openstack-serial

/test e2e-test-image-stream-import-mode-techpreview

/test e2e-vsphere-ovn-dualstack-primaryv6

/test e2e-vsphere-ovn-etcd-scaling

/test okd-scos-e2e-aws-ovn

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-origin-main-e2e-aws-csi

pull-ci-openshift-origin-main-e2e-aws-ovn-fips

pull-ci-openshift-origin-main-e2e-aws-ovn-microshift

pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial

pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2

pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2

pull-ci-openshift-origin-main-e2e-aws-ovn-single-node

pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-serial

pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade

pull-ci-openshift-origin-main-e2e-gcp-csi

pull-ci-openshift-origin-main-e2e-gcp-ovn

pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade

pull-ci-openshift-origin-main-e2e-metal-ipi-ovn

pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6

pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-kube-apiserver-rollout

pull-ci-openshift-origin-main-e2e-openstack-ovn

pull-ci-openshift-origin-main-e2e-vsphere-ovn

pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi

pull-ci-openshift-origin-main-images

pull-ci-openshift-origin-main-lint

pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn

pull-ci-openshift-origin-main-okd-scos-images

pull-ci-openshift-origin-main-unit

pull-ci-openshift-origin-main-verify

pull-ci-openshift-origin-main-verify-deps

In response to this:

/test help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

jsafrane · 2025-10-06T08:59:26Z

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-serial

openshift-ci · 2025-10-06T08:59:32Z

@jsafrane: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c38b5a20-a292-11f0-9c93-98188bb0b34c-0

jsafrane · 2025-10-06T09:00:37Z

test/extended/storage/driver_configuration.go

+				err := util.WaitForOperatorProgressingFalse(ctx, oc.AdminConfigClient(), providerName)
+				o.Expect(err).NotTo(o.HaveOccurred())


providerName is csi.vsphere.vmware.com, and WaitForOperatorProgressingFalse tries to read a ClusterOperator with this name... Am I missing anything?

Right, this should be the ClusterOperator name instead (it would be more obvious if the function name said ClusterOperator).

jsafrane · 2025-10-06T09:06:19Z

test/extended/storage/driver_configuration.go

+				// Wait for operator to be Progressing=False to ensure all pod creation events complete before test ends.
+				// This allows the pathological event matcher (newVsphereConfigurationTestsRollOutTooOftenEventMatcher in
+				// pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go) to accurately attribute
+				// pod events to this test's time window (interval); any events emitted later would not be matched.
+				err := util.WaitForOperatorProgressingFalse(ctx, oc.AdminConfigClient(), providerName)
+				o.Expect(err).NotTo(o.HaveOccurred())


This whole chunk belongs in AfterEach, where the original configuration is restored. We should ensure that events produced by the restore are counted within the test interval too.

Agreed, but shouldn't we wait for both config change and restore? Just to make sure the config is mounted on controller pods before we start validateSnapshotCreation.

jsafrane · 2025-10-06T09:08:05Z

test/extended/storage/driver_configuration.go

+				// This allows the pathological event matcher (newVsphereConfigurationTestsRollOutTooOftenEventMatcher in
+				// pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go) to accurately attribute
+				// pod events to this test's time window (interval); any events emitted later would not be matched.
+				err := util.WaitForOperatorProgressingFalse(ctx, oc.AdminConfigClient(), providerName)


Can WaitForOperatorProgressingFalse() observe the ClusterOperator as not progressing before the CSI driver operator has even had a chance to see the new config and act on it?

Looks like that's not guaranteed with the current test code, we can make it more robust. But I don't think there's a way to directly check if the operator picked up the change. What about adding a helper function that could be called immediately after config change that would wait for operator to start progressing and then stop progressing for a few seconds?

RomanBednar · 2025-10-07T08:52:52Z

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-serial

openshift-ci · 2025-10-07T08:53:05Z

@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/026832e0-a35b-11f0-90c6-17256641cd71-0

openshift-trt · 2025-10-07T14:12:59Z

Job Failure Risk Analysis for sha: 7069684

Job Name	Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade	IncompleteTests Tests for this run (2163) are below the historical average (3485): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn	IncompleteTests Tests for this run (139) are below the historical average (1258): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

jsafrane · 2025-10-07T14:54:50Z

test/extended/storage/driver_configuration.go

+	e2e.Logf("Waiting for storage operator to be Progressing=True")
+	o.Eventually(exutil.WaitForOperatorProgressingTrue(ctx, oc.AdminConfigClient(), "storage")).WithTimeout(time.Second * 10).Should(o.Succeed())


Not all tests actually modify the existing ClusterCSIDriver, so the operator won't always go Progressing.

I would wait ~10 seconds to see whether the operator goes Progressing and, if it does not, assume that nothing changed.
Alternatively, each test case could carry its own flag saying whether Progressing is expected, and fail if the condition unexpectedly changes (or does not change).

True, one of them just tests defaults. I like the idea of having a parameter for it - adding.

But we should test for Progressing=False in two places, right? Once when the test is not supposed to progress, and once when we restore the config. Only the latter addresses the original issue of events being emitted after the test exits.

Yes, the check should be in both places.


jsafrane · 2025-10-16T15:20:07Z

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-serial

openshift-ci · 2025-10-16T15:20:21Z

@jsafrane: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/99afc5b0-aaa3-11f0-84c2-1ab0df8424f3-0

jsafrane · 2025-10-16T15:28:54Z

test/extended/storage/driver_configuration.go

 				setClusterCSIDriverSnapshotOptions(ctx, oc, t.clusterCSIDriverOptions)
+
+				if operatorShouldProgress {
+					o.Eventually(exutil.WaitForOperatorProgressingTrue(ctx, oc.AdminConfigClient(), "storage")).WithTimeout(time.Second * 10).Should(o.Succeed())


After it detects Progressing=true, should it wait for Progressing=False?

jsafrane · 2025-10-16T15:39:13Z

test/extended/storage/driver_configuration.go

+		// in pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go) to accurately attribute
+		// pod events to this test's time window (interval); any events emitted later would not be matched.
+		if operatorShouldProgress {
+			o.Consistently(exutil.WaitForOperatorProgressingFalse(ctx, oc.AdminConfigClient(), "storage")).WithTimeout(time.Second * 10).Should(o.Succeed())


Just curious here, how does Consistently.WithTimeout("10s") work when the called function (exutil.WaitForOperatorProgressingFalse) takes longer than 10s?

What we need is to check for Progressing=true in the first 10 seconds, and if it happens, then wait indefinitely for Progressing=false. Is that what the code does?

openshift-ci · 2025-10-17T12:53:13Z

@RomanBednar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-ovn-single-node-serial	`cc57d96`	link	false	`/test e2e-aws-ovn-single-node-serial`
ci/prow/e2e-openstack-ovn	`cc57d96`	link	false	`/test e2e-openstack-ovn`
ci/prow/e2e-aws-ovn-single-node-upgrade	`cc57d96`	link	false	`/test e2e-aws-ovn-single-node-upgrade`
ci/prow/e2e-gcp-ovn	`52d75ed`	link	true	`/test e2e-gcp-ovn`
ci/prow/okd-scos-e2e-aws-ovn	`52d75ed`	link	false	`/test okd-scos-e2e-aws-ovn`
ci/prow/e2e-gcp-ovn-upgrade	`52d75ed`	link	true	`/test e2e-gcp-ovn-upgrade`
ci/prow/e2e-aws-ovn-fips	`52d75ed`	link	true	`/test e2e-aws-ovn-fips`
ci/prow/go-verify-deps	`52d75ed`	link	true	`/test go-verify-deps`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci bot requested review from dobsonj and tsmetana October 3, 2025 09:33

openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Oct 3, 2025

openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Oct 3, 2025

jsafrane reviewed Oct 6, 2025


RomanBednar force-pushed the OCPBUGS-62264 branch from cc57d96 to 7069684 Compare October 7, 2025 08:50

jsafrane reviewed Oct 7, 2025


vSphere snapshot options test should check operator progressing condition

52d75ed

RomanBednar force-pushed the OCPBUGS-62264 branch from 7069684 to 52d75ed Compare October 9, 2025 13:31

jsafrane reviewed Oct 16, 2025


