Promote OnClusterBuild featuregate to default #2192
/hold Currently for testing |
Given the absence of automated testing reporting into component readiness, there is a desire to make an exception for this feature and promote it within 4.19 based on QE testing alone. Could we please get a write-up on this PR detailing the existing testing that QE has been completing, with links to historical pass rates where available? I would also appreciate links to any trackers for the future automated regression testing that's in the pipeline. |
The verify failure in the schema checker is an issue with the alpha API; we will override it. /test e2e-aws-ovn-hypershift |
@yuqi-zhang @cheesesashimi The failures in the minor upgrade look like they are legitimate, can you please investigate? |
244a7dc to c939ea4
Rebased on master just in case. It looks like in the failed upgrade, the MCO hasn't started upgrade yet (?) and the operator logs are failing on:
checking if that was caused by outdated CRDs. The kube-rbac-proxy failures should also be old, since the pods should have the proper SCC labels now |
/retest |
These jobs are currently impacted by this issue: OCPBUGS-53408 In OCL. error: Old and new refs are equal
periodic-ci-openshift-openshift-tests-private-release-4.19-amd64-nightly-aws-ipi-tp-ocl-f7: An OCL cluster does not behave exactly like a non-OCL cluster, so some test cases may fail in an OCL cluster. This applies especially to tests that scale up new nodes (OCL needs an extra reboot) or configure registries.conf (OCL clusters reboot the nodes, but non-OCL clusters do not).
Same case as above: since an OCL cluster doesn't behave exactly like a non-OCL cluster, some cases will never pass. This applies especially to test cases that scale up new nodes (OCL needs an extra reboot to apply the new image) and those that configure the registries.conf file (OCL clusters reboot the nodes in this scenario, but non-OCL clusters do not).
These jobs have been reconfigured to run daily:
- IPI on GCP, AMD64, TP
- IPI on AWS, ARM64, PROXY, FIPS, TP
These jobs are currently impacted by OCPBUGS-49894: "In OCL. Disabling OCL process is not working in clusters with proxy". Unfortunately, this issue breaks the cluster and we cannot recover it, so once our tests hit this issue, all tests will report failures or refuse to be executed. I have just launched all the jobs with the latest nightly build. |
Thanks for the writeup, this is useful context!
No, that's an error; this hasn't been updated since branching. We really need a way to automate this. I will get on it.
Based on feedback from some promotions in the previous release, we would ideally see a week of clean, daily runs of the CI prior to merging. We should aim to prioritize fixing this ASAP (and the other issue mentioned later).
How do you ensure you understand this signal when there are so many false positives? Is work being done to mitigate the false positives and resolve the differences between the OCL and non-OCL cases?
Once promoted, we will also need the stable cluster configuration to run daily, has this also been adjusted? |
/test minor-e2e-upgrade-minor |
E2E minor is now doing the correct thing, let's see if it works this time! |
/retest |
Adding some notes here from conversations so that we capture them within the promotion process: @cheesesashimi has gathered pass rates for the last week and put them together in a sheet. Looking at this, and based on our conversation in Slack, we can see that our top 5 tests are showing a minimum 85% pass rate, and that we have on average 5 runs per platform right now. For reference, to promote any regular feature we would normally ask for at least 5 tests, with all tests attaining a 95% pass rate across 6 platforms with 14 runs over 2 weeks. It would be good to understand what is causing the tests to fail: are there issues with the feature itself, or can they be attributed to other failures? |
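To make the comparison against the usual bar concrete, here is a minimal sketch of how the thresholds quoted above could be checked against a table of results. It is not the actual verify-feature-promotion tooling; the `PlatformResult` record and `promotion_gaps` function are hypothetical names, and it assumes the 14 runs and the 95% pass rate are evaluated per test and per platform, which the comment does not spell out.

```python
from dataclasses import dataclass

# Thresholds quoted in the comment above for a regular feature promotion.
MIN_TESTS = 5
MIN_PASS_RATE = 0.95
MIN_PLATFORMS = 6
MIN_RUNS = 14  # assumed to apply per test, per platform


@dataclass(frozen=True)
class PlatformResult:
    """Hypothetical record: one test's outcomes on one platform."""
    test: str
    platform: str
    runs: int
    passes: int


def promotion_gaps(results):
    """Return human-readable reasons why the usual criteria are not met."""
    gaps = []
    tests = {r.test for r in results}
    platforms = {r.platform for r in results}
    if len(tests) < MIN_TESTS:
        gaps.append(f"only {len(tests)} tests, need {MIN_TESTS}")
    if len(platforms) < MIN_PLATFORMS:
        gaps.append(f"only {len(platforms)} platforms, need {MIN_PLATFORMS}")
    for r in results:
        if r.runs < MIN_RUNS:
            gaps.append(f"{r.test} on {r.platform}: {r.runs} runs, need {MIN_RUNS}")
        # Skip the rate check when there are no runs to avoid dividing by zero.
        if r.runs > 0 and r.passes / r.runs < MIN_PASS_RATE:
            rate = r.passes / r.runs
            gaps.append(f"{r.test} on {r.platform}: pass rate {rate:.0%}")
    return gaps
```

Calling `promotion_gaps` with an entry such as `PlatformResult("some-test", "aws", 5, 4)` would report both the low run count and the 80% pass rate, which mirrors the situation described in the thread (roughly 5 runs per platform, pass rates below 95%).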
/skip |
c939ea4 to d386d6a
Rebased just for good measure |
/retest |
/test e2e-aws-serial-techpreview e2e-upgrade minor-e2e-upgrade-minor e2e-aws-serial e2e-aws-ovn e2e-aws-ovn-techpreview e2e-upgrade-out-of-change I believe all of the CI issues have cleared up, so let's see if we can get some of those to pass. |
The verify-crd-schema failure seems to be related to the fact that we renamed a file to
|
/override ci/prow/verify-crd-schema As explained above, this happened through a rename of an existing file. These are from existing problems that we cannot rectify without widespread consideration for the impacts |
@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/verify-crd-schema In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/test e2e-aws-serial-techpreview e2e-upgrade minor-e2e-upgrade-minor e2e-aws-serial e2e-aws-ovn e2e-aws-ovn-techpreview e2e-upgrade-out-of-change Try rerunning these since the failures did not appear related to this PR. |
/test e2e-aws-serial-techpreview e2e-upgrade minor-e2e-upgrade-minor e2e-aws-serial e2e-aws-ovn e2e-aws-ovn-techpreview e2e-upgrade-out-of-change |
The serial suite is pretty broken at the moment. Since the suite passed on Friday, modulo a deprovision timeout, I think we can override this one. /override ci/prow/e2e-aws-serial-techpreview |
@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/e2e-aws-serial-techpreview In response to this:
/override ci/prow/e2e-aws-serial This passed yesterday, and the base of main has only moved for a tooling change since, so I'm confident that this wouldn't have re-run if we hadn't merged that tooling change |
@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/e2e-aws-serial In response to this:
/override ci/prow/minor-e2e-upgrade-minor This passed on Friday, but the tooling change meant it had to be re-run. I'm confident the tooling changes haven't impacted this result and so the value from Friday should still be valid |
@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/minor-e2e-upgrade-minor In response to this:
+1 to Promote OnClusterBuild featuregate to default, despite test results currently below the usual 95% expectation. cc @craychee |
Per #2192 (comment) /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: JoelSpeed, yuqi-zhang |
@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/verify-feature-promotion In response to this:
@yuqi-zhang: all tests passed! |
/hold cancel |
[ART PR BUILD NOTIFIER] Distgit: ose-cluster-config-api |
Final step to GA On Cluster Layering, after openshift/machine-config-operator#4756 and #2134.
See also updated enhancement: openshift/enhancements#1515