Bug 2233615: core: Operator skips reconcile of mons and osds in debug #512

Merged

@travisn travisn commented Aug 22, 2023

Description of your changes:
During certain maintenance tasks, the admin takes ownership of operations on the Ceph mons and OSDs, and the operator should not interfere with those operations. If the operator sees any mon in debug mode, every reconcile and mon health check is skipped, so no mon is updated while any one of them is in maintenance. During OSD reconcile, updates are skipped only for the individual OSD deployments that are actively being debugged.

Debug mode for OSD and mon deployments is signaled by adding the ceph.rook.io/do-not-reconcile label to the deployment.
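
The skip rules above can be illustrated with a minimal Go sketch. This is not the code from this PR: the `deployment` type and the helpers `inDebug`, `skipMonReconcile`, and `osdsToUpdate` are hypothetical names invented for illustration, and the sketch assumes that the mere presence of the label (regardless of its value) marks a deployment as being in debug mode.

```go
package main

import "fmt"

// Label named in the PR description. Assumption: its presence alone,
// whatever the value, marks the deployment as being debugged.
const doNotReconcileLabel = "ceph.rook.io/do-not-reconcile"

// deployment is a hypothetical stand-in for a Kubernetes Deployment;
// only the fields needed for the illustration are modeled.
type deployment struct {
	Name   string
	Labels map[string]string
}

// inDebug reports whether the deployment carries the do-not-reconcile label.
func inDebug(d deployment) bool {
	_, ok := d.Labels[doNotReconcileLabel]
	return ok
}

// skipMonReconcile returns true if any mon is in debug mode, in which case
// the whole reconcile and the mon health check would be skipped.
func skipMonReconcile(mons []deployment) bool {
	for _, m := range mons {
		if inDebug(m) {
			return true
		}
	}
	return false
}

// osdsToUpdate returns the names of the OSD deployments that may be updated:
// only the OSDs that are actively being debugged are left out.
func osdsToUpdate(osds []deployment) []string {
	var names []string
	for _, o := range osds {
		if !inDebug(o) {
			names = append(names, o.Name)
		}
	}
	return names
}

func main() {
	debugLabels := map[string]string{doNotReconcileLabel: "true"}

	mons := []deployment{{Name: "mon-a"}, {Name: "mon-b", Labels: debugLabels}}
	osds := []deployment{{Name: "osd-0", Labels: debugLabels}, {Name: "osd-1"}}

	fmt.Println("skip mon reconcile:", skipMonReconcile(mons)) // skip mon reconcile: true
	fmt.Println("OSDs to update:", osdsToUpdate(osds))         // OSDs to update: [osd-1]
}
```

The asymmetry is the point of the sketch: a single mon in debug blocks all mon work, while a debugged OSD only excludes itself.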

Which issue is resolved by this Pull Request:
Resolves https://bugzilla.redhat.com/show_bug.cgi?id=2233615

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

During certain maintenance tasks the admin will own running
operations on the ceph mons and osds, and the operator should
not interfere with those operations. If the operator sees
any mon in debug mode, every reconcile and mon health check
will be skipped. Thus, mons will not be updated while any
one of them is in maintenance. During OSD reconcile, individual
OSD deployment updates will only be skipped for OSDs that are
actively being debugged.

The debug mode for osd and mon deployments is signaled by
creating the ceph.rook.io/do-not-reconcile label.

Signed-off-by: Travis Nielsen <tnielsen@redhat.com>
(cherry picked from commit 7c56b93)
(cherry picked from commit f34c940)

openshift-ci bot commented Aug 22, 2023

@travisn: Bugzilla bug 2233615 is in a bug group that is not in the allowed groups for this repo.
Allowed groups for this repo are:

  • qe_staff
  • redhat

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 22, 2023
@subhamkrai subhamkrai left a comment

@travisn, how about cherry-picking with -x from rook PR 10585?

I see these are exactly the same changes, but -x helps with tracing when something goes wrong. It may not be required for this change, but in general it is good practice to follow.

travisn commented Aug 23, 2023

I cherry-picked from rook#10690, which was the backport of 10585. What different approach are you suggesting?

@subhamkrai

I don't see the cherry-pick comment that we get when using -x with cherry-pick, so I added the comment.

travisn commented Aug 25, 2023

Did you look on the commit page? I just usually delete it from the PR message.

@subhamkrai

ah, missed that.

@subhamkrai

@travisn, does this need a rebase?

travisn commented Aug 31, 2023

It's already on the latest. Are you wondering about the failing CI? I think the release-4.11 CI is just outdated and those are not related to the PR.

@subhamkrai

Yes, I was wondering why the CI checks are failing, but as you mentioned, it could mainly be due to the older version.

openshift-ci bot commented Aug 31, 2023

@subhamkrai: changing LGTM is restricted to collaborators

openshift-ci bot commented Aug 31, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: subhamkrai, travisn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

travisn commented Aug 31, 2023

OK, thanks. I'll go ahead and merge.

@travisn travisn merged commit 8e8e4f6 into red-hat-storage:release-4.11 Aug 31, 2023
28 of 42 checks passed
@travisn travisn deleted the backport-skip-reconcile-4.11 branch August 31, 2023 14:34
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.