Sync latest master from upstream rook #567

Nikhil-Ladha · 2024-02-05T08:04:46Z

Description of your changes:
This PR syncs the latest code from upstream master to downstream master branch.

Checklist:

Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
Reviewed the developer guide on Submitting a Pull Request
Pending release notes updated with breaking and/or notable changes for the next minor release.
Documentation has been updated, if necessary.
Unit tests have been added, if necessary.
Integration tests have been added, if necessary.

Fixes: #13167 Previously, the mgr did not honor the flag ContinueUpgradeAfterChecksEvenIfNotHealthy from the cluster spec. Only osd, mds, and rgw did. To render the update behavior correct and complete across the daemons, this change implements the honoring of the flag for the mgr. Signed-off-by: Michael Adam <obnox@samba.org>
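The rule this commit message describes can be sketched as a small decision function. This is only an illustration in Python, not Rook's Go implementation: `should_continue_upgrade`, `upgrade_plan`, and their parameters are hypothetical names, and `continue_if_not_healthy` stands in for the `ContinueUpgradeAfterChecksEvenIfNotHealthy` flag from the cluster spec.

```python
# Hypothetical sketch: every daemon type consults the same flag.
DAEMONS = ("osd", "mds", "rgw", "mgr")


def should_continue_upgrade(daemon_healthy: bool, continue_if_not_healthy: bool) -> bool:
    """Return True when the upgrade of one daemon may proceed."""
    if daemon_healthy:
        return True
    # Unhealthy: proceed only if the cluster spec flag allows it.
    return continue_if_not_healthy


def upgrade_plan(health: dict, continue_if_not_healthy: bool) -> dict:
    """Apply the same rule to each daemon type; unknown health counts as unhealthy."""
    return {
        daemon: should_continue_upgrade(health.get(daemon, False), continue_if_not_healthy)
        for daemon in DAEMONS
    }
```

With the flag set, an unhealthy mgr no longer blocks the plan; without it, the mgr is held back like the other daemons.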

It was somewhat difficult to find an appropriate and allowed commit prefix for changes to the k8sutil package. This change removes this problem by adding "k8sutil" to the list of allowed prefixes in our commitlint configuration. Signed-off-by: Michael Adam <obnox@samba.org>

The helm templates assumed that the resources would be installed to the given namespace for the helm install or upgrade. This works perfectly until there is a desire to extract the manifests from the helm chart and instead install with those. Thus, the namespace is added to all the resources in the chart where they were missing. Signed-off-by: travisn <tnielsen@redhat.com>

Add a callback function to the OSD removal method, because downstream requires an extra check before proceeding with OSD removal. Signed-off-by: subhamkrai <srai@redhat.com>

osd: add callback function in osd removal

The current code gets the ip:port for the dashboard by using the IP of the mgr pods. This works when there is only one mgr, but it fails in a multi-node cluster. Signed-off-by: Redouane Kachach <rkachach@redhat.com>

Originally we create it using the command `ceph fs subvolume create <vol_name> <subvol_name>`, so there are two variables: the filesystem name and the subvolume name. Currently the CR doesn't allow us to set the subvolume name to the constant "csi" as needed, because of k8s limitations. Signed-off-by: parth-gr <paarora@redhat.com>

test: fix how we obtain the dashboard endpoint

Signed-off-by: Redouane Kachach <rkachach@redhat.com>

At this moment we have two different cluster spec files for testing: cluster-test.yaml and cluster-on-pvc-minikube.yaml. With the new option, the user can choose which one to use to bootstrap the cluster. Signed-off-by: Redouane Kachach <rkachach@redhat.com>

Signed-off-by: Redouane Kachach <rkachach@redhat.com>

mgr: honor the ContinueUpgradeAfterChecksEvenIfNotHealthy flag

subvolumegroup: add name spec in subvolumegroup CRD

helm: Add namespace to all resource templates

This commit removes the controller-runtime dependencies from the apis dir; to achieve that, we are removing the webhook. Signed-off-by: subhamkrai <srai@redhat.com>

A codegen deep copy was missing; running `make codegen` updated it. Signed-off-by: Rakshith R <rar@redhat.com>

build: missing DeepCopy code for CSIDriverSpec

This commit adds a validating admission policy for the cephcluster CR according to the webhook rules. Not all of the webhook rules can be moved to the validating admission policy, for example the multus selector validation. Signed-off-by: subhamkrai <srai@redhat.com>

core: remove webhook & controller-runtime from apis

webhook: add validating admission policy

external: add support for rados namespace for external cluster

Add mergify rules for opening backport PRs to the release-1.13 branch and also for auto-merging backports after the CI passes. Signed-off-by: travisn <tnielsen@redhat.com>

This implements the "Ceph Config via Ceph Cluster CRD" design document as a `cephConfig:` structure on the CRD. This also fixes the `yq` commands used to manipulate the `cluster-test.yaml` that caused CI issues for this PR and potentially unknowingly others. Signed-off-by: Alexander Trost <galexrt@googlemail.com>

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>

ci: Mergify rules for release-1.13

Signed-off-by: Cyril Jouve <jv.cyril@gmail.com>

…impl operator: allow setting ceph config options via ceph cluster crd

12976 : Improve Documentation/Storage-Configuration/Ceph-CSI/ceph-csi…

Add toleration of 5 seconds to rook-ceph operator deployment to override default toleration seconds of 300 seconds

…ndencies-99794afd47 build(deps): bump the github-dependencies group with 5 updates

exporter: Don't delete exporter service on daemon deletion

…th-generation csi: Fix NetNamespaceFilePath generation with namespace instead of name

This is needed because not all contributors may be using gofmt to format source code, and this could result in inconsistent usage over time. Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>

ci: check for gofmt usage in golangci-lint

If the configmap rook-config-override is empty, there is no need to trigger the reconcile to update the ceph daemons. This configmap update is causing unnecessary reconciles periodically in some clusters even when it is empty. Signed-off-by: travisn <tnielsen@redhat.com>

core: Skip reconcile if override configmap is empty
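The skip condition described in this commit can be illustrated with a short predicate. This is a Python sketch under stated assumptions, not the operator's Go code: the function names are hypothetical, the ConfigMap is modeled as a plain dict of its `data` field, and treating whitespace-only values as empty is an assumption made for the example.

```python
def override_config_is_empty(configmap_data) -> bool:
    """Return True when the override ConfigMap carries no settings.

    `configmap_data` models the ConfigMap's `data` field (a dict or None).
    Whitespace-only values are treated as empty (an assumption of this sketch).
    """
    if not configmap_data:
        return True
    return all(not (value or "").strip() for value in configmap_data.values())


def needs_reconcile(configmap_data) -> bool:
    """Trigger a reconcile of the ceph daemons only for a non-empty override."""
    return not override_config_is_empty(configmap_data)
```

For example, `needs_reconcile({"config": ""})` is false, so a periodic update of an empty override would not restart the reconcile, while any real setting in the data still would.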

Nikhil-Ladha · 2024-02-05T08:09:46Z

With the goal of having the rook CSV separated out from ocs-operator, we need to start maintaining the downstream rook repo and sync the latest commits daily from upstream repo and have the CSV generation script added here to be referenced in the ocs-operator code.
This sync can be handled by the automated pipeline that we have for csi-addons, ceph-csi and ramen.

csi: update network fence CR name

…check object: add check specific to name and namespace for ceph cosi driver

The disk size in the GitHub Actions machine has increased from 64G to 75G. Now we detect the version automatically instead of fetching a hard-coded value. Co-authored-by: Jan Klippel <jan.klippel@uhurutec.com> Signed-off-by: subhamkrai <srai@redhat.com>

ci: disk in github action increased to 75G from 64G

The 'extra' block device attached to GH actions runners has changed size twice in 3 months. The previous strategy of detecting the disk by size is becoming harder to maintain. Additionally, the block size with recent changes (75G) is now the same as the boot device (also 75G), making the method inexact. The method can now be summarized as, "find the boot disk and choose the disk that isn't the boot disk to be the 'extra' one used." Prior to this, we used a one-liner based on `lsblk`. While we could still make this a one-liner, the method is now updated to 2 effective lines, plus debug text output to stderr to help if we need to debug further in the future. Of note, the 'extra' disk has a mount point of "/mnt", but it is unclear whether this is a reliable heuristic for detecting the extra disk. For years now, GH action runners have had only 2 disks. Therefore, it seems slightly more likely that a heuristic to "choose the non-boot disk" will be a more robust long-term solution. If this strategy proves to be unreliable in the future, it may be wise to consider whether "the device with a partition mounted to '/mnt'" would be a good alternative. Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>

ci: fix detection of GH actions extra disk
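The "choose the non-boot disk" heuristic described above can be sketched as follows. This is a Python illustration, not the `lsblk`-based shell logic used in the actual CI change: the function names are hypothetical, the input is assumed to be the parsed output of `lsblk --json`, and "boot disk" is assumed to mean the disk that holds the partition mounted at `/`.

```python
import sys


def disk_mountpoints(device: dict) -> list:
    """Collect the mount points of a device and all of its nested children."""
    points = []
    # Newer lsblk emits "mountpoints" (a list); older versions emit "mountpoint".
    for mount in device.get("mountpoints") or [device.get("mountpoint")]:
        if mount:
            points.append(mount)
    for child in device.get("children", []):
        points.extend(disk_mountpoints(child))
    return points


def find_extra_disk(lsblk: dict) -> str:
    """Return the name of the single disk that is not the boot disk."""
    disks = [d for d in lsblk["blockdevices"] if d.get("type") == "disk"]
    boot = [d for d in disks if "/" in disk_mountpoints(d)]
    if len(boot) != 1:
        raise ValueError(f"expected exactly one boot disk, found {len(boot)}")
    others = [d["name"] for d in disks if d is not boot[0]]
    # Debug output goes to stderr so stdout stays clean for the caller.
    print(f"boot disk: {boot[0]['name']}, other disks: {others}", file=sys.stderr)
    if len(others) != 1:
        raise ValueError(f"expected exactly one non-boot disk, found {len(others)}")
    return others[0]
```

The sketch fails loudly when the runner does not have exactly two disks, which matches the commit message's observation that the heuristic relies on that layout.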

Nikhil-Ladha · 2024-02-06T06:52:06Z

@subhamkrai can you please take a look at the prow/unit test failure once and see if something could be done about it?

This PR fixes the failure while running multicluster mirroring CI tests Signed-off-by: sp98 <sapillai@redhat.com>

ci: remove `/dev` prefix from the `TEST_SCRATCH_DEVICE` and `deviceFilter` spec.

This kernel version is greater than 5.11, so let's use the kernel mounter instead of fuse. Signed-off-by: subhamkrai <srai@redhat.com>

test: use mounter kernal instead of fuse
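The version check behind this choice could look like the sketch below. It is a hypothetical Python illustration, not the test suite's code: the function names are invented, the strict "greater than 5.11" comparison is taken from the commit message, and falling back to fuse otherwise is an assumption.

```python
import re


def parse_kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '6.2.0-1018-azure'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def choose_mounter(release: str, threshold: tuple = (5, 11)) -> str:
    """Pick the kernel mounter for kernels newer than the threshold, else fuse."""
    # Tuples compare element by element, so (6, 2) > (5, 11) and (5, 4) < (5, 11).
    return "kernel" if parse_kernel_version(release) > threshold else "fuse"
```

On a live machine the release string could come from `platform.release()`.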

subhamkrai · 2024-02-06T14:22:38Z

> @subhamkrai can you please take a look at the prow/unit test failure once and see if something could be done about it?

looking

subhamkrai · 2024-02-06T14:25:35Z

> @subhamkrai can you please take a look at the prow/unit test failure once and see if something could be done about it?

@Nikhil-Ladha I'm not sure about this error

ERRO[2024-02-06T14:20:58Z] Some steps failed:                           
ERRO[2024-02-06T14:20:58Z] 
  * could not run steps: step unit failed: test "unit" failed: could not watch pod: the pod ci-op-mrip83sl/unit failed after 1m10s (failed containers: test): ContainerFailed one or more containers exited
Container test exited with code 1, reason Error
---

I think Nitin/Malay could help with Prow issue

Nikhil-Ladha · 2024-02-06T14:34:57Z

I guess we should be fine to disable it, anyway I don't see this job running on release branches

…ain-active-clean core: set blocking PDB even if no unhealthy PGs appear

openshift-ci · 2024-02-06T18:49:06Z

@Nikhil-Ladha: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/unit | `4e0c4f6` | link | true | `/test unit` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Nikhil-Ladha · 2024-02-06T18:52:34Z

@travisn if we are good with merging this PR for now, let's merge this and I will enable the daily sync for the repo while we discuss the CSV approach in the design doc.
You might have to override the prow test for this PR, and I will disable it tomorrow for the master branch.

travisn · 2024-02-06T19:41:59Z

/approve
/lgtm

openshift-ci · 2024-02-06T19:42:14Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Nikhil-Ladha, travisn

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

obnoxxx and others added 30 commits November 29, 2023 20:23

osd: add callback function in osd removal

927af7c

Add a callback function to the OSD removal method, because downstream requires an extra check before proceeding with OSD removal. Signed-off-by: subhamkrai <srai@redhat.com>

Merge pull request #13281 from subhamkrai/add-callback-osd-removal

ee14ecb

osd: add callback function in osd removal

test: fix how we obtain the dashboard endpoint

b739f29

The current code gets the ip:port for the dashboard by using the IP of the mgr pods. This works when there is only one mgr, but it fails in a multi-node cluster. Signed-off-by: Redouane Kachach <rkachach@redhat.com>

Merge pull request #13293 from rkachach/fix_issue_dashboard_ipaddr

d7b06ba

test: fix how we obtain the dashboard endpoint

test: improve examples directory checking

b02a214

Signed-off-by: Redouane Kachach <rkachach@redhat.com>

test: moving parameters default to init_vars function

f576742

Signed-off-by: Redouane Kachach <rkachach@redhat.com>

test: fixing namespace to get dashboard endpoint

e122c01

Signed-off-by: Redouane Kachach <rkachach@redhat.com>

Merge pull request #13222 from obnoxxx/mgr-update-fix

e391dea

mgr: honor the ContinueUpgradeAfterChecksEvenIfNotHealthy flag

Merge pull request #13266 from parth-gr/svg-name

1efb694

subvolumegroup: add name spec in subvolumegroup CRD

Merge pull request #13288 from travisn/helm-operator-ns

09844e8

helm: Add namespace to all resource templates

core: remove webhook & controller-runtime from apis

28cc1eb

This commit removes the controller-runtime dependencies from the apis dir; to achieve that, we are removing the webhook. Signed-off-by: subhamkrai <srai@redhat.com>

build: missing DeepCopy code for CSIDriverSpec

1153c10

A codegen deep copy was missing; running `make codegen` updated it. Signed-off-by: Rakshith R <rar@redhat.com>

Merge pull request #13305 from Rakshith-R/fix-codegen

eacc7e7

build: missing DeepCopy code for CSIDriverSpec

Merge pull request #13261 from subhamkrai/remove-controller-runtime

f4d67fd

core: remove webhook & controller-runtime from apis

Merge pull request #13177 from subhamkrai/add-cel

20ce9da

webhook: add validating admission policy

Merge pull request #13196 from parth-gr/external-rados

25f5a23

external: add support for rados namespace for external cluster

ci: mergify rules for 1.13

ec2136a

Add mergify rules for opening backport PRs to the release-1.13 branch and also for auto-merging backports after the CI passes. Signed-off-by: travisn <tnielsen@redhat.com>

doc: improve ceph-csi-drivers.md and lintrolling

59d0240

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>

Merge pull request #13311 from travisn/mergify-1.13

55b6a37

ci: Mergify rules for release-1.13

helm: fix namespace for objstore ingress

d65c5b6

Signed-off-by: Cyril Jouve <jv.cyril@gmail.com>

Merge pull request #13246 from koor-tech/ceph_config_via_cluster_crd_…

3097455

…impl operator: allow setting ceph config options via ceph cluster crd

Merge pull request #12977 from anthonyeleven/improve-csi-addons

22b95db

12976 : Improve Documentation/Storage-Configuration/Ceph-CSI/ceph-csi…

Merge pull request #12983 from mashetty330/fix12660

9101dd8

Add toleration of 5 seconds to rook-ceph operator deployment to override default toleration seconds of 300 seconds

subhamkrai and others added 7 commits February 1, 2024 16:33

Merge pull request #13639 from rook/dependabot/go_modules/github-depe…

949016d

…ndencies-99794afd47 build(deps): bump the github-dependencies group with 5 updates

Merge pull request #13653 from travisn/exporter-delete-ns

e39ee8c

exporter: Don't delete exporter service on daemon deletion

Merge pull request #13663 from iPraveenParihar/fix/netNamespaceFilePa…

60d6fe5

…th-generation csi: Fix NetNamespaceFilePath generation with namespace instead of name

ci: check for gofmt usage in golangci-lint

c98ce0a

This is needed because not all contributors may be using gofmt to format source code, and this could result in inconsistent usage over time. Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>

Merge pull request #13670 from BlaineEXE/ci-add-gofmt-check

21a99f3

ci: check for gofmt usage in golangci-lint

Merge pull request #13652 from travisn/cm-skip-reconcile

f5fcf96

core: Skip reconcile if override configmap is empty

travisn and others added 6 commits February 5, 2024 07:10

Merge pull request #13615 from riya-singhal31/node-name

73427c9

csi: update network fence CR name

Merge pull request #13623 from thotz/ceph-cosi-driver-name-namespace-…

7f7e8d4

…check object: add check specific to name and namespace for ceph cosi driver

Merge pull request #13675 from subhamkrai/ci-fix-disk-size

577c7cc

ci: disk in github action increased to 75G from 64G

Merge pull request #13694 from BlaineEXE/ci-fix-extra-disk-selection

a5abeab

ci: fix detection of GH actions extra disk

sp98 and others added 4 commits February 6, 2024 12:54

ci: fix ci failure for multicluster tests

45adae1

This PR fixes the failure while running multicluster mirroring CI tests Signed-off-by: sp98 <sapillai@redhat.com>

Merge pull request #13697 from sp98/fix-ci

4ca474f

ci: remove `/dev` prefix from the `TEST_SCRATCH_DEVICE` and `deviceFilter` spec.

test: use mounter kernal instead of fuse

abc272c

This kernel version is greater than 5.11, so let's use the kernel mounter instead of fuse. Signed-off-by: subhamkrai <srai@redhat.com>

Merge pull request #13699 from subhamkrai/fix-smoke-suite

39f4458

test: use mounter kernal instead of fuse

Merge pull request #13511 from ushitora-anqou/set-pdb-even-if-pgs-rem…

4e0c4f6

…ain-active-clean core: set blocking PDB even if no unhealthy PGs appear

openshift-ci bot assigned travisn Feb 6, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2024

travisn merged commit f70ca3e into red-hat-storage:master Feb 6, 2024
45 of 49 checks passed
