Sync latest master from upstream rook #567

Merged
merged 1,990 commits into from
Feb 6, 2024

Conversation

Nikhil-Ladha
Member

Description of your changes:
This PR syncs the latest code from upstream master to downstream master branch.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

obnoxxx and others added 30 commits November 29, 2023 20:23
Fixes: #13167

Previously, the mgr did not honor the flag
ContinueUpgradeAfterChecksEvenIfNotHealthy
from the cluster spec. Only osd, mds, and rgw did.

To render the update behavior correct and complete across the daemons, this
change implements the honoring of the flag for the mgr.

Signed-off-by: Michael Adam <obnox@samba.org>
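The behavior described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the operator's code: `ClusterSpec` is a trimmed-down stand-in that models only the flag, and `okToContinueUpgrade`, `healthCheck`, and `errUnhealthy` are hypothetical names introduced for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterSpec is a hypothetical, trimmed-down stand-in for the cluster spec.
// Only the flag named in the commit message is modeled.
type ClusterSpec struct {
	ContinueUpgradeAfterChecksEvenIfNotHealthy bool
}

// errUnhealthy is an illustrative error used to simulate a failed check.
var errUnhealthy = errors.New("cluster not healthy")

// okToContinueUpgrade decides whether the upgrade of a daemon may proceed.
// healthCheck is an assumed helper that returns an error when the cluster
// is not healthy.
func okToContinueUpgrade(spec ClusterSpec, daemon string, healthCheck func() error) error {
	err := healthCheck()
	if err == nil {
		return nil
	}
	if spec.ContinueUpgradeAfterChecksEvenIfNotHealthy {
		// The flag is set: report the failed check but keep going.
		fmt.Printf("%s: check failed (%v), continuing because the flag is set\n", daemon, err)
		return nil
	}
	return fmt.Errorf("%s: refusing to continue upgrade: %w", daemon, err)
}

func main() {
	failing := func() error { return errUnhealthy }
	withFlag := ClusterSpec{ContinueUpgradeAfterChecksEvenIfNotHealthy: true}

	fmt.Println(okToContinueUpgrade(withFlag, "mgr", failing))
	fmt.Println(okToContinueUpgrade(ClusterSpec{}, "mgr", failing))
}
```

Routing every daemon type through one decision function like this is one way to keep the flag's behavior consistent across mgr, osd, mds, and rgw.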
It was somewhat difficult to find an appropriate and allowed commit
prefix for changes to the k8sutil package. This change removes this
problem by adding "k8sutil" to the list of allowed prefixes in our
commitlint configuration.

Signed-off-by: Michael Adam <obnox@samba.org>
The helm templates assumed that the resources would be installed
to the given namespace for the helm install or upgrade. This works
perfectly until there is a desire to extract the manifests from
the helm chart and instead install with those. Thus, the namespace
is added to all the resources in the chart where they were
missing.

Signed-off-by: travisn <tnielsen@redhat.com>
Add a callback function to the osd removal method, because
downstream requires an extra check before
proceeding with osd removal.

Signed-off-by: subhamkrai <srai@redhat.com>
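A callback of the kind described above might look like the sketch below. The names `PreRemoveCheck` and `removeOSDs` are hypothetical and do not reflect the actual signature in the osd removal code. The sketch only shows the general pattern of letting a caller inject an extra check before each removal.

```go
package main

import "fmt"

// PreRemoveCheck is a hypothetical callback type. It returns true when
// removal of the given OSD ID may proceed.
type PreRemoveCheck func(osdID int) (bool, error)

// removeOSDs is an illustrative stand-in for the removal method. It runs the
// optional callback before each removal and skips OSDs the callback rejects.
func removeOSDs(osdIDs []int, check PreRemoveCheck) ([]int, error) {
	var removed []int
	for _, id := range osdIDs {
		if check != nil {
			ok, err := check(id)
			if err != nil {
				return removed, fmt.Errorf("pre-removal check failed for osd.%d: %w", id, err)
			}
			if !ok {
				// The extra check vetoed this OSD, so leave it in place.
				continue
			}
		}
		// The real removal work would happen here.
		removed = append(removed, id)
	}
	return removed, nil
}

func main() {
	// Example check: only allow removal of even-numbered OSDs.
	onlyEven := func(id int) (bool, error) { return id%2 == 0, nil }

	removed, err := removeOSDs([]int{0, 1, 2, 3}, onlyEven)
	fmt.Println(removed, err)
}
```

Passing `nil` as the callback keeps the behavior without an extra check, so callers that do not need one are unaffected.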
osd: add callback function in osd removal
The current code gets the ip:port for the dashboard by using
the IP of the mgr pods. This works when there is only
one mgr, but it fails in a multi-node cluster.

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
Originally, we create it using this command:
ceph fs subvolume create <vol_name> <subvol_name>
So there are two variables: the filesystem and the subvolume name.
Currently, the CR does not allow us to set the subvolume name
to the constant "csi" as needed, because of k8s limitations.

Signed-off-by: parth-gr <paarora@redhat.com>
test: fix how we obtain the dashboard endpoint
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
At the moment we have two different cluster spec files for testing:
cluster-test.yaml and cluster-on-pvc-minikube.yaml. With the new
option, the user can choose which one to use to bootstrap the cluster.

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
mgr: honor the ContinueUpgradeAfterChecksEvenIfNotHealthy flag
subvolumegroup: add name spec in subvolumegroup CRD
helm: Add namespace to all resource templates
This commit removes the controller-runtime dependencies
from the apis dir. To achieve that, we are removing
the webhook.

Signed-off-by: subhamkrai <srai@redhat.com>
A codegen deep copy was missing.
Running `make codegen` regenerated it.

Signed-off-by: Rakshith R <rar@redhat.com>
build: missing DeepCopy code for CSIDriverSpec
This commit adds a validating admission policy
for the cephcluster CR according to the webhook rules.
Not all of the webhook checks can be moved to the validating
admission policy, for example the multus
selector validation.

Signed-off-by: subhamkrai <srai@redhat.com>
core: remove webhook & controller-runtime from apis
webhook: add validating admission policy
external: add support for rados namespace for external cluster
Add mergify rules for opening backport PRs to the release-1.13
branch and also for auto-merging backports after the CI
passes.

Signed-off-by: travisn <tnielsen@redhat.com>
This implements the "Ceph Config via Ceph Cluster CRD" design document
as a `cephConfig:` structure on the CRD.
This also fixes the `yq` commands used to manipulate the
`cluster-test.yaml` that caused CI issues for this PR and potentially
unknowingly others.

Signed-off-by: Alexander Trost <galexrt@googlemail.com>
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
ci: Mergify rules for release-1.13
Signed-off-by: Cyril Jouve <jv.cyril@gmail.com>
…impl

operator: allow setting ceph config options via ceph cluster crd
12976 : Improve Documentation/Storage-Configuration/Ceph-CSI/ceph-csi…
Add toleration of 5 seconds to rook-ceph operator deployment to override default toleration seconds of 300 seconds
subhamkrai and others added 7 commits February 1, 2024 16:33
…ndencies-99794afd47

build(deps): bump the github-dependencies group with 5 updates
exporter: Don't delete exporter service on daemon deletion
…th-generation

csi: Fix NetNamespaceFilePath generation with namespace instead of name
This is needed because not all contributors may be using gofmt to format
source code, and this could result in inconsistent usage over time.

Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
ci: check for gofmt usage in golangci-lint
If the configmap rook-config-override is empty,
there is no need to trigger the reconcile to update
the ceph daemons. This configmap update is causing
unnecessary reconciles periodically in some clusters
even when it is empty.

Signed-off-by: travisn <tnielsen@redhat.com>
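The check described above could be sketched as a small predicate. This is an assumption-laden illustration: `shouldReconcile` is a hypothetical name, `data` stands for the contents of the `rook-config-override` configmap, and "empty" is assumed here to mean that there are no keys or that every value is blank.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldReconcile is a hypothetical predicate. data stands for the contents
// of the rook-config-override configmap. Assumption: the configmap counts as
// empty when it has no keys or when every value is blank.
func shouldReconcile(data map[string]string) bool {
	for _, v := range data {
		if strings.TrimSpace(v) != "" {
			return true
		}
	}
	return false
}

func main() {
	empty := map[string]string{"config": ""}
	withOverride := map[string]string{"config": "[global]\nosd_pool_default_size = 1"}

	fmt.Println(shouldReconcile(empty))
	fmt.Println(shouldReconcile(withOverride))
}
```

A predicate like this would be evaluated on configmap update events, so that an update that carries no overrides does not trigger a reconcile of the ceph daemons.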
core: Skip reconcile if override configmap is empty
@Nikhil-Ladha
Member Author

With the goal of having the rook CSV separated out from ocs-operator, we need to start maintaining the downstream rook repo and sync the latest commits daily from upstream repo and have the CSV generation script added here to be referenced in the ocs-operator code.
This sync can be handled by the automated pipeline that we have for csi-addons, ceph-csi and ramen.

travisn and others added 6 commits February 5, 2024 07:10
csi: update network fence CR name
…check

object: add check specific to name and namespace for ceph cosi driver
The disk size in the GitHub Actions machine has
increased from 64G to 75G. Now we detect the version
automatically instead of fetching a hard-coded value.

Co-authored-by: Jan Klippel <jan.klippel@uhurutec.com>
Signed-off-by: subhamkrai <srai@redhat.com>
ci: disk in github action increased to 75G from 64G
The 'extra' block device attached to GH actions runners has changed size
twice in 3 months. The previous strategy of detecting the disk by size
is becoming harder to maintain. Additionally, the block size with recent
changes (75G) is now the same as the boot device (also 75G), making the
method inexact.

The method can now be summarized as, "find the boot disk and choose the
disk that isn't the boot disk to be the 'extra' one used."

Prior to this, we used a one-liner based on `lsblk`. While we could
still make this a one-liner, the method is now updated to 2 effective
lines, plus debug text output to stderr to help if we need to debug
further in the future.

Of note, the 'extra' disk has a mount point of "/mnt", but it is unclear
whether this is a reliable heuristic for detecting the extra disk. For
years now, GH action runners have had only 2 disks. Therefore, it seems
slightly more likely that a heuristic to "choose the non-boot disk" will
be a more robust long-term solution.

If this strategy proves to be unreliable in the future, it may be wise
to consider whether "the device with a partition mounted to '/mnt'"
would be a good alternative.

Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
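The actual change is shell based on `lsblk` output. Purely to illustrate the "choose the disk that isn't the boot disk" heuristic described above, here is a sketch in Go. The `blockDevice` type and both function names are hypothetical. The commit message does not say how the boot disk is identified, so the sketch assumes it is the disk with something mounted at "/".

```go
package main

import "fmt"

// blockDevice is a hypothetical, minimal description of a disk.
type blockDevice struct {
	Name        string   // device name, for example "sda"
	MountPoints []string // mount points of the disk and its partitions
}

// isBootDisk assumes the boot disk is the one with something mounted at "/".
func isBootDisk(d blockDevice) bool {
	for _, m := range d.MountPoints {
		if m == "/" {
			return true
		}
	}
	return false
}

// findExtraDisk returns the single disk that is not the boot disk. It fails
// when the choice would be ambiguous or when no such disk exists.
func findExtraDisk(disks []blockDevice) (string, error) {
	var candidates []string
	for _, d := range disks {
		if !isBootDisk(d) {
			candidates = append(candidates, d.Name)
		}
	}
	if len(candidates) != 1 {
		return "", fmt.Errorf("expected exactly one non-boot disk, found %d", len(candidates))
	}
	return candidates[0], nil
}

func main() {
	// Illustrative layout only: two disks, one of them holding "/".
	disks := []blockDevice{
		{Name: "sda", MountPoints: []string{"/", "/boot/efi"}},
		{Name: "sdb", MountPoints: []string{"/mnt"}},
	}
	fmt.Println(findExtraDisk(disks))
}
```

Unlike matching on disk size, this selection does not need updating when the runner's disk size changes, which is the maintenance problem the commit message describes.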
ci: fix detection of GH actions extra disk
@Nikhil-Ladha
Member Author

@subhamkrai can you please take a look at the prow/unit test failure once and see if something could be done about it?

sp98 and others added 4 commits February 6, 2024 12:54
This PR fixes the failure while running multicluster mirroring CI tests

Signed-off-by: sp98 <sapillai@redhat.com>
ci: remove `/dev` prefix from the `TEST_SCRATCH_DEVICE`  and `deviceFilter` spec.
This kernel version is greater than 5.11, so let's
use the kernel mounter instead of fuse.

Signed-off-by: subhamkrai <srai@redhat.com>
test: use mounter kernal instead of fuse
@subhamkrai

> @subhamkrai can you please take a look at the prow/unit test failure once and see if something could be done about it?

looking

@subhamkrai

> @subhamkrai can you please take a look at the prow/unit test failure once and see if something could be done about it?

@Nikhil-Ladha I'm not sure about this error

ERRO[2024-02-06T14:20:58Z] Some steps failed:                           
ERRO[2024-02-06T14:20:58Z] 
  * could not run steps: step unit failed: test "unit" failed: could not watch pod: the pod ci-op-mrip83sl/unit failed after 1m10s (failed containers: test): ContainerFailed one or more containers exited
Container test exited with code 1, reason Error
---

I think Nitin/Malay could help with Prow issue

@Nikhil-Ladha
Member Author

I guess we should be fine to disable it; anyway, I don't see this job running on release branches.

…ain-active-clean

core: set blocking PDB even if no unhealthy PGs appear

openshift-ci bot commented Feb 6, 2024

@Nikhil-Ladha: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/unit | 4e0c4f6 | link | true | /test unit |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Nikhil-Ladha
Member Author

@travisn if we are good with merging this PR for now, let's merge this and I will enable the daily sync for the repo while we discuss the CSV approach in the design doc.
You might have to override the prow test for this PR, and I will disable it tomorrow for the master branch.

@travisn

travisn commented Feb 6, 2024

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2024

openshift-ci bot commented Feb 6, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Nikhil-Ladha, travisn

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@travisn travisn merged commit f70ca3e into red-hat-storage:master Feb 6, 2024
45 of 49 checks passed