Merge pull request #757 from red-hat-storage/sync_us--master
Syncing latest changes from upstream master for rook
subhamkrai authored Oct 18, 2024
2 parents 84dc8b2 + c3c58a1 commit ae7ca5b
Showing 11 changed files with 140 additions and 117 deletions.
@@ -23,12 +23,97 @@ In short, as the documentation describes it:
Created by cluster administrators to describe how volume group snapshots
should be created, including the driver information, the deletion policy, etc.

## Volume Group Snapshots
## RBD Volume Group Snapshots

### RBD VolumeGroupSnapshotClass

In [VolumeGroupSnapshotClass](https://github.com/rook/rook/tree/master/deploy/examples/csi/rbd/groupsnapshotclass.yaml),
the `csi.storage.k8s.io/group-snapshotter-secret-name` parameter references the
name of the secret created for the rbd-plugin, and the `pool` parameter must reflect the Ceph pool name.

In the `VolumeGroupSnapshotClass`, update the value of the `clusterID` field to match the namespace
that Rook is running in. When Ceph CSI is deployed by Rook, the operator will automatically
maintain a configmap whose contents will match this key. By default this is
"rook-ceph".

```console
kubectl create -f deploy/examples/csi/rbd/groupsnapshotclass.yaml
```

### RBD VolumeGroupSnapshot

In [VolumeGroupSnapshot](https://github.com/rook/rook/tree/master/deploy/examples/csi/rbd/groupsnapshot.yaml),
`volumeGroupSnapshotClassName` is the name of the `VolumeGroupSnapshotClass`
previously created. The labels inside `matchLabels` must be present on the
PVCs that are already created by the RBD CSI driver.
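
The key part of that manifest is the label selector; a sketch matching the example file added by this change:

```yaml
apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshot
metadata:
  name: rbd-groupsnapshot
spec:
  source:
    selector:
      matchLabels:
        group: snapshot-test   # must be present on the PVCs to snapshot
  volumeGroupSnapshotClassName: csi-rbdplugin-groupsnapclass
```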

```console
kubectl create -f deploy/examples/csi/rbd/groupsnapshot.yaml
```

### Verify RBD GroupSnapshot Creation

```console
$ kubectl get volumegroupsnapshotclass
NAME DRIVER DELETIONPOLICY AGE
csi-rbdplugin-groupsnapclass rook-ceph.rbd.csi.ceph.com Delete 21m
```

```console
$ kubectl get volumegroupsnapshot
NAME READYTOUSE VOLUMEGROUPSNAPSHOTCLASS VOLUMEGROUPSNAPSHOTCONTENT CREATIONTIME AGE
rbd-groupsnapshot true csi-rbdplugin-groupsnapclass groupsnapcontent-d13f4d95-8822-4729-9586-4f222a3f788e 5m37s 5m39s
```

The snapshot will be ready to restore to a new PVC when the `READYTOUSE` field of the
`volumegroupsnapshot` is set to true.

### Restore the RBD volume group snapshot to a new PVC

First, find the names of the snapshots created by the `VolumeGroupSnapshot`:

```console
$ kubectl get volumegroupsnapshot/rbd-groupsnapshot -o=jsonpath='{range .status.pvcVolumeSnapshotRefList[*]}PVC: {.persistentVolumeClaimRef.name}, Snapshot: {.volumeSnapshotRef.name}{"\n"}{end}'
PVC: rbd-pvc, Snapshot: snapshot-9d21b143904c10f49ddc92664a7e8fe93c23387d0a88549c14337484ebaf1011-2024-09-12-3.49.13
```

It will list the PVC's name followed by its snapshot name.

In
[pvc-restore](https://github.com/rook/rook/tree/master/deploy/examples/csi/rbd/pvc-restore.yaml),
`dataSource` is the name of one of the snapshots found above, and the
`dataSource` kind must be `VolumeSnapshot`.
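
As a rough sketch (not the exact contents of the linked file), a restore claim might look like the following; the storage class name and size are assumptions and must match the original RBD PVC:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc-restore
spec:
  # Assumption: the RBD storage class used by the original PVC
  storageClassName: rook-ceph-block
  dataSource:
    # Snapshot name as reported by the VolumeGroupSnapshot status above
    name: snapshot-9d21b143904c10f49ddc92664a7e8fe93c23387d0a88549c14337484ebaf1011-2024-09-12-3.49.13
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```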

Create a new PVC from the snapshot

```console
kubectl create -f deploy/examples/csi/rbd/pvc-restore.yaml
```

### Verify RBD Restore PVC Creation

```console
$ kubectl get pvc
rbd-pvc Bound pvc-9ae60bf9-4931-4f9a-9de1-7f45f31fe4da 1Gi RWO rook-cephfs <unset> 171m
rbd-pvc-restore Bound pvc-b4b73cbb-5061-48c7-9ac8-e1202508cf97 1Gi RWO rook-cephfs <unset> 46s
```

### RBD volume group snapshot resource Cleanup

To clean the resources created by this example, run the following:

```console
kubectl delete -f deploy/examples/csi/rbd/pvc-restore.yaml
kubectl delete -f deploy/examples/csi/rbd/groupsnapshot.yaml
kubectl delete -f deploy/examples/csi/rbd/groupsnapshotclass.yaml
```

## CephFS Volume Group Snapshots

### CephFS VolumeGroupSnapshotClass

In [VolumeGroupSnapshotClass](https://github.com/rook/rook/tree/master/deploy/examples/csi/cephfs/groupsnapshotclass.yaml),
the `csi.storage.k8s.io/group-snapshotter-secret-name` parameter should reference the
the `csi.storage.k8s.io/group-snapshotter-secret-name` parameter references the
name of the secret created for the cephfs-plugin.

In the `VolumeGroupSnapshotClass`, update the value of the `clusterID` field to match the namespace
@@ -43,8 +128,8 @@ kubectl create -f deploy/examples/csi/cephfs/groupsnapshotclass.yaml
### CephFS VolumeGroupSnapshot

In [VolumeGroupSnapshot](https://github.com/rook/rook/tree/master/deploy/examples/csi/cephfs/groupsnapshot.yaml),
`volumeGroupSnapshotClassName` should be the name of the `VolumeGroupSnapshotClass`
previously created. The labels inside `matchLabels` should be present on the
`volumeGroupSnapshotClassName` is the name of the `VolumeGroupSnapshotClass`
previously created. The labels inside `matchLabels` must be present on the
PVCs that are already created by the CephFS CSI driver.
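
A sketch analogous to the RBD example above; the resource and class names used here (`cephfs-groupsnapshot`, `csi-cephfsplugin-groupsnapclass`) are assumptions based on the linked example's naming convention and may differ from the actual file:

```yaml
apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshot
metadata:
  name: cephfs-groupsnapshot
spec:
  source:
    selector:
      matchLabels:
        group: snapshot-test
  volumeGroupSnapshotClassName: csi-cephfsplugin-groupsnapclass
```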

```console
@@ -81,8 +166,8 @@ It will list the PVC's name followed by its snapshot name.

In
[pvc-restore](https://github.com/rook/rook/tree/master/deploy/examples/csi/cephfs/pvc-restore.yaml),
`dataSource` should be one of the `Snapshot` that we just
found. The `dataSource` kind should be the `VolumeSnapshot`.
`dataSource` is one of the `Snapshot` that we just
found. The `dataSource` kind must be the `VolumeSnapshot`.

Create a new PVC from the snapshot

@@ -98,7 +183,7 @@ cephfs-pvc Bound pvc-9ae60bf9-4931-4f9a-9de1-7f45f31fe4da 1Gi
cephfs-pvc-restore Bound pvc-b4b73cbb-5061-48c7-9ac8-e1202508cf97 1Gi RWO rook-cephfs <unset> 46s
```

## CephFS volume group snapshot resource Cleanup
### CephFS volume group snapshot resource Cleanup

To clean the resources created by this example, run the following:

2 changes: 2 additions & 0 deletions deploy/examples/csi/cephfs/pvc.yaml
@@ -3,6 +3,8 @@ apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
  # Use this with the example `groupsnapshot.yaml`.
  # Not needed if volume group snapshots are not required.
  labels:
    group: snapshot-test
spec:
13 changes: 13 additions & 0 deletions deploy/examples/csi/rbd/groupsnapshot.yaml
@@ -0,0 +1,13 @@
---
apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshot
metadata:
  name: rbd-groupsnapshot
spec:
  source:
    selector:
      matchLabels:
        # The PVCs require this label for them to be
        # included in the VolumeGroupSnapshot
        group: snapshot-test
  volumeGroupSnapshotClassName: csi-rbdplugin-groupsnapclass
15 changes: 15 additions & 0 deletions deploy/examples/csi/rbd/groupsnapshotclass.yaml
@@ -0,0 +1,15 @@
---
apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshotClass
metadata:
  name: csi-rbdplugin-groupsnapclass
driver: rook-ceph.rbd.csi.ceph.com # csi-provisioner-name
parameters:
  # Specify a string that identifies your cluster. Ceph CSI supports any
  # unique string. When Ceph CSI is deployed by Rook use the Rook namespace,
  # for example "rook-ceph".
  clusterID: rook-ceph # namespace: cluster
  pool: replicapool
  csi.storage.k8s.io/group-snapshotter-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/group-snapshotter-secret-namespace: rook-ceph
deletionPolicy: Delete
4 changes: 4 additions & 0 deletions deploy/examples/csi/rbd/pvc.yaml
@@ -3,6 +3,10 @@ apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
  # Use this with the example `groupsnapshot.yaml`.
  # Not needed if volume group snapshots are not required.
  labels:
    group: snapshot-test
spec:
  accessModes:
    - ReadWriteOnce
3 changes: 2 additions & 1 deletion pkg/operator/ceph/cluster/nodedaemon/add.go
@@ -177,7 +177,8 @@ func isCephPod(labels map[string]string, podName string) bool {
// will be empty since the monitors don't exist yet
isCanaryPod := strings.Contains(podName, "-canary-")
isCrashCollectorPod := strings.Contains(podName, "-crashcollector-")
if ok && !isCanaryPod && !isCrashCollectorPod {
isExporterPod := strings.Contains(podName, "-exporter-")
if ok && !isCanaryPod && !isCrashCollectorPod && !isExporterPod {
logger.Debugf("%q is a ceph pod!", podName)
return true
}
64 changes: 7 additions & 57 deletions pkg/operator/ceph/cluster/nodedaemon/pruner.go
@@ -24,30 +24,20 @@ import (
"github.com/rook/rook/pkg/operator/ceph/config"
"github.com/rook/rook/pkg/operator/ceph/config/keyring"
"github.com/rook/rook/pkg/operator/ceph/controller"
cephver "github.com/rook/rook/pkg/operator/ceph/version"
"github.com/rook/rook/pkg/operator/k8sutil"
v1 "k8s.io/api/batch/v1"
"k8s.io/api/batch/v1beta1"
corev1 "k8s.io/api/core/v1"
kerrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/version"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

func (r *ReconcileNode) reconcileCrashPruner(namespace string, cephCluster cephv1.CephCluster, cephVersion *cephver.CephVersion) error {
func (r *ReconcileNode) reconcileCrashPruner(namespace string, cephCluster cephv1.CephCluster) error {
if cephCluster.Spec.CrashCollector.Disable {
logger.Debugf("crash collector is disabled in namespace %q so skipping crash retention reconcile", namespace)
return nil
}

k8sVersion, err := k8sutil.GetK8SVersion(r.context.Clientset)
if err != nil {
return errors.Wrap(err, "failed to get k8s version")
}
useCronJobV1 := k8sVersion.AtLeast(version.MustParseSemantic(MinVersionForCronV1))

objectMeta := metav1.ObjectMeta{
Name: prunerName,
Namespace: namespace,
@@ -56,13 +46,7 @@ func (r *ReconcileNode) reconcileCrashPruner(namespace string, cephCluster cephv
if cephCluster.Spec.CrashCollector.DaysToRetain == 0 {
logger.Debug("deleting cronjob if it exists...")

var cronJob client.Object
// minimum k8s version required for v1 cronJob is 'v1.21.0'. Apply v1 if k8s version is at least 'v1.21.0', else apply v1beta1 cronJob.
if useCronJobV1 {
cronJob = &v1.CronJob{ObjectMeta: objectMeta}
} else {
cronJob = &v1beta1.CronJob{ObjectMeta: objectMeta}
}
cronJob := &v1.CronJob{ObjectMeta: objectMeta}

err := r.client.Delete(r.opManagerContext, cronJob)
if err != nil {
@@ -76,15 +60,15 @@ func (r *ReconcileNode) reconcileCrashPruner(namespace string, cephCluster cephv
}
} else {
logger.Debugf("daysToRetain set to: %d", cephCluster.Spec.CrashCollector.DaysToRetain)
op, err := r.createOrUpdateCephCron(cephCluster, cephVersion, useCronJobV1)
op, err := r.createOrUpdateCephCron(cephCluster)
if err != nil {
return errors.Wrapf(err, "node reconcile failed on op %q", op)
}
logger.Debugf("cronjob successfully reconciled. operation: %q", op)
}
return nil
}
func (r *ReconcileNode) createOrUpdateCephCron(cephCluster cephv1.CephCluster, cephVersion *cephver.CephVersion, useCronJobV1 bool) (controllerutil.OperationResult, error) {
func (r *ReconcileNode) createOrUpdateCephCron(cephCluster cephv1.CephCluster) (controllerutil.OperationResult, error) {
objectMeta := metav1.ObjectMeta{
Name: prunerName,
Namespace: cephCluster.GetNamespace(),
@@ -105,7 +89,7 @@ func (r *ReconcileNode) createOrUpdateCephCron(cephCluster cephv1.CephCluster, c
},
Spec: corev1.PodSpec{
Containers: []corev1.Container{
getCrashPruneContainer(cephCluster, *cephVersion),
getCrashPruneContainer(cephCluster),
},
RestartPolicy: corev1.RestartPolicyNever,
HostNetwork: cephCluster.Spec.Network.IsHost(),
@@ -118,33 +102,11 @@ func (r *ReconcileNode) createOrUpdateCephCron(cephCluster cephv1.CephCluster, c
// To avoid this, the cronjob is configured to only count the failures
// that occurred in the last hour.
deadline := int64(60)

// minimum k8s version required for v1 cronJob is 'v1.21.0'. Apply v1 if k8s version is at least 'v1.21.0', else apply v1beta1 cronJob.
if useCronJobV1 {
r.deletev1betaJob(objectMeta)

cronJob := &v1.CronJob{ObjectMeta: objectMeta}
err := controllerutil.SetControllerReference(&cephCluster, cronJob, r.scheme)
if err != nil {
return controllerutil.OperationResultNone, errors.Errorf("failed to set owner reference of deployment %q", cronJob.Name)
}
mutateFunc := func() error {
cronJob.ObjectMeta.Labels = cronJobLabels
cronJob.Spec.JobTemplate.Spec.Template = podTemplateSpec
cronJob.Spec.Schedule = pruneSchedule
cronJob.Spec.StartingDeadlineSeconds = &deadline

return nil
}

return controllerutil.CreateOrUpdate(r.opManagerContext, r.client, cronJob, mutateFunc)
}
cronJob := &v1beta1.CronJob{ObjectMeta: objectMeta}
cronJob := &v1.CronJob{ObjectMeta: objectMeta}
err := controllerutil.SetControllerReference(&cephCluster, cronJob, r.scheme)
if err != nil {
return controllerutil.OperationResultNone, errors.Errorf("failed to set owner reference of deployment %q", cronJob.Name)
}

mutateFunc := func() error {
cronJob.ObjectMeta.Labels = cronJobLabels
cronJob.Spec.JobTemplate.Spec.Template = podTemplateSpec
@@ -157,19 +119,7 @@ func (r *ReconcileNode) createOrUpdateCephCron(cephCluster cephv1.CephCluster, c
return controllerutil.CreateOrUpdate(r.opManagerContext, r.client, cronJob, mutateFunc)
}

func (r *ReconcileNode) deletev1betaJob(objectMeta metav1.ObjectMeta) {
// delete v1beta1 cronJob on an update to v1 job,only if v1 job is not created yet
if _, err := r.context.Clientset.BatchV1().CronJobs(objectMeta.Namespace).Get(r.opManagerContext, prunerName, metav1.GetOptions{}); err != nil {
if kerrors.IsNotFound(err) {
err = r.client.Delete(r.opManagerContext, &v1beta1.CronJob{ObjectMeta: objectMeta})
if err != nil && !kerrors.IsNotFound(err) {
logger.Debugf("could not delete CronJob v1Beta1 %q. %v", prunerName, err)
}
}
}
}

func getCrashPruneContainer(cephCluster cephv1.CephCluster, cephVersion cephver.CephVersion) corev1.Container {
func getCrashPruneContainer(cephCluster cephv1.CephCluster) corev1.Container {
envVars := append(controller.DaemonEnvVars(&cephCluster.Spec), generateCrashEnvVar())
dataPathMap := config.NewDatalessDaemonDataPathMap(cephCluster.GetNamespace(), cephCluster.Spec.DataDirHostPath)
volumeMounts := controller.DaemonVolumeMounts(dataPathMap, "", cephCluster.Spec.DataDirHostPath)
35 changes: 2 additions & 33 deletions pkg/operator/ceph/cluster/nodedaemon/pruner_test.go
@@ -24,12 +24,9 @@ import (
rookclient "github.com/rook/rook/pkg/client/clientset/versioned/fake"
"github.com/rook/rook/pkg/client/clientset/versioned/scheme"
"github.com/rook/rook/pkg/clusterd"
cephver "github.com/rook/rook/pkg/operator/ceph/version"
"github.com/rook/rook/pkg/operator/test"
"github.com/stretchr/testify/assert"
v1 "k8s.io/api/batch/v1"
"k8s.io/api/batch/v1beta1"
kerrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"sigs.k8s.io/controller-runtime/pkg/client/fake"
@@ -38,7 +35,6 @@ import (

func TestCreateOrUpdateCephCron(t *testing.T) {
cephCluster := cephv1.CephCluster{ObjectMeta: metav1.ObjectMeta{Namespace: "rook-ceph"}}
cephVersion := &cephver.CephVersion{Major: 17, Minor: 2, Extra: 0}
ctx := context.TODO()
context := &clusterd.Context{
Clientset: test.New(t, 1),
@@ -50,10 +46,6 @@ func TestCreateOrUpdateCephCron(t *testing.T) {
if err != nil {
assert.Fail(t, "failed to build scheme")
}
err = v1beta1.AddToScheme(s)
if err != nil {
assert.Fail(t, "failed to build scheme")
}

r := &ReconcileNode{
scheme: s,
@@ -68,34 +60,11 @@ func TestCreateOrUpdateCephCron(t *testing.T) {
},
}

cronV1Beta1 := &v1beta1.CronJob{
ObjectMeta: metav1.ObjectMeta{
Name: prunerName,
Namespace: "rook-ceph",
},
}

// check if v1beta1 cronJob is present and v1 cronJob is not
cntrlutil, err := r.createOrUpdateCephCron(cephCluster, cephVersion, false)
assert.NoError(t, err)
assert.Equal(t, cntrlutil, controllerutil.OperationResult("created"))

err = r.client.Get(ctx, types.NamespacedName{Namespace: "rook-ceph", Name: prunerName}, cronV1Beta1)
assert.NoError(t, err)

err = r.client.Get(ctx, types.NamespacedName{Namespace: "rook-ceph", Name: prunerName}, cronV1)
assert.Error(t, err)
assert.True(t, kerrors.IsNotFound(err))

// check if v1 cronJob is present and v1beta1 cronJob is not
cntrlutil, err = r.createOrUpdateCephCron(cephCluster, cephVersion, true)
// check if cronJob is created
cntrlutil, err := r.createOrUpdateCephCron(cephCluster)
assert.NoError(t, err)
assert.Equal(t, cntrlutil, controllerutil.OperationResult("created"))

err = r.client.Get(ctx, types.NamespacedName{Namespace: "rook-ceph", Name: prunerName}, cronV1)
assert.NoError(t, err)

err = r.client.Get(ctx, types.NamespacedName{Namespace: "rook-ceph", Name: prunerName}, cronV1Beta1)
assert.Error(t, err)
assert.True(t, kerrors.IsNotFound(err))
}