From e8dac5d2d877ec25130abc607a23f7bee47af288 Mon Sep 17 00:00:00 2001
From: Yuji Ito
Date: Wed, 10 Jul 2024 09:46:38 +0000
Subject: [PATCH 1/2] add design doc about restore processes

Signed-off-by: Yuji Ito
---
 docs/design.md | 123 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 96 insertions(+), 27 deletions(-)

diff --git a/docs/design.md b/docs/design.md
index e25bc665..cac263dd 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -8,7 +8,7 @@ We want to backup and restore RBD PVCs managed by a Rook/Ceph cluster, either by
 
 1. Backup arbitrary RBD PVCs.
 2. Restore RBD PVCs from backups.
-3. Backup arbitary RBD PVCs periodically.
+3. Backup arbitrary RBD PVCs periodically.
 4. Copy backup data to another cluster in another data center.
 
 Currently, the goal 1 and 3 are implemented. Other goals will be achieved later.
@@ -21,39 +21,71 @@ Currently, the goal 1 and 3 are implemented. Other goals will be achieved later.
 
 flowchart LR
     style Architecture fill:#FFFFFF
+
+    USER([User])
+
     subgraph Architecture
-        USER([User])
-        RBSC[mantle-controller]
-        RPB[MantleBackup]
-        PVC[PersistentVolumeClaim]
-        PV[PersistentVolume]
-        RI[RBD Image]
-        RS[RBD Snapshot]
-        MBC[MantleBackupConfig]
-        MBCCronJob[CronJob]
+        %% restore
+        MR -- point --> MB
+        MRR -- watch --> MR
+        MRR -- create/delete --> RC
+        MRR -- create/delete --> RES_PVC
+        MRR -- create/delete --> RES_PV
+        USER -- create/delete --> MR
+        RES_PVC -- consume --> RES_PV
+        MR -.-|related| RC
+        RES_PV -- point --> RC
+        RC -- point --> RS
+
+        %% backup config
+        MBCCronJob -- create/delete --> MB
+        MBCR -- watch --> MBC
+        MBC -- point --> SRC_PVC
+        MBCR -- create --> MBCCronJob
+        MBCCronJob -.-|related| MBC
+
+        %% backup
+        MB -.-|related| RS
+        USER -- create/delete --> MB
+        MBR -- watch --> MB
+        MB -- point --> SRC_PVC
+        SRC_PVC -- consume --> SRC_PV
+        USER -- create/delete --> MBC
+        MBR -- create/delete --> RS
+        SRC_PV -- point --> RI
+        RS -- point --> RI
 
-        subgraph Kubernetes Layer
-            USER -- create/delete --> RPB
-            RBSC -- watch --> RPB
-            RPB -- point --> PVC
-            PVC -- consume --> PV
-            USER -- create/delete --> MBC
-            RBSC -- watch --> MBC
-            RBSC -- create --> MBCCronJob
-            MBCCronJob -- create/delete --> RPB
-            MBCCronJob -.-|related| MBC
-            MBC -- point --> PVC
-        end
 
         subgraph Ceph Layer
-            RBSC -- create/delete --> RS
-            PV -- point --> RI
-            RS -- point --> RI
-
+            RI[RBD Image]
+            RS[RBD Snapshot]
+            RC[RBD cloned Image]
         end
-
+        subgraph Kubernetes Layer
+
+            SRC_PVC[source PersistentVolumeClaim]
+            SRC_PV[source PersistentVolume]
+
+            subgraph Mantle controller
+                MBCR[MantleBackupConfigReconciler]
+                MBR[MantleBackupReconciler]
+                MRR[MantleRestoreReconciler]
+            end
+
+            subgraph Backup related manifests
+                MBC[MantleBackupConfig]
+                MBCCronJob[CronJob]
+                MB[MantleBackup]
+            end
+
+            subgraph Restore related manifests
+                MR[MantleRestore]
+                RES_PVC[restored PersistentVolumeClaim]
+                RES_PV[restored PersistentVolume]
+            end
+        end
     end
 ```
 
@@ -88,6 +120,7 @@ apiVersion: mantle.cybozu.io/v1
 kind: MantleBackup
 metadata:
   name:
+  namespace:
 spec:
   # The name of the backup target PVC
   pvc:
@@ -111,3 +144,39 @@ spec:
   expire: 2w # when the MantleBackups generated by this MantleBackupConfig should expire.
   suspend: false # whether the periodic backup is active or not.
 ```
+
+### Restore flow
+
+Precondition: Process will not start until conditions are met.
+- The target MantleBackup must exist and be ready to use.
+
+1. Users crate a `MantleRestore` resource.
+2. The controller gets the target MantleBackup from the `MantleRestore` resource.
+3. The controller stores the pool name for the `status.pool` field and cluster ID for the `status.clusterID` field. This value is used to remove the restored PV/PVC when the MantleRestore resource is deleted.
+4. The controller gets backup target RBD snap image name from the MantleBackup.
+5. The controller creates a new RBD clone from the RBD snap image.
+6. The controller creates a new PV/PVC using the new RBD clone.
+
+### Cleanup restore flow
+
+1. Users delete the `MantleRestore` resource.
+2. The controller tries to delete the PV/PVC created by the `MantleRestore` resource and wait until the PV/PVC is deleted. If the PV/PVC is used by some pod.
+3. The controller removes the RBD clone created by the `MantleRestore` resource. The controller should not remove RBD clone volume specified by the PV.
+
+#### The manifest to get restore PV/PVC from a backup
+
+```yaml
+apiVersion: mantle.cybozu.io/v1
+kind: MantleRestore
+metadata:
+  name:
+  namespace:
+spec:
+  # The name of the restore target backup
+  backup:
+status:
+  conditions:
+    # The corresponding restore PV/PVC is ready to use if `status` is "True"
+    - type: "ReadyToUse"
+      status: "True"
+```

From c39b66063809b766767f45e603ef25f41b736181 Mon Sep 17 00:00:00 2001
From: Yuji Ito
Date: Thu, 11 Jul 2024 18:42:40 +0900
Subject: [PATCH 2/2] Apply suggestions from code review

Co-authored-by: Satoru Takeuchi
---
 docs/design.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/design.md b/docs/design.md
index cac263dd..88fc6c53 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -150,18 +150,18 @@ spec:
 Precondition: Process will not start until conditions are met.
 - The target MantleBackup must exist and be ready to use.
 
-1. Users crate a `MantleRestore` resource.
+1. Users create a `MantleRestore` resource.
 2. The controller gets the target MantleBackup from the `MantleRestore` resource.
 3. The controller stores the pool name for the `status.pool` field and cluster ID for the `status.clusterID` field. This value is used to remove the restored PV/PVC when the MantleRestore resource is deleted.
-4. The controller gets backup target RBD snap image name from the MantleBackup.
-5. The controller creates a new RBD clone from the RBD snap image.
-6. The controller creates a new PV/PVC using the new RBD clone.
+4. The controller gets backup target RBD snapshot name from the MantleBackup.
+5. The controller creates a new RBD clone image from the RBD snapshot.
+6. The controller creates a new PV/PVC using the above-mentioned RBD clone image.
 
 ### Cleanup restore flow
 
 1. Users delete the `MantleRestore` resource.
-2. The controller tries to delete the PV/PVC created by the `MantleRestore` resource and wait until the PV/PVC is deleted. If the PV/PVC is used by some pod.
-3. The controller removes the RBD clone created by the `MantleRestore` resource. The controller should not remove RBD clone volume specified by the PV.
+2. The controller tries to delete the PV/PVC created by the `MantleRestore` resource and wait until the Pod consuming the PV/PVC are stopped and deleted.
+3. The controller removes the RBD clone image created by the `MantleRestore` resource. However, the controller should not remove the RBD clone image if the previous step is not completed and a PV/PVC exists.
 
 #### The manifest to get restore PV/PVC from a backup
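
Reviewer sketch (not part of the patch): the restore and cleanup flows that this series adds to `docs/design.md` can be modelled in a few lines to check the ordering constraints. This is a minimal in-memory illustration, not the controller's implementation. `FakeCluster`, `Restore`, `reconcile_restore`, `cleanup_restore`, and the `restore-<name>` clone naming are all hypothetical; only the step order and the `status.pool` / `status.clusterID` fields come from the design text.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for Kubernetes and Ceph state."""
    # backup name -> {"ready": bool, "snapshot": str, "pool": str, "cluster_id": str}
    backups: dict = field(default_factory=dict)
    # clone image name -> RBD snapshot it was cloned from
    clones: dict = field(default_factory=dict)
    # restored PV/PVC name -> clone image name backing it
    volumes: dict = field(default_factory=dict)
    # restored PV/PVC name -> set of Pod names consuming it
    pods_using: dict = field(default_factory=dict)


@dataclass
class Restore:
    """Hypothetical model of a MantleRestore resource."""
    name: str
    backup: str
    pool: str = ""        # mirrors status.pool
    cluster_id: str = ""  # mirrors status.clusterID
    ready: bool = False   # mirrors the ReadyToUse condition


def clone_name(restore: Restore) -> str:
    # Assumed naming scheme, for illustration only.
    return f"restore-{restore.name}"


def reconcile_restore(cluster: FakeCluster, restore: Restore) -> bool:
    """Run the restore flow once; return True when the PV/PVC is ready."""
    backup = cluster.backups.get(restore.backup)
    # Precondition: the target backup must exist and be ready to use.
    if backup is None or not backup["ready"]:
        return False
    # Record pool and cluster ID so cleanup can find the clone later.
    restore.pool = backup["pool"]
    restore.cluster_id = backup["cluster_id"]
    # Clone the snapshot, then create the PV/PVC on top of the clone.
    # setdefault keeps repeated reconciles idempotent.
    cluster.clones.setdefault(clone_name(restore), backup["snapshot"])
    cluster.volumes.setdefault(restore.name, clone_name(restore))
    restore.ready = True
    return True


def cleanup_restore(cluster: FakeCluster, restore: Restore) -> bool:
    """Run the cleanup flow once; return True when everything is removed."""
    # Wait while any Pod still consumes the restored PV/PVC.
    if cluster.pods_using.get(restore.name):
        return False
    cluster.volumes.pop(restore.name, None)
    # Remove the clone only after the PV/PVC is gone.
    cluster.clones.pop(clone_name(restore), None)
    return True
```

Calling `reconcile_restore` repeatedly until it returns `True`, and likewise `cleanup_restore`, mimics a reconcile loop that requeues while the precondition or the Pod check is unmet. The point of the sketch is that the clone image is never removed while a PV/PVC still refers to it.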