add design doc about restore processes #28

Merged 2 commits on Jul 11, 2024
123 changes: 96 additions & 27 deletions docs/design.md
@@ -8,7 +8,7 @@ We want to backup and restore RBD PVCs managed by a Rook/Ceph cluster, either by

1. Backup arbitrary RBD PVCs.
2. Restore RBD PVCs from backups.
3. Backup arbitary RBD PVCs periodically.
3. Backup arbitrary RBD PVCs periodically.
4. Copy backup data to another cluster in another data center.

Currently, goals 1 and 3 are implemented. The other goals will be achieved later.
@@ -21,39 +21,71 @@ Currently, the goal 1 and 3 are implemented. Other goals will be achieved later.
flowchart LR

style Architecture fill:#FFFFFF

USER([User])

subgraph Architecture

USER([User])
RBSC[mantle-controller]
RPB[MantleBackup]
PVC[PersistentVolumeClaim]
PV[PersistentVolume]
RI[RBD Image]
RS[RBD Snapshot]
MBC[MantleBackupConfig]
MBCCronJob[CronJob]
%% restore
MR -- point --> MB
MRR -- watch --> MR
MRR -- create/delete --> RC
MRR -- create/delete --> RES_PVC
MRR -- create/delete --> RES_PV
USER -- create/delete --> MR
RES_PVC -- consume --> RES_PV
MR -.-|related| RC
RES_PV -- point --> RC
RC -- point --> RS

%% backup config
MBCCronJob -- create/delete --> MB
MBCR -- watch --> MBC
MBC -- point --> SRC_PVC
MBCR -- create --> MBCCronJob
MBCCronJob -.-|related| MBC

%% backup
MB -.-|related| RS
USER -- create/delete --> MB
MBR -- watch --> MB
MB -- point --> SRC_PVC
SRC_PVC -- consume --> SRC_PV
USER -- create/delete --> MBC
MBR -- create/delete --> RS
SRC_PV -- point --> RI
RS -- point --> RI

subgraph Kubernetes Layer
USER -- create/delete --> RPB
RBSC -- watch --> RPB
RPB -- point --> PVC
PVC -- consume --> PV
USER -- create/delete --> MBC
RBSC -- watch --> MBC
RBSC -- create --> MBCCronJob
MBCCronJob -- create/delete --> RPB
MBCCronJob -.-|related| MBC
MBC -- point --> PVC
end

subgraph Ceph Layer
RBSC -- create/delete --> RS
PV -- point --> RI
RS -- point --> RI

RI[RBD Image]
RS[RBD Snapshot]
RC[RBD cloned Image]
end


subgraph Kubernetes Layer

SRC_PVC[source PersistentVolumeClaim]
SRC_PV[source PersistentVolume]

subgraph Mantle controller
MBCR[MantleBackupConfigReconciler]
MBR[MantleBackupReconciler]
MRR[MantleRestoreReconciler]
end

subgraph Backup related manifests
MBC[MantleBackupConfig]
MBCCronJob[CronJob]
MB[MantleBackup]
end

subgraph Restore related manifests
MR[MantleRestore]
RES_PVC[restored PersistentVolumeClaim]
RES_PV[restored PersistentVolume]
end
end
end
```

@@ -88,6 +120,7 @@ apiVersion: mantle.cybozu.io/v1
kind: MantleBackup
metadata:
name: <MantleBackup resource name>
namespace: <should be the same as the target PVC>
spec:
# The name of the backup target PVC
pvc: <target PVC name>
@@ -111,3 +144,39 @@ spec:
expire: 2w # when the MantleBackups generated by this MantleBackupConfig should expire.
suspend: false # whether the periodic backup is active or not.
```

### Restore flow

Precondition: The restore process does not start until the following condition is met.
- The target MantleBackup must exist and be ready to use.

1. Users create a `MantleRestore` resource.
2. The controller gets the target MantleBackup from the `MantleRestore` resource.
3. The controller stores the pool name in the `status.pool` field and the cluster ID in the `status.clusterID` field. These values are used to remove the restored PV/PVC when the MantleRestore resource is deleted.
4. The controller gets the name of the target RBD snapshot from the MantleBackup.
5. The controller creates a new RBD clone image from the RBD snapshot.
6. The controller creates a new PV/PVC using the above-mentioned RBD clone image.
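
The steps above can be sketched as a single reconcile pass. This is a minimal illustration in Python against an in-memory stand-in, not the controller's actual code: `FakeCluster`, `reconcile_restore`, the `snapshot` field on the backup, and the `mantle-<namespace>-<name>` clone naming are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MantleBackup:
    name: str
    namespace: str
    snapshot: str        # RBD snapshot name (hypothetical field name)
    pool: str
    cluster_id: str
    ready_to_use: bool = False


@dataclass
class MantleRestore:
    name: str
    namespace: str
    backup: str          # corresponds to spec.backup
    status: dict = field(default_factory=dict)


class FakeCluster:
    """In-memory stand-in for the Kubernetes API and Ceph (illustration only)."""

    def __init__(self):
        self.backups = {}   # (namespace, name) -> MantleBackup
        self.clones = {}    # clone image name -> source snapshot name
        self.volumes = {}   # restored PV/PVC name -> clone image name


def reconcile_restore(cluster, restore):
    """Run one pass; return True when restored, False when it must be retried."""
    # Precondition and step 2: the backup lives in the same namespace
    # as the MantleRestore and must be ready to use.
    backup = cluster.backups.get((restore.namespace, restore.backup))
    if backup is None or not backup.ready_to_use:
        return False

    # Step 3: remember where the clone lives so cleanup can find it later.
    restore.status["pool"] = backup.pool
    restore.status["clusterID"] = backup.cluster_id

    # Step 4: read the snapshot name from the backup.
    snapshot = backup.snapshot

    # Step 5: create the clone image (hypothetical naming scheme).
    # setdefault keeps the step idempotent across repeated passes.
    clone = f"mantle-{restore.namespace}-{restore.name}"
    cluster.clones.setdefault(clone, snapshot)

    # Step 6: create the PV/PVC that points at the clone image.
    cluster.volumes.setdefault(restore.name, clone)
    restore.status["conditions"] = [{"type": "ReadyToUse", "status": "True"}]
    return True
```

Each step is written so that repeating it is harmless, which is the usual shape for a reconciler that may be invoked many times for the same resource.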

### Cleanup restore flow

1. Users delete the `MantleRestore` resource.
2. The controller tries to delete the PV/PVC created by the `MantleRestore` resource and waits until the Pods consuming the PV/PVC are stopped and deleted.
3. The controller removes the RBD clone image created by the `MantleRestore` resource. However, the controller must not remove the RBD clone image while the previous step is incomplete and the PV/PVC still exists.
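
The ordering constraint in the cleanup steps can be sketched as follows. This is a hedged illustration in Python, not the controller's implementation: `RestoreState` and `cleanup_restore` are hypothetical names, and the state is kept in memory instead of being read from Kubernetes and Ceph.

```python
from dataclasses import dataclass, field


@dataclass
class RestoreState:
    """Hypothetical in-memory view of what one MantleRestore owns."""
    consuming_pods: set = field(default_factory=set)
    pv_pvc_exists: bool = True
    pv_pvc_deletion_requested: bool = False
    clone_exists: bool = True


def cleanup_restore(state):
    """Run one cleanup pass; return True once everything has been removed."""
    # Step 2: request deletion of the PV/PVC, then wait for consuming Pods.
    if state.pv_pvc_exists:
        state.pv_pvc_deletion_requested = True
        if state.consuming_pods:
            # Pods still use the volume: leave the clone alone and retry later.
            return False
        state.pv_pvc_exists = False

    # Step 3: reached only after the PV/PVC is gone, so the clone image
    # is never removed while a volume still points at it.
    state.clone_exists = False
    return True
```

Returning early while Pods are still running is what keeps the RBD clone image in place until nothing references it.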

#### The manifest to get a restored PV/PVC from a backup

```yaml
apiVersion: mantle.cybozu.io/v1
kind: MantleRestore
metadata:
name: <MantleRestore resource name>
namespace: <should be the same as the target MantleBackup>
spec:
# The name of the restore target backup
backup: <MantleBackup resource name>
status:
conditions:
# The corresponding restore PV/PVC is ready to use if `status` is "True"
- type: "ReadyToUse"
status: "True"
```
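
A client that waits for the restored PV/PVC could check the `ReadyToUse` condition along these lines. This is a small sketch assuming the resource has already been loaded into a Python dict (for example from the YAML above); `is_ready_to_use` is a hypothetical helper, not part of Mantle.

```python
def is_ready_to_use(resource):
    """Return True if the resource reports a ReadyToUse condition with status "True"."""
    # status may be missing or null before the controller's first update.
    status = resource.get("status") or {}
    conditions = status.get("conditions") or []
    # Condition status is the string "True", not a boolean.
    return any(
        c.get("type") == "ReadyToUse" and c.get("status") == "True"
        for c in conditions
    )
```

Note that the comparison is against the string `"True"`, matching how the condition is written in the manifest.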