Add rfc for k8s multi cluster deployment (#5069)

* Add rfc for k8s multi cluster deployment Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com> * Add the description about piped Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com> * Add more expected behavior for rollback Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com> * Add more expected behavior for stage log Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com> * Add more behavior for registering app Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com> * Fix docs typo Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com> --------- Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>
pipe-cd · Aug 9, 2024 · 58bf4ff · 58bf4ff
1 parent d6d82db
commit 58bf4ff
Show file tree

Hide file tree

Showing 9 changed files with 337 additions and 0 deletions.
diff --git a/docs/rfcs/0014-multi-cluster-deployment-for-k8s.md b/docs/rfcs/0014-multi-cluster-deployment-for-k8s.md
@@ -0,0 +1,337 @@
+- Start Date: 2024-07-25
+- Target Version: 0.49.0
+
+# Summary
+
+This RFC proposes a new feature for k8s app to deploy resources into multi-cluster.
+
+# Motivation
+
+# Usecase
+- case 1. When applying the same manifest to multiple clusters for redundant configuration
+- case 2. When applying manifest with some patches applied to multiple clusters for redundant configuration
+- case 3. Blue/Green Deployment across clusters
+
+# Detailed design
+
+## Overview
+
+We propose the feature to apply the manifests to multiple-clusters in one application.
+
+![image](assets/0014-pipeline-image.png)
+
+## How it works
+
+### Register Application with multiple platform providers
+
+On the application register part, we can choose multiple platform providers.
+At first, we add the first platform provider.
+If we want to use the feature for deploying multi-cluster, we can set more platform providers. This is optional.
+Only the platform providers specified here can be configured for multi-target.
+
+![image](assets/0014-choose-multiple-providers.png)
+
+Also, we can check the list of platform providers on the piped list page to verify the platform providers.
+
+![image](assets/0014-piped-list.png)
+
+### QickSync
+
+Piped asynchronously applies the resources to each environment based on the platform provider and resourceDir specified by the user.
+
+For example, consider deploying a microservice called `microservice-a` to the clusters called `cluster-hoge`, `cluster-fuga`.
+At first, we will prepare one application with one `app.pipecd.yaml` and some manifests like this.
+Set the item `multiTarget` in spec.quickSync of app.pipecd.yaml, and set the dir containing the manifests you want to deploy and the platform provider to which you want to deploy.
+This will deploy to `cluster-hoge` and `cluster-fuga` at the same time when quickSync is executed.
+
+```
+microservice-a
+└── prd
+    ├── app.pipecd.yaml
+    ├── base
+    │   ├── deployment.yaml
+    │   ├── kustomization.yaml
+    │   └── service.yaml
+    ├── cluster-hoge
+    │   └── kustomization.yaml
+    ├── cluster-fuga
+    │   └── kustomization.yaml
+    └── kustomization.yaml
+```
+
+```app.pipecd.yaml
+apiVersion: pipecd.dev/v1beta1
+kind: KubernetesApp
+spec:
+  name: multi-cluster-app
+  labels:
+    env: prd
+  quickSync:
+    multiTarget:
+      - provider:
+          name: cluster-hoge # platform provider name
+        resourceDir: ./cluster-hoge # the resource dir
+      - provider:
+          name: cluster-fuga
+        resourceDir: ./cluster-fuga
+```
+
+**Rollback**
+
+Similarly, when rolling back, multiple environments are rolled back at the same time based on the information specified in `multiTarget`.
+If at least one of the rollback processes succeeds, we consider the rollback successful.
+This ensures that the rollback is executed for other environments even if one of the deployment environments is inaccessible.
+
+```
+apiVersion: pipecd.dev/v1beta1
+kind: KubernetesApp
+spec:
+  name: multi-cluster-app
+  labels:
+    env: prd
+  quickSync:
+    multiTarget:
+      - provider:
+          name: cluster-hoge # platform provider name
+        resourceDir: ./cluster-hoge # the resource dir
+      - provider:
+          name: cluster-fuga
+        resourceDir: ./cluster-fuga
+```
+
+
+### PipelineSync
+
+
+Piped asynchronously applies to each environment based on the platform provider and resourceDir specified by the user for each stage.
+
+For example, consider deploying a microservice called `microservice-a` to the clusters called `cluster-hoge`, `cluster-fuga`.
+At first, we will prepare one application with one `app.pipecd.yaml` and some manifests like this.
+Set the item `multiTarget` in spec.quickSync of app.pipecd.yaml, and set the dir containing the manifests you want to deploy and the platform provider to which you want to deploy.
+Also, set the item `multiTarget` in each stage config.
+This allows applications to be applied to multiple environments at the same time when one stage is executed.
+
+```
+microservice-a
+└── prd
+    ├── app.pipecd.yaml
+    ├── base
+    │   ├── deployment.yaml
+    │   ├── kustomization.yaml
+    │   └── service.yaml
+    ├── cluster-hoge
+    │   └── kustomization.yaml
+    ├── cluster-fuga
+    │   └── kustomization.yaml
+    └── kustomization.yaml
+```
+
+```
+apiVersion: pipecd.dev/v1beta1
+kind: KubernetesApp
+spec:
+  name: multi-cluster-app
+  labels:
+    env: example
+    team: product
+  quickSync:
+    prune: true
+    multiTarget:
+      - provider:
+          name: cluster-hoge
+        resourceDir: ./cluster-hoge
+      - provider:
+          name: cluster-fuga
+        resourceDir: ./cluster-fuga
+  pipeline:
+    stages:
+      - name: K8S_CANARY_ROLLOUT
+        with:
+          replicas: 10%
+          multiTarget:
+            - provider:
+                name: cluster-hoge
+              resourceDir: ./cluster-hoge
+            - provider:
+                name: cluster-fuga
+              resourceDir: ./cluster-fuga
+...
+```
+
+**Rollback**
+
+When rolling back, multiple environments are rolled back at the same time based on the information specified in `spec.quickSync.multiTarget`.
+If at least one of the rollback processes succeeds, we consider the rollback successful.
+This ensures that the rollback is executed for other environments even if one of the deployment environments is inaccessible.
+
+
+#### Stages to be supported
+
+We introduce the feature into the stages where changes are made to resources on the cluster.
+
+- K8S_PRIMARY_ROLLOUT
+- K8S_CANARY_ROLLOUT
+- K8S_CANARY_CLEAN
+- K8S_BASELINE_ROLLOUT
+- K8S_BASELINE_CLEAN
+- K8S_TRAFFIC_ROUTING
+
+### How to check the stage progress of each platform provider in the deployment
+
+Users can check stage logs for each platform provider.
+In the future, we will consider visualizing the deployment environment status for each platform provider.
+
+![image](assets/0014-stage-log.png)
+
+
+### Livestate View & Drift Detection
+
+
+Currently, a livestate store exists for each platform provider.
+Both Livestate View and drift detection use the values obtained from the livestate store based on the appID.
+Also, application : platform provider = 1:1 relationship is assumed.
+
+So we propose the improvement to obtain the all state from each platform provider using appID, like aggregation.
+This achieves a relationship of application : platform provider = 1 : N.
+
+**Livestate View**
+
+Show livestate of all platform providers deployed by app.
+
+**Drift Detection**
+
+Performs Drift Detection based on the livestate of all platform providers deployed by the app.
+
+### [option] Improve kubeconfig setup on piped
+
+Currently, we need to prepare the kubeconfig file manually.
+But it would be nice to prepare it automatically.
+
+It might realize it by using cloud vender feature, for example using Workload Identity on GKE, or IRSA on EKS.
+It means piped get kubeconfig when it starts by using them.
+
+# Alternatives
+
+## Idea: Execute Stages in parallel within a pipeline
+
+![image](assets/0014-pipeline-paralell-stage.png)
+
+### UX
+
+- When registering an application
+    - Prepare manifests for each clusters and one app.pipecd.yaml & register on UI.
+    - Dir structure
+
+```
+    - /prd
+        - app.pipecd.yaml
+        - /base
+        - /cluster-hoge
+        - /cluster-fuga
+```
+
+- When deploying
+    - Sync all clusters corresponding to prd.
+
+- When rolling back
+    - Roll back in the all previous state.
+
+### Pros & Cons
+
+**Pros**
+
+- Only one app setting is required.
+- You can operate WaitApproval for all clusters in one place.
+- Flexisible stage pipeline.
+
+**Cons**
+
+- By realizing “parallel execution of stages”, the scheduler mechanism becomes complicated.
+
+# Idea: Deploy to multiple Platform Providers internally
+
+![image](assets/0014-pipeline-already-implemented.png)
+
+This is already implemented as PoC↓
+- https://github.com/pipe-cd/pipecd/pull/3790
+- https://github.com/pipe-cd/pipecd/pull/3854
+
+## UX
+
+- When registering an application
+    - Prepare manifests for each clusters and one app.pipecd.yaml & register on UI.
+    - Dir structure
+
+```
+    - /prd
+        - app.pipecd.yaml
+        - /base
+        - /cluster-hoge
+        - /cluster-fuga
+```
+
+- When deploying
+    - Sync all clusters corresponding to prd.
+
+- When rolling back
+    - Roll back in the all previous state.
+
+### Pros & Cons
+
+**Pros**
+
+- Only one app setting is required.
+- You can operate WaitApproval for all clusters in one place.
+
+**Cons**
+
+- Cannot support cases where you want to change the number of replicas for only some clusters.
+
+# Idea: Create a stage to sync apps
+
+![image](assets/0014-pipeline-sync-app-stage-01.png)
+
+![image](assets/0014-pipeline-sync-app-stage-02.png)
+
+### UX
+
+- When registering an application
+    - Prepare one app.pipecd.yaml as a root application with sync app stage.
+    - Prepare manifests and app.pipecd.yaml for each clusters and & register on UI.
+    - Dir structure
+
+```
+        - /prd
+            - app.pipecd.yaml
+            - /base
+            - /cluster-hoge
+                - app.pipecd.yaml
+            - /cluster-fuga
+                - app.pipecd.yaml
+```
+
+- When deploying
+    - Sync all clusters corresponding to prd when triggering the root app.
+    - If you want to sync clusters partially, sync them as the each application.
+
+- When rolling back
+    - Roll back in the all previous state.
+    - You can select the following behavior by setting the stage.
+        - Rollback if any app fails
+        - Rollback if all apps fail
+    - If the deployments of the applications triggered by the sync app stage are successful, start rollback to the previous commit.
+    - If the deployments of the applications triggered by the sync app stage are in progress, cancel it.
+
+### Pros & Cons
+
+**Pros**
+
+- It is possible to sync the whole or partially.
+- Deployment pipelines can be configured for each environment.
+
+**Cons**
+
+- It takes time to set the App config.
+- Need a mechanism to trigger application rollback.
+- You need to OK Wait Approval for each App.
+- Deployment Chain already exists as a similar function.
diff --git a/docs/rfcs/assets/0014-choose-multiple-providers.png b/docs/rfcs/assets/0014-choose-multiple-providers.png
diff --git a/docs/rfcs/assets/0014-piped-list.png b/docs/rfcs/assets/0014-piped-list.png
diff --git a/docs/rfcs/assets/0014-pipeline-already-implemented.png b/docs/rfcs/assets/0014-pipeline-already-implemented.png
diff --git a/docs/rfcs/assets/0014-pipeline-image.png b/docs/rfcs/assets/0014-pipeline-image.png
diff --git a/docs/rfcs/assets/0014-pipeline-paralell-stage.png b/docs/rfcs/assets/0014-pipeline-paralell-stage.png
diff --git a/docs/rfcs/assets/0014-pipeline-sync-app-stage-01.png b/docs/rfcs/assets/0014-pipeline-sync-app-stage-01.png
diff --git a/docs/rfcs/assets/0014-pipeline-sync-app-stage-02.png b/docs/rfcs/assets/0014-pipeline-sync-app-stage-02.png
diff --git a/docs/rfcs/assets/0014-stage-log.png b/docs/rfcs/assets/0014-stage-log.png