diff --git a/docs/rfcs/0014-multi-cluster-deployment-for-k8s.md b/docs/rfcs/0014-multi-cluster-deployment-for-k8s.md
new file mode 100644
index 0000000000..8eacb8976b
--- /dev/null
+++ b/docs/rfcs/0014-multi-cluster-deployment-for-k8s.md
@@ -0,0 +1,337 @@
+- Start Date: 2024-07-25
+- Target Version: 0.49.0
+
+# Summary
+
+This RFC proposes a new feature that lets a k8s app deploy resources to multiple clusters.
+
+# Motivation
+
+# Use cases
+- case 1. Applying the same manifests to multiple clusters for redundancy
+- case 2. Applying manifests, with some patches, to multiple clusters for redundancy
+- case 3. Blue/Green Deployment across clusters
+
+# Detailed design
+
+## Overview
+
+We propose a feature that applies manifests to multiple clusters from a single application.
+
+![image](assets/0014-pipeline-image.png)
+
+## How it works
+
+### Register Application with multiple platform providers
+
+When registering an application, we can choose multiple platform providers.
+First, we add the first platform provider.
+To use the multi-cluster deployment feature, we can add more platform providers. This is optional.
+Only the platform providers specified here can be configured for multi-target.
+
+![image](assets/0014-choose-multiple-providers.png)
+
+We can also check the list of platform providers on the piped list page to verify them.
+
+![image](assets/0014-piped-list.png)
+
+### QuickSync
+
+Piped asynchronously applies the resources to each environment based on the platform provider and resourceDir specified by the user.
+
+For example, consider deploying a microservice called `microservice-a` to two clusters, `cluster-hoge` and `cluster-fuga`.
+First, prepare one application with one `app.pipecd.yaml` and some manifests like this.
+Set `multiTarget` in `spec.quickSync` of `app.pipecd.yaml`, and for each target specify the directory containing the manifests to deploy and the platform provider to deploy to.
+With this configuration, a QuickSync deploys to `cluster-hoge` and `cluster-fuga` at the same time.
+
+```
+microservice-a
+└── prd
+    ├── app.pipecd.yaml
+    ├── base
+    │   ├── deployment.yaml
+    │   ├── kustomization.yaml
+    │   └── service.yaml
+    ├── cluster-hoge
+    │   └── kustomization.yaml
+    ├── cluster-fuga
+    │   └── kustomization.yaml
+    └── kustomization.yaml
+```
+
+```app.pipecd.yaml
+apiVersion: pipecd.dev/v1beta1
+kind: KubernetesApp
+spec:
+  name: multi-cluster-app
+  labels:
+    env: prd
+  quickSync:
+    multiTarget:
+      - provider:
+          name: cluster-hoge # platform provider name
+        resourceDir: ./cluster-hoge # the resource dir
+      - provider:
+          name: cluster-fuga
+        resourceDir: ./cluster-fuga
+```
+
+**Rollback**
+
+Similarly, a rollback rolls back multiple environments at the same time based on the information specified in `multiTarget`.
+If at least one of the rollback processes succeeds, we consider the rollback successful.
+This ensures that the rollback is still executed for the other environments even if one of the deployment environments is inaccessible.
+
+```
+apiVersion: pipecd.dev/v1beta1
+kind: KubernetesApp
+spec:
+  name: multi-cluster-app
+  labels:
+    env: prd
+  quickSync:
+    multiTarget:
+      - provider:
+          name: cluster-hoge # platform provider name
+        resourceDir: ./cluster-hoge # the resource dir
+      - provider:
+          name: cluster-fuga
+        resourceDir: ./cluster-fuga
+```
+
+
+### PipelineSync
+
+
+Piped asynchronously applies the resources to each environment based on the platform provider and resourceDir specified by the user for each stage.
+
+For example, consider deploying a microservice called `microservice-a` to two clusters, `cluster-hoge` and `cluster-fuga`.
+First, prepare one application with one `app.pipecd.yaml` and some manifests like this.
+Set `multiTarget` in `spec.quickSync` of `app.pipecd.yaml`, and for each target specify the directory containing the manifests to deploy and the platform provider to deploy to.
+Also, set `multiTarget` in each stage config.
+This allows the application to be applied to multiple environments at the same time when a stage is executed.
+
+```
+microservice-a
+└── prd
+    ├── app.pipecd.yaml
+    ├── base
+    │   ├── deployment.yaml
+    │   ├── kustomization.yaml
+    │   └── service.yaml
+    ├── cluster-hoge
+    │   └── kustomization.yaml
+    ├── cluster-fuga
+    │   └── kustomization.yaml
+    └── kustomization.yaml
+```
+
+```
+apiVersion: pipecd.dev/v1beta1
+kind: KubernetesApp
+spec:
+  name: multi-cluster-app
+  labels:
+    env: example
+    team: product
+  quickSync:
+    prune: true
+    multiTarget:
+      - provider:
+          name: cluster-hoge
+        resourceDir: ./cluster-hoge
+      - provider:
+          name: cluster-fuga
+        resourceDir: ./cluster-fuga
+  pipeline:
+    stages:
+      - name: K8S_CANARY_ROLLOUT
+        with:
+          replicas: 10%
+          multiTarget:
+            - provider:
+                name: cluster-hoge
+              resourceDir: ./cluster-hoge
+            - provider:
+                name: cluster-fuga
+              resourceDir: ./cluster-fuga
+...
+```
+
+**Rollback**
+
+A rollback rolls back multiple environments at the same time based on the information specified in `spec.quickSync.multiTarget`.
+If at least one of the rollback processes succeeds, we consider the rollback successful.
+This ensures that the rollback is still executed for the other environments even if one of the deployment environments is inaccessible.
+
+
+#### Stages to be supported
+
+We introduce the feature into the stages that change resources on the cluster.
+
+- K8S_PRIMARY_ROLLOUT
+- K8S_CANARY_ROLLOUT
+- K8S_CANARY_CLEAN
+- K8S_BASELINE_ROLLOUT
+- K8S_BASELINE_CLEAN
+- K8S_TRAFFIC_ROUTING
+
+### How to check the stage progress of each platform provider in the deployment
+
+Users can check the stage logs for each platform provider.
+In the future, we will consider visualizing the deployment status for each platform provider.
+
+![image](assets/0014-stage-log.png)
+
+
+### Livestate View & Drift Detection
+
+
+Currently, a livestate store exists for each platform provider.
+Both Livestate View and drift detection use the values obtained from the livestate store based on the appID.
+Also, a 1:1 relationship between application and platform provider is assumed.
+
+So we propose aggregating the state from every platform provider by appID.
+This achieves a 1:N relationship between application and platform providers.
+
+**Livestate View**
+
+Show the livestate of all platform providers the app is deployed to.
+
+**Drift Detection**
+
+Perform drift detection based on the livestate of all platform providers the app is deployed to.
+
+### [option] Improve kubeconfig setup on piped
+
+Currently, we need to prepare the kubeconfig file manually.
+It would be nice to prepare it automatically.
+
+This might be realized by using cloud vendor features, for example Workload Identity on GKE or IRSA on EKS.
+That is, piped would use them to obtain the kubeconfig when it starts.
+
+# Alternatives
+
+## Idea: Execute Stages in parallel within a pipeline
+
+![image](assets/0014-pipeline-paralell-stage.png)
+
+### UX
+
+- When registering an application
+  - Prepare manifests for each cluster and one app.pipecd.yaml, then register the application on the UI.
+  - Dir structure
+
+```
+  - /prd
+    - app.pipecd.yaml
+    - /base
+    - /cluster-hoge
+    - /cluster-fuga
+```
+
+- When deploying
+  - Sync all clusters corresponding to prd.
+
+- When rolling back
+  - Roll back all clusters to the previous state.
+
+### Pros & Cons
+
+**Pros**
+
+- Only one app setting is required.
+- You can operate WaitApproval for all clusters in one place.
+- Flexible stage pipeline.
+
+**Cons**
+
+- Realizing “parallel execution of stages” complicates the scheduler mechanism.
+
+## Idea: Deploy to multiple Platform Providers internally
+
+![image](assets/0014-pipeline-already-implemented.png)
+
+This is already implemented as a PoC:
+- https://github.com/pipe-cd/pipecd/pull/3790
+- https://github.com/pipe-cd/pipecd/pull/3854
+
+### UX
+
+- When registering an application
+  - Prepare manifests for each cluster and one app.pipecd.yaml, then register the application on the UI.
+  - Dir structure
+
+```
+  - /prd
+    - app.pipecd.yaml
+    - /base
+    - /cluster-hoge
+    - /cluster-fuga
+```
+
+- When deploying
+  - Sync all clusters corresponding to prd.
+
+- When rolling back
+  - Roll back all clusters to the previous state.
+
+### Pros & Cons
+
+**Pros**
+
+- Only one app setting is required.
+- You can operate WaitApproval for all clusters in one place.
+
+**Cons**
+
+- Cannot support cases where you want to change the number of replicas for only some clusters.
+
+## Idea: Create a stage to sync apps
+
+![image](assets/0014-pipeline-sync-app-stage-01.png)
+
+![image](assets/0014-pipeline-sync-app-stage-02.png)
+
+### UX
+
+- When registering an application
+  - Prepare one app.pipecd.yaml as a root application with a sync app stage.
+  - Prepare manifests and an app.pipecd.yaml for each cluster, and register them on the UI.
+  - Dir structure
+
+```
+  - /prd
+    - app.pipecd.yaml
+    - /base
+    - /cluster-hoge
+      - app.pipecd.yaml
+    - /cluster-fuga
+      - app.pipecd.yaml
+```
+
+- When deploying
+  - Sync all clusters corresponding to prd when triggering the root app.
+  - If you want to sync only some clusters, sync each of them as an individual application.
+
+- When rolling back
+  - Roll back all clusters to the previous state.
+  - You can select one of the following behaviors in the stage config.
+    - Rollback if any app fails
+    - Rollback if all apps fail
+  - If the deployments of the applications triggered by the sync app stage have succeeded, start a rollback to the previous commit.
+  - If the deployments of the applications triggered by the sync app stage are in progress, cancel them.
+
+### Pros & Cons
+
+**Pros**
+
+- The application can be synced as a whole or partially.
+- Deployment pipelines can be configured for each environment.
+
+**Cons**
+
+- It takes time to set up the app configs.
+- A mechanism to trigger application rollback is needed.
+- You need to approve WaitApproval for each app.
+- Deployment Chain already exists as a similar function.
diff --git a/docs/rfcs/assets/0014-choose-multiple-providers.png b/docs/rfcs/assets/0014-choose-multiple-providers.png
new file mode 100644
index 0000000000..d5bfe7fa26
Binary files /dev/null and b/docs/rfcs/assets/0014-choose-multiple-providers.png differ
diff --git a/docs/rfcs/assets/0014-piped-list.png b/docs/rfcs/assets/0014-piped-list.png
new file mode 100644
index 0000000000..47303c5418
Binary files /dev/null and b/docs/rfcs/assets/0014-piped-list.png differ
diff --git a/docs/rfcs/assets/0014-pipeline-already-implemented.png b/docs/rfcs/assets/0014-pipeline-already-implemented.png
new file mode 100644
index 0000000000..7029444ca4
Binary files /dev/null and b/docs/rfcs/assets/0014-pipeline-already-implemented.png differ
diff --git a/docs/rfcs/assets/0014-pipeline-image.png b/docs/rfcs/assets/0014-pipeline-image.png
new file mode 100644
index 0000000000..a20f653938
Binary files /dev/null and b/docs/rfcs/assets/0014-pipeline-image.png differ
diff --git a/docs/rfcs/assets/0014-pipeline-paralell-stage.png b/docs/rfcs/assets/0014-pipeline-paralell-stage.png
new file mode 100644
index 0000000000..7463c14fd8
Binary files /dev/null and b/docs/rfcs/assets/0014-pipeline-paralell-stage.png differ
diff --git a/docs/rfcs/assets/0014-pipeline-sync-app-stage-01.png b/docs/rfcs/assets/0014-pipeline-sync-app-stage-01.png
new file mode 100644
index 0000000000..01b3d26551
Binary files /dev/null and b/docs/rfcs/assets/0014-pipeline-sync-app-stage-01.png differ
diff --git a/docs/rfcs/assets/0014-pipeline-sync-app-stage-02.png b/docs/rfcs/assets/0014-pipeline-sync-app-stage-02.png
new file mode 100644
index 0000000000..47e3765af7
Binary files /dev/null and b/docs/rfcs/assets/0014-pipeline-sync-app-stage-02.png differ
diff --git a/docs/rfcs/assets/0014-stage-log.png b/docs/rfcs/assets/0014-stage-log.png
new file mode 100644
index 0000000000..a59c6c8580
Binary files /dev/null and b/docs/rfcs/assets/0014-stage-log.png differ