From 17d08f581f672d41239ce9a81da625ebdb3aa207 Mon Sep 17 00:00:00 2001 From: Timofei Larkin Date: Sun, 14 Apr 2024 11:38:28 +0300 Subject: [PATCH 01/11] Start documenting the controller's logic --- docs/DESIGN.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 docs/DESIGN.md diff --git a/docs/DESIGN.md b/docs/DESIGN.md new file mode 100644 index 00000000..ffc30563 --- /dev/null +++ b/docs/DESIGN.md @@ -0,0 +1,14 @@ +# Design + +This document describes the interaction between `EtcdCluster` custom resources and other Kubernetes primitives and gives an overview of the underlying implementation. + +## Creating a cluster + +When a user adds an `EtcdCluster` resource to the Kubernetes cluster, the operator responds by creating +* A configmap holding configuration values for bootstrapping a new cluster (`ETCD_INITIAL_CLUSTER_*` environment variables). +* A headless service for intra-cluster communication. +* A statefulset with pods for the individual members of the etcd cluster. +* A service for clients' access to the etcd cluster. +* A pod disruption budget to prevent the etcd cluster from losing quorum. + +If the above is successful, the etcd cluster status is set to `Initialized`. From be237d5ce52e71d55650552c44c10c74dc83f42a Mon Sep 17 00:00:00 2001 From: Timofei Larkin Date: Fri, 19 Apr 2024 00:25:23 +0300 Subject: [PATCH 02/11] update design doc to better describe current algorithm of the controller --- docs/DESIGN.md | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/docs/DESIGN.md b/docs/DESIGN.md index ffc30563..98386fb1 100644 --- a/docs/DESIGN.md +++ b/docs/DESIGN.md @@ -1,10 +1,25 @@ # Design -This document describes the interaction between `EtcdCluster` custom resources and other Kubernetes primitives and gives an overview of the underlying implementation. 
+This document describes the interaction between `EtcdCluster` custom resources and other Kubernetes +primitives and gives an overview of the underlying implementation. ## Creating a cluster -When a user adds an `EtcdCluster` resource to the Kubernetes cluster, the operator responds by creating +When a user adds an `EtcdCluster` resource to the Kubernetes cluster, the reconciler observes an +`EtcdCluster` object with an empty list of conditions in its status. This prompts it to fill the +status field with a set of default conditions, including an "etcd not ready" condition with the +reason "waiting for first quorum". + + + +Next, the operator creates the following objects: + * A configmap holding configuration values for bootstrapping a new cluster (`ETCD_INITIAL_CLUSTER_*` environment variables). * A headless service for intra-cluster communication. * A statefulset with pods for the individual members of the etcd cluster. @@ -12,3 +27,5 @@ When a user adds an `EtcdCluster` resource to the Kubernetes cluster, the operat * A pod disruption budget to prevent the etcd cluster from losing quorum. If the above is successful, the etcd cluster status is set to `Initialized`. + +If no error happens, the statefulset is most likely not yet ready and the status is updated with "etcd cluster not ready" as it is "waiting for first quorum". Once the statefulset is ready, a reconciliation is triggered again, since the child statefulset is also being watched. Finally, the status is updated once again to a "ready" condition. 
From 075ebc4aff18b68002772a5364326d6266175661 Mon Sep 17 00:00:00 2001 From: Timofei Larkin Date: Fri, 3 May 2024 18:00:07 +0300 Subject: [PATCH 03/11] Add reconciliation flowchart --- docs/DESIGN.md | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/docs/DESIGN.md b/docs/DESIGN.md index 98386fb1..52992c11 100644 --- a/docs/DESIGN.md +++ b/docs/DESIGN.md @@ -3,6 +3,28 @@ This document describes the interaction between `EtcdCluster` custom resources and other Kubernetes primitives and gives an overview of the underlying implementation. +## Reconciliation flowchart + +```mermaid +flowchart TD + Start(Start) --> A0[Ensure\nservice] + A0 --> A1[Connect to the cluster\nand fetch all statuses] + A1 --> |Got some response| AA{Is cluster\nin quorum?} + AA -->|Yes| AAA{All reachable\nmembers have the\nsame cluster ID?} + AAA -->|Yes| AAAA{Have all\nmembers been\nreached?} + AAAA -->|Yes| AAAAA0[Ensure the StatefulSet with a replica count\nmatching the cluster size, because anything\nelse is an invalid state. Ensure the cluster\nstate ConfigMap based on the value of\nEtcdCluster.spec.replicas. Initial cluster\nstate is 'existing'.] + AAAAA0 --> AAAAA1{Is the\nEtcdCluster\nsize equal to the\nStatefulSet\nsize?} + AAAAA1 --> |Yes|AAAAAA[Set cluster\nstatus to ready.] + AAAAAA --> AAAAAAStop(Stop) + + AAA -->|No| AAAB[Cluster is in\nsplit-brain. Set\nerror status.] + AAAB -->AAABStop(Stop) + + A1 --> |No members\nreached| AB{EtcdCluster\n.spec.replicas==0?} + A1 --> |Unexpected\nerror| AC(Requeue) +``` + Next, the operator creates the following objects: @@ -29,3 +49,4 @@ Next, the operator creates the following objects: If the above is successful, the etcd cluster status is set to `Initialized`. If no error happens, the statefulset is most likely not yet ready and the status is updated with "etcd cluster not ready" as it is "waiting for first quorum". 
Once the statefulset is ready, a reconciliation is triggered again, since the child statefulset is also being watched. Finally, the status is updated once again to a "ready" condition. +---> From 246ba85ae6d249fba9286ea3a06ccb61d3a4d282 Mon Sep 17 00:00:00 2001 From: Timofei Larkin Date: Mon, 6 May 2024 08:00:19 +0300 Subject: [PATCH 04/11] update reconciliation flowchart --- docs/DESIGN.md | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/docs/DESIGN.md b/docs/DESIGN.md index 52992c11..7c681b44 100644 --- a/docs/DESIGN.md +++ b/docs/DESIGN.md @@ -11,14 +11,31 @@ flowchart TD A0 --> A1[Connect to the cluster\nand fetch all statuses] A1 --> |Got some response| AA{Is cluster\nin quorum?} AA -->|Yes| AAA{All reachable\nmembers have the\nsame cluster ID?} - AAA -->|Yes| AAAA{Have all\nmembers been\nreached?} - AAAA -->|Yes| AAAAA0[Ensure the StatefulSet with a replica count\nmatching the cluster size, because anything\nelse is an invalid state. Ensure the cluster\nstate ConfigMap based on the value of\nEtcdCluster.spec.replicas. Initial cluster\nstate is 'existing'.] - AAAAA0 --> AAAAA1{Is the\nEtcdCluster\nsize equal to the\nStatefulSet\nsize?} - AAAAA1 --> |Yes|AAAAAA[Set cluster\nstatus to ready.] - AAAAAA --> AAAAAAStop(Stop) + AAA -->|Yes| AAAA{Are the\nmember ordinals\ncontiguous?} + AAAA -->|Yes| AAAAA{Have all\nmembers been\nreached?} + AAAAA -->|Yes| AAAAAA{Is the\nStatefulSet\npresent?} + AAAAAA -->|Yes| AAAAAAA{Is it\nready?} + AAAAAAA -->|Yes| AAAAAAAA{Is its size\nequal to the\nnumber of\n members?} + AAAAAAAA -->|Yes| AAAAAAAAA{Is the\nEtcdCluster\nsize equal to the\nStatefulSet\nsize?} + AAAAAAAAA -->|Yes| AAAAAAAAAA[Set cluster\nstatus to ready.] 
+ AAAAAAAAAA --> HappyStop([Stop]) + + AAAAAAAAA --> |No, desired\nsize larger|AAAAAAAAAB[Ensure ConfigMap with\ninitial cluster state existing\nand initial cluster URLs\nequal to current cluster\nplus one member, do\n'member add' API call and\nincrement StatefulSet size.] + AAAAAAAAAB --> ScaleUpStop([Stop]) + + AAAAAAAAA --> |No, desired\nsize smaller|AAAAAAAAAC[Member remove API\ncall, then decrement\nStatefulSet size.] + AAAAAAAAAC --> ScaleDownStop([Stop]) + + AAAAAAAA -->|No,\ngreater| AAAAAAAAB([This is 146%\nsplitbrain, stop.]) + + AAAAAAAA -->|No,\nsmaller| AAAAAAAAC([StatefulSetController\nis not working as\nit should, stop.]) + + AAAAAAA -->|No| AAAAAAAB[The non-ready replicas\nare evicted members,\nthey should be removed.] + + AAAAA -->|No| AAAAAB{asd} AAA -->|No| AAAB[Cluster is in\nsplit-brain. Set\nerror status.] - AAAB -->AAABStop(Stop) + AAAB --> AAABStop([Stop]) A1 --> |No members\nreached| AB{EtcdCluster\n.spec.replicas==0?} A1 --> |Unexpected\nerror| AC(Requeue) From b81595bbc1be746cdfc74735f9e1a41b5ddca34d Mon Sep 17 00:00:00 2001 From: Timofei Larkin Date: Wed, 8 May 2024 01:44:02 +0300 Subject: [PATCH 05/11] update flowchart --- docs/DESIGN.md | 38 +++++++++++++++++++++++--------------- 1 file changed, 23 insertions(+), 15 deletions(-) diff --git a/docs/DESIGN.md b/docs/DESIGN.md index 7c681b44..f0c477e3 100644 --- a/docs/DESIGN.md +++ b/docs/DESIGN.md @@ -11,29 +11,37 @@ flowchart TD A0 --> A1[Connect to the cluster\nand fetch all statuses] A1 --> |Got some response| AA{Is cluster\nin quorum?} AA -->|Yes| AAA{All reachable\nmembers have the\nsame cluster ID?} - AAA -->|Yes| AAAA{Are the\nmember ordinals\ncontiguous?} - AAAA -->|Yes| AAAAA{Have all\nmembers been\nreached?} - AAAAA -->|Yes| AAAAAA{Is the\nStatefulSet\npresent?} - AAAAAA -->|Yes| AAAAAAA{Is it\nready?} - AAAAAAA -->|Yes| AAAAAAAA{Is its size\nequal to the\nnumber of\n members?} - AAAAAAAA -->|Yes| AAAAAAAAA{Is the\nEtcdCluster\nsize equal to the\nStatefulSet\nsize?} - AAAAAAAAA 
-->|Yes| AAAAAAAAAA[Set cluster\nstatus to ready.] - AAAAAAAAAA --> HappyStop([Stop]) + AAA -->|Yes| AAAA[Promote any learners.] + AAAA -->|OK| AAAA0[Ensure configmap with initial cluster\nmatching existing members and\ncluster state=existing] + AAAA0 -->|OK| AAAA1[Ensure StatefulSet with\nreplicas = max member ordinal + 1] + AAAA1 -->|OK| AAAAA{Have all members\nbeen reached?} + AAAAA -->|Yes| AAAAAA{Is it\nready?} + AAAAAA -->|Yes| AAAAAAA{Is its size\nequal to the\nnumber of\n members?} + AAAAAAA -->|Yes| AAAAAAAA{Is the\nEtcdCluster\nsize equal to the\nStatefulSet\nsize?} + AAAAAAAA -->|Yes| AAAAAAAAA[Set cluster\nstatus to ready.] + AAAAAAAAA --> HappyStop([Stop]) - AAAAAAAAA --> |No, desired\nsize larger|AAAAAAAAAB[Ensure ConfigMap with\ninitial cluster state existing\nand initial cluster URLs\nequal to current cluster\nplus one member, do\n'member add' API call and\nincrement StatefulSet size.] - AAAAAAAAAB --> ScaleUpStop([Stop]) + AAAAAAAA --> |No, desired\nsize larger| AAAAAAAAB[Ensure ConfigMap with\ninitial cluster state existing\nand initial cluster URLs\nequal to current cluster\nplus one member, do\n'member add' API call and\nincrement StatefulSet size.] + AAAAAAAAB --> ScaleUpStop([Stop]) - AAAAAAAAA --> |No, desired\nsize smaller|AAAAAAAAAC[Member remove API\ncall, then decrement\nStatefulSet size.] - AAAAAAAAAC --> ScaleDownStop([Stop]) + AAAAAAAA --> |No, desired\nsize smaller| AAAAAAAAC[Member remove API\ncall, then decrement\nStatefulSet size\nthen delete PVC.] 
+ AAAAAAAAC --> ScaleDownStop([Stop]) - AAAAAAAA -->|No,\ngreater| AAAAAAAAB([This is 146%\nsplitbrain, stop.]) + AAAAAAAA --> |Etcd replicas=0\nSTS replicas=1| AAAAAAAAD[Decrement\nSTS to zero] + AAAAAAAAD --> ScaleToZeroStop([Stop]) - AAAAAAAA -->|No,\nsmaller| AAAAAAAAC([StatefulSetController\nis not working as\nit should, stop.]) + AAAAAAA -->|No,\ngreater| AAAAAAAB([This is 146%\nsplitbrain, stop.]) - AAAAAAA -->|No| AAAAAAAB[The non-ready replicas\nare evicted members,\nthey should be removed.] + AAAAAAA -->|No,\nsmaller| AAAAAAAC([StatefulSetController\nis not working as\nit should, stop.]) + AAAAAA -->|No| AAAAAAB[The non-ready replicas\nare evicted members,\nthey should be removed.] + AAAAA -->|No| AAAAAB{asd} + AAAA -->|Error| AAAAB([Requeue]) + AAAA0 -->|Error| AAAAB([Requeue]) + AAAA1 -->|Error| AAAAB([Requeue]) + AAA -->|No| AAAB[Cluster is in\nsplit-brain. Set\nerror status.] AAAB --> AAABStop([Stop]) From 23004f981e4c9d2c9e9926712be18bd44d8591db Mon Sep 17 00:00:00 2001 From: Timofei Larkin Date: Thu, 9 May 2024 01:16:42 +0300 Subject: [PATCH 06/11] more updates to flowchart --- docs/DESIGN.md | 97 +++++++++++++++++++++++++++++--------------------- 1 file changed, 57 insertions(+), 40 deletions(-) diff --git a/docs/DESIGN.md b/docs/DESIGN.md index f0c477e3..613d1151 100644 --- a/docs/DESIGN.md +++ b/docs/DESIGN.md @@ -7,46 +7,63 @@ primitives and gives an overview of the underlying implementation. ```mermaid flowchart TD - Start(Start) --> A0[Ensure\nservice] - A0 --> A1[Connect to the cluster\nand fetch all statuses] - A1 --> |Got some response| AA{Is cluster\nin quorum?} - AA -->|Yes| AAA{All reachable\nmembers have the\nsame cluster ID?} - AAA -->|Yes| AAAA[Promote any learners.] 
- AAAA -->|OK| AAAA0[Ensure configmap with initial cluster\nmatching existing members and\ncluster state=existing] - AAAA0 -->|OK| AAAA1[Ensure StatefulSet with\nreplicas = max member ordinal + 1] - AAAA1 -->|OK| AAAAA{Have all members\nbeen reached?} - AAAAA -->|Yes| AAAAAA{Is it\nready?} - AAAAAA -->|Yes| AAAAAAA{Is its size\nequal to the\nnumber of\n members?} - AAAAAAA -->|Yes| AAAAAAAA{Is the\nEtcdCluster\nsize equal to the\nStatefulSet\nsize?} - AAAAAAAA -->|Yes| AAAAAAAAA[Set cluster\nstatus to ready.] - AAAAAAAAA --> HappyStop([Stop]) - - AAAAAAAA --> |No, desired\nsize larger| AAAAAAAAB[Ensure ConfigMap with\ninitial cluster state existing\nand initial cluster URLs\nequal to current cluster\nplus one member, do\n'member add' API call and\nincrement StatefulSet size.] - AAAAAAAAB --> ScaleUpStop([Stop]) - - AAAAAAAA --> |No, desired\nsize smaller| AAAAAAAAC[Member remove API\ncall, then decrement\nStatefulSet size\nthen delete PVC.] - AAAAAAAAC --> ScaleDownStop([Stop]) - - AAAAAAAA --> |Etcd replicas=0\nSTS replicas=1| AAAAAAAAD[Decrement\nSTS to zero] - AAAAAAAAD --> ScaleToZeroStop([Stop]) - - AAAAAAA -->|No,\ngreater| AAAAAAAB([This is 146%\nsplitbrain, stop.]) - - AAAAAAA -->|No,\nsmaller| AAAAAAAC([StatefulSetController\nis not working as\nit should, stop.]) - - AAAAAA -->|No| AAAAAAB[The non-ready replicas\nare evicted members,\nthey should be removed.] - - AAAAA -->|No| AAAAAB{asd} - - AAAA -->|Error| AAAAB([Requeue]) - AAAA0 -->|Error| AAAAB([Requeue]) - AAAA1 -->|Error| AAAAB([Requeue]) - - AAA -->|No| AAAB[Cluster is in\nsplit-brain. Set\nerror status.] - AAAB --> AAABStop([Stop]) - - A1 --> |No members\nreached| AB{EtcdCluster\n.spec.replicas==0?} - A1 --> |Unexpected\nerror| AC(Requeue) + Start(Start) --> A0[Ensure service.] + A0 --> A1[Connect to the cluster\nand fetch all statuses.] + A1 --> |Got some response| A2[Ensure ConfigMap has\nETCD_FORCE_NEW_CLUSTER=false.] 
+ A2 --> AA{All reachable\nmembers have the\nsame cluster ID?} + AA --> |Yes| AAA{Is cluster\nin quorum?} + AAA --> |Yes| AAAA{Are all members \nmanaged by the operator?} + AAAA --> |Yes| AAAAA0[Promote any learners.] + AAAAA0 --> |OK| AAAAA1[Ensure configmap with initial cluster\nmatching existing members and\ncluster state=existing] + AAAAA1 --> |OK| AAAAA2[Ensure StatefulSet with\nreplicas = max member ordinal + 1] + AAAAA2 --> |OK| AAAAA3{Are all\nmembers healthy?} + AAAAA3 --> |Yes| AAAAAA{Are all STS pods present\nin the member list?} + AAAAAA --> |Yes| AAAAAAA{Is the\nEtcdCluster\nsize equal to the\nStatefulSet\nsize?} + AAAAAAA -->|Yes| AAAAAAAA[Set cluster\nstatus to ready.] + AAAAAAAA --> HappyStop([Stop]) + + AAAAAAA --> |No, desired\nsize larger| AAAAAAAB[Ensure ConfigMap with\ninitial cluster state existing\nand initial cluster URLs\nequal to current cluster\nplus one member, do\n'member add' API call and\nincrement StatefulSet size.] + AAAAAAAB --> ScaleUpStop([Stop]) + + AAAAAAA --> |No, desired\nsize smaller| AAAAAAAC[Member remove API\ncall, then decrement\nStatefulSet size\nthen delete PVC.] + AAAAAAAC --> ScaleDownStop([Stop]) + + AAAAAAA --> |Etcd replicas=0\nSTS replicas=1| AAAAAAAD[Decrement\nSTS to zero] + AAAAAAAD --> ScaleToZeroStop([Stop]) + + AAAAA0 -->|Error| AAAAAB([Requeue]) + AAAAA1 -->|Error| AAAAAB([Requeue]) + AAAAA2 -->|Error| AAAAAB([Requeue]) + + AAAA --> |No| AAAAB([Not implemented,\nstop.]) + + AAA --> |No| AAAB([Either the cluster will\nsoon recover when\nall pods are back online\nor something caused\ndata loss and majority\n failure simultaneously.]) + + AA --> |No| AAB[Cluster is in\nsplit-brain. Set\nerror status.] 
+ AAB --> AABStop([Stop]) + + A1 --> |No members\nreached| AB{Is the correct\nzero-replica STS\npresent?} + AB --> |Yes| ABA{EtcdCluster\n.spec.replicas==0?} + ABA --> |Yes| ABAA([Cluster successfully\nscaled to zero, stop.]) + ABA --> |No| ABAB[Ensure ConfigMap with\nforce-new-cluster,\ninitial cluster = new,\ninitial cluster peers with\nsingle member `name`-0] + ABAB --> |OK| ABABA[Increment STS size.] + ABABA --> |OK| ABABAA([Stop]) + ABABA --> |Error| ABABAB([Requeue]) + + ABAB --> |Error| ABABAB + + AB --> |No| ABB{Is the STS\npresent at all?} + ABB --> |Yes| ABBA[Patch the STS,\nexcept for replicas] + ABBA --> |OK| ABBAA([Stop]) + ABBA --> |Error| ABBAB([Requeue]) + + ABB --> |No| ABBB[Create a zero-\nreplica STS] + ABBB --> |OK| ABBBA([Stop]) + ABBB --> |Error| ABBBB([Requeue]) + + A0 --> |Unexpected\nerror| AC(Requeue) + A1 --> |Unexpected\nerror| AC(Requeue) + A2 --> |Unexpected\nerror| A2Err(Requeue) ``` |Yes| AAAAA0[Promote any learners.] AAAAA0 --> |OK| AAAAA1[Ensure configmap with initial cluster\nmatching existing members and\ncluster state=existing] AAAAA1 --> |OK| AAAAA2[Ensure StatefulSet with\nreplicas = max member ordinal + 1] - AAAAA2 --> |OK| AAAAA3{Are all\nmembers healthy?} - AAAAA3 --> |Yes| AAAAAA{Are all STS pods present\nin the member list?} - AAAAAA --> |Yes| AAAAAAA{Is the\nEtcdCluster\nsize equal to the\nStatefulSet\nsize?} - AAAAAAA -->|Yes| AAAAAAAA[Set cluster\nstatus to ready.] - AAAAAAAA --> HappyStop([Stop]) + AAAAA2 --> |OK| AAAAAA{Are all\nmembers healthy?} + AAAAAA --> |Yes| AAAAAAA{Are all STS pods present\nin the member list?} + AAAAAAA --> |Yes| AAAAAAAA{Is the\nEtcdCluster\nsize equal to the\nStatefulSet\nsize?} + AAAAAAAA -->|Yes| AAAAAAAAA[Set cluster\nstatus to ready.] 
+ AAAAAAAAA --> HappyStop([Stop]) - AAAAAAA --> |No, desired\nsize larger| AAAAAAAB[Ensure ConfigMap with\ninitial cluster state existing\nand initial cluster URLs\nequal to current cluster\nplus one member, do\n'member add' API call and\nincrement StatefulSet size.] - AAAAAAAB --> ScaleUpStop([Stop]) + AAAAAAAA --> |No, desired\nsize larger| AAAAAAAAB[Ensure ConfigMap with\ninitial cluster state existing\nand initial cluster URLs\nequal to current cluster\nplus one member, do\n'member add' API call and\nincrement StatefulSet size.] + AAAAAAAAB --> ScaleUpStop([Stop]) - AAAAAAA --> |No, desired\nsize smaller| AAAAAAAC[Member remove API\ncall, then decrement\nStatefulSet size\nthen delete PVC.] - AAAAAAAC --> ScaleDownStop([Stop]) + AAAAAAAA --> |No, desired\nsize smaller| AAAAAAAAC[Member remove API\ncall, then decrement\nStatefulSet size\nthen delete PVC.] + AAAAAAAAC --> ScaleDownStop([Stop]) - AAAAAAA --> |Etcd replicas=0\nSTS replicas=1| AAAAAAAD[Decrement\nSTS to zero] - AAAAAAAD --> ScaleToZeroStop([Stop]) + AAAAAAAA --> |Etcd replicas=0\nSTS replicas=1| AAAAAAAAD[Decrement\nSTS to zero] + AAAAAAAAD --> ScaleToZeroStop([Stop]) + + AAAAAA --> |No| AAAAAAB1[On timeout evict member.] + AAAAAAB1 --> AAAAAAB2[Delete PVC, ensure ConfigMap with\nmembers + this one and delete pod.] + + AAAAAAA --> |No| AAAAAAB2 AAAAA0 -->|Error| AAAAAB([Requeue]) AAAAA1 -->|Error| AAAAAB([Requeue]) @@ -65,30 +70,3 @@ flowchart TD A1 --> |Unexpected\nerror| AC(Requeue) A2 --> |Unexpected\nerror| A2Err(Requeue) ``` - From f999f5f96435e780a32c2b9b70613940ca86b5ef Mon Sep 17 00:00:00 2001 From: Timofei Larkin Date: Sun, 12 May 2024 14:30:47 +0300 Subject: [PATCH 08/11] Remove unnecessary force-new-cluster --- docs/DESIGN.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/docs/DESIGN.md b/docs/DESIGN.md index 52f115af..547a732b 100644 --- a/docs/DESIGN.md +++ b/docs/DESIGN.md @@ -9,8 +9,7 @@ primitives and gives an overview of the underlying implementation. 
flowchart TD Start(Start) --> A0[Ensure service.] A0 --> A1[Connect to the cluster\nand fetch all statuses.] - A1 --> |Got some response| A2[Ensure ConfigMap has\nETCD_FORCE_NEW_CLUSTER=false.] - A2 --> AA{All reachable\nmembers have the\nsame cluster ID?} + A1 --> |Got some response| AA{All reachable\nmembers have the\nsame cluster ID?} AA --> |Yes| AAA{Is cluster\nin quorum?} AAA --> |Yes| AAAA{Are all members \nmanaged by the operator?} AAAA --> |Yes| AAAAA0[Promote any learners.] @@ -50,7 +49,7 @@ flowchart TD A1 --> |No members\nreached| AB{Is the correct\nzero-replica STS\npresent?} AB --> |Yes| ABA{EtcdCluster\n.spec.replicas==0?} ABA --> |Yes| ABAA([Cluster successfully\nscaled to zero, stop.]) - ABA --> |No| ABAB[Ensure ConfigMap with\nforce-new-cluster,\ninitial cluster = new,\ninitial cluster peers with\nsingle member `name`-0] + ABA --> |No| ABAB[Ensure ConfigMap with\ninitial cluster = new,\ninitial cluster peers with\nsingle member `name`-0] ABAB --> |OK| ABABA[Increment STS size.] ABABA --> |OK| ABABAA([Stop]) ABABA --> |Error| ABABAB([Requeue]) @@ -68,5 +67,4 @@ flowchart TD A0 --> |Unexpected\nerror| AC(Requeue) A1 --> |Unexpected\nerror| AC(Requeue) - A2 --> |Unexpected\nerror| A2Err(Requeue) ``` From 9509c1e6487b4c7a614b5c99ea44434c2eb9db2c Mon Sep 17 00:00:00 2001 From: Hidden Marten Date: Sat, 1 Jun 2024 12:05:08 +0200 Subject: [PATCH 09/11] svg append --- docs/sts-flow.svg | 4 ++++ 1 file changed, 4 insertions(+) create mode 100644 docs/sts-flow.svg diff --git a/docs/sts-flow.svg b/docs/sts-flow.svg new file mode 100644 index 00000000..aea244ba --- /dev/null +++ b/docs/sts-flow.svg @@ -0,0 +1,4 @@ + + + +
[Figure: docs/sts-flow.svg, a flowchart of StatefulSet (STS) handling. Decision nodes: "Does STS exist?", "Is .spec.replicas==0 in existing STS?", "Is .spec.replicas==0 CR?". Action nodes: "Create STS", "Ensure ConfigMap with initial cluster = new, initial cluster peers with single member `name`-0", "Update STS". Terminal nodes: "Stop", "Requeue". Edge labels: Yes, No, Ok, Error.]
From 3708e26190d3d87ab2bd4842a93ba4ac5247fa9a Mon Sep 17 00:00:00 2001 From: Timofei Larkin Date: Mon, 17 Jun 2024 23:04:43 +0300 Subject: [PATCH 10/11] updates to flowchart --- docs/DESIGN.md | 123 +++++++++++++++++++++++++------------------------ 1 file changed, 63 insertions(+), 60 deletions(-) diff --git a/docs/DESIGN.md b/docs/DESIGN.md index 547a732b..51b6a2ba 100644 --- a/docs/DESIGN.md +++ b/docs/DESIGN.md @@ -7,64 +7,67 @@ primitives and gives an overview of the underlying implementation. ```mermaid flowchart TD - Start(Start) --> A0[Ensure service.] - A0 --> A1[Connect to the cluster\nand fetch all statuses.] - A1 --> |Got some response| AA{All reachable\nmembers have the\nsame cluster ID?} - AA --> |Yes| AAA{Is cluster\nin quorum?} - AAA --> |Yes| AAAA{Are all members \nmanaged by the operator?} - AAAA --> |Yes| AAAAA0[Promote any learners.] - AAAAA0 --> |OK| AAAAA1[Ensure configmap with initial cluster\nmatching existing members and\ncluster state=existing] - AAAAA1 --> |OK| AAAAA2[Ensure StatefulSet with\nreplicas = max member ordinal + 1] - AAAAA2 --> |OK| AAAAAA{Are all\nmembers healthy?} - AAAAAA --> |Yes| AAAAAAA{Are all STS pods present\nin the member list?} - AAAAAAA --> |Yes| AAAAAAAA{Is the\nEtcdCluster\nsize equal to the\nStatefulSet\nsize?} - AAAAAAAA -->|Yes| AAAAAAAAA[Set cluster\nstatus to ready.] - AAAAAAAAA --> HappyStop([Stop]) - - AAAAAAAA --> |No, desired\nsize larger| AAAAAAAAB[Ensure ConfigMap with\ninitial cluster state existing\nand initial cluster URLs\nequal to current cluster\nplus one member, do\n'member add' API call and\nincrement StatefulSet size.] - AAAAAAAAB --> ScaleUpStop([Stop]) - - AAAAAAAA --> |No, desired\nsize smaller| AAAAAAAAC[Member remove API\ncall, then decrement\nStatefulSet size\nthen delete PVC.] 
- AAAAAAAAC --> ScaleDownStop([Stop]) - - AAAAAAAA --> |Etcd replicas=0\nSTS replicas=1| AAAAAAAAD[Decrement\nSTS to zero] - AAAAAAAAD --> ScaleToZeroStop([Stop]) - - AAAAAA --> |No| AAAAAAB1[On timeout evict member.] - AAAAAAB1 --> AAAAAAB2[Delete PVC, ensure ConfigMap with\nmembers + this one and delete pod.] - - AAAAAAA --> |No| AAAAAAB2 - - AAAAA0 -->|Error| AAAAAB([Requeue]) - AAAAA1 -->|Error| AAAAAB([Requeue]) - AAAAA2 -->|Error| AAAAAB([Requeue]) - - AAAA --> |No| AAAAB([Not implemented,\nstop.]) - - AAA --> |No| AAAB([Either the cluster will\nsoon recover when\nall pods are back online\nor something caused\ndata loss and majority\n failure simultaneously.]) - - AA --> |No| AAB[Cluster is in\nsplit-brain. Set\nerror status.] - AAB --> AABStop([Stop]) - - A1 --> |No members\nreached| AB{Is the correct\nzero-replica STS\npresent?} - AB --> |Yes| ABA{EtcdCluster\n.spec.replicas==0?} - ABA --> |Yes| ABAA([Cluster successfully\nscaled to zero, stop.]) - ABA --> |No| ABAB[Ensure ConfigMap with\ninitial cluster = new,\ninitial cluster peers with\nsingle member `name`-0] - ABAB --> |OK| ABABA[Increment STS size.] - ABABA --> |OK| ABABAA([Stop]) - ABABA --> |Error| ABABAB([Requeue]) - - ABAB --> |Error| ABABAB - - AB --> |No| ABB{Is the STS\npresent at all?} - ABB --> |Yes| ABBA[Patch the STS,\nexcept for replicas] - ABBA --> |OK| ABBAA([Stop]) - ABBA --> |Error| ABBAB([Requeue]) - - ABB --> |No| ABBB[Create a zero-\nreplica STS] - ABBB --> |OK| ABBBA([Stop]) - ABBB --> |Error| ABBBB([Requeue]) - - A0 --> |Unexpected\nerror| AC(Requeue) - A1 --> |Unexpected\nerror| AC(Requeue) + Start(Start) --> A[Ensure service.] + A --> AA{Are there any\nendpoints?} + AA --> |Yes| AAA[Connect to the cluster\nand fetch all statuses.] 
+ AAA --> |Got some response| AAAA{All reachable\nmembers have the\nsame cluster ID?} + AAAA --> |Yes| AAAAA{Is cluster\nin quorum?} + AAAAA --> |Yes| AAAAAA{Are all members \nmanaged by the operator?} + AAAAAA --> |Yes| AAAAAAA["` + Promote any learners. + Ensure configmap with initial cluster matching existing members and cluster state=existing. + Ensure StatefulSet with replicas = max member ordinal + 1 + `"] + AAAAAAA --> |OK| AAAAAAAA{Are all\nmembers healthy?} + AAAAAAAA --> |Yes| AAAAAAAAA{Are all STS pods present\nin the member list?} + AAAAAAAAA --> |Yes| AAAAAAAAAA{Is the\nEtcdCluster\nsize equal to the\nStatefulSet\nsize?} + AAAAAAAAAA -->|Yes| AAAAAAAAAAA[Set cluster\nstatus to ready.] + AAAAAAAAAAA --> HappyStop([Stop]) + + AAAAAAAAAA --> |No, desired\nsize larger| AAAAAAAAAAB[Ensure ConfigMap with\ninitial cluster state existing\nand initial cluster URLs\nequal to current cluster\nplus one member, do\n'member add' API call and\nincrement StatefulSet size.] + AAAAAAAAAAB --> ScaleUpStop([Stop]) + + AAAAAAAAAA --> |No, desired\nsize smaller| AAAAAAAAAAC[Member remove API\ncall, then decrement\nStatefulSet size\nthen delete PVC.] + AAAAAAAAAAC --> ScaleDownStop([Stop]) + + AAAAAAAAAA --> |Etcd replicas=0\nSTS replicas=1| AAAAAAAAAAD[Decrement\nSTS to zero] + AAAAAAAAAAD --> ScaleToZeroStop([Stop]) + + AAAAAAAA --> |No| AAAAAAAAB1[On timeout evict member.] + AAAAAAAAB1 --> AAAAAAAAB2[Delete PVC, ensure ConfigMap with\nmembers + this one and delete pod.] + + AAAAAAAAA --> |No| AAAAAAAAB2 + + AAAAAAA -->|Error| AAAAAAAB([Requeue]) + + AAAAAA --> |No| AAAAAAB([Not implemented,\nstop.]) + + AAAAA --> |No| AAAAAB([Either the cluster will\nsoon recover when\nall pods are back online\nor something caused\ndata loss and majority\n failure simultaneously.]) + + AAAA --> |No| AAAAB[Cluster is in\nsplit-brain. Set\nerror status.] 
+ AAAAB --> AAAABStop([Stop]) + + AAA --> |No members\nreached| AAAB{Is the STS\npresent?} + AAAB --> |Yes| AAABA{"`Does it have the correct pod spec?`"} + AAABA --> |Yes| AAABAA(["`The statefulset cannot be ready, as the ready and liveness probes must be failing. Hope it becomes ready or wait for user intervention.`"]) + AAABA --> |No| AAABAB["`Patch the podspec`"] + + AAAB --> |No| AAABB(["`Looks like it was deleted with cascade=orphan. Create it again and see what happens`"]) + + AA --> |No| AAB{Is the STS\npresent?} + AAB --> |Yes| AABA{Does it have the\ncorrect pod spec?} + AABA --> |Yes| AABAA{Is it\nready?} + AABAA --> |Yes| AABAAA{Then it must have\nspec.replicas==0\n Is EtcdCluster\n.spec.replicas==0?} + AABAAA --> |Yes| AABAAAA([Cluster successfully\nscaled to zero, stop.]) + AABAAA --> |No| AABAAAB["` + Ensure ConfigMap with initial cluster = new, + initial cluster peers with single member name-0, + increment STS size. + `"] + + AABAA --> |No| AABAAB([Stop and wait, either\nit will turn ready soon\nand the next reconcile\nwill move things along,\nor user intervention is\nneeded]) + + AABA --> |No| AABAB[Patch the podspec] + + AAB --> |No| AABB[Create configmap, initial state new\ninitial cluster according to spec.\nreplicas, create statefulset.] 
``` From 8a1715525d10fbfb7c62feddcd316d491bf05b87 Mon Sep 17 00:00:00 2001 From: Timofei Larkin Date: Tue, 17 Dec 2024 18:22:17 +0300 Subject: [PATCH 11/11] Update docs/DESIGN.md Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> --- docs/DESIGN.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/docs/DESIGN.md b/docs/DESIGN.md index 51b6a2ba..6aba5df3 100644 --- a/docs/DESIGN.md +++ b/docs/DESIGN.md @@ -42,7 +42,15 @@ flowchart TD AAAAAA --> |No| AAAAAAB([Not implemented,\nstop.]) - AAAAA --> |No| AAAAAB([Either the cluster will\nsoon recover when\nall pods are back online\nor something caused\ndata loss and majority\n failure simultaneously.]) + AAAAA --> |No| AAAAAB([Quorum Loss Detected: + 1. Check for temporary issues: + - Network partitions + - Pod scheduling problems + 2. If temporary, wait for recovery + 3. If permanent: + - Alert operators + - Document disaster recovery steps + - Consider backup restoration]) AAAA --> |No| AAAAB[Cluster is in\nsplit-brain. Set\nerror status.] AAAAB --> AAAABStop([Stop])
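The decision chain in the reconciliation flowchart (as of PATCH 10/11) can be read as a pure function from observed state to a next step. The Go sketch below is only an illustration of that reading, not the operator's code: the `Observation` type, its fields, the `Action` constants, and the `Decide` function are all hypothetical names introduced here. It leaves out the "ensure service" step, the combined "promote learners, ensure ConfigMap, ensure StatefulSet" step with its requeue-on-error edge, and the StatefulSet inspection subtrees, which it collapses into two placeholder actions.

```go
package main

// Observation is a hypothetical snapshot of what one reconcile pass has
// learned. The field names are illustrative, not taken from the operator.
type Observation struct {
	HasEndpoints     bool  // the service has at least one endpoint
	MembersReached   int   // members that answered the status request
	SameClusterID    bool  // all reachable members report one cluster ID
	HasQuorum        bool  // the cluster is in quorum
	AllManaged       bool  // every member is managed by the operator
	AllHealthy       bool  // every member is healthy
	PodsInMemberList bool  // every StatefulSet pod is in the member list
	DesiredReplicas  int32 // EtcdCluster .spec.replicas
	STSReplicas      int32 // StatefulSet .spec.replicas
}

// Action names the branch of the flowchart that a reconcile pass ends in.
type Action string

const (
	ActionCheckSTSNoEndpoints Action = "inspect StatefulSet (no endpoints)"
	ActionCheckSTSUnreachable Action = "inspect StatefulSet (no members reached)"
	ActionSplitBrain          Action = "set split-brain error status"
	ActionQuorumLoss          Action = "quorum loss: wait or recover"
	ActionNotImplemented      Action = "unmanaged members: not implemented"
	ActionEvictMember         Action = "evict unhealthy member on timeout"
	ActionRecreateMember      Action = "delete PVC, update ConfigMap, delete pod"
	ActionScaleUp             Action = "member add, increment StatefulSet"
	ActionScaleDown           Action = "member remove, decrement StatefulSet, delete PVC"
	ActionScaleToZero         Action = "decrement StatefulSet to zero"
	ActionReady               Action = "set cluster status to ready"
)

// Decide walks the flowchart's decision nodes from top to bottom and
// returns the first branch whose condition matches.
func Decide(o Observation) Action {
	switch {
	case !o.HasEndpoints:
		return ActionCheckSTSNoEndpoints
	case o.MembersReached == 0:
		return ActionCheckSTSUnreachable
	case !o.SameClusterID:
		return ActionSplitBrain
	case !o.HasQuorum:
		return ActionQuorumLoss
	case !o.AllManaged:
		return ActionNotImplemented
	case !o.AllHealthy:
		return ActionEvictMember
	case !o.PodsInMemberList:
		return ActionRecreateMember
	case o.DesiredReplicas == 0 && o.STSReplicas == 1:
		return ActionScaleToZero
	case o.DesiredReplicas > o.STSReplicas:
		return ActionScaleUp
	case o.DesiredReplicas < o.STSReplicas:
		return ActionScaleDown
	default:
		return ActionReady
	}
}
```

Keeping the decision separate from the side effects is one possible design choice: the branch order can then be checked with table-driven tests without a running cluster, and the caller maps each `Action` to the corresponding API calls.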