Misc fixes (#50)
* Fix misspelled occurrences of "resource"

* Fix various typos

* Fix regressions

* Rename NDB -> NDC
komuta authored Dec 12, 2023
1 parent 0e729ad commit fcb19c3
Showing 12 changed files with 40 additions and 35 deletions.
21 changes: 10 additions & 11 deletions README.md
@@ -1,11 +1,11 @@
# Node Disruption Controller

-Node-disruption-controller (NDB) is a way to control node disruptions (e.g. for maintenance purpose) in a Kubernetes cluster.
+Node-disruption-controller (NDC) is a way to control node disruptions (e.g. for maintenance purpose) in a Kubernetes cluster.


## Motivation

-The Node Disruption Controller was built to help perform maintenance operations on Kubernetes clusters running stateful workloads, especially those reliant on local storage like databases. Nodes within a Kubernetes cluster are subject to disruptions: involuntary (like hardware failures, VM premption, or crashes) and voluntary (such as node maintenance). Performing efficient yet safe maintenances when it involves multiple actors (different teams, different controllers) is challenging. The Node Disruption Controller provides an interface between the actors that want to perform maintenances on nodes (maintenance systems, autoscalers) and the ones that are operating services on top of these nodes.
+The Node Disruption Controller was built to help perform maintenance operations on Kubernetes clusters running stateful workloads, especially those reliant on local storage like databases. Nodes within a Kubernetes cluster are subject to disruptions: involuntary (like hardware failures, VM preemption, or crashes) and voluntary (such as node maintenance). Performing efficient yet safe maintenances when it involves multiple actors (different teams, different controllers) is challenging. The Node Disruption Controller provides an interface between the actors that want to perform maintenances on nodes (maintenance systems, autoscalers) and the ones that are operating services on top of these nodes.

While workload operators should build systems that are resilient to involuntary disruptions, the controller helps ensure that voluntary disruptions are as safe as they can be.

@@ -17,21 +17,21 @@ Storage is represented by Persistent Volume (PV) and PV Claims (PVC). While remo

When relying on drain and PDB, data is safeguarded from eviction only if a Pod is actively running on it. Nevertheless, certain scenarios exist where Pods are not running on the node with the data, and losing a node would be costly:

-- Critical rolling update: During a rolling update, Pods are restarted. If a Pod running on a node that is cordoned, preventing it from restarting on the node, the node can be drained successfully even if the PDB permits no disruptions. While the database should ideally withstand the loss of data from a single node, allowing it during a major version upgrade can be risky (Database might not allow node replacement in the middle of an upgrade).
+- Critical rolling update: During a rolling update, Pods are restarted. If a Pod running on a node that is cordoned, preventing it from restarting on the node, the node can be drained successfully even if the PDB permits no disruptions. While the database should ideally withstand the loss of data from a single node, allowing it during a major version upgrade can be risky (database might not allow node replacement in the middle of an upgrade).
- Forced eviction on multiple pods: In real-world production environments, we've observed cases where node pressure leads to the eviction of multiple Pods within a StatefulSet. In such instances, node drains proceed without issue on nodes that forcefully evicted their pods. PDB is rendered useless. If multiple nodes enter maintenance mode simultaneously, data loss can occur.

-The Node Disruption Controller addresses these concerns by providing that is data (PVC/PV) aware.
+The Node Disruption Controller addresses these concerns by providing a primitive which is data (PVC/PV) aware.


### Service level healthcheck

Kubernetes primitives and PDB only provide pod-level healthiness protection through mechanisms like liveness and readiness probes. However stateful workloads often provide service level healthiness (data redundancy status, SLA monitoring). In cases where the service is not in a healthy state, we must avoid initiating voluntary disruptions.

-Unfortunately, at the moment Kubernetes doesn't provide such API. Prior to the Node Disruption Controller (NDB), we addressed this need through periodic checks, where we set `maxUnavailable=0` if the service is unhealthy. However, asynchronous health checking is dangerous as it hard to check when the last health check was made (the periodic system could be broken and the state not updated for a long time) or it cannot be fast enough and lead to multiple evictions being granted before reacting. NDB prevents these issues by providing a synchronous health checking API.
+Unfortunately, at the moment Kubernetes doesn't provide such API. Prior to the Node Disruption Controller (NDC), we addressed this need through periodic checks, where we set `maxUnavailable=0` if the service is unhealthy. However, asynchronous health checking is dangerous as it hard to check when the last health check was made (the periodic system could be broken and the state not updated for a long time) or it cannot be fast enough and lead to multiple evictions being granted before reacting. NDC prevents these issues by providing a synchronous health checking API.
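For illustration, here is a minimal sketch (not taken from this repository) of the kind of service-level health endpoint a stateful workload could expose for such a synchronous check; the `/healthz` path, the port, and the `clusterIsHealthy` helper are assumptions made for the example.

```go
package main

import (
	"log"
	"net/http"
)

// clusterIsHealthy stands in for a real service-level check
// (e.g. "is data redundancy fully restored?"). The name and logic
// are illustrative, not part of the node-disruption-controller.
func clusterIsHealthy() bool {
	return true
}

func main() {
	// A synchronous check queries this endpoint at decision time,
	// so the answer is always current instead of a possibly stale flag.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		if clusterIsHealthy() {
			w.WriteHeader(http.StatusOK)
			return
		}
		// Any non-2xx answer means "do not grant a voluntary disruption right now".
		http.Error(w, "service unhealthy, refusing disruptions", http.StatusServiceUnavailable)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```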

## Description

-The primary purpose of the Node Disruption Controller (NDB) is to provide a set of APIs to ensure safety of voluntary node-level disruptions, such as maintenances. Typically, maintenance involves the eviction of pods from a node and potentially rendering the node temporarily or permanently unavailable.
+The primary purpose of the Node Disruption Controller (NDC) is to provide a set of APIs to ensure safety of voluntary node-level disruptions, such as maintenances. Typically, maintenance involves the eviction of pods from a node and potentially rendering the node temporarily or permanently unavailable.

The system is build around a contract: before any actions are taken on a node, a node disruption must be granted for that specific node. Once a node disruption has been granted, the requester can do what it wants (e.g., drain the node).
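As a sketch of that contract from the requester's side, the snippet below creates a NodeDisruption and waits for the controller's verdict before draining. The API group `nodedisruption.criteo.com/v1alpha1`, the `spec.nodeSelector` field, the lowercase `granted`/`rejected` state values, and the namespaced scope are inferred for the example, not confirmed by this diff.

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// Assumed GroupVersionResource for NodeDisruption; the exact group string is not shown in this diff.
var nodeDisruptionGVR = schema.GroupVersionResource{
	Group:    "nodedisruption.criteo.com",
	Version:  "v1alpha1",
	Resource: "nodedisruptions",
}

// requestNodeDisruption creates a NodeDisruption targeting one node and polls
// until the controller has granted or rejected it. The spec/status field names
// used here are illustrative assumptions.
func requestNodeDisruption(ctx context.Context, client dynamic.Interface, namespace, nodeName string) (bool, error) {
	nd := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": nodeDisruptionGVR.Group + "/" + nodeDisruptionGVR.Version,
		"kind":       "NodeDisruption",
		"metadata":   map[string]interface{}{"generateName": "maintenance-"},
		"spec": map[string]interface{}{
			"nodeSelector": map[string]interface{}{
				"matchLabels": map[string]interface{}{"kubernetes.io/hostname": nodeName},
			},
		},
	}}
	created, err := client.Resource(nodeDisruptionGVR).Namespace(namespace).Create(ctx, nd, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	for {
		// A production tool would use a watch; polling keeps the sketch short.
		current, err := client.Resource(nodeDisruptionGVR).Namespace(namespace).Get(ctx, created.GetName(), metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		state, _, _ := unstructured.NestedString(current.Object, "status", "state")
		switch state {
		case "granted":
			return true, nil // safe to drain / reboot the node
		case "rejected":
			return false, nil // a budget refused the disruption, try again later
		}
		time.Sleep(2 * time.Second)
	}
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	granted, err := requestNodeDisruption(context.Background(), client, "default", "node-1")
	if err != nil {
		panic(err)
	}
	fmt.Println("disruption granted:", granted)
}
```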

@@ -47,7 +47,7 @@ different from the PDB on the following point:
- Can perform a synchronous service level health check. PDB is only looking at the readiness probes of individual pods
- A pod can have more than one budget

-Finally, NDB's goal is to make the implementation by the workload owners simple. Owners don't have to implement all
+Finally, NDC's goal is to make the implementation by the workload owners simple. Owners don't have to implement all
the features attain a level of safety they are satisfied with:
- Stateless workload owners don't have to use Node Disruption Budget as they can use PodDisruptionBudgets (PDB)
- Stateful owner can only provide a budget without service health-checking
@@ -89,7 +89,7 @@ stateDiagram

##### Retry

-NodeDisruption can be retried automatically. The main purpose of retry is to allow prepation for a disruption. Some disruptions cannot be allowed until some preparations operations have been completed (e.g. replicating data out of the node). A controller can consumme the pending NodeDisruption, perform the required operations and finaly accept the disruption when it is safe.
+NodeDisruption can be retried automatically. The main purpose of retry is to allow preparation for a disruption. Some disruptions cannot be allowed until some preparations operations have been completed (e.g. replicating data out of the node). A controller can consume the pending NodeDisruption, perform the required operations and finally accept the disruption when it is safe.

#### Sample object

@@ -265,7 +265,7 @@ TeamK can create a NodeDisruptionBudget to protect the number of concurrent Node
of nodes.
TeamK, before doing a maintenance of a node will create a NodeDisruption. The controller will check
-wich budgets are impacted by the disruption and check if they can tolerate one more disruption.
+which budgets are impacted by the disruption and check if they can tolerate one more disruption.
If not, the NodeDisruption will be rejected. TeamK will have to retry creating a NodeDisruption
later.
If it is granted, TeamK can disrupt the node. In this case, the disruption will be the drain and
@@ -274,8 +274,7 @@ reboot of the Node in question.
### Node autoscaling system
The same features can apply to a Node autoscaling system that needs safely scale down a Kubernetes cluster.
-Let's say the system wants to remove 10 nodes from a Kubernetes cluster. It can select nodes randomly, try to create a NodeDisruption, if it is granted, the node can be drained and removed. If it's rejected, the system
-can try another node or try later. The budgets ensure that all the drains and node removals are safe.
+Let's say the system wants to remove 10 nodes from a Kubernetes cluster. It can select nodes randomly, try to create a NodeDisruption, if it is granted, the node can be drained and removed. If it's rejected, the system can try another node or try later. The budgets ensure that all the drains and node removals are safe.
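A rough sketch of that scale-down loop (the disruption request and the drain/removal steps are passed in as callbacks; `requestDisruption` could wrap the hypothetical `requestNodeDisruption` helper sketched earlier):

```go
package autoscaler

// scaleDown tries to remove up to want nodes, requesting a NodeDisruption for
// each candidate and only draining nodes whose disruption was granted.
func scaleDown(candidates []string, want int,
	requestDisruption func(node string) (bool, error),
	drainAndRemove func(node string) error) (removed int, err error) {
	for _, node := range candidates {
		if removed >= want {
			break
		}
		granted, err := requestDisruption(node)
		if err != nil {
			return removed, err
		}
		if !granted {
			// A budget rejected this disruption: skip the node and try
			// another candidate, or retry the whole pass later.
			continue
		}
		if err := drainAndRemove(node); err != nil {
			return removed, err
		}
		removed++
	}
	return removed, nil
}
```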
## Getting Started
You’ll need a Kubernetes cluster to run against. You can use [KIND](https://sigs.k8s.io/kind) to get a local cluster for testing, or run against a remote cluster.
6 changes: 6 additions & 0 deletions config/manager/kustomization.yaml
@@ -1,2 +1,8 @@
resources:
- manager.yaml
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+images:
+- name: controller
+  newName: controller
+  newTag: latest
4 changes: 2 additions & 2 deletions internal/controller/applicationdisruptionbudget_controller.go
@@ -70,7 +70,7 @@ func (r *ApplicationDisruptionBudgetReconciler) Reconcile(ctx context.Context, r

if err != nil {
if errors.IsNotFound(err) {
-// If the ressource was not found, nothing has to be done
+// If the resource was not found, nothing has to be done
return ctrl.Result{}, nil
}
return ctrl.Result{}, err
@@ -194,7 +194,7 @@ func (r *ApplicationDisruptionBudgetResolver) GetNamespacedName() nodedisruption
}
}

-// Check health make a synchronous health check on the underlying ressource of a budget
+// Check health make a synchronous health check on the underlying resource of a budget
func (r *ApplicationDisruptionBudgetResolver) CheckHealth(context.Context) error {
if r.ApplicationDisruptionBudget.Spec.HealthURL == nil {
return nil
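The hunk above only shows the early return when no `HealthURL` is set. Below is a minimal sketch of what a synchronous check against `Spec.HealthURL` could look like, assuming a plain HTTP GET with any non-2xx response treated as unhealthy; this is an illustration, not the controller's actual implementation.

```go
package healthcheck // illustrative package, not part of the repository

import (
	"context"
	"fmt"
	"net/http"
)

// checkHealthURL issues a GET against a budget's health URL and reports any
// non-2xx answer as an error, which would lead to the disruption being refused.
func checkHealthURL(ctx context.Context, healthURL string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, healthURL, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return fmt.Errorf("health check %s returned status %d", healthURL, resp.StatusCode)
	}
	return nil
}
```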
@@ -71,7 +71,7 @@ var _ = Describe("ApplicationDisruptionBudget controller", func() {
})

AfterEach(func() {
-clearAllNodeDisruptionRessources()
+clearAllNodeDisruptionResources()
})

BeforeAll(func() {
@@ -211,15 +211,15 @@ var _ = Describe("ApplicationDisruptionBudget controller", func() {
When("PV doesn't have an affinity", func() {
It("is ignored by ADB", func() {
By("Creating PV without NodeAffinity")
-ressources := make(corev1.ResourceList, 1)
-ressources[corev1.ResourceStorage] = *resource.NewQuantity(100, ressources.Storage().Format)
+resources := make(corev1.ResourceList, 1)
+resources[corev1.ResourceStorage] = *resource.NewQuantity(100, resources.Storage().Format)

PVWithoutAffinity := &corev1.PersistentVolume{
ObjectMeta: metav1.ObjectMeta{
Name: "remote-pv",
},
Spec: corev1.PersistentVolumeSpec{
-Capacity: ressources,
+Capacity: resources,
AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
PersistentVolumeSource: corev1.PersistentVolumeSource{CSI: &corev1.CSIPersistentVolumeSource{Driver: "test", VolumeHandle: "test"}},
NodeAffinity: nil,
2 changes: 1 addition & 1 deletion internal/controller/budget.go
@@ -15,7 +15,7 @@ type Budget interface {
IsImpacted(resolver.NodeSet) bool
// Return the number of disruption allowed considering a list of current node disruptions
TolerateDisruption(resolver.NodeSet) bool
-// Check health make a synchronous health check on the underlying ressource of a budget
+// Check health make a synchronous health check on the underlying resource of a budget
CheckHealth(context.Context) error
// Call a lifecycle hook in order to synchronously validate a Node Disruption
CallHealthHook(context.Context, nodedisruptionv1alpha1.NodeDisruption) error
4 changes: 2 additions & 2 deletions internal/controller/budget_test.go
@@ -35,13 +35,13 @@ func (m *MockBudget) TolerateDisruption(resolver.NodeSet) bool {
return m.tolerate
}

-// Check health make a synchronous health check on the underlying ressource of a budget
+// Check health make a synchronous health check on the underlying resource of a budget
func (m *MockBudget) CheckHealth(context.Context) error {
m.healthChecked = true
return m.health
}

-// Check health make a synchronous health check on the underlying ressource of a budget
+// Check health make a synchronous health check on the underlying resource of a budget
func (m *MockBudget) CallHealthHook(context.Context, nodedisruptionv1alpha1.NodeDisruption) error {
m.healthChecked = true
return m.health
4 changes: 2 additions & 2 deletions internal/controller/nodedisruption_controller.go
@@ -72,7 +72,7 @@ func (r *NodeDisruptionReconciler) Reconcile(ctx context.Context, req ctrl.Reque

if err != nil {
if errors.IsNotFound(err) {
-// If the ressource was not found, nothing has to be done
+// If the resource was not found, nothing has to be done
return clusterResult, nil
}
return clusterResult, err
@@ -101,7 +101,7 @@ func (r *NodeDisruptionReconciler) Reconcile(ctx context.Context, req ctrl.Reque
logger.Info("Updating Status, done with", "state", nd.Status.State)
return clusterResult, reconciler.UpdateStatus(ctx)
}
logger.Info("Reconcilation successful", "state", nd.Status.State)
logger.Info("Reconciliation successful", "state", nd.Status.State)
return clusterResult, nil
}

8 changes: 4 additions & 4 deletions internal/controller/nodedisruption_controller_test.go
@@ -38,7 +38,7 @@ import (

// +kubebuilder:docs-gen:collapse=Imports

-func clearAllNodeDisruptionRessources() {
+func clearAllNodeDisruptionResources() {
// It doesn't seem possible to wipe in all namespace so we walk through all of them
namespaces := &corev1.NamespaceList{}
err := k8sClient.List(context.Background(), namespaces)
@@ -192,7 +192,7 @@ var _ = Describe("NodeDisruption controller", func() {
})

AfterEach(func() {
-clearAllNodeDisruptionRessources()
+clearAllNodeDisruptionResources()
})

When("there are no budgets in the cluster", func() {
@@ -495,7 +495,7 @@ var _ = Describe("NodeDisruption controller", func() {
Expect(k8sClient.Create(ctx, disruption.DeepCopy())).Should(Succeed())
})
AfterEach(func() {
-clearAllNodeDisruptionRessources()
+clearAllNodeDisruptionResources()
cancelFn()
})

@@ -549,7 +549,7 @@ var _ = Describe("NodeDisruption controller", func() {
createNodeDisruption(firstDisruptionName, NDNamespace, nodeLabels1, ctx)
})
AfterEach(func() {
-clearAllNodeDisruptionRessources()
+clearAllNodeDisruptionResources()
cancelFn()
})

2 changes: 1 addition & 1 deletion internal/controller/nodedisruptionbudget_controller.go
@@ -62,7 +62,7 @@ func (r *NodeDisruptionBudgetReconciler) Reconcile(ctx context.Context, req ctrl

if err != nil {
if errors.IsNotFound(err) {
-// If the ressource was not found, nothing has to be done
+// If the resource was not found, nothing has to be done
return ctrl.Result{}, nil
}
return ctrl.Result{}, err
@@ -61,7 +61,7 @@ var _ = Describe("NodeDisruptionBudget controller", func() {
})

AfterEach(func() {
-clearAllNodeDisruptionRessources()
+clearAllNodeDisruptionResources()
})

When("Nodes changes in the cluster", func() {
12 changes: 6 additions & 6 deletions internal/controller/suite_test.go
@@ -74,8 +74,8 @@ func newPod(name, namespace, nodeName string, labels map[string]string) corev1.P
}

func newPVC(name, namespace, pvName string, labels map[string]string) corev1.PersistentVolumeClaim {
-ressources := make(corev1.ResourceList, 1)
-ressources[corev1.ResourceStorage] = *resource.NewQuantity(100, ressources.Storage().Format)
+resources := make(corev1.ResourceList, 1)
+resources[corev1.ResourceStorage] = *resource.NewQuantity(100, resources.Storage().Format)
return corev1.PersistentVolumeClaim{
ObjectMeta: metav1.ObjectMeta{
Name: name,
@@ -85,21 +85,21 @@ func newPVC(name, namespace, pvName string, labels map[string]string) corev1.Per
Spec: corev1.PersistentVolumeClaimSpec{
VolumeName: pvName,
AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
-Resources: corev1.ResourceRequirements{Requests: ressources},
+Resources: corev1.ResourceRequirements{Requests: resources},
},
Status: corev1.PersistentVolumeClaimStatus{},
}
}

func newPVforNode(nodeName string) (pv corev1.PersistentVolume) {
-ressources := make(corev1.ResourceList, 1)
-ressources[corev1.ResourceStorage] = *resource.NewQuantity(100, ressources.Memory().Format)
+resources := make(corev1.ResourceList, 1)
+resources[corev1.ResourceStorage] = *resource.NewQuantity(100, resources.Memory().Format)
return corev1.PersistentVolume{
ObjectMeta: metav1.ObjectMeta{
Name: fmt.Sprintf("%s-pv-local", nodeName),
},
Spec: corev1.PersistentVolumeSpec{
-Capacity: ressources,
+Capacity: resources,
AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
PersistentVolumeSource: corev1.PersistentVolumeSource{Local: &corev1.LocalVolumeSource{Path: "path/to/nothing"}},
NodeAffinity: &corev1.VolumeNodeAffinity{Required: &corev1.NodeSelector{
2 changes: 1 addition & 1 deletion pkg/resolver/resolver.go
@@ -19,7 +19,7 @@ import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

-// Resolver use Kubernetes selectors to return ressources
+// Resolver use Kubernetes selectors to return resources
type Resolver struct {
Client client.Client
}
