Misc fixes (#50)
* Fix misspelled occurrences of "resource"

* Fix various typos

* Fix regressions

* Rename NDB -> NDC
komuta authored Dec 12, 2023
1 parent 0e729ad commit fcb19c3
Showing 12 changed files with 40 additions and 35 deletions.
21 changes: 10 additions & 11 deletions README.md
@@ -1,11 +1,11 @@
# Node Disruption Controller

-Node-disruption-controller (NDB) is a way to control node disruptions (e.g. for maintenance purpose) in a Kubernetes cluster.
+Node-disruption-controller (NDC) is a way to control node disruptions (e.g. for maintenance purpose) in a Kubernetes cluster.


## Motivation

-The Node Disruption Controller was built to help perform maintenance operations on Kubernetes clusters running stateful workloads, especially those reliant on local storage like databases. Nodes within a Kubernetes cluster are subject to disruptions: involuntary (like hardware failures, VM premption, or crashes) and voluntary (such as node maintenance). Performing efficient yet safe maintenances when it involves multiple actors (different teams, different controllers) is challenging. The Node Disruption Controller provides an interface between the actors that want to perform maintenances on nodes (maintenance systems, autoscalers) and the ones that are operating services on top of these nodes.
+The Node Disruption Controller was built to help perform maintenance operations on Kubernetes clusters running stateful workloads, especially those reliant on local storage like databases. Nodes within a Kubernetes cluster are subject to disruptions: involuntary (like hardware failures, VM preemption, or crashes) and voluntary (such as node maintenance). Performing efficient yet safe maintenances when it involves multiple actors (different teams, different controllers) is challenging. The Node Disruption Controller provides an interface between the actors that want to perform maintenances on nodes (maintenance systems, autoscalers) and the ones that are operating services on top of these nodes.

While workload operators should build systems that are resilient to involuntary disruptions, the controller helps ensure that voluntary disruptions are as safe as they can be.

@@ -17,21 +17,21 @@ Storage is represented by Persistent Volume (PV) and PV Claims (PVC). While remo

When relying on drain and PDB, data is safeguarded from eviction only if a Pod is actively running on it. Nevertheless, certain scenarios exist where Pods are not running on the node with the data, and losing a node would be costly:

-- Critical rolling update: During a rolling update, Pods are restarted. If a Pod running on a node that is cordoned, preventing it from restarting on the node, the node can be drained successfully even if the PDB permits no disruptions. While the database should ideally withstand the loss of data from a single node, allowing it during a major version upgrade can be risky (Database might not allow node replacement in the middle of an upgrade).
+- Critical rolling update: During a rolling update, Pods are restarted. If a Pod running on a node that is cordoned, preventing it from restarting on the node, the node can be drained successfully even if the PDB permits no disruptions. While the database should ideally withstand the loss of data from a single node, allowing it during a major version upgrade can be risky (database might not allow node replacement in the middle of an upgrade).
- Forced eviction on multiple pods: In real-world production environments, we've observed cases where node pressure leads to the eviction of multiple Pods within a StatefulSet. In such instances, node drains proceed without issue on nodes that forcefully evicted their pods. PDB is rendered useless. If multiple nodes enter maintenance mode simultaneously, data loss can occur.

-The Node Disruption Controller addresses these concerns by providing that is data (PVC/PV) aware.
+The Node Disruption Controller addresses these concerns by providing a primitive which is data (PVC/PV) aware.


### Service level healthcheck

Kubernetes primitives and PDB only provide pod-level healthiness protection through mechanisms like liveness and readiness probes. However stateful workloads often provide service level healthiness (data redundancy status, SLA monitoring). In cases where the service is not in a healthy state, we must avoid initiating voluntary disruptions.

-Unfortunately, at the moment Kubernetes doesn't provide such API. Prior to the Node Disruption Controller (NDB), we addressed this need through periodic checks, where we set `maxUnavailable=0` if the service is unhealthy. However, asynchronous health checking is dangerous as it hard to check when the last health check was made (the periodic system could be broken and the state not updated for a long time) or it cannot be fast enough and lead to multiple evictions being granted before reacting. NDB prevents these issues by providing a synchronous health checking API.
+Unfortunately, at the moment Kubernetes doesn't provide such API. Prior to the Node Disruption Controller (NDC), we addressed this need through periodic checks, where we set `maxUnavailable=0` if the service is unhealthy. However, asynchronous health checking is dangerous as it hard to check when the last health check was made (the periodic system could be broken and the state not updated for a long time) or it cannot be fast enough and lead to multiple evictions being granted before reacting. NDC prevents these issues by providing a synchronous health checking API.
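For illustration, here is a minimal sketch (not taken from this repository) of the kind of service-level health endpoint a stateful workload could expose for such a synchronous check; the `/healthz` path, the port, and the `clusterIsHealthy` helper are assumptions made for the example.

```go
package main

import (
	"log"
	"net/http"
)

// clusterIsHealthy stands in for a real service-level check
// (e.g. "is data redundancy fully restored?"). The name and logic
// are illustrative, not part of the node-disruption-controller.
func clusterIsHealthy() bool {
	return true
}

func main() {
	// A synchronous check queries this endpoint at decision time,
	// so the answer is always current instead of a possibly stale flag.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		if clusterIsHealthy() {
			w.WriteHeader(http.StatusOK)
			return
		}
		// Any non-2xx answer means "do not grant a voluntary disruption right now".
		http.Error(w, "service unhealthy, refusing disruptions", http.StatusServiceUnavailable)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```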

## Description

-The primary purpose of the Node Disruption Controller (NDB) is to provide a set of APIs to ensure safety of voluntary node-level disruptions, such as maintenances. Typically, maintenance involves the eviction of pods from a node and potentially rendering the node temporarily or permanently unavailable.
+The primary purpose of the Node Disruption Controller (NDC) is to provide a set of APIs to ensure safety of voluntary node-level disruptions, such as maintenances. Typically, maintenance involves the eviction of pods from a node and potentially rendering the node temporarily or permanently unavailable.

The system is build around a contract: before any actions are taken on a node, a node disruption must be granted for that specific node. Once a node disruption has been granted, the requester can do what it wants (e.g., drain the node).
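As a sketch of that contract from the requester's side, the snippet below creates a NodeDisruption and waits for the controller's verdict before draining. The API group `nodedisruption.criteo.com/v1alpha1`, the `spec.nodeSelector` field, the lowercase `granted`/`rejected` state values, and the namespaced scope are inferred for the example, not confirmed by this diff.

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// Assumed GroupVersionResource for NodeDisruption; the exact group string is not shown in this diff.
var nodeDisruptionGVR = schema.GroupVersionResource{
	Group:    "nodedisruption.criteo.com",
	Version:  "v1alpha1",
	Resource: "nodedisruptions",
}

// requestNodeDisruption creates a NodeDisruption targeting one node and polls
// until the controller has granted or rejected it. The spec/status field names
// used here are illustrative assumptions.
func requestNodeDisruption(ctx context.Context, client dynamic.Interface, namespace, nodeName string) (bool, error) {
	nd := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": nodeDisruptionGVR.Group + "/" + nodeDisruptionGVR.Version,
		"kind":       "NodeDisruption",
		"metadata":   map[string]interface{}{"generateName": "maintenance-"},
		"spec": map[string]interface{}{
			"nodeSelector": map[string]interface{}{
				"matchLabels": map[string]interface{}{"kubernetes.io/hostname": nodeName},
			},
		},
	}}
	created, err := client.Resource(nodeDisruptionGVR).Namespace(namespace).Create(ctx, nd, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	for {
		// A production tool would use a watch; polling keeps the sketch short.
		current, err := client.Resource(nodeDisruptionGVR).Namespace(namespace).Get(ctx, created.GetName(), metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		state, _, _ := unstructured.NestedString(current.Object, "status", "state")
		switch state {
		case "granted":
			return true, nil // safe to drain / reboot the node
		case "rejected":
			return false, nil // a budget refused the disruption, try again later
		}
		time.Sleep(2 * time.Second)
	}
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	granted, err := requestNodeDisruption(context.Background(), client, "default", "node-1")
	if err != nil {
		panic(err)
	}
	fmt.Println("disruption granted:", granted)
}
```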

@@ -47,7 +47,7 @@ different from the PDB on the following point:
- Can perform a synchronous service level health check. PDB is only looking at the readiness probes of individual pods
- A pod can have more than one budget

-Finally, NDB's goal is to make the implementation by the workload owners simple. Owners don't have to implement all
+Finally, NDC's goal is to make the implementation by the workload owners simple. Owners don't have to implement all
the features attain a level of safety they are satisfied with:
- Stateless workload owners don't have to use Node Disruption Budget as they can use PodDisruptionBudgets (PDB)
- Stateful owner can only provide a budget without service health-checking
@@ -89,7 +89,7 @@ stateDiagram

##### Retry

-NodeDisruption can be retried automatically. The main purpose of retry is to allow prepation for a disruption. Some disruptions cannot be allowed until some preparations operations have been completed (e.g. replicating data out of the node). A controller can consumme the pending NodeDisruption, perform the required operations and finaly accept the disruption when it is safe.
+NodeDisruption can be retried automatically. The main purpose of retry is to allow preparation for a disruption. Some disruptions cannot be allowed until some preparations operations have been completed (e.g. replicating data out of the node). A controller can consume the pending NodeDisruption, perform the required operations and finally accept the disruption when it is safe.

#### Sample object

@@ -265,7 +265,7 @@ TeamK can create a NodeDisruptionBudget to protect the number of concurrent Node
of nodes.
TeamK, before doing a maintenance of a node will create a NodeDisruption. The controller will check
-wich budgets are impacted by the disruption and check if they can tolerate one more disruption.
+which budgets are impacted by the disruption and check if they can tolerate one more disruption.
If not, the NodeDisruption will be rejected. TeamK will have to retry creating a NodeDisruption
later.
If it is granted, TeamK can disrupt the node. In this case, the disruption will be the drain and
@@ -274,8 +274,7 @@ reboot of the Node in question.
### Node autoscaling system
The same features can apply to a Node autoscaling system that needs safely scale down a Kubernetes cluster.
-Let's say the system wants to remove 10 nodes from a Kubernetes cluster. It can select nodes randomly, try to create a NodeDisruption, if it is granted, the node can be drained and removed. If it's rejected, the system
-can try another node or try later. The budgets ensure that all the drains and node removals are safe.
+Let's say the system wants to remove 10 nodes from a Kubernetes cluster. It can select nodes randomly, try to create a NodeDisruption, if it is granted, the node can be drained and removed. If it's rejected, the system can try another node or try later. The budgets ensure that all the drains and node removals are safe.
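A rough sketch of that scale-down loop (the disruption request and the drain/removal steps are passed in as callbacks; `requestDisruption` could wrap the hypothetical `requestNodeDisruption` helper sketched earlier):

```go
package autoscaler

// scaleDown tries to remove up to want nodes, requesting a NodeDisruption for
// each candidate and only draining nodes whose disruption was granted.
func scaleDown(candidates []string, want int,
	requestDisruption func(node string) (bool, error),
	drainAndRemove func(node string) error) (removed int, err error) {
	for _, node := range candidates {
		if removed >= want {
			break
		}
		granted, err := requestDisruption(node)
		if err != nil {
			return removed, err
		}
		if !granted {
			// A budget rejected this disruption: skip the node and try
			// another candidate, or retry the whole pass later.
			continue
		}
		if err := drainAndRemove(node); err != nil {
			return removed, err
		}
		removed++
	}
	return removed, nil
}
```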
## Getting Started
You’ll need a Kubernetes cluster to run against. You can use [KIND](https://sigs.k8s.io/kind) to get a local cluster for testing, or run against a remote cluster.
6 changes: 6 additions & 0 deletions config/manager/kustomization.yaml
@@ -1,2 +1,8 @@
resources:
- manager.yaml
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+images:
+- name: controller
+  newName: controller
+  newTag: latest
4 changes: 2 additions & 2 deletions internal/controller/applicationdisruptionbudget_controller.go
@@ -70,7 +70,7 @@ func (r *ApplicationDisruptionBudgetReconciler) Reconcile(ctx context.Context, r

if err != nil {
if errors.IsNotFound(err) {
-// If the ressource was not found, nothing has to be done
+// If the resource was not found, nothing has to be done
return ctrl.Result{}, nil
}
return ctrl.Result{}, err
@@ -194,7 +194,7 @@ func (r *ApplicationDisruptionBudgetResolver) GetNamespacedName() nodedisruption
}
}

-// Check health make a synchronous health check on the underlying ressource of a budget
+// Check health make a synchronous health check on the underlying resource of a budget
func (r *ApplicationDisruptionBudgetResolver) CheckHealth(context.Context) error {
if r.ApplicationDisruptionBudget.Spec.HealthURL == nil {
return nil
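The hunk above only shows the early return when no `HealthURL` is set. Below is a minimal sketch of what a synchronous check against `Spec.HealthURL` could look like, assuming a plain HTTP GET with any non-2xx response treated as unhealthy; this is an illustration, not the controller's actual implementation.

```go
package healthcheck // illustrative package, not part of the repository

import (
	"context"
	"fmt"
	"net/http"
)

// checkHealthURL issues a GET against a budget's health URL and reports any
// non-2xx answer as an error, which would lead to the disruption being refused.
func checkHealthURL(ctx context.Context, healthURL string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, healthURL, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return fmt.Errorf("health check %s returned status %d", healthURL, resp.StatusCode)
	}
	return nil
}
```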
@@ -71,7 +71,7 @@ var _ = Describe("ApplicationDisruptionBudget controller", func() {
})

AfterEach(func() {
-clearAllNodeDisruptionRessources()
+clearAllNodeDisruptionResources()
})

BeforeAll(func() {
@@ -211,15 +211,15 @@ var _ = Describe("ApplicationDisruptionBudget controller", func() {
When("PV doesn't have an affinity", func() {
It("is ignored by ADB", func() {
By("Creating PV without NodeAffinity")
-ressources := make(corev1.ResourceList, 1)
-ressources[corev1.ResourceStorage] = *resource.NewQuantity(100, ressources.Storage().Format)
+resources := make(corev1.ResourceList, 1)
+resources[corev1.ResourceStorage] = *resource.NewQuantity(100, resources.Storage().Format)

PVWithoutAffinity := &corev1.PersistentVolume{
ObjectMeta: metav1.ObjectMeta{
Name: "remote-pv",
},
Spec: corev1.PersistentVolumeSpec{
-Capacity: ressources,
+Capacity: resources,
AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
PersistentVolumeSource: corev1.PersistentVolumeSource{CSI: &corev1.CSIPersistentVolumeSource{Driver: "test", VolumeHandle: "test"}},
NodeAffinity: nil,
2 changes: 1 addition & 1 deletion internal/controller/budget.go
@@ -15,7 +15,7 @@ type Budget interface {
IsImpacted(resolver.NodeSet) bool
// Return the number of disruption allowed considering a list of current node disruptions
TolerateDisruption(resolver.NodeSet) bool
-// Check health make a synchronous health check on the underlying ressource of a budget
+// Check health make a synchronous health check on the underlying resource of a budget
CheckHealth(context.Context) error
// Call a lifecycle hook in order to synchronously validate a Node Disruption
CallHealthHook(context.Context, nodedisruptionv1alpha1.NodeDisruption) error
4 changes: 2 additions & 2 deletions internal/controller/budget_test.go
@@ -35,13 +35,13 @@ func (m *MockBudget) TolerateDisruption(resolver.NodeSet) bool {
return m.tolerate
}

-// Check health make a synchronous health check on the underlying ressource of a budget
+// Check health make a synchronous health check on the underlying resource of a budget
func (m *MockBudget) CheckHealth(context.Context) error {
m.healthChecked = true
return m.health
}

-// Check health make a synchronous health check on the underlying ressource of a budget
+// Check health make a synchronous health check on the underlying resource of a budget
func (m *MockBudget) CallHealthHook(context.Context, nodedisruptionv1alpha1.NodeDisruption) error {
m.healthChecked = true
return m.health
4 changes: 2 additions & 2 deletions internal/controller/nodedisruption_controller.go
@@ -72,7 +72,7 @@ func (r *NodeDisruptionReconciler) Reconcile(ctx context.Context, req ctrl.Reque

if err != nil {
if errors.IsNotFound(err) {
-// If the ressource was not found, nothing has to be done
+// If the resource was not found, nothing has to be done
return clusterResult, nil
}
return clusterResult, err
@@ -101,7 +101,7 @@ func (r *NodeDisruptionReconciler) Reconcile(ctx context.Context, req ctrl.Reque
logger.Info("Updating Status, done with", "state", nd.Status.State)
return clusterResult, reconciler.UpdateStatus(ctx)
}
logger.Info("Reconcilation successful", "state", nd.Status.State)
logger.Info("Reconciliation successful", "state", nd.Status.State)
return clusterResult, nil
}

8 changes: 4 additions & 4 deletions internal/controller/nodedisruption_controller_test.go
@@ -38,7 +38,7 @@ import (

// +kubebuilder:docs-gen:collapse=Imports

-func clearAllNodeDisruptionRessources() {
+func clearAllNodeDisruptionResources() {
// It doesn't seem possible to wipe in all namespace so we walk through all of them
namespaces := &corev1.NamespaceList{}
err := k8sClient.List(context.Background(), namespaces)
@@ -192,7 +192,7 @@ var _ = Describe("NodeDisruption controller", func() {
})

AfterEach(func() {
-clearAllNodeDisruptionRessources()
+clearAllNodeDisruptionResources()
})

When("there are no budgets in the cluster", func() {
@@ -495,7 +495,7 @@ var _ = Describe("NodeDisruption controller", func() {
Expect(k8sClient.Create(ctx, disruption.DeepCopy())).Should(Succeed())
})
AfterEach(func() {
-clearAllNodeDisruptionRessources()
+clearAllNodeDisruptionResources()
cancelFn()
})

@@ -549,7 +549,7 @@ var _ = Describe("NodeDisruption controller", func() {
createNodeDisruption(firstDisruptionName, NDNamespace, nodeLabels1, ctx)
})
AfterEach(func() {
-clearAllNodeDisruptionRessources()
+clearAllNodeDisruptionResources()
cancelFn()
})

2 changes: 1 addition & 1 deletion internal/controller/nodedisruptionbudget_controller.go
@@ -62,7 +62,7 @@ func (r *NodeDisruptionBudgetReconciler) Reconcile(ctx context.Context, req ctrl

if err != nil {
if errors.IsNotFound(err) {
-// If the ressource was not found, nothing has to be done
+// If the resource was not found, nothing has to be done
return ctrl.Result{}, nil
}
return ctrl.Result{}, err
@@ -61,7 +61,7 @@ var _ = Describe("NodeDisruptionBudget controller", func() {
})

AfterEach(func() {
-clearAllNodeDisruptionRessources()
+clearAllNodeDisruptionResources()
})

When("Nodes changes in the cluster", func() {
12 changes: 6 additions & 6 deletions internal/controller/suite_test.go
@@ -74,8 +74,8 @@ func newPod(name, namespace, nodeName string, labels map[string]string) corev1.P
}

func newPVC(name, namespace, pvName string, labels map[string]string) corev1.PersistentVolumeClaim {
-ressources := make(corev1.ResourceList, 1)
-ressources[corev1.ResourceStorage] = *resource.NewQuantity(100, ressources.Storage().Format)
+resources := make(corev1.ResourceList, 1)
+resources[corev1.ResourceStorage] = *resource.NewQuantity(100, resources.Storage().Format)
return corev1.PersistentVolumeClaim{
ObjectMeta: metav1.ObjectMeta{
Name: name,
@@ -85,21 +85,21 @@ func newPVC(name, namespace, pvName string, labels map[string]string) corev1.Per
Spec: corev1.PersistentVolumeClaimSpec{
VolumeName: pvName,
AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
-Resources: corev1.ResourceRequirements{Requests: ressources},
+Resources: corev1.ResourceRequirements{Requests: resources},
},
Status: corev1.PersistentVolumeClaimStatus{},
}
}

func newPVforNode(nodeName string) (pv corev1.PersistentVolume) {
-ressources := make(corev1.ResourceList, 1)
-ressources[corev1.ResourceStorage] = *resource.NewQuantity(100, ressources.Memory().Format)
+resources := make(corev1.ResourceList, 1)
+resources[corev1.ResourceStorage] = *resource.NewQuantity(100, resources.Memory().Format)
return corev1.PersistentVolume{
ObjectMeta: metav1.ObjectMeta{
Name: fmt.Sprintf("%s-pv-local", nodeName),
},
Spec: corev1.PersistentVolumeSpec{
-Capacity: ressources,
+Capacity: resources,
AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
PersistentVolumeSource: corev1.PersistentVolumeSource{Local: &corev1.LocalVolumeSource{Path: "path/to/nothing"}},
NodeAffinity: &corev1.VolumeNodeAffinity{Required: &corev1.NodeSelector{
2 changes: 1 addition & 1 deletion pkg/resolver/resolver.go
@@ -19,7 +19,7 @@ import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

-// Resolver use Kubernetes selectors to return ressources
+// Resolver use Kubernetes selectors to return resources
type Resolver struct {
Client client.Client
}
