Mechanic

Working under the hood to stop disruptions to your AKS nodes

Description

mechanic is a tool for AKS clusters that helps mitigate the impact of platform maintenance events. Its primary focus is keeping applications unaffected by maintenance events that require node reboots or live migrations, without moving pods unnecessarily or causing application downtime.

It does this by monitoring node conditions and, when a maintenance event is indicated, querying the Azure Instance Metadata Service (IMDS) for maintenance event details. If the event is deemed impactful to the node, mechanic cordons and drains the node so that pods are rescheduled to other nodes before the maintenance event occurs.
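As a rough illustration of the IMDS step, the sketch below reads a Scheduled Events document and keeps the events that target a given VM. It is not mechanic's actual code. The endpoint and the `Metadata: true` header are the documented IMDS Scheduled Events interface; the `isImpactful` policy, the type and function names, and the sample payload with its `vm_0` and `vm_1` resource names are assumptions made for this example.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// Scheduled Events endpoint of the Azure Instance Metadata Service.
const imdsURL = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"

// scheduledEvent holds a subset of the fields of one Scheduled Events entry.
type scheduledEvent struct {
	EventID     string   `json:"EventId"`
	EventType   string   `json:"EventType"`
	EventStatus string   `json:"EventStatus"`
	Resources   []string `json:"Resources"`
	NotBefore   string   `json:"NotBefore"`
}

type scheduledEventsDoc struct {
	Events []scheduledEvent `json:"Events"`
}

// fetchScheduledEvents retrieves the raw document from IMDS.
// It only works from inside an Azure VM, so main below does not call it.
func fetchScheduledEvents(ctx context.Context, client *http.Client) ([]byte, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, imdsURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Metadata", "true") // IMDS rejects requests without this header
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("IMDS returned %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// isImpactful is a hypothetical policy: treat reboots, redeploys, and
// freezes (which include live migrations) as impactful to the node.
func isImpactful(e scheduledEvent) bool {
	switch e.EventType {
	case "Reboot", "Redeploy", "Freeze":
		return true
	}
	return false
}

// impactfulEvents parses raw and returns the impactful events that list vmName.
func impactfulEvents(raw []byte, vmName string) ([]scheduledEvent, error) {
	var doc scheduledEventsDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, fmt.Errorf("decode scheduled events: %w", err)
	}
	var out []scheduledEvent
	for _, e := range doc.Events {
		if !isImpactful(e) {
			continue
		}
		for _, r := range e.Resources {
			if r == vmName {
				out = append(out, e)
				break
			}
		}
	}
	return out, nil
}

// samplePayload is an invented document in the Scheduled Events shape.
const samplePayload = `{"DocumentIncarnation":1,"Events":[
 {"EventId":"A","EventType":"Reboot","EventStatus":"Scheduled","Resources":["vm_0"],"NotBefore":"Mon, 19 Sep 2016 18:29:47 GMT"},
 {"EventId":"B","EventType":"Freeze","EventStatus":"Scheduled","Resources":["vm_1"],"NotBefore":"Mon, 19 Sep 2016 18:45:00 GMT"}
]}`

func main() {
	events, err := impactfulEvents([]byte(samplePayload), "vm_0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range events {
		fmt.Printf("%s: %s (%s), not before %s\n", e.EventID, e.EventType, e.EventStatus, e.NotBefore)
	}
}
```

Keeping the parsing and the impact policy separate from the HTTP call means the decision logic can be exercised with canned payloads, without access to a real node.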

What's the best way to use this?

mechanic works best alongside Cluster Autoscaler. The built-in node problem detector implementation used by AKS manages the VMEventScheduled node condition, which triggers the drain functionality.

As pods are drained from the node, a cluster without Cluster Autoscaler could exhaust its available compute resources. Using Cluster Autoscaler (CAS) or Node Autoprovisioning ensures that the cluster can scale to meet the demands of the pods being rescheduled.

Installing mechanic in a cluster

The recommended way to run mechanic is as a DaemonSet, which ensures that each node in the cluster has a monitor that can coordinate cordon and drain operations. There are some limitations at this time:

No ARM nodes are supported. The container images for mechanic are built for amd64 architectures.
No Windows node support. The container images target a Linux environment.

mechanic is offered as a base set of YAML manifests that can be applied to your cluster with Kustomize. For details on generating valid YAML to install the DaemonSet, see the installation guide.

There are some caveats and items worth noting:

The DaemonSet is deployed in a custom mechanic namespace. This is to ensure that the DaemonSet can be managed independently of other resources in the cluster.
The Kustomize base references a prebuilt image hosted in the GitHub Container Registry packages of this repository. If you prefer, you can build your own image, or pull the prebuilt image and push it into your own registry. Once the image is in your registry, create a patch so that Kustomize updates the image URL.
All images use a base container image of Azure Linux.

How does it work?

mechanic runs as a DaemonSet in your cluster. Each daemon pod monitors node updates and, for each update, checks the node conditions. If a VMEventScheduled condition is present, it queries the Instance Metadata Service for maintenance information.
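The condition check described above might look roughly like the following. This is a minimal sketch rather than mechanic's implementation: `nodeCondition` is a local stand-in for the Kubernetes node condition type so the example stays self-contained, and treating a status of "True" as "event scheduled" is an assumption.

```go
package main

import "fmt"

// nodeCondition is a simplified stand-in for a Kubernetes node condition.
type nodeCondition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// Condition type named in this README.
const vmEventScheduled = "VMEventScheduled"

// hasScheduledEvent reports whether the VMEventScheduled condition is
// present with status "True" (assumed here to mean an event is pending).
func hasScheduledEvent(conds []nodeCondition) bool {
	for _, c := range conds {
		if c.Type == vmEventScheduled && c.Status == "True" {
			return true
		}
	}
	return false
}

func main() {
	conds := []nodeCondition{
		{Type: "Ready", Status: "True"},
		{Type: vmEventScheduled, Status: "True"},
	}
	if hasScheduledEvent(conds) {
		fmt.Println("maintenance indicated: query IMDS for details")
	}
}
```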

If the maintenance event is deemed impactful, mechanic cordons the node and begins draining pods to other nodes in the cluster. During the drain flow, a label (mechanic.cordoned) is added to the node to indicate that it was cordoned by mechanic. If the daemon pod is restarted, it checks for this label and uses it as an input when deciding whether to uncordon the node once the VMEventScheduled condition is no longer present.
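The role of the label can be sketched as a small decision function. The label key mechanic.cordoned comes from this README; everything else here (the `nodeState` type, the `action` values, and checking only for the presence of the label rather than a specific value) is a hypothetical simplification and may differ from what mechanic actually does.

```go
package main

import "fmt"

// Label key that mechanic adds to nodes it cordons.
const cordonLabel = "mechanic.cordoned"

// nodeState is a hypothetical summary of what a daemon pod sees on its node.
type nodeState struct {
	Labels         map[string]string
	EventScheduled bool // VMEventScheduled condition is present
}

type action int

const (
	noOp action = iota
	cordonAndDrain
	uncordon
)

// decide picks the next step for a node. impactful is the result of
// evaluating the maintenance event details returned by IMDS.
func decide(n nodeState, impactful bool) action {
	_, cordonedByUs := n.Labels[cordonLabel]
	switch {
	case n.EventScheduled && impactful && !cordonedByUs:
		return cordonAndDrain
	case !n.EventScheduled && cordonedByUs:
		// The label shows the cordon was ours, so it is safe to undo,
		// even if the daemon pod restarted in the meantime.
		return uncordon
	default:
		return noOp
	}
}

func main() {
	afterRestart := nodeState{Labels: map[string]string{cordonLabel: "true"}}
	fmt.Println(decide(afterRestart, false) == uncordon)
}
```

Because the label lives on the node rather than in the pod's memory, a node cordoned by an administrator for other reasons carries no label and is left alone in this sketch.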

I'm interested in contributing!

Great! We're always looking for contributors to help improve the project. If you're interested in contributing, please see the contributing docs for more information on how to get started.

