This repository implements the FAF architecture in Kubernetes. It’s intended to supersede the current FAF docker-compose stack.

- Allow more people access to logs, configuration and deployments for certain services without giving them full server access.
  - Role-based access control
  - Direct cluster access via kubectl for all authorized developers
  - Improved configuration and secret management
- Advanced resource controls thanks to CPU and RAM limits (no app shall ever consume all CPU again); see the sketch after this list.
- Easier debugging on the test environment thanks to port-forwarding of pods, without compromising the production config.
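As a minimal sketch of what the resource controls look like in a manifest (all names, images and numbers are illustrative, not taken from the actual stack):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app               # hypothetical name
spec:
  containers:
    - name: app
      image: example/image:latest # placeholder image
      resources:
        requests:                 # guaranteed baseline used for scheduling
          cpu: 250m
          memory: 256Mi
        limits:                   # hard cap enforced at runtime
          cpu: "1"
          memory: 512Mi
```

Debugging on the test environment then mostly boils down to a `kubectl port-forward` against the pod in question.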
- We do not move to k8s because it is cool and fancy!
  - K8s has much more complexity compared to our docker-compose stack.
  - We would avoid it if docker-compose could solve our problems (which it can’t).
  - It was a very conscious trade-off decision.
- We do not move to k8s because we want to deploy FAF on a managed cloud provider.
  - Cloud providers are super expensive. We’d have nothing to gain here.
- We do not move to k8s to become highly available!
  - High availability only works if all components are highly available. Most of our apps are not built that way at all.
  - Deployments with less downtime might still be a benefit for some services.
- We’ll use k3s.
  - It is fully supported by NixOS and is a simplified distribution which should be easier to maintain.
  - It also runs on developer machines.
  - It uses few resources.
- Running the same distribution on prod and on local machines makes things more predictable and scripts more stable.
  - Minikube should be mostly compatible, if some devs insist on using it.
- We’ll run with manually managed persistent volumes and claims, because we need predictable paths.
  - Predictable paths are a necessity for managing the volumes with ZFS.
  - With the k3s local-path-provisioner we can define the prefix (in the configmap `local-path-config`) and the suffix (in the mount options of the pod), but in between there is a random UUID we can’t know beforehand. This breaks predefined setups and scripts.
  - Manually defined local volumes with node affinity ensure that all data of a volume is stored on a selected node (the node carrying the label `storage-id=main-01`); see the sketch below.
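A minimal sketch of such a manually managed volume and its claim (names, namespace, path and size are illustrative; the path would point at a ZFS dataset mountpoint on the storage node):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-data                 # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual           # matched by the claim below
  local:
    path: /tank/k8s/example-data     # hypothetical ZFS dataset mountpoint
  nodeAffinity:                      # pins the data to the storage node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: storage-id
              operator: In
              values:
                - main-01
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
  namespace: example                 # hypothetical namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  volumeName: example-data           # bind to the exact PV above
  resources:
    requests:
      storage: 10Gi
```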
- K3s comes with Traefik as the default Ingress controller.
  - The default Ingress controller in the outside world is nginx.
  - Traefik is well known to FAF, since we already use it extensively as reverse proxy in our faf-stack.
  - Traefik supports
    - classic Ingress definitions, which however require ingress annotations to use the more advanced features (similar to the Traefik labels in our current docker-compose.yml), and
    - custom IngressRoute definitions, which map the exact Traefik feature set into a YAML format (no annotations required).
  - We have to select which resource type we use and we should stick to it consistently. As always it’s a trade-off:
    - Pro classic Ingress:
      - Classic Ingress has been stable for a (not so long) while now, while Traefik IngressRoutes are still marked as alpha (yet we have used Traefik for quite a while and there were rarely breaking changes, even from 1.x to 2.x).
      - Classic Ingress is a well-known syntax understood by most external K8s users, so the entry barrier for external contributions is lower. However, a lot of functionality would hide behind Traefik annotations, which people would still need to learn to understand it all.
      - Using classic Ingress would allow us to swap out Traefik at any time and still have a mostly working setup.
    - Pro Traefik IngressRoute:
      - We (the Ops people responsible for FAF) see Traefik as superior to nginx (and moved from nginx to Traefik as reverse proxy years ago).
        - Thus we do not expect to move back.
      - We have an existing stack we need to migrate 1:1.
      - Since we use Traefik features anyway, IngressRoutes reduce the overall YAML complexity, as we do not split logic between the resource and its annotations.
      - The Traefik syntax seems easier to understand than regular Ingress, so using it might lower the barrier for external contributors who have never used classic Ingress.
- Decision: We’ll use Traefik IngressRoutes (see the sketch below).
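A minimal sketch of such an IngressRoute (host, namespace and service names are illustrative; `traefik.containo.us/v1alpha1` is the Traefik 2.x API group, newer releases also serve `traefik.io/v1alpha1`):

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: example-web                          # hypothetical name
  namespace: example                         # hypothetical namespace
spec:
  entryPoints:
    - websecure                              # default HTTPS entrypoint name in the Traefik chart
  routes:
    - match: Host(`example.faforever.com`)   # hypothetical host
      kind: Rule
      services:
        - name: example-service              # hypothetical backend service
          port: 8080
  tls:
    certResolver: letsencrypt                # see the certificate section below
```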
- We could go with Traefik’s built-in certificate resolvers or use cert-manager.
  - cert-manager works with both classic Ingress and Traefik-specific IngressRoutes.
    - It needs additional software.
    - It has a short support cycle (6 months per point release).
    - ⇒ More maintenance overhead.
  - Traefik’s internal Let’s Encrypt resolver needs to be configured manually on the node.
    - It stores the certificates somewhere on disk.
      - The easiest approach is a persistent volume on the main storage node.
        - This effectively restricts Traefik to run on a single node.
      - A more sophisticated approach is storing the certificates on a persistent remote / network volume.
    - Once we have full Cloudflare access, we can use a Cloudflare DNS challenge with a Cloudflare token. Then Traefik does not need to issue one certificate per subdomain. It’s unclear though whether this makes persisting the certificates obsolete.
- Decision: We’ll use the Traefik resolver as long as we don’t run into any problems, since it seems to be the smaller maintenance burden. cert-manager can still be introduced later if required (see the sketch below).
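As a rough sketch, the resolver boils down to a piece of Traefik static configuration like the following (resolver name, e-mail address and storage path are illustrative; the Cloudflare token would be injected as the `CF_DNS_API_TOKEN` environment variable from a secret, and how this is fed into the k3s-managed Traefik deployment is still to be worked out):

```yaml
certificatesResolvers:
  letsencrypt:                      # hypothetical resolver name
    acme:
      email: admin@faforever.com    # hypothetical contact address
      storage: /data/acme.json      # must live on persistent storage
      dnsChallenge:
        provider: cloudflare        # reads CF_DNS_API_TOKEN from the environment
        resolvers:
          - "1.1.1.1:53"            # DNS servers used for the propagation check
```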
For RabbitMQ there are three potential ways of implementing it:

- Manually define a single-node statefulset as a 1:1 copy of faf-stack.
- Use the Helm chart from Bitnami.
- Deploy the RabbitMQ operator.

Decision: We’ll go with the Bitnami Helm chart. It is highly configurable, so it can read our existing secrets and the templates can be set up exactly as we need them. This simplifies things compared to a manual statefulset. The RabbitMQ operator seems much more complex for now (see the sketch below).
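A sketch of what the chart values could look like (secret names, keys and sizes are illustrative; the exact parameter names depend on the pinned chart version and should be checked against its values.yaml):

```yaml
auth:
  username: faf-rabbitmq                        # hypothetical admin user
  existingPasswordSecret: rabbitmq-credentials  # secret expected to hold rabbitmq-password
  existingErlangSecret: rabbitmq-credentials    # secret expected to hold rabbitmq-erlang-cookie
replicaCount: 1                                 # single node, matching the current faf-stack
persistence:
  enabled: true
  existingClaim: rabbitmq-data                  # pre-created, manually managed PVC (see above)
resources:
  limits:
    cpu: "1"
    memory: 1Gi
```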
- We want to give access to multiple people with potentially different permissions.
  - Handing out service account certificates is quite annoying.
  - An SSO login via OIDC is preferred and supported by K8s / K3s.
    - The preferred identity provider would be GitHub, as all developers are there and it is outside the system itself. Unfortunately GitHub only supports OAuth2 and not OIDC.
    - Google accounts would be an alternative, but we don’t want to force people onto Google.
    - We’ll use FAF’s custom login instead (see the sketch below).
    - As a fallback (in case the FAF login is broken) we still have the main service account.
- RBAC t.b.d.
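Wiring this up is essentially a matter of passing the usual OIDC flags to the API server, e.g. via the k3s config file. A sketch (issuer URL, client id and claim names are illustrative and depend on how the FAF login exposes OIDC):

```yaml
# /etc/rancher/k3s/config.yaml
kube-apiserver-arg:
  - "oidc-issuer-url=https://auth.example.faforever.com/"  # hypothetical issuer
  - "oidc-client-id=faf-k8s"                               # hypothetical client id
  - "oidc-username-claim=sub"
  - "oidc-groups-claim=roles"   # groups claim to be referenced by RBAC bindings later
```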
- No service shall go live if its initial configuration or installation can’t be scripted.
- Everything must be runnable on a single-node cluster.
- Scripts shall be idempotent / re-runnable without fatal consequences. We will use k8s annotations to keep track of the state.
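For the last point, one hypothetical pattern (the annotation keys below are made up for illustration) is to mark a resource once an initialization step has run, so a re-run of the script can detect it and skip or upgrade:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: faf-apps                                # hypothetical namespace
  annotations:
    faf.example.com/rabbitmq-init: "done"       # set by the setup script after a successful run
    faf.example.com/rabbitmq-init-version: "1"  # bumped to force a deliberate re-run
```

A script would read the annotation with kubectl before deciding whether to execute the step again.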