This is a repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Kubernetes, Flux, Renovate and GitHub Actions.
This semi-hyper-converged cluster runs Talos Linux, an immutable and ephemeral Linux distribution tailored for Kubernetes, and is deployed on bare-metal Lenovo ThinkCentre PCs. Persistent file storage is currently provided by a custom fork of the Synology CSI driver; however, I plan to incorporate Rook to enable block and object storage within the cluster. A separate NAS handles media file storage and backups. The cluster is designed to allow a full teardown without any data loss.
Click here to see my Talos configuration.
There is a template at onedr0p/cluster-template if you want to follow along with many of the practices I use here.
- cert-manager: Manages SSL certificates for services in my cluster.
- cilium: Kubernetes Container Network Interface (CNI) for in-cluster networking.
- cloudflared: Enables external access to certain services via Cloudflare tunnels.
- external-dns: Automatically syncs ingress DNS records to a DNS provider.
- external-secrets: Manages Kubernetes secrets using 1Password Connect.
- ingress-nginx: Kubernetes ingress controller using NGINX as a reverse proxy and load balancer.
- rook: Distributed block, file, and object storage for stateful workloads. [WIP]
- sops: Manages Kubernetes secrets that are committed to Git (used solely for the initial bootstrapping of the cluster).
- spegel: Stateless cluster-local OCI registry mirror.
- volsync: Backup and recovery of persistent volume claims.
Flux monitors my `kubernetes` folder (see the directory layout below) and applies changes to my cluster based on the YAML manifests it finds there. Flux works by recursively searching the `kubernetes/apps` folder until it locates the top-level `kustomization.yaml` in each directory, then applies all the resources listed in it. This `kustomization.yaml` typically contains a namespace resource and one or more Flux kustomizations. These Flux kustomizations usually include a `HelmRelease` or other application-related resources, which are then applied.
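The discovery step described above can be modelled with a short Python sketch. It assumes, as a simplification, that the search stops descending a branch as soon as it finds a `kustomization.yaml`, since that file decides what is applied beneath it. The function name `find_kustomization_roots` is hypothetical, and this is an illustration of the idea rather than Flux's actual implementation.

```python
from pathlib import Path


def find_kustomization_roots(root: Path) -> list[Path]:
    """Return the first directory on each branch under `root` that holds a kustomization.yaml.

    Simplified model: once a kustomization.yaml is found, the search does not
    descend further, because that file decides what is applied beneath it.
    """
    found: list[Path] = []

    def walk(directory: Path) -> None:
        if (directory / "kustomization.yaml").is_file():
            found.append(directory)
            return
        # Visit subdirectories in a stable, alphabetical order.
        for child in sorted(p for p in directory.iterdir() if p.is_dir()):
            walk(child)

    walk(root)
    return found
```

Calling it on a local checkout, e.g. `find_kustomization_roots(Path("kubernetes/apps"))`, would list the directories whose `kustomization.yaml` gets applied; nested `kustomization.yaml` files further down are left for the Flux kustomizations those files reference.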
Renovate monitors my entire repository for dependency updates, automatically creating a PR when updates are found. When the relevant PRs are merged, Flux then applies the changes to my cluster.
This Git repository contains the following directories under kubernetes/.
📁 kubernetes
├── 📁 apps # applications
├── 📁 bootstrap # bootstrap procedures
├── 📁 flux # core flux configuration
└── 📁 templates # reusable components
This is a high-level look at how Flux deploys my applications with dependencies. Below there are three Flux kustomizations: `postgres`, `postgres-cluster`, and `atuin`. `postgres` is the first app that needs to be running and healthy before `postgres-cluster` is deployed, and once `postgres-cluster` is healthy, `atuin` will be deployed.
graph TD;
id1>Kustomization: cluster] -->|Creates| id2>Kustomization: cluster-apps];
id2>Kustomization: cluster-apps] -->|Creates| id3>Kustomization: postgres];
id2>Kustomization: cluster-apps] -->|Creates| id5>Kustomization: postgres-cluster]
id2>Kustomization: cluster-apps] -->|Creates| id8>Kustomization: atuin]
id3>Kustomization: postgres] -->|Creates| id4[HelmRelease: postgres];
id5>Kustomization: postgres-cluster] -->|Depends on| id3>Kustomization: postgres];
id5>Kustomization: postgres-cluster] -->|Creates| id10[Postgres Cluster];
id8>Kustomization: atuin] -->|Creates| id9(HelmRelease: atuin);
id8>Kustomization: atuin] -->|Depends on| id5>Kustomization: postgres-cluster];
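The ordering in the diagram amounts to a topological sort over the "Depends on" edges. Flux expresses these edges with the Kustomization `dependsOn` field and waits for dependencies to become ready; the sketch below, built on Python's standard-library `graphlib`, only models the resulting order and is not how Flux schedules reconciliations internally. `deployment_waves` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# "Depends on" edges from the diagram: each key waits for the kustomizations in its set.
DEPENDS_ON: dict[str, set[str]] = {
    "postgres": set(),
    "postgres-cluster": {"postgres"},
    "atuin": {"postgres-cluster"},
}


def deployment_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group kustomizations into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a loop
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark this wave healthy so its dependents become ready
    return waves
```

For the three kustomizations above, `deployment_waves(DEPENDS_ON)` yields one wave per app, `postgres`, then `postgres-cluster`, then `atuin`, matching the chain in the diagram. Apps that share a dependency but not each other would land in the same wave.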
In my cluster there are two instances of ExternalDNS running: one for syncing private DNS records to my router using the ExternalDNS webhook provider for UniFi, and another for syncing public DNS records to Cloudflare. This setup is managed by creating ingresses with one of two classes: `internal` for private DNS and `external` for public DNS. The `external-dns` instances then sync the DNS records to their respective platforms.
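To illustrate the split, here is a small Python model of which platform each ingress ends up on. It assumes each ExternalDNS instance only picks up ingresses of its own class and ignores all others; the `Ingress` dataclass, the `DNS_TARGETS` mapping, and the example hostnames are hypothetical and are not part of ExternalDNS's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ingress:
    name: str
    host: str
    ingress_class: str  # "internal" or "external" in this setup


# Hypothetical mapping from ingress class to the ExternalDNS instance that handles it.
DNS_TARGETS = {
    "internal": "unifi",  # private records via the UniFi webhook provider
    "external": "cloudflare",  # public records
}


def records_by_target(ingresses: list[Ingress]) -> dict[str, list[str]]:
    """Bucket ingress hostnames by the DNS platform that should receive them.

    Assumption: ingresses with any other class are skipped, since neither instance watches them.
    """
    buckets: dict[str, list[str]] = {target: [] for target in DNS_TARGETS.values()}
    for ingress in ingresses:
        target = DNS_TARGETS.get(ingress.ingress_class)
        if target is not None:
            buckets[target].append(ingress.host)
    return buckets
```

Keeping the class as the only routing signal means exposing a service publicly is a one-field change on its ingress.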
While most of my infrastructure and workloads are self-hosted, I do rely upon the cloud for certain key parts of my setup. This saves me from having to worry about three things:
- Dealing with chicken-and-egg scenarios.
- Critical services that need to be accessible whether my cluster is online or not.
- The "hit by a bus" scenario: what happens to critical apps (e.g. email, password manager, photos) that my friends and family rely on when I'm no longer around.
Alternative solutions to the first two of these problems would be to host a Kubernetes cluster in the cloud and deploy applications like Vault, Vaultwarden, ntfy, and Gatus. However, maintaining another cluster and monitoring another group of workloads would frankly be more time and effort than I am willing to put in, and would probably cost as much as or more than the services described below.
Service | Use | Cost |
---|---|---|
1Password | Secrets with External Secrets | ~$36/yr |
Cloudflare | Domain/DNS | ~$24/yr |
Backblaze | S3-compatible object storage | ~$36/yr |
GitHub | Hosting this repository and continuous integration/deployments | Free |
Pushover | Kubernetes alerts and application notifications | $5 one-time |
UptimeRobot | Monitoring internet connectivity and external-facing applications | Free |
Total | | ~$10/mo |
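As a rough check of the total, summing the recurring yearly costs from the table (GitHub and UptimeRobot are free, and the $5 Pushover fee is paid once) gives:

$$
\underbrace{36}_{\text{1Password}} + \underbrace{24}_{\text{Cloudflare}} + \underbrace{36}_{\text{Backblaze}} = 96\ \text{USD/yr},
\qquad
\frac{96\ \text{USD/yr}}{12\ \text{mo/yr}} = 8\ \text{USD/mo}
$$

That comes to about $8 per month in recurring costs, so the ~$10/mo figure appears to be a rounded-up estimate.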
Device | Count | CPU | OS Disk | Data Disk | RAM | OS | Purpose |
---|---|---|---|---|---|---|---|
Lenovo ThinkCentre M93p Tiny | 3 | i5-4570T | 128GB SSD | - | 8GB | Talos | control plane |
Lenovo ThinkCentre M93p Tiny | 4 | i7-4765T | 256GB SSD | - | 16GB | Talos | general-purpose worker |
Total worker CPU: 32 threads
Total worker RAM: 64GB
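The totals listed above line up exactly with the four general-purpose workers. A quick Python tally makes that explicit; the per-CPU thread counts are taken from Intel's published specs (i5-4570T: 2 cores / 4 threads; i7-4765T: 4 cores / 8 threads) and are not stated in the table itself, and `NodeGroup` and `totals` are hypothetical names.

```python
from dataclasses import dataclass

# Threads per CPU, from Intel's published specs (not listed in the table above).
THREADS_PER_CPU = {"i5-4570T": 4, "i7-4765T": 8}


@dataclass(frozen=True)
class NodeGroup:
    count: int
    cpu: str
    ram_gb: int
    role: str


# Rows copied from the hardware table.
NODES = [
    NodeGroup(count=3, cpu="i5-4570T", ram_gb=8, role="control plane"),
    NodeGroup(count=4, cpu="i7-4765T", ram_gb=16, role="worker"),
]


def totals(groups: list[NodeGroup]) -> tuple[int, int]:
    """Return (total threads, total RAM in GB) for the given node groups."""
    threads = sum(g.count * THREADS_PER_CPU[g.cpu] for g in groups)
    ram = sum(g.count * g.ram_gb for g in groups)
    return threads, ram


workers = [g for g in NODES if g.role == "worker"]
print("workers:", totals(workers))  # workers: (32, 64)
print("all nodes:", totals(NODES))  # all nodes: (44, 88)
```

Counting the control-plane nodes as well adds another 12 threads and 24GB of RAM on top of the worker totals.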
Device | CPU | OS Disk | Data Disk | RAM | OS | Purpose |
---|---|---|---|---|---|---|
Synology DS918+ | Intel Celeron J3455 | - | 2x14TB HDD + 2x18TB HDD | 16GB | DSM 7 | NAS/NFS/Backup |
Device | Count | CPU | OS Disk | Data Disk | RAM | OS | Purpose |
---|---|---|---|---|---|---|---|
Home Assistant Yellow | 1 | Raspberry Pi CM4 | 8GB eMMC | 1TB NVMe | 4GB | HAOS | Home Automation |
Hue Bridge | 1 | - | - | - | - | - | Smart Lighting Hub |
ESP32 | 2 | Tensilica Xtensa LX6 | - | - | 512KB | ESPHome | Bluetooth Proxy |
Device | Purpose |
---|---|
UniFi UDM Pro | Network - Router & NVR |
UniFi USW Pro 24 PoE | Network - Switch |
UniFi USP PDU Pro | Power - PDU |
CyberPower OR500LCDRM1U | Power - UPS |
Huge thank-you to the folks over on the Home Operations Discord community, especially @onedr0p, @bjw-s, and @buroa. Their home-ops repos have been an amazing resource to reference as I embarked on this journey.
Be sure to check out kubesearch.dev for further ideas and references for deploying applications on Kubernetes.
See the latest release notes.
See LICENSE.