Talos Kubernetes

This repository includes all of the configuration and documentation for my home lab cluster.

This cluster consists of four Intel NUCs running Talos Linux, with a Synology NAS for large data. More information about this repository and cluster is available at https://infrastructure.btkostner.io.

Features

  • Talos Linux cluster with NVMe as a boot drive and SSD for data
  • Argo CD autopilot for cluster bootstrapping
  • Cilium as a kube proxy replacement and sidecar-less networking
  • Rook Ceph for stateful replicated storage for all nodes
  • Velero for offsite cluster backup

Provisioning

This page goes over how to set up a whole new cluster, starting with basic networking, then generating and applying the Talos configuration, and finally bootstrapping Argo CD.

My home cluster is made up of 4x Intel 11th gen i5 NUCs with 32 GB of RAM, an NVMe boot drive, and a 1TB SSD for Ceph. All of the nodes are nearly identical to make things easier, though this isn't a hard requirement. The main reason I chose this configuration is its low power usage, small footprint, and Intel Xe graphics.

I also have a PiKVM connected to a TESmart 4K UHD 16-port HDMI KVM switch. This allows easy BIOS control and booting of images, which makes installing Talos on a cluster much easier.

Node Setup

Node setup is mostly ensuring all of the hardware works and has up-to-date firmware. This is especially important for my Crucial MX500 SSDs, as older firmware versions can randomly disconnect and cause Ceph to go into an unhealthy state.

Once all of the hardware is confirmed working, I set a static IP for the node. This starts at 192.168.3.11 and continues for each node. Talos will set a virtual IP at 192.168.3.10 for the control plane nodes.

One important thing to note is that I use cluster.btkostner.network to point to my cluster virtual IP of 192.168.3.10. Yes, this is a local address, so it will only resolve correctly on my network, but that should be fine.

Generating Talos Configuration

Talos configuration resides in the provision/talos/ directory. It includes a folder of configuration patches as well as a generate.sh script to run the needed talosctl commands.

If you plan to run this for your own cluster, ensure all of the patches in provision/talos/patches are relevant to you.

Once you are ready, you can run the generate.sh script. This will generate your controlplane.yaml, talosconfig, and worker.yaml files. Ensure you back these files up somewhere safe.
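For reference, the heart of a script like generate.sh is a single talosctl gen config invocation. This is a sketch, not the actual script: the cluster name and patch file names are illustrative, and the endpoint reuses the control plane DNS name mentioned above.

```shell
# Sketch of what generate.sh boils down to; patch file names are illustrative.
# Writes controlplane.yaml, worker.yaml, and talosconfig to the current directory.
talosctl gen config btkostner https://cluster.btkostner.network:6443 \
  --output-dir . \
  --config-patch @patches/common.yaml \
  --config-patch-control-plane @patches/controlplane.yaml \
  --config-patch-worker @patches/worker.yaml
```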

Installing Talos

For installing Talos, I grab the latest Talos ISO from GitHub, upload it to my PiKVM, and boot the node into it. This brings up the Talos instance in maintenance mode. I then run this command for the node:

talosctl apply-config --file ./controlplane.yaml --nodes "192.168.3.11" --insecure

This sets up the first control plane node. Then run this command to bootstrap the cluster:

talosctl bootstrap --nodes "192.168.3.11"

This does the initial etcd and Kubernetes resource creation. At this point, 192.168.3.10 should point to the control plane node.
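Before moving on, it's worth confirming the node actually came up. talosctl ships a built-in health check that can run against the freshly generated talosconfig; note that some of the Kubernetes checks will not pass until the CNI is installed later.

```shell
# Point talosctl at the generated credentials and the first node.
talosctl --talosconfig ./talosconfig config endpoint 192.168.3.11
talosctl --talosconfig ./talosconfig config node 192.168.3.11

# Waits for etcd, the API server, and core components to report healthy.
# Expect node-readiness checks to fail until Cilium is installed.
talosctl --talosconfig ./talosconfig health
```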

At this point, I can run the same talosctl apply-config command for all of the other control plane nodes. Then I apply the worker configuration with:

talosctl apply-config --file ./worker.yaml --nodes "192.168.3.14" --insecure

With all of that done, there is now a fresh Kubernetes cluster. Note that because the default CNI is not installed, none of the node networking will work and every Kubernetes node will have a taint on it. They will also reboot every 10 minutes until the Cilium CNI is installed.

Finally I run this command to generate the needed kubectl entry:

talosctl kubeconfig ~/.kube/config --nodes talos.btkostner.network --force
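With the kubeconfig in place, a quick sanity check should show the state described above: all nodes registered, but NotReady and tainted until Cilium is installed.

```shell
# Nodes should all be listed; they will report NotReady until a CNI exists.
kubectl get nodes -o wide

# The taints mentioned above are visible per node.
kubectl describe nodes | grep -i taint
```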

Installing Core Resources

Once the Kubernetes cluster is up, we can start installing software on it. To simplify this process I just run the provision/core/install.sh script, which installs all of the resources in the cluster/argocd directory. Once this initial run completes, Argo CD takes care of syncing all resources, so you should never need to run this script again.
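I haven't reproduced the script here, but a minimal version of what such a bootstrap needs to do is apply the cluster/argocd directory once and then let Argo CD manage itself. Whether that directory is a kustomization is my assumption; check the real script.

```shell
# One-time bootstrap: apply the Argo CD resources, then let Argo CD sync itself.
# Assumes cluster/argocd is a kustomize directory; plain manifests would use -f.
kubectl apply -k cluster/argocd

# Wait for Argo CD to come up before expecting any syncing to happen.
kubectl -n argocd rollout status deploy/argocd-repo-server
```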

Bootstrapping 1Password credentials

The last step of provisioning a cluster is setting up 1Password Connect to handle secrets in Kubernetes. Luckily, Argo will install the 1Password Connect service and the External Secrets Operator, but we have to add the required 1Password secret for your vault ourselves. To do this, I have an install script at provision/1password/install.sh. It uses the 1Password CLI to verify that the 1Password Connect password exists in the 1Password vault (creating it if it doesn't), then copies the password to Kubernetes. If you are running it yourself, verify the variables in the script before running it; it makes heavy assumptions based on my own 1Password setup.
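The rough shape of that bootstrap, heavily simplified, looks like this. All of the vault, item, namespace, and secret names here are hypothetical stand-ins, not the values the real script uses.

```shell
# Hypothetical vault/item/secret names; adjust for your own 1Password layout.
# Read the Connect credentials file out of 1Password with the op CLI.
op read "op://Infrastructure/1password-connect/1password-credentials.json" \
  > 1password-credentials.json

# Mirror the credentials and access token into Kubernetes for the
# Connect server and External Secrets Operator to consume.
kubectl -n external-secrets create secret generic op-credentials \
  --from-file=1password-credentials.json

kubectl -n external-secrets create secret generic onepassword-connect-token \
  --from-literal=token="$(op read 'op://Infrastructure/1password-connect/token')"

# Don't leave credentials sitting on disk.
rm 1password-credentials.json
```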

Networking

This page covers networking for individual nodes, our private internal DNS and gateway, and our public DNS and gateways.

Local Network

My Intel NUC cluster runs on a separate VLAN powered by Ubiquiti.

| Kind | Name | IP |
| --- | --- | --- |
| PiKVM | PiKVM | 192.168.3.2 |
| SnapAV WB-800VPS-IPVM-18 | WattBox | 192.168.3.3 |
| Synology RS1221+ | Behemoth | 192.168.3.4 |
| TESmart 4K UHD 16 Ports HDMI KVM Switch | KVM | 192.168.3.5 |
| Kubernetes control plane | VIP | 192.168.3.10 |
| Intel NUC11PAHI5 | NUC 1 | 192.168.3.11 |
| Intel NUC11PAHI5 | NUC 2 | 192.168.3.12 |
| Intel NUC11PAHI5 | NUC 3 | 192.168.3.13 |
| Intel NUC11PAHI5 | NUC 4 | 192.168.3.14 |
| Intel NUC11PAHI5 | NUC 5 | 192.168.3.15 |
| Kubernetes ingress VIP | cilium-ingress | 192.168.3.50 |
| Kubernetes ingress VIP | cilium-gateway-external-gateway | 192.168.3.51 |

Private Network

To make things easier, I have a Tailscale network for everything. This makes it easy for all of my devices to access private services on the cluster. To make it even easier, I have a full DNS setup with Cloudflare at btkostner.network. All IPs in that zone point to private local network IPs or Tailscale IPs.

Currently the PiKVM and Synology NAS have built-in Tailscale support, so they just work™. This allows me to access my KVM from any device with Tailscale set up by opening a browser and going to https://kvm.btkostner.network, and similarly my NAS at https://behemoth.btkostner.network.

For the Kubernetes cluster.... TODO....

Public Network

For public networking it's a pretty standard Kubernetes setup. One abnormal thing about my setup is that I use the new (and totally awesome) Kubernetes Gateway API. There is a single external-gateway resource that uses MetalLB to allocate the 192.168.3.50 IP address to it. Port forwarding with Ubiquiti lets my house's public IP address accept traffic and route it to my cluster services.

To make this fully work, I also have some ddns jobs on the cluster that set the required Cloudflare records pointing to my house public IP address.
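A ddns job of this sort reduces to a couple of Cloudflare API calls. This is an illustrative standalone version, not the cluster's actual job: the token, zone ID, record ID, and record name are all placeholders you would supply yourself.

```shell
# Placeholders: CF_API_TOKEN, CF_ZONE_ID, and CF_RECORD_ID must be set in the
# environment. Look up the current public IP, then update the A record to match.
IP="$(curl -fsS https://ipv4.icanhazip.com)"

curl -fsS -X PUT \
  "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records/${CF_RECORD_ID}" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -H "Content-Type: application/json" \
  --data "{\"type\":\"A\",\"name\":\"btkostner.network\",\"content\":\"${IP}\",\"proxied\":false}"
```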

Secrets

As you may have noticed, nothing in this repository is a secret. There are no encrypted files. This is because we use 1Password to manage all of our secrets in the cluster, via 1Password Connect and the External Secrets Operator.

In order to get this working as intended, there is one provisioning step that needs to run in order to add the required 1Password authentication into the cluster. Once that is done, everything else should be automated and ready to go.

Storage

This cluster uses Rook Ceph to provide distributed storage across all nodes. Each Intel NUC contributes its SSD to a shared Ceph cluster, which is then exposed to workloads via Kubernetes StorageClasses. An external Synology NAS provides additional NFS storage for large media files.

Ceph Cluster

The Ceph cluster is deployed via the rook-ceph Helm chart (v1.19.3) in the rook-ceph namespace. It uses all available nodes and devices automatically.

| Component | Count | Resources |
| --- | --- | --- |
| Monitors (mon) | 3 | 500m CPU, 512Mi memory |
| Managers (mgr) | 2 | 500m CPU, 512Mi memory |
| OSDs | auto (all devices) | 1000m CPU, 1024Mi memory |

Key features enabled:

  • Dashboard — accessible via HTTP route for cluster monitoring
  • Monitoring — Prometheus metrics are exported
  • Disk prediction — the diskprediction_local module is enabled for drive health forecasting
  • PG autoscaler — automatically tunes placement group counts
  • CSI read affinity — reads are served from the closest OSD based on topology
  • Discovery daemon — automatically detects new devices

The Ceph cluster data directory is stored at /var/lib/rook on each node.
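If the Rook toolbox deployment is enabled (an optional extra that I'm not certain this repository deploys), the raw Ceph cluster state can be inspected directly:

```shell
# Requires the optional rook-ceph-tools toolbox deployment to be running.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree
```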

Block Pools

Block pools define how data is replicated across the Ceph cluster. Two pools exist with different durability guarantees.

block-pool (Non-Replicated)

| Property | Value |
| --- | --- |
| Failure domain | host |
| Replication size | 1 (no redundancy) |
| RBD mirroring | enabled (image mode) |
| RBD stats | enabled |

Warning: This pool has no data redundancy. If a single OSD or node is lost, data in this pool is lost. Use only for data that is easily recreatable or non-critical.

safe-block-pool (Replicated)

| Property | Value |
| --- | --- |
| Failure domain | host |
| Replication size | 3 (full redundancy) |
| RBD mirroring | enabled (image mode) |
| RBD stats | enabled |

This pool stores three copies of every block across different hosts, surviving up to two simultaneous host failures.

Object Store

object-store-replicated (S3-Compatible)

A Ceph Object Store providing S3-compatible storage with the following configuration:

| Component | Failure Domain | Strategy | Details |
| --- | --- | --- | --- |
| Metadata pool | host | Replicated | 3 copies |
| Data pool | host | Erasure coded | 2 data + 1 coding chunk |
| Gateway | | | 2 instances, port 80 |

Pool preservation is enabled (preservePoolsOnDelete: true), so data pools are retained even if the object store resource is deleted.

Storage Classes

Storage classes are the primary interface for workloads to request storage. The cluster defines three storage classes.

Block Storage Classes

All block storage classes use the rook-ceph.rbd.csi.ceph.com provisioner with ext4 filesystem, Retain reclaim policy, Immediate volume binding, and volume expansion enabled.

| Storage Class | Pool | Replication | Default |
| --- | --- | --- | --- |
| ceph-block | block-pool | 1× | ✅ Yes |
| ceph-block-replicated | safe-block-pool | 3× | No |

Object Storage Classes

| Storage Class | Object Store | Provisioner |
| --- | --- | --- |
| ceph-object-replicated | safe-object-store | rook-ceph.ceph.rook.io/bucket |

Choosing a Storage Class

  • ceph-block — Use for non-critical, easily recreatable data where performance matters more than durability (1× replication).
  • ceph-block-replicated — Use for important application data that must survive node failures (3× replication).
  • ceph-object-replicated — Use for S3-compatible object/bucket storage with erasure coding.
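Requesting storage is then just a matter of naming the class in a PVC. A minimal replicated-block example looks like this; the PVC name and namespace are made up for illustration.

```shell
# Illustrative PVC; the name and namespace are hypothetical.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-config
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-block-replicated
  resources:
    requests:
      storage: 10Gi
EOF
```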

NFS Storage

The Synology NAS provides NFS volumes for large media and download directories. NFS PVCs use the nfs storage class with ReadWriteMany access mode, allowing multiple pods to mount the same volume simultaneously. These are typically sized at 1Mi as nominal placeholders since the actual storage is managed by the NAS.
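Those nominal 1Mi placeholders look like this in practice (again, the PVC name here is illustrative, not one of the actual volumes listed below):

```shell
# 1Mi is a nominal request; actual capacity is governed by the NAS.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-media
  namespace: media
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs
  resources:
    requests:
      storage: 1Mi
EOF
```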

Volume Inventory

By Namespace

developer

| PVC Name | Storage Class | Size | Access Mode | Application |
| --- | --- | --- | --- | --- |
| opencode-data | ceph-block-replicated | 50Gi | ReadWriteOnce | OpenCode |

download

| PVC Name | Storage Class | Size | Access Mode | Application |
| --- | --- | --- | --- | --- |
| lidarr-config | ceph-block-replicated | 20Gi | ReadWriteOnce | Lidarr |
| lidarr-download | nfs | 1Mi | ReadWriteMany | Lidarr |
| lidarr-media | nfs | 1Mi | ReadWriteMany | Lidarr |
| prowlarr-config | ceph-block-replicated | 20Gi | ReadWriteOnce | Prowlarr |
| radarr-config | ceph-block-replicated | 20Gi | ReadWriteOnce | Radarr |
| radarr-download | nfs | 1Mi | ReadWriteMany | Radarr |
| radarr-media | nfs | 1Mi | ReadWriteMany | Radarr |
| sabnzbd-config | ceph-block-replicated | 20Gi | ReadWriteOnce | SABnzbd |
| sabnzbd-download | nfs | 1Mi | ReadWriteMany | SABnzbd |
| seerr-config | ceph-block-replicated | 10Gi | ReadWriteOnce | Seerr |
| sonarr-config | ceph-block-replicated | 20Gi | ReadWriteOnce | Sonarr |
| sonarr-download | nfs | 1Mi | ReadWriteMany | Sonarr |
| sonarr-media | nfs | 1Mi | ReadWriteMany | Sonarr |

home

| PVC Name | Storage Class | Size | Access Mode | Application |
| --- | --- | --- | --- | --- |
| n8n-config | ceph-block-replicated | 10Gi | ReadWriteOnce | n8n |

media

| PVC Name | Storage Class | Size | Access Mode | Application |
| --- | --- | --- | --- | --- |
| autoscan-media | nfs | 1Mi | ReadWriteMany | Autoscan |
| ersatztv-config | ceph-block-replicated | 5Gi | ReadWriteOnce | ErsatzTV |
| ersatztv-media | nfs | 1Mi | ReadWriteMany | ErsatzTV |
| plex-config | ceph-block-replicated | 300Gi | ReadWriteOnce | Plex |
| plex-media | nfs | 1Mi | ReadWriteMany | Plex |
| tautulli-config | ceph-block-replicated | 20Gi | ReadWriteOnce | Tautulli |

rook-ceph (via Helm)

| Component | Storage Class | Size | Application |
| --- | --- | --- | --- |
| ClickHouse persistence | ceph-block-replicated | 50Gi | SigNoz |

Summary by Storage Class

| Storage Class | Consumers | Total Ceph Storage |
| --- | --- | --- |
| ceph-block-replicated | 12 | 555Gi |
| nfs | 11 | N/A (NAS-managed) |

Largest Consumers

| Application | PVC | Size |
| --- | --- | --- |
| Plex | plex-config | 300Gi |
| OpenCode | opencode-data | 50Gi |
| SigNoz | ClickHouse persistence | 50Gi |
| Lidarr | lidarr-config | 20Gi |
| Prowlarr | prowlarr-config | 20Gi |
| Radarr | radarr-config | 20Gi |
| SABnzbd | sabnzbd-config | 20Gi |
| Sonarr | sonarr-config | 20Gi |
| Tautulli | tautulli-config | 20Gi |

Configuration Reference

All Rook Ceph resources are defined in:

cluster/rook-ceph/rook-ceph/
├── kustomization.yaml
├── values.yaml
└── resources/
    ├── ceph-block-pool.yaml
    ├── ceph-block-pool-replicated.yaml
    ├── ceph-cluster.yaml
    ├── ceph-object-store-replicated.yaml
    ├── storage-class-ceph-block.yaml
    ├── storage-class-ceph-block-replicated.yaml
    └── storage-class-ceph-object-replicated.yaml

Individual application PVCs are defined alongside their deployments in:

cluster/<namespace>/<application>/resources/pvc-*.yaml

Cilium Upgrades

Upgrading Cilium is... painful. Because it runs as the networking backbone, it's very easy to do something wrong and completely break your cluster. As always, read the official Cilium upgrade documentation, but this is how I've been upgrading it so far:

  1. Run the preflight checks with helm:
helm --kube-context admin@btkostner install cilium-preflight cilium/cilium --version 1.15.0 --namespace=kube-system --set preflight.enabled=true --set agent=false --set operator.enabled=false

This will pull all of the container images for each node to reduce issues.

  2. Cordon all of the nodes. This will prevent them from trying to upgrade in place and causing a huge cluster-wide outage.

  3. Apply the upgrade. This should be done via Argo by merging in a helm version upgrade. Make sure the helm values file is updated for the new version.

  4. You should see the cilium operator container start up on the cluster. Make sure no errors are present.

  5. Start rebooting nodes. I usually start with the head controller node, which is 192.168.3.11 on my cluster. You can accomplish a standard reboot with the talosctl utility like so:

talosctl reboot --debug --wait --timeout 30s -n 192.168.3.11

Note: if you run into issues, it's helpful to use the KVM console to see what's going on. You can also force a reboot with Ctrl+Alt+Del.

This should reboot the node safely and bring it back into the cluster. Because it's the same host, it will still be cordoned on startup. You can uncordon it and check the status of the cilium agent on the node. It should be running and healthy.

  6. Continue rebooting all of the other nodes. Same as above, though now that the controller node is back up, you should not get any weird dashboard outages or issues.

  7. Once all nodes are rebooted, you can uncordon them all and the cluster should be back to normal.
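The cordon/reboot/uncordon cycle above can be scripted per node. A sketch of that loop, assuming the Kubernetes node names match the IPs in this document (substitute the names from kubectl get nodes if they don't):

```shell
# Cordon everything first so rebooted nodes don't take on work mid-upgrade.
# Node names are assumed to match these IPs; adjust to your kubectl node names.
NODES="192.168.3.11 192.168.3.12 192.168.3.13 192.168.3.14"
for node in $NODES; do kubectl cordon "$node"; done

# Reboot one node at a time, waiting for it to rejoin before moving on.
for node in $NODES; do
  talosctl reboot --debug --wait --timeout 30s -n "$node"
  kubectl uncordon "$node"
done

# Confirm every Cilium agent pod is healthy afterwards.
kubectl -n kube-system get pods -l k8s-app=cilium -o wide
```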