Tailscale Extension for Talos Nodes
Enable Tailscale on Talos nodes so they can access services exposed via Tailscale (e.g., Harbor registry at https://registry.<tailnet>.ts.net).
Why This Is Needed
Kubernetes nodes need to pull container images from Harbor. Harbor is exposed via Tailscale Ingress with valid Let's Encrypt TLS. Without Tailscale on the nodes, containerd cannot reach the registry.
Alternative considered: Configure containerd to trust an internal HTTP endpoint. Rejected because it requires different image URLs for push (external) vs pull (internal).
Configuration
1. Talos Schematic with Tailscale Extension
Generate at factory.talos.dev:
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/i915
      - siderolabs/intel-ucode
      - siderolabs/tailscale
Current schematic ID: 08086db1d88ea52b2e873f0b0c64562af7ae98f6ed83da5ee478871bbe52abd6
Verify with:
curl -s "https://factory.talos.dev/schematics/<ID>"
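The schematic can also be created non-interactively by POSTing the customization YAML to the Image Factory API, which returns the schematic ID as JSON. A minimal sketch, assuming the customization above is saved as schematic.yaml:
# Returns {"id": "<schematic-id>"}
curl -s -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics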
2. Encrypted Auth Key
Store the Tailscale auth key in talos/talenv.sops.yaml:
TAILSCALE_AUTHKEY: tskey-auth-xxxxx
Encrypt with:
sops --encrypt --in-place talos/talenv.sops.yaml
Important: Use a reusable auth key from the Tailscale admin console, not a one-time key.
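Before regenerating configs, confirm the file decrypts and the key name matches what the patch below expects (this prints the secret, so run it in a trusted shell):
sops -d talos/talenv.sops.yaml | grep TAILSCALE_AUTHKEY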
3. ExtensionServiceConfig Patch
File: talos/patches/global/tailscale.yaml
apiVersion: v1alpha1
kind: ExtensionServiceConfig
name: tailscale
environment:
- TS_AUTHKEY=${TAILSCALE_AUTHKEY}
Reference in talos/talconfig.yaml:
patches:
- "@./patches/global/tailscale.yaml"
4. DNS Resolution for Tailscale Hostnames
Critical: The Tailscale extension runs on the node and creates routes, but does NOT configure system DNS to use MagicDNS. Containerd uses the node's DNS resolver, which doesn't know about .ts.net domains.
Add the registry hostname and IP to talos/talenv.sops.yaml:
HARBOR_REGISTRY_HOST: registry.<tailnet>.ts.net
HARBOR_TAILSCALE_IP: <tailscale-ip>
Find the Tailscale IP:
dig +short registry.<tailnet>.ts.net @100.100.100.100
Then encrypt:
sops --encrypt --in-place talos/talenv.sops.yaml
The talos/patches/global/machine-network.yaml uses these variables:
machine:
  network:
    extraHostEntries:
      - ip: ${HARBOR_TAILSCALE_IP}
        aliases:
          - ${HARBOR_REGISTRY_HOST}
After applying this config, a reboot is required for containerd to pick up the new /etc/hosts entries.
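The reboot can be issued remotely:
talosctl reboot -n <node-ip>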
IP Stability Warning
The Tailscale IP is assigned by the Tailscale operator to the Ingress resource. It is stable during normal operation but could change if:
- The Tailscale Ingress is deleted and recreated
- The Tailscale operator loses state
- Flux reconciles and recreates the resource
If image pulls suddenly fail with "no such host" errors, verify the current IP:
# Check current IP
dig +short registry.<tailnet>.ts.net @100.100.100.100
# Decrypt and compare with config
sops -d talos/talenv.sops.yaml | grep HARBOR_TAILSCALE_IP
If they differ, update HARBOR_TAILSCALE_IP in talenv.sops.yaml, re-encrypt, regenerate configs, and apply to all nodes with reboot.
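A small sketch to detect that drift, using the same hostname and variable as above (assumes the IP is stored unquoted in talenv.sops.yaml):
LIVE=$(dig +short registry.<tailnet>.ts.net @100.100.100.100)
STORED=$(sops -d talos/talenv.sops.yaml | awk '/HARBOR_TAILSCALE_IP/ {print $2}')
[ "$LIVE" = "$STORED" ] && echo "IP matches ($LIVE)" || echo "Drift detected: live=$LIVE stored=$STORED"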
5. Update Node Image URLs
In talos/talconfig.yaml, update all nodes to use the new schematic:
talosImageURL: factory.talos.dev/installer/08086db1d88ea52b2e873f0b0c64562af7ae98f6ed83da5ee478871bbe52abd6
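A quick check that no node entry still references an old schematic:
grep -n 'talosImageURL' talos/talconfig.yaml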
Applying Changes
Why Two Steps Are Required
- talosctl upgrade installs the new Talos image (with the Tailscale extension) but does NOT apply config changes like ExtensionServiceConfig
- talosctl apply-config applies configuration changes but does NOT upgrade the Talos image (changing machine.install.image only affects future fresh installs)
Without both steps, the service waits indefinitely:
ext-tailscale Waiting Waiting for extension service config
Optimized Approach: Config First, Then Upgrade (One Reboot)
Apply the config first (stages it), then upgrade. The extension finds the config immediately after upgrade:
Step 1: Apply config (stages ExtensionServiceConfig, no reboot yet)
talosctl apply-config -n <node-ip> -f talos/clusterconfig/kubernetes-<hostname>.yaml --mode=staged
Step 2: Upgrade the node (installs extension, reboots, extension starts with config ready)
mise exec -- task talos:upgrade-node IP=<node-ip>
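To roll this out across several nodes with one reboot each, the two commands can be looped. A sketch with placeholder IPs and hostnames (adjust to match talconfig.yaml):
while read -r IP HOST; do
  talosctl apply-config -n "$IP" -f "talos/clusterconfig/kubernetes-${HOST}.yaml" --mode=staged
  mise exec -- task talos:upgrade-node IP="$IP"
done <<'EOF'
192.168.1.98 node1
192.168.1.99 node2
EOF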
Alternative: Upgrade First, Then Apply (Two Reboots)
If you upgrade first, the extension waits for config. Then apply-config triggers another reboot:
# Step 1: Upgrade (extension waits for config)
mise exec -- task talos:upgrade-node IP=<node-ip>
# Step 2: Apply config (triggers reboot)
talosctl apply-config -n <node-ip> -f talos/clusterconfig/kubernetes-<hostname>.yaml
After completion, verify:
# Check config exists
talosctl get extensionserviceconfigs -n <node-ip>
# Check service is running
talosctl services -n <node-ip> | grep tailscale
# Check logs
talosctl logs ext-tailscale -n <node-ip>
Verification
After successful setup, the node should:
- Have an ExtensionServiceConfig for tailscale:
  NODE           NAMESPACE   TYPE                     ID          VERSION
  192.168.1.98   runtime     ExtensionServiceConfig   tailscale   1
- Show ext-tailscale as Running:
  192.168.1.98   ext-tailscale   Running   Started task ext-tailscale (PID xxx)
- Have a tailscale0 interface with a 100.x.x.x IP in the logs
File Locations
| Component | Path |
|---|---|
| Tailscale patch | talos/patches/global/tailscale.yaml |
| Encrypted auth key | talos/talenv.sops.yaml |
| Node configs | talos/talconfig.yaml |
| Generated configs | talos/clusterconfig/kubernetes-*.yaml |
Troubleshooting
Service stuck "Waiting for extension service config"
The ExtensionServiceConfig wasn't applied. Run:
talosctl apply-config -n <node-ip> -f talos/clusterconfig/kubernetes-<hostname>.yaml
Auth key issues
- Ensure the key is reusable (not one-time)
- Ensure no quotes around the key value in the config
- Generate a fresh key from Tailscale admin console
Check actual config on node
talosctl get extensionserviceconfigs -n <node-ip> -o yaml
Image pull fails with "not found" but Tailscale is running
This usually means DNS resolution is failing. Check containerd logs:
talosctl -n <node-ip> logs cri 2>&1 | grep -i "registry" | tail -10
If you see "no such host" errors, the /etc/hosts entry is missing or containerd hasn't picked it up. Verify:
talosctl -n <node-ip> read /etc/hosts | grep registry
If missing, apply the config with extraHostEntries and reboot the node.
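For example:
talosctl apply-config -n <node-ip> -f talos/clusterconfig/kubernetes-<hostname>.yaml
talosctl reboot -n <node-ip>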
Image architecture mismatch
If you push images from an ARM Mac, they'll be arm64, but the cluster nodes are amd64. Check the image architecture:
skopeo inspect --format '{{.Architecture}}' docker://registry.<tailnet>.ts.net/project/image:tag
Push a multi-arch image or one with the correct architecture. To re-push an existing image for amd64:
docker pull --platform linux/amd64 myimage:tag
docker tag myimage:tag registry.<tailnet>.ts.net/project/image:tag
docker push registry.<tailnet>.ts.net/project/image:tag
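If you build the image yourself, a multi-arch build avoids the mismatch entirely. A sketch using docker buildx (project/image names are placeholders; requires a buildx builder that supports both platforms):
docker buildx build --platform linux/amd64,linux/arm64 \
  -t registry.<tailnet>.ts.net/project/image:tag --push .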
ImagePullSecret not found
Harbor requires authentication even for "public" projects at the Docker v2 API level. Create an imagePullSecret:
kubectl create secret docker-registry harbor-registry-secret \
--docker-server=registry.<tailnet>.ts.net \
--docker-username=admin \
--docker-password=<password> \
-n <namespace>
Reference it in your pod spec:
spec:
  imagePullSecrets:
    - name: harbor-registry-secret
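To avoid repeating this in every pod spec, the secret can instead be attached to the namespace's default service account, so pods pick it up automatically:
kubectl patch serviceaccount default -n <namespace> \
  -p '{"imagePullSecrets": [{"name": "harbor-registry-secret"}]}'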