Tailscale Extension for Talos Nodes

Enable Tailscale on Talos nodes so they can access services exposed via Tailscale (e.g., Harbor registry at https://registry.<tailnet>.ts.net).

Why This Is Needed

Kubernetes nodes need to pull container images from Harbor. Harbor is exposed via Tailscale Ingress with valid Let's Encrypt TLS. Without Tailscale on the nodes, containerd cannot reach the registry.

Alternative considered: Configure containerd to trust an internal HTTP endpoint. Rejected because it requires different image URLs for push (external) vs pull (internal).

Configuration

1. Talos Schematic with Tailscale Extension

Generate at factory.talos.dev:

customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/i915
      - siderolabs/intel-ucode
      - siderolabs/tailscale

Current schematic ID: 08086db1d88ea52b2e873f0b0c64562af7ae98f6ed83da5ee478871bbe52abd6

Verify with:

curl -s "https://factory.talos.dev/schematics/<ID>"
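You can also generate the schematic ID non-interactively. This is a hedged sketch assuming a local schematic.yaml containing the customization block above; the Image Factory returns the ID as JSON:

# POST the schematic to the factory; the response looks like {"id":"<schematic-id>"}
curl -s -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics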

2. Encrypted Auth Key

Store the Tailscale auth key in talos/talenv.sops.yaml:

TAILSCALE_AUTHKEY: tskey-auth-xxxxx

Encrypt with:

sops --encrypt --in-place talos/talenv.sops.yaml

Important: Use a reusable auth key from the Tailscale admin console, not a one-time key.
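A quick sanity check that the key is actually encrypted at rest, using only commands already in this workflow:

# The committed file should show an ENC[AES256_GCM,...] blob, never the raw key
grep TAILSCALE_AUTHKEY talos/talenv.sops.yaml

# Decrypting locally should round-trip the plaintext key
sops -d talos/talenv.sops.yaml | grep TAILSCALE_AUTHKEY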

3. ExtensionServiceConfig Patch

File: talos/patches/global/tailscale.yaml

apiVersion: v1alpha1
kind: ExtensionServiceConfig
name: tailscale
environment:
  - TS_AUTHKEY=${TAILSCALE_AUTHKEY}

Reference in talos/talconfig.yaml:

patches:
  - "@./patches/global/tailscale.yaml"

4. DNS Resolution for Tailscale Hostnames

Critical: The Tailscale extension runs on the node and creates routes, but does NOT configure system DNS to use MagicDNS. Containerd uses the node's DNS resolver, which doesn't know about .ts.net domains.

Add the registry hostname and IP to talos/talenv.sops.yaml:

HARBOR_REGISTRY_HOST: registry.<tailnet>.ts.net
HARBOR_TAILSCALE_IP: <tailscale-ip>

Find the Tailscale IP:

dig +short registry.<tailnet>.ts.net @100.100.100.100

Then encrypt:

sops --encrypt --in-place talos/talenv.sops.yaml

The talos/patches/global/machine-network.yaml uses these variables:

machine:
  network:
    extraHostEntries:
      - ip: ${HARBOR_TAILSCALE_IP}
        aliases:
          - ${HARBOR_REGISTRY_HOST}

After applying this config, a reboot is required for containerd to pick up the new /etc/hosts entries.
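To confirm the entry landed after the reboot, read the file directly (the same check appears under Troubleshooting below):

talosctl -n <node-ip> read /etc/hosts | grep registry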

IP Stability Warning

The Tailscale IP is assigned by the Tailscale operator to the Ingress resource. It is stable during normal operation but could change if:

  • The Tailscale Ingress is deleted and recreated
  • The Tailscale operator loses state
  • Flux reconciles and recreates the resource

If image pulls suddenly fail with "no such host" errors, verify the current IP:

# Check current IP
dig +short registry.<tailnet>.ts.net @100.100.100.100

# Decrypt and compare with config
sops -d talos/talenv.sops.yaml | grep HARBOR_TAILSCALE_IP

If they differ, update HARBOR_TAILSCALE_IP in talenv.sops.yaml, re-encrypt, regenerate configs, and apply to all nodes with reboot.
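A small convenience sketch that automates this comparison, assuming the same file paths and variable names as above:

# Compare the live Tailscale IP against the configured one
LIVE=$(dig +short registry.<tailnet>.ts.net @100.100.100.100)
CONF=$(sops -d talos/talenv.sops.yaml | grep HARBOR_TAILSCALE_IP | awk '{print $2}')
if [ "$LIVE" = "$CONF" ]; then echo "OK: $LIVE"; else echo "DRIFT: live=$LIVE configured=$CONF"; fi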

5. Update Node Image URLs

In talos/talconfig.yaml, update all nodes to use the new schematic:

talosImageURL: factory.talos.dev/installer/08086db1d88ea52b2e873f0b0c64562af7ae98f6ed83da5ee478871bbe52abd6
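If many nodes share this field, a one-liner can bump them all. A hedged sketch assuming mikefarah yq v4 and a nodes[].talosImageURL layout; adjust the path to your actual talconfig.yaml structure:

# Rewrite every node's installer image to the new schematic
NEW=08086db1d88ea52b2e873f0b0c64562af7ae98f6ed83da5ee478871bbe52abd6 \
  yq -i '.nodes[].talosImageURL = "factory.talos.dev/installer/" + strenv(NEW)' talos/talconfig.yaml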

Applying Changes

Why Two Steps Are Required

  1. talosctl upgrade installs the new Talos image (with Tailscale extension) but does NOT apply config changes like ExtensionServiceConfig

  2. talosctl apply-config applies configuration changes but does NOT upgrade the Talos image (changing machine.install.image only affects future fresh installs)

Without both steps, the service waits indefinitely:

ext-tailscale   Waiting   Waiting for extension service config

Optimized Approach: Config First, Then Upgrade (One Reboot)

Apply the config first (stages it), then upgrade. The extension finds the config immediately after upgrade:

Step 1: Apply config (stages ExtensionServiceConfig, no reboot yet)

talosctl apply-config -n <node-ip> -f talos/clusterconfig/kubernetes-<hostname>.yaml --mode=staged

Step 2: Upgrade the node (installs extension, reboots, extension starts with config ready)

mise exec -- task talos:upgrade-node IP=<node-ip>
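After the upgrade, you can wait for the node to settle before checking the service (hedged; confirm flags with talosctl health --help):

# Block until the node reports healthy, then confirm the extension started
talosctl -n <node-ip> health --wait-timeout 10m
talosctl -n <node-ip> services | grep tailscale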

Alternative: Upgrade First, Then Apply (Two Reboots)

If you upgrade first, the extension waits for config. Then apply-config triggers another reboot:

# Step 1: Upgrade (extension waits for config)
mise exec -- task talos:upgrade-node IP=<node-ip>

# Step 2: Apply config (triggers reboot)
talosctl apply-config -n <node-ip> -f talos/clusterconfig/kubernetes-<hostname>.yaml

After completion, verify:

# Check config exists
talosctl get extensionserviceconfigs -n <node-ip>

# Check service is running
talosctl services -n <node-ip> | grep tailscale

# Check logs
talosctl logs ext-tailscale -n <node-ip>

Verification

After successful setup, the node should:

  1. Have ExtensionServiceConfig for tailscale:

    NODE           NAMESPACE   TYPE                     ID          VERSION
    192.168.1.98   runtime     ExtensionServiceConfig   tailscale   1
  2. Show ext-tailscale as Running:

    192.168.1.98   ext-tailscale   Running   Started task ext-tailscale (PID xxx)
  3. Have a tailscale0 interface with a 100.x.x.x IP in the logs
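One way to check the tailscale0 interface without reading logs (hedged; resource names per current Talos, verify with talosctl get --help):

# List node addresses and look for the tailscale0 interface
talosctl -n <node-ip> get addresses | grep tailscale0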

File Locations

Component            Path
Tailscale patch      talos/patches/global/tailscale.yaml
Encrypted auth key   talos/talenv.sops.yaml
Node configs         talos/talconfig.yaml
Generated configs    talos/clusterconfig/kubernetes-*.yaml

Troubleshooting

Service stuck "Waiting for extension service config"

The ExtensionServiceConfig wasn't applied. Run:

talosctl apply-config -n <node-ip> -f talos/clusterconfig/kubernetes-<hostname>.yaml

Auth key issues

  • Ensure the key is reusable (not one-time)
  • Ensure no quotes around the key value in the config
  • Generate a fresh key from Tailscale admin console

Check actual config on node

talosctl get extensionserviceconfigs -n <node-ip> -o yaml

Image pull fails with "not found" but Tailscale is running

This usually means DNS resolution is failing. Check containerd logs:

talosctl -n <node-ip> logs cri 2>&1 | grep -i "registry" | tail -10

If you see "no such host" errors, the /etc/hosts entry is missing or containerd hasn't picked it up. Verify:

talosctl -n <node-ip> read /etc/hosts | grep registry

If missing, apply the config with extraHostEntries and reboot the node.

Image architecture mismatch

If you push images from an ARM Mac, they'll be arm64, but the cluster nodes are amd64. Check the image architecture:

skopeo inspect --format '{{.Architecture}}' docker://registry.<tailnet>.ts.net/project/image:tag

Push multi-arch or the correct architecture:

docker pull --platform linux/amd64 myimage:tag
docker tag myimage:tag registry.<tailnet>.ts.net/project/image:tag
docker push registry.<tailnet>.ts.net/project/image:tag
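Alternatively, build for the target platform directly. A hedged sketch assuming Docker buildx is available:

# Build an amd64 image and push it in one step
docker buildx build --platform linux/amd64 \
  -t registry.<tailnet>.ts.net/project/image:tag --push .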

ImagePullSecret not found

Harbor requires authentication even for "public" projects at the Docker v2 API level. Create an imagePullSecret:

kubectl create secret docker-registry harbor-registry-secret \
  --docker-server=registry.<tailnet>.ts.net \
  --docker-username=admin \
  --docker-password=<password> \
  -n <namespace>

Reference it in your pod spec:

spec:
  imagePullSecrets:
    - name: harbor-registry-secret
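Alternatively, attach the secret to the namespace's default ServiceAccount so pods pick it up without per-pod spec changes (a standard Kubernetes pattern, not specific to this setup):

kubectl patch serviceaccount default -n <namespace> \
  -p '{"imagePullSecrets":[{"name":"harbor-registry-secret"}]}'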