
Migrating from External DNS to CF Zero Trust

Now that the CF domain is set up, it's time to move other apps and services over to it, then seal off as many of the Talos and k8s ports as I can.

Zero-Downtime Migration Process

Step 1: Discover Service Configuration

# Find service name and port
kubectl get svc -n <namespace>
# Example output: service-name ClusterIP 10.x.x.x <none> 9898/TCP
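
Since cloudflared resolves the tunnel target from inside the cluster, you can double-check the exact cluster-local DNS name from a throwaway pod (the busybox tag is arbitrary):

# Verify the cluster-local name resolves in-cluster
kubectl run dnstest -i --rm --restart=Never --image=busybox:1.36 -- \
  nslookup service-name.namespace.svc.cluster.local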

Step 2: Create Tunnel Route (FIRST!)

  1. Go to Cloudflare Zero Trust Dashboard → Networks → Tunnels
  2. Find your tunnel, click Configure
  3. Add Public Hostname:
    • Subdomain: app
    • Domain: keyboardvagabond.com
    • Service: http://service-name.namespace.svc.cluster.local:port
  4. Test that the tunnel URL works before proceeding, for example:
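
A minimal check, using the example hostname from above:

# Hit the tunnel hostname from outside the cluster
curl -sI https://app.keyboardvagabond.com | head -n 1
# Expect your app's usual status line, served via Cloudflare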

Step 3: Update Application Configuration

Clear external-DNS annotations and TLS configuration:

# In Helm values or ingress manifest:
ingress:
  annotations: {}  # Explicitly empty - removes cert-manager and external-dns
  tls: []          # Explicitly empty array - no certificates needed
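
To confirm the rendered object is actually clean (the ingress name is a placeholder):

# No output means the annotations and TLS block are gone
kubectl get ingress <ingress-name> -n <namespace> -o yaml | grep -E 'cert-manager|external-dns|tls:'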

Step 4: Deploy Changes

# For Helm apps via Flux:
flux reconcile helmrelease <app-name> -n <namespace>

# For direct manifests:
kubectl apply -f <manifest-file>
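
Optionally watch the rollout settle before moving on:

# Confirm the HelmRelease reconciled and pods are healthy
flux get helmreleases -n <namespace>
kubectl rollout status deployment/<app-name> -n <namespace>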

Step 5: Clean Up Certificates

# Delete certificate resources
kubectl delete certificate <cert-name> -n <namespace>

# Find and delete TLS secrets
kubectl get secrets -n <namespace> | grep tls
kubectl delete secret <tls-secret-name> -n <namespace>

Step 6: Verify Clean State

# Check no new certificates are being created
kubectl get certificate,secret -n <namespace> | grep <app-name>

# Should only show Helm release secrets, no certificate or TLS secrets

Step 7: DNS Record Management

How it works:

  • Tunnel automatically creates: CNAME record → tunnel-id.cfargotunnel.com
  • External-DNS created: A records → your cluster IPs
  • Conflict: a hostname can't hold both a CNAME and A records, so the stale A records have to go

Cleanup options:

# Option 1: Auto-cleanup (recommended) - wait a few minutes after removing annotations
# External-DNS deletes the A records on its next sync interval
# (resolvers may still cache them until the TTL expires)

# Option 2: Manual cleanup (immediate)
# Go to Cloudflare DNS dashboard and manually delete A records
# Keep the CNAME record (created by tunnel)
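
Manual cleanup can also be scripted against the standard Cloudflare v4 API; the zone ID, record ID, and token here are placeholders:

# List the stale A records for the hostname
curl -s "https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/dns_records?type=A&name=app.keyboardvagabond.com" \
  -H "Authorization: Bearer <API_TOKEN>"

# Delete each returned record by ID (leave the tunnel's CNAME alone)
curl -s -X DELETE "https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/dns_records/<RECORD_ID>" \
  -H "Authorization: Bearer <API_TOKEN>"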

Verification:

# Check DNS resolution no longer points at your cluster IPs
dig +short podinfo.keyboardvagabond.com

# The tunnel CNAME is proxied, so Cloudflare flattens it at the edge:
# expect Cloudflare anycast IPs here, not your node IPs. The
# CNAME → tunnel-id.cfargotunnel.com is visible in the Cloudflare DNS dashboard.

Rollback Plan

If tunnel doesn't work:

  1. Revert Helm values/manifests (add back annotations and TLS)
  2. Redeploy: flux reconcile or kubectl apply
  3. Wait for cert-manager to recreate certificates
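
If the migration went in as a single Git commit, the revert is mechanical (commit SHA and app name are placeholders):

# Revert the migration commit and push
git revert <migration-commit-sha> && git push

# Force an immediate reconcile instead of waiting for the sync interval
flux reconcile helmrelease <app-name> -n <namespace> --with-source

# Watch cert-manager reissue the certificate
kubectl get certificate -n <namespace> -w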

Benefits After Migration

  • No exposed public IPs - cluster nodes not directly accessible
  • Automatic DDoS protection via Cloudflare
  • Centralized SSL management - Cloudflare handles certificates
  • Better observability - Cloudflare analytics and logs

It should work! 🚀 (And now we have a plan if it doesn't!)

Advanced: Securing Administrative Access

Securing Kubernetes & Talos APIs

Once application migration is complete, you can secure administrative access:

Option 1: TCP Proxy (Simpler)

# Cloudflare Zero Trust → Tunnels → Configure
Public Hostname:
  Subdomain: api
  Domain: keyboardvagabond.com
  Service: tcp://localhost:6443  # Kubernetes API
  
Public Hostname:
  Subdomain: talos  
  Domain: keyboardvagabond.com
  Service: tcp://<NODE_1_IP>:50000  # Talos API

Client configuration:

Cloudflare's edge won't hand raw TCP straight to kubectl; the client side opens a local proxy into the tunnel with cloudflared access:

# Open local proxies to the tunnel endpoints (leave these running)
cloudflared access tcp --hostname api.keyboardvagabond.com --url localhost:6443
cloudflared access tcp --hostname talos.keyboardvagabond.com --url localhost:50000

# Point kubectl and talosctl at the local proxies
kubectl config set-cluster keyboardvagabond \
  --server=https://localhost:6443
# (add --tls-server-name if the API server cert isn't valid for localhost)
talosctl config endpoint localhost:50000

Option 2: Private Network via WARP (Most Secure)

Step 1: Configure Private Network

# Cloudflare Zero Trust → Tunnels → Configure → Private Networks
Private Network:
  CIDR: 10.132.0.0/24  # Your NetCup vLAN network
  Description: "Keyboard Vagabond Cluster Internal Network"
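
If you manage the tunnel from a machine with cloudflared installed, the same route can be added from the CLI (tunnel name is a placeholder):

# Route the vLAN CIDR through the tunnel
cloudflared tunnel route ip add 10.132.0.0/24 <tunnel-name>

# List configured IP routes
cloudflared tunnel route ip show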

Step 2: Configure Split Tunnels

# Zero Trust → Settings → WARP Client → Device settings → Split Tunnels
Mode: Exclude (recommended)
Remove: 10.0.0.0/8  # Remove broad private range
Add back:
  - 10.0.0.0/9      # 10.0.0.0 - 10.127.255.255  
  - 10.133.0.0/16   # 10.133.0.0 - 10.133.255.255
  - 10.134.0.0/15   # 10.134.0.0 - 10.135.255.255
  # 10.132.0.0/24 (and any other 10/8 space not re-excluded above) now routes through WARP

Step 3: Client Configuration

# Install WARP client on admin machines
# macOS: brew install --cask cloudflare-warp
# Connect to Zero Trust organization
warp-cli registration new

# Configure kubectl to use internal IPs
kubectl config set-cluster keyboardvagabond \
  --server=https://<NODE_1_IP>:6443  # Direct to internal node IP

# Configure talosctl to use internal IPs  
talosctl config endpoint <NODE_1_IP>:50000,<NODE_2_IP>:50000

Step 4: Access Policies (Recommended)

# Zero Trust → Access → Applications → Add application
Application Type: Private Network
Name: "Kubernetes Cluster Admin Access"
Application Domain: 10.132.0.0/24

Policies:
  - Name: "Admin Team Only"
    Action: Allow
    Rules: 
      - Email domain: @yourdomain.com
      - Device Posture: Managed device required

Step 5: Device Enrollment

# On admin device
# 1. Install WARP: https://1.1.1.1/
# 2. Login with Zero Trust organization  
# 3. Verify private network access:
ping <NODE_1_IP>  # Should work through WARP

# 4. Test API access
kubectl get nodes  # Should connect to internal cluster
talosctl version   # Should connect to internal Talos API

Step 6: Lock Down External Access

Once WARP is working, update Talos machine configs to block external access:

# In machineconfigs/n1.yaml and n2.yaml
# Note: machine.network.extraHostEntries only manages /etc/hosts entries;
# it is not a firewall. Use the Talos ingress firewall instead (sketch below).
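
A minimal sketch of the Talos (v1.6+) ingress firewall, assuming the 10.132.0.0/24 vLAN from above. A real cluster needs further rules (kubelet, etcd, trustd, CNI) before defaulting to block, so treat this as a starting point, not a drop-in config:

apiVersion: v1alpha1
kind: NetworkDefaultActionConfig
ingress: block
---
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: allow-kube-api-from-vlan
portSelector:
  ports:
    - 6443
  protocol: tcp
ingress:
  - subnet: 10.132.0.0/24
---
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: allow-talos-api-from-vlan
portSelector:
  ports:
    - 50000
  protocol: tcp
ingress:
  - subnet: 10.132.0.0/24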

WARP Benefits:

  • No public DNS entries - Admin endpoints not discoverable
  • Device control - Only managed devices can access cluster
  • Zero-trust policies - Granular access control per user/device
  • Audit logs - Full visibility into who accessed what when
  • Device posture - Require encryption, OS updates, etc.
  • Split tunneling - Only cluster traffic goes through tunnel
  • Automatic failover - Multiple WARP data centers

Testing WARP Implementation

Before WARP (Current State)

# Current kubectl configuration
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
# Output: https://api.keyboardvagabond.com:6443

# This goes through internet → external IPs
kubectl get nodes

After WARP Setup

# 1. Test private network connectivity first
ping <NODE_1_IP>  # Should work once WARP is connected

# 2. Create backup kubectl context  
kubectl config set-context keyboardvagabond-external \
  --cluster=keyboardvagabond.com \
  --user=admin@keyboardvagabond.com

# 3. Update main context to use internal IP
kubectl config set-cluster keyboardvagabond.com \
  --server=https://<NODE_1_IP>:6443

# 4. Test internal access
kubectl get nodes  # Should work through WARP → private network

# 5. Verify traffic path
# WARP status should show "Connected" in system tray
warp-cli status  # Should show connected to your Zero Trust org

Rollback Plan

# If WARP doesn't work, quickly restore external access:
kubectl config set-cluster keyboardvagabond.com \
  --server=https://api.keyboardvagabond.com:6443

# Test external access still works
kubectl get nodes

Next Steps After WARP

Once WARP is proven working:

  1. Configure Talos firewall to block external access to ports 6443 and 50000 (see the apply sketch below)
  2. Remove public API DNS entry (api.keyboardvagabond.com)
  3. Document emergency access procedure (temporary firewall rule + external DNS)
  4. Set up additional WARP devices for other administrators
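
For step 1, applying the updated config could look like this (node IP and file names follow the earlier examples; do one node at a time):

# Apply the machine config with the firewall documents to one node
talosctl apply-config --nodes <NODE_1_IP> --file machineconfigs/n1.yaml

# Confirm the node and cluster stay healthy before doing the next one
talosctl health --nodes <NODE_1_IP>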

This gives you a zero-trust administrative access model where cluster APIs are completely invisible from the internet! 🔒