# Migrating from External DNS to CF Zero Trust

Now that the Cloudflare (CF) domain is set up, it's time to move the other apps and services over to it, and then potentially seal off as many of the Talos and k8s ports as I can.
## Zero-Downtime Migration Process

### Step 1: Discover Service Configuration

```bash
# Find the service name and port
kubectl get svc -n <namespace>
# Example output: service-name   ClusterIP   10.x.x.x   <none>   9898/TCP
```
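The **Service** value that Step 2 asks for is the Service's in-cluster DNS name plus its port. A small helper can assemble it. This is a sketch under assumptions: `service_port` and `tunnel_service_url` are hypothetical names, the backend is assumed to speak plain HTTP, and the cluster is assumed to use the default `cluster.local` domain.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for building the tunnel "Service" URL.
# Assumes a plain-HTTP backend and the default cluster domain "cluster.local".

# Print the first port of a Service (requires kubectl access to the cluster).
# For Services with several ports, change the [0] index.
service_port() {
  local svc=$1 ns=$2
  kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.ports[0].port}'
}

# Build the URL to paste into the tunnel's Service field.
tunnel_service_url() {
  local svc=$1 ns=$2 port=$3
  printf 'http://%s.%s.svc.cluster.local:%s\n' "$svc" "$ns" "$port"
}

# Example (needs cluster access; the "podinfo" names are assumptions):
#   tunnel_service_url podinfo podinfo "$(service_port podinfo podinfo)"
```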
### Step 2: Create Tunnel Route (FIRST!)

1. Go to **Cloudflare Zero Trust Dashboard** → **Networks** → **Tunnels**
2. Find your tunnel and click **Configure**
3. Add a **Public Hostname**:
   - **Subdomain**: `app`
   - **Domain**: `keyboardvagabond.com`
   - **Service**: `http://service-name.namespace.svc.cluster.local:port`
4. **Test** that the tunnel URL works before proceeding!
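For the **Test** step, a quick HTTP probe of the new hostname catches most problems. A hedged sketch: `is_healthy_status` and `check_hostname` are hypothetical names, any 2xx/3xx response is treated as healthy, and only standard `curl` flags are used (`-w '%{http_code}'` prints the status code). A 502 or 530 returned by Cloudflare usually points at the origin Service or the tunnel connection rather than at DNS.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: probe a public hostname through Cloudflare.

# Succeeds for 2xx and 3xx status codes.
is_healthy_status() {
  case "$1" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}

# Fetch https://<host>/ and report the status code.
check_hostname() {
  local host=$1 code
  # -o /dev/null discards the body; -w prints only the status code.
  # On a connection failure curl prints 000 and exits non-zero.
  code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "https://${host}/") || {
    echo "${host}: request failed (HTTP ${code:-none})" >&2
    return 1
  }
  echo "${host}: HTTP ${code}"
  is_healthy_status "$code"
}

# Example: check_hostname app.keyboardvagabond.com
```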
### Step 3: Update Application Configuration

Clear the external-DNS annotations and TLS configuration:

```yaml
# In Helm values or ingress manifest:
ingress:
  annotations: {}  # Explicitly empty - removes cert-manager and external-dns annotations
  tls: []          # Explicitly empty array - no certificates needed
```
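To confirm the deployed Ingress really lost those annotations, list its annotation keys and look for the cert-manager and external-dns prefixes. A sketch under assumptions: the function names are hypothetical, and it assumes the relevant keys start with `cert-manager.io/` or `external-dns.alpha.kubernetes.io/` (the prefixes those projects use on Ingress objects). kubectl's `go-template` output prints one key per line.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: detect leftover cert-manager / external-dns annotations.

# Print the annotation keys of an Ingress, one per line (requires kubectl).
ingress_annotation_keys() {
  local name=$1 ns=$2
  kubectl get ingress "$name" -n "$ns" \
    -o go-template='{{range $k, $v := .metadata.annotations}}{{$k}}{{"\n"}}{{end}}'
}

# Read keys on stdin and print any that would re-trigger certificates or DNS records.
# Returns 0 when none remain, 1 otherwise.
leftover_annotations() {
  local found
  found=$(grep -E '^(cert-manager\.io|external-dns\.alpha\.kubernetes\.io)/' || true)
  if [ -n "$found" ]; then
    printf '%s\n' "$found"
    return 1
  fi
  return 0
}

# Example: ingress_annotation_keys podinfo podinfo | leftover_annotations && echo clean
```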
### Step 4: Deploy Changes

```bash
# For Helm apps via Flux:
flux reconcile helmrelease <app-name> -n <namespace>

# For direct manifests:
kubectl apply -f <manifest-file>
```
### Step 5: Clean Up Certificates

```bash
# Delete certificate resources
kubectl delete certificate <cert-name> -n <namespace>

# Find and delete TLS secrets
kubectl get secrets -n <namespace> | grep tls
kubectl delete secret <tls-secret-name> -n <namespace>
```
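By default cert-manager does not delete the Secret it wrote when its `Certificate` is deleted (it only does so when the controller runs with `--enable-certificate-owner-ref`), which is why both are removed by hand. The sketch below pairs each Certificate with its `spec.secretName` and prints the delete commands for review instead of running them. `plan_cert_cleanup` is a hypothetical name, and the `podinfo` namespace in the example is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn "certificate<TAB>secretName" lines into delete commands.
# Dry run: it only prints the commands; pipe the output to `sh` once reviewed.

plan_cert_cleanup() {
  local ns=$1 name secret
  while IFS=$'\t' read -r name secret; do
    [ -n "$name" ] || continue
    printf 'kubectl delete certificate %s -n %s\n' "$name" "$ns"
    if [ -n "$secret" ]; then
      printf 'kubectl delete secret %s -n %s\n' "$secret" "$ns"
    fi
  done
  return 0
}

# Example (needs cluster access):
#   kubectl get certificate -n podinfo \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.secretName}{"\n"}{end}' \
#     | plan_cert_cleanup podinfo
```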
### Step 6: Verify Clean State

```bash
# Check no new certificates are being created
kubectl get certificate,secret -n <namespace> | grep <app-name>

# Should only show Helm release secrets, no certificate or TLS secrets
```
### Step 7: DNS Record Management

**How it works:**

- **Tunnel automatically creates**: a proxied CNAME record → `tunnel-id.cfargotunnel.com`
- **External-DNS created**: A records → your cluster IPs
- **No overlap allowed**: DNS does not permit a CNAME alongside other records on the same name, so the tunnel's CNAME and the external-DNS A records cannot both exist for one hostname; if Cloudflare refuses to add the public hostname, the old A records have to be removed first

**Cleanup options:**

```bash
# Option 1: Auto-cleanup (recommended) - wait 5 minutes after removing annotations
# External-DNS deletes the A records on its next sync, but only when it runs
# with --policy=sync (upsert-only never deletes); cached answers last until the TTL expires

# Option 2: Manual cleanup (immediate)
# Go to the Cloudflare DNS dashboard and manually delete the A records
# Keep the CNAME record (created by the tunnel)
```
**Verification:**

```bash
# Tunnel records are proxied, so public resolvers return Cloudflare edge IPs
# rather than the CNAME itself
dig +short podinfo.keyboardvagabond.com

# Should show Cloudflare addresses, not your cluster node IPs; the
# CNAME → tunnel-id.cfargotunnel.com is visible in the Cloudflare DNS dashboard
```
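Because tunnel hostnames are proxied, a resolver normally answers with Cloudflare edge addresses rather than the cluster's own IPs, so the check can be scripted as a range test: resolve the hostname and confirm every IPv4 answer sits inside one of the ranges Cloudflare publishes as plain text at https://www.cloudflare.com/ips-v4. A sketch with hypothetical function names, IPv4 only, using `dig +short` and plain shell arithmetic for the CIDR test.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: confirm a hostname resolves only to Cloudflare IPv4 addresses.

# Convert a dotted IPv4 address to an integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeeds if IP ($1) lies inside CIDR ($2).
ip_in_cidr() {
  local ip net len mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  len=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# Succeeds if every address in $1 (newline-separated) is inside a CIDR from $2.
all_in_ranges() {
  local addrs=$1 ranges=$2 addr cidr matched
  [ -n "$addrs" ] || return 1            # no answers at all counts as a failure
  while read -r addr; do
    [ -n "$addr" ] || continue
    matched=1
    while read -r cidr; do
      if [ -n "$cidr" ] && ip_in_cidr "$addr" "$cidr"; then matched=0; break; fi
    done <<< "$ranges"
    [ "$matched" -eq 0 ] || { echo "not Cloudflare: $addr" >&2; return 1; }
  done <<< "$addrs"
  return 0
}

# Resolve $1 (keeping only IPv4 lines) and compare against Cloudflare's ranges.
check_cloudflare_dns() {
  local addrs ranges
  addrs=$(dig +short A "$1" | grep -E '^[0-9]+(\.[0-9]+){3}$')
  ranges=$(curl -fsS https://www.cloudflare.com/ips-v4)
  all_in_ranges "$addrs" "$ranges"
}

# Example: check_cloudflare_dns podinfo.keyboardvagabond.com && echo "served via Cloudflare"
```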
## Rollback Plan

If the tunnel doesn't work:

1. **Revert** the Helm values/manifests (add back the annotations and TLS)
2. **Remove** the tunnel's public hostname and its CNAME, since external-DNS can't recreate A records on a name that still holds a CNAME
3. **Redeploy**: `flux reconcile` or `kubectl apply`
4. **Wait** for cert-manager to recreate the certificates
## Benefits After Migration

- ✅ **No public IPs in app DNS** - app hostnames resolve to Cloudflare, so the cluster nodes' addresses aren't published
- ✅ **Automatic DDoS protection** via Cloudflare
- ✅ **Centralized SSL management** - Cloudflare handles certificates
- ✅ **Better observability** - Cloudflare analytics and logs

**It should work!** 🚀 (And now we have a plan if it doesn't!)
## Advanced: Securing Administrative Access

### Securing Kubernetes & Talos APIs

Once application migration is complete, you can secure administrative access:
#### Option 1: TCP Proxy (Simpler)

```yaml
# Cloudflare Zero Trust → Tunnels → Configure
Public Hostnames:
  - Subdomain: api
    Domain: keyboardvagabond.com
    Service: tcp://localhost:6443      # Kubernetes API (localhost only works if cloudflared runs on the node's network)
  - Subdomain: talos
    Domain: keyboardvagabond.com
    Service: tcp://<NODE_1_IP>:50000   # Talos API
```
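**Client configuration:**

```bash
# tcp:// hostnames aren't reachable on :443 directly; run cloudflared locally to forward them
cloudflared access tcp --hostname api.keyboardvagabond.com --url 127.0.0.1:6443 &
cloudflared access tcp --hostname talos.keyboardvagabond.com --url 127.0.0.1:50000 &

# Update kubectl config (the API cert must be valid for 127.0.0.1, see certSANs in the Talos config)
kubectl config set-cluster keyboardvagabond \
  --server=https://127.0.0.1:6443

# Update talosctl config
talosctl config endpoint 127.0.0.1
```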
#### Option 2: Private Network via WARP (Most Secure)

**Step 1: Configure Private Network**

```yaml
# Cloudflare Zero Trust → Tunnels → Configure → Private Networks
Private Network:
  CIDR: 10.132.0.0/24   # Your NetCup vLAN network
  Description: "Keyboard Vagabond Cluster Internal Network"
```
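**Step 2: Configure Split Tunnels**

```yaml
# Zero Trust → Settings → WARP Client → Device settings → Split Tunnels
Mode: Exclude (recommended)
Remove: 10.0.0.0/8   # Remove broad private range
Add back:            # Everything in 10.0.0.0/8 except 10.132.0.0/24
  - 10.0.0.0/9       # 10.0.0.0     - 10.127.255.255
  - 10.128.0.0/14    # 10.128.0.0   - 10.131.255.255
  - 10.132.1.0/24    # 10.132.1.0   - 10.132.1.255
  - 10.132.2.0/23    # 10.132.2.0   - 10.132.3.255
  - 10.132.4.0/22    # 10.132.4.0   - 10.132.7.255
  - 10.132.8.0/21    # 10.132.8.0   - 10.132.15.255
  - 10.132.16.0/20   # 10.132.16.0  - 10.132.31.255
  - 10.132.32.0/19   # 10.132.32.0  - 10.132.63.255
  - 10.132.64.0/18   # 10.132.64.0  - 10.132.127.255
  - 10.132.128.0/17  # 10.132.128.0 - 10.132.255.255
  - 10.133.0.0/16    # 10.133.0.0   - 10.133.255.255
  - 10.134.0.0/15    # 10.134.0.0   - 10.135.255.255
  - 10.136.0.0/13    # 10.136.0.0   - 10.143.255.255
  - 10.144.0.0/12    # 10.144.0.0   - 10.159.255.255
  - 10.160.0.0/11    # 10.160.0.0   - 10.191.255.255
  - 10.192.0.0/10    # 10.192.0.0   - 10.255.255.255
# Only 10.132.0.0/24 is left out of the Exclude list, so only it routes through WARP
```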
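Working out the complement of 10.132.0.0/24 inside 10.0.0.0/8 by hand is easy to get wrong. It can be generated instead by repeatedly halving the parent block and emitting the half that does not contain the target. A sketch in plain bash: `cidr_exclude` and its helpers are hypothetical names, it handles IPv4 only, and it assumes the target is an aligned network that lies inside the parent.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the CIDR blocks covering PARENT minus TARGET (IPv4 only).
# Assumes TARGET is an aligned network (e.g. 10.132.0.0/24) inside PARENT.

ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

int_to_ip() {
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

cidr_exclude() {
  local base len t tlen half
  base=$(ip_to_int "${1%/*}"); len=${1#*/}
  t=$(ip_to_int "${2%/*}"); tlen=${2#*/}
  # Split the current block in two; emit the half without TARGET, descend into the other.
  while (( len < tlen )); do
    half=$(( 1 << (31 - len) ))   # size of each half at prefix length len+1
    len=$(( len + 1 ))
    if (( t & half )); then
      echo "$(int_to_ip "$base")/$len"                # TARGET is in the upper half
      base=$(( base + half ))
    else
      echo "$(int_to_ip $(( base + half )))/$len"     # TARGET is in the lower half
    fi
  done
}

# Example: cidr_exclude 10.0.0.0/8 10.132.0.0/24   # prints 16 ranges
```

The output order follows the halving (largest block first); the WARP Exclude list does not care about order.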
**Step 3: Client Configuration**

```bash
# Install the WARP client on admin machines
# macOS: brew install --cask cloudflare-warp

# Connect to the Zero Trust organization
warp-cli registration new
warp-cli connect

# Configure kubectl to use internal IPs
kubectl config set-cluster keyboardvagabond \
  --server=https://<NODE_1_IP>:6443   # Direct to internal node IP

# Configure talosctl to use internal IPs (space-separated; apid's default port 50000 is implied)
talosctl config endpoint <NODE_1_IP> <NODE_2_IP>
```
**Step 4: Access Policies (Recommended)**

```yaml
# Zero Trust → Access → Applications → Add application
Application Type: Private Network
Name: "Kubernetes Cluster Admin Access"
Application Domain: 10.132.0.0/24

Policies:
  - Name: "Admin Team Only"
    Action: Allow
    Rules:
      - Email domain: @yourdomain.com
      - Device Posture: Managed device required
```
**Step 5: Device Enrollment**

```bash
# On the admin device:
# 1. Install WARP: https://1.1.1.1/
# 2. Log in with the Zero Trust organization
# 3. Verify private network access:
ping <NODE_1_IP>    # Should work through WARP

# 4. Test API access
kubectl get nodes   # Should connect to the internal cluster
talosctl version    # Should connect to the internal Talos API
```
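`ping` only shows that ICMP gets through. A TCP check against the two API ports (6443 for Kubernetes, 50000 for Talos) is closer to what `kubectl` and `talosctl` actually need. A sketch, assuming bash's built-in `/dev/tcp` redirection and the coreutils `timeout` command are available; `port_open` and `check_admin_ports` are hypothetical names, and `<NODE_1_IP>` stays a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: check that TCP ports are reachable (e.g. over WARP).

# Succeeds if a TCP connection to HOST:PORT opens within SECONDS (default 3).
port_open() {
  local host=$1 port=$2 secs=${3:-3}
  # bash treats /dev/tcp/HOST/PORT as a socket; timeout stops hung connects.
  timeout "$secs" bash -c ':</dev/tcp/"$1"/"$2"' _ "$host" "$port" 2>/dev/null
}

# Report the Kubernetes and Talos API ports for each node given as an argument.
check_admin_ports() {
  local node port rc=0
  for node in "$@"; do
    for port in 6443 50000; do
      if port_open "$node" "$port"; then
        echo "$node:$port reachable"
      else
        echo "$node:$port UNREACHABLE"
        rc=1
      fi
    done
  done
  return "$rc"
}

# Example: check_admin_ports <NODE_1_IP> <NODE_2_IP>
```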
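**Step 6: Lock Down External Access**

Once WARP is working, update the Talos machine configs to block external access. `machine.network.extraHostEntries` only writes `/etc/hosts` entries and cannot filter traffic; recent Talos releases provide an ingress firewall through separate `NetworkDefaultActionConfig` and `NetworkRuleConfig` documents. A sketch, with `<POD_CIDR>` as a placeholder:

```yaml
# Extra documents in machineconfigs/n1.yaml and n2.yaml (sketch)
# WARNING: "block" also drops kubelet/etcd/trustd/CNI ports unless they get rules too
apiVersion: v1alpha1
kind: NetworkDefaultActionConfig
ingress: block
---
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: admin-apis-ingress
portSelector: {ports: [6443, 50000], protocol: tcp}      # Kubernetes + Talos APIs
ingress: [{subnet: 10.132.0.0/24}, {subnet: <POD_CIDR>}]  # vLAN via WARP, plus pod IPs
```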
#### WARP Benefits:

- ✅ **No public DNS entries** - Admin endpoints not discoverable
- ✅ **Device control** - Only managed devices can access cluster
- ✅ **Zero-trust policies** - Granular access control per user/device
- ✅ **Audit logs** - Full visibility into who accessed what when
- ✅ **Device posture** - Require encryption, OS updates, etc.
- ✅ **Split tunneling** - Only cluster traffic goes through tunnel
- ✅ **Automatic failover** - Multiple WARP data centers
## Testing WARP Implementation

### Before WARP (Current State)

```bash
# Current kubectl configuration
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
# Output: https://api.keyboardvagabond.com:6443

# This goes through internet → external IPs
kubectl get nodes
```
### After WARP Setup

```bash
# 1. Test private network connectivity first
ping <NODE_1_IP>   # Should work once WARP is connected

# 2. Create a backup context with its own cluster entry, since a context only
#    points at a cluster by name and step 3 edits keyboardvagabond.com in place
kubectl config view --raw \
  -o jsonpath='{.clusters[?(@.name=="keyboardvagabond.com")].cluster.certificate-authority-data}' \
  | base64 -d > keyboardvagabond-ca.crt
kubectl config set-cluster keyboardvagabond-external \
  --server=https://api.keyboardvagabond.com:6443 \
  --certificate-authority=keyboardvagabond-ca.crt --embed-certs=true
kubectl config set-context keyboardvagabond-external \
  --cluster=keyboardvagabond-external \
  --user=admin@keyboardvagabond.com

# 3. Update main context to use internal IP
kubectl config set-cluster keyboardvagabond.com \
  --server=https://<NODE_1_IP>:6443

# 4. Test internal access
kubectl get nodes   # Should work through WARP → private network

# 5. Verify traffic path
# WARP status should show "Connected" in system tray
warp-cli status     # Should show connected to your Zero Trust org
```
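The traffic-path check can also confirm where kubectl is pointed: take the host out of the current server URL (the same `kubectl config view --minify` query used in *Before WARP*) and test it against the vLAN range. A sketch with hypothetical function names, IPv4 only; a hostname such as `api.keyboardvagabond.com` is treated as "not internal".

```bash
#!/usr/bin/env bash
# Hypothetical helpers: confirm kubectl targets the private vLAN (e.g. 10.132.0.0/24).

ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Extract the host from a URL like https://10.132.0.5:6443
url_host() {
  local rest=${1#*://}   # drop the scheme
  rest=${rest%%/*}       # drop any path
  echo "${rest%:*}"      # drop the port
}

# Succeeds if the server URL ($1) points at an IPv4 address inside CIDR ($2).
server_in_cidr() {
  local host net len mask
  host=$(url_host "$1")
  [[ $host =~ ^[0-9]+(\.[0-9]+){3}$ ]] || return 1   # hostnames are not internal IPs
  net=$(ip_to_int "${2%/*}")
  len=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$host") & mask) == (net & mask) ))
}

# Example (needs a kubeconfig):
#   server=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
#   server_in_cidr "$server" 10.132.0.0/24 && echo "via WARP" || echo "still external"
```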
### Rollback Plan

```bash
# If WARP doesn't work, quickly restore external access:
kubectl config set-cluster keyboardvagabond.com \
  --server=https://api.keyboardvagabond.com:6443

# Test external access still works
kubectl get nodes
```
## Next Steps After WARP

Once WARP is proven working:

1. **Configure Talos firewall** to block external access to ports 6443 and 50000
2. **Remove public API DNS entry** (api.keyboardvagabond.com)
3. **Document emergency access procedure** (temporary firewall rule + external DNS)
4. **Set up additional WARP devices** for other administrators

This gives you a **zero-trust administrative access model** where cluster APIs are completely invisible from the internet! 🔒