---
description: Detailed technical specifications for nodes, network, and Talos configuration
globs: ["machineconfigs/**/*", "patches/**/*", "talosconfig", "kubeconfig*"]
alwaysApply: false
---

# Technical Specifications & Low-Level Configuration

## Talos Configuration ✅ OPERATIONAL

### Custom Talos Image

- **Factory Image**: `613e1592b2da41ae5e265e8789429f22e121aab91cb4deb6bc3c0b6262961245:v1.10.4`, which includes the two system extensions Longhorn requires
- **Extensions**: Longhorn extension included for distributed storage
- **Version**: Talos v1.10.4 with custom factory build
- **Architecture**: ARM64 optimized for NetCup Cloud infrastructure

### Patch Configuration

Applied via the `patches/` directory for cluster customization:

- **allow-controlplane-workloads.yaml**: Enables workload scheduling on control plane
- **cluster-name.yaml**: Sets cluster name to `keyboardvagabond.com`
- **disable-kube-proxy-and-cni.yaml**: Disables built-in networking for Cilium
- **etcd-patch.yaml**: etcd optimization and configuration
- **registry-patch.yaml**: Container registry configuration
- **worker-discovery-patch.yaml**: Worker node discovery settings

## Network Configuration ✅ OPERATIONAL

### NetCup Cloud Infrastructure

- **vLAN ID**: 1004963 for internal cluster communication
- **Network Range**: 10.132.0.0/24 (private VLAN)
- **DNS Domain**: `cluster.local` (standard Kubernetes domain)
- **Cluster Name**: `keyboardvagabond.com`

### Node Network Configuration

| Node | Public IP | VLAN IP | Role | Status |
|------|-----------|---------|------|--------|
| **n1** | 152.53.107.24 | 10.132.0.10/24 | Control Plane | ✅ Schedulable |
| **n2** | 152.53.105.81 | 10.132.0.20/24 | Control Plane | ✅ Schedulable |
| **n3** | 152.53.200.111 | 10.132.0.30/24 | Control Plane | ✅ Schedulable |

- **Control Plane VIP**: `10.132.0.5` (shared VIP, nodes elect primary for HA)
- **All nodes are control plane**: High availability with etcd quorum (2 of 3 required)

### Network Interface Configuration

- **`enp7s0`**: Public interface (DHCP + static configuration)
- **`enp9s0`**: Private VLAN interface (static configuration)
- **Internal Traffic**: Uses private VLAN for pod-to-pod and storage replication
- **External Access**: Cloudflare Zero Trust tunnels (no direct port exposure)

## Administrative Access Configuration ✅ SECURED

### Kubernetes API Access

- **Internal Context**: `admin@keyboardvagabond-tailscale`
- **VIP Endpoint**: `10.132.0.5:6443` (shared VIP, recommended for HA)
- **Node Endpoints**: `10.132.0.10:6443`, `10.132.0.20:6443`, `10.132.0.30:6443` (individual nodes)
- **Public Context**: `admin@keyboardvagabond.com` (blocked by firewall)
- **Public Endpoint**: `api.keyboardvagabond.com:6443` (Tailscale-only)
- **Access Method**: Tailscale mesh VPN required (CGNAT 100.64.0.0/10)

### Talos API Access

```bash
# Talos configuration (VIP recommended for HA)
talosctl config endpoint 10.132.0.5  # VIP endpoint
talosctl config node 10.132.0.5      # VIP node

# Alternative: Individual node endpoints
talosctl config endpoint 10.132.0.10 10.132.0.20 10.132.0.30
talosctl config node 10.132.0.10     # Primary endpoint
```

### Essential Management Commands

```bash
# Cluster health check
talosctl health --nodes 10.132.0.10,10.132.0.20,10.132.0.30

# Node status
talosctl get members

# Kubernetes context switching
kubectl config use-context admin@keyboardvagabond-tailscale

# Node status verification
kubectl get nodes -o wide
```

## Storage Configuration Details ✅ OPERATIONAL

### Longhorn Distributed Storage

- **Installation Path**: `/var/lib/longhorn` on each node
- **Replica Policy**: 2-replica configuration across nodes
- **Storage Class**: `longhorn-retain` for data preservation
- **Node Allocation**: 400GB+ per node on system disk
- **Auto-balance**: Enabled for optimal distribution

### Volume Configuration

- **System Disk**: `/dev/vda` with ephemeral storage
- **Longhorn Volume**: 400GB minimum allocation per node
- **Backup Strategy**: Label-based S3 backup selection
- **Reclaim Policy**: Retain (prevents data loss)

## Tailscale Mesh VPN Configuration ✅ OPERATIONAL

### Tailscale Operator Deployment

- **Helm Chart**: `tailscale-operator` from Tailscale Helm repository
- **Version**: v1.90.x (operator v1.90.8)
- **Namespace**: `tailscale-system`
- **Replicas**: 2 operator pods with anti-affinity
- **Hostname**: `keyboardvagabond-operator`

### Subnet Router Configuration (Connector Resource)

- **Resource Type**: `Connector` (tailscale.com/v1alpha1)
- **Device Name**: `keyboardvagabond-cluster`
- **Advertised Networks**:
  - **Pod Network**: 10.244.0.0/16
  - **Service Network**: 10.96.0.0/12
  - **VLAN Network**: 10.132.0.0/24
- **OAuth Integration**: Client credentials for device authentication
- **Device Tagging**: `tag:k8s-operator` for ACL management

### Service Exposure via MagicDNS

- **Capability**: Services can be exposed via Tailscale operator with meta attributes
- **MagicDNS**: Automatic DNS resolution for exposed services
- **Meta Attributes**: Can be used to configure service exposure and routing
- **Access Control**: Cilium host firewall restricts to Tailscale only
- **Current CGNAT Range**: 100.64.0.0/10 (Tailscale assigned)

## Component Status Matrix ✅ CURRENT STATE

### Active Components

| Component | Status | Access Method | Notes |
|-----------|--------|---------------|-------|
| **Cilium CNI** | ✅ Operational | Internal | Host firewall + Hubble UI |
| **Longhorn Storage** | ✅ Operational | Internal | 2-replica with S3 backup |
| **PostgreSQL HA** | ✅ Operational | Internal | 3-instance CloudNativePG |
| **Harbor Registry** | ✅ Operational | Direct HTTPS | Zero Trust incompatible |
| **OpenObserve** | ✅ Operational | Zero Trust | Monitoring platform |
| **Tailscale VPN** | ✅ Operational | Mesh Network | Administrative access |

### Disabled/Deprecated Components

| Component | Status | Reason | Alternative |
|-----------|--------|--------|-------------|
| **external-dns** | ❌ Removed | Zero Trust migration | Manual DNS in Cloudflare |
| **cert-manager** | ❌ Removed | Zero Trust migration | Cloudflare edge TLS |
| **Rook-Ceph** | ❌ Disabled | Complexity and lack of support for partitioning a single drive | Longhorn storage |
| **Flux GitOps** | ⏸️ Disabled | Manual deployment | Ready for re-activation |

### Development Components

| Component | Status | Purpose | Access |
|-----------|--------|---------|--------|
| **Renovate** | ✅ Operational | Dependency updates | Automated |
| **Elasticsearch** | ✅ Operational | Log aggregation | Internal |
| **Kibana** | ✅ Operational | Log analytics | Zero Trust |

## Network Security Configuration ✅ HARDENED

### Cilium Host Firewall Rules

```yaml
# Control plane API access (Tailscale only)
- fromCIDR: ["100.64.0.0/10"]  # Tailscale CGNAT
  toPorts: [{"ports": [{"port": "6443", "protocol": "TCP"}]}]

# Block world access to HTTP/HTTPS:
#   - HTTP/HTTPS ports blocked from 0.0.0.0/0
#   - Only cluster-internal and Tailscale access permitted
```

### Zero Trust Architecture

- **External Applications**: All via Cloudflare tunnels
- **Administrative APIs**: Tailscale mesh VPN only
- **Harbor Exception**: Direct ports 80/443 (header modification issues)
- **Internal Services**: Cluster-local communication only

## Future Scaling Specifications

### Node Addition Process

1. **Network**: Add to NetCup Cloud vLAN 1004963
2. **IP Assignment**: Sequential (10.132.0.40/24, 10.132.0.50/24, etc.)
3. **Talos Config**: Apply machine config with proper networking
4. **Longhorn**: Automatic storage distribution across new nodes
5. **Workload**: Immediate scheduling capability

### High Availability Expansion

- **Additional Control Planes**: Can be added beyond the current three (keep an odd count for etcd quorum)
- **Load Balancing**: MetalLB or cloud LB integration ready
- **Database Scaling**: PostgreSQL can expand to more replicas
- **Storage Scaling**: Longhorn distributed across all nodes

@talos-machine-config-template.yaml
@cilium-network-policy-template.yaml
@longhorn-volume-template.yaml
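
### Scaling Arithmetic Sketch

The IP assignment step in the node addition process and the etcd quorum figure quoted for the control plane both come down to simple arithmetic. The following is a minimal sketch, assuming new nodes continue the step-of-ten addressing seen in the node table. The function names `vlan_ip` and `etcd_quorum` are hypothetical helpers for illustration, not part of Talos or any cluster tooling.

```bash
#!/usr/bin/env bash
# Sketch only: vlan_ip and etcd_quorum are hypothetical helper names.

# VLAN address for the Nth node, assuming the 10.132.0.<N*10>/24 pattern
# from the node table (n1 -> .10, n2 -> .20, n3 -> .30).
vlan_ip() {
  local index="$1"
  # Reject indexes whose host octet would fall outside 10..250.
  if [ "$index" -lt 1 ] || [ "$index" -gt 25 ]; then
    echo "node index out of range: $index" >&2
    return 1
  fi
  printf '10.132.0.%d/24\n' "$((index * 10))"
}

# Members etcd needs for quorum with N control plane nodes: floor(N/2) + 1.
etcd_quorum() {
  local members="$1"
  printf '%d\n' "$((members / 2 + 1))"
}

# Planned addresses for the next two nodes, and quorum at each cluster size.
for n in 4 5; do
  echo "n${n}: $(vlan_ip "$n") (quorum with ${n} control planes: $(etcd_quorum "$n"))"
done
```

With three control plane nodes `etcd_quorum 3` gives 2, matching the "2 of 3 required" figure above. A fourth control plane raises quorum to 3 without tolerating any additional failures, which is why odd member counts are generally preferred.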