redaction (#1)
Add the redacted source file for demo purposes

Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1
Co-authored-by: Michael DiLeo <michael_dileo@proton.me>
Co-committed-by: Michael DiLeo <michael_dileo@proton.me>
This commit was merged in pull request #1.
189
.cursor/rules/technical-specifications.mdc
Normal file
@@ -0,0 +1,189 @@
---
description: Detailed technical specifications for nodes, network, and Talos configuration
globs: ["machineconfigs/**/*", "patches/**/*", "talosconfig", "kubeconfig*"]
alwaysApply: false
---

# Technical Specifications & Low-Level Configuration

## Talos Configuration ✅ OPERATIONAL

### Custom Talos Image
- **Factory Image**: `613e1592b2da41ae5e265e8789429f22e121aab91cb4deb6bc3c0b6262961245:v1.10.4`, which includes the two system extensions that Longhorn requires
- **Extensions**: Longhorn extension included for distributed storage
- **Version**: Talos v1.10.4 with custom factory build
- **Architecture**: ARM64 optimized for NetCup Cloud infrastructure

### Patch Configuration
Applied via `patches/` directory for cluster customization:
- **allow-controlplane-workloads.yaml**: Enables workload scheduling on control plane
- **cluster-name.yaml**: Sets cluster name to `keyboardvagabond.com`
- **disable-kube-proxy-and-cni.yaml**: Disables built-in networking for Cilium
- **etcd-patch.yaml**: etcd optimization and configuration
- **registry-patch.yaml**: Container registry configuration
- **worker-discovery-patch.yaml**: Worker node discovery settings

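
As a rough illustration of what three of the patches listed above might contain, the sketch below uses standard Talos machine-config fields. The actual files in `patches/` are not shown here, so the contents (and the note that Cilium takes over kube-proxy's role) are assumptions, not a copy of the real patches.

```yaml
# patches/allow-controlplane-workloads.yaml (assumed contents)
cluster:
  allowSchedulingOnControlPlanes: true
---
# patches/cluster-name.yaml (assumed contents)
cluster:
  clusterName: keyboardvagabond.com
---
# patches/disable-kube-proxy-and-cni.yaml (assumed contents)
cluster:
  network:
    cni:
      name: none        # no built-in CNI; Cilium is installed separately
  proxy:
    disabled: true      # kube-proxy off; assumes Cilium handles service routing
```
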
## Network Configuration ✅ OPERATIONAL

### NetCup Cloud Infrastructure
- **VLAN ID**: 1004963 for internal cluster communication
- **Network Range**: 10.132.0.0/24 (private VLAN)
- **DNS Domain**: `cluster.local` (standard Kubernetes domain)
- **Cluster Name**: `keyboardvagabond.com`

### Node Network Configuration
| Node | Public IP | VLAN IP | Role | Status |
|------|-----------|---------|------|--------|
| **n1** | 152.53.107.24 | 10.132.0.10/24 | Control Plane | ✅ Schedulable |
| **n2** | 152.53.105.81 | 10.132.0.20/24 | Control Plane | ✅ Schedulable |
| **n3** | 152.53.200.111 | 10.132.0.30/24 | Control Plane | ✅ Schedulable |

- **Control Plane VIP**: `10.132.0.5` (shared VIP, nodes elect primary for HA)
- **All nodes are control plane**: High availability with etcd quorum (2 of 3 required)

### Network Interface Configuration
- **`enp7s0`**: Public interface (DHCP + static configuration)
- **`enp9s0`**: Private VLAN interface (static configuration)
- **Internal Traffic**: Uses private VLAN for pod-to-pod and storage replication
- **External Access**: Cloudflare Zero Trust tunnels (no direct port exposure)

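
The interface layout above could be expressed in a Talos machine config roughly as follows. This is a minimal sketch for node n1 using the addresses from the node table; the static portion of the public interface is omitted, and placing the VIP on the private interface is an assumption based on the VIP falling inside the VLAN range.

```yaml
# Sketch only: network section for n1 (assumed layout, not the real machine config)
machine:
  network:
    interfaces:
      - interface: enp7s0          # public interface
        dhcp: true
      - interface: enp9s0          # private VLAN interface
        addresses:
          - 10.132.0.10/24         # n1's VLAN IP from the node table
        vip:
          ip: 10.132.0.5           # shared control plane VIP (assumed to live here)
```
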
## Administrative Access Configuration ✅ SECURED

### Kubernetes API Access
- **Internal Context**: `admin@keyboardvagabond-tailscale`
- **VIP Endpoint**: `10.132.0.5:6443` (shared VIP, recommended for HA)
- **Node Endpoints**: `10.132.0.10:6443`, `10.132.0.20:6443`, `10.132.0.30:6443` (individual nodes)
- **Public Context**: `admin@keyboardvagabond.com` (blocked by firewall)
- **Public Endpoint**: `api.keyboardvagabond.com:6443` (Tailscale-only)
- **Access Method**: Tailscale mesh VPN required (CGNAT 100.64.0.0/10)

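
For orientation, the two contexts listed above might map onto a kubeconfig shaped like the sketch below. The context names and endpoints come from this document; the cluster entry names, the `admin` user name, and the placeholder credentials are hypothetical.

```yaml
# Sketch only: kubeconfig layout with both contexts (credentials are placeholders)
apiVersion: v1
kind: Config
current-context: admin@keyboardvagabond-tailscale
clusters:
  - name: keyboardvagabond-tailscale          # hypothetical entry name
    cluster:
      server: https://10.132.0.5:6443         # VIP endpoint
      certificate-authority-data: "REDACTED"
  - name: keyboardvagabond.com                # hypothetical entry name
    cluster:
      server: https://api.keyboardvagabond.com:6443
      certificate-authority-data: "REDACTED"
contexts:
  - name: admin@keyboardvagabond-tailscale
    context:
      cluster: keyboardvagabond-tailscale
      user: admin
  - name: admin@keyboardvagabond.com
    context:
      cluster: keyboardvagabond.com
      user: admin
users:
  - name: admin                               # hypothetical user entry
    user:
      client-certificate-data: "REDACTED"
      client-key-data: "REDACTED"
```
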
### Talos API Access
```bash
# Talos configuration (VIP recommended for HA)
# Caution: the VIP is held through an etcd election, so it is unreachable
# while etcd is down; the individual node endpoints below still work then.
talosctl config endpoint 10.132.0.5    # VIP endpoint
talosctl config node 10.132.0.5        # VIP node

# Alternative: Individual node endpoints
talosctl config endpoint 10.132.0.10 10.132.0.20 10.132.0.30
talosctl config node 10.132.0.10       # Default target node
```

### Essential Management Commands
```bash
# Cluster health check (run against one control plane node; the check covers the whole cluster)
talosctl health --nodes 10.132.0.10

# Node status
talosctl get members

# Kubernetes context switching
kubectl config use-context admin@keyboardvagabond-tailscale

# Node status verification
kubectl get nodes -o wide
```

## Storage Configuration Details ✅ OPERATIONAL

### Longhorn Distributed Storage
- **Installation Path**: `/var/lib/longhorn` on each node
- **Replica Policy**: 2-replica configuration across nodes
- **Storage Class**: `longhorn-retain` for data preservation
- **Node Allocation**: 400GB+ per node on system disk
- **Auto-balance**: Enabled for optimal distribution

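
On Talos, Longhorn's data path normally has to be bind-mounted into the kubelet. The sketch below shows the commonly used machine-config fragment for the `/var/lib/longhorn` path named above; whether this cluster's patches use exactly these options is an assumption.

```yaml
# Sketch only: kubelet bind mount for the Longhorn data path (assumed to match the real patch)
machine:
  kubelet:
    extraMounts:
      - destination: /var/lib/longhorn
        type: bind
        source: /var/lib/longhorn
        options:
          - bind
          - rshared
          - rw
```
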
### Volume Configuration
- **System Disk**: `/dev/vda` with ephemeral storage
- **Longhorn Volume**: 400GB minimum allocation per node
- **Backup Strategy**: Label-based S3 backup selection
- **Reclaim Policy**: Retain (prevents data loss)

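
A `longhorn-retain` storage class consistent with the replica count and reclaim policy above might look like the following. Only the name, the two-replica setting, and the Retain policy come from this document; the remaining fields are typical Longhorn defaults and are assumptions.

```yaml
# Sketch only: StorageClass matching the documented replica and reclaim settings
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-retain
provisioner: driver.longhorn.io
reclaimPolicy: Retain              # volumes survive PVC deletion
allowVolumeExpansion: true         # assumed
volumeBindingMode: Immediate       # assumed
parameters:
  numberOfReplicas: "2"            # 2-replica policy
  staleReplicaTimeout: "30"        # assumed; value is in minutes
```
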
## Tailscale Mesh VPN Configuration ✅ OPERATIONAL

### Tailscale Operator Deployment
- **Helm Chart**: `tailscale-operator` from Tailscale Helm repository
- **Version**: v1.90.x (operator v1.90.8)
- **Namespace**: `tailscale-system`
- **Replicas**: 2 operator pods with anti-affinity
- **Hostname**: `keyboardvagabond-operator`

### Subnet Router Configuration (Connector Resource)
- **Resource Type**: `Connector` (tailscale.com/v1alpha1)
- **Device Name**: `keyboardvagabond-cluster`
- **Advertised Networks**:
  - **Pod Network**: 10.244.0.0/16
  - **Service Network**: 10.96.0.0/12
  - **VLAN Network**: 10.132.0.0/24
- **OAuth Integration**: Client credentials for device authentication
- **Device Tagging**: `tag:k8s-operator` for ACL management

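
Put together, the settings above correspond to a `Connector` resource along these lines. The device name, tag, and routes are taken from the list; using the device name as the resource name is an assumption, and the OAuth credentials are configured on the operator rather than in this resource.

```yaml
# Sketch only: subnet router Connector built from the values listed above
apiVersion: tailscale.com/v1alpha1
kind: Connector
metadata:
  name: keyboardvagabond-cluster       # assumed to match the device name
spec:
  hostname: keyboardvagabond-cluster
  tags:
    - tag:k8s-operator
  subnetRouter:
    advertiseRoutes:
      - 10.244.0.0/16                  # pod network
      - 10.96.0.0/12                   # service network
      - 10.132.0.0/24                  # VLAN network
```
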
### Service Exposure via Magic DNS
- **Capability**: Services can be exposed via Tailscale operator with meta attributes
- **Magic DNS**: Automatic DNS resolution for exposed services
- **Meta Attributes**: Can be used to configure service exposure and routing
- **Access Control**: Cilium host firewall restricts to Tailscale only
- **Current CGNAT Range**: 100.64.0.0/10 (Tailscale assigned)

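
If "meta attributes" refers to the operator's Service annotations (an assumption), exposing a service on the tailnet could look like the sketch below. The service name, namespace, selector, and ports are hypothetical.

```yaml
# Sketch only: exposing a hypothetical Service through the Tailscale operator
apiVersion: v1
kind: Service
metadata:
  name: example-app                        # hypothetical service
  namespace: default                       # hypothetical namespace
  annotations:
    tailscale.com/expose: "true"           # ask the operator to proxy this Service
    tailscale.com/hostname: example-app    # name the device gets in Magic DNS
spec:
  selector:
    app: example-app
  ports:
    - port: 80
      targetPort: 8080
```
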
## Component Status Matrix ✅ CURRENT STATE

### Active Components
| Component | Status | Access Method | Notes |
|-----------|--------|---------------|-------|
| **Cilium CNI** | ✅ Operational | Internal | Host firewall + Hubble UI |
| **Longhorn Storage** | ✅ Operational | Internal | 2-replica with S3 backup |
| **PostgreSQL HA** | ✅ Operational | Internal | 3-instance CloudNativePG |
| **Harbor Registry** | ✅ Operational | Direct HTTPS | Zero Trust incompatible |
| **OpenObserve** | ✅ Operational | Zero Trust | Monitoring platform |
| **Tailscale VPN** | ✅ Operational | Mesh Network | Administrative access |

### Disabled/Deprecated Components
| Component | Status | Reason | Alternative |
|-----------|--------|--------|-------------|
| **external-dns** | ❌ Removed | Zero Trust migration | Manual DNS in Cloudflare |
| **cert-manager** | ❌ Removed | Zero Trust migration | Cloudflare edge TLS |
| **Rook-Ceph** | ❌ Disabled | Complexity and lack of support for partitioning a single drive | Longhorn storage |
| **Flux GitOps** | ⏸️ Disabled | Manual deployment | Ready for re-activation |

### Development Components
| Component | Status | Purpose | Access |
|-----------|--------|---------|--------|
| **Renovate** | ✅ Operational | Dependency updates | Automated |
| **Elasticsearch** | ✅ Operational | Log aggregation | Internal |
| **Kibana** | ✅ Operational | Log analytics | Zero Trust |

## Network Security Configuration ✅ HARDENED

### Cilium Host Firewall Rules
```yaml
# Control plane API access (Tailscale only)
- fromCIDR: ["100.64.0.0/10"]  # Tailscale CGNAT
  toPorts:
    - ports:
        - port: "6443"
          protocol: TCP

# Block world access to HTTP/HTTPS:
#   - HTTP/HTTPS ports blocked from 0.0.0.0/0
#   - Only cluster-internal and Tailscale access permitted
```

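
For context, a rule like the one above would sit inside a `CiliumClusterwideNetworkPolicy` that selects the nodes. The sketch below is one plausible shape, not the cluster's actual policy: the policy name and the node selector label are assumptions. Once a host policy selects a node, ingress that is not explicitly allowed is dropped, so the second rule keeps cluster-internal traffic flowing.

```yaml
# Sketch only: host firewall policy wrapping the API-server rule (name and selector assumed)
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: host-fw-control-plane                      # hypothetical name
spec:
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/control-plane: ""    # assumed label on all three nodes
  ingress:
    # Kubernetes API reachable only from the Tailscale CGNAT range
    - fromCIDR:
        - 100.64.0.0/10
      toPorts:
        - ports:
            - port: "6443"
              protocol: TCP
    # Keep node-to-node and pod-to-node traffic inside the cluster working
    - fromEntities:
        - cluster
```
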
### Zero Trust Architecture
- **External Applications**: All via Cloudflare tunnels
- **Administrative APIs**: Tailscale mesh VPN only
- **Harbor Exception**: Direct ports 80/443 (header modification issues)
- **Internal Services**: Cluster-local communication only

## Future Scaling Specifications

### Node Addition Process
1. **Network**: Add to NetCup Cloud VLAN 1004963
2. **IP Assignment**: Sequential (10.132.0.40/24, 10.132.0.50/24, etc.)
3. **Talos Config**: Apply machine config with proper networking
4. **Longhorn**: Automatic storage distribution across new nodes
5. **Workload**: Immediate scheduling capability

### High Availability Expansion
- **Additional Control Planes**: Can be added beyond the current three to tolerate more node failures (keep the total odd for etcd quorum)
- **Load Balancing**: MetalLB or cloud LB integration ready
- **Database Scaling**: PostgreSQL can expand to more replicas
- **Storage Scaling**: Longhorn distributed across all nodes

@talos-machine-config-template.yaml
@cilium-network-policy-template.yaml
@longhorn-volume-template.yaml