add source code and readme

2025-12-24 14:35:17 +01:00
parent 7c92e1e610
commit 74324d5a1b
331 changed files with 39272 additions and 1 deletion

@@ -0,0 +1,58 @@
---
description: Keyboard Vagabond project overview and core infrastructure context
globs: []
alwaysApply: true
---
# Keyboard Vagabond - Project Overview
## System Overview
This is a **Talos-based Kubernetes cluster** designed to host **fediverse applications** for <200 MAU (Monthly Active Users):
- **Mastodon** (Twitter-like microblogging) ✅ OPERATIONAL
- **Pixelfed** (Instagram-like photo sharing) ✅ OPERATIONAL
- **PieFed** (Reddit-like forum) ✅ OPERATIONAL
- **BookWyrm** (Social reading platform) ✅ OPERATIONAL
- **Matrix** (Chat/messaging) - Future deployment
## Architecture Summary ✅ OPERATIONAL
- **Three ARM64 Nodes**: n1, n2, n3 (all control plane nodes with VIP 10.132.0.5)
- **Zero Trust Security**: Cloudflare tunnels + Tailscale mesh VPN
- **Storage**: Longhorn distributed with S3 backup to Backblaze B2
- **Database**: PostgreSQL HA cluster with CloudNativePG operator
- **Cache**: Redis HA cluster with HAProxy (redis-ha-haproxy.redis-system.svc.cluster.local)
- **Monitoring**: OpenTelemetry + OpenObserve (O2)
- **Registry**: Harbor container registry
- **CDN**: Per-application Cloudflare CDN with dedicated S3 buckets
## Project Structure
```
keyboard-vagabond/
├── .cursor/rules/ # Cursor rules (this directory)
├── docs/ # Operational documentation and guides
├── manifests/ # Kubernetes manifests
│ ├── infrastructure/ # Core infrastructure components
│ ├── applications/ # Fediverse applications
│ └── cluster/flux-system/ # GitOps configuration
├── build/ # Custom container builds
├── machineconfigs/ # Talos node configurations
└── tools/ # Development utilities
```
## Rule Organization
The `.cursor/rules/` directory contains specialized rules:
- **00-project-overview.mdc** (this file) - Always applied project context
- **infrastructure.mdc**: Auto-attached when working in `manifests/infrastructure/`
- **applications.mdc**: Auto-attached when working in `manifests/applications/`
- **security.mdc**: SOPS and Zero Trust patterns (auto-attached for YAML files)
- **development.mdc**: Development patterns and operational guidelines
- **troubleshooting-history.mdc**: Historical issues, migrations, and lessons learned
- **templates/**: Common configuration templates (*.yaml files)
## Key Operational Facts
- **Domain**: `keyboardvagabond.com`
- **API Endpoint**: `api.keyboardvagabond.com:6443` (Tailscale-only access)
- **Control Plane VIP**: `10.132.0.5:6443` (nodes elect primary, VIP provides HA)
- **Zero Trust**: All external services via Cloudflare tunnels (no port exposure)
- **Network**: NetCup Cloud vLAN 1004963 (10.132.0.0/24)
- **Security**: Enterprise-grade with SOPS encryption, mesh VPN, host firewall
- **Status**: Fully operational, production-ready cluster

@@ -0,0 +1,124 @@
---
description: Fediverse applications deployment patterns and configurations
globs: ["manifests/applications/**/*", "build/**/*"]
alwaysApply: false
---
# Fediverse Applications ✅ OPERATIONAL
## Application Overview
All applications use **Zero Trust architecture** via Cloudflare tunnels with dedicated S3 buckets for media storage:
### Currently Deployed Applications
- **Mastodon**: `https://mastodon.keyboardvagabond.com` - Microblogging platform ✅ OPERATIONAL
- **Pixelfed**: `https://pixelfed.keyboardvagabond.com` - Photo sharing platform ✅ OPERATIONAL
- **PieFed**: `https://piefed.keyboardvagabond.com` - Forum/Reddit-like platform ✅ OPERATIONAL
- **BookWyrm**: `https://bookwyrm.keyboardvagabond.com` - Social reading platform ✅ OPERATIONAL
- **Picsur**: `https://picsur.keyboardvagabond.com` - Image storage ✅ OPERATIONAL
## Application Architecture Patterns
### Multi-Container Design
Most fediverse applications use **multi-container architecture**:
- **Web Container**: HTTP requests, API, web UI (Nginx + app server)
- **Worker Container**: Background jobs, federation, media processing
- **Beat Container**: (Django apps only) Celery Beat scheduler for periodic tasks
### Storage Strategy ✅ OPERATIONAL
**Per-Application CDN Strategy**: Each application uses dedicated Backblaze B2 bucket with Cloudflare CDN:
- **Pixelfed CDN**: `pm.keyboardvagabond.com` → `pixelfed-bucket`
- **PieFed CDN**: `pfm.keyboardvagabond.com` → `piefed-bucket`
- **Mastodon CDN**: `mm.keyboardvagabond.com` → `mastodon-bucket`
- **BookWyrm CDN**: `bm.keyboardvagabond.com` → `bookwyrm-bucket`
### Database Integration
All applications use the shared **PostgreSQL HA cluster**:
- **Connection**: `postgresql-shared-rw.postgresql-system.svc.cluster.local:5432`
- **Dedicated Databases**: Each app has its own database (e.g., `mastodon`, `pixelfed`, `piefed`, `bookwyrm`)
- **High Availability**: 3-instance cluster with automatic failover
## Framework-Specific Patterns
### Laravel Applications (Pixelfed)
```bash
# Critical Laravel S3 Configuration
FILESYSTEM_DRIVER=s3
PF_ENABLE_CLOUD=true
FILESYSTEM_CLOUD=s3
AWS_BUCKET=pixelfed-bucket # Dedicated bucket approach
AWS_URL=https://pm.keyboardvagabond.com/ # CDN URL
```
### Flask Applications (PieFed)
```bash
# Flask Configuration with Redis and S3
FLASK_APP=pyfedi.py
DATABASE_URL=
CACHE_REDIS_URL=
S3_BUCKET=
S3_PUBLIC_URL=https://pfm.keyboardvagabond.com
```
### Django Applications (BookWyrm)
```bash
# Django S3 Configuration
USE_S3=true
AWS_STORAGE_BUCKET_NAME=bookwyrm-bucket
AWS_S3_CUSTOM_DOMAIN=bm.keyboardvagabond.com
AWS_DEFAULT_ACL="" # Backblaze B2 doesn't support ACLs
```
### Ruby Applications (Mastodon)
```yaml
# Mastodon Dual Ingress Pattern
# Web: mastodon.keyboardvagabond.com
# Streaming: streamingmastodon.keyboardvagabond.com (WebSocket)
STREAMING_API_BASE_URL: wss://streamingmastodon.keyboardvagabond.com
```
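As a sketch, the dual-ingress pattern maps the two hostnames to separate services. The namespace, service names, and ports below are assumptions based on Mastodon's defaults, not the deployed manifests:
```yaml
# Sketch only — namespace, service names, and ports are assumed defaults
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mastodon-web
  namespace: mastodon
spec:
  ingressClassName: nginx
  rules:
    - host: mastodon.keyboardvagabond.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: mastodon-web
                port:
                  number: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mastodon-streaming
  namespace: mastodon
spec:
  ingressClassName: nginx
  rules:
    - host: streamingmastodon.keyboardvagabond.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: mastodon-streaming
                port:
                  number: 4000
```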
## Container Build Patterns
### Multi-Stage Docker Strategy ✅ WORKING
Optimized builds reduce image size by ~75%:
- **Base Image**: Shared foundation with dependencies and source code
- **Web Container**: Production web server configuration
- **Worker Container**: Background processing optimizations
- **Size Reduction**: From 1.3GB single-stage to ~350MB multi-stage
### Harbor Registry Integration
- **Registry**: `<YOUR_REGISTRY_URL>`
- **Image Pattern**: `<YOUR_REGISTRY_URL>/library/app-name:tag`
- **Build Process**: `./build-all.sh` in project root
## ActivityPub Inbox Rate Limiting ✅ OPERATIONAL
### Nginx Burst Configuration Pattern
Implemented across all fediverse applications to handle federation traffic spikes:
```nginx
# Rate limiting zone - 100MB of shared memory for client state, 10 requests/second
limit_req_zone $binary_remote_addr zone=inbox:100m rate=10r/s;
# ActivityPub inbox location block
location /inbox {
limit_req zone=inbox burst=300; # 300 request buffer
# Extended timeouts for ActivityPub processing
}
```
### Rate Limiting Behavior
- **Normal Operation**: 10 requests/second processed immediately
- **Burst Handling**: Up to 300 additional requests queued
- **Overflow Response**: HTTP 503 once the 300-request burst queue is full
- **Federation Impact**: Protects backend from overwhelming traffic spikes
## Application Deployment Standards
- **Zero Trust Ingress**: All applications use Cloudflare tunnel pattern
- **Container Registry**: Harbor for all custom images
- **Multi-Stage Builds**: Required for Python/Node.js applications
- **Storage**: Longhorn with 2-replica redundancy
- **Monitoring**: ServiceMonitor integration with OpenObserve
- **Rate Limiting**: ActivityPub inbox protection for all fediverse apps
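For the monitoring standard above, a minimal ServiceMonitor sketch (label and port names are illustrative):
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
  namespace: app-namespace
spec:
  selector:
    matchLabels:
      app: app-name      # must match the Service's labels
  endpoints:
    - port: metrics      # named port on the Service
      interval: 30s
      path: /metrics
```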
@fediverse-app-template.yaml
@s3-storage-config-template.yaml
@activitypub-rate-limiting-template.yaml

@@ -0,0 +1,140 @@
---
description: Development patterns, operational guidelines, and troubleshooting
globs: ["build/**/*", "tools/**/*", "justfile", "*.md"]
alwaysApply: false
---
# Development Patterns & Operational Guidelines
## Configuration Management
- **Kustomize**: Used for resource composition and patching via `patches/` directory
- **Helm**: Complex applications deployed via HelmRelease CRDs
- **GitOps**: All applications deployed via Flux from Git repository (`k8s-fleet` branch)
- **Staging**: Use separate branches/overlays for staging vs production environments
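A minimal sketch of that GitOps wiring; the repository URL and decryption secret name are placeholders (the actual configuration lives in `manifests/cluster/flux-system/`):
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/<YOUR_ORG>/keyboard-vagabond  # placeholder
  ref:
    branch: k8s-fleet
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: applications
  namespace: flux-system
spec:
  interval: 10m
  path: ./manifests/applications
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops
    secretRef:
      name: sops-gpg  # placeholder secret holding the PGP private key
```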
## Application Deployment Standards
- **Container Registry**: Use Harbor (`<YOUR_REGISTRY_URL>`) for all custom images
- **Multi-Stage Builds**: Implement for Python/Node.js applications to reduce image size by ~75%
- **Storage**: Use Longhorn with 2-replica redundancy, label volumes for S3 backup selection
- **Database**: Leverage shared PostgreSQL cluster with dedicated databases per application
- **Monitoring**: Implement ServiceMonitor for OpenObserve integration
- **Rate Limiting**: Implement ActivityPub inbox burst protection for all fediverse applications
## Email Templates & User Onboarding
- **Community Signup**: Professional welcome email template at `docs/email-templates/community-signup.html`
- **Authentik Integration**: Uses `{AUTHENTIK_URL}` placeholder for account activation links
- **Documentation**: Complete setup guide in `docs/email-templates/README.md`
- **Services Overview**: Template showcases all fediverse services with direct links
- **Branding**: Features horizontal Keyboard Vagabond logo from Picsur CDN
## Container Build Patterns
### Multi-Stage Docker Strategy ✅ WORKING
**Key Lessons Learned**:
- **Framework Identification**: Critical to identify Flask vs Django early (different command structures)
- **Python Virtual Environment**: uWSGI must use same Python version as venv
- **Static File Paths**: Flask apps with application factory have nested structure (`/app/app/static/`)
- **Database Initialization**: Flask requires explicit `flask init-db` command
- **Log File Permissions**: Non-root users need explicit ownership of log files
### Build Process
```bash
# Build all containers
./build-all.sh
# Build specific application
cd build/app-name
docker build -t <YOUR_REGISTRY_URL>/library/app-name:tag .
docker push <YOUR_REGISTRY_URL>/library/app-name:tag
```
## Key Framework Patterns
### Flask Applications (PieFed)
- **Environment Variables**: URL-based configuration (DATABASE_URL, REDIS_URL)
- **uWSGI Integration**: Install via pip in venv, not Alpine packages
- **Static Files**: Careful nginx configuration for nested structure
- **Multi-stage Builds**: Essential to remove build dependencies
### Django Applications (BookWyrm)
- **S3 Static Files**: Theme compilation before static collection
- **Celery Beat**: Single instance only (prevents duplicate scheduling)
- **ACL Configuration**: Backblaze B2 requires empty `AWS_DEFAULT_ACL`
### Laravel Applications (Pixelfed)
- **S3 Default Disk**: `DANGEROUSLY_SET_FILESYSTEM_DRIVER=s3` required
- **Cache Invalidation**: `php artisan config:cache` after S3 changes
- **Dedicated Buckets**: Avoid prefix conflicts with dedicated bucket approach
## Operational Tools & Management
### Administrative Access ✅ SECURED
- **kubectl Context**: `admin@keyboardvagabond-tailscale` (internal VLAN IP)
- **Tailscale Client**: CGNAT range 100.64.0.0/10 access only
- **Harbor Registry**: Direct HTTPS access (Zero Trust incompatible)
### Essential Commands
```bash
# Talos cluster management (Tailscale VPN required)
talosctl config endpoint 10.132.0.10 10.132.0.20 10.132.0.30
talosctl health
# Kubernetes cluster access
kubectl config use-context admin@keyboardvagabond-tailscale
kubectl get nodes
# SOPS secret management
sops -e -i secrets.yaml
sops -d secrets.yaml | kubectl apply -f -
# Flux GitOps management
flux get sources all
flux reconcile source git flux-system
```
### Terminal Environment Notes
- **PowerShell on macOS**: PSReadLine may display errors but commands execute successfully
- **Terminal Preference**: Use default OS terminal over PowerShell (except Windows)
- **Command Output**: Despite display issues, outputs remain readable and functional
## Scaling Preparation
- **Node Addition**: NetCup Cloud vLAN 1004963 with sequential IPs (10.132.0.x/24)
- **Storage Scaling**: Longhorn distributed across nodes with S3 backup integration
- **Load Balancing**: MetalLB or cloud load balancer integration ready
- **High Availability**: Additional control plane nodes can be added
## Troubleshooting Patterns
### Zero Trust Issues
- **Corporate VPN Blocking**: SSL handshake failures - test from different networks
- **Service Discovery**: Check label mismatch between service selector and pod labels
- **StatefulSet Issues**: Use manual Helm deployment for immutable field changes
### Common Application Issues
- **PHP Applications**: Clear Laravel config cache after environment changes
- **Flask Applications**: Verify uWSGI Python version matches venv
- **Django Applications**: Ensure theme compilation before static file collection
- **Container Builds**: Multi-stage builds reduce size but require careful dependency management
### Network & Storage Issues
- **Longhorn**: Check replica distribution across nodes
- **S3 Backup**: Verify volume labels for backup inclusion
- **Database**: Use read replicas for read-heavy operations
- **CDN**: Dedicated buckets eliminate prefix conflicts
## Performance Optimizations
- **CDN Caching**: Cloudflare cache rules for static assets (1 year cache)
- **Image Processing**: Background workers handle optimization and federation
- **Database Optimization**: Read replicas and proper indexing
- **ActivityPub Rate Limiting**: 10r/s with 300 request burst buffer
## Future Development Guidelines
- **New Services**: Zero Trust ingress pattern mandatory (no cert-manager/external-dns)
- **Security**: Never expose external ingress ports - all traffic via Cloudflare tunnels
- **CDN Strategy**: Use dedicated S3 buckets per application
- **Subdomains**: Cloudflare Free plan supports only one level (`app.domain.com`)
@development-workflow-template.yaml
@container-build-template.dockerfile
@troubleshooting-history.mdc
@talos-config-template.yaml

@@ -0,0 +1,124 @@
# Fediverse Application Deployment Template
# Multi-container architecture with web, worker, and optional beat containers
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-web
namespace: app-namespace
spec:
replicas: 2
selector:
matchLabels:
app: app-name
component: web
template:
metadata:
labels:
app: app-name
component: web
spec:
containers:
- name: web
image: <YOUR_REGISTRY_URL>/library/app-name:latest
ports:
- containerPort: 8080
env:
- name: DATABASE_URL
value: "postgresql://user:password@postgresql-shared-rw.postgresql-system.svc.cluster.local:5432/app_db"
- name: REDIS_URL
value: "redis://:password@redis-ha-haproxy.redis-system.svc.cluster.local:6379/0"
- name: S3_BUCKET
value: "app-bucket"
- name: S3_CDN_URL
value: "https://cdn.keyboardvagabond.com"
envFrom:
- secretRef:
name: app-secret
- configMapRef:
name: app-config
volumeMounts:
- name: app-storage
mountPath: /app/storage
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "1Gi"
cpu: "500m"
volumes:
- name: app-storage
persistentVolumeClaim:
claimName: app-storage-pvc
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-worker
namespace: app-namespace
spec:
replicas: 1
selector:
matchLabels:
app: app-name
component: worker
template:
metadata:
labels:
app: app-name
component: worker
spec:
containers:
- name: worker
image: <YOUR_REGISTRY_URL>/library/app-worker:latest
command: ["worker-command"] # Framework-specific worker command
env:
- name: DATABASE_URL
value: "postgresql://user:password@postgresql-shared-rw.postgresql-system.svc.cluster.local:5432/app_db"
- name: REDIS_URL
value: "redis://:password@redis-ha-haproxy.redis-system.svc.cluster.local:6379/0"
envFrom:
- secretRef:
name: app-secret
- configMapRef:
name: app-config
resources:
requests:
memory: "128Mi"
cpu: "50m"
limits:
memory: "512Mi"
cpu: "200m"
---
# Optional: Celery Beat for Django applications (single replica only)
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-beat
namespace: app-namespace
spec:
replicas: 1 # CRITICAL: Never scale beyond 1 replica
strategy:
type: Recreate # Ensures only one scheduler runs
selector:
matchLabels:
app: app-name
component: beat
template:
metadata:
labels:
app: app-name
component: beat
spec:
containers:
- name: beat
image: <YOUR_REGISTRY_URL>/library/app-worker:latest
command: ["celery", "-A", "app", "beat", "-l", "info", "--scheduler", "django_celery_beat.schedulers:DatabaseScheduler"]
envFrom:
- secretRef:
name: app-secret
- configMapRef:
name: app-config

@@ -0,0 +1,157 @@
---
description: Infrastructure components configuration and deployment patterns
globs: ["manifests/infrastructure/**/*", "manifests/cluster/**/*"]
alwaysApply: false
---
# Infrastructure Components ✅ OPERATIONAL
## Core Infrastructure Stack
Located in `manifests/infrastructure/`:
- **Networking**: Cilium CNI with host firewall and Hubble UI ✅ **OPERATIONAL**
- **Storage**: Longhorn distributed storage (2-replica configuration) ✅ **OPERATIONAL**
- **Ingress**: NGINX Ingress Controller with hostNetwork enabled (Zero Trust mode) ✅ **OPERATIONAL**
- **Zero Trust Tunnels**: Cloudflared deployment in `cloudflared-system` namespace ✅ **OPERATIONAL**
- **Registry**: Harbor container registry (`<YOUR_REGISTRY_URL>`) ✅ **OPERATIONAL**
- **Monitoring**: OpenTelemetry Operator + OpenObserve (O2) ✅ **OPERATIONAL**
- **Database**: PostgreSQL with CloudNativePG operator ✅ **OPERATIONAL**
- **Identity**: Authentik open-source IAM ✅ **OPERATIONAL**
- **VPN**: Tailscale mesh VPN for administrative access ✅ **OPERATIONAL**
## Component Status Matrix
### Active Components ✅ OPERATIONAL
- **Cilium**: CNI with kube-proxy replacement, host firewall
- **Longhorn**: Distributed storage with S3 backup to Backblaze B2
- **PostgreSQL**: 3-instance HA cluster with comprehensive monitoring
- **Harbor**: Container registry (direct HTTPS - Zero Trust incompatible)
- **OpenObserve**: Monitoring and observability platform
- **Authentik**: Open-source identity and access management
- **Renovate**: Automated dependency updates ✅ **ACTIVE**
### Disabled/Deprecated Components
- **external-dns**: ❌ **REMOVED** (replaced by Zero Trust tunnels)
- **cert-manager**: ❌ **REMOVED** (replaced by Cloudflare edge TLS)
- **Rook-Ceph**: ⏸️ **DISABLED** (complexity - using Longhorn instead)
- **Flux GitOps**: ⏸️ **DISABLED** (manual deployment - ready for re-activation)
### Development/Optional Components
- **Elasticsearch**: ✅ **OPERATIONAL** (log aggregation)
- **Kibana**: ✅ **OPERATIONAL** (log analytics via Zero Trust tunnel)
## Network Configuration ✅ OPERATIONAL
- **NetCup Cloud vLAN**: VLAN ID 1004963 for internal cluster communication
- **Control Plane VIP**: `10.132.0.5` (shared VIP, nodes elect primary for HA)
- **Node IPs** (all control plane nodes):
- n1 (152.53.107.24): Public + 10.132.0.10/24 (VLAN)
- n2 (152.53.105.81): Public + 10.132.0.20/24 (VLAN)
- n3 (152.53.200.111): Public + 10.132.0.30/24 (VLAN)
- **DNS Domain**: Uses standard `cluster.local` for maximum compatibility
- **CNI**: Cilium with kube-proxy replacement
- **Service Mesh**: Cilium with Hubble for observability
## Storage Configuration ✅ OPERATIONAL
### Longhorn Storage
- **Default Path**: `/var/lib/longhorn`
- **Replica Count**: 2 (distributed across nodes)
- **Storage Class**: `longhorn-retain` for data preservation
- **S3 Backup**: Backblaze B2 integration with label-based volume selection
### S3 Backup Configuration
- **Provider**: Backblaze B2 Cloud Storage
- **Cost**: $6/TB/month storage with $0 egress fees via Cloudflare partnership
- **Volume Selection**: Label-based tagging system for selective backup
- **Disaster Recovery**: Automated backup scheduling and restore capabilities
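As a sketch, one way to declare the backup target is via Longhorn `Setting` resources pointing at B2's S3-compatible endpoint; bucket and region values are placeholders:
```yaml
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: backup-target
  namespace: longhorn-system
value: "s3://<BUCKET_NAME>@<REGION>/"
---
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: backup-target-credential-secret
  namespace: longhorn-system
value: "longhorn-backup-target"  # the credentials secret shown in the storage template
```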
## Database Configuration ✅ OPERATIONAL
### PostgreSQL with CloudNativePG
- **Cluster Name**: `postgres-shared` in `postgresql-system` namespace
- **High Availability**: 3-instance cluster with automatic failover
- **Instances**: `postgres-shared-2` (primary), `postgres-shared-4`, `postgres-shared-5`
- **Monitoring**: Port 9187 for comprehensive metrics export
- **Backup Strategy**: Integrated with S3 backup system via Longhorn volume labels
## Cache Configuration ✅ OPERATIONAL
### Redis HA Cluster
- **Helm Chart**: `redis-ha` from `dandydeveloper/charts` (replaced deprecated Bitnami chart)
- **Namespace**: `redis-system`
- **Architecture**: 3 Redis replicas with Sentinel for HA, 3 HAProxy pods for load balancing
- **Connection String**: `redis-ha-haproxy.redis-system.svc.cluster.local:6379`
- **HAProxy**: Provides unified read/write endpoint managed by 3 HAProxy pods
- **Storage**: Longhorn persistent volumes (20Gi per Redis instance)
- **Authentication**: SOPS-encrypted credentials in `redis-credentials` secret
- **Monitoring**: Redis exporter and HAProxy metrics via ServiceMonitor
### PostgreSQL Comprehensive Metrics ✅ OPERATIONAL
- **Connection Metrics**: `cnpg_backends_total`, `cnpg_pg_settings_setting{name="max_connections"}`
- **Performance Metrics**: `cnpg_pg_stat_database_xact_commit`, `cnpg_pg_stat_database_xact_rollback`
- **Storage Metrics**: `cnpg_pg_database_size_bytes`, `cnpg_pg_stat_database_blks_hit`
- **Cluster Health**: `cnpg_collector_up`, `cnpg_collector_postgres_version`
- **Security**: Role-based access control with `pg_monitor` role for metrics collection
- **Backup Integration**: Native support for WAL archiving and point-in-time recovery
- **Custom Queries**: ConfigMap-based custom query system with proper RBAC permissions (format sketched below)
- **Dashboard Integration**: Native OpenObserve integration with predefined monitoring queries
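A sketch of the custom-queries ConfigMap format CloudNativePG consumes; the query and metric names are illustrative:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgresql-custom-queries
  namespace: postgresql-system
data:
  custom-queries: |
    pg_database_size:
      query: "SELECT datname, pg_database_size(datname) AS size_bytes FROM pg_database"
      metrics:
        - datname:
            usage: "LABEL"
            description: "Name of the database"
        - size_bytes:
            usage: "GAUGE"
            description: "Database size in bytes"
```
The ConfigMap would then be referenced from the Cluster spec via `monitoring.customQueriesConfigMap`.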
## Security & Access Control ✅ ZERO TRUST ARCHITECTURE
### Zero Trust Migration ✅ COMPLETED
- **Migration Status**: 10 of 11 external services migrated to Cloudflare Zero Trust tunnels
- **Harbor Exception**: Direct port exposure (80/443) due to header modification issues
- **Dependencies Removed**: external-dns and cert-manager no longer needed
- **Security Improvement**: No external ingress ports exposed
### Tailscale Administrative Access ✅ IMPLEMENTED
- **Deployment Model**: Tailscale Operator Helm Chart (v1.90.x)
- **Operator**: Deployed in `tailscale-system` namespace with 2 replicas
- **Subnet Router**: Connector resource advertising internal networks (Pod: 10.244.0.0/16, Service: 10.96.0.0/12, VLAN: 10.132.0.0/24)
- **Magic DNS**: Services can be exposed via Tailscale operator with meta attributes for DNS resolution
- **OAuth Integration**: Device authentication and tagging with `tag:k8s-operator`
- **Hostname**: `keyboardvagabond-operator` for operator, `keyboardvagabond-cluster` for subnet router
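A sketch of that Connector resource, reconstructed from the values above using the Tailscale operator's v1alpha1 API:
```yaml
apiVersion: tailscale.com/v1alpha1
kind: Connector
metadata:
  name: keyboardvagabond-cluster
spec:
  hostname: keyboardvagabond-cluster
  tags:
    - "tag:k8s-operator"
  subnetRouter:
    advertiseRoutes:
      - "10.244.0.0/16"  # Pod network
      - "10.96.0.0/12"   # Service network
      - "10.132.0.0/24"  # NetCup Cloud VLAN
```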
## Infrastructure Deployment Patterns
### Kustomize Configuration
```yaml
# Standard kustomization.yaml structure
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: component-namespace
resources:
- namespace.yaml
- component.yaml
- monitoring.yaml
```
### Helm Integration
```yaml
# HelmRelease for complex applications
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: component-name
  namespace: component-namespace
spec:
  interval: 30m            # required reconciliation interval
  chart:
    spec:
      chart: chart-name
      version: "x.y.z"     # pin the chart version
      sourceRef:
        kind: HelmRepository
        name: repo-name
```
## Operational Procedures
### Node Addition and Scaling
When adding new nodes to the cluster, specific steps are required to ensure monitoring and metrics collection continue working properly:
- **Nginx Ingress Metrics**: See `docs/NODE-ADDITION-GUIDE.md` for complete procedures
- Nginx ingress controller deploys automatically (DaemonSet)
- OpenTelemetry collector static scrape configuration requires manual update
- Must add the new node's IP to the targets list in `manifests/infrastructure/openobserve-collector/gateway-collector.yaml` (sketched below)
- Verification steps include checking metrics endpoints and collector logs
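A hypothetical excerpt of that static scrape configuration, assuming the default nginx ingress controller metrics port (10254) and an illustrative job name:
```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: nginx-ingress
          static_configs:
            - targets:
                - "10.132.0.10:10254"
                - "10.132.0.20:10254"
                - "10.132.0.30:10254"
                # Append each new node's VLAN IP here, e.g. "10.132.0.40:10254"
```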
### Key Files for Node Operations
- **Monitoring Configuration**: `manifests/infrastructure/openobserve-collector/gateway-collector.yaml`
- **Network Policies**: `manifests/infrastructure/cluster-policies/host-fw-*.yaml`
- **Node Addition Guide**: `docs/NODE-ADDITION-GUIDE.md`
@zero-trust-ingress-template.yaml
@longhorn-storage-template.yaml
@postgresql-database-template.yaml

@@ -0,0 +1,128 @@
# Longhorn Storage Templates
# Persistent volume configurations with backup labels
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: app-storage-pvc
namespace: app-namespace
labels:
# S3 backup inclusion labels
recurring-job.longhorn.io/backup: enabled
recurring-job-group.longhorn.io/backup: enabled
spec:
accessModes:
- ReadWriteMany # Default for applications that may scale horizontally
# Use ReadWriteOnce for:
# - Single-instance applications (databases, stateful apps)
# - CloudNativePG (manages its own storage replication)
# - Applications with file locking requirements
storageClassName: longhorn-retain # Data preservation on deletion
resources:
requests:
storage: 10Gi
---
# Longhorn StorageClass with retain policy
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: longhorn-retain
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain # Preserves data on PVC deletion
volumeBindingMode: Immediate
parameters:
numberOfReplicas: "2" # 2-replica redundancy
staleReplicaTimeout: "2880" # 48 hours
fromBackup: ""
fsType: "xfs"
dataLocality: "disabled" # Allow cross-node placement
---
# Longhorn Backup Target Configuration
apiVersion: v1
kind: Secret
metadata:
name: longhorn-backup-target
namespace: longhorn-system
type: Opaque
data:
# Backblaze B2 credentials (base64 encoded, encrypted by SOPS)
AWS_ACCESS_KEY_ID: base64-encoded-key-id
AWS_SECRET_ACCESS_KEY: base64-encoded-secret-key
AWS_ENDPOINTS: aHR0cHM6Ly9zMy5ldS1jZW50cmFsLTAwMy5iYWNrYmxhemViMi5jb20= # Base64: https://s3.eu-central-003.backblazeb2.com
---
# Longhorn RecurringJob for S3 Backup
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
name: backup-to-s3
namespace: longhorn-system
spec:
cron: "0 2 * * *" # Daily at 2 AM
task: "backup"
groups:
- backup
retain: 7 # Keep 7 daily backups
concurrency: 2 # Concurrent backup jobs
labels:
recurring-job: backup-to-s3
---
# Volume labeling example for backup inclusion
apiVersion: v1
kind: PersistentVolume
metadata:
name: example-pv
labels:
# These labels ensure volume is included in S3 backup jobs
recurring-job.longhorn.io/backup: enabled
recurring-job-group.longhorn.io/backup: enabled
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: longhorn-retain
csi:
driver: driver.longhorn.io
volumeHandle: example-volume-id
---
# Example: Database storage (ReadWriteOnce required)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-storage-pvc
namespace: postgresql-system
labels:
recurring-job.longhorn.io/backup: enabled
recurring-job-group.longhorn.io/backup: enabled
spec:
accessModes:
- ReadWriteOnce # Required for databases - single writer only
storageClassName: longhorn-retain
resources:
requests:
storage: 50Gi
# Access Mode Guidelines:
# - ReadWriteMany (RWX): Default for horizontally scalable applications
# * Web applications that can run multiple pods
# * Shared file storage for multiple containers
# * Applications without file locking conflicts
#
# - ReadWriteOnce (RWO): Required for specific use cases
# * Database storage (PostgreSQL, Redis) - single writer required
# * Applications with file locking (SQLite, local file databases)
# * StatefulSets that manage their own replication
# * Single-instance applications by design
# Backup Strategy Notes:
# - Cost: $6/TB/month storage with $0 egress fees via Cloudflare partnership
# - Selection: Label-based tagging system for selective volume backup
# - Recovery: Automated backup scheduling and restore capabilities
# - Target: @/longhorn backup location in Backblaze B2

@@ -0,0 +1,202 @@
# PostgreSQL Database Templates
# CloudNativePG cluster configuration and application integration
# Main PostgreSQL Cluster (already deployed as postgres-shared)
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: postgres-shared
namespace: postgresql-system
spec:
instances: 3 # High availability with automatic failover
postgresql:
parameters:
max_connections: "200"
shared_buffers: "256MB"
effective_cache_size: "1GB"
bootstrap:
initdb:
database: postgres
owner: postgres
storage:
storageClass: longhorn-retain
size: 50Gi
monitoring:
enablePodMonitor: true
# Application-specific database and user creation
---
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
name: app-database
namespace: postgresql-system
spec:
name: app_db
owner: app_user
cluster:
name: postgres-shared
---
# Application database user secret
apiVersion: v1
kind: Secret
metadata:
name: app-postgresql-secret
namespace: app-namespace
type: Opaque
data:
# Base64 encoded credentials (encrypted by SOPS)
# Replace with actual base64-encoded values before encryption
username: <REPLACE_WITH_BASE64_ENCODED_USERNAME>
password: <REPLACE_WITH_BASE64_ENCODED_PASSWORD>
database: <REPLACE_WITH_BASE64_ENCODED_DATABASE_NAME>
---
# Connection examples for different frameworks
# Laravel/Pixelfed connection
apiVersion: v1
kind: ConfigMap
metadata:
name: laravel-db-config
data:
DB_CONNECTION: "pgsql"
DB_HOST: "postgresql-shared-rw.postgresql-system.svc.cluster.local"
DB_PORT: "5432"
DB_DATABASE: "pixelfed"
---
# Flask/PieFed connection
apiVersion: v1
kind: ConfigMap
metadata:
name: flask-db-config
data:
DATABASE_URL: "postgresql://piefed_user:<REPLACE_WITH_PASSWORD>@postgresql-shared-rw.postgresql-system.svc.cluster.local:5432/piefed"
---
# Django/BookWyrm connection
apiVersion: v1
kind: ConfigMap
metadata:
name: django-db-config
data:
POSTGRES_HOST: "postgresql-shared-rw.postgresql-system.svc.cluster.local"
PGPORT: "5432"
POSTGRES_DB: "bookwyrm"
POSTGRES_USER: "bookwyrm_user"
---
# Ruby/Mastodon connection
apiVersion: v1
kind: ConfigMap
metadata:
name: mastodon-db-config
data:
DB_HOST: "postgresql-shared-rw.postgresql-system.svc.cluster.local"
DB_PORT: "5432"
DB_NAME: "mastodon"
DB_USER: "mastodon_user"
---
# Database monitoring ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: postgresql-metrics
namespace: postgresql-system
spec:
selector:
matchLabels:
cnpg.io/cluster: postgres-shared
endpoints:
- port: metrics
interval: 30s
path: /metrics
# Connection Patterns:
# - Read/Write: postgresql-shared-rw.postgresql-system.svc.cluster.local:5432
# - Read Only: postgresql-shared-ro.postgresql-system.svc.cluster.local:5432
# - Read Replica: postgresql-shared-r.postgresql-system.svc.cluster.local:5432
# - Monitoring: Port 9187 for comprehensive PostgreSQL metrics
# - Backup: Integrated with S3 backup system via Longhorn volume labels
# Read Replica Usage Examples:
# Mastodon - Read replicas for timeline queries and caching
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mastodon-db-replica-config
data:
DB_HOST: "postgresql-shared-rw.postgresql-system.svc.cluster.local" # Primary for writes
DB_REPLICA_HOST: "postgresql-shared-ro.postgresql-system.svc.cluster.local" # Read replica for queries
DB_PORT: "5432"
DB_NAME: "mastodon"
# Mastodon automatically uses read replicas for timeline and cache queries
# PieFed - Flask app with read/write splitting
---
apiVersion: v1
kind: ConfigMap
metadata:
name: piefed-db-replica-config
data:
# Primary database for writes
DATABASE_URL: "postgresql://piefed_user:<REPLACE_WITH_PASSWORD>@postgresql-shared-rw.postgresql-system.svc.cluster.local:5432/piefed"
# Read replica for heavy queries (feeds, search, analytics)
DATABASE_REPLICA_URL: "postgresql://piefed_user:<REPLACE_WITH_PASSWORD>@postgresql-shared-ro.postgresql-system.svc.cluster.local:5432/piefed"
# Authentik - Optimized performance with primary and replica load balancing
---
apiVersion: v1
kind: ConfigMap
metadata:
name: authentik-db-replica-config
data:
AUTHENTIK_POSTGRESQL__HOST: "postgresql-shared-rw.postgresql-system.svc.cluster.local"
AUTHENTIK_POSTGRESQL__PORT: "5432"
AUTHENTIK_POSTGRESQL__NAME: "authentik"
# Authentik can use read replicas for user lookups and session validation
AUTHENTIK_POSTGRESQL_REPLICA__HOST: "postgresql-shared-ro.postgresql-system.svc.cluster.local"
# BookWyrm - Django with database routing for read replicas
---
apiVersion: v1
kind: ConfigMap
metadata:
name: bookwyrm-db-replica-config
data:
POSTGRES_HOST: "postgresql-shared-rw.postgresql-system.svc.cluster.local" # Primary
POSTGRES_REPLICA_HOST: "postgresql-shared-ro.postgresql-system.svc.cluster.local" # Read replica
PGPORT: "5432"
POSTGRES_DB: "bookwyrm"
# Django database routing can direct read queries to replica automatically
# Available Metrics:
# - Connection: cnpg_backends_total, cnpg_pg_settings_setting{name="max_connections"}
# - Performance: cnpg_pg_stat_database_xact_commit, cnpg_pg_stat_database_xact_rollback
# - Storage: cnpg_pg_database_size_bytes, cnpg_pg_stat_database_blks_hit
# - Health: cnpg_collector_up, cnpg_collector_postgres_version
# CRITICAL PostgreSQL Pod Management Safety ⚠️
# Source: https://cloudnative-pg.io/documentation/1.20/failure_modes/
# ✅ SAFE: Proper pod deletion for failover testing
# kubectl delete pod [primary-pod] --grace-period=1
# ❌ DANGEROUS: Never use grace-period=0
# kubectl delete pod [primary-pod] --grace-period=0 # NEVER DO THIS!
#
# Why grace-period=0 is dangerous:
# - Immediately removes pod from Kubernetes API without proper shutdown
# - Doesn't ensure PID 1 process (instance manager) is shut down
# - Operator triggers failover without guarantee primary was properly stopped
# - Can cause misleading results in failover simulation tests
# - Does not reflect real failure scenarios (power loss, network partition)
# Proper PostgreSQL Pod Operations:
# - Use --grace-period=1 for failover simulation tests
# - Allow CloudNativePG operator to handle automatic failover
# - Use cnpg.io/reconciliationLoop: "disabled" annotation only for emergency manual intervention
# - Always remove reconciliation disable annotation after emergency operations

@@ -0,0 +1,132 @@
# S3 Storage Configuration Templates
# Framework-specific S3 integration patterns with dedicated bucket approach
# Laravel/Pixelfed S3 Configuration
---
apiVersion: v1
kind: ConfigMap
metadata:
name: pixelfed-s3-config
data:
# Critical Laravel S3 Configuration
FILESYSTEM_DRIVER: "s3"
DANGEROUSLY_SET_FILESYSTEM_DRIVER: "s3" # Required for S3 default disk
PF_ENABLE_CLOUD: "true"
FILESYSTEM_CLOUD: "s3"
FILESYSTEM_DISK: "s3"
# Backblaze B2 S3-Compatible Storage
AWS_BUCKET: "pixelfed-bucket" # Dedicated bucket approach
AWS_URL: "<REPLACE_WITH_CDN_URL>" # CDN URL
AWS_ENDPOINT: "<REPLACE_WITH_S3_ENDPOINT>"
AWS_ROOT: "" # Empty - no prefix needed with dedicated bucket
AWS_USE_PATH_STYLE_ENDPOINT: "false"
AWS_VISIBILITY: "public"
# Flask/PieFed S3 Configuration
---
apiVersion: v1
kind: ConfigMap
metadata:
name: piefed-s3-config
data:
# S3 Storage (Backblaze B2)
S3_BUCKET: "piefed-bucket"
S3_REGION: "<REPLACE_WITH_S3_REGION>"
S3_ENDPOINT_URL: "<REPLACE_WITH_S3_ENDPOINT>"
S3_PUBLIC_URL: "<REPLACE_WITH_CDN_URL>"
# Django/BookWyrm S3 Configuration
---
apiVersion: v1
kind: ConfigMap
metadata:
name: bookwyrm-s3-config
data:
# S3 Storage (Backblaze B2)
USE_S3: "true"
AWS_STORAGE_BUCKET_NAME: "bookwyrm-bucket"
AWS_S3_REGION_NAME: "<REPLACE_WITH_S3_REGION>"
AWS_S3_ENDPOINT_URL: "<REPLACE_WITH_S3_ENDPOINT>"
AWS_S3_CUSTOM_DOMAIN: "<REPLACE_WITH_CDN_DOMAIN>"
AWS_DEFAULT_ACL: "" # Backblaze B2 doesn't support ACLs
# Ruby/Mastodon S3 Configuration
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mastodon-s3-config
data:
# S3 Object Storage
S3_ENABLED: "true"
S3_BUCKET: "mastodon-bucket"
S3_REGION: "<REPLACE_WITH_S3_REGION>"
S3_ENDPOINT: "<REPLACE_WITH_S3_ENDPOINT>"
S3_HOSTNAME: "<REPLACE_WITH_S3_HOSTNAME>"
S3_ALIAS_HOST: "<REPLACE_WITH_CDN_DOMAIN>"
# Generic S3 Secret Template
---
apiVersion: v1
kind: Secret
metadata:
name: s3-credentials
type: Opaque
data:
# Base64 encoded values (will be encrypted by SOPS)
# Replace with actual base64-encoded values before encryption
AWS_ACCESS_KEY_ID: <REPLACE_WITH_BASE64_ENCODED_KEY_ID>
AWS_SECRET_ACCESS_KEY: <REPLACE_WITH_BASE64_ENCODED_SECRET_KEY>
S3_KEY: <REPLACE_WITH_BASE64_ENCODED_KEY_ID> # Flask apps use this naming
S3_SECRET: <REPLACE_WITH_BASE64_ENCODED_SECRET_KEY> # Flask apps use this naming
# CDN Mapping Reference
# | Application | CDN Subdomain | S3 Bucket | Purpose |
# |------------|---------------|-----------|---------|
# | Pixelfed | pm.keyboardvagabond.com | pixelfed-bucket | Photo/media sharing |
# | PieFed | pfm.keyboardvagabond.com | piefed-bucket | Forum content/uploads |
# | Mastodon | mm.keyboardvagabond.com | mastodon-bucket | Social media/attachments |
# | BookWyrm | bm.keyboardvagabond.com | bookwyrm-bucket | Book covers/user uploads |
# Redis Connection Pattern (HAProxy-based):
# - HAProxy (Read/Write): redis-ha-haproxy.redis-system.svc.cluster.local:6379
# - Managed by 3 HAProxy pods providing unified endpoint
# - Redis HA cluster: 3 Redis replicas with Sentinel for HA
# - Helm Chart: redis-ha from dandydeveloper/charts (replaced deprecated Bitnami)
# Redis Usage Examples:
# Mastodon - Redis for caching and Sidekiq job queue
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mastodon-redis-config
data:
REDIS_HOST: "redis-ha-haproxy.redis-system.svc.cluster.local" # HAProxy endpoint
REDIS_PORT: "6379"
# PieFed - Flask with Redis for cache and Celery broker
---
apiVersion: v1
kind: ConfigMap
metadata:
name: piefed-redis-config
data:
# All Redis connections use HAProxy endpoint
CACHE_REDIS_URL: "redis://:<REPLACE_WITH_REDIS_PASSWORD>@redis-ha-haproxy.redis-system.svc.cluster.local:6379/1"
CELERY_BROKER_URL: "redis://:<REPLACE_WITH_REDIS_PASSWORD>@redis-ha-haproxy.redis-system.svc.cluster.local:6379/2"
# BookWyrm - Django with Redis for broker and activity streams
---
apiVersion: v1
kind: ConfigMap
metadata:
name: bookwyrm-redis-config
data:
# All Redis connections use HAProxy endpoint
REDIS_BROKER_HOST: "redis-ha-haproxy.redis-system.svc.cluster.local:6379"
REDIS_ACTIVITY_HOST: "redis-ha-haproxy.redis-system.svc.cluster.local:6379"
REDIS_BROKER_DB_INDEX: "3"
REDIS_ACTIVITY_DB: "4"

.cursor/rules/security.mdc

@@ -0,0 +1,176 @@
---
description: Security patterns including SOPS encryption, Zero Trust, and access control
globs: ["**/*.yaml", "machineconfigs/**/*", "secrets.yaml", "*.conf"]
alwaysApply: false
---
# Security & Encryption ✅ OPERATIONAL
## 🛡️ Maximum Security Architecture Achieved
- **🚫 Zero External Port Exposure**: No direct internet access to any cluster services
- **🔐 Dual Security Layers**: Cloudflare Zero Trust (public apps) + Tailscale Mesh VPN (admin access)
- **🌐 CGNAT-Only API Access**: Kubernetes/Talos APIs restricted to Tailscale network (100.64.0.0/10)
- **🔒 Encrypted Everything**: SOPS secrets, Zero Trust tunnels, mesh VPN connections
- **🛡️ Host Firewall**: Cilium policies blocking world access to HTTP/HTTPS ports
## SOPS Configuration ✅ OPERATIONAL
### Encryption Scope
- **Files Covered**: All YAML files in `manifests/` directory, Talos configs, machine configurations
- **Fields Encrypted**: `data` and `stringData` fields in manifests, plus specific credential fields
- **Key Management**: Multiple PGP keys configured for different components
- **Workflow**: All secrets encrypted with SOPS before Git commit
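As a sketch, a `.sops.yaml` implementing that scope might look like the following; the PGP fingerprints are placeholders:
```yaml
creation_rules:
  - path_regex: manifests/.*\.yaml$
    encrypted_regex: ^(data|stringData)$
    pgp: "<REPLACE_WITH_PGP_FINGERPRINT>"
  - path_regex: machineconfigs/.*
    pgp: "<REPLACE_WITH_PGP_FINGERPRINT>"
```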
### SOPS Usage Patterns
```bash
# Encrypt new secret
sops -e -i secrets.yaml
# Edit encrypted secret
sops secrets.yaml
# Decrypt for viewing
sops -d secrets.yaml
# Decrypt in place
sops -d -i secrets.yaml
# Apply encrypted manifest
sops -d secrets.yaml | kubectl apply -f -
```
SOPS-encrypted files are applied with kubectl in decrypted form, and must be re-encrypted before
merging into source control.
## Zero Trust Architecture ✅ MIGRATED
### Zero Trust Tunnels ✅ OPERATIONAL
- **Cloudflared Deployment**: `cloudflared-system` namespace
- **Tunnel Architecture**: Secure connectivity without exposing ingress ports
- **TLS Termination**: Cloudflare edge handles SSL/TLS
- **DNS Management**: Manual DNS record creation (external-dns removed)
### Standard Zero Trust Ingress Pattern
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: app-ingress
namespace: app-namespace
annotations:
# Basic NGINX Configuration only - no cert-manager or external-dns
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
ingressClassName: nginx
tls: [] # Empty - TLS handled by Cloudflare edge
rules:
- host: app.keyboardvagabond.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: app-service
port:
number: 80
```
### Migration Steps for Zero Trust
1. **Remove cert-manager annotations**: `cert-manager.io/cluster-issuer`, `cert-manager.io/issuer`
2. **Remove external-dns annotations**: `external-dns.alpha.kubernetes.io/hostname`, `external-dns.alpha.kubernetes.io/target`
3. **Empty TLS sections**: Set `tls: []` to disable certificate generation
4. **Configure Cloudflare tunnel**: Add hostname in Zero Trust dashboard
5. **Test connectivity**: Use `kubectl run curl-test` to verify internal service health
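Step 4 corresponds to a tunnel ingress rule. A sketch of the cloudflared configuration, where the tunnel ID and internal service address are placeholders:
```yaml
tunnel: <TUNNEL_ID>
credentials-file: /etc/cloudflared/creds/credentials.json
ingress:
  - hostname: app.keyboardvagabond.com
    service: http://ingress-nginx-controller.ingress-nginx.svc.cluster.local:80
  - service: http_status:404  # required catch-all rule
```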
## Access Control Matrix
| **Resource** | **Public Access** | **Administrative Access** | **Security Method** |
|--------------|-------------------|---------------------------|---------------------|
| **Applications** | ✅ Cloudflare Zero Trust | ❌ Not Applicable | Authenticated tunnels |
| **Kubernetes API** | ❌ Blocked | ✅ Tailscale Mesh VPN | CGNAT + OAuth |
| **Talos API** | ❌ Blocked | ✅ Tailscale Mesh VPN | CGNAT + OAuth |
| **HTTP/HTTPS Services** | ❌ Blocked | ✅ Cluster Internal Only | Host firewall |
| **Media CDN** | ✅ Cloudflare CDN | ❌ Not Applicable | Public S3 + Edge caching |
## Tailscale Mesh VPN ✅ OPERATIONAL
### Administrative Access Configuration
- **kubectl Context**: `admin@keyboardvagabond-tailscale` using internal VLAN IP (10.132.0.10:6443)
- **Public Context**: `admin@keyboardvagabond.com` (blocked by firewall)
- **Tailscale Client**: Current IP range 100.64.0.0/10 (CGNAT)
- **Firewall Rules**: Cilium host firewall restricts API access to Tailscale network only
### Tailscale Subnet Router Configuration ✅ OPERATIONAL
- **Device Name**: `keyboardvagabond-cluster`
- **Deployment Model**: Direct deployment (not Kubernetes Operator) for simplicity
- **Advertised Networks**:
- **Pod Network**: 10.244.0.0/16 (Kubernetes pods)
- **Service Network**: 10.96.0.0/12 (Kubernetes services)
- **VLAN Network**: 10.132.0.0/24 (NetCup Cloud private network)
- **OAuth Integration**: Client credentials for device authentication and tagging
- **Device Tagging**: `tag:k8s-operator` for proper ACL management and identification
- **Network Mode**: Kernel mode (`TS_USERSPACE=false`) with privileged security context
- **State Persistence**: Kubernetes secret-based storage (`TS_KUBE_SECRET=tailscale-auth`)
- **RBAC**: Split permissions (ClusterRole for cluster resources, Role for namespace secrets)
### Tailscale Deployment Pattern
```yaml
# Direct deployment (not Kubernetes Operator)
apiVersion: apps/v1
kind: Deployment
metadata:
name: tailscale-subnet-router
spec:
template:
spec:
containers:
- name: tailscale
env:
- name: TS_KUBE_SECRET
value: tailscale-auth
- name: TS_USERSPACE
value: "false"
- name: TS_ROUTES
value: "10.244.0.0/16,10.96.0.0/12,10.132.0.0/24"
securityContext:
privileged: true
```
## Network Security ✅ OPERATIONAL
### Cilium Host Firewall
```yaml
# Host firewall blocking external access to HTTP/HTTPS
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
name: host-fw-control-plane
spec:
nodeSelector:
matchLabels:
node-role.kubernetes.io/control-plane: ""
ingress:
- fromCIDR:
- "100.64.0.0/10" # Tailscale CGNAT range only
toPorts:
- ports:
- port: "6443"
protocol: TCP
```
## Security Best Practices
- **New Services**: All applications must use Zero Trust ingress pattern
- **Harbor Exception**: Harbor registry requires direct port exposure (header modification issues)
- **Secret Management**: All secrets SOPS-encrypted before Git commit
- **Network Policies**: Cilium host firewall with CGNAT-only access
- **Administrative Access**: Tailscale mesh VPN required for kubectl/talosctl
## 🏆 Security Achievements
1. **🎯 Zero Trust Network**: No implicit trust, all access authenticated and authorized
2. **🔐 Defense in Depth**: Multiple security layers prevent single points of failure
3. **📊 Comprehensive Monitoring**: All traffic flows monitored via OpenObserve and Cilium Hubble
4. **🔄 Secure GitOps**: SOPS-encrypted secrets with PGP key management
5. **🛡️ Hardened Infrastructure**: Minimal attack surface with production-grade security controls
@sops-secret-template.yaml
@zero-trust-ingress-template.yaml
@tailscale-config-template.yaml

@@ -0,0 +1,48 @@
# SOPS Secret Template
# Use this template for creating encrypted secrets
apiVersion: v1
kind: Secret
metadata:
name: app-secret
namespace: app-namespace
type: Opaque
data:
# These fields will be encrypted by SOPS
# Replace with actual base64-encoded values before encryption
DATABASE_PASSWORD: <REPLACE_WITH_BASE64_ENCODED_PASSWORD>
S3_ACCESS_KEY: <REPLACE_WITH_BASE64_ENCODED_KEY>
S3_SECRET_KEY: <REPLACE_WITH_BASE64_ENCODED_SECRET>
REDIS_PASSWORD: <REPLACE_WITH_BASE64_ENCODED_PASSWORD>
---
# ConfigMap for non-sensitive configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
namespace: app-namespace
data:
# Database connection
DATABASE_HOST: "postgresql-shared-rw.postgresql-system.svc.cluster.local"
DATABASE_PORT: "5432"
DATABASE_NAME: "app_database"
# Redis connection
REDIS_HOST: "redis-ha-haproxy.redis-system.svc.cluster.local"
REDIS_PORT: "6379"
# S3 storage configuration
S3_BUCKET: "app-bucket"
S3_REGION: "<REPLACE_WITH_S3_REGION>"
S3_ENDPOINT: "<REPLACE_WITH_S3_ENDPOINT>"
S3_CDN_URL: "<REPLACE_WITH_CDN_URL>"
# Application settings
APP_ENV: "production"
APP_DEBUG: "false"
# SOPS encryption commands:
# sops -e -i this-file.yaml
# sops this-file.yaml # to edit
# sops -d this-file.yaml | kubectl apply -f - # to apply

@@ -0,0 +1,96 @@
# Talos Configuration Templates
# Machine configurations and Talos-specific patterns
# Custom Talos Factory Image
# Uses factory image with Longhorn extension pre-installed
TALOS_FACTORY_IMAGE: "613e1592b2da41ae5e265e8789429f22e121aab91cb4deb6bc3c0b6262961245:v1.10.4"
# Network Interface Configuration
---
apiVersion: v1alpha1
kind: MachineConfig
metadata:
name: node-config
spec:
machine:
network:
interfaces:
# Public interface (DHCP + static configuration)
- interface: enp7s0
dhcp: true
addresses:
- 152.53.107.24/24 # Example for n1
routes:
- network: 0.0.0.0/0
gateway: 152.53.107.1
# Private VLAN interface (static configuration)
- interface: enp9s0
addresses:
- 10.132.0.10/24 # Example for n1 (VLAN 1004963)
vip:
ip: 10.132.0.5 # Shared VIP for control plane HA
# Node IP Configuration
machine:
kubelet:
extraArgs:
node-ip: 152.53.107.24 # Use public IP for node reporting
# Node IP Mappings (NetCup Cloud vLAN 1004963)
# All nodes are control plane nodes with shared VIP for HA
# n1: Public 152.53.107.24 + Private 10.132.0.10/24 (Control plane)
# n2: Public 152.53.105.81 + Private 10.132.0.20/24 (Control plane)
# n3: Public 152.53.200.111 + Private 10.132.0.30/24 (Control plane)
# VIP: 10.132.0.5 (shared VIP, nodes elect primary)
# Cluster Configuration
---
apiVersion: v1alpha1
kind: ClusterConfig
metadata:
name: keyboardvagabond
spec:
clusterName: keyboardvagabond.com
controlPlane:
endpoint: https://10.132.0.5:6443 # VIP endpoint for HA
# Allow workloads on control plane
allowSchedulingOnControlPlanes: true
# CNI Configuration (Cilium)
network:
cni:
name: none # Cilium installed via Helm
dnsDomain: cluster.local # Standard domain for compatibility
# API Server Configuration
apiServer:
extraArgs:
# Enable aggregation layer for metrics
enable-aggregator-routing: "true"
# Volume Configuration
# System disk: /dev/vda with 2-50GB ephemeral storage
# Longhorn storage: 400GB minimum on system disk at /var/lib/longhorn
# Administrative Access Commands
# Recommended: Use VIP endpoint for HA
# talosctl config endpoint 10.132.0.5 # VIP endpoint
# talosctl config node 10.132.0.5
# talosctl health
# talosctl dashboard (via Tailscale VPN only)
# Alternative: Individual node endpoints
# talosctl config endpoint 10.132.0.10 10.132.0.20 10.132.0.30
# talosctl config node 10.132.0.10
# kubectl Contexts:
# - admin@keyboardvagabond-tailscale (VIP: 10.132.0.5:6443 or node IPs) - ACTIVE
# - admin@keyboardvagabond.com (blocked by firewall, Tailscale-only access)
# Security Notes:
# - API access restricted to Tailscale CGNAT range (100.64.0.0/10)
# - Cilium host firewall blocks world access to ports 6443, 50000-50010
# - All administrative access requires Tailscale mesh VPN connection
# - Backup kubeconfig available as SOPS-encrypted portable configuration

@@ -0,0 +1,189 @@
---
description: Detailed technical specifications for nodes, network, and Talos configuration
globs: ["machineconfigs/**/*", "patches/**/*", "talosconfig", "kubeconfig*"]
alwaysApply: false
---
# Technical Specifications & Low-Level Configuration
## Talos Configuration ✅ OPERATIONAL
### Custom Talos Image
- **Factory Image**: `613e1592b2da41ae5e265e8789429f22e121aab91cb4deb6bc3c0b6262961245:v1.10.4`, a custom build that includes the two system extensions Longhorn requires
- **Extensions**: Longhorn extension included for distributed storage
- **Version**: Talos v1.10.4 with custom factory build
- **Architecture**: ARM64 optimized for NetCup Cloud infrastructure
### Patch Configuration
Applied via `patches/` directory for cluster customization:
- **allow-controlplane-workloads.yaml**: Enables workload scheduling on control plane
- **cluster-name.yaml**: Sets cluster name to `keyboardvagabond.com`
- **disable-kube-proxy-and-cni.yaml**: Disables built-in networking for Cilium
- **etcd-patch.yaml**: etcd optimization and configuration
- **registry-patch.yaml**: Container registry configuration
- **worker-discovery-patch.yaml**: Worker node discovery settings
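For illustration, minimal sketches of two of these patches, reconstructed from cluster settings shown elsewhere in this document (the actual patch files may differ):
```yaml
# allow-controlplane-workloads.yaml
cluster:
  allowSchedulingOnControlPlanes: true
---
# disable-kube-proxy-and-cni.yaml
cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true
```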
## Network Configuration ✅ OPERATIONAL
### NetCup Cloud Infrastructure
- **vLAN ID**: 1004963 for internal cluster communication
- **Network Range**: 10.132.0.0/24 (private VLAN)
- **DNS Domain**: `cluster.local` (standard Kubernetes domain)
- **Cluster Name**: `keyboardvagabond.com`
### Node Network Configuration
| Node | Public IP | VLAN IP | Role | Status |
|------|-----------|---------|------|--------|
| **n1** | 152.53.107.24 | 10.132.0.10/24 | Control Plane | ✅ Schedulable |
| **n2** | 152.53.105.81 | 10.132.0.20/24 | Control Plane | ✅ Schedulable |
| **n3** | 152.53.200.111 | 10.132.0.30/24 | Control Plane | ✅ Schedulable |
- **Control Plane VIP**: `10.132.0.5` (shared VIP, nodes elect primary for HA)
- **All nodes are control plane**: High availability with etcd quorum (2 of 3 required)
### Network Interface Configuration
- **`enp7s0`**: Public interface (DHCP + static configuration)
- **`enp9s0`**: Private VLAN interface (static configuration)
- **Internal Traffic**: Uses private VLAN for pod-to-pod and storage replication
- **External Access**: Cloudflare Zero Trust tunnels (no direct port exposure)
## Administrative Access Configuration ✅ SECURED
### Kubernetes API Access
- **Internal Context**: `admin@keyboardvagabond-tailscale`
- **VIP Endpoint**: `10.132.0.5:6443` (shared VIP, recommended for HA)
- **Node Endpoints**: `10.132.0.10:6443`, `10.132.0.20:6443`, `10.132.0.30:6443` (individual nodes)
- **Public Context**: `admin@keyboardvagabond.com` (blocked by firewall)
- **Public Endpoint**: `api.keyboardvagabond.com:6443` (Tailscale-only)
- **Access Method**: Tailscale mesh VPN required (CGNAT 100.64.0.0/10)
### Talos API Access
```bash
# Talos configuration (VIP recommended for HA)
talosctl config endpoint 10.132.0.5 # VIP endpoint
talosctl config node 10.132.0.5 # VIP node
# Alternative: Individual node endpoints
talosctl config endpoint 10.132.0.10 10.132.0.20 10.132.0.30
talosctl config node 10.132.0.10 # Primary endpoint
```
### Essential Management Commands
```bash
# Cluster health check
talosctl health --nodes 10.132.0.10,10.132.0.20,10.132.0.30
# Node status
talosctl get members
# Kubernetes context switching
kubectl config use-context admin@keyboardvagabond-tailscale
# Node status verification
kubectl get nodes -o wide
```
## Storage Configuration Details ✅ OPERATIONAL
### Longhorn Distributed Storage
- **Installation Path**: `/var/lib/longhorn` on each node
- **Replica Policy**: 2-replica configuration across nodes
- **Storage Class**: `longhorn-retain` for data preservation
- **Node Allocation**: 400GB+ per node on system disk
- **Auto-balance**: Enabled for optimal distribution
### Volume Configuration
- **System Disk**: `/dev/vda` with ephemeral storage
- **Longhorn Volume**: 400GB minimum allocation per node
- **Backup Strategy**: Label-based S3 backup selection
- **Reclaim Policy**: Retain (prevents data loss)
## Tailscale Mesh VPN Configuration ✅ OPERATIONAL
### Tailscale Operator Deployment
- **Helm Chart**: `tailscale-operator` from Tailscale Helm repository
- **Version**: v1.90.x (operator v1.90.8)
- **Namespace**: `tailscale-system`
- **Replicas**: 2 operator pods with anti-affinity
- **Hostname**: `keyboardvagabond-operator`
### Subnet Router Configuration (Connector Resource)
- **Resource Type**: `Connector` (tailscale.com/v1alpha1)
- **Device Name**: `keyboardvagabond-cluster`
- **Advertised Networks**:
- **Pod Network**: 10.244.0.0/16
- **Service Network**: 10.96.0.0/12
- **VLAN Network**: 10.132.0.0/24
- **OAuth Integration**: Client credentials for device authentication
- **Device Tagging**: `tag:k8s-operator` for ACL management
### Service Exposure via Magic DNS
- **Capability**: Services can be exposed via Tailscale operator with meta attributes
- **Magic DNS**: Automatic DNS resolution for exposed services
- **Meta Attributes**: Can be used to configure service exposure and routing
- **Access Control**: Cilium host firewall restricts to Tailscale only
- **Current CGNAT Range**: 100.64.0.0/10 (Tailscale assigned)
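A sketch of exposing an internal Service on the tailnet via operator annotations; the service name, hostname, and ports are placeholders:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: internal-dashboard
  namespace: app-namespace
  annotations:
    tailscale.com/expose: "true"          # ask the operator to proxy this Service
    tailscale.com/hostname: "internal-dashboard"  # Magic DNS name on the tailnet
spec:
  selector:
    app: internal-dashboard
  ports:
    - port: 80
      targetPort: 8080
```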
## Component Status Matrix ✅ CURRENT STATE
### Active Components
| Component | Status | Access Method | Notes |
|-----------|--------|---------------|-------|
| **Cilium CNI** | ✅ Operational | Internal | Host firewall + Hubble UI |
| **Longhorn Storage** | ✅ Operational | Internal | 2-replica with S3 backup |
| **PostgreSQL HA** | ✅ Operational | Internal | 3-instance CloudNativePG |
| **Harbor Registry** | ✅ Operational | Direct HTTPS | Zero Trust incompatible |
| **OpenObserve** | ✅ Operational | Zero Trust | Monitoring platform |
| **Tailscale VPN** | ✅ Operational | Mesh Network | Administrative access |
### Disabled/Deprecated Components
| Component | Status | Reason | Alternative |
|-----------|--------|--------|-------------|
| **external-dns** | ❌ Removed | Zero Trust migration | Manual DNS in Cloudflare |
| **cert-manager** | ❌ Removed | Zero Trust migration | Cloudflare edge TLS |
| **Rook-Ceph** | ❌ Disabled | Complexity and lack of support for partitioning a single drive | Longhorn storage |
| **Flux GitOps** | ⏸️ Disabled | Manual deployment | Ready for re-activation |
### Development Components
| Component | Status | Purpose | Access |
|-----------|--------|---------|--------|
| **Renovate** | ✅ Operational | Dependency updates | Automated |
| **Elasticsearch** | ✅ Operational | Log aggregation | Internal |
| **Kibana** | ✅ Operational | Log analytics | Zero Trust |
## Network Security Configuration ✅ HARDENED
### Cilium Host Firewall Rules
```yaml
# Control plane API access (Tailscale only)
- fromCIDR: ["100.64.0.0/10"] # Tailscale CGNAT
toPorts: [{"port": "6443", "protocol": "TCP"}]
# Block world access to HTTP/HTTPS
# - HTTP/HTTPS ports blocked from 0.0.0.0/0
# - Only cluster-internal and Tailscale access permitted
```
### Zero Trust Architecture
- **External Applications**: All via Cloudflare tunnels
- **Administrative APIs**: Tailscale mesh VPN only
- **Harbor Exception**: Direct ports 80/443 (header modification issues)
- **Internal Services**: Cluster-local communication only
## Future Scaling Specifications
### Node Addition Process
1. **Network**: Add to NetCup Cloud vLAN 1004963
2. **IP Assignment**: Sequential (10.132.0.40/24, 10.132.0.50/24, etc.)
3. **Talos Config**: Apply machine config with proper networking
4. **Longhorn**: Automatic storage distribution across new nodes
5. **Workload**: Immediate scheduling capability
### High Availability Expansion
- **Additional Control Planes**: Can add for true HA setup
- **Load Balancing**: MetalLB or cloud LB integration ready
- **Database Scaling**: PostgreSQL can expand to more replicas
- **Storage Scaling**: Longhorn distributed across all nodes
@talos-machine-config-template.yaml
@cilium-network-policy-template.yaml
@longhorn-volume-template.yaml

@@ -0,0 +1,149 @@
---
description: Historical issues, lessons learned, and troubleshooting knowledge from cluster evolution
globs: []
alwaysApply: false
---
# Troubleshooting History & Lessons Learned
This rule captures critical historical knowledge from the cluster's evolution, including resolved issues, migration challenges, and lessons learned that inform future decisions.
## 🔄 Major Architecture Migrations
### DNS Domain Evolution ✅ **RESOLVED**
- **Previous Issue**: Used custom `local.keyboardvagabond.com` domain causing compatibility problems
- **Resolution**: Reverted to standard `cluster.local` domain
- **Benefits**: Full compatibility with monitoring dashboards, service discovery, and all Kubernetes tooling
- **Lesson**: Always use standard Kubernetes domains unless absolutely necessary
### Zero Trust Migration ✅ **COMPLETED**
- **Migration Scope**: 10 of 11 external services migrated from external-dns/cert-manager to Cloudflare Zero Trust tunnels
- **Services Migrated**: Mastodon, Mastodon Streaming, Pixelfed, PieFed, Picsur, BookWyrm, Authentik, OpenObserve, Kibana, WriteFreely
- **Harbor Exception**: Harbor registry reverted to direct port exposure (80/443) due to Cloudflare header modification breaking container image layer writes
- **Dependencies Removed**: external-dns and cert-manager components no longer needed
- **Key Challenges Resolved**: Mastodon streaming subdomain compatibility, StatefulSet immutable fields, service discovery issues
## 🛠️ Historical Technical Issues
### DNS and External-DNS Resolution ✅ **RESOLVED & DEPRECATED**
- **Previous Issue**: External-DNS creating records with private VLAN IPs (10.132.0.x) which Cloudflare rejected
- **Temporary Solution**: Used `external-dns.alpha.kubernetes.io/target` annotations with public IPs
- **Target Annotations**: `152.53.107.24,152.53.105.81` were used for all ingress resources
- **Final Resolution**: **External-DNS completely removed in favor of Cloudflare Zero Trust tunnels**
- **Current Status**: Manual DNS record creation via Cloudflare Dashboard (external-dns no longer needed)
### SSL Certificate Issues ✅ **RESOLVED**
- **Previous Issue**: Let's Encrypt certificates stuck in "False/Not Ready" state due to DNS resolution failures
- **Resolution**: DNS records now resolve correctly, enabling HTTP-01 challenge completion
- **Migration**: Eventually replaced by Zero Trust architecture eliminating certificate management
### Node IP Configuration ✅ **IMPLEMENTED**
- **Approach**: Using kubelet `extraArgs` with `node-ip` parameter
- **n2 Status**: ✅ Successfully reporting public IP (152.53.105.81)
- **Backup Strategy**: Target annotations provide reliable DNS record creation regardless of node IP status
## 🔍 Framework-Specific Lessons Learned
### CDN Storage Evolution: Shared vs Dedicated Buckets
- **Original Plan**: Single bucket with prefixes (`/pixelfed`, `/piefed`, `/mastodon`)
- **Issue Discovered**: Pixelfed handled prefixes inconsistently, sometimes returning URLs without the correct subdirectory
- **Solution**: Dedicated buckets eliminate compatibility issues entirely
**Benefits of Dedicated Bucket Approach**:
- **Application Compatibility**: Some applications don't fully support S3 prefixes
- **No Prefix Conflicts**: Eliminates S3 path prefix issues with shared buckets
- **Simplified Configuration**: Clean S3 endpoints without complex path rewriting
- **Independent Scaling**: Each application can optimize caching independently
### Mastodon Streaming Subdomain Challenge ✅ **FIXED**
- **Original**: `streaming.mastodon.keyboardvagabond.com`
- **Issue**: Cloudflare Free plan subdomain limitation (not supported)
- **Solution**: Changed to `streamingmastodon.keyboardvagabond.com` ✅ **WORKING**
- **Lesson**: Cloudflare Free plan supports only one subdomain level (`app.domain.com` not `sub.app.domain.com`)
### Flask Application Discovery Patterns
**Critical Framework Identification**: Must identify Flask vs Django early in development
- **Flask**: Uses `flask` command, URL-based config (DATABASE_URL), application factory pattern
- **Django**: Uses `python manage.py` commands, separate host/port variables, standard project structure
- **uWSGI Integration**: Must use the same Python version as the venv; install via pip, not Alpine packages
- **Static Files**: Flask with application factory has nested structure (`/app/app/static/`)
### Laravel S3 Configuration Discoveries
**Critical Laravel S3 Settings**:
- **`DANGEROUSLY_SET_FILESYSTEM_DRIVER=s3`**: Essential to make S3 the default filesystem
- **Cache Invalidation**: Must run `php artisan config:cache` after S3 (or any) configuration changes
- **Dedicated Buckets**: Prevent double-prefix issues that occur with shared buckets (see the sketch after this list)
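A hedged sketch of this configuration as a Pixelfed ConfigMap; bucket, endpoint, and CDN values are placeholders, and only `DANGEROUSLY_SET_FILESYSTEM_DRIVER` is confirmed above (the `AWS_*` names are standard Laravel S3 variables):
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pixelfed-s3-config             # illustrative name
  namespace: pixelfed
data:
  DANGEROUSLY_SET_FILESYSTEM_DRIVER: "s3"   # makes S3 the default filesystem
  AWS_BUCKET: "pixelfed-media"              # placeholder dedicated bucket
  AWS_ENDPOINT: "https://s3.example.backblazeb2.com"      # placeholder endpoint
  AWS_URL: "https://media.pixelfed.keyboardvagabond.com"  # placeholder CDN URL
# After applying: run `php artisan config:cache` inside the pod
```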
### Django Static File Pipeline
**Theme Compilation Order**: Must compile themes **before** static file collection to S3
- **Correct Pipeline**: `compile_themes` → `collectstatic` → S3 upload (see the Job sketch after this list)
- **Backblaze B2**: Requires an empty `AWS_DEFAULT_ACL` because B2 does not support ACLs
- **Container Builds**: Theme compilation at runtime (not build time) requires database access
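A sketch of the pipeline as a Kubernetes Job, assuming BookWyrm's `compile_themes` management command (the image reference and namespace are placeholders):
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: bookwyrm-static-pipeline       # illustrative name
  namespace: bookwyrm
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: static-pipeline
          image: registry.example.com/bookwyrm:latest   # placeholder image
          command: ["sh", "-c"]
          # Themes must compile before collectstatic uploads to S3
          args:
            - python manage.py compile_themes && python manage.py collectstatic --noinput
          env:
            - name: AWS_DEFAULT_ACL
              value: ""                # Backblaze B2 does not support ACLs
```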
## 🚨 Zero Trust Migration Issues Resolved
### Common Migration Problems
- **Mastodon Streaming**: Fixed subdomain compatibility for Cloudflare Free plan
- **OpenObserve StatefulSet**: Used manual Helm deployment to bypass immutable field restrictions
- **Picsur Service Discovery**: Fixed label mismatch between service selector and pod labels
- **Corporate VPN Blocking**: SSL handshake failures were traced to a corporate VPN by testing from different networks
### Harbor Registry Exception
**Why Harbor Can't Use Zero Trust**:
- **Issue**: Cloudflare header modification breaks container image layer writes
- **Solution**: Direct port exposure (80/443) for Harbor only
- **Security**: All other services use Zero Trust tunnels
## 🔧 Infrastructure Evolution Context
### Talos Configuration
- **Custom Image**: `613e1592b2da41ae5e265e8789429f22e121aab91cb4deb6bc3c0b6262961245:v1.10.4` with Longhorn extension
- **Network Interfaces**:
- `enp7s0`: Public interface (DHCP + static configuration)
- `enp9s0`: Private VLAN interface (static configuration)
### Storage Evolution
- **Original**: Basic Longhorn setup
- **Current**: 2-replica configuration with S3 backup integration
- **Backup Strategy**: Label-based volume selection system
- **Cost Optimization**: $6/TB with $0 egress via Cloudflare partnership
### Administrative Access Evolution
- **Original**: Direct public API access
- **Migration**: Tailscale mesh VPN implementation
- **Current**: CGNAT-only access (100.64.0.0/10) via mesh network
- **Security**: Zero external API exposure
## 📊 Operational Patterns Discovered
### Multi-Stage Docker Benefits
- **Size Reduction**: From 1.3GB single-stage to ~350MB multi-stage builds (~75% reduction)
- **Essential for**: Python/Node.js applications to remove build dependencies
- **Pattern**: Base image → Web container → Worker container specialization
### ActivityPub Rate Limiting Implementation
**Based on**: [PieFed blog recommendations](https://join.piefed.social/2024/04/17/handling-large-bursts-of-post-requests-to-your-activitypub-inbox-using-a-buffer-in-nginx/)
- **Rate**: 10 requests/second with 300 request burst buffer
- **Memory**: 100MB zone sufficient for large-scale instances (zone definition sketched after this list)
- **Federation Impact**: Graceful handling of viral content spikes
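Note that `limit_req_zone` is only valid in the nginx `http` context, so the zone belongs in the ingress-nginx controller ConfigMap rather than a per-Ingress annotation; a sketch (ConfigMap name and namespace depend on the install):
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller       # name depends on the install
  namespace: ingress-nginx
data:
  http-snippet: |
    limit_req_zone $binary_remote_addr zone=app_inbox:100m rate=10r/s;
```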
### Terminal Environment Discovery
- **PowerShell on macOS**: PSReadLine displays errors but commands execute successfully
- **Recommendation**: Use the default OS terminal rather than PowerShell (except on Windows)
- **Functionality**: Command outputs remain readable despite display issues
## 🎯 Critical Success Factors
### What Made Migrations Successful
1. **Gradual Migration**: One service at a time instead of big-bang approach
2. **Testing Pattern**: `kubectl run curl-test` to verify internal service health (see the Pod sketch after this list)
3. **Backup Strategies**: Target annotations as fallback for DNS issues
4. **Documentation**: Detailed tracking of each migration step and issue resolution
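The `curl-test` pattern from step 2, written out as a throwaway Pod (the target URL is a placeholder for the Service under test):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: curl-test
spec:
  restartPolicy: Never
  containers:
    - name: curl
      image: curlimages/curl:latest
      # Image entrypoint is curl; args below are curl arguments
      args: ["-sf", "http://app-service.app-namespace.svc.cluster.local/health"]
```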
### Patterns to Avoid
1. **Custom DNS Domains**: Stick to `cluster.local` for compatibility
2. **Shared S3 Buckets**: Use dedicated buckets to avoid prefix conflicts
3. **Complex Subdomains**: Cloudflare Free plan limitations require simple patterns
4. **Single-Stage Containers**: Multi-stage builds essential for production efficiency
This historical knowledge should inform all future architectural decisions and troubleshooting approaches.

View File

@@ -0,0 +1,54 @@
# Zero Trust Ingress Template
# Use this template for all new applications deployed via Cloudflare tunnels
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: app-namespace
  annotations:
    # Basic NGINX configuration only - no cert-manager or external-dns
    kubernetes.io/ingress.class: nginx   # legacy annotation; ingressClassName below is authoritative
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
    # Optional: extended timeouts for long-running requests
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    # Optional: ActivityPub rate limiting for fediverse applications.
    # NOTE: limit_req_zone is only valid in the nginx http context, so define
    # the zone in the ingress-nginx controller ConfigMap instead of a
    # server-snippet annotation:
    #   http-snippet: |
    #     limit_req_zone $binary_remote_addr zone=app_inbox:100m rate=10r/s;
    nginx.ingress.kubernetes.io/configuration-snippet: |
      location ~* ^/(inbox|users/.*/inbox) {
        limit_req zone=app_inbox burst=300;
      }
spec:
  ingressClassName: nginx
  tls: []  # Empty - TLS handled at the Cloudflare edge
  rules:
    - host: app.keyboardvagabond.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-service
                port:
                  number: 80
---
# Service template
apiVersion: v1
kind: Service
metadata:
  name: app-service
  namespace: app-namespace
spec:
  selector:
    app: app-name
  ports:
    - name: http
      port: 80
      targetPort: 8080