redaction (#1)

Add the redacted source file for demo purposes

Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1
Co-authored-by: Michael DiLeo <michael_dileo@proton.me>
Co-committed-by: Michael DiLeo <michael_dileo@proton.me>
This commit was merged in pull request #1.
Committed by michael_dileo on 2025-12-24 13:40:47 +00:00
parent 612235d52b
commit 7327d77dcd
333 changed files with 39286 additions and 1 deletions


@@ -0,0 +1,215 @@
# Redis Infrastructure
This directory contains the Redis Primary-Replica setup for high-availability caching on the Kubernetes cluster.
## Architecture
- **2 Redis instances**: 1 primary + 1 replica for high availability
- **Asynchronous replication**: Optimized for 100Mbps VLAN performance
- **Node distribution**: Instances are distributed across n1 and n2 nodes
- **Longhorn storage**: Single replica (Redis handles replication), Delete reclaim policy (cache data)
- **Bitnami Redis**: Industry-standard Helm chart with comprehensive features
## Components
### **Core Components**
- `namespace.yaml`: Redis system namespace
- `repository.yaml`: Bitnami Helm repository
- `redis.yaml`: Redis primary-replica deployment
- `redis-storageclass.yaml`: Optimized storage class for Redis
- `secret.yaml`: SOPS-encrypted Redis credentials
### **Monitoring Components**
- `monitoring.yaml`: ServiceMonitor for OpenObserve integration
- `redis-exporter.yaml`: Dedicated Redis exporter for comprehensive metrics
- Built-in metrics: Redis exporter with Celery queue monitoring
### **Backup Components**
- **Integrated with existing Longhorn backup**: Uses existing S3 backup infrastructure
- S3 integration: Automated backup to Backblaze B2 via existing `longhorn-s3-backup` group
## Services Created
Redis automatically creates these services (see the quick check after this list):
- `redis-master`: Write operations (connects to primary) - Port 6379
- `redis-replica`: Read-only operations (connects to replicas) - Port 6379
- `redis-headless`: Service discovery for both instances
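Once the chart has reconciled, the services can be confirmed directly:
```bash
# List the services created by the Redis chart
kubectl get svc -n redis-system
# Confirm the write service has a ready endpoint behind it
kubectl get endpoints redis-master -n redis-system
```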
## Connection Information
### For Applications
Applications should connect using these connection parameters:
**Write Operations:**
```yaml
host: redis-ha-haproxy.redis-system.svc.cluster.local
port: 6379
auth: <password from redis-credentials secret>
```
**Read Operations:**
```yaml
host: redis-replica.redis-system.svc.cluster.local
port: 6379
auth: <password from redis-credentials secret>
```
### Getting Credentials
The Redis password is stored in a SOPS-encrypted secret:
```bash
# Get the Redis password
kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d
```
## Application Integration Example
Here's how an application deployment would connect:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  template:
    spec:
      containers:
        - name: app
          image: example-app:latest
          env:
            - name: REDIS_HOST_WRITE
              value: "redis-ha-haproxy.redis-system.svc.cluster.local"
            - name: REDIS_HOST_READ
              value: "redis-replica.redis-system.svc.cluster.local"
            - name: REDIS_PORT
              value: "6379"
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-credentials
                  key: redis-password
```
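To sanity-check these connection parameters before wiring up a real application, a throwaway client pod can run `redis-cli` against the write endpoint. This is only a sketch: the `redis:7.4` image and `redis-test` pod name are illustrative, and because the namespace enforces the `restricted` Pod Security level, the test pod may need an explicit securityContext.
```bash
# Capture the password locally, then PING the write endpoint from a temporary pod
REDIS_PASSWORD=$(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d)
kubectl run redis-test -n redis-system --rm -it --restart=Never --image=redis:7.4 --command -- \
  redis-cli -h redis-ha-haproxy.redis-system.svc.cluster.local -p 6379 -a "$REDIS_PASSWORD" PING
```
A `PONG` reply confirms DNS resolution, authentication, and the write path in one step.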
## Monitoring
The Redis cluster includes comprehensive monitoring:
### **Metrics & Monitoring** ✅ **READY**
- **Metrics Port**: 9121 - Redis exporter metrics endpoint
- **ServiceMonitor**: Configured for OpenObserve integration
- **Key Metrics Available**:
- **Performance**: `redis_commands_processed_total`, `redis_connected_clients`, `redis_keyspace_hits_total`
- **Memory**: `redis_memory_used_bytes`, `redis_memory_max_bytes`
- **Replication**: `redis_master_repl_offset`, `redis_replica_lag_seconds`
- **Persistence**: `redis_rdb_last_save_timestamp_seconds`
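Any of these series can be spot-checked by hand against the exporter endpoint, reusing the port-forward shown under Troubleshooting:
```bash
# Forward the exporter port, then filter for a few of the key series listed above
kubectl port-forward -n redis-system svc/redis-metrics 9121:9121 &
curl -s http://localhost:9121/metrics | grep -E 'redis_memory_used_bytes|redis_connected_clients|redis_keyspace_hits_total'
kill %1  # stop the background port-forward
```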
### **High Availability Monitoring**
- **Failover**: Manual failover is required; automatic failover is not configured (unlike PostgreSQL). See the sketch after this list.
- **Health Checks**: Continuous health monitoring with restart policies
- **Async Replication**: Real-time replication lag monitoring
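Because failover is manual, promoting the replica is an operator action. A rough sketch using the pod names from the Troubleshooting section (the exact procedure depends on the chart, the write endpoint must be repointed afterwards, and `REDIS_PASSWORD` is the value from *Getting Credentials*):
```bash
# Promote the replica so it stops replicating and accepts writes
kubectl exec -it redis-replica-0 -n redis-system -- \
  redis-cli -a "$REDIS_PASSWORD" REPLICAOF NO ONE
# Once the old primary is healthy again, attach it to the promoted node
kubectl exec -it redis-master-0 -n redis-system -- \
  redis-cli -a "$REDIS_PASSWORD" REPLICAOF redis-replica-0.redis-headless.redis-system.svc.cluster.local 6379
```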
## Backup Strategy
### **Integrated with Existing Longhorn Backup Infrastructure**
Redis volumes automatically use your existing backup system:
- **Daily backups**: 2 AM UTC via `longhorn-s3-backup` group, retain 7 days
- **Weekly backups**: 1 AM Sunday via `longhorn-s3-backup-weekly` group, retain 4 weeks
- **Target**: Backblaze B2 S3 storage via existing setup
- **Type**: Incremental (efficient for Redis datasets)
- **Automatic assignment**: Redis storage class automatically applies backup jobs
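The backup jobs themselves live in Longhorn and can be verified independently of Redis (a sketch assuming Longhorn runs in the `longhorn-system` namespace):
```bash
# List Longhorn recurring jobs and confirm the longhorn-s3-backup groups exist
kubectl get recurringjobs.longhorn.io -n longhorn-system
# Show the most recent backups (Redis volumes appear once their first backup runs)
kubectl get backups.longhorn.io -n longhorn-system --sort-by=.metadata.creationTimestamp | tail
```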
### **Redis Persistence**
- **RDB snapshots**: Enabled with periodic saves
- **AOF**: Can be enabled for additional durability if needed
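If AOF turns out to be needed, it can be flipped on at runtime for a quick trial; a permanent change belongs in the `configuration` block of `redis.yaml` (`appendonly yes`). A sketch, again using `REDIS_PASSWORD` from *Getting Credentials*:
```bash
# Enable AOF on the primary at runtime (lost on pod restart unless set in the config)
kubectl exec -it redis-master-0 -n redis-system -- \
  redis-cli -a "$REDIS_PASSWORD" CONFIG SET appendonly yes
# Confirm the change took effect
kubectl exec -it redis-master-0 -n redis-system -- \
  redis-cli -a "$REDIS_PASSWORD" CONFIG GET appendonly
```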
### **Backup Integration**
Redis volumes are automatically backed up because the Redis storage class includes:
```yaml
recurringJobSelector: |
  [
    {
      "name":"longhorn-s3-backup",
      "isGroup":true
    }
  ]
```
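The same selector can be read back from the live StorageClass (assuming it is named `longhorn-redis`, as referenced by the HelmRelease persistence settings):
```bash
# Show the recurringJobSelector parameter carried by the Redis storage class
kubectl get storageclass longhorn-redis -o jsonpath='{.parameters.recurringJobSelector}'
```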
## Storage Design Decisions
### **Reclaim Policy: Delete**
The Redis storage class uses `reclaimPolicy: Delete` because:
- **Cache Data**: Redis primarily stores ephemeral cache data that can be rebuilt
- **Resource Efficiency**: Automatic cleanup prevents storage waste on your 2-node cluster
- **Cost Optimization**: No orphaned volumes consuming storage space
- **Operational Simplicity**: Clean GitOps deployments without manual volume cleanup
**Note**: Even with Delete policy, data is still backed up to S3 daily for disaster recovery.
## Performance Optimizations
Configured for your 2-node, 100Mbps VLAN setup:
- **Async replication**: Minimizes network impact
- **Local reads**: Applications can read from local Redis replica
- **Memory limits**: 2GB per instance (appropriate for 16GB nodes)
- **Persistence tuning**: Optimized for SSD storage
- **TCP keepalive**: Extended for slower network connections
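A simple way to observe the effect of these settings over the 100Mbps link is the latency probe built into `redis-cli` (a sketch: run from the primary pod toward the replica service, with `REDIS_PASSWORD` from *Getting Credentials*):
```bash
# Continuously sample round-trip latency from the primary to the replica service
kubectl exec -it redis-master-0 -n redis-system -- \
  redis-cli -a "$REDIS_PASSWORD" -h redis-replica.redis-system.svc.cluster.local --latency
```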
## Scaling
To add more read replicas:
```yaml
# Edit redis.yaml
replica:
  replicaCount: 2  # Increase from 1 to 2 for an additional read replica
```
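After committing that change, Flux applies it on its next reconciliation; the rollout can be forced and watched like this (a sketch assuming the Flux CLI is installed):
```bash
# Trigger an immediate reconcile instead of waiting for the interval
flux reconcile helmrelease redis -n redis-system
# Watch the additional replica pod come up
kubectl get pods -n redis-system -l app.kubernetes.io/component=replica -w
```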
## Troubleshooting
### **Cluster Status**
```bash
# Check Redis pods
kubectl get pods -n redis-system
kubectl logs redis-master-0 -n redis-system
kubectl logs redis-replica-0 -n redis-system
# Connect to Redis
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d)
```
### **Monitoring & Metrics**
```bash
# Check ServiceMonitor
kubectl get servicemonitor -n redis-system
kubectl describe servicemonitor redis-metrics -n redis-system
# Check metrics endpoint directly
kubectl port-forward -n redis-system svc/redis-metrics 9121:9121
curl http://localhost:9121/metrics
```
### **Replication Status**
```bash
# Check replication from master
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication
# Check replica status
kubectl exec -it redis-replica-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication
```
### **Performance Testing**
```bash
# Benchmark Redis performance
kubectl exec -it redis-master-0 -n redis-system -- redis-benchmark -h redis-ha-haproxy.redis-system.svc.cluster.local -p 6379 -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) -c 50 -n 10000
```
## Next Steps
1. **Encrypt secrets**: Use SOPS to encrypt the credentials (sketched after this list)
2. **Deploy via GitOps**: Commit and push to trigger Flux deployment
3. **Verify deployment**: Monitor pods and services
4. **Update applications**: Configure Harbor and OpenObserve to use Redis
5. **Setup monitoring**: Verify metrics in OpenObserve dashboards
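Steps 1-3 could look roughly like this (a sketch: the secret path and encryption rules depend on the repository's `.sops.yaml`):
```bash
# 1. Encrypt the Redis credentials in place with SOPS
sops --encrypt --in-place redis/secret.yaml
# 2. Commit and push so Flux picks up the change
git add redis/ && git commit -m "Add Redis infrastructure" && git push
# 3. Verify the deployment
flux get helmreleases -n redis-system
kubectl get pods,svc -n redis-system
```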


@@ -0,0 +1,11 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- repository.yaml
# redis-storageclass.yaml moved to longhorn/kustomization.yaml
# StorageClass is managed by Longhorn infrastructure since it's a Longhorn StorageClass
- secret.yaml
- redis.yaml
- monitoring.yaml


@@ -0,0 +1,24 @@
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-metrics
  namespace: redis-system
  labels:
    app: redis
    app.kubernetes.io/name: redis
    app.kubernetes.io/component: metrics
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: redis
      app.kubernetes.io/component: metrics
  endpoints:
    - port: http-metrics
      interval: 30s
      scrapeTimeout: 10s
      path: /metrics
      scheme: http
  namespaceSelector:
    matchNames:
      - redis-system


@@ -0,0 +1,9 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: redis-system
  labels:
    name: redis-system
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest


@@ -0,0 +1,203 @@
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: redis
  namespace: redis-system
spec:
  interval: 5m
  chart:
    spec:
      chart: redis
      version: "20.13.4"
      sourceRef:
        kind: HelmRepository
        name: bitnami
        namespace: redis-system
  values:
    redis:
      envFrom:
        - secretRef:
            name: redis-credentials
    # Use cluster domain for DNS resolution
    clusterDomain: cluster.local
    # Global Redis configuration
    global:
      # Allow non-Bitnami images for redis/redis-exporter
      security:
        allowInsecureImages: true
      redis:
        # Use secret for password
        existingSecret: redis-credentials
        existingSecretPasswordKey: redis-password
    # Redis architecture: replication (primary-replica)
    architecture: replication
    # Authentication configuration
    auth:
      enabled: true
      # Password will be loaded from secret
      existingSecret: redis-credentials
      existingSecretPasswordKey: redis-password
    # Primary Redis configuration
    master:
      count: 1
      podLabels:
        app.kubernetes.io/name: redis
        app.kubernetes.io/instance: redis
        app.kubernetes.io/component: master
      # Use bitnamilegacy Redis image (includes Bash/Bitnami entrypoint scripts)
      image:
        registry: docker.io
        repository: bitnamilegacy/redis
      disableCommands: []
      # Node affinity to ensure primary runs on specific node
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: redis
                    app.kubernetes.io/component: replica
                topologyKey: kubernetes.io/hostname
      # Resource limits appropriate for your 16GB nodes
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: 2000m
          memory: 4Gi
      # Storage configuration
      persistence:
        enabled: true
        storageClass: longhorn-redis
        size: 20Gi
        accessModes:
          - ReadWriteOnce
      # Redis configuration optimized for your setup
      configuration: |-
        # Network and timeout settings optimized for 100Mbps VLAN
        tcp-keepalive 60
        timeout 300
        # Memory and persistence settings
        maxmemory-policy allkeys-lru
        save 900 1
        save 300 10
        save 60 10000
        # Replication settings optimized for async over slower network
        repl-diskless-sync no
        repl-diskless-sync-delay 5
        repl-ping-replica-period 10
        repl-timeout 60
        # Performance optimizations
        tcp-backlog 511
        databases 16
      # Allow scheduling on control plane nodes
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists
    # Replica Redis configuration
    replica:
      replicaCount: 0
      # Use bitnamilegacy Redis image (includes Bash/Bitnami entrypoint scripts)
      image:
        registry: docker.io
        repository: bitnamilegacy/redis
        tag: 8.2.1-debian-12-r0
      # Ensure replica runs on different node than primary
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/name: redis
                  app.kubernetes.io/component: master
              topologyKey: kubernetes.io/hostname
      # Resource limits for replica
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: 2000m
          memory: 4Gi
      # Storage configuration for replica
      persistence:
        enabled: true
        storageClass: longhorn-redis
        size: 20Gi
        accessModes:
          - ReadWriteOnce
      # Allow scheduling on control plane nodes
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists
    # Metrics configuration for OpenObserve integration
    metrics:
      enabled: false
      # Redis exporter configuration - using bitnamilegacy image (compatible with chart scripts)
      image:
        registry: docker.io
        repository: bitnamilegacy/redis-exporter
        tag: 1.76.0-debian-12-r0
      # Resources for metrics exporter
      resources:
        requests:
          cpu: 50m
          memory: 64Mi
        limits:
          cpu: 200m
          memory: 128Mi
      # ServiceMonitor for Prometheus/OpenObserve
      serviceMonitor:
        enabled: true
        namespace: redis-system
        interval: 30s
        scrapeTimeout: 10s
        labels:
          app: redis
        selector:
          matchLabels:
            app.kubernetes.io/name: redis
            app.kubernetes.io/component: metrics
    # Network Policy (optional, can be enabled later)
    networkPolicy:
      enabled: false
    # Pod Disruption Budget for high availability
    pdb:
      create: true
      minAvailable: 1


@@ -0,0 +1,10 @@
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: bitnami
  namespace: redis-system
spec:
  interval: 5m0s
  type: oci
  url: oci://registry-1.docker.io/bitnamicharts


@@ -0,0 +1,10 @@
apiVersion: v1
kind: Secret
metadata:
  name: redis-credentials
  namespace: redis-system
type: Opaque
stringData:
  REDIS_PASSWORD: <REDACTED>
  redis-password: <REDACTED>
  redis-replica-password: <REDACTED>


@@ -0,0 +1,9 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- repository.yaml
- secret.yaml
- redis.yaml


@@ -0,0 +1,10 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: redis-system
  labels:
    name: redis-system
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest


@@ -0,0 +1,103 @@
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: redis-ha
  namespace: redis-system
spec:
  interval: 10m
  timeout: 5m
  chart:
    spec:
      chart: redis-ha
      version: "4.35.3"
      sourceRef:
        kind: HelmRepository
        name: redis-ha
        namespace: redis-system
      interval: 1h
  install:
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
  values:
    replicas: 3
    # Force Redis pods onto distinct nodes
    hardAntiAffinity: true
    auth: true
    existingSecret: redis-credentials
    authKey: redis-password
    persistentVolume:
      enabled: true
      storageClass: longhorn-redis
      size: 20Gi
    podDisruptionBudget:
      minAvailable: 1
    tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
    redis:
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: 2000m
          memory: 4Gi
      config:
        tcp-keepalive: "60"
        timeout: "300"
        save:
          - "900 1"
          - "300 10"
          - "60 10000"
        repl-diskless-sync: "no"
        repl-diskless-sync-delay: "5"
        repl-ping-replica-period: "10"
        repl-timeout: "60"
        tcp-backlog: "511"
        databases: "16"
        maxmemory-policy: "allkeys-lru"
    sentinel:
      auth: true
      existingSecret: redis-credentials
      authKey: redis-password
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 512Mi
    haproxy:
      enabled: true
      hardAntiAffinity: true
      metrics:
        enabled: true
        serviceMonitor:
          enabled: true
          namespace: redis-system
          interval: 30s
          telemetryPath: /metrics
          timeout: 10s
    exporter:
      enabled: true
      serviceMonitor:
        enabled: true
        namespace: redis-system
        interval: 30s
        telemetryPath: /metrics
        timeout: 10s


@@ -0,0 +1,10 @@
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: redis-ha
  namespace: redis-system
spec:
  interval: 1h
  url: https://dandydeveloper.github.io/charts


@@ -0,0 +1,9 @@
apiVersion: v1
kind: Secret
metadata:
  name: redis-credentials
  namespace: redis-system
type: Opaque
stringData:
  redis-password: <REDACTED>
  redis-replica-password: <REDACTED>