# Redis Infrastructure

This directory contains the Redis primary-replica setup for high-availability caching on the Kubernetes cluster.

## Architecture

- **2 Redis instances**: 1 primary + 1 replica for high availability
- **Asynchronous replication**: Optimized for 100 Mbps VLAN performance
- **Node distribution**: Instances are distributed across the n1 and n2 nodes
- **Longhorn storage**: Single replica (Redis handles replication) with a `Delete` reclaim policy (cache data)
- **Bitnami Redis**: Industry-standard Helm chart with comprehensive features

## Components

### **Core Components**

- `namespace.yaml`: Redis system namespace
- `repository.yaml`: Bitnami Helm repository
- `redis.yaml`: Redis primary-replica deployment
- `redis-storageclass.yaml`: Optimized storage class for Redis
- `secret.yaml`: SOPS-encrypted Redis credentials

### **Monitoring Components**

- `monitoring.yaml`: ServiceMonitor for OpenObserve integration
- `redis-exporter.yaml`: Dedicated Redis exporter for comprehensive metrics
- Built-in metrics: Redis exporter with Celery queue monitoring

### **Backup Components**

- **Integrated with existing Longhorn backup**: Uses the existing S3 backup infrastructure
- S3 integration: Automated backup to Backblaze B2 via the existing `longhorn-s3-backup` group

## Services Created

The chart automatically creates these services:

- `redis-master`: Write operations (connects to the primary) - port 6379
- `redis-replica`: Read-only operations (connects to the replicas) - port 6379
- `redis-headless`: Service discovery for both instances

## Connection Information

### For Applications

Applications should connect using these connection parameters:

**Write Operations:**

```yaml
host: redis-master.redis-system.svc.cluster.local
port: 6379
auth: # redis-password from the redis-credentials secret
```

**Read Operations:**

```yaml
host: redis-replica.redis-system.svc.cluster.local
port: 6379
auth: # redis-password from the redis-credentials secret
```

### Getting Credentials

The Redis password is stored in a SOPS-encrypted secret:

```bash
# Get the Redis password
kubectl get secret \
  redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d
```

## Application Integration Example

Here's how an application deployment would connect:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  template:
    spec:
      containers:
        - name: app
          image: example-app:latest
          env:
            - name: REDIS_HOST_WRITE
              value: "redis-master.redis-system.svc.cluster.local"
            - name: REDIS_HOST_READ
              value: "redis-replica.redis-system.svc.cluster.local"
            - name: REDIS_PORT
              value: "6379"
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-credentials
                  key: redis-password
```

## Monitoring

The Redis cluster includes comprehensive monitoring:

### **Metrics & Monitoring** ✅ **READY**

- **Metrics Port**: 9121 - Redis exporter metrics endpoint
- **ServiceMonitor**: Configured for OpenObserve integration
- **Key Metrics Available**:
  - **Performance**: `redis_commands_processed_total`, `redis_connected_clients`, `redis_keyspace_hits_total`
  - **Memory**: `redis_memory_used_bytes`, `redis_memory_max_bytes`
  - **Replication**: `redis_master_repl_offset`, `redis_replica_lag_seconds`
  - **Persistence**: `redis_rdb_last_save_timestamp_seconds`

### **High Availability Monitoring**

- **Failover**: Manual failover required (no automatic failover, unlike the PostgreSQL setup)
- **Health Checks**: Continuous health monitoring with restart policies
- **Async Replication**: Real-time replication lag monitoring

## Backup Strategy

### **Integrated with Existing Longhorn Backup Infrastructure**

Redis volumes automatically use the existing backup system:

- **Daily backups**: 2 AM UTC via the `longhorn-s3-backup` group, retained 7 days
- **Weekly backups**: 1 AM Sunday via the `longhorn-s3-backup-weekly` group, retained 4 weeks
- **Target**: Backblaze B2 S3 storage via the existing setup
- **Type**: Incremental (efficient for Redis datasets)
- **Automatic assignment**: The Redis storage class automatically applies the backup jobs

### **Redis Persistence**

- **RDB snapshots**: Enabled with periodic saves
- **AOF**: Can be enabled for additional durability if needed

### **Backup Integration**

Redis volumes are automatically backed up because the Redis storage class includes:

```yaml
recurringJobSelector: |
  [
    {
      "name": "longhorn-s3-backup",
      "isGroup": true
    }
  ]
```

## Storage Design Decisions

### **Reclaim Policy: Delete**

The Redis storage class uses `reclaimPolicy: Delete` because:

- **Cache Data**: Redis primarily stores ephemeral cache data that can be rebuilt
- **Resource Efficiency**: Automatic cleanup prevents storage waste on the 2-node cluster
- **Cost Optimization**: No orphaned volumes consuming storage space
- **Operational Simplicity**: Clean GitOps deployments without manual volume cleanup

**Note**: Even with the Delete policy, data is still backed up to S3 daily for disaster recovery.

## Performance Optimizations

Configured for the 2-node, 100 Mbps VLAN setup:

- **Async replication**: Minimizes network impact
- **Local reads**: Applications can read from the local Redis replica
- **Memory limits**: 2 GB per instance (appropriate for 16 GB nodes)
- **Persistence tuning**: Optimized for SSD storage
- **TCP keepalive**: Extended for slower network connections

## Scaling

To add more read replicas:

```yaml
# Edit redis.yaml
replica:
  replicaCount: 2  # Increase from 1 to 2 for an additional read replica
```

## Troubleshooting

### **Cluster Status**

```bash
# Check Redis pods
kubectl get pods -n redis-system
kubectl logs redis-master-0 -n redis-system
kubectl logs redis-replica-0 -n redis-system

# Connect to Redis
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d)
```

### **Monitoring & Metrics**

```bash
# Check the ServiceMonitor
kubectl get servicemonitor -n redis-system
kubectl describe servicemonitor redis-metrics -n redis-system

# Check the metrics endpoint directly
kubectl port-forward -n redis-system svc/redis-metrics 9121:9121
curl http://localhost:9121/metrics
```

### **Replication Status**

```bash
# Check replication from the master
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication

# Check replica status
kubectl exec -it redis-replica-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication
```

### **Performance Testing**

```bash
# Benchmark Redis performance
kubectl exec -it redis-master-0 -n redis-system -- redis-benchmark -h redis-master.redis-system.svc.cluster.local -p 6379 -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) -c 50 -n 10000
```

## Next Steps

1. **Encrypt secrets**: Use SOPS to encrypt the credentials
2. **Deploy via GitOps**: Commit and push to trigger the Flux deployment
3. **Verify deployment**: Monitor pods and services
4. **Update applications**: Configure Harbor and OpenObserve to use Redis
5. **Setup monitoring**: Verify metrics in OpenObserve dashboards
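## Example: Reading Connection Settings in an Application

As a companion to the application integration example earlier, here is a minimal, stdlib-only sketch of how an application might assemble its read/write connection settings from the `REDIS_*` environment variables injected by the Deployment. The `redis_settings` helper is hypothetical (not part of any repo code); a real application would pass these values to a Redis client library such as `redis-py`.

```python
import os


def redis_settings(env=os.environ):
    """Build read/write connection settings from REDIS_* environment variables.

    Hypothetical helper: writes must target the primary (redis-master service),
    while reads may use the replica service for load distribution.
    """
    port = int(env.get("REDIS_PORT", "6379"))
    password = env.get("REDIS_PASSWORD")
    return {
        "write": {"host": env["REDIS_HOST_WRITE"], "port": port, "password": password},
        "read": {"host": env["REDIS_HOST_READ"], "port": port, "password": password},
    }
```

Keeping the read and write hosts separate in configuration (rather than hard-coding one service) lets the application take advantage of local replica reads described under Performance Optimizations.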
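## Example: Parsing Exporter Metrics

The `curl http://localhost:9121/metrics` check in the troubleshooting section returns Prometheus text-format output. The following sketch shows one way to pull the label-free gauges listed under Key Metrics out of such a scrape; the parsing is deliberately simplified (it ignores labeled series), and the sample text is illustrative, not real exporter output.

```python
def parse_metrics(text):
    """Extract label-free metrics from Prometheus text-format exposition."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        name, _, value = line.partition(" ")
        if "{" not in name:  # keep it simple: only series without labels
            metrics[name] = float(value)
    return metrics


sample = """\
# HELP redis_memory_used_bytes Used memory
# TYPE redis_memory_used_bytes gauge
redis_memory_used_bytes 1048576
redis_connected_clients 4
redis_up 1
"""
```

A quick check such as `parse_metrics(sample)["redis_up"] == 1.0` is a convenient smoke test that the exporter is reachable and reporting a healthy instance.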
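## Example: Computing Replication Lag

The `INFO replication` commands in the troubleshooting section emit `key:value` lines, with per-replica details packed into a comma-separated `slave0` field. This sketch computes the replication byte-lag (primary offset minus replica offset) from that output; the sample text below is illustrative, not captured from the cluster.

```python
def replication_lag_bytes(info_text):
    """Compute byte-lag of the first replica from `INFO replication` output."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip section headers like "# Replication"
        key, _, value = line.partition(":")
        fields[key] = value
    master_offset = int(fields["master_repl_offset"])
    # slave0 is a comma-separated list of k=v pairs in the INFO format
    replica = dict(pair.split("=") for pair in fields["slave0"].split(","))
    return master_offset - int(replica["offset"])


sample = """\
# Replication
role:master
connected_slaves:1
slave0:ip=10.42.0.5,port=6379,state=online,offset=123400,lag=0
master_repl_offset:123456
"""
print(replication_lag_bytes(sample))  # 56
```

With asynchronous replication on a 100 Mbps VLAN, a small non-zero byte-lag is expected; a steadily growing value is the signal worth alerting on.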