Redis Infrastructure
This directory contains the Redis Primary-Replica setup for high-availability caching on the Kubernetes cluster.
Architecture
- 2 Redis instances: 1 primary + 1 replica for high availability
- Asynchronous replication: Optimized for 100Mbps VLAN performance
- Node distribution: Instances are distributed across n1 and n2 nodes
- Longhorn storage: Single replica (Redis handles replication), Delete reclaim policy (cache data)
- Bitnami Redis: Industry-standard Helm chart with comprehensive features
Components
Core Components
- namespace.yaml: Redis system namespace
- repository.yaml: Bitnami Helm repository
- redis.yaml: Redis primary-replica deployment
- redis-storageclass.yaml: Optimized storage class for Redis
- secret.yaml: SOPS-encrypted Redis credentials
Monitoring Components
- monitoring.yaml: ServiceMonitor for OpenObserve integration
- redis-exporter.yaml: Dedicated Redis exporter for comprehensive metrics
- Built-in metrics: Redis exporter with Celery queue monitoring
Backup Components
- Integrated with existing Longhorn backup: Uses existing S3 backup infrastructure
- S3 integration: Automated backup to Backblaze B2 via the existing longhorn-s3-backup group
Services Created
Redis automatically creates these services:
- redis-master: Write operations (connects to primary) - Port 6379
- redis-replica: Read-only operations (connects to replicas) - Port 6379
- redis-headless: Service discovery for both instances
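The write/read split above can be sketched with a small hypothetical helper that builds connection URLs for each service (the helper itself is not part of this repo; service names match the list above):

```python
# Hypothetical helper illustrating the write/read service split.
# Service names follow the "Services Created" list; nothing here is
# a real API from this repository.

def redis_url(password: str, *, read_only: bool = False,
              namespace: str = "redis-system", db: int = 0) -> str:
    """Build a redis:// URL for the write (master) or read (replica) service."""
    service = "redis-replica" if read_only else "redis-master"
    host = f"{service}.{namespace}.svc.cluster.local"
    return f"redis://:{password}@{host}:6379/{db}"

print(redis_url("s3cret"))                  # write endpoint (master)
print(redis_url("s3cret", read_only=True))  # read endpoint (replica)
```

Routing writes to redis-master and reads to redis-replica keeps read traffic on the local replica, which matters on a 100Mbps VLAN.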
Connection Information
For Applications
Applications should connect using these connection parameters:
Write Operations:
host: redis-master.redis-system.svc.cluster.local
port: 6379
auth: <password from redis-credentials secret>
Read Operations:
host: redis-replica.redis-system.svc.cluster.local
port: 6379
auth: <password from redis-credentials secret>
Getting Credentials
The Redis password is stored in SOPS-encrypted secret:
# Get the Redis password
kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d
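The `base64 -d` step in the command above exists because Kubernetes stores Secret values base64-encoded. A minimal Python equivalent of that decode (the encoded value below is a made-up placeholder, not a real credential):

```python
import base64

# Kubernetes Secrets hold base64-encoded values; the jsonpath query returns
# the encoded string, and decoding yields the plaintext password.
encoded = "aHVudGVyMg=="  # placeholder, not a real credential
password = base64.b64decode(encoded).decode("utf-8")
print(password)  # hunter2
```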
Application Integration Example
Here's how an application deployment would connect:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: example-app:latest
          env:
            - name: REDIS_HOST_WRITE
              value: "redis-master.redis-system.svc.cluster.local"
            - name: REDIS_HOST_READ
              value: "redis-replica.redis-system.svc.cluster.local"
            - name: REDIS_PORT
              value: "6379"
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-credentials
                  key: redis-password
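Inside the container, the application picks these values up from its environment. A minimal sketch of that consumption, assuming the env var names from the Deployment above (client library usage omitted):

```python
import os

# Read the connection parameters injected by the Deployment above.
# The fallback defaults mirror the documented service DNS names and port.
write_host = os.environ.get("REDIS_HOST_WRITE",
                            "redis-master.redis-system.svc.cluster.local")
read_host = os.environ.get("REDIS_HOST_READ",
                           "redis-replica.redis-system.svc.cluster.local")
port = int(os.environ.get("REDIS_PORT", "6379"))
password = os.environ.get("REDIS_PASSWORD", "")

print(write_host, read_host, port)
```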
Monitoring
The Redis cluster includes comprehensive monitoring:
Metrics & Monitoring ✅ READY
- Metrics Port: 9121 - Redis exporter metrics endpoint
- ServiceMonitor: Configured for OpenObserve integration
- Key Metrics Available:
  - Performance: redis_commands_processed_total, redis_connected_clients, redis_keyspace_hits_total
  - Memory: redis_memory_used_bytes, redis_memory_max_bytes
  - Replication: redis_master_repl_offset, redis_replica_lag_seconds
  - Persistence: redis_rdb_last_save_timestamp_seconds
High Availability Monitoring
- Failover: Manual failover required (no automatic failover, unlike the PostgreSQL setup)
- Health Checks: Continuous health monitoring with restart policies
- Async Replication: Real-time replication lag monitoring
Backup Strategy
Integrated with Existing Longhorn Backup Infrastructure
Redis volumes automatically use your existing backup system:
- Daily backups: 2 AM UTC via the longhorn-s3-backup group, retain 7 days
- Weekly backups: 1 AM Sunday via the longhorn-s3-backup-weekly group, retain 4 weeks
- Target: Backblaze B2 S3 storage via existing setup
- Type: Incremental (efficient for Redis datasets)
- Automatic assignment: Redis storage class automatically applies backup jobs
Redis Persistence
- RDB snapshots: Enabled with periodic saves
- AOF: Can be enabled for additional durability if needed
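A hedged sketch of the Redis settings behind these two bullets, using the Bitnami chart's commonConfiguration passthrough (the thresholds shown are Redis defaults for illustration; the actual values live in redis.yaml):

```yaml
commonConfiguration: |-
  # RDB snapshots: save if >=1 change in 15 min, >=10 in 5 min, >=10000 in 1 min
  # (default-style thresholds, shown for illustration)
  save 900 1
  save 300 10
  save 60 10000
  # AOF is off here; set to "yes" for additional durability if needed
  appendonly no
```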
Backup Integration
Redis volumes are automatically backed up because the Redis storage class includes:
recurringJobSelector: |
[
{
"name":"longhorn-s3-backup",
"isGroup":true
}
]
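A sketch of where that selector sits in the Longhorn StorageClass (the class name below is an assumption for illustration; see redis-storageclass.yaml for the real definition):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-redis        # assumed name; see redis-storageclass.yaml
provisioner: driver.longhorn.io
reclaimPolicy: Delete          # cache data; see "Storage Design Decisions" below
parameters:
  numberOfReplicas: "1"        # single Longhorn replica; Redis handles replication
  recurringJobSelector: |
    [
      {
        "name":"longhorn-s3-backup",
        "isGroup":true
      }
    ]
```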
Storage Design Decisions
Reclaim Policy: Delete
The Redis storage class uses reclaimPolicy: Delete because:
- Cache Data: Redis primarily stores ephemeral cache data that can be rebuilt
- Resource Efficiency: Automatic cleanup prevents storage waste on your 2-node cluster
- Cost Optimization: No orphaned volumes consuming storage space
- Operational Simplicity: Clean GitOps deployments without manual volume cleanup
Note: Even with Delete policy, data is still backed up to S3 daily for disaster recovery.
Performance Optimizations
Configured for your 2-node, 100Mbps VLAN setup:
- Async replication: Minimizes network impact
- Local reads: Applications can read from local Redis replica
- Memory limits: 2GB per instance (appropriate for 16GB nodes)
- Persistence tuning: Optimized for SSD storage
- TCP keepalive: Extended for slower network connections
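The optimizations above map onto Bitnami chart values roughly like this (a hedged sketch, not the deployed configuration; exact numbers are in redis.yaml, and the keepalive/maxmemory values here are assumptions):

```yaml
master:
  resources:
    limits:
      memory: 2Gi              # 2GB per instance, sized for 16GB nodes
replica:
  resources:
    limits:
      memory: 2Gi
commonConfiguration: |-
  tcp-keepalive 300            # extended keepalive for the slower VLAN (assumed value)
  maxmemory 1800mb             # headroom below the container limit (assumed value)
```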
Scaling
To add more read replicas:
# Edit redis.yaml
replica:
  replicaCount: 2  # Increase from 1 to 2 for an additional read replica
Troubleshooting
Cluster Status
# Check Redis pods
kubectl get pods -n redis-system
kubectl logs redis-master-0 -n redis-system
kubectl logs redis-replica-0 -n redis-system
# Connect to Redis
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d)
Monitoring & Metrics
# Check ServiceMonitor
kubectl get servicemonitor -n redis-system
kubectl describe servicemonitor redis-metrics -n redis-system
# Check metrics endpoint directly
kubectl port-forward -n redis-system svc/redis-metrics 9121:9121
curl http://localhost:9121/metrics
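The exporter serves Prometheus text format on that endpoint. A small hypothetical snippet for pulling one gauge out of such a payload without extra libraries (the sample body is illustrative, not real cluster output):

```python
# Extract a single sample value from Prometheus exposition-format text,
# e.g. the body returned by the curl command above.

def metric_value(body: str, name: str) -> float:
    """Return the value of the first sample whose metric name matches."""
    for line in body.splitlines():
        if line.startswith(name) and not line.startswith("#"):
            return float(line.rsplit(" ", 1)[1])
    raise KeyError(name)

sample = """# HELP redis_memory_used_bytes Used memory
# TYPE redis_memory_used_bytes gauge
redis_memory_used_bytes 1.048576e+06
"""
print(metric_value(sample, "redis_memory_used_bytes"))  # 1048576.0
```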
Replication Status
# Check replication from master
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication
# Check replica status
kubectl exec -it redis-replica-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication
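The commands above emit Redis INFO output as `key:value` lines. A small parser for that format, handy for scripting lag checks (the sample text is illustrative, not real cluster output):

```python
# Parse Redis "INFO replication" output (key:value lines, "#" comments)
# into a dict for scripted health checks.

def parse_info(text: str) -> dict:
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            fields[key] = value
    return fields

sample = """# Replication
role:master
connected_slaves:1
slave0:ip=10.42.1.7,port=6379,state=online,offset=41522,lag=0
master_repl_offset:41522
"""
info = parse_info(sample)
print(info["role"], info["connected_slaves"])  # master 1
```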
Performance Testing
# Benchmark Redis performance
kubectl exec -it redis-master-0 -n redis-system -- redis-benchmark -h redis-master.redis-system.svc.cluster.local -p 6379 -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) -c 50 -n 10000
Next Steps
- Encrypt secrets: Use SOPS to encrypt the credentials
- Deploy via GitOps: Commit and push to trigger Flux deployment
- Verify deployment: Monitor pods and services
- Update applications: Configure Harbor and OpenObserve to use Redis
- Setup monitoring: Verify metrics in OpenObserve dashboards