Redis Infrastructure

This directory contains the Redis Primary-Replica setup for high-availability caching on the Kubernetes cluster.

Architecture

  • 2 Redis instances: 1 primary + 1 replica for high availability
  • Asynchronous replication: Optimized for 100Mbps VLAN performance
  • Node distribution: Instances are distributed across n1 and n2 nodes
  • Longhorn storage: Single replica (Redis handles replication), Delete reclaim policy (cache data)
  • Bitnami Redis: Industry-standard Helm chart with comprehensive features
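
As a reference, here is a minimal sketch of the Helm values that would express this layout, assuming the Bitnami chart's standard replication settings; the authoritative values live in redis.yaml, and the redis-storage class name is an assumption:

architecture: replication            # one primary plus asynchronous read replicas
auth:
  existingSecret: redis-credentials
  existingSecretPasswordKey: redis-password
master:
  podAntiAffinityPreset: hard        # keep primary and replica on different nodes (n1/n2)
  persistence:
    storageClass: redis-storage      # assumed name of the dedicated storage class
replica:
  replicaCount: 1
  podAntiAffinityPreset: hard
  persistence:
    storageClass: redis-storage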

Components

Core Components

  • namespace.yaml: Redis system namespace
  • repository.yaml: Bitnami Helm repository
  • redis.yaml: Redis primary-replica deployment
  • redis-storageclass.yaml: Optimized storage class for Redis
  • secret.yaml: SOPS-encrypted Redis credentials
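
For orientation, repository.yaml is most likely a Flux HelmRepository along these lines (a sketch; the namespace and interval are assumptions):

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: bitnami
  namespace: redis-system
spec:
  interval: 1h
  url: https://charts.bitnami.com/bitnami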

Monitoring Components

  • monitoring.yaml: ServiceMonitor for OpenObserve integration
  • redis-exporter.yaml: Dedicated Redis exporter for comprehensive metrics
  • Built-in metrics: Redis exporter with Celery queue monitoring

Backup Components

  • Integrated with existing Longhorn backup: Uses existing S3 backup infrastructure
  • S3 integration: Automated backup to Backblaze B2 via existing longhorn-s3-backup group

Services Created

The Bitnami Redis chart automatically creates these services:

  • redis-master: Write operations (connects to primary) - Port 6379
  • redis-replica: Read-only operations (connects to replicas) - Port 6379
  • redis-headless: Service discovery for both instances

Connection Information

For Applications

Applications should connect using these connection parameters:

Write Operations:

host: redis-master.redis-system.svc.cluster.local
port: 6379
auth: <password from redis-credentials secret>

Read Operations:

host: redis-replica.redis-system.svc.cluster.local  
port: 6379
auth: <password from redis-credentials secret>

Getting Credentials

The Redis password is stored in SOPS-encrypted secret:

# Get the Redis password
kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d

Application Integration Example

Here's how an application deployment would connect:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: example-app:latest
        env:
        - name: REDIS_HOST_WRITE
          value: "redis-master.redis-system.svc.cluster.local"
        - name: REDIS_HOST_READ
          value: "redis-replica.redis-system.svc.cluster.local"
        - name: REDIS_PORT
          value: "6379"
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: redis-credentials
              key: redis-password

Monitoring

The Redis cluster includes comprehensive monitoring:

Metrics & Monitoring (Ready)

  • Metrics Port: 9121 - Redis exporter metrics endpoint
  • ServiceMonitor: Configured for OpenObserve integration
  • Key Metrics Available:
    • Performance: redis_commands_processed_total, redis_connected_clients, redis_keyspace_hits_total
    • Memory: redis_memory_used_bytes, redis_memory_max_bytes
    • Replication: redis_master_repl_offset, redis_replica_lag_seconds
    • Persistence: redis_rdb_last_save_timestamp_seconds
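
The ServiceMonitor in monitoring.yaml roughly follows this shape (a sketch; the selector labels and port name depend on how the exporter service is labelled):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-metrics
  namespace: redis-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: redis      # assumed label on the exporter/metrics service
  endpoints:
  - port: metrics                        # assumed port name backing 9121
    interval: 30s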

High Availability Monitoring

  • Failover: Manual failover required; unlike the PostgreSQL setup, there is no automatic failover
  • Health Checks: Continuous health monitoring with restart policies
  • Async Replication: Real-time replication lag monitoring
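
Health checking is done with standard Kubernetes probes on the Redis containers; a representative sketch (values are illustrative rather than the chart's exact defaults, and REDIS_PASSWORD is assumed to be injected by the chart):

livenessProbe:
  exec:
    command: ["sh", "-c", "redis-cli -a \"$REDIS_PASSWORD\" ping"]
  initialDelaySeconds: 20
  periodSeconds: 10
  failureThreshold: 5
readinessProbe:
  exec:
    command: ["sh", "-c", "redis-cli -a \"$REDIS_PASSWORD\" ping | grep -q PONG"]
  initialDelaySeconds: 20
  periodSeconds: 10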

Backup Strategy

Integrated with Existing Longhorn Backup Infrastructure

Redis volumes automatically use your existing backup system:

  • Daily backups: 2 AM UTC via longhorn-s3-backup group, retain 7 days
  • Weekly backups: 1 AM Sunday via longhorn-s3-backup-weekly group, retain 4 weeks
  • Target: Backblaze B2 S3 storage via existing setup
  • Type: Incremental (efficient for Redis datasets)
  • Automatic assignment: the Redis storage class attaches these backup jobs to every Redis volume via its recurringJobSelector (see Backup Integration below)
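
The daily job referenced above is a Longhorn RecurringJob defined in the existing backup infrastructure rather than in this directory; it looks roughly like this (schedule and retention taken from the list above, other fields assumed):

apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: longhorn-s3-backup
  namespace: longhorn-system
spec:
  task: backup            # back up to the configured Backblaze B2 S3 target
  cron: "0 2 * * *"       # daily at 2 AM UTC
  retain: 7               # keep 7 daily backups
  concurrency: 1
  groups:
  - longhorn-s3-backup    # volumes selecting this group get the job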

Redis Persistence

  • RDB snapshots: Enabled with periodic saves
  • AOF: Can be enabled for additional durability if needed
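
In Bitnami chart terms, this persistence tuning can be expressed via commonConfiguration; a sketch (the save thresholds and volume size are illustrative, not necessarily what redis.yaml uses):

master:
  persistence:
    enabled: true
    size: 8Gi                 # illustrative volume size
commonConfiguration: |-
  # RDB snapshots: after 15 min / 1 change, 5 min / 10 changes, 1 min / 10000 changes
  save 900 1
  save 300 10
  save 60 10000
  # AOF is left off; switch to "appendonly yes" if extra durability is needed
  appendonly no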

Backup Integration

Redis volumes are automatically backed up because the Redis storage class includes:

recurringJobSelector: |
  [
    {
      "name":"longhorn-s3-backup",
      "isGroup":true
    }
  ]
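
Putting it together, redis-storageclass.yaml looks roughly like the following (the class name and some parameters are assumptions; the recurringJobSelector shown above is the load-bearing part):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: redis-storage              # assumed name; see redis-storageclass.yaml
provisioner: driver.longhorn.io
reclaimPolicy: Delete              # see "Storage Design Decisions" below
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "1"            # Redis handles replication itself
  staleReplicaTimeout: "30"
  recurringJobSelector: '[{"name":"longhorn-s3-backup","isGroup":true}]'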

Storage Design Decisions

Reclaim Policy: Delete

The Redis storage class uses reclaimPolicy: Delete because:

  • Cache Data: Redis primarily stores ephemeral cache data that can be rebuilt
  • Resource Efficiency: Automatic cleanup prevents storage waste on your 2-node cluster
  • Cost Optimization: No orphaned volumes consuming storage space
  • Operational Simplicity: Clean GitOps deployments without manual volume cleanup

Note: Even with Delete policy, data is still backed up to S3 daily for disaster recovery.

Performance Optimizations

Configured for your 2-node, 100Mbps VLAN setup:

  • Async replication: Minimizes network impact
  • Local reads: Applications can read from local Redis replica
  • Memory limits: 2GB per instance (appropriate for 16GB nodes)
  • Persistence tuning: Optimized for SSD storage
  • TCP keepalive: Extended for slower network connections
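
In redis.conf terms, these optimizations map to directives along these lines (illustrative values; the actual tuning lives in redis.yaml):

commonConfiguration: |-
  maxmemory 2gb                  # matches the 2GB per-instance limit
  maxmemory-policy allkeys-lru   # assumed eviction policy for cache workloads
  tcp-keepalive 300              # longer keepalive for the 100Mbps VLAN
  repl-diskless-sync yes         # stream full syncs without a disk round-trip
  repl-backlog-size 16mb         # tolerate short replica disconnects without a full resync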

Scaling

To add more read replicas:

# Edit redis.yaml
replica:
  replicaCount: 2  # Increase from 1 to 2 for additional read replica

Troubleshooting

Cluster Status

# Check Redis pods
kubectl get pods -n redis-system
kubectl logs redis-master-0 -n redis-system
kubectl logs redis-replica-0 -n redis-system

# Connect to Redis
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d)

Monitoring & Metrics

# Check ServiceMonitor
kubectl get servicemonitor -n redis-system
kubectl describe servicemonitor redis-metrics -n redis-system

# Check metrics endpoint directly
kubectl port-forward -n redis-system svc/redis-metrics 9121:9121
curl http://localhost:9121/metrics

Replication Status

# Check replication from master
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication

# Check replica status
kubectl exec -it redis-replica-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication

Performance Testing

# Benchmark Redis performance
kubectl exec -it redis-master-0 -n redis-system -- redis-benchmark -h redis-master.redis-system.svc.cluster.local -p 6379 -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) -c 50 -n 10000

Next Steps

  1. Encrypt secrets: Use SOPS to encrypt the credentials
  2. Deploy via GitOps: Commit and push to trigger Flux deployment
  3. Verify deployment: Monitor pods and services
  4. Update applications: Configure Harbor and OpenObserve to use Redis
  5. Setup monitoring: Verify metrics in OpenObserve dashboards