# Redis Infrastructure
This directory contains the Redis Primary-Replica setup for high-availability caching on the Kubernetes cluster.
## Architecture
- **2 Redis instances**: 1 primary + 1 replica for high availability
- **Asynchronous replication**: Optimized for 100Mbps VLAN performance
- **Node distribution**: Instances are distributed across n1 and n2 nodes
- **Longhorn storage**: Single replica (Redis handles replication), Delete reclaim policy (cache data)
- **Bitnami Redis**: Industry-standard Helm chart with comprehensive features
## Components
### **Core Components**
- `namespace.yaml`: Redis system namespace
- `repository.yaml`: Bitnami Helm repository
- `redis.yaml`: Redis primary-replica deployment
- `redis-storageclass.yaml`: Optimized storage class for Redis
- `secret.yaml`: SOPS-encrypted Redis credentials
### **Monitoring Components**
- `monitoring.yaml`: ServiceMonitor for OpenObserve integration
- `redis-exporter.yaml`: Dedicated Redis exporter for comprehensive metrics
- Built-in metrics: Redis exporter with Celery queue monitoring
### **Backup Components**
- **Integrated with existing Longhorn backup**: Uses existing S3 backup infrastructure
- S3 integration: Automated backup to Backblaze B2 via existing `longhorn-s3-backup` group
## Services Created
Redis automatically creates these services:
- `redis-master`: Write operations (connects to primary) - Port 6379
- `redis-replica`: Read-only operations (connects to replicas) - Port 6379
- `redis-headless`: Service discovery for both instances
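The fully-qualified in-cluster DNS names follow the standard Kubernetes `<service>.<namespace>.svc.cluster.local` pattern. A quick sketch that prints the endpoints applications would use:

```shell
# Build the in-cluster endpoints for each Redis service (namespace: redis-system).
NS=redis-system
for SVC in redis-master redis-replica redis-headless; do
  echo "${SVC}.${NS}.svc.cluster.local:6379"
done
```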
## Connection Information
### For Applications
Applications should connect using these connection parameters:
**Write Operations:**
```yaml
host: redis-master.redis-system.svc.cluster.local
port: 6379
auth: <password from redis-credentials secret>
```
**Read Operations:**
```yaml
host: redis-replica.redis-system.svc.cluster.local
port: 6379
auth: <password from redis-credentials secret>
```
### Getting Credentials
The Redis password is stored in a SOPS-encrypted secret:
```bash
# Get the Redis password
kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d
```
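The decoded password can be composed into a standard `redis://` connection URL for clients that accept one. A minimal sketch using a made-up sample value (`c2VjcmV0MTIz` is base64 for `secret123`, standing in for the output of the `kubectl` command above, not the real secret):

```shell
# Sample only: substitute the real value from the kubectl command above.
REDIS_PASSWORD=$(echo -n "c2VjcmV0MTIz" | base64 -d)
REDIS_URL="redis://:${REDIS_PASSWORD}@redis-master.redis-system.svc.cluster.local:6379/0"
echo "$REDIS_URL"
```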
## Application Integration Example
Here's how an application deployment would connect:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: example-app:latest
          env:
            - name: REDIS_HOST_WRITE
              value: "redis-master.redis-system.svc.cluster.local"
            - name: REDIS_HOST_READ
              value: "redis-replica.redis-system.svc.cluster.local"
            - name: REDIS_PORT
              value: "6379"
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-credentials
                  key: redis-password
```
## Monitoring
The Redis cluster includes comprehensive monitoring:
### **Metrics & Monitoring** ✅ **READY**
- **Metrics Port**: 9121 - Redis exporter metrics endpoint
- **ServiceMonitor**: Configured for OpenObserve integration
- **Key Metrics Available**:
- **Performance**: `redis_commands_processed_total`, `redis_connected_clients`, `redis_keyspace_hits_total`
- **Memory**: `redis_memory_used_bytes`, `redis_memory_max_bytes`
- **Replication**: `redis_master_repl_offset`, `redis_replica_lag_seconds`
- **Persistence**: `redis_rdb_last_save_timestamp_seconds`
### **High Availability Monitoring**
- **Failover**: Manual failover required (no Sentinel is deployed, unlike the PostgreSQL cluster's automatic failover)
- **Health Checks**: Continuous health monitoring with restart policies
- **Async Replication**: Real-time replication lag monitoring
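If alerting is layered on top of these metrics, a Prometheus-style rule could flag sustained replica lag. A sketch, assuming the `redis_replica_lag_seconds` metric listed above is being scraped (the rule syntax may need adapting to how OpenObserve ingests alert rules):

```yaml
groups:
  - name: redis-replication
    rules:
      - alert: RedisReplicationLagHigh
        expr: redis_replica_lag_seconds > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis replica lag has exceeded 10s for 5 minutes"
```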
## Backup Strategy
### **Integrated with Existing Longhorn Backup Infrastructure**
Redis volumes automatically use your existing backup system:
- **Daily backups**: 2 AM UTC via `longhorn-s3-backup` group, retain 7 days
- **Weekly backups**: 1 AM Sunday via `longhorn-s3-backup-weekly` group, retain 4 weeks
- **Target**: Backblaze B2 S3 storage via existing setup
- **Type**: Incremental (efficient for Redis datasets)
- **Automatic assignment**: Redis storage class automatically applies backup jobs
### **Redis Persistence**
- **RDB snapshots**: Enabled with periodic saves
- **AOF**: Can be enabled for additional durability if needed
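If AOF durability is needed, it could be switched on through the chart's extra flags. A sketch against the Bitnami values layout (the `extraFlags` keys are an assumption; verify them against your chart version before applying):

```yaml
# redis.yaml values fragment (hypothetical): enable AOF with per-second fsync
# on both the primary and the replica.
master:
  extraFlags:
    - "--appendonly yes"
    - "--appendfsync everysec"
replica:
  extraFlags:
    - "--appendonly yes"
    - "--appendfsync everysec"
```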
### **Backup Integration**
Redis volumes are automatically backed up because the Redis storage class includes:
```yaml
recurringJobSelector: |
  [
    {
      "name": "longhorn-s3-backup",
      "isGroup": true
    }
  ]
```
## Storage Design Decisions
### **Reclaim Policy: Delete**
The Redis storage class uses `reclaimPolicy: Delete` because:
- **Cache Data**: Redis primarily stores ephemeral cache data that can be rebuilt
- **Resource Efficiency**: Automatic cleanup prevents storage waste on your 2-node cluster
- **Cost Optimization**: No orphaned volumes consuming storage space
- **Operational Simplicity**: Clean GitOps deployments without manual volume cleanup
**Note**: Even with Delete policy, data is still backed up to S3 daily for disaster recovery.
## Performance Optimizations
Configured for your 2-node, 100Mbps VLAN setup:
- **Async replication**: Minimizes network impact
- **Local reads**: Applications can read from local Redis replica
- **Memory limits**: 2GB per instance (appropriate for 16GB nodes)
- **Persistence tuning**: Optimized for SSD storage
- **TCP keepalive**: Extended for slower network connections
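Several of these tunings map onto plain `redis.conf` directives, which the Bitnami chart can inject through `commonConfiguration`. A sketch under the assumptions above (2GB memory limit, slower VLAN); the specific values, and the choice of `allkeys-lru` eviction for cache-style data, are illustrative rather than taken from this deployment:

```yaml
# redis.yaml values fragment (hypothetical values, adjust to your hardware):
commonConfiguration: |-
  maxmemory 2gb
  maxmemory-policy allkeys-lru
  tcp-keepalive 300
```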
## Scaling
To add more read replicas:
```yaml
# Edit redis.yaml
replica:
  replicaCount: 2  # increase from 1 to 2 for an additional read replica
```
## Troubleshooting
### **Cluster Status**
```bash
# Check Redis pods
kubectl get pods -n redis-system
kubectl logs redis-master-0 -n redis-system
kubectl logs redis-replica-0 -n redis-system
# Connect to Redis
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d)
```
### **Monitoring & Metrics**
```bash
# Check ServiceMonitor
kubectl get servicemonitor -n redis-system
kubectl describe servicemonitor redis-metrics -n redis-system
# Check metrics endpoint directly
kubectl port-forward -n redis-system svc/redis-metrics 9121:9121
curl http://localhost:9121/metrics
```
### **Replication Status**
```bash
# Check replication from master
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication
# Check replica status
kubectl exec -it redis-replica-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication
```
### **Performance Testing**
```bash
# Benchmark Redis performance
kubectl exec -it redis-master-0 -n redis-system -- redis-benchmark -h redis-master.redis-system.svc.cluster.local -p 6379 -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) -c 50 -n 10000
```
## Next Steps
1. **Encrypt secrets**: Use SOPS to encrypt the credentials
2. **Deploy via GitOps**: Commit and push to trigger Flux deployment
3. **Verify deployment**: Monitor pods and services
4. **Update applications**: Configure Harbor and OpenObserve to use Redis
5. **Setup monitoring**: Verify metrics in OpenObserve dashboards