# Redis Infrastructure
This directory contains the Redis Primary-Replica setup for high-availability caching on the Kubernetes cluster.

## Architecture

- **2 Redis instances**: 1 primary + 1 replica for high availability
- **Asynchronous replication**: Optimized for 100Mbps VLAN performance
- **Node distribution**: Instances are distributed across the n1 and n2 nodes
- **Longhorn storage**: Single replica (Redis handles replication), Delete reclaim policy (cache data)
- **Bitnami Redis**: Industry-standard Helm chart with comprehensive features

## Components

### **Core Components**

- `namespace.yaml`: Redis system namespace
- `repository.yaml`: Bitnami Helm repository
- `redis.yaml`: Redis primary-replica deployment
- `redis-storageclass.yaml`: Optimized storage class for Redis
- `secret.yaml`: SOPS-encrypted Redis credentials

### **Monitoring Components**

- `monitoring.yaml`: ServiceMonitor for OpenObserve integration
- `redis-exporter.yaml`: Dedicated Redis exporter for comprehensive metrics
- Built-in metrics: Redis exporter with Celery queue monitoring

### **Backup Components**

- **Integrated with existing Longhorn backup**: Uses the existing S3 backup infrastructure
- S3 integration: Automated backup to Backblaze B2 via the existing `longhorn-s3-backup` group

## Services Created

Redis automatically creates these services:

- `redis-master`: Write operations (connects to the primary) - Port 6379
- `redis-replica`: Read-only operations (connects to replicas) - Port 6379
- `redis-headless`: Service discovery for both instances
## Connection Information
### For Applications
Applications should connect using these connection parameters:
**Write Operations:**
```yaml
host: redis-master.redis-system.svc.cluster.local
port: 6379
auth: <password from redis-credentials secret>
```
**Read Operations:**
```yaml
host: redis-replica.redis-system.svc.cluster.local
port: 6379
auth: <password from redis-credentials secret>
```
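
In a client, this split maps to two connections: writes go to the primary service and reads can go to the replica service. A minimal sketch of the selection logic (the helper itself is illustrative, not part of this deployment; the redis-py usage shown in comments assumes that library is available):

```python
# Sketch: route commands to the write (primary) or read (replica) service.
# Service DNS names follow the "Services Created" section; the helper is
# hypothetical and only illustrates the read/write split.

WRITE_HOST = "redis-master.redis-system.svc.cluster.local"
READ_HOST = "redis-replica.redis-system.svc.cluster.local"
PORT = 6379

def redis_endpoint(command: str) -> tuple[str, int]:
    """Return (host, port) for a Redis command.

    Commands that mutate state must go to the primary; reads can be
    served by a replica (accepting async-replication lag).
    """
    write_commands = {"SET", "DEL", "EXPIRE", "INCR", "LPUSH", "HSET"}
    host = WRITE_HOST if command.upper() in write_commands else READ_HOST
    return host, PORT

# With redis-py you would then keep one client per endpoint, e.g.:
#   import redis
#   writer = redis.Redis(host=WRITE_HOST, port=PORT, password=password)
#   reader = redis.Redis(host=READ_HOST, port=PORT, password=password)
```

Note that replication is asynchronous, so a value written through the primary may briefly be missing when read from the replica.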
### Getting Credentials

The Redis password is stored in a SOPS-encrypted secret:

```bash
# Get the Redis password
kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d
```
## Application Integration Example

Here's how an application deployment would connect:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  template:
    spec:
      containers:
        - name: app
          image: example-app:latest
          env:
            - name: REDIS_HOST_WRITE
              value: "redis-master.redis-system.svc.cluster.local"
            - name: REDIS_HOST_READ
              value: "redis-replica.redis-system.svc.cluster.local"
            - name: REDIS_PORT
              value: "6379"
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-credentials
                  key: redis-password
```
## Monitoring

The Redis cluster includes comprehensive monitoring:

### **Metrics & Monitoring** ✅ **READY**

- **Metrics Port**: 9121 - Redis exporter metrics endpoint
- **ServiceMonitor**: Configured for OpenObserve integration
- **Key Metrics Available**:
  - **Performance**: `redis_commands_processed_total`, `redis_connected_clients`, `redis_keyspace_hits_total`
  - **Memory**: `redis_memory_used_bytes`, `redis_memory_max_bytes`
  - **Replication**: `redis_master_repl_offset`, `redis_replica_lag_seconds`
  - **Persistence**: `redis_rdb_last_save_timestamp_seconds`
### **High Availability Monitoring**

- **Failover**: Manual failover required (unlike PostgreSQL, which fails over automatically)
- **Health Checks**: Continuous health monitoring with restart policies
- **Async Replication**: Real-time replication lag monitoring
## Backup Strategy
### **Integrated with Existing Longhorn Backup Infrastructure**

Redis volumes automatically use your existing backup system:

- **Daily backups**: 2 AM UTC via the `longhorn-s3-backup` group, retained 7 days
- **Weekly backups**: 1 AM Sunday via the `longhorn-s3-backup-weekly` group, retained 4 weeks
- **Target**: Backblaze B2 S3 storage via the existing setup
- **Type**: Incremental (efficient for Redis datasets)
- **Automatic assignment**: The Redis storage class automatically applies backup jobs
### **Redis Persistence**

- **RDB snapshots**: Enabled with periodic saves
- **AOF**: Can be enabled for additional durability if needed
### **Backup Integration**

Redis volumes are automatically backed up because the Redis storage class includes:

```yaml
recurringJobSelector: |
  [
    {
      "name":"longhorn-s3-backup",
      "isGroup":true
    }
  ]
```
## Storage Design Decisions
### **Reclaim Policy: Delete**

The Redis storage class uses `reclaimPolicy: Delete` because:

- **Cache Data**: Redis primarily stores ephemeral cache data that can be rebuilt
- **Resource Efficiency**: Automatic cleanup prevents storage waste on your 2-node cluster
- **Cost Optimization**: No orphaned volumes consuming storage space
- **Operational Simplicity**: Clean GitOps deployments without manual volume cleanup
**Note**: Even with Delete policy, data is still backed up to S3 daily for disaster recovery.
## Performance Optimizations
Configured for your 2-node, 100Mbps VLAN setup:

- **Async replication**: Minimizes network impact
- **Local reads**: Applications can read from the local Redis replica
- **Memory limits**: 2GB per instance (appropriate for 16GB nodes)
- **Persistence tuning**: Optimized for SSD storage
- **TCP keepalive**: Extended for slower network connections
## Scaling

To add more read replicas:

```yaml
# Edit redis.yaml
replica:
  replicaCount: 2  # Increase from 1 to 2 for an additional read replica
```
## Troubleshooting
### **Cluster Status**
```bash
# Check Redis pods
kubectl get pods -n redis-system
kubectl logs redis-master-0 -n redis-system
kubectl logs redis-replica-0 -n redis-system

# Connect to Redis
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d)
```
### **Monitoring & Metrics**
```bash
# Check ServiceMonitor
kubectl get servicemonitor -n redis-system
kubectl describe servicemonitor redis-metrics -n redis-system

# Check metrics endpoint directly
kubectl port-forward -n redis-system svc/redis-metrics 9121:9121
curl http://localhost:9121/metrics
```
### **Replication Status**
```bash
# Check replication from the master
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication

# Check replica status
kubectl exec -it redis-replica-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication
```
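
The `INFO replication` output is line-oriented (`key:value`, with `slave0:` packing comma-separated fields), so replica byte lag can be computed by comparing the primary's `master_repl_offset` with the replica offset it reports. A sketch under that assumption (the sample output is abbreviated and illustrative):

```python
# Sketch: compute replica byte lag from `redis-cli INFO replication` output.
# Field names follow Redis's INFO format; the sample text is abbreviated.

def parse_info(text: str) -> dict[str, str]:
    """Parse `key:value` lines from INFO output into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip section headers like "# Replication"
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields

def replica_byte_lag(info: dict[str, str]) -> int:
    """Bytes the first replica is behind the primary's replication offset."""
    master_offset = int(info["master_repl_offset"])
    # slave0 packs fields like: ip=10.0.0.2,port=6379,state=online,offset=1000,lag=0
    slave_fields = dict(kv.split("=") for kv in info["slave0"].split(","))
    return master_offset - int(slave_fields["offset"])

sample = """\
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.2,port=6379,state=online,offset=1000,lag=0
master_repl_offset:1042
"""
print(replica_byte_lag(parse_info(sample)))  # 42
```

A small, stable lag is expected with asynchronous replication; a steadily growing value suggests the 100Mbps link cannot keep up with the write rate.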
### **Performance Testing**
```bash
# Benchmark Redis performance
kubectl exec -it redis-master-0 -n redis-system -- redis-benchmark -h redis-master.redis-system.svc.cluster.local -p 6379 -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) -c 50 -n 10000
```
## Next Steps

1. **Encrypt secrets**: Use SOPS to encrypt the credentials
2. **Deploy via GitOps**: Commit and push to trigger the Flux deployment
3. **Verify deployment**: Monitor pods and services
4. **Update applications**: Configure Harbor and OpenObserve to use Redis
5. **Set up monitoring**: Verify metrics in OpenObserve dashboards