add source code and readme
This commit is contained in:
215
manifests/infrastructure/redis/bitnami/README.md
Normal file
@@ -0,0 +1,215 @@
# Redis Infrastructure

This directory contains the Redis Primary-Replica setup for high-availability caching on the Kubernetes cluster.

## Architecture

- **2 Redis instances**: 1 primary + 1 replica for high availability
- **Asynchronous replication**: Optimized for 100Mbps VLAN performance
- **Node distribution**: Instances are distributed across n1 and n2 nodes
- **Longhorn storage**: Single replica (Redis handles replication), Delete reclaim policy (cache data)
- **Bitnami Redis**: Industry-standard Helm chart with comprehensive features

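
As a rough illustration of how this architecture could be expressed, the sketch below shows a Flux `HelmRelease` for the Bitnami chart. It is not the contents of `redis.yaml`: the release name, the `HelmRepository` name, and the reconcile interval are assumptions, and older Flux versions use `helm.toolkit.fluxcd.io/v2beta2` instead of `v2`.

```yaml
# Sketch only: values marked "assumed" are not taken from this repository.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: redis                   # assumed release name
  namespace: redis-system
spec:
  interval: 30m                 # assumed reconcile interval
  chart:
    spec:
      chart: redis
      sourceRef:
        kind: HelmRepository
        name: bitnami           # assumed name of the source defined in repository.yaml
  values:
    architecture: replication   # one primary plus read replicas
    auth:
      existingSecret: redis-credentials
      existingSecretPasswordKey: redis-password
    replica:
      replicaCount: 1           # 1 primary + 1 replica = 2 instances
    sentinel:
      enabled: false            # no Sentinel, so failover stays manual
```
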
## Components

### **Core Components**
- `namespace.yaml`: Redis system namespace
- `repository.yaml`: Bitnami Helm repository
- `redis.yaml`: Redis primary-replica deployment
- `redis-storageclass.yaml`: Optimized storage class for Redis
- `secret.yaml`: SOPS-encrypted Redis credentials

### **Monitoring Components**
- `monitoring.yaml`: ServiceMonitor for OpenObserve integration
- `redis-exporter.yaml`: Dedicated Redis exporter for comprehensive metrics
- Built-in metrics: Redis exporter with Celery queue monitoring

### **Backup Components**
- **Integrated with existing Longhorn backup**: Uses existing S3 backup infrastructure
- S3 integration: Automated backup to Backblaze B2 via existing `longhorn-s3-backup` group

## Services Created

The Bitnami chart automatically creates these services:

- `redis-master`: Write operations (connects to primary) - Port 6379
- `redis-replica`: Read-only operations (connects to replicas) - Port 6379
- `redis-headless`: Service discovery for both instances

## Connection Information

### For Applications

Applications should connect using these connection parameters:

**Write Operations:**
```yaml
host: redis-master.redis-system.svc.cluster.local
port: 6379
auth: <password from redis-credentials secret>
```

**Read Operations:**
```yaml
host: redis-replica.redis-system.svc.cluster.local
port: 6379
auth: <password from redis-credentials secret>
```

### Getting Credentials

The Redis password is stored in a SOPS-encrypted secret:

```bash
# Get the Redis password
kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d
```

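
For orientation, the secret that the command above reads would look roughly like this before it is encrypted with `sops --encrypt`. This is a sketch of the expected shape, not the contents of `secret.yaml`; the password value is a placeholder.

```yaml
# Plaintext shape of the secret prior to SOPS encryption; never commit it unencrypted.
apiVersion: v1
kind: Secret
metadata:
  name: redis-credentials
  namespace: redis-system
type: Opaque
stringData:
  # Placeholder only; Kubernetes stores this base64-encoded under .data.redis-password
  redis-password: "<choose-a-strong-password>"
```
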
## Application Integration Example

Here's how an application deployment would connect (excerpt; selector and pod labels omitted):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  template:
    spec:
      containers:
        - name: app
          image: example-app:latest
          env:
            - name: REDIS_HOST_WRITE
              value: "redis-master.redis-system.svc.cluster.local"
            - name: REDIS_HOST_READ
              value: "redis-replica.redis-system.svc.cluster.local"
            - name: REDIS_PORT
              value: "6379"
            - name: REDIS_PASSWORD
              valueFrom:
                # secretKeyRef only resolves secrets in the pod's own namespace,
                # so redis-credentials must also exist in the application's namespace
                secretKeyRef:
                  name: redis-credentials
                  key: redis-password
```

## Monitoring

The Redis deployment includes the following monitoring:

### **Metrics & Monitoring** ✅ **READY**
- **Metrics Port**: 9121 - Redis exporter metrics endpoint
- **ServiceMonitor**: Configured for OpenObserve integration
- **Key Metrics Available**:
  - **Performance**: `redis_commands_processed_total`, `redis_connected_clients`, `redis_keyspace_hits_total`
  - **Memory**: `redis_memory_used_bytes`, `redis_memory_max_bytes`
  - **Replication**: `redis_master_repl_offset`, `redis_replica_lag_seconds`
  - **Persistence**: `redis_rdb_last_save_timestamp_seconds`

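
A ServiceMonitor for the exporter endpoint might look like the sketch below. It is not the contents of `monitoring.yaml`: the selector labels, the port name, and the scrape interval are assumptions and would need to match the actual metrics Service.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-metrics
  namespace: redis-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: redis          # assumed label on the metrics Service
      app.kubernetes.io/component: metrics   # assumed label
  endpoints:
    - port: metrics      # assumed name of the Service port that maps to 9121
      path: /metrics
      interval: 30s      # assumed scrape interval
```
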
### **High Availability Monitoring**
- **Failover**: No automatic failover; promoting the replica is a manual step (unlike PostgreSQL)
- **Health Checks**: Continuous health monitoring with restart policies
- **Async Replication**: Real-time replication lag monitoring

## Backup Strategy

### **Integrated with Existing Longhorn Backup Infrastructure**
Redis volumes automatically use your existing backup system:
- **Daily backups**: 2 AM UTC via `longhorn-s3-backup` group, retain 7 days
- **Weekly backups**: 1 AM Sunday via `longhorn-s3-backup-weekly` group, retain 4 weeks
- **Target**: Backblaze B2 S3 storage via existing setup
- **Type**: Incremental (efficient for Redis datasets)
- **Automatic assignment**: Redis storage class automatically applies backup jobs

### **Redis Persistence**
- **RDB snapshots**: Enabled with periodic saves
- **AOF**: Can be enabled for additional durability if needed

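
The persistence settings above could be expressed through the chart's `commonConfiguration` value, as in this sketch. The save thresholds are illustrative assumptions, not the values used in `redis.yaml`.

```yaml
# Fragment of the chart values (under spec.values in a Flux HelmRelease).
commonConfiguration: |-
  # RDB: snapshot after 900 s if at least 1 key changed,
  # or after 300 s if at least 10 keys changed (assumed thresholds)
  save 900 1
  save 300 10
  # AOF stays off; set to "yes" to log every write
  # (fsynced once per second by default) at the cost of extra disk I/O
  appendonly no
```

Comments sit on their own lines because Redis configuration files do not accept trailing comments after a directive.
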
### **Backup Integration**
Redis volumes are automatically backed up because the Redis storage class includes:
```yaml
recurringJobSelector: |
  [
    {
      "name":"longhorn-s3-backup",
      "isGroup":true
    }
  ]
```

## Storage Design Decisions

### **Reclaim Policy: Delete**
The Redis storage class uses `reclaimPolicy: Delete` because:
- **Cache Data**: Redis primarily stores ephemeral cache data that can be rebuilt
- **Resource Efficiency**: Automatic cleanup prevents storage waste on your 2-node cluster
- **Cost Optimization**: No orphaned volumes consuming storage space
- **Operational Simplicity**: Clean GitOps deployments without manual volume cleanup

**Note**: Even with Delete policy, data is still backed up to S3 daily for disaster recovery.

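
Putting these decisions together, a Longhorn storage class along the following lines would give a single-replica volume that is deleted with its claim and picked up by the backup group. This is a sketch, not the contents of `redis-storageclass.yaml`; the class name and the parameters marked "assumed" are illustrative.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-redis            # hypothetical name
provisioner: driver.longhorn.io
reclaimPolicy: Delete             # the volume is removed together with its PVC
allowVolumeExpansion: true        # assumed
volumeBindingMode: Immediate      # assumed
parameters:
  numberOfReplicas: "1"           # single Longhorn replica; Redis replicates the data itself
  staleReplicaTimeout: "30"       # assumed, in minutes
  recurringJobSelector: '[{"name":"longhorn-s3-backup","isGroup":true}]'
```
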
## Performance Optimizations

Configured for your 2-node, 100Mbps VLAN setup:
- **Async replication**: Minimizes network impact
- **Local reads**: Applications can read from local Redis replica
- **Memory limits**: 2GB per instance (appropriate for 16GB nodes)
- **Persistence tuning**: Optimized for SSD storage
- **TCP keepalive**: Extended for slower network connections

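
As a sketch of how the memory limit and keepalive tuning might appear in the chart values: only the 2GB limit comes from the list above, while the requests, the keepalive interval, and `maxmemory` are assumptions. If the release already sets `commonConfiguration`, these directives would be merged into that single block.

```yaml
# Fragment of the chart values (under spec.values in a Flux HelmRelease).
master:
  resources:
    requests:
      memory: 1Gi     # assumed request
    limits:
      memory: 2Gi     # 2GB per instance
replica:
  resources:
    requests:
      memory: 1Gi     # assumed request
    limits:
      memory: 2Gi
commonConfiguration: |-
  # Seconds between TCP keepalive probes; 600 is an assumed value (Redis default is 300)
  tcp-keepalive 600
  # Keep the dataset below the container limit to avoid OOM kills (assumed value)
  maxmemory 1536mb
```
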
## Scaling

To add more read replicas:
```yaml
# Edit redis.yaml
replica:
  replicaCount: 2  # Increase from 1 to 2 for additional read replica
```

## Troubleshooting

### **Cluster Status**
```bash
# Check Redis pods
kubectl get pods -n redis-system
kubectl logs redis-master-0 -n redis-system
kubectl logs redis-replica-0 -n redis-system

# Connect to Redis (the command substitution is quoted so special characters in the password survive)
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a "$(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d)"
```

### **Monitoring & Metrics**
```bash
# Check ServiceMonitor
kubectl get servicemonitor -n redis-system
kubectl describe servicemonitor redis-metrics -n redis-system

# Check metrics endpoint directly
# (port-forward blocks, so run curl from a second terminal)
kubectl port-forward -n redis-system svc/redis-metrics 9121:9121
curl http://localhost:9121/metrics
```

### **Replication Status**
```bash
# Check replication from master
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a "$(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d)" INFO replication

# Check replica status
kubectl exec -it redis-replica-0 -n redis-system -- redis-cli -a "$(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d)" INFO replication
```

### **Performance Testing**
```bash
# Benchmark Redis performance against the write service (50 clients, 10000 requests)
kubectl exec -it redis-master-0 -n redis-system -- redis-benchmark -h redis-master.redis-system.svc.cluster.local -p 6379 -a "$(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d)" -c 50 -n 10000
```

## Next Steps

1. **Encrypt secrets**: Use SOPS to encrypt the credentials
2. **Deploy via GitOps**: Commit and push to trigger Flux deployment
3. **Verify deployment**: Monitor pods and services
4. **Update applications**: Configure Harbor and OpenObserve to use Redis
5. **Setup monitoring**: Verify metrics in OpenObserve dashboards