redaction (#1)

Add the redacted source file for demo purposes

Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1
Co-authored-by: Michael DiLeo <michael_dileo@proton.me>
Co-committed-by: Michael DiLeo <michael_dileo@proton.me>
This commit was merged in pull request #1.
Committed by michael_dileo on 2025-12-24 13:40:47 +00:00
parent 612235d52b
commit 7327d77dcd
333 changed files with 39286 additions and 1 deletions


@@ -0,0 +1,215 @@
# Redis Infrastructure
This directory contains the Redis Primary-Replica setup for high-availability caching on the Kubernetes cluster.
## Architecture
- **2 Redis instances**: 1 primary + 1 replica for high availability
- **Asynchronous replication**: Optimized for 100Mbps VLAN performance
- **Node distribution**: Instances are distributed across n1 and n2 nodes
- **Longhorn storage**: Single replica (Redis handles replication), Delete reclaim policy (cache data)
- **Bitnami Redis**: Industry-standard Helm chart with comprehensive features
## Components
### **Core Components**
- `namespace.yaml`: Redis system namespace
- `repository.yaml`: Bitnami Helm repository
- `redis.yaml`: Redis primary-replica deployment
- `redis-storageclass.yaml`: Optimized storage class for Redis
- `secret.yaml`: SOPS-encrypted Redis credentials
### **Monitoring Components**
- `monitoring.yaml`: ServiceMonitor for OpenObserve integration
- `redis-exporter.yaml`: Dedicated Redis exporter for comprehensive metrics
- Built-in metrics: Redis exporter with Celery queue monitoring
### **Backup Components**
- **Integrated with existing Longhorn backup**: Uses existing S3 backup infrastructure
- S3 integration: Automated backup to Backblaze B2 via existing `longhorn-s3-backup` group
## Services Created
Redis automatically creates these services (see the quick check after this list):
- `redis-master`: Write operations (connects to primary) - Port 6379
- `redis-replica`: Read-only operations (connects to replicas) - Port 6379
- `redis-headless`: Service discovery for both instances
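Once the chart has reconciled, the services can be confirmed directly:
```bash
# List the services created by the Redis chart
kubectl get svc -n redis-system
# Confirm the write service has a ready endpoint behind it
kubectl get endpoints redis-master -n redis-system
```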
## Connection Information
### For Applications
Applications should connect using these connection parameters:
**Write Operations:**
```yaml
host: redis-ha-haproxy.redis-system.svc.cluster.local
port: 6379
auth: <password from redis-credentials secret>
```
**Read Operations:**
```yaml
host: redis-replica.redis-system.svc.cluster.local
port: 6379
auth: <password from redis-credentials secret>
```
### Getting Credentials
The Redis password is stored in a SOPS-encrypted secret:
```bash
# Get the Redis password
kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d
```
## Application Integration Example
Here's how an application deployment would connect:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  template:
    spec:
      containers:
        - name: app
          image: example-app:latest
          env:
            - name: REDIS_HOST_WRITE
              value: "redis-ha-haproxy.redis-system.svc.cluster.local"
            - name: REDIS_HOST_READ
              value: "redis-replica.redis-system.svc.cluster.local"
            - name: REDIS_PORT
              value: "6379"
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-credentials
                  key: redis-password
```
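To sanity-check these connection parameters before wiring up a real application, a throwaway client pod can run `redis-cli` against the write endpoint. This is only a sketch: the `redis:7.4` image and `redis-test` pod name are illustrative, and because the namespace enforces the `restricted` Pod Security level, the test pod may need an explicit securityContext.
```bash
# Capture the password locally, then PING the write endpoint from a temporary pod
REDIS_PASSWORD=$(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d)
kubectl run redis-test -n redis-system --rm -it --restart=Never --image=redis:7.4 --command -- \
  redis-cli -h redis-ha-haproxy.redis-system.svc.cluster.local -p 6379 -a "$REDIS_PASSWORD" PING
```
A `PONG` reply confirms DNS resolution, authentication, and the write path in one step.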
## Monitoring
The Redis cluster includes comprehensive monitoring:
### **Metrics & Monitoring** ✅ **READY**
- **Metrics Port**: 9121 - Redis exporter metrics endpoint
- **ServiceMonitor**: Configured for OpenObserve integration
- **Key Metrics Available**:
- **Performance**: `redis_commands_processed_total`, `redis_connected_clients`, `redis_keyspace_hits_total`
- **Memory**: `redis_memory_used_bytes`, `redis_memory_max_bytes`
- **Replication**: `redis_master_repl_offset`, `redis_replica_lag_seconds`
- **Persistence**: `redis_rdb_last_save_timestamp_seconds`
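Any of these series can be spot-checked by hand against the exporter endpoint, reusing the port-forward shown under Troubleshooting:
```bash
# Forward the exporter port, then filter for a few of the key series listed above
kubectl port-forward -n redis-system svc/redis-metrics 9121:9121 &
curl -s http://localhost:9121/metrics | grep -E 'redis_memory_used_bytes|redis_connected_clients|redis_keyspace_hits_total'
kill %1  # stop the background port-forward
```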
### **High Availability Monitoring**
- **Failover**: Manual failover is required; automatic failover is not configured (unlike PostgreSQL). See the sketch after this list.
- **Health Checks**: Continuous health monitoring with restart policies
- **Async Replication**: Real-time replication lag monitoring
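Because failover is manual, promoting the replica is an operator action. A rough sketch using the pod names from the Troubleshooting section (the exact procedure depends on the chart, the write endpoint must be repointed afterwards, and `REDIS_PASSWORD` is the value from *Getting Credentials*):
```bash
# Promote the replica so it stops replicating and accepts writes
kubectl exec -it redis-replica-0 -n redis-system -- \
  redis-cli -a "$REDIS_PASSWORD" REPLICAOF NO ONE
# Once the old primary is healthy again, attach it to the promoted node
kubectl exec -it redis-master-0 -n redis-system -- \
  redis-cli -a "$REDIS_PASSWORD" REPLICAOF redis-replica-0.redis-headless.redis-system.svc.cluster.local 6379
```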
## Backup Strategy
### **Integrated with Existing Longhorn Backup Infrastructure**
Redis volumes automatically use your existing backup system:
- **Daily backups**: 2 AM UTC via `longhorn-s3-backup` group, retain 7 days
- **Weekly backups**: 1 AM Sunday via `longhorn-s3-backup-weekly` group, retain 4 weeks
- **Target**: Backblaze B2 S3 storage via existing setup
- **Type**: Incremental (efficient for Redis datasets)
- **Automatic assignment**: Redis storage class automatically applies backup jobs
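The backup jobs themselves live in Longhorn and can be verified independently of Redis (a sketch assuming Longhorn runs in the `longhorn-system` namespace):
```bash
# List Longhorn recurring jobs and confirm the longhorn-s3-backup groups exist
kubectl get recurringjobs.longhorn.io -n longhorn-system
# Show the most recent backups (Redis volumes appear once their first backup runs)
kubectl get backups.longhorn.io -n longhorn-system --sort-by=.metadata.creationTimestamp | tail
```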
### **Redis Persistence**
- **RDB snapshots**: Enabled with periodic saves
- **AOF**: Can be enabled for additional durability if needed
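If AOF turns out to be needed, it can be flipped on at runtime for a quick trial; a permanent change belongs in the `configuration` block of `redis.yaml` (`appendonly yes`). A sketch, again using `REDIS_PASSWORD` from *Getting Credentials*:
```bash
# Enable AOF on the primary at runtime (lost on pod restart unless set in the config)
kubectl exec -it redis-master-0 -n redis-system -- \
  redis-cli -a "$REDIS_PASSWORD" CONFIG SET appendonly yes
# Confirm the change took effect
kubectl exec -it redis-master-0 -n redis-system -- \
  redis-cli -a "$REDIS_PASSWORD" CONFIG GET appendonly
```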
### **Backup Integration**
Redis volumes are automatically backed up because the Redis storage class includes:
```yaml
recurringJobSelector: |
  [
    {
      "name":"longhorn-s3-backup",
      "isGroup":true
    }
  ]
```
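The same selector can be read back from the live StorageClass (assuming it is named `longhorn-redis`, as referenced by the HelmRelease persistence settings):
```bash
# Show the recurringJobSelector parameter carried by the Redis storage class
kubectl get storageclass longhorn-redis -o jsonpath='{.parameters.recurringJobSelector}'
```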
## Storage Design Decisions
### **Reclaim Policy: Delete**
The Redis storage class uses `reclaimPolicy: Delete` because:
- **Cache Data**: Redis primarily stores ephemeral cache data that can be rebuilt
- **Resource Efficiency**: Automatic cleanup prevents storage waste on your 2-node cluster
- **Cost Optimization**: No orphaned volumes consuming storage space
- **Operational Simplicity**: Clean GitOps deployments without manual volume cleanup
**Note**: Even with Delete policy, data is still backed up to S3 daily for disaster recovery.
## Performance Optimizations
Configured for your 2-node, 100Mbps VLAN setup:
- **Async replication**: Minimizes network impact
- **Local reads**: Applications can read from local Redis replica
- **Memory limits**: 2GB per instance (appropriate for 16GB nodes)
- **Persistence tuning**: Optimized for SSD storage
- **TCP keepalive**: Extended for slower network connections
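A simple way to observe the effect of these settings over the 100Mbps link is the latency probe built into `redis-cli` (a sketch: run from the primary pod toward the replica service, with `REDIS_PASSWORD` from *Getting Credentials*):
```bash
# Continuously sample round-trip latency from the primary to the replica service
kubectl exec -it redis-master-0 -n redis-system -- \
  redis-cli -a "$REDIS_PASSWORD" -h redis-replica.redis-system.svc.cluster.local --latency
```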
## Scaling
To add more read replicas:
```yaml
# Edit redis.yaml
replica:
  replicaCount: 2  # Increase from 1 to 2 for an additional read replica
```
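After committing that change, Flux applies it on its next reconciliation; the rollout can be forced and watched like this (a sketch assuming the Flux CLI is installed):
```bash
# Trigger an immediate reconcile instead of waiting for the interval
flux reconcile helmrelease redis -n redis-system
# Watch the additional replica pod come up
kubectl get pods -n redis-system -l app.kubernetes.io/component=replica -w
```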
## Troubleshooting
### **Cluster Status**
```bash
# Check Redis pods
kubectl get pods -n redis-system
kubectl logs redis-master-0 -n redis-system
kubectl logs redis-replica-0 -n redis-system
# Connect to Redis
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d)
```
### **Monitoring & Metrics**
```bash
# Check ServiceMonitor
kubectl get servicemonitor -n redis-system
kubectl describe servicemonitor redis-metrics -n redis-system
# Check metrics endpoint directly
kubectl port-forward -n redis-system svc/redis-metrics 9121:9121
curl http://localhost:9121/metrics
```
### **Replication Status**
```bash
# Check replication from master
kubectl exec -it redis-master-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication
# Check replica status
kubectl exec -it redis-replica-0 -n redis-system -- redis-cli -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) INFO replication
```
### **Performance Testing**
```bash
# Benchmark Redis performance
kubectl exec -it redis-master-0 -n redis-system -- redis-benchmark -h redis-ha-haproxy.redis-system.svc.cluster.local -p 6379 -a $(kubectl get secret redis-credentials -n redis-system -o jsonpath="{.data.redis-password}" | base64 -d) -c 50 -n 10000
```
## Next Steps
1. **Encrypt secrets**: Use SOPS to encrypt the credentials (sketched after this list)
2. **Deploy via GitOps**: Commit and push to trigger Flux deployment
3. **Verify deployment**: Monitor pods and services
4. **Update applications**: Configure Harbor and OpenObserve to use Redis
5. **Setup monitoring**: Verify metrics in OpenObserve dashboards
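Steps 1-3 could look roughly like this (a sketch: the secret path and encryption rules depend on the repository's `.sops.yaml`):
```bash
# 1. Encrypt the Redis credentials in place with SOPS
sops --encrypt --in-place redis/secret.yaml
# 2. Commit and push so Flux picks up the change
git add redis/ && git commit -m "Add Redis infrastructure" && git push
# 3. Verify the deployment
flux get helmreleases -n redis-system
kubectl get pods,svc -n redis-system
```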


@@ -0,0 +1,11 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- repository.yaml
# redis-storageclass.yaml moved to longhorn/kustomization.yaml
# StorageClass is managed by Longhorn infrastructure since it's a Longhorn StorageClass
- secret.yaml
- redis.yaml
- monitoring.yaml


@@ -0,0 +1,24 @@
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-metrics
  namespace: redis-system
  labels:
    app: redis
    app.kubernetes.io/name: redis
    app.kubernetes.io/component: metrics
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: redis
      app.kubernetes.io/component: metrics
  endpoints:
    - port: http-metrics
      interval: 30s
      scrapeTimeout: 10s
      path: /metrics
      scheme: http
  namespaceSelector:
    matchNames:
      - redis-system


@@ -0,0 +1,9 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: redis-system
  labels:
    name: redis-system
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest


@@ -0,0 +1,203 @@
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: redis
  namespace: redis-system
spec:
  interval: 5m
  chart:
    spec:
      chart: redis
      version: "20.13.4"
      sourceRef:
        kind: HelmRepository
        name: bitnami
        namespace: redis-system
  values:
    redis:
      envFrom:
        - secretRef:
            name: redis-credentials
    # Use cluster domain for DNS resolution
    clusterDomain: cluster.local
    # Global Redis configuration
    global:
      # Allow non-Bitnami images for redis/redis-exporter
      security:
        allowInsecureImages: true
      redis:
        # Use secret for password
        existingSecret: redis-credentials
        existingSecretPasswordKey: redis-password
    # Redis architecture: replication (primary-replica)
    architecture: replication
    # Authentication configuration
    auth:
      enabled: true
      # Password will be loaded from secret
      existingSecret: redis-credentials
      existingSecretPasswordKey: redis-password
    # Primary Redis configuration
    master:
      count: 1
      podLabels:
        app.kubernetes.io/name: redis
        app.kubernetes.io/instance: redis
        app.kubernetes.io/component: master
      # Use bitnamilegacy Redis image (includes Bash/Bitnami entrypoint scripts)
      image:
        registry: docker.io
        repository: bitnamilegacy/redis
      disableCommands: []
      # Node affinity to ensure primary runs on specific node
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: redis
                    app.kubernetes.io/component: replica
                topologyKey: kubernetes.io/hostname
      # Resource limits appropriate for your 16GB nodes
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: 2000m
          memory: 4Gi
      # Storage configuration
      persistence:
        enabled: true
        storageClass: longhorn-redis
        size: 20Gi
        accessModes:
          - ReadWriteOnce
      # Redis configuration optimized for your setup
      configuration: |-
        # Network and timeout settings optimized for 100Mbps VLAN
        tcp-keepalive 60
        timeout 300
        # Memory and persistence settings
        maxmemory-policy allkeys-lru
        save 900 1
        save 300 10
        save 60 10000
        # Replication settings optimized for async over slower network
        repl-diskless-sync no
        repl-diskless-sync-delay 5
        repl-ping-replica-period 10
        repl-timeout 60
        # Performance optimizations
        tcp-backlog 511
        databases 16
      # Allow scheduling on control plane nodes
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists
    # Replica Redis configuration
    replica:
      replicaCount: 0
      # Use bitnamilegacy Redis image (includes Bash/Bitnami entrypoint scripts)
      image:
        registry: docker.io
        repository: bitnamilegacy/redis
        tag: 8.2.1-debian-12-r0
      # Ensure replica runs on different node than primary
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/name: redis
                  app.kubernetes.io/component: master
              topologyKey: kubernetes.io/hostname
      # Resource limits for replica
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: 2000m
          memory: 4Gi
      # Storage configuration for replica
      persistence:
        enabled: true
        storageClass: longhorn-redis
        size: 20Gi
        accessModes:
          - ReadWriteOnce
      # Allow scheduling on control plane nodes
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists
    # Metrics configuration for OpenObserve integration
    metrics:
      enabled: false
      # Redis exporter configuration - using bitnamilegacy image (compatible with chart scripts)
      image:
        registry: docker.io
        repository: bitnamilegacy/redis-exporter
        tag: 1.76.0-debian-12-r0
      # Resources for metrics exporter
      resources:
        requests:
          cpu: 50m
          memory: 64Mi
        limits:
          cpu: 200m
          memory: 128Mi
      # ServiceMonitor for Prometheus/OpenObserve
      serviceMonitor:
        enabled: true
        namespace: redis-system
        interval: 30s
        scrapeTimeout: 10s
        labels:
          app: redis
        selector:
          matchLabels:
            app.kubernetes.io/name: redis
            app.kubernetes.io/component: metrics
    # Network Policy (optional, can be enabled later)
    networkPolicy:
      enabled: false
    # Pod Disruption Budget for high availability
    pdb:
      create: true
      minAvailable: 1


@@ -0,0 +1,10 @@
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: bitnami
  namespace: redis-system
spec:
  interval: 5m0s
  type: oci
  url: oci://registry-1.docker.io/bitnamicharts


@@ -0,0 +1,10 @@
apiVersion: v1
kind: Secret
metadata:
  name: redis-credentials
  namespace: redis-system
type: Opaque
stringData:
  REDIS_PASSWORD: <REDACTED>
  redis-password: <REDACTED>
  redis-replica-password: <REDACTED>


@@ -0,0 +1,9 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- repository.yaml
- secret.yaml
- redis.yaml


@@ -0,0 +1,10 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: redis-system
  labels:
    name: redis-system
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest


@@ -0,0 +1,103 @@
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: redis-ha
  namespace: redis-system
spec:
  interval: 10m
  timeout: 5m
  chart:
    spec:
      chart: redis-ha
      version: "4.35.3"
      sourceRef:
        kind: HelmRepository
        name: redis-ha
        namespace: redis-system
      interval: 1h
  install:
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
  values:
    replicas: 3
    # Force Redis pods onto distinct nodes
    hardAntiAffinity: true
    auth: true
    existingSecret: redis-credentials
    authKey: redis-password
    persistentVolume:
      enabled: true
      storageClass: longhorn-redis
      size: 20Gi
    podDisruptionBudget:
      minAvailable: 1
    tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
    redis:
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: 2000m
          memory: 4Gi
      config:
        tcp-keepalive: "60"
        timeout: "300"
        save:
          - "900 1"
          - "300 10"
          - "60 10000"
        repl-diskless-sync: "no"
        repl-diskless-sync-delay: "5"
        repl-ping-replica-period: "10"
        repl-timeout: "60"
        tcp-backlog: "511"
        databases: "16"
        maxmemory-policy: "allkeys-lru"
    sentinel:
      auth: true
      existingSecret: redis-credentials
      authKey: redis-password
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 512Mi
    haproxy:
      enabled: true
      hardAntiAffinity: true
      metrics:
        enabled: true
        serviceMonitor:
          enabled: true
          namespace: redis-system
          interval: 30s
          telemetryPath: /metrics
          timeout: 10s
    exporter:
      enabled: true
      serviceMonitor:
        enabled: true
        namespace: redis-system
        interval: 30s
        telemetryPath: /metrics
        timeout: 10s


@@ -0,0 +1,10 @@
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: redis-ha
  namespace: redis-system
spec:
  interval: 1h
  url: https://dandydeveloper.github.io/charts


@@ -0,0 +1,9 @@
apiVersion: v1
kind: Secret
metadata:
  name: redis-credentials
  namespace: redis-system
type: Opaque
stringData:
  redis-password: <REDACTED>
  redis-replica-password: <REDACTED>