redaction (#1)
Add the redacted source file for demo purposes

Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1
Co-authored-by: Michael DiLeo <michael_dileo@proton.me>
Co-committed-by: Michael DiLeo <michael_dileo@proton.me>
This commit was merged in pull request #1.
203
manifests/infrastructure/celery-monitoring/README.md
Normal file
# Celery Monitoring (Flower)

This directory contains the infrastructure for monitoring Celery tasks across all applications in the cluster using Flower.

## Overview

- **Flower**: Web-based tool for monitoring and administrating Celery clusters
- **Multi-Application**: Monitors both PieFed and BookWyrm Celery tasks
- **Namespace**: `celery-monitoring`
- **URL**: `https://flower.keyboardvagabond.com`

## Components

- `namespace.yaml` - Dedicated namespace for monitoring
- `flower-deployment.yaml` - Flower application deployment
- `service.yaml` - Internal service for Flower
- `ingress.yaml` - External access with TLS and basic auth
- `kustomization.yaml` - Kustomize configuration

## Redis Database Monitoring

Flower monitors multiple Redis databases:

- **Database 0**: PieFed Celery broker
- **Database 3**: BookWyrm Celery broker
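
The database mapping above can be expressed as Celery-style Redis broker URLs. This is a minimal sketch: the host name `redis.example.svc.cluster.local` and the helper `broker_url` are hypothetical and not taken from the manifests; only the database numbers come from this README.

```python
# Assumed values: the Redis host is a placeholder, not read from the manifests.
REDIS_HOST = "redis.example.svc.cluster.local"  # hypothetical
REDIS_PORT = 6379  # Redis default port

# Database numbers as documented in this README.
BROKER_DATABASES = {"piefed": 0, "bookwyrm": 3}


def broker_url(app: str, host: str = REDIS_HOST, port: int = REDIS_PORT) -> str:
    """Build a redis:// broker URL for the given application."""
    db = BROKER_DATABASES[app]
    return f"redis://{host}:{port}/{db}"


if __name__ == "__main__":
    for app in BROKER_DATABASES:
        print(app, broker_url(app))
```

Keeping the mapping in one dictionary makes it harder for the two applications to drift onto the same database number.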

## Access & Security

- **Access Method**: kubectl port-forward (local access only)
- **Command**: `kubectl port-forward -n celery-monitoring svc/celery-flower 8080:5555`
- **URL**: http://localhost:8080
- **Security**: Flower itself requires no login; access is gated by kubeconfig credentials and RBAC
- **Network Policies**: Cilium policies allow cluster and health check access only

### Port-Forward Setup

1. **Prerequisites**:
   - Valid kubeconfig with access to the cluster
   - kubectl installed and configured
   - RBAC permissions to create port-forwards in the `celery-monitoring` namespace

2. **Network Policies**: Cilium policies ensure:
   - Port 5555 access from cluster and host (for port-forward)
   - Redis access for monitoring (DB 0 & 3)
   - Cluster-internal health checks

3. **No Authentication Required**:
   - Port-forward provides secure local access
   - No additional credentials needed beyond the kubeconfig
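
Once the port-forward is running, a quick local check can confirm that something is listening on the forwarded port. The sketch below is only an illustration: the helper name `port_is_open` is hypothetical, and it assumes the forward uses local port 8080 as in the command above. It tests TCP reachability only and does not verify that the listener is actually Flower.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Port 8080 is open; try http://localhost:8080")
    else:
        print("Nothing is listening on port 8080; is the port-forward running?")
```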

## 🔒 Simplified Security Architecture

**Current Status**: ✅ **Local access via kubectl port-forward**

### Security Model

**1. Local Access Only**
- **Port-Forward**: `kubectl port-forward` provides a secure tunnel to the service
- **No External Exposure**: Service is not accessible from outside the cluster
- **Authentication**: Kubernetes RBAC controls who can create port-forwards
- **Encryption**: Traffic is tunneled through the Kubernetes API server over TLS

**2. Network Layer (Cilium Network Policies)**
- **`celery-flower-ingress`**: Allows cluster and host access for port-forward and health checks
- **`celery-flower-egress`**: Restricts outbound traffic to Redis and DNS only
- **DNS Resolution**: Explicit DNS access for service discovery
- **Redis Connectivity**: Targeted access to Redis master (DB 0 & 3)

**3. Pod-Level Security**
- Resource limits (CPU: 500m, Memory: 256Mi)
- Health checks (liveness/readiness probes)
- Non-root container execution
- Read-only root filesystem (where possible)

### How It Works

1. **Access Layer**: kubectl port-forward creates a secure tunnel via the Kubernetes API
2. **Network Layer**: Cilium policies ensure only cluster traffic reaches pods
3. **Application Layer**: Flower connects only to authorized Redis databases
4. **Monitoring Layer**: Health checks ensure service availability
5. **Local Security**: Access requires a valid kubeconfig and RBAC permissions

## Features

- **Flower Web UI**: Real-time task monitoring and worker status
- **Prometheus Metrics**: Custom Celery queue metrics exported to OpenObserve
- **Automated Alerts**: Queue size and connection status monitoring
- **Dashboard**: Visual monitoring of queue trends and processing rates

## Monitoring & Alerts

### Metrics Exported

**From Celery Metrics Exporter** (celery-monitoring namespace):

1. **`celery_queue_length`**: Number of pending tasks in each queue
   - Labels: `queue_name`, `database` (piefed/bookwyrm)

2. **`redis_connection_status`**: Redis connectivity status (1=connected, 0=disconnected)

3. **`celery_queue_info`**: General information about queue status

**From Redis Exporter** (redis-system namespace):

4. **`redis_list_length`**: General Redis list lengths including Celery queues
5. **`redis_memory_used_bytes`**: Redis memory usage
6. **`redis_connected_clients`**: Number of connected Redis clients
7. **`redis_commands_total`**: Total Redis commands executed
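
To illustrate what the first two metrics look like on the wire, here is a minimal sketch that renders them in the Prometheus text exposition format. It is not the exporter's actual code: `render_metrics` and the sample values are hypothetical. It assumes queue lengths have already been collected; with a Redis broker, Celery keeps each queue as a Redis list, so an exporter could read the length with `LLEN`.

```python
from typing import Dict, Tuple

# (queue_name, database) -> number of pending tasks
QueueLengths = Dict[Tuple[str, str], int]


def render_metrics(queue_lengths: QueueLengths, redis_connected: bool) -> str:
    """Render queue lengths and connection status as Prometheus text format."""
    lines = ["# TYPE celery_queue_length gauge"]
    for (queue_name, database), length in sorted(queue_lengths.items()):
        lines.append(
            f'celery_queue_length{{queue_name="{queue_name}",database="{database}"}} {length}'
        )
    lines.append("# TYPE redis_connection_status gauge")
    lines.append(f"redis_connection_status {1 if redis_connected else 0}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(render_metrics({("celery", "piefed"): 42, ("celery", "bookwyrm"): 7}, True))
```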

### Alert Thresholds

- **PieFed Warning**: > 10,000 pending tasks
- **PieFed Critical**: > 50,000 pending tasks
- **BookWyrm Warning**: > 1,000 pending tasks
- **Redis Connection**: Connection lost alert
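
The queue thresholds above can be read as a simple classification rule. The following sketch encodes them for illustration only; the real alerts live in OpenObserve, and the names `THRESHOLDS` and `alert_severity` are hypothetical.

```python
from typing import Optional

# Thresholds copied from this README: (severity, limit), most severe first.
THRESHOLDS = {
    "piefed": [("critical", 50_000), ("warning", 10_000)],
    "bookwyrm": [("warning", 1_000)],
}


def alert_severity(database: str, pending_tasks: int) -> Optional[str]:
    """Return the most severe alert level that the queue length exceeds, if any."""
    for severity, limit in THRESHOLDS.get(database, []):
        if pending_tasks > limit:
            return severity
    return None


if __name__ == "__main__":
    print(alert_severity("piefed", 12_000))
    print(alert_severity("bookwyrm", 500))
```

The comparison is strictly greater-than, matching the `>` in the list, so a queue sitting exactly at a limit does not alert.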

### OpenObserve Setup

1. **Deploy the monitoring infrastructure**:
   ```bash
   kubectl apply -k manifests/infrastructure/celery-monitoring/
   ```

2. **Import alerts and dashboard**:
   - Access OpenObserve dashboard
   - Import alert configurations from the `openobserve-alert-configs` ConfigMap
   - Import dashboard from the same ConfigMap
   - Configure webhook URLs for notifications

3. **Verify metrics collection**:
   ```sql
   SELECT * FROM metrics WHERE __name__ LIKE 'celery_%' ORDER BY _timestamp DESC LIMIT 10
   ```

### Useful Monitoring Queries

**Current queue sizes**:
```sql
-- Peak length of each queue over the last 5 minutes
SELECT queue_name, database, MAX(celery_queue_length) AS celery_queue_length
FROM metrics
WHERE _timestamp >= now() - interval '5 minutes'
GROUP BY queue_name, database
ORDER BY celery_queue_length DESC
```

**Queue processing rate**:
```sql
-- Change in queue length between consecutive samples; negative values mean the queue is draining
SELECT _timestamp,
       celery_queue_length - LAG(celery_queue_length, 1) OVER (ORDER BY _timestamp) AS processing_rate
FROM metrics
WHERE queue_name='celery' AND database='piefed'
  AND _timestamp >= now() - interval '1 hour'
```
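
The `LAG`-based calculation in the processing-rate query can also be sketched outside SQL, which may help when sanity-checking exported samples. This is an illustrative equivalent, not part of the deployment; `queue_length_deltas` and the sample data are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, int]  # (timestamp, queue_length)


def queue_length_deltas(samples: List[Sample]) -> List[Tuple[int, Optional[int]]]:
    """Return (timestamp, change since previous sample), ordered by timestamp.

    The first sample has no predecessor, so its delta is None, like LAG in SQL.
    """
    deltas: List[Tuple[int, Optional[int]]] = []
    previous: Optional[int] = None
    for timestamp, length in sorted(samples):
        delta = None if previous is None else length - previous
        deltas.append((timestamp, delta))
        previous = length
    return deltas


if __name__ == "__main__":
    # Made-up samples: the queue drains by 10, then grows by 5.
    print(queue_length_deltas([(1, 100), (2, 90), (3, 95)]))
```

A negative delta means more tasks were consumed than enqueued during that interval.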

- Queue length monitoring
- Task history and details
- Performance metrics
- Multi-broker support

## Dependencies

- Redis (for Celery brokers)
- kubectl (for port-forward access)
- Valid kubeconfig with cluster access

## Testing & Validation

### Quick Access
```bash
# Start port-forward (runs in background)
kubectl port-forward -n celery-monitoring svc/celery-flower 8080:5555 &

# Access Flower UI (macOS; on Linux use xdg-open)
open http://localhost:8080
# or visit http://localhost:8080 in your browser

# Stop port-forward when done
pkill -f "kubectl port-forward.*celery-flower"
```

### Manual Testing Checklist
1. **Port-Forward Access**: ✅ Can access http://localhost:8080 after port-forward
2. **No External Access**: ❌ Service not accessible from outside cluster
3. **Redis Connectivity**: 📊 Shows tasks from both PieFed (DB 0) and BookWyrm (DB 3)
4. **Health Checks**: ✅ Pod shows Ready status
5. **Network Policies**: 🛡️ Egress restricted to DNS and Redis only

### Troubleshooting Commands
```bash
# Check Flower pod status
kubectl get pods -n celery-monitoring -l app.kubernetes.io/name=celery-flower

# View Flower logs
kubectl logs -n celery-monitoring -l app.kubernetes.io/name=celery-flower

# Check that the Flower web UI responds from inside the pod
kubectl exec -n celery-monitoring -it deployment/celery-flower -- wget -qO- http://localhost:5555

# Check network policies
kubectl get cnp -n celery-monitoring

# Check RBAC permission to port-forward to pods in the namespace
kubectl auth can-i create pods --subresource=portforward -n celery-monitoring
```

## Deployment

Deployed automatically via Flux GitOps from `manifests/cluster/flux-system/celery-monitoring.yaml`.