Add the redacted source file for demo purposes Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1 Co-authored-by: Michael DiLeo <michael_dileo@proton.me> Co-committed-by: Michael DiLeo <michael_dileo@proton.me>
# Celery Monitoring (Flower)

This directory contains the infrastructure for monitoring Celery tasks across all applications in the cluster using Flower.

## Overview

- **Flower**: Web-based tool for monitoring and administrating Celery clusters
- **Multi-Application**: Monitors both PieFed and BookWyrm Celery tasks
- **Namespace**: `celery-monitoring`
- **URL**: `https://flower.keyboardvagabond.com`

## Components

- `namespace.yaml` - Dedicated namespace for monitoring
- `flower-deployment.yaml` - Flower application deployment
- `service.yaml` - Internal service for Flower
- `ingress.yaml` - External access with TLS and basic auth
- `kustomization.yaml` - Kustomize configuration

## Redis Database Monitoring

Flower monitors multiple Redis databases:

- **Database 0**: PieFed Celery broker
- **Database 3**: BookWyrm Celery broker

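
As an illustration of the database split above, the sketch below composes a Redis broker URL for each application. The host name and port are assumptions (this README does not state them), and `broker_url` is a hypothetical helper, not something defined in the manifests.

```bash
# Hypothetical Redis host and port; substitute the values used in your cluster.
REDIS_HOST="${REDIS_HOST:-redis-master.redis-system.svc.cluster.local}"
REDIS_PORT="${REDIS_PORT:-6379}"

# Print the Celery broker URL for a given application name.
# Database numbers follow the list above: PieFed = 0, BookWyrm = 3.
broker_url() {
  case "$1" in
    piefed)   db=0 ;;
    bookwyrm) db=3 ;;
    *) echo "unknown application: $1" >&2; return 1 ;;
  esac
  printf 'redis://%s:%s/%s\n' "$REDIS_HOST" "$REDIS_PORT" "$db"
}

broker_url piefed
broker_url bookwyrm
```

The trailing path segment of a `redis://` URL selects the database number, which is how one Redis instance can serve as a separate broker for each application.
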
## Access & Security

- **Access Method**: kubectl port-forward (local access only)
- **Command**: `kubectl port-forward -n celery-monitoring svc/celery-flower 8080:5555`
- **URL**: http://localhost:8080
- **Security**: No authentication required (local access only)
- **Network Policies**: Cilium policies allow cluster and health check access only

### Port-Forward Setup

1. **Prerequisites**:
   - Valid kubeconfig with access to the cluster
   - kubectl installed and configured
   - RBAC permissions to create port-forwards in celery-monitoring namespace

2. **Network Policies**: Cilium policies ensure:
   - Port 5555 access from cluster and host (for port-forward)
   - Redis access for monitoring (DB 0 & 3)
   - Cluster-internal health checks

3. **No Authentication Required**:
   - Port-forward provides secure local access
   - No additional credentials needed

## **🔒 Simplified Security Architecture**

**Current Status**: ✅ **Local access via kubectl port-forward**

### **Security Model**

**1. Local Access Only**
- **Port-Forward**: `kubectl port-forward` provides secure tunnel to the service
- **No External Exposure**: Service is not accessible from outside the cluster
- **Authentication**: Kubernetes RBAC controls who can create port-forwards
- **Encryption**: Traffic between kubectl and the Kubernetes API server travels over the API's TLS connection

**2. Network Layer (Cilium Network Policies)**
- **`celery-flower-ingress`**: Allows cluster and host access for port-forward and health checks
- **`celery-flower-egress`**: Restricts outbound to Redis and DNS only
- **DNS Resolution**: Explicit DNS access for service discovery
- **Redis Connectivity**: Targeted access to Redis master (DB 0 & 3)

**3. Pod-Level Security**
- Resource limits (CPU: 500m, Memory: 256Mi)
- Health checks (liveness/readiness probes)
- Non-root container execution
- Read-only root filesystem (where possible)

### **How It Works**
1. **Access Layer**: kubectl port-forward creates secure tunnel via Kubernetes API
2. **Network Layer**: Cilium policies ensure only cluster traffic reaches pods
3. **Application Layer**: Flower connects only to authorized Redis databases
4. **Monitoring Layer**: Health checks ensure service availability
5. **Local Security**: Access requires valid kubeconfig and RBAC permissions

## Features

- **Flower Web UI**: Real-time task monitoring and worker status
- **Prometheus Metrics**: Custom Celery queue metrics exported to OpenObserve
- **Automated Alerts**: Queue size and connection status monitoring
- **Dashboard**: Visual monitoring of queue trends and processing rates

## Monitoring & Alerts

### Metrics Exported

**From Celery Metrics Exporter** (celery-monitoring namespace):
1. **`celery_queue_length`**: Number of pending tasks in each queue
   - Labels: `queue_name`, `database` (piefed/bookwyrm)

2. **`redis_connection_status`**: Redis connectivity status (1=connected, 0=disconnected)

3. **`celery_queue_info`**: General information about queue status

**From Redis Exporter** (redis-system namespace):
4. **`redis_list_length`**: General Redis list lengths including Celery queues
5. **`redis_memory_used_bytes`**: Redis memory usage
6. **`redis_connected_clients`**: Number of connected Redis clients
7. **`redis_commands_total`**: Total Redis commands executed

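
For orientation, a single `celery_queue_length` sample in the Prometheus text exposition format would carry the two labels listed above. The sketch below prints one such line; `queue_length_sample` is a hypothetical helper for illustration only, the value is made up, and the exporter's actual implementation is not shown in this README.

```bash
# Print one celery_queue_length sample in Prometheus text exposition format.
# Arguments: queue name, database label (piefed/bookwyrm), queue length.
queue_length_sample() {
  printf 'celery_queue_length{queue_name="%s",database="%s"} %s\n' "$1" "$2" "$3"
}

queue_length_sample celery piefed 42
# prints: celery_queue_length{queue_name="celery",database="piefed"} 42
```
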
### Alert Thresholds

- **PieFed Warning**: > 10,000 pending tasks
- **PieFed Critical**: > 50,000 pending tasks
- **BookWyrm Warning**: > 1,000 pending tasks
- **Redis Connection**: Connection lost alert

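
The thresholds above can be read as a simple classification of a queue length. The sketch below is only an illustration of that logic; the real alerts are evaluated in OpenObserve, and `classify_queue` is a hypothetical helper name.

```bash
# Classify a queue length for an application using the thresholds listed above.
# Prints one of: ok, warning, critical. Comparisons are strict (">").
classify_queue() {
  app="$1"
  length="$2"
  case "$app" in
    piefed)
      if [ "$length" -gt 50000 ]; then echo critical
      elif [ "$length" -gt 10000 ]; then echo warning
      else echo ok
      fi ;;
    bookwyrm)
      # Only a warning level is defined for BookWyrm.
      if [ "$length" -gt 1000 ]; then echo warning
      else echo ok
      fi ;;
    *)
      echo "unknown application: $app" >&2
      return 1 ;;
  esac
}

classify_queue piefed 12000   # prints: warning
```
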
### OpenObserve Setup

1. **Deploy the monitoring infrastructure**:
   ```bash
   kubectl apply -k manifests/infrastructure/celery-monitoring/
   ```

2. **Import alerts and dashboard**:
   - Access OpenObserve dashboard
   - Import alert configurations from the `openobserve-alert-configs` ConfigMap
   - Import dashboard from the same ConfigMap
   - Configure webhook URLs for notifications

3. **Verify metrics collection**:
   ```sql
   SELECT * FROM metrics WHERE __name__ LIKE 'celery_%' ORDER BY _timestamp DESC LIMIT 10
   ```

### Useful Monitoring Queries

**Current queue sizes**:
```sql
-- Largest queue length seen per queue over the last 5 minutes
SELECT queue_name, database, MAX(celery_queue_length) AS celery_queue_length
FROM metrics
WHERE _timestamp >= now() - interval '5 minutes'
GROUP BY queue_name, database
ORDER BY celery_queue_length DESC
```

**Queue processing rate**:
```sql
-- Change in queue length between consecutive samples
-- (negative values mean the queue is draining)
SELECT _timestamp,
       celery_queue_length - LAG(celery_queue_length, 1) OVER (ORDER BY _timestamp) AS processing_rate
FROM metrics
WHERE queue_name='celery' AND database='piefed'
  AND _timestamp >= now() - interval '1 hour'
ORDER BY _timestamp
```

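
The `LAG` expression above subtracts the previous sample from the current one. The same calculation can be sketched locally on exported samples; the input format here (one `timestamp length` pair per line) and the helper name `queue_deltas` are assumptions for illustration.

```bash
# Read "timestamp length" pairs on stdin and print "timestamp delta",
# where delta = current length - previous length (first sample has no delta).
queue_deltas() {
  awk 'NR > 1 { print $1, $2 - prev } { prev = $2 }'
}

printf '1 100\n2 90\n3 120\n' | queue_deltas
# prints:
# 2 -10
# 3 30
```

A negative delta means the queue shrank between samples, a positive one that tasks arrived faster than they were processed.
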
- Queue length monitoring
- Task history and details
- Performance metrics
- Multi-broker support

## Dependencies

- Redis (for Celery brokers)
- kubectl (for port-forward access)
- Valid kubeconfig with cluster access

## Testing & Validation

### Quick Access
```bash
# Start port-forward (runs in background)
kubectl port-forward -n celery-monitoring svc/celery-flower 8080:5555 &

# Access Flower UI (`open` is the macOS command; use `xdg-open` on Linux)
open http://localhost:8080
# or visit http://localhost:8080 in your browser

# Stop port-forward when done
pkill -f "kubectl port-forward.*celery-flower"
```

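
As an alternative to matching the process with `pkill`, the port-forward can be tied to a single command and stopped by its PID. This is a minimal sketch, not part of the repository: `port_forward_args` and `with_flower_forward` are hypothetical helpers, and the fixed two-second wait is an assumption about how long the tunnel takes to come up.

```bash
# Namespace, service, and remote port are taken from the commands above.
FLOWER_NAMESPACE="celery-monitoring"
FLOWER_SERVICE="svc/celery-flower"

# Print the kubectl arguments for forwarding a local port (default 8080) to Flower.
port_forward_args() {
  printf 'port-forward -n %s %s %s:5555\n' "$FLOWER_NAMESPACE" "$FLOWER_SERVICE" "${1:-8080}"
}

# Run a command while the port-forward is active, then stop that exact process.
with_flower_forward() {
  # Word splitting of the argument string is intentional here.
  kubectl $(port_forward_args) &
  pf_pid=$!
  sleep 2                      # give the tunnel a moment to come up
  "$@"
  rc=$?
  kill "$pf_pid" 2>/dev/null   # stop only the port-forward started above
  return "$rc"
}
```

For example, `with_flower_forward curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/` would print the HTTP status of the Flower UI and then close the tunnel.
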
### Manual Testing Checklist
1. **Port-Forward Access**: ✅ Can access http://localhost:8080 after port-forward
2. **No External Access**: ❌ Service not accessible from outside cluster
3. **Redis Connectivity**: 📊 Shows tasks from both PieFed (DB 0) and BookWyrm (DB 3)
4. **Health Checks**: ✅ Pod shows Ready status
5. **Network Policies**: 🛡️ Egress restricted to DNS and Redis only

### Troubleshooting Commands
```bash
# Check Flower pod status
kubectl get pods -n celery-monitoring -l app.kubernetes.io/name=celery-flower

# View Flower logs
kubectl logs -n celery-monitoring -l app.kubernetes.io/name=celery-flower

# Check that the Flower web endpoint responds from inside the pod
kubectl exec -n celery-monitoring -it deployment/celery-flower -- wget -qO- http://localhost:5555

# Check network policies
kubectl get cnp -n celery-monitoring

# Check that your account is allowed to create port-forwards in the namespace
kubectl auth can-i create pods/portforward -n celery-monitoring
```

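
With a port-forward already running, reachability of the UI can also be checked from the workstation. The sketch below assumes local port 8080 as used elsewhere in this README and that the Flower root page answers with HTTP 200; `flower_url` and `flower_is_up` are hypothetical helper names.

```bash
# Local port of the port-forward; 8080 matches the commands in this README.
FLOWER_LOCAL_PORT="${FLOWER_LOCAL_PORT:-8080}"

# Print the local URL for a Flower path (default "/").
flower_url() {
  printf 'http://localhost:%s%s\n' "$FLOWER_LOCAL_PORT" "${1:-/}"
}

# Succeed if the Flower UI answers with HTTP 200 through the port-forward.
flower_is_up() {
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$(flower_url /)")
  [ "$status" = "200" ]
}
```

Usage: `flower_is_up && echo "Flower reachable" || echo "Flower not reachable"`.
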
## Deployment

Deployed automatically via Flux GitOps from `manifests/cluster/flux-system/celery-monitoring.yaml`.