# Auto-Discovery Celery Metrics Exporter

The Celery metrics exporter now **automatically discovers** all Redis databases and their queues without requiring manual configuration. It scans all Redis databases (0-15) and identifies potential Celery queues based on patterns and naming conventions.

## How Auto-Discovery Works

### Automatic Database Scanning
- Scans Redis databases 0-15 by default
- Only monitors databases that contain keys
- Only includes databases that have identifiable queues

### Automatic Queue Discovery

The exporter supports two discovery modes:

#### Smart Filtering Mode (Default: `monitor_all_lists: false`)
Identifies queues using multiple strategies:

1. **Pattern Matching**: Matches known queue patterns from your applications:
   - `celery`, `*_priority`, `default`, `mailers`, `push`, `scheduler`
   - `streams`, `images`, `suggested_users`, `email`, `connectors`, `lists`, `inbox`, `imports`, `import_triggered`, `misc` (BookWyrm)
   - `background`, `send` (PieFed)
   - `high`, `mmo` (Pixelfed/Laravel)

2. **Heuristic Detection**: Identifies Redis lists containing queue-related keywords:
   - Keys containing: `queue`, `celery`, `task`, `job`, `work`

3. **Type Checking**: Only considers Redis `list` type keys (Celery queues are Redis lists)
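
The three strategies above can be sketched as a single predicate. This is a minimal illustration, not the exporter's actual code: the function name `is_queue_candidate` and its parameters are hypothetical, the pattern list is a subset copied from this document, and shell-style glob matching via Python's `fnmatch` is an assumption about how patterns such as `*_priority` are interpreted.

```python
from fnmatch import fnmatchcase

# Illustrative values copied from the lists above; the real exporter's lists may differ.
QUEUE_PATTERNS = ["celery", "*_priority", "default", "mailers", "push", "scheduler"]
QUEUE_KEYWORDS = ["queue", "celery", "task", "job", "work"]


def is_queue_candidate(key, key_type, patterns=QUEUE_PATTERNS,
                       keywords=QUEUE_KEYWORDS, monitor_all_lists=False):
    """Return True if a Redis key should be reported as a queue (hypothetical helper)."""
    # Type checking: Celery queues are Redis lists, so skip every other type.
    if key_type != "list":
        return False
    # Monitor Everything mode: every list is reported, no further filtering.
    if monitor_all_lists:
        return True
    # Pattern matching: shell-style globs such as "*_priority" (assumed semantics).
    if any(fnmatchcase(key, pattern) for pattern in patterns):
        return True
    # Heuristic detection: case-insensitive substring match on queue-related keywords.
    lowered = key.lower()
    return any(word in lowered for word in keywords)
```

Under these assumptions, `is_queue_candidate("high_priority", "list")` is true and `is_queue_candidate("sessions", "list")` is false. Note that a substring heuristic is deliberately loose: a list named `network_cache` would also match because it contains `work`.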

#### Monitor Everything Mode (`monitor_all_lists: true`)
- Monitors **ALL** Redis list-type keys in all databases
- No filtering or pattern matching
- Maximum visibility but potentially more noise
- Useful for debugging or comprehensive monitoring

### Which Mode Should You Use?

**Use Smart Filtering (default)** when:
- ✅ You want clean, relevant metrics
- ✅ You care about Prometheus cardinality limits
- ✅ Your applications use standard queue naming
- ✅ You want to avoid monitoring non-queue Redis lists

**Use Monitor Everything** when:
- ✅ You're debugging queue discovery issues
- ✅ You have non-standard queue names not covered by patterns
- ✅ You want absolute certainty you're not missing anything
- ✅ You have sufficient Prometheus storage/performance headroom
- ✅ You can tolerate noise from non-queue lists

## Configuration (Optional)

While the exporter works completely automatically, you can customize its behavior via the `celery-exporter-config` ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: celery-exporter-config
  namespace: celery-monitoring
data:
  config.yaml: |
    # Auto-discovery settings
    auto_discovery:
      enabled: true
      scan_databases: true      # Scan all Redis databases 0-15
      scan_queues: true         # Auto-discover queues in each database
      monitor_all_lists: false  # If true, monitor ALL Redis lists, not just queue-like ones

    # Queue patterns to look for (Redis list keys that are likely Celery queues)
    queue_patterns:
      - "celery"
      - "*_priority"
      - "default"
      - "mailers"
      - "push"
      - "scheduler"
      - "broadcast"
      - "federation"
      - "media"
      - "user_dir"

    # Optional: Database name mapping (if you want friendly names)
    # If not specified, databases will be named "db_0", "db_1", etc.
    database_names:
      0: "piefed"
      1: "mastodon"
      2: "matrix"
      3: "bookwyrm"

    # Minimum queue length to report (avoid noise from empty queues)
    min_queue_length: 0

    # Maximum number of databases to scan (safety limit)
    max_databases: 16
```

## Adding New Applications

**No configuration needed!** New applications are automatically discovered when they:

1. **Use a Redis database** (any database 0-15)
2. **Create queues** that match common patterns or contain queue-related keywords
3. **Use Redis lists** for their queues (standard Celery behavior)

### Custom Queue Patterns

If your application uses non-standard queue names, add them to the `queue_patterns` list:

```bash
kubectl edit configmap celery-exporter-config -n celery-monitoring
```

Add your pattern:
```yaml
queue_patterns:
  - "celery"
  - "*_priority"
  - "my_custom_queue_*"  # Add your pattern here
```

### Friendly Database Names

To give databases friendly names instead of `db_0`, `db_1`, etc.:

```yaml
database_names:
  0: "piefed"
  1: "mastodon"
  2: "matrix"
  3: "bookwyrm"
  4: "my_new_app"  # Add your app here
```
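
The fallback naming described here (`db_0`, `db_1`, and so on when no mapping exists) can be sketched in a few lines. The helper name `database_label` is hypothetical, and the sketch assumes the `database_names` mapping has been loaded into a dict with integer keys.

```python
def database_label(db_number, database_names=None):
    """Return the friendly name for a database, or "db_<n>" if none is configured."""
    # database_names mirrors the YAML mapping above; a missing mapping means no friendly names.
    names = database_names or {}
    return names.get(db_number, f"db_{db_number}")
```

With the mapping above, `database_label(4, {4: "my_new_app"})` gives `"my_new_app"`, while an unmapped database such as 7 falls back to `"db_7"`.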

## Metrics Produced

The exporter produces these metrics for each discovered database:

### `celery_queue_length`
- **Labels**: `queue_name`, `database`, `db_number`
- **Description**: Number of pending tasks in each queue
- **Example**: `celery_queue_length{queue_name="celery", database="piefed", db_number="0"} 1234`
- **Special**: `queue_name="_total"` shows total tasks across all queues in a database
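
To make the `_total` convention concrete, the sketch below renders the samples for one database in Prometheus text exposition format. It is an illustration only: the function name `render_queue_lengths` is hypothetical, and the real exporter most likely relies on a Prometheus client library rather than formatting strings by hand.

```python
def render_queue_lengths(database, db_number, queue_lengths):
    """Render one sample per queue plus a synthetic "_total" sample for the database."""

    def sample(queue_name, value):
        # Label order follows the example in this document.
        labels = f'queue_name="{queue_name}",database="{database}",db_number="{db_number}"'
        return f"celery_queue_length{{{labels}}} {value}"

    lines = [sample(name, length) for name, length in sorted(queue_lengths.items())]
    # The "_total" series is the sum of every queue in this database.
    lines.append(sample("_total", sum(queue_lengths.values())))
    return lines
```

Because `_total` lives in the same metric as the individual queues, any aggregation over `celery_queue_length` has to select either `queue_name="_total"` or `queue_name!="_total"`; summing both would count every task twice.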

### `redis_connection_status`
- **Labels**: `database`, `db_number`
- **Description**: Connection status per database (1=connected, 0=disconnected)
- **Example**: `redis_connection_status{database="piefed", db_number="0"} 1`

### `celery_databases_discovered`
- **Description**: Total number of databases with queues discovered
- **Example**: `celery_databases_discovered 4`

### `celery_queues_discovered`
- **Labels**: `database`
- **Description**: Number of queues discovered per database
- **Example**: `celery_queues_discovered{database="bookwyrm"} 5`

### `celery_queue_info`
- **Description**: General information about all monitored queues
- **Includes**: Total lengths, Redis host, last update timestamp, auto-discovery status

## PromQL Query Examples

### Discovery Overview
```promql
# How many databases were discovered
celery_databases_discovered

# How many queues per database
celery_queues_discovered

# Auto-discovery status
celery_queue_info
```

### All Applications Overview
```promql
# All queue lengths grouped by database
sum by (database) (celery_queue_length{queue_name!="_total"})

# Total tasks across all databases
sum(celery_queue_length{queue_name="_total"})

# Individual queues (excluding totals)
celery_queue_length{queue_name!="_total"}

# Only active queues (> 0 tasks)
celery_queue_length{queue_name!="_total"} > 0
```

### Specific Applications
```promql
# PieFed queues only
celery_queue_length{database="piefed", queue_name!="_total"}

# BookWyrm high priority queue (if it exists)
celery_queue_length{database="bookwyrm", queue_name="high_priority"}

# All applications' main celery queue
celery_queue_length{queue_name="celery"}

# Database totals only
celery_queue_length{queue_name="_total"}
```

### Processing Rates
```promql
# celery_queue_length is a gauge, so use deriv() here; rate() is meant for counters
# and treats every decrease as a counter reset.

# Net tasks drained per minute (positive = queue shrinking, negative = queue growing)
deriv(celery_queue_length{queue_name!="_total"}[5m]) * -60

# Net drain rate by database (using totals)
deriv(celery_queue_length{queue_name="_total"}[5m]) * -60

# Overall net drain rate across all databases
sum(deriv(celery_queue_length{queue_name="_total"}[5m]) * -60)
```
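
Since `celery_queue_length` is a gauge, a drain rate is the negated slope of its samples over time. PromQL's `deriv()` estimates that slope with simple linear regression; the sketch below reproduces the same arithmetic in plain Python so the sign convention is easy to check. The function name `drain_per_minute` and the sample data are illustrative assumptions.

```python
def drain_per_minute(samples):
    """Estimate net tasks drained per minute from (timestamp_seconds, queue_length) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # Least-squares slope: covariance of time and value divided by variance of time.
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one timestamp")
    slope_per_second = numerator / denominator
    # Negate so a shrinking queue gives a positive drain rate, then scale to minutes.
    return -slope_per_second * 60
```

For a queue that falls from 1000 to 880 tasks over two minutes, this returns a positive value; for a growing queue it returns a negative one. Either way it measures net change only: tasks enqueued during the window offset tasks processed, so it is a lower bound on worker throughput rather than a true processing count.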

### Health Monitoring
```promql
# Databases with connection issues
redis_connection_status == 0

# Queues growing too fast (net growth of more than 1000 tasks in 5 minutes)
delta(celery_queue_length{queue_name!="_total"}[5m]) > 1000

# Stalled processing (no change in 15 minutes)
changes(celery_queue_length{queue_name="_total"}[15m]) == 0 and celery_queue_length{queue_name="_total"} > 100

# Databases that stopped being discovered (count dropped within the last 10 minutes)
delta(celery_databases_discovered[10m]) < 0
```

## Troubleshooting

### Check Auto-Discovery Status
```bash
# View current configuration
kubectl get configmap celery-exporter-config -n celery-monitoring -o yaml

# Check exporter logs for discovery results
kubectl logs -n celery-monitoring deployment/celery-metrics-exporter

# Look for discovery messages like:
# "Database 0 (piefed): 1 queues, 245 total keys"
# "Auto-discovery complete: Found 3 databases with queues"
```

### Test Redis Connectivity
```bash
# Test connection to specific database
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER ping

# Check what keys exist in a database
# (--scan iterates incrementally; KEYS '*' can block Redis on large databases)
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER --scan --pattern '*'

# Check if a key is a list (queue)
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER type QUEUE_NAME

# Check queue length manually
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER llen QUEUE_NAME
```

### Validate Metrics
```bash
# Port forward to the metrics endpoint (blocks; run the curl commands in a second terminal)
kubectl port-forward -n celery-monitoring svc/celery-metrics-exporter 8000:8000

# Check discovery metrics
curl -s http://localhost:8000/metrics | grep celery_databases_discovered
curl -s http://localhost:8000/metrics | grep celery_queues_discovered

# Check queue metrics
curl -s http://localhost:8000/metrics | grep celery_queue_length
```

### Debug Discovery Issues

If queues aren't being discovered:

1. **Check queue patterns** - Add your queue names to `queue_patterns`
2. **Verify queue type** - Ensure queues are Redis lists: `redis-cli type queue_name`
3. **Check database numbers** - Verify your app uses the expected Redis database
4. **Review logs** - Look for discovery debug messages in exporter logs

### Force Restart Discovery
```bash
# Restart the exporter to re-run discovery
kubectl rollout restart deployment/celery-metrics-exporter -n celery-monitoring
```

## Security Notes

- The exporter connects to Redis using the shared `redis-credentials` secret
- All database connections use the same Redis host and password
- Only queue length information is exposed, not queue contents
- The exporter scans all databases but only reports queue-like keys
- Metrics are scraped via ServiceMonitor for OpenTelemetry collection