# Auto-Discovery Celery Metrics Exporter
The Celery metrics exporter now **automatically discovers** all Redis databases and their queues without requiring manual configuration. It scans all Redis databases (0-15) and identifies potential Celery queues based on patterns and naming conventions.
## How Auto-Discovery Works
### Automatic Database Scanning
- Scans Redis databases 0-15 by default
- Only monitors databases that contain keys
- Only includes databases that have identifiable queues
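
Here is a minimal sketch of the database scan described above. It assumes one redis-py-style client per database. `scan_databases`, `client_for_db`, `is_queue` and `RedisLike` are hypothetical names used for illustration, not the exporter's actual code. `dbsize()` and `scan_iter()` are real redis-py client methods.

```python
from typing import Callable, Dict, Iterable, List, Protocol


class RedisLike(Protocol):
    """The two redis-py client methods this sketch relies on."""

    def dbsize(self) -> int: ...

    def scan_iter(self, match: str = "*", count: int = 100) -> Iterable: ...


def scan_databases(
    client_for_db: Callable[[int], RedisLike],
    is_queue: Callable[[RedisLike, str], bool],
    max_databases: int = 16,
) -> Dict[int, List[str]]:
    """Hypothetical helper: map db number -> queue keys, for databases with queues."""
    found: Dict[int, List[str]] = {}
    for db in range(max_databases):  # databases 0 .. max_databases-1
        client = client_for_db(db)
        if client.dbsize() == 0:
            continue  # skip databases that contain no keys
        queues = []
        # SCAN-based iteration; redis-py yields bytes unless decode_responses=True
        for raw_key in client.scan_iter(match="*", count=100):
            key = raw_key.decode() if isinstance(raw_key, bytes) else raw_key
            if is_queue(client, key):
                queues.append(key)
        if queues:  # only keep databases with identifiable queues
            found[db] = sorted(queues)
    return found
```

With redis-py, `client_for_db` could be `lambda n: redis.Redis(host=REDIS_HOST, password=REDIS_PASSWORD, db=n)`, where both constants are placeholders. Iterating with `scan_iter()` (SCAN) instead of `KEYS *` avoids blocking Redis while large databases are scanned.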
### Automatic Queue Discovery
The exporter supports two discovery modes:
#### Smart Filtering Mode (Default: `monitor_all_lists: false`)
Identifies queues using multiple strategies:
1. **Pattern Matching**: Matches known queue patterns from your applications:
   - `celery`, `*_priority`, `default`, `mailers`, `push`, `scheduler`
   - `streams`, `images`, `suggested_users`, `email`, `connectors`, `lists`, `inbox`, `imports`, `import_triggered`, `misc` (BookWyrm)
   - `background`, `send` (PieFed)
   - `high`, `mmo` (Pixelfed/Laravel)
2. **Heuristic Detection**: Identifies Redis lists whose key names contain queue-related keywords:
   - Keys containing: `queue`, `celery`, `task`, `job`, `work`
3. **Type Checking**: Only considers Redis `list` keys (Celery's Redis broker stores each queue as a list)
#### Monitor Everything Mode (`monitor_all_lists: true`)
- Monitors **ALL** Redis list-type keys in all databases
- No filtering or pattern matching
- Maximum visibility but potentially more noise
- Useful for debugging or comprehensive monitoring
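
The three smart-filtering strategies and the `monitor_all_lists` switch can be sketched as a single predicate. This sketch assumes glob-style (`fnmatch`) semantics for `queue_patterns` and plain case-insensitive substring matching for the keywords. `is_queue_candidate`, `DEFAULT_PATTERNS` and `QUEUE_KEYWORDS` are hypothetical names, and the exporter's real matching rules may differ.

```python
from fnmatch import fnmatchcase
from typing import Sequence

# Hypothetical defaults copied from the pattern list above.
DEFAULT_PATTERNS = (
    "celery", "*_priority", "default", "mailers", "push", "scheduler",
    "streams", "images", "suggested_users", "email", "connectors", "lists",
    "inbox", "imports", "import_triggered", "misc",  # BookWyrm
    "background", "send",  # PieFed
    "high", "mmo",  # Pixelfed/Laravel
)
QUEUE_KEYWORDS = ("queue", "celery", "task", "job", "work")


def is_queue_candidate(
    key: str,
    key_type: str,
    patterns: Sequence[str] = DEFAULT_PATTERNS,
    monitor_all_lists: bool = False,
) -> bool:
    # Type check applies in both modes: Celery queues on Redis are lists.
    if key_type != "list":
        return False
    if monitor_all_lists:
        return True  # Monitor Everything mode: every list counts
    # Strategy 1: case-sensitive glob match against the full key name.
    if any(fnmatchcase(key, pattern) for pattern in patterns):
        return True
    # Strategy 2: keyword heuristic, case-insensitive substring search.
    lowered = key.lower()
    return any(word in lowered for word in QUEUE_KEYWORDS)
```

Substring keywords are cheap but loose. For example, `work` also matches a list key such as `network_cache`, which is one way non-queue lists can slip through even in smart filtering mode.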
### Which Mode Should You Use?
**Use Smart Filtering (default)** when:
- ✅ You want clean, relevant metrics
- ✅ You care about Prometheus cardinality limits
- ✅ Your applications use standard queue naming
- ✅ You want to avoid monitoring non-queue Redis lists

**Use Monitor Everything** when:
- ✅ You're debugging queue discovery issues
- ✅ You have non-standard queue names not covered by patterns
- ✅ You want absolute certainty you're not missing anything
- ✅ You have sufficient Prometheus storage/performance headroom
- ✅ You can tolerate some noise from non-queue lists
## Configuration (Optional)
While the exporter works completely automatically, you can customize its behavior via the `celery-exporter-config` ConfigMap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: celery-exporter-config
  namespace: celery-monitoring
data:
  config.yaml: |
    # Auto-discovery settings
    auto_discovery:
      enabled: true
      scan_databases: true  # Scan all Redis databases 0-15
      scan_queues: true  # Auto-discover queues in each database
      monitor_all_lists: false  # If true, monitor ALL Redis lists, not just queue-like ones
    # Queue patterns to look for (Redis list keys that are likely Celery queues)
    queue_patterns:
      - "celery"
      - "*_priority"
      - "default"
      - "mailers"
      - "push"
      - "scheduler"
      - "broadcast"
      - "federation"
      - "media"
      - "user_dir"
    # Optional: Database name mapping (if you want friendly names)
    # If not specified, databases will be named "db_0", "db_1", etc.
    database_names:
      0: "piefed"
      1: "mastodon"
      2: "matrix"
      3: "bookwyrm"
    # Minimum queue length to report (set above 0 to hide empty queues)
    min_queue_length: 0
    # Maximum number of databases to scan (safety limit)
    max_databases: 16
```
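The following is a minimal sketch of how the settings above might be applied once `config.yaml` has been parsed into a dict. It assumes a one-level merge over built-in defaults. `DEFAULTS`, `resolve_config`, `database_label` and `reportable` are hypothetical, and the default pattern list is abbreviated.

```python
import copy
from typing import Any, Dict

# Hypothetical built-in defaults mirroring the ConfigMap keys documented above.
DEFAULTS: Dict[str, Any] = {
    "auto_discovery": {
        "enabled": True,
        "scan_databases": True,
        "scan_queues": True,
        "monitor_all_lists": False,
    },
    "queue_patterns": ["celery", "*_priority", "default"],  # abbreviated
    "database_names": {},
    "min_queue_length": 0,
    "max_databases": 16,
}


def resolve_config(user: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay a parsed config.yaml dict on the defaults (one level deep)."""
    cfg = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(cfg.get(key), dict):
            cfg[key].update(value)  # merge nested sections key by key
        else:
            cfg[key] = value  # scalars and lists replace the default outright
    return cfg


def database_label(cfg: Dict[str, Any], db: int) -> str:
    # YAML loads `0: "piefed"` with an int key; unnamed databases become db_<n>.
    return cfg["database_names"].get(db, f"db_{db}")


def reportable(cfg: Dict[str, Any], lengths: Dict[str, int]) -> Dict[str, int]:
    # Keep queues at or above min_queue_length (the default 0 keeps everything).
    return {q: n for q, n in lengths.items() if n >= cfg["min_queue_length"]}
```

In this sketch, a user-supplied `queue_patterns` list replaces the defaults instead of extending them. Under that assumption, a custom list has to repeat any patterns you still want. Check the exporter's actual behavior before relying on either interpretation.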
## Adding New Applications
**No configuration needed!** New applications are automatically discovered when they:
1. **Use a Redis database** (any database 0-15)
2. **Create queues** that match common patterns or contain queue-related keywords
3. **Use Redis lists** for their queues (standard Celery behavior)
### Custom Queue Patterns
If your application uses non-standard queue names, add them to the `queue_patterns` list:
```bash
kubectl edit configmap celery-exporter-config -n celery-monitoring
```
Add your pattern:
```yaml
queue_patterns:
- "celery"
- "*_priority"
- "my_custom_queue_*" # Add your pattern here
```
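Before editing the ConfigMap, you can check which patterns a real key would match. The sketch below assumes the exporter treats `queue_patterns` as case-sensitive shell-style globs. `matching_patterns` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence


def matching_patterns(key: str, patterns: Sequence[str]) -> List[str]:
    """Return the patterns that would match a Redis key, assuming glob semantics."""
    return [p for p in patterns if fnmatchcase(key, p)]


patterns = ["celery", "*_priority", "my_custom_queue_*"]
for key in ["my_custom_queue_emails", "my_custom_queue", "low_priority"]:
    print(key, "->", matching_patterns(key, patterns))
# my_custom_queue_emails -> ['my_custom_queue_*']
# my_custom_queue -> []
# low_priority -> ['*_priority']
```

The bare `my_custom_queue` key does not match because the pattern requires the trailing underscore before `*`. In smart filtering mode, that key may still be picked up by the `queue` keyword heuristic.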
### Friendly Database Names
To give databases friendly names instead of `db_0`, `db_1`, etc.:
```yaml
database_names:
  0: "piefed"
  1: "mastodon"
  2: "matrix"
  3: "bookwyrm"
  4: "my_new_app"  # Add your app here
```
## Metrics Produced
The exporter produces these metrics for each discovered database:
### `celery_queue_length`
- **Labels**: `queue_name`, `database`, `db_number`
- **Description**: Number of pending tasks in each queue
- **Example**: `celery_queue_length{queue_name="celery", database="piefed", db_number="0"} 1234`
- **Special**: `queue_name="_total"` shows total tasks across all queues in a database
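
To make the `_total` convention concrete, here is a hand-rolled sketch of the Prometheus text exposition format for `celery_queue_length`. It assumes queue names contain no quotes or backslashes. `render_queue_lengths` is hypothetical; a real exporter would more likely use a `prometheus_client` `Gauge`.

```python
from typing import Dict, Tuple


def render_queue_lengths(per_db: Dict[Tuple[str, int], Dict[str, int]]) -> str:
    """Render celery_queue_length samples; per_db maps (database, db_number) -> {queue: length}."""
    lines = [
        "# HELP celery_queue_length Number of pending tasks in each queue",
        "# TYPE celery_queue_length gauge",  # HELP/TYPE appear once per metric family
    ]
    for (database, db_number), lengths in sorted(per_db.items()):
        # One series per queue, then a synthetic _total series for the database.
        rows = sorted(lengths.items()) + [("_total", sum(lengths.values()))]
        for queue, length in rows:
            lines.append(
                f'celery_queue_length{{queue_name="{queue}",database="{database}",'
                f'db_number="{db_number}"}} {length}'
            )
    return "\n".join(lines) + "\n"
```

Because `_total` is just another series under the same metric name, an unfiltered `sum(celery_queue_length)` counts every task twice. The queries below filter on `queue_name` for that reason.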
### `redis_connection_status`
- **Labels**: `database`, `db_number`
- **Description**: Connection status per database (1=connected, 0=disconnected)
- **Example**: `redis_connection_status{database="piefed", db_number="0"} 1`
### `celery_databases_discovered`
- **Description**: Number of databases in which at least one queue was discovered
- **Example**: `celery_databases_discovered 4`
### `celery_queues_discovered`
- **Labels**: `database`
- **Description**: Number of queues discovered per database
- **Example**: `celery_queues_discovered{database="bookwyrm"} 5`
### `celery_queue_info`
- **Description**: General information about all monitored queues
- **Includes**: Total lengths, Redis host, last update timestamp, auto-discovery status
## PromQL Query Examples
### Discovery Overview
```promql
# How many databases were discovered
celery_databases_discovered
# How many queues per database
celery_queues_discovered
# Auto-discovery status
celery_queue_info
```
### All Applications Overview
```promql
# All queue lengths grouped by database
sum by (database) (celery_queue_length{queue_name!="_total"})
# Total tasks across all databases
sum(celery_queue_length{queue_name="_total"})
# Individual queues (excluding totals)
celery_queue_length{queue_name!="_total"}
# Only active queues (> 0 tasks)
celery_queue_length{queue_name!="_total"} > 0
```
### Specific Applications
```promql
# PieFed queues only
celery_queue_length{database="piefed", queue_name!="_total"}
# BookWyrm high priority queue (if it exists)
celery_queue_length{database="bookwyrm", queue_name="high_priority"}
# All applications' main celery queue
celery_queue_length{queue_name="celery"}
# Database totals only
celery_queue_length{queue_name="_total"}
```
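### Processing Rates
`celery_queue_length` is a gauge, so use `deriv()` rather than `rate()`. `rate()` assumes a counter and treats every drop in queue length as a counter reset. The result is the net change (tasks enqueued minus tasks completed), not raw worker throughput.
```promql
# Net tasks drained per minute (positive = queue shrinking)
deriv(celery_queue_length{queue_name!="_total"}[5m]) * -60
# Net drain rate by database (using totals)
deriv(celery_queue_length{queue_name="_total"}[5m]) * -60
# Overall net drain rate across all databases
sum(deriv(celery_queue_length{queue_name="_total"}[5m])) * -60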
```
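For a gauge such as `celery_queue_length`, PromQL's `deriv()` estimates a per-second slope by simple linear regression over the samples in the range. The sketch below reproduces that calculation in Python to show what the `* -60` factor yields. `per_second_slope` is a hypothetical helper, and the samples are invented.

```python
from typing import List, Tuple


def per_second_slope(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of (timestamp_seconds, value) pairs (simple linear regression)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var


# Invented data: scraped every 60 s, the queue shrinks by 30 tasks per scrape.
samples = [(0, 500), (60, 470), (120, 440), (180, 410), (240, 380)]
drain_per_minute = per_second_slope(samples) * -60
print(drain_per_minute)  # 30.0
```

The result is a net figure. If 100 tasks arrive and 100 complete in the same minute, the slope is 0 even though workers are busy.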
### Health Monitoring
```promql
# Databases with connection issues
redis_connection_status == 0
# Queues that grew by more than 1000 tasks in 5 minutes (delta() is the gauge counterpart of increase())
delta(celery_queue_length{queue_name!="_total"}[5m]) > 1000
# Stalled processing (no change in 15 minutes)
changes(celery_queue_length{queue_name="_total"}[15m]) == 0 and celery_queue_length{queue_name="_total"} > 100
# Databases that stopped being discovered (changes() counts changes and is never negative)
delta(celery_databases_discovered[10m]) < 0
```
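The stalled-processing rule combines two conditions. Below is a Python sketch of the same logic over a window of scraped values. `changes` and `is_stalled` are hypothetical helpers that mirror PromQL's `changes()` and the `> 100` threshold.

```python
from typing import Sequence


def changes(values: Sequence[float]) -> int:
    """Number of times consecutive samples differ, like PromQL changes()."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur != prev)


def is_stalled(window: Sequence[float], min_backlog: float = 100) -> bool:
    """True if the queue length never moved during the window but a backlog remains."""
    return changes(window) == 0 and window[-1] > min_backlog
```

A stuck queue that is still receiving new tasks keeps changing, so this rule will not fire for it. The queue-growth rule above covers that case.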
## Troubleshooting
### Check Auto-Discovery Status
```bash
# View current configuration
kubectl get configmap celery-exporter-config -n celery-monitoring -o yaml
# Check exporter logs for discovery results
kubectl logs -n celery-monitoring deployment/celery-metrics-exporter
# Look for discovery messages like:
# "Database 0 (piefed): 1 queues, 245 total keys"
# "Auto-discovery complete: Found 3 databases with queues"
```
### Test Redis Connectivity
```bash
# Test connection to specific database
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER ping
# List keys in a database (--scan iterates with SCAN, which avoids blocking Redis the way KEYS does)
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER --scan
# Check if a key is a list (queue)
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER type QUEUE_NAME
# Check queue length manually
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER llen QUEUE_NAME
```
### Validate Metrics
```bash
# Port forward (leave this running in a separate terminal)
kubectl port-forward -n celery-monitoring svc/celery-metrics-exporter 8000:8000
# Check discovery metrics
curl -s http://localhost:8000/metrics | grep celery_databases_discovered
curl -s http://localhost:8000/metrics | grep celery_queues_discovered
# Check queue metrics
curl -s http://localhost:8000/metrics | grep celery_queue_length
```
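`grep` works for a quick look. For scripted checks, a simplified parser can extract labelled samples from the endpoint. This sketch ignores label-value escaping and assumes the port-forward above is running. `parse_samples` and `fetch` are hypothetical helpers. For full coverage of the format, `prometheus_client.parser.text_string_to_metric_families` is the more robust option.

```python
import re
import urllib.request
from typing import Dict, List, Tuple

# metric_name{label="value",...} value  (labels optional, trailing timestamp ignored)
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str, metric: str) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) pairs for one metric name from exposition text."""
    out = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if not m or m.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        out.append((labels, float(m.group("value"))))
    return out


def fetch(url: str = "http://localhost:8000/metrics") -> str:
    """Download the metrics page (assumes the port-forward is active)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

For example, `[l for l, v in parse_samples(fetch(), "celery_queue_length") if v > 0]` lists the label sets of all non-empty series.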
### Debug Discovery Issues
If queues aren't being discovered:
1. **Check queue patterns** - Add your queue names to `queue_patterns`
2. **Verify queue type** - Ensure queues are Redis lists: `redis-cli type queue_name`
3. **Check database numbers** - Verify your app uses the expected Redis database
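4. **Review logs** - Look for discovery debug messages in exporter logs
5. **Check the queue is non-empty** - Redis deletes a list key when its last element is removed, so an idle queue has no key to discover until a task is enqueued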
### Force Restart Discovery
```bash
# Restart the exporter to re-run discovery
kubectl rollout restart deployment/celery-metrics-exporter -n celery-monitoring
```
## Security Notes
- The exporter connects to Redis using the shared `redis-credentials` secret
- All database connections use the same Redis host and password
- Only queue length information is exposed, not queue contents
- The exporter scans all databases but only reports queue-like keys (or every list key when `monitor_all_lists: true`)
- Metrics are scraped via ServiceMonitor for OpenTelemetry collection