Add the redacted source file for demo purposes Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1 Co-authored-by: Michael DiLeo <michael_dileo@proton.me> Co-committed-by: Michael DiLeo <michael_dileo@proton.me>

# Auto-Discovery Celery Metrics Exporter

The Celery metrics exporter now **automatically discovers** all Redis databases and their queues without requiring manual configuration. It scans all Redis databases (0-15) and identifies potential Celery queues based on patterns and naming conventions.

## How Auto-Discovery Works

### Automatic Database Scanning

- Scans Redis databases 0-15 by default
- Only monitors databases that contain keys
- Only includes databases that have identifiable queues
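
A minimal sketch of the database scan described above, assuming a redis-py-style client per database (any object with a `dbsize()` method). The names `discover_databases`, `client_for_db`, and `has_queues` are hypothetical; this illustrates the described behavior and is not the exporter's actual code.

```python
from typing import Callable, List, Protocol


class SupportsDbsize(Protocol):
    """Anything with a redis-py style dbsize() method."""

    def dbsize(self) -> int: ...


def discover_databases(
    client_for_db: Callable[[int], SupportsDbsize],
    has_queues: Callable[[int, SupportsDbsize], bool],
    max_databases: int = 16,
) -> List[int]:
    """Return the numbers of databases that hold keys and at least one queue."""
    found: List[int] = []
    for db in range(max_databases):  # 0..15 with the default of 16
        client = client_for_db(db)
        if client.dbsize() == 0:
            continue  # empty database: nothing to monitor
        if has_queues(db, client):
            found.append(db)  # keep only databases with identifiable queues
    return found
```

With redis-py, `client_for_db` could be `lambda n: redis.Redis(host=REDIS_HOST, db=n)`, where `REDIS_HOST` is a placeholder for your Redis service, and `has_queues` would apply the queue checks described in the next section.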

### Automatic Queue Discovery

The exporter supports two discovery modes:

#### Smart Filtering Mode (Default: `monitor_all_lists: false`)

Identifies queues using multiple strategies:

1. **Pattern Matching**: Matches known queue patterns from your applications:
   - `celery`, `*_priority`, `default`, `mailers`, `push`, `scheduler`
   - `streams`, `images`, `suggested_users`, `email`, `connectors`, `lists`, `inbox`, `imports`, `import_triggered`, `misc` (BookWyrm)
   - `background`, `send` (PieFed)
   - `high`, `mmo` (Pixelfed/Laravel)
2. **Heuristic Detection**: Identifies Redis lists containing queue-related keywords:
   - Keys containing: `queue`, `celery`, `task`, `job`, `work`
3. **Type Checking**: Only considers Redis `list` type keys (Celery queues are Redis lists)
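
The three strategies can be combined roughly as follows. This is a sketch, assuming patterns are shell-style globs (so `*_priority` matches `high_priority`) and that the key type comes from the Redis `TYPE` command decoded to a string; `is_queue_candidate` is a hypothetical name and the exporter's real matching rules may differ.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Keywords from the heuristic-detection step
QUEUE_KEYWORDS = ("queue", "celery", "task", "job", "work")


def is_queue_candidate(key: str, key_type: str, patterns: Iterable[str]) -> bool:
    """Smart-filtering check: type first, then known patterns, then keywords."""
    if key_type != "list":  # Celery queues on Redis are lists; skip everything else
        return False
    if any(fnmatchcase(key, pattern) for pattern in patterns):
        return True  # known queue name or glob such as "*_priority"
    lowered = key.lower()
    return any(word in lowered for word in QUEUE_KEYWORDS)  # keyword heuristic
```

Substring keywords are deliberately loose: `work` also matches a list named `network_cache`, which is one reason smart filtering can still pick up the occasional non-queue list.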

#### Monitor Everything Mode (`monitor_all_lists: true`)

- Monitors **ALL** Redis list-type keys in all databases
- No filtering or pattern matching
- Maximum visibility but potentially more noise
- Useful for debugging or comprehensive monitoring
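
Both modes can share the same per-database loop and differ only in whether the filter runs. A sketch, assuming a redis-py client created with `decode_responses=True` (so keys and types come back as `str`); `collect_queue_lengths` and the `looks_like_queue` predicate are hypothetical names.

```python
from typing import Callable, Dict


def collect_queue_lengths(
    client,
    monitor_all_lists: bool,
    looks_like_queue: Callable[[str], bool],
) -> Dict[str, int]:
    """Return {queue_name: length} for the database `client` points at."""
    lengths: Dict[str, int] = {}
    # SCAN-based iteration avoids blocking Redis; SCAN may yield a key more
    # than once, which the dict assignment below absorbs.
    for key in client.scan_iter(count=500):
        if client.type(key) != "list":
            continue  # both modes only look at list-type keys
        if monitor_all_lists or looks_like_queue(key):
            lengths[key] = client.llen(key)  # pending items in the list
    return lengths
```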

### Which Mode Should You Use?

**Use Smart Filtering (default)** when:

- ✅ You want clean, relevant metrics
- ✅ You care about Prometheus cardinality limits
- ✅ Your applications use standard queue naming
- ✅ You want to avoid monitoring non-queue Redis lists

**Use Monitor Everything** when:

- ✅ You're debugging queue discovery issues
- ✅ You have non-standard queue names not covered by patterns
- ✅ You want absolute certainty you're not missing anything
- ✅ You have sufficient Prometheus storage/performance headroom
- ✅ You don't mind potential noise from non-queue lists

## Configuration (Optional)

While the exporter works completely automatically, you can customize its behavior via the `celery-exporter-config` ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: celery-exporter-config
  namespace: celery-monitoring
data:
  config.yaml: |
    # Auto-discovery settings
    auto_discovery:
      enabled: true
      scan_databases: true  # Scan all Redis databases 0-15
      scan_queues: true  # Auto-discover queues in each database
      monitor_all_lists: false  # If true, monitor ALL Redis lists, not just queue-like ones

    # Queue patterns to look for (Redis list keys that are likely Celery queues)
    queue_patterns:
      - "celery"
      - "*_priority"
      - "default"
      - "mailers"
      - "push"
      - "scheduler"
      - "broadcast"
      - "federation"
      - "media"
      - "user_dir"

    # Optional: database name mapping (if you want friendly names)
    # If not specified, databases will be named "db_0", "db_1", etc.
    database_names:
      0: "piefed"
      1: "mastodon"
      2: "matrix"
      3: "bookwyrm"

    # Minimum queue length to report (0 reports empty queues too; raise it to cut noise)
    min_queue_length: 0

    # Maximum number of databases to scan (safety limit)
    max_databases: 16
```
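
Because every setting is optional, an exporter has to fall back to defaults for anything the ConfigMap omits. A minimal sketch of that merge plus the `min_queue_length` filter, assuming `config.yaml` has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). `merge_config`, `reportable`, and the `DEFAULTS` values are hypothetical, copied from the example above rather than from the exporter's source.

```python
import copy
from typing import Any, Dict

# Assumed defaults, mirroring the example ConfigMap above
DEFAULTS: Dict[str, Any] = {
    "auto_discovery": {
        "enabled": True,
        "scan_databases": True,
        "scan_queues": True,
        "monitor_all_lists": False,
    },
    "queue_patterns": ["celery", "*_priority", "default"],
    "database_names": {},
    "min_queue_length": 0,
    "max_databases": 16,
}


def merge_config(user: Dict[str, Any], defaults: Dict[str, Any] = DEFAULTS) -> Dict[str, Any]:
    """Recursively overlay user settings on defaults; user values win."""
    merged = copy.deepcopy(defaults)  # never mutate the shared defaults
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(value, merged[key])  # merge nested sections
        else:
            merged[key] = value  # scalars and lists are replaced wholesale
    return merged


def reportable(lengths: Dict[str, int], min_queue_length: int) -> Dict[str, int]:
    """Drop queues shorter than min_queue_length (0 keeps empty queues)."""
    return {queue: n for queue, n in lengths.items() if n >= min_queue_length}
```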

## Adding New Applications

**No configuration needed!** New applications are automatically discovered when they:

1. **Use a Redis database** (any database 0-15)
2. **Create queues** that match common patterns or contain queue-related keywords
3. **Use Redis lists** for their queues (standard Celery behavior)

### Custom Queue Patterns

If your application uses non-standard queue names, add them to the `queue_patterns` list:

```bash
kubectl edit configmap celery-exporter-config -n celery-monitoring
```

Add your pattern:

```yaml
queue_patterns:
  - "celery"
  - "*_priority"
  - "my_custom_queue_*"  # Add your pattern here
```
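
Before editing the ConfigMap, it can help to check which existing keys a new pattern would catch. A small sketch, assuming the exporter treats patterns as shell-style globs; feed it key names collected from the database, for example with `redis-cli --scan`. `preview_matches` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List


def preview_matches(keys: Iterable[str], patterns: Iterable[str]) -> Dict[str, List[str]]:
    """Map each pattern to the existing keys it would match."""
    key_list = list(keys)  # allow generators to be iterated once per pattern
    return {p: [k for k in key_list if fnmatchcase(k, p)] for p in patterns}


keys = ["my_custom_queue_emails", "my_custom_queue_thumbs", "celery", "sessions"]
print(preview_matches(keys, ["my_custom_queue_*"]))
# {'my_custom_queue_*': ['my_custom_queue_emails', 'my_custom_queue_thumbs']}
```

An empty list for a pattern means it would not add anything in that database.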

### Friendly Database Names

To give databases friendly names instead of `db_0`, `db_1`, etc.:

```yaml
database_names:
  0: "piefed"
  1: "mastodon"
  2: "matrix"
  3: "bookwyrm"
  4: "my_new_app"  # Add your app here
```
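
The fallback naming is easy to get subtly wrong when the mapping comes from YAML. A sketch of the lookup, where `database_label` is a hypothetical helper and accepting string keys is a defensive assumption rather than documented exporter behavior:

```python
from typing import Any, Mapping


def database_label(db_number: int, database_names: Mapping[Any, str]) -> str:
    """Friendly name from database_names, else the db_<n> fallback."""
    # PyYAML loads an unquoted key like `0:` as the int 0 and a quoted `"0":`
    # as the string "0", so accept either form.
    for key in (db_number, str(db_number)):
        if key in database_names:
            return database_names[key]
    return f"db_{db_number}"
```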

## Metrics Produced

The exporter produces the following metrics; all except `celery_databases_discovered` are reported per discovered database:

### `celery_queue_length`

- **Labels**: `queue_name`, `database`, `db_number`
- **Description**: Number of pending tasks in each queue
- **Example**: `celery_queue_length{queue_name="celery", database="piefed", db_number="0"} 1234`
- **Special**: `queue_name="_total"` shows total tasks across all queues in a database

### `redis_connection_status`

- **Labels**: `database`, `db_number`
- **Description**: Connection status per database (1=connected, 0=disconnected)
- **Example**: `redis_connection_status{database="piefed", db_number="0"} 1`

### `celery_databases_discovered`

- **Description**: Total number of databases with queues discovered
- **Example**: `celery_databases_discovered 4`

### `celery_queues_discovered`

- **Labels**: `database`
- **Description**: Number of queues discovered per database
- **Example**: `celery_queues_discovered{database="bookwyrm"} 5`

### `celery_queue_info`

- **Description**: General information about all monitored queues
- **Includes**: Total lengths, Redis host, last update timestamp, auto-discovery status

## PromQL Query Examples

### Discovery Overview

```promql
# How many databases were discovered
celery_databases_discovered

# How many queues per database
celery_queues_discovered

# Auto-discovery status
celery_queue_info
```

### All Applications Overview

```promql
# All queue lengths grouped by database
sum by (database) (celery_queue_length{queue_name!="_total"})

# Total tasks across all databases
sum(celery_queue_length{queue_name="_total"})

# Individual queues (excluding totals)
celery_queue_length{queue_name!="_total"}

# Only active queues (> 0 tasks)
celery_queue_length{queue_name!="_total"} > 0
```
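
To use these queries outside Grafana, for example in a script or CI check, you can call the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). A sketch using only the Python standard library; the Prometheus address (for example a port-forwarded `http://localhost:9090`) is an assumption about your setup, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Tuple


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL: GET <base>/api/v1/query?query=<promql>."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload: Dict[str, Any]) -> Dict[Tuple[str, str], float]:
    """Map an instant-vector response to {(database, queue_name): value}."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    values: Dict[Tuple[str, str], float] = {}
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        _timestamp, value = sample["value"]  # [unix_seconds, "value as string"]
        values[(labels.get("database", ""), labels.get("queue_name", ""))] = float(value)
    return values


def run_query(base_url: str, promql: str) -> Dict[Tuple[str, str], float]:
    """Execute a query against a reachable Prometheus server."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

For example, `run_query("http://localhost:9090", 'sum by (database) (celery_queue_length{queue_name!="_total"})')` returns one entry per database with an empty `queue_name`, since `sum by (database)` drops that label.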

### Specific Applications

```promql
# PieFed queues only
celery_queue_length{database="piefed", queue_name!="_total"}

# BookWyrm high priority queue (if it exists)
celery_queue_length{database="bookwyrm", queue_name="high_priority"}

# All applications' main celery queue
celery_queue_length{queue_name="celery"}

# Database totals only
celery_queue_length{queue_name="_total"}
```
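
### Processing Rates

`celery_queue_length` is a gauge, so these queries use `deriv()`: `rate()` is meant only for counters and treats every drop in queue length as a counter reset. The result is the net change in queue length, so it understates throughput whenever new tasks arrive while workers drain the queue.

```promql
# Net tasks drained per minute (positive = queue shrinking, negative = queue growing)
deriv(celery_queue_length{queue_name!="_total"}[5m]) * -60

# Net drain rate by database (using totals)
deriv(celery_queue_length{queue_name="_total"}[5m]) * -60

# Overall net drain rate across all databases
sum(deriv(celery_queue_length{queue_name="_total"}[5m]) * -60)
```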
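
For a gauge such as `celery_queue_length`, PromQL's `deriv()` estimates the per-second slope using simple linear regression over the samples in the range. This plain-Python worked example illustrates that arithmetic (it is not Prometheus's implementation) and shows why multiplying by `-60` turns a shrinking queue into a positive tasks-per-minute figure:

```python
from typing import Sequence, Tuple


def slope_per_second(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (timestamp_seconds, value) pairs, as deriv() computes."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var


# A queue scraped every 60 s that shrinks by 120 tasks per scrape
samples = [(0, 1000), (60, 880), (120, 760), (180, 640), (240, 520)]
print(slope_per_second(samples) * -60)  # net tasks drained per minute
# 120.0
```

Here the slope is -2 tasks per second, so `* -60` yields 120 tasks drained per minute.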

### Health Monitoring

```promql
# Databases with connection issues
redis_connection_status == 0

# Queues that grew by more than 1000 tasks in 5 minutes (delta() is the gauge counterpart of increase())
delta(celery_queue_length{queue_name!="_total"}[5m]) > 1000

# Stalled processing (no change in 15 minutes while more than 100 tasks are waiting)
changes(celery_queue_length{queue_name="_total"}[15m]) == 0 and celery_queue_length{queue_name="_total"} > 100

# Databases that stopped being discovered (changes() never goes below 0, so use delta())
delta(celery_databases_discovered[10m]) < 0
```

## Troubleshooting

### Check Auto-Discovery Status

```bash
# View current configuration
kubectl get configmap celery-exporter-config -n celery-monitoring -o yaml

# Check exporter logs for discovery results
kubectl logs -n celery-monitoring deployment/celery-metrics-exporter

# Look for discovery messages like:
# "Database 0 (piefed): 1 queues, 245 total keys"
# "Auto-discovery complete: Found 3 databases with queues"
```

### Test Redis Connectivity

```bash
# Test connection to specific database
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER ping

# List the keys in a database (--scan uses SCAN instead of the blocking KEYS command)
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER --scan

# Check if a key is a list (queue)
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER type QUEUE_NAME

# Check queue length manually
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER llen QUEUE_NAME
```

### Validate Metrics

```bash
# Port forward the metrics endpoint (leave this running in a separate terminal)
kubectl port-forward -n celery-monitoring svc/celery-metrics-exporter 8000:8000

# Check discovery metrics (-s hides curl's progress meter when piping)
curl -s http://localhost:8000/metrics | grep celery_databases_discovered
curl -s http://localhost:8000/metrics | grep celery_queues_discovered

# Check queue metrics
curl -s http://localhost:8000/metrics | grep celery_queue_length
```
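
If you would rather check the port-forwarded endpoint from a script than with `grep`, here is a sketch that pulls `celery_queue_length` samples out of the text output. It assumes the standard Prometheus text format with one sample per line and label values that contain no `}`; escaped label values are not unescaped. `queue_lengths_from_metrics` is a hypothetical helper, and the text could be fetched with `urllib.request.urlopen("http://localhost:8000/metrics").read().decode()`.

```python
import re
from typing import Dict, Tuple

# Matches e.g.: celery_queue_length{queue_name="celery",database="piefed",db_number="0"} 1234
SAMPLE_RE = re.compile(r"^celery_queue_length\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)")
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def queue_lengths_from_metrics(text: str) -> Dict[Tuple[str, str], float]:
    """Extract {(database, queue_name): value} from Prometheus text output."""
    found: Dict[Tuple[str, str], float] = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # skips # HELP / # TYPE comments and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        key = (labels.get("database", ""), labels.get("queue_name", ""))
        found[key] = float(match.group("value"))
    return found
```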

### Debug Discovery Issues

If queues aren't being discovered:

1. **Check queue patterns** - Add your queue names to `queue_patterns`
2. **Verify queue type** - Ensure queues are Redis lists: `redis-cli type queue_name`
3. **Check database numbers** - Verify your app uses the expected Redis database
4. **Review logs** - Look for discovery debug messages in exporter logs

### Force Restart Discovery

```bash
# Restart the exporter to re-run discovery
kubectl rollout restart deployment/celery-metrics-exporter -n celery-monitoring
```

## Security Notes

- The exporter connects to Redis using the shared `redis-credentials` secret
- All database connections use the same Redis host and password
- Only queue length information is exposed, not queue contents
- The exporter scans all databases but only reports queue-like keys
- Metrics are scraped via ServiceMonitor for OpenTelemetry collection