Michael DiLeo 7327d77dcd redaction (#1 )

Add the redacted source file for demo purposes

Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1
Co-authored-by: Michael DiLeo <michael_dileo@proton.me>
Co-committed-by: Michael DiLeo <michael_dileo@proton.me>

2025-12-24 13:40:47 +00:00

Auto-Discovery Celery Metrics Exporter

The Celery metrics exporter now automatically discovers all Redis databases and their queues without requiring manual configuration. It scans all Redis databases (0-15) and identifies potential Celery queues based on patterns and naming conventions.

How Auto-Discovery Works

Automatic Database Scanning

Scans Redis databases 0-15 by default
Only monitors databases that contain keys
Only includes databases that have identifiable queues
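The scanning rules above can be sketched roughly as follows. This is an illustration, not the exporter's actual code: `discover_databases` and the `InMemoryRedis` stand-in are hypothetical names, and the stand-in only mimics the `dbsize()`, `scan_iter()` and `type()` calls that a redis-py client created with `decode_responses=True` would provide.

```python
from typing import Callable, Dict, Iterable, List


class InMemoryRedis:
    """Tiny stand-in for one Redis database, used only to make the sketch runnable."""

    def __init__(self, data: Dict[str, object]):
        self._data = data

    def dbsize(self) -> int:
        return len(self._data)

    def scan_iter(self) -> Iterable[str]:
        return iter(self._data)

    def type(self, key: str) -> str:
        return "list" if isinstance(self._data[key], list) else "string"


def discover_databases(
    connect: Callable[[int], InMemoryRedis],
    max_databases: int = 16,
) -> Dict[int, List[str]]:
    """Return {db_number: [list keys]} for databases that hold at least one list."""
    found: Dict[int, List[str]] = {}
    for db in range(max_databases):
        client = connect(db)
        if client.dbsize() == 0:  # skip databases that contain no keys
            continue
        lists = sorted(k for k in client.scan_iter() if client.type(k) == "list")
        if lists:  # skip databases without any candidate queue
            found[db] = lists
    return found


# Demo data: db 0 has a queue, db 1 has only a string key, the rest are empty.
_databases = {
    0: {"celery": ["task-1", "task-2"], "session:abc": "x"},
    1: {"cache:home": "html"},
}
result = discover_databases(lambda db: InMemoryRedis(_databases.get(db, {})))
print(result)  # {0: ['celery']}
```

In a real deployment the `connect` callable would presumably return one Redis client per database number, and the list keys would then be filtered by the queue discovery rules.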

Automatic Queue Discovery

The exporter supports two discovery modes:

Smart Filtering Mode (Default: `monitor_all_lists: false`)

Identifies queues using multiple strategies:

Pattern Matching: Matches known queue patterns from your applications:
- celery, *_priority, default, mailers, push, scheduler
- streams, images, suggested_users, email, connectors, lists, inbox, imports, import_triggered, misc (BookWyrm)
- background, send (PieFed)
- high, mmo (Pixelfed/Laravel)
Heuristic Detection: Identifies Redis lists containing queue-related keywords:
- Keys containing: queue, celery, task, job, work
Type Checking: Only considers Redis list type keys (Celery queues are Redis lists)

Monitor Everything Mode (`monitor_all_lists: true`)

Monitors ALL Redis list-type keys in all databases
No filtering or pattern matching
Maximum visibility but potentially more noise
Useful for debugging or comprehensive monitoring
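A minimal sketch of how the two modes could decide whether a key counts as a queue. The function name `is_queue_key`, the shortened pattern list, and the use of shell-style matching via `fnmatch` are assumptions made for illustration; the exporter's real matching logic may differ.

```python
from fnmatch import fnmatchcase
from typing import Sequence

# Subset of the patterns and keywords listed above (illustrative, not exhaustive).
DEFAULT_PATTERNS = ("celery", "*_priority", "default", "mailers", "push", "scheduler")
QUEUE_KEYWORDS = ("queue", "celery", "task", "job", "work")


def is_queue_key(
    key: str,
    key_type: str,
    patterns: Sequence[str] = DEFAULT_PATTERNS,
    monitor_all_lists: bool = False,
) -> bool:
    """Decide whether a Redis key should be monitored as a queue."""
    if key_type != "list":  # type check: Celery queues are Redis lists
        return False
    if monitor_all_lists:  # Monitor Everything mode: every list qualifies
        return True
    if any(fnmatchcase(key, pattern) for pattern in patterns):  # pattern matching
        return True
    lowered = key.lower()
    return any(word in lowered for word in QUEUE_KEYWORDS)  # keyword heuristic


print(is_queue_key("high_priority", "list"))  # True
print(is_queue_key("timeline:42", "list"))  # False
print(is_queue_key("timeline:42", "list", monitor_all_lists=True))  # True
```

Note that a substring heuristic like this also matches unrelated list keys such as `network_peers` (it contains `work`), which is one source of the noise that smart filtering can still let through.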

Which Mode Should You Use?

Use Smart Filtering (default) when:

✅ You want clean, relevant metrics
✅ You care about Prometheus cardinality limits
✅ Your applications use standard queue naming
✅ You want to avoid monitoring non-queue Redis lists

Use Monitor Everything when:

✅ You're debugging queue discovery issues
✅ You have non-standard queue names not covered by patterns
✅ You want absolute certainty you're not missing anything
✅ You have sufficient Prometheus storage/performance headroom
✅ You don't mind potential noise from non-queue lists

Configuration (Optional)

While the exporter works completely automatically, you can customize its behavior via the celery-exporter-config ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: celery-exporter-config
  namespace: celery-monitoring
data:
  config.yaml: |
    # Auto-discovery settings
    auto_discovery:
      enabled: true
      scan_databases: true  # Scan all Redis databases 0-15
      scan_queues: true     # Auto-discover queues in each database
      monitor_all_lists: false  # If true, monitor ALL Redis lists, not just queue-like ones
      
    # Queue patterns to look for (Redis list keys that are likely Celery queues)
    queue_patterns:
      - "celery"
      - "*_priority"
      - "default"
      - "mailers"
      - "push"
      - "scheduler"
      - "broadcast"
      - "federation"
      - "media"
      - "user_dir"
      
    # Optional: Database name mapping (if you want friendly names)
    # If not specified, databases will be named "db_0", "db_1", etc.
    database_names:
      0: "piefed"
      1: "mastodon"
      2: "matrix"
      3: "bookwyrm"
      
    # Minimum queue length to report (avoid noise from empty queues)
    min_queue_length: 0
    
    # Maximum number of databases to scan (safety limit)
    max_databases: 16
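For readers who want to see how these settings fit together, here is a hedged sketch of a config object with defaults mirroring the example above. `ExporterConfig`, `from_mapping` and `database_label` are hypothetical names, and the input is assumed to be the already-parsed contents of `config.yaml` (for example from a YAML parser); only the keys shown in the ConfigMap are handled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping


@dataclass
class ExporterConfig:
    enabled: bool = True
    scan_databases: bool = True
    scan_queues: bool = True
    monitor_all_lists: bool = False
    queue_patterns: List[str] = field(
        default_factory=lambda: ["celery", "*_priority", "default"]
    )
    database_names: Dict[int, str] = field(default_factory=dict)
    min_queue_length: int = 0
    max_databases: int = 16

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "ExporterConfig":
        """Overlay values from a parsed config.yaml onto the defaults."""
        cfg = cls()
        auto = raw.get("auto_discovery") or {}
        for name in ("enabled", "scan_databases", "scan_queues", "monitor_all_lists"):
            if name in auto:
                setattr(cfg, name, bool(auto[name]))
        if "queue_patterns" in raw:
            cfg.queue_patterns = list(raw["queue_patterns"])
        if "database_names" in raw:
            # Keys may arrive as strings or ints depending on the parser.
            cfg.database_names = {
                int(k): str(v) for k, v in raw["database_names"].items()
            }
        for name in ("min_queue_length", "max_databases"):
            if name in raw:
                setattr(cfg, name, int(raw[name]))
        return cfg

    def database_label(self, db_number: int) -> str:
        """Friendly name if configured, otherwise "db_<number>"."""
        return self.database_names.get(db_number, f"db_{db_number}")


cfg = ExporterConfig.from_mapping(
    {"auto_discovery": {"monitor_all_lists": True}, "database_names": {0: "piefed"}}
)
print(cfg.monitor_all_lists, cfg.database_label(0), cfg.database_label(5))
# True piefed db_5
```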

Adding New Applications

No configuration needed! New applications are automatically discovered when they:

Use a Redis database (any database 0-15)
Create queues that match common patterns or contain queue-related keywords
Use Redis lists for their queues (standard Celery behavior)

Custom Queue Patterns

If your application uses non-standard queue names, add them to the queue_patterns list:

kubectl edit configmap celery-exporter-config -n celery-monitoring

Add your pattern:

queue_patterns:
  - "celery"
  - "*_priority"
  - "my_custom_queue_*"  # Add your pattern here

Friendly Database Names

To give databases friendly names instead of db_0, db_1, etc.:

database_names:
  0: "piefed"
  1: "mastodon"
  2: "matrix"
  3: "bookwyrm"
  4: "my_new_app"  # Add your app here

Metrics Produced

The exporter produces these metrics for each discovered database:

`celery_queue_length`

Labels: queue_name, database, db_number
Description: Number of pending tasks in each queue
Example: celery_queue_length{queue_name="celery", database="piefed", db_number="0"} 1234
Special: queue_name="_total" shows total tasks across all queues in a database
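To illustrate how the per-queue samples and the `_total` sample relate, here is a small sketch that renders them in the Prometheus text format. The function names are hypothetical, and label values are assumed to contain no quotes, backslashes or newlines (no escaping is done).

```python
from typing import Dict, List


def _sample(queue_name: str, database: str, db_number: int, value: int) -> str:
    """Format one celery_queue_length sample line."""
    return (
        f'celery_queue_length{{queue_name="{queue_name}",'
        f'database="{database}",db_number="{db_number}"}} {value}'
    )


def render_queue_lengths(
    db_number: int, database: str, queues: Dict[str, int]
) -> List[str]:
    """One line per queue, plus a synthetic "_total" line for the database."""
    lines = [
        _sample(name, database, db_number, length)
        for name, length in sorted(queues.items())
    ]
    lines.append(_sample("_total", database, db_number, sum(queues.values())))
    return lines


for line in render_queue_lengths(0, "piefed", {"celery": 1200, "high_priority": 34}):
    print(line)
# celery_queue_length{queue_name="celery",database="piefed",db_number="0"} 1200
# celery_queue_length{queue_name="high_priority",database="piefed",db_number="0"} 34
# celery_queue_length{queue_name="_total",database="piefed",db_number="0"} 1234
```

Because `_total` already contains the sum, any aggregation over individual queues needs to exclude it with `queue_name!="_total"`, otherwise every task is counted twice.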

`redis_connection_status`

Labels: database, db_number
Description: Connection status per database (1=connected, 0=disconnected)
Example: redis_connection_status{database="piefed", db_number="0"} 1

`celery_databases_discovered`

Description: Total number of databases with queues discovered
Example: celery_databases_discovered 4

`celery_queues_discovered`

Labels: database
Description: Number of queues discovered per database
Example: celery_queues_discovered{database="bookwyrm"} 5

`celery_queue_info`

Description: General information about all monitored queues
Includes: Total lengths, Redis host, last update timestamp, auto-discovery status

PromQL Query Examples

Discovery Overview

# How many databases were discovered
celery_databases_discovered

# How many queues per database
celery_queues_discovered

# Auto-discovery status
celery_queue_info

All Applications Overview

# All queue lengths grouped by database
sum by (database) (celery_queue_length{queue_name!="_total"})

# Total tasks across all databases
sum(celery_queue_length{queue_name="_total"})

# Individual queues (excluding totals)
celery_queue_length{queue_name!="_total"}

# Only active queues (> 0 tasks)
celery_queue_length{queue_name!="_total"} > 0

Specific Applications

# PieFed queues only
celery_queue_length{database="piefed", queue_name!="_total"}

# BookWyrm high priority queue (if it exists)
celery_queue_length{database="bookwyrm", queue_name="high_priority"}

# All applications' main celery queue
celery_queue_length{queue_name="celery"}

# Database totals only
celery_queue_length{queue_name="_total"}

Processing Rates

# Net tasks drained per minute (positive = queue shrinking, negative = queue growing).
# celery_queue_length is a gauge, so use deriv() here; rate() is meant for counters
# and treats every decrease as a counter reset.
deriv(celery_queue_length{queue_name!="_total"}[5m]) * -60

# Net drain rate by database (using totals)
deriv(celery_queue_length{queue_name="_total"}[5m]) * -60

# Overall net drain rate across all databases
sum(deriv(celery_queue_length{queue_name="_total"}[5m]) * -60)
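To make the `* -60` factor in the queries above concrete: a per-second slope of the queue-length gauge, multiplied by -60, gives the net number of tasks drained per minute. The sketch below computes a least-squares slope, which is the kind of calculation PromQL's `deriv()` performs on a gauge. The function name and the sample data are invented for illustration.

```python
from typing import Sequence, Tuple


def slope_per_second(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (timestamp_seconds, value) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    return numerator / denominator


# Queue shrinking from 1200 to 900 tasks over 5 minutes, sampled every 60 s.
samples = [(0, 1200), (60, 1140), (120, 1080), (180, 1020), (240, 960), (300, 900)]
drained_per_minute = slope_per_second(samples) * -60
print(drained_per_minute)  # 60.0
```

This is a net figure: if workers finish 100 tasks per minute while producers enqueue 40, the result is 60, so it understates throughput whenever new tasks are arriving.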

Health Monitoring

# Databases with connection issues
redis_connection_status == 0

# Queues growing too fast (net growth of more than 1000 tasks in 5 minutes).
# delta() is the gauge counterpart of increase().
delta(celery_queue_length{queue_name!="_total"}[5m]) > 1000

# Stalled processing (no change in 15 minutes)
changes(celery_queue_length{queue_name="_total"}[15m]) == 0 and celery_queue_length{queue_name="_total"} > 100

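# Databases that stopped being discovered (count dropped within the last 10 minutes).
# changes() only counts value changes and is never negative, so use delta() instead.
delta(celery_databases_discovered[10m]) < 0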

Troubleshooting

Check Auto-Discovery Status

# View current configuration
kubectl get configmap celery-exporter-config -n celery-monitoring -o yaml

# Check exporter logs for discovery results
kubectl logs -n celery-monitoring deployment/celery-metrics-exporter

# Look for discovery messages like:
# "Database 0 (piefed): 1 queues, 245 total keys"
# "Auto-discovery complete: Found 3 databases with queues"

Test Redis Connectivity

# Test connection to specific database
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER ping

# Check what keys exist in a database
# (--scan iterates incrementally; KEYS '*' can block Redis on large databases)
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER --scan --pattern '*'

# Check if a key is a list (queue)
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER type QUEUE_NAME

# Check queue length manually
kubectl exec -n redis-system redis-master-0 -- redis-cli -a PASSWORD -n DB_NUMBER llen QUEUE_NAME

Validate Metrics

# Port forward to the metrics endpoint (stays in the foreground; run the curl commands in a second terminal)
kubectl port-forward -n celery-monitoring svc/celery-metrics-exporter 8000:8000

# Check discovery metrics
curl http://localhost:8000/metrics | grep celery_databases_discovered
curl http://localhost:8000/metrics | grep celery_queues_discovered

# Check queue metrics
curl http://localhost:8000/metrics | grep celery_queue_length
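The same check can be scripted. The following is a rough sketch that fetches the metrics text and extracts the `celery_queue_length` samples. The helper names are hypothetical, the URL assumes the port-forward above is active, and the regular expressions handle only simple label values without escaped quotes.

```python
import re
from typing import Dict, Tuple
from urllib.request import urlopen

# Matches e.g.: celery_queue_length{queue_name="celery",database="piefed"} 12
_SAMPLE = re.compile(
    r"^celery_queue_length\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)(?:\s+\d+)?$"
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def fetch_metrics(url: str = "http://localhost:8000/metrics") -> str:
    """Download the raw metrics text (requires the port-forward to be running)."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_queue_lengths(metrics_text: str) -> Dict[Tuple[str, str], float]:
    """Map (database, queue_name) to the reported queue length."""
    lengths: Dict[Tuple[str, str], float] = {}
    for line in metrics_text.splitlines():
        match = _SAMPLE.match(line.strip())
        if not match:  # comments, blank lines and other metrics are skipped
            continue
        labels = dict(_LABEL.findall(match.group("labels")))
        key = (labels.get("database", ""), labels.get("queue_name", ""))
        lengths[key] = float(match.group("value"))
    return lengths


example = """\
# HELP celery_queue_length Number of pending tasks in each queue
celery_queue_length{queue_name="celery", database="piefed", db_number="0"} 1234
celery_queue_length{queue_name="_total", database="piefed", db_number="0"} 1234
redis_connection_status{database="piefed", db_number="0"} 1
"""
print(parse_queue_lengths(example))
# {('piefed', 'celery'): 1234.0, ('piefed', '_total'): 1234.0}
```

Against a live exporter, `parse_queue_lengths(fetch_metrics())` would return the current values; an empty result suggests that no queues were discovered.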

Debug Discovery Issues

If queues aren't being discovered:

Check queue patterns - Add your queue names to queue_patterns
Verify queue type - Ensure queues are Redis lists: redis-cli type queue_name
Check database numbers - Verify your app uses the expected Redis database
Review logs - Look for discovery debug messages in exporter logs

Force Restart Discovery

# Restart the exporter to re-run discovery
kubectl rollout restart deployment/celery-metrics-exporter -n celery-monitoring

Security Notes

The exporter connects to Redis using the shared redis-credentials secret
All database connections use the same Redis host and password
Only queue length information is exposed, not queue contents
The exporter scans all databases but only reports queue-like keys
Metrics are scraped via ServiceMonitor for OpenTelemetry collection
