BookWyrm Celery Beat to Kubernetes CronJob Migration
Overview
This document outlines the migration from BookWyrm's Celery beat container to Kubernetes CronJobs. The beat container runs continuously to schedule periodic tasks; it can be replaced with more efficient, Kubernetes-native CronJobs.
Current Beat Container Analysis
What Celery Beat Does
The current deployment-beat.yaml runs a Celery beat scheduler that:
- Uses django_celery_beat.schedulers:DatabaseScheduler to store schedules in the database
- Manages periodic task execution by queuing tasks to Redis for workers to pick up
- Runs continuously, consuming resources (100m CPU, 256Mi memory); a fragment of this container spec is sketched below for reference
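The scheduler portion of that container looks roughly like the following fragment. The celerywyrm app name matches the upstream project layout, but the image reference and exact arguments are assumptions; check deployment-beat.yaml for the real values.
containers:
  - name: beat
    image: <bookwyrm-image>   # placeholder; use the image from the other BookWyrm deployments
    command:
      - celery
      - -A
      - celerywyrm
      - beat
      - -l
      - INFO
      - --scheduler
      - django_celery_beat.schedulers:DatabaseScheduler
    resources:
      requests:
        cpu: 100m
        memory: 256Mi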
Scheduled Tasks Identified
Through analysis of the BookWyrm source code, we identified two main periodic tasks:
- Automod Task (bookwyrm.models.antispam.automod_task)
  - Function: Scans users and statuses for moderation flags based on AutoMod rules
  - Purpose: Automatically flags suspicious content and users for moderator review
  - Trigger: Only runs when AutoMod rules exist in the database
  - Recommended Schedule: Every 6 hours (adjustable based on community size)
- Update Check Task (bookwyrm.models.site.check_for_updates_task)
  - Function: Checks the GitHub API for new BookWyrm releases
  - Purpose: Notifies administrators when updates are available
  - Trigger: Makes an HTTP request to the GitHub releases API
  - Recommended Schedule: Daily at 3:00 AM UTC
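The module paths above translate directly into the shell one-liners the CronJobs will run. A minimal sketch of the container command, assuming manage.py sits in the image's working directory (adjust the invocation to match the BookWyrm image you use):
command: ["python", "manage.py", "shell", "-c"]
args:
  - "from bookwyrm.models.antispam import automod_task; automod_task()"
# For the update check job, the args entry would instead be:
#   "from bookwyrm.models.site import check_for_updates_task; check_for_updates_task()"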
Migration Strategy
Phase 1: Parallel Operation (Recommended)
- Deploy CronJobs alongside existing beat container
- Monitor CronJob execution for several days
- Verify tasks execute correctly and at expected intervals
- Compare resource usage between approaches
Phase 2: Beat Container Removal
- Remove deployment-beat.yaml from the kustomization
- Clean up any database-stored periodic tasks (if desired)
- Monitor for any missed functionality
CronJob Implementation
Key Design Decisions
- Direct Task Execution: Instead of going through Celery, CronJobs invoke the tasks directly via the Django management shell
- Resource Optimization: Each job uses minimal resources (50-100m CPU, 128-256Mi memory) and only when running
- Security: Same security context as other BookWyrm containers (non-root, dropped capabilities)
- Scheduling: Uses standard cron expressions for predictable timing
- Job Management: Configures history limits and TTL for automatic cleanup
CronJob Specifications
Automod CronJob
- Schedule: 0 */6 * * * (every 6 hours)
- Command: Direct Python execution of automod_task() (see the manifest sketch after this list)
- Resources: 50m CPU, 128Mi memory
- Concurrency: Forbid (prevent overlapping executions)
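A minimal manifest sketch for the automod job, pulling the design decisions together. The image reference, secret name, and history/TTL values are assumptions to adapt to the rest of this deployment:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: bookwyrm-automod
  namespace: bookwyrm-application
spec:
  schedule: "0 */6 * * *"
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 600
  successfulJobsHistoryLimit: 3        # assumed value
  failedJobsHistoryLimit: 3            # assumed value
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 86400   # assumed value; cleans up finished jobs after a day
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: automod
              image: <bookwyrm-image>  # reuse the image from the web/worker deployments
              command: ["python", "manage.py", "shell", "-c"]
              args:
                - "from bookwyrm.models.antispam import automod_task; automod_task()"
              envFrom:
                - secretRef:
                    name: bookwyrm-secrets   # assumed name; reuse the env sources of the other pods
              resources:
                requests:
                  cpu: 50m
                  memory: 128Mi
                limits:
                  cpu: 50m
                  memory: 128Mi
              securityContext:
                runAsNonRoot: true
                allowPrivilegeEscalation: false
                capabilities:
                  drop: ["ALL"]
The update-check and cleanup jobs follow the same shape, varying only the schedule, command, and resources.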
Update Check CronJob
- Schedule: 0 3 * * * (daily at 3:00 AM UTC)
- Command: Direct Python execution of check_for_updates_task()
- Resources: 50m CPU, 128Mi memory
- Concurrency: Forbid (prevent overlapping executions)
Database Cleanup CronJob (Bonus)
- Schedule: 0 2 * * 0 (weekly on Sunday at 2:00 AM UTC)
- Command: Django shell script to clean expired sessions and old notifications (sketched below)
- Resources: 100m CPU, 256Mi memory
- Purpose: Maintain database health (not part of original beat functionality)
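A sketch of the cleanup command under the same CronJob shape. clearsessions is a standard Django management command; the notification pruning is illustrative only, and the model/field names and 90-day window are assumptions to verify against the BookWyrm schema:
command: ["sh", "-c"]
args:
  - |
    python manage.py clearsessions
    python manage.py shell -c "from datetime import timedelta; from django.utils import timezone; from bookwyrm.models import Notification; Notification.objects.filter(read=True, created_date__lt=timezone.now() - timedelta(days=90)).delete()"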
Benefits of Migration
Resource Efficiency
- Before: the beat container runs 24/7, consuming ~100m CPU and 256Mi of memory
- After: CronJobs run only when needed, typically completing in under a minute
- Savings: roughly a 99% reduction in resource usage for periodic tasks (a few minutes of job runtime per day versus an always-on container)
Operational Benefits
- Kubernetes Native: Leverage built-in CronJob features (history, TTL, concurrency control)
- Observability: Better visibility into job execution and failures
- Reliability: Task scheduling no longer depends on a single long-running beat pod
- Maintenance: Easier to modify schedules without redeploying beat container
Simplified Architecture
- Removes dependency on Celery beat scheduler
- Reduces Redis usage (no beat schedule storage)
- Eliminates one running container (reduced complexity)
Migration Steps
1. Deploy CronJobs
# Apply the new CronJob manifests
kubectl apply -f manifests/applications/bookwyrm/cronjobs.yaml
2. Verify CronJob Creation
# Check CronJobs are created
kubectl get cronjobs -n bookwyrm-application
# Check for any immediate execution (if testing)
kubectl get jobs -n bookwyrm-application
3. Monitor Execution (Run for 1-2 weeks)
# Watch job execution
kubectl get jobs -n bookwyrm-application -w
# Check job logs
kubectl logs job/bookwyrm-automod-<timestamp> -n bookwyrm-application
kubectl logs job/bookwyrm-update-check-<timestamp> -n bookwyrm-application
4. Optional: Disable Beat Container (Testing)
# Scale down beat deployment temporarily
kubectl scale deployment bookwyrm-beat --replicas=0 -n bookwyrm-application
# Monitor for any issues for several days
5. Permanent Migration
# Remove beat from kustomization.yaml
# Comment out or remove: - deployment-beat.yaml
# Apply changes
kubectl apply -k manifests/applications/bookwyrm/
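The resulting kustomization.yaml might look like this; file names other than cronjobs.yaml and deployment-beat.yaml are placeholders for whatever the directory already contains:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment-web.yaml      # placeholder for existing resources
  - deployment-worker.yaml   # placeholder
  # - deployment-beat.yaml   # removed: replaced by the CronJobs below
  - cronjobs.yaml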
6. Cleanup (Optional)
# Remove beat deployment entirely
kubectl delete deployment bookwyrm-beat -n bookwyrm-application
# Clean up database periodic tasks (if desired)
# This requires connecting to the BookWyrm admin panel or the database directly
Schedule Customization
Automod Schedule Adjustment
If your instance has high activity, you might want more frequent automod checks:
# For every 2 hours instead of 6:
schedule: "0 */2 * * *"
# For hourly:
schedule: "0 * * * *"
Update Check Frequency
For development instances, you might want more frequent update checks:
# For twice daily:
schedule: "0 3,15 * * *"
# For weekly instead of daily:
schedule: "0 3 * * 0"
Troubleshooting
CronJob Not Executing
# Check CronJob status
kubectl describe cronjob bookwyrm-automod -n bookwyrm-application
# Check for suspended jobs
kubectl get cronjobs -n bookwyrm-application -o wide
Job Failures
# Check failed job logs
kubectl logs job/bookwyrm-automod-<timestamp> -n bookwyrm-application
# Common issues:
# - Database connection problems
# - Missing environment variables
# - Redis connectivity issues
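For missing environment variables in particular, make sure the CronJob pod template reuses the same env sources as the web and worker pods. A sketch, with assumed ConfigMap/Secret names:
envFrom:
  - configMapRef:
      name: bookwyrm-config    # assumed name
  - secretRef:
      name: bookwyrm-secrets   # assumed name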
Missed Executions
# Check for node resource constraints
kubectl top nodes
# Verify startingDeadlineSeconds is appropriate
# Current setting: 600 seconds (10 minutes)
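startingDeadlineSeconds sits at the top level of the CronJob spec; a fragment for reference:
spec:
  schedule: "0 */6 * * *"
  startingDeadlineSeconds: 600   # tolerate up to 10 minutes of scheduling delay before a run counts as missed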
Rollback Plan
If issues arise, rollback is straightforward:
- Scale up the beat container: kubectl scale deployment bookwyrm-beat --replicas=1 -n bookwyrm-application
- Remove the CronJobs: kubectl delete cronjobs bookwyrm-automod bookwyrm-update-check -n bookwyrm-application
- Restore the original kustomization.yaml
Monitoring and Alerting
Consider setting up monitoring for:
- CronJob execution failures
- Job duration anomalies
- Missing job executions
- Resource usage patterns
Example Prometheus alert:
- alert: BookWyrmCronJobFailed
  expr: kube_job_status_failed{namespace="bookwyrm-application"} > 0
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: "BookWyrm CronJob failed"
    description: "CronJob {{ $labels.job_name }} failed in namespace {{ $labels.namespace }}"
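A possible companion alert for missed executions, assuming kube-state-metrics is scraped. The 26-hour threshold fits the 6-hourly and daily jobs; the weekly cleanup job would need its own threshold or a label filter:
- alert: BookWyrmCronJobNotScheduled
  expr: time() - kube_cronjob_status_last_schedule_time{namespace="bookwyrm-application"} > 26 * 3600
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: "BookWyrm CronJob has not been scheduled recently"
    description: "CronJob {{ $labels.cronjob }} in {{ $labels.namespace }} has not started a run in over 26 hours"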
Conclusion
This migration replaces the continuously running Celery beat container with efficient Kubernetes CronJobs, providing the same functionality with significantly reduced resource consumption and improved operational characteristics. The migration can be done gradually with minimal risk.