# PieFed Database Migration Setup
## Overview
Database migrations are handled by a **dedicated Kubernetes Job** that must complete before the web and worker containers start. Because only that Job runs `flask db upgrade`, pods no longer race each other to migrate the schema at startup.
## Architecture
```
1. piefed-db-init Job (runs once)
   ├── Uses entrypoint-init.sh
   ├── Waits for DB and Redis
   ├── Runs: flask db upgrade
   └── Exits on completion

2. Web/Worker Deployments (wait for Job)
   ├── Init Container: wait-for-migrations
   │   ├── Watches Job status
   │   └── Blocks until Job completes
   └── Main Container: starts after init passes
```
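The Job steps above (wait for dependencies, then migrate) can be sketched as follows. This is a minimal illustration and not the actual `entrypoint-init.sh`: the variable names `DB_HOST`, `DB_PORT`, `REDIS_HOST`, and `REDIS_PORT`, their defaults, and the 60-second timeout are all assumptions, and the port check relies on bash's `/dev/tcp` feature.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an init entrypoint; not the real entrypoint-init.sh.

# Wait until host:port accepts TCP connections, or give up after a timeout (seconds).
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}"
  local deadline=$(( $(date +%s) + timeout ))
  # bash's /dev/tcp pseudo-device opens a TCP connection without extra tools
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out waiting for ${host}:${port}" >&2
      return 1
    fi
    sleep 1
  done
  echo "${host}:${port} is reachable"
}

# Wait for both dependencies, then apply migrations.
run_init() {
  # DB_HOST, DB_PORT, REDIS_HOST, REDIS_PORT are assumed variable names
  wait_for_port "${DB_HOST:-postgres}" "${DB_PORT:-5432}" || return 1
  wait_for_port "${REDIS_HOST:-redis}" "${REDIS_PORT:-6379}" || return 1
  flask db upgrade
}
```

Calling `run_init` as the last line of the script makes the Job's exit code follow the migration result, which is what lets the Job retry on failure.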
## Components
### 1. Database Init Job
**File**: `job-db-init.yaml`
- Runs migrations using `entrypoint-init.sh`
- Must complete before the web and worker main containers start
- Retries up to 3 times on failure
- Kept for 24h after completion (for debugging)
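To confirm the retry and retention behaviour on a live cluster, a check like the one below may help. It assumes the manifest expresses "3 retries" as `spec.backoffLimit: 3` and "kept for 24h" as `spec.ttlSecondsAfterFinished: 86400`; `job-db-init.yaml` may implement these differently, so treat the expected values as assumptions.

```bash
#!/usr/bin/env bash
# Sketch: compare the Job's retry/retention fields with the values described above.
check_job_settings() {
  local expected_retries=3
  local expected_ttl=$(( 24 * 60 * 60 ))  # 24h in seconds
  local actual
  actual="$(kubectl get job piefed-db-init -n piefed-application \
    -o jsonpath='{.spec.backoffLimit} {.spec.ttlSecondsAfterFinished}')"
  if [ "$actual" = "$expected_retries $expected_ttl" ]; then
    echo "ok: backoffLimit=$expected_retries, ttlSecondsAfterFinished=$expected_ttl"
  else
    echo "unexpected settings: '$actual'" >&2
    return 1
  fi
}
```

Once the retention period has passed and the Job has been cleaned up, `kubectl get job` returns nothing and the check reports unexpected settings.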
### 2. Init Containers (Web & Worker)
**Files**: `deployment-web.yaml`, `deployment-worker.yaml`
- Wait for `piefed-db-init` Job to complete
- Timeout after 10 minutes
- Show migration logs if Job fails
- Block pod startup until migrations succeed
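The behaviour listed above could be implemented with a polling loop along these lines. It is a sketch under assumptions, not the script shipped in the deployments: the 5-second poll interval, the `--tail=50` log excerpt, and the function names are illustrative, while the Job name, namespace, and 10-minute timeout come from this document.

```bash
#!/usr/bin/env bash
# Sketch of a wait-for-migrations loop (illustrative, not the shipped script).
NAMESPACE="piefed-application"
JOB="piefed-db-init"

# Print the status ("True" or empty) of one Job condition type, e.g. Complete or Failed.
job_condition() {
  kubectl get job "$JOB" -n "$NAMESPACE" \
    -o jsonpath="{.status.conditions[?(@.type==\"$1\")].status}" 2>/dev/null
}

# Returns 0 when the Job completed, 1 when it failed, 2 on timeout.
wait_for_migrations() {
  local timeout="${1:-600}" interval="${2:-5}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if [ "$(job_condition Complete)" = "True" ]; then
      echo "migrations complete"
      return 0
    fi
    if [ "$(job_condition Failed)" = "True" ]; then
      echo "migration Job failed; recent logs:" >&2
      kubectl logs -n "$NAMESPACE" "job/$JOB" --tail=50 >&2
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  echo "timed out after ${timeout}s waiting for $JOB" >&2
  return 2
}
```

Checking the `Failed` condition as well as `Complete` lets the init container stop early and print logs instead of waiting out the full timeout.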
### 3. RBAC Permissions
**File**: `rbac-init-checker.yaml`
- ServiceAccount: `piefed-init-checker`
- Permissions to read Job status and logs
- Scoped to `piefed-application` namespace only
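For orientation, equivalent RBAC objects can be rendered with `kubectl create ... --dry-run=client`. The verbs and resources below are an assumption about what "read Job status and logs" needs (get/list/watch on Jobs, pods, and pod logs), and naming the Role `piefed-init-checker` is also assumed; `rbac-init-checker.yaml` remains the source of truth.

```bash
#!/usr/bin/env bash
# Sketch: print manifests similar to what rbac-init-checker.yaml might contain.
render_init_checker_rbac() {
  local ns="piefed-application" name="piefed-init-checker"
  kubectl create serviceaccount "$name" -n "$ns" --dry-run=client -o yaml
  echo "---"
  # Assumed rule set: read Jobs, plus pods and pod logs for failure output
  kubectl create role "$name" -n "$ns" \
    --verb=get,list,watch \
    --resource=jobs.batch,pods,pods/log \
    --dry-run=client -o yaml
  echo "---"
  kubectl create rolebinding "$name" -n "$ns" \
    --role="$name" \
    --serviceaccount="$ns:$name" \
    --dry-run=client -o yaml
}
```

Using a namespaced Role and RoleBinding, not a ClusterRole, is what keeps the permissions scoped to `piefed-application`.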
## Deployment Flow
```mermaid
sequenceDiagram
participant Flux
participant RBAC as RBAC Resources
participant Job as DB Init Job
participant Init as Init Containers
participant Pods as Web/Worker Pods
Flux->>RBAC: 1. Create ServiceAccount + Role
Flux->>Job: 2. Create Job
Job->>Job: 3. Run migrations
Flux->>Init: 4. Start Deployments
Init->>Job: 5. Wait for Job complete
Job-->>Init: 6. Job successful
Init->>Pods: 7. Start main containers
```
## First-Time Setup
### 1. Build New Container Images
The base image now includes `entrypoint-init.sh`:
```bash
cd build/piefed
./build-all.sh
```
### 2. Apply Manifests
Flux will automatically pick up changes, or apply manually:
```bash
# Apply everything
kubectl apply -k manifests/applications/piefed/
# Watch the migration Job
kubectl logs -f -n piefed-application job/piefed-db-init
# Watch pods waiting for migrations
kubectl get pods -n piefed-application -w
```
## Upgrade Process (New Versions)
When upgrading PieFed to a new version with schema changes:
```bash
# 1. Build and push new images
cd build/piefed
./build-all.sh
# 2. Delete old Job (its pod template is immutable, so it must be recreated to run the new image)
kubectl delete job piefed-db-init -n piefed-application
# 3. Apply manifests (Job will recreate)
kubectl apply -k manifests/applications/piefed/
# 4. Watch migration progress
kubectl logs -f -n piefed-application job/piefed-db-init
# 5. Verify Job completed (if the Job fails, this waits out the full timeout)
kubectl wait --for=condition=complete --timeout=300s \
  job/piefed-db-init -n piefed-application
# 6. Restart deployments to pick up new image
kubectl rollout restart deployment piefed-web -n piefed-application
kubectl rollout restart deployment piefed-worker -n piefed-application
```
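After step 6 it can be useful to confirm that both restarts actually finished. The helper below is a sketch; the function name and the 600-second timeout are assumptions, while the deployment names and namespace are the ones used above.

```bash
#!/usr/bin/env bash
# Sketch: wait for both deployments to finish rolling out after the restart.
verify_rollouts() {
  local ns="piefed-application" deploy failed=0
  for deploy in piefed-web piefed-worker; do
    if kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=600s; then
      echo "$deploy: rolled out"
    else
      echo "$deploy: rollout did not finish" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

A non-zero return here usually means new pods are still blocked in their init container, in which case the troubleshooting steps below apply.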
## Troubleshooting
### Migration Job Failed
```bash
# Check Job status
kubectl get job piefed-db-init -n piefed-application
# View full logs
kubectl logs -n piefed-application job/piefed-db-init
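# Check the applied schema revision (needs a running piefed-web pod,
# so this fails while every web pod is still stuck in Init)
kubectl exec -n piefed-application deployment/piefed-web -- \
  flask db current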
```
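Because the Job retries on failure, there may be several pods, one per attempt, and `kubectl logs job/...` shows only one of them. The sketch below prints logs for every attempt by selecting on the `job-name` label that Kubernetes adds to Job pods; the function name and the default of 100 lines are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: print the tail of the logs from every pod the Job has created.
job_attempt_logs() {
  local ns="piefed-application" job="piefed-db-init" pod
  # -o name yields entries such as pod/<name>, which kubectl logs accepts directly
  for pod in $(kubectl get pods -n "$ns" -l "job-name=$job" -o name); do
    echo "===== $pod ====="
    kubectl logs -n "$ns" "$pod" --tail="${1:-100}"
  done
}
```

Running `job_attempt_logs 20` limits the output to the last 20 lines per attempt, which is usually enough to compare the error across retries.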
### Pods Stuck in Init
```bash
# Check init container logs
kubectl logs -n piefed-application <pod-name> -c wait-for-migrations
# Check if Job is running
kubectl get job piefed-db-init -n piefed-application
# Manual Job completion check
kubectl get job piefed-db-init -n piefed-application \
-o jsonpath='{.status.conditions[?(@.type=="Complete")].status}'
```
### RBAC Permissions Issue
```bash
# Verify ServiceAccount exists
kubectl get sa piefed-init-checker -n piefed-application
# Check Role binding
kubectl get rolebinding piefed-init-checker -n piefed-application
# Test permissions by impersonating the ServiceAccount
kubectl auth can-i get jobs \
  --as=system:serviceaccount:piefed-application:piefed-init-checker \
  -n piefed-application
```
## Benefits
- **No Race Conditions**: A single Job runs the migrations, so pods never migrate concurrently
- **Proper Ordering**: Init containers enforce dependencies
- **Clean Separation**: Web/worker focus on their primary roles
- **Easy Debugging**: Clear logs for each stage
- **GitOps Compatible**: Works with Flux CD
- **Idempotent**: Safe to re-run; the Job tracks its own completion state
- **Fast Scaling**: Web/worker pods start immediately after migrations
## Migration from Old Setup
The old setup had `PIEFED_INIT_CONTAINER=true` on all pods, causing race conditions.
**Changes Made**:
1. ✅ Removed `PIEFED_INIT_CONTAINER` env var from all pods
2. ✅ Removed migration logic from `entrypoint-common.sh`
3. ✅ Created dedicated `entrypoint-init.sh` for Job
4. ✅ Added init containers to wait for Job
5. ✅ Created RBAC for Job status checking
**Before deploying**, ensure you rebuild images with the new entrypoint script!
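To double-check change 1 on a running cluster, a sketch like the following looks for the old variable on both deployments. The function name is hypothetical, and the check only inspects regular containers in the pod template, not init containers.

```bash
#!/usr/bin/env bash
# Sketch: verify that no web/worker container still sets PIEFED_INIT_CONTAINER.
check_init_env_removed() {
  local ns="piefed-application" deploy found status=0
  for deploy in piefed-web piefed-worker; do
    # Prints the variable name once per container that still defines it
    found="$(kubectl get deployment "$deploy" -n "$ns" \
      -o jsonpath='{.spec.template.spec.containers[*].env[?(@.name=="PIEFED_INIT_CONTAINER")].name}')"
    if [ -n "$found" ]; then
      echo "$deploy still sets PIEFED_INIT_CONTAINER" >&2
      status=1
    else
      echo "$deploy: PIEFED_INIT_CONTAINER not set"
    fi
  done
  return "$status"
}
```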