# PieFed Database Migration Setup

## Overview

Database migrations are now handled by a **dedicated Kubernetes Job** that runs before the web and worker pods start. Because only this Job runs `flask db upgrade`, web and worker pods no longer race each other to migrate the schema on startup, and migrations become a single, observable step instead of a side effect of pod startup.
## Architecture

```
1. piefed-db-init Job (runs once)
   ├── Uses entrypoint-init.sh
   ├── Waits for DB and Redis
   ├── Runs: flask db upgrade
   └── Exits on completion

2. Web/Worker Deployments (wait for Job)
   ├── Init Container: wait-for-migrations
   │   ├── Watches Job status
   │   └── Blocks until Job completes
   └── Main Container: starts after init passes
```
## Components

### 1. Database Init Job

**File**: `job-db-init.yaml`

- Runs migrations using `entrypoint-init.sh`
- Must complete before the web and worker main containers start
- Retries up to 3 times on failure
- Kept for 24h after completion (for debugging)
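A minimal sketch of what a manifest like `job-db-init.yaml` could contain, written as a Bash function that prints the YAML so it can be piped to `kubectl apply -f -`. The retry and retention bullets above map onto the standard Job fields `backoffLimit` and `ttlSecondsAfterFinished`; the function name `render_db_init_job`, the container name `db-init`, and the `/entrypoint-init.sh` path are assumptions, not taken from the real manifest.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a minimal Job manifest for the migration Job.
# Container name and entrypoint path are assumptions; compare with job-db-init.yaml.
render_db_init_job() {
  if [[ -z "${1:-}" ]]; then
    echo "usage: render_db_init_job IMAGE" >&2
    return 2
  fi
  local image="$1"
  cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: piefed-db-init
  namespace: piefed-application
spec:
  backoffLimit: 3                 # retry a failed migration up to 3 times
  ttlSecondsAfterFinished: 86400  # garbage-collect the Job 24h after it finishes
  template:
    spec:
      restartPolicy: Never        # each retry runs in a fresh pod with its own logs
      containers:
        - name: db-init
          image: ${image}
          command: ["/entrypoint-init.sh"]  # assumed path inside the image
EOF
}
```

For example, `render_db_init_job <registry>/piefed-base:<tag> | kubectl apply -f -` (placeholder image name). With `restartPolicy: Never`, failed attempts leave their pods behind, so each attempt's logs stay available until the TTL removes the Job and its pods.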
### 2. Init Containers (Web & Worker)

**Files**: `deployment-web.yaml`, `deployment-worker.yaml`

- Wait for the `piefed-db-init` Job to complete
- Time out after 10 minutes
- Print the migration logs if the Job fails
- Block pod startup until migrations succeed
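The behaviour listed above (wait, time out after 10 minutes, print logs on failure) can be sketched as a polling loop. This is a hypothetical version of what the `wait-for-migrations` init container might run, assuming `kubectl` is available in its image and the pod runs as the `piefed-init-checker` ServiceAccount; the function name, its arguments, and the 5-second poll interval are illustrative.

```bash
#!/usr/bin/env bash
# Sketch of a wait-for-migrations loop (hypothetical; not the actual init script).
# Usage: wait_for_job [JOB] [NAMESPACE] [TIMEOUT_SECONDS] [POLL_SECONDS]
wait_for_job() {
  local job="${1:-piefed-db-init}"
  local ns="${2:-piefed-application}"
  local timeout="${3:-600}"    # 10 minutes, matching the documented timeout
  local interval="${4:-5}"     # assumed poll interval
  local waited=0 complete failed

  while (( waited < timeout )); do
    # A condition's status is "True" once set; empty output means not set (yet).
    complete="$(kubectl get job "$job" -n "$ns" \
      -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}' 2>/dev/null)"
    failed="$(kubectl get job "$job" -n "$ns" \
      -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}' 2>/dev/null)"

    if [[ "$complete" == "True" ]]; then
      echo "Job $job completed; continuing pod startup"
      return 0
    fi
    if [[ "$failed" == "True" ]]; then
      echo "Job $job failed; migration logs follow:" >&2
      kubectl logs -n "$ns" "job/$job" >&2 || true
      return 1
    fi

    sleep "$interval"
    waited=$((waited + interval))
  done

  echo "Timed out after ${timeout}s waiting for Job $job" >&2
  return 1
}
```

`kubectl wait --for=condition=complete` alone would be shorter, but it waits on a single condition, so a failed Job would only surface once the timeout expires. Checking both `Complete` and `Failed` lets the init container fail fast and show the migration logs.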
### 3. RBAC Permissions

**File**: `rbac-init-checker.yaml`

- ServiceAccount: `piefed-init-checker`
- Permissions to read Job status and logs
- Scoped to `piefed-application` namespace only
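A sketch of the objects a file like `rbac-init-checker.yaml` might define, again as a Bash function that prints YAML. The ServiceAccount and RoleBinding names match the troubleshooting commands in this document; the Role name, the function name, and the exact rules are assumptions. Reading logs through `kubectl logs job/...` typically also needs access to pods and the `pods/log` subresource, so those are granted alongside `jobs`. Using a namespaced `Role` rather than a `ClusterRole` is what keeps the grant inside one namespace.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints ServiceAccount, Role and RoleBinding for the init checker.
# Rules are an assumed minimum for reading Job status and migration logs.
render_init_checker_rbac() {
  local ns="${1:-piefed-application}"
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: piefed-init-checker
  namespace: ${ns}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role                        # namespaced: grants nothing outside ${ns}
metadata:
  name: piefed-init-checker
  namespace: ${ns}
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: piefed-init-checker
  namespace: ${ns}
subjects:
  - kind: ServiceAccount
    name: piefed-init-checker
    namespace: ${ns}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: piefed-init-checker
EOF
}
```

The web and worker pod specs would then need `serviceAccountName: piefed-init-checker` for their init containers to use these permissions.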
## Deployment Flow

```mermaid
sequenceDiagram
    participant Flux
    participant RBAC as RBAC Resources
    participant Job as DB Init Job
    participant Init as Init Containers
    participant Pods as Web/Worker Pods

    Flux->>RBAC: 1. Create ServiceAccount + Role
    Flux->>Job: 2. Create Job
    Job->>Job: 3. Run migrations
    Flux->>Init: 4. Start Deployments
    Init->>Job: 5. Wait for Job to complete
    Job-->>Init: 6. Job successful
    Init->>Pods: 7. Start main containers
```
## First-Time Setup

### 1. Build New Container Images

The base image now includes `entrypoint-init.sh`, so it must be rebuilt before deploying:

```bash
# Run from the repository root; the subshell keeps your working directory unchanged
(cd build/piefed && ./build-all.sh)
```

### 2. Apply Manifests

Flux picks up the changes automatically on its next reconciliation. To apply them manually instead:

```bash
# Apply everything
kubectl apply -k manifests/applications/piefed/

# Watch the migration Job
kubectl logs -f -n piefed-application job/piefed-db-init

# Watch pods waiting for migrations
kubectl get pods -n piefed-application -w
```
## Upgrade Process (New Versions)

When upgrading PieFed to a new version with schema changes, run the following from the repository root:

```bash
# 1. Build and push new images (subshell so the relative paths below still work)
(cd build/piefed && ./build-all.sh)

# 2. Delete the old Job: a finished Job never re-runs and its pod template
#    is immutable, so it must be recreated to migrate with the new image
kubectl delete job piefed-db-init -n piefed-application

# 3. Apply manifests (recreates the Job)
kubectl apply -k manifests/applications/piefed/

# 4. Watch migration progress
kubectl logs -f -n piefed-application job/piefed-db-init

# 5. Verify the Job completed
kubectl wait --for=condition=complete --timeout=300s \
  job/piefed-db-init -n piefed-application

# 6. Restart deployments to pick up the new image
kubectl rollout restart deployment piefed-web -n piefed-application
kubectl rollout restart deployment piefed-worker -n piefed-application
```
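Steps 2, 3, 5 and 6 above can be wrapped in a small script so that a failed migration stops the upgrade before the Deployments restart. This is a hedged sketch, meant to be run from the repository root after step 1; the function name is hypothetical, and `--ignore-not-found` only makes step 2 safe when no Job exists yet.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for the upgrade steps; run from the repo root after building images.
upgrade_piefed() {
  local ns=piefed-application
  local job=piefed-db-init

  # Step 2: a finished Job cannot be re-run or edited in place, so recreate it.
  kubectl delete job "$job" -n "$ns" --ignore-not-found || return 1
  # Step 3: apply manifests, which creates a fresh migration Job.
  kubectl apply -k manifests/applications/piefed/ || return 1
  # Step 5: block until migrations finish; print the logs and stop if they do not.
  if ! kubectl wait --for=condition=complete --timeout=300s "job/$job" -n "$ns"; then
    kubectl logs -n "$ns" "job/$job" >&2
    return 1
  fi
  # Step 6: restart web and worker so they pick up the new image.
  kubectl rollout restart deployment piefed-web -n "$ns" || return 1
  kubectl rollout restart deployment piefed-worker -n "$ns" || return 1
}
```

Stopping before step 6 means the running pods keep serving the old version while you inspect the failed migration.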
## Troubleshooting

### Migration Job Failed

```bash
# Check Job status
kubectl get job piefed-db-init -n piefed-application

# View full logs
kubectl logs -n piefed-application job/piefed-db-init

# Check the database connection and current migration revision
# (needs a running web pod, so this fails while pods are stuck in Init)
kubectl exec -n piefed-application deployment/piefed-web -- \
  flask db current
```
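Because the Job retries up to 3 times, a failed migration can leave several pods behind (each retry gets a new pod when the Job uses `restartPolicy: Never`), while `kubectl logs job/piefed-db-init` shows the logs of only one of them. A hedged sketch that prints the logs of every attempt by selecting on the `job-name` label, which the Job controller adds to the pods it creates; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print logs from every pod the migration Job created.
dump_job_attempt_logs() {
  local job="${1:-piefed-db-init}"
  local ns="${2:-piefed-application}"
  local pods pod
  # The Job controller labels its pods with job-name=<job>.
  pods="$(kubectl get pods -n "$ns" -l "job-name=$job" \
    -o jsonpath='{.items[*].metadata.name}')" || return 1
  for pod in $pods; do   # pod names contain no spaces, so word splitting is safe
    echo "===== $pod ====="
    kubectl logs -n "$ns" "$pod" || true
  done
}
```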
### Pods Stuck in Init

```bash
# Check init container logs
kubectl logs -n piefed-application <pod-name> -c wait-for-migrations

# Check whether the Job is still running
kubectl get job piefed-db-init -n piefed-application

# Manual Job completion check (prints "True" once the Job has completed)
kubectl get job piefed-db-init -n piefed-application \
  -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}'
```

### RBAC Permissions Issue

```bash
# Verify ServiceAccount exists
kubectl get sa piefed-init-checker -n piefed-application

# Check Role binding
kubectl get rolebinding piefed-init-checker -n piefed-application

# Test permissions by impersonating the ServiceAccount
kubectl auth can-i get jobs \
  --as=system:serviceaccount:piefed-application:piefed-init-checker \
  -n piefed-application
```
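To check several permissions at once rather than only `get jobs`, the impersonation test above can be looped over a list of checks. This is a sketch: the list (reading the Job, listing pods, reading pod logs via `--subresource=log`) is an assumption about what the init container needs and should be matched against `rbac-init-checker.yaml`. Impersonating a ServiceAccount with `--as` requires the caller to hold `impersonate` rights.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which assumed permissions the init-checker SA lacks.
check_init_checker_permissions() {
  local ns="${1:-piefed-application}"
  local sa="system:serviceaccount:${ns}:piefed-init-checker"
  local failures=0 answer check
  local -a checks=("get jobs" "list pods" "get pods --subresource=log")

  for check in "${checks[@]}"; do
    # $check is left unquoted on purpose so it splits into verb, resource and flags.
    # kubectl auth can-i prints "yes" or "no"; the printed answer is what we test.
    # shellcheck disable=SC2086
    if answer="$(kubectl auth can-i $check --as="$sa" -n "$ns" 2>/dev/null)" \
        && [[ "$answer" == yes ]]; then
      echo "ok      $check"
    else
      echo "MISSING $check"
      failures=$((failures + 1))
    fi
  done
  (( failures == 0 ))
}
```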
## Benefits

- ✅ **No Race Conditions**: exactly one Job runs the migrations, so they never run concurrently
- ✅ **Proper Ordering**: init containers enforce the dependency on the migration Job
- ✅ **Clean Separation**: web and worker pods focus on their primary roles
- ✅ **Easy Debugging**: each stage (Job, init container, main container) has its own logs
- ✅ **GitOps Compatible**: works with Flux CD
- ✅ **Idempotent**: safe to re-run; `flask db upgrade` does nothing when the schema is already current
- ✅ **Fast Scaling**: new web/worker pods skip migrations and start as soon as the init check passes
## Migration from Old Setup

The old setup set `PIEFED_INIT_CONTAINER=true` on all pods, so every web and worker pod ran migrations at startup and concurrently starting pods could race each other.

**Changes Made**:

1. ✅ Removed the `PIEFED_INIT_CONTAINER` env var from all pods
2. ✅ Removed migration logic from `entrypoint-common.sh`
3. ✅ Created a dedicated `entrypoint-init.sh` for the Job
4. ✅ Added init containers that wait for the Job
5. ✅ Created RBAC for Job status checking

**Before deploying**, make sure you rebuild the images with the new entrypoint script!