**This one was generated by AI and I don't think it's quite right. I'll go through it later.** I'm leaving it here for reference.
# PostgreSQL CloudNativePG Disaster Recovery Guide
## 🚨 **CRITICAL: When to Use This Guide**
This guide is for **catastrophic failure scenarios** where:
- ✅ CloudNativePG cluster is completely broken/corrupted
- ✅ Longhorn volume backups are available (S3 or local snapshots)
- ✅ Normal CloudNativePG recovery methods have failed
- ✅ You need to restore from Longhorn backup volumes
**⚠️ WARNING**: This process involves a period where the restored data is exposed to risk (a single instance with no working backups) and should only be used when standard recovery fails.
---
## 📋 **Overview: Volume Adoption Strategy**
The key insight for CloudNativePG disaster recovery is using **Volume Adoption**:
1. **Restore Longhorn volumes** from backup
2. **Create fresh PVCs** with adoption annotations
3. **Deploy cluster with hibernation** to prevent initdb data erasure
4. **Retarget PVCs** to restored volumes
5. **Wake cluster** to adopt existing data
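Before starting, it is worth confirming the moving parts this strategy depends on are actually present. A minimal pre-flight sketch; it assumes the CRD names below and the `app=longhorn-manager` label, which may differ slightly in your installation:
```bash
# Confirm the CloudNativePG and Longhorn CRDs, target namespaces, and
# Longhorn manager pods are all reachable before touching anything
kubectl get crd clusters.postgresql.cnpg.io volumes.longhorn.io
kubectl get ns postgresql-system longhorn-system
kubectl -n longhorn-system get pods -l app=longhorn-manager
```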
---
## 🛠️ **Step 1: Prepare for Recovery**
### 1.1 Clean Up Failed Cluster
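Before running the destructive commands below, consider capturing the broken cluster's current state for later reference; the file names here are only suggestions:
```bash
# Snapshot the broken cluster definition and volume objects so names, sizes,
# and annotations can be consulted after deletion
kubectl get cluster postgres-shared -n postgresql-system -o yaml > broken-cluster.yaml
kubectl get pvc -n postgresql-system -o yaml > broken-pvcs.yaml
kubectl get pv -o yaml > broken-pvs.yaml
```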
```bash
# Remove broken cluster (DANGER: This deletes the cluster)
kubectl delete cluster postgres-shared -n postgresql-system
# Remove old PVCs if corrupted
kubectl delete pvc -n postgresql-system -l cnpg.io/cluster=postgres-shared
```
### 1.2 Identify Backup Volumes
```bash
# List available Longhorn backups (the exact CRD/resource names vary by
# Longhorn version; the Backup page in the Longhorn UI is the authoritative view)
kubectl get backups.longhorn.io -n longhorn-system
# Note the backup names for the data and WAL volumes, e.g.:
# - postgres-shared-data-backup-20240809
# - postgres-shared-wal-backup-20240809
```
---
## 🔄 **Step 2: Restore Longhorn Volumes**
### 2.1 Create Volume Restore Jobs
```yaml
# longhorn-restore-data.yaml
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: postgres-shared-data-recovered
  namespace: longhorn-system
spec:
  size: "400Gi"
  numberOfReplicas: 2
  # Replace with the actual backup URL from the Longhorn UI
  fromBackup: "s3://your-bucket/@/longhorn?backup=backup-abcd1234&volume=postgres-shared-data"
---
# longhorn-restore-wal.yaml
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: postgres-shared-wal-recovered
  namespace: longhorn-system
spec:
  size: "100Gi"
  numberOfReplicas: 2
  # Replace with the actual backup URL from the Longhorn UI
  fromBackup: "s3://your-bucket/@/longhorn?backup=backup-efgh5678&volume=postgres-shared-wal"
```
Apply the restores:
```bash
kubectl apply -f longhorn-restore-data.yaml
kubectl apply -f longhorn-restore-wal.yaml
# Monitor restore progress
kubectl get volumes.longhorn.io -n longhorn-system | grep recovered
```
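Do not move on until both restores have finished. One simple way to watch them settle; this relies on the `volumes.longhorn.io` resource and its STATE/ROBUSTNESS printer columns, which may vary slightly by Longhorn version:
```bash
# Watch both recovered volumes until their state and robustness look healthy
kubectl get volumes.longhorn.io -n longhorn-system \
  postgres-shared-data-recovered postgres-shared-wal-recovered -w
```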
### 2.2 Create PersistentVolumes for Restored Data
```yaml
# postgres-recovered-pvs.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-shared-data-recovered-pv
  annotations:
    pv.kubernetes.io/provisioned-by: driver.longhorn.io
spec:
  capacity:
    storage: 400Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn-retain
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeAttributes:
      numberOfReplicas: "2"
      staleReplicaTimeout: "30"
    volumeHandle: postgres-shared-data-recovered
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-shared-wal-recovered-pv
  annotations:
    pv.kubernetes.io/provisioned-by: driver.longhorn.io
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn-retain
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeAttributes:
      numberOfReplicas: "2"
      staleReplicaTimeout: "30"
    volumeHandle: postgres-shared-wal-recovered
```
```bash
kubectl apply -f postgres-recovered-pvs.yaml
```
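A quick check that the new PVs exist and are still unbound before creating the adoption PVCs:
```bash
# Both PVs should report STATUS=Available with the longhorn-retain storage class
kubectl get pv postgres-shared-data-recovered-pv postgres-shared-wal-recovered-pv
```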
---
## 🎯 **Step 3: Create Fresh Cluster with Volume Adoption**
### 3.1 Create Adoption PVCs
```yaml
# postgres-adoption-pvcs.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-shared-1
  namespace: postgresql-system
  annotations:
    # 🔑 CRITICAL: CloudNativePG adoption annotations.
    # NOTE: mirror whatever labels/annotations your CNPG operator version puts
    # on the PVCs of a healthy cluster -- they differ between releases.
    cnpg.io/cluster: postgres-shared
    cnpg.io/instanceName: postgres-shared-1
    cnpg.io/podRole: instance
    # Records the CSI provisioner responsible for this claim
    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 400Gi
  storageClassName: longhorn-retain
  # 🔑 CRITICAL: This will be retargeted to the recovered data PV in Step 4.
  # Leaving it empty only works if the storage class binds on first consumer;
  # if your class provisions immediately, set the recovered PV name here instead.
  volumeName: "" # Leave empty initially
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-shared-1-wal
  namespace: postgresql-system
  annotations:
    # 🔑 CRITICAL: CloudNativePG adoption annotations
    cnpg.io/cluster: postgres-shared
    cnpg.io/instanceName: postgres-shared-1
    cnpg.io/podRole: instance
    cnpg.io/pvcRole: wal
    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: longhorn-retain
  # 🔑 CRITICAL: This will be retargeted to the recovered WAL PV in Step 4.
  volumeName: "" # Leave empty initially
```
```bash
kubectl apply -f postgres-adoption-pvcs.yaml
```
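The claims must stay unbound until Step 4. A quick sanity check, assuming `longhorn-retain` is the storage class used above; if a claim is already Bound, the class provisioned a fresh empty volume and you should recreate that PVC with `spec.volumeName` pointing at the recovered PV instead:
```bash
# Both PVCs should be Pending at this point
kubectl get pvc postgres-shared-1 postgres-shared-1-wal -n postgresql-system
# A WaitForFirstConsumer binding mode is what keeps them Pending
kubectl get storageclass longhorn-retain -o jsonpath='{.volumeBindingMode}{"\n"}'
```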
### 3.2 Deploy Cluster in Hibernation Mode
**🚨 CRITICAL**: The cluster MUST start in hibernation to prevent initdb from erasing your data!
```yaml
# postgres-shared-recovery.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-shared
  namespace: postgresql-system
  annotations:
    # 🔑 CRITICAL: Hibernation prevents startup and data erasure
    cnpg.io/hibernation: "on"
spec:
  instances: 1
  # 🔑 CRITICAL: Single instance prevents replication conflicts during recovery
  minSyncReplicas: 0
  maxSyncReplicas: 0
  postgresql:
    parameters:
      # Performance and stability settings for recovery
      max_connections: "200"
      shared_buffers: "256MB"
      effective_cache_size: "1GB"
      maintenance_work_mem: "64MB"
      checkpoint_completion_target: "0.9"
      wal_buffers: "16MB"
      default_statistics_target: "100"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
      # 🔑 CRITICAL: Minimal logging during recovery
      log_min_messages: "warning"
      log_min_error_statement: "error"
      log_statement: "none"
  bootstrap:
    # 🔑 CRITICAL: initdb bootstrap is declared for completeness, but it does
    # NOT run while the cluster is hibernated -- hibernation is what protects
    # the restored data from being overwritten
    initdb:
      database: postgres
      owner: postgres
  storage:
    size: 400Gi
    storageClass: longhorn-retain
  walStorage:
    size: 100Gi
    storageClass: longhorn-retain
  # 🔑 CRITICAL: Extended timeouts for recovery scenarios
  startDelay: 3600      # 1 hour delay
  stopDelay: 1800       # 30 minute stop delay
  switchoverDelay: 1800 # 30 minute switchover delay
  monitoring:
    enablePodMonitor: true
  # Backup configuration (restore after recovery)
  backup:
    retentionPolicy: "7d"
    barmanObjectStore:
      destinationPath: "s3://your-backup-bucket/postgres-shared"
      # Configure credentials after the cluster is stable
```
```bash
kubectl apply -f postgres-shared-recovery.yaml
# Verify cluster is hibernated (pods should NOT start)
kubectl get cluster postgres-shared -n postgresql-system
# Should show: STATUS = Hibernation
```
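To be certain hibernation is actually protecting the restored data, confirm that no instance pods exist and the annotation is still in place:
```bash
# Expect "No resources found" here while the cluster is hibernated
kubectl get pods -n postgresql-system -l cnpg.io/cluster=postgres-shared
# The hibernation annotation should still read "on"
kubectl get cluster postgres-shared -n postgresql-system -o yaml | grep hibernation
```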
---
## 🔗 **Step 4: Retarget PVCs to Restored Data**
### 4.1 Read the PVC UIDs
```bash
# The adoption PVCs from Step 3.1 already have server-assigned UIDs
# (metadata.uid is set by the API server and cannot be patched)
DATA_PVC_UID=$(kubectl get pvc postgres-shared-1 -n postgresql-system -o jsonpath='{.metadata.uid}')
WAL_PVC_UID=$(kubectl get pvc postgres-shared-1-wal -n postgresql-system -o jsonpath='{.metadata.uid}')
echo "Data PVC UID: $DATA_PVC_UID"
echo "WAL PVC UID: $WAL_PVC_UID"
```
### 4.2 Patch PVs with claimRef Binding
```bash
# Point each recovered PV at its claim; the volume binder completes the bind
# Patch data PV
kubectl patch pv postgres-shared-data-recovered-pv -p "{
  \"spec\": {
    \"claimRef\": {
      \"name\": \"postgres-shared-1\",
      \"namespace\": \"postgresql-system\",
      \"uid\": \"$DATA_PVC_UID\"
    }
  }
}"
# Patch WAL PV
kubectl patch pv postgres-shared-wal-recovered-pv -p "{
  \"spec\": {
    \"claimRef\": {
      \"name\": \"postgres-shared-1-wal\",
      \"namespace\": \"postgresql-system\",
      \"uid\": \"$WAL_PVC_UID\"
    }
  }
}"
```
### 4.3 Patch PVCs with volumeName Binding
```bash
# spec.volumeName may be set once while it is still empty; this pins each
# claim to its recovered PV
# Patch data PVC
kubectl patch pvc postgres-shared-1 -n postgresql-system \
  -p '{"spec":{"volumeName":"postgres-shared-data-recovered-pv"}}'
# Patch WAL PVC
kubectl patch pvc postgres-shared-1-wal -n postgresql-system \
  -p '{"spec":{"volumeName":"postgres-shared-wal-recovered-pv"}}'
```
### 4.4 Verify PVC Binding
```bash
kubectl get pvc -n postgresql-system
# Both PVCs should show STATUS = Bound
```
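It is also worth confirming that each claim bound to the recovered PV rather than a freshly provisioned one:
```bash
# VOLUME should list the *-recovered-pv names created in Step 2.2
kubectl get pvc postgres-shared-1 postgres-shared-1-wal -n postgresql-system \
  -o custom-columns=NAME:.metadata.name,VOLUME:.spec.volumeName,STATUS:.status.phase
```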
---
## 🌅 **Step 5: Wake Cluster from Hibernation**
### 5.1 Remove Hibernation Annotation
```bash
# 🔑 CRITICAL: This starts the cluster with your restored data
kubectl annotate cluster postgres-shared -n postgresql-system cnpg.io/hibernation=off --overwrite
# Monitor cluster startup
kubectl get cluster postgres-shared -n postgresql-system -w
```
### 5.2 Monitor Pod Startup
```bash
# Watch pod creation and startup
kubectl get pods -n postgresql-system -l cnpg.io/cluster=postgres-shared -w
# Check logs for successful data adoption
kubectl logs postgres-shared-1 -n postgresql-system -f
```
**🔍 Expected Log Messages** (exact wording varies by image and operator version; the key signal is that an existing database was found and initialization was skipped, not a message about initializing an empty database):
```
INFO: PostgreSQL Database directory appears to contain a database
INFO: Looking at the contents of PostgreSQL database directory
INFO: Database found, skipping initialization
INFO: Starting PostgreSQL with recovered data
```
---
## 🔍 **Step 6: Verify Data Recovery**
### 6.1 Check Cluster Status
```bash
kubectl get cluster postgres-shared -n postgresql-system
# Should show: STATUS = Cluster in healthy state, PRIMARY = postgres-shared-1
```
### 6.2 Test Database Connectivity
```bash
# Test connection
kubectl exec postgres-shared-1 -n postgresql-system -- psql -c "\l"
# Verify all application databases exist
kubectl exec postgres-shared-1 -n postgresql-system -- psql -c "
SELECT datname, pg_size_pretty(pg_database_size(datname)) as size
FROM pg_database
WHERE datname NOT IN ('template0', 'template1', 'postgres')
ORDER BY pg_database_size(datname) DESC;
"
```
### 6.3 Verify Application Data
```bash
# Test specific application tables (example for Mastodon)
kubectl exec postgres-shared-1 -n postgresql-system -- psql mastodon_production -c "
SELECT COUNT(*) as total_accounts FROM accounts;
SELECT COUNT(*) as total_statuses FROM statuses;
"
```
---
## 📈 **Step 7: Scale to High Availability (Optional)**
### 7.1 Enable Replica Creation
```bash
# Scale cluster to 2 instances for HA
kubectl patch cluster postgres-shared -n postgresql-system -p '{
"spec": {
"instances": 2,
"minSyncReplicas": 0,
"maxSyncReplicas": 1
}
}'
```
### 7.2 Monitor Replica Join
```bash
# Watch replica creation and sync
kubectl get pods -n postgresql-system -l cnpg.io/cluster=postgres-shared -w
# Monitor replication lag
kubectl exec postgres-shared-1 -n postgresql-system -- psql -c "
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
"
```
---
## 🔧 **Step 8: Application Connectivity (Service Aliases)**
### 8.1 Create Service Aliases for Application Compatibility
If your applications expect different service names (e.g., `postgresql-shared-*` vs `postgres-shared-*`):
```yaml
# postgresql-service-aliases.yaml
apiVersion: v1
kind: Service
metadata:
  name: postgresql-shared-rw
  namespace: postgresql-system
  labels:
    cnpg.io/cluster: postgres-shared
spec:
  type: ClusterIP
  ports:
    - name: postgres
      port: 5432
      protocol: TCP
      targetPort: 5432
  selector:
    cnpg.io/cluster: postgres-shared
    cnpg.io/instanceRole: primary
---
apiVersion: v1
kind: Service
metadata:
  name: postgresql-shared-ro
  namespace: postgresql-system
  labels:
    cnpg.io/cluster: postgres-shared
spec:
  type: ClusterIP
  ports:
    - name: postgres
      port: 5432
      protocol: TCP
      targetPort: 5432
  selector:
    cnpg.io/cluster: postgres-shared
    cnpg.io/instanceRole: replica
```
```bash
kubectl apply -f postgresql-service-aliases.yaml
```
### 8.2 Test Application Connectivity
```bash
# Test from application namespace
kubectl run test-connectivity --image=busybox --rm -it --restart=Never -- nc -zv postgresql-shared-rw.postgresql-system.svc.cluster.local 5432
```
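The `nc` probe only proves the TCP port is reachable. For an end-to-end check you can run a throwaway psql client against the alias; this sketch assumes a public `postgres:16` image is pullable and that you substitute real credentials for the placeholder:
```bash
# One-off psql client pod; replace <app-password> with a real role password
kubectl run psql-check --rm -it --restart=Never --image=postgres:16 \
  --env="PGPASSWORD=<app-password>" -- \
  psql -h postgresql-shared-rw.postgresql-system.svc.cluster.local \
       -U postgres -c "SELECT version();"
```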
---
## 🚨 **Troubleshooting Common Issues**
### Issue 1: Cluster Starts in initdb Mode (Data Loss Risk!)
**Symptoms**: Logs show "Initializing empty database"
**Solution**:
1. **IMMEDIATELY** re-hibernate the cluster to stop the instance pods (CloudNativePG does not allow `instances: 0`)
2. Verify PVC adoption annotations are correct
3. Check that hibernation was properly applied before waking
```bash
kubectl annotate cluster postgres-shared -n postgresql-system cnpg.io/hibernation=on --overwrite
```
### Issue 2: PVC Binding Fails
**Symptoms**: PVCs stuck in "Pending" state
**Solution**:
1. Check PV/PVC UUID matching
2. Verify PV `claimRef` points to correct PVC
3. Ensure storage class exists
```bash
kubectl describe pvc postgres-shared-1 -n postgresql-system
kubectl describe pv postgres-shared-data-recovered-pv
```
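Comparing the two sides of the binding directly often reveals the mismatch faster than reading full describe output:
```bash
# The PV's claimRef and the PVC's metadata/volumeName must agree
kubectl get pv postgres-shared-data-recovered-pv \
  -o jsonpath='claimRef: {.spec.claimRef.namespace}/{.spec.claimRef.name} uid={.spec.claimRef.uid}{"\n"}'
kubectl get pvc postgres-shared-1 -n postgresql-system \
  -o jsonpath='uid={.metadata.uid} volumeName={.spec.volumeName}{"\n"}'
```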
### Issue 3: Pod Restart Loops
**Symptoms**: Pod continuously restarting with health check failures
**Solutions**:
1. Check Cilium network policies allow PostgreSQL traffic
2. Verify PostgreSQL data directory permissions
3. Check for TLS/SSL configuration issues
```bash
# Check data directory ownership and permissions. CNPG containers run as a
# non-root user, so running chown inside the pod will usually fail; fix
# ownership through the volume's fsGroup / Longhorn settings instead.
kubectl exec postgres-shared-1 -n postgresql-system -- ls -ld /var/lib/postgresql/data/pgdata
```
### Issue 4: Replica Won't Join
**Symptoms**: Second instance fails to join with replication errors
**Solutions**:
1. Check primary is stable before adding replica
2. Verify network connectivity between pods
3. Monitor WAL streaming logs
```bash
# Check replication status
kubectl exec postgres-shared-1 -n postgresql-system -- psql -c "SELECT * FROM pg_stat_replication;"
```
---
## 📋 **Recovery Checklist**
**Pre-Recovery:**
- [ ] Backup current cluster state (if any)
- [ ] Identify Longhorn backup volume names
- [ ] Prepare fresh namespace if needed
- [ ] Verify Longhorn operator is functional
**Volume Restoration:**
- [ ] Restore data volume from Longhorn backup
- [ ] Restore WAL volume from Longhorn backup
- [ ] Create PersistentVolumes for restored data
- [ ] Verify volumes are healthy in Longhorn UI
**Cluster Recovery:**
- [ ] Create adoption PVCs with correct annotations
- [ ] Deploy cluster in hibernation mode
- [ ] Read PVC UIDs for the claimRef binding
- [ ] Patch PVs with claimRef binding
- [ ] Patch PVCs with volumeName binding
- [ ] Verify PVC binding before proceeding
**Startup:**
- [ ] Remove hibernation annotation
- [ ] Monitor pod startup logs for data adoption
- [ ] Verify cluster reaches healthy state
- [ ] Test database connectivity
**Validation:**
- [ ] Verify all application databases exist
- [ ] Test application table row counts
- [ ] Check database sizes match expectations
- [ ] Test application connectivity
**HA Setup (Optional):**
- [ ] Scale to 2+ instances
- [ ] Monitor replica join process
- [ ] Verify replication is working
- [ ] Test failover scenarios
**Cleanup:**
- [ ] Remove temporary PVs/PVCs
- [ ] Update backup configurations
- [ ] Document any configuration changes
- [ ] Test regular backup/restore procedures
---
## ⚠️ **CRITICAL SUCCESS FACTORS**
1. **🔑 Hibernation is MANDATORY**: Never start a cluster without hibernation when adopting existing data
2. **🔑 Single Instance First**: Always recover to single instance, then scale to HA
3. **🔑 Binding References**: The PV's `claimRef` must reference the PVC (name, namespace, UID) and the PVC's `volumeName` must name the recovered PV
4. **🔑 Adoption Annotations**: CloudNativePG annotations must be present on PVCs
5. **🔑 Volume Naming**: PVC names must match CloudNativePG instance naming convention
6. **🔑 Network Policies**: Ensure Cilium policies allow PostgreSQL traffic
7. **🔑 Monitor Logs**: Watch startup logs carefully for data adoption confirmation
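A quick way to spot-check several of these factors at once (the last command assumes the `cnpg` kubectl plugin is installed):
```bash
# Cluster phase, adopted PVCs, and instance/replication overview
kubectl get cluster postgres-shared -n postgresql-system
kubectl get pvc -n postgresql-system -l cnpg.io/cluster=postgres-shared
kubectl cnpg status postgres-shared -n postgresql-system
```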
---
## 📚 **Additional Resources**
- [CloudNativePG Documentation](https://cloudnative-pg.io/documentation/)
- [Longhorn Backup & Restore](https://longhorn.io/docs/1.4.0/volumes-and-nodes/backup-and-restore/)
- [Kubernetes Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)
- [PostgreSQL Recovery Documentation](https://www.postgresql.org/docs/current/backup-dump.html)
---
**🎉 Walk through this procedure end to end in a non-production environment and confirm each step before relying on it in production.**