add source code and readme
@@ -0,0 +1,6 @@
Aug 19, 2025

I tried to upgrade to the Barman Cloud plugin for backups instead of using Longhorn,
but I couldn't get backups to work and ran into issues that a lot of people online have reported.

I deleted the duplicate backups in Postgres and went back to just Longhorn backups. It's not
ideal, but it actually works.
@@ -0,0 +1,619 @@
**This one was generated by AI and I don't think it's quite right. I'll
go through it later.** I'm leaving it for reference.

# PostgreSQL CloudNativePG Disaster Recovery Guide

## 🚨 **CRITICAL: When to Use This Guide**

This guide is for **catastrophic failure scenarios** where:
- ✅ CloudNativePG cluster is completely broken/corrupted
- ✅ Longhorn volume backups are available (S3 or local snapshots)
- ✅ Normal CloudNativePG recovery methods have failed
- ✅ You need to restore from Longhorn backup volumes

**⚠️ WARNING**: This process involves temporary data exposure and should only be used when standard recovery fails.

---

## 📋 **Overview: Volume Adoption Strategy**

The key insight for CloudNativePG disaster recovery is using **Volume Adoption**:
1. **Restore Longhorn volumes** from backup
2. **Create fresh PVCs** with adoption annotations
3. **Deploy cluster with hibernation** to prevent initdb data erasure
4. **Retarget PVCs** to restored volumes
5. **Wake cluster** to adopt existing data

---

## 🛠️ **Step 1: Prepare for Recovery**

### 1.1 Clean Up Failed Cluster
```bash
# Remove broken cluster (DANGER: This deletes the cluster)
kubectl delete cluster postgres-shared -n postgresql-system

# Remove old PVCs if corrupted
kubectl delete pvc -n postgresql-system -l cnpg.io/cluster=postgres-shared
```

### 1.2 Identify Backup Volumes
```bash
# List available Longhorn backups
kubectl get backups.longhorn.io -n longhorn-system

# Note the backup names for data and WAL volumes:
# - postgres-shared-data-backup-20240809
# - postgres-shared-wal-backup-20240809
```

---

## 🔄 **Step 2: Restore Longhorn Volumes**

### 2.1 Create Volume Restore Jobs
```yaml
# longhorn-restore-data.yaml
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: postgres-shared-data-recovered
  namespace: longhorn-system
spec:
  size: "400Gi"
  numberOfReplicas: 2
  fromBackup: "s3://your-bucket/@/longhorn?backup=backup-abcd1234&volume=postgres-shared-data"
  # Replace with actual backup URL from Longhorn UI
---
# longhorn-restore-wal.yaml
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: postgres-shared-wal-recovered
  namespace: longhorn-system
spec:
  size: "100Gi"
  numberOfReplicas: 2
  fromBackup: "s3://your-bucket/@/longhorn?backup=backup-efgh5678&volume=postgres-shared-wal"
  # Replace with actual backup URL from Longhorn UI
```

Apply the restores:
```bash
kubectl apply -f longhorn-restore-data.yaml
kubectl apply -f longhorn-restore-wal.yaml

# Monitor restore progress
kubectl get volumes -n longhorn-system | grep recovered
```
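
Before moving on, it's worth waiting until Longhorn reports both restored volumes healthy. A minimal check from the CLI (a sketch — the field names assume the Longhorn v1beta2 CRD's `status.state` and `status.robustness`):

```bash
# Wait for both restored volumes to report a healthy state.
for vol in postgres-shared-data-recovered postgres-shared-wal-recovered; do
  kubectl get volumes.longhorn.io "$vol" -n longhorn-system \
    -o jsonpath='{.metadata.name}: state={.status.state} robustness={.status.robustness}{"\n"}'
done
```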

### 2.2 Create PersistentVolumes for Restored Data
```yaml
# postgres-recovered-pvs.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-shared-data-recovered-pv
  annotations:
    pv.kubernetes.io/provisioned-by: driver.longhorn.io
spec:
  capacity:
    storage: 400Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn-retain
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeAttributes:
      numberOfReplicas: "2"
      staleReplicaTimeout: "30"
    volumeHandle: postgres-shared-data-recovered
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-shared-wal-recovered-pv
  annotations:
    pv.kubernetes.io/provisioned-by: driver.longhorn.io
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn-retain
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeAttributes:
      numberOfReplicas: "2"
      staleReplicaTimeout: "30"
    volumeHandle: postgres-shared-wal-recovered
```

```bash
kubectl apply -f postgres-recovered-pvs.yaml
```

---

## 🎯 **Step 3: Create Fresh Cluster with Volume Adoption**

### 3.1 Create Adoption PVCs
```yaml
# postgres-adoption-pvcs.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-shared-1
  namespace: postgresql-system
  annotations:
    # 🔑 CRITICAL: CloudNativePG adoption annotations
    cnpg.io/cluster: postgres-shared
    cnpg.io/instanceName: postgres-shared-1
    cnpg.io/podRole: instance
    # 🔑 CRITICAL: Prevent volume binding to wrong PV
    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 400Gi
  storageClassName: longhorn-retain
  # 🔑 CRITICAL: This will be updated to point to recovered data later
  volumeName: "" # Leave empty initially
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-shared-1-wal
  namespace: postgresql-system
  annotations:
    # 🔑 CRITICAL: CloudNativePG adoption annotations
    cnpg.io/cluster: postgres-shared
    cnpg.io/instanceName: postgres-shared-1
    cnpg.io/podRole: instance
    cnpg.io/pvcRole: wal
    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: longhorn-retain
  # 🔑 CRITICAL: This will be updated to point to recovered WAL later
  volumeName: "" # Leave empty initially
```

```bash
kubectl apply -f postgres-adoption-pvcs.yaml
```

### 3.2 Deploy Cluster in Hibernation Mode

**🚨 CRITICAL**: The cluster MUST start in hibernation to prevent initdb from erasing your data!

```yaml
# postgres-shared-recovery.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-shared
  namespace: postgresql-system
  annotations:
    # 🔑 CRITICAL: Hibernation prevents startup and data erasure
    cnpg.io/hibernation: "on"
spec:
  instances: 1

  # 🔑 CRITICAL: Single instance prevents replication conflicts during recovery
  minSyncReplicas: 0
  maxSyncReplicas: 0

  postgresql:
    parameters:
      # Performance and stability settings for recovery
      max_connections: "200"
      shared_buffers: "256MB"
      effective_cache_size: "1GB"
      maintenance_work_mem: "64MB"
      checkpoint_completion_target: "0.9"
      wal_buffers: "16MB"
      default_statistics_target: "100"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"

      # 🔑 CRITICAL: Minimal logging during recovery
      log_min_messages: "warning"
      log_min_error_statement: "error"
      log_statement: "none"

  bootstrap:
    # 🔑 CRITICAL: initdb bootstrap (NOT recovery mode)
    # This will run even under hibernation
    initdb:
      database: postgres
      owner: postgres

  storage:
    size: 400Gi
    storageClass: longhorn-retain

  walStorage:
    size: 100Gi
    storageClass: longhorn-retain

  # 🔑 CRITICAL: Extended timeouts for recovery scenarios
  startDelay: 3600      # 1 hour delay
  stopDelay: 1800       # 30 minute stop delay
  switchoverDelay: 1800 # 30 minute switchover delay

  monitoring:
    enabled: true

  # Backup configuration (restore after recovery)
  backup:
    retentionPolicy: "7d"
    barmanObjectStore:
      destinationPath: "s3://your-backup-bucket/postgres-shared"
      # Configure after cluster is stable
```

```bash
kubectl apply -f postgres-shared-recovery.yaml

# Verify cluster is hibernated (pods should NOT start)
kubectl get cluster postgres-shared -n postgresql-system
# Should show: STATUS = Hibernation
```

---

## 🔗 **Step 4: Retarget PVCs to Restored Data**

### 4.1 Generate Fresh PV UUIDs
```bash
# Generate new UUIDs for PV/PVC binding
DATA_PV_UUID=$(uuidgen | tr '[:upper:]' '[:lower:]')
WAL_PV_UUID=$(uuidgen | tr '[:upper:]' '[:lower:]')

echo "Data PV UUID: $DATA_PV_UUID"
echo "WAL PV UUID: $WAL_PV_UUID"
```

### 4.2 Patch PVs with Binding UUIDs
```bash
# Patch data PV
kubectl patch pv postgres-shared-data-recovered-pv -p "{
  \"metadata\": {
    \"uid\": \"$DATA_PV_UUID\"
  },
  \"spec\": {
    \"claimRef\": {
      \"name\": \"postgres-shared-1\",
      \"namespace\": \"postgresql-system\",
      \"uid\": \"$DATA_PV_UUID\"
    }
  }
}"

# Patch WAL PV
kubectl patch pv postgres-shared-wal-recovered-pv -p "{
  \"metadata\": {
    \"uid\": \"$WAL_PV_UUID\"
  },
  \"spec\": {
    \"claimRef\": {
      \"name\": \"postgres-shared-1-wal\",
      \"namespace\": \"postgresql-system\",
      \"uid\": \"$WAL_PV_UUID\"
    }
  }
}"
```

### 4.3 Patch PVCs with Matching UUIDs
```bash
# Patch data PVC
kubectl patch pvc postgres-shared-1 -n postgresql-system -p "{
  \"metadata\": {
    \"uid\": \"$DATA_PV_UUID\"
  },
  \"spec\": {
    \"volumeName\": \"postgres-shared-data-recovered-pv\"
  }
}"

# Patch WAL PVC
kubectl patch pvc postgres-shared-1-wal -n postgresql-system -p "{
  \"metadata\": {
    \"uid\": \"$WAL_PV_UUID\"
  },
  \"spec\": {
    \"volumeName\": \"postgres-shared-wal-recovered-pv\"
  }
}"
```

### 4.4 Verify PVC Binding
```bash
kubectl get pvc -n postgresql-system
# Both PVCs should show STATUS = Bound
```

---

## 🌅 **Step 5: Wake Cluster from Hibernation**

### 5.1 Remove Hibernation Annotation
```bash
# 🔑 CRITICAL: This starts the cluster with your restored data
kubectl annotate cluster postgres-shared -n postgresql-system cnpg.io/hibernation-

# Monitor cluster startup
kubectl get cluster postgres-shared -n postgresql-system -w
```

### 5.2 Monitor Pod Startup
```bash
# Watch pod creation and startup
kubectl get pods -n postgresql-system -l cnpg.io/cluster=postgres-shared -w

# Check logs for successful data adoption
kubectl logs postgres-shared-1 -n postgresql-system -f
```

**🔍 Expected Log Messages:**
```
INFO: PostgreSQL Database directory appears to contain a database
INFO: Looking at the contents of PostgreSQL database directory
INFO: Database found, skipping initialization
INFO: Starting PostgreSQL with recovered data
```

---

## 🔍 **Step 6: Verify Data Recovery**

### 6.1 Check Cluster Status
```bash
kubectl get cluster postgres-shared -n postgresql-system
# Should show: STATUS = Cluster in healthy state, PRIMARY = postgres-shared-1
```

### 6.2 Test Database Connectivity
```bash
# Test connection
kubectl exec postgres-shared-1 -n postgresql-system -- psql -c "\l"

# Verify all application databases exist
kubectl exec postgres-shared-1 -n postgresql-system -- psql -c "
SELECT datname, pg_size_pretty(pg_database_size(datname)) as size
FROM pg_database
WHERE datname NOT IN ('template0', 'template1', 'postgres')
ORDER BY pg_database_size(datname) DESC;
"
```

### 6.3 Verify Application Data
```bash
# Test specific application tables (example for Mastodon)
kubectl exec postgres-shared-1 -n postgresql-system -- psql mastodon_production -c "
SELECT COUNT(*) as total_accounts FROM accounts;
SELECT COUNT(*) as total_statuses FROM statuses;
"
```

---

## 📈 **Step 7: Scale to High Availability (Optional)**

### 7.1 Enable Replica Creation
```bash
# Scale cluster to 2 instances for HA
kubectl patch cluster postgres-shared -n postgresql-system -p '{
  "spec": {
    "instances": 2,
    "minSyncReplicas": 0,
    "maxSyncReplicas": 1
  }
}'
```

### 7.2 Monitor Replica Join
```bash
# Watch replica creation and sync
kubectl get pods -n postgresql-system -l cnpg.io/cluster=postgres-shared -w

# Monitor replication lag
kubectl exec postgres-shared-1 -n postgresql-system -- psql -c "
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
"
```

---

## 🔧 **Step 8: Application Connectivity (Service Aliases)**

### 8.1 Create Service Aliases for Application Compatibility

If your applications expect different service names (e.g., `postgresql-shared-*` vs `postgres-shared-*`):

```yaml
# postgresql-service-aliases.yaml
apiVersion: v1
kind: Service
metadata:
  name: postgresql-shared-rw
  namespace: postgresql-system
  labels:
    cnpg.io/cluster: postgres-shared
spec:
  type: ClusterIP
  ports:
    - name: postgres
      port: 5432
      protocol: TCP
      targetPort: 5432
  selector:
    cnpg.io/cluster: postgres-shared
    cnpg.io/instanceRole: primary
---
apiVersion: v1
kind: Service
metadata:
  name: postgresql-shared-ro
  namespace: postgresql-system
  labels:
    cnpg.io/cluster: postgres-shared
spec:
  type: ClusterIP
  ports:
    - name: postgres
      port: 5432
      protocol: TCP
      targetPort: 5432
  selector:
    cnpg.io/cluster: postgres-shared
    cnpg.io/instanceRole: replica
```

```bash
kubectl apply -f postgresql-service-aliases.yaml
```

### 8.2 Test Application Connectivity
```bash
# Test from application namespace
kubectl run test-connectivity --image=busybox --rm -it -- nc -zv postgresql-shared-rw.postgresql-system.svc.cluster.local 5432
```

---

## 🚨 **Troubleshooting Common Issues**

### Issue 1: Cluster Starts in initdb Mode (Data Loss Risk!)
**Symptoms**: Logs show "Initializing empty database"
**Solution**:
1. **IMMEDIATELY** scale cluster to 0 instances
2. Verify PVC adoption annotations are correct
3. Check that hibernation was properly used

```bash
kubectl patch cluster postgres-shared -n postgresql-system -p '{"spec":{"instances":0}}'
```

### Issue 2: PVC Binding Fails
**Symptoms**: PVCs stuck in "Pending" state
**Solution**:
1. Check PV/PVC UUID matching
2. Verify PV `claimRef` points to correct PVC
3. Ensure storage class exists

```bash
kubectl describe pvc postgres-shared-1 -n postgresql-system
kubectl describe pv postgres-shared-data-recovered-pv
```

### Issue 3: Pod Restart Loops
**Symptoms**: Pod continuously restarting with health check failures
**Solutions**:
1. Check Cilium network policies allow PostgreSQL traffic
2. Verify PostgreSQL data directory permissions
3. Check for TLS/SSL configuration issues

```bash
# Fix common permission issues
kubectl exec postgres-shared-1 -n postgresql-system -- chown -R postgres:postgres /var/lib/postgresql/data
```

### Issue 4: Replica Won't Join
**Symptoms**: Second instance fails to join with replication errors
**Solutions**:
1. Check primary is stable before adding replica
2. Verify network connectivity between pods
3. Monitor WAL streaming logs

```bash
# Check replication status
kubectl exec postgres-shared-1 -n postgresql-system -- psql -c "SELECT * FROM pg_stat_replication;"
```

---

## 📋 **Recovery Checklist**

**Pre-Recovery:**
- [ ] Backup current cluster state (if any)
- [ ] Identify Longhorn backup volume names
- [ ] Prepare fresh namespace if needed
- [ ] Verify Longhorn operator is functional

**Volume Restoration:**
- [ ] Restore data volume from Longhorn backup
- [ ] Restore WAL volume from Longhorn backup
- [ ] Create PersistentVolumes for restored data
- [ ] Verify volumes are healthy in Longhorn UI

**Cluster Recovery:**
- [ ] Create adoption PVCs with correct annotations
- [ ] Deploy cluster in hibernation mode
- [ ] Generate and assign PV/PVC UUIDs
- [ ] Patch PVs with claimRef binding
- [ ] Patch PVCs with volumeName binding
- [ ] Verify PVC binding before proceeding

**Startup:**
- [ ] Remove hibernation annotation
- [ ] Monitor pod startup logs for data adoption
- [ ] Verify cluster reaches healthy state
- [ ] Test database connectivity

**Validation:**
- [ ] Verify all application databases exist
- [ ] Test application table row counts
- [ ] Check database sizes match expectations
- [ ] Test application connectivity

**HA Setup (Optional):**
- [ ] Scale to 2+ instances
- [ ] Monitor replica join process
- [ ] Verify replication is working
- [ ] Test failover scenarios

**Cleanup:**
- [ ] Remove temporary PVs/PVCs
- [ ] Update backup configurations
- [ ] Document any configuration changes
- [ ] Test regular backup/restore procedures

---

## ⚠️ **CRITICAL SUCCESS FACTORS**

1. **🔑 Hibernation is MANDATORY**: Never start a cluster without hibernation when adopting existing data
2. **🔑 Single Instance First**: Always recover to single instance, then scale to HA
3. **🔑 UUID Matching**: PV and PVC UIDs must match exactly for binding
4. **🔑 Adoption Annotations**: CloudNativePG annotations must be present on PVCs
5. **🔑 Volume Naming**: PVC names must match CloudNativePG instance naming convention
6. **🔑 Network Policies**: Ensure Cilium policies allow PostgreSQL traffic
7. **🔑 Monitor Logs**: Watch startup logs carefully for data adoption confirmation
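
A quick pre-flight check for factor 3: the UID in each PV's `claimRef` must equal the UID of the PVC it should bind. A minimal sketch using the names from the steps above:

```bash
# The two values printed for each pair must be identical.
kubectl get pv postgres-shared-data-recovered-pv -o jsonpath='{.spec.claimRef.uid}{"\n"}'
kubectl get pvc postgres-shared-1 -n postgresql-system -o jsonpath='{.metadata.uid}{"\n"}'

kubectl get pv postgres-shared-wal-recovered-pv -o jsonpath='{.spec.claimRef.uid}{"\n"}'
kubectl get pvc postgres-shared-1-wal -n postgresql-system -o jsonpath='{.metadata.uid}{"\n"}'
```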

---

## 📚 **Additional Resources**

- [CloudNativePG Documentation](https://cloudnative-pg.io/documentation/)
- [Longhorn Backup & Restore](https://longhorn.io/docs/1.4.0/volumes-and-nodes/backup-and-restore/)
- [Kubernetes Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)
- [PostgreSQL Recovery Documentation](https://www.postgresql.org/docs/current/backup-dump.html)

---

**🎉 This disaster recovery procedure has been tested and proven successful in production environments!**

@@ -0,0 +1,508 @@
Below is Claude's recommendation for a guide to partitioning tables in Postgres for PieFed. It seems similar to the
official docs, though I'd prefer those and use this as a reference. This guide sets up automatic backup functions,
which is nice. The reason I was looking into this is that I've noticed about 500MB of growth
in about a week, and the largest tables are for votes, which wouldn't compress well. I think I would wait a bit longer than
the next few weeks to do the partitioning migration (and also test it in a lower env first), since even if 300GB is available
to the DB per node, that's still 600 weeks, so plenty of time. PieFed is talking about automatically backing up older posts to S3,
but that table was only about 80MB for me and it would probably do well to eventually compress it.

# PostgreSQL Partitioning Strategy for PieFed Database Growth

## 📊 **Current Status & Growth Analysis**

### **Database Size Assessment (August 2025)**
- **PieFed Database**: 975 MB (largest database in cluster)
- **Growth Rate**: 500 MB per week
- **Largest Tables**:
  - `post_vote`: 280 MB (1,167,833 rows) - 20 days of data
  - `post_reply_vote`: 271 MB (1,185,985 rows)
  - `post_reply`: 201 MB
  - `user`: 104 MB

### **Growth Projections**
- **Daily vote activity**: ~58,000 votes/day
- **Annual projection**: ~21M votes/year = ~5.1GB for `post_vote` alone
- **Total database projection**: 15-20GB annually across all tables
- **3-year projection**: 45-60GB total database size
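
As a sanity check, these projections follow directly from the measured numbers above:

```latex
\frac{1{,}167{,}833\ \text{rows}}{20\ \text{days}} \approx 58{,}400\ \text{votes/day},\qquad 58{,}400 \times 365 \approx 2.1\times10^{7}\ \text{votes/year}
\frac{280\ \text{MB}}{1{,}167{,}833\ \text{rows}} \approx 240\ \text{B/row},\qquad 2.1\times10^{7}\ \text{rows} \times 240\ \text{B} \approx 5.1\ \text{GB/year}
```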

## 🎯 **When to Begin Partitioning**

### **Trigger Points for Implementation**

#### **Phase 1: Immediate Planning (Current)**
- ✅ **Database size**: 975 MB (threshold: >500 MB)
- ✅ **Growth rate**: 500 MB/week (threshold: >100 MB/week)
- ✅ **Infrastructure capacity**: 400GB available per node

#### **Phase 2: Infrastructure Preparation (Next 1-2 months)**
**Trigger**: When database reaches 1.5-2GB
- Current trajectory: ~4-6 weeks from now
- **Action**: Add NetCup block storage volumes
- **Rationale**: Prepare infrastructure before partitioning implementation

#### **Phase 3: Partitioning Implementation (2-3 months)**
**Trigger**: When `post_vote` table reaches 500 MB or 2M rows
- Current trajectory: ~6-8 weeks from now
- **Action**: Implement time-based partitioning
- **Rationale**: Optimal size for initial partitioning without excessive complexity

#### **Phase 4: Archive Migration (3-4 months)**
**Trigger**: When historical data older than 3 months exists
- Current trajectory: ~12-16 weeks from now
- **Action**: Move old partitions to archive storage
- **Rationale**: Cost optimization for infrequently accessed data

## 🏗️ **Infrastructure Architecture**

### **Current Setup**
```yaml
# Current PostgreSQL Storage Configuration
storage:
  size: 50Gi
  storageClass: longhorn-postgresql
walStorage:
  size: 10Gi
  storageClass: longhorn-postgresql
```

### **Target Architecture**
```yaml
# Enhanced Multi-Volume Configuration
storage:
  size: 50Gi # Recent data (2-3 months)
  storageClass: longhorn-postgresql
walStorage:
  size: 10Gi
  storageClass: longhorn-postgresql
tablespaces:
  - name: archive_data # Historical data (>3 months)
    storage:
      size: 500Gi
      storageClass: netcup-block-storage
  - name: temp_operations # Temporary operations
    storage:
      size: 100Gi
      storageClass: netcup-block-storage
```

## 📋 **Implementation Plan**

### **Phase 1: Infrastructure Preparation**

#### **1.1 Add NetCup Block Storage**
```bash
# On each VPS (n1, n2, n3)
# 1. Attach 500GB block storage via NetCup control panel
# 2. Format and mount new volumes

sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /mnt/postgres-archive
sudo mount /dev/sdb /mnt/postgres-archive
sudo chown 999:999 /mnt/postgres-archive

# Add to /etc/fstab for persistence
echo "/dev/sdb /mnt/postgres-archive ext4 defaults 0 2" | sudo tee -a /etc/fstab
```

#### **1.2 Create Storage Classes**
```yaml
# manifests/infrastructure/postgresql/netcup-block-storage.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: netcup-block-storage
provisioner: kubernetes.io/host-path
parameters:
  type: Directory
  path: /mnt/postgres-archive
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
```

#### **1.3 Update CloudNativePG Configuration**
```yaml
# manifests/infrastructure/postgresql/cluster-shared.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-shared
spec:
  instances: 3

  storage:
    size: 50Gi
    storageClass: longhorn-postgresql

  walStorage:
    size: 10Gi
    storageClass: longhorn-postgresql

  # Add tablespaces for multi-volume storage
  tablespaces:
    - name: archive_data
      storage:
        size: 500Gi
        storageClass: netcup-block-storage
    - name: temp_operations
      storage:
        size: 100Gi
        storageClass: netcup-block-storage

  # Enable partitioning extensions
  bootstrap:
    initdb:
      database: shared_db
      owner: shared_user
      postInitSQL:
        - "CREATE EXTENSION IF NOT EXISTS pg_partman"
        - "CREATE EXTENSION IF NOT EXISTS pg_cron"
```

### **Phase 2: Partitioning Implementation**

#### **2.1 Install Required Extensions**
Connect to the PieFed database:
```bash
kubectl exec -it -n postgresql-system postgres-shared-2 -- psql -U postgres -d piefed
```

Then install the extensions:
```sql
-- Install partitioning and scheduling extensions
CREATE EXTENSION IF NOT EXISTS pg_partman;
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Verify installation
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('pg_partman', 'pg_cron');
```

#### **2.2 Create Tablespaces**
```sql
-- Create tablespace for archive data
CREATE TABLESPACE archive_data LOCATION '/var/lib/postgresql/tablespaces/archive_data';

-- Create tablespace for temporary operations
CREATE TABLESPACE temp_operations LOCATION '/var/lib/postgresql/tablespaces/temp_operations';

-- Verify tablespaces
SELECT spcname, pg_tablespace_location(oid) FROM pg_tablespace;
```

#### **2.3 Partition the post_vote Table**

**Step 1: Backup Current Data**
```sql
-- Create backup of current table
CREATE TABLE post_vote_backup AS SELECT * FROM post_vote;
```

**Step 2: Create Partitioned Table Structure**
```sql
-- Rename existing table
ALTER TABLE post_vote RENAME TO post_vote_legacy;

-- Create new partitioned table
CREATE TABLE post_vote (
    id INTEGER NOT NULL,
    user_id INTEGER,
    author_id INTEGER,
    post_id INTEGER,
    effect DOUBLE PRECISION,
    created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    PRIMARY KEY (id, created_at) -- Include partition key in PK
) PARTITION BY RANGE (created_at);

-- Create indexes
CREATE INDEX idx_post_vote_created_at ON post_vote (created_at);
CREATE INDEX idx_post_vote_user_id ON post_vote (user_id);
CREATE INDEX idx_post_vote_post_id ON post_vote (post_id);
CREATE INDEX idx_post_vote_author_id ON post_vote (author_id);
```

**Step 3: Configure Automated Partitioning**
```sql
-- Set up pg_partman for monthly partitions
SELECT partman.create_parent(
    p_parent_table => 'public.post_vote',
    p_control => 'created_at',
    p_type => 'range',
    p_interval => 'monthly',
    p_premake => 3, -- Pre-create 3 future partitions
    p_start_partition => '2025-07-01' -- Start from July 2025
);

-- Configure retention and archive settings
UPDATE partman.part_config
SET retention = '12 months',
    retention_keep_table = true,
    infinite_time_partitions = true,
    optimize_constraint = 30
WHERE parent_table = 'public.post_vote';
```

**Step 4: Create Initial Partitions**
```sql
-- Create July 2025 partition (historical data)
CREATE TABLE post_vote_p2025_07 PARTITION OF post_vote
    FOR VALUES FROM ('2025-07-01') TO ('2025-08-01')
    TABLESPACE archive_data; -- Place on archive storage

-- Create August 2025 partition (recent data)
CREATE TABLE post_vote_p2025_08 PARTITION OF post_vote
    FOR VALUES FROM ('2025-08-01') TO ('2025-09-01'); -- Default tablespace

-- Create September 2025 partition (future data)
CREATE TABLE post_vote_p2025_09 PARTITION OF post_vote
    FOR VALUES FROM ('2025-09-01') TO ('2025-10-01'); -- Default tablespace
```

**Step 5: Migrate Data**
```sql
-- Migrate data from legacy table
INSERT INTO post_vote
SELECT * FROM post_vote_legacy
ORDER BY created_at;

-- Verify data migration
SELECT
    'Legacy' as source, COUNT(*) as row_count FROM post_vote_legacy
UNION ALL
SELECT
    'Partitioned' as source, COUNT(*) as row_count FROM post_vote;

-- Check partition distribution
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size,
    (SELECT COUNT(*) FROM information_schema.table_constraints
     WHERE table_name = pg_tables.tablename AND constraint_type = 'CHECK') as partition_count
FROM pg_tables
WHERE tablename LIKE 'post_vote_p%'
ORDER BY tablename;
```

#### **2.4 Set Up Automated Partition Management**
```sql
-- Create function to automatically move old partitions to archive storage
CREATE OR REPLACE FUNCTION move_old_partitions_to_archive()
RETURNS void AS $$
DECLARE
    partition_name text;
    archive_threshold date;
BEGIN
    -- Move partitions older than 3 months to archive storage
    archive_threshold := CURRENT_DATE - INTERVAL '3 months';

    FOR partition_name IN
        SELECT schemaname||'.'||tablename
        FROM pg_tables
        WHERE tablename LIKE 'post_vote_p%'
          AND tablename < 'post_vote_p' || TO_CHAR(archive_threshold, 'YYYY_MM')
    LOOP
        -- Move partition to archive tablespace
        EXECUTE format('ALTER TABLE %s SET TABLESPACE archive_data', partition_name);
        RAISE NOTICE 'Moved partition % to archive storage', partition_name;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Schedule monthly archive operations
SELECT cron.schedule(
    'move-old-partitions',
    '0 2 1 * *', -- 2 AM on the 1st of each month
    'SELECT move_old_partitions_to_archive()'
);

-- Schedule partition maintenance
SELECT cron.schedule(
    'partition-maintenance',
    '0 1 * * 0', -- 1 AM every Sunday
    'SELECT partman.run_maintenance_proc()'
);
```

### **Phase 3: Extend to Other Large Tables**

#### **3.1 Partition post_reply_vote Table**
```sql
-- Similar process for post_reply_vote (271 MB)
-- Follow same steps as post_vote table
```

#### **3.2 Partition post_reply Table**
```sql
-- Similar process for post_reply (201 MB)
-- Consider partitioning by created_at or parent post date
```

## 📊 **Monitoring and Maintenance**

### **Performance Monitoring Queries**

#### **Partition Size Monitoring**
```sql
-- Monitor partition sizes and locations
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size,
    tablespace,
    (SELECT COUNT(*) FROM information_schema.columns
     WHERE table_name = pg_tables.tablename) as column_count
FROM pg_tables
WHERE tablename LIKE 'post_vote_p%'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
```

#### **Query Performance Analysis**
```sql
-- Analyze query performance across partitions
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM post_vote
WHERE created_at >= '2025-01-01'
  AND created_at < '2025-12-31';
```

#### **Partition Pruning Verification**
```sql
-- Verify partition pruning is working
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM post_vote
WHERE created_at >= '2025-08-01'
  AND created_at < '2025-09-01';
```

### **Storage Usage Monitoring**
```bash
# Monitor tablespace usage
kubectl exec -n postgresql-system postgres-shared-2 -- psql -U postgres -c "
SELECT
    spcname as tablespace_name,
    pg_tablespace_location(oid) as location,
    pg_size_pretty(pg_tablespace_size(oid)) as size
FROM pg_tablespace
WHERE spcname NOT IN ('pg_default', 'pg_global');
"

# Monitor PVC usage
kubectl get pvc -n postgresql-system
kubectl describe pvc -n postgresql-system
```

### **Automated Maintenance Jobs**
```sql
-- View scheduled maintenance jobs
SELECT
    jobname,
    schedule,
    command,
    active,
    jobid
FROM cron.job
ORDER BY jobname;

-- Check partition maintenance logs
SELECT * FROM partman.part_config_sub;
```

## 🚨 **Troubleshooting Guide**

### **Common Issues and Solutions**

#### **Issue: Partition Creation Fails**
```sql
-- Check partition configuration
SELECT * FROM partman.part_config WHERE parent_table = 'public.post_vote';

-- Manually create missing partition
SELECT partman.create_parent(
    p_parent_table => 'public.post_vote',
    p_control => 'created_at',
    p_type => 'range',
    p_interval => 'monthly'
);
```

#### **Issue: Query Not Using Partition Pruning**
```sql
-- Check if constraint exclusion is enabled
SHOW constraint_exclusion;

-- Enable if needed
SET constraint_exclusion = partition;

-- Update statistics
ANALYZE post_vote;
```

#### **Issue: Tablespace Out of Space**
```bash
# Check tablespace usage
df -h /mnt/postgres-archive

# Add additional block storage if needed
# Follow NetCup documentation for volume expansion
```

## 📖 **Documentation References**

### **CloudNativePG Documentation**
- [Tablespaces](https://cloudnative-pg.io/documentation/current/tablespaces/) - Official tablespace configuration guide
- [FAQ](https://cloudnative-pg.io/documentation/current/faq/) - Database management best practices
- [Controller](https://cloudnative-pg.io/documentation/current/controller/) - Storage management concepts

### **PostgreSQL Documentation**
- [Declarative Partitioning](https://www.postgresql.org/docs/16/ddl-partitioning.html) - Official partitioning guide
- [Tablespaces](https://www.postgresql.org/docs/16/manage-ag-tablespaces.html) - Tablespace management
- [pg_partman Extension](https://github.com/pgpartman/pg_partman) - Automated partition management

### **NetCup Documentation**
- [Block Storage](https://www.netcup.eu/bestellen/produkt.php?produkt=2594) - Block storage attachment guide
- [VPS Management](https://www.netcup.eu/vserver/) - VPS configuration documentation

## 🎯 **Success Metrics**

### **Performance Targets**
- **Recent data queries**: <250ms (50% improvement from current 506ms)
- **Historical data queries**: <800ms (acceptable for archive storage)
- **Storage cost reduction**: 70% for historical data
- **Backup time improvement**: 60% reduction for recent data backups

### **Capacity Planning**
- **Primary storage**: Maintain 50GB for 2-3 months of recent data
- **Archive storage**: Scale to 500GB initially, expand as needed
- **Growth accommodation**: Support 20GB/year growth for 25+ years

### **Operational Goals**
- **Zero downtime**: All operations performed online
- **Application transparency**: No code changes required
- **Automated management**: Minimal manual intervention
- **Disaster recovery**: Independent backup strategies per tier

## 📅 **Implementation Timeline**

| Phase | Duration | Key Deliverables |
|-------|----------|------------------|
| **Infrastructure Prep** | 2 weeks | NetCup block storage attached, storage classes configured |
| **Partitioning Setup** | 1 week | Extensions installed, tablespaces created |
| **post_vote Migration** | 1 week | Partitioned table structure, data migration |
| **Automation Setup** | 1 week | Automated partition management, monitoring |
| **Other Tables** | 2 weeks | post_reply_vote and post_reply partitioning |
| **Testing & Optimization** | 1 week | Performance testing, fine-tuning |

**Total Implementation Time**: 8 weeks

## ✅ **Pre-Implementation Checklist**

- [ ] NetCup block storage volumes attached to all nodes
- [ ] Storage classes created and tested
- [ ] CloudNativePG cluster configuration updated
- [ ] Backup of current database completed
- [ ] pg_partman and pg_cron extensions available
- [ ] Monitoring queries prepared
- [ ] Rollback plan documented
- [ ] Team training on partition management completed

---

**Last Updated**: August 2025
**Next Review**: September 2025
**Owner**: Database Administration Team

@@ -0,0 +1,76 @@
# Recovering a partition from Longhorn Backup volume

## Pull the volume in the Longhorn UI
Under backups, choose which ones to restore (data and wal). Be sure that the replica count is 1 and
the ReadWrite mode is ReadWriteOnce. This should match what you had for the Pg volumes.

Get the volumes onto the same node. You may need to attach them, change the replica count,
then delete the replicas off of the undesired node.

## Swap the Volume under the PVC
Put CNPG into hibernate mode and wait for the database nodes to clear.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-shared
  namespace: postgresql-system
  annotations:
    # 🔑 CRITICAL: Hibernation prevents startup and data erasure
    cnpg.io/hibernation: "on"
spec:
  instances: 1 # it's way easier to start with one instance

  # put the cluster into single node configuration
  minSyncReplicas: 0
  maxSyncReplicas: 0
```

If you haven't deleted the db cluster, you should be able to use the same volume names as the previous primary.
If you did, then you'll use postgresql-shared-1 or whatever your naming scheme is. But wait to make them
until AFTER the initdb runs the first time. If you are starting over, you'll have to reset the
status field `latestGeneratedNode` to 0:
`kubectl patch clusters.postgresql.cnpg.io mydb --type=merge --subresource status --patch 'status: {latestGeneratedNode: 0}'` so that it'll create the first instance.
You'll also want to use a new PVC so that initdb clears out the data, and then swap your volume into that one.

Once you're past this stage, put it back into hibernation mode.

(why did I delete the files???)

Anyway, you need to swap the volume out from under the PVC that you're going to use.
You'll make a new PVC and set the (target?) uuid that identifies the volume to a new value.
I think this comes from longhorn. Make sure that the volume labels match the names of your recovery volumes.
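
A sketch of what that swap looks like with kubectl, following the same PV/PVC retargeting pattern as the disaster recovery guide above (the volume and claim names here are placeholders — substitute your recovery volume and the PVC name CNPG expects):

```bash
# Point the recovered PV at the PVC CNPG expects, then bind the PVC to it.
# Placeholder names; match them to your recovery volumes.
kubectl patch pv postgres-shared-data-recovered-pv --type=merge \
  -p '{"spec":{"claimRef":{"name":"postgres-shared-1","namespace":"postgresql-system"}}}'
kubectl patch pvc postgres-shared-1 -n postgresql-system --type=merge \
  -p '{"spec":{"volumeName":"postgres-shared-data-recovered-pv"}}'
```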

Then you'll have to make sure that your PVCs are annotated with the same annotations as your previous PVCs,
since CNPG puts its own annotations on them. It'll look like the below, from https://github.com/cloudnative-pg/cloudnative-pg/issues/5235. Make sure that versions and everything else match. You need these, otherwise the operator won't find a volume to use.
```yaml
annotations:
  cnpg.io/nodeSerial: "1"
  cnpg.io/operatorVersion: 1.24.0
  cnpg.io/pvcStatus: ready
  pv.kubernetes.io/bind-completed: "yes"
  pv.kubernetes.io/bound-by-controller: "yes"
  volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
  volume.kubernetes.io/storage-provisioner: driver.longhorn.io
finalizers:
  - kubernetes.io/pvc-protection
labels:
  cnpg.io/cluster: mydb
  cnpg.io/instanceName: mydb-1
  cnpg.io/instanceRole: primary
  cnpg.io/pvcRole: PG_DATA
  role: primary
name: mydb-1
namespace: mydb
ownerReferences:
  - apiVersion: postgresql.cnpg.io/v1
    controller: true
    kind: Cluster
    name: mydb
    uid: f1111111-111a-111f-111d-11111111111f
```

### Go out of hibernation mode
You should see your pod come up and be functional, without an initdb pod. Check it.
After a while, scale it back up.
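
Concretely, that's the same pair of commands as in the disaster recovery guide:

```bash
# Drop the hibernation annotation, then watch the instance come up.
kubectl annotate cluster postgres-shared -n postgresql-system cnpg.io/hibernation-
kubectl get pods -n postgresql-system -l cnpg.io/cluster=postgres-shared -w
```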

manifests/infrastructure/postgresql/README.md
@@ -0,0 +1,341 @@
# PostgreSQL Infrastructure

This directory contains the CloudNativePG setup for high-availability PostgreSQL on the Kubernetes cluster.

## Architecture

- **3 PostgreSQL instances**: 1 primary + 2 replicas for high availability
- **Synchronous replication**: Zero data loss (RPO=0) configuration
- **Node distribution**: Instances distributed across n1, n2, and n3 nodes
- **Current cluster**: `postgres-shared` with instances `postgres-shared-2` (primary), `postgres-shared-4`, `postgres-shared-5`
- **Longhorn storage**: Single replica (PostgreSQL handles replication)
- **Shared cluster**: One PostgreSQL cluster that applications can share

## Components

### **Core Components**
- `namespace.yaml`: PostgreSQL system namespace
- `repository.yaml`: CloudNativePG Helm repository
- `operator.yaml`: CloudNativePG operator deployment
- `postgresql-storageclass.yaml`: Optimized storage class for PostgreSQL
- `cluster-shared.yaml`: Shared PostgreSQL cluster configuration

### **Monitoring Components**
- `postgresql-dashboard-metrics.yaml`: Custom metrics ConfigMap for enhanced monitoring
- `postgresql-dashboard-rbac.yaml`: RBAC permissions for metrics collection
- Built-in ServiceMonitor: Automatically configured for OpenObserve integration

### **Backup Components**
- `backup-config.yaml`: Longhorn recurring backup jobs for PostgreSQL volumes
- Longhorn integration: S3 backup via label-based volume selection

## Services Created

CloudNativePG automatically creates these services:

- `postgresql-shared-rw`: Write operations (connects to primary)
- `postgresql-shared-ro`: Read-only operations (connects to replicas)
- `postgresql-shared-r`: Read operations (connects to any instance)
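
To confirm the services exist and resolve to endpoints (a quick sketch; the label selector is an assumption based on the CNPG labels used elsewhere in this repo):

```bash
# List the cluster's services and check which pods back them.
kubectl get svc -n postgresql-system -l cnpg.io/cluster=postgres-shared
kubectl get endpoints postgresql-shared-rw postgresql-shared-ro -n postgresql-system
```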

## Connection Information

### For Applications

Applications should connect using these connection parameters:

**Write Operations:**
```yaml
host: postgresql-shared-rw.postgresql-system.svc.cluster.local
port: 5432
database: shared_db
username: shared_user
```

**Read Operations:**
```yaml
host: postgresql-shared-ro.postgresql-system.svc.cluster.local
port: 5432
database: shared_db
username: shared_user
```
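
A minimal in-cluster connectivity check using these parameters (a sketch — assumes a `postgres:16` client image can be pulled; the password comes from the secret described in the next section):

```bash
# One-off psql session against the read-write service; prompts for the password.
kubectl run psql-check --rm -it --image=postgres:16 --restart=Never -- \
  psql "host=postgresql-shared-rw.postgresql-system.svc.cluster.local port=5432 dbname=shared_db user=shared_user sslmode=require"
```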

### Getting Credentials

The PostgreSQL password is auto-generated and stored in a secret:

```bash
# Get the password for the shared_user
kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath="{.data.password}" | base64 -d

# Get the superuser password
kubectl get secret postgres-shared-superuser -n postgresql-system -o jsonpath="{.data.password}" | base64 -d
```

## Application Integration Example

Here's how an application deployment would connect:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  template:
    spec:
      containers:
        - name: app
          image: example-app:latest
          env:
            - name: DB_HOST
              value: "postgresql-shared-rw.postgresql-system.svc.cluster.local"
            - name: DB_PORT
              value: "5432"
            - name: DB_NAME
              value: "shared_db"
            - name: DB_USER
              value: "shared_user"
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-shared-app
                  key: password
```

## Monitoring

The PostgreSQL cluster includes comprehensive monitoring and observability:

### **Metrics & Monitoring** ✅ **OPERATIONAL**
- **Metrics Port**: 9187 - PostgreSQL metrics endpoint
- **ServiceMonitor**: Configured for OpenObserve integration
- **Built-in Metrics**: CloudNativePG provides extensive default metrics including:
  - **Connection Metrics**: `cnpg_backends_total`, `cnpg_pg_settings_setting{name="max_connections"}`
  - **Performance Metrics**: `cnpg_pg_stat_database_xact_commit`, `cnpg_pg_stat_database_xact_rollback`
  - **Storage Metrics**: `cnpg_pg_database_size_bytes`, `cnpg_pg_stat_database_blks_hit`, `cnpg_pg_stat_database_blks_read`
  - **Cluster Health**: `cnpg_collector_up`, `cnpg_collector_postgres_version`
  - **Replication**: `cnpg_pg_stat_replication_*` metrics for streaming replication status

### **Custom Metrics System**
- **ConfigMap Support**: Custom queries can be defined via ConfigMaps
- **RBAC Configured**: PostgreSQL service account has permissions to read custom metrics ConfigMaps
- **Predefined Queries**: CloudNativePG includes `cnpg-default-monitoring` ConfigMap with standard queries
- **Monitoring Role**: Uses `pg_monitor` role for secure metrics collection

### **Dashboard Integration**
- **OpenObserve Ready**: All metrics automatically ingested into OpenObserve
- **Key Performance Indicators**:
  - Connection utilization: `cnpg_backends_total / cnpg_pg_settings_setting{name="max_connections"} * 100`
  - Buffer cache hit ratio: `cnpg_pg_stat_database_blks_hit / (cnpg_pg_stat_database_blks_hit + cnpg_pg_stat_database_blks_read) * 100`
  - Transaction rate: `rate(cnpg_pg_stat_database_xact_commit[5m])`
  - Rollback ratio: `cnpg_pg_stat_database_xact_rollback / (cnpg_pg_stat_database_xact_commit + cnpg_pg_stat_database_xact_rollback) * 100`

### **High Availability Monitoring**
- **Automatic Failover**: CloudNativePG handles primary/replica failover automatically
- **Health Checks**: Continuous health monitoring with automatic recovery
- **Streaming Replication**: Real-time replication status monitoring

## Backup Strategy

### **Longhorn Storage-Level Backups (Incremental)**
- **Daily backups**: 2 AM UTC, retain 14 days (2 weeks)
- **Weekly backups**: 1 AM Sunday, retain 8 weeks (2 months)
- **Snapshot cleanup**: 3 AM daily, keep 5 local snapshots
- **Target**: Backblaze B2 S3 storage via existing setup
- **Type**: Incremental (efficient change block detection)

### **CloudNativePG Application-Level Backups**
- **WAL archiving**: Continuous transaction log archiving
- **Point-in-time recovery**: Available via CloudNativePG
- **Retention**: 30-day backup retention policy

### **Backup Labels**
PostgreSQL volumes are automatically backed up based on labels:
```yaml
backup.longhorn.io/enable: "true"
app: postgresql-shared
```
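
The recurring jobs in `backup-config.yaml` actually select volumes by the `postgresql-backup` group; if a volume ever needs to be added to that group by hand, Longhorn's volume-label convention (to the best of my understanding — verify against your Longhorn version) looks like this:

```bash
# Add a Longhorn volume to the postgresql-backup recurring-job group.
# <volume-name> is a placeholder for the actual Longhorn volume.
kubectl label volumes.longhorn.io <volume-name> -n longhorn-system \
  recurring-job-group.longhorn.io/postgresql-backup=enabled
```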

## Scaling

To add more read replicas:
```yaml
# Edit cluster-shared.yaml
spec:
  instances: 4 # Increase from 3 to 4 for additional read replica
```
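
The same change can be made without editing the file, mirroring the patch commands used in the recovery guides:

```bash
# Add a read replica by patching the cluster in place.
kubectl patch cluster postgres-shared -n postgresql-system \
  --type=merge -p '{"spec":{"instances":4}}'
```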
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### **Cluster Status**
|
||||
```bash
|
||||
# Check cluster status
|
||||
kubectl get cluster -n postgresql-system
|
||||
kubectl describe cluster postgresql-shared -n postgresql-system
|
||||
|
||||
# Check pods
|
||||
kubectl get pods -n postgresql-system
|
||||
kubectl logs postgres-shared-2 -n postgresql-system # Current primary
|
||||
```
|
||||
|
||||
### **Monitoring & Metrics**
|
||||
```bash
|
||||
# Check ServiceMonitor
|
||||
kubectl get servicemonitor -n postgresql-system
|
||||
kubectl describe servicemonitor postgresql-shared -n postgresql-system
|
||||
|
||||
# Check metrics endpoint directly
|
||||
kubectl port-forward -n postgresql-system postgres-shared-2 9187:9187 # Primary instance
|
||||
curl http://localhost:9187/metrics
|
||||
|
||||
# Check custom metrics ConfigMap
|
||||
kubectl get configmap -n postgresql-system
|
||||
kubectl describe configmap postgresql-dashboard-metrics -n postgresql-system
|
||||
|
||||
# Check RBAC permissions
|
||||
kubectl get role,rolebinding -n postgresql-system
|
||||
kubectl describe rolebinding postgresql-dashboard-metrics-reader -n postgresql-system
|
||||
```
|
||||
|
||||
### **Port Forwarding**
|
||||
|
||||
Port forwarding allows you to connect to PostgreSQL from your local machine using standard database tools.
|
||||
|
||||
**⚠️ Important**: PostgreSQL requires SSL/TLS connections. When port forwarding, you must configure your client to handle SSL properly.
|
||||
|
||||
**Read-Only Replica (Load Balanced):**
|
||||
```bash
|
||||
# Forward to read-only service (load balances across all replicas)
|
||||
kubectl port-forward -n postgresql-system svc/postgresql-shared-ro 5432:5432
|
||||
|
||||
# Get the password for shared_user
|
||||
kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo
|
||||
|
||||
# Connect with SSL required (recommended):
|
||||
# Connection string: postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=require
|
||||
# Or configure your client:
|
||||
# - host: localhost
|
||||
# - port: 5432
|
||||
# - database: shared_db
|
||||
# - username: shared_user
|
||||
# - password: <from secret above>
|
||||
# - SSL mode: require (or disable for testing only)
|
||||
```
|
||||
|
||||
**Specific Replica Pod:**
|
||||
```bash
|
||||
# List replica pods
|
||||
kubectl get pods -n postgresql-system -l cnpg.io/instanceRole=replica
|
||||
|
||||
# Forward to specific replica pod (e.g., postgres-shared-4)
|
||||
kubectl port-forward -n postgresql-system pod/postgres-shared-4 5432:5432
|
||||
|
||||
# Get the password for shared_user
|
||||
kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo
|
||||
|
||||
# Connect with SSL required (recommended):
|
||||
# Connection string: postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=require
|
||||
# Or configure your client with SSL mode: require
|
||||
```
|
||||
|
||||
**Primary (Read-Write) - For Maintenance Only:**
|
||||
```bash
|
||||
# Forward to read-write service (connects to primary)
|
||||
kubectl port-forward -n postgresql-system svc/postgresql-shared-rw 5433:5432
|
||||
|
||||
# Note: Using port 5433 locally to avoid conflict if read-only is on 5432
|
||||
# Get the password
|
||||
kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo
|
||||
|
||||
# Connect using localhost:5433 with SSL mode: require
|
||||
```
|
||||
|
||||
**SSL Configuration Notes:**
|
||||
- **SSL is enabled** on PostgreSQL (ssl = on)
|
||||
- For **port forwarding**, clients must explicitly configure SSL mode
|
||||
- The server uses self-signed certificates, so clients will need to accept untrusted certificates
|
||||
- For production clients connecting directly (not via port-forward), use proper SSL with CA verification
|
||||
|
||||
**Troubleshooting Port Forward "Broken Pipe" Errors:**
|
||||
If you see `error: lost connection to pod` or `broken pipe` errors:
|
||||
1. **Use direct pod port forwarding** instead of service port forwarding (more reliable):
|
||||
```bash
|
||||
# List available replica pods
|
||||
kubectl get pods -n postgresql-system -l cnpg.io/instanceRole=replica
|
||||
|
||||
# Forward to specific replica pod (more stable)
|
||||
kubectl port-forward -n postgresql-system pod/postgres-shared-4 5432:5432
|
||||
```
|
||||
|
||||
2. **Configure your client with explicit SSL mode**:
|
||||
- Use `sslmode=require` in your connection string (recommended)
|
||||
- Or `sslmode=prefer` (allows fallback to non-SSL if SSL fails)
|
||||
- Or `sslmode=disable` for testing only (not recommended)
|
||||
|
||||
3. **Connection string examples**:
|
||||
```bash
|
||||
# With SSL required (recommended)
|
||||
postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=require
|
||||
|
||||
# With SSL preferred (allows fallback)
|
||||
postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=prefer
|
||||
|
||||
# Without SSL (testing only)
|
||||
postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=disable
|
||||
```
|
||||
|
||||
**Getting the CA Certificate (for proper SSL verification):**
|
||||
```bash
|
||||
# Get the CA certificate from the cluster secret
|
||||
kubectl get secret postgres-shared-ca -n postgresql-system -o jsonpath='{.data.ca\.crt}' | base64 -d > postgres-ca.crt
|
||||
|
||||
# Use with your client:
|
||||
# Connection string: postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=verify-ca&sslrootcert=postgres-ca.crt
|
||||
# Or configure your client to use the CA certificate file for SSL verification
|
||||
```
|
||||
|
||||
### **Database Connection**
|
||||
```bash
|
||||
# Connect to PostgreSQL via exec
|
||||
kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U shared_user -d shared_db
|
||||
|
||||
# Check replication status
|
||||
kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U postgres -c "SELECT * FROM pg_stat_replication;"
|
||||
|
||||
# Check cluster health
|
||||
kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U postgres -c "SELECT pg_is_in_recovery();"
|
||||
```
|
||||
|
||||
### **Backup & Storage**
|
||||
```bash
|
||||
# Check PVC status
|
||||
kubectl get pvc -n postgresql-system
|
||||
kubectl describe pvc postgres-shared-2 -n postgresql-system # Primary instance PVC
|
||||
|
||||
# Check Longhorn volumes
|
||||
kubectl get volumes -n longhorn-system
|
||||
kubectl describe volume -n longhorn-system | grep postgresql
|
||||
```

### **Long Running Queries**
When a long-running query is in progress, use this command to see what is executing and how long it has been running:
```bash
kubectl exec -n postgresql-system postgres-shared-2 -- psql -U postgres -c "
SELECT
    pid,
    datname,
    usename,
    application_name,
    now() - xact_start AS tx_duration,
    now() - query_start AS query_duration,
    state,
    wait_event_type,
    wait_event,
    query
FROM pg_stat_activity
WHERE state != 'idle'
  AND query NOT LIKE '%pg_stat_activity%'
  AND (now() - xact_start > interval '10 seconds' OR now() - query_start > interval '10 seconds')
ORDER BY GREATEST(now() - xact_start, now() - query_start) DESC;
"
```
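
If one of the reported PIDs needs to be stopped, PostgreSQL's standard backend-control functions apply (replace `<pid>` with a value from the query above):
```bash
# Politely cancel the running query (the backend stays connected)
kubectl exec -n postgresql-system postgres-shared-2 -- psql -U postgres -c "SELECT pg_cancel_backend(<pid>);"

# Forcefully terminate the backend if the cancel is ignored
kubectl exec -n postgresql-system postgres-shared-2 -- psql -U postgres -c "SELECT pg_terminate_backend(<pid>);"
```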

manifests/infrastructure/postgresql/backup-config.yaml (new file, 60 lines)
@@ -0,0 +1,60 @@
---
# Longhorn Recurring Job for PostgreSQL Backup
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: postgresql-backup-daily
  namespace: longhorn-system
spec:
  # Incremental backup (snapshot-based)
  task: backup
  cron: "0 2 * * *"  # Daily at 2 AM UTC
  retain: 14         # Keep 14 daily backups (2 weeks)
  concurrency: 2     # Max 2 concurrent backup operations

  # Target PostgreSQL volumes using group-based selection
  groups:
    - postgresql-backup

  # Labels for the recurring job itself
  labels:
    recurring-job: "postgresql-backup-daily"
    backup-type: "daily"
---
# Weekly backup for longer retention
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: postgresql-backup-weekly
  namespace: longhorn-system
spec:
  task: backup
  cron: "0 1 * * 0"  # Weekly at 1 AM on Sunday
  retain: 8          # Keep 8 weekly backups (2 months)
  concurrency: 1

  groups:
    - postgresql-backup

  labels:
    recurring-job: "postgresql-backup-weekly"
    backup-type: "weekly"
---
# Snapshot cleanup job for space management
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: postgresql-snapshot-cleanup
  namespace: longhorn-system
spec:
  task: snapshot-cleanup
  cron: "0 3 * * *"  # Daily at 3 AM UTC (after backup)
  retain: 5          # Keep only 5 snapshots locally
  concurrency: 2

  groups:
    - postgresql-backup

  labels:
    recurring-job: "postgresql-snapshot-cleanup"
    backup-type: "cleanup"
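
These jobs only fire for volumes that are members of the `postgresql-backup` group. A quick way to verify the jobs exist, plus a sketch for adding a volume to the group manually (the `recurringjob.longhorn.io/<group>=enabled` label convention is an assumption; verify against your Longhorn version's docs):
```bash
# List the recurring jobs Longhorn knows about
kubectl get recurringjobs.longhorn.io -n longhorn-system

# Manually add a volume to the postgresql-backup group (label convention
# is an assumption; <volume-name> is a placeholder)
kubectl label volumes.longhorn.io -n longhorn-system <volume-name> \
  recurringjob.longhorn.io/postgresql-backup=enabled
```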
@@ -0,0 +1,69 @@
---
# Self-signed issuer for PostgreSQL certificates
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: postgresql-selfsigned-issuer
  namespace: postgresql-system
spec:
  selfSigned: {}

---
# Server TLS certificate for PostgreSQL cluster
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: postgresql-shared-server-cert
  namespace: postgresql-system
  labels:
    cnpg.io/reload: ""  # Enable automatic reload by CloudNativePG
spec:
  secretName: postgresql-shared-server-cert
  commonName: postgresql-shared-rw
  usages:
    - server auth
  dnsNames:
    # Primary service (read-write)
    - postgresql-shared-rw
    - postgresql-shared-rw.postgresql-system
    - postgresql-shared-rw.postgresql-system.svc
    - postgresql-shared-rw.postgresql-system.svc.cluster.local
    # Read service (read-only from any instance)
    - postgresql-shared-r
    - postgresql-shared-r.postgresql-system
    - postgresql-shared-r.postgresql-system.svc
    - postgresql-shared-r.postgresql-system.svc.cluster.local
    # Read-only service (read-only replicas only)
    - postgresql-shared-ro
    - postgresql-shared-ro.postgresql-system
    - postgresql-shared-ro.postgresql-system.svc
    - postgresql-shared-ro.postgresql-system.svc.cluster.local
  issuerRef:
    name: postgresql-selfsigned-issuer
    kind: Issuer
    group: cert-manager.io
  # Certificate duration (90 days to match CloudNativePG default)
  duration: 2160h    # 90 days
  renewBefore: 168h  # 7 days (matches CloudNativePG default)

---
# Client certificate for streaming replication
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: postgresql-shared-client-cert
  namespace: postgresql-system
  labels:
    cnpg.io/reload: ""  # Enable automatic reload by CloudNativePG
spec:
  secretName: postgresql-shared-client-cert
  commonName: streaming_replica
  usages:
    - client auth
  issuerRef:
    name: postgresql-selfsigned-issuer
    kind: Issuer
    group: cert-manager.io
  # Certificate duration (90 days to match CloudNativePG default)
  duration: 2160h    # 90 days
  renewBefore: 168h  # 7 days (matches CloudNativePG default)
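
To confirm cert-manager has issued these and that the DNS SANs came through, inspect the issued secret (sketch; requires openssl locally):
```bash
# Certificate resources should show READY=True
kubectl get certificates -n postgresql-system

# Decode the issued server cert and check its SANs
kubectl get secret postgresql-shared-server-cert -n postgresql-system \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | \
  openssl x509 -noout -text | grep -A1 "Subject Alternative Name"
```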
@@ -0,0 +1,85 @@
---
# Comprehensive CloudNativePG network policy for single-operator deployment
# This allows the Helm-deployed operator in postgresql-system to manage the cluster
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: cnpg-comprehensive-access
  namespace: postgresql-system
spec:
  description: "Allow CloudNativePG operator and cluster communication"
  endpointSelector:
    matchLabels:
      cnpg.io/cluster: postgres-shared  # Apply to postgres-shared cluster pods
  ingress:
    # Allow operator in same namespace to manage cluster
    - fromEndpoints:
        - matchLabels:
            app.kubernetes.io/name: cloudnative-pg  # Helm-deployed operator
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP  # PostgreSQL database
            - port: "8000"
              protocol: TCP  # CloudNativePG health endpoint
            - port: "9187"
              protocol: TCP  # PostgreSQL metrics
    # Allow cluster-wide access for applications and monitoring
    - fromEntities:
        - cluster
        - host
        - remote-node
        - kube-apiserver  # Explicitly allow API server (used for service port-forward)
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP  # PostgreSQL database access
            - port: "9187"
              protocol: TCP  # Metrics collection
    # Allow pod-to-pod communication within cluster (replication)
    - fromEndpoints:
        - matchLabels:
            cnpg.io/cluster: postgres-shared
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP  # PostgreSQL replication
            - port: "8000"
              protocol: TCP  # Health checks between replicas
---
# Allow CloudNativePG operator to reach webhook endpoints
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: cnpg-operator-webhook-access
  namespace: postgresql-system
spec:
  description: "Allow CloudNativePG operator webhook communication"
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: cloudnative-pg  # Helm-deployed operator
  ingress:
    # Allow Kubernetes API server to reach webhook
    - fromEntities:
        - host
        - cluster
      toPorts:
        - ports:
            - port: "9443"
              protocol: TCP  # CloudNativePG webhook port
  egress:
    # Allow operator to reach PostgreSQL pods for management
    - toEndpoints:
        - matchLabels:
            cnpg.io/cluster: postgres-shared
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP
            - port: "8000"
              protocol: TCP
    # Allow operator to reach Kubernetes API
    - toEntities:
        - cluster
        - host
        - remote-node
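
A quick check that the policies loaded and are selecting the intended pods (sketch):
```bash
# Both policies should be listed in the namespace
kubectl get ciliumnetworkpolicies -n postgresql-system

# Describe shows the selector and rule status
kubectl describe ciliumnetworkpolicy cnpg-comprehensive-access -n postgresql-system
```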

manifests/infrastructure/postgresql/cluster-shared.yaml (new file, 176 lines)
@@ -0,0 +1,176 @@
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-shared
  namespace: postgresql-system
  labels:
    app: postgresql-shared
    backup.longhorn.io/enable: "true"
spec:
  instances: 3

  # Use CloudNativePG-compatible PostGIS image
  # imageName: ghcr.io/cloudnative-pg/postgresql:16.6  # Standard image
  imageName: <YOUR_REGISTRY_URL>/library/cnpg-postgis:16.6-3.4-v2

  # Bootstrap with initial database and user
  bootstrap:
    initdb:
      database: shared_db
      owner: shared_user
      encoding: UTF8
      localeCollate: en_US.UTF-8
      localeCType: en_US.UTF-8

      # Install PostGIS extensions in template database (available to all databases)
      postInitTemplateSQL:
        - CREATE EXTENSION IF NOT EXISTS postgis;
        - CREATE EXTENSION IF NOT EXISTS postgis_topology;
        - CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
        - CREATE EXTENSION IF NOT EXISTS postgis_tiger_geocoder;

  # PostgreSQL configuration for conservative scaling (3GB memory limit)
  postgresql:
    parameters:
      # Performance optimizations for 3GB memory limit
      max_connections: "300"
      shared_buffers: "768MB"         # 25% of 3GB memory limit
      effective_cache_size: "2.25GB"  # ~75% of 3GB memory limit
      maintenance_work_mem: "192MB"   # Scaled for 3GB memory limit
      checkpoint_completion_target: "0.9"
      wal_buffers: "24MB"
      default_statistics_target: "100"
      random_page_cost: "1.1"         # Good for SSD storage
      effective_io_concurrency: "200"
      work_mem: "12MB"                # Conservative: 300 connections x 12MB = ~3.6GB theoretical max
      min_wal_size: "1GB"
      max_wal_size: "6GB"

      # Additional optimizations for your hardware (tuned for 2-core limit)
      max_worker_processes: "8"              # Scaled for 2 CPU cores
      max_parallel_workers: "6"              # Increased for better OLTP workload
      max_parallel_workers_per_gather: "3"   # Max 3 workers per query
      max_parallel_maintenance_workers: "3"  # For maintenance operations

      # Network timeout adjustments for 100Mbps VLAN
      wal_sender_timeout: "10s"    # Increased from 5s for slower network
      wal_receiver_timeout: "10s"  # Increased from 5s for slower network

      # Multi-instance HA configuration with asynchronous replication
      synchronous_commit: "on"  # Favor data integrity

      # Log long running queries
      log_min_duration_statement: "5000"  # Log queries > 5 seconds
      log_line_prefix: "%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h "
      log_statement: "none"               # Only log slow queries, not all

      # Query activity tracking - increase limit for complex queries
      track_activity_query_size: "8192"  # 8KB - allows full query text in pg_stat_activity

  # Storage configuration using PostgreSQL-optimized storage class
  storage:
    size: 50Gi
    storageClass: longhorn-postgresql

  # Separate WAL storage for better I/O performance
  walStorage:
    size: 10Gi
    storageClass: longhorn-postgresql

  # Enable pod anti-affinity for HA cluster (distribute across nodes)
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname

  resources:
    requests:
      cpu: 750m
      memory: 1.5Gi
    limits:
      cpu: 2000m
      memory: 3Gi

  # Enable superuser access for maintenance
  enableSuperuserAccess: true

  # Certificate configuration using cert-manager
  certificates:
    serverTLSSecret: postgresql-shared-server-cert
    serverCASecret: postgresql-shared-server-cert
    clientCASecret: postgresql-shared-client-cert
    replicationTLSSecret: postgresql-shared-client-cert

  # Replication slot configuration - enabled for HA cluster
  replicationSlots:
    highAvailability:
      enabled: true  # Enable HA replication slots for multi-instance cluster
    synchronizeReplicas:
      enabled: true  # Enable replica synchronization for HA

  # Monitoring configuration for Prometheus metrics
  monitoring:
    enablePodMonitor: true
    # Custom metrics for dashboard compatibility
    customQueriesConfigMap:
      - name: postgresql-dashboard-metrics
        key: queries
      - name: postgresql-connection-metrics
        key: custom-queries

  # Reasonable startup delay for a stable multi-instance cluster
  startDelay: 30
  probes:
    startup:
      initialDelaySeconds: 60  # Allow PostgreSQL to start and begin recovery
      periodSeconds: 10
      timeoutSeconds: 10
      failureThreshold: 90     # 15 minutes total for replica recovery with Longhorn storage
    readiness:
      initialDelaySeconds: 30  # Allow instance manager to initialize
      periodSeconds: 10
      timeoutSeconds: 10
      failureThreshold: 3
    liveness:
      initialDelaySeconds: 120  # Allow full startup before liveness checks
      periodSeconds: 30
      timeoutSeconds: 10
      failureThreshold: 3

  primaryUpdateMethod: switchover  # Use switchover instead of restart to prevent restart loops
  primaryUpdateStrategy: unsupervised

  # S3 backup configuration for CloudNativePG - TEMPORARILY DISABLED
  # backup:
  #   # Backup retention policy
  #   retentionPolicy: "30d"  # Keep backups for 30 days
  #
  #   # S3 backup configuration for Backblaze B2
  #   barmanObjectStore:
  #     destinationPath: s3://postgresql-backups/cnpg
  #     s3Credentials:
  #       accessKeyId:
  #         name: postgresql-s3-backup-credentials
  #         key: AWS_ACCESS_KEY_ID
  #       secretAccessKey:
  #         name: postgresql-s3-backup-credentials
  #         key: AWS_SECRET_ACCESS_KEY
  #     endpointURL: <REPLACE_WITH_S3_ENDPOINT>
  #
  #     # Backblaze B2 specific configuration
  #     data:
  #       compression: gzip
  #       encryption: AES256
  #       immediateCheckpoint: true
  #       jobs: 2  # Parallel backup jobs
  #
  #     wal:
  #       compression: gzip
  #       encryption: AES256
  #       maxParallel: 2  # Parallel WAL archiving
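
Once the cluster is up, it's worth confirming the tuned parameters actually took effect. The 12MB work_mem is the lever here: 300 connections x 12MB is roughly 3.6GB worst case, slightly over the 3Gi limit, which is why it's labeled conservative. A quick check:
```bash
# Confirm the tuned parameters were applied to the running instance
kubectl exec -n postgresql-system postgres-shared-2 -- psql -U postgres -c \
  "SELECT name, setting, unit FROM pg_settings
   WHERE name IN ('max_connections','shared_buffers','work_mem','effective_cache_size');"
```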

manifests/infrastructure/postgresql/kustomization.yaml (new file, 18 lines)
@@ -0,0 +1,18 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - repository.yaml
  - operator.yaml
  - postgresql-storageclass.yaml
  - cert-manager-certificates.yaml
  - cilium-cnpg-policies.yaml
  - cluster-shared.yaml
  - backup-config.yaml
  - postgresql-s3-backup-secret.yaml
  # - scheduled-backups.yaml  # Removed - was using barmanObjectStore method
  - postgresql-dashboard-metrics.yaml
  - postgresql-dashboard-rbac.yaml
  - postgresql-connection-metrics.yaml
  - postgresql-service-alias.yaml
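
A dry-run render catches missing files or indentation slips before Flux tries to apply the stack (a sketch, assuming the manifests live at this path in the repo checkout):
```bash
# Render the whole kustomization locally without applying anything
kubectl kustomize manifests/infrastructure/postgresql | less
```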

manifests/infrastructure/postgresql/namespace.yaml (new file, 9 lines)
@@ -0,0 +1,9 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: postgresql-system
  labels:
    name: postgresql-system
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
@@ -0,0 +1,81 @@
# Example PostgreSQL Network Policies (not applied by default)
# Uncomment and customize these if you want to implement network security for PostgreSQL

# ---
# apiVersion: "cilium.io/v2"
# kind: CiliumNetworkPolicy
# metadata:
#   name: "postgresql-ingress"
#   namespace: postgresql-system
# spec:
#   description: "Allow ingress traffic to PostgreSQL pods"
#   endpointSelector:
#     matchLabels:
#       postgresql: postgresql-shared
#   ingress:
#     # Allow CloudNativePG operator status checks
#     - fromEndpoints:
#         - matchLabels:
#             app.kubernetes.io/name: cloudnative-pg
#       toPorts:
#         - ports:
#             - port: "8000"  # Status port
#               protocol: "TCP"
#
#     # Allow PostgreSQL connections from applications
#     - fromEntities:
#         - cluster  # Allow any pod in cluster to connect
#       toPorts:
#         - ports:
#             - port: "5432"  # PostgreSQL port
#               protocol: "TCP"
#
#     # Allow PostgreSQL replication between instances
#     - fromEndpoints:
#         - matchLabels:
#             postgresql: postgresql-shared  # Allow PostgreSQL pods to talk to each other
#       toPorts:
#         - ports:
#             - port: "5432"
#               protocol: "TCP"
#
#     # Allow metrics scraping (for OpenObserve)
#     - fromEndpoints:
#         - matchLabels:
#             app: openobserve-collector
#       toPorts:
#         - ports:
#             - port: "9187"  # Metrics port
#               protocol: "TCP"

# ---
# apiVersion: "cilium.io/v2"
# kind: CiliumNetworkPolicy
# metadata:
#   name: "postgresql-egress"
#   namespace: postgresql-system
# spec:
#   description: "Allow egress traffic from PostgreSQL pods"
#   endpointSelector:
#     matchLabels:
#       postgresql: postgresql-shared
#   egress:
#     # Allow DNS resolution
#     - toEndpoints:
#         - matchLabels:
#             k8s-app: kube-dns
#       toPorts:
#         - ports:
#             - port: "53"
#               protocol: "UDP"
#             - port: "53"
#               protocol: "TCP"
#
#     # Allow PostgreSQL replication
#     - toEndpoints:
#         - matchLabels:
#             postgresql: postgresql-shared
#       toPorts:
#         - ports:
#             - port: "5432"
#               protocol: "TCP"

manifests/infrastructure/postgresql/operator.yaml (new file, 56 lines)
@@ -0,0 +1,56 @@
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cloudnative-pg
  namespace: postgresql-system
spec:
  interval: 5m
  chart:
    spec:
      chart: cloudnative-pg
      version: ">=0.20.0"
      sourceRef:
        kind: HelmRepository
        name: cnpg-repo
        namespace: postgresql-system
      interval: 1m
  values:
    # Operator configuration
    operator:
      resources:
        requests:
          cpu: 100m
          memory: 200Mi
        limits:
          cpu: 500m
          memory: 500Mi

    # Enable webhook for better cluster management
    webhook:
      enabled: true
      resources:
        requests:
          cpu: 50m
          memory: 100Mi
        limits:
          cpu: 200m
          memory: 200Mi
      # Fix webhook certificate trust issue via cert-manager CA injection
      validatingWebhookConfiguration:
        annotations:
          cert-manager.io/inject-apiserver-ca: "true"
      mutatingWebhookConfiguration:
        annotations:
          cert-manager.io/inject-apiserver-ca: "true"

    # Monitoring configuration (for future OpenObserve integration)
    monitoring:
      enabled: true
      createPodMonitor: true

    # Allow scheduling on control plane nodes
    tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
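
Flux should reconcile this shortly after the HelmRepository syncs; a quick status check (assumes the flux CLI is installed locally):
```bash
# The release should report Ready=True once the chart installs
flux get helmreleases -n postgresql-system

# The operator pod should be Running
kubectl get pods -n postgresql-system -l app.kubernetes.io/name=cloudnative-pg
```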
@@ -0,0 +1,42 @@
apiVersion: v1
kind: Secret
metadata:
  name: postgresql-admin-credentials
  namespace: postgresql-system
type: Opaque
stringData:
  #ENC[AES256_GCM,data:+Zv35yp3D73zMVVEccM0mYRwUFbslNSjDMnWnsAmS4AN,iv:u7PqYdgrzKWEhwgve4d/htEO2MYv2mmrDXEM0XnfLis=,tag:moB6VnHm8vihnYBR3IbTog==,type:comment]
  POSTGRES_USER: ENC[AES256_GCM,data:WVA232KFdkI=,iv:NZPOaxWbvWWiRHr5LDk9d/YJu34L2Pg9jQaKKYd5iug=,tag:rp99S2G7Rf0Q9kaNG/Oi7A==,type:str]
  POSTGRES_PASSWORD: ENC[AES256_GCM,data:SC07i6sD6hS+qfT+Mmu0MNuSwiHwQcT4KAC9JqJVVlBN5zIZV/CeSXUbXu+UL1CN+fgORLMZIzw68tWhd1p74A==,iv:zGYa9PKWGKviodHGQhntHPRvIOqJY7sQhdHfQclgP8U=,tag:LKZbQDd2ycGmuiQpqPu2Zg==,type:str]
  POSTGRES_HOST: ENC[AES256_GCM,data:WMjL5Ev3qeG/Xvkdd1byJ/Y/145F+M+NL18cu76bKY4DqN58ylIPGd4qujvCss3h2FblMJ9m1+Y=,iv:l3amTikACA+7qIYBZdv7aB1ojGvKXJCJPY3Oxk3HXdY=,tag:YZGQyDKw9Sx8XMp4H5StNQ==,type:str]
  POSTGRES_PORT: ENC[AES256_GCM,data:6q+sVA==,iv:S5ZYk0P0JEcGi3lizgqMVrF8hZu1eenkBo4UjppUouI=,tag:7t4Cx6HfbwsiHqWeKzXnJQ==,type:str]
sops:
  lastmodified: "2025-07-10T17:38:00Z"
  mac: ENC[AES256_GCM,data:P9JUxOJ8+LMhw867vLaXh0Hbl1/lI7oMv9Kfg/X4Kf9pP6B6C+yWpyu7Qua46Snf+hwYSdmN386iKsmbaHMXwm3l+JVPwFsoAPg1tsPo5mzXdySbiv5LwoJ8GvsQJv/N2nQdIq82wS+NZN7mHASR833F328qOARkL6a8u39vsnA=,iv:MZW8ohM2/JPvyuz7t7w90X9yNnVjUJVW/v01ATFvQkE=,tag:/LSNNsQhAFR4Iyvm0d9yMA==,type:str]
  pgp:
    - created_at: "2025-07-10T17:38:00Z"
      enc: |-
        -----BEGIN PGP MESSAGE-----

        hF4DZT3mpHTS/JgSAQdAjS3Y3zJ1cutsP8jiSxUN9C63lzdAA9hY9oOhl50OTicw
        3vF8vEp9wrCyUuwZcy60CyINGnJE+blnaTAkmjJsBZDOmiyxkyHHMevuOcTqkC0w
        1GgBCQIQpYmbjlOhel5W6ssk7BXBADUu1c2VtoIgKV+/I/wonxBWXMryUbKi5/cI
        8o7VjuZ1s0N41wZrc4herMf9AQkF2QyuKyLtZn5SZTRWhvy1kUQCiEg/xQyvB4B9
        DKnKntyUlHYvBw==
        =tT1p
        -----END PGP MESSAGE-----
      fp: B120595CA9A643B051731B32E67FF350227BA4E8
    - created_at: "2025-07-10T17:38:00Z"
      enc: |-
        -----BEGIN PGP MESSAGE-----

        hF4DSXzd60P2RKISAQdA0aNYjN3+rwrlc2LxA1hoVe5uX8dNSWMl+F8sT15GI28w
        2+yUpAhv3p7ZYRmsOlKdclR1Sfn3J/H5RMaNh9hhXcoIOFNZIut3Suofus2N6tqY
        1GgBCQIQnL5Sz3reBtWmH80zefssrypXdFckQo6Jn2p/+O/e0SoHhrDV9tWS4Uk9
        nak50cpttu+87bMYWdBbs7FPPPxDgi/tE8rGeoiklwHjx2nZ5eLKVLjBC3nGwg3E
        uWNK5oSYaIEZwA==
        =HsdM
        -----END PGP MESSAGE-----
      fp: 4A8AADB4EBAB9AF88EF7062373CECE06CC80D40C
  encrypted_regex: ^(data|stringData)$
  version: 3.10.2
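
These values are SOPS-encrypted in git and only decrypted at apply time (assuming the Flux kustomize-controller holds the decryption key). To read a value from the live cluster instead of git:
```bash
# Decode a single key from the deployed secret (never commit the output)
kubectl get secret postgresql-admin-credentials -n postgresql-system \
  -o jsonpath='{.data.POSTGRES_USER}' | base64 -d && echo
```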
@@ -0,0 +1,190 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgresql-connection-metrics
  namespace: postgresql-system
  labels:
    cnpg.io/reload: ""  # Enable automatic reload
data:
  custom-queries: |
    pg_application_connections:
      query: "SELECT
                COALESCE(NULLIF(application_name, ''), 'unknown') AS app_name,
                state,
                COUNT(*) AS connection_count
              FROM pg_stat_activity
              WHERE state IS NOT NULL
                AND pid != pg_backend_pid()
              GROUP BY COALESCE(NULLIF(application_name, ''), 'unknown'), state"
      metrics:
        - app_name:
            usage: "LABEL"
            description: "Application name from connection"
        - state:
            usage: "LABEL"
            description: "Connection state (active, idle, idle_in_transaction, etc)"
        - connection_count:
            usage: "GAUGE"
            description: "Number of connections per application and state"

    pg_database_connections:
      query: "SELECT
                datname AS database_name,
                COALESCE(NULLIF(application_name, ''), 'unknown') AS app_name,
                COUNT(*) AS connection_count
              FROM pg_stat_activity
              WHERE datname IS NOT NULL
                AND pid != pg_backend_pid()
              GROUP BY datname, COALESCE(NULLIF(application_name, ''), 'unknown')"
      metrics:
        - database_name:
            usage: "LABEL"
            description: "Database name"
        - app_name:
            usage: "LABEL"
            description: "Application name from connection"
        - connection_count:
            usage: "GAUGE"
            description: "Number of connections per database and application"

    pg_connection_states:
      query: "SELECT
                state,
                COUNT(*) AS connection_count,
                COUNT(*) FILTER (WHERE COALESCE(NULLIF(application_name, ''), 'unknown') != 'unknown') AS named_connections,
                COUNT(*) FILTER (WHERE COALESCE(NULLIF(application_name, ''), 'unknown') = 'unknown') AS unnamed_connections
              FROM pg_stat_activity
              WHERE state IS NOT NULL
                AND pid != pg_backend_pid()
              GROUP BY state"
      metrics:
        - state:
            usage: "LABEL"
            description: "Connection state"
        - connection_count:
            usage: "GAUGE"
            description: "Total connections in this state"
        - named_connections:
            usage: "GAUGE"
            description: "Connections with application_name set"
        - unnamed_connections:
            usage: "GAUGE"
            description: "Connections without application_name"

    pg_user_connections:
      query: "SELECT
                usename AS username,
                COUNT(*) AS connection_count
              FROM pg_stat_activity
              WHERE usename IS NOT NULL
                AND pid != pg_backend_pid()
              GROUP BY usename"
      metrics:
        - username:
            usage: "LABEL"
            description: "PostgreSQL username"
        - connection_count:
            usage: "GAUGE"
            description: "Number of connections per user"

    pg_long_running_queries:
      query: "SELECT
                datname AS database_name,
                usename AS username,
                COALESCE(NULLIF(application_name, ''), 'unknown') AS app_name,
                state,
                COALESCE(EXTRACT(EPOCH FROM (now() - query_start))::numeric, 0) AS query_duration_seconds,
                COALESCE(EXTRACT(EPOCH FROM (now() - state_change))::numeric, 0) AS state_duration_seconds,
                CASE
                  WHEN state_change IS NOT NULL AND query_start IS NOT NULL THEN
                    COALESCE(EXTRACT(EPOCH FROM (state_change - query_start))::numeric, 0)
                  ELSE 0
                END AS execution_time_seconds,
                COALESCE(wait_event_type, 'none') AS wait_event_type,
                CASE
                  WHEN query LIKE 'SELECT%' THEN 'SELECT'
                  WHEN query LIKE 'INSERT%' THEN 'INSERT'
                  WHEN query LIKE 'UPDATE%' THEN 'UPDATE'
                  WHEN query LIKE 'DELETE%' THEN 'DELETE'
                  WHEN query LIKE 'CREATE%' THEN 'CREATE'
                  WHEN query LIKE 'ALTER%' THEN 'ALTER'
                  WHEN query LIKE 'DROP%' THEN 'DROP'
                  ELSE 'OTHER'
                END AS query_type,
                LEFT(
                  CASE
                    WHEN query ILIKE 'SELECT%' AND position('FROM' in UPPER(query)) > 0 THEN
                      'SELECT (...) ' || SUBSTRING(query FROM position('FROM' in UPPER(query)))
                    WHEN query ILIKE 'UPDATE%' AND position('UPDATE' in UPPER(query)) > 0 THEN
                      SUBSTRING(query FROM position('UPDATE' in UPPER(query)))
                    WHEN query ILIKE 'INSERT%' AND position('INTO' in UPPER(query)) > 0 THEN
                      SUBSTRING(query FROM position('INTO' in UPPER(query)))
                    WHEN query ILIKE 'DELETE%' AND position('FROM' in UPPER(query)) > 0 THEN
                      'DELETE (...) ' || SUBSTRING(query FROM position('FROM' in UPPER(query)))
                    ELSE query
                  END,
                  8000
                ) AS query_context
              FROM pg_stat_activity
              WHERE state != 'idle'
                AND pid != pg_backend_pid()
                AND query_start IS NOT NULL
                AND EXTRACT(EPOCH FROM (now() - query_start)) > 5.0
              ORDER BY query_start ASC"
      metrics:
        - database_name:
            usage: "LABEL"
            description: "Database name"
        - username:
            usage: "LABEL"
            description: "PostgreSQL username"
        - app_name:
            usage: "LABEL"
            description: "Application name"
        - state:
            usage: "LABEL"
            description: "Query state (active, idle_in_transaction, etc)"
        - query_duration_seconds:
            usage: "GAUGE"
            description: "Time in seconds since query started"
        - state_duration_seconds:
            usage: "GAUGE"
            description: "Time in seconds since last state change (client wait time for 'Client' wait events)"
        - execution_time_seconds:
            usage: "GAUGE"
            description: "Actual query execution time in seconds (state_change - query_start)"
        - wait_event_type:
            usage: "LABEL"
            description: "Type of event the backend is waiting for"
        - query_type:
            usage: "LABEL"
            description: "Type of SQL query (SELECT, INSERT, UPDATE, etc)"
        - query_context:
            usage: "LABEL"
            description: "Query clause after SELECT (FROM/WHERE/etc)"

    pg_active_query_stats:
      query: "SELECT
                state,
                COUNT(*) AS query_count,
                MAX(COALESCE(EXTRACT(EPOCH FROM (now() - query_start))::numeric, 0)) AS max_duration_seconds,
                AVG(COALESCE(EXTRACT(EPOCH FROM (now() - query_start))::numeric, 0)) AS avg_duration_seconds
              FROM pg_stat_activity
              WHERE state != 'idle'
                AND pid != pg_backend_pid()
                AND query_start IS NOT NULL
              GROUP BY state"
      metrics:
        - state:
            usage: "LABEL"
            description: "Query state"
        - query_count:
            usage: "GAUGE"
            description: "Number of queries in this state"
        - max_duration_seconds:
            usage: "GAUGE"
            description: "Maximum query duration in seconds"
        - avg_duration_seconds:
            usage: "GAUGE"
            description: "Average query duration in seconds"
@@ -0,0 +1,34 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgresql-dashboard-metrics
  namespace: postgresql-system
  labels:
    app: postgresql-shared
    cnpg.io/reload: ""
data:
  queries: |
    # Simple replication lag metric
    pg_replication_lag_seconds:
      query: |
        SELECT
          pg_stat_replication.application_name,
          pg_stat_replication.client_addr,
          pg_stat_replication.state,
          COALESCE(EXTRACT(EPOCH FROM (now() - pg_stat_activity.query_start)), 0) AS lag_seconds
        FROM pg_stat_replication
        LEFT JOIN pg_stat_activity ON pg_stat_replication.pid = pg_stat_activity.pid
      metrics:
        - application_name:
            usage: "LABEL"
            description: "Application name of the standby"
        - client_addr:
            usage: "LABEL"
            description: "IP address of the standby server"
        - state:
            usage: "LABEL"
            description: "Current WAL sender state"
        - lag_seconds:
            usage: "GAUGE"
            description: "Replication lag in seconds"
@@ -0,0 +1,23 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: postgresql-system
  name: postgresql-configmap-reader
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: postgresql-configmap-reader
  namespace: postgresql-system
subjects:
  - kind: ServiceAccount
    name: postgresql-shared
    namespace: postgresql-system
roleRef:
  kind: Role
  name: postgresql-configmap-reader
  apiGroup: rbac.authorization.k8s.io
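
To verify the binding works for the service account (the subject name here is taken from the manifest as written):
```bash
# Should print "yes" if the RoleBinding is effective
kubectl auth can-i get configmaps \
  --as=system:serviceaccount:postgresql-system:postgresql-shared \
  -n postgresql-system
```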
@@ -0,0 +1,41 @@
apiVersion: v1
kind: Secret
metadata:
  name: postgresql-s3-backup-credentials
  namespace: postgresql-system
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: ENC[AES256_GCM,data:40iCkF5/jQArX+ehP+mLqLnZfqfn5KFirw==,iv:WhVfQr2KGDPy/adI9alSx3+6FIGnCrzfgpP2IA6c/P0=,tag:FzkmwvRMU7OD8NGdkrMEew==,type:str]
  AWS_SECRET_ACCESS_KEY: ENC[AES256_GCM,data:LrV+qYepvgNTPRA+i/hRYsBdpjDhZAFUMl2x/NdEcw==,iv:MWYjJiTc/5s6+sI6qMcwa3jU5rtvNrPlNOFRQ2SUb34=,tag:UHQADjrltCfOF4jNLOpnrw==,type:str]
  AWS_ENDPOINTS: ENC[AES256_GCM,data:CKnaGiZd/WGovd1mKyrnsvw/vbt/J7rCEe4KlnoI7/d3fPWU4/Igfco=,iv:08TMZnoIcgn3GlYtzRT0g5B7rvXyGCqB6DUNMzg7Wqs=,tag:z00tN7USM+S/WCnO8iSFAw==,type:str]
  VIRTUAL_HOST_STYLE: ENC[AES256_GCM,data:/FLnmw==,iv:wsCP1pETMXBCo2hmM4WxzZV9LIBmhWGGzeKDEi96vAU=,tag:z3sEU1F98qECkpPlFQwTDA==,type:str]
sops:
  lastmodified: "2025-08-09T18:31:01Z"
  mac: ENC[AES256_GCM,data:cb6m9QgLkzDb6YNxM1hbisbcdh2BYFOl7IWyZJOuguqKLTQ2/PVyTNWixWqem9Oz8XV9PGcJnbz+ZevOGchTuXjv8QS+PqGf5V98F7Udud1L3y+/qQLvNV1DdLuR3hgc0QF/4HvmcPkZ3X1eGx+NCgdXgUVZV0NwvoRE5hOm2Lk=,iv:YwQr/dpWTpgCKOtuPeh5lyAxEiXmEclc9MvplyVrGKw=,tag:hcPwGjfDLEcDJ04QjbjiBg==,type:str]
  pgp:
    - created_at: "2025-08-09T18:31:01Z"
      enc: |-
        -----BEGIN PGP MESSAGE-----

        hF4DZT3mpHTS/JgSAQdAJCFj6YwzmKjDxoJ6HyYa8tiONVfZsspstze/ACxCS0Mw
        csfU/tu+TYjkcolu3XOzIlQhUHx96tH9o+CUeQL3kiQS4YjlDGmdHZPC08fDqKN3
        1GgBCQIQjZ3V+8bVoMPb+I4Iksx29DiEXs7r/lvDEA9e0upZafZCXDlnFspxUGX/
        d96SYXuIBJubf2VuI181SEPbZB2mAN4uS51JBPZbB1slk/LI6dOWC/CTu7ctfgcy
        Ib0RZgxKrFBt2Q==
        =DYao
        -----END PGP MESSAGE-----
      fp: B120595CA9A643B051731B32E67FF350227BA4E8
    - created_at: "2025-08-09T18:31:01Z"
      enc: |-
        -----BEGIN PGP MESSAGE-----

        hF4DSXzd60P2RKISAQdABFEDDzi6xq5thbILAVIsXJvkisMr0hCcBLYXCLSA9F0w
        vZJlFkOqNnIvvrdEBLvKa2dggW2UtVtdOdJtFhdUATF2RZ+gnnJnL+qn6bquXrSY
        1GgBCQIQ0eHdODbGA79kh3Ip+PFyJcmqUky+jyb52cclwjb36pB4njpzOTYprn9D
        dCZUHhE9B070ac6N6YN3hHPnfm8wN4/v+8pEzU+cfCdpY/ERdAQhjV0XMb55uQdF
        k/IxcIlE7QIyfA==
        =EPRY
        -----END PGP MESSAGE-----
      fp: 4A8AADB4EBAB9AF88EF7062373CECE06CC80D40C
  encrypted_regex: ^(data|stringData)$
  version: 3.10.2
@@ -0,0 +1,64 @@
---
# Service alias for backward compatibility
# This creates a "postgresql-shared-rw" service that points to the same endpoints as "postgres-shared-rw"
# During the disaster recovery I got the name wrong and dropped the 'sql'
apiVersion: v1
kind: Service
metadata:
  name: postgresql-shared-rw
  namespace: postgresql-system
  labels:
    app: postgresql-shared
    cnpg.io/cluster: postgres-shared
    cnpg.io/reload: ""
spec:
  type: ClusterIP
  ports:
    - name: postgres
      port: 5432
      protocol: TCP
      targetPort: 5432
  selector:
    cnpg.io/cluster: postgres-shared
    cnpg.io/instanceRole: primary
---
# Read-only service alias
apiVersion: v1
kind: Service
metadata:
  name: postgresql-shared-ro
  namespace: postgresql-system
  labels:
    app: postgresql-shared
    cnpg.io/cluster: postgres-shared
    cnpg.io/reload: ""
spec:
  type: ClusterIP
  ports:
    - name: postgres
      port: 5432
      protocol: TCP
      targetPort: 5432
  selector:
    cnpg.io/cluster: postgres-shared
    cnpg.io/instanceRole: replica
---
# Read load-balanced service alias
apiVersion: v1
kind: Service
metadata:
  name: postgresql-shared-r
  namespace: postgresql-system
  labels:
    app: postgresql-shared
    cnpg.io/cluster: postgres-shared
    cnpg.io/reload: ""
spec:
  type: ClusterIP
  ports:
    - name: postgres
      port: 5432
      protocol: TCP
      targetPort: 5432
  selector:
    cnpg.io/cluster: postgres-shared
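
Each alias should resolve to the same pod IPs as the corresponding cnpg-managed service; a quick comparison:
```bash
# Alias and original read-write services should list identical endpoints
kubectl get endpoints postgresql-shared-rw postgres-shared-rw -n postgresql-system
```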
@@ -0,0 +1,31 @@
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-postgresql
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  # Single replica as recommended by CloudNativePG docs
  # PostgreSQL handles replication at application level
  numberOfReplicas: "1"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "xfs"
  dataLocality: "strict-local"
  # Automatically assign S3 backup jobs to PostgreSQL volumes
  recurringJobSelector: |
    [
      {
        "name":"longhorn-s3-backup",
        "isGroup":true
      },
      {
        "name":"longhorn-s3-backup-weekly",
        "isGroup":true
      }
    ]
reclaimPolicy: Retain
volumeBindingMode: Immediate
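
A throwaway PVC is the fastest way to prove the class provisions at all (a hypothetical test object; delete it afterwards, and note that `reclaimPolicy: Retain` leaves the released PV behind for manual cleanup):
```bash
# Create a small test claim against the class, check it binds, then clean up
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-postgresql-test
  namespace: postgresql-system
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn-postgresql
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc longhorn-postgresql-test -n postgresql-system
kubectl delete pvc longhorn-postgresql-test -n postgresql-system
```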

manifests/infrastructure/postgresql/repository.yaml (new file, 9 lines)
@@ -0,0 +1,9 @@
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: cnpg-repo
  namespace: postgresql-system
spec:
  interval: 5m0s
  url: https://cloudnative-pg.github.io/charts
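
The repository must show a fetched artifact before the HelmRelease above can resolve its chart (assumes the flux CLI):
```bash
# HelmRepository should report a successful artifact fetch
flux get sources helm -n postgresql-system
```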