# PostgreSQL Infrastructure This directory contains the CloudNativePG setup for high-availability PostgreSQL on the Kubernetes cluster. ## Architecture - **3 PostgreSQL instances**: 1 primary + 2 replicas for high availability - **Synchronous replication**: Zero data loss (RPO=0) configuration - **Node distribution**: Instances distributed across n1, n2, and n3 nodes - **Current cluster**: `postgres-shared` with instances `postgres-shared-2` (primary), `postgres-shared-4`, `postgres-shared-5` - **Longhorn storage**: Single replica (PostgreSQL handles replication) - **Shared cluster**: One PostgreSQL cluster that applications can share ## Components ### **Core Components** - `namespace.yaml`: PostgreSQL system namespace - `repository.yaml`: CloudNativePG Helm repository - `operator.yaml`: CloudNativePG operator deployment - `postgresql-storageclass.yaml`: Optimized storage class for PostgreSQL - `cluster-shared.yaml`: Shared PostgreSQL cluster configuration ### **Monitoring Components** - `postgresql-dashboard-metrics.yaml`: Custom metrics ConfigMap for enhanced monitoring - `postgresql-dashboard-rbac.yaml`: RBAC permissions for metrics collection - Built-in ServiceMonitor: Automatically configured for OpenObserve integration ### **Backup Components** - `backup-config.yaml`: CloudNativePG backup configuration - Longhorn integration: S3 backup via label-based volume selection ## Services Created CloudNativePG automatically creates these services: - `postgresql-shared-rw`: Write operations (connects to primary) - `postgresql-shared-ro`: Read-only operations (connects to replicas) - `postgresql-shared-r`: Read operations (connects to any instance) ## Connection Information ### For Applications Applications should connect using these connection parameters: **Write Operations:** ```yaml host: postgresql-shared-rw.postgresql-system.svc.cluster.local port: 5432 database: shared_db username: shared_user ``` **Read Operations:** ```yaml host: postgresql-shared-ro.postgresql-system.svc.cluster.local port: 5432 database: shared_db username: shared_user ``` ### Getting Credentials The PostgreSQL password is auto-generated and stored in a secret: ```bash # Get the password for the shared_user kubectl get secret postgresql-shared-app -n postgresql-system -o jsonpath="{.data.password}" | base64 -d # Get the superuser password kubectl get secret postgresql-shared-superuser -n postgresql-system -o jsonpath="{.data.password}" | base64 -d ``` ## Application Integration Example Here's how an application deployment would connect: ```yaml apiVersion: apps/v1 kind: Deployment metadata: name: example-app spec: template: spec: containers: - name: app image: example-app:latest env: - name: DB_HOST value: "postgresql-shared-rw.postgresql-system.svc.cluster.local" - name: DB_PORT value: "5432" - name: DB_NAME value: "shared_db" - name: DB_USER value: "shared_user" - name: DB_PASSWORD valueFrom: secretKeyRef: name: postgresql-shared-app key: password ``` ## Monitoring The PostgreSQL cluster includes comprehensive monitoring and observability: ### **Metrics & Monitoring** ✅ **OPERATIONAL** - **Metrics Port**: 9187 - PostgreSQL metrics endpoint - **ServiceMonitor**: Configured for OpenObserve integration - **Built-in Metrics**: CloudNativePG provides extensive default metrics including: - **Connection Metrics**: `cnpg_backends_total`, `cnpg_pg_settings_setting{name="max_connections"}` - **Performance Metrics**: `cnpg_pg_stat_database_xact_commit`, `cnpg_pg_stat_database_xact_rollback` - **Storage Metrics**: `cnpg_pg_database_size_bytes`, `cnpg_pg_stat_database_blks_hit`, `cnpg_pg_stat_database_blks_read` - **Cluster Health**: `cnpg_collector_up`, `cnpg_collector_postgres_version` - **Replication**: `cnpg_pg_stat_replication_*` metrics for streaming replication status ### **Custom Metrics System** - **ConfigMap Support**: Custom queries can be defined via ConfigMaps - **RBAC Configured**: PostgreSQL service account has permissions to read custom metrics ConfigMaps - **Predefined Queries**: CloudNativePG includes `cnpg-default-monitoring` ConfigMap with standard queries - **Monitoring Role**: Uses `pg_monitor` role for secure metrics collection ### **Dashboard Integration** - **OpenObserve Ready**: All metrics automatically ingested into OpenObserve - **Key Performance Indicators**: - Connection utilization: `cnpg_backends_total / cnpg_pg_settings_setting{name="max_connections"} * 100` - Buffer cache hit ratio: `cnpg_pg_stat_database_blks_hit / (cnpg_pg_stat_database_blks_hit + cnpg_pg_stat_database_blks_read) * 100` - Transaction rate: `rate(cnpg_pg_stat_database_xact_commit[5m])` - Rollback ratio: `cnpg_pg_stat_database_xact_rollback / (cnpg_pg_stat_database_xact_commit + cnpg_pg_stat_database_xact_rollback) * 100` ### **High Availability Monitoring** - **Automatic Failover**: CloudNativePG handles primary/replica failover automatically - **Health Checks**: Continuous health monitoring with automatic recovery - **Streaming Replication**: Real-time replication status monitoring ## Backup Strategy ### **Longhorn Storage-Level Backups (Incremental)** - **Daily backups**: 2 AM UTC, retain 14 days (2 weeks) - **Weekly backups**: 1 AM Sunday, retain 8 weeks (2 months) - **Snapshot cleanup**: 3 AM daily, keep 5 local snapshots - **Target**: Backblaze B2 S3 storage via existing setup - **Type**: Incremental (efficient change block detection) ### **CloudNativePG Application-Level Backups** - **WAL archiving**: Continuous transaction log archiving - **Point-in-time recovery**: Available via CloudNativePG - **Retention**: 30-day backup retention policy ### **Backup Labels** PostgreSQL volumes are automatically backed up based on labels: ```yaml backup.longhorn.io/enable: "true" app: postgresql-shared ``` ## Scaling To add more read replicas: ```yaml # Edit cluster-shared.yaml spec: instances: 4 # Increase from 3 to 4 for additional read replica ``` ## Troubleshooting ### **Cluster Status** ```bash # Check cluster status kubectl get cluster -n postgresql-system kubectl describe cluster postgresql-shared -n postgresql-system # Check pods kubectl get pods -n postgresql-system kubectl logs postgres-shared-2 -n postgresql-system # Current primary ``` ### **Monitoring & Metrics** ```bash # Check ServiceMonitor kubectl get servicemonitor -n postgresql-system kubectl describe servicemonitor postgresql-shared -n postgresql-system # Check metrics endpoint directly kubectl port-forward -n postgresql-system postgres-shared-2 9187:9187 # Primary instance curl http://localhost:9187/metrics # Check custom metrics ConfigMap kubectl get configmap -n postgresql-system kubectl describe configmap postgresql-dashboard-metrics -n postgresql-system # Check RBAC permissions kubectl get role,rolebinding -n postgresql-system kubectl describe rolebinding postgresql-dashboard-metrics-reader -n postgresql-system ``` ### **Port Forwarding** Port forwarding allows you to connect to PostgreSQL from your local machine using standard database tools. **⚠️ Important**: PostgreSQL requires SSL/TLS connections. When port forwarding, you must configure your client to handle SSL properly. **Read-Only Replica (Load Balanced):** ```bash # Forward to read-only service (load balances across all replicas) kubectl port-forward -n postgresql-system svc/postgresql-shared-ro 5432:5432 # Get the password for shared_user kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo # Connect with SSL required (recommended): # Connection string: postgresql://shared_user:@localhost:5432/shared_db?sslmode=require # Or configure your client: # - host: localhost # - port: 5432 # - database: shared_db # - username: shared_user # - password: # - SSL mode: require (or disable for testing only) ``` **Specific Replica Pod:** ```bash # List replica pods kubectl get pods -n postgresql-system -l cnpg.io/instanceRole=replica # Forward to specific replica pod (e.g., postgres-shared-4) kubectl port-forward -n postgresql-system pod/postgres-shared-4 5432:5432 # Get the password for shared_user kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo # Connect with SSL required (recommended): # Connection string: postgresql://shared_user:@localhost:5432/shared_db?sslmode=require # Or configure your client with SSL mode: require ``` **Primary (Read-Write) - For Maintenance Only:** ```bash # Forward to read-write service (connects to primary) kubectl port-forward -n postgresql-system svc/postgresql-shared-rw 5433:5432 # Note: Using port 5433 locally to avoid conflict if read-only is on 5432 # Get the password kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo # Connect using localhost:5433 with SSL mode: require ``` **SSL Configuration Notes:** - **SSL is enabled** on PostgreSQL (ssl = on) - For **port forwarding**, clients must explicitly configure SSL mode - The server uses self-signed certificates, so clients will need to accept untrusted certificates - For production clients connecting directly (not via port-forward), use proper SSL with CA verification **Troubleshooting Port Forward "Broken Pipe" Errors:** If you see `error: lost connection to pod` or `broken pipe` errors: 1. **Use direct pod port forwarding** instead of service port forwarding (more reliable): ```bash # List available replica pods kubectl get pods -n postgresql-system -l cnpg.io/instanceRole=replica # Forward to specific replica pod (more stable) kubectl port-forward -n postgresql-system pod/postgres-shared-4 5432:5432 ``` 2. **Configure your client with explicit SSL mode**: - Use `sslmode=require` in your connection string (recommended) - Or `sslmode=prefer` (allows fallback to non-SSL if SSL fails) - Or `sslmode=disable` for testing only (not recommended) 3. **Connection string examples**: ```bash # With SSL required (recommended) postgresql://shared_user:@localhost:5432/shared_db?sslmode=require # With SSL preferred (allows fallback) postgresql://shared_user:@localhost:5432/shared_db?sslmode=prefer # Without SSL (testing only) postgresql://shared_user:@localhost:5432/shared_db?sslmode=disable ``` **Getting the CA Certificate (for proper SSL verification):** ```bash # Get the CA certificate from the cluster secret kubectl get secret postgres-shared-ca -n postgresql-system -o jsonpath='{.data.ca\.crt}' | base64 -d > postgres-ca.crt # Use with your client: # Connection string: postgresql://shared_user:@localhost:5432/shared_db?sslmode=verify-ca&sslrootcert=postgres-ca.crt # Or configure your client to use the CA certificate file for SSL verification ``` ### **Database Connection** ```bash # Connect to PostgreSQL via exec kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U shared_user -d shared_db # Check replication status kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U postgres -c "SELECT * FROM pg_stat_replication;" # Check cluster health kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U postgres -c "SELECT pg_is_in_recovery();" ``` ### **Backup & Storage** ```bash # Check PVC status kubectl get pvc -n postgresql-system kubectl describe pvc postgres-shared-2 -n postgresql-system # Primary instance PVC # Check Longhorn volumes kubectl get volumes -n longhorn-system kubectl describe volume -n longhorn-system | grep postgresql ``` ### **Long Running Queries** When a long running query is happening, use this command on a node ```bash kubectl exec -n postgresql-system postgres-shared-2 -- psql -U postgres -c " SELECT pid, datname, usename, application_name, now() - xact_start AS tx_duration, now() - query_start AS query_duration, state, wait_event_type, wait_event, query FROM pg_stat_activity WHERE state != 'idle' AND query NOT LIKE '%pg_stat_activity%' AND (now() - xact_start > interval '10 seconds' OR now() - query_start > interval '10 seconds') ORDER BY GREATEST(now() - xact_start, now() - query_start) DESC; " ```