12 KiB
PostgreSQL Infrastructure
This directory contains the CloudNativePG setup for high-availability PostgreSQL on the Kubernetes cluster.
Architecture
- 3 PostgreSQL instances: 1 primary + 2 replicas for high availability
- Synchronous replication: Zero data loss (RPO=0) configuration
- Node distribution: Instances distributed across n1, n2, and n3 nodes
- Current cluster:
postgres-sharedwith instancespostgres-shared-2(primary),postgres-shared-4,postgres-shared-5 - Longhorn storage: Single replica (PostgreSQL handles replication)
- Shared cluster: One PostgreSQL cluster that applications can share
Components
Core Components
namespace.yaml: PostgreSQL system namespacerepository.yaml: CloudNativePG Helm repositoryoperator.yaml: CloudNativePG operator deploymentpostgresql-storageclass.yaml: Optimized storage class for PostgreSQLcluster-shared.yaml: Shared PostgreSQL cluster configuration
Monitoring Components
postgresql-dashboard-metrics.yaml: Custom metrics ConfigMap for enhanced monitoringpostgresql-dashboard-rbac.yaml: RBAC permissions for metrics collection- Built-in ServiceMonitor: Automatically configured for OpenObserve integration
Backup Components
backup-config.yaml: CloudNativePG backup configuration- Longhorn integration: S3 backup via label-based volume selection
Services Created
CloudNativePG automatically creates these services:
postgresql-shared-rw: Write operations (connects to primary)postgresql-shared-ro: Read-only operations (connects to replicas)postgresql-shared-r: Read operations (connects to any instance)
Connection Information
For Applications
Applications should connect using these connection parameters:
Write Operations:
host: postgresql-shared-rw.postgresql-system.svc.cluster.local
port: 5432
database: shared_db
username: shared_user
Read Operations:
host: postgresql-shared-ro.postgresql-system.svc.cluster.local
port: 5432
database: shared_db
username: shared_user
Getting Credentials
The PostgreSQL password is auto-generated and stored in a secret:
# Get the password for the shared_user
kubectl get secret postgresql-shared-app -n postgresql-system -o jsonpath="{.data.password}" | base64 -d
# Get the superuser password
kubectl get secret postgresql-shared-superuser -n postgresql-system -o jsonpath="{.data.password}" | base64 -d
Application Integration Example
Here's how an application deployment would connect:
apiVersion: apps/v1
kind: Deployment
metadata:
name: example-app
spec:
template:
spec:
containers:
- name: app
image: example-app:latest
env:
- name: DB_HOST
value: "postgresql-shared-rw.postgresql-system.svc.cluster.local"
- name: DB_PORT
value: "5432"
- name: DB_NAME
value: "shared_db"
- name: DB_USER
value: "shared_user"
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: postgresql-shared-app
key: password
Monitoring
The PostgreSQL cluster includes comprehensive monitoring and observability:
Metrics & Monitoring ✅ OPERATIONAL
- Metrics Port: 9187 - PostgreSQL metrics endpoint
- ServiceMonitor: Configured for OpenObserve integration
- Built-in Metrics: CloudNativePG provides extensive default metrics including:
- Connection Metrics:
cnpg_backends_total,cnpg_pg_settings_setting{name="max_connections"} - Performance Metrics:
cnpg_pg_stat_database_xact_commit,cnpg_pg_stat_database_xact_rollback - Storage Metrics:
cnpg_pg_database_size_bytes,cnpg_pg_stat_database_blks_hit,cnpg_pg_stat_database_blks_read - Cluster Health:
cnpg_collector_up,cnpg_collector_postgres_version - Replication:
cnpg_pg_stat_replication_*metrics for streaming replication status
- Connection Metrics:
Custom Metrics System
- ConfigMap Support: Custom queries can be defined via ConfigMaps
- RBAC Configured: PostgreSQL service account has permissions to read custom metrics ConfigMaps
- Predefined Queries: CloudNativePG includes
cnpg-default-monitoringConfigMap with standard queries - Monitoring Role: Uses
pg_monitorrole for secure metrics collection
Dashboard Integration
- OpenObserve Ready: All metrics automatically ingested into OpenObserve
- Key Performance Indicators:
- Connection utilization:
cnpg_backends_total / cnpg_pg_settings_setting{name="max_connections"} * 100 - Buffer cache hit ratio:
cnpg_pg_stat_database_blks_hit / (cnpg_pg_stat_database_blks_hit + cnpg_pg_stat_database_blks_read) * 100 - Transaction rate:
rate(cnpg_pg_stat_database_xact_commit[5m]) - Rollback ratio:
cnpg_pg_stat_database_xact_rollback / (cnpg_pg_stat_database_xact_commit + cnpg_pg_stat_database_xact_rollback) * 100
- Connection utilization:
High Availability Monitoring
- Automatic Failover: CloudNativePG handles primary/replica failover automatically
- Health Checks: Continuous health monitoring with automatic recovery
- Streaming Replication: Real-time replication status monitoring
Backup Strategy
Longhorn Storage-Level Backups (Incremental)
- Daily backups: 2 AM UTC, retain 14 days (2 weeks)
- Weekly backups: 1 AM Sunday, retain 8 weeks (2 months)
- Snapshot cleanup: 3 AM daily, keep 5 local snapshots
- Target: Backblaze B2 S3 storage via existing setup
- Type: Incremental (efficient change block detection)
CloudNativePG Application-Level Backups
- WAL archiving: Continuous transaction log archiving
- Point-in-time recovery: Available via CloudNativePG
- Retention: 30-day backup retention policy
Backup Labels
PostgreSQL volumes are automatically backed up based on labels:
backup.longhorn.io/enable: "true"
app: postgresql-shared
Scaling
To add more read replicas:
# Edit cluster-shared.yaml
spec:
instances: 4 # Increase from 3 to 4 for additional read replica
Troubleshooting
Cluster Status
# Check cluster status
kubectl get cluster -n postgresql-system
kubectl describe cluster postgresql-shared -n postgresql-system
# Check pods
kubectl get pods -n postgresql-system
kubectl logs postgres-shared-2 -n postgresql-system # Current primary
Monitoring & Metrics
# Check ServiceMonitor
kubectl get servicemonitor -n postgresql-system
kubectl describe servicemonitor postgresql-shared -n postgresql-system
# Check metrics endpoint directly
kubectl port-forward -n postgresql-system postgres-shared-2 9187:9187 # Primary instance
curl http://localhost:9187/metrics
# Check custom metrics ConfigMap
kubectl get configmap -n postgresql-system
kubectl describe configmap postgresql-dashboard-metrics -n postgresql-system
# Check RBAC permissions
kubectl get role,rolebinding -n postgresql-system
kubectl describe rolebinding postgresql-dashboard-metrics-reader -n postgresql-system
Port Forwarding
Port forwarding allows you to connect to PostgreSQL from your local machine using standard database tools.
⚠️ Important: PostgreSQL requires SSL/TLS connections. When port forwarding, you must configure your client to handle SSL properly.
Read-Only Replica (Load Balanced):
# Forward to read-only service (load balances across all replicas)
kubectl port-forward -n postgresql-system svc/postgresql-shared-ro 5432:5432
# Get the password for shared_user
kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo
# Connect with SSL required (recommended):
# Connection string: postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=require
# Or configure your client:
# - host: localhost
# - port: 5432
# - database: shared_db
# - username: shared_user
# - password: <from secret above>
# - SSL mode: require (or disable for testing only)
Specific Replica Pod:
# List replica pods
kubectl get pods -n postgresql-system -l cnpg.io/instanceRole=replica
# Forward to specific replica pod (e.g., postgres-shared-4)
kubectl port-forward -n postgresql-system pod/postgres-shared-4 5432:5432
# Get the password for shared_user
kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo
# Connect with SSL required (recommended):
# Connection string: postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=require
# Or configure your client with SSL mode: require
Primary (Read-Write) - For Maintenance Only:
# Forward to read-write service (connects to primary)
kubectl port-forward -n postgresql-system svc/postgresql-shared-rw 5433:5432
# Note: Using port 5433 locally to avoid conflict if read-only is on 5432
# Get the password
kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo
# Connect using localhost:5433 with SSL mode: require
SSL Configuration Notes:
- SSL is enabled on PostgreSQL (ssl = on)
- For port forwarding, clients must explicitly configure SSL mode
- The server uses self-signed certificates, so clients will need to accept untrusted certificates
- For production clients connecting directly (not via port-forward), use proper SSL with CA verification
Troubleshooting Port Forward "Broken Pipe" Errors:
If you see error: lost connection to pod or broken pipe errors:
-
Use direct pod port forwarding instead of service port forwarding (more reliable):
# List available replica pods kubectl get pods -n postgresql-system -l cnpg.io/instanceRole=replica # Forward to specific replica pod (more stable) kubectl port-forward -n postgresql-system pod/postgres-shared-4 5432:5432 -
Configure your client with explicit SSL mode:
- Use
sslmode=requirein your connection string (recommended) - Or
sslmode=prefer(allows fallback to non-SSL if SSL fails) - Or
sslmode=disablefor testing only (not recommended)
- Use
-
Connection string examples:
# With SSL required (recommended) postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=require # With SSL preferred (allows fallback) postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=prefer # Without SSL (testing only) postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=disable
Getting the CA Certificate (for proper SSL verification):
# Get the CA certificate from the cluster secret
kubectl get secret postgres-shared-ca -n postgresql-system -o jsonpath='{.data.ca\.crt}' | base64 -d > postgres-ca.crt
# Use with your client:
# Connection string: postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=verify-ca&sslrootcert=postgres-ca.crt
# Or configure your client to use the CA certificate file for SSL verification
Database Connection
# Connect to PostgreSQL via exec
kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U shared_user -d shared_db
# Check replication status
kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U postgres -c "SELECT * FROM pg_stat_replication;"
# Check cluster health
kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U postgres -c "SELECT pg_is_in_recovery();"
Backup & Storage
# Check PVC status
kubectl get pvc -n postgresql-system
kubectl describe pvc postgres-shared-2 -n postgresql-system # Primary instance PVC
# Check Longhorn volumes
kubectl get volumes -n longhorn-system
kubectl describe volume -n longhorn-system | grep postgresql
Long Running Queries
When a long running query is happening, use this command on a node
kubectl exec -n postgresql-system postgres-shared-2 -- psql -U postgres -c "
SELECT
pid,
datname,
usename,
application_name,
now() - xact_start AS tx_duration,
now() - query_start AS query_duration,
state,
wait_event_type,
wait_event,
query
FROM pg_stat_activity
WHERE state != 'idle'
AND query NOT LIKE '%pg_stat_activity%'
AND (now() - xact_start > interval '10 seconds' OR now() - query_start > interval '10 seconds')
ORDER BY GREATEST(now() - xact_start, now() - query_start) DESC;
"