Add the redacted source file for demo purposes Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1 Co-authored-by: Michael DiLeo <michael_dileo@proton.me> Co-committed-by: Michael DiLeo <michael_dileo@proton.me>
341 lines
12 KiB
Markdown
341 lines
12 KiB
Markdown
# PostgreSQL Infrastructure
|
|
|
|
This directory contains the CloudNativePG setup for high-availability PostgreSQL on the Kubernetes cluster.
|
|
|
|
## Architecture
|
|
|
|
- **3 PostgreSQL instances**: 1 primary + 2 replicas for high availability
|
|
- **Synchronous replication**: Zero data loss (RPO=0) configuration
|
|
- **Node distribution**: Instances distributed across n1, n2, and n3 nodes
|
|
- **Current cluster**: `postgres-shared` with instances `postgres-shared-2` (primary), `postgres-shared-4`, `postgres-shared-5`
|
|
- **Longhorn storage**: Single replica (PostgreSQL handles replication)
|
|
- **Shared cluster**: One PostgreSQL cluster that applications can share
|
|
|
|
## Components
|
|
|
|
### **Core Components**
|
|
- `namespace.yaml`: PostgreSQL system namespace
|
|
- `repository.yaml`: CloudNativePG Helm repository
|
|
- `operator.yaml`: CloudNativePG operator deployment
|
|
- `postgresql-storageclass.yaml`: Optimized storage class for PostgreSQL
|
|
- `cluster-shared.yaml`: Shared PostgreSQL cluster configuration
|
|
|
|
### **Monitoring Components**
|
|
- `postgresql-dashboard-metrics.yaml`: Custom metrics ConfigMap for enhanced monitoring
|
|
- `postgresql-dashboard-rbac.yaml`: RBAC permissions for metrics collection
|
|
- Built-in ServiceMonitor: Automatically configured for OpenObserve integration
|
|
|
|
### **Backup Components**
|
|
- `backup-config.yaml`: CloudNativePG backup configuration
|
|
- Longhorn integration: S3 backup via label-based volume selection
|
|
|
|
## Services Created
|
|
|
|
CloudNativePG automatically creates these services:
|
|
|
|
- `postgresql-shared-rw`: Write operations (connects to primary)
|
|
- `postgresql-shared-ro`: Read-only operations (connects to replicas)
|
|
- `postgresql-shared-r`: Read operations (connects to any instance)
|
|
|
|
## Connection Information
|
|
|
|
### For Applications
|
|
|
|
Applications should connect using these connection parameters:
|
|
|
|
**Write Operations:**
|
|
```yaml
|
|
host: postgresql-shared-rw.postgresql-system.svc.cluster.local
|
|
port: 5432
|
|
database: shared_db
|
|
username: shared_user
|
|
```
|
|
|
|
**Read Operations:**
|
|
```yaml
|
|
host: postgresql-shared-ro.postgresql-system.svc.cluster.local
|
|
port: 5432
|
|
database: shared_db
|
|
username: shared_user
|
|
```
|
|
|
|
### Getting Credentials
|
|
|
|
The PostgreSQL password is auto-generated and stored in a secret:
|
|
|
|
```bash
|
|
# Get the password for the shared_user
|
|
kubectl get secret postgresql-shared-app -n postgresql-system -o jsonpath="{.data.password}" | base64 -d
|
|
|
|
# Get the superuser password
|
|
kubectl get secret postgresql-shared-superuser -n postgresql-system -o jsonpath="{.data.password}" | base64 -d
|
|
```
|
|
|
|
## Application Integration Example
|
|
|
|
Here's how an application deployment would connect:
|
|
|
|
```yaml
|
|
apiVersion: apps/v1
|
|
kind: Deployment
|
|
metadata:
|
|
name: example-app
|
|
spec:
|
|
template:
|
|
spec:
|
|
containers:
|
|
- name: app
|
|
image: example-app:latest
|
|
env:
|
|
- name: DB_HOST
|
|
value: "postgresql-shared-rw.postgresql-system.svc.cluster.local"
|
|
- name: DB_PORT
|
|
value: "5432"
|
|
- name: DB_NAME
|
|
value: "shared_db"
|
|
- name: DB_USER
|
|
value: "shared_user"
|
|
- name: DB_PASSWORD
|
|
valueFrom:
|
|
secretKeyRef:
|
|
name: postgresql-shared-app
|
|
key: password
|
|
```
|
|
|
|
## Monitoring
|
|
|
|
The PostgreSQL cluster includes comprehensive monitoring and observability:
|
|
|
|
### **Metrics & Monitoring** ✅ **OPERATIONAL**
|
|
- **Metrics Port**: 9187 - PostgreSQL metrics endpoint
|
|
- **ServiceMonitor**: Configured for OpenObserve integration
|
|
- **Built-in Metrics**: CloudNativePG provides extensive default metrics including:
|
|
- **Connection Metrics**: `cnpg_backends_total`, `cnpg_pg_settings_setting{name="max_connections"}`
|
|
- **Performance Metrics**: `cnpg_pg_stat_database_xact_commit`, `cnpg_pg_stat_database_xact_rollback`
|
|
- **Storage Metrics**: `cnpg_pg_database_size_bytes`, `cnpg_pg_stat_database_blks_hit`, `cnpg_pg_stat_database_blks_read`
|
|
- **Cluster Health**: `cnpg_collector_up`, `cnpg_collector_postgres_version`
|
|
- **Replication**: `cnpg_pg_stat_replication_*` metrics for streaming replication status
|
|
|
|
### **Custom Metrics System**
|
|
- **ConfigMap Support**: Custom queries can be defined via ConfigMaps
|
|
- **RBAC Configured**: PostgreSQL service account has permissions to read custom metrics ConfigMaps
|
|
- **Predefined Queries**: CloudNativePG includes `cnpg-default-monitoring` ConfigMap with standard queries
|
|
- **Monitoring Role**: Uses `pg_monitor` role for secure metrics collection
|
|
|
|
### **Dashboard Integration**
|
|
- **OpenObserve Ready**: All metrics automatically ingested into OpenObserve
|
|
- **Key Performance Indicators**:
|
|
- Connection utilization: `cnpg_backends_total / cnpg_pg_settings_setting{name="max_connections"} * 100`
|
|
- Buffer cache hit ratio: `cnpg_pg_stat_database_blks_hit / (cnpg_pg_stat_database_blks_hit + cnpg_pg_stat_database_blks_read) * 100`
|
|
- Transaction rate: `rate(cnpg_pg_stat_database_xact_commit[5m])`
|
|
- Rollback ratio: `cnpg_pg_stat_database_xact_rollback / (cnpg_pg_stat_database_xact_commit + cnpg_pg_stat_database_xact_rollback) * 100`
|
|
|
|
### **High Availability Monitoring**
|
|
- **Automatic Failover**: CloudNativePG handles primary/replica failover automatically
|
|
- **Health Checks**: Continuous health monitoring with automatic recovery
|
|
- **Streaming Replication**: Real-time replication status monitoring
|
|
|
|
## Backup Strategy
|
|
|
|
### **Longhorn Storage-Level Backups (Incremental)**
|
|
- **Daily backups**: 2 AM UTC, retain 14 days (2 weeks)
|
|
- **Weekly backups**: 1 AM Sunday, retain 8 weeks (2 months)
|
|
- **Snapshot cleanup**: 3 AM daily, keep 5 local snapshots
|
|
- **Target**: Backblaze B2 S3 storage via existing setup
|
|
- **Type**: Incremental (efficient change block detection)
|
|
|
|
### **CloudNativePG Application-Level Backups**
|
|
- **WAL archiving**: Continuous transaction log archiving
|
|
- **Point-in-time recovery**: Available via CloudNativePG
|
|
- **Retention**: 30-day backup retention policy
|
|
|
|
### **Backup Labels**
|
|
PostgreSQL volumes are automatically backed up based on labels:
|
|
```yaml
|
|
backup.longhorn.io/enable: "true"
|
|
app: postgresql-shared
|
|
```
|
|
|
|
## Scaling
|
|
|
|
To add more read replicas:
|
|
```yaml
|
|
# Edit cluster-shared.yaml
|
|
spec:
|
|
instances: 4 # Increase from 3 to 4 for additional read replica
|
|
```
|
|
|
|
## Troubleshooting
|
|
|
|
### **Cluster Status**
|
|
```bash
|
|
# Check cluster status
|
|
kubectl get cluster -n postgresql-system
|
|
kubectl describe cluster postgresql-shared -n postgresql-system
|
|
|
|
# Check pods
|
|
kubectl get pods -n postgresql-system
|
|
kubectl logs postgres-shared-2 -n postgresql-system # Current primary
|
|
```
|
|
|
|
### **Monitoring & Metrics**
|
|
```bash
|
|
# Check ServiceMonitor
|
|
kubectl get servicemonitor -n postgresql-system
|
|
kubectl describe servicemonitor postgresql-shared -n postgresql-system
|
|
|
|
# Check metrics endpoint directly
|
|
kubectl port-forward -n postgresql-system postgres-shared-2 9187:9187 # Primary instance
|
|
curl http://localhost:9187/metrics
|
|
|
|
# Check custom metrics ConfigMap
|
|
kubectl get configmap -n postgresql-system
|
|
kubectl describe configmap postgresql-dashboard-metrics -n postgresql-system
|
|
|
|
# Check RBAC permissions
|
|
kubectl get role,rolebinding -n postgresql-system
|
|
kubectl describe rolebinding postgresql-dashboard-metrics-reader -n postgresql-system
|
|
```
|
|
|
|
### **Port Forwarding**
|
|
|
|
Port forwarding allows you to connect to PostgreSQL from your local machine using standard database tools.
|
|
|
|
**⚠️ Important**: PostgreSQL requires SSL/TLS connections. When port forwarding, you must configure your client to handle SSL properly.
|
|
|
|
**Read-Only Replica (Load Balanced):**
|
|
```bash
|
|
# Forward to read-only service (load balances across all replicas)
|
|
kubectl port-forward -n postgresql-system svc/postgresql-shared-ro 5432:5432
|
|
|
|
# Get the password for shared_user
|
|
kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo
|
|
|
|
# Connect with SSL required (recommended):
|
|
# Connection string: postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=require
|
|
# Or configure your client:
|
|
# - host: localhost
|
|
# - port: 5432
|
|
# - database: shared_db
|
|
# - username: shared_user
|
|
# - password: <from secret above>
|
|
# - SSL mode: require (or disable for testing only)
|
|
```
|
|
|
|
**Specific Replica Pod:**
|
|
```bash
|
|
# List replica pods
|
|
kubectl get pods -n postgresql-system -l cnpg.io/instanceRole=replica
|
|
|
|
# Forward to specific replica pod (e.g., postgres-shared-4)
|
|
kubectl port-forward -n postgresql-system pod/postgres-shared-4 5432:5432
|
|
|
|
# Get the password for shared_user
|
|
kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo
|
|
|
|
# Connect with SSL required (recommended):
|
|
# Connection string: postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=require
|
|
# Or configure your client with SSL mode: require
|
|
```
|
|
|
|
**Primary (Read-Write) - For Maintenance Only:**
|
|
```bash
|
|
# Forward to read-write service (connects to primary)
|
|
kubectl port-forward -n postgresql-system svc/postgresql-shared-rw 5433:5432
|
|
|
|
# Note: Using port 5433 locally to avoid conflict if read-only is on 5432
|
|
# Get the password
|
|
kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo
|
|
|
|
# Connect using localhost:5433 with SSL mode: require
|
|
```
|
|
|
|
**SSL Configuration Notes:**
|
|
- **SSL is enabled** on PostgreSQL (ssl = on)
|
|
- For **port forwarding**, clients must explicitly configure SSL mode
|
|
- The server uses self-signed certificates, so clients will need to accept untrusted certificates
|
|
- For production clients connecting directly (not via port-forward), use proper SSL with CA verification
|
|
|
|
**Troubleshooting Port Forward "Broken Pipe" Errors:**
|
|
If you see `error: lost connection to pod` or `broken pipe` errors:
|
|
1. **Use direct pod port forwarding** instead of service port forwarding (more reliable):
|
|
```bash
|
|
# List available replica pods
|
|
kubectl get pods -n postgresql-system -l cnpg.io/instanceRole=replica
|
|
|
|
# Forward to specific replica pod (more stable)
|
|
kubectl port-forward -n postgresql-system pod/postgres-shared-4 5432:5432
|
|
```
|
|
|
|
2. **Configure your client with explicit SSL mode**:
|
|
- Use `sslmode=require` in your connection string (recommended)
|
|
- Or `sslmode=prefer` (allows fallback to non-SSL if SSL fails)
|
|
- Or `sslmode=disable` for testing only (not recommended)
|
|
|
|
3. **Connection string examples**:
|
|
```bash
|
|
# With SSL required (recommended)
|
|
postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=require
|
|
|
|
# With SSL preferred (allows fallback)
|
|
postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=prefer
|
|
|
|
# Without SSL (testing only)
|
|
postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=disable
|
|
```
|
|
|
|
**Getting the CA Certificate (for proper SSL verification):**
|
|
```bash
|
|
# Get the CA certificate from the cluster secret
|
|
kubectl get secret postgres-shared-ca -n postgresql-system -o jsonpath='{.data.ca\.crt}' | base64 -d > postgres-ca.crt
|
|
|
|
# Use with your client:
|
|
# Connection string: postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=verify-ca&sslrootcert=postgres-ca.crt
|
|
# Or configure your client to use the CA certificate file for SSL verification
|
|
```
|
|
|
|
### **Database Connection**
|
|
```bash
|
|
# Connect to PostgreSQL via exec
|
|
kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U shared_user -d shared_db
|
|
|
|
# Check replication status
|
|
kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U postgres -c "SELECT * FROM pg_stat_replication;"
|
|
|
|
# Check cluster health
|
|
kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U postgres -c "SELECT pg_is_in_recovery();"
|
|
```
|
|
|
|
### **Backup & Storage**
|
|
```bash
|
|
# Check PVC status
|
|
kubectl get pvc -n postgresql-system
|
|
kubectl describe pvc postgres-shared-2 -n postgresql-system # Primary instance PVC
|
|
|
|
# Check Longhorn volumes
|
|
kubectl get volumes -n longhorn-system
|
|
kubectl describe volume -n longhorn-system | grep postgresql
|
|
```
|
|
|
|
### **Long Running Queries**
|
|
When a long running query is happening, use this command on a node
|
|
```bash
|
|
kubectl exec -n postgresql-system postgres-shared-2 -- psql -U postgres -c "
|
|
SELECT
|
|
pid,
|
|
datname,
|
|
usename,
|
|
application_name,
|
|
now() - xact_start AS tx_duration,
|
|
now() - query_start AS query_duration,
|
|
state,
|
|
wait_event_type,
|
|
wait_event,
|
|
query
|
|
FROM pg_stat_activity
|
|
WHERE state != 'idle'
|
|
AND query NOT LIKE '%pg_stat_activity%'
|
|
AND (now() - xact_start > interval '10 seconds' OR now() - query_start > interval '10 seconds')
|
|
ORDER BY GREATEST(now() - xact_start, now() - query_start) DESC;
|
|
"
|
|
``` |