redaction (#1)
Add the redacted source file for demo purposes Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1 Co-authored-by: Michael DiLeo <michael_dileo@proton.me> Co-committed-by: Michael DiLeo <michael_dileo@proton.me>
This commit was merged in pull request #1.
This commit is contained in:
341
manifests/infrastructure/postgresql/README.md
Normal file
341
manifests/infrastructure/postgresql/README.md
Normal file
@@ -0,0 +1,341 @@
|
||||
# PostgreSQL Infrastructure
|
||||
|
||||
This directory contains the CloudNativePG setup for high-availability PostgreSQL on the Kubernetes cluster.
|
||||
|
||||
## Architecture
|
||||
|
||||
- **3 PostgreSQL instances**: 1 primary + 2 replicas for high availability
|
||||
- **Synchronous replication**: Zero data loss (RPO=0) configuration
|
||||
- **Node distribution**: Instances distributed across n1, n2, and n3 nodes
|
||||
- **Current cluster**: `postgres-shared` with instances `postgres-shared-2` (primary), `postgres-shared-4`, `postgres-shared-5`
|
||||
- **Longhorn storage**: Single replica (PostgreSQL handles replication)
|
||||
- **Shared cluster**: One PostgreSQL cluster that applications can share
|
||||
|
||||
## Components
|
||||
|
||||
### **Core Components**
|
||||
- `namespace.yaml`: PostgreSQL system namespace
|
||||
- `repository.yaml`: CloudNativePG Helm repository
|
||||
- `operator.yaml`: CloudNativePG operator deployment
|
||||
- `postgresql-storageclass.yaml`: Optimized storage class for PostgreSQL
|
||||
- `cluster-shared.yaml`: Shared PostgreSQL cluster configuration
|
||||
|
||||
### **Monitoring Components**
|
||||
- `postgresql-dashboard-metrics.yaml`: Custom metrics ConfigMap for enhanced monitoring
|
||||
- `postgresql-dashboard-rbac.yaml`: RBAC permissions for metrics collection
|
||||
- Built-in ServiceMonitor: Automatically configured for OpenObserve integration
|
||||
|
||||
### **Backup Components**
|
||||
- `backup-config.yaml`: CloudNativePG backup configuration
|
||||
- Longhorn integration: S3 backup via label-based volume selection
|
||||
|
||||
## Services Created
|
||||
|
||||
CloudNativePG automatically creates these services:
|
||||
|
||||
- `postgresql-shared-rw`: Write operations (connects to primary)
|
||||
- `postgresql-shared-ro`: Read-only operations (connects to replicas)
|
||||
- `postgresql-shared-r`: Read operations (connects to any instance)
|
||||
|
||||
## Connection Information
|
||||
|
||||
### For Applications
|
||||
|
||||
Applications should connect using these connection parameters:
|
||||
|
||||
**Write Operations:**
|
||||
```yaml
|
||||
host: postgresql-shared-rw.postgresql-system.svc.cluster.local
|
||||
port: 5432
|
||||
database: shared_db
|
||||
username: shared_user
|
||||
```
|
||||
|
||||
**Read Operations:**
|
||||
```yaml
|
||||
host: postgresql-shared-ro.postgresql-system.svc.cluster.local
|
||||
port: 5432
|
||||
database: shared_db
|
||||
username: shared_user
|
||||
```
|
||||
|
||||
### Getting Credentials
|
||||
|
||||
The PostgreSQL password is auto-generated and stored in a secret:
|
||||
|
||||
```bash
|
||||
# Get the password for the shared_user
|
||||
kubectl get secret postgresql-shared-app -n postgresql-system -o jsonpath="{.data.password}" | base64 -d
|
||||
|
||||
# Get the superuser password
|
||||
kubectl get secret postgresql-shared-superuser -n postgresql-system -o jsonpath="{.data.password}" | base64 -d
|
||||
```
|
||||
|
||||
## Application Integration Example
|
||||
|
||||
Here's how an application deployment would connect:
|
||||
|
||||
```yaml
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: example-app
|
||||
spec:
|
||||
template:
|
||||
spec:
|
||||
containers:
|
||||
- name: app
|
||||
image: example-app:latest
|
||||
env:
|
||||
- name: DB_HOST
|
||||
value: "postgresql-shared-rw.postgresql-system.svc.cluster.local"
|
||||
- name: DB_PORT
|
||||
value: "5432"
|
||||
- name: DB_NAME
|
||||
value: "shared_db"
|
||||
- name: DB_USER
|
||||
value: "shared_user"
|
||||
- name: DB_PASSWORD
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: postgresql-shared-app
|
||||
key: password
|
||||
```
|
||||
|
||||
## Monitoring
|
||||
|
||||
The PostgreSQL cluster includes comprehensive monitoring and observability:
|
||||
|
||||
### **Metrics & Monitoring** ✅ **OPERATIONAL**
|
||||
- **Metrics Port**: 9187 - PostgreSQL metrics endpoint
|
||||
- **ServiceMonitor**: Configured for OpenObserve integration
|
||||
- **Built-in Metrics**: CloudNativePG provides extensive default metrics including:
|
||||
- **Connection Metrics**: `cnpg_backends_total`, `cnpg_pg_settings_setting{name="max_connections"}`
|
||||
- **Performance Metrics**: `cnpg_pg_stat_database_xact_commit`, `cnpg_pg_stat_database_xact_rollback`
|
||||
- **Storage Metrics**: `cnpg_pg_database_size_bytes`, `cnpg_pg_stat_database_blks_hit`, `cnpg_pg_stat_database_blks_read`
|
||||
- **Cluster Health**: `cnpg_collector_up`, `cnpg_collector_postgres_version`
|
||||
- **Replication**: `cnpg_pg_stat_replication_*` metrics for streaming replication status
|
||||
|
||||
### **Custom Metrics System**
|
||||
- **ConfigMap Support**: Custom queries can be defined via ConfigMaps
|
||||
- **RBAC Configured**: PostgreSQL service account has permissions to read custom metrics ConfigMaps
|
||||
- **Predefined Queries**: CloudNativePG includes `cnpg-default-monitoring` ConfigMap with standard queries
|
||||
- **Monitoring Role**: Uses `pg_monitor` role for secure metrics collection
|
||||
|
||||
### **Dashboard Integration**
|
||||
- **OpenObserve Ready**: All metrics automatically ingested into OpenObserve
|
||||
- **Key Performance Indicators**:
|
||||
- Connection utilization: `cnpg_backends_total / cnpg_pg_settings_setting{name="max_connections"} * 100`
|
||||
- Buffer cache hit ratio: `cnpg_pg_stat_database_blks_hit / (cnpg_pg_stat_database_blks_hit + cnpg_pg_stat_database_blks_read) * 100`
|
||||
- Transaction rate: `rate(cnpg_pg_stat_database_xact_commit[5m])`
|
||||
- Rollback ratio: `cnpg_pg_stat_database_xact_rollback / (cnpg_pg_stat_database_xact_commit + cnpg_pg_stat_database_xact_rollback) * 100`
|
||||
|
||||
### **High Availability Monitoring**
|
||||
- **Automatic Failover**: CloudNativePG handles primary/replica failover automatically
|
||||
- **Health Checks**: Continuous health monitoring with automatic recovery
|
||||
- **Streaming Replication**: Real-time replication status monitoring
|
||||
|
||||
## Backup Strategy
|
||||
|
||||
### **Longhorn Storage-Level Backups (Incremental)**
|
||||
- **Daily backups**: 2 AM UTC, retain 14 days (2 weeks)
|
||||
- **Weekly backups**: 1 AM Sunday, retain 8 weeks (2 months)
|
||||
- **Snapshot cleanup**: 3 AM daily, keep 5 local snapshots
|
||||
- **Target**: Backblaze B2 S3 storage via existing setup
|
||||
- **Type**: Incremental (efficient change block detection)
|
||||
|
||||
### **CloudNativePG Application-Level Backups**
|
||||
- **WAL archiving**: Continuous transaction log archiving
|
||||
- **Point-in-time recovery**: Available via CloudNativePG
|
||||
- **Retention**: 30-day backup retention policy
|
||||
|
||||
### **Backup Labels**
|
||||
PostgreSQL volumes are automatically backed up based on labels:
|
||||
```yaml
|
||||
backup.longhorn.io/enable: "true"
|
||||
app: postgresql-shared
|
||||
```
|
||||
|
||||
## Scaling
|
||||
|
||||
To add more read replicas:
|
||||
```yaml
|
||||
# Edit cluster-shared.yaml
|
||||
spec:
|
||||
instances: 4 # Increase from 3 to 4 for additional read replica
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### **Cluster Status**
|
||||
```bash
|
||||
# Check cluster status
|
||||
kubectl get cluster -n postgresql-system
|
||||
kubectl describe cluster postgresql-shared -n postgresql-system
|
||||
|
||||
# Check pods
|
||||
kubectl get pods -n postgresql-system
|
||||
kubectl logs postgres-shared-2 -n postgresql-system # Current primary
|
||||
```
|
||||
|
||||
### **Monitoring & Metrics**
|
||||
```bash
|
||||
# Check ServiceMonitor
|
||||
kubectl get servicemonitor -n postgresql-system
|
||||
kubectl describe servicemonitor postgresql-shared -n postgresql-system
|
||||
|
||||
# Check metrics endpoint directly
|
||||
kubectl port-forward -n postgresql-system postgres-shared-2 9187:9187 # Primary instance
|
||||
curl http://localhost:9187/metrics
|
||||
|
||||
# Check custom metrics ConfigMap
|
||||
kubectl get configmap -n postgresql-system
|
||||
kubectl describe configmap postgresql-dashboard-metrics -n postgresql-system
|
||||
|
||||
# Check RBAC permissions
|
||||
kubectl get role,rolebinding -n postgresql-system
|
||||
kubectl describe rolebinding postgresql-dashboard-metrics-reader -n postgresql-system
|
||||
```
|
||||
|
||||
### **Port Forwarding**
|
||||
|
||||
Port forwarding allows you to connect to PostgreSQL from your local machine using standard database tools.
|
||||
|
||||
**⚠️ Important**: PostgreSQL requires SSL/TLS connections. When port forwarding, you must configure your client to handle SSL properly.
|
||||
|
||||
**Read-Only Replica (Load Balanced):**
|
||||
```bash
|
||||
# Forward to read-only service (load balances across all replicas)
|
||||
kubectl port-forward -n postgresql-system svc/postgresql-shared-ro 5432:5432
|
||||
|
||||
# Get the password for shared_user
|
||||
kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo
|
||||
|
||||
# Connect with SSL required (recommended):
|
||||
# Connection string: postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=require
|
||||
# Or configure your client:
|
||||
# - host: localhost
|
||||
# - port: 5432
|
||||
# - database: shared_db
|
||||
# - username: shared_user
|
||||
# - password: <from secret above>
|
||||
# - SSL mode: require (or disable for testing only)
|
||||
```
|
||||
|
||||
**Specific Replica Pod:**
|
||||
```bash
|
||||
# List replica pods
|
||||
kubectl get pods -n postgresql-system -l cnpg.io/instanceRole=replica
|
||||
|
||||
# Forward to specific replica pod (e.g., postgres-shared-4)
|
||||
kubectl port-forward -n postgresql-system pod/postgres-shared-4 5432:5432
|
||||
|
||||
# Get the password for shared_user
|
||||
kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo
|
||||
|
||||
# Connect with SSL required (recommended):
|
||||
# Connection string: postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=require
|
||||
# Or configure your client with SSL mode: require
|
||||
```
|
||||
|
||||
**Primary (Read-Write) - For Maintenance Only:**
|
||||
```bash
|
||||
# Forward to read-write service (connects to primary)
|
||||
kubectl port-forward -n postgresql-system svc/postgresql-shared-rw 5433:5432
|
||||
|
||||
# Note: Using port 5433 locally to avoid conflict if read-only is on 5432
|
||||
# Get the password
|
||||
kubectl get secret postgres-shared-app -n postgresql-system -o jsonpath='{.data.password}' | base64 -d && echo
|
||||
|
||||
# Connect using localhost:5433 with SSL mode: require
|
||||
```
|
||||
|
||||
**SSL Configuration Notes:**
|
||||
- **SSL is enabled** on PostgreSQL (ssl = on)
|
||||
- For **port forwarding**, clients must explicitly configure SSL mode
|
||||
- The server uses self-signed certificates, so clients will need to accept untrusted certificates
|
||||
- For production clients connecting directly (not via port-forward), use proper SSL with CA verification
|
||||
|
||||
**Troubleshooting Port Forward "Broken Pipe" Errors:**
|
||||
If you see `error: lost connection to pod` or `broken pipe` errors:
|
||||
1. **Use direct pod port forwarding** instead of service port forwarding (more reliable):
|
||||
```bash
|
||||
# List available replica pods
|
||||
kubectl get pods -n postgresql-system -l cnpg.io/instanceRole=replica
|
||||
|
||||
# Forward to specific replica pod (more stable)
|
||||
kubectl port-forward -n postgresql-system pod/postgres-shared-4 5432:5432
|
||||
```
|
||||
|
||||
2. **Configure your client with explicit SSL mode**:
|
||||
- Use `sslmode=require` in your connection string (recommended)
|
||||
- Or `sslmode=prefer` (allows fallback to non-SSL if SSL fails)
|
||||
- Or `sslmode=disable` for testing only (not recommended)
|
||||
|
||||
3. **Connection string examples**:
|
||||
```bash
|
||||
# With SSL required (recommended)
|
||||
postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=require
|
||||
|
||||
# With SSL preferred (allows fallback)
|
||||
postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=prefer
|
||||
|
||||
# Without SSL (testing only)
|
||||
postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=disable
|
||||
```
|
||||
|
||||
**Getting the CA Certificate (for proper SSL verification):**
|
||||
```bash
|
||||
# Get the CA certificate from the cluster secret
|
||||
kubectl get secret postgres-shared-ca -n postgresql-system -o jsonpath='{.data.ca\.crt}' | base64 -d > postgres-ca.crt
|
||||
|
||||
# Use with your client:
|
||||
# Connection string: postgresql://shared_user:<password>@localhost:5432/shared_db?sslmode=verify-ca&sslrootcert=postgres-ca.crt
|
||||
# Or configure your client to use the CA certificate file for SSL verification
|
||||
```
|
||||
|
||||
### **Database Connection**
|
||||
```bash
|
||||
# Connect to PostgreSQL via exec
|
||||
kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U shared_user -d shared_db
|
||||
|
||||
# Check replication status
|
||||
kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U postgres -c "SELECT * FROM pg_stat_replication;"
|
||||
|
||||
# Check cluster health
|
||||
kubectl exec -it postgres-shared-2 -n postgresql-system -- psql -U postgres -c "SELECT pg_is_in_recovery();"
|
||||
```
|
||||
|
||||
### **Backup & Storage**
|
||||
```bash
|
||||
# Check PVC status
|
||||
kubectl get pvc -n postgresql-system
|
||||
kubectl describe pvc postgres-shared-2 -n postgresql-system # Primary instance PVC
|
||||
|
||||
# Check Longhorn volumes
|
||||
kubectl get volumes -n longhorn-system
|
||||
kubectl describe volume -n longhorn-system | grep postgresql
|
||||
```
|
||||
|
||||
### **Long Running Queries**
|
||||
When a long running query is happening, use this command on a node
|
||||
```bash
|
||||
kubectl exec -n postgresql-system postgres-shared-2 -- psql -U postgres -c "
|
||||
SELECT
|
||||
pid,
|
||||
datname,
|
||||
usename,
|
||||
application_name,
|
||||
now() - xact_start AS tx_duration,
|
||||
now() - query_start AS query_duration,
|
||||
state,
|
||||
wait_event_type,
|
||||
wait_event,
|
||||
query
|
||||
FROM pg_stat_activity
|
||||
WHERE state != 'idle'
|
||||
AND query NOT LIKE '%pg_stat_activity%'
|
||||
AND (now() - xact_start > interval '10 seconds' OR now() - query_start > interval '10 seconds')
|
||||
ORDER BY GREATEST(now() - xact_start, now() - query_start) DESC;
|
||||
"
|
||||
```
|
||||
Reference in New Issue
Block a user