509 lines
16 KiB
Markdown
509 lines
16 KiB
Markdown
Below is Claude's recommendation for a guide to partition tables in postgres for piefed and it seems similar to the
|
|
official docs, though I'd prefer that and use this as a reference. This guide sets up automatic backup functions,
|
|
which is nice. The reason I was looking in to this is that I've noticed about 500MB growth
|
|
in about a week and the largest tables are for votes, which wouldn't compress well. I think I would wait a bit longer to
|
|
do the partitioning migration than in the next few weeks (and also test it in a lower env), since even if 300GB is available
|
|
to the DB per node, that's still 600 weeks, so plenty of time. Piefed is talking about automatic backup of older posts to S3,
|
|
but that table was only about 80MB for me and it would probably do well to eventually compress it.
|
|
|
|
# PostgreSQL Partitioning Strategy for PieFed Database Growth
|
|
|
|
## 📊 **Current Status & Growth Analysis**
|
|
|
|
### **Database Size Assessment (August 2025)**
|
|
- **PieFed Database**: 975 MB (largest database in cluster)
|
|
- **Growth Rate**: 500 MB per week
|
|
- **Largest Tables**:
|
|
- `post_vote`: 280 MB (1,167,833 rows) - 20 days of data
|
|
- `post_reply_vote`: 271 MB (1,185,985 rows)
|
|
- `post_reply`: 201 MB
|
|
- `user`: 104 MB
|
|
|
|
### **Growth Projections**
|
|
- **Daily vote activity**: ~58,000 votes/day
|
|
- **Annual projection**: ~21M votes/year = ~5.1GB for `post_vote` alone
|
|
- **Total database projection**: 15-20GB annually across all tables
|
|
- **3-year projection**: 45-60GB total database size
|
|
|
|
## 🎯 **When to Begin Partitioning**
|
|
|
|
### **Trigger Points for Implementation**
|
|
|
|
#### **Phase 1: Immediate Planning (Current)**
|
|
- ✅ **Database size**: 975 MB (threshold: >500 MB)
|
|
- ✅ **Growth rate**: 500 MB/week (threshold: >100 MB/week)
|
|
- ✅ **Infrastructure capacity**: 400GB available per node
|
|
|
|
#### **Phase 2: Infrastructure Preparation (Next 1-2 months)**
|
|
**Trigger**: When database reaches 1.5-2GB
|
|
- Current trajectory: ~4-6 weeks from now
|
|
- **Action**: Add NetCup block storage volumes
|
|
- **Rationale**: Prepare infrastructure before partitioning implementation
|
|
|
|
#### **Phase 3: Partitioning Implementation (2-3 months)**
|
|
**Trigger**: When `post_vote` table reaches 500 MB or 2M rows
|
|
- Current trajectory: ~6-8 weeks from now
|
|
- **Action**: Implement time-based partitioning
|
|
- **Rationale**: Optimal size for initial partitioning without excessive complexity
|
|
|
|
#### **Phase 4: Archive Migration (3-4 months)**
|
|
**Trigger**: When historical data older than 3 months exists
|
|
- Current trajectory: ~12-16 weeks from now
|
|
- **Action**: Move old partitions to archive storage
|
|
- **Rationale**: Cost optimization for infrequently accessed data
|
|
|
|
## 🏗️ **Infrastructure Architecture**
|
|
|
|
### **Current Setup**
|
|
```yaml
|
|
# Current PostgreSQL Storage Configuration
|
|
storage:
|
|
size: 50Gi
|
|
storageClass: longhorn-postgresql
|
|
walStorage:
|
|
size: 10Gi
|
|
storageClass: longhorn-postgresql
|
|
```
|
|
|
|
### **Target Architecture**
|
|
```yaml
|
|
# Enhanced Multi-Volume Configuration
|
|
storage:
|
|
size: 50Gi # Recent data (2-3 months)
|
|
storageClass: longhorn-postgresql
|
|
walStorage:
|
|
size: 10Gi
|
|
storageClass: longhorn-postgresql
|
|
tablespaces:
|
|
- name: archive_data # Historical data (>3 months)
|
|
size: 500Gi
|
|
storageClass: netcup-block-storage
|
|
- name: temp_operations # Temporary operations
|
|
size: 100Gi
|
|
storageClass: netcup-block-storage
|
|
```
|
|
|
|
## 📋 **Implementation Plan**
|
|
|
|
### **Phase 1: Infrastructure Preparation**
|
|
|
|
#### **1.1 Add NetCup Block Storage**
|
|
```bash
|
|
# On each VPS (n1, n2, n3)
|
|
# 1. Attach 500GB block storage via NetCup control panel
|
|
# 2. Format and mount new volumes
|
|
|
|
sudo mkfs.ext4 /dev/sdb
|
|
sudo mkdir -p /mnt/postgres-archive
|
|
sudo mount /dev/sdb /mnt/postgres-archive
|
|
sudo chown 999:999 /mnt/postgres-archive
|
|
|
|
# Add to /etc/fstab for persistence
|
|
echo "/dev/sdb /mnt/postgres-archive ext4 defaults 0 2" >> /etc/fstab
|
|
```
|
|
|
|
#### **1.2 Create Storage Classes**
|
|
```yaml
|
|
# manifests/infrastructure/postgresql/netcup-block-storage.yaml
|
|
apiVersion: storage.k8s.io/v1
|
|
kind: StorageClass
|
|
metadata:
|
|
name: netcup-block-storage
|
|
provisioner: kubernetes.io/host-path
|
|
parameters:
|
|
type: Directory
|
|
path: /mnt/postgres-archive
|
|
volumeBindingMode: WaitForFirstConsumer
|
|
reclaimPolicy: Retain
|
|
```
|
|
|
|
#### **1.3 Update CloudNativePG Configuration**
|
|
```yaml
|
|
# manifests/infrastructure/postgresql/cluster-shared.yaml
|
|
apiVersion: postgresql.cnpg.io/v1
|
|
kind: Cluster
|
|
metadata:
|
|
name: postgres-shared
|
|
spec:
|
|
instances: 3
|
|
|
|
storage:
|
|
size: 50Gi
|
|
storageClass: longhorn-postgresql
|
|
|
|
walStorage:
|
|
size: 10Gi
|
|
storageClass: longhorn-postgresql
|
|
|
|
# Add tablespaces for multi-volume storage
|
|
tablespaces:
|
|
- name: archive_data
|
|
size: 500Gi
|
|
storageClass: netcup-block-storage
|
|
- name: temp_operations
|
|
size: 100Gi
|
|
storageClass: netcup-block-storage
|
|
|
|
# Enable partitioning extensions
|
|
bootstrap:
|
|
initdb:
|
|
database: shared_db
|
|
owner: shared_user
|
|
postInitSQL:
|
|
- "CREATE EXTENSION IF NOT EXISTS pg_partman"
|
|
- "CREATE EXTENSION IF NOT EXISTS pg_cron"
|
|
```
|
|
|
|
### **Phase 2: Partitioning Implementation**
|
|
|
|
#### **2.1 Install Required Extensions**
|
|
```sql
|
|
-- Connect to PieFed database
|
|
kubectl exec -n postgresql-system postgres-shared-2 -- psql -U postgres -d piefed
|
|
|
|
-- Install partitioning and scheduling extensions
|
|
CREATE EXTENSION IF NOT EXISTS pg_partman;
|
|
CREATE EXTENSION IF NOT EXISTS pg_cron;
|
|
|
|
-- Verify installation
|
|
SELECT name, default_version, installed_version
|
|
FROM pg_available_extensions
|
|
WHERE name IN ('pg_partman', 'pg_cron');
|
|
```
|
|
|
|
#### **2.2 Create Tablespaces**
|
|
```sql
|
|
-- Create tablespace for archive data
|
|
CREATE TABLESPACE archive_data LOCATION '/var/lib/postgresql/tablespaces/archive_data';
|
|
|
|
-- Create tablespace for temporary operations
|
|
CREATE TABLESPACE temp_operations LOCATION '/var/lib/postgresql/tablespaces/temp_operations';
|
|
|
|
-- Verify tablespaces
|
|
SELECT spcname, pg_tablespace_location(oid) FROM pg_tablespace;
|
|
```
|
|
|
|
#### **2.3 Partition the post_vote Table**
|
|
|
|
**Step 1: Backup Current Data**
|
|
```sql
|
|
-- Create backup of current table
|
|
CREATE TABLE post_vote_backup AS SELECT * FROM post_vote;
|
|
```
|
|
|
|
**Step 2: Create Partitioned Table Structure**
|
|
```sql
|
|
-- Rename existing table
|
|
ALTER TABLE post_vote RENAME TO post_vote_legacy;
|
|
|
|
-- Create new partitioned table
|
|
CREATE TABLE post_vote (
|
|
id INTEGER NOT NULL,
|
|
user_id INTEGER,
|
|
author_id INTEGER,
|
|
post_id INTEGER,
|
|
effect DOUBLE PRECISION,
|
|
created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
|
|
PRIMARY KEY (id, created_at) -- Include partition key in PK
|
|
) PARTITION BY RANGE (created_at);
|
|
|
|
-- Create indexes
|
|
CREATE INDEX idx_post_vote_created_at ON post_vote (created_at);
|
|
CREATE INDEX idx_post_vote_user_id ON post_vote (user_id);
|
|
CREATE INDEX idx_post_vote_post_id ON post_vote (post_id);
|
|
CREATE INDEX idx_post_vote_author_id ON post_vote (author_id);
|
|
```
|
|
|
|
**Step 3: Configure Automated Partitioning**
|
|
```sql
|
|
-- Set up pg_partman for monthly partitions
|
|
SELECT partman.create_parent(
|
|
p_parent_table => 'public.post_vote',
|
|
p_control => 'created_at',
|
|
p_type => 'range',
|
|
p_interval => 'monthly',
|
|
p_premake => 3, -- Pre-create 3 future partitions
|
|
p_start_partition => '2025-07-01' -- Start from July 2025
|
|
);
|
|
|
|
-- Configure retention and archive settings
|
|
UPDATE partman.part_config
|
|
SET retention = '12 months',
|
|
retention_keep_table = true,
|
|
infinite_time_partitions = true,
|
|
optimize_constraint = 30
|
|
WHERE parent_table = 'public.post_vote';
|
|
```
|
|
|
|
**Step 4: Create Initial Partitions**
|
|
```sql
|
|
-- Create July 2025 partition (historical data)
|
|
CREATE TABLE post_vote_p2025_07 PARTITION OF post_vote
|
|
FOR VALUES FROM ('2025-07-01') TO ('2025-08-01')
|
|
TABLESPACE archive_data; -- Place on archive storage
|
|
|
|
-- Create August 2025 partition (recent data)
|
|
CREATE TABLE post_vote_p2025_08 PARTITION OF post_vote
|
|
FOR VALUES FROM ('2025-08-01') TO ('2025-09-01'); -- Default tablespace
|
|
|
|
-- Create September 2025 partition (future data)
|
|
CREATE TABLE post_vote_p2025_09 PARTITION OF post_vote
|
|
FOR VALUES FROM ('2025-09-01') TO ('2025-10-01'); -- Default tablespace
|
|
```
|
|
|
|
**Step 5: Migrate Data**
|
|
```sql
|
|
-- Migrate data from legacy table
|
|
INSERT INTO post_vote
|
|
SELECT * FROM post_vote_legacy
|
|
ORDER BY created_at;
|
|
|
|
-- Verify data migration
|
|
SELECT
|
|
'Legacy' as source, COUNT(*) as row_count FROM post_vote_legacy
|
|
UNION ALL
|
|
SELECT
|
|
'Partitioned' as source, COUNT(*) as row_count FROM post_vote;
|
|
|
|
-- Check partition distribution
|
|
SELECT
|
|
schemaname,
|
|
tablename,
|
|
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size,
|
|
(SELECT COUNT(*) FROM information_schema.table_constraints
|
|
WHERE table_name = pg_tables.tablename AND constraint_type = 'CHECK') as partition_count
|
|
FROM pg_tables
|
|
WHERE tablename LIKE 'post_vote_p%'
|
|
ORDER BY tablename;
|
|
```
|
|
|
|
#### **2.4 Set Up Automated Partition Management**
|
|
```sql
|
|
-- Create function to automatically move old partitions to archive storage
|
|
CREATE OR REPLACE FUNCTION move_old_partitions_to_archive()
|
|
RETURNS void AS $$
|
|
DECLARE
|
|
partition_name text;
|
|
archive_threshold date;
|
|
BEGIN
|
|
-- Move partitions older than 3 months to archive storage
|
|
archive_threshold := CURRENT_DATE - INTERVAL '3 months';
|
|
|
|
FOR partition_name IN
|
|
SELECT schemaname||'.'||tablename
|
|
FROM pg_tables
|
|
WHERE tablename LIKE 'post_vote_p%'
|
|
AND tablename < 'post_vote_p' || TO_CHAR(archive_threshold, 'YYYY_MM')
|
|
LOOP
|
|
-- Move partition to archive tablespace
|
|
EXECUTE format('ALTER TABLE %s SET TABLESPACE archive_data', partition_name);
|
|
RAISE NOTICE 'Moved partition % to archive storage', partition_name;
|
|
END LOOP;
|
|
END;
|
|
$$ LANGUAGE plpgsql;
|
|
|
|
-- Schedule monthly archive operations
|
|
SELECT cron.schedule(
|
|
'move-old-partitions',
|
|
'0 2 1 * *', -- 2 AM on the 1st of each month
|
|
'SELECT move_old_partitions_to_archive()'
|
|
);
|
|
|
|
-- Schedule partition maintenance
|
|
SELECT cron.schedule(
|
|
'partition-maintenance',
|
|
'0 1 * * 0', -- 1 AM every Sunday
|
|
'SELECT partman.run_maintenance_proc()'
|
|
);
|
|
```
|
|
|
|
### **Phase 3: Extend to Other Large Tables**
|
|
|
|
#### **3.1 Partition post_reply_vote Table**
|
|
```sql
|
|
-- Similar process for post_reply_vote (271 MB)
|
|
-- Follow same steps as post_vote table
|
|
```
|
|
|
|
#### **3.2 Partition post_reply Table**
|
|
```sql
|
|
-- Similar process for post_reply (201 MB)
|
|
-- Consider partitioning by created_at or parent post date
|
|
```
|
|
|
|
## 📊 **Monitoring and Maintenance**
|
|
|
|
### **Performance Monitoring Queries**
|
|
|
|
#### **Partition Size Monitoring**
|
|
```sql
|
|
-- Monitor partition sizes and locations
|
|
SELECT
|
|
schemaname,
|
|
tablename,
|
|
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size,
|
|
pg_tablespace_name(reltablespace) as tablespace,
|
|
(SELECT COUNT(*) FROM information_schema.columns
|
|
WHERE table_name = pg_tables.tablename) as column_count
|
|
FROM pg_tables
|
|
WHERE tablename LIKE 'post_vote_p%'
|
|
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
|
|
```
|
|
|
|
#### **Query Performance Analysis**
|
|
```sql
|
|
-- Analyze query performance across partitions
|
|
EXPLAIN (ANALYZE, BUFFERS)
|
|
SELECT COUNT(*)
|
|
FROM post_vote
|
|
WHERE created_at >= '2025-01-01'
|
|
AND created_at < '2025-12-31';
|
|
```
|
|
|
|
#### **Partition Pruning Verification**
|
|
```sql
|
|
-- Verify partition pruning is working
|
|
EXPLAIN (ANALYZE, BUFFERS)
|
|
SELECT *
|
|
FROM post_vote
|
|
WHERE created_at >= '2025-08-01'
|
|
AND created_at < '2025-09-01';
|
|
```
|
|
|
|
### **Storage Usage Monitoring**
|
|
```bash
|
|
# Monitor tablespace usage
|
|
kubectl exec -n postgresql-system postgres-shared-2 -- psql -U postgres -c "
|
|
SELECT
|
|
spcname as tablespace_name,
|
|
pg_tablespace_location(oid) as location,
|
|
pg_size_pretty(pg_tablespace_size(oid)) as size
|
|
FROM pg_tablespace
|
|
WHERE spcname NOT IN ('pg_default', 'pg_global');
|
|
"
|
|
|
|
# Monitor PVC usage
|
|
kubectl get pvc -n postgresql-system
|
|
kubectl describe pvc -n postgresql-system
|
|
```
|
|
|
|
### **Automated Maintenance Jobs**
|
|
```sql
|
|
-- View scheduled maintenance jobs
|
|
SELECT
|
|
jobname,
|
|
schedule,
|
|
command,
|
|
active,
|
|
jobid
|
|
FROM cron.job
|
|
ORDER BY jobname;
|
|
|
|
-- Check partition maintenance logs
|
|
SELECT * FROM partman.part_config_sub;
|
|
```
|
|
|
|
## 🚨 **Troubleshooting Guide**
|
|
|
|
### **Common Issues and Solutions**
|
|
|
|
#### **Issue: Partition Creation Fails**
|
|
```sql
|
|
-- Check partition configuration
|
|
SELECT * FROM partman.part_config WHERE parent_table = 'public.post_vote';
|
|
|
|
-- Manually create missing partition
|
|
SELECT partman.create_parent(
|
|
p_parent_table => 'public.post_vote',
|
|
p_control => 'created_at',
|
|
p_type => 'range',
|
|
p_interval => 'monthly'
|
|
);
|
|
```
|
|
|
|
#### **Issue: Query Not Using Partition Pruning**
|
|
```sql
|
|
-- Check if constraint exclusion is enabled
|
|
SHOW constraint_exclusion;
|
|
|
|
-- Enable if needed
|
|
SET constraint_exclusion = partition;
|
|
|
|
-- Update statistics
|
|
ANALYZE post_vote;
|
|
```
|
|
|
|
#### **Issue: Tablespace Out of Space**
|
|
```bash
|
|
# Check tablespace usage
|
|
df -h /mnt/postgres-archive
|
|
|
|
# Add additional block storage if needed
|
|
# Follow NetCup documentation for volume expansion
|
|
```
|
|
|
|
## 📖 **Documentation References**
|
|
|
|
### **CloudNativePG Documentation**
|
|
- [Tablespaces](https://cloudnative-pg.io/documentation/current/tablespaces/) - Official tablespace configuration guide
|
|
- [FAQ](https://cloudnative-pg.io/documentation/current/faq/) - Database management best practices
|
|
- [Controller](https://cloudnative-pg.io/documentation/current/controller/) - Storage management concepts
|
|
|
|
### **PostgreSQL Documentation**
|
|
- [Declarative Partitioning](https://www.postgresql.org/docs/16/ddl-partitioning.html) - Official partitioning guide
|
|
- [Tablespaces](https://www.postgresql.org/docs/16/manage-ag-tablespaces.html) - Tablespace management
|
|
- [pg_partman Extension](https://github.com/pgpartman/pg_partman) - Automated partition management
|
|
|
|
### **NetCup Documentation**
|
|
- [Block Storage](https://www.netcup.eu/bestellen/produkt.php?produkt=2594) - Block storage attachment guide
|
|
- [VPS Management](https://www.netcup.eu/vserver/) - VPS configuration documentation
|
|
|
|
## 🎯 **Success Metrics**
|
|
|
|
### **Performance Targets**
|
|
- **Recent data queries**: <250ms (50% improvement from current 506ms)
|
|
- **Historical data queries**: <800ms (acceptable for archive storage)
|
|
- **Storage cost reduction**: 70% for historical data
|
|
- **Backup time improvement**: 60% reduction for recent data backups
|
|
|
|
### **Capacity Planning**
|
|
- **Primary storage**: Maintain 50GB for 2-3 months of recent data
|
|
- **Archive storage**: Scale to 500GB initially, expand as needed
|
|
- **Growth accommodation**: Support 20GB/year growth for 25+ years
|
|
|
|
### **Operational Goals**
|
|
- **Zero downtime**: All operations performed online
|
|
- **Application transparency**: No code changes required
|
|
- **Automated management**: Minimal manual intervention
|
|
- **Disaster recovery**: Independent backup strategies per tier
|
|
|
|
## 📅 **Implementation Timeline**
|
|
|
|
| Phase | Duration | Key Deliverables |
|
|
|-------|----------|------------------|
|
|
| **Infrastructure Prep** | 2 weeks | NetCup block storage attached, storage classes configured |
|
|
| **Partitioning Setup** | 1 week | Extensions installed, tablespaces created |
|
|
| **post_vote Migration** | 1 week | Partitioned table structure, data migration |
|
|
| **Automation Setup** | 1 week | Automated partition management, monitoring |
|
|
| **Other Tables** | 2 weeks | post_reply_vote and post_reply partitioning |
|
|
| **Testing & Optimization** | 1 week | Performance testing, fine-tuning |
|
|
|
|
**Total Implementation Time**: 8 weeks
|
|
|
|
## ✅ **Pre-Implementation Checklist**
|
|
|
|
- [ ] NetCup block storage volumes attached to all nodes
|
|
- [ ] Storage classes created and tested
|
|
- [ ] CloudNativePG cluster configuration updated
|
|
- [ ] Backup of current database completed
|
|
- [ ] pg_partman and pg_cron extensions available
|
|
- [ ] Monitoring queries prepared
|
|
- [ ] Rollback plan documented
|
|
- [ ] Team training on partition management completed
|
|
|
|
---
|
|
|
|
**Last Updated**: August 2025
|
|
**Next Review**: September 2025
|
|
**Owner**: Database Administration Team
|