Below is Claude's recommendation for a guide to partitioning tables in Postgres for PieFed. It looks similar to the official docs, which I'd still prefer to follow, keeping this as a reference; it also sets up automatic archiving functions, which is nice. The reason I was looking into this is that I've noticed about 500 MB of growth in about a week, and the largest tables are for votes, which wouldn't compress well. I think I'd wait a while longer than the next few weeks to do the partitioning migration (and also test it in a lower env first): even at this rate, with 300 GB available to the DB per node, that's still 600 weeks of headroom, so plenty of time. PieFed is talking about automatically backing up older posts to S3, but that table was only about 80 MB for me, and it would probably do well to eventually compress it.
PostgreSQL Partitioning Strategy for PieFed Database Growth
📊 Current Status & Growth Analysis
Database Size Assessment (August 2025)
- PieFed Database: 975 MB (largest database in cluster)
- Growth Rate: 500 MB per week
- Largest Tables:
- post_vote: 280 MB (1,167,833 rows) - 20 days of data
- post_reply_vote: 271 MB (1,185,985 rows)
- post_reply: 201 MB
- user: 104 MB
Growth Projections
- Daily vote activity: ~58,000 votes/day
- Annual projection: ~21M votes/year = ~5.1GB for post_vote alone
- Total database projection: 15-20GB annually across all tables
- 3-year projection: 45-60GB total database size
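These projections can be re-derived from live data at any point; a minimal sketch, assuming the post_vote schema shown later in this guide (a created_at timestamp on every row):
-- Estimate daily vote rate and bytes per row, then project a year ahead
SELECT
    COUNT(*) / GREATEST(EXTRACT(day FROM NOW() - MIN(created_at)), 1) AS votes_per_day,
    pg_total_relation_size('post_vote') / GREATEST(COUNT(*), 1) AS bytes_per_row,
    pg_size_pretty((
        COUNT(*) / GREATEST(EXTRACT(day FROM NOW() - MIN(created_at)), 1)
        * 365
        * (pg_total_relation_size('post_vote') / GREATEST(COUNT(*), 1))
    )::bigint) AS projected_annual_growth
FROM post_vote;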
🎯 When to Begin Partitioning
Trigger Points for Implementation
Phase 1: Immediate Planning (Current)
- ✅ Database size: 975 MB (threshold: >500 MB)
- ✅ Growth rate: 500 MB/week (threshold: >100 MB/week)
- ✅ Infrastructure capacity: 400GB available per node
Phase 2: Infrastructure Preparation (Next 1-2 months)
Trigger: When database reaches 1.5-2GB
- Current trajectory: ~4-6 weeks from now
- Action: Add NetCup block storage volumes
- Rationale: Prepare infrastructure before partitioning implementation
Phase 3: Partitioning Implementation (2-3 months)
Trigger: When post_vote table reaches 500 MB or 2M rows
- Current trajectory: ~6-8 weeks from now
- Action: Implement time-based partitioning
- Rationale: Optimal size for initial partitioning without excessive complexity
Phase 4: Archive Migration (3-4 months)
Trigger: When historical data older than 3 months exists
- Current trajectory: ~12-16 weeks from now
- Action: Move old partitions to archive storage
- Rationale: Cost optimization for infrequently accessed data
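Each trigger above can be checked with a couple of one-liners (sizes include indexes via pg_total_relation_size):
-- Phase trigger check: overall database size, plus post_vote size and rows
SELECT pg_size_pretty(pg_database_size(current_database())) AS database_size;
SELECT pg_size_pretty(pg_total_relation_size('post_vote')) AS post_vote_size,
       (SELECT COUNT(*) FROM post_vote) AS post_vote_rows;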
🏗️ Infrastructure Architecture
Current Setup
# Current PostgreSQL Storage Configuration
storage:
  size: 50Gi
  storageClass: longhorn-postgresql
walStorage:
  size: 10Gi
  storageClass: longhorn-postgresql
Target Architecture
# Enhanced Multi-Volume Configuration
storage:
  size: 50Gi                 # Recent data (2-3 months)
  storageClass: longhorn-postgresql
walStorage:
  size: 10Gi
  storageClass: longhorn-postgresql
tablespaces:                 # CNPG declarative tablespaces take a nested storage block
  - name: archive_data       # Historical data (>3 months)
    storage:
      size: 500Gi
      storageClass: netcup-block-storage
  - name: temp_operations    # Temporary operations
    storage:
      size: 100Gi
      storageClass: netcup-block-storage
📋 Implementation Plan
Phase 1: Infrastructure Preparation
1.1 Add NetCup Block Storage
# On each VPS (n1, n2, n3)
# 1. Attach 500GB block storage via NetCup control panel
# 2. Format and mount new volumes
sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /mnt/postgres-archive
sudo mount /dev/sdb /mnt/postgres-archive
sudo chown 999:999 /mnt/postgres-archive   # 999 = postgres UID/GID in CNPG images
# Add to /etc/fstab for persistence (tee -a, since a plain >> would run unprivileged)
echo "/dev/sdb /mnt/postgres-archive ext4 defaults 0 2" | sudo tee -a /etc/fstab
1.2 Create Storage Classes
# manifests/infrastructure/postgresql/netcup-block-storage.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: netcup-block-storage
# Locally mounted block storage has no dynamic provisioner; PersistentVolumes
# pointing at /mnt/postgres-archive must be created manually against this class
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
1.3 Update CloudNativePG Configuration
# manifests/infrastructure/postgresql/cluster-shared.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-shared
spec:
  instances: 3
  storage:
    size: 50Gi
    storageClass: longhorn-postgresql
  walStorage:
    size: 10Gi
    storageClass: longhorn-postgresql
  # Add tablespaces for multi-volume storage
  tablespaces:
    - name: archive_data
      storage:
        size: 500Gi
        storageClass: netcup-block-storage
    - name: temp_operations
      storage:
        size: 100Gi
        storageClass: netcup-block-storage
  postgresql:
    # pg_cron must be preloaded, and it can only be installed into the
    # database named by cron.database_name (piefed, per section 2.1);
    # the image must ship both extensions
    shared_preload_libraries:
      - pg_cron
    parameters:
      cron.database_name: piefed
  # Enable partitioning extensions
  bootstrap:
    initdb:
      database: shared_db
      owner: shared_user
      postInitApplicationSQL:  # runs in the application database, not postgres
        - "CREATE EXTENSION IF NOT EXISTS pg_partman"
Phase 2: Partitioning Implementation
2.1 Install Required Extensions
-- Connect to the PieFed database first (from a shell):
--   kubectl exec -it -n postgresql-system postgres-shared-2 -- psql -U postgres -d piefed
-- Install partitioning and scheduling extensions; pg_partman's objects are
-- referenced as partman.* throughout this guide, so install it into a
-- dedicated partman schema
CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;
CREATE EXTENSION IF NOT EXISTS pg_cron;
-- Verify installation
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('pg_partman', 'pg_cron');
2.2 Create Tablespaces
-- With the CNPG declarative tablespaces above, the operator provisions these
-- automatically under /var/lib/postgresql/tablespaces/; the manual commands
-- below are only needed on clusters managed outside CNPG
CREATE TABLESPACE archive_data LOCATION '/var/lib/postgresql/tablespaces/archive_data';
CREATE TABLESPACE temp_operations LOCATION '/var/lib/postgresql/tablespaces/temp_operations';
-- Verify tablespaces
SELECT spcname, pg_tablespace_location(oid) FROM pg_tablespace;
2.3 Partition the post_vote Table
Step 1: Backup Current Data
-- Create backup of current table
CREATE TABLE post_vote_backup AS SELECT * FROM post_vote;
Step 2: Create Partitioned Table Structure
-- Rename existing table
ALTER TABLE post_vote RENAME TO post_vote_legacy;
-- Create new partitioned table
CREATE TABLE post_vote (
id INTEGER NOT NULL,
user_id INTEGER,
author_id INTEGER,
post_id INTEGER,
effect DOUBLE PRECISION,
created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
PRIMARY KEY (id, created_at) -- Include partition key in PK
) PARTITION BY RANGE (created_at);
-- Create indexes
CREATE INDEX idx_post_vote_created_at ON post_vote (created_at);
CREATE INDEX idx_post_vote_user_id ON post_vote (user_id);
CREATE INDEX idx_post_vote_post_id ON post_vote (post_id);
CREATE INDEX idx_post_vote_author_id ON post_vote (author_id);
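One detail worth carrying over from the legacy table: if id was backed by a sequence (the usual serial setup), the new table needs the same default or application inserts will fail. A sketch, with the sequence name assumed from PostgreSQL's default serial naming convention (confirm it with \d post_vote_legacy):
-- Reattach the id sequence (name assumed; verify against the legacy table)
ALTER TABLE post_vote ALTER COLUMN id SET DEFAULT nextval('post_vote_id_seq');
ALTER SEQUENCE post_vote_id_seq OWNED BY post_vote.id;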
Step 3: Configure Automated Partitioning
-- Set up pg_partman for monthly partitions
SELECT partman.create_parent(
p_parent_table => 'public.post_vote',
p_control => 'created_at',
p_type => 'range',
p_interval => 'monthly',
p_premake => 3, -- Pre-create 3 future partitions
p_start_partition => '2025-07-01' -- Start from July 2025
);
-- Configure retention and archive settings
UPDATE partman.part_config
SET retention = '12 months',
retention_keep_table = true,
infinite_time_partitions = true,
optimize_constraint = 30
WHERE parent_table = 'public.post_vote';
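To confirm pg_partman registered the table as intended, the configuration row can be read back:
-- Verify the pg_partman registration for post_vote
SELECT parent_table, control, partition_interval, premake, retention
FROM partman.part_config
WHERE parent_table = 'public.post_vote';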
Step 4: Create Initial Partitions
-- Create July 2025 partition (historical data)
CREATE TABLE post_vote_p2025_07 PARTITION OF post_vote
FOR VALUES FROM ('2025-07-01') TO ('2025-08-01')
TABLESPACE archive_data; -- Place on archive storage
-- Create August 2025 partition (recent data)
CREATE TABLE post_vote_p2025_08 PARTITION OF post_vote
FOR VALUES FROM ('2025-08-01') TO ('2025-09-01'); -- Default tablespace
-- Create September 2025 partition (future data)
CREATE TABLE post_vote_p2025_09 PARTITION OF post_vote
FOR VALUES FROM ('2025-09-01') TO ('2025-10-01'); -- Default tablespace
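A quick way to confirm which partitions are attached and where each one lives:
-- List attached partitions with their tablespaces
SELECT c.relname AS partition,
       COALESCE(t.spcname, 'pg_default') AS tablespace
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
WHERE i.inhparent = 'public.post_vote'::regclass
ORDER BY c.relname;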
Step 5: Migrate Data
-- Migrate data from legacy table
INSERT INTO post_vote
SELECT * FROM post_vote_legacy
ORDER BY created_at;
-- Verify data migration
SELECT
'Legacy' as source, COUNT(*) as row_count FROM post_vote_legacy
UNION ALL
SELECT
'Partitioned' as source, COUNT(*) as row_count FROM post_vote;
-- Check partition distribution (pg_tables.tablespace is NULL for the default)
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    COALESCE(tablespace, 'pg_default') AS tablespace
FROM pg_tables
WHERE tablename LIKE 'post_vote_p%'
ORDER BY tablename;
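Note on the swap itself: between the RENAME in step 2 and the creation of the new post_vote, the application would briefly see no table at all. Since DDL is transactional in PostgreSQL, a safer sketch wraps the rename, the new table, and the current month's partition in one transaction:
-- Atomic swap: the application never observes a missing post_vote
BEGIN;
ALTER TABLE post_vote RENAME TO post_vote_legacy;
CREATE TABLE post_vote (
    id INTEGER NOT NULL,
    user_id INTEGER,
    author_id INTEGER,
    post_id INTEGER,
    effect DOUBLE PRECISION,
    created_at TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- At least the current month's partition must exist before inserts resume
CREATE TABLE post_vote_p2025_08 PARTITION OF post_vote
    FOR VALUES FROM ('2025-08-01') TO ('2025-09-01');
COMMIT;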
2.4 Set Up Automated Partition Management
-- Create function to automatically move old partitions to archive storage
CREATE OR REPLACE FUNCTION move_old_partitions_to_archive()
RETURNS void AS $$
DECLARE
    partition_name text;
    index_name text;
    archive_threshold date;
BEGIN
    -- Move partitions older than 3 months to archive storage
    archive_threshold := CURRENT_DATE - INTERVAL '3 months';
    FOR partition_name IN
        SELECT schemaname||'.'||tablename
        FROM pg_tables
        WHERE tablename LIKE 'post_vote_p%'
          AND tablename < 'post_vote_p' || TO_CHAR(archive_threshold, 'YYYY_MM')
          AND tablespace IS DISTINCT FROM 'archive_data'  -- skip already-moved partitions
    LOOP
        -- SET TABLESPACE rewrites the partition under an ACCESS EXCLUSIVE lock,
        -- which is why this is scheduled for a quiet hour below
        EXECUTE format('ALTER TABLE %s SET TABLESPACE archive_data', partition_name);
        -- Moving a table does not move its indexes; move them explicitly
        FOR index_name IN
            SELECT schemaname||'.'||indexname
            FROM pg_indexes
            WHERE schemaname||'.'||tablename = partition_name
        LOOP
            EXECUTE format('ALTER INDEX %s SET TABLESPACE archive_data', index_name);
        END LOOP;
        RAISE NOTICE 'Moved partition % to archive storage', partition_name;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
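-- Before relying on the schedules below, the archive function can be run by
-- hand; the RAISE NOTICE output lists each partition as it is moved
SELECT move_old_partitions_to_archive();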
-- Schedule monthly archive operations
SELECT cron.schedule(
'move-old-partitions',
'0 2 1 * *', -- 2 AM on the 1st of each month
'SELECT move_old_partitions_to_archive()'
);
-- Schedule partition maintenance
SELECT cron.schedule(
'partition-maintenance',
'0 1 * * 0', -- 1 AM every Sunday
'CALL partman.run_maintenance_proc()' -- run_maintenance_proc is a procedure, so CALL rather than SELECT
);
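If a schedule ever needs to be removed or replaced, pg_cron jobs can be unscheduled by looking up their id:
-- Remove a scheduled job by name
SELECT cron.unschedule(jobid)
FROM cron.job
WHERE jobname = 'move-old-partitions';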
Phase 3: Extend to Other Large Tables
3.1 Partition post_reply_vote Table
-- Similar process for post_reply_vote (271 MB)
-- Follow same steps as post_vote table
3.2 Partition post_reply Table
-- Similar process for post_reply (201 MB)
-- Consider partitioning by created_at or parent post date; a starting
-- sketch for registering both tables with pg_partman follows below
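As a starting point, the create_parent registration carries over directly; a sketch, assuming both tables have a created_at column like post_vote (verify against the live PieFed schema, and apply the same rename/recreate steps first):
-- Register both tables with pg_partman for monthly range partitions
SELECT partman.create_parent(
    p_parent_table => 'public.post_reply_vote',
    p_control => 'created_at',
    p_type => 'range',
    p_interval => 'monthly',
    p_premake => 3
);
SELECT partman.create_parent(
    p_parent_table => 'public.post_reply',
    p_control => 'created_at',
    p_type => 'range',
    p_interval => 'monthly',
    p_premake => 3
);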
📊 Monitoring and Maintenance
Performance Monitoring Queries
Partition Size Monitoring
-- Monitor partition sizes and locations
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    COALESCE(tablespace, 'pg_default') AS tablespace  -- pg_tables exposes this directly
FROM pg_tables
WHERE tablename LIKE 'post_vote_p%'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
Query Performance Analysis
-- Analyze query performance across partitions
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM post_vote
WHERE created_at >= '2025-01-01'
AND created_at < '2026-01-01';  -- half-open range covering all of 2025
Partition Pruning Verification
-- Verify partition pruning is working
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM post_vote
WHERE created_at >= '2025-08-01'
AND created_at < '2025-09-01';
-- With pruning working, the plan should scan only post_vote_p2025_08
Storage Usage Monitoring
# Monitor tablespace usage
kubectl exec -n postgresql-system postgres-shared-2 -- psql -U postgres -c "
SELECT
spcname as tablespace_name,
pg_tablespace_location(oid) as location,
pg_size_pretty(pg_tablespace_size(oid)) as size
FROM pg_tablespace
WHERE spcname NOT IN ('pg_default', 'pg_global');
"
# Monitor PVC usage
kubectl get pvc -n postgresql-system
kubectl describe pvc -n postgresql-system
Automated Maintenance Jobs
-- View scheduled maintenance jobs
SELECT
jobname,
schedule,
command,
active,
jobid
FROM cron.job
ORDER BY jobname;
-- Check recent job runs (pg_cron records history in cron.job_run_details)
SELECT jobid, status, return_message, start_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 20;
🚨 Troubleshooting Guide
Common Issues and Solutions
Issue: Partition Creation Fails
-- Check partition configuration
SELECT * FROM partman.part_config WHERE parent_table = 'public.post_vote';
-- Run pg_partman maintenance to create any missing partitions
-- (calling create_parent again would error, since the config row already exists)
SELECT partman.run_maintenance('public.post_vote');
-- Or create a specific partition by hand
CREATE TABLE IF NOT EXISTS post_vote_p2025_10 PARTITION OF post_vote
FOR VALUES FROM ('2025-10-01') TO ('2025-11-01');
Issue: Query Not Using Partition Pruning
-- Declarative partitions rely on partition pruning, not constraint exclusion
SHOW enable_partition_pruning;
-- Enable if needed (it is on by default)
SET enable_partition_pruning = on;
-- Update statistics
ANALYZE post_vote;
Issue: Tablespace Out of Space
# Check tablespace usage
df -h /mnt/postgres-archive
# Add additional block storage if needed
# Follow NetCup documentation for volume expansion
📖 Documentation References
CloudNativePG Documentation
- Tablespaces - Official tablespace configuration guide
- FAQ - Database management best practices
- Controller - Storage management concepts
PostgreSQL Documentation
- Declarative Partitioning - Official partitioning guide
- Tablespaces - Tablespace management
- pg_partman Extension - Automated partition management
NetCup Documentation
- Block Storage - Block storage attachment guide
- VPS Management - VPS configuration documentation
🎯 Success Metrics
Performance Targets
- Recent data queries: <250ms (50% improvement from current 506ms)
- Historical data queries: <800ms (acceptable for archive storage)
- Storage cost reduction: 70% for historical data
- Backup time improvement: 60% reduction for recent data backups
Capacity Planning
- Primary storage: Maintain 50GB for 2-3 months of recent data
- Archive storage: Scale to 500GB initially, expand as needed
- Growth accommodation: Support 20GB/year growth for 25+ years
Operational Goals
- Zero downtime: All operations performed online
- Application transparency: No code changes required
- Automated management: Minimal manual intervention
- Disaster recovery: Independent backup strategies per tier
📅 Implementation Timeline
| Phase | Duration | Key Deliverables |
|---|---|---|
| Infrastructure Prep | 2 weeks | NetCup block storage attached, storage classes configured |
| Partitioning Setup | 1 week | Extensions installed, tablespaces created |
| post_vote Migration | 1 week | Partitioned table structure, data migration |
| Automation Setup | 1 week | Automated partition management, monitoring |
| Other Tables | 2 weeks | post_reply_vote and post_reply partitioning |
| Testing & Optimization | 1 week | Performance testing, fine-tuning |
Total Implementation Time: 8 weeks
✅ Pre-Implementation Checklist
- NetCup block storage volumes attached to all nodes
- Storage classes created and tested
- CloudNativePG cluster configuration updated
- Backup of current database completed
- pg_partman and pg_cron extensions available
- Monitoring queries prepared
- Rollback plan documented
- Team training on partition management completed
Last Updated: August 2025
Next Review: September 2025
Owner: Database Administration Team