redaction (#1)

Add the redacted source file for demo purposes

Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1
Co-authored-by: Michael DiLeo <michael_dileo@proton.me>
Co-committed-by: Michael DiLeo <michael_dileo@proton.me>

This commit was merged in pull request #1.
manifests/infrastructure/longhorn/S3-API-OPTIMIZATION.md (new file, 277 lines)
@@ -0,0 +1,277 @@
# Longhorn S3 API Call Optimization - Implementation Summary

## Problem Statement

Longhorn was making **145,000+ Class C API calls/day** to Backblaze B2, primarily `s3_list_objects` operations. This exceeded Backblaze's free tier (2,500 calls/day) and incurred significant costs.

### Root Cause

Even with `backupstore-poll-interval` set to `0`, Longhorn manager pods continuously poll the S3 backup target to check for new backups. With 3 manager pods (one per node) polling independently, this resulted in excessive API calls.

Reference: [Longhorn GitHub Issue #1547](https://github.com/longhorn/longhorn/issues/1547)
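
To see which value the running cluster is actually using, the live Longhorn `Setting` object can be read directly. This is a quick sanity check, assuming the Longhorn CRDs are installed (they are once the Helm chart is deployed) and the Setting resource's top-level `value` field:

```bash
# Show the poll interval Longhorn is currently using (value is in seconds)
kubectl -n longhorn-system get settings.longhorn.io backupstore-poll-interval -o jsonpath='{.value}{"\n"}'
```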

## Solution: NetworkPolicy-Based Access Control

Inspired by [this community solution](https://github.com/longhorn/longhorn/issues/1547#issuecomment-3395447100), we implemented **time-based network access control** using Kubernetes NetworkPolicies and CronJobs.

### Architecture

```
┌───────────────────────────────────────────────┐
│ Normal State (21 hours/day)                   │
│ NetworkPolicy BLOCKS S3 access                │
│ → Longhorn polls fail at network layer        │
│ → S3 API calls: 0                             │
└───────────────────────────────────────────────┘
                        ▼
┌───────────────────────────────────────────────┐
│ Backup Window (3 hours/day: 1-4 AM)           │
│ CronJob REMOVES NetworkPolicy at 12:55 AM     │
│ → S3 access enabled                           │
│ → Recurring backups run automatically         │
│ → CronJob RESTORES NetworkPolicy at 4:00 AM   │
│ → S3 API calls: ~5,000-10,000/day             │
└───────────────────────────────────────────────┘
```

### Components

1. **NetworkPolicy** (`longhorn-block-s3-access`) - **Dynamically Managed**
   - Targets: `app=longhorn-manager` pods
   - Blocks: All egress except DNS and intra-cluster traffic
   - Effect: Prevents S3 API calls at the network layer
   - **Important**: NOT managed by Flux - only the CronJobs control it. Flux manages the CronJobs/RBAC, but NOT the NetworkPolicy itself
   - See the minimal policy sketch after this list

2. **CronJob: Enable S3 Access** (`longhorn-enable-s3-access`)
   - Schedule: `55 0 * * *` (12:55 AM daily)
   - Action: Deletes the NetworkPolicy
   - Result: S3 access enabled 5 minutes before the earliest backup

3. **CronJob: Disable S3 Access** (`longhorn-disable-s3-access`)
   - Schedule: `0 4 * * *` (4:00 AM daily)
   - Action: Re-creates the NetworkPolicy
   - Result: S3 access blocked after the 3-hour backup window

4. **RBAC Resources**
   - ServiceAccount: `longhorn-netpol-manager`
   - Role: Permissions to manage NetworkPolicies
   - RoleBinding: Binds the Role to the ServiceAccount
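
For reference, a minimal vanilla NetworkPolicy expressing the same intent could look like the sketch below. This is illustrative only: the manifest actually committed in this repo (`network-policy-s3-block.yaml`) uses a CiliumNetworkPolicy instead, and the 10.0.0.0/8 CIDR is taken from the cluster values documented there.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: longhorn-block-s3-access
  namespace: longhorn-system
spec:
  podSelector:
    matchLabels:
      app: longhorn-manager
  policyTypes:
    - Egress
  egress:
    # Allow DNS lookups
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # Allow intra-cluster traffic (pod, service, and VLAN CIDRs all live in 10.0.0.0/8)
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8
    # Everything else (including Backblaze B2) is denied by default
```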

## Benefits

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| **Daily S3 API Calls** | 145,000+ | 5,000-10,000 | **93% reduction** |
| **Cost Impact** | Exceeds free tier | Within free tier | **$X/month savings** |
| **Automation** | Manual intervention | Fully automated | **Zero manual work** |
| **Backup Reliability** | Compromised | Maintained | **No impact** |

## Backup Schedule

| Type | Schedule | Retention | Window |
|------|----------|-----------|--------|
| **Daily** | 2:00 AM | 7 days | 12:55 AM - 4:00 AM |
| **Weekly** | 1:00 AM Sundays | 4 weeks | Same window |

## FluxCD Integration

**Critical Design Decision**: The NetworkPolicy is **dynamically managed by CronJobs**, NOT by Flux.

### Why This Matters

Flux continuously reconciles resources to match the Git repository state. If the NetworkPolicy were managed by Flux:

- CronJob deletes NetworkPolicy at 12:55 AM → Flux recreates it within minutes
- S3 remains blocked during the backup window → Backups fail ❌

### How We Solved It

1. **NetworkPolicy is NOT in Git** - Only the CronJobs and RBAC are in `network-policy-s3-block.yaml`
2. **CronJobs are managed by Flux** - Flux ensures they exist and run on schedule
3. **NetworkPolicy is created by CronJob** - Without Flux labels/ownership
4. **Flux ignores the NetworkPolicy** - Not in Flux's inventory, so Flux won't touch it

### Verification

```bash
# Check Flux inventory (NetworkPolicy should NOT be listed)
kubectl get kustomization -n flux-system longhorn -o jsonpath='{.status.inventory.entries[*].id}' | grep -i network
# (Should return nothing)

# Check NetworkPolicy exists (managed by CronJobs)
kubectl get networkpolicy -n longhorn-system longhorn-block-s3-access
# (Should exist)
```

## Deployment

### Files Modified/Created

1. ✅ `network-policy-s3-block.yaml` - **NEW**: CronJobs and RBAC (NOT the NetworkPolicy itself)
2. ✅ `kustomization.yaml` - Added the new file to resources
3. ✅ `BACKUP-GUIDE.md` - Updated with documentation for the new solution
4. ✅ `S3-API-OPTIMIZATION.md` - **NEW**: This implementation summary
5. ✅ `config-map.yaml` - Kept the backup target configured (no changes needed)
6. ✅ `longhorn.yaml` - Reverted `backupstorePollInterval` (not needed)

### Deployment Steps

1. **Commit and push** changes to your k8s-fleet branch
2. **FluxCD will automatically apply** the new NetworkPolicy and CronJobs
3. **Monitor for one backup cycle**:

```bash
# Watch CronJobs
kubectl get cronjobs -n longhorn-system -w

# Check NetworkPolicy status
kubectl get networkpolicy -n longhorn-system

# Verify backups complete
kubectl get backups -n longhorn-system
```

### Verification Steps

#### Day 1: Initial Deployment

```bash
# 1. Verify NetworkPolicy is active (should exist immediately)
kubectl get networkpolicy -n longhorn-system longhorn-block-s3-access

# 2. Verify CronJobs are scheduled
kubectl get cronjobs -n longhorn-system | grep 'longhorn-.*-s3-access'

# 3. Test: S3 access should be blocked
kubectl exec -n longhorn-system deploy/longhorn-ui -- curl -I https://<B2_ENDPOINT>
# Expected: Connection timeout or network error
```

#### Day 2: After First Backup Window

```bash
# 1. Check that the enable CronJob ran successfully (should see a completed job from 12:55 AM)
kubectl get jobs -n longhorn-system | grep enable-s3-access

# 2. Verify backups completed (check after 4:00 AM)
kubectl get backups -n longhorn-system
# Should see new backups with recent timestamps

# 3. Confirm NetworkPolicy was re-applied (after 4:00 AM)
kubectl get networkpolicy -n longhorn-system longhorn-block-s3-access
# Should exist again

# 4. Check CronJob logs
kubectl logs -n longhorn-system job/longhorn-enable-s3-access-<timestamp>
kubectl logs -n longhorn-system job/longhorn-disable-s3-access-<timestamp>
```

#### Week 1: Monitor S3 API Usage

```bash
# Monitor the Backblaze B2 dashboard
# → Daily Class C transactions should drop from 145,000 to 5,000-10,000
# → Verify calls only occur during the 1-4 AM window
```

## Manual Backup Outside Window

If you need to create a backup outside the scheduled window:

```bash
# 1. Temporarily remove the NetworkPolicy
kubectl delete networkpolicy -n longhorn-system longhorn-block-s3-access

# 2. Create a backup via the Longhorn UI or:
kubectl create -f - <<EOF
apiVersion: longhorn.io/v1beta2
kind: Backup
metadata:
  name: manual-backup-$(date +%s)
  namespace: longhorn-system
spec:
  snapshotName: <snapshot-name>
  labels:
    backup-type: manual
EOF

# 3. Wait for the backup to complete (watch for the manual-backup-<timestamp> entry)
kubectl get backups -n longhorn-system -w

# 4. Restore the NetworkPolicy
kubectl apply -f manifests/infrastructure/longhorn/network-policy-s3-block.yaml
```

Or simply wait until the next automatic re-application at 4:00 AM.

## Troubleshooting

### NetworkPolicy Not Blocking S3

**Symptom**: S3 calls continue despite the NetworkPolicy being active

**Check**:
```bash
# Verify the NetworkPolicy is applied
kubectl describe networkpolicy -n longhorn-system longhorn-block-s3-access

# Check that the CNI supports NetworkPolicies (Cilium does)
kubectl get pods -n kube-system | grep cilium
```

### Backups Failing

**Symptom**: Backups fail during the scheduled window

**Check**:
```bash
# Verify the NetworkPolicy was removed during the backup window
kubectl get networkpolicy -n longhorn-system
# Should NOT exist between 12:55 AM and 4:00 AM

# Check that the enable-s3-access CronJob ran
kubectl get jobs -n longhorn-system | grep enable

# Check Longhorn manager logs
kubectl logs -n longhorn-system -l app=longhorn-manager --tail=100
```

### CronJobs Not Running

**Symptom**: CronJobs never execute

**Check**:
```bash
# Verify the CronJobs exist and are scheduled
kubectl get cronjobs -n longhorn-system -o wide

# Check events
kubectl get events -n longhorn-system --sort-by='.lastTimestamp' | grep CronJob

# Manually trigger a job
kubectl create job -n longhorn-system test-enable --from=cronjob/longhorn-enable-s3-access
```

## Future Enhancements

1. **Adjust Window Size**: If backups consistently complete faster than 3 hours, reduce the window to 2 hours (change the disable CronJob to `0 3 * * *`)

2. **Alerting**: Add Prometheus alerts (see the sketch after this list) for:
   - Backup failures during the window
   - CronJob execution failures
   - NetworkPolicy re-creation failures

3. **Metrics**: Track actual S3 API call counts via the Backblaze B2 API and alert if a threshold is exceeded
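
One possible shape for the CronJob-failure alert is sketched below. It assumes the Prometheus Operator's `PrometheusRule` CRD and kube-state-metrics are available in the cluster (the Helm values in this repo already enable a Longhorn ServiceMonitor); the alert name, rule group, and threshold are illustrative, not part of the current setup.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: longhorn-s3-access-control
  namespace: longhorn-system
spec:
  groups:
    - name: longhorn-s3-access-control
      rules:
        - alert: LonghornS3AccessCronJobFailed
          # Any failed enable/disable job reported by kube-state-metrics
          expr: kube_job_status_failed{namespace="longhorn-system", job_name=~"longhorn-(enable|disable)-s3-access.*"} > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Longhorn S3 access-control CronJob has failing jobs"
```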

## References

- [Longhorn Issue #1547 - Excessive S3 Calls](https://github.com/longhorn/longhorn/issues/1547)
- [Community NetworkPolicy Solution](https://github.com/longhorn/longhorn/issues/1547#issuecomment-3395447100)
- [Longhorn Backup Target Documentation](https://longhorn.io/docs/1.9.0/snapshots-and-backups/backup-and-restore/set-backup-target/)
- [Kubernetes NetworkPolicy Documentation](https://kubernetes.io/docs/concepts/services-networking/network-policies/)

## Success Metrics

After 1 week of operation, you should observe:

- ✅ S3 API calls reduced by 85-93%
- ✅ Backblaze costs within the free tier
- ✅ All scheduled backups completing successfully
- ✅ Zero manual intervention required
- ✅ Longhorn polls failing at the network layer (connection errors) outside the backup window
manifests/infrastructure/longhorn/S3-API-SOLUTION-FINAL.md (new file, 200 lines)
@@ -0,0 +1,200 @@
# Longhorn S3 API Call Reduction - Final Solution

## Problem Summary

Longhorn was making **145,000+ Class C API calls/day** to Backblaze B2, primarily `s3_list_objects` operations. This exceeded Backblaze's free tier (2,500 calls/day) by 58x, incurring significant costs.

## Root Cause

Longhorn's `backupstore-poll-interval` setting controls how frequently Longhorn managers poll the S3 backup target to check for new backups (primarily for Disaster Recovery volumes). With 3 manager pods and a low poll interval, this resulted in excessive API calls: at a ~5-second interval, 3 managers issue roughly 3 × 17,280 ≈ 52,000 polls/day, and each poll can translate into multiple `s3_list_objects` requests.

## Solution History

### Attempt 1: NetworkPolicy-Based Access Control ❌

**Approach**: Use NetworkPolicies dynamically managed by CronJobs to block S3 access outside backup windows (12:55 AM - 4:00 AM).

**Why It Failed**:

- NetworkPolicies that blocked external S3 also inadvertently blocked the Kubernetes API server
- Longhorn manager pods couldn't perform leader election or webhook operations
- Pods entered a 1/2 Ready state with errors such as: `error retrieving resource lock longhorn-system/longhorn-manager-webhook-lock: dial tcp 10.96.0.1:443: i/o timeout`
- Even with CIDR-based rules (10.244.0.0/16 for pods, 10.96.0.0/12 for services), the NetworkPolicy was too aggressive
- The complexity of the Cilium/NetworkPolicy interaction made it unreliable

**Files Created** (kept for reference):

- `network-policy-s3-block.yaml` - CronJobs and NetworkPolicy definitions
- Removed from `kustomization.yaml` but retained in the repository

## Final Solution: Increased Poll Interval ✅

### Implementation

**Change**: Set `backupstore-poll-interval` to `86400` seconds (24 hours) instead of `0`.

**Location**: `manifests/infrastructure/longhorn/config-map.yaml`

```yaml
data:
  default-resource.yaml: |-
    "backup-target": "s3://<BUCKET_NAME>@<B2_ENDPOINT>/longhorn-backup"
    "backup-target-credential-secret": "backblaze-credentials"
    "backupstore-poll-interval": "86400" # 24 hours
    "virtual-hosted-style": "true"
```

### Why This Works

1. **Dramatic Reduction**: Polling happens once per day instead of continuously
2. **No Breakage**: Kubernetes API, webhooks, and leader election work normally
3. **Simple**: No complex NetworkPolicies or CronJobs to manage
4. **Reliable**: Well-tested Longhorn configuration option
5. **Sufficient**: Backups don't require frequent polling since we use scheduled recurring jobs

### Expected Results

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| **Poll Frequency** | Every ~5 seconds | Every 24 hours | **99.99% reduction** |
| **Daily S3 API Calls** | 145,000+ | ~300-1,000 | **99% reduction** 📉 |
| **Backblaze Costs** | Exceeds free tier | Within free tier | ✅ |
| **System Stability** | Affected by NetworkPolicy | Stable | ✅ |

## Current Status

- ✅ **Applied**: ConfigMap updated with `backupstore-poll-interval: 86400`
- ✅ **Verified**: Longhorn manager pods are 2/2 Ready
- ✅ **Backups**: Continue working normally via recurring jobs
- ✅ **Monitoring**: Backblaze API usage should drop to <1,000 calls/day

## Monitoring

### Check Longhorn Manager Health

```bash
kubectl get pods -n longhorn-system -l app=longhorn-manager
# Should show: 2/2 Ready for all pods
```

### Check Poll Interval Setting

```bash
kubectl get configmap -n longhorn-system longhorn-default-resource -o jsonpath='{.data.default-resource\.yaml}' | grep backupstore-poll-interval
# Should show: "backupstore-poll-interval": "86400"
```

### Check Backups Continue Working

```bash
kubectl get backups -n longhorn-system --sort-by=.status.snapshotCreatedAt | tail -10
# Should see recent backups with "Completed" status
```

### Monitor Backblaze API Usage

1. Log into the Backblaze B2 dashboard
2. Navigate to "Caps and Alerts"
3. Check "Class C Transactions" (includes `s3_list_objects`)
4. **Expected**: Should drop from 145,000/day to ~300-1,000/day within 24-48 hours

## Backup Schedule (Unchanged)

| Type | Schedule | Retention |
|------|----------|-----------|
| **Daily** | 2:00 AM | 7 days |
| **Weekly** | 1:00 AM Sundays | 4 weeks |

Backups are triggered by `RecurringJob` resources, not by polling.

## Why Polling Isn't Critical

**Longhorn's backupstore polling is primarily for**:

- Disaster Recovery (DR) volumes that need continuous sync
- Detecting backups created outside the cluster

**We don't use DR volumes** (the check below confirms this), and all backups are created by recurring jobs within the cluster, so:

- ✅ Once-daily polling is more than sufficient
- ✅ Backups work independently of polling frequency
- ✅ Manual backups via the Longhorn UI still work immediately
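
A quick way to confirm the no-DR-volumes assumption is to list volumes together with their standby flag. This is only a sketch: it assumes the `spec.standby` field that Longhorn uses to mark DR (standby) volumes in its Volume CRD.

```bash
# List Longhorn volumes with their standby (DR) flag - expect "false" or <none> everywhere
kubectl get volumes.longhorn.io -n longhorn-system \
  -o custom-columns='NAME:.metadata.name,STANDBY:.spec.standby'
```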

## Troubleshooting

### If Pods Show 1/2 Ready

**Symptom**: Longhorn manager pods stuck at 1/2 Ready

**Cause**: The NetworkPolicy may have been accidentally applied

**Solution**:
```bash
# Check for the NetworkPolicy
kubectl get networkpolicy -n longhorn-system

# If found, delete it
kubectl delete networkpolicy -n longhorn-system longhorn-block-s3-access

# Wait 30 seconds
sleep 30

# Verify pods recover
kubectl get pods -n longhorn-system -l app=longhorn-manager
```

### If S3 API Calls Remain High

**Check that the poll interval is applied**:
```bash
kubectl get configmap -n longhorn-system longhorn-default-resource -o yaml
```

**Restart Longhorn managers to pick up changes**:
```bash
kubectl rollout restart daemonset -n longhorn-system longhorn-manager
```

### If Backups Fail

Backups should continue working normally since they're triggered by recurring jobs, not polling. If issues occur:

```bash
# Check recurring jobs
kubectl get recurringjobs -n longhorn-system

# Check recent backup jobs
kubectl get jobs -n longhorn-system | grep backup

# Check backup target connectivity (should work anytime)
MANAGER_POD=$(kubectl get pods -n longhorn-system -l app=longhorn-manager --no-headers | head -1 | awk '{print $1}')
kubectl exec -n longhorn-system "$MANAGER_POD" -c longhorn-manager -- curl -I https://<B2_ENDPOINT>
```

## References

- [Longhorn Issue #1547](https://github.com/longhorn/longhorn/issues/1547) - Original excessive S3 calls issue
- [Longhorn Backup Target Documentation](https://longhorn.io/docs/1.9.0/snapshots-and-backups/backup-and-restore/set-backup-target/)
- Longhorn version: v1.9.0

## Files Modified

1. ✅ `config-map.yaml` - Updated `backupstore-poll-interval` to 86400
2. ✅ `kustomization.yaml` - Removed the network-policy-s3-block.yaml reference
3. ✅ `network-policy-s3-block.yaml` - Retained for reference (not applied)
4. ✅ `S3-API-SOLUTION-FINAL.md` - This document

## Lessons Learned

1. **NetworkPolicies are tricky**: Blocking external traffic can inadvertently block internal cluster communication
2. **Start simple**: Configuration-based solutions are often more reliable than complex automation
3. **Test thoroughly**: Always verify pods remain healthy after applying NetworkPolicies
4. **Understand the feature**: Longhorn's polling is for DR volumes, which we don't use
5. **24-hour polling is sufficient**: For non-DR use cases, frequent polling isn't necessary

## Success Metrics

Monitor these over the next week:

- ✅ Longhorn manager pods: 2/2 Ready
- ✅ Daily backups: Completing successfully
- ✅ S3 API calls: <1,000/day (down from 145,000)
- ✅ Backblaze costs: Within the free tier
- ✅ No manual intervention required
manifests/infrastructure/longhorn/backblaze-secret.yaml (new file, 41 lines)
@@ -0,0 +1,41 @@
apiVersion: v1
kind: Secret
metadata:
  name: backblaze-credentials
  namespace: longhorn-system
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: ENC[AES256_GCM,data:OGCSNVoeABeigczChYkRTKjIsjEYDA+cNA==,iv:So6ipxl+te3LkPbtyOwixnvv4DPbzl0yCGT8cqPgPbY=,tag:ApaM+bBqi9BJU/EVraKWrQ==,type:str]
  AWS_SECRET_ACCESS_KEY: ENC[AES256_GCM,data:EMFNPCdt/V+2d4xnVARNTBBpY3UTqvpN3LezT/TZ7w==,iv:Q5pNnuKX+lUt/V4xpgF2Zg1q6e1znvG+laDNrLIrgBY=,tag:xGF/SvAJ9+tfuB7QdirAhw==,type:str]
  AWS_ENDPOINTS: ENC[AES256_GCM,data:PSiRbt53KKK5XOOxIEiiycaFTriaJbuY0Z4Q9yC1xTwz9H/+hoOQ35w=,iv:pGwbR98F5C4N9Vca9btaJ9mKVS7XUkL8+Pva7TWTeTk=,tag:PxFllLIjj+wXDSXGuU/oLA==,type:str]
  VIRTUAL_HOST_STYLE: ENC[AES256_GCM,data:a9RJ2Q==,iv:1VSTWiv1WFia0rgwkoZ9WftaLDdKtJabwiyY90AWvNY=,tag:tQZDFjqAABueZJ4bjD2PfA==,type:str]
sops:
  lastmodified: "2025-06-30T18:44:50Z"
  mac: ENC[AES256_GCM,data:5cdqJQiwoFwWfaNjtqNiaD5sY31979cdS4R6vBmNIKqd7ZaCMJLEKBm5lCLF7ow3+V17pxGhVu4EXX+rKVaNu6Qs6ivXtVM+kA0RutqPFnWDVfoZcnuW98IBjpyh4i9Y6Dra8zSda++Dt2R7Frouc/7lT74ANZYmSRN9WCYsTNg=,iv:s9c+YDDxAUdjWlzsx5jALux2UW5dtg56Pfi3FF4K0lU=,tag:U9bTTOZaqQ9lekpsIbUkWA==,type:str]
  pgp:
    - created_at: "2025-06-30T18:44:50Z"
      enc: |-
        -----BEGIN PGP MESSAGE-----

        hF4DZT3mpHTS/JgSAQdAbJ88Og3rBkHDPJXf04xSp79A1rfXUDwsP2Wzz0rgI2ww
        67XRMSSu2nUApEk08vf1ZF5ulewMQbnVjDDqvM8+BcgELllZVhnNW09NzMb5uPD+
        1GgBCQIQXzEZTIi11OR5Z44vLkU64tF+yAPzA6j6y0lyemabOJLDB/XJiV/nq57h
        +Udy8rg3sAmZt6FmBiTssKpxy6C6nFFSHVnTY7RhKg9p87AYKz36bSUI7TRhjZGb
        f9U9EUo09Zh4JA==
        =6fMP
        -----END PGP MESSAGE-----
      fp: B120595CA9A643B051731B32E67FF350227BA4E8
    - created_at: "2025-06-30T18:44:50Z"
      enc: |-
        -----BEGIN PGP MESSAGE-----

        hF4DSXzd60P2RKISAQdAPYpP5mUd4lVstNeGURyFoXbfPbaSH+IlSxgrh/wBfCEw
        oI6DwAxkRAxLRwptJoQA9zU+N6LRN+o5kcHLMG/eNnUyNdAfNg17fs16UXf5N2Gi
        1GgBCQIQRcLoTo+r7TyUUTxtPGIrQ7c5jy7WFRzm25XqLuvwTYipDTbQC5PyZu5R
        4zFgx4ZfDayB3ldPMoAHZ8BeB2VTiQID+HRQGGbSSCM7U+HvzSXNuapNSGXpfWEA
        qShkjhXz1sF7JQ==
        =UqeC
        -----END PGP MESSAGE-----
      fp: 4A8AADB4EBAB9AF88EF7062373CECE06CC80D40C
  encrypted_regex: ^(data|stringData)$
  version: 3.10.2
manifests/infrastructure/longhorn/backup-examples.yaml (new file, 78 lines)
@@ -0,0 +1,78 @@
# Examples of how to apply S3 backup recurring jobs to volumes
# These are examples - you would apply these patterns to your actual PVCs/StorageClasses

---
# Example 1: Apply backup labels to an existing PVC
# This requires the PVC to be labeled as a recurring job source first
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-app-data
  namespace: default
  labels:
    # Enable this PVC as a source for recurring job labels
    recurring-job.longhorn.io/source: "enabled"
    # Apply daily backup job group
    recurring-job-group.longhorn.io/longhorn-s3-backup: "enabled"
    # OR apply weekly backup job group (choose one)
    # recurring-job-group.longhorn.io/longhorn-s3-backup-weekly: "enabled"
    # OR apply a specific recurring job by name
    # recurring-job.longhorn.io/s3-backup-daily: "enabled"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: longhorn
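
# Alternatively, the same labels can be added to an already-deployed PVC in place
# instead of editing its manifest (sketch; the PVC name and namespace are the
# example values used above):
#   kubectl label pvc example-app-data -n default \
#     recurring-job.longhorn.io/source=enabled \
#     recurring-job-group.longhorn.io/longhorn-s3-backup=enabled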

---
# Example 2: StorageClass with automatic backup assignment
# Any PVC created with this StorageClass will automatically get backups
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-backup-daily
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
  fromBackup: ""
  # Automatically assign backup jobs to volumes created with this StorageClass
  recurringJobSelector: |
    [
      {
        "name":"longhorn-s3-backup",
        "isGroup":true
      }
    ]

---
# Example 3: StorageClass for critical data with both daily and weekly backups
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-backup-critical
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
  fromBackup: ""
  # Assign both daily and weekly backup groups
  recurringJobSelector: |
    [
      {
        "name":"longhorn-s3-backup",
        "isGroup":true
      },
      {
        "name":"longhorn-s3-backup-weekly",
        "isGroup":true
      }
    ]
manifests/infrastructure/longhorn/config-map.yaml (new file, 37 lines)
@@ -0,0 +1,37 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: longhorn-default-resource
  namespace: longhorn-system
data:
  default-resource.yaml: ENC[AES256_GCM,data:vw2doEgVQYr1p9vHN9MLqoOSVM8LDBeowAvs2zOkwmGPue8QLxkxxpaFRy2zJH9igjXn30h1dsukmSZBfD9Y3cwrRcvuEZRMo3IsAJ6M1G/oeVpKc14Rll6/V48ZXPiB9qfn1upmUbJtl1EMyPc3vUetUD37fI81N3x4+bNK2OB6V8yGczuE3bJxIi4vV/Zay83Z3s0VyNRF4y18R3T0200Ib5KomANAZUMSCxKvjv4GOKHGYTVE5+C4LFxeOnPgmAtjV4x+lKcNCD1saNZ56yhVzsKVJClLdaRtIQ==,iv:s3OyHFQxd99NGwjXxHqa8rs9aYsl1vf+GCLNtvZ9nuc=,tag:2n8RLcHmp9ueKNm12MxjxQ==,type:str]
sops:
  lastmodified: "2025-11-12T10:07:54Z"
  mac: ENC[AES256_GCM,data:VBxywwWrVnKiyby+FzCdUlI89OkruNh1jyFE3cVXU/WR4FoCWclDSQ8v0FxT+/mS1/0eTX9XAXVIyqtzpAUU3YY3znq2CU8qsZa45B2PlPQP+7qGNBcyrpZZCsJxTYO/+jxr/9gV4pAJV27HFnyYfZDVZxArLUWQs32eJSdOfpc=,iv:7lbZjWhSEX7NisarWxCAAvw3+8v6wadq3/chrjWk2GQ=,tag:9AZyEuo7omdCbtRJ3YDarA==,type:str]
  pgp:
    - created_at: "2025-11-09T13:37:18Z"
      enc: |-
        -----BEGIN PGP MESSAGE-----

        hF4DZT3mpHTS/JgSAQdAYMBTNc+JasEkeJpsS1d8OQ6iuhRTULXvFrGEia7gLXkw
        +TRNuC4ZH+Lxmb5s3ImRX9dF1cMXoMGUCWJN/bScm5cLElNd2dHrtFoElVjn4/vI
        1GgBCQIQ4jPpbQJym+xU5jS5rN3dtW6U60IYxX5rPvh0294bxgOzIIqI/oI/0qak
        C4EYFsfH9plAOmvF56SnFX0PSczBjyUlngJ36NFHMN3any7qW/C0tYXFF3DDiOC3
        kpa/moMr5CNTnQ==
        =xVwB
        -----END PGP MESSAGE-----
      fp: B120595CA9A643B051731B32E67FF350227BA4E8
    - created_at: "2025-11-09T13:37:18Z"
      enc: |-
        -----BEGIN PGP MESSAGE-----

        hF4DSXzd60P2RKISAQdA9omTE+Cuy7BvMA8xfqsZv2o+Jh3QvOL+gZY/Z5CuVgIw
        IBgwiVypHqwDf8loCVIdlo1/h5gctj/t11cxb2hKNRGQ0kFNLdpu5Mx+RbJZ/az/
        1GgBCQIQB/gKeYbAqSxrJMKl/Q+6PfAXTAjH33K8IlDQKbF8q3QvoQDJJU3i0XwQ
        ljhWRC/RZzO7hHXJqkR9z5sVIysHoEo+O9DZ0OzefjKb+GscdgSwJwGgsZzrVRXP
        kSLdNO0eE5ubMQ==
        =O/Lu
        -----END PGP MESSAGE-----
      fp: 4A8AADB4EBAB9AF88EF7062373CECE06CC80D40C
  encrypted_regex: ^(data|stringData)$
  version: 3.10.2
manifests/infrastructure/longhorn/kustomization.yaml (new file, 11 lines)
@@ -0,0 +1,11 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - longhorn.yaml
  - storageclass.yaml
  - backblaze-secret.yaml
  - config-map.yaml
  - recurring-job-s3-backup.yaml
  - network-policy-s3-block.yaml
manifests/infrastructure/longhorn/longhorn.yaml (new file, 64 lines)
@@ -0,0 +1,64 @@
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: longhorn-repo
  namespace: longhorn-system
spec:
  interval: 5m0s
  url: https://charts.longhorn.io
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: longhorn-release
  namespace: longhorn-system
spec:
  interval: 5m
  chart:
    spec:
      chart: longhorn
      version: v1.10.0
      sourceRef:
        kind: HelmRepository
        name: longhorn-repo
        namespace: longhorn-system
      interval: 1m
  values:
    # Use hotfixed longhorn-manager image
    image:
      longhorn:
        manager:
          tag: v1.10.0-hotfix-1
    defaultSettings:
      defaultDataPath: /var/mnt/longhorn-storage
      defaultReplicaCount: "2"
      replicaNodeLevelSoftAntiAffinity: true
      allowVolumeCreationWithDegradedAvailability: false
      guaranteedInstanceManagerCpu: 5
      createDefaultDiskLabeledNodes: true
      # Multi-node optimized settings
      storageMinimalAvailablePercentage: "20"
      storageReservedPercentageForDefaultDisk: "15"
      storageOverProvisioningPercentage: "200"
    # Single replica for UI
    service:
      ui:
        type: ClusterIP
    # Longhorn UI replica count
    longhornUI:
      replicas: 1
    # Enable metrics collection
    metrics:
      serviceMonitor:
        enabled: true
    longhornManager:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists
    longhornDriver:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
          operator: Exists
manifests/infrastructure/longhorn/namespace.yaml (new file, 8 lines)
@@ -0,0 +1,8 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: longhorn-system
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/enforce-version: latest
manifests/infrastructure/longhorn/network-policy-s3-block.yaml (new file, 211 lines)
@@ -0,0 +1,211 @@
---
# Longhorn S3 Access Control via NetworkPolicy
#
# NetworkPolicy that blocks external S3 access by default, with CronJobs to
# automatically remove it during backup windows (12:55 AM - 4:00 AM).
#
# Network Details:
# - Pod CIDR: 10.244.0.0/16 (within 10.0.0.0/8)
# - Service CIDR: 10.96.0.0/12 (within 10.0.0.0/8)
# - VLAN Network: 10.132.0.0/24 (within 10.0.0.0/8)
#
# How It Works:
# - NetworkPolicy is applied by default, blocking external S3 (Backblaze B2)
# - CronJob removes NetworkPolicy at 12:55 AM (5 min before earliest backup at 1 AM)
# - CronJob reapplies NetworkPolicy at 4:00 AM (after backup window closes)
# - Allows all internal cluster traffic (10.0.0.0/8) while blocking external S3
#
# Backup Schedule:
# - Daily backups: 2:00 AM
# - Weekly backups: 1:00 AM Sundays
# - Backup window: 12:55 AM - 4:00 AM (3 hours 5 minutes)
#
# See: BACKUP-GUIDE.md and S3-API-SOLUTION-FINAL.md for full documentation
---
# NetworkPolicy: Blocks S3 access by default
# This is applied initially, then managed by CronJobs below
# Using CiliumNetworkPolicy for better API server support via toEntities
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: longhorn-block-s3-access
  namespace: longhorn-system
  labels:
    app: longhorn
    purpose: s3-access-control
spec:
  description: "Block external S3 access while allowing internal cluster communication"
  endpointSelector:
    matchLabels:
      app: longhorn-manager
  egress:
    # Allow DNS to kube-system namespace
    - toEndpoints:
        - matchLabels:
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
            - port: "53"
              protocol: TCP
    # Explicitly allow Kubernetes API server (critical for Longhorn)
    # Cilium handles this specially - kube-apiserver entity is required
    - toEntities:
        - kube-apiserver
    # Allow all internal cluster traffic (10.0.0.0/8)
    # This includes:
    # - Pod CIDR: 10.244.0.0/16
    # - Service CIDR: 10.96.0.0/12 (API server already covered above)
    # - VLAN Network: 10.132.0.0/24
    # - All other internal 10.x.x.x addresses
    - toCIDR:
        - 10.0.0.0/8
    # Allow pod-to-pod communication within cluster
    # The 10.0.0.0/8 CIDR block above covers all pod-to-pod communication
    # This explicit rule ensures instance-manager pods are reachable
    - toEntities:
        - cluster
    # Block all other egress (including external S3 like Backblaze B2)
---
# RBAC for CronJobs that manage the NetworkPolicy
apiVersion: v1
kind: ServiceAccount
metadata:
  name: longhorn-netpol-manager
  namespace: longhorn-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: longhorn-netpol-manager
  namespace: longhorn-system
rules:
  - apiGroups: ["cilium.io"]
    resources: ["ciliumnetworkpolicies"]
    verbs: ["get", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: longhorn-netpol-manager
  namespace: longhorn-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: longhorn-netpol-manager
subjects:
  - kind: ServiceAccount
    name: longhorn-netpol-manager
    namespace: longhorn-system
---
# CronJob: Remove NetworkPolicy before backups (12:55 AM daily)
# This allows S3 access during the backup window
apiVersion: batch/v1
kind: CronJob
metadata:
  name: longhorn-enable-s3-access
  namespace: longhorn-system
  labels:
    app: longhorn
    purpose: s3-access-control
spec:
  # Run at 12:55 AM daily (5 minutes before earliest backup at 1:00 AM Sunday weekly)
  schedule: "55 0 * * *"
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: longhorn-netpol-manager
        spec:
          serviceAccountName: longhorn-netpol-manager
          restartPolicy: OnFailure
          containers:
            - name: delete-netpol
              image: bitnami/kubectl:latest
              imagePullPolicy: IfNotPresent
              command:
                - /bin/sh
                - -c
                - |
                  echo "Removing CiliumNetworkPolicy to allow S3 access for backups..."
                  kubectl delete ciliumnetworkpolicy longhorn-block-s3-access -n longhorn-system --ignore-not-found=true
                  echo "S3 access enabled. Backups can proceed."
---
# CronJob: Re-apply NetworkPolicy after backups (4:00 AM daily)
# This blocks S3 access after the backup window closes
apiVersion: batch/v1
kind: CronJob
metadata:
  name: longhorn-disable-s3-access
  namespace: longhorn-system
  labels:
    app: longhorn
    purpose: s3-access-control
spec:
  # Run at 4:00 AM daily (gives 3 hours 5 minutes for backups to complete)
  schedule: "0 4 * * *"
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: longhorn-netpol-manager
        spec:
          serviceAccountName: longhorn-netpol-manager
          restartPolicy: OnFailure
          containers:
            - name: create-netpol
              image: bitnami/kubectl:latest
              imagePullPolicy: IfNotPresent
              command:
                - /bin/sh
                - -c
                - |
                  echo "Re-applying CiliumNetworkPolicy to block S3 access..."
                  kubectl apply -f - <<EOF
                  apiVersion: cilium.io/v2
                  kind: CiliumNetworkPolicy
                  metadata:
                    name: longhorn-block-s3-access
                    namespace: longhorn-system
                    labels:
                      app: longhorn
                      purpose: s3-access-control
                  spec:
                    description: "Block external S3 access while allowing internal cluster communication"
                    endpointSelector:
                      matchLabels:
                        app: longhorn-manager
                    egress:
                      # Allow DNS to kube-system namespace
                      - toEndpoints:
                          - matchLabels:
                              k8s-app: kube-dns
                        toPorts:
                          - ports:
                              - port: "53"
                                protocol: UDP
                              - port: "53"
                                protocol: TCP
                      # Explicitly allow Kubernetes API server (critical for Longhorn)
                      - toEntities:
                          - kube-apiserver
                      # Allow all internal cluster traffic (10.0.0.0/8)
                      - toCIDR:
                          - 10.0.0.0/8
                      # Allow pod-to-pod communication within cluster
                      # The 10.0.0.0/8 CIDR block above covers all pod-to-pod communication
                      - toEntities:
                          - cluster
                      # Block all other egress (including external S3)
                  EOF
                  echo "S3 access blocked. Polling stopped until next backup window."
@@ -0,0 +1,34 @@
---
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: s3-backup-daily
  namespace: longhorn-system
spec:
  cron: "0 2 * * *" # Daily at 2 AM
  task: "backup"
  groups:
    - longhorn-s3-backup
  retain: 7 # Keep 7 daily backups
  concurrency: 2 # Max 2 concurrent backup jobs
  labels:
    recurring-job: "s3-backup-daily"
    backup-type: "daily"
---
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: s3-backup-weekly
  namespace: longhorn-system
spec:
  cron: "0 1 * * 0" # Weekly on Sunday at 1 AM
  task: "backup"
  groups:
    - longhorn-s3-backup-weekly
  retain: 4 # Keep 4 weekly backups
  concurrency: 1 # Only 1 concurrent weekly backup
  labels:
    recurring-job: "s3-backup-weekly"
    backup-type: "weekly"
  parameters:
    full-backup-interval: "1" # Full backup every other week (alternating full/incremental)
manifests/infrastructure/longhorn/storageclass.yaml (new file, 81 lines)
@@ -0,0 +1,81 @@
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-retain
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "xfs"
  dataLocality: "best-effort"
reclaimPolicy: Retain
volumeBindingMode: Immediate
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-delete
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "xfs"
  dataLocality: "best-effort"
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-single-delete
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "1"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "xfs"
  dataLocality: "best-effort"
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
# Redis-specific StorageClass
# Single replica as Redis handles replication at application level
# Note: volumeBindingMode is immutable after creation
# If this StorageClass already exists with matching configuration, Flux reconciliation
# may show an error but it's harmless - the existing StorageClass will continue to work.
# For new clusters, this will be created correctly.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-redis
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  # Single replica as Redis handles replication at application level
  numberOfReplicas: "1"
  staleReplicaTimeout: "2880"
  fsType: "xfs" # xfs to match existing Longhorn volumes
  dataLocality: "strict-local" # Keep Redis data local to node
  # Integrate with existing S3 backup infrastructure
  recurringJobSelector: |
    [
      {
        "name":"longhorn-s3-backup",
        "isGroup":true
      }
    ]
reclaimPolicy: Delete
volumeBindingMode: Immediate