redaction (#1)
Add the redacted source file for demo purposes

Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1
Co-authored-by: Michael DiLeo <michael_dileo@proton.me>
Co-committed-by: Michael DiLeo <michael_dileo@proton.me>
This commit was merged in pull request #1.
169
docs/CILIUM-POLICY-AUDIT-TESTING.md
Normal file
@@ -0,0 +1,169 @@
# Cilium Host Firewall Policy Audit Mode Testing

## Overview

This guide explains how to test Cilium host firewall policies in audit mode before applying them in enforcement mode. In audit mode, traffic that a policy would deny is logged instead of dropped, so you can find missing rules without accidentally locking yourself out of the cluster.

## Prerequisites

- `kubectl` configured and working
- Access to the cluster (via Tailscale or direct connection)
- Cilium installed and running

## Quick Start

Run the automated test script:

```bash
./tools/test-cilium-policy-audit.sh
```

This script will:

1. Find the Cilium pod
2. Locate the host endpoint (identity 1)
3. Enable PolicyAuditMode
4. Start monitoring policy verdicts
5. Test basic connectivity
6. Show audit log entries

## Manual Testing Steps

### 1. Find Cilium Pod

```bash
kubectl -n kube-system get pods -l "k8s-app=cilium"
```

### 2. Find Host Endpoint

The host endpoint has identity `1`. Each node has its own host endpoint, managed by the Cilium pod running on that node, so the commands below act on the node of the selected pod (here, the first pod in the list). Find its endpoint ID:

```bash
CILIUM_POD=$(kubectl -n kube-system get pods -l "k8s-app=cilium" -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n kube-system ${CILIUM_POD} -- \
  cilium endpoint list -o jsonpath='{[?(@.status.identity.id==1)].id}'
```
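
To avoid copying the ID by hand into the later steps, the lookup above can be wrapped in a small function. This is a minimal sketch, not part of the repository's script: `host_endpoint_id` and the `KUBECTL` override are hypothetical names introduced here for illustration.

```bash
# KUBECTL is an illustrative override so the function can be pointed at a
# wrapper or a stub; it defaults to the real kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

# Print the ID of the host endpoint (identity 1) reported by the Cilium pod
# whose name is passed as the first argument.
host_endpoint_id() {
  "$KUBECTL" -n kube-system exec "$1" -- \
    cilium endpoint list -o jsonpath='{[?(@.status.identity.id==1)].id}'
}
```

With this in place, `HOST_EP_ID=$(host_endpoint_id "$CILIUM_POD")` stores the value that the following steps write as `<ENDPOINT_ID>`.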

### 3. Enable Audit Mode

```bash
kubectl exec -n kube-system ${CILIUM_POD} -- \
  cilium endpoint config <ENDPOINT_ID> PolicyAuditMode=Enabled
```
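
Endpoint configuration is local to each Cilium agent, and every node has its own host endpoint, so the command above only changes the node that `CILIUM_POD` runs on. If the policies under test select several nodes, the same step likely needs to run once per node. The loop below is a sketch under that assumption; `enable_audit_all_nodes` and the `KUBECTL` override are hypothetical names, not part of the repository's script.

```bash
# KUBECTL is an illustrative override (defaults to the real kubectl).
KUBECTL="${KUBECTL:-kubectl}"

# For every Cilium pod, look up the host endpoint (identity 1) on that node
# and switch it to PolicyAuditMode=Enabled.
enable_audit_all_nodes() {
  local pod ep
  for pod in $("$KUBECTL" -n kube-system get pods -l k8s-app=cilium \
      -o jsonpath='{.items[*].metadata.name}'); do
    ep=$("$KUBECTL" -n kube-system exec "$pod" -- \
      cilium endpoint list -o jsonpath='{[?(@.status.identity.id==1)].id}')
    if [ -z "$ep" ]; then
      echo "$pod: no host endpoint found, skipping" >&2
      continue
    fi
    "$KUBECTL" -n kube-system exec "$pod" -- \
      cilium endpoint config "$ep" PolicyAuditMode=Enabled >/dev/null
    echo "$pod: endpoint $ep set to audit mode"
  done
}
```

Calling `enable_audit_all_nodes` prints one line per Cilium pod; a pod without a host endpoint is reported on stderr and skipped rather than aborting the loop.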

### 4. Verify Audit Mode

```bash
kubectl exec -n kube-system ${CILIUM_POD} -- \
  cilium endpoint config <ENDPOINT_ID> | grep PolicyAuditMode
```

The output should show: `PolicyAuditMode : Enabled`

### 5. Start Monitoring

In a separate terminal, start monitoring policy verdicts:

```bash
kubectl exec -n kube-system ${CILIUM_POD} -- \
  cilium monitor -t policy-verdict --related-to <ENDPOINT_ID>
```

### 6. Test Connectivity

While monitoring, test various connections:

**Kubernetes API:**
```bash
kubectl get nodes
kubectl get pods -A
```

**Talos API (if `talosctl` is available):**
```bash
talosctl -n <NODE_IP> time
talosctl -n <NODE_IP> version
```

**Cluster Internal:**
```bash
kubectl get services -A
```

### 7. Review Audit Log

Look for entries in the monitor output:
- `action allow` - Traffic allowed by policy
- `action audit` - Traffic would be denied but is being audited (not dropped)
- `action deny` - Traffic denied (only in enforcement mode)
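
When the monitor output is long, counting the three verdicts listed above gives a quick picture of how much traffic would be dropped. The filter below is a minimal sketch that only assumes each verdict line contains the text `action allow`, `action audit`, or `action deny`; `summarize_verdicts` is a hypothetical helper name.

```bash
# Read monitor output on stdin and print how many lines carry each verdict.
summarize_verdicts() {
  awk '
    match($0, /action (allow|audit|deny)/) {
      # Skip the 7 characters of "action " to keep only the verdict word.
      verdict = substr($0, RSTART + 7, RLENGTH - 7)
      count[verdict]++
    }
    END {
      printf "allow=%d audit=%d deny=%d\n", count["allow"], count["audit"], count["deny"]
    }
  '
}
```

For example, monitor output saved to a file (say `verdicts.log`, a name chosen here for illustration) can be summarized with `summarize_verdicts < verdicts.log`. A non-zero `audit` count means some traffic would be blocked in enforcement mode.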

### 8. Disable Audit Mode (When Ready)

Once you've verified all necessary traffic is allowed:

```bash
kubectl exec -n kube-system ${CILIUM_POD} -- \
  cilium endpoint config <ENDPOINT_ID> PolicyAuditMode=Disabled
```

## Expected Results

With the current policies, you should see `action allow` for:

1. **Kubernetes API (6443)** from:
   - Tailscale network (100.64.0.0/10)
   - VLAN subnet (10.132.0.0/24)
   - VIP (<VIP_IP>)
   - External IPs (152.53.x.x)
   - Cluster entities

2. **Talos API (50000, 50001)** from:
   - Tailscale network
   - VLAN subnet
   - VIP
   - External IPs
   - Cluster entities

3. **Cluster Internal Traffic** from:
   - Cluster entities
   - Remote nodes
   - Host
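
When a verdict line shows an unfamiliar source address, it helps to check which of the ranges listed above it belongs to. The helpers below are a rough sketch for IPv4 only, using the Tailscale and VLAN CIDRs from this section; `ip_to_int`, `in_cidr`, and `classify_source` are hypothetical names, and the VIP and external addresses are left out because they are not spelled out here.

```bash
# Convert a dotted IPv4 address to a single integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if address $1 lies inside the CIDR $2 (for example 10.132.0.0/24).
in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}"
  local mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Name the expected source range an address falls into, if any.
classify_source() {
  if in_cidr "$1" 100.64.0.0/10; then
    echo tailscale
  elif in_cidr "$1" 10.132.0.0/24; then
    echo vlan
  else
    echo other
  fi
}
```

An `action audit` entry whose source classifies as `tailscale` or `vlan` points at a missing or mismatched rule, while `other` may be traffic that is meant to be denied.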

## Troubleshooting

### No Policy Verdicts Appearing

- Ensure PolicyAuditMode is enabled on the host endpoint you are monitoring
- Check that policies are actually applied: `kubectl get ciliumclusterwidenetworkpolicies`
- Generate more traffic to trigger policy evaluation

### Seeing `action audit` (Would Be Denied)

This means the traffic would be blocked in enforcement mode. Review your policies and add rules for any traffic that must stay reachable.

### Locked Out After Disabling Audit Mode

If you lose access after disabling audit mode:

1. Use the Hetzner Robot firewall escape hatch (if configured)
2. Or access via Tailscale network (should still work)
3. Re-enable audit mode via direct node access if needed

## Policy Verification Checklist

Before disabling audit mode, verify:

- [ ] Kubernetes API accessible from Tailscale
- [ ] Kubernetes API accessible from VLAN
- [ ] Talos API accessible from Tailscale
- [ ] Talos API accessible from VLAN
- [ ] Cluster internal communication working
- [ ] Worker nodes can reach control plane
- [ ] No unexpected `action audit` entries for critical services
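
The reachability items in this checklist can be run as commands from each network in turn. The wrapper below is a small sketch that runs one command and reports whether it succeeded; `run_check` is a hypothetical helper, and the commands in the comments are the ones already used in the connectivity tests, with `NODE_IP` standing in for a real node address.

```bash
# Run one checklist item: print PASS or FAIL with its label and return the
# command's success or failure to the caller.
run_check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    return 1
  fi
}

# Example calls (run from a machine on the network being verified):
#   run_check "Kubernetes API reachable" kubectl get nodes
#   run_check "Talos API reachable" talosctl -n "$NODE_IP" version
```

A passing check only shows that the connection worked; the monitor output still has to be reviewed for `action audit` entries, because audited traffic is not dropped.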

## References

- [Cilium Host Firewall Documentation](https://docs.cilium.io/en/stable/policy/language/#host-firewall)
- [Policy Audit Mode Guide](https://datavirke.dk/posts/bare-metal-kubernetes-part-2-cilium-and-firewalls/#policy-audit-mode)
- [Cilium Network Policies](https://docs.cilium.io/en/stable/policy/language/)