Add the redacted source file for demo purposes Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1 Co-authored-by: Michael DiLeo <michael_dileo@proton.me> Co-committed-by: Michael DiLeo <michael_dileo@proton.me>
# Cilium Host Firewall Policy Audit Mode Testing

## Overview

This guide explains how to test Cilium host firewall policies in audit mode before applying them in enforcement mode. In audit mode, traffic that the policies would deny is logged instead of dropped, so you can catch a missing rule before it locks you out of the cluster.

## Prerequisites

- `kubectl` configured and working
- Access to the cluster (via Tailscale or direct connection)
- Cilium installed and running

## Quick Start

Run the automated test script:

```bash
./tools/test-cilium-policy-audit.sh
```

This script will:
1. Find the Cilium pod
2. Locate the host endpoint (identity 1)
3. Enable PolicyAuditMode
4. Start monitoring policy verdicts
5. Test basic connectivity
6. Show audit log entries

## Manual Testing Steps

### 1. Find Cilium Pod

```bash
kubectl -n kube-system get pods -l "k8s-app=cilium"
```

### 2. Find Host Endpoint

The host endpoint has identity `1`. Find its endpoint ID, which is the value to substitute for `<ENDPOINT_ID>` in the following steps:

```bash
# Pick the first Cilium pod; the endpoint ID returned is specific to that pod's node.
CILIUM_POD=$(kubectl -n kube-system get pods -l "k8s-app=cilium" -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n kube-system "${CILIUM_POD}" -- \
  cilium endpoint list -o jsonpath='{[?(@.status.identity.id==1)].id}'
```

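To avoid copying the endpoint ID by hand, the lookup can be wrapped in a small shell function. This is a minimal sketch, not part of the original tooling: the function name `host_endpoint_id` and the `CILIUM_BIN` variable are hypothetical. `CILIUM_BIN` is there because recent Cilium documentation refers to the in-pod CLI as `cilium-dbg`; which name applies to your version is an assumption to verify.

```bash
# Hypothetical helper: print the host endpoint ID for a given Cilium pod.
# CILIUM_BIN is an assumed override for clusters where the in-pod CLI
# is named differently (for example cilium-dbg).
CILIUM_BIN="${CILIUM_BIN:-cilium}"

host_endpoint_id() {
  local pod="$1"
  kubectl -n kube-system exec "${pod}" -- \
    "${CILIUM_BIN}" endpoint list -o jsonpath='{[?(@.status.identity.id==1)].id}'
}

# Usage (requires cluster access):
#   HOST_EP_ID=$(host_endpoint_id "${CILIUM_POD}")
#   echo "${HOST_EP_ID}"
```

The resulting `HOST_EP_ID` can then stand in for `<ENDPOINT_ID>` in the later commands.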
### 3. Enable Audit Mode

Replace `<ENDPOINT_ID>` with the ID from step 2:

```bash
kubectl exec -n kube-system "${CILIUM_POD}" -- \
  cilium endpoint config <ENDPOINT_ID> PolicyAuditMode=Enabled
```

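Each node runs its own Cilium agent with its own host endpoint, so the command above only affects the node whose pod was selected in step 2. The following is a minimal sketch, under that assumption, of applying the same setting on every node. The function name `set_host_audit_mode` is hypothetical; the `kubectl` and `cilium` invocations are the ones already used in steps 2 and 3.

```bash
# Hypothetical helper: set PolicyAuditMode on the host endpoint of every node.
# Argument: Enabled or Disabled.
set_host_audit_mode() {
  local mode="$1" pod ep_id
  # Iterate over all Cilium agent pods (one per node).
  for pod in $(kubectl -n kube-system get pods -l "k8s-app=cilium" \
      -o jsonpath='{.items[*].metadata.name}'); do
    # Look up this node's host endpoint (identity 1).
    ep_id=$(kubectl -n kube-system exec "${pod}" -- \
      cilium endpoint list -o jsonpath='{[?(@.status.identity.id==1)].id}')
    kubectl -n kube-system exec "${pod}" -- \
      cilium endpoint config "${ep_id}" "PolicyAuditMode=${mode}"
    echo "${pod}: endpoint ${ep_id} PolicyAuditMode=${mode}"
  done
}

# Usage (requires cluster access):
#   set_host_audit_mode Enabled
```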
### 4. Verify Audit Mode

```bash
kubectl exec -n kube-system "${CILIUM_POD}" -- \
  cilium endpoint config <ENDPOINT_ID> | grep PolicyAuditMode
```

The output should show: `PolicyAuditMode : Enabled`

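In a script, the same check can be turned into an exit status instead of reading the output by eye. This is a minimal sketch that assumes the option is printed on its own line as `PolicyAuditMode : Enabled` (with any amount of padding around the colon), as shown above; the function name `audit_mode_enabled` is hypothetical.

```bash
# Hypothetical helper: read `cilium endpoint config <ID>` output on stdin and
# succeed only if PolicyAuditMode is reported as Enabled.
audit_mode_enabled() {
  grep -Eq '^PolicyAuditMode[[:space:]]*:[[:space:]]*Enabled[[:space:]]*$'
}

# Usage (requires cluster access):
#   kubectl exec -n kube-system "${CILIUM_POD}" -- \
#     cilium endpoint config "${HOST_EP_ID}" | audit_mode_enabled \
#     || echo "audit mode is NOT enabled" >&2
```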
### 5. Start Monitoring

In a separate terminal (with `CILIUM_POD` set as in step 2), start monitoring policy verdicts:

```bash
kubectl exec -n kube-system "${CILIUM_POD}" -- \
  cilium monitor -t policy-verdict --related-to <ENDPOINT_ID>
```

### 6. Test Connectivity

While monitoring, test various connections:

**Kubernetes API:**
```bash
kubectl get nodes
kubectl get pods -A
```

**Talos API (if talosctl available):**
```bash
talosctl -n <NODE_IP> time
talosctl -n <NODE_IP> version
```

**Cluster Internal:**
```bash
kubectl get services -A
```

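When repeating these checks after each policy change, a small wrapper that prints one pass/fail line per command can make the results easier to scan. This is a minimal sketch; the function name `check` and the labels are hypothetical, and it only reports whether each command exits successfully.

```bash
# Hypothetical helper: run a command quietly and report OK or FAIL with a label.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   ${label}"
  else
    echo "FAIL ${label}"
  fi
}

# Usage (requires cluster access; NODE_IP must be set by you):
#   check "Kubernetes API: nodes" kubectl get nodes
#   check "Kubernetes API: pods"  kubectl get pods -A
#   check "Talos API: version"    talosctl -n "${NODE_IP}" version
#   check "Cluster services"      kubectl get services -A
```

A `FAIL` line while audit mode is enabled points to a problem other than the host firewall policy, since audited traffic is not dropped.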
### 7. Review Audit Log

Look for entries in the monitor output:
- `action allow` - Traffic allowed by policy
- `action audit` - Traffic would be denied but is being audited (not dropped)
- `action deny` - Traffic denied (only in enforcement mode)

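If the monitor output is saved to a file, the verdicts can be tallied rather than read line by line. This is a minimal sketch that assumes only that each verdict line contains one of the three `action ...` strings listed above; the function name `count_verdicts` and the file name in the usage comment are hypothetical.

```bash
# Hypothetical helper: read monitor output on stdin and count verdicts by action.
count_verdicts() {
  awk '
    match($0, /action (allow|audit|deny)/) {
      # Skip the 7 characters of "action " to isolate the verdict word.
      counts[substr($0, RSTART + 7, RLENGTH - 7)]++
    }
    END {
      printf "allow=%d audit=%d deny=%d\n", counts["allow"], counts["audit"], counts["deny"]
    }'
}

# Usage:
#   count_verdicts < policy-verdicts.log
```

A non-zero `audit` count means some traffic would be dropped once enforcement is turned on.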
### 8. Disable Audit Mode (When Ready)

Disabling audit mode puts the host endpoint into enforcement mode, so any traffic still logged as `action audit` will be dropped. Once you've verified all necessary traffic is allowed:

```bash
kubectl exec -n kube-system "${CILIUM_POD}" -- \
  cilium endpoint config <ENDPOINT_ID> PolicyAuditMode=Disabled
```

## Expected Results

With the current policies, you should see `action allow` for:

1. **Kubernetes API (6443)** from:
   - Tailscale network (100.64.0.0/10)
   - VLAN subnet (10.132.0.0/24)
   - VIP (<VIP_IP>)
   - External IPs (152.53.x.x)
   - Cluster entities

2. **Talos API (50000, 50001)** from:
   - Tailscale network
   - VLAN subnet
   - VIP
   - External IPs
   - Cluster entities

3. **Cluster Internal Traffic** from:
   - Cluster entities
   - Remote nodes
   - Host

## Troubleshooting

### No Policy Verdicts Appearing

- Confirm that PolicyAuditMode is enabled on the host endpoint (step 4)
- Check that policies are actually applied: `kubectl get ciliumclusterwidenetworkpolicies`
- Generate more traffic to trigger policy evaluation

### Seeing `action audit` (Would Be Denied)

This means the traffic would be blocked in enforcement mode. Review your policies and add rules for any traffic that should be allowed.

### Locked Out After Disabling Audit Mode

If you lose access after disabling audit mode:

1. Use the Hetzner Robot firewall escape hatch (if configured)
2. Or connect via the Tailscale network (should still work)
3. Re-enable audit mode via direct node access if needed

## Policy Verification Checklist

Before disabling audit mode, verify:

- [ ] Kubernetes API accessible from Tailscale
- [ ] Kubernetes API accessible from VLAN
- [ ] Talos API accessible from Tailscale
- [ ] Talos API accessible from VLAN
- [ ] Cluster internal communication working
- [ ] Worker nodes can reach control plane
- [ ] No unexpected `action audit` entries for critical services

## References

- [Cilium Host Firewall Documentation](https://docs.cilium.io/en/stable/policy/language/#host-firewall)
- [Policy Audit Mode Guide](https://datavirke.dk/posts/bare-metal-kubernetes-part-2-cilium-and-firewalls/#policy-audit-mode)
- [Cilium Network Policies](https://docs.cilium.io/en/stable/policy/language/)