Cilium Host Firewall Policy Audit Mode Testing

Overview

This guide explains how to test Cilium host firewall policies in audit mode before applying them in enforcement mode. In audit mode, traffic that the policies would deny is logged instead of dropped, which helps you avoid accidentally locking yourself out of the cluster.

Prerequisites

  • kubectl configured and working
  • Access to the cluster (via Tailscale or direct connection)
  • Cilium installed and running

Quick Start

Run the automated test script:

./tools/test-cilium-policy-audit.sh

This script will:

  1. Find the Cilium pod
  2. Locate the host endpoint (identity 1)
  3. Enable PolicyAuditMode
  4. Start monitoring policy verdicts
  5. Test basic connectivity
  6. Show audit log entries

Manual Testing Steps

1. Find Cilium Pod

kubectl -n kube-system get pods -l "k8s-app=cilium"

2. Find Host Endpoint

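The host endpoint has the reserved identity 1. Each Cilium agent manages the host endpoint of its own node, so the commands below only affect the node where the selected Cilium pod runs; repeat them against the Cilium pod on every node you want to test. Find the endpoint ID: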

CILIUM_POD=$(kubectl -n kube-system get pods -l "k8s-app=cilium" -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n kube-system ${CILIUM_POD} -- \
  cilium endpoint list -o jsonpath='{[?(@.status.identity.id==1)].id}'
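
The later steps need the endpoint ID in place of <ENDPOINT_ID>. A minimal sketch of capturing it in a variable and rejecting an empty or non-numeric result follows. The names is_endpoint_id, get_host_endpoint_id, and HOST_EP_ID are illustrative assumptions and are not taken from the repository's script.

```shell
# Hypothetical helper: succeed only if the argument is a non-empty string of digits.
is_endpoint_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical wrapper around the command above; expects CILIUM_POD to be set.
get_host_endpoint_id() {
  kubectl exec -n kube-system "${CILIUM_POD}" -- \
    cilium endpoint list -o jsonpath='{[?(@.status.identity.id==1)].id}'
}

# Usage (run manually against a live cluster):
#   HOST_EP_ID=$(get_host_endpoint_id)
#   is_endpoint_id "$HOST_EP_ID" || echo "unexpected endpoint ID: '$HOST_EP_ID'" >&2
```

Validating the value first avoids passing an empty string to the endpoint config command in the next step.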

3. Enable Audit Mode

kubectl exec -n kube-system ${CILIUM_POD} -- \
  cilium endpoint config <ENDPOINT_ID> PolicyAuditMode=Enabled

4. Verify Audit Mode

kubectl exec -n kube-system ${CILIUM_POD} -- \
  cilium endpoint config <ENDPOINT_ID> | grep PolicyAuditMode

Should show: PolicyAuditMode : Enabled
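
If you want to script this check rather than read the output by eye, one possible approach is to parse the option line. The sketch below assumes the output contains a line of the form shown above (option name, a colon, then the value); audit_mode_enabled is a hypothetical helper name.

```shell
# Hypothetical helper: reads `cilium endpoint config <ENDPOINT_ID>` output on stdin
# and succeeds only if the PolicyAuditMode option is reported as Enabled.
audit_mode_enabled() {
  awk '$1 == "PolicyAuditMode" { enabled = ($NF == "Enabled") }
       END { exit (enabled ? 0 : 1) }'
}

# Usage (against a live cluster, with the real endpoint ID substituted):
#   kubectl exec -n kube-system "${CILIUM_POD}" -- \
#     cilium endpoint config <ENDPOINT_ID> | audit_mode_enabled \
#     && echo "audit mode is on"
```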

5. Start Monitoring

In a separate terminal, start monitoring policy verdicts:

kubectl exec -n kube-system ${CILIUM_POD} -- \
  cilium monitor -t policy-verdict --related-to <ENDPOINT_ID>

6. Test Connectivity

While monitoring, test various connections:

Kubernetes API:

kubectl get nodes
kubectl get pods -A

Talos API (if talosctl available):

talosctl -n <NODE_IP> time
talosctl -n <NODE_IP> version

Cluster Internal:

kubectl get services -A
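
The connectivity commands above can be wrapped so that each one reports a clear result. This is only a sketch; run_check is a hypothetical helper, not part of the repository's script. Keep in mind that while audit mode is enabled nothing is dropped, so a passing check mainly serves to generate traffic; the verdicts in the monitor output are the real signal.

```shell
# Hypothetical helper: run a command quietly and print PASS or FAIL with a label.
run_check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS ${label}"
  else
    echo "FAIL ${label}"
    return 1
  fi
}

# Usage (against a live cluster):
#   run_check "kubernetes api: nodes"    kubectl get nodes
#   run_check "kubernetes api: pods"     kubectl get pods -A
#   run_check "cluster internal"         kubectl get services -A
```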

7. Review Audit Log

Look for entries in the monitor output:

  • action allow - Traffic allowed by policy
  • action audit - Traffic would be denied but is being audited (not dropped)
  • action deny - Traffic denied (only in enforcement mode)
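
For longer test runs it can help to save the monitor output and count the verdicts per action. The sketch below assumes each verdict line contains the literal text "action allow", "action audit", or "action deny" as listed above; summarize_verdicts and the file name verdicts.log are illustrative assumptions.

```shell
# Hypothetical helper: reads saved monitor output on stdin and prints
# how many lines carry each verdict action.
summarize_verdicts() {
  awk '
    /action allow/ { allow++ }
    /action audit/ { audit++ }
    /action deny/  { deny++ }
    END { printf "allow=%d audit=%d deny=%d\n", allow, audit, deny }
  '
}

# Usage:
#   1. Capture the monitor output from step 5 to a file, for example by
#      appending `| tee verdicts.log` to that command.
#   2. summarize_verdicts < verdicts.log
```

A non-zero audit count is the cue to inspect those lines before switching to enforcement.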

8. Disable Audit Mode (When Ready)

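Disabling audit mode returns the host endpoint to enforcement, so any traffic that still shows action audit will be dropped. Once you've verified that all necessary traffic is allowed: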

kubectl exec -n kube-system ${CILIUM_POD} -- \
  cilium endpoint config <ENDPOINT_ID> PolicyAuditMode=Disabled

Expected Results

With the current policies, you should see action allow for:

  1. Kubernetes API (6443) from:

    • Tailscale network (100.64.0.0/10)
    • VLAN subnet (10.132.0.0/24)
    • VIP (<VIP_IP>)
    • External IPs (152.53.x.x)
    • Cluster entities
  2. Talos API (50000, 50001) from:

    • Tailscale network
    • VLAN subnet
    • VIP
    • External IPs
    • Cluster entities
  3. Cluster Internal Traffic from:

    • Cluster entities
    • Remote nodes
    • Host

Troubleshooting

No Policy Verdicts Appearing

  • Ensure PolicyAuditMode is enabled
  • Check that policies are actually applied: kubectl get ciliumclusterwidenetworkpolicies
  • Generate more traffic to trigger policy evaluation

Seeing action audit (Would Be Denied)

This means the traffic matched no allow rule and would be dropped in enforcement mode. If the traffic is legitimate, add a rule that allows it and re-test before disabling audit mode.
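
To see which rules are missing, it can be useful to group the audited entries by source and destination. The sketch below assumes each verdict line contains the flow in the form "SRC:PORT -> DST:PORT", which may differ between Cilium versions; audited_flows is a hypothetical helper. Source ports are stripped because they are usually ephemeral.

```shell
# Hypothetical helper: reads saved monitor output on stdin and prints each
# distinct audited "source -> destination:port" pair with its count,
# most frequent first.
audited_flows() {
  grep 'action audit' \
    | grep -oE '[^ ]+ -> [^ ]+' \
    | sed -E 's/:[0-9]+ -> / -> /' \
    | sort | uniq -c | sort -rn
}

# Usage (verdicts.log is an assumed file holding captured monitor output):
#   audited_flows < verdicts.log
```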

Locked Out After Disabling Audit Mode

If you lose access after disabling audit mode:

  1. Use the Hetzner Robot firewall escape hatch (if configured)
  2. Or connect over the Tailscale network, which the policies are expected to allow
  3. Re-enable audit mode (PolicyAuditMode=Enabled, as in step 3) through direct node access if needed

Policy Verification Checklist

Before disabling audit mode, verify:

  • Kubernetes API accessible from Tailscale
  • Kubernetes API accessible from VLAN
  • Talos API accessible from Tailscale
  • Talos API accessible from VLAN
  • Cluster internal communication working
  • Worker nodes can reach control plane
  • No unexpected action audit entries for critical services
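
The last checklist item can be approximated with a small gate over saved monitor output. This is a sketch under the assumption that verdict lines show the destination as "-> DST:PORT"; no_audit_on_ports is a hypothetical helper, and the ports in the usage example are the Kubernetes API and Talos API ports listed under Expected Results.

```shell
# Hypothetical helper: reads saved monitor output on stdin and fails if any
# "action audit" line targets one of the destination ports given as arguments.
no_audit_on_ports() {
  [ "$#" -gt 0 ] || return 2          # at least one port is required
  ports=$(printf '%s|' "$@")          # e.g. "6443|50000|50001|"
  ports=${ports%|}                    # drop the trailing separator
  if grep 'action audit' | grep -Eq -- "-> [^ ]+:(${ports})( |\$)"; then
    return 1
  fi
  return 0
}

# Usage (verdicts.log is an assumed file holding captured monitor output):
#   no_audit_on_ports 6443 50000 50001 < verdicts.log \
#     && echo "no audited traffic on critical ports"
```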
