
Adding a New Node for Nginx Ingress Metrics Collection

This guide documents the steps required to add a new node to the cluster and ensure nginx ingress controller metrics are properly collected from it.

Overview

The nginx ingress controller is deployed as a DaemonSet (kind: DaemonSet), so Kubernetes automatically schedules one controller pod on every node. Collecting metrics from those pods, however, requires the additional configuration steps described in this guide.
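
To see this in practice, you can list the DaemonSet and compare its desired/ready pod counts to the number of nodes it is scheduled on (this assumes the controller runs in the ingress-nginx namespace, as elsewhere in this guide):

# Show the ingress controller DaemonSet; DESIRED and READY should match the node count
kubectl get daemonset -n ingress-nginx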

Current Configuration

The cluster currently has three nodes, with metrics collection configured for:

  • n1 (<NODE_1_EXTERNAL_IP>): Control plane + worker
  • n2 (<NODE_2_EXTERNAL_IP>): Worker
  • n3 (<NODE_3_EXTERNAL_IP>): Worker
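
To confirm the node names and addresses before touching any configuration, a plain node listing is usually enough (EXTERNAL-IP may be empty on clusters that only report internal addresses):

# List nodes with their roles and IP addresses
kubectl get nodes -o wide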

Steps to Add a New Node

1. Add the Node to Kubernetes Cluster

Follow your standard node addition process (this is outside the scope of this guide). Ensure the new node:

  • Is properly joined to the cluster
  • Has the nginx ingress controller pod deployed (should happen automatically due to DaemonSet)
  • Is accessible on the cluster network
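
A quick sanity check once the node has joined (NEW_NODE_NAME is a placeholder for your node's Kubernetes name):

# Confirm the new node is registered and reports Ready
kubectl get nodes -o wide | grep NEW_NODE_NAME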

2. Verify Nginx Ingress Controller Deployment

Check that the nginx ingress controller pod is running on the new node:

kubectl get pods -n ingress-nginx -o wide

Look for a pod scheduled on the new node; the DaemonSet should have deployed one there automatically.
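
If the pod list is long, you can filter directly by node name instead of scanning the output (again, NEW_NODE_NAME is a placeholder):

# Show only the ingress-nginx pods scheduled on the new node
kubectl get pods -n ingress-nginx -o wide --field-selector spec.nodeName=NEW_NODE_NAME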

3. Update OpenTelemetry Collector Configuration

File to modify: manifests/infrastructure/openobserve-collector/gateway-collector.yaml

Current configuration (lines 217-219):

- job_name: 'nginx-ingress'
  static_configs:
    - targets: ['<NODE_1_EXTERNAL_IP>:10254', '<NODE_2_EXTERNAL_IP>:10254', '<NODE_3_EXTERNAL_IP>:10254']

Add the new node IP to the targets list:

- job_name: 'nginx-ingress'
  static_configs:
    - targets: ['<NODE_1_EXTERNAL_IP>:10254', '<NODE_2_EXTERNAL_IP>:10254', '<NODE_3_EXTERNAL_IP>:10254', 'NEW_NODE_IP:10254']

Replace NEW_NODE_IP with the actual IP address of your new node.

4. Update Host Firewall Policies (if applicable)

File to check: manifests/infrastructure/cluster-policies/host-fw-worker-nodes.yaml

Ensure the host firewall policy allows access to the nginx metrics port 10254 (this rule should already be present):

# NGINX Ingress Controller metrics port
- fromEntities:
  - cluster
  toPorts:
  - ports:
    - port: "10254"
      protocol: "TCP"  # NGINX Ingress metrics
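
The fromEntities/toPorts syntax above is Cilium host-policy syntax; assuming the host firewall is implemented as Cilium clusterwide policies, you can confirm the rule is present on the live cluster with something like:

# Check that the 10254 rule exists in the applied Cilium clusterwide policies
kubectl get ciliumclusterwidenetworkpolicies -o yaml | grep -B2 -A3 '10254'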

5. Apply the Configuration Changes

# Apply the updated collector configuration
kubectl apply -f manifests/infrastructure/openobserve-collector/gateway-collector.yaml

# Restart the collector to pick up the new configuration
kubectl rollout restart statefulset/openobserve-collector-gateway-collector -n openobserve-collector
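
To confirm the restart completed before moving on to verification:

# Wait for the restarted collector pods to become ready
kubectl rollout status statefulset/openobserve-collector-gateway-collector -n openobserve-collector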

6. Verification Steps

  1. Check that the nginx pod is running on the new node:

    kubectl get pods -n ingress-nginx -o wide | grep NEW_NODE_NAME
    
  2. Verify metrics endpoint is accessible:

    curl -s http://NEW_NODE_IP:10254/metrics | grep nginx_ingress_controller_requests | head -3
    
  3. Check collector logs for the new target:

    kubectl logs -n openobserve-collector openobserve-collector-gateway-collector-0 --tail=50 | grep -i nginx
    
  4. Verify target discovery: Look for log entries like:

    Scrape job added {"jobName": "nginx-ingress"}
    
  5. Test metrics in OpenObserve: Your dashboard query should now include metrics from the new node:

    sum(increase(nginx_ingress_controller_requests[5m])) by (host)
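
Because the static scrape config addresses each node directly, the per-target instance label should distinguish nodes (assuming the collector pipeline preserves the default instance label). A variant of the dashboard query that makes the new node's contribution visible:

sum(increase(nginx_ingress_controller_requests[5m])) by (instance)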
    

Important Notes

Automatic vs Manual Configuration

  • Automatic: Nginx ingress controller deployment (DaemonSet handles this)
  • Automatic: ServiceMonitor discovery (target allocator handles this)
  • Manual: Static scrape configuration (requires updating the targets list)

Why Both ServiceMonitor and Static Config?

The current setup uses both approaches for redundancy:

  1. ServiceMonitor: Automatically discovers nginx ingress services
  2. Static Configuration: Ensures specific node IPs are always monitored
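
For reference, a hypothetical ServiceMonitor for the controller's metrics service might look like the sketch below; the names, namespace, and labels are illustrative, and the actual resource in this cluster may differ (inspect it with the ServiceMonitor command under Useful Commands):

# Illustrative sketch only -- names and labels are assumptions, not this cluster's actual resource
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  endpoints:
    - port: metrics
      interval: 30s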

Network Requirements

  • Port 10254 must be accessible from the OpenTelemetry collector pods
  • The new node should be on the same network as existing nodes
  • Host firewall policies should allow metrics collection
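
To check reachability from the collector's point of view rather than from your workstation, you can exec into a collector pod (the image may not ship curl; substitute wget or a temporary debug pod if needed, and replace NEW_NODE_IP):

# Expect HTTP 200 if the metrics port is reachable from the collector
kubectl exec -n openobserve-collector openobserve-collector-gateway-collector-0 -- \
  curl -s -o /dev/null -w '%{http_code}\n' http://NEW_NODE_IP:10254/metrics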

Monitoring Best Practices

  • Always verify metrics are flowing after adding a node
  • Test your dashboard queries to ensure the new node's metrics appear
  • Monitor collector logs for any scraping errors

Troubleshooting

Common Issues

  1. Nginx pod not starting: Check node labels and taints
  2. Metrics endpoint not accessible: Verify network connectivity and firewall rules
  3. Collector not scraping: Check collector logs and restart if needed
  4. Missing metrics in dashboard: Wait 30-60 seconds for metrics to propagate
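
For the first two issues, these generic checks usually narrow things down (NEW_NODE_NAME and NEW_NODE_IP are placeholders):

# Issue 1: look for taints that keep the DaemonSet pod off the node
kubectl describe node NEW_NODE_NAME | grep -A3 -i taints
# Issue 2: confirm the metrics endpoint responds from a host that can reach the node
curl -sv http://NEW_NODE_IP:10254/metrics > /dev/null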

Useful Commands

# Check nginx ingress pods
kubectl get pods -n ingress-nginx -o wide

# Test metrics endpoint
curl -s http://NODE_IP:10254/metrics | grep nginx_ingress_controller_requests

# Check collector status
kubectl get pods -n openobserve-collector

# View collector logs
kubectl logs -n openobserve-collector openobserve-collector-gateway-collector-0 --tail=50

# Check ServiceMonitor
kubectl get servicemonitor -n ingress-nginx -o yaml

Configuration Files Summary

Files that may need updates when adding a node:

  1. Required: manifests/infrastructure/openobserve-collector/gateway-collector.yaml

    • Update static targets list (line ~219)
  2. Optional: manifests/infrastructure/cluster-policies/host-fw-worker-nodes.yaml

    • Usually already configured for port 10254
  3. Automatic: manifests/infrastructure/ingress-nginx/ingress-nginx.yaml

    • No changes needed (DaemonSet handles deployment)