# Adding a New Node for Nginx Ingress Metrics Collection
This guide documents the steps required to add a new node to the cluster and ensure nginx ingress controller metrics are properly collected from it.
## Overview

The nginx ingress controller is deployed as a DaemonSet (`kind: DaemonSet`), so Kubernetes automatically schedules one controller pod on each node. Metrics collection from a new node, however, does not happen automatically; the configuration steps below are required.
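For reference, the relevant parts of such a DaemonSet manifest might look like the following sketch. The field layout follows the standard `apps/v1` DaemonSet schema, but the names, labels, and image tag here are illustrative assumptions, not taken from this repository; only port 10254 comes from this guide.

```yaml
# Illustrative sketch only -- names, labels, and image are hypothetical.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
    spec:
      containers:
        - name: controller
          image: registry.k8s.io/ingress-nginx/controller:v1.10.0
          ports:
            - name: metrics
              containerPort: 10254   # Prometheus metrics endpoint scraped below
```

Because there is no `replicas` field, the scheduler places one controller pod on every eligible node, which is why a newly joined node picks up a controller pod without any manual deployment step.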
## Current Configuration

The cluster currently has 3 nodes, with metrics collection configured for:

- n1 (`<NODE_1_EXTERNAL_IP>`): control plane + worker
- n2 (`<NODE_2_EXTERNAL_IP>`): worker
- n3 (`<NODE_3_EXTERNAL_IP>`): worker
## Steps to Add a New Node

### 1. Add the Node to the Kubernetes Cluster

Follow your standard node-addition process (outside the scope of this guide). Ensure the new node:

- Is properly joined to the cluster
- Has the nginx ingress controller pod deployed (this should happen automatically via the DaemonSet)
- Is accessible on the cluster network
### 2. Verify the Nginx Ingress Controller Deployment

Check that the nginx ingress controller pod is running on the new node:

```shell
kubectl get pods -n ingress-nginx -o wide
```

Look for a pod scheduled on the new node; the DaemonSet should have deployed it automatically.
### 3. Update the OpenTelemetry Collector Configuration

File to modify: `manifests/infrastructure/openobserve-collector/gateway-collector.yaml`

Current configuration (lines 217-219):

```yaml
- job_name: 'nginx-ingress'
  static_configs:
    - targets: ['<NODE_1_EXTERNAL_IP>:10254', '<NODE_2_EXTERNAL_IP>:10254', '<NODE_3_EXTERNAL_IP>:10254']
```

Add the new node's IP to the targets list:

```yaml
- job_name: 'nginx-ingress'
  static_configs:
    - targets: ['<NODE_1_EXTERNAL_IP>:10254', '<NODE_2_EXTERNAL_IP>:10254', '<NODE_3_EXTERNAL_IP>:10254', 'NEW_NODE_IP:10254']
```

Replace `NEW_NODE_IP` with the actual IP address of your new node.
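If you prefer to script this edit, a small sed-based helper can append the new target before the closing bracket of the targets line. This is a sketch, not part of the repository's tooling: the function name is ours, and you should review the resulting diff before committing (the helper keeps a `.bak` backup for that reason).

```shell
# add_scrape_target FILE IP -- append 'IP:10254' to the static targets list
# in FILE. Matches the first "- targets: [...]" line and keeps a FILE.bak
# backup so the change can be reviewed or reverted.
add_scrape_target() {
  local file=$1 ip=$2
  sed -i.bak "s/\(- targets: \[[^]]*\)]/\1, '${ip}:10254']/" "$file"
}

# Hypothetical usage against the collector manifest:
# add_scrape_target manifests/infrastructure/openobserve-collector/gateway-collector.yaml NEW_NODE_IP
```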
### 4. Update Host Firewall Policies (if applicable)

File to check: `manifests/infrastructure/cluster-policies/host-fw-worker-nodes.yaml`

Ensure the firewall allows access to the nginx metrics port (this should already be configured):

```yaml
# NGINX Ingress Controller metrics port
- fromEntities:
    - cluster
  toPorts:
    - ports:
        - port: "10254"
          protocol: "TCP" # NGINX Ingress metrics
```
### 5. Apply the Configuration Changes

```shell
# Apply the updated collector configuration
kubectl apply -f manifests/infrastructure/openobserve-collector/gateway-collector.yaml

# Restart the collector to pick up the new configuration
kubectl rollout restart statefulset/openobserve-collector-gateway-collector -n openobserve-collector
```
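The restart is asynchronous, so it can help to wait for the collector pod to come back before checking logs. A minimal retry helper for that, as a sketch (the helper name is ours; the commented `kubectl wait` line is a hypothetical usage for this cluster's pod and namespace names):

```shell
# retry_until ATTEMPTS CMD... -- re-run CMD once per second until it succeeds
# or ATTEMPTS runs are exhausted; returns CMD's final success/failure status.
retry_until() {
  local attempts=$1; shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    sleep 1
  done
  return 1
}

# Hypothetical usage: wait up to ~60s for the restarted collector pod to be Ready.
# retry_until 60 kubectl wait --for=condition=Ready \
#   pod/openobserve-collector-gateway-collector-0 -n openobserve-collector --timeout=1s
```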
### 6. Verification Steps

1. Check that the nginx pod is running on the new node:

   ```shell
   kubectl get pods -n ingress-nginx -o wide | grep NEW_NODE_NAME
   ```

2. Verify that the metrics endpoint is accessible:

   ```shell
   curl -s http://NEW_NODE_IP:10254/metrics | grep nginx_ingress_controller_requests | head -3
   ```

3. Check the collector logs for the new target:

   ```shell
   kubectl logs -n openobserve-collector openobserve-collector-gateway-collector-0 --tail=50 | grep -i nginx
   ```

4. Verify target discovery by looking for log entries like:

   ```
   Scrape job added {"jobName": "nginx-ingress"}
   ```

5. Test the metrics in OpenObserve. Your dashboard query should now include metrics from the new node:

   ```
   sum(increase(nginx_ingress_controller_requests[5m])) by (host)
   ```
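The metrics-endpoint check above can be wrapped in a small helper that succeeds only if the endpoint actually serves the request counter, which makes it easy to use in scripts or CI. This is a sketch: the function name is ours, while the metric name comes from this guide's queries.

```shell
# has_nginx_metrics URL -- succeed if URL serves the nginx request counter.
# Works with any URL scheme curl understands (http://, file://, ...).
has_nginx_metrics() {
  curl -sf "$1" | grep -q '^nginx_ingress_controller_requests'
}

# Hypothetical usage against a newly added node:
# has_nginx_metrics "http://NEW_NODE_IP:10254/metrics" && echo "metrics OK"
```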
## Important Notes

### Automatic vs Manual Configuration

- ✅ Automatic: nginx ingress controller deployment (the DaemonSet handles this)
- ✅ Automatic: ServiceMonitor discovery (the target allocator handles this)
- ❌ Manual: static scrape configuration (requires updating the targets list)
### Why Both ServiceMonitor and Static Config?

The current setup uses both approaches for redundancy:

- ServiceMonitor: automatically discovers nginx ingress services
- Static configuration: ensures specific node IPs are always monitored
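For context, a ServiceMonitor for the controller typically looks something like the sketch below. The field layout follows the standard `monitoring.coreos.com/v1` schema, but the names, namespace, and selector labels are illustrative assumptions; check the actual manifest in this repository before relying on them.

```yaml
# Illustrative ServiceMonitor sketch -- names and labels are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  endpoints:
    - port: metrics      # must match the Service's metrics port name
      interval: 30s
```

Because the selector matches Service labels rather than node IPs, this path needs no changes when a node is added; only the static targets list does.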
### Network Requirements

- Port 10254 must be accessible from the OpenTelemetry collector pods
- The new node should be on the same network as the existing nodes
- Host firewall policies should allow metrics collection

### Monitoring Best Practices

- Always verify that metrics are flowing after adding a node
- Test your dashboard queries to ensure the new node's metrics appear
- Monitor collector logs for any scraping errors
## Troubleshooting

### Common Issues

- **Nginx pod not starting**: check node labels and taints
- **Metrics endpoint not accessible**: verify network connectivity and firewall rules
- **Collector not scraping**: check the collector logs and restart the collector if needed
- **Missing metrics in dashboard**: wait 30-60 seconds for metrics to propagate
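For the "metrics endpoint not accessible" case, you can separate firewall problems from controller problems by checking raw TCP reachability first. A sketch using bash's `/dev/tcp` pseudo-device (a bash-only feature; the helper name is ours):

```shell
# port_open HOST PORT -- succeed if a TCP connection to HOST:PORT can be
# opened within 2 seconds. Requires bash (not POSIX sh) and coreutils timeout.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Hypothetical usage: if the port is open but curl gets no metrics, the
# problem is in the controller, not the firewall.
# port_open NEW_NODE_IP 10254 && echo "10254 reachable" || echo "blocked or down"
```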
### Useful Commands

```shell
# Check nginx ingress pods
kubectl get pods -n ingress-nginx -o wide

# Test the metrics endpoint
curl -s http://NODE_IP:10254/metrics | grep nginx_ingress_controller_requests

# Check collector status
kubectl get pods -n openobserve-collector

# View collector logs
kubectl logs -n openobserve-collector openobserve-collector-gateway-collector-0 --tail=50

# Check the ServiceMonitor
kubectl get servicemonitor -n ingress-nginx -o yaml
```
## Configuration Files Summary

Files that may need updates when adding a node:

- Required: `manifests/infrastructure/openobserve-collector/gateway-collector.yaml` - update the static targets list (line ~219)
- Optional: `manifests/infrastructure/cluster-policies/host-fw-worker-nodes.yaml` - usually already configured for port 10254
- Automatic: `manifests/infrastructure/ingress-nginx/ingress-nginx.yaml` - no changes needed (the DaemonSet handles deployment)