# Adding a New Node for Nginx Ingress Metrics Collection

This guide documents the steps required to add a new node to the cluster and ensure that nginx ingress controller metrics are collected from it.

## Overview

The nginx ingress controller is deployed as a **DaemonSet** (`kind: DaemonSet`), so Kubernetes automatically schedules one controller pod on each node, including newly joined ones. A running pod is not enough for metrics collection, however: the additional configuration steps below are also required.

## Current Configuration

Currently, the cluster has 3 nodes with metrics collection configured for:
- **n1 (<NODE_1_EXTERNAL_IP>)**: Control plane + worker
- **n2 (<NODE_2_EXTERNAL_IP>)**: Worker
- **n3 (<NODE_3_EXTERNAL_IP>)**: Worker

## Steps to Add a New Node

### 1. Add the Node to Kubernetes Cluster

Follow your standard node addition process (outside the scope of this guide). Ensure the new node:
- Is properly joined to the cluster
- Has the nginx ingress controller pod deployed (this should happen automatically because of the DaemonSet)
- Is reachable on the cluster network

### 2. Verify Nginx Ingress Controller Deployment

Check that the nginx ingress controller pod is running on the new node:

```bash
kubectl get pods -n ingress-nginx -o wide
```

Look for a pod whose `NODE` column shows your new node. The DaemonSet should have scheduled it automatically.

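Scanning the `-o wide` output by eye becomes error-prone as the node count grows. The sketch below is one possible way to list nodes that have no pod in the `ingress-nginx` namespace. The function names `nodes_without_pod` and `check_ingress_coverage` are hypothetical helpers, not kubectl features, and the check assumes that any pod in that namespace counts as a controller pod.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every node name from list $1 that is absent from list $2.
# Both arguments are newline-separated strings.
nodes_without_pod() {
  printf '%s\n' "$1" | while IFS= read -r node; do
    [ -n "$node" ] || continue
    # -x matches the whole line, -F treats the node name as a literal string
    printf '%s\n' "$2" | grep -qxF -- "$node" || printf '%s\n' "$node"
  done
}

# Hypothetical wrapper that feeds live cluster data into the helper above.
check_ingress_coverage() {
  all_nodes=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')
  pod_nodes=$(kubectl get pods -n ingress-nginx -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}')
  nodes_without_pod "$all_nodes" "$pod_nodes"
}
```

Running `check_ingress_coverage` against the cluster prints nothing when every node is covered. If the namespace also holds other pods, adding a label selector to the `kubectl get pods` call would make the check stricter.
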
### 3. Update OpenTelemetry Collector Configuration

**File to modify**: `manifests/infrastructure/openobserve-collector/gateway-collector.yaml`

**Current configuration** (lines 217-219):
```yaml
- job_name: 'nginx-ingress'
  static_configs:
    - targets: ['<NODE_1_EXTERNAL_IP>:10254', '<NODE_2_EXTERNAL_IP>:10254', '<NODE_3_EXTERNAL_IP>:10254']
```

**Add the new node IP** to the targets list:
```yaml
- job_name: 'nginx-ingress'
  static_configs:
    - targets: ['<NODE_1_EXTERNAL_IP>:10254', '<NODE_2_EXTERNAL_IP>:10254', '<NODE_3_EXTERNAL_IP>:10254', 'NEW_NODE_IP:10254']
```

Replace `NEW_NODE_IP` with the actual IP address of your new node.

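Before applying, it can help to confirm that the edit actually landed and that no placeholder was left behind. This is a minimal sketch, assuming targets are written as single-quoted `'IP:10254'` strings like the existing entries; `check_targets` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that manifest $1 lists node IP $2 as a scrape target.
check_targets() {
  file=$1
  ip=$2
  # A leftover placeholder means the example was pasted without substituting the IP.
  if grep -qF -- "NEW_NODE_IP" "$file"; then
    echo "placeholder NEW_NODE_IP still present in $file" >&2
    return 1
  fi
  # The surrounding quote and the port keep 10.0.0.1 from matching 110.0.0.1 or 10.0.0.11.
  if ! grep -qF -- "'${ip}:10254'" "$file"; then
    echo "target ${ip}:10254 not found in $file" >&2
    return 1
  fi
  echo "target ${ip}:10254 present"
}
```

A call such as `check_targets manifests/infrastructure/openobserve-collector/gateway-collector.yaml <new node IP>` returns non-zero if either check fails, so it can gate the apply step.
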
### 4. Update Host Firewall Policies (if applicable)

**File to check**: `manifests/infrastructure/cluster-policies/host-fw-worker-nodes.yaml`

Ensure the host firewall allows access to the nginx metrics port (this should already be configured):
```yaml
# NGINX Ingress Controller metrics port
- fromEntities:
    - cluster
  toPorts:
    - ports:
        - port: "10254"
          protocol: "TCP" # NGINX Ingress metrics
```

### 5. Apply the Configuration Changes

```bash
# Apply the updated collector configuration
kubectl apply -f manifests/infrastructure/openobserve-collector/gateway-collector.yaml

# Restart the collector to pick up the new configuration
kubectl rollout restart statefulset/openobserve-collector-gateway-collector -n openobserve-collector
```

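The restart command returns immediately, so a failed rollout can go unnoticed. One possible wrapper previews the change, applies it, and then blocks until the StatefulSet reports ready. `apply_collector_config` is a hypothetical function name and the 180-second timeout is an arbitrary assumption; the manifest path, namespace, and StatefulSet name are the ones used in this guide.

```bash
#!/usr/bin/env bash
MANIFEST=manifests/infrastructure/openobserve-collector/gateway-collector.yaml
NAMESPACE=openobserve-collector
STATEFULSET=statefulset/openobserve-collector-gateway-collector

# Hypothetical wrapper around the two commands above.
apply_collector_config() {
  # kubectl diff exits with status 1 when differences exist, so that is not a failure here.
  # Note that "|| true" also hides genuine diff errors; this step is only a preview.
  kubectl diff -f "$MANIFEST" || true
  kubectl apply -f "$MANIFEST" || return 1
  kubectl rollout restart "$STATEFULSET" -n "$NAMESPACE" || return 1
  # Block until the restarted pods are ready, or give up after the timeout.
  kubectl rollout status "$STATEFULSET" -n "$NAMESPACE" --timeout=180s
}
```

Calling `apply_collector_config` from the repository root returns non-zero if the apply, restart, or rollout wait fails.
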
### 6. Verification Steps

1. **Check that the nginx pod is running on the new node**:
   ```bash
   kubectl get pods -n ingress-nginx -o wide | grep NEW_NODE_NAME
   ```

2. **Verify the metrics endpoint is accessible**:
   ```bash
   curl -s http://NEW_NODE_IP:10254/metrics | grep nginx_ingress_controller_requests | head -3
   ```

3. **Check collector logs for the new target**:
   ```bash
   kubectl logs -n openobserve-collector openobserve-collector-gateway-collector-0 --tail=50 | grep -i nginx
   ```

4. **Verify target discovery**:
   Look for log entries like:
   ```
   Scrape job added {"jobName": "nginx-ingress"}
   ```

5. **Test metrics in OpenObserve**:
   Your dashboard query should now include metrics from the new node:
   ```promql
   sum(increase(nginx_ingress_controller_requests[5m])) by (host)
   ```

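The `grep` in verification step 2 matches any line containing the metric name, including `# HELP` and `# TYPE` comment lines. To count actual request series instead, one option is to match only lines that begin with the metric name followed by `{`. This is a sketch with a hypothetical helper name, `count_request_series`; a controller that has not served any traffic yet may report zero series even when the endpoint is healthy.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# the number of nginx_ingress_controller_requests series (comment lines excluded).
count_request_series() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps that from looking like an error.
  grep -c '^nginx_ingress_controller_requests{' || true
}
```

Usage follows the same pattern as step 2: `curl -s http://NEW_NODE_IP:10254/metrics | count_request_series`.
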
## Important Notes

### Automatic vs Manual Configuration

- ✅ **Automatic**: Nginx ingress controller deployment (DaemonSet handles this)
- ✅ **Automatic**: ServiceMonitor discovery (target allocator handles this)
- ❌ **Manual**: Static scrape configuration (requires updating the targets list)

### Why Both ServiceMonitor and Static Config?

The current setup uses **both approaches** for redundancy:
1. **ServiceMonitor**: Automatically discovers nginx ingress services
2. **Static Configuration**: Ensures specific node IPs are always monitored

### Network Requirements

- Port **10254** must be accessible from the OpenTelemetry collector pods
- The new node should be on the same network as existing nodes
- Host firewall policies should allow metrics collection

### Monitoring Best Practices

- Always verify metrics are flowing after adding a node
- Test your dashboard queries to ensure the new node's metrics appear
- Monitor collector logs for any scraping errors

## Troubleshooting

### Common Issues

1. **Nginx pod not starting**: Check node labels and taints
2. **Metrics endpoint not accessible**: Verify network connectivity and firewall rules
3. **Collector not scraping**: Check collector logs and restart the collector if needed
4. **Missing metrics in dashboard**: Wait 30-60 seconds for metrics to propagate, then check again

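For the first issue, the node fields that affect scheduling can be printed directly. This is a minimal sketch: `inspect_node` is a hypothetical helper, and its argument is the node's name as shown by `kubectl get nodes`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show the taints and labels of node $1, which decide
# whether the DaemonSet's tolerations and node selector admit a pod there.
inspect_node() {
  node=$1
  echo "Taints on ${node}:"
  kubectl get node "$node" -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
  echo "Labels on ${node}:"
  kubectl get node "$node" --show-labels
}
```

Comparing the output of `inspect_node NEW_NODE_NAME` with the tolerations and node selector in the ingress DaemonSet spec usually shows why a pod was not scheduled.
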
### Useful Commands

```bash
# Check nginx ingress pods
kubectl get pods -n ingress-nginx -o wide

# Test metrics endpoint
curl -s http://NODE_IP:10254/metrics | grep nginx_ingress_controller_requests

# Check collector status
kubectl get pods -n openobserve-collector

# View collector logs
kubectl logs -n openobserve-collector openobserve-collector-gateway-collector-0 --tail=50

# Check ServiceMonitor
kubectl get servicemonitor -n ingress-nginx -o yaml
```

## Configuration Files Summary

Files that may need updates when adding a node:

1. **Required**: `manifests/infrastructure/openobserve-collector/gateway-collector.yaml`
   - Update static targets list (line ~219)

2. **Optional**: `manifests/infrastructure/cluster-policies/host-fw-worker-nodes.yaml`
   - Usually already configured for port 10254

3. **Automatic**: `manifests/infrastructure/ingress-nginx/ingress-nginx.yaml`
   - No changes needed (DaemonSet handles deployment)