add source code and readme
docs/NODE-ADDITION-GUIDE.md
# Adding a New Node for Nginx Ingress Metrics Collection

This guide documents the steps required to add a new node to the cluster and ensure that nginx ingress controller metrics are collected from it.

## Overview

The nginx ingress controller is deployed as a **DaemonSet**, so Kubernetes automatically schedules one controller pod on each eligible node. Metrics collection is not fully automatic, however: the additional configuration steps in this guide are required before the new node's metrics are scraped.

## Current Configuration

The cluster currently has three nodes, all with metrics collection configured:
- **n1 (<NODE_1_EXTERNAL_IP>)**: Control plane + worker
- **n2 (<NODE_2_EXTERNAL_IP>)**: Worker
- **n3 (<NODE_3_EXTERNAL_IP>)**: Worker

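When preparing a new targets list, it can help to compare the addresses above with what the cluster itself reports. The following is a minimal sketch rather than part of the existing setup: `list_node_external_ips` and `format_targets` are hypothetical helper names, and the sketch assumes every node publishes an `ExternalIP` address in its status.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not taken from the repository).

# Print one ExternalIP per node. Assumes nodes report an ExternalIP address.
list_node_external_ips() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="ExternalIP")].address}{"\n"}{end}'
}

# Read IPs from stdin and print them as a flow-style scrape target list.
# Optional first argument: metrics port (defaults to 10254).
format_targets() {
  local port="${1:-10254}" out="" ip
  while IFS= read -r ip || [ -n "$ip" ]; do
    [ -n "$ip" ] || continue   # skip blank lines, e.g. nodes without an ExternalIP
    out="${out:+$out, }'${ip}:${port}'"
  done
  printf '[%s]\n' "$out"
}

# Usage (requires cluster access):
#   list_node_external_ips | format_targets
```

The output has the same shape as the `targets` list used by the static scrape configuration, so it can be compared against that list by eye.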
## Steps to Add a New Node

### 1. Add the Node to Kubernetes Cluster

Follow your standard node addition process, which is outside the scope of this guide. Ensure that the new node:
- Is properly joined to the cluster
- Has the nginx ingress controller pod deployed (this should happen automatically because of the DaemonSet)
- Is accessible on the cluster network

### 2. Verify Nginx Ingress Controller Deployment

Check that the nginx ingress controller pod is running on the new node:

```bash
kubectl get pods -n ingress-nginx -o wide
```

Look for a pod whose `NODE` column shows your new node. The DaemonSet should have scheduled it automatically.

### 3. Update OpenTelemetry Collector Configuration

**File to modify**: `manifests/infrastructure/openobserve-collector/gateway-collector.yaml`

**Current configuration** (lines 217-219):
```yaml
- job_name: 'nginx-ingress'
  static_configs:
    - targets: ['<NODE_1_EXTERNAL_IP>:10254', '<NODE_2_EXTERNAL_IP>:10254', '<NODE_3_EXTERNAL_IP>:10254']
```

**Add the new node IP** to the targets list:
```yaml
- job_name: 'nginx-ingress'
  static_configs:
    - targets: ['<NODE_1_EXTERNAL_IP>:10254', '<NODE_2_EXTERNAL_IP>:10254', '<NODE_3_EXTERNAL_IP>:10254', 'NEW_NODE_IP:10254']
```

Replace `NEW_NODE_IP` with the actual IP address of your new node. The existing entries use each node's external IP, so use the same kind of address for the new one.

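The edit above can also be scripted. The sketch below assumes the targets are kept on a single flow-style line exactly as shown; `add_nginx_target` is a hypothetical helper name, not something that exists in the repository. It leaves a `.bak` copy of the file next to the original, and the resulting diff should still be reviewed before applying.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append '<ip>:<port>' to the single-line targets list.
# Assumes the list is written as: - targets: ['a:10254', 'b:10254']
add_nginx_target() {
  local file="$1" ip="$2" port="${3:-10254}"

  # Do nothing if the target is already listed (keeps the helper idempotent).
  if grep -qF "'${ip}:${port}'" "$file"; then
    echo "already present: ${ip}:${port}"
    return 0
  fi

  # On the targets line that mentions the metrics port, insert the new entry
  # before the closing bracket. -i.bak keeps a backup of the original file.
  sed -i.bak -E "/- targets: \[.*:${port}'/ s/\][[:space:]]*\$/, '${ip}:${port}']/" "$file"

  # Succeed only if the new target is now in the file.
  grep -qF "'${ip}:${port}'" "$file"
}

# Usage:
#   add_nginx_target manifests/infrastructure/openobserve-collector/gateway-collector.yaml NEW_NODE_IP
```

If the targets list is ever reformatted across several lines, this pattern no longer matches and the helper returns a non-zero status instead of editing the file.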
### 4. Update Host Firewall Policies (if applicable)

**File to check**: `manifests/infrastructure/cluster-policies/host-fw-worker-nodes.yaml`

Confirm that the host firewall policy allows access to the nginx metrics port. A rule like the following should already be present:
```yaml
# NGINX Ingress Controller metrics port
- fromEntities:
    - cluster
  toPorts:
    - ports:
        - port: "10254"
          protocol: "TCP" # NGINX Ingress metrics
```

### 5. Apply the Configuration Changes

```bash
# Apply the updated collector configuration
kubectl apply -f manifests/infrastructure/openobserve-collector/gateway-collector.yaml

# Restart the collector to pick up the new configuration
kubectl rollout restart statefulset/openobserve-collector-gateway-collector -n openobserve-collector
```

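`kubectl rollout restart` returns as soon as the restart has been requested, not when the new collector pod is ready. A small sketch for waiting on the rollout before moving on to verification follows. `wait_for_collector` is a hypothetical helper name, the 120s default timeout is an arbitrary assumption, and the sketch assumes the StatefulSet uses the `RollingUpdate` update strategy, which `kubectl rollout status` requires.

```bash
#!/usr/bin/env bash
# Hypothetical helper: block until the collector StatefulSet has rolled out.
# Optional first argument: timeout (defaults to 120s, an arbitrary choice).
wait_for_collector() {
  local timeout="${1:-120s}"
  kubectl rollout status statefulset/openobserve-collector-gateway-collector \
    -n openobserve-collector --timeout="$timeout"
}

# Usage (requires cluster access):
#   wait_for_collector        # wait up to 120s
#   wait_for_collector 5m     # wait up to five minutes
```

A non-zero exit status means the rollout did not finish within the timeout, which is a good cue to inspect the collector pod before checking metrics.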
### 6. Verification Steps

1. **Check that the nginx pod is running on the new node**:
   ```bash
   kubectl get pods -n ingress-nginx -o wide | grep NEW_NODE_NAME
   ```

2. **Verify that the metrics endpoint is accessible** (run this from a host the firewall allows, such as another cluster node):
   ```bash
   curl -s http://NEW_NODE_IP:10254/metrics | grep nginx_ingress_controller_requests | head -3
   ```

3. **Check collector logs for the new target**:
   ```bash
   kubectl logs -n openobserve-collector openobserve-collector-gateway-collector-0 --tail=50 | grep -i nginx
   ```

4. **Verify target discovery**:
   Look for log entries like:
   ```
   Scrape job added {"jobName": "nginx-ingress"}
   ```

5. **Test metrics in OpenObserve**:
   Your dashboard query should now include metrics from the new node:
   ```promql
   sum(increase(nginx_ingress_controller_requests[5m])) by (host)
   ```

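The endpoint check in step 2 can be made slightly stricter by counting the request series the endpoint returns instead of reading the output by eye. This is a sketch under assumptions: `count_request_series` and `check_node_metrics` are hypothetical helper names, and the 5-second curl timeout is arbitrary. A count of zero may simply mean the pod on that node has not served any traffic yet.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking a node's metrics endpoint.

# Count sample lines for the request counter in metrics text read from stdin.
# Comment lines (# HELP / # TYPE) and similarly named metrics are not counted.
count_request_series() {
  grep -c '^nginx_ingress_controller_requests[{ ]' || true
}

# Fetch the metrics from one node and report how many request series it exposes.
# Returns non-zero when the endpoint is unreachable or exposes no such series.
check_node_metrics() {
  local ip="$1" port="${2:-10254}" n
  n="$(curl -sf --max-time 5 "http://${ip}:${port}/metrics" | count_request_series)"
  echo "${ip}:${port} exposes ${n} nginx_ingress_controller_requests series"
  [ "$n" -gt 0 ]
}

# Usage (from a host that can reach the node on port 10254):
#   check_node_metrics NEW_NODE_IP
```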
## Important Notes

### Automatic vs Manual Configuration

- ✅ **Automatic**: Nginx ingress controller deployment (DaemonSet handles this)
- ✅ **Automatic**: ServiceMonitor discovery (target allocator handles this)
- ❌ **Manual**: Static scrape configuration (requires updating the targets list)

### Why Both ServiceMonitor and Static Config?

The current setup uses **both approaches** for redundancy:
1. **ServiceMonitor**: Automatically discovers nginx ingress services
2. **Static Configuration**: Ensures specific node IPs are always monitored

### Network Requirements

- Port **10254** must be accessible from the OpenTelemetry collector pods
- The new node should be on the same network as existing nodes
- Host firewall policies should allow metrics collection

### Monitoring Best Practices

- Always verify metrics are flowing after adding a node
- Test your dashboard queries to ensure the new node's metrics appear
- Monitor collector logs for any scraping errors

## Troubleshooting

### Common Issues

1. **Nginx pod not starting**: Check that the node's labels and taints are compatible with the DaemonSet's node selector and tolerations
2. **Metrics endpoint not accessible**: Verify network connectivity and firewall rules for port 10254
3. **Collector not scraping**: Check the collector logs for scrape errors and restart the collector if needed
4. **Missing metrics in dashboard**: Wait 30-60 seconds for metrics to propagate, then run the query again

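For the first issue, the node's taints and the DaemonSet's scheduling counts are usually the quickest things to inspect. The helpers below are a sketch with hypothetical names (`show_node_taints`, `show_ingress_daemonsets`). The DaemonSet name is not given in this guide, so the second helper lists every DaemonSet in the `ingress-nginx` namespace.

```bash
#!/usr/bin/env bash
# Hypothetical troubleshooting helpers (names are illustrative).

# Print each taint on a node as key=value:effect, one per line.
# Prints nothing when the node has no taints.
show_node_taints() {
  kubectl get node "$1" \
    -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
}

# Show desired vs. ready pod counts for DaemonSets in the ingress namespace.
show_ingress_daemonsets() {
  kubectl get daemonset -n ingress-nginx -o wide
}

# Usage (requires cluster access):
#   show_node_taints NEW_NODE_NAME
#   show_ingress_daemonsets
```

If the desired count did not increase after the node joined, the DaemonSet's node selector or tolerations are the likely cause rather than the pod itself.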
### Useful Commands

```bash
# Check nginx ingress pods
kubectl get pods -n ingress-nginx -o wide

# Test metrics endpoint
curl -s http://NODE_IP:10254/metrics | grep nginx_ingress_controller_requests

# Check collector status
kubectl get pods -n openobserve-collector

# View collector logs
kubectl logs -n openobserve-collector openobserve-collector-gateway-collector-0 --tail=50

# Check ServiceMonitor
kubectl get servicemonitor -n ingress-nginx -o yaml
```

## Configuration Files Summary

Files that may need updates when adding a node:

1. **Required**: `manifests/infrastructure/openobserve-collector/gateway-collector.yaml`
   - Update static targets list (line ~219)

2. **Optional**: `manifests/infrastructure/cluster-policies/host-fw-worker-nodes.yaml`
   - Usually already configured for port 10254

3. **Automatic**: `manifests/infrastructure/ingress-nginx/ingress-nginx.yaml`
   - No changes needed (DaemonSet handles deployment)