add source code and readme

2025-12-24 14:35:17 +01:00
parent 7c92e1e610
commit 74324d5a1b
331 changed files with 39272 additions and 1 deletions

docs/NODE-ADDITION-GUIDE.md Normal file

@@ -0,0 +1,174 @@
# Adding a New Node for Nginx Ingress Metrics Collection
This guide documents the steps required to add a new node to the cluster and ensure nginx ingress controller metrics are properly collected from it.
## Overview
The nginx ingress controller is deployed as a **DaemonSet**, so Kubernetes automatically schedules one controller pod on each eligible node. Metrics collection is not fully automatic, however: the static scrape configuration must be updated by hand for every new node.
## Current Configuration
Currently, the cluster has 3 nodes with metrics collection configured for:
- **n1 (<NODE_1_EXTERNAL_IP>)**: Control plane + worker
- **n2 (<NODE_2_EXTERNAL_IP>)**: Worker
- **n3 (<NODE_3_EXTERNAL_IP>)**: Worker
## Steps to Add a New Node
### 1. Add the Node to Kubernetes Cluster
Follow your standard node addition process (this is outside the scope of this guide). Ensure the new node:
- Is properly joined to the cluster
- Has the nginx ingress controller pod deployed (should happen automatically due to DaemonSet)
- Is accessible on the cluster network
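
The first two conditions above can be checked from any workstation with cluster access. The following is a minimal sketch, not a definitive procedure: `check_new_node` is a hypothetical helper name, and the 120-second timeout is an arbitrary assumption.

```bash
#!/usr/bin/env bash
# Sketch: confirm a newly joined node is Ready and runs an ingress pod.
# check_new_node is a hypothetical helper name; pass the node's Kubernetes name.
check_new_node() {
  local node="$1"

  # Block until the node reports the Ready condition (or time out after 2 minutes).
  kubectl wait --for=condition=Ready "node/${node}" --timeout=120s || return 1

  # List only the ingress-nginx pods scheduled on that node.
  kubectl get pods -n ingress-nginx -o wide \
    --field-selector "spec.nodeName=${node}"
}

# Usage: check_new_node NEW_NODE_NAME
```

An empty pod list from the second command means the DaemonSet has not yet scheduled a pod on the node.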
### 2. Verify Nginx Ingress Controller Deployment
Check that the nginx ingress controller pod is running on the new node:
```bash
kubectl get pods -n ingress-nginx -o wide
```
Look for a pod whose `NODE` column shows your new node. The DaemonSet should have created it automatically.
### 3. Update OpenTelemetry Collector Configuration
**File to modify**: `manifests/infrastructure/openobserve-collector/gateway-collector.yaml`
**Current configuration** (lines 217-219):
```yaml
- job_name: 'nginx-ingress'
  static_configs:
    - targets: ['<NODE_1_EXTERNAL_IP>:10254', '<NODE_2_EXTERNAL_IP>:10254', '<NODE_3_EXTERNAL_IP>:10254']
```
**Add the new node IP** to the targets list:
```yaml
- job_name: 'nginx-ingress'
  static_configs:
    - targets: ['<NODE_1_EXTERNAL_IP>:10254', '<NODE_2_EXTERNAL_IP>:10254', '<NODE_3_EXTERNAL_IP>:10254', 'NEW_NODE_IP:10254']
```
Replace `NEW_NODE_IP` with the actual IP address of your new node.
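
Hand-editing a long inline list is easy to get wrong (a missing quote or comma breaks the YAML). As a minimal sketch, the targets line can be generated from a list of IPs instead. `build_targets` is a hypothetical helper name and the IPs in the usage comment are documentation placeholders; only the port `10254` comes from this guide.

```bash
#!/usr/bin/env bash
# Sketch: print a static_configs targets line for a list of node IPs.
# build_targets is a hypothetical helper; 10254 is the metrics port used in this guide.
build_targets() {
  local port=10254 joined="" ip
  for ip in "$@"; do
    # Append a comma separator before every entry except the first.
    joined+="${joined:+, }'${ip}:${port}'"
  done
  printf -- "- targets: [%s]\n" "$joined"
}

# Usage (IPs are placeholders):
#   build_targets 192.0.2.11 192.0.2.12
#   prints: - targets: ['192.0.2.11:10254', '192.0.2.12:10254']
```

The printed line still has to be pasted into the manifest at the correct indentation.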
### 4. Update Host Firewall Policies (if applicable)
**File to check**: `manifests/infrastructure/cluster-policies/host-fw-worker-nodes.yaml`
Ensure the firewall allows nginx metrics port access (should already be configured):
```yaml
# NGINX Ingress Controller metrics port
- fromEntities:
    - cluster
  toPorts:
    - ports:
        - port: "10254"
          protocol: "TCP" # NGINX Ingress metrics
```
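
A quick way to confirm the rule is present without opening the file is a text search. This is a rough sketch: `policy_has_port` is a hypothetical helper, and it only matches text, so it cannot tell whether the rule sits in the right policy section.

```bash
#!/usr/bin/env bash
# Sketch: report whether a policy manifest contains a rule for a given port.
# policy_has_port is a hypothetical helper; it only does a text match and
# does not evaluate the policy semantics.
policy_has_port() {
  local file="$1" port="$2"
  grep -Eq "port:[[:space:]]*\"${port}\"" "$file"
}

# Usage:
#   policy_has_port manifests/infrastructure/cluster-policies/host-fw-worker-nodes.yaml 10254 \
#     && echo "port 10254 rule present" || echo "port 10254 rule missing"
```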
### 5. Apply the Configuration Changes
```bash
# Apply the updated collector configuration
kubectl apply -f manifests/infrastructure/openobserve-collector/gateway-collector.yaml
# Restart the collector to pick up the new configuration
kubectl rollout restart statefulset/openobserve-collector-gateway-collector -n openobserve-collector
```
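
The two commands above return as soon as the request is accepted, not when the collector is actually back up. A minimal sketch that previews the change and waits for the rollout to finish is shown here; `apply_collector_config` is a hypothetical helper name and the 180-second timeout is an assumption, while the file path, StatefulSet name, and namespace are taken from this guide.

```bash
#!/usr/bin/env bash
# Sketch: preview, apply, and wait for the collector rollout.
# apply_collector_config is a hypothetical helper name.
apply_collector_config() {
  local manifest="manifests/infrastructure/openobserve-collector/gateway-collector.yaml"
  local ns="openobserve-collector"
  local sts="statefulset/openobserve-collector-gateway-collector"

  # Show what would change. kubectl diff exits 1 when differences exist,
  # so only exit codes above 1 are treated as errors.
  kubectl diff -f "$manifest"
  [ $? -le 1 ] || return 1

  kubectl apply -f "$manifest" || return 1
  kubectl rollout restart "$sts" -n "$ns" || return 1

  # Block until the restarted pods are ready (or time out after 3 minutes).
  kubectl rollout status "$sts" -n "$ns" --timeout=180s
}

# Usage: apply_collector_config
```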
### 6. Verification Steps
1. **Check that the nginx pod is running on the new node**:
```bash
kubectl get pods -n ingress-nginx -o wide | grep NEW_NODE_NAME
```
2. **Verify metrics endpoint is accessible**:
```bash
curl -s http://NEW_NODE_IP:10254/metrics | grep nginx_ingress_controller_requests | head -3
```
3. **Check collector logs for the new target**:
```bash
kubectl logs -n openobserve-collector openobserve-collector-gateway-collector-0 --tail=50 | grep -i nginx
```
4. **Verify target discovery**:
Look for log entries like:
```
Scrape job added {"jobName": "nginx-ingress"}
```
5. **Test metrics in OpenObserve**:
Your dashboard query should now include requests handled by the controller pod on the new node:
```promql
sum(increase(nginx_ingress_controller_requests[5m])) by (host)
```
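
The endpoint check from step 2 can be repeated across every node in one pass. The sketch below assumes the node IPs are passed as arguments; `check_metrics_endpoints` is a hypothetical helper name and the 5-second timeout is an arbitrary choice. It matches on the `nginx_ingress_controller_` prefix of the metric named in this guide.

```bash
#!/usr/bin/env bash
# Sketch: probe the metrics endpoint on each node and report the result.
# check_metrics_endpoints is a hypothetical helper; pass node IPs as arguments.
check_metrics_endpoints() {
  local ip failed=0
  for ip in "$@"; do
    # -f makes curl fail on HTTP errors; --max-time bounds a hung connection.
    if curl -sf --max-time 5 "http://${ip}:10254/metrics" \
        | grep -q '^nginx_ingress_controller_'; then
      echo "OK   ${ip}"
    else
      echo "FAIL ${ip}"
      failed=1
    fi
  done
  # Non-zero exit status if any node failed the probe.
  return "$failed"
}

# Usage: check_metrics_endpoints NODE_1_IP NODE_2_IP NODE_3_IP NEW_NODE_IP
```

Run it from a host that the firewall policy allows to reach port 10254, otherwise every node will report `FAIL`.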
## Important Notes
### Automatic vs Manual Configuration
- ✅ **Automatic**: Nginx ingress controller deployment (DaemonSet handles this)
- ✅ **Automatic**: ServiceMonitor discovery (target allocator handles this)
- ❌ **Manual**: Static scrape configuration (requires updating the targets list)
### Why Both ServiceMonitor and Static Config?
The current setup uses **both approaches** for redundancy:
1. **ServiceMonitor**: Automatically discovers nginx ingress services
2. **Static Configuration**: Ensures specific node IPs are always monitored
### Network Requirements
- Port **10254** must be accessible from the OpenTelemetry collector pods
- The new node must be reachable from the existing nodes over the cluster network
- Host firewall policies should allow metrics collection
### Monitoring Best Practices
- Always verify metrics are flowing after adding a node
- Test your dashboard queries to ensure the new node's metrics appear
- Monitor collector logs for any scraping errors
## Troubleshooting
### Common Issues
1. **Nginx pod not starting**: Check node labels and taints
2. **Metrics endpoint not accessible**: Verify network connectivity and firewall rules
3. **Collector not scraping**: Check collector logs and restart if needed
4. **Missing metrics in dashboard**: Wait 30-60 seconds for metrics to propagate
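
For the first issue, the node's taints and labels and the DaemonSet's pod counts usually show why a pod was not scheduled. A minimal sketch follows; `diagnose_node_scheduling` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: gather scheduling details when no ingress pod appears on a node.
# diagnose_node_scheduling is a hypothetical helper; pass the node's name.
diagnose_node_scheduling() {
  local node="$1"

  # Taints can keep DaemonSet pods off a node unless the pod spec tolerates them.
  kubectl get node "$node" -o jsonpath='{.spec.taints}'
  echo

  # Labels matter if the DaemonSet uses a nodeSelector or node affinity.
  kubectl get node "$node" --show-labels

  # DESIRED vs READY shows how many nodes the DaemonSet targets and serves.
  kubectl get daemonset -n ingress-nginx
}

# Usage: diagnose_node_scheduling NEW_NODE_NAME
```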
### Useful Commands
```bash
# Check nginx ingress pods
kubectl get pods -n ingress-nginx -o wide
# Test metrics endpoint
curl -s http://NODE_IP:10254/metrics | grep nginx_ingress_controller_requests
# Check collector status
kubectl get pods -n openobserve-collector
# View collector logs
kubectl logs -n openobserve-collector openobserve-collector-gateway-collector-0 --tail=50
# Check ServiceMonitor
kubectl get servicemonitor -n ingress-nginx -o yaml
```
## Configuration Files Summary
Files that may need updates when adding a node:
1. **Required**: `manifests/infrastructure/openobserve-collector/gateway-collector.yaml`
- Update static targets list (line ~219)
2. **Optional**: `manifests/infrastructure/cluster-policies/host-fw-worker-nodes.yaml`
- Usually already configured for port 10254
3. **Automatic**: `manifests/infrastructure/ingress-nginx/ingress-nginx.yaml`
- No changes needed (DaemonSet handles deployment)