# Kubernetes Metrics Server
## Overview
This deploys the Kubernetes Metrics Server to provide resource metrics for nodes and pods. The metrics server enables `kubectl top` commands and provides metrics for Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaling (VPA).
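For illustration, a minimal HorizontalPodAutoscaler that consumes the resource metrics this server provides could look like the following sketch. The `example-app` Deployment name and the utilization target are placeholders, not part of this repository:

```yaml
# Illustrative only: assumes a Deployment named "example-app" exists in the target namespace.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # placeholder target; tune per workload
```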
## Architecture
### Current Deployment (Simple)
- **Version**: v0.7.2 (latest stable)
- **Replicas**: 2 (HA across both cluster nodes)
- **TLS Mode**: Insecure TLS for initial deployment (`--kubelet-insecure-tls=true`)
- **Integration**: OpenObserve monitoring via ServiceMonitor
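The `--kubelet-insecure-tls=true` flag is passed as a container argument. A trimmed excerpt of what the container spec looks like with that flag (other arguments and fields omitted; the exact flag set in this repository may differ):

```yaml
# Excerpt sketch of the metrics-server container spec; not the full manifest.
containers:
  - name: metrics-server
    image: registry.k8s.io/metrics-server/metrics-server:v0.7.2
    args:
      - --kubelet-insecure-tls=true                    # skip kubelet cert verification (Talos compatibility)
      - --kubelet-preferred-address-types=InternalIP   # reach kubelets via their internal IPs
```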
### Security Configuration
The current deployment uses `--kubelet-insecure-tls=true` for compatibility with Talos Linux. This is acceptable for internal cluster metrics because:
- Metrics traffic stays within the cluster network
- The VLAN provides network isolation
- No sensitive data is exposed via metrics
- Proper RBAC controls access to the metrics API
### Future Enhancements (Optional)
For production hardening, the repository includes:
- `certificate.yaml`: cert-manager certificates for proper TLS
- `metrics-server.yaml`: Full TLS-enabled deployment
- To switch to secure TLS, update `kustomization.yaml` to reference these files when needed (see the sketch after this list)
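A hedged sketch of what `kustomization.yaml` might look like after that switch; the actual resource list and namespace handling in this repository may differ:

```yaml
# Sketch of manifests/infrastructure/metrics-server/kustomization.yaml after enabling TLS.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: metrics-server-system
resources:
  - certificate.yaml       # cert-manager Certificate for the serving cert
  - metrics-server.yaml    # full TLS-enabled deployment
```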
## Usage
### Basic Commands
```bash
# View node resource usage
kubectl top nodes
# View pod resource usage (all namespaces)
kubectl top pods --all-namespaces
# View pod resource usage (specific namespace)
kubectl top pods -n kube-system
# View pod resource usage with containers
kubectl top pods --containers
```
### Integration with Monitoring
The metrics server is automatically discovered by OpenObserve via a ServiceMonitor, which enables:
- Metrics server performance monitoring
- Resource usage dashboards
- Alerting on high resource consumption
## Troubleshooting
### Common Issues
1. **"Metrics API not available"**: Check pod status with `kubectl get pods -n metrics-server-system`
2. **TLS certificate errors**: Verify APIService with `kubectl get apiservice v1beta1.metrics.k8s.io`
3. **Resource limits**: Pods may be OOMKilled if cluster load is high; check for restarts with `kubectl get pods -n metrics-server-system` and raise the memory limit if this recurs
### Verification
```bash
# Check metrics server status
kubectl get pods -n metrics-server-system
# Verify API registration
kubectl get apiservice v1beta1.metrics.k8s.io
# Test metrics collection
kubectl top nodes
kubectl top pods -n metrics-server-system
```
## Configuration
### Resource Requests/Limits
- **CPU**: 100m request, 500m limit
- **Memory**: 200Mi request, 500Mi limit
- **Priority**: system-cluster-critical
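In the Deployment's pod spec, these settings correspond to a block roughly like this excerpt (fields shown in isolation):

```yaml
# Excerpt sketch of the resource and priority settings described above.
priorityClassName: system-cluster-critical
containers:
  - name: metrics-server
    resources:
      requests:
        cpu: 100m
        memory: 200Mi
      limits:
        cpu: 500m
        memory: 500Mi
```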
### Node Scheduling
- Tolerates control plane taints
- Can schedule on both n1 (control plane) and n2 (worker)
- Uses node selector for Linux nodes only
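Expressed in the pod spec, that scheduling policy looks roughly like the following sketch; the exact toleration key in the repository's manifest may differ:

```yaml
# Sketch of the scheduling settings described above.
nodeSelector:
  kubernetes.io/os: linux
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
```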
## Monitoring Integration
- **ServiceMonitor**: Automatically scraped by OpenObserve
- **Metrics Path**: `/metrics` on HTTPS port
- **Scrape Interval**: 30 seconds
- **Dashboard**: Available in OpenObserve for resource analysis
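Put together, a ServiceMonitor matching these parameters might look like the sketch below; the selector labels and endpoint port name are assumptions, and the repository's actual object may differ:

```yaml
# Hedged sketch of a ServiceMonitor for metrics-server; selector labels are assumed.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: metrics-server
  namespace: metrics-server-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: metrics-server   # assumed Service label
  endpoints:
    - port: https          # assumed Service port name
      scheme: https
      path: /metrics
      interval: 30s
      tlsConfig:
        insecureSkipVerify: true   # matches the insecure-TLS posture of the current deployment
```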