# Kubernetes Metrics Server

## Overview

This deploys the Kubernetes Metrics Server to provide resource metrics for nodes and pods. The Metrics Server enables the `kubectl top` commands and supplies the metrics used by Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaling (VPA).

## Architecture

### Current Deployment (Simple)

- **Version**: v0.7.2 (latest stable)
- **Replicas**: 2 (HA across both cluster nodes)
- **TLS Mode**: Insecure TLS for initial deployment (`--kubelet-insecure-tls=true`)
- **Integration**: OpenObserve monitoring via ServiceMonitor

### Security Configuration

The current deployment uses `--kubelet-insecure-tls=true` for compatibility with Talos Linux. With this flag, the Metrics Server does not verify the serving certificates presented by the kubelets. This is acceptable for internal cluster metrics because:

- Metrics traffic stays within the cluster network
- The VLAN provides network isolation
- No sensitive data is exposed via metrics
- RBAC controls access to the metrics API

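To confirm which kubelet TLS mode a running deployment actually uses, the container arguments can be inspected. The sketch below assumes the Deployment is named `metrics-server` (an assumption, the name is not stated in this README) and lives in the `metrics-server-system` namespace used by the troubleshooting commands; the function names are illustrative.

```bash
# Assumption: the Deployment is called "metrics-server".
# Print the container arguments of the running Metrics Server.
show_metrics_server_args() {
  kubectl get deployment metrics-server \
    -n metrics-server-system \
    -o jsonpath='{.spec.template.spec.containers[0].args}'
}

# Exit status 0 if the insecure kubelet TLS flag is among the arguments.
uses_insecure_kubelet_tls() {
  show_metrics_server_args | grep -q -- '--kubelet-insecure-tls'
}
```

For example, `uses_insecure_kubelet_tls && echo "kubelet certificate verification is disabled"` reports the current mode without editing anything.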
### Future Enhancements (Optional)

For production hardening, the repository includes:

- `certificate.yaml`: cert-manager certificates for proper TLS
- `metrics-server.yaml`: Full TLS-enabled deployment

To switch to secure TLS when needed, update `kustomization.yaml`.

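Before applying a change to `kustomization.yaml`, rendering the manifests locally shows whether the insecure flag is still present. This is a minimal sketch that assumes the kustomization lives in the directory passed as the first argument (the current directory by default); `rendered_uses_insecure_tls` is a hypothetical helper name.

```bash
# Render the kustomization in the given directory (default: current directory)
# and succeed if the output still contains the insecure kubelet TLS flag.
rendered_uses_insecure_tls() {
  kubectl kustomize "${1:-.}" | grep -q -- '--kubelet-insecure-tls'
}
```

A non-zero exit status after the switch suggests the rendered Deployment no longer disables kubelet certificate verification.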
## Usage

### Basic Commands

```bash
# View node resource usage
kubectl top nodes

# View pod resource usage (all namespaces)
kubectl top pods --all-namespaces

# View pod resource usage (specific namespace)
kubectl top pods -n kube-system

# View pod resource usage with containers
kubectl top pods --containers
```

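The commands above can be combined with `--sort-by` and `--no-headers` to find the heaviest consumers quickly. The helper below is an illustrative sketch; its name and the default of five rows are assumptions, not part of this deployment.

```bash
# Print the N pods using the most memory across all namespaces (default: 5).
top_memory_pods() {
  kubectl top pods --all-namespaces --sort-by=memory --no-headers \
    | head -n "${1:-5}"
}
```

Calling `top_memory_pods 10` lists ten rows; replacing `memory` with `cpu` sorts by CPU usage instead.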
### Integration with Monitoring

The metrics server is automatically discovered by OpenObserve via ServiceMonitor for:

- Metrics server performance monitoring
- Resource usage dashboards
- Alerting on high resource consumption

## Troubleshooting

### Common Issues

1. **"Metrics API not available"**: Check pod status with `kubectl get pods -n metrics-server-system`
2. **TLS certificate errors**: Verify the APIService with `kubectl get apiservice v1beta1.metrics.k8s.io`
3. **Resource limits**: Pods may be OOMKilled if cluster load is high; if that happens, raise the memory limit

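For the third issue, the last termination reason of each container shows whether a restart was caused by the memory limit. The following sketch uses the `metrics-server-system` namespace from the commands above; the function names are hypothetical.

```bash
# Print each Metrics Server pod name followed by the last termination
# reason of its containers (the reason is empty if none was recorded).
last_termination_reasons() {
  kubectl get pods -n metrics-server-system \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
}

# Exit status 0 if any container was last terminated by the OOM killer.
was_oom_killed() {
  last_termination_reasons | grep -q 'OOMKilled'
}
```

If `was_oom_killed` succeeds, the memory limit listed under Configuration is the first setting to revisit.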
### Verification

```bash
# Check metrics server status
kubectl get pods -n metrics-server-system

# Verify API registration
kubectl get apiservice v1beta1.metrics.k8s.io

# Test metrics collection
kubectl top nodes
kubectl top pods -n metrics-server-system
```

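For scripted checks, the same verification can be reduced to a single value by reading the APIService's `Available` condition, and the aggregated API can be queried directly. This is a sketch with illustrative function names.

```bash
# Print the status of the APIService "Available" condition
# ("True" when the aggregated metrics API is being served).
metrics_api_available() {
  kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
}

# Fetch raw node metrics from the aggregated metrics API as JSON.
raw_node_metrics() {
  kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
}
```

A readiness gate could be written as `[ "$(metrics_api_available)" = "True" ]`.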
## Configuration

### Resource Requests/Limits

- **CPU**: 100m request, 500m limit
- **Memory**: 200Mi request, 500Mi limit
- **Priority**: system-cluster-critical

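To check that the live Deployment matches these values, the configured memory limit can be compared against the expected one. The sketch assumes the Deployment is named `metrics-server` (an assumption) in the `metrics-server-system` namespace; `memory_limit_is` is a hypothetical helper.

```bash
# Assumption: the Deployment is called "metrics-server".
# Exit status 0 if the first container's memory limit equals the argument.
memory_limit_is() {
  actual=$(kubectl get deployment metrics-server \
    -n metrics-server-system \
    -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}')
  [ "$actual" = "$1" ]
}
```

For instance, `memory_limit_is 500Mi || echo "memory limit differs from the README"` flags drift between the documentation and the cluster.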
### Node Scheduling

- Tolerates control plane taints
- Can schedule on both n1 (control plane) and n2 (worker)
- Uses a node selector to run on Linux nodes only

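Whether the two replicas actually landed on different nodes can be checked from the pod list. The helpers below are an illustrative sketch using the `metrics-server-system` namespace; their names are assumptions.

```bash
# Print "<pod> <node>" for every pod in the Metrics Server namespace.
pod_nodes() {
  kubectl get pods -n metrics-server-system --no-headers \
    -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName
}

# Count the distinct nodes that host at least one replica.
replica_node_count() {
  pod_nodes | awk '{print $2}' | sort -u | wc -l | tr -d ' '
}
```

With both replicas spread across n1 and n2, `replica_node_count` should report two distinct nodes.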
## Monitoring Integration

- **ServiceMonitor**: Automatically scraped by OpenObserve
- **Metrics Path**: `/metrics` on the HTTPS port
- **Scrape Interval**: 30 seconds
- **Dashboard**: Available in OpenObserve for resource analysis
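The scrape settings listed above can be read back from the cluster. This sketch assumes the ServiceMonitor CRD is installed and that the ServiceMonitor lives in the `metrics-server-system` namespace (both assumptions); the function name is hypothetical.

```bash
# Assumption: the ServiceMonitor is in the metrics-server-system namespace.
# Print "<path> <interval>" for every endpoint of every ServiceMonitor there.
servicemonitor_endpoints() {
  kubectl get servicemonitor -n metrics-server-system \
    -o jsonpath='{range .items[*].spec.endpoints[*]}{.path}{" "}{.interval}{"\n"}{end}'
}
```

The output can then be compared with the metrics path and scrape interval documented in this section.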