add source code and readme
86
manifests/infrastructure/metrics-server/README.md
Normal file
@@ -0,0 +1,86 @@
# Kubernetes Metrics Server

## Overview
This deploys the Kubernetes Metrics Server to provide resource metrics for nodes and pods. The metrics server enables `kubectl top` commands and provides metrics for Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaling (VPA).
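
As an illustration of how these metrics are consumed, a HorizontalPodAutoscaler that scales on CPU utilization might look roughly like the sketch below. The target Deployment `web`, its namespace, the replica bounds, and the 70% target are all hypothetical and not part of this repository.

```yaml
# Hypothetical example: "web", the namespace, and all numbers are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # assumed Deployment name
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pods' CPU requests
```

Utilization targets are computed against container CPU requests, so the target pods need requests set for this to work.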

## Architecture

### Current Deployment (Simple)
- **Version**: v0.7.2 (latest stable at the time of writing)
- **Replicas**: 2 (HA across both cluster nodes)
- **TLS Mode**: Insecure TLS for initial deployment (`--kubelet-insecure-tls=true`)
- **Integration**: OpenObserve monitoring via ServiceMonitor

### Security Configuration
The current deployment uses `--kubelet-insecure-tls=true` for compatibility with Talos Linux. This flag makes the metrics server skip verification of the kubelet serving certificates. This is acceptable for internal cluster metrics because:
- Metrics traffic stays within the cluster network
- The VLAN provides network isolation
- No sensitive data is exposed via metrics
- Proper RBAC controls access to the metrics API
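
For orientation, the settings described above could appear in a Deployment along the lines of the following sketch. Only the version, replica count, namespace, and the `--kubelet-insecure-tls=true` flag come from this README; the labels, service account name, and remaining arguments are assumptions modeled on the upstream metrics-server manifests and may differ from the actual file in this repository.

```yaml
# Sketch only: labels, serviceAccountName, and most args are assumed, not taken from this repo.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: metrics-server-system
spec:
  replicas: 2                      # one replica per cluster node
  selector:
    matchLabels:
      k8s-app: metrics-server      # assumed label
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server   # assumed name
      containers:
        - name: metrics-server
          image: registry.k8s.io/metrics-server/metrics-server:v0.7.2
          args:
            - --cert-dir=/tmp
            - --secure-port=10250
            - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
            - --kubelet-use-node-status-port
            - --metric-resolution=15s
            - --kubelet-insecure-tls=true   # skips kubelet certificate verification
          ports:
            - name: https
              containerPort: 10250
              protocol: TCP
```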

### Future Enhancements (Optional)
For production hardening, the repository includes:
- `certificate.yaml`: cert-manager certificates for proper TLS
- `metrics-server.yaml`: Full TLS-enabled deployment

To switch to secure TLS when needed, update `kustomization.yaml`.
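
A minimal sketch of what that switch might look like in `kustomization.yaml` follows. The file name `metrics-server-simple.yaml` is a hypothetical placeholder for whatever manifest currently holds the insecure-TLS deployment, and any other resources the real kustomization lists (namespace, ServiceMonitor, and so on) are omitted here.

```yaml
# Sketch only: the commented-out file name is hypothetical.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: metrics-server-system
resources:
  # - metrics-server-simple.yaml   # hypothetical: current insecure-TLS deployment, disabled
  - certificate.yaml               # cert-manager certificates
  - metrics-server.yaml            # full TLS-enabled deployment
```

After such a change, the `--kubelet-insecure-tls=true` flag would no longer be needed, provided the kubelet serving certificates are signed by a CA the metrics server trusts.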

## Usage

### Basic Commands
```bash
# View node resource usage
kubectl top nodes

# View pod resource usage (all namespaces)
kubectl top pods --all-namespaces

# View pod resource usage (specific namespace)
kubectl top pods -n kube-system

# View pod resource usage with containers
kubectl top pods --containers
```

### Integration with Monitoring
The metrics server is automatically discovered by OpenObserve via ServiceMonitor for:
- Metrics server performance monitoring
- Resource usage dashboards
- Alerting on high resource consumption

## Troubleshooting

### Common Issues
1. **"Metrics API not available"**: Check pod status with `kubectl get pods -n metrics-server-system`
2. **TLS certificate errors**: Verify APIService with `kubectl get apiservice v1beta1.metrics.k8s.io`
3. **Resource limits**: Pods may be OOMKilled if cluster load is high; confirm the termination reason with `kubectl describe pod -n metrics-server-system <pod-name>` and raise the memory limit if needed
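
When checking the APIService in item 2, it can help to know roughly what the registration looks like. The sketch below follows the upstream metrics-server manifest; the service name and the `insecureSkipTLSVerify` setting are assumptions about this deployment rather than values confirmed by this README.

```yaml
# Sketch modeled on the upstream manifest; service name and TLS setting are assumed.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true      # assumed for the simple deployment
  service:
    name: metrics-server           # assumed Service name
    namespace: metrics-server-system
```

If the `AVAILABLE` column of `kubectl get apiservice v1beta1.metrics.k8s.io` shows `False`, the API server cannot reach or trust the Service referenced here.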

### Verification
```bash
# Check metrics server status
kubectl get pods -n metrics-server-system

# Verify API registration
kubectl get apiservice v1beta1.metrics.k8s.io

# Test metrics collection
kubectl top nodes
kubectl top pods -n metrics-server-system
```

## Configuration

### Resource Requests/Limits
- **CPU**: 100m request, 500m limit
- **Memory**: 200Mi request, 500Mi limit
- **Priority**: system-cluster-critical

### Node Scheduling
- Tolerates control plane taints
- Can schedule on both n1 (control plane) and n2 (worker)
- Uses node selector for Linux nodes only
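
Expressed as a fragment of the Deployment's pod template, the settings above might look like this sketch. The resource values and priority class come from this README; the exact taint key and the node selector label are assumptions based on the standard Kubernetes control plane taint and the well-known OS label.

```yaml
# Fragment of a Deployment spec; taint key and selector label are assumed.
spec:
  template:
    spec:
      priorityClassName: system-cluster-critical
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
        - key: node-role.kubernetes.io/control-plane   # assumed taint key
          operator: Exists
          effect: NoSchedule
      containers:
        - name: metrics-server
          resources:
            requests:
              cpu: 100m
              memory: 200Mi
            limits:
              cpu: 500m
              memory: 500Mi
```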

## Monitoring Integration
- **ServiceMonitor**: Automatically scraped by OpenObserve
- **Metrics Path**: `/metrics` on HTTPS port
- **Scrape Interval**: 30 seconds
- **Dashboard**: Available in OpenObserve for resource analysis
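
A ServiceMonitor matching the settings listed above could be sketched as follows. The path, scheme, and interval come from this README; the object name, label selector, port name, and TLS settings are assumptions, and any authentication the endpoint requires is omitted.

```yaml
# Sketch only: name, selector labels, port name, and tlsConfig are assumed.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: metrics-server
  namespace: metrics-server-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server      # assumed Service label
  endpoints:
    - port: https                  # assumed Service port name
      path: /metrics
      scheme: https
      interval: 30s
      tlsConfig:
        insecureSkipVerify: true   # assumed while the serving cert is self-signed
```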