Keybard-Vagabond-Demo/manifests/infrastructure/metrics-server/README.md
Michael DiLeo 7327d77dcd redaction (#1)
Add the redacted source file for demo purposes

Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1
Co-authored-by: Michael DiLeo <michael_dileo@proton.me>
Co-committed-by: Michael DiLeo <michael_dileo@proton.me>
2025-12-24 13:40:47 +00:00

Kubernetes Metrics Server

Overview

This deploys the Kubernetes Metrics Server, which serves CPU and memory usage for nodes and pods through the Metrics API (metrics.k8s.io). It backs the kubectl top commands and supplies the resource metrics used by Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaling (VPA).

Architecture

Current Deployment (Simple)

  • Version: v0.7.2 (latest stable when this was deployed)
  • Replicas: 2 (HA across both cluster nodes)
  • TLS Mode: Kubelet certificate verification disabled for the initial deployment (--kubelet-insecure-tls=true)
  • Integration: OpenObserve monitoring via ServiceMonitor

Security Configuration

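The current deployment uses --kubelet-insecure-tls=true for compatibility with Talos Linux. With this flag the metrics server does not verify the serving certificates presented by kubelets. This is acceptable for internal cluster metrics because: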

  • Metrics traffic stays within the cluster network
  • The VLAN provides network isolation
  • The metrics carry only CPU and memory usage figures, not application data
  • Proper RBAC controls access to the metrics API
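
To confirm which mode a running deployment is in, the container arguments can be inspected. The following is a minimal sketch and not part of the repository: the helper name kubelet_tls_mode is hypothetical, and the deployment name metrics-server in the usage comment is an assumption (only the metrics-server-system namespace comes from this README).

```shell
# Hypothetical helper: classify kubelet TLS verification from container args.
# $1 is the args as one string, e.g. the output of the jsonpath query below.
kubelet_tls_mode() {
  case "$1" in
    *--kubelet-insecure-tls=false*) echo "verified" ;;
    *--kubelet-insecure-tls*)       echo "insecure" ;;
    *)                              echo "verified" ;;
  esac
}

# Usage against a live cluster (deployment name "metrics-server" is assumed):
#   args="$(kubectl get deployment metrics-server -n metrics-server-system \
#     -o jsonpath='{.spec.template.spec.containers[0].args}')"
#   kubelet_tls_mode "$args"
```

The helper only matches substrings, so it works on both a space-separated argument list and the JSON array that the jsonpath query prints.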

Future Enhancements (Optional)

For production hardening, the repository includes:

  • certificate.yaml: cert-manager certificates for proper TLS
  • metrics-server.yaml: Full TLS-enabled deployment

When needed, switch to secure TLS by updating kustomization.yaml.

Usage

Basic Commands

# View node resource usage
kubectl top nodes

# View pod resource usage (all namespaces)
kubectl top pods --all-namespaces

# View pod resource usage (specific namespace)
kubectl top pods -n kube-system

# View pod resource usage with containers
kubectl top pods --containers

Integration with Monitoring

The metrics server is automatically discovered by OpenObserve via ServiceMonitor for:

  • Metrics server performance monitoring
  • Resource usage dashboards
  • Alerting on high resource consumption

Troubleshooting

Common Issues

  1. "Metrics API not available": Check pod status with kubectl get pods -n metrics-server-system
  2. TLS certificate errors: Verify APIService with kubectl get apiservice v1beta1.metrics.k8s.io
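  3. Resource limits: Pods may be OOMKilled when memory use exceeds the 500Mi limit under high cluster load; check the last container state with kubectl describe pod -n metrics-server-system <pod-name>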
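
For the first two issues, the quickest signal is usually the Available condition on the APIService. The sketch below wraps that check in two small functions; the function names are hypothetical, and it assumes kubectl is already pointed at this cluster.

```shell
# Print the status of the APIService's "Available" condition (True/False/Unknown).
metrics_api_status() {
  kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
}

# Report availability in plain text; returns non-zero when the API is not available.
check_metrics_api() {
  api_status="$(metrics_api_status)"
  if [ "$api_status" = "True" ]; then
    echo "metrics API available"
  else
    echo "metrics API not available (status: ${api_status:-unknown})"
    return 1
  fi
}
```

If check_metrics_api reports a failure, kubectl describe apiservice v1beta1.metrics.k8s.io shows the condition's reason and message, which normally distinguishes a missing backend pod from a TLS problem.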

Verification

# Check metrics server status
kubectl get pods -n metrics-server-system

# Verify API registration
kubectl get apiservice v1beta1.metrics.k8s.io

# Test metrics collection
kubectl top nodes
kubectl top pods -n metrics-server-system

Configuration

Resource Requests/Limits

  • CPU: 100m request, 500m limit
  • Memory: 200Mi request, 500Mi limit
  • Priority: system-cluster-critical
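
To see how close the pods run to the memory limit, the output of kubectl top can be compared against it. This is a rough sketch under stated assumptions: the function name flag_high_memory and the 80% default threshold are illustrative and not taken from the repository, and only values reported in Mi are considered.

```shell
# Reads `kubectl top pods --no-headers` lines on stdin (NAME CPU MEMORY) and
# prints pods whose memory use is at or above a percentage of the limit.
# $1: memory limit in Mi (default 500), $2: threshold percent (default 80).
flag_high_memory() {
  limit_mi="${1:-500}"
  threshold_pct="${2:-80}"
  awk -v limit="$limit_mi" -v pct="$threshold_pct" '
    $3 ~ /Mi$/ {
      used = $3
      sub(/Mi$/, "", used)            # strip the unit, leaving the number
      if (used * 100 >= limit * pct) print $1, used "Mi"
    }'
}

# Usage against a live cluster:
#   kubectl top pods -n metrics-server-system --no-headers | flag_high_memory 500 80
```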

Node Scheduling

  • Tolerates control plane taints
  • Can schedule on both n1 (control plane) and n2 (worker)
  • Uses a node selector to restrict scheduling to Linux nodes
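
Whether the two replicas actually ended up on different nodes can be checked from the pod specs. A small sketch, with hypothetical function names, assuming all pods in the metrics-server-system namespace belong to the metrics server:

```shell
# List the distinct nodes that currently host pods in the namespace.
replica_nodes() {
  kubectl get pods -n metrics-server-system \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort -u
}

# Succeeds when the pods are spread over at least two distinct nodes.
spread_ok() {
  [ "$(replica_nodes | grep -c .)" -ge 2 ]
}
```

On this cluster, replica_nodes would be expected to list n1 and n2 when the replicas are spread as intended; a single line means both replicas share a node.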

Monitoring Integration

  • ServiceMonitor: Automatically scraped by OpenObserve
  • Metrics Path: /metrics on HTTPS port
  • Scrape Interval: 30 seconds
  • Dashboard: Available in OpenObserve for resource analysis
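
For orientation, a ServiceMonitor matching the settings above might look roughly like the manifest printed by the sketch below. This is not the repository's file: the resource name, the k8s-app label, and the port name https are assumptions, and any authentication settings the scrape needs are left out.

```shell
# Print an illustrative ServiceMonitor manifest (names and labels are assumed).
render_servicemonitor() {
  cat <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: metrics-server            # assumed name
  namespace: metrics-server-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server     # assumed Service label
  endpoints:
    - port: https                 # assumed Service port name
      scheme: https
      path: /metrics
      interval: 30s
EOF
}

# Usage: compare the sketch with what is deployed
#   render_servicemonitor | kubectl diff -f -
```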