
Michael DiLeo 7327d77dcd redaction (#1)

Add the redacted source file for demo purposes

Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1
Co-authored-by: Michael DiLeo <michael_dileo@proton.me>
Co-committed-by: Michael DiLeo <michael_dileo@proton.me>

2025-12-24 13:40:47 +00:00


Kubernetes Metrics Server

Overview

This component deploys the Kubernetes Metrics Server, which provides resource metrics for nodes and pods. It enables the kubectl top commands and supplies the metrics used by Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaling (VPA).

Architecture

Current Deployment (Simple)

Version: v0.7.2 (stable release)
Replicas: 2 (HA across both cluster nodes)
TLS Mode: Insecure TLS for initial deployment (--kubelet-insecure-tls=true)
Integration: OpenObserve monitoring via ServiceMonitor

Security Configuration

The current deployment uses --kubelet-insecure-tls=true for compatibility with Talos Linux. The flag makes the metrics server skip verification of the kubelet serving certificates. This is acceptable for internal cluster metrics because:

Metrics traffic stays within the cluster network
The VLAN provides network isolation
No sensitive data is exposed via metrics
Proper RBAC controls access to the metrics API
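
To confirm which TLS mode a running deployment actually uses, the container arguments can be inspected. The following is a minimal sketch, not part of the repository: the Deployment name metrics-server and the use of args (rather than command) are assumptions, and tls_mode and check_metrics_server_tls are hypothetical helper names.

```shell
#!/bin/sh
# Classify the kubelet TLS mode from container arguments read on stdin.
# Prints "insecure" when --kubelet-insecure-tls is set (and not "=false"),
# otherwise "verified".
tls_mode() {
  args=$(cat)
  case "$args" in
    *--kubelet-insecure-tls=false*) echo "verified" ;;
    *--kubelet-insecure-tls*)       echo "insecure" ;;
    *)                              echo "verified" ;;
  esac
}

# Read the arguments of the first container in the Deployment.
# ASSUMPTION: the Deployment is named "metrics-server"; pass another name as $1.
check_metrics_server_tls() {
  kubectl get deployment "${1:-metrics-server}" -n metrics-server-system \
    -o jsonpath='{.spec.template.spec.containers[0].args}' | tls_mode
}
```

Running check_metrics_server_tls against the current deployment should report insecure. After a switch to the TLS-enabled manifests it should report verified.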

Future Enhancements (Optional)

For production hardening, the repository includes:

certificate.yaml: cert-manager certificates for proper TLS
metrics-server.yaml: Full TLS-enabled deployment

To switch to secure TLS, update kustomization.yaml when needed.

Usage

Basic Commands

# View node resource usage
kubectl top nodes

# View pod resource usage (all namespaces)
kubectl top pods --all-namespaces

# View pod resource usage (specific namespace)
kubectl top pods -n kube-system

# View pod resource usage with containers
kubectl top pods --containers
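
The output of these commands can also be filtered in a pipeline. As a rough sketch, the hypothetical helper pods_over_cpu below lists pods at or above a CPU threshold in millicores. It assumes the four-column layout printed by kubectl top pods --all-namespaces --no-headers (namespace, pod, CPU, memory).

```shell
#!/bin/sh
# Print "namespace/pod cpu" for pods using at least $1 millicores.
# Expects `kubectl top pods --all-namespaces --no-headers` output on stdin.
# Rows whose CPU column is not in millicores (e.g. "3m") are skipped.
pods_over_cpu() {
  awk -v min="$1" '
    $3 ~ /m$/ {
      cpu = $3; sub(/m$/, "", cpu)
      if (cpu + 0 >= min + 0) print $1 "/" $2, cpu "m"
    }'
}

# Example (requires cluster access):
#   kubectl top pods --all-namespaces --no-headers | pods_over_cpu 100
```

For a quick ranking without a helper, kubectl top pods also accepts --sort-by=cpu or --sort-by=memory.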

Integration with Monitoring

The metrics server is automatically discovered by OpenObserve via ServiceMonitor for:

Metrics server performance monitoring
Resource usage dashboards
Alerting on high resource consumption

Troubleshooting

Common Issues

"Metrics API not available": Check pod status with kubectl get pods -n metrics-server-system
TLS certificate errors: Verify APIService with kubectl get apiservice v1beta1.metrics.k8s.io
Resource limits: Pods may be OOMKilled under high cluster load if the 500Mi memory limit is exceeded
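
The checks above can be combined into one triage script. This is only a sketch under assumptions: the function names are hypothetical, and the OOM check relies on the lastState.terminated.reason field of each container status, which is only populated after a container has restarted.

```shell
#!/bin/sh
# Map the APIService "Available" condition status to a readable state.
apiservice_state() {
  case "$1" in
    True)  echo "available" ;;
    False) echo "unavailable" ;;
    *)     echo "unknown" ;;
  esac
}

# From "pod<TAB>reason" lines on stdin, print pods last terminated by the OOM killer.
oom_killed_pods() {
  awk -F '\t' '$2 ~ /OOMKilled/ { print $1 }'
}

# Run both checks against the cluster (requires kubectl access).
diagnose_metrics_server() {
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
  echo "APIService: $(apiservice_state "$status")"
  echo "OOMKilled pods:"
  kubectl get pods -n metrics-server-system \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
    | oom_killed_pods
}
```

If the APIService is reported as unavailable, kubectl describe apiservice v1beta1.metrics.k8s.io shows the condition message, which usually distinguishes a TLS failure from missing endpoints.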

Verification

# Check metrics server status
kubectl get pods -n metrics-server-system

# Verify API registration
kubectl get apiservice v1beta1.metrics.k8s.io

# Test metrics collection
kubectl top nodes
kubectl top pods -n metrics-server-system

Configuration

Resource Requests/Limits

CPU: 100m request, 500m limit
Memory: 200Mi request, 500Mi limit
Priority: system-cluster-critical
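
To see how close the pods run to the memory limit, current usage can be expressed as a percentage of that limit. The helper below is a hypothetical sketch. It assumes the three-column layout of kubectl top pods -n NAMESPACE --no-headers (pod, CPU, memory) and only handles values reported in Mi.

```shell
#!/bin/sh
# Print "pod NN%" where NN is memory usage as a whole-number percentage
# (truncated) of the limit given in MiB as $1.
# Expects `kubectl top pods -n <namespace> --no-headers` output on stdin.
memory_pct_of_limit() {
  awk -v limit="$1" '
    $3 ~ /Mi$/ {
      mem = $3; sub(/Mi$/, "", mem)
      printf "%s %d%%\n", $1, (mem * 100) / limit
    }'
}

# Example against the 500Mi limit listed above (requires cluster access):
#   kubectl top pods -n metrics-server-system --no-headers | memory_pct_of_limit 500
```

Values that stay close to 100% suggest the limit is too tight for the current cluster load.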

Node Scheduling

Tolerates control plane taints
Can schedule on both n1 (control plane) and n2 (worker)
Uses a node selector to restrict scheduling to Linux nodes

Monitoring Integration

ServiceMonitor: Automatically scraped by OpenObserve
Metrics Path: /metrics on the HTTPS port
Scrape Interval: 30 seconds
Dashboard: Available in OpenObserve for resource analysis
