redaction (#1)

Add the redacted source file for demo purposes

Reviewed-on: https://source.michaeldileo.org/michael_dileo/Keybard-Vagabond-Demo/pulls/1
Co-authored-by: Michael DiLeo <michael_dileo@proton.me>
Co-committed-by: Michael DiLeo <michael_dileo@proton.me>
This commit was merged in pull request #1.
2025-12-24 13:40:47 +00:00
committed by michael_dileo
parent 612235d52b
commit 7327d77dcd
333 changed files with 39286 additions and 1 deletions

@@ -0,0 +1,86 @@
# Kubernetes Metrics Server
## Overview
This deploys the Kubernetes Metrics Server to provide resource metrics for nodes and pods. The metrics server enables `kubectl top` commands and provides metrics for Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaling (VPA).
## Architecture
### Current Deployment (Simple)
- **Version**: v0.7.2
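- **Replicas**: 2 (intended as one per node in the 2-node cluster; the manifest sets no anti-affinity, so the scheduler does not guarantee that spread)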
- **TLS Mode**: Insecure TLS for initial deployment (`--kubelet-insecure-tls=true`)
- **Integration**: OpenObserve monitoring via ServiceMonitor
### Security Configuration
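The current deployment uses `--kubelet-insecure-tls=true` for compatibility with Talos Linux. With this flag the metrics server does not verify the kubelets' serving certificates. This is considered acceptable for internal cluster metrics because: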
- Metrics traffic stays within the cluster network
- The VLAN provides network isolation
- No sensitive data is exposed via metrics
- Proper RBAC controls access to the metrics API
### Future Enhancements (Optional)
For production hardening, the repository includes:
- `certificate.yaml`: cert-manager issuers and certificates for proper TLS
- `metrics-server.yaml`: Full TLS-enabled deployment

To switch to secure TLS, update the `resources` list in `kustomization.yaml` to reference these files instead of `metrics-server-simple.yaml`.
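A minimal sketch of that switch, assuming the file names mentioned in this README and that cert-manager is already installed in the cluster (neither is verified here). The other fields of the kustomization stay as they are:

```yaml
# kustomization.yaml (sketch): swap the simple manifest for the TLS-enabled one.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- certificate.yaml      # assumed: the cert-manager issuers and certificates
- metrics-server.yaml   # replaces metrics-server-simple.yaml
- monitoring.yaml
```

Only one of the two deployment manifests should be listed at a time, since both define resources with the same names.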
## Usage
### Basic Commands
```bash
# View node resource usage
kubectl top nodes
# View pod resource usage (all namespaces)
kubectl top pods --all-namespaces
# View pod resource usage (specific namespace)
kubectl top pods -n kube-system
# View pod resource usage with containers
kubectl top pods --containers
```
### Integration with Monitoring
The metrics server is automatically discovered by OpenObserve via ServiceMonitor for:
- Metrics server performance monitoring
- Resource usage dashboards
- Alerting on high resource consumption
## Troubleshooting
### Common Issues
1. **"Metrics API not available"**: Check pod status with `kubectl get pods -n metrics-server-system`
2. **TLS certificate errors**: Verify APIService with `kubectl get apiservice v1beta1.metrics.k8s.io`
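3. **Resource limits**: Pods may be OOMKilled when cluster load is high (the simple manifest caps memory at 128Mi); look for `OOMKilled` as the last termination reason in `kubectl describe pod` output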
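For issue 2, TLS errors at the aggregation layer usually come down to how the `APIService` is told to trust the metrics server's serving certificate. The fragment below is a sketch, not something this repository ships. It assumes cert-manager's cainjector is running and that a `Certificate` named `metrics-server-certs` exists in `metrics-server-system`, and it uses the `cert-manager.io/inject-ca-from` annotation so that `caBundle` is filled in once `insecureSkipTLSVerify` is turned off:

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
  annotations:
    # Assumption: cainjector is installed; the value is <namespace>/<Certificate name>.
    cert-manager.io/inject-ca-from: metrics-server-system/metrics-server-certs
spec:
  group: metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: false
  service:
    name: metrics-server
    namespace: metrics-server-system
```

With the simple deployment, `insecureSkipTLSVerify: true` sidesteps this check entirely, so certificate errors there are more likely to come from the kubelet side.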
### Verification
```bash
# Check metrics server status
kubectl get pods -n metrics-server-system
# Verify API registration
kubectl get apiservice v1beta1.metrics.k8s.io
# Test metrics collection
kubectl top nodes
kubectl top pods -n metrics-server-system
```
## Configuration
### Resource Requests/Limits
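- **CPU**: 25m request, 100m limit in the deployed simple manifest (the TLS-enabled manifest keeps 100m request, 500m limit)
- **Memory**: 64Mi request, 128Mi limit in the deployed simple manifest (the TLS-enabled manifest keeps 200Mi request, 500Mi limit)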
- **Priority**: system-cluster-critical
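For reference, this is the container `resources` block as it appears in the simple manifest that `kustomization.yaml` currently selects. The usage figures in the comments are the ones recorded in that manifest, not new measurements, and the values are lower than the 100m/200Mi requests and 500m/500Mi limits in the TLS-enabled manifest:

```yaml
resources:
  requests:
    cpu: 25m       # observed usage noted in the manifest: ~7-14m
    memory: 64Mi   # observed usage noted in the manifest: ~48-52MB
  limits:
    cpu: 100m
    memory: 128Mi
```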
### Node Scheduling
- Tolerates control plane taints
- Can schedule on both n1 (control plane) and n2 (worker)
- Uses node selector for Linux nodes only
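Nothing in the scheduling rules above forces the two replicas onto different nodes. If that spread matters, one option is a preferred pod anti-affinity on the hostname topology key. This is not part of the manifests in this repository; it is a sketch of a fragment that would sit under the Deployment's `spec.template.spec`:

```yaml
affinity:
  podAntiAffinity:
    # "preferred" rather than "required": a hard rule could leave the surge pod
    # unschedulable during a rolling update on a 2-node cluster.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            k8s-app: metrics-server
        topologyKey: kubernetes.io/hostname
```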
## Monitoring Integration
- **ServiceMonitor**: Automatically scraped by OpenObserve
- **Metrics Path**: `/metrics` on HTTPS port
- **Scrape Interval**: 30 seconds
- **Dashboard**: Available in OpenObserve for resource analysis

@@ -0,0 +1,50 @@
---
# Self-signed CA for metrics server (for internal cluster communication)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: metrics-server-selfsigned-issuer
spec:
  selfSigned: {}
---
# CA Certificate for metrics server
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: metrics-server-ca
  namespace: metrics-server-system
spec:
  secretName: metrics-server-ca-secret
  commonName: "metrics-server-ca"
  isCA: true
  issuerRef:
    name: metrics-server-selfsigned-issuer
    kind: ClusterIssuer
---
# CA Issuer using the generated CA
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: metrics-server-ca-issuer
  namespace: metrics-server-system
spec:
  ca:
    secretName: metrics-server-ca-secret
---
# TLS Certificate for metrics server
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: metrics-server-certs
  namespace: metrics-server-system
spec:
  secretName: metrics-server-certs
  issuerRef:
    name: metrics-server-ca-issuer
    kind: Issuer
  commonName: metrics-server
  dnsNames:
  - metrics-server
  - metrics-server.metrics-server-system
  - metrics-server.metrics-server-system.svc
  - metrics-server.metrics-server-system.svc.cluster.local

@@ -0,0 +1,16 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
metadata:
  name: metrics-server
  namespace: metrics-server-system
resources:
- namespace.yaml
- metrics-server-simple.yaml # Use simple version for immediate deployment
- monitoring.yaml
commonLabels:
  app.kubernetes.io/name: metrics-server
  app.kubernetes.io/component: metrics-server

@@ -0,0 +1,217 @@
---
# Simplified metrics server deployment for immediate use (without cert-manager dependency)
# This version uses kubelet insecure TLS for initial setup
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: metrics-server-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: metrics-server-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: metrics-server-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: metrics-server-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: metrics-server-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: metrics-server-system
spec:
  replicas: 2 # HA setup for your 2-node cluster
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        # Talos-specific: Use insecure TLS for initial setup
        - --kubelet-insecure-tls=true
        image: registry.k8s.io/metrics-server/metrics-server:v0.7.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 10250
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 25m # Reduced from 100m - actual usage ~7-14m
            memory: 64Mi # Reduced from 200Mi - actual usage ~48-52MB
          limits:
            cpu: 100m # Reduced from 500m but still adequate
            memory: 128Mi # Reduced from 500Mi but still adequate
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      tolerations:
      # Allow scheduling on control plane
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true # For initial setup
  service:
    name: metrics-server
    namespace: metrics-server-system
  version: v1beta1
  versionPriority: 100

@@ -0,0 +1,228 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: metrics-server-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: metrics-server-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: metrics-server-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: metrics-server-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: metrics-server-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: metrics-server-system
spec:
  replicas: 2 # HA setup for your 2-node cluster
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=30s
        # Talos-specific configuration for proper TLS
        - --kubelet-insecure-tls=false # Use proper TLS for production
        - --tls-cert-file=/etc/certs/tls.crt
        - --tls-private-key-file=/etc/certs/tls.key
        - --requestheader-client-ca-file=/etc/certs/ca.crt
        - --requestheader-allowed-names=aggregator
        - --requestheader-extra-headers-prefix=X-Remote-Extra-
        - --requestheader-group-headers=X-Remote-Group
        - --requestheader-username-headers=X-Remote-User
        image: registry.k8s.io/metrics-server/metrics-server:v0.7.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 10250
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
          limits:
            cpu: 500m
            memory: 500Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
        - mountPath: /etc/certs
          name: certs
          readOnly: true
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      tolerations:
      # Allow scheduling on control plane
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      volumes:
      - emptyDir: {}
        name: tmp-dir
      - name: certs
        secret:
          secretName: metrics-server-certs
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: false
  service:
    name: metrics-server
    namespace: metrics-server-system
  version: v1beta1
  versionPriority: 100

@@ -0,0 +1,26 @@
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: metrics-server
  namespace: metrics-server-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  endpoints:
  - port: https
    interval: 30s
    path: /metrics
    scheme: https
    tlsConfig:
      # Use the cluster's CA to verify the metrics server certificate
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      serverName: metrics-server.metrics-server-system.svc.cluster.local
      insecureSkipVerify: false
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  namespaceSelector:
    matchNames:
    - metrics-server-system

@@ -0,0 +1,10 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: metrics-server-system
  labels:
    name: metrics-server-system
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted