Skip to content

metrics-server

metrics-server is the small extension API server that scrapes kubelet’s /metrics/resource endpoint on every node and exposes the result as the metrics.k8s.io/v1beta1 API. Without it, kubectl top returns “Metrics API not available,” HorizontalPodAutoscaler can’t read resource utilisation, and Homepage’s cluster widget can’t draw its CPU/RAM bars. With it, all three Just Work.

A single Deployment in the metrics-server namespace, plus the v1beta1.metrics.k8s.io APIService registration that points the aggregation layer at it. No UI, no LAN exposure, no secrets to sync — the entire addon is the chart and a values file:

kubernetes/apps/metrics-server/
├── application.yaml # multi-source ArgoCD Application
└── helm-values.yaml # metrics-server chart values, tuned for Talos

Source lives at kubernetes/apps/metrics-server/.

Three concrete consumers in this lab:

  • kubectl top nodes / kubectl top pods — the operator’s go-to “what’s hot right now” check during incident triage. The CLI reads from metrics.k8s.io directly; nothing else has to be installed.
  • HorizontalPodAutoscaler — any HPA that scales on cpu or memory (the two built-in metrics) reads from the same API. No HPAs ship with the lab today, but the moment one does, this is what feeds it.
  • Homepage’s Kubernetes widget — the cluster summary card on dashboard.lab.jackhall.dev draws its cluster-wide and per-node CPU/RAM bars from this API. Without metrics-server the widget renders an “ensure you have metrics-server installed” string instead of bars.

Talos issues kubelet serving certs from its own machine-CA, not from the cluster CA that metrics-server validates against by default. Two non-default args land in helm-values.yaml to make the scrape work on this cluster:

ArgWhy
--kubelet-insecure-tlsSkip kubelet cert validation. Standard workaround on bare-metal Talos clusters; the alternative is plumbing the Talos machine-CA bundle into metrics-server, which is more surface than a read-only scrape warrants.
--kubelet-preferred-address-types=InternalIP,HostnameChart default also lists ExternalIP in the middle. No node here has an ExternalIP (the cluster is LAN-only), so dropping it removes a dead lookup.

The full arg list lives in defaultArgs: (which the chart concatenates with args:) so the resulting kubelet-flag set is explicit in Git — no need to read the chart to know what’s actually being passed.

metrics-server is not part of the observability stack, and the distinction is worth being explicit about. metrics-server is a strictly smaller thing, and the two run side by side:

metrics-serverkube-prometheus-stack
PurposeResource metrics for HPA / kubectl top / widgetsFull TSDB + scrape config + alerting + Grafana
APImetrics.k8s.io/v1beta1 extension APIPrometheus HTTP API + custom CRDs
StorageIn-memory ring buffer (~1 minute of data)PVC-backed Prometheus TSDB
FootprintOne Deployment, 100m CPU / 200Mi RAMPrometheus, Alertmanager, node-exporter, kube-state-metrics, Grafana, etc
MaintenanceNone — no dashboards, no rulesDashboards, rules, recording, retention

metrics-server stays the lightweight resource-metrics floor — kubectl top, HPA, the Homepage widget — with no pipeline to maintain. It does not go away now that the observability stack has shipped: Prometheus does not serve the metrics.k8s.io API that kubectl top and HPA read. Hubble UI covers live network-flow visibility on the same “free, no extra pipeline” footing. The three coexist, each answering a different question.

Terminal window
# Extension APIService registered and reachable.
kubectl get apiservice v1beta1.metrics.k8s.io
# → AVAILABLE True
# Node-level metrics flowing.
kubectl top nodes
# → six rows, non-zero CPU/RAM
# Homepage cluster widget recovers.
curl -fsS https://dashboard.lab.jackhall.dev/api/widgets/kubernetes | jq .
# → HTTP 200, no "ensure you have metrics-server installed" string

kubectl top on a freshly-synced metrics-server can take up to a full scrape interval (15s, per --metric-resolution) before it returns a non-empty row — first call sometimes responds with “metrics not yet available.” Retry once.

The repo-side README at kubernetes/apps/metrics-server/README.md covers the full Talos-tuning rationale, the kube-prometheus-stack comparison in more depth, and the disaster-recovery story (the addon is stateless — re-syncing the Application is the entire procedure; the metrics buffer rebuilds within one scrape interval).