metrics-server
metrics-server is
the small extension API server that scrapes kubelet’s
/metrics/resource endpoint on every node and exposes the result as
the metrics.k8s.io/v1beta1 API. Without it, kubectl top returns
“Metrics API not available,” HorizontalPodAutoscaler can’t read
resource utilisation, and Homepage’s cluster widget can’t draw its
CPU/RAM bars. With it, all three Just Work.
Where it runs
Section titled “Where it runs”A single Deployment in the metrics-server namespace, plus the
v1beta1.metrics.k8s.io APIService registration that points the
aggregation layer at it. No UI, no LAN exposure, no secrets to sync —
the entire addon is the chart and a values file:
kubernetes/apps/metrics-server/├── application.yaml # multi-source ArgoCD Application└── helm-values.yaml # metrics-server chart values, tuned for TalosSource lives at
kubernetes/apps/metrics-server/.
What it unlocks
Section titled “What it unlocks”Three concrete consumers in this lab:
kubectl top nodes/kubectl top pods— the operator’s go-to “what’s hot right now” check during incident triage. The CLI reads frommetrics.k8s.iodirectly; nothing else has to be installed.HorizontalPodAutoscaler— any HPA that scales oncpuormemory(the two built-in metrics) reads from the same API. No HPAs ship with the lab today, but the moment one does, this is what feeds it.- Homepage’s Kubernetes widget — the cluster summary card on
dashboard.lab.jackhall.devdraws its cluster-wide and per-node CPU/RAM bars from this API. Without metrics-server the widget renders an “ensure you have metrics-server installed” string instead of bars.
Talos-specific tuning
Section titled “Talos-specific tuning”Talos issues kubelet serving certs from its own machine-CA, not from
the cluster CA that metrics-server validates against by default. Two
non-default args land in helm-values.yaml to make the scrape work
on this cluster:
| Arg | Why |
|---|---|
--kubelet-insecure-tls | Skip kubelet cert validation. Standard workaround on bare-metal Talos clusters; the alternative is plumbing the Talos machine-CA bundle into metrics-server, which is more surface than a read-only scrape warrants. |
--kubelet-preferred-address-types=InternalIP,Hostname | Chart default also lists ExternalIP in the middle. No node here has an ExternalIP (the cluster is LAN-only), so dropping it removes a dead lookup. |
The full arg list lives in defaultArgs: (which the chart concatenates
with args:) so the resulting kubelet-flag set is explicit in Git —
no need to read the chart to know what’s actually being passed.
Why this isn’t kube-prometheus-stack
Section titled “Why this isn’t kube-prometheus-stack”metrics-server is not part of the observability stack, and the distinction is worth being explicit about. metrics-server is a strictly smaller thing, and the two run side by side:
| metrics-server | kube-prometheus-stack | |
|---|---|---|
| Purpose | Resource metrics for HPA / kubectl top / widgets | Full TSDB + scrape config + alerting + Grafana |
| API | metrics.k8s.io/v1beta1 extension API | Prometheus HTTP API + custom CRDs |
| Storage | In-memory ring buffer (~1 minute of data) | PVC-backed Prometheus TSDB |
| Footprint | One Deployment, 100m CPU / 200Mi RAM | Prometheus, Alertmanager, node-exporter, kube-state-metrics, Grafana, etc |
| Maintenance | None — no dashboards, no rules | Dashboards, rules, recording, retention |
metrics-server stays the lightweight resource-metrics floor —
kubectl top, HPA, the Homepage widget — with no pipeline to
maintain. It does not go away now that the
observability stack has shipped: Prometheus does
not serve the metrics.k8s.io API that kubectl top and HPA read.
Hubble UI covers live network-flow visibility on the
same “free, no extra pipeline” footing. The three coexist, each
answering a different question.
Verifying after first sync
Section titled “Verifying after first sync”# Extension APIService registered and reachable.kubectl get apiservice v1beta1.metrics.k8s.io# → AVAILABLE True
# Node-level metrics flowing.kubectl top nodes# → six rows, non-zero CPU/RAM
# Homepage cluster widget recovers.curl -fsS https://dashboard.lab.jackhall.dev/api/widgets/kubernetes | jq .# → HTTP 200, no "ensure you have metrics-server installed" stringkubectl top on a freshly-synced metrics-server can take up to a
full scrape interval (15s, per --metric-resolution) before it
returns a non-empty row — first call sometimes responds with
“metrics not yet available.” Retry once.
The repo-side README at
kubernetes/apps/metrics-server/README.md
covers the full Talos-tuning rationale, the kube-prometheus-stack
comparison in more depth, and the disaster-recovery story (the addon
is stateless — re-syncing the Application is the entire procedure;
the metrics buffer rebuilds within one scrape interval).