Diagnose a Prometheus cardinality explosion
WHAT IT TESTS: operating Prometheus at scale. OUTLINE: find the offending metrics via the TSDB status page and a query such as topk(10, count by (__name__)({__name__=~".+"})), identify the unbounded labels, then drop or aggregate them with relabeling. RED FLAG: just adding memory without fixing the label design.
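The diagnosis step in the outline can be illustrated offline. The sketch below is a minimal, hypothetical example: it assumes the active series have already been exported as a list of label dictionaries (the sample data, `series_per_metric`, and `distinct_values_per_label` are invented for illustration and are not part of Prometheus). It mimics what `count by (__name__)` reports, then shows which label on the worst metric has the most distinct values.

```python
from collections import Counter, defaultdict

# Hypothetical sample of active series; each dict is one unique label set.
series = [
    {"__name__": "http_requests_total", "method": "GET", "user_id": "u1"},
    {"__name__": "http_requests_total", "method": "GET", "user_id": "u2"},
    {"__name__": "http_requests_total", "method": "POST", "user_id": "u3"},
    {"__name__": "up", "job": "api"},
]


def series_per_metric(series):
    """Count series per metric name, similar in spirit to count by (__name__)."""
    return Counter(s["__name__"] for s in series)


def distinct_values_per_label(series, metric):
    """For one metric, count the distinct values seen for each label."""
    values = defaultdict(set)
    for s in series:
        if s["__name__"] != metric:
            continue
        for label, value in s.items():
            if label != "__name__":
                values[label].add(value)
    return {label: len(seen) for label, seen in values.items()}


# The metric with the most series, then its labels ranked by distinct values.
worst_metric, worst_count = series_per_metric(series).most_common(1)[0]
label_counts = distinct_values_per_label(series, worst_metric)
suspect_label = max(label_counts, key=label_counts.get)
```

A label whose distinct-value count grows with traffic or users (here the made-up `user_id`) is the usual suspect for an unbounded label.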
WHAT IT TESTS: whether you understand that each unique label set is a separate time series. ANSWER OUTLINE: diagnose using the /tsdb-status page, the prometheus_tsdb_head_series metric, and count by (__name__) to find the worst metrics. Cardinality usually explodes because of unbounded labels such as user ID, pod name, request path, or error message. Mitigate with metric_relabel_configs to drop labels, by bucketing high-cardinality values into a small fixed set, and with recording rules that pre-aggregate. RED FLAG: just adding RAM, blaming Prometheus itself, or proposing per-request labels.
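To make the "each unique label set is a separate series" point concrete, here is a rough sketch of how dropping an unbounded label and bucketing a high-cardinality one shrink the series count. Everything in it is an assumption for illustration: the sample data, the helper names (`drop_labels`, `normalize_path`, `cardinality`), and the `:id` placeholder are hypothetical, and the code only counts label sets. It does not model how Prometheus applies metric_relabel_configs or handles samples that collapse onto the same series.

```python
import re


def drop_labels(series, names):
    """Return copies of the series with the given label names removed."""
    return [{k: v for k, v in s.items() if k not in names} for s in series]


def normalize_path(path):
    """Bucket paths by replacing purely numeric segments with a placeholder."""
    return re.sub(r"/\d+(?=/|$)", "/:id", path)


def cardinality(series):
    """Number of unique label sets, i.e. distinct time series."""
    return len({frozenset(s.items()) for s in series})


# Hypothetical raw series with two unbounded labels: path and user_id.
raw = [
    {"__name__": "http_requests_total", "path": "/users/1/orders", "user_id": "1"},
    {"__name__": "http_requests_total", "path": "/users/2/orders", "user_id": "2"},
    {"__name__": "http_requests_total", "path": "/users/3/orders", "user_id": "3"},
    {"__name__": "http_requests_total", "path": "/health", "user_id": "3"},
]

before = cardinality(raw)

# Drop the per-user label, then bucket the request path.
without_user = drop_labels(raw, {"user_id"})
fixed = [dict(s, path=normalize_path(s["path"])) for s in without_user]

after = cardinality(fixed)
```

In this toy data the count stops growing with the number of users: after the fix it depends only on the number of route templates.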
- #prometheus
- #observability
- #cardinality
- #kubernetes
- #metrics