Cardinality as Cost
The pattern: high-cardinality labels (user_id, request_id, client IP) on metrics explode storage cost, because every distinct combination of label values is stored as its own time series. A counter http_requests_total{user_id="..."} with 100k users is at least 100k time series, multiplied again by every other label on the metric. The discipline: structured logs for high-cardinality data; metrics only for low-cardinality aggregates; traces (sampled) for per-request detail.
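The arithmetic behind the explosion can be sketched in a few lines. This is a rough illustration only: the label names and value counts are hypothetical, and the product is an upper bound, since a real system rarely emits every combination.

```python
from math import prod


def series_upper_bound(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on time series for one metric.

    Each distinct combination of label values is a separate series, so the
    worst case is the product of the distinct-value counts per label.
    """
    return prod(label_cardinalities.values())


# Hypothetical label sets for http_requests_total (counts are made up).
low_cardinality = {"method": 5, "status_class": 5, "route": 40}
with_user_id = {**low_cardinality, "user_id": 100_000}

print(series_upper_bound(low_cardinality))  # 1000
print(series_upper_bound(with_user_id))     # 100000000
```

One extra label turns a thousand possible series into a hundred million, which is why the label set is worth reviewing before a metric ships.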
The trade-off: query power vs. storage cost. It is tempting to label everything because “I might want to query by it.” Resist. Modern observability stacks (Honeycomb, structured logs in Loki) push toward “high-cardinality logs/events as primary, metrics derived from logs,” which solves the cardinality problem in the shape of the storage rather than relying on user discipline.
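A minimal sketch of the "events as primary, metrics derived" idea, using only the Python standard library. The event fields, sample log lines, and function names are assumptions made for illustration; this is not how Honeycomb or Loki implement it.

```python
import json
from collections import Counter

# Hypothetical structured log lines: one JSON event per request.
# High-cardinality fields (user_id, request_id) live here, not on a metric.
LOG_LINES = [
    '{"route": "/cart", "status": 200, "user_id": "u-1", "request_id": "r-1"}',
    '{"route": "/cart", "status": 500, "user_id": "u-2", "request_id": "r-2"}',
    '{"route": "/login", "status": 200, "user_id": "u-1", "request_id": "r-3"}',
    '{"route": "/cart", "status": 200, "user_id": "u-3", "request_id": "r-4"}',
]


def derive_request_counts(lines):
    """Aggregate events into counts keyed only by low-cardinality fields."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        status_class = f"{event['status'] // 100}xx"  # 200 -> "2xx"
        counts[(event["route"], status_class)] += 1
    return counts


def events_for_user(lines, user_id):
    """Per-user detail is answered from the events, at query time."""
    events = (json.loads(line) for line in lines)
    return [e for e in events if e["user_id"] == user_id]


counts = derive_request_counts(LOG_LINES)
print(counts[("/cart", "2xx")])               # 2
print(len(events_for_user(LOG_LINES, "u-1")))  # 2
```

The derived counter has one series per (route, status class) pair no matter how many users exist, while the per-user question is still answerable from the raw events.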
Deepens in Year 3 Phase 14: Observability — replace one bad-cardinality metric with a structured log + Loki query and observe the storage delta. The discipline matters earlier, starting in Year 1 Phase 7: Kubernetes + GitOps when Prometheus first scrapes pulse and bad labels become real series. ops-handbook carries the runbook for “metric cardinality blew up at 3am.”
Related patterns
- three-pillars-and-unified-telemetry — picking the right pillar is how you pay the right price for a question.
- sli-slo-error-budget — SLI metrics live or die by their cardinality budget.
- partitioning — same intuition, different domain: cardinality is partitioning over label space.
- routing-and-addressing — labels like path and client_ip are where cardinality leaks in from the network layer.