STUB

Three Pillars and Unified Telemetry

The pattern: three orthogonal observability signals. Logs answer “what happened?” (high cardinality, expensive to index as full text). Metrics answer “how often, and how much?” (low cardinality, cheap, aggregatable). Traces answer “where in the request path?” (high cardinality, usually sampled). OpenTelemetry unifies them with a shared correlation ID (trace_id): logs and spans carry it directly, and metrics can link to it through exemplars, so you can pivot between pillars for one request.
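To make the pivot concrete, here is a minimal sketch in plain Python, with no OpenTelemetry SDK involved. The in-memory stores (LOGS, METRICS, SPANS) and the functions handle_request and pivot are hypothetical names invented for illustration, not part of any library. It only shows the shape of the idea: the log line and the span carry the same trace_id, while the metric keeps to low-cardinality labels.

```python
import json
import secrets
from collections import Counter

# Hypothetical in-memory stand-ins for a log store, a metrics store, and a trace store.
LOGS = []
METRICS = Counter()
SPANS = []


def new_trace_id():
    """Return a random 16-byte ID as 32 hex characters (the size W3C Trace Context uses)."""
    return secrets.token_hex(16)


def handle_request(route, status, duration_ms):
    """Emit one span, one structured log line, and one metric increment for a request."""
    trace_id = new_trace_id()

    # Trace: a span keyed by trace_id records where the time went.
    SPANS.append({"trace_id": trace_id, "name": f"GET {route}", "duration_ms": duration_ms})

    # Log: a structured event that carries the same trace_id.
    LOGS.append(json.dumps({"trace_id": trace_id, "msg": "request handled", "status": status}))

    # Metric: low-cardinality labels only; trace_id is deliberately not a label.
    METRICS[(route, status)] += 1

    return trace_id


def pivot(trace_id):
    """Collect the logs and spans that belong to one request."""
    events = [json.loads(line) for line in LOGS]
    return {
        "logs": [e for e in events if e["trace_id"] == trace_id],
        "spans": [s for s in SPANS if s["trace_id"] == trace_id],
    }


slow = handle_request("/checkout", 500, 812.0)
handle_request("/checkout", 200, 40.0)
print(pivot(slow))
```

In a real stack the three stores would be separate backends and the ID would come from propagated trace context rather than being minted per handler; the sketch assumes a single process for brevity.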

The trade-off: storage cost vs. query power. Each pillar has a different cost shape, so answering a question from the wrong pillar is expensive. High-cardinality data belongs in logs (search) or traces (sampled), not metrics, because every distinct combination of label values becomes its own time series. The discipline: pick the pillar that matches the question, then correlate via trace_id when you need the full picture.
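The cost shape of metrics can be illustrated with a little arithmetic. The sketch below assumes the worst case, where every combination of label values actually occurs, so the series count is the product of the per-label cardinalities. The label names and counts are made-up illustrative numbers, and series_count is a hypothetical helper.

```python
from math import prod


def series_count(label_cardinalities):
    """Upper bound on time series for one metric: the product of distinct values per label."""
    return prod(label_cardinalities.values())


# Bounded labels: a handful of values each (illustrative numbers).
bounded = {"method": 5, "status": 6, "route": 20}

# The same metric with one unbounded label added.
unbounded = {**bounded, "user_id": 100_000}

print(series_count(bounded))    # 600
print(series_count(unbounded))  # 60000000
```

One extra label turns 600 series into 60 million, which is why a per-user or per-request question is better answered from logs or traces.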

First stood up in Year 1 Phase 7: Kubernetes + GitOps (Prometheus + Grafana on K3s), and reaches DEEP in Year 3 Phase 14: Observability (Prometheus + Loki + Tempo, with OpenTelemetry-based correlation across them). Probe-emitted metrics flow in from pulse, and the incident timelines reconstructed from these three pillars feed every entry in ops-handbook.

  • cardinality-as-cost — the constraint that forces you to pick the right pillar in the first place.
  • sli-slo-error-budget — SLIs are derived from these pillars; usually the metrics one.
  • runbook-as-code — runbook verification steps query each pillar in turn.
  • distributed-time — trace correlation depends on causal time, not wall clocks.
  • service-mesh — the data plane that emits structured RED metrics and span context for free.