Consensus

The pattern: get a group of machines to agree on a single value, even when some of them fail or messages are lost. It is used for leader election, replicated state machines, distributed locks, and configuration stores. Any Kubernetes cluster backed by etcd runs Raft underneath; you can’t escape consensus once you go distributed.

The trade-off: safety vs. liveness vs. performance. Consensus algorithms guarantee safety (never agree on two different values) but not liveness: during a partition, any side without a majority stalls until the partition heals. Every decision needs acknowledgement from a majority (quorum), so adding nodes buys fault tolerance rather than throughput, and commit latency tends to grow with cluster size. Paxos is famously hard; Raft is the readable alternative; both solve the same problem and differ mainly in implementation cost.
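
The quorum arithmetic is small enough to write down. A minimal sketch in Go, assuming a plain majority quorum; `Quorum` and `Tolerated` are illustrative names, not taken from any library:

```go
package main

import "fmt"

// Quorum returns the smallest number of nodes that forms a strict
// majority of an n-node cluster.
func Quorum(n int) int {
	return n/2 + 1
}

// Tolerated returns how many nodes can fail while a majority of the
// original n nodes is still reachable.
func Tolerated(n int) int {
	return (n - 1) / 2
}

func main() {
	for _, n := range []int{1, 3, 4, 5, 7} {
		fmt.Printf("nodes=%d quorum=%d tolerated_failures=%d\n",
			n, Quorum(n), Tolerated(n))
	}
}
```

Going from 3 to 4 nodes raises the quorum from 2 to 3 but still tolerates only one failure, which is why clusters are usually sized at odd numbers.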

[Deepen Year 2 Phase 8 by implementing Raft leader election in Go. The implementation is what makes it not magic.]
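
A possible starting point for that exercise is sketched below. It covers only the voting rule: one vote per term, stale terms rejected, a higher term forces a step-down. It is a simplification under stated assumptions, not a full Raft node: the log up-to-date check, timers, and networking are left out, peers are called in-process, and every type and function name here is hypothetical.

```go
package main

import "fmt"

// Role is the node's current Raft role.
type Role int

const (
	Follower Role = iota
	Candidate
	Leader
)

// Node holds only the state needed for voting (illustrative names).
type Node struct {
	ID          int
	Role        Role
	CurrentTerm int
	VotedFor    int // -1 means no vote cast in CurrentTerm
}

// VoteRequest is a simplified RequestVote message; log fields are omitted.
type VoteRequest struct {
	Term        int
	CandidateID int
}

// HandleVoteRequest returns the node's term and whether it grants its vote.
func (n *Node) HandleVoteRequest(req VoteRequest) (int, bool) {
	// Reject candidates from an older term.
	if req.Term < n.CurrentTerm {
		return n.CurrentTerm, false
	}
	// A newer term: adopt it, step down, and forget the old vote.
	if req.Term > n.CurrentTerm {
		n.CurrentTerm = req.Term
		n.Role = Follower
		n.VotedFor = -1
	}
	// At most one vote per term; repeating a vote for the same candidate is fine.
	if n.VotedFor == -1 || n.VotedFor == req.CandidateID {
		n.VotedFor = req.CandidateID
		return n.CurrentTerm, true
	}
	return n.CurrentTerm, false
}

// RunElection starts an election for c and reports whether it won.
// Peers are called directly; a real node would send RPCs in parallel.
func RunElection(c *Node, peers []*Node) bool {
	c.CurrentTerm++
	c.Role = Candidate
	c.VotedFor = c.ID
	votes := 1 // the candidate votes for itself
	for _, p := range peers {
		term, granted := p.HandleVoteRequest(VoteRequest{Term: c.CurrentTerm, CandidateID: c.ID})
		if term > c.CurrentTerm {
			// A peer is in a later term: step down.
			c.CurrentTerm = term
			c.Role = Follower
			c.VotedFor = -1
			return false
		}
		if granted {
			votes++
		}
	}
	if votes >= (len(peers)+1)/2+1 {
		c.Role = Leader
		return true
	}
	// No majority: a real node would retry after a randomized timeout.
	return false
}

func main() {
	n1 := &Node{ID: 1, VotedFor: -1}
	n2 := &Node{ID: 2, VotedFor: -1}
	n3 := &Node{ID: 3, VotedFor: -1}
	fmt.Println("node 1 elected:", RunElection(n1, []*Node{n2, n3}), "term:", n1.CurrentTerm)
	fmt.Println("node 2 elected:", RunElection(n2, []*Node{n1, n3}), "term:", n2.CurrentTerm)
	fmt.Println("node 1 is still leader:", n1.Role == Leader)
}
```

The second election shows the term acting as a logical clock: node 2 bumps the term, and node 1 steps down as soon as it sees the higher number.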

Related patterns

Replication: single-leader replication runs on top of consensus for the leader-election step.
CAP and PACELC: consensus is the price you pay for the “C” side of the trade-off.
Distributed time: terms / view numbers are logical clocks; consensus rides on happens-before.
Two-phase commit vs. sagas: atomic commit is a consensus problem in disguise; sagas opt out.
Control loops: Kubernetes controllers reconcile against an etcd-backed (Raft-replicated) source of truth.

First touched in Year 2 Phase 8; operational depth comes from running basecamp’s K3s + ArgoCD on a single etcd quorum.
