STUB

Replication

The pattern: keep multiple copies of the same data on different machines. Reads can fan out to replicas (throughput). Writes must somehow reach all copies (consistency cost). When a node fails, others still have the data (availability).
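A minimal sketch of the pattern, assuming a toy in-memory store with no networking. The `Node` and `ReplicatedStore` names are invented for illustration and do not correspond to any real database API. Writes go to every live copy, reads go to any live copy, and a failed node does not take the data with it.

```python
import random


class Node:
    """One hypothetical machine holding a full copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedStore:
    """Toy replicated key-value store (illustrative only)."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def live_nodes(self):
        return [n for n in self.nodes if n.alive]

    def write(self, key, value):
        # The consistency cost: every live copy must be updated.
        for node in self.live_nodes():
            node.data[key] = value

    def read(self, key):
        # The throughput benefit: any live copy can serve the read.
        live = self.live_nodes()
        if not live:
            raise RuntimeError("no live replicas")
        return random.choice(live).data.get(key)


store = ReplicatedStore(["a", "b", "c"])
store.write("user:1", "alice")
store.nodes[0].alive = False  # simulate a node failure
value_after_failure = store.read("user:1")
```

The sketch deliberately ignores the hard parts (a node that was down during a write comes back stale), which is where the trade-offs start.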

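The trade-off space has three axes. Sync vs. async: sync buys safety at the cost of write latency; async buys speed at the risk of losing acknowledged writes on failover. Single-leader vs. multi-leader vs. leaderless: single-leader is simple; multi-leader gives highly available writes but needs conflict resolution; leaderless relies on quorum reads and writes. Strong vs. eventual consistency: strong means every read sees the latest acknowledged write; eventual means replicas converge once writes stop, with possibly stale reads in the meantime, in exchange for availability. Every database picks a point somewhere in this space.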
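The quorum idea on the leaderless axis can be made concrete with the usual rule of thumb: with n replicas, a write acknowledged by w nodes and a read that consults r nodes are guaranteed to share at least one node when w + r > n. The sketch below only does that arithmetic; the function names are hypothetical, and overlap alone does not imply strong consistency in a real system.

```python
def quorums_overlap(n, w, r):
    """True if every read quorum shares at least one node with every write quorum."""
    return w + r > n


def tolerated_failures(n, w, r):
    """Nodes that can be down while both reads and writes can still reach quorum."""
    return n - max(w, r)


# (n, w, r) configurations chosen purely for illustration.
configs = [(3, 2, 2), (3, 1, 1), (5, 3, 3)]
results = {c: (quorums_overlap(*c), tolerated_failures(*c)) for c in configs}
```

Under these assumptions, (3, 2, 2) overlaps and survives one node down, while (3, 1, 1) survives two nodes down but a read may miss the latest write.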

[Deepen Year 2 Phase 8 — DDIA Ch. 5 + a Postgres failover drill is where this becomes muscle memory.]

  • Consensus — single-leader replication needs leader election; that’s a consensus problem.
  • Partitioning — orthogonal axis: replication is “how many copies of each piece?”, partitioning is “which piece of the data lives on which node?”.
  • CAP and PACELC — every replication choice is a CAP/PACELC stance.
  • Eventual consistency — what you get for free with leaderless / async replication.
  • Write-ahead logging — the WAL is what gets shipped between leader and replicas.
  • Append-only log — Kafka ISR is replication of an append-only log, made operational.

First touched in Year 1 Phase 3 (Postgres streaming replication); promoted to DEEP in Year 2 Phase 8 after a real failover drill on basecamp.