LSM vs B-tree
The pattern: two index data structures with opposite write/read trade-offs. B-tree: balanced, in-place updates, optimized for reads. LSM (Log-Structured Merge): append-only writes batched into immutable sorted runs, periodic compaction merges them; optimized for writes, with multi-level reads mitigated by Bloom filters.
The trade-off: read amplification vs. write amplification. B-trees do random writes (bad for write-heavy workloads on disk) and direct reads. LSMs do sequential writes (great) and multi-level reads (mitigated by Bloom filters and compaction). Workload shape decides: write-heavy → LSM; read-heavy with index-only scans → B-tree. Modern databases blur the line — Postgres B-tree + WAL is LSM-like at the WAL level.
Deepens in Year 1 Phase 3 (Postgres B-tree under load + a RocksDB compare). DDIA Ch. 3 and the CMU database lectures are the standard references.
Related patterns
- append-only-log — the substrate LSM writes land on before compaction.
- write-ahead-logging — even B-tree engines lean on append-only WAL for durability.
- oltp-vs-olap — workload shape is what picks the index structure.