STUB

Schema on Read vs Schema on Write

The pattern: when do you validate that data conforms to a schema? Schema-on-write validates at insert time and rejects nonconforming records, so stored data is always conformant. Schema-on-read stores anything and applies the schema (validating or reshaping the data) at query time, trading correctness guarantees for flexibility. Iceberg lets you have both: store typed data as Parquet, evolve the schema later, and validate columns at query time.
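A minimal sketch of the two modes, using plain Python lists as the "storage". The schema, the `conforms` helper, and both store classes are hypothetical names made up for illustration; no real database or table format is involved.

```python
from typing import Any

# Hypothetical schema: column name -> required Python type.
SCHEMA: dict[str, type] = {"user_id": int, "amount": float}


def conforms(record: dict[str, Any], schema: dict[str, type]) -> bool:
    """True if the record has exactly the schema's columns with matching types."""
    return record.keys() == schema.keys() and all(
        isinstance(record[name], typ) for name, typ in schema.items()
    )


class WriteValidatedStore:
    """Schema-on-write: nonconforming records are rejected at insert time."""

    def __init__(self, schema: dict[str, type]) -> None:
        self.schema = schema
        self.rows: list[dict[str, Any]] = []

    def insert(self, record: dict[str, Any]) -> None:
        if not conforms(record, self.schema):
            raise ValueError(f"rejected at write: {record!r}")
        self.rows.append(record)

    def read(self) -> list[dict[str, Any]]:
        # No checks needed: everything stored already conforms.
        return list(self.rows)


class ReadValidatedStore:
    """Schema-on-read: anything is accepted; the schema is applied per query."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []

    def insert(self, record: dict[str, Any]) -> None:
        self.rows.append(record)

    def read(self, schema: dict[str, type]):
        # The consumer pays the validation cost and must handle the rejects.
        good = [r for r in self.rows if conforms(r, schema)]
        bad = [r for r in self.rows if not conforms(r, schema)]
        return good, bad
```

With the write-validated store, a record such as `{"user_id": "1", "amount": 9.5}` raises at `insert`; with the read-validated store the same record is stored and only surfaces in the `bad` list when someone queries with `SCHEMA`.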

The trade-off: upfront discipline vs. iteration speed. Schema-on-write forces you to commit to a schema early; it rewards consumers (clean queries) at the cost of producers (rigid evolution). Schema-on-read is the anything-goes data lake; it rewards producers (just dump it) at the cost of consumers (every query starts with “what is this column?”). The lakehouse is a compromise: store typed data (Parquet), allow schema evolution, and validate at the catalog layer.
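The lakehouse compromise can be sketched as typed rows read through an evolving schema. The example below is loosely modelled on the idea of tracking columns by a stable ID rather than by name (the approach Iceberg takes), but it is not Iceberg's API: `Field`, `project`, and the sample schemas are hypothetical, and the "stored row" is just a dict keyed by field ID.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Field:
    field_id: int   # stable identity; survives renames
    name: str       # current column name, free to change
    py_type: type   # expected type of non-null values


def project(stored_row: dict[int, Any], schema: list[Field]) -> dict[str, Any]:
    """Read a row that was stored by field ID under the current schema."""
    out: dict[str, Any] = {}
    for field in schema:
        # Columns added after the row was written are simply absent: read as None.
        value = stored_row.get(field.field_id)
        if value is not None and not isinstance(value, field.py_type):
            raise TypeError(f"{field.name}: expected {field.py_type.__name__}")
        out[field.name] = value
    return out


# Version 1 of the table schema.
schema_v1 = [Field(1, "user_id", int), Field(2, "amount", float)]

# Version 2: "amount" renamed to "amount_usd", new column "country" added.
schema_v2 = [
    Field(1, "user_id", int),
    Field(2, "amount_usd", float),
    Field(3, "country", str),
]

old_row = {1: 42, 2: 9.5}  # written under schema_v1
print(project(old_row, schema_v2))
# → {'user_id': 42, 'amount_usd': 9.5, 'country': None}
```

The producer was never asked to rewrite the old row, yet the consumer sees one consistent, typed shape. That is the sense in which evolution plus typed storage sits between the two extremes.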

Deepens in Year 3 Phase 15, where Iceberg’s schema evolution becomes the worked example on top of basecamp’s MinIO substrate.

  • snapshot-plus-delta — the metadata layer that makes schema evolution safe in lakehouses.
  • oltp-vs-olap — OLTP is almost always schema-on-write; OLAP can afford to relax.
  • batch-processing — schema-on-read pipelines live or die by their batch validation step.