STUB

Stream Processing

The pattern: continuously process events as they arrive, usually statefully (sessionize, join, aggregate over windows), using watermarks to handle out-of-order events. The hard parts: time semantics (event-time vs. processing-time), state management (often RocksDB-backed, as in Flink and Kafka Streams), and exactly-once semantics across producer, processor, and sink.
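To make "stateful, with watermarks for out-of-order events" concrete, here is a minimal Python sketch of keyed session windows over event time. It assumes a bounded-out-of-orderness watermark (highest timestamp seen minus a fixed `max_delay`) and uses an in-memory dict as a stand-in for RocksDB-backed keyed state. The names `SessionWindower`, `gap`, and `max_delay` are hypothetical, not any framework's API.

```python
from dataclasses import dataclass


@dataclass
class Session:
    start: int  # event time of the earliest event in the session
    end: int    # event time of the latest event in the session
    count: int  # number of events folded into the session


class SessionWindower:
    """Keyed event-time session windows closed by a watermark (illustrative only)."""

    def __init__(self, gap, max_delay):
        self.gap = gap              # inactivity gap that separates sessions
        self.max_delay = max_delay  # assumed bound on out-of-orderness
        self.state = {}             # key -> list[Session]; stands in for keyed state
        self.max_ts = None          # highest event time seen so far
        self.dropped = 0            # events rejected as late

    def watermark(self):
        # Claim: no further events with a timestamp below this are expected.
        return None if self.max_ts is None else self.max_ts - self.max_delay

    def process(self, key, ts):
        """Ingest one event; return the sessions closed by the advanced watermark."""
        wm = self.watermark()
        if wm is not None and ts < wm:
            self.dropped += 1  # late: its session may already have been emitted
            return []
        merged = Session(ts, ts, 1)
        kept = []
        for s in self.state.get(key, []):
            # An out-of-order event can extend one session or bridge two of them.
            if s.start - self.gap <= ts <= s.end + self.gap:
                merged = Session(min(merged.start, s.start),
                                 max(merged.end, s.end),
                                 merged.count + s.count)
            else:
                kept.append(s)
        self.state[key] = kept + [merged]
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)
        return self._fire()

    def _fire(self):
        wm = self.watermark()
        closed = []
        for key in list(self.state):
            still_open = []
            for s in self.state[key]:
                # Any future on-time event has ts >= wm > s.end + gap, so s is final.
                if wm > s.end + self.gap:
                    closed.append((key, s.start, s.end, s.count))
                else:
                    still_open.append(s)
            if still_open:
                self.state[key] = still_open
            else:
                del self.state[key]
        return sorted(closed)


w = SessionWindower(gap=5, max_delay=6)
out = []
for ts in [1, 3, 12, 8, 30]:  # ts=8 arrives after ts=12
    out += w.process("u", ts)
# out == [("u", 1, 12, 4)]
```

The out-of-order event at ts=8 arrives after ts=12 but is still ahead of the watermark, so it bridges the sessions [1, 3] and [12, 12] into one. The merged session is emitted only once the watermark (30 − 6 = 24) moves more than `gap` past its last event.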

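The trade-off: latency vs. correctness. Aggressive watermarks close windows fast: low latency, but events that arrive after their window has fired are dropped. Conservative watermarks wait for late arrivals: correct, but slow. Stateful operators need durable state (checkpoints), and checkpoint frequency trades recovery time for steady-state throughput: frequent checkpoints leave less to replay after a failure but cost more while running. Flink wins for serious streaming (large keyed state, event-time windows, exactly-once state via checkpoints); Kafka Streams wins for “JVM-shop, broker is the system,” since it runs as a library inside the application rather than as a separate cluster.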
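The latency vs. correctness trade-off shows up when one out-of-order stream runs through the same tumbling-window count with two watermark delays. This is a hypothetical sketch: `tumbling_counts` and its parameters are illustrative, not a library API. It assumes a bounded-out-of-orderness watermark and a bounded input that is flushed at the end, where a real job would keep waiting.

```python
from collections import defaultdict


def tumbling_counts(events, size, max_delay):
    """Count events per (key, window) in event-time tumbling windows.

    events: (key, event_ts) pairs in arrival (processing-time) order.
    size: window length; an event at ts belongs to [ts - ts % size, +size).
    max_delay: assumed out-of-orderness bound; watermark = max ts seen - max_delay.
    Returns (emitted, dropped): emitted lists (key, window_start, count) in
    firing order; dropped counts events that arrived after their window fired.
    """
    state = defaultdict(int)  # (key, window_start) -> running count
    emitted, dropped = [], 0
    max_ts = None
    for key, ts in events:
        start = ts - ts % size
        wm = None if max_ts is None else max_ts - max_delay
        if wm is not None and start + size <= wm:
            dropped += 1  # window already fired: the late event is lost
            continue
        state[(key, start)] += 1
        max_ts = ts if max_ts is None else max(max_ts, ts)
        wm = max_ts - max_delay
        # Fire every window whose end the watermark has reached.
        for k, w_start in sorted(state):
            if w_start + size <= wm:
                emitted.append((k, w_start, state.pop((k, w_start))))
    # Bounded demo input: flush windows still open at the end.
    for k, w_start in sorted(state):
        emitted.append((k, w_start, state[(k, w_start)]))
    return emitted, dropped


stream = [("a", 1), ("a", 4), ("a", 11), ("a", 7), ("a", 12), ("a", 21)]
fast = tumbling_counts(stream, size=10, max_delay=0)  # aggressive watermark
slow = tumbling_counts(stream, size=10, max_delay=5)  # conservative watermark
```

With `max_delay=0` the window [0, 10) fires as soon as the event at ts=11 arrives, so the late event at ts=7 is dropped. With `max_delay=5` the same window waits until ts=21 arrives and counts all three of its events.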

Deepens in Year 3 Phase 16: Stream Processing — DDIA Ch. 11 plus a Flink keyed-window exercise on Redpanda. The bounded counterpart lands in Year 3 Phase 17: Batch Processing, and lambda-and-kappa is where the two architectures meet.

  • batch-processing: the bounded counterpart — same data, different temporal contract.
  • lambda-and-kappa: how stream and batch combine, or whether stream replaces batch entirely.
  • delivery-semantics: at-least-once vs. exactly-once is the streaming contract that defines correctness.
  • distributed-time: event-time vs. processing-time vs. ingestion-time — the choice that drives watermarks.
  • backpressure: a streaming operator that can’t keep up must signal upstream, not buffer to OOM.
  • append-only-log: the substrate stream processors consume from (Kafka, Redpanda topics).
  • materialized-views: a streaming aggregate is an incrementally-maintained view.
  • idempotency: the consumer-side property that turns at-least-once into effectively exactly-once.
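The idempotency bullet is what makes the delivery-semantics contract usable in practice. Below is a minimal sketch: an at-least-once consumer that crashes after applying records but before committing its offset, and a sink that deduplicates by event id so redelivered records are not applied twice. Assumptions: every event carries a unique, stable id; a Python list and integer offset stand in for a Kafka/Redpanda topic and its committed offset; `IdempotentSink`, `consume`, and `crash_after` are hypothetical names.

```python
class IdempotentSink:
    """Applies each event at most once, keyed by a unique event id.

    Assumption: the producer stamps every event with a stable id. Here `seen`
    and `totals` live in memory together; a real sink must persist them in
    the same transaction, or a crash between the two reopens the gap.
    """

    def __init__(self):
        self.seen = set()  # ids already applied (unbounded here; real systems expire them)
        self.totals = {}   # the side effect we want exactly once: per-account sums

    def apply(self, event_id, account, amount):
        if event_id in self.seen:
            return False   # redelivered duplicate: skip the side effect
        self.seen.add(event_id)
        self.totals[account] = self.totals.get(account, 0) + amount
        return True


def consume(log, sink, committed, crash_after=None):
    """At-least-once consumer: apply records from `committed`, commit at the end.

    If `crash_after` is set, stop after applying that many records without
    committing, so the next run re-reads them from the old offset.
    """
    applied = 0
    for event_id, account, amount in log[committed:]:
        sink.apply(event_id, account, amount)
        applied += 1
        if applied == crash_after:
            return committed  # crash before commit: offset unchanged
    return len(log)           # commit: everything up to the end is done


log = [("e1", "acct-1", 10), ("e2", "acct-1", 5), ("e3", "acct-2", 7)]
sink = IdempotentSink()
offset = consume(log, sink, 0, crash_after=2)  # applies e1, e2, then "crashes"
offset = consume(log, sink, offset)            # redelivers e1, e2, then e3
# sink.totals == {"acct-1": 15, "acct-2": 7}; without the id check acct-1 would be 30
```

The dedup state has to be committed atomically with the side effect; otherwise a crash between the two brings duplicates back. The unbounded `seen` set is also a simplification: real deployments expire old ids or track per-producer sequence numbers instead.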