ML Systems Patterns

Twenty-two patterns at the ML / LLM / agent tier — the largest category, reflecting Year 5's surface area. Feature stores, model registries, evals, RAG, LLM serving, agent loops, MCP, AI security, AIOps, and more.

First touched in Y4 Phase 34 (Python ML stack); the category deepens through Y5 across the full AI stack.

Patterns in this category

MLOps lifecycle

Pattern | First touched | DEEP target
model-registry | Y4 Phase 37 + Y5 Phase 39 | Y5 Phase 39
experiment-tracking | Y4 Phase 37 + Y5 Phase 39 | Y5 (cross-cutting)
feature-store | Y5 Phase 40 | Y5 Phase 40
train-serve-skew | Y5 Phase 40 | Y5 end
evals | Y5 Phase 41 + Phase 46 + Phase 48 | Y5 end
drift-detection | Y5 Phase 41 | OUTLINE target

Retrieval and RAG

Pattern | First touched | DEEP target
vector-search | Y5 Phase 42 | Y5 Phase 42
embedding-store | Y5 Phase 42 | OUTLINE target
rag-as-pattern | Y5 Phase 42 | Y5 Phase 42

LLM serving and gateway

Pattern | First touched | DEEP target
llm-serving | Y5 Phase 43 | Y5 Phase 43
inference-optimization | Y5 Phase 44 | OUTLINE target
fine-tuning-strategies | Y5 Phase 45 | OUTLINE target
llm-routing | Y5 Phase 46 | Y5 Phase 46
llm-caching | Y5 Phase 46 | OUTLINE target
prompt-engineering | Y5 Phase 47 | Y5 Phase 47
structured-outputs | Y5 Phase 47 | OUTLINE target

Agents, MCP, AIOps

Pattern | First touched | DEEP target
agent-loop | Y5 Phase 48 + Phase 50 | Y5 Phase 48
tool-use | Y5 Phase 48 | Y5 Phase 48
mcp-protocol | Y5 Phase 48 | OUTLINE target
ai-security | Y5 Phase 49 | OUTLINE target
ai-observability | Y5 Phase 49 | OUTLINE target
aiops | Y5 Phase 50 | OUTLINE target

Why this category exists

ML systems is the largest pattern category because it’s the youngest. The patterns aren’t yet codified in canonical books the way DDIA codified distributed systems. They’re emerging from public engineering blogs, papers, and operational practice across 2023-2026. Capturing them as patterns — not as tool-specific tutorials — is the way they survive the next wave of tool churn.

The 22 patterns split into four arcs: MLOps lifecycle (registry, tracking, feature store, train-serve-skew, evals, drift); retrieval and RAG (vector search, embeddings, RAG); LLM serving and gateway (serving, optimization, fine-tuning, routing, caching, prompting, structured outputs); and agents, MCP, AIOps (agent loops, tool use, MCP, AI security, AI observability, AIOps).

This category is the reason /root exists as a 5-year program. The first four years build the substrate. Year 5 is where the substrate hosts the AI-tier workloads that make the ML systems patterns operable. Reading these patterns in Year 1 without the substrate is fine; internalizing them requires the substrate to run them on.

Roughly half the patterns in this category are honestly targeted at OUTLINE only. Twenty-two patterns is a lot to genuinely operate in one year. The DEEP claims are calibrated to what basecamp actually exercises: LLM serving through vLLM, LLM routing through llm-gateway, RAG through pgvector + retrieval, agent loops through the runtime plus MCP servers, feature store through Feast. Everything else stays at OUTLINE, honestly labeled.
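The split is easy to check mechanically. The sketch below transcribes the four tables above into one list and tallies it by arc and by target. The short arc keys ("mlops", "rag", "serving", "agents") and the two-value DEEP/OUTLINE encoding are shorthand invented for this example, not names used elsewhere in /root.

```python
from collections import Counter

# (pattern, arc, target) transcribed from the tables in this category.
# Arc keys and the DEEP/OUTLINE encoding are illustrative shorthand.
CATALOG = [
    ("model-registry", "mlops", "DEEP"),
    ("experiment-tracking", "mlops", "DEEP"),
    ("feature-store", "mlops", "DEEP"),
    ("train-serve-skew", "mlops", "DEEP"),
    ("evals", "mlops", "DEEP"),
    ("drift-detection", "mlops", "OUTLINE"),
    ("vector-search", "rag", "DEEP"),
    ("embedding-store", "rag", "OUTLINE"),
    ("rag-as-pattern", "rag", "DEEP"),
    ("llm-serving", "serving", "DEEP"),
    ("inference-optimization", "serving", "OUTLINE"),
    ("fine-tuning-strategies", "serving", "OUTLINE"),
    ("llm-routing", "serving", "DEEP"),
    ("llm-caching", "serving", "OUTLINE"),
    ("prompt-engineering", "serving", "DEEP"),
    ("structured-outputs", "serving", "OUTLINE"),
    ("agent-loop", "agents", "DEEP"),
    ("tool-use", "agents", "DEEP"),
    ("mcp-protocol", "agents", "OUTLINE"),
    ("ai-security", "agents", "OUTLINE"),
    ("ai-observability", "agents", "OUTLINE"),
    ("aiops", "agents", "OUTLINE"),
]

by_arc = Counter(arc for _, arc, _ in CATALOG)
by_target = Counter(target for _, _, target in CATALOG)

print(len(CATALOG), dict(by_arc), dict(by_target))
```

Run against the tables as written, the tally gives 6 + 3 + 7 + 6 = 22 patterns, with 12 carrying a DEEP target and 10 an OUTLINE target.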

How to read this category

Read this category by arc, not linearly. Each arc has one or two entry points that anchor the rest.

MLOps lifecycle arc: model-registry and experiment-tracking are the entry points (Y4 Phase 37 + Y5 Phase 39). Read them first. feature-store and train-serve-skew compose on top. evals and drift-detection are the quality-control patterns above the lifecycle.
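To make the quality-control patterns concrete, here is a minimal sketch of one common way to quantify a distribution shift: the Population Stability Index (PSI) between a baseline sample and a current sample of one numeric feature. The same measure can be pointed at training-time versus serving-time feature values to get a train-serve skew number. This is an illustration only; the function name, the bin count, and the smoothing constant are assumptions, and nothing here claims to be what basecamp or Feast actually computes.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the `expected` baseline.

    Bins are equal-width over the baseline's range; values outside that range
    are clamped into the first or last bin. `eps` avoids log(0) for empty bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def histogram(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = histogram(expected), histogram(actual)
    # Each term is non-negative, so PSI is 0 only when the histograms match.
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 10 for i in range(100)]   # toy training-time feature values
shifted = [v + 5 for v in baseline]       # toy serving-time values, shifted up

print(psi(baseline, baseline))  # identical samples: no shift
print(psi(baseline, shifted))   # large positive value: alert-worthy shift
```

A widely used rule of thumb treats PSI above roughly 0.25 as a significant shift, but the alert threshold is a tuning decision, not something this category prescribes.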

Retrieval and RAG arc: rag-as-pattern is the anchor. Read it first, then vector-search and embedding-store as its component parts. The arc is small and self-contained.
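How the three patterns fit together can be shown in a few lines. The sketch below uses a dictionary as a stand-in embedding store, brute-force cosine similarity as the vector search, and a prompt builder as the RAG step. The two-dimensional vectors, document IDs, and passage texts are made up for illustration; a real setup would get vectors from an embedding model and let pgvector do the nearest-neighbor search in the database.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, store, k=2):
    """vector-search: rank every stored vector against the query, keep the best k."""
    ranked = sorted(store.items(),
                    key=lambda item: cosine(query_vec, item[1][0]),
                    reverse=True)
    return [(doc_id, text) for doc_id, (_, text) in ranked[:k]]

def build_prompt(question, passages):
    """rag-as-pattern: retrieved passages become context ahead of the question."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

# embedding-store stand-in: doc_id -> (vector, text). All values are toy data.
EMBEDDING_STORE = {
    "week-01": ([1.0, 0.0], "Installed pgvector and loaded the first embeddings."),
    "week-02": ([0.0, 1.0], "Tuned vLLM batch sizes."),
    "week-03": ([0.9, 0.1], "Added an index to the embeddings table."),
}

hits = top_k([1.0, 0.0], EMBEDDING_STORE, k=2)
print(build_prompt("What did I do with pgvector?", hits))
```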

LLM serving and gateway arc: llm-serving is the substrate; inference-optimization and fine-tuning-strategies are performance and customization; llm-routing and llm-caching are the gateway-layer patterns; prompt-engineering and structured-outputs are the caller-facing patterns. Seven patterns total; the anchor is llm-serving.
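As a rough illustration of what a gateway-layer router decides, the sketch below filters backends by required capability and a latency budget, then picks the cheapest survivor. Everything in it is hypothetical: the Backend fields, the backend names, and the cost and latency figures are placeholders for the example, not llm-gateway's actual configuration or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Backend:
    name: str
    cost_per_1k_tokens: float   # illustrative units
    p95_latency_ms: float
    capabilities: frozenset

def route(backends, needs, max_latency_ms):
    """Return the cheapest backend that has every needed capability
    and fits the latency budget; ties break on lower latency."""
    eligible = [b for b in backends
                if needs <= b.capabilities and b.p95_latency_ms <= max_latency_ms]
    if not eligible:
        raise LookupError(f"no backend satisfies {set(needs)} within {max_latency_ms} ms")
    return min(eligible, key=lambda b: (b.cost_per_1k_tokens, b.p95_latency_ms))

# Placeholder numbers: one local backend, one hosted backend.
BACKENDS = [
    Backend("local-vllm", 0.0, 800.0, frozenset({"chat"})),
    Backend("hosted-provider", 0.5, 400.0, frozenset({"chat", "vision"})),
]

print(route(BACKENDS, frozenset({"chat"}), max_latency_ms=1000).name)    # local wins on cost
print(route(BACKENDS, frozenset({"vision"}), max_latency_ms=1000).name)  # only hosted can do it
print(route(BACKENDS, frozenset({"chat"}), max_latency_ms=500).name)     # local misses the budget
```

Treating capability and latency as hard filters and cost as the tiebreaker is one design choice among several; a weighted score across all three is an equally plausible policy.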

Agents, MCP, AIOps arc: agent-loop is the anchor. Read it first, then tool-use (how agents call things) and mcp-protocol (the protocol tool-use runs over). ai-security and ai-observability are the operational-safety patterns; aiops is the capstone application.

Which arc goes DEEP depends on where the operational hours accumulate. In basecamp: MLOps lifecycle (Feast, MLflow) gets 3-6 months. LLM serving (vLLM) gets 3-6 months through llm-gateway. RAG (pgvector) gets 3-6 months through notes-rag. Agents get 3-6 months through services/aiops/. That produces ~10 DEEP patterns out of 22 — honestly.

How the patterns connect

The four arcs form a stack.

  • Data tier below — Year 4’s data engineering patterns produce the tables and streams the MLOps arc uses.
  • MLOps lifecycle — the discipline layer. Every model has a registry entry, an experiment lineage, a feature-store dependency, and an eval history.
  • Retrieval and RAG — the retrieval layer. Sits between the data tier (where documents/embeddings live) and the LLM serving layer (where retrieved context feeds prompts).
  • LLM serving and gateway — the inference layer. Models get served (llm-serving), optimized (inference-optimization), customized (fine-tuning-strategies), routed (llm-routing), and cached (llm-caching). Prompts get engineered and constrained to structured outputs.
  • Agents, MCP, AIOps — the composition layer. Agents call tools (via MCP), which call the LLM serving layer for reasoning and the data layer for state. AI security and AI observability keep the composition safe and debuggable. AIOps is one specific application of the whole stack: agents that operate the platform.

Every pattern in the fourth arc depends on patterns in earlier arcs. Agents need LLM serving (arc 3), which uses RAG (arc 2), which uses feature stores (arc 1). The composition is deep and specific; skipping an arc leaves the ones above unsupported.
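The stack can be written down as a small dependency graph, which makes both the reading order and the cost of skipping an arc explicit. The sketch below encodes the dependencies described above using Python's standard graphlib module (3.9+); the arc identifiers and the helper name are shorthand chosen for this example.

```python
from graphlib import TopologicalSorter

# arc -> arcs it depends on, following the stack described above.
DEPENDS_ON = {
    "mlops-lifecycle": {"data-tier"},
    "retrieval-and-rag": {"mlops-lifecycle", "data-tier"},
    "llm-serving-and-gateway": {"retrieval-and-rag"},
    "agents-mcp-aiops": {"llm-serving-and-gateway", "data-tier"},
}

# Bottom-up order: every arc appears after everything it depends on.
build_order = list(TopologicalSorter(DEPENDS_ON).static_order())

def unsupported_if_skipped(skipped):
    """Arcs that lose their footing, directly or transitively, when one arc is skipped."""
    lost = {skipped}
    changed = True
    while changed:
        changed = False
        for arc, deps in DEPENDS_ON.items():
            if arc not in lost and deps & lost:
                lost.add(arc)
                changed = True
    return lost - {skipped}

print(build_order)
print(unsupported_if_skipped("retrieval-and-rag"))
```

Because the arcs form a single chain, the order is unique: data tier, MLOps lifecycle, retrieval and RAG, LLM serving and gateway, then agents. Skipping retrieval and RAG leaves both arcs above it unsupported.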

Where these show up in /root

  • Y4 Phase 34-38: the ML foundations year. model-registry and experiment-tracking first-fire through MLflow. First model trained-registered-served end-to-end. The patterns are OUTLINE at year end; DEEP evidence accumulates in Y5.
  • Y5 Phase 39: MLOps lifecycle deepening. MLflow becomes operational; the registry is where every Y5 model lands.
  • Y5 Phase 40: feature-store and train-serve-skew first-fire through Feast. Train and serve pipelines share feature definitions; parity is verified through observed skew metrics.
  • Y5 Phase 41: evals and drift-detection first-fire. Every model gets offline evals; online evals compare model versions in production; drift-detection alerts when input distributions shift.
  • Y5 Phase 42: vector-search, embedding-store, rag-as-pattern first-fire together through pgvector + embedding pipelines. notes-rag (RAG over your weekly logs) is the personal-services-tier proof.
  • Y5 Phase 43: llm-serving first-fires through vLLM on the GPU. Local Llama or Mistral models served at your throughput.
  • Y5 Phase 44: inference-optimization first-fires through quantization (INT8, INT4), speculative decoding, and PagedAttention memory management.
  • Y5 Phase 45: fine-tuning-strategies first-fires through LoRA and QLoRA. Small parameter-efficient tunes on the local GPU.
  • Y5 Phase 46: llm-routing and llm-caching first-fire through llm-gateway. Route between vLLM (local) and hosted providers based on cost, latency, and capability.
  • Y5 Phase 47: prompt-engineering and structured-outputs first-fire. Prompts as versioned artifacts; JSON-schema-enforced outputs for tool calling.
  • Y5 Phase 48: agent-loop, tool-use, mcp-protocol first-fire together. The agent runtime plus MCP servers wire platform-ctl, data-tier, and ops-handbook as agent-accessible tools.
  • Y5 Phase 49: ai-security and ai-observability first-fire. Prompt-injection tests; jailbreak evals; agent action audits; trace-every-LLM-call observability.
  • Y5 Phase 50: aiops first-fires through services/aiops/. Agents triage alerts, propose runbooks, execute through platform-ctl with human approval gates. All four arcs compose in one operational service.
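The approval gate in the last phase is simple enough to sketch. The example below checks a proposed agent action against an explicit verb-and-resource scope, then refuses any non-read-only verb that lacks a named human approver. It is a toy under stated assumptions: ToolScope, ApprovalRequired, execute, the verb names, and the resource string format are all hypothetical, and the function only returns a record of the action instead of calling platform-ctl.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolScope:
    """Explicit allowlist: specific verbs against specific resources, no wildcards."""
    verbs: frozenset
    resources: frozenset

    def allows(self, verb, resource):
        return verb in self.verbs and resource in self.resources

class ApprovalRequired(Exception):
    """Raised when a mutating action arrives without a human approver."""

READ_ONLY_VERBS = frozenset({"get", "list"})  # assumed split between safe and mutating verbs

def execute(scope, verb, resource, approved_by=None):
    """Gate an agent-proposed action; returns the action record it would run."""
    if not scope.allows(verb, resource):
        raise PermissionError(f"{verb} on {resource} is outside this tool's scope")
    if verb not in READ_ONLY_VERBS and approved_by is None:
        raise ApprovalRequired(f"{verb} on {resource} needs a human approver")
    return {"verb": verb, "resource": resource, "approved_by": approved_by}

scope = ToolScope(verbs=frozenset({"get", "restart"}),
                  resources=frozenset({"deployment/notes-rag"}))

print(execute(scope, "get", "deployment/notes-rag"))                          # read: no gate
print(execute(scope, "restart", "deployment/notes-rag", approved_by="you"))   # mutation: approved
```

Checking scope before approval means an out-of-scope request is rejected outright and never reaches a human, which keeps the approval queue limited to actions the tool is allowed to take at all.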

The Y5 capstone is Studio + the Pattern Paper. The Pattern Paper explicitly synthesizes across this category — which patterns held up under operational load, which needed adjustment, which were misnamed at the start of Y5 and got renamed by the end.

Anti-patterns

Anti-pattern | Why
Promoting all 22 patterns to DEEP | ~10 DEEP is honest for one year of operational work. Claiming 22 DEEP is claiming three months of operational evidence per pattern, which doesn’t fit. Half the patterns stay at OUTLINE, honestly labeled.
Reading LLM patterns from vendor blog posts only | Vendor blogs skew toward the vendor’s product. Read the patterns from at least two vendors (Anthropic + OpenAI, or Anthropic + Google) and the OSS community. Divergence between sources is where the real trade-offs live.
Skipping evals for a model that “works well enough” | Evals are how you know a model degradation happened. Without evals, model updates ship blind, and the first symptom of regression is a user complaint. Every deployed model needs at least one automated eval running continuously.
Feature store without train-serve skew monitoring | The whole point of a feature store is train-serve parity. If you’re not measuring the skew, the feature store is a database with extra ceremony. Instrument train-serve skew from day one.
Prompt-engineering as a permanent role | Prompt engineering is a phase, not a discipline. Systems that codify prompts in versioned artifacts (like structured-output schemas) evolve past the need for prompt-engineers-as-people. The pattern is transient; use it while it’s needed, expect it to disappear.
Agent runtimes with no human-in-the-loop | Agent systems that execute destructive actions autonomously produce disasters. Every aiops action against basecamp goes through an approval gate. The approval gate is not optional; it’s the pattern’s core.
MCP tools with wildcard permissions | An MCP server exposing “run any SQL” or “execute any kubectl” is a supply-chain attack waiting to happen. Every MCP tool is scoped to specific verbs against specific resources; wildcards are a design smell.
Fine-tuning when RAG would work | Fine-tuning is expensive, opaque, and hard to update. RAG is cheap, transparent, and updatable at query time. Try RAG first. Fine-tune only when RAG’s retrieval quality is genuinely insufficient.
AI observability as “we log the prompt” | Real AI observability captures: prompt, completion, model, temperature, tokens in/out, cost, latency, retrieval results, tool calls, evaluation scores. If you log only the prompt, you can’t debug regressions.
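The field list in the last row maps directly onto a record type. The sketch below is one possible shape for a per-call trace record, serialized as a single JSON log line. The class name, field names, units, and the sample values are assumptions made for illustration; they are not the schema of any particular tracing tool.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class LLMCallRecord:
    """One record per LLM call, covering the fields listed in the table above."""
    prompt: str
    completion: str
    model: str
    temperature: float
    tokens_in: int
    tokens_out: int
    cost_usd: float       # assumed unit
    latency_ms: float     # assumed unit
    retrieval_results: list = field(default_factory=list)  # e.g. retrieved doc IDs
    tool_calls: list = field(default_factory=list)         # e.g. tool names + arguments
    eval_scores: dict = field(default_factory=dict)        # e.g. {"groundedness": 0.9}

    def to_log_line(self):
        """Serialize to one JSON line with stable key order for diffing."""
        return json.dumps(asdict(self), sort_keys=True)

# Sample values are placeholders.
record = LLMCallRecord(
    prompt="What changed in week 3?",
    completion="An index was added to the embeddings table.",
    model="local-model",
    temperature=0.2,
    tokens_in=42,
    tokens_out=11,
    cost_usd=0.0,
    latency_ms=350.0,
    retrieval_results=["week-03"],
    eval_scores={"groundedness": 0.9},
)
print(record.to_log_line())
```

With the retrieval results and eval scores stored next to the prompt and completion, a regression can be traced to the stage that changed: retrieval, model, or prompt.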

Cross-references