ML Systems Patterns
Twenty-two patterns at the ML / LLM / agent tier — the largest category, reflecting Year 5's surface area. Feature stores, model registries, evals, RAG, LLM serving, agent loops, MCP, AI security, AIOps, and more.
Twenty-two patterns at the AI tier. Touched in Y4 Phase 34 (Python ML stack); deepens through Y5 across the full AI stack.
Patterns in this category
MLOps lifecycle
| Pattern | First touched | DEEP target |
|---|---|---|
| model-registry | Y4 Phase 37 + Y5 Phase 39 | Y5 Phase 39 |
| experiment-tracking | Y4 Phase 37 + Y5 Phase 39 | Y5 (cross-cutting) |
| feature-store | Y5 Phase 40 | Y5 Phase 40 |
| train-serve-skew | Y5 Phase 40 | Y5 end |
| evals | Y5 Phase 41 + Phase 46 + Phase 48 | Y5 end |
| drift-detection | Y5 Phase 41 | OUTLINE target |
Retrieval and RAG
| Pattern | First touched | DEEP target |
|---|---|---|
| vector-search | Y5 Phase 42 | Y5 Phase 42 |
| embedding-store | Y5 Phase 42 | OUTLINE target |
| rag-as-pattern | Y5 Phase 42 | Y5 Phase 42 |
LLM serving and gateway
| Pattern | First touched | DEEP target |
|---|---|---|
| llm-serving | Y5 Phase 43 | Y5 Phase 43 |
| inference-optimization | Y5 Phase 44 | OUTLINE target |
| fine-tuning-strategies | Y5 Phase 45 | OUTLINE target |
| llm-routing | Y5 Phase 46 | Y5 Phase 46 |
| llm-caching | Y5 Phase 46 | OUTLINE target |
| prompt-engineering | Y5 Phase 47 | Y5 Phase 47 |
| structured-outputs | Y5 Phase 47 | OUTLINE target |
Agents, MCP, AIOps
| Pattern | First touched | DEEP target |
|---|---|---|
| agent-loop | Y5 Phase 48 + Phase 50 | Y5 Phase 48 |
| tool-use | Y5 Phase 48 | Y5 Phase 48 |
| mcp-protocol | Y5 Phase 48 | OUTLINE target |
| ai-security | Y5 Phase 49 | OUTLINE target |
| ai-observability | Y5 Phase 49 | OUTLINE target |
| aiops | Y5 Phase 50 | OUTLINE target |
Why this category exists
ML systems is the largest pattern category because it’s the youngest. The patterns aren’t yet codified in canonical books the way DDIA codified distributed systems. They’re emerging from public engineering blogs, papers, and operational practice across 2023-2026. Capturing them as patterns — not as tool-specific tutorials — is the way they survive the next wave of tool churn.
The 22 patterns split into four arcs: MLOps lifecycle (registry, tracking, feature store, train-serve-skew, evals, drift); retrieval and RAG (vector search, embeddings, RAG); LLM serving and gateway (serving, optimization, fine-tuning, routing, caching, prompting, structured outputs); and agents, MCP, AIOps (agent loops, tool use, MCP, AI security, AI observability, AIOps).
This category is the reason /root exists as a 5-year program. The first four years build the substrate. Year 5 is where the substrate hosts the AI-tier workloads that make the ML systems patterns operable. Reading these patterns in Year 1 without the substrate is fine; internalizing them requires the substrate to run them on.
Half the DEEP targets in this category are honest at OUTLINE-only. Twenty-two patterns is a lot to genuinely operate in one year. The DEEP claims are calibrated to what basecamp actually exercises: LLM serving through vLLM, LLM routing through llm-gateway, RAG through pgvector + retrieval, agent loops through the runtime plus MCP servers, feature store through Feast. Everything else stays at OUTLINE, honestly labeled.
How to read this category
Read this category by arc, not linearly. Each arc has one or two entry points that anchor the rest.
MLOps lifecycle arc: model-registry and experiment-tracking are the entry points (Y4 Phase 37 + Y5 Phase 39). Read them first. feature-store and train-serve-skew compose on top. evals and drift-detection are the quality-control patterns above the lifecycle.
Retrieval and RAG arc: rag-as-pattern is the anchor. Read it first, then vector-search and embedding-store as its component parts. The arc is small and self-contained.
LLM serving and gateway arc: llm-serving is the substrate; inference-optimization and fine-tuning-strategies are performance and customization; llm-routing and llm-caching are the gateway-layer patterns; prompt-engineering and structured-outputs are the caller-facing patterns. Seven patterns total; the anchor is llm-serving.
Agents, MCP, AIOps arc: agent-loop is the anchor. Read it first, then tool-use (how agents call things) and mcp-protocol (the protocol tool-use runs over). ai-security and ai-observability are the operational-safety patterns; aiops is the capstone application.
Which arc goes DEEP depends on where the operational hours accumulate. In basecamp: MLOps lifecycle (Feast, MLflow) gets 3-6 months. LLM serving (vLLM) gets 3-6 months through llm-gateway. RAG (pgvector) gets 3-6 months through notes-rag. Agents get 3-6 months through services/aiops/. That produces ~10 DEEP patterns out of 22 — honestly.
How the patterns connect
The four arcs form a stack.
- Data tier below — Year 4’s data engineering patterns produce the tables and streams the MLOps arc uses.
- MLOps lifecycle — the discipline layer. Every model has a registry entry, an experiment lineage, a feature-store dependency, and an eval history.
- Retrieval and RAG — the retrieval layer. Sits between the data tier (where documents/embeddings live) and the LLM serving layer (where retrieved context feeds prompts).
- LLM serving and gateway — the inference layer. Models get served (
llm-serving), optimized (inference-optimization), customized (fine-tuning-strategies), routed (llm-routing), and cached (llm-caching). Prompts get engineered and constrained to structured outputs. - Agents, MCP, AIOps — the composition layer. Agents call tools (via MCP), which call the LLM serving layer for reasoning and the data layer for state. AI security and AI observability keep the composition safe and debuggable. AIOps is one specific application of the whole stack: agents that operate the platform.
Every pattern in the fourth arc depends on patterns in earlier arcs. Agents need LLM serving (arc 3), which uses RAG (arc 2), which uses feature stores (arc 1). The composition is deep and specific; skipping an arc leaves the ones above unsupported.
Where these show up in /root
- Y4 Phase 34-38 — the ML foundations year.
model-registryandexperiment-trackingfirst-fire through MLflow. First model trained-registered-served end-to-end. The patterns are OUTLINE at year end; DEEP evidence accumulates in Y5. - Y5 Phase 39 — MLOps lifecycle deepening. MLflow becomes operational; the registry is where every Y5 model lands.
- Y5 Phase 40 —
feature-storeandtrain-serve-skewfirst-fire through Feast. Train and serve pipelines share feature definitions; parity is verified through observed skew metrics. - Y5 Phase 41 —
evalsanddrift-detectionfirst-fire. Every model gets offline evals; online evals compare model versions in production; drift-detection alerts when input distributions shift. - Y5 Phase 42 —
vector-search,embedding-store,rag-as-patternfirst-fire together through pgvector + embedding pipelines.notes-rag(RAG over your weekly logs) is the personal-services-tier proof. - Y5 Phase 43 —
llm-servingfirst-fires through vLLM on the GPU. Local Llama or Mistral models served at your throughput. - Y5 Phase 44 —
inference-optimizationfirst-fires through quantization (INT8, INT4), speculative decoding, and PagedAttention memory management. - Y5 Phase 45 —
fine-tuning-strategiesfirst-fires through LoRA and QLoRA. Small parameter-efficient tunes on the local GPU. - Y5 Phase 46 —
llm-routingandllm-cachingfirst-fire throughllm-gateway. Route between vLLM (local) and hosted providers based on cost, latency, and capability. - Y5 Phase 47 —
prompt-engineeringandstructured-outputsfirst-fire. Prompts as versioned artifacts; JSON-schema-enforced outputs for tool calling. - Y5 Phase 48 —
agent-loop,tool-use,mcp-protocolfirst-fire together. The agent runtime plus MCP servers wireplatform-ctl,data-tier, andops-handbookas agent-accessible tools. - Y5 Phase 49 —
ai-securityandai-observabilityfirst-fire. Prompt-injection tests; jailbreak evals; agent action audits; trace-every-LLM-call observability. - Y5 Phase 50 —
aiopsfirst-fires throughservices/aiops/. Agents triage alerts, propose runbooks, execute throughplatform-ctlwith human approval gates. All four arcs compose in one operational service.
The Y5 capstone is Studio + the Pattern Paper. The Pattern Paper explicitly synthesizes across this category — which patterns held up under operational load, which needed adjustment, which were misnamed at the start of Y5 and got renamed by the end.
Anti-patterns
| Anti-pattern | Why |
|---|---|
| Promoting all 22 patterns to DEEP | ~10 DEEP is honest for one year of operational work. Claiming 22 DEEP is claiming three months of operational evidence per pattern, which doesn’t fit. Half the patterns stay at OUTLINE, honestly labeled. |
| Reading LLM patterns from vendor blog posts only | Vendor blogs skew toward the vendor’s product. Read the patterns from at least two vendors (Anthropic + OpenAI, or Anthropic + Google) and the OSS community. Divergence between sources is where the real trade-offs live. |
Skipping evals for a model that “works well enough” | Evals are how you know a model degradation happened. Without evals, model updates ship blind, and the first symptom of regression is a user complaint. Every deployed model needs at least one automated eval running continuously. |
| Feature store without train-serve skew monitoring | The whole point of a feature store is train-serve parity. If you’re not measuring the skew, the feature store is a database with extra ceremony. Instrument train-serve skew from day one. |
| Prompt-engineering as a permanent role | Prompt engineering is a phase, not a discipline. Systems that codify prompts in versioned artifacts (like structured-output schemas) evolve past the need for prompt-engineers-as-people. The pattern is transient; use it while it’s needed, expect it to disappear. |
| Agent runtimes with no human-in-the-loop | Agent systems that execute destructive actions autonomously produce disasters. Every aiops action against basecamp goes through an approval gate. The approval gate is not optional; it’s the pattern’s core. |
| MCP tools with wildcard permissions | An MCP server exposing “run any SQL” or “execute any kubectl” is a supply-chain attack waiting to happen. Every MCP tool is scoped to specific verbs against specific resources; wildcards are a design smell. |
| Fine-tuning when RAG would work | Fine-tuning is expensive, opaque, and hard to update. RAG is cheap, transparent, and updatable at query time. Try RAG first. Fine-tune only when RAG’s retrieval quality is genuinely insufficient. |
| AI observability as “we log the prompt” | Real AI observability captures: prompt, completion, model, temperature, tokens in/out, cost, latency, retrieval results, tool calls, evaluation scores. If you log only the prompt, you can’t debug regressions. |
Cross-references
- Pattern Library
- Year 5 overview — the AI tier curriculum
- Platform Patterns in the Industry — public-knowledge LLM/agent implementations
- Data Engineering Patterns — the data tier the MLOps arc depends on
- Distributed Systems Patterns — backpressure, delivery-semantics, and fault-isolation apply to agent runtimes and LLM gateways
- Observability and Operations Patterns — AI observability extends the three-pillars pattern with AI-specific spans
- Reading list — MLflow docs, vLLM paper, RAG survey, agent frameworks (LangGraph docs), Anthropic + OpenAI engineering blogs