ML Systems Patterns

Twenty-two patterns at the ML / LLM / agent tier — the largest category, reflecting Year 5's surface area. Feature stores, model registries, evals, RAG, LLM serving, agent loops, MCP, AI security, AIOps, and more.

First touched in Y4 Phase 34 (Python ML stack); deepened through Y5 across the full AI stack.

Patterns in this category

MLOps lifecycle

Pattern	First touched	DEEP target
model-registry	Y4 Phase 37 + Y5 Phase 39	Y5 Phase 39
experiment-tracking	Y4 Phase 37 + Y5 Phase 39	Y5 (cross-cutting)
feature-store	Y5 Phase 40	Y5 Phase 40
train-serve-skew	Y5 Phase 40	Y5 end
evals	Y5 Phase 41 + Phase 46 + Phase 48	Y5 end
drift-detection	Y5 Phase 41	OUTLINE target

Retrieval and RAG

Pattern	First touched	DEEP target
vector-search	Y5 Phase 42	Y5 Phase 42
embedding-store	Y5 Phase 42	OUTLINE target
rag-as-pattern	Y5 Phase 42	Y5 Phase 42

LLM serving and gateway

Pattern	First touched	DEEP target
llm-serving	Y5 Phase 43	Y5 Phase 43
inference-optimization	Y5 Phase 44	OUTLINE target
fine-tuning-strategies	Y5 Phase 45	OUTLINE target
llm-routing	Y5 Phase 46	Y5 Phase 46
llm-caching	Y5 Phase 46	OUTLINE target
prompt-engineering	Y5 Phase 47	Y5 Phase 47
structured-outputs	Y5 Phase 47	OUTLINE target

Agents, MCP, AIOps

Pattern	First touched	DEEP target
agent-loop	Y5 Phase 48 + Phase 50	Y5 Phase 48
tool-use	Y5 Phase 48	Y5 Phase 48
mcp-protocol	Y5 Phase 48	OUTLINE target
ai-security	Y5 Phase 49	OUTLINE target
ai-observability	Y5 Phase 49	OUTLINE target
aiops	Y5 Phase 50	OUTLINE target

Why this category exists

ML systems is the largest pattern category because it’s the youngest. The patterns aren’t yet codified in canonical books the way DDIA codified distributed systems. They’re emerging from public engineering blogs, papers, and operational practice across 2023-2026. Capturing them as patterns — not as tool-specific tutorials — is the way they survive the next wave of tool churn.

The 22 patterns split into four arcs: MLOps lifecycle (registry, tracking, feature store, train-serve-skew, evals, drift); retrieval and RAG (vector search, embeddings, RAG); LLM serving and gateway (serving, optimization, fine-tuning, routing, caching, prompting, structured outputs); and agents, MCP, AIOps (agent loops, tool use, MCP, AI security, AI observability, AIOps).

This category is the reason /root exists as a 5-year program. The first four years build the substrate. Year 5 is where the substrate hosts the AI-tier workloads that make the ML systems patterns operable. Reading these patterns in Year 1 without the substrate is fine; internalizing them requires the substrate to run them on.

Roughly half the patterns in this category stay at OUTLINE, and that is the honest label. Twenty-two patterns is too many to genuinely operate in one year. The DEEP claims are calibrated to what basecamp actually exercises: LLM serving through vLLM, LLM routing through llm-gateway, RAG through pgvector + retrieval, agent loops through the runtime plus MCP servers, feature store through Feast. Everything else stays at OUTLINE, honestly labeled.

How to read this category

Read this category by arc, not linearly. Each arc has one or two entry points that anchor the rest.

MLOps lifecycle arc: model-registry and experiment-tracking are the entry points (Y4 Phase 37 + Y5 Phase 39). Read them first. feature-store and train-serve-skew compose on top. evals and drift-detection are the quality-control patterns above the lifecycle.
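To make the drift-detection pattern concrete, here is a minimal sketch of one common way to flag a shifted input distribution: the Population Stability Index (PSI) over a single numeric feature. The choice of PSI, the function names, the bin count, and the 0.2 threshold are all assumptions for illustration; this page does not say which drift metric basecamp uses.

```python
import math
from typing import Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def bucket_shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge buckets.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = bucket_shares(reference), bucket_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


DRIFT_THRESHOLD = 0.2  # assumed alert threshold; tune per feature


def drifted(reference: Sequence[float], current: Sequence[float]) -> bool:
    """True when the current sample has moved far enough to warrant an alert."""
    return psi(reference, current) > DRIFT_THRESHOLD
```

In this sketch the reference sample would come from training data and the current sample from a recent window of serving traffic. The same comparison, run per feature, is one plausible way to observe train-serve skew as well.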

Retrieval and RAG arc: rag-as-pattern is the anchor. Read it first, then vector-search and embedding-store as its component parts. The arc is small and self-contained.
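The three patterns in this arc fit together in a few lines. The sketch below is an in-memory stand-in, not the pgvector setup the program uses: the `Chunk` type, the toy three-dimensional embeddings, the sample texts, and the prompt wording are all hypothetical, and a real pipeline would get embeddings from a model instead of hard-coding them.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    embedding: list  # toy vector; a real pipeline would call an embedding model


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding, store, k=2):
    """vector-search: rank stored chunks by similarity to the query, keep the top k."""
    ranked = sorted(store, key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks) -> str:
    """rag-as-pattern: retrieved context is placed ahead of the question."""
    context = "\n".join(f"- {c.text}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# embedding-store: here just a list; made-up entries for illustration only.
store = [
    Chunk("Week 12: vLLM throughput tuning", [0.9, 0.1, 0.0]),
    Chunk("Week 13: Feast feature definitions", [0.1, 0.9, 0.0]),
    Chunk("Week 14: pgvector index rebuild", [0.0, 0.2, 0.9]),
]
```

Calling `build_prompt(question, retrieve(query_embedding, store))` yields the text that would be sent to the serving layer. Swapping the list for a database-backed store changes `retrieve` but leaves the prompt assembly untouched, which is why the arc reads well with rag-as-pattern first.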

LLM serving and gateway arc: llm-serving is the substrate; inference-optimization and fine-tuning-strategies are performance and customization; llm-routing and llm-caching are the gateway-layer patterns; prompt-engineering and structured-outputs are the caller-facing patterns. Seven patterns total; the anchor is llm-serving.
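As a rough illustration of the two gateway-layer patterns, the sketch below routes a request to the cheapest backend that meets a capability and latency requirement, and puts an exact-match cache in front of the call. It is not llm-gateway's actual design: the backend names, cost and latency figures, and the `call` parameter are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    cost_per_1k_tokens: float  # illustrative numbers, not real pricing
    p95_latency_ms: int
    capabilities: frozenset


BACKENDS = [
    Backend("local-vllm", 0.0, 400, frozenset({"chat"})),
    Backend("hosted-provider", 3.0, 900, frozenset({"chat", "long-context"})),
]


def route(required: set, max_latency_ms: int, backends=BACKENDS) -> Backend:
    """llm-routing: cheapest backend that has the capabilities and fits the latency budget."""
    eligible = [
        b for b in backends
        if required <= b.capabilities and b.p95_latency_ms <= max_latency_ms
    ]
    if not eligible:
        raise LookupError("no backend satisfies the request")
    return min(eligible, key=lambda b: (b.cost_per_1k_tokens, b.p95_latency_ms))


_cache: dict = {}


def cached_call(prompt: str, required: set, max_latency_ms: int, call):
    """llm-caching: exact-match cache; `call(backend, prompt)` is a caller-supplied function."""
    key = (prompt, frozenset(required))
    if key not in _cache:
        _cache[key] = call(route(required, max_latency_ms), prompt)
    return _cache[key]
```

An exact-match cache is the simplest variant and only helps with repeated identical prompts. A production cache key would likely also include the model and sampling parameters, which this sketch omits.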

Agents, MCP, AIOps arc: agent-loop is the anchor. Read it first, then tool-use (how agents call things) and mcp-protocol (the protocol tool-use runs over). ai-security and ai-observability are the operational-safety patterns; aiops is the capstone application.

Which arc goes DEEP depends on where the operational hours accumulate. In basecamp: MLOps lifecycle (Feast, MLflow) gets 3-6 months. LLM serving (vLLM) gets 3-6 months through llm-gateway. RAG (pgvector) gets 3-6 months through notes-rag. Agents get 3-6 months through services/aiops/. That produces ~10 DEEP patterns out of 22 — honestly.

How the patterns connect

The four arcs form a stack.

Data tier below: Year 4’s data engineering patterns produce the tables and streams the MLOps arc uses.
MLOps lifecycle: the discipline layer. Every model has a registry entry, an experiment lineage, a feature-store dependency, and an eval history.
Retrieval and RAG: the retrieval layer. It sits between the data tier (where documents and embeddings live) and the LLM serving layer (where retrieved context feeds prompts).
LLM serving and gateway: the inference layer. Models get served (llm-serving), optimized (inference-optimization), customized (fine-tuning-strategies), routed (llm-routing), and cached (llm-caching). Prompts get engineered and constrained to structured outputs.
Agents, MCP, AIOps: the composition layer. Agents call the LLM serving layer for reasoning and call tools (via MCP) to reach the data layer for state. AI security and AI observability keep the composition safe and debuggable. AIOps is one specific application of the whole stack: agents that operate the platform.

Every pattern in the fourth arc depends on patterns in earlier arcs. Agents need LLM serving (arc 3), which uses RAG (arc 2), which uses feature stores (arc 1). The composition is deep and specific; skipping an arc leaves the ones above unsupported.
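The dependency chain described above can be written down as a small graph, which makes "skipping an arc leaves the ones above unsupported" checkable. The arc identifiers and the helper name below are made up for this sketch, and the edges encode only the linear chain stated in the paragraph.

```python
from graphlib import TopologicalSorter

# Each arc maps to the arcs it depends on, as described in the text.
DEPENDS_ON = {
    "mlops-lifecycle": {"data-tier"},
    "retrieval-and-rag": {"mlops-lifecycle"},
    "llm-serving-and-gateway": {"retrieval-and-rag"},
    "agents-mcp-aiops": {"llm-serving-and-gateway"},
}

# Bottom-up build order: every arc appears after the arcs it depends on.
READING_ORDER = list(TopologicalSorter(DEPENDS_ON).static_order())


def unsupported_by(skipped: str) -> set:
    """Return every arc that transitively depends on the skipped one."""
    affected, frontier = set(), {skipped}
    while frontier:
        # Arcs with a dependency in the current frontier lose their support.
        frontier = {arc for arc, deps in DEPENDS_ON.items() if deps & frontier} - affected
        affected |= frontier
    return affected
```

For example, `unsupported_by("retrieval-and-rag")` returns the serving and agent arcs, while skipping the top arc affects nothing else.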

Where these show up in /root

Y4 Phase 34-38 — the ML foundations year. model-registry and experiment-tracking first-fire through MLflow. First model trained-registered-served end-to-end. The patterns are OUTLINE at year end; DEEP evidence accumulates in Y5.
Y5 Phase 39 — MLOps lifecycle deepening. MLflow becomes operational; the registry is where every Y5 model lands.
Y5 Phase 40 — feature-store and train-serve-skew first-fire through Feast. Train and serve pipelines share feature definitions; parity is verified through observed skew metrics.
Y5 Phase 41 — evals and drift-detection first-fire. Every model gets offline evals; online evals compare model versions in production; drift-detection alerts when input distributions shift.
Y5 Phase 42 — vector-search, embedding-store, rag-as-pattern first-fire together through pgvector + embedding pipelines. notes-rag (RAG over your weekly logs) is the personal-services-tier proof.
Y5 Phase 43 — llm-serving first-fires through vLLM on the GPU. Local Llama or Mistral models served at your throughput.
Y5 Phase 44 — inference-optimization first-fires through quantization (INT8, INT4), speculative decoding, and PagedAttention memory management.
Y5 Phase 45 — fine-tuning-strategies first-fires through LoRA and QLoRA. Small parameter-efficient tunes on the local GPU.
Y5 Phase 46 — llm-routing and llm-caching first-fire through llm-gateway. Route between vLLM (local) and hosted providers based on cost, latency, and capability.
Y5 Phase 47 — prompt-engineering and structured-outputs first-fire. Prompts as versioned artifacts; JSON-schema-enforced outputs for tool calling.
Y5 Phase 48 — agent-loop, tool-use, mcp-protocol first-fire together. The agent runtime plus MCP servers wire platform-ctl, data-tier, and ops-handbook as agent-accessible tools.
Y5 Phase 49 — ai-security and ai-observability first-fire. Prompt-injection tests; jailbreak evals; agent action audits; trace-every-LLM-call observability.
Y5 Phase 50 — aiops first-fires through services/aiops/. Agents triage alerts, propose runbooks, execute through platform-ctl with human approval gates. All four arcs compose in one operational service.
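The approval gate in the Phase 50 entry can be sketched as a thin wrapper between what an agent proposes and what actually runs. This is a minimal illustration under stated assumptions, not the services/aiops/ implementation: `ProposedAction`, `ApprovalGate`, the `approver` and `executor` callables, and the audit tuple layout are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProposedAction:
    alert: str    # the alert the agent triaged
    command: str  # what the agent wants to run; example strings are made up


@dataclass
class ApprovalGate:
    approver: Callable[[ProposedAction], bool]  # stands in for a human decision
    executor: Callable[[str], str]              # hypothetical wrapper around platform-ctl
    audit_log: list = field(default_factory=list)

    def submit(self, action: ProposedAction) -> str:
        """Run the action only if the approver says yes; record the decision either way."""
        approved = self.approver(action)
        self.audit_log.append((action.alert, action.command, approved))
        if not approved:
            return "rejected"
        return self.executor(action.command)
```

The design point is that the agent never holds a reference to the executor. It can only call `submit`, so every action passes through the approver and leaves an audit entry.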

The Y5 capstone is Studio + the Pattern Paper. The Pattern Paper explicitly synthesizes across this category — which patterns held up under operational load, which needed adjustment, which were misnamed at the start of Y5 and got renamed by the end.

Anti-patterns

Anti-pattern	Why
Promoting all 22 patterns to DEEP	~10 DEEP is honest for one year of operational work. Claiming 22 DEEP means claiming three months of operational evidence for each of 22 patterns, 66 pattern-months in total, which does not fit in a 12-month year. Half the patterns stay at OUTLINE, honestly labeled.
Reading LLM patterns from vendor blog posts only	Vendor blogs skew toward the vendor’s product. Read the patterns from at least two vendors (Anthropic + OpenAI, or Anthropic + Google) and the OSS community. Divergence between sources is where the real trade-offs live.
Skipping `evals` for a model that “works well enough”	Evals are how you know a model has degraded. Without evals, model updates ship blind, and the first symptom of a regression is a user complaint. Every deployed model needs at least one automated eval running continuously.
Feature store without train-serve skew monitoring	The whole point of a feature store is train-serve parity. If you’re not measuring the skew, the feature store is a database with extra ceremony. Instrument train-serve skew from day one.
Prompt-engineering as a permanent role	Prompt engineering is a phase, not a discipline. Systems that codify prompts in versioned artifacts (like structured-output schemas) evolve past the need for prompt-engineers-as-people. The pattern is transient; use it while it’s needed, expect it to disappear.
Agent runtimes with no human-in-the-loop	Agent systems that execute destructive actions autonomously produce disasters. Every `aiops` action against basecamp goes through an approval gate. The approval gate is not optional; it’s the pattern’s core.
MCP tools with wildcard permissions	An MCP server exposing “run any SQL” or “execute any kubectl” is a privilege-escalation path waiting to happen: a single injected prompt can steer the agent into arbitrary commands. Every MCP tool is scoped to specific verbs against specific resources; wildcards are a design smell.
Fine-tuning when RAG would work	Fine-tuning is expensive, opaque, and hard to update. RAG is cheap, transparent, and updatable at query time. Try RAG first. Fine-tune only when RAG’s retrieval quality is genuinely insufficient.
AI observability as “we log the prompt”	Real AI observability captures: prompt, completion, model, temperature, tokens in/out, cost, latency, retrieval results, tool calls, evaluation scores. If you log only the prompt, you can’t debug regressions.
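The field list in the last row maps directly onto a per-call record. The sketch below is one possible shape, assuming a JSON-lines log; the class name, field names, units, and the `to_log_line` helper are illustrative choices, not a schema this page defines.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LLMCallRecord:
    prompt: str
    completion: str
    model: str
    temperature: float
    tokens_in: int
    tokens_out: int
    cost_usd: float      # assumed unit
    latency_ms: float    # assumed unit
    retrieval_results: list = field(default_factory=list)  # e.g. chunk ids returned by RAG
    tool_calls: list = field(default_factory=list)          # e.g. tool names and arguments
    eval_scores: dict = field(default_factory=dict)         # e.g. {"faithfulness": 0.9}


def to_log_line(record: LLMCallRecord) -> str:
    """Serialize one call as a single JSON line so regressions can be queried later."""
    return json.dumps(asdict(record), sort_keys=True)
```

With every call logged in this shape, a regression can be narrowed down by filtering on model, retrieval results, or eval scores, which is not possible when only the prompt is kept.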

Cross-references

Pattern Library
Year 5 overview — the AI tier curriculum
Platform Patterns in the Industry — public-knowledge LLM/agent implementations
Data Engineering Patterns — the data tier the MLOps arc depends on
Distributed Systems Patterns — backpressure, delivery-semantics, and fault-isolation apply to agent runtimes and LLM gateways
Observability and Operations Patterns — AI observability extends the three-pillars pattern with AI-specific spans
Reading list — MLflow docs, vLLM paper, RAG survey, agent frameworks (LangGraph docs), Anthropic + OpenAI engineering blogs