Structured Outputs

Force LLM responses into a deterministic schema (JSON, function-call, regex). The discipline that makes LLM outputs reliably parseable by downstream systems.

The LLM doesn’t get to return free-form prose. It returns valid JSON matching a schema. Or a function call. Or it doesn’t return. Status: STUB — promoted to OUTLINE in Y5 Phase 47.

What this pattern is

Structured outputs force LLM responses into a deterministic schema rather than free-form prose, so downstream systems can parse them reliably without regex hackery. The techniques: JSON-mode / Structured Outputs (OpenAI, Anthropic — server-side enforcement that response matches a JSON schema, achieved via constrained decoding); function-calling / tool-use schemas (the LLM emits a function name + arguments matching a declared signature); grammar-constrained decoding (vLLM, Outlines, Guidance — the local-serving equivalent that constrains generation to a CFG or regex).

The pattern matters because most LLM applications need structured downstream consumption — extracting fields, choosing actions, populating UIs, calling tools. Without structured outputs, the consumer must parse LLM prose and fail gracefully on malformed responses; with structured outputs, the schema is the contract and the LLM provider enforces it. The pattern composes with tool-use (function-calling IS structured output applied to action selection) and with prompt-engineering (the schema is part of the prompt’s contract).

The underlying mechanism is constrained decoding. At each generation step, the LLM’s output distribution is masked to allow only tokens that keep the output valid against the schema. This is what makes JSON mode reliable — the model literally cannot emit a malformed JSON structure because invalid tokens are masked to zero probability. This is deterministic at the schema-conformance level (output will parse) while leaving semantic content free (the model still chooses what to say within the schema).

The pattern is transformative for building reliable LLM applications. Before structured outputs, applications wrapped LLM calls in try/catch blocks, regex fallbacks, and retry logic to handle malformed responses. Prompt engineering included pleading with the model to “please return valid JSON!” and hoping. After structured outputs, the schema is a hard constraint. The application code assumes parseable output; the LLM provider guarantees it. This shift dramatically simplifies application logic.

Concrete instances in the wild

  • OpenAI Structured Outputs. Full JSON Schema support. Released 2024, generally available.
  • Anthropic tool use. Function-calling with typed schemas. Same mechanism used for extraction.
  • Anthropic prefill. Force response to start with specific text ({, XML tag, etc.) — simpler primitive that composes.
  • vLLM guided decoding. Regex, CFG, JSON Schema-based constraint at serving time.
  • Outlines. OSS library for constrained LLM generation with regex/CFG/schema.
  • Guidance. Microsoft OSS library for constrained generation with template + logic.
  • LMQL. Query language for constrained LLM generation.
  • Instructor (Jason Liu). Python library that wraps OpenAI + Pydantic for structured extraction.
  • Marvin. Similar library with Pydantic-based schemas.
  • BAML. Newer language for LLM function definitions with strong types.
  • basecamp structured-outputs (Y5 Phase 47). OpenAI/Anthropic native support + vLLM Outlines for local models.

Why this pattern matters

Building reliable LLM applications requires that LLM outputs be predictably parseable. Without structured outputs, application code becomes defensive: try to parse; catch parse errors; retry; give up gracefully; log for later analysis. The complexity accumulates and the reliability never quite reaches “just works.” With structured outputs, the schema is enforced by the provider, and application code can assume parseable output. Complexity moves out of the app into the LLM API.

The pattern is what makes function-calling reliable. Tool-use requires the LLM to select the right tool and provide well-formed arguments. Without constrained decoding, the LLM might emit almost-valid JSON, forget a parameter, or invent parameters. With constrained decoding, the output is guaranteed to be a valid function-call with valid arguments. This is the difference between “agent works most of the time” and “agent works reliably.”

For data extraction workflows, structured outputs are transformative. Extracting fields from documents. Categorizing tickets. Parsing user intents. Generating structured summaries. Each of these becomes a schema definition + LLM call + guaranteed parseable output. What used to require complex prompt engineering + parsing logic becomes a Pydantic model + one API call.

The pattern also enables new LLM patterns that weren’t possible with unstructured output. State machines where LLMs traverse specific transitions. Multi-step reasoning where each step has a specific output schema. Composable LLM chains where each step’s output is another step’s input. Reliable JSON output is the enabling primitive for structured LLM workflows.

For basecamp specifically, structured outputs are what makes the agent stack work reliably. Tool-use in the agent loop depends on structured outputs. RAG citation formats depend on structured outputs. AIOps triage decisions depend on structured outputs. Without them, the agent stack would be a house of cards built on hope; with them, it’s engineered infrastructure.

The failure modes to know: overly complex schemas that models struggle to satisfy (simplicity matters); schemas that force models into unnatural output patterns (quality can degrade); missing schema evolution when APIs change; latency overhead of constrained decoding at scale (usually small but measurable). Each has known patterns for mitigation.

Modern tooling makes structured outputs increasingly accessible. Pydantic-based libraries (Instructor, Marvin) provide Python-native integration. LangChain and LlamaIndex both support structured outputs as first-class primitives. vLLM’s guided decoding brings the pattern to self-hosted models. What used to require custom parsing logic is increasingly a library call.

Depth progression

STUB     ← you are here.
OUTLINE  Promoted when Y5 Phase 47 uses structured outputs in llm-gateway
         (function-calling or JSON mode).
DEEP     Out of scope; promoted with cross-cutting use across Y5 if natural.

Preview: what OUTLINE will answer

When Y5 Phase 47 promotes this entry to OUTLINE, it will name:

  • PROBLEM. How do you get LLM outputs that downstream systems can reliably parse?
  • PRINCIPLES. Schema is the contract. Constrained decoding enforces the contract. Keep schemas simple enough for models to satisfy naturally. Function-calling is structured outputs applied to action selection. Pydantic (or equivalent) as the developer interface.
  • TRADE-OFFS. Native API support (OpenAI, Anthropic — easy, vendor-specific) vs local constrained decoding (vLLM, Outlines — flexible, more setup). Simple schemas (easy for model) vs complex (may force retries). Strict enforcement (guaranteed valid) vs flexible (allows LLM creativity).
  • TOOLS (time-stamped as of 2026-06): OpenAI Structured Outputs, Anthropic tool use / prefill, vLLM guided decoding, Outlines, Guidance, LMQL, Instructor, Marvin, BAML.

The DEEP-level use is folded into every Y5 LLM application; no separate DEEP promotion, but structured outputs become second nature across basecamp’s LLM stack.

Canonical references

  • OpenAI Structured Outputs documentation. Free at platform.openai.com.
  • Anthropic tool use documentation. Free at docs.anthropic.com.
  • Outlines documentation. Free at outlines-dev.github.io/outlines.
  • Instructor documentation. Free at python.useinstructor.com.
  • vLLM documentation on guided decoding. Free at docs.vllm.ai.

Cross-references