LLM Fundamentals: Part 7 -- Structured Output
This is Part 7 of the LLM Fundamentals series.
In Part 5, I drew a line between prompt engineering and harness engineering. Structured output is pure harness. You do not ask a model nicely for JSON. You constrain it at the API level so it cannot produce anything else.
Hoping Is Not a Strategy
The wall shows up early in any LLM pipeline that needs the model to return JSON so downstream code can parse it. A careful prompt, “Return a JSON object with keys name, age, and city. Do not include any other text,” works most of the time. The failure modes are familiar: the model wraps the JSON in a markdown code fence, adds a preamble like “Here is the JSON you requested:” before the actual object, or returns valid JSON with extra keys you did not ask for.
At small scale, you write a regex to strip code fences and wrap JSON.parse in a try-catch. At production scale, with thousands of calls per hour feeding downstream systems, even a small failure rate compounds into hundreds of broken records a day. I burned real debugging hours on this exact problem before structured outputs existed, building fragile parsing layers that grew more complex with every new edge case.
Structured output eliminates the entire category of problem.
How It Works
Anthropic’s structured outputs constrain the model’s generation to conform to a JSON schema you provide. Instead of generating free text and hoping it looks like JSON, the model is forced to produce output that validates against your schema. Every key, type, and required field is guaranteed.
This is not a prompt trick or a post-processing step. The constraint is enforced during generation, at the token-selection level. When the model reaches a point where the schema requires a string value, it can only select tokens that produce a valid string. When a required key is missing, the model must produce it before closing the object.
From Part 2: the model generates one token at a time from a probability distribution. Structured output narrows that distribution at each step to only tokens that keep the output schema-valid. Same mechanism, tighter constraints.
Tool Definitions as Schema Enforcement
Tool use gives you another path to structured output, and in practice it is the one I reach for most often. When you define a tool with an input schema, the model must produce arguments that match that schema when it decides to call the tool. Combined with tool_choice to force a specific tool, you effectively get structured output through the tool calling mechanism.
Here is a tool definition that extracts contact information:
```python
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "extract_contact",
        "description": "Extract contact information from text.",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Full name of the person"},
                "email": {"type": "string", "description": "Email address"},
                "phone": {"type": "string", "description": "Phone number"},
                "company": {"type": "string", "description": "Company or organization"},
            },
            "required": ["name", "email"],
        },
    }
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "extract_contact"},
    messages=[
        {
            "role": "user",
            "content": "Contact: Jane Park, jane@acme.co, works at Acme Corp",
        }
    ],
)

# response.content[0].input is guaranteed to match the schema
contact = response.content[0].input
print(contact["name"])   # "Jane Park"
print(contact["email"])  # "jane@acme.co"
```

Setting tool_choice to {"type": "tool", "name": "extract_contact"} forces Claude to call that specific tool, which means the response will always contain a valid input object matching the schema. The parse-and-recover code disappears. Access response.content[0].input and get a Python dictionary that conforms to your schema definition.
When to Use Which Approach
Direct structured output and tool-based extraction both guarantee schema conformance, but they serve different purposes.
Direct structured output fits when the model needs to produce a formatted response and nothing else: classification tasks, data extraction, configuration generation. Any case where the entire response is the structured data.
Tool definitions fit when structured output is part of a larger interaction. If the model might need to call multiple tools, or if the structured extraction is one step in an agent loop, tool schemas keep each action’s inputs validated while letting the model reason about which actions to take and in what order.
Both approaches solve the same fundamental problem: turning probabilistic text generation into deterministic, schema-valid output that code can consume without defensive parsing.
Schema Design Rules
A schema that is too loose gives the model room to produce technically valid but useless output. A schema that is too specific fights against the model’s natural expression. A few practical rules from the pipelines I have shipped:
Mark fields as required only when downstream code truly depends on them. Optional fields let the model omit information it cannot confidently extract rather than hallucinating a value to satisfy the schema.
Use enums for classification. Instead of "type": "string" for a category field, list the valid values with "enum": ["bug", "feature", "question"]. This constrains the model to your exact taxonomy instead of letting it invent synonyms like “defect” or “enhancement.”
Keep descriptions concrete. “Email address” works. “Relevant electronic communication identifier for the individual” does not. Clear, simple descriptions help the model map input data to schema fields accurately.
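Putting the three rules together, here is a sketch of a ticket-triage schema. The field names and taxonomy are illustrative, not from a real system:

```python
# Illustrative schema applying the rules above: a minimal required list,
# an enum for classification, and plain concrete descriptions.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "type": {
            "type": "string",
            "enum": ["bug", "feature", "question"],  # exact taxonomy, no synonyms
            "description": "Ticket category",
        },
        "summary": {
            "type": "string",
            "description": "One-sentence summary of the ticket",
        },
        "customer_email": {
            "type": "string",
            "description": "Email address",  # concrete, not over-specified
        },
    },
    # Require only what downstream code truly depends on; the model may
    # omit customer_email rather than hallucinate one to satisfy the schema.
    "required": ["type", "summary"],
}
```

Note that customer_email stays optional: if the ticket text contains no email, the model can simply leave the field out instead of inventing a plausible-looking address.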
Cost and Caching
Schemas add tokens to your request, which adds cost. Anthropic compiles each schema into a grammar and caches that artifact for 24 hours from last use, so repeated calls against the same schema benefit from reduced processing overhead. For high-volume pipelines, this caching changes the economics.
Schemas are best designed once and reused as shared constants. One source of truth for the schema, used in the API call and in the TypeScript types that consume the result. If the schema and the types ever diverge, you find out at compile time, not in production.
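In Python terms, the same idea looks like one module-level constant feeding every consumer. The names here are illustrative:

```python
# One source of truth: the schema lives in a shared constant, and both the
# tool definition and any downstream consumer reference it. Names are
# illustrative, not from a real codebase.
CONTACT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Full name of the person"},
        "email": {"type": "string", "description": "Email address"},
    },
    "required": ["name", "email"],
}

def extract_contact_tool() -> dict:
    """Build the tool definition from the shared schema constant."""
    return {
        "name": "extract_contact",
        "description": "Extract contact information from text.",
        "input_schema": CONTACT_SCHEMA,
    }

# The API call and any consumer see the same object, so the schema and its
# consumers cannot silently diverge.
tool = extract_contact_tool()
assert tool["input_schema"] is CONTACT_SCHEMA
```

Reusing the identical constant also keeps the request bytes stable across calls, which is what lets the 24-hour grammar cache keep hitting.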
What Changes When Output Is Guaranteed
Once you can rely on the model's output structure, entire architectural patterns become possible. Prompt chaining from Part 5 becomes trivial when every step produces validated output that the next step can consume without error handling. Pipeline orchestration simplifies because you remove an entire class of runtime failures.
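A sketch of what that chaining looks like when every step's output is schema-guaranteed. Here call_model is a hypothetical stand-in for a schema-constrained API call; the point is the absence of any parsing or recovery between steps:

```python
# Sketch of a two-step chain where each step returns a schema-valid dict.
# `call_model(prompt, schema)` is a hypothetical stand-in for a
# schema-constrained API call -- injected so the pipeline shape is clear.
from typing import Callable

def triage_pipeline(ticket_text: str,
                    call_model: Callable[[str, dict], dict]) -> dict:
    # Step 1: classify. The output is guaranteed to match this schema.
    classify_schema = {
        "type": "object",
        "properties": {"category": {"type": "string",
                                    "enum": ["bug", "feature", "question"]}},
        "required": ["category"],
    }
    classification = call_model(ticket_text, classify_schema)

    # Step 2: route based on step 1's output. No try/except, no regex --
    # classification["category"] is guaranteed to exist and be in the enum.
    route_schema = {
        "type": "object",
        "properties": {"team": {"type": "string"}},
        "required": ["team"],
    }
    routing = call_model(f"Route a {classification['category']} ticket",
                         route_schema)
    return {"category": classification["category"], "team": routing["team"]}
```

Each step trusts the previous one's structure completely, so the glue between steps is plain dictionary access rather than a validation layer.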
Before structured output, a meaningful fraction of my pipeline code was parsing, validation, and error recovery glue around JSON.parse. After adopting schema-constrained generation, that layer went away entirely.
Structured output is the prerequisite for everything that follows in this series. Tool use, in the next part, builds directly on tool schemas. Agent loops, in the part after that, depend on structured tool calls flowing reliably between steps.