
Structured Output Design: Making LLMs Return What You Need

JSON schemas, XML tags, markdown formatting, and table outputs.

The Prompt Engineering Project · March 3, 2025 · 7 min read

Quick Answer

LLM structured output is the practice of designing prompts and systems that produce machine-parseable responses in formats like JSON, XML, or typed schemas. Reliable structured output requires explicit schema definitions in the prompt, output validation with retry logic, examples of correct formatting, and fallback handling when the model produces malformed data. It is essential for any AI system that feeds into downstream code.

The most common failure mode in production AI systems is not wrong answers. It is correct answers in unparseable formats. A model that returns brilliant analysis as free-form prose is useless to the API endpoint expecting JSON. A model that almost follows your schema -- adding an extra field here, omitting a required one there -- is worse than useless, because it fails intermittently and unpredictably.

Structured output design is the practice of constraining language model responses into predictable, machine-readable formats. It sits at the intersection of prompt engineering and API design, and it is one of the highest-leverage skills in production AI work. Get it right and your system hums. Get it wrong and you spend your weekends writing regex to salvage malformed JSON from a model that was trying its best.

There are four reliable patterns for structured output. Each has its strengths, failure modes, and ideal use cases. This article covers all four, then addresses the defense strategies that keep them working at scale.

Pattern 1: JSON with Schema Definition

The most common structured output pattern is providing an explicit JSON schema in your prompt. You show the model the exact shape of the object you want, and you tell it to conform. This works because language models are excellent at pattern matching, and a concrete example of the desired structure is the strongest possible signal for what you expect.

The key is specificity. Do not describe the schema in prose. Show the schema itself, with types annotated and required fields marked.

prompt-with-json-schema.txt
Analyze the following customer support ticket and return your analysis as JSON matching this exact schema:

{
  "sentiment": "positive" | "negative" | "neutral",
  "category": "billing" | "technical" | "account" | "feature_request",
  "urgency": "low" | "medium" | "high" | "critical",
  "summary": "string (1-2 sentences)",
  "suggested_action": "string",
  "confidence": number between 0 and 1
}

Return ONLY the JSON object. No markdown fences, no explanation, no preamble.

Ticket: """
{ticket_text}
"""

The "Return ONLY the JSON object" instruction is critical. Without it, most models wrap the JSON in a conversational response or markdown code fences, both of which break JSON.parse() in production. Some models add trailing explanations even when told not to, so you may need to extract the JSON from the response with a simple regex as a fallback.
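That fallback extraction can be sketched in a few lines of TypeScript. The function name and the fence-stripping regex here are illustrative, not a standard recipe -- adapt the heuristics to the failure modes you actually observe:

```typescript
// Fallback extractor: strip markdown fences and surrounding prose,
// then parse the span between the first "{" and the last "}".
function extractJson(raw: string): unknown {
  // Fast path: the response is already clean JSON.
  try {
    return JSON.parse(raw)
  } catch {
    // Fall through to extraction.
  }

  // Strip markdown code fences (```json ... ```) if present.
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/)
  const candidate = fenced ? fenced[1] : raw

  // Grab the span from the first "{" to the last "}".
  const start = candidate.indexOf('{')
  const end = candidate.lastIndexOf('}')
  if (start === -1 || end === -1 || end <= start) {
    throw new Error('No JSON object found in model response')
  }
  return JSON.parse(candidate.slice(start, end + 1))
}
```

This handles the two most common wrappers -- conversational preamble and code fences -- but it is a heuristic, not a parser, and should sit behind proper schema validation.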

When using the OpenAI or Anthropic APIs, prefer their native structured output modes (response_format for OpenAI, tool_use for Claude) over prompt-based schema enforcement. These provide guaranteed valid JSON at the API level.

Pattern 2: XML Tags for Section Boundaries

XML-style tags are exceptionally effective for structuring outputs that contain a mix of prose, data, and reasoning. Unlike JSON, which requires strict syntax and escaping, XML tags are forgiving and natural for language models to produce. Claude in particular responds extremely well to XML-structured prompts, both for input and output formatting.

The pattern works by defining clear section boundaries using opening and closing tags. The model fills in each section independently, which reduces the chance of one section contaminating another.

xml-output-prompt.txt
Analyze the provided document and structure your response using these exact tags:

<analysis>
  <summary>A 2-3 sentence overview of the document's main argument</summary>
  <key_claims>
    <claim>First major claim</claim>
    <claim>Second major claim</claim>
  </key_claims>
  <evidence_quality>weak | moderate | strong</evidence_quality>
  <reasoning>Your detailed assessment of the argument's logic</reasoning>
  <recommendation>Your recommended action based on this analysis</recommendation>
</analysis>

Document: {document_text}

XML tags have a distinct advantage for outputs that contain long-form text within a structured container. A JSON value containing multiple paragraphs requires escaping newlines and quotes. An XML tag simply wraps the text naturally. This makes XML the better choice when your structured output includes narrative sections alongside categorical data.

Instead of

{"analysis": "The document argues that...\n\nFurthermore..."}

Try this

<analysis>The document argues that... Furthermore...</analysis>
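Pulling sections back out of an XML-tagged response is simple pattern matching. A minimal sketch in TypeScript -- the helper names are illustrative, and a regex approach assumes the model does not nest a tag inside itself:

```typescript
// Pull the text between a named pair of XML-style tags.
// Returns null when the tag is absent so callers can decide how to degrade.
function extractTag(response: string, tag: string): string | null {
  const match = response.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`))
  return match ? match[1].trim() : null
}

// Repeated tags (like <claim>) collect into an array.
function extractAllTags(response: string, tag: string): string[] {
  const re = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`, 'g')
  return [...response.matchAll(re)].map(m => m[1].trim())
}
```

Because each tag is extracted independently, one malformed section does not corrupt the others -- a practical advantage over JSON, where a single syntax error invalidates the whole object.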

Pattern 3: Markdown Tables for Tabular Data

When your output is fundamentally tabular -- comparisons, feature matrices, extracted records -- markdown tables offer a format that is both human-readable and machine-parseable. Models produce markdown tables reliably because they have seen millions of them in training data.

table-output-prompt.txt
Extract all products mentioned in this review and present them in a markdown table with these exact columns:

| Product Name | Rating (1-5) | Pros | Cons | Recommended |
|---|---|---|---|---|

Each row should be one product. Rating should be a single integer. Recommended should be Yes or No.
Do not add any text before or after the table.

Review: {review_text}

Parsing markdown tables is straightforward -- split by |, trim whitespace, skip the header separator row. The main failure mode is inconsistent column counts, which happens when the model decides a cell needs a pipe character or when a value is empty. Defensive parsing should handle both cases.
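The defensive parsing described above can be sketched as follows. This is one reasonable implementation, not a spec -- the padding/truncation policy for ragged rows is a choice you should make deliberately:

```typescript
// Defensive markdown table parser: skips the header separator row and
// tolerates rows whose cell count drifts from the header's.
function parseMarkdownTable(markdown: string): Record<string, string>[] {
  const lines = markdown
    .split('\n')
    .map(l => l.trim())
    .filter(l => l.startsWith('|'))

  if (lines.length < 2) return []

  const splitRow = (line: string): string[] =>
    line.split('|').slice(1, -1).map(cell => cell.trim())

  const headers = splitRow(lines[0])
  const rows: Record<string, string>[] = []

  for (const line of lines.slice(1)) {
    // Skip the |---|---| separator row.
    if (/^\|[\s:|-]+\|$/.test(line)) continue
    const cells = splitRow(line)
    // Tolerate short rows by padding with "" and long rows by truncating.
    const row: Record<string, string> = {}
    headers.forEach((h, i) => { row[h] = cells[i] ?? '' })
    rows.push(row)
  }
  return rows
}
```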

Markdown tables work best when your data is flat and uniform. For nested data or variable-length arrays within cells, JSON or XML are better choices.

Pattern 4: Typed Enums for Constrained Values

Sometimes you do not need a full schema. You need the model to choose from a fixed set of values. Classification, routing, triage, and sentiment analysis all fall into this category. The pattern is deceptively simple: list the allowed values explicitly and instruct the model to return one and only one.

enum-constraint-prompt.txt
Classify the following customer message into exactly one category.

Allowed categories:
- BILLING
- TECHNICAL_SUPPORT
- ACCOUNT_ACCESS
- FEATURE_REQUEST
- CANCELLATION
- GENERAL_INQUIRY

Respond with ONLY the category name. No punctuation, no explanation.

Message: {message_text}

The failure mode here is subtle: the model returns a valid-looking but non-canonical value. Instead of TECHNICAL_SUPPORT, it returns Technical Support or tech_support or Technical. Your validation layer must normalize casing and handle near-matches. A fuzzy match against the allowed set with a confidence threshold is more robust than exact string comparison.
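A sketch of that normalization layer in TypeScript. The canonicalization rules and prefix-match fallback are illustrative choices, not a fixed recipe:

```typescript
// Canonicalize near-miss enum values; return null for anything unrecognized.
const CATEGORIES = [
  'BILLING',
  'TECHNICAL_SUPPORT',
  'ACCOUNT_ACCESS',
  'FEATURE_REQUEST',
  'CANCELLATION',
  'GENERAL_INQUIRY',
] as const

type Category = (typeof CATEGORIES)[number]

function normalizeCategory(raw: string): Category | null {
  // Canonical form: uppercase, non-alphanumerics collapsed to underscores.
  const canon = raw
    .trim()
    .toUpperCase()
    .replace(/[^A-Z0-9]+/g, '_')
    .replace(/^_+|_+$/g, '')

  // Exact match after normalization handles "Technical Support" and "billing.".
  const exact = CATEGORIES.find(c => c === canon)
  if (exact) return exact

  // Prefix match catches truncations like "Technical" -- but only when
  // the match is unambiguous.
  const candidates = CATEGORIES.filter(
    c => c.startsWith(canon) || canon.startsWith(c)
  )
  return candidates.length === 1 ? candidates[0] : null
}
```

A full fuzzy match (edit distance with a threshold) would also catch abbreviations like tech_support; this sketch stops at normalization plus unambiguous prefix matching.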

The model does not need to be creative with your enum values. It needs to be exact. Make the allowed set explicit and the rejection criteria absolute.

Defense Strategies: When Structure Fails

No prompting strategy produces perfectly structured output 100% of the time. Production systems need defense layers. Three strategies, used together, cover the vast majority of failure modes.

Validation with retry. Parse the output. If it fails schema validation, send it back to the model with the specific error message and ask it to fix the output. This works surprisingly well -- models are good at correcting their own formatting errors when given explicit feedback about what went wrong.

validation-retry.ts
import { z } from 'zod'

// callModel is your model-call wrapper (not shown here).
declare function callModel(prompt: string): Promise<string>

async function getStructuredOutput<T>(
  prompt: string,
  schema: z.ZodSchema<T>,
  maxRetries = 2
): Promise<T> {
  let lastError = ''

  for (let i = 0; i <= maxRetries; i++) {
    const fullPrompt = i === 0
      ? prompt
      : `${prompt}\n\nYour previous response had this error: ${lastError}\nPlease fix and return valid output.`

    const response = await callModel(fullPrompt)

    // JSON.parse can throw on malformed output, so catch syntax errors
    // and feed them back to the model just like schema errors.
    try {
      const parsed = schema.safeParse(JSON.parse(response))
      if (parsed.success) return parsed.data
      lastError = parsed.error.message
    } catch (err) {
      lastError = `Invalid JSON: ${(err as Error).message}`
    }
  }

  throw new Error(`Failed after ${maxRetries} retries: ${lastError}`)
}

Schema enforcement at the API level. Both OpenAI and Anthropic now offer native structured output modes. OpenAI's response_format with a JSON schema guarantees valid JSON. Anthropic's tool use forces the model to return arguments matching a defined schema. These are strictly superior to prompt-based enforcement when available.

Graceful degradation. When all else fails, your system should not crash. Log the malformed output, return a sensible default or error state, and alert your monitoring. The worst production systems are the ones that silently pass malformed data downstream, where it causes failures that are far harder to diagnose.
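That degradation path can be sketched as a small wrapper. parseOrDefault and the logger parameter are illustrative names, not from any library -- wire in whatever monitoring hook your system actually uses:

```typescript
// Graceful degradation: attempt to parse, and on failure log the error
// and return a caller-supplied default instead of throwing.
function parseOrDefault<T>(
  raw: string,
  parse: (raw: string) => T,
  fallback: T,
  logger: (msg: string) => void = console.error
): T {
  try {
    return parse(raw)
  } catch (err) {
    logger(`Malformed model output, using fallback: ${String(err)}`)
    return fallback
  }
}
```

The explicit fallback value forces the caller to decide, at the call site, what a safe default looks like -- which is exactly the decision that silent-failure systems skip.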

Never trust model output without validation, even with API-level schema enforcement. Defense in depth is not paranoia -- it is production engineering.

Key Takeaways

1. JSON with explicit schema is the default choice for structured output. Show the exact shape, annotate types, and instruct the model to return only the JSON.

2. XML tags excel when structured output contains long-form prose sections. They are more forgiving than JSON and particularly effective with Claude.

3. Markdown tables work well for flat, uniform tabular data but break down with nested structures or variable-length content.

4. Typed enums need explicit allowed values and fuzzy validation to handle the near-miss responses models inevitably produce.

5. Defense in depth -- validation with retry, API-level enforcement, and graceful degradation -- is not optional for production systems.

