
Structured Output Design: Making LLMs Return What You Need

JSON schemas, XML tags, markdown formatting, and table outputs.

The Prompt Engineering Project · March 3, 2025 · 7 min read

Quick Answer

LLM structured output is the practice of designing prompts and systems that produce machine-parseable responses in formats like JSON, XML, or typed schemas. Reliable structured output requires explicit schema definitions in the prompt, output validation with retry logic, examples of correct formatting, and fallback handling when the model produces malformed data. It is essential for any AI system that feeds into downstream code.

The most common failure mode in production AI systems is not wrong answers. It is correct answers in unparseable formats. A model that returns brilliant analysis as free-form prose is useless to the API endpoint expecting JSON. A model that almost follows your schema -- adding an extra field here, omitting a required one there -- is worse than useless, because it fails intermittently and unpredictably.

Structured output design is the practice of constraining language model responses into predictable, machine-readable formats. It sits at the intersection of prompt engineering and API design, and it is one of the highest-leverage skills in production AI work. Get it right and your system hums. Get it wrong and you spend your weekends writing regex to salvage malformed JSON from a model that was trying its best.

There are four reliable patterns for structured output. Each has its strengths, failure modes, and ideal use cases. This article covers all four, then addresses the defense strategies that keep them working at scale.

Pattern 1: JSON with Schema Definition

The most common structured output pattern is providing an explicit JSON schema in your prompt. You show the model the exact shape of the object you want, and you tell it to conform. This works because language models are excellent at pattern matching, and a concrete example of the desired structure is the strongest possible signal for what you expect.

The key is specificity. Do not describe the schema in prose. Show the schema itself, with types annotated and required fields marked.

prompt-with-json-schema.txt
Analyze the following customer support ticket and return your analysis as JSON matching this exact schema:

{
  "sentiment": "positive" | "negative" | "neutral",
  "category": "billing" | "technical" | "account" | "feature_request",
  "urgency": "low" | "medium" | "high" | "critical",
  "summary": "string (1-2 sentences)",
  "suggested_action": "string",
  "confidence": number between 0 and 1
}

Return ONLY the JSON object. No markdown fences, no explanation, no preamble.

Ticket: """
{ticket_text}
"""

The "Return ONLY the JSON object" instruction is critical. Without it, most models wrap the JSON in a conversational response or markdown code fences, both of which break JSON.parse() in production. Some models add trailing explanations even when told not to, so you may need to extract the JSON from the response with a simple regex as a fallback.
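That fallback extraction can be sketched in a few lines of TypeScript. The function name and the fence-stripping regex here are illustrative, not a standard recipe -- adapt the heuristics to the failure modes you actually observe:

```typescript
// Fallback extractor: strip markdown fences and surrounding prose,
// then parse the span between the first "{" and the last "}".
function extractJson(raw: string): unknown {
  // Fast path: the response is already clean JSON.
  try {
    return JSON.parse(raw)
  } catch {
    // Fall through to extraction.
  }

  // Strip markdown code fences (```json ... ```) if present.
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/)
  const candidate = fenced ? fenced[1] : raw

  // Grab the span from the first "{" to the last "}".
  const start = candidate.indexOf('{')
  const end = candidate.lastIndexOf('}')
  if (start === -1 || end === -1 || end <= start) {
    throw new Error('No JSON object found in model response')
  }
  return JSON.parse(candidate.slice(start, end + 1))
}
```

This handles the two most common wrappers -- conversational preamble and code fences -- but it is a heuristic, not a parser, and should sit behind proper schema validation.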

When using the OpenAI or Anthropic APIs, prefer their native structured output modes (response_format for OpenAI, tool_use for Claude) over prompt-based schema enforcement. These provide guaranteed valid JSON at the API level.

Pattern 2: XML Tags for Section Boundaries

XML-style tags are exceptionally effective for structuring outputs that contain a mix of prose, data, and reasoning. Unlike JSON, which requires strict syntax and escaping, XML tags are forgiving and natural for language models to produce. Claude in particular responds extremely well to XML-structured prompts, both for input and output formatting.

The pattern works by defining clear section boundaries using opening and closing tags. The model fills in each section independently, which reduces the chance of one section contaminating another.

xml-output-prompt.txt
Analyze the provided document and structure your response using these exact tags:

<analysis>
  <summary>A 2-3 sentence overview of the document's main argument</summary>
  <key_claims>
    <claim>First major claim</claim>
    <claim>Second major claim</claim>
  </key_claims>
  <evidence_quality>weak | moderate | strong</evidence_quality>
  <reasoning>Your detailed assessment of the argument's logic</reasoning>
  <recommendation>Your recommended action based on this analysis</recommendation>
</analysis>

Document: {document_text}

XML tags have a distinct advantage for outputs that contain long-form text within a structured container. A JSON value containing multiple paragraphs requires escaping newlines and quotes. An XML tag simply wraps the text naturally. This makes XML the better choice when your structured output includes narrative sections alongside categorical data.

Instead of

{"analysis": "The document argues that...\n\nFurthermore..."}

Try this

<analysis>The document argues that... Furthermore...</analysis>
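Pulling sections back out of an XML-tagged response is simple pattern matching. A minimal sketch in TypeScript -- the helper names are illustrative, and a regex approach assumes the model does not nest a tag inside itself:

```typescript
// Pull the text between a named pair of XML-style tags.
// Returns null when the tag is absent so callers can decide how to degrade.
function extractTag(response: string, tag: string): string | null {
  const match = response.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`))
  return match ? match[1].trim() : null
}

// Repeated tags (like <claim>) collect into an array.
function extractAllTags(response: string, tag: string): string[] {
  const re = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`, 'g')
  return [...response.matchAll(re)].map(m => m[1].trim())
}
```

Because each tag is extracted independently, one malformed section does not corrupt the others -- a practical advantage over JSON, where a single syntax error invalidates the whole object.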

Pattern 3: Markdown Tables for Tabular Data

When your output is fundamentally tabular -- comparisons, feature matrices, extracted records -- markdown tables offer a format that is both human-readable and machine-parseable. Models produce markdown tables reliably because they have seen millions of them in training data.

table-output-prompt.txt
Extract all products mentioned in this review and present them in a markdown table with these exact columns:

| Product Name | Rating (1-5) | Pros | Cons | Recommended |
|---|---|---|---|---|

Each row should be one product. Rating should be a single integer. Recommended should be Yes or No.
Do not add any text before or after the table.

Review: {review_text}

Parsing markdown tables is straightforward -- split by |, trim whitespace, skip the header separator row. The main failure mode is inconsistent column counts, which happens when the model decides a cell needs a pipe character or when a value is empty. Defensive parsing should handle both cases.
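The defensive parsing described above can be sketched as follows. This is one reasonable implementation, not a spec -- the padding/truncation policy for ragged rows is a choice you should make deliberately:

```typescript
// Defensive markdown table parser: skips the header separator row and
// tolerates rows whose cell count drifts from the header's.
function parseMarkdownTable(markdown: string): Record<string, string>[] {
  const lines = markdown
    .split('\n')
    .map(l => l.trim())
    .filter(l => l.startsWith('|'))

  if (lines.length < 2) return []

  const splitRow = (line: string): string[] =>
    line.split('|').slice(1, -1).map(cell => cell.trim())

  const headers = splitRow(lines[0])
  const rows: Record<string, string>[] = []

  for (const line of lines.slice(1)) {
    // Skip the |---|---| separator row.
    if (/^\|[\s:|-]+\|$/.test(line)) continue
    const cells = splitRow(line)
    // Tolerate short rows by padding with "" and long rows by truncating.
    const row: Record<string, string> = {}
    headers.forEach((h, i) => { row[h] = cells[i] ?? '' })
    rows.push(row)
  }
  return rows
}
```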

Markdown tables work best when your data is flat and uniform. For nested data or variable-length arrays within cells, JSON or XML are better choices.

Pattern 4: Typed Enums for Constrained Values

Sometimes you do not need a full schema. You need the model to choose from a fixed set of values. Classification, routing, triage, and sentiment analysis all fall into this category. The pattern is deceptively simple: list the allowed values explicitly and instruct the model to return one and only one.

enum-constraint-prompt.txt
Classify the following customer message into exactly one category.

Allowed categories:
- BILLING
- TECHNICAL_SUPPORT
- ACCOUNT_ACCESS
- FEATURE_REQUEST
- CANCELLATION
- GENERAL_INQUIRY

Respond with ONLY the category name. No punctuation, no explanation.

Message: {message_text}

The failure mode here is subtle: the model returns a valid-looking but non-canonical value. Instead of TECHNICAL_SUPPORT, it returns Technical Support or tech_support or Technical. Your validation layer must normalize casing and handle near-matches. A fuzzy match against the allowed set with a confidence threshold is more robust than exact string comparison.
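A sketch of that normalization layer in TypeScript. The canonicalization rules and prefix-match fallback are illustrative choices, not a fixed recipe:

```typescript
// Canonicalize near-miss enum values; return null for anything unrecognized.
const CATEGORIES = [
  'BILLING',
  'TECHNICAL_SUPPORT',
  'ACCOUNT_ACCESS',
  'FEATURE_REQUEST',
  'CANCELLATION',
  'GENERAL_INQUIRY',
] as const

type Category = (typeof CATEGORIES)[number]

function normalizeCategory(raw: string): Category | null {
  // Canonical form: uppercase, non-alphanumerics collapsed to underscores.
  const canon = raw
    .trim()
    .toUpperCase()
    .replace(/[^A-Z0-9]+/g, '_')
    .replace(/^_+|_+$/g, '')

  // Exact match after normalization handles "Technical Support" and "billing.".
  const exact = CATEGORIES.find(c => c === canon)
  if (exact) return exact

  // Prefix match catches truncations like "Technical" -- but only when
  // the match is unambiguous.
  const candidates = CATEGORIES.filter(
    c => c.startsWith(canon) || canon.startsWith(c)
  )
  return candidates.length === 1 ? candidates[0] : null
}
```

A full fuzzy match (edit distance with a threshold) would also catch abbreviations like tech_support; this sketch stops at normalization plus unambiguous prefix matching.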

The model does not need to be creative with your enum values. It needs to be exact. Make the allowed set explicit and the rejection criteria absolute.

Defense Strategies: When Structure Fails

No prompting strategy produces perfectly structured output 100% of the time. Production systems need defense layers. Three strategies, used together, cover the vast majority of failure modes.

Validation with retry. Parse the output. If it fails schema validation, send it back to the model with the specific error message and ask it to fix the output. This works surprisingly well -- models are good at correcting their own formatting errors when given explicit feedback about what went wrong.

validation-retry.ts
import { z } from 'zod'

// callModel is your model-call wrapper (not shown here).
declare function callModel(prompt: string): Promise<string>

async function getStructuredOutput<T>(
  prompt: string,
  schema: z.ZodSchema<T>,
  maxRetries = 2
): Promise<T> {
  let lastError = ''

  for (let i = 0; i <= maxRetries; i++) {
    const fullPrompt = i === 0
      ? prompt
      : `${prompt}\n\nYour previous response had this error: ${lastError}\nPlease fix and return valid output.`

    const response = await callModel(fullPrompt)

    // JSON.parse can throw on malformed output, so catch syntax errors
    // and feed them back to the model just like schema errors.
    try {
      const parsed = schema.safeParse(JSON.parse(response))
      if (parsed.success) return parsed.data
      lastError = parsed.error.message
    } catch (err) {
      lastError = `Invalid JSON: ${(err as Error).message}`
    }
  }

  throw new Error(`Failed after ${maxRetries} retries: ${lastError}`)
}

Schema enforcement at the API level. Both OpenAI and Anthropic now offer native structured output modes. OpenAI's response_format with a JSON schema guarantees valid JSON. Anthropic's tool use forces the model to return arguments matching a defined schema. These are strictly superior to prompt-based enforcement when available.

Graceful degradation. When all else fails, your system should not crash. Log the malformed output, return a sensible default or error state, and alert your monitoring. The worst production systems are the ones that silently pass malformed data downstream, where it causes failures that are far harder to diagnose.
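That degradation path can be sketched as a small wrapper. parseOrDefault and the logger parameter are illustrative names, not from any library -- wire in whatever monitoring hook your system actually uses:

```typescript
// Graceful degradation: attempt to parse, and on failure log the error
// and return a caller-supplied default instead of throwing.
function parseOrDefault<T>(
  raw: string,
  parse: (raw: string) => T,
  fallback: T,
  logger: (msg: string) => void = console.error
): T {
  try {
    return parse(raw)
  } catch (err) {
    logger(`Malformed model output, using fallback: ${String(err)}`)
    return fallback
  }
}
```

The explicit fallback value forces the caller to decide, at the call site, what a safe default looks like -- which is exactly the decision that silent-failure systems skip.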

Never trust model output without validation, even with API-level schema enforcement. Defense in depth is not paranoia -- it is production engineering.

Key Takeaways

1. JSON with explicit schema is the default choice for structured output. Show the exact shape, annotate types, and instruct the model to return only the JSON.

2. XML tags excel when structured output contains long-form prose sections. They are more forgiving than JSON and particularly effective with Claude.

3. Markdown tables work well for flat, uniform tabular data but break down with nested structures or variable-length content.

4. Typed enums need explicit allowed values and fuzzy validation to handle the near-miss responses models inevitably produce.

5. Defense in depth -- validation with retry, API-level enforcement, and graceful degradation -- is not optional for production systems.

