
5 Prompt Anti-Patterns That Waste Tokens and Trust

What not to do, and what to do instead.

The Prompt Engineering Project March 22, 2025 7 min read

Quick Answer

The most damaging prompt engineering mistakes are vague instructions with no definition of quality, over-constrained prompts whose rules contradict each other, context dumps that pay for tokens the model does not need, structured-output requests with no schema, and prompts deployed across models without testing. Each anti-pattern degrades output quality and reliability in production systems.

Most prompt engineering advice focuses on what to do. This article is about what to stop doing. These five anti-patterns appear in nearly every codebase we audit. They waste tokens, produce unreliable outputs, and erode the trust that users place in AI-powered features. Each one is easy to identify and straightforward to fix.

The cost of bad prompts is not just computational. A poorly structured prompt generates outputs that require human review, rework, or apology. At scale, these failures become operational debt -- invisible, compounding, and expensive to unwind.

Anti-Pattern 1: The Vague Instruction

This is the most common anti-pattern and the most costly. It appears when a prompt relies on the model to infer what "good" means without providing any definition of quality. The developer knows what they want. The model does not.

Instead of

Make this email better.

Try this

Rewrite this email to be concise (under 150 words), use a professional tone, and end with a specific call to action requesting a meeting next Tuesday.

The word "better" is doing no work in the first prompt. It carries no information about which dimension of quality matters: brevity, tone, structure, accuracy, or persuasiveness. The model will make a guess, and that guess will be wrong often enough to be a problem.

The fix is explicit criteria. Define what good looks like. Specify measurable constraints where possible -- word counts, tone descriptors, structural requirements, and success conditions. The model cannot optimize for criteria it does not know about.

A useful test: if two different people could read your prompt and expect two different outputs, the prompt is too vague.
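One way to make criteria explicit is to treat them as parameters rather than prose. A minimal sketch, assuming hypothetical names (`Criteria`, `build_rewrite_prompt`), that builds the email-rewrite prompt from measurable constraints:

```python
# Hypothetical sketch: explicit, checkable criteria instead of "make it better".
# None of these names come from a real library.
from dataclasses import dataclass

@dataclass
class Criteria:
    max_words: int        # measurable length constraint
    tone: str             # tone descriptor
    call_to_action: str   # required success condition

def build_rewrite_prompt(email: str, c: Criteria) -> str:
    return (
        f"Rewrite this email to be concise (under {c.max_words} words), "
        f"use a {c.tone} tone, and end with a call to action: {c.call_to_action}.\n\n"
        f"Email:\n{email}"
    )

prompt = build_rewrite_prompt(
    "Hi team, ...",
    Criteria(max_words=150, tone="professional",
             call_to_action="request a meeting next Tuesday"),
)
```

Because the constraints live in a structured object, they can also be reused by the code that checks the output (for example, counting words in the response).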

Anti-Pattern 2: The Over-Constraint

The opposite failure mode. Having learned that specificity matters, some teams overcorrect by writing prompts with dozens of rules, constraints, formatting requirements, and behavioral directives that conflict with each other or leave no room for the model to generate useful output.

over-constrained-prompt.txt
You are a helpful assistant. Always be concise. Always provide
thorough explanations. Never use jargon. Use technical terminology
when appropriate. Keep responses under 100 words. Include at least
3 examples for every claim. Never use bullet points. Structure
your response as a bulleted list. Be creative but stick exactly
to the format I describe. Do not make assumptions but infer what
the user needs from context.

Count the contradictions: concise but thorough. No jargon but technical terminology. Under 100 words but three examples per claim. No bullet points but bulleted list. The model will attempt to satisfy all constraints simultaneously, and the result will be incoherent.

The fix is prioritization. Identify the three to five constraints that actually matter for this specific task. Remove everything else. If two rules conflict, choose one and delete the other. A prompt with five clear rules will outperform a prompt with thirty ambiguous ones in every measurable dimension.


Anti-Pattern 3: The Context Dump

Context windows are large. This has made people careless. The anti-pattern looks like this: paste an entire 50-page document into the context, then ask a question that requires only three sentences from page 12.

Instead of

Here is our entire 50-page employee handbook. What is the PTO policy for employees in their first year?

Try this

Section 4.2 of our employee handbook states: [relevant paragraph]. Based on this policy, summarize the PTO entitlement for first-year employees.

The cost is threefold. First, you are paying for tokens that do nothing. A 50-page document might consume 25,000 tokens. If only 200 tokens are relevant, you are paying 125 times more than necessary. Second, irrelevant context dilutes the model's attention. The model must determine which parts of the input matter, and more noise means more opportunities for distraction. Third, latency grows roughly linearly with input length.

The fix is retrieval. Use RAG, manual extraction, or semantic search to identify the relevant passages before they reach the model. Send only what the model needs. If you are not sure which passages are relevant, use a two-stage approach: a fast, cheap model identifies relevant sections, and a slower, more capable model processes them.
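The first stage does not need to be a model at all. A rough illustration of pre-model filtering, using naive keyword overlap as a stand-in for embeddings or a cheap ranking model (all names and the handbook text are illustrative):

```python
# Score each candidate section by keyword overlap with the question,
# then send only the top matches to the model. In production, swap the
# scorer for embeddings, semantic search, or a fast, cheap model.
def score(section: str, question: str) -> int:
    q_terms = {w.lower().strip("?.,:") for w in question.split()}
    return sum(1 for w in section.lower().split() if w.strip("?.,:") in q_terms)

def select_relevant(sections: list[str], question: str, top_k: int = 2) -> list[str]:
    ranked = sorted(sections, key=lambda s: score(s, question), reverse=True)
    return ranked[:top_k]

handbook = [
    "Section 4.2: First-year employees accrue 10 PTO days.",
    "Section 7.1: Expense reports are due monthly.",
]
context = select_relevant(
    handbook, "What is the PTO policy for first-year employees?", top_k=1
)
```

Only the selected passages are interpolated into the prompt, so the token spend tracks the question, not the document.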

Every token in the context window is a spending decision. Treat your context budget with the same discipline you apply to your cloud infrastructure budget.

Anti-Pattern 4: The Format Amnesia

This anti-pattern occurs when a prompt requests structured output without defining the structure. The developer expects JSON but does not say so. Or expects JSON with specific field names but provides no schema. Or expects a table but does not specify columns.

Instead of

Analyze these three products and compare them.

Try this

Compare these three products. Return a JSON array where each object has: name (string), strengths (string array, max 3), weaknesses (string array, max 3), and rating (number, 1-10).

Without a format specification, the model will choose its own structure. That structure will change between requests. Your downstream parsing code will break. Your users will see inconsistent outputs. Your team will spend hours debugging what looks like a model failure but is actually a prompt failure.

structured-output-prompt.txt
Analyze the provided customer feedback and return a JSON object
with this exact schema:

{
  "sentiment": "positive" | "negative" | "neutral",
  "confidence": number between 0 and 1,
  "key_themes": string[] (max 5 items),
  "action_required": boolean,
  "summary": string (max 50 words)
}

Return ONLY the JSON object. No additional text.

The fix is a schema. Every prompt that expects structured output should include the exact schema, field names, types, constraints, and an explicit instruction about whether additional text is acceptable. If you are using a model that supports structured output natively (like JSON mode), use that feature. If not, include the schema in the prompt and validate the output programmatically.
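Programmatic validation can be as simple as parsing the response and asserting against the schema. A stdlib-only sketch, assuming the schema above (`validate_feedback` is a hypothetical name; a real project might use the `jsonschema` library instead):

```python
# Parse the model's raw response and enforce the schema before any
# downstream code touches it. json.loads fails loudly if the model
# wrapped the JSON in extra prose.
import json

def validate_feedback(raw: str) -> dict:
    obj = json.loads(raw)
    assert obj["sentiment"] in ("positive", "negative", "neutral")
    assert 0 <= obj["confidence"] <= 1
    assert isinstance(obj["key_themes"], list) and len(obj["key_themes"]) <= 5
    assert isinstance(obj["action_required"], bool)
    assert len(obj["summary"].split()) <= 50
    return obj

ok = validate_feedback(
    '{"sentiment": "positive", "confidence": 0.9, '
    '"key_themes": ["speed"], "action_required": false, '
    '"summary": "Customers love the speed."}'
)
```

When validation fails, you can retry with the error message appended to the prompt rather than shipping a malformed object downstream.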

Anti-Pattern 5: The Model Monolith

This is the most insidious anti-pattern because it hides behind a reasonable assumption: a prompt that works on one model should work on another. It does not. Claude, GPT, and Gemini have different training data, different instruction-following tendencies, different strengths, and different failure modes. A prompt tuned for Claude will behave differently on GPT. A prompt tuned for GPT-4 will behave differently on GPT-4o.

Instead of

Writing one 'universal' prompt and deploying it across Claude, GPT, and Gemini without testing.

Try this

Maintaining model-specific prompt variants, tested against a shared evaluation suite, with documented behavioral differences.

The problem compounds when you switch models. A team that optimized prompts for GPT-4 migrates to Claude and finds that outputs are different -- not worse, but different. Formatting changes. Tone shifts. Edge cases that one model handled gracefully now produce errors. Without model-specific testing, these differences become production incidents.

The fix is model-aware prompt management. Maintain a prompt registry that supports model-specific variants. Build an evaluation suite that runs the same test cases across all target models. Document the behavioral differences. When you switch models or add a new one, run the full evaluation suite before deploying.
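A prompt registry with model-specific variants can start as a plain lookup table with a required shared fallback. An illustrative sketch (all task names, model names, and prompt strings are hypothetical):

```python
# Variants keyed by (task, model), with a mandatory "default" entry per
# task so a new model never ships without a tested baseline prompt.
REGISTRY: dict[tuple[str, str], str] = {
    ("summarize", "default"): "Summarize the text in 3 bullet points.",
    ("summarize", "claude-3"): "Summarize the text in exactly 3 bullet points. Use plain hyphens.",
    ("summarize", "gpt-4o"): "Return a 3-item bulleted summary. No preamble.",
}

def get_prompt(task: str, model: str) -> str:
    # Fall back to the shared default when no model-specific variant exists.
    return REGISTRY.get((task, model), REGISTRY[(task, "default")])

p = get_prompt("summarize", "gemini-1.5")  # no variant: uses the default
```

The same keys can index your evaluation suite, so adding a model means adding a registry entry and a test run, not editing prompts scattered across the codebase.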

A prompt is not portable. It is tuned to a specific model, a specific version, and a specific task. Treat it accordingly.


These five anti-patterns share a common root cause: treating prompts as casual instructions rather than engineered artifacts. Prompts are code. They have inputs, outputs, edge cases, and failure modes. They deserve the same rigor you apply to any other interface between systems.

The fix for all five patterns is the same discipline: be specific, be minimal, be structured, be format-aware, and be model-aware. None of these require advanced techniques. They require attention.

Key Takeaways

1. Vague prompts produce vague outputs. Define explicit success criteria for every prompt, including measurable constraints where possible.

2. More rules do not mean better results. Prioritize the three to five constraints that matter most and remove conflicting instructions.

3. Context windows are expensive. Send only the tokens the model needs, using retrieval or two-stage processing to filter irrelevant content.

4. Always specify output format with exact schemas, field names, and types. Unstructured output requests produce inconsistent, unparseable results.

5. Prompts are not portable across models. Maintain model-specific variants and validate with a shared evaluation suite before deploying.

