Module 02 · Lesson 05
Common Failure Modes and Recovery Patterns
Reading time: 17 minutes
Track: Prompt Mastery · Required for all learners
Prerequisites: Module 02 · Lessons 01–04
What this lesson does
Lesson 04 covered how to debug a specific prompt that didn't work. This lesson zooms out and covers the failure patterns you'll see repeatedly across many prompts and tasks. Pattern recognition is what separates competent users from expert ones — when you see a failure, you immediately know what kind it is and what to do.
By the end of this lesson, you'll be able to:
- Recognize the seven most common failure patterns in biotech AI use
- Apply the right recovery strategy for each
- Anticipate failures before they happen by recognizing risk conditions
- Build prompts that prevent the most common failures structurally
This is a reference lesson. Don't memorize — bookmark, then come back when you see a failure you can't immediately classify.
01 · The seven patterns
Across hundreds of biotech professionals observed using AI on real work, the same seven failure patterns appear over and over. Here they are, ordered by frequency:
- The Confident Hallucination — Plausible content that's wrong
- The Generic Drift — Output that's competent but not specific to your situation
- The Constraint Violation — Hard rules ignored despite being stated
- The Format Wobble — Inconsistent structure across outputs
- The Audience Miss — Right content, wrong reader
- The Scope Creep — Output goes beyond what was asked
- The Surface Polish — Looks finished but lacks substance
Let's go through each.
02 · Pattern 1 — The Confident Hallucination
What it looks like
The output makes specific claims that sound authoritative but turn out to be fabricated when verified. Common examples:
- Citations that don't exist (or exist but say something different)
- Specific statistics, doses, or trial results that are wrong
- Quotes attributed to real people who didn't say them
- Regulatory citations referencing the wrong section of the wrong document
- Historical facts about trials, approvals, or companies that don't check out
When it's most likely
- Niche or rare topics (rare disease, novel mechanisms, obscure regulatory pathways)
- Tasks asking for specific data points or citations
- Topics where the training data is sparse or out-of-date
- Long outputs where the model is generating extensively
Why it happens
The model generates plausible-sounding text. On topics it's well-trained on, plausible content tends to be accurate. On topics where it's not, plausible content can be wrong, and nothing in the output signals the difference.
Recovery
You don't recover the output — you re-verify and replace the bad parts:
- Scan output for specific claims (numbers, citations, names, dates)
- Cross-check each against the source materials you provided or against external authoritative sources
- Mark any that don't check out
- Re-prompt: "The following claims in the output couldn't be verified: [list]. Either provide the source from the materials I supplied, or remove and replace with content I can support."
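The scan-and-re-prompt steps above can be partly mechanized. What follows is a minimal Python sketch, not a validated tool: the regular expressions in `CLAIM_PATTERNS` and the function names `find_claims` and `build_reprompt` are illustrative assumptions. A pattern match only surfaces a claim for a human to check; it cannot tell you whether the claim is true.

```python
import re

# Hypothetical patterns for "specific claims"; tune them for your own documents.
CLAIM_PATTERNS = {
    "number": re.compile(r"\b\d+(?:\.\d+)?\s?(?:%|mg|kg|mL|patients|weeks|months)"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "citation": re.compile(r"\([A-Z][A-Za-z]+(?: et al\.)?,? \d{4}\)"),
}


def find_claims(text):
    """Return (kind, matched_text) pairs for every pattern hit, grouped by pattern."""
    hits = []
    for kind, pattern in CLAIM_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0)))
    return hits


def build_reprompt(unverified):
    """Fill this section's recovery prompt with the claims that failed checking."""
    listed = "; ".join(unverified)
    return (
        "The following claims in the output couldn't be verified: "
        f"{listed}. Either provide the source from the materials I supplied, "
        "or remove and replace with content I can support."
    )
```

A typical use would be to run `find_claims` on the draft, check each hit by hand against your source materials, and pass only the ones that fail to `build_reprompt`.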
Prevention
- Always include the constraint: "Use only the information I have provided. Flag uncertainty with [VERIFY]."
- Always include the constraint: "Do not generate citations. Use [CITE] placeholders."
- For high-stakes outputs, treat every specific claim as a hypothesis to check.
03 · Pattern 2 — The Generic Drift
What it looks like
The output is well-written, structurally appropriate, and on-topic, but it could have been about any company, any drug, any trial. It either doesn't reference the specifics you provided or addresses them only superficially.
A telltale sign: the output reads like a passable version of "what such a document typically says" rather than "what this specific document needs to say."
When it's most likely
- When you provided thin context
- When you didn't include a reference example
- When you used vague role and audience specifications
- When the task type is common (the model's averaged version is competent but generic)
Why it happens
Without strong context, the model fills the gap with averaged content from training. The output is a plausible generic document, not your specific document.
Recovery
Re-prompt with explicit specificity demands:
"The output is too generic. It could be about any [task type]. I need it specific to my situation. Specifically: integrate the following details from the context: [list 4-6 specific items the output should reference]. Rewrite with these specifics woven throughout."
Prevention
The Examples element of the six-element framework prevents this almost entirely. A reference example forces the model to anchor on your specific style and structure. If you find yourself fighting generic drift repeatedly, you're underusing examples.
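The recovery prompt in this section asks for a list of specifics the output should reference. A rough way to build that list is to check which of your required details never appear in the draft. The sketch below assumes a plain case-insensitive substring match, which will miss paraphrases; `missing_specifics` and `generic_drift_reprompt` are hypothetical helper names.

```python
def missing_specifics(output, required_details):
    """Return the required details that never appear in the output (case-insensitive)."""
    lowered = output.lower()
    return [d for d in required_details if d.lower() not in lowered]


def generic_drift_reprompt(task_type, missing):
    """Fill this section's recovery prompt with the details the draft left out."""
    items = "\n".join(f"- {m}" for m in missing)
    return (
        f"The output is too generic. It could be about any {task_type}. "
        "I need it specific to my situation. Specifically: integrate the "
        f"following details from the context:\n{items}\n"
        "Rewrite with these specifics woven throughout."
    )
```

If `missing_specifics` returns most of your list, that is a reasonable signal of Generic Drift rather than a one-off omission.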
04 · Pattern 3 — The Constraint Violation
What it looks like
You stated a hard constraint. The output ignores it. Most common:
- Generated citations despite "use [CITE] placeholders"
- Made causality claims despite "no causality language"
- Introduced data not in your source despite "use only provided data"
- Used informal language despite "regulatory tone only"
When it's most likely
- Long outputs (adherence to a constraint tends to weaken the further the output runs from where the constraint was stated)
- Many constraints stated as a single list (hard ones get buried among soft ones)
- Constraints stated late in the prompt rather than early
- Implicit rather than explicit constraints
Why it happens
The model is biased toward producing helpful, complete output. Some constraints fight that bias. Without strong enforcement, helpfulness wins.
Recovery
Re-prompt directly:
"The output violated this constraint I stated explicitly: [quote constraint]. Specifically, [point to violation]. Reproduce the output with the constraint properly enforced. The constraint is non-negotiable; do not violate it again."
If it violates again, the prompt needs structural revision (move constraint earlier, separate hard from soft, restate at multiple points in the prompt for long outputs).
Prevention
- Separate hard constraints from soft ones with clear labels ("CRITICAL — MUST NOT VIOLATE" vs "STYLISTIC GUIDANCE")
- For long outputs, restate the most important constraints at section breaks
- For automated workflows, add a verification step that checks constraint compliance before accepting output
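The first and third prevention items can be sketched in code. This is a hedged illustration only: the label strings are modeled on the ones above, and the two regular expressions in `HARD_CHECKS` are assumed stand-ins for whatever your real hard constraints are. They will produce both false positives and false negatives, so treat a clean result as "nothing obvious found" and not as proof of compliance.

```python
import re

HARD_LABEL = "CRITICAL - MUST NOT VIOLATE"
SOFT_LABEL = "STYLISTIC GUIDANCE"

# Hypothetical checks for two of the hard constraints named in this section.
HARD_CHECKS = {
    "no generated citations": re.compile(
        r"\([A-Z][A-Za-z]+(?: et al\.)?,? \d{4}\)|\[\d+\]"
    ),
    "no causality language": re.compile(
        r"\b(?:causes?|caused by|leads? to|results? in)\b", re.IGNORECASE
    ),
}


def render_constraints(hard, soft):
    """Render hard and soft constraints as two separately labeled lists."""
    lines = [HARD_LABEL + ":"]
    lines += [f"- {c}" for c in hard]
    lines += ["", SOFT_LABEL + ":"]
    lines += [f"- {c}" for c in soft]
    return "\n".join(lines)


def violations(output):
    """Return {constraint name: [offending snippets]} for every check that fires."""
    found = {}
    for name, pattern in HARD_CHECKS.items():
        hits = [m.group(0) for m in pattern.finditer(output)]
        if hits:
            found[name] = hits
    return found
```

In an automated workflow, an output would be accepted only when `violations` returns an empty dictionary; anything else goes back through the recovery prompt.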
05 · Pattern 4 — The Format Wobble
What it looks like
The output's structure is inconsistent. Some sections are paragraphs, some are bullets. Heading levels vary. Length per section varies wildly. Citation style shifts mid-document.
When it's most likely
- Long outputs covering multiple sections
- Prompts that specified structure loosely ("organize as you see fit")
- Outputs where the model encountered ambiguous transitions
Why it happens
Without explicit structural specifications, the model picks structure as it goes, and what it picks for section 1 may not match what it picks for section 4.
Recovery
Re-prompt with explicit structural specifications:
"The output has inconsistent structure. Reorganize using exactly this format: [explicit template with every section labeled, length specified, format specified]. Apply this format consistently to every section."
Prevention
Output templates (Lesson 03, Section 05) prevent this almost entirely. For any task where format matters, define the template upfront. Don't let the model improvise structure.
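A simple check can flag Format Wobble before a human reads the draft. The sketch below assumes the output uses heading lines you already know (from your template) and that bullets start with `-` or `*`; both are assumptions about your format, and `section_styles` and `has_format_wobble` are hypothetical names.

```python
def section_styles(text, headings):
    """Map each known heading to 'bullets', 'paragraphs', 'mixed', or 'empty'."""
    body = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in headings:
            current = stripped
            body[current] = []
        elif current is not None and stripped:
            body[current].append(stripped)

    styles = {}
    for heading, lines in body.items():
        bullet_flags = [l.startswith(("-", "*")) for l in lines]
        if not lines:
            styles[heading] = "empty"
        elif all(bullet_flags):
            styles[heading] = "bullets"
        elif not any(bullet_flags):
            styles[heading] = "paragraphs"
        else:
            styles[heading] = "mixed"
    return styles


def has_format_wobble(styles):
    """True when sections do not all share one body style."""
    return len(set(styles.values())) > 1
```

This only compares bullets against paragraphs. Heading levels, section length, and citation style would need their own checks.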
06 · Pattern 5 — The Audience Miss
What it looks like
The content is correct, the tone is appropriate for the writer's role, but it doesn't land for the actual reader. Too technical for executives. Too superficial for FDA reviewers. Too marketing-y for clinicians. Too clinical for patients.
When it's most likely
- When you specified role but not audience
- When the audience changed since the document type was established
- When the document serves multiple audiences with conflicting needs
Why it happens
Role-only prompts produce output calibrated to the writer's typical reader, which may not be your actual reader.
Recovery
Re-prompt with explicit audience targeting:
"The current output is calibrated to the wrong audience. The actual reader is [specific audience description]. They want [what they want from this document]. They have [time available]. Their disposition is [skeptical/supportive/neutral]. Rewrite the output recalibrated for this audience: change [specific elements] and add [what's missing for this reader]."
Prevention
Include audience specification in every important prompt. The biotech audience map in Lesson 02 covers the recurring ones. For unusual audiences, write the 5-question audience spec from Lesson 02, Section 04.
07 · Pattern 6 — The Scope Creep
What it looks like
The output goes beyond what you asked. You requested a 200-word summary; you got 600 words covering tangentially related topics. You asked for an analysis of one scenario; you got analyses of three.
When it's most likely
- Open-ended prompts ("tell me about X")
- Prompts where the model interpreted the task more broadly than you intended
- Topics where the model has lots of relevant content and wants to include it
Why it happens
The model is biased toward thoroughness. Without bounded scope, it'll cover more than necessary.
Recovery
Re-prompt with explicit scope limits:
"The output covers more than I asked. The scope I need is exactly: [restate scope precisely]. Remove content about [X, Y, Z] — these are out of scope. Keep only content directly addressing [the actual question]. Length: [specific limit]."
Prevention
- Specify length precisely: give a word range (for example, 180 to 220 words) rather than an aspirational target such as "keep it short"
- State the scope explicitly (what's in scope, what's out)
- Use deliverable-specific prompts ("draft a 200-word summary") rather than open-ended ones ("summarize")
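Length and scope limits are easy to verify mechanically once you have stated them. The following sketch assumes whitespace-separated word counting and simple substring matching for out-of-scope terms; the function names are illustrative, not part of any library.

```python
def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())


def check_length(text, min_words, max_words):
    """Return None when the text is within range, else a short description."""
    n = word_count(text)
    if n < min_words:
        return f"too short: {n} words (minimum {min_words})"
    if n > max_words:
        return f"too long: {n} words (maximum {max_words})"
    return None


def out_of_scope_mentions(text, out_of_scope_terms):
    """Return the out-of-scope terms that appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [t for t in out_of_scope_terms if t.lower() in lowered]
```

The terms returned by `out_of_scope_mentions` can be dropped straight into the "Remove content about [X, Y, Z]" slot of the recovery prompt.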
08 · Pattern 7 — The Surface Polish
What it looks like
The output reads well. Grammar is correct, structure is appropriate, vocabulary is right. But the substance is thin. The arguments aren't deep, the analysis isn't rigorous, the conclusions don't reflect serious thought.
This is the most dangerous failure pattern in biotech AI use because it's the hardest to detect. The output passes superficial review.
When it's most likely
- High-complexity tasks given to a single all-at-once prompt
- Strategic or analytical work
- Outputs going to audiences who expect depth (board, FDA, KOLs)
- When you didn't use sequential prompting or chain-of-thought for a task that needed it
Why it happens
The model produces fluent prose easily; producing rigorous analysis requires the model to actually think through implications. In single-shot prompts on complex topics, fluency outpaces rigor.
Recovery
Re-prompt for depth:
"The output reads well but lacks depth. Specifically: [point to the shallow parts]. Add the following depth: [what depth means here — counter-arguments addressed, specific evidence cited, implications worked through, etc.]. Maintain the structure but make the substance rigorous."
Often, the better recovery is to break the task into stages (sequential prompting) rather than trying to deepen the single-shot output.
Prevention
- For analytical or strategic tasks, use sequential prompting (Lesson 04, Section 05) from the start
- For high-stakes analysis, use chain-of-thought prompting (Lesson 04, Section 06)
- Build adversarial review into your workflow: "what's wrong with this analysis?" as a standard follow-up
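One plausible way to combine staged prompting with the adversarial follow-up is sketched below. It assumes you supply `ask`, a hypothetical function that sends a prompt to whatever model you use and returns its text reply; no real API is implied. How Lesson 04 defines sequential prompting may differ in detail from this simple "feed the previous answer forward" loop.

```python
from typing import Callable, List


def run_stages(
    ask: Callable[[str], str],
    stages: List[str],
    review_question: str = "What's wrong with this analysis?",
) -> List[str]:
    """Run stage prompts in order, feeding each answer forward, then ask for a critique.

    Returns one output per stage followed by the review of the final stage.
    """
    outputs = []
    previous = ""
    for stage in stages:
        if previous:
            prompt = f"{stage}\n\nPrevious stage output:\n{previous}"
        else:
            prompt = stage
        previous = ask(prompt)
        outputs.append(previous)
    # Adversarial review of the last stage's output as a standard final step.
    outputs.append(ask(f"{review_question}\n\n{previous}"))
    return outputs
```

Keeping each stage's output lets you see where the analysis went thin, which is harder to spot in a single all-at-once response.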
09 · The failure pattern decision tree
When output is wrong, classify the failure quickly:
Is the output factually wrong about specific claims?
└── YES → Confident Hallucination (Pattern 1)
Is the output appropriately structured but not specific to your situation?
└── YES → Generic Drift (Pattern 2)
Did the output violate a hard constraint you stated?
└── YES → Constraint Violation (Pattern 3)
Is the format inconsistent across sections?
└── YES → Format Wobble (Pattern 4)
Is content correct but pitched to the wrong reader?
└── YES → Audience Miss (Pattern 5)
Did the output go beyond what you asked?
└── YES → Scope Creep (Pattern 6)
Does the output read well but lack depth/rigor?
└── YES → Surface Polish (Pattern 7)
Each pattern has its own recovery strategy. The faster you can classify, the faster you recover.
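The tree above can be written as a small lookup, shown here as a sketch. The dictionary keys are hypothetical shorthand for the seven questions; you answer them yourself after reading the output. The function returns every pattern that applies, since more than one question may be answered YES.

```python
# (observation key, pattern name) in the same order as the decision tree.
PATTERN_QUESTIONS = [
    ("factually_wrong", "Confident Hallucination"),
    ("not_specific", "Generic Drift"),
    ("violated_constraint", "Constraint Violation"),
    ("inconsistent_format", "Format Wobble"),
    ("wrong_reader", "Audience Miss"),
    ("beyond_request", "Scope Creep"),
    ("lacks_depth", "Surface Polish"),
]


def classify(observations):
    """Return every pattern whose question was answered YES, in tree order."""
    return [name for key, name in PATTERN_QUESTIONS if observations.get(key, False)]
```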
10 · A meta-pattern — multiple failures at once
In practice, you'll often see two or three patterns at once. A typical "this output isn't right" reaction is actually:
- Generic Drift + Format Wobble + Surface Polish
When that happens, fix the most important one first. Usually that's Generic Drift (since fixing specificity often improves everything else). Then re-evaluate the output to see if the other patterns still exist or have resolved on their own.
Don't try to fix three patterns in one correction. Sequential fixes work better than combined ones.
11 · Building failure-resistant prompts from the start
The best response to common failure patterns is to build prompts that prevent them upfront. A checklist for high-stakes prompts:
| Failure to prevent | Prompt element to add |
|---|---|
| Confident Hallucination | "Use only provided info. No citations. Flag uncertainty." |
| Generic Drift | Reference example. Specific context. |
| Constraint Violation | Hard constraints labeled and separated from soft. |
| Format Wobble | Output template specified upfront. |
| Audience Miss | Explicit audience description in prompt. |
| Scope Creep | Length range and out-of-scope items specified. |
| Surface Polish | Sequential prompting for complex tasks. Chain-of-thought for analysis. |
A prompt with all seven preventions takes maybe 10 extra minutes to write. For tasks you do repeatedly, that time is amortized over every future use.
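For prompts you reuse, the checklist can be turned into a quick pre-flight check. This sketch assumes you keep the parts of a prompt in a dictionary; the key names in `CHECKLIST` are hypothetical labels for the prompt elements in the table, not a standard schema.

```python
# Failure pattern -> hypothetical key for the prompt element that prevents it.
CHECKLIST = {
    "Confident Hallucination": "grounding_rules",
    "Generic Drift": "reference_example",
    "Constraint Violation": "hard_constraints",
    "Format Wobble": "output_template",
    "Audience Miss": "audience",
    "Scope Creep": "scope_limits",
    "Surface Polish": "staged_plan",
}


def unprotected(prompt_parts):
    """Return the failure patterns whose prevention element is missing or empty."""
    return [pattern for pattern, key in CHECKLIST.items() if not prompt_parts.get(key)]
```

An empty result means every row of the table has something behind it. It says nothing about whether those elements are well written.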
12 · Knowledge check
Three questions.
Q1. Which failure pattern is the most dangerous in biotech because it's the hardest to detect during review?
a) Confident Hallucination: fabricated content
b) Format Wobble: inconsistent structure
c) Surface Polish: output that reads well but lacks substantive depth or rigor
d) Scope Creep: output beyond what was asked
Q2. A medical writer's prompt produced output that violated the "no citations — use [CITE] placeholders" constraint. What's the most likely structural cause?
a) The model is broken
b) The constraint was buried in a list with stylistic preferences, making it harder for the model to weight it appropriately
c) The writer needs to use a more powerful model
d) The constraint was unreasonable
Q3. When you see Generic Drift in output, what's the single most effective fix?
a) Use a more powerful model
b) Rewrite the prompt entirely
c) Add a reference example showing specifically what good output looks like for your situation; this addresses the root cause of generic drift more reliably than any other intervention
d) Increase the length specification
Answers: Q1: c · Q2: b · Q3: c
13 · What's next
Lesson 06 — Building your personal prompt library — operationalizes everything you've learned in Module 02. By the end, you'll have started a structured library of prompts you can reuse for your most common tasks. This is the Module 02 capstone.
End of Lesson 05.