Module 02 · Lesson 05

Common Failure Modes and Recovery Patterns

Reading time: 17 minutes
Track: Prompt Mastery · Required for all learners
Prerequisites: Module 02 · Lessons 01–04

What this lesson does

Lesson 04 covered how to debug a specific prompt that didn't work. This lesson zooms out and covers the failure patterns you'll see repeatedly across many prompts and tasks. Pattern recognition is what separates competent users from expert ones — when you see a failure, you immediately know what kind it is and what to do.

By the end of this lesson, you'll be able to:

Recognize the seven most common failure patterns in biotech AI use
Apply the right recovery strategy for each
Anticipate failures before they happen by recognizing risk conditions
Build prompts that prevent the most common failures structurally

This is a reference lesson. Don't memorize — bookmark, then come back when you see a failure you can't immediately classify.

01 · The seven patterns

In observations of hundreds of biotech professionals using AI on real work, the same seven failure patterns appear over and over. Here they are, ordered by frequency:

The Confident Hallucination — Plausible content that's wrong
The Generic Drift — Output that's competent but not specific to your situation
The Constraint Violation — Hard rules ignored despite being stated
The Format Wobble — Inconsistent structure across outputs
The Audience Miss — Right content, wrong reader
The Scope Creep — Output goes beyond what was asked
The Surface Polish — Looks finished but lacks substance

Let's go through each.

02 · Pattern 1 — The Confident Hallucination

What it looks like

The output makes specific claims that sound authoritative but turn out to be fabricated when verified. Common examples:

Citations that don't exist (or exist but say something different)
Specific statistics, doses, or trial results that are wrong
Quotes attributed to real people who didn't say them
Regulatory citations referencing the wrong section of the wrong document
Historical facts about trials, approvals, or companies that don't check out

When it's most likely

Niche or rare topics (rare disease, novel mechanisms, obscure regulatory pathways)
Tasks asking for specific data points or citations
Topics where the training data is sparse or out-of-date
Long outputs where the model is generating extensively

Why it happens

The model generates plausible-sounding text. On topics it's well-trained on, plausible content tends to be accurate. On topics where it's not, plausible content can be wrong with no signal to you.

Recovery

You don't recover the output — you re-verify and replace the bad parts:

Scan output for specific claims (numbers, citations, names, dates)
Cross-check each against the source materials you provided or against external authoritative sources
Mark any that don't check out
Re-prompt: "The following claims in the output couldn't be verified: [list]. Either provide the source from the materials I supplied, or remove and replace with content I can support."
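The scan and cross-check steps above can be partly mechanized. What follows is a minimal sketch, not a validated tool: the function names and both regular expressions are illustrative assumptions, and the check only asks whether a number or citation-like string appears verbatim in the source text you supplied. It will miss paraphrased claims and can be fooled by substrings (for example, "12" inside "120"), so a human still reviews every flagged and unflagged claim.

```python
import re

# Illustrative patterns (assumptions, not an exhaustive claim detector):
# numbers with optional decimals or percent signs, and "Author et al., 2020" citations.
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?%?")
CITATION_RE = re.compile(r"\b[A-Z][a-z]+ et al\.,? \d{4}")


def extract_claims(text: str) -> list[str]:
    """Return number-like and citation-like strings, deduplicated, numbers first."""
    found = NUMBER_RE.findall(text) + CITATION_RE.findall(text)
    return list(dict.fromkeys(found))


def unverified_claims(output: str, source: str) -> list[str]:
    """Return extracted claims that do not appear verbatim in the source text."""
    return [claim for claim in extract_claims(output) if claim not in source]


def verification_reprompt(claims: list[str]) -> str:
    """Fill the re-prompt from the recovery steps with the flagged claims."""
    return (
        "The following claims in the output couldn't be verified: "
        + "; ".join(claims)
        + ". Either provide the source from the materials I supplied, "
        "or remove and replace with content I can support."
    )
```

Anything returned by `unverified_claims` is a candidate for the "mark any that don't check out" step, not proof of fabrication.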

Prevention

Always include the constraint: "Use only the information I have provided. Flag uncertainty with [VERIFY]."
Always include the constraint: "Do not generate citations. Use [CITE] placeholders."
For high-stakes outputs, treat every specific claim as a hypothesis to check.

03 · Pattern 2 — The Generic Drift

What it looks like

The output is well-written, structurally appropriate, and on-topic, but it could have been about any company, any drug, any trial. It either ignores the specifics you provided or addresses them only superficially.

A telltale sign: the output reads like a passable version of "what such a document typically says" rather than "what this specific document needs to say."

When it's most likely

When you provided thin context
When you didn't include a reference example
When you used vague role and audience specifications
When the task type is common (the model's averaged version is competent but generic)

Why it happens

Without strong context, the model fills the gap with averaged content from training. The output is a plausible generic document, not your specific document.

Recovery

Re-prompt with explicit specificity demands:

"The output is too generic. It could be about any [task type]. I need it specific to my situation. Specifically: integrate the following details from the context: [list 4-6 specific items the output should reference]. Rewrite with these specifics woven throughout."

Prevention

The Examples element of the six-element framework prevents this almost entirely. A reference example forces the model to anchor on your specific style and structure. If you find yourself fighting generic drift repeatedly, you're underusing examples.
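To see whether a rewrite actually picked up the specifics listed in the recovery prompt, a plain string check is often enough as a first pass. This is a sketch under assumptions: `missing_details` and `generic_drift_reprompt` are hypothetical helper names, and the match is a case-insensitive substring test, so it cannot tell whether a detail was used meaningfully or only mentioned.

```python
def missing_details(output: str, required_details: list[str]) -> list[str]:
    """Return the required details that never appear in the output (case-insensitive)."""
    lowered = output.lower()
    return [detail for detail in required_details if detail.lower() not in lowered]


def generic_drift_reprompt(task_type: str, missing: list[str]) -> str:
    """Fill the recovery prompt from this section with the details still missing."""
    items = "; ".join(missing)
    return (
        f"The output is too generic. It could be about any {task_type}. "
        "I need it specific to my situation. Specifically: integrate the following "
        f"details from the context: {items}. Rewrite with these specifics woven throughout."
    )
```

If `missing_details` returns most of your list, the output is probably drifting generic and the re-prompt is worth sending.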

04 · Pattern 3 — The Constraint Violation

What it looks like

You stated a hard constraint. The output ignores it. Most common:

Generated citations despite "use [CITE] placeholders"
Made causality claims despite "no causality language"
Introduced data not in your source despite "use only provided data"
Used informal language despite "regulatory tone only"

When it's most likely

Long outputs (constraint attention degrades over distance)
Many constraints stated as a single list (hard ones get buried among soft ones)
Constraints stated late in the prompt rather than early
Implicit rather than explicit constraints

Why it happens

The model is biased toward producing helpful, complete output. Some constraints fight that bias. Without strong enforcement, helpfulness wins.

Recovery

Re-prompt directly:

"The output violated this constraint I stated explicitly: [quote constraint]. Specifically, [point to violation]. Reproduce the output with the constraint properly enforced. The constraint is non-negotiable; do not violate it again."

If the output violates the constraint again, the prompt needs structural revision: move the constraint earlier, separate hard constraints from soft ones, and restate the constraint at multiple points in the prompt for long outputs.

Prevention

Separate hard constraints from soft ones with clear labels ("CRITICAL — MUST NOT VIOLATE" vs "STYLISTIC GUIDANCE")
For long outputs, restate the most important constraints at section breaks
For automated workflows, add a verification step that checks constraint compliance before accepting output
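The verification step for automated workflows could look something like the sketch below. It is illustrative only: the two regular expressions are assumed stand-ins for the "no generated citations" and "no causality language" constraints mentioned earlier, and real constraint checks would need to be tuned to your documents. Pattern matching catches obvious violations; it does not replace human review.

```python
import re

# Hypothetical detectors for two hard constraints (assumptions, tune for real use).
CITATION_RE = re.compile(r"\([A-Z][a-z]+(?: et al\.)?,? \d{4}\)")
CAUSAL_RE = re.compile(r"\b(?:causes?|caused by|leads? to|results? in)\b", re.IGNORECASE)


def check_constraints(output: str) -> dict[str, list[str]]:
    """Return a mapping of violated constraint -> offending snippets (empty if clean)."""
    violations = {
        "generated citation instead of [CITE]": CITATION_RE.findall(output),
        "causality language": CAUSAL_RE.findall(output),
    }
    return {name: hits for name, hits in violations.items() if hits}


def accept_output(output: str) -> bool:
    """Accept the output only when no hard-constraint check fires."""
    return not check_constraints(output)
```

In a pipeline, an output that fails `accept_output` would be routed back to the re-prompt shown in the recovery step rather than passed downstream.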

05 · Pattern 4 — The Format Wobble

What it looks like

The output's structure is inconsistent. Some sections are paragraphs, some are bullets. Heading levels vary. Length per section varies wildly. Citation style shifts mid-document.

When it's most likely

Long outputs covering multiple sections
Prompts that specified structure loosely ("organize as you see fit")
Outputs where the model encountered ambiguous transitions

Why it happens

Without explicit structural specifications, the model picks structure as it goes, and what it picks for section 1 may not match what it picks for section 4.

Recovery

Re-prompt with explicit structural specifications:

"The output has inconsistent structure. Reorganize using exactly this format: [explicit template with every section labeled, length specified, format specified]. Apply this format consistently to every section."

Prevention

Output templates (Lesson 03, Section 05) prevent this almost entirely. For any task where format matters, define the template upfront. Don't let the model improvise structure.
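Once a template is defined, you can check an output against it mechanically. The sketch below assumes the simplest possible template, an ordered list of section labels that must each appear on their own line; the function name and that representation are assumptions for illustration, not the template format from Lesson 03.

```python
def check_template(output: str, required_sections: list[str]) -> list[str]:
    """Return problems found: sections that are missing or out of template order."""
    lines = [line.strip() for line in output.splitlines()]
    problems = []
    last_index = -1
    for section in required_sections:
        if section not in lines:
            problems.append(f"missing section: {section}")
            continue
        index = lines.index(section)  # first line that is exactly this label
        if index < last_index:
            problems.append(f"out of order: {section}")
        last_index = max(last_index, index)
    return problems
```

An empty list means every label is present and in order. It says nothing about per-section length or bullet-versus-paragraph consistency, which would need separate checks.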

06 · Pattern 5 — The Audience Miss

What it looks like

The content is correct, the tone is appropriate for the writer's role, but it doesn't land for the actual reader. Too technical for executives. Too superficial for FDA reviewers. Too marketing-y for clinicians. Too clinical for patients.

When it's most likely

When you specified role but not audience
When the audience changed since the document type was established
When the document serves multiple audiences with conflicting needs

Why it happens

Role-only prompts produce output calibrated to the writer's typical reader, which may not be your actual reader.

Recovery

Re-prompt with explicit audience targeting:

"The current output is calibrated to the wrong audience. The actual reader is [specific audience description]. They want [what they want from this document]. They have [time available]. Their disposition is [skeptical/supportive/neutral]. Rewrite the output recalibrated for this audience: change [specific elements] and add [what's missing for this reader]."

Prevention

Include audience specification in every important prompt. The biotech audience map in Lesson 02 covers the recurring ones. For unusual audiences, write the 5-question audience spec from Lesson 02, Section 04.

07 · Pattern 6 — The Scope Creep

What it looks like

The output goes beyond what you asked. You requested a 200-word summary; you got 600 words covering tangentially related topics. You asked for an analysis of one scenario; you got analyses of three.

When it's most likely

Open-ended prompts ("tell me about X")
Prompts where the model interpreted the task more broadly than you intended
Topics where the model has lots of relevant content and wants to include it

Why it happens

The model is biased toward thoroughness. Without bounded scope, it'll cover more than necessary.

Recovery

Re-prompt with explicit scope limits:

"The output covers more than I asked. The scope I need is exactly: [restate scope precisely]. Remove content about [X, Y, Z] — these are out of scope. Keep only content directly addressing [the actual question]. Length: [specific limit]."

Prevention

Specify length as a concrete range (for example, 180 to 220 words), not an aspiration ("keep it short")
State the scope explicitly (what's in scope, what's out)
Use deliverable-specific prompts ("draft a 200-word summary") rather than open-ended ones ("summarize")
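Length and scope limits are easy to check once they are stated precisely. The following is a rough sketch: `scope_report` is a hypothetical helper, word count is approximated by splitting on whitespace, and out-of-scope detection is a case-insensitive substring match on terms you supply.

```python
def scope_report(
    output: str,
    min_words: int,
    max_words: int,
    out_of_scope_terms: list[str],
) -> dict:
    """Summarize whether the output stays inside the stated length and scope."""
    word_count = len(output.split())  # whitespace split, an approximation
    lowered = output.lower()
    return {
        "word_count": word_count,
        "length_ok": min_words <= word_count <= max_words,
        "out_of_scope_hits": [t for t in out_of_scope_terms if t.lower() in lowered],
    }
```

A failed `length_ok` or a non-empty `out_of_scope_hits` list gives you the specifics to drop into the scope-limit re-prompt above.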

08 · Pattern 7 — The Surface Polish

What it looks like

The output reads well. Grammar is correct, structure is appropriate, vocabulary is right. But the substance is thin. The arguments aren't deep, the analysis isn't rigorous, the conclusions don't reflect serious thought.

This is the most dangerous failure pattern in biotech AI use because it's the hardest to detect. The output passes superficial review.

When it's most likely

High-complexity tasks given to a single all-at-once prompt
Strategic or analytical work
Outputs going to audiences who expect depth (board, FDA, KOLs)
When you didn't use sequential prompting or chain-of-thought for a task that needed it

Why it happens

The model produces fluent prose easily; producing rigorous analysis requires the model to actually think through implications. In single-shot prompts on complex topics, fluency outpaces rigor.

Recovery

Re-prompt for depth:

"The output reads well but lacks depth. Specifically: [point to the shallow parts]. Add the following depth: [what depth means here — counter-arguments addressed, specific evidence cited, implications worked through, etc.]. Maintain the structure but make the substance rigorous."

Often, the better recovery is to break the task into stages (sequential prompting) rather than trying to deepen the single-shot output.

Prevention

For analytical or strategic tasks, use sequential prompting (Lesson 04, Section 05) from the start
For high-stakes analysis, use chain-of-thought prompting (Lesson 04, Section 06)
Build adversarial review into your workflow: "what's wrong with this analysis?" as a standard follow-up
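One way to wire staged prompting and the adversarial follow-up together is sketched below. It is an assumption-heavy illustration, not the method from Lesson 04: `ask` stands in for whatever function sends a prompt to your model and returns its text, and the way each stage receives the previous stage's output is one simple choice among many.

```python
from typing import Callable


def run_stages(ask: Callable[[str], str], stage_prompts: list[str]) -> list[str]:
    """Run prompts in order, feeding each stage the previous output, then request a critique."""
    outputs: list[str] = []
    for prompt in stage_prompts:
        if outputs:
            # Carry the previous stage forward so each step builds on the last.
            prompt = f"{prompt}\n\nPrevious stage output:\n{outputs[-1]}"
        outputs.append(ask(prompt))
    if outputs:
        # Standard adversarial follow-up on the final stage.
        outputs.append(ask(f"What's wrong with this analysis?\n\n{outputs[-1]}"))
    return outputs
```

The returned list holds one output per stage plus the critique as the last item, so you can review intermediate reasoning instead of only the polished final text.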

09 · The failure pattern decision tree

When output is wrong, classify the failure quickly:

Is the output factually wrong about specific claims?
└── YES → Confident Hallucination (Pattern 1)

Is the output appropriately structured but not specific to your situation?
└── YES → Generic Drift (Pattern 2)

Did the output violate a hard constraint you stated?
└── YES → Constraint Violation (Pattern 3)

Is the format inconsistent across sections?
└── YES → Format Wobble (Pattern 4)

Is content correct but pitched to the wrong reader?
└── YES → Audience Miss (Pattern 5)

Did the output go beyond what you asked?
└── YES → Scope Creep (Pattern 6)

Does the output read well but lack depth/rigor?
└── YES → Surface Polish (Pattern 7)

Each pattern has its own recovery strategy. The faster you can classify, the faster you recover.
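The same questions can be written as a small lookup, which is handy if you log failures over time. This sketch uses the question wording from the tree above; the names `DECISION_TREE` and `classify` are illustrative. It returns every pattern you answered yes to, since more than one can apply to the same output.

```python
# Questions and patterns copied from the decision tree, in the same order.
DECISION_TREE = [
    ("Is the output factually wrong about specific claims?", "Confident Hallucination (Pattern 1)"),
    ("Is the output appropriately structured but not specific to your situation?", "Generic Drift (Pattern 2)"),
    ("Did the output violate a hard constraint you stated?", "Constraint Violation (Pattern 3)"),
    ("Is the format inconsistent across sections?", "Format Wobble (Pattern 4)"),
    ("Is content correct but pitched to the wrong reader?", "Audience Miss (Pattern 5)"),
    ("Did the output go beyond what you asked?", "Scope Creep (Pattern 6)"),
    ("Does the output read well but lack depth/rigor?", "Surface Polish (Pattern 7)"),
]


def classify(answers: list[bool]) -> list[str]:
    """Map one yes/no answer per question to the list of matching patterns."""
    if len(answers) != len(DECISION_TREE):
        raise ValueError("expected one answer per question")
    return [pattern for (_, pattern), yes in zip(DECISION_TREE, answers) if yes]
```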

10 · A meta-pattern — multiple failures at once

In practice, you'll often see two or three patterns at once. A typical "this output isn't right" reaction is actually:

Generic Drift + Format Wobble + Surface Polish

When that happens, fix the most important one first. Usually that's Generic Drift (since fixing specificity often improves everything else). Then re-evaluate the output to see if the other patterns still exist or have resolved on their own.

Don't try to fix three patterns in one correction. Sequential fixes work better than combined ones.

11 · Building failure-resistant prompts from the start

The best response to common failure patterns is to build prompts that prevent them upfront. A checklist for high-stakes prompts:

Failure to prevent	Prompt element to add
Confident Hallucination	"Use only provided info. No citations. Flag uncertainty."
Generic Drift	Reference example. Specific context.
Constraint Violation	Hard constraints labeled and separated from soft.
Format Wobble	Output template specified upfront.
Audience Miss	Explicit audience description in prompt.
Scope Creep	Length range and out-of-scope items specified.
Surface Polish	Sequential prompting for complex tasks. Chain-of-thought for analysis.

A prompt with all seven preventions takes maybe 10 extra minutes to write. For tasks you do repeatedly, that time is amortized over every future use.
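For prompts you reuse, the checklist can be folded into a small builder so the preventions are never forgotten. The sketch below is one possible layout, not a prescribed format: the function name, section labels, and ordering are assumptions, with hard constraints placed first and kept apart from stylistic guidance. It covers six rows of the table; the Surface Polish row is a workflow choice (sequential or chain-of-thought prompting) rather than a prompt element, so it is not represented here.

```python
# The two hallucination constraints quoted earlier in this lesson.
HALLUCINATION_GUARDS = [
    "Use only the information I have provided. Flag uncertainty with [VERIFY].",
    "Do not generate citations. Use [CITE] placeholders.",
]


def build_prompt(
    task: str,
    context: str,
    audience: str,
    reference_example: str,
    hard_constraints: list[str],
    style_guidance: list[str],
    output_template: str,
    word_range: tuple[int, int],
    out_of_scope: list[str],
) -> str:
    """Assemble a prompt carrying the preventions from the checklist (labels are illustrative)."""
    low, high = word_range
    sections = [
        # Hard constraints go first, labeled and separated from soft guidance.
        "CRITICAL - MUST NOT VIOLATE:\n" + "\n".join(f"- {c}" for c in hard_constraints),
        f"TASK:\n{task}",
        f"AUDIENCE:\n{audience}",
        f"CONTEXT:\n{context}",
        f"REFERENCE EXAMPLE:\n{reference_example}",
        f"OUTPUT TEMPLATE:\n{output_template}",
        f"SCOPE:\nLength: {low}-{high} words. Out of scope: {', '.join(out_of_scope)}.",
        "STYLISTIC GUIDANCE:\n" + "\n".join(f"- {s}" for s in style_guidance),
    ]
    return "\n\n".join(sections)
```

Passing `HALLUCINATION_GUARDS` plus your task-specific rules as `hard_constraints` keeps the non-negotiable items at the top of every prompt the builder produces.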

12 · Knowledge check

Three questions.

Q1. Which failure pattern is the most dangerous in biotech because it's the hardest to detect during review?

a) Confident Hallucination: fabricated content
b) Format Wobble: inconsistent structure
c) Surface Polish: output that reads well but lacks substantive depth or rigor
d) Scope Creep: output beyond what was asked

Q2. A medical writer's prompt produced output that violated the "no citations — use [CITE] placeholders" constraint. What's the most likely structural cause?

a) The model is broken
b) The constraint was buried in a list with stylistic preferences, making it harder for the model to weight it appropriately
c) The writer needs to use a more powerful model
d) The constraint was unreasonable

Q3. When you see Generic Drift in output, what's the single most effective fix?

a) Use a more powerful model
b) Rewrite the prompt entirely
c) Add a reference example showing specifically what good output looks like for your situation; this addresses the root cause of generic drift more reliably than any other intervention
d) Increase the length specification

Answers: Q1: c · Q2: b · Q3: c

13 · What's next

Lesson 06 — Building your personal prompt library — operationalizes everything you've learned in Module 02. By the end, you'll have started a structured library of prompts you can reuse for your most common tasks. This is the Module 02 capstone.

End of Lesson 05.