Module 02 · Lesson 04

Iterative Refinement and Prompt Debugging

Reading time: 18 minutes Track: Prompt Mastery · Required for all learners Prerequisites: Module 02 · Lessons 01–03

What this lesson does

Your first prompt won't always work. Even with all six elements specified well, the output will sometimes miss the mark. The skill that separates intermediate users from advanced ones is how you respond when the first prompt fails.

Beginners give up or accept mediocre output. Intermediate users iterate randomly. Advanced users diagnose the failure systematically and fix the specific weakness.

By the end of this lesson, you'll be able to:

Diagnose what went wrong when output is weak
Apply targeted fixes rather than starting over
Use the model itself to help debug your prompts
Recognize when to iterate vs. when to restart

This is a tactical, pattern-recognition lesson. Expect to reference it often early in your practice.

01 · The two-loop model of prompt iteration

Most people iterate prompts in one loop: read the output, decide it's wrong, rewrite the whole prompt, try again. This is slow and tends to introduce new problems while fixing the old ones.

A better approach is two loops:

Inner loop — refine within the conversation

Keep the prompt where it is. Ask the model to revise the output, specifying what's wrong with what it produced. Fast. Doesn't lose what was good.

Outer loop — revise the prompt itself

When you've made the same correction in inner-loop iterations more than twice, the prompt itself needs fixing. Step out, edit the prompt, restart.

Most refinement should happen in the inner loop. Outer loop only when the prompt is structurally wrong.

02 · Inner loop — eight precise correction patterns

When the output is wrong, you can usually identify which dimension is wrong. Each has a precise correction pattern.

Pattern 1 — "Too generic"

The output reads like averaged content rather than something specific to your situation.

Correction: "This output is too generic. Add specifics from the context I provided: [list 2-3 specific items the output should have referenced]. Rewrite with those specifics integrated."

Pattern 2 — "Wrong length"

Too long or too short.

Correction (too long): "Reduce this to [X words]. Cut the [specific section/aspect] entirely — it's not needed. Tighten the remaining text without losing the [core point/data]."

Correction (too short): "Expand to [X words]. Deepen the discussion of [specific aspect] with more detail on [what's missing]. Don't add new topics — go deeper on what's there."

Pattern 3 — "Wrong tone or register"

The output is too informal, too academic, too marketing-y, too technical, etc.

Correction: "The tone is wrong. This needs to be [specific tone description with two reference points]. The current version reads like [what it sounds like]; it should read like [what it should sound like]. Rewrite preserving the content but adjusting the tone."

Pattern 4 — "Wrong structure"

The format doesn't match what you specified, or what you specified turned out not to work.

Correction: "The structure isn't working. Reorganize as follows: [explicit new structure with sections labeled]. Move content from current section X to new section Y. Drop section Z entirely."

Pattern 5 — "Wrong emphasis"

The output covered what you asked but emphasized the wrong elements.

Correction: "The emphasis is wrong. The most important point is [X], but the current draft buries it. Restructure so [X] leads, with [Y] and [Z] supporting. Cut emphasis on [W] — it's secondary here."

Pattern 6 — "Inaccurate or unsupported claim"

Specific claims in the output are wrong or aren't supported by what you provided.

Correction: "The output contains the claim [quote the problematic claim]. This isn't supported by the source materials I provided. Either: (a) cite where in the source this comes from, or (b) remove the claim and replace with content I can support."

Pattern 7 — "Violated constraint"

A hard constraint was violated.

Correction: "The output violated this constraint: [quote constraint]. Specifically, [point to the violation]. Reproduce the output with that constraint properly enforced. Do not violate it again."

Pattern 8 — "Missing requested element"

You asked for X and X isn't there.

Correction: "The output is missing [specific element you requested]. Add a [section/paragraph/sentence] covering [what's missing], integrated at [location]. Maintain the rest of the output as-is."

03 · The "what would make this perfect" prompt

When you can't quite articulate what's wrong, but the output isn't right, ask the model to help diagnose:

"Here's the output you produced. Before I provide feedback, I want your assessment: if you were reviewing this against my original request, what are the three weakest elements? Be specific and honest — don't tell me it's good if it isn't."

The model is often able to identify weaknesses in its own output when asked directly. The output may then be your guide to what to fix.

This works better than you'd expect, because the model has access to both your prompt and its output and can compare them.

04 · The outer loop — when to rewrite the prompt

After 2-3 inner-loop iterations, if you find yourself making the same correction repeatedly, the prompt itself has a problem. Step out of the conversation and rewrite.

Diagnostic questions

Before you rewrite, identify what's wrong with the prompt:

Was a constraint ambiguous? "Don't be too academic" is ambiguous; "Use plain language at an 8th-grade reading level" is specific.
Did you skip an element of the six-element framework? Most prompt failures are missing context, missing examples, or missing constraints.
Did you contradict yourself? "Make it concise and comprehensive" forces the model to pick one.
Is the role wrong for the task? A role mismatch propagates through every iteration.
Did you ask a question when you should have specified a deliverable? "What were the key risks?" produces an analysis; "Draft a 200-word risk summary for the executive briefing" produces a deliverable.

The 80/20 of prompt rewrites

In our experience teaching this, the single most common fix is adding a reference example. About half of the prompt rewrites that "fix" a stubborn problem are really about adding the example that was missing from the original.

If you find yourself in an outer-loop rewrite, try adding an example first before doing anything else.

05 · Sequential prompting for complex tasks

Some tasks are too complex for one prompt. The output is incomplete or shallow no matter how you specify it. The fix isn't a better prompt — it's multiple prompts.

Pattern — break the task into stages

Example task: "Draft a clinical study protocol amendment justification memo."

Single-prompt attempt: Produces a generic, shallow output covering all the elements at low depth.

Sequential prompt approach:

Stage 1 — Outline the memo. "Given the change we're making to the protocol, draft a one-page outline of the amendment justification memo. Identify the sections we need, the key points each must make, and what evidence supports each point."
Stage 2 — Draft the rationale section. "Using the outline above, draft the Rationale section in detail. 300-400 words. Lead with the scientific basis for the change."
Stage 3 — Draft the impact assessment. "Now draft the Impact on Trial Conduct section. 300-400 words. Cover impact on enrollment, data integrity, and timeline."
Stage 4 — Draft the regulatory considerations. "Now draft the Regulatory Considerations section. 200-300 words. Reference the specific ICH guidance that applies."
Stage 5 — Synthesize and check coherence. "Now produce the full memo combining the sections above. Verify that the rationale, impact, and regulatory sections are mutually consistent. Flag any inconsistencies."

Each stage produces better output than the all-at-once approach. The total time is similar; the quality is dramatically better.

When to use sequential prompting

Complex multi-section documents
Tasks where one section depends on another
Outputs where depth matters more than breadth
When the all-at-once attempt produced shallow content

When not to use it

Routine documents you can produce in one prompt
Time-sensitive drafts where speed beats depth
Tasks where the sections are genuinely independent and parallel

06 · The chain-of-thought trick

One advanced technique: ask the model to reason aloud before producing the output. This often improves quality, especially on tasks requiring analysis.

The pattern

"Before producing the [output], walk through your reasoning aloud:

What is the central claim I want to make?

What evidence supports that claim?

What's the strongest counter-argument?

How does my claim address that counter-argument?

After you've worked through these questions, produce the final [output]."

Why it works

The model effectively gets a "scratchpad" to think on. The output benefits from the reasoning even if the reasoning itself isn't used in the final product.

When to use it

Strategic memos
Risk assessments
Anything involving analysis or argument
Outputs going to skeptical audiences

When not to use it

Pure drafting tasks (literature summary, narrative reproduction)
Format-heavy outputs (tables, structured templates)
Routine tasks where the analysis isn't needed

07 · Debugging by example contrast

When you have a draft output that's almost right but you can't articulate what's wrong:

The pattern

"Here is the current draft: [paste current output]

Here is a previously approved version of similar output: [paste good reference]

What are the three specific structural or stylistic differences between these two? Then rewrite the current draft to match the structure and style of the reference, while preserving the content of the current draft."

The model can usually identify differences between two pieces of text that you struggle to articulate yourself, then rebuild to match.

This is one of the highest-leverage tricks in prompt iteration. Use it whenever you have a "good example" to compare against.

08 · Common iteration failure patterns

Failure 1 — Iteration that makes things worse

You correct the output and the next version is worse — different problems, lost good elements. Usually caused by over-broad correction prompts ("rewrite this") rather than targeted ones ("change only [X] to [Y]").

Fix: Use precise correction patterns from Section 02. Specify what to change AND what to preserve.

Failure 2 — Forgetting earlier context

After many iterations, the conversation gets long. The model starts forgetting earlier specifications. Output drifts back toward generic.

Fix: Restart with a clean conversation. Reproduce the consolidated prompt (original prompt + final corrections you discovered through iteration). Don't drag long conversations forward.

Failure 3 — Iterating past good

The output was already at "good enough for review." You keep iterating in search of "perfect." You spend an hour optimizing what 10 minutes of human editing would have improved more efficiently.

Fix: Set a stopping rule. "If output is 80% there in the first prompt and the next iteration doesn't get it past 90%, I'm taking over and editing manually." Iteration has diminishing returns.

Failure 4 — Iterating in the wrong place

You're iterating on a low-stakes section while the high-stakes section needs the work. Common when the model produced a polished but irrelevant section, and the important section was weak.

Fix: Iterate where the stakes are. Polishing introduction prose while the analysis section is shallow is misallocation.

09 · A practical exercise

Take any AI output you produced in the last week — anything. Look at it critically and identify which of the eight correction patterns from Section 02 would apply.

Then write the precise correction prompt you would use if you were going to fix it now.

Don't actually fix it. Just practice the pattern recognition. After a few rounds of this exercise, the patterns become automatic.

10 · Knowledge check

Three questions.

Q1. When should you use inner-loop iteration (refining within the same conversation) versus outer-loop iteration (rewriting the prompt)?

a) Always use outer loop — it's more thorough b) Always use inner loop — it's faster c) Use inner loop for targeted refinements; switch to outer loop when you find yourself making the same correction more than twice or when the prompt is structurally wrong d) Use inner loop for short outputs, outer loop for long ones

Q2. A medical writer's first prompt produced an output that's well-structured but covers the wrong emphasis (buries the most important finding). What's the best correction approach?

a) Start over with a new prompt b) Tell the model the emphasis is wrong, specify which element should lead and which should be secondary, and ask for a restructured version preserving content c) Manually rewrite the output d) Ask the model what it thinks the right emphasis is

Q3. When you've iterated 4-5 times and the conversation is getting long, what's the recommended action?

a) Continue iterating with stricter constraints b) Restart with a clean conversation using a consolidated prompt that incorporates what you've learned from the iterations c) Give up and write it yourself d) Switch to a more powerful model

Answers: Q1: c · Q2: b · Q3: b

11 · What's next

Lesson 05 — Common failure modes and recovery patterns — pulls back and looks at the most common failure patterns in real biotech AI use, with diagnostic frameworks for each. Less tactical than this lesson, more about pattern recognition at scale.

End of Lesson 04.