PILOT — Private preview. Progress is saved for this browser session only.
HaiPhai.AI Fluency for Biotech

Model Selection & The Effort Parameter

Lesson 1~15 min3-question check

Module 09 · Lesson 01

Model Selection & The Effort Parameter

Reading time: 15 minutes Track: Advanced Claude Setup · Universal Prerequisites: Module 01–03 (all Universal Foundations)


Quick orientation

This module assumes you already know how to write a decent prompt. What you don't yet know — and what separates competent users from expert ones — is how to make systematic decisions about which Claude to use, how much intelligence to buy, and how to configure the model's reasoning depth before you write a single word of prompt text.

Those decisions happen before the prompt. Get them right and your prompts work harder. Get them wrong and no amount of prompt engineering fixes it.

This lesson covers the model lineup, the effort parameter, and the deprecation calendar.


01 · The current model lineup

As of June 2026, there are three production Claude models. Each has a distinct role.

ModelAPI IDContext windowBase cost (input/output per million tokens)
Claude Opus 4.8claude-opus-4-81 million tokens$5 / $25
Claude Sonnet 4.6claude-sonnet-4-61 million tokens$3 / $15
Claude Haiku 4.5claude-haiku-4-5-20251001200k tokens$1 / $5

One model has already been deprecated: Claude Sonnet 4 and Claude Opus 4 (the original, without a version suffix) retire on June 15, 2026. If your team is using claude-sonnet-4-20250514 or claude-opus-4-20250514, migrate before that date.

Biotech note: The 1 million token context window on Opus 4.8 and Sonnet 4.6 is genuinely transformative for document-heavy work. A full IND module, a complete clinical study report, an entire regulatory dossier section — all fit in a single context window. This changes the architecture of document review workflows entirely.

When to use each model

Opus 4.8 — Complex, long-horizon tasks where quality is the constraint and speed/cost are secondary. Best for: deep literature synthesis, multi-document regulatory analysis, agentic coding, scientific reasoning, and any task where you'd want a senior scientist to think carefully rather than answer quickly.

Sonnet 4.6 — The production default for most workloads. Delivers 80–90% of Opus 4.8's capability at 40% lower cost with meaningfully faster response times. Best for: drafting, structured extraction, routine document tasks, classification, and anything you'd run at scale.

Haiku 4.5 — The speed/cost tier. Best for: high-volume classification, quick summarization, real-time applications, and the "triage" step in multi-model pipelines where you screen inputs before sending complex ones to a smarter model.

Practical rule: Start with Sonnet 4.6 for everything. Move to Opus 4.8 when you're hitting quality ceilings on tasks that require genuine multi-step reasoning. Move to Haiku 4.5 when you're running the same simple task thousands of times per day.


02 · The effort parameter (Opus 4.8 specific)

The effort parameter is the most important tuning lever on Claude Opus 4.8. It controls intelligence vs. token spend. On Opus 4.8, it defaults to high — you must set it explicitly if you want a different level.

LevelUse when
maxHardest problems; can show diminishing returns and overthinking on simple tasks
xhighBest for coding and agentic workflows; strong tool use triggering
highMost intelligence-sensitive work; the minimum for complex reasoning tasks
mediumCost-sensitive production workloads; still intelligent, lower token burn
lowShort, scoped tasks and latency-sensitive work; not intelligence-sensitive

Important: Opus 4.8 takes effort levels seriously, especially at the low end. At low and medium, it scopes its work precisely to what was asked and does not elaborate. This is useful for latency and cost, but on moderately complex tasks there is real risk of under-thinking.

The API call looks like this:

import anthropic
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=64000,
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": "..."}]
)

Rule of thumb: For agentic coding tasks, start at xhigh with max_tokens set to 64k. For interactive document work (drafting, analysis), start at high. For bulk classification or summarization, try medium first and only raise if quality is insufficient.

Effort vs. prompt hacking: If you're getting shallow responses on complex tasks, raise the effort level before you try rewriting the prompt. Prompt tricks that compensate for low effort are fragile and carry over into later model versions as dead weight.


03 · Adaptive thinking: how Opus 4.8 reasons

Opus 4.8 uses adaptive thinking — the model decides when and how much to think internally before responding. This is different from earlier manual extended thinking (which required you to set a budget_tokens parameter and is now deprecated on Opus 4.8 — passing it returns a 400 error).

To enable adaptive thinking:

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[{"role": "user", "content": "..."}]
)

Thinking is off by default on Opus 4.8. Enable it for tasks requiring multi-step reasoning. The model will decide how much to think on each request based on complexity.

Biotech application: Enable adaptive thinking for clinical interpretation questions, statistical reasoning, and protocol review. Leave it off for routine drafting where speed matters more than deep analysis.

Interleaved thinking — enabled automatically alongside adaptive thinking on Opus 4.7+ — lets the model reason between tool calls, not just before them. For agentic workflows, this produces substantially better sequential decision-making.


04 · The 1M context window and what it actually means

Both Opus 4.8 and Sonnet 4.6 have 1 million token context windows. That's approximately:

  • 750,000 words
  • A full clinical study report (~200k words) with room to spare
  • 10–15 complete scientific papers simultaneously
  • An entire regulatory module's worth of content

The practical implications:

  1. You can pass full documents without chunking. This eliminates retrieval-augmented generation (RAG) architecture complexity for many use cases. For a single document under ~750k words, just pass the whole thing.

  2. Document-first placement still matters. Even with 1M tokens, Claude performs better when long documents are placed at the top of the context and the query/instructions are at the end. This isn't context-window management — it's a performance optimization. Queries at the end have been shown to improve quality by up to 30% on complex multi-document inputs.

  3. The middle problem persists. Claude retrieves information most reliably from the beginning and end of a context window. Information buried in the middle of a 500k-token context may be less reliably accessed. For critical information, place it early.

When RAG is still better than full-context: Large, frequently-updated knowledge bases (>1M tokens), multi-user systems where different users access different subsets, and cases where you need auditable source attribution by chunk. For everything else — especially one-off document analysis — consider full-context first.


05 · Deprecation hygiene

AI models don't stay current on their own. Two practices that prevent expensive surprises:

Pin model IDs explicitly. Don't use convenience aliases like claude-sonnet-4-0 — these point to deprecated versions. Use the full versioned ID: claude-sonnet-4-6, claude-opus-4-8, claude-haiku-4-5-20251001.

Track the deprecation calendar. Anthropic publishes model retirement dates. When a model enters "deprecated" status, plan migration within 60 days. Retired models stop responding entirely.

Current deprecations (as of June 2026):

  • claude-sonnet-4-20250514 — retire June 15, 2026 → migrate to claude-sonnet-4-6
  • claude-opus-4-20250514 — retire June 15, 2026 → migrate to claude-opus-4-8

06 · Knowledge check

Q1. Your team runs a workflow that classifies 50,000 documents per day into one of six regulatory categories. Each document is ~400 words. Which model and effort level is the most defensible starting point?

a) Opus 4.8 at xhigh effort — maximum classification accuracy b) Sonnet 4.6 at high effort — good balance of quality and cost c) Haiku 4.5 — speed and cost-optimized for high-volume simple classification d) Sonnet 4.6 at max effort — ensures no misclassifications

Q2. A regulatory scientist needs to compare three complete FDA guidance documents simultaneously to identify inconsistencies before an IND submission. Which context window capability matters most here?

a) The 200k context window on Haiku 4.5 — sufficient for three guidance documents b) The 1M context window on Sonnet 4.6 — all three documents fit without chunking c) Retrieval-augmented generation — mandatory for document comparison d) Extended thinking with budget_tokens — required for multi-document reasoning

Q3. You're using Opus 4.8 for a complex literature synthesis task and getting responses that feel shallow. What is the correct first intervention?

a) Rewrite the prompt to be more detailed and specific b) Switch to Sonnet 4.6 which is more verbose by default c) Raise the effort parameter to high or xhigh d) Add chain-of-thought instructions to the prompt

Answers: Q1: c · Q2: b · Q3: c


End of Lesson 01.

Knowledge check

3 questions · select an answer to see if you got it
1.Your team runs a workflow classifying 50,000 documents per day into regulatory categories. Each document is ~400 words. Which model is the most defensible starting point?
2.A regulatory scientist needs to compare three complete FDA guidance documents simultaneously to identify inconsistencies. Which capability matters most?
3.You're using Opus 4.8 for complex literature synthesis and getting responses that feel shallow. What is the correct first intervention?