Module 09 · Lesson 04

Advanced Reasoning & Tool Use

Reading time: 20 minutes Track: Advanced Claude Setup · Universal Prerequisites: Module 09 · Lesson 03

When Claude thinks, and when it doesn't

Claude 4.x models can reason before responding — working through a problem internally before producing an answer. Understanding when this happens, how to control it, and how to use it in agentic workflows is the difference between basic Claude usage and expert deployment.

01 · Adaptive thinking: the current paradigm

Adaptive thinking is the current standard for reasoning in Claude 4.x models. Unlike earlier manual extended thinking (which required you to set a budget_tokens parameter), adaptive thinking lets Claude decide when and how deeply to reason on each request based on query complexity and the effort parameter.

To enable adaptive thinking:

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[{"role": "user", "content": "..."}]
)

Key points:

Thinking is off by default on Opus 4.8. You must explicitly enable it.
On Opus 4.8, manual extended thinking (budget_tokens) is not supported and returns a 400 error.
On Sonnet 4.6, adaptive thinking is available and recommended; budget_tokens is deprecated but still functional.
The effort parameter and query complexity together determine how much the model thinks. Higher effort = more thinking on complex queries.

When adaptive thinking adds value:

Multi-step agentic workflows (coding agents, research pipelines)
Complex scientific or clinical reasoning with multiple constraints
Statistical interpretation and methodology questions
Any task where the answer requires reasoning through multiple interdependent steps

When to leave it off:

Drafting and writing tasks where the output is the goal
Classification and extraction tasks
High-volume, latency-sensitive applications

02 · Interleaved thinking: reasoning between tool calls

Standard thinking happens before Claude starts responding. Interleaved thinking happens between tool calls — Claude reasons about a tool result before deciding what to do next.

This is automatically enabled alongside adaptive thinking on Opus 4.7 and later models.

Why it matters: without interleaved thinking, an agentic workflow like "search these three databases, synthesize the results" operates like a script — tool call, tool call, tool call, synthesize. With interleaved thinking, Claude can reason after each tool result, adjust its approach, identify contradictions, and make better decisions about subsequent steps.

The API response structure with interleaved thinking looks like:

[thinking_block_1] → [tool_use_1] → [thinking_block_2] → [tool_use_2] → [thinking_block_3] → [text]

You don't configure this separately — it activates automatically with adaptive thinking on supported models.

Biotech application: For a literature review agentic workflow that calls web search tools iteratively, interleaved thinking lets Claude reason after finding each paper: "This contradicts the prior result — I should search for the reconciling study before concluding."

03 · Tool use architecture

Claude tool use has two fundamentally different types:

Client tools — You define the tool schema. Claude returns a tool_use block. Your code executes the operation and returns a tool_result. Most custom integrations use this.

Server tools — Anthropic hosts and executes these: web_search, code_execution, web_fetch. Claude calls them and receives results directly. No execution code required on your end.

Defining a client tool:

tools = [
    {
        "name": "search_trial_database",
        "description": "Search the internal clinical trial database for trials matching specified criteria. Use this when the user asks about active trials, eligibility criteria, or trial status.",
        "input_schema": {
            "type": "object",
            "properties": {
                "indication": {"type": "string", "description": "Disease or condition"},
                "phase": {"type": "string", "enum": ["I", "II", "III", "IV"]},
                "status": {"type": "string", "enum": ["recruiting", "active", "completed"]}
            },
            "required": ["indication"]
        }
    }
]

Three rules for effective tool descriptions:

State when to use it. "Use this when the user asks about..." — Claude uses the description to decide whether to call the tool. A description without a trigger condition produces unpredictable tool-calling behavior.
State what it returns. Claude needs to know what to expect back to correctly interpret the result.
Add strict: true for guaranteed schema conformance. When your downstream code depends on specific field names and types, strict: true ensures Claude's tool call exactly matches your schema.

04 · Steering tool-calling behavior

Claude's default tool_choice: auto lets it decide whether to use a tool or respond directly. You can steer this:

To increase tool use:

Use the available tools to investigate before responding. Do not rely on your training knowledge for factual questions about our specific systems.

To force a specific tool:

tool_choice={"type": "tool", "name": "search_trial_database"}

To prevent tool use:

tool_choice={"type": "none"}

Parallel tool calling:

Claude 4.x models support parallel tool execution — making multiple tool calls in a single response. This happens automatically when Claude identifies independent operations, but you can boost it to near-100% with an explicit instruction:

If you intend to call multiple tools and there are no dependencies between the tool calls, make all independent tool calls in parallel. Prioritize calling tools simultaneously whenever the actions can be done in parallel rather than sequentially.

Add this to your system prompt for any agentic workflow where speed matters.

05 · Thinking display options

When adaptive thinking is enabled, you can control how the thinking content appears in the response:

Summarized (default on most Claude 4 models) — Claude provides a condensed version of its thinking:

thinking={"type": "adaptive", "display": "summarized"}

Omitted (faster streaming, default on Opus 4.8) — Thinking happens but the content is not included in the response. The signature field is preserved for multi-turn continuity:

thinking={"type": "adaptive", "display": "omitted"}

Use omitted when you don't need to show reasoning to users and want faster time-to-first-token. Use summarized when transparency into Claude's reasoning adds value to your users (e.g., a regulatory review workflow where showing the reasoning chain is part of the audit trail).

Multi-turn with thinking: If you're using adaptive thinking across a multi-turn conversation, you must preserve thinking blocks in the conversation history. Pass them back in subsequent requests unchanged. Dropping thinking blocks breaks the reasoning chain.

06 · Self-checking pattern

For tasks where accuracy is critical (clinical interpretation, statistical analysis, regulatory conclusions), add an explicit self-check step:

After drafting your analysis, verify your conclusions against the following criteria before finalizing:
1. Every factual claim is supported by a document quote included in your response
2. No calculation errors in any numerical reasoning
3. Conclusions are proportionate to the evidence presented — not stronger than the data supports
4. Any uncertainty is explicitly flagged rather than resolved silently

If any check fails, revise before outputting your final response.

This produces more reliable outputs on high-stakes analytical tasks without requiring a second API call for verification.

07 · Knowledge check

Q1. You're building an agentic regulatory intelligence tool on Opus 4.8 that calls multiple database tools. Which configuration is correct for this use case?

a) thinking: {type: "enabled", budget_tokens: 10000} — manually control reasoning depth b) thinking: {type: "adaptive"}, output_config: {effort: "xhigh"} — adaptive thinking at high effort c) No thinking configuration — adaptive thinking is on by default in Opus 4.8 d) thinking: {type: "adaptive"} with effort: "low" to reduce latency

Q2. Your tool-using agent is calling database search tools sequentially when they could run in parallel, doubling response latency. What is the correct fix?

a) Set tool_choice: {type: "auto"} — this enables parallel calling automatically b) Add a system prompt instruction: "Make all independent tool calls in parallel" c) Switch to server tools — they execute in parallel automatically d) Reduce the number of tools to force Claude to choose one at a time

Q3. You're using adaptive thinking in a multi-turn conversation and notice Claude's reasoning quality degrading after several turns. What is the most likely cause?

a) Thinking budget has been exhausted for the session b) Adaptive thinking doesn't persist across turns — re-enable each time c) Thinking blocks are being dropped from the conversation history before passing back to the API d) The effort parameter decreases automatically after several turns

Answers: Q1: b · Q2: b · Q3: c

End of Lesson 04.