Dynamically Optimize Reasoning with Adaptive Mode

Adaptive thinking replaces manual thinking budgets (now deprecated) on Claude Opus 4.6 and Sonnet 4.6, and is the default on Claude Mythos Preview. Set thinking: {"type": "adaptive"} in API requests, and Claude assesses each request's complexity to decide whether, and how long, to think, skipping thinking entirely for simple queries at low effort. This outperforms a fixed budget_tokens on bimodal tasks and long agentic workflows because reasoning is allocated only where it is needed. Adaptive mode also automatically enables interleaved thinking between tool calls, improving agent performance without manual configuration.

Example curl:

curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-opus-4-6",
    "max_tokens": 16000,
    "thinking": {"type": "adaptive"},
    "messages": [
      {"role": "user", "content": "Explain why the sum of two even numbers is always even."}
    ]
  }'

Streaming works the same way as in manual mode: thinking content arrives via thinking_delta events.
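As a sketch of consuming those events, assuming the standard streaming event shape (content_block_delta events carrying thinking_delta or text_delta payloads), a handler might accumulate the two kinds of content separately:

```python
def collect_stream(events):
    """Accumulate thinking and text separately from parsed streaming events.

    `events` is any iterable of already-parsed event dicts; the shapes below
    follow the Messages streaming format (content_block_delta events with
    thinking_delta / text_delta payloads).
    """
    thinking, text = [], []
    for event in events:
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "thinking_delta":
            thinking.append(delta["thinking"])
        elif delta.get("type") == "text_delta":
            text.append(delta["text"])
    return "".join(thinking), "".join(text)

# Mock events standing in for a real stream:
mock = [
    {"type": "content_block_delta",
     "delta": {"type": "thinking_delta", "thinking": "2k + 2m "}},
    {"type": "content_block_delta",
     "delta": {"type": "thinking_delta", "thinking": "= 2(k + m)"}},
    {"type": "content_block_delta",
     "delta": {"type": "text_delta", "text": "The sum is even."}},
]
```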

Older models (Claude Sonnet 4.5 and earlier) continue to use thinking.type: "enabled" with an explicit budget_tokens.
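For those older models, a minimal request-builder sketch (the helper name is ours; the 1024-token floor for budget_tokens and the requirement that it stay below max_tokens reflect the documented manual-mode constraints):

```python
def manual_thinking_request(model, prompt, budget_tokens, max_tokens=16000):
    """Build a Messages API body using the manual thinking mode."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }
```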

Tune Depth with Effort Parameter

Pair adaptive thinking with output_config: {effort: "<level>"} for soft guidance on reasoning depth:

| Effort | Behavior |
|---|---|
| max | Unconstrained deep thinking (Opus/Sonnet 4.6 only) |
| high (default) | Always thinks deeply on complex tasks |
| medium | Moderate thinking; skips very simple queries |
| low | Minimal thinking; prioritizes speed, skips simple tasks |

Use medium or low for latency-sensitive apps. You can also steer thinking through system-prompt instructions, for example: "Extended thinking adds latency; use it only for multi-step reasoning."
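Putting the two knobs together, a hedged sketch of a request builder (the field names follow this page; the helper itself is illustrative):

```python
EFFORT_LEVELS = ("max", "high", "medium", "low")

def adaptive_request(prompt, effort="high", model="claude-opus-4-6",
                     max_tokens=16000):
    """Build an adaptive-thinking request, attaching output_config only
    when the effort differs from the 'high' default."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive"},
        "messages": [{"role": "user", "content": prompt}],
    }
    if effort != "high":
        body["output_config"] = {"effort": effort}
    return body
```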

max_tokens caps the combined total of thinking and output tokens. At high or max effort, responses can hit stop_reason: "max_tokens"; if that happens, raise max_tokens or lower the effort level.
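One way to react to that truncation, sketched with an illustrative retry policy (raise max_tokens up to a ceiling first, then step effort down one level):

```python
EFFORTS = ["max", "high", "medium", "low"]

def adjust_after_truncation(params, stop_reason, ceiling=64000):
    """Return updated request params after a truncated response.

    Doubles max_tokens while under `ceiling`; once capped, steps the
    effort level down one notch instead. Returns params unchanged for
    any other stop_reason. The policy is illustrative, not prescribed.
    """
    if stop_reason != "max_tokens":
        return params
    params = dict(params)  # shallow copy; leave the caller's dict intact
    if params["max_tokens"] * 2 <= ceiling:
        params["max_tokens"] *= 2
    else:
        effort = params.get("output_config", {}).get("effort", "high")
        i = EFFORTS.index(effort)
        if i < len(EFFORTS) - 1:
            params["output_config"] = {"effort": EFFORTS[i + 1]}
    return params
```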

Control Output and Costs Effectively

By default, display: "summarized" returns a summary of the thinking (full intelligence is preserved; summarization prevents misuse of the raw reasoning). Mythos Preview defaults to omitted, so set display explicitly there if you want summaries. Use display: "omitted" to skip streaming thinking content entirely, which speeds time to the first text token; only the signature is streamed, for verification.

Example: thinking: {type: "adaptive", display: "omitted"}. The signature verifies thinking blocks on tool-use round trips, so pass the full blocks back unchanged.
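A sketch of that round trip, with mock content blocks (the thinking block carries a placeholder signature; the point is that the assistant's blocks go back verbatim):

```python
def append_assistant_turn(messages, assistant_content):
    """Extend the conversation with the assistant's content blocks,
    including any thinking block and its signature, unmodified."""
    return messages + [{"role": "assistant", "content": assistant_content}]

# Mock assistant turn: a thinking block (with signature) then a tool call.
assistant_content = [
    {"type": "thinking", "thinking": "Need current data; call the tool.",
     "signature": "sig-from-api"},  # placeholder for the real signature
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"city": "Paris"}},
]
history = append_assistant_turn(
    [{"role": "user", "content": "Weather in Paris?"}], assistant_content
)
```

The next request would append the tool_result as a user turn on top of `history`, leaving the earlier blocks untouched.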

Switching thinking modes invalidates prompt cache breakpoints in the messages (the system prompt and tools remain cached regardless). You are billed for the full thinking process even when it is omitted or summarized, so billed output tokens can exceed the visible count. A specialized system prompt is included automatically.

| Mode | Use when | Config |
|---|---|---|
| Adaptive | Default for complex/agentic work | {type: "adaptive"} + effort |
| Manual | Precise token control | {type: "enabled", budget_tokens: N} (deprecated on 4.6) |
| Disabled | Lowest latency | Omit thinking or {type: "disabled"} |
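The table above can be captured in a small config helper (a sketch; the function name is ours):

```python
def thinking_config(mode, budget_tokens=None):
    """Map a mode name from the table to a `thinking` field value.

    Returns None for 'disabled', meaning the field can simply be omitted
    from the request body.
    """
    if mode == "adaptive":
        return {"type": "adaptive"}
    if mode == "manual":
        if budget_tokens is None:
            raise ValueError("manual mode requires budget_tokens")
        return {"type": "enabled", "budget_tokens": budget_tokens}
    if mode == "disabled":
        return None
    raise ValueError(f"unknown mode: {mode}")
```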

Migrate off enabled/budget_tokens now; it will be removed soon. Adaptive thinking is ZDR-eligible: nothing is stored after the response.