Claude Extended Thinking: Configurable Reasoning Boost

Enable thinking: {type: 'enabled', budget_tokens: N} in the Claude API to allocate tokens for step-by-step reasoning before the final answer, improving accuracy on complex tasks; use adaptive mode on 4.6 models and control thinking display to cut latency.

Allocate Thinking Budgets to Solve Hard Problems

Set thinking: {type: "enabled", budget_tokens: N} in Messages API requests, where N caps the tokens available for Claude's internal reasoning (e.g., 32k+ for thorough analysis on Claude 4 models). Claude emits thinking blocks containing step-by-step logic, then incorporates that reasoning into its text output. Larger budgets yield better quality on complex tasks, though Claude often uses less than allocated. On Opus 4.6/Sonnet 4.6, switch to type: "adaptive" with the effort parameter (manual mode still works but is deprecated and will be removed). budget_tokens must stay below max_tokens, though tool use allows exceeding it via the full context window. Output limits: 128k for Opus 4.6/Mythos Preview, 64k for Sonnet 4.6/Haiku 4.5, and up to 300k in batch requests with a beta header.
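A minimal sketch of building such a request body, assuming an illustrative model id (`claude-opus-4-6`) and enforcing the budget_tokens < max_tokens rule stated above:

```python
# Sketch of a Messages API request payload with extended thinking enabled.
# The model id and token numbers are illustrative assumptions.

def build_thinking_request(prompt: str, budget_tokens: int, max_tokens: int) -> dict:
    """Build a Messages API payload; budget_tokens must stay below max_tokens."""
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {
        "model": "claude-opus-4-6",  # assumed model id
        "max_tokens": max_tokens,    # caps thinking + text output combined
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_thinking_request(
    "Prove that 2^10 = 1024.", budget_tokens=32_000, max_tokens=64_000
)
```

The validation mirrors what the API itself enforces, so a misconfigured budget fails locally before the request is sent.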

For Claude 4, the default display: "summarized" returns condensed thinking (full reasoning benefits, less misuse risk); Mythos defaults to omitted. Full thinking output requires contacting sales. display: "omitted" skips the thinking stream entirely for a faster first text token, delivering only a signature (an encrypted hash for verification), which makes it ideal for non-user-facing apps and cuts wire overhead.
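A sketch of selecting a display mode, assuming the `display` field sits inside the thinking object as the text describes (the exact request shape is an assumption):

```python
# Sketch of choosing a thinking display mode; "omitted" trades visibility
# of the reasoning stream for lower latency and wire overhead.

def thinking_config(budget_tokens: int, display: str = "summarized") -> dict:
    """Return a thinking config; only the modes named above are accepted."""
    if display not in ("summarized", "omitted"):
        raise ValueError(f"unsupported display mode: {display}")
    return {"type": "enabled", "budget_tokens": budget_tokens, "display": display}

# Non-user-facing pipeline: skip the thinking stream, keep only the signature.
cfg = thinking_config(16_000, display="omitted")
```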

Example response: {"type": "thinking", "thinking": "Step-by-step...", "signature": "WaUjzkyp..."} followed by text.

Stream Thinking Without Delays

Stream via SSE: thinking_delta events deliver reasoning incrementally before text_delta. With omitted, deltas are skipped; a signature_delta arrives immediately and text starts right after. Chunky delivery is normal (the API batches for performance); expect delays between events, with improvements planned. Sequence: content_block_start (thinking) → thinking deltas → content_block_stop → text block start.

Omitted stream: the thinking block opens and closes immediately after the signature, with no thinking events, and text follows at once. Toggling display mid-conversation is fine; the signature is identical.
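The streaming sequence above can be sketched as a small consumer loop. Plain dicts stand in for real SSE events; the event and delta type names follow the text, while the simulated payload contents are illustrative:

```python
# Sketch of consuming the streamed sequence: thinking deltas, then the
# signature, then text deltas once the thinking block closes.

def consume_stream(events):
    """Collect thinking, signature, and text from a streamed event sequence."""
    thinking, signature, text = [], "", []
    for ev in events:
        if ev["type"] == "content_block_delta":
            d = ev["delta"]
            if d["type"] == "thinking_delta":
                thinking.append(d["thinking"])
            elif d["type"] == "signature_delta":
                signature = d["signature"]
            elif d["type"] == "text_delta":
                text.append(d["text"])
    return "".join(thinking), signature, "".join(text)

# Simulated sequence: thinking block start → deltas → stop → text block.
events = [
    {"type": "content_block_start", "content_block": {"type": "thinking"}},
    {"type": "content_block_delta",
     "delta": {"type": "thinking_delta", "thinking": "Step 1..."}},
    {"type": "content_block_delta",
     "delta": {"type": "signature_delta", "signature": "WaUjzkyp..."}},
    {"type": "content_block_stop"},
    {"type": "content_block_start", "content_block": {"type": "text"}},
    {"type": "content_block_delta",
     "delta": {"type": "text_delta", "text": "The answer follows."}},
    {"type": "content_block_stop"},
]
thinking, sig, text = consume_stream(events)
```

With display: "omitted", the same loop still works: the thinking_delta branch simply never fires.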

Tool Use Requires Preserving Thinking Blocks

Thinking pairs with tools (tool_choice: "auto" or "none" only; forced choices error). Pass the full, unmodified thinking blocks from the last assistant turn along with tool results to maintain reasoning continuity; thinking from earlier turns can be omitted. The entire tool loop (think → tool_use → result → text) counts as one turn, so you can't toggle thinking mid-loop (it auto-disables gracefully).
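A sketch of the continuity rule: echo the assistant turn back with its thinking block untouched, then append the tool result. All block contents, ids, and the tool name are illustrative:

```python
# Sketch of continuing a tool loop while preserving the thinking block
# from the last assistant turn, as the continuity rule requires.

def continue_with_tool_result(history, assistant_turn, tool_use_id, result):
    """Append the assistant turn (thinking intact) plus the tool result."""
    return history + [
        {"role": "assistant", "content": assistant_turn},  # unmodified blocks
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": tool_use_id,
             "content": result},
        ]},
    ]

assistant_turn = [
    {"type": "thinking", "thinking": "I need the weather first...",
     "signature": "WaUjzkyp..."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"city": "Paris"}},
]
messages = continue_with_tool_result(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    assistant_turn, "toolu_01", "18°C, clear",
)
```

Stripping or editing the thinking block before echoing it back is exactly what breaks continuity, so the helper passes `assistant_turn` through verbatim.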

Interleaved thinking (beta interleaved-thinking-2025-05-14 header on Claude 4) lets Claude reason between tool calls after receiving results. The budget applies per thinking phase and can exceed max_tokens via the context window. Best practice: plan the mode per turn and complete loops before switching.

Example: User query → thinking + tool_use → tool result (echo thinking) → more thinking/text.
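A sketch of a tool-enabled request with the interleaved-thinking beta header named above. The model id and token numbers are assumptions; the header is shown as a raw dict rather than any particular SDK's keyword:

```python
# Sketch: request body plus beta header for interleaved thinking on Claude 4.

def build_interleaved_request(messages, tools,
                              budget_tokens=8_000, max_tokens=16_000):
    """Return (body, headers) for a tool-use request with interleaved thinking."""
    body = {
        "model": "claude-sonnet-4-6",  # assumed model id
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "tools": tools,
        "tool_choice": {"type": "auto"},  # forced choices error with thinking
        "messages": messages,
    }
    headers = {"anthropic-beta": "interleaved-thinking-2025-05-14"}
    return body, headers

body, headers = build_interleaved_request(
    [{"role": "user", "content": "Compare weather in two cities."}],
    tools=[{"name": "get_weather",
            "input_schema": {"type": "object",
                             "properties": {"city": {"type": "string"}}}}],
)
```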

Context and Tokens: Strict Limits Demand Planning

Claude 3.7+ enforces prompt + max_tokens (including the thinking budget) ≤ context window and errors if exceeded (unlike the auto-adjust behavior pre-3.7). Effective window: input minus previous thinking, plus new thinking/encrypted blocks and text; with tools, add previous thinking and tool tokens.
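The budgeting rule can be sketched as a pre-flight check; the window size is an illustrative assumption, not a documented constant:

```python
# Sketch of the 3.7+ budgeting rule: prompt tokens plus max_tokens (which
# already includes the thinking budget) must fit in the context window.

CONTEXT_WINDOW = 200_000  # assumed window size for illustration

def fits_context(prompt_tokens: int, max_tokens: int,
                 window: int = CONTEXT_WINDOW) -> bool:
    """Claude 3.7+ errors instead of auto-adjusting when this is False."""
    return prompt_tokens + max_tokens <= window

ok = fits_context(prompt_tokens=150_000, max_tokens=64_000)  # over budget
```

Running this check client-side before each request avoids a round trip that would otherwise fail with an error on 3.7+ models.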

Prompt caching strips thinking blocks from token counts, but preserve them for tool use and interleaved thinking. Use 1-hour caches for long thinking sessions. redacted_thinking blocks are fully encrypted (no summary). Count tokens via the API for multi-turn conversations. Trade-off: richer reasoning raises cost and latency but nails accuracy on math and analysis; test budgets empirically.
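A sketch of a 1-hour cache breakpoint for a long thinking session, assuming an extended-TTL cache_control shape ({"type": "ephemeral", "ttl": "1h"}); the field layout is an assumption to verify against current docs:

```python
# Sketch of marking a system block with a 1-hour cache breakpoint for
# long-running thinking sessions.

def cached_system_block(text: str) -> dict:
    """Wrap a system prompt segment with an assumed 1-hour cache control."""
    return {
        "type": "text",
        "text": text,
        "cache_control": {"type": "ephemeral", "ttl": "1h"},  # 1-hour cache
    }

system = [cached_system_block("You are a careful mathematical assistant.")]
```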

Summarized by x-ai/grok-4.1-fast via openrouter

8729 input / 1933 output tokens in 15272ms

© 2026 Edge