Extended thinking lets Claude reason step by step before giving its final answer. On complex math, coding, or analysis where deep thinking helps, it can improve answer quality. This guide covers how to enable it via the API and what to watch for, grounded in the official docs. (Official: docs.claude.com · as of June 2026)
What extended thinking is
With extended thinking on, Claude first writes its reasoning in a thinking block, then draws on it to produce the final answer in a text block, so the response contains the reasoning block followed by the answer. It helps most on problems that need step-by-step thought, rather than on simple everyday tasks.
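As a minimal sketch of that response shape, the helper below separates the reasoning from the answer. It works on plain dicts that mirror the JSON content blocks (a "thinking" block with a thinking field, a "text" block with a text field). The name split_blocks and the sample data are illustrative assumptions, not part of the SDK.

```python
def split_blocks(content):
    """Separate reasoning text from answer text in a list of content blocks."""
    thinking_parts, answer_parts = [], []
    for block in content:
        if block.get("type") == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answer_parts.append(block.get("text", ""))
        # Other block types (for example tool use) are ignored in this sketch.
    return "\n".join(thinking_parts), "\n".join(answer_parts)


# Hypothetical response content, shaped like the JSON the API returns.
sample_content = [
    {"type": "thinking", "thinking": "27 * 3 = 81, then add 4."},
    {"type": "text", "text": "The result is 85."},
]

reasoning, answer = split_blocks(sample_content)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

With the Python SDK's response objects, the same idea should apply using attribute access (block.type, block.thinking, block.text) instead of dict lookups.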
How to enable it
Add a thinking object to the request. Traditionally you set type: "enabled" with a reasoning-token cap, budget_tokens.
import anthropic

client = anthropic.Anthropic()
msg = client.messages.create(
    model="<model ID>",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},  # budget_tokens < max_tokens
    messages=[{"role": "user", "content": "Solve this step by step"}],
)
In the latest model line, however, the recommendation is adaptive thinking, where you set "how hard to think" via an effort parameter instead of a fixed token count. On some recent models the budget_tokens approach is marked deprecated (to be removed later). Exact parameters and supported models change, so check the official docs. Note the minimum reasoning budget is documented as 1,024 tokens; start small and increase gradually.
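The fixed-budget constraints mentioned above (at least 1,024 tokens, and below max_tokens) can be checked before a request is sent. The following is only a sketch: build_thinking_config is a hypothetical helper, and the limits it enforces are the ones stated in this guide, which may change.

```python
MIN_BUDGET_TOKENS = 1024  # minimum stated in this guide; may change


def build_thinking_config(max_tokens, budget_tokens=MIN_BUDGET_TOKENS):
    """Return a fixed-budget thinking object, or raise if the budget is invalid."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


# Start at the minimum and raise the budget only if answers fall short.
config = build_thinking_config(max_tokens=2048)
print(config)
```

The returned dict can be passed as the thinking argument of client.messages.create. On models where adaptive thinking is recommended, this fixed-budget helper may not apply, so check the official docs first.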
Summarized thinking and billing
On Claude 4 models, the Messages API returns a summary of the reasoning, not the full reasoning text. Importantly, billing is based on the full thinking tokens generated, not on the summary, so the billed output token count may not match the text you see. You can track reasoning tokens via usage.output_tokens_details.thinking_tokens in the response; when streaming, this appears only on the final message_delta.
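To make the tracking step concrete, here is a hedged sketch that reads the thinking-token count from a usage payload and, for streams, from the last message_delta event. The field path is taken from the text above and is not independently verified here, so the lookup is defensive and returns None when the field is missing. The helper names and sample events are hypothetical.

```python
def thinking_tokens_from_usage(usage):
    """Return the reported thinking-token count, or None if the field is absent."""
    details = usage.get("output_tokens_details") or {}
    return details.get("thinking_tokens")


def final_usage_from_stream(events):
    """Return the usage dict of the last message_delta event, or None."""
    final = None
    for event in events:
        if event.get("type") == "message_delta" and "usage" in event:
            final = event["usage"]
    return final


# Hypothetical stream events as plain dicts; the numbers are made up.
sample_events = [
    {"type": "message_start"},
    {"type": "content_block_delta",
     "delta": {"type": "thinking_delta", "thinking": "..."}},
    {"type": "message_delta",
     "delta": {"stop_reason": "end_turn"},
     "usage": {"output_tokens": 900,
               "output_tokens_details": {"thinking_tokens": 700}}},
]

usage = final_usage_from_stream(sample_events)
reported = thinking_tokens_from_usage(usage) if usage else None
print("Thinking tokens reported:", reported)
```

Comparing this count with the length of the summary you actually receive is one way to see why the bill can be larger than the visible output suggests.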
When to use it
Extended thinking suits complex tasks that benefit from step-by-step reasoning — math, tricky coding/debugging, multi-step analysis. For light tasks like simple queries, summaries, or structured transforms, you usually don't need it. Reasoning can make responses slower and costlier, so choose per task.
Things to watch
- Incompatible settings: with thinking on, some of temperature/top_p/top_k are restricted (e.g. temperature can only be 1).
- Response time: extra processing can make responses slower.
- Large budgets: reasoning budgets above 32k tokens are best run via batch to avoid timeouts/connection limits (see reducing cost).
- Volatility: parameters (adaptive, effort, budget_tokens) and supported models may change; the official docs are the source of truth.
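The checks in this list can be sketched as a small pre-flight function that inspects a request dict before it is sent. This is an illustration under assumptions: preflight_warnings is a hypothetical helper, the 32k threshold is read as 32,000 tokens, and the exact sampling restrictions depend on the model, so it only warns rather than rejecting the request.

```python
LARGE_BUDGET_THRESHOLD = 32_000  # "32k" from the list above, read as tokens


def preflight_warnings(params):
    """Return human-readable warnings for a request dict with thinking enabled."""
    warnings = []
    thinking = params.get("thinking") or {}
    if thinking.get("type") in (None, "disabled"):
        return warnings  # thinking is off, nothing to check

    if params.get("temperature", 1) != 1:
        warnings.append("temperature should be left at 1 when thinking is on")
    for name in ("top_p", "top_k"):
        if name in params:
            warnings.append(f"{name} may be restricted when thinking is on")
    if thinking.get("budget_tokens", 0) > LARGE_BUDGET_THRESHOLD:
        warnings.append("budget above 32k tokens: consider batch processing")
    return warnings


# Hypothetical request parameters that trip two of the checks.
request = {
    "max_tokens": 64_000,
    "temperature": 0.2,
    "thinking": {"type": "enabled", "budget_tokens": 40_000},
}
for warning in preflight_warnings(request):
    print("warning:", warning)
```

Returning warnings instead of raising keeps the sketch usable even if the documented restrictions change between models.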
Related guides
In streaming, reasoning arrives as thinking_delta (see streaming). For errors/limits, see handling errors and rate limits; for the first call, see getting started with the Claude API.
If a fixed output shape matters more than deep reasoning, structured outputs may fit better.
For when to use extended thinking, web search, and Research, see the web search & Research guide.
Extended-thinking parameters (adaptive, effort, budget_tokens), supported models, and the minimum budget may change (some are marked deprecated); verify the latest in the official docs (docs.claude.com). This site is not an official Anthropic site.