Claude extended thinking — enabling and handling reasoning via the API

Extended thinking lets Claude reason step by step before giving its final answer. On complex math, coding, or analysis where deep thinking helps, it can improve answer quality. This guide covers how to enable it via the API and what to watch for, grounded in the official docs. (Official: docs.claude.com · as of June 2026)

What extended thinking is

With extended thinking on, Claude first writes its reasoning in a thinking block, then incorporates it into a final answer (text block). The response includes the reasoning block followed by the answer. It helps most on problems that need step-by-step thought, rather than simple everyday tasks.

How to enable it

Add a thinking object to the request. Traditionally you set type: "enabled" with a reasoning-token cap, budget_tokens.

import anthropic
client = anthropic.Anthropic()
msg = client.messages.create(
    model="<model ID>",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},  # budget_tokens < max_tokens
    messages=[{"role": "user", "content": "Solve this step by step"}],
)

In the latest model line, however, the recommendation is adaptive thinking, where you set "how hard to think" via an effort parameter instead of a fixed token count. On some recent models the budget_tokens approach is marked deprecated (to be removed later). Exact parameters and supported models change, so check the official docs. Note the minimum reasoning budget is documented as 1,024 tokens; start small and increase gradually.

Summarized thinking and billing

On Claude 4 models, the Messages API returns a summary of the reasoning, not the full stream. Importantly, billing is based on the full thinking tokens generated, not the summary — so the billed output token count may differ from what you see. You can track reasoning tokens via usage.output_tokens_details.thinking_tokens in the response; when streaming, this appears only on the final message_delta.

When to use it

Extended thinking suits complex tasks that benefit from step-by-step reasoning — math, tricky coding/debugging, multi-step analysis. For light tasks like simple queries, summaries, or structured transforms, you usually don't need it. Reasoning can make responses slower and costlier, so choose per task.

Things to watch

Incompatible settings — with thinking on, some of temperature/top_p/top_k are restricted (e.g. temperature only 1).
Response time — extra processing can make responses slower.
Large budgets — reasoning above 32k is best run via batch to avoid timeouts/connection limits (see reducing cost).
Volatility — parameters (adaptive, effort, budget_tokens) and supported models may change; the official docs are the source of truth.

In streaming, reasoning arrives as thinking_delta (see streaming). For errors/limits, see handling errors and rate limits; for the first call, see getting started with the Claude API.

Conversely, if a fixed output shape matters more, structured outputs may fit better.

For when to use extended thinking, web search, and Research, see the web search & Research guide.

Extended-thinking parameters (adaptive, effort, budget_tokens), supported models, and the minimum budget may change (some are marked deprecated); verify the latest in the official docs (docs.claude.com). This site is not an official Anthropic site.

Claude extended thinking — enabling and handling reasoning via the API

What extended thinking is

How to enable it

Summarized thinking and billing

When to use it

Things to watch

Keep reading

Handling Claude API errors and rate limits — 429, 529, and retries

Claude API streaming — real-time output with SSE

Cutting Claude API Costs: Prompt Caching and the Batch API

Have a question or want to share how you use Claude?

What extended thinking is

How to enable it

Summarized thinking and billing

When to use it

Things to watch

Related guides

Keep reading

Handling Claude API errors and rate limits — 429, 529, and retries

Claude API streaming — real-time output with SSE

Cutting Claude API Costs: Prompt Caching and the Batch API

Have a question or want to share how you use Claude?