Claude API Context Windows and Tokens: Limits and Management

In the Claude API, the context window is the total capacity a model can handle in one request. Your input — system prompt, conversation history, attached documents, tool results — and the model’s generated output all count against the same limit. This guide covers what goes into the context window, what happens when you exceed it, and how to plan and manage tokens, based on the official docs. (As of June 2026. Per-model limits and behavior may change — see the official context windows docs.)

What counts against the window

Tokens in a request are not just the reply text. The system prompt, the accumulated conversation history (every prior user and assistant turn), attached documents and images, tool definitions, and tool results all count as input tokens — and the output the model generates must fit inside the same limit. The standard context window is 200K tokens, with support for up to 1M tokens (varies by model and conditions — check the official model comparison table). In multi-turn conversations and agentic workflows, tool results accumulate at every step, so the real thing to manage is usually cumulative growth rather than a single request.

One exception: with extended thinking, thinking tokens are billed as output within max_tokens, but previous turns’ thinking blocks are automatically stripped from the context calculation by the API. You do not remove them yourself, and they do not eat your conversation capacity.

What happens when you exceed the limit

Per the official docs, behavior differs by model generation. On Claude 4.5 models and newer, the API accepts a request even if input + max_tokens exceeds the window; if generation actually reaches the limit, it stops with stop_reason: "model_context_window_exceeded". Earlier models returned a validation error instead (a beta header opts into the newer behavior). If the response is cut off by max_tokens, you get stop_reason: "max_tokens". In both cases, check stop_reason in your code and handle it — warn the user, continue generation, and so on. For error handling in general, see the errors and rate limits guide.

Token counting API — count before you send

The starting point for limit management is the token counting API. It accepts the same structured input as message creation (including system prompts, tools, images, and PDFs) and returns the total input token count. Call it before the real request to avoid overflows and unintended long-context costs. Since tokens are cost, pair this with the cost optimization guide (prompt caching and batches).

max_tokens and rate limits

max_tokens caps output length. Per the official rate limits docs, output-tokens-per-minute (OTPM) limits are evaluated in real time on actually generated tokens; the max_tokens value itself does not factor into OTPM. So setting it generously to avoid truncation carries no rate-limit downside.

Long conversations: use compaction

As a conversation approaches the limit, the official docs recommend server-side compaction (summarizing the history) as the primary strategy, with context editing for finer control such as clearing tool results and thinking blocks. For app-side tips on long chats in claude.ai, see keeping context in long conversations. For choosing and pinning model IDs, see model IDs and versioning.

The limits here (200K, 1M) and behaviors reflect the official documentation as of June 2026; they vary by model and plan and may change. For exact per-model context sizes, see the official docs and the model comparison table. This site is not an official Anthropic site.

Claude API Context Windows and Tokens: Limits and Management

What counts against the window

What happens when you exceed the limit

Token counting API — count before you send

max_tokens and rate limits

Long conversations: use compaction

Keep reading

Claude API Model IDs and Versioning: Aliases, Pinned Snapshots, and Migration

The GET /v1/environments/{id}/work endpoint, which lists pending work for a self

Claude Code 2.1.174

DXC will integrate Claude into the systems banks, airlines, and other regulated industries rely on

Have a question or want to share how you use Claude?