Claude API streaming — real-time output with SSE

When you request a long answer from the Claude API, waiting on a blank screen until the full response is ready hurts UX and risks dropped connections. Streaming returns the response chunk by chunk as it is generated, so you can display it immediately. This guide covers how to turn on streaming and how the event flow works. (Official docs: docs.claude.com · as of June 2026)

What streaming is, and why

Streaming sends the response in small pieces over SSE (server-sent events), and the client processes them as they arrive. It's especially useful for:

Chat UIs — text that flows like typing feels faster.
Long outputs — with a large max_tokens, waiting for the whole response in one go risks timeouts and dropped connections, so streaming is recommended.
Live progress — when users need to see partial results while the response is still being generated.

How to turn it on: stream true

Add "stream": true to the JSON request body. A curl example:

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"","max_tokens":256,"messages":[{"role":"user","content":"Hello"}],"stream":true}'

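The "model" value is left blank in the examples in this guide; fill in a current model ID before running them. Model IDs change, so check the model IDs & versioning guide and the official docs.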
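For readers who prefer Python over curl, here is a minimal sketch of the same request using only the standard library. It mirrors the endpoint, headers, and body of the curl example above; the function names build_stream_request and print_raw_stream are hypothetical, and the model ID is passed in by the caller rather than assumed.

```python
import json
import urllib.request


def build_stream_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a streaming request that mirrors the curl example."""
    body = {
        "model": model,  # caller supplies a current model ID
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # the only change versus a non-streaming call
    }
    return urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


def print_raw_stream(request: urllib.request.Request) -> None:
    """Send the request and print each raw SSE line as it arrives."""
    with urllib.request.urlopen(request) as response:
        for raw_line in response:  # the response object iterates line by line
            print(raw_line.decode("utf-8").rstrip("\n"))
```

Calling print_raw_stream(build_stream_request(key, model_id, "Hello")) should print the raw event: and data: lines, which is a quick way to see the wire format before writing a parser.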

Easier with the SDK

The official Python and TypeScript SDKs offer streaming helpers. Python example (just the text stream):

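import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

with client.messages.stream(
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    model="",  # fill in a current model ID
) as stream:
    # text_stream yields only the text chunks, in arrival order.
    for text in stream.text_stream:
        print(text, end="", flush=True)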

If you don't need to handle chunks yourself, the SDKs can stream under the hood and return the complete Message object (e.g. Python get_final_message(), TypeScript finalMessage()). For SDK basics, see getting started with the SDK.
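As a sketch of the "stream under the hood" pattern in Python, the helper below opens a stream and returns only the assembled Message. The function name fetch_final_message is hypothetical; it expects a client created with anthropic.Anthropic() and a model ID supplied by the caller.

```python
def fetch_final_message(client, model: str, prompt: str, max_tokens: int = 1024):
    """Stream under the hood, then return the complete Message object.

    `client` is expected to be an anthropic.Anthropic() instance.
    """
    with client.messages.stream(
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
        model=model,
    ) as stream:
        # Blocks until the stream has finished; no per-chunk handling is needed.
        return stream.get_final_message()


# Usage (requires the anthropic package and an API key in the environment):
#   import anthropic
#   message = fetch_final_message(anthropic.Anthropic(), model_id, "Hello")
#   print(message.content[0].text)
```

This keeps the connection-friendly behavior of streaming while the calling code still works with a single Message, as it would for a non-streaming request.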

Understanding the event flow

If you call the API directly, you handle the events yourself. A single stream flows roughly like this:

message_start — begins with a Message object with empty content
content blocks — each is content_block_start → content_block_delta (multiple) → content_block_stop
message_delta — top-level changes to the final Message, such as stop_reason and the cumulative token usage
message_stop — end

ping events may appear anywhere in between. Text arrives as text deltas (text_delta) inside content_block_delta; tool use and extended thinking use different delta types (e.g. partial JSON for tool input, thinking deltas), so check the official events docs too.
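The flow above can be sketched as a small parser that reads decoded SSE lines and walks the events in order. This is a minimal illustration, not the SDK's implementation: the function names iter_events and collect_text are hypothetical, SAMPLE_LINES is a hand-written and abbreviated stream (a real message_start payload carries more fields), and error events are not handled.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE event from decoded text lines."""
    data_parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE allows one optional space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_parts:
            # A blank line terminates one event.
            yield json.loads("\n".join(data_parts))
            data_parts = []
        # "event:" lines are skipped: each payload carries its own "type".
    if data_parts:  # stream ended without a trailing blank line
        yield json.loads("\n".join(data_parts))


def collect_text(events: Iterable[dict]) -> Tuple[str, Optional[str]]:
    """Walk the event flow and return (full_text, stop_reason)."""
    chunks = []
    stop_reason = None
    for event in events:
        kind = event.get("type")
        if kind == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                chunks.append(delta["text"])
        elif kind == "message_delta":
            stop_reason = event["delta"].get("stop_reason")
        elif kind == "message_stop":
            break
        # message_start, content_block_start/stop, and ping need no action here.
    return "".join(chunks), stop_reason


# Hand-written, abbreviated sample stream for illustration only.
SAMPLE_LINES = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"role": "assistant", "content": []}}',
    "",
    "event: content_block_start",
    'data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}',
    "",
    "event: ping",
    'data: {"type": "ping"}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo!"}}',
    "",
    "event: content_block_stop",
    'data: {"type": "content_block_stop", "index": 0}',
    "",
    "event: message_delta",
    'data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 3}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
    "",
]

text, stop_reason = collect_text(iter_events(SAMPLE_LINES))
print(text, stop_reason)  # prints: Hello! end_turn
```

Dispatching on the "type" field inside each payload, rather than on the event: line, keeps the parser to a single code path; unknown event types fall through untouched, which leaves room for events added later.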

Things to watch

Dropped connections — some networks cut idle connections. Use streaming for long work and prepare retry logic (official SDKs handle some retries automatically).
Delta types — concatenating only text can drop tool input or thinking content, so branch by the delta you need.
Model/event changes — model strings and event details may change; the official docs are the source of truth.
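To illustrate the delta-type point above, here is a sketch that groups deltas by content block index and branches on the delta type, so tool input and thinking content are kept rather than dropped. The function name accumulate_blocks and the shape of its return value are assumptions made for this example; the events are expected as already-parsed dicts.

```python
from collections import defaultdict
from typing import Dict, Iterable


def accumulate_blocks(events: Iterable[dict]) -> Dict[int, dict]:
    """Group streamed deltas by content block index and delta type."""
    blocks: Dict[int, dict] = defaultdict(
        lambda: {"text": "", "partial_json": "", "thinking": ""}
    )
    for event in events:
        if event.get("type") != "content_block_delta":
            continue
        delta = event["delta"]
        block = blocks[event["index"]]
        delta_type = delta.get("type")
        if delta_type == "text_delta":
            block["text"] += delta["text"]
        elif delta_type == "input_json_delta":
            # Tool input arrives as JSON fragments; parse only once the block has stopped.
            block["partial_json"] += delta["partial_json"]
        elif delta_type == "thinking_delta":
            block["thinking"] += delta["thinking"]
        # Other delta types (e.g. signature_delta) are ignored in this sketch.
    return dict(blocks)
```

Once a tool-use block has received its content_block_stop, the accumulated partial_json string can be passed to json.loads to recover the tool input; parsing earlier would fail on incomplete fragments.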

For the first call, see getting started with the Claude API; for cost savings (prompt caching, batch), see reducing cost; for tools, see the Tool Use guide.

For handling timeouts (504), rate limits (429), and retries, see handling errors and rate limits.

For enabling the reasoning (thinking) process, see the extended thinking guide.

Details such as model IDs and event types may change; verify the latest in the official docs (docs.claude.com). This site is not an official Anthropic site.

Keep reading

Claude extended thinking — enabling and handling reasoning via the API

Handling Claude API errors and rate limits — 429, 529, and retries

Cutting Claude API Costs: Prompt Caching and the Batch API
