When you fetch a long answer from the Claude API, waiting on a blank screen until it's fully built hurts UX and risks dropped connections. Streaming returns the response chunk by chunk as it's produced, so you can show it immediately. This guide covers how to turn on streaming and the event flow. (Official docs: docs.claude.com · as of June 2026)
What streaming is, and why
Streaming sends the response in small pieces over SSE (server-sent events), and the client processes them as they arrive. It's especially useful for:
- Chat UIs: text that flows like typing feels faster.
- Long outputs: responses with a large max_tokens risk timeouts or dropped connections if awaited all at once, so streaming is recommended.
- Live progress: when you must show progress immediately.
How to turn it on: "stream": true
Add "stream": true to your request. curl example:
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"","max_tokens":256,"messages":[{"role":"user","content":"Hello"}],"stream":true}'
The model field is left blank in these examples. Model IDs change, so look up a current one in the model IDs & versioning guide and the official docs.
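With a raw HTTP call like the curl request above, the body arrives as SSE text: `event:` and `data:` lines, with a blank line closing each event. The following is a minimal sketch of turning those lines into (event name, JSON payload) pairs. It is not a complete SSE parser (it ignores fields such as `id:` and `retry:`), the function name `iter_sse_events` is illustrative, and the sample is hand-written in the shape of the stream rather than captured output.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from decoded SSE lines.

    Simplified: handles only `event:` and `data:` fields and assumes every
    data payload is JSON.
    """
    event_name = None
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_parts:
                yield event_name or "message", json.loads("\n".join(data_parts))
            event_name, data_parts = None, []
        elif line.startswith(":"):
            continue  # SSE comment line
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip(" "))
    # Lenient choice in this sketch: flush an event that was not closed
    # by a blank line before the input ended.
    if data_parts:
        yield event_name or "message", json.loads("\n".join(data_parts))


# Hand-written sample in the shape of the stream (an assumption, not real output).
SAMPLE = """\
event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}

event: ping
data: {"type": "ping"}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}

"""

text = "".join(
    payload["delta"]["text"]
    for name, payload in iter_sse_events(SAMPLE.splitlines())
    if name == "content_block_delta" and payload["delta"]["type"] == "text_delta"
)
print(text)  # prints: Hello
```

In a real client you would feed the function decoded lines from the HTTP response body instead of `SAMPLE.splitlines()`.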
Easier with the SDK
The official Python and TypeScript SDKs offer streaming helpers. Python example (just the text stream):
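import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

with client.messages.stream(
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    model="",  # left blank here; fill in a current model ID
) as stream:
    # text_stream yields only the text deltas as they arrive.
    for text in stream.text_stream:
        print(text, end="", flush=True)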
If you don't need to handle chunks yourself, the SDKs can stream under the hood and return the complete Message object (e.g. Python get_final_message(), TypeScript finalMessage()). For SDK basics, see getting started with the SDK.
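As a rough sketch of the "stream under the hood, return the whole Message" pattern in Python: the helper below opens a stream, waits for `get_final_message()`, and joins the text blocks. The helper name `fetch_final_text` and its parameters are illustrative, not part of the SDK. The client is passed in, so the model ID (which this guide leaves unspecified) is supplied by the caller.

```python
def fetch_final_text(client, model: str, prompt: str, max_tokens: int = 1024) -> str:
    """Stream under the hood, then return the text of the complete Message."""
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        # Blocks until the stream has finished, then returns the full Message.
        message = stream.get_final_message()
    # Keep only text blocks; other block types (e.g. tool_use) carry no text.
    return "".join(block.text for block in message.content if block.type == "text")


# Usage with the official SDK (needs the anthropic package and ANTHROPIC_API_KEY):
#   import anthropic
#   print(fetch_final_text(anthropic.Anthropic(), "<current model ID>", "Hello"))
```

This keeps the timeout benefits of streaming while the calling code still works with a single complete response.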
Understanding the event flow
If you integrate the API directly, you handle events yourself. One stream roughly flows like this:
- message_start: begins with a Message object whose content is empty
- content blocks: each is content_block_start → content_block_delta (multiple) → content_block_stop
- message_delta: closing updates to the message, such as stop_reason and usage
- message_stop: end
ping events may appear in between. Text arrives as deltas inside content_block_delta; tool use and extended thinking can use different delta types (e.g. partial JSON, thinking deltas), so check the official events docs too.
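To make the flow concrete, here is a minimal sketch that walks already-parsed events (plain dicts) and rebuilds each content block, branching on the delta type. The function name `assemble_blocks` and the output shape with a `value` key are illustrative choices. The sample events are hand-written in the shape described above, and delta types not listed are simply skipped, so treat the official events docs as the reference for the full set.

```python
from typing import Iterable


def assemble_blocks(events: Iterable[dict]) -> dict:
    """Rebuild content blocks and the stop reason from parsed stream events."""
    blocks = {}        # block index -> {"type": ..., "parts": [...]}
    stop_reason = None
    for event in events:
        kind = event["type"]
        if kind == "content_block_start":
            blocks[event["index"]] = {
                "type": event["content_block"]["type"],
                "parts": [],
            }
        elif kind == "content_block_delta":
            delta = event["delta"]
            parts = blocks[event["index"]]["parts"]
            if delta["type"] == "text_delta":
                parts.append(delta["text"])
            elif delta["type"] == "input_json_delta":
                # Tool input arrives as JSON fragments; parse only once complete.
                parts.append(delta["partial_json"])
            elif delta["type"] == "thinking_delta":
                parts.append(delta["thinking"])
            # Any other delta type is skipped in this sketch.
        elif kind == "message_delta":
            stop_reason = event["delta"].get("stop_reason", stop_reason)
        # message_start, content_block_stop, message_stop and ping need no action here.
    content = [
        {"type": block["type"], "value": "".join(block["parts"])}
        for _, block in sorted(blocks.items())
    ]
    return {"content": content, "stop_reason": stop_reason}


# Hand-written sample events (an assumption, not captured API output).
events = [
    {"type": "message_start", "message": {"role": "assistant", "content": []}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "ping"},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]
result = assemble_blocks(events)
print(result["content"][0]["value"], result["stop_reason"])  # prints: Hello end_turn
```

Keying the accumulator by block index keeps text, tool input, and thinking content separate even when a response contains several blocks.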
Things to watch
- Dropped connections — some networks cut idle connections. Use streaming for long work and prepare retry logic (official SDKs handle some retries automatically).
- Delta types — concatenating only text can drop tool input or thinking content, so branch by the delta you need.
- Model/event changes — model strings and event details may change; the official docs are the source of truth.
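For the retry point above, one common approach is exponential backoff around the whole streaming call. The following is a minimal sketch under stated assumptions: `with_retries` is a hypothetical helper, the built-in `ConnectionError` and `TimeoutError` are placeholders for whatever exceptions your HTTP client or SDK actually raises, and each retry restarts the request from scratch, so any partial output already shown has to be discarded or reconciled by the caller.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on a listed exception wait base_delay * 2**n, then try again."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Demo with a stand-in for a streaming call that drops twice, then succeeds.
calls = {"n": 0}


def flaky_stream() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection dropped mid-stream")
    return "full response"


delays = []  # record the waits instead of actually sleeping
result = with_retries(flaky_stream, sleep=delays.append)
print(result, delays)  # prints: full response [0.5, 1.0]
```

Since the official SDKs already retry some failures automatically, a wrapper like this is mainly relevant when you call the HTTP API directly or need a custom policy.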
Related guides
For the first call, see getting started with the Claude API; for cost savings (prompt caching, batch), see reducing cost; for tools, see the Tool Use guide.
For handling timeouts (504), limits (429) and retries, see handling errors and rate limits.
For enabling the reasoning (thinking) process, see the extended thinking guide.
Details such as model IDs and event types may change; verify the latest in the official docs (docs.claude.com). This site is not an official Anthropic site.