When building chatbots or apps with long responses on Claude, streaming is close to essential. Instead of receiving the whole response at once, you get tokens as they're generated and can render them as they arrive, which improves perceived speed and avoids HTTP timeouts on large max_tokens requests. This article summarizes the streaming event structure and SDK usage per the official docs.
Streaming event flow
Streaming is delivered via SSE (Server-Sent Events); each event has a name and JSON data. Per the official docs, a single stream flows as:
- message_start: begins with a Message object whose content is empty.
- Multiple content blocks: each is content_block_start → one or more content_block_delta → content_block_stop. Each block has an index pointing to its position in the final content array.
- One or more message_delta events: top-level changes to the final Message (e.g., stop_reason). The usage token counts here are cumulative.
- A final message_stop.
ping events may appear in between, and during high usage an error event (e.g., overloaded_error, equivalent to HTTP 529 in non-streaming) can arrive within the stream. Per the versioning policy, new event types may be added, so handle unknown event types gracefully.
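The event flow can be sketched without any network access. The following is a minimal illustration, not the SDK's implementation: iter_sse_events and collect_text are hypothetical helper names, the sample payloads are made up for the example, and the JSON field layout follows Anthropic's published event format, which should be verified against the current docs. It concatenates text deltas per block index, treats usage as cumulative, raises on an error event, and silently skips ping and unknown event types.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, data) pairs from raw SSE lines (hypothetical helper)."""
    event_name, data_parts = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
        elif line == "" and event_name is not None:
            # A blank line terminates one SSE event.
            yield event_name, json.loads("\n".join(data_parts) or "{}")
            event_name, data_parts = None, []


def collect_text(lines: Iterable[str]) -> dict:
    """Fold one stream into its text, stop_reason, and final usage counts."""
    blocks = {}  # content block index -> accumulated text
    result = {"text": "", "stop_reason": None, "usage": {}}
    for name, data in iter_sse_events(lines):
        if name == "content_block_start":
            blocks[data["index"]] = ""
        elif name == "content_block_delta":
            delta = data["delta"]
            if delta.get("type") == "text_delta":
                idx = data["index"]
                blocks[idx] = blocks.get(idx, "") + delta["text"]
        elif name == "message_delta":
            result["stop_reason"] = data.get("delta", {}).get("stop_reason")
            # Usage counts are cumulative, so overwrite rather than sum.
            result["usage"] = data.get("usage", result["usage"])
        elif name == "error":
            raise RuntimeError(data.get("error", {}).get("type", "unknown_error"))
        elif name == "message_stop":
            break
        # message_start, content_block_stop, ping, and unknown types are ignored.
    result["text"] = "".join(blocks[i] for i in sorted(blocks))
    return result


# Illustrative stream; payload values are invented for this example.
SAMPLE = """\
event: message_start
data: {"type": "message_start", "message": {"content": []}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: ping
data: {"type": "ping"}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}

event: some_future_event
data: {"type": "some_future_event"}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": " world"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 2}}

event: message_stop
data: {"type": "message_stop"}

""".splitlines()

if __name__ == "__main__":
    print(collect_text(SAMPLE))
```

Falling through on unrecognized names, rather than raising, is what keeps a handler like this working if new event types are added later.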
Delta types
Each content_block_delta carries a delta updating the block at a given index. Main types:
- text_delta: a text fragment, e.g. {"type":"text_delta","text":"Hello"}
- input_json_delta: updates to a tool_use block's input. It arrives as partial JSON strings, so accumulate until content_block_stop then parse, or use SDK helpers for incremental parsing. (The final tool_use.input is always an object.)
- thinking_delta: reasoning fragments when streaming extended thinking. A signature_delta for integrity verification arrives once, just before content_block_stop.
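The accumulate-then-parse approach for input_json_delta might look like the sketch below. It works on already-decoded event dictionaries; accumulate_tool_input is a hypothetical helper name, the tool name and id in the sample are invented, and the partial_json field name comes from Anthropic's published event format rather than from this article, so check it against the current docs.

```python
import json


def accumulate_tool_input(events):
    """Return {block index: parsed input dict} for each tool_use block."""
    buffers = {}  # block index -> list of partial JSON string fragments
    inputs = {}
    for event in events:
        etype = event.get("type")
        if etype == "content_block_start":
            if event["content_block"].get("type") == "tool_use":
                buffers[event["index"]] = []
        elif etype == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "input_json_delta":
                buffers.setdefault(event["index"], []).append(delta["partial_json"])
        elif etype == "content_block_stop" and event["index"] in buffers:
            raw = "".join(buffers.pop(event["index"]))
            # The fragments are only valid JSON once joined; an empty buffer
            # is treated here as a tool call with no arguments.
            inputs[event["index"]] = json.loads(raw) if raw else {}
    return inputs


# Illustrative events; "get_weather" and the id are made-up values.
events = [
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_example",
                       "name": "get_weather", "input": {}}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '{"location": "San Fra'}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": 'ncisco, CA"}'}},
    {"type": "content_block_stop", "index": 1},
]

if __name__ == "__main__":
    print(accumulate_tool_input(events))
```

Parsing only at content_block_stop avoids feeding an incomplete string to json.loads, which would raise on every intermediate fragment.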
Streaming with the SDK
The official Python and TypeScript SDKs offer multiple streaming approaches (Python supports both sync and async). The PHP SDK streams via createStream(). Two common patterns:
1) Real-time token output
To process tokens as they arrive, open the stream with a context manager and iterate text_stream.
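import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with client.messages.stream(
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    model="claude-opus-4-8",
) as stream:
    # text_stream yields only the text fragments, in arrival order.
    for text in stream.text_stream:
        print(text, end="", flush=True)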
2) Get the final message only
If you don't need real-time tokens, you can stream under the hood yet receive the complete Message object — identical to what .create() returns. It's especially useful to avoid HTTP timeouts on large max_tokens requests.
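# Reuses the client created in the previous example.
with client.messages.stream(
    max_tokens=128000,
    messages=[{"role": "user", "content": "Write a detailed analysis..."}],
    model="claude-opus-4-8",
) as stream:
    # Waits for the stream to finish, then returns the accumulated Message.
    message = stream.get_final_message()

print(message.content[0].text)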
Accumulation differs by language: Python get_final_message(), TypeScript finalMessage(), Go message.Accumulate(event) inside the loop, Java MessageAccumulator, Ruby .accumulated_message. Unless you're doing a direct HTTP integration, using the official SDK is recommended.
Error recovery mid-stream
If a stream is interrupted by network issues or timeouts, you can resume rather than re-fetch from scratch. The official docs note the method differs by model generation.
- Claude 4.5 and earlier: place the received partial response as the beginning of an assistant message and continue.
- Claude 4.6 and later: use a user message containing the partial response that instructs the model to continue (e.g., "Your previous response was interrupted and ended with [...]. Continue from where you left off.").
Note: tool_use and thinking blocks cannot be partially recovered; you can resume from the most recent text block. Where possible, leverage the SDK's message accumulation and error handling.
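The two recovery strategies differ only in how the next request's messages list is built, which the sketch below illustrates. build_resume_messages and its prefill flag are hypothetical names for this example; choosing the flag per model is left to the caller. Trimming trailing whitespace from the prefilled assistant text is a precaution assumed by this sketch, not something stated in the article.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_resume_messages(
    history: List[Message], partial_text: str, prefill: bool
) -> List[Message]:
    """Build the messages list for a request that resumes an interrupted reply.

    prefill=True  -> Claude 4.5 and earlier: the partial text opens the
                     assistant turn and the model continues it.
    prefill=False -> Claude 4.6 and later: a user message quotes the partial
                     text and asks the model to continue.
    """
    if prefill:
        # Assumption: trailing whitespace is trimmed before prefilling.
        return history + [{"role": "assistant", "content": partial_text.rstrip()}]
    prompt = (
        "Your previous response was interrupted and ended with "
        f"[{partial_text}]. Continue from where you left off."
    )
    return history + [{"role": "user", "content": prompt}]


if __name__ == "__main__":
    history = [{"role": "user", "content": "Write a detailed analysis..."}]
    partial = "The first factor is "
    print(build_resume_messages(history, partial, prefill=True))
    print(build_resume_messages(history, partial, prefill=False))
```

Either way, the caller is responsible for joining the text already shown to the user with the continuation that the new request returns.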
Summary
The essentials of streaming: (1) understand the SSE event flow (start→delta→stop), (2) handle delta types (text, input_json, thinking), (3) use the SDK for real-time output or final message, (4) handle unknown events/errors safely, and (5) use generation-specific recovery. Related reading: Getting Started with the Claude SDK, Error Handling & Retry Guide.
This article is based on public information from the official Anthropic docs (platform.claude.com/docs). SDK APIs and event formats can change between versions, so verify against the official docs when implementing. This site is not an official Anthropic site.