Claude SDK Streaming: SSE Events, Real-time Output & Error Recovery

When building chatbots or apps that produce long responses with Claude, streaming is close to essential. Instead of waiting for the whole response, you receive tokens as they are generated and render them immediately, which improves perceived speed and avoids HTTP timeouts on requests with a large max_tokens. This article summarizes the streaming event structure and SDK usage described in the official docs.

Streaming event flow

Streaming is delivered via SSE (Server-Sent Events); each event has a name and JSON data. Per the official docs, a single stream flows as:

message_start — begins with a Message object whose content is empty.
Multiple content blocks — each is content_block_start → one or more content_block_delta → content_block_stop. Each block has an index pointing to its position in the final content array.
One or more message_delta — top-level changes to the final Message (e.g., stop_reason). The usage token counts here are cumulative.
A final message_stop.

ping events may appear in between, and during high usage an error event (e.g., overloaded_error, equivalent to HTTP 529 in non-streaming) can arrive within the stream. Per the versioning policy, new event types may be added, so handle unknown event types gracefully.
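The flow above can be sketched as a small dispatcher over raw SSE lines. This is a minimal illustration for direct HTTP integrations, not the SDK's implementation: parse_sse, consume, StreamError and the sample payloads are hypothetical names and trimmed-down data, and real events carry more fields. It overwrites usage instead of summing it (the counts are cumulative), raises on an error event, and skips event types it does not recognize.

```python
import json

# Illustrative, trimmed-down payloads; a real stream carries more fields.
SAMPLE_STREAM = """\
event: message_start
data: {"type": "message_start", "message": {"content": [], "usage": {"input_tokens": 25, "output_tokens": 1}}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text"}}

event: ping
data: {"type": "ping"}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": " world"}}

event: some_future_event
data: {"type": "some_future_event"}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 15}}

event: message_stop
data: {"type": "message_stop"}

"""

# Known events that this sketch deliberately does nothing with.
NOOP_EVENTS = {"ping", "content_block_start", "content_block_stop"}


class StreamError(Exception):
    """Raised when an error event arrives inside the stream."""


def parse_sse(lines):
    """Yield (event_name, data_dict) pairs from an iterable of SSE text lines."""
    event_name, data_parts = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
        elif line == "" and data_parts:
            # A blank line terminates one event.
            yield event_name, json.loads("\n".join(data_parts))
            event_name, data_parts = None, []
    if data_parts:  # flush a final event that lacks a trailing blank line
        yield event_name, json.loads("\n".join(data_parts))


def consume(events):
    """Fold a stream of (name, data) events into a small summary dict."""
    state = {"text": "", "stop_reason": None, "usage": {},
             "finished": False, "skipped": []}
    for name, data in events:
        if name == "message_start":
            state["usage"].update(data["message"].get("usage", {}))
        elif name == "content_block_delta":
            delta = data["delta"]
            if delta.get("type") == "text_delta":
                state["text"] += delta["text"]
        elif name == "message_delta":
            state["stop_reason"] = data["delta"].get("stop_reason")
            # Usage counts are cumulative, so overwrite rather than add.
            state["usage"].update(data.get("usage", {}))
        elif name == "message_stop":
            state["finished"] = True
        elif name == "error":
            raise StreamError(data.get("error", {}).get("type", "unknown_error"))
        elif name not in NOOP_EVENTS:
            # Unknown (possibly newer) event type: record it and carry on.
            state["skipped"].append(name)
    return state


result = consume(parse_sse(SAMPLE_STREAM.splitlines()))
print(result["text"], result["stop_reason"], result["usage"])
```

Keeping the parser separate from the dispatcher means the same consume function can be fed events from any transport, and a new event type only ever lands in the skipped list instead of raising.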

Delta types

Each content_block_delta carries a delta updating the block at a given index. Main types:

text_delta — a text fragment, e.g. {"type":"text_delta","text":"Hello"}
input_json_delta — updates to a tool_use block's input. It arrives as partial JSON strings, so accumulate until content_block_stop then parse, or use SDK helpers for incremental parsing. (The final tool_use.input is always an object.)
thinking_delta — reasoning fragments when streaming extended thinking. A signature_delta for integrity verification arrives once, just before content_block_stop.
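As a rough sketch of how these deltas might be folded back into content blocks without SDK helpers, the function below buffers input_json_delta fragments per block index and parses them only at content_block_stop. The function name accumulate_blocks and the sample events are hypothetical, and the delta field names used here (partial_json, thinking, signature) reflect the documented shapes as I understand them, so verify them against the official docs before relying on them.

```python
import json


def accumulate_blocks(events):
    """Rebuild content blocks from content_block_* events given as dicts."""
    blocks = {}        # index -> block under construction
    json_parts = {}    # index -> list of partial JSON fragments
    for event in events:
        etype = event.get("type")
        if etype == "content_block_start":
            idx = event["index"]
            blocks[idx] = dict(event["content_block"])
            json_parts[idx] = []
        elif etype == "content_block_delta":
            idx, delta = event["index"], event["delta"]
            block, dtype = blocks[idx], delta.get("type")
            if dtype == "text_delta":
                block["text"] = block.get("text", "") + delta["text"]
            elif dtype == "input_json_delta":
                # Fragments are not valid JSON on their own; just buffer them.
                json_parts[idx].append(delta["partial_json"])
            elif dtype == "thinking_delta":
                block["thinking"] = block.get("thinking", "") + delta["thinking"]
            elif dtype == "signature_delta":
                block["signature"] = delta["signature"]
            # Any other delta type is ignored rather than treated as an error.
        elif etype == "content_block_stop":
            idx = event["index"]
            if blocks[idx].get("type") == "tool_use":
                raw = "".join(json_parts[idx])
                # The final input is always an object; empty input means {}.
                blocks[idx]["input"] = json.loads(raw) if raw else {}
    return [blocks[i] for i in sorted(blocks)]


# Hypothetical events: one text block followed by one tool_use block.
sample_events = [
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Okay, "}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "checking."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_example",
                       "name": "get_weather", "input": {}}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '{"location": "San Fra'}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": 'ncisco, CA"}'}},
    {"type": "content_block_stop", "index": 1},
]

content = accumulate_blocks(sample_events)
print(content)
```

Parsing only at content_block_stop keeps the logic simple, at the cost of not seeing tool arguments until the block is complete; incremental parsing is where the SDK helpers earn their keep.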

Streaming with the SDK

The official Python and TypeScript SDKs offer multiple streaming approaches (Python supports both sync and async). The PHP SDK streams via createStream(). Two common patterns:

1) Real-time token output

To process tokens as they arrive, open the stream with a context manager and iterate text_stream.

import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable by default.
client = anthropic.Anthropic()

with client.messages.stream(
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    model="claude-opus-4-8",
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

2) Get the final message only

If you don't need to process tokens as they arrive, you can still stream under the hood and receive the complete Message object, identical to what .create() returns. This is especially useful for avoiding HTTP timeouts on requests with a large max_tokens.

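# Assumes `client` is an anthropic.Anthropic() instance.
with client.messages.stream(
    max_tokens=128000,
    messages=[{"role": "user", "content": "Write a detailed analysis..."}],
    model="claude-opus-4-8",
) as stream:
    # Consumes the stream to the end, then returns the accumulated Message.
    message = stream.get_final_message()

# Assumes the first content block is a text block.
print(message.content[0].text)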

Accumulation differs by language: Python get_final_message(), TypeScript finalMessage(), Go message.Accumulate(event) inside the loop, Java MessageAccumulator, Ruby .accumulated_message. Unless you're doing a direct HTTP integration, using the official SDK is recommended.

Error recovery mid-stream

If a stream is interrupted by network issues or timeouts, you can resume rather than re-fetch from scratch. The official docs note the method differs by model generation.

Claude 4.5 and earlier: place the received partial response as the beginning of an assistant message and continue.
Claude 4.6 and later: use a user message containing the partial response that instructs the model to continue (e.g., "Your previous response was interrupted and ended with [...]. Continue from where you left off.").

Note: tool_use and thinking blocks cannot be partially recovered; you can resume from the most recent text block. Where possible, leverage the SDK's message accumulation and error handling.
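The two recovery strategies can be sketched as a helper that builds the messages list for the follow-up request. Everything here is an assumption made for illustration: the function names, the prefill_supported flag (the caller decides which model generation it is talking to), and the choice to append the continuation instruction as an extra user message. Only the instruction wording follows the example quoted above.

```python
def recoverable_text(blocks):
    """Return the text of the most recent text block, or "" if there is none.

    Partial tool_use and thinking blocks are skipped because they cannot
    be resumed.
    """
    for block in reversed(blocks):
        if block.get("type") == "text" and block.get("text"):
            return block["text"]
    return ""


def build_resume_messages(original_messages, partial_text, prefill_supported):
    """Build the messages list for a request that continues a cut-off reply."""
    if not partial_text:
        # Nothing usable was received; simply retry the original request.
        return list(original_messages)
    if prefill_supported:
        # Claude 4.5 and earlier: the partial reply becomes the start of the
        # assistant turn. Trailing whitespace is stripped as a precaution.
        return [*original_messages,
                {"role": "assistant", "content": partial_text.rstrip()}]
    # Claude 4.6 and later: ask for a continuation in a user message.
    instruction = (
        "Your previous response was interrupted and ended with "
        f"[{partial_text}]. Continue from where you left off."
    )
    return [*original_messages, {"role": "user", "content": instruction}]


# Hypothetical state at the moment the connection dropped.
received_blocks = [
    {"type": "text", "text": "The three main causes are "},
    {"type": "tool_use", "id": "toolu_example", "name": "lookup", "input": {}},
]
history = [{"role": "user", "content": "Explain the outage."}]

partial = recoverable_text(received_blocks)
older = build_resume_messages(history, partial, prefill_supported=True)
newer = build_resume_messages(history, partial, prefill_supported=False)
print(older[-1])
print(newer[-1])
```

Either way, the text returned by the follow-up request has to be appended to the partial text you already hold to reconstruct the full response.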

Summary

The essentials of streaming: (1) understand the SSE event flow (start→delta→stop), (2) handle delta types (text, input_json, thinking), (3) use the SDK for real-time output or final message, (4) handle unknown events/errors safely, and (5) use generation-specific recovery. Related reading: Getting Started with the Claude SDK, Error Handling & Retry Guide.

This article is based on public information from the official Anthropic docs (platform.claude.com/docs). SDK APIs and event formats can change between versions, so verify against the official docs when implementing. This site is not an official Anthropic site.
