Claude SDK Streaming: SSE Events, Real-time Output & Error Recovery

Learn Claude streaming's SSE event flow (message_start→content_block_delta→message_stop), text/input_json/thinking delta types, real-time token output and final-message retrieval with the SDK, and generation-specific error recovery — based on official docs.

When building chatbots or apps with long responses on Claude, streaming is close to essential. Instead of waiting for the whole response, you receive tokens as they are generated and render them immediately, which improves perceived speed and avoids HTTP timeouts on requests with a large max_tokens. This article summarizes the streaming event structure and SDK usage per the official docs.

[Figure: Streaming event flow (SSE). message_start (empty content) → for each content block, identified by its index: content_block_start → content_block_delta × N (text_delta / input_json_delta / thinking_delta) → content_block_stop → message_delta (cumulative usage) → message_stop. ping events, and occasionally error events such as overloaded_error, may be interleaved.]

Streaming event flow

Streaming is delivered via SSE (Server-Sent Events); each event has a name and JSON data. Per the official docs, a single stream flows as:

  1. message_start — begins with a Message object whose content is empty.
  2. Multiple content blocks — each is content_block_start → one or more content_block_delta → content_block_stop. Each block has an index pointing to its position in the final content array.
  3. One or more message_delta — top-level changes to the final Message (e.g., stop_reason). The usage token counts here are cumulative.
  4. A final message_stop.
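
For direct HTTP integrations, the sequence above arrives as raw SSE text. The following is a minimal sketch, assuming the standard SSE framing of "event:" and "data:" lines with a blank line ending each event. The names parse_sse and SAMPLE are hypothetical, and the sample payloads are abbreviated illustrations rather than captured output.

```python
import json


def parse_sse(raw: str) -> list[tuple[str, dict]]:
    """Split raw SSE text into (event_name, data) pairs."""
    events = []
    name, data_lines = None, []
    # The extra "" guarantees the last event is flushed even without a
    # trailing blank line.
    for line in raw.splitlines() + [""]:
        if line.startswith(":"):
            continue  # SSE comment line, carries no data
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                payload = json.loads("\n".join(data_lines))
                events.append((name or "message", payload))
            name, data_lines = None, []
        elif line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip())
    return events


# Abbreviated, illustrative stream (not real captured output).
SAMPLE = (
    "event: message_start\n"
    'data: {"type": "message_start", "message": {"content": []}}\n'
    "\n"
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "index": 0, '
    '"delta": {"type": "text_delta", "text": "Hello"}}\n'
    "\n"
    "event: message_stop\n"
    'data: {"type": "message_stop"}\n'
)

events = parse_sse(SAMPLE)
print([name for name, _ in events])
# ['message_start', 'content_block_delta', 'message_stop']
```

The official SDKs already do this parsing, so a hand-written parser like this is only relevant when integrating over raw HTTP.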

ping events may appear in between, and during high usage an error event (e.g., overloaded_error, equivalent to HTTP 529 in non-streaming) can arrive within the stream. Per the versioning policy, new event types may be added, so handle unknown event types gracefully.
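
One way to handle ping, error, and unknown events defensively is sketched below. It assumes events have already been parsed into (name, data) pairs, and that an error event carries its details under an "error" key with "type" and "message" fields; that shape, along with the names StreamError, summarize, and HANDLED_ELSEWHERE, is an assumption for illustration and should be checked against the official docs.

```python
class StreamError(Exception):
    """Raised when an error event arrives inside the stream."""


# Content events are known but not needed for this summary.
HANDLED_ELSEWHERE = {
    "message_start",
    "content_block_start",
    "content_block_delta",
    "content_block_stop",
}


def summarize(events):
    """Fold (name, data) pairs into stop_reason, usage, and skipped names."""
    state = {"stop_reason": None, "usage": {}, "skipped": [], "done": False}
    for name, data in events:
        if name == "ping":
            continue  # keep-alive, nothing to do
        if name == "error":
            err = data.get("error", {})
            raise StreamError(f"{err.get('type')}: {err.get('message')}")
        if name == "message_delta":
            state["stop_reason"] = data.get("delta", {}).get("stop_reason")
            # Usage counts are cumulative, so the latest value replaces
            # the previous one instead of being added to it.
            state["usage"].update(data.get("usage", {}))
        elif name == "message_stop":
            state["done"] = True
        elif name not in HANDLED_ELSEWHERE:
            # Unknown event type: record it and keep going.
            state["skipped"].append(name)
    return state


result = summarize([
    ("message_start", {"type": "message_start"}),
    ("ping", {"type": "ping"}),
    ("future_event", {"type": "future_event"}),  # hypothetical new type
    ("message_delta", {
        "type": "message_delta",
        "delta": {"stop_reason": "end_turn"},
        "usage": {"output_tokens": 15},
    }),
    ("message_stop", {"type": "message_stop"}),
])
print(result["stop_reason"], result["usage"], result["skipped"])
```

Recording skipped event names instead of raising keeps the client working if new event types are introduced, while still leaving a trace for debugging.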

Delta types

Each content_block_delta carries a delta updating the block at a given index. Main types:

  • text_delta — a text fragment, e.g. {"type":"text_delta","text":"Hello"}
  • input_json_delta — updates to a tool_use block's input. It arrives as partial JSON strings, so accumulate until content_block_stop then parse, or use SDK helpers for incremental parsing. (The final tool_use.input is always an object.)
  • thinking_delta — reasoning fragments when streaming extended thinking. A signature_delta for integrity verification arrives once, just before content_block_stop.
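
The accumulate-then-parse approach for these delta types can be sketched as follows. This is an illustration, not the SDK's implementation: it assumes the delta payloads carry their fragments in fields named partial_json, thinking, and signature, and accumulate_blocks, the tool name get_weather, and the id toolu_example are hypothetical.

```python
import json


def accumulate_blocks(events):
    """Rebuild content blocks from content_block_* events, keyed by index."""
    blocks, json_parts = {}, {}
    for data in events:
        kind = data.get("type")
        if kind == "content_block_start":
            blocks[data["index"]] = dict(data["content_block"])
            json_parts[data["index"]] = []
        elif kind == "content_block_delta":
            block, delta = blocks[data["index"]], data["delta"]
            if delta["type"] == "text_delta":
                block["text"] = block.get("text", "") + delta["text"]
            elif delta["type"] == "input_json_delta":
                # Partial JSON is not parseable yet, so only collect it.
                json_parts[data["index"]].append(delta["partial_json"])
            elif delta["type"] == "thinking_delta":
                block["thinking"] = block.get("thinking", "") + delta["thinking"]
            elif delta["type"] == "signature_delta":
                block["signature"] = delta["signature"]
            # Any other delta type is ignored.
        elif kind == "content_block_stop":
            parts = json_parts.pop(data["index"], [])
            if blocks[data["index"]].get("type") == "tool_use":
                # The block is complete, so the joined string is full JSON.
                blocks[data["index"]]["input"] = json.loads("".join(parts) or "{}")
    return [blocks[i] for i in sorted(blocks)]


blocks = accumulate_blocks([
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "lo"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_example",
                       "name": "get_weather", "input": {}}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '{"city": "Se'}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": 'oul"}'}},
    {"type": "content_block_stop", "index": 1},
])
print(blocks[0]["text"], blocks[1]["input"])
```

Deferring json.loads until content_block_stop avoids parse errors on incomplete fragments; the trade-off is that tool input is unavailable until the block closes.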

[Figure: Two ways to handle streaming with the SDK. (1) Real-time token output: iterate stream.text_stream and display each fragment on arrival, giving a typing effect in chat UIs and better perceived responsiveness; events can also be handled directly. (2) Final message only: stream internally and call stream.get_final_message() to receive the whole result, which helps avoid HTTP timeouts on large max_tokens requests.]

Streaming with the SDK

The official Python and TypeScript SDKs offer multiple streaming approaches (Python supports both sync and async). The PHP SDK streams via createStream(). Two common patterns:

1) Real-time token output

To process tokens as they arrive, open the stream with a context manager and iterate text_stream.

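import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

with client.messages.stream(
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    model="claude-opus-4-8",
) as stream:
    # text_stream yields each text fragment as it arrives.
    for text in stream.text_stream:
        print(text, end="", flush=True)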

2) Get the final message only

If you don't need real-time tokens, you can stream under the hood yet receive the complete Message object — identical to what .create() returns. It's especially useful to avoid HTTP timeouts on large max_tokens requests.

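# Reuses the client created in the previous example.
with client.messages.stream(
    max_tokens=128000,
    messages=[{"role": "user", "content": "Write a detailed analysis..."}],
    model="claude-opus-4-8",
) as stream:
    # Blocks until the stream ends, then returns the assembled Message.
    message = stream.get_final_message()
print(message.content[0].text)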

Accumulation differs by language: Python get_final_message(), TypeScript finalMessage(), Go message.Accumulate(event) inside the loop, Java MessageAccumulator, Ruby .accumulated_message. Unless you're doing a direct HTTP integration, using the official SDK is recommended.

Error recovery mid-stream

If a stream is interrupted by network issues or timeouts, you can resume rather than re-fetch from scratch. The official docs note the method differs by model generation.

  • Claude 4.5 and earlier: place the received partial response as the beginning of an assistant message and continue.
  • Claude 4.6 and later: use a user message containing the partial response that instructs the model to continue (e.g., "Your previous response was interrupted and ended with [...]. Continue from where you left off.").

Note: tool_use and thinking blocks cannot be partially recovered; you can resume from the most recent text block. Where possible, leverage the SDK's message accumulation and error handling.
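
The two recovery strategies can be sketched as a helper that builds the follow-up message list. This is a sketch under assumptions: build_resume_messages and its use_prefill flag are hypothetical, the caller decides which strategy applies to the model in use, and appending the continuation request directly after the original messages is one possible arrangement rather than a documented requirement.

```python
def build_resume_messages(messages, partial_text, use_prefill):
    """Return a new message list asking the model to continue partial_text.

    use_prefill=True  -> assistant-prefill strategy (Claude 4.5 and earlier)
    use_prefill=False -> user-instruction strategy (Claude 4.6 and later)
    """
    resumed = list(messages)  # leave the caller's list untouched
    if use_prefill:
        # Trailing whitespace is stripped as a precaution, on the assumption
        # that a prefilled assistant turn must not end with whitespace.
        resumed.append({"role": "assistant", "content": partial_text.rstrip()})
    else:
        resumed.append({
            "role": "user",
            "content": (
                "Your previous response was interrupted and ended with "
                f"[{partial_text}]. Continue from where you left off."
            ),
        })
    return resumed


original = [{"role": "user", "content": "Explain SSE."}]
partial = "SSE is a one-way "

older = build_resume_messages(original, partial, use_prefill=True)
newer = build_resume_messages(original, partial, use_prefill=False)
print(older[-1]["role"], newer[-1]["role"])
```

With either strategy, the text returned by the resumed request still has to be joined to the partial text on the client side to form the complete response.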

Summary

The essentials of streaming: (1) understand the SSE event flow (start→delta→stop), (2) handle delta types (text, input_json, thinking), (3) use the SDK for real-time output or final message, (4) handle unknown events/errors safely, and (5) use generation-specific recovery. Related reading: Getting Started with the Claude SDK, Error Handling & Retry Guide.

This article is based on public information from the official Anthropic docs (platform.claude.com/docs). SDK APIs and event formats can change between versions, so verify against the official docs when implementing. This site is not an official Anthropic site.
