Handling Claude API errors and rate limits — 429, 529, and retries

When you run the Claude API in production, you will sooner or later hit rate limits (429) and temporary server overload (529). Reliable apps expect these errors and retry sensibly. This guide covers the main error codes, how 429 differs from 529, and a retry strategy, grounded in the official docs. (Official: docs.claude.com/errors · as of June 2026)

Main HTTP error codes

Per the official docs, the API follows a predictable HTTP error-code format.

400 invalid_request_error — request format/content issue (may also cover other 4XX)
401 authentication_error — API key issue
402 billing_error — billing/payment issue
403 permission_error — key lacks permission for the resource
404 not_found_error — resource not found
413 request_too_large — request exceeds the size limit (e.g. Messages API 32MB)
429 rate_limit_error — rate limit exceeded
500 api_error — unexpected internal error at Anthropic
504 timeout_error — timed out while processing (use streaming for long requests)
529 overloaded_error — API temporarily overloaded

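Errors are always returned as JSON: a top-level error object holding type and message, plus a request_id you can quote when tracking down a problem. One exception: with streaming (SSE), an error can arrive after the API has already returned a 200 response, so it will not follow this standard error mechanism.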
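To make the format concrete, here is a minimal sketch that pulls the type, message, and request_id out of an error body. The JSON below is an illustrative example built on the shape described above (a top-level error object plus request_id); the message text and request ID are made up, and describe_error is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Optional, Tuple

# Illustrative error body; the layout follows the format described above,
# but the message text and request_id value are invented for this example.
RAW_BODY = """
{
  "type": "error",
  "error": {
    "type": "not_found_error",
    "message": "The requested resource could not be found."
  },
  "request_id": "req_example123"
}
"""


def describe_error(raw: str) -> Tuple[str, str, Optional[str]]:
    """Return (error type, message, request_id) from a JSON error body."""
    body = json.loads(raw)
    error = body["error"]
    return error["type"], error["message"], body.get("request_id")


err_type, message, request_id = describe_error(RAW_BODY)
print(f"{err_type}: {message} (request_id={request_id})")
```

Logging the request_id alongside the error type makes it much easier to match a failure in your logs to a specific API request later.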

What a 429 rate limit is

Messages API limits are measured per model class in requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM). Exceed any of them and you get a 429 with a retry-after header giving the number of seconds to wait. The anthropic-ratelimit-* response headers report the limit, the remaining amount, and the reset time for each of these, so you can see which one ran out.
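A minimal sketch for reading these headers, assuming you already have the response headers as a plain mapping (for example, from a raw HTTP response). ratelimit_headers and retry_after_seconds are hypothetical helpers; the header values in the example are invented, and the sketch assumes retry-after carries a number of seconds.

```python
from typing import Dict, Mapping


def ratelimit_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Collect every anthropic-ratelimit-* header, with lower-cased names."""
    prefix = "anthropic-ratelimit-"
    return {k.lower(): v for k, v in headers.items() if k.lower().startswith(prefix)}


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Return retry-after in seconds, or `default` if it is missing or not numeric."""
    for key, value in headers.items():
        if key.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return default  # e.g. an HTTP-date value, which this sketch does not parse
    return default


# Invented example headers from a 429 response (values are hypothetical).
example = {
    "Retry-After": "12",
    "anthropic-ratelimit-input-tokens-remaining": "0",
    "anthropic-ratelimit-requests-remaining": "37",
    "content-type": "application/json",
}
print(retry_after_seconds(example))  # 12.0
print(ratelimit_headers(example))
```

Here the input-token budget is at zero while requests remain, which points at ITPM rather than RPM as the limit that was hit.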

Note that for most models, cached input tokens don't count toward ITPM — which is why prompt caching effectively widens your limits (see reducing cost).
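A small worked example of why this matters, assuming (as stated above for most models) that cache reads are excluded from ITPM while uncached input and cache writes still count. The argument names mirror the usage fields the API reports (input_tokens, cache_creation_input_tokens, cache_read_input_tokens); itpm_counted is a hypothetical helper, and the request sizes and the 100,000 ITPM limit are invented numbers.

```python
ITPM_LIMIT = 100_000  # hypothetical per-minute input-token limit, not a real tier value


def itpm_counted(input_tokens: int, cache_creation_input_tokens: int,
                 cache_read_input_tokens: int) -> int:
    """Input tokens charged against ITPM, assuming cache reads are excluded."""
    return input_tokens + cache_creation_input_tokens  # cache_read_input_tokens not counted


# The same hypothetical 50,000-token prompt, first uncached, then mostly served from cache.
uncached = itpm_counted(input_tokens=50_000, cache_creation_input_tokens=0, cache_read_input_tokens=0)
cached = itpm_counted(input_tokens=5_000, cache_creation_input_tokens=0, cache_read_input_tokens=45_000)

print(uncached, cached)                              # 50000 5000
print(ITPM_LIMIT // uncached, ITPM_LIMIT // cached)  # 2 20
```

Under these assumptions the prompt size is unchanged, but ten times as many requests fit in a minute before ITPM is reached.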

429 and 529 are different

429 (rate_limit_error) — your usage exceeded a limit. Slow your request rate and follow retry-after.
529 (overloaded_error) — server-side temporary overload, not your fault. Back off and retry calmly.

They look alike, but the cause differs: a 429 means your own traffic needs to slow down, while a 529 means the service itself is busy. If 529 errors persist, check service health at status.claude.com.
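The distinction can be captured in a small decision helper. retry_action and its return labels are hypothetical; the status groupings follow the guidance in this guide, and codes it does not recognize are left for you to decide explicitly.

```python
def retry_action(status: int) -> str:
    """Map an HTTP status code to a coarse handling decision (a sketch, not exhaustive)."""
    if status == 429:
        return "wait_then_retry"  # you exceeded a limit: slow down and honor retry-after
    if status in (500, 504, 529):
        return "backoff_retry"    # transient server-side trouble: back off and retry
    if 400 <= status < 500:
        return "fix_request"      # 400/401/402/403/404/413...: retrying will not help
    return "unhandled"            # anything else: decide explicitly rather than guess


for code in (429, 529, 401, 413, 503):
    print(code, retry_action(code))
```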

Retry strategy

The core idea is exponential backoff with a bit of jitter, bounded to a max attempt count. For 429, prefer the retry-after value; for 500/504/529, increase the interval and retry. Other 4xx (400/401/403/404…) should be fixed, not retried.

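import random
import time

import anthropic

# max_retries=0 turns off the SDK's built-in retries so this loop is the only retry layer
client = anthropic.Anthropic(max_retries=0)

RETRYABLE_5XX = (500, 504, 529)  # transient server-side errors worth retrying

def call_with_retry(max_attempts=5, **kwargs):
    for attempt in range(max_attempts):
        try:
            return client.messages.create(**kwargs)
        except anthropic.APIStatusError as e:
            if e.status_code != 429 and e.status_code not in RETRYABLE_5XX:
                raise                                   # other 4xx: fix the request, don't retry
            if attempt == max_attempts - 1:
                raise                                   # attempts used up: surface the last error
            delay = 2 ** attempt + random.uniform(0, 1)  # 1s, 2s, 4s... plus jitter
            retry_after = e.response.headers.get("retry-after")
            if e.status_code == 429 and retry_after:
                try:
                    delay = max(delay, float(retry_after))  # honor the server's wait hint
                except ValueError:
                    pass                                # non-numeric value: keep the backoff delay
            time.sleep(delay)
    raise RuntimeError("retries exhausted")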

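The official SDKs already retry some transient errors on their own (the Python SDK, for example, retries connection errors, 408, 409, 429, and 5xx responses twice by default, configurable via max_retries), so a hand-written loop on top multiplies the total number of attempts unless you lower that setting. Avoid unbounded retries and many workers retrying in parallel: both add load exactly when the API is already rate-limited or overloaded.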

Hitting limits less

Ramp gradually — sudden spikes can trip acceleration limits (429), so increase traffic slowly.
Prompt caching — cached input usually doesn't count toward ITPM, raising effective throughput.
Monitor — use the Usage page in the Claude Console to see headroom and peak use.
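For the first point, one way to ramp is to raise your own client-side request budget on a schedule. The sketch below is a hypothetical linear ramp: ramp_rpm is an illustrative helper, the start and target RPM and the 30-minute window are invented numbers rather than Anthropic guidance, and you would enforce the returned budget with whatever throttle you already use.

```python
def ramp_rpm(minute: int, start_rpm: int, target_rpm: int, ramp_minutes: int) -> int:
    """Linearly raise the allowed requests per minute from start_rpm to target_rpm."""
    if minute >= ramp_minutes:
        return target_rpm
    elapsed = max(0, minute)
    # Integer arithmetic keeps the schedule exact and easy to reason about.
    return start_rpm + (target_rpm - start_rpm) * elapsed // ramp_minutes


# Hypothetical plan: go from 50 to 500 RPM over 30 minutes instead of all at once.
for m in (0, 10, 20, 30, 45):
    print(m, ramp_rpm(m, start_rpm=50, target_rpm=500, ramp_minutes=30))
```

A linear schedule is the simplest choice; a stepped or exponential ramp works the same way as long as traffic grows gradually rather than in a single jump.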

For the first call, see getting started with the Claude API; for long answers/timeouts, see streaming; for cost/caching, see reducing cost.

Error codes, limit figures, and headers may change (the docs note that new error type values may be added over time); verify the latest in the official docs (docs.claude.com). This site is not an official Anthropic site.
