Handling Claude API errors and rate limits — 429, 529, and retries

Core to production reliability. The main error codes, 429 (limit) vs 529 (server overload), exponential-backoff retries, and how to hit limits less — per the official docs.

Running the Claude API in production, you'll hit rate limits (429) and temporary server overload (529). Reliable apps anticipate these and retry sensibly. This guide covers the main error codes, the 429-vs-529 difference, and a retry strategy, grounded in the official docs. (Official: docs.claude.com/errors · as of June 2026)

Errors fall into three groupsFix your request (4xx)400 bad format/content401 auth (key) issue402 billing issue403 no permission404 not found413 request too largeOver limit (429)rate_limit_errorwait retry-after,then retryServer side (5xx, 529)500 internal error504 timeout529 overloaded→ retry with backoffCodes/type values may grow — per official errors docs · as of 2026.6

Main HTTP error codes

Per the official docs, the API follows a predictable HTTP error-code format.

  • 400 invalid_request_error — request format/content issue (may also cover other 4XX)
  • 401 authentication_error — API key issue
  • 402 billing_error — billing/payment issue
  • 403 permission_error — key lacks permission for the resource
  • 404 not_found_error — resource not found
  • 413 request_too_large — request exceeds the size limit (e.g. Messages API 32MB)
  • 429 rate_limit_error — rate limit exceeded
  • 500 api_error — unexpected internal error at Anthropic
  • 504 timeout_error — timed out while processing (use streaming for long requests)
  • 529 overloaded_error — API temporarily overloaded

Errors are always JSON, with an error object holding type and message, plus a request_id for tracking. Note that streaming (SSE) can error after a 200 response, so it may not follow the standard mechanism.

What a 429 rate limit is

Messages API limits are measured in requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM), per model class. Exceed a limit and you get 429 with a retry-after header telling you how long to wait. The anthropic-ratelimit-* headers indicate which limit triggered it.

Note that for most models, cached input tokens don't count toward ITPM — which is why prompt caching effectively widens your limits (see reducing cost).

429 and 529 are different

  • 429 (rate_limit_error)your usage exceeded a limit. Slow your request rate and follow retry-after.
  • 529 (overloaded_error)server-side temporary overload, not your fault. Back off and retry calmly.

They look similar but the diagnosis differs. If it persists, check service health at status.claude.com.

How to retryRequestif it fails?429prefer retry-after500/504/529exp. backoff + jitterother 4xxfix request, don\u2019t retryRetry, boundedup to a max countOfficial SDKs auto-retry some cases — avoid infinite/parallel retries · as of 2026.6

Retry strategy

The core idea is exponential backoff with a bit of jitter, bounded to a max attempt count. For 429, prefer the retry-after value; for 500/504/529, increase the interval and retry. Other 4xx (400/401/403/404…) should be fixed, not retried.

import anthropic, time
client = anthropic.Anthropic()

def call_with_retry(**kwargs):
    for attempt in range(5):  # up to 5 tries
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError:        # 429
            time.sleep(2 ** attempt)            # 1s,2s,4s... (prefer retry-after if present)
        except anthropic.APIStatusError as e:   # 500/504/529 etc.
            if e.status_code in (500, 504, 529):
                time.sleep(2 ** attempt)
            else:
                raise                            # other 4xx must be fixed
    raise RuntimeError("retries exhausted")

The official SDKs auto-retry some transient errors. Avoid infinite or parallel-flood retries — they make things worse.

Hitting limits less

  • Ramp gradually — sudden spikes can trip acceleration limits (429), so increase traffic slowly.
  • Prompt caching — cached input usually doesn't count toward ITPM, raising effective throughput.
  • Monitor — use the Usage page in the Claude Console to see headroom and peak use.

For the first call, see getting started with the Claude API; for long answers/timeouts, see streaming; for cost/caching, see reducing cost.

Error codes, limit figures, and headers may change (officially, type values may grow); verify the latest in the official docs (docs.claude.com). This site is not an official Anthropic site.

Keep reading

Have a question or want to share how you use Claude?

Join the community to share tips with other users, or explore more guides.