Cutting Claude API Costs: Prompt Caching and the Batch API

Two features cut Claude API costs: prompt caching and the Batch API. This guide walks through the official pricing (cache hits at 0.1x base input, about 90% off; batch at 50% off), along with cache_control breakpoints, custom_id, and how the two discounts stack.

The Claude API bills by token usage, so cost design matters for high-volume, repetitive work. Anthropic officially offers two features that cut costs substantially — prompt caching and the Batch API. This guide covers how they work and their official pricing. (As of June 2026. Prices may change, so check the official pricing for the latest.)

[Figure: Two axes of cost savings. Prompt caching: reuse the same repeated context; cache read = 0.1x input (~90% off). Batch API: run non-urgent work asynchronously; 50% off input and output. The two discounts can stack (per official docs).]

API billing basics

Claude API cost is computed per token, with input tokens (your prompt and context) and output tokens (the model's response) billed at different rates. So there are two ways to lower cost: (1) reduce the rate when you send the same input repeatedly (caching), or (2) reduce the overall rate for non-urgent work (batch).

Prompt caching: lower the rate on repeated context

If you send the same system prompt or a long reference document on every request, you can cache that part and reuse it. Adding a cache_control breakpoint to a content block marks the end of the cacheable prefix: everything up to and including that block can be cached. The first request writes to the cache; later requests that send an identical prefix read from it.

// Add a cache_control breakpoint to the message content
{
  "type": "text",
  "text": "Long system prompt / reference document to be reused goes here...",
  "cache_control": { "type": "ephemeral" }
}
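
As a rough sketch of where the breakpoint sits in a full request body, the Python below assembles a Messages-style payload as a plain dictionary, with the reusable context in a system content block and the per-request question after it. The helper name build_cached_request and the sample strings are hypothetical, the model ID is the one used in the batch example in this guide, and nothing is sent over the network.

```python
import json


def build_cached_request(shared_context: str, question: str) -> dict:
    """Assemble a request body whose shared context ends in a cache breakpoint.

    Hypothetical helper for illustration; it only builds a dict.
    """
    return {
        "model": "claude-opus-4-8",  # model ID as used elsewhere in this guide
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": shared_context,
                # Breakpoint: marks the end of the cacheable prefix.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # The part that changes per request goes after the breakpoint.
        "messages": [{"role": "user", "content": question}],
    }


if __name__ == "__main__":
    body = build_cached_request("Long reference document...", "Summarize section 2.")
    print(json.dumps(body, indent=2))
```

Keeping the stable text first and the variable text last is the design point here: only an identical prefix can be reused, so anything that changes per request belongs after the breakpoint.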

By the official pricing table (Claude Opus 4.8, per MTok), the rates break down as follows.

[Figure: Claude Opus 4.8 input token rates per MTok. Source: official Claude pricing (docs.claude.com), as of June 2026; prices may change.]
  • Base input: $5 / MTok
  • 5-minute cache write: $6.25 / MTok (1.25x base)
  • 1-hour cache write: $10 / MTok (2x base)
  • Cache hit (read): $0.50 / MTok (0.1x base, about 90% off)

So caching costs slightly more on the first write, but pays off the more times you read the same context. Each model has a minimum token requirement for a cache breakpoint; below it, caching may not apply — check the official prompt caching docs for exact conditions.
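
To make the break-even point concrete, here is a small sketch that compares the input cost of a shared context with and without the 5-minute cache, using the Opus 4.8 rates listed above. The function names are hypothetical, and the model is simplified: it assumes one cache write followed by a hit on every later request, and it ignores the per-request variable input and all output tokens.

```python
# Opus 4.8 input rates in USD per million tokens, copied from the list above.
BASE_INPUT = 5.00
CACHE_WRITE_5M = 6.25
CACHE_READ = 0.50


def cost_without_cache(context_tokens: int, requests: int) -> float:
    """Every request pays the base input rate for the shared context."""
    return requests * context_tokens / 1_000_000 * BASE_INPUT


def cost_with_cache(context_tokens: int, requests: int) -> float:
    """First request writes the cache; the rest read it.

    ASSUMPTION: every request after the first is a cache hit.
    """
    if requests <= 0:
        return 0.0
    mtok = context_tokens / 1_000_000
    return mtok * CACHE_WRITE_5M + (requests - 1) * mtok * CACHE_READ


if __name__ == "__main__":
    for n in (1, 2, 10):
        plain = cost_without_cache(100_000, n)
        cached = cost_with_cache(100_000, n)
        print(f"{n} request(s): uncached ${plain:.3f}, cached ${cached:.3f}")
```

Under these assumptions, a single request is more expensive with caching ($6.25 versus $5 per MTok), but by the second request the cached path is already cheaper ($6.75 versus $10 per MTok). For a 100,000-token context sent 10 times, the estimate is $5.00 uncached versus roughly $1.08 cached.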

Batch API: 50% off the overall rate

If the task isn't something a user is waiting on right now — bulk document processing, data classification, mass content generation — the Batch API fits. Bundle requests and submit them asynchronously, and both input and output tokens are charged at 50% of standard prices (officially stated, with no minimum-volume threshold).

// A batch is a list of requests (each with a custom_id and params)
{
  "custom_id": "req-1",
  "params": {
    "model": "claude-opus-4-8",
    "max_tokens": 1024,
    "messages": [{ "role": "user", "content": "..." }]
  }
}

Each request includes a custom_id (1–64 chars, alphanumeric/hyphen/underscore) to identify the result and a params object with standard Messages parameters. Batches are processed asynchronously and concurrently, and you retrieve results later.
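
One way to prepare such a list is sketched below: a helper that turns a mapping of IDs to prompts into batch entries and rejects any custom_id that breaks the constraint described above. The name build_batch_requests and the regular expression are illustrative assumptions, not part of any official SDK, and the helper does not submit anything.

```python
import re

# Constraint from the text above: 1-64 chars, alphanumeric, hyphen, or underscore.
CUSTOM_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_batch_requests(prompts: dict[str, str], model: str = "claude-opus-4-8") -> list[dict]:
    """Turn {custom_id: prompt} into a list of batch request entries.

    Hypothetical helper; it validates IDs locally and builds plain dicts.
    """
    requests = []
    for custom_id, prompt in prompts.items():
        if not CUSTOM_ID_PATTERN.fullmatch(custom_id):
            raise ValueError(f"invalid custom_id: {custom_id!r}")
        requests.append(
            {
                "custom_id": custom_id,
                "params": {
                    "model": model,
                    "max_tokens": 1024,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
        )
    return requests


if __name__ == "__main__":
    batch = build_batch_requests({"req-1": "Classify this ticket.", "req-2": "Summarize this memo."})
    print([entry["custom_id"] for entry in batch])
```

Using a dictionary keyed by custom_id keeps the IDs unique within the batch, which matters because results are matched back to requests by that ID rather than by position.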

The two stack

The official docs note that prompt caching and Message Batch discounts can stack. For high-volume work that shares context across requests, you can take 50% off overall with batch and further reduce repeated context with caching. Since batches can take longer than 5 minutes, using the 1-hour cache is recommended for better hit rates.
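
For a rough sense of what stacking could look like, the sketch below estimates the input cost of a shared context under four combinations: neither discount, batch only, caching only, and both. It rests on assumptions this guide does not spell out: that the 50% batch discount multiplies on top of the cache write and read rates, that the 1-hour cache is written once, and that every other request in the batch is a hit. Real hit rates inside a batch may be lower, so treat the numbers as a best case and confirm the exact billing rules in the official docs.

```python
# Opus 4.8 input rates in USD per million tokens, from the pricing list in this guide.
BASE_INPUT = 5.00
CACHE_WRITE_1H = 10.00
CACHE_READ = 0.50
BATCH_MULTIPLIER = 0.5  # batch is billed at 50% of standard prices


def shared_context_cost(context_tokens: int, requests: int, *, batch: bool, cache: bool) -> float:
    """Estimate the input cost of the shared context across `requests` calls.

    ASSUMPTIONS: one 1-hour cache write followed by a hit on every other
    request, and the batch multiplier applied on top of the cache rates.
    """
    if requests <= 0:
        return 0.0
    mtok = context_tokens / 1_000_000
    if cache:
        cost = mtok * CACHE_WRITE_1H + (requests - 1) * mtok * CACHE_READ
    else:
        cost = requests * mtok * BASE_INPUT
    return cost * BATCH_MULTIPLIER if batch else cost


if __name__ == "__main__":
    for batch in (False, True):
        for cache in (False, True):
            cost = shared_context_cost(100_000, 1000, batch=batch, cache=cache)
            print(f"batch={batch}, cache={cache}: ${cost:.2f}")
```

With a 100,000-token shared context and 1,000 requests, this model gives $500 with neither discount, $250 with batch only, $50.95 with caching only, and roughly $25.50 with both. Output tokens and per-request variable input are left out; batch would halve those as well.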

When to use which

  • Sending the same system prompt/document repeatedly → prompt caching
  • High-volume work with no need for an immediate response → Batch API
  • Both (shared context + non-real-time volume) → use them together
  • Other tips: pick the right model for the task (Haiku for simple, Sonnet for most production, Opus for the most complex reasoning), and monitor usage.

Summary

Key points: (1) the API bills per token with different input/output rates; (2) prompt caching drops cache reads of repeated context to 0.1x base input (~90% off); (3) batch is 50% off both input and output; (4) the two can stack. For exact, current prices and conditions, see the official pricing and the batch/caching docs.

The figures here are taken directly from the official Claude pricing table (docs.claude.com, as of June 2026, Opus 4.8). Prices and policies may change, so always verify the official docs before estimating real costs.
