Cutting Claude API Costs: Prompt Caching and the Batch API

The Claude API bills by token usage, so cost design matters for high-volume, repetitive work. Anthropic officially offers two features that cut costs substantially — prompt caching and the Batch API. This guide covers how they work and their official pricing. (As of June 2026. Prices may change, so check the official pricing for the latest.)

API billing basics

Claude API cost is computed per token, with input tokens (your prompt and context) and output tokens (the model's response) billed at different rates. So there are two ways to lower cost: (1) reduce the rate when you send the same input repeatedly (caching), or (2) reduce the overall rate for non-urgent work (batch).
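As a rough illustration of the billing model above, the per-request arithmetic can be sketched as follows. The function name and the output rate used in the example are placeholders chosen for illustration, not figures from the pricing table; only the $5 / MTok input rate appears later in this guide.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_mtok: float,
    output_rate_per_mtok: float,
) -> float:
    """Cost of one request. Rates are in USD per million tokens (MTok)."""
    total = input_tokens * input_rate_per_mtok + output_tokens * output_rate_per_mtok
    return total / 1_000_000


# 5.0 is the Opus 4.8 base input rate quoted in this guide.
# PLACEHOLDER_OUTPUT_RATE is hypothetical; look up the real output rate in the pricing table.
PLACEHOLDER_OUTPUT_RATE = 0.0
input_only_cost = request_cost_usd(10_000, 500, 5.0, PLACEHOLDER_OUTPUT_RATE)
```

Because the two rates are separate parameters, the same helper can be reused to compare a cached or batched input rate against the base rate.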

Prompt caching: lower the rate on repeated context

If you send the same system prompt or a long reference document on every request, you can cache that part and reuse it. Adding a cache_control breakpoint to a content block makes the prompt prefix up to and including that block cacheable. The first request writes to the cache; later requests with an identical prefix read from it.

// Add a cache_control breakpoint to the message content
{
  "type": "text",
  "text": "Long system prompt / reference document to reuse across requests...",
  "cache_control": { "type": "ephemeral" }
}
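For context, here is a minimal sketch of where such a block might sit in a full request body. It only builds the payload as a plain dictionary and does not call the API. The helper name build_cached_request is hypothetical, and placing the cached block in the system field is one possible arrangement assumed for this example.

```python
import json


def build_cached_request(
    model: str, shared_context: str, question: str, max_tokens: int = 1024
) -> dict:
    """Build a Messages-style request body whose shared context carries a cache breakpoint."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # The reusable context comes first so it forms a stable, cacheable prefix.
        "system": [
            {
                "type": "text",
                "text": shared_context,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Only this part changes from request to request.
        "messages": [{"role": "user", "content": question}],
    }


payload = build_cached_request(
    "claude-opus-4-8", "<long reference document>", "Summarize section 2."
)
body = json.dumps(payload)  # ready to send with an HTTP client or SDK of your choice
```

Keeping the variable part (the user question) after the breakpoint is what lets later requests reuse the same cached prefix.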

Per the official pricing table (Claude Opus 4.8; rates per million tokens, MTok), the rates break down as follows.

Base input: $5 / MTok
5-minute cache write: $6.25 / MTok (1.25x base)
1-hour cache write: $10 / MTok (2x base)
Cache hit (read): $0.50 / MTok (0.1x base, 90% off)

So caching costs slightly more on the first write, but pays off the more times you read the same context. Each model has a minimum token requirement for a cache breakpoint; below it, caching may not apply — check the official prompt caching docs for exact conditions.
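To make the break-even point concrete, the sketch below compares sending the same context uncached against writing it once and reading it afterwards, using the Opus 4.8 rates listed above. It assumes every request after the first is a cache hit (that is, it arrives before the cache entry expires) and counts only the cached portion of the input; the function names are illustrative.

```python
BASE_INPUT = 5.00       # USD per MTok
CACHE_WRITE_5M = 6.25   # USD per MTok, 5-minute cache write
CACHE_WRITE_1H = 10.00  # USD per MTok, 1-hour cache write
CACHE_READ = 0.50       # USD per MTok, cache hit


def uncached_cost(tokens: int, requests: int) -> float:
    """Cost of sending the same context at the base rate on every request."""
    return requests * tokens * BASE_INPUT / 1_000_000


def cached_cost(tokens: int, requests: int, write_rate: float) -> float:
    """Cost of one cache write followed by cache reads on all later requests."""
    if requests <= 0:
        return 0.0
    reads = requests - 1
    return tokens * (write_rate + reads * CACHE_READ) / 1_000_000


def break_even_requests(write_rate: float) -> int:
    """Smallest request count at which caching is strictly cheaper than not caching."""
    n = 1
    while write_rate + (n - 1) * CACHE_READ >= n * BASE_INPUT:
        n += 1
    return n


print(break_even_requests(CACHE_WRITE_5M))  # 2
print(break_even_requests(CACHE_WRITE_1H))  # 3
```

Under these assumptions, a 100,000-token context sent 10 times costs $5.00 uncached and about $1.08 with the 5-minute cache. The 5-minute cache already wins on the second request, while the 1-hour cache needs a third.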

Batch API: 50% off the overall rate

If the task isn't something a user is waiting on right now — bulk document processing, data classification, mass content generation — the Batch API fits. Bundle requests and submit them asynchronously, and both input and output tokens are charged at 50% of standard prices (officially stated, with no minimum-volume threshold).

// A batch is a list of requests (each with a custom_id + params)
{
  "custom_id": "req-1",
  "params": {
    "model": "claude-opus-4-8",
    "max_tokens": 1024,
    "messages": [{ "role": "user", "content": "..." }]
  }
}

Each request includes a custom_id (1–64 chars, alphanumeric/hyphen/underscore) to identify the result and a params object with standard Messages parameters. Batches are processed asynchronously and concurrently, and you retrieve results later.
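A small helper along these lines could assemble the request list and reject custom_id values that break the stated constraints before anything is submitted. This is a sketch only: build_batch_requests and the sample prompts are hypothetical, and the code prepares the list without calling the API.

```python
import re

# 1-64 characters: letters, digits, hyphen, underscore (per the constraint above).
CUSTOM_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_batch_requests(
    prompts: dict[str, str], model: str = "claude-opus-4-8", max_tokens: int = 1024
) -> list[dict]:
    """Turn {custom_id: prompt} into a list of batch request entries."""
    requests = []
    for custom_id, prompt in prompts.items():
        if not CUSTOM_ID_PATTERN.fullmatch(custom_id):
            raise ValueError(f"invalid custom_id: {custom_id!r}")
        requests.append(
            {
                "custom_id": custom_id,
                "params": {
                    "model": model,
                    "max_tokens": max_tokens,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
        )
    return requests


batch = build_batch_requests(
    {"doc-001": "Classify this ticket: ...", "doc-002": "Classify this ticket: ..."}
)
```

Using a dictionary keyed by custom_id keeps the IDs unique within the batch, which makes it straightforward to match results back to inputs later.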

The two stack

The official docs note that prompt caching and Message Batch discounts can stack. For high-volume work that shares context across requests, you can take 50% off overall with batch and further reduce repeated context with caching. Since batches can take longer than 5 minutes, using the 1-hour cache is recommended for better hit rates.
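As a rough estimate of what stacking could look like, the sketch below assumes the two discounts combine multiplicatively, so a cache multiplier is applied first and the 50% batch discount is applied on top. That combination rule is an assumption made for this example; confirm the effective rates against the official pricing page before relying on them.

```python
def stacked_input_rate(
    base_rate: float, cache_multiplier: float, batch_discount: float = 0.5
) -> float:
    """Effective USD-per-MTok input rate, ASSUMING cache and batch discounts multiply."""
    return base_rate * cache_multiplier * (1 - batch_discount)


BASE_INPUT = 5.00  # Opus 4.8 base input rate quoted in this guide

batch_cache_read = stacked_input_rate(BASE_INPUT, 0.1)      # cache hit inside a batch
batch_1h_write = stacked_input_rate(BASE_INPUT, 2.0)        # 1-hour cache write inside a batch
batch_uncached = stacked_input_rate(BASE_INPUT, 1.0)        # plain batch input, no caching
```

Under that assumption, a cache hit inside a batch would cost $0.25 / MTok, compared with $2.50 / MTok for uncached batch input.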

When to use which

Sending the same system prompt/document repeatedly → prompt caching
High-volume work with no need for an immediate response → Batch API
Both (shared context + non-real-time volume) → use them together
Other tips: pick the right model for the task (Haiku for simple tasks, Sonnet for most production work, Opus for the most complex reasoning), and monitor usage.
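The decision rules above can be written down as a small function, which may be handy as a checklist in a cost-review script. The function name and return strings are illustrative only.

```python
def recommend_cost_feature(repeated_context: bool, needs_immediate_response: bool) -> str:
    """Map the two questions from the list above to a cost-saving feature."""
    if repeated_context and not needs_immediate_response:
        return "prompt caching + Batch API"
    if repeated_context:
        return "prompt caching"
    if not needs_immediate_response:
        return "Batch API"
    return "standard requests"
```

For example, a nightly job that classifies thousands of tickets against one shared rubric has repeated context and no need for an immediate response, so it falls into the combined case.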

Summary

Key points: (1) the API bills per token with different input/output rates; (2) prompt caching drops cache reads of repeated context to 0.1x base input (~90% off); (3) batch is 50% off both input and output; (4) the two can stack. For exact, current prices and conditions, see the official pricing and the batch/caching docs.

The figures here are taken directly from the official Claude pricing table (docs.claude.com, as of June 2026, Opus 4.8). Prices and policies may change, so always verify the official docs before estimating real costs.
