Claude Opus 4.8 New Features Guide: Fast Mode, Mid-Conversation System Messages, effort

A practical look at what is new in Opus 4.8 — fast mode, mid-conversation system messages, the new effort default, prompt-caching improvements, and adaptive thinking.

Claude Opus 4.8 keeps the same tools and platform features as the previous-generation Opus 4.7, while adding a handful of new capabilities and behaviors. This guide focuses on the changes that are genuinely noticeable or immediately useful in development. For spec comparisons and what changed from 4.7, see the separate "Opus 4.8 complete guide."

🟢 Up to date as of June 2026 · Current lineup: Claude Opus 4.8 / Claude Sonnet 4.6 / Claude Haiku 4.5. This notice updates automatically when a new model ships.
What is new in Opus 4.8Fast modeOutput speedup to 2.5xAPI previewMid-convosystem messageinsertablecache kepteffortdefault highdeep reasoningadjustablePrompt cachemin 1,024tokensshort cachedAdaptivethinkingonly as neededoff by default

1. Fast mode — same model, faster output

Fast mode is offered as a research preview in the Claude API (not a general release). Set speed: "fast" on a request and the same Opus 4.8 model raises output tokens per second by up to 2.5x. In exchange, pricing is premium (higher). It helps when you need a long response quickly, but since cost rises, use it only on calls that truly need it.

2. Mid-conversation system messages — change instructions without breaking the cache

Until now, the system message (base instructions) could only sit at the very start of a conversation. From Opus 4.8 you can also place a role: "system" message after a user turn. As a result, adding new instructions midway through a long conversation preserves the earlier prompt cache, so you continue quickly without paying to recompute it. No beta header is needed. This is especially beneficial for long-running agentic work.

3. effort now defaults to 'high'

effort controls how deeply the model thinks before answering. In Opus 4.8 the default is high across all environments, including the Claude API and Claude Code. That means if you set nothing, it runs in the most thorough reasoning mode. If you want fast, lightweight responses, set effort lower explicitly.

4. Prompt cache minimum length is now 1,024 tokens

Prompt caching saves and reuses repeated leading content to cut cost and time. In Opus 4.7 the minimum cacheable length was longer, so short prompts were not cached. In Opus 4.8 that minimum drops to 1,024 tokens, bringing shorter prompts into caching range with no code changes.

5. Adaptive thinking — think only when needed

Opus 4.8 supports a single 'thinking' mode: adaptive thinking. When enabled (thinking: {type: "adaptive"}), the model decides each turn on its own — answering simple lookups directly and reasoning first on complex, multi-step problems. So it does not spend thinking tokens on easy tasks. Note that unless you explicitly enable it, thinking is off by default. The previous generation's 'extended thinking budget (budget_tokens)' approach is not supported.

Other improvements

According to official documentation, Opus 4.8 improves on 4.7 in long-horizon agentic coding (less compaction and better compaction recovery), reasoning-effort calibration, and fewer skipped tool calls. The stop_details carried in refusal responses is now officially documented, making it easier for applications to handle refusal types distinctly.

Good to know

Among these, fast mode, system messages, effort, and prompt caching mainly apply when you build with the Claude API. If you use Opus 4.8 in regular chat (claude.ai), you will mostly feel improvements in response quality and speed and in the stability of long tasks. Also, like 4.7, Opus 4.8 does not support changing sampling values such as temperature, top_p, or top_k (defaults only).

Related: Claude Model Selection Guide.

Keep reading

Have a question or want to share how you use Claude?

Join the community to share tips with other users, or explore more guides.