How to Reduce Claude Usage: 6 Habits to Hit Limits Less

If you keep hitting Claude limits, this guide explains the mechanism (Claude re-reads the whole chat on each message) and practical token-saving habits: new chats, batching, trimming attachments, and more.

🌐 This article was machine-translated and may contain inaccuracies. Refer to the Korean original if in doubt.

If you keep hitting limits, start with the mechanism — Claude counts tokens, not the number of messages, and it re-reads the entire conversation on every message. So the longer a chat gets, the more each reply costs. Here are practical habits to hit limits less often. (For what the limits are and when they reset, see the usage limits guide.)

6 habits to save usage

  • New task = new chat: carry context as a short summary
  • Batch questions: splitting reloads context each time
  • Trim attachments: PDFs & images cost many tokens
  • Split by task: outline, write, edit in separate chats
  • Fewer regenerations: rewind & edit instead of pushing
  • Lighter model: for simple work (e.g. Haiku)

Core idea: Claude re-reads the whole conversation every message, so longer chats make each reply more expensive.

Why long chats are expensive

Every time Claude generates a reply, it reprocesses all prior messages. Message 1 is nearly free, but at message 30 it re-reads the previous 29 before answering. The same question costs more as the chat grows. Every habit below comes back to one idea — cut the re-processing waste.
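To make the growth concrete, here is a minimal sketch of the arithmetic. It assumes every user message and every reply has a fixed token size (the 100 and 300 below are made-up illustrative numbers, not measured values) and it ignores system prompts, attachments, and any caching. The function names are hypothetical.

```python
def input_tokens_at_turn(turn: int, msg_tokens: int, reply_tokens: int) -> int:
    """Tokens re-read as input on a given 1-based turn.

    Assumes each earlier exchange (message + reply) stays in the history
    and is reprocessed in full, plus the new message itself.
    """
    prior_exchanges = turn - 1
    return prior_exchanges * (msg_tokens + reply_tokens) + msg_tokens


def cumulative_input_tokens(turns: int, msg_tokens: int, reply_tokens: int) -> int:
    """Total input tokens processed across all turns of one chat."""
    return sum(
        input_tokens_at_turn(t, msg_tokens, reply_tokens)
        for t in range(1, turns + 1)
    )


if __name__ == "__main__":
    # Illustrative sizes only: 100-token messages, 300-token replies.
    for turn in (1, 10, 30):
        print(turn, input_tokens_at_turn(turn, msg_tokens=100, reply_tokens=300))
    print("total over 30 turns:", cumulative_input_tokens(30, 100, 300))
```

Under these assumptions the per-turn input grows linearly with the turn number, so the total over a whole chat grows roughly with the square of its length.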

1. New task = new chat

When the topic changes, open a new chat. If you need prior context, ask "summarize what we've covered", copy that summary, and paste it as the first message of a new chat — carrying the context without the heavy history.

2. Batch your questions

Sending three things separately reloads the context three times. Ask them in a single message where you can.
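A rough sketch of why batching helps, under the same simplified model: the existing history is re-read on every message, and each separate question also adds its own answer to the history before the next one. The token sizes are illustrative assumptions, and the function names are hypothetical.

```python
def separate_input_tokens(history: int, question: int, answer: int, n: int) -> int:
    """Input tokens when n questions are sent as n separate messages."""
    total = 0
    context = history
    for _ in range(n):
        total += context + question      # history so far plus the new question
        context += question + answer     # this exchange joins the history
    return total


def batched_input_tokens(history: int, question: int, n: int) -> int:
    """Input tokens when the same n questions go out in a single message."""
    return history + n * question


if __name__ == "__main__":
    # Illustrative sizes only: 2000-token history, 50-token questions,
    # 200-token answers, three questions.
    print("separate:", separate_input_tokens(2000, 50, 200, 3))
    print("batched: ", batched_input_tokens(2000, 50, 3))
```

This counts only input tokens. The answers still have to be generated either way, so the saving comes from not reloading the history for each question.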

3. Trim big attachments

PDFs, images, and screenshots use far more tokens than text (a single image/page is reported to cost on the order of thousands of tokens — exact values vary by content). Upload only what's needed, or pass it as text when possible.

4. Split work by task

Don't cram a big job into one chat. Breaking it into outline → write → edit across separate chats keeps each chat's context light.

5. Fewer regenerations

When a reply misses, pushing forward with "no, I meant…" keeps the missed reply and your correction in the history, so both are re-read on every later message. If the reply went badly off, edit the earlier message and retry from that point instead.
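The difference can be sketched as simple arithmetic. This assumes that editing an earlier message replaces the old branch, so the missed reply is no longer part of the context; the token sizes are illustrative, and the function names are hypothetical.

```python
def push_forward_context(history: int, question: int,
                         missed_reply: int, correction: int) -> int:
    """Context size for the retry when you correct in a follow-up message.

    The original question, the missed reply, and the correction all stay
    in the history.
    """
    return history + question + missed_reply + correction


def edit_and_retry_context(history: int, edited_question: int) -> int:
    """Context size for the retry when you edit the earlier message.

    Assumes the edit replaces the old question and drops the missed reply.
    """
    return history + edited_question


if __name__ == "__main__":
    # Illustrative sizes only.
    print("push forward:", push_forward_context(2000, 50, 300, 40))
    print("edit & retry:", edit_and_retry_context(2000, 60))
```

The extra tokens in the push-forward case are also carried into every message that follows, so the gap widens as the chat continues.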

6. Lighter model for simple work

You don't need the top model for summaries or classification. If you can pick a model in chat, choose a lighter one (e.g. Haiku). See the model comparison.

If you use Claude Code

  • Plan mode (Shift+Tab): review the plan first and cut needless steps to avoid trial-and-error token waste.
  • /compact: summarize the conversation so far to lighten the session; use /clear to drop context that no longer matters.
  • @file references: load content on demand instead of pasting long text.
  • CLAUDE.md: write down repeated explanations so you don't re-explain every time (how to write it).

Even applying one or two of these makes a difference. For the limits themselves, see the usage limits guide.

Disclaimer: The principle that "long chats use more tokens" is true, but specific token figures cited in some sources (e.g. tokens per message) are estimates that vary by content, model, and date. This article focuses on verifiable mechanisms. Limit/pricing policies change — verify in the official docs. This site is not affiliated with Anthropic.
