If you keep hitting limits, start with the mechanism — Claude counts tokens, not the number of messages, and it re-reads the entire conversation on every message. So the longer a chat gets, the more each reply costs. Here are practical habits to hit limits less often. (For what the limits are and when they reset, see the usage limits guide.)
Why long chats are expensive
Every time Claude generates a reply, it reprocesses all prior messages. Message 1 is nearly free, but at message 30 it re-reads the previous 29 before answering. The same question costs more as the chat grows. Every habit below comes back to one idea — cut the re-processing waste.
1. New task = new chat
When the topic changes, open a new chat. If you need prior context, ask "summarize what we've covered", copy that summary, and paste it as the first message of a new chat — carrying the context without the heavy history.
2. Batch your questions
Sending three things separately reloads the context three times. Ask them in a single message where you can.
3. Trim big attachments
PDFs, images, and screenshots use far more tokens than text (a single image/page is reported to cost on the order of thousands of tokens — exact values vary by content). Upload only what's needed, or pass it as text when possible.
4. Split work by task
Don't cram a big job into one chat. Breaking it into outline → write → edit across separate chats keeps each chat's context light.
5. Fewer regenerations
When a reply misses, pushing forward with "no, I meant…" re-reads everything each time. If it went badly off, instead of pushing ahead, edit an earlier message and retry from that point.
6. Lighter model for simple work
You don't need the top model for summaries or classification. If you can pick a model in chat, choose a lighter one (e.g. Haiku). See the model comparison.
If you use Claude Code
- Plan mode (Shift+Tab): review the plan first and cut needless steps to avoid trial-and-error token waste.
- /compact: clear context that no longer matters to lighten the session.
- @file references: load content on demand instead of pasting long text.
- CLAUDE.md: write down repeated explanations so you don't re-explain every time (how to write it).
Even applying one or two of these makes a difference. For the limits themselves, see the usage limits guide.