← Back to Blog

Claude Code Rate Limits and How to Stop Hitting Them (2026).

Claude Code rate limits, how to check your usage, the 5-hour and weekly caps explained, plus 9 ways to cut token usage by 80% without changing what you ship.

Tom CrawshawBy Tom Crawshaw·

Claude Code rate limits are usage caps that throttle or block your session when you cross either a 5-hour rolling window or a weekly threshold on your plan. The fastest way to stop hitting them is to switch your default model to Sonnet, compact every 20-30 messages, scope subagents through plan mode, and install the token-optimizer skill.

A simple gauge with the needle nearly touching the orange upper-limit zone
The gauge that decides when your session pauses. The trick is to stop dancing on the red zone.

I'm Tom. I run Claude Code 4-8 hours a day across multiple projects and I rarely hit a limit. This post is what I actually do.

What are the Claude Code rate limits?.

Claude Code rate limits are tied to your Claude plan, not to Claude Code as a separate product. The free tier gives you a small daily allotment of messages. Pro gives you roughly 5x the free tier in a rolling 5-hour window. Max gives you 20x. Every plan also has a weekly ceiling that resets on a fixed day, and Opus burns through that ceiling faster than Sonnet.

You will hit one of three caps in practice. The 5-hour rolling cap throttles you mid-session when you push hard on Opus. The weekly cap stops the bleeding for the rest of the week if you spent all five days on long agentic loops. The output cap on individual responses can clip a long file rewrite without warning.

The full pricing breakdown is on the Claude Code pricing post.

How do I check my Claude Code usage?.

Three checks, in order:

  1. Inside Claude Code itself. Run /cost in the CLI. You get the current session's token spend, model mix, and approximate dollar equivalent. Fastest signal when you suspect a session has gone heavy.
  2. The Claude desktop app. This is the one most people miss. Almost everyone running Claude Code is on a Pro or Max subscription, not the API. So open the Claude desktop app, go to Settings, and you can see your usage against your plan's 5-hour and weekly caps right there. This is the canonical source for subscription users, and it is the check I run first.
  3. The token-optimizer dashboard. If you have token-optimizer installed, /token-optimizer opens an interactive dashboard with cost per session, model mix, cache hit rate, and subagent spending. More detail than the app, scoped to your local sessions.

The Anthropic console at console.anthropic.com is the fourth check, but only if you run Claude Code against your own API key. Open Usage there for per-day token consumption by model. If you are on the standard Pro or Max plan, the console will not show your Claude Code usage, the desktop app will.

For serious users who want their own tracker: pipe /cost output to a log file via a Stop hook and roll a weekly rollup. Five lines of bash. The Claude Code hooks post covers the hook setup.

A concept diagram of the four caps that pause a Claude Code session: the 5-hour rolling window, the weekly cap, the per-response output cap, and the Opus ceiling
Four caps run alongside every session. The one that hits first is the one that pauses you.

Why did Claude Code say "limit reached"?.

Four causes, in order of frequency:

  1. You hit the 5-hour rolling cap. Long Opus sessions are the usual culprit. The cap resets automatically 5 hours after your first message in the current window, not on the hour.
  2. You hit the weekly cap. Resets on the day of the week tied to your billing cycle. The console shows the exact reset time. Switching to Sonnet temporarily extends your runway but does not reset the cap.
  3. Output cap on a single response. Claude Code defaults to a generous output cap, but you can hit it on long file generation or massive refactor diffs. The fix is to break the task into smaller asks.
  4. A model-specific cap. Opus has its own ceiling distinct from Sonnet on Max plans. You can run out of Opus while Sonnet is still available. Switch with /model sonnet to keep working.

Two of these caps can be raised without changing plans. The other two require either a plan upgrade or an architectural change to how you use the tool. The next sections cover both paths.

A comparison panel of Pro versus Max showing how each tier handles caps and bursts
Pro versus Max in one panel. The decision is daily volume, not features.

How do I extend Claude Code rate limits?.

First, the honest framing. Most people reading this do not want to leave their Claude Code subscription. You are paying a flat monthly rate and you want to get more out of it, not start paying per token again. So treat the first two levers as the exception. The real win for almost everyone is the efficiency section right after this one.

You have three levers, in escalating cost order:

Lever 1: Upgrade the Claude plan. Pro to Max gets you roughly 4x more usage. Max to Team or Enterprise adds another step up plus admin features. If you bill Claude Code as a business expense and use it 4+ hours a day, the Max plan pays for itself in saved context-switching.

Lever 2: Move heavy workloads to the Claude API. This is the exception, not the answer most people want, so I am putting the caveat first. You would be leaving the flat-rate plan you already pay for and switching to metered per-token billing. It only makes sense for unattended production agent loops that run overnight on their own. For everyday coding on a Pro or Max plan, do not do this. The efficiency moves below get you further on the subscription you already have. If you genuinely need it, you run Claude Code against your own Anthropic API key and the setup is one environment variable.

Lever 3: Route specific workloads to a cheaper backend. Pair Claude Code with free-claude-code or Crush for the parts of your work that do not need Sonnet 4.6 quality. Lint fixes, doc generation, file renames, basic refactors. Route them to a local model or to OpenRouter. Keep Claude Code on the actual thinking.

If none of those work for you, the better answer is to use the plan you have more efficiently. That is the next 80% of this post.

How do I reduce Claude Code token usage?.

Nine moves, ranked by how much they actually move the needle.

1. Default to Sonnet, escalate to Opus.

This is the single biggest lever. Sonnet 4.6 handles 80% of real work at roughly a fifth of the cost of Opus 4.7 on Max plans. Set Sonnet as your default model in your CLAUDE.md or via /model sonnet. Reserve Opus for multi-file architecture decisions, hard debugging, and irreversible production work. When the hard part is done, drop back to Sonnet.

I run a one-line rule in my CLAUDE.md that says: "Default to Sonnet. Ask before escalating to Opus." It cut my Opus usage by 70% in the first week.

One caveat worth knowing before you hardcode it. If you pin Sonnet as your default model, you lose auto mode, where Claude Code picks the model for you and automatically steps down to a cheaper one as you approach your limit. Auto mode is genuinely useful and giving it up stings. So the real trade-off is this. Pin Sonnet when you want hard, predictable cost control. Leave it on auto when you want the routing handled for you. I pin Sonnet on long agentic sessions and let auto run the day-to-day.

2. Compact every 20-30 messages.

Claude Code accumulates context as you go. Every file you read, every tool output, every prior message stays in the working set until you compact. After 30-40 messages, you are paying for 50,000-80,000 tokens of conversation that no longer serves the task.

Run /compact aggressively. Twice an hour on a long session. Once after any large file read. The first run is the painful one because you watch your context drop. After that you forget you are doing it.

3. Use /clear between unrelated tasks.

/compact keeps the gist. /clear wipes the session. If you are switching from "fix the auth bug" to "draft a marketing email," you want a clean slate. The new task does not need 80,000 tokens of debugging context attached to it.

I run /clear 4-6 times a day. Anything more than a 10-minute context switch gets a fresh session.

One habit that makes this safe on important work: write a handoff file before you compact or clear. A short HANDOFF.md covering what you were doing, what is already done, and where the next session should pick up. Then clear without fear, because the next session reads the handoff and loses nothing. I do this before every clear on a live build, and it is the single reason I can clear so aggressively without dropping the thread.

4. Install the token-optimizer skill.

The token-optimizer skill by Alex Greensh is the single best Claude Code optimization tool I have installed. It runs as an external process so it costs zero context tokens of its own.

What it actually does:

Install with /plugin marketplace add alexgreensh/token-optimizer, then run /token-optimizer for an interactive audit. It will tell you exactly what to cut.

5. Tighten your CLAUDE.md.

Every CLAUDE.md byte loads on every session. A 200-line CLAUDE.md across 50 projects a week adds up. My rule: if a CLAUDE.md line is not actively shaping behaviour, it gets cut.

Two specific patterns to delete on sight:

The Claude Code memory post goes deeper on writing a tight CLAUDE.md.

6. Scope subagents through plan mode first.

Three parallel subagents each running a 30-step plan can eat an hour of usage in 10 minutes. The fix is to use plan mode first, scope the work, then dispatch.

The pattern I run:

  1. Switch to plan mode with Shift+Tab.
  2. Get Claude to write the plan.
  3. Review the plan, kill any steps that are speculative.
  4. Exit plan mode and dispatch the pruned plan to subagents.

Plan mode is roughly free because it does not edit files. The cost of the dispatched work is whatever the work would have cost anyway, but without the speculation tax.

7. Use Haiku for batch tasks.

Haiku 4.5 is roughly 4x cheaper than Sonnet and roughly fast enough for things like file renames, doc cleanups, lint fixes, and migration scripts. Set it as the model for batch subagent runs.

The pattern: main session on Sonnet, dispatched batch work to subagents running Haiku.

8. Cap output explicitly.

Add a line to your CLAUDE.md that says: "Keep responses under 2,000 tokens unless I ask for the full version." Claude obeys it. You stop getting 4,000-token essay answers to two-line questions.

9. Look at what other people are running.

A few other optimization tools worth checking, ranked by what I have either used or seen working in production:

Pick one, not all. The token-optimizer skill plus the native moves above will cover 80% of the gap for most users.

Where to start if you only do three things.

The 80/20 of this post, in order:

  1. Set Sonnet as your default model. One line in your CLAUDE.md. Cuts Opus burn by 60-70% the first week.
  2. Install the token-optimizer skill. /plugin marketplace add alexgreensh/token-optimizer, then run /token-optimizer. Removes the structural waste that nobody catches manually.
  3. Compact and clear ruthlessly. /compact twice an hour during long sessions. /clear between unrelated tasks. Free habit, biggest single behavioural change.

Everything else in this post is incremental on top of those three.

What happens when you hit the Claude Code weekly limit?.

You get a polite message saying you have hit your weekly cap and can either upgrade your plan or wait for the reset. The 5-hour cap keeps working independently, so you can still run short sessions, you just cannot run long ones.

Two things to know:

  1. The reset time is shown in your console and is tied to your billing cycle, not to a calendar week. If you signed up on a Wednesday, your week resets every Wednesday.
  2. Switching to the Claude API with your own key keeps you working immediately. Pay per token, no plan cap. Useful as an overflow buffer.

If you are hitting the weekly cap regularly on Max, the optimization moves above will pull you back into the cap on most weeks. If they do not, you are doing genuine production work and the Enterprise tier or API billing is the right answer.

Claude Code rate limits FAQ.

What is the Claude Code 5-hour limit?

The Claude Code 5-hour limit is a rolling usage cap that resets exactly 5 hours after your first message in the current window. It is your plan's standard rate limit and the most common cap you hit during a normal workday. The cap is shared across Claude.ai and Claude Code, so heavy use in either counts towards it.

What is the Claude Code weekly limit?

The Claude Code weekly limit is a 7-day usage ceiling that resets on the day of the week tied to your billing cycle. Hitting it stops your sessions for the rest of the week unless you upgrade or move to API billing. Opus burns through the weekly limit roughly 5x faster than Sonnet on Max plans.

How do I check my Claude Code usage?

Run /cost inside Claude Code for the current session, or sign into console.anthropic.com and open Usage for the canonical daily and weekly breakdown. For deeper insight, install the token-optimizer skill and run /token-optimizer for a dashboard with cost per session, model mix, and cache hit rate.

How do I extend Claude Code rate limits?

You have three paths. Upgrade the Claude plan (Pro to Max gives roughly 4x more usage). Move heavy workloads to the Claude API and pay per token, which bypasses the plan cap. Or route specific workloads to a cheaper backend through Crush or free-claude-code. The third option is the cheapest if you are willing to set it up.

Why is Claude Code running slow?

Slow Claude Code is usually one of three things. The session is bloated with old context (fix with /compact or /clear). The current model is overloaded by Anthropic (check the Is Claude Down page). Or your CLAUDE.md is loading too much on every message (tighten it).

Can I share my Claude Code usage across team members?

Yes, on Team and Enterprise plans. Each seat gets its own rate limit, with admin visibility into team-wide consumption. On Pro and Max, usage is per-account and not shareable.

What is the cheapest way to run Claude Code?

The cheapest setup is Claude Code on Pro with Sonnet as the default, token-optimizer installed, and overflow routed to OpenRouter or a local model through Crush. That gets most operators below $25 a month for daily heavy use.

Ready to cut your Claude Code usage in half?.

The five moves that matter most are in this post. The Blueprint walks you through the full setup, including the CLAUDE.md template I use, the token-optimizer install, and the model routing pattern. Free, 60 minutes, no coding required.

Free · 60 Minutes · No coding required

The Claude Code Blueprint.

Five interactive lessons. Install Claude Code, build your first automation, and deploy it live on the internet — all in under an hour. Free, no coding required.

Grab the Blueprint