Considering all the problems they've been having with over-charging Claude Code users over the past few weeks, it's the very least they could do. Max subscribers are hitting their 5-hour usage limits in 30-40 minutes with a single instance doing light work, while Anthropic have no support or contact channel for users that they actually respond to.
This hasn't been my experience either. I personally find the max plan is very generous for day-to-day usage. And I don't even use compact manually.
However, when I tried out the SuperPower skill and had multiple agents working on several projects at the same time, it did hit the 5-hour usage limit. But SuperPower hasn't been very useful for me and wastes a lot of tokens. When you want to trade longer running time for high token consumption, you only get a marginal increase in performance.
So people, if you are finding yourself using up tokens too quickly, you probably want to check your skills or MCPs etc.
It's known that Anthropic's $20 Pro subscription is a gateway plan to their $100 Max subscription, since you'll easily burn your token rate on a single prompt or two. Meanwhile, I've had ample usage testing out Codex on the basic $20 ChatGPT Plus plan without a problem.
As for Anthropic's $100 Max subscription, it's almost always better to start new sessions for tasks since a long conversation will burn your 5-hour usage limit with just a few prompts (assuming they read many files). It's also best to start planning first with Claude, providing line numbers and exact file paths prior, and drilling down the requirements before you start any implementation.
> It's known that Anthropic's $20 Pro subscription is a gateway plan to their $100 Max subscription, since you'll easily burn your token rate on a single prompt or two.
I genuinely have no idea what people mean when I read this kind of thing. Are you abusing the word "prompt" to mean "conversation"? Or are you providing a huge prompt that is meant to spawn 10 subagents and write multiple new full-stack features in one go?
For most users, the $20 Pro subscription, when used with Opus, does not hit the 5-hour limit on "a single prompt or two", i.e. 1-2 user messages.
Today I literally gave Claude a single prompt, asking it to make a plan to implement a relatively simple feature that spanned a couple different codebases. It churned for a long time, I asked a couple very simple follow-up questions, and then I was out of tokens. I do not consider myself to be any kind of power user at all.
The only time I've ever seen this happen is when you give it a massive codebase, with no meaningful CLAUDE.md to help make sense of it and no explicit @-mentions of files/folders to guide it, and then ask it for something hugely cross-cutting.
> spanned a couple different codebases
There you go.
If you're looking to prevent this issue I really recommend you set up a number of AGENTS.md files, at least top-level and potentially nested ones for huge, sprawling subfolders. As well as @ mentioning the most relevant 2-3 things, even if it's folder level rather than file.
Not just for Claude, it greatly increases speed and reduces context rot for any model if they have to search less and more quickly understand where things live and how they work together.
I have a tool that scans all code files in a repo and prints the symbols (AST based), it makes orienting around easy, it can be scoped to a file or folder.
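For anyone wanting to try the idea, here is a minimal sketch of an AST-based symbol lister. It is not the tool described above: it assumes Python sources only, uses the standard-library `ast` module, and the names `list_symbols` and `scan` are made up for illustration.

```python
import ast
import sys
from pathlib import Path


def list_symbols(source: str) -> list[str]:
    """Return one entry per top-level function, class, and class method."""
    tree = ast.parse(source)
    symbols = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(f"def {node.name} (line {node.lineno})")
        elif isinstance(node, ast.ClassDef):
            symbols.append(f"class {node.name} (line {node.lineno})")
            for child in node.body:
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    symbols.append(
                        f"  def {node.name}.{child.name} (line {child.lineno})"
                    )
    return symbols


def scan(root: Path) -> dict[str, list[str]]:
    """Map each .py file under root (a single file or a folder) to its symbols."""
    files = [root] if root.is_file() else sorted(root.rglob("*.py"))
    result = {}
    for path in files:
        try:
            result[str(path)] = list_symbols(path.read_text(encoding="utf-8"))
        except (SyntaxError, ValueError):
            continue  # hypothetical policy: skip files that do not parse or decode
    return result


if __name__ == "__main__":
    # Scope to a file or folder given on the command line, else the current dir.
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for filename, symbols in scan(target).items():
        print(filename)
        for symbol in symbols:
            print("  " + symbol)
```

Pasting the output for the relevant folder into the prompt gives the model a compact map of where things live, which is the same goal as the @-mentions suggested above.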
I am on the $100 Max subscription and I rarely hit the limit. I used to, but not anymore. Then again, I stopped building two products at the same time and concentrated on finishing the first/"easiest" one.
> you'll easily burn your token rate on a single prompt or two
My experience has been that I can usually work for a few hours before hitting a rate limit on the $20 subscription. My work time does not frequently overlap with core business hours in PDT, however. I wonder whether there is an aspect of this that is based on real-time dynamic usage.
You hit a vague, never-quite-explained 5h window limit that has nothing to do with what you're doing, but with what every user is doing together. It's totally not downtime, you're just "using it too much" and they're telling you to fuck off until the overall usage slows down.
The order of priority is: everyone using the API (you don't want to calculate the price) → everyone on a $200/month plan → everyone on a $20/month plan → every free user.
Let's be perfectly clear: if user actions had anything to do with hitting these limits, the limits would be prominently displayed within the tool itself, you'd be able to watch it change in real time, and you'd be able to pinpoint your usage per each conversation and per each message within that conversation.
The fact that you cannot do that is not because they can't be bothered to add such a feature, but because they want to be able to tweak those numbers on the backend while still having plausible deniability and being able to blame it on the user.
Instead, the little "usage stats" they give you is grouped by the hour and only split between input and output tokens, telling you nothing.
For the same reason they use "tokens" instead of kilobytes: so that you don't do the conversion yourself and realise that, for example, spending a million "tokens" on claude-opus-4.6 costs you anywhere from $10 (input tokens) to $37.50 (output tokens). Now, 1 million tokens sounds pretty big and "unreachable" until you realise that's about 4 megabytes of text. It's less than three floppy disks of data going back and forth.
Now let's assume you want to send a CD worth of data to Opus 4.6. 700 megabytes * $10 (price per million input tokens) / 4 (rounding down one megabyte to roughly 250k "tokens") = $1750. For Opus 4.6 to return a CD amount of data back to you: $37.50 * 700 / 4 = ~$6.5k.
A terabyte worth of data with a 50:50 input/output ratio would cost you roughly $5.9 million. A terabyte worth of data with a 50:50 input/output ratio on gpt-5.2-pro would cost you $25.2 million. (Note: OpenAI's API pricing still hasn't been updated to reflect 5.3 prices.)
So we get layers upon layers upon layers upon layers upon layers of obfuscation to hide those numbers from you when you simply subscribe for a fixed monthly fee!
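The arithmetic above can be reproduced with a short sketch. The prices ($10 and $37.50 per million tokens) and the 250k-tokens-per-megabyte conversion are the assumptions stated in the comment, not verified figures, and the function names are invented for illustration.

```python
# Assumptions taken from the comment above, not from any price list:
INPUT_PRICE = 10.00       # USD per million input tokens
OUTPUT_PRICE = 37.50      # USD per million output tokens
TOKENS_PER_MB = 250_000   # roughly 4 bytes of text per token

CD_MB = 700               # one CD of data
TB_MB = 1_000_000         # one decimal terabyte


def cost_usd(megabytes: float, price_per_million: float) -> float:
    """Cost of moving `megabytes` of text at a per-million-token price."""
    tokens = megabytes * TOKENS_PER_MB
    return tokens / 1_000_000 * price_per_million


def mixed_cost_usd(megabytes: float, input_share: float = 0.5) -> float:
    """Cost when `input_share` of the data is input and the rest is output."""
    input_mb = megabytes * input_share
    output_mb = megabytes * (1 - input_share)
    return cost_usd(input_mb, INPUT_PRICE) + cost_usd(output_mb, OUTPUT_PRICE)


print(cost_usd(CD_MB, INPUT_PRICE))    # 1750.0
print(cost_usd(CD_MB, OUTPUT_PRICE))   # 6562.5
print(mixed_cost_usd(TB_MB))           # 5937500.0
```

Swapping in different prices or a different bytes-per-token ratio changes the totals proportionally, so the result is only as good as those two assumptions.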
you can just watch the limit on the claude usage settings view.
It'd be nice to know how the session context window interacts with token caching, but disabling all those skills and no longer sending a screenshot every couple of messages makes the 5-hour limit and weekly limit last a bunch longer.
It hasn't always done this, it's a relatively recent problem in the last 1-4 weeks (roughly). (note: I'm on the $160AUD/mo plan, so I think that's $100USD).
Exactly my experience. I am on the Pro subscription, and when coding with Claude in the console I can only do about 30 minutes of work before I have to wait for 4 hours. I actually code more by chatting with Claude via poe.com than by using the subscription that I have.
pro is the $20, right? It runs out quickly, especially using opus. But what do you expect for that kind of money? For serious work at least Max $100 is needed.
“Light work” is a pretty bold statement my dude. I run max for 8+ hour coding sessions with 3-4 windows where I’m babysitting and watching the thing and I never even get session warnings. The only time I bump up against limits is on token hungry tasks like reverse engineering 3M+ LOC codebases or 5-6 agents generating unit tests in parallel. Something tells me that what you call “light work” is not remotely the same as what I consider “light work”
That would be heartening, if I wasn’t consuming tokens 10x as fast as expected, and they just had attribution bugs.
Do you have references to this being documented as the actual issue, or is this just speculation?
I want to support Anthropic, but with the Codex desktop app being *so much better* than Anthropic’s, combined with the old “5 back and forths with Opus and your quota is gone”, it’s hard to see going back.
Yeah, I think it's either a billing bug or some sort of built-in background sub-agent loop gone wild inside Claude Code. If you look at recent issues on the GitHub relating to 'limits', 'usage', 'tokens', you'll see a lot of discussion about it: https://github.com/anthropics/claude-code/issues?q=sort%3Aup...
Is this one of those "hey, turn off the overcharge protection because you can go $50 into debt for free, and then maybe you'll just keep going and not notice you owe us an extra $500" type of situations?
I was worried about this when I turned it on myself, but under the usage panel it shows that it limited my spending to just the $50 and that auto-reload is off, so it doesn't seem this would be the case.
I turned on this overspend and limited the spending to $20. A day later I checked my spending, I had used "295%" of my limit. Almost $60. No idea why it didn't respect my setting.
I'll pass on this $50, but please hire a real human and fix your crappy app, Claude!
This bug has existed for years: in Claude (web or app), if you create a new chat while an existing chat is in the middle of thinking or tool calling, the existing chat will break, either losing data or becoming unusable.
It's unbelievable that Anthropic is worth hundreds of billions but can't fix this.
My favorite bug is when I spend 5 minutes writing something and send it, then something breaks and refreshes, deleting the whole thing. Like, thanks, much appreciated, please waste more of my time.
Literally every time I press the "stop" button while it's writing back on the first conversation turn, because I notice I forgot something and want to correct it, my prompt is lost.
Anytime I run into a bug like this, part of me wants to go calculate how much of humanity's collective time has been wasted by one company not fixing a trivial bug. It's got to be a lot.
I mean, it's a nice gesture. I use extra usage a little bit when the quota runs out and I'm still in the middle of a task. (Afterwards I switch to other agents that track limits monthly instead of by 5-hour window.) This $50 in credits should last a while.
So weird seeing comments like this on HN. AI is the most revolutionary technology in programming and computing, perhaps since the first programs were made. Of course it's going to be the most talked about topic.
This has not been my experience at all. The only time I even got close to this is multiple long sessions that had multiple compacts.
The key is if you hit compact, start a new session.
i don't want to think about how to hack a tool i'm paying for into not locking me out because "i prompted wrong"
This morning: (new chat) 42 seconds of thinking, 20 lines of code changed in 4 files = 5% usage
Last night: 25 minutes of thinking, 150 lines of code generated in 10 new files = 7% usage
Most people care about getting the right bytes.
Writing self-serving LinkedIn productivity porn
Nope. I'm putting a lot of trust in American Express and the continued availability of Claude competitors.
Doesn't appear to include the new model though, only the state-of-yesterday's-art (literally yesterday's).
Sometimes I think it’s better to just use the Code tab to chat, knowing that’s more reliable.
Go to https://claude.ai/settings/usage, turn on extra usage and enable the promo from the notification afterwards.
I received €42, top up was not required and auto-reload is off.
Ah well. Back to Codex.