Speed up responses with fast mode

(code.claude.com)

28 points | by surprisetalk 1 hour ago

13 comments

  • clbrmbr 3 minutes ago
    I’d love to hear from engineers who find the extra speed is a big unlock for them.

    The deadline piece is really interesting. I suppose there are a lot of people now who are basically limited by how fast their agents can run, and who are on very aggressive timelines with funders breathing down their necks?

    • sothatsit 0 minutes ago
      If it could help you avoid needing to context switch between multiple agents, that could be a big mental load win.
  • Nition 28 minutes ago
    Note that you can't use this mode to get the most out of a subscription - they say it's always charged as extra usage:

    > Fast mode usage is billed directly to extra usage, even if you have remaining usage on your plan. This means fast mode tokens do not count against your plan’s included usage and are charged at the fast mode rate from the first token.

    Although if you visit the Usage screen right now, there's a deal you can claim for $50 free extra usage this month.

  • jhack 5 minutes ago
    The pricing on this is absolutely nuts.
  • IMTDb 28 minutes ago
    I’m curious what’s behind the speed improvements. It seems unlikely it’s just prioritization, so what else is changing? Is it new hardware (à la Groq or Cerebras)? That seems plausible, especially since it isn’t available on some cloud providers.

    Also wondering whether we’ll soon see separate “speed” vs “cleverness” pricing on other LLM providers too.

    • jstummbillig 14 minutes ago
      > It seems unlikely it’s just prioritization

      Why does this seem unlikely? I have no doubt they are optimizing all the time, including inference speed, but why could this particular lever not entirely be driven by skipping the queue? It's an easy way to generate more money.
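
      In queueing terms the lever really can be that simple. As a toy sketch (my own illustration, nothing to do with Anthropic's actual scheduler), a paid fast lane is just a smaller sort key on a shared heap:

        import heapq
        import itertools

        FAST, STANDARD = 0, 1       # lower tier sorts first
        _order = itertools.count()  # FIFO tie-break within a tier
        pending = []                # one shared heap for all traffic

        def submit(request, tier=STANDARD):
            # Paying for fast mode changes the tier, not the hardware.
            heapq.heappush(pending, (tier, next(_order), request))

        def next_request():
            _tier, _, request = heapq.heappop(pending)
            return request

      (Which is also why the objection below holds: once most traffic carries the FAST tier, the priority key stops buying you anything.)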

      • singpolyma3 8 minutes ago
        Until everyone buys it. Like a fast pass at an amusement park where the fast line is still two hours long.
        • servercobra 0 minutes ago
          It's a good way to squeeze extra out of most people without actually raising prices.
    • sothatsit 24 minutes ago
      There are a lot of knobs they could tweak. Newer hardware and traffic prioritisation would both make a lot of sense. But they could also lower batching windows to decrease queueing time at the cost of lower throughput, or keep the KV cache in GPU memory at the expense of reducing the number of users they can serve from each GPU node.
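
      To make the batching-window knob concrete, here is a minimal sketch of that kind of flush loop (toy code with made-up parameters, not Anthropic's actual serving stack). The server flushes a batch when it fills up or when the oldest request has waited out the window; shrinking the window cuts queueing delay but shrinks the average batch, which costs throughput.

        import queue
        import time

        def run_inference(batch):
            # Stand-in for a forward pass. Bigger batches amortize weight
            # reads across more requests, which is where throughput comes from.
            print(f"ran a batch of {len(batch)}")

        def batch_loop(requests, window=0.050, max_batch=32):
            while True:
                batch = [requests.get()]              # block for the first request
                deadline = time.monotonic() + window
                while len(batch) < max_batch:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        break                         # window expired: flush early
                    try:
                        batch.append(requests.get(timeout=remaining))
                    except queue.Empty:
                        break                         # queue drained first
                run_inference(batch)

      A fast tier could then be as simple as routing paid requests to replicas configured with a smaller window.
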
    • Nition 22 minutes ago
      I wonder if they might have mostly implemented this for their own internal use, and it really is just prioritization; they just don't expect too many others to pay the high cost.
    • pshirshov 26 minutes ago
      > so what else is changing?

      Let me guess. Quantization?

  • pronik 29 minutes ago
    While it's an excellent way to make more money in the moment, I think this might become a standard no-extra-cost feature in several months (see Opus becoming way cheaper and a default model within months). Mental load management while using agents will become even more important, it seems.
    • giancarlostoro 27 minutes ago
      Yeah, especially once they make an even faster fast mode.
  • simonw 37 minutes ago
    The one question I have that isn't answered by the page is: how much faster?

    Obviously they can't make promises but I'd still like a rough indication of how much this might improve the speed of responses.

  • 1123581321 48 minutes ago
    Could be a use for the $50 extra usage credit. It requires extra usage to be enabled.

    > Fast mode usage is billed directly to extra usage, even if you have remaining usage on your plan. This means fast mode tokens do not count against your plan’s included usage and are charged at the fast mode rate from the first token.

    • minimaxir 28 minutes ago
      After exceeding the ever-shrinking session limit with Opus 4.6, I continued on extra usage for only a few minutes, and it consumed about $10 of the credit.

      I can't imagine how quickly this Fast Mode goes through credit.

  • krm01 33 minutes ago
    Will this mean that when cost is more important than latency, replies will now take longer?

    I’m not in favor of the ad model ChatGPT proposes. But business models like these suffer from similar traps.

    If it works for them, the logical next step is to convert more users to fast mode, which naturally means slowing things down for those who didn’t pick/pay for it.

    We’ve seen it with iPhones being slowed down to make the newer model seem faster.

    Not saying it’ll happen. I love Claude. But these business models almost always invite dark patterns in order to move the bottom line.

  • hmokiguess 14 minutes ago
    Give me a slow mode that’s cheaper instead lol
  • pedropaulovc 30 minutes ago
    Where is this perf gain coming from? Running on TPUs?
  • solidasparagus 25 minutes ago
    I pay $200 a month and don't get any included access to this? Ridiculous
    • pedropaulovc 22 minutes ago
      Well, you can burn your $50 bonus on it
    • bakugo 11 minutes ago
      The API price is 6x that of normal Opus, so look forward to a new $1200/mo subscription that gives you the same amount of usage if you need the extra speed.
      • MuffinFlavored 8 minutes ago
        I always wondered this: is it true, and does the math really come out that bad? 6x?

        Is the writing on the wall for $100-$200/mo users that it's basically known to be subsidized for now, and that $400/mo+ is coming sooner than we think?

        Are they getting us all hooked and then going to raise it in the future, or will inference prices go down to offset?
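
        For what it's worth, the arithmetic checks out if you take the $30/$150 per MTok fast-mode price quoted downthread against standard Opus API pricing of $5 in / $25 out per MTok:

          $30 / $5    = 6x on input
          $150 / $25  = 6x on output
          $200/mo x 6 = $1200/mo for the same token budget

        That last line assumes subscription usage scales linearly with API token prices, which is the assumption behind the $1200 figure above.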

    • kingforaday 11 minutes ago
      But it says "Available to all Claude Code users on subscription plans (Pro/Max/Team/Enterprise) and Claude Console."

      Is this wrong?

  • thehamkercat 1 hour ago
    Interesting, the output price per MTok is insane.
  • speedping 32 minutes ago
    > $30/150 MTok

    Umm, no thank you.