Cursor Introduces Composer 2.5

(cursor.com)

272 points | by asar 1 day ago

47 comments

  • chemex 1 hour ago
    I've been using Claude Code as my daily driver on a React Native + iOS codebase for the last few months. The thing that surprised me wasn't quality differences on individual edits — those are pretty close once you control for harness wiring — but how differently I'd ended up structuring my workflow around each style of tool.

    Tab completion + chat-in-sidebar feels like an extension of my editing. An agentic harness feels more like delegating a 20-minute task and coming back to review. Different cognitive load, different bug profile. The "which is better" framing tends to skip over the fact that they reward different working styles.

    Two things I'd watch on Composer 2.5 specifically:

    1. How it handles long-running multi-file refactors that touch 10+ files. My experience with smaller models in that slot is they lose track of which files they've already edited around 30% of the way through. Frontier models keep the plan coherent for longer.

    2. How it deals with non-obvious file boundaries. The thing that takes me out of "let it work" mode is the model deciding it needs to edit a config file I didn't think of. Usually that's right, but occasionally it's spelunking somewhere I don't want it to be.

    The Kimi K2.5 base is interesting on its own. Open weights below frontier closed models is the thing worth watching from the harness side. If anyone's set up to fine-tune for a specific harness, this is the moment.

    • chis 1 minute ago
      AI slop detected, you're under arrest
  • rcleveng 4 hours ago
    I have to say the new model is quite good at the basics, I've been handing over more and more tasks from Linear straight to it instead of the copy-paste into Claude dance lately.

    At this point, more of my complaints are on the harness side, which is odd since originally they were by far the best harness out there.

    Support - This is pretty much non-existent; it's community support or sales support.

    Interacting with GitHub - this should work and be awesome, Claude code does this well (responding to lint errors and comments). Cursor you have to poke the agent to look at the comments or lint errors, and even then it's about 10% good. Even GitHub Copilot is better here.

    Bugbot - I have it set up to trigger manually, but it still seems to wake up and burn 80-120k tokens just to notice it's configured to be manually invoked. When it does run, it tells me there are no issues (but Claude and Copilot both find real things).

    App - When you have both agent window and the ide windows, it's hard to open up the code in the right directory. A simple "cursor ." from the terminal used to do it, now it'll often open the agent window, you have to try a few times for it to work.

    I love that they are running super fast, it's just hard when many of the basics break or don't work.

    • khazhoux 4 hours ago
      > I've been handing over more and more tasks from Linear straight to it instead of the copy-paste into Claude dance lately

      Tangent: we've been using Linear at work and I still don't understand why it claims to be "task tracking for agents". Is there anything at all that lends itself better to agentic workflows compared to JIRA or gitlab/github issues or whatever else?

      Seems like Linear just hopped on the buzzword hype train at the exact right moment...

      • dbalatero 4 hours ago
        > Seems like Linear just hopped on the buzzword hype train at the exact right moment...

        I think you nailed it. Provided an agent can connect and ingest the information in the ticket, that's basically what's needed. I guess it's nice to be able to nudge ticket status and post back to it, but all of those seem like wiring up existing APIs to an MCP and calling it good. I don't see why JIRA couldn't execute on that, despite being Atlassian.

        • rcleveng 3 hours ago
          Yup, honestly a google spreadsheet could probably do it as well.

          I like the "copy prompt" feature, it's super simple but makes it just a few seconds to go from issue -> claude session.

          Also assigning directly to cursor or codex, that's how I handle the easier tasks.

          We also have scheduled tasks that elaborate existing tickets with information where needed, again that's just MCP but it works well enough

  • throwaw12 11 hours ago
    > Composer 2.5 is built on the same open-source checkpoint as Composer 2, Moonshot's Kimi K2.5.

    Really nice to see they're giving credit to the company and I am optimistic Kimi K open models soon will outperform Opus models

    • vessenes 8 hours ago
      Sounds like it's the last Kimi-line model at Cursor? As expected they say they'll be training a larger model on the SpaceX infrastructure, or have already started most likely.

      I'm very curious to read about the Composer 3 architecture when it comes out. More frontier coding models are a good thing, especially if they diversify into different strengths/weaknesses.

      • bfeynman 3 hours ago
        That only seems plausible if whatever corpse of xAI is around is giving them engineering time. I don't know if they hired a bunch of ex-frontier-lab staff, but it's unlikely they have the technical capability to train their own frontier models, especially the pretraining. Because the thing is, if it's not competitive with Claude/Codex it will be panned.
    • scosman 7 hours ago
      > I am optimistic Kimi K open models soon will outperform Opus models

      Hard to outperform the model you distill...

      • nl 6 hours ago
        Most of the performance on coding comes from RL, not distillation.

        Distillation helps with world knowledge and things like that.

      • intrasight 7 hours ago
        Is that true? If the distillation is not lossy and the model runs much faster due to less resource consumption, then it may outperform.
        • mwigdahl 7 hours ago
          One of those conditionals is a pretty huge assumption.
          • intrasight 35 minutes ago
            It's an assumption and it can be tested
    • howdareme9 11 hours ago
      Only because last time they tried to hide it lol
      • trymas 9 hours ago
        Yes and if I remember the drama correctly - Kimi's license or terms of use says that for commercial use cases (or was it user count?) - you must declare credit to Moonshot and Kimi.
        • Lennie 8 hours ago
          It's important to mention: they were compliant, because they trained the model at an AI hosting provider that had a partnership with Moonshot AI, but Moonshot didn't know Cursor was a customer.
        • Aurornis 5 hours ago
          This was misinformed Twitter and Reddit drama.

          They had properly licensed it and were complying with the terms of the license.

          • davidatbu 2 hours ago
            Note that something that helped the misinformation was that, on Twitter, there were Kimi employees expressing their surprise that the base model was Kimi K2.5, and their indignation that Cursor didn't credit Kimi. They later deleted their tweets (what I infer from that is that some employees were not aware of some pre-existing agreement or understanding between Cursor and Kimi until the drama happened).
        • maxdo 8 hours ago
          How can distilled Opus become better than the original? There are a number of reports, including from Anthropic, that the Kimi team was participating in fraudulent activities.
          • throwa356262 8 hours ago
            Do we know the "fraudulent" requests really came from Moonshot engineers and were not a QA team running a ton of benchmarks against other models?

            I feel distilling something as big as Opus would require many, many more samples, but I don't really know much about this subject.

            • maxdo 3 hours ago
              sure, sounds like QA lol

              Scale: Over 3.4 million exchanges

              The operation targeted:

              - Agentic reasoning and tool use
              - Coding and data analysis
              - Computer-use agent development
              - Computer vision

              Moonshot (Kimi models) employed hundreds of fraudulent accounts spanning multiple access pathways. Varied account types made the campaign harder to detect as a coordinated operation. We attributed the campaign through request metadata, which matched the public profiles of senior Moonshot staff. In a later phase, Moonshot used a more targeted approach, attempting to extract and reconstruct Claude’s reasoning traces.

              • ta20240528 2 hours ago
                And when you hear unsubstantiated rumours* that say Anthropic has been sending exchanges to, say, Alibaba's Qwen, will you also conclude the same about the entire US AI industry?

                I doubt it.

                * publish the logs.

                • ifwinterco 2 hours ago
                  Even if it's true, it's not like US AI companies can complain, given their entire business is based on ripping off text without attribution
  • goyozi 13 hours ago
    I kind of want to try it, to see if and how far they can take an open model and improve it but I really don’t miss the Cursor user experience. Constant UI changes, half-baked features, smaller and smaller limits, useless AI change attribution; I think I’ll wait for others to report if it’s any good.
    • whywhywhywhy 10 hours ago
      Noticed recently they keep opening their “Agents” window when the project was last opened in the VSCode fork window in the hopes I’ll just continue working in that when the UI is totally different and missing things I need.

      For a professional tool it’s getting egregious how little respect they have for my workflows and flow state, the way they keep moving things, changing iconography and flipping switches in the UI.

      It’s clearly being run by someone who comes from a social app or sales app growth hacking background.

      • SebastianKra 1 hour ago
        It seems obvious that they plan to eventually drop VSCode. I'd be willing to take them up on that offer. Their agent window is genuinely better as a starting point.

        What annoys me is how little they want to integrate with ...anything. Wanna open a link in your default browser? Use our built-in chromium fork, we insist. Wanna open a location in Zed? No, please use our half-baked editor re-implementation. Wanna open a location in Cursor's own vscode-based editor? You can't. Managed to work around that somehow? We changed your files to "Worktree TS", disabling all your language servers. It's like programming on an iPhone.

      • dmix 7 hours ago
        I’ve personally never experienced that issue with Cursor. I never use the agents window and it always shows me the editor.
        • whywhywhywhy 3 hours ago
          You're not in the A/B test. I've never opened the agents window consensually.
    • fjdjshsh 1 hour ago
      I've had good experiences with Cursor so far and it's my main IDE. I've noticed some UI changes, but I've switched fast and they didn't bug me
    • omederos 5 hours ago
    • Aurornis 3 hours ago
      I try it from time to time and feel the same way. Some people I know really like it but I can’t tell if that’s because it’s good or just because it’s what they’ve become familiar with and they don’t like to change tools. Cursor had a good head start and a lot of early PR.
    • kilroy123 8 hours ago
      I 100% agree. It's soooo buggy.

      I gave up, canceled my plan, and went back to boring old VSCode. It feels so much more stable, and my Mac no longer runs out of memory. With cursor I had to reboot my macbook several times a week and had to always be plugged in.

      • smnscu 3 hours ago
        That's me with Google Antigravity. Switching back to vscode was such a breath of fresh air. Porting over my (extensive) settings/extensions/keyboard shortcuts was extremely easy too (just ask the agent to do it), and now I can use both Copilot models and Claude Code easily. More to your point though, the speed and stability is incomparable. I can't remember having many issues with Cursor last year when I used it at my last job, but still, vscode has been surprisingly pleasant for agentic use.
    • tomasz-tomczyk 10 hours ago
      Yeah I have a soft spot for Cursor because it was my first tool that unlocked huge productivity with AI, but I avoid doing anything there now.

      Should try their CLI!

    • indiantinker 8 hours ago
      I agree. I quit Cursor and replaced it with Conductor and a mix of Claude Code / Codex / Copilot, and I don't miss it as such. Maybe one day I will come back.
    • ttouch 9 hours ago
      you can use the cursor cli and/or the zed editor with cursor as the underlying provider via ACP (Agent Client Protocol)
      • presentation 8 hours ago
        Tried that, it just seemed way dumber this way unfortunately. And the Zed UI provided 0 visibility whenever it was doing tool calls, and it kept running sleep 30 calls because for some reason it couldn’t figure out how to see the results of its own tool calls.
    • rubyn00bie 11 hours ago
      Damn do I feel the UI changes being a pain point.

      It’s a near constant regression in my workflows. “Multiple agents” got destroyed recently, and the new interface for it, some sort of command, isn’t as good or reliable. Then you’ve got modals everywhere[1] and truncated bits (like long branch names) that make it insanely frustrating to use.

      They’re constantly changing the UI without actually improving it at all. I’ll likely cancel it and use opencode for personal stuff with Deepseek and only use it at work because I have to. There was a time when I appreciated the harness but it’s becoming less useful, or at least noticeable, over time… all the while the actual UI becomes substantially more painful and awkward to use (like @ in the “agents” window being completely unable to find a file because it’s some sort of “global” scope).

      One thing that surprises me about this whole segment is that JetBrains haven’t eaten these folks' lunch. Their IDEs are leagues better than VSCode but their AI integration is awful by comparison (and the bar is low). I can’t even see how much of the context window I have left.

      [1] it’s insane I have to answer questions in a tiny input box I cannot resize. Let alone the fact the text area I input prompts into cannot be resized either. Truly feels like the UI/UX is done by people without any experience.

      • animuchan 9 hours ago
        > Truly feels like the UI/UX is done by people

        To me it feels like it's done entirely by an LLM, starting from the product vision.

    • jstummbillig 11 hours ago
      Isn't there a cli version of cursor by now?
      • yourboirusty 9 hours ago
        It's a bit better than the VSCode fork, but still much worse than competition:

        - lags constantly,

        - if you type while it's generating you'll get missed inputs,

        - 'plan mode' doesn't clear context before starting work,

        - you can't directly edit the plan, you can only ask the bot to do it,

        - you can't immediately whitelist commands, only accept once or allow all.

    • epolanski 11 hours ago
      Good point.

      One of the things I've come to appreciate about CLI tools like Codex or Claude is that the interface is so limited that every feature they release is still constrained by the same UX limitations, whereas those "funkier" IDEs change from month to month, giving me further fatigue.

  • brunooliv 8 hours ago
    Any reason why they indexed on the Kimi K2.5 model? I have tried many open-source ones in Opencode, and, in my experience (standard backend development, Java, Python, Spring, etc), Qwen3.6 is SO MUCH BETTER that it's shocking. Kimi can't even get most tool calling arguments right.
    • CuriouslyC 7 hours ago
      There's a lead time on models, and there's some tuning gotchas they probably already figured out with Kimi, so they weren't ready to just drop everything and switch. I'm sure they will switch models eventually.
      • roflcopter69 7 hours ago
        I recommend reading the entire article

          Together with SpaceXAI, we're training a significantly larger model from scratch, using 10x more total compute.
          With Colossus 2's million H100-equivalents and our combined data and training techniques, we expect this to be a major leap in model capability.
        • grim_io 5 hours ago
          I guess this will largely decide if xai is going to pay 60 or 10 billion, depending on the success of the new coding model.
    • KaoruAoiShiho 7 hours ago
      Kimi 2.5 has the best long context. For raw coding benchmark scores you can just post train on top of it with more specialized data. 2.5 is kinda old, 2.6 is the current release which is exactly just that and catches up to the frontier in most aspects.
    • Bombthecat 8 hours ago
      Cheaper to run?
  • wunderlotus 1 hour ago
    I love Cursor as a tool, but I'm skeptical bc:

    1/ CursorBench is so opaque [1] that it makes it hard to trust. Not to mention the v3.1 eval is a newer iteration and there's no insight into the tasks or if the model was just tuned to max it out. Composer 2 previously scored between 60-65% on the previous benchmark eval [2] but scores between 50-55% on CB v3.1[3].

    2/ I've experienced Composer 2's performance and it leaves much to be desired as a daily driver for a knowledge worker. But KWs are obviously not the target users, and I can see how it's cost-efficient for executing on clearly-defined, discrete coding tasks. Obviously that's their value proposition and they're figuring out how to communicate it well to the target customer. It just doesn't feel like CursorBench is that.

    [1] https://cursor.com/blog/cursorbench#building-cursorbench

    [2] https://cursor.com/blog/composer-2-technical-report#performa...

    [3] https://cursor.com/blog/composer-2-5

  • steviedotboston 6 hours ago
    It's very confusing that they use the same name as the very well known PHP package manager, composer

    https://getcomposer.org/

    • wesammikhail 6 hours ago
      I don't know what it is with product names these days. Antigravity, Antimatter, Composer, Clay, Ramp, Bolt, etc.

      You'd think the founders would Google for naming conflicts before choosing a name.

      • varun_ch 5 hours ago
        I genuinely wonder if consulting LLMs for naming advice could be an explanation.

        They certainly wouldn’t be great at coming up with new words for a product name.

        • dewey 1 hour ago
          Naming issues are as old as time. Apple Computer vs. Apple Records comes to mind as a popular example.
  • asar 1 day ago
    The model is (like Composer 2) based on Kimi K2.5 and they claim SOTA performance for 1/10th of the cost. The tweet also mentions that they've started a new model from scratch on Colossus 2 (xAI/SpaceX Cluster). Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.
    • antirez 13 hours ago
      How much the RL they are doing really improves Kimi K2.5 remains to be seen. So, right now, the ground truth is that they combined what they had with a strong open weights model. The RL improvement may be marginal (since many folks report strong results with vanilla K2.6) and may mostly bias the model towards coding tasks: when a model like this is trained to be a generalist, there is a tension between being good at one thing and another, in terms of SFT and RL. You can see this in the DeepSeek v4 Flash training report, for instance, but it is a known fact. So if you have the GPUs and a decent RL pipeline that does not ruin the model, you can indeed specialize it a bit more for a given task at the expense of tasks people will not do inside Cursor. But, so far, the measurable reality is that Cursor uses an open weight model like most could do, and the RL story could be partially a marketing move to call it Composer 2.5 more than a real strong gain, given that there is no way to verify and K2.5 was already strong. And we also know that they had to partner to do the training, which is also not good news.
    • onlyrealcuzzo 1 day ago
      > Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.

      Impressive, yes. But they still don't have a moat...

      • jmcqk6 1 hour ago
        I've been using cursor for over a year for my personal projects. At work, I use Claude Code, and so I've been wondering if I'm missing something in the other agents.

        Over the last week, I tried out two other agents on my personal projects: dirac and forgecode, after seeing impressive results from both of them on terminal bench.

        After a good amount of testing, and over $100 in open router spend, I'm back to cursor.

        I really liked forgecode the best, and it feels better than claude code, but cursor definitely feels best to me. Composer 2.5 is fast and effective, and it makes a huge difference. I was running `forge` with Opus, and it was taking dozens of minutes to do things, and the feedback loop was so slow.

        The previous version of composer was also much faster, and it makes a difference. Maybe people like context switching, but I prefer to stay focussed on the task in front of me, and I'm reviewing the code carefully.

        I think that's a pretty good moat. I was ready to end my subscription a week ago, and now I'm back after learning the grass is not necessarily greener on the other side of the fence.

      • infecto 21 hours ago
        I am not sure we should dismiss what they have today. Nobody has yet come close with a full-package IDE that works well for coding. Is that not a moat? It is easy for me to discount it in my head, thinking that I could build something myself, but between autocomplete and their workflow for agent use, it feels like they have some tangible moat emerging.
        • virgilp 10 hours ago
          If we ignore cost (which is kinda hard to ignore), I feel Codex kinda' does it for me. Sure it's not really an editor but I find I don't need that _that much_ and it's easy to launch an external editor (they actually have the feature).

          The ironic thing is that half a year ago, after trying factory.ai I thought chat-first interface was a stupid idea that will never work.

        • chillfox 11 hours ago
          Have you tried Zed?

          I haven’t tried Cursor, so don’t know how they compare, but I like Zed a lot.

          Anyway, would love to see a comparison from someone who has used a recent version of each.

          • turastory 10 hours ago
            A few years ago I tried Zed when it was still pretty early, but eventually settled on Cursor. I gave Zed another shot a few days ago because Cursor’s worktree support still feels pretty weak.

            In my setup I use multiple agents like Claude Code and Codex, and Zed’s ACP support makes it pretty nice to manage them all as “threads” in one place. Worktree switching also feels much smoother.

            Overall the experience was pretty good, but the way the agent and editor are integrated still feels a bit lacking, and tab completion is the big one for me. Cursor’s tab completion is still the best I’ve used.

            So now I’m using both. For work that needs a lot of focus and careful iteration, I use Cursor. For things that are easy to split into worktrees and hand off to agents, I use Zed with Claude/Codex.

            • chillfox 4 hours ago
              Interesting, is it that the tab completion is giving better results, or how it works is better?
              • ramses0 4 hours ago
                The tab completion is "faster than vim" from a long-time vimmer. It's at the point where a lot of times I'll lead with the comment instead of the code:

                    # now take the list and sort by x.lastName
                    <tab>
                
                ...and it'll "do the thing" (w/ type hints, its own comments, etc). Obviously in this very simple, understandable, completely contrived example, it's "trivial" (but 3 years ago would have seemed like magic), but it'll also pick up on "continuation / more of the same" type edits. A comment like `# use random_utility to call the api and only accept matches which supplement addresses that have already been found` will (usually) autocomplete all the gobbledy-gook w.r.t. tokens, URL's, function names, etc. so it's effectively an "automatic omni-complete with simplistic post-processing"
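
                For illustration only, a completion for the comment above might look roughly like the sketch below. The `Person` class, the `people` list and the `people_sorted` name are hypothetical stand-ins, not anything from the original code or from Cursor's actual output.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record type; only lastName matters for the sort.
    firstName: str
    lastName: str


people = [
    Person("Ada", "Lovelace"),
    Person("Alan", "Turing"),
    Person("Grace", "Hopper"),
]

# now take the list and sort by x.lastName
people_sorted = sorted(people, key=lambda x: x.lastName)
```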

                Example #2: I was just fixing some vibe-coded slop, where it was taking `click.echo( some_api.whatever_endpoint() )` and the "slop" portion was literally emitting: `str('{ "A": 1, "B": 2 }')` and that function call was emitting it directly.

                On the command line, I was doing `blah whatever-endpoint --something | jq '.'` and got tired of the JQ thing, so I'm like: "I'll just use `json.dumps(...,indent=2)`", but lo and behold, I'm getting a dumb JSON string literal, not a pretty printed object shape.

                I start typing `json.loads(` to move from "str()" to "dict()" ... and it autocompletes the whole scenario (on that line), then I move to `def some_other_endpoint` and it basically has that same edit queued up. (ie: it "knows" what i'm about to do).

                ...so overall, "faster than vim", even with high skill bar for repetition, motion, macros, sed-style edits, etc. You can't beat: "<tab>", especially when it's lightly intelligent (ie: knows when/what/str/int, adapts to different function calls, etc).
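
                A minimal sketch of the string-vs-dict problem from Example #2, assuming the endpoint returns JSON as a plain string. The `whatever_endpoint` function here is a hypothetical stand-in for the real API call, and `click.echo` is left out so the snippet only needs the standard library.

```python
import json


def whatever_endpoint() -> str:
    # Hypothetical stand-in: the API hands back JSON as a plain str.
    return '{ "A": 1, "B": 2 }'


raw = whatever_endpoint()

# Dumping the str directly just re-encodes it as one quoted JSON string literal.
double_encoded = json.dumps(raw, indent=2)

# Parsing first (str -> dict) and then dumping gives the pretty-printed object.
pretty = json.dumps(json.loads(raw), indent=2)
print(pretty)
```

                With the parse step in place the output no longer needs to be piped through `jq '.'` just to be readable.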

          • nl 6 hours ago
            I've tried Zed and really didn't like it.

            I like VS Code with the Claude Plugin, and sometimes with the Codex Plugin

            • infecto 5 hours ago
              Tried it and it’s fine but the AI integration is not tight enough for me.
      • alach11 22 hours ago
        Isn't a large user base and the data collected from those users a moat of sorts?
        • onlyrealcuzzo 21 hours ago
          A moat is when you have something others can't easily get.

          Every MAG 7 / FAANG company already has more users and more data...

          That's not a moat.

          That's traction.

          • LinXitoW 7 hours ago
            They don't have the same quality and kind of data. For example, Claude Code might have general conversation flow data for implementing feature X, but Cursor has users' individual editing actions AND the chat flow. Which line did the user manually edit after the agent did its thing? What's the commit message (if done manually)? Stuff like that is worth its weight in gold.
          • wilg 13 hours ago
            That's not X.

            That's Y.

            • uxcolumbo 11 hours ago
              Been a bit out of the loop.

              What's wrong with using very short sentences like 'That's not X. That's Y.'?

              • arcanemachiner 11 hours ago
                Commonly used phrase by LLMs. Gives people slop vibes these days.
                • Kiro 9 hours ago
                  "It's not X, it's Y" is a good way to illustrate a point. Same goes for many other common LLM phrases. It's used because it's effective.
                • monsieurbanana 6 hours ago
                  Huh. I associate it with LinkedIn slop, which is probably 100% ai nowadays but they certainly didn't wait for llms.
            • DonHopkins 12 hours ago
              I fear the day that large parts of perfectly valid English language and punctuation are off limits for humans to use because LLMs use them too (having learned them from humans), and somebody will always whine and post low effort "slop" comments that are much more annoying and less useful than the slop itself, or even incorrectly whine about human written text that happens to match your hyper-sensitive slop detector.

              Plus you are always running the risk of being rude and insulting when incorrectly labeling text actually written by humans as slop — making a jackass of yourself — and opening yourself up to being trolled by humans purposefully inserting em-dashes and catch phrases just to trigger you. That's not clever. That's gullible.

              How much cognitive and physical effort and time do you put into trying to figure out if everything you read is slop, then complaining about it? If that's your job or calling in life, you could be easily replaced with AI. Find something more creative to do with your time.

              If you really object to low effort slop, and not just relish it as an opportunity to whine, then how about instead of posting low effort whines about slop, you put in the actual effort to do something about it, and rewrite the slop in a way that won't trigger your slop detector, then post that instead, to train AI not to write slop.

              Is your problem that it's slop, or that it's AI generated? Because your whining about low effort AI generated slop without contributing to the conversation or addressing the point of the comment you're replying to is just low effort human generated slop.

              Please don't post slop while complaining about slop.

        • AussieWog93 21 hours ago
          Honestly the data itself is probably worth heaps even if the company itself collapses. Early attention engineering when humans were still in the loop!!!
          • NitpickLawyer 13 hours ago
            > Early attention engineering when humans were still in the loop

            Exactly. Cursor was the first product used by tons of devs on real codebases. Just the signal "acceptance rate" is huge and can't be easily captured w/ synthetic data.

      • kkukshtel 23 hours ago
        And it's still just a vscode fork
        • icemelt8 11 hours ago
          Cursor 3 is a complete rewrite, it's no longer a fork.
          • gkbrk 7 hours ago
            It's still a VSCode fork. Even Cursor's own About window tells you it's VSCode.

              Cursor
              Version: 3.4.20
              VSCode Version: 1.105.1
          • muhfournik 6 hours ago
            I believe the agent view is a complete rewrite, and maybe the other parts but not the editor itself
    • farco12 2 hours ago
      One would hope the vscode fork with a $50B valuation and no moat, would wisely spend the money they raised to build a moat.
    • liuliu 1 day ago
      Since the frontier is only 8 months ahead of DeepSeek, it is hard to see how model training can be a moat as all the tricks are available from open labs in China. You really just need <100m to bootstrap at this point.
    • wg0 14 hours ago
      This was the only way forward.
    • the_duke 12 hours ago
      In my opinion cursor actually has one of the best harnesses again at the moment.
    • Lionga 1 day ago
      They are still a vscode fork with no moat? Like they lost about 70% of users in half a year which goes to show how there is not even the tiniest of moat.
      • GenerWork 1 day ago
        I feel like they've been targeting enterprise pretty hard. I know my company uses them, and the companies that hire us also use Cursor.
        • Squarex 13 hours ago
          All enterprises I know use GitHub copilot as they already have Office, Teams, … wonder how will it change with the recent pricing changes
        • pjmlp 11 hours ago
          I can tell my company wants nothing with them.
        • kvetching 19 hours ago
          Cursor will definitely win the enterprise for coding. Enterprises aren't going to trust a TUI
          • esafak 14 hours ago
            Why not? That makes no sense to me.
      • kilroy123 8 hours ago
        I think it's going to be brutal for them to compete with OpenAI and Anthropic.

        I switched to claude code because of usage. For $200 a month, I would run out of usage halfway through the month. Then be forced to use their composer model or whatever slow, dumb model they served up in their "auto" mode.

        For that same $200 a month, I could use claude code and basically never hit usage limits.

        I don't understand what people are doing who run into the limits on that max x20 plan. I NEVER have.

    • whywhywhywhy 1 day ago
      It's still a VsCode fork just now with a Kimi fine tune and still no moat...

      I won't debate that it turns out none of this mattered when it came to being a successful company though, and it kinda makes anyone who tried to roll their own instead of forking look a little silly.

      • hkleppe 13 hours ago
        "No moat", well...

        How I see it, it's so important to bundle the model with the right tooling.

        Like a racecar: having the best engine doesn't help if the rest of the car lacks other winning properties (reliability, aerodynamics, etc.).

        So for Cursor, IMO, they've put themselves in a strong position by having both a solid IDE __and__ a solid, cost-efficient model. Those two working well in combination for the task they are designed to solve (coding) is more important than benchmarks.

    • DeathArrow 11 hours ago
      >Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.

      With so much money and computing from SpaceX, it's not so impressive.

    • make3 11 hours ago
      why is that part impressive specifically? they got purchased by SpaceX, they have access to infinite compute and cash now.

      & now they're still losing all of their users to Claude Code and Codex.

      • DeathArrow 10 hours ago
        >& now they're still losing all of their users to Claude Code and Codex.

        Why pay for Cursor when I can use GLM 5.1, Kimi K2.6, MiniMax M2.7, Xiaomi MiMo V2.5 Pro and Deepseek v4 for cheap and use whatever harness I want, including Claude Code?

        It's not like Cursor harness is the best out there.

        And even if I want to edit the code, I don't need to run the agent harness in an IDE.

        • make3 8 hours ago
          these are in the trillion parameters range, not sure it's actually that cheap to have at a reasonable speed without quality degradation & without like.. your own DGX B200
          • DeathArrow 7 hours ago
            I didn't say to run them at home. There are some cheap coding plans that get you plenty of usage for the Chinese models.
    • aurareturn 1 day ago
      I doubt it's a brand new model. It's likely just Kimi K2.5 further trained on coding.
      • enraged_camel 1 day ago
        They didn't say it's a new model... in fact they said exactly what you just said.
  • memoryleakgame 22 hours ago
    If these benches from their site hold up (they likely won't)

    Wouldn't this compress ai revenue like 15x quickly

    If they really have a 4.7 opus high equivalent at 1/16 the cost wouldn't this significantly affect all the current capex and planning

    Maybe they are getting elon to cover cost

    • vessenes 8 hours ago
      It's worth being specific:

      "Will this decrease Revenue?" -- only if demand for high quality tokens is inelastic. If demand is instead elastic (grows with cheaper pricing) then revenue will likely increase.

      "Will this lower earnings?" -- they have a current inference margin for their old models, and with the Elon deal in place, they have a new inference margin. It might be better or worse than their old one. If it's worse, then they'd need to see a concomitant increase in usage. If they don't, then yes it might lower earnings.

      "Will this lower corporate value?" -- no - not least because this company is going to be owned by SpaceX approximately 90 days after IPO -- so all the new owner will care about is being benchmark competitive with Anthropic and oAI for the first n quarters. If they can do that, it will massively increase the corporate value of SX; it's hard to build a frontier lab.

    • infecto 21 hours ago
      The way I have read their benchmark results is that they trained a model to work insanely well in their coding workflow. It’s not a general purpose model.

      One of the surprisingly hardest problems to solve is to get a model to use the tools you give it access to.

    • zackify 21 hours ago
      this thing is so awesome on fast mode, so far i am impressed, some of its observations feel similar to opus.

      i use gpt 5.5 and opus 4.7 a lot every day, if i can get good results at this speed, hopefully the usage level holds up on my team plan haha

    • smallnamespace 12 hours ago
      AI revenue has been going up while the cost per token has been rapidly falling. The Jevons paradox applies here. The cheaper software is, the more software is written. There is not a finite demand for software.
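
      The elasticity point upthread and the Jevons argument here can be made concrete with a toy model. This is only a sketch under a loud assumption: demand follows a constant-elasticity curve Q = k * P**(-elasticity), which nobody in this thread has measured. The function name `revenue_ratio` and the sample elasticities are made up for illustration; the 1/16 price ratio is the figure quoted upthread.

```python
def revenue_ratio(price_ratio: float, elasticity: float) -> float:
    """Revenue after / revenue before, assuming Q = k * P**(-elasticity)."""
    # Revenue R = P * Q = k * P**(1 - elasticity), so k cancels in the ratio.
    return price_ratio ** (1.0 - elasticity)


PRICE_RATIO = 1 / 16  # the "1/16 the cost" figure from upthread

# Hypothetical elasticities: inelastic, unit-elastic, elastic.
for e in (0.5, 1.0, 1.5):
    print(f"elasticity {e}: revenue x{revenue_ratio(PRICE_RATIO, e):.2f}")
```

      Under this assumed curve, a 16x price cut shrinks revenue to a quarter when elasticity is 0.5, leaves it flat at 1.0, and quadruples it at 1.5. Which regime token demand is actually in is the open question.
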
      • rafaelmn 11 hours ago
        > AI revenue has been going up while the cost per token has been rapidly falling

        Every model release now has been straight price increases since what GPT 4 ? When was the last time a new flagship model decreased prices compared to the previous one ?

        • jstummbillig 10 hours ago
          1. GPT 4 has gotten 6x cheaper over its evolution (from initial release to Turbo to 4o). Maybe you meant "Only since 4o and only since its final release". Alas.

          2. We are not interested in how different model naming schemes relate to prices, we are interested in the capabilities. So if you want to learn something about price development you need comparative levels of capabilities, and then look at the prices. 4o is not comparable to 5.5 in the first regard. It is (according to the benchmarks) maybe more comparable to current 5 nano - which is 98% cheaper.

        • dktp 10 hours ago
          Opus 4.5 became significantly cheaper directly per token
          • rafaelmn 10 hours ago
            You are right I forgot about that ! I think my point still stands - price per token is not decreasing for frontier capabilities, in fact it's increasing.
            • radu_floricica 7 hours ago
              This only means the frontier is growing faster than the price is decreasing. It's just the sum of two separate tendencies, and has little predictive value. TBH, I'm ok with this tradeoff - higher capability at slightly higher cost is perfectly fine.
        • baq 11 hours ago
          token efficiency
          • chillfox 11 hours ago
            Not seeing that either, tried really using Opus 4.7 today, and it ended up at $50 for the same kinda thing that came out to $25 last week with Opus 4.6.
            • baq 11 hours ago
              each model is different and nothing should be taken for granted, run your evals for your use cases. I'm not using Opus 4.7 for almost anything. I've seen very good improvements in GPTs since 5.2 and Opus 4.5 to 4.6 was quite an upgrade.
          • wesammikhail 10 hours ago
            Models consume more tokens than ever for the same tasks.
      • vb-8448 5 hours ago
        I, and I guess basically everyone here, don't have access to OAI or Anthropic books, and it's really difficult to disprove your statements but:

        - AI revenue going up & cost/token are not related metrics, at least not in the way you are assuming - basically all players (except OAI for the moment) are struggling with capacity and/or reducing or dismissing subscription-based solutions in favour of pay-per-use. If cost/token was falling, we would see quite the opposite.

      • lompad 10 hours ago
        This is conjecture. There is a reason both openai and anthropic refuse to comment on inference costs. If it were falling so much, they would use it to brag. I really don't understand why so many people keep repeating it without any actual data for the frontier models.

        Apart from that, I'm not sure if focusing on tokens is even a good idea, because they are so different from model to model. I'd almost consider them a red herring now.

        We could look at tasks instead. Is there anything even remotely suggesting that your typical task you give an LLM now costs less in inference than before?

    • 2001zhaozhao 21 hours ago
      > compress ai revenue like 15x

      that roughly just puts it on par with OpenAI and Anthropic subscriptions in terms of pricing per token

    • romanovcode 11 hours ago
      The problem with this is that we do not know the actual cost. For all we know they might be pulling an Anthropic. Subsidizing costs to get users, then increasing them later on.
      • yorwba 9 hours ago
        They're offering a model based on Kimi K2.5 for $0.50/M input and $2.50/M output while the cheapest third-party provider on OpenRouter charges $0.40/M input and $1.90/M output https://openrouter.ai/moonshotai/kimi-k2.5 Those third-party providers have little incentive to subsidize their customers, so Cursor probably has a margin >20% on their inference cost.
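
        The margin estimate above can be checked with a few lines of arithmetic. A rough sketch, using only the prices quoted in this comment, and assuming (this is the comment's assumption, not a known fact) that the cheapest third-party price is an upper bound on Cursor's own serving cost. The names `CURSOR`, `OPENROUTER_CHEAPEST` and `gross_margin` are made up for illustration.

```python
# Prices in USD per million tokens, as quoted in the comment above.
CURSOR = {"input": 0.50, "output": 2.50}
OPENROUTER_CHEAPEST = {"input": 0.40, "output": 1.90}


def gross_margin(price: float, cost: float) -> float:
    """Fraction of the price left over after paying the assumed cost."""
    return (price - cost) / price


# Treat the third-party price as a stand-in for Cursor's serving cost.
margins = {
    kind: gross_margin(CURSOR[kind], OPENROUTER_CHEAPEST[kind])
    for kind in CURSOR
}

for kind, margin in margins.items():
    print(f"{kind}: {margin:.0%}")
```

        This works out to 20% on input and 24% on output tokens. If the third-party provider is itself running at a profit, the real margin would be higher than that.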

        The real money furnace is the training, not just of models that get released, but also experimental training runs that fail to move benchmarks and are quietly thrown away. E.g. Cursor claim that 85% of the compute for Composer 2.5 comes from additional training on top of Kimi K2.5, where I'm not sure how they determined that, but it can't have been cheap. Then they say "Together with SpaceXAI, we're training a significantly larger model from scratch, using 10x more total compute."

        So yes, they're probably attempting to replicate the Anthropic playbook of paying a large upfront cost for a very good model, and then rapidly acquiring paying customers, hoping that the inference margin will be enough to cover the training cost.

    • epolanski 11 hours ago
      I'm not sure that to be the case, it seems like bringing capabilities up and costs down merely serves to induce more demand.
  • PUSH_AX 1 day ago
    They set themselves up for flak when they use whatever these evals are… they did the same for composer 2 which was evaled in close competition with frontier models, spoiler alert, it wasn’t even close in practice.

    So now 2.5 is supposed to compete with opus 4.7? Sure…

    • jmcqk6 1 hour ago
      That does not match my experience. Composer 2 was fantastic for my uses, and I hit Composer 2.5 with some very difficult things last night, which it handled fast and effectively. I don't really care about benchmarks. I care about practice, and in practice, it's been very very good for me.
    • tuo-lei 1 day ago
      they say it themselves in the post - behavior dimensions "not well captured by existing benchmarks". that was the exact problem with composer 2. not dumber on individual tasks, just bad at session-level decisions like when to stop editing, how much context to carry forward, when to re-read a file vs assume. you don't catch any of that in an isolated eval.
    • infecto 21 hours ago
      As I have said before in prior composer threads. The proof is in the usage. I am inclined to somewhat believe the results as I use composer and also take the results for the given context. It’s not a general purpose sota model. It’s a model that runs inexpensively in their coding workflow that is creating results similar to opus or gpt.
    • criemen 1 day ago
      Well is that a statement about the quality of Opus 4.7 or about Composer 2.5? :P
  • jtwaleson 1 day ago
    Ok this might be weird but I've moved everyone in my 4 person team to our team plan and costs seem to have sky rocketed compared to the individual plans. Where before most people spent 20-100 USD, now the total bill is more like 1k USD. I haven't gone into the details but it feels like I'm being scammed.
    • mohsen1 11 hours ago
      We moved off Cursor and onto Codex + Claude Code. Cost went from multiple thousand per engineer per month to about $500
      • zackify 6 hours ago
        Best deal currently:

        Cursor team, Codex team, Claude team

        Swap between the models when limited.

        I am saving our company a lot of money vs Claude enterprise usage cost

    • DedlySnek 13 hours ago
      My company is shifting us from Cursor to Claude due to increased costs.
    • danbrooks 1 day ago
      Check which model you're using.

      The fast version of composer is the default now (which costs ~x3 as much).

    • infecto 21 hours ago
      Keep in mind I believe there is a larger buffer given to personal plans. If they have 50% extra with the personal plan you now only get 25%.
    • skeptic_ai 11 hours ago
      I did some monitoring. 15 accounts, 300 million tokens input, 200k output: the 5h quota went to 0 in 7 hours. 4 parallel tasks.

      I think 300 million is too low. For reference, before I could do more than 1 billion under the same conditions.

    • PUSH_AX 1 day ago
      My cursor costs sky rocketed recently too
  • zurfer 9 hours ago
    Kudos to the team. Please consider making the model available via API!
  • everfrustrated 1 day ago
    • dang 14 hours ago
      Thanks! Link belatedly changed above.
  • machiaweliczny 6 hours ago
    Tested and it's good. Fast version is bad though. I like that the planning model in Cursor works more like a human-written design doc instead of an overly detailed AI plan. Seems like this is more responsible for the results than the model, but still, on fast it failed and on normal it got good results.
  • m_mueller 13 hours ago
    It's a bit confusing to me why they'd make this 'fast' version the default, as it appears to be much more expensive than Composer 2. Wasn't it supposed to be a very cheap alternative to SOTA models?
    • mrklol 12 hours ago
      Isn’t it a really cheap alternative to sota models (according to benchmarks)?
  • WhitneyLand 6 hours ago
    Say what you want about Cursor but they don’t lack for ambition.

    Forking VS Code, going big on bleeding edge features like cloud agents, and now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.

    They’ve been highly successful so far. Raised $50B, $2B in revenue, forecast to end 2026 above $6B. But even at these heights, they’re just not in the same league as OpenAI/Anthropic/Google.

    And if building a state of the art multitrillion parameter model is not challenging enough, it’s a mountain you don’t climb just once. Every few months you need to push it farther with a new release. Fall off for a couple cycles and like Facebook you may never catch up again.

    Not for the faint of heart.

    • pdq 4 hours ago
      Why is this comment upvoted?

      It is most likely AI generated with a nice "Raised $50B" hallucination and filled with cliches ("thrown down the gauntlet", "mountain you don’t climb just once", "not for the faint of heart").

      • Aurornis 3 hours ago
        Good catch. I didn’t even notice it at first, but the hallucinations on top of cliches gives it away.

        The account doesn’t have a history of other comments that have too much of an AI vibe, but this one does. Even if it wasn’t AI, it’s misinformation.

    • Aurornis 5 hours ago
      EDIT: As others have pointed out, the comment above contains hallucinations (Like the $50 billion number) and a lot of AI tells. The account doesn’t have a history of AI-like comments but the hallucinations and structure in this one are suspicious. If anything, don’t trust the numbers it cites because they’re made up.

      Cursor is a team that I want to see succeed. They have stacked their company with very smart people and they’re going hard at a highly competitive market. We all win when there is more competition and more innovation.

      My problem is that every few months I look at Cursor’s product offerings and maybe retry it, but it never feels like something I want to use. Part is personal preference, the other part is the fact that my combination of other tools and services just does a better job. Their biggest advantage felt like first-mover advantage when they came out early and captured market share, but at in person meetups I hear stories about companies switching away from Cursor or trying to convince their management to let them switch away. They need to come up with a compelling advantage fast, which is a hard thing to do against the other companies with their virtually unlimited budgets by comparison.

      • adamkeys 5 hours ago
        Same, I kick the tires on Cursor every several weeks wanting to find they've finally crossed some chasm I can't quite explain. But every time, I bounce off the ground-truth that they're forked off vscode, which just isn't for me. I think moving agents to the center of their experience and developing a model that focuses on speed/efficiency over maximum depth is a promising step away from being a spicy vscode fork.
        • whs 4 hours ago
          My company is heavy on Cursor and I still ask them to provide me GitHub Copilot, for the sole reason that Cursor is probably the reason Microsoft had to implement technical enforcement of their TOS on proprietary plugins. Previously, you could use PyLance on VSCodium but now those plugins do not work outside VSCode anymore.

          If Cursor (and every other commercial VSCode fork) hadn't used the MS extension store in the beginning and violated the TOS, this might not have happened.

        • chrisrickard 4 hours ago
          Cursor 3 is a full rewrite. No VS Code
    • causal 6 hours ago
      Yeah I want them to do well. I find Cursor to be a much better tool for actually working with the code the agent writes than whatever the big vendors provide.
    • highfrequency 4 hours ago
      > now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.

      To clarify, the model Composer 2.5 announced in this post is not that; it uses Kimi 2.5 as a strong starting point. This is not to discount Cursor's work or future ambitions, but one of the most striking things about the last 6 months is that multiple open-source models/labs are now within striking distance of the frontier closed-sourced labs.

      See eg Kimi 2.6 benchmarks: https://www.kimi.com/blog/kimi-k2-6

    • didroe 5 hours ago
      They have no choice but to train their own model to try and survive. They're paying API pricing for the top tier models but competing against subsidized subscriptions.
    • dtagames 4 hours ago
      As a heavy user, I don't think the model is their product. Cursor is primarily a harness and lately, a specialized agent dashboard.

      Composer, their in house model, is dispatched by other models like Claude Opus for individual items on a task list. No one is suggesting you write your main prompt to Composer 2.

    • worldsavior 5 hours ago
      Them raising this much money doesn't mean they're successful, it only means they know how to fool the investors well. A project that is basically an extension to VSCode only adding a chat interface, isn't really worth this much money. Obviously, it's the users, but people think it's something genius and revolutionary, but no.
      • infecto 4 hours ago
        This is rsync all over again. Go create it yourself if you think it’s just a simple extension.
    • benmusch 4 hours ago
      they aren't "throwing down the gauntlet", they're trying to find ways to eke margin out of their product by owning a commodity-level coding model. it's an impressive engineering task but it's not particularly ambitious.
    • Survey8430 3 hours ago
      AI comment... BOO!
  • granzymes 22 hours ago
    Surprised this got pushed off the front page so quickly! It’s exciting to see what the Cursor team has been able to do with significantly fewer resources than the frontier labs.

    I do wish they weren’t joining xAI. Something tells me there will be a contingent of researchers that departs Cursor if that merger is consummated.

  • luodaint 8 hours ago
    Benchmarks measure turn-level capabilities: you feed a task into the system and then grade the result. Capability for production-level usage concerns session-level decision making: does the agent know when to stop editing, retain the right amount of context, or go back and reread the file if the state has changed?

    This is not a property of the model, but a property of the discipline; it can be operationalized by what you have documented before the session begins. Without "stop editing where you can no longer follow your changes to the spec" and "go back and read the migration file before changing the schema," there is nothing to halt the process until it fails integration.

    Those teams who get consistent results independent of the model being used typically do so because they have operationalized their discipline first. Those switching out models monthly tend to expect the model to supply them.

  • jorl17 7 hours ago
    I want to like composer, but I just can't.

    - Its communication style is completely opposite to Anthropic models. It's not as bad as OpenAI's models, which are obsessed with "shapes", "wrinkles", hyphenated-words, and other cryptic formulations that make you feel like you're not on planet earth after a while talking to them. But it is nonetheless markedly "rude", "dry", "cold", gives off this "entitled I'm right, you're wrong" attitude. I once had composer2-fast accidentally run `rm -rf $HOME` (no harm done) as part of a bug in an install script it wrote and all it could say once it realized it was: "Running script with proper hardening". Qwen's models have clearly been distilled from Anthropic models because they have a much closer communication style and that's why I hope cursor will one day release a new family of composer models derived from that. A damn joy to use.

    - It's just dumb. I don't know what they're doing with benchmarks, but for my work (python, bash, docker, whatever), cursor is just incredibly dumb. Always does in 10 lines what could be done in one. Doesn't know loads of internals of things that other models know. Never places things in the right files, constantly makes terrible edits (inline imports, edits without testing). Everything is so complicated when done by composer2, it's just a joke to me at this point. It clearly needs more handholding than Opus 4.x or GPT-5.x. I tried 2.5-fast and it seemed more of the same. And this would sort of be acceptable if it owned up to its incompetence, but it is so confidently incompetent that it's revolting.

    I know that for many people the "tone" of the models is not relevant, or maybe they even prefer models like these. I simply cannot work like that.

    Ever since Gemini started blowing benchmarks out of the water while being a clearly inferior model incapable of producing anything (and pretty much just doing tool calls without any feedback to the user), I gave up on benchmarks. Composer has been more of the same in that regard.

    As a GPT model would say:

       "Small wrinkle: the production-ready benchmark results were tainted by real-world data points. I've assimilated the inconsistencies and added guardrails so that v2 has the right shape for future evaluations."
  • 0fes911 8 hours ago
    I found composer 2 pretty good as a subagent delegating tasks like auditing for bugs after finishing implementation, but hopefully composer 2.5 will be more reliable so it can be used to implement and execute long running tasks.
  • bingud 10 hours ago
    Seems like a promising and useful model but it's probably scary how much customer data they fed into it to reach this performance
  • Armonsrer 4 hours ago
    It looks like a massive update from Cursor and I like their platform. Let's hope it's good.
  • uf00lme 13 hours ago
    I wonder why they didn’t train off Kimi 2.6, I hope it is because they already had a good base and not that they messed up that relationship.
    • NitpickLawyer 11 hours ago
      > and not that they messed up that relationship.

      There's nothing to mess up. The license is MIT w/ attribution, and the attribution clause can be easily sidestepped w/o any legal repercussions. The "drama" was simply content creators going nuts over some misunderstandings and poor comms from some kimi related devs.

    • re-thc 13 hours ago
      That's 3.0
  • I_am_tiberius 10 hours ago
    I hope people soon wake up to the fact that they use user data for model fine tuning.
  • try-working 10 hours ago
    A lot of people saying Cursor have no moat. Sure. Neither do OpenAI or Anthropic.
    • svantana 10 hours ago
      You could say they have a sort of anti-moat (drawbridge?) since you can use their product to create a competitor. But that's true of most dev tools, in a sense.
  • big-chungus4 13 hours ago
    Can you please train Qwen 3.5 like 0.8B to 9B using the same training techniques
  • vanuatu 1 day ago
    It's always great that more companies are throwing their hat in the ring, especially focusing on value (latency + intelligence + cost)
  • Glohrischi 7 hours ago
    Hahah wtf? They are training on colossus 2? Their own model?

    Dude what the hell happened to Musk's Grok? How incapable are they that they give away training compute to Cursor like this?

    Weird that the genius Musk doesn't need his own compute, after all shouldn't Macrohard (no joke) already be building the world's software from scratch?

    • mgambati 7 hours ago
      Word on the street is that xAI will buy cursor.
      • Glohrischi 7 hours ago
        Yeah for 10-60 BILLION. which again makes this even stupider.

        For this amount of money you can rebuild cursor and everything else on the market, and with the rest of 9-59 Billion, you just hire experts in coding and let them code real high quality code examples.

        And then you just use your existing grok pipeline and just add this functionality.

        This xAI stuff has to be run by idiots

        • radu_floricica 7 hours ago
          Buy "Cursor", not "Cursor's IP". This means brand, users, and a shitton of data.

          And if you combine a shitton of data with a lot of compute, large userbase and good engineers, you have a pretty good chance of doing something interesting.

          • Glohrischi 7 hours ago
            Yeah you know how much 10-60 Billion are?

            You could literally just give your compute away for free for a year to pull people in.

            Make an API endpoint for free with the caveat that they are allowed to use the data for training, which is what everyone else does too.

    • timmmmmmay 5 hours ago
      it seems like they were trying that last year, it didn't work, so he flipped out and fired everyone and now plan B is to buy Cursor and run a quick rename of "Composer 3" to "Grok 5"
  • DeathArrow 10 hours ago
    I think anybody will be much better off acquiring a coding plan from Kimi.com and using Kimi K2.6, with whatever harness they like, including Claude Code, instead of paying more for Cursor's version of Kimi K2.5.
  • enraged_camel 6 hours ago
    I tested it yesterday. It is pretty bad. Just like with Composer 2, it's fast, but quality is nowhere near what Cursor claims with their benchmarks. It is not even at Opus 4.5 level.

    I gave it a mix of refactoring tasks and new feature tasks. For each one, I had it write a plan, then I had Codex review it. Codex found major issues with every plan: patterns that don't match the rest of the code base, hallucinated variable/function names, and even outright bugs in the way the plan was written. I fed the feedback to Composer 2. After it made the changes and implemented the revised plan, I had Codex and Opus 4.7 do code reviews, and once again both of them found major bugs.

    Overall it was a very frustrating experience. I feel like I wasted a whole day. Which is sad, as I have been looking for an excuse to come back to Cursor. But as things stand, Codex + CC combo cannot be beat, not just in terms of price but also quality.

  • lukebrichey 22 hours ago
    this feels super bullish on cursor/spacexai's ability to train a frontier level model. could be truly SOTA on coding given that their RL data is this powerful
  • jdlyga 1 day ago
    It's a bit odd that they're not comparing it against Sonnet
    • jjice 1 day ago
      I don't think so. They're comparing it to the highest tier available models from Anthropic and OpenAI. Generally speaking, Opus is better than Sonnet in almost every way, so why have the redundancy?
      • 3836293648 12 hours ago
        Price to performance?
        • jjice 2 hours ago
          I think comparing their benchmarks to Opus is a great way to show "look at similar benchmarks for a fraction of the cost". If it has Opus benchmarks (I don't actually take benchmarks seriously, but for their comparison purposes) and Sonnet is still more than half the price of Opus, I figure it's close enough that it doesn't matter.
    • CodingJeebus 1 day ago
      The tweet specifies that the new model is geared towards long-running tasks, which is what you'd use a model like Opus for anyway.
  • svclaws 1 day ago
    Their previous Composer was already marketed as a cheap model capable of competing with SOTA on most tasks. The evals they shared back then backed this up but in my day-to-day usage it fell short across the board. Canceled my cursor subscription and switched to Claude Code a few weeks ago. It has its own shortcomings but in terms of model capability and UX quality Cursor will have a hard time competing in the long term. Elon Musk will be a very good way out for them.
  • ChrisArchitect 1 day ago
  • polski-g 22 hours ago
    I don't know why their model isn't on Openrouter yet. They must not have enough capacity to offer it.
  • Dongyu_Jia 10 hours ago
    Will this be the cursor's last dance? LoL
  • sergiotapia 1 day ago
    Congratulations on the launch! I'm interested in trying Cursor but it's very confusing what I should buy. What does the Pro $20 plan get me in usage if I only use Composer 2.5? How fast is the model?
    • darkwi11ow 1 day ago
      I use $20 plan on daily basis for more than a year now, and have yet to exhaust that limit. The plan includes $20 in api costs for non-Cursor premium models and $20 for Composer and Auto models provided by Cursor themselves.

      That said, I am pretty old-fashioned coder and use LLM mostly to overcome the blank page problem, which means I review and often rewrite LLM output by hand and avoid prompt loops for a single task.

      People who are aiming to not read code any more might find this $20 plan lacking for their needs, however for my needs it fits perfectly.

      • kaizoku156 1 day ago
        The limits are probably even higher than that, i seem to get about 100$+ of usage on composer and about 45-50 usd on non composer models
  • re-thc 1 day ago
    Did they just upgrade Kimi 2.5 to 2.6?