Cohere's First Model for Developers

(cohere.com)

120 points | by hmokiguess 5 days ago

8 comments

  • amunozo 2 hours ago
    Are these models trained from scratch, or do they necessarily need distillation from bigger models to be competitive? Usually a model like this is the small member of a family that includes a bigger model. If it was trained from scratch, does anybody know what the economics are of training this 30B-A3B model vs. training models the size of DeepSeek V4 Pro or Flash (1.6T, 200-something B, fewer activated)?
  • AbuAssar 46 minutes ago
    Strange, I already submitted the same URL 6 days ago:

    https://news.ycombinator.com/item?id=48475095

    • mkl 7 minutes ago
      What's strange? Yours got no comments, so another attempt seems okay. It's pretty random what gets to the front page when.
  • matt_daemon 6 hours ago
    > Hardware (minimum): 1× H100 @ FP8

    Cool to see this but seems like it would be pretty expensive to run

    • anon373839 5 hours ago
      This is a 30B parameter model with 3B active. It should run performantly on a Mac with > 48GB RAM at 8bit precision.
      • ltononro 2 hours ago
        Well, that is about 3 USD/hour if you run it on a rented GPU.
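
A rough sketch of the arithmetic behind the "30B parameters at 8-bit" point in this subthread. It counts weights only and assumes every parameter is stored at the same precision; KV cache, activations, and runtime overhead are left out, so real requirements will be higher. The function name is made up for illustration.

```python
def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return total_params * bits_per_param / 8 / 1e9


# "30B parameter model" as quoted in the comment above.
TOTAL_PARAMS = 30e9

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB of weights")
```

Under these assumptions the 8-bit weights come to roughly 30 GB. All 30B parameters have to be resident somewhere even though only about 3B are active per token, which is why the total count, not the active count, drives the memory requirement.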
  • moojacob 9 hours ago
    I was a fan of Cohere's general-purpose LLM (Command A, I think?) before they came out with their reasoning model.

    More competition is better.

    • SubiculumCode 9 hours ago
      I always forget the VRAM requirements on these MoE things
      • sipjca 9 hours ago
        FWIW, because of the relatively few activated params, offloading to system RAM is quite feasible; you can see plenty of people doing this on r/LocalLLaMA with Qwen 3.6 35B-A3B.
        • bitwize 6 hours ago
          I ran Gemma4 26B A4B on an 8yo PC with a fucking GTX and it did rather well.
          • doodlesdev 2 hours ago
            Well, that's pretty impressive. Care to share your setup to do that? How much DDR3/DDR4 do you have, too?
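
A back-of-the-envelope sketch of why few active parameters make RAM offloading tolerable. It assumes decoding is purely memory-bandwidth bound and that each generated token reads every active weight once; it ignores the KV cache, compute time, and expert-routing overhead, so treat the result as an optimistic ceiling. The 50 GB/s bandwidth is a hypothetical figure chosen for illustration, not a measurement of any machine mentioned in the thread.

```python
def decode_tokens_per_second(active_params: float,
                             bits_per_param: int,
                             bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if each token reads all active weights once."""
    bytes_per_token = active_params * bits_per_param / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


HYPOTHETICAL_RAM_BANDWIDTH_GB_S = 50.0  # assumption, not a measured value

# MoE with ~3B active params vs. a dense model that reads all 30B per token.
moe = decode_tokens_per_second(3e9, 8, HYPOTHETICAL_RAM_BANDWIDTH_GB_S)
dense = decode_tokens_per_second(30e9, 8, HYPOTHETICAL_RAM_BANDWIDTH_GB_S)

print(f"MoE, 3B active: ~{moe:.1f} tok/s ceiling")
print(f"Dense 30B:      ~{dense:.1f} tok/s ceiling")
```

Whatever bandwidth is plugged in, the ratio between the two cases is the same 10x, which is the gap between total and active parameter counts.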
  • tonyrice 10 hours ago
    I'm excited to see more OSS models
  • zuzululu 9 hours ago
    Wasn't aware that Cohere was still around but this release doesn't exactly instill confidence.
    • greyb 8 hours ago
      >Wasn't aware that Cohere was still around but this release doesn't exactly instill confidence.

      It's being kept alive because the Canadian government is desperate to have a local frontier lab and is willing to inject funding and force its adoption in government services, but leadership at Cohere is known to be weak in Canadian tech circles, and they're pivoting to an enterprise-first market around production RAG rather than anything close to frontier work.

      I'm glad they're doing open-weight releases, but they're not viable in the long run. It is embarrassing sharing similar spaces with them, but I'll try this release out in OpenCode and re-think afterwards.

      • suddenlybananas 5 hours ago
        It's embarrassing? Awfully harsh!
        • moralestapia 1 hour ago
          It really is. I’m very familiar with that as well.

          It’s truly embarrassing how much hand-holding those guys have received from angels, investors, the government, etc. To the point where the same investors they’re going to pitch to are preparing their slides, telling them what to say during the presentation, and then approving them for even more funding afterward, lol.

          That government part is corruption and illegal, by the way.

          Actual usage on many of their APIs/models is painfully low, like in ... hundreds of DAUs. I don't blame them for this, but this is a "company" that should have died 2 years ago.

        • N_Lens 4 hours ago
          It's easy to be critical.
    • redwood 2 hours ago
      Aren't they focused on embeddings and strong there?
    • kadoban 8 hours ago
      Really? Why not. From the benchmarks at least it's a pretty decent small model.
  • cyanydeez 2 days ago
    looks like it's just qwen 3.6 coder.
    • lumost 12 hours ago
      It's worse at code compared to Qwen 3.6 coder.
      • stymaar 7 hours ago
        How can it be worse than something that doesn't exist?
        • amunozo 2 hours ago
          Sometimes not existing is better than existing, for unnecessary or harmful things. I know that is not what you mean, but I found it relevant in an age in which making new stuff is so fast and easy thanks to LLMs. The main enshittification will come, imo, not from bad things but from unnecessary things that nobody asked for.
    • SubiculumCode 10 hours ago
      Do you mean it's based on qwen 3.6 coder?
      • daemonologist 10 hours ago
        There is no "coder" version of Qwen 3.6; I think they just mean it's a coding-focused model of similar size and performance (to Qwen 3.6 35B-A3B).

        Regular Qwen 3.6 benchmarks slightly better and has much wider software support though, so this is probably of interest only to organizations which disallow models trained in China.

        • kadoban 9 hours ago
          I mean, Qwen 3.6 kicks ass. I don't know who these people are, but if their first outing is "not quite as good as Qwen 3.6", that's not a bad start by any means.

          30B vs 35B isn't nothing either.

          If it ends up just being some tweaks to someone else's weights, then meh.

          • mtone 8 hours ago
            It was trained from scratch by Cohere. They're the only Canadian AI lab - I'm glad they're releasing open weights and I wish them luck catching up!
  • moralestapia 12 hours ago
    >Our plan for becoming profitable is to give mediocre stuff away for free