Something is afoot in the land of Qwen

(simonwillison.net)

105 points | by simonw 1 hour ago

13 comments

  • sosodev 54 minutes ago
    I really hope this doesn't hinder development too much. As Simon says, Qwen3.5 is very impressive.

    I've been testing Qwen3.5-35B-A3B over the past couple of days and it's a very impressive model. It's the most capable agentic coding model I've tested at that size by far. I've had it writing Rust and Elixir via the Pi harness and found that it's very capable of handling well defined tasks with minimal steering from me. I tell it to write tests and it writes sane ones ensuring they pass without cheating. It handles the loop of responding to test and compiler errors while pushing towards its goal very well.

    • Twirrim 21 minutes ago
      I've been testing the same with some Rust, and it has spent a fair bit of time going through an infinite-seeming loop before finally unjamming itself. It seems a little more likely to jam up than some other models I've experimented with.

      It's also driving itself crazy with the deadpool & deadpool-r2d2 crates that it chose during the planning phase.

      That said, it does seem to be doing a very good job in general, the code it has created is mostly sane other than this fuss over the database layer, which I suspect I'll have to intervene on. It's certainly doing a better job than other models I'm able to self-host so far.

      • sosodev 11 minutes ago
        Some of the early quants had issues with tool calling and looping. So you might want to check that you're running the latest version / recommended settings.
    • a3b_unknown 19 minutes ago
      What is the meaning of 'A3B'?
      • simonw 16 minutes ago
        It's the number of active parameters for a Mixture of Experts (misleading name IMO) model.

        Qwen3.5-35B-A3B means that the model itself consists of 35 billion floating point numbers - very roughly 35GB of data - which are all loaded into memory at once.

        But... on any given pass through the model weights only 3 billion of those parameters are "active" aka have matrix arithmetic applied against them.

        This speeds up inference considerably because the computer has to do fewer operations for each token that is processed. It still needs the full amount of memory, though, as the 3B active parameters it uses are likely different on every iteration.
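        To make that concrete, here's a toy sketch of top-k expert routing (illustrative sizes and a made-up router, not Qwen's actual architecture): all expert weights stay resident, but only the top-scoring ones do matrix arithmetic per token.

```python
import math, random

random.seed(0)

# Toy sizes for illustration only - not Qwen's real configuration.
n_experts, top_k, d = 8, 2, 4
# Every expert's weight matrix is resident in memory at all times...
experts = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
           for _ in range(n_experts)]
router = [[random.gauss(0, 1) for _ in range(n_experts)] for _ in range(d)]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def moe_forward(x):
    # ...but the router picks only top_k experts to actually run for this token.
    scores = [sum(router[i][e] * x[i] for i in range(d)) for e in range(n_experts)]
    chosen = sorted(range(n_experts), key=lambda e: scores[e])[-top_k:]
    # Softmax over the chosen experts' scores gives mixing weights.
    z = max(scores[e] for e in chosen)
    w = {e: math.exp(scores[e] - z) for e in chosen}
    total = sum(w.values())
    out = [0.0] * d
    for e in chosen:
        y = matvec(experts[e], x)  # compute happens only for chosen experts
        out = [o + (w[e] / total) * yi for o, yi in zip(out, y)]
    return out, chosen

out, chosen = moe_forward([random.gauss(0, 1) for _ in range(d)])
print(f"ran {len(chosen)} of {n_experts} experts for this token")
```

        Per token, only 2 of the 8 expert matrices are multiplied, even though all 8 occupy memory - the same compute/memory trade-off as the 3B-of-35B case above.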

    • paoliniluis 44 minutes ago
      What's your take on Qwen3.5-35B-A3B vs Qwen3-Coder-Next?
      • kamranjon 29 minutes ago
        In my experience Qwen 3 Coder Next is better. I ran quite a few tests yesterday and it was much better at utilizing tool calls properly and understanding complex code. For its size, though, 3.5 35B was very impressive. Coder Next is an 80B model, so I think it's just a size thing. Also, for whatever reason, Coder Next is faster on my machine; the only model that is competitive in speed is GLM 4.7 Flash.
        • xrd 22 minutes ago
          What do you use as the orchestrator? By this I mean opencode, or the like. Is that the right term?
          • simonw 20 minutes ago
            I use the term "harness" for those - or just "coding agent". I think orchestrator is more appropriate for systems that try to coordinate multiple agents running at the same time.

            This terminology is still very much undefined though, so my version may not be the winning definition.

      • sosodev 38 minutes ago
        In my experience Qwen3.5 is better even at smaller distillations. From what I understand the Qwen3-next series of models was just a test/preview of the architectural changes underpinning Qwen3.5. So Qwen3.5 is a more complete and well trained version of those models.
  • quantum_state 2 minutes ago
    I would second that Qwen3.5 is exceptionally good. As a calibration, I ran the 35B variant locally on an Ada NextGen 24GB card with easy-llm-cli, doing the same tasks as gemini-cli + Gemini 3 Pro, and they were on par … really impressive, and it ran pretty fast …
  • softwaredoug 52 minutes ago
    I wonder why a US lab hasn't dumped truckloads of cash into various laps to ensure these researchers have a place at their lab.
    • velcrovan 27 minutes ago
      What the US has done is dumped truckloads of cash to make it likely that as a legal immigrant you will be abducted and sent to a camp.
    • mft_ 49 minutes ago
      Indeed; or, Europe badly needs a competitive model to hedge against US political nonsense.
    • ecshafer 9 minutes ago
      China is also giving them dump trucks full of cash, though. Plus you have to contend with the nationalism reason (unfortunately this has died off in America for too many). The idea of building up your country is valued by most Chinese people I have met. Plus China is incredibly nice to live in, especially if you have lots of money and/or connections. So you can work in China, get paid lots of money, and feel like you are doing good. Or in America you can get paid lots of money, and get yelled at by people online because the Government wants to use your model.
      • danny_codes 2 minutes ago
        China city life is amazingly convenient. Trains and subways are just such an enormous quality of life boost. Add to that the relative cleanliness of having nearly zero homelessness and you’ve got something very compelling.

        I will say we are winning in accessibility. China doesn’t have much of a ramp game

    • bilbo0s 45 minutes ago
      They probably have tried, but you'd have to offer more cash than those researchers feel they could get by starting their own lab. When you consider that their new startup lab would have the entire nation of China as, in effect, a captive market, you start to see how almost any amount of money would be too little to convince them not to make a run at that new startup. If money is their aim.

      I think Alibaba needs to just give these guys a blank check. Let them fill it in themselves. Absent that, I'm pretty sure they'll make their own startup.

      I do think it'd be a big loss for the rest of the world though if they close whatever model their startup comes up with.

      • simgt 0 minutes ago
        > I do think it'd be a big loss for the rest of the world though if they close whatever model their startup comes up with.

        That's very likely to happen once the gap with OpenAI/Anthropic has been closed and they managed to pop the bubble.

  • butILoveLife 40 minutes ago
    >I’m hearing positive noises about the 27B and 35B models for coding tasks that still fit on a 32GB/64GB Mac

    Isn't it interesting that you never see someone say "I used this on my Mac and it was useful"?

    Instead we get "you could put this on your Mac" or "I tried it, and it worked but it was too slow"

    I feel like these people are doing real harm when they make suggestions that lead others to waste money.

    • kamranjon 34 minutes ago
      I use Qwen 3 Coder Next daily on my Mac as my main coding agent. It is incredibly capable, and it's strange how you're painting this as a fringe use case; there are whole communities that have popped up around running local models.
      • butILoveLife 26 minutes ago
        Can I doubt your claim? I have had such terrible luck with AI coding on <400B models. Not to mention, I imagine your codebase is tiny. Or you are working for some company that isn't keeping track of your productivity.

        I am trying super hard to use cheap models, and outside SOTA models, they have been more trouble than they are worth.

    • simonw 33 minutes ago
      The thing I'm most excited about is the moment that I run a model on my 64GB M2 that can usefully drive a coding agent harness.

      Maybe Qwen3.5-35B-A3B is that model? This comment reports good results: https://news.ycombinator.com/item?id=47249343#47249782

      I need to put that through its paces.
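      Back-of-envelope arithmetic on why a 35B model can fit a 64GB machine (my numbers, not from the thread; treating each format as its nominal bits per parameter and ignoring KV cache and runtime overhead):

```python
# Weight-memory footprint of a 35B-parameter model at common precisions.
# Nominal bits per parameter only; real quant formats (and KV cache,
# activations, OS overhead) add to these figures.
params = 35e9
footprints = {name: params * bits / 8 / 1e9
              for name, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]}
for name, gb in footprints.items():
    print(f"{name}: ~{gb:.1f} GB of weights")
```

      So FP16 weights alone (~70 GB) overflow a 64GB Mac, while an 8-bit or 4-bit quant leaves comfortable headroom, which is why the quantized builds are the ones people actually run locally.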

      • JLO64 27 minutes ago
        Yesterday I test ran Qwen3.5-35B-A3B on my MBP M3 Pro with 36GB via LM Studio and OpenCode. I didn’t have it write code but instead use Rodney (thanks for making it btw!) to take screenshots and write documentation using them. Overall I was pretty impressed at how well it handled the harness and completed the task locally. In the past I would’ve had Haiku do this, but I might switch to doing it locally from now on.
      • xrd 21 minutes ago
        I suppose this shows my laziness because I'm sure you have written extensively about it, but what orchestrator (like opencode) do you use with local models?
        • simonw 18 minutes ago
          I've not really settled on one yet. I've tried OpenCode and Codex CLI, but I know I should give Pi a proper go.

          So far none of them have been useful enough at first glance with a local model for me to stick with them and dig in further.

          • xrd 9 minutes ago
            I've used OpenCode, and the remote free models it defaults to aren't awful, but they're definitely not on par with Gemini CLI or Claude. I'm really interested in trying to find a way to chain multiple local high-end consumer Nvidia cards into an alternative to the big labs' offerings.
    • benatkin 33 minutes ago
      I think this is directing coders towards self-sufficiency and that's a good thing. If they don't end up using it for agentic coding, they can use it for running tests, builds, non-agentic voice controlled coding, video creation, running kubernetes, or agent orchestration. So no, it's not evil, even if it doesn't go quite as expected.
  • airstrike 1 hour ago
    I'm hopeful they will pick up their work elsewhere and continue on this great fight for competitive open weight models.

    To be honest, it's sort of what I expected governments to be funding right now, but I suppose Chinese companies are a close second.

  • skeeter2020 53 minutes ago
    Getting a bit of whiplash going from "AI is replacing people" to "AI is dead without (these specific) people." Surely we're far enough ahead that AI can take it from here?

    Wild times!

    • vidarh 50 minutes ago
      Who is suggesting "AI is dead without (these specific) people"? People are wondering what it means specifically for the Qwen model family.
    • mhitza 47 minutes ago
      We've gone from AGI goals to short-term thinking via ads. That puts things in better perspective, I think.
  • ilaksh 17 minutes ago
    Does anyone know when the small Qwen 3.5 models are going to be on OpenRouter?
  • zoba 1 hour ago
    I tried the new qwen model in Codex CLI and in Roo Code and I found it to be pretty bad. For instance I told it I wanted a new vite app and it just started writing all the files from scratch (which didn’t work) rather than using the vite CLI tool.

    Is there a better agentic coding harness people are using for these models? Based on my experience I can definitely believe the claims that these models are overfit to evals and not broadly capable.

    • sosodev 47 minutes ago
      I've noticed that open weight models tend to hesitate to use tools or commands unless they appeared often in the training or you tell them very explicitly to do so in your AGENTS.md or prompt.

      They also struggle at translating very broad requirements to a set of steps that I find acceptable. Planning helps a lot.

      Regarding the harness, I have no idea how much they differ but I seem to have more luck with https://pi.dev than OpenCode. I think the minimalism of Pi meshes better with the limited capabilities of open models.

  • raffael_de 1 hour ago
    > me stepping down. bye my beloved qwen.

    the qwen is dead, long live the qwen.

  • hwers 32 minutes ago
    My conspiracy-theory hat says that investors with a stake in OpenAI are somehow sabotaging this, like they did when kicking Emad out of Stability AI.
  • multisport 40 minutes ago
    inb4 qwen is less of a supply chain risk than anthropic
  • vonneumannstan 45 minutes ago
    Were they kneecapped by Anthropic blocking their distillation attempts?