The Chinese are doing what they have been doing to the manufacturing industry as well. Take the core technology and just optimize, optimize, optimize for 10x the cost efficiency. As simple as that. Super impressive. These models might be benchmaxxed, but as another comment said, I see so many benchmarks that it might as well be the most impressive benchmaxxing today, if not just a genuinely SOTA open-source model. They even released a closed-source 1-trillion-parameter model today as well that is sitting at No. 3(!) on LM Arena. Even their 80B model is 17th; gpt-oss-120b is 52nd.
https://qwen.ai/blog?id=241398b9cd6353de490b0f82806c7848c5d2...
The biggest takeaway is that they claim SOTA for multi-modal tasks, even ahead of proprietary models, and still released it as open weights. My first tests suggest this might actually be true; will continue testing. Wow
Most multi-modal input implementations suck, and a lot of them suck big time.
Doesn't seem to be far ahead of existing proprietary implementations. But it's still good that someone's willing to push that far and release the results. Getting multimodal input to work even this well is not at all easy.
If you're in SF, you don't want to miss this.
The Qwen team is making their first public appearance in the United States, with the VP of Qwen Lab speaking at the meetup below during SF Tech Week.
https://partiful.com/e/P7E418jd6Ti6hA40H6Qm
Rare opportunity to directly engage with the Qwen team members.
Their A3B Omni paper mentions that the Omni at that size outperformed the (unreleased I guess) VL. Edit: I see now that there is no Omni-235B-A22B; disregard the following. ~~Which is interesting - I'd have expected the larger model to have more weights to "waste" on additional modalities and thus for the opposite to be true (or for the VL to outperform in both cases, or for both to benefit from knowledge transfer).~~
I can see how it would be in China's interest to make sure there was an LLM that produced cutting edge performance in Chinese-language conversations.
And some uses of LLMs are intensely political; think of a student using an LLM to learn about the causes of the civil war. I can understand a country wanting their own LLMs for the same reason they write their own history textbooks.
By releasing the weights they can get free volunteer help, win hearts and minds with their open approach, weaken foreign corporations, give their citizens robust performance in their native language, and exercise narrative control - all at the same time.
They might have dozens of reasons, but they already did what they did.
Some of the reasons could be:
- mitigation of US AI supremacy
- commoditize AI to push innovation forward and sell platforms to run it, e.g. if the iPhone wins local intelligence, it benefits China, because China manufactures those phones
- talent war inside China
- soften the sentiment against China in the US
- they're just awesome people
- and many more
They also released Qwen3-VL Plus [1] today, alongside Qwen3-VL 235B [2], and they don't tell us which one is better.
Also, qwen-plus-2025-09-11 [3] vs qwen3-235b-a22b-instruct-2507 [4]. What's the difference? Who knows.
[1] https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...
[2] https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...
[3] https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...
[4] https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?...
https://openrouter.ai/qwen/qwen3-235b-a22b-thinking-2507
Now I will use this to identify and caption meal pictures and user pictures for other workflows. Very cool!
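For anyone wanting to try a captioning workflow like that, here's a minimal sketch using the standard OpenAI-compatible chat completions format that OpenRouter exposes. The model slug, API key, and image URL below are placeholders/assumptions - check OpenRouter for the exact Qwen3-VL slug:

```python
# Minimal sketch: caption an image via an OpenAI-compatible endpoint.
# Assumptions: OpenRouter as the endpoint and the model slug below;
# verify the exact Qwen3-VL slug at https://openrouter.ai before using.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_API_KEY",  # placeholder
)

resp = client.chat.completions.create(
    model="qwen/qwen3-vl-235b-a22b-instruct",  # assumed slug
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Caption this meal photo in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/meal.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```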
- https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Thinking
- https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct
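If you'd rather run the weights locally, here's a sketch of the usual Qwen-VL transformers pattern. It assumes Qwen3-VL keeps the same processor/chat-template convention as earlier Qwen-VL releases; the model cards above are the authoritative reference for the exact classes:

```python
# Minimal local-inference sketch using the generic transformers auto-classes.
# Assumption: Qwen3-VL follows the same chat-template convention as Qwen2.5-VL;
# check the Hugging Face model card for the recommended class and dtype.
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-235B-A22B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://example.com/meal.jpg"},  # placeholder image
        {"type": "text", "text": "Describe this image briefly."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```

Note that a 235B-A22B MoE won't fit on a single consumer GPU; device_map="auto" will shard across whatever hardware is available, and quantized variants are the more realistic path for most setups.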
Relevant comparison is on page 15: https://arxiv.org/abs/2509.17765