Qwen3-235B-A22B: Best Open-Source Model, Beats Kimi-K2
How to use Qwen3–235B-A22B for free?
After Kimi-K2, we have another big breakthrough in the open-source arena: an updated Qwen3-235B-A22B that beats Kimi-K2 on multiple benchmarks.
The Qwen team just released a new variant: Qwen3-235B-A22B-Instruct-2507. No launch event. No noise. But it quietly delivers one of the most capable open-source models we’ve seen this year. It improves performance, follows instructions better, and is optimized for real-world use.
Mixture of Experts: 235B Params, 22B at Runtime
It uses a Mixture-of-Experts (MoE) architecture. The model has 128 experts, but only 8 are active per token. That brings effective active size down to 22B, while still gaining from the larger network’s training capacity.
This matters: you get the scale of a large model without running it at full size. Lower cost, better throughput, easier to serve. It runs like a 22B, but thinks like something bigger.
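The routing idea is simple: a small router scores all 128 experts for each token, and only the top 8 run. Here is a toy numpy sketch of that top-k routing; the router weights are random and the real MoE layer is far more involved, so treat this as an illustration of the mechanism, not the model's code.

```python
import numpy as np

def moe_route(token_hidden, num_experts=128, top_k=8, d_model=64, seed=0):
    """Toy top-k expert routing, mirroring Qwen3's 128-expert / 8-active design.

    Illustrative sketch only: router weights are random here, and a real MoE
    layer runs the selected experts with learned gating, in parallel.
    """
    rng = np.random.default_rng(seed)
    router_w = rng.standard_normal((d_model, num_experts))

    logits = token_hidden @ router_w       # one score per expert
    top_idx = np.argsort(logits)[-top_k:]  # keep the 8 highest-scoring experts
    gates = np.exp(logits[top_idx])
    gates /= gates.sum()                   # softmax over the chosen 8

    # Only these 8 experts run; the other 120 cost nothing for this token.
    return top_idx, gates

token = np.random.default_rng(1).standard_normal(64)
experts, gates = moe_route(token)
print(len(experts), round(float(gates.sum()), 6))  # 8 1.0
```

Every token still sees the full 128-expert capacity at training time, but inference only ever pays for 8, which is where the 22B active-parameter figure comes from.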
Long Context: 262K Tokens
Qwen3 supports 262,144 tokens of context natively. This makes it useful for large documents, multi-turn conversations, legal texts, long-form code, and agent memory.
For comparison: most models still cap at 32K or 128K, often with tricks to fake longer contexts. Qwen3 handles long-range dependencies straight-up.
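If you want a quick sanity check that a document will fit in the window, a rough budget estimate is enough for planning. The 4-characters-per-token ratio below is a common English-text heuristic, not the tokenizer's real behavior; use the model's tokenizer for exact counts.

```python
def fits_context(text, context_limit=262_144, chars_per_token=4):
    """Rough check that a document fits Qwen3's 262,144-token window.

    chars_per_token=4 is a coarse English-text heuristic; for exact numbers,
    tokenize with the model's own tokenizer instead.
    """
    est_tokens = len(text) // chars_per_token
    return est_tokens, est_tokens <= context_limit

tokens, ok = fits_context("word " * 100_000)  # ~500K characters
print(tokens, ok)  # 125000 True
```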
GQA for Efficient Attention
It uses Grouped Query Attention (GQA): 64 query heads and 4 key/value heads. The design improves memory usage and speeds up attention layers, especially at longer context lengths.
It’s a trade-off that reduces attention cost without giving up performance. Makes it more stable during inference too.
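The core of GQA is that several query heads share one key/value head, so the KV cache shrinks by the sharing ratio (16x at Qwen3's 64:4 configuration). A minimal numpy sketch, with scaled-down head counts so the shapes stay readable; weights are random, this shows the mechanism, not the trained model:

```python
import numpy as np

def gqa_attention(x, n_q_heads=8, n_kv_heads=2, d_head=4, seed=0):
    """Toy Grouped Query Attention: few KV heads shared by many query heads.

    Qwen3 uses 64 query heads and 4 KV heads (a 16:1 ratio); smaller numbers
    are used here for readability.
    """
    rng = np.random.default_rng(seed)
    seq, d_model = x.shape
    wq = rng.standard_normal((d_model, n_q_heads * d_head))
    wk = rng.standard_normal((d_model, n_kv_heads * d_head))
    wv = rng.standard_normal((d_model, n_kv_heads * d_head))

    q = (x @ wq).reshape(seq, n_q_heads, d_head)
    k = (x @ wk).reshape(seq, n_kv_heads, d_head)  # KV cache: only 2 heads
    v = (x @ wv).reshape(seq, n_kv_heads, d_head)

    group = n_q_heads // n_kv_heads  # query heads served by each KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group  # the shared KV head for this query head
        scores = q[:, h] @ k[:, kv].T / np.sqrt(d_head)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[:, h] = weights @ v[:, kv]
    return out.reshape(seq, -1)

x = np.random.default_rng(1).standard_normal((5, 16))
print(gqa_attention(x).shape)  # (5, 32)
```

The output has the full per-head width, but the cache only ever stores the small number of KV heads, which is exactly what keeps long-context inference affordable.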
Instruct Variant
This version, Instruct-2507, is tuned to follow instructions properly. It handles multi-step queries, long-form reasoning, and structured prompts better than the base model.
The earlier version often missed parts of the prompt or responded vaguely. This one stays on-topic and gives direct answers. Alignment is better, without overdoing it.
Benchmarks: Better Across the Board

The model sees solid gains in real benchmarks:
- Reasoning: Improves scores on AIME25, ARC-AGI, ZebraLogic. These aren’t simple multiple-choice tests. They test logic and reasoning. This model holds up.
- QA: SimpleQA and CSimpleQA results jumped sharply. The instruct tuning helped it give factual, relevant answers.
- Multilingual: Performs more consistently across languages. Fewer random gaps in knowledge outside English.
- Coding: LiveCodeBench shows better code generation and execution accuracy compared to older versions.
- Writing & Dialogue: More coherent outputs in WritingBench and Arena-Hard. Less filler, fewer made-up answers.
It beats Kimi-K2 on multiple benchmarks.
No Thinking Mode
Unlike some recent models, this one doesn’t use a thinking mode: no <think></think> tags or internal reasoning stages. That’s good if you want clean, usable outputs without extra parsing. It makes the model easier to plug into agents or pipelines, with less overhead.
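Concretely, with thinking-mode models you typically have to strip the reasoning block before using the answer; with Instruct-2507 the same pipeline step becomes a no-op. A small sketch of that parsing step:

```python
import re

def clean_output(raw):
    """Strip <think>...</think> blocks that thinking-mode models emit.

    For Qwen3-235B-A22B-Instruct-2507 this is a no-op: the model returns
    the final answer directly, so no post-processing is needed.
    """
    return re.sub(r"<think>.*?</think>\s*", "", raw, flags=re.DOTALL).strip()

# A thinking-mode output needs stripping; Instruct-2507 output passes through unchanged.
print(clean_output("<think>chain of thought...</think>The answer is 42."))  # The answer is 42.
print(clean_output("The answer is 42."))                                    # The answer is 42.
```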
Tools Integration via Qwen-Agent
It works with Qwen-Agent, which supports tools out of the box using the Model Context Protocol (MCP). This setup allows the model to call external tools (like web search, code interpreters, etc.) with minimal setup.
The tool-calling is structured, readable, and easy to extend. Useful for agents, workflows, and automation tasks.
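A configuration sketch of what this wiring looks like. The dict shapes follow Qwen-Agent's published examples, but the local endpoint, the MCP server command, and the exact field names are assumptions; check the Qwen-Agent docs for your version.

```python
# Sketch: running the model behind Qwen-Agent with MCP tools.
# Endpoint URL and the fetch MCP server below are illustrative assumptions.

llm_cfg = {
    "model": "Qwen3-235B-A22B-Instruct-2507",
    "model_server": "http://localhost:8000/v1",  # any OpenAI-compatible endpoint
    "api_key": "EMPTY",
}

tools = [
    # MCP servers expose external capabilities (web fetch, etc.) via the protocol.
    {"mcpServers": {
        "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
    }},
    "code_interpreter",  # a built-in Qwen-Agent tool
]

# With qwen_agent installed, wiring it up would look roughly like:
#   from qwen_agent.agents import Assistant
#   bot = Assistant(llm=llm_cfg, function_list=tools)
#   for chunk in bot.run(messages=[{"role": "user", "content": "Summarize this page: ..."}]):
#       print(chunk)

print(llm_cfg["model"], len(tools))
```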
Easy to Run
You can serve this model on:
- Transformers 4.51+
- vLLM, SGLang
- LMStudio, Ollama, llama.cpp
- OpenAI-compatible APIs
No proprietary stack needed. It can run locally or scale in production, depending on what you need.
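Because every serving option above speaks the same OpenAI-compatible protocol, the client code is identical regardless of backend. The payload below is the standard chat-completions format; the localhost URL and the vLLM serve command in the comments are assumptions about a typical local setup.

```python
import json

# Standard chat-completions payload; works against vLLM, SGLang, Ollama, etc.
payload = {
    "model": "Qwen/Qwen3-235B-A22B-Instruct-2507",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize GQA in one sentence."},
    ],
    "max_tokens": 256,
}

# With a server running (e.g. `vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507`):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:8000/v1/chat/completions",
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())

print(json.dumps(payload)[:40])
```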
How to use Qwen3–235B-A22B for free?
The model can be accessed on Hugging Face:
Qwen/Qwen3-235B-A22B-Instruct-2507 · Hugging Face
Sample code is provided on the same page.
Final Thought
This model isn’t a demo. It’s made for use. If you’re working on anything that needs instruction-following, long context, tool calls, or multilingual QA, Qwen3–235B-A22B-Instruct-2507 is worth trying.
Not experimental. Just solid.
Qwen3–235B-A22B : Best Open-sourced model, beats Kimi-K2 was originally published in Data Science in Your Pocket on Medium, where people are continuing the conversation by highlighting and responding to this story.