Qwen3: Best Open-Sourced LLM, beats DeepSeek-R1, Llama4
How to use Qwen3 for free?
The much-anticipated Qwen3 has been released, and it looks like a monster of a model series. Developed by Alibaba Group, it comes in multiple variants ranging from 0.6B to 235B parameters.
Key Features of Qwen3


1. Model Variants & Open Weights
- MoE (Mixture of Experts) Models:
Qwen3-235B-A22B: 235B total params, 22B activated
Qwen3-30B-A3B: 30B total params, 3B activated
- Dense Models (Apache 2.0 licensed):
Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, Qwen3-0.6B
- Supports long context (up to 128K tokens for the larger models, 32K for the smaller ones); a quick way to check a checkpoint's advertised window is sketched below.
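To sanity-check the context window of a given checkpoint, you can read it straight from its Hugging Face config. This is only a sketch: the repo id Qwen/Qwen3-0.6B and the max_position_embeddings field are assumptions based on the usual transformers config layout, not something stated in this post.

```python
from transformers import AutoConfig

# Assumed repo id for the smallest dense checkpoint; swap in any Qwen3 variant.
cfg = AutoConfig.from_pretrained("Qwen/Qwen3-0.6B")

# Most transformers configs expose the native context window here (assumption).
print(cfg.max_position_embeddings)
```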
2. Hybrid Thinking Modes
- Thinking Mode: Deep, step-by-step reasoning for complex tasks.
- Non-Thinking Mode: Fast, direct responses for simpler queries.
- Enables better control of the computational budget (a trade-off between speed and accuracy); see the usage sketch below.
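Here is a minimal sketch of toggling between the two modes with transformers, assuming the Qwen3 chat template exposes an enable_thinking flag (as the official model cards describe) and using Qwen/Qwen3-8B as the checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"  # assumed repo id; any Qwen3 chat checkpoint should work
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Is 9973 a prime number? Explain briefly."}]

# enable_thinking=True asks the template for step-by-step reasoning before the answer;
# set it to False for fast, direct responses on simpler queries.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```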
3. Multilingual Support (119 Languages & Dialects)
- Covers major language families:
Indo-European (English, French, Hindi, etc.)
Sino-Tibetan (Chinese, Cantonese, Burmese)
Afro-Asiatic (Arabic, Hebrew)
Austronesian (Indonesian, Tagalog)
Dravidian (Tamil, Telugu)
Turkic (Turkish, Uzbek)
Others (Japanese, Korean, Swahili, etc.)
4. Improved Agent & Coding Capabilities
- Enhanced interaction with environments (e.g., code execution, tool use).
- Strengthened support for MCP (Model Context Protocol) and tool calling, for better agentic and coding workflows; a hedged tool-calling sketch follows below.
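As a rough illustration of the tool-use side (not the MCP wire protocol itself), here is a sketch that leans on the tools argument of transformers' apply_chat_template; the get_weather function and the Qwen/Qwen3-8B repo id are made-up placeholders:

```python
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny in {city}"  # placeholder implementation

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")  # assumed repo id
messages = [{"role": "user", "content": "What's the weather like in Mumbai right now?"}]

# The chat template serialises the tool's JSON schema into the prompt; the model is then
# expected to emit a structured tool call that your agent loop parses and executes.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```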
5. Pre-Training Advancements
- 36 trillion tokens (roughly double the Qwen2.5 pre-training corpus).
- Multi-stage training:
S1: 4K context, 30T tokens (general knowledge).
S2: Increased STEM/coding data (5T tokens).
S3: Extended to 32K context for long-context handling.
- Synthetic data generation using Qwen2.5-Math & Qwen2.5-Coder.
6. Post-Training Optimization
- Four-stage pipeline:
Long Chain-of-Thought (CoT) cold start (math, coding, reasoning).
Reasoning-based Reinforcement Learning (RL) (rule-based rewards).
Thinking mode fusion (blending fast & deep reasoning).
General RL fine-tuning (20+ tasks for instruction following, agent skills).
Benchmarks and Metrics
Summarising the above benchmarks
1. General Knowledge & Reasoning
- Outperforms DeepSeek-V3, LLaMA-4-Maverick, and Qwen2.5 on MMLU (general knowledge) and BBH (complex reasoning).
- Strong in expert-level STEM reasoning (SuperGPQA), beating models like DeepSeek-V3 and LLaMA-4.
2. Mathematics & Science
- Superior in math benchmarks (GSM8K, MATH), surpassing Qwen2.5, LLaMA-4, and DeepSeek-V3.
- Leads in advanced STEM reasoning (GPQA), outperforming Gemini 2.5-Pro and DeepSeek-V3.
3. Coding & Programming
- Dominates code generation (LiveCodeBench, EvalPlus), beating GPT-4o, DeepSeek-V3, and LLaMA-4.
- Strong in Python programming (MBPP), exceeding Qwen2.5 and other open models.
4. Multilingual Tasks
- Best-in-class for multilingual understanding (MGSM, MMMLU), surpassing DeepSeek-V3 and Qwen2.5.
5. Efficiency (MoE Advantage)
- Qwen3 MoE models (e.g., 235B-A22B) match or exceed larger dense models (like Qwen2.5-72B) while using far fewer active parameters.
- Smaller MoE models (e.g., 30B-A3B) efficiently outperform much larger dense models; the active-parameter arithmetic is sketched below.
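For intuition, the efficiency claim comes down to how few weights are touched per token. A quick back-of-the-envelope using the parameter counts quoted above:

```python
# Active vs. total parameters per token for the MoE variants (numbers from the post).
moe_variants = {
    "Qwen3-235B-A22B": (235, 22),
    "Qwen3-30B-A3B": (30, 3),
}
for name, (total_b, active_b) in moe_variants.items():
    print(f"{name}: {active_b}B of {total_b}B params active per token (~{active_b / total_b:.0%})")
```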
How to use Qwen3?
As mentioned, the model weights are open source and can be found on Hugging Face.
Also, if you wish to try the model without deploying it yourself, you can use qwen.chat or Hugging Face Spaces.
The models are also available on Ollama if you prefer running LLMs locally; a minimal sketch with the Ollama Python client follows.
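Here is that local-inference sketch; the qwen3:8b tag is an assumption, so check Ollama's model library for the exact names and pull the model first:

```python
import ollama  # pip install ollama; assumes the Ollama server is running locally

response = ollama.chat(
    model="qwen3:8b",  # assumed tag; fetch it beforehand with `ollama pull qwen3:8b`
    messages=[{"role": "user", "content": "In one line, what is a Mixture-of-Experts model?"}],
)
print(response["message"]["content"])
```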
With this, it’s a wrap. I hope you try out the new Qwen3 series.