ByteDance Seed-Thinking-v1.5 key features
After a three-month reign, DeepSeek-R1 has been dethroned. And by whom? By ByteDance's Seed-Thinking-v1.5, which now claims to be the best reasoning LLM after Gemini 2.5 Pro.
What is Seed-Thinking-v1.5?
Seed-Thinking-v1.5 is an advanced reasoning AI model developed by ByteDance that specializes in solving complex problems by “thinking step-by-step” before giving an answer. It combines two key technologies:
- Mixture-of-Experts (MoE): Instead of running the entire network for every token, it activates only the experts it needs (20 billion of its 200 billion total parameters), making it faster and more efficient.
- Reinforcement Learning (RL): The AI improves by learning from feedback, similar to how humans learn from rewards and mistakes.
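The MoE idea above can be sketched in a few lines: a router scores every expert for each token, but only the top-k experts actually run. Everything here (expert count, dimensions, linear "experts") is a toy stand-in, not Seed-Thinking-v1.5's actual architecture:

```python
# Toy Mixture-of-Experts top-k routing (illustrative only; sizes and
# names are hypothetical, not Seed-Thinking-v1.5's real code).
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 10   # total experts (stand-in for the full 200B params)
TOP_K = 1          # experts activated per token (stand-in for the active 20B)
DIM = 16

# Each "expert" is just a linear layer here.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS))  # learned gating weights

def moe_forward(x):
    """Route token x to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]   # indices of the chosen experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                # softmax over the chosen experts only
    # Only TOP_K of NUM_EXPERTS experts run -> ~TOP_K/NUM_EXPERTS of the compute.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(DIM)
out = moe_forward(token)
print(out.shape)
```

With 1 of 10 experts active per token, only ~10% of the expert parameters are touched — the same ratio (20B of 200B) the model reports.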
How Was It Trained?
Supervised Fine-Tuning (SFT):
- First, the AI was trained on 400,000 examples, including:
  - 300,000 structured problems (math, coding, logic puzzles).
  - 100,000 open-ended tasks (creative writing, conversations).
- This helps the AI understand different types of questions before moving to advanced training.
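The 300k/100k split above amounts to a 75/25 sampling mixture. A toy sketch of drawing a mixed SFT batch from it (the category names and sampling scheme are illustrative, not ByteDance's pipeline):

```python
# Sketch of sampling from the SFT data mixture described above.
# Counts come from the post; everything else is illustrative.
import random

random.seed(0)

mixture = {"structured": 300_000, "open_ended": 100_000}
total = sum(mixture.values())
weights = {k: v / total for k, v in mixture.items()}  # 0.75 / 0.25

# Draw one training batch's category labels in proportion to the mix.
batch = random.choices(list(weights), weights=list(weights.values()), k=8)
print(batch)
```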
Reinforcement Learning (RL):
- The AI then practiced solving problems and received feedback on its answers.
Key Improvements in RL Training:
- Value-Pretraining: The value model first learns from simple, correct answers before tackling harder ones.
- Decoupled-GAE: Uses separate advantage-estimation settings for the policy and the value function, balancing short-term vs. long-term reasoning signals.
- Online Data Distribution Adaptation: Rebalances training between easy and hard problems on the fly so the AI doesn't get stuck.
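Decoupled-GAE builds on Generalized Advantage Estimation. The sketch below runs GAE twice over the same rewards and values — one lambda for the policy's advantages, a different one for the critic's targets. The exact parameter choices and decoupling scheme are assumptions for illustration, not ByteDance's published formulation:

```python
# Sketch of decoupled GAE: the same trajectory is folded with two
# different lambda values, one for the policy and one for the value
# targets. Numbers below are illustrative.
def gae(rewards, values, gamma, lam):
    """Standard GAE: A_t = sum_l (gamma*lam)^l * delta_{t+l}."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_v - values[t]   # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

rewards = [0.0, 0.0, 1.0]   # sparse reward at the end, as in reasoning RL
values  = [0.2, 0.5, 0.8]   # critic's estimates at each step

# Decoupled: long-horizon credit for the policy, shorter for the critic.
policy_adv = gae(rewards, values, gamma=1.0, lam=1.0)    # policy-side lambda
value_adv  = gae(rewards, values, gamma=1.0, lam=0.95)   # value-side lambda
value_targets = [a + v for a, v in zip(value_adv, values)]
print(policy_adv, value_targets)
```

With lam=1.0 the policy advantage at step 0 collapses to the full return minus the value estimate (1.0 - 0.2 = 0.8), while the smaller value-side lambda shortens the critic's credit horizon.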
How Does It Check Its Own Answers?
Instead of just saying “right” or “wrong,” Seed-Thinking-v1.5 uses two smart verification methods:
- Seed-Verifier: A basic checker that confirms whether answers match expected results.
- Seed-Thinking-Verifier: A more advanced checker that explains why an answer is right or wrong, reducing errors and reward hacking (cheating).
For creative tasks (like writing), it uses a Pairwise Reward Model, which compares two responses and picks the better one based on human preferences.
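A toy sketch of these two reward signals: a rule-based match for verifiable answers, and a Bradley-Terry-style pairwise preference score for open-ended text. Function names and logic are illustrative assumptions, not the actual Seed-Verifier implementation:

```python
# Illustrative sketches of the two reward signals described above.
import math

def seed_verifier(predicted: str, reference: str) -> float:
    """Binary check: does the final answer match the reference? (sketch)"""
    return 1.0 if predicted.strip() == reference.strip() else 0.0

def pairwise_preference(score_a: float, score_b: float) -> float:
    """Bradley-Terry style probability that response A beats response B."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

print(seed_verifier("42", " 42 "))    # exact match after stripping
print(pairwise_preference(2.0, 1.0))  # A scored higher, so P(A wins) > 0.5
```

The pairwise form matters for creative tasks: there is no single "correct" answer to match, so the reward model only has to say which of two responses humans would prefer.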
Benchmarks and metrics

1. AIME 2024 & AIME 2025 — MATH DOMINANCE
- Seed-Thinking: 86.7% (2024), 74.0% (2025)
- DeepSeek: 79.8%, 65.0%
Why it matters: AIME (American Invitational Mathematics Exam) is notoriously tricky — it requires multi-step reasoning, not just formulas.
Verdict: Seed-Thinking is clearly better at structured mathematical reasoning.
2. Beyond AIME — EVEN TOUGHER MATH
- Seed-Thinking: 48.0%
- DeepSeek: 42.4%
This benchmark includes problems even harder than AIME, so the gap here is meaningful.
Verdict: Seed-Thinking handles abstract and multi-step problem solving better under pressure.
3. Codeforces — COMPETITIVE PROGRAMMING
- Seed-Thinking: 55.0%
- DeepSeek: 45.0%
This is all about algorithmic thinking, not just syntax.
Verdict: Seed-Thinking writes more correct, logic-tight code. Less likely to flub edge cases.
4. SWE-bench — REAL-WORLD SOFTWARE ENGINEERING
- Seed-Thinking: 47.0%
- DeepSeek: 49.2%
The one benchmark where DeepSeek edges out Seed-Thinking, by about two points.
Verdict: DeepSeek might still have a slight edge in dev tool usage or structured software tasks, but it’s basically a draw.
5. GPQA Diamond — HARD QA ON SCIENCE & TECH
- Seed-Thinking: 77.3%
- DeepSeek: 71.5%
This is about graduate-level science questions, where precision matters.
Verdict: Seed-Thinking offers more accurate, fact-grounded answers.
6. ARC-AGI — AGENT-LIKE PROBLEM SOLVING
- Seed-Thinking: 39.9%
- DeepSeek: 18.3%
This one is huge. ARC-AGI is about abstract reasoning and general intelligence — patterns, concept transfer, and analogies.
Verdict: Seed-Thinking performs over 2x better here — the widest gap of any benchmark, and the strongest signal of general abstract-reasoning ability.
The model weights have not been open-sourced yet. The GitHub repo is public, though, and I expect the weights to be released soon.
Read more about the model and its key features here:
GitHub – ByteDance-Seed/Seed-Thinking-v1.5
Seed-Thinking v1.5: New reasoning model beats DeepSeek-R1 was originally published in Data Science in Your Pocket on Medium.