Seed-Thinking v1.5: New reasoning model beats DeepSeek-R1

ByteDance Seed-Thinking-v1.5 key features

Roughly three months after its release, DeepSeek-R1 has finally been beaten. By whom? By ByteDance's Seed-Thinking-v1.5, which now claims to be the best reasoning LLM behind Gemini 2.5 Pro.

What is Seed-Thinking-v1.5?

Seed-Thinking-v1.5 is an advanced reasoning AI model developed by ByteDance that specializes in solving complex problems by “thinking step-by-step” before giving an answer. It combines two key technologies:

  1. Mixture-of-Experts (MoE): Instead of using all of the model at once, it activates only the necessary experts (roughly 20 billion of its 200 billion total parameters) for each input, making it faster and more efficient (see the sketch after this list).
  2. Reinforcement Learning (RL): The AI improves by learning from feedback, similar to how humans learn from rewards and mistakes.
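
To make the MoE idea concrete, here is a minimal sketch of sparse expert routing in PyTorch. The expert count, layer sizes, and top-k value are illustrative assumptions, not Seed-Thinking-v1.5's actual configuration; the point is simply that only the selected experts run for each token.

```python
# Minimal sketch of sparse MoE routing. The expert count, sizes, and top-k
# are illustrative assumptions, NOT Seed-Thinking-v1.5's actual configuration.
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # scores every expert per token
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model),
                           nn.GELU(),
                           nn.Linear(4 * d_model, d_model))
             for _ in range(n_experts)]
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (num_tokens, d_model)
        weights, idx = torch.topk(self.router(x), self.top_k, dim=-1)
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the chosen top-k experts run for each token; the rest stay idle.
        # This is how a very large model can activate only a fraction of its
        # parameters on any single input.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(8, 512)
print(layer(tokens).shape)   # torch.Size([8, 512])
```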

How Was It Trained?

Supervised Fine-Tuning (SFT):

  • First, the AI was trained on 400,000 examples, including:
      • 300,000 structured problems (math, coding, logic puzzles).
      • 100,000 open-ended tasks (creative writing, conversations).
  • This helps the AI understand different types of questions before moving to advanced training.
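
As a rough illustration of that data mix, the sketch below samples training categories in proportion to the reported 300k/100k split. The split sizes and domain names come from the article; the category keys and the sampler itself are just an assumption-level illustration.

```python
import random

# Reported SFT mix: 300k structured + 100k open-ended examples (~75% / 25%).
# Domain names follow the article; the sampler is illustrative only.
SFT_MIX = {
    "structured": {"count": 300_000, "domains": ["math", "coding", "logic puzzles"]},
    "open_ended": {"count": 100_000, "domains": ["creative writing", "conversations"]},
}

def sample_category(mix=SFT_MIX):
    """Pick a category with probability proportional to its example count."""
    names = list(mix)
    weights = [mix[name]["count"] for name in names]
    return random.choices(names, weights=weights, k=1)[0]

# Over many draws, roughly 3 in 4 samples come from the structured pool.
draws = [sample_category() for _ in range(10_000)]
print({name: draws.count(name) for name in SFT_MIX})
```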

Reinforcement Learning (RL):

  • The AI then practiced solving problems and received feedback on its answers.

Key Improvements in RL Training:

Value-Pretraining: The AI first learns from simple, correct answers before tackling harder ones.

Decoupled-GAE: Adjusts how the AI weighs short-term vs. long-term reasoning to avoid mistakes.

Online Data Distribution Adaptation: Balances training between easy and hard problems to prevent the AI from getting stuck.
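
Of these, Decoupled-GAE is the easiest to sketch. Below is a plain-NumPy version of Generalized Advantage Estimation with separate λ values for the policy advantage and the value target, which is the general idea behind decoupling short-horizon and long-horizon credit assignment. The specific λ values, and any other details of ByteDance's implementation, are assumptions here.

```python
import numpy as np

def gae(rewards, values, gamma=1.0, lam=0.95):
    """Standard GAE: A_t = sum_l (gamma*lam)^l * delta_{t+l},
    with delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    next_values = np.append(values[1:], 0.0)          # V after the last step treated as 0
    deltas = rewards + gamma * next_values - values
    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages

def decoupled_gae(rewards, values, gamma=1.0, lam_policy=0.95, lam_value=1.0):
    """Decoupled variant: a shorter-horizon lambda shapes the policy gradient,
    while a longer-horizon lambda builds lower-bias value targets."""
    policy_advantages = gae(rewards, values, gamma, lam_policy)
    value_targets = gae(rewards, values, gamma, lam_value) + values
    return policy_advantages, value_targets

# Toy rollout: sparse reward of 1.0 only at the final step.
rewards = np.array([0.0, 0.0, 0.0, 1.0])
values = np.array([0.2, 0.3, 0.5, 0.8])
adv, targets = decoupled_gae(rewards, values)
print(adv, targets)
```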

How Does It Check Its Own Answers?

Instead of just saying “right” or “wrong,” Seed-Thinking-v1.5 uses two smart verification methods:

Seed-Verifier: A basic checker that confirms if answers match expected results.

Seed-Thinking-Verifier: A more advanced checker that explains why an answer is right or wrong, reducing errors and reward hacking (the model gaming the checker).

For creative tasks (like writing), it uses a Pairwise Reward Model, which compares two responses and picks the better one based on human preferences.
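
To show how these pieces might fit into the RL reward signal, here is a toy routing sketch: verifiable tasks go through a correctness check, open-ended tasks through a pairwise comparison. The function names and scoring rules are hypothetical stand-ins, not ByteDance's actual verifiers or reward model.

```python
# Hypothetical reward routing: correctness check for verifiable tasks,
# pairwise preference for creative tasks. Illustrative only.

def seed_verifier(answer: str, reference: str) -> float:
    """Basic checker: reward 1.0 if the final answer matches the reference."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def pairwise_reward(response_a: str, response_b: str) -> str:
    """Placeholder pairwise reward model: prefers the longer, more detailed
    response. A real model would be trained on human preference data."""
    return "a" if len(response_a) >= len(response_b) else "b"

def reward_for(task_type: str, **kwargs) -> float:
    if task_type == "verifiable":          # math, code, logic: check correctness
        return seed_verifier(kwargs["answer"], kwargs["reference"])
    else:                                  # creative tasks: compare two candidates
        winner = pairwise_reward(kwargs["response_a"], kwargs["response_b"])
        return 1.0 if winner == "a" else 0.0

print(reward_for("verifiable", answer="42", reference="42"))                 # 1.0
print(reward_for("creative", response_a="a long, detailed story...",
                 response_b="ok"))                                           # 1.0
```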

Benchmarks and metrics

1. AIME 2024 & AIME 2025 — MATH DOMINANCE

  • Seed-Thinking: 86.7% (2024), 74.0% (2025)
  • DeepSeek: 79.8% (2024), 65.0% (2025)

Why it matters: AIME (American Invitational Mathematics Exam) is notoriously tricky — it requires multi-step reasoning, not just formulas.

Verdict: Seed-Thinking is clearly better at structured mathematical reasoning.

2. Beyond AIME — EVEN TOUGHER MATH

  • Seed-Thinking: 48.0%
  • DeepSeek: 42.4%

This benchmark includes problems even harder than AIME, so the gap here matters.

Verdict: Seed-Thinking handles abstract and multi-step problem solving better under pressure.

3. Codeforces — COMPETITIVE PROGRAMMING

  • Seed-Thinking: 55.0%
  • DeepSeek: 45.0%

This is all about algorithmic thinking, not just syntax.

Verdict: Seed-Thinking writes more correct, logic-tight code. Less likely to flub edge cases.

4. SWE-bench — REAL-WORLD SOFTWARE ENGINEERING

  • Seed-Thinking: 47.0%
  • DeepSeek: 49.2%

This is the one benchmark where DeepSeek edges ahead, and only by a hair.

Verdict: DeepSeek might still have a slight edge in dev tool usage or structured software tasks, but it’s basically a draw.

5. GPQA Diamond — HARD QA ON SCIENCE & TECH

  • Seed-Thinking: 77.3%
  • DeepSeek: 71.5%

This is about graduate-level science questions, where precision matters.

Verdict: Seed-Thinking offers more accurate, fact-grounded answers.

6. ARC-AGI — AGENT-LIKE PROBLEM SOLVING

  • Seed-Thinking: 39.9%
  • DeepSeek: 18.3%

This one is huge. ARC-AGI is about abstract reasoning and general intelligence — patterns, concept transfer, and analogies.

Verdict: Seed-Thinking performs more than 2x better here, a big gap on exactly the kind of abstract reasoning ARC-AGI is designed to test.

The model has not been open-sourced yet, but the GitHub repo is already live, so I expect the weights to be released soon.

Read more about the model and its key features here:

GitHub – ByteDance-Seed/Seed-Thinking-v1.5

