Tiny-R1: 32B model achieves DeepSeek-R1 performance

Tiny-R1 has just 5% of DeepSeek-R1's parameters

DeepSeek-R1 took the world by storm a few days back. Being both very capable and open source, it quickly became everyone's first choice. Unfortunately, even with open weights, the model can't be run locally by common folks.

Why? It's huge: 671B parameters, to be precise.

For anyone with modest hardware, running DeepSeek-R1 is still a dream. But not any longer: a new model, Tiny-R1, with just 32B parameters (about 5% of DeepSeek-R1's total), has almost matched its performance on major benchmarks.

What is Tiny-R1 32B?

The Tiny-R1–32B-Preview model by Qihoo360 is a first-generation reasoning model designed to deliver near-R1 performance while using only 5% of the parameters of the full R1 model. It is optimized using SuperDistillation and outperforms several larger models, such as Deepseek-R1-Distill-Llama-70B, particularly on math, coding, and science tasks.

What is SuperDistillation?

SuperDistillation is a technique that refines the process of knowledge distillation (transferring the knowledge of a large model to a smaller one). While traditional distillation trains a smaller model (the student) to replicate the behavior of a larger, pre-trained model (the teacher), superdistillation enhances this by transferring more fine-grained knowledge, such as internal representations or intermediate features, in addition to the final outputs. This leads to more efficient and effective student models.
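To make this concrete, here is a minimal sketch of a distillation loss that combines hard labels, softened teacher logits, and a feature-matching term. This illustrates the general idea only; the exact loss used for Tiny-R1 is not public, and the weights and temperature below are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      student_hidden, teacher_hidden,
                      temperature=2.0, alpha=0.5, beta=0.1):
    """Classic knowledge distillation plus a feature-matching term.

    alpha, beta, and temperature are illustrative, not Tiny-R1's values.
    """
    # Hard-label cross-entropy on the ground truth
    ce = F.cross_entropy(student_logits, labels)

    # KL divergence against the teacher's softened output distribution
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Feature matching on intermediate representations: the
    # "fine-grained knowledge" transfer described above
    feat = F.mse_loss(student_hidden, teacher_hidden)

    return alpha * ce + (1 - alpha) * kl + beta * feat
```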

Key Features:

  • Performance: Tiny-R1–32B-Preview achieves high scores across all three target domains:

| Domain | Tiny-R1–32B-Preview | Deepseek-R1 | R1-Distill-Qwen-32B | R1-Distill-Llama-70B |
| --- | --- | --- | --- | --- |
| Math | 78.1 | 79.8 | 72.6 | 70.0 |
| Coding | 61.6 | 65.9 | 57.2 | 57.5 |
| Science | 65.0 | 71.5 | n/a | 65.2 |

In math, Tiny-R1–32B-Preview (78.1) is very close to Deepseek-R1 (79.8), while both distill models lag behind. In coding, it outperforms both distill models, though Deepseek-R1 still leads (65.9). In science, it is essentially level with Deepseek-R1-Distill-Llama-70B (65.2) but falls behind Deepseek-R1 (71.5).

How was Tiny-R1 trained?

Base Model Selection:

  • The team started with Deepseek-R1-Distill-Qwen-32B, a 32B model already distilled from DeepSeek-R1.

Supervised Fine-Tuning (SFT):

  • They applied Supervised Fine-Tuning (SFT) to adapt the model to three specific domains: Mathematics, Code, and Science.
  • This involves training the model on domain-specific data to specialize it for each task; a minimal sketch follows after the framework note below.

360-LLaMA-Factory Framework:

  • The fine-tuning was done using the 360-LLaMA-Factory training framework, which is designed to efficiently train large models on specialized tasks.
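To illustrate the SFT step, here is a minimal sketch using Hugging Face TRL as a stand-in for 360-LLaMA-Factory. The base model name comes from the step above; the dataset file, its "text" field, and every hyperparameter are illustrative assumptions.

```python
# Minimal SFT sketch with Hugging Face TRL (a stand-in for the
# 360-LLaMA-Factory pipeline actually used by the team).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical JSONL of {"text": question + R1-generated answer} examples
dataset = load_dataset("json", data_files="math_seed_responses.jsonl", split="train")

trainer = SFTTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",  # base model from step 1
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="tinyr1-math",        # repeat per domain: math, code, science
        num_train_epochs=3,              # assumed hyperparameters
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        bf16=True,
    ),
)
trainer.train()
```

Running this once per domain (with the corresponding dataset) yields the three specialized checkpoints described below.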

Using Open-Source Data:

  • For each domain, open-source data was used as seeds (starting points).
  • These seeds consisted of questions in Math, Code, and Science to help the model learn task-specific knowledge.

Generating Responses with Deepseek-R1:

  • The model, Deepseek-R1, was used to generate appropriate responses for each domain (Math, Code, and Science) based on the seed questions.
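A sketch of what this data-generation step might look like, assuming access to DeepSeek-R1 through an OpenAI-compatible endpoint. The endpoint URL, model identifier, and file names are hypothetical.

```python
# Hedged sketch: generate SFT training examples by asking DeepSeek-R1
# to answer each open-source seed question.
import json
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")  # hypothetical endpoint

with open("math_seed_questions.jsonl") as src, open("math_seed_responses.jsonl", "w") as dst:
    for line in src:
        question = json.loads(line)["question"]
        reply = client.chat.completions.create(
            model="deepseek-r1",  # assumed model identifier
            messages=[{"role": "user", "content": question}],
        )
        # Store question + R1 answer as one SFT training example
        record = {"text": question + "\n" + reply.choices[0].message.content}
        dst.write(json.dumps(record) + "\n")
```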

Creating Specialized Models:

  • From this, three specialized models were created, one for each domain: Math Model, Code Model, and Science Model.

Combining Models Using Mergekit:

  • The team then used the Mergekit tool (developed by the Arcee team) to combine these three specialized models into one unified model.
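A sketch of how such a merge can be driven with mergekit's `mergekit-yaml` entry point. The merge method, weights, and checkpoint paths are assumptions; the team has not published its exact configuration in this post.

```python
# Hedged sketch: merge the three domain experts into one model with mergekit.
import subprocess
import textwrap

config = textwrap.dedent("""\
    models:
      - model: ./tinyr1-math
      - model: ./tinyr1-code
      - model: ./tinyr1-science
    merge_method: linear          # assumed; mergekit also offers ties, slerp, etc.
    parameters:
      weight: 0.33                # equal weighting, purely illustrative
    dtype: bfloat16
""")

with open("merge_config.yaml", "w") as f:
    f.write(config)

# mergekit's CLI entry point: merge per the config into the output directory
subprocess.run(["mergekit-yaml", "merge_config.yaml", "./TinyR1-32B-Preview"], check=True)
```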

Creating Tiny-R1–32B-Preview:

  • The final result was Tiny-R1–32B-Preview, a compact model that demonstrates strong performance across all three domains.

How to use Tiny-R1?

The model is open-sourced and the weights are available on Hugging Face: qihoo360/TinyR1-32B-Preview
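A minimal way to load it with Hugging Face transformers; the prompt and generation settings below are illustrative, not the model card's recommendations.

```python
# Minimal inference example for Tiny-R1 with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/TinyR1-32B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=2048)  # reasoning models need long outputs
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```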

Hope you try out the model!

