Tiny-R1: 32B model achieves DeepSeek-R1 performance

Tiny-R1 has just 5% of DeepSeek-R1's parameters

DeepSeek-R1 took the world by storm a few days back. Being both highly capable and open source, it quickly became everyone's first choice. But unfortunately, even with the weights open-sourced, most people can't run the model locally.

Why? It's huge: 671B parameters, to be precise.

For anyone with modest hardware, running DeepSeek-R1 has been a distant dream. Not any longer: a new model, Tiny-R1, with just 32B parameters (about 5% of DeepSeek-R1's total), has almost matched its performance on major benchmarks.

What is Tiny-R1 32B?

The Tiny-R1-32B-Preview model by Qihoo360 is a first-generation reasoning model designed to deliver near-R1 performance while using only about 5% of the parameters of the full R1 model. It is optimized using SuperDistillation and outperforms several larger models, such as DeepSeek-R1-Distill-Llama-70B, particularly on math, coding, and science tasks.

What is SuperDistillation?

SuperDistillation is a technique that refines the process of knowledge distillation (transferring the knowledge of big models to smaller ones). While traditional distillation trains a smaller model (the student) to replicate the final outputs of a larger, pre-trained model (the teacher), superdistillation enhances this by also transferring more fine-grained knowledge, such as internal representations or intermediate features. This leads to more efficient and effective student models.
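
To make this concrete, here is a minimal sketch of a distillation-style loss in PyTorch: the classic soft-label KL term on teacher logits, plus an extra term on intermediate hidden states (the fine-grained signal described above). The temperature, loss weighting, and single-layer feature matching are illustrative assumptions, not the actual Tiny-R1 recipe.

```python
import torch.nn.functional as F

T = 2.0       # softmax temperature for soft labels (illustrative assumption)
ALPHA = 0.5   # balance between output loss and feature loss (assumption)

def superdistillation_loss(student_logits, teacher_logits,
                           student_hidden, teacher_hidden):
    """Soft-label KL on outputs plus MSE on an intermediate hidden layer."""
    # Classic distillation: match the teacher's softened output distribution.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # The "super" part: also match internal representations. If student and
    # teacher hidden sizes differ, a learned projection would be needed here.
    feature = F.mse_loss(student_hidden, teacher_hidden)

    return ALPHA * kl + (1 - ALPHA) * feature
```

In a full training loop this loss would be computed per batch, with the teacher's forward pass wrapped in torch.no_grad().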

Key Features:

  • Performance: Tiny-R1-32B-Preview achieves high scores across domains:

| Domain  | Tiny-R1-32B-Preview | DeepSeek-R1 | R1-Distill-Qwen-32B | R1-Distill-Llama-70B |
|---------|---------------------|-------------|---------------------|----------------------|
| Math    | 78.1                | 79.8        | 72.6                | 70.0                 |
| Coding  | 61.6                | 65.9        | 57.2*               | 57.5*                |
| Science | 65.0                | 71.5        | not reported        | 65.2                 |

*The two distill models' coding scores (57.2 and 57.5) are assigned here in the same model order as the math results.

In short: Tiny-R1 comes very close to full DeepSeek-R1 on math (78.1 vs 79.8) while both distill models lag behind; on coding it clearly beats both distill models, though DeepSeek-R1 still leads (65.9); and on science it is neck-and-neck with Distill-Llama-70B (65.0 vs 65.2) but still falls behind full DeepSeek-R1 (71.5).

How was Tiny-R1 trained?

Base Model Selection:

  • The team started with DeepSeek-R1-Distill-Qwen-32B, a large pretrained reasoning model, as the base.

Supervised Fine-Tuning (SFT):

  • They applied Supervised Fine-Tuning (SFT) to adapt the model to three specific domains: Mathematics, Code, and Science.
  • This involves training the model on domain-specific data to specialize it for each task.

360-LLaMA-Factory Framework:

  • The fine-tuning was done using the 360-LLaMA-Factory training framework, which is designed to efficiently train large models on specialized tasks.
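
To give a feel for what this SFT stage involves, here is a minimal sketch using Hugging Face's transformers Trainer. The data file, record format, and hyperparameters are placeholder assumptions, and the team's actual runs used the 360-LLaMA-Factory framework rather than a hand-rolled script like this.

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Base model the Tiny-R1 team started from.
BASE = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# Hypothetical data file: domain-specific question/response pairs
# (here math) generated by DeepSeek-R1, one JSON object per line.
dataset = load_dataset("json", data_files="math_sft_data.jsonl", split="train")

def tokenize(example):
    # Hypothetical record format: prompt + answer concatenated for causal-LM SFT.
    text = example["question"] + "\n" + example["response"]
    return tokenizer(text, truncation=True, max_length=4096)

dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="tiny-r1-math",        # one such run per domain: math, code, science
    per_device_train_batch_size=1,    # illustrative hyperparameters only
    gradient_accumulation_steps=16,
    learning_rate=1e-5,
    num_train_epochs=2,
    bf16=True,
)

Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    # mlm=False makes the collator copy input_ids into labels for causal LM.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```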

Using Open-Source Data:

  • For each domain, open-source data was used as seeds (starting points).
  • These seeds consisted of questions in Math, Code, and Science to help the model learn task-specific knowledge.

Generating Responses with DeepSeek-R1:

  • DeepSeek-R1 itself was used to generate appropriate responses to the seed questions in each domain (Math, Code, and Science), producing the fine-tuning data.
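
A minimal sketch of this data-generation step, assuming access to DeepSeek-R1 through its OpenAI-compatible API; the seed file, record format, and output path are assumptions for illustration.

```python
import json
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; "deepseek-reasoner" serves R1.
client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

# Hypothetical seed file: open-source questions for one domain (e.g. math).
with open("math_seed_questions.jsonl") as f:
    seeds = [json.loads(line) for line in f]

with open("math_sft_data.jsonl", "w") as out:
    for seed in seeds:
        resp = client.chat.completions.create(
            model="deepseek-reasoner",
            messages=[{"role": "user", "content": seed["question"]}],
        )
        # Store question/response pairs as SFT training data.
        out.write(json.dumps({
            "question": seed["question"],
            "response": resp.choices[0].message.content,
        }) + "\n")
```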

Creating Specialized Models:

  • From this, three specialized models were created, one for each domain: Math Model, Code Model, and Science Model.

Combining Models Using Mergekit:

  • The team then used the Mergekit tool (developed by the Arcee team) to combine these three specialized models into one unified model.
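
As a rough illustration, here is what such a merge can look like with mergekit, which is driven by a YAML config. The linear merge method, equal weights, and model paths are assumptions; the exact merge recipe used for Tiny-R1 is not described here.

```python
import subprocess

# Illustrative mergekit config: equal-weight linear merge of the three
# domain experts produced by the SFT stage above.
config = """
merge_method: linear
models:
  - model: ./tiny-r1-math
    parameters:
      weight: 1.0
  - model: ./tiny-r1-code
    parameters:
      weight: 1.0
  - model: ./tiny-r1-science
    parameters:
      weight: 1.0
dtype: bfloat16
"""

with open("merge_config.yml", "w") as f:
    f.write(config)

# mergekit's CLI reads the config and writes the merged model to the output dir.
subprocess.run(["mergekit-yaml", "merge_config.yml", "./tiny-r1-merged"],
               check=True)
```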

Creating Tiny-R1–32B-Preview:

  • The final result was Tiny-R1-32B-Preview, a compact model that demonstrates strong performance across all three domains.

How to use Tiny-R1?

The model is open-sourced, and the weights are available on Hugging Face:

qihoo360/TinyR1-32B-Preview · Hugging Face (https://huggingface.co/qihoo360/TinyR1-32B-Preview)
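
Here is a minimal sketch for loading the released weights with transformers. Note that a 32B model still needs serious hardware (roughly 64 GB of GPU memory in bf16); the prompt and generation settings below are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "qihoo360/TinyR1-32B-Preview"
tokenizer = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForCausalLM.from_pretrained(
    REPO,
    torch_dtype=torch.bfloat16,  # ~64 GB of weights; shard across GPUs as needed
    device_map="auto",
)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so allow a generous token budget.
outputs = model.generate(inputs, max_new_tokens=2048,
                         do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```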

Hope you try out the model!

