Phi-4-Reasoning: Microsoft’s new LLMs are Smarter, Faster, Free-er


How to use the Phi-4-reasoning LLMs for free?

Photo by Ed Hardie on Unsplash

Another week, another reasoning LLM drops — but this one’s from Microsoft, and it slaps. After Qwen3 and DeepSeek-Prover-V2 already this week, Microsoft throws its hat in the ring with not one but three new open-weight models designed to handle serious reasoning like a champ.


Whether you’re solving high school algebra, battling 3SAT, or building agents that don’t hallucinate mid-task, Phi-4 Reasoning might just be your new best friend.

The team has released three models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning.

[Image: benchmark results]

[Image: reasoning examples]

Model Breakdown

Phi-4-Reasoning: The Baseline Beast

This one’s the solid all-rounder. Think of it as the base Phi-4 model on steroids.

  • Training: 1.4M high-quality STEM prompts with detailed reasoning traces (courtesy of o3-mini).

Cool Tricks:

  • Uses custom <think> tags to structure logical blocks.
  • Context window stretched from 16K → 32K tokens.
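Because the chain of thought is wrapped in <think> tags before the final answer, you can separate the trace from the answer with a simple parse. A minimal sketch (the tag format is from Microsoft's model card; the helper name is ours):

```python
import re

def split_reasoning(text: str):
    """Split a Phi-4-reasoning completion into (thinking trace, final answer).

    The models emit their chain of thought inside <think>...</think>,
    followed by the user-facing answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()  # no trace found: treat everything as the answer
    thinking = match.group(1).strip()
    answer = text[match.end():].strip()
    return thinking, answer

sample = "<think>4x + 4 = 0, so x = -1.</think>The only root is x = -1."
trace, answer = split_reasoning(sample)
```

Handy when you want to log or hide the (often very long) trace and show users only the answer.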

Performance:

  • Beats DeepSeek-R1-Distill-Llama-70B, a model five times its 14B size.
  • Handles math, coding, and planning tasks surprisingly well.

TL;DR: If you want general reasoning with solid performance and zero fuss, start here.

Phi-4-Reasoning-Plus: The Math Specialist

Now we’re cooking with reinforcement learning.

  • What’s New: Fine-tuned with Group Relative Policy Optimisation (GRPO) using 6K handpicked math problems.
  • Reward System: Encourages accuracy, discourages rambling. Think: “Be smart. Don’t waffle.”
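Microsoft hasn't published the exact reward code, but the core idea (reward correct answers, penalise needless length) can be sketched like this. The weights, budget, and function name are illustrative, not the actual implementation:

```python
def reasoning_reward(is_correct: bool, trace_tokens: int,
                     budget: int = 4096, length_weight: float = 0.2) -> float:
    """Toy GRPO-style reward: accuracy comes first, with a mild penalty
    for reasoning traces that blow past a token budget."""
    accuracy = 1.0 if is_correct else -1.0
    overrun = max(0, trace_tokens - budget) / budget  # fraction over budget
    return accuracy - length_weight * overrun

# A correct, concise answer beats a correct rambling one:
concise = reasoning_reward(True, 2_000)    # 1.0
rambling = reasoning_reward(True, 12_288)  # 1.0 - 0.2 * 2.0 = 0.6
```

In GRPO, rewards like these are compared across a group of sampled completions for the same prompt, so the policy learns to favour the concise-and-correct ones.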

Performance Gains:

  • +10–15% accuracy in AIME and OmniMath.
  • Longer reasoning traces = deeper insight (at the cost of inference time).

TL;DR: If your life revolves around math problems or competitive benchmarks, this one’s your MVP.

Phi-4-Mini-Reasoning: The Tiny Titan

Small, scrappy, and surprisingly strong — like Ant-Man, but for logic.

  • Size: Just 3.8B params, yet supports 128K token context length.
  • Trained On: Synthetic math data from more capable teacher models.
  • Specialization: Step-by-step logic, ideal for mobile or edge scenarios.

Caveats:

  • Not general-purpose — struggles outside math/logic.
  • May hallucinate facts due to smaller size (RAG recommended).

TL;DR: Perfect for lightweight math tasks, but not your next chatbot engine.

How to use Phi-4-reasoning models?

The models are completely open: the weights are available on Hugging Face. Do check them out at the link below.

microsoft/Phi-4-reasoning · Hugging Face

The snippet below loads a model locally with Hugging Face transformers. First install the dependencies:

pip install flash_attn==2.7.4.post1 torch==2.5.1 transformers==4.51.3 accelerate==1.3.0

Then load the model and run a prompt:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.random.manual_seed(0)

model_id = "microsoft/Phi-4-mini-reasoning"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",  # requires a CUDA GPU; use "auto" or "cpu" otherwise
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": "How to solve 3*x^2+4*x+5=1?",
}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)

outputs = model.generate(
    **inputs.to(model.device),
    max_new_tokens=32768,  # reasoning traces can get long
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)
# Decode only the newly generated tokens, skipping the prompt.
outputs = tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:])

print(outputs[0])

When to use which model?

  • Need long-context for documents? Mini has 128K tokens.
  • Need maximum accuracy on math? Go with Plus.
  • Just testing things out? Start with the base model.
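That decision tree is easy to encode as a tiny dispatch helper. The model IDs are the real Hugging Face repos, but the routing rules are just this article's rule of thumb:

```python
def pick_phi4(task: str, long_context: bool = False) -> str:
    """Map the guidance above to a Hugging Face model id."""
    if long_context:
        return "microsoft/Phi-4-mini-reasoning"  # 128K context, 3.8B params
    if task == "math":
        return "microsoft/Phi-4-reasoning-plus"  # GRPO-tuned for math accuracy
    return "microsoft/Phi-4-reasoning"           # solid general-purpose default

model_id = pick_phi4("math")  # "microsoft/Phi-4-reasoning-plus"
```

Swap the returned id into the loading snippet above to try each one.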

Final Thoughts

Microsoft isn’t just playing catch-up — they’re sprinting into the reasoning arena with models that are lean, smart, and refreshingly open.

If you’re building autonomous agents, tutoring systems, or just exploring logic-heavy LLMs, Phi-4 Reasoning models are absolutely worth a spin. Just remember: test before you trust — especially in high-stakes use cases.

Hope you try out the new reasoning models!


Phi-4-Reasoning: Microsoft’s new LLMs are Smarter, Faster, Free-er was originally published in Data Science in Your Pocket on Medium, where people are continuing the conversation by highlighting and responding to this story.
