How to use Phi-4 reasoning LLMs for free?
Another week, another reasoning LLM drops — but this one’s from Microsoft, and it slaps. After Qwen3 and DeepSeek-Prover-V2 already this week, Microsoft throws its hat in the ring with not one but three new open-weight models designed to handle serious reasoning like a champ.
Whether you’re solving high school algebra, battling 3SAT, or building agents that don’t hallucinate mid-task, Phi-4 Reasoning might just be your new best friend.
The team has released three models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning.

Model Breakdown
Phi-4-Reasoning: The Baseline Beast
This one’s the solid all-rounder. Think of it as the base Phi-4 model on steroids.
- Training: 1.4M high-quality STEM prompts with detailed reasoning traces (courtesy of o3-mini).
Cool Tricks:
- Uses custom <think> tags to structure logical blocks.
- Context window stretched from 16K → 32K tokens.
Performance:
- Beats far larger models like DeepSeek-R1-Distill-Llama-70B.
- Handles math, coding, and planning tasks surprisingly well.
TL;DR: If you want general reasoning with solid performance and zero fuss, start here.
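For illustration, a response from these models typically looks something like the sketch below, with the chain of thought wrapped in the <think> tags before the final answer (the exact wording is, of course, model-dependent):

```
<think>
Step-by-step reasoning trace goes here...
</think>
Final answer goes here.
```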
Phi-4-Reasoning-Plus: The Math Specialist
Now we’re cooking with reinforcement learning.
- What’s New: Fine-tuned with Group Relative Policy Optimization (GRPO) on 6K handpicked math problems.
- Reward System: Encourages accuracy, discourages rambling. Think: “Be smart. Don’t waffle.”
Performance Gains:
- +10–15% accuracy on AIME and OmniMath.
- Longer reasoning traces = deeper insight (at the cost of inference time).
TL;DR: If your life revolves around math problems or competitive benchmarks, this one’s your MVP.
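To make the GRPO idea concrete, here’s a minimal sketch of the group-relative advantage at its core: sample a group of answers per problem, score them, and normalize each reward against the group’s mean and standard deviation, so no separate value network is needed. The function name and the 0/1 reward scheme are illustrative assumptions, not Microsoft’s actual training code.

```python
# Illustrative sketch of GRPO's group-relative advantage (not the real training code).
import statistics


def group_relative_advantages(rewards):
    """Advantage of each sampled response relative to its group of samples."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]


# e.g. four sampled answers to one math problem, scored 1 = correct, 0 = wrong
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # -> [1.0, -1.0, -1.0, 1.0]
```

Correct answers get a positive advantage and wrong ones a negative advantage, which is what nudges the policy toward accuracy without rambling.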
Phi-4-Mini-Reasoning: The Tiny Titan
Small, scrappy, and surprisingly strong — like Ant-Man, but for logic.
- Size: Just 3.8B params, yet supports 128K token context length.
- Trained On: Synthetic math data from more capable teacher models.
- Specialization: Step-by-step logic, ideal for mobile or edge scenarios.
⚠ Caveats:
- Not general-purpose — struggles outside math/logic.
- May hallucinate facts due to smaller size (RAG recommended).
TL;DR: Perfect for lightweight math tasks, but not your next chatbot engine.


How to use Phi-4-reasoning models?
The models are completely open source, and the weights are available on Hugging Face. Check them out at the link below.
microsoft/Phi-4-reasoning · Hugging Face
The code snippet below can be used to load a model locally. First, install the dependencies:
```shell
pip install flash_attn==2.7.4.post1 torch==2.5.1 transformers==4.51.3 accelerate==1.3.0
```

Then load the model and run a query:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.random.manual_seed(0)

model_id = "microsoft/Phi-4-mini-reasoning"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": "How to solve 3*x^2+4*x+5=1?",
}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)

outputs = model.generate(
    **inputs.to(model.device),
    max_new_tokens=32768,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt
outputs = tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:])
print(outputs[0])
```
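Since the models emit their reasoning inside <think> tags, you’ll often want to strip the trace and keep only the final answer. A minimal helper for that, assuming the standard <think>...</think> format (the function name and sample text are my own):

```python
# Separate the <think>...</think> reasoning trace from the final answer.
def split_reasoning(text):
    """Return (reasoning, answer); if no closing tag, treat everything as answer."""
    marker = "</think>"
    if marker in text:
        reasoning, answer = text.split(marker, 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()


sample = "<think>Discriminant is 16 - 48 < 0, so no real roots.</think>\nNo real solutions."
reasoning, answer = split_reasoning(sample)
print(answer)  # -> No real solutions.
```

Keeping the trace around separately is handy for debugging why the model reached a given answer.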
When to use which model?

- Need long context for documents? Mini supports 128K tokens.
- Need maximum accuracy on math? Go with Plus.
- Just testing things out? Start with the base model.
Final Thoughts
Microsoft isn’t just playing catch-up — they’re sprinting into the reasoning arena with models that are lean, smart, and refreshingly open.
If you’re building autonomous agents, tutoring systems, or just exploring logic-heavy LLMs, Phi-4 Reasoning models are absolutely worth a spin. Just remember: test before you trust — especially in high-stakes use cases.
Hope you try out the new reasoning models!
Phi-4-Reasoning: Microsoft’s new LLMs are Smarter, Faster, Free-er was originally published in Data Science in Your Pocket on Medium, where people are continuing the conversation by highlighting and responding to this story.