How to use Devstral-Small by Mistral AI for free?
Mistral AI is back with a bang, and this time it’s coming for software engineering work. The French AI startup has open-sourced Devstral-Small-2505, a 24B model that looks to be the best at software engineering tasks among similarly sized models, by a huge margin.
Meet Devstral: Your New Coding Sidekick
Devstral isn’t just another code-generating LLM — it’s a purpose-built agentic model that can actually think and act like a software dev. It doesn’t just spit out functions; it navigates codebases, edits multiple files, and powers intelligent software engineering workflows. Think of it as the brains behind next-gen dev tools and agents.
And it’s already turning heads.
Powered by Mistral-Small-3.1
Devstral is fine-tuned from Mistral-Small-3.1, giving it an impressive 128k-token context window: enough memory to juggle massive codebases without losing track of the plot. Before fine-tuning, the vision encoder was removed, making this a purely text-based model, perfect for all your repo-wrangling needs. Hence, it doesn’t support image inputs.
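If you want to sanity-check the text-only setup and the context window yourself, here’s a minimal sketch using Hugging Face transformers. It assumes the repo ships a standard transformers config; the field name `max_position_embeddings` is the usual Mistral convention, not something confirmed from the model card:

```python
from transformers import AutoConfig

# Fetch just the config, no multi-GB weight download needed.
cfg = AutoConfig.from_pretrained("mistralai/Devstral-Small-2505")

# Usual Mistral config field for context length; expect ~131k positions (~128k tokens).
print("context window:", cfg.max_position_embeddings)

# A text-only causal LM: no vision tower in the architecture list.
print("architecture:", cfg.architectures)
```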
Why Devstral Stands Out
Let’s break down the good stuff:
Agentic Coding Superpowers
Devstral was built from the ground up to enable agentic workflows — meaning it’s not just outputting code, it’s making informed decisions as part of a larger dev loop. Ideal for autonomous agents, copilots, and tool-integrated workflows.
Lightweight, Local-Ready
At 24 billion parameters, Devstral hits the sweet spot: smart enough to be powerful, but light enough to run locally on a single RTX 4090 or even a 32GB Mac. That’s right — no data centre required.
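Why does a 24B model fit on a 24GB card? A quick back-of-envelope, counting weights only (activations and KV cache add overhead on top, which is why quantized builds are the realistic option for a 4090):

```python
# Rough VRAM needed for 24B parameters at different precisions (weights only).
params = 24e9

for precision, bytes_per_param in [("fp16/bf16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{precision}: ~{gib:.0f} GiB")

# fp16/bf16: ~45 GiB -> too big for a single consumer GPU
# 8-bit:     ~22 GiB -> barely squeezes onto a 24 GB RTX 4090
# 4-bit:     ~11 GiB -> comfortable on a 4090 or a 32 GB Mac
```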
Open Licensing
Released under Apache 2.0, you’re free to use, tweak, and commercialise it. No strings, no drama.
Tekken Tokenizer
It uses Mistral’s high-capacity Tekken tokenizer with a 131k-entry vocabulary, which encodes text, and code in particular, more compactly per token. That’s especially useful in code-heavy tasks where token bloat is real.
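You can poke at the tokenizer directly. This sketch assumes the Hugging Face repo exposes a transformers-compatible tokenizer; if it only ships the raw Tekken files, the mistral-common route from the model card is the fallback:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Devstral-Small-2505")

# Tekken's vocabulary is ~131k entries.
print("vocab size:", tok.vocab_size)

# Compact code tokenization leaves more of the 128k window for actual repo content.
snippet = "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)"
print("token count:", len(tok.encode(snippet)))
```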
Benchmarks

Let’s get nerdy with the SWE-Bench Verified scores, the gold standard for testing real-world software engineering smarts:

Devstral doesn’t just beat other open-source models; scoring 46.8% on SWE-Bench Verified, it leapfrogs closed-source contenders like GPT-4.1-mini. Despite its smaller size, it even outperforms much larger models like Deepseek-V3-0324 and Qwen3-232B when tested under the same scaffold.
What is a Scaffold?
In the context of LLM benchmarking (like SWE-Bench), a scaffold refers to the evaluation framework or prompt structure that wraps around a task to help the model complete it. Think of it as the test environment setup — the scaffolding around a problem that provides context, instructions, and sometimes even tools for the model to use.
Different scaffolds = different performance. Just like giving someone better tools and instructions can help them solve a problem faster, a well-designed scaffold can significantly improve an LLM’s performance on a benchmark.
It includes stuff like (there’s a toy sketch after this list):
- How the problem is described to the model (prompt formatting)
- What tools or APIs are available (like file editors, search tools, or test runners)
- How intermediate steps are handled (e.g., multi-step reasoning or tool usage)
- What counts as a “successful fix” (e.g., passing a test or changing the right lines)
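To make that concrete, here’s a toy scaffold loop in Python. Everything in it is illustrative: `call_model` is a stand-in for whatever inference endpoint you use, the two tools are deliberately minimal, and real harnesses (like the OpenHands scaffold used in Devstral’s evaluation) are far more elaborate.

```python
import subprocess

def call_model(prompt: str) -> str:
    """Stand-in for your actual LLM call (local server, API, etc.)."""
    raise NotImplementedError

# 1. Prompt formatting: how the task is described to the model.
TASK_PROMPT = (
    "You are fixing a bug in this repo.\n"
    "Issue: {issue}\n"
    "Reply with one action per turn: READ <path>, EDIT <path> <content>, or DONE."
)

# 2. Tools the scaffold exposes to the model.
def run_tool(action: str) -> str:
    verb, _, arg = action.partition(" ")
    if verb == "READ":
        with open(arg) as f:
            return f.read()
    if verb == "EDIT":
        path, _, content = arg.partition(" ")
        with open(path, "w") as f:
            f.write(content)
        return "ok"
    return "unknown action"

# 3. Multi-step handling: feed each tool result back into the context.
def solve(issue: str, max_steps: int = 10) -> bool:
    history = TASK_PROMPT.format(issue=issue)
    for _ in range(max_steps):
        action = call_model(history).strip()
        if action == "DONE":
            break
        history += f"\n> {action}\n{run_tool(action)}"
    # 4. Success criterion: the repo's test suite passes.
    return subprocess.run(["pytest", "-q"]).returncode == 0
```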
How to use Devstral-Small for free?
The model weights are open-sourced and can be accessed from Hugging Face:
mistralai/Devstral-Small-2505 · Hugging Face
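Here’s a minimal local-inference sketch with transformers. Treat it as a sketch under assumptions: it presumes the repo provides a transformers-compatible tokenizer and chat template (the model card also documents vLLM and mistral-common routes, which may be the officially recommended path):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Devstral-Small-2505"

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # swap in 4-bit quantization to fit a 24 GB GPU
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function that reverses a linked list."}
]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```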
Also, if you don’t have enough resources, try it in this free Space:
Devstral Small 2505 – a Hugging Face Space by Bluestrike
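You can also hit a Space programmatically with gradio_client. The Space slug and endpoint below are guesses based on the link above; call `client.view_api()` first to see what the Space actually exposes:

```python
from gradio_client import Client

# Slug guessed from the Space's title; adjust if it differs.
client = Client("Bluestrike/Devstral-Small-2505")

# Lists the Space's real endpoints and argument signatures.
client.view_api()

# Hypothetical chat endpoint; match api_name/arguments to the view_api() output.
result = client.predict(
    "Refactor this function to be tail-recursive.",
    api_name="/chat",
)
print(result)
```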
Conclusion
Devstral is a breath of fresh air for software engineers and agent developers. It blends agentic intelligence, scalability, and real-world performance, all packed in a locally-runnable, open-source model. Whether you’re building dev agents, code explorers, or just want an LLM that actually understands your codebase, Devstral should be on your radar.