Tiny Recursive Model: Proving Small AI Can Think Big

Imagine if a tiny smartphone app could solve brain-teasing puzzles better than the world’s biggest supercomputers running AI. That’s the bold idea behind the Tiny Recursive Model, or TRM — a super-small AI brain with just 7 million “parameters” (think of parameters as the basic building blocks or settings that help AI learn patterns). Created by researcher Alexia Jolicoeur-Martineau at Samsung’s AI lab in Montreal, TRM shows that less can indeed be more when it comes to smart thinking, especially for tough logic puzzles. This isn’t about making chatbots; it’s about cracking hard problems like filling in Sudoku grids or navigating mazes, where big AI often stumbles.

TRM builds on a previous idea called the Hierarchical Reasoning Model (HRM) but makes it way simpler and stronger. Trained on just a handful of examples — around 1,000 per puzzle — TRM beats giant language models (LLMs, which are huge AI systems like ChatGPT that predict words) on tests that measure real reasoning, not just wordplay. In this article, we’ll break it down step by step, explaining the tech in everyday terms, why it’s a game-changer, and what makes it tick.

The Big Problem: Why Huge AI Isn’t Always Smarter

Most top AI today, like LLMs from companies such as OpenAI or Google, are enormous. They have billions or even trillions of parameters, trained on massive internet data to chat, write stories, or answer questions. But when it comes to pure reasoning — figuring out patterns in puzzles without hints — these giants often flop. Why? They excel at memorizing patterns from data but struggle with novel, abstract problems because they’re too big and get “overfitted” (meaning they learn the training data too well but can’t generalize to new stuff, like cramming for a test without understanding the subject).

Benchmarks like ARC-AGI test this. ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence) is a set of colorful grid puzzles where you spot rules from a few examples and apply them to new grids. It’s designed to mimic human-like smarts, not rote learning. Big LLMs score low here — around 30–40% on the easier ARC-AGI-1 and under 5% on the harder ARC-AGI-2 — because they generate text step-by-step (chain-of-thought reasoning) but can’t truly iterate like a human pondering a problem. TRM flips this by being small and “recursive,” meaning it loops back on its own thoughts to refine them, much like how you might sketch a maze solution, check it, and tweak it until it’s right.
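To make the format concrete, here is a toy, ARC-style task (invented for illustration, not an actual ARC-AGI puzzle): the solver sees a few input/output grid pairs, must infer the hidden rule, and then apply it to a fresh grid. In this sketch the hidden rule is simply “mirror each row left-to-right,” and `reflect_lr` is a hypothetical helper name.

```python
# Illustrative only: a toy ARC-style task, not real ARC-AGI data.
# Each puzzle gives a few input->output grid pairs; the solver must
# infer the transformation and apply it to a new grid.

def reflect_lr(grid):
    """The hidden rule for this toy task: mirror each row."""
    return [row[::-1] for row in grid]

# A demonstration pair the solver would see...
train_input = [[1, 0, 0],
               [0, 2, 0]]
train_output = reflect_lr(train_input)  # [[0, 0, 1], [0, 2, 0]]

# ...and the new grid it must transform with the inferred rule.
test_input = [[3, 0],
              [0, 4]]
print(reflect_lr(test_input))  # [[0, 3], [4, 0]]
```

Real ARC puzzles hide far trickier rules (recoloring, symmetry, object counting), but the few-examples-then-generalize setup is the same.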

Architectural Innovation: How TRM’s Simple Design Packs a Punch

At its heart, TRM is a “neural network” — a computer program inspired by the brain, made of layers that process info like filters in a photo app. But unlike deep networks in big LLMs (which stack dozens or hundreds of layers like a tall building), TRM uses just two layers. That’s its big innovation: recursion over size. Recursion is like a loop in a video game where a character keeps trying actions until they work — in TRM, the network repeatedly updates its internal “thoughts” without growing bigger.

Here’s how it works in plain steps. TRM starts with a puzzle input, like a Sudoku grid (a 9×9 number puzzle where each row, column, and box must have unique digits 1–9). It has two key “memories”:

  • A “latent reasoning state” (call it a scratchpad z), which holds ongoing thoughts or patterns it’s spotting.
  • A “current solution embedding” (call it y), which is its best guess so far, like a half-filled grid.

The network does two phases in a loop:

  1. Think phase: For several “inner steps” (up to 16 times), it updates the scratchpad z using the input puzzle x, the current guess y, and the old z. This is like brainstorming: “What’s missing here? Any patterns?” It’s powered by a simple self-attention mechanism (a way for the AI to focus on important parts of the data, like highlighting key clues in a puzzle) mixed with basic math operations.
  2. Act phase: After thinking, it refines the guess y using the updated scratchpad. This outputs an improved solution, like filling in more numbers correctly.
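The two-phase loop above can be sketched in a few lines of Python. This is a minimal illustration, not the actual TRM implementation: tiny random linear maps stand in for the real two-layer attention network, and the names (`think`, `act`, `W_think`, `W_act`) and sizes are invented for the example.

```python
import numpy as np

# Minimal sketch of TRM's think/act recursion. Random linear maps
# stand in for the real 2-layer network; all names are illustrative.
rng = np.random.default_rng(0)
D = 8                                        # toy embedding width
W_think = rng.normal(size=(3 * D, D)) * 0.1  # updates scratchpad z
W_act = rng.normal(size=(2 * D, D)) * 0.1    # updates guess y

def think(x, y, z):
    """Refine the latent scratchpad z from puzzle x, guess y, and old z."""
    return np.tanh(np.concatenate([x, y, z]) @ W_think)

def act(y, z):
    """Refine the solution guess y from the updated scratchpad."""
    return np.tanh(np.concatenate([y, z]) @ W_act)

x = rng.normal(size=D)  # puzzle embedding
y = np.zeros(D)         # initial guess
z = np.zeros(D)         # initial scratchpad

for _ in range(3):          # outer improvement steps
    for _ in range(6):      # inner "think" recursions (TRM uses up to 16)
        z = think(x, y, z)
    y = act(y, z)           # one "act" update per outer step

print(y.shape)  # (8,)
```

The key design point survives even in this toy version: the same small weights are reused at every step, so “thinking longer” costs compute but zero extra parameters.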

This loop repeats until the model decides to stop, using “adaptive halting” — a built-in check that says, “Is this good enough?” based on how much it’s improving. No endless spinning; it stops smartly to save energy. To train it, they use “deep supervision,” giving feedback not just at the end but at every loop step, so it learns to think better mid-process (like a teacher correcting your work as you go).
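A plateau-based stopping rule gives the flavor of adaptive halting. Note this is a simplification: TRM learns when to halt with a small trained component, whereas this sketch just checks whether the guess has stopped changing, and `refine` is a made-up stand-in for one full think/act cycle.

```python
import numpy as np

# Sketch of adaptive halting: loop while the guess is still changing
# meaningfully, stop once improvements plateau. refine() is a toy
# stand-in for one full think->act cycle (it converges toward y = 1).
def refine(y):
    return 0.5 * y + 0.5

y = np.zeros(4)
for step in range(1, 17):                  # hard cap on loop count
    y_new = refine(y)
    if np.linalg.norm(y_new - y) < 1e-3:   # "good enough" check
        y = y_new
        break
    y = y_new

print(step)  # stops at step 11 here, well before the cap of 16
```

The cap bounds worst-case compute; the plateau check is what lets easy inputs exit early.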

TRM skips fancy extras: No need for “fixed-point theorems” (math tricks to stabilize loops, used in HRM) or biological inspirations (HRM mimicked brain hierarchies). It’s one network doing everything, trained with standard methods on small data — making it cheap and fast to run on regular computers.

How TRM Stands Out from Top LLMs

Top LLMs like Gemini 2.5 Pro (Google’s massive model) or DeepSeek R1 are “autoregressive” — they predict one word (or token) at a time, building long chains of thought. This works for stories but fails on visual puzzles like ARC-AGI, where you need to manipulate grids precisely. They’re trained on trillions of words, costing millions in electricity, and still hit walls on zero-shot reasoning (solving without examples).

TRM is different in every way:

  • Size and Efficiency: LLMs have billions or even trillions of parameters; TRM has 7M (under 1% of even a 1B-parameter model, and just 0.0007% of a 1T-parameter one). It trains on ~1,000 puzzle examples vs. billions of text snippets. Inference (running the model) is quicker — no generating endless text, just targeted loops.
  • Reasoning Style: LLMs use chain-of-thought, which is linear and text-heavy. TRM’s recursion is iterative and latent (hidden math states, not words), letting it “ponder” deeply without outputting chit-chat. This avoids “hallucinations” (making up wrong facts) common in LLMs.
  • Generalization Power: Big models overfit to text patterns; TRM’s small size plus recursion prevents this, shining on unseen puzzles. It doesn’t need internet-scale data because recursion builds reasoning from scratch.
  • No Generative Fluff: LLMs are great at creating text; TRM is deterministic here — it gives one solid answer per puzzle, perfect for logic tasks, not creative writing.

In short, while LLMs scale up to brute-force smarts, TRM scales “in” through clever looping, proving you don’t need a giant brain for big thoughts.
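The scale gap is easy to check with back-of-envelope arithmetic (round illustrative numbers, not exact model specs):

```python
# Rough size comparison using round numbers for illustration only.
trm = 7_000_000              # TRM parameters
small_llm = 1_000_000_000    # a 1B-parameter model
big_llm = 1_000_000_000_000  # a 1T-parameter frontier model

print(f"vs 1B model: {trm / small_llm:.1%}")  # 0.7%
print(f"vs 1T model: {trm / big_llm:.4%}")    # 0.0007%
```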

Benchmarking Results: Small Model, Big Wins

TRM was tested on tough puzzles needing pattern-spotting and planning:

  • Sudoku-Extreme: Super-hard 9×9 grids with tricky constraints. TRM scored 87.4% accuracy, up from HRM’s 77% and way better than LLMs (which struggle with exact math without plugins).
  • Maze-Hard: Navigating complex mazes from start to goal. TRM hit 85.3%, beating HRM’s 74.5%.
  • ARC-AGI-1 (Easier Version): 45% accuracy — tops HRM’s 40.3% and LLMs like o3-mini (34.5%), Gemini 2.5 Pro (37.8%), and DeepSeek R1 (around 40%).
  • ARC-AGI-2 (Harder, Unseen Tasks): 8% accuracy, doubling Gemini’s 4.9% and HRM’s 5%. This is huge because ARC-2 mimics real AGI challenges.

Overall, TRM averaged 10–20% gains over prior approaches on these benchmarks, using 75% less compute. It even matches professional-grade setups on public leaderboards. These aren’t flukes — ablation tests without recursion dropped scores by 15–20%, proving the loop is key.

Why Does TRM Work So Well?

So, why does this tiny looper outperform behemoths? It’s a mix of smarts and simplicity.

  1. Recursion fights overfitting. Big LLMs memorize training noise; TRM’s loops let it self-correct, building robust rules from few examples — like practicing a skill through trial and error instead of reading a library. The small size (2 layers) keeps it focused, avoiding the “bloat” that dilutes reasoning in deep nets.
  2. The dual-memory setup (scratchpad z for raw ideas, y for polished output) separates “thinking” from “acting.” This mimics human cognition: brainstorm freely, then refine. Deep supervision trains both, making each loop smarter — studies show mid-loop feedback boosts accuracy by 10–15%.
  3. Adaptive halting saves power and prevents overthinking. Without it, loops could waste time; here, it stops when improvements plateau, like knowing when to quit editing a draft.

Experts think recursion simulates “deeper” networks dynamically — 16 loops act like 32 layers without the memory cost. It shines on combinatorial tasks (puzzles with exploding possibilities) because iteration explores options efficiently, unlike LLMs’ one-shot guesses. Drawbacks? It’s task-specific (great for puzzles, less for chat) and needs labeled data for training, but at this scale, that’s easy.
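The “depth for free” point reduces to simple arithmetic: looping a 2-layer network 16 times applies 32 layer passes, without ever storing a 32-layer model (numbers taken from the figures quoted above):

```python
layers = 2         # TRM's physical depth
inner_steps = 16   # maximum "think" recursions

# Computation applied per pass vs. parameters actually stored:
effective_depth = layers * inner_steps
print(effective_depth)  # 32 layer applications, with only 2 layers in memory
```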

Future tweaks could add more layers or mix with LLMs for hybrid power. For now, TRM hints that AGI might come from clever loops, not just more hardware.

TRM’s open-source code on GitHub means anyone can tinker, potentially sparking a wave of efficient AI for phones or robots. It’s a reminder: in AI, brains matter more than brawn.


Tiny Recursive Model: Proving Small AI Can Think Big was originally published in Data Science in Your Pocket on Medium, where people are continuing the conversation by highlighting and responding to this story.
