Less is More: Recursive Reasoning with Tiny Networks, paper explained
How a 7M-parameter model beats DeepSeek-R1 and Gemini 2.5 Pro on ARC-AGI
Alexia Jolicoeur-Martineau from Samsung Montréal wrote a paper that basically asks a rude question:
do we really need billion-parameter models to solve puzzles like Sudoku?
Turns out, no.
The paper introduces the Tiny Recursive Model (TRM), a 7M-parameter network that outperforms LLMs like Gemini 2.5 Pro, DeepSeek R1, and even the so-called “reasoning” variants on tasks like Sudoku-Extreme, Maze-Hard, and ARC-AGI (the benchmark designed to measure general reasoning, not just next-token prediction).
This thing beats those giants by an embarrassing margin:
- 87% on Sudoku-Extreme (vs HRM’s 55% and LLMs’ 0%)
- 85% on Maze-Hard
- 45% on ARC-AGI-1 and 8% on ARC-AGI-2, better than Gemini 2.5 Pro, a model with roughly 30,000× more parameters.
So, how?
Why LLMs still choke on logic puzzles

Large language models generate one token at a time. If a single step goes wrong, the whole reasoning chain collapses. Chain-of-Thought helps a bit, but it’s brittle: the model can walk through wrong steps just as confidently as right ones. People add test-time compute tricks like majority voting or reward scoring, but even then, models fail to reason symbolically or recursively.
After six years of ARC-AGI attempts, no LLM has reached human-level accuracy. That’s where smaller, supervised reasoning systems like the Hierarchical Reasoning Model (HRM) came in.
HRM: two brains arguing inside a network

HRM used two small transformer networks:
- a low-level module (fL), updated frequently, and
- a high-level module (fH), updated occasionally.
Both maintained their own latent states (zL, zH) and exchanged information in a recursive loop:
x = embed(input)        # embedded question
zL = fL(zL + zH + x)    # low-level update (high frequency)
zL = fL(zL + zH + x)    # ...called several times in a row
zH = fH(zL + zH)        # high-level update (low frequency)
zL = fL(zL + zH + x)
...
y_hat = fO(zH)          # answer read from the high-level state
So fL gets called multiple times per cycle (high frequency), fH less often (low frequency).
At the end, the high-level state zH is passed through an output head fO to predict the answer.
The idea was vaguely biological, like neurons firing at different temporal frequencies.
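To make that loop concrete, here is a minimal PyTorch sketch of an HRM-style forward pass. The module sizes, MLP cores, summed inputs, and step counts are illustrative assumptions; the real HRM uses small transformers and different wiring details.

```python
import torch
import torch.nn as nn

class HRMSketch(nn.Module):
    """Illustrative HRM-style two-frequency recursion (not the paper's exact code)."""
    def __init__(self, dim=128, vocab=10):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        # Stand-ins for the two modules; the real HRM uses small transformers.
        self.fL = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.fH = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.fO = nn.Linear(dim, vocab)  # output head

    def forward(self, tokens, zL=None, zH=None, n_cycles=2, L_steps=3):
        x = self.embed(tokens)                        # x = embedded input
        zL = torch.zeros_like(x) if zL is None else zL
        zH = torch.zeros_like(x) if zH is None else zH
        for _ in range(n_cycles):
            for _ in range(L_steps):
                zL = self.fL(zL + zH + x)             # high-frequency low-level updates
            zH = self.fH(zL + zH)                     # low-frequency high-level update
        return self.fO(zH), zL, zH                    # y_hat is read from zH

model = HRMSketch()
tokens = torch.randint(0, 10, (1, 81))               # e.g. a flattened 9x9 Sudoku grid
y_hat, zL, zH = model(tokens)                        # logits per cell: (1, 81, 10)
```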
- HRM used something called deep supervision: it trained the model to refine its answer across up to 16 recursive steps, reusing the latent states from earlier passes (sketched in code after this list).
- It also used a halting policy (ACT) based on Q-learning to decide when to stop refining, so it didn’t waste time doing all 16 passes for easy samples.
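Continuing from the sketch above, deep supervision is just an outer training loop that applies the loss on every pass and carries the detached latents into the next one. The loss choice and detach placement here are simplifying assumptions:

```python
import torch
import torch.nn.functional as F

def deep_supervision_step(model, tokens, target, optimizer, n_passes=16):
    """Refine the answer over up to 16 passes, reusing earlier latent states."""
    zL = zH = None
    for _ in range(n_passes):
        y_hat, zL, zH = model(tokens, zL, zH)                  # reuse carried latents
        loss = F.cross_entropy(y_hat.transpose(1, 2), target)  # per-cell loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        zL, zH = zL.detach(), zH.detach()                      # carry states, cut the graph

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
target = torch.randint(0, 10, (1, 81))                         # dummy solution grid
deep_supervision_step(model, tokens, target, optimizer)
```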
The problem? It was all over-engineered. Two networks. Biological metaphors. Detaching gradients. An Implicit Function Theorem assumption that pretended recursion reached a “fixed point” (which it didn’t). And every training step needed two forward passes.
The result: good, but not great. HRM topped out at 55% on Sudoku-Extreme.
TRM: one tiny network, pure recursion


TRM throws most of HRM away and keeps only the part that actually works: recursive refinement.
No hierarchy, no biology, no fixed-point theorems. Just one small network that keeps improving its guess.
The setup is simple:
- x = question (like a Sudoku grid)
- y = current predicted answer
- z = latent reasoning state
The model recursively updates:
z = net(x, y, z) # reason a bit more
y = net(y, z) # refine answer
and repeats this for a few cycles. The first few recursions run without gradients; only the last one backpropagates.
That’s it. Two layers. Seven million parameters. No tricks.
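Here is a minimal PyTorch sketch of that loop, assuming summed inputs, a tiny MLP core, and illustrative step counts; the paper’s actual 2-layer network and its exact no-grad schedule may differ.

```python
import torch
import torch.nn as nn

class TRMSketch(nn.Module):
    """Illustrative TRM-style recursion: one tiny network, two evolving states."""
    def __init__(self, dim=128, vocab=10):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        # One shared core stands in for the single small network.
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.out = nn.Linear(dim, vocab)

    def recurse(self, x, y, z, n_think=6):
        for _ in range(n_think):
            z = self.net(x + y + z)   # z = net(x, y, z): reason a bit more
        y = self.net(y + z)           # y = net(y, z): refine the answer
        return y, z

    def forward(self, tokens, n_cycles=3):
        x = self.embed(tokens)
        y = torch.zeros_like(x)       # current answer state
        z = torch.zeros_like(x)       # latent reasoning state
        with torch.no_grad():         # first recursions: no gradients
            for _ in range(n_cycles - 1):
                y, z = self.recurse(x, y, z)
        y, z = self.recurse(x, y, z)  # only the last recursion backpropagates
        return self.out(y)            # decode the refined answer

model = TRMSketch()
tokens = torch.randint(0, 10, (1, 81))
logits = model(tokens)                # (1, 81, 10)
```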

Example

- Generally, LLMs reason like someone writing an essay in one go: once a word is written, it’s final. If the third sentence is wrong, everything after it collapses. That’s flat reasoning: no real self-correction.
- HRM tried to fix that by splitting the brain in two. One small network looks at fine details (low-level), another watches the big picture (high-level). They talk back and forth, refining each other’s answers over many rounds. It works, but it’s messy.
- TRM throws all that away. One tiny network, two memories:
y = current answer
z = what it’s thinking about
It just loops: look → think → improve. Each pass makes the answer a bit better.
In plain terms:
LLM = writes once, never checks.
HRM = two people debating each draft.
TRM = one person rereading and fixing their own draft until it’s right.
And somehow, that last one, the simplest, wins.
Smaller is better

The ablations in the paper are brutal. Increasing the layer count worsened generalization.
Adding more latent variables worsened it again. Using attention layers hurt Sudoku performance (MLPs worked better since the grid size is small and fixed).
Even the Q-learning halting mechanism from HRM was replaced with a single binary prediction, “has the model reached the correct answer yet?”, with no need for a second forward pass.
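That halting signal can be sketched as a linear probe on the latent state, trained with binary cross-entropy. Given a latent z of shape (batch, cells, dim), the mean-pooling, hidden dimension, and threshold below are all illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

halt_head = nn.Linear(128, 1)  # probe on the latent state (hidden dim assumed 128)

def halt_loss_and_decision(z, y_hat, target, threshold=0.5):
    # Supervision target: did the current prediction get every cell right?
    solved = (y_hat.argmax(-1) == target).all(dim=-1).float()  # (batch,)
    logit = halt_head(z.mean(dim=1)).squeeze(-1)               # pool over cells
    loss = F.binary_cross_entropy_with_logits(logit, solved)
    stop = torch.sigmoid(logit) > threshold                    # halt decision at inference
    return loss, stop
```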
A few other tweaks helped:
- Exponential Moving Average (EMA) of the weights to stabilize training on tiny datasets (a minimal sketch follows this list).
- Data augmentation: 1,000 Sudoku permutations per example.
- Deep supervision: reusing latent states across 16 reasoning iterations.
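For the EMA trick, a minimal sketch: keep an averaged copy of the weights, update it after each optimizer step, and evaluate with the averaged copy. The decay value here is an assumption, not the paper’s reported setting.

```python
import copy
import torch

model = TRMSketch()                   # from the earlier sketch
ema_model = copy.deepcopy(model)      # averaged copy, used for evaluation

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    """Exponential moving average of weights; stabilizes training on tiny datasets."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1 - decay)
```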
With all that, TRM generalizes absurdly well for its size.
What this means

TRM shows that the structure of reasoning matters more than size.
Recursion lets a small network simulate depth by repeatedly re-thinking its own output. Instead of storing all reasoning inside parameters, it uses time, a few reasoning steps, to refine and self-correct.
It’s a different kind of intelligence: not one giant monolith memorizing everything, but a small model that learns how to improve itself in loops.
It also exposes a gap in how we evaluate AI. LLMs dominate language benchmarks, but when you ask them to reason through a concrete, rule-based environment like Sudoku or ARC-AGI, they crumble. Meanwhile, a 7M-parameter recursive model trained on a thousand examples wipes the floor with them.
The takeaway

This isn’t about throwing out LLMs. It’s about remembering that more parameters don’t equal more reasoning. TRM is a striking demonstration that, at least on these puzzle benchmarks, recursion beats brute force.
A network doesn’t need to be massive; it needs to think twice.