Dots.ocr: The best small-sized OCR ever


How to use dots.ocr for free?


If you’re still messing with clunky OCR pipelines in 2025, stop. Dots.ocr just dropped, and it’s exactly the kind of “quietly brilliant” model that makes you double-check the parameter count.

Yes, it’s 1.7B. No, it doesn’t feel like it.


This thing eats scanned documents for breakfast: layout, content, multilingual text, formulas, tables; it parses all of it without a problem. And it does it in a single unified vision-language model. No weird multi-stage preprocessing. You throw it a prompt. It figures the rest out.

Let’s talk about what makes this model so stupidly good.

Not Just Another OCR Model

dots.ocr is a vision-language model that speaks fluent document.

Where other models pair YOLO-style detectors with a separate language model, dots.ocr uses just one VLM to handle layout detection, text parsing, reading order, and even formulas. No switching between models. No feature misalignment. Just a clean, prompt-based interface to switch between tasks.

  • You want layout detection? Change the prompt.
  • You want text-only OCR? Change the prompt.
  • Want to ground a region by bounding box? There’s a prompt for that too.

This makes it absurdly easy to deploy, debug, and extend. You don’t need to maintain three different models and hope they agree on the coordinates of a table. Dots.ocr just gets it right the first time.
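That prompt-switching interface is easy to sketch. The snippet below is a toy illustration of the idea, not the model's actual API: `prompt_layout_only_en` is a real prompt name mentioned later in this post, but the other prompt strings and the `build_request` helper are hypothetical; check the model card for the real prompt set.

```python
# Toy sketch of prompt-based task switching.
# prompt_layout_only_en appears in dots.ocr's docs; the other
# prompt texts here are made-up placeholders for illustration.
PROMPTS = {
    "full_parse": "Parse the document: layout, text, tables, and formulas.",
    "layout_only": "prompt_layout_only_en",
    "text_only": "Extract only the plain text in reading order.",
}

def build_request(task: str, image_path: str) -> dict:
    """Pair an image with the prompt that selects the task."""
    if task not in PROMPTS:
        raise ValueError(f"unknown task: {task}")
    return {"image": image_path, "prompt": PROMPTS[task]}
```

One model, one inference path; the only thing that changes per task is a string.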

Performance

Let’s talk benchmarks. Because if you’ve been in the OCR world long enough, you know flashy claims don’t mean squat without good numbers.

OmniDocBench

On the gold-standard benchmark for document parsing, dots.ocr flat-out owns its category. It hits top scores across:

  • Text recognition: EN 0.032, ZH 0.066 (edit distance, lower = better)
  • Formula detection: on par with far larger models like Gemini 2.5 Pro
  • Table understanding: 88.6 / 89.0 TableTEDS (EN/ZH)
  • Reading order: it basically nails this, with lower error rates than GPT-4o, Mistral, or even MonkeyOCR-Pro-3B

To put that in perspective, this 1.7B model is outperforming models 20x its size.
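For context on those text-recognition numbers: OmniDocBench's text score is edit-distance based, which is why lower is better. Here's a toy sketch of how such a score can be computed, assuming plain Levenshtein distance normalized by string length (not the benchmark's exact implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def normalized_edit_distance(pred: str, ref: str) -> float:
    """0.0 = perfect match, 1.0 = nothing in common."""
    if not pred and not ref:
        return 0.0
    return levenshtein(pred, ref) / max(len(pred), len(ref))
```

A score of 0.032 means roughly 3 character-level errors per 100 characters of output.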

Multilingual Parsing (dots.ocr-bench)

dots.ocr doesn’t just survive in low-resource languages, it thrives. On their in-house benchmark (1493 PDFs across 100 languages), it slashes error rates by nearly half compared to Doubao or MonkeyOCR.

Why does this matter? Because most OCR systems collapse the moment you throw something like Tibetan or Kannada at them. dots.ocr just shrugs and keeps parsing.

Layout Detection: YOLO Who?

DocLayout-YOLO was supposed to be the “good enough” baseline. But dots.ocr wipes the floor with it:

  • F1@IoU .50: 0.93 overall vs YOLO’s 0.80
  • For formula detection alone: 0.832 vs 0.620

And it does this without being a dedicated detection model. Just prompt it with prompt_layout_only_en, and it becomes one. That’s the trick: VLMs used to be jack-of-all-trades, master-of-none. dots.ocr feels like a master.
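If you haven't met the metric before: F1@IoU .50 counts a predicted layout box as correct when its intersection-over-union with a ground-truth box is at least 0.5. A minimal IoU check (boxes as `(x1, y1, x2, y2)` tuples; a generic sketch, not dots.ocr's evaluation code):

```python
def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def is_match(pred: tuple, gt: tuple, thresh: float = 0.5) -> bool:
    """True positive under the F1@IoU .50 convention."""
    return iou(pred, gt) >= thresh
```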

Deep Cut: OLMOCR-bench

If you’ve ever worked with noisy PDFs, old scans, math-heavy journals, or weird headers, you know these are where models die.

  • MonkeyOCR-pro-3B pulls a decent 75.8 overall.
  • dots.ocr? 79.1. It doesn’t flinch even on multi-column garbage scans with embedded LaTeX and footnotes from hell.

There’s even a breakdown for specific document types:

Textbooks, exam papers, financial reports, newspapers… dots.ocr leads or is second-best across the board. That’s wild considering it’s running on 1.7B parameters with BF16 precision.

Deployment Is Surprisingly Clean

You can deploy it via vLLM or the Hugging Face APIs. The docs are actually usable, but here’s what caught my eye:

  • No TensorRT drama. No need to baby-sit CUDA.
  • Prompt-based task switching means you don’t need a custom inference script for every document type.
  • Docker support if you’re lazy (I was).

It even has a working Gradio demo.


It’s Not Perfect, Yet

This wouldn’t be an honest post if I didn’t point out the flaws:

  • High-density images can trip it up. If your image is 11,289,600 pixels or larger, downsample it (for PDFs, render at 200 DPI instead).
  • Special characters like … or ___ cause weird repetition bugs in outputs. You’ll want to try alternate prompts in those cases.
  • No picture parsing. That’s still a gap. If your docs embed infographics, you’re out of luck.
  • Throughput’s limited for bulk jobs. It’s not yet optimized for high-scale PDF ingestion.

But considering it’s a first release, these are minor tradeoffs. It’s still more robust than most of what’s on the market.

Why This Model Actually Matters

dots.ocr feels like a proof point, not just for document parsing, but for vision-language modeling done right. For years, OCR was a separate domain with clunky tooling and brittle pipelines. Now? It’s just another prompt away.

This isn’t about OCR anymore. It’s about collapsing whole toolchains into a single flexible VLM that actually works.

If you’re building anything involving scanned forms, multi-language docs, academic papers, or even messy invoices, test this thing. It’s free, it’s fast, and it’s dangerously good.

The model is open-sourced and available on Hugging Face:

rednote-hilab/dots.ocr · Hugging Face

Closing Note

I don’t usually write love letters to OCR tools. But dots.ocr feels like the kind of model that makes other tools irrelevant overnight. Try it before it gets bloated, commercialized, or buried under ten layers of enterprise licensing.

And if you’re building a project around document intelligence, skip the YOLOs, the UNets, the handmade table heuristics. Just use this.


Dots.ocr : The best small-sized OCR ever was originally published in Data Science in Your Pocket on Medium, where people are continuing the conversation by highlighting and responding to this story.
